Re: [Qemu-devel] [PATCH v2 2/2] ivshmem: use irqfd to interrupt among VMs
On Thu, Nov 29, 2012 at 12:42 PM, Cam Macdonell wrote: > On Tue, Nov 27, 2012 at 7:53 PM, liu ping fan wrote: >> On Wed, Nov 28, 2012 at 5:48 AM, Cam Macdonell wrote: >>> On Sat, Nov 24, 2012 at 8:51 PM, Liu Ping Fan wrote: From: Liu Ping Fan Using irqfd, so we can avoid switch between kernel and user when VMs interrupts each other. >>> >>> Nice work. Due to a hardware failure, there will be a small delay in >>> me being able to test this. I'll follow up as soon as I can. >>> >> BTW where can I find the latest guest code for test? >> I got the guest code from git://gitorious.org/nahanni/guest-code.git. >> But it seems outdated, after fixing the unlocked_ioctl, and vm-id bits >> position (the codes conflict with ivshmem spec), it works (I have >> tested for V1). > > Hello, > > Which device driver are you using? > guest-code/kernel_module/standard/kvm_ivshmem.c > Cam > >> >> Regards, >> Pingfan >> Signed-off-by: Liu Ping Fan --- hw/ivshmem.c | 54 +- 1 files changed, 53 insertions(+), 1 deletions(-) diff --git a/hw/ivshmem.c b/hw/ivshmem.c index 7c8630c..5709e89 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -19,6 +19,7 @@ #include "hw.h" #include "pc.h" #include "pci.h" +#include "msi.h" #include "msix.h" #include "kvm.h" #include "migration.h" @@ -83,6 +84,7 @@ typedef struct IVShmemState { uint32_t vectors; uint32_t features; EventfdEntry *eventfd_table; +int *vector_virqs; Error *migration_blocker; @@ -625,16 +627,62 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int version_id) return 0; } +static int ivshmem_vector_use(PCIDevice *dev, unsigned vector, + MSIMessage msg) +{ +IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev); +int virq; +EventNotifier *n = &s->peers[s->vm_id].eventfds[vector]; + +virq = kvm_irqchip_add_msi_route(kvm_state, msg); +if (virq >= 0 && kvm_irqchip_add_irqfd_notifier(kvm_state, n, virq) >= 0) { +s->vector_virqs[vector] = virq; +qemu_chr_add_handlers(s->eventfd_chr[vector], NULL, NULL, NULL, NULL); +} else if (virq >= 0) { +kvm_irqchip_release_virq(kvm_state, virq); +error_report("ivshmem, can not setup irqfd\n"); +return -1; +} else { +error_report("ivshmem, no enough msi route to setup irqfd\n"); +return -1; +} + +return 0; +} + +static void ivshmem_vector_release(PCIDevice *dev, unsigned vector) +{ +IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev); +EventNotifier *n = &s->peers[s->vm_id].eventfds[vector]; +int virq = s->vector_virqs[vector]; + +if (s->vector_virqs[vector] >= 0) { +kvm_irqchip_remove_irqfd_notifier(kvm_state, n, virq); +kvm_irqchip_release_virq(kvm_state, virq); +s->vector_virqs[vector] = -1; +} +} + static void ivshmem_write_config(PCIDevice *pci_dev, uint32_t address, uint32_t val, int len) { +bool is_enabled, was_enabled = msi_enabled(pci_dev); + pci_default_write_config(pci_dev, address, val, len); +is_enabled = msi_enabled(pci_dev); +if (!was_enabled && is_enabled) { +msix_set_vector_notifiers(pci_dev, ivshmem_vector_use, +ivshmem_vector_release); +} else if (was_enabled && !is_enabled) { +msix_unset_vector_notifiers(pci_dev); +} } static int pci_ivshmem_init(PCIDevice *dev) { IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev); uint8_t *pci_conf; +int i; if (s->sizearg == NULL) s->ivshmem_size = 4 << 20; /* 4 MB default */ @@ -758,7 +806,10 @@ static int pci_ivshmem_init(PCIDevice *dev) } s->dev.config_write = ivshmem_write_config; - +s->vector_virqs = g_new0(int, s->vectors); +for (i = 0; i < s->vectors; i++) { +s->vector_virqs[i] = -1; +} return 0; } @@ -770,6 +821,7 @@ static void pci_ivshmem_uninit(PCIDevice *dev) migrate_del_blocker(s->migration_blocker); error_free(s->migration_blocker); } +g_free(s->vector_virqs); memory_region_destroy(&s->ivshmem_mmio); memory_region_del_subregion(&s->bar, &s->ivshmem); -- 1.7.4.4 >>
Re: [Qemu-devel] [PATCH v7 1/7] qom: apply atomic on object's refcount
On Thu, Nov 29, 2012 at 1:16 AM, Richard Henderson wrote: > On 11/24/2012 06:02 PM, Liu Ping Fan wrote: >> -obj->ref--; >> >> /* parent always holds a reference to its children */ >> -if (obj->ref == 0) { >> +if (__sync_fetch_and_sub(&obj->ref, 1) == 1) { > > if (__sync_sub_and_fetch(&obj->ref, 1) == 0) > Applied, thanks > > r~
Re: [Qemu-devel] [PATCH V12 6/7] libqblock API implement
Il 29/11/2012 07:25, Wenchao Xia ha scritto: >> >> How can this be called with libqb_global_data.init_flag == 1? >> > OK, I will remove the if, but keep init_flag which can show > library is already initialized. Please remove it if there is no usage elsewhere. For debugging, qemu_aio_context != NULL is just as good to check for initialization. Paolo
Re: [Qemu-devel] [PATCH v4 00/11] virtio: virtio-blk data plane
On Thu, Nov 22, 2012 at 4:16 PM, Stefan Hajnoczi wrote: > This series adds the -device virtio-blk-pci,x-data-plane=on property that > enables a high performance I/O codepath. A dedicated thread is used to > process > virtio-blk requests outside the global mutex and without going through the > QEMU > block layer. > > Khoa Huynh reported an increase from 140,000 IOPS to 600,000 > IOPS for a single VM using virtio-blk-data-plane in July: > > http://comments.gmane.org/gmane.comp.emulators.kvm.devel/94580 > > The virtio-blk-data-plane approach was originally presented at Linux Plumbers > Conference 2010. The following slides contain a brief overview: > > > http://linuxplumbersconf.org/2010/ocw/system/presentations/651/original/Optimizing_the_QEMU_Storage_Stack.pdf > > The basic approach is: > 1. Each virtio-blk device has a thread dedicated to handling ioeventfd >signalling when the guest kicks the virtqueue. > 2. Requests are processed without going through the QEMU block layer using >Linux AIO directly. > 3. Completion interrupts are injected via irqfd from the dedicated thread. > > To try it out: > > qemu -drive if=none,id=drive0,cache=none,aio=native,format=raw,file=... >-device virtio-blk-pci,drive=drive0,scsi=off,x-data-plane=on > > Limitations: > * Only format=raw is supported > * Live migration is not supported > * Block jobs, hot unplug, and other operations fail with -EBUSY > * I/O throttling limits are ignored > * Only Linux hosts are supported due to Linux AIO usage > > The code has reached a stage where I feel it is ready to merge. Users have > been playing with it for some time and want the significant performance boost. > > We are refactoring QEMU to get rid of the global mutex. I believe that > virtio-blk-data-plane can eventually become the default mode of operation. > > Instead of waiting for global mutex removal efforts to finish, I want to use > virtio-blk-data-plane as an example device for AioContext and threaded hw > dispatch refactoring. This means: > > 1. When the block layer can bind to an AioContext and execute I/O outside the >global mutex, virtio-blk-data-plane can use this (and gain image format >support). > > 2. When hw dispatch no longer needs the global mutex we can use hw/virtio.c >again and perhaps run a pool of iothreads instead of dedicated data plane >threads. > > But in the meantime, I have cleaned up the virtio-blk-data-plane code so that > it can be merged as an experimental feature. > > v4: > * Add qemu_iovec_concat_iov() [Paolo] > * Use QEMUIOVector to copy out virtio_blk_inhdr [Michael, Paolo] > > v3: > * Don't assume iovec layout [Michael] > * Better naming for hostmem.c MemoryListener callbacks [Don] > * More vring quarantining if commands are bogus instead of exiting [Blue] > > v2: > * Use MemoryListener for thread-safe memory mapping [Paolo, Anthony, and > everyone else pointed this out ;-)] > * Quarantine invalid vring instead of exiting [Blue] > * Replace __u16 kernel types with uint16_t [Blue] > > Changes from the RFC v9: > * Add x-data-plane=on|off option and coexist with regular virtio-blk code > * Create thread from BH so it inherits iothread cpusets > * Drain requests on vm_stop() so stopped guest does not access image file > * Add migration blocker > * Add bdrv_in_use() to prevent block jobs and other operations that can > interfere > * Drop IOQueue request merging for simplicity > * Drop ioctl interrupt injection and always use irqfd for simplicity > * Major cleanup to split up source files > * Rebase from qemu-kvm.git onto qemu.git > * Address Michael Tsirkin's review comments > > Stefan Hajnoczi (11): > raw-posix: add raw_get_aio_fd() for virtio-blk-data-plane > configure: add CONFIG_VIRTIO_BLK_DATA_PLANE > dataplane: add host memory mapping code > dataplane: add virtqueue vring code > dataplane: add event loop > dataplane: add Linux AIO request queue > iov: add iov_discard() to remove data > test-iov: add iov_discard() testcase > iov: add qemu_iovec_concat_iov() > dataplane: add virtio-blk data plane code > virtio-blk: add x-data-plane=on|off performance feature > > block.h| 9 + > block/raw-posix.c | 34 > configure | 21 +++ > hw/Makefile.objs | 2 +- > hw/dataplane/Makefile.objs | 3 + > hw/dataplane/event-poll.c | 109 > hw/dataplane/event-poll.h | 40 + > hw/dataplane/hostmem.c | 165 ++ > hw/dataplane/hostmem.h | 52 ++ > hw/dataplane/ioq.c | 118 + > hw/dataplane/ioq.h | 57 ++ > hw/dataplane/virtio-blk.c | 427 > + > hw/dataplane/virtio-blk.h | 41 + > hw/dataplane/vring.c | 344 > hw/dataplane/vring.h | 62 +++ > hw/virtio-blk.c| 59 ++- > hw/virtio-blk.h| 1 + > hw/
[Qemu-devel] [Bug 955379] Re: cmake hangs with qemu-arm-static
Ok, test case attached (80M tar). This hugely stripped one is not 100% reproducer, but do few loops and you will hit it. Instructions for using: - extract, chroot - cd /home/abuild/rpmbuild - su abuild - export RPM_BUILD_ROOT=$PWD - rpmbuild -ba SOURCES/libshortcut.spec ** Attachment added: "Tizen chroot" https://bugs.launchpad.net/qemu/+bug/955379/+attachment/3446633/+files/root5-small.tar.bz2 -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/955379 Title: cmake hangs with qemu-arm-static Status in QEMU: New Status in Linaro QEMU: New Status in “qemu-linaro” package in Ubuntu: Confirmed Bug description: I'm using git commit 3e7ecd976b06f... configured with --target-list =arm-linux-user --static in a chroot environment to compile some things. I ran into this problem with both pcl and opencv-2.3.1. cmake consistently freezes at some point during its execution, though in a different spot each time, usually during a step when it's searching for some libraries. For instance, pcl most commonly stops after: [snip] -- Boost version: 1.46.1 -- Found the following Boost libraries: -- system -- filesystem -- thread -- date_time -- checking for module 'eigen3' -- found eigen3, version 3.0.1 which is perplexing because it freezes after finding what it wants, not during the search. When it does get past that point, it does so almost immediately but freezes somewhere else. I'm using 64-bit Ubuntu 11.10 with kernel release 3.0.0-16-generic with an Intel i5. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/955379/+subscriptions
[Qemu-devel] [Bug 955379] Re: cmake hangs with qemu-arm-static
Mind you, when you hit the bug it just hangs and cmake test errors are just to speed up the process of hitting the bug (if cmake just fails you did not hit the bug). Feel free to try with any qemu variant, they all hang similarly when bug is hit. I think that root had some suse 1.2 one inside. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/955379 Title: cmake hangs with qemu-arm-static Status in QEMU: New Status in Linaro QEMU: New Status in “qemu-linaro” package in Ubuntu: Confirmed Bug description: I'm using git commit 3e7ecd976b06f... configured with --target-list =arm-linux-user --static in a chroot environment to compile some things. I ran into this problem with both pcl and opencv-2.3.1. cmake consistently freezes at some point during its execution, though in a different spot each time, usually during a step when it's searching for some libraries. For instance, pcl most commonly stops after: [snip] -- Boost version: 1.46.1 -- Found the following Boost libraries: -- system -- filesystem -- thread -- date_time -- checking for module 'eigen3' -- found eigen3, version 3.0.1 which is perplexing because it freezes after finding what it wants, not during the search. When it does get past that point, it does so almost immediately but freezes somewhere else. I'm using 64-bit Ubuntu 11.10 with kernel release 3.0.0-16-generic with an Intel i5. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/955379/+subscriptions
[Qemu-devel] [PATCHv3] rbd block driver fix race between aio completition and aio cancel
This one fixes a race which qemu had also in iscsi block driver between cancellation and io completition. qemu_rbd_aio_cancel was not synchronously waiting for the end of the command. To archieve this it introduces a new status flag which uses -EINPROGRESS. Changes since last PATCH: - fixed missing braces - added vfree for bounce Signed-off-by: Stefan Priebe --- block/rbd.c | 27 ++- 1 file changed, 18 insertions(+), 9 deletions(-) diff --git a/block/rbd.c b/block/rbd.c index 0aaacaf..917c64c 100644 --- a/block/rbd.c +++ b/block/rbd.c @@ -77,6 +77,7 @@ typedef struct RBDAIOCB { int error; struct BDRVRBDState *s; int cancelled; +int status; } RBDAIOCB; typedef struct RADOSCB { @@ -376,12 +377,6 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb) RBDAIOCB *acb = rcb->acb; int64_t r; -if (acb->cancelled) { -qemu_vfree(acb->bounce); -qemu_aio_release(acb); -goto done; -} - r = rcb->ret; if (acb->cmd == RBD_AIO_WRITE || @@ -406,10 +401,11 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb) acb->ret = r; } } +acb->status = 0; + /* Note that acb->bh can be NULL in case where the aio was cancelled */ acb->bh = qemu_bh_new(rbd_aio_bh_cb, acb); qemu_bh_schedule(acb->bh); -done: g_free(rcb); } @@ -568,6 +564,13 @@ static void qemu_rbd_aio_cancel(BlockDriverAIOCB *blockacb) { RBDAIOCB *acb = (RBDAIOCB *) blockacb; acb->cancelled = 1; + +while (acb->status == -EINPROGRESS) { +qemu_aio_wait(); +} + +qemu_vfree(acb->bounce); +qemu_aio_release(acb); } static const AIOCBInfo rbd_aiocb_info = { @@ -640,7 +643,9 @@ static void rbd_aio_bh_cb(void *opaque) qemu_bh_delete(acb->bh); acb->bh = NULL; -qemu_aio_release(acb); +if (!acb->cancelled) { +qemu_aio_release(acb); +} } static int rbd_aio_discard_wrapper(rbd_image_t image, @@ -685,6 +690,7 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs, acb->s = s; acb->cancelled = 0; acb->bh = NULL; +acb->status = -EINPROGRESS; if (cmd == RBD_AIO_WRITE) { qemu_iovec_to_buf(acb->qiov, 0, acb->bounce, qiov->size); @@ -731,7 +737,10 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs, failed: g_free(rcb); s->qemu_aio_count--; -qemu_aio_release(acb); +if (!acb->cancelled) { +qemu_vfree(acb->bounce); +qemu_aio_release(acb); +} return NULL; } -- 1.7.10.4
[Qemu-devel] [PATCH for-1.3] Build system clean generated source files in tests
Currently .c files generated in ./tests are not deleted in make clean. This introduce trouble that, once we made tests in source root directory, we can't do a succesfully build for tests in another out of tree directory, for that some file may miss the step to be generated. This patch fix it. Steps to reproduce: in root source directory: 1 ./configure --target-list=x86_64-softmmu 2 make check-report-unit.xml 3 make distclean && rm -rf *-linux-user *-softmmu goto another directory: 4 configure 5 make check-report-unit.xml Signed-off-by: Wenchao Xia --- tests/Makefile |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/tests/Makefile b/tests/Makefile index b60f0fb..2c90138 100644 --- a/tests/Makefile +++ b/tests/Makefile @@ -30,6 +30,7 @@ check-qtest-sparc-y = tests/m48t59-test$(EXESUF) check-qtest-sparc64-y = tests/m48t59-test$(EXESUF) GENERATED_HEADERS += tests/test-qapi-types.h tests/test-qapi-visit.h tests/test-qmp-commands.h +GENERATED_SOURCES += tests/test-qapi-types.c tests/test-qapi-visit.c tests/test-qmp-marshal.c test-obj-y = tests/check-qint.o tests/check-qstring.o tests/check-qdict.o \ tests/check-qlist.o tests/check-qfloat.o tests/check-qjson.o \ -- 1.7.1
Re: [Qemu-devel] [PATCH 1/3] define name for some fields of dr7
On 29 November 2012 03:32, liguang wrote: Your Subject: line is missing the "target-i386:" prefix. (also, should be "names".) > Signed-off-by: liguang > --- > target-i386/cpu.h | 13 + > 1 files changed, 13 insertions(+), 0 deletions(-) > > diff --git a/target-i386/cpu.h b/target-i386/cpu.h > index 90ef1ff..7f292e6 100644 > --- a/target-i386/cpu.h > +++ b/target-i386/cpu.h > @@ -558,6 +558,19 @@ > #define CPU_INTERRUPT_TPR CPU_INTERRUPT_TGT_INT_3 > > > +/* dr7 fields */ > +/* max breakpoints*/ > +#define MAX_BP 4 > +/* Break on instruction execution only */ > +#define BP_INST 0x0 > +/* Break on data writes only */ > +#define BP_DATA_WR 0x1 > +/* Break on I/O reads or writes */ ... or undefined for 386, 486 or if CR4 DE flag is clear. > +#define BP_IO_RW0x10 > +/* Break on data reads or writes but not instruction fetches */ > +#define BP_DATA_RW 0x11 These should all go next to the existing definitions in this file for some DR7 fields, and they should follow the existing naming conventions, ie DR7_something. I suggest naming them DR7_TYPE_INSN, DR7_TYPE_DATA_WR, etc. Could also use a comment that the DR7_TYPE_ constants are the values for the TYPE field of DR7, and are what is returned by hw_breakpoint_type(). -- PMM
Re: [Qemu-devel] [PATCH 2/3] use dr7's bit name for breakpoint
On 29 November 2012 03:32, liguang wrote: > Signed-off-by: liguang > --- > target-i386/cpu.h |2 ++ > target-i386/helper.c | 24 +++- > target-i386/misc_helper.c |6 +++--- > target-i386/seg_helper.c |6 +++--- > 4 files changed, 19 insertions(+), 19 deletions(-) > > diff --git a/target-i386/cpu.h b/target-i386/cpu.h > index 7f292e6..7ecfe21 100644 > --- a/target-i386/cpu.h > +++ b/target-i386/cpu.h > @@ -561,6 +561,8 @@ > /* dr7 fields */ > /* max breakpoints*/ > #define MAX_BP 4 > +/*enable local breakpoint bit 0,2,4,6*/ > +#define BP_LOCAL0x55 This needs a better name, to make it clear that it's not just a single enable bit but actually a mask of all the local enable bits. Also needs DR7_ prefix. You've split these changes up between patches inconsistently; either have one patch which adds all the constants and then patches which just use them, or have patches which both add and use the constants, but don't mix the two. I'd recommend that each patch should both add and use a related set of constants, so it's self contained and easy to review. -- PMM
Re: [Qemu-devel] [PATCH 3/3] target-i386:refactor check_hw_breakpoints function
On 29 November 2012 03:32, liguang wrote: > Signed-off-by: liguang > --- > target-i386/helper.c | 28 > 1 files changed, 20 insertions(+), 8 deletions(-) > > diff --git a/target-i386/helper.c b/target-i386/helper.c > index 9ca52a7..a506df0 100644 > --- a/target-i386/helper.c > +++ b/target-i386/helper.c > @@ -1012,22 +1012,34 @@ void hw_breakpoint_remove(CPUX86State *env, int index) > int check_hw_breakpoints(CPUX86State *env, int force_dr6_update) > { > target_ulong dr6; > -int reg, type; > +int index; > int hit_enabled = 0; > > dr6 = env->dr[6] & ~0xf; > -for (reg = 0; reg < MAX_BP; reg++) { > -type = hw_breakpoint_type(env->dr[7], reg); > -if ((type == 0 && env->dr[reg] == env->eip) || > -((type & 1) && env->cpu_watchpoint[reg] && > - (env->cpu_watchpoint[reg]->flags & BP_WATCHPOINT_HIT))) { > -dr6 |= 1 << reg; > -if (hw_breakpoint_enabled(env->dr[7], reg)) > +for (index = 0; index < MAX_BP; index++) { > +switch (hw_breakpoint_type(env->dr[7], index)){ > +case BP_INST: > +if (env->dr[index] != env->eip) > +break; > +goto enable_hit; > +case BP_DATA_WR: > +case BP_DATA_RW: > +if (!env->cpu_watchpoint[index]) > +break; > +if (!(env->cpu_watchpoint[index]->flags & BP_WATCHPOINT_HIT)) > +break; > +enable_hit: > +dr6 |= 1 << index; > +if (hw_breakpoint_enabled(env->dr[7], index)) > hit_enabled = 1; > +break; > +case BP_IO_RW: > +break; > } If you have to resort to gotos and fallthroughs in a switch statement, I don't think this is actually clearer than the existing code... (Also it has coding style issues, use scripts/checkpatch.pl.) -- PMM
Re: [Qemu-devel] [PATCH 03/10] qemu-ga: qmp_guest_file_*: improve error reporting
On Wed, 28 Nov 2012 16:17:43 -0600 mdroth wrote: > On Wed, Nov 28, 2012 at 04:26:29PM -0500, Eric Blake wrote: > > > > > > > if (ferror(fh)) { > > > > > +error_setg_errno(err, errno, "failed to read file"); > > > > > slog("guest-file-read failed, handle: %ld", handle); > > > > > -error_set(err, QERR_QGA_COMMAND_FAILED, "fread() > > > > > failed"); > > > > > } else { > > > > > > > > I'm not sure about relying on errno for FILE/f*() functions. C99 > > > > doesn't > > > > appear to require setting it for implementations > > > > Correct that C99 doesn't require it, but POSIX _does_ require it. > > > > Windows is the biggest system out there where errno is unreliable after > > failure on FILE operations (but as we DO support mingw, we ARE impacted > > by the lameness of Microsoft's C library being C89 but not POSIX). > > Well, if it's primarilly an issue with windows, then I think we're okay > relying on it for anything in qga/commands-posix.c at least, as those > implementations get swapped out for the implementations in > qga/commands-win32.o for win32/mingw builds. We'll need to be wary of > this in the future if we end up sharing more code with the win32 port > becomes more feature-full though. > > The comment elsewhere about setmntent() might still apply however. Ok, so can I respin only 05/10? > > > However, the other FILE functions seem safe to me. I'd be very > > > surprised > > > if some implementation doesn't set errno on fopen() failure for > > > example > > > > Then you probably haven't experimented much with mingw :) Nope. But I was referring to other unixes...
Re: [Qemu-devel] [PATCH v4 00/11] virtio: virtio-blk data plane
Il 29/11/2012 10:18, Stefan Hajnoczi ha scritto: > Michael, Paolo: Are you happy with v4? Sure. > Kevin: Do you want to take this series through the block tree? Paolo
Re: [Qemu-devel] [PATCH v4 03/11] dataplane: add host memory mapping code
On Thu, Nov 22, 2012 at 04:16:44PM +0100, Stefan Hajnoczi wrote: > The data plane thread needs to map guest physical addresses to host > pointers. Normally this is done with cpu_physical_memory_map() but the > function assumes the global mutex is held. The data plane thread does > not touch the global mutex and therefore needs a thread-safe memory > mapping mechanism. > > Hostmem registers a MemoryListener similar to how vhost collects and > pushes memory region information into the kernel. There is a > fine-grained lock on the regions list which is held during lookup and > when installing a new regions list. > > When the physical memory map changes the MemoryListener callbacks are > invoked. They build up a new list of memory regions which is finally > installed when the list has been completed. > > Note that this approach is not safe across memory hotplug because mapped > pointers may still be in used across memory unplug. However, this is > currently a problem for QEMU in general and needs to be addressed in the > future. Sounds like a serious problem. I'm not sure I understand - do you say this currently a problem for QEMU virtio? Coul you give an example please? > > Signed-off-by: Stefan Hajnoczi > --- > hw/dataplane/Makefile.objs | 3 + > hw/dataplane/hostmem.c | 165 > + > hw/dataplane/hostmem.h | 52 ++ > 3 files changed, 220 insertions(+) > create mode 100644 hw/dataplane/Makefile.objs > create mode 100644 hw/dataplane/hostmem.c > create mode 100644 hw/dataplane/hostmem.h > > diff --git a/hw/dataplane/Makefile.objs b/hw/dataplane/Makefile.objs > new file mode 100644 > index 000..8c8dea1 > --- /dev/null > +++ b/hw/dataplane/Makefile.objs > @@ -0,0 +1,3 @@ > +ifeq ($(CONFIG_VIRTIO), y) > +common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o > +endif > diff --git a/hw/dataplane/hostmem.c b/hw/dataplane/hostmem.c > new file mode 100644 > index 000..48aabf0 > --- /dev/null > +++ b/hw/dataplane/hostmem.c > @@ -0,0 +1,165 @@ > +/* > + * Thread-safe guest to host memory mapping > + * > + * Copyright 2012 Red Hat, Inc. and/or its affiliates > + * > + * Authors: > + * Stefan Hajnoczi > + * > + * This work is licensed under the terms of the GNU GPL, version 2 or later. > + * See the COPYING file in the top-level directory. > + * > + */ > + > +#include "exec-memory.h" > +#include "hostmem.h" > + > +static int hostmem_lookup_cmp(const void *phys_, const void *region_) > +{ > +hwaddr phys = *(const hwaddr *)phys_; > +const HostmemRegion *region = region_; > + > +if (phys < region->guest_addr) { > +return -1; > +} else if (phys >= region->guest_addr + region->size) { > +return 1; > +} else { > +return 0; > +} > +} > + > +/** > + * Map guest physical address to host pointer > + */ > +void *hostmem_lookup(Hostmem *hostmem, hwaddr phys, hwaddr len, bool > is_write) > +{ > +HostmemRegion *region; > +void *host_addr = NULL; > +hwaddr offset_within_region; > + > +qemu_mutex_lock(&hostmem->current_regions_lock); > +region = bsearch(&phys, hostmem->current_regions, > + hostmem->num_current_regions, > + sizeof(hostmem->current_regions[0]), > + hostmem_lookup_cmp); > +if (!region) { > +goto out; > +} > +if (is_write && region->readonly) { > +goto out; > +} > +offset_within_region = phys - region->guest_addr; > +if (offset_within_region + len <= region->size) { > +host_addr = region->host_addr + offset_within_region; > +} > +out: > +qemu_mutex_unlock(&hostmem->current_regions_lock); > + > +return host_addr; > +} > + > +/** > + * Install new regions list > + */ > +static void hostmem_listener_commit(MemoryListener *listener) > +{ > +Hostmem *hostmem = container_of(listener, Hostmem, listener); > + > +qemu_mutex_lock(&hostmem->current_regions_lock); > +g_free(hostmem->current_regions); > +hostmem->current_regions = hostmem->new_regions; > +hostmem->num_current_regions = hostmem->num_new_regions; > +qemu_mutex_unlock(&hostmem->current_regions_lock); > + > +/* Reset new regions list */ > +hostmem->new_regions = NULL; > +hostmem->num_new_regions = 0; > +} > + > +/** > + * Add a MemoryRegionSection to the new regions list > + */ > +static void hostmem_append_new_region(Hostmem *hostmem, > + MemoryRegionSection *section) > +{ > +void *ram_ptr = memory_region_get_ram_ptr(section->mr); > +size_t num = hostmem->num_new_regions; > +size_t new_size = (num + 1) * sizeof(hostmem->new_regions[0]); > + > +hostmem->new_regions = g_realloc(hostmem->new_regions, new_size); > +hostmem->new_regions[num] = (HostmemRegion){ > +.host_addr = ram_ptr + section->offset_within_region, > +.guest_addr = section->offset_within_address_space, > +.size = se
Re: [Qemu-devel] [RFC PATCH v2 1/3] virtio-bus : Introduce VirtioBus.
On 26/11/2012 17:59, Anthony Liguori wrote: Peter Maydell writes: On 26 November 2012 14:33, Anthony Liguori wrote: VirtioBusInfo is not a great name. This is a proxy class that allows for a device to implement the virtio bus interface. This could be done as an interface but since nothing else uses interfaces, I'm okay with something like this. But the first argument ought to be an opaque for all methods. We have at least one user of Interface in the tree IIRC. I'd much rather we did this the right way -- the only reason it's the way Fred has coded it is that there's no obvious body of code in the tree to copy, so we're thrashing around a bit. If you tell us what the correct set of structs/classes/ interfaces/etc is then we can implement it :-) I really think extending virtio-bus to a virtio-pci-bus and then initializing it with a link to the PCI device is the best approach. It's by far the simpliest approach in terms of coding. Did I explain it adequately? To recap: virtio-bus extends bus-state - implements everything that VirtIOBindings implements as methods virtio-pci-bus extends virtio-bus - is constructed with a pointer to a PCIDevice - implements the methods necessary to be a virtio bus I still have trouble with that ^^. virtio-pci-bus extends virtio-bus so I put something like that : static void virtio_pci_bus_class_init(ObjectClass *klass, void *data) { BusClass *bc = BUS_CLASS(klass); VirtioBusClass *k = VIRTIO_BUS_CLASS(klass); /* VirtIOBindings */ k->notify = virtio_pci_notify; k->save_config = virtio_pci_save_config; k->load_config = virtio_pci_load_config; k->save_queue = virtio_pci_save_queue; k->load_queue = virtio_pci_load_queue; k->get_features = virtio_pci_get_features; k->query_guest_notifiers = virtio_pci_query_guest_notifiers; k->set_host_notifier = virtio_pci_set_host_notifier; k->set_guest_notifiers = virtio_pci_set_guest_notifiers; k->vmstate_change = virtio_pci_vmstate_change; /* * TODO : Init and exit function. * void (*init)(void *opaque); * void (*exit)(void *opaque); */ } static TypeInfo virtio_pci_bus_info = { .name = TYPE_VIRTIO_PCI_BUS, .parent= TYPE_VIRTIO_BUS, .class_init= virtio_pci_bus_class_init, }; and I have VirtioDevice which extends DeviceState like that : static void virtio_device_class_init(ObjectClass *klass, void *data) { /* Set the default value here. */ DeviceClass *dc = DEVICE_CLASS(klass); dc->bus_type = TYPE_VIRTIO_BUS; dc->init = virtio_device_init; } static TypeInfo virtio_device_info = { .name = TYPE_VIRTIO_DEVICE, .parent = TYPE_DEVICE, .instance_size = sizeof(VirtioDeviceState), /* Abstract the virtio-device */ .class_init = virtio_device_class_init, .abstract = true, .class_size = sizeof(VirtioDeviceClass), }; The problem is that the virtio devices can't be connected to the virtio-pci-bus even if it extends virtio-bus because TYPE_VIRTIO_BUS != TYPE_VIRTIO_PCI_BUS. Did I miss something ? Thanks, Fred virtio-device extends device-state - implements methods used by virtio-bus Regards, Anthony Liguori -- PMM
Re: [Qemu-devel] [PATCH v4 03/11] dataplane: add host memory mapping code
On Thu, Nov 29, 2012 at 02:33:11PM +0200, Michael S. Tsirkin wrote: > On Thu, Nov 22, 2012 at 04:16:44PM +0100, Stefan Hajnoczi wrote: > > The data plane thread needs to map guest physical addresses to host > > pointers. Normally this is done with cpu_physical_memory_map() but the > > function assumes the global mutex is held. The data plane thread does > > not touch the global mutex and therefore needs a thread-safe memory > > mapping mechanism. > > > > Hostmem registers a MemoryListener similar to how vhost collects and > > pushes memory region information into the kernel. There is a > > fine-grained lock on the regions list which is held during lookup and > > when installing a new regions list. > > > > When the physical memory map changes the MemoryListener callbacks are > > invoked. They build up a new list of memory regions which is finally > > installed when the list has been completed. > > > > Note that this approach is not safe across memory hotplug because mapped > > pointers may still be in used across memory unplug. However, this is > > currently a problem for QEMU in general and needs to be addressed in the > > future. > > Sounds like a serious problem. > I'm not sure I understand - do you say this currently a problem for QEMU > virtio? Coul you give an example please? This is a limitation of the memory API but cannot be triggered by users today since we don't support memory hot unplug. I'm just explaining that virtio-blk-data-plane has the same issue as hw/virtio-blk.c or any other device emulation code here. Some more detail: The issue is that hw/virtio-blk.c submits an asynchronous I/O request on the host with the guest buffer. Then virtio-blk emulation returns back to the caller and continues QEMU execution. It is unsafe to unplug memory while the I/O request is pending since there's no mechanism (e.g. refcount) to wait until the guest memory is no longer in use. This is a known issue. There's no way to trigger a problem today but we need to eventually enhance QEMU's memory API to handle this case. Stefan
Re: [Qemu-devel] [PATCH v4 04/11] dataplane: add virtqueue vring code
On Thu, Nov 22, 2012 at 04:16:45PM +0100, Stefan Hajnoczi wrote: > The virtio-blk-data-plane cannot access memory using the usual QEMU > functions since it executes outside the global mutex and the memory APIs > are this time are not thread-safe. > > This patch introduces a virtqueue module based on the kernel's vhost > vring code. The trick is that we map guest memory ahead of time and > access it cheaply outside the global mutex. > > Once the hardware emulation code can execute outside the global mutex it > will be possible to drop this code. > > Signed-off-by: Stefan Hajnoczi Is there no way to factor out ommon code and share it with virtio.c? > --- > hw/Makefile.objs | 2 +- > hw/dataplane/Makefile.objs | 2 +- > hw/dataplane/vring.c | 344 > + > hw/dataplane/vring.h | 62 > trace-events | 3 + > 5 files changed, 411 insertions(+), 2 deletions(-) > create mode 100644 hw/dataplane/vring.c > create mode 100644 hw/dataplane/vring.h > > diff --git a/hw/Makefile.objs b/hw/Makefile.objs > index ea46f81..db87fbf 100644 > --- a/hw/Makefile.objs > +++ b/hw/Makefile.objs > @@ -1,4 +1,4 @@ > -common-obj-y = usb/ ide/ > +common-obj-y = usb/ ide/ dataplane/ > common-obj-y += loader.o > common-obj-$(CONFIG_VIRTIO) += virtio-console.o > common-obj-$(CONFIG_VIRTIO) += virtio-rng.o > diff --git a/hw/dataplane/Makefile.objs b/hw/dataplane/Makefile.objs > index 8c8dea1..34e6d57 100644 > --- a/hw/dataplane/Makefile.objs > +++ b/hw/dataplane/Makefile.objs > @@ -1,3 +1,3 @@ > ifeq ($(CONFIG_VIRTIO), y) > -common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o > +common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o vring.o > endif > diff --git a/hw/dataplane/vring.c b/hw/dataplane/vring.c > new file mode 100644 > index 000..2632fbd > --- /dev/null > +++ b/hw/dataplane/vring.c > @@ -0,0 +1,344 @@ > +/* Copyright 2012 Red Hat, Inc. > + * Copyright IBM, Corp. 2012 > + * > + * Based on Linux vhost code: > + * Copyright (C) 2009 Red Hat, Inc. > + * Copyright (C) 2006 Rusty Russell IBM Corporation > + * > + * Author: Michael S. Tsirkin > + * Stefan Hajnoczi > + * > + * Inspiration, some code, and most witty comments come from > + * Documentation/virtual/lguest/lguest.c, by Rusty Russell > + * > + * This work is licensed under the terms of the GNU GPL, version 2. > + */ > + > +#include "trace.h" > +#include "hw/dataplane/vring.h" > + > +/* Map the guest's vring to host memory */ > +bool vring_setup(Vring *vring, VirtIODevice *vdev, int n) > +{ > +hwaddr vring_addr = virtio_queue_get_ring_addr(vdev, n); > +hwaddr vring_size = virtio_queue_get_ring_size(vdev, n); > +void *vring_ptr; > + > +vring->broken = false; > + > +hostmem_init(&vring->hostmem); > +vring_ptr = hostmem_lookup(&vring->hostmem, vring_addr, vring_size, > true); > +if (!vring_ptr) { > +error_report("Failed to map vring " > + "addr %#" HWADDR_PRIx " size %" HWADDR_PRIu, > + vring_addr, vring_size); > +vring->broken = true; > +return false; > +} > + > +vring_init(&vring->vr, virtio_queue_get_num(vdev, n), vring_ptr, 4096); > + > +vring->last_avail_idx = 0; > +vring->last_used_idx = 0; > +vring->signalled_used = 0; > +vring->signalled_used_valid = false; > + > +trace_vring_setup(virtio_queue_get_ring_addr(vdev, n), > + vring->vr.desc, vring->vr.avail, vring->vr.used); > +return true; > +} > + > +void vring_teardown(Vring *vring) > +{ > +hostmem_finalize(&vring->hostmem); > +} > + > +/* Toggle guest->host notifies */ > +void vring_set_notification(VirtIODevice *vdev, Vring *vring, bool enable) > +{ > +if (vdev->guest_features & (1 << VIRTIO_RING_F_EVENT_IDX)) { > +if (enable) { > +vring_avail_event(&vring->vr) = vring->vr.avail->idx; > +} > +} else if (enable) { > +vring->vr.used->flags &= ~VRING_USED_F_NO_NOTIFY; > +} else { > +vring->vr.used->flags |= VRING_USED_F_NO_NOTIFY; > +} > +} > + > +/* This is stolen from linux/drivers/vhost/vhost.c:vhost_notify() */ > +bool vring_should_notify(VirtIODevice *vdev, Vring *vring) > +{ > +uint16_t old, new; > +bool v; > +/* Flush out used index updates. This is paired > + * with the barrier that the Guest executes when enabling > + * interrupts. */ > +smp_mb(); > + > +if ((vdev->guest_features & VIRTIO_F_NOTIFY_ON_EMPTY) && > +unlikely(vring->vr.avail->idx == vring->last_avail_idx)) { > +return true; > +} > + > +if (!(vdev->guest_features & VIRTIO_RING_F_EVENT_IDX)) { > +return !(vring->vr.avail->flags & VRING_AVAIL_F_NO_INTERRUPT); > +} > +old = vring->signalled_used; > +v = vring->signalled_used_valid; > +new = vring->signalled_used = vring->last_used_idx; > +vring->signalled_used_valid = true; > + > +if (unlike
Re: [Qemu-devel] [PATCH v4 03/11] dataplane: add host memory mapping code
On Thu, Nov 29, 2012 at 01:45:19PM +0100, Stefan Hajnoczi wrote: > On Thu, Nov 29, 2012 at 02:33:11PM +0200, Michael S. Tsirkin wrote: > > On Thu, Nov 22, 2012 at 04:16:44PM +0100, Stefan Hajnoczi wrote: > > > The data plane thread needs to map guest physical addresses to host > > > pointers. Normally this is done with cpu_physical_memory_map() but the > > > function assumes the global mutex is held. The data plane thread does > > > not touch the global mutex and therefore needs a thread-safe memory > > > mapping mechanism. > > > > > > Hostmem registers a MemoryListener similar to how vhost collects and > > > pushes memory region information into the kernel. There is a > > > fine-grained lock on the regions list which is held during lookup and > > > when installing a new regions list. > > > > > > When the physical memory map changes the MemoryListener callbacks are > > > invoked. They build up a new list of memory regions which is finally > > > installed when the list has been completed. > > > > > > Note that this approach is not safe across memory hotplug because mapped > > > pointers may still be in used across memory unplug. However, this is > > > currently a problem for QEMU in general and needs to be addressed in the > > > future. > > > > Sounds like a serious problem. > > I'm not sure I understand - do you say this currently a problem for QEMU > > virtio? Coul you give an example please? > > This is a limitation of the memory API but cannot be triggered by users > today since we don't support memory hot unplug. I'm just explaining > that virtio-blk-data-plane has the same issue as hw/virtio-blk.c or any > other device emulation code here. > > Some more detail: > > The issue is that hw/virtio-blk.c submits an asynchronous I/O request on > the host with the guest buffer. Then virtio-blk emulation returns back > to the caller and continues QEMU execution. > > It is unsafe to unplug memory while the I/O request is pending since > there's no mechanism (e.g. refcount) to wait until the guest memory is > no longer in use. > > This is a known issue. There's no way to trigger a problem today but we > need to eventually enhance QEMU's memory API to handle this case. > > Stefan For this problem we would simply need to flush outstanding aio before freeing memory for unplug, no refcount necessary. This patch however introduces the issue in the frontend and it looks like there won't be any way to solve it without changing the API. -- MST
Re: [Qemu-devel] [PATCH v4 03/11] dataplane: add host memory mapping code
On Thu, Nov 29, 2012 at 02:54:26PM +0200, Michael S. Tsirkin wrote: > On Thu, Nov 29, 2012 at 01:45:19PM +0100, Stefan Hajnoczi wrote: > > On Thu, Nov 29, 2012 at 02:33:11PM +0200, Michael S. Tsirkin wrote: > > > On Thu, Nov 22, 2012 at 04:16:44PM +0100, Stefan Hajnoczi wrote: > > > > The data plane thread needs to map guest physical addresses to host > > > > pointers. Normally this is done with cpu_physical_memory_map() but the > > > > function assumes the global mutex is held. The data plane thread does > > > > not touch the global mutex and therefore needs a thread-safe memory > > > > mapping mechanism. > > > > > > > > Hostmem registers a MemoryListener similar to how vhost collects and > > > > pushes memory region information into the kernel. There is a > > > > fine-grained lock on the regions list which is held during lookup and > > > > when installing a new regions list. > > > > > > > > When the physical memory map changes the MemoryListener callbacks are > > > > invoked. They build up a new list of memory regions which is finally > > > > installed when the list has been completed. > > > > > > > > Note that this approach is not safe across memory hotplug because mapped > > > > pointers may still be in used across memory unplug. However, this is > > > > currently a problem for QEMU in general and needs to be addressed in the > > > > future. > > > > > > Sounds like a serious problem. > > > I'm not sure I understand - do you say this currently a problem for QEMU > > > virtio? Coul you give an example please? > > > > This is a limitation of the memory API but cannot be triggered by users > > today since we don't support memory hot unplug. I'm just explaining > > that virtio-blk-data-plane has the same issue as hw/virtio-blk.c or any > > other device emulation code here. > > > > Some more detail: > > > > The issue is that hw/virtio-blk.c submits an asynchronous I/O request on > > the host with the guest buffer. Then virtio-blk emulation returns back > > to the caller and continues QEMU execution. > > > > It is unsafe to unplug memory while the I/O request is pending since > > there's no mechanism (e.g. refcount) to wait until the guest memory is > > no longer in use. > > > > This is a known issue. There's no way to trigger a problem today but we > > need to eventually enhance QEMU's memory API to handle this case. > > > > Stefan > > For this problem we would simply need to flush outstanding aio > before freeing memory for unplug, no refcount necessary. > > This patch however introduces the issue in the frontend > and it looks like there won't be any way to solve > it without changing the API. To clarify, as you say it is not triggerable so I don't think this is strictly required to address this at this point though it should not be too hard: just register callback that flushes the frontend processing. But if you can't code it at this point, please add a TODO comment in code. > -- > MST
Re: [Qemu-devel] [PATCH] virtio: limit avail bytes lookahead
On (Wed) 28 Nov 2012 [23:53:08], Michael S. Tsirkin wrote: > On Tue, Nov 27, 2012 at 06:25:04PM +0200, Michael S. Tsirkin wrote: > > On Thu, Nov 01, 2012 at 06:07:21PM +0200, Michael S. Tsirkin wrote: > > > Commit 0d8d7690850eb0cf2b2b60933cf47669a6b6f18f introduced > > > a regression in virtio-net performance because it looks > > > into the ring aggressively while we really only care > > > about a single packet worth of buffers. > > > To fix, add parameters limiting lookahead, and > > > use in virtqueue_avail_bytes. > > > > > > Signed-off-by: Michael S. Tsirkin > > > Reported-by: Edivaldo de Araujo Pereira > > > > Ping. > > Anthony - going to apply this? > > virtio rng was added since so naturally build broke. > Here's a patch on top to fix it up. I never used virtio rng before so > could not test at this hour, but it does fix the build. > > I'll take a look at how to test it tomorrow but any > info would be appreciated. > Amit could you pls review? Looks fine, I assume you will send a v2 of the patch to Anthony? Amit
Re: [Qemu-devel] [PATCH 3/3] spice-qemu-char: register interface on post load
On (Wed) 28 Nov 2012 [12:59:40], Markus Armbruster wrote: > Alon Levy writes: > > >> Alon Levy writes: > >> > >> > The target has not seen the guest_connected event via > >> > spice_chr_guest_open or spice_chr_write, and so spice server > >> > wrongly > >> > assumes there is no agent active, while the client continues to > >> > send > >> > motion events only by the agent channel, which the server ignores. > >> > The > >> > net effect is that the mouse is static in the guest. > >> > > >> > By registering the interface on post load spice server will pass on > >> > the > >> > agent messages fixing the mouse behavior after migration. > >> > > >> > RHBZ #725965 > >> > > >> > Signed-off-by: Alon Levy > >> > --- > >> > spice-qemu-char.c | 34 ++ > >> > 1 file changed, 34 insertions(+) > >> > > >> > diff --git a/spice-qemu-char.c b/spice-qemu-char.c > >> > index 09aa22d..08b6ba0 100644 > >> > --- a/spice-qemu-char.c > >> > +++ b/spice-qemu-char.c > >> > @@ -1,6 +1,7 @@ > >> > #include "config-host.h" > >> > #include "trace.h" > >> > #include "ui/qemu-spice.h" > >> > +#include "hw/virtio-serial.h" > >> > #include > >> > #include > >> > > >> > @@ -25,6 +26,7 @@ typedef struct SpiceCharDriver { > >> > uint8_t *datapos; > >> > ssize_t bufsize, datalen; > >> > uint32_t debug; > >> > +QEMUTimer *post_load_timer; > >> > } SpiceCharDriver; > >> > > >> > static int vmc_write(SpiceCharDeviceInstance *sin, const uint8_t > >> > *buf, int len) > >> > @@ -156,6 +158,7 @@ static void spice_chr_close(struct > >> > CharDriverState *chr) > >> > > >> > printf("%s\n", __func__); > >> > vmc_unregister_interface(s); > >> > +qemu_free_timer(s->post_load_timer); > >> > g_free(s); > >> > } > >> > > >> > @@ -188,6 +191,33 @@ static void print_allowed_subtypes(void) > >> > fprintf(stderr, "\n"); > >> > } > >> > > >> > +static void spice_chr_post_load_cb(void *opaque) > >> > +{ > >> > +SpiceCharDriver *s = opaque; > >> > + > >> > +vmc_register_interface(s); > >> > +} > >> > + > >> > +static int spice_chr_post_load(void *opaque, int version_id) > >> > +{ > >> > +SpiceCharDriver *s = opaque; > >> > + > >> > +if (s && s->chr && qemu_chr_be_connected(s->chr)) { > >> > +qemu_mod_timer(s->post_load_timer, 1); > >> > +} > >> > >> You use the time to delay spice_chr_post_load_cb(), right? Can you > >> explain why you have to delay? > > > > This is a precaution, it ensures vmc_register_interface is called when > > the vm is running as opposed to stopped which is the state when > > spice_chr_post_load is called. In theory vmc_register_interface could > > lead to an attempt to inject an interrupt into the guest, and we know > > that fails with kvm irqchip from a previous bug with virtio-serial. > > So your fixed delay of 1ns (I think) on the vm_clock is a roundabout way > to delay the callback from "post load" until after the run state > transition to RUN_STATE_RUNNING. Correct? > > If yes, then a VM change state handler might be cleaner. Agreed. Juan and I had discussed this earlier, and there are no hooks on the target side (i.e. for loadvm) yet. So this roundabout way is the best way to solve the problem for now. (This discussion with Juan was done earlier, when the patch to virtio-serial-bus pointed to by Alon in the other message was done). Amit
Re: [Qemu-devel] [RFC PATCH v2 1/3] virtio-bus : Introduce VirtioBus.
On 29 November 2012 12:37, Konrad Frederic wrote: > On 26/11/2012 17:59, Anthony Liguori wrote: >> virtio-pci-bus extends virtio-bus >> - is constructed with a pointer to a PCIDevice >> - implements the methods necessary to be a virtio bus > > I still have trouble with that ^^. > The problem is that the virtio devices can't be connected to the > virtio-pci-bus even if it extends virtio-bus because TYPE_VIRTIO_BUS != > TYPE_VIRTIO_PCI_BUS. Conceptually it ought to work I think: if the bus is-a TYPE_VIRTIO_BUS then we should permit plugging in even if the actual bus object happens to be an instance of a subclass. I suspect that qbus_find_recursive should be doing an object_class_dynamic_cast() to check that the bus is of a suitable type, rather than the (strcmp(object_get_typename(OBJECT(bus)), bus_typename) != 0) which it does at the moment. -- PMM
Re: [Qemu-devel] [PATCH v4 11/11] virtio-blk: add x-data-plane=on|off performance feature
On Thu, Nov 22, 2012 at 04:16:52PM +0100, Stefan Hajnoczi wrote: > The virtio-blk-data-plane feature is easy to integrate into > hw/virtio-blk.c. The data plane can be started and stopped similar to > vhost-net. > > Users can take advantage of the virtio-blk-data-plane feature using the > new -device virtio-blk-pci,x-data-plane=on property. > > The x-data-plane name was chosen because at this stage the feature is > experimental and likely to see changes in the future. > > If the VM configuration does not support virtio-blk-data-plane an error > message is printed. Although we could fall back to regular virtio-blk, > I prefer the explicit approach since it prompts the user to fix their > configuration if they want the performance benefit of > virtio-blk-data-plane. Not only that, this affects features exposed to guest so it really can't be trasparent. Which reminds me - shouldn't some features be turned off? For example, VIRTIO_BLK_F_SCSI? > Limitations: > * Only format=raw is supported > * Live migration is not supported This is probably fixable long term? > * Block jobs, hot unplug, and other operations fail with -EBUSY Hmm I don't see code to disable PCU unplug in this patch. I expected no_hotplug to be set. Where is it? > * I/O throttling limits are ignored And this? Meanwhile can we have attempts to set them fail? > * Only Linux hosts are supported due to Linux AIO usage > > Signed-off-by: Stefan Hajnoczi > --- > hw/virtio-blk.c | 59 > - > hw/virtio-blk.h | 1 + > hw/virtio-pci.c | 3 +++ > 3 files changed, 62 insertions(+), 1 deletion(-) > > diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c > index e25cc96..7f6004e 100644 > --- a/hw/virtio-blk.c > +++ b/hw/virtio-blk.c > @@ -17,6 +17,8 @@ > #include "hw/block-common.h" > #include "blockdev.h" > #include "virtio-blk.h" > +#include "hw/dataplane/virtio-blk.h" > +#include "migration.h" > #include "scsi-defs.h" > #ifdef __linux__ > # include > @@ -33,6 +35,8 @@ typedef struct VirtIOBlock > VirtIOBlkConf *blk; > unsigned short sector_mask; > DeviceState *qdev; > +VirtIOBlockDataPlane *dataplane; > +Error *migration_blocker; Would be nice to move the migration disabling checking supported formats and all the rest of it out to dataplane code. > } VirtIOBlock; > > static VirtIOBlock *to_virtio_blk(VirtIODevice *vdev) > @@ -407,6 +411,14 @@ static void virtio_blk_handle_output(VirtIODevice *vdev, > VirtQueue *vq) > .num_writes = 0, > }; > > +/* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start > + * dataplane here instead of waiting for .set_status(). > + */ > +if (s->dataplane) { > +virtio_blk_data_plane_start(s->dataplane); > +return; > +} > + > while ((req = virtio_blk_get_request(s))) { > virtio_blk_handle_request(req, &mrb); > } > @@ -446,8 +458,13 @@ static void virtio_blk_dma_restart_cb(void *opaque, int > running, > { > VirtIOBlock *s = opaque; > > -if (!running) > +if (!running) { > +/* qemu_drain_all() doesn't know about data plane, quiesce here */ > +if (s->dataplane) { > +virtio_blk_data_plane_drain(s->dataplane); > +} > return; > +} > > if (!s->bh) { > s->bh = qemu_bh_new(virtio_blk_dma_restart_bh, s); > @@ -538,6 +555,10 @@ static void virtio_blk_set_status(VirtIODevice *vdev, > uint8_t status) > VirtIOBlock *s = to_virtio_blk(vdev); > uint32_t features; > > +if (s->dataplane && !(status & VIRTIO_CONFIG_S_DRIVER)) { > +virtio_blk_data_plane_stop(s->dataplane); > +} > + > if (!(status & VIRTIO_CONFIG_S_DRIVER_OK)) { > return; > } > @@ -604,6 +625,7 @@ VirtIODevice *virtio_blk_init(DeviceState *dev, > VirtIOBlkConf *blk) > { > VirtIOBlock *s; > static int virtio_blk_id; > +int fd = -1; > > if (!blk->conf.bs) { > error_report("drive property not set"); > @@ -619,6 +641,21 @@ VirtIODevice *virtio_blk_init(DeviceState *dev, > VirtIOBlkConf *blk) > return NULL; > } > > +if (blk->data_plane) { > +if (blk->scsi) { > +error_report("device is incompatible with x-data-plane, " > + "use scsi=off"); > +return NULL; > +} > + > +fd = raw_get_aio_fd(blk->conf.bs); > +if (fd < 0) { > +error_report("drive is incompatible with x-data-plane, " > + "use format=raw,cache=none,aio=native"); > +return NULL; > +} > +} > + > s = (VirtIOBlock *)virtio_common_init("virtio-blk", VIRTIO_ID_BLOCK, >sizeof(struct virtio_blk_config), >sizeof(VirtIOBlock)); > @@ -636,6 +673,17 @@ VirtIODevice *virtio_blk_init(DeviceState *dev, > VirtIOBlkConf *blk) > > s->vq = vi
[Qemu-devel] [PULL for-1.3 0/1] QMP queue
This is an important fix as it fixes a 32-bit breakage. The changes (since e9bff10f8db94912b1b0e6e2e3394cae02faf614) are available in the following repository: git://repo.or.cz/qemu/qmp-unstable.git queue/qmp Bruce Rogers (1): qapi: fix qapi_dealloc_type_size parameter type qapi/qapi-dealloc-visitor.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- 1.8.0
[Qemu-devel] [PULL] qapi: fix qapi_dealloc_type_size parameter type
From: Bruce Rogers The second parameter to qapi_dealloc_type_size should be a uint64_t *, not a size_t *. This was causing our 32 bit x86 build to fail, since warnings are treated as errors. Signed-off-by: Bruce Rogers Reviewed-by: Michael Roth Reviewed-by: Stefan Weil Signed-off-by: Luiz Capitulino --- qapi/qapi-dealloc-visitor.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/qapi/qapi-dealloc-visitor.c b/qapi/qapi-dealloc-visitor.c index a07b171..75214e7 100644 --- a/qapi/qapi-dealloc-visitor.c +++ b/qapi/qapi-dealloc-visitor.c @@ -132,7 +132,7 @@ static void qapi_dealloc_type_number(Visitor *v, double *obj, const char *name, { } -static void qapi_dealloc_type_size(Visitor *v, size_t *obj, const char *name, +static void qapi_dealloc_type_size(Visitor *v, uint64_t *obj, const char *name, Error **errp) { } -- 1.8.0
[Qemu-devel] [PATCH 2/3] usb: fail usbdevice_create() when there is no USB bus
From: Stefan Hajnoczi Report an error instead of segfaulting when attaching a USB device to a machine with no USB busses: $ qemu-system-arm -machine vexpress-a9 \ -sd Fedora-17-armhfp-vexpress-mmcblk0.img \ -kernel vmlinuz-3.4.2-3.fc17.armv7hl \ -initrd initramfs-3.4.2-3.fc17.armv7hl.img \ -usbdevice disk:format=raw:test.img Note that the vexpress-a9 machine does not have a USB host controller. Reported-by: David Abdurachmanov Signed-off-by: Stefan Hajnoczi Signed-off-by: Gerd Hoffmann --- hw/usb/bus.c |7 +++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/hw/usb/bus.c b/hw/usb/bus.c index 99aac7a..55d0edd 100644 --- a/hw/usb/bus.c +++ b/hw/usb/bus.c @@ -590,6 +590,13 @@ USBDevice *usbdevice_create(const char *cmdline) return NULL; } +if (!bus) { +error_report("Error: no usb bus to attach usbdevice %s, " + "please try -machine usb=on and check that " + "the machine model supports USB", driver); +return NULL; +} + if (!f->usbdevice_init) { if (*params) { error_report("usbdevice %s accepts no params", driver); -- 1.7.1
[Qemu-devel] [PATCH 1/3] usb: tag usb host adapters as not hotpluggable.
Hotplugging them simply doesn't work, so tag them accordingly to avoid users trying and then crashing qemu. For xhci there is nothing fundamental which prevents hotplug from working, we'll "only" need a exit() function which cleans up everything properly. That isn't for 1.3 though. For ehci+uhci+ohci hotplug can't be supported until qemu gains the capability to hotplug multifunction pci devices. https://bugzilla.redhat.com/show_bug.cgi?id=879096 Signed-off-by: Gerd Hoffmann --- hw/usb/hcd-ehci-pci.c |1 + hw/usb/hcd-ohci.c |1 + hw/usb/hcd-uhci.c |1 + hw/usb/hcd-xhci.c |1 + 4 files changed, 4 insertions(+), 0 deletions(-) diff --git a/hw/usb/hcd-ehci-pci.c b/hw/usb/hcd-ehci-pci.c index 5887eab..41dbb53 100644 --- a/hw/usb/hcd-ehci-pci.c +++ b/hw/usb/hcd-ehci-pci.c @@ -123,6 +123,7 @@ static void ehci_class_init(ObjectClass *klass, void *data) k->revision = i->revision; k->class_id = PCI_CLASS_SERIAL_USB; k->config_write = usb_ehci_pci_write_config; +k->no_hotplug = 1; dc->vmsd = &vmstate_ehci_pci; dc->props = ehci_pci_properties; } diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c index 64de906..e16a2ec 100644 --- a/hw/usb/hcd-ohci.c +++ b/hw/usb/hcd-ohci.c @@ -1882,6 +1882,7 @@ static void ohci_pci_class_init(ObjectClass *klass, void *data) k->vendor_id = PCI_VENDOR_ID_APPLE; k->device_id = PCI_DEVICE_ID_APPLE_IPID_USB; k->class_id = PCI_CLASS_SERIAL_USB; +k->no_hotplug = 1; dc->desc = "Apple USB Controller"; dc->props = ohci_pci_properties; } diff --git a/hw/usb/hcd-uhci.c b/hw/usb/hcd-uhci.c index 8e47803..d053791 100644 --- a/hw/usb/hcd-uhci.c +++ b/hw/usb/hcd-uhci.c @@ -1327,6 +1327,7 @@ static void uhci_class_init(ObjectClass *klass, void *data) k->device_id = info->device_id; k->revision = info->revision; k->class_id = PCI_CLASS_SERIAL_USB; +k->no_hotplug = 1; dc->vmsd = &vmstate_uhci; dc->props = uhci_properties; u->info = *info; diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c index 8ef4b07..efb509e 100644 --- a/hw/usb/hcd-xhci.c +++ b/hw/usb/hcd-xhci.c @@ -3167,6 +3167,7 @@ static void xhci_class_init(ObjectClass *klass, void *data) k->class_id = PCI_CLASS_SERIAL_USB; k->revision = 0x03; k->is_express = 1; +k->no_hotplug = 1; } static TypeInfo xhci_info = { -- 1.7.1
[Qemu-devel] [PATCH 1/1] qxl: reload memslots after migration, when qxl is in UNDEFINED mode
From: Yonit Halperin The devram memslot stays active when qxl enters UNDEFINED mode (i.e, no primary surface). If migration has occurred while the device is in UNDEFINED stae, the memslots have to be reloaded at the destination. Fixes rhbz#874574 Signed-off-by: Yonit Halperin Signed-off-by: Gerd Hoffmann --- hw/qxl.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/hw/qxl.c b/hw/qxl.c index 1bc2d32..96887c4 100644 --- a/hw/qxl.c +++ b/hw/qxl.c @@ -2146,6 +2146,7 @@ static int qxl_post_load(void *opaque, int version) switch (newmode) { case QXL_MODE_UNDEFINED: +qxl_create_memslots(d); break; case QXL_MODE_VGA: qxl_create_memslots(d); -- 1.7.1
[Qemu-devel] [PULL for-1.3 0/1] spice patch queue.
Hi, Pretty short this time, with a single lonely bugfix. please pull, Gerd The following changes since commit e9bff10f8db94912b1b0e6e2e3394cae02faf614: event notifier: Fix setup for win32 (2012-11-28 13:33:01 -0600) are available in the git repository at: git://anongit.freedesktop.org/spice/qemu spice.v65 Yonit Halperin (1): qxl: reload memslots after migration, when qxl is in UNDEFINED mode hw/qxl.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-)
Re: [Qemu-devel] [PATCH 0/2] QEMU/xen: simplify cpu_ioreq_pio and cpu_ioreq_move
On Thu, 29 Nov 2012, Xu, Dongxiao wrote: > > -Original Message- > > From: Stefano Stabellini [mailto:stefano.stabell...@eu.citrix.com] > > Sent: Tuesday, September 18, 2012 6:24 PM > > To: Xu, Dongxiao > > Cc: Stefano Stabellini; xen-de...@lists.xensource.com; Ian Jackson; > > qemu-devel@nongnu.org; Keir (Xen.org) > > Subject: RE: [PATCH 0/2] QEMU/xen: simplify cpu_ioreq_pio and > > cpu_ioreq_move > > > > On Tue, 18 Sep 2012, Xu, Dongxiao wrote: > > > Hi Stefano, > > > > > > Is these patches merged with Xen 4.2? I didn't see them in the upstream. > > > The uint/int fix is critical to fix the nested guest boot issue. > > > > They are not. Ian decided that he wanted to merge a different version of > > them. > > Hi Stefano and Ian, > > What's the status of the two patches? I still didn't see them merged... Ian, do you have any updates? > Also I think 4.2.1 need these patches to enable the basic Xen on Xen nested > virtualization usage scenario. I agree.
Re: [Qemu-devel] [PATCH] qapi: fix qapi_dealloc_type_size parameter type
On Tue, Nov 27, 2012 at 01:11:25PM -0700, Bruce Rogers wrote: > The second parameter to qapi_dealloc_type_size should be a uint64_t *, > not a size_t *. This was causing our 32 bit x86 build to fail, since > warnings are treated as errors. > > Signed-off-by: Bruce Rogers > --- > qapi/qapi-dealloc-visitor.c |2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) Sorry for the broken build. Reviewed-by: Stefan Hajnoczi
Re: [Qemu-devel] [PATCH v4 10/11] dataplane: add virtio-blk data plane code
On Thu, Nov 22, 2012 at 04:16:51PM +0100, Stefan Hajnoczi wrote: > virtio-blk-data-plane is a subset implementation of virtio-blk. It only > handles read, write, and flush requests. It does this using a dedicated > thread that executes an epoll(2)-based event loop and processes I/O > using Linux AIO. > > This approach performs very well but can be used for raw image files > only. The number of IOPS achieved has been reported to be several times > higher than the existing virtio-blk implementation. > > Eventually it should be possible to unify virtio-blk-data-plane with the > main body of QEMU code once the block layer and hardware emulation is > able to run outside the global mutex. > > Signed-off-by: Stefan Hajnoczi > --- > hw/dataplane/Makefile.objs | 2 +- > hw/dataplane/virtio-blk.c | 427 > + > hw/dataplane/virtio-blk.h | 41 + > trace-events | 6 + > 4 files changed, 475 insertions(+), 1 deletion(-) > create mode 100644 hw/dataplane/virtio-blk.c > create mode 100644 hw/dataplane/virtio-blk.h > > diff --git a/hw/dataplane/Makefile.objs b/hw/dataplane/Makefile.objs > index abd408f..682aa9e 100644 > --- a/hw/dataplane/Makefile.objs > +++ b/hw/dataplane/Makefile.objs > @@ -1,3 +1,3 @@ > ifeq ($(CONFIG_VIRTIO), y) > -common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o vring.o event-poll.o > ioq.o > +common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o vring.o event-poll.o > ioq.o virtio-blk.o > endif > diff --git a/hw/dataplane/virtio-blk.c b/hw/dataplane/virtio-blk.c > new file mode 100644 > index 000..9b29969 > --- /dev/null > +++ b/hw/dataplane/virtio-blk.c > @@ -0,0 +1,427 @@ > +/* > + * Dedicated thread for virtio-blk I/O processing > + * > + * Copyright 2012 IBM, Corp. > + * Copyright 2012 Red Hat, Inc. and/or its affiliates > + * > + * Authors: > + * Stefan Hajnoczi > + * > + * This work is licensed under the terms of the GNU GPL, version 2 or later. > + * See the COPYING file in the top-level directory. > + * > + */ > + > +#include "trace.h" > +#include "iov.h" > +#include "event-poll.h" > +#include "qemu-thread.h" > +#include "vring.h" > +#include "ioq.h" > +#include "hw/virtio-blk.h" > +#include "hw/dataplane/virtio-blk.h" > + > +enum { > +SEG_MAX = 126, /* maximum number of I/O segments */ > +VRING_MAX = SEG_MAX + 2,/* maximum number of vring descriptors */ > +REQ_MAX = VRING_MAX,/* maximum number of requests in the > vring, > + * is VRING_MAX / 2 with traditional and > + * VRING_MAX with indirect descriptors */ > +}; > + > +typedef struct { > +struct iocb iocb; /* Linux AIO control block */ > +QEMUIOVector *inhdr;/* iovecs for virtio_blk_inhdr */ > +unsigned int head; /* vring descriptor index */ > +} VirtIOBlockRequest; > + > +struct VirtIOBlockDataPlane { > +bool started; > +QEMUBH *start_bh; > +QemuThread thread; > + > +int fd; /* image file descriptor */ > + > +VirtIODevice *vdev; > +Vring vring;/* virtqueue vring */ > +EventNotifier *guest_notifier; /* irq */ > + > +EventPoll event_poll; /* event poller */ > +EventHandler io_handler;/* Linux AIO completion handler */ > +EventHandler notify_handler;/* virtqueue notify handler */ > + > +IOQueue ioqueue;/* Linux AIO queue (should really be per > + dataplane thread) */ > +VirtIOBlockRequest requests[REQ_MAX]; /* pool of requests, managed by the > + queue */ > + > +unsigned int num_reqs; > +QemuMutex num_reqs_lock; > +QemuCond no_reqs_cond; > +}; > + > +/* Raise an interrupt to signal guest, if necessary */ > +static void notify_guest(VirtIOBlockDataPlane *s) > +{ > +if (!vring_should_notify(s->vdev, &s->vring)) { > +return; > +} > + > +event_notifier_set(s->guest_notifier); > +} > + > +static void complete_request(struct iocb *iocb, ssize_t ret, void *opaque) > +{ > +VirtIOBlockDataPlane *s = opaque; > +VirtIOBlockRequest *req = container_of(iocb, VirtIOBlockRequest, iocb); > +struct virtio_blk_inhdr hdr; > +int len; > + > +if (likely(ret >= 0)) { > +hdr.status = VIRTIO_BLK_S_OK; > +len = ret; > +} else { > +hdr.status = VIRTIO_BLK_S_IOERR; > +len = 0; > +} > + > +trace_virtio_blk_data_plane_complete_request(s, req->head, ret); > + > +qemu_iovec_from_buf(req->inhdr, 0, &hdr, sizeof(hdr)); > +qemu_iovec_destroy(req->inhdr); > +g_slice_free(QEMUIOVector, req->inhdr); > + > +/* According to the virtio specification len should be the number of > bytes > + * written to, but for virtio-blk it seems to be the number of bytes > + * transferred plus the sta
[Qemu-devel] [PATCH 3/3] ehci-sysbus: Attach DMA context.
From: Peter Crosthwaite This was left as NULL on the initial merge due to debate on the mailing list on how to handle DMA contexts for sysbus devices. Patch 9e11908f12f92e31ea94dc2a4c962c836cba9f2a was later merged to fix OHCI. This is the, equivalent fix for sysbus EHCI. Signed-off-by: Peter Crosthwaite Signed-off-by: Gerd Hoffmann --- hw/usb/hcd-ehci-sysbus.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/hw/usb/hcd-ehci-sysbus.c b/hw/usb/hcd-ehci-sysbus.c index 1584079..803df92 100644 --- a/hw/usb/hcd-ehci-sysbus.c +++ b/hw/usb/hcd-ehci-sysbus.c @@ -45,6 +45,7 @@ static int usb_ehci_sysbus_initfn(SysBusDevice *dev) s->capsbase = 0x100; s->opregbase = 0x140; +s->dma = &dma_context_memory; usb_ehci_initfn(s, DEVICE(dev)); sysbus_init_irq(dev, &s->irq); -- 1.7.1
Re: [Qemu-devel] [PATCH v4 04/11] dataplane: add virtqueue vring code
On Thu, Nov 22, 2012 at 04:16:45PM +0100, Stefan Hajnoczi wrote: > The virtio-blk-data-plane cannot access memory using the usual QEMU > functions since it executes outside the global mutex and the memory APIs > are this time are not thread-safe. > > This patch introduces a virtqueue module based on the kernel's vhost > vring code. The trick is that we map guest memory ahead of time and > access it cheaply outside the global mutex. > > Once the hardware emulation code can execute outside the global mutex it > will be possible to drop this code. > > Signed-off-by: Stefan Hajnoczi > --- > hw/Makefile.objs | 2 +- > hw/dataplane/Makefile.objs | 2 +- > hw/dataplane/vring.c | 344 > + > hw/dataplane/vring.h | 62 > trace-events | 3 + > 5 files changed, 411 insertions(+), 2 deletions(-) > create mode 100644 hw/dataplane/vring.c > create mode 100644 hw/dataplane/vring.h > > diff --git a/hw/Makefile.objs b/hw/Makefile.objs > index ea46f81..db87fbf 100644 > --- a/hw/Makefile.objs > +++ b/hw/Makefile.objs > @@ -1,4 +1,4 @@ > -common-obj-y = usb/ ide/ > +common-obj-y = usb/ ide/ dataplane/ > common-obj-y += loader.o > common-obj-$(CONFIG_VIRTIO) += virtio-console.o > common-obj-$(CONFIG_VIRTIO) += virtio-rng.o > diff --git a/hw/dataplane/Makefile.objs b/hw/dataplane/Makefile.objs > index 8c8dea1..34e6d57 100644 > --- a/hw/dataplane/Makefile.objs > +++ b/hw/dataplane/Makefile.objs > @@ -1,3 +1,3 @@ > ifeq ($(CONFIG_VIRTIO), y) > -common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o > +common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o vring.o > endif > diff --git a/hw/dataplane/vring.c b/hw/dataplane/vring.c > new file mode 100644 > index 000..2632fbd > --- /dev/null > +++ b/hw/dataplane/vring.c > @@ -0,0 +1,344 @@ > +/* Copyright 2012 Red Hat, Inc. > + * Copyright IBM, Corp. 2012 > + * > + * Based on Linux vhost code: > + * Copyright (C) 2009 Red Hat, Inc. > + * Copyright (C) 2006 Rusty Russell IBM Corporation > + * > + * Author: Michael S. Tsirkin > + * Stefan Hajnoczi > + * > + * Inspiration, some code, and most witty comments come from > + * Documentation/virtual/lguest/lguest.c, by Rusty Russell > + * > + * This work is licensed under the terms of the GNU GPL, version 2. > + */ > + > +#include "trace.h" > +#include "hw/dataplane/vring.h" > + > +/* Map the guest's vring to host memory */ > +bool vring_setup(Vring *vring, VirtIODevice *vdev, int n) > +{ > +hwaddr vring_addr = virtio_queue_get_ring_addr(vdev, n); > +hwaddr vring_size = virtio_queue_get_ring_size(vdev, n); > +void *vring_ptr; > + > +vring->broken = false; > + > +hostmem_init(&vring->hostmem); > +vring_ptr = hostmem_lookup(&vring->hostmem, vring_addr, vring_size, > true); > +if (!vring_ptr) { > +error_report("Failed to map vring " > + "addr %#" HWADDR_PRIx " size %" HWADDR_PRIu, > + vring_addr, vring_size); > +vring->broken = true; > +return false; > +} > + > +vring_init(&vring->vr, virtio_queue_get_num(vdev, n), vring_ptr, 4096); > + > +vring->last_avail_idx = 0; > +vring->last_used_idx = 0; > +vring->signalled_used = 0; > +vring->signalled_used_valid = false; > + > +trace_vring_setup(virtio_queue_get_ring_addr(vdev, n), > + vring->vr.desc, vring->vr.avail, vring->vr.used); > +return true; > +} > + > +void vring_teardown(Vring *vring) > +{ > +hostmem_finalize(&vring->hostmem); > +} > + > +/* Toggle guest->host notifies */ > +void vring_set_notification(VirtIODevice *vdev, Vring *vring, bool enable) > +{ > +if (vdev->guest_features & (1 << VIRTIO_RING_F_EVENT_IDX)) { > +if (enable) { > +vring_avail_event(&vring->vr) = vring->vr.avail->idx; > +} > +} else if (enable) { > +vring->vr.used->flags &= ~VRING_USED_F_NO_NOTIFY; > +} else { > +vring->vr.used->flags |= VRING_USED_F_NO_NOTIFY; > +} > +} > + > +/* This is stolen from linux/drivers/vhost/vhost.c:vhost_notify() */ > +bool vring_should_notify(VirtIODevice *vdev, Vring *vring) > +{ > +uint16_t old, new; > +bool v; > +/* Flush out used index updates. This is paired > + * with the barrier that the Guest executes when enabling > + * interrupts. */ > +smp_mb(); > + > +if ((vdev->guest_features & VIRTIO_F_NOTIFY_ON_EMPTY) && > +unlikely(vring->vr.avail->idx == vring->last_avail_idx)) { > +return true; > +} > + > +if (!(vdev->guest_features & VIRTIO_RING_F_EVENT_IDX)) { > +return !(vring->vr.avail->flags & VRING_AVAIL_F_NO_INTERRUPT); > +} > +old = vring->signalled_used; > +v = vring->signalled_used_valid; > +new = vring->signalled_used = vring->last_used_idx; > +vring->signalled_used_valid = true; > + > +if (unlikely(!v)) { > +return true; > +} > + > +return vring_need
[Qemu-devel] [PULL for-1.3 0/3] usb patch queue
Hi, This is the usb patch queue, carrying three little fixes for 1.3. please pull, Gerd The following changes since commit e9bff10f8db94912b1b0e6e2e3394cae02faf614: event notifier: Fix setup for win32 (2012-11-28 13:33:01 -0600) are available in the git repository at: git://git.kraxel.org/qemu usb.73 Gerd Hoffmann (1): usb: tag usb host adapters as not hotpluggable. Peter Crosthwaite (1): ehci-sysbus: Attach DMA context. Stefan Hajnoczi (1): usb: fail usbdevice_create() when there is no USB bus hw/usb/bus.c |7 +++ hw/usb/hcd-ehci-pci.c|1 + hw/usb/hcd-ehci-sysbus.c |1 + hw/usb/hcd-ohci.c|1 + hw/usb/hcd-uhci.c|1 + hw/usb/hcd-xhci.c|1 + 6 files changed, 12 insertions(+), 0 deletions(-)
Re: [Qemu-devel] [RFC PATCH v2 1/3] virtio-bus : Introduce VirtioBus.
On 29/11/2012 14:09, Peter Maydell wrote: On 29 November 2012 12:37, Konrad Frederic wrote: On 26/11/2012 17:59, Anthony Liguori wrote: virtio-pci-bus extends virtio-bus - is constructed with a pointer to a PCIDevice - implements the methods necessary to be a virtio bus I still have trouble with that ^^. The problem is that the virtio devices can't be connected to the virtio-pci-bus even if it extends virtio-bus because TYPE_VIRTIO_BUS != TYPE_VIRTIO_PCI_BUS. Conceptually it ought to work I think: if the bus is-a TYPE_VIRTIO_BUS then we should permit plugging in even if the actual bus object happens to be an instance of a subclass. Yes, is my implementation doing the right thing ? I mean is the virtio-pci-bus a virtio-bus ? I suspect that qbus_find_recursive should be doing an object_class_dynamic_cast() to check that the bus is of a suitable type, rather than the (strcmp(object_get_typename(OBJECT(bus)), bus_typename) != 0) which it does at the moment. Yes, but we can cast VIRTIO_BUS in BUS no ? So in this case we could plug VirtioDevice in BUS and that's not what we want ? -- PMM
[Qemu-devel] [PATCH] qdev: relax bus type check in qdev_device_add()
We are currently checking for an exact type match. Use QOM dynamic_cast to check for a compatible type instead. Cc: Konrad Frederic Signed-off-by: Anthony Liguori --- hw/qdev-monitor.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/hw/qdev-monitor.c b/hw/qdev-monitor.c index 479eecd..69f5ff2 100644 --- a/hw/qdev-monitor.c +++ b/hw/qdev-monitor.c @@ -431,11 +431,16 @@ DeviceState *qdev_device_add(QemuOpts *opts) /* find bus */ path = qemu_opt_get(opts, "bus"); if (path != NULL) { +ObjectClass *bus_class; + bus = qbus_find(path); if (!bus) { return NULL; } -if (strcmp(object_get_typename(OBJECT(bus)), k->bus_type) != 0) { + +bus_class = OBJECT_CLASS(BUS_GET_CLASS(bus)); + +if (!object_class_dynamic_cast(bus_class, k->bus_type)) { qerror_report(QERR_BAD_BUS_FOR_DEVICE, driver, object_get_typename(OBJECT(bus))); return NULL; -- 1.8.0
Re: [Qemu-devel] [PATCH v4 03/11] dataplane: add host memory mapping code
On Thu, Nov 22, 2012 at 04:16:44PM +0100, Stefan Hajnoczi wrote: > The data plane thread needs to map guest physical addresses to host > pointers. Normally this is done with cpu_physical_memory_map() but the > function assumes the global mutex is held. The data plane thread does > not touch the global mutex and therefore needs a thread-safe memory > mapping mechanism. > > Hostmem registers a MemoryListener similar to how vhost collects and > pushes memory region information into the kernel. There is a > fine-grained lock on the regions list which is held during lookup and > when installing a new regions list. > > When the physical memory map changes the MemoryListener callbacks are > invoked. They build up a new list of memory regions which is finally > installed when the list has been completed. > > Note that this approach is not safe across memory hotplug because mapped > pointers may still be in used across memory unplug. However, this is > currently a problem for QEMU in general and needs to be addressed in the > future. > > Signed-off-by: Stefan Hajnoczi Worth bothering with binary search? vhost does a linear search over regions because the number of ram regions is very small. > --- > hw/dataplane/Makefile.objs | 3 + > hw/dataplane/hostmem.c | 165 > + > hw/dataplane/hostmem.h | 52 ++ > 3 files changed, 220 insertions(+) > create mode 100644 hw/dataplane/Makefile.objs > create mode 100644 hw/dataplane/hostmem.c > create mode 100644 hw/dataplane/hostmem.h > > diff --git a/hw/dataplane/Makefile.objs b/hw/dataplane/Makefile.objs > new file mode 100644 > index 000..8c8dea1 > --- /dev/null > +++ b/hw/dataplane/Makefile.objs > @@ -0,0 +1,3 @@ > +ifeq ($(CONFIG_VIRTIO), y) > +common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o > +endif > diff --git a/hw/dataplane/hostmem.c b/hw/dataplane/hostmem.c > new file mode 100644 > index 000..48aabf0 > --- /dev/null > +++ b/hw/dataplane/hostmem.c > @@ -0,0 +1,165 @@ > +/* > + * Thread-safe guest to host memory mapping > + * > + * Copyright 2012 Red Hat, Inc. and/or its affiliates > + * > + * Authors: > + * Stefan Hajnoczi > + * > + * This work is licensed under the terms of the GNU GPL, version 2 or later. > + * See the COPYING file in the top-level directory. > + * > + */ > + > +#include "exec-memory.h" > +#include "hostmem.h" > + > +static int hostmem_lookup_cmp(const void *phys_, const void *region_) > +{ > +hwaddr phys = *(const hwaddr *)phys_; > +const HostmemRegion *region = region_; > + > +if (phys < region->guest_addr) { > +return -1; > +} else if (phys >= region->guest_addr + region->size) { > +return 1; > +} else { > +return 0; > +} > +} > + > +/** > + * Map guest physical address to host pointer > + */ > +void *hostmem_lookup(Hostmem *hostmem, hwaddr phys, hwaddr len, bool > is_write) > +{ > +HostmemRegion *region; > +void *host_addr = NULL; > +hwaddr offset_within_region; > + > +qemu_mutex_lock(&hostmem->current_regions_lock); > +region = bsearch(&phys, hostmem->current_regions, > + hostmem->num_current_regions, > + sizeof(hostmem->current_regions[0]), > + hostmem_lookup_cmp); > +if (!region) { > +goto out; > +} > +if (is_write && region->readonly) { > +goto out; > +} > +offset_within_region = phys - region->guest_addr; > +if (offset_within_region + len <= region->size) { > +host_addr = region->host_addr + offset_within_region; > +} > +out: > +qemu_mutex_unlock(&hostmem->current_regions_lock); > + > +return host_addr; > +} > + > +/** > + * Install new regions list > + */ > +static void hostmem_listener_commit(MemoryListener *listener) > +{ > +Hostmem *hostmem = container_of(listener, Hostmem, listener); > + > +qemu_mutex_lock(&hostmem->current_regions_lock); > +g_free(hostmem->current_regions); > +hostmem->current_regions = hostmem->new_regions; > +hostmem->num_current_regions = hostmem->num_new_regions; > +qemu_mutex_unlock(&hostmem->current_regions_lock); > + > +/* Reset new regions list */ > +hostmem->new_regions = NULL; > +hostmem->num_new_regions = 0; > +} > + > +/** > + * Add a MemoryRegionSection to the new regions list > + */ > +static void hostmem_append_new_region(Hostmem *hostmem, > + MemoryRegionSection *section) > +{ > +void *ram_ptr = memory_region_get_ram_ptr(section->mr); > +size_t num = hostmem->num_new_regions; > +size_t new_size = (num + 1) * sizeof(hostmem->new_regions[0]); > + > +hostmem->new_regions = g_realloc(hostmem->new_regions, new_size); > +hostmem->new_regions[num] = (HostmemRegion){ > +.host_addr = ram_ptr + section->offset_within_region, > +.guest_addr = section->offset_within_address_space, > +.size = section->size, > +
Re: [Qemu-devel] [RFC PATCH v2 1/3] virtio-bus : Introduce VirtioBus.
Konrad Frederic writes: > On 26/11/2012 17:59, Anthony Liguori wrote: >> Peter Maydell writes: >> >>> On 26 November 2012 14:33, Anthony Liguori wrote: VirtioBusInfo is not a great name. This is a proxy class that allows for a device to implement the virtio bus interface. This could be done as an interface but since nothing else uses interfaces, I'm okay with something like this. But the first argument ought to be an opaque for all methods. >>> We have at least one user of Interface in the tree IIRC. >>> I'd much rather we did this the right way -- the only reason >>> it's the way Fred has coded it is that there's no obvious >>> body of code in the tree to copy, so we're thrashing around >>> a bit. If you tell us what the correct set of structs/classes/ >>> interfaces/etc is then we can implement it :-) >> I really think extending virtio-bus to a virtio-pci-bus and then >> initializing it with a link to the PCI device is the best approach. >> >> It's by far the simpliest approach in terms of coding. >> >> Did I explain it adequately? To recap: >> >> virtio-bus extends bus-state >> - implements everything that VirtIOBindings implements as methods >> >> virtio-pci-bus extends virtio-bus >> - is constructed with a pointer to a PCIDevice >> - implements the methods necessary to be a virtio bus > I still have trouble with that ^^. > > virtio-pci-bus extends virtio-bus so I put something like that : > > static void virtio_pci_bus_class_init(ObjectClass *klass, void *data) > { > BusClass *bc = BUS_CLASS(klass); > VirtioBusClass *k = VIRTIO_BUS_CLASS(klass); > /* VirtIOBindings */ > k->notify = virtio_pci_notify; > k->save_config = virtio_pci_save_config; > k->load_config = virtio_pci_load_config; > k->save_queue = virtio_pci_save_queue; > k->load_queue = virtio_pci_load_queue; > k->get_features = virtio_pci_get_features; > k->query_guest_notifiers = virtio_pci_query_guest_notifiers; > k->set_host_notifier = virtio_pci_set_host_notifier; > k->set_guest_notifiers = virtio_pci_set_guest_notifiers; > k->vmstate_change = virtio_pci_vmstate_change; > /* > * TODO : Init and exit function. > * void (*init)(void *opaque); > * void (*exit)(void *opaque); > */ > } > > static TypeInfo virtio_pci_bus_info = { > .name = TYPE_VIRTIO_PCI_BUS, > .parent= TYPE_VIRTIO_BUS, > .class_init= virtio_pci_bus_class_init, > }; > > and I have VirtioDevice which extends DeviceState like that : > > static void virtio_device_class_init(ObjectClass *klass, void *data) > { > /* Set the default value here. */ > DeviceClass *dc = DEVICE_CLASS(klass); > dc->bus_type = TYPE_VIRTIO_BUS; > dc->init = virtio_device_init; > } > > static TypeInfo virtio_device_info = { > .name = TYPE_VIRTIO_DEVICE, > .parent = TYPE_DEVICE, > .instance_size = sizeof(VirtioDeviceState), > /* Abstract the virtio-device */ > .class_init = virtio_device_class_init, > .abstract = true, > .class_size = sizeof(VirtioDeviceClass), > }; > > The problem is that the virtio devices can't be connected to the > virtio-pci-bus even if it extends virtio-bus because TYPE_VIRTIO_BUS != > TYPE_VIRTIO_PCI_BUS. That's just a bug. See the patch I just sent out to fix it. The type check is too strict in qdev_device_add(). Regards, Anthony Liguori > > Did I miss something ? > > Thanks, > > Fred > > >> virtio-device extends device-state >> - implements methods used by virtio-bus >> >> Regards, >> >> Anthony Liguori >> >>> -- PMM
Re: [Qemu-devel] [RFC PATCH v2 1/3] virtio-bus : Introduce VirtioBus.
On 29 November 2012 13:47, Konrad Frederic wrote: > On 29/11/2012 14:09, Peter Maydell wrote: >> I suspect that qbus_find_recursive should be doing an >> object_class_dynamic_cast() to check that the bus is of a suitable >> type, rather than the >> (strcmp(object_get_typename(OBJECT(bus)), bus_typename) != 0) >> which it does at the moment. > > Yes, but we can cast VIRTIO_BUS in BUS no ? > So in this case we could plug VirtioDevice in BUS and that's not what we > want ? I don't understand what you're asking. qbus_find_recursive() looks for a bus which might be specified either by name or by bus type. At the moment it insists on an exact type match (so if you ask for a virtio-bus you have to provide a virtio-bus object, not a subclass of virtio-bus); it should do a dynamic cast, so anything which is a virtio-bus or a subclass of that will do. Yes, if you say "please find me any bus which is a bus of any kind" then you'll get something wrong back -- so don't do that. In particular, since VirtioDevice sets its bus_type to TYPE_VIRTIO_BUS then we will ask for that, and anything which isn't a virtio bus (or subclass) won't be found. -- PMM
Re: [Qemu-devel] [RFC PATCH v2 1/3] virtio-bus : Introduce VirtioBus.
Am 29.11.2012 14:47, schrieb Konrad Frederic: > On 29/11/2012 14:09, Peter Maydell wrote: >> On 29 November 2012 12:37, Konrad Frederic >> wrote: >>> On 26/11/2012 17:59, Anthony Liguori wrote: virtio-pci-bus extends virtio-bus - is constructed with a pointer to a PCIDevice - implements the methods necessary to be a virtio bus >>> I still have trouble with that ^^. >>> The problem is that the virtio devices can't be connected to the >>> virtio-pci-bus even if it extends virtio-bus because TYPE_VIRTIO_BUS != >>> TYPE_VIRTIO_PCI_BUS. >> Conceptually it ought to work I think: if the bus is-a TYPE_VIRTIO_BUS >> then we should permit plugging in even if the actual bus object happens >> to be an instance of a subclass. > Yes, is my implementation doing the right thing ? > I mean is the virtio-pci-bus a virtio-bus ? In your v3 patchset no. In the inline code yes, via virtio_pci_bus_info's .parent. >> I suspect that qbus_find_recursive should be doing an >> object_class_dynamic_cast() to check that the bus is of a suitable >> type, rather than the >> (strcmp(object_get_typename(OBJECT(bus)), bus_typename) != 0) >> which it does at the moment. > Yes, but we can cast VIRTIO_BUS in BUS no ? > So in this case we could plug VirtioDevice in BUS and that's not what we > want ? You would want to check whether object_class_dynamic_cast(OBJECT(bus), TYPE_VIRTIO_BUS) == NULL. This should evaluate to true if the bus is not a virtio-bus. Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH] qdev: relax bus type check in qdev_device_add()
On 29 November 2012 13:49, Anthony Liguori wrote: > We are currently checking for an exact type match. Use QOM dynamic_cast to > check for a compatible type instead. I think this only catches the case where a bus was explicitly specified via bus=. For the default case you also need to change the similar code for checking the bus type in qbus_find_recursive(), right? -- PMM
Re: [Qemu-devel] [Xen-devel] [PATCH 0/2] QEMU/xen: simplify cpu_ioreq_pio and cpu_ioreq_move
On Thu, 2012-11-29 at 13:21 +, Stefano Stabellini wrote: > > Also I think 4.2.1 need these patches to enable the basic Xen on Xen > > nested virtualization usage scenario. > > I agree. Nested virt was a tech preview in 4.2.0, is it really worth backporting ? Ian.
Re: [Qemu-devel] [PATCH] rbd block driver fix race between aio completition and aio cancel
On Thu, Nov 22, 2012 at 11:00:19AM +0100, Stefan Priebe wrote: > @@ -406,10 +401,11 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb) > acb->ret = r; > } > } > +acb->status = 0; > + I suggest doing this in the BH. The qemu_aio_wait() loop in qemu_rbd_aio_cancel() needs to wait until the BH has executed. By clearing status in the BH we ensure that no matter in which order qemu_aio_wait() invokes BHs and callbacks, we'll always wait until the BH has completed before ending the while loop in qemu_rbd_aio_cancel(). > @@ -737,7 +741,8 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState > *bs, > failed: > g_free(rcb); > s->qemu_aio_count--; > -qemu_aio_release(acb); > +if (!acb->cancelled) > +qemu_aio_release(acb); > return NULL; > } This scenario is impossible. We haven't returned the acb back to the caller yet so they could not have invoked qemu_aio_cancel(). Stefan
Re: [Qemu-devel] [PATCH v4 10/11] dataplane: add virtio-blk data plane code
On Thu, Nov 22, 2012 at 04:16:51PM +0100, Stefan Hajnoczi wrote: > virtio-blk-data-plane is a subset implementation of virtio-blk. It only > handles read, write, and flush requests. It does this using a dedicated > thread that executes an epoll(2)-based event loop and processes I/O > using Linux AIO. > > This approach performs very well but can be used for raw image files > only. The number of IOPS achieved has been reported to be several times > higher than the existing virtio-blk implementation. > > Eventually it should be possible to unify virtio-blk-data-plane with the > main body of QEMU code once the block layer and hardware emulation is > able to run outside the global mutex. > > Signed-off-by: Stefan Hajnoczi > --- > hw/dataplane/Makefile.objs | 2 +- > hw/dataplane/virtio-blk.c | 427 > + > hw/dataplane/virtio-blk.h | 41 + > trace-events | 6 + > 4 files changed, 475 insertions(+), 1 deletion(-) > create mode 100644 hw/dataplane/virtio-blk.c > create mode 100644 hw/dataplane/virtio-blk.h > > diff --git a/hw/dataplane/Makefile.objs b/hw/dataplane/Makefile.objs > index abd408f..682aa9e 100644 > --- a/hw/dataplane/Makefile.objs > +++ b/hw/dataplane/Makefile.objs > @@ -1,3 +1,3 @@ > ifeq ($(CONFIG_VIRTIO), y) > -common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o vring.o event-poll.o > ioq.o > +common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o vring.o event-poll.o > ioq.o virtio-blk.o > endif > diff --git a/hw/dataplane/virtio-blk.c b/hw/dataplane/virtio-blk.c > new file mode 100644 > index 000..9b29969 > --- /dev/null > +++ b/hw/dataplane/virtio-blk.c > @@ -0,0 +1,427 @@ > +/* > + * Dedicated thread for virtio-blk I/O processing > + * > + * Copyright 2012 IBM, Corp. > + * Copyright 2012 Red Hat, Inc. and/or its affiliates > + * > + * Authors: > + * Stefan Hajnoczi > + * > + * This work is licensed under the terms of the GNU GPL, version 2 or later. > + * See the COPYING file in the top-level directory. > + * > + */ > + > +#include "trace.h" > +#include "iov.h" > +#include "event-poll.h" > +#include "qemu-thread.h" > +#include "vring.h" > +#include "ioq.h" > +#include "hw/virtio-blk.h" > +#include "hw/dataplane/virtio-blk.h" > + > +enum { > +SEG_MAX = 126, /* maximum number of I/O segments */ > +VRING_MAX = SEG_MAX + 2,/* maximum number of vring descriptors */ > +REQ_MAX = VRING_MAX,/* maximum number of requests in the > vring, > + * is VRING_MAX / 2 with traditional and > + * VRING_MAX with indirect descriptors */ > +}; > + > +typedef struct { > +struct iocb iocb; /* Linux AIO control block */ > +QEMUIOVector *inhdr;/* iovecs for virtio_blk_inhdr */ > +unsigned int head; /* vring descriptor index */ > +} VirtIOBlockRequest; > + > +struct VirtIOBlockDataPlane { > +bool started; > +QEMUBH *start_bh; > +QemuThread thread; > + > +int fd; /* image file descriptor */ > + > +VirtIODevice *vdev; > +Vring vring;/* virtqueue vring */ > +EventNotifier *guest_notifier; /* irq */ > + > +EventPoll event_poll; /* event poller */ > +EventHandler io_handler;/* Linux AIO completion handler */ > +EventHandler notify_handler;/* virtqueue notify handler */ > + > +IOQueue ioqueue;/* Linux AIO queue (should really be per > + dataplane thread) */ > +VirtIOBlockRequest requests[REQ_MAX]; /* pool of requests, managed by the > + queue */ > + > +unsigned int num_reqs; > +QemuMutex num_reqs_lock; OK the only reason this lock is needed is because you want to drain outside the thread. Won't it be better to queue process the drain request through the thread? You won't need any locks then. > +QemuCond no_reqs_cond; > +}; > + > +/* Raise an interrupt to signal guest, if necessary */ > +static void notify_guest(VirtIOBlockDataPlane *s) > +{ > +if (!vring_should_notify(s->vdev, &s->vring)) { > +return; > +} > + > +event_notifier_set(s->guest_notifier); > +} > + > +static void complete_request(struct iocb *iocb, ssize_t ret, void *opaque) > +{ > +VirtIOBlockDataPlane *s = opaque; > +VirtIOBlockRequest *req = container_of(iocb, VirtIOBlockRequest, iocb); > +struct virtio_blk_inhdr hdr; > +int len; > + > +if (likely(ret >= 0)) { > +hdr.status = VIRTIO_BLK_S_OK; > +len = ret; > +} else { > +hdr.status = VIRTIO_BLK_S_IOERR; > +len = 0; > +} > + > +trace_virtio_blk_data_plane_complete_request(s, req->head, ret); > + > +qemu_iovec_from_buf(req->inhdr, 0, &hdr, sizeof(hdr)); > +qemu_iovec_destroy(req->inhdr); > +g_slice_free(QEMUIOVector, req->inhdr); > +
Re: [Qemu-devel] [PATCH] qdev: relax bus type check in qdev_device_add()
On 29/11/2012 14:56, Peter Maydell wrote: On 29 November 2012 13:49, Anthony Liguori wrote: We are currently checking for an exact type match. Use QOM dynamic_cast to check for a compatible type instead. I think this only catches the case where a bus was explicitly specified via bus=. For the default case you also need to change the similar code for checking the bus type in qbus_find_recursive(), right? -- PMM Right, it's working only with the "bus=" command line.
Re: [Qemu-devel] [PATCH v4 00/11] virtio: virtio-blk data plane
On Thu, Nov 29, 2012 at 10:18:59AM +0100, Stefan Hajnoczi wrote: > Michael, Paolo: Are you happy with v4? Looks pretty clean by itself. I sent some comments but they can be addressed later. What worries me most is the code duplication with regular virtio. I see two ways to reduce the maintainance somewhat - split out ring handling code in virtio-blk to a separate file to make it more obvious which part is inactive when data plane runs. - share ring processing code with virtio/virtio-blk (e.g. use callbacks) Was any thought given to implementing one of these two approaches? -- MST
Re: [Qemu-devel] [PATCH] virtio: limit avail bytes lookahead
On Thu, Nov 29, 2012 at 06:34:46PM +0530, Amit Shah wrote: > On (Wed) 28 Nov 2012 [23:53:08], Michael S. Tsirkin wrote: > > On Tue, Nov 27, 2012 at 06:25:04PM +0200, Michael S. Tsirkin wrote: > > > On Thu, Nov 01, 2012 at 06:07:21PM +0200, Michael S. Tsirkin wrote: > > > > Commit 0d8d7690850eb0cf2b2b60933cf47669a6b6f18f introduced > > > > a regression in virtio-net performance because it looks > > > > into the ring aggressively while we really only care > > > > about a single packet worth of buffers. > > > > To fix, add parameters limiting lookahead, and > > > > use in virtqueue_avail_bytes. > > > > > > > > Signed-off-by: Michael S. Tsirkin > > > > Reported-by: Edivaldo de Araujo Pereira > > > > > > Ping. > > > Anthony - going to apply this? > > > > virtio rng was added since so naturally build broke. > > Here's a patch on top to fix it up. I never used virtio rng before so > > could not test at this hour, but it does fix the build. > > > > I'll take a look at how to test it tomorrow but any > > info would be appreciated. > > Amit could you pls review? > > Looks fine, I assume you will send a v2 of the patch to Anthony? > > Amit Anthony volunteered to test this so there will only be v2 if he sees problems.
Re: [Qemu-devel] [PATCH 1.3] virtio-rng: do not use g_assert_cmpint
On Tue, Nov 27, 2012 at 09:16:24AM +0100, Paolo Bonzini wrote: > g_assert_cmpint is not available on glib 2.12, which is the minimum > version required to build QEMU (we only require 2.16 to run tests, > since that is the first version including GTester). Do not use it > in hardware models, use a normal assertion instead. > > This fixes the buildbot failure for default_x86_64_rhel5. > > Signed-off-by: Paolo Bonzini > --- > hw/virtio-rng.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) Reviewed-by: Stefan Hajnoczi
Re: [Qemu-devel] main-loop.c: About Select handling
On Wed, Nov 28, 2012 at 03:08:24AM +, Furukawa, Eiji wrote: > About a source of qemu-1.2.0/main-loop.c > The select handling of os_host_main_loop_wait function > I do not seem to do Exit by interrupts such as SIGUSR1 > Will not it be necessary to make modifications? > > Before > LineNumber:308 ret = select(nfds + 1, &rfds, &wfds, &xfds, tvarg); > > After(Example) > do { > ret = select(nfds + 1, &rfds, &wfds, &xfds, tvarg); > } while(ret == -1 && (errno == EINTR || errno == EAGAIN)) What is the specific bug or problem you're seeing? QEMU uses several signals internally and sets up per-thread signal masks appropriately. Stefan
Re: [Qemu-devel] [PATCH v4 03/11] dataplane: add host memory mapping code
On Thu, Nov 29, 2012 at 03:54:25PM +0200, Michael S. Tsirkin wrote: > On Thu, Nov 22, 2012 at 04:16:44PM +0100, Stefan Hajnoczi wrote: > > The data plane thread needs to map guest physical addresses to host > > pointers. Normally this is done with cpu_physical_memory_map() but the > > function assumes the global mutex is held. The data plane thread does > > not touch the global mutex and therefore needs a thread-safe memory > > mapping mechanism. > > > > Hostmem registers a MemoryListener similar to how vhost collects and > > pushes memory region information into the kernel. There is a > > fine-grained lock on the regions list which is held during lookup and > > when installing a new regions list. > > > > When the physical memory map changes the MemoryListener callbacks are > > invoked. They build up a new list of memory regions which is finally > > installed when the list has been completed. > > > > Note that this approach is not safe across memory hotplug because mapped > > pointers may still be in used across memory unplug. However, this is > > currently a problem for QEMU in general and needs to be addressed in the > > future. > > > > Signed-off-by: Stefan Hajnoczi > > Worth bothering with binary search? > vhost does a linear search over regions because > the number of ram regions is very small. memory.c does binary search. I did the same but in practice there are <20 regions for a simple VM. It's probably not worth it but without performance results this is speculation. I think there's no harm in using binary search to start with. > > +static void hostmem_listener_append_region(MemoryListener *listener, > > + MemoryRegionSection *section) > > +{ > > +Hostmem *hostmem = container_of(listener, Hostmem, listener); > > + > > +if (memory_region_is_ram(section->mr)) { > > +hostmem_append_new_region(hostmem, section); > > +} > > I think you also need to remove VGA region since you > don't mark these pages as dirty so access there won't work. I don't understand. If memory in the VGA region returns true from memory_region_is_ram(), why would there be a problem?
Re: [Qemu-devel] [RFC PATCH v2 1/3] virtio-bus : Introduce VirtioBus.
On 29/11/2012 14:55, Andreas Färber wrote: Am 29.11.2012 14:47, schrieb Konrad Frederic: On 29/11/2012 14:09, Peter Maydell wrote: On 29 November 2012 12:37, Konrad Frederic wrote: On 26/11/2012 17:59, Anthony Liguori wrote: virtio-pci-bus extends virtio-bus - is constructed with a pointer to a PCIDevice - implements the methods necessary to be a virtio bus I still have trouble with that ^^. The problem is that the virtio devices can't be connected to the virtio-pci-bus even if it extends virtio-bus because TYPE_VIRTIO_BUS != TYPE_VIRTIO_PCI_BUS. Conceptually it ought to work I think: if the bus is-a TYPE_VIRTIO_BUS then we should permit plugging in even if the actual bus object happens to be an instance of a subclass. Yes, is my implementation doing the right thing ? I mean is the virtio-pci-bus a virtio-bus ? In your v3 patchset no. In the inline code yes, via virtio_pci_bus_info's .parent. I suspect that qbus_find_recursive should be doing an object_class_dynamic_cast() to check that the bus is of a suitable type, rather than the (strcmp(object_get_typename(OBJECT(bus)), bus_typename) != 0) which it does at the moment. Yes, but we can cast VIRTIO_BUS in BUS no ? So in this case we could plug VirtioDevice in BUS and that's not what we want ? You would want to check whether object_class_dynamic_cast(OBJECT(bus), TYPE_VIRTIO_BUS) == NULL. This should evaluate to true if the bus is not a virtio-bus. Yes, ok I got it! Thanks, Fred
[Qemu-devel] LiveMigration and TSC
Hi, can someone give me a short status of the topic Live Migration and VMs that use TSC as clocksource? Broken, yes / no? And if yes under what circumstances? I have some TSC VMs that freeze occasionally after LiveMigration. Thank you, Peter
[Qemu-devel] [PATCHv4] rbd block driver fix race between aio completition and aio cancel
This one fixes a race which qemu had also in iscsi block driver between cancellation and io completition. qemu_rbd_aio_cancel was not synchronously waiting for the end of the command. To archieve this it introduces a new status flag which uses -EINPROGRESS. Changes since PATCHv3: - removed unnecessary if condition in rbd_start_aio as we haven't start io yet - moved acb->status = 0 to rbd_aio_bh_cb so qemu_aio_wait always waits until BH was executed Changes since PATCHv2: - fixed missing braces - added vfree for bounce Signed-off-by: Stefan Priebe --- block/rbd.c | 16 +--- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/block/rbd.c b/block/rbd.c index f3becc7..28e94ab 100644 --- a/block/rbd.c +++ b/block/rbd.c @@ -77,6 +77,7 @@ typedef struct RBDAIOCB { int error; struct BDRVRBDState *s; int cancelled; +int status; } RBDAIOCB; typedef struct RADOSCB { @@ -376,12 +377,6 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb) RBDAIOCB *acb = rcb->acb; int64_t r; -if (acb->cancelled) { -qemu_vfree(acb->bounce); -qemu_aio_release(acb); -goto done; -} - r = rcb->ret; if (acb->cmd == RBD_AIO_WRITE || @@ -409,7 +404,6 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb) /* Note that acb->bh can be NULL in case where the aio was cancelled */ acb->bh = qemu_bh_new(rbd_aio_bh_cb, acb); qemu_bh_schedule(acb->bh); -done: g_free(rcb); } @@ -568,6 +562,12 @@ static void qemu_rbd_aio_cancel(BlockDriverAIOCB *blockacb) { RBDAIOCB *acb = (RBDAIOCB *) blockacb; acb->cancelled = 1; + +while (acb->status == -EINPROGRESS) { +qemu_aio_wait(); +} + +qemu_vfree(acb->bounce); } static const AIOCBInfo rbd_aiocb_info = { @@ -639,6 +639,7 @@ static void rbd_aio_bh_cb(void *opaque) acb->common.cb(acb->common.opaque, (acb->ret > 0 ? 0 : acb->ret)); qemu_bh_delete(acb->bh); acb->bh = NULL; +acb->status = 0; qemu_aio_release(acb); } @@ -685,6 +686,7 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs, acb->s = s; acb->cancelled = 0; acb->bh = NULL; +acb->status = -EINPROGRESS; if (cmd == RBD_AIO_WRITE) { qemu_iovec_to_buf(acb->qiov, 0, acb->bounce, qiov->size); -- 1.7.10.4
Re: [Qemu-devel] [PATCH] rbd block driver fix race between aio completition and aio cancel
Hi, i hope i've done everything correctly. I've send a new v4 patch. Am 29.11.2012 14:58, schrieb Stefan Hajnoczi: On Thu, Nov 22, 2012 at 11:00:19AM +0100, Stefan Priebe wrote: @@ -406,10 +401,11 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb) acb->ret = r; } } +acb->status = 0; + I suggest doing this in the BH. The qemu_aio_wait() loop in qemu_rbd_aio_cancel() needs to wait until the BH has executed. By clearing status in the BH we ensure that no matter in which order qemu_aio_wait() invokes BHs and callbacks, we'll always wait until the BH has completed before ending the while loop in qemu_rbd_aio_cancel(). @@ -737,7 +741,8 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs, failed: g_free(rcb); s->qemu_aio_count--; -qemu_aio_release(acb); +if (!acb->cancelled) +qemu_aio_release(acb); return NULL; } This scenario is impossible. We haven't returned the acb back to the caller yet so they could not have invoked qemu_aio_cancel(). Greets, Stefan
Re: [Qemu-devel] [PATCH v4 03/11] dataplane: add host memory mapping code
On Thu, Nov 29, 2012 at 03:26:56PM +0100, Stefan Hajnoczi wrote: > On Thu, Nov 29, 2012 at 03:54:25PM +0200, Michael S. Tsirkin wrote: > > On Thu, Nov 22, 2012 at 04:16:44PM +0100, Stefan Hajnoczi wrote: > > > The data plane thread needs to map guest physical addresses to host > > > pointers. Normally this is done with cpu_physical_memory_map() but the > > > function assumes the global mutex is held. The data plane thread does > > > not touch the global mutex and therefore needs a thread-safe memory > > > mapping mechanism. > > > > > > Hostmem registers a MemoryListener similar to how vhost collects and > > > pushes memory region information into the kernel. There is a > > > fine-grained lock on the regions list which is held during lookup and > > > when installing a new regions list. > > > > > > When the physical memory map changes the MemoryListener callbacks are > > > invoked. They build up a new list of memory regions which is finally > > > installed when the list has been completed. > > > > > > Note that this approach is not safe across memory hotplug because mapped > > > pointers may still be in used across memory unplug. However, this is > > > currently a problem for QEMU in general and needs to be addressed in the > > > future. > > > > > > Signed-off-by: Stefan Hajnoczi > > > > Worth bothering with binary search? > > vhost does a linear search over regions because > > the number of ram regions is very small. > > memory.c does binary search. I did the same but in practice there are > <20 regions for a simple VM. It's probably not worth it but without > performance results this is speculation. > > I think there's no harm in using binary search to start with. > > > > +static void hostmem_listener_append_region(MemoryListener *listener, > > > + MemoryRegionSection *section) > > > +{ > > > +Hostmem *hostmem = container_of(listener, Hostmem, listener); > > > + > > > +if (memory_region_is_ram(section->mr)) { > > > +hostmem_append_new_region(hostmem, section); > > > +} > > > > I think you also need to remove VGA region since you > > don't mark these pages as dirty so access there won't work. > > I don't understand. If memory in the VGA region returns true from > memory_region_is_ram(), why would there be a problem? If you change this memory but you don't update the display. Never happens with non buggy guests but we should catch and fail if it does. -- MST
Re: [Qemu-devel] [PATCH v4 11/11] virtio-blk: add x-data-plane=on|off performance feature
On Thu, Nov 29, 2012 at 03:12:35PM +0200, Michael S. Tsirkin wrote: > On Thu, Nov 22, 2012 at 04:16:52PM +0100, Stefan Hajnoczi wrote: > > The virtio-blk-data-plane feature is easy to integrate into > > hw/virtio-blk.c. The data plane can be started and stopped similar to > > vhost-net. > > > > Users can take advantage of the virtio-blk-data-plane feature using the > > new -device virtio-blk-pci,x-data-plane=on property. > > > > The x-data-plane name was chosen because at this stage the feature is > > experimental and likely to see changes in the future. > > > > If the VM configuration does not support virtio-blk-data-plane an error > > message is printed. Although we could fall back to regular virtio-blk, > > I prefer the explicit approach since it prompts the user to fix their > > configuration if they want the performance benefit of > > virtio-blk-data-plane. > > Not only that, this affects features exposed to guest so it really can't be > trasparent. > > Which reminds me - shouldn't some features be turned off? > For example, VIRTIO_BLK_F_SCSI? Yes, virtio-blk-data-plane only starts when you give -device virtio-blk-pci,scsi=off,x-data-plane=on. If you use scsi=on an error message is printed. > > Limitations: > > * Only format=raw is supported > > * Live migration is not supported > > This is probably fixable long term? Absolutely. There are two parts: 1. Marking written memory dirty so live RAM migration can work. Missing today, easy cheat is to switch off virtio-blk-data-plane and silently switch to regular virtio-blk emulation while memory dirty logging is enabled. The more long-term solution is to actually communicate the dirty information back to the memory API. 2. Synchronizing virtio-blk-data-plane vring state with virtio-blk so save/load works. This should be relatively straightforward. I don't want to gate this patch series on live migration support but it is on my TODO list for virtio-blk-data-plane after this initial series has been merged. > > * Block jobs, hot unplug, and other operations fail with -EBUSY > > Hmm I don't see code to disable PCU unplug in this patch. > I expected no_hotplug to be set. > Where is it? It uses the bdrv_in_use() mechanism. > > * I/O throttling limits are ignored > > And this? > Meanwhile can we have attempts to set them fail? This limitation exists because virtio-blk-data-plane today bypasses the QEMU block layer. The next step is to get the block layer working inside the data plane thread. At that point I/O limits work again. Adding an error would be a layering violation because I/O throttling happens in the QEMU block layer and is unaware of the emulated storage controller (virtio-blk, IDE, SCSI, etc). I think it's better to document the limitation and continue working on AioContext so that we can soon support I/O throttling with virtio-blk-data-plane. It would be quite ugly to add checks. > > @@ -33,6 +35,8 @@ typedef struct VirtIOBlock > > VirtIOBlkConf *blk; > > unsigned short sector_mask; > > DeviceState *qdev; > > +VirtIOBlockDataPlane *dataplane; > > +Error *migration_blocker; > > Would be nice to move the migration disabling > checking supported formats > and all the rest of it out to dataplane code. The reason to do it in virtio-blk.c is that we already have access to the device configuration. If we move it to hw/dataplane/virtio-blk.c then that code needs to reach inside and check data that it doesn't otherwise access. IMO it's nice to keep data plane "dumb" and perform these checks where we already have to deal with the relationship between VirtIOBlkConf and friends. Stefan
Re: [Qemu-devel] [PATCH v4 00/11] virtio: virtio-blk data plane
On Thu, Nov 29, 2012 at 04:09:28PM +0200, Michael S. Tsirkin wrote: > On Thu, Nov 29, 2012 at 10:18:59AM +0100, Stefan Hajnoczi wrote: > > Michael, Paolo: Are you happy with v4? > > Looks pretty clean by itself. I sent some comments but they can be > addressed later. What worries me most is the code duplication with > regular virtio. > > I see two ways to reduce the maintainance somewhat > - split out ring handling code in virtio-blk > to a separate file to make it more obvious which part > is inactive when data plane runs. > - share ring processing code with virtio/virtio-blk > (e.g. use callbacks) > > Was any thought given to implementing one of these two > approaches? Yes, your option #2 is where I'd like to move once threaded memory dispatch is working. I hope we can run virtio.c code in a thread outside the global mutex soon. That way we can kill hw/dataplane/vring.[ch]. Ping Fan Liu has been working on the memory API and device emulation stuff that we need in order to eventually use virtio.c outside the global mutex. Stefan
Re: [Qemu-devel] [PATCH v4 11/11] virtio-blk: add x-data-plane=on|off performance feature
On Thu, Nov 29, 2012 at 03:45:55PM +0100, Stefan Hajnoczi wrote: > On Thu, Nov 29, 2012 at 03:12:35PM +0200, Michael S. Tsirkin wrote: > > On Thu, Nov 22, 2012 at 04:16:52PM +0100, Stefan Hajnoczi wrote: > > > The virtio-blk-data-plane feature is easy to integrate into > > > hw/virtio-blk.c. The data plane can be started and stopped similar to > > > vhost-net. > > > > > > Users can take advantage of the virtio-blk-data-plane feature using the > > > new -device virtio-blk-pci,x-data-plane=on property. > > > > > > The x-data-plane name was chosen because at this stage the feature is > > > experimental and likely to see changes in the future. > > > > > > If the VM configuration does not support virtio-blk-data-plane an error > > > message is printed. Although we could fall back to regular virtio-blk, > > > I prefer the explicit approach since it prompts the user to fix their > > > configuration if they want the performance benefit of > > > virtio-blk-data-plane. > > > > Not only that, this affects features exposed to guest so it really can't be > > trasparent. > > > > Which reminds me - shouldn't some features be turned off? > > For example, VIRTIO_BLK_F_SCSI? > > Yes, virtio-blk-data-plane only starts when you give -device > virtio-blk-pci,scsi=off,x-data-plane=on. If you use scsi=on an error > message is printed. > > > > Limitations: > > > * Only format=raw is supported > > > * Live migration is not supported > > > > This is probably fixable long term? > > Absolutely. There are two parts: > > 1. Marking written memory dirty so live RAM migration can work. Missing >today, easy cheat is to switch off virtio-blk-data-plane and silently >switch to regular virtio-blk emulation while memory dirty logging is >enabled. The more long-term solution is to actually communicate the >dirty information back to the memory API. > > 2. Synchronizing virtio-blk-data-plane vring state with virtio-blk so >save/load works. This should be relatively straightforward. > > I don't want to gate this patch series on live migration support but it > is on my TODO list for virtio-blk-data-plane after this initial series > has been merged. > > > > * Block jobs, hot unplug, and other operations fail with -EBUSY > > > > Hmm I don't see code to disable PCU unplug in this patch. > > I expected no_hotplug to be set. > > Where is it? > > It uses the bdrv_in_use() mechanism. Hmm but PCI device can still go away if guest ejects it. Does this work fine? > > > * I/O throttling limits are ignored > > > > And this? > > Meanwhile can we have attempts to set them fail? > > This limitation exists because virtio-blk-data-plane today bypasses the > QEMU block layer. The next step is to get the block layer working > inside the data plane thread. At that point I/O limits work again. > > Adding an error would be a layering violation because I/O throttling > happens in the QEMU block layer and is unaware of the emulated storage > controller (virtio-blk, IDE, SCSI, etc). > > I think it's better to document the limitation and continue working on > AioContext so that we can soon support I/O throttling with > virtio-blk-data-plane. It would be quite ugly to add checks. > > > > @@ -33,6 +35,8 @@ typedef struct VirtIOBlock > > > VirtIOBlkConf *blk; > > > unsigned short sector_mask; > > > DeviceState *qdev; > > > +VirtIOBlockDataPlane *dataplane; > > > +Error *migration_blocker; > > > > Would be nice to move the migration disabling > > checking supported formats > > and all the rest of it out to dataplane code. > > The reason to do it in virtio-blk.c is that we already have access to > the device configuration. If we move it to hw/dataplane/virtio-blk.c > then that code needs to reach inside and check data that it doesn't > otherwise access. Not really, just pass it all necessary data. > IMO it's nice to keep data plane "dumb" and perform these checks where > we already have to deal with the relationship between VirtIOBlkConf and > friends. > > Stefan Yes but then it's not contained.
Re: [Qemu-devel] [PATCH 15/37] target-i386: set default value of "hypervisor" feature using static property
On Fri, 9 Nov 2012 16:48:07 +0100 Eduardo Habkost wrote: > > On 22/10/2012, at 17:03, Igor Mammedov wrote: > > > Signed-off-by: Igor Mammedov > > --- > > target-i386/cpu.c | 9 +++-- > > 1 file changed, 3 insertions(+), 6 deletions(-) > > > > diff --git a/target-i386/cpu.c b/target-i386/cpu.c > > index 3131945..dc4fcdf 100644 > > --- a/target-i386/cpu.c > > +++ b/target-i386/cpu.c > > @@ -174,7 +174,7 @@ static Property cpu_x86_properties[] = { > > DEFINE_PROP_BIT("f-xsave", X86CPU, env.cpuid_ext_features, 26, false), > > DEFINE_PROP_BIT("f-osxsave", X86CPU, env.cpuid_ext_features, 27, > > false), DEFINE_PROP_BIT("f-avx", X86CPU, env.cpuid_ext_features, 28, > > false), > > -DEFINE_PROP_BIT("f-hypervisor", X86CPU, env.cpuid_ext_features, 31, > > false), > > +DEFINE_PROP_BIT("f-hypervisor", X86CPU, env.cpuid_ext_features, 31, > > true), DEFINE_PROP_BIT("f-syscall", X86CPU, env.cpuid_ext2_features, 11, > > false), DEFINE_PROP_BIT("f-nx", X86CPU, env.cpuid_ext2_features, 20, > > false), DEFINE_PROP_BIT("f-xd", X86CPU, env.cpuid_ext2_features, 20, > > false), @@ -1307,11 +1307,12 @@ static int cpu_x86_find_by_name(X86CPU > > *cpu, x86_def_t *x86_cpu_def, { > > unsigned int i; > > x86_def_t *def; > > +CPUX86State *env = &cpu->env; > > > > char *s = g_strdup(cpu_model); > > char *featurestr, *name = strtok(s, ","); > > /* Features to be added*/ > > -uint32_t plus_features = 0, plus_ext_features = 0; > > +uint32_t plus_features = 0, plus_ext_features = > > env->cpuid_ext_features; > > Moving data back and forth between CPUX86State and x86_def_t makes the > initialization ordering confusing (today data is moved from x86_def_t to > X86CPU, and never the other way around). > > As this code is removed in the next patches, I don't mind too much, but > maybe it's simpler to implement this change only after the "use static > properties for setting cpuid features" patch? It won't eliminate moving data back and forth between CPUX86State and x86_def_t until CPU sub-classes (i.e. when x86_def_t isn't used anymore), if defaults from static properties are used. But I've rewritten patches in a way to make this easier to review and less confusing. > > > uint32_t plus_ext2_features = 0, plus_ext3_features = 0; > > uint32_t plus_kvm_features = 0, plus_svm_features = 0; > > uint32_t plus_7_0_ebx_features = 0; > > @@ -1345,10 +1346,6 @@ static int cpu_x86_find_by_name(X86CPU *cpu, > > x86_def_t *x86_cpu_def, plus_kvm_features = 0; > > #endif > > > > -add_flagname_to_bitmaps("hypervisor", &plus_features, > > -&plus_ext_features, &plus_ext2_features, &plus_ext3_features, > > -&plus_kvm_features, &plus_svm_features, > > &plus_7_0_ebx_features); - > > featurestr = strtok(NULL, ","); > > > > while (featurestr) { > > -- > > 1.7.11.7 > > > > >
[Qemu-devel] [PATCH] qdev: relax bus type check in qdev_device_add() (v2)
We are currently checking for an exact type match. Use QOM dynamic_cast to check for a compatible type instead. Cc: Konrad Frederic Cc: Peter Maydell Signed-off-by: Anthony Liguori --- v1 -> v2: - also add cast to qbus_find_recursive (Peter) - simplify by doing object_dynamic_cast instead of messing with classes --- hw/qdev-monitor.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/hw/qdev-monitor.c b/hw/qdev-monitor.c index 479eecd..a1b4d6a 100644 --- a/hw/qdev-monitor.c +++ b/hw/qdev-monitor.c @@ -289,8 +289,7 @@ static BusState *qbus_find_recursive(BusState *bus, const char *name, if (name && (strcmp(bus->name, name) != 0)) { match = 0; } -if (bus_typename && -(strcmp(object_get_typename(OBJECT(bus)), bus_typename) != 0)) { +if (bus_typename && !object_dynamic_cast(OBJECT(bus), bus_typename)) { match = 0; } if (match) { @@ -435,7 +434,7 @@ DeviceState *qdev_device_add(QemuOpts *opts) if (!bus) { return NULL; } -if (strcmp(object_get_typename(OBJECT(bus)), k->bus_type) != 0) { +if (!object_dynamic_cast(OBJECT(bus), k->bus_type)) { qerror_report(QERR_BAD_BUS_FOR_DEVICE, driver, object_get_typename(OBJECT(bus))); return NULL; -- 1.8.0
Re: [Qemu-devel] [PATCH v4 04/11] dataplane: add virtqueue vring code
> > +/* Toggle guest->host notifies */ > > +void vring_set_notification(VirtIODevice *vdev, Vring *vring, bool > > enable) > > +{ > > +if (vdev->guest_features & (1 << VIRTIO_RING_F_EVENT_IDX)) { > > +if (enable) { > > +vring_avail_event(&vring->vr) = vring->vr.avail->idx; > > +} > > +} else if (enable) { > > +vring->vr.used->flags &= ~VRING_USED_F_NO_NOTIFY; > > +} else { > > +vring->vr.used->flags |= VRING_USED_F_NO_NOTIFY; > > +} > > +} This is similar to the (guest-side) virtqueue_disable_cb/virtqueue_enable_cb. Perhaps having two functions will be easier to use, because from your other code it looks like you'd benefit from a return value when enable == true (again similar to virtqueue_enable_cb). Paolo
Re: [Qemu-devel] [PATCH] qdev: relax bus type check in qdev_device_add() (v2)
On 29/11/2012 16:12, Anthony Liguori wrote: We are currently checking for an exact type match. Use QOM dynamic_cast to check for a compatible type instead. Cc: Konrad Frederic Cc: Peter Maydell Signed-off-by: Anthony Liguori --- v1 -> v2: - also add cast to qbus_find_recursive (Peter) - simplify by doing object_dynamic_cast instead of messing with classes --- hw/qdev-monitor.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/hw/qdev-monitor.c b/hw/qdev-monitor.c index 479eecd..a1b4d6a 100644 --- a/hw/qdev-monitor.c +++ b/hw/qdev-monitor.c @@ -289,8 +289,7 @@ static BusState *qbus_find_recursive(BusState *bus, const char *name, if (name&& (strcmp(bus->name, name) != 0)) { match = 0; } -if (bus_typename&& -(strcmp(object_get_typename(OBJECT(bus)), bus_typename) != 0)) { +if (bus_typename&& !object_dynamic_cast(OBJECT(bus), bus_typename)) { match = 0; } if (match) { @@ -435,7 +434,7 @@ DeviceState *qdev_device_add(QemuOpts *opts) if (!bus) { return NULL; } -if (strcmp(object_get_typename(OBJECT(bus)), k->bus_type) != 0) { +if (!object_dynamic_cast(OBJECT(bus), k->bus_type)) { qerror_report(QERR_BAD_BUS_FOR_DEVICE, driver, object_get_typename(OBJECT(bus))); return NULL; That seems to work. Thanks, Fred.
Re: [Qemu-devel] [PATCH v4 10/11] dataplane: add virtio-blk data plane code
> > +unsigned int num_reqs; > > +QemuMutex num_reqs_lock; > > OK the only reason this lock is needed is because > you want to drain outside the thread. > Won't it be better to queue process the drain request through > the thread? > You won't need any locks then. Draining is processed in the thread. This lock is only needed to use it together with no_reqs_cond, because userspace threads do not have something like wait_event. Direct usage of futexes would let you remove the lock, but it's not portable. Paolo > > +QemuCond no_reqs_cond; > > +}; > > + > > +/* Raise an interrupt to signal guest, if necessary */ > > +static void notify_guest(VirtIOBlockDataPlane *s) > > +{ > > +if (!vring_should_notify(s->vdev, &s->vring)) { > > +return; > > +} > > + > > +event_notifier_set(s->guest_notifier); > > +} > > + > > +static void complete_request(struct iocb *iocb, ssize_t ret, void > > *opaque) > > +{ > > +VirtIOBlockDataPlane *s = opaque; > > +VirtIOBlockRequest *req = container_of(iocb, > > VirtIOBlockRequest, iocb); > > +struct virtio_blk_inhdr hdr; > > +int len; > > + > > +if (likely(ret >= 0)) { > > +hdr.status = VIRTIO_BLK_S_OK; > > +len = ret; > > +} else { > > +hdr.status = VIRTIO_BLK_S_IOERR; > > +len = 0; > > +} > > + > > +trace_virtio_blk_data_plane_complete_request(s, req->head, > > ret); > > + > > +qemu_iovec_from_buf(req->inhdr, 0, &hdr, sizeof(hdr)); > > +qemu_iovec_destroy(req->inhdr); > > +g_slice_free(QEMUIOVector, req->inhdr); > > + > > +/* According to the virtio specification len should be the > > number of bytes > > + * written to, but for virtio-blk it seems to be the number of > > bytes > > + * transferred plus the status bytes. > > + */ > > +vring_push(&s->vring, req->head, len + sizeof(hdr)); > > + > > +qemu_mutex_lock(&s->num_reqs_lock); > > +if (--s->num_reqs == 0) { > > +qemu_cond_broadcast(&s->no_reqs_cond); > > +} > > +qemu_mutex_unlock(&s->num_reqs_lock); > > +} > > + > > +static void fail_request_early(VirtIOBlockDataPlane *s, unsigned > > int head, > > + QEMUIOVector *inhdr, unsigned char > > status) > > +{ > > +struct virtio_blk_inhdr hdr = { > > +.status = status, > > +}; > > + > > +qemu_iovec_from_buf(inhdr, 0, &hdr, sizeof(hdr)); > > +qemu_iovec_destroy(inhdr); > > +g_slice_free(QEMUIOVector, inhdr); > > + > > +vring_push(&s->vring, head, sizeof(hdr)); > > +notify_guest(s); > > +} > > + > > +static int process_request(IOQueue *ioq, struct iovec iov[], > > + unsigned int out_num, unsigned int > > in_num, > > + unsigned int head) > > +{ > > +VirtIOBlockDataPlane *s = container_of(ioq, > > VirtIOBlockDataPlane, ioqueue); > > +struct iovec *in_iov = &iov[out_num]; > > +struct virtio_blk_outhdr outhdr; > > +QEMUIOVector *inhdr; > > +size_t in_size; > > + > > +/* Copy in outhdr */ > > +if (unlikely(iov_to_buf(iov, out_num, 0, &outhdr, > > +sizeof(outhdr)) != sizeof(outhdr))) { > > +error_report("virtio-blk request outhdr too short"); > > +return -EFAULT; > > +} > > +iov_discard(&iov, &out_num, sizeof(outhdr)); > > + > > +/* Grab inhdr for later */ > > +in_size = iov_size(in_iov, in_num); > > +if (in_size < sizeof(struct virtio_blk_inhdr)) { > > +error_report("virtio_blk request inhdr too short"); > > +return -EFAULT; > > +} > > +inhdr = g_slice_new(QEMUIOVector); > > +qemu_iovec_init(inhdr, 1); > > +qemu_iovec_concat_iov(inhdr, in_iov, in_num, > > +in_size - sizeof(struct virtio_blk_inhdr), > > +sizeof(struct virtio_blk_inhdr)); > > +iov_discard(&in_iov, &in_num, -sizeof(struct > > virtio_blk_inhdr)); > > + > > +/* TODO Linux sets the barrier bit even when not advertised! > > */ > > +outhdr.type &= ~VIRTIO_BLK_T_BARRIER; > > + > > +struct iocb *iocb; > > +switch (outhdr.type & (VIRTIO_BLK_T_OUT | > > VIRTIO_BLK_T_SCSI_CMD | > > + VIRTIO_BLK_T_FLUSH)) { > > +case VIRTIO_BLK_T_IN: > > +iocb = ioq_rdwr(ioq, true, in_iov, in_num, outhdr.sector * > > 512); > > +break; > > + > > +case VIRTIO_BLK_T_OUT: > > +iocb = ioq_rdwr(ioq, false, iov, out_num, outhdr.sector * > > 512); > > +break; > > + > > +case VIRTIO_BLK_T_SCSI_CMD: > > +/* TODO support SCSI commands */ > > +fail_request_early(s, head, inhdr, VIRTIO_BLK_S_UNSUPP); > > +return 0; > > + > > +case VIRTIO_BLK_T_FLUSH: > > +/* TODO fdsync not supported by Linux AIO, do it > > synchronously here! */ > > +fdatasync(s->fd); > > +fail_request_early(s, head, inhdr, VIRTIO_BLK_S_OK); > > +return 0; > > + > > +default: > > +error_report("virtio-blk unsupported request ty
Re: [Qemu-devel] [PATCHv4] rbd block driver fix race between aio completition and aio cancel
- Messaggio originale - > Da: "Stefan Priebe" > A: qemu-devel@nongnu.org > Cc: stefa...@gmail.com, "josh durgin" , > ceph-de...@vger.kernel.org, pbonz...@redhat.com, > "Stefan Priebe" > Inviato: Giovedì, 29 novembre 2012 15:28:35 > Oggetto: [PATCHv4] rbd block driver fix race between aio completition and aio > cancel > > This one fixes a race which qemu had also in iscsi block driver > between cancellation and io completition. > > qemu_rbd_aio_cancel was not synchronously waiting for the end of > the command. > > To archieve this it introduces a new status flag which uses > -EINPROGRESS. > > Changes since PATCHv3: > - removed unnecessary if condition in rbd_start_aio as we > haven't start io yet > - moved acb->status = 0 to rbd_aio_bh_cb so qemu_aio_wait always > waits until BH was executed > > Changes since PATCHv2: > - fixed missing braces > - added vfree for bounce > > Signed-off-by: Stefan Priebe > --- > block/rbd.c | 16 +--- > 1 file changed, 9 insertions(+), 7 deletions(-) > > diff --git a/block/rbd.c b/block/rbd.c > index f3becc7..28e94ab 100644 > --- a/block/rbd.c > +++ b/block/rbd.c > @@ -77,6 +77,7 @@ typedef struct RBDAIOCB { > int error; > struct BDRVRBDState *s; > int cancelled; > +int status; > } RBDAIOCB; > > typedef struct RADOSCB { > @@ -376,12 +377,6 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb) > RBDAIOCB *acb = rcb->acb; > int64_t r; > > -if (acb->cancelled) { > -qemu_vfree(acb->bounce); > -qemu_aio_release(acb); > -goto done; > -} > - > r = rcb->ret; > > if (acb->cmd == RBD_AIO_WRITE || > @@ -409,7 +404,6 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb) > /* Note that acb->bh can be NULL in case where the aio was > cancelled */ > acb->bh = qemu_bh_new(rbd_aio_bh_cb, acb); > qemu_bh_schedule(acb->bh); > -done: > g_free(rcb); > } > > @@ -568,6 +562,12 @@ static void qemu_rbd_aio_cancel(BlockDriverAIOCB > *blockacb) > { > RBDAIOCB *acb = (RBDAIOCB *) blockacb; > acb->cancelled = 1; > + > +while (acb->status == -EINPROGRESS) { > +qemu_aio_wait(); > +} > + > +qemu_vfree(acb->bounce); This vfree is not needed, since the BH will run and do the free. Otherwise looks ok. > } > > static const AIOCBInfo rbd_aiocb_info = { > @@ -639,6 +639,7 @@ static void rbd_aio_bh_cb(void *opaque) > acb->common.cb(acb->common.opaque, (acb->ret > 0 ? 0 : > acb->ret)); > qemu_bh_delete(acb->bh); > acb->bh = NULL; > +acb->status = 0; > > qemu_aio_release(acb); > } > @@ -685,6 +686,7 @@ static BlockDriverAIOCB > *rbd_start_aio(BlockDriverState *bs, > acb->s = s; > acb->cancelled = 0; > acb->bh = NULL; > +acb->status = -EINPROGRESS; > > if (cmd == RBD_AIO_WRITE) { > qemu_iovec_to_buf(acb->qiov, 0, acb->bounce, qiov->size); > -- > 1.7.10.4 > >
Re: [Qemu-devel] [PATCH] rbd block driver fix race between aio completition and aio cancel
> > @@ -574,6 +570,12 @@ static void > > qemu_rbd_aio_cancel(BlockDriverAIOCB *blockacb) > > { > > RBDAIOCB *acb = (RBDAIOCB *) blockacb; > > acb->cancelled = 1; > > + > > +while (acb->status == -EINPROGRESS) { > > +qemu_aio_wait(); > > +} > > + > > There should be a qemu_vfree(acb->bounce); here No, because the BH will have run at this point and you'd doubly-free the buffer. Paolo > > +qemu_aio_release(acb); > > } > > > > static AIOPool rbd_aio_pool = { > > @@ -646,7 +648,8 @@ static void rbd_aio_bh_cb(void *opaque) > > qemu_bh_delete(acb->bh); > > acb->bh = NULL; > > > > -qemu_aio_release(acb); > > +if (!acb->cancelled) > > +qemu_aio_release(acb); > > } > > > > static int rbd_aio_discard_wrapper(rbd_image_t image, > > @@ -691,6 +694,7 @@ static BlockDriverAIOCB > > *rbd_start_aio(BlockDriverState *bs, > > acb->s = s; > > acb->cancelled = 0; > > acb->bh = NULL; > > +acb->status = -EINPROGRESS; > > > > if (cmd == RBD_AIO_WRITE) { > > qemu_iovec_to_buf(acb->qiov, 0, acb->bounce, qiov->size); > > @@ -737,7 +741,8 @@ static BlockDriverAIOCB > > *rbd_start_aio(BlockDriverState *bs, > > failed: > > g_free(rcb); > > s->qemu_aio_count--; > > -qemu_aio_release(acb); > > +if (!acb->cancelled) > > qemu_vfree(acb->bounce) should be here as well, although that's a > separate bug that's probably never hit. > > > +qemu_aio_release(acb); > > return NULL; > > } > > > > > >
Re: [Qemu-devel] [PATCH v4 10/11] dataplane: add virtio-blk data plane code
On Thu, Nov 29, 2012 at 10:21:31AM -0500, Paolo Bonzini wrote: > > > > +unsigned int num_reqs; > > > +QemuMutex num_reqs_lock; > > > > OK the only reason this lock is needed is because > > you want to drain outside the thread. > > Won't it be better to queue process the drain request through > > the thread? > > You won't need any locks then. > > Draining is processed in the thread. Confused. You say same thread is always taking this lock? This is what I suggest. > This lock is only needed > to use it together with no_reqs_cond, because userspace threads > do not have something like wait_event. > > Direct usage of futexes would let you remove the lock, but it's > not portable. > > Paolo It's easier: simply signal condition which is local on stack. > > > +QemuCond no_reqs_cond; > > > +}; > > > + > > > +/* Raise an interrupt to signal guest, if necessary */ > > > +static void notify_guest(VirtIOBlockDataPlane *s) > > > +{ > > > +if (!vring_should_notify(s->vdev, &s->vring)) { > > > +return; > > > +} > > > + > > > +event_notifier_set(s->guest_notifier); > > > +} > > > + > > > +static void complete_request(struct iocb *iocb, ssize_t ret, void > > > *opaque) > > > +{ > > > +VirtIOBlockDataPlane *s = opaque; > > > +VirtIOBlockRequest *req = container_of(iocb, > > > VirtIOBlockRequest, iocb); > > > +struct virtio_blk_inhdr hdr; > > > +int len; > > > + > > > +if (likely(ret >= 0)) { > > > +hdr.status = VIRTIO_BLK_S_OK; > > > +len = ret; > > > +} else { > > > +hdr.status = VIRTIO_BLK_S_IOERR; > > > +len = 0; > > > +} > > > + > > > +trace_virtio_blk_data_plane_complete_request(s, req->head, > > > ret); > > > + > > > +qemu_iovec_from_buf(req->inhdr, 0, &hdr, sizeof(hdr)); > > > +qemu_iovec_destroy(req->inhdr); > > > +g_slice_free(QEMUIOVector, req->inhdr); > > > + > > > +/* According to the virtio specification len should be the > > > number of bytes > > > + * written to, but for virtio-blk it seems to be the number of > > > bytes > > > + * transferred plus the status bytes. > > > + */ > > > +vring_push(&s->vring, req->head, len + sizeof(hdr)); > > > + > > > +qemu_mutex_lock(&s->num_reqs_lock); > > > +if (--s->num_reqs == 0) { > > > +qemu_cond_broadcast(&s->no_reqs_cond); > > > +} > > > +qemu_mutex_unlock(&s->num_reqs_lock); > > > +} > > > + > > > +static void fail_request_early(VirtIOBlockDataPlane *s, unsigned > > > int head, > > > + QEMUIOVector *inhdr, unsigned char > > > status) > > > +{ > > > +struct virtio_blk_inhdr hdr = { > > > +.status = status, > > > +}; > > > + > > > +qemu_iovec_from_buf(inhdr, 0, &hdr, sizeof(hdr)); > > > +qemu_iovec_destroy(inhdr); > > > +g_slice_free(QEMUIOVector, inhdr); > > > + > > > +vring_push(&s->vring, head, sizeof(hdr)); > > > +notify_guest(s); > > > +} > > > + > > > +static int process_request(IOQueue *ioq, struct iovec iov[], > > > + unsigned int out_num, unsigned int > > > in_num, > > > + unsigned int head) > > > +{ > > > +VirtIOBlockDataPlane *s = container_of(ioq, > > > VirtIOBlockDataPlane, ioqueue); > > > +struct iovec *in_iov = &iov[out_num]; > > > +struct virtio_blk_outhdr outhdr; > > > +QEMUIOVector *inhdr; > > > +size_t in_size; > > > + > > > +/* Copy in outhdr */ > > > +if (unlikely(iov_to_buf(iov, out_num, 0, &outhdr, > > > +sizeof(outhdr)) != sizeof(outhdr))) { > > > +error_report("virtio-blk request outhdr too short"); > > > +return -EFAULT; > > > +} > > > +iov_discard(&iov, &out_num, sizeof(outhdr)); > > > + > > > +/* Grab inhdr for later */ > > > +in_size = iov_size(in_iov, in_num); > > > +if (in_size < sizeof(struct virtio_blk_inhdr)) { > > > +error_report("virtio_blk request inhdr too short"); > > > +return -EFAULT; > > > +} > > > +inhdr = g_slice_new(QEMUIOVector); > > > +qemu_iovec_init(inhdr, 1); > > > +qemu_iovec_concat_iov(inhdr, in_iov, in_num, > > > +in_size - sizeof(struct virtio_blk_inhdr), > > > +sizeof(struct virtio_blk_inhdr)); > > > +iov_discard(&in_iov, &in_num, -sizeof(struct > > > virtio_blk_inhdr)); > > > + > > > +/* TODO Linux sets the barrier bit even when not advertised! > > > */ > > > +outhdr.type &= ~VIRTIO_BLK_T_BARRIER; > > > + > > > +struct iocb *iocb; > > > +switch (outhdr.type & (VIRTIO_BLK_T_OUT | > > > VIRTIO_BLK_T_SCSI_CMD | > > > + VIRTIO_BLK_T_FLUSH)) { > > > +case VIRTIO_BLK_T_IN: > > > +iocb = ioq_rdwr(ioq, true, in_iov, in_num, outhdr.sector * > > > 512); > > > +break; > > > + > > > +case VIRTIO_BLK_T_OUT: > > > +iocb = ioq_rdwr(ioq, false, iov, out_num, outhdr.sector * > > > 512); > > > +break; > > > + > > > +case VIR
Re: [Qemu-devel] [PATCH v4 03/11] dataplane: add host memory mapping code
> > I don't understand. If memory in the VGA region returns true from > > memory_region_is_ram(), why would there be a problem? > > If you change this memory but you don't update the display. > Never happens with non buggy guests but we should catch and fail if > it does. Actually it _could_ happen with an MS-DOS guest that does the oh-so-retro DEF SEG=&HB800 BLOAD "file.pic", 0 (I think). Paolo
Re: [Qemu-devel] [PATCH v4 00/11] virtio: virtio-blk data plane
On Thu, Nov 29, 2012 at 03:48:04PM +0100, Stefan Hajnoczi wrote: > On Thu, Nov 29, 2012 at 04:09:28PM +0200, Michael S. Tsirkin wrote: > > On Thu, Nov 29, 2012 at 10:18:59AM +0100, Stefan Hajnoczi wrote: > > > Michael, Paolo: Are you happy with v4? > > > > Looks pretty clean by itself. I sent some comments but they can be > > addressed later. What worries me most is the code duplication with > > regular virtio. > > > > I see two ways to reduce the maintainance somewhat > > - split out ring handling code in virtio-blk > > to a separate file to make it more obvious which part > > is inactive when data plane runs. > > - share ring processing code with virtio/virtio-blk > > (e.g. use callbacks) > > > > Was any thought given to implementing one of these two > > approaches? > > Yes, your option #2 is where I'd like to move once threaded memory > dispatch is working. I hope we can run virtio.c code in a thread > outside the global mutex soon. That way we can kill > hw/dataplane/vring.[ch]. > > Ping Fan Liu has been working on the memory API and device emulation > stuff that we need in order to eventually use virtio.c outside the > global mutex. > > Stefan I guess we can live with this short term.
Re: [Qemu-devel] [PATCH] qdev: relax bus type check in qdev_device_add() (v2)
Am 29.11.2012 16:12, schrieb Anthony Liguori: > We are currently checking for an exact type match. Use QOM dynamic_cast to > check for a compatible type instead. > > Cc: Konrad Frederic > Cc: Peter Maydell > Signed-off-by: Anthony Liguori Reviewed-by: Andreas Färber Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH 15/37] target-i386: set default value of "hypervisor" feature using static property
On Thu, Nov 29, 2012 at 03:56:09PM +0100, Igor Mammedov wrote: > On Fri, 9 Nov 2012 16:48:07 +0100 > Eduardo Habkost wrote: > > > > > On 22/10/2012, at 17:03, Igor Mammedov wrote: > > > > > Signed-off-by: Igor Mammedov > > > --- > > > target-i386/cpu.c | 9 +++-- > > > 1 file changed, 3 insertions(+), 6 deletions(-) > > > > > > diff --git a/target-i386/cpu.c b/target-i386/cpu.c > > > index 3131945..dc4fcdf 100644 > > > --- a/target-i386/cpu.c > > > +++ b/target-i386/cpu.c > > > @@ -174,7 +174,7 @@ static Property cpu_x86_properties[] = { > > > DEFINE_PROP_BIT("f-xsave", X86CPU, env.cpuid_ext_features, 26, false), > > > DEFINE_PROP_BIT("f-osxsave", X86CPU, env.cpuid_ext_features, 27, > > > false), DEFINE_PROP_BIT("f-avx", X86CPU, env.cpuid_ext_features, 28, > > > false), > > > -DEFINE_PROP_BIT("f-hypervisor", X86CPU, env.cpuid_ext_features, 31, > > > false), > > > +DEFINE_PROP_BIT("f-hypervisor", X86CPU, env.cpuid_ext_features, 31, > > > true), DEFINE_PROP_BIT("f-syscall", X86CPU, env.cpuid_ext2_features, 11, > > > false), DEFINE_PROP_BIT("f-nx", X86CPU, env.cpuid_ext2_features, 20, > > > false), DEFINE_PROP_BIT("f-xd", X86CPU, env.cpuid_ext2_features, 20, > > > false), @@ -1307,11 +1307,12 @@ static int cpu_x86_find_by_name(X86CPU > > > *cpu, x86_def_t *x86_cpu_def, { > > > unsigned int i; > > > x86_def_t *def; > > > +CPUX86State *env = &cpu->env; > > > > > > char *s = g_strdup(cpu_model); > > > char *featurestr, *name = strtok(s, ","); > > > /* Features to be added*/ > > > -uint32_t plus_features = 0, plus_ext_features = 0; > > > +uint32_t plus_features = 0, plus_ext_features = > > > env->cpuid_ext_features; > > > > Moving data back and forth between CPUX86State and x86_def_t makes the > > initialization ordering confusing (today data is moved from x86_def_t to > > X86CPU, and never the other way around). > > > > As this code is removed in the next patches, I don't mind too much, but > > maybe it's simpler to implement this change only after the "use static > > properties for setting cpuid features" patch? > It won't eliminate moving data back and forth between CPUX86State and > x86_def_t until CPU sub-classes (i.e. when x86_def_t isn't used anymore), > if defaults from static properties are used. You don't need CPU subclasses to reduce back-and-forth data movement. We can just reorder code to look like: split_cpu_model_string(cpu_model, &model, &features); fill_cpudef_for_model_somehow(model, &cpudef); cpudef_2_x86_cpu(cpu, &cpudef); cpu_parse_feature_string(cpu, features); Instead of what we have today, that is: split_cpu_model_string(cpu_model, &model, &features); fill_cpudef_for_model_somehow(model, &cpudef); cpu_parse_feature_string(cpu, features); /* (the 3 steps above are all done inside cpu_x86_find_by_name()) */ cpudef_2_x86_cpu(cpu, &cpudef); You did that on patch 32/37, but I would be interesting to do that _before_ introducing the whole new CPU properties code, to make review easier. Reordering the steps to make the code less confusing before introducing the CPU properties makes the code easier to review, that's why I sent the CPU init cleanup series. And it makes us one step closer to completely eliminate the X86CPUDefinition struct in the future. I actually made the reordering change above using the CPU init series as base, and it was very easy to do, after the CPU init was cleaned up. You can see the result at: https://github.com/ehabkost/qemu-hacks/commit/5256503135e787cb3550092c24dbeb3c5fc5a747 (it's completely untested. The rest of the changes are on the work/cpu-model-classes branch) > > But I've rewritten patches in a way to make this easier to review and less > confusing. I'm curious to see the result. Right now I am using the CPU init cleanup series as base for my CPU subclass experiments. > > > > > > uint32_t plus_ext2_features = 0, plus_ext3_features = 0; > > > uint32_t plus_kvm_features = 0, plus_svm_features = 0; > > > uint32_t plus_7_0_ebx_features = 0; > > > @@ -1345,10 +1346,6 @@ static int cpu_x86_find_by_name(X86CPU *cpu, > > > x86_def_t *x86_cpu_def, plus_kvm_features = 0; > > > #endif > > > > > > -add_flagname_to_bitmaps("hypervisor", &plus_features, > > > -&plus_ext_features, &plus_ext2_features, &plus_ext3_features, > > > -&plus_kvm_features, &plus_svm_features, > > > &plus_7_0_ebx_features); - > > > featurestr = strtok(NULL, ","); > > > > > > while (featurestr) { > > > -- > > > 1.7.11.7 > > > > > > > > > -- Eduardo
Re: [Qemu-devel] [PATCH v4 10/11] dataplane: add virtio-blk data plane code
- Messaggio originale - > Da: "Michael S. Tsirkin" > A: "Paolo Bonzini" > Cc: qemu-devel@nongnu.org, "Kevin Wolf" , "Anthony Liguori" > , "Blue Swirl" > , k...@us.ibm.com, "Asias He" , > "Stefan Hajnoczi" > Inviato: Giovedì, 29 novembre 2012 16:27:39 > Oggetto: Re: [PATCH v4 10/11] dataplane: add virtio-blk data plane code > > On Thu, Nov 29, 2012 at 10:21:31AM -0500, Paolo Bonzini wrote: > > > > > > +unsigned int num_reqs; > > > > +QemuMutex num_reqs_lock; > > > > > > OK the only reason this lock is needed is because > > > you want to drain outside the thread. > > > Won't it be better to queue process the drain request through > > > the thread? > > > You won't need any locks then. > > > > Draining is processed in the thread. > > Confused. You say same thread is always taking this lock? No, one thread needs to take the lock just for waiting on the condition variable. However, indeed you are right about this: > > > > +/* Block until pending requests have completed > > > > + * > > > > + * The vring continues to be serviced so ensure no new > > > > + * requests will be added to avoid races. > > > > > > This comment confuses me. "avoid races" is a kind of vague > > > comment that does not really help. > > > > > > This function does not seem to ensure > > > no new requests - it simply waits until num requests > > > gets to 0. But requests could get added right afterwards > > > and it won't help. and solving this race would also let Stefan remove the lock. Stefan, perhaps you could replace the stop_notifier mechanism of event-poll.c with something similar to aio_notify/qemu_notify_event, and even remove event_poll_run in favor of event_poll (aka aio_wait...). And also remove the return value from the callback, since your remaining callbacks always return true. The main thread can just signal the event loop: s->stopping = true; event_poll_notify(&s->event_poll); qemu_thread_join(&s->thread); and the dataplane thread will drain the queue before exiting. Paolo > This is what I suggest. > > > This lock is only needed > > to use it together with no_reqs_cond, because userspace threads > > do not have something like wait_event. > > > > Direct usage of futexes would let you remove the lock, but it's > > not portable. > > > > Paolo > > It's easier: simply signal condition which is local on stack. > > > > > +QemuCond no_reqs_cond; > > > > +}; > > > > + > > > > +/* Raise an interrupt to signal guest, if necessary */ > > > > +static void notify_guest(VirtIOBlockDataPlane *s) > > > > +{ > > > > +if (!vring_should_notify(s->vdev, &s->vring)) { > > > > +return; > > > > +} > > > > + > > > > +event_notifier_set(s->guest_notifier); > > > > +} > > > > + > > > > +static void complete_request(struct iocb *iocb, ssize_t ret, > > > > void > > > > *opaque) > > > > +{ > > > > +VirtIOBlockDataPlane *s = opaque; > > > > +VirtIOBlockRequest *req = container_of(iocb, > > > > VirtIOBlockRequest, iocb); > > > > +struct virtio_blk_inhdr hdr; > > > > +int len; > > > > + > > > > +if (likely(ret >= 0)) { > > > > +hdr.status = VIRTIO_BLK_S_OK; > > > > +len = ret; > > > > +} else { > > > > +hdr.status = VIRTIO_BLK_S_IOERR; > > > > +len = 0; > > > > +} > > > > + > > > > +trace_virtio_blk_data_plane_complete_request(s, req->head, > > > > ret); > > > > + > > > > +qemu_iovec_from_buf(req->inhdr, 0, &hdr, sizeof(hdr)); > > > > +qemu_iovec_destroy(req->inhdr); > > > > +g_slice_free(QEMUIOVector, req->inhdr); > > > > + > > > > +/* According to the virtio specification len should be the > > > > number of bytes > > > > + * written to, but for virtio-blk it seems to be the > > > > number of > > > > bytes > > > > + * transferred plus the status bytes. > > > > + */ > > > > +vring_push(&s->vring, req->head, len + sizeof(hdr)); > > > > + > > > > +qemu_mutex_lock(&s->num_reqs_lock); > > > > +if (--s->num_reqs == 0) { > > > > +qemu_cond_broadcast(&s->no_reqs_cond); > > > > +} > > > > +qemu_mutex_unlock(&s->num_reqs_lock); > > > > +} > > > > + > > > > +static void fail_request_early(VirtIOBlockDataPlane *s, > > > > unsigned > > > > int head, > > > > + QEMUIOVector *inhdr, unsigned > > > > char > > > > status) > > > > +{ > > > > +struct virtio_blk_inhdr hdr = { > > > > +.status = status, > > > > +}; > > > > + > > > > +qemu_iovec_from_buf(inhdr, 0, &hdr, sizeof(hdr)); > > > > +qemu_iovec_destroy(inhdr); > > > > +g_slice_free(QEMUIOVector, inhdr); > > > > + > > > > +vring_push(&s->vring, head, sizeof(hdr)); > > > > +notify_guest(s); > > > > +} > > > > + > > > > +static int process_request(IOQueue *ioq, struct iovec iov[], > > > > + unsigned int out_num, unsigned int > > > > in_num, > > > > + unsigned int head) > > > > +{ > > > > +VirtIOBlockDataPlan
[Qemu-devel] [PATCH for-1.3] qemu-tech.texi: update implemented xtensa features list
Debug option is available since QEMU-1.2; FP coprocessor and coprocessor context is available since QEMU-1.3. Signed-off-by: Max Filippov --- qemu-tech.texi | 10 +- 1 files changed, 5 insertions(+), 5 deletions(-) diff --git a/qemu-tech.texi b/qemu-tech.texi index d73dda8..8aefa74 100644 --- a/qemu-tech.texi +++ b/qemu-tech.texi @@ -262,16 +262,16 @@ Current QEMU limitations: @item Core Xtensa ISA emulation, including most options: code density, loop, extended L32R, 16- and 32-bit multiplication, 32-bit division, -MAC16, miscellaneous operations, boolean, multiprocessor synchronization, +MAC16, miscellaneous operations, boolean, FP coprocessor, coprocessor +context, debug, multiprocessor synchronization, conditional store, exceptions, relocatable vectors, unaligned exception, interrupts (including high priority and timer), hardware alignment, region protection, region translation, MMU, windowed registers, thread pointer, processor ID. -@item Not implemented options: FP coprocessor, coprocessor context, -data/instruction cache (including cache prefetch and locking), XLMI, -processor interface, debug. Also options not covered by the core ISA -(e.g. FLIX, wide branches) are not implemented. +@item Not implemented options: data/instruction cache (including cache +prefetch and locking), XLMI, processor interface. Also options not +covered by the core ISA (e.g. FLIX, wide branches) are not implemented. @item Can run most Xtensa Linux binaries. -- 1.7.7.6
[Qemu-devel] [PATCHv5 1.3] seccomp: adding new syscalls (bugzilla 855162)
According to the bug 855162[0] - there's the need of adding new syscalls to the whitelist when using Qemu with Libvirt. [0] - https://bugzilla.redhat.com/show_bug.cgi?id=855162 Reported-by: Paul Moore Signed-off-by: Eduardo Otubo Signed-off-by: Corey Bryant --- qemu-seccomp.c | 156 ++-- 1 file changed, 139 insertions(+), 17 deletions(-) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index 64329a3..2a71d6f 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -26,8 +26,12 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(timer_gettime), 254 }, { SCMP_SYS(futex), 253 }, { SCMP_SYS(select), 252 }, +#if defined(__x86_64__) { SCMP_SYS(recvfrom), 251 }, { SCMP_SYS(sendto), 250 }, +#elif defined(__i386__) +{ SCMP_SYS(socketcall), 250 }, +#endif { SCMP_SYS(read), 249 }, { SCMP_SYS(brk), 248 }, { SCMP_SYS(clone), 247 }, @@ -36,15 +40,30 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(execve), 245 }, { SCMP_SYS(open), 245 }, { SCMP_SYS(ioctl), 245 }, +#if defined(__x86_64__) +{ SCMP_SYS(socket), 245 }, +{ SCMP_SYS(setsockopt), 245 }, { SCMP_SYS(recvmsg), 245 }, { SCMP_SYS(sendmsg), 245 }, { SCMP_SYS(accept), 245 }, { SCMP_SYS(connect), 245 }, +{ SCMP_SYS(socketpair), 245 }, +{ SCMP_SYS(bind), 245 }, +{ SCMP_SYS(listen), 245 }, +{ SCMP_SYS(semget), 245 }, +#elif defined(__i386__) +{ SCMP_SYS(ipc), 245 }, +#endif { SCMP_SYS(gettimeofday), 245 }, { SCMP_SYS(readlink), 245 }, { SCMP_SYS(access), 245 }, { SCMP_SYS(prctl), 245 }, { SCMP_SYS(signalfd), 245 }, +{ SCMP_SYS(getrlimit), 245 }, +{ SCMP_SYS(set_tid_address), 245 }, +{ SCMP_SYS(statfs), 245 }, +{ SCMP_SYS(unlink), 245 }, +{ SCMP_SYS(wait4), 245 }, #if defined(__i386__) { SCMP_SYS(fcntl64), 245 }, { SCMP_SYS(fstat64), 245 }, @@ -56,30 +75,33 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(sigreturn), 245 }, { SCMP_SYS(_newselect), 245 }, { SCMP_SYS(_llseek), 245 }, -{ SCMP_SYS(mmap2), 245}, +{ SCMP_SYS(mmap2), 245 }, { SCMP_SYS(sigprocmask), 245 }, -#elif defined(__x86_64__) -{ SCMP_SYS(sched_getparam), 245}, -{ SCMP_SYS(sched_getscheduler), 245}, -{ SCMP_SYS(fstat), 245}, -{ SCMP_SYS(clock_getres), 245}, -{ SCMP_SYS(sched_get_priority_min), 245}, -{ SCMP_SYS(sched_get_priority_max), 245}, -{ SCMP_SYS(stat), 245}, -{ SCMP_SYS(socket), 245}, -{ SCMP_SYS(setsockopt), 245}, -{ SCMP_SYS(uname), 245}, -{ SCMP_SYS(semget), 245}, #endif +{ SCMP_SYS(sched_getparam), 245 }, +{ SCMP_SYS(sched_getscheduler), 245 }, +{ SCMP_SYS(fstat), 245 }, +{ SCMP_SYS(clock_getres), 245 }, +{ SCMP_SYS(sched_get_priority_min), 245 }, +{ SCMP_SYS(sched_get_priority_max), 245 }, +{ SCMP_SYS(stat), 245 }, +{ SCMP_SYS(uname), 245 }, { SCMP_SYS(eventfd2), 245 }, { SCMP_SYS(dup), 245 }, +{ SCMP_SYS(dup2), 245 }, +{ SCMP_SYS(dup3), 245 }, { SCMP_SYS(gettid), 245 }, +{ SCMP_SYS(getgid), 245 }, +{ SCMP_SYS(getegid), 245 }, +{ SCMP_SYS(getuid), 245 }, +{ SCMP_SYS(geteuid), 245 }, { SCMP_SYS(timer_create), 245 }, { SCMP_SYS(exit), 245 }, { SCMP_SYS(clock_gettime), 245 }, { SCMP_SYS(time), 245 }, { SCMP_SYS(restart_syscall), 245 }, { SCMP_SYS(pwrite64), 245 }, +{ SCMP_SYS(nanosleep), 245 }, { SCMP_SYS(chown), 245 }, { SCMP_SYS(openat), 245 }, { SCMP_SYS(getdents), 245 }, @@ -93,8 +115,6 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(lseek), 245 }, { SCMP_SYS(pselect6), 245 }, { SCMP_SYS(fork), 245 }, -{ SCMP_SYS(bind), 245 }, -{ SCMP_SYS(listen), 245 }, { SCMP_SYS(eventfd), 245 }, { SCMP_SYS(rt_sigprocmask), 245 }, { SCMP_SYS(write), 244 }, @@ -104,10 +124,112 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(pipe2), 242 }, { SCMP_SYS(munmap), 242 }, { SCMP_SYS(mremap), 242 }, +{ SCMP_SYS(fdatasync), 242 }, +{ SCMP_SYS(close), 242 }, +{ SCMP_SYS(rt_sigpending), 242 }, +{ SCMP_SYS(rt_sigtimedwait), 242 }, +{ SCMP_SYS(readv), 242 }, +{ SCMP_SYS(writev), 242 }, +{ SCMP_SYS(preadv), 242 }, +{ SCMP_SYS(pwritev), 242 }, +{ SCMP_SYS(setrlimit), 242 }, +{ SCMP_SYS(ftruncate), 242 }, +{ SCMP_SYS(lstat), 242 }, +{ SCMP_SYS(pipe), 242 }, +{ SCMP_SYS(umask), 242 }, +{ SCMP_SYS(chdir), 242 }, +{ SCMP_SYS(setitimer), 242 }, +{ SCMP_SYS(setsid), 242 }, +{ SCMP_SYS(poll), 242 }, +{ SCMP_SYS(epoll_create), 242 }, +{ SCMP_SYS(epoll_ctl), 242 }, +{ SCMP_SYS(epoll_wait), 242 }, +#if defined(__i386__) +{ SCMP_SYS(waitpid), 242 }, +#elif defined(__x86_64__) { SCMP_SYS(getsockname), 242 }, { SCMP_SYS(getpeername), 242 }, -{ SCM
Re: [Qemu-devel] [PATCHv5 1.3] seccomp: adding new syscalls (bugzilla 855162)
On Thursday, November 29, 2012 01:56:41 PM Eduardo Otubo wrote: > According to the bug 855162[0] - there's the need of adding new syscalls > to the whitelist when using Qemu with Libvirt. > > [0] - https://bugzilla.redhat.com/show_bug.cgi?id=855162 > > Reported-by: Paul Moore > Signed-off-by: Eduardo Otubo > Signed-off-by: Corey Bryant Works for me on 64-bit x86 F17. Tested-by: Paul Moore > diff --git a/qemu-seccomp.c b/qemu-seccomp.c > index 64329a3..2a71d6f 100644 > --- a/qemu-seccomp.c > +++ b/qemu-seccomp.c > @@ -26,8 +26,12 @@ static const struct QemuSeccompSyscall > seccomp_whitelist[] = { { SCMP_SYS(timer_gettime), 254 }, > { SCMP_SYS(futex), 253 }, > { SCMP_SYS(select), 252 }, > +#if defined(__x86_64__) > { SCMP_SYS(recvfrom), 251 }, > { SCMP_SYS(sendto), 250 }, > +#elif defined(__i386__) > +{ SCMP_SYS(socketcall), 250 }, > +#endif > { SCMP_SYS(read), 249 }, > { SCMP_SYS(brk), 248 }, > { SCMP_SYS(clone), 247 }, > @@ -36,15 +40,30 @@ static const struct QemuSeccompSyscall > seccomp_whitelist[] = { { SCMP_SYS(execve), 245 }, > { SCMP_SYS(open), 245 }, > { SCMP_SYS(ioctl), 245 }, > +#if defined(__x86_64__) > +{ SCMP_SYS(socket), 245 }, > +{ SCMP_SYS(setsockopt), 245 }, > { SCMP_SYS(recvmsg), 245 }, > { SCMP_SYS(sendmsg), 245 }, > { SCMP_SYS(accept), 245 }, > { SCMP_SYS(connect), 245 }, > +{ SCMP_SYS(socketpair), 245 }, > +{ SCMP_SYS(bind), 245 }, > +{ SCMP_SYS(listen), 245 }, > +{ SCMP_SYS(semget), 245 }, > +#elif defined(__i386__) > +{ SCMP_SYS(ipc), 245 }, > +#endif > { SCMP_SYS(gettimeofday), 245 }, > { SCMP_SYS(readlink), 245 }, > { SCMP_SYS(access), 245 }, > { SCMP_SYS(prctl), 245 }, > { SCMP_SYS(signalfd), 245 }, > +{ SCMP_SYS(getrlimit), 245 }, > +{ SCMP_SYS(set_tid_address), 245 }, > +{ SCMP_SYS(statfs), 245 }, > +{ SCMP_SYS(unlink), 245 }, > +{ SCMP_SYS(wait4), 245 }, > #if defined(__i386__) > { SCMP_SYS(fcntl64), 245 }, > { SCMP_SYS(fstat64), 245 }, > @@ -56,30 +75,33 @@ static const struct QemuSeccompSyscall > seccomp_whitelist[] = { { SCMP_SYS(sigreturn), 245 }, > { SCMP_SYS(_newselect), 245 }, > { SCMP_SYS(_llseek), 245 }, > -{ SCMP_SYS(mmap2), 245}, > +{ SCMP_SYS(mmap2), 245 }, > { SCMP_SYS(sigprocmask), 245 }, > -#elif defined(__x86_64__) > -{ SCMP_SYS(sched_getparam), 245}, > -{ SCMP_SYS(sched_getscheduler), 245}, > -{ SCMP_SYS(fstat), 245}, > -{ SCMP_SYS(clock_getres), 245}, > -{ SCMP_SYS(sched_get_priority_min), 245}, > -{ SCMP_SYS(sched_get_priority_max), 245}, > -{ SCMP_SYS(stat), 245}, > -{ SCMP_SYS(socket), 245}, > -{ SCMP_SYS(setsockopt), 245}, > -{ SCMP_SYS(uname), 245}, > -{ SCMP_SYS(semget), 245}, > #endif > +{ SCMP_SYS(sched_getparam), 245 }, > +{ SCMP_SYS(sched_getscheduler), 245 }, > +{ SCMP_SYS(fstat), 245 }, > +{ SCMP_SYS(clock_getres), 245 }, > +{ SCMP_SYS(sched_get_priority_min), 245 }, > +{ SCMP_SYS(sched_get_priority_max), 245 }, > +{ SCMP_SYS(stat), 245 }, > +{ SCMP_SYS(uname), 245 }, > { SCMP_SYS(eventfd2), 245 }, > { SCMP_SYS(dup), 245 }, > +{ SCMP_SYS(dup2), 245 }, > +{ SCMP_SYS(dup3), 245 }, > { SCMP_SYS(gettid), 245 }, > +{ SCMP_SYS(getgid), 245 }, > +{ SCMP_SYS(getegid), 245 }, > +{ SCMP_SYS(getuid), 245 }, > +{ SCMP_SYS(geteuid), 245 }, > { SCMP_SYS(timer_create), 245 }, > { SCMP_SYS(exit), 245 }, > { SCMP_SYS(clock_gettime), 245 }, > { SCMP_SYS(time), 245 }, > { SCMP_SYS(restart_syscall), 245 }, > { SCMP_SYS(pwrite64), 245 }, > +{ SCMP_SYS(nanosleep), 245 }, > { SCMP_SYS(chown), 245 }, > { SCMP_SYS(openat), 245 }, > { SCMP_SYS(getdents), 245 }, > @@ -93,8 +115,6 @@ static const struct QemuSeccompSyscall > seccomp_whitelist[] = { { SCMP_SYS(lseek), 245 }, > { SCMP_SYS(pselect6), 245 }, > { SCMP_SYS(fork), 245 }, > -{ SCMP_SYS(bind), 245 }, > -{ SCMP_SYS(listen), 245 }, > { SCMP_SYS(eventfd), 245 }, > { SCMP_SYS(rt_sigprocmask), 245 }, > { SCMP_SYS(write), 244 }, > @@ -104,10 +124,112 @@ static const struct QemuSeccompSyscall > seccomp_whitelist[] = { { SCMP_SYS(pipe2), 242 }, > { SCMP_SYS(munmap), 242 }, > { SCMP_SYS(mremap), 242 }, > +{ SCMP_SYS(fdatasync), 242 }, > +{ SCMP_SYS(close), 242 }, > +{ SCMP_SYS(rt_sigpending), 242 }, > +{ SCMP_SYS(rt_sigtimedwait), 242 }, > +{ SCMP_SYS(readv), 242 }, > +{ SCMP_SYS(writev), 242 }, > +{ SCMP_SYS(preadv), 242 }, > +{ SCMP_SYS(pwritev), 242 }, > +{ SCMP_SYS(setrlimit), 242 }, > +{ SCMP_SYS(ftruncate), 242 }, > +{ SCMP_SYS(lstat), 242 }, > +{ SCMP_SYS(pipe), 242 }, > +{ SCMP_SYS(umask), 242 }, > +{ SCMP_SYS(chdir), 242 }, > +{ SCMP_SYS(setitimer), 242 }, > +{ SCMP_SYS(setsid), 242 }, > +{ SCMP_SYS(poll), 242 }, > +{ SCMP_SYS(epoll_create), 242 }, > +{ SCMP_S
Re: [Qemu-devel] [PATCH 1/1] exec: make -mem-path filenames deterministic
Ping? P.S. Please note a typo in the cover letter: when I referred to KSM, I meant "kernel samepage merging", not "kernel shared memory" :-) On Tue, Nov 20, 2012 at 7:10 AM, Peter Feiner wrote: > This patch makes the -mem-path filenames deterministic and allows some > control > over how QEMU mmaps the files. Given this control, QEMU can be used to > implement > exogenous memory management techniques quite simply. Two examples follow: > > 1. Post-copy migration (mmap=shared for origin, save device state, > path in >NFS available on both hosts, mmap=shared for destination). > 2. memory sharing via VM cloning (mmap=shared for master, save device > state, >then mmap=private for clones). > > These new features are exposed via arguments to -mem-path: > > -mem-path \ > path[,mode=temp|open|create][,mmap=private|shared][,nofallback][,anyfs] > > The backing file names are made deterministic by including their RAMBlocks' > offsets, which are unique given a qemu command line. Note that Xen's live > migration relies on RAMBlocks having the same offsets between QEMU > processes > (see xen_read_physmap). The new file name format is > > qemu_back_mem.OFFSET+SIZE.NAME[.RANDOM] > > where SIZE and NAME are added for extra sanity checking when mode="open". > > By default, the old -mem-path behavior is preserved. I.e., mode="temp" is > used, > which adds a random suffix to the deterministic name and unlinks the files. > > Signed-off-by: Peter Feiner > --- > cpu-all.h |5 > exec.c | 60 > ++ > qemu-config.c | 26 +++ > qemu-options.hx | 24 +++-- > vl.c| 43 +- > 5 files changed, 131 insertions(+), 27 deletions(-) > > diff --git a/cpu-all.h b/cpu-all.h > index c9c51b8..a83f71e 100644 > --- a/cpu-all.h > +++ b/cpu-all.h > @@ -500,6 +500,11 @@ typedef struct RAMList { > extern RAMList ram_list; > > extern const char *mem_path; > +extern int mem_temp; > +extern int mem_create; > +extern int mem_fallback; > +extern int mem_shared; > +extern int mem_hugetlbfs; > extern int mem_prealloc; > > /* Flags stored in the low bits of the TLB virtual address. These are > diff --git a/exec.c b/exec.c > index 8435de0..75b146c 100644 > --- a/exec.c > +++ b/exec.c > @@ -2346,7 +2346,7 @@ void qemu_flush_coalesced_mmio_buffer(void) > > #define HUGETLBFS_MAGIC 0x958458f6 > > -static long gethugepagesize(const char *path) > +static long getfspagesize(const char *path, int want_hugetlbfs) > { > struct statfs fs; > int ret; > @@ -2360,7 +2360,7 @@ static long gethugepagesize(const char *path) > return 0; > } > > -if (fs.f_type != HUGETLBFS_MAGIC) > +if (want_hugetlbfs && fs.f_type != HUGETLBFS_MAGIC) > fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path); > > return fs.f_bsize; > @@ -2373,17 +2373,15 @@ static void *file_ram_alloc(RAMBlock *block, > char *filename; > void *area; > int fd; > -#ifdef MAP_POPULATE > int flags; > -#endif > -unsigned long hpagesize; > +unsigned long pagesize; > > -hpagesize = gethugepagesize(path); > -if (!hpagesize) { > +pagesize = getfspagesize(path, mem_hugetlbfs); > +if (!pagesize) { > return NULL; > } > > -if (memory < hpagesize) { > +if (memory < pagesize) { > return NULL; > } > > @@ -2392,20 +2390,35 @@ static void *file_ram_alloc(RAMBlock *block, > return NULL; > } > > -if (asprintf(&filename, "%s/qemu_back_mem.XX", path) == -1) { > +if (asprintf(&filename, "%s/qemu_back_mem.%"PRIx64"+%"PRIx64".%s", > + path, block->offset, memory, block->mr->name) == -1) { > return NULL; > } > > -fd = mkstemp(filename); > +if (mem_temp) { > +if (asprintf(&filename, "%s.XX", filename) == -1) { > +return NULL; > +} > +fd = mkstemp(filename); > +} else { > +flags = O_RDWR | O_CLOEXEC | (mem_create ? O_CREAT : 0); > +fd = open(filename, flags, 0600); > +} > + > if (fd < 0) { > -perror("unable to create backing store for hugepages"); > +fprintf(stderr, "unable to create backing store %s " > +"for guest memory: %s\n", filename, strerror(errno)); > free(filename); > return NULL; > } > -unlink(filename); > + > +if (mem_temp) { > +unlink(filename); > +} > + > free(filename); > > -memory = (memory+hpagesize-1) & ~(hpagesize-1); > +memory = (memory+pagesize-1) & ~(pagesize-1); > > /* > * ftruncate is not supported by hugetlbfs in older > @@ -2416,16 +2429,16 @@ static void *file_ram_alloc(RAMBlock *block, > if (ftruncate(fd, memory)) > perror("ftruncate"); > > +flags = mem_shared ? MAP_SHARED : MAP_PRIVATE; > #ifdef MAP_POPULATE > /* NB: M
Re: [Qemu-devel] [PATCH v2 0/6] block: bdrv_img_create(): propagate errors
On Fri, 2 Nov 2012 11:42:32 -0200 Luiz Capitulino wrote: > On Fri, 02 Nov 2012 14:40:03 +0100 > Kevin Wolf wrote: > > > Am 02.11.2012 14:25, schrieb Luiz Capitulino: > > > On Fri, 19 Oct 2012 11:27:59 -0300 > > > Luiz Capitulino wrote: > > > > > >> By adding error propagation to bdrv_img_create() we improve error > > >> reporting > > >> in qmp_transaction() and simplify qemu-img.c:img_create() a bit. > > >> > > >> Please, check individual patches for details. > > > > > > Kevin, is this in your review queue? > > > > Yes, it is. With KVM Forum and lots of other patch series, no promises > > though. > > Sure, just wanted to know if you were aware about it. Still in your queue? :)
[Qemu-devel] [PATCH 3/3] hw/arm_gic_common: Correct GICC_PMR reset value for newer GICs
The GIC architecture specification for v1 and v2 GICs (as found on the Cortex-A9 and newer) states that the GICC_PMR reset value is zero; this differs from the 0xf0 reset value used on 11MPCore. The NVIC is different again in not having a CPU interface; since we share the GIC code we must force the priority mask field to allow through all interrupts. Signed-off-by: Peter Maydell --- hw/arm_gic_common.c |6 +- hw/armv7m_nvic.c|4 +++- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/hw/arm_gic_common.c b/hw/arm_gic_common.c index 8369309..73ae331 100644 --- a/hw/arm_gic_common.c +++ b/hw/arm_gic_common.c @@ -127,7 +127,11 @@ static void arm_gic_common_reset(DeviceState *dev) int i; memset(s->irq_state, 0, GIC_MAXIRQ * sizeof(gic_irq_state)); for (i = 0 ; i < s->num_cpu; i++) { -s->priority_mask[i] = 0xf0; +if (s->revision == REV_11MPCORE) { +s->priority_mask[i] = 0xf0; +} else { +s->priority_mask[i] = 0; +} s->current_pending[i] = 1023; s->running_irq[i] = 1023; s->running_priority[i] = 0x100; diff --git a/hw/armv7m_nvic.c b/hw/armv7m_nvic.c index f0a2e7b..4963678 100644 --- a/hw/armv7m_nvic.c +++ b/hw/armv7m_nvic.c @@ -455,9 +455,11 @@ static void armv7m_nvic_reset(DeviceState *dev) nc->parent_reset(dev); /* Common GIC reset resets to disabled; the NVIC doesn't have * per-CPU interfaces so mark our non-existent CPU interface - * as enabled by default. + * as enabled by default, and with a priority mask which allows + * all interrupts through. */ s->gic.cpu_enabled[0] = 1; +s->gic.priority_mask[0] = 0x100; /* The NVIC as a whole is always enabled. */ s->gic.enabled = 1; systick_reset(s); -- 1.7.9.5
[Qemu-devel] [PATCH 1.3?] multiboot: fix e801 memory map
The e801 memory sizes in the multiboot structures hard-code the available low memory to 640. However, the value should not include the size of the EBDA. Fill the value in the option ROM, getting the size of low memory from the BIOS. Cc: Alexander Graf Signed-off-by: Paolo Bonzini --- Alex, can you ack this patch for 1.3? pc-bios/multiboot.bin | Bin 1024 -> 1024 bytes pc-bios/optionrom/multiboot.S | 7 +++ 2 files changed, 7 insertions(+) diff --git a/pc-bios/multiboot.bin b/pc-bios/multiboot.bin index f74a6e142fddc054d7f40ab346a108532afac40f..7b3c1745a430ea5e0e15b9aa817d1cbbaa40db14 100644 GIT binary patch delta 81 zcmZqRXyBNj#q7o8KT+38L4+xd@o;KdryK*rZVrajPB|8aZaEhwAcKt|ty|7*VtW(k kS)sHUDQSyY7$$Qv{%2ueV91@!$z;eVv)P(y2P2~%0BkrF9smFU delta 72 zcmZqRXyBNj#capqJWgJByEE-zoLs=f!*~_|qXQE} diff --git a/pc-bios/optionrom/multiboot.S b/pc-bios/optionrom/multiboot.S index f08222a..003bcfb 100644 --- a/pc-bios/optionrom/multiboot.S +++ b/pc-bios/optionrom/multiboot.S @@ -75,6 +75,13 @@ run_multiboot: shr $4, %eax mov %ax, %fs + /* Account for the EBDA in the multiboot structure's e801 +* map. +*/ + int $0x12 + cwtl + movl%eax, %fs:4 + /* ES = mmap_addr */ mov %fs:48, %eax shr $4, %eax -- 1.8.0
[Qemu-devel] [PATCH] multiboot: fix e801 memory map
The e801 memory sizes in the multiboot structures hard-code the available low memory to 640. However, the value should not include the size of the EBDA. Fill the value in the option ROM, getting the size of low memory from the BIOS. Cc: Alexander Graf Signed-off-by: Paolo Bonzini --- pc-bios/multiboot.bin | Bin 1024 -> 1024 bytes pc-bios/optionrom/multiboot.S | 7 +++ 2 files changed, 7 insertions(+) diff --git a/pc-bios/multiboot.bin b/pc-bios/multiboot.bin index f74a6e142fddc054d7f40ab346a108532afac40f..7b3c1745a430ea5e0e15b9aa817d1cbbaa40db14 100644 GIT binary patch delta 81 zcmZqRXyBNj#q7o8KT+38L4+xd@o;KdryK*rZVrajPB|8aZaEhwAcKt|ty|7*VtW(k kS)sHUDQSyY7$$Qv{%2ueV91@!$z;eVv)P(y2P2~%0BkrF9smFU delta 72 zcmZqRXyBNj#capqJWgJByEE-zoLs=f!*~_|qXQE} diff --git a/pc-bios/optionrom/multiboot.S b/pc-bios/optionrom/multiboot.S index f08222a..003bcfb 100644 --- a/pc-bios/optionrom/multiboot.S +++ b/pc-bios/optionrom/multiboot.S @@ -75,6 +75,13 @@ run_multiboot: shr $4, %eax mov %ax, %fs + /* Account for the EBDA in the multiboot structure's e801 +* map. +*/ + int $0x12 + cwtl + movl%eax, %fs:4 + /* ES = mmap_addr */ mov %fs:48, %eax shr $4, %eax -- 1.8.0
[Qemu-devel] [PATCH 0/3] ARM: fix secondary boot GIC init, GIC bugs
The secondary CPU boot code we use on ARM had a couple of places where it was accidentally relying on bugs or implementation dependent behaviour of QEMU's on GIC implementation: * we weren't initialising the GICC_PMR priority mask, which in a correct v1 or v2 GIC is set to mask out all interrupts from reset. This worked on the QEMU GIC because our GIC (a) gets the reset value of PMR wrong on non-11MPCore and (b) is doing an incorrect comparison against the PMR value when delivering interrupts anyway. * no barrier between initialising the GIC and doing a WFI; this is fine for TCG QEMU but could potentially result in the GIC config not being guaranteed to have happened before we hit the WFI when running on real CPU hardware under ARM KVM. This patch series first fixes the secondary CPU boot code bugs, and then corrects our GIC model to match the specs. NB: I don't have a working test setup/images for highbank or exynos4 so those changes are only compile tested, but they are basically the same as the generic boot code changes. Peter Maydell (3): hw/arm_boot, exynos4210, highbank: Fix secondary boot GIC init hw/arm_gic: Fix comparison with priority mask register hw/arm_gic_common: Correct GICC_PMR reset value for newer GICs hw/arm_boot.c | 17 ++--- hw/arm_gic.c|2 +- hw/arm_gic_common.c |6 +- hw/armv7m_nvic.c|4 +++- hw/exynos4210.c | 10 +++--- hw/highbank.c |7 +-- 6 files changed, 35 insertions(+), 11 deletions(-) -- 1.7.9.5
[Qemu-devel] [PATCH 2/3] hw/arm_gic: Fix comparison with priority mask register
The GIC spec states that only interrupts with higher priority than the value in the GICC_PMR priority mask register are passed through to the processor. We were incorrectly allowing through interrupts with a priority equal to the specified value: correct the comparison operation to match the spec. Signed-off-by: Peter Maydell --- hw/arm_gic.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/arm_gic.c b/hw/arm_gic.c index f9e423f..672d539 100644 --- a/hw/arm_gic.c +++ b/hw/arm_gic.c @@ -73,7 +73,7 @@ void gic_update(GICState *s) } } level = 0; -if (best_prio <= s->priority_mask[cpu]) { +if (best_prio < s->priority_mask[cpu]) { s->current_pending[cpu] = best_irq; if (best_prio < s->running_priority[cpu]) { DPRINTF("Raised pending IRQ %d\n", best_irq); -- 1.7.9.5
[Qemu-devel] [PATCH 1/3] hw/arm_boot, exynos4210, highbank: Fix secondary boot GIC init
Fix the code in the secondary CPU boot stubs so that it correctly initialises the GIC rather than relying on bugs or implementation dependent aspects of the QEMU GIC implementation: * set the GIC_PMR.Priority field to all-ones, so that all interrupts are passed through. The default of all-zeroes means all interrupts are masked, and QEMU only booted because of a bug in the priority masking in our GIC implementation. * add a barrier after GIC setup and before WFI to ensure that GIC config is complete before we go into a possible low power state. This isn't needed with the software GIC model but could be required when using KVM and executing this code on the real hardware CPU. Note that of the three secondary stub implementations, only the common generic one needs to support both v6 and v7 DSB encodings; highbank and exynos4210 will always be v7 CPUs. Signed-off-by: Peter Maydell --- hw/arm_boot.c | 17 ++--- hw/exynos4210.c | 10 +++--- hw/highbank.c |7 +-- 3 files changed, 26 insertions(+), 8 deletions(-) diff --git a/hw/arm_boot.c b/hw/arm_boot.c index 92e2cab..ec3b8d5 100644 --- a/hw/arm_boot.c +++ b/hw/arm_boot.c @@ -44,11 +44,17 @@ static uint32_t bootloader[] = { * for an interprocessor interrupt and polling a configurable * location for the kernel secondary CPU entry point. */ +#define DSB_INSN 0xf57ff04f +#define CP15_DSB_INSN 0xee070f9a /* mcr cp15, 0, r0, c7, c10, 4 */ + static uint32_t smpboot[] = { - 0xe59f201c, /* ldr r2, gic_cpu_if */ - 0xe59f001c, /* ldr r0, startaddr */ + 0xe59f2028, /* ldr r2, gic_cpu_if */ + 0xe59f0028, /* ldr r0, startaddr */ 0xe3a01001, /* mov r1, #1 */ - 0xe5821000, /* str r1, [r2] */ + 0xe5821000, /* str r1, [r2] - set GICC_CTLR.Enable */ + 0xe3a010ff, /* mov r1, #0xff */ + 0xe5821004, /* str r1, [r2, 4] - set GIC_PMR.Priority to 0xff */ + DSB_INSN, /* dsb */ 0xe320f003, /* wfi */ 0xe5901000, /* ldr r1, [r0] */ 0xe1110001, /* tst r1, r1 */ @@ -65,6 +71,11 @@ static void default_write_secondary(ARMCPU *cpu, smpboot[ARRAY_SIZE(smpboot) - 1] = info->smp_bootreg_addr; smpboot[ARRAY_SIZE(smpboot) - 2] = info->gic_cpu_if_addr; for (n = 0; n < ARRAY_SIZE(smpboot); n++) { +/* Replace DSB with the pre-v7 DSB if necessary. */ +if (!arm_feature(&cpu->env, ARM_FEATURE_V7) && +smpboot[n] == DSB_INSN) { +smpboot[n] = CP15_DSB_INSN; +} smpboot[n] = tswap32(smpboot[n]); } rom_add_blob_fixed("smpboot", smpboot, sizeof(smpboot), diff --git a/hw/exynos4210.c b/hw/exynos4210.c index 00d4db8..22148cd 100644 --- a/hw/exynos4210.c +++ b/hw/exynos4210.c @@ -80,12 +80,16 @@ void exynos4210_write_secondary(ARMCPU *cpu, { int n; uint32_t smpboot[] = { -0xe59f3024, /* ldr r3, External gic_cpu_if */ -0xe59f2024, /* ldr r2, Internal gic_cpu_if */ -0xe59f0024, /* ldr r0, startaddr */ +0xe59f3034, /* ldr r3, External gic_cpu_if */ +0xe59f2034, /* ldr r2, Internal gic_cpu_if */ +0xe59f0034, /* ldr r0, startaddr */ 0xe3a01001, /* mov r1, #1 */ 0xe5821000, /* str r1, [r2] */ 0xe5831000, /* str r1, [r3] */ +0xe3a010ff, /* mov r1, #0xff */ +0xe5821004, /* str r1, [r2, #4] */ +0xe5831004, /* str r1, [r3, #4] */ +0xf57ff04f, /* dsb */ 0xe320f003, /* wfi */ 0xe5901000, /* ldr r1, [r0] */ 0xe1110001, /* tst r1, r1 */ diff --git a/hw/highbank.c b/hw/highbank.c index afbb005..447e57d 100644 --- a/hw/highbank.c +++ b/hw/highbank.c @@ -44,9 +44,12 @@ static void hb_write_secondary(ARMCPU *cpu, const struct arm_boot_info *info) 0xe21f, /* ands r0, r0, #0x0f */ 0xe3a03040, /* mov r3, #0x40 - jump address is 0x40 + 0x10 * core id */ 0xe0830200, /* add r0, r3, r0, lsl #4 */ -0xe59f2018, /* ldr r2, privbase */ +0xe59f2024, /* ldr r2, privbase */ 0xe3a01001, /* mov r1, #1 */ -0xe5821100, /* str r1, [r2, #256] */ +0xe5821100, /* str r1, [r2, #256] - set GICC_CTLR.Enable */ +0xe3a010ff, /* mov r1, #0xff */ +0xe5821104, /* str r1, [r2, #260] - set GICC_PMR.Priority to 0xff */ +0xf57ff04f, /* dsb */ 0xe320f003, /* wfi */ 0xe5901000, /* ldr r1, [r0] */ 0xe1110001, /* tst r1, r1 */ -- 1.7.9.5
[Qemu-devel] [PATCH v2 05/10] qemu-ga: build_fs_mount_list(): take an Error argument
Signed-off-by: Luiz Capitulino --- o v2: use error_setg() in build_fs_mount_list() qga/commands-posix.c | 30 -- 1 file changed, 16 insertions(+), 14 deletions(-) diff --git a/qga/commands-posix.c b/qga/commands-posix.c index b3a3f26..ff14174 100644 --- a/qga/commands-posix.c +++ b/qga/commands-posix.c @@ -371,7 +371,7 @@ static void free_fs_mount_list(FsMountList *mounts) /* * Walk the mount table and build a list of local file systems */ -static int build_fs_mount_list(FsMountList *mounts) +static void build_fs_mount_list(FsMountList *mounts, Error **err) { struct mntent *ment; FsMount *mount; @@ -380,8 +380,8 @@ static int build_fs_mount_list(FsMountList *mounts) fp = setmntent(mtab, "r"); if (!fp) { -g_warning("fsfreeze: unable to read mtab"); -return -1; +error_setg(err, "failed to open mtab file: '%s'", mtab); +return; } while ((ment = getmntent(fp))) { @@ -405,8 +405,6 @@ static int build_fs_mount_list(FsMountList *mounts) } endmntent(fp); - -return 0; } #endif @@ -433,15 +431,17 @@ int64_t qmp_guest_fsfreeze_freeze(Error **err) int ret = 0, i = 0; FsMountList mounts; struct FsMount *mount; +Error *local_err = NULL; int fd; char err_msg[512]; slog("guest-fsfreeze called"); QTAILQ_INIT(&mounts); -ret = build_fs_mount_list(&mounts); -if (ret < 0) { -return ret; +build_fs_mount_list(&mounts, &local_err); +if (error_is_set(&local_err)) { +error_propagate(err, local_err); +return -1; } /* cannot risk guest agent blocking itself on a write in this state */ @@ -498,12 +498,12 @@ int64_t qmp_guest_fsfreeze_thaw(Error **err) FsMountList mounts; FsMount *mount; int fd, i = 0, logged; +Error *local_err = NULL; QTAILQ_INIT(&mounts); -ret = build_fs_mount_list(&mounts); -if (ret) { -error_set(err, QERR_QGA_COMMAND_FAILED, - "failed to enumerate filesystems"); +build_fs_mount_list(&mounts, &local_err); +if (error_is_set(&local_err)) { +error_propagate(err, local_err); return 0; } @@ -568,6 +568,7 @@ void qmp_guest_fstrim(bool has_minimum, int64_t minimum, Error **err) FsMountList mounts; struct FsMount *mount; int fd; +Error *local_err = NULL; char err_msg[512]; struct fstrim_range r = { .start = 0, @@ -578,8 +579,9 @@ void qmp_guest_fstrim(bool has_minimum, int64_t minimum, Error **err) slog("guest-fstrim called"); QTAILQ_INIT(&mounts); -ret = build_fs_mount_list(&mounts); -if (ret < 0) { +build_fs_mount_list(&mounts, &local_err); +if (error_is_set(&local_err)) { +error_propagate(err, local_err); return; } -- 1.8.0
Re: [Qemu-devel] [PATCH v2 2/2] ivshmem: use irqfd to interrupt among VMs
On Thu, Nov 29, 2012 at 1:34 AM, liu ping fan wrote: > On Thu, Nov 29, 2012 at 12:42 PM, Cam Macdonell wrote: >> On Tue, Nov 27, 2012 at 7:53 PM, liu ping fan wrote: >>> On Wed, Nov 28, 2012 at 5:48 AM, Cam Macdonell wrote: On Sat, Nov 24, 2012 at 8:51 PM, Liu Ping Fan wrote: > From: Liu Ping Fan > > Using irqfd, so we can avoid switch between kernel and user when > VMs interrupts each other. Nice work. Due to a hardware failure, there will be a small delay in me being able to test this. I'll follow up as soon as I can. >>> BTW where can I find the latest guest code for test? >>> I got the guest code from git://gitorious.org/nahanni/guest-code.git. >>> But it seems outdated, after fixing the unlocked_ioctl, and vm-id bits >>> position (the codes conflict with ivshmem spec), it works (I have >>> tested for V1). >> >> Hello, >> >> Which device driver are you using? >> > guest-code/kernel_module/standard/kvm_ivshmem.c The uio driver is the recommended one, however if you want to use the kvm_ivshmem one and have it working, then feel free to continue. I had deleted it form the repo, but some users had based solutions off it, so I added it back. btw, my hardware issue has been resolved, so I'll get to testing your patch soon. Sincerely, Cam > >> Cam >> >>> >>> Regards, >>> Pingfan >>> > > Signed-off-by: Liu Ping Fan > --- > hw/ivshmem.c | 54 > +- > 1 files changed, 53 insertions(+), 1 deletions(-) > > diff --git a/hw/ivshmem.c b/hw/ivshmem.c > index 7c8630c..5709e89 100644 > --- a/hw/ivshmem.c > +++ b/hw/ivshmem.c > @@ -19,6 +19,7 @@ > #include "hw.h" > #include "pc.h" > #include "pci.h" > +#include "msi.h" > #include "msix.h" > #include "kvm.h" > #include "migration.h" > @@ -83,6 +84,7 @@ typedef struct IVShmemState { > uint32_t vectors; > uint32_t features; > EventfdEntry *eventfd_table; > +int *vector_virqs; > > Error *migration_blocker; > > @@ -625,16 +627,62 @@ static int ivshmem_load(QEMUFile* f, void *opaque, > int version_id) > return 0; > } > > +static int ivshmem_vector_use(PCIDevice *dev, unsigned vector, > + MSIMessage msg) > +{ > +IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev); > +int virq; > +EventNotifier *n = &s->peers[s->vm_id].eventfds[vector]; > + > +virq = kvm_irqchip_add_msi_route(kvm_state, msg); > +if (virq >= 0 && kvm_irqchip_add_irqfd_notifier(kvm_state, n, virq) > >= 0) { > +s->vector_virqs[vector] = virq; > +qemu_chr_add_handlers(s->eventfd_chr[vector], NULL, NULL, NULL, > NULL); > +} else if (virq >= 0) { > +kvm_irqchip_release_virq(kvm_state, virq); > +error_report("ivshmem, can not setup irqfd\n"); > +return -1; > +} else { > +error_report("ivshmem, no enough msi route to setup irqfd\n"); > +return -1; > +} > + > +return 0; > +} > + > +static void ivshmem_vector_release(PCIDevice *dev, unsigned vector) > +{ > +IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev); > +EventNotifier *n = &s->peers[s->vm_id].eventfds[vector]; > +int virq = s->vector_virqs[vector]; > + > +if (s->vector_virqs[vector] >= 0) { > +kvm_irqchip_remove_irqfd_notifier(kvm_state, n, virq); > +kvm_irqchip_release_virq(kvm_state, virq); > +s->vector_virqs[vector] = -1; > +} > +} > + > static void ivshmem_write_config(PCIDevice *pci_dev, uint32_t address, > uint32_t val, int len) > { > +bool is_enabled, was_enabled = msi_enabled(pci_dev); > + > pci_default_write_config(pci_dev, address, val, len); > +is_enabled = msi_enabled(pci_dev); > +if (!was_enabled && is_enabled) { > +msix_set_vector_notifiers(pci_dev, ivshmem_vector_use, > +ivshmem_vector_release); > +} else if (was_enabled && !is_enabled) { > +msix_unset_vector_notifiers(pci_dev); > +} > } > > static int pci_ivshmem_init(PCIDevice *dev) > { > IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev); > uint8_t *pci_conf; > +int i; > > if (s->sizearg == NULL) > s->ivshmem_size = 4 << 20; /* 4 MB default */ > @@ -758,7 +806,10 @@ static int pci_ivshmem_init(PCIDevice *dev) > } > > s->dev.config_write = ivshmem_write_config; > - > +s->vector_virqs = g_new0(int, s->vectors); > +for (i = 0; i < s->vectors; i++) { > +s->vector_virqs[i] = -1; > +} > return 0; > } > > @@ -770,6 +821
Re: [Qemu-devel] [PATCH V7 0/6] VMXNET3 paravirtual NIC device implementation
On Fri, Nov 16, 2012 at 2:55 PM, Dmitry Fleytman wrote: > We didn't succeed to find any guide or sample for the > king of tests required (packets transmission). > If someone can provide a pointer to the relevant > information we'll be very grateful. QEMU does not have a NIC test suite. You can use manual testing of guests with vmxnet3 drivers to see if the NIC works as expected. Maybe you could even try running WHQL tests on the VMXNET3 driver inside a Windows guest - I'm not sure, I've never looked into WHQL. The community has been developing the qtest framework which is for device-level testing. For example, you could poke hardware registers and check that the correct interrupt gets raised. However, this framework is being extended to cover busses and hardware as developers decide they want test coverage. PCI support isn't available yet but Anthony Liguori recently worked on qtest PCI code.
Re: [Qemu-devel] [PATCH 15/37] target-i386: set default value of "hypervisor" feature using static property
On Thu, 29 Nov 2012 13:47:37 -0200 Eduardo Habkost wrote: > On Thu, Nov 29, 2012 at 03:56:09PM +0100, Igor Mammedov wrote: > > On Fri, 9 Nov 2012 16:48:07 +0100 > > Eduardo Habkost wrote: > > > > > > > > On 22/10/2012, at 17:03, Igor Mammedov wrote: > > > > > > > Signed-off-by: Igor Mammedov > > > > --- > > > > target-i386/cpu.c | 9 +++-- > > > > 1 file changed, 3 insertions(+), 6 deletions(-) > > > > > > > > diff --git a/target-i386/cpu.c b/target-i386/cpu.c > > > > index 3131945..dc4fcdf 100644 > > > > --- a/target-i386/cpu.c > > > > +++ b/target-i386/cpu.c > > > > @@ -174,7 +174,7 @@ static Property cpu_x86_properties[] = { > > > > DEFINE_PROP_BIT("f-xsave", X86CPU, env.cpuid_ext_features, 26, > > > > false), DEFINE_PROP_BIT("f-osxsave", X86CPU, env.cpuid_ext_features, > > > > 27, false), DEFINE_PROP_BIT("f-avx", X86CPU, env.cpuid_ext_features, > > > > 28, false), > > > > -DEFINE_PROP_BIT("f-hypervisor", X86CPU, env.cpuid_ext_features, > > > > 31, false), > > > > +DEFINE_PROP_BIT("f-hypervisor", X86CPU, env.cpuid_ext_features, > > > > 31, true), DEFINE_PROP_BIT("f-syscall", X86CPU, > > > > env.cpuid_ext2_features, 11, false), DEFINE_PROP_BIT("f-nx", X86CPU, > > > > env.cpuid_ext2_features, 20, false), DEFINE_PROP_BIT("f-xd", X86CPU, > > > > env.cpuid_ext2_features, 20, false), @@ -1307,11 +1307,12 @@ static > > > > int cpu_x86_find_by_name(X86CPU *cpu, x86_def_t *x86_cpu_def, { > > > > unsigned int i; > > > > x86_def_t *def; > > > > +CPUX86State *env = &cpu->env; > > > > > > > > char *s = g_strdup(cpu_model); > > > > char *featurestr, *name = strtok(s, ","); > > > > /* Features to be added*/ > > > > -uint32_t plus_features = 0, plus_ext_features = 0; > > > > +uint32_t plus_features = 0, plus_ext_features = > > > > env->cpuid_ext_features; > > > > > > Moving data back and forth between CPUX86State and x86_def_t makes the > > > initialization ordering confusing (today data is moved from x86_def_t to > > > X86CPU, and never the other way around). > > > > > > As this code is removed in the next patches, I don't mind too much, but > > > maybe it's simpler to implement this change only after the "use static > > > properties for setting cpuid features" patch? > > It won't eliminate moving data back and forth between CPUX86State and > > x86_def_t until CPU sub-classes (i.e. when x86_def_t isn't used anymore), > > if defaults from static properties are used. > > You don't need CPU subclasses to reduce back-and-forth data movement. > We can just reorder code to look like: > > split_cpu_model_string(cpu_model, &model, &features); > fill_cpudef_for_model_somehow(model, &cpudef); somewhere between here and [2], One still needs to enable "hypervisor" feature, and set kvm_features defaults. I guess "hypervisor" could be pushed out into x86_def_t for each model and kvm_cpu_fill_host(). But it would be hard to do so for KVM features (see how pv_eoi is set). see 8d239fbb9b, ded9f32153, a7bb291e5b, 2623e2f876, bd539e14ef in https://github.com/imammedo/qemu/commits/x86-cpu-properties.WIP > cpudef_2_x86_cpu(cpu, &cpudef); As alternative it's possible to merge (OR) ext_features and kvm_features from cpudef into env inside of cpudef_2_x86_cpu() to keep defaults SET by static properties when cpu obj was created. [2] This merge won't disappear completely until cpudefs are converted into classes and X_class_init() will set all defaults for a given cpu model. and common defaults could/should be set by parent X86CPU class_init(), and ideally rewrite should end up in a inheritance tree of subclasses, each adding new features relative to previous model. > cpu_parse_feature_string(cpu, features); > > Instead of what we have today, that is: > > split_cpu_model_string(cpu_model, &model, &features); > fill_cpudef_for_model_somehow(model, &cpudef); > cpu_parse_feature_string(cpu, features); > /* (the 3 steps above are all done inside cpu_x86_find_by_name()) */ > cpudef_2_x86_cpu(cpu, &cpudef); > > You did that on patch 32/37, but I would be interesting to do that > _before_ introducing the whole new CPU properties code, to make review > easier. > > Reordering the steps to make the code less confusing before introducing > the CPU properties makes the code easier to review, that's why I sent > the CPU init cleanup series. And it makes us one step closer to > completely eliminate the X86CPUDefinition struct in the future. > > I actually made the reordering change above using the CPU init series as > base, and it was very easy to do, after the CPU init was cleaned up. You > can see the result at: > https://github.com/ehabkost/qemu-hacks/commit/5256503135e787cb3550092c24dbeb3c5fc5a747 After some playing, I've abandoned approach to change cpu_x86_find_by_name() and opted in favor of not touching it and feature string parsing much, except of doing almost the same split as you. As additional thing, I've moved setting defaults out of it as well. > (it's completely
Re: [Qemu-devel] [PATCH]target-i386:cpu_x86_init():do clean up work
On Wed, Nov 28, 2012 at 02:20:23PM +0800, liguang wrote: > 1.remove unused variable env It's not unused. You are removing the line that sets env->cpu_model_str. > 2.remove redundant error handling > > Signed-off-by: liguang > --- > target-i386/helper.c | 17 - > 1 files changed, 8 insertions(+), 9 deletions(-) > > diff --git a/target-i386/helper.c b/target-i386/helper.c > index bf206cf..5686130 100644 > --- a/target-i386/helper.c > +++ b/target-i386/helper.c > @@ -1243,25 +1243,24 @@ int cpu_x86_get_descr_debug(CPUX86State *env, > unsigned int selector, > X86CPU *cpu_x86_init(const char *cpu_model) > { > X86CPU *cpu; > -CPUX86State *env; > Error *error = NULL; > > cpu = X86_CPU(object_new(TYPE_X86_CPU)); > -env = &cpu->env; > -env->cpu_model_str = cpu_model; Why did you remove this line? > > -if (cpu_x86_register(cpu, cpu_model) < 0) { > -object_delete(OBJECT(cpu)); > -return NULL; > -} > +if (cpu_x86_register(cpu, cpu_model) < 0) > + goto error_out; You are mixing tabs and spaces here, and below: > > x86_cpu_realize(OBJECT(cpu), &error); > if (error) { > error_free(error); > -object_delete(OBJECT(cpu)); > -return NULL; > + goto error_out; (here) > } > + > return cpu; > + > + error_out: > + object_delete(OBJECT(cpu)); > + return NULL; (and here) > } > > #if !defined(CONFIG_USER_ONLY) > -- > 1.7.2.5 > -- Eduardo
Re: [Qemu-devel] [PATCH 1.3?] multiboot: fix e801 memory map
On 29.11.2012, at 18:11, Paolo Bonzini wrote: > The e801 memory sizes in the multiboot structures hard-code the available > low memory to 640. However, the value should not include the size of the > EBDA. Fill the value in the option ROM, getting the size of low memory > from the BIOS. > > Cc: Alexander Graf > Signed-off-by: Paolo Bonzini > --- >Alex, can you ack this patch for 1.3? Do you have a test case (OS that fails for example)? Alex > > pc-bios/multiboot.bin | Bin 1024 -> 1024 bytes > pc-bios/optionrom/multiboot.S | 7 +++ > 2 files changed, 7 insertions(+) > > diff --git a/pc-bios/multiboot.bin b/pc-bios/multiboot.bin > index > f74a6e142fddc054d7f40ab346a108532afac40f..7b3c1745a430ea5e0e15b9aa817d1cbbaa40db14 > 100644 > GIT binary patch > delta 81 > zcmZqRXyBNj#q7o8KT+38L4+xd@o;KdryK*rZVrajPB|8aZaEhwAcKt|ty|7*VtW(k > kS)sHUDQSyY7$$Qv{%2ueV91@!$z;eVv)P(y2P2~%0BkrF9smFU > > delta 72 > zcmZqRXyBNj#capqJW bA;$lVDU(H+3>gJByEE-zoLs=f!*~_|qXQE} > > diff --git a/pc-bios/optionrom/multiboot.S b/pc-bios/optionrom/multiboot.S > index f08222a..003bcfb 100644 > --- a/pc-bios/optionrom/multiboot.S > +++ b/pc-bios/optionrom/multiboot.S > @@ -75,6 +75,13 @@ run_multiboot: > shr $4, %eax > mov %ax, %fs > > + /* Account for the EBDA in the multiboot structure's e801 > + * map. > + */ > + int $0x12 > + cwtl > + movl%eax, %fs:4 > + > /* ES = mmap_addr */ > mov %fs:48, %eax > shr $4, %eax > -- > 1.8.0 >
Re: [Qemu-devel] [PATCH 1.3?] multiboot: fix e801 memory map
On 29.11.2012, at 18:11, Paolo Bonzini wrote: > The e801 memory sizes in the multiboot structures hard-code the available > low memory to 640. However, the value should not include the size of the > EBDA. Fill the value in the option ROM, getting the size of low memory > from the BIOS. The description mentions that instead of hard coding, you fetch the value dynamically via a BIOS call. However, I don't see the code that hard codes it changed. Where is that one? Alex > > Cc: Alexander Graf > Signed-off-by: Paolo Bonzini > --- >Alex, can you ack this patch for 1.3? > > pc-bios/multiboot.bin | Bin 1024 -> 1024 bytes > pc-bios/optionrom/multiboot.S | 7 +++ > 2 files changed, 7 insertions(+) > > diff --git a/pc-bios/multiboot.bin b/pc-bios/multiboot.bin > index > f74a6e142fddc054d7f40ab346a108532afac40f..7b3c1745a430ea5e0e15b9aa817d1cbbaa40db14 > 100644 > GIT binary patch > delta 81 > zcmZqRXyBNj#q7o8KT+38L4+xd@o;KdryK*rZVrajPB|8aZaEhwAcKt|ty|7*VtW(k > kS)sHUDQSyY7$$Qv{%2ueV91@!$z;eVv)P(y2P2~%0BkrF9smFU > > delta 72 > zcmZqRXyBNj#capqJW bA;$lVDU(H+3>gJByEE-zoLs=f!*~_|qXQE} > > diff --git a/pc-bios/optionrom/multiboot.S b/pc-bios/optionrom/multiboot.S > index f08222a..003bcfb 100644 > --- a/pc-bios/optionrom/multiboot.S > +++ b/pc-bios/optionrom/multiboot.S > @@ -75,6 +75,13 @@ run_multiboot: > shr $4, %eax > mov %ax, %fs > > + /* Account for the EBDA in the multiboot structure's e801 > + * map. > + */ > + int $0x12 > + cwtl > + movl%eax, %fs:4 > + > /* ES = mmap_addr */ > mov %fs:48, %eax > shr $4, %eax > -- > 1.8.0 >
Re: [Qemu-devel] tap devices not receiving packets from a bridge
Am 23.11.2012 08:02, schrieb Stefan Hajnoczi: > On Thu, Nov 22, 2012 at 03:29:52PM +0100, Peter Lieven wrote: >> is anyone aware of a problem with the linux network bridge that in very rare >> circumstances stops >> a bridge from sending pakets to a tap device? >> >> My problem occurs in conjunction with vanilla qemu-kvm-1.2.0 and Ubuntu >> Kernel 3.2.0-34.53 >> which is based on Linux 3.2.33. >> >> I was not yet able to reproduce the issue, it happens in really rare cases. >> The symptom is that >> the tap does not have any TX packets. RX is working fine. I see the packets >> coming in at >> the physical interface on the host, but they are not forwarded to the tap >> interface. >> The bridge itself has learnt the mac address of the vServer that is >> connected to the tap interface. >> It does not help to toggle the bridge link status, the tap interface status >> or the interface in the vServer. >> It seems that problem occurs if a tap interface that has previously been >> used, but set to nonpersistent >> is set persistent again and then is by chance assigned to the same vServer >> (=same mac address on same >> bridge) again. Unfortunately it seems not to be reproducible. > > Not sure but this patch from Michael Tsirkin may help - it solves an > issue with persistent tap devices: > > http://patchwork.ozlabs.org/patch/198598/ I have tried that patch (even if I do not use persistant taps), but it doesn't help. What I can say now is that if a VM is not working with a tap - lets say tap10 then it will not work with tap10, even if the vm is shut down. tap10 is set to non-persistant. then the vm is started again, assigned occasionally again tap10 and is not working again. BUT, if I use qemu-kvm-1.0.1 in the second case it is working. I have seen that there is a lot of changes between 1.0.1 and 1.2.0 in the tap code. Maybe there is a bug in the inititialization since then. What also seem to have changed is that vlans have been removed. I was not aware of that, so I still use vlans which are now emulated via hubs. Maybe this is related. Peter > > Stefan >
Re: [Qemu-devel] [PATCH] virtio: limit avail bytes lookahead
"Michael S. Tsirkin" writes: > On Thu, Nov 29, 2012 at 06:34:46PM +0530, Amit Shah wrote: >> On (Wed) 28 Nov 2012 [23:53:08], Michael S. Tsirkin wrote: >> > On Tue, Nov 27, 2012 at 06:25:04PM +0200, Michael S. Tsirkin wrote: >> > > On Thu, Nov 01, 2012 at 06:07:21PM +0200, Michael S. Tsirkin wrote: >> > > > Commit 0d8d7690850eb0cf2b2b60933cf47669a6b6f18f introduced >> > > > a regression in virtio-net performance because it looks >> > > > into the ring aggressively while we really only care >> > > > about a single packet worth of buffers. >> > > > To fix, add parameters limiting lookahead, and >> > > > use in virtqueue_avail_bytes. >> > > > >> > > > Signed-off-by: Michael S. Tsirkin >> > > > Reported-by: Edivaldo de Araujo Pereira >> > > >> > > Ping. >> > > Anthony - going to apply this? >> > >> > virtio rng was added since so naturally build broke. >> > Here's a patch on top to fix it up. I never used virtio rng before so >> > could not test at this hour, but it does fix the build. >> > >> > I'll take a look at how to test it tomorrow but any >> > info would be appreciated. >> > Amit could you pls review? >> >> Looks fine, I assume you will send a v2 of the patch to Anthony? >> >> Amit > > > Anthony volunteered to test this so there will only be v2 if he sees > problems. I need a Signed-off-by so why don't you just go ahead and send a v2 and I'll test that. Regards, Anthony Liguori
Re: [Qemu-devel] [Xen-devel] [PATCH 0/2] QEMU/xen: simplify cpu_ioreq_pio and cpu_ioreq_move
On Thu, Nov 29, 2012 at 01:57:11PM +, Ian Campbell wrote: > On Thu, 2012-11-29 at 13:21 +, Stefano Stabellini wrote: > > > Also I think 4.2.1 need these patches to enable the basic Xen on Xen > > > nested virtualization usage scenario. > > > > I agree. > > Nested virt was a tech preview in 4.2.0, is it really worth > backporting ? > IIRC AMD Nested Virt works on 4.2.0, so if this backport makes Intel Nested Virt working aswell it'd be nice.. -- Pasi
Re: [Qemu-devel] [PATCH V7 0/6] VMXNET3 paravirtual NIC device implementation
Hello Stefan Yes, we tested this device with various guests (both Linux and Windows) and it also passed WHQL tests for Windows Server 2008 R2 OS. The issue is Anthony asked us to write a basic test for the device (a packet transmission test) and we are trying to get some reference code or documentation to do this. Anthony, Paolo, please, advise. Thanks in advance, Dmitry Fleytman On Thu, Nov 29, 2012 at 7:45 PM, Stefan Hajnoczi wrote: > On Fri, Nov 16, 2012 at 2:55 PM, Dmitry Fleytman > wrote: > > We didn't succeed to find any guide or sample for the > > king of tests required (packets transmission). > > If someone can provide a pointer to the relevant > > information we'll be very grateful. > > QEMU does not have a NIC test suite. You can use manual testing of > guests with vmxnet3 drivers to see if the NIC works as expected. > Maybe you could even try running WHQL tests on the VMXNET3 driver > inside a Windows guest - I'm not sure, I've never looked into WHQL. > > The community has been developing the qtest framework which is for > device-level testing. For example, you could poke hardware registers > and check that the correct interrupt gets raised. However, this > framework is being extended to cover busses and hardware as developers > decide they want test coverage. PCI support isn't available yet but > Anthony Liguori recently worked on qtest PCI code. > -- Dmitry Fleytman Technology Expert and Consultant, Daynix Computing Ltd. Cell: +972-54-2819481 Skype: dmitry.fleytman
Re: [Qemu-devel] [PATCH 15/37] target-i386: set default value of "hypervisor" feature using static property
On Thu, Nov 29, 2012 at 07:30:29PM +0100, Igor Mammedov wrote: > On Thu, 29 Nov 2012 13:47:37 -0200 > Eduardo Habkost wrote: > > > On Thu, Nov 29, 2012 at 03:56:09PM +0100, Igor Mammedov wrote: > > > On Fri, 9 Nov 2012 16:48:07 +0100 > > > Eduardo Habkost wrote: > > > > > > > > > > > On 22/10/2012, at 17:03, Igor Mammedov wrote: > > > > > > > > > Signed-off-by: Igor Mammedov > > > > > --- > > > > > target-i386/cpu.c | 9 +++-- > > > > > 1 file changed, 3 insertions(+), 6 deletions(-) > > > > > > > > > > diff --git a/target-i386/cpu.c b/target-i386/cpu.c > > > > > index 3131945..dc4fcdf 100644 > > > > > --- a/target-i386/cpu.c > > > > > +++ b/target-i386/cpu.c > > > > > @@ -174,7 +174,7 @@ static Property cpu_x86_properties[] = { > > > > > DEFINE_PROP_BIT("f-xsave", X86CPU, env.cpuid_ext_features, 26, > > > > > false), DEFINE_PROP_BIT("f-osxsave", X86CPU, env.cpuid_ext_features, > > > > > 27, false), DEFINE_PROP_BIT("f-avx", X86CPU, env.cpuid_ext_features, > > > > > 28, false), > > > > > -DEFINE_PROP_BIT("f-hypervisor", X86CPU, env.cpuid_ext_features, > > > > > 31, false), > > > > > +DEFINE_PROP_BIT("f-hypervisor", X86CPU, env.cpuid_ext_features, > > > > > 31, true), DEFINE_PROP_BIT("f-syscall", X86CPU, > > > > > env.cpuid_ext2_features, 11, false), DEFINE_PROP_BIT("f-nx", X86CPU, > > > > > env.cpuid_ext2_features, 20, false), DEFINE_PROP_BIT("f-xd", X86CPU, > > > > > env.cpuid_ext2_features, 20, false), @@ -1307,11 +1307,12 @@ static > > > > > int cpu_x86_find_by_name(X86CPU *cpu, x86_def_t *x86_cpu_def, { > > > > > unsigned int i; > > > > > x86_def_t *def; > > > > > +CPUX86State *env = &cpu->env; > > > > > > > > > > char *s = g_strdup(cpu_model); > > > > > char *featurestr, *name = strtok(s, ","); > > > > > /* Features to be added*/ > > > > > -uint32_t plus_features = 0, plus_ext_features = 0; > > > > > +uint32_t plus_features = 0, plus_ext_features = > > > > > env->cpuid_ext_features; > > > > > > > > Moving data back and forth between CPUX86State and x86_def_t makes the > > > > initialization ordering confusing (today data is moved from x86_def_t to > > > > X86CPU, and never the other way around). > > > > > > > > As this code is removed in the next patches, I don't mind too much, but > > > > maybe it's simpler to implement this change only after the "use static > > > > properties for setting cpuid features" patch? > > > It won't eliminate moving data back and forth between CPUX86State and > > > x86_def_t until CPU sub-classes (i.e. when x86_def_t isn't used anymore), > > > if defaults from static properties are used. > > > > You don't need CPU subclasses to reduce back-and-forth data movement. > > We can just reorder code to look like: > > > > split_cpu_model_string(cpu_model, &model, &features); > > fill_cpudef_for_model_somehow(model, &cpudef); > somewhere between here and [2], One still needs to enable "hypervisor" > feature, and set kvm_features defaults. I guess "hypervisor" could be pushed > out into x86_def_t for each model and kvm_cpu_fill_host(). But it would be > hard to do so for KVM features (see how pv_eoi is set). If the features are set by default on all models, we can hardcode or set them by default either in the fill_cpudef_for_model_somehow() step (meaning that the models have it implicitly set), or inside cpudef_2_x86_cpu() (meaning that the feature can't be controlled by the CPU feature string in any way). In think both the KVM defaults and the hypervisor flag fit the former case (because the defaults can be changed by the feature string later). In other words: even if the feature is not mentioned in builtin_x86_defs, it can be simply be considered part of the defaults for all CPU models. Currently the defaults are set inside the feature string parsing code (in the pseudo-code I wrote, that means setting them by default on cpu_parse_feature_string()). That works, too, but it's confusing IMO. One thing I forgot to mention about the example code, is that eventually the two steps: fill_cpudef_for_model_somehow(model, &cpudef); cpudef_2_x86_cpu(cpu, &cpudef); will become just a single step: initialize_default_flags_for_model(cpu, model); or, even better, it will become: cpu = create_cpu_object_with_default_flags_already_set(model); When we change the code to do that, we will be able to easily introduce the subclasses, and to easily eliminate the X86CPUDefinnition struct (aka x86_def_t). > > see 8d239fbb9b, ded9f32153, a7bb291e5b, 2623e2f876, bd539e14ef in > https://github.com/imammedo/qemu/commits/x86-cpu-properties.WIP > > > cpudef_2_x86_cpu(cpu, &cpudef); > As alternative it's possible to merge (OR) ext_features and kvm_features from > cpudef into env inside of cpudef_2_x86_cpu() to keep defaults SET by static > properties when cpu obj was created. The feature string can be used to disable features too, not just to enable them, so we can't just merge two bitmaps later. > > [2] > > This merge won't
Re: [Qemu-devel] [PATCH 1.3?] multiboot: fix e801 memory map
- Messaggio originale - > Da: "Alexander Graf" > A: "Paolo Bonzini" > Cc: qemu-devel@nongnu.org > Inviato: Giovedì, 29 novembre 2012 19:51:07 > Oggetto: Re: [PATCH 1.3?] multiboot: fix e801 memory map > > > On 29.11.2012, at 18:11, Paolo Bonzini wrote: > > > The e801 memory sizes in the multiboot structures hard-code the available > > low memory to 640. However, the value should not include the size > > of the EBDA. Fill the value in the option ROM, getting the size of low > > memory from the BIOS. > > The description mentions that instead of hard coding, you fetch the > value dynamically via a BIOS call. However, I don't see the code > that hard codes it changed. Where is that one? It is in hw/multiboot.c: stl_p(bootinfo + MBI_MEM_LOWER, 640); Regarding the testcase, Xen will touch the EBDA without this patch and with http://permalink.gmane.org/gmane.comp.emulators.xen.devel/144855. However, SeaBIOS does not complain. Paolo
Re: [Qemu-devel] [PATCH V5 2/2] Enabling IA32_TSC_ADJUST for KVM guest VM support
Marcelo, The behavior on reset is to return the TSC_AJUST msr value to 0x0. I am currently initializing this emulated msr in kvm_arch_vcpu_init(). >- Behaviour on reset: what is the behaviour on RESET? I am testing the rebase now. I would like to get any needed changes for this initialization into my next patch set version (v6) if possible. Thanks, Will > -Original Message- > From: Marcelo Tosatti [mailto:mtosa...@redhat.com] > Sent: Tuesday, November 27, 2012 5:00 PM > To: Auld, Will > Cc: k...@vger.kernel.org; Dugger, Donald D; Liu, Jinsong; Zhang, > Xiantao; a...@redhat.com; qemu-devel; Gleb > Subject: Re: [PATCH V5 2/2] Enabling IA32_TSC_ADJUST for KVM guest VM > support > > > Hi Will, > > On Tue, Nov 27, 2012 at 11:09:00AM -0800, Will Auld wrote: > > CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported > > > > Basic design is to emulate the MSR by allowing reads and writes to a > > guest vcpu specific location to store the value of the emulated MSR > > while adding the value to the vmcs tsc_offset. In this way the > > IA32_TSC_ADJUST value will be included in all reads to the TSC MSR > > whether through rdmsr or rdtsc. This is of course as long as the "use > > TSC counter offsetting" VM-execution control is enabled as well as > the IA32_TSC_ADJUST control. > > > > However, because hardware will only return the TSC + IA32_TSC_ADJUST > + > > vmsc tsc_offset for a guest process when it does and rdtsc (with the > > correct > > settings) the value of our virtualized IA32_TSC_ADJUST must be stored > > in one of these three locations. The argument against storing it in > > the actual MSR is performance. This is likely to be seldom used while > > the save/restore is required on every transition. IA32_TSC_ADJUST was > > created as a way to solve some issues with writing TSC itself so that > is not an option either. > > The remaining option, defined above as our solution has the problem > of > > returning incorrect vmcs tsc_offset values (unless we intercept and > > fix, not done here) as mentioned above. However, more problematic is > > that storing the data in vmcs tsc_offset will have a different > > semantic effect on the system than does using the actual MSR. This is > illustrated in the following example: > > The hypervisor set the IA32_TSC_ADJUST, then the guest sets it and a > > guest process performs a rdtsc. In this case the guest process will > > get TSC + IA32_TSC_ADJUST_hyperviser + vmsc tsc_offset including > IA32_TSC_ADJUST_guest. > > While the total system semantics changed the semantics as seen by the > > guest do not and hence this will not cause a problem. > > > > Signed-off-by: Will Auld > > --- > > arch/x86/include/asm/cpufeature.h | 1 + > > arch/x86/include/asm/kvm_host.h | 3 +++ > > arch/x86/include/asm/msr-index.h | 1 + > > arch/x86/kvm/cpuid.c | 2 ++ > > arch/x86/kvm/cpuid.h | 8 > > arch/x86/kvm/svm.c| 7 +++ > > arch/x86/kvm/vmx.c| 9 + > > arch/x86/kvm/x86.c| 22 ++ > > 8 files changed, 53 insertions(+) > > > > diff --git a/arch/x86/include/asm/cpufeature.h > > b/arch/x86/include/asm/cpufeature.h > > index 6b7ee5f..e574d81 100644 > > --- a/arch/x86/include/asm/cpufeature.h > > +++ b/arch/x86/include/asm/cpufeature.h > > @@ -199,6 +199,7 @@ > > > > /* Intel-defined CPU features, CPUID level 0x0007:0 (ebx), word > 9 */ > > #define X86_FEATURE_FSGSBASE (9*32+ 0) /* {RD/WR}{FS/GS}BASE > instructions*/ > > +#define X86_FEATURE_TSC_ADJUST (9*32+ 1) /* TSC adjustment MSR 0x3b > > +*/ > > #define X86_FEATURE_BMI1 (9*32+ 3) /* 1st group bit manipulation > extensions */ > > #define X86_FEATURE_HLE(9*32+ 4) /* Hardware Lock Elision > */ > > #define X86_FEATURE_AVX2 (9*32+ 5) /* AVX2 instructions */ > > diff --git a/arch/x86/include/asm/kvm_host.h > > b/arch/x86/include/asm/kvm_host.h index da34027..cf8c7e0 100644 > > --- a/arch/x86/include/asm/kvm_host.h > > +++ b/arch/x86/include/asm/kvm_host.h > > @@ -442,6 +442,8 @@ struct kvm_vcpu_arch { > > u32 virtual_tsc_mult; > > u32 virtual_tsc_khz; > > > > + s64 ia32_tsc_adjust_msr; > > + > > atomic_t nmi_queued; /* unprocessed asynchronous NMIs */ > > unsigned nmi_pending; /* NMI queued after currently running > handler */ > > bool nmi_injected;/* Trying to inject an NMI this entry */ > > @@ -690,6 +692,7 @@ struct kvm_x86_ops { > > bool (*has_wbinvd_exit)(void); > > > > void (*set_tsc_khz)(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool > > scale); > > + u64 (*read_tsc_offset)(struct kvm_vcpu *vcpu); > > void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset); > > > > u64 (*compute_tsc_offset)(struct kvm_vcpu *vcpu, u64 target_tsc); > > diff --git a/arch/x86/include/asm/msr-index.h > > b/arch/x86/include/asm/msr-index.h > > index 957ec87..6486569 100644 > > --- a/arch/x86/include/asm/msr-index.h > > +++ b/arch
Re: [Qemu-devel] [PATCH 1.3?] multiboot: fix e801 memory map
On 29.11.2012, at 20:21, Paolo Bonzini wrote: > > > - Messaggio originale - >> Da: "Alexander Graf" >> A: "Paolo Bonzini" >> Cc: qemu-devel@nongnu.org >> Inviato: Giovedì, 29 novembre 2012 19:51:07 >> Oggetto: Re: [PATCH 1.3?] multiboot: fix e801 memory map >> >> >> On 29.11.2012, at 18:11, Paolo Bonzini wrote: >> >>> The e801 memory sizes in the multiboot structures hard-code the available >>> low memory to 640. However, the value should not include the size >>> of the EBDA. Fill the value in the option ROM, getting the size of low >>> memory from the BIOS. >> >> The description mentions that instead of hard coding, you fetch the >> value dynamically via a BIOS call. However, I don't see the code >> that hard codes it changed. Where is that one? > > It is in hw/multiboot.c: > >stl_p(bootinfo + MBI_MEM_LOWER, 640); You want to remove that one then. > Regarding the testcase, Xen will touch the EBDA without this patch and > with http://permalink.gmane.org/gmane.comp.emulators.xen.devel/144855. > However, SeaBIOS does not complain. I see :). Well, the patch is unintrusive enough for be ok for 1.3 IMHO. Alex
Re: [Qemu-devel] [PATCH 1.3?] multiboot: fix e801 memory map
Il 29/11/2012 20:24, Alexander Graf ha scritto: >> > It is in hw/multiboot.c: >> > >> >stl_p(bootinfo + MBI_MEM_LOWER, 640); > You want to remove that one then. I wasn't sure of what happens if the multiboot option ROM is old. Do we support that to any extent? > > Regarding the testcase, Xen will touch the EBDA without this patch and > > with http://permalink.gmane.org/gmane.comp.emulators.xen.devel/144855. > > However, SeaBIOS does not complain. > > I see :). Well, the patch is unintrusive enough for be ok for 1.3 IMHO. Actually it's relatively easy to see a failure with Xen and a need to PCI passthrough something with a bulky boot ROM; iSCSI or FC cards will do. You'll need both the Xen patch above, and this patch to boot successfully. Otherwise, it will fail to boot. Paolo
Re: [Qemu-devel] [PATCH V5 2/2] Enabling IA32_TSC_ADJUST for KVM guest VM support
On Thu, Nov 29, 2012 at 07:21:28PM +, Auld, Will wrote: > Marcelo, > > The behavior on reset is to return the TSC_AJUST msr value to 0x0. I am > currently initializing this emulated msr in kvm_arch_vcpu_init(). Will, Reset is handled by QEMU. kvm_arch_vcpu_init is only called during vcpu allocation (ie vm creation). See x86_cpu_reset in QEMU's target-i386/cpu.c file. > > >- Behaviour on reset: what is the behaviour on RESET? > > I am testing the rebase now. I would like to get any needed changes for this > initialization into my next patch set version (v6) if possible. For the kernel patch, everything is fine. Apparently for the QEMU patch it is necessary to set value to zero on reset.