Re: [Qemu-devel] [PATCH v2 2/2] ivshmem: use irqfd to interrupt among VMs

2012-11-29 Thread liu ping fan
On Thu, Nov 29, 2012 at 12:42 PM, Cam Macdonell  wrote:
> On Tue, Nov 27, 2012 at 7:53 PM, liu ping fan  wrote:
>> On Wed, Nov 28, 2012 at 5:48 AM, Cam Macdonell  wrote:
>>> On Sat, Nov 24, 2012 at 8:51 PM, Liu Ping Fan  wrote:
 From: Liu Ping Fan 

 Using irqfd, so we can avoid switch between kernel and user when
 VMs interrupts each other.
>>>
>>> Nice work.  Due to a hardware failure, there will be a small delay in
>>> me being able to test this.  I'll follow up as soon as I can.
>>>
>> BTW where can I find the latest guest code for test?
>> I got the guest code from git://gitorious.org/nahanni/guest-code.git.
>> But it seems outdated, after fixing the unlocked_ioctl, and vm-id bits
>> position (the codes conflict with ivshmem spec), it works (I have
>> tested for V1).
>
> Hello,
>
> Which device driver are you using?
>
guest-code/kernel_module/standard/kvm_ivshmem.c

> Cam
>
>>
>> Regards,
>> Pingfan
>>

 Signed-off-by: Liu Ping Fan 
 ---
  hw/ivshmem.c |   54 +-
  1 files changed, 53 insertions(+), 1 deletions(-)

 diff --git a/hw/ivshmem.c b/hw/ivshmem.c
 index 7c8630c..5709e89 100644
 --- a/hw/ivshmem.c
 +++ b/hw/ivshmem.c
 @@ -19,6 +19,7 @@
  #include "hw.h"
  #include "pc.h"
  #include "pci.h"
 +#include "msi.h"
  #include "msix.h"
  #include "kvm.h"
  #include "migration.h"
 @@ -83,6 +84,7 @@ typedef struct IVShmemState {
  uint32_t vectors;
  uint32_t features;
  EventfdEntry *eventfd_table;
 +int *vector_virqs;

  Error *migration_blocker;

 @@ -625,16 +627,62 @@ static int ivshmem_load(QEMUFile* f, void *opaque, 
 int version_id)
  return 0;
  }

 +static int ivshmem_vector_use(PCIDevice *dev, unsigned vector,
 + MSIMessage msg)
 +{
 +IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
 +int virq;
 +EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
 +
 +virq = kvm_irqchip_add_msi_route(kvm_state, msg);
 +if (virq >= 0 && kvm_irqchip_add_irqfd_notifier(kvm_state, n, virq) 
 >= 0) {
 +s->vector_virqs[vector] = virq;
 +qemu_chr_add_handlers(s->eventfd_chr[vector], NULL, NULL, NULL, 
 NULL);
 +} else if (virq >= 0) {
 +kvm_irqchip_release_virq(kvm_state, virq);
 +error_report("ivshmem, can not setup irqfd\n");
 +return -1;
 +} else {
 +error_report("ivshmem, no enough msi route to setup irqfd\n");
 +return -1;
 +}
 +
 +return 0;
 +}
 +
 +static void ivshmem_vector_release(PCIDevice *dev, unsigned vector)
 +{
 +IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
 +EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
 +int virq = s->vector_virqs[vector];
 +
 +if (s->vector_virqs[vector] >= 0) {
 +kvm_irqchip_remove_irqfd_notifier(kvm_state, n, virq);
 +kvm_irqchip_release_virq(kvm_state, virq);
 +s->vector_virqs[vector] = -1;
 +}
 +}
 +
  static void ivshmem_write_config(PCIDevice *pci_dev, uint32_t address,
  uint32_t val, int len)
  {
 +bool is_enabled, was_enabled = msi_enabled(pci_dev);
 +
  pci_default_write_config(pci_dev, address, val, len);
 +is_enabled = msi_enabled(pci_dev);
 +if (!was_enabled && is_enabled) {
 +msix_set_vector_notifiers(pci_dev, ivshmem_vector_use,
 +ivshmem_vector_release);
 +} else if (was_enabled && !is_enabled) {
 +msix_unset_vector_notifiers(pci_dev);
 +}
  }

  static int pci_ivshmem_init(PCIDevice *dev)
  {
  IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
  uint8_t *pci_conf;
 +int i;

  if (s->sizearg == NULL)
  s->ivshmem_size = 4 << 20; /* 4 MB default */
 @@ -758,7 +806,10 @@ static int pci_ivshmem_init(PCIDevice *dev)
  }

  s->dev.config_write = ivshmem_write_config;
 -
 +s->vector_virqs = g_new0(int, s->vectors);
 +for (i = 0; i < s->vectors; i++) {
 +s->vector_virqs[i] = -1;
 +}
  return 0;
  }

 @@ -770,6 +821,7 @@ static void pci_ivshmem_uninit(PCIDevice *dev)
  migrate_del_blocker(s->migration_blocker);
  error_free(s->migration_blocker);
  }
 +g_free(s->vector_virqs);

  memory_region_destroy(&s->ivshmem_mmio);
  memory_region_del_subregion(&s->bar, &s->ivshmem);
 --
 1.7.4.4

>>



Re: [Qemu-devel] [PATCH v7 1/7] qom: apply atomic on object's refcount

2012-11-29 Thread liu ping fan
On Thu, Nov 29, 2012 at 1:16 AM, Richard Henderson  wrote:
> On 11/24/2012 06:02 PM, Liu Ping Fan wrote:
>> -obj->ref--;
>>
>>  /* parent always holds a reference to its children */
>> -if (obj->ref == 0) {
>> +if (__sync_fetch_and_sub(&obj->ref, 1) == 1) {
>
>   if (__sync_sub_and_fetch(&obj->ref, 1) == 0)
>
Applied, thanks
>
> r~



Re: [Qemu-devel] [PATCH V12 6/7] libqblock API implement

2012-11-29 Thread Paolo Bonzini
Il 29/11/2012 07:25, Wenchao Xia ha scritto:
>>
>> How can this be called with libqb_global_data.init_flag == 1?
>>
>   OK, I will remove the if, but keep init_flag which can show
> library is already initialized.

Please remove it if there is no usage elsewhere.

For debugging, qemu_aio_context != NULL is just as good to check for
initialization.

Paolo



Re: [Qemu-devel] [PATCH v4 00/11] virtio: virtio-blk data plane

2012-11-29 Thread Stefan Hajnoczi
On Thu, Nov 22, 2012 at 4:16 PM, Stefan Hajnoczi  wrote:
> This series adds the -device virtio-blk-pci,x-data-plane=on property that
> enables a high performance I/O codepath.  A dedicated thread is used to 
> process
> virtio-blk requests outside the global mutex and without going through the 
> QEMU
> block layer.
>
> Khoa Huynh  reported an increase from 140,000 IOPS to 600,000
> IOPS for a single VM using virtio-blk-data-plane in July:
>
>   http://comments.gmane.org/gmane.comp.emulators.kvm.devel/94580
>
> The virtio-blk-data-plane approach was originally presented at Linux Plumbers
> Conference 2010.  The following slides contain a brief overview:
>
>   
> http://linuxplumbersconf.org/2010/ocw/system/presentations/651/original/Optimizing_the_QEMU_Storage_Stack.pdf
>
> The basic approach is:
> 1. Each virtio-blk device has a thread dedicated to handling ioeventfd
>signalling when the guest kicks the virtqueue.
> 2. Requests are processed without going through the QEMU block layer using
>Linux AIO directly.
> 3. Completion interrupts are injected via irqfd from the dedicated thread.
>
> To try it out:
>
>   qemu -drive if=none,id=drive0,cache=none,aio=native,format=raw,file=...
>-device virtio-blk-pci,drive=drive0,scsi=off,x-data-plane=on
>
> Limitations:
>  * Only format=raw is supported
>  * Live migration is not supported
>  * Block jobs, hot unplug, and other operations fail with -EBUSY
>  * I/O throttling limits are ignored
>  * Only Linux hosts are supported due to Linux AIO usage
>
> The code has reached a stage where I feel it is ready to merge.  Users have
> been playing with it for some time and want the significant performance boost.
>
> We are refactoring QEMU to get rid of the global mutex.  I believe that
> virtio-blk-data-plane can eventually become the default mode of operation.
>
> Instead of waiting for global mutex removal efforts to finish, I want to use
> virtio-blk-data-plane as an example device for AioContext and threaded hw
> dispatch refactoring.  This means:
>
> 1. When the block layer can bind to an AioContext and execute I/O outside the
>global mutex, virtio-blk-data-plane can use this (and gain image format
>support).
>
> 2. When hw dispatch no longer needs the global mutex we can use hw/virtio.c
>again and perhaps run a pool of iothreads instead of dedicated data plane
>threads.
>
> But in the meantime, I have cleaned up the virtio-blk-data-plane code so that
> it can be merged as an experimental feature.
>
> v4:
>  * Add qemu_iovec_concat_iov() [Paolo]
>  * Use QEMUIOVector to copy out virtio_blk_inhdr [Michael, Paolo]
>
> v3:
>  * Don't assume iovec layout [Michael]
>  * Better naming for hostmem.c MemoryListener callbacks [Don]
>  * More vring quarantining if commands are bogus instead of exiting [Blue]
>
> v2:
>  * Use MemoryListener for thread-safe memory mapping [Paolo, Anthony, and 
> everyone else pointed this out ;-)]
>  * Quarantine invalid vring instead of exiting [Blue]
>  * Replace __u16 kernel types with uint16_t [Blue]
>
> Changes from the RFC v9:
>  * Add x-data-plane=on|off option and coexist with regular virtio-blk code
>  * Create thread from BH so it inherits iothread cpusets
>  * Drain requests on vm_stop() so stopped guest does not access image file
>  * Add migration blocker
>  * Add bdrv_in_use() to prevent block jobs and other operations that can 
> interfere
>  * Drop IOQueue request merging for simplicity
>  * Drop ioctl interrupt injection and always use irqfd for simplicity
>  * Major cleanup to split up source files
>  * Rebase from qemu-kvm.git onto qemu.git
>  * Address Michael Tsirkin's review comments
>
> Stefan Hajnoczi (11):
>   raw-posix: add raw_get_aio_fd() for virtio-blk-data-plane
>   configure: add CONFIG_VIRTIO_BLK_DATA_PLANE
>   dataplane: add host memory mapping code
>   dataplane: add virtqueue vring code
>   dataplane: add event loop
>   dataplane: add Linux AIO request queue
>   iov: add iov_discard() to remove data
>   test-iov: add iov_discard() testcase
>   iov: add qemu_iovec_concat_iov()
>   dataplane: add virtio-blk data plane code
>   virtio-blk: add x-data-plane=on|off performance feature
>
>  block.h|   9 +
>  block/raw-posix.c  |  34 
>  configure  |  21 +++
>  hw/Makefile.objs   |   2 +-
>  hw/dataplane/Makefile.objs |   3 +
>  hw/dataplane/event-poll.c  | 109 
>  hw/dataplane/event-poll.h  |  40 +
>  hw/dataplane/hostmem.c | 165 ++
>  hw/dataplane/hostmem.h |  52 ++
>  hw/dataplane/ioq.c | 118 +
>  hw/dataplane/ioq.h |  57 ++
>  hw/dataplane/virtio-blk.c  | 427 
> +
>  hw/dataplane/virtio-blk.h  |  41 +
>  hw/dataplane/vring.c   | 344 
>  hw/dataplane/vring.h   |  62 +++
>  hw/virtio-blk.c|  59 ++-
>  hw/virtio-blk.h|   1 +
>  hw/

[Qemu-devel] [Bug 955379] Re: cmake hangs with qemu-arm-static

2012-11-29 Thread Janne Karhunen
Ok, test case attached (80M tar). This hugely stripped one is not 100% 
reproducer, but do few loops and you will hit it. Instructions for using:
- extract, chroot
- cd /home/abuild/rpmbuild
- su abuild
- export RPM_BUILD_ROOT=$PWD
- rpmbuild -ba SOURCES/libshortcut.spec


** Attachment added: "Tizen chroot"
   
https://bugs.launchpad.net/qemu/+bug/955379/+attachment/3446633/+files/root5-small.tar.bz2

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/955379

Title:
  cmake hangs with qemu-arm-static

Status in QEMU:
  New
Status in Linaro QEMU:
  New
Status in “qemu-linaro” package in Ubuntu:
  Confirmed

Bug description:
  I'm using git commit 3e7ecd976b06f... configured with --target-list
  =arm-linux-user --static in a chroot environment to compile some
  things. I ran into this problem with both pcl and opencv-2.3.1. cmake
  consistently freezes at some point during its execution, though in a
  different spot each time, usually during a step when it's searching
  for some libraries. For instance, pcl most commonly stops after:

  [snip]
  -- Boost version: 1.46.1
  -- Found the following Boost libraries:
  --   system
  --   filesystem
  --   thread
  --   date_time
  -- checking for module 'eigen3'
  --   found eigen3, version 3.0.1

  which is perplexing because it freezes after finding what it wants,
  not during the search. When it does get past that point, it does so
  almost immediately but freezes somewhere else.

  I'm using 64-bit Ubuntu 11.10 with kernel release 3.0.0-16-generic
  with an Intel i5.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/955379/+subscriptions



[Qemu-devel] [Bug 955379] Re: cmake hangs with qemu-arm-static

2012-11-29 Thread Janne Karhunen
Mind you, when you hit the bug it just hangs and cmake test errors are
just to speed up the process of hitting the bug (if cmake just fails you
did not hit the bug). Feel free to try with any qemu variant, they all
hang similarly when bug is hit. I think that root had some suse 1.2 one
inside.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/955379

Title:
  cmake hangs with qemu-arm-static

Status in QEMU:
  New
Status in Linaro QEMU:
  New
Status in “qemu-linaro” package in Ubuntu:
  Confirmed

Bug description:
  I'm using git commit 3e7ecd976b06f... configured with --target-list
  =arm-linux-user --static in a chroot environment to compile some
  things. I ran into this problem with both pcl and opencv-2.3.1. cmake
  consistently freezes at some point during its execution, though in a
  different spot each time, usually during a step when it's searching
  for some libraries. For instance, pcl most commonly stops after:

  [snip]
  -- Boost version: 1.46.1
  -- Found the following Boost libraries:
  --   system
  --   filesystem
  --   thread
  --   date_time
  -- checking for module 'eigen3'
  --   found eigen3, version 3.0.1

  which is perplexing because it freezes after finding what it wants,
  not during the search. When it does get past that point, it does so
  almost immediately but freezes somewhere else.

  I'm using 64-bit Ubuntu 11.10 with kernel release 3.0.0-16-generic
  with an Intel i5.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/955379/+subscriptions



[Qemu-devel] [PATCHv3] rbd block driver fix race between aio completition and aio cancel

2012-11-29 Thread Stefan Priebe
This one fixes a race which qemu had also in iscsi block driver
between cancellation and io completition.

qemu_rbd_aio_cancel was not synchronously waiting for the end of
the command.

To archieve this it introduces a new status flag which uses
-EINPROGRESS.

Changes since last PATCH:
- fixed missing braces
- added vfree for bounce

Signed-off-by: Stefan Priebe 
---
 block/rbd.c |   27 ++-
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 0aaacaf..917c64c 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -77,6 +77,7 @@ typedef struct RBDAIOCB {
 int error;
 struct BDRVRBDState *s;
 int cancelled;
+int status;
 } RBDAIOCB;
 
 typedef struct RADOSCB {
@@ -376,12 +377,6 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
 RBDAIOCB *acb = rcb->acb;
 int64_t r;
 
-if (acb->cancelled) {
-qemu_vfree(acb->bounce);
-qemu_aio_release(acb);
-goto done;
-}
-
 r = rcb->ret;
 
 if (acb->cmd == RBD_AIO_WRITE ||
@@ -406,10 +401,11 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
 acb->ret = r;
 }
 }
+acb->status = 0;
+
 /* Note that acb->bh can be NULL in case where the aio was cancelled */
 acb->bh = qemu_bh_new(rbd_aio_bh_cb, acb);
 qemu_bh_schedule(acb->bh);
-done:
 g_free(rcb);
 }
 
@@ -568,6 +564,13 @@ static void qemu_rbd_aio_cancel(BlockDriverAIOCB *blockacb)
 {
 RBDAIOCB *acb = (RBDAIOCB *) blockacb;
 acb->cancelled = 1;
+
+while (acb->status == -EINPROGRESS) {
+qemu_aio_wait();
+}
+
+qemu_vfree(acb->bounce);
+qemu_aio_release(acb);
 }
 
 static const AIOCBInfo rbd_aiocb_info = {
@@ -640,7 +643,9 @@ static void rbd_aio_bh_cb(void *opaque)
 qemu_bh_delete(acb->bh);
 acb->bh = NULL;
 
-qemu_aio_release(acb);
+if (!acb->cancelled) {
+qemu_aio_release(acb);
+}
 }
 
 static int rbd_aio_discard_wrapper(rbd_image_t image,
@@ -685,6 +690,7 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs,
 acb->s = s;
 acb->cancelled = 0;
 acb->bh = NULL;
+acb->status = -EINPROGRESS;
 
 if (cmd == RBD_AIO_WRITE) {
 qemu_iovec_to_buf(acb->qiov, 0, acb->bounce, qiov->size);
@@ -731,7 +737,10 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState 
*bs,
 failed:
 g_free(rcb);
 s->qemu_aio_count--;
-qemu_aio_release(acb);
+if (!acb->cancelled) {
+qemu_vfree(acb->bounce);
+qemu_aio_release(acb);
+}
 return NULL;
 }
 
-- 
1.7.10.4




[Qemu-devel] [PATCH for-1.3] Build system clean generated source files in tests

2012-11-29 Thread Wenchao Xia
  Currently .c files generated in ./tests are not deleted in make
clean. This introduce trouble that, once we made tests in source
root directory, we can't do a succesfully build for tests in another
out of tree directory, for that some file may miss the step to be
generated. This patch fix it.

Steps to reproduce:
in root source directory:
1 ./configure --target-list=x86_64-softmmu
2 make check-report-unit.xml
3 make distclean && rm -rf *-linux-user *-softmmu
goto another directory:
4 configure
5 make check-report-unit.xml

Signed-off-by: Wenchao Xia 
---
 tests/Makefile |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/tests/Makefile b/tests/Makefile
index b60f0fb..2c90138 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -30,6 +30,7 @@ check-qtest-sparc-y = tests/m48t59-test$(EXESUF)
 check-qtest-sparc64-y = tests/m48t59-test$(EXESUF)
 
 GENERATED_HEADERS += tests/test-qapi-types.h tests/test-qapi-visit.h 
tests/test-qmp-commands.h
+GENERATED_SOURCES += tests/test-qapi-types.c tests/test-qapi-visit.c 
tests/test-qmp-marshal.c
 
 test-obj-y = tests/check-qint.o tests/check-qstring.o tests/check-qdict.o \
tests/check-qlist.o tests/check-qfloat.o tests/check-qjson.o \
-- 
1.7.1





Re: [Qemu-devel] [PATCH 1/3] define name for some fields of dr7

2012-11-29 Thread Peter Maydell
On 29 November 2012 03:32, liguang  wrote:

Your Subject: line is missing the "target-i386:" prefix.
(also, should be "names".)

> Signed-off-by: liguang 
> ---
>  target-i386/cpu.h |   13 +
>  1 files changed, 13 insertions(+), 0 deletions(-)
>
> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> index 90ef1ff..7f292e6 100644
> --- a/target-i386/cpu.h
> +++ b/target-i386/cpu.h
> @@ -558,6 +558,19 @@
>  #define CPU_INTERRUPT_TPR   CPU_INTERRUPT_TGT_INT_3
>
>
> +/* dr7 fields */
> +/* max breakpoints*/
> +#define MAX_BP  4
> +/* Break on instruction execution only */
> +#define BP_INST 0x0
> +/* Break on data writes only */
> +#define BP_DATA_WR  0x1
> +/* Break on I/O reads or writes */

... or undefined for 386, 486 or if CR4 DE flag is clear.

> +#define BP_IO_RW0x10
> +/* Break on data reads or writes but not instruction fetches */
> +#define BP_DATA_RW  0x11

These should all go next to the existing definitions in this
file for some DR7 fields, and they should follow the existing
naming conventions, ie DR7_something. I suggest naming them
DR7_TYPE_INSN, DR7_TYPE_DATA_WR, etc. Could also use a comment
that the DR7_TYPE_ constants are the values for the TYPE field
of DR7, and are what is returned by hw_breakpoint_type().

-- PMM



Re: [Qemu-devel] [PATCH 2/3] use dr7's bit name for breakpoint

2012-11-29 Thread Peter Maydell
On 29 November 2012 03:32, liguang  wrote:
> Signed-off-by: liguang 
> ---
>  target-i386/cpu.h |2 ++
>  target-i386/helper.c  |   24 +++-
>  target-i386/misc_helper.c |6 +++---
>  target-i386/seg_helper.c  |6 +++---
>  4 files changed, 19 insertions(+), 19 deletions(-)
>
> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> index 7f292e6..7ecfe21 100644
> --- a/target-i386/cpu.h
> +++ b/target-i386/cpu.h
> @@ -561,6 +561,8 @@
>  /* dr7 fields */
>  /* max breakpoints*/
>  #define MAX_BP  4
> +/*enable local breakpoint bit 0,2,4,6*/
> +#define BP_LOCAL0x55

This needs a better name, to make it clear that it's not
just a single enable bit but actually a mask of all the
local enable bits. Also needs DR7_ prefix.

You've split these changes up between patches inconsistently;
either have one patch which adds all the constants and
then patches which just use them, or have patches which
both add and use the constants, but don't mix the two.

I'd recommend that each patch should both add and use a
related set of constants, so it's self contained and
easy to review.

-- PMM



Re: [Qemu-devel] [PATCH 3/3] target-i386:refactor check_hw_breakpoints function

2012-11-29 Thread Peter Maydell
On 29 November 2012 03:32, liguang  wrote:
> Signed-off-by: liguang 
> ---
>  target-i386/helper.c |   28 
>  1 files changed, 20 insertions(+), 8 deletions(-)
>
> diff --git a/target-i386/helper.c b/target-i386/helper.c
> index 9ca52a7..a506df0 100644
> --- a/target-i386/helper.c
> +++ b/target-i386/helper.c
> @@ -1012,22 +1012,34 @@ void hw_breakpoint_remove(CPUX86State *env, int index)
>  int check_hw_breakpoints(CPUX86State *env, int force_dr6_update)
>  {
>  target_ulong dr6;
> -int reg, type;
> +int index;
>  int hit_enabled = 0;
>
>  dr6 = env->dr[6] & ~0xf;
> -for (reg = 0; reg < MAX_BP; reg++) {
> -type = hw_breakpoint_type(env->dr[7], reg);
> -if ((type == 0 && env->dr[reg] == env->eip) ||
> -((type & 1) && env->cpu_watchpoint[reg] &&
> - (env->cpu_watchpoint[reg]->flags & BP_WATCHPOINT_HIT))) {
> -dr6 |= 1 << reg;
> -if (hw_breakpoint_enabled(env->dr[7], reg))
> +for (index = 0; index < MAX_BP; index++) {
> +switch (hw_breakpoint_type(env->dr[7], index)){
> +case BP_INST:
> +if (env->dr[index] != env->eip)
> +break;
> +goto enable_hit;
> +case BP_DATA_WR:
> +case BP_DATA_RW:
> +if (!env->cpu_watchpoint[index])
> +break;
> +if (!(env->cpu_watchpoint[index]->flags & BP_WATCHPOINT_HIT))
> +break;
> +enable_hit:
> +dr6 |= 1 << index;
> +if (hw_breakpoint_enabled(env->dr[7], index))
>  hit_enabled = 1;
> +break;
> +case BP_IO_RW:
> +break;
>  }

If you have to resort to gotos and fallthroughs in
a switch statement, I don't think this is actually
clearer than the existing code...

(Also it has coding style issues, use scripts/checkpatch.pl.)

-- PMM



Re: [Qemu-devel] [PATCH 03/10] qemu-ga: qmp_guest_file_*: improve error reporting

2012-11-29 Thread Luiz Capitulino
On Wed, 28 Nov 2012 16:17:43 -0600
mdroth  wrote:

> On Wed, Nov 28, 2012 at 04:26:29PM -0500, Eric Blake wrote:
> > 
> > > > >  if (ferror(fh)) {
> > > > > +error_setg_errno(err, errno, "failed to read file");
> > > > >  slog("guest-file-read failed, handle: %ld", handle);
> > > > > -error_set(err, QERR_QGA_COMMAND_FAILED, "fread()
> > > > > failed");
> > > > >  } else {
> > > > 
> > > > I'm not sure about relying on errno for FILE/f*() functions. C99
> > > > doesn't
> > > > appear to require setting it for implementations
> > 
> > Correct that C99 doesn't require it, but POSIX _does_ require it.
> > 
> > Windows is the biggest system out there where errno is unreliable after
> > failure on FILE operations (but as we DO support mingw, we ARE impacted
> > by the lameness of Microsoft's C library being C89 but not POSIX).
> 
> Well, if it's primarilly an issue with windows, then I think we're okay
> relying on it for anything in qga/commands-posix.c at least, as those
> implementations get swapped out for the implementations in
> qga/commands-win32.o for win32/mingw builds. We'll need to be wary of
> this in the future if we end up sharing more code with the win32 port
> becomes more feature-full though.
> 
> The comment elsewhere about setmntent() might still apply however.

Ok, so can I respin only 05/10?

> > > However, the other FILE functions seem safe to me. I'd be very
> > > surprised
> > > if some implementation doesn't set errno on fopen() failure for
> > > example
> > 
> > Then you probably haven't experimented much with mingw :)

Nope. But I was referring to other unixes...



Re: [Qemu-devel] [PATCH v4 00/11] virtio: virtio-blk data plane

2012-11-29 Thread Paolo Bonzini
Il 29/11/2012 10:18, Stefan Hajnoczi ha scritto:
> Michael, Paolo: Are you happy with v4?

Sure.

> Kevin: Do you want to take this series through the block tree?

Paolo



Re: [Qemu-devel] [PATCH v4 03/11] dataplane: add host memory mapping code

2012-11-29 Thread Michael S. Tsirkin
On Thu, Nov 22, 2012 at 04:16:44PM +0100, Stefan Hajnoczi wrote:
> The data plane thread needs to map guest physical addresses to host
> pointers.  Normally this is done with cpu_physical_memory_map() but the
> function assumes the global mutex is held.  The data plane thread does
> not touch the global mutex and therefore needs a thread-safe memory
> mapping mechanism.
> 
> Hostmem registers a MemoryListener similar to how vhost collects and
> pushes memory region information into the kernel.  There is a
> fine-grained lock on the regions list which is held during lookup and
> when installing a new regions list.
> 
> When the physical memory map changes the MemoryListener callbacks are
> invoked.  They build up a new list of memory regions which is finally
> installed when the list has been completed.
> 
> Note that this approach is not safe across memory hotplug because mapped
> pointers may still be in used across memory unplug.  However, this is
> currently a problem for QEMU in general and needs to be addressed in the
> future.

Sounds like a serious problem.
I'm not sure I understand - do you say this currently a problem for QEMU
virtio? Coul you give an example please?

> 
> Signed-off-by: Stefan Hajnoczi 

> ---
>  hw/dataplane/Makefile.objs |   3 +
>  hw/dataplane/hostmem.c | 165 
> +
>  hw/dataplane/hostmem.h |  52 ++
>  3 files changed, 220 insertions(+)
>  create mode 100644 hw/dataplane/Makefile.objs
>  create mode 100644 hw/dataplane/hostmem.c
>  create mode 100644 hw/dataplane/hostmem.h
> 
> diff --git a/hw/dataplane/Makefile.objs b/hw/dataplane/Makefile.objs
> new file mode 100644
> index 000..8c8dea1
> --- /dev/null
> +++ b/hw/dataplane/Makefile.objs
> @@ -0,0 +1,3 @@
> +ifeq ($(CONFIG_VIRTIO), y)
> +common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o
> +endif
> diff --git a/hw/dataplane/hostmem.c b/hw/dataplane/hostmem.c
> new file mode 100644
> index 000..48aabf0
> --- /dev/null
> +++ b/hw/dataplane/hostmem.c
> @@ -0,0 +1,165 @@
> +/*
> + * Thread-safe guest to host memory mapping
> + *
> + * Copyright 2012 Red Hat, Inc. and/or its affiliates
> + *
> + * Authors:
> + *   Stefan Hajnoczi 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "exec-memory.h"
> +#include "hostmem.h"
> +
> +static int hostmem_lookup_cmp(const void *phys_, const void *region_)
> +{
> +hwaddr phys = *(const hwaddr *)phys_;
> +const HostmemRegion *region = region_;
> +
> +if (phys < region->guest_addr) {
> +return -1;
> +} else if (phys >= region->guest_addr + region->size) {
> +return 1;
> +} else {
> +return 0;
> +}
> +}
> +
> +/**
> + * Map guest physical address to host pointer
> + */
> +void *hostmem_lookup(Hostmem *hostmem, hwaddr phys, hwaddr len, bool 
> is_write)
> +{
> +HostmemRegion *region;
> +void *host_addr = NULL;
> +hwaddr offset_within_region;
> +
> +qemu_mutex_lock(&hostmem->current_regions_lock);
> +region = bsearch(&phys, hostmem->current_regions,
> + hostmem->num_current_regions,
> + sizeof(hostmem->current_regions[0]),
> + hostmem_lookup_cmp);
> +if (!region) {
> +goto out;
> +}
> +if (is_write && region->readonly) {
> +goto out;
> +}
> +offset_within_region = phys - region->guest_addr;
> +if (offset_within_region + len <= region->size) {
> +host_addr = region->host_addr + offset_within_region;
> +}
> +out:
> +qemu_mutex_unlock(&hostmem->current_regions_lock);
> +
> +return host_addr;
> +}
> +
> +/**
> + * Install new regions list
> + */
> +static void hostmem_listener_commit(MemoryListener *listener)
> +{
> +Hostmem *hostmem = container_of(listener, Hostmem, listener);
> +
> +qemu_mutex_lock(&hostmem->current_regions_lock);
> +g_free(hostmem->current_regions);
> +hostmem->current_regions = hostmem->new_regions;
> +hostmem->num_current_regions = hostmem->num_new_regions;
> +qemu_mutex_unlock(&hostmem->current_regions_lock);
> +
> +/* Reset new regions list */
> +hostmem->new_regions = NULL;
> +hostmem->num_new_regions = 0;
> +}
> +
> +/**
> + * Add a MemoryRegionSection to the new regions list
> + */
> +static void hostmem_append_new_region(Hostmem *hostmem,
> +  MemoryRegionSection *section)
> +{
> +void *ram_ptr = memory_region_get_ram_ptr(section->mr);
> +size_t num = hostmem->num_new_regions;
> +size_t new_size = (num + 1) * sizeof(hostmem->new_regions[0]);
> +
> +hostmem->new_regions = g_realloc(hostmem->new_regions, new_size);
> +hostmem->new_regions[num] = (HostmemRegion){
> +.host_addr = ram_ptr + section->offset_within_region,
> +.guest_addr = section->offset_within_address_space,
> +.size = se

Re: [Qemu-devel] [RFC PATCH v2 1/3] virtio-bus : Introduce VirtioBus.

2012-11-29 Thread Konrad Frederic

On 26/11/2012 17:59, Anthony Liguori wrote:

Peter Maydell  writes:


On 26 November 2012 14:33, Anthony Liguori  wrote:

VirtioBusInfo is not a great name.  This is a proxy class that allows
for a device to implement the virtio bus interface.

This could be done as an interface but since nothing else uses
interfaces, I'm okay with something like this.  But the first argument
ought to be an opaque for all methods.

We have at least one user of Interface in the tree IIRC.
I'd much rather we did this the right way -- the only reason
it's the way Fred has coded it is that there's no obvious
body of code in the tree to copy, so we're thrashing around
a bit. If you tell us what the correct set of structs/classes/
interfaces/etc is then we can implement it :-)

I really think extending virtio-bus to a virtio-pci-bus and then
initializing it with a link to the PCI device is the best approach.

It's by far the simpliest approach in terms of coding.

Did I explain it adequately?  To recap:

virtio-bus extends bus-state
  - implements everything that VirtIOBindings implements as methods

virtio-pci-bus extends virtio-bus
  - is constructed with a pointer to a PCIDevice
  - implements the methods necessary to be a virtio bus

I still have trouble with that ^^.

virtio-pci-bus extends virtio-bus so I put something like that :

static void virtio_pci_bus_class_init(ObjectClass *klass, void *data)
{
BusClass *bc = BUS_CLASS(klass);
VirtioBusClass *k = VIRTIO_BUS_CLASS(klass);
/* VirtIOBindings */
k->notify = virtio_pci_notify;
k->save_config = virtio_pci_save_config;
k->load_config = virtio_pci_load_config;
k->save_queue = virtio_pci_save_queue;
k->load_queue = virtio_pci_load_queue;
k->get_features = virtio_pci_get_features;
k->query_guest_notifiers = virtio_pci_query_guest_notifiers;
k->set_host_notifier = virtio_pci_set_host_notifier;
k->set_guest_notifiers = virtio_pci_set_guest_notifiers;
k->vmstate_change = virtio_pci_vmstate_change;
/*
 * TODO : Init and exit function.
 * void (*init)(void *opaque);
 * void (*exit)(void *opaque);
 */
}

static TypeInfo virtio_pci_bus_info = {
.name  = TYPE_VIRTIO_PCI_BUS,
.parent= TYPE_VIRTIO_BUS,
.class_init= virtio_pci_bus_class_init,
};

and I have VirtioDevice which extends DeviceState like that :

static void virtio_device_class_init(ObjectClass *klass, void *data)
{
/* Set the default value here. */
DeviceClass *dc = DEVICE_CLASS(klass);
dc->bus_type = TYPE_VIRTIO_BUS;
dc->init = virtio_device_init;
}

static TypeInfo virtio_device_info = {
.name = TYPE_VIRTIO_DEVICE,
.parent = TYPE_DEVICE,
.instance_size = sizeof(VirtioDeviceState),
/* Abstract the virtio-device */
.class_init = virtio_device_class_init,
.abstract = true,
.class_size = sizeof(VirtioDeviceClass),
};

The problem is that the virtio devices can't be connected to the 
virtio-pci-bus even if it extends virtio-bus because TYPE_VIRTIO_BUS != 
TYPE_VIRTIO_PCI_BUS.


Did I miss something ?

Thanks,

Fred



virtio-device extends device-state
  - implements methods used by virtio-bus

Regards,

Anthony Liguori


-- PMM





Re: [Qemu-devel] [PATCH v4 03/11] dataplane: add host memory mapping code

2012-11-29 Thread Stefan Hajnoczi
On Thu, Nov 29, 2012 at 02:33:11PM +0200, Michael S. Tsirkin wrote:
> On Thu, Nov 22, 2012 at 04:16:44PM +0100, Stefan Hajnoczi wrote:
> > The data plane thread needs to map guest physical addresses to host
> > pointers.  Normally this is done with cpu_physical_memory_map() but the
> > function assumes the global mutex is held.  The data plane thread does
> > not touch the global mutex and therefore needs a thread-safe memory
> > mapping mechanism.
> > 
> > Hostmem registers a MemoryListener similar to how vhost collects and
> > pushes memory region information into the kernel.  There is a
> > fine-grained lock on the regions list which is held during lookup and
> > when installing a new regions list.
> > 
> > When the physical memory map changes the MemoryListener callbacks are
> > invoked.  They build up a new list of memory regions which is finally
> > installed when the list has been completed.
> > 
> > Note that this approach is not safe across memory hotplug because mapped
> > pointers may still be in used across memory unplug.  However, this is
> > currently a problem for QEMU in general and needs to be addressed in the
> > future.
> 
> Sounds like a serious problem.
> I'm not sure I understand - do you say this currently a problem for QEMU
> virtio? Coul you give an example please?

This is a limitation of the memory API but cannot be triggered by users
today since we don't support memory hot unplug.  I'm just explaining
that virtio-blk-data-plane has the same issue as hw/virtio-blk.c or any
other device emulation code here.

Some more detail:

The issue is that hw/virtio-blk.c submits an asynchronous I/O request on
the host with the guest buffer.  Then virtio-blk emulation returns back
to the caller and continues QEMU execution.

It is unsafe to unplug memory while the I/O request is pending since
there's no mechanism (e.g. refcount) to wait until the guest memory is
no longer in use.

This is a known issue.  There's no way to trigger a problem today but we
need to eventually enhance QEMU's memory API to handle this case.

Stefan



Re: [Qemu-devel] [PATCH v4 04/11] dataplane: add virtqueue vring code

2012-11-29 Thread Michael S. Tsirkin
On Thu, Nov 22, 2012 at 04:16:45PM +0100, Stefan Hajnoczi wrote:
> The virtio-blk-data-plane cannot access memory using the usual QEMU
> functions since it executes outside the global mutex and the memory APIs
> are this time are not thread-safe.
> 
> This patch introduces a virtqueue module based on the kernel's vhost
> vring code.  The trick is that we map guest memory ahead of time and
> access it cheaply outside the global mutex.
> 
> Once the hardware emulation code can execute outside the global mutex it
> will be possible to drop this code.
> 
> Signed-off-by: Stefan Hajnoczi 

Is there no way to factor out ommon code and share it with virtio.c?

> ---
>  hw/Makefile.objs   |   2 +-
>  hw/dataplane/Makefile.objs |   2 +-
>  hw/dataplane/vring.c   | 344 
> +
>  hw/dataplane/vring.h   |  62 
>  trace-events   |   3 +
>  5 files changed, 411 insertions(+), 2 deletions(-)
>  create mode 100644 hw/dataplane/vring.c
>  create mode 100644 hw/dataplane/vring.h
> 
> diff --git a/hw/Makefile.objs b/hw/Makefile.objs
> index ea46f81..db87fbf 100644
> --- a/hw/Makefile.objs
> +++ b/hw/Makefile.objs
> @@ -1,4 +1,4 @@
> -common-obj-y = usb/ ide/
> +common-obj-y = usb/ ide/ dataplane/
>  common-obj-y += loader.o
>  common-obj-$(CONFIG_VIRTIO) += virtio-console.o
>  common-obj-$(CONFIG_VIRTIO) += virtio-rng.o
> diff --git a/hw/dataplane/Makefile.objs b/hw/dataplane/Makefile.objs
> index 8c8dea1..34e6d57 100644
> --- a/hw/dataplane/Makefile.objs
> +++ b/hw/dataplane/Makefile.objs
> @@ -1,3 +1,3 @@
>  ifeq ($(CONFIG_VIRTIO), y)
> -common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o
> +common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o vring.o
>  endif
> diff --git a/hw/dataplane/vring.c b/hw/dataplane/vring.c
> new file mode 100644
> index 000..2632fbd
> --- /dev/null
> +++ b/hw/dataplane/vring.c
> @@ -0,0 +1,344 @@
> +/* Copyright 2012 Red Hat, Inc.
> + * Copyright IBM, Corp. 2012
> + *
> + * Based on Linux vhost code:
> + * Copyright (C) 2009 Red Hat, Inc.
> + * Copyright (C) 2006 Rusty Russell IBM Corporation
> + *
> + * Author: Michael S. Tsirkin 
> + * Stefan Hajnoczi 
> + *
> + * Inspiration, some code, and most witty comments come from
> + * Documentation/virtual/lguest/lguest.c, by Rusty Russell
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + */
> +
> +#include "trace.h"
> +#include "hw/dataplane/vring.h"
> +
> +/* Map the guest's vring to host memory */
> +bool vring_setup(Vring *vring, VirtIODevice *vdev, int n)
> +{
> +hwaddr vring_addr = virtio_queue_get_ring_addr(vdev, n);
> +hwaddr vring_size = virtio_queue_get_ring_size(vdev, n);
> +void *vring_ptr;
> +
> +vring->broken = false;
> +
> +hostmem_init(&vring->hostmem);
> +vring_ptr = hostmem_lookup(&vring->hostmem, vring_addr, vring_size, 
> true);
> +if (!vring_ptr) {
> +error_report("Failed to map vring "
> + "addr %#" HWADDR_PRIx " size %" HWADDR_PRIu,
> + vring_addr, vring_size);
> +vring->broken = true;
> +return false;
> +}
> +
> +vring_init(&vring->vr, virtio_queue_get_num(vdev, n), vring_ptr, 4096);
> +
> +vring->last_avail_idx = 0;
> +vring->last_used_idx = 0;
> +vring->signalled_used = 0;
> +vring->signalled_used_valid = false;
> +
> +trace_vring_setup(virtio_queue_get_ring_addr(vdev, n),
> +  vring->vr.desc, vring->vr.avail, vring->vr.used);
> +return true;
> +}
> +
> +void vring_teardown(Vring *vring)
> +{
> +hostmem_finalize(&vring->hostmem);
> +}
> +
> +/* Toggle guest->host notifies */
> +void vring_set_notification(VirtIODevice *vdev, Vring *vring, bool enable)
> +{
> +if (vdev->guest_features & (1 << VIRTIO_RING_F_EVENT_IDX)) {
> +if (enable) {
> +vring_avail_event(&vring->vr) = vring->vr.avail->idx;
> +}
> +} else if (enable) {
> +vring->vr.used->flags &= ~VRING_USED_F_NO_NOTIFY;
> +} else {
> +vring->vr.used->flags |= VRING_USED_F_NO_NOTIFY;
> +}
> +}
> +
> +/* This is stolen from linux/drivers/vhost/vhost.c:vhost_notify() */
> +bool vring_should_notify(VirtIODevice *vdev, Vring *vring)
> +{
> +uint16_t old, new;
> +bool v;
> +/* Flush out used index updates. This is paired
> + * with the barrier that the Guest executes when enabling
> + * interrupts. */
> +smp_mb();
> +
> +if ((vdev->guest_features & VIRTIO_F_NOTIFY_ON_EMPTY) &&
> +unlikely(vring->vr.avail->idx == vring->last_avail_idx)) {
> +return true;
> +}
> +
> +if (!(vdev->guest_features & VIRTIO_RING_F_EVENT_IDX)) {
> +return !(vring->vr.avail->flags & VRING_AVAIL_F_NO_INTERRUPT);
> +}
> +old = vring->signalled_used;
> +v = vring->signalled_used_valid;
> +new = vring->signalled_used = vring->last_used_idx;
> +vring->signalled_used_valid = true;
> +
> +if (unlike

Re: [Qemu-devel] [PATCH v4 03/11] dataplane: add host memory mapping code

2012-11-29 Thread Michael S. Tsirkin
On Thu, Nov 29, 2012 at 01:45:19PM +0100, Stefan Hajnoczi wrote:
> On Thu, Nov 29, 2012 at 02:33:11PM +0200, Michael S. Tsirkin wrote:
> > On Thu, Nov 22, 2012 at 04:16:44PM +0100, Stefan Hajnoczi wrote:
> > > The data plane thread needs to map guest physical addresses to host
> > > pointers.  Normally this is done with cpu_physical_memory_map() but the
> > > function assumes the global mutex is held.  The data plane thread does
> > > not touch the global mutex and therefore needs a thread-safe memory
> > > mapping mechanism.
> > > 
> > > Hostmem registers a MemoryListener similar to how vhost collects and
> > > pushes memory region information into the kernel.  There is a
> > > fine-grained lock on the regions list which is held during lookup and
> > > when installing a new regions list.
> > > 
> > > When the physical memory map changes the MemoryListener callbacks are
> > > invoked.  They build up a new list of memory regions which is finally
> > > installed when the list has been completed.
> > > 
> > > Note that this approach is not safe across memory hotplug because mapped
> > > pointers may still be in used across memory unplug.  However, this is
> > > currently a problem for QEMU in general and needs to be addressed in the
> > > future.
> > 
> > Sounds like a serious problem.
> > I'm not sure I understand - do you say this currently a problem for QEMU
> > virtio? Coul you give an example please?
> 
> This is a limitation of the memory API but cannot be triggered by users
> today since we don't support memory hot unplug.  I'm just explaining
> that virtio-blk-data-plane has the same issue as hw/virtio-blk.c or any
> other device emulation code here.
> 
> Some more detail:
> 
> The issue is that hw/virtio-blk.c submits an asynchronous I/O request on
> the host with the guest buffer.  Then virtio-blk emulation returns back
> to the caller and continues QEMU execution.
> 
> It is unsafe to unplug memory while the I/O request is pending since
> there's no mechanism (e.g. refcount) to wait until the guest memory is
> no longer in use.
> 
> This is a known issue.  There's no way to trigger a problem today but we
> need to eventually enhance QEMU's memory API to handle this case.
> 
> Stefan

For this problem we would simply need to flush outstanding aio
before freeing memory for unplug, no refcount necessary.

This patch however introduces the issue in the frontend
and it looks like there won't be any way to solve
it without changing the API.

-- 
MST



Re: [Qemu-devel] [PATCH v4 03/11] dataplane: add host memory mapping code

2012-11-29 Thread Michael S. Tsirkin
On Thu, Nov 29, 2012 at 02:54:26PM +0200, Michael S. Tsirkin wrote:
> On Thu, Nov 29, 2012 at 01:45:19PM +0100, Stefan Hajnoczi wrote:
> > On Thu, Nov 29, 2012 at 02:33:11PM +0200, Michael S. Tsirkin wrote:
> > > On Thu, Nov 22, 2012 at 04:16:44PM +0100, Stefan Hajnoczi wrote:
> > > > The data plane thread needs to map guest physical addresses to host
> > > > pointers.  Normally this is done with cpu_physical_memory_map() but the
> > > > function assumes the global mutex is held.  The data plane thread does
> > > > not touch the global mutex and therefore needs a thread-safe memory
> > > > mapping mechanism.
> > > > 
> > > > Hostmem registers a MemoryListener similar to how vhost collects and
> > > > pushes memory region information into the kernel.  There is a
> > > > fine-grained lock on the regions list which is held during lookup and
> > > > when installing a new regions list.
> > > > 
> > > > When the physical memory map changes the MemoryListener callbacks are
> > > > invoked.  They build up a new list of memory regions which is finally
> > > > installed when the list has been completed.
> > > > 
> > > > Note that this approach is not safe across memory hotplug because mapped
> > > > pointers may still be in used across memory unplug.  However, this is
> > > > currently a problem for QEMU in general and needs to be addressed in the
> > > > future.
> > > 
> > > Sounds like a serious problem.
> > > I'm not sure I understand - do you say this currently a problem for QEMU
> > > virtio? Coul you give an example please?
> > 
> > This is a limitation of the memory API but cannot be triggered by users
> > today since we don't support memory hot unplug.  I'm just explaining
> > that virtio-blk-data-plane has the same issue as hw/virtio-blk.c or any
> > other device emulation code here.
> > 
> > Some more detail:
> > 
> > The issue is that hw/virtio-blk.c submits an asynchronous I/O request on
> > the host with the guest buffer.  Then virtio-blk emulation returns back
> > to the caller and continues QEMU execution.
> > 
> > It is unsafe to unplug memory while the I/O request is pending since
> > there's no mechanism (e.g. refcount) to wait until the guest memory is
> > no longer in use.
> > 
> > This is a known issue.  There's no way to trigger a problem today but we
> > need to eventually enhance QEMU's memory API to handle this case.
> > 
> > Stefan
> 
> For this problem we would simply need to flush outstanding aio
> before freeing memory for unplug, no refcount necessary.
> 
> This patch however introduces the issue in the frontend
> and it looks like there won't be any way to solve
> it without changing the API.

To clarify, as you say it is not triggerable
so I don't think this is strictly required to address
this at this point though it should not be too hard:
just register callback that flushes the frontend processing.

But if you can't code it at this point, please add
a TODO comment in code.

> -- 
> MST



Re: [Qemu-devel] [PATCH] virtio: limit avail bytes lookahead

2012-11-29 Thread Amit Shah
On (Wed) 28 Nov 2012 [23:53:08], Michael S. Tsirkin wrote:
> On Tue, Nov 27, 2012 at 06:25:04PM +0200, Michael S. Tsirkin wrote:
> > On Thu, Nov 01, 2012 at 06:07:21PM +0200, Michael S. Tsirkin wrote:
> > > Commit 0d8d7690850eb0cf2b2b60933cf47669a6b6f18f introduced
> > > a regression in virtio-net performance because it looks
> > > into the ring aggressively while we really only care
> > > about a single packet worth of buffers.
> > > To fix, add parameters limiting lookahead, and
> > > use in virtqueue_avail_bytes.
> > > 
> > > Signed-off-by: Michael S. Tsirkin 
> > > Reported-by: Edivaldo de Araujo Pereira 
> > 
> > Ping.
> > Anthony - going to apply this?
> 
> virtio rng was added since so naturally build broke.
> Here's a patch on top to fix it up. I never used virtio rng before so
> could not test at this hour, but it does fix the build.
> 
> I'll take a look at how to test it tomorrow but any
> info would be appreciated.
> Amit could you pls review?

Looks fine, I assume you will send a v2 of the patch to Anthony?

Amit



Re: [Qemu-devel] [PATCH 3/3] spice-qemu-char: register interface on post load

2012-11-29 Thread Amit Shah
On (Wed) 28 Nov 2012 [12:59:40], Markus Armbruster wrote:
> Alon Levy  writes:
> 
> >> Alon Levy  writes:
> >> 
> >> > The target has not seen the guest_connected event via
> >> > spice_chr_guest_open or spice_chr_write, and so spice server
> >> > wrongly
> >> > assumes there is no agent active, while the client continues to
> >> > send
> >> > motion events only by the agent channel, which the server ignores.
> >> > The
> >> > net effect is that the mouse is static in the guest.
> >> >
> >> > By registering the interface on post load spice server will pass on
> >> > the
> >> > agent messages fixing the mouse behavior after migration.
> >> >
> >> > RHBZ #725965
> >> >
> >> > Signed-off-by: Alon Levy 
> >> > ---
> >> >  spice-qemu-char.c | 34 ++
> >> >  1 file changed, 34 insertions(+)
> >> >
> >> > diff --git a/spice-qemu-char.c b/spice-qemu-char.c
> >> > index 09aa22d..08b6ba0 100644
> >> > --- a/spice-qemu-char.c
> >> > +++ b/spice-qemu-char.c
> >> > @@ -1,6 +1,7 @@
> >> >  #include "config-host.h"
> >> >  #include "trace.h"
> >> >  #include "ui/qemu-spice.h"
> >> > +#include "hw/virtio-serial.h"
> >> >  #include 
> >> >  #include 
> >> >  
> >> > @@ -25,6 +26,7 @@ typedef struct SpiceCharDriver {
> >> >  uint8_t   *datapos;
> >> >  ssize_t   bufsize, datalen;
> >> >  uint32_t  debug;
> >> > +QEMUTimer *post_load_timer;
> >> >  } SpiceCharDriver;
> >> >  
> >> >  static int vmc_write(SpiceCharDeviceInstance *sin, const uint8_t
> >> >  *buf, int len)
> >> > @@ -156,6 +158,7 @@ static void spice_chr_close(struct
> >> > CharDriverState *chr)
> >> >  
> >> >  printf("%s\n", __func__);
> >> >  vmc_unregister_interface(s);
> >> > +qemu_free_timer(s->post_load_timer);
> >> >  g_free(s);
> >> >  }
> >> >  
> >> > @@ -188,6 +191,33 @@ static void print_allowed_subtypes(void)
> >> >  fprintf(stderr, "\n");
> >> >  }
> >> >  
> >> > +static void spice_chr_post_load_cb(void *opaque)
> >> > +{
> >> > +SpiceCharDriver *s = opaque;
> >> > +
> >> > +vmc_register_interface(s);
> >> > +}
> >> > +
> >> > +static int spice_chr_post_load(void *opaque, int version_id)
> >> > +{
> >> > +SpiceCharDriver *s = opaque;
> >> > +
> >> > +if (s && s->chr && qemu_chr_be_connected(s->chr)) {
> >> > +qemu_mod_timer(s->post_load_timer, 1);
> >> > +}
> >> 
> >> You use the time to delay spice_chr_post_load_cb(), right?  Can you
> >> explain why you have to delay?
> >
> > This is a precaution, it ensures vmc_register_interface is called when
> > the vm is running as opposed to stopped which is the state when
> > spice_chr_post_load is called. In theory vmc_register_interface could
> > lead to an attempt to inject an interrupt into the guest, and we know
> > that fails with kvm irqchip from a previous bug with virtio-serial.
> 
> So your fixed delay of 1ns (I think) on the vm_clock is a roundabout way
> to delay the callback from "post load" until after the run state
> transition to RUN_STATE_RUNNING.  Correct?
> 
> If yes, then a VM change state handler might be cleaner.

Agreed.  Juan and I had discussed this earlier, and there are no hooks
on the target side (i.e. for loadvm) yet.  So this roundabout way is
the best way to solve the problem for now.

(This discussion with Juan was done earlier, when the patch to
virtio-serial-bus pointed to by Alon in the other message was done).

Amit



Re: [Qemu-devel] [RFC PATCH v2 1/3] virtio-bus : Introduce VirtioBus.

2012-11-29 Thread Peter Maydell
On 29 November 2012 12:37, Konrad Frederic  wrote:
> On 26/11/2012 17:59, Anthony Liguori wrote:
>> virtio-pci-bus extends virtio-bus
>>   - is constructed with a pointer to a PCIDevice
>>   - implements the methods necessary to be a virtio bus
>
> I still have trouble with that ^^.

> The problem is that the virtio devices can't be connected to the
> virtio-pci-bus even if it extends virtio-bus because TYPE_VIRTIO_BUS !=
> TYPE_VIRTIO_PCI_BUS.

Conceptually it ought to work I think: if the bus is-a TYPE_VIRTIO_BUS
then we should permit plugging in even if the actual bus object happens
to be an instance of a subclass.

I suspect that qbus_find_recursive should be doing an
object_class_dynamic_cast() to check that the bus is of a suitable
type, rather than the
(strcmp(object_get_typename(OBJECT(bus)), bus_typename) != 0)
which it does at the moment.

-- PMM



Re: [Qemu-devel] [PATCH v4 11/11] virtio-blk: add x-data-plane=on|off performance feature

2012-11-29 Thread Michael S. Tsirkin
On Thu, Nov 22, 2012 at 04:16:52PM +0100, Stefan Hajnoczi wrote:
> The virtio-blk-data-plane feature is easy to integrate into
> hw/virtio-blk.c.  The data plane can be started and stopped similar to
> vhost-net.
> 
> Users can take advantage of the virtio-blk-data-plane feature using the
> new -device virtio-blk-pci,x-data-plane=on property.
> 
> The x-data-plane name was chosen because at this stage the feature is
> experimental and likely to see changes in the future.
> 
> If the VM configuration does not support virtio-blk-data-plane an error
> message is printed.  Although we could fall back to regular virtio-blk,
> I prefer the explicit approach since it prompts the user to fix their
> configuration if they want the performance benefit of
> virtio-blk-data-plane.

Not only that, this affects features exposed to guest so it really can't be
trasparent.

Which reminds me - shouldn't some features be turned off?
For example, VIRTIO_BLK_F_SCSI?

> Limitations:
>  * Only format=raw is supported
>  * Live migration is not supported

This is probably fixable long term?

>  * Block jobs, hot unplug, and other operations fail with -EBUSY

Hmm I don't see code to disable PCU unplug in this patch.
I expected no_hotplug to be set.
Where is it?

>  * I/O throttling limits are ignored

And this?
Meanwhile can we have attempts to set them fail?

>  * Only Linux hosts are supported due to Linux AIO usage
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  hw/virtio-blk.c | 59 
> -
>  hw/virtio-blk.h |  1 +
>  hw/virtio-pci.c |  3 +++
>  3 files changed, 62 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
> index e25cc96..7f6004e 100644
> --- a/hw/virtio-blk.c
> +++ b/hw/virtio-blk.c
> @@ -17,6 +17,8 @@
>  #include "hw/block-common.h"
>  #include "blockdev.h"
>  #include "virtio-blk.h"
> +#include "hw/dataplane/virtio-blk.h"
> +#include "migration.h"
>  #include "scsi-defs.h"
>  #ifdef __linux__
>  # include 
> @@ -33,6 +35,8 @@ typedef struct VirtIOBlock
>  VirtIOBlkConf *blk;
>  unsigned short sector_mask;
>  DeviceState *qdev;
> +VirtIOBlockDataPlane *dataplane;
> +Error *migration_blocker;

Would be nice to move the migration disabling
checking supported formats
and all the rest of it out to dataplane code.

>  } VirtIOBlock;
>  
>  static VirtIOBlock *to_virtio_blk(VirtIODevice *vdev)
> @@ -407,6 +411,14 @@ static void virtio_blk_handle_output(VirtIODevice *vdev, 
> VirtQueue *vq)
>  .num_writes = 0,
>  };
>  
> +/* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
> + * dataplane here instead of waiting for .set_status().
> + */
> +if (s->dataplane) {
> +virtio_blk_data_plane_start(s->dataplane);
> +return;
> +}
> +
>  while ((req = virtio_blk_get_request(s))) {
>  virtio_blk_handle_request(req, &mrb);
>  }
> @@ -446,8 +458,13 @@ static void virtio_blk_dma_restart_cb(void *opaque, int 
> running,
>  {
>  VirtIOBlock *s = opaque;
>  
> -if (!running)
> +if (!running) {
> +/* qemu_drain_all() doesn't know about data plane, quiesce here */
> +if (s->dataplane) {
> +virtio_blk_data_plane_drain(s->dataplane);
> +}
>  return;
> +}
>  
>  if (!s->bh) {
>  s->bh = qemu_bh_new(virtio_blk_dma_restart_bh, s);
> @@ -538,6 +555,10 @@ static void virtio_blk_set_status(VirtIODevice *vdev, 
> uint8_t status)
>  VirtIOBlock *s = to_virtio_blk(vdev);
>  uint32_t features;
>  
> +if (s->dataplane && !(status & VIRTIO_CONFIG_S_DRIVER)) {
> +virtio_blk_data_plane_stop(s->dataplane);
> +}
> +
>  if (!(status & VIRTIO_CONFIG_S_DRIVER_OK)) {
>  return;
>  }
> @@ -604,6 +625,7 @@ VirtIODevice *virtio_blk_init(DeviceState *dev, 
> VirtIOBlkConf *blk)
>  {
>  VirtIOBlock *s;
>  static int virtio_blk_id;
> +int fd = -1;
>  
>  if (!blk->conf.bs) {
>  error_report("drive property not set");
> @@ -619,6 +641,21 @@ VirtIODevice *virtio_blk_init(DeviceState *dev, 
> VirtIOBlkConf *blk)
>  return NULL;
>  }
>  
> +if (blk->data_plane) {
> +if (blk->scsi) {
> +error_report("device is incompatible with x-data-plane, "
> + "use scsi=off");
> +return NULL;
> +}
> +
> +fd = raw_get_aio_fd(blk->conf.bs);
> +if (fd < 0) {
> +error_report("drive is incompatible with x-data-plane, "
> + "use format=raw,cache=none,aio=native");
> +return NULL;
> +}
> +}
> +
>  s = (VirtIOBlock *)virtio_common_init("virtio-blk", VIRTIO_ID_BLOCK,
>sizeof(struct virtio_blk_config),
>sizeof(VirtIOBlock));
> @@ -636,6 +673,17 @@ VirtIODevice *virtio_blk_init(DeviceState *dev, 
> VirtIOBlkConf *blk)
>  
>  s->vq = vi

[Qemu-devel] [PULL for-1.3 0/1] QMP queue

2012-11-29 Thread Luiz Capitulino
This is an important fix as it fixes a 32-bit breakage.

The changes (since e9bff10f8db94912b1b0e6e2e3394cae02faf614) are available
in the following repository:

git://repo.or.cz/qemu/qmp-unstable.git queue/qmp

Bruce Rogers (1):
  qapi: fix qapi_dealloc_type_size parameter type

 qapi/qapi-dealloc-visitor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
1.8.0



[Qemu-devel] [PULL] qapi: fix qapi_dealloc_type_size parameter type

2012-11-29 Thread Luiz Capitulino
From: Bruce Rogers 

The second parameter to qapi_dealloc_type_size should be a uint64_t *,
not a size_t *. This was causing our 32 bit x86 build to fail, since
warnings are treated as errors.

Signed-off-by: Bruce Rogers 
Reviewed-by: Michael Roth 
Reviewed-by: Stefan Weil 
Signed-off-by: Luiz Capitulino 
---
 qapi/qapi-dealloc-visitor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qapi/qapi-dealloc-visitor.c b/qapi/qapi-dealloc-visitor.c
index a07b171..75214e7 100644
--- a/qapi/qapi-dealloc-visitor.c
+++ b/qapi/qapi-dealloc-visitor.c
@@ -132,7 +132,7 @@ static void qapi_dealloc_type_number(Visitor *v, double 
*obj, const char *name,
 {
 }
 
-static void qapi_dealloc_type_size(Visitor *v, size_t *obj, const char *name,
+static void qapi_dealloc_type_size(Visitor *v, uint64_t *obj, const char *name,
Error **errp)
 {
 }
-- 
1.8.0




[Qemu-devel] [PATCH 2/3] usb: fail usbdevice_create() when there is no USB bus

2012-11-29 Thread Gerd Hoffmann
From: Stefan Hajnoczi 

Report an error instead of segfaulting when attaching a USB device to a
machine with no USB busses:

  $ qemu-system-arm -machine vexpress-a9 \
  -sd Fedora-17-armhfp-vexpress-mmcblk0.img \
  -kernel vmlinuz-3.4.2-3.fc17.armv7hl \
  -initrd initramfs-3.4.2-3.fc17.armv7hl.img \
  -usbdevice disk:format=raw:test.img

Note that the vexpress-a9 machine does not have a USB host controller.

Reported-by: David Abdurachmanov 
Signed-off-by: Stefan Hajnoczi 
Signed-off-by: Gerd Hoffmann 
---
 hw/usb/bus.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/hw/usb/bus.c b/hw/usb/bus.c
index 99aac7a..55d0edd 100644
--- a/hw/usb/bus.c
+++ b/hw/usb/bus.c
@@ -590,6 +590,13 @@ USBDevice *usbdevice_create(const char *cmdline)
 return NULL;
 }
 
+if (!bus) {
+error_report("Error: no usb bus to attach usbdevice %s, "
+ "please try -machine usb=on and check that "
+ "the machine model supports USB", driver);
+return NULL;
+}
+
 if (!f->usbdevice_init) {
 if (*params) {
 error_report("usbdevice %s accepts no params", driver);
-- 
1.7.1




[Qemu-devel] [PATCH 1/3] usb: tag usb host adapters as not hotpluggable.

2012-11-29 Thread Gerd Hoffmann
Hotplugging them simply doesn't work, so tag them accordingly to
avoid users trying and then crashing qemu.

For xhci there is nothing fundamental which prevents hotplug from
working, we'll "only" need a exit() function which cleans up
everything properly.  That isn't for 1.3 though.

For ehci+uhci+ohci hotplug can't be supported until qemu gains the
capability to hotplug multifunction pci devices.

https://bugzilla.redhat.com/show_bug.cgi?id=879096

Signed-off-by: Gerd Hoffmann 
---
 hw/usb/hcd-ehci-pci.c |1 +
 hw/usb/hcd-ohci.c |1 +
 hw/usb/hcd-uhci.c |1 +
 hw/usb/hcd-xhci.c |1 +
 4 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/hw/usb/hcd-ehci-pci.c b/hw/usb/hcd-ehci-pci.c
index 5887eab..41dbb53 100644
--- a/hw/usb/hcd-ehci-pci.c
+++ b/hw/usb/hcd-ehci-pci.c
@@ -123,6 +123,7 @@ static void ehci_class_init(ObjectClass *klass, void *data)
 k->revision = i->revision;
 k->class_id = PCI_CLASS_SERIAL_USB;
 k->config_write = usb_ehci_pci_write_config;
+k->no_hotplug = 1;
 dc->vmsd = &vmstate_ehci_pci;
 dc->props = ehci_pci_properties;
 }
diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c
index 64de906..e16a2ec 100644
--- a/hw/usb/hcd-ohci.c
+++ b/hw/usb/hcd-ohci.c
@@ -1882,6 +1882,7 @@ static void ohci_pci_class_init(ObjectClass *klass, void 
*data)
 k->vendor_id = PCI_VENDOR_ID_APPLE;
 k->device_id = PCI_DEVICE_ID_APPLE_IPID_USB;
 k->class_id = PCI_CLASS_SERIAL_USB;
+k->no_hotplug = 1;
 dc->desc = "Apple USB Controller";
 dc->props = ohci_pci_properties;
 }
diff --git a/hw/usb/hcd-uhci.c b/hw/usb/hcd-uhci.c
index 8e47803..d053791 100644
--- a/hw/usb/hcd-uhci.c
+++ b/hw/usb/hcd-uhci.c
@@ -1327,6 +1327,7 @@ static void uhci_class_init(ObjectClass *klass, void 
*data)
 k->device_id = info->device_id;
 k->revision  = info->revision;
 k->class_id  = PCI_CLASS_SERIAL_USB;
+k->no_hotplug = 1;
 dc->vmsd = &vmstate_uhci;
 dc->props = uhci_properties;
 u->info = *info;
diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
index 8ef4b07..efb509e 100644
--- a/hw/usb/hcd-xhci.c
+++ b/hw/usb/hcd-xhci.c
@@ -3167,6 +3167,7 @@ static void xhci_class_init(ObjectClass *klass, void 
*data)
 k->class_id = PCI_CLASS_SERIAL_USB;
 k->revision = 0x03;
 k->is_express   = 1;
+k->no_hotplug   = 1;
 }
 
 static TypeInfo xhci_info = {
-- 
1.7.1




[Qemu-devel] [PATCH 1/1] qxl: reload memslots after migration, when qxl is in UNDEFINED mode

2012-11-29 Thread Gerd Hoffmann
From: Yonit Halperin 

The devram memslot stays active when qxl enters UNDEFINED mode (i.e, no
primary surface). If migration has occurred while the device is in
UNDEFINED stae, the memslots have to be reloaded at the destination.

Fixes rhbz#874574

Signed-off-by: Yonit Halperin 
Signed-off-by: Gerd Hoffmann 
---
 hw/qxl.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/hw/qxl.c b/hw/qxl.c
index 1bc2d32..96887c4 100644
--- a/hw/qxl.c
+++ b/hw/qxl.c
@@ -2146,6 +2146,7 @@ static int qxl_post_load(void *opaque, int version)
 
 switch (newmode) {
 case QXL_MODE_UNDEFINED:
+qxl_create_memslots(d);
 break;
 case QXL_MODE_VGA:
 qxl_create_memslots(d);
-- 
1.7.1




[Qemu-devel] [PULL for-1.3 0/1] spice patch queue.

2012-11-29 Thread Gerd Hoffmann
  Hi,

Pretty short this time, with a single lonely bugfix.

please pull,
  Gerd

The following changes since commit e9bff10f8db94912b1b0e6e2e3394cae02faf614:

  event notifier: Fix setup for win32 (2012-11-28 13:33:01 -0600)

are available in the git repository at:
  git://anongit.freedesktop.org/spice/qemu spice.v65

Yonit Halperin (1):
  qxl: reload memslots after migration, when qxl is in UNDEFINED mode

 hw/qxl.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)



Re: [Qemu-devel] [PATCH 0/2] QEMU/xen: simplify cpu_ioreq_pio and cpu_ioreq_move

2012-11-29 Thread Stefano Stabellini
On Thu, 29 Nov 2012, Xu, Dongxiao wrote:
> > -Original Message-
> > From: Stefano Stabellini [mailto:stefano.stabell...@eu.citrix.com]
> > Sent: Tuesday, September 18, 2012 6:24 PM
> > To: Xu, Dongxiao
> > Cc: Stefano Stabellini; xen-de...@lists.xensource.com; Ian Jackson;
> > qemu-devel@nongnu.org; Keir (Xen.org)
> > Subject: RE: [PATCH 0/2] QEMU/xen: simplify cpu_ioreq_pio and
> > cpu_ioreq_move
> > 
> > On Tue, 18 Sep 2012, Xu, Dongxiao wrote:
> > > Hi Stefano,
> > >
> > > Is these patches merged with Xen 4.2? I didn't see them in the upstream.
> > > The uint/int fix is critical to fix the nested guest boot issue.
> > 
> > They are not. Ian decided that he wanted to merge a different version of 
> > them.
> 
> Hi Stefano and Ian,
> 
> What's the status of the two patches? I still didn't see them merged...

Ian, do you have any updates?


> Also I think 4.2.1 need these patches to enable the basic Xen on Xen nested 
> virtualization usage scenario.

I agree.



Re: [Qemu-devel] [PATCH] qapi: fix qapi_dealloc_type_size parameter type

2012-11-29 Thread Stefan Hajnoczi
On Tue, Nov 27, 2012 at 01:11:25PM -0700, Bruce Rogers wrote:
> The second parameter to qapi_dealloc_type_size should be a uint64_t *,
> not a size_t *. This was causing our 32 bit x86 build to fail, since
> warnings are treated as errors.
> 
> Signed-off-by: Bruce Rogers 
> ---
>  qapi/qapi-dealloc-visitor.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)

Sorry for the broken build.

Reviewed-by: Stefan Hajnoczi 



Re: [Qemu-devel] [PATCH v4 10/11] dataplane: add virtio-blk data plane code

2012-11-29 Thread Michael S. Tsirkin
On Thu, Nov 22, 2012 at 04:16:51PM +0100, Stefan Hajnoczi wrote:
> virtio-blk-data-plane is a subset implementation of virtio-blk.  It only
> handles read, write, and flush requests.  It does this using a dedicated
> thread that executes an epoll(2)-based event loop and processes I/O
> using Linux AIO.
> 
> This approach performs very well but can be used for raw image files
> only.  The number of IOPS achieved has been reported to be several times
> higher than the existing virtio-blk implementation.
> 
> Eventually it should be possible to unify virtio-blk-data-plane with the
> main body of QEMU code once the block layer and hardware emulation is
> able to run outside the global mutex.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  hw/dataplane/Makefile.objs |   2 +-
>  hw/dataplane/virtio-blk.c  | 427 
> +
>  hw/dataplane/virtio-blk.h  |  41 +
>  trace-events   |   6 +
>  4 files changed, 475 insertions(+), 1 deletion(-)
>  create mode 100644 hw/dataplane/virtio-blk.c
>  create mode 100644 hw/dataplane/virtio-blk.h
> 
> diff --git a/hw/dataplane/Makefile.objs b/hw/dataplane/Makefile.objs
> index abd408f..682aa9e 100644
> --- a/hw/dataplane/Makefile.objs
> +++ b/hw/dataplane/Makefile.objs
> @@ -1,3 +1,3 @@
>  ifeq ($(CONFIG_VIRTIO), y)
> -common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o vring.o event-poll.o 
> ioq.o
> +common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o vring.o event-poll.o 
> ioq.o virtio-blk.o
>  endif
> diff --git a/hw/dataplane/virtio-blk.c b/hw/dataplane/virtio-blk.c
> new file mode 100644
> index 000..9b29969
> --- /dev/null
> +++ b/hw/dataplane/virtio-blk.c
> @@ -0,0 +1,427 @@
> +/*
> + * Dedicated thread for virtio-blk I/O processing
> + *
> + * Copyright 2012 IBM, Corp.
> + * Copyright 2012 Red Hat, Inc. and/or its affiliates
> + *
> + * Authors:
> + *   Stefan Hajnoczi 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "trace.h"
> +#include "iov.h"
> +#include "event-poll.h"
> +#include "qemu-thread.h"
> +#include "vring.h"
> +#include "ioq.h"
> +#include "hw/virtio-blk.h"
> +#include "hw/dataplane/virtio-blk.h"
> +
> +enum {
> +SEG_MAX = 126,  /* maximum number of I/O segments */
> +VRING_MAX = SEG_MAX + 2,/* maximum number of vring descriptors */
> +REQ_MAX = VRING_MAX,/* maximum number of requests in the 
> vring,
> + * is VRING_MAX / 2 with traditional and
> + * VRING_MAX with indirect descriptors */
> +};
> +
> +typedef struct {
> +struct iocb iocb;   /* Linux AIO control block */
> +QEMUIOVector *inhdr;/* iovecs for virtio_blk_inhdr */
> +unsigned int head;  /* vring descriptor index */
> +} VirtIOBlockRequest;
> +
> +struct VirtIOBlockDataPlane {
> +bool started;
> +QEMUBH *start_bh;
> +QemuThread thread;
> +
> +int fd; /* image file descriptor */
> +
> +VirtIODevice *vdev;
> +Vring vring;/* virtqueue vring */
> +EventNotifier *guest_notifier;  /* irq */
> +
> +EventPoll event_poll;   /* event poller */
> +EventHandler io_handler;/* Linux AIO completion handler */
> +EventHandler notify_handler;/* virtqueue notify handler */
> +
> +IOQueue ioqueue;/* Linux AIO queue (should really be per
> +   dataplane thread) */
> +VirtIOBlockRequest requests[REQ_MAX]; /* pool of requests, managed by the
> + queue */
> +
> +unsigned int num_reqs;
> +QemuMutex num_reqs_lock;
> +QemuCond no_reqs_cond;
> +};
> +
> +/* Raise an interrupt to signal guest, if necessary */
> +static void notify_guest(VirtIOBlockDataPlane *s)
> +{
> +if (!vring_should_notify(s->vdev, &s->vring)) {
> +return;
> +}
> +
> +event_notifier_set(s->guest_notifier);
> +}
> +
> +static void complete_request(struct iocb *iocb, ssize_t ret, void *opaque)
> +{
> +VirtIOBlockDataPlane *s = opaque;
> +VirtIOBlockRequest *req = container_of(iocb, VirtIOBlockRequest, iocb);
> +struct virtio_blk_inhdr hdr;
> +int len;
> +
> +if (likely(ret >= 0)) {
> +hdr.status = VIRTIO_BLK_S_OK;
> +len = ret;
> +} else {
> +hdr.status = VIRTIO_BLK_S_IOERR;
> +len = 0;
> +}
> +
> +trace_virtio_blk_data_plane_complete_request(s, req->head, ret);
> +
> +qemu_iovec_from_buf(req->inhdr, 0, &hdr, sizeof(hdr));
> +qemu_iovec_destroy(req->inhdr);
> +g_slice_free(QEMUIOVector, req->inhdr);
> +
> +/* According to the virtio specification len should be the number of 
> bytes
> + * written to, but for virtio-blk it seems to be the number of bytes
> + * transferred plus the sta

[Qemu-devel] [PATCH 3/3] ehci-sysbus: Attach DMA context.

2012-11-29 Thread Gerd Hoffmann
From: Peter Crosthwaite 

This was left as NULL on the initial merge due to debate on the mailing list on
how to handle DMA contexts for sysbus devices. Patch
9e11908f12f92e31ea94dc2a4c962c836cba9f2a was later merged to fix OHCI. This is 
the,
equivalent fix for sysbus EHCI.

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Gerd Hoffmann 
---
 hw/usb/hcd-ehci-sysbus.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/hw/usb/hcd-ehci-sysbus.c b/hw/usb/hcd-ehci-sysbus.c
index 1584079..803df92 100644
--- a/hw/usb/hcd-ehci-sysbus.c
+++ b/hw/usb/hcd-ehci-sysbus.c
@@ -45,6 +45,7 @@ static int usb_ehci_sysbus_initfn(SysBusDevice *dev)
 
 s->capsbase = 0x100;
 s->opregbase = 0x140;
+s->dma = &dma_context_memory;
 
 usb_ehci_initfn(s, DEVICE(dev));
 sysbus_init_irq(dev, &s->irq);
-- 
1.7.1




Re: [Qemu-devel] [PATCH v4 04/11] dataplane: add virtqueue vring code

2012-11-29 Thread Michael S. Tsirkin
On Thu, Nov 22, 2012 at 04:16:45PM +0100, Stefan Hajnoczi wrote:
> The virtio-blk-data-plane cannot access memory using the usual QEMU
> functions since it executes outside the global mutex and the memory APIs
> are this time are not thread-safe.
> 
> This patch introduces a virtqueue module based on the kernel's vhost
> vring code.  The trick is that we map guest memory ahead of time and
> access it cheaply outside the global mutex.
> 
> Once the hardware emulation code can execute outside the global mutex it
> will be possible to drop this code.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  hw/Makefile.objs   |   2 +-
>  hw/dataplane/Makefile.objs |   2 +-
>  hw/dataplane/vring.c   | 344 
> +
>  hw/dataplane/vring.h   |  62 
>  trace-events   |   3 +
>  5 files changed, 411 insertions(+), 2 deletions(-)
>  create mode 100644 hw/dataplane/vring.c
>  create mode 100644 hw/dataplane/vring.h
> 
> diff --git a/hw/Makefile.objs b/hw/Makefile.objs
> index ea46f81..db87fbf 100644
> --- a/hw/Makefile.objs
> +++ b/hw/Makefile.objs
> @@ -1,4 +1,4 @@
> -common-obj-y = usb/ ide/
> +common-obj-y = usb/ ide/ dataplane/
>  common-obj-y += loader.o
>  common-obj-$(CONFIG_VIRTIO) += virtio-console.o
>  common-obj-$(CONFIG_VIRTIO) += virtio-rng.o
> diff --git a/hw/dataplane/Makefile.objs b/hw/dataplane/Makefile.objs
> index 8c8dea1..34e6d57 100644
> --- a/hw/dataplane/Makefile.objs
> +++ b/hw/dataplane/Makefile.objs
> @@ -1,3 +1,3 @@
>  ifeq ($(CONFIG_VIRTIO), y)
> -common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o
> +common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o vring.o
>  endif
> diff --git a/hw/dataplane/vring.c b/hw/dataplane/vring.c
> new file mode 100644
> index 000..2632fbd
> --- /dev/null
> +++ b/hw/dataplane/vring.c
> @@ -0,0 +1,344 @@
> +/* Copyright 2012 Red Hat, Inc.
> + * Copyright IBM, Corp. 2012
> + *
> + * Based on Linux vhost code:
> + * Copyright (C) 2009 Red Hat, Inc.
> + * Copyright (C) 2006 Rusty Russell IBM Corporation
> + *
> + * Author: Michael S. Tsirkin 
> + * Stefan Hajnoczi 
> + *
> + * Inspiration, some code, and most witty comments come from
> + * Documentation/virtual/lguest/lguest.c, by Rusty Russell
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + */
> +
> +#include "trace.h"
> +#include "hw/dataplane/vring.h"
> +
> +/* Map the guest's vring to host memory */
> +bool vring_setup(Vring *vring, VirtIODevice *vdev, int n)
> +{
> +hwaddr vring_addr = virtio_queue_get_ring_addr(vdev, n);
> +hwaddr vring_size = virtio_queue_get_ring_size(vdev, n);
> +void *vring_ptr;
> +
> +vring->broken = false;
> +
> +hostmem_init(&vring->hostmem);
> +vring_ptr = hostmem_lookup(&vring->hostmem, vring_addr, vring_size, 
> true);
> +if (!vring_ptr) {
> +error_report("Failed to map vring "
> + "addr %#" HWADDR_PRIx " size %" HWADDR_PRIu,
> + vring_addr, vring_size);
> +vring->broken = true;
> +return false;
> +}
> +
> +vring_init(&vring->vr, virtio_queue_get_num(vdev, n), vring_ptr, 4096);
> +
> +vring->last_avail_idx = 0;
> +vring->last_used_idx = 0;
> +vring->signalled_used = 0;
> +vring->signalled_used_valid = false;
> +
> +trace_vring_setup(virtio_queue_get_ring_addr(vdev, n),
> +  vring->vr.desc, vring->vr.avail, vring->vr.used);
> +return true;
> +}
> +
> +void vring_teardown(Vring *vring)
> +{
> +hostmem_finalize(&vring->hostmem);
> +}
> +
> +/* Toggle guest->host notifies */
> +void vring_set_notification(VirtIODevice *vdev, Vring *vring, bool enable)
> +{
> +if (vdev->guest_features & (1 << VIRTIO_RING_F_EVENT_IDX)) {
> +if (enable) {
> +vring_avail_event(&vring->vr) = vring->vr.avail->idx;
> +}
> +} else if (enable) {
> +vring->vr.used->flags &= ~VRING_USED_F_NO_NOTIFY;
> +} else {
> +vring->vr.used->flags |= VRING_USED_F_NO_NOTIFY;
> +}
> +}
> +
> +/* This is stolen from linux/drivers/vhost/vhost.c:vhost_notify() */
> +bool vring_should_notify(VirtIODevice *vdev, Vring *vring)
> +{
> +uint16_t old, new;
> +bool v;
> +/* Flush out used index updates. This is paired
> + * with the barrier that the Guest executes when enabling
> + * interrupts. */
> +smp_mb();
> +
> +if ((vdev->guest_features & VIRTIO_F_NOTIFY_ON_EMPTY) &&
> +unlikely(vring->vr.avail->idx == vring->last_avail_idx)) {
> +return true;
> +}
> +
> +if (!(vdev->guest_features & VIRTIO_RING_F_EVENT_IDX)) {
> +return !(vring->vr.avail->flags & VRING_AVAIL_F_NO_INTERRUPT);
> +}
> +old = vring->signalled_used;
> +v = vring->signalled_used_valid;
> +new = vring->signalled_used = vring->last_used_idx;
> +vring->signalled_used_valid = true;
> +
> +if (unlikely(!v)) {
> +return true;
> +}
> +
> +return vring_need

[Qemu-devel] [PULL for-1.3 0/3] usb patch queue

2012-11-29 Thread Gerd Hoffmann
  Hi,

This is the usb patch queue, carrying three little fixes for 1.3.

please pull,
  Gerd

The following changes since commit e9bff10f8db94912b1b0e6e2e3394cae02faf614:

  event notifier: Fix setup for win32 (2012-11-28 13:33:01 -0600)

are available in the git repository at:
  git://git.kraxel.org/qemu usb.73

Gerd Hoffmann (1):
  usb: tag usb host adapters as not hotpluggable.

Peter Crosthwaite (1):
  ehci-sysbus: Attach DMA context.

Stefan Hajnoczi (1):
  usb: fail usbdevice_create() when there is no USB bus

 hw/usb/bus.c |7 +++
 hw/usb/hcd-ehci-pci.c|1 +
 hw/usb/hcd-ehci-sysbus.c |1 +
 hw/usb/hcd-ohci.c|1 +
 hw/usb/hcd-uhci.c|1 +
 hw/usb/hcd-xhci.c|1 +
 6 files changed, 12 insertions(+), 0 deletions(-)



Re: [Qemu-devel] [RFC PATCH v2 1/3] virtio-bus : Introduce VirtioBus.

2012-11-29 Thread Konrad Frederic

On 29/11/2012 14:09, Peter Maydell wrote:

On 29 November 2012 12:37, Konrad Frederic  wrote:

On 26/11/2012 17:59, Anthony Liguori wrote:

virtio-pci-bus extends virtio-bus
   - is constructed with a pointer to a PCIDevice
   - implements the methods necessary to be a virtio bus

I still have trouble with that ^^.
The problem is that the virtio devices can't be connected to the
virtio-pci-bus even if it extends virtio-bus because TYPE_VIRTIO_BUS !=
TYPE_VIRTIO_PCI_BUS.

Conceptually it ought to work I think: if the bus is-a TYPE_VIRTIO_BUS
then we should permit plugging in even if the actual bus object happens
to be an instance of a subclass.

Yes, is my implementation doing the right thing ?
I mean is the virtio-pci-bus a virtio-bus ?



I suspect that qbus_find_recursive should be doing an
object_class_dynamic_cast() to check that the bus is of a suitable
type, rather than the
 (strcmp(object_get_typename(OBJECT(bus)), bus_typename) != 0)
which it does at the moment.

Yes, but we can cast VIRTIO_BUS in BUS no ?
So in this case we could plug VirtioDevice in BUS and that's not what we 
want ?




-- PMM






[Qemu-devel] [PATCH] qdev: relax bus type check in qdev_device_add()

2012-11-29 Thread Anthony Liguori
We are currently checking for an exact type match.  Use QOM dynamic_cast to
check for a compatible type instead.

Cc: Konrad Frederic 
Signed-off-by: Anthony Liguori 
---
 hw/qdev-monitor.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/qdev-monitor.c b/hw/qdev-monitor.c
index 479eecd..69f5ff2 100644
--- a/hw/qdev-monitor.c
+++ b/hw/qdev-monitor.c
@@ -431,11 +431,16 @@ DeviceState *qdev_device_add(QemuOpts *opts)
 /* find bus */
 path = qemu_opt_get(opts, "bus");
 if (path != NULL) {
+ObjectClass *bus_class;
+
 bus = qbus_find(path);
 if (!bus) {
 return NULL;
 }
-if (strcmp(object_get_typename(OBJECT(bus)), k->bus_type) != 0) {
+
+bus_class = OBJECT_CLASS(BUS_GET_CLASS(bus));
+
+if (!object_class_dynamic_cast(bus_class, k->bus_type)) {
 qerror_report(QERR_BAD_BUS_FOR_DEVICE,
   driver, object_get_typename(OBJECT(bus)));
 return NULL;
-- 
1.8.0




Re: [Qemu-devel] [PATCH v4 03/11] dataplane: add host memory mapping code

2012-11-29 Thread Michael S. Tsirkin
On Thu, Nov 22, 2012 at 04:16:44PM +0100, Stefan Hajnoczi wrote:
> The data plane thread needs to map guest physical addresses to host
> pointers.  Normally this is done with cpu_physical_memory_map() but the
> function assumes the global mutex is held.  The data plane thread does
> not touch the global mutex and therefore needs a thread-safe memory
> mapping mechanism.
> 
> Hostmem registers a MemoryListener similar to how vhost collects and
> pushes memory region information into the kernel.  There is a
> fine-grained lock on the regions list which is held during lookup and
> when installing a new regions list.
> 
> When the physical memory map changes the MemoryListener callbacks are
> invoked.  They build up a new list of memory regions which is finally
> installed when the list has been completed.
> 
> Note that this approach is not safe across memory hotplug because mapped
> pointers may still be in used across memory unplug.  However, this is
> currently a problem for QEMU in general and needs to be addressed in the
> future.
> 
> Signed-off-by: Stefan Hajnoczi 

Worth bothering with binary search?
vhost does a linear search over regions because
the number of ram regions is very small.

> ---
>  hw/dataplane/Makefile.objs |   3 +
>  hw/dataplane/hostmem.c | 165 
> +
>  hw/dataplane/hostmem.h |  52 ++
>  3 files changed, 220 insertions(+)
>  create mode 100644 hw/dataplane/Makefile.objs
>  create mode 100644 hw/dataplane/hostmem.c
>  create mode 100644 hw/dataplane/hostmem.h
> 
> diff --git a/hw/dataplane/Makefile.objs b/hw/dataplane/Makefile.objs
> new file mode 100644
> index 000..8c8dea1
> --- /dev/null
> +++ b/hw/dataplane/Makefile.objs
> @@ -0,0 +1,3 @@
> +ifeq ($(CONFIG_VIRTIO), y)
> +common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o
> +endif
> diff --git a/hw/dataplane/hostmem.c b/hw/dataplane/hostmem.c
> new file mode 100644
> index 000..48aabf0
> --- /dev/null
> +++ b/hw/dataplane/hostmem.c
> @@ -0,0 +1,165 @@
> +/*
> + * Thread-safe guest to host memory mapping
> + *
> + * Copyright 2012 Red Hat, Inc. and/or its affiliates
> + *
> + * Authors:
> + *   Stefan Hajnoczi 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "exec-memory.h"
> +#include "hostmem.h"
> +
> +static int hostmem_lookup_cmp(const void *phys_, const void *region_)
> +{
> +hwaddr phys = *(const hwaddr *)phys_;
> +const HostmemRegion *region = region_;
> +
> +if (phys < region->guest_addr) {
> +return -1;
> +} else if (phys >= region->guest_addr + region->size) {
> +return 1;
> +} else {
> +return 0;
> +}
> +}
> +
> +/**
> + * Map guest physical address to host pointer
> + */
> +void *hostmem_lookup(Hostmem *hostmem, hwaddr phys, hwaddr len, bool 
> is_write)
> +{
> +HostmemRegion *region;
> +void *host_addr = NULL;
> +hwaddr offset_within_region;
> +
> +qemu_mutex_lock(&hostmem->current_regions_lock);
> +region = bsearch(&phys, hostmem->current_regions,
> + hostmem->num_current_regions,
> + sizeof(hostmem->current_regions[0]),
> + hostmem_lookup_cmp);
> +if (!region) {
> +goto out;
> +}
> +if (is_write && region->readonly) {
> +goto out;
> +}
> +offset_within_region = phys - region->guest_addr;
> +if (offset_within_region + len <= region->size) {
> +host_addr = region->host_addr + offset_within_region;
> +}
> +out:
> +qemu_mutex_unlock(&hostmem->current_regions_lock);
> +
> +return host_addr;
> +}
> +
> +/**
> + * Install new regions list
> + */
> +static void hostmem_listener_commit(MemoryListener *listener)
> +{
> +Hostmem *hostmem = container_of(listener, Hostmem, listener);
> +
> +qemu_mutex_lock(&hostmem->current_regions_lock);
> +g_free(hostmem->current_regions);
> +hostmem->current_regions = hostmem->new_regions;
> +hostmem->num_current_regions = hostmem->num_new_regions;
> +qemu_mutex_unlock(&hostmem->current_regions_lock);
> +
> +/* Reset new regions list */
> +hostmem->new_regions = NULL;
> +hostmem->num_new_regions = 0;
> +}
> +
> +/**
> + * Add a MemoryRegionSection to the new regions list
> + */
> +static void hostmem_append_new_region(Hostmem *hostmem,
> +  MemoryRegionSection *section)
> +{
> +void *ram_ptr = memory_region_get_ram_ptr(section->mr);
> +size_t num = hostmem->num_new_regions;
> +size_t new_size = (num + 1) * sizeof(hostmem->new_regions[0]);
> +
> +hostmem->new_regions = g_realloc(hostmem->new_regions, new_size);
> +hostmem->new_regions[num] = (HostmemRegion){
> +.host_addr = ram_ptr + section->offset_within_region,
> +.guest_addr = section->offset_within_address_space,
> +.size = section->size,
> + 

Re: [Qemu-devel] [RFC PATCH v2 1/3] virtio-bus : Introduce VirtioBus.

2012-11-29 Thread Anthony Liguori
Konrad Frederic  writes:

> On 26/11/2012 17:59, Anthony Liguori wrote:
>> Peter Maydell  writes:
>>
>>> On 26 November 2012 14:33, Anthony Liguori  wrote:
 VirtioBusInfo is not a great name.  This is a proxy class that allows
 for a device to implement the virtio bus interface.

 This could be done as an interface but since nothing else uses
 interfaces, I'm okay with something like this.  But the first argument
 ought to be an opaque for all methods.
>>> We have at least one user of Interface in the tree IIRC.
>>> I'd much rather we did this the right way -- the only reason
>>> it's the way Fred has coded it is that there's no obvious
>>> body of code in the tree to copy, so we're thrashing around
>>> a bit. If you tell us what the correct set of structs/classes/
>>> interfaces/etc is then we can implement it :-)
>> I really think extending virtio-bus to a virtio-pci-bus and then
>> initializing it with a link to the PCI device is the best approach.
>>
>> It's by far the simpliest approach in terms of coding.
>>
>> Did I explain it adequately?  To recap:
>>
>> virtio-bus extends bus-state
>>   - implements everything that VirtIOBindings implements as methods
>>
>> virtio-pci-bus extends virtio-bus
>>   - is constructed with a pointer to a PCIDevice
>>   - implements the methods necessary to be a virtio bus
> I still have trouble with that ^^.
>
> virtio-pci-bus extends virtio-bus so I put something like that :
>
> static void virtio_pci_bus_class_init(ObjectClass *klass, void *data)
> {
>  BusClass *bc = BUS_CLASS(klass);
>  VirtioBusClass *k = VIRTIO_BUS_CLASS(klass);
>  /* VirtIOBindings */
>  k->notify = virtio_pci_notify;
>  k->save_config = virtio_pci_save_config;
>  k->load_config = virtio_pci_load_config;
>  k->save_queue = virtio_pci_save_queue;
>  k->load_queue = virtio_pci_load_queue;
>  k->get_features = virtio_pci_get_features;
>  k->query_guest_notifiers = virtio_pci_query_guest_notifiers;
>  k->set_host_notifier = virtio_pci_set_host_notifier;
>  k->set_guest_notifiers = virtio_pci_set_guest_notifiers;
>  k->vmstate_change = virtio_pci_vmstate_change;
>  /*
>   * TODO : Init and exit function.
>   * void (*init)(void *opaque);
>   * void (*exit)(void *opaque);
>   */
> }
>
> static TypeInfo virtio_pci_bus_info = {
>  .name  = TYPE_VIRTIO_PCI_BUS,
>  .parent= TYPE_VIRTIO_BUS,
>  .class_init= virtio_pci_bus_class_init,
> };
>
> and I have VirtioDevice which extends DeviceState like that :
>
> static void virtio_device_class_init(ObjectClass *klass, void *data)
> {
>  /* Set the default value here. */
>  DeviceClass *dc = DEVICE_CLASS(klass);
>  dc->bus_type = TYPE_VIRTIO_BUS;
>  dc->init = virtio_device_init;
> }
>
> static TypeInfo virtio_device_info = {
>  .name = TYPE_VIRTIO_DEVICE,
>  .parent = TYPE_DEVICE,
>  .instance_size = sizeof(VirtioDeviceState),
>  /* Abstract the virtio-device */
>  .class_init = virtio_device_class_init,
>  .abstract = true,
>  .class_size = sizeof(VirtioDeviceClass),
> };
>
> The problem is that the virtio devices can't be connected to the 
> virtio-pci-bus even if it extends virtio-bus because TYPE_VIRTIO_BUS != 
> TYPE_VIRTIO_PCI_BUS.

That's just a bug.  See the patch I just sent out to fix it.

The type check is too strict in qdev_device_add().

Regards,

Anthony Liguori

>
> Did I miss something ?
>
> Thanks,
>
> Fred
>
>
>> virtio-device extends device-state
>>   - implements methods used by virtio-bus
>>
>> Regards,
>>
>> Anthony Liguori
>>
>>> -- PMM




Re: [Qemu-devel] [RFC PATCH v2 1/3] virtio-bus : Introduce VirtioBus.

2012-11-29 Thread Peter Maydell
On 29 November 2012 13:47, Konrad Frederic  wrote:
> On 29/11/2012 14:09, Peter Maydell wrote:
>> I suspect that qbus_find_recursive should be doing an
>> object_class_dynamic_cast() to check that the bus is of a suitable
>> type, rather than the
>>  (strcmp(object_get_typename(OBJECT(bus)), bus_typename) != 0)
>> which it does at the moment.
>
> Yes, but we can cast VIRTIO_BUS in BUS no ?
> So in this case we could plug VirtioDevice in BUS and that's not what we
> want ?

I don't understand what you're asking. qbus_find_recursive()
looks for a bus which might be specified either by name or by
bus type. At the moment it insists on an exact type match
(so if you ask for a virtio-bus you have to provide a virtio-bus
object, not a subclass of virtio-bus); it should do a dynamic
cast, so anything which is a virtio-bus or a subclass of that
will do. Yes, if you say "please find me any bus which is a
bus of any kind" then you'll get something wrong back -- so
don't do that. In particular, since VirtioDevice sets its
bus_type to TYPE_VIRTIO_BUS then we will ask for that, and
anything which isn't a virtio bus (or subclass) won't be found.

-- PMM



Re: [Qemu-devel] [RFC PATCH v2 1/3] virtio-bus : Introduce VirtioBus.

2012-11-29 Thread Andreas Färber
Am 29.11.2012 14:47, schrieb Konrad Frederic:
> On 29/11/2012 14:09, Peter Maydell wrote:
>> On 29 November 2012 12:37, Konrad Frederic 
>> wrote:
>>> On 26/11/2012 17:59, Anthony Liguori wrote:
 virtio-pci-bus extends virtio-bus
- is constructed with a pointer to a PCIDevice
- implements the methods necessary to be a virtio bus
>>> I still have trouble with that ^^.
>>> The problem is that the virtio devices can't be connected to the
>>> virtio-pci-bus even if it extends virtio-bus because TYPE_VIRTIO_BUS !=
>>> TYPE_VIRTIO_PCI_BUS.
>> Conceptually it ought to work I think: if the bus is-a TYPE_VIRTIO_BUS
>> then we should permit plugging in even if the actual bus object happens
>> to be an instance of a subclass.
> Yes, is my implementation doing the right thing ?
> I mean is the virtio-pci-bus a virtio-bus ?

In your v3 patchset no. In the inline code yes, via
virtio_pci_bus_info's .parent.

>> I suspect that qbus_find_recursive should be doing an
>> object_class_dynamic_cast() to check that the bus is of a suitable
>> type, rather than the
>>  (strcmp(object_get_typename(OBJECT(bus)), bus_typename) != 0)
>> which it does at the moment.
> Yes, but we can cast VIRTIO_BUS in BUS no ?
> So in this case we could plug VirtioDevice in BUS and that's not what we
> want ?

You would want to check whether object_class_dynamic_cast(OBJECT(bus),
TYPE_VIRTIO_BUS) == NULL. This should evaluate to true if the bus is not
a virtio-bus.

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



Re: [Qemu-devel] [PATCH] qdev: relax bus type check in qdev_device_add()

2012-11-29 Thread Peter Maydell
On 29 November 2012 13:49, Anthony Liguori  wrote:
> We are currently checking for an exact type match.  Use QOM dynamic_cast to
> check for a compatible type instead.

I think this only catches the case where a bus was explicitly
specified via bus=. For the default case you also need to change
the similar code for checking the bus type in qbus_find_recursive(),
right?

-- PMM



Re: [Qemu-devel] [Xen-devel] [PATCH 0/2] QEMU/xen: simplify cpu_ioreq_pio and cpu_ioreq_move

2012-11-29 Thread Ian Campbell
On Thu, 2012-11-29 at 13:21 +, Stefano Stabellini wrote:
> > Also I think 4.2.1 need these patches to enable the basic Xen on Xen
> > nested virtualization usage scenario.
> 
> I agree.

Nested virt was a tech preview in 4.2.0, is it really worth
backporting ?

Ian.




Re: [Qemu-devel] [PATCH] rbd block driver fix race between aio completition and aio cancel

2012-11-29 Thread Stefan Hajnoczi
On Thu, Nov 22, 2012 at 11:00:19AM +0100, Stefan Priebe wrote:
> @@ -406,10 +401,11 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
>  acb->ret = r;
>  }
>  }
> +acb->status = 0;
> +

I suggest doing this in the BH.  The qemu_aio_wait() loop in
qemu_rbd_aio_cancel() needs to wait until the BH has executed.  By
clearing status in the BH we ensure that no matter in which order
qemu_aio_wait() invokes BHs and callbacks, we'll always wait until the
BH has completed before ending the while loop in qemu_rbd_aio_cancel().

> @@ -737,7 +741,8 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState 
> *bs,
>  failed:
>  g_free(rcb);
>  s->qemu_aio_count--;
> -qemu_aio_release(acb);
> +if (!acb->cancelled)
> +qemu_aio_release(acb);
>  return NULL;
>  }

This scenario is impossible.  We haven't returned the acb back to the
caller yet so they could not have invoked qemu_aio_cancel().

Stefan



Re: [Qemu-devel] [PATCH v4 10/11] dataplane: add virtio-blk data plane code

2012-11-29 Thread Michael S. Tsirkin
On Thu, Nov 22, 2012 at 04:16:51PM +0100, Stefan Hajnoczi wrote:
> virtio-blk-data-plane is a subset implementation of virtio-blk.  It only
> handles read, write, and flush requests.  It does this using a dedicated
> thread that executes an epoll(2)-based event loop and processes I/O
> using Linux AIO.
> 
> This approach performs very well but can be used for raw image files
> only.  The number of IOPS achieved has been reported to be several times
> higher than the existing virtio-blk implementation.
> 
> Eventually it should be possible to unify virtio-blk-data-plane with the
> main body of QEMU code once the block layer and hardware emulation is
> able to run outside the global mutex.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  hw/dataplane/Makefile.objs |   2 +-
>  hw/dataplane/virtio-blk.c  | 427 
> +
>  hw/dataplane/virtio-blk.h  |  41 +
>  trace-events   |   6 +
>  4 files changed, 475 insertions(+), 1 deletion(-)
>  create mode 100644 hw/dataplane/virtio-blk.c
>  create mode 100644 hw/dataplane/virtio-blk.h
> 
> diff --git a/hw/dataplane/Makefile.objs b/hw/dataplane/Makefile.objs
> index abd408f..682aa9e 100644
> --- a/hw/dataplane/Makefile.objs
> +++ b/hw/dataplane/Makefile.objs
> @@ -1,3 +1,3 @@
>  ifeq ($(CONFIG_VIRTIO), y)
> -common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o vring.o event-poll.o 
> ioq.o
> +common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o vring.o event-poll.o 
> ioq.o virtio-blk.o
>  endif
> diff --git a/hw/dataplane/virtio-blk.c b/hw/dataplane/virtio-blk.c
> new file mode 100644
> index 000..9b29969
> --- /dev/null
> +++ b/hw/dataplane/virtio-blk.c
> @@ -0,0 +1,427 @@
> +/*
> + * Dedicated thread for virtio-blk I/O processing
> + *
> + * Copyright 2012 IBM, Corp.
> + * Copyright 2012 Red Hat, Inc. and/or its affiliates
> + *
> + * Authors:
> + *   Stefan Hajnoczi 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "trace.h"
> +#include "iov.h"
> +#include "event-poll.h"
> +#include "qemu-thread.h"
> +#include "vring.h"
> +#include "ioq.h"
> +#include "hw/virtio-blk.h"
> +#include "hw/dataplane/virtio-blk.h"
> +
> +enum {
> +SEG_MAX = 126,  /* maximum number of I/O segments */
> +VRING_MAX = SEG_MAX + 2,/* maximum number of vring descriptors */
> +REQ_MAX = VRING_MAX,/* maximum number of requests in the 
> vring,
> + * is VRING_MAX / 2 with traditional and
> + * VRING_MAX with indirect descriptors */
> +};
> +
> +typedef struct {
> +struct iocb iocb;   /* Linux AIO control block */
> +QEMUIOVector *inhdr;/* iovecs for virtio_blk_inhdr */
> +unsigned int head;  /* vring descriptor index */
> +} VirtIOBlockRequest;
> +
> +struct VirtIOBlockDataPlane {
> +bool started;
> +QEMUBH *start_bh;
> +QemuThread thread;
> +
> +int fd; /* image file descriptor */
> +
> +VirtIODevice *vdev;
> +Vring vring;/* virtqueue vring */
> +EventNotifier *guest_notifier;  /* irq */
> +
> +EventPoll event_poll;   /* event poller */
> +EventHandler io_handler;/* Linux AIO completion handler */
> +EventHandler notify_handler;/* virtqueue notify handler */
> +
> +IOQueue ioqueue;/* Linux AIO queue (should really be per
> +   dataplane thread) */
> +VirtIOBlockRequest requests[REQ_MAX]; /* pool of requests, managed by the
> + queue */
> +
> +unsigned int num_reqs;
> +QemuMutex num_reqs_lock;

OK the only reason this lock is needed is because
you want to drain outside the thread.
Won't it be better to queue process the drain request through
the thread?
You won't need any locks then.

> +QemuCond no_reqs_cond;
> +};
> +
> +/* Raise an interrupt to signal guest, if necessary */
> +static void notify_guest(VirtIOBlockDataPlane *s)
> +{
> +if (!vring_should_notify(s->vdev, &s->vring)) {
> +return;
> +}
> +
> +event_notifier_set(s->guest_notifier);
> +}
> +
> +static void complete_request(struct iocb *iocb, ssize_t ret, void *opaque)
> +{
> +VirtIOBlockDataPlane *s = opaque;
> +VirtIOBlockRequest *req = container_of(iocb, VirtIOBlockRequest, iocb);
> +struct virtio_blk_inhdr hdr;
> +int len;
> +
> +if (likely(ret >= 0)) {
> +hdr.status = VIRTIO_BLK_S_OK;
> +len = ret;
> +} else {
> +hdr.status = VIRTIO_BLK_S_IOERR;
> +len = 0;
> +}
> +
> +trace_virtio_blk_data_plane_complete_request(s, req->head, ret);
> +
> +qemu_iovec_from_buf(req->inhdr, 0, &hdr, sizeof(hdr));
> +qemu_iovec_destroy(req->inhdr);
> +g_slice_free(QEMUIOVector, req->inhdr);
> +

Re: [Qemu-devel] [PATCH] qdev: relax bus type check in qdev_device_add()

2012-11-29 Thread Konrad Frederic

On 29/11/2012 14:56, Peter Maydell wrote:

On 29 November 2012 13:49, Anthony Liguori  wrote:

We are currently checking for an exact type match.  Use QOM dynamic_cast to
check for a compatible type instead.

I think this only catches the case where a bus was explicitly
specified via bus=. For the default case you also need to change
the similar code for checking the bus type in qbus_find_recursive(),
right?

-- PMM

Right, it's working only with the "bus=" command line.



Re: [Qemu-devel] [PATCH v4 00/11] virtio: virtio-blk data plane

2012-11-29 Thread Michael S. Tsirkin
On Thu, Nov 29, 2012 at 10:18:59AM +0100, Stefan Hajnoczi wrote:
> Michael, Paolo: Are you happy with v4?

Looks pretty clean by itself. I sent some comments but they can be
addressed later.  What worries me most is the code duplication with
regular virtio.

I see two ways to reduce the maintainance somewhat
- split out ring handling code in virtio-blk
  to a separate file to make it more obvious which part
  is inactive when data plane runs.
- share ring processing code with virtio/virtio-blk
  (e.g. use callbacks)

Was any thought given to implementing one of these two
approaches? 

-- 
MST



Re: [Qemu-devel] [PATCH] virtio: limit avail bytes lookahead

2012-11-29 Thread Michael S. Tsirkin
On Thu, Nov 29, 2012 at 06:34:46PM +0530, Amit Shah wrote:
> On (Wed) 28 Nov 2012 [23:53:08], Michael S. Tsirkin wrote:
> > On Tue, Nov 27, 2012 at 06:25:04PM +0200, Michael S. Tsirkin wrote:
> > > On Thu, Nov 01, 2012 at 06:07:21PM +0200, Michael S. Tsirkin wrote:
> > > > Commit 0d8d7690850eb0cf2b2b60933cf47669a6b6f18f introduced
> > > > a regression in virtio-net performance because it looks
> > > > into the ring aggressively while we really only care
> > > > about a single packet worth of buffers.
> > > > To fix, add parameters limiting lookahead, and
> > > > use in virtqueue_avail_bytes.
> > > > 
> > > > Signed-off-by: Michael S. Tsirkin 
> > > > Reported-by: Edivaldo de Araujo Pereira 
> > > 
> > > Ping.
> > > Anthony - going to apply this?
> > 
> > virtio rng was added since so naturally build broke.
> > Here's a patch on top to fix it up. I never used virtio rng before so
> > could not test at this hour, but it does fix the build.
> > 
> > I'll take a look at how to test it tomorrow but any
> > info would be appreciated.
> > Amit could you pls review?
> 
> Looks fine, I assume you will send a v2 of the patch to Anthony?
> 
>   Amit


Anthony volunteered to test this so there will only be v2 if he sees
problems.



Re: [Qemu-devel] [PATCH 1.3] virtio-rng: do not use g_assert_cmpint

2012-11-29 Thread Stefan Hajnoczi
On Tue, Nov 27, 2012 at 09:16:24AM +0100, Paolo Bonzini wrote:
> g_assert_cmpint is not available on glib 2.12, which is the minimum
> version required to build QEMU (we only require 2.16 to run tests,
> since that is the first version including GTester).  Do not use it
> in hardware models, use a normal assertion instead.
> 
> This fixes the buildbot failure for default_x86_64_rhel5.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  hw/virtio-rng.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi 



Re: [Qemu-devel] main-loop.c: About Select handling

2012-11-29 Thread Stefan Hajnoczi
On Wed, Nov 28, 2012 at 03:08:24AM +, Furukawa, Eiji wrote:
> About a source of qemu-1.2.0/main-loop.c
> The select handling of os_host_main_loop_wait function
> I do not seem to do Exit by interrupts such as SIGUSR1
> Will not it be necessary to make modifications?
> 
> Before
>  LineNumber:308   ret = select(nfds + 1, &rfds, &wfds, &xfds, tvarg);
> 
> After(Example)
> do {
> ret = select(nfds + 1, &rfds, &wfds, &xfds, tvarg);
> } while(ret == -1 && (errno == EINTR || errno == EAGAIN))

What is the specific bug or problem you're seeing?

QEMU uses several signals internally and sets up per-thread signal masks
appropriately.

Stefan



Re: [Qemu-devel] [PATCH v4 03/11] dataplane: add host memory mapping code

2012-11-29 Thread Stefan Hajnoczi
On Thu, Nov 29, 2012 at 03:54:25PM +0200, Michael S. Tsirkin wrote:
> On Thu, Nov 22, 2012 at 04:16:44PM +0100, Stefan Hajnoczi wrote:
> > The data plane thread needs to map guest physical addresses to host
> > pointers.  Normally this is done with cpu_physical_memory_map() but the
> > function assumes the global mutex is held.  The data plane thread does
> > not touch the global mutex and therefore needs a thread-safe memory
> > mapping mechanism.
> > 
> > Hostmem registers a MemoryListener similar to how vhost collects and
> > pushes memory region information into the kernel.  There is a
> > fine-grained lock on the regions list which is held during lookup and
> > when installing a new regions list.
> > 
> > When the physical memory map changes the MemoryListener callbacks are
> > invoked.  They build up a new list of memory regions which is finally
> > installed when the list has been completed.
> > 
> > Note that this approach is not safe across memory hotplug because mapped
> > pointers may still be in used across memory unplug.  However, this is
> > currently a problem for QEMU in general and needs to be addressed in the
> > future.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> 
> Worth bothering with binary search?
> vhost does a linear search over regions because
> the number of ram regions is very small.

memory.c does binary search.  I did the same but in practice there are
<20 regions for a simple VM.  It's probably not worth it but without
performance results this is speculation.

I think there's no harm in using binary search to start with.

> > +static void hostmem_listener_append_region(MemoryListener *listener,
> > +   MemoryRegionSection *section)
> > +{
> > +Hostmem *hostmem = container_of(listener, Hostmem, listener);
> > +
> > +if (memory_region_is_ram(section->mr)) {
> > +hostmem_append_new_region(hostmem, section);
> > +}
> 
> I think you also need to remove VGA region since you
> don't mark these pages as dirty so access there won't work.

I don't understand.  If memory in the VGA region returns true from
memory_region_is_ram(), why would there be a problem?



Re: [Qemu-devel] [RFC PATCH v2 1/3] virtio-bus : Introduce VirtioBus.

2012-11-29 Thread Konrad Frederic

On 29/11/2012 14:55, Andreas Färber wrote:

Am 29.11.2012 14:47, schrieb Konrad Frederic:

On 29/11/2012 14:09, Peter Maydell wrote:

On 29 November 2012 12:37, Konrad Frederic
wrote:

On 26/11/2012 17:59, Anthony Liguori wrote:

virtio-pci-bus extends virtio-bus
- is constructed with a pointer to a PCIDevice
- implements the methods necessary to be a virtio bus

I still have trouble with that ^^.
The problem is that the virtio devices can't be connected to the
virtio-pci-bus even if it extends virtio-bus because TYPE_VIRTIO_BUS !=
TYPE_VIRTIO_PCI_BUS.

Conceptually it ought to work I think: if the bus is-a TYPE_VIRTIO_BUS
then we should permit plugging in even if the actual bus object happens
to be an instance of a subclass.

Yes, is my implementation doing the right thing ?
I mean is the virtio-pci-bus a virtio-bus ?

In your v3 patchset no. In the inline code yes, via
virtio_pci_bus_info's .parent.


I suspect that qbus_find_recursive should be doing an
object_class_dynamic_cast() to check that the bus is of a suitable
type, rather than the
  (strcmp(object_get_typename(OBJECT(bus)), bus_typename) != 0)
which it does at the moment.

Yes, but we can cast VIRTIO_BUS in BUS no ?
So in this case we could plug VirtioDevice in BUS and that's not what we
want ?

You would want to check whether object_class_dynamic_cast(OBJECT(bus),
TYPE_VIRTIO_BUS) == NULL. This should evaluate to true if the bus is not
a virtio-bus.

Yes, ok I got it!

Thanks,

Fred



[Qemu-devel] LiveMigration and TSC

2012-11-29 Thread Peter Lieven
Hi,

can someone give me a short status of the topic Live Migration and VMs that use 
TSC as clocksource?

Broken, yes / no? And if yes under what circumstances?

I have some TSC VMs that freeze occasionally after LiveMigration.

Thank you,
Peter



[Qemu-devel] [PATCHv4] rbd block driver fix race between aio completition and aio cancel

2012-11-29 Thread Stefan Priebe
This one fixes a race which qemu had also in iscsi block driver
between cancellation and io completition.

qemu_rbd_aio_cancel was not synchronously waiting for the end of
the command.

To archieve this it introduces a new status flag which uses
-EINPROGRESS.

Changes since PATCHv3:
- removed unnecessary if condition in rbd_start_aio as we
  haven't start io yet
- moved acb->status = 0 to rbd_aio_bh_cb so qemu_aio_wait always
  waits until BH was executed

Changes since PATCHv2:
- fixed missing braces
- added vfree for bounce

Signed-off-by: Stefan Priebe 
---
 block/rbd.c |   16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index f3becc7..28e94ab 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -77,6 +77,7 @@ typedef struct RBDAIOCB {
 int error;
 struct BDRVRBDState *s;
 int cancelled;
+int status;
 } RBDAIOCB;
 
 typedef struct RADOSCB {
@@ -376,12 +377,6 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
 RBDAIOCB *acb = rcb->acb;
 int64_t r;
 
-if (acb->cancelled) {
-qemu_vfree(acb->bounce);
-qemu_aio_release(acb);
-goto done;
-}
-
 r = rcb->ret;
 
 if (acb->cmd == RBD_AIO_WRITE ||
@@ -409,7 +404,6 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
 /* Note that acb->bh can be NULL in case where the aio was cancelled */
 acb->bh = qemu_bh_new(rbd_aio_bh_cb, acb);
 qemu_bh_schedule(acb->bh);
-done:
 g_free(rcb);
 }
 
@@ -568,6 +562,12 @@ static void qemu_rbd_aio_cancel(BlockDriverAIOCB *blockacb)
 {
 RBDAIOCB *acb = (RBDAIOCB *) blockacb;
 acb->cancelled = 1;
+
+while (acb->status == -EINPROGRESS) {
+qemu_aio_wait();
+}
+
+qemu_vfree(acb->bounce);
 }
 
 static const AIOCBInfo rbd_aiocb_info = {
@@ -639,6 +639,7 @@ static void rbd_aio_bh_cb(void *opaque)
 acb->common.cb(acb->common.opaque, (acb->ret > 0 ? 0 : acb->ret));
 qemu_bh_delete(acb->bh);
 acb->bh = NULL;
+acb->status = 0;
 
 qemu_aio_release(acb);
 }
@@ -685,6 +686,7 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs,
 acb->s = s;
 acb->cancelled = 0;
 acb->bh = NULL;
+acb->status = -EINPROGRESS;
 
 if (cmd == RBD_AIO_WRITE) {
 qemu_iovec_to_buf(acb->qiov, 0, acb->bounce, qiov->size);
-- 
1.7.10.4




Re: [Qemu-devel] [PATCH] rbd block driver fix race between aio completition and aio cancel

2012-11-29 Thread Stefan Priebe - Profihost AG

Hi,

i hope i've done everything correctly. I've send a new v4 patch.

Am 29.11.2012 14:58, schrieb Stefan Hajnoczi:

On Thu, Nov 22, 2012 at 11:00:19AM +0100, Stefan Priebe wrote:

@@ -406,10 +401,11 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
  acb->ret = r;
  }
  }
+acb->status = 0;
+


I suggest doing this in the BH.  The qemu_aio_wait() loop in
qemu_rbd_aio_cancel() needs to wait until the BH has executed.  By
clearing status in the BH we ensure that no matter in which order
qemu_aio_wait() invokes BHs and callbacks, we'll always wait until the
BH has completed before ending the while loop in qemu_rbd_aio_cancel().


@@ -737,7 +741,8 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs,
  failed:
  g_free(rcb);
  s->qemu_aio_count--;
-qemu_aio_release(acb);
+if (!acb->cancelled)
+qemu_aio_release(acb);
  return NULL;
  }


This scenario is impossible.  We haven't returned the acb back to the
caller yet so they could not have invoked qemu_aio_cancel().


Greets,
Stefan



Re: [Qemu-devel] [PATCH v4 03/11] dataplane: add host memory mapping code

2012-11-29 Thread Michael S. Tsirkin
On Thu, Nov 29, 2012 at 03:26:56PM +0100, Stefan Hajnoczi wrote:
> On Thu, Nov 29, 2012 at 03:54:25PM +0200, Michael S. Tsirkin wrote:
> > On Thu, Nov 22, 2012 at 04:16:44PM +0100, Stefan Hajnoczi wrote:
> > > The data plane thread needs to map guest physical addresses to host
> > > pointers.  Normally this is done with cpu_physical_memory_map() but the
> > > function assumes the global mutex is held.  The data plane thread does
> > > not touch the global mutex and therefore needs a thread-safe memory
> > > mapping mechanism.
> > > 
> > > Hostmem registers a MemoryListener similar to how vhost collects and
> > > pushes memory region information into the kernel.  There is a
> > > fine-grained lock on the regions list which is held during lookup and
> > > when installing a new regions list.
> > > 
> > > When the physical memory map changes the MemoryListener callbacks are
> > > invoked.  They build up a new list of memory regions which is finally
> > > installed when the list has been completed.
> > > 
> > > Note that this approach is not safe across memory hotplug because mapped
> > > pointers may still be in used across memory unplug.  However, this is
> > > currently a problem for QEMU in general and needs to be addressed in the
> > > future.
> > > 
> > > Signed-off-by: Stefan Hajnoczi 
> > 
> > Worth bothering with binary search?
> > vhost does a linear search over regions because
> > the number of ram regions is very small.
> 
> memory.c does binary search.  I did the same but in practice there are
> <20 regions for a simple VM.  It's probably not worth it but without
> performance results this is speculation.
> 
> I think there's no harm in using binary search to start with.
> 
> > > +static void hostmem_listener_append_region(MemoryListener *listener,
> > > +   MemoryRegionSection *section)
> > > +{
> > > +Hostmem *hostmem = container_of(listener, Hostmem, listener);
> > > +
> > > +if (memory_region_is_ram(section->mr)) {
> > > +hostmem_append_new_region(hostmem, section);
> > > +}
> > 
> > I think you also need to remove VGA region since you
> > don't mark these pages as dirty so access there won't work.
> 
> I don't understand.  If memory in the VGA region returns true from
> memory_region_is_ram(), why would there be a problem?

If you change this memory but you don't update the display.
Never happens with non buggy guests but we should catch and fail if it does.

-- 
MST



Re: [Qemu-devel] [PATCH v4 11/11] virtio-blk: add x-data-plane=on|off performance feature

2012-11-29 Thread Stefan Hajnoczi
On Thu, Nov 29, 2012 at 03:12:35PM +0200, Michael S. Tsirkin wrote:
> On Thu, Nov 22, 2012 at 04:16:52PM +0100, Stefan Hajnoczi wrote:
> > The virtio-blk-data-plane feature is easy to integrate into
> > hw/virtio-blk.c.  The data plane can be started and stopped similar to
> > vhost-net.
> > 
> > Users can take advantage of the virtio-blk-data-plane feature using the
> > new -device virtio-blk-pci,x-data-plane=on property.
> > 
> > The x-data-plane name was chosen because at this stage the feature is
> > experimental and likely to see changes in the future.
> > 
> > If the VM configuration does not support virtio-blk-data-plane an error
> > message is printed.  Although we could fall back to regular virtio-blk,
> > I prefer the explicit approach since it prompts the user to fix their
> > configuration if they want the performance benefit of
> > virtio-blk-data-plane.
> 
> Not only that, this affects features exposed to guest so it really can't be
> trasparent.
> 
> Which reminds me - shouldn't some features be turned off?
> For example, VIRTIO_BLK_F_SCSI?

Yes, virtio-blk-data-plane only starts when you give -device
virtio-blk-pci,scsi=off,x-data-plane=on.  If you use scsi=on an error
message is printed.

> > Limitations:
> >  * Only format=raw is supported
> >  * Live migration is not supported
> 
> This is probably fixable long term?

Absolutely.  There are two parts:

1. Marking written memory dirty so live RAM migration can work.  Missing
   today, easy cheat is to switch off virtio-blk-data-plane and silently
   switch to regular virtio-blk emulation while memory dirty logging is
   enabled.  The more long-term solution is to actually communicate the
   dirty information back to the memory API.

2. Synchronizing virtio-blk-data-plane vring state with virtio-blk so
   save/load works.  This should be relatively straightforward.

I don't want to gate this patch series on live migration support but it
is on my TODO list for virtio-blk-data-plane after this initial series
has been merged.

> >  * Block jobs, hot unplug, and other operations fail with -EBUSY
> 
> Hmm I don't see code to disable PCU unplug in this patch.
> I expected no_hotplug to be set.
> Where is it?

It uses the bdrv_in_use() mechanism.

> >  * I/O throttling limits are ignored
> 
> And this?
> Meanwhile can we have attempts to set them fail?

This limitation exists because virtio-blk-data-plane today bypasses the
QEMU block layer.  The next step is to get the block layer working
inside the data plane thread.  At that point I/O limits work again.

Adding an error would be a layering violation because I/O throttling
happens in the QEMU block layer and is unaware of the emulated storage
controller (virtio-blk, IDE, SCSI, etc).

I think it's better to document the limitation and continue working on
AioContext so that we can soon support I/O throttling with
virtio-blk-data-plane.  It would be quite ugly to add checks.

> > @@ -33,6 +35,8 @@ typedef struct VirtIOBlock
> >  VirtIOBlkConf *blk;
> >  unsigned short sector_mask;
> >  DeviceState *qdev;
> > +VirtIOBlockDataPlane *dataplane;
> > +Error *migration_blocker;
> 
> Would be nice to move the migration disabling
> checking supported formats
> and all the rest of it out to dataplane code.

The reason to do it in virtio-blk.c is that we already have access to
the device configuration.  If we move it to hw/dataplane/virtio-blk.c
then that code needs to reach inside and check data that it doesn't
otherwise access.

IMO it's nice to keep data plane "dumb" and perform these checks where
we already have to deal with the relationship between VirtIOBlkConf and
friends.

Stefan



Re: [Qemu-devel] [PATCH v4 00/11] virtio: virtio-blk data plane

2012-11-29 Thread Stefan Hajnoczi
On Thu, Nov 29, 2012 at 04:09:28PM +0200, Michael S. Tsirkin wrote:
> On Thu, Nov 29, 2012 at 10:18:59AM +0100, Stefan Hajnoczi wrote:
> > Michael, Paolo: Are you happy with v4?
> 
> Looks pretty clean by itself. I sent some comments but they can be
> addressed later.  What worries me most is the code duplication with
> regular virtio.
> 
> I see two ways to reduce the maintainance somewhat
> - split out ring handling code in virtio-blk
>   to a separate file to make it more obvious which part
>   is inactive when data plane runs.
> - share ring processing code with virtio/virtio-blk
>   (e.g. use callbacks)
> 
> Was any thought given to implementing one of these two
> approaches? 

Yes, your option #2 is where I'd like to move once threaded memory
dispatch is working.  I hope we can run virtio.c code in a thread
outside the global mutex soon.  That way we can kill
hw/dataplane/vring.[ch].

Ping Fan Liu has been working on the memory API and device emulation
stuff that we need in order to eventually use virtio.c outside the
global mutex.

Stefan



Re: [Qemu-devel] [PATCH v4 11/11] virtio-blk: add x-data-plane=on|off performance feature

2012-11-29 Thread Michael S. Tsirkin
On Thu, Nov 29, 2012 at 03:45:55PM +0100, Stefan Hajnoczi wrote:
> On Thu, Nov 29, 2012 at 03:12:35PM +0200, Michael S. Tsirkin wrote:
> > On Thu, Nov 22, 2012 at 04:16:52PM +0100, Stefan Hajnoczi wrote:
> > > The virtio-blk-data-plane feature is easy to integrate into
> > > hw/virtio-blk.c.  The data plane can be started and stopped similar to
> > > vhost-net.
> > > 
> > > Users can take advantage of the virtio-blk-data-plane feature using the
> > > new -device virtio-blk-pci,x-data-plane=on property.
> > > 
> > > The x-data-plane name was chosen because at this stage the feature is
> > > experimental and likely to see changes in the future.
> > > 
> > > If the VM configuration does not support virtio-blk-data-plane an error
> > > message is printed.  Although we could fall back to regular virtio-blk,
> > > I prefer the explicit approach since it prompts the user to fix their
> > > configuration if they want the performance benefit of
> > > virtio-blk-data-plane.
> > 
> > Not only that, this affects features exposed to guest so it really can't be
> > trasparent.
> > 
> > Which reminds me - shouldn't some features be turned off?
> > For example, VIRTIO_BLK_F_SCSI?
> 
> Yes, virtio-blk-data-plane only starts when you give -device
> virtio-blk-pci,scsi=off,x-data-plane=on.  If you use scsi=on an error
> message is printed.
> 
> > > Limitations:
> > >  * Only format=raw is supported
> > >  * Live migration is not supported
> > 
> > This is probably fixable long term?
> 
> Absolutely.  There are two parts:
> 
> 1. Marking written memory dirty so live RAM migration can work.  Missing
>today, easy cheat is to switch off virtio-blk-data-plane and silently
>switch to regular virtio-blk emulation while memory dirty logging is
>enabled.  The more long-term solution is to actually communicate the
>dirty information back to the memory API.
> 
> 2. Synchronizing virtio-blk-data-plane vring state with virtio-blk so
>save/load works.  This should be relatively straightforward.
> 
> I don't want to gate this patch series on live migration support but it
> is on my TODO list for virtio-blk-data-plane after this initial series
> has been merged.
> 
> > >  * Block jobs, hot unplug, and other operations fail with -EBUSY
> > 
> > Hmm I don't see code to disable PCU unplug in this patch.
> > I expected no_hotplug to be set.
> > Where is it?
> 
> It uses the bdrv_in_use() mechanism.

Hmm but PCI device can still go away if
guest ejects it. Does this work fine?

> > >  * I/O throttling limits are ignored
> > 
> > And this?
> > Meanwhile can we have attempts to set them fail?
> 
> This limitation exists because virtio-blk-data-plane today bypasses the
> QEMU block layer.  The next step is to get the block layer working
> inside the data plane thread.  At that point I/O limits work again.
> 
> Adding an error would be a layering violation because I/O throttling
> happens in the QEMU block layer and is unaware of the emulated storage
> controller (virtio-blk, IDE, SCSI, etc).
> 
> I think it's better to document the limitation and continue working on
> AioContext so that we can soon support I/O throttling with
> virtio-blk-data-plane.  It would be quite ugly to add checks.
> 
> > > @@ -33,6 +35,8 @@ typedef struct VirtIOBlock
> > >  VirtIOBlkConf *blk;
> > >  unsigned short sector_mask;
> > >  DeviceState *qdev;
> > > +VirtIOBlockDataPlane *dataplane;
> > > +Error *migration_blocker;
> > 
> > Would be nice to move the migration disabling
> > checking supported formats
> > and all the rest of it out to dataplane code.
> 
> The reason to do it in virtio-blk.c is that we already have access to
> the device configuration.  If we move it to hw/dataplane/virtio-blk.c
> then that code needs to reach inside and check data that it doesn't
> otherwise access.

Not really, just pass it all necessary data.

> IMO it's nice to keep data plane "dumb" and perform these checks where
> we already have to deal with the relationship between VirtIOBlkConf and
> friends.
> 
> Stefan

Yes but then it's not contained.



Re: [Qemu-devel] [PATCH 15/37] target-i386: set default value of "hypervisor" feature using static property

2012-11-29 Thread Igor Mammedov
On Fri, 9 Nov 2012 16:48:07 +0100
Eduardo Habkost  wrote:

> 
> On 22/10/2012, at 17:03, Igor Mammedov  wrote:
> 
> > Signed-off-by: Igor Mammedov 
> > ---
> > target-i386/cpu.c | 9 +++--
> > 1 file changed, 3 insertions(+), 6 deletions(-)
> > 
> > diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> > index 3131945..dc4fcdf 100644
> > --- a/target-i386/cpu.c
> > +++ b/target-i386/cpu.c
> > @@ -174,7 +174,7 @@ static Property cpu_x86_properties[] = {
> > DEFINE_PROP_BIT("f-xsave", X86CPU, env.cpuid_ext_features, 26, false),
> > DEFINE_PROP_BIT("f-osxsave", X86CPU, env.cpuid_ext_features, 27,
> > false), DEFINE_PROP_BIT("f-avx", X86CPU, env.cpuid_ext_features, 28,
> > false),
> > -DEFINE_PROP_BIT("f-hypervisor", X86CPU, env.cpuid_ext_features, 31,
> > false),
> > +DEFINE_PROP_BIT("f-hypervisor", X86CPU, env.cpuid_ext_features, 31,
> > true), DEFINE_PROP_BIT("f-syscall", X86CPU, env.cpuid_ext2_features, 11,
> > false), DEFINE_PROP_BIT("f-nx", X86CPU, env.cpuid_ext2_features, 20,
> > false), DEFINE_PROP_BIT("f-xd", X86CPU, env.cpuid_ext2_features, 20,
> > false), @@ -1307,11 +1307,12 @@ static int cpu_x86_find_by_name(X86CPU
> > *cpu, x86_def_t *x86_cpu_def, {
> > unsigned int i;
> > x86_def_t *def;
> > +CPUX86State *env = &cpu->env;
> > 
> > char *s = g_strdup(cpu_model);
> > char *featurestr, *name = strtok(s, ",");
> > /* Features to be added*/
> > -uint32_t plus_features = 0, plus_ext_features = 0;
> > +uint32_t plus_features = 0, plus_ext_features =
> > env->cpuid_ext_features;
> 
> Moving data back and forth between CPUX86State and x86_def_t makes the
> initialization ordering confusing (today data is moved from x86_def_t to
> X86CPU, and never the other way around).
> 
> As this code is removed in the next patches, I don't mind too much, but
> maybe it's simpler to implement this change only after the "use static
> properties for setting cpuid features" patch?
It won't eliminate moving data back and forth between CPUX86State and
x86_def_t until CPU sub-classes (i.e. when x86_def_t isn't used anymore),
if defaults from static properties are used.

But I've rewritten patches in a way to make this easier to review and less
confusing.

> 
> > uint32_t plus_ext2_features = 0, plus_ext3_features = 0;
> > uint32_t plus_kvm_features = 0, plus_svm_features = 0;
> > uint32_t plus_7_0_ebx_features = 0;
> > @@ -1345,10 +1346,6 @@ static int cpu_x86_find_by_name(X86CPU *cpu,
> > x86_def_t *x86_cpu_def, plus_kvm_features = 0;
> > #endif
> > 
> > -add_flagname_to_bitmaps("hypervisor", &plus_features,
> > -&plus_ext_features, &plus_ext2_features, &plus_ext3_features,
> > -&plus_kvm_features, &plus_svm_features,
> > &plus_7_0_ebx_features); -
> > featurestr = strtok(NULL, ",");
> > 
> > while (featurestr) {
> > -- 
> > 1.7.11.7
> > 
> > 
> 




[Qemu-devel] [PATCH] qdev: relax bus type check in qdev_device_add() (v2)

2012-11-29 Thread Anthony Liguori
We are currently checking for an exact type match.  Use QOM dynamic_cast to
check for a compatible type instead.

Cc: Konrad Frederic 
Cc: Peter Maydell 
Signed-off-by: Anthony Liguori 
---
v1 -> v2:
 - also add cast to qbus_find_recursive (Peter)
 - simplify by doing object_dynamic_cast instead of messing with classes
---
 hw/qdev-monitor.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/qdev-monitor.c b/hw/qdev-monitor.c
index 479eecd..a1b4d6a 100644
--- a/hw/qdev-monitor.c
+++ b/hw/qdev-monitor.c
@@ -289,8 +289,7 @@ static BusState *qbus_find_recursive(BusState *bus, const 
char *name,
 if (name && (strcmp(bus->name, name) != 0)) {
 match = 0;
 }
-if (bus_typename &&
-(strcmp(object_get_typename(OBJECT(bus)), bus_typename) != 0)) {
+if (bus_typename && !object_dynamic_cast(OBJECT(bus), bus_typename)) {
 match = 0;
 }
 if (match) {
@@ -435,7 +434,7 @@ DeviceState *qdev_device_add(QemuOpts *opts)
 if (!bus) {
 return NULL;
 }
-if (strcmp(object_get_typename(OBJECT(bus)), k->bus_type) != 0) {
+if (!object_dynamic_cast(OBJECT(bus), k->bus_type)) {
 qerror_report(QERR_BAD_BUS_FOR_DEVICE,
   driver, object_get_typename(OBJECT(bus)));
 return NULL;
-- 
1.8.0




Re: [Qemu-devel] [PATCH v4 04/11] dataplane: add virtqueue vring code

2012-11-29 Thread Paolo Bonzini

> > +/* Toggle guest->host notifies */
> > +void vring_set_notification(VirtIODevice *vdev, Vring *vring, bool
> > enable)
> > +{
> > +if (vdev->guest_features & (1 << VIRTIO_RING_F_EVENT_IDX)) {
> > +if (enable) {
> > +vring_avail_event(&vring->vr) = vring->vr.avail->idx;
> > +}
> > +} else if (enable) {
> > +vring->vr.used->flags &= ~VRING_USED_F_NO_NOTIFY;
> > +} else {
> > +vring->vr.used->flags |= VRING_USED_F_NO_NOTIFY;
> > +}
> > +}

This is similar to the (guest-side) virtqueue_disable_cb/virtqueue_enable_cb.

Perhaps having two functions will be easier to use, because from your
other code it looks like you'd benefit from a return value when
enable == true (again similar to virtqueue_enable_cb).

Paolo



Re: [Qemu-devel] [PATCH] qdev: relax bus type check in qdev_device_add() (v2)

2012-11-29 Thread Konrad Frederic

On 29/11/2012 16:12, Anthony Liguori wrote:

We are currently checking for an exact type match.  Use QOM dynamic_cast to
check for a compatible type instead.

Cc: Konrad Frederic
Cc: Peter Maydell
Signed-off-by: Anthony Liguori
---
v1 ->  v2:
  - also add cast to qbus_find_recursive (Peter)
  - simplify by doing object_dynamic_cast instead of messing with classes
---
  hw/qdev-monitor.c | 5 ++---
  1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/qdev-monitor.c b/hw/qdev-monitor.c
index 479eecd..a1b4d6a 100644
--- a/hw/qdev-monitor.c
+++ b/hw/qdev-monitor.c
@@ -289,8 +289,7 @@ static BusState *qbus_find_recursive(BusState *bus, const 
char *name,
  if (name&&  (strcmp(bus->name, name) != 0)) {
  match = 0;
  }
-if (bus_typename&&
-(strcmp(object_get_typename(OBJECT(bus)), bus_typename) != 0)) {
+if (bus_typename&&  !object_dynamic_cast(OBJECT(bus), bus_typename)) {
  match = 0;
  }
  if (match) {
@@ -435,7 +434,7 @@ DeviceState *qdev_device_add(QemuOpts *opts)
  if (!bus) {
  return NULL;
  }
-if (strcmp(object_get_typename(OBJECT(bus)), k->bus_type) != 0) {
+if (!object_dynamic_cast(OBJECT(bus), k->bus_type)) {
  qerror_report(QERR_BAD_BUS_FOR_DEVICE,
driver, object_get_typename(OBJECT(bus)));
  return NULL;

That seems to work.

Thanks,

Fred.



Re: [Qemu-devel] [PATCH v4 10/11] dataplane: add virtio-blk data plane code

2012-11-29 Thread Paolo Bonzini

> > +unsigned int num_reqs;
> > +QemuMutex num_reqs_lock;
> 
> OK the only reason this lock is needed is because
> you want to drain outside the thread.
> Won't it be better to queue process the drain request through
> the thread?
> You won't need any locks then.

Draining is processed in the thread.  This lock is only needed
to use it together with no_reqs_cond, because userspace threads
do not have something like wait_event.

Direct usage of futexes would let you remove the lock, but it's
not portable.

Paolo

> > +QemuCond no_reqs_cond;
> > +};
> > +
> > +/* Raise an interrupt to signal guest, if necessary */
> > +static void notify_guest(VirtIOBlockDataPlane *s)
> > +{
> > +if (!vring_should_notify(s->vdev, &s->vring)) {
> > +return;
> > +}
> > +
> > +event_notifier_set(s->guest_notifier);
> > +}
> > +
> > +static void complete_request(struct iocb *iocb, ssize_t ret, void
> > *opaque)
> > +{
> > +VirtIOBlockDataPlane *s = opaque;
> > +VirtIOBlockRequest *req = container_of(iocb,
> > VirtIOBlockRequest, iocb);
> > +struct virtio_blk_inhdr hdr;
> > +int len;
> > +
> > +if (likely(ret >= 0)) {
> > +hdr.status = VIRTIO_BLK_S_OK;
> > +len = ret;
> > +} else {
> > +hdr.status = VIRTIO_BLK_S_IOERR;
> > +len = 0;
> > +}
> > +
> > +trace_virtio_blk_data_plane_complete_request(s, req->head,
> > ret);
> > +
> > +qemu_iovec_from_buf(req->inhdr, 0, &hdr, sizeof(hdr));
> > +qemu_iovec_destroy(req->inhdr);
> > +g_slice_free(QEMUIOVector, req->inhdr);
> > +
> > +/* According to the virtio specification len should be the
> > number of bytes
> > + * written to, but for virtio-blk it seems to be the number of
> > bytes
> > + * transferred plus the status bytes.
> > + */
> > +vring_push(&s->vring, req->head, len + sizeof(hdr));
> > +
> > +qemu_mutex_lock(&s->num_reqs_lock);
> > +if (--s->num_reqs == 0) {
> > +qemu_cond_broadcast(&s->no_reqs_cond);
> > +}
> > +qemu_mutex_unlock(&s->num_reqs_lock);
> > +}
> > +
> > +static void fail_request_early(VirtIOBlockDataPlane *s, unsigned
> > int head,
> > +   QEMUIOVector *inhdr, unsigned char
> > status)
> > +{
> > +struct virtio_blk_inhdr hdr = {
> > +.status = status,
> > +};
> > +
> > +qemu_iovec_from_buf(inhdr, 0, &hdr, sizeof(hdr));
> > +qemu_iovec_destroy(inhdr);
> > +g_slice_free(QEMUIOVector, inhdr);
> > +
> > +vring_push(&s->vring, head, sizeof(hdr));
> > +notify_guest(s);
> > +}
> > +
> > +static int process_request(IOQueue *ioq, struct iovec iov[],
> > +   unsigned int out_num, unsigned int
> > in_num,
> > +   unsigned int head)
> > +{
> > +VirtIOBlockDataPlane *s = container_of(ioq,
> > VirtIOBlockDataPlane, ioqueue);
> > +struct iovec *in_iov = &iov[out_num];
> > +struct virtio_blk_outhdr outhdr;
> > +QEMUIOVector *inhdr;
> > +size_t in_size;
> > +
> > +/* Copy in outhdr */
> > +if (unlikely(iov_to_buf(iov, out_num, 0, &outhdr,
> > +sizeof(outhdr)) != sizeof(outhdr))) {
> > +error_report("virtio-blk request outhdr too short");
> > +return -EFAULT;
> > +}
> > +iov_discard(&iov, &out_num, sizeof(outhdr));
> > +
> > +/* Grab inhdr for later */
> > +in_size = iov_size(in_iov, in_num);
> > +if (in_size < sizeof(struct virtio_blk_inhdr)) {
> > +error_report("virtio_blk request inhdr too short");
> > +return -EFAULT;
> > +}
> > +inhdr = g_slice_new(QEMUIOVector);
> > +qemu_iovec_init(inhdr, 1);
> > +qemu_iovec_concat_iov(inhdr, in_iov, in_num,
> > +in_size - sizeof(struct virtio_blk_inhdr),
> > +sizeof(struct virtio_blk_inhdr));
> > +iov_discard(&in_iov, &in_num, -sizeof(struct
> > virtio_blk_inhdr));
> > +
> > +/* TODO Linux sets the barrier bit even when not advertised!
> > */
> > +outhdr.type &= ~VIRTIO_BLK_T_BARRIER;
> > +
> > +struct iocb *iocb;
> > +switch (outhdr.type & (VIRTIO_BLK_T_OUT |
> > VIRTIO_BLK_T_SCSI_CMD |
> > +   VIRTIO_BLK_T_FLUSH)) {
> > +case VIRTIO_BLK_T_IN:
> > +iocb = ioq_rdwr(ioq, true, in_iov, in_num, outhdr.sector *
> > 512);
> > +break;
> > +
> > +case VIRTIO_BLK_T_OUT:
> > +iocb = ioq_rdwr(ioq, false, iov, out_num, outhdr.sector *
> > 512);
> > +break;
> > +
> > +case VIRTIO_BLK_T_SCSI_CMD:
> > +/* TODO support SCSI commands */
> > +fail_request_early(s, head, inhdr, VIRTIO_BLK_S_UNSUPP);
> > +return 0;
> > +
> > +case VIRTIO_BLK_T_FLUSH:
> > +/* TODO fdsync not supported by Linux AIO, do it
> > synchronously here! */
> > +fdatasync(s->fd);
> > +fail_request_early(s, head, inhdr, VIRTIO_BLK_S_OK);
> > +return 0;
> > +
> > +default:
> > +error_report("virtio-blk unsupported request ty

Re: [Qemu-devel] [PATCHv4] rbd block driver fix race between aio completition and aio cancel

2012-11-29 Thread Paolo Bonzini


- Messaggio originale -
> Da: "Stefan Priebe" 
> A: qemu-devel@nongnu.org
> Cc: stefa...@gmail.com, "josh durgin" , 
> ceph-de...@vger.kernel.org, pbonz...@redhat.com,
> "Stefan Priebe" 
> Inviato: Giovedì, 29 novembre 2012 15:28:35
> Oggetto: [PATCHv4] rbd block driver fix race between aio completition and aio 
> cancel
> 
> This one fixes a race which qemu had also in iscsi block driver
> between cancellation and io completition.
> 
> qemu_rbd_aio_cancel was not synchronously waiting for the end of
> the command.
> 
> To archieve this it introduces a new status flag which uses
> -EINPROGRESS.
> 
> Changes since PATCHv3:
> - removed unnecessary if condition in rbd_start_aio as we
>   haven't start io yet
> - moved acb->status = 0 to rbd_aio_bh_cb so qemu_aio_wait always
>   waits until BH was executed
> 
> Changes since PATCHv2:
> - fixed missing braces
> - added vfree for bounce
> 
> Signed-off-by: Stefan Priebe 
> ---
>  block/rbd.c |   16 +---
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/block/rbd.c b/block/rbd.c
> index f3becc7..28e94ab 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -77,6 +77,7 @@ typedef struct RBDAIOCB {
>  int error;
>  struct BDRVRBDState *s;
>  int cancelled;
> +int status;
>  } RBDAIOCB;
>  
>  typedef struct RADOSCB {
> @@ -376,12 +377,6 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
>  RBDAIOCB *acb = rcb->acb;
>  int64_t r;
>  
> -if (acb->cancelled) {
> -qemu_vfree(acb->bounce);
> -qemu_aio_release(acb);
> -goto done;
> -}
> -
>  r = rcb->ret;
>  
>  if (acb->cmd == RBD_AIO_WRITE ||
> @@ -409,7 +404,6 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
>  /* Note that acb->bh can be NULL in case where the aio was
>  cancelled */
>  acb->bh = qemu_bh_new(rbd_aio_bh_cb, acb);
>  qemu_bh_schedule(acb->bh);
> -done:
>  g_free(rcb);
>  }
>  
> @@ -568,6 +562,12 @@ static void qemu_rbd_aio_cancel(BlockDriverAIOCB
> *blockacb)
>  {
>  RBDAIOCB *acb = (RBDAIOCB *) blockacb;
>  acb->cancelled = 1;
> +
> +while (acb->status == -EINPROGRESS) {
> +qemu_aio_wait();
> +}
> +
> +qemu_vfree(acb->bounce);

This vfree is not needed, since the BH will run and do the free.

Otherwise looks ok.

>  }
>  
>  static const AIOCBInfo rbd_aiocb_info = {
> @@ -639,6 +639,7 @@ static void rbd_aio_bh_cb(void *opaque)
>  acb->common.cb(acb->common.opaque, (acb->ret > 0 ? 0 :
>  acb->ret));
>  qemu_bh_delete(acb->bh);
>  acb->bh = NULL;
> +acb->status = 0;
>  
>  qemu_aio_release(acb);
>  }
> @@ -685,6 +686,7 @@ static BlockDriverAIOCB
> *rbd_start_aio(BlockDriverState *bs,
>  acb->s = s;
>  acb->cancelled = 0;
>  acb->bh = NULL;
> +acb->status = -EINPROGRESS;
>  
>  if (cmd == RBD_AIO_WRITE) {
>  qemu_iovec_to_buf(acb->qiov, 0, acb->bounce, qiov->size);
> --
> 1.7.10.4
> 
> 



Re: [Qemu-devel] [PATCH] rbd block driver fix race between aio completition and aio cancel

2012-11-29 Thread Paolo Bonzini

> > @@ -574,6 +570,12 @@ static void
> > qemu_rbd_aio_cancel(BlockDriverAIOCB *blockacb)
> >   {
> >   RBDAIOCB *acb = (RBDAIOCB *) blockacb;
> >   acb->cancelled = 1;
> > +
> > +while (acb->status == -EINPROGRESS) {
> > +qemu_aio_wait();
> > +}
> > +
> 
> There should be a qemu_vfree(acb->bounce); here

No, because the BH will have run at this point and you'd doubly-free
the buffer.

Paolo

> > +qemu_aio_release(acb);
> >   }
> >
> >   static AIOPool rbd_aio_pool = {
> > @@ -646,7 +648,8 @@ static void rbd_aio_bh_cb(void *opaque)
> >   qemu_bh_delete(acb->bh);
> >   acb->bh = NULL;
> >
> > -qemu_aio_release(acb);
> > +if (!acb->cancelled)
> > +qemu_aio_release(acb);
> >   }
> >
> >   static int rbd_aio_discard_wrapper(rbd_image_t image,
> > @@ -691,6 +694,7 @@ static BlockDriverAIOCB
> > *rbd_start_aio(BlockDriverState *bs,
> >   acb->s = s;
> >   acb->cancelled = 0;
> >   acb->bh = NULL;
> > +acb->status = -EINPROGRESS;
> >
> >   if (cmd == RBD_AIO_WRITE) {
> >   qemu_iovec_to_buf(acb->qiov, 0, acb->bounce, qiov->size);
> > @@ -737,7 +741,8 @@ static BlockDriverAIOCB
> > *rbd_start_aio(BlockDriverState *bs,
> >   failed:
> >   g_free(rcb);
> >   s->qemu_aio_count--;
> > -qemu_aio_release(acb);
> > +if (!acb->cancelled)
> 
> qemu_vfree(acb->bounce) should be here as well, although that's a
> separate bug that's probably never hit.
> 
> > +qemu_aio_release(acb);
> >   return NULL;
> >   }
> >
> >
> 
> 



Re: [Qemu-devel] [PATCH v4 10/11] dataplane: add virtio-blk data plane code

2012-11-29 Thread Michael S. Tsirkin
On Thu, Nov 29, 2012 at 10:21:31AM -0500, Paolo Bonzini wrote:
> 
> > > +unsigned int num_reqs;
> > > +QemuMutex num_reqs_lock;
> > 
> > OK the only reason this lock is needed is because
> > you want to drain outside the thread.
> > Won't it be better to queue process the drain request through
> > the thread?
> > You won't need any locks then.
> 
> Draining is processed in the thread.

Confused. You say same thread is always taking this lock?
This is what I suggest.

>  This lock is only needed
> to use it together with no_reqs_cond, because userspace threads
> do not have something like wait_event.
> 
> Direct usage of futexes would let you remove the lock, but it's
> not portable.
> 
> Paolo

It's easier: simply signal condition which is local on stack.

> > > +QemuCond no_reqs_cond;
> > > +};
> > > +
> > > +/* Raise an interrupt to signal guest, if necessary */
> > > +static void notify_guest(VirtIOBlockDataPlane *s)
> > > +{
> > > +if (!vring_should_notify(s->vdev, &s->vring)) {
> > > +return;
> > > +}
> > > +
> > > +event_notifier_set(s->guest_notifier);
> > > +}
> > > +
> > > +static void complete_request(struct iocb *iocb, ssize_t ret, void
> > > *opaque)
> > > +{
> > > +VirtIOBlockDataPlane *s = opaque;
> > > +VirtIOBlockRequest *req = container_of(iocb,
> > > VirtIOBlockRequest, iocb);
> > > +struct virtio_blk_inhdr hdr;
> > > +int len;
> > > +
> > > +if (likely(ret >= 0)) {
> > > +hdr.status = VIRTIO_BLK_S_OK;
> > > +len = ret;
> > > +} else {
> > > +hdr.status = VIRTIO_BLK_S_IOERR;
> > > +len = 0;
> > > +}
> > > +
> > > +trace_virtio_blk_data_plane_complete_request(s, req->head,
> > > ret);
> > > +
> > > +qemu_iovec_from_buf(req->inhdr, 0, &hdr, sizeof(hdr));
> > > +qemu_iovec_destroy(req->inhdr);
> > > +g_slice_free(QEMUIOVector, req->inhdr);
> > > +
> > > +/* According to the virtio specification len should be the
> > > number of bytes
> > > + * written to, but for virtio-blk it seems to be the number of
> > > bytes
> > > + * transferred plus the status bytes.
> > > + */
> > > +vring_push(&s->vring, req->head, len + sizeof(hdr));
> > > +
> > > +qemu_mutex_lock(&s->num_reqs_lock);
> > > +if (--s->num_reqs == 0) {
> > > +qemu_cond_broadcast(&s->no_reqs_cond);
> > > +}
> > > +qemu_mutex_unlock(&s->num_reqs_lock);
> > > +}
> > > +
> > > +static void fail_request_early(VirtIOBlockDataPlane *s, unsigned
> > > int head,
> > > +   QEMUIOVector *inhdr, unsigned char
> > > status)
> > > +{
> > > +struct virtio_blk_inhdr hdr = {
> > > +.status = status,
> > > +};
> > > +
> > > +qemu_iovec_from_buf(inhdr, 0, &hdr, sizeof(hdr));
> > > +qemu_iovec_destroy(inhdr);
> > > +g_slice_free(QEMUIOVector, inhdr);
> > > +
> > > +vring_push(&s->vring, head, sizeof(hdr));
> > > +notify_guest(s);
> > > +}
> > > +
> > > +static int process_request(IOQueue *ioq, struct iovec iov[],
> > > +   unsigned int out_num, unsigned int
> > > in_num,
> > > +   unsigned int head)
> > > +{
> > > +VirtIOBlockDataPlane *s = container_of(ioq,
> > > VirtIOBlockDataPlane, ioqueue);
> > > +struct iovec *in_iov = &iov[out_num];
> > > +struct virtio_blk_outhdr outhdr;
> > > +QEMUIOVector *inhdr;
> > > +size_t in_size;
> > > +
> > > +/* Copy in outhdr */
> > > +if (unlikely(iov_to_buf(iov, out_num, 0, &outhdr,
> > > +sizeof(outhdr)) != sizeof(outhdr))) {
> > > +error_report("virtio-blk request outhdr too short");
> > > +return -EFAULT;
> > > +}
> > > +iov_discard(&iov, &out_num, sizeof(outhdr));
> > > +
> > > +/* Grab inhdr for later */
> > > +in_size = iov_size(in_iov, in_num);
> > > +if (in_size < sizeof(struct virtio_blk_inhdr)) {
> > > +error_report("virtio_blk request inhdr too short");
> > > +return -EFAULT;
> > > +}
> > > +inhdr = g_slice_new(QEMUIOVector);
> > > +qemu_iovec_init(inhdr, 1);
> > > +qemu_iovec_concat_iov(inhdr, in_iov, in_num,
> > > +in_size - sizeof(struct virtio_blk_inhdr),
> > > +sizeof(struct virtio_blk_inhdr));
> > > +iov_discard(&in_iov, &in_num, -sizeof(struct
> > > virtio_blk_inhdr));
> > > +
> > > +/* TODO Linux sets the barrier bit even when not advertised!
> > > */
> > > +outhdr.type &= ~VIRTIO_BLK_T_BARRIER;
> > > +
> > > +struct iocb *iocb;
> > > +switch (outhdr.type & (VIRTIO_BLK_T_OUT |
> > > VIRTIO_BLK_T_SCSI_CMD |
> > > +   VIRTIO_BLK_T_FLUSH)) {
> > > +case VIRTIO_BLK_T_IN:
> > > +iocb = ioq_rdwr(ioq, true, in_iov, in_num, outhdr.sector *
> > > 512);
> > > +break;
> > > +
> > > +case VIRTIO_BLK_T_OUT:
> > > +iocb = ioq_rdwr(ioq, false, iov, out_num, outhdr.sector *
> > > 512);
> > > +break;
> > > +
> > > +case VIR

Re: [Qemu-devel] [PATCH v4 03/11] dataplane: add host memory mapping code

2012-11-29 Thread Paolo Bonzini
> > I don't understand.  If memory in the VGA region returns true from
> > memory_region_is_ram(), why would there be a problem?
> 
> If you change this memory but you don't update the display.
> Never happens with non buggy guests but we should catch and fail if
> it does.

Actually it _could_ happen with an MS-DOS guest that does the oh-so-retro

DEF SEG=&HB800
BLOAD "file.pic", 0

(I think).

Paolo



Re: [Qemu-devel] [PATCH v4 00/11] virtio: virtio-blk data plane

2012-11-29 Thread Michael S. Tsirkin
On Thu, Nov 29, 2012 at 03:48:04PM +0100, Stefan Hajnoczi wrote:
> On Thu, Nov 29, 2012 at 04:09:28PM +0200, Michael S. Tsirkin wrote:
> > On Thu, Nov 29, 2012 at 10:18:59AM +0100, Stefan Hajnoczi wrote:
> > > Michael, Paolo: Are you happy with v4?
> > 
> > Looks pretty clean by itself. I sent some comments but they can be
> > addressed later.  What worries me most is the code duplication with
> > regular virtio.
> > 
> > I see two ways to reduce the maintainance somewhat
> > - split out ring handling code in virtio-blk
> >   to a separate file to make it more obvious which part
> >   is inactive when data plane runs.
> > - share ring processing code with virtio/virtio-blk
> >   (e.g. use callbacks)
> > 
> > Was any thought given to implementing one of these two
> > approaches? 
> 
> Yes, your option #2 is where I'd like to move once threaded memory
> dispatch is working.  I hope we can run virtio.c code in a thread
> outside the global mutex soon.  That way we can kill
> hw/dataplane/vring.[ch].
> 
> Ping Fan Liu has been working on the memory API and device emulation
> stuff that we need in order to eventually use virtio.c outside the
> global mutex.
> 
> Stefan

I guess we can live with this short term.




Re: [Qemu-devel] [PATCH] qdev: relax bus type check in qdev_device_add() (v2)

2012-11-29 Thread Andreas Färber
Am 29.11.2012 16:12, schrieb Anthony Liguori:
> We are currently checking for an exact type match.  Use QOM dynamic_cast to
> check for a compatible type instead.
> 
> Cc: Konrad Frederic 
> Cc: Peter Maydell 
> Signed-off-by: Anthony Liguori 

Reviewed-by: Andreas Färber 

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



Re: [Qemu-devel] [PATCH 15/37] target-i386: set default value of "hypervisor" feature using static property

2012-11-29 Thread Eduardo Habkost
On Thu, Nov 29, 2012 at 03:56:09PM +0100, Igor Mammedov wrote:
> On Fri, 9 Nov 2012 16:48:07 +0100
> Eduardo Habkost  wrote:
> 
> > 
> > On 22/10/2012, at 17:03, Igor Mammedov  wrote:
> > 
> > > Signed-off-by: Igor Mammedov 
> > > ---
> > > target-i386/cpu.c | 9 +++--
> > > 1 file changed, 3 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> > > index 3131945..dc4fcdf 100644
> > > --- a/target-i386/cpu.c
> > > +++ b/target-i386/cpu.c
> > > @@ -174,7 +174,7 @@ static Property cpu_x86_properties[] = {
> > > DEFINE_PROP_BIT("f-xsave", X86CPU, env.cpuid_ext_features, 26, false),
> > > DEFINE_PROP_BIT("f-osxsave", X86CPU, env.cpuid_ext_features, 27,
> > > false), DEFINE_PROP_BIT("f-avx", X86CPU, env.cpuid_ext_features, 28,
> > > false),
> > > -DEFINE_PROP_BIT("f-hypervisor", X86CPU, env.cpuid_ext_features, 31,
> > > false),
> > > +DEFINE_PROP_BIT("f-hypervisor", X86CPU, env.cpuid_ext_features, 31,
> > > true), DEFINE_PROP_BIT("f-syscall", X86CPU, env.cpuid_ext2_features, 11,
> > > false), DEFINE_PROP_BIT("f-nx", X86CPU, env.cpuid_ext2_features, 20,
> > > false), DEFINE_PROP_BIT("f-xd", X86CPU, env.cpuid_ext2_features, 20,
> > > false), @@ -1307,11 +1307,12 @@ static int cpu_x86_find_by_name(X86CPU
> > > *cpu, x86_def_t *x86_cpu_def, {
> > > unsigned int i;
> > > x86_def_t *def;
> > > +CPUX86State *env = &cpu->env;
> > > 
> > > char *s = g_strdup(cpu_model);
> > > char *featurestr, *name = strtok(s, ",");
> > > /* Features to be added*/
> > > -uint32_t plus_features = 0, plus_ext_features = 0;
> > > +uint32_t plus_features = 0, plus_ext_features =
> > > env->cpuid_ext_features;
> > 
> > Moving data back and forth between CPUX86State and x86_def_t makes the
> > initialization ordering confusing (today data is moved from x86_def_t to
> > X86CPU, and never the other way around).
> > 
> > As this code is removed in the next patches, I don't mind too much, but
> > maybe it's simpler to implement this change only after the "use static
> > properties for setting cpuid features" patch?
> It won't eliminate moving data back and forth between CPUX86State and
> x86_def_t until CPU sub-classes (i.e. when x86_def_t isn't used anymore),
> if defaults from static properties are used.

You don't need CPU subclasses to reduce back-and-forth data movement.
We can just reorder code to look like:

 split_cpu_model_string(cpu_model, &model, &features);
 fill_cpudef_for_model_somehow(model, &cpudef);
 cpudef_2_x86_cpu(cpu, &cpudef);
 cpu_parse_feature_string(cpu, features);

Instead of what we have today, that is:

 split_cpu_model_string(cpu_model, &model, &features);
 fill_cpudef_for_model_somehow(model, &cpudef);
 cpu_parse_feature_string(cpu, features);
 /* (the 3 steps above are all done inside cpu_x86_find_by_name()) */
 cpudef_2_x86_cpu(cpu, &cpudef);

You did that on patch 32/37, but I would be interesting to do that
_before_ introducing the whole new CPU properties code, to make review
easier.

Reordering the steps to make the code less confusing before introducing
the CPU properties makes the code easier to review, that's why I sent
the CPU init cleanup series. And it makes us one step closer to
completely eliminate the X86CPUDefinition struct in the future.

I actually made the reordering change above using the CPU init series as
base, and it was very easy to do, after the CPU init was cleaned up. You
can see the result at:
https://github.com/ehabkost/qemu-hacks/commit/5256503135e787cb3550092c24dbeb3c5fc5a747
(it's completely untested. The rest of the changes are on the
work/cpu-model-classes branch)


> 
> But I've rewritten patches in a way to make this easier to review and less
> confusing.

I'm curious to see the result. Right now I am using the CPU init cleanup
series as base for my CPU subclass experiments.


> 
> > 
> > > uint32_t plus_ext2_features = 0, plus_ext3_features = 0;
> > > uint32_t plus_kvm_features = 0, plus_svm_features = 0;
> > > uint32_t plus_7_0_ebx_features = 0;
> > > @@ -1345,10 +1346,6 @@ static int cpu_x86_find_by_name(X86CPU *cpu,
> > > x86_def_t *x86_cpu_def, plus_kvm_features = 0;
> > > #endif
> > > 
> > > -add_flagname_to_bitmaps("hypervisor", &plus_features,
> > > -&plus_ext_features, &plus_ext2_features, &plus_ext3_features,
> > > -&plus_kvm_features, &plus_svm_features,
> > > &plus_7_0_ebx_features); -
> > > featurestr = strtok(NULL, ",");
> > > 
> > > while (featurestr) {
> > > -- 
> > > 1.7.11.7
> > > 
> > > 
> > 
> 

-- 
Eduardo



Re: [Qemu-devel] [PATCH v4 10/11] dataplane: add virtio-blk data plane code

2012-11-29 Thread Paolo Bonzini


- Messaggio originale -
> Da: "Michael S. Tsirkin" 
> A: "Paolo Bonzini" 
> Cc: qemu-devel@nongnu.org, "Kevin Wolf" , "Anthony Liguori" 
> , "Blue Swirl"
> , k...@us.ibm.com, "Asias He" , 
> "Stefan Hajnoczi" 
> Inviato: Giovedì, 29 novembre 2012 16:27:39
> Oggetto: Re: [PATCH v4 10/11] dataplane: add virtio-blk data plane code
> 
> On Thu, Nov 29, 2012 at 10:21:31AM -0500, Paolo Bonzini wrote:
> > 
> > > > +unsigned int num_reqs;
> > > > +QemuMutex num_reqs_lock;
> > > 
> > > OK the only reason this lock is needed is because
> > > you want to drain outside the thread.
> > > Won't it be better to queue process the drain request through
> > > the thread?
> > > You won't need any locks then.
> > 
> > Draining is processed in the thread.
> 
> Confused. You say same thread is always taking this lock?

No, one thread needs to take the lock just for waiting on the condition
variable.

However, indeed you are right about this:

> > > > +/* Block until pending requests have completed
> > > > + *
> > > > + * The vring continues to be serviced so ensure no new
> > > > + * requests will be added to avoid races.
> > > 
> > > This comment confuses me. "avoid races" is a kind of vague
> > > comment that does not really help.
> > > 
> > > This function does not seem to ensure
> > > no new requests - it simply waits until num requests
> > > gets to 0. But requests could get added right afterwards
> > > and it won't help.

and solving this race would also let Stefan remove the lock.

Stefan, perhaps you could replace the stop_notifier mechanism of
event-poll.c with something similar to aio_notify/qemu_notify_event,
and even remove event_poll_run in favor of event_poll (aka aio_wait...).
And also remove the return value from the callback, since your remaining
callbacks always return true.

The main thread can just signal the event loop:

s->stopping = true;
event_poll_notify(&s->event_poll);
qemu_thread_join(&s->thread);

and the dataplane thread will drain the queue before exiting.

Paolo

> This is what I suggest.
> 
> >  This lock is only needed
> > to use it together with no_reqs_cond, because userspace threads
> > do not have something like wait_event.
> > 
> > Direct usage of futexes would let you remove the lock, but it's
> > not portable.
> > 
> > Paolo
> 
> It's easier: simply signal condition which is local on stack.
> 
> > > > +QemuCond no_reqs_cond;
> > > > +};
> > > > +
> > > > +/* Raise an interrupt to signal guest, if necessary */
> > > > +static void notify_guest(VirtIOBlockDataPlane *s)
> > > > +{
> > > > +if (!vring_should_notify(s->vdev, &s->vring)) {
> > > > +return;
> > > > +}
> > > > +
> > > > +event_notifier_set(s->guest_notifier);
> > > > +}
> > > > +
> > > > +static void complete_request(struct iocb *iocb, ssize_t ret,
> > > > void
> > > > *opaque)
> > > > +{
> > > > +VirtIOBlockDataPlane *s = opaque;
> > > > +VirtIOBlockRequest *req = container_of(iocb,
> > > > VirtIOBlockRequest, iocb);
> > > > +struct virtio_blk_inhdr hdr;
> > > > +int len;
> > > > +
> > > > +if (likely(ret >= 0)) {
> > > > +hdr.status = VIRTIO_BLK_S_OK;
> > > > +len = ret;
> > > > +} else {
> > > > +hdr.status = VIRTIO_BLK_S_IOERR;
> > > > +len = 0;
> > > > +}
> > > > +
> > > > +trace_virtio_blk_data_plane_complete_request(s, req->head,
> > > > ret);
> > > > +
> > > > +qemu_iovec_from_buf(req->inhdr, 0, &hdr, sizeof(hdr));
> > > > +qemu_iovec_destroy(req->inhdr);
> > > > +g_slice_free(QEMUIOVector, req->inhdr);
> > > > +
> > > > +/* According to the virtio specification len should be the
> > > > number of bytes
> > > > + * written to, but for virtio-blk it seems to be the
> > > > number of
> > > > bytes
> > > > + * transferred plus the status bytes.
> > > > + */
> > > > +vring_push(&s->vring, req->head, len + sizeof(hdr));
> > > > +
> > > > +qemu_mutex_lock(&s->num_reqs_lock);
> > > > +if (--s->num_reqs == 0) {
> > > > +qemu_cond_broadcast(&s->no_reqs_cond);
> > > > +}
> > > > +qemu_mutex_unlock(&s->num_reqs_lock);
> > > > +}
> > > > +
> > > > +static void fail_request_early(VirtIOBlockDataPlane *s,
> > > > unsigned
> > > > int head,
> > > > +   QEMUIOVector *inhdr, unsigned
> > > > char
> > > > status)
> > > > +{
> > > > +struct virtio_blk_inhdr hdr = {
> > > > +.status = status,
> > > > +};
> > > > +
> > > > +qemu_iovec_from_buf(inhdr, 0, &hdr, sizeof(hdr));
> > > > +qemu_iovec_destroy(inhdr);
> > > > +g_slice_free(QEMUIOVector, inhdr);
> > > > +
> > > > +vring_push(&s->vring, head, sizeof(hdr));
> > > > +notify_guest(s);
> > > > +}
> > > > +
> > > > +static int process_request(IOQueue *ioq, struct iovec iov[],
> > > > +   unsigned int out_num, unsigned int
> > > > in_num,
> > > > +   unsigned int head)
> > > > +{
> > > > +VirtIOBlockDataPlan

[Qemu-devel] [PATCH for-1.3] qemu-tech.texi: update implemented xtensa features list

2012-11-29 Thread Max Filippov
Debug option is available since QEMU-1.2; FP coprocessor and
coprocessor context is available since QEMU-1.3.

Signed-off-by: Max Filippov 
---
 qemu-tech.texi |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/qemu-tech.texi b/qemu-tech.texi
index d73dda8..8aefa74 100644
--- a/qemu-tech.texi
+++ b/qemu-tech.texi
@@ -262,16 +262,16 @@ Current QEMU limitations:
 
 @item Core Xtensa ISA emulation, including most options: code density,
 loop, extended L32R, 16- and 32-bit multiplication, 32-bit division,
-MAC16, miscellaneous operations, boolean, multiprocessor synchronization,
+MAC16, miscellaneous operations, boolean, FP coprocessor, coprocessor
+context, debug, multiprocessor synchronization,
 conditional store, exceptions, relocatable vectors, unaligned exception,
 interrupts (including high priority and timer), hardware alignment,
 region protection, region translation, MMU, windowed registers, thread
 pointer, processor ID.
 
-@item Not implemented options: FP coprocessor, coprocessor context,
-data/instruction cache (including cache prefetch and locking), XLMI,
-processor interface, debug. Also options not covered by the core ISA
-(e.g. FLIX, wide branches) are not implemented.
+@item Not implemented options: data/instruction cache (including cache
+prefetch and locking), XLMI, processor interface. Also options not
+covered by the core ISA (e.g. FLIX, wide branches) are not implemented.
 
 @item Can run most Xtensa Linux binaries.
 
-- 
1.7.7.6




[Qemu-devel] [PATCHv5 1.3] seccomp: adding new syscalls (bugzilla 855162)

2012-11-29 Thread Eduardo Otubo
According to the bug 855162[0] - there's the need of adding new syscalls
to the whitelist when using Qemu with Libvirt.

[0] - https://bugzilla.redhat.com/show_bug.cgi?id=855162

Reported-by: Paul Moore 
Signed-off-by: Eduardo Otubo 
Signed-off-by: Corey Bryant 
---
 qemu-seccomp.c |  156 ++--
 1 file changed, 139 insertions(+), 17 deletions(-)

diff --git a/qemu-seccomp.c b/qemu-seccomp.c
index 64329a3..2a71d6f 100644
--- a/qemu-seccomp.c
+++ b/qemu-seccomp.c
@@ -26,8 +26,12 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = 
{
 { SCMP_SYS(timer_gettime), 254 },
 { SCMP_SYS(futex), 253 },
 { SCMP_SYS(select), 252 },
+#if defined(__x86_64__)
 { SCMP_SYS(recvfrom), 251 },
 { SCMP_SYS(sendto), 250 },
+#elif defined(__i386__)
+{ SCMP_SYS(socketcall), 250 },
+#endif
 { SCMP_SYS(read), 249 },
 { SCMP_SYS(brk), 248 },
 { SCMP_SYS(clone), 247 },
@@ -36,15 +40,30 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] 
= {
 { SCMP_SYS(execve), 245 },
 { SCMP_SYS(open), 245 },
 { SCMP_SYS(ioctl), 245 },
+#if defined(__x86_64__)
+{ SCMP_SYS(socket), 245 },
+{ SCMP_SYS(setsockopt), 245 },
 { SCMP_SYS(recvmsg), 245 },
 { SCMP_SYS(sendmsg), 245 },
 { SCMP_SYS(accept), 245 },
 { SCMP_SYS(connect), 245 },
+{ SCMP_SYS(socketpair), 245 },
+{ SCMP_SYS(bind), 245 },
+{ SCMP_SYS(listen), 245 },
+{ SCMP_SYS(semget), 245 },
+#elif defined(__i386__)
+{ SCMP_SYS(ipc), 245 },
+#endif
 { SCMP_SYS(gettimeofday), 245 },
 { SCMP_SYS(readlink), 245 },
 { SCMP_SYS(access), 245 },
 { SCMP_SYS(prctl), 245 },
 { SCMP_SYS(signalfd), 245 },
+{ SCMP_SYS(getrlimit), 245 },
+{ SCMP_SYS(set_tid_address), 245 },
+{ SCMP_SYS(statfs), 245 },
+{ SCMP_SYS(unlink), 245 },
+{ SCMP_SYS(wait4), 245 },
 #if defined(__i386__)
 { SCMP_SYS(fcntl64), 245 },
 { SCMP_SYS(fstat64), 245 },
@@ -56,30 +75,33 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] 
= {
 { SCMP_SYS(sigreturn), 245 },
 { SCMP_SYS(_newselect), 245 },
 { SCMP_SYS(_llseek), 245 },
-{ SCMP_SYS(mmap2), 245},
+{ SCMP_SYS(mmap2), 245 },
 { SCMP_SYS(sigprocmask), 245 },
-#elif defined(__x86_64__)
-{ SCMP_SYS(sched_getparam), 245},
-{ SCMP_SYS(sched_getscheduler), 245},
-{ SCMP_SYS(fstat), 245},
-{ SCMP_SYS(clock_getres), 245},
-{ SCMP_SYS(sched_get_priority_min), 245},
-{ SCMP_SYS(sched_get_priority_max), 245},
-{ SCMP_SYS(stat), 245},
-{ SCMP_SYS(socket), 245},
-{ SCMP_SYS(setsockopt), 245},
-{ SCMP_SYS(uname), 245},
-{ SCMP_SYS(semget), 245},
 #endif
+{ SCMP_SYS(sched_getparam), 245 },
+{ SCMP_SYS(sched_getscheduler), 245 },
+{ SCMP_SYS(fstat), 245 },
+{ SCMP_SYS(clock_getres), 245 },
+{ SCMP_SYS(sched_get_priority_min), 245 },
+{ SCMP_SYS(sched_get_priority_max), 245 },
+{ SCMP_SYS(stat), 245 },
+{ SCMP_SYS(uname), 245 },
 { SCMP_SYS(eventfd2), 245 },
 { SCMP_SYS(dup), 245 },
+{ SCMP_SYS(dup2), 245 },
+{ SCMP_SYS(dup3), 245 },
 { SCMP_SYS(gettid), 245 },
+{ SCMP_SYS(getgid), 245 },
+{ SCMP_SYS(getegid), 245 },
+{ SCMP_SYS(getuid), 245 },
+{ SCMP_SYS(geteuid), 245 },
 { SCMP_SYS(timer_create), 245 },
 { SCMP_SYS(exit), 245 },
 { SCMP_SYS(clock_gettime), 245 },
 { SCMP_SYS(time), 245 },
 { SCMP_SYS(restart_syscall), 245 },
 { SCMP_SYS(pwrite64), 245 },
+{ SCMP_SYS(nanosleep), 245 },
 { SCMP_SYS(chown), 245 },
 { SCMP_SYS(openat), 245 },
 { SCMP_SYS(getdents), 245 },
@@ -93,8 +115,6 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = 
{
 { SCMP_SYS(lseek), 245 },
 { SCMP_SYS(pselect6), 245 },
 { SCMP_SYS(fork), 245 },
-{ SCMP_SYS(bind), 245 },
-{ SCMP_SYS(listen), 245 },
 { SCMP_SYS(eventfd), 245 },
 { SCMP_SYS(rt_sigprocmask), 245 },
 { SCMP_SYS(write), 244 },
@@ -104,10 +124,112 @@ static const struct QemuSeccompSyscall 
seccomp_whitelist[] = {
 { SCMP_SYS(pipe2), 242 },
 { SCMP_SYS(munmap), 242 },
 { SCMP_SYS(mremap), 242 },
+{ SCMP_SYS(fdatasync), 242 },
+{ SCMP_SYS(close), 242 },
+{ SCMP_SYS(rt_sigpending), 242 },
+{ SCMP_SYS(rt_sigtimedwait), 242 },
+{ SCMP_SYS(readv), 242 },
+{ SCMP_SYS(writev), 242 },
+{ SCMP_SYS(preadv), 242 },
+{ SCMP_SYS(pwritev), 242 },
+{ SCMP_SYS(setrlimit), 242 },
+{ SCMP_SYS(ftruncate), 242 },
+{ SCMP_SYS(lstat), 242 },
+{ SCMP_SYS(pipe), 242 },
+{ SCMP_SYS(umask), 242 },
+{ SCMP_SYS(chdir), 242 },
+{ SCMP_SYS(setitimer), 242 },
+{ SCMP_SYS(setsid), 242 },
+{ SCMP_SYS(poll), 242 },
+{ SCMP_SYS(epoll_create), 242 },
+{ SCMP_SYS(epoll_ctl), 242 },
+{ SCMP_SYS(epoll_wait), 242 },
+#if defined(__i386__)
+{ SCMP_SYS(waitpid), 242 },
+#elif defined(__x86_64__)
 { SCMP_SYS(getsockname), 242 },
 { SCMP_SYS(getpeername), 242 },
-{ SCM

Re: [Qemu-devel] [PATCHv5 1.3] seccomp: adding new syscalls (bugzilla 855162)

2012-11-29 Thread Paul Moore
On Thursday, November 29, 2012 01:56:41 PM Eduardo Otubo wrote:
> According to the bug 855162[0] - there's the need of adding new syscalls
> to the whitelist when using Qemu with Libvirt.
> 
> [0] - https://bugzilla.redhat.com/show_bug.cgi?id=855162
> 
> Reported-by: Paul Moore 
> Signed-off-by: Eduardo Otubo 
> Signed-off-by: Corey Bryant 

Works for me on 64-bit x86 F17.

Tested-by: Paul Moore 

> diff --git a/qemu-seccomp.c b/qemu-seccomp.c
> index 64329a3..2a71d6f 100644
> --- a/qemu-seccomp.c
> +++ b/qemu-seccomp.c
> @@ -26,8 +26,12 @@ static const struct QemuSeccompSyscall
> seccomp_whitelist[] = { { SCMP_SYS(timer_gettime), 254 },
>  { SCMP_SYS(futex), 253 },
>  { SCMP_SYS(select), 252 },
> +#if defined(__x86_64__)
>  { SCMP_SYS(recvfrom), 251 },
>  { SCMP_SYS(sendto), 250 },
> +#elif defined(__i386__)
> +{ SCMP_SYS(socketcall), 250 },
> +#endif
>  { SCMP_SYS(read), 249 },
>  { SCMP_SYS(brk), 248 },
>  { SCMP_SYS(clone), 247 },
> @@ -36,15 +40,30 @@ static const struct QemuSeccompSyscall
> seccomp_whitelist[] = { { SCMP_SYS(execve), 245 },
>  { SCMP_SYS(open), 245 },
>  { SCMP_SYS(ioctl), 245 },
> +#if defined(__x86_64__)
> +{ SCMP_SYS(socket), 245 },
> +{ SCMP_SYS(setsockopt), 245 },
>  { SCMP_SYS(recvmsg), 245 },
>  { SCMP_SYS(sendmsg), 245 },
>  { SCMP_SYS(accept), 245 },
>  { SCMP_SYS(connect), 245 },
> +{ SCMP_SYS(socketpair), 245 },
> +{ SCMP_SYS(bind), 245 },
> +{ SCMP_SYS(listen), 245 },
> +{ SCMP_SYS(semget), 245 },
> +#elif defined(__i386__)
> +{ SCMP_SYS(ipc), 245 },
> +#endif
>  { SCMP_SYS(gettimeofday), 245 },
>  { SCMP_SYS(readlink), 245 },
>  { SCMP_SYS(access), 245 },
>  { SCMP_SYS(prctl), 245 },
>  { SCMP_SYS(signalfd), 245 },
> +{ SCMP_SYS(getrlimit), 245 },
> +{ SCMP_SYS(set_tid_address), 245 },
> +{ SCMP_SYS(statfs), 245 },
> +{ SCMP_SYS(unlink), 245 },
> +{ SCMP_SYS(wait4), 245 },
>  #if defined(__i386__)
>  { SCMP_SYS(fcntl64), 245 },
>  { SCMP_SYS(fstat64), 245 },
> @@ -56,30 +75,33 @@ static const struct QemuSeccompSyscall
> seccomp_whitelist[] = { { SCMP_SYS(sigreturn), 245 },
>  { SCMP_SYS(_newselect), 245 },
>  { SCMP_SYS(_llseek), 245 },
> -{ SCMP_SYS(mmap2), 245},
> +{ SCMP_SYS(mmap2), 245 },
>  { SCMP_SYS(sigprocmask), 245 },
> -#elif defined(__x86_64__)
> -{ SCMP_SYS(sched_getparam), 245},
> -{ SCMP_SYS(sched_getscheduler), 245},
> -{ SCMP_SYS(fstat), 245},
> -{ SCMP_SYS(clock_getres), 245},
> -{ SCMP_SYS(sched_get_priority_min), 245},
> -{ SCMP_SYS(sched_get_priority_max), 245},
> -{ SCMP_SYS(stat), 245},
> -{ SCMP_SYS(socket), 245},
> -{ SCMP_SYS(setsockopt), 245},
> -{ SCMP_SYS(uname), 245},
> -{ SCMP_SYS(semget), 245},
>  #endif
> +{ SCMP_SYS(sched_getparam), 245 },
> +{ SCMP_SYS(sched_getscheduler), 245 },
> +{ SCMP_SYS(fstat), 245 },
> +{ SCMP_SYS(clock_getres), 245 },
> +{ SCMP_SYS(sched_get_priority_min), 245 },
> +{ SCMP_SYS(sched_get_priority_max), 245 },
> +{ SCMP_SYS(stat), 245 },
> +{ SCMP_SYS(uname), 245 },
>  { SCMP_SYS(eventfd2), 245 },
>  { SCMP_SYS(dup), 245 },
> +{ SCMP_SYS(dup2), 245 },
> +{ SCMP_SYS(dup3), 245 },
>  { SCMP_SYS(gettid), 245 },
> +{ SCMP_SYS(getgid), 245 },
> +{ SCMP_SYS(getegid), 245 },
> +{ SCMP_SYS(getuid), 245 },
> +{ SCMP_SYS(geteuid), 245 },
>  { SCMP_SYS(timer_create), 245 },
>  { SCMP_SYS(exit), 245 },
>  { SCMP_SYS(clock_gettime), 245 },
>  { SCMP_SYS(time), 245 },
>  { SCMP_SYS(restart_syscall), 245 },
>  { SCMP_SYS(pwrite64), 245 },
> +{ SCMP_SYS(nanosleep), 245 },
>  { SCMP_SYS(chown), 245 },
>  { SCMP_SYS(openat), 245 },
>  { SCMP_SYS(getdents), 245 },
> @@ -93,8 +115,6 @@ static const struct QemuSeccompSyscall
> seccomp_whitelist[] = { { SCMP_SYS(lseek), 245 },
>  { SCMP_SYS(pselect6), 245 },
>  { SCMP_SYS(fork), 245 },
> -{ SCMP_SYS(bind), 245 },
> -{ SCMP_SYS(listen), 245 },
>  { SCMP_SYS(eventfd), 245 },
>  { SCMP_SYS(rt_sigprocmask), 245 },
>  { SCMP_SYS(write), 244 },
> @@ -104,10 +124,112 @@ static const struct QemuSeccompSyscall
> seccomp_whitelist[] = { { SCMP_SYS(pipe2), 242 },
>  { SCMP_SYS(munmap), 242 },
>  { SCMP_SYS(mremap), 242 },
> +{ SCMP_SYS(fdatasync), 242 },
> +{ SCMP_SYS(close), 242 },
> +{ SCMP_SYS(rt_sigpending), 242 },
> +{ SCMP_SYS(rt_sigtimedwait), 242 },
> +{ SCMP_SYS(readv), 242 },
> +{ SCMP_SYS(writev), 242 },
> +{ SCMP_SYS(preadv), 242 },
> +{ SCMP_SYS(pwritev), 242 },
> +{ SCMP_SYS(setrlimit), 242 },
> +{ SCMP_SYS(ftruncate), 242 },
> +{ SCMP_SYS(lstat), 242 },
> +{ SCMP_SYS(pipe), 242 },
> +{ SCMP_SYS(umask), 242 },
> +{ SCMP_SYS(chdir), 242 },
> +{ SCMP_SYS(setitimer), 242 },
> +{ SCMP_SYS(setsid), 242 },
> +{ SCMP_SYS(poll), 242 },
> +{ SCMP_SYS(epoll_create), 242 },
> +{ SCMP_S

Re: [Qemu-devel] [PATCH 1/1] exec: make -mem-path filenames deterministic

2012-11-29 Thread Peter Feiner
Ping?

P.S. Please note a typo in the cover letter: when I referred to KSM, I
meant "kernel samepage merging", not "kernel shared memory" :-)


On Tue, Nov 20, 2012 at 7:10 AM, Peter Feiner  wrote:

> This patch makes the -mem-path filenames deterministic and allows some
> control
> over how QEMU mmaps the files. Given this control, QEMU can be used to
> implement
> exogenous memory management techniques quite simply. Two examples follow:
>
> 1. Post-copy migration (mmap=shared for origin, save device state,
> path in
>NFS available on both hosts, mmap=shared for destination).
> 2. memory sharing via VM cloning (mmap=shared for master, save device
> state,
>then mmap=private for clones).
>
> These new features are exposed via arguments to -mem-path:
>
> -mem-path \
> path[,mode=temp|open|create][,mmap=private|shared][,nofallback][,anyfs]
>
> The backing file names are made deterministic by including their RAMBlocks'
> offsets, which are unique given a qemu command line. Note that Xen's live
> migration relies on RAMBlocks having the same offsets between QEMU
> processes
> (see xen_read_physmap). The new file name format is
>
> qemu_back_mem.OFFSET+SIZE.NAME[.RANDOM]
>
> where SIZE and NAME are added for extra sanity checking when mode="open".
>
> By default, the old -mem-path behavior is preserved. I.e., mode="temp" is
> used,
> which adds a random suffix to the deterministic name and unlinks the files.
>
> Signed-off-by: Peter Feiner 
> ---
>  cpu-all.h   |5 
>  exec.c  |   60
> ++
>  qemu-config.c   |   26 +++
>  qemu-options.hx |   24 +++--
>  vl.c|   43 +-
>  5 files changed, 131 insertions(+), 27 deletions(-)
>
> diff --git a/cpu-all.h b/cpu-all.h
> index c9c51b8..a83f71e 100644
> --- a/cpu-all.h
> +++ b/cpu-all.h
> @@ -500,6 +500,11 @@ typedef struct RAMList {
>  extern RAMList ram_list;
>
>  extern const char *mem_path;
> +extern int mem_temp;
> +extern int mem_create;
> +extern int mem_fallback;
> +extern int mem_shared;
> +extern int mem_hugetlbfs;
>  extern int mem_prealloc;
>
>  /* Flags stored in the low bits of the TLB virtual address.  These are
> diff --git a/exec.c b/exec.c
> index 8435de0..75b146c 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2346,7 +2346,7 @@ void qemu_flush_coalesced_mmio_buffer(void)
>
>  #define HUGETLBFS_MAGIC   0x958458f6
>
> -static long gethugepagesize(const char *path)
> +static long getfspagesize(const char *path, int want_hugetlbfs)
>  {
>  struct statfs fs;
>  int ret;
> @@ -2360,7 +2360,7 @@ static long gethugepagesize(const char *path)
>  return 0;
>  }
>
> -if (fs.f_type != HUGETLBFS_MAGIC)
> +if (want_hugetlbfs && fs.f_type != HUGETLBFS_MAGIC)
>  fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
>
>  return fs.f_bsize;
> @@ -2373,17 +2373,15 @@ static void *file_ram_alloc(RAMBlock *block,
>  char *filename;
>  void *area;
>  int fd;
> -#ifdef MAP_POPULATE
>  int flags;
> -#endif
> -unsigned long hpagesize;
> +unsigned long pagesize;
>
> -hpagesize = gethugepagesize(path);
> -if (!hpagesize) {
> +pagesize = getfspagesize(path, mem_hugetlbfs);
> +if (!pagesize) {
>  return NULL;
>  }
>
> -if (memory < hpagesize) {
> +if (memory < pagesize) {
>  return NULL;
>  }
>
> @@ -2392,20 +2390,35 @@ static void *file_ram_alloc(RAMBlock *block,
>  return NULL;
>  }
>
> -if (asprintf(&filename, "%s/qemu_back_mem.XX", path) == -1) {
> +if (asprintf(&filename, "%s/qemu_back_mem.%"PRIx64"+%"PRIx64".%s",
> + path, block->offset, memory, block->mr->name) == -1) {
>  return NULL;
>  }
>
> -fd = mkstemp(filename);
> +if (mem_temp) {
> +if (asprintf(&filename, "%s.XX", filename) == -1) {
> +return NULL;
> +}
> +fd = mkstemp(filename);
> +} else {
> +flags = O_RDWR | O_CLOEXEC | (mem_create ? O_CREAT : 0);
> +fd = open(filename, flags, 0600);
> +}
> +
>  if (fd < 0) {
> -perror("unable to create backing store for hugepages");
> +fprintf(stderr, "unable to create backing store %s "
> +"for guest memory: %s\n", filename, strerror(errno));
>  free(filename);
>  return NULL;
>  }
> -unlink(filename);
> +
> +if (mem_temp) {
> +unlink(filename);
> +}
> +
>  free(filename);
>
> -memory = (memory+hpagesize-1) & ~(hpagesize-1);
> +memory = (memory+pagesize-1) & ~(pagesize-1);
>
>  /*
>   * ftruncate is not supported by hugetlbfs in older
> @@ -2416,16 +2429,16 @@ static void *file_ram_alloc(RAMBlock *block,
>  if (ftruncate(fd, memory))
>  perror("ftruncate");
>
> +flags = mem_shared ? MAP_SHARED : MAP_PRIVATE;
>  #ifdef MAP_POPULATE
>  /* NB: M

Re: [Qemu-devel] [PATCH v2 0/6] block: bdrv_img_create(): propagate errors

2012-11-29 Thread Luiz Capitulino
On Fri, 2 Nov 2012 11:42:32 -0200
Luiz Capitulino  wrote:

> On Fri, 02 Nov 2012 14:40:03 +0100
> Kevin Wolf  wrote:
> 
> > Am 02.11.2012 14:25, schrieb Luiz Capitulino:
> > > On Fri, 19 Oct 2012 11:27:59 -0300
> > > Luiz Capitulino  wrote:
> > > 
> > >> By adding error propagation to bdrv_img_create() we improve error 
> > >> reporting
> > >> in qmp_transaction() and simplify qemu-img.c:img_create() a bit.
> > >>
> > >> Please, check individual patches for details.
> > > 
> > > Kevin, is this in your review queue?
> > 
> > Yes, it is. With KVM Forum and lots of other patch series, no promises
> > though.
> 
> Sure, just wanted to know if you were aware about it.

Still in your queue? :)



[Qemu-devel] [PATCH 3/3] hw/arm_gic_common: Correct GICC_PMR reset value for newer GICs

2012-11-29 Thread Peter Maydell
The GIC architecture specification for v1 and v2 GICs (as found
on the Cortex-A9 and newer) states that the GICC_PMR reset value
is zero; this differs from the 0xf0 reset value used on 11MPCore.
The NVIC is different again in not having a CPU interface; since
we share the GIC code we must force the priority mask field to
allow through all interrupts.

Signed-off-by: Peter Maydell 
---
 hw/arm_gic_common.c |6 +-
 hw/armv7m_nvic.c|4 +++-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/hw/arm_gic_common.c b/hw/arm_gic_common.c
index 8369309..73ae331 100644
--- a/hw/arm_gic_common.c
+++ b/hw/arm_gic_common.c
@@ -127,7 +127,11 @@ static void arm_gic_common_reset(DeviceState *dev)
 int i;
 memset(s->irq_state, 0, GIC_MAXIRQ * sizeof(gic_irq_state));
 for (i = 0 ; i < s->num_cpu; i++) {
-s->priority_mask[i] = 0xf0;
+if (s->revision == REV_11MPCORE) {
+s->priority_mask[i] = 0xf0;
+} else {
+s->priority_mask[i] = 0;
+}
 s->current_pending[i] = 1023;
 s->running_irq[i] = 1023;
 s->running_priority[i] = 0x100;
diff --git a/hw/armv7m_nvic.c b/hw/armv7m_nvic.c
index f0a2e7b..4963678 100644
--- a/hw/armv7m_nvic.c
+++ b/hw/armv7m_nvic.c
@@ -455,9 +455,11 @@ static void armv7m_nvic_reset(DeviceState *dev)
 nc->parent_reset(dev);
 /* Common GIC reset resets to disabled; the NVIC doesn't have
  * per-CPU interfaces so mark our non-existent CPU interface
- * as enabled by default.
+ * as enabled by default, and with a priority mask which allows
+ * all interrupts through.
  */
 s->gic.cpu_enabled[0] = 1;
+s->gic.priority_mask[0] = 0x100;
 /* The NVIC as a whole is always enabled. */
 s->gic.enabled = 1;
 systick_reset(s);
-- 
1.7.9.5




[Qemu-devel] [PATCH 1.3?] multiboot: fix e801 memory map

2012-11-29 Thread Paolo Bonzini
The e801 memory sizes in the multiboot structures hard-code the available
low memory to 640.  However, the value should not include the size of the
EBDA.  Fill the value in the option ROM, getting the size of low memory
from the BIOS.

Cc: Alexander Graf 
Signed-off-by: Paolo Bonzini 
---
Alex, can you ack this patch for 1.3?

 pc-bios/multiboot.bin | Bin 1024 -> 1024 bytes
 pc-bios/optionrom/multiboot.S |   7 +++
 2 files changed, 7 insertions(+)

diff --git a/pc-bios/multiboot.bin b/pc-bios/multiboot.bin
index 
f74a6e142fddc054d7f40ab346a108532afac40f..7b3c1745a430ea5e0e15b9aa817d1cbbaa40db14
 100644
GIT binary patch
delta 81
zcmZqRXyBNj#q7o8KT+38L4+xd@o;KdryK*rZVrajPB|8aZaEhwAcKt|ty|7*VtW(k
kS)sHUDQSyY7$$Qv{%2ueV91@!$z;eVv)P(y2P2~%0BkrF9smFU

delta 72
zcmZqRXyBNj#capqJWgJByEE-zoLs=f!*~_|qXQE}

diff --git a/pc-bios/optionrom/multiboot.S b/pc-bios/optionrom/multiboot.S
index f08222a..003bcfb 100644
--- a/pc-bios/optionrom/multiboot.S
+++ b/pc-bios/optionrom/multiboot.S
@@ -75,6 +75,13 @@ run_multiboot:
shr $4, %eax
mov %ax, %fs
 
+   /* Account for the EBDA in the multiboot structure's e801
+* map.
+*/
+   int $0x12
+   cwtl
+   movl%eax, %fs:4
+
/* ES = mmap_addr */
mov %fs:48, %eax
shr $4, %eax
-- 
1.8.0




[Qemu-devel] [PATCH] multiboot: fix e801 memory map

2012-11-29 Thread Paolo Bonzini
The e801 memory sizes in the multiboot structures hard-code the available
low memory to 640.  However, the value should not include the size of the
EBDA.  Fill the value in the option ROM, getting the size of low memory
from the BIOS.

Cc: Alexander Graf 
Signed-off-by: Paolo Bonzini 
---
 pc-bios/multiboot.bin | Bin 1024 -> 1024 bytes
 pc-bios/optionrom/multiboot.S |   7 +++
 2 files changed, 7 insertions(+)

diff --git a/pc-bios/multiboot.bin b/pc-bios/multiboot.bin
index 
f74a6e142fddc054d7f40ab346a108532afac40f..7b3c1745a430ea5e0e15b9aa817d1cbbaa40db14
 100644
GIT binary patch
delta 81
zcmZqRXyBNj#q7o8KT+38L4+xd@o;KdryK*rZVrajPB|8aZaEhwAcKt|ty|7*VtW(k
kS)sHUDQSyY7$$Qv{%2ueV91@!$z;eVv)P(y2P2~%0BkrF9smFU

delta 72
zcmZqRXyBNj#capqJWgJByEE-zoLs=f!*~_|qXQE}

diff --git a/pc-bios/optionrom/multiboot.S b/pc-bios/optionrom/multiboot.S
index f08222a..003bcfb 100644
--- a/pc-bios/optionrom/multiboot.S
+++ b/pc-bios/optionrom/multiboot.S
@@ -75,6 +75,13 @@ run_multiboot:
shr $4, %eax
mov %ax, %fs
 
+   /* Account for the EBDA in the multiboot structure's e801
+* map.
+*/
+   int $0x12
+   cwtl
+   movl%eax, %fs:4
+
/* ES = mmap_addr */
mov %fs:48, %eax
shr $4, %eax
-- 
1.8.0




[Qemu-devel] [PATCH 0/3] ARM: fix secondary boot GIC init, GIC bugs

2012-11-29 Thread Peter Maydell
The secondary CPU boot code we use on ARM had a couple of
places where it was accidentally relying on bugs or implementation
dependent behaviour of QEMU's on GIC implementation:
 * we weren't initialising the GICC_PMR priority mask, which
   in a correct v1 or v2 GIC is set to mask out all interrupts
   from reset. This worked on the QEMU GIC because our GIC (a)
   gets the reset value of PMR wrong on non-11MPCore and (b) is
   doing an incorrect comparison against the PMR value when
   delivering interrupts anyway.
 * no barrier between initialising the GIC and doing a WFI; this
   is fine for TCG QEMU but could potentially result in the GIC
   config not being guaranteed to have happened before we hit the
   WFI when running on real CPU hardware under ARM KVM.

This patch series first fixes the secondary CPU boot code bugs,
and then corrects our GIC model to match the specs.

NB: I don't have a working test setup/images for highbank or
exynos4 so those changes are only compile tested, but they are
basically the same as the generic boot code changes.

Peter Maydell (3):
  hw/arm_boot, exynos4210, highbank: Fix secondary boot GIC init
  hw/arm_gic: Fix comparison with priority mask register
  hw/arm_gic_common: Correct GICC_PMR reset value for newer GICs

 hw/arm_boot.c   |   17 ++---
 hw/arm_gic.c|2 +-
 hw/arm_gic_common.c |6 +-
 hw/armv7m_nvic.c|4 +++-
 hw/exynos4210.c |   10 +++---
 hw/highbank.c   |7 +--
 6 files changed, 35 insertions(+), 11 deletions(-)

-- 
1.7.9.5




[Qemu-devel] [PATCH 2/3] hw/arm_gic: Fix comparison with priority mask register

2012-11-29 Thread Peter Maydell
The GIC spec states that only interrupts with higher priority
than the value in the GICC_PMR priority mask register are
passed through to the processor. We were incorrectly allowing
through interrupts with a priority equal to the specified
value: correct the comparison operation to match the spec.

Signed-off-by: Peter Maydell 
---
 hw/arm_gic.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/arm_gic.c b/hw/arm_gic.c
index f9e423f..672d539 100644
--- a/hw/arm_gic.c
+++ b/hw/arm_gic.c
@@ -73,7 +73,7 @@ void gic_update(GICState *s)
 }
 }
 level = 0;
-if (best_prio <= s->priority_mask[cpu]) {
+if (best_prio < s->priority_mask[cpu]) {
 s->current_pending[cpu] = best_irq;
 if (best_prio < s->running_priority[cpu]) {
 DPRINTF("Raised pending IRQ %d\n", best_irq);
-- 
1.7.9.5




[Qemu-devel] [PATCH 1/3] hw/arm_boot, exynos4210, highbank: Fix secondary boot GIC init

2012-11-29 Thread Peter Maydell
Fix the code in the secondary CPU boot stubs so that it correctly
initialises the GIC rather than relying on bugs or implementation
dependent aspects of the QEMU GIC implementation:
 * set the GIC_PMR.Priority field to all-ones, so that all
   interrupts are passed through. The default of all-zeroes
   means all interrupts are masked, and QEMU only booted because
   of a bug in the priority masking in our GIC implementation.
 * add a barrier after GIC setup and before WFI to ensure that
   GIC config is complete before we go into a possible low power
   state. This isn't needed with the software GIC model but could
   be required when using KVM and executing this code on the
   real hardware CPU.

Note that of the three secondary stub implementations, only
the common generic one needs to support both v6 and v7 DSB
encodings; highbank and exynos4210 will always be v7 CPUs.

Signed-off-by: Peter Maydell 
---
 hw/arm_boot.c   |   17 ++---
 hw/exynos4210.c |   10 +++---
 hw/highbank.c   |7 +--
 3 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/hw/arm_boot.c b/hw/arm_boot.c
index 92e2cab..ec3b8d5 100644
--- a/hw/arm_boot.c
+++ b/hw/arm_boot.c
@@ -44,11 +44,17 @@ static uint32_t bootloader[] = {
  * for an interprocessor interrupt and polling a configurable
  * location for the kernel secondary CPU entry point.
  */
+#define DSB_INSN 0xf57ff04f
+#define CP15_DSB_INSN 0xee070f9a /* mcr cp15, 0, r0, c7, c10, 4 */
+
 static uint32_t smpboot[] = {
-  0xe59f201c, /* ldr r2, gic_cpu_if */
-  0xe59f001c, /* ldr r0, startaddr */
+  0xe59f2028, /* ldr r2, gic_cpu_if */
+  0xe59f0028, /* ldr r0, startaddr */
   0xe3a01001, /* mov r1, #1 */
-  0xe5821000, /* str r1, [r2] */
+  0xe5821000, /* str r1, [r2] - set GICC_CTLR.Enable */
+  0xe3a010ff, /* mov r1, #0xff */
+  0xe5821004, /* str r1, [r2, 4] - set GIC_PMR.Priority to 0xff */
+  DSB_INSN,   /* dsb */
   0xe320f003, /* wfi */
   0xe5901000, /* ldr r1, [r0] */
   0xe1110001, /* tst r1, r1 */
@@ -65,6 +71,11 @@ static void default_write_secondary(ARMCPU *cpu,
 smpboot[ARRAY_SIZE(smpboot) - 1] = info->smp_bootreg_addr;
 smpboot[ARRAY_SIZE(smpboot) - 2] = info->gic_cpu_if_addr;
 for (n = 0; n < ARRAY_SIZE(smpboot); n++) {
+/* Replace DSB with the pre-v7 DSB if necessary. */
+if (!arm_feature(&cpu->env, ARM_FEATURE_V7) &&
+smpboot[n] == DSB_INSN) {
+smpboot[n] = CP15_DSB_INSN;
+}
 smpboot[n] = tswap32(smpboot[n]);
 }
 rom_add_blob_fixed("smpboot", smpboot, sizeof(smpboot),
diff --git a/hw/exynos4210.c b/hw/exynos4210.c
index 00d4db8..22148cd 100644
--- a/hw/exynos4210.c
+++ b/hw/exynos4210.c
@@ -80,12 +80,16 @@ void exynos4210_write_secondary(ARMCPU *cpu,
 {
 int n;
 uint32_t smpboot[] = {
-0xe59f3024, /* ldr r3, External gic_cpu_if */
-0xe59f2024, /* ldr r2, Internal gic_cpu_if */
-0xe59f0024, /* ldr r0, startaddr */
+0xe59f3034, /* ldr r3, External gic_cpu_if */
+0xe59f2034, /* ldr r2, Internal gic_cpu_if */
+0xe59f0034, /* ldr r0, startaddr */
 0xe3a01001, /* mov r1, #1 */
 0xe5821000, /* str r1, [r2] */
 0xe5831000, /* str r1, [r3] */
+0xe3a010ff, /* mov r1, #0xff */
+0xe5821004, /* str r1, [r2, #4] */
+0xe5831004, /* str r1, [r3, #4] */
+0xf57ff04f, /* dsb */
 0xe320f003, /* wfi */
 0xe5901000, /* ldr r1, [r0] */
 0xe1110001, /* tst r1, r1 */
diff --git a/hw/highbank.c b/hw/highbank.c
index afbb005..447e57d 100644
--- a/hw/highbank.c
+++ b/hw/highbank.c
@@ -44,9 +44,12 @@ static void hb_write_secondary(ARMCPU *cpu, const struct 
arm_boot_info *info)
 0xe21f, /* ands r0, r0, #0x0f */
 0xe3a03040, /* mov r3, #0x40 - jump address is 0x40 + 0x10 * core id */
 0xe0830200, /* add r0, r3, r0, lsl #4 */
-0xe59f2018, /* ldr r2, privbase */
+0xe59f2024, /* ldr r2, privbase */
 0xe3a01001, /* mov r1, #1 */
-0xe5821100, /* str r1, [r2, #256] */
+0xe5821100, /* str r1, [r2, #256] - set GICC_CTLR.Enable */
+0xe3a010ff, /* mov r1, #0xff */
+0xe5821104, /* str r1, [r2, #260] - set GICC_PMR.Priority to 0xff */
+0xf57ff04f, /* dsb */
 0xe320f003, /* wfi */
 0xe5901000, /* ldr r1, [r0] */
 0xe1110001, /* tst r1, r1 */
-- 
1.7.9.5




[Qemu-devel] [PATCH v2 05/10] qemu-ga: build_fs_mount_list(): take an Error argument

2012-11-29 Thread Luiz Capitulino
Signed-off-by: Luiz Capitulino 
---

o v2: use error_setg() in build_fs_mount_list()

 qga/commands-posix.c | 30 --
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index b3a3f26..ff14174 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -371,7 +371,7 @@ static void free_fs_mount_list(FsMountList *mounts)
 /*
  * Walk the mount table and build a list of local file systems
  */
-static int build_fs_mount_list(FsMountList *mounts)
+static void build_fs_mount_list(FsMountList *mounts, Error **err)
 {
 struct mntent *ment;
 FsMount *mount;
@@ -380,8 +380,8 @@ static int build_fs_mount_list(FsMountList *mounts)
 
 fp = setmntent(mtab, "r");
 if (!fp) {
-g_warning("fsfreeze: unable to read mtab");
-return -1;
+error_setg(err, "failed to open mtab file: '%s'", mtab);
+return;
 }
 
 while ((ment = getmntent(fp))) {
@@ -405,8 +405,6 @@ static int build_fs_mount_list(FsMountList *mounts)
 }
 
 endmntent(fp);
-
-return 0;
 }
 #endif
 
@@ -433,15 +431,17 @@ int64_t qmp_guest_fsfreeze_freeze(Error **err)
 int ret = 0, i = 0;
 FsMountList mounts;
 struct FsMount *mount;
+Error *local_err = NULL;
 int fd;
 char err_msg[512];
 
 slog("guest-fsfreeze called");
 
 QTAILQ_INIT(&mounts);
-ret = build_fs_mount_list(&mounts);
-if (ret < 0) {
-return ret;
+build_fs_mount_list(&mounts, &local_err);
+if (error_is_set(&local_err)) {
+error_propagate(err, local_err);
+return -1;
 }
 
 /* cannot risk guest agent blocking itself on a write in this state */
@@ -498,12 +498,12 @@ int64_t qmp_guest_fsfreeze_thaw(Error **err)
 FsMountList mounts;
 FsMount *mount;
 int fd, i = 0, logged;
+Error *local_err = NULL;
 
 QTAILQ_INIT(&mounts);
-ret = build_fs_mount_list(&mounts);
-if (ret) {
-error_set(err, QERR_QGA_COMMAND_FAILED,
-  "failed to enumerate filesystems");
+build_fs_mount_list(&mounts, &local_err);
+if (error_is_set(&local_err)) {
+error_propagate(err, local_err);
 return 0;
 }
 
@@ -568,6 +568,7 @@ void qmp_guest_fstrim(bool has_minimum, int64_t minimum, 
Error **err)
 FsMountList mounts;
 struct FsMount *mount;
 int fd;
+Error *local_err = NULL;
 char err_msg[512];
 struct fstrim_range r = {
 .start = 0,
@@ -578,8 +579,9 @@ void qmp_guest_fstrim(bool has_minimum, int64_t minimum, 
Error **err)
 slog("guest-fstrim called");
 
 QTAILQ_INIT(&mounts);
-ret = build_fs_mount_list(&mounts);
-if (ret < 0) {
+build_fs_mount_list(&mounts, &local_err);
+if (error_is_set(&local_err)) {
+error_propagate(err, local_err);
 return;
 }
 
-- 
1.8.0




Re: [Qemu-devel] [PATCH v2 2/2] ivshmem: use irqfd to interrupt among VMs

2012-11-29 Thread Cam Macdonell
On Thu, Nov 29, 2012 at 1:34 AM, liu ping fan  wrote:
> On Thu, Nov 29, 2012 at 12:42 PM, Cam Macdonell  wrote:
>> On Tue, Nov 27, 2012 at 7:53 PM, liu ping fan  wrote:
>>> On Wed, Nov 28, 2012 at 5:48 AM, Cam Macdonell  wrote:
 On Sat, Nov 24, 2012 at 8:51 PM, Liu Ping Fan  wrote:
> From: Liu Ping Fan 
>
> Using irqfd, so we can avoid switch between kernel and user when
> VMs interrupts each other.

 Nice work.  Due to a hardware failure, there will be a small delay in
 me being able to test this.  I'll follow up as soon as I can.

>>> BTW where can I find the latest guest code for test?
>>> I got the guest code from git://gitorious.org/nahanni/guest-code.git.
>>> But it seems outdated, after fixing the unlocked_ioctl, and vm-id bits
>>> position (the codes conflict with ivshmem spec), it works (I have
>>> tested for V1).
>>
>> Hello,
>>
>> Which device driver are you using?
>>
> guest-code/kernel_module/standard/kvm_ivshmem.c

The uio driver is the recommended one, however if you want to use the
kvm_ivshmem one and have it working, then feel free to continue.

I had deleted it form the repo, but some users had based solutions off
it, so I added it back.

btw, my hardware issue has been resolved, so I'll get to testing your
patch soon.

Sincerely,
Cam

>
>> Cam
>>
>>>
>>> Regards,
>>> Pingfan
>>>
>
> Signed-off-by: Liu Ping Fan 
> ---
>  hw/ivshmem.c |   54 
> +-
>  1 files changed, 53 insertions(+), 1 deletions(-)
>
> diff --git a/hw/ivshmem.c b/hw/ivshmem.c
> index 7c8630c..5709e89 100644
> --- a/hw/ivshmem.c
> +++ b/hw/ivshmem.c
> @@ -19,6 +19,7 @@
>  #include "hw.h"
>  #include "pc.h"
>  #include "pci.h"
> +#include "msi.h"
>  #include "msix.h"
>  #include "kvm.h"
>  #include "migration.h"
> @@ -83,6 +84,7 @@ typedef struct IVShmemState {
>  uint32_t vectors;
>  uint32_t features;
>  EventfdEntry *eventfd_table;
> +int *vector_virqs;
>
>  Error *migration_blocker;
>
> @@ -625,16 +627,62 @@ static int ivshmem_load(QEMUFile* f, void *opaque, 
> int version_id)
>  return 0;
>  }
>
> +static int ivshmem_vector_use(PCIDevice *dev, unsigned vector,
> + MSIMessage msg)
> +{
> +IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
> +int virq;
> +EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
> +
> +virq = kvm_irqchip_add_msi_route(kvm_state, msg);
> +if (virq >= 0 && kvm_irqchip_add_irqfd_notifier(kvm_state, n, virq) 
> >= 0) {
> +s->vector_virqs[vector] = virq;
> +qemu_chr_add_handlers(s->eventfd_chr[vector], NULL, NULL, NULL, 
> NULL);
> +} else if (virq >= 0) {
> +kvm_irqchip_release_virq(kvm_state, virq);
> +error_report("ivshmem, can not setup irqfd\n");
> +return -1;
> +} else {
> +error_report("ivshmem, no enough msi route to setup irqfd\n");
> +return -1;
> +}
> +
> +return 0;
> +}
> +
> +static void ivshmem_vector_release(PCIDevice *dev, unsigned vector)
> +{
> +IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
> +EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
> +int virq = s->vector_virqs[vector];
> +
> +if (s->vector_virqs[vector] >= 0) {
> +kvm_irqchip_remove_irqfd_notifier(kvm_state, n, virq);
> +kvm_irqchip_release_virq(kvm_state, virq);
> +s->vector_virqs[vector] = -1;
> +}
> +}
> +
>  static void ivshmem_write_config(PCIDevice *pci_dev, uint32_t address,
>  uint32_t val, int len)
>  {
> +bool is_enabled, was_enabled = msi_enabled(pci_dev);
> +
>  pci_default_write_config(pci_dev, address, val, len);
> +is_enabled = msi_enabled(pci_dev);
> +if (!was_enabled && is_enabled) {
> +msix_set_vector_notifiers(pci_dev, ivshmem_vector_use,
> +ivshmem_vector_release);
> +} else if (was_enabled && !is_enabled) {
> +msix_unset_vector_notifiers(pci_dev);
> +}
>  }
>
>  static int pci_ivshmem_init(PCIDevice *dev)
>  {
>  IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
>  uint8_t *pci_conf;
> +int i;
>
>  if (s->sizearg == NULL)
>  s->ivshmem_size = 4 << 20; /* 4 MB default */
> @@ -758,7 +806,10 @@ static int pci_ivshmem_init(PCIDevice *dev)
>  }
>
>  s->dev.config_write = ivshmem_write_config;
> -
> +s->vector_virqs = g_new0(int, s->vectors);
> +for (i = 0; i < s->vectors; i++) {
> +s->vector_virqs[i] = -1;
> +}
>  return 0;
>  }
>
> @@ -770,6 +821

Re: [Qemu-devel] [PATCH V7 0/6] VMXNET3 paravirtual NIC device implementation

2012-11-29 Thread Stefan Hajnoczi
On Fri, Nov 16, 2012 at 2:55 PM, Dmitry Fleytman  wrote:
>   We didn't succeed to find any guide or sample for the
>   king of tests required (packets transmission).
>   If someone can provide a pointer to the relevant
>   information we'll be very grateful.

QEMU does not have a NIC test suite.  You can use manual testing of
guests with vmxnet3 drivers to see if the NIC works as expected.
Maybe you could even try running WHQL tests on the VMXNET3 driver
inside a Windows guest - I'm not sure, I've never looked into WHQL.

The community has been developing the qtest framework which is for
device-level testing.  For example, you could poke hardware registers
and check that the correct interrupt gets raised.  However, this
framework is being extended to cover busses and hardware as developers
decide they want test coverage.  PCI support isn't available yet but
Anthony Liguori recently worked on qtest PCI code.



Re: [Qemu-devel] [PATCH 15/37] target-i386: set default value of "hypervisor" feature using static property

2012-11-29 Thread Igor Mammedov
On Thu, 29 Nov 2012 13:47:37 -0200
Eduardo Habkost  wrote:

> On Thu, Nov 29, 2012 at 03:56:09PM +0100, Igor Mammedov wrote:
> > On Fri, 9 Nov 2012 16:48:07 +0100
> > Eduardo Habkost  wrote:
> > 
> > > 
> > > On 22/10/2012, at 17:03, Igor Mammedov  wrote:
> > > 
> > > > Signed-off-by: Igor Mammedov 
> > > > ---
> > > > target-i386/cpu.c | 9 +++--
> > > > 1 file changed, 3 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> > > > index 3131945..dc4fcdf 100644
> > > > --- a/target-i386/cpu.c
> > > > +++ b/target-i386/cpu.c
> > > > @@ -174,7 +174,7 @@ static Property cpu_x86_properties[] = {
> > > > DEFINE_PROP_BIT("f-xsave", X86CPU, env.cpuid_ext_features, 26,
> > > > false), DEFINE_PROP_BIT("f-osxsave", X86CPU, env.cpuid_ext_features,
> > > > 27, false), DEFINE_PROP_BIT("f-avx", X86CPU, env.cpuid_ext_features,
> > > > 28, false),
> > > > -DEFINE_PROP_BIT("f-hypervisor", X86CPU, env.cpuid_ext_features,
> > > > 31, false),
> > > > +DEFINE_PROP_BIT("f-hypervisor", X86CPU, env.cpuid_ext_features,
> > > > 31, true), DEFINE_PROP_BIT("f-syscall", X86CPU,
> > > > env.cpuid_ext2_features, 11, false), DEFINE_PROP_BIT("f-nx", X86CPU,
> > > > env.cpuid_ext2_features, 20, false), DEFINE_PROP_BIT("f-xd", X86CPU,
> > > > env.cpuid_ext2_features, 20, false), @@ -1307,11 +1307,12 @@ static
> > > > int cpu_x86_find_by_name(X86CPU *cpu, x86_def_t *x86_cpu_def, {
> > > > unsigned int i;
> > > > x86_def_t *def;
> > > > +CPUX86State *env = &cpu->env;
> > > > 
> > > > char *s = g_strdup(cpu_model);
> > > > char *featurestr, *name = strtok(s, ",");
> > > > /* Features to be added*/
> > > > -uint32_t plus_features = 0, plus_ext_features = 0;
> > > > +uint32_t plus_features = 0, plus_ext_features =
> > > > env->cpuid_ext_features;
> > > 
> > > Moving data back and forth between CPUX86State and x86_def_t makes the
> > > initialization ordering confusing (today data is moved from x86_def_t to
> > > X86CPU, and never the other way around).
> > > 
> > > As this code is removed in the next patches, I don't mind too much, but
> > > maybe it's simpler to implement this change only after the "use static
> > > properties for setting cpuid features" patch?
> > It won't eliminate moving data back and forth between CPUX86State and
> > x86_def_t until CPU sub-classes (i.e. when x86_def_t isn't used anymore),
> > if defaults from static properties are used.
> 
> You don't need CPU subclasses to reduce back-and-forth data movement.
> We can just reorder code to look like:
> 
>  split_cpu_model_string(cpu_model, &model, &features);
>  fill_cpudef_for_model_somehow(model, &cpudef);
somewhere between here and [2], One still needs to enable "hypervisor"
feature, and set kvm_features defaults. I guess "hypervisor" could be pushed
out into x86_def_t for each model and kvm_cpu_fill_host(). But it would be
hard to do so for KVM features (see how pv_eoi is set).

see 8d239fbb9b, ded9f32153, a7bb291e5b, 2623e2f876, bd539e14ef in
https://github.com/imammedo/qemu/commits/x86-cpu-properties.WIP

>  cpudef_2_x86_cpu(cpu, &cpudef);
As alternative it's possible to merge (OR) ext_features and kvm_features from
cpudef into env inside of cpudef_2_x86_cpu() to keep defaults SET by static
properties when cpu obj was created.

[2]

This merge won't disappear completely until cpudefs are converted into
classes and X_class_init() will set all defaults for a given cpu model.

and common defaults could/should be set by parent X86CPU class_init(),
and ideally rewrite should end up in a inheritance tree of subclasses, each
adding new features relative to previous model.

>  cpu_parse_feature_string(cpu, features);
> 
> Instead of what we have today, that is:
> 
>  split_cpu_model_string(cpu_model, &model, &features);
>  fill_cpudef_for_model_somehow(model, &cpudef);
>  cpu_parse_feature_string(cpu, features);
>  /* (the 3 steps above are all done inside cpu_x86_find_by_name()) */
>  cpudef_2_x86_cpu(cpu, &cpudef);
> 
> You did that on patch 32/37, but I would be interesting to do that
> _before_ introducing the whole new CPU properties code, to make review
> easier.
> 
> Reordering the steps to make the code less confusing before introducing
> the CPU properties makes the code easier to review, that's why I sent
> the CPU init cleanup series. And it makes us one step closer to
> completely eliminate the X86CPUDefinition struct in the future.
> 
> I actually made the reordering change above using the CPU init series as
> base, and it was very easy to do, after the CPU init was cleaned up. You
> can see the result at:
> https://github.com/ehabkost/qemu-hacks/commit/5256503135e787cb3550092c24dbeb3c5fc5a747
After some playing, I've abandoned approach to change
cpu_x86_find_by_name() and opted in favor of not touching it and feature
string parsing much, except of doing almost the same split as you. As
additional thing, I've moved setting defaults out of it as well.

> (it's completely 

Re: [Qemu-devel] [PATCH]target-i386:cpu_x86_init():do clean up work

2012-11-29 Thread Eduardo Habkost
On Wed, Nov 28, 2012 at 02:20:23PM +0800, liguang wrote:
> 1.remove unused variable env

It's not unused. You are removing the line that sets env->cpu_model_str.

> 2.remove redundant error handling
> 
> Signed-off-by: liguang 
> ---
>  target-i386/helper.c |   17 -
>  1 files changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/target-i386/helper.c b/target-i386/helper.c
> index bf206cf..5686130 100644
> --- a/target-i386/helper.c
> +++ b/target-i386/helper.c
> @@ -1243,25 +1243,24 @@ int cpu_x86_get_descr_debug(CPUX86State *env, 
> unsigned int selector,
>  X86CPU *cpu_x86_init(const char *cpu_model)
>  {
>  X86CPU *cpu;
> -CPUX86State *env;
>  Error *error = NULL;
>  
>  cpu = X86_CPU(object_new(TYPE_X86_CPU));
> -env = &cpu->env;
> -env->cpu_model_str = cpu_model;

Why did you remove this line?

>  
> -if (cpu_x86_register(cpu, cpu_model) < 0) {
> -object_delete(OBJECT(cpu));
> -return NULL;
> -}
> +if (cpu_x86_register(cpu, cpu_model) < 0)
> +   goto error_out;

You are mixing tabs and spaces here, and below:

>  
>  x86_cpu_realize(OBJECT(cpu), &error);
>  if (error) {
>  error_free(error);
> -object_delete(OBJECT(cpu));
> -return NULL;
> + goto error_out;

(here)

>  }
> +
>  return cpu;
> +
> + error_out:
> + object_delete(OBJECT(cpu));
> + return NULL;

(and here)

>  }
>  
>  #if !defined(CONFIG_USER_ONLY)
> -- 
> 1.7.2.5
> 

-- 
Eduardo



Re: [Qemu-devel] [PATCH 1.3?] multiboot: fix e801 memory map

2012-11-29 Thread Alexander Graf

On 29.11.2012, at 18:11, Paolo Bonzini wrote:

> The e801 memory sizes in the multiboot structures hard-code the available
> low memory to 640.  However, the value should not include the size of the
> EBDA.  Fill the value in the option ROM, getting the size of low memory
> from the BIOS.
> 
> Cc: Alexander Graf 
> Signed-off-by: Paolo Bonzini 
> ---
>Alex, can you ack this patch for 1.3?

Do you have a test case (OS that fails for example)?


Alex

> 
> pc-bios/multiboot.bin | Bin 1024 -> 1024 bytes
> pc-bios/optionrom/multiboot.S |   7 +++
> 2 files changed, 7 insertions(+)
> 
> diff --git a/pc-bios/multiboot.bin b/pc-bios/multiboot.bin
> index 
> f74a6e142fddc054d7f40ab346a108532afac40f..7b3c1745a430ea5e0e15b9aa817d1cbbaa40db14
>  100644
> GIT binary patch
> delta 81
> zcmZqRXyBNj#q7o8KT+38L4+xd@o;KdryK*rZVrajPB|8aZaEhwAcKt|ty|7*VtW(k
> kS)sHUDQSyY7$$Qv{%2ueV91@!$z;eVv)P(y2P2~%0BkrF9smFU
> 
> delta 72
> zcmZqRXyBNj#capqJW bA;$lVDU(H+3>gJByEE-zoLs=f!*~_|qXQE}
> 
> diff --git a/pc-bios/optionrom/multiboot.S b/pc-bios/optionrom/multiboot.S
> index f08222a..003bcfb 100644
> --- a/pc-bios/optionrom/multiboot.S
> +++ b/pc-bios/optionrom/multiboot.S
> @@ -75,6 +75,13 @@ run_multiboot:
>   shr $4, %eax
>   mov %ax, %fs
> 
> + /* Account for the EBDA in the multiboot structure's e801
> +  * map.
> +  */
> + int $0x12
> + cwtl
> + movl%eax, %fs:4
> +
>   /* ES = mmap_addr */
>   mov %fs:48, %eax
>   shr $4, %eax
> -- 
> 1.8.0
> 




Re: [Qemu-devel] [PATCH 1.3?] multiboot: fix e801 memory map

2012-11-29 Thread Alexander Graf

On 29.11.2012, at 18:11, Paolo Bonzini wrote:

> The e801 memory sizes in the multiboot structures hard-code the available
> low memory to 640.  However, the value should not include the size of the
> EBDA.  Fill the value in the option ROM, getting the size of low memory
> from the BIOS.

The description mentions that instead of hard coding, you fetch the value 
dynamically via a BIOS call. However, I don't see the code that hard codes it 
changed. Where is that one?


Alex

> 
> Cc: Alexander Graf 
> Signed-off-by: Paolo Bonzini 
> ---
>Alex, can you ack this patch for 1.3?
> 
> pc-bios/multiboot.bin | Bin 1024 -> 1024 bytes
> pc-bios/optionrom/multiboot.S |   7 +++
> 2 files changed, 7 insertions(+)
> 
> diff --git a/pc-bios/multiboot.bin b/pc-bios/multiboot.bin
> index 
> f74a6e142fddc054d7f40ab346a108532afac40f..7b3c1745a430ea5e0e15b9aa817d1cbbaa40db14
>  100644
> GIT binary patch
> delta 81
> zcmZqRXyBNj#q7o8KT+38L4+xd@o;KdryK*rZVrajPB|8aZaEhwAcKt|ty|7*VtW(k
> kS)sHUDQSyY7$$Qv{%2ueV91@!$z;eVv)P(y2P2~%0BkrF9smFU
> 
> delta 72
> zcmZqRXyBNj#capqJW bA;$lVDU(H+3>gJByEE-zoLs=f!*~_|qXQE}
> 
> diff --git a/pc-bios/optionrom/multiboot.S b/pc-bios/optionrom/multiboot.S
> index f08222a..003bcfb 100644
> --- a/pc-bios/optionrom/multiboot.S
> +++ b/pc-bios/optionrom/multiboot.S
> @@ -75,6 +75,13 @@ run_multiboot:
>   shr $4, %eax
>   mov %ax, %fs
> 
> + /* Account for the EBDA in the multiboot structure's e801
> +  * map.
> +  */
> + int $0x12
> + cwtl
> + movl%eax, %fs:4
> +
>   /* ES = mmap_addr */
>   mov %fs:48, %eax
>   shr $4, %eax
> -- 
> 1.8.0
> 




Re: [Qemu-devel] tap devices not receiving packets from a bridge

2012-11-29 Thread Peter Lieven
Am 23.11.2012 08:02, schrieb Stefan Hajnoczi:
> On Thu, Nov 22, 2012 at 03:29:52PM +0100, Peter Lieven wrote:
>> is anyone aware of a problem with the linux network bridge that in very rare 
>> circumstances stops
>> a bridge from sending pakets to a tap device?
>>
>> My problem occurs in conjunction with vanilla qemu-kvm-1.2.0 and Ubuntu 
>> Kernel 3.2.0-34.53
>> which is based on Linux 3.2.33.
>>
>> I was not yet able to reproduce the issue, it happens in really rare cases. 
>> The symptom is that
>> the tap does not have any TX packets. RX is working fine. I see the packets 
>> coming in at
>> the physical interface on the host, but they are not forwarded to the tap 
>> interface.
>> The bridge itself has learnt the mac address of the vServer that is 
>> connected to the tap interface.
>> It does not help to toggle the bridge link status,  the tap interface status 
>> or the interface in the vServer.
>> It seems that problem occurs if a tap interface that has previously been 
>> used, but set to nonpersistent
>> is set persistent again and then is by chance assigned to the same vServer 
>> (=same mac address on same
>> bridge) again. Unfortunately it seems not to be reproducible.
> 
> Not sure but this patch from Michael Tsirkin may help - it solves an
> issue with persistent tap devices:
> 
> http://patchwork.ozlabs.org/patch/198598/

I have tried that patch (even if I do not use persistant taps), but it doesn't 
help.

What I can say now is that if a VM is not working with a tap - lets say tap10 
then
it will not work with tap10, even if the vm is shut down. tap10 is set to 
non-persistant.
then the vm is started again, assigned occasionally again tap10 and is not 
working again.

BUT, if I use qemu-kvm-1.0.1 in the second case it is working. I have seen that 
there is
a lot of changes between 1.0.1 and 1.2.0 in the tap code. Maybe there is a bug 
in the
inititialization since then.

What also seem to have changed is that vlans have been removed. I was not aware 
of that,
so I still use vlans which are now emulated via hubs. Maybe this is related.

Peter

> 
> Stefan
> 




Re: [Qemu-devel] [PATCH] virtio: limit avail bytes lookahead

2012-11-29 Thread Anthony Liguori
"Michael S. Tsirkin"  writes:

> On Thu, Nov 29, 2012 at 06:34:46PM +0530, Amit Shah wrote:
>> On (Wed) 28 Nov 2012 [23:53:08], Michael S. Tsirkin wrote:
>> > On Tue, Nov 27, 2012 at 06:25:04PM +0200, Michael S. Tsirkin wrote:
>> > > On Thu, Nov 01, 2012 at 06:07:21PM +0200, Michael S. Tsirkin wrote:
>> > > > Commit 0d8d7690850eb0cf2b2b60933cf47669a6b6f18f introduced
>> > > > a regression in virtio-net performance because it looks
>> > > > into the ring aggressively while we really only care
>> > > > about a single packet worth of buffers.
>> > > > To fix, add parameters limiting lookahead, and
>> > > > use in virtqueue_avail_bytes.
>> > > > 
>> > > > Signed-off-by: Michael S. Tsirkin 
>> > > > Reported-by: Edivaldo de Araujo Pereira 
>> > > 
>> > > Ping.
>> > > Anthony - going to apply this?
>> > 
>> > virtio rng was added since so naturally build broke.
>> > Here's a patch on top to fix it up. I never used virtio rng before so
>> > could not test at this hour, but it does fix the build.
>> > 
>> > I'll take a look at how to test it tomorrow but any
>> > info would be appreciated.
>> > Amit could you pls review?
>> 
>> Looks fine, I assume you will send a v2 of the patch to Anthony?
>> 
>>  Amit
>
>
> Anthony volunteered to test this so there will only be v2 if he sees
> problems.

I need a Signed-off-by so why don't you just go ahead and send a v2 and
I'll test that.

Regards,

Anthony Liguori



Re: [Qemu-devel] [Xen-devel] [PATCH 0/2] QEMU/xen: simplify cpu_ioreq_pio and cpu_ioreq_move

2012-11-29 Thread Pasi Kärkkäinen
On Thu, Nov 29, 2012 at 01:57:11PM +, Ian Campbell wrote:
> On Thu, 2012-11-29 at 13:21 +, Stefano Stabellini wrote:
> > > Also I think 4.2.1 need these patches to enable the basic Xen on Xen
> > > nested virtualization usage scenario.
> > 
> > I agree.
> 
> Nested virt was a tech preview in 4.2.0, is it really worth
> backporting ?
> 

IIRC AMD Nested Virt works on 4.2.0, so if this backport makes Intel Nested Virt
working aswell it'd be nice..

-- Pasi




Re: [Qemu-devel] [PATCH V7 0/6] VMXNET3 paravirtual NIC device implementation

2012-11-29 Thread Dmitry Fleytman
Hello Stefan

Yes, we tested this device with various guests (both Linux and Windows) and
it also passed WHQL tests for Windows Server 2008 R2 OS.
The issue is Anthony asked us to write a basic test for the device (a
packet transmission test) and we are trying to get some reference code or
documentation to do this.

Anthony, Paolo, please, advise.

Thanks in advance,
Dmitry Fleytman


On Thu, Nov 29, 2012 at 7:45 PM, Stefan Hajnoczi  wrote:

> On Fri, Nov 16, 2012 at 2:55 PM, Dmitry Fleytman 
> wrote:
> >   We didn't succeed to find any guide or sample for the
> >   king of tests required (packets transmission).
> >   If someone can provide a pointer to the relevant
> >   information we'll be very grateful.
>
> QEMU does not have a NIC test suite.  You can use manual testing of
> guests with vmxnet3 drivers to see if the NIC works as expected.
> Maybe you could even try running WHQL tests on the VMXNET3 driver
> inside a Windows guest - I'm not sure, I've never looked into WHQL.
>
> The community has been developing the qtest framework which is for
> device-level testing.  For example, you could poke hardware registers
> and check that the correct interrupt gets raised.  However, this
> framework is being extended to cover busses and hardware as developers
> decide they want test coverage.  PCI support isn't available yet but
> Anthony Liguori recently worked on qtest PCI code.
>



-- 
Dmitry Fleytman
Technology Expert and Consultant,

Daynix Computing Ltd.

Cell: +972-54-2819481
Skype: dmitry.fleytman


Re: [Qemu-devel] [PATCH 15/37] target-i386: set default value of "hypervisor" feature using static property

2012-11-29 Thread Eduardo Habkost
On Thu, Nov 29, 2012 at 07:30:29PM +0100, Igor Mammedov wrote:
> On Thu, 29 Nov 2012 13:47:37 -0200
> Eduardo Habkost  wrote:
> 
> > On Thu, Nov 29, 2012 at 03:56:09PM +0100, Igor Mammedov wrote:
> > > On Fri, 9 Nov 2012 16:48:07 +0100
> > > Eduardo Habkost  wrote:
> > > 
> > > > 
> > > > On 22/10/2012, at 17:03, Igor Mammedov  wrote:
> > > > 
> > > > > Signed-off-by: Igor Mammedov 
> > > > > ---
> > > > > target-i386/cpu.c | 9 +++--
> > > > > 1 file changed, 3 insertions(+), 6 deletions(-)
> > > > > 
> > > > > diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> > > > > index 3131945..dc4fcdf 100644
> > > > > --- a/target-i386/cpu.c
> > > > > +++ b/target-i386/cpu.c
> > > > > @@ -174,7 +174,7 @@ static Property cpu_x86_properties[] = {
> > > > > DEFINE_PROP_BIT("f-xsave", X86CPU, env.cpuid_ext_features, 26,
> > > > > false), DEFINE_PROP_BIT("f-osxsave", X86CPU, env.cpuid_ext_features,
> > > > > 27, false), DEFINE_PROP_BIT("f-avx", X86CPU, env.cpuid_ext_features,
> > > > > 28, false),
> > > > > -DEFINE_PROP_BIT("f-hypervisor", X86CPU, env.cpuid_ext_features,
> > > > > 31, false),
> > > > > +DEFINE_PROP_BIT("f-hypervisor", X86CPU, env.cpuid_ext_features,
> > > > > 31, true), DEFINE_PROP_BIT("f-syscall", X86CPU,
> > > > > env.cpuid_ext2_features, 11, false), DEFINE_PROP_BIT("f-nx", X86CPU,
> > > > > env.cpuid_ext2_features, 20, false), DEFINE_PROP_BIT("f-xd", X86CPU,
> > > > > env.cpuid_ext2_features, 20, false), @@ -1307,11 +1307,12 @@ static
> > > > > int cpu_x86_find_by_name(X86CPU *cpu, x86_def_t *x86_cpu_def, {
> > > > > unsigned int i;
> > > > > x86_def_t *def;
> > > > > +CPUX86State *env = &cpu->env;
> > > > > 
> > > > > char *s = g_strdup(cpu_model);
> > > > > char *featurestr, *name = strtok(s, ",");
> > > > > /* Features to be added*/
> > > > > -uint32_t plus_features = 0, plus_ext_features = 0;
> > > > > +uint32_t plus_features = 0, plus_ext_features =
> > > > > env->cpuid_ext_features;
> > > > 
> > > > Moving data back and forth between CPUX86State and x86_def_t makes the
> > > > initialization ordering confusing (today data is moved from x86_def_t to
> > > > X86CPU, and never the other way around).
> > > > 
> > > > As this code is removed in the next patches, I don't mind too much, but
> > > > maybe it's simpler to implement this change only after the "use static
> > > > properties for setting cpuid features" patch?
> > > It won't eliminate moving data back and forth between CPUX86State and
> > > x86_def_t until CPU sub-classes (i.e. when x86_def_t isn't used anymore),
> > > if defaults from static properties are used.
> > 
> > You don't need CPU subclasses to reduce back-and-forth data movement.
> > We can just reorder code to look like:
> > 
> >  split_cpu_model_string(cpu_model, &model, &features);
> >  fill_cpudef_for_model_somehow(model, &cpudef);
> somewhere between here and [2], One still needs to enable "hypervisor"
> feature, and set kvm_features defaults. I guess "hypervisor" could be pushed
> out into x86_def_t for each model and kvm_cpu_fill_host(). But it would be
> hard to do so for KVM features (see how pv_eoi is set).

If the features are set by default on all models, we can hardcode or set
them by default either in the fill_cpudef_for_model_somehow() step
(meaning that the models have it implicitly set), or inside
cpudef_2_x86_cpu() (meaning that the feature can't be controlled by the
CPU feature string in any way).

In think both the KVM defaults and the hypervisor flag fit the former
case (because the defaults can be changed by the feature string later).
In other words: even if the feature is not mentioned in
builtin_x86_defs, it can be simply be  considered part of the defaults
for all CPU models.

Currently the defaults are set inside the feature string parsing code
(in the pseudo-code I wrote, that means setting them by default on
cpu_parse_feature_string()). That works, too, but it's confusing IMO.

One thing I forgot to mention about the example code, is that eventually
the two steps:

  fill_cpudef_for_model_somehow(model, &cpudef);
  cpudef_2_x86_cpu(cpu, &cpudef);

will become just a single step:

  initialize_default_flags_for_model(cpu, model);

or, even better, it will become:

  cpu = create_cpu_object_with_default_flags_already_set(model);

When we change the code to do that, we will be able to easily introduce
the subclasses, and to easily eliminate the X86CPUDefinnition struct
(aka x86_def_t).

> 
> see 8d239fbb9b, ded9f32153, a7bb291e5b, 2623e2f876, bd539e14ef in
> https://github.com/imammedo/qemu/commits/x86-cpu-properties.WIP
> 
> >  cpudef_2_x86_cpu(cpu, &cpudef);
> As alternative it's possible to merge (OR) ext_features and kvm_features from
> cpudef into env inside of cpudef_2_x86_cpu() to keep defaults SET by static
> properties when cpu obj was created.

The feature string can be used to disable features too, not just to
enable them, so we can't just merge two bitmaps later.

> 
> [2]
> 
> This merge won't 

Re: [Qemu-devel] [PATCH 1.3?] multiboot: fix e801 memory map

2012-11-29 Thread Paolo Bonzini


- Messaggio originale -
> Da: "Alexander Graf" 
> A: "Paolo Bonzini" 
> Cc: qemu-devel@nongnu.org
> Inviato: Giovedì, 29 novembre 2012 19:51:07
> Oggetto: Re: [PATCH 1.3?] multiboot: fix e801 memory map
> 
> 
> On 29.11.2012, at 18:11, Paolo Bonzini wrote:
> 
> > The e801 memory sizes in the multiboot structures hard-code the available
> > low memory to 640.  However, the value should not include the size
> > of the EBDA.  Fill the value in the option ROM, getting the size of low
> > memory from the BIOS.
> 
> The description mentions that instead of hard coding, you fetch the
> value dynamically via a BIOS call. However, I don't see the code
> that hard codes it changed. Where is that one?

It is in hw/multiboot.c:

stl_p(bootinfo + MBI_MEM_LOWER,   640);


Regarding the testcase, Xen will touch the EBDA without this patch and
with http://permalink.gmane.org/gmane.comp.emulators.xen.devel/144855.
However, SeaBIOS does not complain.

Paolo



Re: [Qemu-devel] [PATCH V5 2/2] Enabling IA32_TSC_ADJUST for KVM guest VM support

2012-11-29 Thread Auld, Will
Marcelo,

The behavior on reset is to return the TSC_AJUST msr value to 0x0. I am 
currently initializing this emulated msr in kvm_arch_vcpu_init(). 

>- Behaviour on reset: what is the behaviour on RESET?

I am testing the rebase now. I would like to get any needed changes for this 
initialization into my next patch set version (v6) if possible.

Thanks,

Will

> -Original Message-
> From: Marcelo Tosatti [mailto:mtosa...@redhat.com]
> Sent: Tuesday, November 27, 2012 5:00 PM
> To: Auld, Will
> Cc: k...@vger.kernel.org; Dugger, Donald D; Liu, Jinsong; Zhang,
> Xiantao; a...@redhat.com; qemu-devel; Gleb
> Subject: Re: [PATCH V5 2/2] Enabling IA32_TSC_ADJUST for KVM guest VM
> support
> 
> 
> Hi Will,
> 
> On Tue, Nov 27, 2012 at 11:09:00AM -0800, Will Auld wrote:
> > CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported
> >
> > Basic design is to emulate the MSR by allowing reads and writes to a
> > guest vcpu specific location to store the value of the emulated MSR
> > while adding the value to the vmcs tsc_offset. In this way the
> > IA32_TSC_ADJUST value will be included in all reads to the TSC MSR
> > whether through rdmsr or rdtsc. This is of course as long as the "use
> > TSC counter offsetting" VM-execution control is enabled as well as
> the IA32_TSC_ADJUST control.
> >
> > However, because hardware will only return the TSC + IA32_TSC_ADJUST
> +
> > vmsc tsc_offset for a guest process when it does and rdtsc (with the
> > correct
> > settings) the value of our virtualized IA32_TSC_ADJUST must be stored
> > in one of these three locations. The argument against storing it in
> > the actual MSR is performance. This is likely to be seldom used while
> > the save/restore is required on every transition. IA32_TSC_ADJUST was
> > created as a way to solve some issues with writing TSC itself so that
> is not an option either.
> > The remaining option, defined above as our solution has the problem
> of
> > returning incorrect vmcs tsc_offset values (unless we intercept and
> > fix, not done here) as mentioned above. However, more problematic is
> > that storing the data in vmcs tsc_offset will have a different
> > semantic effect on the system than does using the actual MSR. This is
> illustrated in the following example:
> > The hypervisor set the IA32_TSC_ADJUST, then the guest sets it and a
> > guest process performs a rdtsc. In this case the guest process will
> > get TSC + IA32_TSC_ADJUST_hyperviser + vmsc tsc_offset including
> IA32_TSC_ADJUST_guest.
> > While the total system semantics changed the semantics as seen by the
> > guest do not and hence this will not cause a problem.
> >
> > Signed-off-by: Will Auld 
> > ---
> >  arch/x86/include/asm/cpufeature.h |  1 +
> >  arch/x86/include/asm/kvm_host.h   |  3 +++
> >  arch/x86/include/asm/msr-index.h  |  1 +
> >  arch/x86/kvm/cpuid.c  |  2 ++
> >  arch/x86/kvm/cpuid.h  |  8 
> >  arch/x86/kvm/svm.c|  7 +++
> >  arch/x86/kvm/vmx.c|  9 +
> >  arch/x86/kvm/x86.c| 22 ++
> >  8 files changed, 53 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/cpufeature.h
> > b/arch/x86/include/asm/cpufeature.h
> > index 6b7ee5f..e574d81 100644
> > --- a/arch/x86/include/asm/cpufeature.h
> > +++ b/arch/x86/include/asm/cpufeature.h
> > @@ -199,6 +199,7 @@
> >
> >  /* Intel-defined CPU features, CPUID level 0x0007:0 (ebx), word
> 9 */
> >  #define X86_FEATURE_FSGSBASE   (9*32+ 0) /* {RD/WR}{FS/GS}BASE
> instructions*/
> > +#define X86_FEATURE_TSC_ADJUST  (9*32+ 1) /* TSC adjustment MSR 0x3b
> > +*/
> >  #define X86_FEATURE_BMI1   (9*32+ 3) /* 1st group bit manipulation
> extensions */
> >  #define X86_FEATURE_HLE(9*32+ 4) /* Hardware Lock Elision
> */
> >  #define X86_FEATURE_AVX2   (9*32+ 5) /* AVX2 instructions */
> > diff --git a/arch/x86/include/asm/kvm_host.h
> > b/arch/x86/include/asm/kvm_host.h index da34027..cf8c7e0 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -442,6 +442,8 @@ struct kvm_vcpu_arch {
> > u32 virtual_tsc_mult;
> > u32 virtual_tsc_khz;
> >
> > +   s64 ia32_tsc_adjust_msr;
> > +
> > atomic_t nmi_queued;  /* unprocessed asynchronous NMIs */
> > unsigned nmi_pending; /* NMI queued after currently running
> handler */
> > bool nmi_injected;/* Trying to inject an NMI this entry */
> > @@ -690,6 +692,7 @@ struct kvm_x86_ops {
> > bool (*has_wbinvd_exit)(void);
> >
> > void (*set_tsc_khz)(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool
> > scale);
> > +   u64 (*read_tsc_offset)(struct kvm_vcpu *vcpu);
> > void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
> >
> > u64 (*compute_tsc_offset)(struct kvm_vcpu *vcpu, u64 target_tsc);
> > diff --git a/arch/x86/include/asm/msr-index.h
> > b/arch/x86/include/asm/msr-index.h
> > index 957ec87..6486569 100644
> > --- a/arch/x86/include/asm/msr-index.h
> > +++ b/arch

Re: [Qemu-devel] [PATCH 1.3?] multiboot: fix e801 memory map

2012-11-29 Thread Alexander Graf

On 29.11.2012, at 20:21, Paolo Bonzini wrote:

> 
> 
> - Messaggio originale -
>> Da: "Alexander Graf" 
>> A: "Paolo Bonzini" 
>> Cc: qemu-devel@nongnu.org
>> Inviato: Giovedì, 29 novembre 2012 19:51:07
>> Oggetto: Re: [PATCH 1.3?] multiboot: fix e801 memory map
>> 
>> 
>> On 29.11.2012, at 18:11, Paolo Bonzini wrote:
>> 
>>> The e801 memory sizes in the multiboot structures hard-code the available
>>> low memory to 640.  However, the value should not include the size
>>> of the EBDA.  Fill the value in the option ROM, getting the size of low
>>> memory from the BIOS.
>> 
>> The description mentions that instead of hard coding, you fetch the
>> value dynamically via a BIOS call. However, I don't see the code
>> that hard codes it changed. Where is that one?
> 
> It is in hw/multiboot.c:
> 
>stl_p(bootinfo + MBI_MEM_LOWER,   640);

You want to remove that one then.

> Regarding the testcase, Xen will touch the EBDA without this patch and
> with http://permalink.gmane.org/gmane.comp.emulators.xen.devel/144855.
> However, SeaBIOS does not complain.

I see :). Well, the patch is unintrusive enough for be ok for 1.3 IMHO.


Alex




Re: [Qemu-devel] [PATCH 1.3?] multiboot: fix e801 memory map

2012-11-29 Thread Paolo Bonzini
Il 29/11/2012 20:24, Alexander Graf ha scritto:
>> > It is in hw/multiboot.c:
>> > 
>> >stl_p(bootinfo + MBI_MEM_LOWER,   640);
> You want to remove that one then.

I wasn't sure of what happens if the multiboot option ROM is old.  Do we
support that to any extent?

> > Regarding the testcase, Xen will touch the EBDA without this patch and
> > with http://permalink.gmane.org/gmane.comp.emulators.xen.devel/144855.
> > However, SeaBIOS does not complain.
>
> I see :). Well, the patch is unintrusive enough for be ok for 1.3 IMHO.

Actually it's relatively easy to see a failure with Xen and a need to
PCI passthrough something with a bulky boot ROM; iSCSI or FC cards will
do.  You'll need both the Xen patch above, and this patch to boot
successfully.  Otherwise, it will fail to boot.

Paolo



Re: [Qemu-devel] [PATCH V5 2/2] Enabling IA32_TSC_ADJUST for KVM guest VM support

2012-11-29 Thread Marcelo Tosatti
On Thu, Nov 29, 2012 at 07:21:28PM +, Auld, Will wrote:
> Marcelo,
> 
> The behavior on reset is to return the TSC_AJUST msr value to 0x0. I am 
> currently initializing this emulated msr in kvm_arch_vcpu_init(). 

Will,

Reset is handled by QEMU. kvm_arch_vcpu_init is only called during vcpu
allocation (ie vm creation). See x86_cpu_reset in QEMU's
target-i386/cpu.c file.

> 
> >- Behaviour on reset: what is the behaviour on RESET?
> 
> I am testing the rebase now. I would like to get any needed changes for this 
> initialization into my next patch set version (v6) if possible.

For the kernel patch, everything is fine. Apparently for the QEMU patch
it is  necessary to set value to zero on reset.




  1   2   >