date:20150707

Re: [Qemu-devel] vpc size reporting problem

2015-07-07 Thread Chun Yan Liu



>>> On 7/7/2015 at 02:36 PM, in message <559b7366.5030...@kamp.de>, Peter Lieven
 wrote: 
> On 07.07.2015 at 08:34, Chun Yan Liu wrote:
>> On 7/7/2015 at 02:19 PM, in message <559B6F79.237 : 102 : 21807>,
>> Chun Yan Liu wrote:
>>> On 7/7/2015 at 02:03 PM, in message <559b6bbe.3050...@kamp.de>,
>>> Peter Lieven wrote:
>>>> On 07.07.2015 at 07:59, Chun Yan Liu wrote:
>>>>> On 7/7/2015 at 01:50 PM, in message <559b68b2.5060...@kamp.de>,
>>>>> Peter Lieven wrote:
>>>>>> On 07.07.2015 at 03:50, Chun Yan Liu wrote:
>>>>>>> On 7/6/2015 at 06:42 PM, in message <559a5b79.4010...@kamp.de>,
>>>>>>> Peter Lieven wrote:
>>>>>>>> On 06.07.2015 at 11:44, Chun Yan Liu wrote:
>>>>>>>>> While testing with a 1GB VHD file created on Win7, I found that
>>>>>>>>> the VHD file size reported on Windows differs from the size
>>>>>>>>> reported by qemu-img info or within a Linux KVM guest.
>>>>>>>>>
>>>>>>>>> I created a dynamic VHD file on Win7. On Windows it is reported
>>>>>>>>> as 1024MB (2097152 sectors), but qemu-img info and a Linux KVM
>>>>>>>>> guest report 1023MB (2096640 sectors).
>>>>>>>>>
>>>>>>>>> The values in the footer_buf are as follows:
>>>>>>>>> creator_app: "win "
>>>>>>>>> cylinders: 0x820 (2080)
>>>>>>>>> heads: 0x10 (16)
>>>>>>>>> cyl/sec: 0x3f (63)
>>>>>>>>> current_size: 0x40000000 (1G)
>>>>>>>>>
>>>>>>>>> So current_size gives the correct size, while CHS gives a
>>>>>>>>> smaller one. Should we add a check for this case and use
>>>>>>>>> "current_size" instead of CHS?
>>>>>>>>
>>>>>>>> As far as I remember, the issue was and still is that there is no
>>>>>>>> official spec that says: use current_size in case A and CHS in
>>>>>>>> case B.
>>>>>>>
>>>>>>> Understood.
>>>>>>>
>>>>>>>> If current_size is greater than CHS and Windows used CHS (we
>>>>>>>> don't know that), we might run into issues if QEMU uses
>>>>>>>> current_size. In that case we would write data beyond the end of
>>>>>>>> the container (from Windows' perspective).
>>>>>>>
>>>>>>> That's right. But our testing shows that Windows uses
>>>>>>> current_size, not CHS: we created and partitioned the VHD on
>>>>>>> Windows, then attached the VHD file to a Linux KVM guest, which
>>>>>>> fails to show the partition table (because the reported disk size
>>>>>>> has shrunk, some of the partitions extend beyond the end of the
>>>>>>> disk).
>>>>>>
>>>>>> Which version of Windows are you referring to?
>>>>>
>>>>> Tested with WS2012R2 and Win7.
>>>>
>>>> Which storage driver?
>>
>> And when imported into a Win7 guest on KVM as an IDE device, it is also
>> reported as 1024MB (not the CHS value, which is 1023MB).
>
> And what storage driver reports 1023MB under Qemu?

SCSI driver under Linux guest.

>  
> Peter
>  
>  
>  
>
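
For reference, the mismatch can be reproduced from the footer values quoted
above. This is only a minimal sketch, assuming 512-byte sectors; the helper
names vhd_chs_sectors() and vhd_size_sectors() are made up for illustration
and are not taken from block/vpc.c.

```c
#include <assert.h>
#include <stdint.h>

#define VHD_SECTOR_SIZE 512u  /* assumption: 512-byte sectors */

/* Sectors implied by the CHS geometry stored in the footer. */
uint64_t vhd_chs_sectors(uint16_t cylinders, uint8_t heads,
                         uint8_t sectors_per_track)
{
    return (uint64_t)cylinders * heads * sectors_per_track;
}

/* Sectors implied by the footer's current_size field (in bytes). */
uint64_t vhd_size_sectors(uint64_t current_size)
{
    return current_size / VHD_SECTOR_SIZE;
}
```

With the values from the report, vhd_chs_sectors(2080, 16, 63) gives 2096640
and vhd_size_sectors(0x40000000) gives 2097152, so the CHS geometry is 512
sectors short of current_size.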

Re: [Qemu-devel] [PATCH qemu v10 11/14] spapr_pci_vfio: Enable multiple groups per container

2015-07-07 Thread Thomas Huth

On Mon,  6 Jul 2015 12:11:07 +1000
Alexey Kardashevskiy  wrote:

> This enables multiple IOMMU groups in one VFIO container which means
> that multiple devices from different groups can share the same IOMMU
> table (or tables if DDW).
> 
> This removes a group id from vfio_container_ioctl(). The kernel support
> is required for this; if the host kernel does not have the support,
> it will allow only one group per container. The PHB's "iommuid" property
> is ignored. The ioctl is called for every container attached to
> the address space. At the moment there is just one container anyway.
> 
> If there is no container attached to the address space,
> vfio_container_do_ioctl() returns -1.
> 
> This removes casts to sPAPRPHBVFIOState as none of sPAPRPHBVFIOState
> members is accessed here.
> 
> Signed-off-by: Alexey Kardashevskiy 
> Reviewed-by: David Gibson 
> ---
>  hw/ppc/spapr_pci_vfio.c | 17 ++---
>  hw/vfio/common.c| 20 ++--
>  include/hw/vfio/vfio.h  |  2 +-
>  3 files changed, 13 insertions(+), 26 deletions(-)
...
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index b1045da..89ef37b 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -918,34 +918,26 @@ void vfio_put_base_device(VFIODevice *vbasedev)
>      close(vbasedev->fd);
>  }
>  
> -static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid,
> +static int vfio_container_do_ioctl(AddressSpace *as,
>                                     int req, void *param)
>  {
> -    VFIOGroup *group;
>      VFIOContainer *container;
>      int ret = -1;
> +    VFIOAddressSpace *space = vfio_get_address_space(as);
>  
> -    group = vfio_get_group(groupid, as);
> -    if (!group) {
> -        error_report("vfio: group %d not registered", groupid);
> -        return ret;
> -    }
> -
> -    container = group->container;
> -    if (group->container) {
> +    QLIST_FOREACH(container, &space->containers, next) {
>          ret = ioctl(container->fd, req, param);
>          if (ret < 0) {
>              error_report("vfio: failed to ioctl %d to container: ret=%d, %s",
>                           _IOC_NR(req) - VFIO_BASE, ret, strerror(errno));
> +            return -errno;
>          }
>      }
>  
> -    vfio_put_group(group);
> -
>      return ret;
>  }
>  
> -int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
> +int vfio_container_ioctl(AddressSpace *as,
>                           int req, void *param)

You could easily fit that into one line now.

>  {
>      /* We allow only certain ioctls to the container */
> @@ -960,5 +952,5 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
>          return -1;
>      }
>  
> -    return vfio_container_do_ioctl(as, groupid, req, param);
> +    return vfio_container_do_ioctl(as, req, param);
>  }
> diff --git a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h
> index 0b26cd8..76b5744 100644
> --- a/include/hw/vfio/vfio.h
> +++ b/include/hw/vfio/vfio.h
> @@ -3,7 +3,7 @@
>  
>  #include "qemu/typedefs.h"
>  
> -extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
> +extern int vfio_container_ioctl(AddressSpace *as,
>  int req, void *param);

Ditto.

Apart from the two cosmetic nits, patch looks fine to me:

Reviewed-by: Thomas Huth

Re: [Qemu-devel] [PATCH qemu v10 13/14] vfio: spapr: Add SPAPR IOMMU v2 support (DMA memory preregistering)

2015-07-07 Thread Thomas Huth

On Mon,  6 Jul 2015 12:11:09 +1000
Alexey Kardashevskiy  wrote:

> This makes use of the new "memory registering" feature. The idea is
> to provide the userspace ability to notify the host kernel about pages
> which are going to be used for DMA. Having this information, the host
> kernel can pin them all once per user process, do locked pages
> accounting (once) and not spent time on doing that in real time with
> possible failures which cannot be handled nicely in some cases.
> 
> This adds a guest RAM memory listener which notifies a VFIO container
> about memory which needs to be pinned/unpinned. VFIO MMIO regions
> (i.e. "skip dump" regions) are skipped.
> 
> The feature is only enabled for SPAPR IOMMU v2. The host kernel changes
> are required. Since v2 does not need/support VFIO_IOMMU_ENABLE, this does
> not call it when v2 is detected and enabled.
> 
> This does not change the guest visible interface.
> 
> Signed-off-by: Alexey Kardashevskiy 
> Reviewed-by: David Gibson 
> ---
> Changes:
> v9:
> * since there is no more SPAPR-specific data in container::iommu_data,
> the memory preregistration fields are common and potentially can be used
> by other architectures
> 
> v7:
> * in vfio_spapr_ram_listener_region_del(), do unref() after ioctl()
> * s'ramlistener'register_listener'
> 
> v6:
> * fixed commit log (s/guest/userspace/), added note about no guest visible
> change
> * fixed error checking if ram registration failed
> * added alignment check for section->offset_within_region
> 
> v5:
> * simplified the patch
> * added trace points
> * added round_up() for the size
> * SPAPR IOMMU v2 used
> ---
>  hw/vfio/common.c  | 109 
> ++
>  include/hw/vfio/vfio-common.h |   3 ++
>  trace-events  |   1 +
>  3 files changed, 104 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 8eacfd7..0c7ba8c 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -488,6 +488,76 @@ static void vfio_listener_release(VFIOContainer *container)
>      memory_listener_unregister(&container->iommu_data.type1.listener);
>  }
>  
> +static void vfio_ram_do_region(VFIOContainer *container,
> +                              MemoryRegionSection *section, unsigned long req)
> +{
> +    int ret;
> +    struct vfio_iommu_spapr_register_memory reg = { .argsz = sizeof(reg) };
> +
> +    if (!memory_region_is_ram(section->mr) ||
> +        memory_region_is_skip_dump(section->mr)) {
> +        return;
> +    }
> +
> +    if (unlikely((section->offset_within_region & (getpagesize() - 1)))) {
> +        error_report("%s received unaligned region", __func__);
> +        return;
> +    }
> +
> +    reg.vaddr = (__u64) memory_region_get_ram_ptr(section->mr) +

We're in userspace here ... I think it would be better to use uint64_t
instead of the kernel-type __u64.
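
Something along these lines is what I have in mind. It is only a sketch that
assumes nothing beyond <stdint.h>; host_ptr_to_u64() is a made-up helper name
for illustration, not an existing QEMU function. Going through uintptr_t
keeps the pointer-to-integer conversion well defined on both 32-bit and
64-bit hosts.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: turn a host pointer plus a byte offset into the
 * 64-bit value expected by an ioctl argument field.
 */
uint64_t host_ptr_to_u64(const void *base, uint64_t offset)
{
    return (uint64_t)(uintptr_t)base + offset;
}
```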

> +        section->offset_within_region;
> +    reg.size = ROUND_UP(int128_get64(section->size), TARGET_PAGE_SIZE);
> +
> +    ret = ioctl(container->fd, req, &reg);
> +    trace_vfio_ram_register(_IOC_NR(req) - VFIO_BASE, reg.vaddr, reg.size,
> +                            ret ? -errno : 0);
> +    if (!ret) {
> +        return;
> +    }
> +
> +    /*
> +     * On the initfn path, store the first error in the container so we
> +     * can gracefully fail.  Runtime, there's not much we can do other
> +     * than throw a hardware error.
> +     */
> +    if (!container->iommu_data.ram_reg_initialized) {
> +        if (!container->iommu_data.ram_reg_error) {
> +            container->iommu_data.ram_reg_error = -errno;
> +        }
> +    } else {
> +        hw_error("vfio: RAM registering failed, unable to continue");
> +    }
> +}
> +
> +static void vfio_ram_listener_region_add(MemoryListener *listener,
> +                                         MemoryRegionSection *section)
> +{
> +    VFIOContainer *container = container_of(listener, VFIOContainer,
> +                                            iommu_data.register_listener);
> +    memory_region_ref(section->mr);
> +    vfio_ram_do_region(container, section, VFIO_IOMMU_SPAPR_REGISTER_MEMORY);
> +}
> +
> +static void vfio_ram_listener_region_del(MemoryListener *listener,
> +                                         MemoryRegionSection *section)
> +{
> +    VFIOContainer *container = container_of(listener, VFIOContainer,
> +                                            iommu_data.register_listener);
> +    vfio_ram_do_region(container, section, VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY);
> +    memory_region_unref(section->mr);
> +}
> +
> +static const MemoryListener vfio_ram_memory_listener = {
> +    .region_add = vfio_ram_listener_region_add,
> +    .region_del = vfio_ram_listener_region_del,
> +};
> +
> +static void vfio_spapr_listener_release_v2(VFIOContainer *container)
> +{
> +    memory_listener_unregister(&container->iommu_data.register_listener);
> +    vfio_listener_release(container);
> +}
> +
>  int vfio_mmap_region(Object *obj,

Re: [Qemu-devel] [PATCH pic32 v2 5/5] Two new machine platforms: pic32mz7 and pic32mz.

2015-07-07 Thread Antony Pavlov

On Mon, 6 Jul 2015 11:58:54 -0700
Serge Vakulenko  wrote:

> On Mon, Jul 6, 2015 at 12:33 AM, Antony Pavlov wrote:
> > On Sun, 5 Jul 2015 21:18:11 -0700
> > Serge Vakulenko  wrote:
> >
> >> On Wed, Jul 1, 2015 at 6:41 AM, Aurelien Jarno wrote:
> >> > On 2015-06-30 21:12, Serge Vakulenko wrote:
> >> >> Signed-off-by: Serge Vakulenko 
> >> >> ---
> >> >>  hw/mips/Makefile.objs   |3 +
> >> >>  hw/mips/mips_pic32mx7.c | 1652 +
> >> >>  hw/mips/mips_pic32mz.c  | 2840 
> >> >> +++
> >> >>  hw/mips/pic32_ethernet.c|  557 +
> >> >>  hw/mips/pic32_gpio.c|   39 +
> >> >>  hw/mips/pic32_load_hex.c|  238 
> >> >>  hw/mips/pic32_peripherals.h |  210 
> >> >>  hw/mips/pic32_sdcard.c  |  428 +++
> >> >>  hw/mips/pic32_spi.c |  121 ++
> >> >>  hw/mips/pic32_uart.c|  228 
> >> >>  hw/mips/pic32mx.h   | 1290 
> >> >>  hw/mips/pic32mz.h   | 2093 +++
> >> >>  12 files changed, 9699 insertions(+)
> >> >>  create mode 100644 hw/mips/mips_pic32mx7.c
> >> >>  create mode 100644 hw/mips/mips_pic32mz.c
> >> >>  create mode 100644 hw/mips/pic32_ethernet.c
> >> >>  create mode 100644 hw/mips/pic32_gpio.c
> >> >>  create mode 100644 hw/mips/pic32_load_hex.c
> >> >>  create mode 100644 hw/mips/pic32_peripherals.h
> >> >>  create mode 100644 hw/mips/pic32_sdcard.c
> >> >>  create mode 100644 hw/mips/pic32_spi.c
> >> >>  create mode 100644 hw/mips/pic32_uart.c
> >> >>  create mode 100644 hw/mips/pic32mx.h
> >> >>  create mode 100644 hw/mips/pic32mz.h
> >> >
> >> > This patch is huge, and needs to be split to ease the review.
> >>
> >> I'll prepare a new patch set, with every new file put into a separate
> >> message. Other issues fixed as well.
> >
> > Putting every new file into a separate message is nonsense.
> > Please put each __logical change__ into a single patch.
> 
> Aurelien Jarno asked to split this patch to ease the review.

IMHO he meant something very different.

Please reread the QEMU patch submission guide carefully
(see http://wiki.qemu.org/Contribute/SubmitAPatch).

Here is a quote:

  Split up longer patches into a patch series of logical code changes.
  Each change should compile and execute successfully. For instance,
  don't add a file to the makefile in patch one and then add the file itself
  in patch two. (This rule is here so that people can later use tools
  like git bisect without hitting points in the commit history
  where QEMU doesn't work for reasons unrelated to the bug they're chasing.)

Also, please reread Peter's comment very carefully:

   http://lists.nongnu.org/archive/html/qemu-devel/2015-07/msg01430.html

Peter asks you to rework your device support code: every device should be
self-contained.
E.g. for the UART support code this means that:

   0. The object model is used. Your UART code implements the operation of
      one UART instance. A private structure is used for storing the UART
      instance's current state. The SoC code (or even the board code)
      creates as many UART instances as it needs.

      Also please see Aurelien's comment:
      http://lists.nongnu.org/archive/html/qemu-devel/2015-07/msg01242.html

   1. The UART C code goes into qemu.git/hw/char/.

   2. The externally visible UART definitions (the header file) go into
      qemu.git/include/hw/char/.

      Note that there is no need to put every UART-related macro into the
      header file. If nothing outside your UART C code uses a macro, you
      can keep its definition in the C code.

   3. Compilation of the UART C code has to be enabled only for the
      mips-softmmu target. So make it depend on a Makefile option, and
      enable this option only in
      qemu.git/default-configs/mips-softmmu.mak.

   4. UART support has to be added in a separate patch. That patch has to
      contain changes to these files:

        default-configs/mips-softmmu.mak
        hw/char/Makefile.objs
        hw/char/pic32_uart.c
        include/hw/char/pic32_uart.h

      This UART support patch has to be submitted __before__ the patch
      with the SoC/board code that uses the UART.

As Peter suggests, please use the 'Netduino 2 Machine Model' patch series
as a model, see
  http://lists.gnu.org/archive/html/qemu-devel/2015-02/msg03398.html
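
To illustrate point 0, here is a deliberately simplified, standalone sketch
of "one private state structure per UART instance". It does not use the real
QEMU object model (a real device would be registered through QOM), and every
name in it (UartSketch, uart_sketch_init() and so on) as well as the base
addresses are hypothetical. The only point is that the state lives in the
instance, not in globals, so the board code can create as many UARTs as it
needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-instance state: one copy per UART, no globals. */
typedef struct UartSketch {
    uint32_t base;      /* illustrative register base address */
    uint8_t  rx_byte;   /* last received byte */
    bool     rx_valid;  /* true while rx_byte has not been read yet */
} UartSketch;

/* Reset one instance and remember where it is mapped. */
void uart_sketch_init(UartSketch *u, uint32_t base)
{
    assert(u != NULL);
    memset(u, 0, sizeof(*u));
    u->base = base;
}

/* A byte arrives from the backend for this instance only. */
void uart_sketch_receive(UartSketch *u, uint8_t byte)
{
    u->rx_byte = byte;
    u->rx_valid = true;
}

/* Guest reads the receive register; returns false if nothing is pending. */
bool uart_sketch_read(UartSketch *u, uint8_t *out)
{
    if (!u->rx_valid) {
        return false;
    }
    *out = u->rx_byte;
    u->rx_valid = false;
    return true;
}
```

The board code would then hold an array of such structures and initialise
each one separately; a byte received on the first UART never shows up on the
second one.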

-- 
Best regards,
  Antony Pavlov

Re: [Qemu-devel] [PATCH 0/4] spapr: Small PAPR compliance fixes

2015-07-07 Thread David Gibson

On Fri, Jul 03, 2015 at 01:42:16PM +1000, Sam Bobroff wrote:
> 
> This patch set contains several small fixes to make QEMU more PAPR
> compliant.

Applied to spapr-next, thanks.

In future, please CC me on PAPR related patches - otherwise I'm likely
to lose them in the flood of qemu-devel.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgpq6cfQyVrbQ.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH v2 05/10] block: add block job transactions

2015-07-07 Thread Fam Zheng

On Mon, 07/06 15:24, Stefan Hajnoczi wrote:
> +/**
> + * block_job_txn_add_job:
> + * @txn: The transaction (may be NULL)
> + * @job: Job to add to the transaction
> + *
> + * Add @job to the transaction.  The @job must not already be in a 
> transaction.
> + * The block job driver must call block_job_txn_prepare_to_complete() before

s/block_job_txn_prepare_to_complete/block_job_txn_job_done/

Reading this a second time, I'm starting to feel it is too complicated for its
own good.

I have another idea: in block_job_completed, check whether other jobs have
failed, and call this job driver's (imaginary) "abort()" callback accordingly;
if all jobs have succeeded, call a "commit" callback during the last
block_job_completed.

Does that make sense?
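
Roughly what I have in mind, as a standalone sketch. All names here
(SketchTxn, SketchJob, sketch_job_completed() and so on) are made up and
nothing is taken from blockjob.c. For simplicity the sketch decides between
commit and abort for all jobs when the last one reports in, which is a
simplification of the idea above.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_JOBS 8  /* arbitrary limit for the sketch */

typedef struct SketchTxn SketchTxn;

typedef struct SketchJob {
    SketchTxn *txn;
    int committed;  /* set when the (imaginary) commit callback would run */
    int aborted;    /* set when the (imaginary) abort callback would run */
} SketchJob;

struct SketchTxn {
    SketchJob *jobs[SKETCH_MAX_JOBS];
    size_t n_jobs;
    size_t n_done;
    int failed;     /* nonzero once any job has reported an error */
};

void sketch_txn_add_job(SketchTxn *txn, SketchJob *job)
{
    assert(txn->n_jobs < SKETCH_MAX_JOBS);
    job->txn = txn;
    txn->jobs[txn->n_jobs++] = job;
}

/* Called once by each job when it finishes; ret < 0 means failure. */
void sketch_job_completed(SketchJob *job, int ret)
{
    SketchTxn *txn = job->txn;
    size_t i;

    if (ret < 0) {
        txn->failed = 1;
    }
    if (++txn->n_done < txn->n_jobs) {
        return;  /* other jobs are still running */
    }
    /* Last completion: commit everything or abort everything. */
    for (i = 0; i < txn->n_jobs; i++) {
        if (txn->failed) {
            txn->jobs[i]->aborted = 1;
        } else {
            txn->jobs[i]->committed = 1;
        }
    }
}
```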

Fam

Re: [Qemu-devel] [PATCH V3] block/nfs: add support for setting debug level

2015-07-07 Thread Fam Zheng

On Tue, 07/07 08:50, Peter Lieven wrote:
> upcoming libnfs versions will support logging debug messages. Add
> support for it in qemu through a per-drive option.
> 
> Examples:
>  qemu -drive if=virtio,file=nfs://...,file.debug=2
>  qemu-img create -o debug=2 nfs://... 10G
> 
> Signed-off-by: Peter Lieven 

Reviewed-by: Fam Zheng

Re: [Qemu-devel] [PATCH v2] net: Flush queued packets when guest resumes

2015-07-07 Thread Jason Wang



On 07/07/2015 09:21 AM, Fam Zheng wrote:
> Since commit 6e99c63 "net/socket: Drop net_socket_can_send" and friends,
> net queues need to be explicitly flushed after qemu_can_send_packet()
> returns false, because the netdev side will disable the polling of fd.
>
> This fixes the case of "cont" after "stop" (or migration).
>
> Signed-off-by: Fam Zheng 
>
> ---
>
> v2: Unify with VM stop handler. (Stefan)
> ---
>  net/net.c | 19 ---
>  1 file changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/net/net.c b/net/net.c
> index 6ff7fec..28a5597 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -1257,14 +1257,19 @@ void qmp_set_link(const char *name, bool up, Error **errp)
>  static void net_vm_change_state_handler(void *opaque, int running,
>                                          RunState state)
>  {
> -    /* Complete all queued packets, to guarantee we don't modify
> -     * state later when VM is not running.
> -     */
> -    if (!running) {
> -        NetClientState *nc;
> -        NetClientState *tmp;
> +    NetClientState *nc;
> +    NetClientState *tmp;
>  
> -        QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> +    QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> +        if (running) {
> +            /* Flush queued packets and wake up backends. */
> +            if (nc->peer && qemu_can_send_packet(nc)) {
> +                qemu_flush_queued_packets(nc->peer);
> +            }
> +        } else {
> +            /* Complete all queued packets, to guarantee we don't modify
> +             * state later when VM is not running.
> +             */
>              qemu_flush_or_purge_queued_packets(nc, true);
>          }

Looks like qemu_can_send_packet() checks both nc->peer and runstate. So
probably, we can simplify this to:

    if (qemu_can_send_packet(nc))
        qemu_flush_queued_packets(nc->peer);
    else
        qemu_flush_or_purge_queued_packets(nc, true);

>  }

Re: [Qemu-devel] [PATCH] raw-posix.c: remove raw device access for cdrom

2015-07-07 Thread Stefan Hajnoczi

On Mon, Jul 6, 2015 at 4:58 PM, Programmingkid
 wrote:
> Quick question, In order to use a real cdrom in buffered mode (/dev/disk1s0), 
> QEMU would have to unmount the cdrom from the desktop. Is unmounting the 
> cdrom in the hdev_open() function ok? . I am making a version 3 of the cdrom 
> patch, so please disregard the last patch.

Please keep qemu-devel@nongnu.org CCed so discussion stays on the
mailing list and others can participate.

Does the user need to manually mount the CD-ROM again after QEMU has
terminated?  If so, then maybe the user should manually unmount before
running QEMU.

Stefan

Re: [Qemu-devel] [PATCH 06/10] qga: guest exec functionality for Windows guests

2015-07-07 Thread Denis V. Lunev


On 07/07/15 04:31, Michael Roth wrote:
> Quoting Denis V. Lunev (2015-06-30 05:25:19)
>> From: Olga Krishtal
>>
>> Child process' stdin/stdout/stderr can be associated
>> with handles for communication via read/write interfaces.
>>
>> The workflow should be something like this:
>> * Open an anonymous pipe through guest-pipe-open
>> * Execute a binary or a script in the guest. Arbitrary arguments and
>>   environment to a new child process could be passed through options
>> * Read/pass information from/to executed process using
>>   guest-file-read/write
>> * Collect the status of a child process
>
> Have you seen anything like this in your testing?
>
> {'execute':'guest-exec','arguments':{'path':'/Windows/System32/ipconfig.exe',
>   'timeout':5000}}
> {"return": {"pid": 588}}
> {'execute':'guest-exec-status','arguments':{'pid':588}}
> {"return": {"exit": 0, "handle-stdout": -1, "handle-stderr": -1,
>   "handle-stdin": -1, "signal": -1}}
> {'execute':'guest-exec-status','arguments':{'pid':588}}
> {"error": {"class": "GenericError", "desc": "Invalid parameter 'pid'"}}
>
> {'execute':'guest-exec','arguments':{'path':'/Windows/System32/ipconfig.exe',
>   'timeout':5000}}
> {"error": {"class": "GenericError", "desc": "CreateProcessW() failed:
>   The parameter is incorrect. (error: 57)"}}
> {'execute':'guest-exec','arguments':{'path':'/Windows/System32/ipconfig.exe',
>   'timeout':5000}}
> {"error": {"class": "GenericError", "desc": "CreateProcessW() failed:
>   The parameter is incorrect. (error: 57)"}}
>
> {'execute':'guest-exec','arguments':{'path':'/Windows/System32/ipconfig.exe',
>   'timeout':5000}}
> {"return": {"pid": 1836}}

I'll check this later during office hours. Something definitely went wrong.

> The guest-exec-status failures are expected since the first call reaps
> everything, but the CreateProcessW() failures are not. Will look into it
> more this evening, but it doesn't look like I'll be able to apply this in
> its current state.
>
> I have concerns over the schema as well. I think last time we discussed
> it we both seemed to agree that guest-file-open was unwieldy and
> unnecessary. We should just let guest-exec return a set of file handles
> instead of having users do all the plumbing.

No, the discussion was a bit different AFAIR. First of all, you proposed
using unified code to perform exec. On the other hand, the current mechanics
with pipes are quite inconvenient for end users of the feature, for example
for an interactive shell in the guest.

We have used a very simple approach for our application: pipes are not
used; the application creates a VirtIO serial channel and, through this API,
makes the guest fork/exec the child with this serial channel as its stdio
in/out. This way we get a convenient API for shell processing.

This means that the flexibility of specifying the file descriptors directly
is necessary.

There are two solutions from my point of view:
- keep the current API, it is suitable for us
- switch to "pipe only" mechanics for guest exec, i.e. the command
   will work like "ssh" with one descriptor for read and one for write
   created automatically, but in this case we need either a way
   to connect a Unix socket in the host with a file descriptor in the guest,
   or the ability to send events from QGA to the client using QMP

> I'm really sorry for chiming in right before hard freeze, very poor
> timing/planning on my part.

:( Can we somehow schedule this better next time? This functionality
is mandatory for us and we cannot afford to drop it or forget about
it for long. There was no pressure in winter, but now I am under heavy
pressure. So can we at least agree on the API terms?

> Will look at the fs/pci info patches tonight.

>> Signed-off-by: Olga Krishtal
>> Acked-by: Roman Kagan
>> Signed-off-by: Denis V. Lunev
>> CC: Eric Blake
>> CC: Michael Roth
>> ---
>>   qga/commands-win32.c | 309 ++-
>>   1 file changed, 303 insertions(+), 6 deletions(-)
>>
>> diff --git a/qga/commands-win32.c b/qga/commands-win32.c
>> index 435a049..ad445d9 100644
>> --- a/qga/commands-win32.c
>> +++ b/qga/commands-win32.c
>> @@ -451,10 +451,231 @@ static void guest_file_init(void)
>>       QTAILQ_INIT(&guest_file_state.filehandles);
>>   }
>>
>> +
>> +typedef struct GuestExecInfo {
>> +    int pid;
>> +    HANDLE phandle;
>> +    GuestFileHandle *gfh_stdin;
>> +    GuestFileHandle *gfh_stdout;
>> +    GuestFileHandle *gfh_stderr;
>> +    QTAILQ_ENTRY(GuestExecInfo) next;
>> +} GuestExecInfo;
>> +
>> +static struct {
>> +    QTAILQ_HEAD(, GuestExecInfo) processes;
>> +} guest_exec_state;
>> +
>> +static void guest_exec_init(void)
>> +{
>> +    QTAILQ_INIT(&guest_exec_state.processes);
>> +}
>> +
>> +static void guest_exec_info_add(int pid, HANDLE phandle,
>> +                                GuestFileHandle *in, GuestFileHandle *out,
>> +                                GuestFileHandle *error)
>> +{
>> +    GuestExecInfo *gei;
>> +
>> +    gei = g_malloc0(sizeof(GuestExecInfo));
>> +    gei->pid = pid;
>> +    gei->phandle = phandle;
>> +    gei->gfh_stdin = in;
>> +    gei->gfh_stdout = out;
>> +    gei->gfh_stderr = error;
>> +    QTAILQ_INSERT_TAIL(&guest_ex

Re: [Qemu-devel] [PATCH v3 14/16] acpi: Add a way for devices to add ACPI tables

2015-07-07 Thread Igor Mammedov

On Mon,  8 Jun 2015 20:12:09 -0500
miny...@acm.org wrote:

> From: Corey Minyard 
> 
> Some devices, like IPMI, need to add ACPI table entries to report
> their presence.  Add a method for adding these entries.
I think that it's not up to the device to define in which table/scope
its entries should be, but rather up to the board/platform to decide.

I'd prefer the old way of adding a device's ACPI AML into the existing
SSDT, like we do for every other device that needs it
(for example: pvpanic).
That allows keeping the ACPI code in one place, and lets each platform
decide which table to put a device description in and where exactly
it should be.

So I'd drop this patch and add a function that returns a non-NULL
"IPMIFwInfo *" if an IPMI device is present, and use a slightly modified
15/16 to add the device entry into the existing SSDT.

> Signed-off-by: Corey Minyard 
> ---
>  hw/acpi/Makefile.objs |  1 +
>  hw/acpi/acpi-dev-tables.c | 80 
> +++
>  hw/i386/acpi-build.c  | 17 +
>  include/hw/acpi/acpi-dev-tables.h | 38 +++
>  4 files changed, 136 insertions(+)
>  create mode 100644 hw/acpi/acpi-dev-tables.c
>  create mode 100644 include/hw/acpi/acpi-dev-tables.h
> 
> diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
> index b9fefa7..2b84f08 100644
> --- a/hw/acpi/Makefile.objs
> +++ b/hw/acpi/Makefile.objs
> @@ -3,3 +3,4 @@ common-obj-$(CONFIG_ACPI) += memory_hotplug.o
>  common-obj-$(CONFIG_ACPI) += acpi_interface.o
>  common-obj-$(CONFIG_ACPI) += bios-linker-loader.o
>  common-obj-$(CONFIG_ACPI) += aml-build.o
> +common-obj-$(CONFIG_ACPI) += acpi-dev-tables.o
> diff --git a/hw/acpi/acpi-dev-tables.c b/hw/acpi/acpi-dev-tables.c
> new file mode 100644
> index 0000000..7f07a3d
> --- /dev/null
> +++ b/hw/acpi/acpi-dev-tables.c
> @@ -0,0 +1,80 @@
> +/*
> + * Add and get ACPI tables registered by devices.
> + *
> + * Copyright (c) 2015 Corey Minyard, MontaVista Software, LLC
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +struct acpi_dev_table {
> +    void *data;
> +    int size;
> +    const char *sig;
> +    uint8_t rev;
> +    QSLIST_ENTRY(acpi_dev_table) next;
> +};
> +
> +static QSLIST_HEAD(, acpi_dev_table) acpi_table_entries;
> +
> +void
> +add_acpi_dev_table(void *data, int size, const char *sig, uint8_t rev)
> +{
> +    struct acpi_dev_table *e = g_malloc(sizeof(*e));
> +
> +    e->data = g_malloc(size);
> +    memcpy(e->data, data, size);
> +    e->size = size;
> +    e->sig = sig;
> +    e->rev = rev;
> +    QSLIST_INSERT_HEAD(&acpi_table_entries, e, next);
> +}
> +
> +struct acpi_dev_table *acpi_dev_table_first(void)
> +{
> +    return QSLIST_FIRST(&acpi_table_entries);
> +}
> +
> +struct acpi_dev_table *acpi_dev_table_next(struct acpi_dev_table *current)
> +{
> +    return QSLIST_NEXT(current, next);
> +}
> +
> +uint8_t *acpi_dev_table_data(struct acpi_dev_table *e)
> +{
> +    return e->data;
> +}
> +
> +unsigned acpi_dev_table_len(struct acpi_dev_table *e)
> +{
> +    return e->size;
> +}
> +
> +const char *acpi_dev_table_sig(struct acpi_dev_table *e)
> +{
> +    return e->sig;
> +}
> +
> +uint8_t acpi_dev_table_rev(struct acpi_dev_table *e)
> +{
> +    return e->rev;
> +}
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index e761005..0b4d195 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -35,6 +35,7 @@
>  #include "hw/timer/hpet.h"
>  #include "hw/i386/acpi-defs.h"
>  #include "hw/acpi/acpi.h"
> +#include "hw/acpi/acpi-dev-tables.h"
>  #include "hw/nvram/fw_cfg.h"
>  #include "hw/acpi/bios-linker-loader.h"
>  #include "hw/loader.h"
> @@ -1377,6 +1378,7 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
>      AcpiMcfgInfo mcfg;
>      PcPciInfo pci;
>      uint8_t *u;
> +    struct acpi_dev_table *dt;
>      size_t aml_len = 0;
>      GArray *t
Re: [Qemu-devel] [PATCH] virtio-net: Drop net_virtio_info.can_receive

2015-07-07 Thread Michael S. Tsirkin

On Tue, Jul 07, 2015 at 08:53:59AM +0800, Fam Zheng wrote:
> On Mon, 07/06 20:09, Michael S. Tsirkin wrote:
> > On Mon, Jul 06, 2015 at 04:21:16PM +0100, Stefan Hajnoczi wrote:
> > > On Mon, Jul 06, 2015 at 11:32:25AM +0800, Jason Wang wrote:
> > > > 
> > > > 
> > > > On 07/02/2015 08:46 PM, Stefan Hajnoczi wrote:
> > > > > On Tue, Jun 30, 2015 at 04:35:24PM +0800, Jason Wang wrote:
> > > > >> On 06/30/2015 11:06 AM, Fam Zheng wrote:
> > > > >>> virtio_net_receive still does the check by calling
> > > > >>> virtio_net_can_receive, if the device or driver is not ready, the
> > > > >>> packet is dropped.
> > > > >>>
> > > > >>> This is necessary because returning false from can_receive
> > > > >>> complicates things: the peer would disable sending until we
> > > > >>> explicitly flush the queue.
> > > > >>>
> > > > >>> Signed-off-by: Fam Zheng 
> > > > >>> ---
> > > > >>>  hw/net/virtio-net.c | 1 -
> > > > >>>  1 file changed, 1 deletion(-)
> > > > >>>
> > > > >>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > > > >>> index d728233..dbef0d0 100644
> > > > >>> --- a/hw/net/virtio-net.c
> > > > >>> +++ b/hw/net/virtio-net.c
> > > > >>> @@ -1503,7 +1503,6 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
> > > > >>>  static NetClientInfo net_virtio_info = {
> > > > >>>      .type = NET_CLIENT_OPTIONS_KIND_NIC,
> > > > >>>      .size = sizeof(NICState),
> > > > >>> -    .can_receive = virtio_net_can_receive,
> > > > >>>      .receive = virtio_net_receive,
> > > > >>>      .link_status_changed = virtio_net_set_link_status,
> > > > >>>      .query_rx_filter = virtio_net_query_rxfilter,
> > > > >> A side effect of this patch is that it will read and then drop
> > > > >> packets if the guest driver is not ok.
> > > > > I think that the semantics of .can_receive() and .receive() return
> > > > > values are currently incorrect in many NICs.  They have .can_receive()
> > > > > functions that return false for conditions where .receive() would
> > > > > discard the packet.  So what happens is that packets get queued when
> > > > > they should actually be discarded.
> > > > 
> > > > Yes, but they are bugs more or less.
> > > > 
> > > > >
> > > > > The purpose of the flow control (queuing) mechanism is to tell the
> > > > > sender to hold off until the receiver has more rx buffers available.
> > > > > It's a short-term thing that doesn't include link down, rx disable,
> > > > > or NIC reset states.
> > > > >
> > > > > Therefore, I think this patch will not introduce a regression.  It is
> > > > > adjusting the code to stop queuing packets when they should actually
> > > > > be dropped.
> > > > >
> > > > > Thoughts?
> > > > 
> > > > I agree there's no functional issue. But it causes wasted CPU cycles
> > > > (consider a guest being flooded). Sometimes it may even be dangerous.
> > > > For tap, we're probably ok since we have 756ae78b, but for other
> > > > backends, we don't.
> > > 
> > > If the guest uses iptables rules or other mechanisms to drop bogus
> > > packets the cost is even higher than discarding them at the QEMU layer.
> > > 
> > > What's more is that if you're using link down as a DoS mitigation
> > > strategy then you might as well hot unplug the NIC.
> > > 
> > > Stefan
> > 
> > 
> > 
> > Frankly, I don't see the point of the patch.  Is this supposed to be a
> > bugfix? If so, there should be a description of how to trigger the
> > bug.  Is this an optimization? If so there should be some numbers
> > showing a gain.
> 
> It's a bug fix: we are not flushing the queue when DRIVER_OK is set or
> when a buffer becomes available (the virtio_net_can_receive conditions). This
> was not an issue before a90a7425cf, but since that commit the semantics are
> enforced.
> 
> Fam

I think the safest and most obvious fix is to flush on DRIVER_OK then (unless
vhost has started). That might be 2.4 material.

-- 
MST
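
To make the queue-versus-drop distinction discussed in this thread concrete,
here is a minimal sketch in C. It is a toy model, not QEMU code: ToyNic,
toy_can_receive and toy_receive are hypothetical names, and the convention
that a return value of 0 from the receive function means "queue the packet
and retry after a flush" is assumed from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy NIC used only for illustration; not a QEMU type. */
typedef struct ToyNic {
    bool link_up;     /* long-lived state: link down means drop */
    bool rx_enabled;  /* long-lived state: rx disabled means drop */
    int rx_buffers;   /* short-term resource: none left means queue */
    int delivered;    /* packets handed to the guest */
    int dropped;      /* packets discarded */
} ToyNic;

/* Flow control only: false means "hold off until buffers are available". */
static bool toy_can_receive(const ToyNic *nic)
{
    return nic->rx_buffers > 0;
}

/*
 * Returns len when the packet was consumed (delivered or discarded), and 0
 * when the sender should keep it queued until the receiver flushes.
 */
static size_t toy_receive(ToyNic *nic, size_t len)
{
    assert(len > 0); /* a zero length would be ambiguous with "queue" */

    if (!nic->link_up || !nic->rx_enabled) {
        nic->dropped++;
        return len; /* discard instead of queuing */
    }
    if (!toy_can_receive(nic)) {
        return 0; /* short-term condition: ask the sender to queue */
    }
    nic->rx_buffers--;
    nic->delivered++;
    return len;
}
```

In this model a packet is queued only while the receiver is temporarily out of
buffers. Link down and rx disable discard the packet immediately, which is the
behaviour Stefan argues for.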

[Qemu-devel] Using QCOW2 with nand flashes.

2015-07-07 Thread sai pavan

Hi,

I am trying to implement fake disk images for emulating NAND flashes.
I see that sparse files are formed when the content is zeros. But for NAND
flashes the content is initially all ones, and it is difficult for me to make
a sparse file with all ones.

Does anyone have suggestions for this problem?

I am thinking of creating a NAND flash file with all zeros and negating
the data at the receiving end in QEMU. So the input file will be all zeros,
but the concept of all ones will be intact. But this will be confusing if
someone wants to compare the output bin files after a write: one has to
negate the data when reading it.

Regards,
Sai Pavan
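
The inversion idea described in this message can be sketched in a few lines
of C. This is only an illustration under stated assumptions: nand_invert is a
hypothetical helper, not an existing QEMU function, and it assumes the backing
file stores the bitwise complement of what the guest sees, so that an erased
page (all 0xFF) is stored as zeros and the file can stay sparse.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Complement every byte in place. XOR with 0xFF is its own inverse, so the
 * same helper converts guest data to the on-disk form and back again.
 */
static void nand_invert(uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        buf[i] ^= 0xFF;
    }
}
```

Applying the helper once before a write to the image and once after a read
from it keeps the guest's view unchanged. As the message notes, anyone
comparing raw image files would have to apply the same inversion first.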

Re: [Qemu-devel] [PATCH v2] net: Flush queued packets when guest resumes

2015-07-07 Thread Michael S. Tsirkin

On Tue, Jul 07, 2015 at 09:21:07AM +0800, Fam Zheng wrote:
> Since commit 6e99c63 "net/socket: Drop net_socket_can_send" and friends,
> net queues need to be explicitly flushed after qemu_can_send_packet()
> returns false, because the netdev side will disable the polling of fd.
> 
> This fixes the case of "cont" after "stop" (or migration).
> 
> Signed-off-by: Fam Zheng 

Note virtio has its own handler which must be used to
flush packets - this one might run too early or too late.

> ---
> 
> v2: Unify with VM stop handler. (Stefan)
> ---
>  net/net.c | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/net/net.c b/net/net.c
> index 6ff7fec..28a5597 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -1257,14 +1257,19 @@ void qmp_set_link(const char *name, bool up, Error **errp)
>  static void net_vm_change_state_handler(void *opaque, int running,
>                                          RunState state)
>  {
> -    /* Complete all queued packets, to guarantee we don't modify
> -     * state later when VM is not running.
> -     */
> -    if (!running) {
> -        NetClientState *nc;
> -        NetClientState *tmp;
> +    NetClientState *nc;
> +    NetClientState *tmp;
>  
> -        QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> +    QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> +        if (running) {
> +            /* Flush queued packets and wake up backends. */
> +            if (nc->peer && qemu_can_send_packet(nc)) {
> +                qemu_flush_queued_packets(nc->peer);
> +            }
> +        } else {
> +            /* Complete all queued packets, to guarantee we don't modify
> +             * state later when VM is not running.
> +             */
>              qemu_flush_or_purge_queued_packets(nc, true);
>          }
>      }
> -- 
> 2.4.3

Re: [Qemu-devel] [PULL v3 00/13] KVM patches (SMM implementation) for 2015-07-06

2015-07-07 Thread Peter Maydell

On 6 July 2015 at 18:28, Paolo Bonzini  wrote:
> The following changes since commit 7edd8e4660beb301d527257f8e04ebec0f841cb0:
>
>   Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into 
> staging (2015-07-06 14:03:44 +0100)
>
> are available in the git repository at:
>
>
>   git://github.com/bonzini/qemu.git tags/for-upstream-smm
>
> for you to fetch changes up to 355023f2010c4df619d88a0dd7012b4b9c74c12c:
>
>   pc: add SMM property (2015-07-06 18:39:59 +0200)
>
> 
> This series implements KVM support for SMM, and lets you enable/disable
> it through the "smm" property of x86 machine types.
>
> 
>

Applied, thanks.

-- PMM

Re: [Qemu-devel] [PATCH 01/10] util, qga: drop guest_file_toggle_flags

2015-07-07 Thread Denis V. Lunev


On 30/06/15 13:25, Denis V. Lunev wrote:

From: Olga Krishtal 

guest_file_toggle_flags is a copy of the semi-portable qemu_set_nonblock.
The latter does not work properly on Windows because of the reduced Windows
POSIX implementation.

On Windows there are separate APIs for changing the flags of files, pipes
and sockets. A portable way to change file descriptor flags requires
detecting the file descriptor type and taking the proper action for that
type. The patch adds the wrapper qemu_set_fd_nonblocking to the
Windows-specific code to handle this properly.

The only difference is that qemu_set_nonblock returns void, but this should
not be a problem.

Signed-off-by: Olga Krishtal 
Signed-off-by: Denis V. Lunev 
CC: Eric Blake 
CC: Michael Roth 

Michael, Eric,

could you please consider merging this patch? This is a
semi-independent cleanup which makes sense on its own.

Den



---
  qga/commands-posix.c | 27 ++-
  util/oslib-win32.c   | 52 +++-
  2 files changed, 49 insertions(+), 30 deletions(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index befd00b..40dbe25 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -28,6 +28,7 @@
  #include "qapi/qmp/qerror.h"
  #include "qemu/queue.h"
  #include "qemu/host-utils.h"
+#include "qemu/sockets.h"
  
  #ifndef CONFIG_HAS_ENVIRON

  #ifdef __APPLE__
@@ -376,27 +377,6 @@ safe_open_or_create(const char *path, const char *mode, 
Error **errp)
  return NULL;
  }
  
-static int guest_file_toggle_flags(int fd, int flags, bool set, Error **err)

-{
-int ret, old_flags;
-
-old_flags = fcntl(fd, F_GETFL);
-if (old_flags == -1) {
-error_setg_errno(err, errno, QERR_QGA_COMMAND_FAILED,
- "failed to fetch filehandle flags");
-return -1;
-}
-
-ret = fcntl(fd, F_SETFL, set ? (old_flags | flags) : (old_flags & ~flags));
-if (ret == -1) {
-error_setg_errno(err, errno, QERR_QGA_COMMAND_FAILED,
- "failed to set filehandle flags");
-return -1;
-}
-
-return ret;
-}
-
  int64_t qmp_guest_file_open(const char *path, bool has_mode, const char *mode,
  Error **errp)
  {
@@ -417,10 +397,7 @@ int64_t qmp_guest_file_open(const char *path, bool 
has_mode, const char *mode,
  /* set fd non-blocking to avoid common use cases (like reading from a
   * named pipe) from hanging the agent
   */
-if (guest_file_toggle_flags(fileno(fh), O_NONBLOCK, true, errp) < 0) {
-fclose(fh);
-return -1;
-}
+qemu_set_nonblock(fileno(fh));
  
  handle = guest_file_handle_add(fh, errp);

  if (handle < 0) {
diff --git a/util/oslib-win32.c b/util/oslib-win32.c
index 730a670..1a6ae72 100644
--- a/util/oslib-win32.c
+++ b/util/oslib-win32.c
@@ -119,17 +119,59 @@ struct tm *localtime_r(const time_t *timep, struct tm 
*result)
  return p;
  }
  
-void qemu_set_block(int fd)

+static void qemu_set_fd_nonblocking(int fd, bool nonblocking)
  {
-unsigned long opt = 0;
-WSAEventSelect(fd, NULL, 0);
+HANDLE handle;
+DWORD file_type, pipe_state;
+
+handle = (HANDLE)_get_osfhandle(fd);
+if (handle == INVALID_HANDLE_VALUE) {
+return;
+}
+
+file_type = GetFileType(handle);
+if (file_type != FILE_TYPE_PIPE) {
+return;
+}
+
+/* If file_type == FILE_TYPE_PIPE, according to msdn
+ * the specified file is socket or named pipe */
+if (GetNamedPipeHandleState(handle, &pipe_state, NULL,
+NULL, NULL, NULL, 0)) {
+/* The fd is named pipe fd */
+if (!nonblocking == !(pipe_state & PIPE_NOWAIT)) {
+/* In this case we do not need perform any operation, because
+ * nonblocking = true and PIPE_NOWAIT is already set or
+ * nonblocking = false and PIPE_NOWAIT is not set */
+return;
+}
+
+if (nonblocking) {
+pipe_state |= PIPE_NOWAIT;
+} else {
+pipe_state &= ~PIPE_NOWAIT;
+}
+
+SetNamedPipeHandleState(handle, &pipe_state, NULL, NULL);
+return;
+}
+
+/* The fd is socket fd */
+unsigned long opt = (unsigned long)nonblocking;
+if (!nonblocking) {
+WSAEventSelect(fd, NULL, 0);
+}
  ioctlsocket(fd, FIONBIO, &opt);
  }
  
+void qemu_set_block(int fd)

+{
+qemu_set_fd_nonblocking(fd, false);
+}
+
  void qemu_set_nonblock(int fd)
  {
-unsigned long opt = 1;
-ioctlsocket(fd, FIONBIO, &opt);
+qemu_set_fd_nonblocking(fd, true);
  qemu_fd_register(fd);
  }
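
For comparison with the Windows code in this patch, the POSIX side of the same
operation can be sketched as follows. This is an illustration only:
set_fd_nonblocking is a hypothetical name, not the QEMU function. It follows
the fcntl logic of the removed guest_file_toggle_flags and keeps a return code
so that the caller can still detect failure.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/*
 * Set or clear O_NONBLOCK on fd.
 * Returns 0 on success and -1 on failure (errno is set by fcntl).
 */
static int set_fd_nonblocking(int fd, bool nonblocking)
{
    int flags = fcntl(fd, F_GETFL);
    int wanted;

    if (flags == -1) {
        return -1;
    }

    wanted = nonblocking ? (flags | O_NONBLOCK) : (flags & ~O_NONBLOCK);
    if (wanted == flags) {
        return 0; /* already in the requested state */
    }

    return fcntl(fd, F_SETFL, wanted) == -1 ? -1 : 0;
}
```

The early return when nothing changes mirrors the PIPE_NOWAIT check in the
Windows helper above. Whether the guest agent should keep reporting such
failures is a design question for the patch; the sketch only shows that doing
so is cheap on POSIX.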

Re: [Qemu-devel] [PATCH v2] net-hub: Drop can_receive

2015-07-07 Thread Stefan Hajnoczi

On Tue, Jul 07, 2015 at 02:30:30PM +0800, Fam Zheng wrote:
> This moves the semantics from net_hub_port_can_receive to receive
> functions, by returning 0 if all receiving ports return 0. Also,
> remember to flush the source port's queue in that case.
> 
> Signed-off-by: Fam Zheng 
> ---
>  net/hub.c | 54 +-
>  1 file changed, 29 insertions(+), 25 deletions(-)

This patch revision doesn't take into account the special case code in
qemu_flush_or_purge_queued_packets(), which I mentioned in my reply to
the previous revision of this patch.

The queue is now flushed twice because you've introduced
net_hub_port_send_cb() but qemu_flush_or_purge_queued_packets() already
calls net_hub_flush().

If you want to get rid of net_hub_flush(), that's great.  But please
remove the duplicate code.

[Qemu-devel] [PATCH COLO-BLOCK v8 00/18] Block replication for continuous checkpoints

2015-07-07 Thread Wen Congyang

Block replication is a very important feature which is used for
continuous checkpoints (for example: COLO).

You can find detailed information about block replication here:
http://wiki.qemu.org/Features/BlockReplication

Usage:
Please refer to docs/block-replication.txt

You can get the patch here:
https://github.com/coloft/qemu/tree/wency/block-replication-v8

You can get the patch with framework here:
https://github.com/coloft/qemu/tree/wency/colo_framework_v8

TODO:
1. Continuous block replication. It will be started after basic functions
   are accepted.

Wen Congyang (18):
  Add new block driver interface to add/delete a BDS's child
  quorum: implement block driver interfaces add/delete a BDS's child
  hmp: add monitor command to add/remove a child
  introduce a new API qemu_opts_absorb_qdict_by_index()
  quorum: allow ignoring child errors
  introduce a new API to enable/disable attach device model
  introduce a new API to check if blk is attached
  block: make bdrv_put_ref_bh_schedule() as a public API
  Backup: clear all bitmap when doing block checkpoint
  allow writing to the backing file
  Allow creating backup jobs when opening BDS
  block: Allow references for backing files
  docs: block replication's description
  Add new block driver interfaces to control block replication
  skip nbd_target when starting block replication
  quorum: implement block driver interfaces for block replication
  Implement new driver for block replication
  Add a new API to start/stop replication, do checkpoint to all BDSes

 block.c| 266 -
 block/Makefile.objs|   3 +-
 block/backup.c |  13 ++
 block/block-backend.c  |  33 +++
 block/quorum.c | 244 ++-
 block/replication.c| 443 +
 blockdev.c |  90 ++---
 blockjob.c |  10 +
 docs/block-replication.txt | 182 +
 hmp-commands.hx|  28 +++
 include/block/block.h  |  15 ++
 include/block/block_int.h  |  19 ++
 include/block/blockjob.h   |  12 ++
 include/qemu/option.h  |   2 +
 include/sysemu/block-backend.h |   3 +
 include/sysemu/blockdev.h  |   2 +
 qapi/block.json|  16 ++
 util/qemu-option.c |  44 
 18 files changed, 1378 insertions(+), 47 deletions(-)
 create mode 100644 block/replication.c
 create mode 100644 docs/block-replication.txt

-- 
2.4.3

[Qemu-devel] [PATCH COLO-BLOCK v8 01/18] Add new block driver interface to add/delete a BDS's child

2015-07-07 Thread Wen Congyang

In some cases, we want to take a quorum child offline, and take
another child online.

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
---
 block.c                   | 39 +++++++++++++++++++++++++++++++++++++++
 include/block/block.h     |  4 ++++
 include/block/block_int.h |  5 +++++
 3 files changed, 48 insertions(+)

diff --git a/block.c b/block.c
index 7e130cc..2cbc4f9 100644
--- a/block.c
+++ b/block.c
@@ -4195,3 +4195,42 @@ BlockAcctStats *bdrv_get_stats(BlockDriverState *bs)
 {
     return &bs->stats;
 }
+
+/*
+ * Hot add/remove a BDS's child. So the user can take a child offline when
+ * it is broken and take a new child online
+ */
+void bdrv_add_child(BlockDriverState *bs, QDict *options, Error **errp)
+{
+
+    if (!bs->drv || !bs->drv->bdrv_add_child) {
+        error_setg(errp, "this feature or command is not currently supported");
+        return;
+    }
+
+    bs->drv->bdrv_add_child(bs, options, errp);
+}
+
+void bdrv_del_child(BlockDriverState *bs, BlockDriverState *child_bs,
+                    Error **errp)
+{
+    BdrvChild *child;
+
+    if (!bs->drv || !bs->drv->bdrv_del_child) {
+        error_setg(errp, "this feature or command is not currently supported");
+        return;
+    }
+
+    QLIST_FOREACH(child, &bs->children, next) {
+        if (child->bs == child_bs) {
+            break;
+        }
+    }
+
+    if (!child) {
+        error_setg(errp, "Invalid child");
+        return;
+    }
+
+    bs->drv->bdrv_del_child(bs, child_bs, errp);
+}
diff --git a/include/block/block.h b/include/block/block.h
index 06e4137..29d3363 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -609,4 +609,8 @@ void bdrv_flush_io_queue(BlockDriverState *bs);
 
 BlockAcctStats *bdrv_get_stats(BlockDriverState *bs);
 
+void bdrv_add_child(BlockDriverState *bs, QDict *options, Error **errp);
+void bdrv_del_child(BlockDriverState *bs, BlockDriverState *child,
+Error **errp);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 8996baf..6fddea4 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -288,6 +288,11 @@ struct BlockDriver {
  */
 int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);
 
+void (*bdrv_add_child)(BlockDriverState *bs, QDict *options,
+   Error **errp);
+void (*bdrv_del_child)(BlockDriverState *bs, BlockDriverState *child,
+   Error **errp);
+
 QLIST_ENTRY(BlockDriver) list;
 };
 
-- 
2.4.3

[Qemu-devel] [PATCH COLO-BLOCK v8 02/18] quorum: implement block driver interfaces add/delete a BDS's child

2015-07-07 Thread Wen Congyang

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Cc: Alberto Garcia 
---
 block/quorum.c | 73 --
 1 file changed, 71 insertions(+), 2 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index b0eead0..76b29b1 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -67,6 +67,9 @@ typedef struct QuorumVotes {
 typedef struct BDRVQuorumState {
 BlockDriverState **bs; /* children BlockDriverStates */
 int num_children;  /* children count */
+int max_children;  /* The maximum children count, we need to reallocate
+* bs if num_children will larger than maximum.
+*/
 int threshold; /* if less than threshold children reads gave the
 * same result a quorum error occurs.
 */
@@ -879,9 +882,9 @@ static int quorum_open(BlockDriverState *bs, QDict 
*options, int flags,
 ret = -EINVAL;
 goto exit;
 }
-if (s->num_children < 2) {
+if (s->num_children < 1) {
 error_setg(&local_err,
-   "Number of provided children must be greater than 1");
+   "Number of provided children must be 1 or more");
 ret = -EINVAL;
 goto exit;
 }
@@ -930,6 +933,7 @@ static int quorum_open(BlockDriverState *bs, QDict 
*options, int flags,
 /* allocate the children BlockDriverState array */
 s->bs = g_new0(BlockDriverState *, s->num_children);
 opened = g_new0(bool, s->num_children);
+s->max_children = s->num_children;
 
 for (i = 0; i < s->num_children; i++) {
 char indexstr[32];
@@ -1000,6 +1004,68 @@ static void quorum_attach_aio_context(BlockDriverState 
*bs,
 }
 }
 
+static void quorum_add_child(BlockDriverState *bs, QDict *options, Error 
**errp)
+{
+BDRVQuorumState *s = bs->opaque;
+int ret;
+Error *local_err = NULL;
+
+bdrv_drain(bs);
+
+if (s->num_children == s->max_children) {
+if (s->max_children >= INT_MAX) {
+error_setg(errp, "Too many children");
+return;
+}
+
+s->bs = g_renew(BlockDriverState *, s->bs, s->max_children + 1);
+s->bs[s->num_children] = NULL;
+s->max_children += 1;
+}
+
+ret = bdrv_open_image(&s->bs[s->num_children], NULL, options, "child", bs,
+  &child_format, false, &local_err);
+if (ret < 0) {
+error_propagate(errp, local_err);
+return;
+}
+s->num_children++;
+}
+
+static void quorum_del_child(BlockDriverState *bs, BlockDriverState *child_bs,
+ Error **errp)
+{
+BDRVQuorumState *s = bs->opaque;
+int i;
+
+for (i = 0; i < s->num_children; i++) {
+if (s->bs[i] == child_bs) {
+break;
+}
+}
+
+if (i == s->num_children) {
+error_setg(errp, "Invalid child");
+return;
+}
+
+if (s->num_children <= s->threshold) {
+error_setg(errp, "Cannot remove any more child");
+return;
+}
+
+if (s->num_children == 1) {
+error_setg(errp, "Cannot remove the last child");
+return;
+}
+
+bdrv_drain(bs);
+/* We can safe remove this child now */
+memmove(&s->bs[i], &s->bs[i+1], (s->num_children - i - 1) * sizeof(void 
*));
+s->num_children--;
+s->bs[s->num_children] = NULL;
+}
+
 static void quorum_refresh_filename(BlockDriverState *bs)
 {
 BDRVQuorumState *s = bs->opaque;
@@ -1054,6 +1120,9 @@ static BlockDriver bdrv_quorum = {
 .bdrv_detach_aio_context= quorum_detach_aio_context,
 .bdrv_attach_aio_context= quorum_attach_aio_context,
 
+.bdrv_add_child = quorum_add_child,
+.bdrv_del_child = quorum_del_child,
+
 .is_filter  = true,
 .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
 };
-- 
2.4.3
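
The removal step in quorum_del_child (shift the later entries down with
memmove, then clear the freed slot) can be tried in isolation with the sketch
below. ptr_array_remove is a hypothetical helper written for illustration and
is not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Remove entry idx from an array holding n pointers by shifting the entries
 * after it down by one slot. The vacated last slot is set to NULL.
 * Returns the new number of entries.
 */
static size_t ptr_array_remove(void **arr, size_t n, size_t idx)
{
    assert(idx < n);

    /* memmove is required because source and destination overlap. */
    memmove(&arr[idx], &arr[idx + 1], (n - idx - 1) * sizeof(arr[0]));
    arr[n - 1] = NULL;
    return n - 1;
}
```

When idx is the last entry the byte count is zero and only the final slot is
cleared, which matches the patch's handling of the same case.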

[Qemu-devel] [PATCH COLO-BLOCK v8 04/18] introduce a new API qemu_opts_absorb_qdict_by_index()

2015-07-07 Thread Wen Congyang

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
---
 include/qemu/option.h |  2 ++
 util/qemu-option.c| 44 
 2 files changed, 46 insertions(+)

diff --git a/include/qemu/option.h b/include/qemu/option.h
index 57e51c9..725a781 100644
--- a/include/qemu/option.h
+++ b/include/qemu/option.h
@@ -129,6 +129,8 @@ QemuOpts *qemu_opts_from_qdict(QemuOptsList *list, const 
QDict *qdict,
Error **errp);
 QDict *qemu_opts_to_qdict(QemuOpts *opts, QDict *qdict);
 void qemu_opts_absorb_qdict(QemuOpts *opts, QDict *qdict, Error **errp);
+void qemu_opts_absorb_qdict_by_index(QemuOpts *opts, QDict *qdict,
+ const char *index, Error **errp);
 
 typedef int (*qemu_opts_loopfunc)(void *opaque, QemuOpts *opts, Error **errp);
 int qemu_opts_foreach(QemuOptsList *list, qemu_opts_loopfunc func,
diff --git a/util/qemu-option.c b/util/qemu-option.c
index efe9d27..a93a269 100644
--- a/util/qemu-option.c
+++ b/util/qemu-option.c
@@ -1021,6 +1021,50 @@ void qemu_opts_absorb_qdict(QemuOpts *opts, QDict 
*qdict, Error **errp)
 }
 
 /*
+ * Adds all QDict entries to the QemuOpts that can be added and removes them
+ * from the QDict. The key starts with "%index." in the %qdict. When this
+ * function returns, the QDict contains only those entries that couldn't be
+ * added to the QemuOpts.
+ */
+void qemu_opts_absorb_qdict_by_index(QemuOpts *opts, QDict *qdict,
+ const char *index, Error **errp)
+{
+const QDictEntry *entry, *next;
+const char *key;
+int len = strlen(index);
+
+entry = qdict_first(qdict);
+
+while (entry != NULL) {
+Error *local_err = NULL;
+OptsFromQDictState state = {
+.errp = &local_err,
+.opts = opts,
+};
+
+next = qdict_next(qdict, entry);
+if (strncmp(entry->key, index, len) || *(entry->key + len) != '.') {
+entry = next;
+continue;
+}
+
+key = entry->key + len + 1;
+
+if (find_desc_by_name(opts->list->desc, key)) {
+qemu_opts_from_qdict_1(key, entry->value, &state);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+} else {
+qdict_del(qdict, entry->key);
+}
+}
+
+entry = next;
+}
+}
+
+/*
  * Convert from QemuOpts to QDict.
  * The QDict values are of type QString.
  * TODO We'll want to use types appropriate for opt->desc->type, but
-- 
2.4.3

[Qemu-devel] [PATCH COLO-BLOCK v8 05/18] quorum: allow ignoring child errors

2015-07-07 Thread Wen Congyang

If the child is not ready, read/write/getlength/flush will
return -errno. It is not a critical error, and can be ignored:
1. read/write:
   Just do not report the error event.
2. getlength:
   Just ignore it. If all children's getlength calls return -errno
   and are ignored, return -EIO.
3. flush:
   Just ignore it. If all children's flush calls return -errno
   and are ignored, return 0.

Usage: children.x.ignore-errors=true

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Cc: Alberto Garcia 
---
 block/quorum.c | 94 +-
 1 file changed, 87 insertions(+), 7 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index 76b29b1..2a45d0e 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -31,6 +31,7 @@
 #define QUORUM_OPT_BLKVERIFY  "blkverify"
 #define QUORUM_OPT_REWRITE"rewrite-corrupted"
 #define QUORUM_OPT_READ_PATTERN   "read-pattern"
+#define QUORUM_CHILDREN_OPT_IGNORE_ERRORS   "ignore-errors"
 
 /* This union holds a vote hash value */
 typedef union QuorumVoteValue {
@@ -66,6 +67,7 @@ typedef struct QuorumVotes {
 /* the following structure holds the state of one quorum instance */
 typedef struct BDRVQuorumState {
 BlockDriverState **bs; /* children BlockDriverStates */
+bool *ignore_errors;   /* ignore children's error? */
 int num_children;  /* children count */
 int max_children;  /* The maximum children count, we need to reallocate
 * bs if num_children will larger than maximum.
@@ -101,6 +103,7 @@ typedef struct QuorumChildRequest {
 uint8_t *buf;
 int ret;
 QuorumAIOCB *parent;
+int index;
 } QuorumChildRequest;
 
 /* Quorum will use the following structure to track progress of each read/write
@@ -213,6 +216,7 @@ static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
 acb->qcrs[i].buf = NULL;
 acb->qcrs[i].ret = 0;
 acb->qcrs[i].parent = acb;
+acb->qcrs[i].index = i;
 }
 
 return acb;
@@ -306,7 +310,7 @@ static void quorum_aio_cb(void *opaque, int ret)
 acb->count++;
 if (ret == 0) {
 acb->success_count++;
-} else {
+} else if (!s->ignore_errors[sacb->index]) {
 quorum_report_bad(acb, sacb->aiocb->bs->node_name, ret);
 }
 assert(acb->count <= s->num_children);
@@ -721,19 +725,31 @@ static BlockAIOCB *quorum_aio_writev(BlockDriverState *bs,
 static int64_t quorum_getlength(BlockDriverState *bs)
 {
 BDRVQuorumState *s = bs->opaque;
-int64_t result;
+int64_t result = -EIO;
 int i;
 
 /* check that all file have the same length */
-result = bdrv_getlength(s->bs[0]);
-if (result < 0) {
-return result;
-}
-for (i = 1; i < s->num_children; i++) {
+for (i = 0; i < s->num_children; i++) {
 int64_t value = bdrv_getlength(s->bs[i]);
+
 if (value < 0) {
 return value;
 }
+
+if (value == 0 && s->ignore_errors[i]) {
+/*
+ * If the child is not ready, it cannot return -errno,
+ * otherwise refresh_total_sectors() will fail when
+ * we open the child.
+ */
+continue;
+}
+
+if (result == -EIO) {
+result = value;
+continue;
+}
+
 if (value != result) {
 return -EIO;
 }
@@ -771,6 +787,9 @@ static coroutine_fn int quorum_co_flush(BlockDriverState 
*bs)
 
 for (i = 0; i < s->num_children; i++) {
 result = bdrv_co_flush(s->bs[i]);
+if (result < 0 && s->ignore_errors[i]) {
+result = 0;
+}
 result_value.l = result;
 quorum_count_vote(&error_votes, &result_value, i);
 }
@@ -845,6 +864,19 @@ static QemuOptsList quorum_runtime_opts = {
 },
 };
 
+static QemuOptsList quorum_children_common_opts = {
+.name = "quorum children",
+.head = QTAILQ_HEAD_INITIALIZER(quorum_children_common_opts.head),
+.desc = {
+{
+.name = QUORUM_CHILDREN_OPT_IGNORE_ERRORS,
+.type = QEMU_OPT_BOOL,
+.help = "ignore child I/O error",
+},
+{ /* end of list */ }
+},
+};
+
 static int parse_read_pattern(const char *opt)
 {
 int i;
@@ -863,6 +895,37 @@ static int parse_read_pattern(const char *opt)
 return -EINVAL;
 }
 
+static int parse_children_options(BDRVQuorumState *s, QDict *options,
+  const char *indexstr, int index,
+  Error **errp)
+{
+QemuOpts *children_opts = NULL;
+Error *local_err = NULL;
+int ret = 0;
+bool value;
+
+children_opts = qemu_opts_create(&quorum_children_common_opts, NULL, 0,
+ &error_abort);
+qemu_opts_absorb_qdict_by_index(children_opts, options, indexstr,
+&local_err);
+if (local_err) {
+ret = -EINVAL;
+goto out;
+}
+
+

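The getlength rule stated in the commit message of patch 05/18 (skip children
whose errors are ignored, return -EIO if nothing is left) can be sketched as a
standalone function. This is an illustration of the commit message only:
agreed_length is a hypothetical name, the inputs are plain arrays rather than
BlockDriverStates, and the diff itself skips children that report a length of
0 rather than a negative value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * lengths[i] is the length reported by child i, or a negative errno.
 * ignore_errors[i] says whether an error from child i may be skipped.
 * Returns the common length, the first error that may not be ignored,
 * or -EIO if the children disagree or no child reported a length.
 */
static int64_t agreed_length(const int64_t *lengths,
                             const bool *ignore_errors, size_t n)
{
    int64_t result = -EIO;

    for (size_t i = 0; i < n; i++) {
        int64_t value = lengths[i];

        if (value < 0) {
            if (ignore_errors[i]) {
                continue; /* child not ready: skip it */
            }
            return value;
        }
        if (result < 0) {
            result = value; /* first usable length */
            continue;
        }
        if (value != result) {
            return -EIO; /* children disagree */
        }
    }
    return result;
}
```
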
[Qemu-devel] [PATCH COLO-BLOCK v8 10/18] allow writing to the backing file

2015-07-07 Thread Wen Congyang

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
---
 block.c | 41 -
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index f319921..879ca75 100644
--- a/block.c
+++ b/block.c
@@ -747,6 +747,15 @@ static const BdrvChildRole child_backing = {
 .inherit_flags = bdrv_backing_flags,
 };
 
+static int bdrv_backing_rw_flags(int flags)
+{
+return bdrv_backing_flags(flags) | BDRV_O_RDWR;
+}
+
+static const BdrvChildRole child_backing_rw = {
+.inherit_flags = bdrv_backing_rw_flags,
+};
+
 static int bdrv_open_flags(BlockDriverState *bs, int flags)
 {
 int open_flags = flags | BDRV_O_CACHE_WB;
@@ -1133,6 +1142,20 @@ out:
 bdrv_refresh_limits(bs, NULL);
 }
 
+#define ALLOW_WRITE_BACKING_FILE"allow-write-backing-file"
+static QemuOptsList backing_file_opts = {
+.name = "backing_file",
+.head = QTAILQ_HEAD_INITIALIZER(backing_file_opts.head),
+.desc = {
+{
+.name = ALLOW_WRITE_BACKING_FILE,
+.type = QEMU_OPT_BOOL,
+.help = "allow write to backing file",
+},
+{ /* end of list */ }
+},
+};
+
 /*
  * Opens the backing file for a BlockDriverState if not yet open
  *
@@ -1147,6 +1170,9 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*options, Error **errp)
 int ret = 0;
 BlockDriverState *backing_hd;
 Error *local_err = NULL;
+QemuOpts *opts = NULL;
+bool child_rw = false;
+const BdrvChildRole *child_role = NULL;
 
 if (bs->backing_hd != NULL) {
 QDECREF(options);
@@ -1159,6 +1185,18 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*options, Error **errp)
 }
 
 bs->open_flags &= ~BDRV_O_NO_BACKING;
+
+opts = qemu_opts_create(&backing_file_opts, NULL, 0, &error_abort);
+qemu_opts_absorb_qdict(opts, options, &local_err);
+if (local_err) {
+ret = -EINVAL;
+error_propagate(errp, local_err);
+QDECREF(options);
+goto free_exit;
+}
+child_rw = qemu_opt_get_bool(opts, ALLOW_WRITE_BACKING_FILE, false);
+child_role = child_rw ? &child_backing_rw : &child_backing;
+
 if (qdict_haskey(options, "file.filename")) {
 backing_filename[0] = '\0';
 } else if (bs->backing_file[0] == '\0' && qdict_size(options) == 0) {
@@ -1191,7 +1229,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*options, Error **errp)
 assert(bs->backing_hd == NULL);
 ret = bdrv_open_inherit(&backing_hd,
 *backing_filename ? backing_filename : NULL,
-NULL, options, 0, bs, &child_backing,
+NULL, options, 0, bs, child_role,
 NULL, &local_err);
 if (ret < 0) {
 bdrv_unref(backing_hd);
@@ -1205,6 +1243,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*options, Error **errp)
 bdrv_set_backing_hd(bs, backing_hd);
 
 free_exit:
+qemu_opts_del(opts);
 g_free(backing_filename);
 return ret;
 }
-- 
2.4.3

[Qemu-devel] [PATCH COLO-BLOCK v8 13/18] docs: block replication's description

2015-07-07 Thread Wen Congyang

Signed-off-by: Wen Congyang 
Signed-off-by: Yang Hongyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
---
 docs/block-replication.txt | 182 +
 1 file changed, 182 insertions(+)
 create mode 100644 docs/block-replication.txt

diff --git a/docs/block-replication.txt b/docs/block-replication.txt
new file mode 100644
index 000..13e004e
--- /dev/null
+++ b/docs/block-replication.txt
@@ -0,0 +1,182 @@
+Block replication
+
+Copyright Fujitsu, Corp. 2015
+Copyright (c) 2015 Intel Corporation
+Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+Block replication is used for continuous checkpoints. It is designed
+for COLO (COarse-grain LOck-stepping) where the Secondary VM is running.
+It can also be applied for FT/HA (Fault-tolerance/High Availability) scenarios,
+where the Secondary VM is not running.
+
+This document gives an overview of block replication's design.
+
+== Background ==
+High availability solutions such as micro checkpoint and COLO will do
+consecutive checkpoints. The VM state of Primary VM and Secondary VM is
+identical right after a VM checkpoint, but becomes different as the VM
+executes till the next checkpoint. To support disk contents checkpoint,
+the modified disk contents in the Secondary VM must be buffered, and are
+only dropped at next checkpoint time. To reduce the network transportation
+effort at the time of checkpoint, the disk modification operations of
+Primary disk are asynchronously forwarded to the Secondary node.
+
+== Workflow ==
+The following is the image of block replication workflow:
+
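+         +----------------------+            +------------------------+
+         |Primary Write Requests|            |Secondary Write Requests|
+         +----------------------+            +------------------------+
+                   |                                      |
+                   |                                     (4)
+                   |                                      V
+                   |                               /-------------\
+                   |      Copy and Forward         |             |
+                   |---------(1)----------+        | Disk Buffer |
+                   |                      |        |             |
+                   |                     (3)       \-------------/
+                   |                 speculative      ^
+                   |                write through    (2)
+                   |                      |           |
+                   V                      V           |
+            +--------------+           +----------------+
+            | Primary Disk |           | Secondary Disk |
+            +--------------+           +----------------+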
+
+1) Primary write requests will be copied and forwarded to Secondary
+   QEMU.
+2) Before Primary write requests are written to Secondary disk, the
+   original sector content will be read from Secondary disk and
+   buffered in the Disk buffer, but it will not overwrite the existing
+   sector content(it could be from either "Secondary Write Requests" or
+   previous COW of "Primary Write Requests") in the Disk buffer.
+3) Primary write requests will be written to Secondary disk.
+4) Secondary write requests will be buffered in the Disk buffer and it
+   will overwrite the existing sector content in the buffer.
+
+== Architecture ==
+We are going to implement block replication from many basic
+blocks that are already in QEMU.
+
+ virtio-blk   ||
+ ^||.--
+ |||| Secondary
+1 Quorum  ||'--
+ /  \ ||
+/\||
+   Primary2 filter
+ disk ^
 virtio-blk
+  |
  ^
+3 NBD  --->  3 NBD 
  |
+client|| server
  2 filter
+  ||^  
  ^
+. |||  
  |
+Primary | ||  Secondary disk <- hidden-disk 5 
<- active-disk 4
+' |||  backing^   backing
+  ||| |
+  ||| |
+  ||'-'
+  ||   drive-backup sync=none
+
+1) The disk on the pr

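The four workflow steps in the document from patch 13/18 can be modelled with
a small toy in C. Everything here is hypothetical and only illustrates the
buffering rules: ToyReplica and its functions are not QEMU code, a sector is a
single byte, and the read path (buffer first, then Secondary disk) is an
assumption that the text above does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_SECTORS 8

/* Hypothetical model of the Secondary side; one byte per sector. */
typedef struct ToyReplica {
    uint8_t secondary_disk[TOY_SECTORS];
    uint8_t buffer[TOY_SECTORS];  /* the "Disk Buffer" */
    bool buffered[TOY_SECTORS];   /* does the buffer hold this sector? */
} ToyReplica;

/* Steps 2 and 3: save the old content once, then write through. */
static void toy_primary_write(ToyReplica *r, int sector, uint8_t data)
{
    assert(sector >= 0 && sector < TOY_SECTORS);
    if (!r->buffered[sector]) {
        /* Step 2: never overwrite content already in the buffer. */
        r->buffer[sector] = r->secondary_disk[sector];
        r->buffered[sector] = true;
    }
    r->secondary_disk[sector] = data; /* step 3 */
}

/* Step 4: Secondary writes always overwrite the buffered content. */
static void toy_secondary_write(ToyReplica *r, int sector, uint8_t data)
{
    assert(sector >= 0 && sector < TOY_SECTORS);
    r->buffer[sector] = data;
    r->buffered[sector] = true;
}

/* Assumed read path: buffered content wins over the Secondary disk. */
static uint8_t toy_secondary_read(const ToyReplica *r, int sector)
{
    assert(sector >= 0 && sector < TOY_SECTORS);
    return r->buffered[sector] ? r->buffer[sector]
                               : r->secondary_disk[sector];
}

/* At a checkpoint the buffered content is dropped. */
static void toy_checkpoint(ToyReplica *r)
{
    for (int i = 0; i < TOY_SECTORS; i++) {
        r->buffered[i] = false;
    }
}
```

In this model the Secondary keeps seeing its own view of a sector between
checkpoints, while the Secondary disk already holds the Primary's data. After
a checkpoint both views coincide again.
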
[Qemu-devel] [PATCH COLO-BLOCK v8 06/18] introduce a new API to enable/disable attach device model

2015-07-07 Thread Wen Congyang

Signed-off-by: Wen Congyang 
---
 block/block-backend.c  | 24 
 include/sysemu/block-backend.h |  2 ++
 2 files changed, 26 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index aee8a12..72d8b2c 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -344,6 +344,30 @@ void *blk_get_attached_dev(BlockBackend *blk)
 }
 
 /*
+ * Disable attaching a device model to @blk.
+ * Return 0 on success, -EBUSY when a device model is attached already.
+ */
+int blk_disable_attach_dev(BlockBackend *blk)
+{
+if (blk->dev) {
+return blk->dev == (void *)-1 ? 0 : -EBUSY;
+}
+
+blk->dev = (void *)-1;
+return 0;
+}
+
+/*
+ * Enable attaching a device model to @blk.
+ */
+void blk_enable_attach_dev(BlockBackend *blk)
+{
+if (blk->dev == (void *)-1) {
+blk->dev = NULL;
+}
+}
+
+/*
  * Set @blk's device model callbacks to @ops.
  * @opaque is the opaque argument to pass to the callbacks.
  * This is for use by device models.
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 8fc960f..7619a9f 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -80,6 +80,8 @@ int blk_attach_dev(BlockBackend *blk, void *dev);
 void blk_attach_dev_nofail(BlockBackend *blk, void *dev);
 void blk_detach_dev(BlockBackend *blk, void *dev);
 void *blk_get_attached_dev(BlockBackend *blk);
+int blk_disable_attach_dev(BlockBackend *blk);
+void blk_enable_attach_dev(BlockBackend *blk);
 void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops, void *opaque);
 int blk_read(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
  int nb_sectors);
-- 
2.4.3
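
For readers following the series, here is a minimal standalone sketch of the
sentinel-pointer idea used by blk_disable_attach_dev() and
blk_enable_attach_dev() above. It is not QEMU code: the Backend struct, the
ATTACH_DISABLED macro and backend_attach() are hypothetical stand-ins, and the
attach helper only assumes that an occupied dev field makes attaching fail.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for BlockBackend: only the field the patch touches. */
typedef struct Backend {
    void *dev; /* NULL: free, ATTACH_DISABLED: blocked, else: attached device */
} Backend;

/* Sentinel that the patch spells as (void *)-1. */
#define ATTACH_DISABLED ((void *)-1)

/* Mirrors blk_disable_attach_dev(): idempotent, -EBUSY if a device is attached. */
int backend_disable_attach(Backend *b)
{
    if (b->dev) {
        return b->dev == ATTACH_DISABLED ? 0 : -EBUSY;
    }
    b->dev = ATTACH_DISABLED;
    return 0;
}

/* Mirrors blk_enable_attach_dev(): clears the sentinel, never a real device. */
void backend_enable_attach(Backend *b)
{
    if (b->dev == ATTACH_DISABLED) {
        b->dev = NULL;
    }
}

/* Hypothetical attach helper: fails while anything, sentinel included, is set. */
int backend_attach(Backend *b, void *dev)
{
    if (b->dev) {
        return -EBUSY;
    }
    b->dev = dev;
    return 0;
}
```

Reusing the dev field keeps the struct unchanged, at the cost that every
reader of the field has to know about the sentinel value.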

[Qemu-devel] [PATCH COLO-BLOCK v8 14/18] Add new block driver interfaces to control block replication

2015-07-07 Thread Wen Congyang

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Cc: Luiz Capitulino 
Cc: Michael Roth 
Reviewed-by: Paolo Bonzini 
---
 block.c   | 40 
 include/block/block.h |  5 +
 include/block/block_int.h | 14 ++
 qapi/block.json   | 16 
 4 files changed, 75 insertions(+)

diff --git a/block.c b/block.c
index f7192a3..5778064 100644
--- a/block.c
+++ b/block.c
@@ -4329,3 +4329,43 @@ void bdrv_del_child(BlockDriverState *bs, BlockDriverState *child_bs,
 
 bs->drv->bdrv_del_child(bs, child_bs, errp);
 }
+
+void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
+Error **errp)
+{
+BlockDriver *drv = bs->drv;
+
+if (drv && drv->bdrv_start_replication) {
+drv->bdrv_start_replication(bs, mode, errp);
+} else if (bs->file) {
+bdrv_start_replication(bs->file, mode, errp);
+} else {
+error_setg(errp, "this feature or command is not currently supported");
+}
+}
+
+void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+BlockDriver *drv = bs->drv;
+
+if (drv && drv->bdrv_do_checkpoint) {
+drv->bdrv_do_checkpoint(bs, errp);
+} else if (bs->file) {
+bdrv_do_checkpoint(bs->file, errp);
+} else {
+error_setg(errp, "this feature or command is not currently supported");
+}
+}
+
+void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp)
+{
+BlockDriver *drv = bs->drv;
+
+if (drv && drv->bdrv_stop_replication) {
+drv->bdrv_stop_replication(bs, failover, errp);
+} else if (bs->file) {
+bdrv_stop_replication(bs->file, failover, errp);
+} else {
+error_setg(errp, "this feature or command is not currently supported");
+}
+}
diff --git a/include/block/block.h b/include/block/block.h
index db52306..1518ae8 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -615,4 +615,9 @@ void bdrv_add_child(BlockDriverState *bs, QDict *options, Error **errp);
 void bdrv_del_child(BlockDriverState *bs, BlockDriverState *child,
 Error **errp);
 
+void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
+Error **errp);
+void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp);
+void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 6fddea4..296cba0 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -293,6 +293,20 @@ struct BlockDriver {
 void (*bdrv_del_child)(BlockDriverState *bs, BlockDriverState *child,
Error **errp);
 
+void (*bdrv_start_replication)(BlockDriverState *bs, ReplicationMode mode,
+   Error **errp);
+/* Drop Disk buffer when doing checkpoint. */
+void (*bdrv_do_checkpoint)(BlockDriverState *bs, Error **errp);
+/*
+ * After failover, we should flush Disk buffer into secondary disk
+ * and stop block replication.
+ *
+ * If the guest is shut down, we should drop Disk buffer and stop
+ * block replication.
+ */
+void (*bdrv_stop_replication)(BlockDriverState *bs, bool failover,
+  Error **errp);
+
 QLIST_ENTRY(BlockDriver) list;
 };
 
diff --git a/qapi/block.json b/qapi/block.json
index aad645c..04dc4c2 100644
--- a/qapi/block.json
+++ b/qapi/block.json
@@ -40,6 +40,22 @@
   'data': ['auto', 'none', 'lba', 'large', 'rechs']}
 
 ##
+# @ReplicationMode
+#
+# An enumeration of replication modes.
+#
+# @unprotected: Replication has not been started, or failover has happened.
+#
+# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
+#
+# @secondary: Secondary mode, receive the vm's state from primary QEMU.
+#
+# Since: 2.4
+##
+{ 'enum' : 'ReplicationMode',
+  'data' : ['unprotected', 'primary', 'secondary']}
+
+##
 # @BlockdevSnapshotInternal
 #
 # @device: the name of the device to generate the snapshot from
-- 
2.4.3
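
The three wrappers added to block.c share one dispatch rule: use the driver's
callback if it has one, otherwise retry on bs->file, otherwise report that the
feature is unsupported. A minimal standalone sketch of that rule follows. Node,
Driver and the int return code are hypothetical simplifications; the patch
itself reports failure through Error **errp.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node Node;

/* Hypothetical stand-in for BlockDriver: one optional callback. */
typedef struct Driver {
    int (*do_checkpoint)(Node *n);
} Driver;

/* Hypothetical stand-in for BlockDriverState. */
struct Node {
    const Driver *drv;
    Node *file;      /* next node down the chain, like bs->file */
    int checkpoints; /* how many checkpoints this node has handled */
};

/* Returns 0 on success, -1 when no node in the chain has the callback. */
int node_do_checkpoint(Node *n)
{
    if (n->drv && n->drv->do_checkpoint) {
        return n->drv->do_checkpoint(n);
    } else if (n->file) {
        return node_do_checkpoint(n->file);
    }
    return -1;
}

/* Example callback: only records that it ran. */
int counting_checkpoint(Node *n)
{
    n->checkpoints++;
    return 0;
}
```

Only the first node that implements the callback is invoked; nodes above it
are skipped and nodes below it are never reached.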

[Qemu-devel] [PATCH COLO-BLOCK v8 07/18] introduce a new API to check if blk is attached

2015-07-07 Thread Wen Congyang

Signed-off-by: Wen Congyang 
---
 block.c| 4 ++--
 block/block-backend.c  | 9 +
 include/sysemu/block-backend.h | 1 +
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 2cbc4f9..d560248 100644
--- a/block.c
+++ b/block.c
@@ -2032,7 +2032,7 @@ void bdrv_swap(BlockDriverState *bs_new, BlockDriverState *bs_old)
 }
 
 /* bs_new must be unattached and shouldn't have anything fancy enabled */
-assert(!bs_new->blk);
+assert(!blk_is_attached(bs_new->blk));
 assert(QLIST_EMPTY(&bs_new->dirty_bitmaps));
 assert(bs_new->job == NULL);
 assert(bs_new->io_limits_enabled == false);
@@ -2049,7 +2049,7 @@ void bdrv_swap(BlockDriverState *bs_new, BlockDriverState *bs_old)
 bdrv_move_feature_fields(bs_new, &tmp);
 
 /* bs_new must remain unattached */
-assert(!bs_new->blk);
+assert(!blk_is_attached(bs_new->blk));
 
 /* Check a few fields that should remain attached to the device */
 assert(bs_new->job == NULL);
diff --git a/block/block-backend.c b/block/block-backend.c
index 72d8b2c..1463c37 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -368,6 +368,15 @@ void blk_enable_attach_dev(BlockBackend *blk)
 }
 
 /*
+ * Return true if a device model is attached to @blk already,
+ * otherwise, return false.
+ */
+bool blk_is_attached(BlockBackend *blk)
+{
+return blk != NULL && blk->dev != NULL && blk->dev != (void *)-1;
+}
+
+/*
  * Set @blk's device model callbacks to @ops.
  * @opaque is the opaque argument to pass to the callbacks.
  * This is for use by device models.
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 7619a9f..a8c6fd2 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -82,6 +82,7 @@ void blk_detach_dev(BlockBackend *blk, void *dev);
 void *blk_get_attached_dev(BlockBackend *blk);
 int blk_disable_attach_dev(BlockBackend *blk);
 void blk_enable_attach_dev(BlockBackend *blk);
+bool blk_is_attached(BlockBackend *blk);
 void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops, void *opaque);
 int blk_read(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
  int nb_sectors);
-- 
2.4.3

Re: [Qemu-devel] Fwd: Using QCOW2 with nand flashes.

2015-07-07 Thread Peter Crosthwaite

On Mon, Jul 6, 2015 at 11:54 PM, sai pavan  wrote:
>
> Hi,
>
> I am trying to implement fake disk images for emulating nand flashes.
> I see that sparse files are formed when the content is zeros. But for nand
> flashes the content is initially all ones. It is difficult for me to make a
> sparse file with all ones.
>
> Does anyone have suggestions for this problem?
>
> I am thinking of creating a nand flash file with all zeros and negating the
> data at the receiving end in qemu.

Could this be a feature of qcow or some other file format rather than
a NAND specific thing? It probably applies to other flash media.

Regards,
Peter

> So the input file will be null, but the
> concept of all 1's will stay intact. But this will be confusing if someone
> wants to compare the output bin files after a write. One would have to negate
> the data when reading it.
>
> Regards,
> Sai Pavan
>
>
>
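
The inversion idea from the quoted mail can be sketched in a few lines. This
is only an illustration under stated assumptions, not QEMU code: nand_read()
and nand_program() are hypothetical helpers, and the backing buffer stands in
for the (sparse, zero-filled) image file. The one hardware fact relied on is
that programming NAND can only clear bits, so in the inverted store a program
operation can only set bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical read path: the backing store holds the inverted flash
 * contents, so an all-zero (sparse) file reads back as erased 0xFF. */
void nand_read(const uint8_t *backing, uint8_t *out, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        out[i] = (uint8_t)~backing[i];
    }
}

/* Hypothetical program path: flash contents become (old & data), which in
 * the inverted store is (old_backing | ~data). */
void nand_program(uint8_t *backing, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        backing[i] |= (uint8_t)~data[i];
    }
}
```

As the mail notes, the file on disk then holds negated data, so any external
tool comparing image files has to apply the same inversion.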

[Qemu-devel] [PATCH COLO-BLOCK v8 16/18] quorum: implement block driver interfaces for block replication

2015-07-07 Thread Wen Congyang

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Reviewed-by: Alberto Garcia 
---
 block/quorum.c | 77 ++
 1 file changed, 77 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index 2a45d0e..58238f7 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -88,6 +88,8 @@ typedef struct BDRVQuorumState {
 */
 
 QuorumReadPattern read_pattern;
+
+int replication_index; /* store which child supports block replication */
 } BDRVQuorumState;
 
 typedef struct QuorumAIOCB QuorumAIOCB;
@@ -1019,6 +1021,7 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
 }
 
 g_free(opened);
+s->replication_index = -1;
 goto exit;
 
 close_exit:
@@ -1179,6 +1182,76 @@ static void quorum_refresh_filename(BlockDriverState *bs)
 bs->full_open_options = opts;
 }
 
+static void quorum_start_replication(BlockDriverState *bs, ReplicationMode mode,
+ Error **errp)
+{
+BDRVQuorumState *s = bs->opaque;
+int count = 0, i, index;
+Error *local_err = NULL;
+
+/*
+ * TODO: support REPLICATION_MODE_SECONDARY if we allow secondary
+ * QEMU becoming primary QEMU.
+ */
+if (mode != REPLICATION_MODE_PRIMARY) {
+error_setg(errp, "The replication mode for quorum should be 'primary'");
+return;
+}
+
+if (s->read_pattern != QUORUM_READ_PATTERN_FIFO) {
+error_setg(errp, "Block replication needs read pattern 'fifo'");
+return;
+}
+
+for (i = 0; i < s->num_children; i++) {
+bdrv_start_replication(s->bs[i], mode, &local_err);
+if (local_err) {
+error_free(local_err);
+local_err = NULL;
+} else {
+count++;
+index = i;
+}
+}
+
+if (count == 0) {
+error_setg(errp, "No child supports block replication");
+} else if (count > 1) {
+for (i = 0; i < s->num_children; i++) {
+bdrv_stop_replication(s->bs[i], false, NULL);
+}
+error_setg(errp, "Too many children support block replication");
+} else {
+s->replication_index = index;
+}
+}
+
+static void quorum_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+BDRVQuorumState *s = bs->opaque;
+
+if (s->replication_index < 0) {
+error_setg(errp, "Block replication is not running");
+return;
+}
+
+bdrv_do_checkpoint(s->bs[s->replication_index], errp);
+}
+
+static void quorum_stop_replication(BlockDriverState *bs, bool failover,
+Error **errp)
+{
+BDRVQuorumState *s = bs->opaque;
+
+if (s->replication_index < 0) {
+error_setg(errp, "Block replication is not running");
+return;
+}
+
+bdrv_stop_replication(s->bs[s->replication_index], failover, errp);
+s->replication_index = -1;
+}
+
 static BlockDriver bdrv_quorum = {
 .format_name= "quorum",
 .protocol_name  = "quorum",
@@ -1205,6 +1278,10 @@ static BlockDriver bdrv_quorum = {
 
 .is_filter  = true,
 .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
+
+.bdrv_start_replication = quorum_start_replication,
+.bdrv_do_checkpoint = quorum_do_checkpoint,
+.bdrv_stop_replication  = quorum_stop_replication,
 };
 
 static void bdrv_quorum_init(void)
-- 
2.4.3
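
The selection logic in quorum_start_replication() boils down to "try every
child, accept the result only if exactly one succeeded, otherwise roll back".
A minimal standalone sketch follows. The Child struct, its supports flag
(standing in for bdrv_start_replication() succeeding) and the -1/-2 return
codes are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical child: 'supports' models whether starting replication works. */
typedef struct Child {
    bool supports;
    bool running;
} Child;

/* Returns the index of the single replicating child,
 * -1 if no child supports replication, -2 if more than one does. */
int start_on_single_child(Child *children, int n)
{
    int count = 0, index = -1;

    for (int i = 0; i < n; i++) {
        if (children[i].supports) {
            children[i].running = true;
            count++;
            index = i;
        }
    }
    if (count == 0) {
        return -1;
    }
    if (count > 1) {
        /* Roll back: stop every child, as the patch does with a NULL errp. */
        for (int i = 0; i < n; i++) {
            children[i].running = false;
        }
        return -2;
    }
    return index;
}
```

Initialising index to -1 in the sketch also avoids reading an unset variable,
which a compiler may warn about for the uninitialised index in the patch.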

[Qemu-devel] qemu build fails on xen

2015-07-07 Thread Michael S. Tsirkin

The following error triggers on Fedora 22:

In file included from /scm/qemu/include/hw/xen/xen_backend.h:4:0,
 from hw/block/xen_disk.c:39:
/scm/qemu/include/hw/xen/xen_common.h:198:18: error: conflicting types for 
‘ioservid_t’
 typedef uint32_t ioservid_t;
  ^
In file included from /usr/include/xen/hvm/params.h:24:0,
 from /usr/include/xenctrl.h:46,
 from /scm/qemu/include/hw/xen/xen_common.h:9,
 from /scm/qemu/include/hw/xen/xen_backend.h:4,
 from hw/block/xen_disk.c:39:
/usr/include/xen/hvm/hvm_op.h:255:18: note: previous declaration of 
‘ioservid_t’ was here
 typedef uint16_t ioservid_t;
  ^
/scm/qemu/rules.mak:57: recipe for target 'hw/block/xen_disk.o' failed
make: *** [hw/block/xen_disk.o] Error 1
make: *** Waiting for unfinished jobs

Reverting 3996e85c1822e05c50250f8d2d1e57b6bea1229d
Author: Paul Durrant 
Date:   Tue Jan 20 11:06:19 2015 +

Xen: Use the ioreq-server API when available


Looking at that header:

#ifndef HVM_PARAM_BUFIOREQ_EVTCHN
#define HVM_PARAM_BUFIOREQ_EVTCHN 26
#endif

#define IOREQ_TYPE_PCI_CONFIG 2


typedef uint32_t ioservid_t;


All of these pollute the global namespace, not to mention violate the coding
style. Why not prefix them with Xen_, xen_, etc.?


-- 
MST

[Qemu-devel] [PATCH COLO-BLOCK v8 08/18] block: make bdrv_put_ref_bh_schedule() as a public API

2015-07-07 Thread Wen Congyang

Signed-off-by: Wen Congyang 
---
 block.c   | 25 +
 blockdev.c| 37 ++---
 include/block/block.h |  1 +
 3 files changed, 32 insertions(+), 31 deletions(-)

diff --git a/block.c b/block.c
index d560248..f319921 100644
--- a/block.c
+++ b/block.c
@@ -3562,6 +3562,31 @@ void bdrv_unref(BlockDriverState *bs)
 }
 }
 
+typedef struct {
+QEMUBH *bh;
+BlockDriverState *bs;
+} BDRVPutRefBH;
+
+static void bdrv_put_ref_bh(void *opaque)
+{
+BDRVPutRefBH *s = opaque;
+
+bdrv_unref(s->bs);
+qemu_bh_delete(s->bh);
+g_free(s);
+}
+
+/* Release a BDS reference in a BH */
+void bdrv_put_ref_bh_schedule(BlockDriverState *bs)
+{
+BDRVPutRefBH *s;
+
+s = g_new(BDRVPutRefBH, 1);
+s->bh = qemu_bh_new(bdrv_put_ref_bh, s);
+s->bs = bs;
+qemu_bh_schedule(s->bh);
+}
+
 struct BdrvOpBlocker {
 Error *reason;
 QLIST_ENTRY(BdrvOpBlocker) list;
diff --git a/blockdev.c b/blockdev.c
index e5f7779..7ad8401 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -276,37 +276,6 @@ static void bdrv_format_print(void *opaque, const char *name)
 error_printf(" %s", name);
 }
 
-typedef struct {
-QEMUBH *bh;
-BlockDriverState *bs;
-} BDRVPutRefBH;
-
-static void bdrv_put_ref_bh(void *opaque)
-{
-BDRVPutRefBH *s = opaque;
-
-bdrv_unref(s->bs);
-qemu_bh_delete(s->bh);
-g_free(s);
-}
-
-/*
- * Release a BDS reference in a BH
- *
- * It is not safe to use bdrv_unref() from a callback function when the callers
- * still need the BlockDriverState.  In such cases we schedule a BH to release
- * the reference.
- */
-static void bdrv_put_ref_bh_schedule(BlockDriverState *bs)
-{
-BDRVPutRefBH *s;
-
-s = g_new(BDRVPutRefBH, 1);
-s->bh = qemu_bh_new(bdrv_put_ref_bh, s);
-s->bs = bs;
-qemu_bh_schedule(s->bh);
-}
-
 static int parse_block_error_action(const char *buf, bool is_read, Error **errp)
 {
 if (!strcmp(buf, "ignore")) {
@@ -2326,6 +2295,12 @@ static void block_job_cb(void *opaque, int ret)
 block_job_event_completed(bs->job, msg);
 }
 
+
+/*
+ * It is not safe to use bdrv_unref() from a callback function when the
+ * callers still need the BlockDriverState. In such cases we schedule
+ * a BH to release the reference.
+ */
 bdrv_put_ref_bh_schedule(bs);
 }
 
diff --git a/include/block/block.h b/include/block/block.h
index 29d3363..cbe79bc 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -507,6 +507,7 @@ void bdrv_disable_copy_on_read(BlockDriverState *bs);
 
 void bdrv_ref(BlockDriverState *bs);
 void bdrv_unref(BlockDriverState *bs);
+void bdrv_put_ref_bh_schedule(BlockDriverState *bs);
 
 bool bdrv_op_is_blocked(BlockDriverState *bs, BlockOpType op, Error **errp);
 void bdrv_op_block(BlockDriverState *bs, BlockOpType op, Error *reason);
-- 
2.4.3
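
The reason for releasing the reference from a bottom half is that the caller
may still be using the object when the callback returns. A minimal standalone
sketch of the pattern follows. It does not use QEMU's QEMUBH API: Obj,
Deferred, the pending list and run_deferred() are hypothetical stand-ins for
the BDS, the BH and the main loop.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for BlockDriverState. */
typedef struct Obj {
    int refcnt;
} Obj;

/* Hypothetical one-shot deferred call standing in for a bottom half. */
typedef struct Deferred {
    void (*fn)(void *opaque);
    void *opaque;
    struct Deferred *next;
} Deferred;

Deferred *pending; /* list of scheduled calls, empty at start */

void obj_unref(Obj *o)
{
    o->refcnt--;
}

void put_ref_cb(void *opaque)
{
    obj_unref(opaque);
}

/* Schedule the unref instead of performing it immediately. */
void obj_put_ref_schedule(Obj *o)
{
    Deferred *d = malloc(sizeof(*d));

    if (!d) {
        abort();
    }
    d->fn = put_ref_cb;
    d->opaque = o;
    d->next = pending;
    pending = d;
}

/* Run and free all scheduled calls; a main loop would do this later. */
void run_deferred(void)
{
    while (pending) {
        Deferred *d = pending;

        pending = d->next;
        d->fn(d->opaque);
        free(d);
    }
}
```

The reference count is untouched until run_deferred() executes, which is the
property the comment moved into block_job_cb() relies on.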

[Qemu-devel] [PATCH COLO-BLOCK v8 09/18] Backup: clear all bitmap when doing block checkpoint

2015-07-07 Thread Wen Congyang

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Cc: Jeff Cody 
---
 block/backup.c   | 13 +
 blockjob.c   | 10 ++
 include/block/blockjob.h | 12 
 3 files changed, 35 insertions(+)

diff --git a/block/backup.c b/block/backup.c
index d3c7d9f..ebb8a88 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -211,11 +211,24 @@ static void backup_iostatus_reset(BlockJob *job)
 bdrv_iostatus_reset(s->target);
 }
 
+static void backup_do_checkpoint(BlockJob *job, Error **errp)
+{
+BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
+
+if (backup_job->sync_mode != MIRROR_SYNC_MODE_NONE) {
+error_setg(errp, "this feature or command is not currently supported");
+return;
+}
+
+hbitmap_reset_all(backup_job->bitmap);
+}
+
 static const BlockJobDriver backup_job_driver = {
 .instance_size  = sizeof(BackupBlockJob),
 .job_type   = BLOCK_JOB_TYPE_BACKUP,
 .set_speed  = backup_set_speed,
 .iostatus_reset = backup_iostatus_reset,
+.do_checkpoint  = backup_do_checkpoint,
 };
 
 static BlockErrorAction backup_error_action(BackupBlockJob *job,
diff --git a/blockjob.c b/blockjob.c
index ec46fad..cb412d1 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -400,3 +400,13 @@ void block_job_defer_to_main_loop(BlockJob *job,
 
 qemu_bh_schedule(data->bh);
 }
+
+void block_job_do_checkpoint(BlockJob *job, Error **errp)
+{
+if (!job->driver->do_checkpoint) {
+error_setg(errp, "this feature or command is not currently supported");
+return;
+}
+
+job->driver->do_checkpoint(job, errp);
+}
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 57d8ef1..b832dc3 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -50,6 +50,9 @@ typedef struct BlockJobDriver {
  * manually.
  */
 void (*complete)(BlockJob *job, Error **errp);
+
+/** Optional callback for job types that support checkpoint. */
+void (*do_checkpoint)(BlockJob *job, Error **errp);
 } BlockJobDriver;
 
 /**
@@ -348,4 +351,13 @@ void block_job_defer_to_main_loop(BlockJob *job,
   BlockJobDeferToMainLoopFn *fn,
   void *opaque);
 
+/**
+ * block_job_do_checkpoint:
+ * @job: The job.
+ * @errp: Error object.
+ *
+ * Do block checkpoint on the specified job.
+ */
+void block_job_do_checkpoint(BlockJob *job, Error **errp);
+
 #endif
-- 
2.4.3

[Qemu-devel] [PATCH COLO-BLOCK v8 03/18] hmp: add monitor command to add/remove a child

2015-07-07 Thread Wen Congyang

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
---
 blockdev.c| 53 +++
 hmp-commands.hx   | 28 +
 include/sysemu/blockdev.h |  2 ++
 3 files changed, 83 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index c11611d..e5f7779 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2186,6 +2186,59 @@ void hmp_drive_del(Monitor *mon, const QDict *qdict)
 aio_context_release(aio_context);
 }
 
+void hmp_child_add(Monitor *mon, const QDict *qdict)
+{
+const char *id = qdict_get_str(qdict, "id");
+const char *optstr = qdict_get_str(qdict, "opts");
+QemuOpts *opts;
+QDict *bs_opts = qdict_new();
+BlockDriverState *bs;
+Error *local_err = NULL;
+
+opts = drive_def(optstr);
+if (!opts) {
+/* We have reported error in drive_def */
+return;
+}
+bs_opts = qemu_opts_to_qdict(opts, bs_opts);
+
+bs = bdrv_lookup_bs(id, id, &local_err);
+if (!bs) {
+error_report_err(local_err);
+return;
+}
+
+bdrv_add_child(bs, bs_opts, &local_err);
+if (local_err) {
+error_report_err(local_err);
+}
+}
+
+void hmp_child_del(Monitor *mon, const QDict *qdict)
+{
+const char *id = qdict_get_str(qdict, "id");
+const char *child_id = qdict_get_str(qdict, "child");
+BlockDriverState *bs, *child_bs;
+Error *local_err = NULL;
+
+bs = bdrv_lookup_bs(id, id, &local_err);
+if (!bs) {
+error_report_err(local_err);
+return;
+}
+
+child_bs = bdrv_lookup_bs(child_id, child_id, &local_err);
+if (!child_bs) {
+error_report_err(local_err);
+return;
+}
+
+bdrv_del_child(bs, child_bs, &local_err);
+if (local_err) {
+error_report_err(local_err);
+}
+}
+
 void qmp_block_resize(bool has_device, const char *device,
   bool has_node_name, const char *node_name,
   int64_t size, Error **errp)
diff --git a/hmp-commands.hx b/hmp-commands.hx
index d3b7932..1d5b392 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -193,6 +193,34 @@ actions (drive options rerror, werror).
 ETEXI
 
 {
+.name   = "child_add",
+.args_type  = "id:B,opts:s",
+.params = "device child.file=file",
+.help   = "add a child to a BDS",
+.mhandler.cmd = hmp_child_add,
+},
+
+STEXI
+@item child_add @var{device} @var{options}
+@findex child_add
+Add a child to the block device.
+ETEXI
+
+{
+.name   = "child_del",
+.args_type  = "id:B,child:B",
+.params = "parent child",
+.help   = "remove a child from a BDS",
+.mhandler.cmd = hmp_child_del,
+},
+
+STEXI
+@item child_del @var{parent device} @var{child device}
+@findex child_del
+Remove a child from the parent device.
+ETEXI
+
+{
 .name   = "change",
 .args_type  = "device:B,target:F,arg:s?",
 .params = "device filename [format]",
diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h
index 3104150..594bfab 100644
--- a/include/sysemu/blockdev.h
+++ b/include/sysemu/blockdev.h
@@ -67,4 +67,6 @@ void qmp_change_blockdev(const char *device, const char *filename,
  const char *format, Error **errp);
 void hmp_commit(Monitor *mon, const QDict *qdict);
 void hmp_drive_del(Monitor *mon, const QDict *qdict);
+void hmp_child_add(Monitor *mon, const QDict *qdict);
+void hmp_child_del(Monitor *mon, const QDict *qdict);
 #endif
-- 
2.4.3

Re: [Qemu-devel] qemu build fails on xen

2015-07-07 Thread Michael S. Tsirkin

On Tue, Jul 07, 2015 at 11:45:29AM +0300, Michael S. Tsirkin wrote:
> The following error triggers on Fedora 22:
> 
> In file included from /scm/qemu/include/hw/xen/xen_backend.h:4:0,
>  from hw/block/xen_disk.c:39:
> /scm/qemu/include/hw/xen/xen_common.h:198:18: error: conflicting types for 
> ‘ioservid_t’
>  typedef uint32_t ioservid_t;
>   ^
> In file included from /usr/include/xen/hvm/params.h:24:0,
>  from /usr/include/xenctrl.h:46,
>  from /scm/qemu/include/hw/xen/xen_common.h:9,
>  from /scm/qemu/include/hw/xen/xen_backend.h:4,
>  from hw/block/xen_disk.c:39:
> /usr/include/xen/hvm/hvm_op.h:255:18: note: previous declaration of 
> ‘ioservid_t’ was here
>  typedef uint16_t ioservid_t;
>   ^
> /scm/qemu/rules.mak:57: recipe for target 'hw/block/xen_disk.o' failed
> make: *** [hw/block/xen_disk.o] Error 1
> make: *** Waiting for unfinished jobs
> 
> Reverting 3996e85c1822e05c50250f8d2d1e57b6bea1229d

Sorry - I meant reverting this commit fixes the problem.



> Author: Paul Durrant 
> Date:   Tue Jan 20 11:06:19 2015 +
> 
> Xen: Use the ioreq-server API when available
> 
> 
> Looking at that header:
> 
> #ifndef HVM_PARAM_BUFIOREQ_EVTCHN
> #define HVM_PARAM_BUFIOREQ_EVTCHN 26
> #endif
> 
> #define IOREQ_TYPE_PCI_CONFIG 2
> 
> 
> typedef uint32_t ioservid_t;
> 
> 
> All of these pollute the global namespace, not to mention violate the coding
> style. Why not prefix them with Xen_, xen_, etc.?
> 
> 
> -- 
> MST

Re: [Qemu-devel] [PATCH 2/2] ahci: fix signature generation

2015-07-07 Thread Stefan Hajnoczi

On Mon, Jul 06, 2015 at 05:49:52PM -0400, John Snow wrote:
> The initial register device-to-host FIS no longer needs to specially
> set certain fields, as these can be handled generically by setting those
> fields explicitly with the signatures we want at port reset time.
> 
> (1) Signatures are decomposed into their four component registers and
> set upon (AHCI) port reset.
> (2) the signature cache register is no longer set manually per-each
> device type, but instead just once during ahci_init_d2h.
> 
> Signed-off-by: John Snow 
> ---
>  hw/ide/ahci.c | 33 -
>  1 file changed, 20 insertions(+), 13 deletions(-)

I see two code paths that call ahci_init_d2h().  Either
ahci_reset_port() does it (if a block device is attached) or it's called
when the guest writes to the PORT_CMD register.

I'm not sure the latter works.  The signature doesn't seem to be set
anywhere.

Any ideas?

> diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
> index bb6a92f..f352dd7 100644
> --- a/hw/ide/ahci.c
> +++ b/hw/ide/ahci.c
> @@ -537,20 +537,31 @@ static void ahci_init_d2h(AHCIDevice *ad)
>  {
>  uint8_t init_fis[20];
>  IDEState *ide_state = &ad->port.ifs[0];
> +AHCIPortRegs *pr = &ad->port_regs;
>  
>  memset(init_fis, 0, sizeof(init_fis));
>  
> -init_fis[4] = 1;
> -init_fis[12] = 1;
> -
> -if (ide_state->drive_kind == IDE_CD) {
> -init_fis[5] = ide_state->lcyl;
> -init_fis[6] = ide_state->hcyl;
> -}
> +/* We're emulating receiving the first Reg H2D Fis from the device;
> + * Update the SIG register, but otherwise proceed as normal. */
> +pr->sig = (ide_state->hcyl << 24) |
> +(ide_state->lcyl << 16) |
> +(ide_state->sector << 8) |
> +(ide_state->nsector & 0xFF);
>  
>  ahci_write_fis_d2h(ad, init_fis);
>  }
>  
> +static void ahci_set_signature(AHCIDevice *ad, uint32_t sig)
> +{
> +IDEState *s = &ad->port.ifs[0];
> +s->hcyl = sig >> 24 & 0xFF;
> +s->lcyl = sig >> 16 & 0xFF;
> +s->sector = sig >> 8 & 0xFF;
> +s->nsector = sig & 0xFF;
> +
> +DPRINTF(ad->port_no, "set hcyl:lcyl:sect:nsect = 0x%08x\n", sig);
> +}
> +
>  static void ahci_reset_port(AHCIState *s, int port)
>  {
>  AHCIDevice *d = &s->dev[port];
> @@ -600,16 +611,12 @@ static void ahci_reset_port(AHCIState *s, int port)
>  
>  s->dev[port].port_state = STATE_RUN;
>  if (!ide_state->blk) {
> -pr->sig = 0;
>  ide_state->status = SEEK_STAT | WRERR_STAT;
>  } else if (ide_state->drive_kind == IDE_CD) {
> -pr->sig = SATA_SIGNATURE_CDROM;
> -ide_state->lcyl = 0x14;
> -ide_state->hcyl = 0xeb;
> -DPRINTF(port, "set lcyl = %d\n", ide_state->lcyl);
> +ahci_set_signature(d, SATA_SIGNATURE_CDROM);
>  ide_state->status = SEEK_STAT | WRERR_STAT | READY_STAT;
>  } else {
> -pr->sig = SATA_SIGNATURE_DISK;
> +ahci_set_signature(d, SATA_SIGNATURE_DISK);
>  ide_state->status = SEEK_STAT | WRERR_STAT;
>  }
>  
> -- 
> 2.1.0
> 
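
To make the signature round trip in the quoted patch concrete, here is a
minimal standalone sketch. TaskFile and both helpers are hypothetical stand-ins
for IDEState and the patch's functions. The upper two bytes of the test value
(0xeb, 0x14) come from the removed hcyl/lcyl assignments for CD-ROMs; the
lower two bytes (0x01, 0x01) are an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the four IDEState registers the patch uses. */
typedef struct TaskFile {
    uint8_t hcyl, lcyl, sector, nsector;
} TaskFile;

/* Split a 32-bit signature into its registers, like ahci_set_signature(). */
void taskfile_set_signature(TaskFile *tf, uint32_t sig)
{
    tf->hcyl = (sig >> 24) & 0xFF;
    tf->lcyl = (sig >> 16) & 0xFF;
    tf->sector = (sig >> 8) & 0xFF;
    tf->nsector = sig & 0xFF;
}

/* Rebuild the signature. Widening to uint32_t first keeps the shift by 24
 * out of signed int arithmetic. */
uint32_t taskfile_get_signature(const TaskFile *tf)
{
    return ((uint32_t)tf->hcyl << 24) |
           ((uint32_t)tf->lcyl << 16) |
           ((uint32_t)tf->sector << 8) |
           (uint32_t)tf->nsector;
}
```

The explicit casts are a choice made in this sketch; the quoted patch shifts
the uint8_t fields directly, which promotes them to int before the shift.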



[Qemu-devel] [PATCH COLO-BLOCK v8 12/18] block: Allow references for backing files

2015-07-07 Thread Wen Congyang

Usage:
-drive file=xxx,id=Y, \
-drive file=,id=X,backing.backing_reference=Y

This creates the following backing chain:
           {virtio-blk dev 'Y'}          {virtio-blk dev 'X'}
                     |                             |
                     |                             |
                     v                             v

[base] <- [mid] <- ( Y )  <--------------------- ( X )

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
---
 block.c   | 39 +++
 include/block/block.h |  1 +
 2 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index 879ca75..f7192a3 100644
--- a/block.c
+++ b/block.c
@@ -1143,6 +1143,7 @@ out:
 }
 
 #define ALLOW_WRITE_BACKING_FILE"allow-write-backing-file"
+#define BACKING_REFERENCE   "backing_reference"
 static QemuOptsList backing_file_opts = {
 .name = "backing_file",
 .head = QTAILQ_HEAD_INITIALIZER(backing_file_opts.head),
@@ -1152,6 +1153,11 @@ static QemuOptsList backing_file_opts = {
 .type = QEMU_OPT_BOOL,
 .help = "allow write to backing file",
 },
+{
+.name = BACKING_REFERENCE,
+.type = QEMU_OPT_STRING,
+.help = "reference to the existing BDS",
+},
 { /* end of list */ }
 },
 };
@@ -1168,11 +1174,12 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
 {
 char *backing_filename = g_malloc0(PATH_MAX);
 int ret = 0;
-BlockDriverState *backing_hd;
+BlockDriverState *backing_hd = NULL;
 Error *local_err = NULL;
 QemuOpts *opts = NULL;
 bool child_rw = false;
 const BdrvChildRole *child_role = NULL;
+const char *reference = NULL;
 
 if (bs->backing_hd != NULL) {
 QDECREF(options);
@@ -1195,9 +1202,10 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
 goto free_exit;
 }
 child_rw = qemu_opt_get_bool(opts, ALLOW_WRITE_BACKING_FILE, false);
+reference = qemu_opt_get(opts, BACKING_REFERENCE);
 child_role = child_rw ? &child_backing_rw : &child_backing;
 
-if (qdict_haskey(options, "file.filename")) {
+if (qdict_haskey(options, "file.filename") || reference) {
 backing_filename[0] = '\0';
 } else if (bs->backing_file[0] == '\0' && qdict_size(options) == 0) {
 QDECREF(options);
@@ -1220,7 +1228,9 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
 goto free_exit;
 }
 
-backing_hd = bdrv_new();
+if (!reference) {
+backing_hd = bdrv_new();
+}
 
 if (bs->backing_format[0] != '\0' && !qdict_haskey(options, "driver")) {
 qdict_put(options, "driver", qstring_from_str(bs->backing_format));
@@ -1229,7 +1239,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
 assert(bs->backing_hd == NULL);
 ret = bdrv_open_inherit(&backing_hd,
 *backing_filename ? backing_filename : NULL,
-NULL, options, 0, bs, child_role,
+reference, options, 0, bs, child_role,
 NULL, &local_err);
 if (ret < 0) {
 bdrv_unref(backing_hd);
@@ -1240,12 +1250,30 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
 error_free(local_err);
 goto free_exit;
 }
+if (reference) {
+if (bdrv_op_is_blocked(backing_hd, BLOCK_OP_TYPE_BACKING_REFERENCE,
+   errp)) {
+ret = -EBUSY;
+goto free_reference_exit;
+}
+if (backing_hd->blk && blk_disable_attach_dev(backing_hd->blk)) {
+error_setg(errp, "backing_hd %s is used by the other device model",
+   reference);
+ret = -EBUSY;
+goto free_reference_exit;
+}
+}
 bdrv_set_backing_hd(bs, backing_hd);
 
 free_exit:
 qemu_opts_del(opts);
 g_free(backing_filename);
 return ret;
+
+free_reference_exit:
+bdrv_unref(backing_hd);
+bs->open_flags |= BDRV_O_NO_BACKING;
+goto free_exit;
 }
 
 /*
@@ -1899,6 +1927,9 @@ void bdrv_close(BlockDriverState *bs)
 if (bs->backing_hd) {
 BlockDriverState *backing_hd = bs->backing_hd;
 bdrv_set_backing_hd(bs, NULL);
+if (backing_hd->blk) {
+blk_enable_attach_dev(backing_hd->blk);
+}
 bdrv_unref(backing_hd);
 }
 bs->drv->bdrv_close(bs);
diff --git a/include/block/block.h b/include/block/block.h
index cbe79bc..db52306 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -168,6 +168,7 @@ typedef enum BlockOpType {
 BLOCK_OP_TYPE_RESIZE,
 BLOCK_OP_TYPE_STREAM,
 BLOCK_OP_TYPE_REPLACE,
+BLOCK_OP_TYPE_BACKING_REFERENCE,
 BLOCK_OP_TYPE_MAX,
 } BlockOpType;
 
-- 
2.4.3

[Qemu-devel] [PATCH COLO-BLOCK v8 15/18] skip nbd_target when starting block replication

2015-07-07 Thread Wen Congyang

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
---
 block.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/block.c b/block.c
index 5778064..0a6691e 100644
--- a/block.c
+++ b/block.c
@@ -4335,6 +4335,10 @@ void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
 {
 BlockDriver *drv = bs->drv;
 
+if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, NULL)) {
+return;
+}
+
 if (drv && drv->bdrv_start_replication) {
 drv->bdrv_start_replication(bs, mode, errp);
 } else if (bs->file) {
@@ -4348,6 +4352,10 @@ void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp)
 {
 BlockDriver *drv = bs->drv;
 
+if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, NULL)) {
+return;
+}
+
 if (drv && drv->bdrv_do_checkpoint) {
 drv->bdrv_do_checkpoint(bs, errp);
 } else if (bs->file) {
@@ -4361,6 +4369,10 @@ void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp)
 {
 BlockDriver *drv = bs->drv;
 
+if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, NULL)) {
+return;
+}
+
 if (drv && drv->bdrv_stop_replication) {
 drv->bdrv_stop_replication(bs, failover, errp);
 } else if (bs->file) {
-- 
2.4.3

[Qemu-devel] [PATCH COLO-BLOCK v8 17/18] Implement new driver for block replication

2015-07-07 Thread Wen Congyang

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
---
 block/Makefile.objs |   1 +
 block/replication.c | 443 
 2 files changed, 444 insertions(+)
 create mode 100644 block/replication.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index f068666..84952b1 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -23,6 +23,7 @@ block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o
 block-obj-y += write-threshold.o
 block-obj-y += backup.o
+block-obj-y += replication.o
 
 common-obj-y += stream.o
 common-obj-y += commit.o
diff --git a/block/replication.c b/block/replication.c
new file mode 100644
index 000..2124e2d
--- /dev/null
+++ b/block/replication.c
@@ -0,0 +1,443 @@
+#include "qemu-common.h"
+#include "block/block_int.h"
+#include "block/blockjob.h"
+#include "block/nbd.h"
+
+typedef struct BDRVReplicationState {
+ReplicationMode mode;
+int replication_state;
+BlockDriverState *active_disk;
+BlockDriverState *hidden_disk;
+BlockDriverState *secondary_disk; /* nbd target */
+int error;
+} BDRVReplicationState;
+
+enum {
+BLOCK_REPLICATION_NONE, /* block replication is not started */
+BLOCK_REPLICATION_RUNNING,  /* block replication is running */
+BLOCK_REPLICATION_DONE, /* block replication is done(failover) */
+};
+
+#define COMMIT_CLUSTER_BITS 16
+#define COMMIT_CLUSTER_SIZE (1 << COMMIT_CLUSTER_BITS)
+#define COMMIT_SECTORS_PER_CLUSTER (COMMIT_CLUSTER_SIZE / BDRV_SECTOR_SIZE)
+
+static void replication_stop(BlockDriverState *bs, bool failover, Error **errp);
+
+#define REPLICATION_MODE "mode"
+static QemuOptsList replication_runtime_opts = {
+.name = "replication",
+.head = QTAILQ_HEAD_INITIALIZER(replication_runtime_opts.head),
+.desc = {
+{
+.name = REPLICATION_MODE,
+.type = QEMU_OPT_STRING,
+},
+{ /* end of list */ }
+},
+};
+
+static int replication_open(BlockDriverState *bs, QDict *options,
+int flags, Error **errp)
+{
+int ret;
+BDRVReplicationState *s = bs->opaque;
+Error *local_err = NULL;
+QemuOpts *opts = NULL;
+const char *mode;
+
+ret = -EINVAL;
+opts = qemu_opts_create(&replication_runtime_opts, NULL, 0, &error_abort);
+qemu_opts_absorb_qdict(opts, options, &local_err);
+if (local_err) {
+goto fail;
+}
+
+mode = qemu_opt_get(opts, REPLICATION_MODE);
+if (!mode) {
+error_setg(&local_err, "Missing the option mode");
+goto fail;
+}
+
+if (!strcmp(mode, "primary")) {
+s->mode = REPLICATION_MODE_PRIMARY;
+} else if (!strcmp(mode, "secondary")) {
+s->mode = REPLICATION_MODE_SECONDARY;
+} else {
+error_setg(&local_err,
+   "The option mode's value should be primary or secondary");
+goto fail;
+}
+
+return 0;
+
+fail:
+qemu_opts_del(opts);
+/* propagate error */
+if (local_err) {
+error_propagate(errp, local_err);
+}
+return ret;
+}
+
+static void replication_close(BlockDriverState *bs)
+{
+BDRVReplicationState *s = bs->opaque;
+
+if (s->replication_state == BLOCK_REPLICATION_RUNNING) {
+replication_stop(bs, false, NULL);
+}
+}
+
+static int64_t replication_getlength(BlockDriverState *bs)
+{
+return bdrv_getlength(bs->file);
+}
+
+static int replication_get_io_status(BDRVReplicationState *s)
+{
+switch (s->replication_state) {
+case BLOCK_REPLICATION_NONE:
+return -EIO;
+case BLOCK_REPLICATION_RUNNING:
+return 0;
+case BLOCK_REPLICATION_DONE:
+return s->mode == REPLICATION_MODE_PRIMARY ? -EIO : 1;
+default:
+abort();
+}
+}
+
+static int replication_return_value(BDRVReplicationState *s, int ret)
+{
+if (s->mode == REPLICATION_MODE_SECONDARY) {
+return ret;
+}
+
+if (ret < 0) {
+s->error = ret;
+ret = 0;
+}
+
+return ret;
+}
+
+static coroutine_fn int replication_co_readv(BlockDriverState *bs,
+ int64_t sector_num,
+ int remaining_sectors,
+ QEMUIOVector *qiov)
+{
+BDRVReplicationState *s = bs->opaque;
+int ret;
+
+if (s->mode == REPLICATION_MODE_PRIMARY) {
+/* We only use it to forward primary write requests */
+return -EIO;
+}
+
+ret = replication_get_io_status(s);
+if (ret < 0) {
+return ret;
+}
+
+/*
+ * After failover, because we don't commit active disk/hidden disk
+ * to secondary disk(nbd target), so we should read from active disk
+ * directly.
+ */
+ret = bdrv_co_readv(bs->file, sector_num, remaining_sectors, qiov);
+return replication_return_value(s, ret);
+}
+
+static coroutine_fn int replication_co_writev(BlockDriverState *bs,
+

Re: [Qemu-devel] [PATCH v2] net: Flush queued packets when guest resumes

2015-07-07 Thread Stefan Hajnoczi

On Tue, Jul 07, 2015 at 09:21:07AM +0800, Fam Zheng wrote:
> Since commit 6e99c63 "net/socket: Drop net_socket_can_send" and friends,
> net queues need to be explicitly flushed after qemu_can_send_packet()
> returns false, because the netdev side will disable the polling of fd.
> 
> This fixes the case of "cont" after "stop" (or migration).
> 
> Signed-off-by: Fam Zheng 
> 
> ---
> 
> v2: Unify with VM stop handler. (Stefan)

Thanks!  I'm happy with this but I'll wait for you to respond to Jason's
comment.

> ---
>  net/net.c | 19 ---
>  1 file changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/net/net.c b/net/net.c
> index 6ff7fec..28a5597 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -1257,14 +1257,19 @@ void qmp_set_link(const char *name, bool up, Error 
> **errp)
>  static void net_vm_change_state_handler(void *opaque, int running,
>  RunState state)
>  {
> -/* Complete all queued packets, to guarantee we don't modify
> - * state later when VM is not running.
> - */
> -if (!running) {
> -NetClientState *nc;
> -NetClientState *tmp;
> +NetClientState *nc;
> +NetClientState *tmp;
>  
> -QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> +QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> +if (running) {
> +/* Flush queued packets and wake up backends. */
> +if (nc->peer && qemu_can_send_packet(nc)) {
> +qemu_flush_queued_packets(nc->peer);
> +}
> +} else {
> +/* Complete all queued packets, to guarantee we don't modify
> + * state later when VM is not running.
> + */
>  qemu_flush_or_purge_queued_packets(nc, true);
>  }
>  }
> -- 
> 2.4.3
> 
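The stall described in the commit message can be illustrated with a small model. This is only a sketch under assumptions: ToyNetQueue and every toy_* name are hypothetical and are not QEMU code. The model only captures the idea that a backend stops polling its fd once a packet is refused, and stays stopped until somebody flushes the queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not QEMU code: a backend that stops polling its
 * file descriptor once the receiver refuses a packet. */
typedef struct ToyNetQueue {
    bool vm_running;     /* receiver accepts packets only while running */
    bool poll_enabled;   /* backend reads its fd only while this is true */
    size_t queued;       /* packets parked in the queue */
    size_t delivered;    /* packets handed to the receiver */
} ToyNetQueue;

static bool toy_can_send(const ToyNetQueue *q)
{
    return q->vm_running;
}

/* Backend side: one packet arrives on the fd. */
static void toy_backend_packet(ToyNetQueue *q)
{
    if (!q->poll_enabled) {
        return;                    /* fd is not polled, nothing is read */
    }
    if (toy_can_send(q)) {
        q->delivered++;
    } else {
        q->queued++;
        q->poll_enabled = false;   /* wait for an explicit flush */
    }
}

/* Receiver side: deliver what was parked and wake the backend up. */
static void toy_flush(ToyNetQueue *q)
{
    assert(toy_can_send(q));
    q->delivered += q->queued;
    q->queued = 0;
    q->poll_enabled = true;
}

/* State change handler; flush_on_resume selects the patched behaviour. */
static void toy_vm_state_change(ToyNetQueue *q, bool running,
                                bool flush_on_resume)
{
    q->vm_running = running;
    if (running && flush_on_resume) {
        toy_flush(q);
    }
}
```

With flush_on_resume set to false, a packet that arrives while the VM is stopped leaves poll_enabled cleared forever, so later packets are never read. With it set to true, the resume path delivers the parked packet and polling starts again.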



[Qemu-devel] [PATCH COLO-BLOCK v8 18/18] Add a new API to start/stop replication, do checkpoint to all BDSes

2015-07-07 Thread Wen Congyang

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
---
 block.c   | 68 +++
 include/block/block.h |  4 +++
 2 files changed, 72 insertions(+)

diff --git a/block.c b/block.c
index 0a6691e..43d175b 100644
--- a/block.c
+++ b/block.c
@@ -4381,3 +4381,71 @@ void bdrv_stop_replication(BlockDriverState *bs, bool 
failover, Error **errp)
 error_setg(errp, "this feature or command is not currently supported");
 }
 }
+
+void bdrv_start_replication_all(ReplicationMode mode, Error **errp)
+{
+BlockDriverState *bs = NULL, *temp = NULL;
+Error *local_err = NULL;
+
+while ((bs = bdrv_next(bs))) {
+if (bdrv_is_read_only(bs) || !bdrv_is_inserted(bs)) {
+continue;
+}
+
+bdrv_start_replication(bs, mode, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+goto fail;
+}
+}
+
+return;
+
+fail:
+while ((temp = bdrv_next(temp)) && bs != temp) {
+bdrv_stop_replication(temp, false, NULL);
+}
+}
+
+void bdrv_do_checkpoint_all(Error **errp)
+{
+BlockDriverState *bs = NULL;
+Error *local_err = NULL;
+
+while ((bs = bdrv_next(bs))) {
+if (bdrv_is_read_only(bs) || !bdrv_is_inserted(bs)) {
+continue;
+}
+
+bdrv_do_checkpoint(bs, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+}
+}
+
+void bdrv_stop_replication_all(bool failover, Error **errp)
+{
+BlockDriverState *bs = NULL;
+Error *local_err = NULL;
+
+while ((bs = bdrv_next(bs))) {
+if (bdrv_is_read_only(bs) || !bdrv_is_inserted(bs)) {
+continue;
+}
+
+bdrv_stop_replication(bs, failover, &local_err);
+if (!errp) {
+/*
+ * The caller doesn't care the result, they just
+ * want to stop all block's replication.
+ */
+continue;
+}
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+}
+}
diff --git a/include/block/block.h b/include/block/block.h
index 1518ae8..e1251bd 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -620,4 +620,8 @@ void bdrv_start_replication(BlockDriverState *bs, 
ReplicationMode mode,
 void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp);
 void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp);
 
+void bdrv_start_replication_all(ReplicationMode mode, Error **errp);
+void bdrv_do_checkpoint_all(Error **errp);
+void bdrv_stop_replication_all(bool failover, Error **errp);
+
 #endif
-- 
2.4.3
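The start-all-then-roll-back pattern used by bdrv_start_replication_all() can be sketched in isolation. This is a minimal model under assumptions: ToyDisk and the toy_* functions are hypothetical, not QEMU code, and unlike the patch the rollback here skips read-only disks explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a block device, not a QEMU type. */
typedef struct ToyDisk {
    bool read_only;       /* skipped, like bdrv_is_read_only() disks */
    bool fail_on_start;   /* test hook: make the start step fail */
    bool replicating;
} ToyDisk;

static bool toy_start(ToyDisk *d)
{
    if (d->fail_on_start) {
        return false;
    }
    d->replicating = true;
    return true;
}

static void toy_stop(ToyDisk *d)
{
    d->replicating = false;
}

/* Start replication on every writable disk. If one start fails, stop the
 * disks that were already started so the caller never sees a half-started
 * set. Returns the index of the failing disk, or -1 on success. */
static int toy_start_all(ToyDisk *disks, size_t n)
{
    size_t i, j;

    assert(disks != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (disks[i].read_only) {
            continue;
        }
        if (!toy_start(&disks[i])) {
            for (j = 0; j < i; j++) {
                if (!disks[j].read_only) {
                    toy_stop(&disks[j]);
                }
            }
            return (int)i;
        }
    }
    return -1;
}
```

The rollback loop stops at the failing index, which corresponds to the "bs != temp" condition in the fail path of the patch.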

Re: [Qemu-devel] [PATCH] virtio-net: Drop net_virtio_info.can_receive

2015-07-07 Thread Jason Wang



On 07/06/2015 11:21 PM, Stefan Hajnoczi wrote:
> On Mon, Jul 06, 2015 at 11:32:25AM +0800, Jason Wang wrote:
>>
>> On 07/02/2015 08:46 PM, Stefan Hajnoczi wrote:
>>> On Tue, Jun 30, 2015 at 04:35:24PM +0800, Jason Wang wrote:
 On 06/30/2015 11:06 AM, Fam Zheng wrote:
> virtio_net_receive still does the check by calling
> virtio_net_can_receive, if the device or driver is not ready, the packet
> is dropped.
>
> This is necessary because returning false from can_receive complicates
> things: the peer would disable sending until we explicitly flush the
> queue.
>
> Signed-off-by: Fam Zheng 
> ---
>  hw/net/virtio-net.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index d728233..dbef0d0 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -1503,7 +1503,6 @@ static int virtio_net_load_device(VirtIODevice 
> *vdev, QEMUFile *f,
>  static NetClientInfo net_virtio_info = {
>  .type = NET_CLIENT_OPTIONS_KIND_NIC,
>  .size = sizeof(NICState),
> -.can_receive = virtio_net_can_receive,
>  .receive = virtio_net_receive,
>  .link_status_changed = virtio_net_set_link_status,
>  .query_rx_filter = virtio_net_query_rxfilter,
 A side effect of this patch is that it will read and then drop packets if the
 guest driver is not ok.
>>> I think that the semantics of .can_receive() and .receive() return
>>> values are currently incorrect in many NICs.  They have .can_receive()
>>> functions that return false for conditions where .receive() would
>>> discard the packet.  So what happens is that packets get queued when
>>> they should actually be discarded.
>> Yes, but they are bugs more or less.
>>
>>> The purpose of the flow control (queuing) mechanism is to tell the
>>> sender to hold off until the receiver has more rx buffers available.
>>> It's a short-term thing that doesn't include link down, rx disable, or
>>> NIC reset states.
>>>
>>> Therefore, I think this patch will not introduce a regression.  It is
>>> adjusting the code to stop queuing packets when they should actually be
>>> dropped.
>>>
>>> Thoughts?
>> I agree there's no functional issue. But it wastes cpu cycles
>> (consider a guest that is being flooded). Sometimes it may even be
>> dangerous. For tap, we're probably ok since we have 756ae78b, but for
>> other backends, we don't.
> If the guest uses iptables rules or other mechanisms to drop bogus
> packets the cost is even higher than discarding them at the QEMU layer.

But it was the choice of guest.

>
> What's more is that if you're using link down as a DoS mitigation
> strategy then you might as well hot unplug the NIC.
>
> Stefan

I think there are two problems for virtio-net:

1) The mitigation method when the guest driver is ok. For tx, we have either
a timer or a bh; for rx, and only for tap, we have 756ae78b. We probably need
fixes for other backends.

2) When the driver is not ok, the point is that we should not poll the backend
at all (I believe this is one of the main objectives of the main loop).
Something like tap_can_send() and the commit that drops tap_can_send() all
follow this rule. But this patch does not; we end up with:

- driver is not ok or there is no driver: qemu keeps reading and dropping packets.
- driver is ok but there are not enough rx buffers: qemu will disable tap read poll.

These two behaviours look inconsistent.

We need to fix this either in 2.4 or later, and also for other NICs.
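The two outcomes under discussion can be put side by side in a small model. This is a sketch under assumptions: ToyNic and the toy_* names are hypothetical and do not correspond to virtio-net code. toy_can_receive() encodes the policy Stefan describes, where only a shortage of rx buffers makes the sender wait, and every other not-ready state is handled by discarding in the receive path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical NIC model, not QEMU code. */
typedef struct ToyNic {
    bool driver_ok;     /* guest driver has finished initialisation */
    int rx_buffers;     /* free receive buffers posted by the guest */
    int received;       /* packets delivered to the guest */
    int dropped;        /* packets read from the backend and discarded */
} ToyNic;

typedef enum { TOY_DELIVERED, TOY_DROPPED, TOY_QUEUED } ToyResult;

/* Flow control only: say "wait" when the driver is up but out of buffers. */
static bool toy_can_receive(const ToyNic *nic)
{
    return !nic->driver_ok || nic->rx_buffers > 0;
}

/* What happens to one packet offered by the backend. */
static ToyResult toy_offer(ToyNic *nic)
{
    if (!toy_can_receive(nic)) {
        return TOY_QUEUED;      /* sender keeps the packet and stops polling */
    }
    if (!nic->driver_ok) {
        nic->dropped++;         /* the cost Jason points out: read, then drop */
        return TOY_DROPPED;
    }
    assert(nic->rx_buffers > 0);
    nic->rx_buffers--;
    nic->received++;
    return TOY_DELIVERED;
}
```

In this model a NIC without a ready driver keeps consuming packets from the backend (TOY_DROPPED), while a NIC with a ready driver and no buffers pauses the backend (TOY_QUEUED). That asymmetry is the inconsistency raised in the message above.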

[Qemu-devel] [PATCH for-2.4] watchdog/diag288: correctly register for system reset requests

2015-07-07 Thread Cornelia Huck

From: Xu Wang 

The diag288 watchdog is not a sysbus device, therefore it doesn't get
triggered on resets automatically using dc->reset.

Let's register the reset handler manually, so we get correctly notified
again when a system reset was requested. Also reset the watchdog on
subsystem resets that don't trigger a full system reset.

Signed-off-by: Xu Wang 
Reviewed-by: David Hildenbrand 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/s390-virtio-ccw.c | 6 +-
 hw/watchdog/wdt_diag288.c  | 8 
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 3d20d6a..4c51d1a 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -36,7 +36,7 @@ typedef struct S390CcwMachineState {
 
 void io_subsystem_reset(void)
 {
-DeviceState *css, *sclp, *flic;
+DeviceState *css, *sclp, *flic, *diag288;
 
 css = DEVICE(object_resolve_path_type("", "virtual-css-bridge", NULL));
 if (css) {
@@ -51,6 +51,10 @@ void io_subsystem_reset(void)
 if (flic) {
 qdev_reset_all(flic);
 }
+diag288 = DEVICE(object_resolve_path_type("", "diag288", NULL));
+if (diag288) {
+qdev_reset_all(diag288);
+}
 }
 
 static int virtio_ccw_hcall_notify(const uint64_t *args)
diff --git a/hw/watchdog/wdt_diag288.c b/hw/watchdog/wdt_diag288.c
index 1185e06..2a885a4 100644
--- a/hw/watchdog/wdt_diag288.c
+++ b/hw/watchdog/wdt_diag288.c
@@ -40,6 +40,13 @@ static void wdt_diag288_reset(DeviceState *dev)
 timer_del(diag288->timer);
 }
 
+static void diag288_reset(void *opaque)
+{
+DeviceState *diag288 = opaque;
+
+wdt_diag288_reset(diag288);
+}
+
 static void diag288_timer_expired(void *dev)
 {
 qemu_log_mask(CPU_LOG_RESET, "Watchdog timer expired.\n");
@@ -80,6 +87,7 @@ static void wdt_diag288_realize(DeviceState *dev, Error 
**errp)
 {
 DIAG288State *diag288 = DIAG288(dev);
 
+qemu_register_reset(diag288_reset, diag288);
 diag288->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, diag288_timer_expired,
   dev);
 }
-- 
2.4.5
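For readers unfamiliar with manual reset registration, the mechanism can be modelled as a callback list that is walked on system reset. This is a sketch under assumptions: the toy_* registry below is hypothetical and only imitates the shape of qemu_register_reset(func, opaque); it is not the QEMU implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_HANDLERS 8

typedef void ToyResetHandler(void *opaque);

/* Hypothetical global registry of reset callbacks. */
static struct {
    ToyResetHandler *func;
    void *opaque;
} toy_handlers[TOY_MAX_HANDLERS];
static size_t toy_handler_count;

static void toy_register_reset(ToyResetHandler *func, void *opaque)
{
    assert(toy_handler_count < TOY_MAX_HANDLERS);
    toy_handlers[toy_handler_count].func = func;
    toy_handlers[toy_handler_count].opaque = opaque;
    toy_handler_count++;
}

/* A system reset calls every registered handler with its own opaque. */
static void toy_system_reset(void)
{
    size_t i;

    for (i = 0; i < toy_handler_count; i++) {
        toy_handlers[i].func(toy_handlers[i].opaque);
    }
}

/* Hypothetical bus-less watchdog: nothing resets it unless it registers. */
typedef struct ToyWatchdog {
    bool timer_armed;
} ToyWatchdog;

static void toy_watchdog_reset(void *opaque)
{
    ToyWatchdog *wdt = opaque;

    wdt->timer_armed = false;   /* same idea as deleting the timer */
}

static void toy_watchdog_realize(ToyWatchdog *wdt)
{
    wdt->timer_armed = false;
    toy_register_reset(toy_watchdog_reset, wdt);
}
```

A device that sits on a bus gets its reset from the bus walk; a bus-less device like the one modelled here is only reached through the registry, which is why the registration happens in the realize step.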

[Qemu-devel] [PATCH for-2.4] s390x: diag288 reset fix

2015-07-07 Thread Cornelia Huck

One more fix for s390x: The newly introduced diag288 watchdog driver
is not on the sysbus but bus-less (as feedback suggested). Unfortunately,
this also means we need to wire up any resets by hand.

Xu Wang (1):
  watchdog/diag288: correctly register for system reset requests

 hw/s390x/s390-virtio-ccw.c | 6 +-
 hw/watchdog/wdt_diag288.c  | 8 
 2 files changed, 13 insertions(+), 1 deletion(-)

-- 
2.4.5

Re: [Qemu-devel] [PATCH v2] net: Flush queued packets when guest resumes

2015-07-07 Thread Jason Wang



On 07/07/2015 04:13 PM, Michael S. Tsirkin wrote:
> On Tue, Jul 07, 2015 at 09:21:07AM +0800, Fam Zheng wrote:
>> Since commit 6e99c63 "net/socket: Drop net_socket_can_send" and friends,
>> net queues need to be explicitly flushed after qemu_can_send_packet()
>> returns false, because the netdev side will disable the polling of fd.
>>
>> This fixes the case of "cont" after "stop" (or migration).
>>
>> Signed-off-by: Fam Zheng 
> Note virtio has its own handler which must be used to
> flush packets - this one might run too early or too late.

If it runs too early (DRIVER_OK is not set), then either the packet will be
dropped (if this patch is used with "drop virtio_net_can_receive()"), or the
queue will be purged since qemu_can_send_packet() returns false. If too
late, at least tap read poll will be enabled. So it still looks ok?

>
>> ---
>>
>> v2: Unify with VM stop handler. (Stefan)
>> ---
>>  net/net.c | 19 ---
>>  1 file changed, 12 insertions(+), 7 deletions(-)
>>
>> diff --git a/net/net.c b/net/net.c
>> index 6ff7fec..28a5597 100644
>> --- a/net/net.c
>> +++ b/net/net.c
>> @@ -1257,14 +1257,19 @@ void qmp_set_link(const char *name, bool up, Error 
>> **errp)
>>  static void net_vm_change_state_handler(void *opaque, int running,
>>  RunState state)
>>  {
>> -/* Complete all queued packets, to guarantee we don't modify
>> - * state later when VM is not running.
>> - */
>> -if (!running) {
>> -NetClientState *nc;
>> -NetClientState *tmp;
>> +NetClientState *nc;
>> +NetClientState *tmp;
>>  
>> -QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
>> +QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
>> +if (running) {
>> +/* Flush queued packets and wake up backends. */
>> +if (nc->peer && qemu_can_send_packet(nc)) {
>> +qemu_flush_queued_packets(nc->peer);
>> +}
>> +} else {
>> +/* Complete all queued packets, to guarantee we don't modify
>> + * state later when VM is not running.
>> + */
>>  qemu_flush_or_purge_queued_packets(nc, true);
>>  }
>>  }
>> -- 
>> 2.4.3

Re: [Qemu-devel] [PATCH] vl: move rom_load_all after machine init done

2015-07-07 Thread Eric Auger

Hi Paolo, Peter,
On 06/22/2015 11:58 AM, Eric Auger wrote:
> On 06/22/2015 11:53 AM, Paolo Bonzini wrote:
>>
>>
>> On 22/06/2015 11:49, Eric Auger wrote:
> It seems safe because rom_load_all really doesn't load anything, it only
> does an overlap check.  Is this right?
>>> it does the check + isrom field setting
>
> Is the bug that some overlapping ROMs are not detected?  The commit
> message is not clear.
>>> The regression is that both the overlap check and the isrom setting are not
>>> done since ROMs are inserted into the roms list afterwards, at machine init
>>> done time. The bug has not really been observed yet, I think.
>>
>> isrom is just an optimization though, right?  What is it useful for?
> My understanding is it serves 2 purposes:
> 
> - report info in the monitor (hmp_info_roms)
> - decide whether the rom->data can be freed on ROM reset notifier
> (rom_reset).
> 
> Hope I didn't miss anything else.
> 
> Eric

What do we decide then about this regression on arm? Do we fix it in 2.4
or later?

Best Regards

Eric
>>
>> Paolo
>>
>

[Qemu-devel] [PATCH 2.4] socket: pass correct size in net_socket_send()

2015-07-07 Thread Jason Wang

We should pass the size of packet instead of the remaining to
qemu_send_packet_async().

Fixes: 6e99c631f116221d169ea53953d91b8aa74d297a
   ("net/socket: Drop net_socket_can_send")

Signed-off-by: Jason Wang 
---
 net/socket.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/socket.c b/net/socket.c
index c752696..b1e3b1c 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -213,7 +213,7 @@ static void net_socket_send(void *opaque)
 if (s->index >= s->packet_len) {
 s->index = 0;
 s->state = 0;
-if (qemu_send_packet_async(&s->nc, s->buf, size,
+if (qemu_send_packet_async(&s->nc, s->buf, s->packet_len,
net_socket_send_completed) == 0) {
 net_socket_read_poll(s, false);
 break;
-- 
2.1.4
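The difference between "size" and "s->packet_len" shows up when one read returns the end of a packet plus the start of the next one. The following is a sketch under assumptions: ToyReassembler and the toy_* functions are hypothetical, and the framing (a 4-byte big-endian length followed by the payload) is assumed for illustration rather than taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_PACKET 256

/* Hypothetical stream reassembler, not the net/socket.c code. */
typedef struct ToyReassembler {
    int state;                  /* 0: reading length header, 1: reading payload */
    size_t index;               /* bytes collected in the current state */
    uint32_t packet_len;        /* payload length taken from the header */
    uint8_t buf[TOY_MAX_PACKET];
    size_t last_delivered_len;  /* length handed to the consumer last time */
    unsigned delivered;         /* number of complete packets seen */
} ToyReassembler;

static void toy_deliver(ToyReassembler *r, const uint8_t *data, size_t len)
{
    (void)data;
    r->last_delivered_len = len;
    r->delivered++;
}

/* Feed one chunk as returned by a single read on the stream. Payloads
 * larger than TOY_MAX_PACKET are not supported and trip the assert. */
static void toy_feed(ToyReassembler *r, const uint8_t *chunk, size_t size)
{
    while (size > 0) {
        size_t want = (r->state == 0 ? 4 : r->packet_len) - r->index;
        size_t l = want < size ? want : size;

        assert(r->index + l <= sizeof(r->buf));
        memcpy(r->buf + r->index, chunk, l);
        chunk += l;
        size -= l;
        r->index += l;

        if (r->state == 0 && r->index == 4) {
            r->packet_len = ((uint32_t)r->buf[0] << 24) |
                            ((uint32_t)r->buf[1] << 16) |
                            ((uint32_t)r->buf[2] << 8) |
                            (uint32_t)r->buf[3];
            r->index = 0;
            r->state = 1;
        } else if (r->state == 1 && r->index >= r->packet_len) {
            r->index = 0;
            r->state = 0;
            /* "size" now counts the unread rest of the chunk, so the
             * packet length is the only correct value to pass on. */
            toy_deliver(r, r->buf, r->packet_len);
        }
    }
}
```

When a chunk holds a 3-byte packet followed by 5 bytes of the next one, "size" is 5 at the moment the first packet completes, while the packet itself is 3 bytes long.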

Re: [Qemu-devel] [PATCH] vl: move rom_load_all after machine init done

2015-07-07 Thread Paolo Bonzini



On 07/07/2015 11:00, Eric Auger wrote:
> Hi Paolo, Peter,
> On 06/22/2015 11:58 AM, Eric Auger wrote:
>> On 06/22/2015 11:53 AM, Paolo Bonzini wrote:
>>>
>>>
>>> On 22/06/2015 11:49, Eric Auger wrote:
>> It seems safe because rom_load_all really doesn't load anything, it only
>> does an overlap check.  Is this right?
 it does the check + isrom field setting
>>
>> Is the bug that some overlapping ROMs are not detected?  The commit
>> message is not clear.
 The regression is that both the overlap check and the isrom setting are not
 done since ROMs are inserted into the roms list afterwards, at machine init
 done time. The bug has not really been observed yet, I think.
>>>
>>> isrom is just an optimization though, right?  What is it useful for?
>> My understanding is it serves 2 purposes:
>>
>> - report info in the monitor (hmp_info_roms)
>> - decide whether the rom->data can be freed on ROM reset notifier
>> (rom_reset).
>>
>> Hope I didn't miss anything else.
>>
>> Eric
> 
> What do we decide then about this regression on arm? Do we fix it in 2.4
> or later?

Yes, it should be fixed in 2.4.

Paolo

Re: [Qemu-devel] [PATCH v2] net: Flush queued packets when guest resumes

2015-07-07 Thread Fam Zheng

On Tue, 07/07 15:44, Jason Wang wrote:
> 
> 
> On 07/07/2015 09:21 AM, Fam Zheng wrote:
> > Since commit 6e99c63 "net/socket: Drop net_socket_can_send" and friends,
> > net queues need to be explicitly flushed after qemu_can_send_packet()
> > returns false, because the netdev side will disable the polling of fd.
> >
> > This fixes the case of "cont" after "stop" (or migration).
> >
> > Signed-off-by: Fam Zheng 
> >
> > ---
> >
> > v2: Unify with VM stop handler. (Stefan)
> > ---
> >  net/net.c | 19 ---
> >  1 file changed, 12 insertions(+), 7 deletions(-)
> >
> > diff --git a/net/net.c b/net/net.c
> > index 6ff7fec..28a5597 100644
> > --- a/net/net.c
> > +++ b/net/net.c
> > @@ -1257,14 +1257,19 @@ void qmp_set_link(const char *name, bool up, Error 
> > **errp)
> >  static void net_vm_change_state_handler(void *opaque, int running,
> >  RunState state)
> >  {
> > -/* Complete all queued packets, to guarantee we don't modify
> > - * state later when VM is not running.
> > - */
> > -if (!running) {
> > -NetClientState *nc;
> > -NetClientState *tmp;
> > +NetClientState *nc;
> > +NetClientState *tmp;
> >  
> > -QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> > +QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> > +if (running) {
> > +/* Flush queued packets and wake up backends. */
> > +if (nc->peer && qemu_can_send_packet(nc)) {
> > +qemu_flush_queued_packets(nc->peer);
> > +}
> > +} else {
> > +/* Complete all queued packets, to guarantee we don't modify
> > + * state later when VM is not running.
> > + */
> >  qemu_flush_or_purge_queued_packets(nc, true);
> >  }
> 
> Looks like qemu_can_send_packet() checks both nc->peer and runstate. So
> probably, we can simplify this to:
> 
> if (qemu_can_send_packet(nc))
> qemu_flush_queued_packets(nc->peer);
> else
> qemu_flush_or_purge_queued_packets(nc, true);
> 
> >  }
> 

qemu_can_send_packet returns 1 if !nc->peer, so this doesn't work.

Fam
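The objection can be made concrete with a small model. This is a sketch under assumptions: ToyClient and the toy_* functions are hypothetical; only the two facts stated in this thread are modelled, namely that the can-send check looks at the run state and that it returns true when there is no peer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical net client model, not QEMU code. */
typedef struct ToyClient ToyClient;
struct ToyClient {
    ToyClient *peer;     /* NULL when the client is not connected */
    bool receive_ok;     /* whether this client accepts packets right now */
    int flushed;         /* how often this client's queue was flushed */
    int purged;          /* how often this client's queue was purged */
};

/* Modelled after the description in the thread: no peer means "can send". */
static bool toy_can_send(const ToyClient *sender, bool vm_running)
{
    if (!vm_running) {
        return false;
    }
    if (!sender->peer) {
        return true;
    }
    return sender->peer->receive_ok;
}

static void toy_flush(ToyClient *peer)
{
    assert(peer != NULL);   /* the simplified form would trip here */
    peer->flushed++;
}

/* Resume/stop handling as in the patch: check the peer before flushing. */
static void toy_state_change(ToyClient *nc, bool vm_running)
{
    if (vm_running) {
        if (nc->peer && toy_can_send(nc, vm_running)) {
            toy_flush(nc->peer);
        }
    } else {
        nc->purged++;
    }
}

/* True when "if (can_send) flush(nc->peer)" would be handed a NULL peer. */
static bool toy_simplified_hits_null(const ToyClient *nc, bool vm_running)
{
    return toy_can_send(nc, vm_running) && nc->peer == NULL;
}
```

For a client without a peer, toy_simplified_hits_null() is true while the VM runs, so the explicit nc->peer check in the patch is what keeps the flush from being called with NULL.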

Re: [Qemu-devel] [PATCH] vl: move rom_load_all after machine init done

2015-07-07 Thread Eric Auger

On 07/07/2015 11:02 AM, Paolo Bonzini wrote:
> 
> 
> On 07/07/2015 11:00, Eric Auger wrote:
>> Hi Paolo, Peter,
>> On 06/22/2015 11:58 AM, Eric Auger wrote:
>>> On 06/22/2015 11:53 AM, Paolo Bonzini wrote:


 On 22/06/2015 11:49, Eric Auger wrote:
>>> It seems safe because rom_load_all really doesn't load anything, it only
>>> does an overlap check.  Is this right?
> it does the check + isrom field setting
>>>
>>> Is the bug that some overlapping ROMs are not detected?  The commit
>>> message is not clear.
> The regression is that both the overlap check and the isrom setting are not
> done since ROMs are inserted into the roms list afterwards, at machine init
> done time. The bug has not really been observed yet, I think.

 isrom is just an optimization though, right?  What is it useful for?
>>> My understanding is it serves 2 purposes:
>>>
>>> - report info in the monitor (hmp_info_roms)
>>> - decide whether the rom->data can be freed on ROM reset notifier
>>> (rom_reset).
>>>
>>> Hope I didn't miss anything else.
>>>
>>> Eric
>>
>> What do we decide then about this regression on arm? Do we fix it in 2.4
>> or later?
> 
> Yes, it should be fixed in 2.4.
Do you want me to resend it with a new commit message or is the context
clearer now?

Thanks

Eric
> 
> Paolo
>

[Qemu-devel] [PATCH COLO-BLOCK v8 11/18] Allow creating backup jobs when opening BDS

2015-07-07 Thread Wen Congyang

When opening BDS, we need to create backup jobs for
image-fleecing.

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Cc: Jeff Cody 
Reviewed-by: Stefan Hajnoczi 
---
 block/Makefile.objs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/Makefile.objs b/block/Makefile.objs
index c34fd7c..f068666 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -22,10 +22,10 @@ block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
 block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o
 block-obj-y += write-threshold.o
+block-obj-y += backup.o
 
 common-obj-y += stream.o
 common-obj-y += commit.o
-common-obj-y += backup.o
 
 iscsi.o-cflags := $(LIBISCSI_CFLAGS)
 iscsi.o-libs   := $(LIBISCSI_LIBS)
-- 
2.4.3

Re: [Qemu-devel] [PATCH v2] net: Flush queued packets when guest resumes

2015-07-07 Thread Fam Zheng

On Tue, 07/07 11:13, Michael S. Tsirkin wrote:
> On Tue, Jul 07, 2015 at 09:21:07AM +0800, Fam Zheng wrote:
> > Since commit 6e99c63 "net/socket: Drop net_socket_can_send" and friends,
> > net queues need to be explicitly flushed after qemu_can_send_packet()
> > returns false, because the netdev side will disable the polling of fd.
> > 
> > This fixes the case of "cont" after "stop" (or migration).
> > 
> > Signed-off-by: Fam Zheng 
> 
> Note virtio has its own handler which must be used to
> flush packets - this one might run too early or too late.

Which handler do you mean? I don't think virtio-net handles resume now. (If it
does, we probably should drop it together with this change, since it's needed
by all NICs.)

Fam

> 
> > ---
> > 
> > v2: Unify with VM stop handler. (Stefan)
> > ---
> >  net/net.c | 19 ---
> >  1 file changed, 12 insertions(+), 7 deletions(-)
> > 
> > diff --git a/net/net.c b/net/net.c
> > index 6ff7fec..28a5597 100644
> > --- a/net/net.c
> > +++ b/net/net.c
> > @@ -1257,14 +1257,19 @@ void qmp_set_link(const char *name, bool up, Error 
> > **errp)
> >  static void net_vm_change_state_handler(void *opaque, int running,
> >  RunState state)
> >  {
> > -/* Complete all queued packets, to guarantee we don't modify
> > - * state later when VM is not running.
> > - */
> > -if (!running) {
> > -NetClientState *nc;
> > -NetClientState *tmp;
> > +NetClientState *nc;
> > +NetClientState *tmp;
> >  
> > -QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> > +QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> > +if (running) {
> > +/* Flush queued packets and wake up backends. */
> > +if (nc->peer && qemu_can_send_packet(nc)) {
> > +qemu_flush_queued_packets(nc->peer);
> > +}
> > +} else {
> > +/* Complete all queued packets, to guarantee we don't modify
> > + * state later when VM is not running.
> > + */
> >  qemu_flush_or_purge_queued_packets(nc, true);
> >  }
> >  }
> > -- 
> > 2.4.3

Re: [Qemu-devel] Fwd: Using QCOW2 with nand flashes.

2015-07-07 Thread sai pavan

On Tue, Jul 7, 2015 at 2:11 PM, Peter Crosthwaite
 wrote:
> On Mon, Jul 6, 2015 at 11:54 PM, sai pavan  wrote:
>>
>> Hi,
>>
>> I am trying to implement fake disk images for emulating nand flashes.
>> I see that sparse files are formed when the content is zeros. But for nand
>> flashes the content is all ones initially. It is difficult for me to make a
>> sparse file with all ones.
>>
>> Does anyone have suggestions for this problem?
>>
>> I am thinking of creating a nand flash file with all zeros and negating the
>> data at the receiving end in qemu.
>
> Could this be a feature of qcow or some other file format rather than
> a NAND specific thing? It probably applies to other flash media.
Yeah, it's for all flash devices. It could be useful to emulate a bigger
spi flash too.
Let me know if qemu has any similar implementation in blockdev.

Thanks,
Sai Pavan.
>
> Regards,
> Peter
>
>> So the input file will be null, but the
>> concept of all 1's will be intact. But this will be confusing if someone wants
>> to compare the output bin files after a write. One would have to read the data
>> negated.
>>
>> Regards,
>> Sai Pavan
>>
>>
>>
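The inversion idea from the original post can be sketched as a pair of read/write helpers. This is a minimal sketch under assumptions: the toy_* names are hypothetical, the backing image is a plain byte array standing in for the sparse file, and the write is modelled as a plain store (a real NAND program operation can only clear bits, which is not modelled here).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read path: the image stores inverted data, so an all-zero (sparse)
 * image reads back as erased flash, i.e. all ones. */
static void toy_flash_read(const uint8_t *image, uint8_t *out, size_t len)
{
    size_t i;

    assert((image != NULL && out != NULL) || len == 0);
    for (i = 0; i < len; i++) {
        out[i] = (uint8_t)~image[i];
    }
}

/* Write path: invert before storing, so erased regions stay zero in the
 * file and the file can remain sparse. */
static void toy_flash_write(uint8_t *image, const uint8_t *data, size_t len)
{
    size_t i;

    assert((image != NULL && data != NULL) || len == 0);
    for (i = 0; i < len; i++) {
        image[i] = (uint8_t)~data[i];
    }
}
```

The drawback mentioned in the post is visible here: the bytes in the image file are the complement of what the guest wrote, so comparing image files against expected binaries requires inverting one side first.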

Re: [Qemu-devel] [PATCH 2.4] socket: pass correct size in net_socket_send()

2015-07-07 Thread Fam Zheng

On Tue, 07/07 17:00, Jason Wang wrote:
> We should pass the size of packet instead of the remaining to
> qemu_send_packet_async().
> 
> Fixes: 6e99c631f116221d169ea53953d91b8aa74d297a
>("net/socket: Drop net_socket_can_send")
> 
> Signed-off-by: Jason Wang 

Thanks!

Reviewed-by: Fam Zheng 

> ---
>  net/socket.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/socket.c b/net/socket.c
> index c752696..b1e3b1c 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -213,7 +213,7 @@ static void net_socket_send(void *opaque)
>  if (s->index >= s->packet_len) {
>  s->index = 0;
>  s->state = 0;
> -if (qemu_send_packet_async(&s->nc, s->buf, size,
> +if (qemu_send_packet_async(&s->nc, s->buf, s->packet_len,
> net_socket_send_completed) == 0) {
>  net_socket_read_poll(s, false);
>  break;
> -- 
> 2.1.4
>

Re: [Qemu-devel] [PATCH v2] net: Flush queued packets when guest resumes

2015-07-07 Thread Michael S. Tsirkin

On Tue, Jul 07, 2015 at 04:58:29PM +0800, Jason Wang wrote:
> 
> 
> On 07/07/2015 04:13 PM, Michael S. Tsirkin wrote:
> > On Tue, Jul 07, 2015 at 09:21:07AM +0800, Fam Zheng wrote:
> >> Since commit 6e99c63 "net/socket: Drop net_socket_can_send" and friends,
> >> net queues need to be explicitly flushed after qemu_can_send_packet()
> >> returns false, because the netdev side will disable the polling of fd.
> >>
> >> This fixes the case of "cont" after "stop" (or migration).
> >>
> >> Signed-off-by: Fam Zheng 
> > Note virtio has its own handler which must be used to
> > flush packets - this one might run too early or too late.
> 
> If it runs too early (DRIVER_OK is not set), then either the packet will be
> dropped (if this patch is used with "drop virtio_net_can_receive()"), or the
> queue will be purged since qemu_can_send_packet() returns false. If too
> late, at least tap read poll will be enabled. So it still looks ok?

It's still not helpful for virtio. So which cards are fixed?

I just find the comment
This fixes the case of "cont" after "stop" (or migration)
too vague.
Can you please include the info about what's broken in the commit log?

> >
> >> ---
> >>
> >> v2: Unify with VM stop handler. (Stefan)
> >> ---
> >>  net/net.c | 19 ---
> >>  1 file changed, 12 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/net/net.c b/net/net.c
> >> index 6ff7fec..28a5597 100644
> >> --- a/net/net.c
> >> +++ b/net/net.c
> >> @@ -1257,14 +1257,19 @@ void qmp_set_link(const char *name, bool up, Error 
> >> **errp)
> >>  static void net_vm_change_state_handler(void *opaque, int running,
> >>  RunState state)
> >>  {
> >> -/* Complete all queued packets, to guarantee we don't modify
> >> - * state later when VM is not running.
> >> - */
> >> -if (!running) {
> >> -NetClientState *nc;
> >> -NetClientState *tmp;
> >> +NetClientState *nc;
> >> +NetClientState *tmp;
> >>  
> >> -QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> >> +QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> >> +if (running) {
> >> +/* Flush queued packets and wake up backends. */
> >> +if (nc->peer && qemu_can_send_packet(nc)) {
> >> +qemu_flush_queued_packets(nc->peer);
> >> +}
> >> +} else {
> >> +/* Complete all queued packets, to guarantee we don't modify
> >> + * state later when VM is not running.
> >> + */
> >>  qemu_flush_or_purge_queued_packets(nc, true);
> >>  }
> >>  }
> >> -- 
> >> 2.4.3

Re: [Qemu-devel] [PATCH 06/10] qga: guest exec functionality for Windows guests

2015-07-07 Thread Olga Krishtal


On 07/07/15 11:06, Denis V. Lunev wrote:

On 07/07/15 04:31, Michael Roth wrote:

Quoting Denis V. Lunev (2015-06-30 05:25:19)

From: Olga Krishtal 

Child process' stdin/stdout/stderr can be associated
with handles for communication via read/write interfaces.

The workflow should be something like this:
* Open an anonymous pipe through guest-pipe-open
* Execute a binary or a script in the guest. Arbitrary arguments and
   environment to a new child process could be passed through options
* Read/pass information from/to executed process using
   guest-file-read/write
* Collect the status of a child process

Have you seen anything like this in your testing?

{'execute':'guest-exec','arguments':{'path':'/Windows/System32/ipconfig.exe',
  'timeout':5000}}
{"return": {"pid": 588}}
{'execute':'guest-exec-status','arguments':{'pid':588}}
{"return": {"exit": 0, "handle-stdout": -1, "handle-stderr": -1,
  "handle-stdin": -1, "signal": -1}}
{'execute':'guest-exec-status','arguments':{'pid':588}}
{"error": {"class": "GenericError", "desc": "Invalid parameter 'pid'"}}

{'execute':'guest-exec','arguments':{'path':'/Windows/System32/ipconfig.exe',
  'timeout':5000}}
{"error": {"class": "GenericError", "desc": "CreateProcessW() failed:
  The parameter is incorrect. (error: 57)"}}
{'execute':'guest-exec','arguments':{'path':'/Windows/System32/ipconfig.exe',
  'timeout':5000}}
{"error": {"class": "GenericError", "desc": "CreateProcessW() failed:
  The parameter is incorrect. (error: 57)"}}

First of all, what version of Windows are you using?
Secondly, you do need to specify an environment variable:
sudo virsh qemu-agent-command w2k12r2 
'{"execute":"guest-exec","arguments":{"path":"/Windows/System32/ipconfig.exe", 
"timeout": 5000, "env":["MyEnv=00"]}}'
For Windows Server 2003 we do not have to pass "env" at all, but if we 
are working with Server 2008 and older we have to pass "env" = "00" if 
we do not want to use it. 
https://social.msdn.microsoft.com/Forums/windowsdesktop/en-US/59450592-aa52-4170-9742-63c84bff0010/unexpected-errorinvalidparameter-returned-by-createprocess-too-bad?forum=windowsgeneraldevelopmentissues
This comment was included in the first version of the patches and I may have 
forgotten it. Try to specify env and call exec several times. It should 
work fine.

I will look closer at guest-exec-status double call.




{'execute':'guest-exec','arguments':{'path':'/Windows/System32/ipconfig.exe',
  'timeout':5000}}
{"return": {"pid": 1836}}
I'll check this later during office time. Something definitely went 
wrong.



The guest-exec-status failures are expected since the first call reaps
everything, but the CreateProcessW() failures are not. Will look into it
more this evening, but it doesn't look like I'll be able to apply this in
its current state.
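
For illustration, the "first call reaps everything" behaviour can be sketched in isolation. This is a minimal model, not the qga code: the table `toy_children` and the functions `toy_child_record()` and `toy_child_status()` are hypothetical names, assumed only to show why a second status query for the same pid reports an unknown pid.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_CHILDREN 8

/* Hypothetical bookkeeping entry for one finished child process. */
typedef struct {
    int pid;
    int exit_code;
    bool in_use;
} ToyChild;

static ToyChild toy_children[TOY_MAX_CHILDREN];

/* Remember a finished child; returns false if the table is full. */
bool toy_child_record(int pid, int exit_code)
{
    for (size_t i = 0; i < TOY_MAX_CHILDREN; i++) {
        if (!toy_children[i].in_use) {
            toy_children[i].pid = pid;
            toy_children[i].exit_code = exit_code;
            toy_children[i].in_use = true;
            return true;
        }
    }
    return false;
}

/*
 * Report the exit code once and drop the entry ("reap" it).
 * Returns false when the pid is unknown, which is what a second
 * query for the same pid runs into.
 */
bool toy_child_status(int pid, int *exit_code)
{
    for (size_t i = 0; i < TOY_MAX_CHILDREN; i++) {
        if (toy_children[i].in_use && toy_children[i].pid == pid) {
            *exit_code = toy_children[i].exit_code;
            toy_children[i].in_use = false;
            return true;
        }
    }
    return false;
}
```

In this model the second query fails by design, which matches the "Invalid parameter 'pid'" reply in the log above; it says nothing about the CreateProcessW() failures.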

I have concerns over the schema as well. I think last time we discussed
it we both seemed to agree that guest-file-open was unwieldy and
unnecessary. We should just let guest-exec return a set of file handles
instead of having users do all the plumbing.
No, the discussion was a bit different AFAIR. First of all, you proposed
using unified code to perform exec. On the other hand, the current mechanics
with pipes are quite inconvenient for end users of the feature, for example
for an interactive shell in the guest.

We have used a very simple approach for our application: pipes are not
used; the application creates a VirtIO serial channel and, through this
API, forces the guest to fork/exec the child using this serial channel as
its stdio in/out. In this case we do get a convenient API for shell
processing.

This means that this flexibility with direct specification of the file
descriptors is necessary.

There are two solutions from my point of view:
- keep the current API, which is suitable for us
- switch to "pipe only" mechanics for guest exec, i.e. the command
   will work like "ssh" with one descriptor for read and one for write
   created automatically, but in this case we do need either a way
   to connect a Unix socket in the host with a file descriptor in the
   guest or a way to send events from QGA to the client using QMP


I'm really sorry for chiming in right before hard freeze, very poor
timing/planning on my part.

:( Can we somehow schedule this better next time? This functionality
is mandatory for us and we cannot afford to drop it or forget about
it for long. There was no pressure in winter, but now I am under heavy
pressure. So can we at least agree on the API terms?


Will look at the fs/pci info patches tonight.


Signed-off-by: Olga Krishtal 
Acked-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Eric Blake 
CC: Michael Roth 
---
  qga/commands-win32.c | 309 ++-
  1 file changed, 303 insertions(+), 6 deletions(-)

diff --git a/qga/commands-win32.c b/qga/commands-win32.c
index 435a049..ad445d9 100644
--- a/qga/commands-win32.c
+++ b/qga/commands-win32.c
@@ -451,10 +451,231 @@ sta

Re: [Qemu-devel] [PATCH COLO-BLOCK v7 00/17] Block replication for continuous checkpoints

2015-07-07 Thread Dr. David Alan Gilbert

* Wen Congyang (we...@cn.fujitsu.com) wrote:
> On 07/07/2015 08:25 AM, Michael R. Hines wrote:
> > On 07/04/2015 07:46 AM, Wen Congyang wrote:
> >> At 2015/7/3 23:30, Dr. David Alan Gilbert Wrote:
> >>> * Wen Congyang (we...@cn.fujitsu.com) wrote:
>  Block replication is a very important feature which is used for
>  continuous checkpoints(for example: COLO).
> 
>  Usage:
>  Please refer to docs/block-replication.txt
> 
>  You can get the patch here:
>  https://github.com/wencongyang/qemu-colo/commits/block-replication-v7
> 
>  You can get this patch with the framework here:
>  https://github.com/wencongyang/qemu-colo/commits/colo_framework_v7.2
> >>>
> >>> Hi,
> >>>I seem to be having problems with the new listed syntax on the wiki;
> >>> on the secondary I'm getting the error
> >>>
> >>>   Block format 'replication' used by device 'virtio0' doesn't support the 
> >>> option 'export'
> >>>
> >>> ./try/bin/qemu-system-x86_64 -enable-kvm -nographic \
> >>>   -boot c -m 4096 -smp 4 -S \
> >>>   -name debug-threads=on -trace events=trace-file \
> >>>   -netdev tap,id=hn0,script=$PWD/ifup-slave,\
> >>> downscript=no,colo_script=$PWD/qemu/scripts/colo-proxy-script.sh,colo_nicname=em4
> >>>  \
> >>>   -device e1000,mac=9c:da:4d:1c:b5:89,id=net-pci0,netdev=hn0 \
> >>>   -device virtio-rng-pci \
> >>>   -drive 
> >>> if=none,driver=raw,file=/home/localvms/bugzilla.raw,id=colo1,cache=none,aio=native
> >>>  \
> >>>   -drive 
> >>> if=virtio,driver=replication,mode=secondary,export=colo1,throttling.bps-total-max=7000,\
> >>> file.file.filename=$TMPDISKS/colo-active-disk.qcow2,\
> >>> file.driver=qcow2,\
> >>> file.backing.file.filename=$TMPDISKS/colo-hidden-disk.qcow2,\
> >>> file.backing.driver=qcow2,\
> >>> file.backing.backing.backing_reference=colo1,\
> >>> file.backing.allow-write-backing-file=on \
> >>>   -incoming tcp:0:
> >>
> >> Sorry, the option export is removed, because we use the qmp command 
> >> nbd-server-add to let a BB be NBD server.
> >>
> > 
> > Still doesn't work. The server says:
> > 
> > nbd.c:nbd_receive_options():L447: read failed
> 
> This log is very strange. The NBD client connects to NBD server, and NBD
> server wants to read data
> from NBD client, but reading fails. It seems that the connection is closed 
> unexpectedly. Can you
> give me more log and how do you use it?

That was the same failure I was getting.   I think it's that the NBD server and 
client are in different
modes, with one of them expecting the export.

Dave

> Thanks
> Wen Congyang
> 
> > nbd.c:nbd_send_negotiate():L562: option negotiation failed
> > 
> > - Michael
> > 
> > .
> > 
> 
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH v2] net: Flush queued packets when guest resumes

2015-07-07 Thread Michael S. Tsirkin

On Tue, Jul 07, 2015 at 05:09:09PM +0800, Fam Zheng wrote:
> On Tue, 07/07 11:13, Michael S. Tsirkin wrote:
> > On Tue, Jul 07, 2015 at 09:21:07AM +0800, Fam Zheng wrote:
> > > Since commit 6e99c63 "net/socket: Drop net_socket_can_send" and friends,
> > > net queues need to be explicitly flushed after qemu_can_send_packet()
> > > returns false, because the netdev side will disable the polling of fd.
> > > 
> > > This fixes the case of "cont" after "stop" (or migration).
> > > 
> > > Signed-off-by: Fam Zheng 
> > 
> > Note virtio has its own handler which must be used to
> > flush packets - this one might run too early or too late.
> 
> Which handler do you mean? I don't think virtio-net handles resume now. (If it
> does, we probably should drop it together with this change, since it's needed
> by all NICs.)
> 
> Fam

virtio_vmstate_change

It's all far from trivial. I suspect this whack-a-mole approach of
spreading purges here and there will only create more bugs.

Why would we ever need to process network packets when
VM is not running? I don't see any point to it.
How about we simply stop the job processing network on
vm stop and restart on vm start?
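
For illustration, the idea of not processing packets while the VM is stopped and restarting on resume can be modelled with a toy queue. This is only a sketch under stated assumptions: `ToyNet`, `toy_net_receive()` and `toy_net_vm_state_change()` are hypothetical names and do not correspond to QEMU's net layer.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_MAX 16

/* Hypothetical stand-in for one network client and its packet queue. */
typedef struct {
    bool vm_running;
    int queued[TOY_QUEUE_MAX];
    size_t n_queued;
    size_t n_delivered;
} ToyNet;

/*
 * Deliver a packet right away while the VM runs; otherwise hold it.
 * Returns false if the packet had to be dropped because the queue is full.
 */
bool toy_net_receive(ToyNet *net, int packet)
{
    if (net->vm_running) {
        net->n_delivered++;
        return true;
    }
    if (net->n_queued == TOY_QUEUE_MAX) {
        return false;
    }
    net->queued[net->n_queued++] = packet;
    return true;
}

/* VM state hook: on resume, flush everything that piled up while stopped. */
void toy_net_vm_state_change(ToyNet *net, bool running)
{
    net->vm_running = running;
    if (running) {
        net->n_delivered += net->n_queued;
        net->n_queued = 0;
    }
}
```

The point of the model is that guest-visible state only changes while `vm_running` is true, and that nothing is left stuck in the queue after a resume.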



> > 
> > > ---
> > > 
> > > v2: Unify with VM stop handler. (Stefan)
> > > ---
> > >  net/net.c | 19 ---
> > >  1 file changed, 12 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/net/net.c b/net/net.c
> > > index 6ff7fec..28a5597 100644
> > > --- a/net/net.c
> > > +++ b/net/net.c
> > > @@ -1257,14 +1257,19 @@ void qmp_set_link(const char *name, bool up, 
> > > Error **errp)
> > >  static void net_vm_change_state_handler(void *opaque, int running,
> > >  RunState state)
> > >  {
> > > -/* Complete all queued packets, to guarantee we don't modify
> > > - * state later when VM is not running.
> > > - */
> > > -if (!running) {
> > > -NetClientState *nc;
> > > -NetClientState *tmp;
> > > +NetClientState *nc;
> > > +NetClientState *tmp;
> > >  
> > > -QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> > > +QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> > > +if (running) {
> > > +/* Flush queued packets and wake up backends. */
> > > +if (nc->peer && qemu_can_send_packet(nc)) {
> > > +qemu_flush_queued_packets(nc->peer);
> > > +}
> > > +} else {
> > > +/* Complete all queued packets, to guarantee we don't modify
> > > + * state later when VM is not running.
> > > + */
> > >  qemu_flush_or_purge_queued_packets(nc, true);
> > >  }
> > >  }
> > > -- 
> > > 2.4.3

Re: [Qemu-devel] [PATCH] vl: move rom_load_all after machine init done

2015-07-07 Thread Paolo Bonzini



On 07/07/2015 11:07, Eric Auger wrote:
> On 07/07/2015 11:02 AM, Paolo Bonzini wrote:
>>
>>
>> On 07/07/2015 11:00, Eric Auger wrote:
>>> Hi Paolo, Peter,
>>> On 06/22/2015 11:58 AM, Eric Auger wrote:
 On 06/22/2015 11:53 AM, Paolo Bonzini wrote:
>
>
> On 22/06/2015 11:49, Eric Auger wrote:
 It seems safe because rom_load_all really doesn't load anything, it 
 only
 does an overlap check.  Is this right?
>> it does the check + isrom field setting

 Is the bug that some overlapping ROMs are not detected?  The commit
 message is not clear.
>> The regression is that both the overlap check and the isrom setting are
>> not done, since ROMs are inserted in the roms list afterwards, at machine
>> init done time. The bug has not really been observed yet, I think.
>
> isrom is just an optimization though, right?  What is it useful for?
 My understanding is it serves 2 purposes:

 - report info in the monitor (hmp_info_roms)
 - decide whether the rom->data can be freed on ROM reset notifier
 (rom_reset).

 Hope I didn't miss anything else.

 Eric
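
For illustration, an overlap check of the kind discussed here can be sketched as a single pass over regions sorted by start address. This is a minimal sketch, not the rom_load_all() implementation: `ToyRom` and `toy_roms_overlap()` are hypothetical names, the input is assumed to be sorted by ascending address, and address wrap-around is ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one ROM blob: where it goes and how big it is. */
typedef struct {
    uint64_t addr;
    uint64_t size;
} ToyRom;

/*
 * Return true if any two neighbouring entries overlap.
 * Assumes roms[] is sorted by ascending addr, so only neighbours
 * need to be compared.
 */
bool toy_roms_overlap(const ToyRom *roms, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (roms[i - 1].addr + roms[i - 1].size > roms[i].addr) {
            return true;
        }
    }
    return false;
}
```

A check like this only sees entries that are already in the list when it runs, which is why ROMs added afterwards escape it.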
>>>
>>> What do we decide then about this regression on arm? Do we fix it in 2.4
>>> or later?
>>
>> Yes, it should be fixed in 2.4.
> Do you want me to resend it with a new commit message or is the context
> clearer now?

It's okay, thanks!

Paolo

Re: [Qemu-devel] [PATCH COLO-BLOCK v7 00/17] Block replication for continuous checkpoints

2015-07-07 Thread Paolo Bonzini



On 07/07/2015 11:13, Dr. David Alan Gilbert wrote:
>> > This log is very strange. The NBD client connects to NBD server, and NBD
>> > server wants to read data
>> > from NBD client, but reading fails. It seems that the connection is closed 
>> > unexpectedly. Can you
>> > give me more log and how do you use it?
> That was the same failure I was getting.   I think it's that the NBD server 
> and client are in different
> modes, with one of them expecting the export.

nbd_server_add always expects the export.

Paolo

Re: [Qemu-devel] qemu build fails on xen

2015-07-07 Thread Paul Durrant

> -----Original Message-----
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: 07 July 2015 09:48
> To: Stefano Stabellini
> Cc: Chen, Tiejun; pbonz...@redhat.com; r...@twiddle.net;
> ehabk...@redhat.com; qemu-devel@nongnu.org; Paul Durrant; Peter
> Maydell
> Subject: Re: qemu build fails on xen
> 
> On Tue, Jul 07, 2015 at 11:45:29AM +0300, Michael S. Tsirkin wrote:
> > The following error triggers on Fedora 22:
> >
> > In file included from /scm/qemu/include/hw/xen/xen_backend.h:4:0,
> >  from hw/block/xen_disk.c:39:
> > /scm/qemu/include/hw/xen/xen_common.h:198:18: error: conflicting
> types for ‘ioservid_t’
> >  typedef uint32_t ioservid_t;
> >   ^
> > In file included from /usr/include/xen/hvm/params.h:24:0,
> >  from /usr/include/xenctrl.h:46,
> >  from /scm/qemu/include/hw/xen/xen_common.h:9,
> >  from /scm/qemu/include/hw/xen/xen_backend.h:4,
> >  from hw/block/xen_disk.c:39:
> > /usr/include/xen/hvm/hvm_op.h:255:18: note: previous declaration of
> ‘ioservid_t’ was here
> >  typedef uint16_t ioservid_t;
> >   ^
> > /scm/qemu/rules.mak:57: recipe for target 'hw/block/xen_disk.o' failed
> > make: *** [hw/block/xen_disk.o] Error 1
> > make: *** Waiting for unfinished jobs
> >
> > Reverting 3996e85c1822e05c50250f8d2d1e57b6bea1229d
> 
> Sorry - I meant reverting this commit fixes the problem.

Hmm. I'm not sure why the definition in xen_common.h is there. I guess it's 
probably for compatibility. It's clearly wrong though.

  Paul
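
For illustration, one common way to keep a compatibility typedef from clashing with a system header is to define it only when the system does not. This is a sketch under stated assumptions, not the fix that was applied: `TOY_SYSTEM_HAS_IOSERVID` is a hypothetical feature macro that a configure step would have to set, and the width follows the `uint16_t` shown for hvm_op.h in the error output.

```c
#include <stdint.h>

/*
 * Hypothetical feature macro: set to 1 by the build system when the
 * installed system headers already provide ioservid_t.
 */
#ifndef TOY_SYSTEM_HAS_IOSERVID
#define TOY_SYSTEM_HAS_IOSERVID 0
#endif

#if !TOY_SYSTEM_HAS_IOSERVID
/* Compatibility definition, using the same width as the system header. */
typedef uint16_t ioservid_t;
#endif

/* Helper used only to show which width ended up in effect. */
unsigned toy_ioservid_size(void)
{
    return (unsigned)sizeof(ioservid_t);
}
```

With the guard in place the local definition disappears whenever the system one exists, so the two can no longer disagree.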
 
> 
> 
> > Author: Paul Durrant 
> > Date:   Tue Jan 20 11:06:19 2015 +
> >
> > Xen: Use the ioreq-server API when available
> >
> >
> > Looking at that header:
> >
> > #ifndef HVM_PARAM_BUFIOREQ_EVTCHN
> > #define HVM_PARAM_BUFIOREQ_EVTCHN 26
> > #endif
> >
> > #define IOREQ_TYPE_PCI_CONFIG 2
> >
> >
> > typedef uint32_t ioservid_t;
> >
> >
> > All of these pollute the global namespace, not to mention violate the
> > coding style. Why not prefix them with Xen_, xen_ etc?
> >
> >
> > --
> > MST

Re: [Qemu-devel] [PATCH qemu v10 10/14] spapr_pci: Enable vfio-pci hotplug

2015-07-07 Thread Alexey Kardashevskiy


On 07/07/2015 07:31 AM, Thomas Huth wrote:

On Mon,  6 Jul 2015 12:11:06 +1000
Alexey Kardashevskiy  wrote:


sPAPR IOMMU is managing two copies of a TCE table:
1) a guest view of the table - this is what emulated devices use and
this is where H_GET_TCE reads from;
2) a hardware TCE table - only present if there is at least one vfio-pci
device on a PHB; it is updated via a memory listener on a PHB address
space which forwards map/unmap requests to vfio-pci IOMMU host driver.

At the moment the presence of vfio-pci devices on a bus affects the way
the guest view table is allocated. If there is no vfio-pci on a PHB
and the host kernel supports KVM acceleration of H_PUT_TCE, a table
is allocated in KVM. However, if there is vfio-pci and we do not yet
support KVM acceleration for these, the table has to be allocated
by the userspace.

When a vfio-pci device is hotplugged and there were no vfio-pci devices
already, the guest view table could have been allocated by KVM which
means that H_PUT_TCE is handled by the host kernel and since we
do not support vfio-pci in KVM, the hardware table will not be updated.

This reallocates the guest view table in QEMU if the first vfio-pci
device has just been plugged. spapr_tce_realloc_userspace() handles this.


I wonder whether it would help to improve the readability of the code
later if you put the description of the function into the code instead
of the commit message?



Not sure I understood how much of this commit log you'd like to see in the 
code. The function has some comments already...





This replays all the mappings to make sure that the tables are in sync.
This will not have a visible effect though as for a new device
the guest kernel will allocate-and-map new addresses and therefore
existing mappings from emulated devices will not be used by vfio-pci
devices.

This adds calls to spapr_phb_dma_capabilities_update() in PCI hotplug
hooks.

Signed-off-by: Alexey Kardashevskiy 
---

...

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 76c988f..d1fa157 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -827,6 +827,43 @@ int spapr_phb_dma_reset(sPAPRPHBState *sphb)
  return 0;
  }

+static int spapr_phb_hotplug_dma_sync(sPAPRPHBState *sphb)
+{
+int ret = 0, i;
+bool had_vfio = sphb->has_vfio;
+sPAPRTCETable *tcet;
+
+spapr_phb_dma_capabilities_update(sphb);
+
+if (!had_vfio && sphb->has_vfio) {


 if (had_vfio || !sphb->has_vfio) {
 return 0;
 }

... and then you can save one level of indentation for the following
for-loop.


Right. I was going to add another chunk later with "if", "had_vfio" and
"sphb->has_vfio", which is why this indentation is there. I'll remove it.
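
For illustration, the early-return shape suggested above can be shown on a stand-alone toy function. This is a sketch only: `ToyPhb`, `window_ret` and `toy_hotplug_dma_sync()` are hypothetical stand-ins, and the per-window work is reduced to a precomputed result code.

```c
#include <stdbool.h>

#define TOY_MAX_WINDOWS 4

/* Hypothetical stand-in for the PHB state used by the sync function. */
typedef struct {
    bool has_vfio;
    int window_ret[TOY_MAX_WINDOWS]; /* result each window sync would give */
} ToyPhb;

/*
 * Return the first non-zero window result when the first VFIO device
 * has just appeared, and 0 in every other case.
 */
int toy_hotplug_dma_sync(const ToyPhb *phb, bool had_vfio)
{
    int ret = 0;

    /* Early return: nothing to do unless VFIO has just become present. */
    if (had_vfio || !phb->has_vfio) {
        return 0;
    }

    /* The loop now sits one indentation level shallower. */
    for (int i = 0; i < TOY_MAX_WINDOWS; i++) {
        ret = phb->window_ret[i];
        if (ret) {
            break;
        }
    }
    return ret;
}
```

The inverted condition is the De Morgan form of `!had_vfio && sphb->has_vfio`, so behaviour is unchanged and only the nesting goes away.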




+for (i = 0; i < SPAPR_PCI_DMA_MAX_WINDOWS; ++i) {
+tcet = spapr_tce_find_by_liobn(SPAPR_PCI_LIOBN(sphb->index, i));
+if (!tcet || !tcet->enabled) {
+continue;
+}
+if (tcet->fd >= 0) {
+/*
+ * We got first vfio-pci device on accelerated table.
+ * VFIO acceleration is not possible.
+ * Reallocate table in userspace and replay mappings.
+ */
+ret = spapr_tce_realloc_userspace(tcet, true);
+trace_spapr_pci_dma_realloc_update(tcet->liobn, ret);
+} else {
+/* There was no acceleration, so just replay mappings. */
+ret = spapr_tce_replay(tcet);
+trace_spapr_pci_dma_update(tcet->liobn, ret);
+}
+if (ret) {
+break;
+}
+}
+return ret;
+}
+
+return 0;
+}
+
  /* Macros to operate with address in OF binding to PCI */
  #define b_x(x, p, l)(((x) & ((1<<(l))-1)) << (p))
  #define b_n(x)  b_x((x), 31, 1) /* 0 if relocatable */

...

@@ -1130,6 +1174,9 @@ static void spapr_phb_remove_pci_device_cb(DeviceState 
*dev, void *opaque)
   */
  pci_device_reset(PCI_DEVICE(dev));
  object_unparent(OBJECT(dev));
+
+/* Actual VFIO device release happens from RCU so postpone DMA update */
+call_rcu1(&((sPAPRPHBState *)opaque)->rcu, spapr_phb_remove_sync_dma);


Too many brackets again for my taste ;-)



Never too much! ;)





  }



  Thomas





--
Alexey

Re: [Qemu-devel] [PATCH] block: update bdrv_drain_all()/bdrv_drain() comments

2015-07-07 Thread Stefan Hajnoczi

On Thu, Jul 02, 2015 at 05:24:41PM +0100, Stefan Hajnoczi wrote:
> The doc comments for bdrv_drain_all() and bdrv_drain() are outdated:
> 
>  * The bdrv_drain() comment is a poor man's bdrv_lock()/bdrv_unlock()
>which Fam Zheng is currently developing.  Unfortunately this warning
>was never really enough because devices keep submitting I/O and op
>blockers don't prevent that.
> 
>  * The bdrv_drain_all() comment is still partially correct but reflects
>the nature of the implementation rather than API documentation.
> 
> Do make it clear that bdrv_drain() is only appropriate within an
> AioContext.  For anything spanning AioContexts you need
> bdrv_drain_all().
> 
> Cc: Markus Armbruster 
> Cc: Paolo Bonzini 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  block/io.c | 20 ++--
>  1 file changed, 10 insertions(+), 10 deletions(-)

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan



Re: [Qemu-devel] [PATCH qemu v10 14/14] spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW)

2015-07-07 Thread Thomas Huth

On Mon,  6 Jul 2015 12:11:10 +1000
Alexey Kardashevskiy  wrote:

> This adds support for Dynamic DMA Windows (DDW) option defined by
> the SPAPR specification which allows having additional DMA window(s).
> 
> This implements DDW for emulated and VFIO devices. As all TCE root regions
> are mapped at 0 and 64bit long (and actual tables are child regions),
> this replaces memory_region_add_subregion() with _overlap() to make
> QEMU memory API happy.
> 
> This reserves RTAS token numbers for DDW calls.
> 
> This implements helpers to interact with VFIO kernel interface.
> 
> This changes the TCE table migration descriptor to support dynamic
> tables as from now on, PHB will create as many stub TCE table objects
> as PHB can possibly support but not all of them might be initialized at
> the time of migration because DDW might or might not be requested by
> the guest.
> 
> The "ddw" property is enabled by default on a PHB but for compatibility
> the pseries-2.3 machine and older disable it.
> 
> This implements DDW for VFIO. The host kernel support is required.
> This adds a "levels" property to PHB to control the number of levels
> in the actual TCE table allocated by the host kernel, 0 is the default
> value to tell QEMU to calculate the correct value. Current hardware
> supports up to 5 levels.
> 
> The existing linux guests try creating one additional huge DMA window
> with 64K or 16MB pages and map the entire guest RAM to it. If that succeeds,
> the guest switches to dma_direct_ops and never calls TCE hypercalls
> (H_PUT_TCE,...) again. This enables VFIO devices to use the entire RAM
> and not waste time on map/unmap later. This adds a "dma64_win_addr"
> property which is a bus address for the 64bit window and by default
> set to 0x800... as this is what the modern POWER8 hardware
> uses and this allows having emulated and VFIO devices on the same bus.
> 
> This adds 4 RTAS handlers:
> * ibm,query-pe-dma-window
> * ibm,create-pe-dma-window
> * ibm,remove-pe-dma-window
> * ibm,reset-pe-dma-window
> These are registered from type_init() callback.
> 
> These RTAS handlers are implemented in a separate file to avoid polluting
> spapr_iommu.c with PCI.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
...
> diff --git a/hw/ppc/spapr_rtas_ddw.c b/hw/ppc/spapr_rtas_ddw.c
> new file mode 100644
> index 000..7539c6a
> --- /dev/null
> +++ b/hw/ppc/spapr_rtas_ddw.c
> @@ -0,0 +1,300 @@
> +/*
> + * QEMU sPAPR Dynamic DMA windows support
> + *
> + * Copyright (c) 2014 Alexey Kardashevskiy, IBM Corporation.

Happy new year?

> + *  This program is free software; you can redistribute it and/or modify
> + *  it under the terms of the GNU General Public License as published by
> + *  the Free Software Foundation; either version 2 of the License,
> + *  or (at your option) any later version.
> + *
> + *  This program is distributed in the hope that it will be useful,
> + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
> + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + *  GNU General Public License for more details.
> + *
> + *  You should have received a copy of the GNU General Public License
> + *  along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/error-report.h"
> +#include "hw/ppc/spapr.h"
> +#include "hw/pci-host/spapr.h"
> +#include "trace.h"
> +
> +static int spapr_phb_get_active_win_num_cb(Object *child, void *opaque)
> +{
> +sPAPRTCETable *tcet;
> +
> +tcet = (sPAPRTCETable *) object_dynamic_cast(child, 
> TYPE_SPAPR_TCE_TABLE);
> +if (tcet && tcet->enabled) {
> +++*(unsigned *)opaque;
> +}
> +return 0;
> +}
> +
> +static unsigned spapr_phb_get_active_win_num(sPAPRPHBState *sphb)
> +{
> +unsigned ret = 0;
> +
> +object_child_foreach(OBJECT(sphb), spapr_phb_get_active_win_num_cb, 
> &ret);
> +
> +return ret;
> +}
> +
> +static int spapr_phb_get_free_liobn_cb(Object *child, void *opaque)
> +{
> +sPAPRTCETable *tcet;
> +
> +tcet = (sPAPRTCETable *) object_dynamic_cast(child, 
> TYPE_SPAPR_TCE_TABLE);
> +if (tcet && !tcet->enabled) {
> +*(uint32_t *)opaque = tcet->liobn;
> +return 1;
> +}
> +return 0;
> +}
> +
> +static unsigned spapr_phb_get_free_liobn(sPAPRPHBState *sphb)
> +{
> +uint32_t liobn = 0;
> +
> +object_child_foreach(OBJECT(sphb), spapr_phb_get_free_liobn_cb, &liobn);
> +
> +return liobn;
> +}
> +
> +static uint32_t spapr_query_mask(struct ppc_one_seg_page_size *sps,
> + uint64_t page_mask)
> +{
> +int i, j;
> +uint32_t mask = 0;
> +const struct { int shift; uint32_t mask; } masks[] = {
> +{ 12, RTAS_DDW_PGSIZE_4K },
> +{ 16, RTAS_DDW_PGSIZE_64K },
> +{ 24, RTAS_DDW_PGSIZE_16M },
> +{ 25, RTAS_DDW_PGSIZE_32M },
> +{ 26, RTAS_DDW_PGSIZE_64M },
> +{ 27, RTAS_DDW_PGSIZE_128M },
> +{ 28, RTAS_DDW_PGSIZE_256M },
> +{ 34, RTAS_DD

Re: [Qemu-devel] [PATCH v2] net-hub: Drop can_receive

2015-07-07 Thread Fam Zheng

On Tue, 07/07 09:37, Stefan Hajnoczi wrote:
> On Tue, Jul 07, 2015 at 02:30:30PM +0800, Fam Zheng wrote:
> > This moves the semantics from net_hub_port_can_receive to receive
> > functions, by returning 0 if all receiving ports return 0. Also,
> > remember to flush the source port's queue in that case.
> > 
> > Signed-off-by: Fam Zheng 
> > ---
> >  net/hub.c | 54 +-
> >  1 file changed, 29 insertions(+), 25 deletions(-)
> 
> This patch revision doesn't take into account the special case code in
> qemu_flush_or_purge_queued_packets(), which I mentioned in my reply to
> the previous revision of this patch.
> 
> The queue is now flushed twice because you've introduced
> net_hub_port_send_cb() but qemu_flush_or_purge_queued_packets() already
> calls net_hub_flush().
> 
> If you want to get rid of net_hub_flush(), that's great.  But please
> remove the duplicate code.

Right, I missed that. I'll remove the duplicate and send again.

Re: [Qemu-devel] [PATCH qemu v10 14/14] spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW)

2015-07-07 Thread Alexey Kardashevskiy


On 07/06/2015 09:06 PM, David Gibson wrote:

On Mon, Jul 06, 2015 at 12:11:10PM +1000, Alexey Kardashevskiy wrote:

This adds support for Dynamic DMA Windows (DDW) option defined by
the SPAPR specification which allows having additional DMA window(s).

This implements DDW for emulated and VFIO devices. As all TCE root regions
are mapped at 0 and 64bit long (and actual tables are child regions),
this replaces memory_region_add_subregion() with _overlap() to make
QEMU memory API happy.

This reserves RTAS token numbers for DDW calls.

This implements helpers to interact with VFIO kernel interface.

This changes the TCE table migration descriptor to support dynamic
tables as from now on, PHB will create as many stub TCE table objects
as PHB can possibly support but not all of them might be initialized at
the time of migration because DDW might or might not be requested by
the guest.

The "ddw" property is enabled by default on a PHB but for compatibility
the pseries-2.3 machine and older disable it.

This implements DDW for VFIO. The host kernel support is required.
This adds a "levels" property to PHB to control the number of levels
in the actual TCE table allocated by the host kernel, 0 is the default
value to tell QEMU to calculate the correct value. Current hardware
supports up to 5 levels.

The existing linux guests try creating one additional huge DMA window
with 64K or 16MB pages and map the entire guest RAM to it. If that succeeds,
the guest switches to dma_direct_ops and never calls TCE hypercalls
(H_PUT_TCE,...) again. This enables VFIO devices to use the entire RAM
and not waste time on map/unmap later. This adds a "dma64_win_addr"
property which is a bus address for the 64bit window and by default
set to 0x800... as this is what the modern POWER8 hardware
uses and this allows having emulated and VFIO devices on the same bus.

This adds 4 RTAS handlers:
* ibm,query-pe-dma-window
* ibm,create-pe-dma-window
* ibm,remove-pe-dma-window
* ibm,reset-pe-dma-window
These are registered from type_init() callback.

These RTAS handlers are implemented in a separate file to avoid polluting
spapr_iommu.c with PCI.

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v10:
* added dma64_win_addr property to PHB
* removed redundant check for "!migtable" in spapr_tce_table_post_load()

v9:
* fixed default 64bit window start (from mdroth)
* fixed type cast in dma window update code (from mdroth)
* spapr_phb_dma_update() now can fail and cause hotplug failure if
hardware TCE table cannot be mapped to the same bus address as the emulated one

v7:
* fixed uninitialized variables

v6:
* rework as there is no more special device for VFIO PHB

v5:
* total rework
* enabled for machines >2.3
* fixed migration
* merged rtas handlers here

v4:
* reset handler is back in generalized form

v3:
* removed reset
* windows_num is now 1 or bigger rather than 0-based value and it is only
changed in PHB code, not in RTAS
* added page mask check in create()
* added SPAPR_PCI_DDW_MAX_WINDOWS to track how many windows are already
created

v2:
* tested on hacked emulated E1000
* implemented DDW reset on the PHB reset
* spapr_pci_ddw_remove/spapr_pci_ddw_reset are public for reuse by VFIO
---
  hw/ppc/Makefile.objs|   3 +
  hw/ppc/spapr.c  |   5 +
  hw/ppc/spapr_iommu.c|  32 -
  hw/ppc/spapr_pci.c  | 110 ++--
  hw/ppc/spapr_pci_vfio.c |  88 +
  hw/ppc/spapr_rtas_ddw.c | 300 
  hw/vfio/common.c|   2 +
  include/hw/pci-host/spapr.h |  21 +++-
  include/hw/ppc/spapr.h  |  17 ++-
  trace-events|   6 +
  10 files changed, 568 insertions(+), 16 deletions(-)
  create mode 100644 hw/ppc/spapr_rtas_ddw.c

diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index c8ab06e..0b2ff6d 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -7,6 +7,9 @@ obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o
  ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
  obj-y += spapr_pci_vfio.o
  endif
+ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES), yy)
+obj-y += spapr_rtas_ddw.o
+endif
  # PowerPC 4xx boards
  obj-y += ppc405_boards.o ppc4xx_devs.o ppc405_uc.o ppc440_bamboo.o
  obj-y += ppc4xx_pci.o
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 5ca817c..d50d50b 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1860,6 +1860,11 @@ static const TypeInfo spapr_machine_info = {
  .driver   = "spapr-pci-host-bridge",\
  .property = "dynamic-reconfiguration",\
  .value= "off",\
+},\
+{\
+.driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,\
+.property = "ddw",\
+.value= stringify(off),\
  },

  #define SPAPR_COMPAT_2_2 \
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index 2d99c3b..b54c3d8 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -136,6 +136,15 @@ static IOMMUTLBEntry 
spap

Re: [Qemu-devel] [RFC PATCH V6 01/18] cpu: make cpu_thread_is_idle public.

2015-07-07 Thread Alex Bennée


fred.kon...@greensocs.com writes:

> From: KONRAD Frederic 

Why are we making this visible? Looking at the tree I can't see it being
used outside cpus.c. I see the function is modified later for async
work. Is this something we are planning to use later?

>
> Signed-off-by: KONRAD Frederic 
> ---
>  cpus.c|  2 +-
>  include/qom/cpu.h | 11 +++
>  2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/cpus.c b/cpus.c
> index 4f0e54d..2d62a35 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -74,7 +74,7 @@ bool cpu_is_stopped(CPUState *cpu)
>  return cpu->stopped || !runstate_is_running();
>  }
>  
> -static bool cpu_thread_is_idle(CPUState *cpu)
> +bool cpu_thread_is_idle(CPUState *cpu)
>  {
>  if (cpu->stop || cpu->queued_work_first) {
>  return false;
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index 39f0f19..af3c9e4 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -514,6 +514,17 @@ void qemu_cpu_kick(CPUState *cpu);
>  bool cpu_is_stopped(CPUState *cpu);
>  
>  /**
> + * cpu_thread_is_idle:
> + * @cpu: The CPU to check.
> + *
> + * Checks whether the CPU thread is idle.
> + *
> + * Returns: %true if the thread is idle;
> + * %false otherwise.
> + */
> +bool cpu_thread_is_idle(CPUState *cpu);
> +
> +/**
>   * run_on_cpu:
>   * @cpu: The vCPU to run on.
>   * @func: The function to be executed.

-- 
Alex Bennée

Re: [Qemu-devel] [PATCH v3 15/16] ipmi: Add ACPI table entries

2015-07-07 Thread Igor Mammedov

On Mon,  8 Jun 2015 20:12:10 -0500
miny...@acm.org wrote:

> From: Corey Minyard 
> 
> Use the new ACPI table construction tools to create an ACPI
> entry for IPMI.

The above doesn't say what the purpose of the patch is.
It would be nice for the commit message to describe what the patch
actually does, with references to the relevant specs.

> Signed-off-by: Corey Minyard 
> Acked-by: Michael S. Tsirkin 
> ---
>  hw/ipmi/Makefile.objs |   1 +
>  hw/ipmi/ipmi_acpi.c   | 122 ++
>  2 files changed, 123 insertions(+)
>  create mode 100644 hw/ipmi/ipmi_acpi.c
> 
> diff --git a/hw/ipmi/Makefile.objs b/hw/ipmi/Makefile.objs
> index 81fb8e7..a5ba7d5 100644
> --- a/hw/ipmi/Makefile.objs
> +++ b/hw/ipmi/Makefile.objs
> @@ -4,3 +4,4 @@ common-obj-$(CONFIG_IPMI_LOCAL) += ipmi_bmc_extern.o
>  common-obj-$(CONFIG_ISA_IPMI_KCS) += isa_ipmi_kcs.o
>  common-obj-$(CONFIG_ISA_IPMI_BT) += isa_ipmi_bt.o
>  common-obj-$(call land,$(CONFIG_IPMI),$(CONFIG_SMBIOS)) += ipmi_smbios.o
> +common-obj-$(call land,$(CONFIG_IPMI),$(CONFIG_ACPI)) += ipmi_acpi.o
If the device is planned to be used only with the PC machine, I'd suggest
putting the functions below into hw/i386/acpi-build.c,
but if there are plans to make it work with the ARM target as well,
then a more generic hw/acpi/ipmi_acpi.c would be more suitable.

> diff --git a/hw/ipmi/ipmi_acpi.c b/hw/ipmi/ipmi_acpi.c
> new file mode 100644
> index 000..28ddbe4
> --- /dev/null
> +++ b/hw/ipmi/ipmi_acpi.c
> @@ -0,0 +1,122 @@
> +/*
> + * IPMI ACPI firmware handling
> + *
> + * Copyright (c) 2015 Corey Minyard, MontaVista Software, LLC
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a 
> copy
> + * of this software and associated documentation files (the "Software"), to 
> deal
> + * in the Software without restriction, including without limitation the 
> rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
> FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "hw/ipmi/ipmi.h"
> +#include "hw/acpi/aml-build.h"
> +#include "hw/acpi/acpi.h"
> +#include "hw/acpi/acpi-dev-tables.h"
> +#include 
> +
> +static Aml *aml_ipmi_crs(IPMIFwInfo *info)
> +{
> +Aml *crs = aml_resource_template();
> +uint8_t regspacing = info->register_spacing;
> +
> +if (regspacing == 1) {
> +regspacing = 0;
> +}
What is the purpose of the code above (^^^)?
Is it used by code other than ACPI? If not, just drop it and hardcode the
value here, naming it after the field it is used for (i.e. alignment).

> +
> +switch (info->memspace) {
> +case IPMI_MEMSPACE_IO:
> +aml_append(crs, aml_io(aml_decode16, info->base_address,
> +   info->base_address + info->register_length - 1,
> +   regspacing, info->register_length));
> +break;
> +case IPMI_MEMSPACE_MEM32:
> +aml_append(crs,
> +   aml_dword_memory(aml_pos_decode,
> +aml_min_fixed, aml_max_fixed,
> +aml_non_cacheable, aml_ReadWrite,
> +0x,
> +info->base_address,
> +info->base_address + info->register_length - 1,
> +regspacing, info->register_length));
> +break;
> +case IPMI_MEMSPACE_MEM64:
> +aml_append(crs,
> +   aml_qword_memory(aml_pos_decode,
> +aml_min_fixed, aml_max_fixed,
> +aml_non_cacheable, aml_ReadWrite,
> +0xULL,
> +info->base_address,
> +info->base_address + info->register_length - 1,
> +regspacing, info->register_length));
> +break;
> +case IPMI_MEMSPACE_SMBUS:
> +aml_append(crs, aml_return(aml_int(info->base_address)));
> +break;
> +}
Does the device support reprogramming of "base_address", or is it fixed?

Also, please add a default case that calls abort().
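
To illustrate the default-case request, here is a minimal standalone sketch.
The enum and the function name are hypothetical stand-ins, not the
identifiers from the patch; the only point is that an unexpected value fails
loudly instead of silently producing an empty resource template.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the memory-space enum used in the patch. */
typedef enum {
    MEMSPACE_IO,
    MEMSPACE_MEM32,
    MEMSPACE_MEM64,
    MEMSPACE_SMBUS,
} MemSpace;

/* Returns a short name for each known memory space; aborts otherwise. */
static const char *memspace_name(MemSpace ms)
{
    switch (ms) {
    case MEMSPACE_IO:
        return "io";
    case MEMSPACE_MEM32:
        return "mem32";
    case MEMSPACE_MEM64:
        return "mem64";
    case MEMSPACE_SMBUS:
        return "smbus";
    default:
        /* An unknown value is a programming error, so fail loudly. */
        fprintf(stderr, "unexpected memspace %d\n", (int)ms);
        abort();
    }
}
```

The same shape would apply to the switch in aml_ipmi_crs(): the known cases
stay as they are and the default branch aborts.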

> +
> +if (info->interrupt_number) {
> +aml_append(crs, aml_irq_no_flags(info->interrupt_number));
> +}
> +
> +return crs;
> +}
> +
> +static vo

Re: [Qemu-devel] [PATCH 06/10] qga: guest exec functionality for Windows guests

2015-07-07 Thread Denis V. Lunev


On 07/07/15 12:12, Olga Krishtal wrote:

On 07/07/15 11:06, Denis V. Lunev wrote:

On 07/07/15 04:31, Michael Roth wrote:

Quoting Denis V. Lunev (2015-06-30 05:25:19)

From: Olga Krishtal 

Child process' stdin/stdout/stderr can be associated
with handles for communication via read/write interfaces.

The workflow should be something like this:
* Open an anonymous pipe through guest-pipe-open
* Execute a binary or a script in the guest. Arbitrary arguments and
   environment to a new child process could be passed through options
* Read/pass information from/to executed process using
   guest-file-read/write
* Collect the status of a child process

Have you seen anything like this in your testing?

{'execute':'guest-exec','arguments':{'path':'/Windows/System32/ipconfig.exe',
  'timeout':5000}}
{"return": {"pid": 588}}
{'execute':'guest-exec-status','arguments':{'pid':588}}
{"return": {"exit": 0, "handle-stdout": -1, "handle-stderr": -1,
  "handle-stdin": -1, "signal": -1}}
{'execute':'guest-exec-status','arguments':{'pid':588}}
{"error": {"class": "GenericError", "desc": "Invalid parameter 'pid'"}}

{'execute':'guest-exec','arguments':{'path':'/Windows/System32/ipconfig.exe',
  'timeout':5000}}
{"error": {"class": "GenericError", "desc": "CreateProcessW() failed:
  The parameter is incorrect. (error: 57)"}}
{'execute':'guest-exec','arguments':{'path':'/Windows/System32/ipconfig.exe',
  'timeout':5000}}
{"error": {"class": "GenericError", "desc": "CreateProcessW() failed:
  The parameter is incorrect. (error: 57)"}}

First of all, what version of Windows are you using?
Secondly, you do need to specify an environment variable:
sudo virsh qemu-agent-command w2k12r2
'{"execute":"guest-exec","arguments":{"path":"/Windows/System32/ipconfig.exe",
"timeout": 5000, "env":["MyEnv=00"]}}'


Argh I have missed this fact during internal discussion and review.
For sure this should be passed to the client. I think that it would
be better to add this automatically to the environment variables
passed to the exec arguments.

For Windows Server 2003 we do not have to pass "env" at all, but if we
are working with Server 2008 and newer we have to pass "env" = "00" if
we do not want to use it.
https://social.msdn.microsoft.com/Forums/windowsdesktop/en-US/59450592-aa52-4170-9742-63c84bff0010/unexpected-errorinvalidparameter-returned-by-createprocess-too-bad?forum=windowsgeneraldevelopmentissues
This comment was included in the first version of the patches and I may have
forgotten it. Try to specify env and call exec several times. It
should work fine.

I will look closer at guest-exec-status double call.

Re: [Qemu-devel] [PATCH v2] net: Flush queued packets when guest resumes

2015-07-07 Thread Fam Zheng

On Tue, 07/07 12:10, Michael S. Tsirkin wrote:
> On Tue, Jul 07, 2015 at 04:58:29PM +0800, Jason Wang wrote:
> > 
> > 
> > On 07/07/2015 04:13 PM, Michael S. Tsirkin wrote:
> > > On Tue, Jul 07, 2015 at 09:21:07AM +0800, Fam Zheng wrote:
> > >> Since commit 6e99c63 "net/socket: Drop net_socket_can_send" and friends,
> > >> net queues need to be explicitly flushed after qemu_can_send_packet()
> > >> returns false, because the netdev side will disable the polling of fd.
> > >>
> > >> This fixes the case of "cont" after "stop" (or migration).
> > >>
> > >> Signed-off-by: Fam Zheng 
> > > Note virtio has its own handler which must be used to
> > > flush packets - this one might run too early or too late.
> > 
> > If it runs too early (DRIVER_OK is not set), then the packet will be dropped
> > (if this patch is used with "drop virtio_net_can_receive()"), or the
> > queue will be purged since qemu_can_send_packet() returns false. If too
> > late, at least tap read poll will be enabled. So it still looks OK?
> 
> It's still not helpful for virtio. So which cards are fixed?
> 
> I just find the comment
>   This fixes the case of "cont" after "stop" (or migration)
> too vague.
> Can you please include the info about what's broken in the commit log?

It does fix virtio-net, as well as most other cards.  HMP "stop" then "cont"
breaks net, tested with host pinging guest while doing it.
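
To make the failure mode concrete, here is a toy model of the resume side
only. The struct and function names are made up for illustration and are not
QEMU's NetClientState API: packets that arrive while the VM is stopped are
queued, and nothing delivers them unless the state-change hook flushes the
queue on "cont".

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_MAX 8

/* Hypothetical toy model of one net client; not QEMU's NetClientState. */
struct toy_nic {
    bool vm_running;
    int queued[TOY_QUEUE_MAX];
    size_t nqueued;
    size_t delivered;
};

/* Backend side: deliver if the guest can receive, otherwise queue. */
static void toy_send(struct toy_nic *n, int pkt)
{
    if (n->vm_running) {
        n->delivered++;
    } else if (n->nqueued < TOY_QUEUE_MAX) {
        n->queued[n->nqueued++] = pkt;
    }
}

/* VM state change hook: on resume, flush whatever piled up while stopped. */
static void toy_vm_state_change(struct toy_nic *n, bool running)
{
    n->vm_running = running;
    if (running) {
        n->delivered += n->nqueued;
        n->nqueued = 0;
    }
}
```

Without the flush in the running branch, the queued packets would just sit
there after resume, which is the simplified analogue of the ping stalling.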

Fam

> 
> > >
> > >> ---
> > >>
> > >> v2: Unify with VM stop handler. (Stefan)
> > >> ---
> > >>  net/net.c | 19 ---
> > >>  1 file changed, 12 insertions(+), 7 deletions(-)
> > >>
> > >> diff --git a/net/net.c b/net/net.c
> > >> index 6ff7fec..28a5597 100644
> > >> --- a/net/net.c
> > >> +++ b/net/net.c
> > >> @@ -1257,14 +1257,19 @@ void qmp_set_link(const char *name, bool up, Error **errp)
> > >>  static void net_vm_change_state_handler(void *opaque, int running,
> > >>  RunState state)
> > >>  {
> > >> -/* Complete all queued packets, to guarantee we don't modify
> > >> - * state later when VM is not running.
> > >> - */
> > >> -if (!running) {
> > >> -NetClientState *nc;
> > >> -NetClientState *tmp;
> > >> +NetClientState *nc;
> > >> +NetClientState *tmp;
> > >>  
> > >> -QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> > >> +QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> > >> +if (running) {
> > >> +/* Flush queued packets and wake up backends. */
> > >> +if (nc->peer && qemu_can_send_packet(nc)) {
> > >> +qemu_flush_queued_packets(nc->peer);
> > >> +}
> > >> +} else {
> > >> +/* Complete all queued packets, to guarantee we don't modify
> > >> + * state later when VM is not running.
> > >> + */
> > >>  qemu_flush_or_purge_queued_packets(nc, true);
> > >>  }
> > >>  }
> > >> -- 
> > >> 2.4.3

Re: [Qemu-devel] [PATCH v4 00/10] Consolidate crypto APIs & implementations

2015-07-07 Thread Paolo Bonzini



On 01/07/2015 19:10, Daniel P. Berrange wrote:
> This small series covers the crypto consolidation patches
> I previously posted:
> 
> RFC: https://lists.nongnu.org/archive/html/qemu-devel/2015-04/msg02038.html
>  v1: https://lists.nongnu.org/archive/html/qemu-devel/2015-05/msg04267.html
>  v2: https://lists.nongnu.org/archive/html/qemu-devel/2015-06/msg00601.html
>  v3: https://lists.nongnu.org/archive/html/qemu-devel/2015-06/msg05059.html
> 
> Currently there are 5 main places in QEMU which use some
> form of cryptographic hash or cipher algorithm. These are
> the quorum block driver (hash), qcow{1,2} block driver (cipher),
> VNC password auth (cipher), VNC websockets (hash) and some
> of the CPU instruction emulation (cipher).
> 
> For ciphers the code is using the in-tree implementations
> of AES and/or the RFB cripple-DES. While there is nothing
> broken about these implementations, it is none the less
> desirable to be able to use the GNUTLS provided impls in
> cases where we are already linking to GNUTLS. This will
> allow QEMU to use FIPS certified implementations, which
> have been well audited, have some protection against
> side-channel leakage and are generally actively maintained
> by people knowledgeable about encryption.
> 
> For hash digests the code is already using GNUTLS APIs.
> 
> With the TLS work, and possible future improved block device
> encryption, there will be more general purpose crypto APIs
> needed in QEMU.
> 
> It is undesirable to continue to litter the code with
> countless #ifdef WITH_GNUTLS conditionals, as it makes
> it increasingly hard to understand the code.
> 
> The goal of this series is to thus consolidate all the
> crypto code into a single logical place in QEMU - the
> source in $GIT/crypto and headers in $GIT/include/crypto.
> The code in this location will provide QEMU internal
> APIs for hash digests, ciphers, and later TLS and block
> encryption primitives. The implementations will be
> backed by GNUTLS, and either libgcrypt or nettle depending
> on which of these GNUTLS is linking to. In the case where
> GNUTLS is disabled at build time, we'll still keep the
> built-in AES & RFB-cripple-DES implementations available
> so we have no regression vs today's level of support.
> 
> The callers of the crypto code can now be unconditionally
> compiled and, if needed, they can check the availability
> of algorithms they want at runtime and report clear errors
> to the CLI or QMP if not available. This is a minor
> difference in behaviour for the quorum block driver which
> would previously be disabled at compile time if gnutls
> was not available.
> 
> A future posting will include the TLS crypto APIs.
> 
> I have not attempted to convert the CPU emulation code to
> use the new crypto APIs, since that code appears to have
> quite specific need for access to the low level internal
> stages of the AES algorithm. So I've left it using the
> QEMU built-in AES code.
> 
> I've added myself in the MAINTAINERS file for the new
> directories, since it wasn't clear if anyone else on the
> existing QEMU maintainer list had any interest / knowledge
> in maintaining the crypto related pieces.
> 
> Changes since v3:
> 
>   - Removed need for crypto-internal.h file which was
> missing from v3 patches sent.
>   - Resolve conflicts with error reporting & main loop
> API changes / cleanup on master
> 
> Changes since v2:
> 
>   - Remove _(..) gettext markers from error messages
>   - Fix array bounds check in hash module (Richard Henderson)
>   - Fix null dereference in freeing of gcrypt cipher impl
> (Gonglei)
> 
> Changes since v1:
> 
>   - Add explicit algorithm constants for each AES key size,
> instead of inferring it from array length
>   - Share code for munging des rfb key bit order
>   - Share code for validating key array size vs algorithm
>   - Refactor built-in cipher impl to reduce number of big
> switch statements
>   - Fix uninitialized 'Error *err' var
>   - Add comments in places where error reporting should be
> 
> Daniel P. Berrange (10):
>   crypto: introduce new module for computing hash digests
>   crypto: move built-in AES implementation into crypto/
>   crypto: move built-in D3DES implementation into crypto/
>   crypto: introduce generic cipher API & built-in implementation
>   crypto: add a gcrypt cipher implementation
>   crypto: add a nettle cipher implementation
>   block: convert quorum blockdrv to use crypto APIs
>   ui: convert VNC websockets to use crypto APIs
>   block: convert qcow/qcow2 to use generic cipher API
>   ui: convert VNC to use generic cipher API
> 
>  MAINTAINERS   |   7 +
>  Makefile.objs |   1 +
>  block/Makefile.objs   |   2 +-
>  block/qcow.c  | 102 ++---
>  block/qcow2-cluster.c |  46 +++-
>  block/qcow2.c |  96 
>  block/qcow2.h |  13 +-
>  block/quorum.

Re: [Qemu-devel] [PATCH qemu v10 13/14] vfio: spapr: Add SPAPR IOMMU v2 support (DMA memory preregistering)

2015-07-07 Thread Alexey Kardashevskiy


On 07/07/2015 05:23 PM, Thomas Huth wrote:

On Mon,  6 Jul 2015 12:11:09 +1000
Alexey Kardashevskiy  wrote:


This makes use of the new "memory registering" feature. The idea is
to provide the userspace ability to notify the host kernel about pages
which are going to be used for DMA. Having this information, the host
kernel can pin them all once per user process, do locked pages
accounting (once) and not spend time on doing that in real time with
possible failures which cannot be handled nicely in some cases.

This adds a guest RAM memory listener which notifies a VFIO container
about memory which needs to be pinned/unpinned. VFIO MMIO regions
(i.e. "skip dump" regions) are skipped.

The feature is only enabled for SPAPR IOMMU v2. The host kernel changes
are required. Since v2 does not need/support VFIO_IOMMU_ENABLE, this does
not call it when v2 is detected and enabled.

This does not change the guest visible interface.

Signed-off-by: Alexey Kardashevskiy 
Reviewed-by: David Gibson 
---
Changes:
v9:
* since there is no more SPAPR-specific data in container::iommu_data,
the memory preregistration fields are common and potentially can be used
by other architectures

v7:
* in vfio_spapr_ram_listener_region_del(), do unref() after ioctl()
* s'ramlistener'register_listener'

v6:
* fixed commit log (s/guest/userspace/), added note about no guest visible
change
* fixed error checking if ram registration failed
* added alignment check for section->offset_within_region

v5:
* simplified the patch
* added trace points
* added round_up() for the size
* SPAPR IOMMU v2 used
---
  hw/vfio/common.c  | 109 ++
  include/hw/vfio/vfio-common.h |   3 ++
  trace-events  |   1 +
  3 files changed, 104 insertions(+), 9 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 8eacfd7..0c7ba8c 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -488,6 +488,76 @@ static void vfio_listener_release(VFIOContainer *container)
  memory_listener_unregister(&container->iommu_data.type1.listener);
  }

+static void vfio_ram_do_region(VFIOContainer *container,
+  MemoryRegionSection *section, unsigned long req)
+{
+int ret;
+struct vfio_iommu_spapr_register_memory reg = { .argsz = sizeof(reg) };
+
+if (!memory_region_is_ram(section->mr) ||
+memory_region_is_skip_dump(section->mr)) {
+return;
+}
+
+if (unlikely((section->offset_within_region & (getpagesize() - 1)))) {
+error_report("%s received unaligned region", __func__);
+return;
+}
+
+reg.vaddr = (__u64) memory_region_get_ram_ptr(section->mr) +


We're in userspace here ... I think it would be better to use uint64_t
instead of the kernel-type __u64.



We are calling the kernel here - @reg is a kernel-defined struct.
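
For what it's worth, both types are 64-bit unsigned integers on Linux hosts,
so the choice is mostly about style. A minimal sketch of filling such a field
from a userspace pointer follows; the struct below is a hypothetical stand-in
and not the real vfio_iommu_spapr_register_memory layout.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a kernel-defined ioctl argument that carries a
 * 64-bit userspace address; not the real VFIO structure.
 */
struct toy_register_memory {
    uint32_t argsz;
    uint32_t flags;
    uint64_t vaddr;
    uint64_t size;
};

/* Fill the request from a userspace pointer plus an offset. */
static struct toy_register_memory toy_make_request(void *base,
                                                   uint64_t offset,
                                                   uint64_t size)
{
    struct toy_register_memory reg = { .argsz = sizeof(reg) };

    /*
     * Go through uintptr_t so the pointer-to-integer conversion is also
     * clean on hosts where pointers are narrower than 64 bits.
     */
    reg.vaddr = (uint64_t)(uintptr_t)base + offset;
    reg.size = size;
    return reg;
}
```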





+section->offset_within_region;
+reg.size = ROUND_UP(int128_get64(section->size), TARGET_PAGE_SIZE);
+
+ret = ioctl(container->fd, req, &reg);
+trace_vfio_ram_register(_IOC_NR(req) - VFIO_BASE, reg.vaddr, reg.size,
+ret ? -errno : 0);
+if (!ret) {
+return;
+}
+
+/*
+ * On the initfn path, store the first error in the container so we
+ * can gracefully fail.  Runtime, there's not much we can do other
+ * than throw a hardware error.
+ */
+if (!container->iommu_data.ram_reg_initialized) {
+if (!container->iommu_data.ram_reg_error) {
+container->iommu_data.ram_reg_error = -errno;
+}
+} else {
+hw_error("vfio: RAM registering failed, unable to continue");
+}
+}
+
+static void vfio_ram_listener_region_add(MemoryListener *listener,
+ MemoryRegionSection *section)
+{
+VFIOContainer *container = container_of(listener, VFIOContainer,
+iommu_data.register_listener);
+memory_region_ref(section->mr);
+vfio_ram_do_region(container, section, VFIO_IOMMU_SPAPR_REGISTER_MEMORY);
+}
+
+static void vfio_ram_listener_region_del(MemoryListener *listener,
+ MemoryRegionSection *section)
+{
+VFIOContainer *container = container_of(listener, VFIOContainer,
+iommu_data.register_listener);
+vfio_ram_do_region(container, section, VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY);
+memory_region_unref(section->mr);
+}
+
+static const MemoryListener vfio_ram_memory_listener = {
+.region_add = vfio_ram_listener_region_add,
+.region_del = vfio_ram_listener_region_del,
+};
+
+static void vfio_spapr_listener_release_v2(VFIOContainer *container)
+{
+memory_listener_unregister(&container->iommu_data.register_listener);
+vfio_listener_release(container);
+}
+
  int vfio_mmap_region(Object *obj, VFIORegion *region,
   MemoryRegion *mem, MemoryRegion *submem,
   void **map, size_t size, off_t o

Re: [Qemu-devel] [Bug 1472083] [NEW] Qemu 2.1.2 hang when stop command

2015-07-07 Thread Stefan Hajnoczi

On Tue, Jul 07, 2015 at 05:36:38AM -, changlimin wrote:
> Qemu 2.1.2, Linux kernel 3.13.6, this is the stack.

If you are running a distro packaged QEMU, please report this to the
distro.

If you have built QEMU from source, please try the latest stable release
(QEMU 2.3).

> #0  in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
> #1  in qemu_poll_ns (fds=0x7fa82a8de380, nfds=1, timeout=-1) at qemu-timer.c:314
> #2  in aio_poll (ctx=0x7fa82a8b5000, blocking=true) at aio-posix.c:250
> #3  in bdrv_drain_all () at block.c:1924
> #4  in do_vm_stop (state=RUN_STATE_PAUSED) at /qemu-2.1.2/cpus.c:544
> #5  in vm_stop (state=RUN_STATE_PAUSED) at /qemu-2.1.2/cpus.c:1227
> #6  in qmp_stop (errp=0x7b6dcaf8) at qmp.c:98
> #7  in qmp_marshal_input_stop (mon=0x7fa82a8e0970, qdict=0x7fa830295020, ret=0x7b6dcb48) at qmp-marshal.c:2806
> #8  in qmp_call_cmd (mon=0x7fa82a8e0970, cmd=0x7fa8290558a0, params=0x7fa830295020) at /qemu-2.1.2/monitor.c:5038
> #9  in handle_qmp_command (parser=0x7fa82a8e0a28, tokens=0x7fa82a8d9b50) at /qemu-2.1.2/monitor.c:5104
> #10 in json_message_process_token (lexer=0x7fa82a8e0a30, token=0x7fa830122b60, type=JSON_OPERATOR, x=39, y=17865) at qobject/json-streamer.c:87
> #11 in json_lexer_feed_char (lexer=0x7fa82a8e0a30, ch=125 '}', flush=false) at qobject/json-lexer.c:303
> #12 in json_lexer_feed (lexer=0x7fa82a8e0a30, buffer=0x7b6dcdb0 "}\315m\373\377\177", size=1) at qobject/json-lexer.c:356
> #13 in json_message_parser_feed (parser=0x7fa82a8e0a28, buffer=0x7b6dcdb0 "}\315m\373\377\177", size=1) at qobject/json-streamer.c:111
> #14 in monitor_control_read (opaque=0x7fa82a8e0970, buf=0x7b6dcdb0 "}\315m\373\377\177", size=1) at /qemu-2.1.2/monitor.c:5125
> #15 in qemu_chr_be_write (s=0x7fa82a8c2020, buf=0x7b6dcdb0 "}\315m\373\377\177", len=1) at qemu-char.c:213
> #16 in tcp_chr_read (chan=0x7fa82a8c4ba0, cond=G_IO_IN, opaque=0x7fa82a8c2020) at qemu-char.c:2729
> #17 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
> #18 in glib_pollfds_poll () at main-loop.c:190
> #19 in os_host_main_loop_wait (timeout=2400) at main-loop.c:235
> #20 in main_loop_wait (nonblocking=0) at main-loop.c:484
> #21 in main_loop () at vl.c:2034
> #22 in main (argc=55, argv=0x7b6de338, envp=0x7b6de4f8) at vl.c:4583

There is not enough information here to determine the cause of the hang.
Please post the QEMU command-line so we know the guest configuration.



[Qemu-devel] [PATCH] acpi: split out ICH ACPI support

2015-07-07 Thread Michael S. Tsirkin

MIPS doesn't need it, and including it creates a problem, as we are adding
a dependency on the ISA LPC bridge.

Signed-off-by: Michael S. Tsirkin 
---
 default-configs/i386-softmmu.mak   | 1 +
 default-configs/x86_64-softmmu.mak | 1 +
 hw/acpi/Makefile.objs  | 3 ++-
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index 91d602c..48b5762 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -16,6 +16,7 @@ CONFIG_PCKBD=y
 CONFIG_FDC=y
 CONFIG_ACPI=y
 CONFIG_ACPI_X86=y
+CONFIG_ACPI_X86_ICH=y
 CONFIG_ACPI_MEMORY_HOTPLUG=y
 CONFIG_ACPI_CPU_HOTPLUG=y
 CONFIG_APM=y
diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
index 62575eb..4962ed7 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -17,6 +17,7 @@ CONFIG_PCKBD=y
 CONFIG_FDC=y
 CONFIG_ACPI=y
 CONFIG_ACPI_X86=y
+CONFIG_ACPI_X86_ICH=y
 CONFIG_ACPI_MEMORY_HOTPLUG=y
 CONFIG_ACPI_CPU_HOTPLUG=y
 CONFIG_APM=y
diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
index 3db1f07..7d3230c 100644
--- a/hw/acpi/Makefile.objs
+++ b/hw/acpi/Makefile.objs
@@ -1,4 +1,5 @@
-common-obj-$(CONFIG_ACPI_X86) += core.o piix4.o ich9.o pcihp.o tco.o
+common-obj-$(CONFIG_ACPI_X86) += core.o piix4.o pcihp.o
+common-obj-$(CONFIG_ACPI_X86_ICH) += ich9.o tco.o
 common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu_hotplug.o
 common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
 common-obj-$(CONFIG_ACPI) += acpi_interface.o
-- 
MST

Re: [Qemu-devel] [PATCH 06/10] qga: guest exec functionality for Windows guests

2015-07-07 Thread Olga Krishtal


On 07/07/15 12:12, Olga Krishtal wrote:

On 07/07/15 11:06, Denis V. Lunev wrote:

On 07/07/15 04:31, Michael Roth wrote:

Quoting Denis V. Lunev (2015-06-30 05:25:19)

From: Olga Krishtal 

Child process' stdin/stdout/stderr can be associated
with handles for communication via read/write interfaces.

The workflow should be something like this:
* Open an anonymous pipe through guest-pipe-open
* Execute a binary or a script in the guest. Arbitrary arguments and
   environment to a new child process could be passed through options
* Read/pass information from/to executed process using
   guest-file-read/write
* Collect the status of a child process

Have you seen anything like this in your testing?

{'execute':'guest-exec','arguments':{'path':'/Windows/System32/ipconfig.exe',
  'timeout':5000}}
{"return": {"pid": 588}}
{'execute':'guest-exec-status','arguments':{'pid':588}}
{"return": {"exit": 0, "handle-stdout": -1, "handle-stderr": -1,
  "handle-stdin": -1, "signal": -1}}
{'execute':'guest-exec-status','arguments':{'pid':588}}
{"error": {"class": "GenericError", "desc": "Invalid parameter 'pid'"}}
I traced this execution - it is absolutely normal, because there is no
timeout argument among the options of ipconfig.exe, as I saw using
ipconfig -h in cmd.
However, if we use something like calc.exe and call exec status twice, the
output will be normal:
sudo virsh qemu-agent-command w2k12r2 
'{"execute":"guest-exec-status","arguments":{"pid":2840}}'

setlocale: No such file or directory
{"return":{"exit":-1,"handle-stdout":-1,"handle-stderr":-1,"handle-stdin":-1,"signal":-1}}

 sudo virsh qemu-agent-command w2k12r2 
'{"execute":"guest-exec-status","arguments":{"pid":2840}}'

setlocale: No such file or directory
{"return":{"exit":-1,"handle-stdout":-1,"handle-stderr":-1,"handle-stdin":-1,"signal":-1}}


{'execute':'guest-exec','arguments':{'path':'/Windows/System32/ipconfig.exe',
  'timeout':5000}}
{"error": {"class": "GenericError", "desc": "CreateProcessW() failed:
  The parameter is incorrect. (error: 57)"}}
{'execute':'guest-exec','arguments':{'path':'/Windows/System32/ipconfig.exe',
  'timeout':5000}}
{"error": {"class": "GenericError", "desc": "CreateProcessW() failed:
  The parameter is incorrect. (error: 57)"}}

First of all, what version of Windows are you using?
Secondly, you do need to specify an environment variable:
sudo virsh qemu-agent-command w2k12r2
'{"execute":"guest-exec","arguments":{"path":"/Windows/System32/ipconfig.exe",
"timeout": 5000, "env":["MyEnv=00"]}}'
For Windows Server 2003 we do not have to pass "env" at all, but if we
are working with Server 2008 and newer we have to pass "env" = "00" if
we do not want to use it.
https://social.msdn.microsoft.com/Forums/windowsdesktop/en-US/59450592-aa52-4170-9742-63c84bff0010/unexpected-errorinvalidparameter-returned-by-createprocess-too-bad?forum=windowsgeneraldevelopmentissues
This comment was included in the first version of the patches and I may have
forgotten it. Try to specify env and call exec several times. It
should work fine.

I will look closer at guest-exec-status double call.




{'execute':'guest-exec','arguments':{'path':'/Windows/System32/ipconfig.exe',
  'timeout':5000}}
{"return": {"pid": 1836}}
I'll check this later during office time. Something definitely went 
wrong.



The guest-exec-status failures are expected since the first call reaps
everything, but the CreateProcessW() failures are not. Will look into it
more this evening, but it doesn't look like I'll be able to apply this in
its current state.

I have concerns over the schema as well. I think last time we discussed
it we both seemed to agree that guest-file-open was unwieldy and
unnecessary. We should just let guest-exec return a set of file handles
instead of having users do all the plumbing.
No, the discussion was a bit different AFAIR. First of all, you proposed
using unified code to perform exec. On the other hand, the current mechanics
with pipes are quite inconvenient for end users of the feature, for example
for an interactive shell in the guest.

We have used a very simple approach for our application: pipes are not
used; the application creates a VirtIO serial channel and forces the guest,
through this API, to fork/exec the child using this serial channel as stdio
in/out. In this case we do get a convenient API for shell processing.

This means that this flexibility with direct specification of the file
descriptors is necessary.

There are two solutions from my point of view:
- keep current API, it is suitable for us
- switch to "pipe only" mechanics for guest exec, i.e. the command
   will work like "ssh" with one descriptor for read and one for write
   created automatically, but in this case we do need either a way
   to connect Unix socket in host with file descriptor in guest or
   make it possible to send events from QGA to client using QMP


I'm really sorry for chiming in right before hard freeze, very poor
timing/planning on my part.

:( can

Re: [Qemu-devel] [PATCH v5 08/11] target-i386: exception handling for seg_helper functions

2015-07-07 Thread Pavel Dovgaluk

> From: Richard Henderson [mailto:rth7...@gmail.com] On Behalf Of Richard 
> Henderson
> On 07/06/2015 09:26 AM, Pavel Dovgalyuk wrote:
> > This patch fixes exception handling for seg_helper functions.
> >
> > Signed-off-by: Pavel Dovgalyuk 
> 
> 
> No, you don't want to indiscriminately change every call.  That was my original
> point about not needing to change seg_helper.c or smm_helper.c.
> 
> Further, any such changes would go along with the changes in translate.c to
> remove the state saving there.
> 
> I would only change those that are "normal" memory operations, like fp loads
> etc.  The segmentation changes are rare.  The task state helpers require state
> saving anyway, so requiring a TCG search is a pessimization.

I can refine the patch, but most of the changes should remain.
E.g., lcall helpers can cause an exception or not. TB ends in both cases.
But icount and PC values in these two situations should be different.
And lcall helpers use most of the seg functions I changed in the patch.

Pavel Dovgalyuk

Re: [Qemu-devel] [RFC PATCH V6 02/18] replace spinlock by QemuMutex.

2015-07-07 Thread Alex Bennée


fred.kon...@greensocs.com writes:

> From: KONRAD Frederic 
>
> spinlock is only used in two cases:
>   * cpu-exec.c: to protect TranslationBlock
>   * mem_helper.c: for lock helper in target-i386 (which seems broken).
>
> It's a pthread_mutex_t in user-mode, so it is better to use QemuMutex
> directly in this case.
> It also allows reusing the tb_lock mutex of TBContext for multithreaded
> TCG.
>
> Signed-off-by: KONRAD Frederic 
> ---
>  cpu-exec.c   | 15 +++
>  include/exec/exec-all.h  |  4 ++--
>  linux-user/main.c|  6 +++---
>  target-i386/mem_helper.c | 16 +---
>  tcg/i386/tcg-target.c|  8 
>  5 files changed, 37 insertions(+), 12 deletions(-)
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index 2ffeb6e..d6336d9 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -362,7 +362,9 @@ int cpu_exec(CPUArchState *env)
>  SyncClocks sc;
>  
>  /* This must be volatile so it is not trashed by longjmp() */
> +#if defined(CONFIG_USER_ONLY)
>  volatile bool have_tb_lock = false;
> +#endif
>  
>  if (cpu->halted) {
>  if (!cpu_has_work(cpu)) {
> @@ -480,8 +482,10 @@ int cpu_exec(CPUArchState *env)
>  cpu->exception_index = EXCP_INTERRUPT;
>  cpu_loop_exit(cpu);
>  }
> -spin_lock(&tcg_ctx.tb_ctx.tb_lock);
> +#if defined(CONFIG_USER_ONLY)
> +qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
>  have_tb_lock = true;
> +#endif

Why are the locking rules different for CONFIG_USER versus system
emulation? Looking at the final tree:

>  tb = tb_find_fast(env);

this eventually ends up taking tb_lock on the find_slow path, which IIRC
is where we might end up doing the actual code generation.

>  /* Note: we do it here to avoid a gcc bug on Mac OS X when
> doing it in tb_find_slow */
> @@ -503,9 +507,10 @@ int cpu_exec(CPUArchState *env)
>  tb_add_jump((TranslationBlock *)(next_tb & ~TB_EXIT_MASK),
>  next_tb & TB_EXIT_MASK, tb);
>  }
> +#if defined(CONFIG_USER_ONLY)
>  have_tb_lock = false;
> -spin_unlock(&tcg_ctx.tb_ctx.tb_lock);
> -
> +qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
> +#endif
>  /* cpu_interrupt might be called while translating the
> TB, but before it is linked into a potentially
> infinite loop and becomes env->current_tb. Avoid
> @@ -572,10 +577,12 @@ int cpu_exec(CPUArchState *env)
>  #ifdef TARGET_I386
>  x86_cpu = X86_CPU(cpu);
>  #endif
> +#if defined(CONFIG_USER_ONLY)
>  if (have_tb_lock) {
> -spin_unlock(&tcg_ctx.tb_ctx.tb_lock);
> +qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
>  have_tb_lock = false;
>  }
> +#endif
>  }
>  } /* for(;;) */
>  
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index 2573e8c..44f3336 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -176,7 +176,7 @@ struct TranslationBlock {
>  struct TranslationBlock *jmp_first;
>  };
>  
> -#include "exec/spinlock.h"
> +#include "qemu/thread.h"
>  
>  typedef struct TBContext TBContext;
>  
> @@ -186,7 +186,7 @@ struct TBContext {
>  TranslationBlock *tb_phys_hash[CODE_GEN_PHYS_HASH_SIZE];
>  int nb_tbs;
>  /* any access to the tbs or the page table must use this lock */
> -spinlock_t tb_lock;
> +QemuMutex tb_lock;
>  
>  /* statistics */
>  int tb_flush_count;
> diff --git a/linux-user/main.c b/linux-user/main.c
> index c855bcc..bce3a98 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -107,7 +107,7 @@ static int pending_cpus;
>  /* Make sure everything is in a consistent state for calling fork().  */
>  void fork_start(void)
>  {
> -pthread_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
> +qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
>  pthread_mutex_lock(&exclusive_lock);
>  mmap_fork_start();
>  }
> @@ -129,11 +129,11 @@ void fork_end(int child)
>  pthread_mutex_init(&cpu_list_mutex, NULL);
>  pthread_cond_init(&exclusive_cond, NULL);
>  pthread_cond_init(&exclusive_resume, NULL);
> -pthread_mutex_init(&tcg_ctx.tb_ctx.tb_lock, NULL);
> +qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
>  gdbserver_fork((CPUArchState *)thread_cpu->env_ptr);
>  } else {
>  pthread_mutex_unlock(&exclusive_lock);
> -pthread_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
> +qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
>  }
>  }
>  
> diff --git a/target-i386/mem_helper.c b/target-i386/mem_helper.c
> index 1aec8a5..7106cc3 100644
> --- a/target-i386/mem_helper.c
> +++ b/target-i386/mem_helper.c
> @@ -23,17 +23,27 @@
>  
>  /* broken thread support */
>  
> -static spinlock_t global_cpu_lock = SPIN_LOCK_UNLOCKED;
> +#if d

Re: [Qemu-devel] [virtio guest] vring_need_event() from virtqueue_kick_prepare()

2015-07-07 Thread Stefan Hajnoczi

On Mon, Jul 06, 2015 at 06:13:29PM +0300, Catalin Vasile wrote:
> What is the logic behind vring_need_event() when used with
> virtqueue_kick_prepare()?
> What does the keyword >>just<< refer to from the following context:
> /* The following is used with USED_EVENT_IDX and AVAIL_EVENT_IDX */
> /* Assuming a given event_idx value from the other side, if
>  * we have just incremented index from old to new_idx,
>  * should we trigger an event? */
> ?

"just" means since the last time the host/guest-visible index field was
changed.  After avail or used rings have been processed, the index field
for that ring is published to the host/guest.  At that point a check is
made whether the other side needs to be kicked.
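
For reference, the check itself is a wrap-safe 16-bit comparison. The sketch below mirrors the helper in the Linux virtio ring header, but uses plain uint16_t instead of the kernel's __u16; the name need_event_sketch is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of the vring_need_event() logic. Returns true when event_idx
 * lies in the window [old, new_idx), i.e. among the entries published
 * since the index field was last updated. The casts keep the arithmetic
 * modulo 2^16 so that index wrap-around is handled.
 */
static bool need_event_sketch(uint16_t event_idx, uint16_t new_idx,
                              uint16_t old)
{
    return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
}
```

If the event index last published by the other side falls outside that window, for example because it has not been updated since notifications were disabled, the check returns false and no kick is sent.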

> I am sending 2 jobs, one after another, and the second one just does
> not want to kick, although the first one finished completely and the
> backend went back to interrupt mode, all because vring_need_event()
> returns false.

Maybe the vhost driver called vhost_disable_notify() and hasn't
re-enabled notify yet?

This could happen if the guest adds buffers to the virtqueue while the
host is processing the virtqueue.  Take a look at the vhost_net code for
how to correctly disable and re-enable notify without race conditions on
the host.

The idea behind disabling notify is to eliminate unnecessary
vmexits/notifications since the host is already processing the virtqueue
and will see new buffers.  It's like polling vs. interrupt mode.

If the vhost driver on the host doesn't implement it correctly, then the
device could stop responding to the avail ring.
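
A minimal sketch of the disable/re-enable pattern, assuming a toy single-threaded model: the toy_* names and struct toy_vq are hypothetical stand-ins, not vhost APIs, and a real implementation also needs memory barriers between re-enabling notify and re-checking the ring.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a virtqueue as seen by the host; not a vhost type. */
struct toy_vq {
    uint16_t avail_idx;       /* written by the guest */
    uint16_t last_avail_idx;  /* next entry the host will process */
    bool notify_enabled;      /* does the host currently want kicks? */
};

static void toy_disable_notify(struct toy_vq *vq)
{
    vq->notify_enabled = false;
}

/* Re-enable kicks, then report whether buffers arrived in the meantime. */
static bool toy_enable_notify(struct toy_vq *vq)
{
    vq->notify_enabled = true;
    /* A real implementation needs a memory barrier before this re-check. */
    return vq->avail_idx != vq->last_avail_idx;
}

/* Process everything available; returns the number of buffers handled. */
static unsigned toy_handle_kick(struct toy_vq *vq)
{
    unsigned handled = 0;

    toy_disable_notify(vq);
    for (;;) {
        if (vq->last_avail_idx == vq->avail_idx) {
            if (toy_enable_notify(vq)) {
                /* Buffers raced in: keep polling instead of going idle. */
                toy_disable_notify(vq);
                continue;
            }
            break;
        }
        vq->last_avail_idx++;   /* "process" one buffer */
        handled++;
    }
    return handled;
}
```

The key point is that the handler only goes idle after it has re-enabled notify and then confirmed that the ring is still empty; leaving the loop with notify disabled is what makes the device stop responding.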

Re: [Qemu-devel] [PATCH 12/12] i386/kvm: Hyper-v crash msrs set/get'ers and migration

2015-07-07 Thread Paolo Bonzini



On 03/07/2015 14:01, Denis V. Lunev wrote:
> diff --git a/linux-headers/asm-x86/hyperv.h b/linux-headers/asm-x86/hyperv.h
> index ce6068d..5f88dc7 100644
> --- a/linux-headers/asm-x86/hyperv.h
> +++ b/linux-headers/asm-x86/hyperv.h
> @@ -108,6 +108,8 @@
>  #define HV_X64_HYPERCALL_PARAMS_XMM_AVAILABLE(1 << 4)
>  /* Support for a virtual guest idle state is available */
>  #define HV_X64_GUEST_IDLE_STATE_AVAILABLE(1 << 5)
> +/* Guest crash data handler available */
> +#define HV_X64_GUEST_CRASH_MSR_AVAILABLE (1 << 10)
>  
>  /*
>   * Implementation recommendations. Indicates which behaviors the hypervisor

This hunk is not present in the kernel-side patches.  I'll apply it to
the kernel KVM tree.

Paolo

Re: [Qemu-devel] [PULL 00/11] VFIO updates for 2.4-rc0

2015-07-07 Thread Peter Maydell

On 6 July 2015 at 19:34, Alex Williamson  wrote:
> The following changes since commit 7edd8e4660beb301d527257f8e04ebec0f841cb0:
>
>   Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into 
> staging (2015-07-06 14:03:44 +0100)
>
> are available in the git repository at:
>
>
>   git://github.com/awilliam/qemu-vfio.git tags/vfio-update-20150706.0
>
> for you to fetch changes up to 43302969966bc3a95470bfc300289a83068ef5d9:
>
>   vfio/pci : Add pba_offset PCI quirk for Chelsio T5 devices (2015-07-06 
> 12:15:15 -0600)
>
> 
> VFIO updates for 2.4-rc0
> - "real" host page size API (Peter Crosthwaite)
> - platform device irqfd support (Eric Auger)
> - spapr container disconnect fix (Alexey Kardashevskiy)
> - quirk for broken Chelsio hardware (Gabriel Laupre)
> - coverity fix (Paolo Bonzini)
>

Applied, thanks.

-- PMM

Re: [Qemu-devel] [PATCH v4 0/9] GIC-500 implementation, software + KVM

2015-07-07 Thread Eric Auger

Hi Pavel,
On 07/02/2015 04:13 PM, Pavel Fedin wrote:
> This is a complete GICv3 implementation, both software emulation and KVM
> acceleration are supported.
Do you plan to resend a version without the TCG code, just using the shared
base class?

Eric
> 
> This series is a consolidated and updated patch set, based on:
> - GIC-500 implementation:
>   http://lists.nongnu.org/archive/html/qemu-devel/2015-06/msg01512.html
> - vGICv3 implementation:
>   https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg04496.html
> I decided to repost because qemu development has progressed and it became
> difficult to apply those series to current master.
> 
> Changes from previous versions:
> - Removed RFC prefix
> - Base class separated from the rest, as was requested in RFC v2 review
> - Fixed small number of broken comments / code formatting issues, according
>   to the same review
> - Removed #if 0 in virt.c
> - vGICv3 patch set restructured and more clearly separated into portions,
>   according to old Eric's review
> - Removed duplication of low-level vGICv3 code. Common helper routines are
>   used instead.
> - Put informative commit messages
> 
> Pavel Fedin (6):
>   Add virt-v3 machine that uses GIC-500
>   Extract some reusable vGIC code
>   Set kernel_irqchip_type for the rest of ARM boards which use GIC
>   Make use of kernel_irqchip_type in kvm_arch_irqchip_create()
>   Initial implementation of vGICv3
>   Enable KVM acceleration for GICv3
> 
> Shlomo Pongratz (3):
>   Implement GIC-500 base class
>   Implement GIC-500
>   GICv3 support
> 
>  hw/arm/exynos4_boards.c|1 +
>  hw/arm/realview.c  |1 +
>  hw/arm/vexpress.c  |1 +
>  hw/arm/virt.c  |  141 ++-
>  hw/intc/Makefile.objs  |3 +
>  hw/intc/arm_gic_kvm.c  |   84 +-
>  hw/intc/arm_gicv3.c| 2086 
> 
>  hw/intc/arm_gicv3_common.c |  216 
>  hw/intc/arm_gicv3_kvm.c|  203 
>  hw/intc/gicv3_internal.h   |  159 +++
>  hw/intc/vgic_common.h  |   43 +
>  include/hw/arm/fdt.h   |2 +-
>  include/hw/arm/virt.h  |6 +-
>  include/hw/boards.h|1 +
>  include/hw/intc/arm_gicv3.h|   44 +
>  include/hw/intc/arm_gicv3_common.h |  116 ++
>  include/sysemu/kvm.h   |3 +-
>  kvm-all.c  |2 +-
>  stubs/kvm.c|2 +-
>  target-arm/cpu.h   |   12 +
>  target-arm/cpu64.c |  105 ++
>  target-arm/kvm.c   |8 +-
>  22 files changed, 3164 insertions(+), 75 deletions(-)
>  create mode 100644 hw/intc/arm_gicv3.c
>  create mode 100644 hw/intc/arm_gicv3_common.c
>  create mode 100644 hw/intc/arm_gicv3_kvm.c
>  create mode 100644 hw/intc/gicv3_internal.h
>  create mode 100644 hw/intc/vgic_common.h
>  create mode 100644 include/hw/intc/arm_gicv3.h
>  create mode 100644 include/hw/intc/arm_gicv3_common.h
>

Re: [Qemu-devel] [PATCH v6 0/12] HyperV equivalent of pvpanic driver

2015-07-07 Thread Paolo Bonzini



On 03/07/2015 14:01, Denis V. Lunev wrote:
> Windows 2012 guests can notify the hypervisor about a guest crash
> (Windows bugcheck (BSOD)) by writing specific Hyper-V MSRs. This series
> handles these MSRs in KVM and sends a notification to user space, which
> allows QEMU/libvirt to gather a Windows guest crash dump.
> 
> The idea is to provide functionality equal to pvpanic device without
> QEMU guest agent for Windows.
> 
> The idea is borrowed from Linux HyperV bus driver and validated against
> Windows 2k12.
> 
> Changes from v5:
> * added hyperv crash msrs into supported/emulated list
> * qemu: reset CPUState::crash_occurred at cpu reset
> * qemu: userspace checks kernel support of hyperv crash msrs
>   by kvm_get_supported_msrs
> 
> Changes from v4:
> * fixed typo in email of Andreas Färber 
>   my vim behaves strangely on lines with extended German characters
> 
> Changes from v3:
> * remove unused HV_X64_MSR_CRASH_CTL_NOTIFY
> * added documentation section about KVM_SYSTEM_EVENT_CRASH
> * allow only supported values inside crash ctl msr
> * qemu: split patch into generic crash handling patches and hyperv specific
> * qemu: skip migration of crash ctl msr value
> 
> Changes from v2:
> * forbid modification of crash ctl msr by guest
> * qemu_system_guest_panicked usage in pvpanic and s390x
> * hyper-v crash handler move from generic kvm to i386
> * hyper-v crash handler: skip fetching crash msrs just mark crash occurred
> * sync with linux-next 20150629
> * patch 11 squashed to patch 10
> * patch 9 squashed to patch 7
> 
> Changes from v1:
> * hyperv code move to hyperv.c
> * added read handlers of crash data msrs
> * added per vm and per cpu hyperv context structures
> * added saving crash msrs inside qemu cpu state
> * added qemu fetch and update of crash msrs
> * added qemu crash msrs store in cpu state and its migration
> 
> Signed-off-by: Andrey Smetanin 
> Signed-off-by: Denis V. Lunev 
> CC: Gleb Natapov 
> CC: Paolo Bonzini 
> 

I'm queuing patches 1-8 to the KVM tree.  For patches 9-12, I've applied
them locally but would like Eduardo or Andreas to ack 11 and 12.

Paolo

Re: [Qemu-devel] [PATCH qemu v10 13/14] vfio: spapr: Add SPAPR IOMMU v2 support (DMA memory preregistering)

2015-07-07 Thread Thomas Huth

On Tue, 7 Jul 2015 20:05:25 +1000
Alexey Kardashevskiy  wrote:

> On 07/07/2015 05:23 PM, Thomas Huth wrote:
> > On Mon,  6 Jul 2015 12:11:09 +1000
> > Alexey Kardashevskiy  wrote:
...
> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> >> index 8eacfd7..0c7ba8c 100644
> >> --- a/hw/vfio/common.c
> >> +++ b/hw/vfio/common.c
> >> @@ -488,6 +488,76 @@ static void vfio_listener_release(VFIOContainer 
> >> *container)
> >>   memory_listener_unregister(&container->iommu_data.type1.listener);
> >>   }
> >>
> >> +static void vfio_ram_do_region(VFIOContainer *container,
> >> +  MemoryRegionSection *section, unsigned long 
> >> req)
> >> +{
> >> +int ret;
> >> +struct vfio_iommu_spapr_register_memory reg = { .argsz = sizeof(reg) 
> >> };
> >> +
> >> +if (!memory_region_is_ram(section->mr) ||
> >> +memory_region_is_skip_dump(section->mr)) {
> >> +return;
> >> +}
> >> +
> >> +if (unlikely((section->offset_within_region & (getpagesize() - 1)))) {
> >> +error_report("%s received unaligned region", __func__);
> >> +return;
> >> +}
> >> +
> >> +reg.vaddr = (__u64) memory_region_get_ram_ptr(section->mr) +
> >
> > We're in userspace here ... I think it would be better to use uint64_t
> > instead of the kernel-type __u64.
> 
> We are calling a kernel here - @reg is a kernel-defined struct.

If you grep for __u64 in the QEMU sources, you'll see that hardly
anybody is using this type - even when calling ioctls. So for
consistency, I'd really suggest using uint64_t here.

> >> @@ -698,14 +768,18 @@ static int vfio_connect_container(VFIOGroup *group, 
> >> AddressSpace *as)
> >>
> >>   container->iommu_data.type1.initialized = true;
> >>
> >> -} else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
> >> +} else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU) ||
> >> +   ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_v2_IOMMU)) {
> >> +bool v2 = !!ioctl(fd, VFIO_CHECK_EXTENSION, 
> >> VFIO_SPAPR_TCE_v2_IOMMU);
> >
> > That "!!" sounds somewhat wrong here. I think you either want to check
> > for "ioctl() == 1" (because only in this case you can be sure that v2
> > is supported), or you can simply omit the "!!" because you're 100% sure
> > that the ioctl only returns 0 or 1 (and never a negative error code).
> 
> 
> The host kernel does not return an error on these ioctls, it returns 0 or 
> 1. And "!!" is shorter than "(bool)". VFIO_CHECK_EXTENSION for Type1 does 
> exactly the same already.

Simply using nothing instead is even shorter than using "!!". The
compiler is smart enough to convert from 0 and 1 to bool.
"!!" is IMHO quite ugly and should only be used when it is really
necessary.
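
To illustrate the point about negative return values, here is a standalone sketch. The helper names are made up and no real ioctl is issued; the int argument just stands in for the ioctl's return value.

```c
#include <stdbool.h>

/*
 * Implicit conversion to bool: any non-zero value becomes true,
 * so an error return of -1 would also read as "supported".
 */
static bool as_bool(int ioctl_ret)
{
    return ioctl_ret;
}

/* Explicit check: only a return value of exactly 1 means "supported". */
static bool extension_present(int ioctl_ret)
{
    return ioctl_ret == 1;
}
```

Both "!!" and the plain assignment behave like as_bool(); only the explicit comparison rejects a negative error code.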

> >> @@ -717,19 +791,36 @@ static int vfio_connect_container(VFIOGroup *group, 
> >> AddressSpace *as)
> >>* when container fd is closed so we do not call it explicitly
> >>* in this file.
> >>*/
> >> -ret = ioctl(fd, VFIO_IOMMU_ENABLE);
> >> -if (ret) {
> >> -error_report("vfio: failed to enable container: %m");
> >> -ret = -errno;
> >> -goto free_container_exit;
> >> +if (!v2) {
> >> +ret = ioctl(fd, VFIO_IOMMU_ENABLE);
> >> +if (ret) {
> >> +error_report("vfio: failed to enable container: %m");
> >> +ret = -errno;
> >> +goto free_container_exit;
> >> +}
> >>   }
> >>
> >>   container->iommu_data.type1.listener = vfio_memory_listener;
> >> -container->iommu_data.release = vfio_listener_release;
> >> -
> >>   memory_listener_register(&container->iommu_data.type1.listener,
> >>container->space->as);
> >>
> >> +if (!v2) {
> >> +container->iommu_data.release = vfio_listener_release;
> >> +} else {
> >> +container->iommu_data.release = 
> >> vfio_spapr_listener_release_v2;
> >> +container->iommu_data.register_listener =
> >> +vfio_ram_memory_listener;
> >> +
> >> memory_listener_register(&container->iommu_data.register_listener,
> >> + &address_space_memory);
> >> +
> >> +if (container->iommu_data.ram_reg_error) {
> >> +error_report("vfio: RAM memory listener initialization 
> >> failed for container");
> >
> > Line > 80 columns?
> 
> afaik user visible strings are an exception in QEMU and kernel.

You're right for the kernel, but AFAIK QEMU (currently still) has a
hard limit at 80 columns.

 Thomas

Re: [Qemu-devel] [PATCH 12/12] i386/kvm: Hyper-v crash msrs set/get'ers and migration

2015-07-07 Thread Paolo Bonzini

On 03/07/2015 14:01, Denis V. Lunev wrote:
> @@ -904,6 +905,7 @@ typedef struct CPUX86State {
>  uint64_t msr_hv_guest_os_id;
>  uint64_t msr_hv_vapic;
>  uint64_t msr_hv_tsc;
> +uint64_t msr_hv_crash_prm[HV_X64_MSR_CRASH_PARAMS];

Do not abbreviate variable names!  The enum even says PARAMS, so use
params in the array name as well.

I've done the change locally in all files.

Paolo

>  /* exception/interrupt handling */
>  int error_code;

Re: [Qemu-devel] [RFC PATCH V6 02/18] replace spinlock by QemuMutex.

2015-07-07 Thread Paolo Bonzini



On 07/07/2015 12:15, Alex Bennée wrote:
> Why are the locking rules different for CONFIG_USER versus system
> emulation? Looking at the final tree:
> 
>> >  tb = tb_find_fast(env);
> this eventually ends up doing a tb_lock on the find_slow path which IIRC
> is when we might end up doing the actual code generation.
> 

Up to this point, system emulation is using the BQL for everything.  I
guess things change later.

Paolo

Re: [Qemu-devel] [PATCH for-2.4] watchdog/diag288: correctly register for system reset requests

2015-07-07 Thread Christian Borntraeger

Am 07.07.2015 um 10:53 schrieb Cornelia Huck:
> From: Xu Wang 
> 
> The diag288 watchdog is no sysbus device, therefore it doesn't get
> triggered on resets automatically using dc->reset.
> 
> Let's register the reset handler manually, so we get correctly notified
> again when a system reset was requested. Also reset the watchdog on
> subsystem resets that don't trigger a full system reset.
> 
> Signed-off-by: Xu Wang 
> Reviewed-by: David Hildenbrand 
> Signed-off-by: Cornelia Huck 

kdump/kexec and reboot disable the watchdog.


Tested-by: Christian Borntraeger 





> ---
>  hw/s390x/s390-virtio-ccw.c | 6 +-
>  hw/watchdog/wdt_diag288.c  | 8 
>  2 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> index 3d20d6a..4c51d1a 100644
> --- a/hw/s390x/s390-virtio-ccw.c
> +++ b/hw/s390x/s390-virtio-ccw.c
> @@ -36,7 +36,7 @@ typedef struct S390CcwMachineState {
> 
>  void io_subsystem_reset(void)
>  {
> -DeviceState *css, *sclp, *flic;
> +DeviceState *css, *sclp, *flic, *diag288;
> 
>  css = DEVICE(object_resolve_path_type("", "virtual-css-bridge", NULL));
>  if (css) {
> @@ -51,6 +51,10 @@ void io_subsystem_reset(void)
>  if (flic) {
>  qdev_reset_all(flic);
>  }
> +diag288 = DEVICE(object_resolve_path_type("", "diag288", NULL));
> +if (diag288) {
> +qdev_reset_all(diag288);
> +}
>  }
> 
>  static int virtio_ccw_hcall_notify(const uint64_t *args)
> diff --git a/hw/watchdog/wdt_diag288.c b/hw/watchdog/wdt_diag288.c
> index 1185e06..2a885a4 100644
> --- a/hw/watchdog/wdt_diag288.c
> +++ b/hw/watchdog/wdt_diag288.c
> @@ -40,6 +40,13 @@ static void wdt_diag288_reset(DeviceState *dev)
>  timer_del(diag288->timer);
>  }
> 
> +static void diag288_reset(void *opaque)
> +{
> +DeviceState *diag288 = opaque;
> +
> +wdt_diag288_reset(diag288);
> +}
> +
>  static void diag288_timer_expired(void *dev)
>  {
>  qemu_log_mask(CPU_LOG_RESET, "Watchdog timer expired.\n");
> @@ -80,6 +87,7 @@ static void wdt_diag288_realize(DeviceState *dev, Error 
> **errp)
>  {
>  DIAG288State *diag288 = DIAG288(dev);
> 
> +qemu_register_reset(diag288_reset, diag288);
>  diag288->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, diag288_timer_expired,
>dev);
>  }
>

Re: [Qemu-devel] [PULL 00/10] qemu-ga patches for 2.4.0

2015-07-07 Thread Peter Maydell

On 7 July 2015 at 05:40, Michael Roth  wrote:
> Hi Peter,
>
> Sorry for the last minute pull. This is a round-up of all tested/reviewed
> qemu-ga patches posted prior to soft-freeze, along with 1 bug fix that
> came in last week.
>
> This adds win32 implementations of:
>   guest-get-fsinfo
>   guest-network-get-interfaces
>
> and modifies guest-fstrim to return per-mount results and continue on to other
> mounts even when a failure is encountered.
>
> There's also bug fixes for guest-fstrim and guest-set-time.
>
> The following changes since commit 7edd8e4660beb301d527257f8e04ebec0f841cb0:
>
>   Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into 
> staging (2015-07-06 14:03:44 +0100)
>
> are available in the git repository at:
>
>
>   git://github.com/mdroth/qemu.git tags/qga-pull-2015-07-06-tag
>
> for you to fetch changes up to d1ad92aab4a9419538b7b1b7423a8a770c7a2859:
>
>   qga: added GuestPCIAddress information (2015-07-06 23:06:12 -0500)
>
> 
> tag for qga-pull-2015-07-06

Hi. I'm afraid this doesn't build for me for Windows:

/home/petmay01/linaro/qemu-for-merges/qga/commands-win32.c:21:22:
error: ws2ipdef.h: No such file or directory
  CCqga/vss-win32.o
  CCqga/qapi-generated/qga-qapi-types.o
/home/petmay01/linaro/qemu-for-merges/qga/commands-win32.c:25:22:
error: ntddscsi.h: No such file or directory
  CCqga/qapi-generated/qga-qapi-visit.o
  CCqga/qapi-generated/qga-qmp-marshal.o
cc1: warnings being treated as errors
/home/petmay01/linaro/qemu-for-merges/qga/commands-win32.c:39:
warning: ‘GUID_DEVINTERFACE_VOLUME’ initialized and declared ‘extern’
/home/petmay01/linaro/qemu-for-merges/qga/commands-win32.c:101: error:
expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘win2qemu’
/home/petmay01/linaro/qemu-for-merges/qga/commands-win32.c:124: error:
expected ‘)’ before ‘bus’
  ARlibqemuutil.a
/home/petmay01/linaro/qemu-for-merges/qga/commands-win32.c: In
function ‘get_disk_bus_type’:
/home/petmay01/linaro/qemu-for-merges/qga/commands-win32.c:526: error:
‘STORAGE_PROPERTY_QUERY’ undeclared (first use in this function)
/home/petmay01/linaro/qemu-for-merges/qga/commands-win32.c:526: error:
(Each undeclared identifier is reported only once
/home/petmay01/linaro/qemu-for-merges/qga/commands-win32.c:526: error:
for each function it appears in.)
/home/petmay01/linaro/qemu-for-merges/qga/commands-win32.c:526: error:
expected ‘;’ before ‘query’
/home/petmay01/linaro/qemu-for-merges/qga/commands-win32.c:527: error:
‘STORAGE_DEVICE_DESCRIPTOR’ undeclared (first use in this function)
/home/petmay01/linaro/qemu-for-merges/qga/commands-win32.c:527: error:
‘dev_desc’ undeclared (first use in this function)
/home/petmay01/linaro/qemu-for-merges/qga/commands-win32.c:527: error:
‘buf’ undeclared (first use in this function)
/home/petmay01/linaro/qemu-for-merges/qga/commands-win32.c:527:
warning: left-hand operand of comma expression has no effect
/home/petmay01/linaro/qemu-for-merges/qga/commands-win32.c:532: error:
‘query’ undeclared (first use in this function)
/home/petmay01/linaro/qemu-for-merges/qga/commands-win32.c:532: error:
‘StorageDeviceProperty’ undeclared (first use in this function)
/home/petmay01/linaro/qemu-for-merges/qga/commands-win32.c:533: error:
‘PropertyStandardQuery’ undeclared (first use in this function)
/home/petmay01/linaro/qemu-for-merges/qga/commands-win32.c:535: error:
‘IOCTL_STORAGE_QUERY_PROPERTY’ undeclared (first use in this function)
/home/petmay01/linaro/qemu-for-merges/qga/commands-win32.c:543:
warning: control reaches end of non-void function
/home/petmay01/linaro/qemu-for-merges/qga/commands-win32.c: In
function ‘build_guest_disk_info’:
/home/petmay01/linaro/qemu-for-merges/qga/commands-win32.c:552: error:
‘SCSI_ADDRESS’ undeclared (first use in this function)


/home/petmay01/linaro/qemu-for-merges/qga/commands-win32.c:552: error:
expected ‘;’ before ‘addr’
/home/petmay01/linaro/qemu-for-merges/qga/commands-win32.c:557: error:
‘scsi_ad’ undeclared (first use in this function)
/home/petmay01/linaro/qemu-for-merges/qga/commands-win32.c:557: error:
‘addr’ undeclared (first use in this function)
/home/petmay01/linaro/qemu-for-merges/qga/commands-win32.c:573:
warning: implicit declaration of function ‘find_bus_type’
/home/petmay01/linaro/qemu-for-merges/qga/commands-win32.c:573:
warning: nested extern declaration of ‘find_bus_type’
/home/petmay01/linaro/qemu-for-merges/qga/commands-win32.c:574: error:
‘BusTypeScsi’ undeclared (first use in this function)
/home/petmay01/linaro/qemu-for-merges/qga/commands-win32.c:574: error:
‘BusTypeAta’ undeclared (first use in this function)
/home/petmay01/linaro/qemu-for-merges/qga/commands-win32.c:574: error:
‘BusTypeRAID’ undeclared (first use in this function)
/home/petmay01/linaro/qemu-for-merges/qga/commands-win32.c:583: error:
‘IOCTL_SCSI_GET_ADDRESS’ undeclared (first use in this function)
make: *** [qga/commands-win32.o] Error 1

thank

Re: [Qemu-devel] [Qemu-block] [PATCH] blockjob: Don't sleep too short

2015-07-07 Thread Stefan Hajnoczi

On Mon, Jul 06, 2015 at 11:28:11AM +0800, Fam Zheng wrote:
> diff --git a/include/block/blockjob.h b/include/block/blockjob.h
> index 57d8ef1..3deb731 100644
> --- a/include/block/blockjob.h
> +++ b/include/block/blockjob.h
> @@ -146,11 +146,13 @@ void *block_job_create(const BlockJobDriver *driver, 
> BlockDriverState *bs,
> int64_t speed, BlockCompletionFunc *cb,
> void *opaque, Error **errp);
>  
> +#define BLOCK_JOB_SLEEP_NS_MIN 1000L

Please introduce a block_job_relax_cpu() or similar function instead of
changing block_job_sleep_ns() to enforce a 10 millisecond minimum.  This
change would make legitimate <10 ms users imprecise!


Re: [Qemu-devel] [Qemu-block] [PATCH] block: Initialize local_err in bdrv_append_temp_snapshot

2015-07-07 Thread Stefan Hajnoczi

On Mon, Jul 06, 2015 at 12:24:44PM +0800, Fam Zheng wrote:
> Cc: qemu-sta...@nongnu.org
> Signed-off-by: Fam Zheng 
> ---
>  block.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan


[Qemu-devel] [PATCH rebased for-2.4] target-i386: add ABM to Haswell* and Broadwell* CPU models

2015-07-07 Thread Paolo Bonzini

ABM is only implemented as a single instruction set by AMD; all AMD
processors support both instructions or neither. Intel considers POPCNT
as part of SSE4.2, and LZCNT as part of BMI1, but Intel also uses AMD's
ABM flag to indicate support for both POPCNT and LZCNT.  It has to be
added to Haswell and Broadwell because Haswell, by adding LZCNT, has
completed the ABM.

Tested with "qemu-kvm -cpu Haswell-noTSX,enforce" (and also with older
machine types) on a Haswell-EP machine.

Signed-off-by: Paolo Bonzini 
---
 hw/i386/pc_piix.c | 4 
 hw/i386/pc_q35.c  | 4 
 target-i386/cpu.c | 8 
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 56cdcb9..d9e9987 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -312,6 +312,10 @@ static void pc_compat_2_3(MachineState *machine)
 if (kvm_enabled()) {
 pcms->smm = ON_OFF_AUTO_OFF;
 }
+x86_cpu_compat_set_features("Haswell", FEAT_8000_0001_ECX, 0, 
CPUID_EXT3_ABM);
+x86_cpu_compat_set_features("Haswell-noTSX", FEAT_8000_0001_ECX, 0, 
CPUID_EXT3_ABM);
+x86_cpu_compat_set_features("Broadwell", FEAT_8000_0001_ECX, 0, 
CPUID_EXT3_ABM);
+x86_cpu_compat_set_features("Broadwell-noTSX", FEAT_8000_0001_ECX, 0, 
CPUID_EXT3_ABM);
 }
 
 static void pc_compat_2_2(MachineState *machine)
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 8aa3a67..a15a1b1 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -295,6 +295,10 @@ static void pc_compat_2_3(MachineState *machine)
 if (kvm_enabled()) {
 pcms->smm = ON_OFF_AUTO_OFF;
 }
+x86_cpu_compat_set_features("Haswell", FEAT_8000_0001_ECX, 0, 
CPUID_EXT3_ABM);
+x86_cpu_compat_set_features("Haswell-noTSX", FEAT_8000_0001_ECX, 0, 
CPUID_EXT3_ABM);
+x86_cpu_compat_set_features("Broadwell", FEAT_8000_0001_ECX, 0, 
CPUID_EXT3_ABM);
+x86_cpu_compat_set_features("Broadwell-noTSX", FEAT_8000_0001_ECX, 0, 
CPUID_EXT3_ABM);
 }
 
 static void pc_compat_2_2(MachineState *machine)
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 04a8408..76031e0 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -1091,7 +1091,7 @@ static X86CPUDefinition builtin_x86_defs[] = {
 CPUID_EXT2_LM | CPUID_EXT2_RDTSCP | CPUID_EXT2_NX |
 CPUID_EXT2_SYSCALL,
 .features[FEAT_8000_0001_ECX] =
-CPUID_EXT3_LAHF_LM,
+CPUID_EXT3_ABM | CPUID_EXT3_LAHF_LM,
 .features[FEAT_7_0_EBX] =
 CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_BMI1 |
 CPUID_7_0_EBX_AVX2 | CPUID_7_0_EBX_SMEP |
@@ -1124,7 +1124,7 @@ static X86CPUDefinition builtin_x86_defs[] = {
 CPUID_EXT2_LM | CPUID_EXT2_RDTSCP | CPUID_EXT2_NX |
 CPUID_EXT2_SYSCALL,
 .features[FEAT_8000_0001_ECX] =
-CPUID_EXT3_LAHF_LM,
+CPUID_EXT3_ABM | CPUID_EXT3_LAHF_LM,
 .features[FEAT_7_0_EBX] =
 CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_BMI1 |
 CPUID_7_0_EBX_HLE | CPUID_7_0_EBX_AVX2 | CPUID_7_0_EBX_SMEP |
@@ -1159,7 +1159,7 @@ static X86CPUDefinition builtin_x86_defs[] = {
 CPUID_EXT2_LM | CPUID_EXT2_RDTSCP | CPUID_EXT2_NX |
 CPUID_EXT2_SYSCALL,
 .features[FEAT_8000_0001_ECX] =
-CPUID_EXT3_LAHF_LM | CPUID_EXT3_3DNOWPREFETCH,
+CPUID_EXT3_ABM | CPUID_EXT3_LAHF_LM | CPUID_EXT3_3DNOWPREFETCH,
 .features[FEAT_7_0_EBX] =
 CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_BMI1 |
 CPUID_7_0_EBX_AVX2 | CPUID_7_0_EBX_SMEP |
@@ -1195,7 +1195,7 @@ static X86CPUDefinition builtin_x86_defs[] = {
 CPUID_EXT2_LM | CPUID_EXT2_RDTSCP | CPUID_EXT2_NX |
 CPUID_EXT2_SYSCALL,
 .features[FEAT_8000_0001_ECX] =
-CPUID_EXT3_LAHF_LM | CPUID_EXT3_3DNOWPREFETCH,
+CPUID_EXT3_ABM | CPUID_EXT3_LAHF_LM | CPUID_EXT3_3DNOWPREFETCH,
 .features[FEAT_7_0_EBX] =
 CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_BMI1 |
 CPUID_7_0_EBX_HLE | CPUID_7_0_EBX_AVX2 | CPUID_7_0_EBX_SMEP |
-- 
2.4.3
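
As a rough illustration of the flag handling in the patch above (a standalone sketch, not QEMU code): the SKETCH_* macros and sketch_* helpers are hypothetical names. The bit positions are the architectural ones for CPUID leaf 0x80000001 ECX, and the add-then-remove order in sketch_compat_apply() is an assumption about how the compat helper applies its two mask arguments.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.80000001H:ECX feature bits (names are local to this sketch). */
#define SKETCH_EXT3_LAHF_LM (1u << 0)
#define SKETCH_EXT3_ABM     (1u << 5)

/* True if the ABM flag (covering LZCNT and POPCNT) is advertised. */
static bool sketch_has_abm(uint32_t ext3_ecx)
{
    return (ext3_ecx & SKETCH_EXT3_ABM) != 0;
}

/*
 * Model of a per-machine-type compat adjustment: set the bits in
 * feat_add, then clear the bits in feat_remove.
 */
static uint32_t sketch_compat_apply(uint32_t features, uint32_t feat_add,
                                    uint32_t feat_remove)
{
    features |= feat_add;
    features &= ~feat_remove;
    return features;
}
```

With feat_add of 0 and feat_remove set to the ABM bit, as in the pc_compat_2_3() hunks, older machine types keep the previous feature word without ABM, while new machine types expose it.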

Re: [Qemu-devel] [PATCH qemu v10 14/14] spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW)

2015-07-07 Thread Alexey Kardashevskiy


On 07/07/2015 07:33 PM, Thomas Huth wrote:

On Mon,  6 Jul 2015 12:11:10 +1000
Alexey Kardashevskiy  wrote:


This adds support for Dynamic DMA Windows (DDW) option defined by
the SPAPR specification which allows to have additional DMA window(s)

This implements DDW for emulated and VFIO devices. As all TCE root regions
are mapped at 0 and 64bit long (and actual tables are child regions),
this replaces memory_region_add_subregion() with _overlap() to make
QEMU memory API happy.

This reserves RTAS token numbers for DDW calls.

This implements helpers to interact with VFIO kernel interface.

This changes the TCE table migration descriptor to support dynamic
tables as from now on, PHB will create as many stub TCE table objects
as PHB can possibly support but not all of them might be initialized at
the time of migration because DDW might or might not be requested by
the guest.

The "ddw" property is enabled by default on a PHB but for compatibility
the pseries-2.3 machine and older disable it.

This implements DDW for VFIO. The host kernel support is required.
This adds a "levels" property to PHB to control the number of levels
in the actual TCE table allocated by the host kernel, 0 is the default
value to tell QEMU to calculate the correct value. Current hardware
supports up to 5 levels.

The existing Linux guests try creating one additional huge DMA window
with 64K or 16MB pages and map the entire guest RAM to it. If this succeeds,
the guest switches to dma_direct_ops and never calls TCE hypercalls
(H_PUT_TCE,...) again. This enables VFIO devices to use the entire RAM
and not waste time on map/unmap later. This adds a "dma64_win_addr"
property which is a bus address for the 64bit window and by default
set to 0x800... as this is what the modern POWER8 hardware
uses and this allows having emulated and VFIO devices on the same bus.

This adds 4 RTAS handlers:
* ibm,query-pe-dma-window
* ibm,create-pe-dma-window
* ibm,remove-pe-dma-window
* ibm,reset-pe-dma-window
These are registered from type_init() callback.

These RTAS handlers are implemented in a separate file to avoid polluting
spapr_iommu.c with PCI.

Signed-off-by: Alexey Kardashevskiy 
---

...

diff --git a/hw/ppc/spapr_rtas_ddw.c b/hw/ppc/spapr_rtas_ddw.c
new file mode 100644
index 000..7539c6a
--- /dev/null
+++ b/hw/ppc/spapr_rtas_ddw.c
@@ -0,0 +1,300 @@
+/*
+ * QEMU sPAPR Dynamic DMA windows support
+ *
+ * Copyright (c) 2014 Alexey Kardashevskiy, IBM Corporation.


Happy new year?


+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License,
+ *  or (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+
+#include "qemu/error-report.h"
+#include "hw/ppc/spapr.h"
+#include "hw/pci-host/spapr.h"
+#include "trace.h"
+
+static int spapr_phb_get_active_win_num_cb(Object *child, void *opaque)
+{
+sPAPRTCETable *tcet;
+
+tcet = (sPAPRTCETable *) object_dynamic_cast(child, TYPE_SPAPR_TCE_TABLE);
+if (tcet && tcet->enabled) {
+++*(unsigned *)opaque;
+}
+return 0;
+}
+
+static unsigned spapr_phb_get_active_win_num(sPAPRPHBState *sphb)
+{
+unsigned ret = 0;
+
+object_child_foreach(OBJECT(sphb), spapr_phb_get_active_win_num_cb, &ret);
+
+return ret;
+}
+
+static int spapr_phb_get_free_liobn_cb(Object *child, void *opaque)
+{
+sPAPRTCETable *tcet;
+
+tcet = (sPAPRTCETable *) object_dynamic_cast(child, TYPE_SPAPR_TCE_TABLE);
+if (tcet && !tcet->enabled) {
+*(uint32_t *)opaque = tcet->liobn;
+return 1;
+}
+return 0;
+}
+
+static unsigned spapr_phb_get_free_liobn(sPAPRPHBState *sphb)
+{
+uint32_t liobn = 0;
+
+object_child_foreach(OBJECT(sphb), spapr_phb_get_free_liobn_cb, &liobn);
+
+return liobn;
+}
+
+static uint32_t spapr_query_mask(struct ppc_one_seg_page_size *sps,
+ uint64_t page_mask)
+{
+int i, j;
+uint32_t mask = 0;
+const struct { int shift; uint32_t mask; } masks[] = {
+{ 12, RTAS_DDW_PGSIZE_4K },
+{ 16, RTAS_DDW_PGSIZE_64K },
+{ 24, RTAS_DDW_PGSIZE_16M },
+{ 25, RTAS_DDW_PGSIZE_32M },
+{ 26, RTAS_DDW_PGSIZE_64M },
+{ 27, RTAS_DDW_PGSIZE_128M },
+{ 28, RTAS_DDW_PGSIZE_256M },
+{ 34, RTAS_DDW_PGSIZE_16G },
+};
+
+for (i = 0; i < PPC_PAGE_SIZES_MAX_SZ; i++) {
+for (j = 0; j < ARRAY_SIZE(masks); ++j) {
+if ((sps[i].page_shift == masks[j].shift) &&
+(page_mask & (1U

Re: [Qemu-devel] [PATCH for-2.4] watchdog/diag288: correctly register for system reset requests

2015-07-07 Thread Peter Crosthwaite

On Tue, Jul 7, 2015 at 1:53 AM, Cornelia Huck  wrote:
> From: Xu Wang 
>
> The diag288 watchdog is no sysbus device, therefore it doesn't get
> triggered on resets automatically using dc->reset.
>
> Let's register the reset handler manually, so we get correctly notified
> again when a system reset was requested. Also reset the watchdog on
> subsystem resets that don't trigger a full system reset.
>
> Signed-off-by: Xu Wang 
> Reviewed-by: David Hildenbrand 
> Signed-off-by: Cornelia Huck 
> ---
>  hw/s390x/s390-virtio-ccw.c | 6 +-
>  hw/watchdog/wdt_diag288.c  | 8 
>  2 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> index 3d20d6a..4c51d1a 100644
> --- a/hw/s390x/s390-virtio-ccw.c
> +++ b/hw/s390x/s390-virtio-ccw.c
> @@ -36,7 +36,7 @@ typedef struct S390CcwMachineState {
>
>  void io_subsystem_reset(void)
>  {
> -DeviceState *css, *sclp, *flic;
> +DeviceState *css, *sclp, *flic, *diag288;
>
>  css = DEVICE(object_resolve_path_type("", "virtual-css-bridge", NULL));
>  if (css) {
> @@ -51,6 +51,10 @@ void io_subsystem_reset(void)
>  if (flic) {
>  qdev_reset_all(flic);
>  }
> +diag288 = DEVICE(object_resolve_path_type("", "diag288", NULL));
> +if (diag288) {
> +qdev_reset_all(diag288);
> +}
>  }
>
>  static int virtio_ccw_hcall_notify(const uint64_t *args)
> diff --git a/hw/watchdog/wdt_diag288.c b/hw/watchdog/wdt_diag288.c
> index 1185e06..2a885a4 100644
> --- a/hw/watchdog/wdt_diag288.c
> +++ b/hw/watchdog/wdt_diag288.c
> @@ -40,6 +40,13 @@ static void wdt_diag288_reset(DeviceState *dev)
>  timer_del(diag288->timer);
>  }
>
> +static void diag288_reset(void *opaque)
> +{
> +DeviceState *diag288 = opaque;
> +
> +wdt_diag288_reset(diag288);
> +}
> +
>  static void diag288_timer_expired(void *dev)
>  {
>  qemu_log_mask(CPU_LOG_RESET, "Watchdog timer expired.\n");
> @@ -80,6 +87,7 @@ static void wdt_diag288_realize(DeviceState *dev, Error **errp)
>  {
>  DIAG288State *diag288 = DIAG288(dev);
>
> +qemu_register_reset(diag288_reset, diag288);

Doesn't seem right. Even if it is not a SBD it should still sit in the
QOM tree in a place where the reset is reached. Where is this device
in the QOM tree?

I.E. What string do you get with an object_get_canonical_path() of the
obj after machine init?

Regards,
Peter

>  diag288->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, diag288_timer_expired,
>dev);
>  }
> --
> 2.4.5
>
>

[Qemu-devel] [PATCH 2/2] tests: test rx recovery from cont

2015-07-07 Thread Jason Wang

Rx should be recovered after cont.

Signed-off-by: Jason Wang 
---
 tests/virtio-net-test.c | 48 
 1 file changed, 48 insertions(+)

diff --git a/tests/virtio-net-test.c b/tests/virtio-net-test.c
index 97aa442..aeae80c 100644
--- a/tests/virtio-net-test.c
+++ b/tests/virtio-net-test.c
@@ -138,6 +138,45 @@ static void tx_test(const QVirtioBus *bus, QVirtioDevice *dev,
 g_assert_cmpstr(buffer, ==, "TEST");
 }
 
+static void rx_stop_cont_test(const QVirtioBus *bus, QVirtioDevice *dev,
+  QGuestAllocator *alloc, QVirtQueue *vq,
+  int socket)
+{
+uint64_t req_addr;
+uint32_t free_head;
+char test[] = "TEST";
+char buffer[64];
+int len = htonl(sizeof(test));
+struct iovec iov[] = {
+{
+.iov_base = &len,
+.iov_len = sizeof(len),
+}, {
+.iov_base = test,
+.iov_len = sizeof(test),
+},
+};
+int ret;
+
+req_addr = guest_alloc(alloc, 64);
+
+free_head = qvirtqueue_add(vq, req_addr, 64, true, false);
+qvirtqueue_kick(bus, dev, vq, free_head);
+
+qmp("{ 'execute' : 'stop'}");
+
+ret = iov_send(socket, iov, 2, 0, sizeof(len) + sizeof(test));
+g_assert_cmpint(ret, ==, sizeof(test) + sizeof(len));
+
+qmp("{ 'execute' : 'cont'}");
+
+qvirtio_wait_queue_isr(bus, dev, vq, QVIRTIO_NET_TIMEOUT_US);
+memread(req_addr + 12, buffer, sizeof(test));
+g_assert_cmpstr(buffer, ==, "TEST");
+
+guest_free(alloc, req_addr);
+}
+
 static void send_recv_test(const QVirtioBus *bus, QVirtioDevice *dev,
QGuestAllocator *alloc, QVirtQueue *rvq,
QVirtQueue *tvq, int socket)
@@ -146,6 +185,13 @@ static void send_recv_test(const QVirtioBus *bus, QVirtioDevice *dev,
 tx_test(bus, dev, alloc, tvq, socket);
 }
 
+static void stop_cont_test(const QVirtioBus *bus, QVirtioDevice *dev,
+   QGuestAllocator *alloc, QVirtQueue *rvq,
+   QVirtQueue *tvq, int socket)
+{
+rx_stop_cont_test(bus, dev, alloc, rvq, socket);
+}
+
 static void pci_basic(gconstpointer data)
 {
 QVirtioPCIDevice *dev;
@@ -204,6 +250,8 @@ int main(int argc, char **argv)
 g_test_init(&argc, &argv, NULL);
 #ifndef _WIN32
 qtest_add_data_func("/virtio/net/pci/basic", send_recv_test, pci_basic);
+qtest_add_data_func("/virtio/net/pci/rx_stop_cont",
+stop_cont_test, pci_basic);
 #endif
 qtest_add_func("/virtio/net/pci/hotplug", hotplug);
 
-- 
2.1.4

[Qemu-devel] [PATCH 1/2] tests: introduce basic pci test for virtio-net

2015-07-07 Thread Jason Wang

Signed-off-by: Jason Wang 
---
 tests/Makefile  |   2 +-
 tests/virtio-net-test.c | 184 ++--
 2 files changed, 178 insertions(+), 8 deletions(-)

diff --git a/tests/Makefile b/tests/Makefile
index eff5e11..56179e5 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -372,7 +372,7 @@ tests/ne2000-test$(EXESUF): tests/ne2000-test.o
 tests/wdt_ib700-test$(EXESUF): tests/wdt_ib700-test.o
 tests/virtio-balloon-test$(EXESUF): tests/virtio-balloon-test.o
 tests/virtio-blk-test$(EXESUF): tests/virtio-blk-test.o $(libqos-virtio-obj-y)
-tests/virtio-net-test$(EXESUF): tests/virtio-net-test.o $(libqos-pc-obj-y)
+tests/virtio-net-test$(EXESUF): tests/virtio-net-test.o $(libqos-pc-obj-y) $(libqos-virtio-obj-y)
 tests/virtio-rng-test$(EXESUF): tests/virtio-rng-test.o $(libqos-pc-obj-y)
 tests/virtio-scsi-test$(EXESUF): tests/virtio-scsi-test.o $(libqos-virtio-obj-y)
 tests/virtio-9p-test$(EXESUF): tests/virtio-9p-test.o
diff --git a/tests/virtio-net-test.c b/tests/virtio-net-test.c
index ea7478c..97aa442 100644
--- a/tests/virtio-net-test.c
+++ b/tests/virtio-net-test.c
@@ -10,20 +10,191 @@
 #include 
 #include 
 #include "libqtest.h"
+#include "qemu-common.h"
+#include "qemu/sockets.h"
 #include "qemu/osdep.h"
-#include "libqos/pci.h"
+#include "qemu/iov.h"
+#include "libqos/pci-pc.h"
+#include "libqos/virtio.h"
+#include "libqos/virtio-pci.h"
+#include "libqos/malloc.h"
+#include "libqos/malloc-pc.h"
+#include "libqos/malloc-generic.h"
+#include "qemu/bswap.h"
 
 #define PCI_SLOT_HP 0x06
+#define PCI_SLOT    0x04
+#define PCI_FN      0x00
 
-/* Tests only initialization so far. TODO: Replace with functional tests */
-static void pci_nop(void)
+#define QVIRTIO_NET_TIMEOUT_US (30 * 1000 * 1000)
+
+static void test_end(void)
+{
+qtest_end();
+}
+
+#ifndef _WIN32
+
+static QVirtioPCIDevice *virtio_net_pci_init(QPCIBus *bus, int slot)
+{
+QVirtioPCIDevice *dev;
+
+dev = qvirtio_pci_device_find(bus, QVIRTIO_NET_DEVICE_ID);
+g_assert(dev != NULL);
+g_assert_cmphex(dev->vdev.device_type, ==, QVIRTIO_NET_DEVICE_ID);
+
+qvirtio_pci_device_enable(dev);
+qvirtio_reset(&qvirtio_pci, &dev->vdev);
+qvirtio_set_acknowledge(&qvirtio_pci, &dev->vdev);
+qvirtio_set_driver(&qvirtio_pci, &dev->vdev);
+
+return dev;
+}
+
+static QPCIBus *pci_test_start(int socket)
+{
+char *cmdline;
+
+cmdline = g_strdup_printf("-netdev socket,fd=%d,id=hs0 -device "
+  "virtio-net-pci,netdev=hs0", socket);
+qtest_start(cmdline);
+g_free(cmdline);
+
+return qpci_init_pc();
+}
+
+static void driver_init(const QVirtioBus *bus, QVirtioDevice *dev)
+{
+uint32_t features;
+
+features = qvirtio_get_features(bus, dev);
+features = features & ~(QVIRTIO_F_BAD_FEATURE |
+QVIRTIO_F_RING_INDIRECT_DESC |
+QVIRTIO_F_RING_EVENT_IDX);
+qvirtio_set_features(bus, dev, features);
+
+qvirtio_set_driver_ok(bus, dev);
+}
+
+static void rx_test(const QVirtioBus *bus, QVirtioDevice *dev,
+QGuestAllocator *alloc, QVirtQueue *vq,
+int socket)
 {
+uint64_t req_addr;
+uint32_t free_head;
+char test[] = "TEST";
+char buffer[64];
+int len = htonl(sizeof(test));
+struct iovec iov[] = {
+{
+.iov_base = &len,
+.iov_len = sizeof(len),
+}, {
+.iov_base = test,
+.iov_len = sizeof(test),
+},
+};
+int ret;
+
+req_addr = guest_alloc(alloc, 64);
+
+free_head = qvirtqueue_add(vq, req_addr, 64, true, false);
+qvirtqueue_kick(bus, dev, vq, free_head);
+
+ret = iov_send(socket, iov, 2, 0, sizeof(len) + sizeof(test));
+g_assert_cmpint(ret, ==, sizeof(test) + sizeof(len));
+
+qvirtio_wait_queue_isr(bus, dev, vq, QVIRTIO_NET_TIMEOUT_US);
+memread(req_addr + 12, buffer, sizeof(test));
+g_assert_cmpstr(buffer, ==, "TEST");
+
+guest_free(alloc, req_addr);
 }
 
+static void tx_test(const QVirtioBus *bus, QVirtioDevice *dev,
+QGuestAllocator *alloc, QVirtQueue *vq,
+int socket)
+{
+uint64_t req_addr;
+uint32_t free_head;
+uint32_t len;
+char buffer[64];
+int ret;
+
+req_addr = guest_alloc(alloc, 64);
+memwrite(req_addr + 12, "TEST", 4);
+
+free_head = qvirtqueue_add(vq, req_addr, 64, false, false);
+qvirtqueue_kick(bus, dev, vq, free_head);
+
+qvirtio_wait_queue_isr(bus, dev, vq, QVIRTIO_NET_TIMEOUT_US);
+guest_free(alloc, req_addr);
+
+ret = qemu_recv(socket, &len, sizeof(len), 0);
+g_assert_cmpint(ret, ==, sizeof(len));
+len = ntohl(len);
+
+ret = qemu_recv(socket, buffer, len, 0);
+g_assert_cmpstr(buffer, ==, "TEST");
+}
+
+static void send_recv_test(const QVirtioBus *bus, QVirtioDevice *dev,
+   QGuestAllocator *alloc, QVirtQueue *rvq,
+

Re: [Qemu-devel] [PATCH qemu v10 13/14] vfio: spapr: Add SPAPR IOMMU v2 support (DMA memory preregistering)

2015-07-07 Thread Alexey Kardashevskiy


On 07/07/2015 08:21 PM, Thomas Huth wrote:

On Tue, 7 Jul 2015 20:05:25 +1000
Alexey Kardashevskiy  wrote:


On 07/07/2015 05:23 PM, Thomas Huth wrote:

On Mon,  6 Jul 2015 12:11:09 +1000
Alexey Kardashevskiy  wrote:

...

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 8eacfd7..0c7ba8c 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -488,6 +488,76 @@ static void vfio_listener_release(VFIOContainer *container)
   memory_listener_unregister(&container->iommu_data.type1.listener);
   }

+static void vfio_ram_do_region(VFIOContainer *container,
+  MemoryRegionSection *section, unsigned long req)
+{
+int ret;
+struct vfio_iommu_spapr_register_memory reg = { .argsz = sizeof(reg) };
+
+if (!memory_region_is_ram(section->mr) ||
+memory_region_is_skip_dump(section->mr)) {
+return;
+}
+
+if (unlikely((section->offset_within_region & (getpagesize() - 1)))) {
+error_report("%s received unaligned region", __func__);
+return;
+}
+
+reg.vaddr = (__u64) memory_region_get_ram_ptr(section->mr) +


We're in userspace here ... I think it would be better to use uint64_t
instead of the kernel-type __u64.


We are calling the kernel here - @reg is a kernel-defined struct.


If you grep for __u64 in the QEMU sources, you'll see that hardly
anybody is using this type - even if calling ioctls. So for
consistency, I'd really suggest to use uint64_t here.




I am not using it, I am packing data to a struct. So does vfio_dma_map() 
already.





@@ -698,14 +768,18 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)

   container->iommu_data.type1.initialized = true;

-} else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
+} else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU) ||
+   ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_v2_IOMMU)) {
+bool v2 = !!ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_v2_IOMMU);


That "!!" sounds somewhat wrong here. I think you either want to check
for "ioctl() == 1" (because only in this case you can be sure that v2
is supported), or you can simply omit the "!!" because you're 100% sure
that the ioctl only returns 0 or 1 (and never a negative error code).



The host kernel does not return an error on these ioctls, it returns 0 or
1. And "!!" is shorter than "(bool)". VFIO_CHECK_EXTENSION for Type1 does
exactly the same already.


Simply using nothing instead is even shorter than using "!!". The
compiler is smart enough to convert from 0 and 1 to bool.
"!!" is IMHO quite ugly and should only be used when it is really
necessary.



IMHO it is not, but either way I'd rather follow the existing style,
especially if I do literally the same thing (checking the IOMMU version),
unless the original author tells me to convert all the existing occurrences
of "!!" to "!=0" (or something like this) before I post new ones.


Alex, should I get rid of "!!"s in the patch?
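
The three options debated above differ only when the call returns something other than 0 or 1. The following standalone sketch illustrates that; it is not QEMU code, and `fake_check_extension()` is a hypothetical stand-in for `ioctl(fd, VFIO_CHECK_EXTENSION, ...)` that simply returns the value it is given.

```c
#include <stdbool.h>

/* Hypothetical stand-in for ioctl(fd, VFIO_CHECK_EXTENSION, ext):
 * it returns whatever result the caller wants to simulate. */
static int fake_check_extension(int simulated_result)
{
    return simulated_result;
}

/* Option 1: "!!" collapses any non-zero value (including -1) to true. */
static bool supported_double_negation(int r)
{
    return !!fake_check_extension(r);
}

/* Option 2: implicit int -> bool conversion; same result as "!!". */
static bool supported_implicit(int r)
{
    bool v = fake_check_extension(r);
    return v;
}

/* Option 3: only an exact 1 counts as "supported". */
static bool supported_exact(int r)
{
    return fake_check_extension(r) == 1;
}
```

With a simulated result of 0 or 1 all three helpers agree. With -1 (an error return), the first two report "supported" and only the `== 1` form reports "not supported", which is the case the review comment is worried about.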





@@ -717,19 +791,36 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
* when container fd is closed so we do not call it explicitly
* in this file.
*/
-ret = ioctl(fd, VFIO_IOMMU_ENABLE);
-if (ret) {
-error_report("vfio: failed to enable container: %m");
-ret = -errno;
-goto free_container_exit;
+if (!v2) {
+ret = ioctl(fd, VFIO_IOMMU_ENABLE);
+if (ret) {
+error_report("vfio: failed to enable container: %m");
+ret = -errno;
+goto free_container_exit;
+}
   }

   container->iommu_data.type1.listener = vfio_memory_listener;
-container->iommu_data.release = vfio_listener_release;
-
   memory_listener_register(&container->iommu_data.type1.listener,
container->space->as);

+if (!v2) {
+container->iommu_data.release = vfio_listener_release;
+} else {
+container->iommu_data.release = vfio_spapr_listener_release_v2;
+container->iommu_data.register_listener =
+vfio_ram_memory_listener;
+memory_listener_register(&container->iommu_data.register_listener,
+ &address_space_memory);
+
+if (container->iommu_data.ram_reg_error) {
+error_report("vfio: RAM memory listener initialization failed for container");


Line > 80 columns?


afaik user visible strings are an exception in QEMU and kernel.


You're right for the kernel, but AFAIK QEMU (currently still) has a
hard limit at 80 columns.


This is not an error, it is a warning, and in fact nobody is enforcing it
(which is a good thing); for example, VFIO already has longer lines.




--
Alexey

Re: [Qemu-devel] [PATCH v4 00/10] Consolidate crypto APIs & implementations

2015-07-07 Thread Gonglei

On 2015/7/7 18:03, Paolo Bonzini wrote:
> 
> 
> On 01/07/2015 19:10, Daniel P. Berrange wrote:
>> This small series covers the crypto consolidation patches
>> I previously posted:
>>
>> RFC: https://lists.nongnu.org/archive/html/qemu-devel/2015-04/msg02038.html
>>  v1: https://lists.nongnu.org/archive/html/qemu-devel/2015-05/msg04267.html
>>  v2: https://lists.nongnu.org/archive/html/qemu-devel/2015-06/msg00601.html
>>  v3: https://lists.nongnu.org/archive/html/qemu-devel/2015-06/msg05059.html
>>
>> Currently there are 5 main places in QEMU which use some
>> form of cryptographic hash or cipher algorithm. These are
>> the quorum block driver (hash), qcow{1,2} block driver (cipher),
>> VNC password auth (cipher), VNC websockets (hash) and some
>> of the CPU instruction emulation (cipher).
>>
>> For ciphers the code is using the in-tree implementations
>> of AES and/or the RFB cripple-DES. While there is nothing
>> broken about these implementations, it is nonetheless
>> desirable to be able to use the GNUTLS provided impls in
>> cases where we are already linking to GNUTLS. This will
>> allow QEMU to use FIPS certified implementations, which
>> have been well audited, have some protection against
>> side-channel leakage and are generally actively maintained
>> by people knowledgeable about encryption.
>>
>> For hash digests the code is already using GNUTLS APIs.
>>
>> With the TLS work, and possible future improved block device
>> encryption, there will be more general purpose crypto APIs
>> needed in QEMU.
>>
>> It is undesirable to continue to litter the code with
>> countless #ifdef WITH_GNUTLS conditionals, as it makes
>> it increasingly hard to understand the code.
>>
>> The goal of this series is to thus consolidate all the
>> crypto code into a single logical place in QEMU - the
>> source in $GIT/crypto and headers in $GIT/include/crypto.
>> The code in this location will provide QEMU internal
>> APIs for hash digests, ciphers, and later TLS and block
>> encryption primitives. The implementations will be
>> backed by GNUTLS, and either libgcrypt or nettle depending
>> on which of these GNUTLS is linking to. In the case where
>> GNUTLS is disabled at build time, we'll still keep the
>> built-in AES & RFB-cripple-DES implementations available
>> so we have no regression vs today's level of support.
>>
>> The callers of the crypto code can now be unconditionally
>> compiled and, if needed, they can check the availability
>> of algorithms they want at runtime and report clear errors
>> to the CLI or QMP if not available. This is a minor
>> difference in behaviour for the quorum block driver which
>> would previously be disabled at compile time if gnutls
>> was not available.
>>
>> A future posting will include the TLS crypto APIs.
>>
>> I have not attempted to convert the CPU emulation code to
>> use the new crypto APIs, since that code appears to have
>> quite specific need for access to the low level internal
>> stages of the AES algorithm. So I've left it using the
>> QEMU built-in AES code.
>>
>> I've added myself in the MAINTAINERS file for the new
>> directories, since it wasn't clear if anyone else on the
>> existing QEMU maintainer list had any interest / knowledge
>> in maintaining the crypto related pieces.
>>
>> Changes since v3:
>>
>>   - Removed need for crypto-internal.h file which was
>> missing from v3 patches sent.
>>   - Resolve conflicts with error reporting & main loop
>> API changes / cleanup on master
>>
>> Changes since v2:
>>
>>   - Remove _(..) gettext markers from error messages
>>   - Fix array bounds check in hash module (Richard Henderson)
>>   - Fix null dereference in freeing of gcrypt cipher impl
>> (Gonglei)
>>
>> Changes since v1:
>>
>>   - Add explicit algorithm constants for each AES key size,
>> instead of inferring it from array length
>>   - Share code for munging des rfb key bit order
>>   - Share code for validating key array size vs algorithm
>>   - Refactor built-in cipher impl to reduce number of big
>> switch statements
>>   - Fix uninitialized 'Error *err' var
>>   - Add comments in places where error reporting should be
>>
>> Daniel P. Berrange (10):
>>   crypto: introduce new module for computing hash digests
>>   crypto: move built-in AES implementation into crypto/
>>   crypto: move built-in D3DES implementation into crypto/
>>   crypto: introduce generic cipher API & built-in implementation
>>   crypto: add a gcrypt cipher implementation
>>   crypto: add a nettle cipher implementation
>>   block: convert quorum blockdrv to use crypto APIs
>>   ui: convert VNC websockets to use crypto APIs
>>   block: convert qcow/qcow2 to use generic cipher API
>>   ui: convert VNC to use generic cipher API
>>
>>  MAINTAINERS   |   7 +
>>  Makefile.objs |   1 +
>>  block/Makefile.objs   |   2 +-
>>  block/qcow.c  | 102 ++---
>>  block/qcow2-cluster.c

Re: [Qemu-devel] [BUG/RFC] Two cpus are not brought up normally in SLES11 sp3 VM after reboot

2015-07-07 Thread Igor Mammedov

On Mon, 6 Jul 2015 17:59:10 +0800
zhanghailiang  wrote:

> On 2015/7/6 16:45, Paolo Bonzini wrote:
> >
> >
> > On 06/07/2015 09:54, zhanghailiang wrote:
> >>
> >> From the host, we found that the QEMU vcpu1 and vcpu7 threads were not
> >> consuming any CPU (they should be in the idle state).
> >> All of the VCPUs' stacks on the host look like this:
> >>
> >> [] kvm_vcpu_block+0x65/0xa0 [kvm]
> >> [] __vcpu_run+0xd1/0x260 [kvm]
> >> [] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
> >> [] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
> >> [] do_vfs_ioctl+0x8b/0x3b0
> >> [] sys_ioctl+0xa1/0xb0
> >> [] system_call_fastpath+0x16/0x1b
> >> [<2ab9fe1f99a7>] 0x2ab9fe1f99a7
> >> [] 0x
> >>
> >> We looked into the kernel code that could lead to the above 'Stuck'
> >> warning,
In current upstream there isn't any printk(...Stuck...) left, since that code
path has been reworked.
I've often seen this on an over-committed host during guest CPU up/down torture
tests.
Could you update the guest kernel to upstream and see if the issue reproduces?

> >> and found that the only possibility is that the emulation of the 'cpuid'
> >> instruction in kvm/qemu has something wrong.
> >> But since we can't reproduce this problem, we are not quite sure.
> >> Is it possible that the cpuid emulation in kvm/qemu has a bug?
> >
> > Can you explain the relationship to the cpuid emulation?  What do the
> > traces say about vcpus 1 and 7?
> 
> OK, we searched the VM's kernel code for the 'Stuck' message, and it is
> located in do_boot_cpu(). It's in BSP context; the call chain is:
> BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> do_boot_cpu()
> -> wakeup_secondary_via_INIT() to trigger the APs.
> It waits 5s for the APs to start up; if some AP does not start up normally,
> it prints 'CPU%d Stuck' or 'CPU%d: Not responding'.
>
> If it prints 'Stuck', it means the AP has received the SIPI interrupt and
> has begun to execute the code at 'ENTRY(trampoline_data)' (trampoline_64.S),
> but got stuck somewhere before smp_callin() (smpboot.c).
> The following is the startup process of the BSP and the APs.
> BSP:
> start_kernel()
>   ->smp_init()
>     ->smp_boot_cpus()
>       ->do_boot_cpu()
>         ->start_ip = trampoline_address(); // set the address that the AP will go to execute
>         ->wakeup_secondary_cpu_via_init(); // kick the secondary CPU
>         ->for (timeout = 0; timeout < 5; timeout++)
>             if (cpumask_test_cpu(cpu, cpu_callin_mask)) break; // check whether the AP has started up
>
> APs:
> ENTRY(trampoline_data) (trampoline_64.S)
>   ->ENTRY(secondary_startup_64) (head_64.S)
>     ->start_secondary() (smpboot.c)
>       ->cpu_init();
>       ->smp_callin();
>       ->cpumask_set_cpu(cpuid, cpu_callin_mask); // Note: if the AP gets here, the BSP does not print the error message.
> 
> From the above call chain, we can be sure that the AP got stuck between
> trampoline_data and the cpumask_set_cpu() in smp_callin(). We looked through
> this code path carefully and found only one 'hlt' instruction that could
> block the process.
> It is located in trampoline_data():
> 
> ENTRY(trampoline_data)
>  ...
> 
>   call    verify_cpu          # Verify the cpu supports long mode
>   testl   %eax, %eax          # Check for return code
>   jnz     no_longmode
> 
>  ...
> 
> no_longmode:
>   hlt
>   jmp no_longmode
> 
> In verify_cpu(), the only sensitive instruction we could find that causes a
> VM exit from non-root mode is 'cpuid'.
> This is why we suspect that the cpuid emulation in KVM/QEMU is wrong, causing
> verify_cpu to fail.
>
> From the messages in the VM, we know that something is wrong with vcpu1 and
> vcpu7.
> [5.060042] CPU1: Stuck ??
> [   10.170815] CPU7: Stuck ??
> [   10.171648] Brought up 6 CPUs
> 
> Besides, the following is the CPU information obtained from the host.
> 80FF72F5-FF6D-E411-A8C8-00821800:/home/fsp/hrg # virsh 
> qemu-monitor-command instance-000
> * CPU #0: pc=0x7f64160c683d thread_id=68570
>CPU #1: pc=0x810301f1 (halted) thread_id=68573
>CPU #2: pc=0x810301e2 (halted) thread_id=68575
>CPU #3: pc=0x810301e2 (halted) thread_id=68576
>CPU #4: pc=0x810301e2 (halted) thread_id=68577
>CPU #5: pc=0x810301e2 (halted) thread_id=68578
>CPU #6: pc=0x810301e2 (halted) thread_id=68583
>CPU #7: pc=0x810301f1 (halted) thread_id=68584
> 
> Oh, I also forgot to mention in the above message that we have bound each
> vCPU to a different physical CPU on the host.
> 
> Thanks,
> zhanghailiang

Re: [Qemu-devel] [PULL v3 for-2.4 08/11] virtio-ccw: migrate ->revision

2015-07-07 Thread Christian Borntraeger

Am 02.07.2015 um 16:10 schrieb Cornelia Huck:
> We need to migrate the revision field as well. No compatibility
> concerns as we already introduced migration of ->config_vector in
> this release.
> 
> Signed-off-by: Cornelia Huck 
> ---
>  hw/s390x/virtio-ccw.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
> index 8597ac4..d631337 100644
> --- a/hw/s390x/virtio-ccw.c
> +++ b/hw/s390x/virtio-ccw.c
> @@ -1472,6 +1472,7 @@ static void virtio_ccw_save_config(DeviceState *d, QEMUFile *f)
>  qemu_put_be16(f, vdev->config_vector);
>  qemu_put_be64(f, dev->routes.adapter.ind_offset);
>  qemu_put_byte(f, dev->thinint_isc);
> +qemu_put_be32(f, dev->revision);
>  }
> 
>  static int virtio_ccw_load_config(DeviceState *d, QEMUFile *f)
> @@ -1512,6 +1513,7 @@ static int virtio_ccw_load_config(DeviceState *d, QEMUFile *f)
> dev->thinint_isc, true, false,
> &dev->routes.adapter.adapter_id);
>  }
> +dev->revision = qemu_get_be32(f);
> 
>  return 0;
>  }
> 

This broke migration:

2015-07-07T11:22:55.570968Z qemu-system-s390x: VQ 39 address 0x0 inconsistent with Host index 0x100
2015-07-07T11:22:55.571008Z qemu-system-s390x: error while loading state for instance 0x0 of 

Seems that revision is used before it is loaded, something like

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index d631337..f524140 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -1448,6 +1448,7 @@ static void virtio_ccw_save_config(DeviceState *d, QEMUFile *f)
 VirtIODevice *vdev = virtio_ccw_get_vdev(s);
 
 subch_device_save(s, f);
+qemu_put_be32(f, dev->revision);
 if (dev->indicators != NULL) {
 qemu_put_be32(f, dev->indicators->len);
 qemu_put_be64(f, dev->indicators->addr);
@@ -1472,7 +1473,6 @@ static void virtio_ccw_save_config(DeviceState *d, QEMUFile *f)
 qemu_put_be16(f, vdev->config_vector);
 qemu_put_be64(f, dev->routes.adapter.ind_offset);
 qemu_put_byte(f, dev->thinint_isc);
-qemu_put_be32(f, dev->revision);
 }
 
 static int virtio_ccw_load_config(DeviceState *d, QEMUFile *f)
@@ -1484,6 +1484,7 @@ static int virtio_ccw_load_config(DeviceState *d, QEMUFile *f)
 
 s->driver_data = dev;
 subch_device_load(s, f);
+dev->revision = qemu_get_be32(f);
 len = qemu_get_be32(f);
 if (len != 0) {
 dev->indicators = get_indicator(qemu_get_be64(f), len);
@@ -1513,7 +1514,6 @@ static int virtio_ccw_load_config(DeviceState *d, QEMUFile *f)
dev->thinint_isc, true, false,
&dev->routes.adapter.adapter_id);
 }
-dev->revision = qemu_get_be32(f);
 
 return 0;
 }

Seems to do the trick.
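
The underlying rule is that a migration stream is read in exactly the order it was written, so a field that later load steps depend on has to be written (and therefore read) before them. The sketch below illustrates this with a toy byte stream; `Stream`, `Dev`, `put_be32()`, `get_be32()`, `dev_save()` and `dev_load()` are hypothetical names for this illustration and are not the QEMUFile API or the virtio-ccw code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat byte stream, standing in for a migration stream. */
typedef struct {
    uint8_t buf[64];
    size_t pos;
} Stream;

/* Append a 32-bit value in big-endian byte order. */
static void put_be32(Stream *s, uint32_t v)
{
    s->buf[s->pos++] = (uint8_t)(v >> 24);
    s->buf[s->pos++] = (uint8_t)(v >> 16);
    s->buf[s->pos++] = (uint8_t)(v >> 8);
    s->buf[s->pos++] = (uint8_t)v;
}

/* Read back a 32-bit big-endian value and advance the position. */
static uint32_t get_be32(Stream *s)
{
    uint32_t v = ((uint32_t)s->buf[s->pos] << 24) |
                 ((uint32_t)s->buf[s->pos + 1] << 16) |
                 ((uint32_t)s->buf[s->pos + 2] << 8) |
                 (uint32_t)s->buf[s->pos + 3];
    s->pos += 4;
    return v;
}

/* Hypothetical device state: "revision" governs how "payload" is used. */
typedef struct {
    uint32_t revision;
    uint32_t payload;
} Dev;

/* Save writes revision first so that load can read it before anything
 * that depends on it. */
static void dev_save(Stream *s, const Dev *d)
{
    put_be32(s, d->revision);
    put_be32(s, d->payload);
}

/* Load mirrors save field by field, in the same order. */
static void dev_load(Stream *s, Dev *d)
{
    d->revision = get_be32(s);
    d->payload = get_be32(s);
}
```

Swapping the two reads in `dev_load()` without swapping the writes would silently assign each value to the wrong field, which is why the proposed fix moves the save and the load of `revision` together.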

Re: [Qemu-devel] [PATCH qemu v10 14/14] spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW)

2015-07-07 Thread Thomas Huth

On Tue, 7 Jul 2015 20:43:44 +1000
Alexey Kardashevskiy  wrote:

> On 07/07/2015 07:33 PM, Thomas Huth wrote:
> > On Mon,  6 Jul 2015 12:11:10 +1000
> > Alexey Kardashevskiy  wrote:
...
> >> +static void rtas_ibm_create_pe_dma_window(PowerPCCPU *cpu,
> >> +  sPAPRMachineState *spapr,
> >> +  uint32_t token, uint32_t nargs,
> >> +  target_ulong args,
> >> +  uint32_t nret, target_ulong rets)
> >> +{
> >> +sPAPRPHBState *sphb;
> >> +sPAPRTCETable *tcet = NULL;
> >> +uint32_t addr, page_shift, window_shift, liobn;
> >> +uint64_t buid;
> >> +long ret;
> >> +
> >> +if ((nargs != 5) || (nret != 4)) {
> >
> > Pascal bracket style again :-(
> 
> 
> Am I breaking any code design guideline here?

No, but my Pascal allergy causes me to sneeze here ;-)

> >> +goto param_error_exit;
> >> +}
> >> +
> >> +buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
> 
> But here braces are ok? :-/

You could remove them, too. But I did not need to sneeze here.

> >> +addr = rtas_ld(args, 0);
> >> +sphb = spapr_pci_find_phb(spapr, buid);
> >> +if (!sphb || !sphb->ddw_enabled) {
> >> +goto param_error_exit;
> >> +}
> >> +
> >> +page_shift = rtas_ld(args, 3);
> >> +window_shift = rtas_ld(args, 4);
> >> +liobn = spapr_phb_get_free_liobn(sphb);
> >> +
> >> +if (!liobn || !(sphb->page_size_mask & (1ULL << page_shift))) {
> >> +goto hw_error_exit;
> >> +}
> >> +
> >> +ret = spapr_phb_dma_init_window(sphb, liobn, page_shift,
> >> +1ULL << window_shift);
> >
> > As already mentioned in a comment on another patch in this series, I
> > think it might be better to do some sanity checks on the
> > window_shift value, too?
> 
> 
> Well, as you suggested, I added a check to spapr_phb_dma_init_window() 
> which makes this code return RTAS_OUT_HW_ERROR. Or I can add this here:
> 
> if (window_shift < page_shift) {
>  goto param_error_exit;
> }
> 
> and RTAS handler will return RTAS_OUT_PARAM_ERROR.
> SPAPR does not say what the correct response is in this case...

Both error codes sound OK to me here, so do whatever you think is best.
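
A sanity check of the kind discussed here could look roughly like the sketch below. `ddw_shifts_valid()` is a hypothetical helper, not part of the patch, and which RTAS error code to return on failure is left open, as in the thread. Besides the `window_shift < page_shift` case, it also rejects shifts of 64 or more, because `1ULL << n` is undefined in C for those values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true if the guest-supplied shifts describe
 * a window that can be expressed as 1ULL << window_shift and that holds at
 * least one page of size 1ULL << page_shift. */
static bool ddw_shifts_valid(uint32_t page_shift, uint32_t window_shift)
{
    /* Shifting a 64-bit value by 64 or more is undefined behaviour. */
    if (page_shift >= 64 || window_shift >= 64) {
        return false;
    }
    /* The window must not be smaller than a single page. */
    return window_shift >= page_shift;
}
```

A caller would run this before computing `1ULL << window_shift` and jump to whichever error exit is chosen when it returns false.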

> >> +
> >> +rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> >> +rtas_st(rets, 1, liobn);
> >> +rtas_st(rets, 2, tcet->bus_offset >> 32);
> >> +rtas_st(rets, 3, tcet->bus_offset & ((uint32_t) -1));
> >
> > Why don't you simply use 0xffffffff instead of ((uint32_t) -1) ?
> > That's shorter and much easier to understand at a first glance than
> > calculating the type-cast in your brain ;-)
> 
> 
> At a first glance I cannot tell if there are 7 or 8 or 9 "f"s in 
> 0xffffffff. I may accidentally add/remove one "f" and nobody will notice. 
> Such typecast of (-1) is quite typical.

But IMHO it's ugly to use it to mask a value to the lower 32 bits this
way. At least I had to read this twice to understand what you're
trying to achieve here. So if you don't like the 0xffffffff, what about
simply using:

rtas_st(rets, 3, (uint32_t)tcet->bus_offset);

?

 Thomas
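
For reference, the spellings discussed in this exchange all produce the same low word of a 64-bit value. The sketch below is standalone C with hypothetical helper names, written only to show that equivalence; it is not code from the patch.

```c
#include <stdint.h>

/* Mask with an all-ones uint32_t, as in the patch under review. */
static uint32_t low32_mask_cast(uint64_t v)
{
    return (uint32_t)(v & ((uint32_t)-1));
}

/* Mask with an explicit 32-bit all-ones literal. */
static uint32_t low32_mask_hex(uint64_t v)
{
    return (uint32_t)(v & 0xffffffffULL);
}

/* Plain narrowing cast: conversion to uint32_t keeps the low 32 bits. */
static uint32_t low32_cast(uint64_t v)
{
    return (uint32_t)v;
}

/* Upper half, matching the "bus_offset >> 32" store in the patch. */
static uint32_t high32(uint64_t v)
{
    return (uint32_t)(v >> 32);
}
```

Since conversion to an unsigned 32-bit type is defined as reduction modulo 2^32, the cast form carries no risk of a miscounted "f", which is the concern raised against the literal.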

Re: [Qemu-devel] [virtio guest] vring_need_event() from virtqueue_kick_prepare()

2015-07-07 Thread Catalin Vasile

My vhost module respects the format vhost-net uses:

/*  summary */
mutex_lock(&vq->mutex);
vhost_disable_notify();
for (;;) {
    head = vhost_get_vq_desc();
    if (head == vq->num) {
        if (unlikely(vhost_enable_notify())) {
            vhost_disable_notify();
            continue;
        }
        break;
    }
    vhost_add_used_and_signal();
}
mutex_unlock(&vq->mutex);
/*  */

I have made a lot of printk() calls and the first job gets processed
completely, and gets through all those calls:
1. it goes into a first loop and processes the first job (get
descriptor, work with the descriptor, add used and signal).
2. On the second loop it hits head == vq->num, and goes back to
listening to notifications (successfully, it does not get into the
fallback).

Now in the guest:
1. sends the first job, and the parameters used to call vring_need_event() are:
vring_avail_event=0, new=1, old=0 (which makes the function evaluate to "0 < 1")
2. the queue is kicked and vhost does its job.
3. the guest driver reaches the end of the first job, and lets the
following job take its course, only this time vring_need_event()
receives the following parameters:
vring_avail_event=0, new=2, old=1 (which makes the function evaluate to "1 < 1")
so a kick is not actually sent because vring_need_event() returns
false. From what I see as the definition for vring_need_event(), it
does not actually look at flags.
"if (vq->event) {" evaluates to true in both cases, so it always
verifies those indexes (it does not go on the branch which verifies
flags).
I am also pretty sure the jobs are serialized in the guest driver, and
do not cross each other's path. One of the reasons is that every
function that sends a job must hold a mutex that protects the
virtqueue.
The guest driver blocks awaiting an interrupt for the job being
finished, but vhost does not get woken up to process the job in the
first place, because a notification is never triggered, for the
reasons I have explained above.

On Tue, Jul 7, 2015 at 1:17 PM, Stefan Hajnoczi  wrote:
> On Mon, Jul 06, 2015 at 06:13:29PM +0300, Catalin Vasile wrote:
>> What is the logic behind vring_need_event() when used with
>> virtqueue_kick_prepare()?
>> What does the keyword >>just<< refer to from the following context:
>> /* The following is used with USED_EVENT_IDX and AVAIL_EVENT_IDX */
>> /* Assuming a given event_idx value from the other side, if
>>  * we have just incremented index from old to new_idx,
>>  * should we trigger an event? */
>> ?
>
> "just" means since the last time the host/guest-visible index field was
> changed.  After avail or used rings have been processed, the index field
> for that ring is published to the host/guest.  At that point a check is
> made whether the other side needs to be kicked.
>
>> I am sending 2 jobs, one after another, and the second one just does
>> not want to kick, although the first one finished completely and the
>> backend went back to interrupt mode, all because vring_need_event()
>> returns false.
>
> Maybe the vhost driver called vhost_disable_notify() and hasn't
> re-enabled notify yet?
>
> This could happen if the guest adds buffers to the virtqueue while the
> host is processing the virtqueue.  Take a look at the vhost_net code for
> how to correctly disable and re-enable notify without race conditions on
> the host.
>
> The idea behind disabling notify is to eliminate unnecessary
> vmexits/notifications since the host is already processing the virtqueue
> and will see new buffers.  It's like a polling vs interrupt mode.
>
> If the vhost driver on the host doesn't implement it correctly, then the
> device could stop responding to the avail ring.
