[Qemu-devel] live migration fails after host kernel upgrade (3.12 => 3.18)

2015-03-15 Thread Stefan Priebe

Hi,

after upgrading the host kernel from 3.12 to 3.18 live migration fails 
with the following qemu output (guest running on a host with 3.12 => 
host with 3.18):


kvm: Features 0x30afffe3 unsupported. Allowed features: 0x79bfbbe7
qemu: warning: error while loading state for instance 0x0 of device 
':00:12.0/virtio-net'

kvm: load of migration failed: Operation not permitted

But i can't

Greets,
Stefan



Re: [Qemu-devel] [PATCH] kvm: fix ioeventfd endianness on bi-endian architectures

2015-03-15 Thread Paolo Bonzini


On 13/03/2015 22:30, Patchew Tool wrote:
> This series passed Patchew automatic testing, but there are some warnings.
> 
> Find the log fragments below, or open the following URL to see the full log:
> 
> http://qemu.patchew.org/testing/log/<20150313212337.31142.3991.stgit@bahia.local>


I'll fix this up when applying.

The patch is okay.

Paolo

> --8<-
> 
> 
> === Starting docker ===
> 
> Copying test script "/home/patchew/tests/qemu-devel.sh"
> Starting docker...
> docker run --net=none -v 
> /var/tmp/patchew-tester/tmpEMd8Zd:/var/tmp/patchew-test patchew:fedora-20 
> timeout 3600 /var/tmp/patchew-test/qemu-devel.sh /var/tmp/patchew-test
> 
> *** Testing 'coding style check' ***
> 
> Checking 0001-kvm--fix-ioeventfd-endianness-on-bi-endian-architectures.patch
> command failed with exit code 0
> $@
> ERROR: space required before the open parenthesis '('
> #26: FILE: kvm-all.c:538:
> +switch(size) {
> 
> total: 1 errors, 0 warnings, 42 lines checked
> 
> 0001-kvm--fix-ioeventfd-endianness-on-bi-endian-architectures.patch has style 
> problems, please review.  If any of these errors
> are false positives report them to the maintainer, see
> CHECKPATCH in MAINTAINERS.
> 
> 
> *** Testing 'configure' ***
> 
> Install prefix/usr/local
> BIOS directory/usr/local/share/qemu
> binary directory  /usr/local/bin
> library directory /usr/local/lib
> module directory  /usr/local/lib/qemu
> libexec directory /usr/local/libexec
> 
> 
> 



[Qemu-devel] [PATCH v3] vl: fix resource leak with monitor_fdset_add_fd

2015-03-15 Thread Paolo Bonzini
monitor_fdset_add_fd returns an AddfdInfo struct (used by the QMP
command add_fd).  Free it.

Signed-off-by: Paolo Bonzini 
---
v1->v2: line length [Fam], pass &error_abort [Shannon]
v2->v3: use "!!" instead of "? true : false" [Markus]
---
 vl.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/vl.c b/vl.c
index eba5d4c..5985680 100644
--- a/vl.c
+++ b/vl.c
@@ -1011,6 +1011,7 @@ static int parse_add_fd(QemuOpts *opts, void *opaque)
 int fd, dupfd, flags;
 int64_t fdset_id;
 const char *fd_opaque = NULL;
+AddfdInfo *fdinfo;
 
 fd = qemu_opt_get_number(opts, "fd", -1);
 fdset_id = qemu_opt_get_number(opts, "set", -1);
@@ -1060,8 +1061,9 @@ static int parse_add_fd(QemuOpts *opts, void *opaque)
 }
 
 /* add the duplicate fd, and optionally the opaque string, to the fd set */
-monitor_fdset_add_fd(dupfd, true, fdset_id, fd_opaque ? true : false,
- fd_opaque, NULL);
+fdinfo = monitor_fdset_add_fd(dupfd, true, fdset_id, !!fd_opaque, 
fd_opaque,
+  &error_abort);
+g_free(fdinfo);
 
 return 0;
 }
-- 
2.3.0




Re: [Qemu-devel] [PATCH] hw/bt/sdp: Fix resource leak detect by coverity

2015-03-15 Thread Paolo Bonzini


On 14/03/2015 11:07, Stefan Weil wrote:
> 
> This fixes the memory leak, but I still don't understand what is done here.
> data is allocated, then filled with values, now it is also deallocated.
> But I'm missing the part where all those data is used.

"data" escapes in record->attribute_list[record->attributes].pair.

The bug is in bt_l2cap_sdp_close_ch which does an invalid free every
time it frees the first sdp->service_list[i].attribute_list->pair (but
the qsort could have moved it elsewhere in the list).  The right fix is
to do a separate malloc for each attribute, instead of a single one.

In any case, it seems simpler to just leave this code aside.

Paolo



Re: [Qemu-devel] [PATCH] hw/bt/sdp: Fix resource leak detect by coverity

2015-03-15 Thread Michael Tokarev
15.03.2015 12:21, Paolo Bonzini wrote:
> On 14/03/2015 11:07, Stefan Weil wrote:
>>
>> This fixes the memory leak, but I still don't understand what is done here.
>> data is allocated, then filled with values, now it is also deallocated.
>> But I'm missing the part where all those data is used.
> 
> "data" escapes in record->attribute_list[record->attributes].pair.
> 
> The bug is in bt_l2cap_sdp_close_ch which does an invalid free every
> time it frees the first sdp->service_list[i].attribute_list->pair (but
> the qsort could have moved it elsewhere in the list).  The right fix is
> to do a separate malloc for each attribute, instead of a single one.

Or, alternatively, to keep this `data' pointer in sdp to use it in
bt_l2cap_sdp_close_ch().

> In any case, it seems simpler to just leave this code aside.

How many times this code is called?

We have many many places in qemu where resources are allocated once
at startup and never freed just because there's no need to.

Thanks,

/mjt



Re: [Qemu-devel] Get the memory trace

2015-03-15 Thread Peter Maydell
On 14 March 2015 at 13:54, Wenjie Liu  wrote:
> Hi all,
> Recently, I am trying to get the memory trace from qemu.
> Since I am using qemu by Marss, so the version of qemu is 0.14.

For upstream QEMU, 0.14 is extremely ancient history by now,
I'm afraid (it's four years old, and the codebase has changed
a lot since then). If you can't use a newer version of QEMU
I think you'll have to try asking the Marss developers about
this.

-- PMM



Re: [Qemu-devel] [PATCH] linux-user: Add missing check for return value of lock_user

2015-03-15 Thread Peter Maydell
On 14 March 2015 at 15:12, Stefan Weil  wrote:
> This fixes a warning from Coverity:
> "Dereference null return value (NULL_RETURNS)"
>
> Signed-off-by: Stefan Weil 
> ---
>  linux-user/flatload.c |8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/linux-user/flatload.c b/linux-user/flatload.c
> index 566a7a8..56ac790 100644
> --- a/linux-user/flatload.c
> +++ b/linux-user/flatload.c
> @@ -97,11 +97,13 @@ static int target_pread(int fd, abi_ulong ptr, abi_ulong 
> len,
>  abi_ulong offset)
>  {
>  void *buf;
> -int ret;
> +int ret = -TARGET_EFAULT;

The return value for this function should be a host errno,
not a target errno, I think. (If you track back upwards
then eventually main.c calls loader_exec() and treats the
return value as a host errno.) This seems like the right
thing given that these loader functions are all pre-run
setup, and are not involved in emulation of guest syscalls.

>
>  buf = lock_user(VERIFY_WRITE, ptr, len, 0);
> -ret = pread(fd, buf, len, offset);
> -unlock_user(buf, ptr, len);
> +if (buf) {
> +ret = pread(fd, buf, len, offset);

A different bug, but if ret here indicates that pread()
failed we should be returning -errno.

> +unlock_user(buf, ptr, len);
> +}
>  return ret;
>  }

-- PMM



Re: [Qemu-devel] [PATCH] tcg/optimize: Handle or r, a, a with constant a

2015-03-15 Thread Mark Cave-Ayland
On 13/03/15 19:26, Richard Henderson wrote:

> As seen with ubuntu-5.10-live-powerpc.iso.
> 
> Reported-by: Mark Cave-Ayland 
> Signed-off-by: Richard Henderson 
> ---
>  tcg/optimize.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index 067917c..37c1110 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -980,8 +980,11 @@ static void tcg_constant_folding(TCGContext *s)
>  if (temps_are_copies(args[1], args[2])) {
>  if (temps_are_copies(args[0], args[1])) {
>  tcg_op_remove(s, op);
> -} else {
> +} else if (temps[args[1]].state != TCG_TEMP_CONST) {
>  tcg_opt_gen_mov(s, op, args, opc, args[0], args[1]);
> +} else {
> +tcg_opt_gen_movi(s, op, args, opc,
> + args[0], temps[args[1]].val);
>  }
>  continue;
>  }
> 

Great work! Fixes the issue here for me so:

Tested-by: Mark Cave-Ayland 


ATB,

Mark.




[Qemu-devel] IO port 0x61 is multiple used

2015-03-15 Thread Weidong Huang
Hi ALL:

IO port 0x61 is used by pc speaker and NMI reason port.

This is ambiguous. How to distinguish in QEMU?

I see that NMI reason port is not realized in QEMU.

But linux guest will access this port to determine what the NMI reason code 
means.

It will access pc speaker's io port. Is this a bug?




Re: [Qemu-devel] [PATCH] hw/bt/sdp: Fix resource leak detect by coverity

2015-03-15 Thread Paolo Bonzini


On 15/03/2015 11:23, Michael Tokarev wrote:
> Or, alternatively, to keep this `data' pointer in sdp to use it in
> bt_l2cap_sdp_close_ch().

Yes.

>> > In any case, it seems simpler to just leave this code aside.
> How many times this code is called?
> 
> We have many many places in qemu where resources are allocated once
> at startup and never freed just because there's no need to.

Well, in this case the bug in bt_l2cap_sdp_close_ch is much worse than a
resource leak.  But bluetooth is not the utmost priority in QEMU
development...

Paolo



Re: [Qemu-devel] [PATCH v3] Execute _start and reach to the first function call successfully

2015-03-15 Thread Chen Gang

At present, I finished the first system call successfully (it is uname
called by glibc).

If no additional reply of this thread within 3 days, I shall send a new
patch which will include system call implementation.

Thanks.

On 3/14/15 14:03, Chen Gang wrote:
> QEMU TILE-Gx can decode bundle, disassemble code, and generate tcg code
> for 1st TB block (__start). Then directly jump to __libc_start_main (2nd
> TB block).
> 
> In __libc_start_main, it can continue executing to the first function
> call _dl_aux_init().
> 
> Signed-off-by: Chen Gang 
> ---
>  target-tilegx/cpu-qom.h   |   2 +
>  target-tilegx/cpu.c   |   4 -
>  target-tilegx/cpu.h   |  22 +-
>  target-tilegx/translate.c | 790 
> +-
>  4 files changed, 798 insertions(+), 20 deletions(-)
> 
> diff --git a/target-tilegx/cpu-qom.h b/target-tilegx/cpu-qom.h
> index 4ee11e1..5615c3b 100644
> --- a/target-tilegx/cpu-qom.h
> +++ b/target-tilegx/cpu-qom.h
> @@ -68,4 +68,6 @@ static inline TileGXCPU *tilegx_env_get_cpu(CPUTLGState 
> *env)
>  
>  #define ENV_GET_CPU(e) CPU(tilegx_env_get_cpu(e))
>  
> +#define ENV_OFFSET offsetof(TileGXCPU, env)
> +
>  #endif
> diff --git a/target-tilegx/cpu.c b/target-tilegx/cpu.c
> index cf46b8b..8255fdc 100644
> --- a/target-tilegx/cpu.c
> +++ b/target-tilegx/cpu.c
> @@ -69,10 +69,6 @@ static void tilegx_cpu_realizefn(DeviceState *dev, Error 
> **errp)
>  tcc->parent_realize(dev, errp);
>  }
>  
> -static void tilegx_tcg_init(void)
> -{
> -}
> -
>  static void tilegx_cpu_initfn(Object *obj)
>  {
>  CPUState *cs = CPU(obj);
> diff --git a/target-tilegx/cpu.h b/target-tilegx/cpu.h
> index 87dc56b..93e16c3 100644
> --- a/target-tilegx/cpu.h
> +++ b/target-tilegx/cpu.h
> @@ -30,16 +30,21 @@
>  #include "fpu/softfloat.h"
>  
>  /* TILE-Gx register alias */
> -#define TILEGX_R_RE  0   /*  0 register, for function/syscall return value */
> -#define TILEGX_R_NR  10  /* 10 register, for syscall number */
> -#define TILEGX_R_BP  52  /* 52 register, optional frame pointer */
> -#define TILEGX_R_TP  53  /* TP register, thread local storage data */
> -#define TILEGX_R_SP  54  /* SP register, stack pointer */
> -#define TILEGX_R_LR  55  /* LR register, may save pc, but it is not pc */
> +#define TILEGX_R_RE0   /*  0 register, for function/syscall return value 
> */
> +#define TILEGX_R_NR10  /* 10 register, for syscall number */
> +#define TILEGX_R_BP52  /* 52 register, optional frame pointer */
> +#define TILEGX_R_TP53  /* TP register, thread local storage data */
> +#define TILEGX_R_SP54  /* SP register, stack pointer */
> +#define TILEGX_R_LR55  /* LR register, may save pc, but it is not pc */
> +#define TILEGX_R_ZERO  63  /* Zero register, always zero */
> +#define TILEGX_R_COUNT 56  /* Only 56 registers are really useful */
> +#define TILEGX_R_NOREG 255 /* Invalid register value */
> +
>  
>  typedef struct CPUTLGState {
> -uint64_t regs[56];
> -uint64_t pc;
> +uint64_t regs[TILEGX_R_COUNT]; /* Common used registers by outside */
> +uint64_t pc;   /* Current pc */
> +
>  CPU_COMMON
>  } CPUTLGState;
>  
> @@ -54,6 +59,7 @@ typedef struct CPUTLGState {
>  
>  #include "exec/cpu-all.h"
>  
> +void tilegx_tcg_init(void);
>  int cpu_tilegx_exec(CPUTLGState *s);
>  int cpu_tilegx_signal_handler(int host_signum, void *pinfo, void *puc);
>  
> diff --git a/target-tilegx/translate.c b/target-tilegx/translate.c
> index 9aa82a9..a862006 100644
> --- a/target-tilegx/translate.c
> +++ b/target-tilegx/translate.c
> @@ -23,19 +23,793 @@
>  #include "disas/disas.h"
>  #include "tcg-op.h"
>  #include "exec/cpu_ldst.h"
> +#include "opcode_tilegx.h"
> +
> +#define TILEGX_OPCODE_MAX_X0164  /* include 164 */
> +#define TILEGX_OPCODE_MAX_X1107  /* include 107 */
> +#define TILEGX_OPCODE_MAX_Y0 15  /* include 15 */
> +#define TILEGX_OPCODE_MAX_Y1 15  /* include 15 */
> +#define TILEGX_OPCODE_MAX_Y2  3  /* include 3 */
> +
> +#define TILEGX_EXCP_OPCODE_UNKNOWN  0x1
> +#define TILEGX_EXCP_OPCODE_UNIMPLEMENT  0x2
> +#define TILEGX_EXCP_REG_UNSUPPORTED 0x81
> +
> +static TCGv_ptr cpu_env;
> +static TCGv cpu_pc;
> +static TCGv cpu_regs[TILEGX_R_COUNT];
> +
> +static const char * const reg_names[] = {
> + "r0",  "r1",  "r2",  "r3",  "r4",  "r5",  "r6",  "r7",
> + "r8",  "r9", "r10", "r11", "r12", "r13", "r14", "r15",
> +"r16", "r17", "r18", "r19", "r20", "r21", "r22", "r23",
> +"r24", "r25", "r26", "r27", "r28", "r29", "r30", "r31",
> +"r32", "r33", "r34", "r35", "r36", "r37", "r38", "r39",
> +"r40", "r41", "r42", "r43", "r44", "r45", "r46", "r47",
> +"r48", "r49", "r50", "r51",  "bp",  "tp",  "sp",  "lr"
> +};
> +
> +/* It is for temporary registers */
> +typedef struct DisasContextTemp {
> +unsigned char idx; /* index */
> +TCGv val;  /* value */
> +} DisasContextTemp;
> +
> +/* This is the state at tr

[Qemu-devel] Another solution for qga-vss stack protection issue

2015-03-15 Thread Joseph Hindin
qga/vss-win32/Makefile.objsi filters  out -fstack-protector-all option from C++ 
flags,
but with commit 63678e17c configure script may add option 
-fstack-protector-strong,
depending on availability. The suggested change filters out both option for 
qga-vss.dll
linking.

 Regards,
  Joseph Hindin





[Qemu-devel] [PATCH] gemu-ga-win: configure script may add -fstack-protector-strong option instead of -fstack-protector-all, depending on availability ( see commit 63678e17c ). Both options have to by

2015-03-15 Thread Joseph Hindin
Signed-off-by: Joseph Hindin 
---
 qga/vss-win32/Makefile.objs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qga/vss-win32/Makefile.objs b/qga/vss-win32/Makefile.objs
index 6a69d50..7c96c6b 100644
--- a/qga/vss-win32/Makefile.objs
+++ b/qga/vss-win32/Makefile.objs
@@ -3,7 +3,7 @@
 qga-vss-dll-obj-y += requester.o provider.o install.o
 
 obj-qga-vss-dll-obj-y = $(addprefix $(obj)/, $(qga-vss-dll-obj-y))
-$(obj-qga-vss-dll-obj-y): QEMU_CXXFLAGS = $(filter-out -Wstrict-prototypes 
-Wmissing-prototypes -Wnested-externs -Wold-style-declaration 
-Wold-style-definition -Wredundant-decls -fstack-protector-all, $(QEMU_CFLAGS)) 
-Wno-unknown-pragmas -Wno-delete-non-virtual-dtor
+$(obj-qga-vss-dll-obj-y): QEMU_CXXFLAGS = $(filter-out -Wstrict-prototypes 
-Wmissing-prototypes -Wnested-externs -Wold-style-declaration 
-Wold-style-definition -Wredundant-decls -fstack-protector-all 
-fstack-protector-strong, $(QEMU_CFLAGS)) -Wno-unknown-pragmas 
-Wno-delete-non-virtual-dtor
 
 $(obj)/qga-vss.dll: LDFLAGS = -shared 
-Wl,--add-stdcall-alias,--enable-stdcall-fixup -lole32 -loleaut32 -lshlwapi 
-luuid -static
 $(obj)/qga-vss.dll: $(obj-qga-vss-dll-obj-y) $(SRC_PATH)/$(obj)/qga-vss.def
-- 
2.1.0




Re: [Qemu-devel] [PATCH v3] Execute _start and reach to the first function call successfully

2015-03-15 Thread Andreas Färber
Am 15.03.2015 um 15:19 schrieb Chen Gang:
> If no additional reply of this thread within 3 days, I shall send a new
> patch which will include system call implementation.

Please use a proper subject then, saying what it does (rather what works
afterwards). In particular don't forget "target-tilegx:". :)

Thanks,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Jennifer Guild, Dilip Upmanyu,
Graham Norton; HRB 21284 (AG Nürnberg)



Re: [Qemu-devel] [PATCH 1/3] VFIO: Clear stale MSIx table during EEH reset

2015-03-15 Thread Gavin Shan
On Fri, Mar 13, 2015 at 03:33:50PM -0600, Alex Williamson wrote:
>On Wed, 2015-03-11 at 17:11 +1100, Gavin Shan wrote:
>> The PCI device MSIx table is cleaned out in hardware after EEH PE
>> reset. However, we still hold the stale MSIx entries in QEMU, which
>> should be cleared accordingly. Otherwise, we will run into another
>> (recursive) EEH error and the PCI devices contained in the PE have
>> to be offlined exceptionally.
>> 
>> The patch clears stale MSIx table before EEH PE reset so that MSIx
>> table could be restored properly after EEH PE reset.
>> 
>> Signed-off-by: Gavin Shan 
>> ---
>>  hw/vfio/common.c   |  6 +-
>>  hw/vfio/pci.c  | 39 +++
>>  include/hw/vfio/vfio.h |  3 ++-
>>  3 files changed, 46 insertions(+), 2 deletions(-)
>> 
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 148eb53..e3833f4 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -949,8 +949,12 @@ int vfio_container_ioctl(AddressSpace *as, int32_t 
>> groupid,
>>  switch (req) {
>>  case VFIO_CHECK_EXTENSION:
>>  case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
>> -case VFIO_EEH_PE_OP:
>>  break;
>> +case VFIO_EEH_PE_OP:
>> +if (!vfio_container_eeh_event(as, groupid, param)) {
>> +break;
>> +}
>> +/* fallthru */
>>  default:
>>  /* Return an error on unknown requests */
>>  error_report("vfio: unsupported ioctl %X", req);
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 6b80539..8c4a8cb 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -3319,6 +3319,45 @@ static void 
>> vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
>>  vdev->req_enabled = false;
>>  }
>>  
>> +int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>> + struct vfio_eeh_pe_op *op)
>
>Called by common, but lives in pci.  This would need a stub if QEMU is
>built without CONFIG_PCI.  Thanks,
>

Right. I'll add a stub for !CONFIG_PCI.

Thanks,
Gavin

>Alex
>
>> +{
>> +VFIOGroup *group;
>> +VFIODevice *vbasedev;
>> +VFIOPCIDevice *vdev;
>> +
>> +group = vfio_get_group(groupid, as);
>> +if (!group) {
>> +vfio_put_group(group);
>> +error_report("vfio: group %d not found\n", groupid);
>> +return -1;
>> +}
>> +
>> +switch (op->op) {
>> +case VFIO_EEH_PE_RESET_HOT:
>> +case VFIO_EEH_PE_RESET_FUNDAMENTAL:
>> +/*
>> + * The MSIx table will be cleaned out by reset. We need
>> + * disable it so that it can be reenabled properly. Also,
>> + * the cached MSIx table should be cleared as it's not
>> + * reflecting the contents in hardware.
>> + */
>> +QLIST_FOREACH(vbasedev, &group->device_list, next) {
>> +vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>> +if (msix_enabled(&vdev->pdev)) {
>> +vfio_disable_msix(vdev);
>> +}
>> +
>> +msix_reset(&vdev->pdev);
>> +}
>> +
>> +break;
>> +}
>> +
>> +vfio_put_group(group);
>> +return 0;
>> +}
>> +
>>  static int vfio_initfn(PCIDevice *pdev)
>>  {
>>  VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>> diff --git a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h
>> index 0b26cd8..99528a3 100644
>> --- a/include/hw/vfio/vfio.h
>> +++ b/include/hw/vfio/vfio.h
>> @@ -5,5 +5,6 @@
>>  
>>  extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
>>  int req, void *param);
>> -
>> +extern int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>> +struct vfio_eeh_pe_op *op);
>>  #endif
>
>
>




Re: [Qemu-devel] [PATCH v3] Execute _start and reach to the first function call successfully

2015-03-15 Thread Chen Gang
On 3/16/15 00:50, Andreas Färber wrote:
> Am 15.03.2015 um 15:19 schrieb Chen Gang:
>> If no additional reply of this thread within 3 days, I shall send a new
>> patch which will include system call implementation.
> 
> Please use a proper subject then, saying what it does (rather what works
> afterwards). In particular don't forget "target-tilegx:". :)
> 

Oh, sorry for my carelessness, and if no additional reply, I shall send
patch v4 for it with the latest code (include syscall implementation)
within 3 days (2015-03-18).

And again, really thank all of you very much for your reviewing.

Thanks.
-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed



Re: [Qemu-devel] [qemu] How to reliably obtain physaddr from vaddr

2015-03-15 Thread Richard Henderson
On 03/15/2015 03:00 AM, Emilio G. Cota wrote:
> On a TLB hit this is trivial (just do nothing), but on
> a TLB miss I'm lost on what to do--I cannot even follow
> where helper_ld/st go (grep doesn't help), although I
> suspect it's TCG backend ops and I don't see an obvious
> way of adding a new operation there.

It goes into softmmu_template.h.  Which then tests a victim tlb, and finally
calls tlb_fill.  You'll probably need to do the same.


r~




Re: [Qemu-devel] [PATCH v3] Execute _start and reach to the first function call successfully

2015-03-15 Thread Peter Maydell
On 15 March 2015 at 23:08, Chen Gang  wrote:
> On 3/16/15 00:50, Andreas Färber wrote:
>> Am 15.03.2015 um 15:19 schrieb Chen Gang:
>>> If no additional reply of this thread within 3 days, I shall send a new
>>> patch which will include system call implementation.
>>
>> Please use a proper subject then, saying what it does (rather what works
>> afterwards). In particular don't forget "target-tilegx:". :)
>>
>
> Oh, sorry for my carelessness, and if no additional reply, I shall send
> patch v4 for it with the latest code (include syscall implementation)
> within 3 days (2015-03-18)

Make sure you split the patch up sensibly into a
series of patches if you're planning to add more
code to it...

-- PMM



Re: [Qemu-devel] [PATCH v3] Execute _start and reach to the first function call successfully

2015-03-15 Thread Peter Maydell
On 14 March 2015 at 06:03, Chen Gang  wrote:
> QEMU TILE-Gx can decode bundle, disassemble code, and generate tcg code
> for 1st TB block (__start). Then directly jump to __libc_start_main (2nd
> TB block).
>
> In __libc_start_main, it can continue executing to the first function
> call _dl_aux_init().

>  /* TILE-Gx register alias */
> -#define TILEGX_R_RE  0   /*  0 register, for function/syscall return value */
> -#define TILEGX_R_NR  10  /* 10 register, for syscall number */
> -#define TILEGX_R_BP  52  /* 52 register, optional frame pointer */
> -#define TILEGX_R_TP  53  /* TP register, thread local storage data */
> -#define TILEGX_R_SP  54  /* SP register, stack pointer */
> -#define TILEGX_R_LR  55  /* LR register, may save pc, but it is not pc */
> +#define TILEGX_R_RE0   /*  0 register, for function/syscall return value 
> */
> +#define TILEGX_R_NR10  /* 10 register, for syscall number */
> +#define TILEGX_R_BP52  /* 52 register, optional frame pointer */
> +#define TILEGX_R_TP53  /* TP register, thread local storage data */
> +#define TILEGX_R_SP54  /* SP register, stack pointer */
> +#define TILEGX_R_LR55  /* LR register, may save pc, but it is not pc */
> +#define TILEGX_R_ZERO  63  /* Zero register, always zero */
> +#define TILEGX_R_COUNT 56  /* Only 56 registers are really useful */
> +#define TILEGX_R_NOREG 255 /* Invalid register value */

This appears to be changing code that was introduced in
a previous patch (which one? this patch doesn't appear to be
part of a series). Don't do that -- just get it right in
the first place.

-- PMM



Re: [Qemu-devel] [PATCH v3] Execute _start and reach to the first function call successfully

2015-03-15 Thread Chen Gang
On 3/16/15 07:45, Peter Maydell wrote:
> On 14 March 2015 at 06:03, Chen Gang  wrote:
>> QEMU TILE-Gx can decode bundle, disassemble code, and generate tcg code
>> for 1st TB block (__start). Then directly jump to __libc_start_main (2nd
>> TB block).
>>
>> In __libc_start_main, it can continue executing to the first function
>> call _dl_aux_init().
> 
>>  /* TILE-Gx register alias */
>> -#define TILEGX_R_RE  0   /*  0 register, for function/syscall return value 
>> */
>> -#define TILEGX_R_NR  10  /* 10 register, for syscall number */
>> -#define TILEGX_R_BP  52  /* 52 register, optional frame pointer */
>> -#define TILEGX_R_TP  53  /* TP register, thread local storage data */
>> -#define TILEGX_R_SP  54  /* SP register, stack pointer */
>> -#define TILEGX_R_LR  55  /* LR register, may save pc, but it is not pc */
>> +#define TILEGX_R_RE0   /*  0 register, for function/syscall return 
>> value */
>> +#define TILEGX_R_NR10  /* 10 register, for syscall number */
>> +#define TILEGX_R_BP52  /* 52 register, optional frame pointer */
>> +#define TILEGX_R_TP53  /* TP register, thread local storage data */
>> +#define TILEGX_R_SP54  /* SP register, stack pointer */
>> +#define TILEGX_R_LR55  /* LR register, may save pc, but it is not pc */
>> +#define TILEGX_R_ZERO  63  /* Zero register, always zero */
>> +#define TILEGX_R_COUNT 56  /* Only 56 registers are really useful */
>> +#define TILEGX_R_NOREG 255 /* Invalid register value */
> 
> This appears to be changing code that was introduced in
> a previous patch (which one? this patch doesn't appear to be
> part of a series). Don't do that -- just get it right in
> the first place.
> 

OK, thanks. And next, I shall try to send the whole tilegx patches again
(it is about 6-8 patches), within 2015-03-18.

Thanks.
-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed



Re: [Qemu-devel] [PATCH v3] Execute _start and reach to the first function call successfully

2015-03-15 Thread Chen Gang
On 3/16/15 07:44, Peter Maydell wrote:
> On 15 March 2015 at 23:08, Chen Gang  wrote:
>> On 3/16/15 00:50, Andreas Färber wrote:
>>> Am 15.03.2015 um 15:19 schrieb Chen Gang:
 If no additional reply of this thread within 3 days, I shall send a new
 patch which will include system call implementation.
>>>
>>> Please use a proper subject then, saying what it does (rather what works
>>> afterwards). In particular don't forget "target-tilegx:". :)
>>>
>>
>> Oh, sorry for my carelessness, and if no additional reply, I shall send
>> patch v4 for it with the latest code (include syscall implementation)
>> within 3 days (2015-03-18)
> 
> Make sure you split the patch up sensibly into a
> series of patches if you're planning to add more
> code to it...
> 

OK, thanks. I shall try to split it into 2-3 patches.


Thanks.
-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed



Re: [Qemu-devel] [qemu] How to reliably obtain physaddr from vaddr

2015-03-15 Thread Emilio G. Cota
On Sun, Mar 15, 2015 at 16:10:21 -0700, Richard Henderson wrote:
> On 03/15/2015 03:00 AM, Emilio G. Cota wrote:
> > On a TLB hit this is trivial (just do nothing), but on
> > a TLB miss I'm lost on what to do--I cannot even follow
> > where helper_ld/st go (grep doesn't help), although I
> > suspect it's TCG backend ops and I don't see an obvious
> > way of adding a new operation there.
> 
> It goes into softmmu_template.h.  Which then tests a victim tlb, and finally
> calls tlb_fill.  You'll probably need to do the same.

Thanks, figured this out this morning after getting some sleep.

I've defined this vaddr->paddr as a helper and I'm calling it
before every aa32 store. However, this isn't a smooth sailing:

1. futex_init in the kernel causes an oops--it passes vaddr=0
   but the call happens with pagefaults disabled:
 http://lxr.free-electrons.com/source/kernel/futex.c?v=3.18#L590
   in the code below I'm just returning to avoid the oops.

2. The kernel (vexpress-a9 from buildroot) doesn't boot. It dies with:

> [...]
> devtmpfs: mounted
> Freeing unused kernel memory: 256K (805ea000 - 8062a000)
> Cannot continue, found non relative relocs during the bootstrap.
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0e00
> 
> CPU: 0 PID: 1 Comm: init Not tainted 3.18.8 #2
> [<800147cc>] (unwind_backtrace) from [<800115d4>] (show_stack+0x10/0x14)
> [<800115d4>] (show_stack) from [<80473e8c>] (dump_stack+0x84/0x94)
> [<80473e8c>] (dump_stack) from [<80471180>] (panic+0xa0/0x1f8)
> [<80471180>] (panic) from [<800240dc>] (complete_and_exit+0x0/0x1c)
> [<800240dc>] (complete_and_exit) from [<87827f70>] (0x87827f70)
> ---[ end Kernel panic - not syncing: Attempted to kill init! 
> exitcode=0x0e00

  Note that if I only call the helper before the "str[hb]*" stores,
  the kernel boots fine.

I'm appending the code (applies on top of current HEAD, 7ccfb495),
any help on what's going on greatly appreciated.

Thanks,

Emilio

diff --git a/cputlb.c b/cputlb.c
index 38f2151..5596bae 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -329,6 +329,7 @@ void tlb_set_page(CPUState *cpu, target_ulong vaddr,
 } else {
 te->addr_write = -1;
 }
+te->addr_phys = paddr;
 }
 
 /* NOTE: this function can trigger an exception */
diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index 0ca6f0b..65f340c 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -74,10 +74,10 @@ typedef uint64_t target_ulong;
 /* use a fully associative victim tlb of 8 entries */
 #define CPU_VTLB_SIZE 8
 
-#if HOST_LONG_BITS == 32 && TARGET_LONG_BITS == 32
-#define CPU_TLB_ENTRY_BITS 4
-#else
+#if TARGET_LONG_BITS == 32
 #define CPU_TLB_ENTRY_BITS 5
+#else
+#define CPU_TLB_ENTRY_BITS 6
 #endif
 
 typedef struct CPUTLBEntry {
@@ -90,13 +90,14 @@ typedef struct CPUTLBEntry {
 target_ulong addr_read;
 target_ulong addr_write;
 target_ulong addr_code;
+target_ulong addr_phys;
 /* Addend to virtual address to get host address.  IO accesses
use the corresponding iotlb value.  */
 uintptr_t addend;
 /* padding to get a power of two size */
 uint8_t dummy[(1 << CPU_TLB_ENTRY_BITS) -
-  (sizeof(target_ulong) * 3 +
-   ((-sizeof(target_ulong) * 3) & (sizeof(uintptr_t) - 1)) +
+  (sizeof(target_ulong) * 4 +
+   ((-sizeof(target_ulong) * 4) & (sizeof(uintptr_t) - 1)) +
sizeof(uintptr_t))];
 } CPUTLBEntry;
 
diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h
index 1673287..168cde9 100644
--- a/include/exec/cpu_ldst.h
+++ b/include/exec/cpu_ldst.h
@@ -127,6 +127,9 @@ void helper_stl_mmu(CPUArchState *env, target_ulong addr,
 void helper_stq_mmu(CPUArchState *env, target_ulong addr,
 uint64_t val, int mmu_idx);
 
+hwaddr helper_st_paddr_mmu(CPUArchState *env, target_ulong addr,
+   int mmu_idx);
+
 uint8_t helper_ldb_cmmu(CPUArchState *env, target_ulong addr, int mmu_idx);
 uint16_t helper_ldw_cmmu(CPUArchState *env, target_ulong addr, int mmu_idx);
 uint32_t helper_ldl_cmmu(CPUArchState *env, target_ulong addr, int mmu_idx);
diff --git a/include/exec/cpu_ldst_template.h b/include/exec/cpu_ldst_template.h
index 95ab750..39cde9d 100644
--- a/include/exec/cpu_ldst_template.h
+++ b/include/exec/cpu_ldst_template.h
@@ -129,6 +129,39 @@ glue(glue(cpu_st, SUFFIX), MEMSUFFIX)(CPUArchState *env, 
target_ulong ptr,
 }
 }
 
+#if DATA_SIZE == 1
+
+static inline hwaddr
+glue(cpu_st_paddr, MEMSUFFIX)(CPUArchState *env, target_ulong ptr)
+{
+int page_index;
+target_ulong addr;
+int mmu_idx;
+hwaddr ret;
+
+addr = ptr;
+/*
+ * XXX Understand why this is necessary.
+ * futex_init on linux bootup calls cmpxchg on a NULL pointer. It expects
+ * -EFAULT to be read back, but when we do the below we get a kernel oops.
+ * However, when doing the load from TCG -EFAULT is read just fine--no 
oops.

Re: [Qemu-devel] [PATCH v4 0/7] QEMU memory hot unplug support

2015-03-15 Thread Zhu Guihua


On 03/13/2015 07:07 PM, Igor Mammedov wrote:

On Fri, 13 Mar 2015 14:05:08 +0800
Zhu Guihua  wrote:


ping...

I can't find v4 series in my mailbox nor on list archives,
perhaps it got lost somewhere, could you resend rebased
version on top of current master, pls?


OK, I will resend v4 ASAP.

Thanks,
Zhu




On 03/04/2015 02:01 PM, Zhu Guihua wrote:

Memory hot unplug are both asynchronous procedures.
When the unplug operation happens, unplug request cb is called first.
And when guest OS finished handling unplug, unplug cb will be called
to do the real removal of device.

This series is rebased on mst pci branch.
http://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git

v4:
   -reorganize the patchset
   -fix all about the new API acpi_send_gpe_event()
   -update ssdt-mem

v3:
   -commit message changes
   -reorganize the patchset, squash and separate some patches
   -update specs about acpi_mem_hotplug
   -first cleanup external state, then un-map and un-register memory device

v2:
   -do a generic for acpi to send gpe event
   -unparent object by PC_MACHINE
   -update description in acpi_mem_hotplug.txt
   -combine the last two patches in the last version
   -cleanup external state in acpi_memory_unplug_cb

Tang Chen (6):
acpi, mem-hotplug: Add acpi_memory_slot_status() to get MemStatus
acpi: Add acpi_send_gpe_event() to rise sci for hotplug
acpi, mem-hotplug: Add unplug request cb for memory device
pc-dimm: Add memory hot unplug request support for pc-dimm
acpi, mem-hotplug: Add unplug cb for memory device
pc-dimm: Add memory hot unplug support for pc-dimm

Zhu Guihua (1):
acpi: Add hardware implementation for memory hot unplug

   docs/specs/acpi_mem_hotplug.txt   | 11 +-
   hw/acpi/core.c|  7 
   hw/acpi/cpu_hotplug.c |  3 +-
   hw/acpi/ich9.c| 19 +++--
   hw/acpi/memory_hotplug.c  | 81 
+--
   hw/acpi/pcihp.c   |  7 +---
   hw/acpi/piix4.c   | 16 ++--
   hw/core/qdev.c|  2 +-
   hw/i386/acpi-build.c  |  9 +
   hw/i386/acpi-dsdt-mem-hotplug.dsl | 10 +
   hw/i386/pc.c  | 54 --
   include/hw/acpi/acpi.h| 10 +
   include/hw/acpi/memory_hotplug.h  |  8 +++-
   include/hw/acpi/pc-hotplug.h  |  3 +-
   include/hw/qdev-core.h|  1 +
   trace-events  |  1 +
   16 files changed, 207 insertions(+), 35 deletions(-)




.






Re: [Qemu-devel] [PATCH 2/3] VFIO: Clear INTx pending state on EEH reset

2015-03-15 Thread Gavin Shan
On Fri, Mar 13, 2015 at 03:51:27PM -0600, Alex Williamson wrote:
>On Wed, 2015-03-11 at 17:11 +1100, Gavin Shan wrote:
>> When Linux guest recovers from EEH error on the following Emulex
>> adapter, the MSIx interrupts are disabled and the INTx emulation
>> is enabled. One INTx interrupt is injected to the guest by host
>> because of detected pending INTx interrupts on the adapter. QEMU
>> disables mmap'ed BAR regions and starts a timer to enable those
>> regions at later point the INTx interrupt handler. Unfortunately,
>> "VFIOPCIDevice->intx.pending" isn't cleared, meaning those disabled
>> mapp'ed BAR regions won't be reenabled properly. It leads to EEH
>> recovery failure at guest side because of hanged MMIO access.
>> 
>>  # lspci | grep Emulex
>>  :01:00.0 Ethernet controller: Emulex Corporation \
>>   OneConnect 10Gb NIC (be3) (rev 02)
>>  :01:00.1 Ethernet controller: Emulex Corporation \
>>   OneConnect 10Gb NIC (be3) (rev 02)
>> 
>> The patch clears "VFIOPCIDevice->intx.pending" after EEH reset
>> is completed on the PE, which contains the adapter. In turn, the
>> mmap'ed BAR regions can be reenabled to avoid EEH recovery failure.
>> 
>> Signed-off-by: Gavin Shan 
>> ---
>>  hw/vfio/pci.c | 14 ++
>>  1 file changed, 14 insertions(+)
>> 
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 8c4a8cb..55e0904 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -3352,6 +3352,20 @@ int vfio_container_eeh_event(AddressSpace *as, 
>> int32_t groupid,
>>  }
>>  
>>  break;
>> +case VFIO_EEH_PE_RESET_DEACTIVATE:
>> +/*
>> + * We might have INTx interrupt whose handler disabled the
>> + * memory mapped BARs. Without clearing the INTx pending
>> + * state, the timer kicked by the INTx interrupt handler
>> + * won't enable those disabled memory mapped BARs, which
>> + * leads EEH recovery failure.
>> + */
>> +QLIST_FOREACH(vbasedev, &group->device_list, next) {
>> +vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>> +vdev->intx.pending = false;
>> +}
>
>I'm nervous that "pending" is trying to track that a) the host interrupt
>is masked and b) the emulated INTx line for the device is asserted, but
>we're not clearing the state of any of that here.  We can handle a
>spurious EOI, the device should simply re-assert the interrupt, but
>changing one piece of tracking w/o getting everything in sync seems like
>a looming bug.  Thanks,
>

After the PCI device takes EEH reset, we shouldn't see pending INTx from
hardware side. So I think it's safe to clear "intx.pending" when EEH reset
is completed. However, I think I have to explain more about how this issue
was found and my understanding:

(1) Guest detects EEH errors. The device driver disables MSIx interrupt by
calling pci_disable_msix(), which disable MSIx vectors and then enable
INTx.

(2) QEMU sends IOCTL commands to host to disable MSIx and enable INTx. At
this stage the INTx is still masked. At later point, the guest is requesting
unmasking INTx, which is captured by host. Host checks and founds pending
INTx, which is sent to QEMU. In QEMU INTx handler (vfio_intx_interrupt()),
the mmap'ed regions are disabled, "intx.pending" is set and a timer is started
to reenable mmap'ed regions if "intx.pending" is cleared there. However,
"intx.pending" is only cleared upon BAR access in slow path, which is never
happing.

(3) After guest disables MSIx and issue EEH reset, the device driver starts
to check its firmware state by reading MMIO register, which isn't completed
by QEMU VFIO BAR slow path (Note: fast path supported by mmaped regions have
been disabled). Eventually, the guest hangs on reading MMIO register. With
this patch applied to QEMU, I didn't see the problem again. 

Thanks,
Gavin

>Alex
>
>> +
>> +break;
>>  }
>>  
>>  vfio_put_group(group);
>
>
>




Re: [Qemu-devel] [v2][PATCH 2/2] libxl: introduce gfx_passthru_kind

2015-03-15 Thread Chen, Tiejun

On 2015/3/13 18:11, Ian Campbell wrote:

On Fri, 2015-03-13 at 09:39 +0800, Chen, Tiejun wrote:

I don't think you can abort here, since a user can set
b_info->u.hvm.gfx_passthru_kind to default. You would need to
return an error.


Then, looks I should do this,

LOG(ERROR, "No supported IGD to passthru," " or please force set
gfx_passthru=\"igd\".\n"); return NULL;


If I remember the context correctly this is in the autodetect case,
so I think shouldn't mention IGD. Something like "Unable to detect
graphics passthru kind, please set gfx_passthru_kind. See xl.cfg(5)
for more


s/gfx_passthru_kind/gfx_passthru, right? Because actually we always get
'gfx_passthru_kind' from 'gfx_passthru'.


information."




@@ -720,6 +720,13 @@ void libxl_mac_copy(libxl_ctx *ctx,
libxl_mac *dst, libxl_mac *src);


[snip]


/* * libxl_domain_build_info has the u.hvm.gfx_passthru_kind field
and * the libxl_gfx_passthru_kind enumeration is defined. */ #define
LIBXL_HAVE_GFX_PASSTHRU_KIND

Users don't care about the internal details, just about the existence
of the support.



Updated.

Thanks
Tiejun



[Qemu-devel] [PATCH] fsdev/virtfs-proxy-helper: Fix improper use of negative value

2015-03-15 Thread Shannon Zhao
It's detected by coverity. Check the return value of proxy_marshal.

Signed-off-by: Shannon Zhao 
Signed-off-by: Shannon Zhao 
---
 fsdev/virtfs-proxy-helper.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fsdev/virtfs-proxy-helper.c b/fsdev/virtfs-proxy-helper.c
index c1da2d7..bf2e5f3 100644
--- a/fsdev/virtfs-proxy-helper.c
+++ b/fsdev/virtfs-proxy-helper.c
@@ -262,6 +262,9 @@ static int send_status(int sockfd, struct iovec *iovec, int 
status)
  */
 msg_size = proxy_marshal(iovec, 0, "ddd", header.type,
  header.size, status);
+if (msg_size < 0) {
+return msg_size;
+}
 retval = socket_write(sockfd, iovec->iov_base, msg_size);
 if (retval < 0) {
 return retval;
-- 
1.8.3.1





Re: [Qemu-devel] [PATCH v2] qga/commands-posix: Fix resource leak

2015-03-15 Thread zhanghailiang

On 2015/3/14 17:52, Shannon Zhao wrote:

It's detected by coverity. Close the dirfd.



Reviewed-by: zhanghailiang 


Signed-off-by: Shannon Zhao 
Signed-off-by: Shannon Zhao 
---
 v1->v2: close after use [Stefan Weil]
---
  qga/commands-posix.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index d5bb5cb..ba8de62 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -2283,6 +2283,7 @@ GuestMemoryBlockInfo 
*qmp_guest_get_memory_block_info(Error **errp)

  buf = g_malloc0(20);
  ga_read_sysfs_file(dirfd, "block_size_bytes", buf, 20, &local_err);
+close(dirfd);
  if (local_err) {
  g_free(buf);
  error_propagate(errp, local_err);







Re: [Qemu-devel] [PATCH] block/throttle: Use host clock type

2015-03-15 Thread Fam Zheng
On Fri, 03/13 13:28, Paolo Bonzini wrote:
> 
> 
> On 13/03/2015 13:23, Alberto Garcia wrote:
> > On Fri, Mar 13, 2015 at 02:35:29PM +0800, Fam Zheng wrote:
> > 
> >> Throttle timers won't make any progress when VCPU is not running,
> >> which is prone to stall the request queue in cases like utils,
> >> qtest, suspending, and live migration, unless carefully handled.
> > 
> > Yes, this can be easily reproduced by stopping the VM and starting a
> > block-commit job. If the I/O in that device is throttled then the job
> > will be stalled.
> 
> That may be a different bug.  Should jobs be subject to throttling at all?

You are asking about the next bug below, right?

I lean towards yes, but the problem is that we are in the middle of yes and no.
Before "stop" it is throttled, but upon "stop", it is drained right to the end,
unthrottled.

Fam

> 
> Paolo
> 
> > Then there's also the situation that we discussed in IRC: if the
> > block-commit job is ongoing and then we stop the VM, then the rest of
> > the data will be committed bypassing the throttling settings.  But
> > that's not related to these changes.
> > 
> >> Signed-off-by: Fam Zheng 
> >> ---
> >>  block.c   |  2 +-
> >>  tests/test-throttle.c | 14 +++---
> >>  2 files changed, 8 insertions(+), 8 deletions(-)
> > 
> > Reviewed-By: Alberto Garcia 
> > 
> > Berto
> > 
> > 
> 



Re: [Qemu-devel] [PATCH v5 2/7] aer: impove pcie_aer_init to support vfio device

2015-03-15 Thread Chen Fan


On 03/14/2015 06:25 AM, Alex Williamson wrote:

On Thu, 2015-03-12 at 18:23 +0800, Chen Fan wrote:

pcie_aer_init was used to emulate an aer capability for pcie device,
but for vfio device, the aer config space size is mutable and is not
always equal to PCI_ERR_SIZEOF(0x48). it depends on where the TLP Prefix
register required, so here we add a size argument.

This would need to go through the QEMU PCI tree.  Question, do all of
the fields of the AER capability directly translate to the guest view of
the device?  For instance, are there source and destination values that
need to be emulated for the guest?  I'm not really sure what we're
exposing via the TLP Prefix area.  Thanks,
I think all of the fields of the AER do not need to be emulated for the 
guest.
the guest directly read the register of err status, Header log and TLP 
prefix log.


and we don't need to care what TLP prefix exposes, due to the TLP prefix
area is depended on the End-End TLP Prefix Supported bit. the spec said
If the End-End TLP Prefix Supported bit (Section 7.8.15) is Clear, the 
TLP Prefix

Log register is not required to be implemented. therefore if we directly use
the PCI_ERR_SIZEOF size to initialize aer, the maybe occur size overlap.

Thanks,
Chen





Alex


Signed-off-by: Chen Fan 
---
  hw/pci-bridge/ioh3420.c| 2 +-
  hw/pci-bridge/xio3130_downstream.c | 2 +-
  hw/pci-bridge/xio3130_upstream.c   | 2 +-
  hw/pci/pcie_aer.c  | 4 ++--
  include/hw/pci/pcie_aer.h  | 2 +-
  5 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c
index cce2fdd..4d9cd3f 100644
--- a/hw/pci-bridge/ioh3420.c
+++ b/hw/pci-bridge/ioh3420.c
@@ -129,7 +129,7 @@ static int ioh3420_initfn(PCIDevice *d)
  goto err_pcie_cap;
  }
  pcie_cap_root_init(d);
-rc = pcie_aer_init(d, IOH_EP_AER_OFFSET);
+rc = pcie_aer_init(d, IOH_EP_AER_OFFSET, PCI_ERR_SIZEOF);
  if (rc < 0) {
  goto err;
  }
diff --git a/hw/pci-bridge/xio3130_downstream.c 
b/hw/pci-bridge/xio3130_downstream.c
index b3a6479..9737041 100644
--- a/hw/pci-bridge/xio3130_downstream.c
+++ b/hw/pci-bridge/xio3130_downstream.c
@@ -92,7 +92,7 @@ static int xio3130_downstream_initfn(PCIDevice *d)
  goto err_pcie_cap;
  }
  pcie_cap_arifwd_init(d);
-rc = pcie_aer_init(d, XIO3130_AER_OFFSET);
+rc = pcie_aer_init(d, XIO3130_AER_OFFSET, PCI_ERR_SIZEOF);
  if (rc < 0) {
  goto err;
  }
diff --git a/hw/pci-bridge/xio3130_upstream.c b/hw/pci-bridge/xio3130_upstream.c
index eada582..4d7f894 100644
--- a/hw/pci-bridge/xio3130_upstream.c
+++ b/hw/pci-bridge/xio3130_upstream.c
@@ -81,7 +81,7 @@ static int xio3130_upstream_initfn(PCIDevice *d)
  }
  pcie_cap_flr_init(d);
  pcie_cap_deverr_init(d);
-rc = pcie_aer_init(d, XIO3130_AER_OFFSET);
+rc = pcie_aer_init(d, XIO3130_AER_OFFSET, PCI_ERR_SIZEOF);
  if (rc < 0) {
  goto err;
  }
diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c
index 5a25c32..71045eb 100644
--- a/hw/pci/pcie_aer.c
+++ b/hw/pci/pcie_aer.c
@@ -94,12 +94,12 @@ static void aer_log_clear_all_err(PCIEAERLog *aer_log)
  aer_log->log_num = 0;
  }
  
-int pcie_aer_init(PCIDevice *dev, uint16_t offset)

+int pcie_aer_init(PCIDevice *dev, uint16_t offset, uint16_t size)
  {
  PCIExpressDevice *exp;
  
  pcie_add_capability(dev, PCI_EXT_CAP_ID_ERR, PCI_ERR_VER,

-offset, PCI_ERR_SIZEOF);
+offset, size);
  exp = &dev->exp;
  exp->aer_cap = offset;
  
diff --git a/include/hw/pci/pcie_aer.h b/include/hw/pci/pcie_aer.h

index bcac80a..a4cc6f3 100644
--- a/include/hw/pci/pcie_aer.h
+++ b/include/hw/pci/pcie_aer.h
@@ -87,7 +87,7 @@ struct PCIEAERErr {
  
  extern const VMStateDescription vmstate_pcie_aer_log;
  
-int pcie_aer_init(PCIDevice *dev, uint16_t offset);

+int pcie_aer_init(PCIDevice *dev, uint16_t offset, uint16_t size);
  void pcie_aer_exit(PCIDevice *dev);
  void pcie_aer_write_config(PCIDevice *dev,
 uint32_t addr, uint32_t val, int len);



.






Re: [Qemu-devel] [PATCH v5 7/7] pc: add PC_I440FX_COMPAT to disable aercap for vifo device

2015-03-15 Thread Chen Fan


On 03/14/2015 06:38 AM, Alex Williamson wrote:

On Thu, 2015-03-12 at 18:23 +0800, Chen Fan wrote:

for piix4 chipset, we don't need to expose aer, so introduce
PC_I440FX_COMPAT for all piix4 machines to disable aercap,
and add HW_COMPAT_2_2 to disable aercap for all lower
than 2.3.

440FX is not PCIe, so it doesn't seem like we need to do anything there.
Shouldn't this only cover q35 machine types through 2.3?  (QEMU 2.3 is
already in hard freeze afaik, this won't go in until after 2.4
development opens).  Thanks,

Yes, 440Fx is not PCIe, so extended cap won't be parsed.
I knew this patches won't be in QEMU 2.3. I want to wait for
the 2.4 opens and then rebase this patch.

Thanks,
Chen




Alex


Signed-off-by: Chen Fan 
---
  hw/i386/pc_piix.c   |  9 +
  hw/i386/pc_q35.c|  4 
  include/hw/compat.h | 10 ++
  3 files changed, 23 insertions(+)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 8eab4ba..ff9d312 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -307,6 +307,11 @@ static void pc_init1(MachineState *machine,
  
  static void pc_init_pci(MachineState *machine)

  {
+static GlobalProperty pc_compat_props[] = {
+PC_I440FX_COMPAT,
+{ /* end of list */ }
+};
+qdev_prop_register_global_list(pc_compat_props);
  pc_init1(machine, 1, 1);
  }
  
@@ -534,6 +539,10 @@ static QEMUMachine pc_i440fx_machine_v2_2 = {

  PC_I440FX_2_2_MACHINE_OPTIONS,
  .name = "pc-i440fx-2.2",
  .init = pc_init_pci_2_2,
+.compat_props = (GlobalProperty[]) {
+HW_COMPAT_2_2,
+{ /* end of list */ }
+},
  };
  
  #define PC_I440FX_2_1_MACHINE_OPTIONS   \

diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index c0f21fe..97afb7d 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -431,6 +431,10 @@ static QEMUMachine pc_q35_machine_v2_2 = {
  PC_Q35_2_2_MACHINE_OPTIONS,
  .name = "pc-q35-2.2",
  .init = pc_q35_init_2_2,
+.compat_props = (GlobalProperty[]) {
+HW_COMPAT_2_2,
+{ /* end of list */ }
+},
  };
  
  #define PC_Q35_2_1_MACHINE_OPTIONS  \

diff --git a/include/hw/compat.h b/include/hw/compat.h
index 313682a..40c974a 100644
--- a/include/hw/compat.h
+++ b/include/hw/compat.h
@@ -1,7 +1,17 @@
  #ifndef HW_COMPAT_H
  #define HW_COMPAT_H
  
+#define HW_COMPAT_2_2 PC_I440FX_COMPAT

+
+#define PC_I440FX_COMPAT \
+{\
+.driver   = "vfio-pci",\
+.property = "x-aer",\
+.value= "off",\
+}
+
  #define HW_COMPAT_2_1 \
+HW_COMPAT_2_2, \
  {\
  .driver   = "intel-hda",\
  .property = "old_msi_addr",\



.






Re: [Qemu-devel] [PATCH v5 7/7] pc: add PC_I440FX_COMPAT to disable aercap for vifo device

2015-03-15 Thread Chen Fan


On 03/14/2015 06:38 AM, Alex Williamson wrote:

On Thu, 2015-03-12 at 18:23 +0800, Chen Fan wrote:

for piix4 chipset, we don't need to expose aer, so introduce
PC_I440FX_COMPAT for all piix4 machines to disable aercap,
and add HW_COMPAT_2_2 to disable aercap for all lower
than 2.3.

440FX is not PCIe, so it doesn't seem like we need to do anything there.
Shouldn't this only cover q35 machine types through 2.3?  (QEMU 2.3 is
already in hard freeze afaik, this won't go in until after 2.4
development opens).  Thanks,

Yes, 440Fx is not PCIe, so extended cap won't be parsed.
I knew this patches won't be in QEMU 2.3. I want to wait for
the 2.4 opens and then rebase this patch separately.

Thanks,
Chen




Alex


Signed-off-by: Chen Fan 
---
  hw/i386/pc_piix.c   |  9 +
  hw/i386/pc_q35.c|  4 
  include/hw/compat.h | 10 ++
  3 files changed, 23 insertions(+)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 8eab4ba..ff9d312 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -307,6 +307,11 @@ static void pc_init1(MachineState *machine,
  
  static void pc_init_pci(MachineState *machine)

  {
+static GlobalProperty pc_compat_props[] = {
+PC_I440FX_COMPAT,
+{ /* end of list */ }
+};
+qdev_prop_register_global_list(pc_compat_props);
  pc_init1(machine, 1, 1);
  }
  
@@ -534,6 +539,10 @@ static QEMUMachine pc_i440fx_machine_v2_2 = {

  PC_I440FX_2_2_MACHINE_OPTIONS,
  .name = "pc-i440fx-2.2",
  .init = pc_init_pci_2_2,
+.compat_props = (GlobalProperty[]) {
+HW_COMPAT_2_2,
+{ /* end of list */ }
+},
  };
  
  #define PC_I440FX_2_1_MACHINE_OPTIONS   \

diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index c0f21fe..97afb7d 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -431,6 +431,10 @@ static QEMUMachine pc_q35_machine_v2_2 = {
  PC_Q35_2_2_MACHINE_OPTIONS,
  .name = "pc-q35-2.2",
  .init = pc_q35_init_2_2,
+.compat_props = (GlobalProperty[]) {
+HW_COMPAT_2_2,
+{ /* end of list */ }
+},
  };
  
  #define PC_Q35_2_1_MACHINE_OPTIONS  \

diff --git a/include/hw/compat.h b/include/hw/compat.h
index 313682a..40c974a 100644
--- a/include/hw/compat.h
+++ b/include/hw/compat.h
@@ -1,7 +1,17 @@
  #ifndef HW_COMPAT_H
  #define HW_COMPAT_H
  
+#define HW_COMPAT_2_2 PC_I440FX_COMPAT

+
+#define PC_I440FX_COMPAT \
+{\
+.driver   = "vfio-pci",\
+.property = "x-aer",\
+.value= "off",\
+}
+
  #define HW_COMPAT_2_1 \
+HW_COMPAT_2_2, \
  {\
  .driver   = "intel-hda",\
  .property = "old_msi_addr",\



.






Re: [Qemu-devel] [PATCH v5 0/7] pass aer error to guest for vfio device

2015-03-15 Thread Chen Fan

Cc: Michael S. Tsirkin

On 03/12/2015 06:23 PM, Chen Fan wrote:

For now, for vfio pci passthough devices when qemu receives
an error from host aer report, there just terminate the guest,
but usually user want to know what error occurred but stop the
guest, so this patches add aer capability support for vfio device,
and pass the error to guest, and have guest driver to recover
from the error.
and turning on SERR# for error forwording in bridge control register
patch in seabios has been merged.

v3-v4:
1. add 'x-aer' for user to off aer capability.
2. refactor vfio device to parse extended capabilities.

v2-v3:
1. refactor vfio device to parse extended capability.
2. add global property for piix4 to disable vfio aer cap.

v1-v2:
1. turn on SERR# for bridge control register in firmware.
2. initilize aer capability for vfio device.
3. fix some trivial bug.

Chen Fan (7):
   vfio: add pcie extanded capability support
   aer: impove pcie_aer_init to support vfio device
   vfio: add aer support for vfio device
   pcie_aer: expose pcie_aer_msg() interface
   vfio-pci: pass the aer error to guest
   vfio: add 'x-aer' property to expose aercap
   pc: add PC_I440FX_COMPAT to disable aercap for vifo device

  hw/i386/pc_piix.c  |   9 +++
  hw/i386/pc_q35.c   |   4 +
  hw/pci-bridge/ioh3420.c|   2 +-
  hw/pci-bridge/xio3130_downstream.c |   2 +-
  hw/pci-bridge/xio3130_upstream.c   |   2 +-
  hw/pci/pcie_aer.c  |   6 +-
  hw/vfio/pci.c  | 158 +++--
  include/hw/compat.h|  10 +++
  include/hw/pci/pcie_aer.h  |   3 +-
  9 files changed, 182 insertions(+), 14 deletions(-)






Re: [Qemu-devel] [PATCH v5 5/7] vfio-pci: pass the aer error to guest

2015-03-15 Thread Chen Fan


On 03/14/2015 06:34 AM, Alex Williamson wrote:

On Thu, 2015-03-12 at 18:23 +0800, Chen Fan wrote:

when the vfio device encounters an uncorrectable error in host,
the vfio_pci driver will signal the eventfd registered by this
vfio device, the results in the qemu eventfd handler getting
invoked.

this patch is to pass the error to guest and have the guest driver
recover from the error.

What is going to be the typical recovery mechanism for the guest?  I'm
concerned that the topology of the device in the guest doesn't
necessarily match the topology of the device in the host, so if the
guest were to attempt a bus reset to recover a device, for instance,
what happens?

the recovery mechanism is that when guest got an aer error from a device,
guest will clean the corresponding status bit in device register. and for
need reset device, the guest aer driver would reset all devices under bus.

Thanks,
Chen



Signed-off-by: Chen Fan 
---
  hw/vfio/pci.c | 34 --
  1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 0a515b6..8966c49 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3240,18 +3240,40 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
  static void vfio_err_notifier_handler(void *opaque)
  {
  VFIOPCIDevice *vdev = opaque;
+PCIDevice *dev = &vdev->pdev;
+PCIEAERMsg msg = {
+.severity = 0,
+.source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn,
+};
  
  if (!event_notifier_test_and_clear(&vdev->err_notifier)) {

  return;
  }
  
+/* we should read the error details from the real hardware

+ * configuration spaces, here we only need to do is signaling
+ * to guest an uncorrectable error has occurred.
+ */

Inconsistent comment style


+ if(dev->exp.aer_cap) {

  ^ space


+uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
+uint32_t uncor_status;
+bool isfatal;
+
+uncor_status = vfio_pci_read_config(dev,
+   dev->exp.aer_cap + PCI_ERR_UNCOR_STATUS, 4);
+
+isfatal = uncor_status & pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
+
+msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN :
+ PCI_ERR_ROOT_CMD_NONFATAL_EN;
+
+pcie_aer_msg(dev, &msg);
+return;
+}
+
  /*
- * TBD. Retrieve the error details and decide what action
- * needs to be taken. One of the actions could be to pass
- * the error to the guest and have the guest driver recover
- * from the error. This requires that PCIe capabilities be
- * exposed to the guest. For now, we just terminate the
- * guest to contain the error.
+ * If the aer capability is not exposed to the guest. we just
+ * terminate the guest to contain the error.
   */
  
  error_report("%s(%04x:%02x:%02x.%x) Unrecoverable error detected.  "



.






Re: [Qemu-devel] [PATCH 0/2] virtio len fixes for qemu.

2015-03-15 Thread Rusty Russell
"Michael S. Tsirkin"  writes:
> On Fri, Mar 13, 2015 at 11:47:18AM +1030, Rusty Russell wrote:
>> Here's my proposed spec patch, which spells this out:
>> 
>> diff --git a/content.tex b/content.tex
>> index 6ba079d..b6345a8 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -600,10 +600,19 @@ them: it is only written to by the device, and read by 
>> the driver.
>>  Each entry in the ring is a pair: \field{id} indicates the head entry of the
>>  descriptor chain describing the buffer (this matches an entry
>>  placed in the available ring by the guest earlier), and \field{len} the 
>> total
>> -of bytes written into the buffer. The latter is extremely useful
>> +of bytes written into the buffer. 
>> +
>> +\begin{note}
>> +\field{len} is extremely useful
>
> just "useful" maybe?

OK.

>>  for drivers using untrusted buffers: if you do not know exactly
>
> replace "you" with "driver" here?

Yep.

>> -how much has been written by the device, you usually have to zero
>> -the buffer to ensure no data leakage occurs.
>> +how much has been written by the device, a driver would have to zero
>> +the buffer in advance to ensure no data leakage occurs.
>> +
>> +For example, a network driver
>
> any driver really, right?

Well, the block device has an explicit status byte, and an fixed length.

But there's a subtler detail I was considering when I designed this.

Imagine a Xen-style "driver domain" which is actually your device; it's
*untrusted*.  This is possible if the (trusted) host does that actual
data transfer, *and* reports the length; and such a mechanism is
generic, so the host doesn't need to whether this is a block, net, or
other device.

(Imagine the device-guest has R/O mapping of the avail ring and
 descriptor table.  Ignoring indirect descriptors you only need a "copy
 this data to/from this avail entry" helper to make this work).

> How about something like this:
>
> +The device MUST write at least \field{len} bytes to descriptor,
> +beginning at the first device-writable buffer,
> +prior to updating the used index field.
> +The device MAY write more than \field{len} bytes to descriptor.
> +The driver MUST NOT make assumptions about data in the buffer pointed to
> +by the descriptor with WRITE flag
> +beyond the first \field{len} bytes: the data
> +might be unchanged by the device, or it might be
> +overwritten by the device.
> +The driver SHOULD ignore data beyond the first \field{len} bytes.

I like these, as long as we note that this MAY is to allow error cases,
otherwise people might think they should just set len to zero.

Here it is, using the device-writable terminology, and explicitly
requiring that the device must set len (otherwise the requirements
about the device obeying len makes it look like it's set by the driver):

diff --git a/content.tex b/content.tex
index 6ba079d..2c946a5 100644
--- a/content.tex
+++ b/content.tex
@@ -600,10 +600,19 @@ them: it is only written to by the device, and read by 
the driver.
 Each entry in the ring is a pair: \field{id} indicates the head entry of the
 descriptor chain describing the buffer (this matches an entry
 placed in the available ring by the guest earlier), and \field{len} the total
-of bytes written into the buffer. The latter is extremely useful
-for drivers using untrusted buffers: if you do not know exactly
-how much has been written by the device, you usually have to zero
-the buffer to ensure no data leakage occurs.
+of bytes written into the buffer. 
+
+\begin{note}
+\field{len} is useful
+for drivers using untrusted buffers: if a driver does not know exactly
+how much has been written by the device, the driver would have to zero
+the buffer in advance to ensure no data leakage occurs.
+
+For example, a network driver may hand a received buffer directly to
+an unprivileged userspace application.  If the network device has not
+overwritten the bytes which were in that buffer, this may leak the
+contents of freed memory from other processes to the application.
+\end{note}
 
 \begin{note}
 The legacy \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]}
@@ -612,6 +621,28 @@ the constant as VRING_USED_F_NO_NOTIFY, but the layout and 
value were
 identical.
 \end{note}
 
+\devicenormative{\subsubsection}{Virtqueue Notification Suppression}{Basic 
Facilities of a Virtio Device / Virtqueues / The Virtqueue Used Ring}
+
+The device MUST set \field{len} prior to updating the used \field{idx}.
+
+The device MUST write at least \field{len} bytes to descriptor,
+beginning at the first device-writable buffer,
+prior to updating the used \field{idx}.
+
+The device MAY write more than \field{len} bytes to descriptor.
+
+\begin{note}
+There are potential error cases where a device might not know what
+parts of the buffers have been written.  This is why \field{len} may
+be an underestimate, but that's preferable to the driver believing
+that uninitialized memory has been overwritten when it has not.
+\end{note}
+
+\drivernormative{\subsubsection}{Virtq

Re: [Qemu-devel] [PATCH v5 5/7] vfio-pci: pass the aer error to guest

2015-03-15 Thread Alex Williamson
On Mon, 2015-03-16 at 11:05 +0800, Chen Fan wrote:
> On 03/14/2015 06:34 AM, Alex Williamson wrote:
> > On Thu, 2015-03-12 at 18:23 +0800, Chen Fan wrote:
> >> when the vfio device encounters an uncorrectable error in host,
> >> the vfio_pci driver will signal the eventfd registered by this
> >> vfio device, the results in the qemu eventfd handler getting
> >> invoked.
> >>
> >> this patch is to pass the error to guest and have the guest driver
> >> recover from the error.
> > What is going to be the typical recovery mechanism for the guest?  I'm
> > concerned that the topology of the device in the guest doesn't
> > necessarily match the topology of the device in the host, so if the
> > guest were to attempt a bus reset to recover a device, for instance,
> > what happens?
> the recovery mechanism is that when guest got an aer error from a device,
> guest will clean the corresponding status bit in device register. and for
> need reset device, the guest aer driver would reset all devices under bus.

Sorry, I'm still confused, how does the guest aer driver reset all
devices under a bus?  Are we talking about function-level, device
specific reset mechanisms or secondary bus resets?  If the guest is
performing secondary bus resets, what guarantee do they have that it
will translate to a physical secondary bus reset?  vfio may only do an
FLR when the bus is reset or it may not be able to do anything depending
on the available function-level resets and physical and virtual topology
of the device.  Thanks,

Alex

> >> Signed-off-by: Chen Fan 
> >> ---
> >>   hw/vfio/pci.c | 34 --
> >>   1 file changed, 28 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> >> index 0a515b6..8966c49 100644
> >> --- a/hw/vfio/pci.c
> >> +++ b/hw/vfio/pci.c
> >> @@ -3240,18 +3240,40 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
> >>   static void vfio_err_notifier_handler(void *opaque)
> >>   {
> >>   VFIOPCIDevice *vdev = opaque;
> >> +PCIDevice *dev = &vdev->pdev;
> >> +PCIEAERMsg msg = {
> >> +.severity = 0,
> >> +.source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn,
> >> +};
> >>   
> >>   if (!event_notifier_test_and_clear(&vdev->err_notifier)) {
> >>   return;
> >>   }
> >>   
> >> +/* we should read the error details from the real hardware
> >> + * configuration spaces, here we only need to do is signaling
> >> + * to guest an uncorrectable error has occurred.
> >> + */
> > Inconsistent comment style
> >
> >> + if(dev->exp.aer_cap) {
> >   ^ space
> >
> >> +uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
> >> +uint32_t uncor_status;
> >> +bool isfatal;
> >> +
> >> +uncor_status = vfio_pci_read_config(dev,
> >> +   dev->exp.aer_cap + PCI_ERR_UNCOR_STATUS, 4);
> >> +
> >> +isfatal = uncor_status & pci_get_long(aer_cap + 
> >> PCI_ERR_UNCOR_SEVER);
> >> +
> >> +msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN :
> >> + PCI_ERR_ROOT_CMD_NONFATAL_EN;
> >> +
> >> +pcie_aer_msg(dev, &msg);
> >> +return;
> >> +}
> >> +
> >>   /*
> >> - * TBD. Retrieve the error details and decide what action
> >> - * needs to be taken. One of the actions could be to pass
> >> - * the error to the guest and have the guest driver recover
> >> - * from the error. This requires that PCIe capabilities be
> >> - * exposed to the guest. For now, we just terminate the
> >> - * guest to contain the error.
> >> + * If the aer capability is not exposed to the guest. we just
> >> + * terminate the guest to contain the error.
> >>*/
> >>   
> >>   error_report("%s(%04x:%02x:%02x.%x) Unrecoverable error detected.  "
> >
> >
> > .
> >
> 






[Qemu-devel] [PATCH qemu] profiler: Reenable built-in profiler

2015-03-15 Thread Alexey Kardashevskiy
2ed1ebcf6 "timer: replace time() with QEMU_CLOCK_HOST" broke compile
when configured with --enable-profiler. Turned out the profiler has been
broken for a while.

This does s/qemu_time/tcg_time/ as the profiler only works in a TCG mode.
This also fixes the compile error.

This changes profile_getclock() to return nanoseconds rather than
CPU ticks as the "profile" HMP command prints seconds and there is no
platform-independent way to get ticks-per-second rate.
Since TCG is quite slow and get_clock() returns nanoseconds (fine
enough), this should not affect precision much.

This removes unused qemu_time_start and tlb_flush_time.

Signed-off-by: Alexey Kardashevskiy 
---


On 03/10/2015 10:54 PM, Paolo Bonzini wrote:> 
> s/qemu_time/tcg_time/, but if Peter is not using it and Fabrice broke
> it, might as well remove it.
> 
> Paolo


Which Peter did you mean here, Paolo?



---
 cpus.c   | 2 +-
 include/qemu/timer.h | 5 ++---
 monitor.c| 6 +++---
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/cpus.c b/cpus.c
index 1ce90a1..314df16 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1353,7 +1353,7 @@ static int tcg_cpu_exec(CPUArchState *env)
 }
 ret = cpu_exec(env);
 #ifdef CONFIG_PROFILER
-qemu_time += profile_getclock() - ti;
+tcg_time += profile_getclock() - ti;
 #endif
 if (use_icount) {
 /* Fold pending instructions back into the
diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index eba8b21..e5bd494 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -999,11 +999,10 @@ static inline int64_t cpu_get_real_ticks (void)
 #ifdef CONFIG_PROFILER
 static inline int64_t profile_getclock(void)
 {
-return cpu_get_real_ticks();
+return get_clock();
 }
 
-extern int64_t qemu_time, qemu_time_start;
-extern int64_t tlb_flush_time;
+extern int64_t tcg_time;
 extern int64_t dev_time;
 #endif
 
diff --git a/monitor.c b/monitor.c
index c86a89e..4e67774 100644
--- a/monitor.c
+++ b/monitor.c
@@ -1974,7 +1974,7 @@ static void hmp_info_numa(Monitor *mon, const QDict 
*qdict)
 
 #ifdef CONFIG_PROFILER
 
-int64_t qemu_time;
+int64_t tcg_time;
 int64_t dev_time;
 
 static void hmp_info_profile(Monitor *mon, const QDict *qdict)
@@ -1982,8 +1982,8 @@ static void hmp_info_profile(Monitor *mon, const QDict 
*qdict)
 monitor_printf(mon, "async time  %" PRId64 " (%0.3f)\n",
dev_time, dev_time / (double)get_ticks_per_sec());
 monitor_printf(mon, "qemu time   %" PRId64 " (%0.3f)\n",
-   qemu_time, qemu_time / (double)get_ticks_per_sec());
-qemu_time = 0;
+   tcg_time, tcg_time / (double)get_ticks_per_sec());
+tcg_time = 0;
 dev_time = 0;
 }
 #else
-- 
2.0.0




Re: [Qemu-devel] [PATCH qemu] pseries: Update SLOF firmware image to qemu-slof-20150313

2015-03-15 Thread Alexey Kardashevskiy

On 03/13/2015 10:45 PM, Alexey Kardashevskiy wrote:

The changelog is:
   > virtio: Fix vring allocation
   > helpers: Fix SLOF_alloc_mem_aligned to meet callers expectation
   > Set default palette according to "16-color Text Extension" document
   > Fix rectangle drawing functions to work also with higher bit depths
   > Fix the x86emu patch file
   > Silence compiler warning when building the biosemu
   > Use device-type Forth word to set up the corresponding property
   > Improve /openprom node
   > pci-properties: Remove redundant call to device-type
   > cas: reconfigure memory nodes
   > pci: use 64bit bar ranges



It would be really nice to have in 2.3 (I saw "hardfreeze" mail), or the 
whole pseries-2.3 machine thing won't make sense. Sorry, I missed this 
earlier, I honestly thought 64bit BARs are in upstream QEMU for a long time 
already.





Signed-off-by: Alexey Kardashevskiy 
---

I pushed it to github and I expect it to get mirrord to qemu.org tonight.

---
  pc-bios/README   |   2 +-
  pc-bios/slof.bin | Bin 911704 -> 912192 bytes
  roms/SLOF|   2 +-
  3 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/pc-bios/README b/pc-bios/README
index 8a85e69..63e7254 100644
--- a/pc-bios/README
+++ b/pc-bios/README
@@ -17,7 +17,7 @@
  - SLOF (Slimline Open Firmware) is a free IEEE 1275 Open Firmware
implementation for certain IBM POWER hardware.  The sources are at
https://github.com/aik/SLOF, and the image currently in qemu is
-  built from git tag qemu-slof-20141202.
+  built from git tag qemu-slof-20150313.

  - sgabios (the Serial Graphics Adapter option ROM) provides a means for
legacy x86 software to communicate with an attached serial console as
diff --git a/pc-bios/slof.bin b/pc-bios/slof.bin
index 
031e3063a277e769c78b637de13c93f4074389f5..ab72cba80c528aea2545974e9c717cee583c254c
 100644
GIT binary patch
delta 10238
zcmeHMdw5e-wm|PwXrbXD
zlvfF_tRjU5M5qp80LR0q0~v>zTW5SkuQQkH_>FPo&fst_)^T(SGIG~C`xMg3z4Lwd
zyZ_wpyZM^@@>{>P*Is+=eRfXIKL7rN$ZsbE^5sj89R+t9`zG`mGGT`|E%k8wedg=>
zYuB>G*(ZK~Xp-V8xNDKUv@Ab=tjk`Kzo_g~!11zN@L-9LkG+%U*~gCOd3kr=IpOZS
z0*jeX`vITl_EmeTD*3&9v8N{YJFV-dEIDhg>X7J_RCO9UWHwt)IuDKnmM@!%)w%@s
z=}_hY=P;H5ZFc=)n|
z$yT6pdL%Y@5TX-V77Q81CbLhv8mB{G6gD19Npj#h%m{Ce!W8yvSqDe6UF<#d?GNG6
zY!GupbTrG88@|Bs8PGo$O$$KDV|J*>WiMe$WS*7H`WdA{$^V{4?ZeSFe-i6s>AX@X
zNf%dN?*LheN*o%N@UKs@
z26j-bd5WDfu@@m{9p=3a3fHj-Y%2KIu}7^+N5R^)k=4IARaYqeW*b~w$L6vtB;Bw2
z*R#hA>@)bNkv+jyLE$Dg6U()G6B{E(8Jye1e$B>1U^AP-zED5d%z)tpRcvMbu-=VZ
z**9#kx_%p`k-3dhtY6h{<>a*q#SQ1atSbmfiG|XeQz6{Mj^6tq$pwBzntA<=$@Kk>
zXVLwses!JoQpV&1^7s7pG^#(}F8m~^&Mol<*Y7al_Q9}wJNp~kt8NXjF|2ocGWj$-
zd#SF#FDWyxM@GO~JJ_VmpJ+|-W9Id5`)l79t?Mk5{^=lO?_@t@%Yt)poo@Jk?AM9o(Mm5-j>tQN*gDjops*OSPVNZj-nT_hJ1=oVEk-P)cYL%pWS|s@!#6s!ygQQV4>fQb78rFS^!#T5Gvytz|
zn$IdI-;49H5?4?s|y6YIGm_k<5Mq^#|Bu
z)(b8iVE3?JsJxX`CnT3>5q@rZ8Nv%#4*1KN0qpDXBXEux)%Omu*$ivT599iKTb+Lx
z7Zk3N!_P65)=tq;_H*_e@HX}o8v!kCxT1DI#cRnK>a6G4QJK9bNa`^*of);B!2Tj$
zO5PXQI^zlIS>vj5oLypUM$jZ@0}rz=anYuoU|U!%gidIiLdHq9j(RxB2KGKGmJHUb
zVg09dGq3M|6W%(BJIH5{dWwz0i>2Tco6zeKZCp2<`hx!yn`5cf%%6UPb7%L!N2gdn
z+uLNqWp)mMUd*2)NyTvD5xEZp)MQ2FFSBoxWlM+3&#>PmvNzRh=h&x-Y`hx#HQSOv
zm(aU-32lSMciA7TR&OVhl+MA|cg(#0+sDC~%(&VdWmTq3t)I^pN(#-_t|?OSl__+U
z{L5D=KhHi)v09@yTfXtOXqj_?{c|7oJY4%5)_)m{>%_G+02({lKE@#X3fqqG@)h*#|zG-=xY~ruS3U}!2MnbqYOVR
zlhftaD8$HDk9$xLaYz!rheF_}M41cFy$(AeB&-?)AWFRwvqTj?0W%RH5(xz)9cs%w1QjyD)HrkTr1YjR87
zi)t4m;bVsIDHPOGxwxwlHh1~>_Y0@aHdne=pzCsv%dG`&K&eN}N{^Qdk6e)or&keo
zmiVgOyr$gEwM0qlmSB%CzN#8twcPFFqxe{S=MvAwueliNf!d{BUQ@+ON{5#Ecu84}
zn-6geF?*?p+&oWZ4W}gPDyqE|o*E8IKQ?5hbtO#|3~3EiB^-ul
z(bCbSD={sPE7#|CRaKUG7T5Z)4po)8qImzfcHN5ij{{<_E?@ue6gIAUT81fkrM@aJ
z*FGV9*^qua&e3d|?Mf_4b**$1elOLWj`Mz0Gx!2{%t#sGbP<`{qY3oFM6mhT5UFNQF
zm6t54aP$0exr=bNT3U=l<@|B#oc+d4Pn2K-hrWPswk+aP%H6IfxTlP-aP#t#S
zrObULm5bfx_>RTU<1P8)v;wMU2c=G93$)b!7kXOisp{SpoamF7HoAzHRr&a$suEvm
z?lSk%TFQHcr=o(_;?+`3*Rn{NW)Co|mO){PX`-Ck42^_?IkcykF3U;9>giNdK9iFq=&+c8jaD12rg^EjHL0;IP2g%q
zf74la=IhvrjBhjbCP@Y-eOi?jVdz+6O3#cUyM|LO$hqi3ot
z-40%|UJ!v-cR~nhW+KKh;5&~3N^9QlPheFV%n&8HH!_kP_$HYXpqv$4kLo}q@r^Td
zbqL;~p*-xrkVk4Vy9UlJn3x~$<(x+AN7eNQfCz18V
zw@8~be4+db@TXCJ6&cBTYZ5j23^-9{=fH8*oMx~pYkGs9{B0=5_ZATTYce2A{;!J?
zuEAeaqIwC}bb_)0X`2|wx((nU+5}#LGXf$ILIxW*nEKPi_L0|hK$P@UZN}yGw%|rg
zfGSDrLQmMJ4|jbI*imN9y7OTuCVGZ?vdQb;pgs`4*7;S4Y$X0I1U8!b!Iv9xAr(RN
zL9{)+6?h|d)PkKnkMAA}>FML(B>5GT;tG+V&*;;q!H+U)hv+5*1BA&rOLx{MA)x>{
zPQr#E5F_Vo@UFm~H_QSB8)cUD@OJ9nmEfS5Ufp{=c*#51!@FGfJ^&%|4(Z-uEg^V!
zlPQCB1~+3vx>Ba$yT<@GmVuq*C3;-F&j=lGLPu4UBmP1
zuE!xlu3?cvV>T_0bXGvuzfEqO1$+zAV!FI+@+Jo@j&xRtw>?Bl!B5l-PBh;B#Pv!0**ps5!3qm`t
zrOHlfnzk>+r_`14t0JWdyYWIgwcC`I`Aa=_v0iQrZBHu1D5nD815#+`-*X=*dvL4|
zpnv>?d68o&_RInY$(wb#tFC@s#!Vh&T-zN%oP|BOD-k_H$=c+g0ir~?fN{YflKTaj
z4|ZZ@SQK=ugkofwO+9)Eh+cv_AwbT};Qv=s-vkBbLj*aT4dSX?PhF?8%~UO$3_F1X
zQv6jR7%c#=b=o4D@m7hf6_{^^B_vzxSMGC?^_bwi*z~7vrXO?5ee+O296@+ExnlOl9!_#e;J36W2{3@LNUp!by@Gq&?WmJ
zfHM0K_uA;ZaTMaLJt;3WDM<%P5Z{6r08waLeeAWXDA_#Clj!Ojcc<9`9=AesM4
z5pD8IW^f!t=Q;=x>;x~ujT;<&E{$~Gg0u5dv+!yIeuUaygeZB_y$uTDC4O}fM6OFlXH(9&*;}f%@STkqT_H2M5

Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/3] VFIO: Clear INTx pending state on EEH reset

2015-03-15 Thread Benjamin Herrenschmidt
On Mon, 2015-03-16 at 12:04 +1100, Gavin Shan wrote:
> 
> 
> (2) QEMU sends IOCTL commands to host to disable MSIx and enable INTx. At
> this stage the INTx is still masked. At later point, the guest is requesting
> unmasking INTx, which is captured by host. Host checks and founds pending
> INTx, which is sent to QEMU. In QEMU INTx handler (vfio_intx_interrupt()),
> the mmap'ed regions are disabled, "intx.pending" is set and a timer is started
> to reenable mmap'ed regions if "intx.pending" is cleared there. However,
> "intx.pending" is only cleared upon BAR access in slow path, which is never
> happing.
> 
> (3) After guest disables MSIx and issue EEH reset, the device driver starts
> to check its firmware state by reading MMIO register, which isn't completed
> by QEMU VFIO BAR slow path (Note: fast path supported by mmaped regions have
> been disabled). Eventually, the guest hangs on reading MMIO register. With
> this patch applied to QEMU, I didn't see the problem again. 

Note that it might be a good idea to disable INTx (and synchronize with a cfg
read of some sort) around resetting a device.

Otherwise, you may hit a known issue if the device is behind a switch and has
sent the INTx "assert" message, and not the "deassert" one before it gets reset.

That can cause the INTx to effectively be "stuck" in the switch preventing a
subsequent one from being delivered.

Cheers,
Ben.





Re: [Qemu-devel] [Qemu-Dev-QUERY] Related to Live Migration source Code or API for NUMA Node specific

2015-03-15 Thread Arkajit Ghosh
 Hi Team,

Can anyone please provide their point of views related to my below query.

Thanks & Regards
Arkajit Ghosh



-Arkajit Ghosh/DEL/TCS wrote: -
To: qemu-devel@nongnu.org
From: Arkajit Ghosh/DEL/TCS
Date: 03/12/2015 02:28PM
Subject: Re: [Qemu-Dev-QUERY] Related to Live Migration source Code or API for 
NUMA Node specific

 Hi Team,

Can anyone please provide their point of views related to my below query.

Thanks & Regards
Arkajit Ghosh


-Arkajit Ghosh/DEL/TCS wrote: -
To: qemu-devel@nongnu.org
From: Arkajit Ghosh/DEL/TCS
Date: 03/12/2015 09:59AM
Subject: Re: [Qemu-Dev-QUERY] Related to Live Migration source Code or API for 
NUMA Node specific

 Hi Team,

Can anyone please provide their point of views related to my below query.

Thanks & Regards
Arkajit Ghosh


-Arkajit Ghosh/DEL/TCS wrote: -
To: qemu-devel@nongnu.org
From: Arkajit Ghosh/DEL/TCS
Date: 03/11/2015 09:49PM
Subject: Re: [Qemu-Dev] Related to Live Migration source Code or API for NUMA 
Node specific

 
Hi,

Can anyone please provide their point of views related to my below query.

Thanks & Regards
Arkajit Ghosh


-Arkajit Ghosh/DEL/TCS wrote: -
To: qemu-devel@nongnu.org
From: Arkajit Ghosh/DEL/TCS
Date: 03/11/2015 05:03PM
Cc: Arkajit Ghosh/DEL/TCS@TCS
Subject: [Qemu-Dev] Related to Live Migration source Code or API for NUMA Node 
specific

Hi Team,

I was went through Live Migration source code in Qemu. I didn't find any 
specific API for NUMA which can able to copy the dirty pages for Local and 
Remote memory corresponding to individual CPU core.

As I understood, in NUMA aware Intel CPU Processor has one Local memory which 
is attached to dedicated CPU core (individual) and also can able to access 
remote memory. So during Live migration, dirty pages can be created in local 
memory and as well as can create in remote memory also.  

My query is, is there any source code or API exposed in Qemu which can able to 
access that offset addresses of Local memory and Remote memory and then able to 
copy the dirty pages? If "yes" can you please let me know how it will work? If 
"No" then I need to think different approaches. 

Please let me know if my query is not clear to you. I hope this is the right 
place to put this query.

Thanks & Regards
Arkajit Ghosh
   
=-=-=
Notice: The information contained in this e-mail
message and/or attachments to it may contain 
confidential or privileged information. If you are 
not the intended recipient, any dissemination, use, 
review, distribution, printing or copying of the 
information contained in this e-mail message 
and/or attachments to it are strictly prohibited. If 
you have received this communication in error, 
please notify us by reply e-mail or telephone and 
immediately and permanently delete the message 
and any attachments. Thank you




Re: [Qemu-devel] [PATCH v2] qga/commands-posix: Fix resource leak

2015-03-15 Thread Shannon Zhao
On 2015/3/14 17:56, Stefan Weil wrote:
> Am 14.03.2015 um 10:52 schrieb Shannon Zhao:
>> It's detected by coverity. Close the dirfd.
>>
>> Signed-off-by: Shannon Zhao 
>> Signed-off-by: Shannon Zhao 
>> ---
>>  v1->v2: close after use [Stefan Weil]
>> ---
>>   qga/commands-posix.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/qga/commands-posix.c b/qga/commands-posix.c
>> index d5bb5cb..ba8de62 100644
>> --- a/qga/commands-posix.c
>> +++ b/qga/commands-posix.c
>> @@ -2283,6 +2283,7 @@ GuestMemoryBlockInfo 
>> *qmp_guest_get_memory_block_info(Error **errp)
>> buf = g_malloc0(20);
>>   ga_read_sysfs_file(dirfd, "block_size_bytes", buf, 20, &local_err);
>> +close(dirfd);
>>   if (local_err) {
>>   g_free(buf);
>>   error_propagate(errp, local_err);
> 
> Thanks.
> 
> Reviewed-by: Stefan Weil 
> 
Hi Stefan,

Thanks for your review.

Shannon




Re: [Qemu-devel] [PATCH v2] qga/commands-posix: Fix resource leak

2015-03-15 Thread Shannon Zhao
On 2015/3/16 9:27, zhanghailiang wrote:
> On 2015/3/14 17:52, Shannon Zhao wrote:
>> It's detected by coverity. Close the dirfd.
>>
> 
> Reviewed-by: zhanghailiang 
> 

Hi zhanghailiang,

Thanks for your review.

Shannon

>> Signed-off-by: Shannon Zhao 
>> Signed-off-by: Shannon Zhao 
>> ---
>>  v1->v2: close after use [Stefan Weil]
>> ---
>>   qga/commands-posix.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/qga/commands-posix.c b/qga/commands-posix.c
>> index d5bb5cb..ba8de62 100644
>> --- a/qga/commands-posix.c
>> +++ b/qga/commands-posix.c
>> @@ -2283,6 +2283,7 @@ GuestMemoryBlockInfo 
>> *qmp_guest_get_memory_block_info(Error **errp)
>>
>>   buf = g_malloc0(20);
>>   ga_read_sysfs_file(dirfd, "block_size_bytes", buf, 20, &local_err);
>> +close(dirfd);
>>   if (local_err) {
>>   g_free(buf);
>>   error_propagate(errp, local_err);
>>
> 
> 
> 
> .
> 




Re: [Qemu-devel] [PATCH v5 0/7] pass aer error to guest for vfio device

2015-03-15 Thread Michael S. Tsirkin
On Mon, Mar 16, 2015 at 10:52:52AM +0800, Chen Fan wrote:
> Cc: Michael S. Tsirkin
> 
> On 03/12/2015 06:23 PM, Chen Fan wrote:
> >For now, for vfio pci passthough devices when qemu receives
> >an error from host aer report, there just terminate the guest,
> >but usually user want to know what error occurred but stop the
> >guest, so this patches add aer capability support for vfio device,
> >and pass the error to guest, and have guest driver to recover
> >from the error.
> >and turning on SERR# for error forwording in bridge control register
> >patch in seabios has been merged.
> >
> >v3-v4:
> >1. add 'x-aer' for user to off aer capability.

User-exposed properties should not start with x- - by convention
that's for internal properties.

> >2. refactor vfio device to parse extended capabilities.
> >
> >v2-v3:
> >1. refactor vfio device to parse extended capability.
> >2. add global property for piix4 to disable vfio aer cap.
> >
> >v1-v2:
> >1. turn on SERR# for bridge control register in firmware.
> >2. initilize aer capability for vfio device.
> >3. fix some trivial bug.
> >
> >Chen Fan (7):
> >   vfio: add pcie extanded capability support
> >   aer: impove pcie_aer_init to support vfio device
> >   vfio: add aer support for vfio device
> >   pcie_aer: expose pcie_aer_msg() interface
> >   vfio-pci: pass the aer error to guest
> >   vfio: add 'x-aer' property to expose aercap
> >   pc: add PC_I440FX_COMPAT to disable aercap for vifo device
> >
> >  hw/i386/pc_piix.c  |   9 +++
> >  hw/i386/pc_q35.c   |   4 +
> >  hw/pci-bridge/ioh3420.c|   2 +-
> >  hw/pci-bridge/xio3130_downstream.c |   2 +-
> >  hw/pci-bridge/xio3130_upstream.c   |   2 +-
> >  hw/pci/pcie_aer.c  |   6 +-
> >  hw/vfio/pci.c  | 158 
> > +++--
> >  include/hw/compat.h|  10 +++
> >  include/hw/pci/pcie_aer.h  |   3 +-
> >  9 files changed, 182 insertions(+), 14 deletions(-)
> >



Re: [Qemu-devel] [RFC PATCH] target-ppc: Register CPU class per family only when needed

2015-03-15 Thread Alexey Kardashevskiy

On 03/06/2015 12:17 AM, Alexander Graf wrote:



On 05.03.15 02:56, Alexey Kardashevskiy wrote:

At the moment when running in KVM mode, QEMU registers "host" class to
match the current CPU PVR value. It also registers another CPU class
with a CPU family name os if we run QEMU on POWER7 machine, "host" and
"POWER7" classes are created, this way we can always use "-cpu POWER7"
on the actual POWER7 machine.

The existing code uses DeviceClass::desc field of the CPU class as
a source for the class name; it was pointed out that it is wrong to use
user-visible string as a type name.

This adds a common CPU class name into PowerPCCPUClass struct.
This makes registration of a CPU named after the family conditional -
PowerPCCPUClass::common_cpu_name has to be non-zero. Only POWER7/POWER8
families have this field initialized by now.

Signed-off-by: Alexey Kardashevskiy 


LGTM. Andreas, do you agree?



Ping?





Alex



---
  target-ppc/cpu-qom.h|  1 +
  target-ppc/kvm.c| 11 ++-
  target-ppc/translate_init.c |  2 ++
  3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/target-ppc/cpu-qom.h b/target-ppc/cpu-qom.h
index 6967a80..4b471d7 100644
--- a/target-ppc/cpu-qom.h
+++ b/target-ppc/cpu-qom.h
@@ -55,6 +55,7 @@ typedef struct PowerPCCPUClass {
  DeviceRealize parent_realize;
  void (*parent_reset)(CPUState *cpu);

+const char *common_cpu_name;
  uint32_t pvr;
  bool (*pvr_match)(struct PowerPCCPUClass *pcc, uint32_t pvr);
  uint64_t pcr_mask;
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index b479471..3f2df65 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -2221,7 +2221,6 @@ static int kvm_ppc_register_host_cpu_type(void)
  };
  uint32_t host_pvr = mfpvr();
  PowerPCCPUClass *pvr_pcc;
-DeviceClass *dc;

  pvr_pcc = ppc_cpu_class_by_pvr(host_pvr);
  if (pvr_pcc == NULL) {
@@ -2235,10 +2234,12 @@ static int kvm_ppc_register_host_cpu_type(void)

  /* Register generic family CPU class for a family */
  pvr_pcc = ppc_cpu_get_family_class(pvr_pcc);
-dc = DEVICE_CLASS(pvr_pcc);
-type_info.parent = object_class_get_name(OBJECT_CLASS(pvr_pcc));
-type_info.name = g_strdup_printf("%s-"TYPE_POWERPC_CPU, dc->desc);
-type_register(&type_info);
+if (pvr_pcc->common_cpu_name) {
+type_info.parent = object_class_get_name(OBJECT_CLASS(pvr_pcc));
+type_info.name = g_strdup_printf("%s-"TYPE_POWERPC_CPU,
+ pvr_pcc->common_cpu_name);
+type_register(&type_info);
+}

  return 0;
  }
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index df1a62c..3d0be66 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -8117,6 +8117,7 @@ POWERPC_FAMILY(POWER7)(ObjectClass *oc, void *data)
  dc->fw_name = "PowerPC,POWER7";
  dc->desc = "POWER7";
  dc->props = powerpc_servercpu_properties;
+pcc->common_cpu_name = "POWER7";
  pcc->pvr_match = ppc_pvr_match_power7;
  pcc->pcr_mask = PCR_COMPAT_2_05 | PCR_COMPAT_2_06;
  pcc->init_proc = init_proc_POWER7;
@@ -8193,6 +8194,7 @@ POWERPC_FAMILY(POWER8)(ObjectClass *oc, void *data)
  dc->fw_name = "PowerPC,POWER8";
  dc->desc = "POWER8";
  dc->props = powerpc_servercpu_properties;
+pcc->common_cpu_name = "POWER8";
  pcc->pvr_match = ppc_pvr_match_power8;
  pcc->pcr_mask = PCR_COMPAT_2_05 | PCR_COMPAT_2_06;
  pcc->init_proc = init_proc_POWER8;




--
Alexey



Re: [Qemu-devel] [PATCH 0/2] virtio len fixes for qemu.

2015-03-15 Thread Michael S. Tsirkin
On Mon, Mar 16, 2015 at 01:44:22PM +1030, Rusty Russell wrote:
> "Michael S. Tsirkin"  writes:
> > On Fri, Mar 13, 2015 at 11:47:18AM +1030, Rusty Russell wrote:
> >> Here's my proposed spec patch, which spells this out:
> >> 
> >> diff --git a/content.tex b/content.tex
> >> index 6ba079d..b6345a8 100644
> >> --- a/content.tex
> >> +++ b/content.tex
> >> @@ -600,10 +600,19 @@ them: it is only written to by the device, and read 
> >> by the driver.
> >>  Each entry in the ring is a pair: \field{id} indicates the head entry of 
> >> the
> >>  descriptor chain describing the buffer (this matches an entry
> >>  placed in the available ring by the guest earlier), and \field{len} the 
> >> total
> >> -of bytes written into the buffer. The latter is extremely useful
> >> +of bytes written into the buffer. 
> >> +
> >> +\begin{note}
> >> +\field{len} is extremely useful
> >
> > just "useful" maybe?
> 
> OK.
> 
> >>  for drivers using untrusted buffers: if you do not know exactly
> >
> > replace "you" with "driver" here?
> 
> Yep.
> 
> >> -how much has been written by the device, you usually have to zero
> >> -the buffer to ensure no data leakage occurs.
> >> +how much has been written by the device, a driver would have to zero
> >> +the buffer in advance to ensure no data leakage occurs.
> >> +
> >> +For example, a network driver
> >
> > any driver really, right?
> 
> Well, the block device has an explicit status byte, and an fixed length.
> 
> But there's a subtler detail I was considering when I designed this.
> 
> Imagine a Xen-style "driver domain" which is actually your device; it's
> *untrusted*.  This is possible if the (trusted) host does that actual
> data transfer, *and* reports the length; and such a mechanism is
> generic, so the host doesn't need to whether this is a block, net, or
> other device.
> 
> (Imagine the device-guest has R/O mapping of the avail ring and
>  descriptor table.  Ignoring indirect descriptors you only need a "copy
>  this data to/from this avail entry" helper to make this work).
> 
> > How about something like this:
> >
> > +The device MUST write at least \field{len} bytes to descriptor,
> > +beginning at the first device-writable buffer,
> > +prior to updating the used index field.
> > +The device MAY write more than \field{len} bytes to descriptor.
> > +The driver MUST NOT make assumptions about data in the buffer pointed to
> > +by the descriptor with WRITE flag
> > +beyond the first \field{len} bytes: the data
> > +might be unchanged by the device, or it might be
> > +overwritten by the device.
> > +The driver SHOULD ignore data beyond the first \field{len} bytes.
> 
> I like these, as long as we note that this MAY is to allow error cases,
> otherwise people might think they should just set len to zero.
> 
> Here it is, using the device-writable terminology, and explicitly
> requiring that the device must set len (otherwise the requirements
> about the device obeying len makes it look like it's set by the driver):
> 
> diff --git a/content.tex b/content.tex
> index 6ba079d..2c946a5 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -600,10 +600,19 @@ them: it is only written to by the device, and read by 
> the driver.
>  Each entry in the ring is a pair: \field{id} indicates the head entry of the
>  descriptor chain describing the buffer (this matches an entry
>  placed in the available ring by the guest earlier), and \field{len} the total
> -of bytes written into the buffer. The latter is extremely useful
> -for drivers using untrusted buffers: if you do not know exactly
> -how much has been written by the device, you usually have to zero
> -the buffer to ensure no data leakage occurs.
> +of bytes written into the buffer. 
> +
> +\begin{note}
> +\field{len} is useful
> +for drivers using untrusted buffers: if a driver does not know exactly
> +how much has been written by the device, the driver would have to zero
> +the buffer in advance to ensure no data leakage occurs.
> +
> +For example, a network driver may hand a received buffer directly to
> +an unprivileged userspace application.  If the network device has not
> +overwritten the bytes which were in that buffer, this may leak the
> +contents of freed memory from other processes to the application.
> +\end{note}
>  
>  \begin{note}
>  The legacy \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]}
> @@ -612,6 +621,28 @@ the constant as VRING_USED_F_NO_NOTIFY, but the layout 
> and value were
>  identical.
>  \end{note}
>  
> +\devicenormative{\subsubsection}{Virtqueue Notification Suppression}{Basic 
> Facilities of a Virtio Device / Virtqueues / The Virtqueue Used Ring}
> +
> +The device MUST set \field{len} prior to updating the used \field{idx}.
> +
> +The device MUST write at least \field{len} bytes to descriptor,
> +beginning at the first device-writable buffer,
> +prior to updating the used \field{idx}.
> +
> +The device MAY write more than \field{len} bytes to descriptor.
> +
> +\begin{note}
> +There are pote

Re: [Qemu-devel] virtio fixes pull for 4.0?

2015-03-15 Thread Michael S. Tsirkin
On Mon, Mar 09, 2015 at 05:43:19PM +1030, Rusty Russell wrote:
> > I think it's a good idea to merge these patches (maybe except the
> > !TASK_RUNNING thing) sooner rather than later, to make sure people have
> > the time to test the fixes properly.  Would you like me to pack up (some
> > of them) them up and do a pull request?
> 
> I'm waiting a bit longer, since they seem to still be tricking in.

I don't think there was a pull request since rc1 which
makes me a bit anxious. Any outstanding issues?

-- 
MST



Re: [Qemu-devel] [PATCH] virtio-pci: Clear IRQ at reset

2015-03-15 Thread Michael S. Tsirkin
On Fri, Mar 13, 2015 at 02:28:20PM +0800, Fam Zheng wrote:
> On Fri, 03/13 14:07, Fam Zheng wrote:
> > On Thu, 03/12 12:15, Michael S. Tsirkin wrote:
> > > On Thu, Mar 12, 2015 at 11:04:33AM +, Peter Maydell wrote:
> > > > On 12 March 2015 at 10:57, Michael S. Tsirkin  wrote:
> > > > > This isn't a device reset though.
> > > > > The function that Fam is touching is called
> > > > > when a special "virtio reset" register to
> > > > > poked by the driver.
> > > > > It only resets part of the device, not all of it,
> > > > > and it seems reasonable to ask that it clear the
> > > > > interrupt.
> > > > 
> > > > Oh, right, sorry. Yes, that should clear the interrupt, then.
> > > > (Is there a similar bug on other virtio transports?)
> > > > 
> > > > -- PMM
> > > 
> > > Hmm interesting.
> > > 
> > > I looked at virtio_reset and that one does:
> > > vdev->isr = 0;
> > > vdev->config_vector = VIRTIO_NO_VECTOR;
> > > virtio_notify_vector(vdev, vdev->config_vector);
> > > 
> > > which in turn would call
> > > static void virtio_pci_notify(DeviceState *d, uint16_t vector)
> > > {
> > > VirtIOPCIProxy *proxy = to_virtio_pci_proxy_fast(d);
> > > 
> > > if (msix_enabled(&proxy->pci_dev))
> > > msix_notify(&proxy->pci_dev, vector);
> > > else {
> > > VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
> > > pci_set_irq(&proxy->pci_dev, vdev->isr & 1);
> > > }
> > > }
> > > 
> > > since isr is 0, and msi is disabled, it looks like
> > > pci_set_irq will get invoked automatically.
> > > 
> > > so at this point I stopped understanding how can
> > > this patch help.
> > > 
> > > Fam, does your patch actually help some guests?
> > > Could you pls investigate why isn't virtio_reset
> > > sufficient?
> > 
> > Because virtio_reset doesn't disable msix, so here
> > "msix_notify(&proxy->pci_dev, 0)" is called.
> 
> s/0/VIRTIO_NO_VECTOR/

My answer still holds - we don't need to clear msix
interrupts, and with msi enabled we know intx is low.

No?

> > 
> > I tested by applying your patch:
> > 
> > http://www.spinics.net/lists/kernel/msg1943182.html
> > 
> > and adding printf's in QEMU's virtio_reset and virtio_pci_notify.
> > 
> > It shows pci_set_irq is never called.
> > 
> > Fam
> > 
> > 



[Qemu-devel] [PATCH v3 3/4] exec: Notify cpu_register_map_client caller if the bounce buffer is available

2015-03-15 Thread Fam Zheng
The caller's workflow is like

if (!address_space_map()) {
...
cpu_register_map_client();
}

If bounce buffer became available after address_space_map() but before
cpu_register_map_client(), the caller could miss it and has to wait for the
next bounce buffer notify, which may never happen in the worse case.

Just notify the list in cpu_register_map_client().

Signed-off-by: Fam Zheng 
---
 exec.c | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/exec.c b/exec.c
index 3e54580..20381a0 100644
--- a/exec.c
+++ b/exec.c
@@ -2489,6 +2489,17 @@ QemuMutex map_client_list_lock;
 static QLIST_HEAD(map_client_list, MapClient) map_client_list
 = QLIST_HEAD_INITIALIZER(map_client_list);
 
+static void cpu_notify_map_clients_unlocked(void)
+{
+MapClient *client;
+
+while (!QLIST_EMPTY(&map_client_list)) {
+client = QLIST_FIRST(&map_client_list);
+client->callback(client->opaque);
+cpu_unregister_map_client(client);
+}
+}
+
 void *cpu_register_map_client(void *opaque, void (*callback)(void *opaque))
 {
 MapClient *client = g_malloc(sizeof(*client));
@@ -2497,6 +2508,9 @@ void *cpu_register_map_client(void *opaque, void 
(*callback)(void *opaque))
 client->opaque = opaque;
 client->callback = callback;
 QLIST_INSERT_HEAD(&map_client_list, client, link);
+if (!atomic_read(&bounce.in_use)) {
+cpu_notify_map_clients_unlocked();
+}
 qemu_mutex_unlock(&map_client_list_lock);
 return client;
 }
@@ -2521,14 +2535,8 @@ static void cpu_unregister_map_client(void *_client)
 
 static void cpu_notify_map_clients(void)
 {
-MapClient *client;
-
 qemu_mutex_lock(&map_client_list_lock);
-while (!QLIST_EMPTY(&map_client_list)) {
-client = QLIST_FIRST(&map_client_list);
-client->callback(client->opaque);
-cpu_unregister_map_client(client);
-}
+cpu_notify_map_clients_unlocked();
 qemu_mutex_unlock(&map_client_list_lock);
 }
 
-- 
1.9.3




[Qemu-devel] [PATCH v3 2/4] exec: Protect map_client_list with mutex

2015-03-15 Thread Fam Zheng
So that accesses from multiple threads are safe.

Signed-off-by: Fam Zheng 
---
 exec.c | 24 +++-
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/exec.c b/exec.c
index 4080044..3e54580 100644
--- a/exec.c
+++ b/exec.c
@@ -429,15 +429,6 @@ address_space_translate_for_iotlb(CPUState *cpu, hwaddr 
addr,
 }
 #endif
 
-void cpu_exec_init_all(void)
-{
-#if !defined(CONFIG_USER_ONLY)
-qemu_mutex_init(&ram_list.mutex);
-memory_map_init();
-io_mem_init();
-#endif
-}
-
 #if !defined(CONFIG_USER_ONLY)
 
 static int cpu_common_post_load(void *opaque, int version_id)
@@ -2494,6 +2485,7 @@ typedef struct MapClient {
 QLIST_ENTRY(MapClient) link;
 } MapClient;
 
+QemuMutex map_client_list_lock;
 static QLIST_HEAD(map_client_list, MapClient) map_client_list
 = QLIST_HEAD_INITIALIZER(map_client_list);
 
@@ -2501,12 +2493,24 @@ void *cpu_register_map_client(void *opaque, void 
(*callback)(void *opaque))
 {
 MapClient *client = g_malloc(sizeof(*client));
 
+qemu_mutex_lock(&map_client_list_lock);
 client->opaque = opaque;
 client->callback = callback;
 QLIST_INSERT_HEAD(&map_client_list, client, link);
+qemu_mutex_unlock(&map_client_list_lock);
 return client;
 }
 
+void cpu_exec_init_all(void)
+{
+#if !defined(CONFIG_USER_ONLY)
+qemu_mutex_init(&ram_list.mutex);
+memory_map_init();
+io_mem_init();
+#endif
+qemu_mutex_init(&map_client_list_lock);
+}
+
 static void cpu_unregister_map_client(void *_client)
 {
 MapClient *client = (MapClient *)_client;
@@ -2519,11 +2523,13 @@ static void cpu_notify_map_clients(void)
 {
 MapClient *client;
 
+qemu_mutex_lock(&map_client_list_lock);
 while (!QLIST_EMPTY(&map_client_list)) {
 client = QLIST_FIRST(&map_client_list);
 client->callback(client->opaque);
 cpu_unregister_map_client(client);
 }
+qemu_mutex_unlock(&map_client_list_lock);
 }
 
 bool address_space_access_valid(AddressSpace *as, hwaddr addr, int len, bool 
is_write)
-- 
1.9.3




[Qemu-devel] [PATCH v3 0/4] exec: Make bounce buffer thread safe

2015-03-15 Thread Fam Zheng
v3: Address Paolo's comments:
Use atomic_xchg for bounce buffer.
Use mutex and BH for map_client_list.

The global bounce buffer used for non-direct memory access is not thread-safe:

 1) Access to "bounce" is not atomic.

 2) Access to "map_client_list" is not atomic.

 3) In dma_blk_cb, there is a race condition between:

mem = dma_memory_map(...
and
cpu_register_map_client(...

Bounce may become available after dma_memory_map failed but before
cpu_register_map_client is called.

 4) The reschedule_dma is not in the right AioContext;
continue_after_map_failure called from other threads will race with
dma_aio_cancel.

This series fixes these issues respectively.


Fam Zheng (4):
  exec: Atomic access to bounce buffer
  exec: Protect map_client_list with mutex
  exec: Notify cpu_register_map_client caller if the bounce buffer is
available
  dma-helpers: Fix race condition of continue_after_map_failure and
dma_aio_cancel

 dma-helpers.c | 17 +--
 exec.c| 77 ---
 include/exec/cpu-common.h |  3 +-
 3 files changed, 62 insertions(+), 35 deletions(-)

-- 
1.9.3




[Qemu-devel] [PATCH v3 1/4] exec: Atomic access to bounce buffer

2015-03-15 Thread Fam Zheng
There could be a race condition when two processes call
address_space_map concurrently and both want to use the bounce buffer.

Add an in_use flag in BounceBuffer to sync it.

Signed-off-by: Fam Zheng 
---
 exec.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/exec.c b/exec.c
index e97071a..4080044 100644
--- a/exec.c
+++ b/exec.c
@@ -2483,6 +2483,7 @@ typedef struct {
 void *buffer;
 hwaddr addr;
 hwaddr len;
+bool in_use;
 } BounceBuffer;
 
 static BounceBuffer bounce;
@@ -2571,9 +2572,10 @@ void *address_space_map(AddressSpace *as,
 l = len;
 mr = address_space_translate(as, addr, &xlat, &l, is_write);
 if (!memory_access_is_direct(mr, is_write)) {
-if (bounce.buffer) {
+if (atomic_xchg(&bounce.in_use, true)) {
 return NULL;
 }
+smp_mb();
 /* Avoid unbounded allocations */
 l = MIN(l, TARGET_PAGE_SIZE);
 bounce.buffer = qemu_memalign(TARGET_PAGE_SIZE, l);
@@ -2641,6 +2643,7 @@ void address_space_unmap(AddressSpace *as, void *buffer, 
hwaddr len,
 qemu_vfree(bounce.buffer);
 bounce.buffer = NULL;
 memory_region_unref(bounce.mr);
+atomic_mb_set(&bounce.in_use, false);
 cpu_notify_map_clients();
 }
 
-- 
1.9.3




[Qemu-devel] [PATCH v3 4/4] dma-helpers: Fix race condition of continue_after_map_failure and dma_aio_cancel

2015-03-15 Thread Fam Zheng
If DMA's owning thread cancels the IO while the bounce buffer's owning thread
is notifying the "cpu client list", a use-after-free happens:

 continue_after_map_failure   dma_aio_cancel
 --
 aio_bh_new
  qemu_bh_delete
 qemu_bh_schedule (use after free)

Also, the old code doesn't run the bh in the right AioContext.

Fix both problems by passing a QEMUBH to cpu_register_map_client.

Signed-off-by: Fam Zheng 
---
 dma-helpers.c | 17 -
 exec.c| 32 +---
 include/exec/cpu-common.h |  3 ++-
 3 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/dma-helpers.c b/dma-helpers.c
index 6918572..1fddf6a 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -92,14 +92,6 @@ static void reschedule_dma(void *opaque)
 dma_blk_cb(dbs, 0);
 }
 
-static void continue_after_map_failure(void *opaque)
-{
-DMAAIOCB *dbs = (DMAAIOCB *)opaque;
-
-dbs->bh = qemu_bh_new(reschedule_dma, dbs);
-qemu_bh_schedule(dbs->bh);
-}
-
 static void dma_blk_unmap(DMAAIOCB *dbs)
 {
 int i;
@@ -161,7 +153,9 @@ static void dma_blk_cb(void *opaque, int ret)
 
 if (dbs->iov.size == 0) {
 trace_dma_map_wait(dbs);
-cpu_register_map_client(dbs, continue_after_map_failure);
+dbs->bh = aio_bh_new(blk_get_aio_context(dbs->blk),
+ reschedule_dma, dbs);
+cpu_register_map_client(dbs->bh);
 return;
 }
 
@@ -183,6 +177,11 @@ static void dma_aio_cancel(BlockAIOCB *acb)
 if (dbs->acb) {
 blk_aio_cancel_async(dbs->acb);
 }
+if (dbs->bh) {
+cpu_unregister_map_client(dbs->bh);
+qemu_bh_delete(dbs->bh);
+dbs->bh = NULL;
+}
 }
 
 
diff --git a/exec.c b/exec.c
index 20381a0..b15ca5e 100644
--- a/exec.c
+++ b/exec.c
@@ -2480,8 +2480,7 @@ typedef struct {
 static BounceBuffer bounce;
 
 typedef struct MapClient {
-void *opaque;
-void (*callback)(void *opaque);
+QEMUBH *bh;
 QLIST_ENTRY(MapClient) link;
 } MapClient;
 
@@ -2489,30 +2488,29 @@ QemuMutex map_client_list_lock;
 static QLIST_HEAD(map_client_list, MapClient) map_client_list
 = QLIST_HEAD_INITIALIZER(map_client_list);
 
+static void cpu_unregister_map_client_do(MapClient *client);
 static void cpu_notify_map_clients_unlocked(void)
 {
 MapClient *client;
 
 while (!QLIST_EMPTY(&map_client_list)) {
 client = QLIST_FIRST(&map_client_list);
-client->callback(client->opaque);
-cpu_unregister_map_client(client);
+qemu_bh_schedule(client->bh);
+cpu_unregister_map_client_do(client);
 }
 }
 
-void *cpu_register_map_client(void *opaque, void (*callback)(void *opaque))
+void cpu_register_map_client(QEMUBH *bh)
 {
 MapClient *client = g_malloc(sizeof(*client));
 
 qemu_mutex_lock(&map_client_list_lock);
-client->opaque = opaque;
-client->callback = callback;
+client->bh = bh;
 QLIST_INSERT_HEAD(&map_client_list, client, link);
 if (!atomic_read(&bounce.in_use)) {
 cpu_notify_map_clients_unlocked();
 }
 qemu_mutex_unlock(&map_client_list_lock);
-return client;
 }
 
 void cpu_exec_init_all(void)
@@ -2525,14 +2523,26 @@ void cpu_exec_init_all(void)
 qemu_mutex_init(&map_client_list_lock);
 }
 
-static void cpu_unregister_map_client(void *_client)
+static void cpu_unregister_map_client_do(MapClient *client)
 {
-MapClient *client = (MapClient *)_client;
-
 QLIST_REMOVE(client, link);
 g_free(client);
 }
 
+void cpu_unregister_map_client(QEMUBH *bh)
+{
+MapClient *client;
+
+qemu_mutex_lock(&map_client_list_lock);
+QLIST_FOREACH(client, &map_client_list, link) {
+if (client->bh == bh) {
+cpu_unregister_map_client_do(client);
+break;
+}
+}
+qemu_mutex_unlock(&map_client_list_lock);
+}
+
 static void cpu_notify_map_clients(void)
 {
 qemu_mutex_lock(&map_client_list_lock);
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index fcc3162..43428bd 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -82,7 +82,8 @@ void *cpu_physical_memory_map(hwaddr addr,
   int is_write);
 void cpu_physical_memory_unmap(void *buffer, hwaddr len,
int is_write, hwaddr access_len);
-void *cpu_register_map_client(void *opaque, void (*callback)(void *opaque));
+void cpu_register_map_client(QEMUBH *bh);
+void cpu_unregister_map_client(QEMUBH *bh);
 
 bool cpu_physical_memory_is_io(hwaddr phys_addr);
 
-- 
1.9.3




Re: [Qemu-devel] [PATCH v3] vl: fix resource leak with monitor_fdset_add_fd

2015-03-15 Thread Fam Zheng
On Sun, 03/15 10:16, Paolo Bonzini wrote:
> monitor_fdset_add_fd returns an AddfdInfo struct (used by the QMP
> command add_fd).  Free it.
> 
> Signed-off-by: Paolo Bonzini 

Reviewed-by: Fam Zheng 

> ---
>   v1->v2: line length [Fam], pass &error_abort [Shannon]
> v2->v3: use "!!" instead of "? true : false" [Markus]
> ---
>  vl.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/vl.c b/vl.c
> index eba5d4c..5985680 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -1011,6 +1011,7 @@ static int parse_add_fd(QemuOpts *opts, void *opaque)
>  int fd, dupfd, flags;
>  int64_t fdset_id;
>  const char *fd_opaque = NULL;
> +AddfdInfo *fdinfo;
>  
>  fd = qemu_opt_get_number(opts, "fd", -1);
>  fdset_id = qemu_opt_get_number(opts, "set", -1);
> @@ -1060,8 +1061,9 @@ static int parse_add_fd(QemuOpts *opts, void *opaque)
>  }
>  
>  /* add the duplicate fd, and optionally the opaque string, to the fd set 
> */
> -monitor_fdset_add_fd(dupfd, true, fdset_id, fd_opaque ? true : false,
> - fd_opaque, NULL);
> +fdinfo = monitor_fdset_add_fd(dupfd, true, fdset_id, !!fd_opaque, 
> fd_opaque,
> +  &error_abort);
> +g_free(fdinfo);
>  
>  return 0;
>  }
> -- 
> 2.3.0
> 
> 



Re: [Qemu-devel] [RFC PATCH 01/14] docs: block replication's description

2015-03-15 Thread Wen Congyang
On 03/13/2015 05:05 PM, Fam Zheng wrote:
> On Fri, 03/13 17:01, Wen Congyang wrote:
>> On 03/11/2015 02:49 PM, Fam Zheng wrote:
>>> On Wed, 03/11 14:44, Wen Congyang wrote:
 On 03/03/2015 03:59 PM, Fam Zheng wrote:
> On Tue, 03/03 15:53, Wen Congyang wrote:
>> I test qcow2_make_empty()'s performance. The result shows that it may
>> take about 100ms(normal sata disk). It is not acceptable for COLO. So
>> I think disk buff is necessary(just use it to replace qcow2).
>
> Why not tmpfs or ramdisk?

 Another problem:
 After failover, secondary write request will be written in (active disk)?
 It is better to write request to (nbd target). Is there any feature can
 be reused to implement it?
>>>
>>> You can use block commit or stream to move the data.
>>
>> Can the job stream move the data? I don't find the write ops in 
>> block/stream.c.
> 
> It is bdrv_co_copy_on_readv that moves data.

Does the job stream move the data from base to top?

Thanks
Wen Congyang

> 
> Fam
> .
> 




Re: [Qemu-devel] [PATCH] fsdev/virtfs-proxy-helper: Fix improper use of negative value

2015-03-15 Thread Aneesh Kumar K.V
Shannon Zhao  writes:

> It's detected by coverity. Check the return value of proxy_marshal.
>
> Signed-off-by: Shannon Zhao 
> Signed-off-by: Shannon Zhao 

Applied

> ---
>  fsdev/virtfs-proxy-helper.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/fsdev/virtfs-proxy-helper.c b/fsdev/virtfs-proxy-helper.c
> index c1da2d7..bf2e5f3 100644
> --- a/fsdev/virtfs-proxy-helper.c
> +++ b/fsdev/virtfs-proxy-helper.c
> @@ -262,6 +262,9 @@ static int send_status(int sockfd, struct iovec *iovec, 
> int status)
>   */
>  msg_size = proxy_marshal(iovec, 0, "ddd", header.type,
>   header.size, status);
> +if (msg_size < 0) {
> +return msg_size;
> +}
>  retval = socket_write(sockfd, iovec->iov_base, msg_size);
>  if (retval < 0) {
>  return retval;
> -- 
> 1.8.3.1




Re: [Qemu-devel] [PATCH v2] fsdev/virtfs-proxy-helper: Fix possible overflow

2015-03-15 Thread Aneesh Kumar K.V
Shannon Zhao  writes:

> It's detected by coverity. As max of sockaddr_un.sun_path is
> sizeof(helper.sun_path), should check the length of source
> and use strncpy instead of strcpy.

updated such that,

The socket name specified should fit in the sockadd_un.sun_path. If not
abort.

with that applied.

>
> Signed-off-by: Shannon Zhao 
> Signed-off-by: Shannon Zhao 
> ---
> v1->v2: Still use strcpy [Paolo]
> ---
>  fsdev/virtfs-proxy-helper.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/fsdev/virtfs-proxy-helper.c b/fsdev/virtfs-proxy-helper.c
> index bf2e5f3..13fe032 100644
> --- a/fsdev/virtfs-proxy-helper.c
> +++ b/fsdev/virtfs-proxy-helper.c
> @@ -738,6 +738,7 @@ static int proxy_socket(const char *path, uid_t uid, 
> gid_t gid)
>  return -1;
>  }
>  
> +g_assert(strlen(path) < sizeof(proxy.sun_path));
>  sock = socket(AF_UNIX, SOCK_STREAM, 0);
>  if (sock < 0) {
>  do_perror("socket");
> -- 
> 1.8.3.1




Re: [Qemu-devel] [PATCH v5 18/45] MIG_CMD_PACKAGED: Send a packaged chunk of migration stream

2015-03-15 Thread David Gibson
On Fri, Mar 13, 2015 at 11:51:42AM +, Dr. David Alan Gilbert wrote:
> * David Gibson (da...@gibson.dropbear.id.au) wrote:
> > On Wed, Feb 25, 2015 at 04:51:41PM +, Dr. David Alan Gilbert (git) 
> > wrote:
> > > From: "Dr. David Alan Gilbert" 
> > > 
> > > MIG_CMD_PACKAGED is a migration command that allows a chunk
> > > of migration stream to be sent in one go, and be received by
> > > a separate instance of the loadvm loop while not interacting
> > > with the migration stream.
> > 
> > Hrm.  I'd be more comfortable if the semantics of CMD_PACKAGED were
> > defined in terms of visible effects on the other end, rather than in
> > terms of how it's implemented internally.
> > 
> > > This is used by postcopy to load device state (from the package)
> > > while loading memory pages from the main stream.
> > 
> > Which makes the above paragraph a bit misleading - the whole point
> > here is that loading the package data *does* interact with the
> > migration stream - just that it's the migration stream after the end
> > of the package.
> 
> Hmm, how about:
> 
> 
> MIG_CMD_PACKAGED is a migration command that wraps a chunk of migration
> stream inside a package whose length can be determined purely by reading
> its header.  The destination guarantees that the whole MIG_CMD_PACKAGED is
> read off the stream prior to parsing the contents.
> 
> This is used by postcopy to load device state (from the package)
> while leaving the main stream free to receive memory pages.

It's an improvement.

I'm still a bit concerned that the semantics of CMD_PACKAGED are
unhealthily bound up with hw the current implementation works.

[snip]
> > > +/* Immediately following this command is a blob of data containing an 
> > > embedded
> > > + * chunk of migration stream; read it and load it.
> > > + */
> > > +static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis,
> > > +  uint32_t length)
> > > +{
> > > +int ret;
> > > +uint8_t *buffer;
> > > +QEMUSizedBuffer *qsb;
> > > +
> > > +trace_loadvm_handle_cmd_packaged(length);
> > > +
> > > +if (length > MAX_VM_CMD_PACKAGED_SIZE) {
> > > +error_report("Unreasonably large packaged state: %u", length);
> > > +return -1;
> > 
> > It would be a good idea to check this on the send side as well as
> > receive, wouldn't it?
> 
> Yes, I had been doing that in the postcopy code that called the
> send code; but I've now moved it down into savevm_send_packaged.

Good, better symmetry that way.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgp9J4PsMG8l3.pgp
Description: PGP signature


Re: [Qemu-devel] [PATCH v5 26/45] Postcopy page-map-incoming (PMI) structure

2015-03-15 Thread David Gibson
On Fri, Mar 13, 2015 at 01:47:53PM +, Dr. David Alan Gilbert wrote:
> * David Gibson (da...@gibson.dropbear.id.au) wrote:
> > On Wed, Feb 25, 2015 at 04:51:49PM +, Dr. David Alan Gilbert (git) 
> > wrote:
> > > From: "Dr. David Alan Gilbert" 
[snip]
> > > +mis->postcopy_pmi.state0[BIT_WORD(bitmap_index)] |= shifted_mask;
> > > +} else {
> > > +mis->postcopy_pmi.state0[BIT_WORD(bitmap_index)] &= 
> > > ~shifted_mask;
> > > +}
> > > +if (state & 2) {
> > > +mis->postcopy_pmi.state1[BIT_WORD(bitmap_index)] |= shifted_mask;
> > > +} else {
> > > +mis->postcopy_pmi.state1[BIT_WORD(bitmap_index)] &= 
> > > ~shifted_mask;
> > > +}
> > > +}
> > > +
> > > +/*
> > > + * Retrieve the state of the given page
> > > + * Note: This version for use by callers already holding the lock
> > > + */
> > > +static PostcopyPMIState postcopy_pmi_get_state_nolock(
> > > +MigrationIncomingState *mis,
> > > +size_t bitmap_index)
> > > +{
> > > +bool b0, b1;
> > > +
> > > +b0 = test_hpbits(mis, bitmap_index, mis->postcopy_pmi.state0);
> > > +b1 = test_hpbits(mis, bitmap_index, mis->postcopy_pmi.state1);
> > > +
> > > +return (b0 ? 1 : 0) + (b1 ? 2 : 0);
> > 
> > Ugh.. this is a hidden dependency on the PostcopyPMIState enum
> > elements never changing value.  Safer to code it as:
> >   if (!b0 && !b1) {
> >   return POSTCOPY_PMI_MISSING;
> >   } else if (...)
> >...
> > 
> > and let gcc sort it out.
> 
> Again, I was trying to make this just the interface; so it doesn't
> know or care about the enum mapping; we can change the enum mapping to
> the bits without changing this function (or the callers) at all.

So.. I'm not entirely clear what you mean by that.  I think what
you're saying is that this function basically returns an arbitrary bit
pattern derived from the state maps, and the enum provides the mapping
from those bit patterns to meaningful states?

That's.. subtle :/.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgpMAWFe4b6uN.pgp
Description: PGP signature


Re: [Qemu-devel] [PATCH v5 22/45] postcopy: OS support test

2015-03-15 Thread David Gibson
On Fri, Mar 13, 2015 at 10:41:53AM +, Dr. David Alan Gilbert wrote:
> * David Gibson (da...@gibson.dropbear.id.au) wrote:
> > On Wed, Feb 25, 2015 at 04:51:45PM +, Dr. David Alan Gilbert (git) 
> > wrote:
> > > From: "Dr. David Alan Gilbert" 
> > > 
> > > Provide a check to see if the OS we're running on has all the bits
> > > needed for postcopy.
> > > 
> > > Creates postcopy-ram.c which will get most of the other helpers we need.
> > > 
> > > Signed-off-by: Dr. David Alan Gilbert 
> > > ---
> > >  include/migration/postcopy-ram.h |  19 +
> > >  migration/Makefile.objs  |   2 +-
> > >  migration/postcopy-ram.c | 161 
> > > +++
> > >  savevm.c |   5 ++
> > >  4 files changed, 186 insertions(+), 1 deletion(-)
> > >  create mode 100644 include/migration/postcopy-ram.h
> > >  create mode 100644 migration/postcopy-ram.c
> > > 
> > > diff --git a/include/migration/postcopy-ram.h 
> > > b/include/migration/postcopy-ram.h
> > > new file mode 100644
> > > index 000..d81934f
> > > --- /dev/null
> > > +++ b/include/migration/postcopy-ram.h
> > > @@ -0,0 +1,19 @@
> > > +/*
> > > + * Postcopy migration for RAM
> > > + *
> > > + * Copyright 2013 Red Hat, Inc. and/or its affiliates
> > > + *
> > > + * Authors:
> > > + *  Dave Gilbert  
> > > + *
> > > + * This work is licensed under the terms of the GNU GPL, version 2 or 
> > > later.
> > > + * See the COPYING file in the top-level directory.
> > > + *
> > > + */
> > > +#ifndef QEMU_POSTCOPY_RAM_H
> > > +#define QEMU_POSTCOPY_RAM_H
> > > +
> > > +/* Return true if the host supports everything we need to do 
> > > postcopy-ram */
> > > +bool postcopy_ram_supported_by_host(void);
> > > +
> > > +#endif
> > > diff --git a/migration/Makefile.objs b/migration/Makefile.objs
> > > index d929e96..0cac6d7 100644
> > > --- a/migration/Makefile.objs
> > > +++ b/migration/Makefile.objs
> > > @@ -1,7 +1,7 @@
> > >  common-obj-y += migration.o tcp.o
> > >  common-obj-y += vmstate.o
> > >  common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o 
> > > qemu-file-stdio.o
> > > -common-obj-y += xbzrle.o
> > > +common-obj-y += xbzrle.o postcopy-ram.o
> > >  
> > >  common-obj-$(CONFIG_RDMA) += rdma.o
> > >  common-obj-$(CONFIG_POSIX) += exec.o unix.o fd.o
> > > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > > new file mode 100644
> > > index 000..a0e20b2
> > > --- /dev/null
> > > +++ b/migration/postcopy-ram.c
> > > @@ -0,0 +1,161 @@
> > > +/*
> > > + * Postcopy migration for RAM
> > > + *
> > > + * Copyright 2013-2014 Red Hat, Inc. and/or its affiliates
> > > + *
> > > + * Authors:
> > > + *  Dave Gilbert  
> > > + *
> > > + * This work is licensed under the terms of the GNU GPL, version 2 or 
> > > later.
> > > + * See the COPYING file in the top-level directory.
> > > + *
> > > + */
> > > +
> > > +/*
> > > + * Postcopy is a migration technique where the execution flips from the
> > > + * source to the destination before all the data has been copied.
> > > + */
> > > +
> > > +#include 
> > > +#include 
> > > +#include 
> > > +
> > > +#include "qemu-common.h"
> > > +#include "migration/migration.h"
> > > +#include "migration/postcopy-ram.h"
> > > +#include "sysemu/sysemu.h"
> > > +#include "qemu/error-report.h"
> > > +#include "trace.h"
> > > +
> > > +/* Postcopy needs to detect accesses to pages that haven't yet been 
> > > copied
> > > + * across, and efficiently map new pages in, the techniques for doing 
> > > this
> > > + * are target OS specific.
> > > + */
> > > +#if defined(__linux__)
> > > +
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include  /* for __u64 */
> > > +#include 
> > > +
> > > +#ifdef HOST_X86_64
> > > +#ifndef __NR_userfaultfd
> > > +#define __NR_userfaultfd 323
> > 
> > Sholdn't this come from the kernel headers imported in the previous
> > patch?  Rather than having an arch-specific hack.
> 
> The header, like the rest of the kernel headers, just provides
> the constant and structure definitions for the call; the syscall numbers
> come from arch specific headers.  I guess in the final world I wouldn't
> need this at all since it'll come from the system headers; but what's
> the right way to put this in for new syscalls?

Uh.. yeah.. I'm not sure actually.  I was assuming the imported kernel
headers, because part of the reason they're there is for getting
defines that aren't in the system headers (e.g. ioctl numbers and
structures).  But without asm/unistd.h it's not clear to be how a new
system call should be handled.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgpdYqWoZgZBA.pgp
Description: PGP signature


Re: [Qemu-devel] [PATCH v5 20/45] Modify savevm handlers for postcopy

2015-03-15 Thread David Gibson
On Fri, Mar 13, 2015 at 10:19:54AM +, Dr. David Alan Gilbert wrote:
> * David Gibson (da...@gibson.dropbear.id.au) wrote:
> > On Wed, Feb 25, 2015 at 04:51:43PM +, Dr. David Alan Gilbert (git) 
> > wrote:
> > > From: "Dr. David Alan Gilbert" 
> > > 
> > > Modify save_live_pending to return separate postcopiable and
> > > non-postcopiable counts.
> > > 
> > > Add 'can_postcopy' to allow a device to state if it can postcopy
> > 
> > What's the purpose of the can_postcopy callback?  There are no callers
> > in this patch - is it still necessary with the change to
> > save_live_pending?
> 
> The patch 'qemu_savevm_state_complete: Postcopy changes' uses
> it in qemu_savevm_state_postcopy_complete and qemu_savevm_state_complete
> to decide which devices must be completed at that point.

Couldn't they check for non-zero postcopiable state from
save_live_pending instead?

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgpZ0hoMDSXIw.pgp
Description: PGP signature


Re: [Qemu-devel] [PATCH v5 23/45] migrate_start_postcopy: Command to trigger transition to postcopy

2015-03-15 Thread David Gibson
On Fri, Mar 13, 2015 at 11:19:06AM +, Dr. David Alan Gilbert wrote:
> * David Gibson (da...@gibson.dropbear.id.au) wrote:
> > On Wed, Feb 25, 2015 at 04:51:46PM +, Dr. David Alan Gilbert (git) 
> > wrote:
> > > From: "Dr. David Alan Gilbert" 
> > > 
> > > Once postcopy is enabled (with migrate_set_capability), the migration
> > > will still start on precopy mode.  To cause a transition into postcopy
> > > the:
> > > 
> > >   migrate_start_postcopy
> > > 
> > > command must be issued.  Postcopy will start sometime after this
> > > (when it's next checked in the migration loop).
> > > 
> > > Issuing the command before migration has started will error,
> > > and issuing after it has finished is ignored.
> > > 
> > > Signed-off-by: Dr. David Alan Gilbert 
> > > Reviewed-by: Eric Blake 
> > > ---
> > >  hmp-commands.hx   | 15 +++
> > >  hmp.c |  7 +++
> > >  hmp.h |  1 +
> > >  include/migration/migration.h |  3 +++
> > >  migration/migration.c | 22 ++
> > >  qapi-schema.json  |  8 
> > >  qmp-commands.hx   | 19 +++
> > >  7 files changed, 75 insertions(+)
> > > 
> > > diff --git a/hmp-commands.hx b/hmp-commands.hx
> > > index e37bc8b..03b8b78 100644
> > > --- a/hmp-commands.hx
> > > +++ b/hmp-commands.hx
> > > @@ -985,6 +985,21 @@ Enable/Disable the usage of a capability 
> > > @var{capability} for migration.
> > >  ETEXI
> > >  
> > >  {
> > > +.name   = "migrate_start_postcopy",
> > > +.args_type  = "",
> > > +.params = "",
> > > +.help   = "Switch migration to postcopy mode",
> > > +.mhandler.cmd = hmp_migrate_start_postcopy,
> > > +},
> > > +
> > > +STEXI
> > > +@item migrate_start_postcopy
> > > +@findex migrate_start_postcopy
> > > +Switch in-progress migration to postcopy mode. Ignored after the end of
> > > +migration (or once already in postcopy).
> > > +ETEXI
> > > +
> > > +{
> > >  .name   = "client_migrate_info",
> > >  .args_type  = 
> > > "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
> > >  .params = "protocol hostname port tls-port cert-subject",
> > > diff --git a/hmp.c b/hmp.c
> > > index b47f331..df9736c 100644
> > > --- a/hmp.c
> > > +++ b/hmp.c
> > > @@ -1140,6 +1140,13 @@ void hmp_migrate_set_capability(Monitor *mon, 
> > > const QDict *qdict)
> > >  }
> > >  }
> > >  
> > > +void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict)
> > > +{
> > > +Error *err = NULL;
> > > +qmp_migrate_start_postcopy(&err);
> > > +hmp_handle_error(mon, &err);
> > > +}
> > > +
> > >  void hmp_set_password(Monitor *mon, const QDict *qdict)
> > >  {
> > >  const char *protocol  = qdict_get_str(qdict, "protocol");
> > > diff --git a/hmp.h b/hmp.h
> > > index 4bb5dca..da1334f 100644
> > > --- a/hmp.h
> > > +++ b/hmp.h
> > > @@ -64,6 +64,7 @@ void hmp_migrate_set_downtime(Monitor *mon, const QDict 
> > > *qdict);
> > >  void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict);
> > >  void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict);
> > >  void hmp_migrate_set_cache_size(Monitor *mon, const QDict *qdict);
> > > +void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict);
> > >  void hmp_set_password(Monitor *mon, const QDict *qdict);
> > >  void hmp_expire_password(Monitor *mon, const QDict *qdict);
> > >  void hmp_eject(Monitor *mon, const QDict *qdict);
> > > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > > index e6a814a..293c83e 100644
> > > --- a/include/migration/migration.h
> > > +++ b/include/migration/migration.h
> > > @@ -104,6 +104,9 @@ struct MigrationState
> > >  int64_t xbzrle_cache_size;
> > >  int64_t setup_time;
> > >  int64_t dirty_sync_count;
> > > +
> > > +/* Flag set once the migration has been asked to enter postcopy */
> > > +bool start_postcopy;
> > >  };
> > >  
> > >  void process_incoming_migration(QEMUFile *f);
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index a4fc7d7..43ca656 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -372,6 +372,28 @@ void 
> > > qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
> > >  }
> > >  }
> > >  
> > > +void qmp_migrate_start_postcopy(Error **errp)
> > > +{
> > > +MigrationState *s = migrate_get_current();
> > > +
> > > +if (!migrate_postcopy_ram()) {
> > > +error_setg(errp, "Enable postcopy with migration_set_capability 
> > > before"
> > > + " the start of migration");
> > > +return;
> > > +}
> > > +
> > > +if (s->state == MIG_STATE_NONE) {
> > > +error_setg(errp, "Postcopy must be started after migration has 
> > > been"
> > > + " started");
> > > +return;
> > > +}
> > > +/*
> > > + * we do