Re: [Qemu-devel] [PATCH 2/2] target-mips: reimplement SC instruction and use cmpxchg

2016-09-27 Thread Leon Alrae
On Wed, Sep 21, 2016 at 01:16:28PM -0700, Richard Henderson wrote:
> On 09/21/2016 01:07 AM, Leon Alrae wrote:
> >+tcg_gen_brcond_tl(TCG_COND_EQ, addr, cpu_lladdr, l1);
> >+tcg_temp_free(addr);
> >+tcg_gen_movi_tl(t0, 0);
> >+tcg_gen_br(done);
> >+
> >+gen_set_label(l1);
> >+/* generate cmpxchg */
> >+val = tcg_temp_new();
> >+gen_load_gpr(val, rt);
> >+tcg_gen_atomic_cmpxchg_tl(t0, cpu_lladdr, cpu_llval, val,
> >+  ctx->mem_idx, tcg_mo);
> >+tcg_gen_setcond_tl(TCG_COND_EQ, t0, t0, cpu_llval);
> >+tcg_temp_free(val);
> >+
> >+gen_set_label(done);
> >+/* store the result into the register */
> >+gen_store_gpr(t0, rt);
> > tcg_temp_free(t0);
> 
> The only thing I would change is to duplicate the gen_store_gpr into
> both branches, so that we don't have to store t0 into the stack
> across the blocks.

Done in v3.

> 
> Otherwise,
> 
> Reviewed-by: Richard Henderson 

Thanks for reviewing.

Leon



Re: [Qemu-devel] [PATCH v3 07/10] ppc/pnv: add XSCOM infrastructure

2016-09-27 Thread Cédric Le Goater
On 09/27/2016 08:10 AM, Benjamin Herrenschmidt wrote:
> On Tue, 2016-09-27 at 07:54 +0200, Cédric Le Goater wrote:
>>
>>> but I guess if you have the decoding of those "core" registers 
>>> here as well, then that doesn't make so much sense.
> 
> Those core registers may well change with P9, we havne't looked closely
> yet...

Neither have I ... qemu/pnv reaches the kernel but this is really a 
"Frankenstein P9". The interrupt controller is missing. 

C. 

>> yes and there is also the handling of the XSCOM failures.
>>
>> I can add some prologue handler to cover those "core" registers
>> but adding a MemoryRegion, ops, init and mapping would be a lot 
>> of churn just to return 0.
>>
>> Thanks,




Re: [Qemu-devel] [PATCH] tests: Test IPv6 and ppc64 in the PXE tester

2016-09-27 Thread Thomas Huth
On 27.09.2016 06:17, David Gibson wrote:
> On Mon, Sep 26, 2016 at 10:17:46PM +0200, Thomas Huth wrote:
>> The firmware of the pseries machine, SLOF, is able to load files via
>> IPv6 networking, too. So to test both, network bootloading on ppc64
>> and IPv6 (via Slirp) , let's add some PXE tests for this environment,
>> too. Since we can not use the normal x86 boot sector for network boot
>> loading, we use a simple Forth script on ppc64 instead.
>>
>> Signed-off-by: Thomas Huth 
> 
> I certainly approve of testing IPv6 more, a couple of queries about
> the details though:
> 
>> ---
>>  tests/Makefile.include |  1 +
>>  tests/boot-sector.c|  9 +
>>  tests/pxe-test.c   | 22 +++---
>>  3 files changed, 25 insertions(+), 7 deletions(-)
>>
>> diff --git a/tests/Makefile.include b/tests/Makefile.include
>> index d8101b3..18bc698 100644
>> --- a/tests/Makefile.include
>> +++ b/tests/Makefile.include
>> @@ -270,6 +270,7 @@ check-qtest-ppc64-y += tests/drive_del-test$(EXESUF)
>>  check-qtest-ppc64-y += tests/postcopy-test$(EXESUF)
>>  check-qtest-ppc64-y += tests/boot-serial-test$(EXESUF)
>>  check-qtest-ppc64-y += tests/rtas-test$(EXESUF)
>> +check-qtest-ppc64-y += tests/pxe-test$(EXESUF)
>>  
>>  check-qtest-sh4-y = tests/endianness-test$(EXESUF)
>>  
>> diff --git a/tests/boot-sector.c b/tests/boot-sector.c
>> index 3ffe298..e3193c0 100644
>> --- a/tests/boot-sector.c
>> +++ b/tests/boot-sector.c
>> @@ -77,6 +77,15 @@ int boot_sector_init(const char *fname)
>>  fprintf(stderr, "Couldn't open \"%s\": %s", fname, strerror(errno));
>>  return 1;
>>  }
>> +
>> +/* For Open Firmware based system, we can use a Forth script instead */
>> +if (strcmp(qtest_get_arch(), "ppc64") == 0) {
> 
> As always, I'm uneasy about using arch based tests for what's really a
> machine type property.  Still, as a test case, I guess we can fix that
> when and if someone actually tries to run it for a ppc machine that's
> not spapr (or an x86 machine that's not pc, theoretically speaking).

As long as we don't have a fancy qtest_get_machine() function, I think
this is the best we can do right now. And since this code has to be
touched anyway when another machine type should be used to run the
boot_sector_init() function, I think it's OK to postpone this to this
later point in time.

>> +memset(boot_sector, ' ', sizeof boot_sector);
>> +sprintf((char *)boot_sector, "\\ Bootscript\n%x %x c! %x %x c!\n",
>> +LOW(SIGNATURE), BOOT_SECTOR_ADDRESS + SIGNATURE_OFFSET,
>> +HIGH(SIGNATURE), BOOT_SECTOR_ADDRESS + SIGNATURE_OFFSET + 
>> 1);
>> +}
>> +
>>  fwrite(boot_sector, 1, sizeof boot_sector, f);
>>  fclose(f);
>>  return 0;
>> diff --git a/tests/pxe-test.c b/tests/pxe-test.c
>> index b2cc355..0bdb7a1 100644
>> --- a/tests/pxe-test.c
>> +++ b/tests/pxe-test.c
>> @@ -21,14 +21,14 @@
>>  
>>  static const char *disk = "tests/pxe-test-disk.raw";
>>  
>> -static void test_pxe_one(const char *params)
>> +static void test_pxe_one(const char *params, bool ipv6)
> 
> Is it wise to keep the "PXE" name.  OF style netbooting isn't really
> PXE in the sense of the Intel PXE spec, although it overlaps in the
> underlying protocols used.

Strictly speaking, you're right. But the overlap from the networking
protocol point of view is 95%, I'd guess, basically you can say that:

 PXE = TFTP + DHCP + some few DHCP extensions

... and PXE also defines a x86 API which of course does not apply for ppc.

So in my experience, most people simply talk / know about PXE, but
rather mean network booting via DHCP + TFTP. So I'm fine with keeping
the pxe wording here, but if you like, I can also add another patch to
get rid of this (but then the whole file should also be renamed, I
guess? ... is this worth the effort here?)

>>  {
>>  char *args;
>>  
>> -args = g_strdup_printf("-machine accel=tcg "
>> -   "-netdev user,id=" NETNAME ",tftp=./,bootfile=%s 
>> "
>> -   "%s ",
>> -   disk, params);
>> +args = g_strdup_printf("-machine accel=tcg -boot order=n "
>> +   "-netdev user,id=" NETNAME 
>> ",tftp=./,bootfile=%s,"
>> +   "ipv4=%s,ipv6=%s %s", disk, ipv6 ? "off" : "on",
>> +   ipv6 ? "on" : "off", params);
>>  
>>  qtest_start(args);
>>  boot_sector_test();
>> @@ -38,12 +38,17 @@ static void test_pxe_one(const char *params)
>>  
>>  static void test_pxe_e1000(void)
>>  {
>> -test_pxe_one("-device e1000,netdev=" NETNAME);
>> +test_pxe_one("-device e1000,netdev=" NETNAME, false);
>>  }
>>  
>>  static void test_pxe_virtio_pci(void)
>>  {
>> -test_pxe_one("-device virtio-net-pci,netdev=" NETNAME);
>> +test_pxe_one("-device virtio-net-pci,netdev=" NETNAME, false);
>> +}
>> +
>> +static void test_pxe_spapr_vlan(void)
>> +{
>> +test_pxe_one("-vga none -device spapr-vlan,net

[Qemu-devel] [PATCH] hw/mips_cpc: kick a VP when putting it into Run state

2016-09-27 Thread Leon Alrae
While testing mttcg I noticed that VP0 gets stuck in a loop waiting
for other VPs to come up (which never actually happens). To fix this
kick VPs while they are being powered up by Cluster Power Controller.

Signed-off-by: Leon Alrae 
---
 hw/misc/mips_cpc.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/hw/misc/mips_cpc.c b/hw/misc/mips_cpc.c
index 6d34574..b3ff558 100644
--- a/hw/misc/mips_cpc.c
+++ b/hw/misc/mips_cpc.c
@@ -38,6 +38,7 @@ static void cpc_run_vp(MIPSCPCState *cpc, uint64_t vp_run)
 uint64_t i = 1ULL << cs->cpu_index;
 if (i & vp_run & ~cpc->vp_running) {
 cpu_reset(cs);
+qemu_cpu_kick(cs);
 cpc->vp_running |= i;
 }
 }
-- 
1.7.1




Re: [Qemu-devel] [PATCH 1/3] virtio: add virtio_detach_element()

2016-09-27 Thread Ladi Prosek
On Mon, Sep 19, 2016 at 3:28 PM, Stefan Hajnoczi  wrote:
> During device reset or similar situations a VirtQueueElement needs to be
> freed without pushing it onto the used ring or rewinding the virtqueue.
> Extract a new function to do this.
>
> Later patches add virtio_detach_element() calls to existing device so
> that scatter-gather lists are unmapped and vq->inuse goes back to zero
> during device reset.  Currently some devices don't bother and simply
> call g_free(elem) which is not a clean way to throw away a
> VirtQueueElement.
>
> Signed-off-by: Stefan Hajnoczi 
> ---
>  hw/virtio/virtio.c | 27 +--
>  include/hw/virtio/virtio.h |  2 ++
>  2 files changed, 27 insertions(+), 2 deletions(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index fcf3358..adcef45 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -264,12 +264,35 @@ static void virtqueue_unmap_sg(VirtQueue *vq, const 
> VirtQueueElement *elem,
>0, elem->out_sg[i].iov_len);
>  }
>
> +/* virtqueue_detach_element:
> + * @vq: The #VirtQueue
> + * @elem: The #VirtQueueElement
> + * @len: number of bytes written
> + *
> + * Detach the element from the virtqueue.  This function is suitable for 
> device
> + * reset or other situations where a #VirtQueueElement is simply freed and 
> will
> + * not be pushed or discarded.
> + */
> +void virtqueue_detach_element(VirtQueue *vq, const VirtQueueElement *elem,
> +  unsigned int len)
> +{
> +vq->inuse--;
> +virtqueue_unmap_sg(vq, elem, len);
> +}
> +
> +/* virtqueue_discard:
> + * @vq: The #VirtQueue
> + * @elem: The #VirtQueueElement
> + * @len: number of bytes written
> + *
> + * Pretend the most recent element wasn't popped from the virtqueue.  The 
> next
> + * call to virtqueue_pop() will refetch the element.
> + */
>  void virtqueue_discard(VirtQueue *vq, const VirtQueueElement *elem,
> unsigned int len)
>  {
>  vq->last_avail_idx--;
> -vq->inuse--;
> -virtqueue_unmap_sg(vq, elem, len);
> +virtqueue_detach_element(vq, elem, len);

Random comment, not directly related to this change. Would it be worth
adding an assert to this function that elem->index and
vq->last_avail_idx match? In other words, enforce the "most recent"
qualifier mentioned in the comment. As more virtqueue_* functions are
added and the complexity goes up, it is easy to get confused. Also, I
think that naming this function virtqueue_unpop instead of
virtqueue_discard would help.

>  }
>
>  /* virtqueue_rewind:
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index f05559d..ad1e2d6 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -152,6 +152,8 @@ void *virtqueue_alloc_element(size_t sz, unsigned 
> out_num, unsigned in_num);
>  void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
>  unsigned int len);
>  void virtqueue_flush(VirtQueue *vq, unsigned int count);
> +void virtqueue_detach_element(VirtQueue *vq, const VirtQueueElement *elem,
> +  unsigned int len);
>  void virtqueue_discard(VirtQueue *vq, const VirtQueueElement *elem,
> unsigned int len);
>  bool virtqueue_rewind(VirtQueue *vq, unsigned int num);
> --
> 2.7.4
>

Reviewed-by: Ladi Prosek 

Thanks!
Ladi



Re: [Qemu-devel] [PATCH] m68k: change default system clock for m5208evb

2016-09-27 Thread Thomas Huth
On 27.09.2016 03:29, Greg Ungerer wrote:
> The shipping default setting for the Freescale M5208EVB board is to run
> the CPU at 166MHz. The current qemu emulation code for this board is
> defaulting to 66MHz. This results in time appearing to run way to slowly.
> So a "sleep 5" in a standard ColdFire Linux build takes almost 15
> seconds in real time to actually complete.
> 
> Change the hard coded default to match the default hardware setting.
> 
> Signed-off-by: Greg Ungerer 
> ---
>  hw/m68k/mcf5208.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/m68k/mcf5208.c b/hw/m68k/mcf5208.c
> index 9240ebf..2d0b464 100644
> --- a/hw/m68k/mcf5208.c
> +++ b/hw/m68k/mcf5208.c
> @@ -21,7 +21,7 @@
>  #include "elf.h"
>  #include "exec/address-spaces.h"
>  
> -#define SYS_FREQ 6600
> +#define SYS_FREQ 16600

Good catch. But actually, the M5208EVB User's Manual talks about 166.67
MHz, so while you're at it, maybe you should change it to 1 instead?

 Thomas




Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/4] libqos: add PCI management in qtest_vboot()/qtest_shutdown()

2016-09-27 Thread Laurent Vivier


On 27/09/2016 05:48, David Gibson wrote:
> On Mon, Sep 26, 2016 at 04:10:47PM +0200, Laurent Vivier wrote:
>> Signed-off-by: Laurent Vivier 
>> ---
>>  tests/e1000e-test.c |  2 +-
>>  tests/i440fx-test.c |  2 +-
>>  tests/ide-test.c|  2 +-
>>  tests/ivshmem-test.c|  2 +-
>>  tests/libqos/ahci.c |  2 +-
>>  tests/libqos/libqos-pc.c|  5 -
>>  tests/libqos/libqos-spapr.c |  5 -
>>  tests/libqos/libqos.c   | 21 -
>>  tests/libqos/libqos.h   |  3 +++
>>  tests/libqos/pci-pc.c   |  2 +-
>>  tests/libqos/pci-pc.h   |  3 ++-
>>  tests/q35-test.c|  2 +-
>>  tests/rtl8139-test.c|  2 +-
>>  tests/tco-test.c|  2 +-
>>  tests/usb-hcd-ehci-test.c   |  2 +-
>>  tests/usb-hcd-uhci-test.c   |  2 +-
>>  tests/vhost-user-test.c |  2 +-
>>  tests/virtio-9p-test.c  |  2 +-
>>  tests/virtio-blk-test.c |  2 +-
>>  tests/virtio-net-test.c |  2 +-
>>  tests/virtio-scsi-test.c|  2 +-
>>  21 files changed, 45 insertions(+), 24 deletions(-)
> 
> Couple of queries below.
> 

...

>> @@ -49,9 +54,15 @@ QOSState *qtest_boot(QOSOps *ops, const char 
>> *cmdline_fmt, ...)
>>   */
>>  void qtest_shutdown(QOSState *qs)
>>  {
>> -if (qs->alloc && qs->ops && qs->ops->uninit_allocator) {
>> -qs->ops->uninit_allocator(qs->alloc);
>> -qs->alloc = NULL;
>> +if (qs->ops) {
>> +if (qs->alloc && qs->ops->uninit_allocator) {
>> +qs->ops->uninit_allocator(qs->alloc);
>> +qs->alloc = NULL;
>> +}
>> +if (qs->pcibus && qs->ops->qpci_free) {
>> +qs->ops->qpci_free(qs->pcibus);
>> +qs->pcibus = NULL;
>> +}
> 
> Is it safe to cleanup the allocator before the PCI stuff?  Usually
> cleanups want to go in the opposite order to initialization.

Yes, you're right. Im' going to fix that.

>>  }
>>  qtest_quit(qs->qts);
>>  g_free(qs);
>> diff --git a/tests/libqos/libqos.h b/tests/libqos/libqos.h
>> index 604980d..a9f6990 100644
>> --- a/tests/libqos/libqos.h
>> +++ b/tests/libqos/libqos.h
>> @@ -8,11 +8,14 @@
>>  typedef struct QOSOps {
>>  QGuestAllocator *(*init_allocator)(QAllocOpts);
>>  void (*uninit_allocator)(QGuestAllocator *);
>> +QPCIBus *(*qpci_init)(QGuestAllocator *alloc);
>> +void (*qpci_free)(QPCIBus *bus);
>>  } QOSOps;
>>  
>>  typedef struct QOSState {
>>  QTestState *qts;
>>  QGuestAllocator *alloc;
>> +QPCIBus *pcibus;
>>  QOSOps *ops;
>>  } QOSState;
>>  
>> diff --git a/tests/libqos/pci-pc.c b/tests/libqos/pci-pc.c
>> index 82066b8..9600ed6 100644
>> --- a/tests/libqos/pci-pc.c
>> +++ b/tests/libqos/pci-pc.c
>> @@ -212,7 +212,7 @@ static void qpci_pc_iounmap(QPCIBus *bus, void *data)
>>  /* FIXME */
>>  }
>>  
>> -QPCIBus *qpci_init_pc(void)
>> +QPCIBus *qpci_init_pc(QGuestAllocator *alloc)
>>  {
>>  QPCIBusPC *ret;
>>
> 
> You've added the alloc parameter, but you don't actually appear to use it..

It's normal: qpci_init_spapr() needs it and to have the same function
signature we have to add it to qpci_init_pc() even if it is not used.
(it's why I have added a lot of of qpci_init_pc(NULL)), so we can add
init in a generic way in "struct QOSOps". Perhaps we can use "void
*opaque" instead of "QGuestAllocator *alloc"?

Thanks,
Laurent



Re: [Qemu-devel] [PATCH] 9pfs: fix NULL pointer dereference in v9fs_version

2016-09-27 Thread Greg Kurz
On Mon, 26 Sep 2016 21:38:48 -0700
Li Qiang  wrote:

> From: Li Qiang 
> 
> In 9pfs get version dispatch function, a guest can provide a NULL
> version string thus causing an NULL pointer dereference issue.

The guest doesn't provide NULL, but an empty string actually.

> This patch fix this issue.
> 
> Signed-off-by: Li Qiang 
> ---
>  hw/9pfs/9p.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> index 119ee58..dd3145c 100644
> --- a/hw/9pfs/9p.c
> +++ b/hw/9pfs/9p.c
> @@ -955,6 +955,11 @@ static void v9fs_version(void *opaque)
>  offset = err;
>  goto out;
>  }
> +
> +if (!version.data) {
> +offset = -EINVAL;
> +goto out;
> +}

If a client sends a Tversion message with an unrecognized version (which is
obviously the case with an empty string), the server should send back a
Rversion response with 'unknown'... while with this patch it will send Rerror.

http://man.cat-v.org/plan_9/5/version

>  trace_v9fs_version(pdu->tag, pdu->id, s->msize, version.data);
>  
>  virtfs_reset(pdu);

All guests originated strings come from v9fs_iov_vunmarshal() actually, which
does this:

str->data = g_malloc(str->size + 1);
copied = v9fs_unpack(str->data, out_sg, out_num, offset,
 str->size);
if (copied > 0) {
str->data[str->size] = 0;
} else {
v9fs_string_free(str);
}

It makes sense to free the buffer when v9fs_unpack() fails, because the error is
propagated and the caller isn't supposed to derefence str->data in this case.
But it looks wrong to do the same with an empty string, which is expected to be
usable by the caller.

The fix is to check copied >= 0.

Cheers.

--
Greg



Re: [Qemu-devel] [Nbd] [PATCH] proto: add 'shift' extension.

2016-09-27 Thread Alex Bligh

> On 27 Sep 2016, at 00:41, Wouter Verhelst  wrote:
> 
> Thoughts?

Honestly? This whole thing seems like complication for little gain. One command 
per 2GB wiped doesn't seem like a bad thing.

-- 
Alex Bligh







Re: [Qemu-devel] [PATCH] 9pfs: make illegal path name detection more robust

2016-09-27 Thread Greg Kurz
On Mon, 26 Sep 2016 21:40:17 -0700
Li Qiang  wrote:

> From: Li Qiang 
> 
> The parameter of name_is_illegal can be NULL, adding detection of
> this to avoid NULL pointer dereference issue.
> 

Same as with the other patch: the root cause is in v9fs_iov_vunmarshal().

> Signed-off-by: Li Qiang 
> ---
>  hw/9pfs/9p.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> index dd3145c..d960a2e 100644
> --- a/hw/9pfs/9p.c
> +++ b/hw/9pfs/9p.c
> @@ -1277,7 +1277,7 @@ static int v9fs_walk_marshal(V9fsPDU *pdu, uint16_t 
> nwnames, V9fsQID *qids)
>  
>  static bool name_is_illegal(const char *name)
>  {
> -return !*name || strchr(name, '/') != NULL;
> +return !name || !*name || strchr(name, '/') != NULL;
>  }
>  
>  static bool not_same_qid(const V9fsQID *qid1, const V9fsQID *qid2)




Re: [Qemu-devel] [Qemu-ppc] [PATCH 4/4] tests: enable ohci/uhci/xhci tests on PPC64

2016-09-27 Thread Laurent Vivier


On 27/09/2016 05:53, David Gibson wrote:
> On Mon, Sep 26, 2016 at 04:10:49PM +0200, Laurent Vivier wrote:

>>  void qusb_pci_init_one(QPCIBus *pcibus, struct qhc *hc, uint32_t devfn, int 
>> bar)
>>  {
>>  hc->dev = qpci_device_find(pcibus, devfn);
>> @@ -31,6 +38,13 @@ void uhci_port_test(struct qhc *hc, int port, uint16_t 
>> expect)
>>  uint16_t value = qpci_io_readw(hc->dev, addr);
>>  uint16_t mask = ~(UHCI_PORT_WRITE_CLEAR | UHCI_PORT_RSVD1);
>>  
>> +if (qtest_big_endian() && host_big_endian) {
>> +/* little endian device on big endian guest
>> + * must be swapped on big endian host
>> + */
>> +value = bswap16(value);
>> +}
>> +
> 
> Hm.. should the qpci_io_*() helpers handle the endian conversion?

I'm really wondering how to manage correctly this case (I've the same
kind of issue with virtio).

The protocol between guest and test program reads/writes data using the
guest CPU endianess, so it works in the overall case. But in this case,
hcd-uhci is a little-endian device (.endianness = DEVICE_LITTLE_ENDIAN)
on a big endian machine, so I think in the linux driver we should have a
"le16_to_cpu()". But in our case we can't use "le16_to_cpu()" because
endianess of the host cpu is not the same has the one of the guest CPU.
Perhaps I should add a "target_le16_to_cpu()"?

Thanks,
Laurent



Re: [Qemu-devel] [PATCH 2/3] virtio-blk: add missing virtio_detach_element() call

2016-09-27 Thread Ladi Prosek
On Mon, Sep 19, 2016 at 3:28 PM, Stefan Hajnoczi  wrote:
> Make sure to unmap the scatter-gather list and decrement vq->inuse
> before freeing requests in virtio_blk_reset().
>
> Signed-off-by: Stefan Hajnoczi 
> ---
>  hw/block/virtio-blk.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index 3a6112f..c7ca4d6 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -665,6 +665,7 @@ static void virtio_blk_reset(VirtIODevice *vdev)
>  while (s->rq) {
>  req = s->rq;
>  s->rq = req->next;
> +virtqueue_detach_element(req->vq, &req->elem, 0);
>  virtio_blk_free_request(req);

Random observation. virtio_blk_free_request should be static and
removed from the header file. Or maybe removed altogether because
g_free takes NULL pointers just fine.

>  }
>
> --
> 2.7.4
>

Reviewed-by: Ladi Prosek 

Thanks!



Re: [Qemu-devel] [PULL 10/23] vl: Switch qemu_uuid to QemuUUID

2016-09-27 Thread Christian Borntraeger
On 09/23/2016 07:10 AM, Fam Zheng wrote:
> Update all qemu_uuid users as well, especially get rid of the duplicated
> low level g_strdup_printf, sscanf and snprintf calls with QEMU UUID API.
> 
> Since qemu_uuid_parse is quite tangled with qemu_uuid, its switching to
> QemuUUID is done here too to keep everything in sync and avoid code
> churn.
> 
> Signed-off-by: Fam Zheng 
> Reviewed-by: Eric Blake 
> Reviewed-by: Jeff Cody 
> Message-Id: <1474432046-325-10-git-send-email-f...@redhat.com>
> ---
>  hw/ipmi/ipmi_bmc_sim.c |  2 +-
>  hw/nvram/fw_cfg.c  |  2 +-
>  hw/ppc/spapr.c |  7 +--
>  hw/ppc/spapr_rtas.c|  3 ++-
>  hw/smbios/smbios.c | 12 ++--
>  hw/xenpv/xen_domainbuild.c |  6 +-
>  include/qemu/uuid.h|  2 +-
>  include/sysemu/sysemu.h|  3 ++-
>  qmp.c  | 10 ++
>  ui/spice-core.c|  2 +-
>  util/uuid.c| 11 ++-
>  vl.c   |  6 +++---
>  12 files changed, 27 insertions(+), 39 deletions(-)
> 

This broke s390/kvm


make: Entering directory '/home/cborntra/REPOS/qemu/build'
  CCs390x-softmmu/target-s390x/kvm.o
/home/cborntra/REPOS/qemu/target-s390x/kvm.c: In function ‘insert_stsi_3_2_2’:
/home/cborntra/REPOS/qemu/target-s390x/kvm.c:2002:30: error: incompatible type 
for argument 2 of ‘memcpy’
 memcpy(sysib.vm[0].uuid, qemu_uuid, sizeof(sysib.vm[0].uuid));
  ^
In file included from /usr/include/features.h:365:0,
 from /usr/include/stdint.h:25,
 from /usr/lib/gcc/s390x-redhat-linux/5.3.1/include/stdint.h:9,
 from /home/cborntra/REPOS/qemu/include/qemu/osdep.h:65,
 from /home/cborntra/REPOS/qemu/target-s390x/kvm.c:24:
/usr/include/bits/string3.h:50:1: note: expected ‘const void * restrict’ but 
argument is of type ‘QemuUUID {aka struct }’
 __NTH (memcpy (void *__restrict __dest, const void *__restrict __src,
 ^
/home/cborntra/REPOS/qemu/rules.mak:60: recipe for target 'target-s390x/kvm.o' 
failed




Will send a fix.




Re: [Qemu-devel] [PATCH 2/3] virtio-blk: add missing virtio_detach_element() call

2016-09-27 Thread Greg Kurz
On Tue, 27 Sep 2016 09:49:17 +0200
Ladi Prosek  wrote:

> On Mon, Sep 19, 2016 at 3:28 PM, Stefan Hajnoczi  wrote:
> > Make sure to unmap the scatter-gather list and decrement vq->inuse
> > before freeing requests in virtio_blk_reset().
> >
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  hw/block/virtio-blk.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> > index 3a6112f..c7ca4d6 100644
> > --- a/hw/block/virtio-blk.c
> > +++ b/hw/block/virtio-blk.c
> > @@ -665,6 +665,7 @@ static void virtio_blk_reset(VirtIODevice *vdev)
> >  while (s->rq) {
> >  req = s->rq;
> >  s->rq = req->next;
> > +virtqueue_detach_element(req->vq, &req->elem, 0);
> >  virtio_blk_free_request(req);  
> 
> Random observation. virtio_blk_free_request should be static and
> removed from the header file. 

I've sent a followup patch for this:

<147487884587.6679.6170297932839464278.stgit@bahia>

> Or maybe removed altogether because g_free takes NULL pointers just fine.
> 

virtio_blk_free_request() does not seem useful indeed... :)

Cheers.

--
Greg

> >  }
> >
> > --
> > 2.7.4
> >  
> 
> Reviewed-by: Ladi Prosek 
> 
> Thanks!
> 




Re: [Qemu-devel] [PATCH] m68k: change default system clock for m5208evb

2016-09-27 Thread Laurent Vivier


Le 27/09/2016 à 09:33, Thomas Huth a écrit :
> On 27.09.2016 03:29, Greg Ungerer wrote:
>> The shipping default setting for the Freescale M5208EVB board is to run
>> the CPU at 166MHz. The current qemu emulation code for this board is
>> defaulting to 66MHz. This results in time appearing to run way to slowly.
>> So a "sleep 5" in a standard ColdFire Linux build takes almost 15
>> seconds in real time to actually complete.
>>
>> Change the hard coded default to match the default hardware setting.
>>
>> Signed-off-by: Greg Ungerer 
>> ---
>>  hw/m68k/mcf5208.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/hw/m68k/mcf5208.c b/hw/m68k/mcf5208.c
>> index 9240ebf..2d0b464 100644
>> --- a/hw/m68k/mcf5208.c
>> +++ b/hw/m68k/mcf5208.c
>> @@ -21,7 +21,7 @@
>>  #include "elf.h"
>>  #include "exec/address-spaces.h"
>>  
>> -#define SYS_FREQ 6600
>> +#define SYS_FREQ 16600
> 
> Good catch. But actually, the M5208EVB User's Manual talks about 166.67
> MHz, so while you're at it, maybe you should change it to 1 instead?

In this case, it should be better to use a period of 600 ns (and
ptimer_set_period() instead of ptimer_set_freq()).

Laurent



Re: [Qemu-devel] [PULL 10/23] vl: Switch qemu_uuid to QemuUUID

2016-09-27 Thread Fam Zheng
On Tue, 09/27 10:05, Christian Borntraeger wrote:
> On 09/23/2016 07:10 AM, Fam Zheng wrote:
> > Update all qemu_uuid users as well, especially get rid of the duplicated
> > low level g_strdup_printf, sscanf and snprintf calls with QEMU UUID API.
> > 
> > Since qemu_uuid_parse is quite tangled with qemu_uuid, its switching to
> > QemuUUID is done here too to keep everything in sync and avoid code
> > churn.
> > 
> > Signed-off-by: Fam Zheng 
> > Reviewed-by: Eric Blake 
> > Reviewed-by: Jeff Cody 
> > Message-Id: <1474432046-325-10-git-send-email-f...@redhat.com>
> > ---
> >  hw/ipmi/ipmi_bmc_sim.c |  2 +-
> >  hw/nvram/fw_cfg.c  |  2 +-
> >  hw/ppc/spapr.c |  7 +--
> >  hw/ppc/spapr_rtas.c|  3 ++-
> >  hw/smbios/smbios.c | 12 ++--
> >  hw/xenpv/xen_domainbuild.c |  6 +-
> >  include/qemu/uuid.h|  2 +-
> >  include/sysemu/sysemu.h|  3 ++-
> >  qmp.c  | 10 ++
> >  ui/spice-core.c|  2 +-
> >  util/uuid.c| 11 ++-
> >  vl.c   |  6 +++---
> >  12 files changed, 27 insertions(+), 39 deletions(-)
> > 
> 
> This broke s390/kvm
> 
> 
> make: Entering directory '/home/cborntra/REPOS/qemu/build'
>   CCs390x-softmmu/target-s390x/kvm.o
> /home/cborntra/REPOS/qemu/target-s390x/kvm.c: In function ‘insert_stsi_3_2_2’:
> /home/cborntra/REPOS/qemu/target-s390x/kvm.c:2002:30: error: incompatible 
> type for argument 2 of ‘memcpy’
>  memcpy(sysib.vm[0].uuid, qemu_uuid, sizeof(sysib.vm[0].uuid));
>   ^
> In file included from /usr/include/features.h:365:0,
>  from /usr/include/stdint.h:25,
>  from 
> /usr/lib/gcc/s390x-redhat-linux/5.3.1/include/stdint.h:9,
>  from /home/cborntra/REPOS/qemu/include/qemu/osdep.h:65,
>  from /home/cborntra/REPOS/qemu/target-s390x/kvm.c:24:
> /usr/include/bits/string3.h:50:1: note: expected ‘const void * restrict’ but 
> argument is of type ‘QemuUUID {aka struct }’
>  __NTH (memcpy (void *__restrict __dest, const void *__restrict __src,
>  ^
> /home/cborntra/REPOS/qemu/rules.mak:60: recipe for target 
> 'target-s390x/kvm.o' failed
> 
> 
> 
> 
> Will send a fix.

Yes, there is also a similar breakage in xen_domain_build_pv. No idea how I
missed that.

Fam



Re: [Qemu-devel] [PATCH] 9pfs: fix potential host memory leak in v9fs_read

2016-09-27 Thread Greg Kurz
On Mon, 26 Sep 2016 21:42:26 -0700
Li Qiang  wrote:

> From: Li Qiang 
> 
> In 9pfs read dispatch function, it doesn't free two QEMUIOVector
> object thus causing potential memory leak. This patch avoid this.
> 

Good catch for the leak, but I'd prefer another fix. See below.

> Signed-off-by: Li Qiang 
> ---
>  hw/9pfs/9p.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> index d960a2e..b1ff8e7 100644
> --- a/hw/9pfs/9p.c
> +++ b/hw/9pfs/9p.c
> @@ -1830,12 +1830,16 @@ static void v9fs_read(void *opaque)
>  } while (len == -EINTR && !pdu->cancelled);
>  if (len < 0) {
>  /* IO error return the error */
> +qemu_iovec_destroy(&qiov);
> +qemu_iovec_destroy(&qiov_full);

We already have these lines at the end of the code block for
type P9_FID_FILE.

Let's introduce an out_free_iovec: label and...

>  err = len;
>  goto out;

goto out_free_iovec;

>  }
>  } while (count < max_count && len > 0);
>  err = pdu_marshal(pdu, offset, "d", count);
>  if (err < 0) {
> +qemu_iovec_destroy(&qiov);
> +qemu_iovec_destroy(&qiov_full);
>  goto out;

Same here.

>  }
>  err += offset + count;

As a valuable side effect, the error will be traced :)

Cheers.

--
Greg



Re: [Qemu-devel] [PATCH 2/3] target-arm: add env->tbflags

2016-09-27 Thread Paolo Bonzini
> Doing this for all MSR writes is a bit sad, because a lot of them
> don't actually change the TB flags, and quite a few of them which
> previously we were able to code to not have to do a helper call
> at all (direct writes to fields) now get a pointless helper call.

True.  On the other hand, MSR writes terminate the TB so you are
losing all the TB state anyway.  Before these patches you weren't
recomputing the TB flags in the common case of adjacent MSR writes
on the same page (so QEMU could use linked TBs), now you are.
However, given the speedup from the patch, I felt it was premature
optimization.

If there is a case where you get the helper in the profile, it is
possible to add a new ARM_CP_KEEP_TBFLAGS flag to ARMCPRegInfo.

> You're also recalculating more often than stated here, in that
> you also recalc on any gen_lookup_tb() call in the 32-bit
> decoder. (This is just as well because for instance vec_len
> and vec_stride aren't set via the cp15 system register write
> path.)

Right.  This was of course on purpose, but the commit message
was imprecise.

> You're treating the PSTATE_SS flag as static, but you don't
> have anything which causes a recalculation of it on the code
> path which changes it (gen_ss_advance()).
> 
> The 32-bit SETEND instruction changes CPSR_E, which has
> an effect on the BE_DATA_MASK flag, but I don't think
> that code path will cause us to recalculate flags.

This actually points to a bigger deficiency, in that---even
outside the PSTATE_SS and SETEND code paths---both pstate_write
and cpsr_write need to recompute the flags.  But I think that's
the only other case left.

Do you prefer to have the setend and clear_pstate_ss helpers
call cpsr_write/pstate_write, or do you prefer to inline the
modification to the tbflags?

> I found this patch kind of difficult to review because
> it isn't obvious why we recalculate the static flags at
> the points where we do (ie whether those points are
> necessary and sufficient for correct behaviour). Most
> of the comments above are the result of my looking at
> whether some particular flags that I suspected of being
> tricky were handled correctly :-)

You definitely have a point here.  Adding an assertion requires
looking at CPUARMState in gen_intermediate_code.  You're not really
supposed to do that, but I guess it's okay as long as it's for
debugging purposes.

Paolo



Re: [Qemu-devel] [PATCH v2 3/7] block/qapi: Move 'aio' option to file driver

2016-09-27 Thread Kevin Wolf
Am 26.09.2016 um 18:49 hat Max Reitz geschrieben:
> On 23.09.2016 16:32, Kevin Wolf wrote:
> > The option whether or not to use a native AIO interface really isn't a
> > generic option for all drivers, but only applies to the native file
> > protocols. This patch moves the option in blockdev-add to the
> > appropriate places (raw-posix and raw-win32).
> > 
> > We still have to keep the flag BDRV_O_NATIVE_AIO for compatibility
> > because so far the AIO option was usually specified on the wrong layer
> > (the top-level format driver, which didn't even look at it) and then
> > inherited by the protocol driver (where it was actually used). We can't
> > forbid this use except in new interfaces.
> > 
> > Signed-off-by: Kevin Wolf 
> > ---
> >  block/raw-posix.c  | 44 ---
> >  block/raw-win32.c  | 56 
> > +-
> >  qapi/block-core.json   |  6 +++---
> >  tests/qemu-iotests/087 |  4 ++--
> >  4 files changed, 83 insertions(+), 27 deletions(-)
> 
> [...]
> 
> > diff --git a/block/raw-win32.c b/block/raw-win32.c
> > index 56f45fe..734bb10 100644
> > --- a/block/raw-win32.c
> > +++ b/block/raw-win32.c
> 
> [...]
> 
> > +static bool get_aio_option(QemuOpts *opts, int flags, Error **errp)
> > +{
> > +BlockdevAioOptions aio, aio_default;
> > +
> > +aio_default = (flags & BDRV_O_NATIVE_AIO) ? BLOCKDEV_AIO_OPTIONS_NATIVE
> > +  : 
> > BLOCKDEV_AIO_OPTIONS_THREADS;
> > +aio = qapi_enum_parse(BlockdevAioOptions_lookup, qemu_opt_get(opts, 
> > "aio"),
> > +  BLOCKDEV_AIO_OPTIONS__MAX, aio_default, errp);
> > +
> > +switch (aio) {
> > +case BLOCKDEV_AIO_OPTIONS_NATIVE:
> > +return true;
> > +case BLOCKDEV_AIO_OPTIONS_THREADS:
> > +return false;
> > +default:
> > +error_setg(errp, "Invalid AIO option");
> 
> Any reason for catching this case here but not in raw-posix?
> 
> (Not that it really matters, though.)

Nobody will forget raw-posix when adding a new AIO mode to win32, so I
didn't feel like the additional code was worth it there. But if we add a
new AIO mode to raw-posix, I'm pretty sure we will forget win32.

Kevin


pgpfHbPUAfXua.pgp
Description: PGP signature


Re: [Qemu-devel] [PATCH v2 0/7] block: Make more blockdev-add options work

2016-09-27 Thread Kevin Wolf
Am 26.09.2016 um 19:10 hat Max Reitz geschrieben:
> It's a bit funny now how blockdev_init() and
> extract_common_blockdev_options() are actually used by neither
> blockdev-add nor -blockdev, but only by drive_new() (which happily
> advertises blockdev_init() to be shared with blockev-add, though).

Yes. Essentially drive_new() is now the clean wrappers that map legacy
syntax to the modern options, whereas blockdev_init() is left as the
place for the messy stuff. Maybe we can get rid of it sooner or later.

Kevin


pgpJFlU2wDfrk.pgp
Description: PGP signature


Re: [Qemu-devel] [PATCH v2 0/7] block: Make more blockdev-add options work

2016-09-27 Thread Kevin Wolf
Am 23.09.2016 um 16:32 hat Kevin Wolf geschrieben:
> This series moves a few more options from flags to the appropriate place. This
> doesn't only result in cleaner code, but also allows using these options in
> nested node definitions.
> 
> After this series, bds_tree_init() is only a thin wrapper around bdrv_open()
> which sets the right defaults for cache modes and the BDRV_O_INACTIVE flag if
> necessary.
> 
> Depends on my block branch, specifically Berto's 'read-only' patches.

Applied to the block branch.

Kevin



[Qemu-devel] [PATCH 1/8] [debug] uas: use 32 streams

2016-09-27 Thread Gerd Hoffmann
Signed-off-by: Gerd Hoffmann 
---
 hw/usb/dev-uas.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/usb/dev-uas.c b/hw/usb/dev-uas.c
index 3a8ff18..2a5fcbc 100644
--- a/hw/usb/dev-uas.c
+++ b/hw/usb/dev-uas.c
@@ -101,7 +101,7 @@ typedef struct {
 
 /* - */
 
-#define UAS_STREAM_BM_ATTR  4
+#define UAS_STREAM_BM_ATTR  5
 #define UAS_MAX_STREAMS (1 << UAS_STREAM_BM_ATTR)
 
 typedef struct UASDevice UASDevice;
-- 
1.8.3.1




[Qemu-devel] [PATCH 2/8] xhci: decouple EV_QUEUE from TD_QUEUE

2016-09-27 Thread Gerd Hoffmann
EV_QUEUE must not change because an array of that size is part of live
migration data.  Hard-code current value there, so we can touch TD_QUEUE
without breaking live migration.

Signed-off-by: Gerd Hoffmann 
---
 hw/usb/hcd-xhci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
index 726435c..15cac56 100644
--- a/hw/usb/hcd-xhci.c
+++ b/hw/usb/hcd-xhci.c
@@ -49,7 +49,7 @@
 #define TD_QUEUE 24
 
 /* Very pessimistic, let's hope it's enough for all cases */
-#define EV_QUEUE (((3*TD_QUEUE)+16)*MAXSLOTS)
+#define EV_QUEUE (((3 * 24) + 16) * MAXSLOTS)
 /* Do not deliver ER Full events. NEC's driver does some things not bound
  * to the specs when it gets them */
 #define ER_FULL_HACK
-- 
1.8.3.1




[Qemu-devel] [PATCH 3/8] xhci: drop unused comp_xfer field

2016-09-27 Thread Gerd Hoffmann
Signed-off-by: Gerd Hoffmann 
---
 hw/usb/hcd-xhci.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
index 15cac56..089dcbf 100644
--- a/hw/usb/hcd-xhci.c
+++ b/hw/usb/hcd-xhci.c
@@ -384,7 +384,6 @@ struct XHCIEPContext {
 
 XHCIRing ring;
 unsigned int next_xfer;
-unsigned int comp_xfer;
 XHCITransfer transfers[TD_QUEUE];
 XHCITransfer *retry;
 EPType type;
-- 
1.8.3.1




[Qemu-devel] [PATCH 0/8] xhci: stream transfer fixes, cleanups

2016-09-27 Thread Gerd Hoffmann
  Hi,

Our xhci emulation has trouble handling devices with many xhci streams,
which is fixed by patch #4.  Patch #1 can be used to reproduce the bug
with the qemu uas emulation.  It's there for testing convinience only,
I do not intend to merge it.

The other patches are cleanups.

please review,
  Gerd

Gerd Hoffmann (8):
  [debug] uas: use 32 streams
  xhci: decouple EV_QUEUE from TD_QUEUE
  xhci: drop unused comp_xfer field
  xhci: use linked list for transfers
  xhci: drop XHCITransfer->xhci
  xhci: add & use xhci_kick_epctx()
  xhci: drop XHCITransfer->{slotid,epid}
  xhci: make xhci_epid_to_usbep accept XHCIEPContext

 hw/usb/dev-uas.c  |   2 +-
 hw/usb/hcd-xhci.c | 219 +-
 2 files changed, 119 insertions(+), 102 deletions(-)

-- 
1.8.3.1




[Qemu-devel] [PATCH 6/8] xhci: add & use xhci_kick_epctx()

2016-09-27 Thread Gerd Hoffmann
xhci_kick_epctx is a xhci_kick_ep variant which takes an XHCIEPContext
as input instead of slotid and epid.  So in case we have a XHCIEPContext
at hand at the callsite we can just pass it directly.

Signed-off-by: Gerd Hoffmann 
---
 hw/usb/hcd-xhci.c | 43 ++-
 1 file changed, 26 insertions(+), 17 deletions(-)

diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
index da249f7..4e557c2 100644
--- a/hw/usb/hcd-xhci.c
+++ b/hw/usb/hcd-xhci.c
@@ -507,6 +507,7 @@ enum xhci_flags {
 
 static void xhci_kick_ep(XHCIState *xhci, unsigned int slotid,
  unsigned int epid, unsigned int streamid);
+static void xhci_kick_epctx(XHCIEPContext *epctx, unsigned int streamid);
 static TRBCCode xhci_disable_ep(XHCIState *xhci, unsigned int slotid,
 unsigned int epid);
 static void xhci_xfer_report(XHCITransfer *xfer);
@@ -1352,7 +1353,7 @@ static void xhci_set_ep_state(XHCIState *xhci, 
XHCIEPContext *epctx,
 static void xhci_ep_kick_timer(void *opaque)
 {
 XHCIEPContext *epctx = opaque;
-xhci_kick_ep(epctx->xhci, epctx->slotid, epctx->epid, 0);
+xhci_kick_epctx(epctx, 0);
 }
 
 static XHCIEPContext *xhci_alloc_epctx(XHCIState *xhci,
@@ -1998,7 +1999,7 @@ static int xhci_fire_ctl_transfer(XHCIState *xhci, 
XHCITransfer *xfer)
 
 xhci_complete_packet(xfer);
 if (!xfer->running_async && !xfer->running_retry) {
-xhci_kick_ep(xhci, xfer->slotid, xfer->epid, 0);
+xhci_kick_epctx(xfer->epctx, 0);
 }
 return 0;
 }
@@ -2102,7 +2103,7 @@ static int xhci_submit(XHCIState *xhci, XHCITransfer 
*xfer, XHCIEPContext *epctx
 
 xhci_complete_packet(xfer);
 if (!xfer->running_async && !xfer->running_retry) {
-xhci_kick_ep(xhci, xfer->slotid, xfer->epid, xfer->streamid);
+xhci_kick_epctx(xfer->epctx, xfer->streamid);
 }
 return 0;
 }
@@ -2116,16 +2117,8 @@ static int xhci_fire_transfer(XHCIState *xhci, 
XHCITransfer *xfer, XHCIEPContext
 static void xhci_kick_ep(XHCIState *xhci, unsigned int slotid,
  unsigned int epid, unsigned int streamid)
 {
-XHCIStreamContext *stctx;
 XHCIEPContext *epctx;
-XHCITransfer *xfer;
-XHCIRing *ring;
-USBEndpoint *ep = NULL;
-uint64_t mfindex;
-int length;
-int i;
 
-trace_usb_xhci_ep_kick(slotid, epid, streamid);
 assert(slotid >= 1 && slotid <= xhci->numslots);
 assert(epid >= 1 && epid <= 31);
 
@@ -2140,11 +2133,27 @@ static void xhci_kick_ep(XHCIState *xhci, unsigned int 
slotid,
 return;
 }
 
+xhci_kick_epctx(epctx, streamid);
+}
+
+static void xhci_kick_epctx(XHCIEPContext *epctx, unsigned int streamid)
+{
+XHCIState *xhci = epctx->xhci;
+XHCIStreamContext *stctx;
+XHCITransfer *xfer;
+XHCIRing *ring;
+USBEndpoint *ep = NULL;
+uint64_t mfindex;
+int length;
+int i;
+
+trace_usb_xhci_ep_kick(epctx->slotid, epctx->epid, streamid);
+
 /* If the device has been detached, but the guest has not noticed this
yet the 2 above checks will succeed, but we must NOT continue */
-if (!xhci->slots[slotid - 1].uport ||
-!xhci->slots[slotid - 1].uport->dev ||
-!xhci->slots[slotid - 1].uport->dev->attached) {
+if (!xhci->slots[epctx->slotid - 1].uport ||
+!xhci->slots[epctx->slotid - 1].uport->dev ||
+!xhci->slots[epctx->slotid - 1].uport->dev->attached) {
 return;
 }
 
@@ -2225,7 +2234,7 @@ static void xhci_kick_ep(XHCIState *xhci, unsigned int 
slotid,
 }
 xfer->streamid = streamid;
 
-if (epid == 1) {
+if (epctx->epid == 1) {
 xhci_fire_ctl_transfer(xhci, xfer);
 } else {
 xhci_fire_transfer(xhci, xfer, epctx);
@@ -2245,7 +2254,7 @@ static void xhci_kick_ep(XHCIState *xhci, unsigned int 
slotid,
 }
 }
 
-ep = xhci_epid_to_usbep(xhci, slotid, epid);
+ep = xhci_epid_to_usbep(xhci, epctx->slotid, epctx->epid);
 if (ep) {
 usb_device_flush_ep_queue(ep->dev, ep);
 }
@@ -3476,7 +3485,7 @@ static void xhci_complete(USBPort *port, USBPacket 
*packet)
 return;
 }
 xhci_complete_packet(xfer);
-xhci_kick_ep(xfer->epctx->xhci, xfer->slotid, xfer->epid, xfer->streamid);
+xhci_kick_epctx(xfer->epctx, xfer->streamid);
 if (xfer->complete) {
 xhci_ep_free_xfer(xfer);
 }
-- 
1.8.3.1




[Qemu-devel] [PATCH 5/8] xhci: drop XHCITransfer->xhci

2016-09-27 Thread Gerd Hoffmann
Use XHCITransfer->epctx->xhci instead.

Signed-off-by: Gerd Hoffmann 
---
 hw/usb/hcd-xhci.c | 22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
index d8e4074..da249f7 100644
--- a/hw/usb/hcd-xhci.c
+++ b/hw/usb/hcd-xhci.c
@@ -342,7 +342,6 @@ typedef struct XHCIPort {
 } XHCIPort;
 
 typedef struct XHCITransfer {
-XHCIState *xhci;
 XHCIEPContext *epctx;
 USBPacket packet;
 QEMUSGList sgl;
@@ -1439,7 +1438,6 @@ static XHCITransfer *xhci_ep_alloc_xfer(XHCIEPContext 
*epctx,
 }
 
 xfer = g_new0(XHCITransfer, 1);
-xfer->xhci = epctx->xhci;
 xfer->epctx = epctx;
 xfer->slotid = epctx->slotid;
 xfer->epid = epctx->epid;
@@ -1478,10 +1476,9 @@ static int xhci_ep_nuke_one_xfer(XHCITransfer *t, 
TRBCCode report)
 killed = 1;
 }
 if (t->running_retry) {
-XHCIEPContext *epctx = t->xhci->slots[t->slotid-1].eps[t->epid-1];
-if (epctx) {
-epctx->retry = NULL;
-timer_del(epctx->kick_timer);
+if (t->epctx) {
+t->epctx->retry = NULL;
+timer_del(t->epctx->kick_timer);
 }
 t->running_retry = 0;
 killed = 1;
@@ -1711,7 +1708,7 @@ static TRBCCode xhci_set_ep_dequeue(XHCIState *xhci, 
unsigned int slotid,
 
 static int xhci_xfer_create_sgl(XHCITransfer *xfer, int in_xfer)
 {
-XHCIState *xhci = xfer->xhci;
+XHCIState *xhci = xfer->epctx->xhci;
 int i;
 
 xfer->int_req = false;
@@ -1770,7 +1767,7 @@ static void xhci_xfer_report(XHCITransfer *xfer)
 bool reported = 0;
 bool shortpkt = 0;
 XHCIEvent event = {ER_TRANSFER, CC_SUCCESS};
-XHCIState *xhci = xfer->xhci;
+XHCIState *xhci = xfer->epctx->xhci;
 int i;
 
 left = xfer->packet.actual_length;
@@ -1844,9 +1841,8 @@ static void xhci_xfer_report(XHCITransfer *xfer)
 
 static void xhci_stall_ep(XHCITransfer *xfer)
 {
-XHCIState *xhci = xfer->xhci;
-XHCISlot *slot = &xhci->slots[xfer->slotid-1];
-XHCIEPContext *epctx = slot->eps[xfer->epid-1];
+XHCIEPContext *epctx = xfer->epctx;
+XHCIState *xhci = epctx->xhci;
 uint32_t err;
 XHCIStreamContext *sctx;
 
@@ -1870,7 +1866,7 @@ static int xhci_submit(XHCIState *xhci, XHCITransfer 
*xfer,
 
 static int xhci_setup_packet(XHCITransfer *xfer)
 {
-XHCIState *xhci = xfer->xhci;
+XHCIState *xhci = xfer->epctx->xhci;
 USBEndpoint *ep;
 int dir;
 
@@ -3480,7 +3476,7 @@ static void xhci_complete(USBPort *port, USBPacket 
*packet)
 return;
 }
 xhci_complete_packet(xfer);
-xhci_kick_ep(xfer->xhci, xfer->slotid, xfer->epid, xfer->streamid);
+xhci_kick_ep(xfer->epctx->xhci, xfer->slotid, xfer->epid, xfer->streamid);
 if (xfer->complete) {
 xhci_ep_free_xfer(xfer);
 }
-- 
1.8.3.1




[Qemu-devel] [PATCH 7/8] xhci: drop XHCITransfer->{slotid,epid}

2016-09-27 Thread Gerd Hoffmann
We can use XHCITransfer->epctx->{slotid,epid} instead.

Signed-off-by: Gerd Hoffmann 
---
 hw/usb/hcd-xhci.c | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
index 4e557c2..108d185 100644
--- a/hw/usb/hcd-xhci.c
+++ b/hw/usb/hcd-xhci.c
@@ -350,8 +350,6 @@ typedef struct XHCITransfer {
 bool complete;
 bool int_req;
 unsigned int iso_pkts;
-unsigned int slotid;
-unsigned int epid;
 unsigned int streamid;
 bool in_xfer;
 bool iso_xfer;
@@ -1440,8 +1438,6 @@ static XHCITransfer *xhci_ep_alloc_xfer(XHCIEPContext 
*epctx,
 
 xfer = g_new0(XHCITransfer, 1);
 xfer->epctx = epctx;
-xfer->slotid = epctx->slotid;
-xfer->epid = epctx->epid;
 xfer->trbs = g_new(XHCITRB, length);
 xfer->trb_count = length;
 usb_packet_init(&xfer->packet);
@@ -1806,8 +1802,8 @@ static void xhci_xfer_report(XHCITransfer *xfer)
 if (!reported && ((trb->control & TRB_TR_IOC) ||
   (shortpkt && (trb->control & TRB_TR_ISP)) ||
   (xfer->status != CC_SUCCESS && left == 0))) {
-event.slotid = xfer->slotid;
-event.epid = xfer->epid;
+event.slotid = xfer->epctx->slotid;
+event.epid = xfer->epctx->epid;
 event.length = (trb->status & 0x1) - chunk;
 event.flags = 0;
 event.ptr = trb->addr;
@@ -1876,7 +1872,7 @@ static int xhci_setup_packet(XHCITransfer *xfer)
 if (xfer->packet.ep) {
 ep = xfer->packet.ep;
 } else {
-ep = xhci_epid_to_usbep(xhci, xfer->slotid, xfer->epid);
+ep = xhci_epid_to_usbep(xhci, xfer->epctx->slotid, xfer->epctx->epid);
 if (!ep) {
 DPRINTF("xhci: slot %d has no device\n",
 xfer->slotid);
@@ -1956,7 +1952,8 @@ static int xhci_fire_ctl_transfer(XHCIState *xhci, 
XHCITransfer *xfer)
 trb_setup = &xfer->trbs[0];
 trb_status = &xfer->trbs[xfer->trb_count-1];
 
-trace_usb_xhci_xfer_start(xfer, xfer->slotid, xfer->epid, xfer->streamid);
+trace_usb_xhci_xfer_start(xfer, xfer->epctx->slotid,
+  xfer->epctx->epid, xfer->streamid);
 
 /* at most one Event Data TRB allowed after STATUS */
 if (TRB_TYPE(*trb_status) == TR_EVDATA && xfer->trb_count > 2) {
@@ -2110,7 +2107,8 @@ static int xhci_submit(XHCIState *xhci, XHCITransfer 
*xfer, XHCIEPContext *epctx
 
 static int xhci_fire_transfer(XHCIState *xhci, XHCITransfer *xfer, 
XHCIEPContext *epctx)
 {
-trace_usb_xhci_xfer_start(xfer, xfer->slotid, xfer->epid, xfer->streamid);
+trace_usb_xhci_xfer_start(xfer, xfer->epctx->slotid,
+  xfer->epctx->epid, xfer->streamid);
 return xhci_submit(xhci, xfer, epctx);
 }
 
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH V15 00/12] Introduce COLO-Proxy

2016-09-27 Thread Jason Wang



On 2016年09月27日 10:22, Zhang Chen wrote:

COLO-proxy is a part of COLO project. COLO project is
composed of COLO-frame, COLO-proxy and block-replication.
It is used to compare the network package to help COLO
decide whether to do checkpoint. With COLO-proxy's help,
COLO greatly improves the performance.

The filter-redirector, filter-mirror, colo-compare
and filter-rewriter compose the COLO-proxy.

COLO-compare
It is used to compare the network package to help COLO decide
whether to do checkpoint.

Filter-rewriter
It will rewrite some of secondary packet to make
secondary guest's connection established successfully.
In this module we will rewrite tcp packet's ack to the secondary
from primary,and rewrite tcp packet's seq to the primary from
secondary.

The full version in this github:
https://github.com/zhangckid/qemu/tree/colo-v2.7-proxy-mode-compare-and-rewriter-sep27

v15:
   - change "ConnectionKey key = { 0 };" to
 "ConnectionKey key = {{0},};", fix typo.


This causes  build failure for mingw@fedora docker test.

I tend to fix by not using the initializer but call a memset() in 
fill_connection_key().


If you agree, I will fix it in my tree and no need to repost a new version.




[Qemu-devel] [PATCH 8/8] xhci: make xhci_epid_to_usbep accept XHCIEPContext

2016-09-27 Thread Gerd Hoffmann
All callsites have a XHCIEPContext pointer anyway, so we can just pass
it directly instead of fiddeling with slotid and epid.

Signed-off-by: Gerd Hoffmann 
---
 hw/usb/hcd-xhci.c | 29 +++--
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
index 108d185..5176405 100644
--- a/hw/usb/hcd-xhci.c
+++ b/hw/usb/hcd-xhci.c
@@ -511,8 +511,7 @@ static TRBCCode xhci_disable_ep(XHCIState *xhci, unsigned 
int slotid,
 static void xhci_xfer_report(XHCITransfer *xfer);
 static void xhci_event(XHCIState *xhci, XHCIEvent *event, int v);
 static void xhci_write_event(XHCIState *xhci, XHCIEvent *event, int v);
-static USBEndpoint *xhci_epid_to_usbep(XHCIState *xhci,
-   unsigned int slotid, unsigned int epid);
+static USBEndpoint *xhci_epid_to_usbep(XHCIEPContext *epctx);
 
 static const char *TRBType_names[] = {
 [TRB_RESERVED] = "TRB_RESERVED",
@@ -1190,7 +1189,7 @@ static int xhci_epmask_to_eps_with_streams(XHCIState 
*xhci,
 }
 
 epctx = slot->eps[i - 1];
-ep = xhci_epid_to_usbep(xhci, slotid, i);
+ep = xhci_epid_to_usbep(epctx);
 if (!epctx || !epctx->nr_pstreams || !ep) {
 continue;
 }
@@ -1521,7 +1520,7 @@ static int xhci_ep_nuke_xfers(XHCIState *xhci, unsigned 
int slotid,
 xhci_ep_free_xfer(xfer);
 }
 
-ep = xhci_epid_to_usbep(xhci, slotid, epid);
+ep = xhci_epid_to_usbep(epctx);
 if (ep) {
 usb_device_ep_stopped(ep->dev, ep);
 }
@@ -1863,7 +1862,6 @@ static int xhci_submit(XHCIState *xhci, XHCITransfer 
*xfer,
 
 static int xhci_setup_packet(XHCITransfer *xfer)
 {
-XHCIState *xhci = xfer->epctx->xhci;
 USBEndpoint *ep;
 int dir;
 
@@ -1872,7 +1870,7 @@ static int xhci_setup_packet(XHCITransfer *xfer)
 if (xfer->packet.ep) {
 ep = xfer->packet.ep;
 } else {
-ep = xhci_epid_to_usbep(xhci, xfer->epctx->slotid, xfer->epctx->epid);
+ep = xhci_epid_to_usbep(xfer->epctx);
 if (!ep) {
 DPRINTF("xhci: slot %d has no device\n",
 xfer->slotid);
@@ -2252,7 +2250,7 @@ static void xhci_kick_epctx(XHCIEPContext *epctx, 
unsigned int streamid)
 }
 }
 
-ep = xhci_epid_to_usbep(xhci, epctx->slotid, epctx->epid);
+ep = xhci_epid_to_usbep(epctx);
 if (ep) {
 usb_device_flush_ep_queue(ep->dev, ep);
 }
@@ -3517,17 +3515,20 @@ static int xhci_find_epid(USBEndpoint *ep)
 }
 }
 
-static USBEndpoint *xhci_epid_to_usbep(XHCIState *xhci,
-   unsigned int slotid, unsigned int epid)
+static USBEndpoint *xhci_epid_to_usbep(XHCIEPContext *epctx)
 {
-assert(slotid >= 1 && slotid <= xhci->numslots);
+USBPort *uport;
+uint32_t token;
 
-if (!xhci->slots[slotid - 1].uport) {
+if (!epctx) {
 return NULL;
 }
-
-return usb_ep_get(xhci->slots[slotid - 1].uport->dev,
-  (epid & 1) ? USB_TOKEN_IN : USB_TOKEN_OUT, epid >> 1);
+uport = epctx->xhci->slots[epctx->slotid - 1].uport;
+token = (epctx->epid & 1) ? USB_TOKEN_IN : USB_TOKEN_OUT;
+if (!uport) {
+return NULL;
+}
+return usb_ep_get(uport->dev, token, epctx->epid >> 1);
 }
 
 static void xhci_wakeup_endpoint(USBBus *bus, USBEndpoint *ep,
-- 
1.8.3.1




[Qemu-devel] [PATCH 4/8] xhci: use linked list for transfers

2016-09-27 Thread Gerd Hoffmann
xhci has a fixed number of 24 (TD_QUEUE) XHCITransfer structs per
endpoint, which turns out to be a problem for usb3 devices with 32 (or
more) bulk streams.  xhci re-checks the trb rings on every finished
transfer to make sure it'll pick up any pending work.  But that scheme
breaks in case the first transfer of a ring can't be started because we
ran out of XHCITransfer structs already.

So remove static XHCITransfer array from XHCIEPContext.  Use a linked
list instead, and allocate/free XHCITransfer as needed.  Add helper
functions to allocate & initialize and to cleanup & release
XHCITransfer structs.  That also simplifies trb management, we never
have to realloc XHCITransfer->trbs because we don't reuse XHCITransfer
structs any more.

New dynamic limit for in-flight xhci transfers per endpoint is
number-of-streams + 16.

Signed-off-by: Gerd Hoffmann 
---
 hw/usb/hcd-xhci.c | 122 ++
 1 file changed, 68 insertions(+), 54 deletions(-)

diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
index 089dcbf..d8e4074 100644
--- a/hw/usb/hcd-xhci.c
+++ b/hw/usb/hcd-xhci.c
@@ -21,6 +21,7 @@
 #include "qemu/osdep.h"
 #include "hw/hw.h"
 #include "qemu/timer.h"
+#include "qemu/queue.h"
 #include "hw/usb.h"
 #include "hw/pci/pci.h"
 #include "hw/pci/msi.h"
@@ -46,8 +47,6 @@
 #define MAXSLOTS 64
 #define MAXINTRS 16
 
-#define TD_QUEUE 24
-
 /* Very pessimistic, let's hope it's enough for all cases */
 #define EV_QUEUE (((3 * 24) + 16) * MAXSLOTS)
 /* Do not deliver ER Full events. NEC's driver does some things not bound
@@ -344,6 +343,7 @@ typedef struct XHCIPort {
 
 typedef struct XHCITransfer {
 XHCIState *xhci;
+XHCIEPContext *epctx;
 USBPacket packet;
 QEMUSGList sgl;
 bool running_async;
@@ -359,7 +359,6 @@ typedef struct XHCITransfer {
 bool timed_xfer;
 
 unsigned int trb_count;
-unsigned int trb_alloced;
 XHCITRB *trbs;
 
 TRBCCode status;
@@ -369,6 +368,8 @@ typedef struct XHCITransfer {
 unsigned int cur_pkt;
 
 uint64_t mfindex_kick;
+
+QTAILQ_ENTRY(XHCITransfer) next;
 } XHCITransfer;
 
 struct XHCIStreamContext {
@@ -383,8 +384,8 @@ struct XHCIEPContext {
 unsigned int epid;
 
 XHCIRing ring;
-unsigned int next_xfer;
-XHCITransfer transfers[TD_QUEUE];
+uint32_t xfer_count;
+QTAILQ_HEAD(, XHCITransfer) transfers;
 XHCITransfer *retry;
 EPType type;
 dma_addr_t pctx;
@@ -1360,19 +1361,13 @@ static XHCIEPContext *xhci_alloc_epctx(XHCIState *xhci,
unsigned int epid)
 {
 XHCIEPContext *epctx;
-int i;
 
 epctx = g_new0(XHCIEPContext, 1);
 epctx->xhci = xhci;
 epctx->slotid = slotid;
 epctx->epid = epid;
 
-for (i = 0; i < ARRAY_SIZE(epctx->transfers); i++) {
-epctx->transfers[i].xhci = xhci;
-epctx->transfers[i].slotid = slotid;
-epctx->transfers[i].epid = epid;
-usb_packet_init(&epctx->transfers[i].packet);
-}
+QTAILQ_INIT(&epctx->transfers);
 epctx->kick_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, xhci_ep_kick_timer, 
epctx);
 
 return epctx;
@@ -1433,6 +1428,41 @@ static TRBCCode xhci_enable_ep(XHCIState *xhci, unsigned 
int slotid,
 return CC_SUCCESS;
 }
 
+static XHCITransfer *xhci_ep_alloc_xfer(XHCIEPContext *epctx,
+uint32_t length)
+{
+uint32_t limit = epctx->nr_pstreams + 16;
+XHCITransfer *xfer;
+
+if (epctx->xfer_count >= limit) {
+return NULL;
+}
+
+xfer = g_new0(XHCITransfer, 1);
+xfer->xhci = epctx->xhci;
+xfer->epctx = epctx;
+xfer->slotid = epctx->slotid;
+xfer->epid = epctx->epid;
+xfer->trbs = g_new(XHCITRB, length);
+xfer->trb_count = length;
+usb_packet_init(&xfer->packet);
+
+QTAILQ_INSERT_TAIL(&epctx->transfers, xfer, next);
+epctx->xfer_count++;
+
+return xfer;
+}
+
+static void xhci_ep_free_xfer(XHCITransfer *xfer)
+{
+QTAILQ_REMOVE(&xfer->epctx->transfers, xfer, next);
+xfer->epctx->xfer_count--;
+
+usb_packet_cleanup(&xfer->packet);
+g_free(xfer->trbs);
+g_free(xfer);
+}
+
 static int xhci_ep_nuke_one_xfer(XHCITransfer *t, TRBCCode report)
 {
 int killed = 0;
@@ -1459,7 +1489,7 @@ static int xhci_ep_nuke_one_xfer(XHCITransfer *t, 
TRBCCode report)
 g_free(t->trbs);
 
 t->trbs = NULL;
-t->trb_count = t->trb_alloced = 0;
+t->trb_count = 0;
 
 return killed;
 }
@@ -1469,7 +1499,8 @@ static int xhci_ep_nuke_xfers(XHCIState *xhci, unsigned 
int slotid,
 {
 XHCISlot *slot;
 XHCIEPContext *epctx;
-int i, xferi, killed = 0;
+XHCITransfer *xfer;
+int killed = 0;
 USBEndpoint *ep = NULL;
 assert(slotid >= 1 && slotid <= xhci->numslots);
 assert(epid >= 1 && epid <= 31);
@@ -1484,14 +1515,16 @@ static int xhci_ep_nuke_xfers(XHCIState *xhci, unsigned 
int slotid,
 
 epctx = slot->eps[epid-1];
 
-xferi = epctx->next_xfer;
-for (i = 0; i < TD_QUEUE; i++) {

Re: [Qemu-devel] [PATCH] qemu-img: Allow unmap backing image for zeroed clusters

2016-09-27 Thread Kevin Wolf
Am 27.09.2016 um 04:20 hat Fam Zheng geschrieben:
> We already specified BDRV_O_UNMAP when opening images in 'qemu-img
> commit', but didn't turn on the "unmap" in the active commit job. This
> patch fixes that so that zeroed clusters in top image can be discarded
> which is desired in the virt-sparsify use case, where a temporary
> overlay is created and fstrim'ed before commiting back, to free space in
> the original image.
> 
> The block-commit keeps the existing behavior.
> 
> Signed-off-by: Fam Zheng 

Is there a good reason for not using discard in an active commit if the
image was opened with BDRV_O_UNMAP? That is, wouldn't it make sense to
just set the unmap option for the mirror unconditionally and also for
block-commit?

If there is a reason, we should probably add an option to block-commit,
but I still feels that enabling it should be the default then.

Kevin



Re: [Qemu-devel] [PATCH] util: secure memfd_create fallback mechanism

2016-09-27 Thread Daniel P. Berrange
On Tue, Sep 27, 2016 at 03:06:21AM +, Rafael David Tinoco wrote:
> Commit: 35f9b6ef3acc9d0546c395a566b04e63ca84e302 added a fallback
> mechanism for systems not supporting memfd_create syscall (started
> being supported since 3.17).

This is really dubious code in general and IMHO should just
be reverted.

We have a golden rule that any time QEMU needs to be able to
create a file on disk, then the path should be explicitly
provided as a command line argument so that mgmt apps can
control the location used.

> Backporting memfd_create might not be accepted for distros relying
> on older kernels. Nowadays there is no way for security driver
> to discover memfd filename to be created: /memfd-XX.
> 
> It is more appropriate to include UUID and/or VM names in the
> temporary filename, allowing security driver rules to be applied
> while maintaining the required unpredictability with mkstemp.

We should not have QEMU creating unpredictabile filenames in the
first place - any filenames should be determined by libvirt
explicitly.

> This change will allow libvirt to know exact memfd file to be created
> for vhost log AND to create appropriate security rules to allow access
> per instance (instead of a opened rule like /memfd-*).

Even with this change it is bad - we don't want driver backends
creating arbitrary files in the shared /tmp directory.


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] [PATCH V15 00/12] Introduce COLO-Proxy

2016-09-27 Thread Zhang Chen



On 09/27/2016 04:38 PM, Jason Wang wrote:



On 2016年09月27日 10:22, Zhang Chen wrote:

COLO-proxy is a part of COLO project. COLO project is
composed of COLO-frame, COLO-proxy and block-replication.
It is used to compare the network package to help COLO
decide whether to do checkpoint. With COLO-proxy's help,
COLO greatly improves the performance.

The filter-redirector, filter-mirror, colo-compare
and filter-rewriter compose the COLO-proxy.

COLO-compare
It is used to compare the network package to help COLO decide
whether to do checkpoint.

Filter-rewriter
It will rewrite some of secondary packet to make
secondary guest's connection established successfully.
In this module we will rewrite tcp packet's ack to the secondary
from primary,and rewrite tcp packet's seq to the primary from
secondary.

The full version in this github:
https://github.com/zhangckid/qemu/tree/colo-v2.7-proxy-mode-compare-and-rewriter-sep27 



v15:
   - change "ConnectionKey key = { 0 };" to
 "ConnectionKey key = {{0},};", fix typo.


This causes  build failure for mingw@fedora docker test.

I tend to fix by not using the initializer but call a memset() in 
fill_connection_key().


If you agree, I will fix it in my tree and no need to repost a new 
version.






OK~ I agree.

Thanks
Zhang Chen


.



--
Thanks
zhangchen






Re: [Qemu-devel] [PATCH] pci-testdev: enhance to support new testcases

2016-09-27 Thread Peter Xu
On Tue, Sep 27, 2016 at 02:37:44PM +0800, Peter Xu wrote:
> On Thu, Sep 22, 2016 at 09:23:05PM +0300, Michael S. Tsirkin wrote:
> > On Thu, Sep 22, 2016 at 02:15:08PM +0800, Peter Xu wrote:
> > > pci-testdev is used mostly in kvm-unit-test for some eventfd tests.
> > > However I see it a good framework for other tests as well (e.g., the
> > > IOMMU unit test in the future). So enhanced it to support more
> > > testcases.
> > > 
> > > The original memory handlers and protocol are strict and not easy to
> > > change (we need to keep the old behavior of pci-testdev).
> > > So I added a
> > > new parameter for the device, and memory ops will be dynamically handled
> > > depending on what testcase it is configured. To specify a new test case
> > > for pci-testdev, we use:
> > > 
> > >   -device pci-testdev,testcase=XXX
> > > 
> > > The default will be "eventfd", which is the original behavior for
> > > pci-testdev. In the future, we can just add new testcase for pci-testdev
> > > to achieve different goals.
> > 
> > Instead of a parameter, how about creating a subregion
> > of the BAR and adding new tests at an offset?
> 
> Yeah, I can do that as well.

Actually what I want to propose is a new "protocol" for pci-testdev.
Currently it only allows a very limited test case, which is ioeventfd
(static device behavior, no parameter can be passed from guest OS,
etc.). IMHO we need something more general, e.g., guest can send cmd
to the testdev, to let it do specific test; along the way, we should
be able to provide parameters from guest OS (maybe we can gather all
kinds of test requirements from the list if people have any idea).

Take my example: IOMMU unit test would want the guest to send DMA/IRQ
request from the device's perspective. In that case, we would like to
"tell" the pci-testdev about where to write the DMA, and what data to
write specifically, or which IRQ to trigger. That's something we
cannot do right now. And I don't want to just add a new test case for
that specifically. I think we can make it more common.

Btw, I just noticed that pci-io and pci-mmio tests are not run by
default by kvm-unit-test. I don't know how many people are using
pci-testdev (guessing there is little since the test is really a
specific one). In that case, I don't know whether it'll be okay that I
cook a patch to refactor the codes without compatibility
considerations. If so, I can provide twin patch for kvm-unit-test as
well.

Thanks,

-- peterx



Re: [Qemu-devel] [PATCH 0/2] Xen pvUSB correction

2016-09-27 Thread Gerd Hoffmann
On Mo, 2016-09-26 at 14:43 +0200, Juergen Gross wrote:
> Trying to use pvUSB in a Xen guest with a qemu emulated USB controller
> will crash qemu as it tries to attach a pvUSB device to the emulated
> controller.

Hmm.  --verbose please.

While this clearly doesn't do what you intended I think it should not
have crashed qemu.  pvUSB devices should work on emulated controller
(and emulated devices should work on the pvUSB controller).  If they
don't you probably taking shortcuts somewhere which work only for the
pvUSB device on pvUSB controller case.

Please check.

cheers,
  Gerd




Re: [Qemu-devel] [PATCH 0/5] target-i386: Eliminate unnecessary has_msr_* global variables

2016-09-27 Thread Paolo Bonzini


- Original Message -
> From: "Eduardo Habkost" 
> To: "Paolo Bonzini" , "Marcelo Tosatti" 
> 
> Cc: k...@vger.kernel.org, qemu-devel@nongnu.org
> Sent: Tuesday, September 27, 2016 12:24:05 AM
> Subject: [PATCH 0/5] target-i386: Eliminate unnecessary has_msr_* global 
> variables
> 
> Some of the has_msr_* global variables are set depending on
> X86PCU fields and are unnecessary because we can look at the
> X86CPU object directly. This series eliminates some of them.
> 
> Eduardo Habkost (5):
>   target-i386: Remove has_msr_mtrr global variable
>   target-i386: Remove has_msr_hv_apic global variable
>   target-i386: Remove has_msr_hv_tsc global variable
>   target-i386: Clear KVM CPUID features if KVM is disabled
>   target-i386: Remove has_msr_* global vars for KVM features
> 
>  target-i386/cpu.c |  4 
>  target-i386/kvm.c | 51 +++
>  2 files changed, 23 insertions(+), 32 deletions(-)


Reviewed-by: Paolo Bonzini 



Re: [Qemu-devel] [PULL 00/28] Misc patches for 2016-09-26

2016-09-27 Thread Paolo Bonzini


- Original Message -
> From: "Peter Xu" 
> To: "Peter Maydell" , "Paolo Bonzini" 
> 
> Cc: "QEMU Developers" 
> Sent: Tuesday, September 27, 2016 4:12:51 AM
> Subject: Re: [Qemu-devel] [PULL 00/28] Misc patches for 2016-09-26
> 
> On Mon, Sep 26, 2016 at 02:19:08PM -0700, Peter Maydell wrote:
> 
> [...]
> 
> > I also see this compile failure:
> > 
> >   CCi386-softmmu/hw/i386/amd_iommu.o
> > /home/petmay01/linaro/qemu-for-merges/hw/i386/amd_iommu.c: In function
> > ‘amdvi_init’:
> > /home/petmay01/linaro/qemu-for-merges/hw/i386/amd_iommu.c:1083:17:
> > error: ‘MemoryRegionIOMMUOps {aka struct MemoryRegionIOMMUOps}’ has no
> > member named ‘notify_started’
> >  s->iommu_ops.notify_started = amdvi_iommu_notify_started;
> >  ^
> > /home/petmay01/linaro/qemu-for-merges/rules.mak:60: recipe for target
> > 'hw/i386/amd_iommu.o' failed
> 
> Paolo,
> 
> Would you please help squash this into 02/28 of your PULL request to
> solve above error?

Shall I also redo patch 3/3 for AMD IOMMU, like this:

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index a868539..6365682 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -1072,9 +1072,12 @@ static void amdvi_iommu_notify_flag_changed(MemoryRegion 
*iommu,
 {
 AMDVIAddressSpace *as = container_of(iommu, AMDVIAddressSpace, iommu);
 
-hw_error("device %02x.%02x.%x requires iommu notifier which is not "
- "currently supported", as->bus_num, PCI_SLOT(as->devfn),
- PCI_FUNC(as->devfn));
+if (new & IOMMU_NOTIFIER_MAP) {
+error_report("device %02x.%02x.%x requires iommu notifier which is not 
"
+ "currently supported", as->bus_num, PCI_SLOT(as->devfn),
+ PCI_FUNC(as->devfn));
+exit(1);
+}
 }
 
 static void amdvi_init(AMDVIState *s)

?

Paolo



Re: [Qemu-devel] [PATCH 1/2] xen: add an own bus for xen backend devices

2016-09-27 Thread Gerd Hoffmann
On Mo, 2016-09-26 at 14:43 +0200, Juergen Gross wrote:
> Add a bus for Xen backend devices in order to be able to establish a
> dedicated device path for pluggable devices.

Looks sane to me.  Can take this through the usb queue if I get an ack
from xen.

> +#define TYPE_XENSYSDEV "xensysdev"
> +#define TYPE_XENSYSBUS "xen-sysbus"

I'd make this consistent and use the dash either for both or not at all.

cheers,
  Gerd





Re: [Qemu-devel] [PATCH] qemu-img: Allow unmap backing image for zeroed clusters

2016-09-27 Thread Fam Zheng
On Tue, 09/27 10:41, Kevin Wolf wrote:
> Am 27.09.2016 um 04:20 hat Fam Zheng geschrieben:
> > We already specified BDRV_O_UNMAP when opening images in 'qemu-img
> > commit', but didn't turn on the "unmap" in the active commit job. This
> > patch fixes that so that zeroed clusters in top image can be discarded
> > which is desired in the virt-sparsify use case, where a temporary
> > overlay is created and fstrim'ed before commiting back, to free space in
> > the original image.
> > 
> > The block-commit keeps the existing behavior.
> > 
> > Signed-off-by: Fam Zheng 
> 
> Is there a good reason for not using discard in an active commit if the
> image was opened with BDRV_O_UNMAP? That is, wouldn't it make sense to
> just set the unmap option for the mirror unconditionally and also for
> block-commit?
> 
> If there is a reason, we should probably add an option to block-commit,
> but I still feels that enabling it should be the default then.

I didn't touch block-commit because I wasn't sure, but you made a good point,
we can turn this on there too.

Fam



Re: [Qemu-devel] [PATCH 2/2] xen: add qemu device for each pvusb backend

2016-09-27 Thread Daniel P. Berrange
On Mon, Sep 26, 2016 at 02:43:57PM +0200, Juergen Gross wrote:
> In order to be able to specify to which pvusb controller a new pvusb
> device should be added we need a qemu device for each pvusb controller
> with an associated id.
> 
> Add such a device when a new controller is requested and attach the
> usb bus of that controller to the new device. Any device connected to
> that controller can now specify the bus and port directly via its
> properties.
> 
> Signed-off-by: Juergen Gross 
> ---
>  hw/usb/xen-usb.c | 81 
> +++-
>  1 file changed, 68 insertions(+), 13 deletions(-)
> 
> @@ -733,10 +740,10 @@ static void usbback_portid_add(struct usbback_info 
> *usbif, unsigned port,
>  {
>  unsigned speed;
>  char *portname;
> -USBPort *p;
>  Error *local_err = NULL;
>  QDict *qdict;
>  QemuOpts *opts;
> +char tmp[32];
>  
>  if (usbif->ports[port - 1].dev) {
>  return;
> @@ -749,11 +756,14 @@ static void usbback_portid_add(struct usbback_info 
> *usbif, unsigned port,
>  return;
>  }
>  portname++;
> -p = &(usbif->ports[port - 1].port);
> -snprintf(p->path, sizeof(p->path), "%s", portname);
>  
>  qdict = qdict_new();
>  qdict_put(qdict, "driver", qstring_from_str("usb-host"));
> +snprintf(tmp, sizeof(tmp), "%s.0", usbif->id);

Don't snprintf into fixed length buffers. g_strdup_printf() does the
right thing

> +qdict_put(qdict, "bus", qstring_from_str(tmp));
> +snprintf(tmp, sizeof(tmp), "%s-%u", usbif->id, port);
> +qdict_put(qdict, "id", qstring_from_str(tmp));


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] [PATCH v3 1/3] qemu-nbd: Add --fork option

2016-09-27 Thread Kevin Wolf
Am 25.08.2016 um 18:30 hat Max Reitz geschrieben:
> Using the --fork option, one can make qemu-nbd fork the worker process.
> The original process will exit on error of the worker or once the worker
> enters the main loop.
> 
> Suggested-by: Sascha Silbe 
> Signed-off-by: Max Reitz 
> ---
>  qemu-nbd.c | 15 ++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/qemu-nbd.c b/qemu-nbd.c
> index e3571c2..8c2d582 100644
> --- a/qemu-nbd.c
> +++ b/qemu-nbd.c
> @@ -48,6 +48,7 @@
>  #define QEMU_NBD_OPT_OBJECT260
>  #define QEMU_NBD_OPT_TLSCREDS  261
>  #define QEMU_NBD_OPT_IMAGE_OPTS262
> +#define QEMU_NBD_OPT_FORK  263
>  
>  #define MBR_SIZE 512
>  
> @@ -503,6 +504,7 @@ int main(int argc, char **argv)
>  { "tls-creds", required_argument, NULL, QEMU_NBD_OPT_TLSCREDS },
>  { "image-opts", no_argument, NULL, QEMU_NBD_OPT_IMAGE_OPTS },
>  { "trace", required_argument, NULL, 'T' },
> +{ "fork", no_argument, NULL, QEMU_NBD_OPT_FORK },
>  { NULL, 0, NULL, 0 }
>  };
>  int ch;
> @@ -524,6 +526,8 @@ int main(int argc, char **argv)
>  bool imageOpts = false;
>  bool writethrough = true;
>  char *trace_file = NULL;
> +bool fork_process = false;
> +int old_stderr = -1;
>  
>  /* The client thread uses SIGTERM to interrupt the server.  A signal
>   * handler ensures that "qemu-nbd -v -c" exits with a nice status code.
> @@ -714,6 +718,9 @@ int main(int argc, char **argv)
>  g_free(trace_file);
>  trace_file = trace_opt_parse(optarg);
>  break;
> +case QEMU_NBD_OPT_FORK:
> +fork_process = true;
> +break;
>  }
>  }
>  
> @@ -773,7 +780,7 @@ int main(int argc, char **argv)
>  return 0;
>  }
>  
> -if (device && !verbose) {
> +if ((device && !verbose) || fork_process) {
>  int stderr_fd[2];
>  pid_t pid;
>  int ret;
> @@ -796,6 +803,7 @@ int main(int argc, char **argv)
>  ret = qemu_daemon(1, 0);
>  
>  /* Temporarily redirect stderr to the parent's pipe...  */
> +old_stderr = dup(STDERR_FILENO);
>  dup2(stderr_fd[1], STDERR_FILENO);
>  if (ret < 0) {
>  error_report("Failed to daemonize: %s", strerror(errno));

I don't have a kernel with NBD support, so I can't test this easily, but
doesn't this break the daemon mode for --connect? I mean, it will still
fork, but won't the parent be alive until the child exits?

> @@ -951,6 +959,11 @@ int main(int argc, char **argv)
>  exit(EXIT_FAILURE);
>  }
>  
> +if (fork_process) {
> +dup2(old_stderr, STDERR_FILENO);
> +close(old_stderr);
> +}

Because this code doesn't run for --connect (unless --fork is given,
too).

>  state = RUNNING;
>  do {
>  main_loop_wait(false);

The documentation needs an update, too.

Kevin



Re: [Qemu-devel] [PATCH 2/2] xen: add qemu device for each pvusb backend

2016-09-27 Thread Gerd Hoffmann
  Hi,

>  struct usbback_info {
>  struct XenDevice xendev;  /* must be first */
> +char id[24];
> +struct USBBACKDevice *dev;
>  USBBus   bus;
>  void *urb_sring;
>  void *conn_sring;
> @@ -116,6 +124,10 @@ struct usbback_info {
>  QEMUBH   *bh;
>  };
>  
> +typedef struct USBBACKDevice {
> +DeviceState qdev;
> +} USBBACKDevice;

Hmm, I think the xen core needs better QOM support ...

struct XenDevice should have a DeviceState element, so it can be used as
device object directly instead of attaching a device object like
this ...

cheers,
  Gerd




Re: [Qemu-devel] [PATCH v3 07/10] ppc/pnv: add XSCOM infrastructure

2016-09-27 Thread Cédric Le Goater
>> +#include 
>> +
>> +static void xscom_complete(uint64_t hmer_bits)
>> +{
>> +CPUState *cs = current_cpu;
> 
> Hmm.. is current_cpu a safe thing to use in the case of KVM or MTTCG?

yes, as we are running under cpu_exec when doing this call.

>> +PowerPCCPU *cpu = POWERPC_CPU(cs);
>> +CPUPPCState *env = &cpu->env;
>> +
>> +cpu_synchronize_state(cs);
>> +env->spr[SPR_HMER] |= hmer_bits;
>> +
>> +/* XXX Need a CPU helper to set HMER, also handle gneeration
>> + * of HMIs
>> + */

Ben, 

The CPU helper would be to replicate the value of the SPR_HMER in all
the threads of the core I guess ? 

Thanks,

C.



[Qemu-devel] [PATCH] xenpv: Fix qemu_uuid compiling error

2016-09-27 Thread Fam Zheng
9c5ce8db2 switched the type of qemu_uuid and this should have followed.
Fix it.

Signed-off-by: Fam Zheng 
---
 hw/xenpv/xen_domainbuild.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/xenpv/xen_domainbuild.c b/hw/xenpv/xen_domainbuild.c
index b439b0e..457a897 100644
--- a/hw/xenpv/xen_domainbuild.c
+++ b/hw/xenpv/xen_domainbuild.c
@@ -232,7 +232,7 @@ int xen_domain_build_pv(const char *kernel, const char 
*ramdisk,
 unsigned long xenstore_mfn = 0, console_mfn = 0;
 int rc;
 
-memcpy(uuid, qemu_uuid, sizeof(uuid));
+memcpy(uuid, &qemu_uuid, sizeof(uuid));
 rc = xen_domain_create(xen_xc, ssidref, uuid, flags, &xen_domid);
 if (rc < 0) {
 fprintf(stderr, "xen: xc_domain_create() failed\n");
-- 
2.7.4




Re: [Qemu-devel] [PULL 00/28] Misc patches for 2016-09-26

2016-09-27 Thread Peter Xu
On Tue, Sep 27, 2016 at 04:53:57AM -0400, Paolo Bonzini wrote:
> 
> 
> - Original Message -
> > From: "Peter Xu" 
> > To: "Peter Maydell" , "Paolo Bonzini" 
> > 
> > Cc: "QEMU Developers" 
> > Sent: Tuesday, September 27, 2016 4:12:51 AM
> > Subject: Re: [Qemu-devel] [PULL 00/28] Misc patches for 2016-09-26
> > 
> > On Mon, Sep 26, 2016 at 02:19:08PM -0700, Peter Maydell wrote:
> > 
> > [...]
> > 
> > > I also see this compile failure:
> > > 
> > >   CCi386-softmmu/hw/i386/amd_iommu.o
> > > /home/petmay01/linaro/qemu-for-merges/hw/i386/amd_iommu.c: In function
> > > ‘amdvi_init’:
> > > /home/petmay01/linaro/qemu-for-merges/hw/i386/amd_iommu.c:1083:17:
> > > error: ‘MemoryRegionIOMMUOps {aka struct MemoryRegionIOMMUOps}’ has no
> > > member named ‘notify_started’
> > >  s->iommu_ops.notify_started = amdvi_iommu_notify_started;
> > >  ^
> > > /home/petmay01/linaro/qemu-for-merges/rules.mak:60: recipe for target
> > > 'hw/i386/amd_iommu.o' failed
> > 
> > Paolo,
> > 
> > Would you please help squash this into 02/28 of your PULL request to
> > solve above error?
> 
> Shall I also redo patch 3/3 for AMD IOMMU, like this:
> 
> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
> index a868539..6365682 100644
> --- a/hw/i386/amd_iommu.c
> +++ b/hw/i386/amd_iommu.c
> @@ -1072,9 +1072,12 @@ static void 
> amdvi_iommu_notify_flag_changed(MemoryRegion *iommu,
>  {
>  AMDVIAddressSpace *as = container_of(iommu, AMDVIAddressSpace, iommu);
>  
> -hw_error("device %02x.%02x.%x requires iommu notifier which is not "
> - "currently supported", as->bus_num, PCI_SLOT(as->devfn),
> - PCI_FUNC(as->devfn));
> +if (new & IOMMU_NOTIFIER_MAP) {
> +error_report("device %02x.%02x.%x requires iommu notifier which is 
> not "
> + "currently supported", as->bus_num, PCI_SLOT(as->devfn),
> + PCI_FUNC(as->devfn));
> +exit(1);
> +}
>  }
>  
>  static void amdvi_init(AMDVIState *s)
> 
> ?

I think we should keep it as it is, because Jason's patchset will only
support intel-iommu, not amd-iommu. For now, it won't have problem
(just like Intel IOMMU one). But after Jason's patch is merged, people
will be able to boot a guest with vhost and amd-iommu (which we
actually do not support yet), and that might be problematic.

Thanks,

-- peterx



Re: [Qemu-devel] Live migration without bdrv_drain_all()

2016-09-27 Thread Stefan Hajnoczi
On Mon, Aug 29, 2016 at 06:56:42PM +, Felipe Franciosi wrote:
> Heya!
> 
> > On 29 Aug 2016, at 08:06, Stefan Hajnoczi  wrote:
> > 
> > At KVM Forum an interesting idea was proposed to avoid
> > bdrv_drain_all() during live migration.  Mike Cui and Felipe Franciosi
> > mentioned running at queue depth 1.  It needs more thought to make it
> > workable but I want to capture it here for discussion and to archive
> > it.
> > 
> > bdrv_drain_all() is synchronous and can cause VM downtime if I/O
> > requests hang.  We should find a better way of quiescing I/O that is
> > not synchronous.  Up until now I thought we should simply add a
> > timeout to bdrv_drain_all() so it can at least fail (and live
> > migration would fail) if I/O is stuck instead of hanging the VM.  But
> > the following approach is also interesting...
> > 
> > During the iteration phase of live migration we could limit the queue
> > depth so points with no I/O requests in-flight are identified.  At
> > these points the migration algorithm has the opportunity to move to
> > the next phase without requiring bdrv_drain_all() since no requests
> > are pending.
> 
> I actually think that this "io quiesced state" is highly unlikely to _just_ 
> happen on a busy guest. The main idea behind running at QD1 is to naturally 
> throttle the guest and make it easier to "force quiesce" the VQs.
> 
> In other words, if the guest is busy and we run at QD1, I would expect the 
> rings to be quite full of pending (ie. unprocessed) requests. At the same 
> time, I would expect that a call to bdrv_drain_all() (as part of 
> do_vm_stop()) should complete much quicker.
> 
> Nevertheless, you mentioned that this is still problematic as that single 
> outstanding IO could block, leaving the VM paused for longer.
> 
> My suggestion is therefore that we leave the vCPUs running, but stop picking 
> up requests from the VQs. Provided nothing blocks, you should reach the "io 
> quiesced state" fairly quickly. If you don't, then the VM is at least still 
> running (despite seeing no progress on its VQs).
> 
> Thoughts on that?

If the guest experiences a hung disk it may enter error recovery.  QEMU
should avoid this so the guest doesn't remount file systems read-only.

This can be solved by only quiescing the disk for, say, 30 seconds at a
time.  If we don't reach a point where live migration can proceed during
those 30 seconds then the disk will service requests again temporarily
to avoid upsetting the guest.

I wonder if Juan or David have any thoughts from the live migration
perspective?

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH v3 07/10] ppc/pnv: add XSCOM infrastructure

2016-09-27 Thread Cédric Le Goater
On 09/27/2016 11:10 AM, Cédric Le Goater wrote:
>>> +#include 
>>> +
>>> +static void xscom_complete(uint64_t hmer_bits)
>>> +{
>>> +CPUState *cs = current_cpu;
>>
>> Hmm.. is current_cpu a safe thing to use in the case of KVM or MTTCG?
> 
> yes, as we are running under cpu_exec when doing this call.

well, this is not true under the monitor. 

So we will have to come up with something to handle xscom read/writes 
from the monitor. Could we use first_cpu in that case ? 

Thanks,

C. 

>>> +PowerPCCPU *cpu = POWERPC_CPU(cs);
>>> +CPUPPCState *env = &cpu->env;
>>> +
>>> +cpu_synchronize_state(cs);
>>> +env->spr[SPR_HMER] |= hmer_bits;
>>> +
>>> +/* XXX Need a CPU helper to set HMER, also handle gneeration
>>> + * of HMIs
>>> + */
> 
> Ben, 
> 
> The CPU helper would be to replicate the value of the SPR_HMER in all
> the threads of the core I guess ? 
> 
> Thanks,
> 
> C.
> 




Re: [Qemu-devel] [PATCH v2 0/3] pc: compat macroses fixes/cleanups

2016-09-27 Thread Igor Mammedov
On Mon, 19 Sep 2016 10:32:32 +0200
Igor Mammedov  wrote:

> Changes since v1:
>  - drop 'pc: fix regression introduced by adding 2.8 machine'
>it's not needed
>  - cleanup old style compat chaining
>  - add missing HW_COMPAT_2_8 to PC_COMPAT_2_8 machine type

Eduardo could you take it through your tree?

> 
> ref to v1:
>   https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg03140.html
> 
> 
> Igor Mammedov (3):
>   pc: clean up COMPAT macroses chaining
>   target-i386: turn off CPU.l3-cache only for 2.7 and older machine
> types
>   add stub HW_COMPAT_2_8 to PC_COMPAT_2_8
> 
>  include/hw/compat.h  |  2 ++
>  include/hw/i386/pc.h | 10 --
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 



[Qemu-devel] [PATCH v2] block: Turn on "unmap" in active commit

2016-09-27 Thread Fam Zheng
We already specified BDRV_O_UNMAP when opening images in 'qemu-img
commit', but didn't turn on the "unmap" in the active commit job. This
patch fixes that so that zeroed clusters in top image can be discarded
which is desired in the virt-sparsify use case, where a temporary
overlay is created and fstrim'ed before commiting back, to free space in
the original image.

This also enables it for block-commit.

Signed-off-by: Fam Zheng 
---
v2: Add "unmap" to block-commit as well. [Kevin]
---
 block/mirror.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/mirror.c b/block/mirror.c
index f9d1fec..8f6f506 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1043,7 +1043,7 @@ void commit_active_start(const char *job_id, 
BlockDriverState *bs,
 mirror_start_job(job_id, bs, base, NULL, speed, 0, 0,
  MIRROR_LEAVE_BACKING_CHAIN,
  on_error, on_error, false, cb, opaque, &local_err,
- &commit_active_job_driver, false, base, auto_complete);
+ &commit_active_job_driver, true, base, auto_complete);
 if (local_err) {
 error_propagate(errp, local_err);
 goto error_restore_flags;
-- 
2.7.4




Re: [Qemu-devel] [PATCH v3 0/3] crypto: add ctr mode support and little inprovement

2016-09-27 Thread Gonglei (Arei)
Hi Daniel,

I'll post virtio-crypto v4 based on this patch set. 
Would you please merge it if it's ok? Thanks.

Regards,
-Gonglei


> -Original Message-
> From: Gonglei (Arei)
> Sent: Monday, September 26, 2016 5:23 PM
> To: qemu-devel@nongnu.org
> Cc: berra...@redhat.com; Wubin (H); Gonglei (Arei)
> Subject: [PATCH v3 0/3] crypto: add ctr mode support and little inprovement
> 
> Please see the detailed description in each patch.
> 
> v3:
>  - adjust the sequence of patch 1 and 2. (Daniel)
>  - fix a mising 'break' in code logic. (Daniel)
> v2:
>  - fix qtest complaint in cipher-builtin backend.
>  - introduce patch 2 and patch 3.
> 
> Gonglei (3):
>   crypto: extend mode as a parameter in qcrypto_cipher_supports()
>   crypto: add CTR mode support
>   crypto: add mode check in qcrypto_cipher_new() for cipher-builtin
> 
>  block/qcow.c   |  3 ++-
>  block/qcow2.c  |  3 ++-
>  crypto/cipher-builtin.c| 25 +++-
>  crypto/cipher-gcrypt.c | 38 +--
>  crypto/cipher-nettle.c | 28 +--
>  crypto/cipher.c|  1 +
>  include/crypto/cipher.h| 12 ++
>  qapi/crypto.json   |  3 ++-
>  tests/test-crypto-cipher.c | 57
> +-
>  ui/vnc.c   |  2 +-
>  10 files changed, 152 insertions(+), 20 deletions(-)
> 
> --
> 1.7.12.4
> 




Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/4] libqos: add PCI management in qtest_vboot()/qtest_shutdown()

2016-09-27 Thread David Gibson
On Tue, Sep 27, 2016 at 09:33:58AM +0200, Laurent Vivier wrote:
> 
> 
> On 27/09/2016 05:48, David Gibson wrote:
> > On Mon, Sep 26, 2016 at 04:10:47PM +0200, Laurent Vivier wrote:
> >> Signed-off-by: Laurent Vivier 
> >> ---
> >>  tests/e1000e-test.c |  2 +-
> >>  tests/i440fx-test.c |  2 +-
> >>  tests/ide-test.c|  2 +-
> >>  tests/ivshmem-test.c|  2 +-
> >>  tests/libqos/ahci.c |  2 +-
> >>  tests/libqos/libqos-pc.c|  5 -
> >>  tests/libqos/libqos-spapr.c |  5 -
> >>  tests/libqos/libqos.c   | 21 -
> >>  tests/libqos/libqos.h   |  3 +++
> >>  tests/libqos/pci-pc.c   |  2 +-
> >>  tests/libqos/pci-pc.h   |  3 ++-
> >>  tests/q35-test.c|  2 +-
> >>  tests/rtl8139-test.c|  2 +-
> >>  tests/tco-test.c|  2 +-
> >>  tests/usb-hcd-ehci-test.c   |  2 +-
> >>  tests/usb-hcd-uhci-test.c   |  2 +-
> >>  tests/vhost-user-test.c |  2 +-
> >>  tests/virtio-9p-test.c  |  2 +-
> >>  tests/virtio-blk-test.c |  2 +-
> >>  tests/virtio-net-test.c |  2 +-
> >>  tests/virtio-scsi-test.c|  2 +-
> >>  21 files changed, 45 insertions(+), 24 deletions(-)
> > 
> > Couple of queries below.
> > 
> 
> ...
> 
> >> @@ -49,9 +54,15 @@ QOSState *qtest_boot(QOSOps *ops, const char 
> >> *cmdline_fmt, ...)
> >>   */
> >>  void qtest_shutdown(QOSState *qs)
> >>  {
> >> -if (qs->alloc && qs->ops && qs->ops->uninit_allocator) {
> >> -qs->ops->uninit_allocator(qs->alloc);
> >> -qs->alloc = NULL;
> >> +if (qs->ops) {
> >> +if (qs->alloc && qs->ops->uninit_allocator) {
> >> +qs->ops->uninit_allocator(qs->alloc);
> >> +qs->alloc = NULL;
> >> +}
> >> +if (qs->pcibus && qs->ops->qpci_free) {
> >> +qs->ops->qpci_free(qs->pcibus);
> >> +qs->pcibus = NULL;
> >> +}
> > 
> > Is it safe to cleanup the allocator before the PCI stuff?  Usually
> > cleanups want to go in the opposite order to initialization.
> 
> Yes, you're right. Im' going to fix that.

Ok.

> >>  }
> >>  qtest_quit(qs->qts);
> >>  g_free(qs);
> >> diff --git a/tests/libqos/libqos.h b/tests/libqos/libqos.h
> >> index 604980d..a9f6990 100644
> >> --- a/tests/libqos/libqos.h
> >> +++ b/tests/libqos/libqos.h
> >> @@ -8,11 +8,14 @@
> >>  typedef struct QOSOps {
> >>  QGuestAllocator *(*init_allocator)(QAllocOpts);
> >>  void (*uninit_allocator)(QGuestAllocator *);
> >> +QPCIBus *(*qpci_init)(QGuestAllocator *alloc);
> >> +void (*qpci_free)(QPCIBus *bus);
> >>  } QOSOps;
> >>  
> >>  typedef struct QOSState {
> >>  QTestState *qts;
> >>  QGuestAllocator *alloc;
> >> +QPCIBus *pcibus;
> >>  QOSOps *ops;
> >>  } QOSState;
> >>  
> >> diff --git a/tests/libqos/pci-pc.c b/tests/libqos/pci-pc.c
> >> index 82066b8..9600ed6 100644
> >> --- a/tests/libqos/pci-pc.c
> >> +++ b/tests/libqos/pci-pc.c
> >> @@ -212,7 +212,7 @@ static void qpci_pc_iounmap(QPCIBus *bus, void *data)
> >>  /* FIXME */
> >>  }
> >>  
> >> -QPCIBus *qpci_init_pc(void)
> >> +QPCIBus *qpci_init_pc(QGuestAllocator *alloc)
> >>  {
> >>  QPCIBusPC *ret;
> >>
> > 
> > You've added the alloc parameter, but you don't actually appear to use it..
> 
> It's normal: qpci_init_spapr() needs it and to have the same function
> signature we have to add it to qpci_init_pc() even if it is not used.
> (it's why I have added a lot of of qpci_init_pc(NULL)), so we can add
> init in a generic way in "struct QOSOps". Perhaps we can use "void
> *opaque" instead of "QGuestAllocator *alloc"?

Oh, of course.  Sorry I missed that.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] Live migration without bdrv_drain_all()

2016-09-27 Thread Daniel P. Berrange
On Mon, Aug 29, 2016 at 11:06:48AM -0400, Stefan Hajnoczi wrote:
> At KVM Forum an interesting idea was proposed to avoid
> bdrv_drain_all() during live migration.  Mike Cui and Felipe Franciosi
> mentioned running at queue depth 1.  It needs more thought to make it
> workable but I want to capture it here for discussion and to archive
> it.
> 
> bdrv_drain_all() is synchronous and can cause VM downtime if I/O
> requests hang.  We should find a better way of quiescing I/O that is
> not synchronous.  Up until now I thought we should simply add a
> timeout to bdrv_drain_all() so it can at least fail (and live
> migration would fail) if I/O is stuck instead of hanging the VM.  But
> the following approach is also interesting...

How would you decide what an acceptable timeout is for the drain
operation ? At what point does a stuck drain op cause the VM
to stall ?  The drain call happens from the migration thread, so
it shouldn't impact vcpu threads or the main event loop thread
if it takes too long.

> 
> During the iteration phase of live migration we could limit the queue
> depth so points with no I/O requests in-flight are identified.  At
> these points the migration algorithm has the opportunity to move to
> the next phase without requiring bdrv_drain_all() since no requests
> are pending.
> 
> Unprocessed requests are left in the virtio-blk/virtio-scsi virtqueues
> so that the destination QEMU can process them after migration
> completes.
> 
> Unfortunately this approach makes convergence harder because the VM
> might also be dirtying memory pages during the iteration phase.  Now
> we need to reach a spot where no I/O is in-flight *and* dirty memory
> is under the threshold.

It doesn't seem like this could easily fit in with post-copy. During
the switchover from pre-copy to post-copy migration calls vm_stop_force_state
which will trigger bdrv_drain_all().

The point at which you switch from pre to post copy mode is not controlled
by QEMU, instead it is an explicit admin action triggered via a QMP command.
Now the actual switch over is not synchronous with completion of the QMP
command, so there is small scope for delaying it to a convenient time, but
not by a very significant amount & certainly not anywhere near 30 seconds.
Perhaps 1 second at the most.


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



[Qemu-devel] [PATCH V9 3/7] coroutine: add a macro for the coroutine stack size

2016-09-27 Thread Peter Lieven
Signed-off-by: Peter Lieven 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Richard Henderson 
---
 include/qemu/coroutine_int.h | 2 ++
 util/coroutine-sigaltstack.c | 2 +-
 util/coroutine-ucontext.c| 2 +-
 util/coroutine-win32.c   | 2 +-
 4 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
index 6df9d33..14d4f1d 100644
--- a/include/qemu/coroutine_int.h
+++ b/include/qemu/coroutine_int.h
@@ -28,6 +28,8 @@
 #include "qemu/queue.h"
 #include "qemu/coroutine.h"
 
+#define COROUTINE_STACK_SIZE (1 << 20)
+
 typedef enum {
 COROUTINE_YIELD = 1,
 COROUTINE_TERMINATE = 2,
diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c
index 171cd44..a5bcb7e 100644
--- a/util/coroutine-sigaltstack.c
+++ b/util/coroutine-sigaltstack.c
@@ -143,7 +143,7 @@ static void coroutine_trampoline(int signal)
 
 Coroutine *qemu_coroutine_new(void)
 {
-const size_t stack_size = 1 << 20;
+const size_t stack_size = COROUTINE_STACK_SIZE;
 CoroutineSigAltStack *co;
 CoroutineThreadState *coTS;
 struct sigaction sa;
diff --git a/util/coroutine-ucontext.c b/util/coroutine-ucontext.c
index 2bb7e10..31254ab 100644
--- a/util/coroutine-ucontext.c
+++ b/util/coroutine-ucontext.c
@@ -82,7 +82,7 @@ static void coroutine_trampoline(int i0, int i1)
 
 Coroutine *qemu_coroutine_new(void)
 {
-const size_t stack_size = 1 << 20;
+const size_t stack_size = COROUTINE_STACK_SIZE;
 CoroutineUContext *co;
 ucontext_t old_uc, uc;
 sigjmp_buf old_env;
diff --git a/util/coroutine-win32.c b/util/coroutine-win32.c
index 02e28e8..de6bd4f 100644
--- a/util/coroutine-win32.c
+++ b/util/coroutine-win32.c
@@ -71,7 +71,7 @@ static void CALLBACK coroutine_trampoline(void *co_)
 
 Coroutine *qemu_coroutine_new(void)
 {
-const size_t stack_size = 1 << 20;
+const size_t stack_size = COROUTINE_STACK_SIZE;
 CoroutineWin32 *co;
 
 co = g_malloc0(sizeof(*co));
-- 
1.9.1




Re: [Qemu-devel] Live migration without bdrv_drain_all()

2016-09-27 Thread Dr. David Alan Gilbert
* Stefan Hajnoczi (stefa...@gmail.com) wrote:
> On Mon, Aug 29, 2016 at 06:56:42PM +, Felipe Franciosi wrote:
> > Heya!
> > 
> > > On 29 Aug 2016, at 08:06, Stefan Hajnoczi  wrote:
> > > 
> > > At KVM Forum an interesting idea was proposed to avoid
> > > bdrv_drain_all() during live migration.  Mike Cui and Felipe Franciosi
> > > mentioned running at queue depth 1.  It needs more thought to make it
> > > workable but I want to capture it here for discussion and to archive
> > > it.
> > > 
> > > bdrv_drain_all() is synchronous and can cause VM downtime if I/O
> > > requests hang.  We should find a better way of quiescing I/O that is
> > > not synchronous.  Up until now I thought we should simply add a
> > > timeout to bdrv_drain_all() so it can at least fail (and live
> > > migration would fail) if I/O is stuck instead of hanging the VM.  But
> > > the following approach is also interesting...
> > > 
> > > During the iteration phase of live migration we could limit the queue
> > > depth so points with no I/O requests in-flight are identified.  At
> > > these points the migration algorithm has the opportunity to move to
> > > the next phase without requiring bdrv_drain_all() since no requests
> > > are pending.
> > 
> > I actually think that this "io quiesced state" is highly unlikely to _just_ 
> > happen on a busy guest. The main idea behind running at QD1 is to naturally 
> > throttle the guest and make it easier to "force quiesce" the VQs.
> > 
> > In other words, if the guest is busy and we run at QD1, I would expect the 
> > rings to be quite full of pending (ie. unprocessed) requests. At the same 
> > time, I would expect that a call to bdrv_drain_all() (as part of 
> > do_vm_stop()) should complete much quicker.
> > 
> > Nevertheless, you mentioned that this is still problematic as that single 
> > outstanding IO could block, leaving the VM paused for longer.
> > 
> > My suggestion is therefore that we leave the vCPUs running, but stop 
> > picking up requests from the VQs. Provided nothing blocks, you should reach 
> > the "io quiesced state" fairly quickly. If you don't, then the VM is at 
> > least still running (despite seeing no progress on its VQs).
> > 
> > Thoughts on that?
> 
> If the guest experiences a hung disk it may enter error recovery.  QEMU
> should avoid this so the guest doesn't remount file systems read-only.
> 
> This can be solved by only quiescing the disk for, say, 30 seconds at a
> time.  If we don't reach a point where live migration can proceed during
> those 30 seconds then the disk will service requests again temporarily
> to avoid upsetting the guest.
> 
> I wonder if Juan or David have any thoughts from the live migration
> perspective?

Throttling IO to reduce the time in the final drain makes sense
to me, however:
   a) It doesn't solve the problem if the IO device dies at just the wrong time,
  so you can still get that hang in bdrv_drain_all

   b) Completely stopping guest IO sounds too drastic to me unless you can
  time it to be just at the point before the end of migration; that feels
  tricky to get right unless you can somehow tie it to an estimate of
  remaining dirty RAM (that never works that well).

   c) Something like a 30 second pause still feels too long; if that was
  a big hairy database workload it would effectively be 30 seconds
  of downtime.

Dave

> 
> Stefan


--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



[Qemu-devel] [PATCH V9 0/7] coroutine: mmap stack memory and stack size

2016-09-27 Thread Peter Lieven
I decided to split this from the rest of the Qemu RSS usage series as
it contains the more or less non contentious patches.

I omitted the MAP_GROWSDOWN flag in mmap as we are not 100% sure which
side effects it has.

I kept the guard page which is now nicely makes the stacks visible in
smaps. The old version of the relevent patch lacked the MAP_FIXED flag
in the second call to mmap.

v8->v9:
 - Patch 1: return size of stack including guard page [Kevin]
 - Patch 2: added to rename coroutine sigaltstack strucht [Kevin]

v7->v8:
 The series failed on platforms with 64kB page size. Thus the following changes
 where made:
 - Patch 1: add the guard page to the stack memory and do not deduct it [Kevin, 
Stephan]
 - Patch 1: Submit the requested page size as a pointer so that 
qemu_alloc_stack can
adjust the size according to system requirements and that the full 
size is usable
to the caller.
 - Patch 6: reduced stack size to 60kB so that on systems with 4kB page size we 
still get
64kB allocations.

v6->v7:
 - Patch 1: avoid multiple calls to sysconf and getpagesize [Richard]

v5->v6:
 - Patch 1: added info that the guard page is deducted from stack memory to
commit msg and headers [Stefan]
 - rebased to master

v4->v5:
 - Patch 1: check if _SC_THREAD_STACK_MIN is defined
 - Patch 1: guard against sysconf(_SC_THREAD_STACK_MIN) returning -1 [Eric]

v3->v4:
 - Patch 1: add a static function to adjust the stack size [Richard]
 - Patch 1: round up the stack size to multiple of the pagesize.

v2->v3:
 - Patch 1,6: adjusted commit message to mention the guard page [Markus]

v1->v2:
 - Patch 1: added an architecture dependend guard page [Richard]
 - Patch 1: avoid stacks smaller than _SC_THREAD_STACK_MIN [Richard]
 - Patch 1: use mmap+mprotect instead of mmap+mmap [Richard]
 - Patch 5: u_int32_t -> uint32_t [Richard]
 - Patch 5: only available if stack grows down

Peter Lieven (7):
  oslib-posix: add helpers for stack alloc and free
  coroutine-sigaltstack: rename coroutine struct appropriately
  coroutine: add a macro for the coroutine stack size
  coroutine-ucontext: use helper for allocating stack memory
  coroutine-sigaltstack: use helper for allocating stack memory
  oslib-posix: add a configure switch to debug stack usage
  coroutine: reduce stack size to 60kB

 configure| 19 +++
 include/qemu/coroutine_int.h |  2 ++
 include/sysemu/os-posix.h| 27 
 util/coroutine-sigaltstack.c | 25 +++---
 util/coroutine-ucontext.c| 11 ---
 util/coroutine-win32.c   |  2 +-
 util/oslib-posix.c   | 77 
 7 files changed, 145 insertions(+), 18 deletions(-)

-- 
1.9.1




[Qemu-devel] [PATCH V9 7/7] coroutine: reduce stack size to 60kB

2016-09-27 Thread Peter Lieven
evaluation with the recently introduced maximum stack usage monitoring revealed
that the actual used stack size was never above 4kB so allocating 1MB stack
for each coroutine is a lot of wasted memory. So reduce the stack size to
60kB which should still give enough head room. The guard page added
in qemu_alloc_stack will catch a potential stack overflow introduced
by this commit. The 60kB + guard page will result in an allocation of
64kB per coroutine on systems where a page is 4kB.

Signed-off-by: Peter Lieven 
---
 include/qemu/coroutine_int.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
index 14d4f1d..be14260 100644
--- a/include/qemu/coroutine_int.h
+++ b/include/qemu/coroutine_int.h
@@ -28,7 +28,7 @@
 #include "qemu/queue.h"
 #include "qemu/coroutine.h"
 
-#define COROUTINE_STACK_SIZE (1 << 20)
+#define COROUTINE_STACK_SIZE 61440
 
 typedef enum {
 COROUTINE_YIELD = 1,
-- 
1.9.1




[Qemu-devel] [PATCH V9 4/7] coroutine-ucontext: use helper for allocating stack memory

2016-09-27 Thread Peter Lieven
Signed-off-by: Peter Lieven 
---
 util/coroutine-ucontext.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/util/coroutine-ucontext.c b/util/coroutine-ucontext.c
index 31254ab..6621f3f 100644
--- a/util/coroutine-ucontext.c
+++ b/util/coroutine-ucontext.c
@@ -34,6 +34,7 @@
 typedef struct {
 Coroutine base;
 void *stack;
+size_t stack_size;
 sigjmp_buf env;
 
 #ifdef CONFIG_VALGRIND_H
@@ -82,7 +83,6 @@ static void coroutine_trampoline(int i0, int i1)
 
 Coroutine *qemu_coroutine_new(void)
 {
-const size_t stack_size = COROUTINE_STACK_SIZE;
 CoroutineUContext *co;
 ucontext_t old_uc, uc;
 sigjmp_buf old_env;
@@ -101,17 +101,18 @@ Coroutine *qemu_coroutine_new(void)
 }
 
 co = g_malloc0(sizeof(*co));
-co->stack = g_malloc(stack_size);
+co->stack_size = COROUTINE_STACK_SIZE;
+co->stack = qemu_alloc_stack(&co->stack_size);
 co->base.entry_arg = &old_env; /* stash away our jmp_buf */
 
 uc.uc_link = &old_uc;
 uc.uc_stack.ss_sp = co->stack;
-uc.uc_stack.ss_size = stack_size;
+uc.uc_stack.ss_size = co->stack_size;
 uc.uc_stack.ss_flags = 0;
 
 #ifdef CONFIG_VALGRIND_H
 co->valgrind_stack_id =
-VALGRIND_STACK_REGISTER(co->stack, co->stack + stack_size);
+VALGRIND_STACK_REGISTER(co->stack, co->stack + co->stack_size);
 #endif
 
 arg.p = co;
@@ -149,7 +150,7 @@ void qemu_coroutine_delete(Coroutine *co_)
 valgrind_stack_deregister(co);
 #endif
 
-g_free(co->stack);
+qemu_free_stack(co->stack, co->stack_size);
 g_free(co);
 }
 
-- 
1.9.1




[Qemu-devel] [PATCH V9 5/7] coroutine-sigaltstack: use helper for allocating stack memory

2016-09-27 Thread Peter Lieven
Signed-off-by: Peter Lieven 
---
 util/coroutine-sigaltstack.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c
index a5bcb7e..f6fc49a 100644
--- a/util/coroutine-sigaltstack.c
+++ b/util/coroutine-sigaltstack.c
@@ -33,6 +33,7 @@
 typedef struct {
 Coroutine base;
 void *stack;
+size_t stack_size;
 sigjmp_buf env;
 } CoroutineSigAltStack;
 
@@ -143,7 +144,6 @@ static void coroutine_trampoline(int signal)
 
 Coroutine *qemu_coroutine_new(void)
 {
-const size_t stack_size = COROUTINE_STACK_SIZE;
 CoroutineSigAltStack *co;
 CoroutineThreadState *coTS;
 struct sigaction sa;
@@ -164,7 +164,8 @@ Coroutine *qemu_coroutine_new(void)
  */
 
 co = g_malloc0(sizeof(*co));
-co->stack = g_malloc(stack_size);
+co->stack_size = COROUTINE_STACK_SIZE;
+co->stack = qemu_alloc_stack(&co->stack_size);
 co->base.entry_arg = &old_env; /* stash away our jmp_buf */
 
 coTS = coroutine_get_thread_state();
@@ -189,7 +190,7 @@ Coroutine *qemu_coroutine_new(void)
  * Set the new stack.
  */
 ss.ss_sp = co->stack;
-ss.ss_size = stack_size;
+ss.ss_size = co->stack_size;
 ss.ss_flags = 0;
 if (sigaltstack(&ss, &oss) < 0) {
 abort();
@@ -253,7 +254,7 @@ void qemu_coroutine_delete(Coroutine *co_)
 {
 CoroutineSigAltStack *co = DO_UPCAST(CoroutineSigAltStack, base, co_);
 
-g_free(co->stack);
+qemu_free_stack(co->stack, co->stack_size);
 g_free(co);
 }
 
-- 
1.9.1




[Qemu-devel] [PATCH V9 6/7] oslib-posix: add a configure switch to debug stack usage

2016-09-27 Thread Peter Lieven
this adds a knob to track the maximum stack usage of stacks
created by qemu_alloc_stack.

Signed-off-by: Peter Lieven 
---
 configure  | 19 +++
 util/oslib-posix.c | 35 +++
 2 files changed, 54 insertions(+)

diff --git a/configure b/configure
index 8fa62ad..93ef00a 100755
--- a/configure
+++ b/configure
@@ -296,6 +296,7 @@ libiscsi=""
 libnfs=""
 coroutine=""
 coroutine_pool=""
+debug_stack_usage="no"
 seccomp=""
 glusterfs=""
 glusterfs_xlator_opt="no"
@@ -1004,6 +1005,8 @@ for opt do
   ;;
   --enable-coroutine-pool) coroutine_pool="yes"
   ;;
+  --enable-debug-stack-usage) debug_stack_usage="yes"
+  ;;
   --disable-docs) docs="no"
   ;;
   --enable-docs) docs="yes"
@@ -4276,6 +4279,17 @@ if test "$coroutine" = "gthread" -a "$coroutine_pool" = 
"yes"; then
   error_exit "'gthread' coroutine backend does not support pool (use 
--disable-coroutine-pool)"
 fi
 
+if test "$debug_stack_usage" = "yes"; then
+  if test "$cpu" = "ia64" -o "$cpu" = "hppa"; then
+error_exit "stack usage debugging is not supported for $cpu"
+  fi
+  if test "$coroutine_pool" = "yes"; then
+echo "WARN: disabling coroutine pool for stack usage debugging"
+coroutine_pool=no
+  fi
+fi
+
+
 ##
 # check if we have open_by_handle_at
 
@@ -4861,6 +4875,7 @@ echo "QGA MSI support   $guest_agent_msi"
 echo "seccomp support   $seccomp"
 echo "coroutine backend $coroutine"
 echo "coroutine pool$coroutine_pool"
+echo "debug stack usage $debug_stack_usage"
 echo "GlusterFS support $glusterfs"
 echo "Archipelago support $archipelago"
 echo "gcov  $gcov_tool"
@@ -5330,6 +5345,10 @@ else
   echo "CONFIG_COROUTINE_POOL=0" >> $config_host_mak
 fi
 
+if test "$debug_stack_usage" = "yes" ; then
+  echo "CONFIG_DEBUG_STACK_USAGE=y" >> $config_host_mak
+fi
+
 if test "$open_by_handle_at" = "yes" ; then
   echo "CONFIG_OPEN_BY_HANDLE=y" >> $config_host_mak
 fi
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 5745229..21a5637 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -50,6 +50,10 @@
 
 #include "qemu/mmap-alloc.h"
 
+#ifdef CONFIG_DEBUG_STACK_USAGE
+#include "qemu/error-report.h"
+#endif
+
 int qemu_get_thread_id(void)
 {
 #if defined(__linux__)
@@ -503,6 +507,9 @@ pid_t qemu_fork(Error **errp)
 void *qemu_alloc_stack(size_t *sz)
 {
 void *ptr, *guardpage;
+#ifdef CONFIG_DEBUG_STACK_USAGE
+void *ptr2;
+#endif
 size_t pagesz = getpagesize();
 #ifdef _SC_THREAD_STACK_MIN
 /* avoid stacks smaller than _SC_THREAD_STACK_MIN */
@@ -534,10 +541,38 @@ void *qemu_alloc_stack(size_t *sz)
 abort();
 }
 
+#ifdef CONFIG_DEBUG_STACK_USAGE
+for (ptr2 = ptr + pagesz; ptr2 < ptr + *sz; ptr2 += sizeof(uint32_t)) {
+*(uint32_t *)ptr2 = 0xdeadbeaf;
+}
+#endif
+
 return ptr;
 }
 
+#ifdef CONFIG_DEBUG_STACK_USAGE
+static __thread unsigned int max_stack_usage;
+#endif
+
 void qemu_free_stack(void *stack, size_t sz)
 {
+#ifdef CONFIG_DEBUG_STACK_USAGE
+unsigned int usage;
+void *ptr;
+
+for (ptr = stack + getpagesize(); ptr < stack + sz;
+ ptr += sizeof(uint32_t)) {
+if (*(uint32_t *)ptr != 0xdeadbeaf) {
+break;
+}
+}
+usage = sz - (uintptr_t) (ptr - stack);
+if (usage > max_stack_usage) {
+error_report("thread %d max stack usage increased from %u to %u",
+ qemu_get_thread_id(), max_stack_usage, usage);
+max_stack_usage = usage;
+}
+#endif
+
 munmap(stack, sz);
 }
-- 
1.9.1




Re: [Qemu-devel] [PATCH 0/3] virtio: detach VirtQueueElements freed by reset

2016-09-27 Thread Stefan Hajnoczi
On Mon, Sep 19, 2016 at 02:28:02PM +0100, Stefan Hajnoczi wrote:
> virtio-blk and virtio-serial need to free VirtQueueElements during device
> reset.  Simply calling g_free(elem) is not enough because the scatter-gather
> list should be unmapped and vq->inuse must be decremented.
> 
> These patches address the issue.  I am not including a patch that changes
> vq->inuse = 0 to assert(!vq->inuse) in virtio_reset() yet because virtio-9p,
> virtio-gpu, and virtio-net have code paths that do not decrement vq->inuse.
> 
> Stefan Hajnoczi (3):
>   virtio: add virtio_detach_element()
>   virtio-blk: add missing virtio_detach_element() call
>   virtio-serial: add missing virtio_detach_element() call
> 
>  hw/block/virtio-blk.c   |  1 +
>  hw/char/virtio-serial-bus.c | 14 ++
>  hw/virtio/virtio.c  | 27 +--
>  include/hw/virtio/virtio.h  |  2 ++
>  4 files changed, 42 insertions(+), 2 deletions(-)

Ping?


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PULL 00/28] Misc patches for 2016-09-26

2016-09-27 Thread Paolo Bonzini

> I think we should keep it as it is, because Jason's patchset will only
> support intel-iommu, not amd-iommu. For now, it won't have problem
> (just like Intel IOMMU one). But after Jason's patch is merged, people
> will be able to boot a guest with vhost and amd-iommu (which we
> actually do not support yet), and that might be problematic.

Let's fix Jason's patch instead. :)

Paolo



Re: [Qemu-devel] [PATCH 1/3] virtio: add virtio_detach_element()

2016-09-27 Thread Stefan Hajnoczi
On Tue, Sep 27, 2016 at 09:32:40AM +0200, Ladi Prosek wrote:
> On Mon, Sep 19, 2016 at 3:28 PM, Stefan Hajnoczi  wrote:
> > During device res> > +/* virtqueue_discard:
> > + * @vq: The #VirtQueue
> > + * @elem: The #VirtQueueElement
> > + * @len: number of bytes written
> > + *
> > + * Pretend the most recent element wasn't popped from the virtqueue.  The 
> > next
> > + * call to virtqueue_pop() will refetch the element.
> > + */
> >  void virtqueue_discard(VirtQueue *vq, const VirtQueueElement *elem,
> > unsigned int len)
> >  {
> >  vq->last_avail_idx--;
> > -vq->inuse--;
> > -virtqueue_unmap_sg(vq, elem, len);
> > +virtqueue_detach_element(vq, elem, len);
> 
> Random comment, not directly related to this change. Would it be worth
> adding an assert to this function that elem->index and
> vq->last_avail_idx match? In other words, enforce the "most recent"
> qualifier mentioned in the comment. As more virtqueue_* functions are
> added and the complexity goes up, it is easy to get confused. Also, I
> think that naming this function virtqueue_unpop instead of
> virtqueue_discard would help.

elem->index is a descriptor ring index.  vq->last_avail_idx is an index
into the available ring.  They are different but your suggestion makes
sense in general.

We shouldn't read from vring memory again for an assertion so
deferencing the available ring isn't possible (because we cannot rely on
vring memory contents after processing the request).  One way to
implement the assertion is to put VirtQueueElements on a linked list
(vq->inuse_elems) but that probably needs live migration support.

I agree that renaming to unpop makes the semantics clearer.

Would you like to submit a patch for either or both?


signature.asc
Description: PGP signature


[Qemu-devel] [PULL V2 02/27] net: hmp_host_net_remove: Del the -net option of the removed host_net

2016-09-27 Thread Jason Wang
From: Shmulik Ladkani 

Upon hmp_host_net_remove(), the appropriate -net client is deleted
(according to the given vlan_id and device id), as well as the
corresponsing hub port.

However, the relevant '-net' option that was added by former
hmp_host_net_add() call is still present in "net" options group.

This makes the following legit HMP sequence erroneous:

(qemu) host_net_add tap id=n1,ifname=tap1,script=no,downscript=no,vlan=1
(qemu) host_net_remove 1 n1
(qemu) host_net_add tap id=n1,ifname=tap1,script=no,downscript=no,vlan=1
Duplicate ID 'n1' for net

Fix, by deleting the stored '-net' option associated with the given
device id.

Signed-off-by: Shmulik Ladkani 
Signed-off-by: Jason Wang 
---
 net/net.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/net.c b/net/net.c
index d51cb29..0bec096 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1179,6 +1179,7 @@ void hmp_host_net_remove(Monitor *mon, const QDict *qdict)
 
 qemu_del_net_client(nc->peer);
 qemu_del_net_client(nc);
+qemu_opts_del(qemu_opts_find(qemu_find_opts("net"), device));
 }
 
 void netdev_add(QemuOpts *opts, Error **errp)
-- 
2.7.4




[Qemu-devel] [PULL V2 00/27] Net patches

2016-09-27 Thread Jason Wang
The following changes since commit 7cfdc02dae0d2ff58c897496cfdbbafc0eda0f3f:

  Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging 
(2016-09-26 19:47:00 +0100)

are available in the git repository at:

  https://github.com/jasowang/qemu.git tags/net-pull-request

for you to fetch changes up to fa26f018393f18f5e91334820546bef07b133b88:

  imx_fec: fix error in qemu_send_packet argument (2016-09-27 17:54:22 +0800)



- allow to specify the rx queue size for virtio-net
- colo packet comparing thread
- filter-writer to rewrite tcp seq for comparing and secondary VM
- align some e1000e behaviour to spec
- allow to specify bridge for a tap ifup script

Changes from V1:
- fix build issues


Alexey Kardashevskiy (1):
  tap: Allow specifying a bridge

Dmitry Fleytman (7):
  e1000e: Flush all receive queues on receive enable
  e1000e: Flush receive queues on link up
  e1000e: Fix CTRL_EXT.EIAME behavior
  e1000e: Fix PBACLR implementation
  e1000e: Fix OTHER interrupts processing for MSI-X
  e1000e: Fix spurious RX TCP ACK interrupts
  e1000e: Fix EIAC register implementation

Gonglei (1):
  e1000: fix buliding complaint

Michael S. Tsirkin (1):
  virtio-net: allow increasing rx queue size

Paolo Bonzini (2):
  mcf_fec: fix error in qemu_send_packet argument
  imx_fec: fix error in qemu_send_packet argument

Peter Lieven (1):
  net: limit allocation in nc_sendv_compat

Prasad J Pandit (1):
  net: mcf: limit buffer descriptor count

Shmulik Ladkani (1):
  net: hmp_host_net_remove: Del the -net option of the removed host_net

Zhang Chen (12):
  qemu-char: Add qemu_chr_add_handlers_full() for GMaincontext
  colo-compare: introduce colo compare initialization
  net/colo.c: add colo.c to define and handle packet
  Jhash: add linux kernel jhashtable in qemu
  colo-compare: track connection and enqueue packet
  colo-compare: introduce packet comparison thread
  colo-compare: add TCP, UDP, ICMP packet comparison
  filter-rewriter: introduce filter-rewriter initialization
  filter-rewriter: track connection and parse packet
  filter-rewriter: rewrite tcp packet to keep secondary connection
  MAINTAINERS: add maintainer for COLO-proxy
  docs: Add documentation for COLO-proxy

 MAINTAINERS|   9 +
 docs/colo-proxy.txt| 188 ++
 hw/net/e1000e.c|   2 +-
 hw/net/e1000e_core.c   |  32 +-
 hw/net/e1000e_core.h   |   3 +
 hw/net/imx_fec.c   |   2 +-
 hw/net/mcf_fec.c   |   7 +-
 hw/net/trace-events|   2 +-
 hw/net/virtio-net.c|  26 +-
 include/hw/virtio/virtio-net.h |   1 +
 include/qemu/jhash.h   |  59 
 include/sysemu/char.h  |  11 +-
 net/Makefile.objs  |   3 +
 net/colo-compare.c | 781 +
 net/colo.c | 211 +++
 net/colo.h |  88 +
 net/filter-rewriter.c  | 263 ++
 net/net.c  |   9 +-
 net/tap.c  |   4 +-
 qapi-schema.json   |   3 +
 qemu-char.c|  77 ++--
 qemu-options.hx|  64 +++-
 trace-events   |  16 +
 vl.c   |   4 +-
 24 files changed, 1812 insertions(+), 53 deletions(-)
 create mode 100644 docs/colo-proxy.txt
 create mode 100644 include/qemu/jhash.h
 create mode 100644 net/colo-compare.c
 create mode 100644 net/colo.c
 create mode 100644 net/colo.h
 create mode 100644 net/filter-rewriter.c




[Qemu-devel] [PATCH V9 1/7] oslib-posix: add helpers for stack alloc and free

2016-09-27 Thread Peter Lieven
the allocated stack will be adjusted to the minimum supported stack size
by the OS and rounded up to be a multiple of the system pagesize.
Additionally an architecture dependent guard page is added to the stack
to catch stack overflows.

Signed-off-by: Peter Lieven 
---
 include/sysemu/os-posix.h | 27 +++
 util/oslib-posix.c| 42 ++
 2 files changed, 69 insertions(+)

diff --git a/include/sysemu/os-posix.h b/include/sysemu/os-posix.h
index 9c7dfdf..3cfedbc 100644
--- a/include/sysemu/os-posix.h
+++ b/include/sysemu/os-posix.h
@@ -60,4 +60,31 @@ int qemu_utimens(const char *path, const qemu_timespec 
*times);
 
 bool is_daemonized(void);
 
+/**
+ * qemu_alloc_stack:
+ * @sz: pointer to a size_t holding the requested usable stack size
+ *
+ * Allocate memory that can be used as a stack, for instance for
+ * coroutines. If the memory cannot be allocated, this function
+ * will abort (like g_malloc()). This function also inserts an
+ * additional guard page to catch a potential stack overflow.
+ * Note that the memory required for the guard page and alignment
+ * and minimal stack size restrictions will increase the value of sz.
+ *
+ * The allocated stack must be freed with qemu_free_stack().
+ *
+ * Returns: pointer to (the lowest address of) the stack memory.
+ */
+void *qemu_alloc_stack(size_t *sz);
+
+/**
+ * qemu_free_stack:
+ * @stack: stack to free
+ * @sz: size of stack in bytes
+ *
+ * Free a stack allocated via qemu_alloc_stack(). Note that sz must
+ * be exactly the adjusted stack size returned by qemu_alloc_stack.
+ */
+void qemu_free_stack(void *stack, size_t sz);
+
 #endif
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index f2d4e9e..5745229 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -499,3 +499,45 @@ pid_t qemu_fork(Error **errp)
 }
 return pid;
 }
+
+void *qemu_alloc_stack(size_t *sz)
+{
+void *ptr, *guardpage;
+size_t pagesz = getpagesize();
+#ifdef _SC_THREAD_STACK_MIN
+/* avoid stacks smaller than _SC_THREAD_STACK_MIN */
+long min_stack_sz = sysconf(_SC_THREAD_STACK_MIN);
+*sz = MAX(MAX(min_stack_sz, 0), *sz);
+#endif
+/* adjust stack size to a multiple of the page size */
+*sz = ROUND_UP(*sz, pagesz);
+/* allocate one extra page for the guard page */
+*sz += pagesz;
+
+ptr = mmap(NULL, *sz, PROT_READ | PROT_WRITE,
+   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+if (ptr == MAP_FAILED) {
+abort();
+}
+
+#if defined(HOST_IA64)
+/* separate register stack */
+guardpage = ptr + (((sz - pagesz) / 2) & ~pagesz);
+#elif defined(HOST_HPPA)
+/* stack grows up */
+guardpage = ptr + sz - pagesz;
+#else
+/* stack grows down */
+guardpage = ptr;
+#endif
+if (mprotect(guardpage, pagesz, PROT_NONE) != 0) {
+abort();
+}
+
+return ptr;
+}
+
+void qemu_free_stack(void *stack, size_t sz)
+{
+munmap(stack, sz);
+}
-- 
1.9.1




[Qemu-devel] [PULL V2 01/27] virtio-net: allow increasing rx queue size

2016-09-27 Thread Jason Wang
From: "Michael S. Tsirkin" 

This allows increasing the rx queue size up to 1024: unlike with tx,
guests don't put in huge S/G lists into RX so the risk of running into
the max 1024 limitation due to some off-by-one seems small.

It's helpful for users like OVS-DPDK which don't do any buffering on the
host - 1K roughly matches 500 entries in tun + 256 in the current rx
queue, which seems to work reasonably well. We could probably make do
with ~750 entries but virtio spec limits us to powers of two.
It might be a good idea to specify an s/g size limit in a future
version.

It also might be possible to make the queue size smaller down the road, 64
seems like the minimal value which will still work (as guests seem to
assume a queue full of 1.5K buffers is enough to process the largest
incoming packet, which is ~64K).  No one actually asked for this, and
with virtio 1 guests can reduce ring size without need for host
configuration, so don't bother with this for now.

Cc: Cornelia Huck 
Cc: Jason Wang 
Suggested-by: Patrik Hermansson 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Cornelia Huck 
Signed-off-by: Jason Wang 
---
 hw/net/virtio-net.c| 26 +-
 include/hw/virtio/virtio-net.h |  1 +
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 01f1351..6b8ae2c 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -31,6 +31,11 @@
 #define MAC_TABLE_ENTRIES64
 #define MAX_VLAN(1 << 12)   /* Per 802.1Q definition */
 
+/* previously fixed value */
+#define VIRTIO_NET_RX_QUEUE_DEFAULT_SIZE 256
+/* for now, only allow larger queues; with virtio-1, guest can downsize */
+#define VIRTIO_NET_RX_QUEUE_MIN_SIZE VIRTIO_NET_RX_QUEUE_DEFAULT_SIZE
+
 /*
  * Calculate the number of bytes up to and including the given 'field' of
  * 'container'.
@@ -1412,7 +1417,8 @@ static void virtio_net_add_queue(VirtIONet *n, int index)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(n);
 
-n->vqs[index].rx_vq = virtio_add_queue(vdev, 256, virtio_net_handle_rx);
+n->vqs[index].rx_vq = virtio_add_queue(vdev, n->net_conf.rx_queue_size,
+   virtio_net_handle_rx);
 if (n->net_conf.tx && !strcmp(n->net_conf.tx, "timer")) {
 n->vqs[index].tx_vq =
 virtio_add_queue(vdev, 256, virtio_net_handle_tx_timer);
@@ -1720,6 +1726,22 @@ static void virtio_net_device_realize(DeviceState *dev, 
Error **errp)
 virtio_net_set_config_size(n, n->host_features);
 virtio_init(vdev, "virtio-net", VIRTIO_ID_NET, n->config_size);
 
+/*
+ * We set a lower limit on RX queue size to what it always was.
+ * Guests that want a smaller ring can always resize it without
+ * help from us (using virtio 1 and up).
+ */
+if (n->net_conf.rx_queue_size < VIRTIO_NET_RX_QUEUE_MIN_SIZE ||
+n->net_conf.rx_queue_size > VIRTQUEUE_MAX_SIZE ||
+(n->net_conf.rx_queue_size & (n->net_conf.rx_queue_size - 1))) {
+error_setg(errp, "Invalid rx_queue_size (= %" PRIu16 "), "
+   "must be a power of 2 between %d and %d.",
+   n->net_conf.rx_queue_size, VIRTIO_NET_RX_QUEUE_MIN_SIZE,
+   VIRTQUEUE_MAX_SIZE);
+virtio_cleanup(vdev);
+return;
+}
+
 n->max_queues = MAX(n->nic_conf.peers.queues, 1);
 if (n->max_queues * 2 + 1 > VIRTIO_QUEUE_MAX) {
 error_setg(errp, "Invalid number of queues (= %" PRIu32 "), "
@@ -1880,6 +1902,8 @@ static Property virtio_net_properties[] = {
TX_TIMER_INTERVAL),
 DEFINE_PROP_INT32("x-txburst", VirtIONet, net_conf.txburst, TX_BURST),
 DEFINE_PROP_STRING("tx", VirtIONet, net_conf.tx),
+DEFINE_PROP_UINT16("rx_queue_size", VirtIONet, net_conf.rx_queue_size,
+   VIRTIO_NET_RX_QUEUE_DEFAULT_SIZE),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index 91ed97c..0ced975 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -35,6 +35,7 @@ typedef struct virtio_net_conf
 uint32_t txtimer;
 int32_t txburst;
 char *tx;
+uint16_t rx_queue_size;
 } virtio_net_conf;
 
 /* Maximum packet size we can receive from tap device: header + 64k */
-- 
2.7.4




Re: [Qemu-devel] Live migration without bdrv_drain_all()

2016-09-27 Thread Daniel P. Berrange
On Tue, Sep 27, 2016 at 10:27:12AM +0100, Stefan Hajnoczi wrote:
> On Mon, Aug 29, 2016 at 06:56:42PM +, Felipe Franciosi wrote:
> > Heya!
> > 
> > > On 29 Aug 2016, at 08:06, Stefan Hajnoczi  wrote:
> > > 
> > > At KVM Forum an interesting idea was proposed to avoid
> > > bdrv_drain_all() during live migration.  Mike Cui and Felipe Franciosi
> > > mentioned running at queue depth 1.  It needs more thought to make it
> > > workable but I want to capture it here for discussion and to archive
> > > it.
> > > 
> > > bdrv_drain_all() is synchronous and can cause VM downtime if I/O
> > > requests hang.  We should find a better way of quiescing I/O that is
> > > not synchronous.  Up until now I thought we should simply add a
> > > timeout to bdrv_drain_all() so it can at least fail (and live
> > > migration would fail) if I/O is stuck instead of hanging the VM.  But
> > > the following approach is also interesting...
> > > 
> > > During the iteration phase of live migration we could limit the queue
> > > depth so points with no I/O requests in-flight are identified.  At
> > > these points the migration algorithm has the opportunity to move to
> > > the next phase without requiring bdrv_drain_all() since no requests
> > > are pending.
> > 
> > I actually think that this "io quiesced state" is highly unlikely to _just_ 
> > happen on a busy guest. The main idea behind running at QD1 is to naturally 
> > throttle the guest and make it easier to "force quiesce" the VQs.
> > 
> > In other words, if the guest is busy and we run at QD1, I would expect the 
> > rings to be quite full of pending (ie. unprocessed) requests. At the same 
> > time, I would expect that a call to bdrv_drain_all() (as part of 
> > do_vm_stop()) should complete much quicker.
> > 
> > Nevertheless, you mentioned that this is still problematic as that single 
> > outstanding IO could block, leaving the VM paused for longer.
> > 
> > My suggestion is therefore that we leave the vCPUs running, but stop 
> > picking up requests from the VQs. Provided nothing blocks, you should reach 
> > the "io quiesced state" fairly quickly. If you don't, then the VM is at 
> > least still running (despite seeing no progress on its VQs).
> > 
> > Thoughts on that?
> 
> If the guest experiences a hung disk it may enter error recovery.  QEMU
> should avoid this so the guest doesn't remount file systems read-only.
> 
> This can be solved by only quiescing the disk for, say, 30 seconds at a
> time.  If we don't reach a point where live migration can proceed during
> those 30 seconds then the disk will service requests again temporarily
> to avoid upsetting the guest.

What is the actual trigger for guest error recovery ? If you have the
situation where bdrv_drain_all could hang, surely even if you start
processing requests again after 30 seconds, you might not actually be
able to complete those requests for a long time, due to fact that
drain all has still got outstanding work blocking the new requests
you just accepted from the guest ?


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



[Qemu-devel] [PATCH V9 2/7] coroutine-sigaltstack: rename coroutine struct appropriately

2016-09-27 Thread Peter Lieven
The name of the sigaltstack coroutine struct was misleading.

Signed-off-by: Peter Lieven 
---
 util/coroutine-sigaltstack.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c
index a7c3366..171cd44 100644
--- a/util/coroutine-sigaltstack.c
+++ b/util/coroutine-sigaltstack.c
@@ -34,7 +34,7 @@ typedef struct {
 Coroutine base;
 void *stack;
 sigjmp_buf env;
-} CoroutineUContext;
+} CoroutineSigAltStack;
 
 /**
  * Per-thread coroutine bookkeeping
@@ -44,7 +44,7 @@ typedef struct {
 Coroutine *current;
 
 /** The default coroutine */
-CoroutineUContext leader;
+CoroutineSigAltStack leader;
 
 /** Information for the signal handler (trampoline) */
 sigjmp_buf tr_reenter;
@@ -89,7 +89,7 @@ static void __attribute__((constructor)) coroutine_init(void)
  * (from the signal handler when it is not signal handling, read ahead
  * for more information).
  */
-static void coroutine_bootstrap(CoroutineUContext *self, Coroutine *co)
+static void coroutine_bootstrap(CoroutineSigAltStack *self, Coroutine *co)
 {
 /* Initialize longjmp environment and switch back the caller */
 if (!sigsetjmp(self->env, 0)) {
@@ -109,7 +109,7 @@ static void coroutine_bootstrap(CoroutineUContext *self, 
Coroutine *co)
  */
 static void coroutine_trampoline(int signal)
 {
-CoroutineUContext *self;
+CoroutineSigAltStack *self;
 Coroutine *co;
 CoroutineThreadState *coTS;
 
@@ -144,7 +144,7 @@ static void coroutine_trampoline(int signal)
 Coroutine *qemu_coroutine_new(void)
 {
 const size_t stack_size = 1 << 20;
-CoroutineUContext *co;
+CoroutineSigAltStack *co;
 CoroutineThreadState *coTS;
 struct sigaction sa;
 struct sigaction osa;
@@ -251,7 +251,7 @@ Coroutine *qemu_coroutine_new(void)
 
 void qemu_coroutine_delete(Coroutine *co_)
 {
-CoroutineUContext *co = DO_UPCAST(CoroutineUContext, base, co_);
+CoroutineSigAltStack *co = DO_UPCAST(CoroutineSigAltStack, base, co_);
 
 g_free(co->stack);
 g_free(co);
@@ -260,8 +260,8 @@ void qemu_coroutine_delete(Coroutine *co_)
 CoroutineAction qemu_coroutine_switch(Coroutine *from_, Coroutine *to_,
   CoroutineAction action)
 {
-CoroutineUContext *from = DO_UPCAST(CoroutineUContext, base, from_);
-CoroutineUContext *to = DO_UPCAST(CoroutineUContext, base, to_);
+CoroutineSigAltStack *from = DO_UPCAST(CoroutineSigAltStack, base, from_);
+CoroutineSigAltStack *to = DO_UPCAST(CoroutineSigAltStack, base, to_);
 CoroutineThreadState *s = coroutine_get_thread_state();
 int ret;
 
-- 
1.9.1




[Qemu-devel] [PULL V2 08/27] colo-compare: introduce packet comparison thread

2016-09-27 Thread Jason Wang
From: Zhang Chen 

If primary packet is same with secondary packet,
we will send primary packet and drop secondary
packet, otherwise notify COLO frame to do checkpoint.
If primary packet comes but secondary packet does not,
after REGULAR_PACKET_CHECK_MS milliseconds we set
the primary packet as old_packet,then do a checkpoint.

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
Signed-off-by: Jason Wang 
---
 net/colo-compare.c | 233 +
 net/colo.c |   1 +
 net/colo.h |   3 +
 trace-events   |   2 +
 4 files changed, 239 insertions(+)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index bcc1beb..9596c8b 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -33,8 +33,12 @@
 #define COLO_COMPARE(obj) \
 OBJECT_CHECK(CompareState, (obj), TYPE_COLO_COMPARE)
 
+#define COMPARE_READ_LEN_MAX NET_BUFSIZE
 #define MAX_QUEUE_SIZE 1024
 
+/* TODO: Should be configurable */
+#define REGULAR_PACKET_CHECK_MS 3000
+
 /*
   + CompareState ++
   |   |
@@ -76,6 +80,11 @@ typedef struct CompareState {
 GQueue conn_list;
 /* hashtable to save connection */
 GHashTable *connection_track_table;
+/* compare thread, a thread for each NIC */
+QemuThread thread;
+/* Timer used on the primary to find packets that are never matched */
+QEMUTimer *timer;
+QemuMutex timer_check_lock;
 } CompareState;
 
 typedef struct CompareClass {
@@ -148,6 +157,118 @@ static int packet_enqueue(CompareState *s, int mode)
 return 0;
 }
 
+/*
+ * The IP packets sent by primary and secondary
+ * will be compared in here
+ * TODO support ip fragment, Out-Of-Order
+ * return:0  means packet same
+ *> 0 || < 0 means packet different
+ */
+static int colo_packet_compare(Packet *ppkt, Packet *spkt)
+{
+trace_colo_compare_ip_info(ppkt->size, inet_ntoa(ppkt->ip->ip_src),
+   inet_ntoa(ppkt->ip->ip_dst), spkt->size,
+   inet_ntoa(spkt->ip->ip_src),
+   inet_ntoa(spkt->ip->ip_dst));
+
+if (ppkt->size == spkt->size) {
+return memcmp(ppkt->data, spkt->data, spkt->size);
+} else {
+return -1;
+}
+}
+
+static int colo_packet_compare_all(Packet *spkt, Packet *ppkt)
+{
+trace_colo_compare_main("compare all");
+return colo_packet_compare(ppkt, spkt);
+}
+
+static int colo_old_packet_check_one(Packet *pkt, int64_t *check_time)
+{
+int64_t now = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+
+if ((now - pkt->creation_ms) > (*check_time)) {
+trace_colo_old_packet_check_found(pkt->creation_ms);
+return 0;
+} else {
+return 1;
+}
+}
+
+static void colo_old_packet_check_one_conn(void *opaque,
+   void *user_data)
+{
+Connection *conn = opaque;
+GList *result = NULL;
+int64_t check_time = REGULAR_PACKET_CHECK_MS;
+
+result = g_queue_find_custom(&conn->primary_list,
+ &check_time,
+ (GCompareFunc)colo_old_packet_check_one);
+
+if (result) {
+/* do checkpoint will flush old packet */
+/* TODO: colo_notify_checkpoint();*/
+}
+}
+
+/*
+ * Look for old packets that the secondary hasn't matched,
+ * if we have some then we have to checkpoint to wake
+ * the secondary up.
+ */
+static void colo_old_packet_check(void *opaque)
+{
+CompareState *s = opaque;
+
+g_queue_foreach(&s->conn_list, colo_old_packet_check_one_conn, NULL);
+}
+
+/*
+ * Called from the compare thread on the primary
+ * for compare connection
+ */
+static void colo_compare_connection(void *opaque, void *user_data)
+{
+CompareState *s = user_data;
+Connection *conn = opaque;
+Packet *pkt = NULL;
+GList *result = NULL;
+int ret;
+
+while (!g_queue_is_empty(&conn->primary_list) &&
+   !g_queue_is_empty(&conn->secondary_list)) {
+qemu_mutex_lock(&s->timer_check_lock);
+pkt = g_queue_pop_tail(&conn->primary_list);
+qemu_mutex_unlock(&s->timer_check_lock);
+result = g_queue_find_custom(&conn->secondary_list,
+  pkt, (GCompareFunc)colo_packet_compare_all);
+
+if (result) {
+ret = compare_chr_send(s->chr_out, pkt->data, pkt->size);
+if (ret < 0) {
+error_report("colo_send_primary_packet failed");
+}
+trace_colo_compare_main("packet same and release packet");
+g_queue_remove(&conn->secondary_list, result->data);
+packet_destroy(pkt, NULL);
+} else {
+/*
+ * If one packet arrive late, the secondary_list or
+ * primary_list will be empty, so we can't compare it
+ * until next comparison.
+ */
+trace_colo_compare_main("packet different");
+qemu_mutex_lock(&s->timer_check

[Qemu-devel] [PULL V2 04/27] colo-compare: introduce colo compare initialization

2016-09-27 Thread Jason Wang
From: Zhang Chen 

This a COLO net ascii figure:

 Primary qemu   
Secondary qemu
+--+   
++
| +--+ |   |  
+---+ |
| |  | |   |  | 
  | |
| |guest | |   |  | 
   guest  | |
| |  | |   |  | 
  | |
| +---^--+---+ |   |  
+-+++ |
| |  | |   |
^|  |
| |  | |   |
||  |
| |  +--+  |
||  |
|netfilter|  |   | ||  |   
netfilter||  |
| +--+ ++  ||  |  
+---+ |
| |   |  |   |  |out   ||  |  | 
||  filter excute order   | |
| |   |  |  +-+||  |  | 
|| +--->  | |
| |   |  |  ||  | |||  |  | 
||   TCP  | |
| | +-+--+-+  +-v+ +-v+ |pri +++sec||  |  | 
++  +---++---v+rewriter++  ++ | |
| | |  |  |  | |  | |in  | |in ||  |  | |   
 |  ||  |  || | |
| | |  filter  |  |  filter  | |  filter  +-->  colo   <--+ +>  
filter   +--> adjust |   adjust +-->   filter   | | |
| | |  mirror  |  |redirector| |redirector| || compare |   |  ||  | | 
redirector |  | ack|   seq|  | redirector | | |
| | |  |  |  | |  | || |   |  ||  | |   
 |  ||  |  || | |
| | +^-+  ++-+ +--+ |+-+   |  ||  | 
++  ++--+  +---++ | |
| |  |   tx|   rx   rx  |  |  ||  | 
   txall   |  rx  | |
| |  | ||  |  ||  
+---+ |
| |  | +--+ |  |  ||
   ||
| |  |   filter excute order  | |  |  ||
   ||
| |  |  +>| |  |  
++|
| +-+  |   |
|
||||   |
|
+--+   
++
 |guest receive   | guest send
 ||
++v+
|  |
  NOTE: filter direction is rx/tx/all
| tap  |
  rx:receive packets sent to the netdev
|  |
  tx:receive packets sent by the netdev
+--+

In COLO-compare, we do packet comparing job.
Packets coming from the primary char indev will be sent to outdev.
Packets coming from the secondary char dev will be dropped after comparing.
colo-comapre need two input chardev and one output chardev:
primary_in=chardev1-id (source: primary send packet)
secondary_in=chardev2-id (source: secondary send packet)
outd

[Qemu-devel] [PULL V2 03/27] qemu-char: Add qemu_chr_add_handlers_full() for GMaincontext

2016-09-27 Thread Jason Wang
From: Zhang Chen 

Add qemu_chr_add_handlers_full() API, we can use
this API pass in a GMainContext,make handler run
in the context rather than main_loop.
This comments from Daniel P . Berrange.

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
Reviewed-by: Daniel P. Berrange 
Signed-off-by: Jason Wang 
---
 include/sysemu/char.h | 11 +++-
 qemu-char.c   | 77 +++
 2 files changed, 63 insertions(+), 25 deletions(-)

diff --git a/include/sysemu/char.h b/include/sysemu/char.h
index ee7e554..0d0465a 100644
--- a/include/sysemu/char.h
+++ b/include/sysemu/char.h
@@ -65,7 +65,8 @@ struct CharDriverState {
 int (*chr_sync_read)(struct CharDriverState *s,
  const uint8_t *buf, int len);
 GSource *(*chr_add_watch)(struct CharDriverState *s, GIOCondition cond);
-void (*chr_update_read_handler)(struct CharDriverState *s);
+void (*chr_update_read_handler)(struct CharDriverState *s,
+GMainContext *context);
 int (*chr_ioctl)(struct CharDriverState *s, int cmd, void *arg);
 int (*get_msgfds)(struct CharDriverState *s, int* fds, int num);
 int (*set_msgfds)(struct CharDriverState *s, int *fds, int num);
@@ -422,6 +423,14 @@ void qemu_chr_add_handlers(CharDriverState *s,
IOEventHandler *fd_event,
void *opaque);
 
+/* This API can make handler run in the context what you pass to. */
+void qemu_chr_add_handlers_full(CharDriverState *s,
+IOCanReadHandler *fd_can_read,
+IOReadHandler *fd_read,
+IOEventHandler *fd_event,
+void *opaque,
+GMainContext *context);
+
 void qemu_chr_be_generic_open(CharDriverState *s);
 void qemu_chr_accept_input(CharDriverState *s);
 int qemu_chr_add_client(CharDriverState *s, int fd);
diff --git a/qemu-char.c b/qemu-char.c
index 8826419..fb456ce 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -449,11 +449,12 @@ void qemu_chr_fe_printf(CharDriverState *s, const char 
*fmt, ...)
 
 static void remove_fd_in_watch(CharDriverState *chr);
 
-void qemu_chr_add_handlers(CharDriverState *s,
-   IOCanReadHandler *fd_can_read,
-   IOReadHandler *fd_read,
-   IOEventHandler *fd_event,
-   void *opaque)
+void qemu_chr_add_handlers_full(CharDriverState *s,
+IOCanReadHandler *fd_can_read,
+IOReadHandler *fd_read,
+IOEventHandler *fd_event,
+void *opaque,
+GMainContext *context)
 {
 int fe_open;
 
@@ -467,8 +468,9 @@ void qemu_chr_add_handlers(CharDriverState *s,
 s->chr_read = fd_read;
 s->chr_event = fd_event;
 s->handler_opaque = opaque;
-if (fe_open && s->chr_update_read_handler)
-s->chr_update_read_handler(s);
+if (fe_open && s->chr_update_read_handler) {
+s->chr_update_read_handler(s, context);
+}
 
 if (!s->explicit_fe_open) {
 qemu_chr_fe_set_open(s, fe_open);
@@ -481,6 +483,16 @@ void qemu_chr_add_handlers(CharDriverState *s,
 }
 }
 
+void qemu_chr_add_handlers(CharDriverState *s,
+   IOCanReadHandler *fd_can_read,
+   IOReadHandler *fd_read,
+   IOEventHandler *fd_event,
+   void *opaque)
+{
+qemu_chr_add_handlers_full(s, fd_can_read, fd_read,
+   fd_event, opaque, NULL);
+}
+
 static int null_chr_write(CharDriverState *chr, const uint8_t *buf, int len)
 {
 return len;
@@ -722,7 +734,8 @@ static void mux_chr_event(void *opaque, int event)
 mux_chr_send_event(d, i, event);
 }
 
-static void mux_chr_update_read_handler(CharDriverState *chr)
+static void mux_chr_update_read_handler(CharDriverState *chr,
+GMainContext *context)
 {
 MuxDriver *d = chr->opaque;
 
@@ -736,8 +749,10 @@ static void mux_chr_update_read_handler(CharDriverState 
*chr)
 d->chr_event[d->mux_cnt] = chr->chr_event;
 /* Fix up the real driver with mux routines */
 if (d->mux_cnt == 0) {
-qemu_chr_add_handlers(d->drv, mux_chr_can_read, mux_chr_read,
-  mux_chr_event, chr);
+qemu_chr_add_handlers_full(d->drv, mux_chr_can_read,
+   mux_chr_read,
+   mux_chr_event,
+   chr, context);
 }
 if (d->focus != -1) {
 mux_chr_send_event(d, d->focus, CHR_EVENT_MUX_OUT);
@@ -853,6 +868,7 @@ typedef struct IOWatchPoll
 IOCanReadHandler *fd_can_read;
 GSourceFunc fd_read;
 void *opaque;
+GMain

[Qemu-devel] [PULL V2 05/27] net/colo.c: add colo.c to define and handle packet

2016-09-27 Thread Jason Wang
From: Zhang Chen 

The net/colo.c is used by colo-compare and filter-rewriter.
this can share common data structure like net packet,
and other functions.

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
Signed-off-by: Jason Wang 
---
 net/Makefile.objs  |   1 +
 net/colo-compare.c | 114 +++--
 net/colo.c |  86 
 net/colo.h |  37 +
 trace-events   |   6 +++
 5 files changed, 240 insertions(+), 4 deletions(-)
 create mode 100644 net/colo.c
 create mode 100644 net/colo.h

diff --git a/net/Makefile.objs b/net/Makefile.objs
index ba92f73..beb504b 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -17,3 +17,4 @@ common-obj-y += filter.o
 common-obj-y += filter-buffer.o
 common-obj-y += filter-mirror.o
 common-obj-y += colo-compare.o
+common-obj-y += colo.o
diff --git a/net/colo-compare.c b/net/colo-compare.c
index dc5f70c..cea9b27 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -14,6 +14,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/error-report.h"
+#include "trace.h"
 #include "qemu-common.h"
 #include "qapi/qmp/qerror.h"
 #include "qapi/error.h"
@@ -26,13 +27,34 @@
 #include "sysemu/char.h"
 #include "qemu/sockets.h"
 #include "qapi-visit.h"
+#include "net/colo.h"
 
 #define TYPE_COLO_COMPARE "colo-compare"
 #define COLO_COMPARE(obj) \
 OBJECT_CHECK(CompareState, (obj), TYPE_COLO_COMPARE)
 
-#define COMPARE_READ_LEN_MAX NET_BUFSIZE
-
+/*
+  + CompareState ++
+  |   |
+  +---+   +---+ +---+
+  |conn list  +--->conn   +->conn   |
+  +---+   +---+ +---+
+  |   | |   | |  |
+  +---+ +---v+  +---v++---v+ +---v+
+|primary |  |secondary|primary | |secondary
+|packet  |  |packet  +|packet  | |packet  +
+++  ++++ ++
+|   | |  |
++---v+  +---v++---v+ +---v+
+|primary |  |secondary|primary | |secondary
+|packet  |  |packet  +|packet  | |packet  +
+++  ++++ ++
+|   | |  |
++---v+  +---v++---v+ +---v+
+|primary |  |secondary|primary | |secondary
+|packet  |  |packet  +|packet  | |packet  +
+++  ++++ ++
+*/
 typedef struct CompareState {
 Object parent;
 
@@ -44,6 +66,9 @@ typedef struct CompareState {
 CharDriverState *chr_out;
 SocketReadState pri_rs;
 SocketReadState sec_rs;
+
+/* hashtable to save connection */
+GHashTable *connection_track_table;
 } CompareState;
 
 typedef struct CompareClass {
@@ -54,6 +79,76 @@ typedef struct CompareChardevProps {
 bool is_socket;
 } CompareChardevProps;
 
+enum {
+PRIMARY_IN = 0,
+SECONDARY_IN,
+};
+
+static int compare_chr_send(CharDriverState *out,
+const uint8_t *buf,
+uint32_t size);
+
+/*
+ * Return 0 on success, if return -1 means the pkt
+ * is unsupported(arp and ipv6) and will be sent later
+ */
+static int packet_enqueue(CompareState *s, int mode)
+{
+Packet *pkt = NULL;
+
+if (mode == PRIMARY_IN) {
+pkt = packet_new(s->pri_rs.buf, s->pri_rs.packet_len);
+} else {
+pkt = packet_new(s->sec_rs.buf, s->sec_rs.packet_len);
+}
+
+if (parse_packet_early(pkt)) {
+packet_destroy(pkt, NULL);
+pkt = NULL;
+return -1;
+}
+/* TODO: get connection key from pkt */
+
+/*
+ * TODO: use connection key get conn from
+ * connection_track_table
+ */
+
+/*
+ * TODO: insert pkt to it's conn->primary_list
+ * or conn->secondary_list
+ */
+
+return 0;
+}
+
+static int compare_chr_send(CharDriverState *out,
+const uint8_t *buf,
+uint32_t size)
+{
+int ret = 0;
+uint32_t len = htonl(size);
+
+if (!size) {
+return 0;
+}
+
+ret = qemu_chr_fe_write_all(out, (uint8_t *)&len, sizeof(len));
+if (ret != sizeof(len)) {
+goto err;
+}
+
+ret = qemu_chr_fe_write_all(out, (uint8_t *)buf, size);
+if (ret != size) {
+goto err;
+}
+
+return 0;
+
+err:
+return ret < 0 ? ret : -EIO;
+}
+
 static char *compare_get_pri_indev(Object *obj, Error **errp)
 {
 CompareState *s = COLO_COMPARE(obj);
@@ -101,12 +196,21 @@ static void compare_set_outdev(Object *obj, const char 
*value, Error **errp)
 
 static void compare_pri_rs_finalize

Re: [Qemu-devel] [PATCH] pci-testdev: enhance to support new testcases

2016-09-27 Thread Paolo Bonzini

> Take my example: IOMMU unit test would want the guest to send DMA/IRQ
> request from the device's perspective. In that case, we would like to
> "tell" the pci-testdev about where to write the DMA, and what data to
> write specifically, or which IRQ to trigger. That's something we
> cannot do right now. And I don't want to just add a new test case for
> that specifically. I think we can make it more common.

Do we need to use the pci-testdev?  There's also for example
the edu device, or we could just use virtio-serial with a null
backend.

Paolo



[Qemu-devel] [PULL V2 06/27] Jhash: add linux kernel jhashtable in qemu

2016-09-27 Thread Jason Wang
From: Zhang Chen 

Jhash will be used by colo-compare and filter-rewriter
to save and lookup net connection info

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
Signed-off-by: Jason Wang 
---
 include/qemu/jhash.h | 59 
 net/colo.h   |  1 +
 2 files changed, 60 insertions(+)
 create mode 100644 include/qemu/jhash.h

diff --git a/include/qemu/jhash.h b/include/qemu/jhash.h
new file mode 100644
index 000..742
--- /dev/null
+++ b/include/qemu/jhash.h
@@ -0,0 +1,59 @@
+/* jhash.h: Jenkins hash support.
+  *
+  * Copyright (C) 2006. Bob Jenkins (bob_jenk...@burtleburtle.net)
+  *
+  * http://burtleburtle.net/bob/hash/
+  *
+  * These are the credits from Bob's sources:
+  *
+  * lookup3.c, by Bob Jenkins, May 2006, Public Domain.
+  *
+  * These are functions for producing 32-bit hashes for hash table lookup.
+  * hashword(), hashlittle(), hashlittle2(), hashbig(), mix(), and final()
+  * are externally useful functions.  Routines to test the hash are included
+  * if SELF_TEST is defined.  You can use this free for any purpose. It's in
+  * the public domain.  It has no warranty.
+  *
+  * Copyright (C) 2009-2010 Jozsef Kadlecsik (kad...@blackhole.kfki.hu)
+  *
+  * I've modified Bob's hash to be useful in the Linux kernel, and
+  * any bugs present are my fault.
+  * Jozsef
+  */
+
+#ifndef QEMU_JHASH_H__
+#define QEMU_JHASH_H__
+
+#include "qemu/bitops.h"
+
+/*
+ * hashtable relation copy from linux kernel jhash
+ */
+
+/* __jhash_mix -- mix 3 32-bit values reversibly. */
+#define __jhash_mix(a, b, c)\
+{   \
+a -= c;  a ^= rol32(c, 4);  c += b; \
+b -= a;  b ^= rol32(a, 6);  a += c; \
+c -= b;  c ^= rol32(b, 8);  b += a; \
+a -= c;  a ^= rol32(c, 16); c += b; \
+b -= a;  b ^= rol32(a, 19); a += c; \
+c -= b;  c ^= rol32(b, 4);  b += a; \
+}
+
+/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */
+#define __jhash_final(a, b, c)  \
+{   \
+c ^= b; c -= rol32(b, 14);  \
+a ^= c; a -= rol32(c, 11);  \
+b ^= a; b -= rol32(a, 25);  \
+c ^= b; c -= rol32(b, 16);  \
+a ^= c; a -= rol32(c, 4);   \
+b ^= a; b -= rol32(a, 14);  \
+c ^= b; c -= rol32(b, 24);  \
+}
+
+/* An arbitrary initial parameter */
+#define JHASH_INITVAL   0xdeadbeef
+
+#endif /* QEMU_JHASH_H__ */
diff --git a/net/colo.h b/net/colo.h
index e211eda..05dc0b6 100644
--- a/net/colo.h
+++ b/net/colo.h
@@ -16,6 +16,7 @@
 #define QEMU_COLO_PROXY_H
 
 #include "slirp/slirp.h"
+#include "qemu/jhash.h"
 
 #define HASHTABLE_MAX_SIZE 16384
 
-- 
2.7.4




Re: [Qemu-devel] [PATCH v2] block: Turn on "unmap" in active commit

2016-09-27 Thread Fam Zheng
On Tue, 09/27 17:33, Fam Zheng wrote:
> We already specified BDRV_O_UNMAP when opening images in 'qemu-img
> commit', but didn't turn on the "unmap" in the active commit job. This
> patch fixes that so that zeroed clusters in top image can be discarded
> which is desired in the virt-sparsify use case, where a temporary
> overlay is created and fstrim'ed before commiting back, to free space in
> the original image.
> 
> This also enables it for block-commit.
> 
> Signed-off-by: Fam Zheng 

Sent too soon. I see a number of iotests failures (020 040 097 129),
still investigating.

Fam



[Qemu-devel] [PULL V2 07/27] colo-compare: track connection and enqueue packet

2016-09-27 Thread Jason Wang
From: Zhang Chen 

In this patch we use kernel jhash table to track
connection, and then enqueue net packet like this:

+ CompareState ++
|   |
+---+   +---+ +---+
|conn list  +--->conn   +->conn   |
+---+   +---+ +---+
|   | |   | |  |
+---+ +---v+  +---v++---v+ +---v+
  |primary |  |secondary|primary | |secondary
  |packet  |  |packet  +|packet  | |packet  +
  ++  ++++ ++
  |   | |  |
  +---v+  +---v++---v+ +---v+
  |primary |  |secondary|primary | |secondary
  |packet  |  |packet  +|packet  | |packet  +
  ++  ++++ ++
  |   | |  |
  +---v+  +---v++---v+ +---v+
  |primary |  |secondary|primary | |secondary
  |packet  |  |packet  +|packet  | |packet  +
  ++  ++++ ++

We use conn_list to record connection info.
When we want to enqueue a packet, firstly get the
connection from connection_track_table. then push
the packet to g_queue(pri/sec) in it's own conn.

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
Signed-off-by: Jason Wang 
---
 net/colo-compare.c |  53 +-
 net/colo.c | 108 +
 net/colo.h |  39 +++
 3 files changed, 190 insertions(+), 10 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index cea9b27..bcc1beb 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -33,6 +33,8 @@
 #define COLO_COMPARE(obj) \
 OBJECT_CHECK(CompareState, (obj), TYPE_COLO_COMPARE)
 
+#define MAX_QUEUE_SIZE 1024
+
 /*
   + CompareState ++
   |   |
@@ -67,6 +69,11 @@ typedef struct CompareState {
 SocketReadState pri_rs;
 SocketReadState sec_rs;
 
+/* connection list: the connections belonged to this NIC could be found
+ * in this list.
+ * element type: Connection
+ */
+GQueue conn_list;
 /* hashtable to save connection */
 GHashTable *connection_track_table;
 } CompareState;
@@ -94,7 +101,9 @@ static int compare_chr_send(CharDriverState *out,
  */
 static int packet_enqueue(CompareState *s, int mode)
 {
+ConnectionKey key;
 Packet *pkt = NULL;
+Connection *conn;
 
 if (mode == PRIMARY_IN) {
 pkt = packet_new(s->pri_rs.buf, s->pri_rs.packet_len);
@@ -107,17 +116,34 @@ static int packet_enqueue(CompareState *s, int mode)
 pkt = NULL;
 return -1;
 }
-/* TODO: get connection key from pkt */
+fill_connection_key(pkt, &key);
 
-/*
- * TODO: use connection key get conn from
- * connection_track_table
- */
+conn = connection_get(s->connection_track_table,
+  &key,
+  &s->conn_list);
 
-/*
- * TODO: insert pkt to it's conn->primary_list
- * or conn->secondary_list
- */
+if (!conn->processing) {
+g_queue_push_tail(&s->conn_list, conn);
+conn->processing = true;
+}
+
+if (mode == PRIMARY_IN) {
+if (g_queue_get_length(&conn->primary_list) <=
+   MAX_QUEUE_SIZE) {
+g_queue_push_tail(&conn->primary_list, pkt);
+} else {
+error_report("colo compare primary queue size too big,"
+ "drop packet");
+}
+} else {
+if (g_queue_get_length(&conn->secondary_list) <=
+   MAX_QUEUE_SIZE) {
+g_queue_push_tail(&conn->secondary_list, pkt);
+} else {
+error_report("colo compare secondary queue size too big,"
+ "drop packet");
+}
+}
 
 return 0;
 }
@@ -308,7 +334,12 @@ static void colo_compare_complete(UserCreatable *uc, Error 
**errp)
 net_socket_rs_init(&s->pri_rs, compare_pri_rs_finalize);
 net_socket_rs_init(&s->sec_rs, compare_sec_rs_finalize);
 
-/* use g_hash_table_new_full() to new a hashtable */
+g_queue_init(&s->conn_list);
+
+s->connection_track_table = g_hash_table_new_full(connection_key_hash,
+  connection_key_equal,
+  g_free,
+  connection_destroy);
 
 return;
 }
@@ -349,6 +380,8 @@ static void colo_compare_finalize(Object *obj)
 qemu_chr_fe_release(s->chr_out);
 }
 
+g_queue_free(&s->conn_list);
+
 g_free(s->pri_indev);
   

[Qemu-devel] [PULL V2 09/27] colo-compare: add TCP, UDP, ICMP packet comparison

2016-09-27 Thread Jason Wang
From: Zhang Chen 

We add TCP,UDP,ICMP packet comparison to replace
IP packet comparison. This can increase the
accuracy of the package comparison.
Less checkpoint more efficiency.

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
Signed-off-by: Jason Wang 
---
 net/colo-compare.c | 147 +++--
 trace-events   |   3 ++
 2 files changed, 146 insertions(+), 4 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 9596c8b..22b1da1 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -19,6 +19,7 @@
 #include "qapi/qmp/qerror.h"
 #include "qapi/error.h"
 #include "net/net.h"
+#include "net/eth.h"
 #include "qom/object_interfaces.h"
 #include "qemu/iov.h"
 #include "qom/object.h"
@@ -178,9 +179,131 @@ static int colo_packet_compare(Packet *ppkt, Packet *spkt)
 }
 }
 
-static int colo_packet_compare_all(Packet *spkt, Packet *ppkt)
+/*
+ * Called from the compare thread on the primary
+ * for compare tcp packet
+ * compare_tcp copied from Dr. David Alan Gilbert's branch
+ */
+static int colo_packet_compare_tcp(Packet *spkt, Packet *ppkt)
+{
+struct tcphdr *ptcp, *stcp;
+int res;
+char *sdebug, *ddebug;
+
+trace_colo_compare_main("compare tcp");
+if (ppkt->size != spkt->size) {
+if (trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) {
+trace_colo_compare_main("pkt size not same");
+}
+return -1;
+}
+
+ptcp = (struct tcphdr *)ppkt->transport_header;
+stcp = (struct tcphdr *)spkt->transport_header;
+
+/*
+ * The 'identification' field in the IP header is *very* random
+ * it almost never matches.  Fudge this by ignoring differences in
+ * unfragmented packets; they'll normally sort themselves out if different
+ * anyway, and it should recover at the TCP level.
+ * An alternative would be to get both the primary and secondary to rewrite
+ * somehow; but that would need some sync traffic to sync the state
+ */
+if (ntohs(ppkt->ip->ip_off) & IP_DF) {
+spkt->ip->ip_id = ppkt->ip->ip_id;
+/* and the sum will be different if the IDs were different */
+spkt->ip->ip_sum = ppkt->ip->ip_sum;
+}
+
+res = memcmp(ppkt->data + ETH_HLEN, spkt->data + ETH_HLEN,
+(spkt->size - ETH_HLEN));
+
+if (res != 0 && trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) {
+sdebug = strdup(inet_ntoa(ppkt->ip->ip_src));
+ddebug = strdup(inet_ntoa(ppkt->ip->ip_dst));
+fprintf(stderr, "%s: src/dst: %s/%s p: seq/ack=%u/%u"
+" s: seq/ack=%u/%u res=%d flags=%x/%x\n",
+__func__, sdebug, ddebug,
+(unsigned int)ntohl(ptcp->th_seq),
+(unsigned int)ntohl(ptcp->th_ack),
+(unsigned int)ntohl(stcp->th_seq),
+(unsigned int)ntohl(stcp->th_ack),
+res, ptcp->th_flags, stcp->th_flags);
+
+fprintf(stderr, "Primary len = %d\n", ppkt->size);
+qemu_hexdump((char *)ppkt->data, stderr, "colo-compare", ppkt->size);
+fprintf(stderr, "Secondary len = %d\n", spkt->size);
+qemu_hexdump((char *)spkt->data, stderr, "colo-compare", spkt->size);
+
+g_free(sdebug);
+g_free(ddebug);
+}
+
+return res;
+}
+
+/*
+ * Called from the compare thread on the primary
+ * for compare udp packet
+ */
+static int colo_packet_compare_udp(Packet *spkt, Packet *ppkt)
+{
+int ret;
+
+trace_colo_compare_main("compare udp");
+ret = colo_packet_compare(ppkt, spkt);
+
+if (ret) {
+trace_colo_compare_udp_miscompare("primary pkt size", ppkt->size);
+qemu_hexdump((char *)ppkt->data, stderr, "colo-compare", ppkt->size);
+trace_colo_compare_udp_miscompare("Secondary pkt size", spkt->size);
+qemu_hexdump((char *)spkt->data, stderr, "colo-compare", spkt->size);
+}
+
+return ret;
+}
+
+/*
+ * Called from the compare thread on the primary
+ * for compare icmp packet
+ */
+static int colo_packet_compare_icmp(Packet *spkt, Packet *ppkt)
 {
-trace_colo_compare_main("compare all");
+int network_length;
+
+trace_colo_compare_main("compare icmp");
+network_length = ppkt->ip->ip_hl * 4;
+if (ppkt->size != spkt->size ||
+ppkt->size < network_length + ETH_HLEN) {
+return -1;
+}
+
+if (colo_packet_compare(ppkt, spkt)) {
+trace_colo_compare_icmp_miscompare("primary pkt size",
+   ppkt->size);
+qemu_hexdump((char *)ppkt->data, stderr, "colo-compare",
+ ppkt->size);
+trace_colo_compare_icmp_miscompare("Secondary pkt size",
+   spkt->size);
+qemu_hexdump((char *)spkt->data, stderr, "colo-compare",
+ spkt->size);
+return -1;
+} else {
+return 0;
+}
+}
+
+/*
+ * Called from the compare thread on th

[Qemu-devel] [PULL V2 10/27] filter-rewriter: introduce filter-rewriter initialization

2016-09-27 Thread Jason Wang
From: Zhang Chen 

Filter-rewriter is a part of COLO project.
It will rewrite some of secondary packet to make
secondary guest's tcp connection established successfully.
In this module we will rewrite tcp packet's ack to the secondary
from primary,and rewrite tcp packet's seq to the primary from
secondary.

usage:

colo secondary:
-object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0
-object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1
-object filter-rewriter,id=rew0,netdev=hn0,queue=all

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
Signed-off-by: Jason Wang 
---
 net/Makefile.objs |   1 +
 net/filter-rewriter.c | 105 ++
 qemu-options.hx   |  13 +++
 vl.c  |   3 +-
 4 files changed, 121 insertions(+), 1 deletion(-)
 create mode 100644 net/filter-rewriter.c

diff --git a/net/Makefile.objs b/net/Makefile.objs
index beb504b..2a80df5 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -18,3 +18,4 @@ common-obj-y += filter-buffer.o
 common-obj-y += filter-mirror.o
 common-obj-y += colo-compare.o
 common-obj-y += colo.o
+common-obj-y += filter-rewriter.o
diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
new file mode 100644
index 000..de29f07
--- /dev/null
+++ b/net/filter-rewriter.c
@@ -0,0 +1,105 @@
+/*
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 FUJITSU LIMITED
+ * Copyright (c) 2016 Intel Corporation
+ *
+ * Author: Zhang Chen 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "net/colo.h"
+#include "net/filter.h"
+#include "net/net.h"
+#include "qemu-common.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qerror.h"
+#include "qapi-visit.h"
+#include "qom/object.h"
+#include "qemu/main-loop.h"
+#include "qemu/iov.h"
+#include "net/checksum.h"
+
+#define FILTER_COLO_REWRITER(obj) \
+OBJECT_CHECK(RewriterState, (obj), TYPE_FILTER_REWRITER)
+
+#define TYPE_FILTER_REWRITER "filter-rewriter"
+
+typedef struct RewriterState {
+NetFilterState parent_obj;
+NetQueue *incoming_queue;
+/* hashtable to save connection */
+GHashTable *connection_track_table;
+} RewriterState;
+
+static void filter_rewriter_flush(NetFilterState *nf)
+{
+RewriterState *s = FILTER_COLO_REWRITER(nf);
+
+if (!qemu_net_queue_flush(s->incoming_queue)) {
+/* Unable to empty the queue, purge remaining packets */
+qemu_net_queue_purge(s->incoming_queue, nf->netdev);
+}
+}
+
+static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
+ NetClientState *sender,
+ unsigned flags,
+ const struct iovec *iov,
+ int iovcnt,
+ NetPacketSent *sent_cb)
+{
+/*
+ * if we get tcp packet
+ * we will rewrite it to make secondary guest's
+ * connection established successfully
+ */
+return 0;
+}
+
+static void colo_rewriter_cleanup(NetFilterState *nf)
+{
+RewriterState *s = FILTER_COLO_REWRITER(nf);
+
+/* flush packets */
+if (s->incoming_queue) {
+filter_rewriter_flush(nf);
+g_free(s->incoming_queue);
+}
+}
+
+static void colo_rewriter_setup(NetFilterState *nf, Error **errp)
+{
+RewriterState *s = FILTER_COLO_REWRITER(nf);
+
+s->connection_track_table = g_hash_table_new_full(connection_key_hash,
+  connection_key_equal,
+  g_free,
+  connection_destroy);
+s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
+}
+
+static void colo_rewriter_class_init(ObjectClass *oc, void *data)
+{
+NetFilterClass *nfc = NETFILTER_CLASS(oc);
+
+nfc->setup = colo_rewriter_setup;
+nfc->cleanup = colo_rewriter_cleanup;
+nfc->receive_iov = colo_rewriter_receive_iov;
+}
+
+static const TypeInfo colo_rewriter_info = {
+.name = TYPE_FILTER_REWRITER,
+.parent = TYPE_NETFILTER,
+.class_init = colo_rewriter_class_init,
+.instance_size = sizeof(RewriterState),
+};
+
+static void register_types(void)
+{
+type_register_static(&colo_rewriter_info);
+}
+
+type_init(register_types);
diff --git a/qemu-options.hx b/qemu-options.hx
index d0ed69a..987d055 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3887,6 +3887,19 @@ Create a filter-redirector we need to differ outdev id 
from indev id, id can not
 be the same. we can just use indev or outdev, but at least one of indev or 
outdev
 need to be specified.
 
+@item -object 
filter-rewriter,id=@var{id},netdev=@var{netdevid},rewriter-mode=@var{mode}[,queue=@var{all|rx|tx}]
+
+Filter-rewriter is a part of COLO project.It

[Qemu-devel] [PULL V2 16/27] tap: Allow specifying a bridge

2016-09-27 Thread Jason Wang
From: Alexey Kardashevskiy 

The tap backend is already using qemu-bridge-helper to attach tap
interface to a bridge but (unlike the bridge backend) it always uses
the default bridge name - br0.

This adds a "br" property support to the tap backend.

Signed-off-by: Alexey Kardashevskiy 
Reviewed-by: Greg Kurz 
Tested-by: Greg Kurz 
Signed-off-by: Jason Wang 
---
 net/tap.c|  4 +++-
 qapi-schema.json |  3 +++
 qemu-options.hx  | 12 +++-
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index 6abb962..b6896a7 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -857,7 +857,9 @@ free_fail:
 return -1;
 }
 
-fd = net_bridge_run_helper(tap->helper, DEFAULT_BRIDGE_INTERFACE,
+fd = net_bridge_run_helper(tap->helper,
+   tap->has_br ?
+   tap->br : DEFAULT_BRIDGE_INTERFACE,
errp);
 if (fd == -1) {
 return -1;
diff --git a/qapi-schema.json b/qapi-schema.json
index e507061..c3dcf11 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -2636,6 +2636,8 @@
 #
 # @downscript: #optional script to shut down the interface
 #
+# @br: #optional bridge name (since 2.8)
+#
 # @helper: #optional command to execute to configure bridge
 #
 # @sndbuf: #optional send buffer limit. Understands [TGMKkb] suffixes.
@@ -2665,6 +2667,7 @@
 '*fds':'str',
 '*script': 'str',
 '*downscript': 'str',
+'*br': 'str',
 '*helper': 'str',
 '*sndbuf': 'size',
 '*vnet_hdr':   'bool',
diff --git a/qemu-options.hx b/qemu-options.hx
index 987d055..01f01df 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1598,10 +1598,11 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
 "configure a host TAP network backend with ID 'str'\n"
 #else
 "-netdev 
tap,id=str[,fd=h][,fds=x:y:...:z][,ifname=name][,script=file][,downscript=dfile]\n"
-" 
[,helper=helper][,sndbuf=nbytes][,vnet_hdr=on|off][,vhost=on|off]\n"
+" 
[,br=bridge][,helper=helper][,sndbuf=nbytes][,vnet_hdr=on|off][,vhost=on|off]\n"
 " 
[,vhostfd=h][,vhostfds=x:y:...:z][,vhostforce=on|off][,queues=n]\n"
 " [,poll-us=n]\n"
 "configure a host TAP network backend with ID 'str'\n"
+"connected to a bridge (default=" DEFAULT_BRIDGE_INTERFACE 
")\n"
 "use network scripts 'file' (default=" 
DEFAULT_NETWORK_SCRIPT ")\n"
 "to configure it and 'dfile' (default=" 
DEFAULT_NETWORK_DOWN_SCRIPT ")\n"
 "to deconfigure it\n"
@@ -1888,8 +1889,8 @@ processed and applied to -net user. Mixing them with the 
new configuration
 syntax gives undefined results. Their use for new applications is discouraged
 as they will be removed from future versions.
 
-@item -netdev 
tap,id=@var{id}[,fd=@var{h}][,ifname=@var{name}][,script=@var{file}][,downscript=@var{dfile}][,helper=@var{helper}]
-@itemx -net 
tap[,vlan=@var{n}][,name=@var{name}][,fd=@var{h}][,ifname=@var{name}][,script=@var{file}][,downscript=@var{dfile}][,helper=@var{helper}]
+@item -netdev 
tap,id=@var{id}[,fd=@var{h}][,ifname=@var{name}][,script=@var{file}][,downscript=@var{dfile}][,br=@var{bridge}][,helper=@var{helper}]
+@itemx -net 
tap[,vlan=@var{n}][,name=@var{name}][,fd=@var{h}][,ifname=@var{name}][,script=@var{file}][,downscript=@var{dfile}][,br=@var{bridge}][,helper=@var{helper}]
 Connect the host TAP network interface @var{name} to VLAN @var{n}.
 
 Use the network script @var{file} to configure it and the network script
@@ -1900,8 +1901,9 @@ automatically provides one. The default network configure 
script is
 to disable script execution.
 
 If running QEMU as an unprivileged user, use the network helper
-@var{helper} to configure the TAP interface. The default network
-helper executable is @file{/path/to/qemu-bridge-helper}.
+@var{helper} to configure the TAP interface and attach it to the bridge.
+The default network helper executable is @file{/path/to/qemu-bridge-helper}
+and the default bridge device is @file{br0}.
 
 @option{fd}=@var{h} can be used to specify the handle of an already
 opened host TAP interface.
-- 
2.7.4




[Qemu-devel] [PULL V2 15/27] e1000: fix buliding complaint

2016-09-27 Thread Jason Wang
From: Gonglei 

hw/net/e1000e_core.c:56: warning: e1000e_set_interrupt_cause declared inline 
after being called
hw/net/e1000e_core.c:56: warning: previous declaration of 
e1000e_set_interrupt_cause was here

Signed-off-by: Gonglei 
Reviewed-by: Dmitry Fleytman 
Signed-off-by: Jason Wang 
---
 hw/net/e1000e_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index e0bd31c..03e3c46 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -2168,7 +2168,7 @@ e1000e_update_interrupt_state(E1000ECore *core)
 }
 }
 
-static inline void
+static void
 e1000e_set_interrupt_cause(E1000ECore *core, uint32_t val)
 {
 trace_e1000e_irq_set_cause_entry(val, core->mac[ICR]);
-- 
2.7.4




[Qemu-devel] [PULL V2 12/27] filter-rewriter: rewrite tcp packet to keep secondary connection

2016-09-27 Thread Jason Wang
From: Zhang Chen 

We will rewrite tcp packet secondary received and sent.
When colo guest is a tcp server.

Firstly, client start a tcp handshake. the packet's seq=client_seq,
ack=0,flag=SYN. COLO primary guest get this pkt and mirror(filter-mirror)
to secondary guest, secondary get it use filter-redirector.
Then,primary guest response pkt
(seq=primary_seq,ack=client_seq+1,flag=ACK|SYN).
secondary guest response pkt
(seq=secondary_seq,ack=client_seq+1,flag=ACK|SYN).
In here,we use filter-rewriter save the secondary_seq to it's tcp connection.
Finally handshake,client send pkt
(seq=client_seq+1,ack=primary_seq+1,flag=ACK).
Here,filter-rewriter can get primary_seq, and rewrite ack from primary_seq+1
to secondary_seq+1, recalculate checksum. So the secondary tcp connection
kept good.

When we send/recv packet.
client send pkt(seq=client_seq+1+data_len,ack=primary_seq+1,flag=ACK|PSH).
filter-rewriter rewrite ack and send to secondary guest.

primary guest response pkt
(seq=primary_seq+1,ack=client_seq+1+data_len,flag=ACK)
secondary guest response pkt
(seq=secondary_seq+1,ack=client_seq+1+data_len,flag=ACK)
we rewrite secondary guest seq from secondary_seq+1 to primary_seq+1.
So tcp connection kept good.

In code We use offset( = secondary_seq - primary_seq )
to rewrite seq or ack.
handle_primary_tcp_pkt: tcp_pkt->th_ack += offset;
handle_secondary_tcp_pkt: tcp_pkt->th_seq -= offset;

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
Signed-off-by: Jason Wang 
---
 net/colo.c|   2 +
 net/colo.h|   7 
 net/filter-rewriter.c | 112 +-
 trace-events  |   5 +++
 4 files changed, 124 insertions(+), 2 deletions(-)

diff --git a/net/colo.c b/net/colo.c
index 124994c..6a6eacd 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -134,6 +134,8 @@ Connection *connection_new(ConnectionKey *key)
 
 conn->ip_proto = key->ip_proto;
 conn->processing = false;
+conn->offset = 0;
+conn->syn_flag = 0;
 g_queue_init(&conn->primary_list);
 g_queue_init(&conn->secondary_list);
 
diff --git a/net/colo.h b/net/colo.h
index 6720a3a..7c524f3 100644
--- a/net/colo.h
+++ b/net/colo.h
@@ -62,6 +62,13 @@ typedef struct Connection {
 /* flag to enqueue unprocessed_connections */
 bool processing;
 uint8_t ip_proto;
+/* offset = secondary_seq - primary_seq */
+tcp_seq  offset;
+/*
+ * we use this flag update offset func
+ * run once in independent tcp connection
+ */
+int syn_flag;
 } Connection;
 
 uint32_t connection_key_hash(const void *opaque);
diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index 9bf80d3..89abe72 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -10,6 +10,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "trace.h"
 #include "net/colo.h"
 #include "net/filter.h"
 #include "net/net.h"
@@ -58,6 +59,93 @@ static int is_tcp_packet(Packet *pkt)
 }
 }
 
+/* handle tcp packet from primary guest */
+static int handle_primary_tcp_pkt(NetFilterState *nf,
+  Connection *conn,
+  Packet *pkt)
+{
+struct tcphdr *tcp_pkt;
+
+tcp_pkt = (struct tcphdr *)pkt->transport_header;
+if (trace_event_get_state(TRACE_COLO_FILTER_REWRITER_DEBUG)) {
+char *sdebug, *ddebug;
+sdebug = strdup(inet_ntoa(pkt->ip->ip_src));
+ddebug = strdup(inet_ntoa(pkt->ip->ip_dst));
+trace_colo_filter_rewriter_pkt_info(__func__, sdebug, ddebug,
+ntohl(tcp_pkt->th_seq), ntohl(tcp_pkt->th_ack),
+tcp_pkt->th_flags);
+trace_colo_filter_rewriter_conn_offset(conn->offset);
+g_free(sdebug);
+g_free(ddebug);
+}
+
+if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_SYN)) {
+/*
+ * we use this flag update offset func
+ * run once in independent tcp connection
+ */
+conn->syn_flag = 1;
+}
+
+if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_ACK)) {
+if (conn->syn_flag) {
+/*
+ * offset = secondary_seq - primary seq
+ * ack packet sent by guest from primary node,
+ * so we use th_ack - 1 get primary_seq
+ */
+conn->offset -= (ntohl(tcp_pkt->th_ack) - 1);
+conn->syn_flag = 0;
+}
+/* handle packets to the secondary from the primary */
+tcp_pkt->th_ack = htonl(ntohl(tcp_pkt->th_ack) + conn->offset);
+
+net_checksum_calculate((uint8_t *)pkt->data, pkt->size);
+}
+
+return 0;
+}
+
+/* handle tcp packet from secondary guest */
+static int handle_secondary_tcp_pkt(NetFilterState *nf,
+Connection *conn,
+Packet *pkt)
+{
+struct tcphdr *tcp_pkt;
+
+tcp_pkt = (struct tcphdr *)pkt->transport_header;
+
+if (trace_event_get_state(TRACE_COLO_FILTER_REWRITER_DEB

[Qemu-devel] [PULL V2 11/27] filter-rewriter: track connection and parse packet

2016-09-27 Thread Jason Wang
From: Zhang Chen 

We use net/colo.h to track connection and parse packet

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
Signed-off-by: Jason Wang 
---
 net/colo.c| 14 ++
 net/colo.h|  1 +
 net/filter-rewriter.c | 50 ++
 3 files changed, 65 insertions(+)

diff --git a/net/colo.c b/net/colo.c
index 94f5992..124994c 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -114,6 +114,20 @@ void fill_connection_key(Packet *pkt, ConnectionKey *key)
 }
 }
 
+void reverse_connection_key(ConnectionKey *key)
+{
+struct in_addr tmp_ip;
+uint16_t tmp_port;
+
+tmp_ip = key->src;
+key->src = key->dst;
+key->dst = tmp_ip;
+
+tmp_port = key->src_port;
+key->src_port = key->dst_port;
+key->dst_port = tmp_port;
+}
+
 Connection *connection_new(ConnectionKey *key)
 {
 Connection *conn = g_slice_new(Connection);
diff --git a/net/colo.h b/net/colo.h
index 9a7d5e0..6720a3a 100644
--- a/net/colo.h
+++ b/net/colo.h
@@ -68,6 +68,7 @@ uint32_t connection_key_hash(const void *opaque);
 int connection_key_equal(const void *opaque1, const void *opaque2);
 int parse_packet_early(Packet *pkt);
 void fill_connection_key(Packet *pkt, ConnectionKey *key);
+void reverse_connection_key(ConnectionKey *key);
 Connection *connection_new(ConnectionKey *key);
 void connection_destroy(void *opaque);
 Connection *connection_get(GHashTable *connection_track_table,
diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index de29f07..9bf80d3 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -44,6 +44,20 @@ static void filter_rewriter_flush(NetFilterState *nf)
 }
 }
 
+/*
+ * Return 1 on success, if return 0 means the pkt
+ * is not TCP packet
+ */
+static int is_tcp_packet(Packet *pkt)
+{
+if (!parse_packet_early(pkt) &&
+pkt->ip->ip_p == IPPROTO_TCP) {
+return 1;
+} else {
+return 0;
+}
+}
+
 static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
  NetClientState *sender,
  unsigned flags,
@@ -51,11 +65,47 @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
  int iovcnt,
  NetPacketSent *sent_cb)
 {
+RewriterState *s = FILTER_COLO_REWRITER(nf);
+Connection *conn;
+ConnectionKey key;
+Packet *pkt;
+ssize_t size = iov_size(iov, iovcnt);
+char *buf = g_malloc0(size);
+
+iov_to_buf(iov, iovcnt, 0, buf, size);
+pkt = packet_new(buf, size);
+
 /*
  * if we get tcp packet
  * we will rewrite it to make secondary guest's
  * connection established successfully
  */
+if (pkt && is_tcp_packet(pkt)) {
+
+fill_connection_key(pkt, &key);
+
+if (sender == nf->netdev) {
+/*
+ * We need make tcp TX and RX packet
+ * into one connection.
+ */
+reverse_connection_key(&key);
+}
+conn = connection_get(s->connection_track_table,
+  &key,
+  NULL);
+
+if (sender == nf->netdev) {
+/* NET_FILTER_DIRECTION_TX */
+/* handle_primary_tcp_pkt */
+} else {
+/* NET_FILTER_DIRECTION_RX */
+/* handle_secondary_tcp_pkt */
+}
+}
+
+packet_destroy(pkt, NULL);
+pkt = NULL;
 return 0;
 }
 
-- 
2.7.4




[Qemu-devel] [PULL V2 19/27] e1000e: Flush receive queues on link up

2016-09-27 Thread Jason Wang
From: Dmitry Fleytman 

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Jason Wang 
---
 hw/net/e1000e_core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index ea2a484..e8d50f6 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -1807,6 +1807,7 @@ e1000e_core_set_link_status(E1000ECore *core)
core->autoneg_timer);
 } else {
 e1000x_update_regs_on_link_up(core->mac, core->phy[0]);
+e1000e_start_recv(core);
 }
 }
 
@@ -2187,6 +2188,8 @@ e1000e_autoneg_timer(void *opaque)
 E1000ECore *core = opaque;
 if (!qemu_get_queue(core->owner_nic)->link_down) {
 e1000x_update_regs_on_autoneg_done(core->mac, core->phy[0]);
+e1000e_start_recv(core);
+
 e1000e_update_flowctl_status(core);
 /* signal link status change to the guest */
 e1000e_set_interrupt_cause(core, E1000_ICR_LSC);
-- 
2.7.4




[Qemu-devel] [PULL V2 17/27] net: limit allocation in nc_sendv_compat

2016-09-27 Thread Jason Wang
From: Peter Lieven 

we only need to allocate enough memory to hold the packet. This might be
less than NET_BUFSIZE. Additionally fail early if the packet is larger
than NET_BUFSIZE.

Signed-off-by: Peter Lieven 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Jason Wang 
---
 net/net.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/net.c b/net/net.c
index 0bec096..ec984bf 100644
--- a/net/net.c
+++ b/net/net.c
@@ -690,9 +690,13 @@ static ssize_t nc_sendv_compat(NetClientState *nc, const 
struct iovec *iov,
 buffer = iov[0].iov_base;
 offset = iov[0].iov_len;
 } else {
-buf = g_new(uint8_t, NET_BUFSIZE);
+offset = iov_size(iov, iovcnt);
+if (offset > NET_BUFSIZE) {
+return -1;
+}
+buf = g_malloc(offset);
 buffer = buf;
-offset = iov_to_buf(iov, iovcnt, 0, buf, NET_BUFSIZE);
+offset = iov_to_buf(iov, iovcnt, 0, buf, offset);
 }
 
 if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) {
-- 
2.7.4




[Qemu-devel] [PULL V2 14/27] docs: Add documentation for COLO-proxy

2016-09-27 Thread Jason Wang
From: Zhang Chen 

Introduce the design of COLO-proxy, and how to use it.

Signed-off-by: Zhang Chen 
Signed-off-by: Jason Wang 
---
 docs/colo-proxy.txt | 188 
 1 file changed, 188 insertions(+)
 create mode 100644 docs/colo-proxy.txt

diff --git a/docs/colo-proxy.txt b/docs/colo-proxy.txt
new file mode 100644
index 000..76767cb
--- /dev/null
+++ b/docs/colo-proxy.txt
@@ -0,0 +1,188 @@
+COLO-proxy
+--
+Copyright (c) 2016 Intel Corporation
+Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+Copyright (c) 2016 Fujitsu, Corp.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+This document gives an overview of COLO proxy's design.
+
+== Background ==
+COLO-proxy is a part of COLO project. It is used
+to compare the network package to help COLO decide
+whether to do checkpoint. With COLO-proxy's help,
+COLO greatly improves the performance.
+
+The filter-redirector, filter-mirror, colo-compare
+and filter-rewriter compose the COLO-proxy.
+
+== Architecture ==
+
+COLO-Proxy is based on qemu netfilter and it's a plugin for qemu netfilter
+(except colo-compare). It keep Secondary VM connect normally to
+client and compare packets sent by PVM with sent by SVM.
+If the packet difference, notify COLO-frame to do checkpoint and send
+all primary packet has queued. Otherwise just send the queued primary
+packet and drop the queued secondary packet.
+
+Below is a COLO proxy ascii figure:
+
+ Primary qemu   
Secondary qemu
++--+   
++
+| +--+ |   |  
+---+ |
+| |  | |   |  |
   | |
+| |guest | |   |  |
guest  | |
+| |  | |   |  |
   | |
+| +---^--+---+ |   |  
+-+++ |
+| |  | |   |   
 ^|  |
+| |  | |   |   
 ||  |
+| |  +--+  |   
 ||  |
+|netfilter|  |   | ||  |   
netfilter||  |
+| +--+ ++  ||  |  
+---+ |
+| |   |  |   |  |out   ||  |  |
 ||  filter excute order   | |
+| |   |  |  +-+||  |  |
 || +--->  | |
+| |   |  |  ||  | |||  |  |
 ||   TCP  | |
+| | +-+--+-+  +-v+ +-v+ |pri +++sec||  |  | 
++  +---++---v+rewriter++  ++ | |
+| | |  |  |  | |  | |in  | |in ||  |  | |  
  |  ||  |  || | |
+| | |  filter  |  |  filter  | |  filter  +-->  colo   <--+ +> 
 filter   +--> adjust |   adjust +-->   filter   | | |
+| | |  mirror  |  |redirector| |redirector| || compare |   |  ||  | | 
redirector |  | ack|   seq|  | redirector | | |
+| | |  |  |  | |  | || |   |  ||  | |  
  |  ||  |  || | |
+| | +^-+  ++-+ +--+ |+-+   |  ||  | 
++  ++--+  +---++ | |
+| |  |   tx|   rx   rx  |  |  ||  |
txall   |  rx  | |
+| |  | ||  |  ||  
+---+ |
+| |  | +--+ |  |  ||   
||
+| |  |   filter excute order  | |  |  ||   
||
+| |   

[Qemu-devel] [PULL V2 13/27] MAINTAINERS: add maintainer for COLO-proxy

2016-09-27 Thread Jason Wang
From: Zhang Chen 

add Zhang Chen and Li zhijian as co-maintainers of COLO-proxy.

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
Signed-off-by: Jason Wang 
---
 MAINTAINERS | 9 +
 1 file changed, 9 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index a3a2ad7..f3c1f7f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1364,6 +1364,15 @@ F: util/uuid.c
 F: include/qemu/uuid.h
 F: tests/test-uuid.c
 
+COLO Proxy
+M: Zhang Chen 
+M: Li Zhijian 
+S: Supported
+F: docs/colo-proxy.txt
+F: net/colo*
+F: net/filter-rewriter.c
+F: net/filter-mirror.c
+
 Usermode Emulation
 --
 Overall
-- 
2.7.4




[Qemu-devel] [PULL V2 21/27] e1000e: Fix PBACLR implementation

2016-09-27 Thread Jason Wang
From: Dmitry Fleytman 

This patch fixes incorrect check for
interrypt type being used.

PBSCLR register is valid for MSI-X only.

See spec. 10.2.3.13 MSI—X PBA Clear

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Jason Wang 
---
 hw/net/e1000e_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index a198a88..a5751ac 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -2347,7 +2347,7 @@ e1000e_set_pbaclr(E1000ECore *core, int index, uint32_t 
val)
 
 core->mac[PBACLR] = val & E1000_PBACLR_VALID_MASK;
 
-if (msix_enabled(core->owner)) {
+if (!msix_enabled(core->owner)) {
 return;
 }
 
-- 
2.7.4




[Qemu-devel] [PULL V2 20/27] e1000e: Fix CTRL_EXT.EIAME behavior

2016-09-27 Thread Jason Wang
From: Dmitry Fleytman 

CTRL_EXT.EIAME bit controls clearing of IAM bits,
but current code clears IMS bits instead.

See spec. 10.2.2.5 Extended Device Control Register.

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Jason Wang 
---
 hw/net/e1000e_core.c | 4 ++--
 hw/net/trace-events  | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index e8d50f6..a198a88 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -2008,8 +2008,8 @@ e1000e_msix_notify_one(E1000ECore *core, uint32_t cause, 
uint32_t int_cfg)
 }
 
 if (core->mac[CTRL_EXT] & E1000_CTRL_EXT_EIAME) {
-trace_e1000e_irq_ims_clear_eiame(core->mac[IAM], cause);
-e1000e_clear_ims_bits(core, core->mac[IAM] & cause);
+trace_e1000e_irq_iam_clear_eiame(core->mac[IAM], cause);
+core->mac[IAM] &= ~cause;
 }
 
 trace_e1000e_irq_icr_clear_eiac(core->mac[ICR], core->mac[EIAC]);
diff --git a/hw/net/trace-events b/hw/net/trace-events
index 47ab14a..1a5c909 100644
--- a/hw/net/trace-events
+++ b/hw/net/trace-events
@@ -223,7 +223,7 @@ e1000e_irq_icr_read_entry(uint32_t icr) "Starting ICR read. 
Current ICR: 0x%x"
 e1000e_irq_icr_read_exit(uint32_t icr) "Ending ICR read. Current ICR: 0x%x"
 e1000e_irq_icr_clear_zero_ims(void) "Clearing ICR on read due to zero IMS"
 e1000e_irq_icr_clear_iame(void) "Clearing ICR on read due to IAME"
-e1000e_irq_ims_clear_eiame(uint32_t iam, uint32_t cause) "Clearing IMS due to 
EIAME, IAM: 0x%X, cause: 0x%X"
+e1000e_irq_iam_clear_eiame(uint32_t iam, uint32_t cause) "Clearing IMS due to 
EIAME, IAM: 0x%X, cause: 0x%X"
 e1000e_irq_icr_clear_eiac(uint32_t icr, uint32_t eiac) "Clearing ICR bits due 
to EIAC, ICR: 0x%X, EIAC: 0x%X"
 e1000e_irq_ims_clear_set_imc(uint32_t val) "Clearing IMS bits due to IMC write 
0x%x"
 e1000e_irq_fire_delayed_interrupts(void) "Firing delayed interrupts"
-- 
2.7.4




[Qemu-devel] [PULL V2 22/27] e1000e: Fix OTHER interrupts processing for MSI-X

2016-09-27 Thread Jason Wang
From: Dmitry Fleytman 

Interrupt mask for legacy OTHER causes should
not apply to MSI-X OTHER cause.

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Jason Wang 
---
 hw/net/e1000e_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index a5751ac..d26b611 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -2131,7 +2131,7 @@ e1000e_update_interrupt_state(E1000ECore *core)
 
 /* Set ICR[OTHER] for MSI-X */
 if (is_msix) {
-if (core->mac[ICR] & core->mac[IMS] & E1000_ICR_OTHER_CAUSES) {
+if (core->mac[ICR] & E1000_ICR_OTHER_CAUSES) {
 core->mac[ICR] |= E1000_ICR_OTHER;
 trace_e1000e_irq_add_msi_other(core->mac[ICR]);
 }
-- 
2.7.4




[Qemu-devel] [PULL V2 18/27] e1000e: Flush all receive queues on receive enable

2016-09-27 Thread Jason Wang
From: Dmitry Fleytman 

Before this patch first netdev queue only was flushed.

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Jason Wang 
---
 hw/net/e1000e.c  | 2 +-
 hw/net/e1000e_core.c | 2 +-
 hw/net/e1000e_core.h | 3 +++
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c
index bad43f4..4994e1c 100644
--- a/hw/net/e1000e.c
+++ b/hw/net/e1000e.c
@@ -400,7 +400,7 @@ static void e1000e_write_config(PCIDevice *pci_dev, 
uint32_t address,
 
 if (range_covers_byte(address, len, PCI_COMMAND) &&
 (pci_dev->config[PCI_COMMAND] & PCI_COMMAND_MASTER)) {
-qemu_flush_queued_packets(qemu_get_queue(s->nic));
+e1000e_start_recv(&s->core);
 }
 }
 
diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 03e3c46..ea2a484 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -953,7 +953,7 @@ e1000e_has_rxbufs(E1000ECore *core, const E1000E_RingInfo 
*r,
  core->rx_desc_buf_size;
 }
 
-static inline void
+void
 e1000e_start_recv(E1000ECore *core)
 {
 int i;
diff --git a/hw/net/e1000e_core.h b/hw/net/e1000e_core.h
index 5f413a9..1ff6978 100644
--- a/hw/net/e1000e_core.h
+++ b/hw/net/e1000e_core.h
@@ -144,3 +144,6 @@ e1000e_receive(E1000ECore *core, const uint8_t *buf, size_t 
size);
 
 ssize_t
 e1000e_receive_iov(E1000ECore *core, const struct iovec *iov, int iovcnt);
+
+void
+e1000e_start_recv(E1000ECore *core);
-- 
2.7.4




[Qemu-devel] [PULL V2 26/27] mcf_fec: fix error in qemu_send_packet argument

2016-09-27 Thread Jason Wang
From: Paolo Bonzini 

This uses the wrong frame size for packets composed of multiple
descriptors.

Signed-off-by: Paolo Bonzini 
Signed-off-by: Jason Wang 
---
 hw/net/mcf_fec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/mcf_fec.c b/hw/net/mcf_fec.c
index d31fea1..dc61bac 100644
--- a/hw/net/mcf_fec.c
+++ b/hw/net/mcf_fec.c
@@ -177,7 +177,7 @@ static void mcf_fec_do_tx(mcf_fec_state *s)
 if (bd.flags & FEC_BD_L) {
 /* Last buffer in frame.  */
 DPRINTF("Sending packet\n");
-qemu_send_packet(qemu_get_queue(s->nic), frame, len);
+qemu_send_packet(qemu_get_queue(s->nic), frame, frame_size);
 ptr = frame;
 frame_size = 0;
 s->eir |= FEC_INT_TXF;
-- 
2.7.4




Re: [Qemu-devel] [PATCH] proto: add 'shift' extension.

2016-09-27 Thread Paolo Bonzini
> We could go in a different direction and export flag
> 'has_zero_init' which will report that the storage is
> initialized with all zeroes at the moment. In this
> case mirroring code will not fall into this
> branch.

Why don't you add the zero_init flag to QEMU's NBD driver instead?

Thanks,

Paolo



[Qemu-devel] [PULL V2 24/27] e1000e: Fix EIAC register implementation

2016-09-27 Thread Jason Wang
From: Dmitry Fleytman 

This patch fixes 2 issues:

1. Bits set in EIAC register should be cleared
   from IMS when EIAM is not used.
2. Only bit that corresonds to the interrupt being
   raised should be cleared.

See spec. 10.2.4.7 Interrupt Auto Clear

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Jason Wang 
---
 hw/net/e1000e_core.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 0298136..6505983 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -2015,13 +2015,17 @@ e1000e_msix_notify_one(E1000ECore *core, uint32_t 
cause, uint32_t int_cfg)
 
 trace_e1000e_irq_icr_clear_eiac(core->mac[ICR], core->mac[EIAC]);
 
-if (core->mac[EIAC] & E1000_ICR_OTHER) {
-effective_eiac = (core->mac[EIAC] & E1000_EIAC_MASK) |
- E1000_ICR_OTHER_CAUSES;
-} else {
-effective_eiac = core->mac[EIAC] & E1000_EIAC_MASK;
+effective_eiac = core->mac[EIAC] & cause;
+
+if (effective_eiac == E1000_ICR_OTHER) {
+effective_eiac |= E1000_ICR_OTHER_CAUSES;
 }
+
 core->mac[ICR] &= ~effective_eiac;
+
+if (!(core->mac[CTRL_EXT] & E1000_CTRL_EXT_IAME)) {
+core->mac[IMS] &= ~effective_eiac;
+}
 }
 
 static void
-- 
2.7.4




[Qemu-devel] [PULL V2 23/27] e1000e: Fix spurious RX TCP ACK interrupts

2016-09-27 Thread Jason Wang
From: Dmitry Fleytman 

Do not raise ACK interrupts when
RFCTL.ACKDIS bit is set (see spec. 10.2.5.16).

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Jason Wang 
---
 hw/net/e1000e_core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index d26b611..0298136 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -1710,7 +1710,8 @@ e1000e_receive_iov(E1000ECore *core, const struct iovec 
*iov, int iovcnt)
 }
 
 /* Perform ACK receive detection */
-if (e1000e_is_tcp_ack(core, core->rx_pkt)) {
+if  (!(core->mac[RFCTL] & E1000_RFCTL_ACK_DIS) &&
+ (e1000e_is_tcp_ack(core, core->rx_pkt))) {
 n |= E1000_ICS_ACK;
 }
 
-- 
2.7.4




Re: [Qemu-devel] [PATCH RFC] tests: Run qtest cases in parallel

2016-09-27 Thread Daniel P. Berrange
On Fri, Sep 23, 2016 at 03:58:07PM +0800, Fam Zheng wrote:
> On Wed, 09/21 14:24, John Snow wrote:
> > 
> > 
> > On 08/12/2016 05:19 AM, Fam Zheng wrote:
> > > Previously all test cases in a category, such as check-qtest-y, are
> > > executed in a single long gtester command. This patch separates each
> > > test into its own make target to allow better parallism.
> > > 
> > > Signed-off-by: Fam Zheng 
> > > ---
> > > 
> > > This saves 50% of the time "make check takes" compared to on master
> > > (I use -j8). RFC because I'm not sure if the new gcov usage is correct
> > > with the now much higher level of parallism compared to before.
> > > ---
> > >  tests/Makefile.include | 34 ++
> > >  1 file changed, 18 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/tests/Makefile.include b/tests/Makefile.include
> > > index 14be491..9bf0326 100644
> > > --- a/tests/Makefile.include
> > > +++ b/tests/Makefile.include
> > > @@ -691,27 +691,29 @@ GCOV_OPTIONS = -n $(if $(V),-f,)
> > >  # gtester tests, possibly with verbose output
> > > 
> > >  .PHONY: $(patsubst %, check-qtest-%, $(QTEST_TARGETS))
> > > -$(patsubst %, check-qtest-%, $(QTEST_TARGETS)): check-qtest-%: 
> > > $(check-qtest-y)
> > > +
> > > +qtest-run-%: tests/%
> > >   $(if $(CONFIG_GCOV),@rm -f *.gcda */*.gcda */*/*.gcda */*/*/*.gcda,)
> > > - $(call quiet-command,QTEST_QEMU_BINARY=$*-softmmu/qemu-system-$* \
> > > + $(call quiet-command,\
> > > + $(if $(QTEST_TARGET), \
> > > + 
> > > QTEST_QEMU_BINARY=$(QTEST_TARGET)-softmmu/qemu-system-$(QTEST_TARGET)) \
> > >   QTEST_QEMU_IMG=qemu-img$(EXESUF) \
> > >   MALLOC_PERTURB_=$${MALLOC_PERTURB_:-$$((RANDOM % 255 + 1))} \
> > > - gtester $(GTESTER_OPTIONS) -m=$(SPEED) $(check-qtest-$*-y) 
> > > $(check-qtest-generic-y),"GTESTER $@")
> > > - $(if $(CONFIG_GCOV),@for f in $(gcov-files-$*-y) 
> > > $(gcov-files-generic-y); do \
> > > -   echo Gcov report for $$f:;\
> > > -   $(GCOV) $(GCOV_OPTIONS) $$f -o `dirname $$f`; \
> > > - done,)
> > > + gtester $< $(GTESTER_OPTIONS) -m=$(SPEED),"GTESTER $<")
> > > + $(if $(CONFIG_GCOV), @echo Gcov report for $<:;\
> > > +   $(GCOV) $(GCOV_OPTIONS) $< -o `dirname $<`; \
> > > + )
> > > +
> > > +$(foreach target, $(QTEST_TARGETS), \
> > > + $(eval check-qtest-$(target): QTEST_TARGET := $(target)) \
> > > + $(eval check-qtest-$(target): $(patsubst tests/%, qtest-run-%, \
> > > + $(check-qtest-y) \
> > > + $(check-qtest-$(target)-y) \
> > > + $(check-qtest-generic-y))) \
> > > +)
> > > 
> > >  .PHONY: $(patsubst %, check-%, $(check-unit-y))
> > > -$(patsubst %, check-%, $(check-unit-y)): check-%: %
> > > - $(if $(CONFIG_GCOV),@rm -f *.gcda */*.gcda */*/*.gcda */*/*/*.gcda,)
> > > - $(call quiet-command, \
> > > - MALLOC_PERTURB_=$${MALLOC_PERTURB_:-$$((RANDOM % 255 + 1))} \
> > > - gtester $(GTESTER_OPTIONS) -m=$(SPEED) $*,"GTESTER $*")
> > > - $(if $(CONFIG_GCOV),@for f in $(gcov-files-$(subst tests/,,$*)-y) 
> > > $(gcov-files-generic-y); do \
> > > -   echo Gcov report for $$f:;\
> > > -   $(GCOV) $(GCOV_OPTIONS) $$f -o `dirname $$f`; \
> > > - done,)
> > > +$(patsubst %, check-%, $(check-unit-y)): check-tests/%: qtest-run-%
> > > 
> > >  # gtester tests with XML output
> > > 
> > > 
> > 
> > I can't vouch for gcov either, but:
> 
> Too bad that this seems to be too big a hammer that breaks the way gcov was
> used: 1) the "rm *.gcda" command now runs in each test, so it's harder to
> get an overall coverage report; 2) there is a risk that the parallism may
> corrupt those files. :(

Almost no one will ever build with gcov support, so how about just
making the "normal" case run in parallel and add a special target
for running under gcov which serializes.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



[Qemu-devel] [PULL V2 25/27] net: mcf: limit buffer descriptor count

2016-09-27 Thread Jason Wang
From: Prasad J Pandit 

ColdFire Fast Ethernet Controller uses buffer descriptors to manage
data flow to/fro receive & transmit queues. While transmitting
packets, it could continue to read buffer descriptors if a buffer
descriptor has length of zero and has crafted values in bd.flags.
Set upper limit to number of buffer descriptors.

Reported-by: Li Qiang 
Signed-off-by: Prasad J Pandit 
Reviewed-by: Paolo Bonzini 
Signed-off-by: Jason Wang 
---
 hw/net/mcf_fec.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/net/mcf_fec.c b/hw/net/mcf_fec.c
index 0ee8ad9..d31fea1 100644
--- a/hw/net/mcf_fec.c
+++ b/hw/net/mcf_fec.c
@@ -23,6 +23,7 @@ do { printf("mcf_fec: " fmt , ## __VA_ARGS__); } while (0)
 #define DPRINTF(fmt, ...) do {} while(0)
 #endif
 
+#define FEC_MAX_DESC 1024
 #define FEC_MAX_FRAME_SIZE 2032
 
 typedef struct {
@@ -149,7 +150,7 @@ static void mcf_fec_do_tx(mcf_fec_state *s)
 uint32_t addr;
 mcf_fec_bd bd;
 int frame_size;
-int len;
+int len, descnt = 0;
 uint8_t frame[FEC_MAX_FRAME_SIZE];
 uint8_t *ptr;
 
@@ -157,7 +158,7 @@ static void mcf_fec_do_tx(mcf_fec_state *s)
 ptr = frame;
 frame_size = 0;
 addr = s->tx_descriptor;
-while (1) {
+while (descnt++ < FEC_MAX_DESC) {
 mcf_fec_read_bd(&bd, addr);
 DPRINTF("tx_bd %x flags %04x len %d data %08x\n",
 addr, bd.flags, bd.length, bd.data);
-- 
2.7.4




Re: [Qemu-devel] [PATCH v3 07/10] ppc/pnv: add XSCOM infrastructure

2016-09-27 Thread Benjamin Herrenschmidt
On Tue, 2016-09-27 at 11:30 +0200, Cédric Le Goater wrote:
> On 09/27/2016 11:10 AM, Cédric Le Goater wrote:
> > 
> > > 
> > > > 
> > > > +#include 
> > > > +
> > > > +static void xscom_complete(uint64_t hmer_bits)
> > > > +{
> > > > +CPUState *cs = current_cpu;
> > > 
> > > Hmm.. is current_cpu a safe thing to use in the case of KVM or MTTCG?
> > 
> > yes, as we are running under cpu_exec when doing this call.
> 
> well, this is not true under the monitor. 
> 
> So we will have to come up with something to handle xscom read/writes 
> from the monitor. Could we use first_cpu in that case ? 

Well, we'll need to find the chip etc... I think the XSCOM bridge (or bus)
should have special methods for the monitor that don't update HMER but
print out the status instead.

Cheers,
Ben.

> Thanks,
> 
> C. 
> 
> > 
> > > 
> > > > 
> > > > +PowerPCCPU *cpu = POWERPC_CPU(cs);
> > > > +CPUPPCState *env = &cpu->env;
> > > > +
> > > > +cpu_synchronize_state(cs);
> > > > +env->spr[SPR_HMER] |= hmer_bits;
> > > > +
> > > > +/* XXX Need a CPU helper to set HMER, also handle gneeration
> > > > + * of HMIs
> > > > + */
> > 
> > Ben, 
> > 
> > The CPU helper would be to replicate the value of the SPR_HMER in all
> > the threads of the core I guess ? 
> > 
> > Thanks,
> > 
> > C.
> > 



Re: [Qemu-devel] [Qemu-stable] [ANNOUNCE] QEMU 2.6.1 Stable released

2016-09-27 Thread Peter Lieven

Am 16.09.2016 um 15:56 schrieb Peter Lieven:

Am 13.09.2016 um 20:04 schrieb Michael Roth:

Quoting Peter Lieven (2016-09-13 10:52:04)

Am 13.09.2016 um 17:42 schrieb Stefan Hajnoczi:


On Thu, Sep 08, 2016 at 03:58:26PM -0500, Michael Roth wrote:
Quoting Stefan Hajnoczi (2016-09-05 12:54:35)

On Fri, Aug 26, 2016 at 01:45:56PM +0200, Peter Lieven wrote:

Am 25.08.2016 um 19:23 schrieb Michael Roth:
Quoting Peter Lieven (2016-08-25 01:38:13)

7c509d1 virtio: decrement vq->inuse in virtqueue_discard()
700f26b virtio: recalculate vq->inuse after migration

Looks like these got posted during the freeze :(


The virtio thing is important because live migration is broken without
the fix as  86cc089 is in 2.6.1.

Not sure I understand the relation to 86cc089. Wouldn't the check
introduced there always pass due to target initializing inuse to 0?

Or is the issue that the fix introduced in 86cc089 is only partially
effective due to inuse not being recalculated properly on target? That might
warrant a 2.6.1.1...

This is what Stefan wrote in the cover letter to the series:

"I should mention this is for QEMU 2.7. These fixes are needed if the
CVE-2016-5403 patch has been applied. Without these patches any device that 
holds VirtQueueElements acros
live migration will terminate with a "Virtqueue size exceeded" error message. 
virtio-balloon and virtio-scsi are affected. virtio-bl
probably too but I haven't tested it."

Maybe

The virtio inuse fixes are needed for stable (v2.6.2?) so that the
spurious "Virtqueue size exceeded" on migration is solved.

The error can be reproduced when there is a VirtQueueElement pending
across migration (e.g. virtio-blk s->rq failed request list).

Thanks for clarifying. I'm planning to do a 2.6.2 to capture these, the
patches Peter mentioned, and some other fixes that came during 2.7 RC
phase.

I have an initial staging tree at:

  https://github.com/mdroth/qemu/commits/stable-2.6-staging

There's still a few PULLs in flight with patches I plan to pull in, but
hoping to send out the patch round-up early next week and a release the
following week.

Two more candidates for stable:

4b7f91e virtio: zero vq->inuse in virtio_reset()
104e70c virtio-balloon: discard virtqueue element on reset

They also deal with "Virtqueue size exceeded" errors.

Stefan

There also seems to be an regression (segfault) in the VNC server in 2.6.1, but 
i am still investigating.

Do you have a reproducer? I can try a bisect. Trying to get the initial
staging tree posted today but want to make sure any known regressions are
addressed beforehand.


Hi Michael,

we have not been able to reproduce anymore. My guess is that our client had a 
bug in the new version
and that the regression can only happen in a special connection state. But we 
are still trying
to reproduce.


We are again able to reproduce the VNC error. The vServer dies with:

qemu: qemu_mutex_lock: Invalid argument

We are working on it.

Peter


Re: [Qemu-devel] [PATCH RFC] tests: Run qtest cases in parallel

2016-09-27 Thread Daniel P. Berrange
On Fri, Sep 23, 2016 at 05:59:05PM +0800, Fam Zheng wrote:
> On Fri, 09/23 09:39, Gonglei (Arei) wrote:
> > 
> > Hi Fam,
> > 
> > 
> > > -Original Message-
> > > From: Qemu-devel
> > > [mailto:qemu-devel-bounces+arei.gonglei=huawei@nongnu.org] On
> > > Behalf Of Fam Zheng
> > > Sent: Friday, September 23, 2016 3:58 PM
> > > To: John Snow
> > > Cc: pbonz...@redhat.com; qemu-devel@nongnu.org
> > > Subject: Re: [Qemu-devel] [PATCH RFC] tests: Run qtest cases in parallel
> > > 
> > > On Wed, 09/21 14:24, John Snow wrote:
> > > >
> > > >
> > > > On 08/12/2016 05:19 AM, Fam Zheng wrote:
> > > > > Previously all test cases in a category, such as check-qtest-y, are
> > > > > executed in a single long gtester command. This patch separates each
> > > > > test into its own make target to allow better parallism.
> > > > >
> > 
> > That's will be great if we can specify a test to run, especially for the 
> > scenario
> > which add one use qtest case.
> > 
> > For example: 
> > 
> >  # make check test-crypto-cipher
> > 
> > then only run the tests/ test-crypto-cipher. 
> > 
> > Do you think it makes sense?
> 
> Or more likely:
> 
> # make check TESTS="test-crypto-cipher test-crypto-hash ..."
> 
> Usually I just extract the gtester command line with V=1 and run it from my
> shell prompt.  Feel free to send a patch, though.

Shouldn't even need todo that in most cases - I tend to just do

  make tests/test-crypto-cipher && ./tess/test-crypto-cipher

If there are tests which rely on some environment set by the Makefile,
then really they should be fixed to have sensible defaults so that they
can be directly executed.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



  1   2   3   4   5   >