date:20150728

Re: [Qemu-devel] [PATCH v4 2/2] QOM: object_property_add() performance improvement

2015-07-28 Thread Paolo Bonzini



On 28/07/2015 08:45, Pavel Fedin wrote:
>  I can try to reengineer this and see what happens. If it works fine, will 
> such rework be accepted? [*] expansion would still be slow, but we could 
> deprecate it.
> 
>  I have just done a search of "[*]" across all *.c files, and here is what i 
> came up with:
> 1. memory_region_init()
> 2. xlnx_zynqmp_init()
> 3. qdev_init_gpio_in_named()
> 4. qdev_init_gpio_out_named()
> 5. qdev_connect_gpio_out_named()
> 6. spapr_dr_connector_new()
> 
>  Cases 2, 3, 4 can be reengineered for sure. The rest - i don't know, however 
> perhaps they are not common cases. I think (1) could also be problematic. How 
> many regions with the same name can we have?

Just worry about 3 and 4, they are the big offenders.

Paolo

Re: [Qemu-devel] [PATCH for-2.4 0/2] AioContext: fix deadlock after aio_context_acquire() race

2015-07-28 Thread Cornelia Huck

On Mon, 27 Jul 2015 17:33:37 +0100
Stefan Hajnoczi  wrote:

> See Patch 2 for details on the deadlock after two aio_context_acquire() calls
> race.  This caused dataplane to hang on startup.
> 
> Patch 1 is a memory leak fix for AioContext that's needed by Patch 2.
> 
> Stefan Hajnoczi (2):
>   AioContext: avoid leaking BHs on cleanup
>   AioContext: force event loop iteration using BH
> 
>  async.c | 29 +++--
>  include/block/aio.h |  3 +++
>  2 files changed, 30 insertions(+), 2 deletions(-)
> 

Just gave this a try: The stripped-down guest that hangs during startup
on master is working fine with these patches applied, and my full setup
works as well.

So,

Tested-by: Cornelia Huck

[Qemu-devel] [PATCH] memory: Add function pointers checks to memory_region_read/write()

2015-07-28 Thread Salva Peiró

The memory.c file directly calls the function pointers provided in
the MemoryRegionOps to handle read and write operations for memory regions.
The function pointers are called without checking if the function
pointers are initialised, therefore, leading to QEMU SIGSEGV when the
operations are not initialised.

The patch adds explicit checks to function pointers before issuing the calls.

Signed-off-by: Salva Peiró 
---
 memory.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/memory.c b/memory.c
index 5e5f325..d588c5f 100644
--- a/memory.c
+++ b/memory.c
@@ -380,6 +380,9 @@ static MemTxResult 
memory_region_oldmmio_read_accessor(MemoryRegion *mr,
 {
 uint64_t tmp;
 
+if (!mr->ops->old_mmio.read) {
+return MEMTX_ERROR;
+}
 tmp = mr->ops->old_mmio.read[ctz32(size)](mr->opaque, addr);
 trace_memory_region_ops_read(mr, addr, tmp, size);
 *value |= (tmp & mask) << shift;
@@ -396,6 +399,9 @@ static MemTxResult  
memory_region_read_accessor(MemoryRegion *mr,
 {
 uint64_t tmp;
 
+if (!mr->ops->read) {
+return MEMTX_ERROR;
+}
 tmp = mr->ops->read(mr->opaque, addr, size);
 trace_memory_region_ops_read(mr, addr, tmp, size);
 *value |= (tmp & mask) << shift;
@@ -413,6 +419,9 @@ static MemTxResult 
memory_region_read_with_attrs_accessor(MemoryRegion *mr,
 uint64_t tmp = 0;
 MemTxResult r;
 
+if (!mr->ops->read_with_attrs) {
+return MEMTX_ERROR;
+}
 r = mr->ops->read_with_attrs(mr->opaque, addr, &tmp, size, attrs);
 trace_memory_region_ops_read(mr, addr, tmp, size);
 *value |= (tmp & mask) << shift;
@@ -429,6 +438,9 @@ static MemTxResult 
memory_region_oldmmio_write_accessor(MemoryRegion *mr,
 {
 uint64_t tmp;
 
+if (!mr->ops->old_mmio.write) {
+return MEMTX_ERROR;
+}
 tmp = (*value >> shift) & mask;
 trace_memory_region_ops_write(mr, addr, tmp, size);
 mr->ops->old_mmio.write[ctz32(size)](mr->opaque, addr, tmp);
@@ -447,6 +459,9 @@ static MemTxResult 
memory_region_write_accessor(MemoryRegion *mr,
 
 tmp = (*value >> shift) & mask;
 trace_memory_region_ops_write(mr, addr, tmp, size);
+if (!mr->ops->write) {
+return MEMTX_ERROR;
+}
 mr->ops->write(mr->opaque, addr, tmp, size);
 return MEMTX_OK;
 }
@@ -463,6 +478,9 @@ static MemTxResult 
memory_region_write_with_attrs_accessor(MemoryRegion *mr,
 
 tmp = (*value >> shift) & mask;
 trace_memory_region_ops_write(mr, addr, tmp, size);
+if (!mr->ops->write_with_attrs) {
+return MEMTX_ERROR;
+}
 return mr->ops->write_with_attrs(mr->opaque, addr, tmp, size, attrs);
 }
 
@@ -1187,6 +1205,8 @@ void memory_region_init_io(MemoryRegion *mr,
uint64_t size)
 {
 memory_region_init(mr, owner, name, size);
+assert(ops != NULL);
+
 mr->ops = ops;
 mr->opaque = opaque;
 mr->terminates = true;
-- 
2.1.4

[Qemu-devel] memory: Add function pointers checks to memory_region_read/write()

2015-07-28 Thread Salva Peiró


The situation where QEMU crashes while attempting to call to a NULL
function pointer from a non-initialised field in the MemoryRegionOps
struct happens for the majority of emulated devices:
One approach for solving this is to correct it for each device. 
The other approach is to correct the memory_region_read/write caller
functions at memory.c to ensure that only initialised function pointers
are being called. This approach has the benefit of solving this kind of
error for all emulated devices.

The following patch adds function pointers checks to memory_region_read/write()

Re: [Qemu-devel] [PATCH v2 33/45] ivshmem-server: fix hugetlbfs support

2015-07-28 Thread Andrew Jones

On Tue, Jul 28, 2015 at 02:32:45AM +0200, Marc-André Lureau wrote:
> From: Marc-André Lureau 
> 
> As pointed out on the ML by Andrew Jones, glibc no longer permits
> creating POSIX shm on hugetlbfs directly. When given a hugetlbfs path,
> create a shareable file there.
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  contrib/ivshmem-server/ivshmem-server.c | 47 
> -
>  contrib/ivshmem-server/ivshmem-server.h |  3 +--
>  contrib/ivshmem-server/main.c   |  5 ++--
>  3 files changed, 49 insertions(+), 6 deletions(-)
> 
> diff --git a/contrib/ivshmem-server/ivshmem-server.c 
> b/contrib/ivshmem-server/ivshmem-server.c
> index 972fda2..4bf774b 100644
> --- a/contrib/ivshmem-server/ivshmem-server.c
> +++ b/contrib/ivshmem-server/ivshmem-server.c
> @@ -11,6 +11,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "qemu-common.h"
>  #include "qemu/queue.h"
> @@ -271,15 +272,59 @@ ivshmem_server_init(IvshmemServer *server, const char 
> *unix_sock_path,
>  return 0;
>  }
>  
> +#define HUGETLBFS_MAGIC   0x958458f6
> +
> +static long gethugepagesize(const char *path)
> +{
> +struct statfs fs;
> +int ret;
> +
> +do {
> +ret = statfs(path, &fs);
> +} while (ret != 0 && errno == EINTR);
> +
> +if (ret != 0) {
> +if (errno != ENOENT) {
> +fprintf(stderr, "cannot stat shm file %s: %s\n", path,
> +strerror(errno));
> +}
> +return -1;
> +}
> +
> +if (fs.f_type != HUGETLBFS_MAGIC) {
> +return -1;
> +}
> +
> +return fs.f_bsize;
> +}
> +
> +
> +
few extra lines here
>  /* open shm, create and bind to the unix socket */
>  int
>  ivshmem_server_start(IvshmemServer *server)
>  {
>  struct sockaddr_un sun;
>  int shm_fd, sock_fd, ret;
> +long hpagesize;
> +gchar *filename;
>  
>  /* open shm file */
> -shm_fd = shm_open(server->shm_path, O_CREAT|O_RDWR, S_IRWXU);
> +hpagesize = gethugepagesize(server->shm_path);
> +if (hpagesize > 0) {
> +if (server->shm_size < hpagesize) {
should be >, but isn't this forcing the shared memory to be less than
equal to the size of a single hugepage? I think we should allow up to
nr-pages * page-size.

Also, I'm not sure we want the dependency, but what about libhugetlbfs?
It has hugetlbfs_test_path

> +fprintf(stderr, "hugepage must be at least of size: %ld\n",
> +hpagesize);
> +return -1;
> +}
> +filename = g_strdup_printf("%s/ivshmem.XX", server->shm_path);
> +shm_fd = mkstemp(filename);

Shouldn't we change the perms for shm_fd to match the non-hugetlbfs
case? Or change the non-hugetlbfs case to match mkstemp? Actually,
probably the later, because I don't think we want the region to have
execute perms, or do we? Also, I guess the plan is to pass the hugetlbfs
file descriptor around if other host processes need to know where the
memory is, as we never allow a full path. Should we do the same for shm?
i.e. keep them anonymous too and always pass file descriptors?

> +unlink(filename);
> +g_free(filename);
> +} else {
> +shm_fd = shm_open(server->shm_path, O_CREAT|O_RDWR, S_IRWXU);
> +}
> +
>  if (shm_fd < 0) {
>  fprintf(stderr, "cannot open shm file %s: %s\n", server->shm_path,
>  strerror(errno));
> diff --git a/contrib/ivshmem-server/ivshmem-server.h 
> b/contrib/ivshmem-server/ivshmem-server.h
> index 2176d5e..e9b0e7a 100644
> --- a/contrib/ivshmem-server/ivshmem-server.h
> +++ b/contrib/ivshmem-server/ivshmem-server.h
> @@ -81,8 +81,7 @@ typedef struct IvshmemServer {
>   * @server: A pointer to an uninitialized IvshmemServer structure
>   * @unix_sock_path: The pointer to the unix socket file name
>   * @shm_path:   Path to the shared memory. The path corresponds to a 
> POSIX
> - *  shm name. To use a real file, for instance in a 
> hugetlbfs,
> - *  it is possible to use /../../abspath/to/file.
> + *  shm name or a hugetlbfs mount point.
>   * @shm_size:   Size of shared memory
>   * @n_vectors:  Number of interrupt vectors per client
>   * @verbose:True to enable verbose mode
> diff --git a/contrib/ivshmem-server/main.c b/contrib/ivshmem-server/main.c
> index 84ffc4d..cd8d9ed 100644
> --- a/contrib/ivshmem-server/main.c
> +++ b/contrib/ivshmem-server/main.c
> @@ -47,9 +47,8 @@ ivshmem_server_usage(const char *name, int code)
>  " to listen to.\n"
>  " Default=%s\n", 
> IVSHMEM_SERVER_DEFAULT_UNIX_SOCK_PATH);
>  fprintf(stderr, "  -m : path to the shared memory.\n"
> -" The path corresponds to a POSIX shm name. To use 
> a\n"
> -" real file, for instance in a hugetlbfs, use\n"
> -" /../../abspath/to/file.\n"
> +" The path corr

Re: [Qemu-devel] [PATCH RFC v2 29/47] qapi: Replace dirty is_c_ptr() by method c_null()

2015-07-28 Thread Markus Armbruster

Eric Blake  writes:

> On 07/01/2015 02:22 PM, Markus Armbruster wrote:
>> is_c_ptr() looks whether the end of the C text for the type looks like
>> a pointer.  Works, but is fragile.
>> 
>> We now have a better tool: use QAPISchemaType method c_null().  The
>> initializers for non-pointers become prettier: 0, false or the
>> enumeration constant with the value 0 instead of {0}.
>> 
>> One place looks suspicious: we initialize pointers, but not
>> non-pointers.  Either the initialization is superfluous and should be
>> deleted, or the non-pointers need it as well, or something subtle is
>> going on and needs a comment.  Since I lack the time to figure it out
>> now, mark it FIXME.
>
> Commenting on just this part for now (out of time to review this patch
> proper for today):
>
>> @@ -214,7 +208,8 @@ def gen_marshal_input(name, args, ret_type, middle_mode):
>>  header=hdr)
>>  
>>  if ret_type:
>> -if is_c_ptr(ret_type.c_type()):
>> +# FIXME fishy: only pointers are initialized
>> +if ret_type.c_null() == 'NULL':
>>  retval = "%s retval = NULL;" % ret_type.c_type()
>>  else:
>>  retval = "%s retval;" % ret_type.c_type()
>
> Here's an example function impacted by this:
>
> static void qmp_marshal_input_guest_file_open(QDict *args, QObject
> **ret, Error
> **errp)
> {
> Error *local_err = NULL;
> int64_t retval;
> QmpInputVisitor *mi = qmp_input_visitor_new_strict(QOBJECT(args));
> QapiDeallocVisitor *md;
> Visitor *v;
> char *path = NULL;
> bool has_mode = false;
> char *mode = NULL;
> ...
> retval = qmp_guest_file_open(path, has_mode, mode, &local_err);
> if (local_err) {
> goto out;
> }
>
> qmp_marshal_output_guest_file_open(retval, ret, &local_err);
>
> But compare that to any other function that returns a pointer, and
> you'll see the same pattern (the only use of retval is in the final call
> to qmp_marshal_output..., right after assigning retval); that is,

Correct.  Inspection of qapi-commands.py shows retval is only ever read
in the call of qmp_marshal_output_FOO().

> initializing to NULL is dead code, and you could get away with just
> always declaring it instead of worrying about c_null() in this code.
>
> Or maybe we have a leak - if the 'if (local_err)' can ever trigger even
> when a function returned non-NULL, then our out: label is missing a
> free(retval), but only when retval is a pointer.  Or maybe we change the
> code to assert that retval is NULL if local_err is set after calling the
> user's function, to prove we don't have a leak.

Let me rephrase to make sure I understand.

Ignore the (not rets) case, because retval doesn't exist then.

qmp_marshal_output_FOO() visits retval twice.  First, with a QMP output
visitor to do the actual marshalling.  Second, with a QAPI dealloc
visitor to destroy it.

If we execute the assignment to retval, we must go on to call
qmp_marshal_output_FOO(), or else we have a leak.

If we can reach qmp_marshal_output_FOO() without executing the
assignment, we must initialize retval.  If we can't, any initialization
is unused.

gen_call() generates code of the form

retval = qmp_FOO(... args ..., &local_err);
if (local_err) {
goto out;
}

qmp_marshal_output_FOO(retval, ret, &local_err);

Its caller then generates

out:
error_propagate(errp, local_err);

and so forth.

Observe:

1. The assignment dominates the only use.  Therefore, the initialization
   is unused.  Let's drop it in a separate cleanup patch.

2. We can leak retval only when qmp_FOO() returns non-null and local_err
   is non-null.  This must not happen, because:

   a. local_err must be null before the call, and

   b. the call must not return non-null when it sets local_err.

   We could right after out: assert(!local_err || !retval).  Not sure
   it's worthwhile.

TL;DR: I concur with your analysis.

Re: [Qemu-devel] [PATCH v2 42/45] ivshmem: add hostmem backend

2015-07-28 Thread Andrew Jones

On Tue, Jul 28, 2015 at 02:32:54AM +0200, Marc-André Lureau wrote:
> From: Marc-André Lureau 
> 
> Instead of handling allocation, teach ivshmem to use a memory backend.
> This allows to use hugetlbfs backed memory now.
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  hw/misc/ivshmem.c| 85 
> 
>  tests/ivshmem-test.c | 12 
>  2 files changed, 78 insertions(+), 19 deletions(-)
> 
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index ecf31f9..a150500 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -26,6 +26,8 @@
>  #include "qemu/event_notifier.h"
>  #include "qemu/fifo8.h"
>  #include "sysemu/char.h"
> +#include "sysemu/hostmem.h"
> +#include "qapi/visitor.h"
>  
>  #include "hw/misc/ivshmem.h"
>  
> @@ -57,6 +59,8 @@
>  #define IVSHMEM(obj) \
>  OBJECT_CHECK(IVShmemState, (obj), TYPE_IVSHMEM)
>  
> +#define IVSHMEM_MEMDEV_PROP "memdev"
> +
>  typedef struct Peer {
>  int nb_eventfds;
>  EventNotifier *eventfds;
> @@ -72,6 +76,7 @@ typedef struct IVShmemState {
>  PCIDevice parent_obj;
>  /*< public >*/
>  
> +HostMemoryBackend *hostmem;
>  uint32_t intrmask;
>  uint32_t intrstatus;
>  
> @@ -700,9 +705,23 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error 
> **errp)
>  PCI_BASE_ADDRESS_MEM_PREFETCH;;
>  Error *local_err = NULL;
>  
> -s->shm_fd = -1;
> +if (!!s->server_chr + !!s->shmobj + !!s->hostmem != 1) {
> +error_setg(errp, "You must specify either a shmobj, a chardev"
> +   " or a hostmem");
> +return;
> +}
> +
> +if (s->hostmem) {
> +MemoryRegion *mr;
>  
> -if (s->sizearg == NULL) {
> +if (s->sizearg) {
> +error_setg(errp, "hostmem requires no size argument");
> +return;
Just warn instead? "size argument ignored with hostmem" Otherwise
"hostmem must not specify a size argument" would be more clear
> +}
> +
> +mr = host_memory_backend_get_memory(s->hostmem, errp);
> +s->ivshmem_size = memory_region_size(mr);
> +} else if (s->sizearg == NULL) {
>  s->ivshmem_size = 4 << 20; /* 4 MB default */
>  } else {
>  s->ivshmem_size = parse_mem_size(s->sizearg, &local_err);
> @@ -758,7 +777,16 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error 
> **errp)
>  attr |= PCI_BASE_ADDRESS_MEM_TYPE_64;
>  }
>  
> -if (s->server_chr != NULL) {
> +if (s->hostmem != NULL) {
> +MemoryRegion *mr;
> +
> +IVSHMEM_DPRINTF("using hostmem\n");
> +
> +mr = host_memory_backend_get_memory(MEMORY_BACKEND(s->hostmem), 
> errp);
> +vmstate_register_ram(mr, DEVICE(s));
> +memory_region_add_subregion(&s->bar, 0, mr);
> +pci_register_bar(PCI_DEVICE(s), 2, attr, &s->bar);
> +} else if (s->server_chr != NULL) {
>  if (strncmp(s->server_chr->filename, "unix:", 5)) {
>  error_setg(errp, "chardev is not a unix client socket");
>  return;
> @@ -767,12 +795,6 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error 
> **errp)
>  /* if we get a UNIX socket as the parameter we will talk
>   * to the ivshmem server to receive the memory region */
>  
> -if (s->shmobj != NULL) {
> -error_setg(errp, "do not specify both 'chardev' "
> -   "and 'shm' with ivshmem");
> -return;
> -}
> -
>  IVSHMEM_DPRINTF("using shared memory server (socket = %s)\n",
>  s->server_chr->filename);
>  
> @@ -796,11 +818,6 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error 
> **errp)
>  /* just map the file immediately, we're not using a server */
>  int fd;
>  
> -if (s->shmobj == NULL) {
> -error_setg(errp, "Must specify 'chardev' or 'shm' to ivshmem");
> -return;
> -}
> -
>  IVSHMEM_DPRINTF("using shm_open (shm object = %s)\n", s->shmobj);
>  
>  /* try opening with O_EXCL and if it succeeds zero the memory
> @@ -840,14 +857,17 @@ static void pci_ivshmem_exit(PCIDevice *dev)
>  }
>  
>  if (memory_region_is_mapped(&s->ivshmem)) {
> -void *addr = memory_region_get_ram_ptr(&s->ivshmem);
> +if (!s->hostmem) {
> +void *addr = memory_region_get_ram_ptr(&s->ivshmem);
> +
> +if (munmap(addr, s->ivshmem_size) == -1) {
> +error_report("Failed to munmap shared memory %s",
> + strerror(errno));
> +}
> +}
>  
>  vmstate_unregister_ram(&s->ivshmem, DEVICE(dev));
>  memory_region_del_subregion(&s->bar, &s->ivshmem);
> -
> -if (munmap(addr, s->ivshmem_size) == -1) {
> -error_report("Failed to munmap shared memory %s", 
> strerror(errno));
> -}
>  }
>  
>  if (s->eventfd_chr) {
> @@ -955,10 +975,37 @@ static void ivshmem_class_init(ObjectClass *klass, void 
> *da

Re: [Qemu-devel] [PATCH v12 00/15] vfio-pci: pass the aer error to guest

2015-07-28 Thread Chen, Hanxiao

Hi, Alex

> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> To: Chen, Hanxiao
> Cc: qemu-devel@nongnu.org; Chen, Fan
> Subject: Re: [Qemu-devel] [PATCH v12 00/15] vfio-pci: pass the aer error to 
> guest
> 
> On Thu, 2015-07-16 at 12:00 +0800, Chen Hanxiao wrote:
> > From: Chen Fan 
> >
> > For now, when qemu receives an error from host aer report
> > by vfio pci passthough devices, qemu just terminate the guest.
> > Usually user want to know what error occurred
> > rather than stop the guest.
> >
> > This patches add aer capability support for vfio device,
> > then pass the error to guest, and let guest driver to recover
> > from the error.
> > Turning on SERR# for error forwording in bridge control register
> > patch in seabios has been merged as commit 32ec3ee.
> >
> > notes:
> >   this series patches enable aer support single/multi-function,
> >   for multi-function, require all the function of the slot assigned to
> >   VM and on the same slot.
> >
> > Chen Fan (15):
> >   vfio: extract vfio_get_hot_reset_info as a single function
> >   vfio: squeeze out vfio_pci_do_hot_reset for support bus reset
> >   pcie: modify the capability size assert
> >   vfio: make the 4 bytes aligned for capability size
> >   vfio: add pcie extanded capability support
> >   aer: impove pcie_aer_init to support vfio device
> >   vfio: add aer support for vfio device
> >   vfio: add check host bus reset is support or not
> >   pci: add bus reset_notifiers callbacks for host bus reset
> >   vfio: add sec_bus_reset notifier to notify physical bus reset is
> > needed
> >   vfio: modify vfio_pci_hot_reset to support bus reset
> >   vfio: do hot bus reset when do virtual secondary bus reset
> >   pcie_aer: expose pcie_aer_msg() interface
> >   vfio-pci: pass the aer error to guest
> >   vfio: add 'aer' property to expose aercap
> >
> >  hw/pci-bridge/ioh3420.c|   2 +-
> >  hw/pci-bridge/xio3130_downstream.c |   2 +-
> >  hw/pci-bridge/xio3130_upstream.c   |   2 +-
> >  hw/pci/pci.c   |  16 +
> >  hw/pci/pci_bridge.c|   6 +
> >  hw/pci/pcie.c  |   2 +-
> >  hw/pci/pcie_aer.c  |   6 +-
> >  hw/vfio/pci.c  | 636 
> > +
> >  include/hw/pci/pci.h   |   4 +
> >  include/hw/pci/pci_bus.h   |   2 +
> >  include/hw/pci/pcie_aer.h  |   3 +-
> >  11 files changed, 609 insertions(+), 72 deletions(-)
> 
> 
> This seems to be pretty much the same as v11 where I commented that I
> didn't think it was acceptable to have a feature dependent on having all
> the functions assigned without supporting hot-add of multi-function
> devices.  Can you summarize what's changed here and whether that comment
> was addressed.  It would be a courtesy to reviewers to provide at least
> a summary changelog with each new version.  Thanks,
> 

We could hot-unplug all passthrough devices by device_del,
But currently Qemu could not hot-add multi-function pci device.

See TODO in pcie_cap_slot_hotplug_cb:
/* TODO: multifunction hot-plug.
 * Right now, only a device of function = 0 is allowed to be
 * hot plugged/unplugged.
 */
assert(PCI_FUNC(pci_dev->devfn) == 0);

So we had to limit this as a workaround.

Why can't  we add functions one by one to the same slot by device_add?

If we could add functions one by one,
then we can enable aer for the devices once all dependence functions
were added by setting aer as a dynamic property(such as using 
object_property_add, qom-set)

How do you think?

Regards,
- Chen

Re: [Qemu-devel] [PATCH] qmp-shell: add documentation

2015-07-28 Thread Kashyap Chamarthy

On Wed, Jul 01, 2015 at 02:25:49PM -0400, John Snow wrote:
> I should probably document the changes that were made.
> 
> Signed-off-by: John Snow 
> ---
>  scripts/qmp/qmp-shell | 35 +++
>  1 file changed, 35 insertions(+)

Since I did some tests[1] when you wrote these improvements to
qmp-shell, the change looks good, FWIW:

Reviewed-By: Kashyap Chamarthy 

[1]
https://lists.nongnu.org/archive/html/qemu-devel/2015-04/msg04201.html

> diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell
> index 65280d2..fa39bf0 100755
> --- a/scripts/qmp/qmp-shell
> +++ b/scripts/qmp/qmp-shell
> @@ -29,6 +29,41 @@
>  # (QEMU) device_add driver=e1000 id=net1
>  # {u'return': {}}
>  # (QEMU)
> +#
> +# key=value pairs also support Python or JSON object literal subset 
> notations,
> +# without spaces. Dictionaries/objects {} are supported as are arrays [].
> +#
> +#example-command arg-name1={'key':'value','obj'={'prop':"value"}}
> +#
> +# Both JSON and Python formatting should work, including both styles of
> +# string literal quotes. Both paradigms of literal values should work,
> +# including null/true/false for JSON and None/True/False for Python.
> +#
> +#
> +# Transactions have the following multi-line format:
> +#
> +#transaction(
> +#action-name1 [ arg-name1=arg1 ] ... [arg-nameN=argN ]
> +#...
> +#action-nameN [ arg-name1=arg1 ] ... [arg-nameN=argN ]
> +#)
> +#
> +# One line transactions are also supported:
> +#
> +#transaction( action-name1 ... )
> +#
> +# For example:
> +#
> +# (QEMU) transaction(
> +# TRANS> block-dirty-bitmap-add node=drive0 name=bitmap1
> +# TRANS> block-dirty-bitmap-clear node=drive0 name=bitmap0
> +# TRANS> )
> +# {"return": {}}
> +# (QEMU)
> +#
> +# Use the -v and -p options to activate the verbose and pretty-print options,
> +# which will echo back the properly formatted JSON-compliant QMP that is 
> being
> +# sent to QEMU, which is useful for debugging and documentation generation.
>  
>  import qmp
>  import json
> -- 
> 2.1.0
> 
> 

-- 
/kashyap

[Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?

2015-07-28 Thread Dennis Luehring

(i've posted the question already on qemu-disc...@nongnu.org but was 
toled to better use this mailing list)


i've prepared an Debian 7.8.0 image for SPARC64/qemu emulation for C/C++
development before-real-hardware big-endian/unaligned tests

i've benchmarked compiling of single pugixml.cpp 
(https://github.com/zeux/pugixml/blob/master/src/pugixml.cpp)


qemu-system-sparc64: >180sek
x64 native : ~ 2sek

so my sparc64 emulation is around 90 times slower then native x64

my system:

using lastest qemu git 2.3.x, with virtio for harddisk/network and qcow2 
image


https://depositfiles.com/files/sj20aqwp0 (~280MB
press the "regular download" button, wait some seconds, solve the
chapca, "download file in regular mode by browser"

there is pugi_sparc.txt in the 7z which describes how to start,use and
what is installed in the image

qemu runs natively under a ubuntu 15.04 (x64), Core i7, 8GB system doing
nothing but qemu

installed is

gcc/g++ 4.6
make
sshd running

compiling cmake 2.3.2 tooked around 10h
compiling pugixml takes also very very long

"top perf" from guest and host while compiling pugixml don't show big
blockers or something over time
http://pastebin.com/D2fUpPrM

anything i can do to speedup the emulation?

Re: [Qemu-devel] [PATCH for-2.4 0/2] AioContext: fix deadlock after aio_context_acquire() race

2015-07-28 Thread Cornelia Huck

On Tue, 28 Jul 2015 09:07:00 +0200
Cornelia Huck  wrote:

> On Mon, 27 Jul 2015 17:33:37 +0100
> Stefan Hajnoczi  wrote:
> 
> > See Patch 2 for details on the deadlock after two aio_context_acquire() 
> > calls
> > race.  This caused dataplane to hang on startup.
> > 
> > Patch 1 is a memory leak fix for AioContext that's needed by Patch 2.
> > 
> > Stefan Hajnoczi (2):
> >   AioContext: avoid leaking BHs on cleanup
> >   AioContext: force event loop iteration using BH
> > 
> >  async.c | 29 +++--
> >  include/block/aio.h |  3 +++
> >  2 files changed, 30 insertions(+), 2 deletions(-)
> > 
> 
> Just gave this a try: The stripped-down guest that hangs during startup
> on master is working fine with these patches applied, and my full setup
> works as well.
> 
> So,
> 
> Tested-by: Cornelia Huck 

Uh-oh, spoke too soon. It starts, but when I try a virsh managedsave, I
get

qemu-system-s390x: /data/git/yyy/qemu/async.c:242: aio_ctx_finalize: Assertion 
`ctx->first_bh->deleted' failed.

[Qemu-devel] [PULL 4/4] memory: do not add a reference to the owner of aliased regions

2015-07-28 Thread Paolo Bonzini

Very often the owner of the aliased region is the same as the owner of the alias
region itself.  When this happens, the reference count can never go back to 0 
and
the owner is leaked.  This is for example breaking hot-unplug of virtio-pci
devices (the device cannot be plugged back again with the same id).

Another common use for alias is to transform the system I/O address space
into an MMIO regions; in this case the aliased region never dies, so there
is no problem.  Otherwise the owner is always the same for aliasing
and aliased region.

I checked all calls to memory_region_init_alias introduced after commit
dfde4e6 (memory: add ref/unref calls, 2013-05-06) and they do not need the
reference in order to keep the owner of the aliased region alive.

Reported-by: Michael S. Tsirkin 
Tested-by: Michael S. Tsirkin 
Signed-off-by: Paolo Bonzini 
---
 memory.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/memory.c b/memory.c
index 5e5f325..4eb138a 100644
--- a/memory.c
+++ b/memory.c
@@ -859,11 +859,6 @@ static void memory_region_destructor_ram(MemoryRegion *mr)
 qemu_ram_free(mr->ram_addr);
 }
 
-static void memory_region_destructor_alias(MemoryRegion *mr)
-{
-memory_region_unref(mr->alias);
-}
-
 static void memory_region_destructor_ram_from_ptr(MemoryRegion *mr)
 {
 qemu_ram_free_from_ptr(mr->ram_addr);
@@ -1272,8 +1267,6 @@ void memory_region_init_alias(MemoryRegion *mr,
   uint64_t size)
 {
 memory_region_init(mr, owner, name, size);
-memory_region_ref(orig);
-mr->destructor = memory_region_destructor_alias;
 mr->alias = orig;
 mr->alias_offset = offset;
 }
-- 
2.4.3

[Qemu-devel] [PULL 3/4] megasas: Add write function to handle write access to PCI BAR 3

2015-07-28 Thread Paolo Bonzini

From: Salva Peiró 

This patch fixes a QEMU SEGFAULT when a write operation is performed on
the memory region of the PCI BAR 3 (base address space).
When a writeb(0xe000) is performed the .write function is invoked to
handle the write access, however, since the .write is not initialised,
the call to 0, causes QEMU to SEGFAULT.

Signed-off-by: Salva Peiró 
Acked-by: Hannes Reinecke 
Message-Id: <1437987112-24744-1-git-send-email-speir...@gmail.com>
Signed-off-by: Paolo Bonzini 
---
 hw/scsi/megasas.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
index 51ba9e0..a04369c 100644
--- a/hw/scsi/megasas.c
+++ b/hw/scsi/megasas.c
@@ -2202,8 +2202,15 @@ static uint64_t megasas_queue_read(void *opaque, hwaddr 
addr,
 return 0;
 }
 
+static void megasas_queue_write(void *opaque, hwaddr addr,
+   uint64_t val, unsigned size)
+{
+return;
+}
+
 static const MemoryRegionOps megasas_queue_ops = {
 .read = megasas_queue_read,
+.write = megasas_queue_write,
 .endianness = DEVICE_LITTLE_ENDIAN,
 .impl = {
 .min_access_size = 8,
-- 
2.4.3

[Qemu-devel] [PULL 0/4] Fixes for 2.4.0-rc3

2015-07-28 Thread Paolo Bonzini

The following changes since commit f793d97e454a56d17e404004867985622ca1a63b:

  Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging 
(2015-07-24 13:07:10 +0100)

are available in the git repository at:

  git://github.com/bonzini/qemu.git tags/for-upstream

for you to fetch changes up to 52c91dac6bd891656f297dab76da51fc8bc61309:

  memory: do not add a reference to the owner of aliased regions (2015-07-27 
23:05:49 +0200)


* crypto fixes
* megasas SIGSEGV fix
* memory refcount change to fix virtio hot-unplug


Daniel P. Berrange (2):
  crypto: fix built-in AES decrypt function
  crypto: extend unit tests to cover decryption too

Paolo Bonzini (1):
  memory: do not add a reference to the owner of aliased regions

Salva Peiró (1):
  megasas: Add write function to handle write access to PCI BAR 3

 crypto/cipher-builtin.c|  8 
 hw/scsi/megasas.c  |  7 +++
 memory.c   |  7 ---
 tests/test-crypto-cipher.c | 28 
 4 files changed, 31 insertions(+), 19 deletions(-)
-- 
2.4.3

[Qemu-devel] [PULL 2/4] crypto: extend unit tests to cover decryption too

2015-07-28 Thread Paolo Bonzini

From: "Daniel P. Berrange" 

The current unit test only verifies the encryption API,
resulting in us missing a recently introduced bug in the
decryption API from commit d3462e3. It was fortunately
later discovered & fixed by commit bd09594, thanks to the
QEMU I/O tests for qcow2 encryption, but we should really
detect this directly in the crypto unit tests. Also remove
an accidental debug message and simplify some asserts.

Signed-off-by: Daniel P. Berrange 
Message-Id: <1437468902-23230-1-git-send-email-berra...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 tests/test-crypto-cipher.c | 28 
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/tests/test-crypto-cipher.c b/tests/test-crypto-cipher.c
index f9b1a03..9d38d26 100644
--- a/tests/test-crypto-cipher.c
+++ b/tests/test-crypto-cipher.c
@@ -226,12 +226,10 @@ static void test_cipher(const void *opaque)
 const QCryptoCipherTestData *data = opaque;
 
 QCryptoCipher *cipher;
-Error *err = NULL;
 uint8_t *key, *iv, *ciphertext, *plaintext, *outtext;
 size_t nkey, niv, nciphertext, nplaintext;
 char *outtexthex;
 
-g_test_message("foo");
 nkey = unhex_string(data->key, &key);
 niv = unhex_string(data->iv, &iv);
 nciphertext = unhex_string(data->ciphertext, &ciphertext);
@@ -244,28 +242,42 @@ static void test_cipher(const void *opaque)
 cipher = qcrypto_cipher_new(
 data->alg, data->mode,
 key, nkey,
-&err);
+&error_abort);
 g_assert(cipher != NULL);
-g_assert(err == NULL);
 
 
 if (iv) {
 g_assert(qcrypto_cipher_setiv(cipher,
   iv, niv,
-  &err) == 0);
-g_assert(err == NULL);
+  &error_abort) == 0);
 }
 g_assert(qcrypto_cipher_encrypt(cipher,
 plaintext,
 outtext,
 nplaintext,
-&err) == 0);
-g_assert(err == NULL);
+&error_abort) == 0);
 
 outtexthex = hex_string(outtext, nciphertext);
 
 g_assert_cmpstr(outtexthex, ==, data->ciphertext);
 
+g_free(outtexthex);
+
+if (iv) {
+g_assert(qcrypto_cipher_setiv(cipher,
+  iv, niv,
+  &error_abort) == 0);
+}
+g_assert(qcrypto_cipher_decrypt(cipher,
+ciphertext,
+outtext,
+nplaintext,
+&error_abort) == 0);
+
+outtexthex = hex_string(outtext, nplaintext);
+
+g_assert_cmpstr(outtexthex, ==, data->plaintext);
+
 g_free(outtext);
 g_free(outtexthex);
 g_free(key);
-- 
2.4.3

[Qemu-devel] [PULL 1/4] crypto: fix built-in AES decrypt function

2015-07-28 Thread Paolo Bonzini

From: "Daniel P. Berrange" 

The qcrypto_cipher_decrypt_aes method was using the wrong
key material, and passing the wrong mode. This caused it
to incorrectly decrypt ciphertext.

Signed-off-by: Daniel P. Berrange 
Message-Id: <1437740634-6261-1-git-send-email-berra...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 crypto/cipher-builtin.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/crypto/cipher-builtin.c b/crypto/cipher-builtin.c
index 912c1b9..30f4853 100644
--- a/crypto/cipher-builtin.c
+++ b/crypto/cipher-builtin.c
@@ -117,7 +117,7 @@ static int qcrypto_cipher_decrypt_aes(QCryptoCipher *cipher,
 uint8_t *outptr = out;
 while (len) {
 if (len > AES_BLOCK_SIZE) {
-AES_decrypt(inptr, outptr, &ctxt->state.aes.encrypt_key);
+AES_decrypt(inptr, outptr, &ctxt->state.aes.decrypt_key);
 inptr += AES_BLOCK_SIZE;
 outptr += AES_BLOCK_SIZE;
 len -= AES_BLOCK_SIZE;
@@ -126,15 +126,15 @@ static int qcrypto_cipher_decrypt_aes(QCryptoCipher 
*cipher,
 memcpy(tmp1, inptr, len);
 /* Fill with 0 to avoid valgrind uninitialized reads */
 memset(tmp1 + len, 0, sizeof(tmp1) - len);
-AES_decrypt(tmp1, tmp2, &ctxt->state.aes.encrypt_key);
+AES_decrypt(tmp1, tmp2, &ctxt->state.aes.decrypt_key);
 memcpy(outptr, tmp2, len);
 len = 0;
 }
 }
 } else {
 AES_cbc_encrypt(in, out, len,
-&ctxt->state.aes.encrypt_key,
-ctxt->state.aes.iv, 1);
+&ctxt->state.aes.decrypt_key,
+ctxt->state.aes.iv, 0);
 }
 
 return 0;
-- 
2.4.3

Re: [Qemu-devel] [PATCH RFC 2/6] posix: add linux-only memfd fallback

2015-07-28 Thread Paolo Bonzini



On 23/07/2015 17:25, Michael S. Tsirkin wrote:
> > +#ifdef CONFIG_LINUX
> > +
> > +#ifndef F_LINUX_SPECIFIC_BASE
> > +#define F_LINUX_SPECIFIC_BASE 1024
> > +#endif
> > +
> > +#ifndef F_ADD_SEALS
> > +#define F_ADD_SEALS (F_LINUX_SPECIFIC_BASE + 9)
> > +#define F_GET_SEALS (F_LINUX_SPECIFIC_BASE + 10)
> > +
> > +#define F_SEAL_SEAL 0x0001  /* prevent further seals from being set */
> > +#define F_SEAL_SHRINK   0x0002  /* prevent file from shrinking */
> > +#define F_SEAL_GROW 0x0004  /* prevent file from growing */
> > +#define F_SEAL_WRITE0x0008  /* prevent writes */
> > +#endif
>
> These are from include/uapi/linux/fcntl.h,
> they should be imported into linux-headers I think.

linux-headers is usually used for virt-related features that we want in
QEMU a few weeks before they are distributed upstream.

Here, I think just including linux/fcntl.h is enough.

>> +#ifndef __NR_memfd_create
>> +#  if defined __x86_64__
>> +#define __NR_memfd_create 319
>> +#  elif defined __arm__
>> +#define __NR_memfd_create 385
>> +#  elif defined __aarch64__
>> +#define __NR_memfd_create 279
>> +#  elif defined _MIPS_SIM
>> +#if _MIPS_SIM == _MIPS_SIM_ABI32
>> +#  define __NR_memfd_create 4354
>> +#endif
>> +#if _MIPS_SIM == _MIPS_SIM_NABI32
>> +#  define __NR_memfd_create 6318
>> +#endif
>> +#if _MIPS_SIM == _MIPS_SIM_ABI64
>> +#  define __NR_memfd_create 5314
>> +#endif
> 
> What's defining all these macros?

They're in asm/unistd.h.

I think that, instead of making qemu/osdep.h the new qemu-common.h, the
wrappers added by patch 3 should be declared in a new header
qemu/memfd.h.  The implementation in util/memfd.c can include both
linux/fcntl.h and asm/unistd.h.

Paolo

Re: [Qemu-devel] [PATCH v2] exec: use macro ROUND_UP for alignment

2015-07-28 Thread Chen, Hanxiao


> -Original Message-
> From: qemu-devel-bounces+chenhanxiao=cn.fujitsu@nongnu.org
> [mailto:qemu-devel-bounces+chenhanxiao=cn.fujitsu@nongnu.org] On Behalf Of
> Chen Hanxiao
> Sent: Friday, July 24, 2015 11:12 AM
> To: Paolo Bonzini
> Cc: qemu-devel@nongnu.org
> Subject: [Qemu-devel] [PATCH v2] exec: use macro ROUND_UP for alignment
> 
> Use ROUND_UP instead.
> 
> Signed-off-by: Chen Hanxiao 
> ---
>  exec.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/exec.c b/exec.c
> index 7d60e15..bb16c50 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1207,7 +1207,7 @@ static void *file_ram_alloc(RAMBlock *block,
>  unlink(filename);
>  g_free(filename);
> 
> -memory = (memory+hpagesize-1) & ~(hpagesize-1);
> +memory = ROUND_UP(memory, hpagesize);
> 
>  /*
>   * ftruncate is not supported by hugetlbfs in older
> --
> 2.1.0
> 

ping

Regards,
- Chen

Re: [Qemu-devel] [PATCH v2] exec: use macro ROUND_UP for alignment

2015-07-28 Thread Paolo Bonzini



On 28/07/2015 10:11, Chen, Hanxiao wrote:
> 
>> -Original Message-
>> From: qemu-devel-bounces+chenhanxiao=cn.fujitsu@nongnu.org
>> [mailto:qemu-devel-bounces+chenhanxiao=cn.fujitsu@nongnu.org] On Behalf 
>> Of
>> Chen Hanxiao
>> Sent: Friday, July 24, 2015 11:12 AM
>> To: Paolo Bonzini
>> Cc: qemu-devel@nongnu.org
>> Subject: [Qemu-devel] [PATCH v2] exec: use macro ROUND_UP for alignment
>>
>> Use ROUND_UP instead.
>>
>> Signed-off-by: Chen Hanxiao 
>> ---
>>  exec.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/exec.c b/exec.c
>> index 7d60e15..bb16c50 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -1207,7 +1207,7 @@ static void *file_ram_alloc(RAMBlock *block,
>>  unlink(filename);
>>  g_free(filename);
>>
>> -memory = (memory+hpagesize-1) & ~(hpagesize-1);
>> +memory = ROUND_UP(memory, hpagesize);
>>
>>  /*
>>   * ftruncate is not supported by hugetlbfs in older
>> --
>> 2.1.0
>>
> 
> ping

QEMU is in hard freeze.  It's normal to wait a few days more for
non-bugfix patches.  This one is okay, though.

Paolo

Re: [Qemu-devel] [PATCH for-2.4 0/2] AioContext: fix deadlock after aio_context_acquire() race

2015-07-28 Thread Stefan Hajnoczi

On Tue, Jul 28, 2015 at 10:02:26AM +0200, Cornelia Huck wrote:
> On Tue, 28 Jul 2015 09:07:00 +0200
> Cornelia Huck  wrote:
> 
> > On Mon, 27 Jul 2015 17:33:37 +0100
> > Stefan Hajnoczi  wrote:
> > 
> > > See Patch 2 for details on the deadlock after two aio_context_acquire() 
> > > calls
> > > race.  This caused dataplane to hang on startup.
> > > 
> > > Patch 1 is a memory leak fix for AioContext that's needed by Patch 2.
> > > 
> > > Stefan Hajnoczi (2):
> > >   AioContext: avoid leaking BHs on cleanup
> > >   AioContext: force event loop iteration using BH
> > > 
> > >  async.c | 29 +++--
> > >  include/block/aio.h |  3 +++
> > >  2 files changed, 30 insertions(+), 2 deletions(-)
> > > 
> > 
> > Just gave this a try: The stripped-down guest that hangs during startup
> > on master is working fine with these patches applied, and my full setup
> > works as well.
> > 
> > So,
> > 
> > Tested-by: Cornelia Huck 
> 
> Uh-oh, spoke too soon. It starts, but when I try a virsh managedsave, I
> get
> 
> qemu-system-s390x: /data/git/yyy/qemu/async.c:242: aio_ctx_finalize: 
> Assertion `ctx->first_bh->deleted' failed.

Please pretty-print ctx->first_bh in gdb.  In particular, which function
is ctx->first_bh->cb pointing to?

I tried reproducing with qemu-system-x86_64 and a RHEL 7 guest but
couldn't trigger the assertion failure.

This assertion means that there is an *existing* QEMUBH memory leak.  It
is not introduced by this patch series.  If we run out of time for QEMU
2.4, it would be acceptable to replace the assertion with:

  /* TODO track down leaked BHs and turn this into an assertion */
  if (ctx->first_bh->deleted) {
  g_free(ctx->first_bh);
  }


pgp1jYOTu9RZJ.pgp
Description: PGP signature

Re: [Qemu-devel] [sheepdog] [PATCH] sheepdog: serialize requests to overwrapping area

2015-07-28 Thread Liu Yuan

On Sat, Jul 18, 2015 at 01:44:24AM +0900, Hitoshi Mitake wrote:
> Current sheepdog driver only serializes create requests in oid
> unit. This mechanism isn't enough for handling requests to
> overwrapping area spanning multiple oids, so it can result bugs like
> below:
> https://bugs.launchpad.net/sheepdog-project/+bug/1456421

I'm a bit late to review the patch since I'm not on the cc list, but I'd like to
get the idea how the mentioned bug relates to the serialization of requests?

The mentioned bug looks to me more a bug of sheepdog because the create and
write request will only unref a single oid. The bug report is unclear about
why the object idx in inode becomes zero, at least not pointing that it relates
to QEMU.

But this patch assume QEMU send the requests the wrong way and just vaguely
says it is just wrong without reason.

What is overrapping requests? As far as I understand, the request that stride
over two objects will be split into two, to make sure all the requests fit the
sheepdog object size. Allow requests run concurrently on different SD objects is
way achieving high performance. This patch mutes this feature, to me, without a
decent reason. 

Probably I miss something hidden, but I'd like someone enlighten me about it
because this patch might slow down QEMU VM over sheepdog.

Thanks,
Yuan

Re: [Qemu-devel] [PULL 18/19] macio: move unaligned DMA write code into separate pmac_dma_write() function

2015-07-28 Thread Mark Cave-Ayland

On 27/07/15 23:00, Aurelien Jarno wrote:

> On 2015-05-22 15:59, John Snow wrote:
>> From: Mark Cave-Ayland 
>>
>> Similarly switch the macio IDE routines over to use the new function and
>> tidy-up the remaining code as required.
>>
>> [Maintainer edit: printf format codes adjusted for 32/64bit. --js]
>>
>> Signed-off-by: Mark Cave-Ayland 
>> Acked-by: John Snow 
>> Message-id: 1425939893-14404-3-git-send-email-mark.cave-ayl...@ilande.co.uk
>> Signed-off-by: John Snow 
>> ---
>>  hw/ide/macio.c | 268 
>> +
>>  include/hw/ppc/mac_dbdma.h |   4 -
>>  2 files changed, 125 insertions(+), 147 deletions(-)
> 
> This patch has removed TRIM support without any obvious reason, and
> without mentioning it in the changelog. As a consequence guests with
> TRIM enabled now fail to boot:
> 
> | [   46.916047] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> | [   46.916545] ata1.00: failed command: DATA SET MANAGEMENT
> | [   46.916794] ata1.00: cmd 06/01:01:00:00:00/00:00:00:00:00/a0 tag 0 dma 
> 512 out
> | [   46.916794]  res 40/00:01:00:00:00/00:00:00:00:00/e0 Emask 0x20 
> (host bus error)
> | [   46.917219] ata1.00: status: { DRDY }
> | [   51.957389] ata1.00: qc timeout (cmd 0xec)
> | [   51.958076] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> | [   51.958551] ata1.00: revalidation failed (errno=-5)
> | [   56.996713] ata1: link is slow to respond, please be patient (ready=0)
> | [   61.981042] ata1: device not ready (errno=-16), forcing hardreset
> | [   61.981669] ata1: soft resetting link
> | [   62.137894] ata1.00: configured for MWDMA2
> | [   62.138294] ata1.00: device reported invalid CHS sector 0
> | [   62.139045] sd 0:0:0:0: [sda]  
> | [   62.139128] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> | [   62.139243] sd 0:0:0:0: [sda]  
> | [   62.139346] Sense Key : Aborted Command [current] [descriptor]
> | [   62.139581] Descriptor sense data with sense descriptors (in hex):
> | [   62.139670] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
> | [   62.139812] 00 00 00 00 
> | [   62.139897] sd 0:0:0:0: [sda]  
> | [   62.140009] Add. Sense: No additional sense information
> | [   62.140115] sd 0:0:0:0: [sda] CDB: 
> | [   62.140204] Write same(16): 93 08 00 00 00 00 03 c0 00 48 00 3f ff b8 00 
> 00
> | [   62.140661] end_request: I/O error, dev sda, sector 62914632
> | [   62.141270] ata1: EH complete

Hi Aurelien,

Thanks for the heads-up. I have a fairly comprehensive suite of various
OS test images I use for OpenBIOS testing and evidently not a single one
of them issues a TRIM command as I don't see any regressions here. Can
you point me towards the particular test image you are using?


ATB,

Mark.

Re: [Qemu-devel] [PATCH] RFC/net: Add a net filter

2015-07-28 Thread Yang Hongyang


On 07/28/2015 12:00 PM, Yang Hongyang wrote:

On 07/28/2015 11:28 AM, Jason Wang wrote:

On 07/27/2015 06:03 PM, Yang Hongyang wrote:

On 07/27/2015 05:16 PM, Jason Wang wrote:
[...]

I think this won't work for the buffer case? If we want the buffer
case
to work under this, we should modify the generic netdev layer
code, to
check the return value of the filter function call.


But checking return value is rather simpler than a new netdev type,
isn't it?


But how to implement a plugin which suppose to do the actual work on
the packets?


Well, the filter get the packets, so it can do everything it wants.


how to configure params related to the plugin? different
plugins may need different params, implement as another netdev?


I belive qmp can do this? something like -filter dump,id=f0,len=1?


So you mean implement another object filter?


Yes.


and the structure is like netdev?


No, it is embedded in netdev.


Ah, I see what you mean, thank you for the patience...
does the command line looks like:
-filter dump,id=f0,len=1
-netdev tap,XXX,filter=dump

If I need both dump and packets buffering, how will the qmp be?
-filter dump,id=f0,len=1
-filter buffer,XXX
-netdev tap,XXX,filter=dump:buffer:XXX ?


This is ok but we have several choices, e.g you may want to have a next
field like:

- filter buffer,id=f0
- filter dump,id=f1,len=1000,next=f0
- netdev tap,XXX,filter_root=f1


This seems better, thank you!


Sorry, when try to implement it, I found this qmp is not that good when it
comes to dynamically add/remove filters, can I use below instead:
-netdev tap,id=bn0
-netfilter dump,id=f0,netdev=bn0
-netfilter buffer,id=f1,netdev=bn0

by this, I can introduce a QTAILQ of filters to NetClentState and
netdev_{add|del}_filter to dynamically add/remove filters.






or
-netdev tap,id=bn0,XXX
-filter dump,id=f0,len=1,netdev=bn0
-filter buffer,XXX,netdev=bn0 ?

And do you care if we add a filter list to NetClientInfo?


I don't care, and in fact this also shows great advantages over the
proposal of new netdevs. In that case, if you want two filters, two
netdevs is needed and I can't image how to handle offloads or link
status in this case.


and modify the generic layer to deal with these filters?
E.g, init/cleanup filters, go through these filters, check
for return values, stop call peer's receiving.


I think it's ok to do these. What we really need is an new layer before
peer's receiving not new kinds of netdevs.


This is our main concern at the beginning, we want to avoid
modify the netdev generic layer too much, and do all things
within the filter dev so that this could be a bolt-on
feature. But if you don't care about this, we could simply
implement it as you said.


I don't think much will be modified. Maybe just add callbacks on
receive, initialization and cleanup. Most of the codes should be limited
to filter itself. The point is 'filter' is not a kind of device or
backend, so most of the fields will be unnecessary. Treating it as
pseudo netdev will bring burdens too. E.g: dealing with vnet headers,
offloads, be/le setting and link status and probably even more.


Thank you for the explanation. will do as you said.






That will duplicate some of the netdev layer code.


Not at all, it only cares about how to deal with the packet.


Implement it as
a netdev can reuse the existing netdev design. And current dump is
implemented
as a netdev right?


Right but it only works for hub, and that's why Thomas wrote his series
to make it work for all other backends


even if we simply passing the packets to the filter function before
calling nc->info->receive{_raw}(), we might also need to implement as
a netdev as dump dose.


Why? The reason why we still keep -netdev dump is for backward
compatibility. If we only care about using it for new netdevs, we can
get rid of all netdev stuffs from dump.


Thank you, I see the points.












And it is not as
extensible as we abstract the filter function to a netdev, We can
flexibly add/remove/change filter plugins on the fly.


I don't see why we lose the flexibility like what I suggested.
Actually,
implement it through a netdev will complex this. E.g:

-netdev tap,id=bn0  # you can use whatever backend as needed
-netdev filter,id=f0,backend=bn0,plugin=dump
-device e1000,netdev=f0

How did you remove filter id=f0? Looks like you need also remove
e1000 nic?


No, when remove filter, we restore the connection between network
backend and
NIC. Just like filter does not ever exists.


But e1000's peer is f0. You mean you will modify the peer pointer
during
filter removing?


Yes.


Sounds scary.







.





.





.





.





--
Thanks,
Yang.

Re: [Qemu-devel] [PATCH v7 42/42] Inhibit ballooning during postcopy

2015-07-28 Thread Dr. David Alan Gilbert

* Amit Shah (amit.s...@redhat.com) wrote:
> On (Tue) 16 Jun 2015 [11:26:55], Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" 
> > 
> > Postcopy detects accesses to pages that haven't been transferred yet
> > using userfaultfd, and it causes exceptions on pages that are 'not
> > present'.
> > Ballooning also causes pages to be marked as 'not present' when the
> > guest inflates the balloon.
> > Potentially a balloon could be inflated to discard pages that are
> > currently inflight during postcopy and that may be arriving at about
> > the same time.
> > 
> > To avoid this confusion, disable ballooning during postcopy.
> > 
> > When disabled we drop balloon requests from the guest.  Since ballooning
> > is generally initiated by the host, the management system should avoid
> > initiating any balloon instructions to the guest during migration,
> > although it's not possible to know how long it would take a guest to
> > process a request made prior to the start of migration.
> > 
> > Queueing the requests until after migration would be nice, but is
> > non-trivial, since the set of inflate/deflate requests have to
> > be compared with the state of the page to know what the final
> > outcome is allowed to be.
> 
> I didn't track the previous discussion, but there were plans to have
> guest-initiated balloon requests for cases where the guest wants to
> co-operate with hosts and return any free mem available We don't
> currently have guests that do this, but we also don't want to have a
> dependency between the host and guest -- they should be independent.
> 
> This approach here seems the simplest possible, short of maintaining
> another bitmap for the duration of postcopy which indicates
> guest-freed memory pages which postcopy should not populate, after
> receiving them at the dest (this sounds better to me than queuing up
> guest requests).
> 
> The downside here is that the guest offered some memory back, and we
> don't use it.  The guest also doesn't use it -- so it's a double loss,
> of sorts.
> 
> Thoughts?  I don't have a problem with this current approach, but if
> we could get something better, that'll be good too.

It needs something like that bitmap, but it would take quite a bit
of care to manage the interaction between:
a) The guest emitting balloon notifications
b) Pages being received from the source
c) Destination use of that page

  we also have to think what to do with a page that's been ballooned
after reception of the source page; the madvise(dontneed) that's used
normally would cause userfault to fire again, and we can't allow that.
(We could make it the same as receiving a zero page).   But then we would
also have to cope with  the source sending us a page after the destination
has ballooned it and make sure to discard that (I suspect there are further
ordering examples that have to also be considered).

Dave
> 
>   Amit
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH] RFC/net: Add a net filter

2015-07-28 Thread Yang Hongyang


On 07/28/2015 12:00 PM, Yang Hongyang wrote:

On 07/28/2015 11:28 AM, Jason Wang wrote:

On 07/27/2015 06:03 PM, Yang Hongyang wrote:

On 07/27/2015 05:16 PM, Jason Wang wrote:
[...]

I think this won't work for the buffer case? If we want the buffer
case
to work under this, we should modify the generic netdev layer
code, to
check the return value of the filter function call.


But checking return value is rather simpler than a new netdev type,
isn't it?


But how to implement a plugin which suppose to do the actual work on
the packets?


Well, the filter get the packets, so it can do everything it wants.


how to configure params related to the plugin? different
plugins may need different params, implement as another netdev?


I belive qmp can do this? something like -filter dump,id=f0,len=1?


So you mean implement another object filter?


Yes.


and the structure is like netdev?


No, it is embedded in netdev.


Ah, I see what you mean, thank you for the patience...
does the command line looks like:
-filter dump,id=f0,len=1
-netdev tap,XXX,filter=dump

If I need both dump and packets buffering, how will the qmp be?
-filter dump,id=f0,len=1
-filter buffer,XXX
-netdev tap,XXX,filter=dump:buffer:XXX ?


This is ok but we have several choices, e.g you may want to have a next
field like:

- filter buffer,id=f0
- filter dump,id=f1,len=1000,next=f0
- netdev tap,XXX,filter_root=f1


This seems better, thank you!


Sorry, when try to implement it, I found this qmp is not that good when it
comes to dynamically add/remove filters, can I use below instead:
-netdev tap,id=bn0
-netfilter dump,id=f0,netdev=bn0
-netfilter buffer,id=f1,netdev=bn0

by this, I can introduce a QTAILQ of filters to NetClentState and
netdev_{add|del}_filter to dynamically add/remove filters.






or
-netdev tap,id=bn0,XXX
-filter dump,id=f0,len=1,netdev=bn0
-filter buffer,XXX,netdev=bn0 ?

And do you care if we add a filter list to NetClientInfo?


I don't care, and in fact this also shows great advantages over the
proposal of new netdevs. In that case, if you want two filters, two
netdevs is needed and I can't image how to handle offloads or link
status in this case.


and modify the generic layer to deal with these filters?
E.g, init/cleanup filters, go through these filters, check
for return values, stop call peer's receiving.


I think it's ok to do these. What we really need is an new layer before
peer's receiving not new kinds of netdevs.


This is our main concern at the beginning, we want to avoid
modify the netdev generic layer too much, and do all things
within the filter dev so that this could be a bolt-on
feature. But if you don't care about this, we could simply
implement it as you said.


I don't think much will be modified. Maybe just add callbacks on
receive, initialization and cleanup. Most of the codes should be limited
to filter itself. The point is 'filter' is not a kind of device or
backend, so most of the fields will be unnecessary. Treating it as
pseudo netdev will bring burdens too. E.g: dealing with vnet headers,
offloads, be/le setting and link status and probably even more.


Thank you for the explanation. will do as you said.






That will duplicate some of the netdev layer code.


Not at all, it only cares about how to deal with the packet.


Implement it as
a netdev can reuse the existing netdev design. And current dump is
implemented
as a netdev right?


Right but it only works for hub, and that's why Thomas wrote his series
to make it work for all other backends


even if we simply passing the packets to the filter function before
calling nc->info->receive{_raw}(), we might also need to implement as
a netdev as dump dose.


Why? The reason why we still keep -netdev dump is for backward
compatibility. If we only care about using it for new netdevs, we can
get rid of all netdev stuffs from dump.


Thank you, I see the points.












And it is not as
extensible as we abstract the filter function to a netdev, We can
flexibly add/remove/change filter plugins on the fly.


I don't see why we lose the flexibility like what I suggested.
Actually,
implement it through a netdev will complex this. E.g:

-netdev tap,id=bn0  # you can use whatever backend as needed
-netdev filter,id=f0,backend=bn0,plugin=dump
-device e1000,netdev=f0

How did you remove filter id=f0? Looks like you need also remove
e1000 nic?


No, when remove filter, we restore the connection between network
backend and
NIC. Just like filter does not ever exists.


But e1000's peer is f0. You mean you will modify the peer pointer
during
filter removing?


Yes.


Sounds scary.







.





.





.





.





--
Thanks,
Yang.

[Qemu-devel] [PATCH for-2.4 v2] xen: Drop net_rx_ok

2015-07-28 Thread Fam Zheng

Let net_rx_packet() (which checks the same conditions) drops the packet
if the device is not ready. Drop net_xen_info.can_receive and update the
return value for the buffer full case.

We rely on the qemu_flush_queued_packets() in net_event() to wake up
the peer when the buffer becomes available again.

Signed-off-by: Fam Zheng 

---

v2:
Fix return value of net_rx_packet to keep the semantics when ring
buffer is full. [Stefan]
Update commit message accordingly.
---
 hw/net/xen_nic.c | 24 +---
 1 file changed, 1 insertion(+), 23 deletions(-)

diff --git a/hw/net/xen_nic.c b/hw/net/xen_nic.c
index 19ecfc4..acb7e2e 100644
--- a/hw/net/xen_nic.c
+++ b/hw/net/xen_nic.c
@@ -234,27 +234,6 @@ static void net_rx_response(struct XenNetDev *netdev,
 
 #define NET_IP_ALIGN 2
 
-static int net_rx_ok(NetClientState *nc)
-{
-struct XenNetDev *netdev = qemu_get_nic_opaque(nc);
-RING_IDX rc, rp;
-
-if (netdev->xendev.be_state != XenbusStateConnected) {
-return 0;
-}
-
-rc = netdev->rx_ring.req_cons;
-rp = netdev->rx_ring.sring->req_prod;
-xen_rmb();
-
-if (rc == rp || RING_REQUEST_CONS_OVERFLOW(&netdev->rx_ring, rc)) {
-xen_be_printf(&netdev->xendev, 2, "%s: no rx buffers (%d/%d)\n",
-  __FUNCTION__, rc, rp);
-return 0;
-}
-return 1;
-}
-
 static ssize_t net_rx_packet(NetClientState *nc, const uint8_t *buf, size_t 
size)
 {
 struct XenNetDev *netdev = qemu_get_nic_opaque(nc);
@@ -272,7 +251,7 @@ static ssize_t net_rx_packet(NetClientState *nc, const 
uint8_t *buf, size_t size
 
 if (rc == rp || RING_REQUEST_CONS_OVERFLOW(&netdev->rx_ring, rc)) {
 xen_be_printf(&netdev->xendev, 2, "no buffer, drop packet\n");
-return -1;
+return 0;
 }
 if (size > XC_PAGE_SIZE - NET_IP_ALIGN) {
 xen_be_printf(&netdev->xendev, 0, "packet too big (%lu > %ld)",
@@ -304,7 +283,6 @@ static ssize_t net_rx_packet(NetClientState *nc, const 
uint8_t *buf, size_t size
 static NetClientInfo net_xen_info = {
 .type = NET_CLIENT_OPTIONS_KIND_NIC,
 .size = sizeof(NICState),
-.can_receive = net_rx_ok,
 .receive = net_rx_packet,
 };
 
-- 
2.4.3

Re: [Qemu-devel] [PATCH] RFC/net: Add a net filter

2015-07-28 Thread Jason Wang



On 07/28/2015 05:19 PM, Yang Hongyang wrote:
> On 07/28/2015 12:00 PM, Yang Hongyang wrote:
>> On 07/28/2015 11:28 AM, Jason Wang wrote:
>>> On 07/27/2015 06:03 PM, Yang Hongyang wrote:
 On 07/27/2015 05:16 PM, Jason Wang wrote:
 [...]
>> I think this won't work for the buffer case? If we want the
>> buffer
>> case
>> to work under this, we should modify the generic netdev layer
>> code, to
>> check the return value of the filter function call.
>
> But checking return value is rather simpler than a new netdev
> type,
> isn't it?

 But how to implement a plugin which suppose to do the actual
 work on
 the packets?
>>>
>>> Well, the filter get the packets, so it can do everything it wants.
>>>
 how to configure params related to the plugin? different
 plugins may need different params, implement as another netdev?
>>>
>>> I belive qmp can do this? something like -filter
>>> dump,id=f0,len=1?
>>
>> So you mean implement another object filter?
>
> Yes.
>
>> and the structure is like netdev?
>
> No, it is embedded in netdev.

 Ah, I see what you mean, thank you for the patience...
 does the command line looks like:
 -filter dump,id=f0,len=1
 -netdev tap,XXX,filter=dump

 If I need both dump and packets buffering, how will the qmp be?
 -filter dump,id=f0,len=1
 -filter buffer,XXX
 -netdev tap,XXX,filter=dump:buffer:XXX ?
>>>
>>> This is ok but we have several choices, e.g you may want to have a next
>>> field like:
>>>
>>> - filter buffer,id=f0
>>> - filter dump,id=f1,len=1000,next=f0
>>> - netdev tap,XXX,filter_root=f1
>>
>> This seems better, thank you!
>
> Sorry, when try to implement it, I found this qmp is not that good
> when it
> comes to dynamically add/remove filters, can I use below instead:
> -netdev tap,id=bn0
> -netfilter dump,id=f0,netdev=bn0
> -netfilter buffer,id=f1,netdev=bn0
>
> by this, I can introduce a QTAILQ of filters to NetClentState and
> netdev_{add|del}_filter to dynamically add/remove filters. 

This looks good.

Thanks

Re: [Qemu-devel] [PATCH for-2.4 v2] xen: Drop net_rx_ok

2015-07-28 Thread Wen Congyang

On 07/28/2015 05:25 PM, Fam Zheng wrote:
> Let net_rx_packet() (which checks the same conditions) drops the packet
> if the device is not ready. Drop net_xen_info.can_receive and update the
> return value for the buffer full case.
> 
> We rely on the qemu_flush_queued_packets() in net_event() to wake up
> the peer when the buffer becomes available again.
> 
> Signed-off-by: Fam Zheng 
> 
> ---
> 
> v2:
> Fix return value of net_rx_packet to keep the semantics when ring
> buffer is full. [Stefan]
> Update commit message accordingly.
> ---
>  hw/net/xen_nic.c | 24 +---
>  1 file changed, 1 insertion(+), 23 deletions(-)
> 
> diff --git a/hw/net/xen_nic.c b/hw/net/xen_nic.c
> index 19ecfc4..acb7e2e 100644
> --- a/hw/net/xen_nic.c
> +++ b/hw/net/xen_nic.c
> @@ -234,27 +234,6 @@ static void net_rx_response(struct XenNetDev *netdev,
>  
>  #define NET_IP_ALIGN 2
>  
> -static int net_rx_ok(NetClientState *nc)
> -{
> -struct XenNetDev *netdev = qemu_get_nic_opaque(nc);
> -RING_IDX rc, rp;
> -
> -if (netdev->xendev.be_state != XenbusStateConnected) {
> -return 0;
> -}
> -
> -rc = netdev->rx_ring.req_cons;
> -rp = netdev->rx_ring.sring->req_prod;
> -xen_rmb();
> -
> -if (rc == rp || RING_REQUEST_CONS_OVERFLOW(&netdev->rx_ring, rc)) {
> -xen_be_printf(&netdev->xendev, 2, "%s: no rx buffers (%d/%d)\n",
> -  __FUNCTION__, rc, rp);
> -return 0;
> -}
> -return 1;
> -}
> -
>  static ssize_t net_rx_packet(NetClientState *nc, const uint8_t *buf, size_t 
> size)
>  {
>  struct XenNetDev *netdev = qemu_get_nic_opaque(nc);
> @@ -272,7 +251,7 @@ static ssize_t net_rx_packet(NetClientState *nc, const 
> uint8_t *buf, size_t size
>  
>  if (rc == rp || RING_REQUEST_CONS_OVERFLOW(&netdev->rx_ring, rc)) {
>  xen_be_printf(&netdev->xendev, 2, "no buffer, drop packet\n");

IIUC, 'return 0' means that the packet will be queued in the incoming queue. So 
the log should
be changed too.

Thanks
Wen Congyang

> -return -1;
> +return 0;
>  }
>  if (size > XC_PAGE_SIZE - NET_IP_ALIGN) {
>  xen_be_printf(&netdev->xendev, 0, "packet too big (%lu > %ld)",
> @@ -304,7 +283,6 @@ static ssize_t net_rx_packet(NetClientState *nc, const 
> uint8_t *buf, size_t size
>  static NetClientInfo net_xen_info = {
>  .type = NET_CLIENT_OPTIONS_KIND_NIC,
>  .size = sizeof(NICState),
> -.can_receive = net_rx_ok,
>  .receive = net_rx_packet,
>  };
>  
>

Re: [Qemu-devel] [sheepdog] [PATCH] sheepdog: serialize requests to overwrapping area

2015-07-28 Thread Liu Yuan

On Tue, Jul 28, 2015 at 04:50:08PM +0800, Liu Yuan wrote:
> On Sat, Jul 18, 2015 at 01:44:24AM +0900, Hitoshi Mitake wrote:
> > Current sheepdog driver only serializes create requests in oid
> > unit. This mechanism isn't enough for handling requests to
> > overwrapping area spanning multiple oids, so it can result bugs like
> > below:
> > https://bugs.launchpad.net/sheepdog-project/+bug/1456421
> 
> I'm a bit late to review the patch since I'm not on the cc list, but I'd like 
> to
> get the idea how the mentioned bug relates to the serialization of requests?
> 
> The mentioned bug looks to me more a bug of sheepdog because the create and
> write request will only unref a single oid. The bug report is unclear about
> why the object idx in inode becomes zero, at least not pointing that it 
> relates
> to QEMU.
> 
> But this patch assume QEMU send the requests the wrong way and just vaguely
> says it is just wrong without reason.
> 
> What is overrapping requests? As far as I understand, the request that stride
> over two objects will be split into two, to make sure all the requests fit the
> sheepdog object size. Allow requests run concurrently on different SD objects 
> is
> way achieving high performance. This patch mutes this feature, to me, without 
> a
> decent reason. 
> 
> Probably I miss something hidden, but I'd like someone enlighten me about it
> because this patch might slow down QEMU VM over sheepdog.
> 
> Thanks,
> Yuan

Cc Jeff

Re: [Qemu-devel] [PATCH] RFC/net: Add a net filter

2015-07-28 Thread Yang Hongyang


On 07/28/2015 05:30 PM, Jason Wang wrote:

On 07/28/2015 05:19 PM, Yang Hongyang wrote:

On 07/28/2015 12:00 PM, Yang Hongyang wrote:

On 07/28/2015 11:28 AM, Jason Wang wrote:

On 07/27/2015 06:03 PM, Yang Hongyang wrote:

On 07/27/2015 05:16 PM, Jason Wang wrote:
[...]

I think this won't work for the buffer case? If we want the
buffer
case
to work under this, we should modify the generic netdev layer
code, to
check the return value of the filter function call.


But checking return value is rather simpler than a new netdev
type,
isn't it?


But how to implement a plugin which suppose to do the actual
work on
the packets?


Well, the filter get the packets, so it can do everything it wants.


how to configure params related to the plugin? different
plugins may need different params, implement as another netdev?


I belive qmp can do this? something like -filter
dump,id=f0,len=1?


So you mean implement another object filter?


Yes.


and the structure is like netdev?


No, it is embedded in netdev.


Ah, I see what you mean, thank you for the patience...
does the command line looks like:
-filter dump,id=f0,len=1
-netdev tap,XXX,filter=dump

If I need both dump and packets buffering, how will the qmp be?
-filter dump,id=f0,len=1
-filter buffer,XXX
-netdev tap,XXX,filter=dump:buffer:XXX ?


This is ok but we have several choices, e.g you may want to have a next
field like:

- filter buffer,id=f0
- filter dump,id=f1,len=1000,next=f0
- netdev tap,XXX,filter_root=f1


This seems better, thank you!


Sorry, when try to implement it, I found this qmp is not that good
when it
comes to dynamically add/remove filters, can I use below instead:
-netdev tap,id=bn0
-netfilter dump,id=f0,netdev=bn0
-netfilter buffer,id=f1,netdev=bn0

by this, I can introduce a QTAILQ of filters to NetClentState and
netdev_{add|del}_filter to dynamically add/remove filters.


This looks good.


Thank you.



Thanks
.



--
Thanks,
Yang.

Re: [Qemu-devel] [PATCH for-2.4 v2] xen: Drop net_rx_ok

2015-07-28 Thread Fam Zheng

On Tue, 07/28 17:32, Wen Congyang wrote:
> On 07/28/2015 05:25 PM, Fam Zheng wrote:
> > Let net_rx_packet() (which checks the same conditions) drops the packet
> > if the device is not ready. Drop net_xen_info.can_receive and update the
> > return value for the buffer full case.
> > 
> > We rely on the qemu_flush_queued_packets() in net_event() to wake up
> > the peer when the buffer becomes available again.
> > 
> > Signed-off-by: Fam Zheng 
> > 
> > ---
> > 
> > v2:
> > Fix return value of net_rx_packet to keep the semantics when ring
> > buffer is full. [Stefan]
> > Update commit message accordingly.
> > ---
> >  hw/net/xen_nic.c | 24 +---
> >  1 file changed, 1 insertion(+), 23 deletions(-)
> > 
> > diff --git a/hw/net/xen_nic.c b/hw/net/xen_nic.c
> > index 19ecfc4..acb7e2e 100644
> > --- a/hw/net/xen_nic.c
> > +++ b/hw/net/xen_nic.c
> > @@ -234,27 +234,6 @@ static void net_rx_response(struct XenNetDev *netdev,
> >  
> >  #define NET_IP_ALIGN 2
> >  
> > -static int net_rx_ok(NetClientState *nc)
> > -{
> > -struct XenNetDev *netdev = qemu_get_nic_opaque(nc);
> > -RING_IDX rc, rp;
> > -
> > -if (netdev->xendev.be_state != XenbusStateConnected) {
> > -return 0;
> > -}
> > -
> > -rc = netdev->rx_ring.req_cons;
> > -rp = netdev->rx_ring.sring->req_prod;
> > -xen_rmb();
> > -
> > -if (rc == rp || RING_REQUEST_CONS_OVERFLOW(&netdev->rx_ring, rc)) {
> > -xen_be_printf(&netdev->xendev, 2, "%s: no rx buffers (%d/%d)\n",
> > -  __FUNCTION__, rc, rp);
> > -return 0;
> > -}
> > -return 1;
> > -}
> > -
> >  static ssize_t net_rx_packet(NetClientState *nc, const uint8_t *buf, 
> > size_t size)
> >  {
> >  struct XenNetDev *netdev = qemu_get_nic_opaque(nc);
> > @@ -272,7 +251,7 @@ static ssize_t net_rx_packet(NetClientState *nc, const 
> > uint8_t *buf, size_t size
> >  
> >  if (rc == rp || RING_REQUEST_CONS_OVERFLOW(&netdev->rx_ring, rc)) {
> >  xen_be_printf(&netdev->xendev, 2, "no buffer, drop packet\n");
> 
> IIUC, 'return 0' means that the packet will be queued in the incoming queue. 
> So the log should
> be changed too.

You're right, will fix.

Fam

[Qemu-devel] [PATCH for-2.4 v3] xen: Drop net_rx_ok

2015-07-28 Thread Fam Zheng

Let net_rx_packet() (which checks the same conditions) drops the packet
if the device is not ready. Drop net_xen_info.can_receive and update the
return value for the buffer full case.

We rely on the qemu_flush_queued_packets() in net_event() to wake up
the peer when the buffer becomes available again.

Signed-off-by: Fam Zheng 

---

v3: Drop log. [Wen]

v2:
Fix return value of net_rx_packet to keep the semantics when ring
buffer is full. [Stefan]
Update commit message accordingly.
---
 hw/net/xen_nic.c | 25 +
 1 file changed, 1 insertion(+), 24 deletions(-)

diff --git a/hw/net/xen_nic.c b/hw/net/xen_nic.c
index 19ecfc4..d7cbfc1 100644
--- a/hw/net/xen_nic.c
+++ b/hw/net/xen_nic.c
@@ -234,27 +234,6 @@ static void net_rx_response(struct XenNetDev *netdev,
 
 #define NET_IP_ALIGN 2
 
-static int net_rx_ok(NetClientState *nc)
-{
-struct XenNetDev *netdev = qemu_get_nic_opaque(nc);
-RING_IDX rc, rp;
-
-if (netdev->xendev.be_state != XenbusStateConnected) {
-return 0;
-}
-
-rc = netdev->rx_ring.req_cons;
-rp = netdev->rx_ring.sring->req_prod;
-xen_rmb();
-
-if (rc == rp || RING_REQUEST_CONS_OVERFLOW(&netdev->rx_ring, rc)) {
-xen_be_printf(&netdev->xendev, 2, "%s: no rx buffers (%d/%d)\n",
-  __FUNCTION__, rc, rp);
-return 0;
-}
-return 1;
-}
-
 static ssize_t net_rx_packet(NetClientState *nc, const uint8_t *buf, size_t 
size)
 {
 struct XenNetDev *netdev = qemu_get_nic_opaque(nc);
@@ -271,8 +250,7 @@ static ssize_t net_rx_packet(NetClientState *nc, const 
uint8_t *buf, size_t size
 xen_rmb(); /* Ensure we see queued requests up to 'rp'. */
 
 if (rc == rp || RING_REQUEST_CONS_OVERFLOW(&netdev->rx_ring, rc)) {
-xen_be_printf(&netdev->xendev, 2, "no buffer, drop packet\n");
-return -1;
+return 0;
 }
 if (size > XC_PAGE_SIZE - NET_IP_ALIGN) {
 xen_be_printf(&netdev->xendev, 0, "packet too big (%lu > %ld)",
@@ -304,7 +282,6 @@ static ssize_t net_rx_packet(NetClientState *nc, const 
uint8_t *buf, size_t size
 static NetClientInfo net_xen_info = {
 .type = NET_CLIENT_OPTIONS_KIND_NIC,
 .size = sizeof(NICState),
-.can_receive = net_rx_ok,
 .receive = net_rx_packet,
 };
 
-- 
2.4.3

Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?

2015-07-28 Thread Artyom Tarasenko

On Tue, Jul 28, 2015 at 9:52 AM, Dennis Luehring  wrote:
> (i've posted the question already on qemu-disc...@nongnu.org but was toled
> to better use this mailing list)
>
> i've prepared an Debian 7.8.0 image for SPARC64/qemu emulation for C/C++
> development before-real-hardware big-endian/unaligned tests
>
> i've benchmarked compiling of single pugixml.cpp
> (https://github.com/zeux/pugixml/blob/master/src/pugixml.cpp)
>
> qemu-system-sparc64: >180sek
> x64 native : ~ 2sek
>
> so my sparc64 emulation is around 90 times slower then native x64
>
> my system:
>
> using lastest qemu git 2.3.x, with virtio for harddisk/network and qcow2
> image
>
> https://depositfiles.com/files/sj20aqwp0 (~280MB
> press the "regular download" button, wait some seconds, solve the
> chapca, "download file in regular mode by browser"
>
> there is pugi_sparc.txt in the 7z which describes how to start,use and
> what is installed in the image
>
> qemu runs natively under a ubuntu 15.04 (x64), Core i7, 8GB system doing
> nothing but qemu
>
> installed is
>
> gcc/g++ 4.6
> make
> sshd running
>
> compiling cmake 2.3.2 tooked around 10h
> compiling pugixml takes also very very long
>
> "top perf" from guest and host while compiling pugixml don't show big
> blockers or something over time
> http://pastebin.com/D2fUpPrM
>
> anything i can do to speedup the emulation?

Maybe try the fresh tcg optimizer improvements from Aurelien:
https://lists.gnu.org/archive/html/qemu-devel/2015-07/msg05133.html

Artyom

-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu

[Qemu-devel] [PULL 02/10] virtio-serial: fix ANY_LAYOUT

2015-07-28 Thread Michael S. Tsirkin

Don't assume a specific layout for control messages.
Required by virtio 1.

Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Amit Shah 
Reviewed-by: Jason Wang 
---
 hw/char/virtio-serial-bus.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 78c73e5..929e49c 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -195,7 +195,8 @@ static size_t send_control_msg(VirtIOSerial *vser, void 
*buf, size_t len)
 return 0;
 }
 
-memcpy(elem.in_sg[0].iov_base, buf, len);
+/* TODO: detect a buffer that's too short, set NEEDS_RESET */
+iov_from_buf(elem.in_sg, elem.in_num, 0, buf, len);
 
 virtqueue_push(vq, &elem, len);
 virtio_notify(VIRTIO_DEVICE(vser), vq);
-- 
MST

[Qemu-devel] [PULL 04/10] virtio: set any_layout in virtio core

2015-07-28 Thread Michael S. Tsirkin

Exceptions:
- virtio-blk
- compat machine types

Signed-off-by: Michael S. Tsirkin 
---
 include/hw/compat.h| 22 +-
 include/hw/virtio/virtio.h |  4 +++-
 hw/block/virtio-blk.c  |  1 +
 hw/net/virtio-net.c|  2 --
 hw/scsi/virtio-scsi.c  |  2 --
 5 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/include/hw/compat.h b/include/hw/compat.h
index 4a43466..94c8097 100644
--- a/include/hw/compat.h
+++ b/include/hw/compat.h
@@ -2,7 +2,27 @@
 #define HW_COMPAT_H
 
 #define HW_COMPAT_2_3 \
-/* empty */
+{\
+.driver   = "virtio-blk-pci",\
+.property = "any_layout",\
+.value= "off",\
+},{\
+.driver   = "virtio-balloon-pci",\
+.property = "any_layout",\
+.value= "off",\
+},{\
+.driver   = "virtio-serial-pci",\
+.property = "any_layout",\
+.value= "off",\
+},{\
+.driver   = "virtio-9p-pci",\
+.property = "any_layout",\
+.value= "off",\
+},{\
+.driver   = "virtio-rng-pci",\
+.property = "any_layout",\
+.value= "off",\
+},
 
 #define HW_COMPAT_2_2 \
 /* empty */
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 0634c15..ff91711 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -218,7 +218,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
 DEFINE_PROP_BIT64("event_idx", _state, _field,\
   VIRTIO_RING_F_EVENT_IDX, true), \
 DEFINE_PROP_BIT64("notify_on_empty", _state, _field,  \
-  VIRTIO_F_NOTIFY_ON_EMPTY, true)
+  VIRTIO_F_NOTIFY_ON_EMPTY, true), \
+DEFINE_PROP_BIT64("any_layout", _state, _field, \
+  VIRTIO_F_ANY_LAYOUT, true)
 
 hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
 hwaddr virtio_queue_get_avail_addr(VirtIODevice *vdev, int n);
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 6aefda4..015b9b5 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -731,6 +731,7 @@ static uint64_t virtio_blk_get_features(VirtIODevice *vdev, 
uint64_t features)
 virtio_add_feature(&features, VIRTIO_BLK_F_TOPOLOGY);
 virtio_add_feature(&features, VIRTIO_BLK_F_BLK_SIZE);
 virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
+virtio_clear_feature(&features, VIRTIO_F_ANY_LAYOUT);
 
 if (s->conf.config_wce) {
 virtio_add_feature(&features, VIRTIO_BLK_F_CONFIG_WCE);
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 304d3dd..e203058 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1777,8 +1777,6 @@ static void virtio_net_instance_init(Object *obj)
 }
 
 static Property virtio_net_properties[] = {
-DEFINE_PROP_BIT("any_layout", VirtIONet, host_features,
-VIRTIO_F_ANY_LAYOUT, true),
 DEFINE_PROP_BIT("csum", VirtIONet, host_features, VIRTIO_NET_F_CSUM, true),
 DEFINE_PROP_BIT("guest_csum", VirtIONet, host_features,
 VIRTIO_NET_F_GUEST_CSUM, true),
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index f7d3c7c..d17698d 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -953,8 +953,6 @@ static Property virtio_scsi_properties[] = {
   0x),
 DEFINE_PROP_UINT32("cmd_per_lun", VirtIOSCSI, parent_obj.conf.cmd_per_lun,
   128),
-DEFINE_PROP_BIT("any_layout", VirtIOSCSI, host_features,
-  VIRTIO_F_ANY_LAYOUT, true),
 DEFINE_PROP_BIT("hotplug", VirtIOSCSI, host_features,
VIRTIO_SCSI_F_HOTPLUG, true),
 DEFINE_PROP_BIT("param_change", VirtIOSCSI, host_features,
-- 
MST

[Qemu-devel] [PULL 00/10] virtio fixes for 2.4

2015-07-28 Thread Michael S. Tsirkin

The following changes since commit b69b30532e0a80e25449244c01b0cbed000c99a3:

  Update version for v2.4.0-rc2 release (2015-07-22 18:17:19 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream

for you to fetch changes up to c147b5153e733a14fc65d2098e7036754c185ad1:

  virtio: minor cleanup (2015-07-27 23:55:27 +0300)


virtio fixes for 2.4

Mostly virtio 1 spec compliance fixes.
We are unlikely to make it perfectly compliant in
the first release, but it seems worth it to try.

Signed-off-by: Michael S. Tsirkin 


Gal Hammer (1):
  acpi: fix pvpanic device is not shown in ui

Jason Wang (3):
  virtio: get_features() can fail
  virtio-blk: fail get_features when both scsi and 1.0 were set
  virtio-blk: only clear VIRTIO_F_ANY_LAYOUT for legacy device

Michael S. Tsirkin (6):
  virtio: hide legacy features from modern guests
  virtio-serial: fix ANY_LAYOUT
  virtio-9p: fix any_layout
  virtio: set any_layout in virtio core
  virtio-pci: fix memory MR cleanup for modern
  virtio: minor cleanup

 include/hw/compat.h | 22 +-
 include/hw/virtio/virtio.h  | 12 ++--
 hw/9pfs/virtio-9p-device.c  |  3 ++-
 hw/9pfs/virtio-9p.c | 23 +--
 hw/block/virtio-blk.c   | 13 +++--
 hw/char/virtio-serial-bus.c |  6 --
 hw/display/virtio-gpu.c |  3 ++-
 hw/i386/acpi-build.c|  4 ++--
 hw/input/virtio-input.c |  3 ++-
 hw/net/virtio-net.c |  5 ++---
 hw/scsi/vhost-scsi.c|  3 ++-
 hw/scsi/virtio-scsi.c   |  5 ++---
 hw/virtio/virtio-balloon.c  |  3 ++-
 hw/virtio/virtio-bus.c  |  3 ++-
 hw/virtio/virtio-pci.c  | 18 +-
 hw/virtio/virtio-rng.c  |  2 +-
 16 files changed, 99 insertions(+), 29 deletions(-)

[Qemu-devel] [PULL 07/10] virtio-blk: fail get_features when both scsi and 1.0 were set

2015-07-28 Thread Michael S. Tsirkin

From: Jason Wang 

SCSI passthrough was no longer supported in virtio 1.0, so this patch
fail the get_features() when both 1.0 and scsi is set. And also only
advertise VIRTIO_BLK_F_SCSI for legacy virtio-blk device.

Signed-off-by: Jason Wang 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Acked-by: Paolo Bonzini 
---
 hw/block/virtio-blk.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index a6cf008..ebd9d84 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -731,8 +731,16 @@ static uint64_t virtio_blk_get_features(VirtIODevice 
*vdev, uint64_t features,
 virtio_add_feature(&features, VIRTIO_BLK_F_GEOMETRY);
 virtio_add_feature(&features, VIRTIO_BLK_F_TOPOLOGY);
 virtio_add_feature(&features, VIRTIO_BLK_F_BLK_SIZE);
-virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
 virtio_clear_feature(&features, VIRTIO_F_ANY_LAYOUT);
+if (__virtio_has_feature(features, VIRTIO_F_VERSION_1)) {
+if (s->conf.scsi) {
+error_setg(errp, "Please set scsi=off for virtio-blk devices in 
order to use virtio 1.0");
+return 0;
+}
+virtio_add_feature(&features, VIRTIO_F_ANY_LAYOUT);
+} else {
+virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
+}
 
 if (s->conf.config_wce) {
 virtio_add_feature(&features, VIRTIO_BLK_F_CONFIG_WCE);
-- 
MST

[Qemu-devel] [PULL 01/10] virtio: hide legacy features from modern guests

2015-07-28 Thread Michael S. Tsirkin

NOTIFY_ON_EMPTY, ANY_LAYOUT and BAD are only valid on the legacy
interface.

Hide them from modern guests.

Signed-off-by: Michael S. Tsirkin 
---
 include/hw/virtio/virtio.h | 4 
 hw/virtio/virtio-pci.c | 3 ++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 473fb75..0634c15 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -25,6 +25,10 @@
 /* A guest should never accept this.  It implies negotiation is broken. */
 #define VIRTIO_F_BAD_FEATURE   30
 
+#define VIRTIO_LEGACY_FEATURES ((0x1ULL << VIRTIO_F_BAD_FEATURE) | \
+(0x1ULL << VIRTIO_F_NOTIFY_ON_EMPTY) | \
+(0x1ULL << VIRTIO_F_ANY_LAYOUT))
+
 struct VirtQueue;
 
 static inline hwaddr vring_align(hwaddr addr,
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 283401a..db8f27c 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1095,7 +1095,8 @@ static uint64_t virtio_pci_common_read(void *opaque, 
hwaddr addr,
 break;
 case VIRTIO_PCI_COMMON_DF:
 if (proxy->dfselect <= 1) {
-val = vdev->host_features >> (32 * proxy->dfselect);
+val = (vdev->host_features & ~VIRTIO_LEGACY_FEATURES) >>
+(32 * proxy->dfselect);
 }
 break;
 case VIRTIO_PCI_COMMON_GFSELECT:
-- 
MST

[Qemu-devel] [PULL 03/10] virtio-9p: fix any_layout

2015-07-28 Thread Michael S. Tsirkin

virtio pci allows any device to have a modern interface,
this in turn requires ANY_LAYOUT support.
Fix up ANY_LAYOUT for virtio-9p.

Reported-by: Jason Wang 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Jason Wang 
---
 hw/9pfs/virtio-9p.c | 23 +--
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/hw/9pfs/virtio-9p.c b/hw/9pfs/virtio-9p.c
index 6ef8af3..f972731 100644
--- a/hw/9pfs/virtio-9p.c
+++ b/hw/9pfs/virtio-9p.c
@@ -14,6 +14,7 @@
 #include "hw/virtio/virtio.h"
 #include "hw/i386/pc.h"
 #include "qemu/error-report.h"
+#include "qemu/iov.h"
 #include "qemu/sockets.h"
 #include "virtio-9p.h"
 #include "fsdev/qemu-fsdev.h"
@@ -3261,16 +3262,26 @@ void handle_9p_output(VirtIODevice *vdev, VirtQueue *vq)
 
 while ((pdu = alloc_pdu(s)) &&
 (len = virtqueue_pop(vq, &pdu->elem)) != 0) {
-uint8_t *ptr;
+struct {
+uint32_t size_le;
+uint8_t id;
+uint16_t tag_le;
+} QEMU_PACKED out;
+int len;
+
 pdu->s = s;
 BUG_ON(pdu->elem.out_num == 0 || pdu->elem.in_num == 0);
-BUG_ON(pdu->elem.out_sg[0].iov_len < 7);
+QEMU_BUILD_BUG_ON(sizeof out != 7);
+
+len = iov_to_buf(pdu->elem.out_sg, pdu->elem.out_num, 0,
+ &out, sizeof out);
+BUG_ON(len != sizeof out);
+
+pdu->size = le32_to_cpu(out.size_le);
 
-ptr = pdu->elem.out_sg[0].iov_base;
+pdu->id = out.id;
+pdu->tag = le16_to_cpu(out.tag_le);
 
-pdu->size = le32_to_cpu(*(uint32_t *)ptr);
-pdu->id = ptr[4];
-pdu->tag = le16_to_cpu(*(uint16_t *)(ptr + 5));
 qemu_co_queue_init(&pdu->complete);
 submit_pdu(s, pdu);
 }
-- 
MST

[Qemu-devel] [PULL 06/10] virtio: get_features() can fail

2015-07-28 Thread Michael S. Tsirkin

From: Jason Wang 

Signed-off-by: Jason Wang 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Acked-by: Paolo Bonzini 
---
 include/hw/virtio/virtio.h  | 4 +++-
 hw/9pfs/virtio-9p-device.c  | 3 ++-
 hw/block/virtio-blk.c   | 3 ++-
 hw/char/virtio-serial-bus.c | 3 ++-
 hw/display/virtio-gpu.c | 3 ++-
 hw/input/virtio-input.c | 3 ++-
 hw/net/virtio-net.c | 3 ++-
 hw/scsi/vhost-scsi.c| 3 ++-
 hw/scsi/virtio-scsi.c   | 3 ++-
 hw/virtio/virtio-balloon.c  | 3 ++-
 hw/virtio/virtio-bus.c  | 3 ++-
 hw/virtio/virtio-rng.c  | 2 +-
 12 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index ff91711..59f0763 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -101,7 +101,9 @@ typedef struct VirtioDeviceClass {
 /* This is what a VirtioDevice must implement */
 DeviceRealize realize;
 DeviceUnrealize unrealize;
-uint64_t (*get_features)(VirtIODevice *vdev, uint64_t requested_features);
+uint64_t (*get_features)(VirtIODevice *vdev,
+ uint64_t requested_features,
+ Error **errp);
 uint64_t (*bad_features)(VirtIODevice *vdev);
 void (*set_features)(VirtIODevice *vdev, uint64_t val);
 int (*validate_features)(VirtIODevice *vdev);
diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c
index 3f4c9e7..93a407c 100644
--- a/hw/9pfs/virtio-9p-device.c
+++ b/hw/9pfs/virtio-9p-device.c
@@ -21,7 +21,8 @@
 #include "virtio-9p-coth.h"
 #include "hw/virtio/virtio-access.h"
 
-static uint64_t virtio_9p_get_features(VirtIODevice *vdev, uint64_t features)
+static uint64_t virtio_9p_get_features(VirtIODevice *vdev, uint64_t features,
+   Error **errp)
 {
 virtio_add_feature(&features, VIRTIO_9P_MOUNT_TAG);
 return features;
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 015b9b5..a6cf008 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -722,7 +722,8 @@ static void virtio_blk_set_config(VirtIODevice *vdev, const 
uint8_t *config)
 aio_context_release(blk_get_aio_context(s->blk));
 }
 
-static uint64_t virtio_blk_get_features(VirtIODevice *vdev, uint64_t features)
+static uint64_t virtio_blk_get_features(VirtIODevice *vdev, uint64_t features,
+Error **errp)
 {
 VirtIOBlock *s = VIRTIO_BLK(vdev);
 
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 929e49c..bc56f5d 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -500,7 +500,8 @@ static void handle_input(VirtIODevice *vdev, VirtQueue *vq)
 }
 }
 
-static uint64_t get_features(VirtIODevice *vdev, uint64_t features)
+static uint64_t get_features(VirtIODevice *vdev, uint64_t features,
+ Error **errp)
 {
 VirtIOSerial *vser;
 
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 990a26b..a67d927 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -89,7 +89,8 @@ static void virtio_gpu_set_config(VirtIODevice *vdev, const 
uint8_t *config)
 }
 }
 
-static uint64_t virtio_gpu_get_features(VirtIODevice *vdev, uint64_t features)
+static uint64_t virtio_gpu_get_features(VirtIODevice *vdev, uint64_t features,
+Error **errp)
 {
 return features;
 }
diff --git a/hw/input/virtio-input.c b/hw/input/virtio-input.c
index 7f5b8d6..7b25d27 100644
--- a/hw/input/virtio-input.c
+++ b/hw/input/virtio-input.c
@@ -166,7 +166,8 @@ static void virtio_input_set_config(VirtIODevice *vdev,
 virtio_notify_config(vdev);
 }
 
-static uint64_t virtio_input_get_features(VirtIODevice *vdev, uint64_t f)
+static uint64_t virtio_input_get_features(VirtIODevice *vdev, uint64_t f,
+  Error **errp)
 {
 return f;
 }
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index e203058..1510839 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -446,7 +446,8 @@ static void virtio_net_set_queues(VirtIONet *n)
 
 static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue);
 
-static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features)
+static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
+Error **errp)
 {
 VirtIONet *n = VIRTIO_NET(vdev);
 NetClientState *nc = qemu_get_queue(n->nic);
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index 52549f8..a69918b 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -153,7 +153,8 @@ static void vhost_scsi_stop(VHostSCSI *s)
 }
 
 static uint64_t vhost_scsi_get_features(VirtIODevice *vdev,
-uint64_t features)
+uint64_t features,
+Error **errp)
 {
 VHostSCSI

[Qemu-devel] [PULL 05/10] virtio-pci: fix memory MR cleanup for modern

2015-07-28 Thread Michael S. Tsirkin

Each memory_region_add_subregion must be paired with
memory_region_del_subregion.

Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Paolo Bonzini 
---
 hw/virtio/virtio-pci.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index db8f27c..c024161 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1414,6 +1414,13 @@ static void virtio_pci_modern_region_map(VirtIOPCIProxy 
*proxy,
 virtio_pci_add_mem_cap(proxy, cap);
 }
 
+static void virtio_pci_modern_region_unmap(VirtIOPCIProxy *proxy,
+   VirtIOPCIRegion *region)
+{
+memory_region_del_subregion(&proxy->modern_bar,
+®ion->mr);
+}
+
 /* This is called by virtio-bus just after the device is plugged. */
 static void virtio_pci_device_plugged(DeviceState *d, Error **errp)
 {
@@ -1520,8 +1527,16 @@ static void virtio_pci_device_plugged(DeviceState *d, 
Error **errp)
 static void virtio_pci_device_unplugged(DeviceState *d)
 {
 VirtIOPCIProxy *proxy = VIRTIO_PCI(d);
+bool modern = !(proxy->flags & VIRTIO_PCI_FLAG_DISABLE_MODERN);
 
 virtio_pci_stop_ioeventfd(proxy);
+
+if (modern) {
+virtio_pci_modern_region_unmap(proxy, &proxy->common);
+virtio_pci_modern_region_unmap(proxy, &proxy->isr);
+virtio_pci_modern_region_unmap(proxy, &proxy->device);
+virtio_pci_modern_region_unmap(proxy, &proxy->notify);
+}
 }
 
 static void virtio_pci_realize(PCIDevice *pci_dev, Error **errp)
-- 
MST

[Qemu-devel] [PULL 08/10] virtio-blk: only clear VIRTIO_F_ANY_LAYOUT for legacy device

2015-07-28 Thread Michael S. Tsirkin

From: Jason Wang 

Chapter 6.3 of spec said

"
Transitional devices MUST offer, and if offered by the device
transitional drivers MUST accept the following:

VIRTIO_F_ANY_LAYOUT (27)
"

So this patch only clear VIRTIO_F_LAYOUT for legacy device.

Cc: Stefan Hajnoczi 
Cc: Kevin Wolf 
Cc: qemu-bl...@nongnu.org
Signed-off-by: Jason Wang 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Acked-by: Paolo Bonzini 
---
 hw/block/virtio-blk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index ebd9d84..44f9b8e 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -731,7 +731,6 @@ static uint64_t virtio_blk_get_features(VirtIODevice *vdev, 
uint64_t features,
 virtio_add_feature(&features, VIRTIO_BLK_F_GEOMETRY);
 virtio_add_feature(&features, VIRTIO_BLK_F_TOPOLOGY);
 virtio_add_feature(&features, VIRTIO_BLK_F_BLK_SIZE);
-virtio_clear_feature(&features, VIRTIO_F_ANY_LAYOUT);
 if (__virtio_has_feature(features, VIRTIO_F_VERSION_1)) {
 if (s->conf.scsi) {
 error_setg(errp, "Please set scsi=off for virtio-blk devices in 
order to use virtio 1.0");
@@ -739,6 +738,7 @@ static uint64_t virtio_blk_get_features(VirtIODevice *vdev, 
uint64_t features,
 }
 virtio_add_feature(&features, VIRTIO_F_ANY_LAYOUT);
 } else {
+virtio_clear_feature(&features, VIRTIO_F_ANY_LAYOUT);
 virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
 }
 
-- 
MST

Re: [Qemu-devel] [PATCH v2 for-2.5] rcu: Allow calling rcu_(un)register_thread() during synchronize_rcu()

2015-07-28 Thread Wen Congyang

On 07/28/2015 05:59 PM, Paolo Bonzini wrote:
> 
> 
> On 28/07/2015 04:26, Wen Congyang wrote:
>> If rcu_(un)register_thread() is called together with synchronize_rcu(),
>> it will wait for the synchronize_rcu() to finish. But when synchronize_rcu()
>> waits for some events, we can modify the list registry.
>> We also use the lock rcu_gp_lock to assume that synchronize_rcu() isn't
>> executed in more than one thread at the same time. Add a new mutex lock
>> rcu_sync_lock to assume it and rename rcu_gp_lock to rcu_registry_lock.
>> Release rcu_registry_lock when synchronize_rcu() waits for some events.
>>
>> Signed-off-by: Wen Congyang 
>> ---
>>  util/rcu.c | 48 +++-
>>  1 file changed, 35 insertions(+), 13 deletions(-)
>>
>> diff --git a/util/rcu.c b/util/rcu.c
>> index cdcad67..4dd33df 100644
>> --- a/util/rcu.c
>> +++ b/util/rcu.c
>> @@ -47,7 +47,8 @@
>>  unsigned long rcu_gp_ctr = RCU_GP_LOCKED;
>>  
>>  QemuEvent rcu_gp_event;
>> -static QemuMutex rcu_gp_lock;
>> +static QemuMutex rcu_registry_lock;
>> +static QemuMutex rcu_sync_lock;
>>  
>>  /*
>>   * Check whether a quiescent state was crossed between the beginning of
>> @@ -66,7 +67,7 @@ static inline int rcu_gp_ongoing(unsigned long *ctr)
>>   */
>>  __thread struct rcu_reader_data rcu_reader;
>>  
>> -/* Protected by rcu_gp_lock.  */
>> +/* Protected by rcu_registry_lock.  */
>>  typedef QLIST_HEAD(, rcu_reader_data) ThreadList;
>>  static ThreadList registry = QLIST_HEAD_INITIALIZER(registry);
>>  
>> @@ -114,10 +115,26 @@ static void wait_for_readers(void)
>>  break;
>>  }
>>  
>> -/* Wait for one thread to report a quiescent state and
>> - * try again.
>> +/* Wait for one thread to report a quiescent state and try again.
>> + * Release rcu_registry_lock, so rcu_(un)register_thread() doesn't
>> + * wait too much time.
>> + *
>> + * rcu_register_thread() may add nodes to ®istry; it will not
>> + * wake up synchronize_rcu, but that is okay because at least 
>> another
>> + * thread must exit its RCU read-side critical section before
>> + * synchronize_rcu is done.  The next iteration of the loop will
>> + * move the new thread's rcu_reader from ®istry to &qsreaders,
>> + * because rcu_gp_ongoing() will return false.
>> + *
>> + * rcu_unregister_thread() may remove nodes from &qsreaders instead
>> + * of ®istry if it runs during qemu_event_wait.  That's okay;
>> + * the node then will not be added back to ®istry by QLIST_SWAP
>> + * below. The invariant is that the node is part of one list when
>> + * rcu_registry_lock is released.
>>   */
>> +qemu_mutex_unlock(&rcu_registry_lock);
>>  qemu_event_wait(&rcu_gp_event);
>> +qemu_mutex_lock(&rcu_registry_lock);
>>  }
>>  
>>  /* put back the reader list in the registry */
>> @@ -126,7 +143,8 @@ static void wait_for_readers(void)
>>  
>>  void synchronize_rcu(void)
>>  {
>> -qemu_mutex_lock(&rcu_gp_lock);
>> +qemu_mutex_lock(&rcu_sync_lock);
>> +qemu_mutex_lock(&rcu_registry_lock);
>>  
>>  if (!QLIST_EMPTY(®istry)) {
>>  /* In either case, the atomic_mb_set below blocks stores that free
>> @@ -149,7 +167,8 @@ void synchronize_rcu(void)
>>  wait_for_readers();
>>  }
>>  
>> -qemu_mutex_unlock(&rcu_gp_lock);
>> +qemu_mutex_unlock(&rcu_registry_lock);
>> +qemu_mutex_unlock(&rcu_sync_lock);
>>  }
>>  
>>  
>> @@ -273,23 +292,24 @@ void call_rcu1(struct rcu_head *node, void 
>> (*func)(struct rcu_head *node))
>>  void rcu_register_thread(void)
>>  {
>>  assert(rcu_reader.ctr == 0);
>> -qemu_mutex_lock(&rcu_gp_lock);
>> +qemu_mutex_lock(&rcu_registry_lock);
>>  QLIST_INSERT_HEAD(®istry, &rcu_reader, node);
>> -qemu_mutex_unlock(&rcu_gp_lock);
>> +qemu_mutex_unlock(&rcu_registry_lock);
>>  }
>>  
>>  void rcu_unregister_thread(void)
>>  {
>> -qemu_mutex_lock(&rcu_gp_lock);
>> +qemu_mutex_lock(&rcu_registry_lock);
>>  QLIST_REMOVE(&rcu_reader, node);
>> -qemu_mutex_unlock(&rcu_gp_lock);
>> +qemu_mutex_unlock(&rcu_registry_lock);
>>  }
>>  
>>  static void rcu_init_complete(void)
>>  {
>>  QemuThread thread;
>>  
>> -qemu_mutex_init(&rcu_gp_lock);
>> +qemu_mutex_init(&rcu_registry_lock);
>> +qemu_mutex_init(&rcu_sync_lock);
>>  qemu_event_init(&rcu_gp_event, true);
>>  
>>  qemu_event_init(&rcu_call_ready_event, false);
>> @@ -306,12 +326,14 @@ static void rcu_init_complete(void)
>>  #ifdef CONFIG_POSIX
>>  static void rcu_init_lock(void)
>>  {
>> -qemu_mutex_lock(&rcu_gp_lock);
>> +qemu_mutex_lock(&rcu_sync_lock);
>> +qemu_mutex_lock(&rcu_registry_lock);
>>  }
>>  
>>  static void rcu_init_unlock(void)
>>  {
>> -qemu_mutex_unlock(&rcu_gp_lock);
>> +qemu_mutex_unlock(&rcu_registry_lock);
>> +qemu_mutex_unlock(&rcu_sync_lock);
>>

[Qemu-devel] [PULL 09/10] acpi: fix pvpanic device is not shown in ui

2015-07-28 Thread Michael S. Tsirkin

From: Gal Hammer 

Commit 2332333c added a _STA method that hides the device. The fact
that the device is not shown in the gui make it harder to install its
Windows' device.

https://bugzilla.redhat.com/show_bug.cgi?id=1238141

Signed-off-by: Gal Hammer 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Igor Mammedov 
---
 hw/i386/acpi-build.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index aed811a..46eddb8 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1108,8 +1108,8 @@ build_ssdt(GArray *table_data, GArray *linker,
 aml_append(field, aml_named_field("PEPT", 8));
 aml_append(dev, field);
 
-/* device present, functioning, decoding, not shown in UI */
-aml_append(dev, aml_name_decl("_STA", aml_int(0xB)));
+/* device present, functioning, decoding, shown in UI */
+aml_append(dev, aml_name_decl("_STA", aml_int(0xF)));
 
 method = aml_method("RDPT", 0);
 aml_append(method, aml_store(aml_name("PEPT"), aml_local(0)));
-- 
MST

Re: [Qemu-devel] [PATCH v2 for-2.5] rcu: Allow calling rcu_(un)register_thread() during synchronize_rcu()

2015-07-28 Thread Paolo Bonzini



On 28/07/2015 04:26, Wen Congyang wrote:
> If rcu_(un)register_thread() is called together with synchronize_rcu(),
> it will wait for the synchronize_rcu() to finish. But when synchronize_rcu()
> waits for some events, we can modify the list registry.
> We also use the lock rcu_gp_lock to assume that synchronize_rcu() isn't
> executed in more than one thread at the same time. Add a new mutex lock
> rcu_sync_lock to assume it and rename rcu_gp_lock to rcu_registry_lock.
> Release rcu_registry_lock when synchronize_rcu() waits for some events.
> 
> Signed-off-by: Wen Congyang 
> ---
>  util/rcu.c | 48 +++-
>  1 file changed, 35 insertions(+), 13 deletions(-)
> 
> diff --git a/util/rcu.c b/util/rcu.c
> index cdcad67..4dd33df 100644
> --- a/util/rcu.c
> +++ b/util/rcu.c
> @@ -47,7 +47,8 @@
>  unsigned long rcu_gp_ctr = RCU_GP_LOCKED;
>  
>  QemuEvent rcu_gp_event;
> -static QemuMutex rcu_gp_lock;
> +static QemuMutex rcu_registry_lock;
> +static QemuMutex rcu_sync_lock;
>  
>  /*
>   * Check whether a quiescent state was crossed between the beginning of
> @@ -66,7 +67,7 @@ static inline int rcu_gp_ongoing(unsigned long *ctr)
>   */
>  __thread struct rcu_reader_data rcu_reader;
>  
> -/* Protected by rcu_gp_lock.  */
> +/* Protected by rcu_registry_lock.  */
>  typedef QLIST_HEAD(, rcu_reader_data) ThreadList;
>  static ThreadList registry = QLIST_HEAD_INITIALIZER(registry);
>  
> @@ -114,10 +115,26 @@ static void wait_for_readers(void)
>  break;
>  }
>  
> -/* Wait for one thread to report a quiescent state and
> - * try again.
> +/* Wait for one thread to report a quiescent state and try again.
> + * Release rcu_registry_lock, so rcu_(un)register_thread() doesn't
> + * wait too much time.
> + *
> + * rcu_register_thread() may add nodes to ®istry; it will not
> + * wake up synchronize_rcu, but that is okay because at least another
> + * thread must exit its RCU read-side critical section before
> + * synchronize_rcu is done.  The next iteration of the loop will
> + * move the new thread's rcu_reader from ®istry to &qsreaders,
> + * because rcu_gp_ongoing() will return false.
> + *
> + * rcu_unregister_thread() may remove nodes from &qsreaders instead
> + * of ®istry if it runs during qemu_event_wait.  That's okay;
> + * the node then will not be added back to ®istry by QLIST_SWAP
> + * below. The invariant is that the node is part of one list when
> + * rcu_registry_lock is released.
>   */
> +qemu_mutex_unlock(&rcu_registry_lock);
>  qemu_event_wait(&rcu_gp_event);
> +qemu_mutex_lock(&rcu_registry_lock);
>  }
>  
>  /* put back the reader list in the registry */
> @@ -126,7 +143,8 @@ static void wait_for_readers(void)
>  
>  void synchronize_rcu(void)
>  {
> -qemu_mutex_lock(&rcu_gp_lock);
> +qemu_mutex_lock(&rcu_sync_lock);
> +qemu_mutex_lock(&rcu_registry_lock);
>  
>  if (!QLIST_EMPTY(®istry)) {
>  /* In either case, the atomic_mb_set below blocks stores that free
> @@ -149,7 +167,8 @@ void synchronize_rcu(void)
>  wait_for_readers();
>  }
>  
> -qemu_mutex_unlock(&rcu_gp_lock);
> +qemu_mutex_unlock(&rcu_registry_lock);
> +qemu_mutex_unlock(&rcu_sync_lock);
>  }
>  
>  
> @@ -273,23 +292,24 @@ void call_rcu1(struct rcu_head *node, void 
> (*func)(struct rcu_head *node))
>  void rcu_register_thread(void)
>  {
>  assert(rcu_reader.ctr == 0);
> -qemu_mutex_lock(&rcu_gp_lock);
> +qemu_mutex_lock(&rcu_registry_lock);
>  QLIST_INSERT_HEAD(®istry, &rcu_reader, node);
> -qemu_mutex_unlock(&rcu_gp_lock);
> +qemu_mutex_unlock(&rcu_registry_lock);
>  }
>  
>  void rcu_unregister_thread(void)
>  {
> -qemu_mutex_lock(&rcu_gp_lock);
> +qemu_mutex_lock(&rcu_registry_lock);
>  QLIST_REMOVE(&rcu_reader, node);
> -qemu_mutex_unlock(&rcu_gp_lock);
> +qemu_mutex_unlock(&rcu_registry_lock);
>  }
>  
>  static void rcu_init_complete(void)
>  {
>  QemuThread thread;
>  
> -qemu_mutex_init(&rcu_gp_lock);
> +qemu_mutex_init(&rcu_registry_lock);
> +qemu_mutex_init(&rcu_sync_lock);
>  qemu_event_init(&rcu_gp_event, true);
>  
>  qemu_event_init(&rcu_call_ready_event, false);
> @@ -306,12 +326,14 @@ static void rcu_init_complete(void)
>  #ifdef CONFIG_POSIX
>  static void rcu_init_lock(void)
>  {
> -qemu_mutex_lock(&rcu_gp_lock);
> +qemu_mutex_lock(&rcu_sync_lock);
> +qemu_mutex_lock(&rcu_registry_lock);
>  }
>  
>  static void rcu_init_unlock(void)
>  {
> -qemu_mutex_unlock(&rcu_gp_lock);
> +qemu_mutex_unlock(&rcu_registry_lock);
> +qemu_mutex_unlock(&rcu_sync_lock);
>  }
>  #endif
>  
> 

Ok, thanks.

Paolo

[Qemu-devel] [PULL 10/10] virtio: minor cleanup

2015-07-28 Thread Michael S. Tsirkin

There's no need for blk to set ANY_LAYOUT, it's
done by virtio core as necessary.

Signed-off-by: Michael S. Tsirkin 
---
 hw/block/virtio-blk.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 44f9b8e..1556c9c 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -736,7 +736,6 @@ static uint64_t virtio_blk_get_features(VirtIODevice *vdev, 
uint64_t features,
 error_setg(errp, "Please set scsi=off for virtio-blk devices in 
order to use virtio 1.0");
 return 0;
 }
-virtio_add_feature(&features, VIRTIO_F_ANY_LAYOUT);
 } else {
 virtio_clear_feature(&features, VIRTIO_F_ANY_LAYOUT);
 virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
-- 
MST

[Qemu-devel] [PULL 4/5] net/dp8393x: remove check of runt packets

2015-07-28 Thread Leon Alrae

From: Hervé Poussineau 

Ethernet requires that messages are at least 64 bytes on the wire. This
limitation does not exist on emulation (no wire message), so remove the
check. Netcard is now able to receive small network packets.

Signed-off-by: Hervé Poussineau 
Reviewed-by: Aurelien Jarno 
Signed-off-by: Leon Alrae 
---
 hw/net/dp8393x.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
index 93d6a47..0f45146 100644
--- a/hw/net/dp8393x.c
+++ b/hw/net/dp8393x.c
@@ -643,11 +643,6 @@ static int dp8393x_receive_filter(dp8393xState *s, const 
uint8_t * buf,
 static const uint8_t bcast[] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
 int i;
 
-/* Check for runt packet (remember that checksum is not there) */
-if (size < 64 - 4) {
-return (s->regs[SONIC_RCR] & SONIC_RCR_RNT) ? 0 : -1;
-}
-
 /* Check promiscuous mode */
 if ((s->regs[SONIC_RCR] & SONIC_RCR_PRO) && (buf[0] & 1) == 0) {
 return 0;
-- 
2.1.0

[Qemu-devel] [PULL 5/5] net/dp8393x: do not use memory_region_init_rom_device with NULL

2015-07-28 Thread Leon Alrae

From: Hervé Poussineau 

Replace memory_region_init_rom_device() with memory_region_init_ram() and
memory_region_set_readonly().
This fixes a guest-triggerable QEMU crash when guest tries to write to PROM.

Signed-off-by: Hervé Poussineau 
[leon.al...@imgtec.com: shorten subject length]
Signed-off-by: Leon Alrae 
---
 hw/net/dp8393x.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
index 0f45146..ab607e4 100644
--- a/hw/net/dp8393x.c
+++ b/hw/net/dp8393x.c
@@ -831,6 +831,7 @@ static void dp8393x_realize(DeviceState *dev, Error **errp)
 dp8393xState *s = DP8393X(dev);
 int i, checksum;
 uint8_t *prom;
+Error *local_err = NULL;
 
 address_space_init(&s->as, s->dma_mr, "dp8393x");
 memory_region_init_io(&s->mmio, OBJECT(dev), &dp8393x_ops, s,
@@ -843,8 +844,13 @@ static void dp8393x_realize(DeviceState *dev, Error **errp)
 s->watchdog = timer_new_ns(QEMU_CLOCK_VIRTUAL, dp8393x_watchdog, s);
 s->regs[SONIC_SR] = 0x0004; /* only revision recognized by Linux */
 
-memory_region_init_rom_device(&s->prom, OBJECT(dev), NULL, NULL,
-  "dp8393x-prom", SONIC_PROM_SIZE, NULL);
+memory_region_init_ram(&s->prom, OBJECT(dev),
+   "dp8393x-prom", SONIC_PROM_SIZE, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+memory_region_set_readonly(&s->prom, true);
 prom = memory_region_get_ram_ptr(&s->prom);
 checksum = 0;
 for (i = 0; i < 6; i++) {
-- 
2.1.0

[Qemu-devel] [PULL 1/5] target-mips: fix passing incompatible pointer type in machine.c

2015-07-28 Thread Leon Alrae

Reported-by: Peter Maydell 
Signed-off-by: Leon Alrae 
---
 target-mips/machine.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target-mips/machine.c b/target-mips/machine.c
index 8fa755c..b15c43a 100644
--- a/target-mips/machine.c
+++ b/target-mips/machine.c
@@ -153,6 +153,7 @@ static void put_tlb(QEMUFile *f, void *pv, size_t size)
 {
 r4k_tlb_t *v = pv;
 
+uint8_t asid = v->ASID;
 uint16_t flags = ((v->EHINV << 15) |
   (v->RI1 << 14) |
   (v->RI0 << 13) |
@@ -168,7 +169,7 @@ static void put_tlb(QEMUFile *f, void *pv, size_t size)
 
 qemu_put_betls(f, &v->VPN);
 qemu_put_be32s(f, &v->PageMask);
-qemu_put_8s(f, &v->ASID);
+qemu_put_8s(f, &asid);
 qemu_put_be16s(f, &flags);
 qemu_put_be64s(f, &v->PFN[0]);
 qemu_put_be64s(f, &v->PFN[1]);
-- 
2.1.0

[Qemu-devel] [PULL 0/5] target-mips queue for 2.4

2015-07-28 Thread Leon Alrae

Hi,

This last minute pull request for -rc3 contains target-mips bug fixes and
it also includes recent Herve's dp8393x fixes.

Thanks,
Leon

Cc: Peter Maydell 
Cc: Aurelien Jarno 

The following changes since commit f8787f8723eaca1be99e3b1873e54de163fffa93:

  Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20150727' into 
staging (2015-07-27 19:37:09 +0100)

are available in the git repository at:

  git://github.com/lalrae/qemu.git tags/mips-20150728

for you to fetch changes up to 52579c681cb12bf64de793e85edc50d990f4d42f:

  net/dp8393x: do not use memory_region_init_rom_device with NULL (2015-07-28 
09:30:10 +0100)


MIPS patches 2015-07-28

Changes:
* net/dp8393x fixes
* Vectored Interrupts bug fix
* fix for a bug in machine.c which was provoking a warning on FreeBSD


Hervé Poussineau (3):
  net/dp8393x: disable user creation
  net/dp8393x: remove check of runt packets
  net/dp8393x: do not use memory_region_init_rom_device with NULL

Leon Alrae (1):
  target-mips: fix passing incompatible pointer type in machine.c

Yongbok Kim (1):
  target-mips: fix offset calculation for Interrupts

 hw/net/dp8393x.c| 17 ++---
 target-mips/helper.c| 46 +-
 target-mips/machine.c   |  3 ++-
 target-mips/op_helper.c |  2 --
 4 files changed, 33 insertions(+), 35 deletions(-)

[Qemu-devel] [PULL 3/5] net/dp8393x: disable user creation

2015-07-28 Thread Leon Alrae

From: Hervé Poussineau 

Netcard needs an address space to write data to, which can't be specified
on command line.
This fixes a crash when user starts QEMU with "-device dp8393x"

Signed-off-by: Hervé Poussineau 
Reviewed-by: Aurelien Jarno 
Signed-off-by: Leon Alrae 
---
 hw/net/dp8393x.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
index 451ff72..93d6a47 100644
--- a/hw/net/dp8393x.c
+++ b/hw/net/dp8393x.c
@@ -889,6 +889,8 @@ static void dp8393x_class_init(ObjectClass *klass, void 
*data)
 dc->reset = dp8393x_reset;
 dc->vmsd = &vmstate_dp8393x;
 dc->props = dp8393x_properties;
+/* Reason: dma_mr property can't be set */
+dc->cannot_instantiate_with_device_add_yet = true;
 }
 
 static const TypeInfo dp8393x_info = {
-- 
2.1.0

[Qemu-devel] [PULL 2/5] target-mips: fix offset calculation for Interrupts

2015-07-28 Thread Leon Alrae

From: Yongbok Kim 

Correct computation of vector offsets for EXCP_EXT_INTERRUPT.
For instance, if Cause.IV is 0 the vector offset should be 0x180.

Simplify the finding vector number logic for the Vectored Interrupts.

Signed-off-by: Yongbok Kim 
Reviewed-by: Leon Alrae 
[leon.al...@imgtec.com: cosmetic changes]
Signed-off-by: Leon Alrae 
---
 target-mips/helper.c| 46 +-
 target-mips/op_helper.c |  2 --
 2 files changed, 21 insertions(+), 27 deletions(-)

diff --git a/target-mips/helper.c b/target-mips/helper.c
index 8e3204a..04ba19f 100644
--- a/target-mips/helper.c
+++ b/target-mips/helper.c
@@ -565,34 +565,30 @@ void mips_cpu_do_interrupt(CPUState *cs)
 break;
 case EXCP_EXT_INTERRUPT:
 cause = 0;
-if (env->CP0_Cause & (1 << CP0Ca_IV))
-offset = 0x200;
-
-if (env->CP0_Config3 & ((1 << CP0C3_VInt) | (1 << CP0C3_VEIC))) {
-/* Vectored Interrupts.  */
-unsigned int spacing;
-unsigned int vector;
-unsigned int pending = (env->CP0_Cause & CP0Ca_IP_mask) >> 8;
-
-pending &= env->CP0_Status >> 8;
-/* Compute the Vector Spacing.  */
-spacing = (env->CP0_IntCtl >> CP0IntCtl_VS) & ((1 << 6) - 1);
-spacing <<= 5;
-
-if (env->CP0_Config3 & (1 << CP0C3_VInt)) {
-/* For VInt mode, the MIPS computes the vector internally.  */
-for (vector = 7; vector > 0; vector--) {
-if (pending & (1 << vector)) {
-/* Found it.  */
-break;
+if (env->CP0_Cause & (1 << CP0Ca_IV)) {
+uint32_t spacing = (env->CP0_IntCtl >> CP0IntCtl_VS) & 0x1f;
+
+if ((env->CP0_Status & (1 << CP0St_BEV)) || spacing == 0) {
+offset = 0x200;
+} else {
+uint32_t vector = 0;
+uint32_t pending = (env->CP0_Cause & CP0Ca_IP_mask) >> 
CP0Ca_IP;
+
+if (env->CP0_Config3 & (1 << CP0C3_VEIC)) {
+/* For VEIC mode, the external interrupt controller feeds
+ * the vector through the CP0Cause IP lines.  */
+vector = pending;
+} else {
+/* Vectored Interrupts
+ * Mask with Status.IM7-IM0 to get enabled interrupts. */
+pending &= (env->CP0_Status >> CP0St_IM) & 0xff;
+/* Find the highest-priority interrupt. */
+while (pending >>= 1) {
+vector++;
 }
 }
-} else {
-/* For VEIC mode, the external interrupt controller feeds the
-   vector through the CP0Cause IP lines.  */
-vector = pending;
+offset = 0x200 + (vector * (spacing << 5));
 }
-offset = 0x200 + vector * spacing;
 }
 goto set_EPC;
 case EXCP_LTLBL:
diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c
index 9c28631..db4f6b9 100644
--- a/target-mips/op_helper.c
+++ b/target-mips/op_helper.c
@@ -1432,7 +1432,6 @@ void helper_mttc0_status(CPUMIPSState *env, target_ulong 
arg1)
 
 void helper_mtc0_intctl(CPUMIPSState *env, target_ulong arg1)
 {
-/* vectored interrupts not implemented, no performance counters. */
 env->CP0_IntCtl = (env->CP0_IntCtl & ~0x03e0) | (arg1 & 0x03e0);
 }
 
@@ -1473,7 +1472,6 @@ target_ulong helper_mftc0_ebase(CPUMIPSState *env)
 
 void helper_mtc0_ebase(CPUMIPSState *env, target_ulong arg1)
 {
-/* vectored interrupts not implemented */
 env->CP0_EBase = (env->CP0_EBase & ~0x3000) | (arg1 & 0x3000);
 }
 
-- 
2.1.0

Re: [Qemu-devel] [PATCH RFC 4/6] vhost: alloc shareable log

2015-07-28 Thread Michael S. Tsirkin

On Tue, Jul 28, 2015 at 01:28:05PM +0800, Jason Wang wrote:
> 
> 
> On 07/23/2015 09:36 AM, Marc-André Lureau wrote:
> > If the backend is of type VHOST_BACKEND_TYPE_USER, allocate
> > shareable memory.
> >
> > Note: vhost_log_get() can use a global "vhost_log" that can be shared by
> > several vhost devices. We may want instead a common shareable log and a
> > common non-shareable one.
> >
> > Signed-off-by: Marc-André Lureau 
> > ---
> >  hw/virtio/vhost.c | 42 +-
> >  include/hw/virtio/vhost.h |  3 ++-
> >  2 files changed, 35 insertions(+), 10 deletions(-)
> >
> > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > index 2712c6f..12dd644 100644
> > --- a/hw/virtio/vhost.c
> > +++ b/hw/virtio/vhost.c
> > @@ -286,20 +286,34 @@ static uint64_t vhost_get_log_size(struct vhost_dev 
> > *dev)
> >  }
> >  return log_size;
> >  }
> > -static struct vhost_log *vhost_log_alloc(uint64_t size)
> > +
> > +static struct vhost_log *vhost_log_alloc(uint64_t size, bool share)
> >  {
> > -struct vhost_log *log = g_malloc0(sizeof *log + size * 
> > sizeof(*(log->log)));
> > +struct vhost_log *log;
> > +uint64_t logsize = size * sizeof(*(log->log));
> > +int fd = -1;
> > +
> > +log = g_new0(struct vhost_log, 1);
> > +if (share) {
> > +log->log = qemu_memfd_alloc("vhost-log", logsize,
> > +F_SEAL_GROW|F_SEAL_SHRINK|F_SEAL_SEAL, 
> > &fd);
> > +memset(log->log, 0, logsize);
> > +} else {
> > +log->log = g_malloc0(logsize);
> > +}
> >  
> >  log->size = size;
> >  log->refcnt = 1;
> > +log->fd = fd;
> >  
> >  return log;
> >  }
> >  
> > -static struct vhost_log *vhost_log_get(uint64_t size)
> > +static struct vhost_log *vhost_log_get(uint64_t size, bool share)
> >  {
> > -if (!vhost_log || vhost_log->size != size) {
> > -vhost_log = vhost_log_alloc(size);
> > +if (!vhost_log || vhost_log->size != size ||
> > +(share && vhost_log->fd == -1)) {
> > +vhost_log = vhost_log_alloc(size, share);
> >  } else {
> >  ++vhost_log->refcnt;
> >  }
> > @@ -324,21 +338,30 @@ static void vhost_log_put(struct vhost_dev *dev, bool 
> > sync)
> >  if (vhost_log == log) {
> >  vhost_log = NULL;
> >  }
> > +
> > +if (log->fd == -1) {
> > +g_free(log->log);
> > +} else {
> > +qemu_memfd_free(log->log, log->size * sizeof(*(log->log)),
> > +log->fd);
> > +}
> >  g_free(log);
> >  }
> >  }
> >  
> >  static inline void vhost_dev_log_resize(struct vhost_dev* dev, uint64_t 
> > size)
> >  {
> > -struct vhost_log *log = vhost_log_get(size);
> > +bool share = dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER;
> > +struct vhost_log *log = vhost_log_get(size, share);
> >  uint64_t log_base = (uintptr_t)log->log;
> >  int r;
> >  
> > -r = dev->vhost_ops->vhost_call(dev, VHOST_SET_LOG_BASE, &log_base);
> > -assert(r >= 0);
> >  vhost_log_put(dev, true);
> >  dev->log = log;
> >  dev->log_size = size;
> > +
> > +r = dev->vhost_ops->vhost_call(dev, VHOST_SET_LOG_BASE, &log_base);
> > +assert(r >= 0);
> >  }
> 
> Why this change is needed?

I know why it's needed :) But this needs to be stated in the commit log.
Also, it only makes sense if remote supports getting the logfd.

> >  
> >  static int vhost_verify_ring_mappings(struct vhost_dev *dev,
> > @@ -1136,9 +1159,10 @@ int vhost_dev_start(struct vhost_dev *hdev, 
> > VirtIODevice *vdev)
> >  
> >  if (hdev->log_enabled) {
> >  uint64_t log_base;
> > +bool share = hdev->vhost_ops->backend_type == 
> > VHOST_BACKEND_TYPE_USER;
> >  
> >  hdev->log_size = vhost_get_log_size(hdev);
> > -hdev->log = vhost_log_get(hdev->log_size);
> > +hdev->log = vhost_log_get(hdev->log_size, share);
> >  log_base = (uintptr_t)hdev->log->log;
> >  r = hdev->vhost_ops->vhost_call(hdev, VHOST_SET_LOG_BASE,
> >  hdev->log_size ? &log_base : NULL);
> > diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> > index 6467c73..ab1dcac 100644
> > --- a/include/hw/virtio/vhost.h
> > +++ b/include/hw/virtio/vhost.h
> > @@ -31,7 +31,8 @@ typedef unsigned long vhost_log_chunk_t;
> >  struct vhost_log {
> >  unsigned long long size;
> >  int refcnt;
> > -vhost_log_chunk_t log[0];
> > +int fd;
> > +vhost_log_chunk_t *log;
> >  };
> >  
> >  struct vhost_memory;

Re: [Qemu-devel] [PATCH v7 42/42] Inhibit ballooning during postcopy

2015-07-28 Thread Amit Shah

On (Tue) 28 Jul 2015 [10:08:15], Dr. David Alan Gilbert wrote:
> * Amit Shah (amit.s...@redhat.com) wrote:
> > On (Tue) 16 Jun 2015 [11:26:55], Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" 
> > > 
> > > Postcopy detects accesses to pages that haven't been transferred yet
> > > using userfaultfd, and it causes exceptions on pages that are 'not
> > > present'.
> > > Ballooning also causes pages to be marked as 'not present' when the
> > > guest inflates the balloon.
> > > Potentially a balloon could be inflated to discard pages that are
> > > currently inflight during postcopy and that may be arriving at about
> > > the same time.
> > > 
> > > To avoid this confusion, disable ballooning during postcopy.
> > > 
> > > When disabled we drop balloon requests from the guest.  Since ballooning
> > > is generally initiated by the host, the management system should avoid
> > > initiating any balloon instructions to the guest during migration,
> > > although it's not possible to know how long it would take a guest to
> > > process a request made prior to the start of migration.
> > > 
> > > Queueing the requests until after migration would be nice, but is
> > > non-trivial, since the set of inflate/deflate requests have to
> > > be compared with the state of the page to know what the final
> > > outcome is allowed to be.
> > 
> > I didn't track the previous discussion, but there were plans to have
> > guest-initiated balloon requests for cases where the guest wants to
> > co-operate with hosts and return any free mem available We don't
> > currently have guests that do this, but we also don't want to have a
> > dependency between the host and guest -- they should be independent.
> > 
> > This approach here seems the simplest possible, short of maintaining
> > another bitmap for the duration of postcopy which indicates
> > guest-freed memory pages which postcopy should not populate, after
> > receiving them at the dest (this sounds better to me than queuing up
> > guest requests).
> > 
> > The downside here is that the guest offered some memory back, and we
> > don't use it.  The guest also doesn't use it -- so it's a double loss,
> > of sorts.
> > 
> > Thoughts?  I don't have a problem with this current approach, but if
> > we could get something better, that'll be good too.
> 
> It needs something like that bitmap, but it would take quite a bit
> of care to manage the interaction between:
> a) The guest emitting balloon notifications
> b) Pages being received from the source
> c) Destination use of that page
> 
>   we also have to think what to do with a page that's been ballooned
> after reception of the source page; the madvise(dontneed) that's used
> normally would cause userfault to fire again, and we can't allow that.
> (We could make it the same as receiving a zero page).   But then we would
> also have to cope with  the source sending us a page after the destination
> has ballooned it and make sure to discard that (I suspect there are further
> ordering examples that have to also be considered).

Yeah.  I'm fine with the current approach, with the downsides
mentioned.  Maybe in the commit message, make it explicit that the
guest may think it's given up ownership, but the host won't honour
this till postcopy isn't finished.

Anyway:

Reviewed-by: Amit Shah 


Amit

Re: [Qemu-devel] [PATCH v2 for-2.5] rcu: Allow calling rcu_(un)register_thread() during synchronize_rcu()

2015-07-28 Thread Paolo Bonzini

On 28/07/2015 12:02, Wen Congyang wrote:
> I have a question about rcu: while do we call wait_for_readers()
> twice for 32-bit host?

Because there is a very small but non-zero probability of the counter
going up by exactly 2^31 periods (periods are stored in bits 1-31 so you
lose one bit) while the thread is sleeping.  This detail of the
implementation comes from URCU.

Paolo

Re: [Qemu-devel] [PATCH] raw-posix.c: Make GetBSDPath() handle caching options

2015-07-28 Thread Stefan Hajnoczi

On Mon, Jul 27, 2015 at 12:28:03PM -0400, Programmingkid wrote:
> Add support for caching options that can be specified from
> the command line. 

Please squash this into the commit message when merging:

The CD-ROM raw char device bypasses the host page cache and therefore
has alignment requirements.  Alignment probing is necessary so only use
the raw char device if BDRV_O_NOCACHE is set.

This patch fixes -cdrom /dev/cdrom on Mac OS X hosts, where bdrv_read()
used to fail due to misaligned requests during image format probing.

Reviewed-by: Stefan Hajnoczi 

pgp4xLdVDM1y5.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH v5] raw-posix.c: Make physical devices usable in QEMU under Mac OS X host

2015-07-28 Thread Stefan Hajnoczi

On Mon, Jul 27, 2015 at 01:05:15PM -0400, Programmingkid wrote:
> Mac OS X can be picky when it comes to allowing the user to use physical 
> devices
> in QEMU. Most mounted volumes appear to be off limits to QEMU. If an issue is
> detected, a message is displayed showing the user how to unmount a volume.
> 
> Signed-off-by: John Arbuckle 
> 
> ---
> Removed changes to GetBSDPath() to a separate patch.
> This patch now depends on the GetBSDPath patch.
> 
>  block/raw-posix.c |   90 +---
>  1 files changed, 64 insertions(+), 26 deletions(-)

Kevin: Please review for QEMU 2.5


pgpIYFFljVxdL.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH for-2.4 0/2] AioContext: fix deadlock after aio_context_acquire() race

2015-07-28 Thread Cornelia Huck

On Tue, 28 Jul 2015 09:34:46 +0100
Stefan Hajnoczi  wrote:

> On Tue, Jul 28, 2015 at 10:02:26AM +0200, Cornelia Huck wrote:
> > On Tue, 28 Jul 2015 09:07:00 +0200
> > Cornelia Huck  wrote:
> > 
> > > On Mon, 27 Jul 2015 17:33:37 +0100
> > > Stefan Hajnoczi  wrote:
> > > 
> > > > See Patch 2 for details on the deadlock after two aio_context_acquire() 
> > > > calls
> > > > race.  This caused dataplane to hang on startup.
> > > > 
> > > > Patch 1 is a memory leak fix for AioContext that's needed by Patch 2.
> > > > 
> > > > Stefan Hajnoczi (2):
> > > >   AioContext: avoid leaking BHs on cleanup
> > > >   AioContext: force event loop iteration using BH
> > > > 
> > > >  async.c | 29 +++--
> > > >  include/block/aio.h |  3 +++
> > > >  2 files changed, 30 insertions(+), 2 deletions(-)
> > > > 
> > > 
> > > Just gave this a try: The stripped-down guest that hangs during startup
> > > on master is working fine with these patches applied, and my full setup
> > > works as well.
> > > 
> > > So,
> > > 
> > > Tested-by: Cornelia Huck 
> > 
> > Uh-oh, spoke too soon. It starts, but when I try a virsh managedsave, I
> > get
> > 
> > qemu-system-s390x: /data/git/yyy/qemu/async.c:242: aio_ctx_finalize: 
> > Assertion `ctx->first_bh->deleted' failed.
> 
> Please pretty-print ctx->first_bh in gdb.  In particular, which function
> is ctx->first_bh->cb pointing to?

(gdb) p/x *(QEMUBH *)ctx->first_bh
$2 = {ctx = 0x9aab3730, cb = 0x801b7c5c, opaque = 0x3ff9800dee0, next = 
0x3ff9800dfb0, scheduled = 0x0, idle = 0x0, deleted = 0x0}

cb is pointing at spawn_thread_bh_fn.

> 
> I tried reproducing with qemu-system-x86_64 and a RHEL 7 guest but
> couldn't trigger the assertion failure.

I use the old x-data-plane attribute; if I turn it off, I don't hit the
assertion.

> 
> This assertion means that there is an *existing* QEMUBH memory leak.  It
> is not introduced by this patch series.  If we run out of time for QEMU
> 2.4, it would be acceptable to replace the assertion with:
> 
>   /* TODO track down leaked BHs and turn this into an assertion */
>   if (ctx->first_bh->deleted) {
>   g_free(ctx->first_bh);
>   }

Re: [Qemu-devel] [PULL 0/2] Block layer patches for 2.4.0-rc3

2015-07-28 Thread Peter Maydell

On 27 July 2015 at 16:46, Kevin Wolf  wrote:
> The following changes since commit 122e7dab8ac549c8c5a9e1e13aa2464190e888de:
>
>   Merge remote-tracking branch 'remotes/stefanha/tags/net-pull-request' into 
> staging (2015-07-27 14:53:42 +0100)
>
> are available in the git repository at:
>
>
>   git://repo.or.cz/qemu/kevin.git tags/for-upstream
>
> for you to fetch changes up to 77c102c26ead946fe7589d4bddcdfa5cb431ebfe:
>
>   block: qemu-iotests - add check for multiplication overflow in vpc 
> (2015-07-27 17:19:07 +0200)
>
> 
> Block layer patches for 2.4.0-rc3
>

Applied, thanks.

-- PMM

Re: [Qemu-devel] [PATCH] hw/net: handle flow control in mcf_fec driver receiver

2015-07-28 Thread Stefan Hajnoczi

On Tue, Jul 28, 2015 at 11:02:54AM +1000, Greg Ungerer wrote:
> The network mcf_fec driver emulated receive side method is not dealing
> with network queue flow control properly.
> 
> Modify the receive side to check if we have enough space in the
> descriptors to store the current packet. If not we process none of it
> and return 0. When the guest frees up some buffers through its descriptors
> we signal the qemu net layer to send more packets.
> 
> Signed-off-by: Greg Ungerer 
> ---
>  hw/net/mcf_fec.c | 45 ++---
>  1 file changed, 34 insertions(+), 11 deletions(-)

Thanks, applied to my net tree:
https://github.com/stefanha/qemu/commits/net

Stefan


pgpJfCkcJddet.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH for-2.4 0/2] AioContext: fix deadlock after aio_context_acquire() race

2015-07-28 Thread Stefan Hajnoczi

On Tue, Jul 28, 2015 at 11:26 AM, Cornelia Huck
 wrote:
> On Tue, 28 Jul 2015 09:34:46 +0100
> Stefan Hajnoczi  wrote:
>
>> On Tue, Jul 28, 2015 at 10:02:26AM +0200, Cornelia Huck wrote:
>> > On Tue, 28 Jul 2015 09:07:00 +0200
>> > Cornelia Huck  wrote:
>> >
>> > > On Mon, 27 Jul 2015 17:33:37 +0100
>> > > Stefan Hajnoczi  wrote:
>> > >
>> > > > See Patch 2 for details on the deadlock after two 
>> > > > aio_context_acquire() calls
>> > > > race.  This caused dataplane to hang on startup.
>> > > >
>> > > > Patch 1 is a memory leak fix for AioContext that's needed by Patch 2.
>> > > >
>> > > > Stefan Hajnoczi (2):
>> > > >   AioContext: avoid leaking BHs on cleanup
>> > > >   AioContext: force event loop iteration using BH
>> > > >
>> > > >  async.c | 29 +++--
>> > > >  include/block/aio.h |  3 +++
>> > > >  2 files changed, 30 insertions(+), 2 deletions(-)
>> > > >
>> > >
>> > > Just gave this a try: The stripped-down guest that hangs during startup
>> > > on master is working fine with these patches applied, and my full setup
>> > > works as well.
>> > >
>> > > So,
>> > >
>> > > Tested-by: Cornelia Huck 
>> >
>> > Uh-oh, spoke too soon. It starts, but when I try a virsh managedsave, I
>> > get
>> >
>> > qemu-system-s390x: /data/git/yyy/qemu/async.c:242: aio_ctx_finalize: 
>> > Assertion `ctx->first_bh->deleted' failed.
>>
>> Please pretty-print ctx->first_bh in gdb.  In particular, which function
>> is ctx->first_bh->cb pointing to?
>
> (gdb) p/x *(QEMUBH *)ctx->first_bh
> $2 = {ctx = 0x9aab3730, cb = 0x801b7c5c, opaque = 0x3ff9800dee0, next =
> 0x3ff9800dfb0, scheduled = 0x0, idle = 0x0, deleted = 0x0}
>
> cb is pointing at spawn_thread_bh_fn.
>
>>
>> I tried reproducing with qemu-system-x86_64 and a RHEL 7 guest but
>> couldn't trigger the assertion failure.
>
> I use the old x-data-plane attribute; if I turn it off, I don't hit the
> assertion.

Thanks.  I understand how to reproduce it now: use -drive aio=threads
and do I/O during managedsave.

I suspect there are more cases of this.  We need to clean it up during QEMU 2.5.

For now let's continue leaking these BHs as we've always done.

Stefan

Re: [Qemu-devel] [PATCH v2 for-2.5] rcu: Allow calling rcu_(un)register_thread() during synchronize_rcu()

2015-07-28 Thread Wen Congyang

On 07/28/2015 06:18 PM, Paolo Bonzini wrote:
> 
> 
> On 28/07/2015 12:02, Wen Congyang wrote:
>> I have a question about rcu: while do we call wait_for_readers()
>> twice for 32-bit host?
> 
> Because there is a very small but non-zero probability of the counter
> going up by exactly 2^31 periods (periods are stored in bits 1-31 so you
> lose one bit) while the thread is sleeping.  This detail of the
> implementation comes from URCU.

Yes, so you use rcu_gp_ctr ^ RCU_GP_CTR to instead of rcu_gp_ctr + RCU_GP_CTR.
The initial value is 1, so rcu_gp_ctr is: 1, 3, 1, 3, ...
The rcu_gp_ctr will never be 0. I think calling wait_for_readers() once is
enough.

Do I miss something?

Thanks
Wen Congyang

> 
> Paolo
>

Re: [Qemu-devel] [PATCH for-2.4 0/2] AioContext: fix deadlock after aio_context_acquire() race

2015-07-28 Thread Stefan Hajnoczi

On Tue, Jul 28, 2015 at 11:31 AM, Stefan Hajnoczi  wrote:
> On Tue, Jul 28, 2015 at 11:26 AM, Cornelia Huck
>  wrote:
>> On Tue, 28 Jul 2015 09:34:46 +0100
>> Stefan Hajnoczi  wrote:
>>
>>> On Tue, Jul 28, 2015 at 10:02:26AM +0200, Cornelia Huck wrote:
>>> > On Tue, 28 Jul 2015 09:07:00 +0200
>>> > Cornelia Huck  wrote:
>>> >
>>> > > On Mon, 27 Jul 2015 17:33:37 +0100
>>> > > Stefan Hajnoczi  wrote:
>>> > >
>>> > > > See Patch 2 for details on the deadlock after two 
>>> > > > aio_context_acquire() calls
>>> > > > race.  This caused dataplane to hang on startup.
>>> > > >
>>> > > > Patch 1 is a memory leak fix for AioContext that's needed by Patch 2.
>>> > > >
>>> > > > Stefan Hajnoczi (2):
>>> > > >   AioContext: avoid leaking BHs on cleanup
>>> > > >   AioContext: force event loop iteration using BH
>>> > > >
>>> > > >  async.c | 29 +++--
>>> > > >  include/block/aio.h |  3 +++
>>> > > >  2 files changed, 30 insertions(+), 2 deletions(-)
>>> > > >
>>> > >
>>> > > Just gave this a try: The stripped-down guest that hangs during startup
>>> > > on master is working fine with these patches applied, and my full setup
>>> > > works as well.
>>> > >
>>> > > So,
>>> > >
>>> > > Tested-by: Cornelia Huck 
>>> >
>>> > Uh-oh, spoke too soon. It starts, but when I try a virsh managedsave, I
>>> > get
>>> >
>>> > qemu-system-s390x: /data/git/yyy/qemu/async.c:242: aio_ctx_finalize: 
>>> > Assertion `ctx->first_bh->deleted' failed.
>>>
>>> Please pretty-print ctx->first_bh in gdb.  In particular, which function
>>> is ctx->first_bh->cb pointing to?
>>
>> (gdb) p/x *(QEMUBH *)ctx->first_bh
>> $2 = {ctx = 0x9aab3730, cb = 0x801b7c5c, opaque = 0x3ff9800dee0, next =
>> 0x3ff9800dfb0, scheduled = 0x0, idle = 0x0, deleted = 0x0}
>>
>> cb is pointing at spawn_thread_bh_fn.
>>
>>>
>>> I tried reproducing with qemu-system-x86_64 and a RHEL 7 guest but
>>> couldn't trigger the assertion failure.
>>
>> I use the old x-data-plane attribute; if I turn it off, I don't hit the
>> assertion.
>
> Thanks.  I understand how to reproduce it now: use -drive aio=threads
> and do I/O during managedsave.
>
> I suspect there are more cases of this.  We need to clean it up during QEMU 
> 2.5.
>
> For now let's continue leaking these BHs as we've always done.

Actually, this case can be fixed in the patch by moving
thread_pool_free() before the BH cleanup loop.

But I still fear other parts of QEMU may be leaking BHs and we should
use a full release cycle to weed them out before introducing the
assertion.

Stefan

Re: [Qemu-devel] [PATCH for-2.4 v3] xen: Drop net_rx_ok

2015-07-28 Thread Stefan Hajnoczi

On Tue, Jul 28, 2015 at 05:52:56PM +0800, Fam Zheng wrote:
> Let net_rx_packet() (which checks the same conditions) drops the packet
> if the device is not ready. Drop net_xen_info.can_receive and update the
> return value for the buffer full case.
> 
> We rely on the qemu_flush_queued_packets() in net_event() to wake up
> the peer when the buffer becomes available again.
> 
> Signed-off-by: Fam Zheng 
> 
> ---
> 
> v3: Drop log. [Wen]
> 
> v2:
> Fix return value of net_rx_packet to keep the semantics when ring
> buffer is full. [Stefan]
> Update commit message accordingly.
> ---
>  hw/net/xen_nic.c | 25 +
>  1 file changed, 1 insertion(+), 24 deletions(-)

Thanks, applied to my net tree:
https://github.com/stefanha/qemu/commits/net

Stefan


pgpc21bzSUnwN.pgp
Description: PGP signature

Re: [Qemu-devel] [PULL 00/10] virtio fixes for 2.4

2015-07-28 Thread Cornelia Huck

On Tue, 28 Jul 2015 12:57:29 +0300
"Michael S. Tsirkin"  wrote:

> 
> virtio fixes for 2.4
> 
> Mostly virtio 1 spec compliance fixes.
> We are unlikely to make it perfectly compliant in
> the first release, but it seems worth it to try.
> 
> Signed-off-by: Michael S. Tsirkin 
> 
> 

FWIW, I think the feature bit tango in the various virtio-blk patches
is a bit confusing, but the end result looks sane. A bit wary of doing
these changes that late in the release cycle, but it survives some
light testing and seems fine for virtio-ccw revision 1 enablement as
well.

Re: [Qemu-devel] [PATCH RFC 2/6] posix: add linux-only memfd fallback

2015-07-28 Thread Marc-André Lureau

Hi

On Tue, Jul 28, 2015 at 10:11 AM, Paolo Bonzini  wrote:
>>
>> What's defining all these macros?
>
> They're in asm/unistd.h.
>
> I think that, instead of making qemu/osdep.h the new qemu-common.h, the
> wrappers added by patch 3 should be declared in a new header
> qemu/memfd.h.  The implementation in util/memfd.c can include both
> linux/fcntl.h and asm/unistd.h.
>

Ok, shouldn't it keep the inline function? this avoids future clash
when upgrading glibc.

-- 
Marc-André Lureau

[Qemu-devel] [PATCH v2] gdbstub: Implement Xfer:auxv:read

2015-07-28 Thread Bhushan Attarde

Implementation of "Xfer:auxv:read" to provide auxiliary vector information
to clients which relies on it.

For example: AT_ENTRY in auxiliary vector provides the entry point information.
Client can use this information to compare it with entry point mentioned in
executable to calculate load offset and then update the load addresses
accordingly.

Signed-off-by: Bhushan Attarde 
---

Notes:
Changes for v2:
- Xfer:auxv:read and Xfer:features:read now shares the code for parsing 
out arguments rather than duplicating it.

 gdbstub.c | 71 ---
 1 file changed, 54 insertions(+), 17 deletions(-)

diff --git a/gdbstub.c b/gdbstub.c
index 92b2f81..a6a79dc 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -1150,42 +1150,73 @@ static int gdb_handle_packet(GDBState *s, const char 
*line_buf)
 if (cc->gdb_core_xml_file != NULL) {
 pstrcat(buf, sizeof(buf), ";qXfer:features:read+");
 }
+#ifdef CONFIG_USER_ONLY
+pstrcat(buf, sizeof(buf), ";qXfer:auxv:read+");
+#endif
 put_packet(s, buf);
 break;
 }
-if (strncmp(p, "Xfer:features:read:", 19) == 0) {
+if (strncmp(p, "Xfer:", 5) == 0) {
+#ifdef CONFIG_USER_ONLY
+target_ulong auxv = 0;
+#endif
+target_ulong total_len = 0;
+char *ptr = NULL;
 const char *xml;
-target_ulong total_len;
 
-cc = CPU_GET_CLASS(first_cpu);
-if (cc->gdb_core_xml_file == NULL) {
+if (strncmp(p + 5, "features:read:", 14) == 0) {
+cc = CPU_GET_CLASS(first_cpu);
+if (cc->gdb_core_xml_file == NULL) {
+goto unknown_command;
+}
+gdb_has_xml = true;
+p += 19;
+xml = get_feature_xml(p, &p, cc);
+if (!xml) {
+snprintf(buf, sizeof(buf), "E00");
+put_packet(s, buf);
+break;
+}
+total_len = strlen(xml);
+}
+#ifdef CONFIG_USER_ONLY
+else if (strncmp(p + 5, "auxv:read:", 10) == 0) {
+TaskState *ts = s->c_cpu->opaque;
+auxv = ts->info->saved_auxv;
+total_len = ts->info->auxv_len;
+p += 15;
+ptr = lock_user(VERIFY_READ, auxv, total_len, 0);
+if (ptr == NULL) {
+break;
+}
+xml = (char *) ptr;
+}
+#endif
+else {
 goto unknown_command;
 }
 
-gdb_has_xml = true;
-p += 19;
-xml = get_feature_xml(p, &p, cc);
-if (!xml) {
-snprintf(buf, sizeof(buf), "E00");
-put_packet(s, buf);
-break;
+while (*p && *p != ':') {
+p++;
 }
+p++;
 
-if (*p == ':')
-p++;
 addr = strtoul(p, (char **)&p, 16);
-if (*p == ',')
+if (*p == ',') {
 p++;
+}
 len = strtoul(p, (char **)&p, 16);
 
-total_len = strlen(xml);
-if (addr > total_len) {
+if ((ptr != NULL && addr > len)
+|| (ptr == NULL && addr > total_len)) {
 snprintf(buf, sizeof(buf), "E00");
 put_packet(s, buf);
 break;
 }
-if (len > (MAX_PACKET_LENGTH - 5) / 2)
+
+if (len > (MAX_PACKET_LENGTH - 5) / 2) {
 len = (MAX_PACKET_LENGTH - 5) / 2;
+}
 if (len < total_len - addr) {
 buf[0] = 'm';
 len = memtox(buf + 1, xml + addr, len);
@@ -1194,6 +1225,12 @@ static int gdb_handle_packet(GDBState *s, const char 
*line_buf)
 len = memtox(buf + 1, xml + addr, total_len - addr);
 }
 put_packet_binary(s, buf, len + 1);
+
+#ifdef CONFIG_USER_ONLY
+if (ptr != NULL) {
+unlock_user(ptr, auxv, len);
+}
+#endif
 break;
 }
 if (is_query_packet(p, "Attached", ':')) {
-- 
2.4.4

Re: [Qemu-devel] [PATCH for-2.4 0/2] AioContext: fix deadlock after aio_context_acquire() race

2015-07-28 Thread Cornelia Huck

On Tue, 28 Jul 2015 11:34:18 +0100
Stefan Hajnoczi  wrote:

> On Tue, Jul 28, 2015 at 11:31 AM, Stefan Hajnoczi  wrote:
> > On Tue, Jul 28, 2015 at 11:26 AM, Cornelia Huck
> >  wrote:
> >> On Tue, 28 Jul 2015 09:34:46 +0100
> >> Stefan Hajnoczi  wrote:
> >>
> >>> On Tue, Jul 28, 2015 at 10:02:26AM +0200, Cornelia Huck wrote:
> >>> > On Tue, 28 Jul 2015 09:07:00 +0200
> >>> > Cornelia Huck  wrote:
> >>> >
> >>> > > On Mon, 27 Jul 2015 17:33:37 +0100
> >>> > > Stefan Hajnoczi  wrote:
> >>> > >
> >>> > > > See Patch 2 for details on the deadlock after two 
> >>> > > > aio_context_acquire() calls
> >>> > > > race.  This caused dataplane to hang on startup.
> >>> > > >
> >>> > > > Patch 1 is a memory leak fix for AioContext that's needed by Patch 
> >>> > > > 2.
> >>> > > >
> >>> > > > Stefan Hajnoczi (2):
> >>> > > >   AioContext: avoid leaking BHs on cleanup
> >>> > > >   AioContext: force event loop iteration using BH
> >>> > > >
> >>> > > >  async.c | 29 +++--
> >>> > > >  include/block/aio.h |  3 +++
> >>> > > >  2 files changed, 30 insertions(+), 2 deletions(-)
> >>> > > >
> >>> > >
> >>> > > Just gave this a try: The stripped-down guest that hangs during 
> >>> > > startup
> >>> > > on master is working fine with these patches applied, and my full 
> >>> > > setup
> >>> > > works as well.
> >>> > >
> >>> > > So,
> >>> > >
> >>> > > Tested-by: Cornelia Huck 
> >>> >
> >>> > Uh-oh, spoke too soon. It starts, but when I try a virsh managedsave, I
> >>> > get
> >>> >
> >>> > qemu-system-s390x: /data/git/yyy/qemu/async.c:242: aio_ctx_finalize: 
> >>> > Assertion `ctx->first_bh->deleted' failed.
> >>>
> >>> Please pretty-print ctx->first_bh in gdb.  In particular, which function
> >>> is ctx->first_bh->cb pointing to?
> >>
> >> (gdb) p/x *(QEMUBH *)ctx->first_bh
> >> $2 = {ctx = 0x9aab3730, cb = 0x801b7c5c, opaque = 0x3ff9800dee0, next =
> >> 0x3ff9800dfb0, scheduled = 0x0, idle = 0x0, deleted = 0x0}
> >>
> >> cb is pointing at spawn_thread_bh_fn.
> >>
> >>>
> >>> I tried reproducing with qemu-system-x86_64 and a RHEL 7 guest but
> >>> couldn't trigger the assertion failure.
> >>
> >> I use the old x-data-plane attribute; if I turn it off, I don't hit the
> >> assertion.
> >
> > Thanks.  I understand how to reproduce it now: use -drive aio=threads
> > and do I/O during managedsave.
> >
> > I suspect there are more cases of this.  We need to clean it up during QEMU 
> > 2.5.
> >
> > For now let's continue leaking these BHs as we've always done.
> 
> Actually, this case can be fixed in the patch by moving
> thread_pool_free() before the BH cleanup loop.

Tried that, may have done it wrong, because the assertion still hits.

> 
> But I still fear other parts of QEMU may be leaking BHs and we should
> use a full release cycle to weed them out before introducing the
> assertion.

Yes, that's probably the best thing to do that late in the cycle.

Re: [Qemu-devel] [PATCH v7 42/42] Inhibit ballooning during postcopy

2015-07-28 Thread Dr. David Alan Gilbert

* Amit Shah (amit.s...@redhat.com) wrote:
> On (Tue) 28 Jul 2015 [10:08:15], Dr. David Alan Gilbert wrote:
> > * Amit Shah (amit.s...@redhat.com) wrote:
> > > On (Tue) 16 Jun 2015 [11:26:55], Dr. David Alan Gilbert (git) wrote:
> > > > From: "Dr. David Alan Gilbert" 
> > > > 
> > > > Postcopy detects accesses to pages that haven't been transferred yet
> > > > using userfaultfd, and it causes exceptions on pages that are 'not
> > > > present'.
> > > > Ballooning also causes pages to be marked as 'not present' when the
> > > > guest inflates the balloon.
> > > > Potentially a balloon could be inflated to discard pages that are
> > > > currently inflight during postcopy and that may be arriving at about
> > > > the same time.
> > > > 
> > > > To avoid this confusion, disable ballooning during postcopy.
> > > > 
> > > > When disabled we drop balloon requests from the guest.  Since ballooning
> > > > is generally initiated by the host, the management system should avoid
> > > > initiating any balloon instructions to the guest during migration,
> > > > although it's not possible to know how long it would take a guest to
> > > > process a request made prior to the start of migration.
> > > > 
> > > > Queueing the requests until after migration would be nice, but is
> > > > non-trivial, since the set of inflate/deflate requests have to
> > > > be compared with the state of the page to know what the final
> > > > outcome is allowed to be.
> > > 
> > > I didn't track the previous discussion, but there were plans to have
> > > guest-initiated balloon requests for cases where the guest wants to
> > > co-operate with hosts and return any free mem available We don't
> > > currently have guests that do this, but we also don't want to have a
> > > dependency between the host and guest -- they should be independent.
> > > 
> > > This approach here seems the simplest possible, short of maintaining
> > > another bitmap for the duration of postcopy which indicates
> > > guest-freed memory pages which postcopy should not populate, after
> > > receiving them at the dest (this sounds better to me than queuing up
> > > guest requests).
> > > 
> > > The downside here is that the guest offered some memory back, and we
> > > don't use it.  The guest also doesn't use it -- so it's a double loss,
> > > of sorts.
> > > 
> > > Thoughts?  I don't have a problem with this current approach, but if
> > > we could get something better, that'll be good too.
> > 
> > It needs something like that bitmap, but it would take quite a bit
> > of care to manage the interaction between:
> > a) The guest emitting balloon notifications
> > b) Pages being received from the source
> > c) Destination use of that page
> > 
> >   we also have to think what to do with a page that's been ballooned
> > after reception of the source page; the madvise(dontneed) that's used
> > normally would cause userfault to fire again, and we can't allow that.
> > (We could make it the same as receiving a zero page).   But then we would
> > also have to cope with  the source sending us a page after the destination
> > has ballooned it and make sure to discard that (I suspect there are further
> > ordering examples that have to also be considered).
> 
> Yeah.  I'm fine with the current approach, with the downsides
> mentioned.  Maybe in the commit message, make it explicit that the
> guest may think it's given up ownership, but the host won't honour
> this till postcopy isn't finished.

OK, I've added the text:
'Guest initiated ballooning will not know if it's really freed a page
of host memory or not.'

> Anyway:
> 
> Reviewed-by: Amit Shah 

Thanks.

Dave

> 
> 
>   Amit
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH v2] gdbstub: Implement Xfer:auxv:read

2015-07-28 Thread Peter Maydell

On 28 July 2015 at 11:58, Bhushan Attarde  wrote:
> Implementation of "Xfer:auxv:read" to provide auxiliary vector information
> to clients which relies on it.
>
> For example: AT_ENTRY in auxiliary vector provides the entry point 
> information.
> Client can use this information to compare it with entry point mentioned in
> executable to calculate load offset and then update the load addresses
> accordingly.
>
> Signed-off-by: Bhushan Attarde 
> ---
>
> Notes:
> Changes for v2:
> - Xfer:auxv:read and Xfer:features:read now shares the code for 
> parsing out arguments rather than duplicating it.
>
>  gdbstub.c | 71 
> ---
>  1 file changed, 54 insertions(+), 17 deletions(-)
>
> diff --git a/gdbstub.c b/gdbstub.c
> index 92b2f81..a6a79dc 100644
> --- a/gdbstub.c
> +++ b/gdbstub.c
> @@ -1150,42 +1150,73 @@ static int gdb_handle_packet(GDBState *s, const char 
> *line_buf)
>  if (cc->gdb_core_xml_file != NULL) {
>  pstrcat(buf, sizeof(buf), ";qXfer:features:read+");
>  }
> +#ifdef CONFIG_USER_ONLY
> +pstrcat(buf, sizeof(buf), ";qXfer:auxv:read+");
> +#endif
>  put_packet(s, buf);
>  break;
>  }
> -if (strncmp(p, "Xfer:features:read:", 19) == 0) {
> +if (strncmp(p, "Xfer:", 5) == 0) {
> +#ifdef CONFIG_USER_ONLY
> +target_ulong auxv = 0;
> +#endif
> +target_ulong total_len = 0;
> +char *ptr = NULL;
>  const char *xml;
> -target_ulong total_len;
>
> -cc = CPU_GET_CLASS(first_cpu);
> -if (cc->gdb_core_xml_file == NULL) {
> +if (strncmp(p + 5, "features:read:", 14) == 0) {
> +cc = CPU_GET_CLASS(first_cpu);
> +if (cc->gdb_core_xml_file == NULL) {
> +goto unknown_command;
> +}
> +gdb_has_xml = true;
> +p += 19;
> +xml = get_feature_xml(p, &p, cc);
> +if (!xml) {
> +snprintf(buf, sizeof(buf), "E00");
> +put_packet(s, buf);
> +break;
> +}
> +total_len = strlen(xml);
> +}
> +#ifdef CONFIG_USER_ONLY
> +else if (strncmp(p + 5, "auxv:read:", 10) == 0) {
> +TaskState *ts = s->c_cpu->opaque;
> +auxv = ts->info->saved_auxv;
> +total_len = ts->info->auxv_len;
> +p += 15;
> +ptr = lock_user(VERIFY_READ, auxv, total_len, 0);
> +if (ptr == NULL) {
> +break;
> +}
> +xml = (char *) ptr;
> +}
> +#endif
> +else {
>  goto unknown_command;
>  }
>
> -gdb_has_xml = true;
> -p += 19;
> -xml = get_feature_xml(p, &p, cc);
> -if (!xml) {
> -snprintf(buf, sizeof(buf), "E00");
> -put_packet(s, buf);
> -break;
> +while (*p && *p != ':') {
> +p++;
>  }
> +p++;
>
> -if (*p == ':')
> -p++;
>  addr = strtoul(p, (char **)&p, 16);
> -if (*p == ',')
> +if (*p == ',') {
>  p++;
> +}
>  len = strtoul(p, (char **)&p, 16);
>
> -total_len = strlen(xml);
> -if (addr > total_len) {
> +if ((ptr != NULL && addr > len)
> +|| (ptr == NULL && addr > total_len)) {

This if is rather confusing. Why should ptr == or != NULL
make a difference? What is ptr == NULL actually encoding?

>  snprintf(buf, sizeof(buf), "E00");
>  put_packet(s, buf);
>  break;
>  }
> -if (len > (MAX_PACKET_LENGTH - 5) / 2)
> +
> +if (len > (MAX_PACKET_LENGTH - 5) / 2) {
>  len = (MAX_PACKET_LENGTH - 5) / 2;
> +}
>  if (len < total_len - addr) {
>  buf[0] = 'm';
>  len = memtox(buf + 1, xml + addr, len);

> @@ -1194,6 +1225,12 @@ static int gdb_handle_packet(GDBState *s, const char 
> *line_buf)
>  len = memtox(buf + 1, xml + addr, total_len - addr);
>  }
>  put_packet_binary(s, buf, len + 1);
> +
> +#ifdef CONFIG_USER_ONLY
> +if (ptr != NULL) {
> +unlock_user(ptr, auxv, len);
> +}
> +#endif

This hunk is pretty ugly and rather subtle too since "ptr" is a
very generic variable name. This is implicitly placing requirements
on the code further up to behave in a particular way (use ptr
and only ptr as locked-memory in user mode). Isn't there some
way to write this that abstracts stuff out into separate functions
or a 'register an Xfer read handler' pattern someho

Re: [Qemu-devel] [PATCH v2 6/7] error: Revamp interface documentation

2015-07-28 Thread Dr. David Alan Gilbert

* Markus Armbruster (arm...@redhat.com) wrote:
> Signed-off-by: Markus Armbruster 
> Reviewed-by: Eric Blake 
> ---
>  include/qapi/error.h | 177 
> ---
>  1 file changed, 127 insertions(+), 50 deletions(-)
> 
> diff --git a/include/qapi/error.h b/include/qapi/error.h
> index 8c3a7dd..7d808e8 100644
> --- a/include/qapi/error.h
> +++ b/include/qapi/error.h
> @@ -2,13 +2,75 @@
>   * QEMU Error Objects
>   *
>   * Copyright IBM, Corp. 2011
> + * Copyright (C) 2011-2015 Red Hat, Inc.
>   *
>   * Authors:
>   *  Anthony Liguori   
> + *  Markus Armbruster 
>   *
>   * This work is licensed under the terms of the GNU LGPL, version 2.  See
>   * the COPYING.LIB file in the top-level directory.
>   */
> +
> +/*
> + * Error reporting system loosely patterned after Glib's GError.

Excellent; it's great to have this documented.
Do we have a note anywhere that says why we don't just use GError?

Dave

> + *
> + * Create an error:
> + * error_setg(&err, "situation normal, all fouled up");
> + *
> + * Report an error to stderr:
> + * error_report_err(err);
> + * This frees the error object.
> + *
> + * Report an error somewhere else:
> + * const char *msg = error_get_pretty(err);
> + * do with msg what needs to be done...
> + * error_free(err);
> + *
> + * Handle an error without reporting it (just for completeness):
> + * error_free(err);
> + * 
> + * Pass an existing error to the caller:
> + * error_propagate(errp, err);
> + * where Error **errp is a parameter, by convention the last one.
> + *
> + * Create a new error and pass it to the caller:
> + * error_setg(errp, "situation normal, all fouled up");
> + *
> + * Call a function and receive an error from it:
> + * Error *err = NULL;
> + * foo(arg, &err);
> + * if (err) {
> + * handle the error...
> + * }
> + *
> + * Call a function ignoring errors:
> + * foo(arg, NULL);
> + *
> + * Call a function aborting on errors:
> + * foo(arg, &error_abort);
> + *
> + * Receive an error and pass it on to the caller:
> + * Error *err = NULL;
> + * foo(arg, &err);
> + * if (err) {
> + * handle the error...
> + * error_propagate(errp, err);
> + * }
> + * where Error **errp is a parameter, by convention the last one.
> + *
> + * Do *not* "optimize" this to
> + * foo(arg, errp);
> + * if (*errp) { // WRONG!
> + * handle the error...
> + * }
> + * because errp may be NULL!
> + *
> + * But when all you do with the error is pass it on, please use
> + * foo(arg, errp);
> + * for readability.
> + */
> +
>  #ifndef ERROR_H
>  #define ERROR_H
>  
> @@ -16,85 +78,100 @@
>  #include "qapi-types.h"
>  #include 
>  
> -/**
> - * A class representing internal errors within QEMU.  An error has a 
> ErrorClass
> - * code and a human message.
> +/*
> + * Opaque error object.
>   */
>  typedef struct Error Error;
>  
> -/**
> - * Set an indirect pointer to an error given a ErrorClass value and a
> - * printf-style human message.  This function is not meant to be used outside
> - * of QEMU.
> +/*
> + * Get @err's human-readable error message.
>   */
> -void error_set(Error **errp, ErrorClass err_class, const char *fmt, ...)
> -GCC_FMT_ATTR(3, 4);
> +const char *error_get_pretty(Error *err);
>  
> -/**
> - * Set an indirect pointer to an error given a ErrorClass value and a
> - * printf-style human message, followed by a strerror() string if
> - * @os_error is not zero.
> +/*
> + * Get @err's error class.
> + * Note: use of error classes other than ERROR_CLASS_GENERIC_ERROR is
> + * strongly discouraged.
> + */
> +ErrorClass error_get_class(const Error *err);
> +
> +/*
> + * Create a new error object and assign it to *@errp.
> + * If @errp is NULL, the error is ignored.  Don't bother creating one
> + * then.
> + * If @errp is &error_abort, print a suitable message and abort().
> + * If @errp is anything else, *@errp must be NULL.
> + * The new error's class is ERROR_CLASS_GENERIC_ERROR, and its
> + * human-readable error message is made from printf-style @fmt, ...
> + */
> +void error_setg(Error **errp, const char *fmt, ...)
> +GCC_FMT_ATTR(2, 3);
> +
> +/*
> + * Just like error_setg(), with @os_error info added to the message.
> + * If @os_error is non-zero, ": " + strerror(os_error) is appended to
> + * the human-readable error message.
>   */
>  void error_setg_errno(Error **errp, int os_error, const char *fmt, ...)
>  GCC_FMT_ATTR(3, 4);
>  
>  #ifdef _WIN32
> -/**
> - * Set an indirect pointer to an error given a ErrorClass value and a
> - * printf-style human message, followed by a g_win32_error_message() string 
> if
> - * @win32_err is not zero.
> +/*
> + * Just like error_setg(), with @win32_error info added to the message.
> + * If @win32_error is non-zero, ": " + g_win32_error_message(win32_err)
> + * is appended to the human-readable error message.
>   */
>  void error_setg_win32(Error **errp, int win32_err, co

Re: [Qemu-devel] [PATCH v7 41/42] Disable mlock around incoming postcopy

2015-07-28 Thread Juan Quintela

Amit Shah  wrote:
> On (Tue) 14 Jul 2015 [17:22:13], Juan Quintela wrote:
>> "Dr. David Alan Gilbert (git)"  wrote:
>
>> > +if (enable_mlock) {
>> > +if (os_mlock() < 0) {
>> > +error_report("mlock: %s", strerror(errno));
>> > +/*
>> > + * It doesn't feel right to fail at this point, we have a 
>> > valid
>> > + * VM state.
>> > + */
>> 
>> realtime_init() exit in case of os_mlock() fails, so current code is:
>
> Yea, I was wondering the same - but then I thought: would the realtime
> case want a migration to happen at all?

Then disable migration with realtime looks like saner.  But that
decission don't belong to this series.

>
>> - we start qemu with mlock requset
>> - we mlock memory
>> - we start postcopy
>> - we munlock memory
>> - we mlock memory
>> 
>> I wmill really, really preffer having a check if memory is mlocked, and
>> it that case, just abort migration altogether.  Or better still, wait to
>> enable mlock *until* we have finished postcopy, no?
>
>   Amit

Re: [Qemu-devel] [PATCH v2 for-2.5] rcu: Allow calling rcu_(un)register_thread() during synchronize_rcu()

2015-07-28 Thread Paolo Bonzini

On 28/07/2015 12:33, Wen Congyang wrote:
> On 07/28/2015 06:18 PM, Paolo Bonzini wrote:
>>
>>
>> On 28/07/2015 12:02, Wen Congyang wrote:
>>> I have a question about rcu: while do we call wait_for_readers()
>>> twice for 32-bit host?
>>
>> Because there is a very small but non-zero probability of the counter
>> going up by exactly 2^31 periods (periods are stored in bits 1-31 so you
>> lose one bit) while the thread is sleeping.  This detail of the
>> implementation comes from URCU.
> 
> Yes, so you use rcu_gp_ctr ^ RCU_GP_CTR to instead of rcu_gp_ctr + RCU_GP_CTR.
> The initial value is 1, so rcu_gp_ctr is: 1, 3, 1, 3, ...
> The rcu_gp_ctr will never be 0. I think calling wait_for_readers() once is
> enough.
> 
> Do I miss something?

If you call it just once, you have the same problem as before.  In fact,
it's worse because instead of having an overflow every 2^31 periods, you
have one every 2 periods.  Instead, by checking that rcu_reader went
through 1 _and_ 3 (or that it was at least once 0, i.e. the thread was
quiescent), you are sure that the thread went through _at least one_
grace period.

Paolo

Re: [Qemu-devel] [PATCH RFC 2/6] posix: add linux-only memfd fallback

2015-07-28 Thread Paolo Bonzini



On 28/07/2015 12:58, Marc-André Lureau wrote:
> Hi
> 
> On Tue, Jul 28, 2015 at 10:11 AM, Paolo Bonzini  wrote:
>>>
>>> What's defining all these macros?
>>
>> They're in asm/unistd.h.
>>
>> I think that, instead of making qemu/osdep.h the new qemu-common.h, the
>> wrappers added by patch 3 should be declared in a new header
>> qemu/memfd.h.  The implementation in util/memfd.c can include both
>> linux/fcntl.h and asm/unistd.h.
>>
> 
> Ok, shouldn't it keep the inline function? this avoids future clash
> when upgrading glibc.

Can the inline function stay in util/memfd.c?

Paolo

[Qemu-devel] [PATCH v2 0/2] AioContext: fix deadlock after aio_context_acquire() race

2015-07-28 Thread Stefan Hajnoczi

v2:
 * Free BHs after thread_pool_free(), which calls qemu_bh_delete() [Cornelia]
 * Remove assert for leaked BHs since we don't know how many existing cases
   there are yet and QEMU 2.4-rc3 is a poor time to risk assertion failures

See Patch 2 for details on the deadlock after two aio_context_acquire() calls
race.  This caused dataplane to hang on startup.

Patch 1 is a memory leak fix for AioContext that's needed by Patch 2.

Stefan Hajnoczi (2):
  AioContext: avoid leaking deleted BHs on cleanup
  AioContext: force event loop iteration using BH

 async.c | 29 +++--
 include/block/aio.h |  3 +++
 2 files changed, 30 insertions(+), 2 deletions(-)

-- 
2.4.3

[Qemu-devel] [PATCH v2 2/2] AioContext: force event loop iteration using BH

2015-07-28 Thread Stefan Hajnoczi

The notify_me optimization introduced in commit eabc97797310
("AioContext: fix broken ctx->dispatching optimization") skips
event_notifier_set() calls when the event loop thread is not blocked in
ppoll(2).

This optimization causes a deadlock if two aio_context_acquire() calls
race.  notify_me = 0 during the race so the winning thread can enter
ppoll(2) unaware that the other thread is waiting its turn to acquire
the AioContext.

This patch forces ppoll(2) to return by scheduling a BH instead of
calling aio_notify().

The following deadlock with virtio-blk dataplane is fixed:

  qemu ... -object iothread,id=iothread0 \
   -drive if=none,id=drive0,file=test.img,... \
   -device virtio-blk-pci,iothread=iothread0,drive=drive0

This command-line results in a hang early on without this patch.

Thanks to Paolo Bonzini  for investigating this bug
with me.

Cc: Christian Borntraeger 
Cc: Cornelia Huck 
Cc: Paolo Bonzini 
Signed-off-by: Stefan Hajnoczi 
---
 async.c | 16 ++--
 include/block/aio.h |  3 +++
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/async.c b/async.c
index 929d533..7dd6dd4 100644
--- a/async.c
+++ b/async.c
@@ -79,8 +79,10 @@ int aio_bh_poll(AioContext *ctx)
  * aio_notify again if necessary.
  */
 if (!bh->deleted && atomic_xchg(&bh->scheduled, 0)) {
-if (!bh->idle)
+/* Idle BHs and the notify BH don't count as progress */
+if (!bh->idle && bh != ctx->notify_dummy_bh) {
 ret = 1;
+}
 bh->idle = 0;
 bh->cb(bh->opaque);
 }
@@ -230,6 +232,7 @@ aio_ctx_finalize(GSource *source)
 {
 AioContext *ctx = (AioContext *) source;
 
+qemu_bh_delete(ctx->notify_dummy_bh);
 thread_pool_free(ctx->thread_pool);
 aio_set_event_notifier(ctx, &ctx->notifier, NULL);
 event_notifier_cleanup(&ctx->notifier);
@@ -298,8 +301,15 @@ static void aio_timerlist_notify(void *opaque)
 
 static void aio_rfifolock_cb(void *opaque)
 {
+AioContext *ctx = opaque;
+
 /* Kick owner thread in case they are blocked in aio_poll() */
-aio_notify(opaque);
+qemu_bh_schedule(ctx->notify_dummy_bh);
+}
+
+static void notify_dummy_bh(void *opaque)
+{
+/* Do nothing, we were invoked just to force the event loop to iterate */
 }
 
 static void event_notifier_dummy_cb(EventNotifier *e)
@@ -326,6 +336,8 @@ AioContext *aio_context_new(Error **errp)
 rfifolock_init(&ctx->lock, aio_rfifolock_cb, ctx);
 timerlistgroup_init(&ctx->tlg, aio_timerlist_notify, ctx);
 
+ctx->notify_dummy_bh = aio_bh_new(ctx, notify_dummy_bh, NULL);
+
 return ctx;
 }
 
diff --git a/include/block/aio.h b/include/block/aio.h
index 9dd32e0..400b1b0 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -114,6 +114,9 @@ struct AioContext {
 bool notified;
 EventNotifier notifier;
 
+/* Scheduling this BH forces the event loop it iterate */
+QEMUBH *notify_dummy_bh;
+
 /* Thread pool for performing work and receiving completion callbacks */
 struct ThreadPool *thread_pool;
 
-- 
2.4.3

[Qemu-devel] [PATCH v2 1/2] AioContext: avoid leaking deleted BHs on cleanup

2015-07-28 Thread Stefan Hajnoczi

BHs are freed during aio_bh_poll().  This leads to memory leaks if there
is no aio_bh_poll() between qemu_bh_delete() and aio_ctx_finalize().

Suggested-by: Paolo Bonzini 
Signed-off-by: Stefan Hajnoczi 
---
 async.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/async.c b/async.c
index 9a98a74..929d533 100644
--- a/async.c
+++ b/async.c
@@ -234,7 +234,20 @@ aio_ctx_finalize(GSource *source)
 aio_set_event_notifier(ctx, &ctx->notifier, NULL);
 event_notifier_cleanup(&ctx->notifier);
 rfifolock_destroy(&ctx->lock);
+
+qemu_mutex_lock(&ctx->bh_lock);
+while (ctx->first_bh) {
+QEMUBH *next = ctx->first_bh->next;
+
+/* TODO ignore leaks for now, change to an assertion in the future */
+if (ctx->first_bh->deleted) {
+g_free(ctx->first_bh);
+}
+ctx->first_bh = next;
+}
+qemu_mutex_unlock(&ctx->bh_lock);
 qemu_mutex_destroy(&ctx->bh_lock);
+
 timerlistgroup_deinit(&ctx->tlg);
 }
 
-- 
2.4.3

Re: [Qemu-devel] [PATCH for-2.4 0/2] AioContext: fix deadlock after aio_context_acquire() race

2015-07-28 Thread Paolo Bonzini

On 28/07/2015 12:58, Cornelia Huck wrote:
> > > Thanks.  I understand how to reproduce it now: use -drive aio=threads
> > > and do I/O during managedsave.
> > >
> > > I suspect there are more cases of this.  We need to clean it up during 
> > > QEMU 2.5.
> > >
> > > For now let's continue leaking these BHs as we've always done.
> > 
> > Actually, this case can be fixed in the patch by moving
> > thread_pool_free() before the BH cleanup loop.
>
> Tried that, may have done it wrong, because the assertion still hits.

If you're doing savevm with a dataplane disk as the destination, that 
cannot work; savevm doesn't attempt to acquire the AioContext so it is 
not thread safe.

An even simpler reproducer for this bug, however, is to hot-unplug a 
disk created with x-data-plane.  It also shows another bug, fixed by 
this patch:

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 3db139b..6106e46 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -223,8 +223,8 @@ void virtio_blk_data_plane_destroy(VirtIOBlockDataPlane *s)
 virtio_blk_data_plane_stop(s);
 blk_op_unblock_all(s->conf->conf.blk, s->blocker);
 error_free(s->blocker);
-object_unref(OBJECT(s->iothread));
 qemu_bh_delete(s->bh);
+object_unref(OBJECT(s->iothread));
 g_free(s);
 }

which I'll formally send shortly.

I would prefer to fix them all in 2.4 and risk regressions, because the
bugs are use-after-frees, i.e. pretty bad.

Paolo

Re: [Qemu-devel] [PULL for-2.4 00/10] Trivial patches for 2015-07-27

2015-07-28 Thread Peter Maydell

On 27 July 2015 at 20:54, Michael Tokarev  wrote:
> There are a few patches from the trivial queue which are, I think,
> suitable for 2.4.  These are fixing or adding docs/comments, are
> minor/trivial cleanups, or small bugfixes.
>
> Please consider applying for 2.4.
>
> The rest of the trivial tree will be pushed once 2.5 development
> window will be open, hopefully  I will be able to do that by when.
>
> /mjt
>
> The following changes since commit edec47cfef96209987cb7922286cb384916aae02:
>
>   main-loop: fix qemu_notify_event for aio_notify optimization (2015-07-27 
> 17:12:19 +0100)
>
> are available in the git repository at:
>
>   git://git.corpit.ru/qemu.git tags/pull-trivial-patches-2015-07-27
>
> for you to fetch changes up to 226d007dbd75ec8d0f12d0f9e1ce66caf55d49e4:
>
>   gdbstub: Set current CPU on interruptions (2015-07-27 22:46:16 +0300)
>
> 
> trivial patches for 2015-07-27

Applied, thanks.

-- PMM

[Qemu-devel] [PULL for-2.4 0/2] Net patches

2015-07-28 Thread Stefan Hajnoczi

The following changes since commit f8787f8723eaca1be99e3b1873e54de163fffa93:

  Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20150727' into 
staging (2015-07-27 19:37:09 +0100)

are available in the git repository at:

  git://github.com/stefanha/qemu.git tags/net-pull-request

for you to fetch changes up to 7bba83bf80eae9c9e323319ff40d0ca477b0a77a:

  xen: Drop net_rx_ok (2015-07-28 11:35:54 +0100)


Pull request

These two .can_receive() are now reviewed.  The net subsystem queue for 2.4 is 
now empty.



Fam Zheng (1):
  xen: Drop net_rx_ok

Greg Ungerer (1):
  hw/net: handle flow control in mcf_fec driver receiver

 hw/net/mcf_fec.c | 44 ++--
 hw/net/xen_nic.c | 25 +
 2 files changed, 35 insertions(+), 34 deletions(-)

-- 
2.4.3

[Qemu-devel] [PULL for-2.4 2/2] xen: Drop net_rx_ok

2015-07-28 Thread Stefan Hajnoczi

From: Fam Zheng 

Let net_rx_packet() (which checks the same conditions) drops the packet
if the device is not ready. Drop net_xen_info.can_receive and update the
return value for the buffer full case.

We rely on the qemu_flush_queued_packets() in net_event() to wake up
the peer when the buffer becomes available again.

Signed-off-by: Fam Zheng 
Message-id: 1438077176-378-1-git-send-email-f...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 hw/net/xen_nic.c | 25 +
 1 file changed, 1 insertion(+), 24 deletions(-)

diff --git a/hw/net/xen_nic.c b/hw/net/xen_nic.c
index 19ecfc4..d7cbfc1 100644
--- a/hw/net/xen_nic.c
+++ b/hw/net/xen_nic.c
@@ -234,27 +234,6 @@ static void net_rx_response(struct XenNetDev *netdev,
 
 #define NET_IP_ALIGN 2
 
-static int net_rx_ok(NetClientState *nc)
-{
-struct XenNetDev *netdev = qemu_get_nic_opaque(nc);
-RING_IDX rc, rp;
-
-if (netdev->xendev.be_state != XenbusStateConnected) {
-return 0;
-}
-
-rc = netdev->rx_ring.req_cons;
-rp = netdev->rx_ring.sring->req_prod;
-xen_rmb();
-
-if (rc == rp || RING_REQUEST_CONS_OVERFLOW(&netdev->rx_ring, rc)) {
-xen_be_printf(&netdev->xendev, 2, "%s: no rx buffers (%d/%d)\n",
-  __FUNCTION__, rc, rp);
-return 0;
-}
-return 1;
-}
-
 static ssize_t net_rx_packet(NetClientState *nc, const uint8_t *buf, size_t 
size)
 {
 struct XenNetDev *netdev = qemu_get_nic_opaque(nc);
@@ -271,8 +250,7 @@ static ssize_t net_rx_packet(NetClientState *nc, const 
uint8_t *buf, size_t size
 xen_rmb(); /* Ensure we see queued requests up to 'rp'. */
 
 if (rc == rp || RING_REQUEST_CONS_OVERFLOW(&netdev->rx_ring, rc)) {
-xen_be_printf(&netdev->xendev, 2, "no buffer, drop packet\n");
-return -1;
+return 0;
 }
 if (size > XC_PAGE_SIZE - NET_IP_ALIGN) {
 xen_be_printf(&netdev->xendev, 0, "packet too big (%lu > %ld)",
@@ -304,7 +282,6 @@ static ssize_t net_rx_packet(NetClientState *nc, const 
uint8_t *buf, size_t size
 static NetClientInfo net_xen_info = {
 .type = NET_CLIENT_OPTIONS_KIND_NIC,
 .size = sizeof(NICState),
-.can_receive = net_rx_ok,
 .receive = net_rx_packet,
 };
 
-- 
2.4.3

[Qemu-devel] [PULL for-2.4 1/2] hw/net: handle flow control in mcf_fec driver receiver

2015-07-28 Thread Stefan Hajnoczi

From: Greg Ungerer 

The network mcf_fec driver emulated receive side method is not dealing
with network queue flow control properly.

Modify the receive side to check if we have enough space in the
descriptors to store the current packet. If not we process none of it
and return 0. When the guest frees up some buffers through its descriptors
we signal the qemu net layer to send more packets.

[Fixed coding style: 4-space indent and curly braces on if statement.
--Stefan]

Signed-off-by: Greg Ungerer 
Message-id: 1438045374-10358-1-git-send-email-g...@uclinux.org
Signed-off-by: Stefan Hajnoczi 
---
 hw/net/mcf_fec.c | 44 ++--
 1 file changed, 34 insertions(+), 10 deletions(-)

diff --git a/hw/net/mcf_fec.c b/hw/net/mcf_fec.c
index 4e6939f..21928f9 100644
--- a/hw/net/mcf_fec.c
+++ b/hw/net/mcf_fec.c
@@ -196,12 +196,14 @@ static void mcf_fec_do_tx(mcf_fec_state *s)
 
 static void mcf_fec_enable_rx(mcf_fec_state *s)
 {
+NetClientState *nc = qemu_get_queue(s->nic);
 mcf_fec_bd bd;
 
 mcf_fec_read_bd(&bd, s->rx_descriptor);
 s->rx_enabled = ((bd.flags & FEC_BD_E) != 0);
-if (!s->rx_enabled)
-DPRINTF("RX buffer full\n");
+if (s->rx_enabled) {
+qemu_flush_queued_packets(nc);
+}
 }
 
 static void mcf_fec_reset(mcf_fec_state *s)
@@ -397,6 +399,32 @@ static void mcf_fec_write(void *opaque, hwaddr addr,
 mcf_fec_update(s);
 }
 
+static int mcf_fec_have_receive_space(mcf_fec_state *s, size_t want)
+{
+mcf_fec_bd bd;
+uint32_t addr;
+
+/* Walk descriptor list to determine if we have enough buffer */
+addr = s->rx_descriptor;
+while (want > 0) {
+mcf_fec_read_bd(&bd, addr);
+if ((bd.flags & FEC_BD_E) == 0) {
+return 0;
+}
+if (want < s->emrbr) {
+return 1;
+}
+want -= s->emrbr;
+/* Advance to the next descriptor.  */
+if ((bd.flags & FEC_BD_W) != 0) {
+addr = s->erdsr;
+} else {
+addr += 8;
+}
+}
+return 0;
+}
+
 static ssize_t mcf_fec_receive(NetClientState *nc, const uint8_t *buf, size_t 
size)
 {
 mcf_fec_state *s = qemu_get_nic_opaque(nc);
@@ -426,18 +454,14 @@ static ssize_t mcf_fec_receive(NetClientState *nc, const 
uint8_t *buf, size_t si
 if (size > (s->rcr >> 16)) {
 flags |= FEC_BD_LG;
 }
+/* Check if we have enough space in current descriptors */
+if (!mcf_fec_have_receive_space(s, size)) {
+return 0;
+}
 addr = s->rx_descriptor;
 retsize = size;
 while (size > 0) {
 mcf_fec_read_bd(&bd, addr);
-if ((bd.flags & FEC_BD_E) == 0) {
-/* No descriptors available.  Bail out.  */
-/* FIXME: This is wrong.  We should probably either save the
-   remainder for when more RX buffers are available, or
-   flag an error.  */
-fprintf(stderr, "mcf_fec: Lost end of frame\n");
-break;
-}
 buf_len = (size <= s->emrbr) ? size: s->emrbr;
 bd.length = buf_len;
 size -= buf_len;
-- 
2.4.3

Re: [Qemu-devel] [PATCH for-2.4 v3] xen: Drop net_rx_ok

2015-07-28 Thread Stefan Hajnoczi

On Tue, Jul 28, 2015 at 05:52:56PM +0800, Fam Zheng wrote:
> Let net_rx_packet() (which checks the same conditions) drops the packet
> if the device is not ready. Drop net_xen_info.can_receive and update the
> return value for the buffer full case.
> 
> We rely on the qemu_flush_queued_packets() in net_event() to wake up
> the peer when the buffer becomes available again.
> 
> Signed-off-by: Fam Zheng 
> 
> ---
> 
> v3: Drop log. [Wen]
> 
> v2:
> Fix return value of net_rx_packet to keep the semantics when ring
> buffer is full. [Stefan]
> Update commit message accordingly.
> ---
>  hw/net/xen_nic.c | 25 +
>  1 file changed, 1 insertion(+), 24 deletions(-)

Thanks, applied to my net tree:
https://github.com/stefanha/qemu/commits/net

Stefan


pgpTcaGYQDXq6.pgp
Description: PGP signature

[Qemu-devel] [PATCH for-2.4] block: delete bottom halves before the AioContext is freed

2015-07-28 Thread Paolo Bonzini

Other uses of aio_bh_new are safe as long as all scheduled bottom
halves are run before an iothread is destroy, which bdrv_drain will
ensure:

- ctx->notify_dummy_bh: cleared by aio_ctx_finalize

- archipelago_finish_aiocb: BH deletes itself

- inject_error: BH deletes itself

- blkverify_aio_bh: BH deletes itself

- abort_aio_request: BH deletes itself

- curl_aio_readv: BH deletes itself

- gluster_finish_aiocb: BH deletes itself

- bdrv_aio_rw_vector: BH deletes itself

- bdrv_co_maybe_schedule_bh: BH deletes itself

- iscsi_schedule_bh, iscsi_co_generic_cb: BH deletes itself

- laio_attach_aio_context: deleted in laio_detach_aio_context,
called through bdrv_detach_aio_context before deleting the iothread

- nfs_co_generic_cb: BH deletes itself

- null_aio_common: BH deletes itself

- qed_aio_complete: BH deletes itself

- rbd_finish_aiocb: BH deletes itself

- dma_blk_cb: BH deletes itself

- virtio_blk_dma_restart_cb: BH deletes itself

- qemu_bh_new: main loop AioContext is never destroyed

- test-aio.c: bh_delete_cb deletes itself, otherwise deleted in
the same function that calls aio_bh_new

Reported-by: Cornelia Huck 
Signed-off-by: Paolo Bonzini 
---
 async.c | 2 +-
 hw/block/dataplane/virtio-blk.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/async.c b/async.c
index 9ca7095..efce14b 100644
--- a/async.c
+++ b/async.c
@@ -233,6 +233,7 @@ aio_ctx_finalize(GSource *source)
 AioContext *ctx = (AioContext *) source;
 
 qemu_bh_delete(ctx->notify_dummy_bh);
+thread_pool_free(ctx->thread_pool);
 
 qemu_mutex_lock(&ctx->bh_lock);
 while (ctx->first_bh) {
@@ -246,7 +247,6 @@ aio_ctx_finalize(GSource *source)
 }
 qemu_mutex_unlock(&ctx->bh_lock);
 
-thread_pool_free(ctx->thread_pool);
 aio_set_event_notifier(ctx, &ctx->notifier, NULL);
 event_notifier_cleanup(&ctx->notifier);
 rfifolock_destroy(&ctx->lock);
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 3db139b..6106e46 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -223,8 +223,8 @@ void virtio_blk_data_plane_destroy(VirtIOBlockDataPlane *s)
 virtio_blk_data_plane_stop(s);
 blk_op_unblock_all(s->conf->conf.blk, s->blocker);
 error_free(s->blocker);
-object_unref(OBJECT(s->iothread));
 qemu_bh_delete(s->bh);
+object_unref(OBJECT(s->iothread));
 g_free(s);
 }
 
-- 
2.4.3

Re: [Qemu-devel] [PATCH v3] opts: produce valid command line in qemu_opts_print

2015-07-28 Thread Stefan Hajnoczi

On Tue, Jul 7, 2015 at 3:42 PM, Kővágó, Zoltán  wrote:
> This will let us print options in a format that the user would actually
> write it on the command line (foo=bar,baz=asd,etc=def), without
> prepending a spurious comma at the beginning of the list, or quoting
> values unnecessarily.  This patch provides the following changes:
> * write and id=, if the option has an id
> * do not print separator before the first element
> * do not quote string arguments
> * properly escape commas (,) for QEMU
>
> Signed-off-by: Kővágó, Zoltán 
> Reviewed-by: Markus Armbruster 

Feel free to send a "Ping" email reply if your patches haven't been
reviewed after about 1 week.

Reviewed-by: Stefan Hajnoczi

Re: [Qemu-devel] [PATCH] qemu-nbd: remove unnecessary qemu_notify_event()

2015-07-28 Thread Paolo Bonzini



On 28/07/2015 06:09, Fam Zheng wrote:
> On Mon, 07/27 13:54, Paolo Bonzini wrote:
>> This was needed when qemu-nbd was using qemu_set_fd_handler2.  It is
>> not needed anymore now that nbd_update_server_fd_handler is called
>> whenever nbd_can_accept() can change from false to true.
>> nbd_update_server_fd_handler will call qemu_set_fd_handler(),
>> which will call qemu_notify_event().
>>
>> Signed-off-by: Paolo Bonzini 
>> ---
>>  qemu-nbd.c | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/qemu-nbd.c b/qemu-nbd.c
>> index 5106b80..d9644b2 100644
>> --- a/qemu-nbd.c
>> +++ b/qemu-nbd.c
>> @@ -362,7 +362,6 @@ static void nbd_client_closed(NBDClient *client)
>>  state = TERMINATE;
>>  }
>>  nbd_update_server_fd_handler(server_fd);
>> -qemu_notify_event();
>>  nbd_client_put(client);
>>  }
>>  
>> -- 
>> 2.4.3
>>
>>
> 
> Reviewed-by: Fam Zheng 
> 
> A question not related to this patch: unlike aio_set_fd_handler,
> qemu_set_fd_handler doesn't call qemu_notify_event for the "delete" branch. Is
> that intended?

Yes.  In fact I think qemu_notify_event() is not needed at all in
qemu_set_fd_handler() because there are no concurrent ppoll() calls
during a qemu_set_fd_handler().

Paolo

Re: [Qemu-devel] [PATCH for-2.4] block: delete bottom halves before the AioContext is freed

2015-07-28 Thread Cornelia Huck

On Tue, 28 Jul 2015 14:30:28 +0200
Paolo Bonzini  wrote:

> diff --git a/async.c b/async.c
> index 9ca7095..efce14b 100644
> --- a/async.c
> +++ b/async.c
> @@ -233,6 +233,7 @@ aio_ctx_finalize(GSource *source)
>  AioContext *ctx = (AioContext *) source;
> 
>  qemu_bh_delete(ctx->notify_dummy_bh);
> +thread_pool_free(ctx->thread_pool);
> 
>  qemu_mutex_lock(&ctx->bh_lock);
>  while (ctx->first_bh) {
> @@ -246,7 +247,6 @@ aio_ctx_finalize(GSource *source)
>  }
>  qemu_mutex_unlock(&ctx->bh_lock);
> 
> -thread_pool_free(ctx->thread_pool);
>  aio_set_event_notifier(ctx, &ctx->notifier, NULL);
>  event_notifier_cleanup(&ctx->notifier);
>  rfifolock_destroy(&ctx->lock);
> diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
> index 3db139b..6106e46 100644
> --- a/hw/block/dataplane/virtio-blk.c
> +++ b/hw/block/dataplane/virtio-blk.c
> @@ -223,8 +223,8 @@ void virtio_blk_data_plane_destroy(VirtIOBlockDataPlane 
> *s)
>  virtio_blk_data_plane_stop(s);
>  blk_op_unblock_all(s->conf->conf.blk, s->blocker);
>  error_free(s->blocker);
> -object_unref(OBJECT(s->iothread));
>  qemu_bh_delete(s->bh);
> +object_unref(OBJECT(s->iothread));
>  g_free(s);
>  }
> 

With this applied on top of Stefan's AioContext patches, both
managedsave and device_del with dataplane devices work for me.

Tested-by: Cornelia Huck

Re: [Qemu-devel] [PULL for-2.4 0/2] block patches for 2.4-rc3

2015-07-28 Thread Peter Maydell

On 28 July 2015 at 05:23, Jeff Cody  wrote:
> The following changes since commit f8787f8723eaca1be99e3b1873e54de163fffa93:
>
>   Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20150727' into 
> staging (2015-07-27 19:37:09 +0100)
>
> are available in the git repository at:
>
>
>   g...@github.com:codyprime/qemu-kvm-jtc.git 
> tags/jtc-for-upstream-pull-request
>
> for you to fetch changes up to 325e3904210c779a13fbbc9ee156056d045d7eee:
>
>   block/ssh: Avoid segfault if inet_connect doesn't set errno. (2015-07-28 
> 00:19:05 -0400)
>
> 
> Block patches for 2.4-rc3
> 


Applied, thanks.

-- PMM

[Qemu-devel] Live migration hangs after migration to remote host

2015-07-28 Thread Eduardo Otubo

Hello all,

I'm facing a weird behavior on my tests: I am able to live migrate
between two virtual machines on my localhost, but not to another
machine, both using tcp.

* I am using the same arguments on the command line;
* Both virtual machines uses the same qcow2 file visible through NFS;
* Both machines are in the same subnet;
* Migration is being done from intel to intel;
* Same version of Qemu (github master - f8787f8723);

Using all above I am able to live migrate on the same host: between two
vms on local host or between two vms in the remote host; but when
migrating from local to remote, the guest hangs. I still can access its
console via ctrl+alt+2, though, and everything seems to be normal. If I
issue a reboote via console on the remote, the guest gets back to
normal.

Am I missing something here?

Regards,

-- 
Eduardo Otubo
ProfitBricks GmbH


signature.asc
Description: Digital signature

[Qemu-devel] [PATCH 5/5] hw/intc/arm_gic: Actually set the active bits for active interrupts

2015-07-28 Thread Peter Maydell

Although we were correctly handling interrupts becoming active
and then inactive, we weren't actually exposing this to the guest
by setting the 'active' flag for the interrupt, so reads
of GICD_ICACTIVERn and GICD_ISACTIVERn would generally incorrectly
return zeroes. Correct this oversight.

Signed-off-by: Peter Maydell 
---
 hw/intc/arm_gic.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/intc/arm_gic.c b/hw/intc/arm_gic.c
index 427c221..dc5a44b 100644
--- a/hw/intc/arm_gic.c
+++ b/hw/intc/arm_gic.c
@@ -262,6 +262,7 @@ static void gic_activate_irq(GICState *s, int cpu, int irq)
 }
 
 s->running_priority[cpu] = prio;
+GIC_SET_ACTIVE(irq, 1 << cpu);
 }
 
 static int gic_get_prio_from_apr_bits(GICState *s, int cpu)
@@ -536,6 +537,7 @@ void gic_complete_irq(GICState *s, int cpu, int irq, 
MemTxAttrs attrs)
  */
 
 gic_drop_prio(s, cpu, group);
+GIC_CLEAR_ACTIVE(irq, cm);
 gic_update(s);
 }
 
-- 
1.9.1

[Qemu-devel] [PATCH 1/5] armv7m_nvic: Implement ICSR without using internal GIC state

2015-07-28 Thread Peter Maydell

Change the implementation of the Interrupt Control and State Register
in the v7M NVIC to not use the running_irq and last_active internal
state fields in the GIC. These fields don't correspond to state in
a real GIC and will be removed soon.
The changes to the ICSR are:
 * the VECTACTIVE field is documented as identical to the IPSR[8:0]
   field, so implement it that way
 * implement RETTOBASE via looking at the active state bits

Signed-off-by: Peter Maydell 
---
 hw/intc/armv7m_nvic.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/hw/intc/armv7m_nvic.c b/hw/intc/armv7m_nvic.c
index e13b729..3ec8408 100644
--- a/hw/intc/armv7m_nvic.c
+++ b/hw/intc/armv7m_nvic.c
@@ -185,26 +185,25 @@ static uint32_t nvic_readl(nvic_state *s, uint32_t offset)
 return cpu->midr;
 case 0xd04: /* Interrupt Control State.  */
 /* VECTACTIVE */
-val = s->gic.running_irq[0];
+cpu = ARM_CPU(current_cpu);
+val = cpu->env.v7m.exception;
 if (val == 1023) {
 val = 0;
 } else if (val >= 32) {
 val -= 16;
 }
-/* RETTOBASE */
-if (s->gic.running_irq[0] == 1023
-|| s->gic.last_active[s->gic.running_irq[0]][0] == 1023) {
-val |= (1 << 11);
-}
 /* VECTPENDING */
 if (s->gic.current_pending[0] != 1023)
 val |= (s->gic.current_pending[0] << 12);
-/* ISRPENDING */
+/* ISRPENDING and RETTOBASE */
 for (irq = 32; irq < s->num_irq; irq++) {
 if (s->gic.irq_state[irq].pending) {
 val |= (1 << 22);
 break;
 }
+if (irq != cpu->env.v7m.exception && s->gic.irq_state[irq].active) 
{
+val |= (1 << 11);
+}
 }
 /* PENDSTSET */
 if (s->gic.irq_state[ARMV7M_EXCP_SYSTICK].pending)
-- 
1.9.1

[Qemu-devel] [PATCH 2/5] hw/intc/arm_gic: Running priority is group priority, not full priority

2015-07-28 Thread Peter Maydell

Priority values for the GIC are divided into a "group priority"
and a "subpriority" (with the division being determined by the
binary point register). The running priority is only determined
by the group priority of the active interrupts, not the
subpriority. In particular, this means that there can't be more
than one active interrupt at any particular group priority.

Signed-off-by: Peter Maydell 
---
 hw/intc/arm_gic.c | 28 +++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/hw/intc/arm_gic.c b/hw/intc/arm_gic.c
index 454bfd7..9814bb9 100644
--- a/hw/intc/arm_gic.c
+++ b/hw/intc/arm_gic.c
@@ -219,13 +219,39 @@ static uint16_t gic_get_current_pending_irq(GICState *s, 
int cpu,
 return pending_irq;
 }
 
+static int gic_get_group_priority(GICState *s, int cpu, int irq)
+{
+/* Return the group priority of the specified interrupt
+ * (which is the top bits of its priority, with the number
+ * of bits masked determined by the applicable binary point register).
+ */
+int bpr;
+uint32_t mask;
+
+if (gic_has_groups(s) &&
+!(s->cpu_ctlr[cpu] & GICC_CTLR_CBPR) &&
+GIC_TEST_GROUP(irq, (1 << cpu))) {
+bpr = s->abpr[cpu];
+} else {
+bpr = s->bpr[cpu];
+}
+
+/* a BPR of 0 means the group priority bits are [7:1];
+ * a BPR of 1 means they are [7:2], and so on down to
+ * a BPR of 7 meaning no group priority bits at all.
+ */
+mask = ~0U << ((bpr & 7) + 1);
+
+return GIC_GET_PRIORITY(irq, cpu) & mask;
+}
+
 static void gic_set_running_irq(GICState *s, int cpu, int irq)
 {
 s->running_irq[cpu] = irq;
 if (irq == 1023) {
 s->running_priority[cpu] = 0x100;
 } else {
-s->running_priority[cpu] = GIC_GET_PRIORITY(irq, cpu);
+s->running_priority[cpu] = gic_get_group_priority(s, cpu, irq);
 }
 gic_update(s);
 }
-- 
1.9.1

[Qemu-devel] [PATCH 0/5] arm_gic: Drop running_irq and last_active arrays

2015-07-28 Thread Peter Maydell

This patchset is a bit of cleanup to our GIC implementation that
I've wanted to do for ages.

Our current GIC code uses a couple of arrays (running_irq and
last_active) to track currently active interrupts so that
it can correctly determine the running priority as potentially
nested interrupts are taken and then dismissed. This does
work, but:
 * the effectively-a-linked-list is not very hardware-ish,
   which is usually a bit of a red flag when doing modelling
 * the GICv2 spec adds the Active Priority Registers which are
   for use for saving and restoring this sort of state, and
   implementing these properly effectively constrains you to
   an implementation that doesn't look like what we have now
 * it doesn't really fit with the GIC grouping and security
   extensions, where a guest can say "dismiss the last group 1
   interrupt" rather than just "dismiss the last interrupt".

This series gets rid of those arrays and instead uses the
Active Priority Registers to do the job. The APRs have one
bit per "preemption level" (ie per distinct group priority).
When we take an interrupt we set the appropriate bit to 1
(and this will always be the lowest set bit in the register,
because low-numbered priorities are higher and we wouldn't
have taken the interrupt unless it was higher than our current
priority). Similarly, when we dismiss an interrupt this just
clears the lowest set bit in the register, which must be the
current active interrupt. (It's important not to try to look
at the current configured priority of the interrupt number,
because the guest might have reconfigured it while it was
active.) The new running priority is then calculable by
looking at the new lowest set bit.

The new code also takes a step in the direction of
separating the idea of "priority drop" from "deactivate interrupt",
which we will need to implement the GICv2 feature which allows
guests to do these two things as separate operations. There's
more work to do in this area though.

Patch series structure:
 * patch 1 disentangles the v7M NVIC from some of the internal
   state we're about to rewrite
 * patch 2 fixes a bug that would have meant we could have multiple
   active interrupts at the same group priority, which (a) isn't
   permitted and (b) would break the redesign we're about to do
 * patch 3 is fixing the guest accessors for the APR registers
 * patch 4 is the meat of the change
 * patch 5 is a bonus bugfix


Peter Maydell (5):
  armv7m_nvic: Implement ICSR without using internal GIC state
  hw/intc/arm_gic: Running priority is group priority, not full priority
  hw/intc/arm_gic: Fix handling of GICC_APR, GICC_NSAPR registers
  hw/intc/arm_gic: Drop running_irq and last_active arrays
  hw/intc/arm_gic: Actually set the active bits for active interrupts

 hw/intc/arm_gic.c| 245 ++-
 hw/intc/arm_gic_common.c |   8 +-
 hw/intc/armv7m_nvic.c|  13 +--
 include/hw/intc/arm_gic_common.h |  11 +-
 4 files changed, 224 insertions(+), 53 deletions(-)

-- 
1.9.1

[Qemu-devel] [PATCH 4/5] hw/intc/arm_gic: Drop running_irq and last_active arrays

2015-07-28 Thread Peter Maydell

The running_irq and last_active arrays represent state which
doesn't exist in a real hardware GIC. The only thing we use
them for is updating the running priority when an interrupt
is completed, but in fact we can use the active-priority
registers to do this. The running priority is always the
priority corresponding to the lowest set bit in the active
priority registers, because only one interrupt at any
particular priority can be active at once.

Signed-off-by: Peter Maydell 
---
 hw/intc/arm_gic.c| 103 ---
 hw/intc/arm_gic_common.c |   7 +--
 include/hw/intc/arm_gic_common.h |  10 
 3 files changed, 76 insertions(+), 44 deletions(-)

diff --git a/hw/intc/arm_gic.c b/hw/intc/arm_gic.c
index ed7faf5..427c221 100644
--- a/hw/intc/arm_gic.c
+++ b/hw/intc/arm_gic.c
@@ -245,15 +245,72 @@ static int gic_get_group_priority(GICState *s, int cpu, 
int irq)
 return GIC_GET_PRIORITY(irq, cpu) & mask;
 }
 
-static void gic_set_running_irq(GICState *s, int cpu, int irq)
+static void gic_activate_irq(GICState *s, int cpu, int irq)
 {
-s->running_irq[cpu] = irq;
-if (irq == 1023) {
-s->running_priority[cpu] = 0x100;
+/* Set the appropriate Active Priority Register bit for this IRQ,
+ * and update the running priority.
+ */
+int prio = gic_get_group_priority(s, cpu, irq);
+int preemption_level = prio >> (GIC_MIN_BPR + 1);
+int regno = preemption_level / 32;
+int bitno = preemption_level % 32;
+
+if (gic_has_groups(s) && GIC_TEST_GROUP(irq, (1 << cpu))) {
+s->nsapr[regno][cpu] &= (1 << bitno);
 } else {
-s->running_priority[cpu] = gic_get_group_priority(s, cpu, irq);
+s->apr[regno][cpu] &= (1 << bitno);
 }
-gic_update(s);
+
+s->running_priority[cpu] = prio;
+}
+
+static int gic_get_prio_from_apr_bits(GICState *s, int cpu)
+{
+/* Recalculate the current running priority for this CPU based
+ * on the set bits in the Active Priority Registers.
+ */
+int i;
+for (i = 0; i < GIC_NR_APRS; i++) {
+uint32_t apr = s->apr[i][cpu] | s->nsapr[i][cpu];
+if (!apr) {
+continue;
+}
+return (i * 32 + ctz32(apr)) << (GIC_MIN_BPR + 1);
+}
+return 0x100;
+}
+
+static void gic_drop_prio(GICState *s, int cpu, int group)
+{
+/* Drop the priority of the currently active interrupt in the
+ * specified group.
+ *
+ * Note that we can guarantee (because of the requirement to nest
+ * GICC_IAR reads [which activate an interrupt and raise priority]
+ * with GICC_EOIR writes [which drop the priority for the interrupt])
+ * that the interrupt we're being called for is the highest priority
+ * active interrupt, meaning that it has the lowest set bit in the
+ * APR registers.
+ *
+ * If the guest does not honour the ordering constraints then the
+ * behaviour of the GIC is UNPREDICTABLE, which for us means that
+ * the values of the APR registers might become incorrect and the
+ * running priority will be wrong, so interrupts that should preempt
+ * might not do so, and interrupts that should not preempt might do so.
+ */
+int i;
+
+for (i = 0; i < GIC_NR_APRS; i++) {
+uint32_t *papr = group ? &s->nsapr[i][cpu] : &s->apr[i][cpu];
+if (!*papr) {
+continue;
+}
+/* Clear lowest set bit */
+*papr &= *papr - 1;
+break;
+}
+
+s->running_priority[cpu] = gic_get_prio_from_apr_bits(s, cpu);
 }
 
 uint32_t gic_acknowledge_irq(GICState *s, int cpu, MemTxAttrs attrs)
@@ -276,7 +333,6 @@ uint32_t gic_acknowledge_irq(GICState *s, int cpu, 
MemTxAttrs attrs)
 DPRINTF("ACK, pending interrupt (%d) has insufficient priority\n", 
irq);
 return 1023;
 }
-s->last_active[irq][cpu] = s->running_irq[cpu];
 
 if (s->revision == REV_11MPCORE || s->revision == REV_NVIC) {
 /* Clear pending flags for both level and edge triggered interrupts.
@@ -307,7 +363,8 @@ uint32_t gic_acknowledge_irq(GICState *s, int cpu, 
MemTxAttrs attrs)
 }
 }
 
-gic_set_running_irq(s, cpu, irq);
+gic_activate_irq(s, cpu, irq);
+gic_update(s);
 DPRINTF("ACK %d\n", irq);
 return ret;
 }
@@ -437,8 +494,9 @@ static uint8_t gic_get_running_priority(GICState *s, int 
cpu, MemTxAttrs attrs)
 
 void gic_complete_irq(GICState *s, int cpu, int irq, MemTxAttrs attrs)
 {
-int update = 0;
 int cm = 1 << cpu;
+int group;
+
 DPRINTF("EOI %d\n", irq);
 if (irq >= s->num_irq) {
 /* This handles two cases:
@@ -451,8 +509,9 @@ void gic_complete_irq(GICState *s, int cpu, int irq, 
MemTxAttrs attrs)
  */
 return;
 }
-if (s->running_irq[cpu] == 1023)
+if (s->running_priority[cpu] == 0x100) {
 return; /* No active IRQ.  */
+}
 
 if (s->revision == REV_11MPCORE || s->revision == REV_NVIC) {
 /* Mark level triggered interru

[Qemu-devel] [PATCH 3/5] hw/intc/arm_gic: Fix handling of GICC_APR, GICC_NSAPR registers

2015-07-28 Thread Peter Maydell

A GICv2 has both GICC_APR and GICC_NSAPR registers, with
the latter holding the active priority bits for Group 1 interrupts
(usually Nonsecure interrupts), and the Nonsecure view of the
GICC_APR is the second half of the GICC_NSAPR registers.
Turn our half-hearted implementation of APR into a proper
implementation of both APR and NSAPR:

 * Add the underlying state for NSAPR
 * Make sure APR aren't visible for pre-GICv2
 * Implement reading of NSAPR
 * Make non-secure reads of APR behave correctly
 * Implement writing to APR and NSAPR

Signed-off-by: Peter Maydell 
---
 hw/intc/arm_gic.c| 114 ++-
 hw/intc/arm_gic_common.c |   5 +-
 include/hw/intc/arm_gic_common.h |   1 +
 3 files changed, 116 insertions(+), 4 deletions(-)

diff --git a/hw/intc/arm_gic.c b/hw/intc/arm_gic.c
index 9814bb9..ed7faf5 100644
--- a/hw/intc/arm_gic.c
+++ b/hw/intc/arm_gic.c
@@ -954,6 +954,68 @@ static const MemoryRegionOps gic_dist_ops = {
 .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
+static inline uint32_t gic_apr_ns_view(GICState *s, int cpu, int regno)
+{
+/* Return the Nonsecure view of GICC_APR. This is the
+ * second half of GICC_NSAPR.
+ */
+switch (GIC_MIN_BPR) {
+case 0:
+if (regno < 2) {
+return s->nsapr[regno + 2][cpu];
+}
+break;
+case 1:
+if (regno == 0) {
+return s->nsapr[regno + 1][cpu];
+}
+break;
+case 2:
+if (regno == 0) {
+return extract32(s->nsapr[0][cpu], 16, 16);
+}
+break;
+case 3:
+if (regno == 0) {
+return extract32(s->nsapr[0][cpu], 8, 8);
+}
+break;
+default:
+g_assert_not_reached();
+}
+return 0;
+}
+
+static inline void gic_apr_write_ns_view(GICState *s, int cpu, int regno,
+ uint32_t value)
+{
+/* Write the Nonsecure view of GICC_APR. */
+switch (GIC_MIN_BPR) {
+case 0:
+if (regno < 2) {
+s->nsapr[regno + 2][cpu] = value;
+}
+break;
+case 1:
+if (regno == 0) {
+s->nsapr[regno + 1][cpu] = value;
+}
+break;
+case 2:
+if (regno == 0) {
+s->nsapr[0][cpu] = deposit32(s->nsapr[0][cpu], 16, 16, value);
+}
+break;
+case 3:
+if (regno == 0) {
+s->nsapr[0][cpu] = deposit32(s->nsapr[0][cpu], 8, 8, value);
+}
+break;
+default:
+g_assert_not_reached();
+}
+}
+
 static MemTxResult gic_cpu_read(GICState *s, int cpu, int offset,
 uint64_t *data, MemTxAttrs attrs)
 {
@@ -994,8 +1056,31 @@ static MemTxResult gic_cpu_read(GICState *s, int cpu, int 
offset,
 }
 break;
 case 0xd0: case 0xd4: case 0xd8: case 0xdc:
-*data = s->apr[(offset - 0xd0) / 4][cpu];
+{
+int regno = (offset - 0xd0) / 4;
+
+if (regno >= GIC_NR_APRS || s->revision != 2) {
+*data = 0;
+} else if (s->security_extn && !attrs.secure) {
+/* NS view of GICC_APR is the top half of GIC_NSAPR */
+*data = gic_apr_ns_view(s, regno, cpu);
+} else {
+*data = s->apr[regno][cpu];
+}
 break;
+}
+case 0xe0: case 0xe4: case 0xe8: case 0xec:
+{
+int regno = (offset - 0xe0) / 4;
+
+if (regno >= GIC_NR_APRS || s->revision != 2 || !gic_has_groups(s) ||
+(s->security_extn && !attrs.secure)) {
+*data = 0;
+} else {
+*data = s->nsapr[regno][cpu];
+}
+break;
+}
 default:
 qemu_log_mask(LOG_GUEST_ERROR,
   "gic_cpu_read: Bad offset %x\n", (int)offset);
@@ -1033,8 +1118,33 @@ static MemTxResult gic_cpu_write(GICState *s, int cpu, 
int offset,
 }
 break;
 case 0xd0: case 0xd4: case 0xd8: case 0xdc:
-qemu_log_mask(LOG_UNIMP, "Writing APR not implemented\n");
+{
+int regno = (offset - 0xd0) / 4;
+
+if (regno >= GIC_NR_APRS || s->revision != 2) {
+return MEMTX_OK;
+}
+if (s->security_extn && !attrs.secure) {
+/* NS view of GICC_APR is the top half of GIC_NSAPR */
+gic_apr_write_ns_view(s, regno, cpu, value);
+} else {
+s->apr[regno][cpu] = value;
+}
+break;
+}
+case 0xe0: case 0xe4: case 0xe8: case 0xec:
+{
+int regno = (offset - 0xe0) / 4;
+
+if (regno >= GIC_NR_APRS || s->revision != 2) {
+return MEMTX_OK;
+}
+if (!gic_has_groups(s) || (s->security_extn && !attrs.secure)) {
+return MEMTX_OK;
+}
+s->nsapr[regno][cpu] = value;
 break;
+}
 default:
 qemu_log_mask(LOG_GUEST_ERROR,
   "gic_cpu_write: Bad offset %x\n", (int)offset);
diff --git a/hw/intc/arm_gic_common.c b/hw/i

Re: [Qemu-devel] [PATCH v2 for-2.5] rcu: Allow calling rcu_(un)register_thread() during synchronize_rcu()

2015-07-28 Thread Wen Congyang


At 2015/7/28 19:46, Paolo Bonzini Wrote:



On 28/07/2015 12:33, Wen Congyang wrote:

On 07/28/2015 06:18 PM, Paolo Bonzini wrote:



On 28/07/2015 12:02, Wen Congyang wrote:

I have a question about rcu: while do we call wait_for_readers()
twice for 32-bit host?


Because there is a very small but non-zero probability of the counter
going up by exactly 2^31 periods (periods are stored in bits 1-31 so you
lose one bit) while the thread is sleeping.  This detail of the
implementation comes from URCU.


Yes, so you use rcu_gp_ctr ^ RCU_GP_CTR to instead of rcu_gp_ctr + RCU_GP_CTR.
The initial value is 1, so rcu_gp_ctr is: 1, 3, 1, 3, ...
The rcu_gp_ctr will never be 0. I think calling wait_for_readers() once is
enough.

Do I miss something?


If you call it just once, you have the same problem as before.  In fact,
it's worse because instead of having an overflow every 2^31 periods, you
have one every 2 periods.  Instead, by checking that rcu_reader went
through 1 _and_ 3 (or that it was at least once 0, i.e. the thread was
quiescent), you are sure that the thread went through _at least one_
grace period.


The overflow is acceptable. We only compare if rcu_reader.ctr is equal than
rcu_gp_ctr. If not, we should wait that thread to call rcu_read_unlock().
We don't care which is bigger. If no threads calls sync_rcu(), all threads
rcu_read.ctr is 0 or rcu_gp_ctr. If one thread calls sync_rcu(), all threads
rcu_read.ctr is 0, old_rcu_gp_ctr, or new_rcu_gp_ctr. We only wait the 
thread

that's rcu_read.ctr is old_rcu_gp_ctr.

Thanks
Wen Congyang



Paolo

[Qemu-devel] [PATCH] migration: yet more possible state transitions

2015-07-28 Thread Juan Quintela

On destination, we move from INMIGRATE to FINISH_MIGRATE.  Add that to
the list of allowed states.

Signed-off-by: Juan Quintela 
---
 vl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/vl.c b/vl.c
index 0adbbd6..45eb9ea 100644
--- a/vl.c
+++ b/vl.c
@@ -582,6 +582,7 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 { RUN_STATE_INMIGRATE, RUN_STATE_SUSPENDED },
 { RUN_STATE_INMIGRATE, RUN_STATE_WATCHDOG },
 { RUN_STATE_INMIGRATE, RUN_STATE_GUEST_PANICKED },
+{ RUN_STATE_INMIGRATE, RUN_STATE_FINISH_MIGRATE },

 { RUN_STATE_INTERNAL_ERROR, RUN_STATE_PAUSED },
 { RUN_STATE_INTERNAL_ERROR, RUN_STATE_FINISH_MIGRATE },
-- 
2.4.3

Re: [Qemu-devel] [PATCH for-2.4 0/2] AioContext: fix deadlock after aio_context_acquire() race

2015-07-28 Thread Stefan Hajnoczi

On Tue, Jul 28, 2015 at 1:18 PM, Paolo Bonzini  wrote:
> I would prefer to fix them all in 2.4 and risk regressions, because the
> bugs are use-after-frees, i.e. pretty bad.

There may be existing use-after-free bugs but keep in mind there are
other common cases:
1. Never touch the QEMUBH again.  Simple leak.
2. Call qemu_bh_delete().  Leak but still not use-after-free, since
the QEMUBH is still allocated.

The only scenario where a real use-after-free occurs is when
qemu_bh_schedule() is called after the AioContext was freed.  We don't
need an assertion to detect that case, just assign bh->ctx = NULL to
cause a segfault if the AioContext is ever accessed again.

Stefan

Re: [Qemu-devel] [PATCH for-2.4] block: delete bottom halves before the AioContext is freed

2015-07-28 Thread Stefan Hajnoczi

On Tue, Jul 28, 2015 at 2:11 PM, Cornelia Huck  wrote:
> On Tue, 28 Jul 2015 14:30:28 +0200
> Paolo Bonzini  wrote:
>
>> diff --git a/async.c b/async.c
>> index 9ca7095..efce14b 100644
>> --- a/async.c
>> +++ b/async.c
>> @@ -233,6 +233,7 @@ aio_ctx_finalize(GSource *source)
>>  AioContext *ctx = (AioContext *) source;
>>
>>  qemu_bh_delete(ctx->notify_dummy_bh);
>> +thread_pool_free(ctx->thread_pool);
>>
>>  qemu_mutex_lock(&ctx->bh_lock);
>>  while (ctx->first_bh) {
>> @@ -246,7 +247,6 @@ aio_ctx_finalize(GSource *source)
>>  }
>>  qemu_mutex_unlock(&ctx->bh_lock);
>>
>> -thread_pool_free(ctx->thread_pool);
>>  aio_set_event_notifier(ctx, &ctx->notifier, NULL);
>>  event_notifier_cleanup(&ctx->notifier);
>>  rfifolock_destroy(&ctx->lock);

My v2 patch includes something similar so that thread_pool_free() is
called before freeing BHs.

We can take this patch on top of my v1.

>> diff --git a/hw/block/dataplane/virtio-blk.c 
>> b/hw/block/dataplane/virtio-blk.c
>> index 3db139b..6106e46 100644
>> --- a/hw/block/dataplane/virtio-blk.c
>> +++ b/hw/block/dataplane/virtio-blk.c
>> @@ -223,8 +223,8 @@ void virtio_blk_data_plane_destroy(VirtIOBlockDataPlane 
>> *s)
>>  virtio_blk_data_plane_stop(s);
>>  blk_op_unblock_all(s->conf->conf.blk, s->blocker);
>>  error_free(s->blocker);
>> -object_unref(OBJECT(s->iothread));
>>  qemu_bh_delete(s->bh);
>> +object_unref(OBJECT(s->iothread));
>>  g_free(s);
>>  }
>>
>
> With this applied on top of Stefan's AioContext patches, both
> managedsave and device_del with dataplane devices work for me.
>
> Tested-by: Cornelia Huck 

Reviewed-by: Stefan Hajnoczi

Re: [Qemu-devel] [PATCH v2 0/2] AioContext: fix deadlock after aio_context_acquire() race

2015-07-28 Thread Stefan Hajnoczi

On Tue, Jul 28, 2015 at 1:12 PM, Stefan Hajnoczi  wrote:
> v2:
>  * Free BHs after thread_pool_free(), which calls qemu_bh_delete() [Cornelia]
>  * Remove assert for leaked BHs since we don't know how many existing cases
>there are yet and QEMU 2.4-rc3 is a poor time to risk assertion failures

The v2 isn't necessary if we apply Paolo's "[PATCH for-2.4] block:
delete bottom halves before the AioContext is freed" on top of my v1.
Paolo has audited all BHs so the risk of hitting assertion failures is
very low.

Stefan

1 2 3 >

1 - 100 of 227 matches

Mail list logo