date:20160131

Re: [Qemu-devel] [PATCH v3 2/3] .travis.yml: run make check for all matrix targets

2016-01-31 Thread Alex Bennée


David Gibson  writes:

> On Thu, Jan 28, 2016 at 02:23:28PM +, Alex Bennée wrote:
>> We only ran make check once before, as it used to be an unreliable target.
>> It was only a stop gap measure and we should be able to revert it now.
>> This also stops us needing a large all-MMU build.
>>
>> We disable "make check" for a couple of the extra config targets which
>> are currently broken.
>>
>> Signed-off-by: Alex Bennée 
>
> So, in general I like the idea of running make check more widely.
>
> However.. I was wondering - what's the rationale for having separate
> matrix builds for each target (or small group) rather than just doing
> one build with all the targets?

Each individual part of the matrix can be run in parallel with the
others so it makes sense to keep the build component small (as each
softmmu target rebuilds a significant chunk of the build).

Having said that there is a fair amount of repetition as we are
repeating all the generic qtests each time just so we can run the extra
${TARGET}-qtest binaries.

Travis does have an option for using ccache so it might be worth
experimenting with that to see if things are improved.

> I can't see any obvious benefit to splitting the build that way, but
> it does increase the total build time significantly - and will do so
> rather more so with make check added.

Elapsed and total are the ones to look at:

https://travis-ci.org/stsquad/qemu/builds/105401126

vs

https://travis-ci.org/qemu/qemu/builds/105711606

However it looks like Travis are having growing pains with scaling because
their "old style" VM approach is running a lot faster than it used to.

>
>> ---
>>  .travis.yml | 15 ---
>>  1 file changed, 8 insertions(+), 7 deletions(-)
>>
>> diff --git a/.travis.yml b/.travis.yml
>> index 4a0c23a..16be23f 100644
>> --- a/.travis.yml
>> +++ b/.travis.yml
>> @@ -40,7 +40,7 @@ notifications:
>>  on_failure: always
>>  env:
>>global:
>> -- TEST_CMD=""
>> +- TEST_CMD="make check"
>>  - EXTRA_CONFIG=""
>>matrix:
>>  # Group major targets together with their linux-user counterparts
>> @@ -73,17 +73,14 @@ script:
>>  matrix:
>># We manually include a number of additional build for non-standard bits
>>include:
>> -# Make check target (we only do this once)
>> -- env:
>> -- TARGETS=alpha-softmmu,arm-softmmu,aarch64-softmmu,cris-softmmu,i386-softmmu,x86_64-softmmu,m68k-softmmu,microblaze-softmmu,microblazeel-softmmu,mips-softmmu,mips64-softmmu,mips64el-softmmu,mipsel-softmmu,or32-softmmu,ppc-softmmu,ppc64-softmmu,ppcemb-softmmu,s390x-softmmu,sh4-softmmu,sh4eb-softmmu,sparc-softmmu,sparc64-softmmu,unicore32-softmmu,lm32-softmmu,moxie-softmmu,tricore-softmmu,xtensa-softmmu,xtensaeb-softmmu
>> -  TEST_CMD="make check"
>> -  compiler: gcc
>>  # Debug related options
>>  - env: TARGETS=i386-softmmu,x86_64-softmmu
>> EXTRA_CONFIG="--enable-debug"
>>compiler: gcc
>> +# We currently disable "make check"
>>  - env: TARGETS=i386-softmmu,x86_64-softmmu
>> EXTRA_CONFIG="--enable-debug --enable-tcg-interpreter"
>> +   TEST_CMD=""
>>compiler: gcc
>>  # Disable a few of the optional features
>>  - env: TARGETS=i386-softmmu,x86_64-softmmu
>> @@ -104,11 +101,15 @@ matrix:
>>  - env: TARGETS=i386-softmmu,x86_64-softmmu
>> EXTRA_CONFIG="--enable-trace-backends=simple"
>>compiler: gcc
>> +# We currently disable "make check"
>>  - env: TARGETS=i386-softmmu,x86_64-softmmu
>> EXTRA_CONFIG="--enable-trace-backends=ftrace"
>> +   TEST_CMD=""
>>compiler: gcc
>> +# We currently disable "make check"
>>  - env: TARGETS=i386-softmmu,x86_64-softmmu
>> -  EXTRA_CONFIG="--enable-trace-backends=ust"
>> +   EXTRA_CONFIG="--enable-trace-backends=ust"
>> +   TEST_CMD=""
>>compiler: gcc
>>  - env: TARGETS=i386-softmmu,x86_64-softmmu
>> EXTRA_CONFIG="--enable-modules"


--
Alex Bennée

Re: [Qemu-devel] [PATCH v3 0/3] Travis updates

2016-01-31 Thread Alex Bennée

Michael Tokarev  writes:

> 28.01.2016 17:23, Alex Bennée wrote:
>> Hi,
>>
>> The first patch has been reviewed and signed off. Long term I think
>> it is worth applying but it looks like the performance increase is
>> negligible compared to the old style VM builds at the moment. I
>> suspect this may be because the new infrastructure is under more load
>> as more projects have migrated.
>
> I'm applying all this to -trivial, but without trying to understand
> what it is all about, as I don't know travis.

Maybe we should hold off for now until the later ones have been reviewed
(David has some comments on 2/3). I guess it was really a question of
whether trivial is the right tree for these patches to go upstream.

Maybe what's really needed is a build and test automation tree (and
associated maintainer)? Peter, any comments?

>
> Thanks,
>
> /mjt

--
Alex Bennée

Re: [Qemu-devel] [PATCH v3 2/3] .travis.yml: run make check for all matrix targets

2016-01-31 Thread David Gibson

On Sun, Jan 31, 2016 at 08:37:49AM +, Alex Bennée wrote:
> 
> David Gibson  writes:
> 
> > On Thu, Jan 28, 2016 at 02:23:28PM +, Alex Bennée wrote:
> >> We only ran make check once before, as it used to be an unreliable target.
> >> It was only a stop gap measure and we should be able to revert it now.
> >> This also stops us needing a large all-MMU build.
> >>
> >> We disable "make check" for a couple of the extra config targets which
> >> are currently broken.
> >>
> >> Signed-off-by: Alex Bennée 
> >
> > So, in general I like the idea of running make check more widely.
> >
> > However.. I was wondering - what's the rationale for having separate
> > matrix builds for each target (or small group) rather than just doing
> > one build with all the targets?
> 
> Each individual part of the matrix can be run in parallel with the
> others so it makes sense to keep the build component small (as each
> softmmu target rebuilds a significant chunk of the build).

It does rebuild a significant chunk, but there's a significant chunk
that isn't rebuilt as well.  When I tried this I found a recombined
build marginally decreased the elapsed time and significantly (maybe
30-40%) reduced the total time.  Given the load the travis system is
under, it seems to me that we should try to keep our total demand on
its resources down when it doesn't significantly lower our coverage.

> Having said that there is a fair amount of repetition as we are
> repeating all the generic qtests each time just so we can run the extra
> ${TARGET}-qtest binaries.

That too.

> Travis does have an option for using ccache so it might be worth
> experimenting with that to see if things are improved.

That does sound like something worth looking at.

One thing that does annoy me about travis is that it will do a full
rebuild if you have two branches on exactly the same commit, or if you
revert a branch to an earlier commit which was built previously.

> > I can't see any obvious benefit to splitting the build that way, but
> > it does increase the total build time significantly - and will do so
> > rather more so with make check added.
> 
> Elapsed and total are the ones to look at:
> 
> https://travis-ci.org/stsquad/qemu/builds/105401126
> 
> vs
> 
> https://travis-ci.org/qemu/qemu/builds/105711606
> 
> However it looks like Travis are having growing pains with scaling because
> their "old style" VM approach is running a lot faster than it used
> to.

Not terribly surprising TBH.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson



[Qemu-devel] [PATCH 03/10] virtio: introduce qemu_get/put_virtqueue_element

2016-01-31 Thread Paolo Bonzini

Move allocation to virtio functions also when loading/saving a
VirtQueueElement.  This will also let the load/save functions
keep backwards compatibility when the VirtQueueElement layout
is changed.

Reviewed-by: Cornelia Huck 
Signed-off-by: Paolo Bonzini 
---
 hw/block/virtio-blk.c   | 10 +++---
 hw/char/virtio-serial-bus.c | 10 +++---
 hw/scsi/virtio-scsi.c   |  7 ++-
 hw/virtio/virtio.c  | 13 +
 include/hw/virtio/virtio.h  |  2 ++
 5 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index a874cb7..75b2bfc 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -807,8 +807,7 @@ static void virtio_blk_save_device(VirtIODevice *vdev, QEMUFile *f)
 
 while (req) {
 qemu_put_sbyte(f, 1);
-qemu_put_buffer(f, (unsigned char *)&req->elem,
-sizeof(VirtQueueElement));
+qemu_put_virtqueue_element(f, &req->elem);
 req = req->next;
 }
 qemu_put_sbyte(f, 0);
@@ -831,14 +830,11 @@ static int virtio_blk_load_device(VirtIODevice *vdev, QEMUFile *f,
 VirtIOBlock *s = VIRTIO_BLK(vdev);
 
 while (qemu_get_sbyte(f)) {
-VirtIOBlockReq *req = g_new(VirtIOBlockReq, 1);
+VirtIOBlockReq *req;
+req = qemu_get_virtqueue_element(f, sizeof(VirtIOBlockReq));
 virtio_blk_init_request(s, req);
-qemu_get_buffer(f, (unsigned char *)&req->elem,
-sizeof(VirtQueueElement));
 req->next = s->rq;
 s->rq = req;
-
-virtqueue_map(&req->elem);
 }
 
 return 0;
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index f0c1c45..12ce64a 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -645,9 +645,7 @@ static void virtio_serial_save_device(VirtIODevice *vdev, QEMUFile *f)
 if (elem_popped) {
 qemu_put_be32s(f, &port->iov_idx);
 qemu_put_be64s(f, &port->iov_offset);
-
-qemu_put_buffer(f, (unsigned char *)port->elem,
-sizeof(VirtQueueElement));
+qemu_put_virtqueue_element(f, port->elem);
 }
 }
 }
@@ -722,10 +720,8 @@ static int fetch_active_ports_list(QEMUFile *f, int version_id,
 qemu_get_be32s(f, &port->iov_idx);
 qemu_get_be64s(f, &port->iov_offset);
 
-port->elem = g_new(VirtQueueElement, 1);
-qemu_get_buffer(f, (unsigned char *)port->elem,
-sizeof(VirtQueueElement));
-virtqueue_map(port->elem);
+port->elem =
+qemu_get_virtqueue_element(f, sizeof(VirtQueueElement));
 
 /*
  *  Port was throttled on source machine.  Let's
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index ca20a1d..789cf38 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -188,7 +188,7 @@ static void virtio_scsi_save_request(QEMUFile *f, SCSIRequest *sreq)
 
 assert(n < vs->conf.num_queues);
 qemu_put_be32s(f, &n);
-qemu_put_buffer(f, (unsigned char *)&req->elem, sizeof(req->elem));
+qemu_put_virtqueue_element(f, &req->elem);
 }
 
 static void *virtio_scsi_load_request(QEMUFile *f, SCSIRequest *sreq)
@@ -201,12 +201,9 @@ static void *virtio_scsi_load_request(QEMUFile *f, SCSIRequest *sreq)
 
 qemu_get_be32s(f, &n);
 assert(n < vs->conf.num_queues);
-req = g_malloc(sizeof(VirtIOSCSIReq) + vs->cdb_size);
-qemu_get_buffer(f, (unsigned char *)&req->elem, sizeof(req->elem));
+req = qemu_get_virtqueue_element(f, sizeof(VirtIOSCSIReq) + vs->cdb_size);
 virtio_scsi_init_req(s, vs->cmd_vqs[n], req);
 
-virtqueue_map(&req->elem);
-
 if (virtio_scsi_parse_req(req, sizeof(VirtIOSCSICmdReq) + vs->cdb_size,
   sizeof(VirtIOSCSICmdResp) + vs->sense_size) < 0) 
{
 error_report("invalid SCSI request migration data");
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 9b2c0bf..388e91c 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -576,6 +576,19 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
 return elem;
 }
 
+void *qemu_get_virtqueue_element(QEMUFile *f, size_t sz)
+{
+VirtQueueElement *elem = g_malloc(sz);
+qemu_get_buffer(f, (uint8_t *)elem, sizeof(VirtQueueElement));
+virtqueue_map(elem);
+return elem;
+}
+
+void qemu_put_virtqueue_element(QEMUFile *f, VirtQueueElement *elem)
+{
+qemu_put_buffer(f, (uint8_t *)elem, sizeof(VirtQueueElement));
+}
+
 /* virtio device */
 static void virtio_notify_vector(VirtIODevice *vdev, uint16_t vector)
 {
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 21fda17..44da9a8 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -153,6 +153,8 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
 
 void virtqueue_map(VirtQueueElement *elem);
 void *v

[Qemu-devel] [PATCH 04/10] virtio: introduce virtqueue_alloc_element

2016-01-31 Thread Paolo Bonzini

Allocate the arrays for in_addr/out_addr/in_sg/out_sg outside the
VirtQueueElement.  For now, virtqueue_pop and vring_pop keep
allocating a very large VirtQueueElement.

Reviewed-by: Cornelia Huck 
Signed-off-by: Paolo Bonzini 
---
v1->v2: add assertions on sz [Conny]

 hw/virtio/dataplane/vring.c |   3 +-
 hw/virtio/virtio.c  | 110 +++-
 include/hw/virtio/virtio.h  |   9 ++--
 3 files changed, 105 insertions(+), 17 deletions(-)

diff --git a/hw/virtio/dataplane/vring.c b/hw/virtio/dataplane/vring.c
index 11e7f9f..c950caa 100644
--- a/hw/virtio/dataplane/vring.c
+++ b/hw/virtio/dataplane/vring.c
@@ -402,8 +402,7 @@ void *vring_pop(VirtIODevice *vdev, Vring *vring, size_t sz)
 goto out;
 }
 
-assert(sz >= sizeof(VirtQueueElement));
-elem = g_malloc(sz);
+elem = virtqueue_alloc_element(sz, VIRTQUEUE_MAX_SIZE, VIRTQUEUE_MAX_SIZE);
 
 /* Initialize elem so it can be safely unmapped */
 elem->in_num = elem->out_num = 0;
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 388e91c..f49c5ae 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -494,11 +494,30 @@ static void virtqueue_map_iovec(struct iovec *sg, hwaddr *addr,
 void virtqueue_map(VirtQueueElement *elem)
 {
 virtqueue_map_iovec(elem->in_sg, elem->in_addr, &elem->in_num,
-MIN(ARRAY_SIZE(elem->in_sg), ARRAY_SIZE(elem->in_addr)),
-1);
+VIRTQUEUE_MAX_SIZE, 1);
 virtqueue_map_iovec(elem->out_sg, elem->out_addr, &elem->out_num,
-MIN(ARRAY_SIZE(elem->out_sg), ARRAY_SIZE(elem->out_addr)),
-0);
+VIRTQUEUE_MAX_SIZE, 0);
+}
+
+void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_num)
+{
+VirtQueueElement *elem;
+size_t in_addr_ofs = QEMU_ALIGN_UP(sz, __alignof__(elem->in_addr[0]));
+size_t out_addr_ofs = in_addr_ofs + in_num * sizeof(elem->in_addr[0]);
+size_t out_addr_end = out_addr_ofs + out_num * sizeof(elem->out_addr[0]);
+size_t in_sg_ofs = QEMU_ALIGN_UP(out_addr_end, __alignof__(elem->in_sg[0]));
+size_t out_sg_ofs = in_sg_ofs + in_num * sizeof(elem->in_sg[0]);
+size_t out_sg_end = out_sg_ofs + out_num * sizeof(elem->out_sg[0]);
+
+assert(sz >= sizeof(VirtQueueElement));
+elem = g_malloc(out_sg_end);
+elem->out_num = out_num;
+elem->in_num = in_num;
+elem->in_addr = (void *)elem + in_addr_ofs;
+elem->out_addr = (void *)elem + out_addr_ofs;
+elem->in_sg = (void *)elem + in_sg_ofs;
+elem->out_sg = (void *)elem + out_sg_ofs;
+return elem;
 }
 
 void *virtqueue_pop(VirtQueue *vq, size_t sz)
@@ -513,8 +532,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
 }
 
 /* When we start there are none of either input nor output. */
-assert(sz >= sizeof(VirtQueueElement));
-elem = g_malloc(sz);
+elem = virtqueue_alloc_element(sz, VIRTQUEUE_MAX_SIZE, VIRTQUEUE_MAX_SIZE);
 elem->out_num = elem->in_num = 0;
 
 max = vq->vring.num;
@@ -541,14 +559,14 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
 struct iovec *sg;
 
 if (vring_desc_flags(vdev, desc_pa, i) & VRING_DESC_F_WRITE) {
-if (elem->in_num >= ARRAY_SIZE(elem->in_sg)) {
+if (elem->in_num >= VIRTQUEUE_MAX_SIZE) {
 error_report("Too many write descriptors in indirect table");
 exit(1);
 }
 elem->in_addr[elem->in_num] = vring_desc_addr(vdev, desc_pa, i);
 sg = &elem->in_sg[elem->in_num++];
 } else {
-if (elem->out_num >= ARRAY_SIZE(elem->out_sg)) {
+if (elem->out_num >= VIRTQUEUE_MAX_SIZE) {
 error_report("Too many read descriptors in indirect table");
 exit(1);
 }
@@ -576,17 +594,87 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
 return elem;
 }
 
+/* Reading and writing a structure directly to QEMUFile is *awful*, but
+ * it is what QEMU has always done by mistake.  We can change it sooner
+ * or later by bumping the version number of the affected vm states.
+ * In the meanwhile, since the in-memory layout of VirtQueueElement
+ * has changed, we need to marshal to and from the layout that was
+ * used before the change.
+ */
+typedef struct VirtQueueElementOld {
+unsigned int index;
+unsigned int out_num;
+unsigned int in_num;
+hwaddr in_addr[VIRTQUEUE_MAX_SIZE];
+hwaddr out_addr[VIRTQUEUE_MAX_SIZE];
+struct iovec in_sg[VIRTQUEUE_MAX_SIZE];
+struct iovec out_sg[VIRTQUEUE_MAX_SIZE];
+} VirtQueueElementOld;
+
 void *qemu_get_virtqueue_element(QEMUFile *f, size_t sz)
 {
-VirtQueueElement *elem = g_malloc(sz);
-qemu_get_buffer(f, (uint8_t *)elem, sizeof(VirtQueueElement));
+VirtQueueElement *elem;
+VirtQueueElementOld data;
+int i;
+
+qemu_get_buffer(f, (uint8_t *)&data, sizeof(VirtQueueElementOld));
+
+elem = virtque

[Qemu-devel] [PATCH v2 00/10] virtio/vring: optimization patches

2016-01-31 Thread Paolo Bonzini

This includes two optimizations of virtio:

- "slimming down" VirtQueueElements by not including room for
  1024 buffers.  This makes malloc much faster.

- optimizations to limit the number of address_space_translate
  calls in virtio.c, from Vincenzo and myself.

Thanks,

Paolo

v1->v2: improved commit messages [Conny]
add assertions on sz [Conny]
change bools from 1 and 0 to "true" and "false" [Conny]
update shadow avail_idx in virtio_queue_set_last_avail_idx [Michael]
collect Reviewed-by

Paolo Bonzini (7):
  virtio: move VirtQueueElement at the beginning of the structs
  virtio: move allocation to virtqueue_pop/vring_pop
  virtio: introduce qemu_get/put_virtqueue_element
  virtio: introduce virtqueue_alloc_element
  virtio: slim down allocation of VirtQueueElements
  vring: slim down allocation of VirtQueueElements
  virtio: combine the read of a descriptor

Vincenzo Maffione (3):
  virtio: cache used_idx in a VirtQueue field
  virtio: read avail_idx from VQ only when necessary
  virtio: combine write of an entry into used ring

 hw/9pfs/9p.c|   2 +-
 hw/9pfs/virtio-9p-device.c  |  17 +-
 hw/9pfs/virtio-9p.h |   2 +-
 hw/block/dataplane/virtio-blk.c |  11 +-
 hw/block/virtio-blk.c   |  23 +--
 hw/char/virtio-serial-bus.c |  78 +
 hw/display/virtio-gpu.c |  25 ++-
 hw/input/virtio-input.c |  24 ++-
 hw/net/virtio-net.c |  69 +---
 hw/scsi/virtio-scsi-dataplane.c |  15 +-
 hw/scsi/virtio-scsi.c   |  26 ++-
 hw/virtio/dataplane/vring.c |  62 ---
 hw/virtio/virtio-balloon.c  |  22 ++-
 hw/virtio/virtio-rng.c  |  10 +-
 hw/virtio/virtio.c  | 340 +---
 include/hw/virtio/dataplane/vring.h |   2 +-
 include/hw/virtio/virtio-balloon.h  |   2 +-
 include/hw/virtio/virtio-blk.h  |   5 +-
 include/hw/virtio/virtio-net.h  |   2 +-
 include/hw/virtio/virtio-scsi.h |  15 +-
 include/hw/virtio/virtio-serial.h   |   2 +-
 include/hw/virtio/virtio.h  |  13 +-
 22 files changed, 486 insertions(+), 281 deletions(-)

-- 
2.5.0
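
The first optimization in the cover letter above (collect the s/g list in a fixed-size scratch area, then copy it into an allocation that is only as large as needed) can be sketched in standalone C as follows. This is a minimal illustration under stated assumptions, not the code from the series: `DemoSeg`, `DemoElem`, `DEMO_MAX_SG` and `demo_pop` are hypothetical names, and a C99 flexible array member stands in for the pointer-based layout that the patches themselves use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical upper bound, mirroring the "1024 buffers" mentioned above. */
#define DEMO_MAX_SG 1024

/* Hypothetical scatter/gather segment (stand-in for an address + iovec pair). */
typedef struct DemoSeg {
    size_t addr;
    size_t len;
} DemoSeg;

/* Hypothetical element whose trailing array is sized per request. */
typedef struct DemoElem {
    unsigned int num;
    DemoSeg sg[];               /* C99 flexible array member */
} DemoElem;

/*
 * Split [addr, addr + len) into pieces of at most `chunk` bytes.
 * Pieces are first collected in a stack buffer of worst-case size,
 * then copied into a heap element that holds exactly `num` entries.
 * Returns NULL if more than DEMO_MAX_SG pieces would be needed.
 */
static DemoElem *demo_pop(size_t addr, size_t len, size_t chunk)
{
    DemoSeg scratch[DEMO_MAX_SG];
    unsigned int num = 0;
    DemoElem *elem;

    assert(chunk > 0);
    while (len) {
        size_t n = len < chunk ? len : chunk;

        if (num == DEMO_MAX_SG) {
            return NULL;
        }
        scratch[num].addr = addr;
        scratch[num].len = n;
        num++;
        addr += n;
        len -= n;
    }

    /* Allocate only what this particular request needs. */
    elem = malloc(sizeof(*elem) + num * sizeof(elem->sg[0]));
    if (!elem) {
        abort();
    }
    elem->num = num;
    memcpy(elem->sg, scratch, num * sizeof(elem->sg[0]));
    return elem;
}
```

The trade-off is the one the cover letter describes: the worst-case array lives on the stack, where it costs nothing to allocate, and the heap allocation shrinks from the worst case to the actual number of segments at the price of one small copy.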

[Qemu-devel] [PATCH 01/10] virtio: move VirtQueueElement at the beginning of the structs

2016-01-31 Thread Paolo Bonzini

The next patch will make virtqueue_pop/vring_pop allocate memory for
the VirtQueueElement. In some cases (blk, scsi, gpu) the device wants
to extend VirtQueueElement with device-specific fields and, until now,
the place of the VirtQueueElement within the containing struct didn't
matter. When allocating the entire block in virtqueue_pop/vring_pop,
however, the containing struct must basically be a "subclass" of
VirtQueueElement, with the VirtQueueElement as the first field. Make
that the case for blk and scsi; gpu is already doing it.

Signed-off-by: Paolo Bonzini 
---
 hw/scsi/virtio-scsi.c   |  3 +--
 include/hw/virtio/virtio-blk.h  |  2 +-
 include/hw/virtio/virtio-scsi.h | 13 ++---
 3 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 607593c..df8e379 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -44,8 +44,7 @@ VirtIOSCSIReq *virtio_scsi_init_req(VirtIOSCSI *s, VirtQueue *vq)
 {
 VirtIOSCSIReq *req;
 VirtIOSCSICommon *vs = (VirtIOSCSICommon *)s;
-const size_t zero_skip = offsetof(VirtIOSCSIReq, elem)
- + sizeof(VirtQueueElement);
+const size_t zero_skip = offsetof(VirtIOSCSIReq, vring);
 
 req = g_malloc(sizeof(*req) + vs->cdb_size);
 req->vq = vq;
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index ae11a63..403ab86 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -60,9 +60,9 @@ typedef struct VirtIOBlock {
 } VirtIOBlock;
 
 typedef struct VirtIOBlockReq {
+VirtQueueElement elem;
 int64_t sector_num;
 VirtIOBlock *dev;
-VirtQueueElement elem;
 struct virtio_blk_inhdr *in;
 struct virtio_blk_outhdr out;
 QEMUIOVector qiov;
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 088fe9f..63f5b51 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -102,18 +102,17 @@ typedef struct VirtIOSCSI {
 } VirtIOSCSI;
 
 typedef struct VirtIOSCSIReq {
+/* Note:
+ * - fields up to resp_iov are initialized by virtio_scsi_init_req;
+ * - fields starting at vring are zeroed by virtio_scsi_init_req.
+ * */
+VirtQueueElement elem;
+
 VirtIOSCSI *dev;
 VirtQueue *vq;
 QEMUSGList qsgl;
 QEMUIOVector resp_iov;
 
-/* Note:
- * - fields before elem are initialized by virtio_scsi_init_req;
- * - elem is uninitialized at the time of allocation.
- * - fields after elem are zeroed by virtio_scsi_init_req.
- * */
-
-VirtQueueElement elem;
 /* Set by dataplane code. */
 VirtIOSCSIVring *vring;
 
-- 
2.5.0
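
The "subclass" requirement described in this patch can be illustrated with a small standalone sketch. It assumes hypothetical types (`BaseElem` standing in for VirtQueueElement, `DemoReq` for a device request) and a hypothetical allocator `base_elem_alloc`; none of these names come from QEMU, and the C11 `_Static_assert` is an addition for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for VirtQueueElement; not the QEMU definition. */
typedef struct BaseElem {
    unsigned int index;
    unsigned int out_num;
    unsigned int in_num;
} BaseElem;

/* Hypothetical device request that "subclasses" BaseElem. */
typedef struct DemoReq {
    BaseElem elem;              /* must stay the first field */
    long sector_num;
} DemoReq;

/* Catch a reordering of the struct at compile time (C11). */
_Static_assert(offsetof(DemoReq, elem) == 0,
               "elem must be the first field of DemoReq");

/*
 * Generic allocator: it only knows about BaseElem, but it allocates
 * `sz` bytes so that the caller's containing struct fits as well.
 */
static void *base_elem_alloc(size_t sz, unsigned int index)
{
    BaseElem *elem;

    assert(sz >= sizeof(BaseElem));
    elem = calloc(1, sz);
    if (!elem) {
        abort();
    }
    elem->index = index;
    return elem;
}

/* Device side: ask the generic code for a block big enough for DemoReq. */
static DemoReq *demo_req_new(unsigned int index, long sector_num)
{
    DemoReq *req = base_elem_alloc(sizeof(DemoReq), index);

    req->sector_num = sector_num;
    return req;
}
```

Because `elem` sits at offset 0, a pointer to the request and a pointer to its element refer to the same address, which is what allows generic code to allocate the whole block on behalf of the device.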

[Qemu-devel] [PATCH 05/10] virtio: slim down allocation of VirtQueueElements

2016-01-31 Thread Paolo Bonzini

Build the addresses and s/g lists on the stack, and then copy them
to a VirtQueueElement that is just as big as required to contain this
particular s/g list.  The cost of the copy is minimal compared to that
of a large malloc.

When virtqueue_map is used on the destination side of migration or on
loadvm, the iovecs have already been split at memory region boundary,
so we can just reuse the out_num/in_num we find in the file.

Reviewed-by: Cornelia Huck 
Signed-off-by: Paolo Bonzini 
---
v1->v2: change bools from 1 and 0 to "true" and "false" [Conny]

 hw/virtio/virtio.c | 82 +-
 1 file changed, 51 insertions(+), 31 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index f49c5ae..79a635f 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -448,6 +448,32 @@ int virtqueue_avail_bytes(VirtQueue *vq, unsigned int in_bytes,
 return in_bytes <= in_total && out_bytes <= out_total;
 }
 
+static void virtqueue_map_desc(unsigned int *p_num_sg, hwaddr *addr, struct iovec *iov,
+   unsigned int max_num_sg, bool is_write,
+   hwaddr pa, size_t sz)
+{
+unsigned num_sg = *p_num_sg;
+assert(num_sg <= max_num_sg);
+
+while (sz) {
+hwaddr len = sz;
+
+if (num_sg == max_num_sg) {
+error_report("virtio: too many write descriptors in indirect table");
+exit(1);
+}
+
+iov[num_sg].iov_base = cpu_physical_memory_map(pa, &len, is_write);
+iov[num_sg].iov_len = len;
+addr[num_sg] = pa;
+
+sz -= len;
+pa += len;
+num_sg++;
+}
+*p_num_sg = num_sg;
+}
+
 static void virtqueue_map_iovec(struct iovec *sg, hwaddr *addr,
 unsigned int *num_sg, unsigned int max_size,
 int is_write)
@@ -474,20 +500,10 @@ static void virtqueue_map_iovec(struct iovec *sg, hwaddr *addr,
 error_report("virtio: error trying to map MMIO memory");
 exit(1);
 }
-if (len == sg[i].iov_len) {
-continue;
-}
-if (*num_sg >= max_size) {
-error_report("virtio: memory split makes iovec too large");
+if (len != sg[i].iov_len) {
+error_report("virtio: unexpected memory split");
 exit(1);
 }
-memmove(sg + i + 1, sg + i, sizeof(*sg) * (*num_sg - i));
-memmove(addr + i + 1, addr + i, sizeof(*addr) * (*num_sg - i));
-assert(len < sg[i + 1].iov_len);
-sg[i].iov_len = len;
-addr[i + 1] += len;
-sg[i + 1].iov_len -= len;
-++*num_sg;
 }
 }
 
@@ -526,14 +542,16 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
 hwaddr desc_pa = vq->vring.desc;
 VirtIODevice *vdev = vq->vdev;
 VirtQueueElement *elem;
+unsigned out_num, in_num;
+hwaddr addr[VIRTQUEUE_MAX_SIZE];
+struct iovec iov[VIRTQUEUE_MAX_SIZE];
 
 if (!virtqueue_num_heads(vq, vq->last_avail_idx)) {
 return NULL;
 }
 
 /* When we start there are none of either input nor output. */
-elem = virtqueue_alloc_element(sz, VIRTQUEUE_MAX_SIZE, VIRTQUEUE_MAX_SIZE);
-elem->out_num = elem->in_num = 0;
+out_num = in_num = 0;
 
 max = vq->vring.num;
 
@@ -556,37 +574,39 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
 
 /* Collect all the descriptors */
 do {
-struct iovec *sg;
+hwaddr pa = vring_desc_addr(vdev, desc_pa, i);
+size_t len = vring_desc_len(vdev, desc_pa, i);
 
 if (vring_desc_flags(vdev, desc_pa, i) & VRING_DESC_F_WRITE) {
-if (elem->in_num >= VIRTQUEUE_MAX_SIZE) {
-error_report("Too many write descriptors in indirect table");
-exit(1);
-}
-elem->in_addr[elem->in_num] = vring_desc_addr(vdev, desc_pa, i);
-sg = &elem->in_sg[elem->in_num++];
+virtqueue_map_desc(&in_num, addr + out_num, iov + out_num,
+   VIRTQUEUE_MAX_SIZE - out_num, true, pa, len);
 } else {
-if (elem->out_num >= VIRTQUEUE_MAX_SIZE) {
-error_report("Too many read descriptors in indirect table");
+if (in_num) {
+error_report("Incorrect order for descriptors");
 exit(1);
 }
-elem->out_addr[elem->out_num] = vring_desc_addr(vdev, desc_pa, i);
-sg = &elem->out_sg[elem->out_num++];
+virtqueue_map_desc(&out_num, addr, iov,
+   VIRTQUEUE_MAX_SIZE, false, pa, len);
 }
 
-sg->iov_len = vring_desc_len(vdev, desc_pa, i);
-
 /* If we've got too many, that implies a descriptor loop. */
-if ((elem->in_num + elem->out_num) > max) {
+if ((in_num + out_num) > max) {
 error_report("Looped descriptor");
 exit(1);
 }
 } while ((i = virtqueue_ne

[Qemu-devel] [PATCH 06/10] vring: slim down allocation of VirtQueueElements

2016-01-31 Thread Paolo Bonzini

Build the addresses and s/g lists on the stack, and then copy them
to a VirtQueueElement that is just as big as required to contain this
particular s/g list.  The cost of the copy is minimal compared to that
of a large malloc.

Reviewed-by: Cornelia Huck 
Signed-off-by: Paolo Bonzini 
---
 hw/virtio/dataplane/vring.c | 53 ++---
 1 file changed, 36 insertions(+), 17 deletions(-)

diff --git a/hw/virtio/dataplane/vring.c b/hw/virtio/dataplane/vring.c
index c950caa..d6b8ba9 100644
--- a/hw/virtio/dataplane/vring.c
+++ b/hw/virtio/dataplane/vring.c
@@ -217,8 +217,14 @@ bool vring_should_notify(VirtIODevice *vdev, Vring *vring)
 new, old);
 }
 
-
-static int get_desc(Vring *vring, VirtQueueElement *elem,
+typedef struct VirtQueueCurrentElement {
+unsigned in_num;
+unsigned out_num;
+hwaddr addr[VIRTQUEUE_MAX_SIZE];
+struct iovec iov[VIRTQUEUE_MAX_SIZE];
+} VirtQueueCurrentElement;
+
+static int get_desc(Vring *vring, VirtQueueCurrentElement *elem,
 struct vring_desc *desc)
 {
 unsigned *num;
@@ -229,12 +235,12 @@ static int get_desc(Vring *vring, VirtQueueElement *elem,
 
 if (desc->flags & VRING_DESC_F_WRITE) {
 num = &elem->in_num;
-iov = &elem->in_sg[*num];
-addr = &elem->in_addr[*num];
+iov = &elem->iov[elem->out_num + *num];
+addr = &elem->addr[elem->out_num + *num];
 } else {
 num = &elem->out_num;
-iov = &elem->out_sg[*num];
-addr = &elem->out_addr[*num];
+iov = &elem->iov[*num];
+addr = &elem->addr[*num];
 
 /* If it's an output descriptor, they're all supposed
  * to come before any input descriptors. */
@@ -298,7 +304,8 @@ static bool read_vring_desc(VirtIODevice *vdev,
 
 /* This is stolen from linux/drivers/vhost/vhost.c. */
 static int get_indirect(VirtIODevice *vdev, Vring *vring,
-VirtQueueElement *elem, struct vring_desc *indirect)
+VirtQueueCurrentElement *cur_elem,
+struct vring_desc *indirect)
 {
 struct vring_desc desc;
 unsigned int i = 0, count, found = 0;
@@ -350,7 +357,7 @@ static int get_indirect(VirtIODevice *vdev, Vring *vring,
 return -EFAULT;
 }
 
-ret = get_desc(vring, elem, &desc);
+ret = get_desc(vring, cur_elem, &desc);
 if (ret < 0) {
 vring->broken |= (ret == -EFAULT);
 return ret;
@@ -393,6 +400,7 @@ void *vring_pop(VirtIODevice *vdev, Vring *vring, size_t sz)
 struct vring_desc desc;
 unsigned int i, head, found = 0, num = vring->vr.num;
 uint16_t avail_idx, last_avail_idx;
+VirtQueueCurrentElement cur_elem;
 VirtQueueElement *elem = NULL;
 int ret;
 
@@ -402,10 +410,7 @@ void *vring_pop(VirtIODevice *vdev, Vring *vring, size_t sz)
 goto out;
 }
 
-elem = virtqueue_alloc_element(sz, VIRTQUEUE_MAX_SIZE, VIRTQUEUE_MAX_SIZE);
-
-/* Initialize elem so it can be safely unmapped */
-elem->in_num = elem->out_num = 0;
+cur_elem.in_num = cur_elem.out_num = 0;
 
 /* Check it isn't doing very strange things with descriptor numbers. */
 last_avail_idx = vring->last_avail_idx;
@@ -432,8 +437,6 @@ void *vring_pop(VirtIODevice *vdev, Vring *vring, size_t sz)
  * the index we've seen. */
 head = vring_get_avail_ring(vdev, vring, last_avail_idx % num);
 
-elem->index = head;
-
 /* If their number is silly, that's an error. */
 if (unlikely(head >= num)) {
 error_report("Guest says index %u > %u is available", head, num);
@@ -460,14 +463,14 @@ void *vring_pop(VirtIODevice *vdev, Vring *vring, size_t sz)
 barrier();
 
 if (desc.flags & VRING_DESC_F_INDIRECT) {
-ret = get_indirect(vdev, vring, elem, &desc);
+ret = get_indirect(vdev, vring, &cur_elem, &desc);
 if (ret < 0) {
 goto out;
 }
 continue;
 }
 
-ret = get_desc(vring, elem, &desc);
+ret = get_desc(vring, &cur_elem, &desc);
 if (ret < 0) {
 goto out;
 }
@@ -482,6 +485,18 @@ void *vring_pop(VirtIODevice *vdev, Vring *vring, size_t sz)
 virtio_tswap16(vdev, vring->last_avail_idx);
 }
 
+/* Now copy what we have collected and mapped */
+elem = virtqueue_alloc_element(sz, cur_elem.out_num, cur_elem.in_num);
+elem->index = head;
+for (i = 0; i < cur_elem.out_num; i++) {
+elem->out_addr[i] = cur_elem.addr[i];
+elem->out_sg[i] = cur_elem.iov[i];
+}
+for (i = 0; i < cur_elem.in_num; i++) {
+elem->in_addr[i] = cur_elem.addr[cur_elem.out_num + i];
+elem->in_sg[i] = cur_elem.iov[cur_elem.out_num + i];
+}
+
 return elem;
 
 out:
@@ -489,7 +504,11 @@ out:
 if (ret == -EFAULT) {
 vring->broken = true;
 }
-vring_unmap_element(elem);
+
+for (i = 0; i < cur_elem.out_nu

[Qemu-devel] [PATCH 09/10] virtio: read avail_idx from VQ only when necessary

2016-01-31 Thread Paolo Bonzini

From: Vincenzo Maffione 

The virtqueue_pop() implementation needs to check if the avail ring
contains some pending buffers. To perform this check, it is not
always necessary to fetch the avail_idx in the VQ memory, which is
expensive. This patch introduces a shadow variable tracking avail_idx
and modifies virtio_queue_empty() to access avail_idx in physical
memory only when necessary.

Signed-off-by: Vincenzo Maffione 
Message-Id: 

Reviewed-by: Cornelia Huck 
Signed-off-by: Paolo Bonzini 
---
v1->v2: update shadow avail_idx in virtio_queue_set_last_avail_idx
[Michael]

 hw/virtio/virtio.c | 26 ++
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 5116a2e..6842938 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -70,8 +70,13 @@ typedef struct VRing
 struct VirtQueue
 {
 VRing vring;
+
+/* Next head to pop */
 uint16_t last_avail_idx;
 
+/* Last avail_idx read from VQ. */
+uint16_t shadow_avail_idx;
+
 uint16_t used_idx;
 
 /* Last used index value we have signalled on */
@@ -132,7 +137,8 @@ static inline uint16_t vring_avail_idx(VirtQueue *vq)
 {
 hwaddr pa;
 pa = vq->vring.avail + offsetof(VRingAvail, idx);
-return virtio_lduw_phys(vq->vdev, pa);
+vq->shadow_avail_idx = virtio_lduw_phys(vq->vdev, pa);
+return vq->shadow_avail_idx;
 }
 
 static inline uint16_t vring_avail_ring(VirtQueue *vq, int i)
@@ -223,8 +229,14 @@ int virtio_queue_ready(VirtQueue *vq)
 return vq->vring.avail != 0;
 }
 
+/* Fetch avail_idx from VQ memory only when we really need to know if
+ * guest has added some buffers. */
 int virtio_queue_empty(VirtQueue *vq)
 {
+if (vq->shadow_avail_idx != vq->last_avail_idx) {
+return 0;
+}
+
 return vring_avail_idx(vq) == vq->last_avail_idx;
 }
 
@@ -300,7 +312,7 @@ static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
 /* Check it isn't doing very strange things with descriptor numbers. */
 if (num_heads > vq->vring.num) {
 error_report("Guest moved used index from %u to %u",
- idx, vring_avail_idx(vq));
+ idx, vq->shadow_avail_idx);
 exit(1);
 }
 /* On success, callers read a descriptor at vq->last_avail_idx.
@@ -535,9 +547,12 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
 struct iovec iov[VIRTQUEUE_MAX_SIZE];
 VRingDesc desc;
 
-if (!virtqueue_num_heads(vq, vq->last_avail_idx)) {
+if (virtio_queue_empty(vq)) {
 return NULL;
 }
+/* Needed after virtio_queue_empty(), see comment in
+ * virtqueue_num_heads(). */
+smp_rmb();
 
 /* When we start there are none of either input nor output. */
 out_num = in_num = 0;
@@ -786,6 +801,7 @@ void virtio_reset(void *opaque)
 vdev->vq[i].vring.avail = 0;
 vdev->vq[i].vring.used = 0;
 vdev->vq[i].last_avail_idx = 0;
+vdev->vq[i].shadow_avail_idx = 0;
 vdev->vq[i].used_idx = 0;
 virtio_queue_set_vector(vdev, i, VIRTIO_NO_VECTOR);
 vdev->vq[i].signalled_used = 0;
@@ -1155,7 +1171,7 @@ static bool vring_notify(VirtIODevice *vdev, VirtQueue *vq)
 smp_mb();
 /* Always notify when queue is empty (when feature acknowledge) */
 if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFY_ON_EMPTY) &&
-!vq->inuse && vring_avail_idx(vq) == vq->last_avail_idx) {
+!vq->inuse && virtio_queue_empty(vq)) {
 return true;
 }
 
@@ -1579,6 +1595,7 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
 return -1;
 }
 vdev->vq[i].used_idx = vring_used_idx(&vdev->vq[i]);
+vdev->vq[i].shadow_avail_idx = vring_avail_idx(&vdev->vq[i]);
 }
 }
 
@@ -1714,6 +1731,7 @@ uint16_t virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n)
 void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n, uint16_t idx)
 {
 vdev->vq[n].last_avail_idx = idx;
+vdev->vq[n].shadow_avail_idx = idx;
 }
 
 void virtio_queue_invalidate_signalled_used(VirtIODevice *vdev, int n)
-- 
2.5.0
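
For readers following along, the shadow-index check can be sketched outside
QEMU roughly as below. This is a minimal sketch under stated assumptions, not
the QEMU code: ToyQueue, toy_read_avail_idx, toy_queue_empty and toy_pop are
hypothetical names, and a plain struct field plus a counter stands in for the
guest-memory load that virtio_lduw_phys() performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names exist in QEMU. */
typedef struct ToyQueue {
    uint16_t guest_avail_idx;  /* stands in for avail->idx in guest memory */
    uint16_t last_avail_idx;   /* next head to pop */
    uint16_t shadow_avail_idx; /* last avail_idx value read from the guest */
    unsigned guest_reads;      /* number of simulated guest-memory loads */
} ToyQueue;

/* The "expensive" path: load the index and refresh the shadow copy. */
uint16_t toy_read_avail_idx(ToyQueue *q)
{
    q->guest_reads++;
    q->shadow_avail_idx = q->guest_avail_idx;
    return q->shadow_avail_idx;
}

/* Consult the shadow copy first; touch guest memory only if it says "empty". */
int toy_queue_empty(ToyQueue *q)
{
    if (q->shadow_avail_idx != q->last_avail_idx) {
        return 0;
    }
    return toy_read_avail_idx(q) == q->last_avail_idx;
}

/* Consume one entry; returns 1 on success and 0 if the queue is empty. */
int toy_pop(ToyQueue *q)
{
    assert(q != NULL);
    if (toy_queue_empty(q)) {
        return 0;
    }
    q->last_avail_idx++;
    return 1;
}
```

In this model, once the guest has queued several buffers, the first pop pays
for one load and the following pops are served from the shadow copy until it
catches up with last_avail_idx. The read barrier that the real
virtqueue_pop() needs after the check is deliberately left out of the sketch.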

[Qemu-devel] [PATCH 10/10] virtio: combine write of an entry into used ring

2016-01-31 Thread Paolo Bonzini

From: Vincenzo Maffione 

Fill in an element of the used ring with a single combined access to the
guest physical memory, rather than using two separate accesses.
This reduces the overhead due to expensive address translation.

Signed-off-by: Vincenzo Maffione 
Message-Id: 

Reviewed-by: Cornelia Huck 
Signed-off-by: Paolo Bonzini 
---
 hw/virtio/virtio.c | 25 -
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 6842938..241c7e3 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -153,18 +153,15 @@ static inline uint16_t vring_get_used_event(VirtQueue *vq)
 return vring_avail_ring(vq, vq->vring.num);
 }
 
-static inline void vring_used_ring_id(VirtQueue *vq, int i, uint32_t val)
+static inline void vring_used_write(VirtQueue *vq, VRingUsedElem *uelem,
+int i)
 {
 hwaddr pa;
-pa = vq->vring.used + offsetof(VRingUsed, ring[i].id);
-virtio_stl_phys(vq->vdev, pa, val);
-}
-
-static inline void vring_used_ring_len(VirtQueue *vq, int i, uint32_t val)
-{
-hwaddr pa;
-pa = vq->vring.used + offsetof(VRingUsed, ring[i].len);
-virtio_stl_phys(vq->vdev, pa, val);
+virtio_tswap32s(vq->vdev, &uelem->id);
+virtio_tswap32s(vq->vdev, &uelem->len);
+pa = vq->vring.used + offsetof(VRingUsed, ring[i]);
+address_space_write(&address_space_memory, pa, MEMTXATTRS_UNSPECIFIED,
+   (void *)uelem, sizeof(VRingUsedElem));
 }
 
 static uint16_t vring_used_idx(VirtQueue *vq)
@@ -273,15 +270,17 @@ void virtqueue_discard(VirtQueue *vq, const VirtQueueElement *elem,
 void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
 unsigned int len, unsigned int idx)
 {
+VRingUsedElem uelem;
+
 trace_virtqueue_fill(vq, elem, len, idx);
 
 virtqueue_unmap_sg(vq, elem, len);
 
 idx = (idx + vq->used_idx) % vq->vring.num;
 
-/* Get a pointer to the next entry in the used ring. */
-vring_used_ring_id(vq, idx, elem->index);
-vring_used_ring_len(vq, idx, len);
+uelem.id = elem->index;
+uelem.len = len;
+vring_used_write(vq, &uelem, idx);
 }
 
 void virtqueue_flush(VirtQueue *vq, unsigned int count)
-- 
2.5.0
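
As an illustration of the saving, the split and combined writes can be
compared in a small stand-alone model. This is only a sketch: ToyGuestMem,
ToyUsedElem and the toy_* functions are hypothetical names, a byte array
stands in for guest physical memory, each call to toy_mem_write() stands in
for one address translation, and the byte swapping done by virtio_tswap32s()
is omitted (the model assumes guest and host byte order match).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy types; not the QEMU definitions. */
typedef struct ToyUsedElem {
    uint32_t id;
    uint32_t len;
} ToyUsedElem;

typedef struct ToyGuestMem {
    uint8_t bytes[256];
    unsigned accesses;   /* each access models one address translation */
} ToyGuestMem;

/* One simulated access to guest memory. */
void toy_mem_write(ToyGuestMem *m, size_t pa, const void *buf, size_t size)
{
    assert(pa <= sizeof(m->bytes) && size <= sizeof(m->bytes) - pa);
    m->accesses++;
    memcpy(&m->bytes[pa], buf, size);
}

/* Old style: one access for the id field and another for the len field. */
void toy_used_write_split(ToyGuestMem *m, size_t ring_pa, unsigned i,
                          uint32_t id, uint32_t len)
{
    size_t pa = ring_pa + i * sizeof(ToyUsedElem);

    toy_mem_write(m, pa + offsetof(ToyUsedElem, id), &id, sizeof(id));
    toy_mem_write(m, pa + offsetof(ToyUsedElem, len), &len, sizeof(len));
}

/* New style: fill a local element, then write it with a single access. */
void toy_used_write_combined(ToyGuestMem *m, size_t ring_pa, unsigned i,
                             uint32_t id, uint32_t len)
{
    ToyUsedElem e = { .id = id, .len = len };

    toy_mem_write(m, ring_pa + i * sizeof(ToyUsedElem), &e, sizeof(e));
}
```

Both variants leave the same bytes in the ring slot; the combined one simply
does it with half the number of accesses in this model.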

[Qemu-devel] [PATCH 07/10] virtio: combine the read of a descriptor

2016-01-31 Thread Paolo Bonzini

Compared to vring, virtio has a performance penalty of 10%.  Fix it
by combining all the reads for a descriptor in a single address_space_read
call.  This also simplifies the code nicely.

Reviewed-by: Cornelia Huck 
Signed-off-by: Paolo Bonzini 
---
 hw/virtio/virtio.c | 86 ++
 1 file changed, 35 insertions(+), 51 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 79a635f..2433866 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -107,35 +107,15 @@ void virtio_queue_update_rings(VirtIODevice *vdev, int n)
   vring->align);
 }
 
-static inline uint64_t vring_desc_addr(VirtIODevice *vdev, hwaddr desc_pa,
-   int i)
+static void vring_desc_read(VirtIODevice *vdev, VRingDesc *desc,
+hwaddr desc_pa, int i)
 {
-hwaddr pa;
-pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, addr);
-return virtio_ldq_phys(vdev, pa);
-}
-
-static inline uint32_t vring_desc_len(VirtIODevice *vdev, hwaddr desc_pa, int i)
-{
-hwaddr pa;
-pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, len);
-return virtio_ldl_phys(vdev, pa);
-}
-
-static inline uint16_t vring_desc_flags(VirtIODevice *vdev, hwaddr desc_pa,
-int i)
-{
-hwaddr pa;
-pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, flags);
-return virtio_lduw_phys(vdev, pa);
-}
-
-static inline uint16_t vring_desc_next(VirtIODevice *vdev, hwaddr desc_pa,
-   int i)
-{
-hwaddr pa;
-pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, next);
-return virtio_lduw_phys(vdev, pa);
+address_space_read(&address_space_memory, desc_pa + i * sizeof(VRingDesc),
+   MEMTXATTRS_UNSPECIFIED, (void *)desc, sizeof(VRingDesc));
+virtio_tswap64s(vdev, &desc->addr);
+virtio_tswap32s(vdev, &desc->len);
+virtio_tswap16s(vdev, &desc->flags);
+virtio_tswap16s(vdev, &desc->next);
 }
 
 static inline uint16_t vring_avail_flags(VirtQueue *vq)
@@ -345,18 +325,18 @@ static unsigned int virtqueue_get_head(VirtQueue *vq, unsigned int idx)
 return head;
 }
 
-static unsigned virtqueue_next_desc(VirtIODevice *vdev, hwaddr desc_pa,
-unsigned int i, unsigned int max)
+static unsigned virtqueue_read_next_desc(VirtIODevice *vdev, VRingDesc *desc,
+ hwaddr desc_pa, unsigned int max)
 {
 unsigned int next;
 
 /* If this descriptor says it doesn't chain, we're done. */
-if (!(vring_desc_flags(vdev, desc_pa, i) & VRING_DESC_F_NEXT)) {
+if (!(desc->flags & VRING_DESC_F_NEXT)) {
 return max;
 }
 
 /* Check they're not leading us off end of descriptors. */
-next = vring_desc_next(vdev, desc_pa, i);
+next = desc->next;
 /* Make sure compiler knows to grab that: we don't want it changing! */
 smp_wmb();
 
@@ -365,6 +345,7 @@ static unsigned virtqueue_next_desc(VirtIODevice *vdev, hwaddr desc_pa,
 exit(1);
 }
 
+vring_desc_read(vdev, desc, desc_pa, next);
 return next;
 }
 
@@ -381,6 +362,7 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
 while (virtqueue_num_heads(vq, idx)) {
 VirtIODevice *vdev = vq->vdev;
 unsigned int max, num_bufs, indirect = 0;
+VRingDesc desc;
 hwaddr desc_pa;
 int i;
 
@@ -388,9 +370,10 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
 num_bufs = total_bufs;
 i = virtqueue_get_head(vq, idx++);
 desc_pa = vq->vring.desc;
+vring_desc_read(vdev, &desc, desc_pa, i);
 
-if (vring_desc_flags(vdev, desc_pa, i) & VRING_DESC_F_INDIRECT) {
-if (vring_desc_len(vdev, desc_pa, i) % sizeof(VRingDesc)) {
+if (desc.flags & VRING_DESC_F_INDIRECT) {
+if (desc.len % sizeof(VRingDesc)) {
 error_report("Invalid size for indirect buffer table");
 exit(1);
 }
@@ -403,9 +386,10 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
 
 /* loop over the indirect descriptor table */
 indirect = 1;
-max = vring_desc_len(vdev, desc_pa, i) / sizeof(VRingDesc);
-desc_pa = vring_desc_addr(vdev, desc_pa, i);
+max = desc.len / sizeof(VRingDesc);
+desc_pa = desc.addr;
 num_bufs = i = 0;
+vring_desc_read(vdev, &desc, desc_pa, i);
 }
 
 do {
@@ -415,15 +399,15 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
 exit(1);
 }
 
-if (vring_desc_flags(vdev, desc_pa, i) & VRING_DESC_F_WRITE) {
-in_total += vring_desc_len(vdev, desc_pa, i);
+if (desc.flags & VRING_DESC_F_WRITE) {
+in_total += des

[Qemu-devel] [PATCH 08/10] virtio: cache used_idx in a VirtQueue field

2016-01-31 Thread Paolo Bonzini

From: Vincenzo Maffione 

Accessing used_idx in the VQ requires an expensive access to
guest physical memory. Before this patch, 3 accesses are normally
done for each pop/push/notify call. However, since the used_idx is
only written by us, we can track it in our internal data structure.

Signed-off-by: Vincenzo Maffione 
Message-Id: 
<3d062ec54e9a7bf9fb325c1fd693564951f2b319.1450218353.git.v.maffi...@gmail.com>
Reviewed-by: Cornelia Huck 
Signed-off-by: Paolo Bonzini 
---
 hw/virtio/virtio.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 2433866..5116a2e 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -71,6 +71,9 @@ struct VirtQueue
 {
 VRing vring;
 uint16_t last_avail_idx;
+
+uint16_t used_idx;
+
 /* Last used index value we have signalled on */
 uint16_t signalled_used;
 
@@ -170,6 +173,7 @@ static inline void vring_used_idx_set(VirtQueue *vq, uint16_t val)
 hwaddr pa;
 pa = vq->vring.used + offsetof(VRingUsed, idx);
 virtio_stw_phys(vq->vdev, pa, val);
+vq->used_idx = val;
 }
 
 static inline void vring_used_flags_set_bit(VirtQueue *vq, int mask)
@@ -261,7 +265,7 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
 
 virtqueue_unmap_sg(vq, elem, len);
 
-idx = (idx + vring_used_idx(vq)) % vq->vring.num;
+idx = (idx + vq->used_idx) % vq->vring.num;
 
 /* Get a pointer to the next entry in the used ring. */
 vring_used_ring_id(vq, idx, elem->index);
@@ -274,7 +278,7 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
 /* Make sure buffer is written before we update index. */
 smp_wmb();
 trace_virtqueue_flush(vq, count);
-old = vring_used_idx(vq);
+old = vq->used_idx;
 new = old + count;
 vring_used_idx_set(vq, new);
 vq->inuse -= count;
@@ -782,6 +786,7 @@ void virtio_reset(void *opaque)
 vdev->vq[i].vring.avail = 0;
 vdev->vq[i].vring.used = 0;
 vdev->vq[i].last_avail_idx = 0;
+vdev->vq[i].used_idx = 0;
 virtio_queue_set_vector(vdev, i, VIRTIO_NO_VECTOR);
 vdev->vq[i].signalled_used = 0;
 vdev->vq[i].signalled_used_valid = false;
@@ -1161,7 +1166,7 @@ static bool vring_notify(VirtIODevice *vdev, VirtQueue *vq)
 v = vq->signalled_used_valid;
 vq->signalled_used_valid = true;
 old = vq->signalled_used;
-new = vq->signalled_used = vring_used_idx(vq);
+new = vq->signalled_used = vq->used_idx;
 return !v || vring_need_event(vring_get_used_event(vq), new, old);
 }
 
@@ -1573,6 +1578,7 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
  vdev->vq[i].last_avail_idx, nheads);
 return -1;
 }
+vdev->vq[i].used_idx = vring_used_idx(&vdev->vq[i]);
 }
 }
 
-- 
2.5.0
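
The reasoning above (the device is the only writer of used_idx, so a local
copy never goes stale) amounts to a write-through cache. A minimal sketch
under stated assumptions: ToyUsedRing and the toy_* functions are hypothetical
names, and a struct field stands in for the copy of the index that lives in
guest memory.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model; not the QEMU structures. */
typedef struct ToyUsedRing {
    uint16_t guest_used_idx; /* stands in for used->idx in guest memory */
    uint16_t used_idx;       /* device-side copy, updated on every write */
    unsigned num;            /* ring size */
    unsigned guest_writes;   /* number of simulated guest-memory stores */
} ToyUsedRing;

/* Write-through: store to "guest memory" and keep the local copy in sync. */
void toy_used_idx_set(ToyUsedRing *r, uint16_t val)
{
    r->guest_writes++;
    r->guest_used_idx = val;
    r->used_idx = val;
}

/* Slot for the idx-th pending element, computed without reading the guest. */
unsigned toy_fill_slot(const ToyUsedRing *r, unsigned idx)
{
    assert(r->num != 0);
    return (idx + r->used_idx) % r->num;
}

/* Publish count elements; the 16-bit index wraps around naturally. */
void toy_flush(ToyUsedRing *r, unsigned count)
{
    toy_used_idx_set(r, (uint16_t)(r->used_idx + count));
}
```

Note that the model has no function that reads guest_used_idx back. As the
patch does in virtio_load(), a real implementation still has to initialise
the local copy from guest memory once when state is restored.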

[Qemu-devel] [PATCH 02/10] virtio: move allocation to virtqueue_pop/vring_pop

2016-01-31 Thread Paolo Bonzini

The return code of virtqueue_pop/vring_pop is unused except to check for
errors or 0.  We can thus easily move allocation inside the functions
and just return a pointer to the VirtQueueElement.

The advantage is that we will be able to allocate only the space that
is needed for the actual size of the s/g list instead of the full
VIRTQUEUE_MAX_SIZE items.  Currently VirtQueueElement takes about 48K
of memory, and this kind of allocation puts a lot of stress on malloc.
By cutting the size by two or three orders of magnitude, malloc can
use much more efficient algorithms.

The patch is pretty large, but changes to each device are testable
more or less independently.  Splitting it would mostly add churn.

Signed-off-by: Paolo Bonzini 
---
v1->v2: add assertions on sz [Conny]

 hw/9pfs/9p.c|  2 +-
 hw/9pfs/virtio-9p-device.c  | 17 
 hw/9pfs/virtio-9p.h |  2 +-
 hw/block/dataplane/virtio-blk.c | 11 +++--
 hw/block/virtio-blk.c   | 15 +++
 hw/char/virtio-serial-bus.c | 80 +++--
 hw/display/virtio-gpu.c | 25 +++-
 hw/input/virtio-input.c | 24 +++
 hw/net/virtio-net.c | 69 
 hw/scsi/virtio-scsi-dataplane.c | 15 +++
 hw/scsi/virtio-scsi.c   | 18 -
 hw/virtio/dataplane/vring.c | 18 +
 hw/virtio/virtio-balloon.c  | 22 ++
 hw/virtio/virtio-rng.c  | 10 +++--
 hw/virtio/virtio.c  | 12 --
 include/hw/virtio/dataplane/vring.h |  2 +-
 include/hw/virtio/virtio-balloon.h  |  2 +-
 include/hw/virtio/virtio-blk.h  |  3 +-
 include/hw/virtio/virtio-net.h  |  2 +-
 include/hw/virtio/virtio-scsi.h |  2 +-
 include/hw/virtio/virtio-serial.h   |  2 +-
 include/hw/virtio/virtio.h  |  2 +-
 22 files changed, 212 insertions(+), 143 deletions(-)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 3ff3106..ad1ae96 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -1586,7 +1586,7 @@ static int v9fs_xattr_read(V9fsState *s, V9fsPDU *pdu, V9fsFidState *fidp,
 int read_count;
 int64_t xattr_len;
 V9fsVirtioState *v = container_of(s, V9fsVirtioState, state);
-VirtQueueElement *elem = &v->elems[pdu->idx];
+VirtQueueElement *elem = v->elems[pdu->idx];
 
 xattr_len = fidp->fs.xattr.len;
 read_count = xattr_len - off;
diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c
index 5643fd5..a0fe179 100644
--- a/hw/9pfs/virtio-9p-device.c
+++ b/hw/9pfs/virtio-9p-device.c
@@ -25,10 +25,12 @@ void virtio_9p_push_and_notify(V9fsPDU *pdu)
 {
 V9fsState *s = pdu->s;
 V9fsVirtioState *v = container_of(s, V9fsVirtioState, state);
-VirtQueueElement *elem = &v->elems[pdu->idx];
+VirtQueueElement *elem = v->elems[pdu->idx];
 
 /* push onto queue and notify */
 virtqueue_push(v->vq, elem, pdu->size);
+g_free(elem);
+v->elems[pdu->idx] = NULL;
 
 /* FIXME: we should batch these completions */
 virtio_notify(VIRTIO_DEVICE(v), v->vq);
@@ -47,10 +49,10 @@ static void handle_9p_output(VirtIODevice *vdev, VirtQueue *vq)
 uint8_t id;
 uint16_t tag_le;
 } QEMU_PACKED out;
-VirtQueueElement *elem = &v->elems[pdu->idx];
+VirtQueueElement *elem;
 
-len = virtqueue_pop(vq, elem);
-if (!len) {
+elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+if (!elem) {
 pdu_free(pdu);
 break;
 }
@@ -58,6 +60,7 @@ static void handle_9p_output(VirtIODevice *vdev, VirtQueue *vq)
 BUG_ON(elem->out_num == 0 || elem->in_num == 0);
 QEMU_BUILD_BUG_ON(sizeof out != 7);
 
+v->elems[pdu->idx] = elem;
 len = iov_to_buf(elem->out_sg, elem->out_num, 0,
  &out, sizeof out);
 BUG_ON(len != sizeof out);
@@ -140,7 +143,7 @@ ssize_t virtio_pdu_vmarshal(V9fsPDU *pdu, size_t offset,
 {
 V9fsState *s = pdu->s;
 V9fsVirtioState *v = container_of(s, V9fsVirtioState, state);
-VirtQueueElement *elem = &v->elems[pdu->idx];
+VirtQueueElement *elem = v->elems[pdu->idx];
 
 return v9fs_iov_vmarshal(elem->in_sg, elem->in_num, offset, 1, fmt, ap);
 }
@@ -150,7 +153,7 @@ ssize_t virtio_pdu_vunmarshal(V9fsPDU *pdu, size_t offset,
 {
 V9fsState *s = pdu->s;
 V9fsVirtioState *v = container_of(s, V9fsVirtioState, state);
-VirtQueueElement *elem = &v->elems[pdu->idx];
+VirtQueueElement *elem = v->elems[pdu->idx];
 
 return v9fs_iov_vunmarshal(elem->out_sg, elem->out_num, offset, 1, fmt, ap);
 }
@@ -160,7 +163,7 @@ void virtio_init_iov_from_pdu(V9fsPDU *pdu, struct iovec **piov,
 {
 V9fsState *s = pdu->s;
 V9fsVirtioState *v = container_of(s, V9fsVirtioState, state);
-VirtQueueElement *elem = &v->elems[pdu->idx];
+VirtQueueElement *elem = v->elems[pdu->idx];
 
 if (is_write) {
 *piov

Re: [Qemu-devel] [PATCH v3 0/3] Travis updates

2016-01-31 Thread Peter Maydell

On 31 January 2016 at 08:43, Alex Bennée  wrote:
> Maybe what's really needed is a build and test automation tree (and
> associated maintainer)? Peter any comments?

If you want to be the submaintainer for that, be my guest :-)

thanks
-- PMM

Re: [Qemu-devel] [PATCH v5] qom, qmp, hmp, qapi: create qom-type-prop-list for class properties

2016-01-31 Thread Valentin Rakush

Hi Eduardo,

I will try to answer some of your questions in this email and will answer
the others later.

> Can you clarify what you mean by "TYPE_DEVICE has its own
> properties"? TYPE_DEVICE properties are registered as normal QOM
> properties.

It is possible that I do not understand the object model correctly.

Commit 16bf7f522a2f adds GHashTable *properties to the ObjectClass
struct in include/qom/object.h.
The DeviceClass struct from include/hw/qdev-core.h inherits from
ObjectClass, but DeviceClass also has its own property list,
Property *props.

In device_list_properties we call

static DevicePropertyInfo *make_device_property_info

which tries to downcast the class to DEVICE_CLASS:

for (prop = DEVICE_CLASS(klass)->props; prop && prop->name; prop++) {

So we are using Property *props, defined in DeviceClass, but we do not
use GHashTable *properties, defined in ObjectClass. This is what I mean
when I say that DeviceClass has its own properties.

> I don't understand what you mean, here. GlobalProperties are not
> machine properties, they are just property=value pairs to be
> registered as global properties. They are unrelated to the
> properties TYPE_MACHINE actually has.

Same here. struct MachineClass, defined in include/hw/boards.h, has a
member GlobalProperty *compat_props.
After commit 16bf7f522a2f it would be better, IMHO, to use the ObjectClass
properties instead. I have not yet checked how compat_props is used in the
code.

> Could you clarify what you mean by "process different classes
> differently"?

In the device_list_properties function we would need several conditional
statements like

if (object_class_dynamic_cast(class, TYPE_MACHINE)) {
    /* process machine properties, using MachineClass GlobalProperty *compat_props */
} else if (object_class_dynamic_cast(class, TYPE_DEVICE)) {
    /* process device class properties, using DeviceClass Property *props */
} else if (object_class_dynamic_cast(class, TYPE_CPU)) {
    /* process CPU properties, using ObjectClass GHashTable *properties */
}

> 5) -cpu options:
>
> Ditto. the list will be incomplete unless all CPU subclasses are
> converted to use only class-properties, or the new command uses
> object_new().

This is a use case that I initially tried to implement.

Regards,
Valentin

On Fri, Jan 29, 2016 at 6:28 PM, Eduardo Habkost wrote:

> On Fri, Jan 29, 2016 at 01:03:38PM +0300, Valentin Rakush wrote:
> > Hi Eduardo, hi Daniel,
> >
> > I checked most of the classes that are used for x86_64 qemu simulation with
> > this command line:
> > x86_64-softmmu/qemu-system-x86_64 -qmp tcp:localhost:,server,nowait
> > -machine pc -cpu core2duo
> >
> > Here are some of the classes that cannot provide properties with
> > device_list_properties call:
> > /object/machine/generic-pc-machine/pc-0.13-machine
> > /object/bus/i2c-bus
> > /interface/user-creatable
> > /object/tls-creds/tls-creds-anon
> > /object/memory-backend/memory-backend-file
> > /object/qemu:memory-region
> > /object/rng-backend/rng-random
> > /object/tpm-backend/tpm-passthrough
> > /object/tls-creds/tls-creds-x509
> > /object/secret
> >
> > They cannot provide properties because these classes cannot be cast to
> > TYPE_DEVICE. This is done intentionally because TYPE_DEVICE has its own
> > properties.
>
> Can you clarify what you mean by "TYPE_DEVICE has its own
> properties"? TYPE_DEVICE properties are registered as normal QOM
> properties.
>
> We can still add a new command that's not specific for
> TYPE_DEVICE (if necessary). The point is that it shouldn't return
> arbitrarily different (and incomplete) data from the existing
> mechanism to list properties.
>
> In other words, I don't see why the output of "qom-type-prop-list
> " can't be as good as the output of "device-list-properties
> ". If we make it return only class-properties, it will be less
> complete and less useful.
>
>
> > Also TYPE_MACHINE has own properties of type GlobalProperty.
>
> I don't understand what you mean, here. GlobalProperties are not
> machine properties, they are just property=value pairs to be
> registered as global properties. They are unrelated to the
> properties TYPE_MACHINE actually has.
>
> > Here are two ways (AFAICS):
> > - we refactor TYPE_DEVICE and TYPE_MACHINE so they store their properties
> > in the ObjectClass properties.
>
> Too many classes need to be converted. We would still need
> something to use during the transition.
>
> > - we change device_list_properties so it process different classes
> > differently.
>
> Could you clarify what you mean by "process different classes
> differently"?
>
> A third option is to just use object_new(), like
> qmp_device_list_properties() already does.
>
> >
> > The disadvantage of the second approach, is that it is complicating code in
> > favor of simplifying qapi interface. I like first approach with
> > refactoring, although it is more complex. The first approach should put all
> > properties in the base classes an

Re: [Qemu-devel] [PATCH 0/2] checkpatch: Fixing two cases of false positives in checkpatch.pl

2016-01-31 Thread Leonid Bloch

ping

http://patchwork.ozlabs.org/patch/537763
http://patchwork.ozlabs.org/patch/537762

On Mon, Jan 11, 2016 at 2:12 PM, Markus Armbruster wrote:

> Copying Paolo.
>
> Leonid Bloch  writes:
>
> > This series addresses two cases where errors were printed if whitespaces
> > appeared in front of a square bracket in places where there should be no
> > problem with such placements (please see messages of individual commits).
> >
> > Leonid Bloch (2):
> >   checkpatch: Eliminate false positive in case of comma-space-square
> > bracket
> >   checkpatch: Eliminate false positive in case of space before square
> > bracket in a definition
> >
> >  scripts/checkpatch.pl | 6 +-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
>
>

Re: [Qemu-devel] [PATCH] qemu-ga: Fixed minor version switch issue

2016-01-31 Thread Leonid Bloch

ping

http://patchwork.ozlabs.org/patch/565712

On Mon, Jan 11, 2016 at 11:12 AM, Leonid Bloch  wrote:

> With automatically generated GUID, on minor version changes, an error
> occurred, stating that there is a problem with the installer.
> Now, a notification is shown, warning the user that another version of
> this product is already installed, and that configuration or removal of
> the existing version is possible through Add/Remove Programs on the
> Control Panel (expected behavior).
>
> Signed-off-by: Leonid Bloch 
> ---
>  qga/installer/qemu-ga.wxs | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/qga/installer/qemu-ga.wxs b/qga/installer/qemu-ga.wxs
> index 9473875..7f92891 100644
> --- a/qga/installer/qemu-ga.wxs
> +++ b/qga/installer/qemu-ga.wxs
> @@ -41,7 +41,7 @@
>
>  Name="QEMU guest agent"
> -Id="*"
> +Id="{DF9974AD-E41A-4304-81AD-69AA8F299766}"
>  UpgradeCode="{EB6B8302-C06E-4BEC-ADAC-932C68A3A98D}"
>  Manufacturer="$(env.QEMU_GA_MANUFACTURER)"
>  Version="$(env.QEMU_GA_VERSION)"
> --
> 2.4.3
>
>

[Qemu-devel] Strange monitor/stdout issue on qemu-system-sparc/qemu-system-ppc

2016-01-31 Thread Mark Cave-Ayland

Hi Daniel,

Commit d0d7708ba29cbcc343364a46bff981e0ff88366f "qemu-char: add logfile
facility to all chardev backends" appears to be causing problems with
the monitor and stdin/stdout on both qemu-system-sparc/qemu-system-ppc here.

On current git master I see the following behaviour changes when
launching qemu-system-ppc/qemu-system-sparc directly from the command
line with no parameters:

- Serial port output is now also written to stdout

- The monitor output also appears on stdout by default, rather than
being enabled with CTRL-A-c. However if I enter this sequence manually
on stdin then I am unable to enter or interact with the monitor.

- If I use the gtk interface to switch to the monitor terminal and
attempt to type, I get multiple copies of each letter echoed back in the
graphical terminal

Are there other changes that need to be made to these targets in order
for the functionality to work as before?


Many thanks,

Mark.

Re: [Qemu-devel] Strange monitor/stdout issue on qemu-system-sparc/qemu-system-ppc

2016-01-31 Thread Peter Maydell

On 31 January 2016 at 15:19, Mark Cave-Ayland
 wrote:
> Hi Daniel,
>
> Commit d0d7708ba29cbcc343364a46bff981e0ff88366f "qemu-char: add logfile
> facility to all chardev backends" appears to be causing problems with
> the monitor and stdin/stdout on both qemu-system-sparc/qemu-system-ppc here.

These should be fixed by https://patchwork.ozlabs.org/patch/571128/
I think (the duplicate output at least; the multiple-echoback I'm
not so sure about).

thanks
-- PMM

Re: [Qemu-devel] [PATCH RFC V5 0/9] Implement GIC-500 from GICv3 family for arm64

2016-01-31 Thread Shlomo Pongratz

On Friday, January 29, 2016, Christopher Covington wrote:

> On 10/20/2015 01:22 PM, Shlomo Pongratz wrote:
> > From: Shlomo Pongratz >
> >
> > This patch is a first step towards multi-core support for arm64.
> >
> > This implementation was tested with up to 100 cores.
> >
> > Things left to do:
> >
> > Support SPI, ITS and ITS CONTROL. Note that the purpose of this patch is to
> > enable running multiple cores using the "virt" virtual machine, and this
> > goal is achieved without them.
> >
> > Add GICv2 backwards compatibility. Since there is a GICv2 implementation I
> > can't see the purpose for it.
> >
> > Special thanks to Peter Crosthwaite, whose patch to the Linux kernel, i.e.
> > Implement cpu_relax as yield, solved the problem of the boot process
> > getting stuck for 24 cores and more.
> >
> > Figure out why virtual machine name changed from virt-v3 to
> > virt-v3-machine
>
> Hi Shlomo,
>
> Were you planning on another revision of this patchset? Are there any
> things you would like help with?
>
> Peter,
>
> Do you have any thoughts about what is essential and what isn't for a
> first wave of TCG GICv3 patches to be mergeable?
>
> Thanks,
> Christopher Covington
>
> --
> Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project
>

Hi,

I will do a new revision of the GICv3.
I needed to get a time slot from my employer in order to do the work and I
got one starting next week.

Best regards,

S.P.

Re: [Qemu-devel] Strange monitor/stdout issue on qemu-system-sparc/qemu-system-ppc

2016-01-31 Thread Mark Cave-Ayland

On 31/01/16 15:34, Peter Maydell wrote:

> On 31 January 2016 at 15:19, Mark Cave-Ayland
>  wrote:
>> Hi Daniel,
>>
>> Commit d0d7708ba29cbcc343364a46bff981e0ff88366f "qemu-char: add logfile
>> facility to all chardev backends" appears to be causing problems with
>> the monitor and stdin/stdout on both qemu-system-sparc/qemu-system-ppc here.
> 
> These should be fixed by https://patchwork.ozlabs.org/patch/571128/
> I think (the duplicate output at least; the multiple-echoback I'm
> not so sure about).

Aha! A quick test here shows that the patch fixes the serial port
appearing on stdout and entering the monitor, but I still see the
multiple echo problem in the GTK GUI.

I also notice that with the above commit I lose cycling through history
in the GTK monitor: even with the multiple echo, rather than the up/down
arrow keys cycling through the history, I see the codes ^[[B and
^[[A being output to the window.

ATB,

Mark.

Re: [Qemu-devel] [PATCH v2 0/2] Architectural watchpoint check

2016-01-31 Thread Sergey Fedorov

On 29.01.2016 22:17, Sergey Fedorov wrote:
> This series is intended to fix ARM watchpoint emulation misbehavior.
> QEMU hangs when QEMU watchpoint fires but it does not pass additional
> architectural checks in ARM CPU debug exception handler. For details,
> please see individual patches. The most relevant parts of the original
> discussion about ARM breakpoint and watchpoint emulation misbehavior can be
> found at:
> https://lists.gnu.org/archive/html/qemu-devel/2015-08/msg02715.html
> https://lists.gnu.org/archive/html/qemu-devel/2015-09/msg00527.html
>
> Changes in v2:
>  * Check moved before setting cpu->watchpoint_hit
>  * Pointer to watchpoint being checked passed to debug_check_watchpoint()
>callback
>  * Comment for debug_check_watchpoint() callback improved
>
> Sergey Fedorov (2):
>   cpu: Add callback to check architectural watchpoint match
>   target-arm: Implement checking of fired watchpoint
>
>  exec.c |  5 +
>  include/qom/cpu.h  |  3 +++
>  qom/cpu.c  |  9 +
>  target-arm/cpu.c   |  1 +
>  target-arm/internals.h |  3 +++
>  target-arm/op_helper.c | 35 +--
>  6 files changed, 42 insertions(+), 14 deletions(-)
>

Please ignore this series. Somehow I sent the old patches again. I'm
sending v3 with the correct patches.

Best regards,
Sergey

[Qemu-devel] [PATCH v3 2/2] target-arm: Implement checking of fired watchpoint

2016-01-31 Thread Sergey Fedorov

ARM stops before the access to a location covered by a watchpoint. Also, a
QEMU watchpoint firing is not necessarily an architectural watchpoint match.
Unfortunately, it is hardly possible to ignore a fired watchpoint in the
debug exception handler. So move the watchpoint check from the debug
exception handler to the dedicated watchpoint checking callback.

Signed-off-by: Sergey Fedorov 
Reviewed-by: Peter Maydell 
---
 target-arm/cpu.c   |  1 +
 target-arm/internals.h |  3 +++
 target-arm/op_helper.c | 35 +--
 3 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/target-arm/cpu.c b/target-arm/cpu.c
index 0e582c4..21ec18e 100644
--- a/target-arm/cpu.c
+++ b/target-arm/cpu.c
@@ -1474,6 +1474,7 @@ static void arm_cpu_class_init(ObjectClass *oc, void *data)
 cc->gdb_arch_name = arm_gdb_arch_name;
 cc->gdb_stop_before_watchpoint = true;
 cc->debug_excp_handler = arm_debug_excp_handler;
+cc->debug_check_watchpoint = arm_debug_check_watchpoint;
 
 cc->disas_set_info = arm_disas_set_info;
 
diff --git a/target-arm/internals.h b/target-arm/internals.h
index d226bbe..16d9487 100644
--- a/target-arm/internals.h
+++ b/target-arm/internals.h
@@ -409,6 +409,9 @@ void hw_breakpoint_update(ARMCPU *cpu, int n);
  */
 void hw_breakpoint_update_all(ARMCPU *cpu);
 
+/* Callback function for checking if a watchpoint should trigger. */
+bool arm_debug_check_watchpoint(CPUState *cs, CPUWatchpoint *wp);
+
 /* Callback function for when a watchpoint or breakpoint triggers. */
 void arm_debug_excp_handler(CPUState *cs);
 
diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
index a5ee65f..4adf9cc 100644
--- a/target-arm/op_helper.c
+++ b/target-arm/op_helper.c
@@ -975,6 +975,16 @@ void HELPER(check_breakpoints)(CPUARMState *env)
 }
 }
 
+bool arm_debug_check_watchpoint(CPUState *cs, CPUWatchpoint *wp)
+{
+/* Called by core code when a CPU watchpoint fires; need to check if this
+ * is also an architectural watchpoint match.
+ */
+ARMCPU *cpu = ARM_CPU(cs);
+
+return check_watchpoints(cpu);
+}
+
 void arm_debug_excp_handler(CPUState *cs)
 {
 /* Called by core code when a watchpoint or breakpoint fires;
@@ -986,23 +996,20 @@ void arm_debug_excp_handler(CPUState *cs)
 
 if (wp_hit) {
 if (wp_hit->flags & BP_CPU) {
+bool wnr = (wp_hit->flags & BP_WATCHPOINT_HIT_WRITE) != 0;
+bool same_el = arm_debug_target_el(env) == arm_current_el(env);
+
 cs->watchpoint_hit = NULL;
-if (check_watchpoints(cpu)) {
-bool wnr = (wp_hit->flags & BP_WATCHPOINT_HIT_WRITE) != 0;
-bool same_el = arm_debug_target_el(env) == arm_current_el(env);
-
-if (extended_addresses_enabled(env)) {
-env->exception.fsr = (1 << 9) | 0x22;
-} else {
-env->exception.fsr = 0x2;
-}
-env->exception.vaddress = wp_hit->hitaddr;
-raise_exception(env, EXCP_DATA_ABORT,
-syn_watchpoint(same_el, 0, wnr),
-arm_debug_target_el(env));
+
+if (extended_addresses_enabled(env)) {
+env->exception.fsr = (1 << 9) | 0x22;
 } else {
-cpu_resume_from_signal(cs, NULL);
+env->exception.fsr = 0x2;
 }
+env->exception.vaddress = wp_hit->hitaddr;
+raise_exception(env, EXCP_DATA_ABORT,
+syn_watchpoint(same_el, 0, wnr),
+arm_debug_target_el(env));
 }
 } else {
 uint64_t pc = is_a64(env) ? env->pc : env->regs[15];
-- 
1.9.1
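
The control flow introduced by this series (core code asks a per-architecture
hook whether a QEMU-level watchpoint hit is also an architectural match,
before it records the hit) can be sketched in isolation. This is a sketch
under stated assumptions only: ToyCPU, ToyWatchpoint, TOY_BP_CPU and the
toy_* functions are hypothetical names, the arch_enabled flag is a stand-in
for whatever conditions the architecture really checks, and the hook is
simply required to be non-NULL here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BP_CPU 0x1u  /* watchpoint owned by the emulated CPU, not a debugger */

typedef struct ToyCPU ToyCPU;

typedef struct ToyWatchpoint {
    uint64_t addr;
    uint64_t len;
    unsigned flags;
    bool arch_enabled;   /* toy stand-in for the architectural conditions */
} ToyWatchpoint;

struct ToyCPU {
    /* Hook: return true if the hit is also an architectural match. */
    bool (*debug_check_watchpoint)(ToyCPU *cpu, ToyWatchpoint *wp);
    ToyWatchpoint *watchpoint_hit;
};

/* Example per-architecture hook. */
bool toy_arch_check_watchpoint(ToyCPU *cpu, ToyWatchpoint *wp)
{
    (void)cpu;
    return wp->arch_enabled;
}

/* Core side: filter hits through the hook before recording them. */
void toy_check_watchpoint(ToyCPU *cpu, ToyWatchpoint *wps, size_t n,
                          uint64_t addr)
{
    assert(cpu->debug_check_watchpoint != NULL);
    for (size_t i = 0; i < n; i++) {
        ToyWatchpoint *wp = &wps[i];

        if (addr < wp->addr || addr - wp->addr >= wp->len) {
            continue;               /* address not covered */
        }
        if (cpu->watchpoint_hit) {
            continue;               /* a hit is already pending */
        }
        if ((wp->flags & TOY_BP_CPU) &&
            !cpu->debug_check_watchpoint(cpu, wp)) {
            continue;               /* QEMU-level hit, no architectural match */
        }
        cpu->watchpoint_hit = wp;
    }
}
```

Because the rejected watchpoint is skipped before watchpoint_hit is set, the
exception handler in this model never sees a hit it would have to undo, which
is the point of moving the check out of the handler.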

[Qemu-devel] [PATCH v3 1/2] cpu: Add callback to check architectural watchpoint match

2016-01-31 Thread Sergey Fedorov

When a QEMU watchpoint matches, that is not necessarily an architectural
watchpoint match yet. If it is a stop-before-access watchpoint, it is
hardly possible to ignore it after throwing a TCG exception.

A special callback is introduced to check for architectural watchpoint
match before raising a TCG exception.

Signed-off-by: Sergey Fedorov 
---
 exec.c| 6 ++
 include/qom/cpu.h | 4 
 qom/cpu.c | 9 +
 3 files changed, 19 insertions(+)

diff --git a/exec.c b/exec.c
index 9e076bc..14e7c76 100644
--- a/exec.c
+++ b/exec.c
@@ -2024,6 +2024,7 @@ static const MemoryRegionOps notdirty_mem_ops = {
 static void check_watchpoint(int offset, int len, MemTxAttrs attrs, int flags)
 {
 CPUState *cpu = current_cpu;
+CPUClass *cc = CPU_GET_CLASS(cpu);
 CPUArchState *env = cpu->env_ptr;
 target_ulong pc, cs_base;
 target_ulong vaddr;
@@ -2049,6 +2050,11 @@ static void check_watchpoint(int offset, int len, MemTxAttrs attrs, int flags)
 wp->hitaddr = vaddr;
 wp->hitattrs = attrs;
 if (!cpu->watchpoint_hit) {
+if (wp->flags & BP_CPU &&
+!cc->debug_check_watchpoint(cpu, wp)) {
+wp->flags &= ~BP_WATCHPOINT_HIT;
+continue;
+}
 cpu->watchpoint_hit = wp;
 tb_check_watchpoint(cpu);
 if (wp->flags & BP_STOP_BEFORE_ACCESS) {
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 035179c..984bc8d 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -64,6 +64,7 @@ typedef uint64_t vaddr;
 #define CPU_GET_CLASS(obj) OBJECT_GET_CLASS(CPUClass, (obj), TYPE_CPU)
 
 typedef struct CPUState CPUState;
+typedef struct CPUWatchpoint CPUWatchpoint;
 
 typedef void (*CPUUnassignedAccess)(CPUState *cpu, hwaddr addr,
 bool is_write, bool is_exec, int opaque,
@@ -106,6 +107,8 @@ struct TranslationBlock;
  *   a memory access with the specified memory transaction attributes.
  * @gdb_read_register: Callback for letting GDB read a register.
  * @gdb_write_register: Callback for letting GDB write a register.
+ * @debug_check_watchpoint: Callback: return true if the architectural
+ *   watchpoint whose address has matched should really fire.
  * @debug_excp_handler: Callback for handling debug exceptions.
  * @write_elf64_note: Callback for writing a CPU-specific ELF note to a
  * 64-bit VM coredump.
@@ -165,6 +168,7 @@ typedef struct CPUClass {
 int (*asidx_from_attrs)(CPUState *cpu, MemTxAttrs attrs);
 int (*gdb_read_register)(CPUState *cpu, uint8_t *buf, int reg);
 int (*gdb_write_register)(CPUState *cpu, uint8_t *buf, int reg);
+bool (*debug_check_watchpoint)(CPUState *cpu, CPUWatchpoint *wp);
 void (*debug_excp_handler)(CPUState *cpu);
 
 int (*write_elf64_note)(WriteCoreDumpFunction f, CPUState *cpu,
diff --git a/qom/cpu.c b/qom/cpu.c
index 8f537a4..5a6a47e 100644
--- a/qom/cpu.c
+++ b/qom/cpu.c
@@ -188,6 +188,14 @@ static int cpu_common_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg)
 return 0;
 }
 
+static bool cpu_common_debug_check_watchpoint(CPUState *cpu, CPUWatchpoint *wp)
+{
+/* If no extra check is required, QEMU watchpoint match can be considered
+ * as an architectural match.
+ */
+return true;
+}
+
 bool target_words_bigendian(void);
 static bool cpu_common_virtio_is_big_endian(CPUState *cpu)
 {
@@ -352,6 +360,7 @@ static void cpu_class_init(ObjectClass *klass, void *data)
 k->gdb_write_register = cpu_common_gdb_write_register;
 k->virtio_is_big_endian = cpu_common_virtio_is_big_endian;
 k->debug_excp_handler = cpu_common_noop;
+k->debug_check_watchpoint = cpu_common_debug_check_watchpoint;
 k->cpu_exec_enter = cpu_common_noop;
 k->cpu_exec_exit = cpu_common_noop;
 k->cpu_exec_interrupt = cpu_common_exec_interrupt;
-- 
1.9.1

[Qemu-devel] [PATCH v3 0/2] Architectural watchpoint check

2016-01-31 Thread Sergey Fedorov

This series is intended to fix ARM watchpoint emulation misbehavior.
QEMU hangs when a QEMU watchpoint fires but does not pass the additional
architectural checks in the ARM CPU debug exception handler. For details,
please see individual patches. The most relevant parts of the original
discussion about ARM breakpoint and watchpoint emulation misbehavior can be
found at:
https://lists.gnu.org/archive/html/qemu-devel/2015-08/msg02715.html
https://lists.gnu.org/archive/html/qemu-devel/2015-09/msg00527.html

Changes in v2:
 * Check moved before setting cpu->watchpoint_hit
 * Pointer to watchpoint being checked passed to debug_check_watchpoint()
   callback
 * BP_WATCHPOINT_HIT flag cleared from wp->flags in no-fire case
 * Comment for debug_check_watchpoint() callback improved
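
The flow described by the list above can be illustrated with a minimal,
standalone C sketch. It is not QEMU code: every name prefixed with Sketch,
sketch_ or SK_ is hypothetical, the flag values are made up, and the sketch
assumes the address match has already set the hit flag. It only shows the
idea of asking a per-CPU-class hook before recording a watchpoint as hit,
and clearing the hit flag when the hook says "do not fire".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values, chosen only for this sketch. */
#define SK_BP_CPU            0x1
#define SK_BP_WATCHPOINT_HIT 0x2

typedef struct SketchWatchpoint {
    int flags;
} SketchWatchpoint;

/* Stand-in for the per-class hook: return true if the match should fire. */
typedef bool (*SketchCheckFn)(SketchWatchpoint *wp);

/* Default hook: with no extra architectural rules, every match fires. */
static bool sketch_check_default(SketchWatchpoint *wp)
{
    (void)wp;
    return true;
}

/* Example of a stricter hook that vetoes every match. */
static bool sketch_check_never(SketchWatchpoint *wp)
{
    (void)wp;
    return false;
}

/*
 * Walk the watchpoints whose address already matched. A CPU watchpoint
 * vetoed by the hook gets its hit flag cleared and is skipped; the first
 * remaining match is returned as the recorded hit (NULL if none fires).
 */
static SketchWatchpoint *sketch_find_hit(SketchWatchpoint *wps, size_t n,
                                         SketchCheckFn check)
{
    for (size_t i = 0; i < n; i++) {
        SketchWatchpoint *wp = &wps[i];

        if (!(wp->flags & SK_BP_WATCHPOINT_HIT)) {
            continue;
        }
        if ((wp->flags & SK_BP_CPU) && !check(wp)) {
            wp->flags &= ~SK_BP_WATCHPOINT_HIT;
            continue;
        }
        return wp;
    }
    return NULL;
}
```

Doing the check before the hit is recorded matters because, once an
exception has been raised for a stop-before-access watchpoint, there is no
easy way to pretend it never fired.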


Sergey Fedorov (2):
  cpu: Add callback to check architectural watchpoint match
  target-arm: Implement checking of fired watchpoint

 exec.c |  6 ++
 include/qom/cpu.h  |  4 
 qom/cpu.c  |  9 +
 target-arm/cpu.c   |  1 +
 target-arm/internals.h |  3 +++
 target-arm/op_helper.c | 35 +--
 6 files changed, 44 insertions(+), 14 deletions(-)

-- 
1.9.1

Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device

2016-01-31 Thread Michael S. Tsirkin

On Fri, Jan 29, 2016 at 12:13:59PM +0100, Igor Mammedov wrote:
> On Thu, 28 Jan 2016 14:59:25 +0200
> "Michael S. Tsirkin"  wrote:
> 
> > On Thu, Jan 28, 2016 at 01:03:16PM +0100, Igor Mammedov wrote:
> > > On Thu, 28 Jan 2016 13:13:04 +0200
> > > "Michael S. Tsirkin"  wrote:
> > >   
> > > > On Thu, Jan 28, 2016 at 11:54:25AM +0100, Igor Mammedov wrote:  
> > > > > Based on Microsoft's specifications (paper can be
> > > > > downloaded from http://go.microsoft.com/fwlink/?LinkId=260709,
> > > > > easily found by "Virtual Machine Generation ID" keywords),
> > > > > add a PCI device with corresponding description in
> > > > > SSDT ACPI table.
> > > > > 
> > > > > The GUID is set using "vmgenid.guid" property or
> > > > > a corresponding HMP/QMP command.
> > > > > 
> > > > > Example of using vmgenid device:
> > > > >  -device vmgenid,id=FOO,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
> > > > > 
> > > > > 'vmgenid' device initialization flow is as following:
> > > > >  1. vmgenid has RAM BAR registered with size of GUID buffer
> > > > >  2. BIOS initializes PCI devices and it maps BAR in PCI hole
> > > > >  3. BIOS reads ACPI tables from QEMU, at that moment tables
> > > > > are generated with \_SB.VMGI.ADDR constant pointing to
> > > > > GPA where BIOS's mapped vmgenid's BAR earlier
> > > > > 
> > > > > Note:
> > > > > This implementation uses PCI class 0x0500 code for vmgenid device,
> > > > > that is marked as NO_DRV in Windows's machine.inf.
> > > > > Testing various Windows versions showed that, OS
> > > > > doesn't touch nor checks for resource conflicts
> > > > > for such PCI devices.
> > > > > There was concern that during PCI rebalancing, OS
> > > > > could reprogram the BAR at other place, which would
> > > > > leave VGEN.ADDR pointing to the old (no more valid)
> > > > > address.
> > > > > However testing showed that Windows does rebalancing
> > > > > only for PCI device that have a driver attached
> > > > > and completely ignores NO_DRV class of devices.
> > > > > Which in turn creates a problem where OS could remap
> > > > > one of PCI devices(with driver) over BAR used by
> > > > > a driver-less PCI device.
> > > > > Statically declaring used memory range as VGEN._CRS
> > > > > makes OS to honor resource reservation and an ignored
> > > > > BAR range is not longer touched during PCI rebalancing.
> > > > > 
> > > > > Signed-off-by: Gal Hammer 
> > > > > Signed-off-by: Igor Mammedov 
> > > > 
> > > > It's an interesting hack, but this needs some thought. BIOS has no idea
> > > > this BAR is special and can not be rebalanced, so it might put the BAR
> > > > in the middle of the range, in effect fragmenting it.  
> > > yep that's the only drawback in PCI approach.
> > >   
> > > > Really I think something like V12 just rewritten using the new APIs
> > > > (probably with something like build_append_named_dword that I suggested)
> > > > would be much a simpler way to implement this device, given
> > > > the weird API limitations.  
> > > We went over stating drawbacks of both approaches several times 
> > > and that's where I strongly disagree with using v12 AML patching
> > > approach for reasons stated in those discussions.  
> > 
> > Yes, IIRC you dislike the need to allocate an IO range to pass address
> > to host, and to have custom code to migrate the address.
> allocating IO ports is fine by me but I'm against using bios_linker (ACPI)
> approach for task at hand,
> let me enumerate one more time the issues that make me dislike it so much
> (in order where most disliked ones go the first):
> 
> 1. over-engineered for the task at hand, 
>for device to become initialized guest OS has to execute AML,
>so init chain looks like:
>  QEMU -> BIOS (patch AML) -> OS (AML write buf address to IO port) ->
>  QEMU (update buf address)
>it's hell to debug when something doesn't work right in this chain

Well this is not very different from e.g. virtio.
If it's just AML that worries you, we could teach BIOS/EFI a new command
to give some addresses after linking back to QEMU. Would this address
this issue?


>even if there isn't any memory corruption that incorrect AML patching
>could introduce.
>As result of complexity patches are hard to review since one has
>to remember/relearn all details how bios_linker in QEMU and BIOS works,
>hence chance of regression is very high.
>Dynamically patched AML also introduces its own share of AML
>code that has to deal with dynamic buff address value.
>For an example:
>  "nvdimm acpi: add _CRS" https://patchwork.ozlabs.org/patch/566697/
>27 liner patch could be just 5-6 lines if static (known in advance)
>buffer address were used to declare static _CRS variable.

Problem is with finding a fixed address, and fragmentation that this
causes.  Look at the mess we have with just allocating addresses for
RAM.  I think it's a mistake to add to this mess.  Either let's teach
management to specify an add

Re: [Qemu-devel] Strange monitor/stdout issue on qemu-system-sparc/qemu-system-ppc

2016-01-31 Thread Paolo Bonzini



On 31/01/2016 16:54, Mark Cave-Ayland wrote:
> Aha! A quick test here shows that the patch fixes the serial port
> appearing on stdout and entering the monitor, but I still see the
> multiple echo problem in the GTK GUI.
> 
> I also notice that with the above commit I lose cycling through history
> in the GTK monitor - even with the multiple echo, instead of the up/down
> arrow keys cycling through the history instead I see the codes ^[[B and
> ^[[A being output to the window instead.

That is probably me.  The echo feature was introduced for QMP, but in
theory it should have been limited to that.  I'll check it, thanks.

Paolo

Re: [Qemu-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks

2016-01-31 Thread Alex Williamson

On Sat, 2016-01-30 at 01:18 +, Kay, Allen M wrote:
> 
> > -Original Message-
> > From: iGVT-g [mailto:igvt-g-boun...@lists.01.org] On Behalf Of Alex
> > Williamson
> > Sent: Friday, January 29, 2016 10:00 AM
> > To: Gerd Hoffmann
> > Cc: igv...@ml01.01.org; xen-de...@lists.xensource.com; Eduardo Habkost;
> > Stefano Stabellini; qemu-devel@nongnu.org; Cao jin; vfio-
> > us...@redhat.com
> > Subject: Re: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset
> > tweaks
> > 
> > Do guest drivers depend on IGD appearing at 00:02.0?  I'm currently testing
> > for any Intel VGA device, but I wonder if I should only be enabling anything
> > opregion if it also appears at a specific address.
> > 
> 
> No.  Both Windows and Linux IGD driver should work at any PCI slot.  We have 
> seen 0:5.0 in the guest and the driver works.

Thanks Allen.  Another question, when I boot a VM with an assigned HD
P4000 GPU, my console streams with IOMMU faults, like:

DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa3 
DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa3 
DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa3 
DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa3 
DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa3 

All of these fall within the host RMRR range for the device:

DMAR: Setting RMRR:
DMAR: Setting identity map for device :00:02.0 [0x9f80 - 0xaf9f]

A while back, we excluded devices using RMRRs from participating in
IOMMU API domains because they may continue to DMA to these reserved
regions after assignment, possibly corrupting VM memory
(c875d2c1b808).  Intel later decided this exclusion shouldn't apply to
graphics devices (18436afdc11a).  Don't the above IOMMU faults reveal
that exactly the problem we're trying to prevent by general exclusion of
RMRR encumbered devices from the IOMMU API is actually occurring?  If I
were to have VM memory within the RMRR address range, I wouldn't be
seeing these faults, I'd be having the GPU corrupt my VM memory.

David notes in the latter commit above:

"We should be able to successfully assign graphics devices to guests
too, as long as the initial handling of stolen memory is reconfigured
appropriately."

What code is supposed to be doing that reconfiguration when a device is
assigned?  Clearly we don't have it yet, making assignment of these
devices very unsafe.  It seems like vfio or IOMMU code in the kernel
needs device specific code to clear these settings to make it safe for
userspace, then perhaps VM BIOS support to reallocate.  Is there any
consistency across IGD revisions for doing this?  Is there a spec?
Thanks,

Alex

Re: [Qemu-devel] Strange monitor/stdout issue on qemu-system-sparc/qemu-system-ppc

2016-01-31 Thread Peter Maydell

On 31 January 2016 at 17:19, Paolo Bonzini  wrote:
> On 31/01/2016 16:54, Mark Cave-Ayland wrote:
>> I also notice that with the above commit I lose cycling through history
>> in the GTK monitor - even with the multiple echo, instead of the up/down
>> arrow keys cycling through the history instead I see the codes ^[[B and
>> ^[[A being output to the window instead.
>
> That is probably me.  The echo feature was introduced for QMP, but in
> theory it should have been limited to that.  I'll check it, thanks.

I've also seen echo, but only intermittently...

thanks
-- PMM

[Qemu-devel] [RFC Patch v2 01/10] virtio-net rsc: Data structure, 'Segment', 'Chain' and 'Status'

2016-01-31 Thread wexu

From: Wei Xu 

A Segment holds the coalesced packets of a connection.

Status indicates the outcome of a coalescing attempt, such as whether a
packet is bypassed or coalesced.

Chains are used to save the segments of different protocols in a VirtIONet
instance.

A timer is used in each chain to help purge the buffered/coalesced packets.

Signed-off-by: Wei Xu 
---
 include/hw/virtio/virtio.h | 32 
 1 file changed, 32 insertions(+)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 205fadf..1383220 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -127,6 +127,38 @@ typedef struct VirtioDeviceClass {
 int (*load)(VirtIODevice *vdev, QEMUFile *f, int version_id);
 } VirtioDeviceClass;
 
+/* Coalesced packets type & status */
+typedef enum {
+RSC_COALESCE,   /* Data been coalesced */
+RSC_FINAL,  /* Will terminate current connection */
+RSC_NO_MATCH,   /* No matched in the buffer pool */
+RSC_BYPASS, /* Packet to be bypass, not tcp, tcp ctrl, etc */
+RSC_WANT/* Data want to be coalesced */
+} COALESCE_STATUS;
+
+/* Coalesced segment */
+typedef struct NetRscSeg {
+QTAILQ_ENTRY(NetRscSeg) next;
+void *buf;
+size_t size;
+uint32_t dup_ack_count;
+bool is_coalesced;  /* need recal ipv4 header checksum, mark here */
+NetClientState *nc;
+} NetRscSeg;
+
+/* Receive callback for ipv4/6 */
+typedef size_t (VirtioNetReceive) (void *,
+   NetClientState *, const uint8_t *, size_t);
+
+/* Chain is divided by protocol(ipv4/v6) and NetClientInfo */
+typedef struct NetRscChain {
+QTAILQ_ENTRY(NetRscChain) next;
+uint16_t proto;
+VirtioNetReceive *do_receive;
+QEMUTimer *drain_timer;
+QTAILQ_HEAD(, NetRscSeg) buffers;
+} NetRscChain;
+
 void virtio_instance_init_common(Object *proxy_obj, void *data,
  size_t vdev_size, const char *vdev_name);
 
-- 
2.4.0

[Qemu-devel] [RFC Patch v2 04/10] virtio-net rsc: Detailed IPv4 and General TCP data coalescing

2016-01-31 Thread wexu

From: Wei Xu 

Since this feature also needs to support IPv6, and the IPv4 and IPv6
headers have some protocol-specific differences, the interface is kept
general.

The IPv4/6 code should set up both the new and old IP/TCP headers before
invoking TCP coalescing, and should also pass the real payload.

The main TCP handler covers TCP window updates, the duplicated ACK check,
and the real data coalescing if the new segment passes the invalid-segment
filter and is identified as an expected one.

An expected segment means:
1. The segment is within the current window and its sequence number is the
   expected one.
2. The ACK of the segment is in the valid window.
3. If the ACK in the segment is a duplicated one, then the duplicate count
   must be less than 2; this is to notify the upper-layer TCP to start
   retransmission as the spec requires.
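
The ACK rules above can be sketched as a small standalone C function. This
is an illustration under stated assumptions, not the patch's code: the
names sketch_classify_ack, SketchStatus and SK_TCP_WINDOW are hypothetical,
all values are taken in host byte order, and only the "coalesce" versus
"final" outcomes are modelled.

```c
#include <stdint.h>

/* Assumed window bound for the sketch, mirroring the 65535 in the patch. */
#define SK_TCP_WINDOW 65535u

typedef enum {
    SK_COALESCE,    /* merge into the cached segment */
    SK_FINAL        /* flush the cached segment and send this one as-is */
} SketchStatus;

/*
 * Classify the ACK of a new segment against the cached one.
 * old_ack/old_win come from the cached segment, new_ack/new_win from the
 * incoming segment; dup_ack_count is the per-segment duplicate counter.
 */
static SketchStatus sketch_classify_ack(uint32_t old_ack, uint16_t old_win,
                                        uint32_t new_ack, uint16_t new_win,
                                        uint32_t *dup_ack_count)
{
    /*
     * Unsigned subtraction wraps modulo 2^32, so an ACK that moved
     * backwards shows up as a huge difference and is rejected here too.
     */
    if ((uint32_t)(new_ack - old_ack) >= SK_TCP_WINDOW) {
        return SK_FINAL;
    }

    if (new_ack == old_ack) {
        if (new_win == old_win) {
            /* Duplicated ACK: tolerate the first one only. */
            if (*dup_ack_count == 0) {
                (*dup_ack_count)++;
                return SK_COALESCE;
            }
            return SK_FINAL;
        }
        /* Same ACK, different window: a window update. */
        return SK_COALESCE;
    }

    /* The ACK advanced within the window. */
    return SK_COALESCE;
}
```

Relying on modulo-2^32 arithmetic keeps the comparison correct when the
sequence space wraps, for example when the cached ACK is 0xFFFFFF00 and the
new one is 0x10.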

Signed-off-by: Wei Xu 
---
 hw/net/virtio-net.c | 127 ++--
 1 file changed, 124 insertions(+), 3 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index cfbac6d..4f77fbe 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -41,6 +41,10 @@
 
 #define VIRTIO_HEADER   12/* Virtio net header size */
 #define IP_OFFSET (VIRTIO_HEADER + sizeof(struct eth_header))
+#define TCP_WINDOW  65535
+
+/* IPv4 max payload, 16 bits in the header */
+#define MAX_IP4_PAYLOAD  (65535 - sizeof(struct ip_header))
 
 #define MAX_VIRTIO_IP_PAYLOAD  (65535 + IP_OFFSET)
 
@@ -1670,13 +1674,130 @@ out:
 return 0;
 }
 
+static int32_t virtio_net_rsc_handle_ack(NetRscChain *chain, NetRscSeg *seg,
+ const uint8_t *buf, struct tcp_header *n_tcp,
+ struct tcp_header *o_tcp)
+{
+uint32_t nack, oack;
+uint16_t nwin, owin;
+
+nack = htonl(n_tcp->th_ack);
+nwin = htons(n_tcp->th_win);
+oack = htonl(o_tcp->th_ack);
+owin = htons(o_tcp->th_win);
+
+if ((nack - oack) >= TCP_WINDOW) {
+return RSC_FINAL;
+} else if (nack == oack) {
+/* duplicated ack or window probe */
+if (nwin == owin) {
+/* duplicated ack, add dup ack count due to whql test up to 1 */
+
+if (seg->dup_ack_count == 0) {
+seg->dup_ack_count++;
+return RSC_COALESCE;
+} else {
+/* Spec says should send it directly */
+return RSC_FINAL;
+}
+} else {
+/* Coalesce window update */
+o_tcp->th_win = n_tcp->th_win;
+return RSC_COALESCE;
+}
+} else {
+/* pure ack, update ack */
+o_tcp->th_ack = n_tcp->th_ack;
+return RSC_COALESCE;
+}
+}
+
+static int32_t virtio_net_rsc_coalesce_tcp(NetRscChain *chain, NetRscSeg *seg,
+   const uint8_t *buf, struct tcp_header *n_tcp, uint16_t n_tcp_len,
+   uint16_t n_data, struct tcp_header *o_tcp, uint16_t o_tcp_len,
+   uint16_t o_data, uint16_t *p_ip_len, uint16_t max_data)
+{
+void *data;
+uint16_t o_ip_len;
+uint32_t nseq, oseq;
+
+o_ip_len = htons(*p_ip_len);
+nseq = htonl(n_tcp->th_seq);
+oseq = htonl(o_tcp->th_seq);
+
+/* Ignore packet with more/larger tcp options */
+if (n_tcp_len > o_tcp_len) {
+return RSC_FINAL;
+}
+
+/* out of order or retransmitted. */
+if ((nseq - oseq) > TCP_WINDOW) {
+return RSC_FINAL;
+}
+
+data = ((uint8_t *)n_tcp) + n_tcp_len;
+if (nseq == oseq) {
+if ((0 == o_data) && n_data) {
+/* From no payload to payload, normal case, not a dup ack or etc */
+goto coalesce;
+} else {
+return virtio_net_rsc_handle_ack(chain, seg, buf, n_tcp, o_tcp);
+}
+} else if ((nseq - oseq) != o_data) {
+/* Not a consistent packet, out of order */
+return RSC_FINAL;
+} else {
+coalesce:
+if ((o_ip_len + n_data) > max_data) {
+return RSC_FINAL;
+}
+
+/* Here comes the right data, the payload length in v4/v6 is different,
+   so use the field value to update */
+*p_ip_len = htons(o_ip_len + n_data); /* Update new data len */
+o_tcp->th_offset_flags = n_tcp->th_offset_flags; /* Bring 'PUSH' big */
+o_tcp->th_ack = n_tcp->th_ack;
+o_tcp->th_win = n_tcp->th_win;
+
+memmove(seg->buf + seg->size, data, n_data);
+seg->size += n_data;
+return RSC_COALESCE;
+}
+}
 
 static int32_t virtio_net_rsc_try_coalesce4(NetRscChain *chain,
NetRscSeg *seg, const uint8_t *buf, size_t size)
 {
-/* This real part of this function will be introduced in next patch, just
-*  return a 'final' to feed the compilation. */
-return RSC_FINAL;
+uint16_t o_ip_len, n_ip_len;/* len in ip header field */
+uint16_t n_ip_hdrlen, o_ip_hdrlen;  /* ipv4 header len */
+uint16_t n_tcp_len, o_tcp_len;  /* tcp header len */
+uint16_t o_data, n_data;  /*

[Qemu-devel] [RFC Patch v2 02/10] virtio-net rsc: Initilize & Cleanup

2016-01-31 Thread wexu

From: Wei Xu 

The chain list is initialized when the device is realized, and chain
entries are inserted dynamically according to the protocol type of the
network traffic.

All the buffered packets and chains are destroyed when the device is
unrealized.

Signed-off-by: Wei Xu 
---
 hw/net/virtio-net.c| 22 ++
 include/hw/virtio/virtio-net.h |  1 +
 2 files changed, 23 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index a877614..4e9458e 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1603,6 +1603,26 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
 return 0;
 }
 
+
+static void virtio_net_rsc_cleanup(VirtIONet *n)
+{
+NetRscChain *chain, *rn_chain;
+NetRscSeg *seg, *rn_seg;
+
+QTAILQ_FOREACH_SAFE(chain, &n->rsc_chains, next, rn_chain) {
+QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, rn_seg) {
+QTAILQ_REMOVE(&chain->buffers, seg, next);
+g_free(seg->buf);
+g_free(seg);
+}
+
+timer_del(chain->drain_timer);
+timer_free(chain->drain_timer);
+QTAILQ_REMOVE(&n->rsc_chains, chain, next);
+g_free(chain);
+}
+}
+
 static NetClientInfo net_virtio_info = {
 .type = NET_CLIENT_OPTIONS_KIND_NIC,
 .size = sizeof(NICState),
@@ -1732,6 +1752,7 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
 nc = qemu_get_queue(n->nic);
 nc->rxfilter_notify_enabled = 1;
 
+QTAILQ_INIT(&n->rsc_chains);
 n->qdev = dev;
 register_savevm(dev, "virtio-net", -1, VIRTIO_NET_VM_VERSION,
 virtio_net_save, virtio_net_load, n);
@@ -1766,6 +1787,7 @@ static void virtio_net_device_unrealize(DeviceState *dev, Error **errp)
 g_free(n->vqs);
 qemu_del_nic(n->nic);
 virtio_cleanup(vdev);
+virtio_net_rsc_cleanup(n);
 }
 
 static void virtio_net_instance_init(Object *obj)
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index f3cc25f..6ce8b93 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -59,6 +59,7 @@ typedef struct VirtIONet {
 VirtIONetQueue *vqs;
 VirtQueue *ctrl_vq;
 NICState *nic;
+QTAILQ_HEAD(, NetRscChain) rsc_chains;
 uint32_t tx_timeout;
 int32_t tx_burst;
 uint32_t has_vnet_hdr;
-- 
2.4.0

[Qemu-devel] [RFC v2 0/10] Support Receive-Segment-Offload(RSC) for WHQL test of Window guest

2016-01-31 Thread wexu

From: Wei Xu 

Patch v2 adds detailed commit logs.

This series is to support the WHQL test for Windows guests; the feature also
benefits other guests, working as a kernel 'gro'-like feature with a userspace
implementation.
Feature information:
  http://msdn.microsoft.com/en-us/library/windows/hardware/jj853324

Both IPv4 and IPv6 are supported. Though performance with userspace virtio
is slower than vhost-net, there is about a 30-40 percent performance
improvement to userspace virtio; this is achieved by turning this feature on
and disabling 'tso' on the corresponding tap interface.

Test steps:
Although this feature is mainly used for Windows guests, I used Linux guests
to help test the feature. To keep things simple, I used 3 steps to test the
patches as I moved on.
1. A TCP socket client/server pair running on 2 Linux guests, so that I can
   control the traffic and debug the code as I want.
2. Netperf on Linux guests to test the throughput.
3. WHQL test with 2 Windows guests.

Current status:
IPv4 passes all the above tests.
IPv6 has only passed test steps 1 and 2 described above. The virtio NIC cannot
receive any packet in the WHQL test, although debugging on the host side shows
that all the packets have been pushed to the vring. After replacing the guest
with a Linux guest and adding 10 extra packets before sending out the real
packet, tcpdump running on the guest only captures 6 packets. I have not found
the root cause yet and will continue working on this.

Note:
A 'MessageDevice' NIC chosen as 'Realtek' will sometimes panic the system
during setup; this can be worked around by replacing it with an 'e1000' NIC.

Pending issues & Todo list:
1. Dup ack count is not added in the virtio_net_hdr, but the WHQL test case
   passes; this looks like a bug in the test case.
2. Missing a feature bit.
3. Missing some TCP/IP handling:
   ECN change.
   TCP window scale.

Wei Xu (10):
  virtio-net rsc: Data structure, 'Segment', 'Chain' and 'Status'
  virtio-net rsc: Initilize & Cleanup
  virtio-net rsc: Chain Lookup, Packet Caching and Framework of IPv4
  virtio-net rsc: Detailed IPv4 and General TCP data coalescing
  virtio-net rsc: Create timer to drain the packets from the cache pool
  virtio-net rsc: IPv4 checksum
  virtio-net rsc: Checking TCP flag and drain specific connection
packets
  virtio-net rsc: Sanity check & More bypass cases check
  virtio-net rsc: Add IPv6 support
  virtio-net rsc: Add Receive Segment Coalesce statistics

 hw/net/virtio-net.c| 626 -
 include/hw/virtio/virtio-net.h |   1 +
 include/hw/virtio/virtio.h |  65 +
 3 files changed, 691 insertions(+), 1 deletion(-)

-- 
2.4.0

[Qemu-devel] [RFC Patch v2 03/10] virtio-net rsc: Chain Lookup, Packet Caching and Framework of IPv4

2016-01-31 Thread wexu

From: Wei Xu 

When a packet arrives, a corresponding chain is selected or created,
or the packet is bypassed if it is not an IPv4 packet.

The callback in the chain is invoked to do the real coalescing.

Since the coalescing is based on the TCP connection, the packet is
cached if there is no previous data within the same connection.

The framework of IPv4 is also introduced.

This patch depends on patch 2918cf2 (Detailed IPv4 and General TCP data
coalescing)

Signed-off-by: Wei Xu 
---
 hw/net/virtio-net.c | 173 +++-
 1 file changed, 172 insertions(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 4e9458e..cfbac6d 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -14,10 +14,12 @@
 #include "qemu/iov.h"
 #include "hw/virtio/virtio.h"
 #include "net/net.h"
+#include "net/eth.h"
 #include "net/checksum.h"
 #include "net/tap.h"
 #include "qemu/error-report.h"
 #include "qemu/timer.h"
+#include "qemu/sockets.h"
 #include "hw/virtio/virtio-net.h"
 #include "net/vhost_net.h"
 #include "hw/virtio/virtio-bus.h"
@@ -37,6 +39,21 @@
 #define endof(container, field) \
 (offsetof(container, field) + sizeof(((container *)0)->field))
 
+#define VIRTIO_HEADER   12/* Virtio net header size */
+#define IP_OFFSET (VIRTIO_HEADER + sizeof(struct eth_header))
+
+#define MAX_VIRTIO_IP_PAYLOAD  (65535 + IP_OFFSET)
+
+/* Global statistics */
+static uint32_t rsc_chain_no_mem;
+
+/* Switcher to enable/disable rsc */
+static bool virtio_net_rsc_bypass;
+
+/* Coalesce callback for ipv4/6 */
+typedef int32_t (VirtioNetCoalesce) (NetRscChain *chain, NetRscSeg *seg,
+ const uint8_t *buf, size_t size);
+
 typedef struct VirtIOFeature {
 uint32_t flags;
 size_t end;
@@ -1019,7 +1036,8 @@ static int receive_filter(VirtIONet *n, const uint8_t *buf, int size)
 return 0;
 }
 
-static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t size)
+static ssize_t virtio_net_do_receive(NetClientState *nc,
+  const uint8_t *buf, size_t size)
 {
 VirtIONet *n = qemu_get_nic_opaque(nc);
 VirtIONetQueue *q = virtio_net_get_subqueue(nc);
@@ -1623,6 +1641,159 @@ static void virtio_net_rsc_cleanup(VirtIONet *n)
 }
 }
 
+static int virtio_net_rsc_cache_buf(NetRscChain *chain, NetClientState *nc,
+const uint8_t *buf, size_t size)
+{
+NetRscSeg *seg;
+
+seg = g_malloc(sizeof(NetRscSeg));
+if (!seg) {
+return 0;
+}
+
+seg->buf = g_malloc(MAX_VIRTIO_IP_PAYLOAD);
+if (!seg->buf) {
+goto out;
+}
+
+memmove(seg->buf, buf, size);
+seg->size = size;
+seg->dup_ack_count = 0;
+seg->is_coalesced = 0;
+seg->nc = nc;
+
+QTAILQ_INSERT_TAIL(&chain->buffers, seg, next);
+return size;
+
+out:
+g_free(seg);
+return 0;
+}
+
+
+static int32_t virtio_net_rsc_try_coalesce4(NetRscChain *chain,
+   NetRscSeg *seg, const uint8_t *buf, size_t size)
+{
+/* This real part of this function will be introduced in next patch, just
+*  return a 'final' to feed the compilation. */
+return RSC_FINAL;
+}
+
+static size_t virtio_net_rsc_callback(NetRscChain *chain, NetClientState *nc,
+const uint8_t *buf, size_t size, VirtioNetCoalesce *coalesce)
+{
+int ret;
+NetRscSeg *seg, *nseg;
+
+if (QTAILQ_EMPTY(&chain->buffers)) {
+if (!virtio_net_rsc_cache_buf(chain, nc, buf, size)) {
+return 0;
+} else {
+return size;
+}
+}
+
+QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, nseg) {
+ret = coalesce(chain, seg, buf, size);
+if (RSC_FINAL == ret) {
+ret = virtio_net_do_receive(seg->nc, seg->buf, seg->size);
+QTAILQ_REMOVE(&chain->buffers, seg, next);
+g_free(seg->buf);
+g_free(seg);
+if (ret == 0) {
+/* Send failed */
+return 0;
+}
+
+/* Send current packet */
+return virtio_net_do_receive(nc, buf, size);
+} else if (RSC_NO_MATCH == ret) {
+continue;
+} else {
+/* Coalesced, mark coalesced flag to tell calc cksum for ipv4 */
+seg->is_coalesced = 1;
+return size;
+}
+}
+
+return virtio_net_rsc_cache_buf(chain, nc, buf, size);
+}
+
+static size_t virtio_net_rsc_receive4(void *opq, NetClientState* nc,
+  const uint8_t *buf, size_t size)
+{
+NetRscChain *chain;
+
+chain = (NetRscChain *)opq;
+return virtio_net_rsc_callback(chain, nc, buf, size,
+   virtio_net_rsc_try_coalesce4);
+}
+
+static NetRscChain *virtio_net_rsc_lookup_chain(NetClientState *nc,
+uint16_t proto)
+{
+VirtIONet *n;
+NetRscChain *chain

[Qemu-devel] [RFC Patch v2 06/10] virtio-net rsc: IPv4 checksum

2016-01-31 Thread wexu

From: Wei Xu 

If a field in the IPv4 header is modified, then the checksum
has to be recalculated before sending it out.
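
For reference, the recalculation amounts to the standard one's-complement
Internet checksum over the header. The patch itself uses QEMU's
net_checksum_add_cont() and net_checksum_finish() helpers; the following is
only a standalone sketch of the same arithmetic, and the function name
sketch_ip_checksum is hypothetical. The caller is assumed to have zeroed
the checksum field first, as the patch does with ip->ip_sum = 0.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement checksum over a header given as raw bytes in network
 * order. Returns the checksum as a host-order value whose high byte goes
 * first on the wire.
 */
static uint16_t sketch_ip_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    /* Add the header up as big-endian 16-bit words. */
    for (i = 0; i + 1 < len; i += 2) {
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    }
    /* An odd trailing byte is padded with zero on the right. */
    if (len & 1) {
        sum += (uint32_t)hdr[len - 1] << 8;
    }
    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t)~sum;
}
```

Running the same function over a header that already carries a correct
checksum yields 0, which is a convenient self-check after coalescing.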

Signed-off-by: Wei Xu 
---
 hw/net/virtio-net.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 93df0d5..88fc4f8 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1630,6 +1630,18 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
 return 0;
 }
 
+static void virtio_net_rsc_ipv4_checksum(NetRscSeg *seg)
+{
+uint32_t sum;
+struct ip_header *ip;
+
+ip = (struct ip_header *)(seg->buf + IP_OFFSET);
+
+ip->ip_sum = 0;
+sum = net_checksum_add_cont(sizeof(struct ip_header), (uint8_t *)ip, 0);
+ip->ip_sum = cpu_to_be16(net_checksum_finish(sum));
+}
+
 static void virtio_net_rsc_purge(void *opq)
 {
 int ret = 0;
@@ -1643,6 +1655,10 @@ static void virtio_net_rsc_purge(void *opq)
 continue;
 }
 
+if ((chain->proto == ETH_P_IP) && seg->is_coalesced) {
+virtio_net_rsc_ipv4_checksum(seg);
+}
+
 ret = virtio_net_do_receive(seg->nc, seg->buf, seg->size);
 QTAILQ_REMOVE(&chain->buffers, seg, next);
 g_free(seg->buf);
@@ -1853,6 +1869,9 @@ static size_t virtio_net_rsc_callback(NetRscChain *chain, NetClientState *nc,
 QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, nseg) {
 ret = coalesce(chain, seg, buf, size);
 if (RSC_FINAL == ret) {
+if ((chain->proto == ETH_P_IP) && seg->is_coalesced) {
+virtio_net_rsc_ipv4_checksum(seg);
+}
 ret = virtio_net_do_receive(seg->nc, seg->buf, seg->size);
 QTAILQ_REMOVE(&chain->buffers, seg, next);
 g_free(seg->buf);
-- 
2.4.0

[Qemu-devel] [RFC Patch v2 05/10] virtio-net rsc: Create timer to drain the packets from the cache pool

2016-01-31 Thread wexu

From: Wei Xu 

The timer is only triggered if the packet pool is not empty, and it
drains all the cached packets. This reduces the delay seen by the upper
layer protocol stack.

Signed-off-by: Wei Xu 
---
 hw/net/virtio-net.c | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 4f77fbe..93df0d5 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -48,12 +48,17 @@
 
 #define MAX_VIRTIO_IP_PAYLOAD  (65535 + IP_OFFSET)
 
+/* Purge coalesced packets timer interval */
+#define RSC_TIMER_INTERVAL  50
+
 /* Global statistics */
 static uint32_t rsc_chain_no_mem;
 
 /* Switcher to enable/disable rsc */
 static bool virtio_net_rsc_bypass;
 
+static uint32_t rsc_timeout = RSC_TIMER_INTERVAL;
+
 /* Coalesce callback for ipv4/6 */
 typedef int32_t (VirtioNetCoalesce) (NetRscChain *chain, NetRscSeg *seg,
  const uint8_t *buf, size_t size);
@@ -1625,6 +1630,35 @@ static int virtio_net_load_device(VirtIODevice *vdev, 
QEMUFile *f,
 return 0;
 }
 
+static void virtio_net_rsc_purge(void *opq)
+{
+int ret = 0;
+NetRscChain *chain = (NetRscChain *)opq;
+NetRscSeg *seg, *rn;
+
+QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, rn) {
+if (!qemu_can_send_packet(seg->nc)) {
+/* Should quit or continue? not sure if one or some
+* of the queues fail would happen, try continue here */
+continue;
+}
+
+ret = virtio_net_do_receive(seg->nc, seg->buf, seg->size);
+QTAILQ_REMOVE(&chain->buffers, seg, next);
+g_free(seg->buf);
+g_free(seg);
+
+if (ret == 0) {
+/* Try next queue */
+continue;
+}
+}
+
+if (!QTAILQ_EMPTY(&chain->buffers)) {
+timer_mod(chain->drain_timer,
+  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + rsc_timeout);
+}
+}
 
 static void virtio_net_rsc_cleanup(VirtIONet *n)
 {
@@ -1810,6 +1844,8 @@ static size_t virtio_net_rsc_callback(NetRscChain *chain, 
NetClientState *nc,
 if (!virtio_net_rsc_cache_buf(chain, nc, buf, size)) {
 return 0;
 } else {
+timer_mod(chain->drain_timer,
+  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + rsc_timeout);
 return size;
 }
 }
@@ -1877,6 +1913,8 @@ static NetRscChain 
*virtio_net_rsc_lookup_chain(NetClientState *nc,
 }
 
 chain->proto = proto;
+chain->drain_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
+  virtio_net_rsc_purge, chain);
 chain->do_receive = virtio_net_rsc_receive4;
 
 QTAILQ_INIT(&chain->buffers);
-- 
2.4.0

[Qemu-devel] [RFC Patch v2 09/10] virtio-net rsc: Add IPv6 support

2016-01-31 Thread wexu

From: Wei Xu 

A few more pieces are needed to support IPv6:
1. Corresponding chain lookup
2. Coalescing callback for the protocol chain
3. Filter & sanity check

Signed-off-by: Wei Xu 
---
 hw/net/virtio-net.c | 104 +++-
 1 file changed, 102 insertions(+), 2 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 9b44762..c9f6bfc 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -46,12 +46,19 @@
 #define TCP4_OFFSET (IP_OFFSET + sizeof(struct ip_header)) /* tcp4 header */
 #define TCP4_PORT_OFFSET TCP4_OFFSET/* tcp4 port offset */
 #define IP4_ADDR_SIZE   8   /* ipv4 saddr + daddr */
+
+#define IP6_ADDR_OFFSET (IP_OFFSET + 8) /* ipv6 address start */
+#define TCP6_OFFSET (IP_OFFSET + sizeof(struct ip6_header)) /* tcp6 header */
+#define TCP6_PORT_OFFSET TCP6_OFFSET/* tcp6 port offset */
+#define IP6_ADDR_SIZE   32  /* ipv6 saddr + daddr */
 #define TCP_PORT_SIZE   4   /* sport + dport */
 #define TCP_WINDOW  65535
 
 /* IPv4 max payload, 16 bits in the header */
 #define MAX_IP4_PAYLOAD  (65535 - sizeof(struct ip_header))
 
+/* ip6 max payload, payload in ipv6 don't include the  header */
+#define MAX_IP6_PAYLOAD  65535
 #define MAX_VIRTIO_IP_PAYLOAD  (65535 + IP_OFFSET)
 
 /* Purge coalesced packets timer interval */
@@ -1856,6 +1863,42 @@ static int32_t virtio_net_rsc_try_coalesce4(NetRscChain 
*chain,
 o_data, &o_ip->ip_len, MAX_IP4_PAYLOAD);
 }
 
+static int32_t virtio_net_rsc_try_coalesce6(NetRscChain *chain,
+NetRscSeg *seg, const uint8_t *buf, size_t size)
+{
+uint16_t o_ip_len, n_ip_len;/* len in ip header field */
+uint16_t n_tcp_len, o_tcp_len;  /* tcp header len */
+uint16_t o_data, n_data;/* payload without virtio/eth/ip/tcp */
+struct ip6_header *n_ip, *o_ip;
+struct tcp_header *n_tcp, *o_tcp;
+
+n_ip = (struct ip6_header *)(buf + IP_OFFSET);
+n_ip_len = htons(n_ip->ip6_ctlun.ip6_un1.ip6_un1_plen);
+n_tcp = (struct tcp_header *)(((uint8_t *)n_ip)\
++ sizeof(struct ip6_header));
+n_tcp_len = (htons(n_tcp->th_offset_flags) & 0xF000) >> 10;
+n_data = n_ip_len - n_tcp_len;
+
+o_ip = (struct ip6_header *)(seg->buf + IP_OFFSET);
+o_ip_len = htons(o_ip->ip6_ctlun.ip6_un1.ip6_un1_plen);
+o_tcp = (struct tcp_header *)(((uint8_t *)o_ip)\
++ sizeof(struct ip6_header));
+o_tcp_len = (htons(o_tcp->th_offset_flags) & 0xF000) >> 10;
+o_data = o_ip_len - o_tcp_len;
+
+if (memcmp(&n_ip->ip6_src, &o_ip->ip6_src, sizeof(struct in6_address))
+|| memcmp(&n_ip->ip6_dst, &o_ip->ip6_dst, sizeof(struct in6_address))
+|| (n_tcp->th_sport ^ o_tcp->th_sport)
+|| (n_tcp->th_dport ^ o_tcp->th_dport)) {
+return RSC_NO_MATCH;
+}
+
+/* There is a difference between payload lenght in ipv4 and v6,
+   ip header is excluded in ipv6 */
+return virtio_net_rsc_coalesce_tcp(chain, seg, buf,
+   n_tcp, n_tcp_len, n_data, o_tcp, o_tcp_len, o_data,
+   &o_ip->ip6_ctlun.ip6_un1.ip6_un1_plen, MAX_IP6_PAYLOAD);
+}
 
 /* Pakcets with 'SYN' should bypass, other flag should be sent after drain
  * to prevent out of order */
@@ -2015,6 +2058,59 @@ static size_t virtio_net_rsc_receive4(void *opq, 
NetClientState* nc,
virtio_net_rsc_try_coalesce4);
 }
 
+static int32_t virtio_net_rsc_filter6(NetRscChain *chain, struct ip6_header 
*ip,
+  const uint8_t *buf, size_t size)
+{
+uint16_t ip_len;
+
+if (size < (TCP6_OFFSET + sizeof(tcp_header))) {
+return RSC_BYPASS;
+}
+
+if (0x6 != (0xF & ip->ip6_ctlun.ip6_un1.ip6_un1_flow)) {
+return RSC_BYPASS;
+}
+
+/* Both option and protocol is checked in this */
+if (ip->ip6_ctlun.ip6_un1.ip6_un1_nxt != IPPROTO_TCP) {
+return RSC_BYPASS;
+}
+
+/* Sanity check */
+ip_len = htons(ip->ip6_ctlun.ip6_un1.ip6_un1_plen);
+if (ip_len < sizeof(struct tcp_header)
+|| ip_len > (size - TCP6_OFFSET)) {
+return RSC_BYPASS;
+}
+
+return 0;
+}
+
+static size_t virtio_net_rsc_receive6(void *opq, NetClientState* nc,
+  const uint8_t *buf, size_t size)
+{
+int32_t ret;
+NetRscChain *chain;
+struct ip6_header *ip;
+
+chain = (NetRscChain *)opq;
+ip = (struct ip6_header *)(buf + IP_OFFSET);
+if (RSC_WANT != virtio_net_rsc_filter6(chain, ip, buf, size)) {
+return virtio_net_do_receive(nc, buf, size);
+}
+
+ret = virtio_net_rsc_parse_tcp_ctrl((uint8_t *)ip, sizeof(*ip));
+if (RSC_BYPASS == ret) {
+return virtio_net_do_receive(nc, buf, size);
+} else if (RSC_FINAL == ret) {
+return virtio_net_rsc_drain_one(chain, nc, buf,

[Qemu-devel] [RFC Patch v2 07/10] virtio-net rsc: Checking TCP flag and drain specific connection packets

2016-01-31 Thread wexu

From: Wei Xu 

There are normally two ways to handle a TCP control flag: bypass and
finalize. Bypass means the packet is sent out directly. Finalize means the
packet is also bypassed, but only after searching the pool for cached packets
of the same connection and sending them out first; this avoids out-of-order
data.

All 'SYN' packets are bypassed since they always begin a new connection.
Other flags such as 'FIN/RST' trigger a finalization, because they normally
occur when a connection is about to be closed.
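
As a rough illustration of the bypass/finalize split described above, the
classification can be sketched as a standalone function. The names prefixed
with sketch_/SKETCH_ are hypothetical placeholders (the series uses its own
RSC_* codes); only the TCP flag bit values are standard. The flags word is
assumed to be already converted to host byte order.

```c
#include <stdint.h>

/* Standard TCP flag bits (low 6 bits of the offset/flags word). */
#define SKETCH_TH_FIN 0x01
#define SKETCH_TH_SYN 0x02
#define SKETCH_TH_RST 0x04
#define SKETCH_TH_URG 0x20

/* Hypothetical result codes, for illustration only. */
enum sketch_action {
    SKETCH_CANDIDATE = 0, /* no control flag of interest: try to coalesce */
    SKETCH_BYPASS,        /* deliver directly, leave the pool alone */
    SKETCH_FINAL          /* flush same-connection packets, then deliver */
};

/* offset_flags_host: the 16-bit data-offset/flags word in host byte order. */
static enum sketch_action sketch_classify_tcp_flags(uint16_t offset_flags_host)
{
    uint16_t flags = offset_flags_host & 0x3F;

    /* SYN starts a new connection, so nothing can be cached for it yet. */
    if (flags & SKETCH_TH_SYN) {
        return SKETCH_BYPASS;
    }

    /* FIN/URG/RST must not overtake data still sitting in the pool. */
    if (flags & (SKETCH_TH_FIN | SKETCH_TH_URG | SKETCH_TH_RST)) {
        return SKETCH_FINAL;
    }

    return SKETCH_CANDIDATE;
}
```

Checking SYN first means a SYN combined with any other flag is still
bypassed, which matches the order of the tests in the patch below.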

Signed-off-by: Wei Xu 
---
 hw/net/virtio-net.c | 66 +
 1 file changed, 66 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 88fc4f8..b0987d0 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -41,6 +41,12 @@
 
 #define VIRTIO_HEADER   12/* Virtio net header size */
 #define IP_OFFSET (VIRTIO_HEADER + sizeof(struct eth_header))
+
+#define IP4_ADDR_OFFSET (IP_OFFSET + 12)/* ipv4 address start */
+#define TCP4_OFFSET (IP_OFFSET + sizeof(struct ip_header)) /* tcp4 header */
+#define TCP4_PORT_OFFSET TCP4_OFFSET/* tcp4 port offset */
+#define IP4_ADDR_SIZE   8   /* ipv4 saddr + daddr */
+#define TCP_PORT_SIZE   4   /* sport + dport */
 #define TCP_WINDOW  65535
 
 /* IPv4 max payload, 16 bits in the header */
@@ -1850,6 +1856,27 @@ static int32_t virtio_net_rsc_try_coalesce4(NetRscChain 
*chain,
 o_data, &o_ip->ip_len, MAX_IP4_PAYLOAD);
 }
 
+
+/* Pakcets with 'SYN' should bypass, other flag should be sent after drain
+ * to prevent out of order */
+static int virtio_net_rsc_parse_tcp_ctrl(uint8_t *ip, uint16_t offset)
+{
+uint16_t tcp_flag;
+struct tcp_header *tcp;
+
+tcp = (struct tcp_header *)(ip + offset);
+tcp_flag = htons(tcp->th_offset_flags) & 0x3F;
+if (tcp_flag & TH_SYN) {
+return RSC_BYPASS;
+}
+
+if (tcp_flag & (TH_FIN | TH_URG | TH_RST)) {
+return RSC_FINAL;
+}
+
+return 0;
+}
+
 static size_t virtio_net_rsc_callback(NetRscChain *chain, NetClientState *nc,
 const uint8_t *buf, size_t size, VirtioNetCoalesce *coalesce)
 {
@@ -1895,12 +1922,51 @@ static size_t virtio_net_rsc_callback(NetRscChain 
*chain, NetClientState *nc,
 return virtio_net_rsc_cache_buf(chain, nc, buf, size);
 }
 
+/* Drain a connection data, this is to avoid out of order segments */
+static size_t virtio_net_rsc_drain_one(NetRscChain *chain, NetClientState *nc,
+const uint8_t *buf, size_t size, uint16_t ip_start,
+uint16_t ip_size, uint16_t tcp_port, uint16_t port_size)
+{
+NetRscSeg *seg, *nseg;
+
+QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, nseg) {
+if (memcmp(buf + ip_start, seg->buf + ip_start, ip_size)
+|| memcmp(buf + tcp_port, seg->buf + tcp_port, port_size)) {
+continue;
+}
+if ((chain->proto == ETH_P_IP) && seg->is_coalesced) {
+virtio_net_rsc_ipv4_checksum(seg);
+}
+
+virtio_net_do_receive(seg->nc, seg->buf, seg->size);
+
+QTAILQ_REMOVE(&chain->buffers, seg, next);
+g_free(seg->buf);
+g_free(seg);
+break;
+}
+
+return virtio_net_do_receive(nc, buf, size);
+}
 static size_t virtio_net_rsc_receive4(void *opq, NetClientState* nc,
   const uint8_t *buf, size_t size)
 {
+int32_t ret;
+struct ip_header *ip;
 NetRscChain *chain;
 
 chain = (NetRscChain *)opq;
+ip = (struct ip_header *)(buf + IP_OFFSET);
+
+ret = virtio_net_rsc_parse_tcp_ctrl((uint8_t *)ip,
+(0xF & ip->ip_ver_len) << 2);
+if (RSC_BYPASS == ret) {
+return virtio_net_do_receive(nc, buf, size);
+} else if (RSC_FINAL == ret) {
+return virtio_net_rsc_drain_one(chain, nc, buf, size, IP4_ADDR_OFFSET,
+IP4_ADDR_SIZE, TCP4_PORT_OFFSET, 
TCP_PORT_SIZE);
+}
+
 return virtio_net_rsc_callback(chain, nc, buf, size,
virtio_net_rsc_try_coalesce4);
 }
-- 
2.4.0

[Qemu-devel] [RFC Patch v2 08/10] virtio-net rsc: Sanity check & More bypass cases check

2016-01-31 Thread wexu

From: Wei Xu 

Check more general exception cases:
1. Incorrect version in IP header
2. IP options & IP fragment
3. Not a TCP packet
4. Sanity size check to prevent buffer overflow attack.
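
The four checks listed above can be sketched on a raw byte buffer as follows.
This is only an illustration under stated assumptions: the function name and
constants are hypothetical, the buffer is assumed to start at the IPv4 header,
and the fragment test here looks at the MF bit and fragment offset, which is
a different test from the IP_DF check used in the patch below.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_IP4_HDR_LEN 20 /* IPv4 header without options */
#define SKETCH_TCP_HDR_LEN 20 /* TCP header without options */
#define SKETCH_IPPROTO_TCP 6

/*
 * ip:    points at the first byte of the IPv4 header
 * avail: number of valid bytes starting at ip
 * Returns 1 if the packet is a coalescing candidate, 0 if it should bypass.
 */
static int sketch_ipv4_tcp_candidate(const uint8_t *ip, size_t avail)
{
    uint16_t frag, total_len;

    /* Size check first, so every later read stays inside the buffer. */
    if (avail < SKETCH_IP4_HDR_LEN + SKETCH_TCP_HDR_LEN) {
        return 0;
    }

    /* 1. IP version must be 4. */
    if ((ip[0] >> 4) != 4) {
        return 0;
    }

    /* 2a. Header length of 5 words means no IP options. */
    if ((ip[0] & 0x0F) != 5) {
        return 0;
    }

    /* 2b. MF flag (0x2000) set or nonzero offset (0x1FFF) means a fragment. */
    frag = (uint16_t)((ip[6] << 8) | ip[7]);
    if (frag & 0x3FFF) {
        return 0;
    }

    /* 3. Only TCP is coalesced. */
    if (ip[9] != SKETCH_IPPROTO_TCP) {
        return 0;
    }

    /* 4. Total length must cover both headers and fit in the buffer. */
    total_len = (uint16_t)((ip[2] << 8) | ip[3]);
    if (total_len < SKETCH_IP4_HDR_LEN + SKETCH_TCP_HDR_LEN
        || total_len > avail) {
        return 0;
    }

    return 1;
}
```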

Signed-off-by: Wei Xu 
---
 hw/net/virtio-net.c | 44 
 1 file changed, 44 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index b0987d0..9b44762 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1948,6 +1948,46 @@ static size_t virtio_net_rsc_drain_one(NetRscChain 
*chain, NetClientState *nc,
 
 return virtio_net_do_receive(nc, buf, size);
 }
+
+static int32_t virtio_net_rsc_filter4(NetRscChain *chain, struct ip_header *ip,
+  const uint8_t *buf, size_t size)
+{
+uint16_t ip_len;
+
+if (size < (TCP4_OFFSET + sizeof(tcp_header))) {
+return RSC_BYPASS;
+}
+
+/* Not an ipv4 one */
+if (0x4 != ((0xF0 & ip->ip_ver_len) >> 4)) {
+return RSC_BYPASS;
+}
+
+/* Don't handle packets with ip option */
+if (5 != (0xF & ip->ip_ver_len)) {
+return RSC_BYPASS;
+}
+
+/* Don't handle packets with ip fragment */
+if (!(htons(ip->ip_off) & IP_DF)) {
+return RSC_BYPASS;
+}
+
+if (ip->ip_p != IPPROTO_TCP) {
+return RSC_BYPASS;
+}
+
+/* Sanity check */
+ip_len = htons(ip->ip_len);
+if (ip_len < (sizeof(struct ip_header) + sizeof(struct tcp_header))
+|| ip_len > (size - IP_OFFSET)) {
+return RSC_BYPASS;
+}
+
+return RSC_WANT;
+}
+
+
 static size_t virtio_net_rsc_receive4(void *opq, NetClientState* nc,
   const uint8_t *buf, size_t size)
 {
@@ -1958,6 +1998,10 @@ static size_t virtio_net_rsc_receive4(void *opq, 
NetClientState* nc,
 chain = (NetRscChain *)opq;
 ip = (struct ip_header *)(buf + IP_OFFSET);
 
+if (RSC_WANT != virtio_net_rsc_filter4(chain, ip, buf, size)) {
+return virtio_net_do_receive(nc, buf, size);
+}
+
 ret = virtio_net_rsc_parse_tcp_ctrl((uint8_t *)ip,
 (0xF & ip->ip_ver_len) << 2);
 if (RSC_BYPASS == ret) {
-- 
2.4.0

[Qemu-devel] [RFC Patch v2 10/10] virtio-net rsc: Add Receive Segment Coalesce statistics

2016-01-31 Thread wexu

From: Wei Xu 

Add statistics to log what happened during the process.

Signed-off-by: Wei Xu 
---
 hw/net/virtio-net.c| 49 +++---
 include/hw/virtio/virtio.h | 33 +++
 2 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index c9f6bfc..ab08b96 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -66,6 +66,7 @@
 
 /* Global statistics */
 static uint32_t rsc_chain_no_mem;
+static uint64_t virtio_net_received;
 
 /* Switcher to enable/disable rsc */
 static bool virtio_net_rsc_bypass;
@@ -1679,10 +1680,12 @@ static void virtio_net_rsc_purge(void *opq)
 
 if (ret == 0) {
 /* Try next queue */
+chain->stat.purge_failed++;
 continue;
 }
 }
 
+chain->stat.timer++;
 if (!QTAILQ_EMPTY(&chain->buffers)) {
 timer_mod(chain->drain_timer,
   qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + rsc_timeout);
@@ -1715,6 +1718,7 @@ static int virtio_net_rsc_cache_buf(NetRscChain *chain, 
NetClientState *nc,
 
 seg = g_malloc(sizeof(NetRscSeg));
 if (!seg) {
+chain->stat.no_buf++;
 return 0;
 }
 
@@ -1730,9 +1734,11 @@ static int virtio_net_rsc_cache_buf(NetRscChain *chain, 
NetClientState *nc,
 seg->nc = nc;
 
 QTAILQ_INSERT_TAIL(&chain->buffers, seg, next);
+chain->stat.cache++;
 return size;
 
 out:
+chain->stat.no_buf++;
 g_free(seg);
 return 0;
 }
@@ -1750,27 +1756,33 @@ static int32_t virtio_net_rsc_handle_ack(NetRscChain 
*chain, NetRscSeg *seg,
 owin = htons(o_tcp->th_win);
 
 if ((nack - oack) >= TCP_WINDOW) {
+chain->stat.ack_out_of_win++;
 return RSC_FINAL;
 } else if (nack == oack) {
 /* duplicated ack or window probe */
 if (nwin == owin) {
 /* duplicated ack, add dup ack count due to whql test up to 1 */
+chain->stat.dup_ack++;
 
 if (seg->dup_ack_count == 0) {
 seg->dup_ack_count++;
+chain->stat.dup_ack1++;
 return RSC_COALESCE;
 } else {
 /* Spec says should send it directly */
+chain->stat.dup_ack2++;
 return RSC_FINAL;
 }
 } else {
 /* Coalesce window update */
 o_tcp->th_win = n_tcp->th_win;
+chain->stat.win_update++;
 return RSC_COALESCE;
 }
 } else {
 /* pure ack, update ack */
 o_tcp->th_ack = n_tcp->th_ack;
+chain->stat.pure_ack++;
 return RSC_COALESCE;
 }
 }
@@ -1788,13 +1800,20 @@ static int32_t virtio_net_rsc_coalesce_tcp(NetRscChain 
*chain, NetRscSeg *seg,
 nseq = htonl(n_tcp->th_seq);
 oseq = htonl(o_tcp->th_seq);
 
+if (n_tcp_len > sizeof(struct tcp_header)) {
+/* Log this only for debugging observation */
+chain->stat.tcp_option++;
+}
+
 /* Ignore packet with more/larger tcp options */
 if (n_tcp_len > o_tcp_len) {
+chain->stat.tcp_larger_option++;
 return RSC_FINAL;
 }
 
 /* out of order or retransmitted. */
 if ((nseq - oseq) > TCP_WINDOW) {
+chain->stat.data_out_of_win++;
 return RSC_FINAL;
 }
 
@@ -1802,16 +1821,19 @@ static int32_t virtio_net_rsc_coalesce_tcp(NetRscChain 
*chain, NetRscSeg *seg,
 if (nseq == oseq) {
 if ((0 == o_data) && n_data) {
 /* From no payload to payload, normal case, not a dup ack or etc */
+chain->stat.data_after_pure_ack++;
 goto coalesce;
 } else {
 return virtio_net_rsc_handle_ack(chain, seg, buf, n_tcp, o_tcp);
 }
 } else if ((nseq - oseq) != o_data) {
 /* Not a consistent packet, out of order */
+chain->stat.data_out_of_order++;
 return RSC_FINAL;
 } else {
 coalesce:
 if ((o_ip_len + n_data) > max_data) {
+chain->stat.over_size++;
 return RSC_FINAL;
 }
 
@@ -1824,6 +1846,7 @@ coalesce:
 
 memmove(seg->buf + seg->size, data, n_data);
 seg->size += n_data;
+chain->stat.coalesced++;
 return RSC_COALESCE;
 }
 }
@@ -1855,6 +1878,7 @@ static int32_t virtio_net_rsc_try_coalesce4(NetRscChain 
*chain,
 if ((n_ip->ip_src ^ o_ip->ip_src) || (n_ip->ip_dst ^ o_ip->ip_dst)
 || (n_tcp->th_sport ^ o_tcp->th_sport)
 || (n_tcp->th_dport ^ o_tcp->th_dport)) {
+chain->stat.no_match++;
 return RSC_NO_MATCH;
 }
 
@@ -1890,6 +1914,7 @@ static int32_t virtio_net_rsc_try_coalesce6(NetRscChain 
*chain,
 || memcmp(&n_ip->ip6_dst, &o_ip->ip6_dst, sizeof(struct in6_address))
 || (n_tcp->th_sport ^ o_tcp->th_sport)
 || (n_tcp->th_dport ^ o_tcp->th_dport)) {
+chain->stat.no_match++;
 return RSC_NO_MATCH;
 }
 
@@ -1927,6 +1952,7 @@ static size_t virtio_net_rsc_callba

Re: [Qemu-devel] [RFC Patch v2 02/10] virtio-net rsc: Initilize & Cleanup

2016-01-31 Thread Michael S. Tsirkin

On Mon, Feb 01, 2016 at 02:13:21AM +0800, w...@redhat.com wrote:
> From: Wei Xu 
> 
> The chain list is initialized when the device is getting realized,
> and the entry of the chain will be inserted dynamically according
> to protocol type of the network traffic.
> 
> All the buffered packets and chain will be destroyed when the
> device is going to be unrealized.
> 
> Signed-off-by: Wei Xu 

What happens during migration?

> ---
>  hw/net/virtio-net.c| 22 ++
>  include/hw/virtio/virtio-net.h |  1 +
>  2 files changed, 23 insertions(+)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index a877614..4e9458e 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -1603,6 +1603,26 @@ static int virtio_net_load_device(VirtIODevice *vdev, 
> QEMUFile *f,
>  return 0;
>  }
>  
> +
> +static void virtio_net_rsc_cleanup(VirtIONet *n)
> +{
> +NetRscChain *chain, *rn_chain;
> +NetRscSeg *seg, *rn_seg;
> +
> +QTAILQ_FOREACH_SAFE(chain, &n->rsc_chains, next, rn_chain) {
> +QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, rn_seg) {
> +QTAILQ_REMOVE(&chain->buffers, seg, next);
> +g_free(seg->buf);
> +g_free(seg);
> +
> +timer_del(chain->drain_timer);
> +timer_free(chain->drain_timer);
> +QTAILQ_REMOVE(&n->rsc_chains, chain, next);
> +g_free(chain);
> +}
> +}
> +}
> +
>  static NetClientInfo net_virtio_info = {
>  .type = NET_CLIENT_OPTIONS_KIND_NIC,
>  .size = sizeof(NICState),
> @@ -1732,6 +1752,7 @@ static void virtio_net_device_realize(DeviceState *dev, 
> Error **errp)
>  nc = qemu_get_queue(n->nic);
>  nc->rxfilter_notify_enabled = 1;
>  
> +QTAILQ_INIT(&n->rsc_chains);
>  n->qdev = dev;
>  register_savevm(dev, "virtio-net", -1, VIRTIO_NET_VM_VERSION,
>  virtio_net_save, virtio_net_load, n);
> @@ -1766,6 +1787,7 @@ static void virtio_net_device_unrealize(DeviceState 
> *dev, Error **errp)
>  g_free(n->vqs);
>  qemu_del_nic(n->nic);
>  virtio_cleanup(vdev);
> +virtio_net_rsc_cleanup(n);
>  }
>  
>  static void virtio_net_instance_init(Object *obj)
> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> index f3cc25f..6ce8b93 100644
> --- a/include/hw/virtio/virtio-net.h
> +++ b/include/hw/virtio/virtio-net.h
> @@ -59,6 +59,7 @@ typedef struct VirtIONet {
>  VirtIONetQueue *vqs;
>  VirtQueue *ctrl_vq;
>  NICState *nic;
> +QTAILQ_HEAD(, NetRscChain) rsc_chains;
>  uint32_t tx_timeout;
>  int32_t tx_burst;
>  uint32_t has_vnet_hdr;
> -- 
> 2.4.0

Re: [Qemu-devel] [RFC Patch v2 03/10] virtio-net rsc: Chain Lookup, Packet Caching and Framework of IPv4

2016-01-31 Thread Michael S. Tsirkin

On Mon, Feb 01, 2016 at 02:13:22AM +0800, w...@redhat.com wrote:
> From: Wei Xu 
> 
> When a packet arrives, a corresponding chain is selected or created, or the
> packet is bypassed if it is not an IPv4 packet.
> 
> The callback in the chain will be invoked to call the real coalescing.
> 
> Since coalescing is done per TCP connection, a packet is cached if there is
> no previous data within the same connection.
> 
> The framework of IPv4 is also introduced.
> 
> This patch depends on patch 2918cf2 (Detailed IPv4 and General TCP data
> coalescing)
> 
> Signed-off-by: Wei Xu 
> ---
>  hw/net/virtio-net.c | 173 
> +++-
>  1 file changed, 172 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 4e9458e..cfbac6d 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -14,10 +14,12 @@
>  #include "qemu/iov.h"
>  #include "hw/virtio/virtio.h"
>  #include "net/net.h"
> +#include "net/eth.h"
>  #include "net/checksum.h"
>  #include "net/tap.h"
>  #include "qemu/error-report.h"
>  #include "qemu/timer.h"
> +#include "qemu/sockets.h"
>  #include "hw/virtio/virtio-net.h"
>  #include "net/vhost_net.h"
>  #include "hw/virtio/virtio-bus.h"
> @@ -37,6 +39,21 @@
>  #define endof(container, field) \
>  (offsetof(container, field) + sizeof(((container *)0)->field))
>  
> +#define VIRTIO_HEADER   12/* Virtio net header size */
> +#define IP_OFFSET (VIRTIO_HEADER + sizeof(struct eth_header))
> +
> +#define MAX_VIRTIO_IP_PAYLOAD  (65535 + IP_OFFSET)
> +
> +/* Global statistics */
> +static uint32_t rsc_chain_no_mem;
> +
> +/* Switcher to enable/disable rsc */
> +static bool virtio_net_rsc_bypass;
> +
> +/* Coalesce callback for ipv4/6 */
> +typedef int32_t (VirtioNetCoalesce) (NetRscChain *chain, NetRscSeg *seg,
> + const uint8_t *buf, size_t size);
> +

Since there are only 2 cases, it's probably better to just
open-code if (v4) -> coalesce4 else if v6 -> coalesce6

>  typedef struct VirtIOFeature {
>  uint32_t flags;
>  size_t end;
> @@ -1019,7 +1036,8 @@ static int receive_filter(VirtIONet *n, const uint8_t 
> *buf, int size)
>  return 0;
>  }
>  
> -static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, 
> size_t size)
> +static ssize_t virtio_net_do_receive(NetClientState *nc,
> +  const uint8_t *buf, size_t size)
>  {
>  VirtIONet *n = qemu_get_nic_opaque(nc);
>  VirtIONetQueue *q = virtio_net_get_subqueue(nc);
> @@ -1623,6 +1641,159 @@ static void virtio_net_rsc_cleanup(VirtIONet *n)
>  }
>  }
>  
> +static int virtio_net_rsc_cache_buf(NetRscChain *chain, NetClientState *nc,
> +const uint8_t *buf, size_t size)
> +{
> +NetRscSeg *seg;
> +
> +seg = g_malloc(sizeof(NetRscSeg));
> +if (!seg) {
> +return 0;
> +}
> +
> +seg->buf = g_malloc(MAX_VIRTIO_IP_PAYLOAD);
> +if (!seg->buf) {
> +goto out;
> +}
> +
> +memmove(seg->buf, buf, size);
> +seg->size = size;
> +seg->dup_ack_count = 0;
> +seg->is_coalesced = 0;
> +seg->nc = nc;
> +
> +QTAILQ_INSERT_TAIL(&chain->buffers, seg, next);
> +return size;
> +
> +out:
> +g_free(seg);
> +return 0;
> +}
> +
> +
> +static int32_t virtio_net_rsc_try_coalesce4(NetRscChain *chain,
> +   NetRscSeg *seg, const uint8_t *buf, size_t size)
> +{
> +/* This real part of this function will be introduced in next patch, just
> +*  return a 'final' to feed the compilation. */
> +return RSC_FINAL;
> +}
> +
> +static size_t virtio_net_rsc_callback(NetRscChain *chain, NetClientState *nc,
> +const uint8_t *buf, size_t size, VirtioNetCoalesce *coalesce)
> +{
> +int ret;
> +NetRscSeg *seg, *nseg;
> +
> +if (QTAILQ_EMPTY(&chain->buffers)) {
> +if (!virtio_net_rsc_cache_buf(chain, nc, buf, size)) {
> +return 0;
> +} else {
> +return size;
> +}
> +}
> +
> +QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, nseg) {
> +ret = coalesce(chain, seg, buf, size);
> +if (RSC_FINAL == ret) {
> +ret = virtio_net_do_receive(seg->nc, seg->buf, seg->size);
> +QTAILQ_REMOVE(&chain->buffers, seg, next);
> +g_free(seg->buf);
> +g_free(seg);
> +if (ret == 0) {
> +/* Send failed */
> +return 0;
> +}
> +
> +/* Send current packet */
> +return virtio_net_do_receive(nc, buf, size);
> +} else if (RSC_NO_MATCH == ret) {
> +continue;
> +} else {
> +/* Coalesced, mark coalesced flag to tell calc cksum for ipv4 */
> +seg->is_coalesced = 1;
> +return size;
> +}
> +}
> +
> +return virtio_net_rsc_cache_buf(chain, nc, buf, size);
> +}
> +
> +static size_t virtio_

Re: [Qemu-devel] [RFC v2 0/10] Support Receive-Segment-Offload(RSC) for WHQL test of Window guest

2016-01-31 Thread Michael S. Tsirkin

On Mon, Feb 01, 2016 at 02:13:19AM +0800, w...@redhat.com wrote:
> From: Wei Xu 
> 
> Patch v2 adds a detailed commit log.
> 
> This series supports the WHQL test for Windows guests; the feature also
> benefits other guests, working like the kernel's 'gro' feature but with a
> userspace implementation.
> Feature information:
>   http://msdn.microsoft.com/en-us/library/windows/hardware/jj853324
> 
> Both IPv4 and IPv6 are supported. Although userspace virtio is still slower
> than vhost-net, userspace virtio performance improves by about 30-40 percent
> when this feature is turned on and 'tso' is disabled on the corresponding
> tap interface.
> 
> Test steps:
> Although this feature is mainly used for Windows guests, I used Linux guests
> to help test it. To keep things simple, I tested the patch in 3 steps as I
> moved on.
> 1. A TCP socket client/server pair running on 2 Linux guests, so that I can
> control the traffic and debug the code as I want.
> 2. Netperf on Linux guests to test the throughput.
> 3. WHQL test with 2 Windows guests.
> 
> Current status:
> IPv4 passes all the above tests.
> IPv6 has only passed test steps 1 and 2 described above; the virtio nic
> cannot receive any packet in the WHQL test. Debugging on the host side shows
> all the packets have been pushed to the vring. After replacing the guest with
> a Linux guest and adding 10 extra packets before sending out the real packet,
> tcpdump running on the guest captures only 6 packets. I haven't found the
> root cause yet and will continue working on this.
> 
> Note:
> A 'MessageDevice' nic chosen as 'Realtek' will sometimes panic the system
> during setup; this can be worked around by replacing it with an 'e1000' nic.

Either memory corruption or an unrelated bug.
Try with valgrind?

> Pending issues & Todo list:
> 1. Dup ack count not added in the virtio_net_hdr, but WHQL test case passes,
> looks like a bug in test case.

Maybe that's ok - as long as packets are not forwarded.

> 2. Missing a Feature Bit

Do we need a new bit? Maybe for ack coalescing only ...

> 3. Missing a few tcp/ip handling
> ECN change.
> TCP window scale.
> 
> Wei Xu (10):
>   virtio-net rsc: Data structure, 'Segment', 'Chain' and 'Status'
>   virtio-net rsc: Initilize & Cleanup
>   virtio-net rsc: Chain Lookup, Packet Caching and Framework of IPv4
>   virtio-net rsc: Detailed IPv4 and General TCP data coalescing
>   virtio-net rsc: Create timer to drain the packets from the cache pool
>   virtio-net rsc: IPv4 checksum
>   virtio-net rsc: Checking TCP flag and drain specific connection
> packets
>   virtio-net rsc: Sanity check & More bypass cases check
>   virtio-net rsc: Add IPv6 support
>   virtio-net rsc: Add Receive Segment Coalesce statistics
> 
>  hw/net/virtio-net.c| 626 
> -
>  include/hw/virtio/virtio-net.h |   1 +
>  include/hw/virtio/virtio.h |  65 +
>  3 files changed, 691 insertions(+), 1 deletion(-)
> 
> -- 
> 2.4.0

[Qemu-devel] [PATCH 3/3] ppc: include timebase in migration stream for g3beige/mac99 machines

2016-01-31 Thread Mark Cave-Ayland

Signed-off-by: Mark Cave-Ayland 
---
 hw/ppc/mac_newworld.c |4 
 hw/ppc/mac_oldworld.c |4 
 2 files changed, 8 insertions(+)

diff --git a/hw/ppc/mac_newworld.c b/hw/ppc/mac_newworld.c
index f95086b..3283f1d 100644
--- a/hw/ppc/mac_newworld.c
+++ b/hw/ppc/mac_newworld.c
@@ -179,6 +179,7 @@ static void ppc_core99_init(MachineState *machine)
 int *token = g_new(int, 1);
 hwaddr nvram_addr = 0xFFF04000;
 uint64_t tbfreq;
+PPCTimebase *tb;
 
 linux_boot = (kernel_filename != NULL);
 
@@ -201,6 +202,9 @@ static void ppc_core99_init(MachineState *machine)
 /* Set time-base frequency to 100 Mhz */
 cpu_ppc_tb_init(env, TBFREQ);
 qemu_register_reset(ppc_core99_reset, cpu);
+
+tb = g_malloc0(sizeof(PPCTimebase));
+vmstate_register(NULL, -1, &vmstate_ppc_timebase, tb);
 }
 
 /* allocate RAM */
diff --git a/hw/ppc/mac_oldworld.c b/hw/ppc/mac_oldworld.c
index 8984398..45e410b 100644
--- a/hw/ppc/mac_oldworld.c
+++ b/hw/ppc/mac_oldworld.c
@@ -104,6 +104,7 @@ static void ppc_heathrow_init(MachineState *machine)
 DriveInfo *hd[MAX_IDE_BUS * MAX_IDE_DEVS];
 void *fw_cfg;
 uint64_t tbfreq;
+PPCTimebase *tb;
 
 linux_boot = (kernel_filename != NULL);
 
@@ -121,6 +122,9 @@ static void ppc_heathrow_init(MachineState *machine)
 /* Set time-base frequency to 16.6 Mhz */
 cpu_ppc_tb_init(env,  TBFREQ);
 qemu_register_reset(ppc_heathrow_reset, cpu);
+
+tb = g_malloc0(sizeof(PPCTimebase));
+vmstate_register(NULL, -1, &vmstate_ppc_timebase, tb);
 }
 
 /* allocate RAM */
-- 
1.7.10.4

[Qemu-devel] [PATCH 0/3] ppc: add timebase migration support to Mac machines

2016-01-31 Thread Mark Cave-Ayland

This patchset allows migration of the PPC timebase for g3beige/mac99
machines under TCG on non-PPC hosts.

The majority of the work is in patch 2: here the existing migration code is
split into PPC and non-PPC host codepaths (where the previous behaviour is
preserved). In effect, non-PPC hosts use QEMU's emulated timebase routines
which are based upon the guest virtual clock, but it is still possible to
migrate guests in the same manner.

Finally patch 3 enables the inclusion of the timebase in the migration stream
for both Old World and New World Macs.

Unfortunately I have no ability to test this on KVM-enabled hardware, however
it should preserve the existing behaviour, barring the bugfix in patch 1.

Signed-off-by: Mark Cave-Ayland 

Mark Cave-Ayland (3):
  ppc: fix timebase adjustment during migration
  ppc: add support for timebase migration on non-PPC hosts
  ppc: include timebase in migration stream for g3beige/mac99 machines

 hw/ppc/mac_newworld.c |4 
 hw/ppc/mac_oldworld.c |4 
 hw/ppc/ppc.c  |   35 ---
 3 files changed, 36 insertions(+), 7 deletions(-)

-- 
1.7.10.4

[Qemu-devel] [PATCH 2/3] ppc: add support for timebase migration on non-PPC hosts

2016-01-31 Thread Mark Cave-Ayland

This patch provides support for migration of the PPC guest timebase on non-PPC
host architectures (i.e. those using QEMU's virtual emulated timebase).
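
As a minimal sketch of the software-timebase case, the offset can be derived
from the guest virtual clock rather than from a host tick counter. The helper
names below are hypothetical and stand in for QEMU's own clock and muldiv64
routines; the nanosecond value is assumed to come from the virtual clock.

```c
#include <stdint.h>

#define SKETCH_NS_PER_SEC 1000000000ULL

/*
 * Convert nanoseconds of virtual time to timebase ticks at freq_hz.
 * Splitting into whole seconds and a remainder keeps the intermediate
 * product within 64 bits for realistic uptimes.
 */
static uint64_t sketch_ns_to_tb_ticks(uint64_t ns, uint32_t freq_hz)
{
    uint64_t secs = ns / SKETCH_NS_PER_SEC;
    uint64_t rem = ns % SKETCH_NS_PER_SEC;

    return secs * freq_hz + (rem * freq_hz) / SKETCH_NS_PER_SEC;
}

/*
 * Offset to apply so that (emulated ticks + offset) equals the timebase
 * value the guest should see after migration.
 */
static int64_t sketch_software_tb_offset(uint64_t guest_tb,
                                         uint64_t virtual_clock_ns,
                                         uint32_t freq_hz)
{
    return (int64_t)(guest_tb
                     - sketch_ns_to_tb_ticks(virtual_clock_ns, freq_hz));
}
```

For example, with a 100 MHz timebase and 2.5 s of virtual time, the emulated
counter stands at 250,000,000 ticks, so a target guest timebase of
1,000,000,000 needs an offset of 750,000,000.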

Signed-off-by: Mark Cave-Ayland 
---
 hw/ppc/ppc.c |   33 +++--
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index 19f4570..9b80c1d 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -832,6 +832,15 @@ static void cpu_ppc_set_tb_clk (void *opaque, uint32_t 
freq)
 cpu_ppc_store_purr(cpu, 0xULL);
 }
 
+static int host_cpu_is_ppc(void)
+{
+#if defined(_ARCH_PPC)
+return -1;
+#else
+return 0;
+#endif
+}
+
 static void timebase_pre_save(void *opaque)
 {
 PPCTimebase *tb = opaque;
@@ -844,11 +853,16 @@ static void timebase_pre_save(void *opaque)
 }
 
 tb->time_of_the_day_ns = qemu_clock_get_ns(QEMU_CLOCK_HOST);
-/*
- * tb_offset is only expected to be changed by migration so
- * there is no need to update it from KVM here
- */
-tb->guest_timebase = ticks + first_ppc_cpu->env.tb_env->tb_offset;
+
+if (host_cpu_is_ppc()) {
+/*
+ * tb_offset is only expected to be changed by migration so
+ * there is no need to update it from KVM here
+ */
+tb->guest_timebase = ticks + first_ppc_cpu->env.tb_env->tb_offset;
+} else {
+tb->guest_timebase = cpu_ppc_load_tbl(&first_ppc_cpu->env);
+}
 }
 
 static int timebase_post_load(void *opaque, int version_id)
@@ -879,7 +893,14 @@ static int timebase_post_load(void *opaque, int version_id)
  NANOSECONDS_PER_SECOND);
 guest_tb = tb_remote->guest_timebase + migration_duration_tb;
 
-tb_off_adj = guest_tb - cpu_get_host_ticks();
+if (host_cpu_is_ppc()) {
+/* Hardware timebase */
+tb_off_adj = guest_tb - cpu_get_host_ticks();
+} else {
+/* Software timebase */
+tb_off_adj = guest_tb - muldiv64(qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL),
+ freq, get_ticks_per_sec());
+}
 
 tb_off = first_ppc_cpu->env.tb_env->tb_offset;
 trace_ppc_tb_adjust(tb_off, tb_off_adj, tb_off_adj - tb_off,
-- 
1.7.10.4

[Qemu-devel] [PATCH 1/3] ppc: fix timebase adjustment during migration

2016-01-31 Thread Mark Cave-Ayland

ns_diff is already clamped to a minimum of 0 to prevent the timebase going
backwards during migration due to misaligned clocks. Following on from this
migration_duration_tb is also subject to the same constraint; hence the
expression MIN(0, migration_duration_tb) always evaluates to 0 and so no
timebase adjustment ever takes place.
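
The effect can be seen in a small standalone sketch. All names here are
hypothetical stand-ins for the variables in timebase_post_load(); the clamping
mirrors the description above (minimum of 0, maximum of one second).

```c
#include <stdint.h>

#define SKETCH_NS_PER_SEC 1000000000LL
#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))
#define SKETCH_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Migration duration in timebase ticks, clamped to the range [0, 1 s]. */
static int64_t sketch_migration_duration_tb(int64_t ns_diff_raw,
                                            uint32_t freq_hz)
{
    int64_t ns_diff = SKETCH_MAX(0, ns_diff_raw);
    int64_t duration_ns = SKETCH_MIN(SKETCH_NS_PER_SEC, ns_diff);

    /* duration_ns <= 1e9 and freq_hz < 2^32, so the product fits in 64 bits */
    return (int64_t)(((uint64_t)duration_ns * freq_hz)
                     / (uint64_t)SKETCH_NS_PER_SEC);
}

/* Old expression: MIN(0, x) with x >= 0 is always 0, so nothing is added. */
static uint64_t sketch_guest_tb_old(uint64_t remote_tb, int64_t ns_diff_raw,
                                    uint32_t freq_hz)
{
    int64_t tb = sketch_migration_duration_tb(ns_diff_raw, freq_hz);

    return remote_tb + (uint64_t)SKETCH_MIN(0, tb);
}

/* Fixed expression: the already-clamped duration is added directly. */
static uint64_t sketch_guest_tb_fixed(uint64_t remote_tb, int64_t ns_diff_raw,
                                      uint32_t freq_hz)
{
    return remote_tb
           + (uint64_t)sketch_migration_duration_tb(ns_diff_raw, freq_hz);
}
```

With a 100 MHz timebase and a migration that took 0.5 s, the fixed version
advances the guest timebase by 50,000,000 ticks while the old one leaves it
unchanged.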

Signed-off-by: Mark Cave-Ayland 
---
 hw/ppc/ppc.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index ce90b09..19f4570 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -877,7 +877,7 @@ static int timebase_post_load(void *opaque, int version_id)
 migration_duration_ns = MIN(NANOSECONDS_PER_SECOND, ns_diff);
 migration_duration_tb = muldiv64(migration_duration_ns, freq,
  NANOSECONDS_PER_SECOND);
-guest_tb = tb_remote->guest_timebase + MIN(0, migration_duration_tb);
+guest_tb = tb_remote->guest_timebase + migration_duration_tb;
 
 tb_off_adj = guest_tb - cpu_get_host_ticks();
 
-- 
1.7.10.4

Re: [Qemu-devel] [PATCH 3/3] ppc: include timebase in migration stream for g3beige/mac99 machines

2016-01-31 Thread Peter Maydell

On 31 January 2016 at 19:19, Mark Cave-Ayland
 wrote:
> Signed-off-by: Mark Cave-Ayland 
> ---
>  hw/ppc/mac_newworld.c |4 
>  hw/ppc/mac_oldworld.c |4 
>  2 files changed, 8 insertions(+)
>
> diff --git a/hw/ppc/mac_newworld.c b/hw/ppc/mac_newworld.c
> index f95086b..3283f1d 100644
> --- a/hw/ppc/mac_newworld.c
> +++ b/hw/ppc/mac_newworld.c
> @@ -179,6 +179,7 @@ static void ppc_core99_init(MachineState *machine)
>  int *token = g_new(int, 1);
>  hwaddr nvram_addr = 0xFFF04000;
>  uint64_t tbfreq;
> +PPCTimebase *tb;
>
>  linux_boot = (kernel_filename != NULL);
>
> @@ -201,6 +202,9 @@ static void ppc_core99_init(MachineState *machine)
>  /* Set time-base frequency to 100 Mhz */
>  cpu_ppc_tb_init(env, TBFREQ);
>  qemu_register_reset(ppc_core99_reset, cpu);
> +
> +tb = g_malloc0(sizeof(PPCTimebase));
> +vmstate_register(NULL, -1, &vmstate_ppc_timebase, tb);

Is there no way to avoid the vmstate_register here (ie to
tie the migration data to an actual device or CPU object) ?

thanks
-- PMM

Re: [Qemu-devel] [PATCH 3/3] ppc: include timebase in migration stream for g3beige/mac99 machines

2016-01-31 Thread Mark Cave-Ayland

On 31/01/16 19:58, Peter Maydell wrote:

> On 31 January 2016 at 19:19, Mark Cave-Ayland
>  wrote:
>> Signed-off-by: Mark Cave-Ayland 
>> ---
>>  hw/ppc/mac_newworld.c |4 
>>  hw/ppc/mac_oldworld.c |4 
>>  2 files changed, 8 insertions(+)
>>
>> diff --git a/hw/ppc/mac_newworld.c b/hw/ppc/mac_newworld.c
>> index f95086b..3283f1d 100644
>> --- a/hw/ppc/mac_newworld.c
>> +++ b/hw/ppc/mac_newworld.c
>> @@ -179,6 +179,7 @@ static void ppc_core99_init(MachineState *machine)
>>  int *token = g_new(int, 1);
>>  hwaddr nvram_addr = 0xFFF04000;
>>  uint64_t tbfreq;
>> +PPCTimebase *tb;
>>
>>  linux_boot = (kernel_filename != NULL);
>>
>> @@ -201,6 +202,9 @@ static void ppc_core99_init(MachineState *machine)
>>  /* Set time-base frequency to 100 Mhz */
>>  cpu_ppc_tb_init(env, TBFREQ);
>>  qemu_register_reset(ppc_core99_reset, cpu);
>> +
>> +tb = g_malloc0(sizeof(PPCTimebase));
>> +vmstate_register(NULL, -1, &vmstate_ppc_timebase, tb);
> 
> Is there no way to avoid the vmstate_register here (ie to
> tie the migration data to an actual device or CPU object) ?

Not exactly that I know of - although I shamelessly borrowed this part
from similar code in spapr which has this comment:

/* FIXME: Should register things through the MachineState's qdev
 * interface, this is a legacy from the sPAPREnvironment structure
 * which predated MachineState but had a similar function */

Is this something that is now possible?


ATB,

Mark.

Re: [Qemu-devel] Strange monitor/stdout issue on qemu-system-sparc/qemu-system-ppc

2016-01-31 Thread Paolo Bonzini



On 31/01/2016 18:54, Peter Maydell wrote:
> On 31 January 2016 at 17:19, Paolo Bonzini  wrote:
>> On 31/01/2016 16:54, Mark Cave-Ayland wrote:
>>> I also notice that with the above commit I lose cycling through history
>>> in the GTK monitor - even with the multiple echo, instead of the up/down
>>> arrow keys cycling through the history instead I see the codes ^[[B and
>>> ^[[A being output to the window instead.
>>
>> That is probably me.  The echo feature was introduced for QMP, but in
>> theory it should have been limited to that.  I'll check it, thanks.
> 
> I've also seen echo, but only intermittently...

That smells like uninitialized memory or something like that.

Actually I'm fairly sure I tested "-monitor vc" at least, so perhaps
it's an interaction between the echo feature and "qemu-char: add logfile
facility to all chardev backends".  Anyway I'll look at it.

Paolo

Re: [Qemu-devel] [PULL 00/39] ppc-for-2.6 queue 20160129

2016-01-31 Thread David Gibson

On Sat, Jan 30, 2016 at 11:29:43PM +1100, David Gibson wrote:
> On Fri, Jan 29, 2016 at 02:48:23PM +, Peter Maydell wrote:
> > On 29 January 2016 at 05:06, David Gibson  
> > wrote:
> > > The following changes since commit 
> > > 357e81c7e880f868833edf9f53cce1f3b09ea8ec:
> > >
> > >   Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20160128' into 
> > > staging (2016-01-28 11:46:34 +)
> > >
> > > are available in the git repository at:
> > >
> > >   git://github.com/dgibson/qemu.git tags/ppc-for-2.6-20160129
> > >
> > > for you to fetch changes up to 1699679e699276c0538008f6ca74cd04e6c68b42:
> > >
> > >   target-ppc: Make every FPSCR_ macro have a corresponding FP_ macro 
> > > (2016-01-29 14:01:52 +1100)
> > >
> > > 
> > > ppc patch queue for 2016-01-29
> > >
> > > Currently accumulated patches for target-ppc, pseries machine type and
> > > related devices.
> > >   * Cleanup of error handling code in spapr
> > >   * A number of fixes for Macintosh devices for the benefit of MacOS 9 
> > > and X
> > >   * Remove some abuses of the RTAS memory access functions in spapr
> > >   * Fixes for the gdbstub (and monitor debug) for VMX and VSX extensions.
> > >   * Fix pseries machine hotplug memory under TCG
> > >   * Clean up and extend handling of multiple page sizes with 64-bit hash 
> > > MMUs
> > >
> > 
> > Hi. Unfortunately this generates errors when built with clang:
> > 
> > /home/petmay01/linaro/qemu-for-merges/target-ppc/mmu_helper.c:660:20:
> > error: unused function 'ppc4xx_tlb_invalidate_virt'
> > [-Werror,-Wunused-function]
> > static inline void ppc4xx_tlb_invalidate_virt(CPUPPCState *env,
> >^
> > 1 error generated.
> > 
> > The function does appear from a quick grep to be entirely unused...
> > 
> > (GCC doesn't complain about this because it doesn't warn about unused
> > static inline functions in a .c file, but clang does.)
> 
> Dammit.  Sorry.
> 
> Now.. why didn't travis pick that up :/.

Turns out the answer is because the test for support of #pragma GCC
diagnostic turns off -Werror on clang builds for me.  So I wonder
what's different about your setup that -Werror is working with clang.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v2 2/2] target-ppc: mcrfs should always update FEX/VX and only clear exception bits

2016-01-31 Thread David Gibson

On Fri, Jan 29, 2016 at 06:40:21PM +0000, James Clarke wrote:
> Here is the description of the mcrfs instruction from the PowerPC Architecture
> Book, Version 2.02, Book I: PowerPC User Instruction Set Architecture
> (http://www.ibm.com/developerworks/systems/library/es-archguide-v2.html), 
> found
> on page 120:
> 
> The contents of FPSCR field BFA are copied to Condition Register field BF.
> All exception bits copied are set to 0 in the FPSCR. If the FX bit is
> copied, it is set to 0 in the FPSCR.
> 
> Special Registers Altered:
> CR field BF
> FX OX(if BFA=0)
> UX ZX XX VXSNAN  (if BFA=1)
> VXISI VXIDI VXZDZ VXIMZ  (if BFA=2)
> VXVC (if BFA=3)
> VXSOFT VXSQRT VXCVI  (if BFA=5)
> 
> However, currently every bit in FPSCR field BFA is set to 0, including ones 
> not
> on that list.
> 
> This can be seen in the following simple C program:
> 
> #include <fenv.h>
> #include <stdio.h>
> 
> int main(int argc, char **argv) {
> int ret;
> ret = fegetround();
> printf("Current rounding: %d\n", ret);
> ret = fesetround(FE_UPWARD);
> printf("Setting to FE_UPWARD (%d): %d\n", FE_UPWARD, ret);
> ret = fegetround();
> printf("Current rounding: %d\n", ret);
> ret = fegetround();
> printf("Current rounding: %d\n", ret);
> return 0;
> }
> 
> which gave the output (before this commit):
> 
> Current rounding: 0
> Setting to FE_UPWARD (2): 0
> Current rounding: 2
> Current rounding: 0
> 
> instead of (after this commit):
> 
> Current rounding: 0
> Setting to FE_UPWARD (2): 0
> Current rounding: 2
> Current rounding: 2
> 
> The relevant disassembly is in fegetround(), which, on my system, is:
> 
> __GI___fegetround:
> <+0>:   mcrfs  cr7, cr7
> <+4>:   mfcr   r3
> <+8>:   clrldi r3, r3, 62
> <+12>:  blr
> 
> What happens is that, the first time fegetround() is called, FPSCR field 7 is
> retrieved. However, because of the bug in mcrfs, the entirety of field 7 is 
> set
> to 0, which includes the rounding mode.
> 
> There are other issues this will fix, such as condition flags not persisting
> when they should if read, and if you were to read a specific field with some
> exception bits set, but no others were set in the entire register, then the
> bits would be cleared correctly, but FEX/VX would not be updated to 0 as they
> should be.
> 
> Signed-off-by: James Clarke 

Thanks for the fixup.  It actually looks like helper_store_fpscr()
should really take a target_ulong instead of u64 and have the (single)
caller which wants to pass a 64 do the truncate.  But that can be a
cleanup for another day.

Applied to ppc-for-2.6.
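
For readers following along, the intended mcrfs behaviour described in the
commit message can be sketched in plain C. This is only an illustration, not
the TCG code from the patch: mcrfs_sketch() and FP_EX_CLEAR_MASK are
hypothetical names, the mask value is derived from the "Special Registers
Altered" list quoted below assuming QEMU's LSB-0 bit numbering (FX = bit 31,
VXCVI = bit 8), and the FEX/VX recomputation is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Exception bits mcrfs may clear: FX, OX, UX, ZX, XX, VXSNAN, VXISI, VXIDI,
 * VXZDZ, VXIMZ, VXVC, VXSOFT, VXSQRT, VXCVI (assumed LSB-0 positions). */
#define FP_EX_CLEAR_MASK 0x9FF80700u

/* Hypothetical helper: returns the 4-bit value copied to the CR field and
 * clears only the exception bits of FPSCR field 'bfa' in *fpscr.
 * The FEX and VX summary bits are not recomputed in this sketch. */
uint32_t mcrfs_sketch(uint32_t *fpscr, unsigned bfa)
{
    assert(bfa < 8);                       /* FPSCR has eight 4-bit fields */
    unsigned shift = 4 * (7 - bfa);        /* field 0 is the top nibble */
    uint32_t field_mask = 0xFu << shift;
    uint32_t cr = (*fpscr >> shift) & 0xFu;

    /* Old behaviour was "*fpscr &= ~field_mask", which also wiped RN. */
    *fpscr &= ~(field_mask & FP_EX_CLEAR_MASK);
    return cr;
}
```

With this masking, reading field 7 (as fegetround() does) returns the
rounding mode and leaves it intact, while reading field 0 still clears FX
and OX.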

> ---
>  target-ppc/cpu.h   |  6 ++
>  target-ppc/translate.c | 21 +
>  2 files changed, 23 insertions(+), 4 deletions(-)
> 
> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
> index 3a967b7..d811bc9 100644
> --- a/target-ppc/cpu.h
> +++ b/target-ppc/cpu.h
> @@ -718,6 +718,12 @@ enum {
>  #define FP_RN1   (1ull << FPSCR_RN1)
>  #define FP_RN(1ull << FPSCR_RN)
>  
> +/* the exception bits which can be cleared by mcrfs - includes FX */
> +#define FP_EX_CLEAR_BITS (FP_FX | FP_OX | FP_UX | FP_ZX | \
> +  FP_XX | FP_VXSNAN | FP_VXISI  | FP_VXIDI  | \
> +  FP_VXZDZ  | FP_VXIMZ  | FP_VXVC   | FP_VXSOFT | \
> +  FP_VXSQRT | FP_VXCVI)
> +
>  
> /*/
>  /* Vector status and control register */
>  #define VSCR_NJ  16 /* Vector non-java */
> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> index 4be7eaa..ca10bd1 100644
> --- a/target-ppc/translate.c
> +++ b/target-ppc/translate.c
> @@ -2500,18 +2500,31 @@ static void gen_fmrgow(DisasContext *ctx)
>  static void gen_mcrfs(DisasContext *ctx)
>  {
>  TCGv tmp = tcg_temp_new();
> +TCGv_i32 tmask;
> +TCGv_i64 tnew_fpscr = tcg_temp_new_i64();
>  int bfa;
> +int nibble;
> +int shift;
>  
>  if (unlikely(!ctx->fpu_enabled)) {
>  gen_exception(ctx, POWERPC_EXCP_FPU);
>  return;
>  }
> -bfa = 4 * (7 - crfS(ctx->opcode));
> -tcg_gen_shri_tl(tmp, cpu_fpscr, bfa);
> +bfa = crfS(ctx->opcode);
> +nibble = 7 - bfa;
> +shift = 4 * nibble;
> +tcg_gen_shri_tl(tmp, cpu_fpscr, shift);
>  tcg_gen_trunc_tl_i32(cpu_crf[crfD(ctx->opcode)], tmp);
> -tcg_temp_free(tmp);
>  tcg_gen_andi_i32(cpu_crf[crfD(ctx->opcode)], cpu_crf[crfD(ctx->opcode)], 
> 0xf);
> -tcg_gen_andi_tl(cpu_fpscr, cpu_fpscr, ~(0xF << bfa));
> +tcg_temp_free(tmp);
> +tcg_gen_extu_tl_i64(tnew_fpscr, cpu_fpscr);
> +/* Only the exception bits (including FX) should be cleared if read */
> +tcg_gen_andi_i64(tnew_fpscr, tnew_f

Re: [Qemu-devel] [PATCH v2 2/2] target-ppc: mcrfs should always update FEX/VX and only clear exception bits

2016-01-31 Thread James Clarke

> On 31 Jan 2016, at 23:50, David Gibson  wrote:
> On Fri, Jan 29, 2016 at 06:40:21PM +, James Clarke wrote:
>> Here is the description of the mcrfs instruction from the PowerPC 
>> Architecture
>> Book, Version 2.02, Book I: PowerPC User Instruction Set Architecture
>> (http://www.ibm.com/developerworks/systems/library/es-archguide-v2.html), 
>> found
>> on page 120:
>> 
>>The contents of FPSCR field BFA are copied to Condition Register field BF.
>>All exception bits copied are set to 0 in the FPSCR. If the FX bit is
>>copied, it is set to 0 in the FPSCR.
>> 
>>Special Registers Altered:
>>CR field BF
>>FX OX(if BFA=0)
>>UX ZX XX VXSNAN  (if BFA=1)
>>VXISI VXIDI VXZDZ VXIMZ  (if BFA=2)
>>VXVC (if BFA=3)
>>VXSOFT VXSQRT VXCVI  (if BFA=5)
>> 
>> However, currently every bit in FPSCR field BFA is set to 0, including ones 
>> not
>> on that list.
>> 
>> This can be seen in the following simple C program:
>> 
>>#include <fenv.h>
>>#include <stdio.h>
>> 
>>int main(int argc, char **argv) {
>>int ret;
>>ret = fegetround();
>>printf("Current rounding: %d\n", ret);
>>ret = fesetround(FE_UPWARD);
>>printf("Setting to FE_UPWARD (%d): %d\n", FE_UPWARD, ret);
>>ret = fegetround();
>>printf("Current rounding: %d\n", ret);
>>ret = fegetround();
>>printf("Current rounding: %d\n", ret);
>>return 0;
>>}
>> 
>> which gave the output (before this commit):
>> 
>>Current rounding: 0
>>Setting to FE_UPWARD (2): 0
>>Current rounding: 2
>>Current rounding: 0
>> 
>> instead of (after this commit):
>> 
>>Current rounding: 0
>>Setting to FE_UPWARD (2): 0
>>Current rounding: 2
>>Current rounding: 2
>> 
>> The relevant disassembly is in fegetround(), which, on my system, is:
>> 
>>__GI___fegetround:
>><+0>:   mcrfs  cr7, cr7
>><+4>:   mfcr   r3
>><+8>:   clrldi r3, r3, 62
>><+12>:  blr
>> 
>> What happens is that, the first time fegetround() is called, FPSCR field 7 is
>> retrieved. However, because of the bug in mcrfs, the entirety of field 7 is 
>> set
>> to 0, which includes the rounding mode.
>> 
>> There are other issues this will fix, such as condition flags not persisting
>> when they should if read, and if you were to read a specific field with some
>> exception bits set, but no others were set in the entire register, then the
>> bits would be cleared correctly, but FEX/VX would not be updated to 0 as they
>> should be.
>> 
>> Signed-off-by: James Clarke 
> 
> Thanks for the fixup.  It actually looks like helper_store_fpscr()
> should really take a target_ulong instead of u64 and have the (single)
> caller which wants to pass a 64 do the truncate.  But that can be a
> cleanup for another day.
> 
> Applied to ppc-for-2.6.

Great, thanks. I agree it seems odd, especially given the argument is cast to
target_ulong, but that’s a more invasive change.

> 
>> ---
>> target-ppc/cpu.h   |  6 ++
>> target-ppc/translate.c | 21 +
>> 2 files changed, 23 insertions(+), 4 deletions(-)
>> 
>> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
>> index 3a967b7..d811bc9 100644
>> --- a/target-ppc/cpu.h
>> +++ b/target-ppc/cpu.h
>> @@ -718,6 +718,12 @@ enum {
>> #define FP_RN1   (1ull << FPSCR_RN1)
>> #define FP_RN(1ull << FPSCR_RN)
>> 
>> +/* the exception bits which can be cleared by mcrfs - includes FX */
>> +#define FP_EX_CLEAR_BITS (FP_FX | FP_OX | FP_UX | FP_ZX | \
>> +  FP_XX | FP_VXSNAN | FP_VXISI  | FP_VXIDI  | \
>> +  FP_VXZDZ  | FP_VXIMZ  | FP_VXVC   | FP_VXSOFT | \
>> +  FP_VXSQRT | FP_VXCVI)
>> +
>> /*/
>> /* Vector status and control register */
>> #define VSCR_NJ  16 /* Vector non-java */
>> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
>> index 4be7eaa..ca10bd1 100644
>> --- a/target-ppc/translate.c
>> +++ b/target-ppc/translate.c
>> @@ -2500,18 +2500,31 @@ static void gen_fmrgow(DisasContext *ctx)
>> static void gen_mcrfs(DisasContext *ctx)
>> {
>> TCGv tmp = tcg_temp_new();
>> +TCGv_i32 tmask;
>> +TCGv_i64 tnew_fpscr = tcg_temp_new_i64();
>> int bfa;
>> +int nibble;
>> +int shift;
>> 
>> if (unlikely(!ctx->fpu_enabled)) {
>> gen_exception(ctx, POWERPC_EXCP_FPU);
>> return;
>> }
>> -bfa = 4 * (7 - crfS(ctx->opcode));
>> -tcg_gen_shri_tl(tmp, cpu_fpscr, bfa);
>> +bfa = crfS(ctx->opcode);
>> +nibble = 7 - bfa;
>> +shift = 4 * nibble;
>> +tcg_gen_shri_tl(tmp, cpu_fpscr, shift);
>> tcg_gen_trunc_tl_i32(cpu_crf[crfD(ctx->opcode)], tmp);
>> -tcg_temp_free(tmp);
>> tcg_gen_andi_i32(cpu_crf[crfD(ctx->opcode)], cpu_crf[crfD(ctx->opcode)], 
>> 0

Re: [Qemu-devel] [PATCH v2 0/3] qemu-nbd.texi formatting, grammar and completeness fixes

2016-01-31 Thread Paolo Bonzini



On 31/01/2016 14:25, Sitsofe Wheeler wrote:
>> > Thanks, queued.  Will send a pull request some time next week.
> Just checking - did this one get lost? Nothing's popped up in the QEMU
> git repos yet...

Hmm, yes.  Thanks for telling me.

Paolo

Re: [Qemu-devel] [PATCH v14 7/8] Implement new driver for block replication

2016-01-31 Thread Wen Congyang

On 01/29/2016 11:46 PM, Stefan Hajnoczi wrote:
> On Fri, Jan 29, 2016 at 11:13:42AM +0800, Changlong Xie wrote:
>> On 01/28/2016 11:15 PM, Stefan Hajnoczi wrote:
>>> On Thu, Jan 28, 2016 at 09:13:24AM +0800, Wen Congyang wrote:
>>>> On 01/27/2016 10:46 PM, Stefan Hajnoczi wrote:
>>>>> On Wed, Jan 13, 2016 at 05:18:31PM +0800, Changlong Xie wrote:
>>> I'm concerned that the bdrv_drain_all() in vm_stop() can take a long
>>> time if the disk is slow/failing.  bdrv_drain_all() blocks until all
>>> in-flight I/O requests have completed.  What does the Primary do if the
>>> Secondary becomes unresponsive?
>>
>> Actually, we knew this problem. But currently, there seems no better way to
>> resolve it. If you have any ideas?
> 
> Is it possible to hold the checkpoint information and acknowledge the
> checkpoint right away, without waiting for bdrv_drain_all() or any
> Secondory guest activity to complete?

There is no way to know that the secondary has become unresponsive.

> 
> I think this really means falling back to microcheckpointing until the
> Secondary guest can checkpoint.  Instead of a blocking vm_stop() we
> would prevent vcpus from running and when the last pending I/O finishes
> the Secondary could apply the last checkpoint.  This approach does not
> block QEMU (the monitor, etc).
> 

If the secondary host becomes unresponsive, it means that we cannot do
microcheckpointing.
We should do failover in this case.

Thanks
Wen Congyang

Re: [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints

2016-01-31 Thread Wen Congyang

On 01/29/2016 06:47 PM, Dr. David Alan Gilbert wrote:
> * Wen Congyang (we...@cn.fujitsu.com) wrote:
>> On 01/29/2016 06:07 PM, Dr. David Alan Gilbert wrote:
>>> * Wen Congyang (we...@cn.fujitsu.com) wrote:
>>>> On 01/27/2016 07:03 PM, Dr. David Alan Gilbert wrote:
>>>>> Hi,
>>>>>   I've got a block error if I kill the secondary.
>>>>>
>>>>> Start both primary & secondary
>>>>> kill -9 secondary qemu
>>>>> x_colo_lost_heartbeat on primary
>>>>>
>>>>> The guest sees a block error and the ext4 root switches to read-only.
>>>>>
>>>>> I gdb'd the primary with a breakpoint on quorum_report_bad; see
>>>>> backtrace below.
>>>>> (This is based on colo-v2.4-periodic-mode of the framework
>>>>> code with the block and network proxy merged in; so it could be my
>>>>> merging but I don't think so ?)
>>>>>
>>>>>
>>>>> (gdb) where
>>>>> #0  quorum_report_bad (node_name=0x7f2946a0892c "node0", ret=-5,
>>>>> acb=0x7f2946cb3910, acb=0x7f2946cb3910)
>>>>> at /root/colo/jan-2016/qemu/block/quorum.c:222
>>>>> #1  0x7f2943b23058 in quorum_aio_cb (opaque=<optimized out>,
>>>>> ret=<optimized out>)
>>>>> at /root/colo/jan-2016/qemu/block/quorum.c:315
>>>>> #2  0x7f2943b311be in bdrv_co_complete (acb=0x7f2946cb3f60) at
>>>>> /root/colo/jan-2016/qemu/block/io.c:2122
>>>>> #3  0x7f2943ae777d in aio_bh_call (bh=<optimized out>) at
>>>>> /root/colo/jan-2016/qemu/async.c:64
>>>>> #4  aio_bh_poll (ctx=ctx@entry=0x7f2945b771d0) at
>>>>> /root/colo/jan-2016/qemu/async.c:92
>>>>> #5  0x7f2943af5090 in aio_dispatch (ctx=0x7f2945b771d0) at
>>>>> /root/colo/jan-2016/qemu/aio-posix.c:305
>>>>> #6  0x7f2943ae756e in aio_ctx_dispatch (source=<optimized out>,
>>>>> callback=<optimized out>,
>>>>> user_data=<optimized out>) at /root/colo/jan-2016/qemu/async.c:231
>>>>> #7  0x7f293b84a79a in g_main_context_dispatch () from
>>>>> /lib64/libglib-2.0.so.0
>>>>> #8  0x7f2943af3a00 in glib_pollfds_poll () at
>>>>> /root/colo/jan-2016/qemu/main-loop.c:211
>>>>> #9  os_host_main_loop_wait (timeout=<optimized out>) at
>>>>> /root/colo/jan-2016/qemu/main-loop.c:256
>>>>> #10 main_loop_wait (nonblocking=<optimized out>) at
>>>>> /root/colo/jan-2016/qemu/main-loop.c:504
>>>>> #11 0x7f29438529ee in main_loop () at
>>>>> /root/colo/jan-2016/qemu/vl.c:1945
>>>>> #12 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /root/colo/jan-2016/qemu/vl.c:4707
>>>>>
>>>>> (gdb) p s->num_children
>>>>> $1 = 2
>>>>> (gdb) p acb->success_count
>>>>> $2 = 0
>>>>> (gdb) p acb->is_read
>>>>> $5 = false

>>>> Sorry for the late reply.
>>>
>>> No problem.
>>>
>>>> What it the value of acb->count?
>>>
>>> (gdb) p acb->count
>>> $1 = 1
>>
>> Note, the count is 1, not 2. Writing to children.0 is in flight. If writing 
>> to children.0 successes,
>> the guest doesn't know this error.
>>>> If secondary host is down, you should remove quorum's children.1.
>>>> Otherwise, you will get
>>>> I/O error event.
>>>
>>> Is that safe?  If the secondary fails, do you always have time to issue the 
>>> command to
>>> remove the children.1  before the guest sees the error?
>>
>> We will write to two children, and expect that writing to children.0 will 
>> success. If so,
>> the guest doesn't know this error. You just get the I/O error event.
> 
> I think children.0 is the disk, and that should be OK - so only the 
> children.1/replication should
> be failing - so in that case why do I see the error?

I don't know; I will check the code.

> The 'node0' in the backtrace above is the name of the replication, so it does 
> look like the error
> is coming from the replication.

No, the backtrace just shows an I/O error event being reported to the
management application.

> 
>>> Anyway, I tried removing children.1 but it segfaults now, I guess the 
>>> replication is unhappy:
>>>
>>> (qemu) x_block_change colo-disk0 -d children.1
>>> (qemu) x_colo_lost_heartbeat 
>>
>> Hmm, you should not remove the child before failover. I will check it how to 
>> avoid it in the codes.
> 
>  But you said 'If secondary host is down, you should remove quorum's 
> children.1' - is that not
> what you meant?

Yes, you should execute 'x_colo_lost_heartbeat' first, and then execute
'x_block_change ... -d ...'.

> 
>>> 12973 Segmentation fault  (core dumped) 
>>> ./try/x86_64-softmmu/qemu-system-x86_64 -enable-kvm $console_param -S -boot 
>>> c -m 4080 -smp 4 -machine pc-i440fx-2.5,accel=kvm -name debug-threads=on 
>>> -trace events=trace-file -device virtio-rng-pci $block_param $net_param
>>>
>>> #0  0x7f0a398a864c in bdrv_stop_replication (bs=0x7f0a3b0a8430, 
>>> failover=true, errp=0x7fff6a5c3420)
>>> at /root/colo/jan-2016/qemu/block.c:4426
>>>
>>> (gdb) p drv
>>> $1 = (BlockDriver *) 0x5d2a
>>>
>>>   it looks like the whole of bs is bogus.
>>>
>>> #1  0x7f0a398d87f6 in quorum_stop_replication (bs=<optimized out>,
>>> failover=<optimized out>,
>>> errp=<optimized out>) at /root/colo/jan-2016/qemu/block/quorum.c:1213
>>>
>>> (gdb) p s->replication_index
>>> $3 = 1
>>>
>>> I guess quorum_del_child needs to stop replication before it removes the 
>>> child?
>>
>> Yes, but in the newest version, quorum doesn't know the

Re: [Qemu-devel] [PATCH 1/3] ppc: fix timebase adjustment during migration

2016-01-31 Thread David Gibson

On Sun, Jan 31, 2016 at 07:19:34PM +0000, Mark Cave-Ayland wrote:
> ns_diff is already clamped to a minimum of 0 to prevent the timebase going
> backwards during migration due to misaligned clocks. Following on from this
> migration_duration_tb is also subject to the same constraint; hence the
> expression MIN(0, migration_duration_tb) always evaluates to 0 and so no
> timebase adjustment ever takes place.
> 
> Signed-off-by: Mark Cave-Ayland 

So, there are actually two problems here, which could be expressed a
bit more clearly in the commit message.

First, this clamping is redundant, because of the earlier clamp on
ns_diff.  Well.. probably.. I do wonder if we could get an overflow
anywhere giving us a negative number again.

More importantly, though, this is supposed to be a clamp below, which
needs a MAX.  MIN is Just Plain Wrong.
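
To make the MIN/MAX point concrete, here is a small standalone C sketch. It
is not code from hw/ppc/ppc.c: buggy_clamp(), clamp_below_zero() and
min_max_demo() are hypothetical names, and the macros are defined locally so
the example compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif
#ifndef MAX
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#endif

/* What the old expression did: any non-negative duration collapses to 0,
 * and a negative one passes through unchanged. */
int64_t buggy_clamp(int64_t duration_tb)
{
    return MIN(0, duration_tb);
}

/* A clamp below at 0: negative durations become 0, others are kept. */
int64_t clamp_below_zero(int64_t duration_tb)
{
    return MAX(0, duration_tb);
}

/* Small self-check showing the difference between the two. */
void min_max_demo(void)
{
    assert(buggy_clamp(16000000) == 0);          /* adjustment is lost */
    assert(clamp_below_zero(16000000) == 16000000);
    assert(buggy_clamp(-5) == -5);               /* lets time go backwards */
    assert(clamp_below_zero(-5) == 0);
}
```

Since ns_diff has already been clamped at 0 earlier on, dropping the clamp
(as the patch does) and using MAX should behave the same, barring the
overflow worry mentioned above.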

> ---
>  hw/ppc/ppc.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
> index ce90b09..19f4570 100644
> --- a/hw/ppc/ppc.c
> +++ b/hw/ppc/ppc.c
> @@ -877,7 +877,7 @@ static int timebase_post_load(void *opaque, int 
> version_id)
>  migration_duration_ns = MIN(NANOSECONDS_PER_SECOND, ns_diff);
>  migration_duration_tb = muldiv64(migration_duration_ns, freq,
>   NANOSECONDS_PER_SECOND);
> -guest_tb = tb_remote->guest_timebase + MIN(0, migration_duration_tb);
> +guest_tb = tb_remote->guest_timebase + migration_duration_tb;
>  
>  tb_off_adj = guest_tb - cpu_get_host_ticks();
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH 3/3] ppc: include timebase in migration stream for g3beige/mac99 machines

2016-01-31 Thread David Gibson

On Sun, Jan 31, 2016 at 08:10:08PM +0000, Mark Cave-Ayland wrote:
> On 31/01/16 19:58, Peter Maydell wrote:
> 
> > On 31 January 2016 at 19:19, Mark Cave-Ayland
> >  wrote:
> >> Signed-off-by: Mark Cave-Ayland 
> >> ---
> >>  hw/ppc/mac_newworld.c |4 
> >>  hw/ppc/mac_oldworld.c |4 
> >>  2 files changed, 8 insertions(+)
> >>
> >> diff --git a/hw/ppc/mac_newworld.c b/hw/ppc/mac_newworld.c
> >> index f95086b..3283f1d 100644
> >> --- a/hw/ppc/mac_newworld.c
> >> +++ b/hw/ppc/mac_newworld.c
> >> @@ -179,6 +179,7 @@ static void ppc_core99_init(MachineState *machine)
> >>  int *token = g_new(int, 1);
> >>  hwaddr nvram_addr = 0xFFF04000;
> >>  uint64_t tbfreq;
> >> +PPCTimebase *tb;
> >>
> >>  linux_boot = (kernel_filename != NULL);
> >>
> >> @@ -201,6 +202,9 @@ static void ppc_core99_init(MachineState *machine)
> >>  /* Set time-base frequency to 100 Mhz */
> >>  cpu_ppc_tb_init(env, TBFREQ);
> >>  qemu_register_reset(ppc_core99_reset, cpu);
> >> +
> >> +tb = g_malloc0(sizeof(PPCTimebase));
> >> +vmstate_register(NULL, -1, &vmstate_ppc_timebase, tb);
> > 
> > Is there no way to avoid the vmstate_register here (ie to
> > tie the migration data to an actual device or CPU object) ?
> 
> Not exactly that I know of - although I shamelessly borrowed this part
> from similar code in spapr which has this comment:
> 
> /* FIXME: Should register things through the MachineState's qdev
>  * interface, this is a legacy from the sPAPREnvironment structure
>  * which predated MachineState but had a similar function */
> 
> Is this something that is now possible?

Well, it's certainly possible to do better than this.  You want to
make a vmstate_g3beige and vmstate_mac99 which contain all the machine
level things to migrate for these machines, similar to vmstate_spapr.
They will be attached to the MachineState object.

That will at least mean that if more things need to get added to
migration for these machines, then additional vmstate_register() calls
won't be needed.

I'm not sure if there's a better way to register a vmstate for a
machine type.  I thought there was, but I couldn't spot it in a quick
look.

Peter,

I believe this does need to be attached to the machine, not to the
cpu, even though the cpu would seem to make more sense on a first
look.  The reason is that attaching it to the cpu means it will be
transferred separately for each cpu, and unless we're super-careful
about timing the destination cpus could end up with slightly different
values.  That would be bad, because ppc has a pretty strong
requirement that the timebases be synchronized across all cpus in an
smp system.  The means of initially accomplishing that vary by
platform - usually there's some board level register to freeze /
resume all the timebases - but however it's been done, we don't want
to mess it up on migration.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


Re: [Qemu-devel] [PATCH 2/3] ppc: add support for timebase migration on non-PPC hosts

2016-01-31 Thread David Gibson

On Sun, Jan 31, 2016 at 07:19:35PM +0000, Mark Cave-Ayland wrote:
> This patch provides support for migration of the PPC guest timebase on non-PPC
> host architectures (i.e those using QEMU's virtual emulated timebase).
> 
> Signed-off-by: Mark Cave-Ayland 

We shouldn't need an explicit test for a ppc host.  Instead we should
never be touching any host-dependent ticks values, only using host
side interfaces which work in realtime units like ns.

Worse, the ppc host variants here will still be wrong if the host has
a different timebase frequency to the guest, which will always be true
for a g3beige (16MHz) on a modern ppc host (512 MHz).
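
As a rough illustration of working in realtime units, the sketch below
converts a nanosecond interval into timebase ticks at a given frequency. It
is a standalone stand-in for the muldiv64(ns, freq, NANOSECONDS_PER_SECOND)
pattern used in the patch, not QEMU code; ns_to_tb_ticks() is a hypothetical
name and the two frequencies are simply the ones mentioned above.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL

/* Hypothetical helper: floor(ns * freq_hz / 1e9) without a 128-bit
 * intermediate.  Splitting ns into whole seconds and a remainder keeps
 * each product within 64 bits unless the true result itself overflows. */
uint64_t ns_to_tb_ticks(uint64_t ns, uint32_t freq_hz)
{
    assert(freq_hz > 0);                  /* a zero timebase rate is meaningless */
    uint64_t secs = ns / NS_PER_SEC;
    uint64_t rem  = ns % NS_PER_SEC;      /* < 2^30, so rem * freq_hz < 2^62 */
    return secs * freq_hz + (rem * freq_hz) / NS_PER_SEC;
}
```

The same one-second interval is 16,000,000 ticks at 16 MHz but 512,000,000
ticks at 512 MHz, which is why raw host tick counts can't be mixed with a
guest timebase running at a different rate.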


> ---
>  hw/ppc/ppc.c |   33 +++--
>  1 file changed, 27 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
> index 19f4570..9b80c1d 100644
> --- a/hw/ppc/ppc.c
> +++ b/hw/ppc/ppc.c
> @@ -832,6 +832,15 @@ static void cpu_ppc_set_tb_clk (void *opaque, uint32_t 
> freq)
>  cpu_ppc_store_purr(cpu, 0xULL);
>  }
>  
> +static int host_cpu_is_ppc(void)
> +{
> +#if defined(_ARCH_PPC)
> +return -1;
> +#else
> +return 0;
> +#endif
> +}
> +
>  static void timebase_pre_save(void *opaque)
>  {
>  PPCTimebase *tb = opaque;
> @@ -844,11 +853,16 @@ static void timebase_pre_save(void *opaque)
>  }
>  
>  tb->time_of_the_day_ns = qemu_clock_get_ns(QEMU_CLOCK_HOST);
> -/*
> - * tb_offset is only expected to be changed by migration so
> - * there is no need to update it from KVM here
> - */
> -tb->guest_timebase = ticks + first_ppc_cpu->env.tb_env->tb_offset;
> +
> +if (host_cpu_is_ppc()) {
> +/*
> + * tb_offset is only expected to be changed by migration so
> + * there is no need to update it from KVM here
> + */
> +tb->guest_timebase = ticks + first_ppc_cpu->env.tb_env->tb_offset;
> +} else {
> +tb->guest_timebase = cpu_ppc_load_tbl(&first_ppc_cpu->env);
> +}
>  }
>  
>  static int timebase_post_load(void *opaque, int version_id)
> @@ -879,7 +893,14 @@ static int timebase_post_load(void *opaque, int 
> version_id)
>   NANOSECONDS_PER_SECOND);
>  guest_tb = tb_remote->guest_timebase + migration_duration_tb;
>  
> -tb_off_adj = guest_tb - cpu_get_host_ticks();
> +if (host_cpu_is_ppc()) {
> +/* Hardware timebase */
> +tb_off_adj = guest_tb - cpu_get_host_ticks();
> +} else {
> +/* Software timebase */
> +tb_off_adj = guest_tb - 
> muldiv64(qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL),
> + freq, get_ticks_per_sec());
> +}
>  
>  tb_off = first_ppc_cpu->env.tb_env->tb_offset;
>  trace_ppc_tb_adjust(tb_off, tb_off_adj, tb_off_adj - tb_off,

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] Migrating decrementer

2016-01-31 Thread David Gibson

On Tue, Jan 26, 2016 at 10:31:19PM +0000, Mark Cave-Ayland wrote:
> On 25/01/16 11:10, David Gibson wrote:
> 
> > Um.. so the migration duration is a complete red herring, regardless
> > of the units.
> > 
> > Remember, we only ever compute the guest timebase value at the moment
> > the guest requests it - actually maintaining a current timebase value
> > makes sense in hardware, but would be nuts in software.
> > 
> > The timebase is a function of real, wall-clock time, and the migration
> > destination has a notion of wall-clock time without reference to the
> > source.
> > 
> > So what you need to transmit for migration is enough information to
> > compute the guest timebase from real-time - essentially just an offset
> > between real-time and the timebase.
> > 
> > The guest can potentially observe the migration duration as a jump in
> > timebase values, but qemu doesn't need to do any calculations with it.
> 
> Thanks for more pointers - I think I'm slowly getting there. My current
> thoughts are that the basic migration algorithm is doing the right thing
> in that it works out the number of host ticks different between source
> and destination.

Sorry, I've taken a while to reply to this.  I realised the tb
migration didn't work the way I thought it did, so I've had to get my
head around what's actually going on.

I had thought that it transferred only meta-information telling the
destination how to calculate the timebase, without actually working
out the timebase value at any particular moment.

In fact, what it sends is basically the tuple of (timebase, realtime)
at the point of sending the migration stream.  The destination then
uses that to work out how to compute the timebase from realtime there.

I'm not convinced this is a great approach, but it should basically
work.  However, as you've seen there are also some Just Plain Bugs in
the logic for this.

> I have a slight query with this section of code though:
> 
> migration_duration_tb = muldiv64(migration_duration_ns, freq,
>  NANOSECONDS_PER_SECOND);
> 
> This is not technically correct on TCG x86 since the timebase is the x86
> TSC which is running somewhere in the GHz range, compared to freq which
> is hard-coded to 16MHz.

Um.. what?  AFAICT that line doesn't have any reference to the TSC
speed.  Just ns and the (guest) tb.  Also 16MHz is only for the
oldworld Macs - modern ppc cpus have the TB frequency architected as
512MHz.

> However this doesn't seem to matter because the
> timebase adjustment is limited to a maximum of 1s. Why should this be if
> the timebase is supposed to be free running as you mentioned in a
> previous email?

AFAICT, what it's doing here is assuming that if the migration
duration is >1s (or appears to be >1s) then it's because the host
clocks are out of sync and so just capping the elapsed tb time at 1s.

That's just wrong, IMO.  1s is a long downtime for a live migration,
but it's not impossible, and it will happen nearly always in the
scenario you've discussed of manually loading the migration stream
from a file.

But more to the point, trying to maintain correctness of the timebase
when the hosts are out of sync is basically futile.  There's no other
reference we can use, so all we can achieve is getting a different
wrong value from what we'd get by blindly trusting the host clock.

We do need to constrain the tb from going backwards, because that will
cause chaos on the guest, but otherwise we should just trust the host
clock and ditch that 1s clamp.  If the hosts are out of sync, then
guest time will jump, but that was always going to happen.
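
A minimal sketch of that policy, assuming both hosts report real time
in nanoseconds: the elapsed timebase is never negative, and there is no
upper cap.  The function name is hypothetical and this is not the
existing QEMU logic.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL

/*
 * Timebase ticks to add on the destination for the time spent migrating.
 * A destination clock that appears to be behind the source yields 0, so
 * the guest never sees the timebase go backwards.  Long durations are
 * trusted as-is rather than clamped to 1s.
 */
static uint64_t elapsed_tb(int64_t src_time_ns, int64_t dst_time_ns,
                           uint64_t tb_freq)
{
    uint64_t ns;

    assert(tb_freq > 0);
    if (dst_time_ns <= src_time_ns) {
        return 0;
    }
    ns = (uint64_t)(dst_time_ns - src_time_ns);
    /* Split into seconds + remainder to avoid 64-bit overflow. */
    return (ns / NS_PER_SEC) * tb_freq
           + (ns % NS_PER_SEC) * tb_freq / NS_PER_SEC;
}
```

For example, a 3s gap at 512MHz gives 1536000000 ticks where the 1s
clamp would have given 512000000.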

> AFAICT the main problem on TCG x86 is that post-migration the timebase
> calculated by cpu_ppc_get_tb() is incorrect:
> 
> uint64_t cpu_ppc_get_tb(ppc_tb_t *tb_env, uint64_t vmclk, int64_t tb_offset)
> {
> /* TB time in tb periods */
> return muldiv64(vmclk, tb_env->tb_freq, get_ticks_per_sec()) +
> tb_offset;
> }

So the problem here is that get_ticks_per_sec() (which always returns
1,000,000,000) is not talking about the same ticks as
cpu_get_host_ticks().  That may not have been true when this code was
written.
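
To make the unit mismatch concrete, here is a small stand-in for the
muldiv64() conversion (a sketch, not QEMU's implementation; the 2.5GHz
host counter rate is a made-up figure).  Dividing by 1,000,000,000 is
only right when the input counts nanoseconds.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scale a tick count from one rate to another: count * to_hz / from_hz.
 * Splitting off the whole seconds keeps the intermediate product inside
 * 64 bits for the rates used here (from_hz * to_hz must fit in 64 bits).
 */
static uint64_t scale_ticks(uint64_t count, uint64_t to_hz, uint64_t from_hz)
{
    assert(from_hz > 0);
    return (count / from_hz) * to_hz + (count % from_hz) * to_hz / from_hz;
}
```

scale_ticks(7040725511, 16000000, 1000000000) turns the virtual clock
value from the savevm trace into 112651608 ticks of a 16MHz timebase.
One second of a hypothetical 2.5GHz host counter is 16000000 ticks when
divided by its real rate, but 40000000 when wrongly treated as
nanoseconds.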

> For a typical savevm/loadvm pair I see something like this:
> 
> savevm:
> 
> tb->guest_timebase = 26281306490558
> qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) = 7040725511
> 
> loadvm:
> 
> cpu_get_host_ticks() = 26289847005259
> tb_off_adj = -8540514701
> qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) = 7040725511
> cpu_ppc_get_tb() = -15785159386
> 
> But as cpu_ppc_get_tb() uses QEMU_CLOCK_VIRTUAL for vmclk we end up with
> a negative number for the timebase since the virtual clock is dwarfed by
> the number of TSC ticks calculated for tb_off_adj. This will work on a
> PPC host though since cpu_host_get_ticks() is also derived from the
> timebase.

Yeah, we shouldn't be using cpu_get_host_ticks() at all - or anything
else which depends on a host frequency.  We should only be using qemu
interfaces

[Qemu-devel] [PULL 04/40] macio: add dma_active to VMStateDescription

2016-01-31 Thread David Gibson

From: Mark Cave-Ayland 

Make sure that we include the value of dma_active in the migration stream.

Signed-off-by: Mark Cave-Ayland 
Acked-by: John Snow 
Signed-off-by: David Gibson 
---
 hw/ide/macio.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/ide/macio.c b/hw/ide/macio.c
index bfdc377..1725e5b 100644
--- a/hw/ide/macio.c
+++ b/hw/ide/macio.c
@@ -517,11 +517,12 @@ static const MemoryRegionOps pmac_ide_ops = {
 
 static const VMStateDescription vmstate_pmac = {
 .name = "ide",
-.version_id = 3,
+.version_id = 4,
 .minimum_version_id = 0,
 .fields = (VMStateField[]) {
 VMSTATE_IDE_BUS(bus, MACIOIDEState),
 VMSTATE_IDE_DRIVES(bus.ifs, MACIOIDEState),
+VMSTATE_BOOL(dma_active, MACIOIDEState),
 VMSTATE_END_OF_LIST()
 }
 };
-- 
2.5.0

[Qemu-devel] [PULL 02/40] target-ppc: use cpu_write_xer() helper in cpu_post_load

2016-01-31 Thread David Gibson

From: Mark Cave-Ayland 

Otherwise some internal xer variables fail to get set post-migration.

Signed-off-by: Mark Cave-Ayland 
Reviewed-by: Alexey Kardashevskiy 
Signed-off-by: David Gibson 
---
 target-ppc/machine.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target-ppc/machine.c b/target-ppc/machine.c
index 8e30b7a..8cabc77 100644
--- a/target-ppc/machine.c
+++ b/target-ppc/machine.c
@@ -169,7 +169,7 @@ static int cpu_post_load(void *opaque, int version_id)
 env->spr[SPR_PVR] = env->spr_cb[SPR_PVR].default_value;
 env->lr = env->spr[SPR_LR];
 env->ctr = env->spr[SPR_CTR];
-env->xer = env->spr[SPR_XER];
+cpu_write_xer(env, env->spr[SPR_XER]);
 #if defined(TARGET_PPC64)
 env->cfar = env->spr[SPR_CFAR];
 #endif
-- 
2.5.0

[Qemu-devel] [PULL 08/40] spapr: Remove rtas_st_buffer_direct()

2016-01-31 Thread David Gibson

rtas_st_buffer_direct() is a not particularly useful wrapper around
cpu_physical_memory_write().  All the callers are in
rtas_ibm_configure_connector, where it's better handled by a local helper.

Signed-off-by: David Gibson 
Reviewed-by: Alexey Kardashevskiy 
---
 hw/ppc/spapr_rtas.c| 17 ++---
 include/hw/ppc/spapr.h |  8 
 2 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index ab11b32..19e903d 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -506,6 +506,13 @@ out:
 #define CC_VAL_DATA_OFFSET ((CC_IDX_PROP_DATA_OFFSET + 1) * 4)
 #define CC_WA_LEN 4096
 
+static void configure_connector_st(target_ulong addr, target_ulong offset,
+   const void *buf, size_t len)
+{
+cpu_physical_memory_write(ppc64_phys_to_real(addr + offset),
+  buf, MIN(len, CC_WA_LEN - offset));
+}
+
 static void rtas_ibm_configure_connector(PowerPCCPU *cpu,
  sPAPRMachineState *spapr,
  uint32_t token, uint32_t nargs,
@@ -571,8 +578,7 @@ static void rtas_ibm_configure_connector(PowerPCCPU *cpu,
 /* provide the name of the next OF node */
 wa_offset = CC_VAL_DATA_OFFSET;
 rtas_st(wa_addr, CC_IDX_NODE_NAME_OFFSET, wa_offset);
-rtas_st_buffer_direct(wa_addr + wa_offset, CC_WA_LEN - wa_offset,
-  (uint8_t *)name, strlen(name) + 1);
+configure_connector_st(wa_addr, wa_offset, name, strlen(name) + 1);
 resp = SPAPR_DR_CC_RESPONSE_NEXT_CHILD;
 break;
 case FDT_END_NODE:
@@ -597,8 +603,7 @@ static void rtas_ibm_configure_connector(PowerPCCPU *cpu,
 /* provide the name of the next OF property */
 wa_offset = CC_VAL_DATA_OFFSET;
 rtas_st(wa_addr, CC_IDX_PROP_NAME_OFFSET, wa_offset);
-rtas_st_buffer_direct(wa_addr + wa_offset, CC_WA_LEN - wa_offset,
-  (uint8_t *)name, strlen(name) + 1);
+configure_connector_st(wa_addr, wa_offset, name, strlen(name) + 1);
 
 /* provide the length and value of the OF property. data gets
  * placed immediately after NULL terminator of the OF property's
@@ -607,9 +612,7 @@ static void rtas_ibm_configure_connector(PowerPCCPU *cpu,
 wa_offset += strlen(name) + 1,
 rtas_st(wa_addr, CC_IDX_PROP_LEN, prop_len);
 rtas_st(wa_addr, CC_IDX_PROP_DATA_OFFSET, wa_offset);
-rtas_st_buffer_direct(wa_addr + wa_offset, CC_WA_LEN - wa_offset,
-  (uint8_t *)((struct fdt_property *)prop)->data,
-  prop_len);
+configure_connector_st(wa_addr, wa_offset, prop->data, prop_len);
 resp = SPAPR_DR_CC_RESPONSE_NEXT_PROPERTY;
 break;
 case FDT_END:
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 1e10fc9..1f9e722 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -506,14 +506,6 @@ static inline void rtas_st(target_ulong phys, int n, uint32_t val)
 stl_be_phys(&address_space_memory, ppc64_phys_to_real(phys + 4*n), val);
 }
 
-static inline void rtas_st_buffer_direct(target_ulong phys,
- target_ulong phys_len,
- uint8_t *buffer, uint16_t buffer_len)
-{
-cpu_physical_memory_write(ppc64_phys_to_real(phys), buffer,
-  MIN(buffer_len, phys_len));
-}
-
 typedef void (*spapr_rtas_fn)(PowerPCCPU *cpu, sPAPRMachineState *sm,
   uint32_t token,
   uint32_t nargs, target_ulong args,
-- 
2.5.0
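
The bounded write done by configure_connector_st() in the patch above
can be sketched outside QEMU as follows.  This is only an illustration:
the work area is a plain byte array instead of guest memory, wa_store()
is a hypothetical name, and the assert on the offset is an addition
that the patch itself does not have.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WA_LEN 4096 /* same size as CC_WA_LEN in the patch */

/*
 * Copy buf into the work area at offset, truncating to the space that
 * is left.  Returns the number of bytes actually stored.
 */
static size_t wa_store(unsigned char *wa, size_t offset,
                       const void *buf, size_t len)
{
    size_t room, n;

    assert(offset <= WA_LEN); /* otherwise WA_LEN - offset would wrap */
    room = WA_LEN - offset;
    n = len < room ? len : room;
    memcpy(wa + offset, buf, n);
    return n;
}
```

Passing the base address and the offset separately, as the new helper
does, lets the remaining space be computed in one place instead of at
every call site.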

[Qemu-devel] [PULL 01/40] target-ppc: Use sensible POWER8/POWER8E versions

2016-01-31 Thread David Gibson

From: Benjamin Herrenschmidt 

We never released anything older than POWER8 DD2.0 and POWER8E DD2.1,
so let's use these versions.  Without that, some firmware or Linux code
might fail to use some HW features that were non-functional in earlier
internal-only spins of the chip.

Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: David Gibson 
---
 target-ppc/cpu-models.c | 12 ++--
 target-ppc/cpu-models.h |  4 ++--
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/target-ppc/cpu-models.c b/target-ppc/cpu-models.c
index 884e31d..ed005d7 100644
--- a/target-ppc/cpu-models.c
+++ b/target-ppc/cpu-models.c
@@ -1139,10 +1139,10 @@
 "POWER7 v2.3")
 POWERPC_DEF("POWER7+_v2.1",  CPU_POWERPC_POWER7P_v21,POWER7,
 "POWER7+ v2.1")
-POWERPC_DEF("POWER8E_v1.0",  CPU_POWERPC_POWER8E_v10,POWER8,
-"POWER8E v1.0")
-POWERPC_DEF("POWER8_v1.0",   CPU_POWERPC_POWER8_v10, POWER8,
-"POWER8 v1.0")
+POWERPC_DEF("POWER8E_v2.1",  CPU_POWERPC_POWER8E_v21,POWER8,
+"POWER8E v2.1")
+POWERPC_DEF("POWER8_v2.0",   CPU_POWERPC_POWER8_v20, POWER8,
+"POWER8 v2.0")
 POWERPC_DEF("970_v2.2",  CPU_POWERPC_970_v22,970,
 "PowerPC 970 v2.2")
 POWERPC_DEF("970fx_v1.0",CPU_POWERPC_970FX_v10,  970,
@@ -1390,8 +1390,8 @@ PowerPCCPUAlias ppc_cpu_aliases[] = {
 { "POWER5gs", "POWER5+_v2.1" },
 { "POWER7", "POWER7_v2.3" },
 { "POWER7+", "POWER7+_v2.1" },
-{ "POWER8E", "POWER8E_v1.0" },
-{ "POWER8", "POWER8_v1.0" },
+{ "POWER8E", "POWER8E_v2.1" },
+{ "POWER8", "POWER8_v2.0" },
 { "970", "970_v2.2" },
 { "970fx", "970fx_v3.1" },
 { "970mp", "970mp_v1.1" },
diff --git a/target-ppc/cpu-models.h b/target-ppc/cpu-models.h
index 9d80e72..2992427 100644
--- a/target-ppc/cpu-models.h
+++ b/target-ppc/cpu-models.h
@@ -557,9 +557,9 @@ enum {
 CPU_POWERPC_POWER7P_BASE   = 0x004A,
 CPU_POWERPC_POWER7P_v21= 0x004A0201,
 CPU_POWERPC_POWER8E_BASE   = 0x004B,
-CPU_POWERPC_POWER8E_v10= 0x004B0100,
+CPU_POWERPC_POWER8E_v21= 0x004B0201,
 CPU_POWERPC_POWER8_BASE= 0x004D,
-CPU_POWERPC_POWER8_v10 = 0x004D0100,
+CPU_POWERPC_POWER8_v20 = 0x004D0200,
 CPU_POWERPC_970_v22= 0x00390202,
 CPU_POWERPC_970FX_v10  = 0x00391100,
 CPU_POWERPC_970FX_v20  = 0x003C0200,
-- 
2.5.0
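
For readers decoding the new constants: in the POWER7 and POWER8
entries touched above, the top 16 bits identify the processor and the
low two bytes line up with the version in the model name (0x004B0201 is
POWER8E v2.1, 0x004D0200 is POWER8 v2.0).  The sketch below simply
applies that reading; it is inferred from these constants and is not
the architected PVR layout, and other families in the table do not
follow it.  All names are hypothetical.

```c
#include <assert.h> /* only needed for checks like those described below */
#include <stdint.h>

struct pvr_fields {
    uint16_t family;   /* e.g. 0x004B for POWER8E, 0x004D for POWER8 */
    uint8_t rev_major; /* "2" in v2.1 */
    uint8_t rev_minor; /* "1" in v2.1 */
};

/* Split a PVR value the way the POWER8 constants above are built. */
static struct pvr_fields pvr_decode(uint32_t pvr)
{
    struct pvr_fields f;

    f.family = (uint16_t)(pvr >> 16);
    f.rev_major = (uint8_t)((pvr >> 8) & 0xff);
    f.rev_minor = (uint8_t)(pvr & 0xff);
    return f;
}
```

pvr_decode(0x004B0201) yields family 0x004B with revision 2.1, and the
old 0x004B0100 value decodes as revision 1.0, which per the commit
message was never released.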

[Qemu-devel] [PULL 05/40] mac_dbdma: add DBDMA controller state to VMStateDescription

2016-01-31 Thread David Gibson

From: Mark Cave-Ayland 

Make sure that we include the DBDMA controller state in the migration
stream.

Signed-off-by: Mark Cave-Ayland 
Signed-off-by: David Gibson 
---
 hw/misc/macio/mac_dbdma.c | 40 
 1 file changed, 36 insertions(+), 4 deletions(-)

diff --git a/hw/misc/macio/mac_dbdma.c b/hw/misc/macio/mac_dbdma.c
index c6d5b96..d81dea7 100644
--- a/hw/misc/macio/mac_dbdma.c
+++ b/hw/misc/macio/mac_dbdma.c
@@ -713,20 +713,52 @@ static const MemoryRegionOps dbdma_ops = {
 },
 };
 
-static const VMStateDescription vmstate_dbdma_channel = {
-.name = "dbdma_channel",
+static const VMStateDescription vmstate_dbdma_io = {
+.name = "dbdma_io",
+.version_id = 0,
+.minimum_version_id = 0,
+.fields = (VMStateField[]) {
+VMSTATE_UINT64(addr, struct DBDMA_io),
+VMSTATE_INT32(len, struct DBDMA_io),
+VMSTATE_INT32(is_last, struct DBDMA_io),
+VMSTATE_INT32(is_dma_out, struct DBDMA_io),
+VMSTATE_BOOL(processing, struct DBDMA_io),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static const VMStateDescription vmstate_dbdma_cmd = {
+.name = "dbdma_cmd",
 .version_id = 0,
 .minimum_version_id = 0,
 .fields = (VMStateField[]) {
+VMSTATE_UINT16(req_count, dbdma_cmd),
+VMSTATE_UINT16(command, dbdma_cmd),
+VMSTATE_UINT32(phy_addr, dbdma_cmd),
+VMSTATE_UINT32(cmd_dep, dbdma_cmd),
+VMSTATE_UINT16(res_count, dbdma_cmd),
+VMSTATE_UINT16(xfer_status, dbdma_cmd),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static const VMStateDescription vmstate_dbdma_channel = {
+.name = "dbdma_channel",
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (VMStateField[]) {
 VMSTATE_UINT32_ARRAY(regs, struct DBDMA_channel, DBDMA_REGS),
+VMSTATE_STRUCT(io, struct DBDMA_channel, 0, vmstate_dbdma_io, DBDMA_io),
+VMSTATE_STRUCT(current, struct DBDMA_channel, 0, vmstate_dbdma_cmd,
+   dbdma_cmd),
 VMSTATE_END_OF_LIST()
 }
 };
 
 static const VMStateDescription vmstate_dbdma = {
 .name = "dbdma",
-.version_id = 2,
-.minimum_version_id = 2,
+.version_id = 3,
+.minimum_version_id = 3,
 .fields = (VMStateField[]) {
 VMSTATE_STRUCT_ARRAY(channels, DBDMAState, DBDMA_CHANNELS, 1,
  vmstate_dbdma_channel, DBDMA_channel),
-- 
2.5.0

[Qemu-devel] [PULL 00/40] ppc-for-2.6 queue 20160201

2016-01-31 Thread David Gibson

The following changes since commit 0430891ce162b986c6e02a7729a942ecd2a32ca4:

  hw: Clean up includes (2016-01-29 15:07:25 +)

are available in the git repository at:

  git://github.com/dgibson/qemu.git tags/ppc-for-2.6-20160201

for you to fetch changes up to d1277156b5d3df6d75d138a7eec6ff80934cdcec:

  target-ppc: mcrfs should always update FEX/VX and only clear exception bits (2016-02-01 13:27:01 +1100)


I hope I've managed to finally iron out the problems in this series.
I've fixed the clang build problem from the 20160129 request and
checked build on a 32-bit host.  I've also added the mcrfs fix on top.



ppc patch queue for 2016-02-01

Currently accumulated patches for target-ppc, pseries machine type and
related devices.
  * Cleanup of error handling code in spapr
  * A number of fixes for Macintosh devices for the benefit of MacOS 9 and X
  * Remove some abuses of the RTAS memory access functions in spapr
  * Fixes for the gdbstub (and monitor debug) for VMX and VSX extensions.
  * Fix pseries machine hotplug memory under TCG
  * Clean up and extend handling of multiple page sizes with 64-bit hash MMUs
  * Fix to the TCG implementation of mcrfs


Alyssa Milburn (1):
  cuda.c: return error for unknown commands

Anton Blanchard (1):
  target-ppc: gdbstub: Add VSX support

Benjamin Herrenschmidt (1):
  target-ppc: Use sensible POWER8/POWER8E versions

Bharata B Rao (1):
  spapr: Don't create ibm,dynamic-reconfiguration-memory w/o DR LMBs

David Gibson (22):
  spapr: Small fixes to rtas_ibm_get_system_parameter, remove rtas_st_buffer
  spapr: Remove rtas_st_buffer_direct()
  spapr: Remove abuse of rtas_ld() in h_client_architecture_support
  ppc: Clean up error handling in ppc_set_compat()
  pseries: Clean up error handling of spapr_cpu_init()
  pseries: Clean up error handling in spapr_validate_node_memory()
  pseries: Clean up error handling in spapr_vga_init()
  pseries: Clean up error handling in spapr_rtas_register()
  pseries: Clean up error handling in xics_system_init()
  pseries: Clean up error reporting in ppc_spapr_init()
  pseries: Clean up error reporting in htab migration functions
  pseries: Allow TCG h_enter to work with hotplugged memory
  target-ppc: Remove unused kvmppc_read_segment_page_sizes() stub
  target-ppc: Convert mmu-hash{32,64}.[ch] from CPUPPCState to PowerPCCPU
  target-ppc: Rework ppc_store_slb
  target-ppc: Rework SLB page size lookup
  target-ppc: Use actual page size encodings from HPTE
  target-ppc: Remove unused mmu models from ppc_tlb_invalidate_one
  target-ppc: Split 44x tlbiva from ppc_tlb_invalidate_one()
  target-ppc: Add new TLB invalidate by HPTE call for hash64 MMUs
  target-ppc: Helper to determine page size information from hpte alone
  target-ppc: Allow more page sizes for POWER7 & POWER8 in TCG

Greg Kurz (6):
  target-ppc: kvm: fix floating point registers sync on little-endian hosts
  target-ppc: rename and export maybe_bswap_register()
  target-ppc: gdbstub: fix float registers for little-endian guests
  target-ppc: gdbstub: introduce avr_need_swap()
  target-ppc: gdbstub: fix altivec registers for little-endian guests
  target-ppc: gdbstub: fix spe registers for little-endian guests

James Clarke (2):
  target-ppc: Make every FPSCR_ macro have a corresponding FP_ macro
  target-ppc: mcrfs should always update FEX/VX and only clear exception bits

Mark Cave-Ayland (5):
  target-ppc: use cpu_write_xer() helper in cpu_post_load
  macio: use the existing IDEDMA aiocb to hold the active DMA aiocb
  macio: add dma_active to VMStateDescription
  mac_dbdma: add DBDMA controller state to VMStateDescription
  cuda: add missing fields to VMStateDescription

Programmingkid (1):
  uninorth.c: add support for UniNorth kMacRISCPCIAddressSelect (0x48) register

 configure   |   6 +-
 gdb-xml/power-vsx.xml   |  44 
 hw/ide/macio.c  |  23 ++--
 hw/misc/macio/cuda.c|  12 +-
 hw/misc/macio/mac_dbdma.c   |  40 ++-
 hw/pci-host/uninorth.c  |   9 ++
 hw/ppc/mac.h|   1 -
 hw/ppc/spapr.c  | 112 ++
 hw/ppc/spapr_hcall.c| 145 +---
 hw/ppc/spapr_rtas.c |  50 
 include/hw/ppc/spapr.h  |  36 ++
 target-ppc/cpu-models.c |  12 +-
 target-ppc/cpu-models.h |   4 +-
 target-ppc/cpu.h|  41 +--
 target-ppc/gdbstub.c|  10 +-
 target-ppc/helper.h |   1 +
 target-ppc/kvm.c|  14 ++-
 target-ppc/kvm_ppc.h|   5 -
 target-ppc/machine.c|  22 +++-
 target-ppc/mmu-hash32.c |  68 ++-
 target-ppc/mmu-hash32.h |  30 ++---
 target-ppc/mmu-hash64.c | 270 ++

[Qemu-devel] [PULL 06/40] cuda: add missing fields to VMStateDescription

2016-01-31 Thread David Gibson

From: Mark Cave-Ayland 

Include some fields missed from the previous VMState conversion to the
migration stream, as well as the new SR_INT delay timer.

Signed-off-by: Mark Cave-Ayland 
Signed-off-by: David Gibson 
---
 hw/misc/macio/cuda.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/misc/macio/cuda.c b/hw/misc/macio/cuda.c
index 8d450cf..0bd90e8 100644
--- a/hw/misc/macio/cuda.c
+++ b/hw/misc/macio/cuda.c
@@ -705,15 +705,17 @@ static const VMStateDescription vmstate_cuda_timer = {
 
 static const VMStateDescription vmstate_cuda = {
 .name = "cuda",
-.version_id = 2,
-.minimum_version_id = 2,
+.version_id = 3,
+.minimum_version_id = 3,
 .fields = (VMStateField[]) {
 VMSTATE_UINT8(a, CUDAState),
 VMSTATE_UINT8(b, CUDAState),
+VMSTATE_UINT8(last_b, CUDAState),
 VMSTATE_UINT8(dira, CUDAState),
 VMSTATE_UINT8(dirb, CUDAState),
 VMSTATE_UINT8(sr, CUDAState),
 VMSTATE_UINT8(acr, CUDAState),
+VMSTATE_UINT8(last_acr, CUDAState),
 VMSTATE_UINT8(pcr, CUDAState),
 VMSTATE_UINT8(ifr, CUDAState),
 VMSTATE_UINT8(ier, CUDAState),
@@ -728,6 +730,7 @@ static const VMStateDescription vmstate_cuda = {
 VMSTATE_STRUCT_ARRAY(timers, CUDAState, 2, 1,
  vmstate_cuda_timer, CUDATimer),
 VMSTATE_TIMER_PTR(adb_poll_timer, CUDAState),
+VMSTATE_TIMER_PTR(sr_delay_timer, CUDAState),
 VMSTATE_END_OF_LIST()
 }
 };
-- 
2.5.0

[Qemu-devel] [PULL 03/40] macio: use the existing IDEDMA aiocb to hold the active DMA aiocb

2016-01-31 Thread David Gibson

From: Mark Cave-Ayland 

Currently the aiocb is held within MACIOIDEState, however the IDE core code
assumes that the current active DMA aiocb is held in aiocb in a few places,
e.g. ide_bus_reset() and ide_reset().

Switch over to using IDEDMA aiocb to store the aiocb for the current active
DMA request so that bus resets and restarts are handled correctly. As a
consequence we can now use ide_set_inactive() rather than handling its
functionality ourselves.

Signed-off-by: Mark Cave-Ayland 
Reviewed-by: John Snow 
Signed-off-by: David Gibson 
---
 hw/ide/macio.c | 20 
 hw/ppc/mac.h   |  1 -
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/hw/ide/macio.c b/hw/ide/macio.c
index 336784b..bfdc377 100644
--- a/hw/ide/macio.c
+++ b/hw/ide/macio.c
@@ -120,8 +120,8 @@ static void pmac_dma_read(BlockBackend *blk,
 MACIO_DPRINTF("--- Block read transfer - sector_num: %" PRIx64 "  "
   "nsector: %x\n", (offset >> 9), (bytes >> 9));
 
-m->aiocb = blk_aio_readv(blk, (offset >> 9), &io->iov, (bytes >> 9),
- cb, io);
+s->bus->dma->aiocb = blk_aio_readv(blk, (offset >> 9), &io->iov,
+ (bytes >> 9), cb, io);
 }
 
 static void pmac_dma_write(BlockBackend *blk,
@@ -205,8 +205,8 @@ static void pmac_dma_write(BlockBackend *blk,
 MACIO_DPRINTF("--- Block write transfer - sector_num: %" PRIx64 "  "
   "nsector: %x\n", (offset >> 9), (bytes >> 9));
 
-m->aiocb = blk_aio_writev(blk, (offset >> 9), &io->iov, (bytes >> 9),
-  cb, io);
+s->bus->dma->aiocb = blk_aio_writev(blk, (offset >> 9), &io->iov,
+ (bytes >> 9), cb, io);
 }
 
 static void pmac_dma_trim(BlockBackend *blk,
@@ -232,8 +232,8 @@ static void pmac_dma_trim(BlockBackend *blk,
 s->io_buffer_index += io->len;
 io->len = 0;
 
-m->aiocb = ide_issue_trim(blk, (offset >> 9), &io->iov, (bytes >> 9),
-  cb, io);
+s->bus->dma->aiocb = ide_issue_trim(blk, (offset >> 9), &io->iov,
+ (bytes >> 9), cb, io);
 }
 
 static void pmac_ide_atapi_transfer_cb(void *opaque, int ret)
@@ -292,6 +292,8 @@ done:
 } else {
 block_acct_done(blk_get_stats(s->blk), &s->acct);
 }
+
+ide_set_inactive(s, false);
 io->dma_end(opaque);
 }
 
@@ -306,7 +308,6 @@ static void pmac_ide_transfer_cb(void *opaque, int ret)
 
 if (ret < 0) {
 MACIO_DPRINTF("DMA error: %d\n", ret);
-m->aiocb = NULL;
 ide_dma_error(s);
 goto done;
 }
@@ -357,6 +358,8 @@ done:
 block_acct_done(blk_get_stats(s->blk), &s->acct);
 }
 }
+
+ide_set_inactive(s, false);
 io->dma_end(opaque);
 }
 
@@ -394,8 +397,9 @@ static void pmac_ide_transfer(DBDMA_io *io)
 static void pmac_ide_flush(DBDMA_io *io)
 {
 MACIOIDEState *m = io->opaque;
+IDEState *s = idebus_active_if(&m->bus);
 
-if (m->aiocb) {
+if (s->bus->dma->aiocb) {
 blk_drain_all();
 }
 }
diff --git a/hw/ppc/mac.h b/hw/ppc/mac.h
index e375ed2..ecf7792 100644
--- a/hw/ppc/mac.h
+++ b/hw/ppc/mac.h
@@ -134,7 +134,6 @@ typedef struct MACIOIDEState {
 
 MemoryRegion mem;
 IDEBus bus;
-BlockAIOCB *aiocb;
 IDEDMA dma;
 void *dbdma;
 bool dma_active;
-- 
2.5.0

[Qemu-devel] [PULL 14/40] pseries: Clean up error handling in spapr_vga_init()

2016-01-31 Thread David Gibson

Use error_setg() to return an error rather than an explicit exit().
Previously it was an exit(0) instead of a non-zero exit code, which was
simply a bug.  Also improve the error message.

While we're at it change the type of spapr_vga_init() to bool since that's
how we're using it anyway.

Signed-off-by: David Gibson 
Reviewed-by: Thomas Huth 
Reviewed-by: Alexey Kardashevskiy 
Reviewed-by: Markus Armbruster 
---
 hw/ppc/spapr.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 4e6ee6d..3f90e50 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1246,7 +1246,7 @@ static void spapr_rtc_create(sPAPRMachineState *spapr)
 }
 
 /* Returns whether we want to use VGA or not */
-static int spapr_vga_init(PCIBus *pci_bus)
+static bool spapr_vga_init(PCIBus *pci_bus, Error **errp)
 {
 switch (vga_interface_type) {
 case VGA_NONE:
@@ -1257,9 +1257,9 @@ static int spapr_vga_init(PCIBus *pci_bus)
 case VGA_VIRTIO:
 return pci_vga_init(pci_bus) != NULL;
 default:
-fprintf(stderr, "This vga model is not supported,"
-"currently it only supports -vga std\n");
-exit(0);
+error_setg(errp,
+   "Unsupported VGA mode, only -vga std or -vga virtio is supported");
+return false;
 }
 }
 
@@ -1934,7 +1934,7 @@ static void ppc_spapr_init(MachineState *machine)
 }
 
 /* Graphics */
-if (spapr_vga_init(phb->bus)) {
+if (spapr_vga_init(phb->bus, &error_fatal)) {
 spapr->has_graphics = true;
 machine->usb |= defaults_enabled() && !machine->usb_disabled;
 }
-- 
2.5.0

[Qemu-devel] [PULL 10/40] spapr: Don't create ibm,dynamic-reconfiguration-memory w/o DR LMBs

2016-01-31 Thread David Gibson

From: Bharata B Rao 

If the guest doesn't have any dynamically reconfigurable (DR) logical memory
blocks (LMBs), then we shouldn't create the ibm,dynamic-reconfiguration-memory
device tree node.

Signed-off-by: Bharata B Rao 
Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 08da895..0ac6368 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -764,6 +764,13 @@ static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
 int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1;
 
 /*
+ * Don't create the node if there are no DR LMBs.
+ */
+if (!nr_lmbs) {
+return 0;
+}
+
+/*
  * Allocate enough buffer size to fit in ibm,dynamic-memory
  * or ibm,associativity-lookup-arrays
  */
@@ -869,7 +876,7 @@ int spapr_h_cas_compose_response(sPAPRMachineState *spapr,
 _FDT((spapr_fixup_cpu_dt(fdt, spapr)));
 }
 
-/* Generate memory nodes or ibm,dynamic-reconfiguration-memory node */
+/* Generate ibm,dynamic-reconfiguration-memory node if required */
 if (memory_update && smc->dr_lmb_enabled) {
 _FDT((spapr_populate_drconf_memory(spapr, fdt)));
 }
-- 
2.5.0

[Qemu-devel] [PULL 17/40] pseries: Clean up error reporting in ppc_spapr_init()

2016-01-31 Thread David Gibson

This function includes a number of explicit fprintf()s for errors.
Change these to use error_report() instead.

Also replace the single exit(EXIT_FAILURE) with an explicit exit(1), since
the latter is the more usual idiom in qemu by a large margin.

Signed-off-by: David Gibson 
Reviewed-by: Alexey Kardashevskiy 
Reviewed-by: Markus Armbruster 
---
 hw/ppc/spapr.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 1281e07..c05ddfb 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1789,8 +1789,8 @@ static void ppc_spapr_init(MachineState *machine)
 }
 
 if (spapr->rma_size > node0_size) {
-fprintf(stderr, "Error: Numa node 0 has to span the RMA (%#08"HWADDR_PRIx")\n",
-spapr->rma_size);
+error_report("Numa node 0 has to span the RMA (%#08"HWADDR_PRIx")",
+ spapr->rma_size);
 exit(1);
 }
 
@@ -1856,10 +1856,10 @@ static void ppc_spapr_init(MachineState *machine)
 ram_addr_t hotplug_mem_size = machine->maxram_size - machine->ram_size;
 
 if (machine->ram_slots > SPAPR_MAX_RAM_SLOTS) {
-error_report("Specified number of memory slots %" PRIu64
- " exceeds max supported %d",
+error_report("Specified number of memory slots %"
+ PRIu64" exceeds max supported %d",
  machine->ram_slots, SPAPR_MAX_RAM_SLOTS);
-exit(EXIT_FAILURE);
+exit(1);
 }
 
 spapr->hotplug_memory.base = ROUND_UP(machine->ram_size,
@@ -1955,8 +1955,9 @@ static void ppc_spapr_init(MachineState *machine)
 }
 
 if (spapr->rma_size < (MIN_RMA_SLOF << 20)) {
-fprintf(stderr, "qemu: pSeries SLOF firmware requires >= "
-"%ldM guest RMA (Real Mode Area memory)\n", MIN_RMA_SLOF);
+error_report(
+"pSeries SLOF firmware requires >= %ldM guest RMA (Real Mode Area memory)",
+MIN_RMA_SLOF);
 exit(1);
 }
 
@@ -1972,8 +1973,8 @@ static void ppc_spapr_init(MachineState *machine)
 kernel_le = kernel_size > 0;
 }
 if (kernel_size < 0) {
-fprintf(stderr, "qemu: error loading %s: %s\n",
-kernel_filename, load_elf_strerror(kernel_size));
+error_report("error loading %s: %s",
+ kernel_filename, load_elf_strerror(kernel_size));
 exit(1);
 }
 
@@ -1986,8 +1987,8 @@ static void ppc_spapr_init(MachineState *machine)
 initrd_size = load_image_targphys(initrd_filename, initrd_base,
   load_limit - initrd_base);
 if (initrd_size < 0) {
-fprintf(stderr, "qemu: could not load initial ram disk '%s'\n",
-initrd_filename);
+error_report("could not load initial ram disk '%s'",
+ initrd_filename);
 exit(1);
 }
 } else {
-- 
2.5.0

[Qemu-devel] [PULL 09/40] spapr: Remove abuse of rtas_ld() in h_client_architecture_support

2016-01-31 Thread David Gibson

h_client_architecture_support() uses rtas_ld() for general purpose memory
access, despite the fact that it's not an RTAS routine at all and rtas_ld
makes things more awkward.

Clean this up by replacing rtas_ld() calls with appropriate ldXX_phys()
calls.

Signed-off-by: David Gibson 
Reviewed-by: Alexey Kardashevskiy 
---
 hw/ppc/spapr_hcall.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 51083cd..fdd7fea 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -862,7 +862,8 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu_,
   target_ulong opcode,
   target_ulong *args)
 {
-target_ulong list = args[0], ov_table;
+target_ulong list = ppc64_phys_to_real(args[0]);
+target_ulong ov_table, ov5;
 PowerPCCPUClass *pcc_ = POWERPC_CPU_GET_CLASS(cpu_);
 CPUState *cs;
 bool cpu_match = false, cpu_update = true, memory_update = false;
@@ -876,9 +877,9 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu_,
 for (counter = 0; counter < 512; ++counter) {
 uint32_t pvr, pvr_mask;
 
-pvr_mask = rtas_ld(list, 0);
+pvr_mask = ldl_be_phys(&address_space_memory, list);
 list += 4;
-pvr = rtas_ld(list, 0);
+pvr = ldl_be_phys(&address_space_memory, list);
 list += 4;
 
 trace_spapr_cas_pvr_try(pvr);
@@ -949,14 +950,13 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu_,
 /* For the future use: here @ov_table points to the first option vector */
 ov_table = list;
 
-list = cas_get_option_vector(5, ov_table);
-if (!list) {
+ov5 = cas_get_option_vector(5, ov_table);
+if (!ov5) {
 return H_SUCCESS;
 }
 
 /* @list now points to OV 5 */
-list += 2;
-ov5_byte2 = rtas_ld(list, 0) >> 24;
+ov5_byte2 = ldub_phys(&address_space_memory, ov5 + 2);
 if (ov5_byte2 & OV5_DRCONF_MEMORY) {
 memory_update = true;
 }
-- 
2.5.0
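
The last hunk above swaps a 32-bit big-endian load followed by ">> 24"
for a single byte load at the same address.  The two return the same
value, as the sketch below shows over a plain byte array.  load_be32()
and load_u8() are hypothetical stand-ins for the ldl_be_phys() and
ldub_phys() accessors and take a buffer instead of an address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit big-endian value starting at mem[addr]. */
static uint32_t load_be32(const uint8_t *mem, size_t addr)
{
    assert(mem != NULL);
    return ((uint32_t)mem[addr] << 24) | ((uint32_t)mem[addr + 1] << 16)
           | ((uint32_t)mem[addr + 2] << 8) | (uint32_t)mem[addr + 3];
}

/* Read a single byte at mem[addr]. */
static uint8_t load_u8(const uint8_t *mem, size_t addr)
{
    assert(mem != NULL);
    return mem[addr];
}
```

Besides being shorter, the byte load does not touch the three bytes
that follow the one being tested.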

[Qemu-devel] [PULL 26/40] pseries: Allow TCG h_enter to work with hotplugged memory

2016-01-31 Thread David Gibson

The implementation of the H_ENTER hypercall for PAPR guests needs to
enforce correct access attributes on the inserted HPTE.  This means
determining if the HPTE's real address is a regular RAM address (which
requires attributes for coherent access) or an IO address (which requires
attributes for cache-inhibited access).

At the moment this check is implemented with (raddr < machine->ram_size),
but that only handles addresses in the base RAM area, not any hotplugged
RAM.

This patch corrects the problem with a new helper.

Signed-off-by: David Gibson 
Reviewed-by: Alexey Kardashevskiy 
---
 hw/ppc/spapr_hcall.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 655c433..093d426 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -85,10 +85,25 @@ static inline bool valid_pte_index(CPUPPCState *env, target_ulong pte_index)
 return true;
 }
 
+static bool is_ram_address(sPAPRMachineState *spapr, hwaddr addr)
+{
+MachineState *machine = MACHINE(spapr);
+MemoryHotplugState *hpms = &spapr->hotplug_memory;
+
+if (addr < machine->ram_size) {
+return true;
+}
+if ((addr >= hpms->base)
+&& ((addr - hpms->base) < memory_region_size(&hpms->mr))) {
+return true;
+}
+
+return false;
+}
+
 static target_ulong h_enter(PowerPCCPU *cpu, sPAPRMachineState *spapr,
 target_ulong opcode, target_ulong *args)
 {
-MachineState *machine = MACHINE(spapr);
 CPUPPCState *env = &cpu->env;
 target_ulong flags = args[0];
 target_ulong pte_index = args[1];
@@ -120,7 +135,7 @@ static target_ulong h_enter(PowerPCCPU *cpu, sPAPRMachineState *spapr,
 
 raddr = (ptel & HPTE64_R_RPN) & ~((1ULL << page_shift) - 1);
 
-if (raddr < machine->ram_size) {
+if (is_ram_address(spapr, raddr)) {
 /* Regular RAM - should have WIMG=0010 */
 if ((ptel & HPTE64_R_WIMG) != HPTE64_R_M) {
 return H_PARAMETER;
-- 
2.5.0

[Qemu-devel] [PULL 12/40] pseries: Clean up error handling of spapr_cpu_init()

2016-01-31 Thread David Gibson

Currently spapr_cpu_init() is hardcoded to handle any errors as fatal.
That works for now, since it's only called from initial setup where an
error here means we really can't proceed.

However, we'll want to handle this more flexibly for cpu hotplug in future
so generalize this using the error reporting infrastructure.  While we're
at it make a small cleanup in a related part of ppc_spapr_init() to use
error_report() instead of an old-style explicit fprintf().

Signed-off-by: David Gibson 
Reviewed-by: Bharata B Rao 
Reviewed-by: Alexey Kardashevskiy 
Reviewed-by: Markus Armbruster 
---
 hw/ppc/spapr.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 8862d18..61653ae 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1625,7 +1625,8 @@ static void spapr_boot_set(void *opaque, const char *boot_device,
 machine->boot_order = g_strdup(boot_device);
 }
 
-static void spapr_cpu_init(sPAPRMachineState *spapr, PowerPCCPU *cpu)
+static void spapr_cpu_init(sPAPRMachineState *spapr, PowerPCCPU *cpu,
+   Error **errp)
 {
 CPUPPCState *env = &cpu->env;
 
@@ -1643,7 +1644,13 @@ static void spapr_cpu_init(sPAPRMachineState *spapr, 
PowerPCCPU *cpu)
 }
 
 if (cpu->max_compat) {
-ppc_set_compat(cpu, cpu->max_compat, &error_fatal);
+Error *local_err = NULL;
+
+ppc_set_compat(cpu, cpu->max_compat, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
 }
 
 xics_cpu_setup(spapr->icp, cpu);
@@ -1812,10 +1819,10 @@ static void ppc_spapr_init(MachineState *machine)
 for (i = 0; i < smp_cpus; i++) {
 cpu = cpu_ppc_init(machine->cpu_model);
 if (cpu == NULL) {
-fprintf(stderr, "Unable to find PowerPC CPU definition\n");
+error_report("Unable to find PowerPC CPU definition");
 exit(1);
 }
-spapr_cpu_init(spapr, cpu);
+spapr_cpu_init(spapr, cpu, &error_fatal);
 }
 
 if (kvm_enabled()) {
-- 
2.5.0

[Qemu-devel] [PULL 07/40] spapr: Small fixes to rtas_ibm_get_system_parameter, remove rtas_st_buffer

2016-01-31 Thread David Gibson

rtas_st_buffer() appears in spapr.h as though it were a widely used helper,
but in fact it is only used for saving data in a format used by
rtas_ibm_get_system_parameter().  This changes it to a local helper more
specifically for that function.

While we're there fix a couple of small defects in
rtas_ibm_get_system_parameter:
  - For the string value SPLPAR_CHARACTERISTICS, the length now includes
the terminating \0, as required by LoPAPR 7.3.16.1
  - It now checks that the supplied buffer has at least enough space for
the length of the returned data, and returns an error if it does not.
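
The buffer layout being written (a 16-bit big-endian length followed by the data, truncated to the space available) can be sketched against an ordinary byte array.  This is a simplified illustration, not the patch's code: toy_sysparm_st and the TOY_* return values are hypothetical, and the real helper writes to guest physical memory rather than a host buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_SUCCESS      0
#define TOY_PARAM_ERROR  (-1)   /* placeholder, not the real RTAS code */

/*
 * Store a 2-byte big-endian length followed by as much of val as fits.
 * Fails if the buffer cannot even hold the length field.
 */
static int toy_sysparm_st(uint8_t *buf, size_t buflen,
                          const void *val, uint16_t vallen)
{
    size_t room, n;

    if (buflen < 2) {
        return TOY_PARAM_ERROR;
    }
    buf[0] = (uint8_t)(vallen >> 8);
    buf[1] = (uint8_t)(vallen & 0xff);

    room = buflen - 2;
    n = vallen < room ? vallen : room;
    memcpy(buf + 2, val, n);
    return TOY_SUCCESS;
}
```

For a string parameter the caller would pass strlen(s) + 1 as vallen so that the terminating \0 is counted in the stored length.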

Signed-off-by: David Gibson 
Reviewed-by: Alexey Kardashevskiy 
---
 hw/ppc/spapr_rtas.c| 21 +
 include/hw/ppc/spapr.h | 28 +---
 2 files changed, 26 insertions(+), 23 deletions(-)

diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index f3ead8c..ab11b32 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -229,6 +229,19 @@ static void rtas_stop_self(PowerPCCPU *cpu, 
sPAPRMachineState *spapr,
 env->msr = 0;
 }
 
+static inline int sysparm_st(target_ulong addr, target_ulong len,
+ const void *val, uint16_t vallen)
+{
+hwaddr phys = ppc64_phys_to_real(addr);
+
+if (len < 2) {
+return RTAS_OUT_SYSPARM_PARAM_ERROR;
+}
+stw_be_phys(&address_space_memory, phys, vallen);
+cpu_physical_memory_write(phys + 2, val, MIN(len - 2, vallen));
+return RTAS_OUT_SUCCESS;
+}
+
 static void rtas_ibm_get_system_parameter(PowerPCCPU *cpu,
   sPAPRMachineState *spapr,
   uint32_t token, uint32_t nargs,
@@ -238,7 +251,7 @@ static void rtas_ibm_get_system_parameter(PowerPCCPU *cpu,
 target_ulong parameter = rtas_ld(args, 0);
 target_ulong buffer = rtas_ld(args, 1);
 target_ulong length = rtas_ld(args, 2);
-target_ulong ret = RTAS_OUT_SUCCESS;
+target_ulong ret;
 
 switch (parameter) {
 case RTAS_SYSPARM_SPLPAR_CHARACTERISTICS: {
@@ -250,18 +263,18 @@ static void rtas_ibm_get_system_parameter(PowerPCCPU *cpu,
   current_machine->ram_size / M_BYTE,
   smp_cpus,
   max_cpus);
-rtas_st_buffer(buffer, length, (uint8_t *)param_val, 
strlen(param_val));
+ret = sysparm_st(buffer, length, param_val, strlen(param_val) + 1);
 g_free(param_val);
 break;
 }
 case RTAS_SYSPARM_DIAGNOSTICS_RUN_MODE: {
 uint8_t param_val = DIAGNOSTICS_RUN_MODE_DISABLED;
 
-rtas_st_buffer(buffer, length, &param_val, sizeof(param_val));
+ret = sysparm_st(buffer, length, &param_val, sizeof(param_val));
 break;
 }
 case RTAS_SYSPARM_UUID:
-rtas_st_buffer(buffer, length, qemu_uuid, (qemu_uuid_set ? 16 : 0));
+ret = sysparm_st(buffer, length, qemu_uuid, (qemu_uuid_set ? 16 : 0));
 break;
 default:
 ret = RTAS_OUT_NOT_SUPPORTED;
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 53af76a..1e10fc9 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -408,14 +408,15 @@ int spapr_allocate_irq_block(int num, bool lsi, bool msi);
 #define RTAS_SLOT_PERM_ERR_LOG   2
 
 /* RTAS return codes */
-#define RTAS_OUT_SUCCESS0
-#define RTAS_OUT_NO_ERRORS_FOUND1
-#define RTAS_OUT_HW_ERROR   -1
-#define RTAS_OUT_BUSY   -2
-#define RTAS_OUT_PARAM_ERROR-3
-#define RTAS_OUT_NOT_SUPPORTED  -3
-#define RTAS_OUT_NO_SUCH_INDICATOR  -3
-#define RTAS_OUT_NOT_AUTHORIZED -9002
+#define RTAS_OUT_SUCCESS0
+#define RTAS_OUT_NO_ERRORS_FOUND1
+#define RTAS_OUT_HW_ERROR   -1
+#define RTAS_OUT_BUSY   -2
+#define RTAS_OUT_PARAM_ERROR-3
+#define RTAS_OUT_NOT_SUPPORTED  -3
+#define RTAS_OUT_NO_SUCH_INDICATOR  -3
+#define RTAS_OUT_NOT_AUTHORIZED -9002
+#define RTAS_OUT_SYSPARM_PARAM_ERROR-
 
 /* RTAS tokens */
 #define RTAS_TOKEN_BASE  0x2000
@@ -513,17 +514,6 @@ static inline void rtas_st_buffer_direct(target_ulong phys,
   MIN(buffer_len, phys_len));
 }
 
-static inline void rtas_st_buffer(target_ulong phys, target_ulong phys_len,
-  uint8_t *buffer, uint16_t buffer_len)
-{
-if (phys_len < 2) {
-return;
-}
-stw_be_phys(&address_space_memory,
-ppc64_phys_to_real(phys), buffer_len);
-rtas_st_buffer_direct(phys + 2, phys_len - 2, buffer, buffer_len);
-}
-
 typedef void (*spapr_rtas_fn)(PowerPCCPU *cpu, sPAPRMachineState *sm,
   uint32_t token,
   uint32_t nargs, target_ulong args,
-- 
2.5.0

[Qemu-devel] [PULL 19/40] target-ppc: kvm: fix floating point registers sync on little-endian hosts

2016-01-31 Thread David Gibson

From: Greg Kurz 

On VSX capable CPUs, the 32 FP registers are mapped to the high-bits
of the 32 first VSX registers. So if you have:

VSR31 = (uint128) 0x0102030405060708090a0b0c0d0e0f00

then

FPR31 = (uint64) 0x0102030405060708

The kernel stores the VSX registers in the fp_state struct following the
host endian element ordering.

On big-endian:

fp_state.fpr[31][0] = 0x0102030405060708
fp_state.fpr[31][1] = 0x090a0b0c0d0e0f00

On little-endian:

fp_state.fpr[31][0] = 0x090a0b0c0d0e0f00
fp_state.fpr[31][1] = 0x0102030405060708

The KVM_GET_ONE_REG and KVM_SET_ONE_REG ioctls preserve this ordering, but
QEMU considers it as big-endian and always copies element [0] to the
fpr[] array and element [1] to the vsr[] array. This does not work with
little-endian hosts, and you will get:

(qemu) p $f31
0x90a0b0c0d0e0f00

instead of:

(qemu) p $f31
0x102030405060708

This patch fixes the element ordering for little-endian hosts.
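
The element selection described above can be sketched as a small standalone function.  It is an illustration only: toy_split_vsr is a hypothetical name, and host endianness is passed as a runtime flag for demonstration, whereas the patch selects the ordering at compile time with HOST_WORDS_BIGENDIAN.

```c
#include <stdint.h>

/*
 * Split the two 64-bit elements transferred for one VSX register into
 * the FP register value (the high doubleword) and the low doubleword.
 */
static void toy_split_vsr(const uint64_t elem[2], int host_big_endian,
                          uint64_t *fpr, uint64_t *vsr_low)
{
    if (host_big_endian) {
        *fpr = elem[0];
        *vsr_low = elem[1];
    } else {
        /* Little-endian hosts store the elements in the opposite order. */
        *fpr = elem[1];
        *vsr_low = elem[0];
    }
}
```

Feeding it the little-endian layout from the example above yields 0x0102030405060708 for the FP register, which is the value the debugger is expected to show.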

Signed-off-by: Greg Kurz 
Signed-off-by: David Gibson 
---
 target-ppc/kvm.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index c2e8912..60ff119 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -650,8 +650,13 @@ static int kvm_put_fp(CPUState *cs)
 for (i = 0; i < 32; i++) {
 uint64_t vsr[2];
 
+#ifdef HOST_WORDS_BIGENDIAN
 vsr[0] = float64_val(env->fpr[i]);
 vsr[1] = env->vsr[i];
+#else
+vsr[0] = env->vsr[i];
+vsr[1] = float64_val(env->fpr[i]);
+#endif
 reg.addr = (uintptr_t) &vsr;
 reg.id = vsx ? KVM_REG_PPC_VSR(i) : KVM_REG_PPC_FPR(i);
 
@@ -721,10 +726,17 @@ static int kvm_get_fp(CPUState *cs)
 vsx ? "VSR" : "FPR", i, strerror(errno));
 return ret;
 } else {
+#ifdef HOST_WORDS_BIGENDIAN
 env->fpr[i] = vsr[0];
 if (vsx) {
 env->vsr[i] = vsr[1];
 }
+#else
+env->fpr[i] = vsr[1];
+if (vsx) {
+env->vsr[i] = vsr[0];
+}
+#endif
 }
 }
 }
-- 
2.5.0

[Qemu-devel] [PULL 15/40] pseries: Clean up error handling in spapr_rtas_register()

2016-01-31 Thread David Gibson

The errors detected in this function necessarily indicate bugs in the rest
of the qemu code, rather than an external or configuration problem.

So, a simple assert() is more appropriate than any more complex error
reporting.

Signed-off-by: David Gibson 
Reviewed-by: Thomas Huth 
Reviewed-by: Alexey Kardashevskiy 
Reviewed-by: Markus Armbruster 
---
 hw/ppc/spapr_rtas.c | 12 +++-
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 19e903d..07ad672 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -665,17 +665,11 @@ target_ulong spapr_rtas_call(PowerPCCPU *cpu, 
sPAPRMachineState *spapr,
 
 void spapr_rtas_register(int token, const char *name, spapr_rtas_fn fn)
 {
-if (!((token >= RTAS_TOKEN_BASE) && (token < RTAS_TOKEN_MAX))) {
-fprintf(stderr, "RTAS invalid token 0x%x\n", token);
-exit(1);
-}
+assert((token >= RTAS_TOKEN_BASE) && (token < RTAS_TOKEN_MAX));
 
 token -= RTAS_TOKEN_BASE;
-if (rtas_table[token].name) {
-fprintf(stderr, "RTAS call \"%s\" is registered already as 0x%x\n",
-rtas_table[token].name, token);
-exit(1);
-}
+
+assert(!rtas_table[token].name);
 
 rtas_table[token].name = name;
 rtas_table[token].fn = fn;
-- 
2.5.0

[Qemu-devel] [PULL 27/40] cuda.c: return error for unknown commands

2016-01-31 Thread David Gibson

From: Alyssa Milburn 

This avoids MacsBug hanging at startup in the absence of ADB mouse
input, by replying with an error (which is also what MOL does) when
it sends an unknown command (0x1c).

Signed-off-by: Alyssa Milburn 
Signed-off-by: David Gibson 
---
 hw/misc/macio/cuda.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/misc/macio/cuda.c b/hw/misc/macio/cuda.c
index 0bd90e8..316c1ac 100644
--- a/hw/misc/macio/cuda.c
+++ b/hw/misc/macio/cuda.c
@@ -606,6 +606,11 @@ static void cuda_receive_packet(CUDAState *s,
 }
 break;
 default:
+obuf[0] = ERROR_PACKET;
+obuf[1] = 0x2;
+obuf[2] = CUDA_PACKET;
+obuf[3] = data[0];
+cuda_send_packet_to_host(s, obuf, 4);
 break;
 }
 }
-- 
2.5.0

[Qemu-devel] [PULL 31/40] target-ppc: Rework ppc_store_slb

2016-01-31 Thread David Gibson

ppc_store_slb updates the SLB for PPC cpus with 64-bit hash MMUs.
Currently it takes two parameters, which contain values encoded as the
register arguments to the slbmte instruction, one register contains the
ESID portion of the SLBE and also the slot number, the other contains the
VSID portion of the SLBE.

We're shortly going to want to do some SLB updates from other code where
it is more convenient to supply the slot number and ESID separately, so
rework this function and its callers to work this way.

As a bonus, this slightly simplifies the emulation of segment registers for
when running a 32-bit OS on a 64-bit CPU.
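
How callers that still hold an slbmte-style RB value derive the two new arguments can be sketched as follows.  The masks are the ones used by the updated callers in this patch; the function name toy_split_slbmte_rb is hypothetical.

```c
#include <stdint.h>

/*
 * Split an slbmte-style RB value into the slot number (low 12 bits)
 * and the ESID portion (everything above the low 12 bits).
 */
static void toy_split_slbmte_rb(uint64_t rb, uint64_t *slot, uint64_t *esid)
{
    *slot = rb & 0xfff;
    *esid = rb & ~0xfffULL;
}
```

Code that already knows the slot and ESID separately can skip this step and pass them straight through, which is the convenience the rework is after.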

Signed-off-by: David Gibson 
Reviewed-by: Laurent Vivier 
Acked-by: Benjamin Herrenschmidt 
Reviewed-by: Alexander Graf 
---
 target-ppc/kvm.c|  2 +-
 target-ppc/mmu-hash64.c | 24 +---
 target-ppc/mmu-hash64.h |  3 ++-
 target-ppc/mmu_helper.c | 14 +-
 4 files changed, 21 insertions(+), 22 deletions(-)

diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 3e61fcd..70ca296 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -1205,7 +1205,7 @@ int kvm_arch_get_registers(CPUState *cs)
  * Only restore valid entries
  */
 if (rb & SLB_ESID_V) {
-ppc_store_slb(cpu, rb, rs);
+ppc_store_slb(cpu, rb & 0xfff, rb & ~0xfffULL, rs);
 }
 }
 #endif
diff --git a/target-ppc/mmu-hash64.c b/target-ppc/mmu-hash64.c
index 8648408..788725c 100644
--- a/target-ppc/mmu-hash64.c
+++ b/target-ppc/mmu-hash64.c
@@ -136,28 +136,30 @@ void helper_slbie(CPUPPCState *env, target_ulong addr)
 }
 }
 
-int ppc_store_slb(PowerPCCPU *cpu, target_ulong rb, target_ulong rs)
+int ppc_store_slb(PowerPCCPU *cpu, target_ulong slot,
+  target_ulong esid, target_ulong vsid)
 {
 CPUPPCState *env = &cpu->env;
-int slot = rb & 0xfff;
 ppc_slb_t *slb = &env->slb[slot];
 
-if (rb & (0x1000 - env->slb_nr)) {
-return -1; /* Reserved bits set or slot too high */
+if (slot >= env->slb_nr) {
+return -1; /* Bad slot number */
+}
+if (esid & ~(SLB_ESID_ESID | SLB_ESID_V)) {
+return -1; /* Reserved bits set */
 }
-if (rs & (SLB_VSID_B & ~SLB_VSID_B_1T)) {
+if (vsid & (SLB_VSID_B & ~SLB_VSID_B_1T)) {
 return -1; /* Bad segment size */
 }
-if ((rs & SLB_VSID_B) && !(env->mmu_model & POWERPC_MMU_1TSEG)) {
+if ((vsid & SLB_VSID_B) && !(env->mmu_model & POWERPC_MMU_1TSEG)) {
 return -1; /* 1T segment on MMU that doesn't support it */
 }
 
-/* Mask out the slot number as we store the entry */
-slb->esid = rb & (SLB_ESID_ESID | SLB_ESID_V);
-slb->vsid = rs;
+slb->esid = esid;
+slb->vsid = vsid;
 
 LOG_SLB("%s: %d " TARGET_FMT_lx " - " TARGET_FMT_lx " => %016" PRIx64
-" %016" PRIx64 "\n", __func__, slot, rb, rs,
+" %016" PRIx64 "\n", __func__, slot, esid, vsid,
 slb->esid, slb->vsid);
 
 return 0;
@@ -197,7 +199,7 @@ void helper_store_slb(CPUPPCState *env, target_ulong rb, 
target_ulong rs)
 {
 PowerPCCPU *cpu = ppc_env_get_cpu(env);
 
-if (ppc_store_slb(cpu, rb, rs) < 0) {
+if (ppc_store_slb(cpu, rb & 0xfff, rb & ~0xfffULL, rs) < 0) {
 helper_raise_exception_err(env, POWERPC_EXCP_PROGRAM,
POWERPC_EXCP_INVAL);
 }
diff --git a/target-ppc/mmu-hash64.h b/target-ppc/mmu-hash64.h
index 6e3de7e..24fd2c4 100644
--- a/target-ppc/mmu-hash64.h
+++ b/target-ppc/mmu-hash64.h
@@ -6,7 +6,8 @@
 #ifdef TARGET_PPC64
 void ppc_hash64_check_page_sizes(PowerPCCPU *cpu, Error **errp);
 void dump_slb(FILE *f, fprintf_function cpu_fprintf, PowerPCCPU *cpu);
-int ppc_store_slb(PowerPCCPU *cpu, target_ulong rb, target_ulong rs);
+int ppc_store_slb(PowerPCCPU *cpu, target_ulong slot,
+  target_ulong esid, target_ulong vsid);
 hwaddr ppc_hash64_get_phys_page_debug(PowerPCCPU *cpu, target_ulong addr);
 int ppc_hash64_handle_mmu_fault(PowerPCCPU *cpu, target_ulong address, int rw,
 int mmu_idx);
diff --git a/target-ppc/mmu_helper.c b/target-ppc/mmu_helper.c
index 2446bba..7277889 100644
--- a/target-ppc/mmu_helper.c
+++ b/target-ppc/mmu_helper.c
@@ -2089,21 +2089,17 @@ void helper_store_sr(CPUPPCState *env, target_ulong 
srnum, target_ulong value)
 (int)srnum, value, env->sr[srnum]);
 #if defined(TARGET_PPC64)
 if (env->mmu_model & POWERPC_MMU_64) {
-uint64_t rb = 0, rs = 0;
+uint64_t esid, vsid;
 
 /* ESID = srnum */
-rb |= ((uint32_t)srnum & 0xf) << 28;
-/* Set the valid bit */
-rb |= SLB_ESID_V;
-/* Index = ESID */
-rb |= (uint32_t)srnum;
+esid = ((uint64_t)(srnum & 0xf) << 28) | SLB_ESID_V;
 
 /* VSID = VSID */
-rs |= (value & 0xfff) << 12;
+vsid = (value & 0xfff) << 12;
 /* flags = flags */
-rs |= ((value >> 27) & 0xf) << 8;
+

[Qemu-devel] [PULL 16/40] pseries: Clean up error handling in xics_system_init()

2016-01-31 Thread David Gibson

Use the error handling infrastructure to pass an error out from
try_create_xics() instead of assuming &error_abort - the caller is in a
better position to decide on error handling policy.

Also change the error handling from an &error_abort to &error_fatal, since
this occurs during the initial machine construction and could be triggered
by bad configuration rather than a program error.

Signed-off-by: David Gibson 
Reviewed-by: Thomas Huth 
Reviewed-by: Alexey Kardashevskiy 
Reviewed-by: Markus Armbruster 
---
 hw/ppc/spapr.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 3f90e50..1281e07 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -112,7 +112,7 @@ static XICSState *try_create_xics(const char *type, int 
nr_servers,
 }
 
 static XICSState *xics_system_init(MachineState *machine,
-   int nr_servers, int nr_irqs)
+   int nr_servers, int nr_irqs, Error **errp)
 {
 XICSState *icp = NULL;
 
@@ -131,7 +131,7 @@ static XICSState *xics_system_init(MachineState *machine,
 }
 
 if (!icp) {
-icp = try_create_xics(TYPE_XICS, nr_servers, nr_irqs, &error_abort);
+icp = try_create_xics(TYPE_XICS, nr_servers, nr_irqs, errp);
 }
 
 return icp;
@@ -1813,7 +1813,7 @@ static void ppc_spapr_init(MachineState *machine)
 spapr->icp = xics_system_init(machine,
   DIV_ROUND_UP(max_cpus * kvmppc_smt_threads(),
smp_threads),
-  XICS_IRQS);
+  XICS_IRQS, &error_fatal);
 
 if (smc->dr_lmb_enabled) {
 spapr_validate_node_memory(machine, &error_fatal);
-- 
2.5.0

[Qemu-devel] [PULL 34/40] target-ppc: Remove unused mmu models from ppc_tlb_invalidate_one

2016-01-31 Thread David Gibson

ppc_tlb_invalidate_one() has a big switch handling many different MMU
types.  However, most of those branches can never be reached:

It is called from 3 places: from remove_hpte() and h_protect() in
spapr_hcall.c (which always has a 64-bit hash MMU type), and from
helper_tlbie() in mmu_helper.c.

Calls to helper_tlbie() are generated from gen_tlbie, gen_tlbiel and
gen_tlbiva.  The first two are only used with the PPC_MEM_TLBIE flag,
set only with 32-bit or 64-bit hash MMU models, and gen_tlbiva() is
used only on 440 and 460 models with the BookE mmu model.

This means the exhaustive list of MMU types which may call
ppc_tlb_invalidate_one() is: POWERPC_MMU_SOFT_6xx, POWERPC_MMU_601,
POWERPC_MMU_32B, POWERPC_MMU_SOFT_74xx, POWERPC_MMU_64B, POWERPC_MMU_2_03,
POWERPC_MMU_2_06, POWERPC_MMU_2_07 and POWERPC_MMU_BOOKE.

Clean up by removing logic for all other MMU types from
ppc_tlb_invalidate_one().

This means that ppc4xx_tlb_invalidate_virt() now has no callers, or rather,
makes it obvious that it has no callers.  So, we remove that function as
well.

Signed-off-by: David Gibson 
---
 target-ppc/mmu_helper.c | 46 ++
 1 file changed, 2 insertions(+), 44 deletions(-)

diff --git a/target-ppc/mmu_helper.c b/target-ppc/mmu_helper.c
index 7277889..4343cb2 100644
--- a/target-ppc/mmu_helper.c
+++ b/target-ppc/mmu_helper.c
@@ -658,32 +658,6 @@ static inline void ppc4xx_tlb_invalidate_all(CPUPPCState 
*env)
 tlb_flush(CPU(cpu), 1);
 }
 
-static inline void ppc4xx_tlb_invalidate_virt(CPUPPCState *env,
-  target_ulong eaddr, uint32_t pid)
-{
-#if !defined(FLUSH_ALL_TLBS)
-CPUState *cs = CPU(ppc_env_get_cpu(env));
-ppcemb_tlb_t *tlb;
-hwaddr raddr;
-target_ulong page, end;
-int i;
-
-for (i = 0; i < env->nb_tlb; i++) {
-tlb = &env->tlb.tlbe[i];
-if (ppcemb_tlb_check(env, tlb, &raddr, eaddr, pid, 0, i) == 0) {
-end = tlb->EPN + tlb->size;
-for (page = tlb->EPN; page < end; page += TARGET_PAGE_SIZE) {
-tlb_flush_page(cs, page);
-}
-tlb->prot &= ~PAGE_VALID;
-break;
-}
-}
-#else
-ppc4xx_tlb_invalidate_all(env);
-#endif
-}
-
 static int mmu40x_get_physical_address(CPUPPCState *env, mmu_ctx_t *ctx,
target_ulong address, int rw,
int access_type)
@@ -1972,25 +1946,10 @@ void ppc_tlb_invalidate_one(CPUPPCState *env, 
target_ulong addr)
 ppc6xx_tlb_invalidate_virt(env, addr, 1);
 }
 break;
-case POWERPC_MMU_SOFT_4xx:
-case POWERPC_MMU_SOFT_4xx_Z:
-ppc4xx_tlb_invalidate_virt(env, addr, env->spr[SPR_40x_PID]);
-break;
-case POWERPC_MMU_REAL:
-cpu_abort(CPU(cpu), "No TLB for PowerPC 4xx in real mode\n");
-break;
-case POWERPC_MMU_MPC8xx:
-/* XXX: TODO */
-cpu_abort(CPU(cpu), "MPC8xx MMU model is not implemented\n");
-break;
 case POWERPC_MMU_BOOKE:
 /* XXX: TODO */
 cpu_abort(CPU(cpu), "BookE MMU model is not implemented\n");
 break;
-case POWERPC_MMU_BOOKE206:
-/* XXX: TODO */
-cpu_abort(CPU(cpu), "BookE 2.06 MMU model is not implemented\n");
-break;
 case POWERPC_MMU_32B:
 case POWERPC_MMU_601:
 /* tlbie invalidate TLBs for all segments */
@@ -2032,9 +1991,8 @@ void ppc_tlb_invalidate_one(CPUPPCState *env, 
target_ulong addr)
 break;
 #endif /* defined(TARGET_PPC64) */
 default:
-/* XXX: TODO */
-cpu_abort(CPU(cpu), "Unknown MMU model\n");
-break;
+/* Should never reach here with other MMU models */
+assert(0);
 }
 #else
 ppc_tlb_invalidate_all(env);
-- 
2.5.0

[Qemu-devel] [PULL 21/40] target-ppc: gdbstub: fix float registers for little-endian guests

2016-01-31 Thread David Gibson

From: Greg Kurz 

Let's reuse the ppc_maybe_bswap_register() helper, like we already do
with the general registers.

Signed-off-by: Greg Kurz 
Signed-off-by: David Gibson 
---
 target-ppc/translate_init.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 78c2811..031c71e 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -8754,10 +8754,12 @@ static int gdb_get_float_reg(CPUPPCState *env, uint8_t 
*mem_buf, int n)
 {
 if (n < 32) {
 stfq_p(mem_buf, env->fpr[n]);
+ppc_maybe_bswap_register(env, mem_buf, 8);
 return 8;
 }
 if (n == 32) {
 stl_p(mem_buf, env->fpscr);
+ppc_maybe_bswap_register(env, mem_buf, 4);
 return 4;
 }
 return 0;
@@ -8766,10 +8768,12 @@ static int gdb_get_float_reg(CPUPPCState *env, uint8_t 
*mem_buf, int n)
 static int gdb_set_float_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 {
 if (n < 32) {
+ppc_maybe_bswap_register(env, mem_buf, 8);
 env->fpr[n] = ldfq_p(mem_buf);
 return 8;
 }
 if (n == 32) {
+ppc_maybe_bswap_register(env, mem_buf, 4);
 helper_store_fpscr(env, ldl_p(mem_buf), 0x);
 return 4;
 }
-- 
2.5.0

[Qemu-devel] [PULL 20/40] target-ppc: rename and export maybe_bswap_register()

2016-01-31 Thread David Gibson

From: Greg Kurz 

This helper will be used to support FP, Altivec and VSX registers when
the guest is little-endian.
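
As a rough picture of what such a helper does, here is a sketch of a conditional in-place byte reversal.  It rests on assumptions: the helper's full body is not shown in this patch, so the name toy_maybe_bswap, the explicit little_endian_guest flag and the generic byte loop are all hypothetical, chosen only to illustrate the idea.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Reverse the bytes of a register image in place, but only when the
 * guest is running little-endian; otherwise leave the buffer untouched.
 */
static void toy_maybe_bswap(int little_endian_guest,
                            uint8_t *mem_buf, size_t len)
{
    size_t i;

    if (!little_endian_guest) {
        return;
    }
    for (i = 0; i < len / 2; i++) {
        uint8_t tmp = mem_buf[i];
        mem_buf[i] = mem_buf[len - 1 - i];
        mem_buf[len - 1 - i] = tmp;
    }
}
```

A register getter would call it after storing the value into the buffer, and a setter before loading the value back out.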

Signed-off-by: Greg Kurz 
Signed-off-by: David Gibson 
---
 target-ppc/cpu.h |  1 +
 target-ppc/gdbstub.c | 10 +-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index b3b89e6..2bc96b4 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -2355,4 +2355,5 @@ int ppc_get_vcpu_dt_id(PowerPCCPU *cpu);
  */
 PowerPCCPU *ppc_get_vcpu_by_dt_id(int cpu_dt_id);
 
+void ppc_maybe_bswap_register(CPUPPCState *env, uint8_t *mem_buf, int len);
 #endif /* !defined (__CPU_PPC_H__) */
diff --git a/target-ppc/gdbstub.c b/target-ppc/gdbstub.c
index ef4be23..569c380 100644
--- a/target-ppc/gdbstub.c
+++ b/target-ppc/gdbstub.c
@@ -88,7 +88,7 @@ static int ppc_gdb_register_len(int n)
the proper ordering for the binary, and cannot be changed.
For system mode, TARGET_WORDS_BIGENDIAN is always set, and we must check
the current mode of the chip to see if we're running in little-endian.  */
-static void maybe_bswap_register(CPUPPCState *env, uint8_t *mem_buf, int len)
+void ppc_maybe_bswap_register(CPUPPCState *env, uint8_t *mem_buf, int len)
 {
 #ifndef CONFIG_USER_ONLY
 if (!msr_le) {
@@ -158,7 +158,7 @@ int ppc_cpu_gdb_read_register(CPUState *cs, uint8_t 
*mem_buf, int n)
 break;
 }
 }
-maybe_bswap_register(env, mem_buf, r);
+ppc_maybe_bswap_register(env, mem_buf, r);
 return r;
 }
 
@@ -214,7 +214,7 @@ int ppc_cpu_gdb_read_register_apple(CPUState *cs, uint8_t 
*mem_buf, int n)
 break;
 }
 }
-maybe_bswap_register(env, mem_buf, r);
+ppc_maybe_bswap_register(env, mem_buf, r);
 return r;
 }
 
@@ -227,7 +227,7 @@ int ppc_cpu_gdb_write_register(CPUState *cs, uint8_t 
*mem_buf, int n)
 if (!r) {
 return r;
 }
-maybe_bswap_register(env, mem_buf, r);
+ppc_maybe_bswap_register(env, mem_buf, r);
 if (n < 32) {
 /* gprs */
 env->gpr[n] = ldtul_p(mem_buf);
@@ -277,7 +277,7 @@ int ppc_cpu_gdb_write_register_apple(CPUState *cs, uint8_t 
*mem_buf, int n)
 if (!r) {
 return r;
 }
-maybe_bswap_register(env, mem_buf, r);
+ppc_maybe_bswap_register(env, mem_buf, r);
 if (n < 32) {
 /* gprs */
 env->gpr[n] = ldq_p(mem_buf);
-- 
2.5.0

[Qemu-devel] [PULL 36/40] target-ppc: Add new TLB invalidate by HPTE call for hash64 MMUs

2016-01-31 Thread David Gibson

When HPTEs are removed or modified by hypercalls on spapr, we need to
invalidate the relevant pages in the qemu TLB.

Currently we do that by doing some complicated calculations to work out the
right encoding for the tlbie instruction, then passing that to
ppc_tlb_invalidate_one()... which totally ignores the argument and flushes
the whole tlb.

Avoid that by adding a new flush-by-hpte helper in mmu-hash64.c.

Signed-off-by: David Gibson 
Acked-by: Benjamin Herrenschmidt 
Reviewed-by: Alexander Graf 
---
 hw/ppc/spapr_hcall.c| 46 --
 target-ppc/mmu-hash64.c | 12 
 target-ppc/mmu-hash64.h |  3 +++
 3 files changed, 19 insertions(+), 42 deletions(-)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index a53bd2f..0a8378c 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -38,42 +38,6 @@ static void set_spr(CPUState *cs, int spr, target_ulong 
value,
 run_on_cpu(cs, do_spr_sync, &s);
 }
 
-static target_ulong compute_tlbie_rb(target_ulong v, target_ulong r,
- target_ulong pte_index)
-{
-target_ulong rb, va_low;
-
-rb = (v & ~0x7fULL) << 16; /* AVA field */
-va_low = pte_index >> 3;
-if (v & HPTE64_V_SECONDARY) {
-va_low = ~va_low;
-}
-/* xor vsid from AVA */
-if (!(v & HPTE64_V_1TB_SEG)) {
-va_low ^= v >> 12;
-} else {
-va_low ^= v >> 24;
-}
-va_low &= 0x7ff;
-if (v & HPTE64_V_LARGE) {
-rb |= 1; /* L field */
-#if 0 /* Disable that P7 specific bit for now */
-if (r & 0xff000) {
-/* non-16MB large page, must be 64k */
-/* (masks depend on page size) */
-rb |= 0x1000;/* page encoding in LP field */
-rb |= (va_low & 0x7f) << 16; /* 7b of VA in AVA/LP field */
-rb |= (va_low & 0xfe);   /* AVAL field */
-}
-#endif
-} else {
-/* 4kB page */
-rb |= (va_low & 0x7ff) << 12;   /* remaining 11b of AVA */
-}
-rb |= (v >> 54) & 0x300;/* B field */
-return rb;
-}
-
 static inline bool valid_pte_index(CPUPPCState *env, target_ulong pte_index)
 {
 /*
@@ -199,7 +163,7 @@ static RemoveResult remove_hpte(PowerPCCPU *cpu, 
target_ulong ptex,
 {
 CPUPPCState *env = &cpu->env;
 uint64_t token;
-target_ulong v, r, rb;
+target_ulong v, r;
 
 if (!valid_pte_index(env, ptex)) {
 return REMOVE_PARM;
@@ -218,8 +182,7 @@ static RemoveResult remove_hpte(PowerPCCPU *cpu, 
target_ulong ptex,
 *vp = v;
 *rp = r;
 ppc_hash64_store_hpte(cpu, ptex, HPTE64_V_HPTE_DIRTY, 0);
-rb = compute_tlbie_rb(v, r, ptex);
-ppc_tlb_invalidate_one(env, rb);
+ppc_hash64_tlb_flush_hpte(cpu, ptex, v, r);
 return REMOVE_SUCCESS;
 }
 
@@ -323,7 +286,7 @@ static target_ulong h_protect(PowerPCCPU *cpu, 
sPAPRMachineState *spapr,
 target_ulong pte_index = args[1];
 target_ulong avpn = args[2];
 uint64_t token;
-target_ulong v, r, rb;
+target_ulong v, r;
 
 if (!valid_pte_index(env, pte_index)) {
 return H_PARAMETER;
@@ -344,10 +307,9 @@ static target_ulong h_protect(PowerPCCPU *cpu, 
sPAPRMachineState *spapr,
 r |= (flags << 55) & HPTE64_R_PP0;
 r |= (flags << 48) & HPTE64_R_KEY_HI;
 r |= flags & (HPTE64_R_PP | HPTE64_R_N | HPTE64_R_KEY_LO);
-rb = compute_tlbie_rb(v, r, pte_index);
 ppc_hash64_store_hpte(cpu, pte_index,
   (v & ~HPTE64_V_VALID) | HPTE64_V_HPTE_DIRTY, 0);
-ppc_tlb_invalidate_one(env, rb);
+ppc_hash64_tlb_flush_hpte(cpu, pte_index, v, r);
 /* Don't need a memory barrier, due to qemu's global lock */
 ppc_hash64_store_hpte(cpu, pte_index, v | HPTE64_V_HPTE_DIRTY, r);
 return H_SUCCESS;
diff --git a/target-ppc/mmu-hash64.c b/target-ppc/mmu-hash64.c
index f4c25b7..565a0f4 100644
--- a/target-ppc/mmu-hash64.c
+++ b/target-ppc/mmu-hash64.c
@@ -708,3 +708,15 @@ void ppc_hash64_store_hpte(PowerPCCPU *cpu,
  env->htab_base + pte_index + HASH_PTE_SIZE_64 / 2, pte1);
 }
 }
+
+void ppc_hash64_tlb_flush_hpte(PowerPCCPU *cpu,
+   target_ulong pte_index,
+   target_ulong pte0, target_ulong pte1)
+{
+/*
+ * XXX: given the fact that there are too many segments to
+ * invalidate, and we still don't have a tlb_flush_mask(env, n,
+ * mask) in QEMU, we just invalidate all TLBs
+ */
+tlb_flush(CPU(cpu), 1);
+}
diff --git a/target-ppc/mmu-hash64.h b/target-ppc/mmu-hash64.h
index 24fd2c4..293a951 100644
--- a/target-ppc/mmu-hash64.h
+++ b/target-ppc/mmu-hash64.h
@@ -13,6 +13,9 @@ int ppc_hash64_handle_mmu_fault(PowerPCCPU *cpu, target_ulong 
address, int rw,
 int mmu_idx);
 void ppc_hash64_store_hpte(PowerPCCPU *cpu, target_ulong index,
target_ulong pte0, target_ulong pte1);
+void ppc_hash64_tlb_flush_hpte(Power

[Qemu-devel] [PULL 38/40] target-ppc: Allow more page sizes for POWER7 & POWER8 in TCG

2016-01-31 Thread David Gibson

Now that the TCG and spapr code has been extended to allow (semi-)
arbitrary page encodings in the CPU's 'sps' table, we can add the many
page sizes supported by real POWER7 and POWER8 hardware that we previously
didn't support in TCG.
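
For readers unfamiliar with the page_shift notation used in the table below, each entry describes a page of 2^page_shift bytes.  A minimal sketch, with toy_page_size as a hypothetical helper name:

```c
#include <stdint.h>

/* Page size in bytes for a given page_shift (12 -> 4K, 16 -> 64K, ...). */
static uint64_t toy_page_size(unsigned page_shift)
{
    return 1ULL << page_shift;
}
```

So the shifts 12, 16, 24 and 34 that appear in the table correspond to 4K, 64K, 16M and 16G pages respectively.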

Signed-off-by: David Gibson 
Acked-by: Benjamin Herrenschmidt 
Reviewed-by: Alexander Graf 
---
 target-ppc/mmu-hash64.h |  2 ++
 target-ppc/translate_init.c | 32 
 2 files changed, 34 insertions(+)

diff --git a/target-ppc/mmu-hash64.h b/target-ppc/mmu-hash64.h
index 34cf975..ab0f86b 100644
--- a/target-ppc/mmu-hash64.h
+++ b/target-ppc/mmu-hash64.h
@@ -48,6 +48,8 @@ unsigned ppc_hash64_hpte_page_shift_noslb(PowerPCCPU *cpu,
 #define SLB_VSID_LLP_MASK   (SLB_VSID_L | SLB_VSID_LP)
 #define SLB_VSID_4K 0xULL
 #define SLB_VSID_64K0x0110ULL
+#define SLB_VSID_16M0x0100ULL
+#define SLB_VSID_16G0x0120ULL
 
 /*
  * Hash page table definitions
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 4d71a5d..cdd18ac 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -8105,6 +8105,36 @@ static Property powerpc_servercpu_properties[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
+#ifdef CONFIG_SOFTMMU
+static const struct ppc_segment_page_sizes POWER7_POWER8_sps = {
+.sps = {
+{
+.page_shift = 12, /* 4K */
+.slb_enc = 0,
+.enc = { { .page_shift = 12, .pte_enc = 0 },
+ { .page_shift = 16, .pte_enc = 0x7 },
+ { .page_shift = 24, .pte_enc = 0x38 }, },
+},
+{
+.page_shift = 16, /* 64K */
+.slb_enc = SLB_VSID_64K,
+.enc = { { .page_shift = 16, .pte_enc = 0x1 },
+ { .page_shift = 24, .pte_enc = 0x8 }, },
+},
+{
+.page_shift = 24, /* 16M */
+.slb_enc = SLB_VSID_16M,
+.enc = { { .page_shift = 24, .pte_enc = 0 }, },
+},
+{
+.page_shift = 34, /* 16G */
+.slb_enc = SLB_VSID_16G,
+.enc = { { .page_shift = 34, .pte_enc = 0x3 }, },
+},
+}
+};
+#endif /* CONFIG_SOFTMMU */
+
 static void init_proc_POWER7 (CPUPPCState *env)
 {
 init_proc_book3s_64(env, BOOK3S_CPU_POWER7);
@@ -8168,6 +8198,7 @@ POWERPC_FAMILY(POWER7)(ObjectClass *oc, void *data)
 pcc->mmu_model = POWERPC_MMU_2_06;
 #if defined(CONFIG_SOFTMMU)
 pcc->handle_mmu_fault = ppc_hash64_handle_mmu_fault;
+pcc->sps = &POWER7_POWER8_sps;
 #endif
 pcc->excp_model = POWERPC_EXCP_POWER7;
 pcc->bus_model = PPC_FLAGS_INPUT_POWER7;
@@ -8248,6 +8279,7 @@ POWERPC_FAMILY(POWER8)(ObjectClass *oc, void *data)
 pcc->mmu_model = POWERPC_MMU_2_07;
 #if defined(CONFIG_SOFTMMU)
 pcc->handle_mmu_fault = ppc_hash64_handle_mmu_fault;
+pcc->sps = &POWER7_POWER8_sps;
 #endif
 pcc->excp_model = POWERPC_EXCP_POWER7;
 pcc->bus_model = PPC_FLAGS_INPUT_POWER7;
-- 
2.5.0

[Qemu-devel] [PULL 25/40] target-ppc: gdbstub: Add VSX support

2016-01-31 Thread David Gibson

From: Anton Blanchard 

Add the XML and functions to get and set VSX registers.

Signed-off-by: Anton Blanchard 
(fixed little-endian guests)
Signed-off-by: Greg Kurz 
Signed-off-by: David Gibson 
---
 configure   |  6 +++---
 gdb-xml/power-vsx.xml   | 44 
 target-ppc/translate_init.c | 24 
 3 files changed, 71 insertions(+), 3 deletions(-)
 create mode 100644 gdb-xml/power-vsx.xml

diff --git a/configure b/configure
index 3506e44..297bfc7 100755
--- a/configure
+++ b/configure
@@ -5702,20 +5702,20 @@ case "$target_name" in
   ppc64)
 TARGET_BASE_ARCH=ppc
 TARGET_ABI_DIR=ppc
-gdb_xml_files="power64-core.xml power-fpu.xml power-altivec.xml 
power-spe.xml"
+gdb_xml_files="power64-core.xml power-fpu.xml power-altivec.xml 
power-spe.xml power-vsx.xml"
   ;;
   ppc64le)
 TARGET_ARCH=ppc64
 TARGET_BASE_ARCH=ppc
 TARGET_ABI_DIR=ppc
-gdb_xml_files="power64-core.xml power-fpu.xml power-altivec.xml 
power-spe.xml"
+gdb_xml_files="power64-core.xml power-fpu.xml power-altivec.xml 
power-spe.xml power-vsx.xml"
   ;;
   ppc64abi32)
 TARGET_ARCH=ppc64
 TARGET_BASE_ARCH=ppc
 TARGET_ABI_DIR=ppc
 echo "TARGET_ABI32=y" >> $config_target_mak
-gdb_xml_files="power64-core.xml power-fpu.xml power-altivec.xml 
power-spe.xml"
+gdb_xml_files="power64-core.xml power-fpu.xml power-altivec.xml 
power-spe.xml power-vsx.xml"
   ;;
   sh4|sh4eb)
 TARGET_ARCH=sh4
diff --git a/gdb-xml/power-vsx.xml b/gdb-xml/power-vsx.xml
new file mode 100644
index 000..fd290e9
--- /dev/null
+++ b/gdb-xml/power-vsx.xml
@@ -0,0 +1,44 @@
+
+
+
+
+
+
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index fce68f3..4d71a5d 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -8896,6 +8896,26 @@ static int gdb_set_spe_reg(CPUPPCState *env, uint8_t 
*mem_buf, int n)
 return 0;
 }
 
+static int gdb_get_vsx_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
+{
+if (n < 32) {
+stq_p(mem_buf, env->vsr[n]);
+ppc_maybe_bswap_register(env, mem_buf, 8);
+return 8;
+}
+return 0;
+}
+
+static int gdb_set_vsx_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
+{
+if (n < 32) {
+ppc_maybe_bswap_register(env, mem_buf, 8);
+env->vsr[n] = ldq_p(mem_buf);
+return 8;
+}
+return 0;
+}
+
 static int ppc_fixup_cpu(PowerPCCPU *cpu)
 {
 CPUPPCState *env = &cpu->env;
@@ -9001,6 +9021,10 @@ static void ppc_cpu_realizefn(DeviceState *dev, Error 
**errp)
 gdb_register_coprocessor(cs, gdb_get_spe_reg, gdb_set_spe_reg,
  34, "power-spe.xml", 0);
 }
+if (pcc->insns_flags2 & PPC2_VSX) {
+gdb_register_coprocessor(cs, gdb_get_vsx_reg, gdb_set_vsx_reg,
+ 32, "power-vsx.xml", 0);
+}
 
 qemu_init_vcpu(cs);
 
-- 
2.5.0

[Qemu-devel] [PULL 11/40] ppc: Clean up error handling in ppc_set_compat()

2016-01-31 Thread David Gibson

Currently ppc_set_compat() returns -1 for errors, and also (unconditionally)
reports an error message.  The caller in h_client_architecture_support()
may then report it again using an outdated fprintf().

Clean this up by using the modern error reporting mechanisms.  Also add
strerror(errno) to the error message.

Signed-off-by: David Gibson 
Reviewed-by: Thomas Huth 
Reviewed-by: Alexey Kardashevskiy 
Reviewed-by: Markus Armbruster 
---
 hw/ppc/spapr.c  |  4 +---
 hw/ppc/spapr_hcall.c| 10 +-
 target-ppc/cpu.h|  2 +-
 target-ppc/translate_init.c | 13 +++--
 4 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 0ac6368..8862d18 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1643,9 +1643,7 @@ static void spapr_cpu_init(sPAPRMachineState *spapr, 
PowerPCCPU *cpu)
 }
 
 if (cpu->max_compat) {
-if (ppc_set_compat(cpu, cpu->max_compat) < 0) {
-exit(1);
-}
+ppc_set_compat(cpu, cpu->max_compat, &error_fatal);
 }
 
 xics_cpu_setup(spapr->icp, cpu);
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index fdd7fea..655c433 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -838,7 +838,7 @@ static target_ulong cas_get_option_vector(int vector, 
target_ulong table)
 typedef struct {
 PowerPCCPU *cpu;
 uint32_t cpu_version;
-int ret;
+Error *err;
 } SetCompatState;
 
 static void do_set_compat(void *arg)
@@ -846,7 +846,7 @@ static void do_set_compat(void *arg)
 SetCompatState *s = arg;
 
 cpu_synchronize_state(CPU(s->cpu));
-s->ret = ppc_set_compat(s->cpu, s->cpu_version);
+ppc_set_compat(s->cpu, s->cpu_version, &s->err);
 }
 
 #define get_compat_level(cpuver) ( \
@@ -931,13 +931,13 @@ static target_ulong 
h_client_architecture_support(PowerPCCPU *cpu_,
 SetCompatState s = {
 .cpu = POWERPC_CPU(cs),
 .cpu_version = cpu_version,
-.ret = 0
+.err = NULL,
 };
 
 run_on_cpu(cs, do_set_compat, &s);
 
-if (s.ret < 0) {
-fprintf(stderr, "Unable to set compatibility mode\n");
+if (s.err) {
+error_report_err(s.err);
 return H_HARDWARE;
 }
 }
diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 9706000..b3b89e6 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -1210,7 +1210,7 @@ void ppc_store_msr (CPUPPCState *env, target_ulong value);
 
 void ppc_cpu_list (FILE *f, fprintf_function cpu_fprintf);
 int ppc_get_compat_smt_threads(PowerPCCPU *cpu);
-int ppc_set_compat(PowerPCCPU *cpu, uint32_t cpu_version);
+void ppc_set_compat(PowerPCCPU *cpu, uint32_t cpu_version, Error **errp);
 
 /* Time-base and decrementer management */
 #ifndef NO_CPU_IO_DEFS
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 76d5da1..78c2811 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -9185,7 +9185,7 @@ int ppc_get_compat_smt_threads(PowerPCCPU *cpu)
 return ret;
 }
 
-int ppc_set_compat(PowerPCCPU *cpu, uint32_t cpu_version)
+void ppc_set_compat(PowerPCCPU *cpu, uint32_t cpu_version, Error **errp)
 {
 int ret = 0;
 CPUPPCState *env = &cpu->env;
@@ -9207,12 +9207,13 @@ int ppc_set_compat(PowerPCCPU *cpu, uint32_t 
cpu_version)
 break;
 }
 
-if (kvm_enabled() && kvmppc_set_compat(cpu, cpu->cpu_version) < 0) {
-error_report("Unable to set compatibility mode in KVM");
-ret = -1;
+if (kvm_enabled()) {
+ret = kvmppc_set_compat(cpu, cpu->cpu_version);
+if (ret < 0) {
+error_setg_errno(errp, -ret,
+ "Unable to set CPU compatibility mode in KVM");
+}
 }
-
-return ret;
 }
 
 static gint ppc_cpu_compare_class_pvr(gconstpointer a, gconstpointer b)
-- 
2.5.0

[Qemu-devel] [PULL 24/40] target-ppc: gdbstub: fix spe registers for little-endian guests

2016-01-31 Thread David Gibson

From: Greg Kurz 

Let's reuse the ppc_maybe_bswap_register() helper, like we already do
with the general registers.

Signed-off-by: Greg Kurz 
Signed-off-by: David Gibson 
---
 target-ppc/translate_init.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 1174141..fce68f3 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -8848,6 +8848,7 @@ static int gdb_get_spe_reg(CPUPPCState *env, uint8_t 
*mem_buf, int n)
 if (n < 32) {
 #if defined(TARGET_PPC64)
 stl_p(mem_buf, env->gpr[n] >> 32);
+ppc_maybe_bswap_register(env, mem_buf, 4);
 #else
 stl_p(mem_buf, env->gprh[n]);
 #endif
@@ -8855,10 +8856,12 @@ static int gdb_get_spe_reg(CPUPPCState *env, uint8_t 
*mem_buf, int n)
 }
 if (n == 32) {
 stq_p(mem_buf, env->spe_acc);
+ppc_maybe_bswap_register(env, mem_buf, 8);
 return 8;
 }
 if (n == 33) {
 stl_p(mem_buf, env->spe_fscr);
+ppc_maybe_bswap_register(env, mem_buf, 4);
 return 4;
 }
 return 0;
@@ -8869,7 +8872,11 @@ static int gdb_set_spe_reg(CPUPPCState *env, uint8_t 
*mem_buf, int n)
 if (n < 32) {
 #if defined(TARGET_PPC64)
 target_ulong lo = (uint32_t)env->gpr[n];
-target_ulong hi = (target_ulong)ldl_p(mem_buf) << 32;
+target_ulong hi;
+
+ppc_maybe_bswap_register(env, mem_buf, 4);
+
+hi = (target_ulong)ldl_p(mem_buf) << 32;
 env->gpr[n] = lo | hi;
 #else
 env->gprh[n] = ldl_p(mem_buf);
@@ -8877,10 +8884,12 @@ static int gdb_set_spe_reg(CPUPPCState *env, uint8_t 
*mem_buf, int n)
 return 4;
 }
 if (n == 32) {
+ppc_maybe_bswap_register(env, mem_buf, 8);
 env->spe_acc = ldq_p(mem_buf);
 return 8;
 }
 if (n == 33) {
+ppc_maybe_bswap_register(env, mem_buf, 4);
 env->spe_fscr = ldl_p(mem_buf);
 return 4;
 }
-- 
2.5.0

[Qemu-devel] [PULL 13/40] pseries: Clean up error handling in spapr_validate_node_memory()

2016-01-31 Thread David Gibson

Use error_setg() and return an error, rather than using an explicit exit().

Also improve messages, and be more explicit about which constraint failed.

Signed-off-by: David Gibson 
Reviewed-by: Bharata B Rao 
Reviewed-by: Thomas Huth 
Reviewed-by: Alexey Kardashevskiy 
Reviewed-by: Markus Armbruster 
---
 hw/ppc/spapr.c | 37 ++---
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 61653ae..4e6ee6d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1699,27 +1699,34 @@ static void 
spapr_create_lmb_dr_connectors(sPAPRMachineState *spapr)
  * to SPAPR_MEMORY_BLOCK_SIZE(256MB), then refuse to start the guest
  * since we can't support such unaligned sizes with DRCONF_MEMORY.
  */
-static void spapr_validate_node_memory(MachineState *machine)
+static void spapr_validate_node_memory(MachineState *machine, Error **errp)
 {
 int i;
 
-if (machine->maxram_size % SPAPR_MEMORY_BLOCK_SIZE ||
-machine->ram_size % SPAPR_MEMORY_BLOCK_SIZE) {
-error_report("Can't support memory configuration where RAM size "
- "0x" RAM_ADDR_FMT " or maxmem size "
- "0x" RAM_ADDR_FMT " isn't aligned to %llu MB",
- machine->ram_size, machine->maxram_size,
- SPAPR_MEMORY_BLOCK_SIZE/M_BYTE);
-exit(EXIT_FAILURE);
+if (machine->ram_size % SPAPR_MEMORY_BLOCK_SIZE) {
+error_setg(errp, "Memory size 0x" RAM_ADDR_FMT
+   " is not aligned to %llu MiB",
+   machine->ram_size,
+   SPAPR_MEMORY_BLOCK_SIZE / M_BYTE);
+return;
+}
+
+if (machine->maxram_size % SPAPR_MEMORY_BLOCK_SIZE) {
+error_setg(errp, "Maximum memory size 0x" RAM_ADDR_FMT
+   " is not aligned to %llu MiB",
+   machine->maxram_size,
+   SPAPR_MEMORY_BLOCK_SIZE / M_BYTE);
+return;
 }
 
 for (i = 0; i < nb_numa_nodes; i++) {
 if (numa_info[i].node_mem % SPAPR_MEMORY_BLOCK_SIZE) {
-error_report("Can't support memory configuration where memory size"
- " %" PRIx64 " of node %d isn't aligned to %llu MB",
- numa_info[i].node_mem, i,
- SPAPR_MEMORY_BLOCK_SIZE/M_BYTE);
-exit(EXIT_FAILURE);
+error_setg(errp,
+   "Node %d memory size 0x%" PRIx64
+   " is not aligned to %llu MiB",
+   i, numa_info[i].node_mem,
+   SPAPR_MEMORY_BLOCK_SIZE / M_BYTE);
+return;
 }
 }
 }
@@ -1809,7 +1816,7 @@ static void ppc_spapr_init(MachineState *machine)
   XICS_IRQS);
 
 if (smc->dr_lmb_enabled) {
-spapr_validate_node_memory(machine);
+spapr_validate_node_memory(machine, &error_fatal);
 }
 
 /* init CPUs */
-- 
2.5.0
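
The error-handling shape used in the patch above can be sketched outside QEMU as follows. This is only an illustration under stated assumptions: SketchError, sk_error_set() and sk_validate_memory() are hypothetical stand-ins for QEMU's Error, error_setg() and spapr_validate_node_memory(), and the 256 MiB block size is taken from the comment in the patch. The validator reports the first failed constraint through an out-parameter and leaves the decision to exit to the caller.

```c
#include <inttypes.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration only, not QEMU's API. */
#define SK_MIB        (1024ULL * 1024)
#define SK_BLOCK_SIZE (256 * SK_MIB)    /* assumed 256 MiB memory block */

typedef struct {
    char msg[128];
} SketchError;

/* Record a formatted message; a NULL err means "caller does not care". */
void sk_error_set(SketchError *err, const char *fmt, ...)
{
    va_list ap;

    if (!err) {
        return;
    }
    va_start(ap, fmt);
    vsnprintf(err->msg, sizeof(err->msg), fmt, ap);
    va_end(ap);
}

/* Returns 0 if every size is block-aligned, otherwise -1 with err
 * describing the first constraint that failed. */
int sk_validate_memory(uint64_t ram_size, uint64_t maxram_size,
                       const uint64_t *node_mem, int nb_nodes,
                       SketchError *err)
{
    int i;

    if (ram_size % SK_BLOCK_SIZE) {
        sk_error_set(err, "Memory size 0x%" PRIx64
                     " is not aligned to %llu MiB",
                     ram_size, SK_BLOCK_SIZE / SK_MIB);
        return -1;
    }

    if (maxram_size % SK_BLOCK_SIZE) {
        sk_error_set(err, "Maximum memory size 0x%" PRIx64
                     " is not aligned to %llu MiB",
                     maxram_size, SK_BLOCK_SIZE / SK_MIB);
        return -1;
    }

    for (i = 0; i < nb_nodes; i++) {
        if (node_mem[i] % SK_BLOCK_SIZE) {
            sk_error_set(err, "Node %d memory size 0x%" PRIx64
                         " is not aligned to %llu MiB",
                         i, node_mem[i], SK_BLOCK_SIZE / SK_MIB);
            return -1;
        }
    }
    return 0;
}
```

A caller that wants the old behaviour would print err.msg and exit on a non-zero return, which is roughly what passing &error_fatal achieves in the patch.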

[Qemu-devel] [PULL 30/40] target-ppc: Convert mmu-hash{32, 64}.[ch] from CPUPPCState to PowerPCCPU

2016-01-31 Thread David Gibson

Like a lot of places, these files include a mixture of functions taking
both the older CPUPPCState *env and the newer PowerPCCPU *cpu.  Move a step
closer to cleaning this up by standardizing on PowerPCCPU, except for the
helper_* functions which are called with the CPUPPCState * from tcg.

Callers and some related functions are updated as well; the boundaries of
what's changed here are a bit arbitrary.

Signed-off-by: David Gibson 
Reviewed-by: Laurent Vivier 
Reviewed-by: Alexander Graf 
---
 hw/ppc/spapr_hcall.c| 31 ++-
 target-ppc/kvm.c|  2 +-
 target-ppc/mmu-hash32.c | 68 +++--
 target-ppc/mmu-hash32.h | 30 ++-
 target-ppc/mmu-hash64.c | 80 +
 target-ppc/mmu-hash64.h | 21 ++---
 target-ppc/mmu_helper.c | 13 
 7 files changed, 136 insertions(+), 109 deletions(-)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 093d426..a53bd2f 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -161,7 +161,7 @@ static target_ulong h_enter(PowerPCCPU *cpu, 
sPAPRMachineState *spapr,
 pte_index &= ~7ULL;
 token = ppc_hash64_start_access(cpu, pte_index);
 for (; index < 8; index++) {
-if ((ppc_hash64_load_hpte0(env, token, index) & HPTE64_V_VALID) == 
0) {
+if (!(ppc_hash64_load_hpte0(cpu, token, index) & HPTE64_V_VALID)) {
 break;
 }
 }
@@ -171,14 +171,14 @@ static target_ulong h_enter(PowerPCCPU *cpu, 
sPAPRMachineState *spapr,
 }
 } else {
 token = ppc_hash64_start_access(cpu, pte_index);
-if (ppc_hash64_load_hpte0(env, token, 0) & HPTE64_V_VALID) {
+if (ppc_hash64_load_hpte0(cpu, token, 0) & HPTE64_V_VALID) {
 ppc_hash64_stop_access(token);
 return H_PTEG_FULL;
 }
 ppc_hash64_stop_access(token);
 }
 
-ppc_hash64_store_hpte(env, pte_index + index,
+ppc_hash64_store_hpte(cpu, pte_index + index,
   pteh | HPTE64_V_HPTE_DIRTY, ptel);
 
 args[0] = pte_index + index;
@@ -192,11 +192,12 @@ typedef enum {
 REMOVE_HW = 3,
 } RemoveResult;
 
-static RemoveResult remove_hpte(CPUPPCState *env, target_ulong ptex,
+static RemoveResult remove_hpte(PowerPCCPU *cpu, target_ulong ptex,
 target_ulong avpn,
 target_ulong flags,
 target_ulong *vp, target_ulong *rp)
 {
+CPUPPCState *env = &cpu->env;
 uint64_t token;
 target_ulong v, r, rb;
 
@@ -204,9 +205,9 @@ static RemoveResult remove_hpte(CPUPPCState *env, 
target_ulong ptex,
 return REMOVE_PARM;
 }
 
-token = ppc_hash64_start_access(ppc_env_get_cpu(env), ptex);
-v = ppc_hash64_load_hpte0(env, token, 0);
-r = ppc_hash64_load_hpte1(env, token, 0);
+token = ppc_hash64_start_access(cpu, ptex);
+v = ppc_hash64_load_hpte0(cpu, token, 0);
+r = ppc_hash64_load_hpte1(cpu, token, 0);
 ppc_hash64_stop_access(token);
 
 if ((v & HPTE64_V_VALID) == 0 ||
@@ -216,7 +217,7 @@ static RemoveResult remove_hpte(CPUPPCState *env, 
target_ulong ptex,
 }
 *vp = v;
 *rp = r;
-ppc_hash64_store_hpte(env, ptex, HPTE64_V_HPTE_DIRTY, 0);
+ppc_hash64_store_hpte(cpu, ptex, HPTE64_V_HPTE_DIRTY, 0);
 rb = compute_tlbie_rb(v, r, ptex);
 ppc_tlb_invalidate_one(env, rb);
 return REMOVE_SUCCESS;
@@ -225,13 +226,12 @@ static RemoveResult remove_hpte(CPUPPCState *env, 
target_ulong ptex,
 static target_ulong h_remove(PowerPCCPU *cpu, sPAPRMachineState *spapr,
  target_ulong opcode, target_ulong *args)
 {
-CPUPPCState *env = &cpu->env;
 target_ulong flags = args[0];
 target_ulong pte_index = args[1];
 target_ulong avpn = args[2];
 RemoveResult ret;
 
-ret = remove_hpte(env, pte_index, avpn, flags,
+ret = remove_hpte(cpu, pte_index, avpn, flags,
   &args[0], &args[1]);
 
 switch (ret) {
@@ -272,7 +272,6 @@ static target_ulong h_remove(PowerPCCPU *cpu, 
sPAPRMachineState *spapr,
 static target_ulong h_bulk_remove(PowerPCCPU *cpu, sPAPRMachineState *spapr,
   target_ulong opcode, target_ulong *args)
 {
-CPUPPCState *env = &cpu->env;
 int i;
 
 for (i = 0; i < H_BULK_REMOVE_MAX_BATCH; i++) {
@@ -294,7 +293,7 @@ static target_ulong h_bulk_remove(PowerPCCPU *cpu, 
sPAPRMachineState *spapr,
 return H_PARAMETER;
 }
 
-ret = remove_hpte(env, *tsh & H_BULK_REMOVE_PTEX, tsl,
+ret = remove_hpte(cpu, *tsh & H_BULK_REMOVE_PTEX, tsl,
   (*tsh & H_BULK_REMOVE_FLAGS) >> 26,
   &v, &r);
 
@@ -331,8 +330,8 @@ static target_ulong h_protect(PowerPCCPU *cpu, 
sPAPRMachineState *spapr,
 }
 
 token = ppc_hash64_start_access(cpu, pte_index);
-v = ppc_hash64_load_hpte0(env, token, 0);
-

[Qemu-devel] [PULL 39/40] target-ppc: Make every FPSCR_ macro have a corresponding FP_ macro

2016-01-31 Thread David Gibson

From: James Clarke 

Signed-off-by: James Clarke 
Signed-off-by: David Gibson 
---
 target-ppc/cpu.h | 31 ++-
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 0820390..f300c86 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -687,24 +687,37 @@ enum {
 
 #define FP_FX  (1ull << FPSCR_FX)
 #define FP_FEX (1ull << FPSCR_FEX)
+#define FP_VX  (1ull << FPSCR_VX)
 #define FP_OX  (1ull << FPSCR_OX)
-#define FP_OE  (1ull << FPSCR_OE)
 #define FP_UX  (1ull << FPSCR_UX)
-#define FP_UE  (1ull << FPSCR_UE)
-#define FP_XX  (1ull << FPSCR_XX)
-#define FP_XE  (1ull << FPSCR_XE)
 #define FP_ZX  (1ull << FPSCR_ZX)
-#define FP_ZE  (1ull << FPSCR_ZE)
-#define FP_VX  (1ull << FPSCR_VX)
+#define FP_XX  (1ull << FPSCR_XX)
 #define FP_VXSNAN  (1ull << FPSCR_VXSNAN)
 #define FP_VXISI   (1ull << FPSCR_VXISI)
-#define FP_VXIMZ   (1ull << FPSCR_VXIMZ)
-#define FP_VXZDZ   (1ull << FPSCR_VXZDZ)
 #define FP_VXIDI   (1ull << FPSCR_VXIDI)
+#define FP_VXZDZ   (1ull << FPSCR_VXZDZ)
+#define FP_VXIMZ   (1ull << FPSCR_VXIMZ)
 #define FP_VXVC(1ull << FPSCR_VXVC)
+#define FP_FR  (1ull << FPSCR_FR)
+#define FP_FI  (1ull << FPSCR_FI)
+#define FP_C   (1ull << FPSCR_C)
+#define FP_FL  (1ull << FPSCR_FL)
+#define FP_FG  (1ull << FPSCR_FG)
+#define FP_FE  (1ull << FPSCR_FE)
+#define FP_FU  (1ull << FPSCR_FU)
+#define FP_FPCC(FP_FL | FP_FG | FP_FE | FP_FU)
+#define FP_FPRF(FP_C  | FP_FL | FP_FG | FP_FE | FP_FU)
+#define FP_VXSOFT  (1ull << FPSCR_VXSOFT)
+#define FP_VXSQRT  (1ull << FPSCR_VXSQRT)
 #define FP_VXCVI   (1ull << FPSCR_VXCVI)
 #define FP_VE  (1ull << FPSCR_VE)
-#define FP_FI  (1ull << FPSCR_FI)
+#define FP_OE  (1ull << FPSCR_OE)
+#define FP_UE  (1ull << FPSCR_UE)
+#define FP_ZE  (1ull << FPSCR_ZE)
+#define FP_XE  (1ull << FPSCR_XE)
+#define FP_NI  (1ull << FPSCR_NI)
+#define FP_RN1 (1ull << FPSCR_RN1)
+#define FP_RN  (1ull << FPSCR_RN)
 
 /*/
 /* Vector status and control register */
-- 
2.5.0
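
As a rough illustration of how the paired macros in the patch above are meant to be used, the sketch below builds masks from bit numbers and combines them into a multi-bit field mask. The SK_ names, the sk_set_fprf() helper and the bit numbers (12 to 16, counted from the least significant bit) are assumptions made for this example; check target-ppc/cpu.h for the real values.

```c
#include <stdint.h>

/* Assumed bit numbers, for illustration only. */
enum {
    SK_FPSCR_FU = 12,
    SK_FPSCR_FE = 13,
    SK_FPSCR_FG = 14,
    SK_FPSCR_FL = 15,
    SK_FPSCR_C  = 16,
};

/* One mask per bit number, mirroring the FPSCR_x / FP_x pairing. */
#define SK_FP_FU   (1ull << SK_FPSCR_FU)
#define SK_FP_FE   (1ull << SK_FPSCR_FE)
#define SK_FP_FG   (1ull << SK_FPSCR_FG)
#define SK_FP_FL   (1ull << SK_FPSCR_FL)
#define SK_FP_C    (1ull << SK_FPSCR_C)

/* Multi-bit fields are the OR of their member masks. */
#define SK_FP_FPCC (SK_FP_FL | SK_FP_FG | SK_FP_FE | SK_FP_FU)
#define SK_FP_FPRF (SK_FP_C | SK_FP_FPCC)

/* Replace the FPRF field of an FPSCR value, leaving other bits alone. */
uint64_t sk_set_fprf(uint64_t fpscr, uint64_t fprf_bits)
{
    return (fpscr & ~SK_FP_FPRF) | (fprf_bits & SK_FP_FPRF);
}
```

Having a mask for every bit number lets code clear or test a whole field in one expression instead of shifting by hand at each call site.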

[Qemu-devel] [PULL 22/40] target-ppc: gdbstub: introduce avr_need_swap()

2016-01-31 Thread David Gibson

From: Greg Kurz 

This helper will be used to support Altivec registers in little-endian guests.
This patch does not change functionality.

Note: I had to put the helper some lines away from the gdb_*_avr_reg()
routines to get a more readable patch.

Signed-off-by: Greg Kurz 
Signed-off-by: David Gibson 
---
 target-ppc/translate_init.c | 37 +++--
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 031c71e..0d6d115 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -8750,6 +8750,15 @@ static void dump_ppc_insns (CPUPPCState *env)
 }
 #endif
 
+static bool avr_need_swap(CPUPPCState *env)
+{
+#ifdef HOST_WORDS_BIGENDIAN
+return false;
+#else
+return true;
+#endif
+}
+
 static int gdb_get_float_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 {
 if (n < 32) {
@@ -8783,13 +8792,13 @@ static int gdb_set_float_reg(CPUPPCState *env, uint8_t 
*mem_buf, int n)
 static int gdb_get_avr_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 {
 if (n < 32) {
-#ifdef HOST_WORDS_BIGENDIAN
-stq_p(mem_buf, env->avr[n].u64[0]);
-stq_p(mem_buf+8, env->avr[n].u64[1]);
-#else
-stq_p(mem_buf, env->avr[n].u64[1]);
-stq_p(mem_buf+8, env->avr[n].u64[0]);
-#endif
+if (!avr_need_swap(env)) {
+stq_p(mem_buf, env->avr[n].u64[0]);
+stq_p(mem_buf+8, env->avr[n].u64[1]);
+} else {
+stq_p(mem_buf, env->avr[n].u64[1]);
+stq_p(mem_buf+8, env->avr[n].u64[0]);
+}
 return 16;
 }
 if (n == 32) {
@@ -8806,13 +8815,13 @@ static int gdb_get_avr_reg(CPUPPCState *env, uint8_t 
*mem_buf, int n)
 static int gdb_set_avr_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 {
 if (n < 32) {
-#ifdef HOST_WORDS_BIGENDIAN
-env->avr[n].u64[0] = ldq_p(mem_buf);
-env->avr[n].u64[1] = ldq_p(mem_buf+8);
-#else
-env->avr[n].u64[1] = ldq_p(mem_buf);
-env->avr[n].u64[0] = ldq_p(mem_buf+8);
-#endif
+if (!avr_need_swap(env)) {
+env->avr[n].u64[0] = ldq_p(mem_buf);
+env->avr[n].u64[1] = ldq_p(mem_buf+8);
+} else {
+env->avr[n].u64[1] = ldq_p(mem_buf);
+env->avr[n].u64[0] = ldq_p(mem_buf+8);
+}
 return 16;
 }
 if (n == 32) {
-- 
2.5.0

[Qemu-devel] [PULL 29/40] target-ppc: Remove unused kvmppc_read_segment_page_sizes() stub

2016-01-31 Thread David Gibson

This stub function is in the !KVM ifdef in target-ppc/kvm_ppc.h.  However
no such function exists on the KVM side, or is ever used.

I think this originally referenced a function which read host page size
information from /proc, for which we now use the KVM GET_SMMU_INFO extension
instead.

In any case, it has no function now, so remove it.

Signed-off-by: David Gibson 
Reviewed-by: Thomas Huth 
Reviewed-by: Laurent Vivier 
Reviewed-by: Alexander Graf 
---
 target-ppc/kvm_ppc.h | 5 -
 1 file changed, 5 deletions(-)

diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index 5e1333d..62406ce 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -98,11 +98,6 @@ static inline int kvmppc_get_hypercall(CPUPPCState *env, 
uint8_t *buf, int buf_l
 return -1;
 }
 
-static inline int kvmppc_read_segment_page_sizes(uint32_t *prop, int maxcells)
-{
-return -1;
-}
-
 static inline int kvmppc_set_interrupt(PowerPCCPU *cpu, int irq, int level)
 {
 return -1;
-- 
2.5.0

[Qemu-devel] [PULL 37/40] target-ppc: Helper to determine page size information from hpte alone

2016-01-31 Thread David Gibson

h_enter() in the spapr code needs to know the page size of the HPTE it's
about to insert.  Unlike other paths that do this, it doesn't have access
to the SLB, so at the moment it determines this with some open-coded
tests which assume POWER7 or POWER8 page size encodings.

To make this more flexible add ppc_hash64_hpte_page_shift_noslb() to
determine both the "base" page size per segment, and the individual
effective page size from an HPTE alone.

This means that the spapr code should now be able to handle any page size
listed in the env->sps table.

Signed-off-by: David Gibson 
Acked-by: Benjamin Herrenschmidt 
Reviewed-by: Alexander Graf 
---
 hw/ppc/spapr_hcall.c| 25 ++---
 target-ppc/mmu-hash64.c | 35 +++
 target-ppc/mmu-hash64.h |  3 +++
 3 files changed, 44 insertions(+), 19 deletions(-)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 0a8378c..12f8c33 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -73,31 +73,18 @@ static target_ulong h_enter(PowerPCCPU *cpu, 
sPAPRMachineState *spapr,
 target_ulong pte_index = args[1];
 target_ulong pteh = args[2];
 target_ulong ptel = args[3];
-target_ulong page_shift = 12;
+unsigned apshift, spshift;
 target_ulong raddr;
 target_ulong index;
 uint64_t token;
 
-/* only handle 4k and 16M pages for now */
-if (pteh & HPTE64_V_LARGE) {
-#if 0 /* We don't support 64k pages yet */
-if ((ptel & 0xf000) == 0x1000) {
-/* 64k page */
-} else
-#endif
-if ((ptel & 0xff000) == 0) {
-/* 16M page */
-page_shift = 24;
-/* lowest AVA bit must be 0 for 16M pages */
-if (pteh & 0x80) {
-return H_PARAMETER;
-}
-} else {
-return H_PARAMETER;
-}
+apshift = ppc_hash64_hpte_page_shift_noslb(cpu, pteh, ptel, &spshift);
+if (!apshift) {
+/* Bad page size encoding */
+return H_PARAMETER;
 }
 
-raddr = (ptel & HPTE64_R_RPN) & ~((1ULL << page_shift) - 1);
+raddr = (ptel & HPTE64_R_RPN) & ~((1ULL << apshift) - 1);
 
 if (is_ram_address(spapr, raddr)) {
 /* Regular RAM - should have WIMG=0010 */
diff --git a/target-ppc/mmu-hash64.c b/target-ppc/mmu-hash64.c
index 565a0f4..6d110ee 100644
--- a/target-ppc/mmu-hash64.c
+++ b/target-ppc/mmu-hash64.c
@@ -513,6 +513,41 @@ static unsigned hpte_page_shift(const struct 
ppc_one_seg_page_size *sps,
 return 0; /* Bad page size encoding */
 }
 
+unsigned ppc_hash64_hpte_page_shift_noslb(PowerPCCPU *cpu,
+  uint64_t pte0, uint64_t pte1,
+  unsigned *seg_page_shift)
+{
+CPUPPCState *env = &cpu->env;
+int i;
+
+if (!(pte0 & HPTE64_V_LARGE)) {
+*seg_page_shift = 12;
+return 12;
+}
+
+/*
+ * The encodings in env->sps need to be carefully chosen so that
+ * this gives an unambiguous result.
+ */
+for (i = 0; i < PPC_PAGE_SIZES_MAX_SZ; i++) {
+const struct ppc_one_seg_page_size *sps = &env->sps.sps[i];
+unsigned shift;
+
+if (!sps->page_shift) {
+break;
+}
+
+shift = hpte_page_shift(sps, pte0, pte1);
+if (shift) {
+*seg_page_shift = sps->page_shift;
+return shift;
+}
+}
+
+*seg_page_shift = 0;
+return 0;
+}
+
 int ppc_hash64_handle_mmu_fault(PowerPCCPU *cpu, target_ulong eaddr,
 int rwx, int mmu_idx)
 {
diff --git a/target-ppc/mmu-hash64.h b/target-ppc/mmu-hash64.h
index 293a951..34cf975 100644
--- a/target-ppc/mmu-hash64.h
+++ b/target-ppc/mmu-hash64.h
@@ -16,6 +16,9 @@ void ppc_hash64_store_hpte(PowerPCCPU *cpu, target_ulong 
index,
 void ppc_hash64_tlb_flush_hpte(PowerPCCPU *cpu,
target_ulong pte_index,
target_ulong pte0, target_ulong pte1);
+unsigned ppc_hash64_hpte_page_shift_noslb(PowerPCCPU *cpu,
+  uint64_t pte0, uint64_t pte1,
+  unsigned *seg_page_shift);
 #endif
 
 /*
-- 
2.5.0

[Qemu-devel] [PULL 23/40] target-ppc: gdbstub: fix altivec registers for little-endian guests

2016-01-31 Thread David Gibson

From: Greg Kurz 

Altivec registers are 128-bit wide. They are stored in memory as two
64-bit values that must be byteswapped when the guest is little-endian.
Let's reuse the ppc_maybe_bswap_register() helper for this.

We also need to fix the ordering of the 64-bit elements according to
the target endianness, for both system and user mode.

Signed-off-by: Greg Kurz 
Signed-off-by: David Gibson 
---
 target-ppc/translate_init.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 0d6d115..1174141 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -8753,9 +8753,9 @@ static void dump_ppc_insns (CPUPPCState *env)
 static bool avr_need_swap(CPUPPCState *env)
 {
 #ifdef HOST_WORDS_BIGENDIAN
-return false;
+return msr_le;
 #else
-return true;
+return !msr_le;
 #endif
 }
 
@@ -8799,14 +8799,18 @@ static int gdb_get_avr_reg(CPUPPCState *env, uint8_t 
*mem_buf, int n)
 stq_p(mem_buf, env->avr[n].u64[1]);
 stq_p(mem_buf+8, env->avr[n].u64[0]);
 }
+ppc_maybe_bswap_register(env, mem_buf, 8);
+ppc_maybe_bswap_register(env, mem_buf + 8, 8);
 return 16;
 }
 if (n == 32) {
 stl_p(mem_buf, env->vscr);
+ppc_maybe_bswap_register(env, mem_buf, 4);
 return 4;
 }
 if (n == 33) {
 stl_p(mem_buf, (uint32_t)env->spr[SPR_VRSAVE]);
+ppc_maybe_bswap_register(env, mem_buf, 4);
 return 4;
 }
 return 0;
@@ -8815,6 +8819,8 @@ static int gdb_get_avr_reg(CPUPPCState *env, uint8_t 
*mem_buf, int n)
 static int gdb_set_avr_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 {
 if (n < 32) {
+ppc_maybe_bswap_register(env, mem_buf, 8);
+ppc_maybe_bswap_register(env, mem_buf + 8, 8);
 if (!avr_need_swap(env)) {
 env->avr[n].u64[0] = ldq_p(mem_buf);
 env->avr[n].u64[1] = ldq_p(mem_buf+8);
@@ -8825,10 +8831,12 @@ static int gdb_set_avr_reg(CPUPPCState *env, uint8_t 
*mem_buf, int n)
 return 16;
 }
 if (n == 32) {
+ppc_maybe_bswap_register(env, mem_buf, 4);
 env->vscr = ldl_p(mem_buf);
 return 4;
 }
 if (n == 33) {
+ppc_maybe_bswap_register(env, mem_buf, 4);
 env->spr[SPR_VRSAVE] = (target_ulong)ldl_p(mem_buf);
 return 4;
 }
-- 
2.5.0
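
The two steps described in the patch above (choosing the order of the 64-bit elements, then byteswapping each element for a little-endian guest) can be sketched in portable C. This is a sketch, not QEMU code: the sk_ names are hypothetical, the register is passed as explicit high and low halves so the example does not depend on the host layout of env->avr[], and the big-endian store stands in for stq_p() on a PowerPC target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the decision in avr_need_swap(): the halves are exchanged
 * when host and guest endianness agree (per the patch above). */
bool sk_need_swap(bool host_big_endian, bool guest_le)
{
    return host_big_endian ? guest_le : !guest_le;
}

/* Store a 64-bit value most significant byte first. */
static void sk_store_be64(uint8_t *p, uint64_t v)
{
    int i;

    for (i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/* Reverse 8 bytes in place. */
static void sk_bswap8(uint8_t *p)
{
    int i;

    for (i = 0; i < 4; i++) {
        uint8_t t = p[i];
        p[i] = p[7 - i];
        p[7 - i] = t;
    }
}

/* Serialize a 128-bit register (hi:lo) in guest byte order. */
void sk_store_vec128(uint8_t buf[16], uint64_t hi, uint64_t lo,
                     bool guest_le)
{
    /* Step 1: element order follows the guest endianness. */
    sk_store_be64(buf, guest_le ? lo : hi);
    sk_store_be64(buf + 8, guest_le ? hi : lo);

    /* Step 2: each 64-bit element is byteswapped for an LE guest. */
    if (guest_le) {
        sk_bswap8(buf);
        sk_bswap8(buf + 8);
    }
}
```

For a little-endian guest the combined effect is that the 16 bytes come out least significant byte first, which is what a debugger attached to such a guest would expect.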

[Qemu-devel] [PULL 28/40] uninorth.c: add support for UniNorth kMacRISCPCIAddressSelect (0x48) register

2016-01-31 Thread David Gibson

From: Programmingkid 

Darwin/OS X use the undocumented kMacRISCPCIAddressSelect (0x48) to
configure PCI memory space size for mac99 machines. Without this
register, warnings similar to below are emitted to the console during boot:

AppleMacRiscPCI: bad range 2(8000:0100)
AppleMacRiscPCI: bad range 2(8100:1000)
AppleMacRiscPCI: bad range 2(8108:0008)

Based upon the algorithm in Darwin's AppleMacRiscPCI.cpp driver, set the
kMacRISCPCIAddressSelect register so that Darwin considers the PCI
memory space to be at 0x8000 (size 0x1000) which matches that
currently used by QEMU and OpenBIOS.

Signed-off-by: John Arbuckle 
Tested-by: Mark Cave-Ayland 
[commit message and comment revised as suggested by Mark Cave-Ayland]
Signed-off-by: David Gibson 
---
 hw/pci-host/uninorth.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/hw/pci-host/uninorth.c b/hw/pci-host/uninorth.c
index 778f8e6..40a2e3e 100644
--- a/hw/pci-host/uninorth.c
+++ b/hw/pci-host/uninorth.c
@@ -331,6 +331,15 @@ static void unin_agp_pci_host_realize(PCIDevice *d, Error 
**errp)
 d->config[0x0C] = 0x08; // cache_line_size
 d->config[0x0D] = 0x10; // latency_timer
 //d->config[0x34] = 0x80; // capabilities_pointer
+/*
+ * Set kMacRISCPCIAddressSelect (0x48) register to indicate PCI
+ * memory space with base 0x8000, size 0x1000 for Apple's
+ * AppleMacRiscPCI driver
+ */
+d->config[0x48] = 0x0;
+d->config[0x49] = 0x0;
+d->config[0x4a] = 0x0;
+d->config[0x4b] = 0x1;
 }
 
 static void u3_agp_pci_host_realize(PCIDevice *d, Error **errp)
-- 
2.5.0

Re: [Qemu-devel] [PATCH v3] blockjob: Fix hang in block_job_finish_sync

2016-01-31 Thread Fam Zheng

On Fri, 01/29 11:31, Stefan Hajnoczi wrote:
> On Fri, Jan 29, 2016 at 10:19:49AM +0800, Fam Zheng wrote:
> > @@ -402,6 +407,10 @@ typedef void BlockJobDeferToMainLoopFn(BlockJob *job, 
> > void *opaque);
> >   * AioContext acquired.  Block jobs must call bdrv_unref(), bdrv_close(), 
> > and
> >   * anything that uses bdrv_drain_all() in the main loop.
> >   *
> > + * The job->deferred_to_main_loop flag will be set. Caller must clear it 
> > once
> > + * the deferred work is done and the block job coroutine continues, unless 
> > it's
> > + * completing immediately.
> > + *
> 
> It's not necessary to expose job->deferred_to_main_loop to the user.
> Just clear it:
> 
> static void block_job_defer_to_main_loop_bh(void *opaque)
> {
> BlockJobDeferToMainLoopData *data = opaque;
> AioContext *aio_context;
> 
> qemu_bh_delete(data->bh);
> 
> /* Prevent race with block_job_defer_to_main_loop() */
> aio_context_acquire(data->aio_context);
> 
> /* Fetch BDS AioContext again, in case it has changed */
> aio_context = bdrv_get_aio_context(data->job->bs);
> aio_context_acquire(aio_context);
> 
> data->fn(data->job, data->opaque);
> job->deferred_to_main_loop = false;  /* <- HERE */

Maybe move one line above in case data->fn() does another
block_job_defer_to_main_loop()?

Fam

> 
> aio_context_release(aio_context);
> 
> aio_context_release(data->aio_context);
> 
> g_free(data);
> }

[Qemu-devel] [PULL 18/40] pseries: Clean up error reporting in htab migration functions

2016-01-31 Thread David Gibson

The functions for migrating the hash page table on pseries machine type
(htab_save_setup() and htab_load()) can report some errors with an
explicit fprintf() before returning an appropriate error code.  Change some
of these to use error_report() instead. htab_save_setup() is omitted for
now to avoid conflicts with some other in-progress work.

Signed-off-by: David Gibson 
Reviewed-by: Thomas Huth 
Reviewed-by: Alexey Kardashevskiy 
Reviewed-by: Markus Armbruster 
---
 hw/ppc/spapr.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index c05ddfb..5bd8fd3 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1534,7 +1534,7 @@ static int htab_load(QEMUFile *f, void *opaque, int 
version_id)
 int fd = -1;
 
 if (version_id < 1 || version_id > 1) {
-fprintf(stderr, "htab_load() bad version\n");
+error_report("htab_load() bad version");
 return -EINVAL;
 }
 
@@ -1555,8 +1555,8 @@ static int htab_load(QEMUFile *f, void *opaque, int 
version_id)
 
 fd = kvmppc_get_htab_fd(true);
 if (fd < 0) {
-fprintf(stderr, "Unable to open fd to restore KVM hash table: 
%s\n",
-strerror(errno));
+error_report("Unable to open fd to restore KVM hash table: %s",
+ strerror(errno));
 }
 }
 
@@ -1576,9 +1576,9 @@ static int htab_load(QEMUFile *f, void *opaque, int 
version_id)
 if ((index + n_valid + n_invalid) >
 (HTAB_SIZE(spapr) / HASH_PTE_SIZE_64)) {
 /* Bad index in stream */
-fprintf(stderr, "htab_load() bad index %d (%hd+%hd entries) "
-"in htab stream (htab_shift=%d)\n", index, n_valid, 
n_invalid,
-spapr->htab_shift);
+error_report(
+"htab_load() bad index %d (%hd+%hd entries) in htab stream 
(htab_shift=%d)",
+index, n_valid, n_invalid, spapr->htab_shift);
 return -EINVAL;
 }
 
-- 
2.5.0

[Qemu-devel] [PULL 33/40] target-ppc: Use actual page size encodings from HPTE

2016-01-31 Thread David Gibson

At present the 64-bit hash MMU code uses information from the SLB to
determine the page size of a translation.  We do need that information to
correctly look up the hash table.  However the MMU also allows a
possibly larger page size to be encoded into the HPTE itself, which is used
to populate the TLB.  At present qemu doesn't check that, and so doesn't
support the MPSS "Multiple Page Size per Segment" feature.

This makes a start on allowing this, by adding an hpte_page_shift()
function which looks up the page size of an HPTE.  We use this to validate
page sizes encodings on faults, and populate the qemu TLB with larger
page sizes when appropriate.

Signed-off-by: David Gibson 
Acked-by: Benjamin Herrenschmidt 
Reviewed-by: Alexander Graf 
---
 target-ppc/mmu-hash64.c | 63 ++---
 1 file changed, 60 insertions(+), 3 deletions(-)

diff --git a/target-ppc/mmu-hash64.c b/target-ppc/mmu-hash64.c
index 9ad02f3..f4c25b7 100644
--- a/target-ppc/mmu-hash64.c
+++ b/target-ppc/mmu-hash64.c
@@ -22,6 +22,7 @@
 #include "exec/helper-proto.h"
 #include "qemu/error-report.h"
 #include "sysemu/kvm.h"
+#include "qemu/error-report.h"
 #include "kvm_ppc.h"
 #include "mmu-hash64.h"
 
@@ -475,12 +476,50 @@ static hwaddr ppc_hash64_htab_lookup(PowerPCCPU *cpu,
 return pte_offset;
 }
 
+static unsigned hpte_page_shift(const struct ppc_one_seg_page_size *sps,
+uint64_t pte0, uint64_t pte1)
+{
+int i;
+
+if (!(pte0 & HPTE64_V_LARGE)) {
+if (sps->page_shift != 12) {
+/* 4kiB page in a non 4kiB segment */
+return 0;
+}
+/* Normal 4kiB page */
+return 12;
+}
+
+for (i = 0; i < PPC_PAGE_SIZES_MAX_SZ; i++) {
+const struct ppc_one_page_size *ps = &sps->enc[i];
+uint64_t mask;
+
+if (!ps->page_shift) {
+break;
+}
+
+if (ps->page_shift == 12) {
+/* L bit is set so this can't be a 4kiB page */
+continue;
+}
+
+mask = ((1ULL << ps->page_shift) - 1) & HPTE64_R_RPN;
+
+if ((pte1 & mask) == (ps->pte_enc << HPTE64_R_RPN_SHIFT)) {
+return ps->page_shift;
+}
+}
+
+return 0; /* Bad page size encoding */
+}
+
 int ppc_hash64_handle_mmu_fault(PowerPCCPU *cpu, target_ulong eaddr,
 int rwx, int mmu_idx)
 {
 CPUState *cs = CPU(cpu);
 CPUPPCState *env = &cpu->env;
 ppc_slb_t *slb;
+unsigned apshift;
 hwaddr pte_offset;
 ppc_hash_pte64_t pte;
 int pp_prot, amr_prot, prot;
@@ -544,6 +583,18 @@ int ppc_hash64_handle_mmu_fault(PowerPCCPU *cpu, 
target_ulong eaddr,
 qemu_log_mask(CPU_LOG_MMU,
 "found PTE at offset %08" HWADDR_PRIx "\n", pte_offset);
 
+/* Validate page size encoding */
+apshift = hpte_page_shift(slb->sps, pte.pte0, pte.pte1);
+if (!apshift) {
+error_report("Bad page size encoding in HPTE 0x%"PRIx64" - 0x%"PRIx64
+ " @ 0x%"HWADDR_PRIx, pte.pte0, pte.pte1, pte_offset);
+/* Not entirely sure what the right action is here, but machine
+ * check seems reasonable */
+cs->exception_index = POWERPC_EXCP_MCHECK;
+env->error_code = 0;
+return 1;
+}
+
 /* 5. Check access permissions */
 
 pp_prot = ppc_hash64_pte_prot(cpu, slb, pte);
@@ -596,10 +647,10 @@ int ppc_hash64_handle_mmu_fault(PowerPCCPU *cpu, 
target_ulong eaddr,
 
 /* 7. Determine the real address from the PTE */
 
-raddr = deposit64(pte.pte1 & HPTE64_R_RPN, 0, slb->sps->page_shift, eaddr);
+raddr = deposit64(pte.pte1 & HPTE64_R_RPN, 0, apshift, eaddr);
 
 tlb_set_page(cs, eaddr & TARGET_PAGE_MASK, raddr & TARGET_PAGE_MASK,
- prot, mmu_idx, TARGET_PAGE_SIZE);
+ prot, mmu_idx, 1ULL << apshift);
 
 return 0;
 }
@@ -610,6 +661,7 @@ hwaddr ppc_hash64_get_phys_page_debug(PowerPCCPU *cpu, 
target_ulong addr)
 ppc_slb_t *slb;
 hwaddr pte_offset;
 ppc_hash_pte64_t pte;
+unsigned apshift;
 
 if (msr_dr == 0) {
 /* In real mode the top 4 effective address bits are ignored */
@@ -626,7 +678,12 @@ hwaddr ppc_hash64_get_phys_page_debug(PowerPCCPU *cpu, 
target_ulong addr)
 return -1;
 }
 
-return deposit64(pte.pte1 & HPTE64_R_RPN, 0, slb->sps->page_shift, addr)
+apshift = hpte_page_shift(slb->sps, pte.pte0, pte.pte1);
+if (!apshift) {
+return -1;
+}
+
+return deposit64(pte.pte1 & HPTE64_R_RPN, 0, apshift, addr)
 & TARGET_PAGE_MASK;
 }
 
-- 
2.5.0
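
To make the encoding match in hpte_page_shift() easier to follow, here is a standalone sketch of the same lookup plus the real-address computation. Everything prefixed sk_ or SK_ is hypothetical; the constants (RPN mask, RPN shift of 12, L bit 0x4) are assumed values chosen for the example, and the encodings in the usage note are illustrative rather than copied from a real CPU table.

```c
#include <stdint.h>

/* Assumed constants, for illustration only. */
#define SK_R_RPN_SHIFT 12
#define SK_R_RPN       0x0ffffffffffff000ULL
#define SK_V_LARGE     0x0000000000000004ULL
#define SK_MAX_ENC     8

struct sk_page_size {
    unsigned page_shift;    /* 0 terminates the list */
    uint32_t pte_enc;       /* encoding stored in the low RPN bits */
};

struct sk_seg_page_size {
    unsigned page_shift;    /* base page size of the segment */
    struct sk_page_size enc[SK_MAX_ENC];
};

/* Returns the actual page shift encoded in the HPTE, or 0 if the
 * encoding is not valid for this segment. */
unsigned sk_hpte_page_shift(const struct sk_seg_page_size *sps,
                            uint64_t pte0, uint64_t pte1)
{
    int i;

    if (!(pte0 & SK_V_LARGE)) {
        /* Without the L bit only a 4 KiB page in a 4 KiB segment is valid. */
        return sps->page_shift == 12 ? 12 : 0;
    }

    for (i = 0; i < SK_MAX_ENC; i++) {
        const struct sk_page_size *ps = &sps->enc[i];
        uint64_t mask;

        if (!ps->page_shift) {
            break;
        }
        if (ps->page_shift == 12) {
            continue;   /* L bit set, so not a 4 KiB page */
        }

        /* RPN bits below the page size carry the size encoding. */
        mask = ((1ULL << ps->page_shift) - 1) & SK_R_RPN;
        if ((pte1 & mask) == ((uint64_t)ps->pte_enc << SK_R_RPN_SHIFT)) {
            return ps->page_shift;
        }
    }
    return 0;
}

/* Real address: RPN above the page boundary, offset bits from eaddr. */
uint64_t sk_real_addr(uint64_t pte1, uint64_t eaddr, unsigned apshift)
{
    uint64_t offset_mask = (1ULL << apshift) - 1;

    return ((pte1 & SK_R_RPN) & ~offset_mask) | (eaddr & offset_mask);
}
```

With a segment described as { 16, { {16, 0x1}, {24, 0x8} } }, a large HPTE whose low RPN bits are 0x1000 resolves to shift 16 and one whose low RPN bits are 0x8000 resolves to shift 24; any other pattern yields 0, which the patch treats as a bad encoding.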
