date:20140901

Re: [Qemu-devel] [PATCH 2/2] i386: Add a Virtual Machine Generation ID device.

2014-09-01 Thread Gal Hammer


On 17/08/2014 12:49, Paolo Bonzini wrote:

Il 12/08/2014 10:02, Gal Hammer ha scritto:

Hi,

On 10/08/2014 20:22, Paolo Bonzini wrote:


Il 10/08/2014 13:32, Gal Hammer ha scritto:

Based on Microsoft's specification (the paper can be downloaded from
http://go.microsoft.com/fwlink/?LinkId=260709), add a device
description to the SSDT ACPI table.

The GUID is set using a new "-vmgenid" command line parameter.

Signed-off-by: Gal Hammer 
---
   hw/i386/acpi-build.c  | 23 +++
   hw/i386/ssdt-misc.dsl | 33 +
   qemu-options.hx   |  9 +
   vl.c  | 11 +++
   4 files changed, 76 insertions(+)


Please make this a new device (like pvpanic), instead of adding a new
command-line option.


There is a problem with this request. I don't want to use ISA because it
is obsolete, PCI is overkill for such a device, and a SYSBUS device (like
HPET) doesn't affect the command line options.

Did I miss something in SYSBUS, and is that the reason it didn't
appear in the "-device ?" list?


For a sysbus device, you can override the
cannot_instantiate_with_device_add_yet field of DeviceClass in your
class_init function.
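
A minimal sketch of that suggestion, using stand-in types rather than
QEMU's real headers. Only the field name
cannot_instantiate_with_device_add_yet comes from the reply above; the
struct layout, the function names and the assumption that the sysbus
parent class sets the flag to true by default are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for QEMU's DeviceClass; only the field named
 * in the reply above is modelled. */
typedef struct DeviceClass {
    const char *desc;
    bool cannot_instantiate_with_device_add_yet;
} DeviceClass;

/* Assumed default applied by a sysbus-like parent class: the device is
 * hidden from -device / device_add. */
static void sysbus_like_class_init(DeviceClass *dc)
{
    assert(dc != NULL);
    dc->cannot_instantiate_with_device_add_yet = true;
}

/* Hypothetical class_init for the VM Generation ID device: it runs the
 * parent initialisation and then overrides the flag. */
static void vmgenid_class_init(DeviceClass *dc)
{
    sysbus_like_class_init(dc);
    dc->desc = "VM Generation ID (illustrative)";
    dc->cannot_instantiate_with_device_add_yet = false;
}

/* Illustrative helper: would the device be offered to the user? */
static bool device_listed_for_device_add(const DeviceClass *dc)
{
    return !dc->cannot_instantiate_with_device_add_yet;
}
```

With the override in place, device_listed_for_device_add() returns true
for the sketched class, which is the behaviour asked for here.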



+Scope(\_SB) {
+
+Device(VMGI) {
+Name(_HID, "QEMU0002")
+Name(_CID, "VM_Gen_Counter")
+Name(_DDN, "VM_Gen_Counter")
+
+ACPI_EXTRACT_NAME_DWORD_CONST ssdt_acpi_vm_gid_addr
+Name(VGIA, 0x12345678)
+
+ACPI_EXTRACT_NAME_BUFFER16 ssdt_acpi_vm_gid
+Name(VGID, Buffer(16) {
+0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 })
+
+Method(_STA, 0, NotSerialized) {
+Store(VGIA, Local0)
+If (LEqual(Local0, Zero)) {
+Return (0x00)
+} Else {
+Return (0x0F)
+}
+}
+
+Method(ADDR, 0, Serialized) {
+Store(Package(2) { }, Local0)
+Store(VGIA, Index(Local0, 0))
+Store(0x, Index(Local0, 1))
+return (Local0)
+}
+}
+}
+


Please either put this in the DSDT, or omit the Device altogether if you
put it in the SSDT and there is no VMGID device.


I'm not sure I understand this comment. I've put the new device in the
SSDT table (like pvpanic) and added a _STA method which disables the device
if no GUID address is set (VGIA). The device doesn't show up in the
Device Manager if it was not added using the command line.


We are still in the process of defining which devices/methods go in the
DSDT and which go in the SSDT.  We had bad experiences with ACPI table
migration in 2.1, and one plan to fix them is the following:

* the DSDT should always be the same size no matter what command line
options are there

* the SSDT should have the exact same content (byte-for-byte) for
different versions of QEMU, with the same command line options
(including the machine type).

Right now your code obeys the first rule, not the second rule, so it
should add the device to the DSDT.


Are you sure about selecting the DSDT? I don't see anyone else using
the ACPI_EXTRACT_NAME_* macros in the DSDT table (and I keep crashing my
guest, but ignore that for now ;-)).



BTW, which events would cause the ID to change?  How should live cloning
(or revert to a disk+RAM snapshot) be implemented by layers above QEMU
for the VM gen ID to be patched?  Can you add something to docs/ about it?


The VGID is expected to change when executing a VM with the -snapshot
option, when a VM is restored from a backup, or when it is imported,
copied or cloned. So I would say it is the management layer's call.


I think that Microsoft's document describes the requirements better
than I can :-).



Also, how does this ID compare to the UUID in the DMI info (-uuid)?


The -uuid is not expected to change after the VM has been created, unlike
-vmgenid, which is designed to notify the guest OS that a change has
occurred. Microsoft, as an example, writes that it can be used to make
cryptographic software safer.
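
To illustrate the difference, here is a minimal guest-side sketch. It
assumes the guest can read the current 16-byte generation ID into a
buffer (how it does so is outside this sketch), and all names below are
hypothetical. The guest remembers the last ID it saw and treats any
change as a signal to, for example, reseed a random number generator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define VM_GEN_ID_LEN 16   /* the ID is a 16-byte buffer, as in the SSDT */

/* Hypothetical guest-side state: last ID seen and a reseed counter. */
typedef struct GenIdWatcher {
    uint8_t last_seen[VM_GEN_ID_LEN];
    unsigned reseed_count;
} GenIdWatcher;

static void genid_watcher_init(GenIdWatcher *w, const uint8_t *current)
{
    assert(w != NULL && current != NULL);
    memcpy(w->last_seen, current, VM_GEN_ID_LEN);
    w->reseed_count = 0;
}

/* Returns true if the ID changed since the last poll. A real guest
 * would reseed its entropy pool here; the sketch only counts. */
static bool genid_watcher_poll(GenIdWatcher *w, const uint8_t *current)
{
    assert(w != NULL && current != NULL);
    if (memcmp(w->last_seen, current, VM_GEN_ID_LEN) == 0) {
        return false;
    }
    memcpy(w->last_seen, current, VM_GEN_ID_LEN);
    w->reseed_count++;
    return true;
}
```

A -uuid style identifier would never make genid_watcher_poll() return
true during the lifetime of the VM, while a restore or clone is expected
to.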



Paolo



Gal.

Re: [Qemu-devel] [PATCH v8 0/4] s390: Support for Hotplug of Standby Memory

2014-09-01 Thread Christian Borntraeger

On 28/08/14 17:25, Matthew Rosato wrote:
> This patchset adds support in s390 for a pool of standby memory,
> which can be set online/offline by the guest (ie, via chmem).
> The standby pool of memory is allocated as the difference between 
> the initial memory setting and the maxmem setting.
> As part of this work, additional results are provided for the 
> Read SCP Information SCLP, and a new implementation is added for the
> Read Storage Element Information, Attach Storage Element, 
> Assign Storage and Unassign Storage SCLPs, which enables the s390 
> guest to manipulate the standby memory pool.
> 
> This patchset is based on work originally done by Jeng-Fang (Nick)
> Wang.
> 
> Sample qemu command snippet:
> 
> qemu -machine s390-ccw-virtio  -m 1024M,maxmem=2048M,slots=32 -enable-kvm
> 
> This will allocate 1024M of active memory, and another 1024M 
> of standby memory.  Example output from s390-tools lsmem:
> =
> 0x-0x0fff       256  online   no   0-127
> 0x1000-0x1fff   256  online   yes  128-255
> 0x2000-0x3fff   512  online   no   256-511
> 0x4000-0x7fff  1024  offline  -    512-1023
> 
> Memory device size  : 2 MB
> Memory block size   : 256 MB
> Total online memory : 1024 MB
> Total offline memory: 1024 MB
> 
> 
> The guest can dynamically enable part or all of the standby pool 
> via the s390-tools chmem, for example:
> 
> chmem -e 512M
> 
> And can attempt to dynamically disable:
> 
> chmem -d 512M
> 
> Changes for v8:
>  * In unassign_storage, replace memory_region_destroy with a call to 
>object_unparent. We need a call to object_unparent here, as the region 
>may be re-used later.
> 
> Changes for v7:
>  * Added patch to enforce the same memory alignments in s390-virtio.c,
>so that shared code (like sclp) doesn't need to be dual paths.  
> 
> Changes for v6:
>  * Fix in sclp.h - DeviceState parent --> SysBusDevice parent 
>in struct sclpMemoryHotplugDev.
>  * Fix in assign_storage - int this_subregion_size, should 
>be uint64_t.
>  * Added information on how to test in the cover letter.  
> 
> Changes for v5:
>  * Since ACPI memory hotplug is now in, removed Igor's patches 
>from this set.
>  * Updated sclp.c to use object_resolve_path() instead of 
>object_property_find().
> 
> Matthew Rosato (4):
>   sclp-s390: Add device to manage s390 memory hotplug
>   virtio-ccw: Include standby memory when calculating storage increment
>   s390-virtio: Apply same memory boundaries as virtio-ccw
>   sclp-s390: Add memory hotplug SCLPs
> 
>  hw/s390x/s390-virtio-ccw.c |   46 +--
>  hw/s390x/s390-virtio.c |   15 ++-
>  hw/s390x/sclp.c|  289 
> +++-
>  include/hw/s390x/sclp.h|   20 +++
>  qemu-options.hx|3 +-
>  target-s390x/cpu.h |   18 +++
>  target-s390x/kvm.c |5 +
>  7 files changed, 375 insertions(+), 21 deletions(-)
> 
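
The sizing rule from the cover letter (standby pool = maxmem minus the
initial memory) can be sketched as plain arithmetic. The helper names
are hypothetical, and the 256 MB block size is taken from the lsmem
output quoted above.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* Standby pool = maxmem - initial memory; 0 if maxmem is not larger. */
static uint64_t standby_mem_bytes(uint64_t initial, uint64_t maxmem)
{
    return maxmem > initial ? maxmem - initial : 0;
}

/* Number of whole memory blocks the standby pool spans. */
static uint64_t standby_blocks(uint64_t standby, uint64_t block_size)
{
    assert(block_size != 0);
    return standby / block_size;
}
```

For -m 1024M,maxmem=2048M this gives 1024 MB of standby memory, i.e. four
256 MB blocks, which matches the "Total offline memory" line.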


Applied.
There was a small mismatch due to the latest nmi qomification. Can you double 
check my tree
git://github.com/borntraeger/qemu.git s390-next
that your patches made it properly into this tree?

Christian

Re: [Qemu-devel] [PATCH] dump: let dump_error printf the error reason

2014-09-01 Thread zhanghailiang


On 2014/8/29 20:55, Luiz Capitulino wrote:

On Fri, 29 Aug 2014 16:06:18 +0800
zhanghailiang  wrote:


On 2014/8/27 21:18, Luiz Capitulino wrote:

On Wed, 27 Aug 2014 19:18:53 +0800
zhanghailiang   wrote:


The second parameter of dump_error is unused, but one purpose of
using this function is to report the error info.

Signed-off-by: zhanghailiang
---
   dump.c | 3 +++
   1 file changed, 3 insertions(+)

diff --git a/dump.c b/dump.c
index 71d3e94..0f44e9d 100644
--- a/dump.c
+++ b/dump.c
@@ -83,6 +83,9 @@ static int dump_cleanup(DumpState *s)

   static void dump_error(DumpState *s, const char *reason)
   {
+if (reason) {
+error_report("%s", reason);
+}
   dump_cleanup(s);
   }



Good catch, but error_report() will report the error only to the user. This
is QMP code, so we have to use the Error API.

I think that the best way to solve this is to make dump_error() fill an
Error object (e.g. by calling error_setg()) and then return it after
the call to dump_cleanup(). Of course, you will have to change _all_
code paths calling dump_error() to propagate the error up.

For more information on this, please read docs/writing-qmp-commands.txt.
You can also take a look at simple commands doing error propagation, like
qmp_cont() or qmp_block_passwd().
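
A rough, self-contained sketch of that shape. It does not use QEMU's
real Error API: the Error struct and sketch_error_setg() below are
simplified stand-ins, and the extra errp parameter on dump_error() as
well as write_header_sketch() are assumptions made for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-in for an error object (not QEMU's definition). */
typedef struct Error {
    char msg[128];
} Error;

/* Stand-in for an error_setg()-style helper: allocate and fill *errp. */
static void sketch_error_setg(Error **errp, const char *fmt, ...)
{
    Error *err;
    va_list ap;

    if (errp == NULL) {
        return;                 /* caller is not interested in details */
    }
    assert(*errp == NULL);      /* never overwrite an earlier error */
    err = malloc(sizeof(*err));
    if (err == NULL) {
        abort();
    }
    va_start(ap, fmt);
    vsnprintf(err->msg, sizeof(err->msg), fmt, ap);
    va_end(ap);
    *errp = err;
}

typedef struct DumpState {
    int cleaned_up;
} DumpState;

static void dump_cleanup(DumpState *s)
{
    s->cleaned_up = 1;
}

/* dump_error() records the reason in errp, then cleans up. */
static void dump_error(DumpState *s, const char *reason, Error **errp)
{
    sketch_error_setg(errp, "%s", reason);
    dump_cleanup(s);
}

/* Hypothetical caller: every path that can fail passes errp down. */
static int write_header_sketch(DumpState *s, int simulate_failure,
                               Error **errp)
{
    if (simulate_failure) {
        dump_error(s, "dump: failed to write header", errp);
        return -1;
    }
    return 0;
}
```

The top-level command would then hand the same errp to its caller, so
the detailed reason reaches the client instead of a generic I/O error.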



Hi,

Thanks for your review.

Actually, all paths that call dump_error eventually reach a common path
which calls error_setg().

The call process is as below:
qmp_dump_guest_memory
(1) -->create_kdump_vmcore
        -->write_start_flat_header
            -->dump_error
        -->write_end_flat_header
            -->dump_error
        ...
    -->error_set(errp, QERR_IO_ERROR) (if create_kdump_vmcore failed)
(2) -->create_vmcore
        -->dump_begin
            -->write_elf64_header
                -->dump_error
            -->write_elf64_note
                -->dump_error
            ...
        -->dump_iterate
            -->write_data
                -->dump_error
            ...
    -->error_set(errp, QERR_IO_ERROR) (if create_vmcore failed)

And a short *IO ERROR* message will be returned to the caller of
qmp_dump_guest_memory.

So, is it OK for dump_error to just report the detailed error info to users
(actually, it will be stored in the qemu log)? Or should this error info
also be returned to the caller?


The errors should be propagated up to the caller. This way they can be
consumed by QMP and HMP.



OK, I will fix this and submit the second version. Thanks:)



What's your suggestion? Thanks.:)

Best Regards,
zhanghailiang








[Qemu-devel] [PATCH 1/8] block/quorum: initialize qcrs.aiocb for read

2014-09-01 Thread Liu Yuan

This is required by quorum_aio_cancel()

Cc: Eric Blake 
Cc: Benoit Canet 
Cc: Kevin Wolf 
Cc: Stefan Hajnoczi 
Signed-off-by: Liu Yuan 
---
 block/quorum.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index af48e8c..5866bca 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -653,8 +653,10 @@ static BlockDriverAIOCB *read_quorum_children(QuorumAIOCB 
*acb)
 }
 
 for (i = 0; i < s->num_children; i++) {
-bdrv_aio_readv(s->bs[i], acb->sector_num, &acb->qcrs[i].qiov,
-   acb->nb_sectors, quorum_aio_cb, &acb->qcrs[i]);
+acb->qcrs[i].aiocb = bdrv_aio_readv(s->bs[i], acb->sector_num,
+&acb->qcrs[i].qiov,
+acb->nb_sectors, quorum_aio_cb,
+&acb->qcrs[i]);
 }
 
 return &acb->common;
@@ -663,15 +665,14 @@ static BlockDriverAIOCB *read_quorum_children(QuorumAIOCB 
*acb)
 static BlockDriverAIOCB *read_fifo_child(QuorumAIOCB *acb)
 {
 BDRVQuorumState *s = acb->common.bs->opaque;
-
-acb->qcrs[acb->child_iter].buf = qemu_blockalign(s->bs[acb->child_iter],
- acb->qiov->size);
-qemu_iovec_init(&acb->qcrs[acb->child_iter].qiov, acb->qiov->niov);
-qemu_iovec_clone(&acb->qcrs[acb->child_iter].qiov, acb->qiov,
- acb->qcrs[acb->child_iter].buf);
-bdrv_aio_readv(s->bs[acb->child_iter], acb->sector_num,
-   &acb->qcrs[acb->child_iter].qiov, acb->nb_sectors,
-   quorum_aio_cb, &acb->qcrs[acb->child_iter]);
+int i = acb->child_iter;
+
+acb->qcrs[i].buf = qemu_blockalign(s->bs[i], acb->qiov->size);
+qemu_iovec_init(&acb->qcrs[i].qiov, acb->qiov->niov);
+qemu_iovec_clone(&acb->qcrs[i].qiov, acb->qiov, acb->qcrs[i].buf);
+acb->qcrs[i].aiocb = bdrv_aio_readv(s->bs[i], acb->sector_num,
+&acb->qcrs[i].qiov, acb->nb_sectors,
+quorum_aio_cb, &acb->qcrs[i]);
 
 return &acb->common;
 }
-- 
1.9.1

[Qemu-devel] [PATCH 4/8] block/quorum: add quorum_aio_release() helper

2014-09-01 Thread Liu Yuan

Cc: Eric Blake 
Cc: Benoit Canet 
Cc: Kevin Wolf 
Cc: Stefan Hajnoczi 
Signed-off-by: Liu Yuan 
---
 block/quorum.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index 5866bca..9e056d6 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -130,6 +130,12 @@ struct QuorumAIOCB {
 
 static bool quorum_vote(QuorumAIOCB *acb);
 
+static void quorum_aio_release(QuorumAIOCB *acb)
+{
+g_free(acb->qcrs);
+qemu_aio_release(acb);
+}
+
 static void quorum_aio_cancel(BlockDriverAIOCB *blockacb)
 {
 QuorumAIOCB *acb = container_of(blockacb, QuorumAIOCB, common);
@@ -141,8 +147,7 @@ static void quorum_aio_cancel(BlockDriverAIOCB *blockacb)
 bdrv_aio_cancel(acb->qcrs[i].aiocb);
 }
 
-g_free(acb->qcrs);
-qemu_aio_release(acb);
+quorum_aio_release(acb);
 }
 
 static AIOCBInfo quorum_aiocb_info = {
@@ -168,8 +173,7 @@ static void quorum_aio_finalize(QuorumAIOCB *acb)
 }
 }
 
-g_free(acb->qcrs);
-qemu_aio_release(acb);
+quorum_aio_release(acb);
 }
 
 static bool quorum_sha256_compare(QuorumVoteValue *a, QuorumVoteValue *b)
-- 
1.9.1

[Qemu-devel] [PATCH 0/8] add basic recovery logic to quorum driver

2014-09-01 Thread Liu Yuan

This patch set mainly adds two pieces of logic to implement device recovery:
- notify the quorum driver of the broken states from the child driver(s)
- track dirty bits and sync the device after it is repaired

Thus quorum allows VMs to continue while some child devices are broken, and
when the child devices are repaired and come back, we sync the bits dirtied
during the downtime to keep the data consistent.

The recovery logic is based on the driver state bitmap and will sync the dirty
bits with a timeslice window in a coroutine in this primitive implementation.

Simple graph about 2 children with threshold=1 and read-pattern=fifo:
(similar to DRBD)

+ denotes a device sync iteration
- IO on a single device
= IO on two devices

  sync complete, release dirty bitmap
 ^
 |
  -++==
 | |
 | v
 |   device repaired and begin to sync
 v
   device broken, create a dirty bitmap

  This sync logic can take care of the nested broken problem, where devices
  break again while syncing. We just start a sync process after the devices are
  repaired again, and switch the devices from broken to sound only when the sync
  completes.

The recovery logic also works for read-pattern=quorum mode without any problem.
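
The dirty tracking and sync described above can be sketched with a toy
two-device mirror. This is not the series' implementation: the Mirror
type, the per-sector bool array standing in for the dirty bitmap and the
max_sectors budget standing in for the timeslice window are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NB_SECTORS 8

/* Toy model: one good device, one device that broke and was repaired. */
typedef struct Mirror {
    char good[NB_SECTORS];
    char repaired[NB_SECTORS];
    bool dirty[NB_SECTORS];   /* stands in for the dirty bitmap */
    bool broken;              /* true until a sync fully completes */
} Mirror;

/* While the second device is broken (or still syncing), writes go to
 * the good device only and the sector is marked dirty. */
static void mirror_write(Mirror *m, int sector, char data)
{
    assert(sector >= 0 && sector < NB_SECTORS);
    m->good[sector] = data;
    if (m->broken) {
        m->dirty[sector] = true;
    } else {
        m->repaired[sector] = data;
    }
}

/* One sync iteration: copy at most max_sectors dirty sectors, then
 * yield. Returns the number of sectors still dirty. The device is
 * switched back to sound only when nothing is left to sync. */
static int mirror_sync_iteration(Mirror *m, int max_sectors)
{
    int i, copied = 0, remaining = 0;

    for (i = 0; i < NB_SECTORS; i++) {
        if (!m->dirty[i]) {
            continue;
        }
        if (copied < max_sectors) {
            m->repaired[i] = m->good[i];
            m->dirty[i] = false;
            copied++;
        } else {
            remaining++;
        }
    }
    if (remaining == 0) {
        m->broken = false;
    }
    return remaining;
}
```

Sectors dirtied while a sync is in progress are simply picked up by a
later iteration, which is the "wrap around" case mentioned in the
series.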

Todo:
- use aio interface to sync data (multiple transfer in one go)
- dynamic slice window to control sync bandwidth more smoothly
- add auto-reconnection mechanism to other protocols (if not supported yet)
- add tests

Cc: Eric Blake 
Cc: Benoit Canet 
Cc: Kevin Wolf 
Cc: Stefan Hajnoczi 

Liu Yuan (8):
  block/quorum: initialize qcrs.aiocb for read
  block: add driver operation callbacks
  block/sheepdog: propagate disconnect/reconnect events to upper driver
  block/quorum: add quorum_aio_release() helper
  quorum: fix quorum_aio_cancel()
  block/quorum: add broken state to BlockDriverState
  block: add two helpers
  quorum: add basic device recovery logic

 block.c   |  17 +++
 block/quorum.c| 324 +-
 block/sheepdog.c  |   9 ++
 include/block/block.h |   9 ++
 include/block/block_int.h |   6 +
 trace-events  |   5 +
 6 files changed, 336 insertions(+), 34 deletions(-)

-- 
1.9.1

[Qemu-devel] [PATCH 5/8] quorum: fix quorum_aio_cancel()

2014-09-01 Thread Liu Yuan

For a fifo read pattern, we only have one running aio (there may be other cases
in the future with fewer running aios than num_children), so we need to check
whether .aiocb is NULL before calling bdrv_aio_cancel() to avoid a segfault.

Cc: Eric Blake 
Cc: Benoit Canet 
Cc: Kevin Wolf 
Cc: Stefan Hajnoczi 
Signed-off-by: Liu Yuan 
---
 block/quorum.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/quorum.c b/block/quorum.c
index 9e056d6..b9eeda3 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -144,7 +144,9 @@ static void quorum_aio_cancel(BlockDriverAIOCB *blockacb)
 
 /* cancel all callbacks */
 for (i = 0; i < s->num_children; i++) {
-bdrv_aio_cancel(acb->qcrs[i].aiocb);
+if (acb->qcrs[i].aiocb) {
+bdrv_aio_cancel(acb->qcrs[i].aiocb);
+}
 }
 
 quorum_aio_release(acb);
-- 
1.9.1
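
The guard in the patch above can be tried out in isolation with stand-in
types. ChildAIOCB, ChildRequest and cancel_children() are hypothetical
names; only the "skip NULL aiocb" check mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the per-child request and its in-flight AIOCB. */
typedef struct ChildAIOCB {
    int cancelled;
} ChildAIOCB;

typedef struct ChildRequest {
    ChildAIOCB *aiocb;   /* NULL if no request was issued to this child */
} ChildRequest;

/* Cancel only the children that actually have a request in flight.
 * Returns how many requests were cancelled. */
static int cancel_children(ChildRequest *qcrs, int num_children)
{
    int i, n = 0;

    assert(qcrs != NULL);
    for (i = 0; i < num_children; i++) {
        if (qcrs[i].aiocb) {
            qcrs[i].aiocb->cancelled = 1;
            n++;
        }
    }
    return n;
}
```

With a fifo read pattern only one slot is populated, so the loop touches
one request and skips the NULL entries instead of dereferencing them.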

[Qemu-devel] [PATCH 2/8] block: add driver operation callbacks

2014-09-01 Thread Liu Yuan

Driver operations are defined as callbacks passed from upper block drivers to
lower drivers, and are supposed to be called by the lower drivers.

Request handling (queuing, submitting, etc.) is done in the protocol tier of
the block layer, and connection states are also maintained down there. Driver
operations are supposed to notify the upper tier (such as quorum) of state
changes.

For now only two operations are added:

driver_disconnect: called when the connection goes down
driver_reconnect: called when the connection comes back after a disconnection

They are used to notify the upper tier of the connection state.

Cc: Eric Blake 
Cc: Benoit Canet 
Cc: Kevin Wolf 
Cc: Stefan Hajnoczi 
Signed-off-by: Liu Yuan 
---
 block.c   | 7 +++
 include/block/block.h | 7 +++
 include/block/block_int.h | 3 +++
 3 files changed, 17 insertions(+)

diff --git a/block.c b/block.c
index c12b8de..22eb3e4 100644
--- a/block.c
+++ b/block.c
@@ -2152,6 +2152,13 @@ void bdrv_set_dev_ops(BlockDriverState *bs, const 
BlockDevOps *ops,
 bs->dev_opaque = opaque;
 }
 
+void bdrv_set_drv_ops(BlockDriverState *bs, const BlockDrvOps *ops,
+  void *opaque)
+{
+bs->drv_ops = ops;
+bs->drv_opaque = opaque;
+}
+
 static void bdrv_dev_change_media_cb(BlockDriverState *bs, bool load)
 {
 if (bs->dev_ops && bs->dev_ops->change_media_cb) {
diff --git a/include/block/block.h b/include/block/block.h
index 8f4ad16..a61eaf0 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -82,6 +82,11 @@ typedef struct BlockDevOps {
 void (*resize_cb)(void *opaque);
 } BlockDevOps;
 
+typedef struct BlockDrvOps {
+void (*driver_reconnect)(BlockDriverState *bs);
+void (*driver_disconnect)(BlockDriverState *bs);
+} BlockDrvOps;
+
 typedef enum {
 BDRV_REQ_COPY_ON_READ = 0x1,
 BDRV_REQ_ZERO_WRITE   = 0x2,
@@ -234,6 +239,8 @@ void bdrv_detach_dev(BlockDriverState *bs, void *dev);
 void *bdrv_get_attached_dev(BlockDriverState *bs);
 void bdrv_set_dev_ops(BlockDriverState *bs, const BlockDevOps *ops,
   void *opaque);
+void bdrv_set_drv_ops(BlockDriverState *bs, const BlockDrvOps *ops,
+  void *opaque);
 void bdrv_dev_eject_request(BlockDriverState *bs, bool force);
 bool bdrv_dev_has_removable_media(BlockDriverState *bs);
 bool bdrv_dev_is_tray_open(BlockDriverState *bs);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 2334895..9fdec7f 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -319,6 +319,9 @@ struct BlockDriverState {
 const BlockDevOps *dev_ops;
 void *dev_opaque;
 
+const BlockDrvOps *drv_ops;
+void *drv_opaque;
+
 AioContext *aio_context; /* event loop used for fd handlers, timers, etc */
 
 char filename[1024];
-- 
1.9.1
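
A self-contained sketch of how an upper driver might register these
callbacks and how a lower driver might fire them. The Sketch* types and
function names are stand-ins for the structures in the patch above, and
clearing the broken flag directly on reconnect is a simplification (the
series runs a sync before marking a device sound again).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct SketchBDS SketchBDS;

/* Mirrors the two callbacks in the patch's BlockDrvOps. */
typedef struct SketchDrvOps {
    void (*driver_reconnect)(SketchBDS *bs);
    void (*driver_disconnect)(SketchBDS *bs);
} SketchDrvOps;

struct SketchBDS {
    const SketchDrvOps *drv_ops;
    void *drv_opaque;
    bool broken;
};

/* Same shape as bdrv_set_drv_ops() in the patch. */
static void sketch_set_drv_ops(SketchBDS *bs, const SketchDrvOps *ops,
                               void *opaque)
{
    assert(bs != NULL);
    bs->drv_ops = ops;
    bs->drv_opaque = opaque;
}

/* Upper tier (e.g. quorum): react to connection state changes. */
static void upper_on_disconnect(SketchBDS *bs)
{
    bs->broken = true;
}

static void upper_on_reconnect(SketchBDS *bs)
{
    bs->broken = false;   /* simplification: no sync step here */
}

static const SketchDrvOps upper_ops = {
    .driver_reconnect = upper_on_reconnect,
    .driver_disconnect = upper_on_disconnect,
};

/* Lower tier (protocol driver): notify only if someone registered. */
static void lower_connection_lost(SketchBDS *bs)
{
    if (bs->drv_ops && bs->drv_ops->driver_disconnect) {
        bs->drv_ops->driver_disconnect(bs);
    }
}

static void lower_connection_restored(SketchBDS *bs)
{
    if (bs->drv_ops && bs->drv_ops->driver_reconnect) {
        bs->drv_ops->driver_reconnect(bs);
    }
}
```

The NULL checks keep the lower driver usable when no upper driver has
registered any operations.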

[Qemu-devel] [PATCH 8/8] quorum: add basic device recovery logic

2014-09-01 Thread Liu Yuan

For some configurations, quorum allows VMs to continue while some child devices
are broken, and when the child devices are repaired and come back, we need to
sync the bits dirtied during the downtime to keep the data consistent.

The recovery logic is based on the driver state bitmap and will sync the dirty
bits with a timeslice window in a coroutine in this primitive implementation.

Simple graph about 2 children with threshold=1 and read-pattern=fifo:

+ denotes a device sync iteration
- IO on a single device
= IO on two devices

  sync complete, release dirty bitmap
 ^
 |
  -++==
 | |
 | v
 |   device repaired and begin to sync
 v
   device broken, create a dirty bitmap

  This sync logic can take care of the nested broken problem, where devices
  break again while syncing. We just start a sync process after the devices are
  repaired again, and switch the devices from broken to sound only when the sync
  completes.

The recovery logic also works for read-pattern=quorum mode without any problem.

Cc: Eric Blake 
Cc: Benoit Canet 
Cc: Kevin Wolf 
Cc: Stefan Hajnoczi 
Signed-off-by: Liu Yuan 
---
 block/quorum.c | 189 -
 trace-events   |   5 ++
 2 files changed, 191 insertions(+), 3 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index 7b07e35..ffd7c2d 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -23,6 +23,7 @@
 #include "qapi/qmp/qlist.h"
 #include "qapi/qmp/qstring.h"
 #include "qapi-event.h"
+#include "trace.h"
 
 #define HASH_LENGTH 32
 
@@ -31,6 +32,10 @@
 #define QUORUM_OPT_REWRITE"rewrite-corrupted"
 #define QUORUM_OPT_READ_PATTERN   "read-pattern"
 
+#define SLICE_TIME  1ULL /* 100 ms */
+#define CHUNK_SIZE  (1 << 20) /* 1M */
+#define SECTORS_PER_CHUNK   (CHUNK_SIZE >> BDRV_SECTOR_BITS)
+
 /* This union holds a vote hash value */
 typedef union QuorumVoteValue {
 char h[HASH_LENGTH];   /* SHA-256 hash */
@@ -64,6 +69,7 @@ typedef struct QuorumVotes {
 
 /* the following structure holds the state of one quorum instance */
 typedef struct BDRVQuorumState {
+BlockDriverState *mybs;/* Quorum block driver base state */
 BlockDriverState **bs; /* children BlockDriverStates */
 int num_children;  /* children count */
 int threshold; /* if less than threshold children reads gave the
@@ -82,6 +88,10 @@ typedef struct BDRVQuorumState {
 */
 
 QuorumReadPattern read_pattern;
+BdrvDirtyBitmap *dirty_bitmap;
+uint8_t *sync_buf;
+HBitmapIter hbi;
+int64_t sector_num;
 } BDRVQuorumState;
 
 typedef struct QuorumAIOCB QuorumAIOCB;
@@ -290,12 +300,11 @@ static void quorum_copy_qiov(QEMUIOVector *dest, 
QEMUIOVector *source)
 }
 }
 
-static int next_fifo_child(QuorumAIOCB *acb)
+static int get_good_child(BDRVQuorumState *s, int iter)
 {
-BDRVQuorumState *s = acb->common.bs->opaque;
 int i;
 
-for (i = acb->child_iter; i < s->num_children; i++) {
+for (i = iter; i < s->num_children; i++) {
 if (!s->bs[i]->broken) {
 break;
 }
@@ -306,6 +315,13 @@ static int next_fifo_child(QuorumAIOCB *acb)
 return i;
 }
 
+static int next_fifo_child(QuorumAIOCB *acb)
+{
+BDRVQuorumState *s = acb->common.bs->opaque;
+
+return get_good_child(s, acb->child_iter);
+}
+
 static void quorum_aio_cb(void *opaque, int ret)
 {
 QuorumChildRequest *sacb = opaque;
@@ -951,6 +967,171 @@ static int parse_read_pattern(const char *opt)
 return -EINVAL;
 }
 
+static void sync_prepare(BDRVQuorumState *qs, int64_t *num)
+{
+int64_t nb, total = bdrv_nb_sectors(qs->mybs);
+
+qs->sector_num = hbitmap_iter_next(&qs->hbi);
+/* Wrap around if previous bits get dirty while syncing */
+if (qs->sector_num < 0) {
+bdrv_dirty_iter_init(qs->mybs, qs->dirty_bitmap, &qs->hbi);
+qs->sector_num = hbitmap_iter_next(&qs->hbi);
+assert(qs->sector_num >= 0);
+}
+
+for (nb = 1; nb < SECTORS_PER_CHUNK && qs->sector_num + nb < total;
+ nb++) {
+if (!bdrv_get_dirty(qs->mybs, qs->dirty_bitmap, qs->sector_num + nb)) {
+break;
+}
+}
+*num = nb;
+}
+
+static void sync_finish(BDRVQuorumState *qs, int64_t num)
+{
+int64_t i;
+
+for (i = 0; i < num; i++) {
+/* We need to advance the iterator manually */
+hbitmap_iter_next(&qs->hbi);
+}
+bdrv_reset_dirty(qs->mybs, qs->sector_num, num);
+}
+
+static int quorum_sync_iteration(BDRVQuorumState *qs, BlockDriverState *target)
+{
+BlockDriverState *source;
+QEMUIOVector qiov;
+int ret, good;
+int64_t nb_sectors;
+struct iovec iov;
+const char *sname, *tname = bdrv_get_filename(target);
+
+good = get_good_child(qs, 0);
+if (good < 0) {
+

[Qemu-devel] [PATCH 3/8] block/sheepdog: propagate disconnect/reconnect events to upper driver

2014-09-01 Thread Liu Yuan

This is a reference usage showing how we propagate connection state to the
upper tier.

Cc: Eric Blake 
Cc: Benoit Canet 
Cc: Kevin Wolf 
Cc: Stefan Hajnoczi 
Signed-off-by: Liu Yuan 
---
 block/sheepdog.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 53c24d6..9c0fc49 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -714,6 +714,11 @@ static coroutine_fn void reconnect_to_sdog(void *opaque)
 {
 BDRVSheepdogState *s = opaque;
 AIOReq *aio_req, *next;
+BlockDriverState *bs = s->bs;
+
+if (bs->drv_ops && bs->drv_ops->driver_disconnect) {
+bs->drv_ops->driver_disconnect(bs);
+}
 
 aio_set_fd_handler(s->aio_context, s->fd, NULL, NULL, NULL);
 close(s->fd);
@@ -756,6 +761,10 @@ static coroutine_fn void reconnect_to_sdog(void *opaque)
 QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings);
 resend_aioreq(s, aio_req);
 }
+
+if (bs->drv_ops && bs->drv_ops->driver_reconnect) {
+bs->drv_ops->driver_reconnect(bs);
+}
 }
 
 /*
-- 
1.9.1

Re: [Qemu-devel] [PATCH 1/5] target-ppc: Extend rtas-blob

2014-09-01 Thread Alexey Kardashevskiy

On 08/25/2014 11:45 PM, Aravinda Prasad wrote:
> Extend rtas-blob to accommodate error log. Error log
> structure is saved in rtas space upon a machine check
> exception.
> 
> Signed-off-by: Aravinda Prasad 
> ---
>  hw/ppc/spapr.c  |   13 ++---
>  hw/ppc/spapr_rtas.c |4 ++--
>  include/hw/ppc/spapr.h  |2 +-
>  pc-bios/spapr-rtas/spapr-rtas.S |   12 
>  4 files changed, 25 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index d01978f..1120988 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -85,6 +85,12 @@
>  
>  #define HTAB_SIZE(spapr)(1ULL << ((spapr)->htab_shift))
>  
> +/*
> + * The rtas-entry-offset should match the value specified in
> + * spapr-rtas.S
> + */
> +#define RTAS_ENTRY_OFFSET   0x1000
> +
>  typedef struct sPAPRMachineState sPAPRMachineState;
>  
>  #define TYPE_SPAPR_MACHINE  "spapr-machine"
> @@ -670,7 +676,8 @@ static int spapr_populate_memory(sPAPREnvironment *spapr, 
> void *fdt)
>  static void spapr_finalize_fdt(sPAPREnvironment *spapr,
> hwaddr fdt_addr,
> hwaddr rtas_addr,
> -   hwaddr rtas_size)
> +   hwaddr rtas_size,
> +   hwaddr rtas_entry)
>  {
>  int ret, i;
>  size_t cb = 0;
> @@ -705,7 +712,7 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr,
>  }
>  
>  /* RTAS */
> -ret = spapr_rtas_device_tree_setup(fdt, rtas_addr, rtas_size);
> +ret = spapr_rtas_device_tree_setup(fdt, rtas_addr, rtas_size, 
> rtas_entry);
>  if (ret < 0) {
>  fprintf(stderr, "Couldn't set up RTAS device tree properties\n");
>  }
> @@ -808,7 +815,7 @@ static void ppc_spapr_reset(void)
>  
>  /* Load the fdt */
>  spapr_finalize_fdt(spapr, spapr->fdt_addr, spapr->rtas_addr,
> -   spapr->rtas_size);
> +   spapr->rtas_size, spapr->rtas_addr + 
> RTAS_ENTRY_OFFSET);
>  
>  /* Set up the entry state */
>  first_ppc_cpu = POWERPC_CPU(first_cpu);
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 9ba1ba6..02ddbf9 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -328,7 +328,7 @@ void spapr_rtas_register(int token, const char *name, 
> spapr_rtas_fn fn)
>  }
>  
>  int spapr_rtas_device_tree_setup(void *fdt, hwaddr rtas_addr,
> - hwaddr rtas_size)
> + hwaddr rtas_size, hwaddr rtas_entry)
>  {
>  int ret;
>  int i;
> @@ -349,7 +349,7 @@ int spapr_rtas_device_tree_setup(void *fdt, hwaddr 
> rtas_addr,
>  }
>  
>  ret = qemu_fdt_setprop_cell(fdt, "/rtas", "linux,rtas-entry",
> -rtas_addr);
> +rtas_entry);
>  if (ret < 0) {
>  fprintf(stderr, "Couldn't add linux,rtas-entry property: %s\n",
>  fdt_strerror(ret));
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index bbba51a..dedfa67 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -436,7 +436,7 @@ target_ulong spapr_rtas_call(PowerPCCPU *cpu, 
> sPAPREnvironment *spapr,
>   uint32_t token, uint32_t nargs, target_ulong 
> args,
>   uint32_t nret, target_ulong rets);
>  int spapr_rtas_device_tree_setup(void *fdt, hwaddr rtas_addr,
> - hwaddr rtas_size);
> + hwaddr rtas_size, hwaddr rtas_entry);
>  
>  #define SPAPR_TCE_PAGE_SHIFT   12
>  #define SPAPR_TCE_PAGE_SIZE(1ULL << SPAPR_TCE_PAGE_SHIFT)
> diff --git a/pc-bios/spapr-rtas/spapr-rtas.S b/pc-bios/spapr-rtas/spapr-rtas.S
> index 903bec2..8c9b17e 100644
> --- a/pc-bios/spapr-rtas/spapr-rtas.S
> +++ b/pc-bios/spapr-rtas/spapr-rtas.S
> @@ -30,6 +30,18 @@
>  
>  .globl   _start
>  _start:
> + /*
> +  * Reserve space for error log in RTAS blob.
> +  *
> +  * Either we can reserve initial bytes for error log followed by
> +  * rtas-entry or space can be reserved after rtas-entry. I prefer
> +  * former, as we already have rtas-base and rtas-entry (currently
> +  * both pointing to rtas-base) defined in qemu and we can update
> +  * rtas-entry to point to an offset from rtas-base. This avoids
> +  * unnecessary definition of rtas-error-offset while keeping
> +  * rtas-entry redundant.
> +  */
> + . = 0x1000


Why not this (without changing spapr-rtas.S)?

--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -875,7 +875,8 @@ static void ppc_spapr_reset(void)
spapr->rtas_size);

 /* Copy RTAS over */
-cpu_physical_memory_write(spapr->rtas_addr, spapr->rtas_blob,
+cpu_physical_memory_write(spapr->rtas_addr + RTAS_ENTRY_OFFSET,
+  spapr->rtas_blob,
   spapr->rt

[Qemu-devel] [PATCH 6/8] block/quorum: add broken state to BlockDriverState

2014-09-01 Thread Liu Yuan

This allows the VM to continue running even if some devices are broken,
given a proper configuration.

We mark the device as broken when the protocol tier reports a broken
state of the device, such as a disconnection, via driver operations. We can
also reset the device to sound when the protocol tier is repaired.

Originally, .threshold controlled how we decide the success of a read/write:
a failure is returned only if the success count of the read/write is less than
the .threshold specified by the user. But it doesn't record the states of the
underlying devices, and this impacts performance a bit in some cases.

For example, say we have 3 children and .threshold is set to 2. If one of the
devices is broken, we should still return success and continue to run the VM.
But for every IO operation, we will blindly send the requests to the broken
device.

By storing the broken state in the driver state, we can skip sending requests
to broken devices, and resume sending requests to the repaired ones by setting
broken to false.

This is especially useful for network-based protocols such as sheepdog, which
has an auto-reconnection mechanism and never reports EIO if the connection
is broken, but just stores the requests in its local queue and waits to resend
them. Without the broken state, a quorum request will not come back until the
connection is re-established. So we have to skip the broken devices to allow
the VM to continue running with network-backed children (glusterfs, nfs,
sheepdog, etc).

With the combination of read-pattern and threshold, we can easily mimic the DRBD
behavior with the following configuration:

 read-pattern=fifo,threshold=1 with two children.
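
As an illustration of the idea only (this is not the code from the
patch, and all function names are hypothetical), the interaction between
the broken flags and the threshold can be sketched as follows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count the children that are not marked broken. */
static int count_good_children(const bool *broken, int num_children)
{
    int i, good = 0;

    assert(broken != NULL);
    for (i = 0; i < num_children; i++) {
        if (!broken[i]) {
            good++;
        }
    }
    return good;
}

/* I/O can still succeed if at least `threshold` children are sound. */
static bool can_serve_io(const bool *broken, int num_children,
                         int threshold)
{
    return count_good_children(broken, num_children) >= threshold;
}

/* First sound child at or after `start`, or -1 if there is none.
 * Broken children are skipped instead of being sent requests. */
static int first_good_child(const bool *broken, int num_children,
                            int start)
{
    int i;

    assert(broken != NULL);
    for (i = start; i < num_children; i++) {
        if (!broken[i]) {
            return i;
        }
    }
    return -1;
}
```

With 3 children and a threshold of 2, one broken child still leaves
enough sound children; with two children and a threshold of 1, a fifo
read skips a broken first child and goes to the second.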

Cc: Eric Blake 
Cc: Benoit Canet 
Cc: Kevin Wolf 
Cc: Stefan Hajnoczi 
Signed-off-by: Liu Yuan 
---
 block/quorum.c| 102 ++
 include/block/block_int.h |   3 ++
 2 files changed, 87 insertions(+), 18 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index b9eeda3..7b07e35 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -120,6 +120,7 @@ struct QuorumAIOCB {
 int rewrite_count;  /* number of replica to rewrite: count down to
  * zero once writes are fired
  */
+int issued_count;   /* actual read&write issued count */
 
 QuorumVotes votes;
 
@@ -170,8 +171,10 @@ static void quorum_aio_finalize(QuorumAIOCB *acb)
 if (acb->is_read) {
 /* on the quorum case acb->child_iter == s->num_children - 1 */
 for (i = 0; i <= acb->child_iter; i++) {
-qemu_vfree(acb->qcrs[i].buf);
-qemu_iovec_destroy(&acb->qcrs[i].qiov);
+if (acb->qcrs[i].buf) {
+qemu_vfree(acb->qcrs[i].buf);
+qemu_iovec_destroy(&acb->qcrs[i].qiov);
+}
 }
 }
 
@@ -207,6 +210,7 @@ static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
 acb->count = 0;
 acb->success_count = 0;
 acb->rewrite_count = 0;
+acb->issued_count = 0;
 acb->votes.compare = quorum_sha256_compare;
 QLIST_INIT(&acb->votes.vote_list);
 acb->is_read = false;
@@ -286,6 +290,22 @@ static void quorum_copy_qiov(QEMUIOVector *dest, 
QEMUIOVector *source)
 }
 }
 
+static int next_fifo_child(QuorumAIOCB *acb)
+{
+BDRVQuorumState *s = acb->common.bs->opaque;
+int i;
+
+for (i = acb->child_iter; i < s->num_children; i++) {
+if (!s->bs[i]->broken) {
+break;
+}
+}
+if (i == s->num_children) {
+return -1;
+}
+return i;
+}
+
 static void quorum_aio_cb(void *opaque, int ret)
 {
 QuorumChildRequest *sacb = opaque;
@@ -293,11 +313,18 @@ static void quorum_aio_cb(void *opaque, int ret)
 BDRVQuorumState *s = acb->common.bs->opaque;
 bool rewrite = false;
 
+if (ret < 0) {
+s->bs[acb->child_iter]->broken = true;
+}
+
 if (acb->is_read && s->read_pattern == QUORUM_READ_PATTERN_FIFO) {
 /* We try to read next child in FIFO order if we fail to read */
-if (ret < 0 && ++acb->child_iter < s->num_children) {
-read_fifo_child(acb);
-return;
+if (ret < 0) {
+acb->child_iter = next_fifo_child(acb);
+if (acb->child_iter > 0) {
+read_fifo_child(acb);
+return;
+}
 }
 
 if (ret == 0) {
@@ -315,9 +342,9 @@ static void quorum_aio_cb(void *opaque, int ret)
 } else {
 quorum_report_bad(acb, sacb->aiocb->bs->node_name, ret);
 }
-assert(acb->count <= s->num_children);
-assert(acb->success_count <= s->num_children);
-if (acb->count < s->num_children) {
+assert(acb->count <= acb->issued_count);
+assert(acb->success_count <= acb->issued_count);
+if (acb->count < acb->issued_count) {
 return;
 }
 
@@ -647,22 +674,46 @@ free_exit:
 return rewrite;
 }
 
+static bool has_enough_children(BDRVQuorumState *s)
+{
+int good = 0, i;
+
+for (i = 0; i < s->num

[Qemu-devel] [PATCH 7/8] block: add two helpers

2014-09-01 Thread Liu Yuan

These helpers are needed by later quorum sync device logic.

Cc: Eric Blake 
Cc: Benoit Canet 
Cc: Kevin Wolf 
Cc: Stefan Hajnoczi 
Signed-off-by: Liu Yuan 
---
 block.c   | 10 ++
 include/block/block.h |  2 ++
 2 files changed, 12 insertions(+)

diff --git a/block.c b/block.c
index 22eb3e4..2e2f1d9 100644
--- a/block.c
+++ b/block.c
@@ -2145,6 +2145,16 @@ void *bdrv_get_attached_dev(BlockDriverState *bs)
 return bs->dev;
 }
 
+BlockDriverState *bdrv_get_file(BlockDriverState *bs)
+{
+return bs->file;
+}
+
+const char *bdrv_get_filename(BlockDriverState *bs)
+{
+return bs->filename;
+}
+
 void bdrv_set_dev_ops(BlockDriverState *bs, const BlockDevOps *ops,
   void *opaque)
 {
diff --git a/include/block/block.h b/include/block/block.h
index a61eaf0..1e116cc 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -237,6 +237,8 @@ int bdrv_attach_dev(BlockDriverState *bs, void *dev);
 void bdrv_attach_dev_nofail(BlockDriverState *bs, void *dev);
 void bdrv_detach_dev(BlockDriverState *bs, void *dev);
 void *bdrv_get_attached_dev(BlockDriverState *bs);
+BlockDriverState *bdrv_get_file(BlockDriverState *bs);
+const char *bdrv_get_filename(BlockDriverState *bs);
 void bdrv_set_dev_ops(BlockDriverState *bs, const BlockDevOps *ops,
   void *opaque);
 void bdrv_set_drv_ops(BlockDriverState *bs, const BlockDrvOps *ops,
-- 
1.9.1

Re: [Qemu-devel] [Xen-devel] [PATCH 2/2] xen:i386:pc_piix: create isa bridge specific to IGD passthrough

2014-09-01 Thread Chen, Tiejun


On 2014/9/1 14:05, Michael S. Tsirkin wrote:

On Mon, Sep 01, 2014 at 10:50:37AM +0800, Chen, Tiejun wrote:

On 2014/8/31 16:58, Michael S. Tsirkin wrote:

On Fri, Aug 29, 2014 at 09:28:50AM +0800, Chen, Tiejun wrote:



On 2014/8/28 8:56, Chen, Tiejun wrote:

+ */
+dev = pci_create_simple(bus, PCI_DEVFN(0x1f, 0),
+"xen-igd-passthrough-isa-bridge");
+if (dev) {
+r = xen_host_pci_device_get(&hdev, 0, 0, PCI_DEVFN(0x1f, 0), 0);
+if (!r) {
+pci_config_set_vendor_id(dev->config, hdev.vendor_id);
+pci_config_set_device_id(dev->config, hdev.device_id);


Can you, instead, implement the reverse logic, probing
the card and supplying the correct device id for PCH?



Here what is your so-called reverse logic as I already asked you
previously? Do you mean I should list all PCHs with a combo illustrated
with the vendor/device id in advance? Then look up if we can find a


Michael,



Ping.

Thanks
Tiejun


Could you explain this exactly? Then I can try to follow up on your idea ASAP
if this is necessary and possible.


Michael,

Could you give us some explanation for your "reverse logic" when you're
free?

Thanks
Tiejun


So future drivers will look at device ID for the card
and figure out how things should work from there.
Old drivers still poke at device id of the chipset for this,
but maybe qemu can do what newer drivers do:
look at the card and figure out what guest should do,
then present the appropriate chipset id.

This is based on what Jesse said:
Practically speaking, we could probably assume specific GPU/PCH combos,
as I don't think they're generally mixed across generations, though SNB
and IVB did have compatible sockets, so there is the possibility of
mixing CPT and PPT PCHs, but those are register identical as far as the
graphics driver is concerned, so even that should be safe.



Michael,

Thanks for your explanation.


so the idea is to have a reverse table mapping GPU back to PCH.
Present to guest the ID that will let it assume the correct GPU.


I guess you mean we should create and maintain a table like this:

[GPU Card: device_id(s), PCH: device_id]

Then, instead of exposing the real PCH device id directly, qemu first reads
the real GPU device id and looks it up in this table to present an
appropriate PCH device id to the guest.

And it looks like that appropriate PCH device id is not always the real
PCH's device id. Right? If I'm wrong please correct me.
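
The table lookup discussed here could be sketched roughly as below. This is only an illustration of the idea, not code from any posted patch: the structure, the function name and every device id in it are hypothetical placeholders, and real entries would have to come from the hardware documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row: a GPU device id and the PCH device id to present to the guest.
 * All ids below are hypothetical placeholders, not real hardware ids. */
typedef struct {
    uint16_t gpu_device_id;
    uint16_t pch_device_id;
} SketchIgdPchMapping;

static const SketchIgdPchMapping sketch_igd_pch_table[] = {
    { 0x1001, 0xa001 },  /* hypothetical GPU A */
    { 0x1002, 0xa001 },  /* hypothetical GPU B, same generation as A */
    { 0x2001, 0xb001 },  /* hypothetical GPU C, next generation */
};

/* Return 0 and store the PCH id on a match; return -1 for an unknown GPU. */
static int sketch_lookup_pch_id(uint16_t gpu_id, uint16_t *pch_id)
{
    size_t i;
    size_t n = sizeof(sketch_igd_pch_table) / sizeof(sketch_igd_pch_table[0]);

    for (i = 0; i < n; i++) {
        if (sketch_igd_pch_table[i].gpu_device_id == gpu_id) {
            *pch_id = sketch_igd_pch_table[i].pch_device_id;
            return 0;
        }
    }
    return -1;
}
```

Several GPU ids can share one PCH id, which matches the [device_id(s), device_id] shape above. An unknown GPU returns -1 so the caller can decide whether to fall back to the real PCH id.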


Exactly: we don't really care what the PCH ID is - we only need it
for the guest driver to do the right thing.


Agreed.

I need to ask Allen to check whether a given GPU device id always
corresponds to one given PCH on both HSW and BDW currently. If yes, I
can do this quickly. Otherwise I need some time to establish that sort
of connection.


Thanks
Tiejun





the problem with these tables is they are hard to keep up to date


Yeah. But I think currently we can just start from some modern CPU such as
HSW and BDW, then things could be easy.

Allen,

Any idea to this suggestion?


as new hardware comes out, but as future hardware won't need
these hacks, we shall be fine.


Yeah.

Thanks
Tiejun






Thanks
Tiejun


matched PCH? If yes, what is that benefit you expect in passthrough
case? Shouldn't we pass these info to VM directly in passthrough case?

Thanks
Tiejun

[Qemu-devel] [PATCH 1/5] target-arm: add powered off cpu state

2014-09-01 Thread Ard Biesheuvel

From: Rob Herring 

Add tracking of cpu power state in order to support powering off of
cores in system emulation. The initial state is determined by the
start-powered-off QOM property.

Signed-off-by: Rob Herring 
Signed-off-by: Ard Biesheuvel 
---
 target-arm/cpu-qom.h | 1 +
 target-arm/cpu.c | 7 ++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/target-arm/cpu-qom.h b/target-arm/cpu-qom.h
index 07f3c9e86639..f71f6a4ccc6b 100644
--- a/target-arm/cpu-qom.h
+++ b/target-arm/cpu-qom.h
@@ -98,6 +98,7 @@ typedef struct ARMCPU {
 
 /* Should CPU start in PSCI powered-off state? */
 bool start_powered_off;
+bool powered_off;
 
 /* [QEMU_]KVM_ARM_TARGET_* constant for this CPU, or
  * QEMU_KVM_ARM_TARGET_NONE if the kernel doesn't support this CPU type.
diff --git a/target-arm/cpu.c b/target-arm/cpu.c
index 8199f32e3267..b4c06c17cf87 100644
--- a/target-arm/cpu.c
+++ b/target-arm/cpu.c
@@ -40,7 +40,9 @@ static void arm_cpu_set_pc(CPUState *cs, vaddr value)
 
 static bool arm_cpu_has_work(CPUState *cs)
 {
-return cs->interrupt_request &
+ARMCPU *cpu = ARM_CPU(cs);
+
+return !cpu->powered_off && cs->interrupt_request &
 (CPU_INTERRUPT_FIQ | CPU_INTERRUPT_HARD | CPU_INTERRUPT_EXITTB);
 }
 
@@ -91,6 +93,9 @@ static void arm_cpu_reset(CPUState *s)
 env->vfp.xregs[ARM_VFP_MVFR1] = cpu->mvfr1;
 env->vfp.xregs[ARM_VFP_MVFR2] = cpu->mvfr2;
 
+cpu->powered_off = cpu->start_powered_off;
+s->halted = cpu->start_powered_off;
+
 if (arm_feature(env, ARM_FEATURE_IWMMXT)) {
 env->iwmmxt.cregs[ARM_IWMMXT_wCID] = 0x69051000 | 'Q';
 }
-- 
1.8.3.2
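
For readers following along, the effect of the new flag can be modelled outside QEMU roughly as follows. This is a simplified sketch: the struct, the function names and the interrupt bit values are stand-ins chosen for illustration, not QEMU's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for QEMU's interrupt bits; values are illustrative. */
#define SKETCH_INT_HARD 0x02u
#define SKETCH_INT_FIQ  0x10u

typedef struct {
    bool start_powered_off;     /* property set by the board */
    bool powered_off;           /* runtime power state */
    bool halted;
    uint32_t interrupt_request;
} SketchCPU;

/* On reset the runtime state is derived from the start-powered-off property. */
static void sketch_cpu_reset(SketchCPU *cpu)
{
    cpu->powered_off = cpu->start_powered_off;
    cpu->halted = cpu->start_powered_off;
}

/* A powered-off core never reports work, even with an interrupt pending. */
static bool sketch_cpu_has_work(const SketchCPU *cpu)
{
    return !cpu->powered_off &&
           (cpu->interrupt_request & (SKETCH_INT_HARD | SKETCH_INT_FIQ)) != 0;
}
```

The point of the extra check is that a pending interrupt alone must not wake a core that is still powered off.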

[Qemu-devel] [PATCH 4/5] target-arm: add emulation of PSCI calls for system emulation

2014-09-01 Thread Ard Biesheuvel

From: Rob Herring 

Add support for handling PSCI calls in system emulation. Both version
0.1 and 0.2 of the PSCI spec are supported. Platforms can enable support
by setting "psci-method" QOM property on the cpus to SMC or HVC
emulation and having PSCI binding in their dtb.

Signed-off-by: Rob Herring 
Signed-off-by: Ard Biesheuvel 
---
 target-arm/Makefile.objs |   1 +
 target-arm/cpu-qom.h |   6 ++
 target-arm/cpu.c |  10 +++-
 target-arm/cpu.h |   6 ++
 target-arm/helper.c  |  12 
 target-arm/psci.c| 152 +++
 6 files changed, 184 insertions(+), 3 deletions(-)
 create mode 100644 target-arm/psci.c

diff --git a/target-arm/Makefile.objs b/target-arm/Makefile.objs
index dcd167e0d880..deda9f49fec3 100644
--- a/target-arm/Makefile.objs
+++ b/target-arm/Makefile.objs
@@ -7,5 +7,6 @@ obj-$(call lnot,$(CONFIG_KVM)) += kvm-stub.o
 obj-y += translate.o op_helper.o helper.o cpu.o
 obj-y += neon_helper.o iwmmxt_helper.o
 obj-y += gdbstub.o
+obj-y += psci.o
 obj-$(TARGET_AARCH64) += cpu64.o translate-a64.o helper-a64.o gdbstub64.o
 obj-y += crypto_helper.o
diff --git a/target-arm/cpu-qom.h b/target-arm/cpu-qom.h
index bcdd1e0edb55..b1c96b2d194e 100644
--- a/target-arm/cpu-qom.h
+++ b/target-arm/cpu-qom.h
@@ -100,6 +100,11 @@ typedef struct ARMCPU {
 bool start_powered_off;
 bool powered_off;
 
+/* PSCI emulation state
+ * 0 - disabled, 1 - smc, 2 - hvc
+ */
+uint32_t psci_method;
+
 /* [QEMU_]KVM_ARM_TARGET_* constant for this CPU, or
  * QEMU_KVM_ARM_TARGET_NONE if the kernel doesn't support this CPU type.
  */
@@ -191,6 +196,7 @@ extern const struct VMStateDescription vmstate_arm_cpu;
 void register_cp_regs_for_features(ARMCPU *cpu);
 void init_cpreg_list(ARMCPU *cpu);
 
+bool arm_handle_psci(CPUState *cs);
 bool arm_cpu_do_hvc(CPUState *cs);
 bool arm_cpu_do_smc(CPUState *cs);
 
diff --git a/target-arm/cpu.c b/target-arm/cpu.c
index 633a533af716..cbd56be5149b 100644
--- a/target-arm/cpu.c
+++ b/target-arm/cpu.c
@@ -261,9 +261,12 @@ static void arm_cpu_initfn(Object *obj)
 cpu->psci_version = 1; /* By default assume PSCI v0.1 */
 cpu->kvm_target = QEMU_KVM_ARM_TARGET_NONE;
 
-if (tcg_enabled() && !inited) {
-inited = true;
-arm_translate_init();
+if (tcg_enabled()) {
+cpu->psci_version = 2; /* TCG implements PSCI 0.2 */
+if (!inited) {
+inited = true;
+arm_translate_init();
+}
 }
 }
 
@@ -1017,6 +1020,7 @@ static const ARMCPUInfo arm_cpus[] = {
 
 static Property arm_cpu_properties[] = {
 DEFINE_PROP_BOOL("start-powered-off", ARMCPU, start_powered_off, false),
+DEFINE_PROP_UINT32("psci-method", ARMCPU, psci_method, 0),
 DEFINE_PROP_UINT32("midr", ARMCPU, midr, 0),
 DEFINE_PROP_END_OF_LIST()
 };
diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 4c336b342553..645d97357f28 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -1362,4 +1362,10 @@ static inline void arm_cpu_set_pc(CPUState *cs, vaddr value)
 
 }
 
+enum {
+QEMU_PSCI_METHOD_DISABLED = 0,
+QEMU_PSCI_METHOD_SMC = 1,
+QEMU_PSCI_METHOD_HVC = 2,
+};
+
 #endif
diff --git a/target-arm/helper.c b/target-arm/helper.c
index 3e29d08ae182..ea8d85bd8d53 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -3499,11 +3499,23 @@ void arm_v7m_cpu_do_interrupt(CPUState *cs)
 
 bool arm_cpu_do_hvc(CPUState *cs)
 {
+ARMCPU *cpu = ARM_CPU(cs);
+
+if (cpu->psci_method == QEMU_PSCI_METHOD_HVC) {
+return arm_handle_psci(cs);
+}
+
 return false;
 }
 
 bool arm_cpu_do_smc(CPUState *cs)
 {
+ARMCPU *cpu = ARM_CPU(cs);
+
+if (cpu->psci_method == QEMU_PSCI_METHOD_SMC) {
+return arm_handle_psci(cs);
+}
+
 return false;
 }
 
diff --git a/target-arm/psci.c b/target-arm/psci.c
new file mode 100644
index ..b7ca3e0b1113
--- /dev/null
+++ b/target-arm/psci.c
@@ -0,0 +1,152 @@
+/*
+ * Copyright (C) 2014 - Linaro
+ * Author: Rob Herring 
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#if !defined(CONFIG_USER_ONLY)
+
+bool arm_handle_psci(CPUState *cs)
+{
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
+uint64_t param[4];
+uint64_t context_id, mpidr;
+target_ulong entry;
+int32_t ret =

[Qemu-devel] [PATCH 2/5] target-arm: support AArch64 for arm_cpu_set_pc

2014-09-01 Thread Ard Biesheuvel

From: Rob Herring 

Add AArch64 support to arm_cpu_set_pc and make it available to other files.

Signed-off-by: Rob Herring 
Signed-off-by: Ard Biesheuvel 
---
 target-arm/cpu.c |  7 ---
 target-arm/cpu.h | 12 
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/target-arm/cpu.c b/target-arm/cpu.c
index b4c06c17cf87..633a533af716 100644
--- a/target-arm/cpu.c
+++ b/target-arm/cpu.c
@@ -31,13 +31,6 @@
 #include "sysemu/kvm.h"
 #include "kvm_arm.h"
 
-static void arm_cpu_set_pc(CPUState *cs, vaddr value)
-{
-ARMCPU *cpu = ARM_CPU(cs);
-
-cpu->env.regs[15] = value;
-}
-
 static bool arm_cpu_has_work(CPUState *cs)
 {
 ARMCPU *cpu = ARM_CPU(cs);
diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 51bedc826299..5fa91b4f1d6c 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -1348,4 +1348,16 @@ static inline void cpu_pc_from_tb(CPUARMState *env, TranslationBlock *tb)
 }
 }
 
+static inline void arm_cpu_set_pc(CPUState *cs, vaddr value)
+{
+ARMCPU *cpu = ARM_CPU(cs);
+
+if (is_a64(&cpu->env)) {
+cpu->env.pc = value;
+} else {
+cpu->env.regs[15] = value;
+}
+
+}
+
 #endif
-- 
1.8.3.2
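
A stand-alone model of the behaviour this patch introduces might look like the sketch below. The SketchEnv type and function name are hypothetical; only the choice between pc and regs[15] is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, much reduced stand-in for the CPU state. */
typedef struct {
    bool aarch64;       /* true when the core runs in AArch64 state */
    uint64_t pc;        /* AArch64 program counter */
    uint32_t regs[16];  /* AArch32 registers, r15 is the PC */
} SketchEnv;

/* Write the program counter to wherever the current state keeps it. */
static void sketch_set_pc(SketchEnv *env, uint64_t value)
{
    if (env->aarch64) {
        env->pc = value;
    } else {
        env->regs[15] = (uint32_t)value;
    }
}
```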

[Qemu-devel] [PATCH 0/5] ARM: add PSCI 0.2 support in TCG mode

2014-09-01 Thread Ard Biesheuvel

This series adds PSCI support to ARM and AArch64 system emulation when running
in TCG mode. As PSCI calls can be made using either hypervisor call (HVC) or
secure monitor call (SMC) instructions, support is added for handling those
in patch #3 before patch #4 adds the actual PSCI dispatch logic. Patch #5
enables PSCI for the mach-virt platform.

Currently, booting multiple cores under TCG is unstable, so the restriction
to 1 cpu in TCG mode is retained for now. However, PSCI reset and poweroff are
supported.
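
As a rough illustration of the conduit selection described above: a call is only treated as PSCI when the instruction used matches the configured method. The names below are hypothetical; the numeric values 0/1/2 follow the psci-method encoding in patch #4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same encoding as the psci-method property: 0 disabled, 1 smc, 2 hvc. */
enum {
    SKETCH_PSCI_DISABLED = 0,
    SKETCH_PSCI_SMC = 1,
    SKETCH_PSCI_HVC = 2,
};

/* Which instruction the guest executed (hypothetical type). */
typedef enum { SKETCH_INSN_HVC, SKETCH_INSN_SMC } SketchInsn;

/* True when the executed instruction is the configured PSCI conduit. */
static bool sketch_is_psci_call(uint32_t psci_method, SketchInsn insn)
{
    switch (insn) {
    case SKETCH_INSN_HVC:
        return psci_method == SKETCH_PSCI_HVC;
    case SKETCH_INSN_SMC:
        return psci_method == SKETCH_PSCI_SMC;
    }
    return false;
}
```

When this returns false the exception stays unhandled, which is what the series does for any HVC or SMC that is not the selected conduit.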

Rob Herring (5):
  target-arm: add powered off cpu state
  target-arm: support AArch64 for arm_cpu_set_pc
  target-arm: add hvc and smc exception emulation handling
infrastructure
  target-arm: add emulation of PSCI calls for system emulation
  arm/virt: enable PSCI emulation support for system emulation

 hw/arm/virt.c  |  70 ++---
 target-arm/Makefile.objs   |   1 +
 target-arm/cpu-qom.h   |  10 +++
 target-arm/cpu.c   |  22 ---
 target-arm/cpu.h   |  20 ++
 target-arm/helper-a64.c|  11 
 target-arm/helper.c|  33 ++
 target-arm/internals.h |  15 +
 target-arm/psci.c  | 152 +
 target-arm/translate-a64.c |  26 +---
 target-arm/translate.c |  24 ---
 11 files changed, 322 insertions(+), 62 deletions(-)
 create mode 100644 target-arm/psci.c

-- 
1.8.3.2

[Qemu-devel] [PATCH 3/5] target-arm: add hvc and smc exception emulation handling infrastructure

2014-09-01 Thread Ard Biesheuvel

From: Rob Herring 

Add the infrastructure to handle and emulate hvc and smc exceptions.
This will enable emulation of things such as PSCI calls. This commit
does not change the behavior and will exit with unknown exception.
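
The syndrome helpers added by this patch pack an exception class, the IL bit and the 16-bit immediate into one word. A minimal stand-alone sketch of that packing follows, assuming the ARMv8 ESR layout (EC in bits [31:26], IL in bit 25) and EC value 0x16 for an AArch64 HVC. The SKETCH_ names are made up for the example and are not QEMU's macros.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_EC_SHIFT    26           /* exception class lives in [31:26] */
#define SKETCH_IL          (1u << 25)   /* 32-bit instruction length flag */
#define SKETCH_EC_AA64_HVC 0x16u        /* HVC executed in AArch64 state */

/* Build the syndrome for an AArch64 HVC carrying a 16-bit immediate. */
static inline uint32_t sketch_syn_aa64_hvc(uint32_t imm16)
{
    return (SKETCH_EC_AA64_HVC << SKETCH_EC_SHIFT) | SKETCH_IL |
           (imm16 & 0xffffu);
}

/* Extract the exception class back out of a syndrome value. */
static inline uint32_t sketch_syn_ec(uint32_t syn)
{
    return syn >> SKETCH_EC_SHIFT;
}

/* Extract the immediate back out of a syndrome value. */
static inline uint32_t sketch_syn_imm16(uint32_t syn)
{
    return syn & 0xffffu;
}
```

Bits of the immediate above bit 15 are masked off, so they cannot leak into the IL or EC fields.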

Signed-off-by: Rob Herring 
Signed-off-by: Ard Biesheuvel 
---
 target-arm/cpu-qom.h   |  3 +++
 target-arm/cpu.h   |  2 ++
 target-arm/helper-a64.c| 11 +++
 target-arm/helper.c| 21 +
 target-arm/internals.h | 15 +++
 target-arm/translate-a64.c | 26 +-
 target-arm/translate.c | 24 +---
 7 files changed, 86 insertions(+), 16 deletions(-)

diff --git a/target-arm/cpu-qom.h b/target-arm/cpu-qom.h
index f71f6a4ccc6b..bcdd1e0edb55 100644
--- a/target-arm/cpu-qom.h
+++ b/target-arm/cpu-qom.h
@@ -191,6 +191,9 @@ extern const struct VMStateDescription vmstate_arm_cpu;
 void register_cp_regs_for_features(ARMCPU *cpu);
 void init_cpreg_list(ARMCPU *cpu);
 
+bool arm_cpu_do_hvc(CPUState *cs);
+bool arm_cpu_do_smc(CPUState *cs);
+
 void arm_cpu_do_interrupt(CPUState *cpu);
 void arm_v7m_cpu_do_interrupt(CPUState *cpu);
 
diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 5fa91b4f1d6c..4c336b342553 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -51,6 +51,8 @@
 #define EXCP_EXCEPTION_EXIT  8   /* Return from v7M exception.  */
 #define EXCP_KERNEL_TRAP 9   /* Jumped to kernel code page.  */
 #define EXCP_STREX  10
+#define EXCP_HVC11
+#define EXCP_SMC12
 
 #define ARMV7M_EXCP_RESET   1
 #define ARMV7M_EXCP_NMI 2
diff --git a/target-arm/helper-a64.c b/target-arm/helper-a64.c
index 2e9ef64786ae..54700e729711 100644
--- a/target-arm/helper-a64.c
+++ b/target-arm/helper-a64.c
@@ -483,6 +483,17 @@ void aarch64_cpu_do_interrupt(CPUState *cs)
 case EXCP_FIQ:
 addr += 0x100;
 break;
+case EXCP_HVC:
+if (arm_cpu_do_hvc(cs)) {
+return;
+}
+cpu_abort(cs, "Unhandled exception 0x%x\n", cs->exception_index);
+return;
+case EXCP_SMC:
+if (arm_cpu_do_smc(cs)) {
+return;
+}
+/* Fall-through */
 default:
 cpu_abort(cs, "Unhandled exception 0x%x\n", cs->exception_index);
 }
diff --git a/target-arm/helper.c b/target-arm/helper.c
index 2b95f33872cb..3e29d08ae182 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -3497,6 +3497,16 @@ void arm_v7m_cpu_do_interrupt(CPUState *cs)
 env->thumb = addr & 1;
 }
 
+bool arm_cpu_do_hvc(CPUState *cs)
+{
+return false;
+}
+
+bool arm_cpu_do_smc(CPUState *cs)
+{
+return false;
+}
+
 /* Handle a CPU exception.  */
 void arm_cpu_do_interrupt(CPUState *cs)
 {
@@ -3599,6 +3609,17 @@ void arm_cpu_do_interrupt(CPUState *cs)
 mask = CPSR_A | CPSR_I | CPSR_F;
 offset = 4;
 break;
+case EXCP_HVC:
+if (arm_cpu_do_hvc(cs)) {
+return;
+}
+cpu_abort(cs, "Unhandled exception 0x%x\n", cs->exception_index);
+return;
+case EXCP_SMC:
+if (arm_cpu_do_smc(cs)) {
+return;
+}
+/* Fall-through */
 default:
 cpu_abort(cs, "Unhandled exception 0x%x\n", cs->exception_index);
 return; /* Never happens.  Keep compiler happy.  */
diff --git a/target-arm/internals.h b/target-arm/internals.h
index 53c2e3cf3e7e..37fd740526a2 100644
--- a/target-arm/internals.h
+++ b/target-arm/internals.h
@@ -210,6 +210,21 @@ static inline uint32_t syn_aa32_svc(uint32_t imm16, bool is_thumb)
 | (is_thumb ? 0 : ARM_EL_IL);
 }
 
+static inline uint32_t syn_aa64_hvc(uint32_t imm16)
+{
+return (EC_AA64_HVC << ARM_EL_EC_SHIFT) | ARM_EL_IL | (imm16 & 0xffff);
+}
+
+static inline uint32_t syn_aa32_hvc(uint32_t imm16)
+{
+return (EC_AA32_HVC << ARM_EL_EC_SHIFT) | ARM_EL_IL | (imm16 & 0xffff);
+}
+
+static inline uint32_t syn_aa64_smc(uint32_t imm16)
+{
+return (EC_AA64_SMC << ARM_EL_EC_SHIFT) | ARM_EL_IL | (imm16 & 0xffff);
+}
+
 static inline uint32_t syn_aa64_bkpt(uint32_t imm16)
 {
 return (EC_AA64_BKPT << ARM_EL_EC_SHIFT) | ARM_EL_IL | (imm16 & 0xffff);
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 8e66b6c97282..9fd204694e7a 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -1473,20 +1473,28 @@ static void disas_exc(DisasContext *s, uint32_t insn)
 
 switch (opc) {
 case 0:
-/* SVC, HVC, SMC; since we don't support the Virtualization
- * or TrustZone extensions these all UNDEF except SVC.
- */
-if (op2_ll != 1) {
-unallocated_encoding(s);
-break;
-}
 /* For SVC, HVC and SMC we advance the single-step state
  * machine before taking the exception. This is architecturally
  * mandated, to ensure that single-stepping a system call
  * instruction works properly.
  */
-gen_ss_advance(s);
-gen_excep

[Qemu-devel] [PATCH 5/5] arm/virt: enable PSCI emulation support for system emulation

2014-09-01 Thread Ard Biesheuvel

From: Rob Herring 

Now that we have PSCI emulation, enable it for the virt platform.
This simplifies the virt machine a bit now that PSCI no longer
needs to be a KVM only feature.

Signed-off-by: Rob Herring 
Signed-off-by: Ard Biesheuvel 
---
 hw/arm/virt.c | 70 +--
 1 file changed, 34 insertions(+), 36 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index d6fffc75bda0..fefe80219d2f 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -189,47 +189,43 @@ static void create_fdt(VirtBoardInfo *vbi)
 
 static void fdt_add_psci_node(const VirtBoardInfo *vbi)
 {
+uint32_t cpu_suspend_fn;
+uint32_t cpu_off_fn;
+uint32_t cpu_on_fn;
+uint32_t migrate_fn;
 void *fdt = vbi->fdt;
 ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(0));
 
-/* No PSCI for TCG yet */
-if (kvm_enabled()) {
-uint32_t cpu_suspend_fn;
-uint32_t cpu_off_fn;
-uint32_t cpu_on_fn;
-uint32_t migrate_fn;
-
-qemu_fdt_add_subnode(fdt, "/psci");
-if (armcpu->psci_version == 2) {
-const char comp[] = "arm,psci-0.2\0arm,psci";
-qemu_fdt_setprop(fdt, "/psci", "compatible", comp, sizeof(comp));
-
-cpu_off_fn = QEMU_PSCI_0_2_FN_CPU_OFF;
-if (arm_feature(&armcpu->env, ARM_FEATURE_AARCH64)) {
-cpu_suspend_fn = QEMU_PSCI_0_2_FN64_CPU_SUSPEND;
-cpu_on_fn = QEMU_PSCI_0_2_FN64_CPU_ON;
-migrate_fn = QEMU_PSCI_0_2_FN64_MIGRATE;
-} else {
-cpu_suspend_fn = QEMU_PSCI_0_2_FN_CPU_SUSPEND;
-cpu_on_fn = QEMU_PSCI_0_2_FN_CPU_ON;
-migrate_fn = QEMU_PSCI_0_2_FN_MIGRATE;
-}
-} else {
-qemu_fdt_setprop_string(fdt, "/psci", "compatible", "arm,psci");
+qemu_fdt_add_subnode(fdt, "/psci");
+if (armcpu->psci_version == 2) {
+const char comp[] = "arm,psci-0.2\0arm,psci";
+qemu_fdt_setprop(fdt, "/psci", "compatible", comp, sizeof(comp));
 
-cpu_suspend_fn = QEMU_PSCI_0_1_FN_CPU_SUSPEND;
-cpu_off_fn = QEMU_PSCI_0_1_FN_CPU_OFF;
-cpu_on_fn = QEMU_PSCI_0_1_FN_CPU_ON;
-migrate_fn = QEMU_PSCI_0_1_FN_MIGRATE;
+cpu_off_fn = QEMU_PSCI_0_2_FN_CPU_OFF;
+if (arm_feature(&armcpu->env, ARM_FEATURE_AARCH64)) {
+cpu_suspend_fn = QEMU_PSCI_0_2_FN64_CPU_SUSPEND;
+cpu_on_fn = QEMU_PSCI_0_2_FN64_CPU_ON;
+migrate_fn = QEMU_PSCI_0_2_FN64_MIGRATE;
+} else {
+cpu_suspend_fn = QEMU_PSCI_0_2_FN_CPU_SUSPEND;
+cpu_on_fn = QEMU_PSCI_0_2_FN_CPU_ON;
+migrate_fn = QEMU_PSCI_0_2_FN_MIGRATE;
 }
+} else {
+qemu_fdt_setprop_string(fdt, "/psci", "compatible", "arm,psci");
 
-qemu_fdt_setprop_string(fdt, "/psci", "method", "hvc");
-
-qemu_fdt_setprop_cell(fdt, "/psci", "cpu_suspend", cpu_suspend_fn);
-qemu_fdt_setprop_cell(fdt, "/psci", "cpu_off", cpu_off_fn);
-qemu_fdt_setprop_cell(fdt, "/psci", "cpu_on", cpu_on_fn);
-qemu_fdt_setprop_cell(fdt, "/psci", "migrate", migrate_fn);
+cpu_suspend_fn = QEMU_PSCI_0_1_FN_CPU_SUSPEND;
+cpu_off_fn = QEMU_PSCI_0_1_FN_CPU_OFF;
+cpu_on_fn = QEMU_PSCI_0_1_FN_CPU_ON;
+migrate_fn = QEMU_PSCI_0_1_FN_MIGRATE;
 }
+
+qemu_fdt_setprop_string(fdt, "/psci", "method", "hvc");
+
+qemu_fdt_setprop_cell(fdt, "/psci", "cpu_suspend", cpu_suspend_fn);
+qemu_fdt_setprop_cell(fdt, "/psci", "cpu_off", cpu_off_fn);
+qemu_fdt_setprop_cell(fdt, "/psci", "cpu_on", cpu_on_fn);
+qemu_fdt_setprop_cell(fdt, "/psci", "migrate", migrate_fn);
 }
 
 static void fdt_add_timer_nodes(const VirtBoardInfo *vbi)
@@ -468,8 +464,7 @@ static void machvirt_init(MachineState *machine)
 vbi->smp_cpus = smp_cpus;
 
 /*
- * Only supported method of starting secondary CPUs is PSCI and
- * PSCI is not yet supported with TCG, so limit smp_cpus to 1
+ * SMP is not yet supported with TCG, so limit smp_cpus to 1
  * if we're not using KVM.
  */
 if (!kvm_enabled() && smp_cpus > 1) {
@@ -495,6 +490,9 @@ static void machvirt_init(MachineState *machine)
 }
 cpuobj = object_new(object_class_get_name(oc));
 
+object_property_set_int(cpuobj, QEMU_PSCI_METHOD_HVC, "psci-method",
+NULL);
+
 /* Secondary CPUs start in PSCI powered-down state */
 if (n > 0) {
 object_property_set_bool(cpuobj, true, "start-powered-off", NULL);
-- 
1.8.3.2
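
One detail in fdt_add_psci_node that is easy to miss is that the "compatible" value is a list of two NUL-terminated strings held in a single array, which is why the patch passes sizeof(comp) rather than using a string setter. The sketch below shows that layout in isolation. The helper name is hypothetical and nothing here calls into libfdt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Two entries back to back; the compiler appends the final NUL. */
static const char sketch_compat[] = "arm,psci-0.2\0arm,psci";

/* Count the NUL-terminated entries in a device-tree style string list.
 * The list must end with a NUL byte at offset len - 1. */
static size_t sketch_strlist_count(const char *list, size_t len)
{
    size_t count = 0;
    size_t off = 0;

    while (off < len) {
        off += strlen(list + off) + 1;  /* skip the entry and its NUL */
        count++;
    }
    return count;
}
```

A setter that stops at the first NUL would store only "arm,psci-0.2" and drop the fallback entry.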

Re: [Qemu-devel] What tests should "make check-block" run?

2014-09-01 Thread Markus Armbruster

Max Reitz  writes:

> On 28.08.2014 17:24, Markus Armbruster wrote:
>> Stefan Hajnoczi  writes:
>>
>>> On Thu, Aug 21, 2014 at 02:16:36PM +0100, Peter Maydell wrote:
>>>> On 21 August 2014 14:12, Stefan Hajnoczi  wrote:
>>>>> On Thu, Aug 21, 2014 at 02:27:00PM +0200, Markus Armbruster wrote:
>>>>>> Should we have a variant of "make check-block" for testing other
>>>>>> (format, protocol) combinations?
>>>>> I don't think variants are useful.  If you need control, use ./check.
>>>> That seems pretty undiscoverable to me. I know about 'make check',
>>>> and 'make check-help' tells me about 'make check-block', but how
>>>> do I find out about 'check' ? I just had to bounce through the makefile
>>>> and a wrapper script to even figure out which directory it lives in,
>>>> and there's no help text or usage comments in it...
>>> http://qemu-project.org/Documentation/QemuIoTests
>> I'm afraid this needs updating for Max's recent "iotests: Allow
>> out-of-tree run" series.  Max?
>
> Done. Thanks for notifying me.

Looks good, thank you!

[Qemu-devel] [Bug 1362635] Re: bdrv_read co-routine re-entered recursively

2014-09-01 Thread senya

I'm trying to revive github.com/jagane/qemu-kvm-livebackup.
It has a separate thread that connects to a client through a socket and sends
disk blocks to it.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1362635

Title:
  bdrv_read co-routine re-entered recursively

Status in QEMU:
  New

Bug description:
  Calling bdrv_read in a loop leads to the following situation:

  bs->drv->bdrv_aio_readv is called, and finally calls bdrv_co_io_em_complete
  in another thread's context.
  bdrv_co_io_em_complete may therefore run before qemu_coroutine_yield is
  called in bdrv_co_io_em, and qemu fails with "co-routine re-entered
  recursively".

  static void bdrv_co_io_em_complete(void *opaque, int ret)
  {
  CoroutineIOCompletion *co = opaque;

  co->ret = ret;
  qemu_coroutine_enter(co->coroutine, NULL);
  }

  static int coroutine_fn bdrv_co_io_em(BlockDriverState *bs, int64_t sector_num,
                                        int nb_sectors, QEMUIOVector *iov,
                                        bool is_write)
  {
  CoroutineIOCompletion co = {
  .coroutine = qemu_coroutine_self(),
  };
  BlockDriverAIOCB *acb;

  if (is_write) {
  acb = bs->drv->bdrv_aio_writev(bs, sector_num, iov, nb_sectors,
 bdrv_co_io_em_complete, &co);
  } else {
  acb = bs->drv->bdrv_aio_readv(bs, sector_num, iov, nb_sectors,
bdrv_co_io_em_complete, &co);
  }

  trace_bdrv_co_io_em(bs, sector_num, nb_sectors, is_write, acb);
  if (!acb) {
  return -EIO;
  }
  qemu_coroutine_yield();

  return co.ret;
  }

  Is it a bug, or maybe I don't understand something?
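
  The ordering problem described above can be modelled without any real
  threads. The sketch below is only a toy state machine with hypothetical
  names, not QEMU's coroutine implementation: entering a coroutine that is
  still running is reported as an error, where QEMU itself aborts.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy coroutine: it is either running or parked in a yield. */
typedef struct {
    bool running;
} SketchCoroutine;

/* Enter the coroutine. Returns -1 if it is already running, which stands
 * in for the "co-routine re-entered recursively" failure. */
static int sketch_enter(SketchCoroutine *co)
{
    if (co->running) {
        return -1;
    }
    co->running = true;
    return 0;
}

/* Park the coroutine so that a later completion may enter it again. */
static void sketch_yield(SketchCoroutine *co)
{
    co->running = false;
}
```

  If the completion callback fires before the yield, sketch_enter sees a
  running coroutine and fails. If the yield happens first, the same call
  succeeds.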

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1362635/+subscriptions

Re: [Qemu-devel] [PATCH] cpu: init vmstate for ticks and clock offset

2014-09-01 Thread Paolo Bonzini

Il 01/09/2014 07:34, Pavel Dovgalyuk ha scritto:
> Ticks and clock offset used by CPU timers have to be saved in vmstate.
> But vmstate for these fields registered only in icount mode.
> Missing registration leads to breaking the continuity when vmstate is loaded.
> This patch introduces new initialization function which fixes this.
> 
> Signed-off-by: Pavel Dovgalyuk 
> ---
>  cpus.c|8 ++--
>  include/qemu-common.h |2 ++
>  vl.c  |1 +
>  3 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/cpus.c b/cpus.c
> index 2b5c0bd..c07826d 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -492,13 +492,17 @@ static const VMStateDescription vmstate_timers = {
>  }
>  };
>  
> +void cpu_ticks_init(void)
> +{
> +seqlock_init(&timers_state.vm_clock_seqlock, NULL);
> +vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
> +}
> +
>  void configure_icount(QemuOpts *opts, Error **errp)
>  {
>  const char *option;
>  char *rem_str = NULL;
>  
> -seqlock_init(&timers_state.vm_clock_seqlock, NULL);
> -vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
>  option = qemu_opt_get(opts, "shift");
>  if (!option) {
>  if (qemu_opt_get(opts, "align") != NULL) {
> diff --git a/include/qemu-common.h b/include/qemu-common.h
> index bcf7a6a..dcb57ab 100644
> --- a/include/qemu-common.h
> +++ b/include/qemu-common.h
> @@ -105,6 +105,8 @@ static inline char *realpath(const char *path, char *resolved_path)
>  }
>  #endif
>  
> +void cpu_ticks_init(void);
> +
>  /* icount */
>  void configure_icount(QemuOpts *opts, Error **errp);
>  extern int use_icount;
> diff --git a/vl.c b/vl.c
> index b796c67..9241e2d 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -4330,6 +4330,7 @@ int main(int argc, char **argv, char **envp)
>  qemu_spice_init();
>  #endif
>  
> +cpu_ticks_init();
>  if (icount_opts) {
>  if (kvm_enabled() || xen_enabled()) {
>  fprintf(stderr, "-icount is not allowed with kvm or xen\n");
> 
> 
> 

Thanks, will send a pull request for this ASAP.

Paolo

Re: [Qemu-devel] bug (bisected): qemu-system-sh4 serial console, ctrl-C kills emulator

2014-09-01 Thread Paolo Bonzini

Il 01/09/2014 07:05, Rob Landley ha scritto:
> If you grab http://landley.net/aboriginal/bin/qemu-system-sh4.tar.bz2
> extract it and ./run-emulator.sh (which is a fairly straightforward
> qemu-system-sh4 invocation on the included kernel image and squashfs
> root filesystem), older qemu versions would run it just fine, and ctrl-C
> would pass through to the shell. But current qemu, ctrl-C kills the
> emulator.
> 
> I bisected this to commit 02c4bdf1d2ca, although that's between commit
> a9e8aeb3755b (which broke the sh4 pci bus) and b23ea25f5098 (which made
> it boot again), so you have to either revert the first or apply the
> second in order to test that the reverted commit fixes it.

I get a 404 error, but this is most likely not sh4 specific---can you
just post here the command line?

Paolo

Re: [Qemu-devel] tcmu-runner and QEMU

2014-09-01 Thread Paolo Bonzini

Il 31/08/2014 22:03, Andy Grover ha scritto:
>> So I think what you want is a `qemu-iscsi'?  ie. the same as qemu-nbd,
>> but with an iSCSI frontend (to replace the NBD server).
> 
> You want qemu to be able to issue SCSI commands over iSCSI? I thought
> qemu used libiscsi for this, to be the initiator. What Benoit and I have
> been discussing is the other side, enabling qemu to configure LIO to
> handle requests from other initiators

Right, you're talking about the same thing; qemu-nbd is an NBD server
(matching a "target" in SCSI speak) that understands qcow2.

Paolo

> (either VMs or iron) over iSCSI or
> FCoE, but backed by qcow2 disk images. The problem being LIO doesn't
> speak qcow2 yet.

Re: [Qemu-devel] [PATCH V3] vhost_net: start/stop guest notifiers properly

2014-09-01 Thread Michael S. Tsirkin

On Fri, Aug 29, 2014 at 06:40:24PM +0800, Zhangjie (HZ) wrote:
> 
> 
> On 2014/8/27 20:59, Michael S. Tsirkin wrote:
> > On Thu, Aug 21, 2014 at 03:42:53PM +0800, Zhangjie (HZ) wrote:
> >> On 2014/8/21 14:53, Jason Wang wrote:
> >>> On 08/21/2014 02:28 PM, Zhangjie (HZ) wrote:
> 
>  After migration, vhost is not disabled; the virtual NIC becomes unreachable
>  because vhost is not woken up.
>  By the logic of EVENT_IDX, virtio-net will not kick vhost again if the
>  used idx is not updated.
>  So, if one interrupt is lost during migration, virtio_net will not kick
>  vhost again.
>  Then, no skb from virtio-net can be sent to tap.
> >>>
> >>> Yes and I mean to test vhost=off to see if it was the issue of vhost.
> >> That sounds reasonable, but the test case is to test vhost.
> 
>  Jason's patch reduced the probability of occurrence, from about 1/20 to
>  1/80. It is really effective. I think the patch should be acked.
>  Maybe we can try to solve the problem from another perspective. Do you
>  have some method to sense the migration?
>  We can make up a signal from virtio-net after the migration.
> >>>
> >>> You can make a patch like this to debug. If problem disappears, it means
> >>> interrupt was really lost anyway.
> 
> > Anyway, I will try to reproduce it by myself.
> >
>  The test environment is really terrible; I built an environment myself,
>  but the problem did not occur.
>  The environment I use now is from a colleague responsible for test work.
>  Two hosts, every host has about 20 VMs, they send packets (IPv4 and
>  IPv6) between each other.
>  The VM to be migrated also sends packets itself, and there is a ping (-i
>  0.001) from another host to it.
>  The physical NIC is 1GE, connected through an internal network.
> >>>
> >>> Just want to confirm. For the problem did not occur, you mean with my
> >>> patch on top?
> >>> .
> >>>
> >> I mean, with your patch, I have to test 80 times before it occurs, the 
> >> probability is reduced.
> > 
> > Could you please try to apply the patch
> > [PATCH V4] net: Forbid dealing with packets when VM is not running
> > on top and see if this helps?
> > 
> > Thanks!
> > 
> >> -- 
> >> Best Wishes!
> >> Zhang Jie
> > .
> > 
> Thanks! I will have a test.

Great, once you have the result of the two patches applied
together, please let us know on the list.


> -- 
> Best Wishes!
> Zhang Jie
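
For readers following the EVENT_IDX argument in this thread, here is a minimal sketch of the notification check. The comparison in need_kick() follows vring_need_event() from the virtio ring header; the names need_kick and kicks_sent and the index values are made up for illustration, and the sketch assumes the device side never advances its event index (for example because the one kick it was waiting for got lost).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Same comparison as vring_need_event(): notify only if the index the other
 * side asked about (event_idx) satisfies old_idx <= event_idx < new_idx,
 * using 16-bit wrap-around arithmetic.
 */
static bool need_kick(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
    return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}

/*
 * Hypothetical helper: count the kicks a driver would send while publishing
 * n buffers one at a time, starting at index start, if event_idx is never
 * updated by the device.
 */
static unsigned kicks_sent(uint16_t event_idx, uint16_t start, unsigned n)
{
    unsigned kicks = 0;
    uint16_t old_idx = start;

    for (unsigned i = 0; i < n; i++) {
        uint16_t new_idx = (uint16_t)(old_idx + 1);

        if (need_kick(event_idx, new_idx, old_idx)) {
            kicks++;
        }
        old_idx = new_idx;
    }
    return kicks;
}
```

With event_idx = 10 and the index starting at 10, only the first publish produces a kick. If that single kick is lost and the device stays asleep, every later publish is silent, which is consistent with the stall described above.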

Re: [Qemu-devel] [PATCH 0/8] add basic recovery logic to quorum driver

2014-09-01 Thread Benoît Canet

On Monday 01 Sep 2014 at 15:43:06 (+0800), Liu Yuan wrote:
> This patch set mainly adds two pieces of logic to implement device recovery:
> - notify the quorum driver of the broken states from the child driver(s)
> - track dirty blocks and sync the device after it is repaired
> 
> Thus quorum allows VMs to continue while some child devices are broken, and
> when the child devices are repaired and come back, we sync the bits dirtied
> during the downtime to keep the data consistent.
> 
> The recovery logic is based on the driver state bitmap and will sync the dirty
> bits with a timeslice window in a coroutine in this primitive implementation.
> 
> Simple graph about 2 children with threshold=1 and read-pattern=fifo:
> (similar to DRBD)
> 
> + denote device sync iteration
> - IO on a single device
> = IO on two devices
> 
>                   sync complete, release dirty bitmap
>                                   ^
>                                   |
>   ==========----------++++++++++++==========
>             |         |
>             |         v
>             |   device repaired and begin to sync
>             v
>   device broken, create a dirty bitmap
> 
>   This sync logic can take care of the nested broken problem, where devices
>   break again while in sync. We just start a sync process after the devices
>   are repaired again, and switch the devices from broken to sound only when
>   the sync completes.
> 
> For read-pattern=quorum mode, it enjoys the recovery logic without any
> problem.

Hi Liu,

I had something like that in mind.

This series seems very cool; I will review it.

Thanks for contributing to quorum.

Best regards

Benoît

> 
> Todo:
> - use the aio interface to sync data (multiple transfers in one go)
> - dynamic slice window to control sync bandwidth more smoothly
> - add an auto-reconnection mechanism to other protocols (if not supported yet)
> - add tests
> 
> Cc: Eric Blake 
> Cc: Benoit Canet 
> Cc: Kevin Wolf 
> Cc: Stefan Hajnoczi 
> 
> Liu Yuan (8):
>   block/quorum: initialize qcrs.aiocb for read
>   block: add driver operation callbacks
>   block/sheepdog: propagate disconnect/reconnect events to upper driver
>   block/quorum: add quorum_aio_release() helper
>   quorum: fix quorum_aio_cancel()
>   block/quorum: add broken state to BlockDriverState
>   block: add two helpers
>   quorum: add basic device recovery logic
> 
>  block.c   |  17 +++
>  block/quorum.c| 324 +-
>  block/sheepdog.c  |   9 ++
>  include/block/block.h |   9 ++
>  include/block/block_int.h |   6 +
>  trace-events  |   5 +
>  6 files changed, 336 insertions(+), 34 deletions(-)
> 
> -- 
> 1.9.1
>
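
The recovery cycle in the cover letter (sound, broken, syncing, sound again, with a break during sync sending the child back to broken) can be sketched as a small state machine. This is only an illustration under stated assumptions, not the series' implementation: the Child type, the chunk-granular dirty array, NB_CHUNKS and all function names are hypothetical, and the sketch assumes writes that arrive while a child is syncing are recorded as dirty rather than forwarded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NB_CHUNKS 8 /* hypothetical: device split into 8 trackable chunks */

typedef enum { CHILD_SOUND, CHILD_BROKEN, CHILD_SYNCING } ChildState;

typedef struct {
    ChildState state;
    bool dirty[NB_CHUNKS]; /* chunks written while the child was not sound */
} Child;

static void child_init(Child *c)
{
    memset(c, 0, sizeof(*c));
    c->state = CHILD_SOUND;
}

/* The child reported a disconnect: start (or keep) dirty tracking. */
static void child_disconnect(Child *c)
{
    c->state = CHILD_BROKEN;
}

/* The child is reachable again: start syncing, but keep it out of service. */
static void child_reconnect(Child *c)
{
    if (c->state == CHILD_BROKEN) {
        c->state = CHILD_SYNCING;
    }
}

/* Guest write: returns true if it is sent to this child, else marks dirty. */
static bool child_write(Child *c, size_t chunk)
{
    assert(chunk < NB_CHUNKS);
    if (c->state == CHILD_SOUND) {
        return true;
    }
    c->dirty[chunk] = true;
    return false;
}

/*
 * One sync iteration: clean at most one dirty chunk. Only when no dirty
 * chunk is left does the child switch back to sound. Returns true if the
 * child is sound after this step.
 */
static bool child_sync_step(Child *c)
{
    if (c->state != CHILD_SYNCING) {
        return c->state == CHILD_SOUND;
    }
    for (size_t i = 0; i < NB_CHUNKS; i++) {
        if (c->dirty[i]) {
            c->dirty[i] = false; /* a real driver would copy the data here */
            return false;
        }
    }
    c->state = CHILD_SOUND;
    return true;
}
```

Because the dirty array survives a second disconnect, a break during sync simply restarts the sync later with whatever is still dirty, which is the nested case the cover letter mentions.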

Re: [Qemu-devel] [PATCH 1/1] hw/pci-assign: split pci-assign.c

2014-09-01 Thread Michael S. Tsirkin

On Mon, Sep 01, 2014 at 10:07:19AM +0800, Tiejun Chen wrote:
> We will try to reuse assigned_dev_load_option_rom on the Xen side, and
> in particular it is a good start toward unifying the PCI assignment code
> for KVM and Xen in the future.
> 
> Signed-off-by: Tiejun Chen 
> ---
>  hw/i386/kvm/pci-assign.c| 136 +---
>  include/hw/pci/pci-assign.h | 123 +++
>  2 files changed, 151 insertions(+), 108 deletions(-)
>  create mode 100644 include/hw/pci/pci-assign.h
> 
> diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
> index 17c7d6dc..4cf24ad 100644
> --- a/hw/i386/kvm/pci-assign.c
> +++ b/hw/i386/kvm/pci-assign.c
> @@ -37,16 +37,7 @@
>  #include "hw/pci/pci.h"
>  #include "hw/pci/msi.h"
>  #include "kvm_i386.h"
> -
> -#define MSIX_PAGE_SIZE 0x1000
> -
> -/* From linux/ioport.h */
> -#define IORESOURCE_IO   0x0100  /* Resource type */
> -#define IORESOURCE_MEM  0x0200
> -#define IORESOURCE_IRQ  0x0400
> -#define IORESOURCE_DMA  0x0800
> -#define IORESOURCE_PREFETCH 0x2000  /* No side effects */
> -#define IORESOURCE_MEM_64   0x0010
> +#include "hw/pci/pci-assign.h"
>  
>  //#define DEVICE_ASSIGNMENT_DEBUG
>  
> @@ -59,88 +50,6 @@
>  #define DEBUG(fmt, ...)
>  #endif
>  
> -typedef struct PCIRegion {
> -int type;   /* Memory or port I/O */
> -int valid;
> -uint64_t base_addr;
> -uint64_t size;/* size of the region */
> -int resource_fd;
> -} PCIRegion;
> -
> -typedef struct PCIDevRegions {
> -uint8_t bus, dev, func; /* Bus inside domain, device and function */
> -int irq;/* IRQ number */
> -uint16_t region_number; /* number of active regions */
> -
> -/* Port I/O or MMIO Regions */
> -PCIRegion regions[PCI_NUM_REGIONS - 1];
> -int config_fd;
> -} PCIDevRegions;
> -
> -typedef struct AssignedDevRegion {
> -MemoryRegion container;
> -MemoryRegion real_iomem;
> -union {
> -uint8_t *r_virtbase; /* mmapped access address for memory regions */
> -uint32_t r_baseport; /* the base guest port for I/O regions */
> -} u;
> -pcibus_t e_size;/* emulated size of region in bytes */
> -pcibus_t r_size;/* real size of region in bytes */
> -PCIRegion *region;
> -} AssignedDevRegion;
> -
> -#define ASSIGNED_DEVICE_PREFER_MSI_BIT  0
> -#define ASSIGNED_DEVICE_SHARE_INTX_BIT  1
> -
> -#define ASSIGNED_DEVICE_PREFER_MSI_MASK (1 << ASSIGNED_DEVICE_PREFER_MSI_BIT)
> -#define ASSIGNED_DEVICE_SHARE_INTX_MASK (1 << ASSIGNED_DEVICE_SHARE_INTX_BIT)
> -
> -typedef struct MSIXTableEntry {
> -uint32_t addr_lo;
> -uint32_t addr_hi;
> -uint32_t data;
> -uint32_t ctrl;
> -} MSIXTableEntry;
> -
> -typedef enum AssignedIRQType {
> -ASSIGNED_IRQ_NONE = 0,
> -ASSIGNED_IRQ_INTX_HOST_INTX,
> -ASSIGNED_IRQ_INTX_HOST_MSI,
> -ASSIGNED_IRQ_MSI,
> -ASSIGNED_IRQ_MSIX
> -} AssignedIRQType;
> -
> -typedef struct AssignedDevice {
> -PCIDevice dev;
> -PCIHostDeviceAddress host;
> -uint32_t dev_id;
> -uint32_t features;
> -int intpin;
> -AssignedDevRegion v_addrs[PCI_NUM_REGIONS - 1];
> -PCIDevRegions real_device;
> -PCIINTxRoute intx_route;
> -AssignedIRQType assigned_irq_type;
> -struct {
> -#define ASSIGNED_DEVICE_CAP_MSI (1 << 0)
> -#define ASSIGNED_DEVICE_CAP_MSIX (1 << 1)
> -uint32_t available;
> -#define ASSIGNED_DEVICE_MSI_ENABLED (1 << 0)
> -#define ASSIGNED_DEVICE_MSIX_ENABLED (1 << 1)
> -#define ASSIGNED_DEVICE_MSIX_MASKED (1 << 2)
> -uint32_t state;
> -} cap;
> -uint8_t emulate_config_read[PCI_CONFIG_SPACE_SIZE];
> -uint8_t emulate_config_write[PCI_CONFIG_SPACE_SIZE];
> -int msi_virq_nr;
> -int *msi_virq;
> -MSIXTableEntry *msix_table;
> -hwaddr msix_table_addr;
> -uint16_t msix_max;
> -MemoryRegion mmio;
> -char *configfd_name;
> -int32_t bootindex;
> -} AssignedDevice;
> -
>  static void assigned_dev_update_irq_routing(PCIDevice *dev);
>  
>  static void assigned_dev_load_option_rom(AssignedDevice *dev);
> @@ -1896,37 +1805,38 @@ type_init(assign_register_types)
>   * load the corresponding ROM data to RAM. If an error occurs while loading an
>   * option ROM, we just ignore that option ROM and continue with the next one.
>   */
> -static void assigned_dev_load_option_rom(AssignedDevice *dev)
> +int dev_load_option_rom(PCIDevice *dev, struct Object *owner, void *ptr,
> +unsigned int domain, unsigned int bus,
> +unsigned int slot, unsigned int function)
>  {
>  char name[32], rom_file[64];
>  FILE *fp;
>  uint8_t val;
>  struct stat st;
> -void *ptr;
> +int size = 0;
>  
>  /* If loading ROM from file, pci handles it */
> -if (dev->dev.romfile || !dev->dev.rom_bar) {
> -return;
> +if (dev->romfile || !dev->rom_bar) {
> +return -1;
>  }
>  
>  snprintf(rom_file, siz

Re: [Qemu-devel] [PATCH 2/8] block: add driver operation callbacks

2014-09-01 Thread Benoît Canet

On Monday 01 Sep 2014 at 15:43:08 (+0800), Liu Yuan wrote:
> Driver operations are defined as callbacks passed from upper block drivers to
> lower drivers and are supposed to be called by the lower drivers.
> 
> Request handling (queuing, submitting, etc.) is done in the protocol tier of
> the block layer, and connection states are also maintained down there. Driver
> operations are supposed to notify the upper tier (such as quorum) of state
> changes.
> 
> For now only two operations are added:
> 
> driver_disconnect: called when the connection is lost
> driver_reconnect: called when the connection is back after a disconnection
> 
> They are used to notify the upper tier of the connection state.
> 
> Cc: Eric Blake 
> Cc: Benoit Canet 
> Cc: Kevin Wolf 
> Cc: Stefan Hajnoczi 
> Signed-off-by: Liu Yuan 
> ---
>  block.c   | 7 +++
>  include/block/block.h | 7 +++
>  include/block/block_int.h | 3 +++
>  3 files changed, 17 insertions(+)
> 
> diff --git a/block.c b/block.c
> index c12b8de..22eb3e4 100644
> --- a/block.c
> +++ b/block.c
> @@ -2152,6 +2152,13 @@ void bdrv_set_dev_ops(BlockDriverState *bs, const BlockDevOps *ops,
>  bs->dev_opaque = opaque;
>  }
>  
> +void bdrv_set_drv_ops(BlockDriverState *bs, const BlockDrvOps *ops,
> +  void *opaque)
> +{
> +bs->drv_ops = ops;
> +bs->drv_opaque = opaque;

We need to be very careful about how these fields interact with the infamous
bdrv_swap function.

Also, I don't know if "driver operations" is the right name, since the
BlockDriver structure's callbacks could also be called "driver operations".

> +}
> +
>  static void bdrv_dev_change_media_cb(BlockDriverState *bs, bool load)
>  {
>  if (bs->dev_ops && bs->dev_ops->change_media_cb) {
> diff --git a/include/block/block.h b/include/block/block.h
> index 8f4ad16..a61eaf0 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -82,6 +82,11 @@ typedef struct BlockDevOps {
>  void (*resize_cb)(void *opaque);
>  } BlockDevOps;
>  
> +typedef struct BlockDrvOps {
> +void (*driver_reconnect)(BlockDriverState *bs);
> +void (*driver_disconnect)(BlockDriverState *bs);
> +} BlockDrvOps;
> +
>  typedef enum {
>  BDRV_REQ_COPY_ON_READ = 0x1,
>  BDRV_REQ_ZERO_WRITE   = 0x2,
> @@ -234,6 +239,8 @@ void bdrv_detach_dev(BlockDriverState *bs, void *dev);
>  void *bdrv_get_attached_dev(BlockDriverState *bs);
>  void bdrv_set_dev_ops(BlockDriverState *bs, const BlockDevOps *ops,
>void *opaque);
> +void bdrv_set_drv_ops(BlockDriverState *bs, const BlockDrvOps *ops,
> +  void *opaque);
>  void bdrv_dev_eject_request(BlockDriverState *bs, bool force);
>  bool bdrv_dev_has_removable_media(BlockDriverState *bs);
>  bool bdrv_dev_is_tray_open(BlockDriverState *bs);
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 2334895..9fdec7f 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -319,6 +319,9 @@ struct BlockDriverState {
>  const BlockDevOps *dev_ops;
>  void *dev_opaque;
>  
> +const BlockDrvOps *drv_ops;
> +void *drv_opaque;
> +
>  AioContext *aio_context; /* event loop used for fd handlers, timers, etc */
>  
>  char filename[1024];
> -- 
> 1.9.1
>

Re: [Qemu-devel] [PATCH 3/8] block/sheepdog: propagate disconnect/reconnect events to upper driver

2014-09-01 Thread Benoît Canet

On Monday 01 Sep 2014 at 15:43:09 (+0800), Liu Yuan wrote:
> This is the reference usage of how we propagate the connection state to the
> upper tier.
> 
> Cc: Eric Blake 
> Cc: Benoit Canet 
> Cc: Kevin Wolf 
> Cc: Stefan Hajnoczi 
> Signed-off-by: Liu Yuan 
> ---
>  block/sheepdog.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/block/sheepdog.c b/block/sheepdog.c
> index 53c24d6..9c0fc49 100644
> --- a/block/sheepdog.c
> +++ b/block/sheepdog.c
> @@ -714,6 +714,11 @@ static coroutine_fn void reconnect_to_sdog(void *opaque)
>  {
>  BDRVSheepdogState *s = opaque;
>  AIOReq *aio_req, *next;
> +BlockDriverState *bs = s->bs;
> +
> +if (bs->drv_ops && bs->drv_ops->driver_disconnect) {
> +bs->drv_ops->driver_disconnect(bs);
> +}

Since this sequence will be exactly the same for all implementations,
could we create a bdrv_signal_disconnect(bs) helper in the block layer to
make this code generic?

>  
>  aio_set_fd_handler(s->aio_context, s->fd, NULL, NULL, NULL);
>  close(s->fd);
> @@ -756,6 +761,10 @@ static coroutine_fn void reconnect_to_sdog(void *opaque)
>  QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings);
>  resend_aioreq(s, aio_req);
>  }
> +
> +if (bs->drv_ops && bs->drv_ops->driver_reconnect) {
> +bs->drv_ops->driver_reconnect(bs);
> +}

Same here: bdrv_signal_reconnect(bs);

>  }
>  
>  /*
> -- 
> 1.9.1
>
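
The generic helpers suggested in the review comments above might look roughly like the sketch below. It is a standalone model, not QEMU code: BlockDriverState is reduced to a stand-in with only the two fields the patch adds, the helper names are taken from the review comments, and EventCounter, count_disconnect, count_reconnect and counting_ops are hypothetical names standing in for an upper-tier driver.

```c
#include <assert.h>
#include <stddef.h>

typedef struct BlockDriverState BlockDriverState;

/* Callback table as proposed in patch 2/8. */
typedef struct BlockDrvOps {
    void (*driver_reconnect)(BlockDriverState *bs);
    void (*driver_disconnect)(BlockDriverState *bs);
} BlockDrvOps;

/* Stand-in: only the fields this sketch needs. */
struct BlockDriverState {
    const BlockDrvOps *drv_ops;
    void *drv_opaque;
};

/* Generic helper: tell the upper tier, if it registered a callback. */
static void bdrv_signal_disconnect(BlockDriverState *bs)
{
    if (bs->drv_ops && bs->drv_ops->driver_disconnect) {
        bs->drv_ops->driver_disconnect(bs);
    }
}

static void bdrv_signal_reconnect(BlockDriverState *bs)
{
    if (bs->drv_ops && bs->drv_ops->driver_reconnect) {
        bs->drv_ops->driver_reconnect(bs);
    }
}

/* Hypothetical upper tier that only counts the events it receives. */
typedef struct EventCounter {
    int disconnects;
    int reconnects;
} EventCounter;

static void count_disconnect(BlockDriverState *bs)
{
    ((EventCounter *)bs->drv_opaque)->disconnects++;
}

static void count_reconnect(BlockDriverState *bs)
{
    ((EventCounter *)bs->drv_opaque)->reconnects++;
}

static const BlockDrvOps counting_ops = {
    .driver_reconnect = count_reconnect,
    .driver_disconnect = count_disconnect,
};
```

A protocol driver such as sheepdog would then call bdrv_signal_disconnect(bs) and bdrv_signal_reconnect(bs) instead of open-coding the two NULL checks, and the helpers stay safe when no upper tier has registered anything.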

Re: [Qemu-devel] tcmu-runner and QEMU

2014-09-01 Thread Paolo Bonzini

On 31/08/2014 22:38, Benoît Canet wrote:
> The problem with QEMU block drivers is that they are using either coroutines
> or QEMU custom AIO callbacks so reusing them without the block layer is
> not doable.

Not really true.  The QEMU block layer can be wrapped relatively easily
in a GSource.  This is how QEMU uses it, in fact.

As to global state, each .so is only loaded once in an executable, so if
TCMU loaded two QEMU plugins the .so would point to the block devices
from each plugin.

The problem is more the QMP interface, I think.

> For the QEMU block layer as a whole, it maintains some linked lists of block
> device states or similar stuff as static global variables.
> (See https://github.com/qemu/qemu/blob/master/block.c#L96)
> 
> So having more than one instance of the block layer running is not doable.
> 
> I am not aware of anyone having succeeded in turning it into a proper .so.

There was libqemublock.  I think it stopped just because the author
turned to something else, not because there were particular problems
with the design.

Paolo

> Extracting it into a binary acting as an NBD target was done with qemu-nbd,
> though.

Re: [Qemu-devel] [PATCH 4/8] block/quorum: add quorum_aio_release() helper

2014-09-01 Thread Benoît Canet

On Monday 01 Sep 2014 at 15:43:10 (+0800), Liu Yuan wrote:
> Cc: Eric Blake 
> Cc: Benoit Canet 
> Cc: Kevin Wolf 
> Cc: Stefan Hajnoczi 
> Signed-off-by: Liu Yuan 
> ---
>  block/quorum.c | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/block/quorum.c b/block/quorum.c
> index 5866bca..9e056d6 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -130,6 +130,12 @@ struct QuorumAIOCB {
>  
>  static bool quorum_vote(QuorumAIOCB *acb);
>  
> +static void quorum_aio_release(QuorumAIOCB *acb)
> +{
> +g_free(acb->qcrs);
> +qemu_aio_release(acb);
> +}
> +
>  static void quorum_aio_cancel(BlockDriverAIOCB *blockacb)
>  {
>  QuorumAIOCB *acb = container_of(blockacb, QuorumAIOCB, common);
> @@ -141,8 +147,7 @@ static void quorum_aio_cancel(BlockDriverAIOCB *blockacb)
>  bdrv_aio_cancel(acb->qcrs[i].aiocb);
>  }
>  
> -g_free(acb->qcrs);
> -qemu_aio_release(acb);
> +quorum_aio_release(acb);
>  }
>  
>  static AIOCBInfo quorum_aiocb_info = {
> @@ -168,8 +173,7 @@ static void quorum_aio_finalize(QuorumAIOCB *acb)
>  }
>  }
>  
> -g_free(acb->qcrs);
> -qemu_aio_release(acb);
> +quorum_aio_release(acb);
>  }
>  
>  static bool quorum_sha256_compare(QuorumVoteValue *a, QuorumVoteValue *b)
> -- 
> 1.9.1
> 

Reviewed-by: Benoît Canet

Re: [Qemu-devel] [PATCH 5/8] quorum: fix quorum_aio_cancel()

2014-09-01 Thread Benoît Canet

On Monday 01 Sep 2014 at 15:43:11 (+0800), Liu Yuan wrote:
> For a fifo read pattern, we only have one running aio

>(possible other cases that has less number than num_children in the future)
I have trouble understanding this part of the commit message. Could you try
to clarify it?

> , so we need to check if
> .acb is NULL against bdrv_aio_cancel() to avoid segfault.
> 
> Cc: Eric Blake 
> Cc: Benoit Canet 
> Cc: Kevin Wolf 
> Cc: Stefan Hajnoczi 
> Signed-off-by: Liu Yuan 
> ---
>  block/quorum.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/block/quorum.c b/block/quorum.c
> index 9e056d6..b9eeda3 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -144,7 +144,9 @@ static void quorum_aio_cancel(BlockDriverAIOCB *blockacb)
>  
>  /* cancel all callbacks */
>  for (i = 0; i < s->num_children; i++) {
> -bdrv_aio_cancel(acb->qcrs[i].aiocb);
> +if (acb->qcrs[i].aiocb) {
> +bdrv_aio_cancel(acb->qcrs[i].aiocb);
> +}
>  }
>  
>  quorum_aio_release(acb);
> -- 
> 1.9.1
>

Re: [Qemu-devel] linux-user: enabling binfmt P flag

2014-09-01 Thread Paolo Bonzini

On 29/08/2014 20:01, Peter Maydell wrote:
> [cc'ing MJT for more distro opinion since I think fundamentally
> the choice we ought to make upstream is "what's not going to
> screw over distros"... Paolo, is there a RedHat QEMU maintainer
> who would have an opinion here?]

There's Cole Robinson.

BTW, Fedora doesn't use the binfmt scripts from QEMU, but does reuse the
binfmt lines.  We'd just add Ps and we'd be fine.

However, the problem is not really for distros.  Packagers just read the
release notes and adjust whatever needs to be adjusted.  The problem is
for people who compile from source and are bitten by conflicting binfmt
formats from their distro.

The solution could be to extend binfmt_misc so that it sets two
environment variables BINFMT_MISC_PID and BINFMT_MISC_ORIG_ARGV0.  The
former is set to the pid of the binfmt "interpreter" program, the latter
to the argv[0] value.  Then QEMU can check if BINFMT_MISC_PID matches
getpid() and, if so, trust the BINFMT_MISC_ORIG_ARGV0 value.

Paolo
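
A minimal sketch of the check proposed above, from the interpreter's side. Note that BINFMT_MISC_PID and BINFMT_MISC_ORIG_ARGV0 are only a proposal in this mail, not something binfmt_misc is known to provide, and binfmt_orig_argv0 is a hypothetical function name. The pid comparison is presumably there because environment variables are inherited, so a value left over from a parent process must not be trusted.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Return the original argv[0] if the (proposed) BINFMT_MISC_PID variable
 * names this very process, or NULL if the variables are absent or stale.
 */
static const char *binfmt_orig_argv0(void)
{
    const char *pid_str = getenv("BINFMT_MISC_PID");
    const char *argv0 = getenv("BINFMT_MISC_ORIG_ARGV0");
    char self[32];

    if (!pid_str || !argv0) {
        return NULL;
    }

    /* Compare textually against our own pid; anything else is stale. */
    snprintf(self, sizeof(self), "%ld", (long)getpid());
    if (strcmp(pid_str, self) != 0) {
        return NULL;
    }
    return argv0;
}
```

An emulator could fall back to its current argv handling whenever this returns NULL, so a kernel without the proposed variables would behave exactly as before.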

Re: [Qemu-devel] [PATCH 2/2] i386: Add a Virtual Machine Generation ID device.

2014-09-01 Thread Paolo Bonzini

On 01/09/2014 09:20, Gal Hammer wrote:
>>
>> We are still in the process of defining which devices/methods go in the
>> DSDT and which go in the SSDT.  We had bad experiences with ACPI table
>> migration in 2.1, and one plan to fix them is the following:
>>
>> * the DSDT should always be the same size no matter what command line
>> options are there
>>
>> * the SSDT should have the exact same content (byte-for-byte) for
>> different versions of QEMU, with the same command line options
>> (including the machine type).
>>
>> Right now your code obeys the first rule, not the second rule, so it
>> should add the device to the DSDT.
> 
> Are you sure about selecting the DSDT? I don't see anyone else is using
> the ACPI_EXTRACT_NAME_* macros in the DSDT table (and I keep crashing my
> guest, but ignore it for now ;-)).

There is one user:

acpi-dsdt-isa.dsl:ACPI_EXTRACT_NAME_BYTE_CONST DSDT_APPLESMC_STA

>> BTW, which events would cause the ID to change?  How should live cloning
>> (or revert to a disk+RAM snapshot) be implemented by layers above QEMU
>> for the VM gen ID to be patched?  Can you add something to docs/ about
>> it?
> 
> The VGID is expected to change when executing a VM with the -snapshot
> option, when a VM is restored from a backup or when it is imported,
> copied or cloned. So I would say it is a management's call.

Ok, got it.

So to support migration (which includes reverting to an earlier disk+RAM
snapshot) you just need to ensure the VGID is patched accordingly.
Whether VGID _will_ be different or not, that's management's call.

> I think that the Microsoft's document describes the requirements better
> than me :-).
> 
>> Also, how does this ID compare to the UUID in the DMI info (-uuid)?
> 
> The -uuid is not expected to change after the VM was created, unlike the
> -vmgenid, which is designed to notify the guest OS that a change has
> occurred. Microsoft, as an example, writes that it can be used for safer
> cryptographic software.

I would say that cloning should change the UUID (and the VMGID).

Paolo

[Qemu-devel] [PATCH 0/2] vbe: bochs dispi interface fixes

2014-09-01 Thread Gerd Hoffmann

  Hi,

Two fixes for the bochs dispi interface,
one of them fixing a minor security issue.

please review,
  Gerd

Gerd Hoffmann (2):
  vbe: make bochs dispi interface return the correct memory size with
qxl
  vbe: rework sanity checks

 hw/display/qxl.c |   1 +
 hw/display/vga.c | 159 ---
 hw/display/vga_int.h |   1 +
 3 files changed, 101 insertions(+), 60 deletions(-)

-- 
1.8.3.1

[Qemu-devel] [PATCH 2/2] vbe: rework sanity checks

2014-09-01 Thread Gerd Hoffmann

Plug a bunch of holes in the bochs dispi interface parameter checking.
Add a function doing verification on all registers.  Call that
unconditionally on every register write.  That way we should catch
everything, even changing one register affecting the valid range of
another register.

Some of the holes have been added by commit
e9c6149f6ae6873f14a12eea554925b6aa4c4dec.  Before that commit the
maximum possible framebuffer (VBE_DISPI_MAX_XRES * VBE_DISPI_MAX_YRES *
32 bpp) has been smaller than the qemu vga memory (8MB) and the checking
for VBE_DISPI_MAX_XRES + VBE_DISPI_MAX_YRES + VBE_DISPI_MAX_BPP was ok.

Some of the holes have been there forever, such as
VBE_DISPI_INDEX_X_OFFSET and VBE_DISPI_INDEX_Y_OFFSET register writes
lacking any verification.

Security impact:

(1) Guest can make the ui (gtk/vnc/...) use memory ranges outside the vga
frame buffer as source  ->  host memory leak.  Memory isn't leaked to
the guest but to the vnc client though.

(2) Qemu will segfault in case the memory range happens to include
unmapped areas  ->  Guest can DoS itself.

The guest can not modify host memory, so I don't think this can be used
by the guest to escape.

Cc: qemu-sta...@nongnu.org
Cc: secal...@redhat.com
Signed-off-by: Gerd Hoffmann 
Reviewed-by: Laszlo Ersek 
---
 hw/display/vga.c | 154 ++-
 1 file changed, 95 insertions(+), 59 deletions(-)

diff --git a/hw/display/vga.c b/hw/display/vga.c
index 99251d7..62e6243 100644
--- a/hw/display/vga.c
+++ b/hw/display/vga.c
@@ -576,6 +576,93 @@ void vga_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 }
 }
 
+/*
+ * Sanity check vbe register writes.
+ *
+ * As we don't have a way to signal errors to the guest in the bochs
+ * dispi interface we'll go adjust the registers to the closest valid
+ * value.
+ */
+static void vbe_fixup_regs(VGACommonState *s)
+{
+uint16_t *r = s->vbe_regs;
+uint32_t bits, linelength, maxy, offset;
+
+if (!(r[VBE_DISPI_INDEX_ENABLE] & VBE_DISPI_ENABLED)) {
+/* vbe is turned off -- nothing to do */
+return;
+}
+
+/* check depth */
+switch (r[VBE_DISPI_INDEX_BPP]) {
+case 4:
+case 8:
+case 16:
+case 24:
+case 32:
+bits = r[VBE_DISPI_INDEX_BPP];
+break;
+case 15:
+bits = 16;
+break;
+default:
+bits = r[VBE_DISPI_INDEX_BPP] = 8;
+break;
+}
+
+/* check width */
+r[VBE_DISPI_INDEX_XRES] &= ~7u;
+if (r[VBE_DISPI_INDEX_XRES] == 0) {
+r[VBE_DISPI_INDEX_XRES] = 8;
+}
+if (r[VBE_DISPI_INDEX_XRES] > VBE_DISPI_MAX_XRES) {
+r[VBE_DISPI_INDEX_XRES] = VBE_DISPI_MAX_XRES;
+}
+r[VBE_DISPI_INDEX_VIRT_WIDTH] &= ~7u;
+if (r[VBE_DISPI_INDEX_VIRT_WIDTH] > VBE_DISPI_MAX_XRES) {
+r[VBE_DISPI_INDEX_VIRT_WIDTH] = VBE_DISPI_MAX_XRES;
+}
+if (r[VBE_DISPI_INDEX_VIRT_WIDTH] < r[VBE_DISPI_INDEX_XRES]) {
+r[VBE_DISPI_INDEX_VIRT_WIDTH] = r[VBE_DISPI_INDEX_XRES];
+}
+
+/* check height */
+linelength = r[VBE_DISPI_INDEX_VIRT_WIDTH] * bits / 8;
+maxy = s->vbe_size / linelength;
+if (r[VBE_DISPI_INDEX_YRES] == 0) {
+r[VBE_DISPI_INDEX_YRES] = 1;
+}
+if (r[VBE_DISPI_INDEX_YRES] > VBE_DISPI_MAX_YRES) {
+r[VBE_DISPI_INDEX_YRES] = VBE_DISPI_MAX_YRES;
+}
+if (r[VBE_DISPI_INDEX_YRES] > maxy) {
+r[VBE_DISPI_INDEX_YRES] = maxy;
+}
+
+/* check offset */
+if (r[VBE_DISPI_INDEX_X_OFFSET] > VBE_DISPI_MAX_XRES) {
+r[VBE_DISPI_INDEX_X_OFFSET] = VBE_DISPI_MAX_XRES;
+}
+if (r[VBE_DISPI_INDEX_Y_OFFSET] > VBE_DISPI_MAX_YRES) {
+r[VBE_DISPI_INDEX_Y_OFFSET] = VBE_DISPI_MAX_YRES;
+}
+offset = r[VBE_DISPI_INDEX_X_OFFSET] * bits / 8;
+offset += r[VBE_DISPI_INDEX_Y_OFFSET] * linelength;
+if (offset + r[VBE_DISPI_INDEX_YRES] * linelength > s->vbe_size) {
+r[VBE_DISPI_INDEX_Y_OFFSET] = 0;
+offset = r[VBE_DISPI_INDEX_X_OFFSET] * bits / 8;
+if (offset + r[VBE_DISPI_INDEX_YRES] * linelength > s->vbe_size) {
+r[VBE_DISPI_INDEX_X_OFFSET] = 0;
+offset = 0;
+}
+}
+
+/* update vga state */
+r[VBE_DISPI_INDEX_VIRT_HEIGHT] = maxy;
+s->vbe_line_offset = linelength;
+s->vbe_start_addr  = offset / 4;
+}
+
 static uint32_t vbe_ioport_read_index(void *opaque, uint32_t addr)
 {
 VGACommonState *s = opaque;
@@ -645,22 +732,13 @@ void vbe_ioport_write_data(void *opaque, uint32_t addr, uint32_t val)
 }
 break;
 case VBE_DISPI_INDEX_XRES:
-if ((val <= VBE_DISPI_MAX_XRES) && ((val & 7) == 0)) {
-s->vbe_regs[s->vbe_index] = val;
-}
-break;
 case VBE_DISPI_INDEX_YRES:
-if (val <= VBE_DISPI_MAX_YRES) {
-s->vbe_regs[s->vbe_index] = val;
-}
-break;
 case VBE_DISPI_INDEX_BPP:
-if (val == 0)
-val = 8;
-
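
The height check in vbe_fixup_regs() comes down to a little arithmetic, shown here as a standalone sketch. The helper names are hypothetical, and the resolutions and the 8MB memory size used in the usage note are example inputs only, not a statement about the actual VBE_DISPI_MAX_* limits.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per scanline for a given virtual width (pixels) and depth (bits). */
static uint32_t line_length(uint32_t virt_width, uint32_t bits)
{
    return virt_width * bits / 8;
}

/* Largest number of scanlines that fits into vbe_size bytes of memory. */
static uint32_t max_lines(uint32_t vbe_size, uint32_t linelength)
{
    assert(linelength > 0);
    return vbe_size / linelength;
}

/*
 * Clamp a guest-supplied height the way the patch does: at least 1, at most
 * the interface limit, and never more lines than the memory can hold.
 */
static uint32_t clamp_yres(uint32_t yres, uint32_t limit_yres, uint32_t maxy)
{
    if (yres == 0) {
        yres = 1;
    }
    if (yres > limit_yres) {
        yres = limit_yres;
    }
    if (yres > maxy) {
        yres = maxy;
    }
    return yres;
}
```

For example, with 8MB (8388608 bytes) of memory, a 2560-pixel-wide line at 32 bpp takes 10240 bytes, so only 819 lines fit and a requested height of 1600 is clamped to 819. A 1600-pixel-wide line takes 6400 bytes, 1310 lines fit, and a requested height of 1200 is left alone.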

[Qemu-devel] [PATCH 1/2] vbe: make bochs dispi interface return the correct memory size with qxl

2014-09-01 Thread Gerd Hoffmann

VgaState->vram_size is the size of the pci bar.  In case of qxl not the
whole pci bar can be used as vga framebuffer.  Add a new variable
vbe_size to handle that case.  By default (if unset) it equals
vram_size, but qxl can set vbe_size to something else.

This makes sure VBE_DISPI_INDEX_VIDEO_MEMORY_64K returns correct results
and sanity checks are done with the correct size too.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Gerd Hoffmann 
Reviewed-by: Laszlo Ersek 
---
 hw/display/qxl.c | 1 +
 hw/display/vga.c | 7 +--
 hw/display/vga_int.h | 1 +
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/display/qxl.c b/hw/display/qxl.c
index d43aa49..652af99 100644
--- a/hw/display/qxl.c
+++ b/hw/display/qxl.c
@@ -2063,6 +2063,7 @@ static int qxl_init_primary(PCIDevice *dev)
 
 qxl->id = 0;
 qxl_init_ramsize(qxl);
+vga->vbe_size = qxl->vgamem_size;
 vga->vram_size_mb = qxl->vga.vram_size >> 20;
 vga_common_init(vga, OBJECT(dev), true);
 vga_init(vga, OBJECT(dev),
diff --git a/hw/display/vga.c b/hw/display/vga.c
index 65dab8d..99251d7 100644
--- a/hw/display/vga.c
+++ b/hw/display/vga.c
@@ -610,7 +610,7 @@ uint32_t vbe_ioport_read_data(void *opaque, uint32_t addr)
 val = s->vbe_regs[s->vbe_index];
 }
 } else if (s->vbe_index == VBE_DISPI_INDEX_VIDEO_MEMORY_64K) {
-val = s->vram_size / (64 * 1024);
+val = s->vbe_size / (64 * 1024);
 } else {
 val = 0;
 }
@@ -749,7 +749,7 @@ void vbe_ioport_write_data(void *opaque, uint32_t addr, uint32_t val)
 line_offset = w >> 1;
 else
 line_offset = w * ((s->vbe_regs[VBE_DISPI_INDEX_BPP] + 7) >> 3);
-h = s->vram_size / line_offset;
+h = s->vbe_size / line_offset;
 /* XXX: support weird bochs semantics ? */
 if (h < s->vbe_regs[VBE_DISPI_INDEX_YRES])
 return;
@@ -2285,6 +2285,9 @@ void vga_common_init(VGACommonState *s, Object *obj, bool global_vmstate)
 s->vram_size <<= 1;
 }
 s->vram_size_mb = s->vram_size >> 20;
+if (!s->vbe_size) {
+s->vbe_size = s->vram_size;
+}
 
 s->is_vbe_vmstate = 1;
 memory_region_init_ram(&s->vram, obj, "vga.vram", s->vram_size);
diff --git a/hw/display/vga_int.h b/hw/display/vga_int.h
index 641f8f4..bbc0cb2 100644
--- a/hw/display/vga_int.h
+++ b/hw/display/vga_int.h
@@ -93,6 +93,7 @@ typedef struct VGACommonState {
 MemoryRegion vram_vbe;
 uint32_t vram_size;
 uint32_t vram_size_mb; /* property */
+uint32_t vbe_size;
 uint32_t latch;
 bool has_chain4_alias;
 MemoryRegion chain4_alias;
-- 
1.8.3.1
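
As a small worked example of what this patch changes for the guest-visible value, the sketch below models the vbe_size default and the VBE_DISPI_INDEX_VIDEO_MEMORY_64K computation. The function names are hypothetical and the 16MB/8MB sizes in the usage note are example values, not qxl's actual defaults.

```c
#include <assert.h>
#include <stdint.h>

/* If vbe_size was left unset (0), fall back to the full vram size. */
static uint32_t effective_vbe_size(uint32_t vbe_size, uint32_t vram_size)
{
    return vbe_size ? vbe_size : vram_size;
}

/* Value reported to the guest: usable memory in units of 64KB. */
static uint32_t video_memory_64k(uint32_t vbe_size)
{
    return vbe_size / (64 * 1024);
}
```

With a 16MB PCI bar and vbe_size unset, the guest reads 256 units. If a device sets vbe_size to 8MB of the same bar, the guest reads 128, so it no longer assumes the whole bar is usable as VGA framebuffer.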

Re: [Qemu-devel] [PATCH 7/8] block: add two helpers

2014-09-01 Thread Benoît Canet

On Monday 01 Sep 2014 at 15:43:13 (+0800), Liu Yuan wrote:
> These helpers are needed by later quorum sync device logic.
> 
> Cc: Eric Blake 
> Cc: Benoit Canet 
> Cc: Kevin Wolf 
> Cc: Stefan Hajnoczi 
> Signed-off-by: Liu Yuan 
> ---
>  block.c   | 10 ++
>  include/block/block.h |  2 ++
>  2 files changed, 12 insertions(+)
> 
> diff --git a/block.c b/block.c
> index 22eb3e4..2e2f1d9 100644
> --- a/block.c
> +++ b/block.c
> @@ -2145,6 +2145,16 @@ void *bdrv_get_attached_dev(BlockDriverState *bs)
>  return bs->dev;
>  }
>  
> +BlockDriverState *bdrv_get_file(BlockDriverState *bs)
> +{
> +return bs->file;
> +}
> +
> +const char *bdrv_get_filename(BlockDriverState *bs)
> +{
> +return bs->filename;
> +}
> +

I don't see why we would need this. We will see in the next patch.

>  void bdrv_set_dev_ops(BlockDriverState *bs, const BlockDevOps *ops,
>void *opaque)
>  {
> diff --git a/include/block/block.h b/include/block/block.h
> index a61eaf0..1e116cc 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -237,6 +237,8 @@ int bdrv_attach_dev(BlockDriverState *bs, void *dev);
>  void bdrv_attach_dev_nofail(BlockDriverState *bs, void *dev);
>  void bdrv_detach_dev(BlockDriverState *bs, void *dev);
>  void *bdrv_get_attached_dev(BlockDriverState *bs);
> +BlockDriverState *bdrv_get_file(BlockDriverState *bs);
> +const char *bdrv_get_filename(BlockDriverState *bs);
>  void bdrv_set_dev_ops(BlockDriverState *bs, const BlockDevOps *ops,
>void *opaque);
>  void bdrv_set_drv_ops(BlockDriverState *bs, const BlockDrvOps *ops,
> -- 
> 1.9.1
>

Re: [Qemu-devel] [PATCH 6/8] block/quorum: add broken state to BlockDriverState

2014-09-01 Thread Benoît Canet

On Monday 01 Sep 2014 at 15:43:12 (+0800), Liu Yuan wrote:
> This allows the VM to continue running even if some devices are broken,
> given a proper configuration.
> 
> We mark the device broken when the protocol tier notifies us of some broken
> state(s) of the device, such as a disconnection, via driver operations. We
> could also reset the device to sound when the protocol tier is repaired.
> 
> Originally .threshold controls how we decide the success of a read/write:
> we return failure only if the success count of the read/write is less than
> the .threshold specified by users. But it doesn't record the states of the
> underlying devices and will impact performance a bit in some cases.
> 
> For example, we have 3 children and .threshold is set to 2. If one of the
> devices is broken, we should still return success and continue to run the
> VM. But for every IO operation, we will blindly send the requests to the
> broken device.
> 
> By storing the broken state in the driver state, we can avoid sending
> requests to broken devices and resume sending requests to the repaired ones
> by setting broken to false.
> 
> This is especially useful for network-based protocols such as sheepdog, which
> has an auto-reconnection mechanism and will never report EIO if the connection
> is broken, but just stores the requests in its local queue and waits to
> resend them.
> Without the broken state, a quorum request will not come back until the
> connection is re-established. So we have to skip the broken devices to allow
> the VM to continue running with network-backed children (glusterfs, nfs,
> sheepdog, etc).
> 
> With the combination of read-pattern and threshold, we can easily mimic the
> DRBD behavior with the following configuration:
> 
>  read-pattern=fifo,threshold=1 with two children.
> 
> Cc: Eric Blake 
> Cc: Benoit Canet 
> Cc: Kevin Wolf 
> Cc: Stefan Hajnoczi 
> Signed-off-by: Liu Yuan 
> ---
>  block/quorum.c| 102 ++
>  include/block/block_int.h |   3 ++
>  2 files changed, 87 insertions(+), 18 deletions(-)
> 
> diff --git a/block/quorum.c b/block/quorum.c
> index b9eeda3..7b07e35 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -120,6 +120,7 @@ struct QuorumAIOCB {
>  int rewrite_count;  /* number of replica to rewrite: count down to
>   * zero once writes are fired
>   */
> +int issued_count;   /* actual read&write issued count */
>  
>  QuorumVotes votes;
>  
> @@ -170,8 +171,10 @@ static void quorum_aio_finalize(QuorumAIOCB *acb)
>  if (acb->is_read) {
>  /* on the quorum case acb->child_iter == s->num_children - 1 */
>  for (i = 0; i <= acb->child_iter; i++) {
> -qemu_vfree(acb->qcrs[i].buf);
> -qemu_iovec_destroy(&acb->qcrs[i].qiov);
> +if (acb->qcrs[i].buf) {
> +qemu_vfree(acb->qcrs[i].buf);
> +qemu_iovec_destroy(&acb->qcrs[i].qiov);
> +}
>  }
>  }
>  
> @@ -207,6 +210,7 @@ static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
>  acb->count = 0;
>  acb->success_count = 0;
>  acb->rewrite_count = 0;
> +acb->issued_count = 0;
>  acb->votes.compare = quorum_sha256_compare;
>  QLIST_INIT(&acb->votes.vote_list);
>  acb->is_read = false;
> @@ -286,6 +290,22 @@ static void quorum_copy_qiov(QEMUIOVector *dest, QEMUIOVector *source)
>  }
>  }
>  
> +static int next_fifo_child(QuorumAIOCB *acb)
> +{
> +BDRVQuorumState *s = acb->common.bs->opaque;
> +int i;
> +
> +for (i = acb->child_iter; i < s->num_children; i++) {
> +if (!s->bs[i]->broken) {
> +break;
> +}
> +}
> +if (i == s->num_children) {
> +return -1;
> +}
> +return i;
> +}
> +
>  static void quorum_aio_cb(void *opaque, int ret)
>  {
>  QuorumChildRequest *sacb = opaque;
> @@ -293,11 +313,18 @@ static void quorum_aio_cb(void *opaque, int ret)
>  BDRVQuorumState *s = acb->common.bs->opaque;
>  bool rewrite = false;
>  
> +if (ret < 0) {
> +s->bs[acb->child_iter]->broken = true;
> +}

child_iter is FIFO-mode state.
Should this be if (s->read_pattern == QUORUM_READ_PATTERN_FIFO && ret < 0)
instead?


> +
>  if (acb->is_read && s->read_pattern == QUORUM_READ_PATTERN_FIFO) {
>  /* We try to read next child in FIFO order if we fail to read */
> -if (ret < 0 && ++acb->child_iter < s->num_children) {
> -read_fifo_child(acb);
> -return;
> +if (ret < 0) {
> +acb->child_iter = next_fifo_child(acb);

You don't seem to increment child_iter anymore.
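
A standalone model of the quoted hunk may help with this question. It is only a
sketch, not QEMU code: model_state, model_next_child and model_on_read_failure
are hypothetical names, and it assumes the failing child is always marked broken
before the scan runs, as the hunk above does. Under that assumption the scan
moves past the failed child even though it starts at child_iter itself.

```c
#include <stdbool.h>

#define MODEL_MAX_CHILDREN 8

/* Hypothetical stand-in for the quorum state: only the per-child broken flag. */
struct model_state {
    int num_children;
    bool broken[MODEL_MAX_CHILDREN];
};

/*
 * Same scan as next_fifo_child() in the quoted patch: return the first
 * non-broken index at or after 'start', or -1 if none is left.
 */
static int model_next_child(const struct model_state *s, int start)
{
    int i;

    for (i = start; i < s->num_children; i++) {
        if (!s->broken[i]) {
            return i;
        }
    }
    return -1;
}

/* Failure path: mark the current child broken, then pick the next one. */
static int model_on_read_failure(struct model_state *s, int child_iter)
{
    s->broken[child_iter] = true;
    return model_next_child(s, child_iter);
}
```

In this model -1 means that no sound child is left, so a caller would
presumably test the result with >= 0 to tell that apart from a valid index.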

> +if (acb->child_iter > 0) {
> +read_fifo_child(acb);
> +return;
> +}
>  }
>  
>  if (ret == 0) {
> @@ -315,9 +342,9 @@ static void quorum_aio_cb(void *opaque, int ret)
>  } else {
>  quorum_report_bad(acb, sacb->aiocb->bs->nod

[Qemu-devel] [PATCH v1 2/2] block/archipelago: Use QEMU atomic builtins

2014-09-01 Thread Chrysostomos Nanakos

Replace __sync builtins with the ones provided by QEMU
for atomic operations.

Signed-off-by: Chrysostomos Nanakos 
---
 block/archipelago.c |   11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/block/archipelago.c b/block/archipelago.c
index 34f72dc..fa8cd29 100644
--- a/block/archipelago.c
+++ b/block/archipelago.c
@@ -57,6 +57,7 @@
 #include "qapi/qmp/qint.h"
 #include "qapi/qmp/qstring.h"
 #include "qapi/qmp/qjson.h"
+#include "qemu/atomic.h"
 
 #include 
 #include 
@@ -214,7 +215,7 @@ static void xseg_request_handler(void *state)
 
 xseg_put_request(s->xseg, req, s->srcport);
 
-if ((__sync_add_and_fetch(&segreq->ref, -1)) == 0) {
+if ((atomic_add_fetch(&segreq->ref, -1)) == 0) {
 if (!segreq->failed) {
 reqdata->aio_cb->ret = segreq->count;
 archipelago_finish_aiocb(reqdata);
@@ -233,7 +234,7 @@ static void xseg_request_handler(void *state)
 segreq->count += req->serviced;
 xseg_put_request(s->xseg, req, s->srcport);
 
-if ((__sync_add_and_fetch(&segreq->ref, -1)) == 0) {
+if ((atomic_add_fetch(&segreq->ref, -1)) == 0) {
 if (!segreq->failed) {
 reqdata->aio_cb->ret = segreq->count;
 archipelago_finish_aiocb(reqdata);
@@ -885,13 +886,13 @@ static int 
archipelago_aio_segmented_rw(BDRVArchipelagoState *s,
 return 0;
 
 err_exit:
-__sync_add_and_fetch(&segreq->failed, 1);
+atomic_add_fetch(&segreq->failed, 1);
 if (segments_nr == 1) {
-if (__sync_add_and_fetch(&segreq->ref, -1) == 0) {
+if (atomic_add_fetch(&segreq->ref, -1) == 0) {
 g_free(segreq);
 }
 } else {
-if ((__sync_add_and_fetch(&segreq->ref, -segments_nr + i)) == 0) {
+if ((atomic_add_fetch(&segreq->ref, -segments_nr + i)) == 0) {
 g_free(segreq);
 }
 }
-- 
1.7.10.4

[Qemu-devel] [PATCH v1 1/2] Extend header file for atomic operations

2014-09-01 Thread Chrysostomos Nanakos

Add __sync_*_and_fetch builtins used in several places.

Signed-off-by: Chrysostomos Nanakos 
---
 include/qemu/atomic.h |4 
 1 file changed, 4 insertions(+)

diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
index 492bce1..48fc283 100644
--- a/include/qemu/atomic.h
+++ b/include/qemu/atomic.h
@@ -189,6 +189,10 @@
 #define atomic_fetch_sub   __sync_fetch_and_sub
 #define atomic_fetch_and   __sync_fetch_and_and
 #define atomic_fetch_or__sync_fetch_and_or
+#define atomic_add_fetch   __sync_add_and_fetch
+#define atomic_sub_fetch   __sync_sub_and_fetch
+#define atomic_or_fetch__sync_or_and_fetch
+#define atomic_and_fetch   __sync_and_and_fetch
 #define atomic_cmpxchg __sync_val_compare_and_swap
 
 /* And even shorter names that return void.  */
-- 
1.7.10.4
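
The difference between the two builtin families can be illustrated with a small
refcount sketch. It assumes a GCC-compatible compiler; the demo_* names are
hypothetical, and the macros are redefined locally rather than taken from
include/qemu/atomic.h.

```c
/* Local stand-ins for the macro style used in the patch (GCC __sync builtins). */
#define demo_atomic_add_fetch  __sync_add_and_fetch  /* returns the NEW value */
#define demo_atomic_fetch_sub  __sync_fetch_and_sub  /* returns the OLD value */

/* Hypothetical request object carrying only a reference count. */
struct demo_req {
    int ref;
};

/* Drop one reference; returns 1 when the caller released the last one. */
static int demo_req_put(struct demo_req *r)
{
    return demo_atomic_add_fetch(&r->ref, -1) == 0;
}

/* Same operation with the fetch-first form: compare the OLD value with 1. */
static int demo_req_put_fetch_first(struct demo_req *r)
{
    return demo_atomic_fetch_sub(&r->ref, 1) == 1;
}
```

Both helpers decrement the counter atomically; they differ only in whether the
comparison is made against the value after or before the update.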

Re: [Qemu-devel] linux-user: enabling binfmt P flag

2014-09-01 Thread Peter Maydell

On 1 September 2014 09:51, Paolo Bonzini  wrote:
> Il 29/08/2014 20:01, Peter Maydell ha scritto:
>> [cc'ing MJT for more distro opinion since I think fundamentally
>> the choice we ought to make upstream is "what's not going to
>> screw over distros"... Paolo, is there a RedHat QEMU maintainer
>> who would have an opinion here?]
>
> There's Cole Robinson.
>
> BTW, Fedora doesn't use the binfmt scripts from QEMU

That's ok, nobody with any sense does.

>, but does reuse the
> binfmt lines.  We'd just add Ps and we'd be fine.

But this would break all your existing users' existing
chroot setups. That's the question I'm after an answer to:
what do you (as a distro) think would be acceptable as
transitional breakage, if anything?

> However, the problem is not really for distros.  Packagers just read the
> release notes and adjust whatever needs to be adjusted.  The problem is
> for people who compile from source and are bit by conflicting binfmt
> formats from their distro.

This is one reason I like the "one binary name for O and
one for P" approach.

> The solution could be to extend binfmt_misc so that it sets two
> environment variables BINFMT_MISC_PID and BINFMT_MISC_ORIG_ARGV0.  The
> former is set to the pid of the binfmt "interpreter" program, the latter
> to the argv[0] value.  Then QEMU can check if BINFMT_MISC_PID matches
> getpid() and, if so, trust the BINFMT_MISC_ORIG_ARGV0 value.

Certainly if we're in a position to get the kernel to be more
informative about how it invoked us that would be the ideal.

thanks
-- PMM

Re: [Qemu-devel] [PATCH 2/5] target-arm: support AArch64 for arm_cpu_set_pc

2014-09-01 Thread Ard Biesheuvel

On 1 September 2014 11:09, Peter Maydell  wrote:
> On 1 September 2014 08:53, Ard Biesheuvel  wrote:
>> From: Rob Herring 
>>
>> Add AArch64 support to arm_cpu_set_pc and make it available to other files.
>
> This is still the wrong way to do this. See review on
> previous version of this patchset:
> http://lists.gnu.org/archive/html/qemu-devel/2014-05/msg00558.html
>

I wasn't aware that these patches had been sent out for review
already, sorry about that.
I will go through the archive, so no need to duplicate any more feedback here.

-- 
Ard.

Re: [Qemu-devel] [PATCH 2/8] block: add driver operation callbacks

2014-09-01 Thread Liu Yuan

On Mon, Sep 01, 2014 at 10:28:54AM +0200, Benoît Canet wrote:
> The Monday 01 Sep 2014 à 15:43:08 (+0800), Liu Yuan wrote :
> > Driver operations are defined as callbacks passed from upper block drivers
> > to lower drivers and are supposed to be called by the lower drivers.
> > 
> > Request handling (queuing, submitting, etc.) is done in the protocol tier
> > of the block layer, and connection state is also maintained down there.
> > Driver operations are supposed to notify the upper tier (such as quorum)
> > of state changes.
> > 
> > For now only two operations are added:
> > 
> > driver_disconnect: called when connection is off
> > driver_reconnect: called when connection is on after disconnection
> > 
> > Which are used to notify upper tier of the connection state.
> > 
> > Cc: Eric Blake 
> > Cc: Benoit Canet 
> > Cc: Kevin Wolf 
> > Cc: Stefan Hajnoczi 
> > Signed-off-by: Liu Yuan 
> > ---
> >  block.c   | 7 +++
> >  include/block/block.h | 7 +++
> >  include/block/block_int.h | 3 +++
> >  3 files changed, 17 insertions(+)
> > 
> > diff --git a/block.c b/block.c
> > index c12b8de..22eb3e4 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -2152,6 +2152,13 @@ void bdrv_set_dev_ops(BlockDriverState *bs, const 
> > BlockDevOps *ops,
> >  bs->dev_opaque = opaque;
> >  }
> >  
> > +void bdrv_set_drv_ops(BlockDriverState *bs, const BlockDrvOps *ops,
> > +  void *opaque)
> > +{
> > +bs->drv_ops = ops;
> > +bs->drv_opaque = opaque;
> 
> We need to be very careful about how these fields interact with the infamous
> bdrv_swap function.
> 
> Also I don't know if "driver operations" is the right name, since the
> BlockDriver structure's callbacks could also be called "driver operations".
> 

BlockDriverState has "device operations" for callbacks from devices, so I
chose "driver operations". Any suggestion for a better name?

Thanks
Yuan

[Qemu-devel] [PATCH v5] block: Introduce "null" drivers

2014-09-01 Thread Fam Zheng

This is an analogue to Linux null_blk. It can be used for testing or
benchmarking block device emulation and general block layer
functionalities such as coroutines and throttling, where disk IO is not
necessary or wanted.

Use null-aio:// for AIO version, and null-co:// for coroutine version.

Signed-off-by: Fam Zheng 
Reviewed-by: Benoît Canet 

---

v5: Rename "null://" to "null-aio://". (Stefan)
Add Benoit's rev-by line.
---
 block/Makefile.objs  |   1 +
 block/null.c | 176 +++
 qapi/block-core.json |  20 +-
 3 files changed, 195 insertions(+), 2 deletions(-)
 create mode 100644 block/null.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 858d2b3..087e281 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -9,6 +9,7 @@ block-obj-y += snapshot.o qapi.o
 block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
 block-obj-$(CONFIG_POSIX) += raw-posix.o
 block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
+block-obj-y += null.o
 
 ifeq ($(CONFIG_POSIX),y)
 block-obj-y += nbd.o nbd-client.o sheepdog.o
diff --git a/block/null.c b/block/null.c
new file mode 100644
index 000..2d1633e
--- /dev/null
+++ b/block/null.c
@@ -0,0 +1,176 @@
+/*
+ * Null block driver
+ *
+ * Authors:
+ *  Fam Zheng 
+ *
+ * Copyright (C) 2014 Red Hat, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "block/block_int.h"
+
+typedef struct {
+int64_t length;
+} BDRVNullState;
+
+static QemuOptsList runtime_opts = {
+.name = "null",
+.head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
+.desc = {
+{
+.name = "filename",
+.type = QEMU_OPT_STRING,
+.help = "",
+},
+{
+.name = BLOCK_OPT_SIZE,
+.type = QEMU_OPT_SIZE,
+.help = "size of the null block",
+},
+{ /* end of list */ }
+},
+};
+
+static int null_file_open(BlockDriverState *bs, QDict *options, int flags,
+  Error **errp)
+{
+QemuOpts *opts;
+BDRVNullState *s = bs->opaque;
+
+opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
+qemu_opts_absorb_qdict(opts, options, &error_abort);
+s->length =
+qemu_opt_get_size(opts, BLOCK_OPT_SIZE, 1 << 30);
+qemu_opts_del(opts);
+return 0;
+}
+
+static void null_close(BlockDriverState *bs)
+{
+}
+
+static int64_t null_getlength(BlockDriverState *bs)
+{
+BDRVNullState *s = bs->opaque;
+return s->length;
+}
+
+static coroutine_fn int null_co_read(BlockDriverState *bs, int64_t sector_num,
+ uint8_t *buf, int nb_sectors)
+{
+return 0;
+}
+
+static coroutine_fn int null_co_write(BlockDriverState *bs, int64_t sector_num,
+  const uint8_t *buf, int nb_sectors)
+{
+return 0;
+}
+
+static coroutine_fn int null_co_flush(BlockDriverState *bs)
+{
+return 0;
+}
+
+typedef struct {
+BlockDriverAIOCB common;
+QEMUBH *bh;
+} NullAIOCB;
+
+static void null_aio_cancel(BlockDriverAIOCB *blockacb);
+
+static const AIOCBInfo null_aiocb_info = {
+.aiocb_size = sizeof(NullAIOCB),
+.cancel = null_aio_cancel,
+};
+
+static void null_bh_cb(void *opaque)
+{
+NullAIOCB *acb = opaque;
+acb->common.cb(acb->common.opaque, 0);
+qemu_bh_delete(acb->bh);
+qemu_aio_release(acb);
+}
+
+static inline BlockDriverAIOCB *null_aio_common(BlockDriverState *bs,
+BlockDriverCompletionFunc *cb,
+void *opaque)
+{
+NullAIOCB *acb;
+
+acb = qemu_aio_get(&null_aiocb_info, bs, cb, opaque);
+acb->bh = aio_bh_new(bdrv_get_aio_context(bs), null_bh_cb, acb);
+qemu_bh_schedule(acb->bh);
+return &acb->common;
+}
+
+static BlockDriverAIOCB *null_aio_readv(BlockDriverState *bs,
+int64_t sector_num, QEMUIOVector *qiov,
+int nb_sectors,
+BlockDriverCompletionFunc *cb,
+void *opaque)
+{
+return null_aio_common(bs, cb, opaque);
+}
+
+static BlockDriverAIOCB *null_aio_writev(BlockDriverState *bs,
+ int64_t sector_num, QEMUIOVector 
*qiov,
+ int nb_sectors,
+ BlockDriverCompletionFunc *cb,
+ void *opaque)
+{
+return null_aio_common(bs, cb, opaque);
+}
+
+static BlockDriverAIOCB *null_aio_flush(BlockDriverState *bs,
+BlockDriverCompletionFunc *cb,
+void *opaque)
+{
+return null_aio_common(bs, cb, opaque);
+}
+
+static void null_aio_cancel(BlockDriverAIOCB *blockacb)
+{
+NullAIOCB *acb

Re: [Qemu-devel] [PATCH 3/8] block/sheepdog: propagate disconnect/reconnect events to upper driver

2014-09-01 Thread Liu Yuan

On Mon, Sep 01, 2014 at 10:31:47AM +0200, Benoît Canet wrote:
> The Monday 01 Sep 2014 à 15:43:09 (+0800), Liu Yuan wrote :
> > This is the reference usage how we propagate connection state to upper tier.
> > 
> > Cc: Eric Blake 
> > Cc: Benoit Canet 
> > Cc: Kevin Wolf 
> > Cc: Stefan Hajnoczi 
> > Signed-off-by: Liu Yuan 
> > ---
> >  block/sheepdog.c | 9 +
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/block/sheepdog.c b/block/sheepdog.c
> > index 53c24d6..9c0fc49 100644
> > --- a/block/sheepdog.c
> > +++ b/block/sheepdog.c
> > @@ -714,6 +714,11 @@ static coroutine_fn void reconnect_to_sdog(void 
> > *opaque)
> >  {
> >  BDRVSheepdogState *s = opaque;
> >  AIOReq *aio_req, *next;
> > +BlockDriverState *bs = s->bs;
> > +
> > +if (bs->drv_ops && bs->drv_ops->driver_disconnect) {
> > +bs->drv_ops->driver_disconnect(bs);
> > +}
> 
> Since this sequence will be strictly the same for all the implementation
> could we create a bdrv_signal_disconnect(bs); in the block layer to make this
> code generic ?

I'm not sure whether other protocol drivers can share the same auto-reconnection
logic. For simplicity, let's probably keep it as is in this patch. Later, when the
implementations for other protocols are fleshed out, we can make a better decision.

Thanks
Yuan
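
For reference, the generic helper suggested in the review could look roughly
like the sketch below. It is a self-contained model, not a proposal for the
actual block layer code: the Demo* types, demo_signal_disconnect,
demo_signal_reconnect and demo_count_event are all hypothetical, and only the
driver_disconnect/driver_reconnect callback names and the drv_ops/drv_opaque
fields come from the series.

```c
#include <stddef.h>

typedef struct DemoBDS DemoBDS;

/* Hypothetical model of the callback table introduced by the series. */
typedef struct DemoDrvOps {
    void (*driver_disconnect)(DemoBDS *bs);
    void (*driver_reconnect)(DemoBDS *bs);
} DemoDrvOps;

/* Hypothetical model of the two fields added to the driver state. */
struct DemoBDS {
    const DemoDrvOps *drv_ops;
    void *drv_opaque;
};

/* Notify the upper tier of a disconnect, if it registered a callback. */
static void demo_signal_disconnect(DemoBDS *bs)
{
    if (bs->drv_ops && bs->drv_ops->driver_disconnect) {
        bs->drv_ops->driver_disconnect(bs);
    }
}

/* Notify the upper tier of a reconnect, if it registered a callback. */
static void demo_signal_reconnect(DemoBDS *bs)
{
    if (bs->drv_ops && bs->drv_ops->driver_reconnect) {
        bs->drv_ops->driver_reconnect(bs);
    }
}

/* Example upper-tier callback: counts events through drv_opaque. */
static void demo_count_event(DemoBDS *bs)
{
    *(int *)bs->drv_opaque += 1;
}
```

With helpers of this shape each protocol driver would make a single call, and
the NULL checks would live in one place.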

Re: [Qemu-devel] [PATCH 5/8] quorum: fix quorum_aio_cancel()

2014-09-01 Thread Liu Yuan

On Mon, Sep 01, 2014 at 10:35:27AM +0200, Benoît Canet wrote:
> The Monday 01 Sep 2014 à 15:43:11 (+0800), Liu Yuan wrote :
> > For a fifo read pattern, we only have one running aio
> 
> >(possible other cases that has less number than num_children in the future)
> I have trouble understanding this part of the commit message could you try
> to clarify it ?

Until this patch there were two cases: read from a single child or read from all.
A later patch allows VMs to continue if some devices are down, so the discrete
values 1 and N become a range [1, N]; that is, there can be anywhere from 1 to N
running requests.

Thanks
Yuan

Re: [Qemu-devel] [PATCH 1/1] hw/pci-assign: split pci-assign.c

2014-09-01 Thread Chen, Tiejun


On 2014/9/1 16:27, Michael S. Tsirkin wrote:

On Mon, Sep 01, 2014 at 10:07:19AM +0800, Tiejun Chen wrote:

We will try to reuse assign_dev_load_option_rom on the Xen side, and
more importantly it's a good start towards unifying the PCI assignment code for
KVM and Xen in the future.

Signed-off-by: Tiejun Chen 
---


[snip]


+ */
+#ifndef PCI_ASSIGN_H
+#define PCI_ASSIGN_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "hw/hw.h"
+#include "hw/i386/pc.h"
+#include "qemu/error-report.h"
+#include "ui/console.h"
+#include "hw/loader.h"
+#include "monitor/monitor.h"
+#include "qemu/range.h"
+#include "sysemu/sysemu.h"
+#include "hw/pci/pci.h"
+#include "hw/pci/msi.h"
+#include "kvm_i386.h"


Why are you pulling all these headers here?
Please include the minimum required.


So just leave #include "hw/pci/pci.h".




+
+#define MSIX_PAGE_SIZE 0x1000
+
+/* From linux/ioport.h */
+#define IORESOURCE_IO   0x0100  /* Resource type */
+#define IORESOURCE_MEM  0x0200


[snip]


+uint8_t emulate_config_read[PCI_CONFIG_SPACE_SIZE];
+uint8_t emulate_config_write[PCI_CONFIG_SPACE_SIZE];
+int msi_virq_nr;
+int *msi_virq;
+MSIXTableEntry *msix_table;
+hwaddr msix_table_addr;
+uint16_t msix_max;
+MemoryRegion mmio;
+char *configfd_name;
+int32_t bootindex;
+} AssignedDevice;
+


Why are you moving the above here?


As I said in the patch description, I think this is a good
start towards unifying pci-assign for both KVM and Xen, so I tried to move
the common definitions. Although we might not use them directly in the
future, I guess we still need to move them into this header file.


If you think we should do this strictly on demand, I can move them back
to pci-assign.c.






+int dev_load_option_rom(PCIDevice *dev, struct Object *owner, void *ptr,
+unsigned int domain, unsigned int bus,
+unsigned int slot, unsigned int function);


Please use a header-specific prefix to avoid global namespace pollution.
pci_assign_dev_load_option_rom?


Looks good, so I will follow your suggestion.

Thanks
Tiejun




+#endif /* PCI_ASSIGN_H */
--
1.9.1

Re: [Qemu-devel] linux-user: enabling binfmt P flag

2014-09-01 Thread Paolo Bonzini

Il 01/09/2014 11:12, Peter Maydell ha scritto:
>> We'd just add Ps and we'd be fine.
> 
> But this would break all your existing users' existing
> chroot setups. That's the question I'm after an answer to:
> what do you (as a distro) think would be acceptable as
> transitional breakage, if anything?

Yes, but it's not like Fedora would have a choice.  Distros have to follow
what upstream does, even if it breaks something else (or they could
revert the "P" patch and get a mirror-image set of bug reports).

>> > The solution could be to extend binfmt_misc so that it sets two
>> > environment variables BINFMT_MISC_PID and BINFMT_MISC_ORIG_ARGV0.  The
>> > former is set to the pid of the binfmt "interpreter" program, the latter
>> > to the argv[0] value.  Then QEMU can check if BINFMT_MISC_PID matches
>> > getpid() and, if so, trust the BINFMT_MISC_ORIG_ARGV0 value.
> Certainly if we're in a position to get the kernel to be more
> informative about how it invoked us that would be the ideal.

I think it's simply that "P" was ill-designed (possibly because
implementing the above is not trivial).

Paolo
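
The check described above could be sketched on the QEMU side roughly as
follows. This is a hedged illustration only: BINFMT_MISC_PID and
BINFMT_MISC_ORIG_ARGV0 are the variables proposed in this thread, not something
the kernel is known to set, and demo_orig_argv0 is a hypothetical helper name.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Return the original argv[0] advertised through the proposed environment
 * variables, but only if BINFMT_MISC_PID names this very process.
 * Otherwise return 'fallback' unchanged.
 */
static const char *demo_orig_argv0(const char *fallback)
{
    const char *pid_str = getenv("BINFMT_MISC_PID");
    const char *argv0 = getenv("BINFMT_MISC_ORIG_ARGV0");
    char *end;
    long pid;

    if (!pid_str || !argv0 || *pid_str == '\0') {
        return fallback;
    }

    errno = 0;
    pid = strtol(pid_str, &end, 10);
    if (errno != 0 || *end != '\0' || pid != (long)getpid()) {
        /* Stale or inherited variables: do not trust them. */
        return fallback;
    }
    return argv0;
}
```

Comparing against getpid() is what keeps a child process from trusting
variables it merely inherited from an interpreter invoked earlier.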

Re: [Qemu-devel] [PATCH 2/8] block: add driver operation callbacks

2014-09-01 Thread Benoît Canet

The Monday 01 Sep 2014 à 17:19:19 (+0800), Liu Yuan wrote :
> On Mon, Sep 01, 2014 at 10:28:54AM +0200, Benoît Canet wrote:
> > The Monday 01 Sep 2014 à 15:43:08 (+0800), Liu Yuan wrote :
> > > Driver operations are defined as callbacks passed from upper block drivers
> > > to lower drivers and are supposed to be called by the lower drivers.
> > > 
> > > Request handling (queuing, submitting, etc.) is done in the protocol tier
> > > of the block layer, and connection state is also maintained down there.
> > > Driver operations are supposed to notify the upper tier (such as quorum)
> > > of state changes.
> > > 
> > > For now only two operations are added:
> > > 
> > > driver_disconnect: called when connection is off
> > > driver_reconnect: called when connection is on after disconnection
> > > 
> > > Which are used to notify upper tier of the connection state.
> > > 
> > > Cc: Eric Blake 
> > > Cc: Benoit Canet 
> > > Cc: Kevin Wolf 
> > > Cc: Stefan Hajnoczi 
> > > Signed-off-by: Liu Yuan 
> > > ---
> > >  block.c   | 7 +++
> > >  include/block/block.h | 7 +++
> > >  include/block/block_int.h | 3 +++
> > >  3 files changed, 17 insertions(+)
> > > 
> > > diff --git a/block.c b/block.c
> > > index c12b8de..22eb3e4 100644
> > > --- a/block.c
> > > +++ b/block.c
> > > @@ -2152,6 +2152,13 @@ void bdrv_set_dev_ops(BlockDriverState *bs, const 
> > > BlockDevOps *ops,
> > >  bs->dev_opaque = opaque;
> > >  }
> > >  
> > > +void bdrv_set_drv_ops(BlockDriverState *bs, const BlockDrvOps *ops,
> > > +  void *opaque)
> > > +{
> > > +bs->drv_ops = ops;
> > > +bs->drv_opaque = opaque;
> > 
> > We need to be very careful about how these fields interact with the infamous
> > bdrv_swap function.
> > 
> > Also I don't know if "driver operations" is the right name, since the
> > BlockDriver structure's callbacks could also be called "driver operations".
> > 
> 
> BlockDrvierState has a "device operation" for callbacks from devices. So I
> choose "driver operation". So any sugguestion for better name?

From what I see in this series, the job of these callbacks is to send a message
or a signal to the upper BDS.

Also the name must reflect it goes from the child to the parent.

child_signals ?
child_messages ?

Best regards

Benoît

> 
> Thanks
> Yuan
>

Re: [Qemu-devel] [PATCH 2/5] target-arm: support AArch64 for arm_cpu_set_pc

2014-09-01 Thread Peter Maydell

On 1 September 2014 08:53, Ard Biesheuvel  wrote:
> From: Rob Herring 
>
> Add AArch64 support to arm_cpu_set_pc and make it available to other files.

This is still the wrong way to do this. See review on
previous version of this patchset:
http://lists.gnu.org/archive/html/qemu-devel/2014-05/msg00558.html

thanks
-- PMM

Re: [Qemu-devel] [PATCH 6/8] block/quorum: add broken state to BlockDriverState

2014-09-01 Thread Liu Yuan

On Mon, Sep 01, 2014 at 10:57:43AM +0200, Benoît Canet wrote:
> The Monday 01 Sep 2014 à 15:43:12 (+0800), Liu Yuan wrote :
> > This allows the VM to keep running even if some devices are broken, given a
> > proper configuration.
> > 
> > We mark the device broken when the protocol tier notifies us of some broken
> > state(s) of the device, such as disconnection, via driver operations. We could
> > also reset the device as sound when the protocol tier is repaired.
> > 
> > Originally .threshold controls how we decide the success of read/write:
> > we return failure only if the success count of read/write is less than the
> > .threshold specified by users. But it doesn't record the states of the
> > underlying devices and will impact performance a bit in some cases.
> > 
> > For example, we have 3 children and .threshold is set to 2. If one of the
> > devices is broken, we should still return success and continue to run the VM.
> > But for every IO operation, we will blindly send the requests to the broken
> > device.
> > 
> > By storing the broken state in the driver state we can avoid sending requests
> > to broken devices and resend the requests to the repaired ones by setting
> > broken to false.
> > 
> > This is especially useful for network-based protocols such as sheepdog, which
> > has an auto-reconnection mechanism and will never report EIO if the connection
> > is broken, but just stores the requests in its local queue and waits to resend
> > them. Without broken state, a quorum request will not come back until the
> > connection is re-established. So we have to skip the broken devices to allow
> > the VM to continue running with network-backed children (glusterfs, nfs,
> > sheepdog, etc).
> > 
> > With the combination of read-pattern and threshold, we can easily mimic the
> > DRBD behavior with the following configuration:
> > 
> >  read-pattern=fifo,threshold=1 with two children.
> > 
> > Cc: Eric Blake 
> > Cc: Benoit Canet 
> > Cc: Kevin Wolf 
> > Cc: Stefan Hajnoczi 
> > Signed-off-by: Liu Yuan 
> > ---
> >  block/quorum.c| 102 
> > ++
> >  include/block/block_int.h |   3 ++
> >  2 files changed, 87 insertions(+), 18 deletions(-)
> > 
> > diff --git a/block/quorum.c b/block/quorum.c
> > index b9eeda3..7b07e35 100644
> > --- a/block/quorum.c
> > +++ b/block/quorum.c
> > @@ -120,6 +120,7 @@ struct QuorumAIOCB {
> >  int rewrite_count;  /* number of replica to rewrite: count 
> > down to
> >   * zero once writes are fired
> >   */
> > +int issued_count;   /* actual read&write issued count */
> >  
> >  QuorumVotes votes;
> >  
> > @@ -170,8 +171,10 @@ static void quorum_aio_finalize(QuorumAIOCB *acb)
> >  if (acb->is_read) {
> >  /* on the quorum case acb->child_iter == s->num_children - 1 */
> >  for (i = 0; i <= acb->child_iter; i++) {
> > -qemu_vfree(acb->qcrs[i].buf);
> > -qemu_iovec_destroy(&acb->qcrs[i].qiov);
> > +if (acb->qcrs[i].buf) {
> > +qemu_vfree(acb->qcrs[i].buf);
> > +qemu_iovec_destroy(&acb->qcrs[i].qiov);
> > +}
> >  }
> >  }
> >  
> > @@ -207,6 +210,7 @@ static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
> >  acb->count = 0;
> >  acb->success_count = 0;
> >  acb->rewrite_count = 0;
> > +acb->issued_count = 0;
> >  acb->votes.compare = quorum_sha256_compare;
> >  QLIST_INIT(&acb->votes.vote_list);
> >  acb->is_read = false;
> > @@ -286,6 +290,22 @@ static void quorum_copy_qiov(QEMUIOVector *dest, 
> > QEMUIOVector *source)
> >  }
> >  }
> >  
> > +static int next_fifo_child(QuorumAIOCB *acb)
> > +{
> > +BDRVQuorumState *s = acb->common.bs->opaque;
> > +int i;
> > +
> > +for (i = acb->child_iter; i < s->num_children; i++) {
> > +if (!s->bs[i]->broken) {
> > +break;
> > +}
> > +}
> > +if (i == s->num_children) {
> > +return -1;
> > +}
> > +return i;
> > +}
> > +
> >  static void quorum_aio_cb(void *opaque, int ret)
> >  {
> >  QuorumChildRequest *sacb = opaque;
> > @@ -293,11 +313,18 @@ static void quorum_aio_cb(void *opaque, int ret)
> >  BDRVQuorumState *s = acb->common.bs->opaque;
> >  bool rewrite = false;
> >  
> > +if (ret < 0) {
> > +s->bs[acb->child_iter]->broken = true;
> > +}
> 
> child_iter is fifo mode stuff.
> Do we need to write if (s->read_pattern == QUORUM_READ_PATTERN_FIFO && ret < 
> 0) here ?

Probably not. child_iter denotes which bs the QuorumChildRequest belongs to.

> 
> > +
> >  if (acb->is_read && s->read_pattern == QUORUM_READ_PATTERN_FIFO) {
> >  /* We try to read next child in FIFO order if we fail to read */
> > -if (ret < 0 && ++acb->child_iter < s->num_children) {
> > -read_fifo_child(acb);
> > -return;
> > +if (ret < 0) {
>

Re: [Qemu-devel] linux-user: enabling binfmt P flag

2014-09-01 Thread Peter Maydell

On 1 September 2014 10:28, Paolo Bonzini  wrote:
> Il 01/09/2014 11:12, Peter Maydell ha scritto:
>>> We'd just add Ps and we'd be fine.
>>
>> But this would break all your existing users' existing
>> chroot setups. That's the question I'm after an answer to:
>> what do you (as a distro) think would be acceptable as
>> transitional breakage, if anything?
>
> Yes, but it's not like Fedora would have a choice.  Distros have to follow
> what upstream does, even if it breaks something else (or they could
> revert the "P" patch and get a mirror-image set of bug reports).

Yes, that's why I'm asking what you'd prefer before we
change upstream QEMU rather than just picking something
randomly and letting you deal with the consequences :-)

-- PMM

Re: [Qemu-devel] [PATCH 5/8] quorum: fix quorum_aio_cancel()

2014-09-01 Thread Benoît Canet

The Monday 01 Sep 2014 à 17:26:09 (+0800), Liu Yuan wrote :
> On Mon, Sep 01, 2014 at 10:35:27AM +0200, Benoît Canet wrote:
> > The Monday 01 Sep 2014 à 15:43:11 (+0800), Liu Yuan wrote :
> > > For a fifo read pattern, we only have one running aio
> > 
> > >(possible other cases that has less number than num_children in the future)
> > I have trouble understanding this part of the commit message could you try
> > to clarify it ?
> 
> Until this patch there were two cases: read from a single child or read from all.
> A later patch allows VMs to continue if some devices are down, so the discrete
> values 1 and N become a range [1, N]; that is, there can be anywhere from 1 to N
> running requests.

Why not 

(In some other cases some children of the num_children BDS could be disabled
reducing the number of requests needed) ?

> 
> Thanks
> Yuan

[Qemu-devel] [Patch] block:qemu will crash when vhost-scsi disk vm reboot

2014-09-01 Thread Zhang Min

From: subo 

When the VM reboots, virtio_scsi_handle_event() is called.
For a vhost-scsi device, vdev is a VIRTIO_SCSI_COMMON, not a VIRTIO_SCSI,
so casting vdev to VIRTIO_SCSI crashes QEMU.

Signed-off-by: Zhang Min 
Signed-off-by: subo 
---
 hw/scsi/virtio-scsi.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 86aba88..7d3bc49 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -630,7 +630,11 @@ static void virtio_scsi_push_event(VirtIOSCSI *s, 
SCSIDevice *dev,
 
 static void virtio_scsi_handle_event(VirtIODevice *vdev, VirtQueue *vq)
 {
-VirtIOSCSI *s = VIRTIO_SCSI(vdev);
+VirtIOSCSI *s;
+
+s = (VirtIOSCSI *)object_dynamic_cast((Object *)vdev, TYPE_VIRTIO_SCSI);
+if (!s)
+return;
 
 if (s->events_dropped) {
 virtio_scsi_push_event(s, NULL, VIRTIO_SCSI_T_NO_EVENT, 0);
-- 
1.8.5

Re: [Qemu-devel] [Patch] block:qemu will crash when vhost-scsi disk vm reboot

2014-09-01 Thread Paolo Bonzini

Il 01/09/2014 11:33, Zhang Min ha scritto:
> From: subo 
> 
> When the VM reboots, virtio_scsi_handle_event() is called.
> For a vhost-scsi device, vdev is a VIRTIO_SCSI_COMMON, not a VIRTIO_SCSI,
> so casting vdev to VIRTIO_SCSI crashes QEMU.
> 
> Signed-off-by: Zhang Min 
> Signed-off-by: subo 
> ---
>  hw/scsi/virtio-scsi.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index 86aba88..7d3bc49 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -630,7 +630,11 @@ static void virtio_scsi_push_event(VirtIOSCSI *s, 
> SCSIDevice *dev,
>  
>  static void virtio_scsi_handle_event(VirtIODevice *vdev, VirtQueue *vq)
>  {
> -VirtIOSCSI *s = VIRTIO_SCSI(vdev);
> +VirtIOSCSI *s;
> +
> +s = (VirtIOSCSI *)object_dynamic_cast((Object *)vdev, TYPE_VIRTIO_SCSI);
> +if (!s)
> +return;
>  
>  if (s->events_dropped) {
>  virtio_scsi_push_event(s, NULL, VIRTIO_SCSI_T_NO_EVENT, 0);
> 

Should be already fixed in 2.1 by commit 91d670f (virtio-scsi: define
dummy handle_output for vhost-scsi vqs, 2014-06-19).

Paolo

Re: [Qemu-devel] [PATCH 8/8] quorum: add basic device recovery logic

2014-09-01 Thread Benoît Canet

The Monday 01 Sep 2014 à 15:43:14 (+0800), Liu Yuan wrote :
> For some configurations, quorum allows VMs to continue while some child devices
> are broken; when the child devices are repaired and come back, we need to
> sync the bits dirtied during the downtime to keep the data consistent.
> 
> The recovery logic is based on the driver state bitmap and will sync the dirty
> bits with a timeslice window in a coroutine in this primitive implementation.
> 
> Simple graph about 2 children with threshold=1 and read-pattern=fifo:
> 
> + denote device sync iteration
> - IO on a single device
> = IO on two devices
> 
>   sync complete, release dirty bitmap
>  ^
>  |
>   -++==
>  | |
>  | v
>  |   device repaired and begin to sync
>  v
>device broken, create a dirty bitmap
> 
>   This sync logic can take care of the nested broken problem, where devices
>   break while in sync. We just start a sync process after the devices are
>   repaired again and switch the devices from broken to sound only when the
>   sync completes.
> 
> For read-pattern=quorum mode, it enjoys the recovery logic without any problem.
> 
> Cc: Eric Blake 
> Cc: Benoit Canet 
> Cc: Kevin Wolf 
> Cc: Stefan Hajnoczi 
> Signed-off-by: Liu Yuan 
> ---
>  block/quorum.c | 189 
> -
>  trace-events   |   5 ++
>  2 files changed, 191 insertions(+), 3 deletions(-)
> 
> diff --git a/block/quorum.c b/block/quorum.c
> index 7b07e35..ffd7c2d 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -23,6 +23,7 @@
>  #include "qapi/qmp/qlist.h"
>  #include "qapi/qmp/qstring.h"
>  #include "qapi-event.h"
> +#include "trace.h"
>  
>  #define HASH_LENGTH 32
>  
> @@ -31,6 +32,10 @@
>  #define QUORUM_OPT_REWRITE"rewrite-corrupted"
>  #define QUORUM_OPT_READ_PATTERN   "read-pattern"
>  
> +#define SLICE_TIME  1ULL /* 100 ms */
> +#define CHUNK_SIZE  (1 << 20) /* 1M */
> +#define SECTORS_PER_CHUNK   (CHUNK_SIZE >> BDRV_SECTOR_BITS)
> +
>  /* This union holds a vote hash value */
>  typedef union QuorumVoteValue {
>  char h[HASH_LENGTH];   /* SHA-256 hash */
> @@ -64,6 +69,7 @@ typedef struct QuorumVotes {
>  
>  /* the following structure holds the state of one quorum instance */
>  typedef struct BDRVQuorumState {
> +BlockDriverState *mybs;/* Quorum block driver base state */
>  BlockDriverState **bs; /* children BlockDriverStates */
>  int num_children;  /* children count */
>  int threshold; /* if less than threshold children reads gave the
> @@ -82,6 +88,10 @@ typedef struct BDRVQuorumState {
>  */
>  
>  QuorumReadPattern read_pattern;
> +BdrvDirtyBitmap *dirty_bitmap;
> +uint8_t *sync_buf;
> +HBitmapIter hbi;
> +int64_t sector_num;
>  } BDRVQuorumState;
>  
>  typedef struct QuorumAIOCB QuorumAIOCB;
> @@ -290,12 +300,11 @@ static void quorum_copy_qiov(QEMUIOVector *dest, 
> QEMUIOVector *source)
>  }
>  }
>  
> -static int next_fifo_child(QuorumAIOCB *acb)
> +static int get_good_child(BDRVQuorumState *s, int iter)
>  {
> -BDRVQuorumState *s = acb->common.bs->opaque;
>  int i;
>  
> -for (i = acb->child_iter; i < s->num_children; i++) {
> +for (i = iter; i < s->num_children; i++) {
>  if (!s->bs[i]->broken) {
>  break;
>  }
> @@ -306,6 +315,13 @@ static int next_fifo_child(QuorumAIOCB *acb)
>  return i;
>  }
>  
> +static int next_fifo_child(QuorumAIOCB *acb)
> +{
> +BDRVQuorumState *s = acb->common.bs->opaque;
> +
> +return get_good_child(s, acb->child_iter);
> +}
> +
>  static void quorum_aio_cb(void *opaque, int ret)
>  {
>  QuorumChildRequest *sacb = opaque;
> @@ -951,6 +967,171 @@ static int parse_read_pattern(const char *opt)
>  return -EINVAL;
>  }
>  
> +static void sync_prepare(BDRVQuorumState *qs, int64_t *num)
> +{
> +int64_t nb, total = bdrv_nb_sectors(qs->mybs);
> +
> +qs->sector_num = hbitmap_iter_next(&qs->hbi);
> +/* Wrap around if previous bits get dirty while syncing */
> +if (qs->sector_num < 0) {
> +bdrv_dirty_iter_init(qs->mybs, qs->dirty_bitmap, &qs->hbi);
> +qs->sector_num = hbitmap_iter_next(&qs->hbi);
> +assert(qs->sector_num >= 0);
> +}
> +
> +for (nb = 1; nb < SECTORS_PER_CHUNK && qs->sector_num + nb < total;
> + nb++) {
> +if (!bdrv_get_dirty(qs->mybs, qs->dirty_bitmap, qs->sector_num + 
> nb)) {
> +break;
> +}
> +}
> +*num = nb;
> +}
> +
> +static void sync_finish(BDRVQuorumState *qs, int64_t num)
> +{
> +int64_t i;
> +
> +for (i = 0; i < num; i++) {
> +/* We need to advance the iterator manually */
> +hbitmap_iter_next(&qs->hbi);
> +}
> +bdrv_reset_dirty(qs-

Re: [Qemu-devel] [PATCH 2/4] pcie: Add support for Single Root I/O Virtualization (SR/IOV)

2014-09-01 Thread Michael S. Tsirkin

On Fri, Aug 29, 2014 at 09:17:07AM +0200, Knut Omang wrote:
> This patch provides the building blocks for creating an SR/IOV
> PCIe Extended Capability header and creating and removing
> SR/IOV Virtual Functions.
> 
> Signed-off-by: Knut Omang 
> ---
>  hw/pci/pci.c  | 107 +++---
>  hw/pci/pcie.c | 205 
> +-
>  include/hw/pci/pci.h  |   6 +-
>  include/hw/pci/pcie.h |  26 +++
>  4 files changed, 311 insertions(+), 33 deletions(-)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index daeaeac..071ab81 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -35,7 +35,6 @@
>  #include "hw/pci/msi.h"
>  #include "hw/pci/msix.h"
>  #include "exec/address-spaces.h"
> -#include "hw/hotplug.h"
>  
>  //#define DEBUG_PCI
>  #ifdef DEBUG_PCI
> @@ -126,6 +125,9 @@ static int pci_bar(PCIDevice *d, int reg)
>  {
>  uint8_t type;
>  
> +/* PCIe virtual functions do not have their own BARs */
> +assert(!d->exp.is_vf);
> +
>  if (reg != PCI_ROM_SLOT)
>  return PCI_BASE_ADDRESS_0 + reg * 4;
>  
> @@ -184,22 +186,13 @@ void pci_device_deassert_intx(PCIDevice *dev)
>  }
>  }
>  
> -static void pci_do_device_reset(PCIDevice *dev)
> +static void pci_reset_regions(PCIDevice *dev)
>  {
>  int r;
> +if (dev->exp.is_vf) {
> +return;
> +}
>  
> -pci_device_deassert_intx(dev);
> -assert(dev->irq_state == 0);
> -
> -/* Clear all writable bits */
> -pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
> - pci_get_word(dev->wmask + PCI_COMMAND) |
> - pci_get_word(dev->w1cmask + PCI_COMMAND));
> -pci_word_test_and_clear_mask(dev->config + PCI_STATUS,
> - pci_get_word(dev->wmask + PCI_STATUS) |
> - pci_get_word(dev->w1cmask + PCI_STATUS));
> -dev->config[PCI_CACHE_LINE_SIZE] = 0x0;
> -dev->config[PCI_INTERRUPT_LINE] = 0x0;
>  for (r = 0; r < PCI_NUM_REGIONS; ++r) {
>  PCIIORegion *region = &dev->io_regions[r];
>  if (!region->size) {
> @@ -213,6 +206,27 @@ static void pci_do_device_reset(PCIDevice *dev)
>  pci_set_long(dev->config + pci_bar(dev, r), region->type);
>  }
>  }
> +}
> +
> +static void pci_do_device_reset(PCIDevice *dev)
> +{
> +qdev_reset_all(&dev->qdev);
> +
> +dev->irq_state = 0;
> +pci_update_irq_status(dev);
> +pci_device_deassert_intx(dev);
> +assert(dev->irq_state == 0);
> +
> +/* Clear all writable bits */
> +pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
> + pci_get_word(dev->wmask + PCI_COMMAND) |
> + pci_get_word(dev->w1cmask + PCI_COMMAND));
> +pci_word_test_and_clear_mask(dev->config + PCI_STATUS,
> + pci_get_word(dev->wmask + PCI_STATUS) |
> + pci_get_word(dev->w1cmask + PCI_STATUS));
> +dev->config[PCI_CACHE_LINE_SIZE] = 0x0;
> +dev->config[PCI_INTERRUPT_LINE] = 0x0;
> +pci_reset_regions(dev);
>  pci_update_mappings(dev);
>  
>  msi_reset(dev);
> @@ -734,6 +748,14 @@ static int pci_init_multifunction(PCIBus *bus, PCIDevice 
> *dev)
>  dev->config[PCI_HEADER_TYPE] |= PCI_HEADER_TYPE_MULTI_FUNCTION;
>  }
>  
> +/* With SR/IOV and ARI, subsequent function 0's are only
> + * another VF which the physical function is placed in the initial
> + * function (0.0)

Couldn't parse this. Could you please reword?

> + */
> +if (dev->exp.pf && dev->exp.pf->cap_present & 
> QEMU_PCI_CAP_MULTIFUNCTION) {
> +return 0;
> +}
> +
>  /*
>   * multifunction bit is interpreted in two ways as follows.
>   *   - all functions must set the bit to 1.
> @@ -920,6 +942,7 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num,
>  uint64_t wmask;
>  pcibus_t size = memory_region_size(memory);
>  
> +assert(!pci_dev->exp.is_vf); /* VFs must use pcie_register_vf_bar */
>  assert(region_num >= 0);
>  assert(region_num < PCI_NUM_REGIONS);
>  if (size & (size-1)) {
> @@ -1018,18 +1041,51 @@ pcibus_t pci_get_bar_addr(PCIDevice *pci_dev, int 
> region_num)
>  return pci_dev->io_regions[region_num].addr;
>  }
>  
> -static pcibus_t pci_bar_address(PCIDevice *d,
> - int reg, uint8_t type, pcibus_t size)
> +
> +static pcibus_t pci_config_get_bar_addr(PCIDevice *d, int reg,
> +uint8_t type, pcibus_t size)
> +{
> +pcibus_t new_addr;
> +if (!d->exp.is_vf) {
> +int bar = pci_bar(d, reg);
> +if (type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
> +new_addr = pci_get_quad(d->config + bar);
> +} else {
> +new_addr = pci_get_long(d->config + bar);
> +}
> +} else {
> +int bar = d->exp.pf->exp.sriov_cap + PCI_SRIOV_BAR + reg * 4

Re: [Qemu-devel] [PATCH 2/8] block: add driver operation callbacks

2014-09-01 Thread Liu Yuan

On Mon, Sep 01, 2014 at 11:28:22AM +0200, Benoît Canet wrote:
> The Monday 01 Sep 2014 à 17:19:19 (+0800), Liu Yuan wrote :
> > On Mon, Sep 01, 2014 at 10:28:54AM +0200, Benoît Canet wrote:
> > > The Monday 01 Sep 2014 à 15:43:08 (+0800), Liu Yuan wrote :
> > > > Driver operations are defined as callbacks passed from upper block
> > > > drivers to lower drivers and are supposed to be called by the lower
> > > > drivers.
> > > > 
> > > > Request handling (queuing, submitting, etc.) is done in the protocol tier
> > > > of the block layer, and connection states are also maintained down there.
> > > > Driver operations are supposed to notify the upper tier (such as quorum)
> > > > of state changes.
> > > > 
> > > > For now only two operations are added:
> > > > 
> > > > driver_disconnect: called when connection is off
> > > > driver_reconnect: called when connection is on after disconnection
> > > > 
> > > > These are used to notify the upper tier of the connection state.
> > > > 
> > > > Cc: Eric Blake 
> > > > Cc: Benoit Canet 
> > > > Cc: Kevin Wolf 
> > > > Cc: Stefan Hajnoczi 
> > > > Signed-off-by: Liu Yuan 
> > > > ---
> > > >  block.c   | 7 +++
> > > >  include/block/block.h | 7 +++
> > > >  include/block/block_int.h | 3 +++
> > > >  3 files changed, 17 insertions(+)
> > > > 
> > > > diff --git a/block.c b/block.c
> > > > index c12b8de..22eb3e4 100644
> > > > --- a/block.c
> > > > +++ b/block.c
> > > > @@ -2152,6 +2152,13 @@ void bdrv_set_dev_ops(BlockDriverState *bs, 
> > > > const BlockDevOps *ops,
> > > >  bs->dev_opaque = opaque;
> > > >  }
> > > >  
> > > > +void bdrv_set_drv_ops(BlockDriverState *bs, const BlockDrvOps *ops,
> > > > +  void *opaque)
> > > > +{
> > > > +bs->drv_ops = ops;
> > > > +bs->drv_opaque = opaque;
> > > 
> > > We need to be very careful about the mix between these fields and the
> > > infamous bdrv_swap function.
> > > 
> > > Also I don't know if "driver operations" is the right name since the
> > > BlockDriver structure's callbacks could also be named "driver operations".
> > > 
> > 
> > BlockDriverState has "device operations" for callbacks from devices. So I
> > chose "driver operations". Any suggestion for a better name?
> 
> From what I see in this series the job of these callbacks is to send a
> message or a signal to the upper BDS.
> 
> Also the name must reflect it goes from the child to the parent.
> 
> child_signals ?
> child_messages ?
> 

As far as I can see, putting "child" in the name would make it too quorum-centric.
Since it is an operation in BlockDriverState, we need to keep it as generic as
we can.

These operations [here we mean disconnect() and reconnect(), but others will
probably add more operations later] are passed from the 'upper driver' to the
protocol driver [in the code we call the protocol the 'file' driver, which is a
narrow name too], so I chose to name them 'driver operations'. If we could
rename 'file' to 'protocol', covering file, nfs, sheepdog, etc., such as

bdrv_create_file -> bdrv_create_protocol
bs.file -> bs.protocol

then the 'driver operation' here would sound better.

Thanks
Yuan
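
To make the naming discussion above easier to follow, here is a minimal standalone sketch of the callback pattern the commit message describes: an upper driver registers two callbacks on a lower node, and the lower node invokes them on connection state changes. Only the names driver_disconnect and driver_reconnect come from the patch; ToyBDS, ToyDrvOps, ToyUpper and the toy_* helpers are hypothetical and do not mirror QEMU's real BlockDriverState.

```c
#include <assert.h>
#include <stddef.h>

/* Callbacks registered by an upper driver (hypothetical layout). */
typedef struct ToyDrvOps {
    void (*driver_disconnect)(void *opaque); /* connection went down */
    void (*driver_reconnect)(void *opaque);  /* connection is back */
} ToyDrvOps;

/* Stand-in for a lower (protocol) node; not QEMU's BlockDriverState. */
typedef struct ToyBDS {
    const ToyDrvOps *drv_ops;
    void *drv_opaque;
} ToyBDS;

static void toy_set_drv_ops(ToyBDS *bs, const ToyDrvOps *ops, void *opaque)
{
    bs->drv_ops = ops;
    bs->drv_opaque = opaque;
}

/* Called by the lower driver when it notices the connection dropped. */
static void toy_notify_disconnect(ToyBDS *bs)
{
    if (bs->drv_ops && bs->drv_ops->driver_disconnect) {
        bs->drv_ops->driver_disconnect(bs->drv_opaque);
    }
}

/* Called by the lower driver once the connection is re-established. */
static void toy_notify_reconnect(ToyBDS *bs)
{
    if (bs->drv_ops && bs->drv_ops->driver_reconnect) {
        bs->drv_ops->driver_reconnect(bs->drv_opaque);
    }
}

/* Upper-tier state, e.g. what a quorum-like driver might track. */
typedef struct ToyUpper {
    int broken_children;
} ToyUpper;

static void toy_upper_on_disconnect(void *opaque)
{
    ToyUpper *u = opaque;
    u->broken_children++;
}

static void toy_upper_on_reconnect(void *opaque)
{
    ToyUpper *u = opaque;
    assert(u->broken_children > 0);
    u->broken_children--;
}

static const ToyDrvOps toy_upper_ops = {
    .driver_disconnect = toy_upper_on_disconnect,
    .driver_reconnect = toy_upper_on_reconnect,
};
```

Nothing in the sketch depends on the upper tier being quorum, which is the genericity argument made in the reply; the direction of the notification (lower node to upper node) is what the alternative names proposed in the thread try to capture.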

Re: [Qemu-devel] [PATCH v1 2/2] block/archipelago: Use QEMU atomic builtins

2014-09-01 Thread Paolo Bonzini

Il 01/09/2014 10:58, Chrysostomos Nanakos ha scritto:
> Replace __sync builtins with the ones provided by QEMU
> for atomic operations.
> 
> Signed-off-by: Chrysostomos Nanakos 
> ---
>  block/archipelago.c |   11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/block/archipelago.c b/block/archipelago.c
> index 34f72dc..fa8cd29 100644
> --- a/block/archipelago.c
> +++ b/block/archipelago.c
> @@ -57,6 +57,7 @@
>  #include "qapi/qmp/qint.h"
>  #include "qapi/qmp/qstring.h"
>  #include "qapi/qmp/qjson.h"
> +#include "qemu/atomic.h"
>  
>  #include 
>  #include 
> @@ -214,7 +215,7 @@ static void xseg_request_handler(void *state)
>  
>  xseg_put_request(s->xseg, req, s->srcport);
>  
> -if ((__sync_add_and_fetch(&segreq->ref, -1)) == 0) {
> +if ((atomic_add_fetch(&segreq->ref, -1)) == 0) {

Why not just use "== 1" and avoid patch 1? :)

(Also, you could use atomic_fetch_dec).

>  if (!segreq->failed) {
>  reqdata->aio_cb->ret = segreq->count;
>  archipelago_finish_aiocb(reqdata);
> @@ -233,7 +234,7 @@ static void xseg_request_handler(void *state)
>  segreq->count += req->serviced;
>  xseg_put_request(s->xseg, req, s->srcport);
>  
> -if ((__sync_add_and_fetch(&segreq->ref, -1)) == 0) {
> +if ((atomic_add_fetch(&segreq->ref, -1)) == 0) {
>  if (!segreq->failed) {
>  reqdata->aio_cb->ret = segreq->count;
>  archipelago_finish_aiocb(reqdata);
> @@ -885,13 +886,13 @@ static int 
> archipelago_aio_segmented_rw(BDRVArchipelagoState *s,
>  return 0;
>  
>  err_exit:
> -__sync_add_and_fetch(&segreq->failed, 1);
> +atomic_add_fetch(&segreq->failed, 1);

You can use atomic_inc here.

Paolo

>  if (segments_nr == 1) {
> -if (__sync_add_and_fetch(&segreq->ref, -1) == 0) {
> +if (atomic_add_fetch(&segreq->ref, -1) == 0) {
>  g_free(segreq);
>  }
>  } else {
> -if ((__sync_add_and_fetch(&segreq->ref, -segments_nr + i)) == 0) {
> +if ((atomic_add_fetch(&segreq->ref, -segments_nr + i)) == 0) {
>  g_free(segreq);
>  }
>  }
>
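
The review remark about comparing against 1 instead of 0 can be illustrated with standard C11 atomics. This is only a sketch: it uses <stdatomic.h> rather than QEMU's qemu/atomic.h macros, and ToySegReq and the toy_* functions are hypothetical names. The point is that a fetch-style operation returns the value before the decrement, so "new value == 0" and "old value == 1" describe the same event.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted request; not the archipelago struct. */
typedef struct ToySegReq {
    atomic_int ref;
} ToySegReq;

static void toy_segreq_init(ToySegReq *r, int refs)
{
    atomic_init(&r->ref, refs);
}

/* Style used in the patch: test the value after the decrement against 0.
 * C11 has no add-and-fetch, so the new value is derived from the old one. */
static bool toy_put_new_value(ToySegReq *r)
{
    int old = atomic_fetch_sub(&r->ref, 1);
    assert(old > 0);
    return old - 1 == 0;
}

/* Style suggested in the review: test the value before the decrement
 * against 1. Returns true exactly when the last reference was dropped. */
static bool toy_put_old_value(ToySegReq *r)
{
    int old = atomic_fetch_sub(&r->ref, 1);
    assert(old > 0);
    return old == 1;
}
```

Both helpers report the last reference being dropped at the same point, which is why a fetch-then-decrement primitive is enough and a separate add-and-fetch primitive is not strictly required.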

Re: [Qemu-devel] [PATCH 8/8] quorum: add basic device recovery logic

2014-09-01 Thread Liu Yuan

On Mon, Sep 01, 2014 at 11:37:20AM +0200, Benoît Canet wrote:
> The Monday 01 Sep 2014 à 15:43:14 (+0800), Liu Yuan wrote :
> > For some configurations, quorum allows VMs to continue while some child
> > devices are broken. When the child devices are repaired and come back, we
> > need to sync the bits dirtied during the downtime to keep data consistent.
> > 
> > The recovery logic is based on the driver state bitmap and will sync the
> > dirty bits with a timeslice window in a coroutine in this primitive
> > implementation.
> > 
> > Simple graph about 2 children with threshold=1 and read-pattern=fifo:
> > 
> > + denote device sync iteration
> > - IO on a single device
> > = IO on two devices
> > 
> >   sync complete, release dirty bitmap
> >  ^
> >  |
> >   -++==
> >  | |
> >  | v
> >  |   device repaired and begin to sync
> >  v
> >device broken, create a dirty bitmap
> > 
> >   This sync logic can take care of nested broken problem, that devices are
> >   broken while in sync. We just start a sync process after the devices are
> >   repaired again and switch the devices from broken to sound only when the
> >   sync completes.
> > 
> > For read-pattern=quorum mode, it enjoys the recovery logic without any problem.
> > 
> > Cc: Eric Blake 
> > Cc: Benoit Canet 
> > Cc: Kevin Wolf 
> > Cc: Stefan Hajnoczi 
> > Signed-off-by: Liu Yuan 
> > ---
> >  block/quorum.c | 189 
> > -
> >  trace-events   |   5 ++
> >  2 files changed, 191 insertions(+), 3 deletions(-)
> > 
> > diff --git a/block/quorum.c b/block/quorum.c
> > index 7b07e35..ffd7c2d 100644
> > --- a/block/quorum.c
> > +++ b/block/quorum.c
> > @@ -23,6 +23,7 @@
> >  #include "qapi/qmp/qlist.h"
> >  #include "qapi/qmp/qstring.h"
> >  #include "qapi-event.h"
> > +#include "trace.h"
> >  
> >  #define HASH_LENGTH 32
> >  
> > @@ -31,6 +32,10 @@
> >  #define QUORUM_OPT_REWRITE"rewrite-corrupted"
> >  #define QUORUM_OPT_READ_PATTERN   "read-pattern"
> >  
> > +#define SLICE_TIME  1ULL /* 100 ms */
> > +#define CHUNK_SIZE  (1 << 20) /* 1M */
> > +#define SECTORS_PER_CHUNK   (CHUNK_SIZE >> BDRV_SECTOR_BITS)
> > +
> >  /* This union holds a vote hash value */
> >  typedef union QuorumVoteValue {
> >  char h[HASH_LENGTH];   /* SHA-256 hash */
> > @@ -64,6 +69,7 @@ typedef struct QuorumVotes {
> >  
> >  /* the following structure holds the state of one quorum instance */
> >  typedef struct BDRVQuorumState {
> > +BlockDriverState *mybs;/* Quorum block driver base state */
> >  BlockDriverState **bs; /* children BlockDriverStates */
> >  int num_children;  /* children count */
> >  int threshold; /* if less than threshold children reads gave 
> > the
> > @@ -82,6 +88,10 @@ typedef struct BDRVQuorumState {
> >  */
> >  
> >  QuorumReadPattern read_pattern;
> > +BdrvDirtyBitmap *dirty_bitmap;
> > +uint8_t *sync_buf;
> > +HBitmapIter hbi;
> > +int64_t sector_num;
> >  } BDRVQuorumState;
> >  
> >  typedef struct QuorumAIOCB QuorumAIOCB;
> > @@ -290,12 +300,11 @@ static void quorum_copy_qiov(QEMUIOVector *dest, 
> > QEMUIOVector *source)
> >  }
> >  }
> >  
> > -static int next_fifo_child(QuorumAIOCB *acb)
> > +static int get_good_child(BDRVQuorumState *s, int iter)
> >  {
> > -BDRVQuorumState *s = acb->common.bs->opaque;
> >  int i;
> >  
> > -for (i = acb->child_iter; i < s->num_children; i++) {
> > +for (i = iter; i < s->num_children; i++) {
> >  if (!s->bs[i]->broken) {
> >  break;
> >  }
> > @@ -306,6 +315,13 @@ static int next_fifo_child(QuorumAIOCB *acb)
> >  return i;
> >  }
> >  
> > +static int next_fifo_child(QuorumAIOCB *acb)
> > +{
> > +BDRVQuorumState *s = acb->common.bs->opaque;
> > +
> > +return get_good_child(s, acb->child_iter);
> > +}
> > +
> >  static void quorum_aio_cb(void *opaque, int ret)
> >  {
> >  QuorumChildRequest *sacb = opaque;
> > @@ -951,6 +967,171 @@ static int parse_read_pattern(const char *opt)
> >  return -EINVAL;
> >  }
> >  
> > +static void sync_prepare(BDRVQuorumState *qs, int64_t *num)
> > +{
> > +int64_t nb, total = bdrv_nb_sectors(qs->mybs);
> > +
> > +qs->sector_num = hbitmap_iter_next(&qs->hbi);
> > +/* Wrap around if previous bits get dirty while syncing */
> > +if (qs->sector_num < 0) {
> > +bdrv_dirty_iter_init(qs->mybs, qs->dirty_bitmap, &qs->hbi);
> > +qs->sector_num = hbitmap_iter_next(&qs->hbi);
> > +assert(qs->sector_num >= 0);
> > +}
> > +
> > +for (nb = 1; nb < SECTORS_PER_CHUNK && qs->sector_num + nb < total;
> > + nb++) {
> > +if (!bdrv_get_dirty(qs->mybs, qs->dirty_bitmap, qs-

Re: [Qemu-devel] [PATCH 5/8] quorum: fix quorum_aio_cancel()

2014-09-01 Thread Liu Yuan

On Mon, Sep 01, 2014 at 11:32:04AM +0200, Benoît Canet wrote:
> The Monday 01 Sep 2014 à 17:26:09 (+0800), Liu Yuan wrote :
> > On Mon, Sep 01, 2014 at 10:35:27AM +0200, Benoît Canet wrote:
> > > The Monday 01 Sep 2014 à 15:43:11 (+0800), Liu Yuan wrote :
> > > > For a fifo read pattern, we only have one running aio
> > > 
> > > > >(possible other cases that has less number than num_children in the future)
> > > I have trouble understanding this part of the commit message could you try
> > > to clarify it ?
> > 
> > Until this patch, we have two cases: read single or read all. But a later
> > patch allows VMs to continue if some devices are down. So the discrete
> > numbers 1 and N become a range [1, N], that is, the number of possibly
> > running requests is anywhere from 1 to N.
> 
> Why not 
> 
> (In some other cases some children of the num_children BDS could be disabled
> reducing the number of requests needed) ?
> 

Sounds better, I'll take yours, thanks!

Yuan

Re: [Qemu-devel] [PULL 00/35] Block patches

2014-09-01 Thread Peter Maydell

On 29 August 2014 17:29, Stefan Hajnoczi  wrote:
> The following changes since commit a6aebb38ba4682951ab04fe6d6e6b169bd9e4dca:
>
>   Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging (2014-08-28 17:08:13 +0100)
>
> are available in the git repository at:
>
>
>   git://github.com/stefanha/qemu.git tags/block-pull-request
>
> for you to fetch changes up to 8df3abfceef557551f00adac1618ddd6fe46f85c:
>
>   quorum: Fix leak of opts in quorum_open (2014-08-29 17:10:18 +0100)
>
> 
> Block pull request

Applied, thanks.

-- PMM

[Qemu-devel] KVM call for agenda for 2014-09-02

2014-09-01 Thread Juan Quintela


Hi

Please, send any topic that you are interested in covering.

 Thanks, Juan.

 Call details:

 15:00 CEST
 13:00 UTC
 09:00 EDT

 Every two weeks

By popular demand, a google calendar public entry with it

 
https://www.google.com/calendar/embed?src=dG9iMXRqcXAzN3Y4ZXZwNzRoMHE4a3BqcXNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ

 (Let me know if you have any problems with the calendar entry)

If you need phone number details,  contact me privately

Thanks, Juan.

Re: [Qemu-devel] linux-user: enabling binfmt P flag

2014-09-01 Thread Riku Voipio

On Mon, Sep 01, 2014 at 10:12:18AM +0100, Peter Maydell wrote:
> On 1 September 2014 09:51, Paolo Bonzini  wrote:
> > Il 29/08/2014 20:01, Peter Maydell ha scritto:
> >> [cc'ing MJT for more distro opinion since I think fundamentally
> >> the choice we ought to make upstream is "what's not going to
> >> screw over distros"... Paolo, is there a RedHat QEMU maintainer
> >> who would have an opinion here?]
> >
> > There's Cole Robinson.
> >
> > BTW, Fedora doesn't use the binfmt scripts from QEMU
> 
> That's ok, nobody with any sense does.
> 
> >, but does reuse the
> > binfmt lines.  We'd just add Ps and we'd be fine.
> 
> But this would break all your existing users' existing
> chroot setups. That's the question I'm after an answer to:
> what do you (as a distro) think would be acceptable as
> transitional breakage, if anything?
> 
> > However, the problem is not really for distros.  Packagers just read the
> > release notes and adjust whatever needs to be adjusted.  The problem is
> > for people who compile from source and are bit by conflicting binfmt
> > formats from their distro.

Or people with chroots from older/different distros, already
having a qemu-static inside.

> This is one reason I like the "one binary name for O and
> one for P" approach.

Maybe the new binary name could be something more generic than qemu-x-binfmt.
Say qemu-x-user. Then distributions and users can drop the old binary
name over time, and we are back to one binary again eventually.

> > The solution could be to extend binfmt_misc so that it sets two
> > environment variables BINFMT_MISC_PID and BINFMT_MISC_ORIG_ARGV0.  The
> > former is set to the pid of the binfmt "interpreter" program, the latter
> > to the argv[0] value.  Then QEMU can check if BINFMT_MISC_PID matches
> > getpid() and, if so, trust the BINFMT_MISC_ORIG_ARGV0 value.
 
> Certainly if we're in a position to get the kernel to be more
> informative about how it invoked us that would be the ideal.

There is AT_FLAGS, that seems unused atm (only ever set to 0).

http://lxr.free-electrons.com/source/fs/binfmt_elf.c#L240

As indeed I agree with Paolo that (in hindsight) it was a misdesign for the
kernel to tell the application how it invoked us.

Riku
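
The environment-variable proposal quoted above could be consumed roughly as follows. This is a sketch of a proposal only: no kernel sets BINFMT_MISC_PID or BINFMT_MISC_ORIG_ARGV0, and the toy_* function names are hypothetical. The check trusts the original argv[0] only when the recorded pid matches the current process, so a stale value inherited by a child process is ignored.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

/* Parse a positive decimal pid; false on empty, malformed or
 * out-of-range input. */
static bool toy_parse_pid(const char *s, long *out)
{
    char *end;
    long v;

    assert(out != NULL);
    if (s == NULL || *s == '\0') {
        return false;
    }
    errno = 0;
    v = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || v <= 0) {
        return false;
    }
    *out = v;
    return true;
}

/* Return orig_argv0 only if pid_str names self_pid, else NULL. */
static const char *toy_trusted_argv0(const char *pid_str,
                                     const char *orig_argv0,
                                     long self_pid)
{
    long pid;

    if (orig_argv0 == NULL || !toy_parse_pid(pid_str, &pid)) {
        return NULL;
    }
    return pid == self_pid ? orig_argv0 : NULL;
}

/* Thin wrapper reading the proposed (not existing) variables. */
const char *toy_binfmt_orig_argv0(void)
{
    return toy_trusted_argv0(getenv("BINFMT_MISC_PID"),
                             getenv("BINFMT_MISC_ORIG_ARGV0"),
                             (long)getpid());
}
```

Keeping the comparison in a pure function makes the trust rule easy to check without touching the process environment.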


Re: [Qemu-devel] [PATCH v3 2/2] docs: update ivshmem device spec

2014-09-01 Thread David Marchand


On 08/28/2014 11:49 AM, Stefan Hajnoczi wrote:

On Tue, Aug 26, 2014 at 01:04:30PM +0200, Paolo Bonzini wrote:

Il 26/08/2014 08:47, David Marchand ha scritto:


Using a version message supposes we want to keep ivshmem-server and QEMU
separated (for example, in two distribution packages) while we can avoid
this, so why would we do so ?

If we want the ivshmem-server to come with QEMU, then both are supposed
to be aligned on your system.


What about upgrading QEMU and ivshmem-server while you have existing
guests?  You cannot restart ivshmem-server, and the new QEMU would have
to talk to the old ivshmem-server.


Version negotiation also helps avoid confusion if someone combines
ivshmem-server and QEMU from different origins (e.g. built from source
and distro packaged).

It's a safeguard to prevent hard-to-diagnose failures when the system is
misconfigured.



Hum, so you want the code to be defensive against mis-use, why not.

I wanted to keep modifications on ivshmem as little as possible in a 
first phase (all the more so as there are potential ivshmem users out 
there that I think will be impacted by a protocol change).


Sending the version as the first "vm_id" with an associated fd to -1 
before sending the real client id should work with existing QEMU client 
code (hw/misc/ivshmem.c).
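
As a rough illustration of that idea, the client side could treat the first (value, fd) pair as a version check and the second as its own id. This is a sketch under stated assumptions: the state machine, the TOY_* names and the version value 0 are hypothetical, and it does not reproduce the message handling in hw/misc/ivshmem.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PROTOCOL_VERSION 0 /* hypothetical version number */

typedef enum {
    TOY_WAIT_VERSION, /* nothing received yet */
    TOY_WAIT_ID,      /* version accepted, waiting for our id */
    TOY_READY,        /* id known */
    TOY_ERROR         /* version mismatch or malformed handshake */
} ToyState;

typedef struct ToyClient {
    ToyState state;
    int64_t id;
} ToyClient;

static void toy_client_init(ToyClient *c)
{
    assert(c != NULL);
    c->state = TOY_WAIT_VERSION;
    c->id = -1;
}

/* Feed one (value, fd) pair as received from the server socket. */
static ToyState toy_client_handle(ToyClient *c, int64_t value, int fd)
{
    switch (c->state) {
    case TOY_WAIT_VERSION:
        /* First message: version sent in the "vm_id" slot, fd == -1. */
        c->state = (fd == -1 && value == TOY_PROTOCOL_VERSION)
                   ? TOY_WAIT_ID : TOY_ERROR;
        break;
    case TOY_WAIT_ID:
        /* Second message: the real client id, again without an fd. */
        if (fd == -1 && value >= 0) {
            c->id = value;
            c->state = TOY_READY;
        } else {
            c->state = TOY_ERROR;
        }
        break;
    case TOY_READY: /* later peer messages are out of scope here */
    case TOY_ERROR:
        break;
    }
    return c->state;
}
```

A mismatching first value moves the client to the error state immediately, which is the hard-to-diagnose misconfiguration case the version check is meant to surface early.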


Do you have a better idea ?
Is there a best practice in QEMU for "version negotiation" that could 
work with ivshmem protocol ?


I have a v4 ready with this (and all the pending comments), I will send 
it later unless a better idea is exposed.



Thanks.

--
David Marchand

Re: [Qemu-devel] IO accounting overhaul

2014-09-01 Thread Markus Armbruster

Cc'ing libvirt following Stefan's lead.

Benoît Canet  writes:

> Hi,
>
> I collected some items of a cloud provider wishlist regarding I/O accounting.

Feedback from real power-users, lovely!

> In a cloud, I/O accounting can have three purposes: billing, helping the
> customers, and doing metrology to help the cloud provider seek hidden costs.
>
> I'll cover the first two topics in this mail because they are the most
> important business-wise.
>
> 1) preferred place to collect billing I/O accounting data:
> 
> For billing purposes the collected data must be as close as possible to what
> the customer would see by using iostats in his VM.

Good point.

> The first conclusion we can draw is that the choice of collecting I/O accounting
> data used for billing in the block device models is right.

Slightly rephrasing: doing I/O accounting in the block device models is
right for billing.

There may be other uses for I/O accounting, with different preferences.
For instance, data on how exactly guest I/O gets translated to host I/O
as it flows through the nodes in the block graph could be useful.

Doesn't diminish the need for accurate billing information, of course.

> 2) what to do with occurrences of rare events:
> -
>
> Another point is that QEMU developers agree that they don't know which policy
> to apply to some I/O accounting events.
> Must QEMU discard invalid write I/O or account it as done?
> Must QEMU count a failed read I/O as done?
>
> When discussing this with a cloud provider the following appears: these
> decisions are really specific to each cloud provider and QEMU should not
> implement them.

Good point, consistent with the old advice to avoid baking policy into
inappropriately low levels of the stack.

> The right thing to do is to add accounting counters to collect these events.
>
> Moreover these rare events are precious troubleshooting data, so that's an
> additional reason not to toss them.

Another good point.

> 3) list of block I/O accounting metrics wished for billing and helping
> the customers
> ---
>
> Basic I/O accounting data will end up making the customers' bills.
> Extra I/O accounting information would be a precious help for the cloud
> provider to implement a monitoring panel like Amazon CloudWatch.

These are the first two from your list of three purposes, i.e. the ones
you promised to cover here.

> Here is the list of counters and statistics I would like to help
> implement in QEMU.
>
> This is the most important part of the mail and the one I would like
> the community to review the most.
>
> Once this list is settled I would proceed to implement the required
> infrastructure in QEMU before using it in the device models.

For context, let me recap how I/O accounting works now.

The BlockDriverState abstract data type (short: BDS) can hold the
following accounting data:

uint64_t nr_bytes[BDRV_MAX_IOTYPE];
uint64_t nr_ops[BDRV_MAX_IOTYPE];
uint64_t total_time_ns[BDRV_MAX_IOTYPE];
uint64_t wr_highest_sector;

where BDRV_MAX_IOTYPE enumerates read, write, flush.

wr_highest_sector is a high watermark updated by the block layer as it
writes sectors.

The other three are *not* touched by the block layer.  Instead, the
block layer provides a pair of functions for device models to update
them:

void bdrv_acct_start(BlockDriverState *bs, BlockAcctCookie *cookie,
int64_t bytes, enum BlockAcctType type);
void bdrv_acct_done(BlockDriverState *bs, BlockAcctCookie *cookie);

bdrv_acct_start() initializes cookie for a read, write, or flush
operation of a certain size.  The size of a flush is always zero.

bdrv_acct_done() adds the operations to the BDS's accounting data.
total_time_ns is incremented by the time between _start() and _done().

You may call _start() without calling _done().  That's a feature.
Device models use it to avoid accounting some requests.

Device models are not supposed to mess with cookie directly, only
through these two functions.
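
The start/done protocol described above can be modelled in a few lines. This is a simplified sketch, not the block layer's code: the Toy* types and toy_acct_* functions are hypothetical, and the current time is passed in explicitly (an assumption made to keep the example deterministic) instead of being read from a clock.

```c
#include <assert.h>
#include <stdint.h>

enum ToyAcctType {
    TOY_ACCT_READ,
    TOY_ACCT_WRITE,
    TOY_ACCT_FLUSH,
    TOY_MAX_IOTYPE
};

/* Per-node counters, mirroring the three arrays listed above. */
typedef struct ToyStats {
    uint64_t nr_bytes[TOY_MAX_IOTYPE];
    uint64_t nr_ops[TOY_MAX_IOTYPE];
    uint64_t total_time_ns[TOY_MAX_IOTYPE];
} ToyStats;

/* Opaque to device models; only the two functions below touch it. */
typedef struct ToyCookie {
    int64_t bytes;
    int64_t start_time_ns;
    enum ToyAcctType type;
} ToyCookie;

/* Record what is about to happen; no counter changes yet. */
static void toy_acct_start(ToyCookie *cookie, int64_t bytes,
                           enum ToyAcctType type, int64_t now_ns)
{
    assert(type < TOY_MAX_IOTYPE);
    cookie->bytes = bytes;
    cookie->start_time_ns = now_ns;
    cookie->type = type;
}

/* Fold the finished operation into the counters. */
static void toy_acct_done(ToyStats *stats, const ToyCookie *cookie,
                          int64_t now_ns)
{
    assert(now_ns >= cookie->start_time_ns);
    stats->nr_bytes[cookie->type] += (uint64_t)cookie->bytes;
    stats->nr_ops[cookie->type]++;
    stats->total_time_ns[cookie->type] +=
        (uint64_t)(now_ns - cookie->start_time_ns);
}
```

Because the counters only change in the done step, a request that is started but never completed leaves no trace, which is how a device model can skip accounting for some requests.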

Some device models implement accounting, some don't.  The ones that do
don't agree on how to count invalid guest requests (the ones not passed
to block layer) and failed requests (passed to block layer and failed
there).  It's a mess in part caused by us never writing down what
exactly device models are expected to do.

Accounting data is used by "query-blockstats", and nothing else.

Corollary: even though every BDS holds accounting data, only the ones in
"top" BDSes ever get used.  This is a common block layer blemish, and
we're working on cleaning it up.

If a device model doesn't implement accounting, query-blockstats lies.
Fortunately, its lies are pretty transparent (everything's zero) as long
as you don't do things like connecting a backend to a device model that
doesn't implement accounting after disconnecting i

Re: [Qemu-devel] [PATCH 1/1] hw/pci-assign: split pci-assign.c

2014-09-01 Thread Michael S. Tsirkin

On Mon, Sep 01, 2014 at 05:26:24PM +0800, Chen, Tiejun wrote:
> On 2014/9/1 16:27, Michael S. Tsirkin wrote:
> >On Mon, Sep 01, 2014 at 10:07:19AM +0800, Tiejun Chen wrote:
> >>We will try to reuse assign_dev_load_option_rom on the Xen side, and
> >>in particular it's a good beginning to unify the pci-assign code for both
> >>KVM and Xen in the future.
> >>
> >>Signed-off-by: Tiejun Chen 
> >>---
> 
> [snip]
> 
> >>+ */
> >>+#ifndef PCI_ASSIGN_H
> >>+#define PCI_ASSIGN_H
> >>+
> >>+#include 
> >>+#include 
> >>+#include 
> >>+#include 
> >>+#include 
> >>+#include 
> >>+#include "hw/hw.h"
> >>+#include "hw/i386/pc.h"
> >>+#include "qemu/error-report.h"
> >>+#include "ui/console.h"
> >>+#include "hw/loader.h"
> >>+#include "monitor/monitor.h"
> >>+#include "qemu/range.h"
> >>+#include "sysemu/sysemu.h"
> >>+#include "hw/pci/pci.h"
> >>+#include "hw/pci/msi.h"
> >>+#include "kvm_i386.h"
> >
> >Why are you pulling all these headers here?
> >Please include the minimum required.
> 
> So just leave #include "hw/pci/pci.h".
> 
> >
> >>+
> >>+#define MSIX_PAGE_SIZE 0x1000
> >>+
> >>+/* From linux/ioport.h */
> >>+#define IORESOURCE_IO   0x0100  /* Resource type */
> >>+#define IORESOURCE_MEM  0x0200
> 
> [snip]
> 
> >>+uint8_t emulate_config_read[PCI_CONFIG_SPACE_SIZE];
> >>+uint8_t emulate_config_write[PCI_CONFIG_SPACE_SIZE];
> >>+int msi_virq_nr;
> >>+int *msi_virq;
> >>+MSIXTableEntry *msix_table;
> >>+hwaddr msix_table_addr;
> >>+uint16_t msix_max;
> >>+MemoryRegion mmio;
> >>+char *configfd_name;
> >>+int32_t bootindex;
> >>+} AssignedDevice;
> >>+
> >
> >Why are you moving the above here?
> 
> As I said in the patch description, I think this is a good beginning to
> unify pci-assign in both KVM and Xen. So I tried to move this common
> stuff. Although we might not use it directly in the future, I guess we
> still need to move it into this header file.
> 
> If you think we should do this strictly on demand, I can move it back to
> pci-assign.c.

Yes, I think this is better on demand.

> >
> >
> >>+int dev_load_option_rom(PCIDevice *dev, struct Object *owner, void *ptr,
> >>+unsigned int domain, unsigned int bus,
> >>+unsigned int slot, unsigned int function);
> >
> >Please use a header-specific prefix to avoid global namespace pollution.
> >pci_assign_dev_load_option_rom?
> 
> Looks good so I will follow-up yours.
> 
> Thanks
> Tiejun
> 
> >
> >>+#endif /* PCI_ASSIGN_H */
> >>--
> >>1.9.1
> >
> >

Re: [Qemu-devel] [PATCH 0/5] s390x/gdb: various fixes

2014-09-01 Thread Christian Borntraeger

On 29/08/14 15:52, Jens Freimann wrote:
> Conny, Alex, Christian,
> 
> here are some patches improving our gdb support. 
> 
> * Patch 1 fixes a bug where the cc was changed accidentally. 
> * Patch 2 adds the gdb feature XML files for s390x
> * Patch 3 defines acr and fpr registers as coprocessor registers. This allows
>    us to reuse the feature XML files.
> * Patch 4 whitespace fixes
> * Patch 5 changes common code and other architectures with gdb target.xml
>    support. It adds a field gdb_arch_name to the XML description of the CPU
>    and to struct CPUClass. It allows the remote gdb to detect the target
>    architecture in cases where it can't tell otherwise.
> 
> David Hildenbrand (5):
>   s390x/gdb: don't touch the cc if tcg is not enabled
>   s390x/gdb: add the feature xml files for s390x
>   s390x/gdb: generate target.xml and handle fp/ac as coprocessors
>   s390x/gdb: coding style fixes
>   gdb: provide the name of the architecture in the target.xml
> 
>  configure   |   1 +
>  gdb-xml/s390-acr.xml|  26 +++
>  gdb-xml/s390-fpr.xml|  27 +++
>  gdb-xml/s390x-core64.xml|  28 
>  gdbstub.c   |  19 +---
>  include/qom/cpu.h   |   2 +
>  target-arm/cpu64.c  |   1 +
>  target-ppc/translate_init.c |   2 +
>  target-s390x/cpu-qom.h  |   1 +
>  target-s390x/cpu.c  |   5 +-
>  target-s390x/cpu.h  |  40 +---
>  target-s390x/gdbstub.c  | 109 
> +---
>  12 files changed, 188 insertions(+), 73 deletions(-)
>  create mode 100644 gdb-xml/s390-acr.xml
>  create mode 100644 gdb-xml/s390-fpr.xml
>  create mode 100644 gdb-xml/s390x-core64.xml
> 

Applied 1-4.

Peter,
do you want to push patch5 yourself?
As an alternative I can push it via the s390 tree, I need your ACK in that case.

Alex, (or Alexey?) can you ACK/NACK patch 5 from the power perspective?

[Qemu-devel] [PATCH v8 5/7] libqos: Added test case for configuration changes in virtio-blk test

2014-09-01 Thread Marc Marí

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Marc Marí 
---
 tests/virtio-blk-test.c |   34 ++
 1 file changed, 34 insertions(+)

diff --git a/tests/virtio-blk-test.c b/tests/virtio-blk-test.c
index 95e6861..07ae754 100644
--- a/tests/virtio-blk-test.c
+++ b/tests/virtio-blk-test.c
@@ -359,6 +359,39 @@ static void pci_indirect(void)
 test_end();
 }
 
+static void pci_config(void)
+{
+QVirtioPCIDevice *dev;
+QPCIBus *bus;
+int n_size = TEST_IMAGE_SIZE / 2;
+void *addr;
+uint64_t capacity;
+
+bus = test_start();
+
+dev = virtio_blk_init(bus);
+
+/* MSI-X is not enabled */
+addr = dev->addr + QVIRTIO_DEVICE_SPECIFIC_NO_MSIX;
+
+capacity = qvirtio_config_readq(&qvirtio_pci, &dev->vdev, addr);
+g_assert_cmpint(capacity, ==, TEST_IMAGE_SIZE / 512);
+
+qvirtio_set_driver_ok(&qvirtio_pci, &dev->vdev);
+
+qmp("{ 'execute': 'block_resize', 'arguments': { 'device': 'drive0', "
+" 'size': %d } }", n_size);
+g_assert(qvirtio_wait_isr(&qvirtio_pci, &dev->vdev, 0x2,
+QVIRTIO_BLK_TIMEOUT));
+
+capacity = qvirtio_config_readq(&qvirtio_pci, &dev->vdev, addr);
+g_assert_cmpint(capacity, ==, n_size / 512);
+
+qvirtio_pci_device_disable(dev);
+g_free(dev);
+test_end();
+}
+
 int main(int argc, char **argv)
 {
 int ret;
@@ -367,6 +400,7 @@ int main(int argc, char **argv)
 
 g_test_add_func("/virtio/blk/pci/basic", pci_basic);
 g_test_add_func("/virtio/blk/pci/indirect", pci_indirect);
+g_test_add_func("/virtio/blk/pci/config", pci_config);
 
 ret = g_test_run();
 
-- 
1.7.10.4

[Qemu-devel] [PATCH v8 0/7] Virtio PCI libqos driver

2014-09-01 Thread Marc Marí

v3: Solved problems, added indirect descriptor support and test for
configuration changes
v4: Solved bugs, changed some interfaces, added MSI-X and event_idx support.
v5: Simplified virtio-blk-test, solved bugs, avoid patches already merged.
v6: Solve bugs (qpci_iomap changed prototype)
v7: Solve bugs (qvirtio_pci_config_readq endianness)
v8: Solve bugs (qvirtio_pci_config_readq endianness)

Marc Marí (7):
  tests: Functions bus_foreach and device_find from libqos virtio API
  tests: Add virtio device initialization
  libqos: Added basic virtqueue support to virtio implementation
  libqos: Added indirect descriptor support to virtio implementation
  libqos: Added test case for configuration changes in virtio-blk test
  libqos: Added MSI-X support
  libqos: Added EVENT_IDX support

 tests/Makefile|3 +-
 tests/libqos/pci.c|  111 +++-
 tests/libqos/pci.h|   10 +
 tests/libqos/virtio-pci.c |  343 +
 tests/libqos/virtio-pci.h |   61 +
 tests/libqos/virtio.c |  257 +++
 tests/libqos/virtio.h |  182 +
 tests/libqtest.c  |   48 
 tests/libqtest.h  |7 +
 tests/virtio-blk-test.c   |  622 -
 10 files changed, 1633 insertions(+), 11 deletions(-)
 create mode 100644 tests/libqos/virtio-pci.c
 create mode 100644 tests/libqos/virtio-pci.h
 create mode 100644 tests/libqos/virtio.c
 create mode 100644 tests/libqos/virtio.h

-- 
1.7.10.4

[Qemu-devel] [PATCH v8 2/7] tests: Add virtio device initialization

2014-09-01 Thread Marc Marí

Add functions to read and write virtio header fields.
Add status bit setting in virtio-blk-device.

Signed-off-by: Marc Marí 
---
 tests/Makefile|2 +-
 tests/libqos/virtio-pci.c |   71 +
 tests/libqos/virtio-pci.h |   18 
 tests/libqos/virtio.c |   55 +++
 tests/libqos/virtio.h |   30 +++
 tests/libqtest.c  |   48 ++
 tests/libqtest.h  |7 +
 tests/virtio-blk-test.c   |   31 +---
 8 files changed, 257 insertions(+), 5 deletions(-)
 create mode 100644 tests/libqos/virtio.c

diff --git a/tests/Makefile b/tests/Makefile
index a1936f1..ff631fe 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -294,7 +294,7 @@ libqos-obj-y += tests/libqos/i2c.o
 libqos-pc-obj-y = $(libqos-obj-y) tests/libqos/pci-pc.o
 libqos-pc-obj-y += tests/libqos/malloc-pc.o
 libqos-omap-obj-y = $(libqos-obj-y) tests/libqos/i2c-omap.o
-libqos-virtio-obj-y = $(libqos-obj-y) $(libqos-pc-obj-y) tests/libqos/virtio-pci.o
+libqos-virtio-obj-y = $(libqos-obj-y) $(libqos-pc-obj-y) tests/libqos/virtio.o tests/libqos/virtio-pci.o
 
 tests/rtc-test$(EXESUF): tests/rtc-test.o
 tests/m48t59-test$(EXESUF): tests/m48t59-test.o
diff --git a/tests/libqos/virtio-pci.c b/tests/libqos/virtio-pci.c
index fde1b1f..1a37620 100644
--- a/tests/libqos/virtio-pci.c
+++ b/tests/libqos/virtio-pci.c
@@ -8,6 +8,7 @@
  */
 
 #include 
+#include 
 #include "libqtest.h"
 #include "libqos/virtio.h"
 #include "libqos/virtio-pci.h"
@@ -55,6 +56,64 @@ static void qvirtio_pci_assign_device(QVirtioDevice *d, void *data)
 *vpcidev = (QVirtioPCIDevice *)d;
 }
 
+static uint8_t qvirtio_pci_config_readb(QVirtioDevice *d, void *addr)
+{
+QVirtioPCIDevice *dev = (QVirtioPCIDevice *)d;
+return qpci_io_readb(dev->pdev, addr);
+}
+
+static uint16_t qvirtio_pci_config_readw(QVirtioDevice *d, void *addr)
+{
+QVirtioPCIDevice *dev = (QVirtioPCIDevice *)d;
+return qpci_io_readw(dev->pdev, addr);
+}
+
+static uint32_t qvirtio_pci_config_readl(QVirtioDevice *d, void *addr)
+{
+QVirtioPCIDevice *dev = (QVirtioPCIDevice *)d;
+return qpci_io_readl(dev->pdev, addr);
+}
+
+static uint64_t qvirtio_pci_config_readq(QVirtioDevice *d, void *addr)
+{
+QVirtioPCIDevice *dev = (QVirtioPCIDevice *)d;
+int i;
+uint64_t u64 = 0;
+
+if (qtest_big_endian()) {
+for (i = 0; i < 8; ++i) {
+u64 |= (uint64_t)qpci_io_readb(dev->pdev, addr + i) << (7 - i) * 8;
+}
+} else {
+for (i = 0; i < 8; ++i) {
+u64 |= (uint64_t)qpci_io_readb(dev->pdev, addr + i) << i * 8;
+}
+}
+
+return u64;
+}
+
+static uint8_t qvirtio_pci_get_status(QVirtioDevice *d)
+{
+QVirtioPCIDevice *dev = (QVirtioPCIDevice *)d;
+return qpci_io_readb(dev->pdev, dev->addr + QVIRTIO_DEVICE_STATUS);
+}
+
+static void qvirtio_pci_set_status(QVirtioDevice *d, uint8_t status)
+{
+QVirtioPCIDevice *dev = (QVirtioPCIDevice *)d;
+qpci_io_writeb(dev->pdev, dev->addr + QVIRTIO_DEVICE_STATUS, status);
+}
+
+const QVirtioBus qvirtio_pci = {
+.config_readb = qvirtio_pci_config_readb,
+.config_readw = qvirtio_pci_config_readw,
+.config_readl = qvirtio_pci_config_readl,
+.config_readq = qvirtio_pci_config_readq,
+.get_status = qvirtio_pci_get_status,
+.set_status = qvirtio_pci_set_status,
+};
+
 void qvirtio_pci_foreach(QPCIBus *bus, uint16_t device_type,
 void (*func)(QVirtioDevice *d, void *data), void *data)
 {
@@ -73,3 +132,15 @@ QVirtioPCIDevice *qvirtio_pci_device_find(QPCIBus *bus, uint16_t device_type)
 
 return dev;
 }
+
+void qvirtio_pci_device_enable(QVirtioPCIDevice *d)
+{
+qpci_device_enable(d->pdev);
+d->addr = qpci_iomap(d->pdev, 0, NULL);
+g_assert(d->addr != NULL);
+}
+
+void qvirtio_pci_device_disable(QVirtioPCIDevice *d)
+{
+qpci_iounmap(d->pdev, d->addr);
+}
diff --git a/tests/libqos/virtio-pci.h b/tests/libqos/virtio-pci.h
index 5101abb..26f902e 100644
--- a/tests/libqos/virtio-pci.h
+++ b/tests/libqos/virtio-pci.h
@@ -13,12 +13,30 @@
 #include "libqos/virtio.h"
 #include "libqos/pci.h"
 
+#define QVIRTIO_DEVICE_FEATURES 0x00
+#define QVIRTIO_GUEST_FEATURES  0x04
+#define QVIRTIO_QUEUE_ADDRESS   0x08
+#define QVIRTIO_QUEUE_SIZE  0x0C
+#define QVIRTIO_QUEUE_SELECT0x0E
+#define QVIRTIO_QUEUE_NOTIFY0x10
+#define QVIRTIO_DEVICE_STATUS   0x12
+#define QVIRTIO_ISR_STATUS  0x13
+#define QVIRTIO_MSIX_CONF_VECTOR0x14
+#define QVIRTIO_MSIX_QUEUE_VECTOR   0x16
+#define QVIRTIO_DEVICE_SPECIFIC_MSIX0x18
+#define QVIRTIO_DEVICE_SPECIFIC_NO_MSIX 0x14
+
 typedef struct QVirtioPCIDevice {
 QVirtioDevice vdev;
 QPCIDevice *pdev;
+void *addr;
 } QVirtioPCIDevice;
 
+extern const QVirtioBus qvirtio_pci;
+
 void qvirtio_pci_foreach(QPCIBus *bus, uint16_t device_type,
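
The byte-wise 64-bit read in qvirtio_pci_config_readq above can be sketched in
isolation. This is only an illustration under stated assumptions: the helper
name example_assemble_u64 is hypothetical, and a plain byte array stands in for
the successive qpci_io_readb() calls.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of libqos): build a 64-bit value from
 * eight bytes that were read one at a time. When big_endian is true,
 * bytes[0] is treated as the most significant byte; otherwise it is
 * the least significant byte.
 */
uint64_t example_assemble_u64(const uint8_t bytes[8], bool big_endian)
{
    uint64_t value = 0;

    for (int i = 0; i < 8; ++i) {
        int shift = big_endian ? (7 - i) * 8 : i * 8;
        /* Widen before shifting so the shift is done in 64 bits. */
        value |= (uint64_t)bytes[i] << shift;
    }

    return value;
}
```

The cast to uint64_t before the shift matters: shifting an 8-bit value that has
only been promoted to int by 32 or more bits would be undefined.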

[Qemu-devel] [PATCH v8 4/7] libqos: Added indirect descriptor support to virtio implementation

2014-09-01 Thread Marc Marí

Add functions necessary for working with indirect descriptors.
Add test using new functions.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Marc Marí 
---
 tests/libqos/virtio-pci.c |   10 +
 tests/libqos/virtio.c |   64 +
 tests/libqos/virtio.h |   22 +-
 tests/virtio-blk-test.c   |  100 +
 4 files changed, 195 insertions(+), 1 deletion(-)

diff --git a/tests/libqos/virtio-pci.c b/tests/libqos/virtio-pci.c
index 12b06a2..cf15f7a 100644
--- a/tests/libqos/virtio-pci.c
+++ b/tests/libqos/virtio-pci.c
@@ -107,6 +107,12 @@ static void qvirtio_pci_set_features(QVirtioDevice *d, uint32_t features)
 qpci_io_writel(dev->pdev, dev->addr + QVIRTIO_GUEST_FEATURES, features);
 }
 
+static uint32_t qvirtio_pci_get_guest_features(QVirtioDevice *d)
+{
+QVirtioPCIDevice *dev = (QVirtioPCIDevice *)d;
+return qpci_io_readl(dev->pdev, dev->addr + QVIRTIO_GUEST_FEATURES);
+}
+
 static uint8_t qvirtio_pci_get_status(QVirtioDevice *d)
 {
 QVirtioPCIDevice *dev = (QVirtioPCIDevice *)d;
@@ -146,10 +152,12 @@ static void qvirtio_pci_set_queue_address(QVirtioDevice *d, uint32_t pfn)
 static QVirtQueue *qvirtio_pci_virtqueue_setup(QVirtioDevice *d,
 QGuestAllocator *alloc, uint16_t index)
 {
+uint32_t feat;
 uint64_t addr;
 QVirtQueue *vq;
 
 vq = g_malloc0(sizeof(*vq));
+feat = qvirtio_pci_get_guest_features(d);
 
 qvirtio_pci_queue_select(d, index);
 vq->index = index;
@@ -157,6 +165,7 @@ static QVirtQueue *qvirtio_pci_virtqueue_setup(QVirtioDevice *d,
 vq->free_head = 0;
 vq->num_free = vq->size;
 vq->align = QVIRTIO_PCI_ALIGN;
+vq->indirect = (feat & QVIRTIO_F_RING_INDIRECT_DESC) != 0;
 
 /* Check different than 0 */
 g_assert_cmpint(vq->size, !=, 0);
@@ -186,6 +195,7 @@ const QVirtioBus qvirtio_pci = {
 .config_readq = qvirtio_pci_config_readq,
 .get_features = qvirtio_pci_get_features,
 .set_features = qvirtio_pci_set_features,
+.get_guest_features = qvirtio_pci_get_guest_features,
 .get_status = qvirtio_pci_get_status,
 .set_status = qvirtio_pci_set_status,
 .get_isr_status = qvirtio_pci_get_isr_status,
diff --git a/tests/libqos/virtio.c b/tests/libqos/virtio.c
index de92642..b1cab1f 100644
--- a/tests/libqos/virtio.c
+++ b/tests/libqos/virtio.c
@@ -116,6 +116,51 @@ void qvring_init(const QGuestAllocator *alloc, QVirtQueue *vq, uint64_t addr)
 writew(vq->used, 0);
 }
 
+QVRingIndirectDesc *qvring_indirect_desc_setup(QVirtioDevice *d,
+QGuestAllocator *alloc, uint16_t elem)
+{
+int i;
+QVRingIndirectDesc *indirect = g_malloc(sizeof(*indirect));
+
+indirect->index = 0;
+indirect->elem = elem;
+indirect->desc = guest_alloc(alloc, sizeof(QVRingDesc)*elem);
+
+for (i = 0; i < elem - 1; ++i) {
+/* indirect->desc[i].addr */
+writeq(indirect->desc + (16 * i), 0);
+/* indirect->desc[i].flags */
+writew(indirect->desc + (16 * i) + 12, QVRING_DESC_F_NEXT);
+/* indirect->desc[i].next */
+writew(indirect->desc + (16 * i) + 14, i + 1);
+}
+
+return indirect;
+}
+
+void qvring_indirect_desc_add(QVRingIndirectDesc *indirect, uint64_t data,
+uint32_t len, bool write)
+{
+uint16_t flags;
+
+g_assert_cmpint(indirect->index, <, indirect->elem);
+
+flags = readw(indirect->desc + (16 * indirect->index) + 12);
+
+if (write) {
+flags |= QVRING_DESC_F_WRITE;
+}
+
+/* indirect->desc[indirect->index].addr */
+writeq(indirect->desc + (16 * indirect->index), data);
+/* indirect->desc[indirect->index].len */
+writel(indirect->desc + (16 * indirect->index) + 8, len);
+/* indirect->desc[indirect->index].flags */
+writew(indirect->desc + (16 * indirect->index) + 12, flags);
+
+indirect->index++;
+}
+
 uint32_t qvirtqueue_add(QVirtQueue *vq, uint64_t data, uint32_t len, bool write,
 bool next)
 {
@@ -140,6 +185,25 @@ uint32_t qvirtqueue_add(QVirtQueue *vq, uint64_t data, uint32_t len, bool write,
 return vq->free_head++; /* Return and increase, in this order */
 }
 
+uint32_t qvirtqueue_add_indirect(QVirtQueue *vq, QVRingIndirectDesc *indirect)
+{
+g_assert(vq->indirect);
+g_assert_cmpint(vq->size, >=, indirect->elem);
+g_assert_cmpint(indirect->index, ==, indirect->elem);
+
+vq->num_free--;
+
+/* vq->desc[vq->free_head].addr */
+writeq(vq->desc + (16 * vq->free_head), indirect->desc);
+/* vq->desc[vq->free_head].len */
+writel(vq->desc + (16 * vq->free_head) + 8,
+sizeof(QVRingDesc) * indirect->elem);
+/* vq->desc[vq->free_head].flags */
+writew(vq->desc + (16 * vq->free_head) + 12, QVRING_DESC_F_INDIRECT);
+
+return vq->free_head++; /* Return and
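
The magic offsets used above (16 * i, + 8, + 12, + 14) follow from the 16-byte
layout of a vring descriptor. A minimal sketch that makes the layout explicit,
assuming a struct with the usual addr/len/flags/next fields; the names
ExampleVRingDesc and example_desc_field_offset are hypothetical and not taken
from the patch:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative mirror of one vring descriptor (16 bytes in total). */
typedef struct ExampleVRingDesc {
    uint64_t addr;   /* guest-physical buffer address, offset 0 */
    uint32_t len;    /* buffer length in bytes, offset 8 */
    uint16_t flags;  /* NEXT / WRITE / INDIRECT bits, offset 12 */
    uint16_t next;   /* index of the chained descriptor, offset 14 */
} ExampleVRingDesc;

/* Compile-time checks that the layout matches the offsets in the patch. */
_Static_assert(sizeof(ExampleVRingDesc) == 16, "descriptor must be 16 bytes");
_Static_assert(offsetof(ExampleVRingDesc, len) == 8, "len at offset 8");
_Static_assert(offsetof(ExampleVRingDesc, flags) == 12, "flags at offset 12");
_Static_assert(offsetof(ExampleVRingDesc, next) == 14, "next at offset 14");

/* Byte offset of a field of descriptor number i inside a descriptor table. */
size_t example_desc_field_offset(uint16_t i, size_t field_offset)
{
    return sizeof(ExampleVRingDesc) * (size_t)i + field_offset;
}
```

For example, the flags field of descriptor 3 sits at byte 16 * 3 + 12 = 60 from
the start of the table.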

[Qemu-devel] [PATCH v8 1/7] tests: Functions bus_foreach and device_find from libqos virtio API

2014-09-01 Thread Marc Marí

Virtio header has been changed to compile and work with a real device.
Functions bus_foreach and device_find have been implemented for PCI.
Virtio-blk test case now opens a fake device.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Marc Marí 
---
 tests/Makefile|3 +-
 tests/libqos/virtio-pci.c |   75 +
 tests/libqos/virtio-pci.h |   24 +++
 tests/libqos/virtio.h |   23 ++
 tests/virtio-blk-test.c   |   61 +++-
 5 files changed, 177 insertions(+), 9 deletions(-)
 create mode 100644 tests/libqos/virtio-pci.c
 create mode 100644 tests/libqos/virtio-pci.h
 create mode 100644 tests/libqos/virtio.h

diff --git a/tests/Makefile b/tests/Makefile
index 837e9c8..a1936f1 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -294,6 +294,7 @@ libqos-obj-y += tests/libqos/i2c.o
 libqos-pc-obj-y = $(libqos-obj-y) tests/libqos/pci-pc.o
 libqos-pc-obj-y += tests/libqos/malloc-pc.o
 libqos-omap-obj-y = $(libqos-obj-y) tests/libqos/i2c-omap.o
+libqos-virtio-obj-y = $(libqos-obj-y) $(libqos-pc-obj-y) tests/libqos/virtio-pci.o
 
 tests/rtc-test$(EXESUF): tests/rtc-test.o
 tests/m48t59-test$(EXESUF): tests/m48t59-test.o
@@ -315,7 +316,7 @@ tests/vmxnet3-test$(EXESUF): tests/vmxnet3-test.o
 tests/ne2000-test$(EXESUF): tests/ne2000-test.o
 tests/wdt_ib700-test$(EXESUF): tests/wdt_ib700-test.o
 tests/virtio-balloon-test$(EXESUF): tests/virtio-balloon-test.o
-tests/virtio-blk-test$(EXESUF): tests/virtio-blk-test.o
+tests/virtio-blk-test$(EXESUF): tests/virtio-blk-test.o $(libqos-virtio-obj-y)
 tests/virtio-net-test$(EXESUF): tests/virtio-net-test.o
 tests/virtio-rng-test$(EXESUF): tests/virtio-rng-test.o
 tests/virtio-scsi-test$(EXESUF): tests/virtio-scsi-test.o
diff --git a/tests/libqos/virtio-pci.c b/tests/libqos/virtio-pci.c
new file mode 100644
index 000..fde1b1f
--- /dev/null
+++ b/tests/libqos/virtio-pci.c
@@ -0,0 +1,75 @@
+/*
+ * libqos virtio PCI driver
+ *
+ * Copyright (c) 2014 Marc Marí
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include 
+#include "libqtest.h"
+#include "libqos/virtio.h"
+#include "libqos/virtio-pci.h"
+#include "libqos/pci.h"
+#include "libqos/pci-pc.h"
+
+#include "hw/pci/pci_regs.h"
+
+typedef struct QVirtioPCIForeachData {
+void (*func)(QVirtioDevice *d, void *data);
+uint16_t device_type;
+void *user_data;
+} QVirtioPCIForeachData;
+
+static QVirtioPCIDevice *qpcidevice_to_qvirtiodevice(QPCIDevice *pdev)
+{
+QVirtioPCIDevice *vpcidev;
+vpcidev = g_malloc0(sizeof(*vpcidev));
+
+if (pdev) {
+vpcidev->pdev = pdev;
+vpcidev->vdev.device_type =
+qpci_config_readw(vpcidev->pdev, PCI_SUBSYSTEM_ID);
+}
+
+return vpcidev;
+}
+
+static void qvirtio_pci_foreach_callback(
+QPCIDevice *dev, int devfn, void *data)
+{
+QVirtioPCIForeachData *d = data;
+QVirtioPCIDevice *vpcidev = qpcidevice_to_qvirtiodevice(dev);
+
+if (vpcidev->vdev.device_type == d->device_type) {
+d->func(&vpcidev->vdev, d->user_data);
+} else {
+g_free(vpcidev);
+}
+}
+
+static void qvirtio_pci_assign_device(QVirtioDevice *d, void *data)
+{
+QVirtioPCIDevice **vpcidev = data;
+*vpcidev = (QVirtioPCIDevice *)d;
+}
+
+void qvirtio_pci_foreach(QPCIBus *bus, uint16_t device_type,
+void (*func)(QVirtioDevice *d, void *data), void *data)
+{
+QVirtioPCIForeachData d = { .func = func,
+.device_type = device_type,
+.user_data = data };
+
+qpci_device_foreach(bus, QVIRTIO_VENDOR_ID, -1,
+qvirtio_pci_foreach_callback, &d);
+}
+
+QVirtioPCIDevice *qvirtio_pci_device_find(QPCIBus *bus, uint16_t device_type)
+{
+QVirtioPCIDevice *dev = NULL;
+qvirtio_pci_foreach(bus, device_type, qvirtio_pci_assign_device, &dev);
+
+return dev;
+}
diff --git a/tests/libqos/virtio-pci.h b/tests/libqos/virtio-pci.h
new file mode 100644
index 000..5101abb
--- /dev/null
+++ b/tests/libqos/virtio-pci.h
@@ -0,0 +1,24 @@
+/*
+ * libqos virtio PCI definitions
+ *
+ * Copyright (c) 2014 Marc Marí
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef LIBQOS_VIRTIO_PCI_H
+#define LIBQOS_VIRTIO_PCI_H
+
+#include "libqos/virtio.h"
+#include "libqos/pci.h"
+
+typedef struct QVirtioPCIDevice {
+QVirtioDevice vdev;
+QPCIDevice *pdev;
+} QVirtioPCIDevice;
+
+void qvirtio_pci_foreach(QPCIBus *bus, uint16_t device_type,
+void (*func)(QVirtioDevice *d, void *data), void *data);
+QVirtioPCIDevice *qvirtio_pci_device_find(QPCIBus *bus, uint16_t device_type);
+#endif
diff --git a/tests/libqos/virtio.h b/tests/libqos/virtio.h
new file mode 100644
index 000..2a05798
--- /dev/null
+
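
The foreach/find pattern in virtio-pci.c above (a generic iterator that takes a
callback plus an opaque data pointer, and a "find" built on top of it) can be
sketched without any PCI code. Everything here is hypothetical: ExampleDev, the
array standing in for the bus, and the function names are assumptions made for
illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a discovered device; only the type matters here. */
typedef struct ExampleDev {
    uint16_t device_type;
} ExampleDev;

typedef void (*example_dev_cb)(ExampleDev *dev, void *data);

/* Call func on every device in devs[] whose type matches device_type. */
void example_foreach(ExampleDev *devs, size_t n, uint16_t device_type,
                     example_dev_cb func, void *data)
{
    for (size_t i = 0; i < n; ++i) {
        if (devs[i].device_type == device_type) {
            func(&devs[i], data);
        }
    }
}

/* Callback that stores the visited device through the data pointer. */
void example_assign(ExampleDev *dev, void *data)
{
    ExampleDev **out = data;
    *out = dev;
}

/* "Find" expressed through the iterator; NULL when nothing matches. */
ExampleDev *example_find(ExampleDev *devs, size_t n, uint16_t device_type)
{
    ExampleDev *found = NULL;

    example_foreach(devs, n, device_type, example_assign, &found);
    return found;
}
```

Because the assign callback overwrites its output on every call, a find written
this way returns the last matching device when several match.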

[Qemu-devel] [PATCH v8 3/7] libqos: Added basic virtqueue support to virtio implementation

2014-09-01 Thread Marc Marí

Add status changing and feature negotiation.
Add basic virtqueue support for adding and sending virtqueue requests.
Add ISR checking.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Marc Marí 
---
 tests/libqos/virtio-pci.c |   82 +
 tests/libqos/virtio-pci.h |2 +
 tests/libqos/virtio.c |  100 +
 tests/libqos/virtio.h |   99 +
 tests/virtio-blk-test.c   |  178 -
 5 files changed, 458 insertions(+), 3 deletions(-)

diff --git a/tests/libqos/virtio-pci.c b/tests/libqos/virtio-pci.c
index 1a37620..12b06a2 100644
--- a/tests/libqos/virtio-pci.c
+++ b/tests/libqos/virtio-pci.c
@@ -14,6 +14,8 @@
 #include "libqos/virtio-pci.h"
 #include "libqos/pci.h"
 #include "libqos/pci-pc.h"
+#include "libqos/malloc.h"
+#include "libqos/malloc-pc.h"
 
 #include "hw/pci/pci_regs.h"
 
@@ -93,6 +95,18 @@ static uint64_t qvirtio_pci_config_readq(QVirtioDevice *d, void *addr)
 return u64;
 }
 
+static uint32_t qvirtio_pci_get_features(QVirtioDevice *d)
+{
+QVirtioPCIDevice *dev = (QVirtioPCIDevice *)d;
+return qpci_io_readl(dev->pdev, dev->addr + QVIRTIO_DEVICE_FEATURES);
+}
+
+static void qvirtio_pci_set_features(QVirtioDevice *d, uint32_t features)
+{
+QVirtioPCIDevice *dev = (QVirtioPCIDevice *)d;
+qpci_io_writel(dev->pdev, dev->addr + QVIRTIO_GUEST_FEATURES, features);
+}
+
 static uint8_t qvirtio_pci_get_status(QVirtioDevice *d)
 {
 QVirtioPCIDevice *dev = (QVirtioPCIDevice *)d;
@@ -105,13 +119,81 @@ static void qvirtio_pci_set_status(QVirtioDevice *d, uint8_t status)
 qpci_io_writeb(dev->pdev, dev->addr + QVIRTIO_DEVICE_STATUS, status);
 }
 
+static uint8_t qvirtio_pci_get_isr_status(QVirtioDevice *d)
+{
+QVirtioPCIDevice *dev = (QVirtioPCIDevice *)d;
+return qpci_io_readb(dev->pdev, dev->addr + QVIRTIO_ISR_STATUS);
+}
+
+static void qvirtio_pci_queue_select(QVirtioDevice *d, uint16_t index)
+{
+QVirtioPCIDevice *dev = (QVirtioPCIDevice *)d;
+qpci_io_writeb(dev->pdev, dev->addr + QVIRTIO_QUEUE_SELECT, index);
+}
+
+static uint16_t qvirtio_pci_get_queue_size(QVirtioDevice *d)
+{
+QVirtioPCIDevice *dev = (QVirtioPCIDevice *)d;
+return qpci_io_readw(dev->pdev, dev->addr + QVIRTIO_QUEUE_SIZE);
+}
+
+static void qvirtio_pci_set_queue_address(QVirtioDevice *d, uint32_t pfn)
+{
+QVirtioPCIDevice *dev = (QVirtioPCIDevice *)d;
+qpci_io_writel(dev->pdev, dev->addr + QVIRTIO_QUEUE_ADDRESS, pfn);
+}
+
+static QVirtQueue *qvirtio_pci_virtqueue_setup(QVirtioDevice *d,
+QGuestAllocator *alloc, uint16_t index)
+{
+uint64_t addr;
+QVirtQueue *vq;
+
+vq = g_malloc0(sizeof(*vq));
+
+qvirtio_pci_queue_select(d, index);
+vq->index = index;
+vq->size = qvirtio_pci_get_queue_size(d);
+vq->free_head = 0;
+vq->num_free = vq->size;
+vq->align = QVIRTIO_PCI_ALIGN;
+
+/* Check different than 0 */
+g_assert_cmpint(vq->size, !=, 0);
+
+/* Check power of 2 */
+g_assert_cmpint(vq->size & (vq->size - 1), ==, 0);
+
+addr = guest_alloc(alloc, qvring_size(vq->size, QVIRTIO_PCI_ALIGN));
+qvring_init(alloc, vq, addr);
+qvirtio_pci_set_queue_address(d, vq->desc / QVIRTIO_PCI_ALIGN);
+
+/* TODO: MSI-X configuration */
+
+return vq;
+}
+
+static void qvirtio_pci_virtqueue_kick(QVirtioDevice *d, QVirtQueue *vq)
+{
+QVirtioPCIDevice *dev = (QVirtioPCIDevice *)d;
+qpci_io_writew(dev->pdev, dev->addr + QVIRTIO_QUEUE_NOTIFY, vq->index);
+}
+
 const QVirtioBus qvirtio_pci = {
 .config_readb = qvirtio_pci_config_readb,
 .config_readw = qvirtio_pci_config_readw,
 .config_readl = qvirtio_pci_config_readl,
 .config_readq = qvirtio_pci_config_readq,
+.get_features = qvirtio_pci_get_features,
+.set_features = qvirtio_pci_set_features,
 .get_status = qvirtio_pci_get_status,
 .set_status = qvirtio_pci_set_status,
+.get_isr_status = qvirtio_pci_get_isr_status,
+.queue_select = qvirtio_pci_queue_select,
+.get_queue_size = qvirtio_pci_get_queue_size,
+.set_queue_address = qvirtio_pci_set_queue_address,
+.virtqueue_setup = qvirtio_pci_virtqueue_setup,
+.virtqueue_kick = qvirtio_pci_virtqueue_kick,
 };
 
 void qvirtio_pci_foreach(QPCIBus *bus, uint16_t device_type,
diff --git a/tests/libqos/virtio-pci.h b/tests/libqos/virtio-pci.h
index 26f902e..40bd12d 100644
--- a/tests/libqos/virtio-pci.h
+++ b/tests/libqos/virtio-pci.h
@@ -26,6 +26,8 @@
 #define QVIRTIO_DEVICE_SPECIFIC_MSIX0x18
 #define QVIRTIO_DEVICE_SPECIFIC_NO_MSIX 0x14
 
+#define QVIRTIO_PCI_ALIGN   4096
+
 typedef struct QVirtioPCIDevice {
 QVirtioDevice vdev;
 QPCIDevice *pdev;
diff --git a/tests/libqos/virtio.c b/tests/libqos/virtio.c
index 577d679..de92642 100644
--- a/tests/libqos/virtio.c
+++ b/tests/libqos/virtio.c
@@ -35,6 +35,23 @@ uint64_t qvirtio_config_readq(const QVirtioBus *bus, QVirtioDevice *d,
 return bus->config_readq(d, addr);
 }
 
+uin
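
Two small calculations in qvirtio_pci_virtqueue_setup above are worth isolating:
the power-of-two check on the queue size and the conversion of the ring address
to a page frame number. A minimal sketch, assuming a 4096-byte alignment as in
QVIRTIO_PCI_ALIGN; the function and macro names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_PCI_ALIGN 4096u  /* assumed to match QVIRTIO_PCI_ALIGN */

/* A queue size is accepted when it is non-zero and a power of two. */
bool example_queue_size_ok(uint16_t size)
{
    /* A power of two has a single bit set, so size & (size - 1) is 0. */
    return size != 0 && (size & (size - 1)) == 0;
}

/* The queue address register takes a page frame number, not an address. */
uint32_t example_addr_to_pfn(uint64_t ring_addr)
{
    return (uint32_t)(ring_addr / EXAMPLE_PCI_ALIGN);
}
```

The explicit non-zero test is needed because 0 & (0 - 1) is also 0, which is why
the patch asserts vq->size != 0 separately.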

[Qemu-devel] [PATCH v8 6/7] libqos: Added MSI-X support

2014-09-01 Thread Marc Marí

Added MSI-X support for qtest PCI.
Added MSI-X support for virtio-pci.
Added MSI-X test case in virtio-blk-test.

Signed-off-by: Marc Marí 
---
 tests/libqos/pci.c|  111 +++-
 tests/libqos/pci.h|   10 +++
 tests/libqos/virtio-pci.c |  142 ++-
 tests/libqos/virtio-pci.h |   17 +
 tests/libqos/virtio.c |   17 -
 tests/libqos/virtio.h |   11 ++-
 tests/virtio-blk-test.c   |  180 -
 7 files changed, 426 insertions(+), 62 deletions(-)

diff --git a/tests/libqos/pci.c b/tests/libqos/pci.c
index ce0b308..d5ce683 100644
--- a/tests/libqos/pci.c
+++ b/tests/libqos/pci.c
@@ -15,8 +15,6 @@
 #include "hw/pci/pci_regs.h"
 #include 
 
-#include 
-
 void qpci_device_foreach(QPCIBus *bus, int vendor_id, int device_id,
  void (*func)(QPCIDevice *dev, int devfn, void *data),
  void *data)
@@ -75,6 +73,115 @@ void qpci_device_enable(QPCIDevice *dev)
 qpci_config_writew(dev, PCI_COMMAND, cmd);
 }
 
+uint8_t qpci_find_capability(QPCIDevice *dev, uint8_t id)
+{
+uint8_t cap;
+uint8_t addr = qpci_config_readb(dev, PCI_CAPABILITY_LIST);
+
+do {
+cap = qpci_config_readb(dev, addr);
+if (cap != id) {
+addr = qpci_config_readb(dev, addr + PCI_CAP_LIST_NEXT);
+}
+} while (cap != id && addr != 0);
+
+return addr;
+}
+
+void qpci_msix_enable(QPCIDevice *dev)
+{
+uint8_t addr;
+uint16_t val;
+uint32_t table;
+uint8_t bir_table;
+uint8_t bir_pba;
+void *offset;
+
+addr = qpci_find_capability(dev, PCI_CAP_ID_MSIX);
+g_assert_cmphex(addr, !=, 0);
+
+val = qpci_config_readw(dev, addr + PCI_MSIX_FLAGS);
+qpci_config_writew(dev, addr + PCI_MSIX_FLAGS, val | PCI_MSIX_FLAGS_ENABLE);
+
+table = qpci_config_readl(dev, addr + PCI_MSIX_TABLE);
+bir_table = table & PCI_MSIX_FLAGS_BIRMASK;
+offset = qpci_iomap(dev, bir_table, NULL);
+dev->msix_table = offset + (table & ~PCI_MSIX_FLAGS_BIRMASK);
+
+table = qpci_config_readl(dev, addr + PCI_MSIX_PBA);
+bir_pba = table & PCI_MSIX_FLAGS_BIRMASK;
+if (bir_pba != bir_table) {
+offset = qpci_iomap(dev, bir_pba, NULL);
+}
+dev->msix_pba = offset + (table & ~PCI_MSIX_FLAGS_BIRMASK);
+
+g_assert(dev->msix_table != NULL);
+g_assert(dev->msix_pba != NULL);
+dev->msix_enabled = true;
+}
+
+void qpci_msix_disable(QPCIDevice *dev)
+{
+uint8_t addr;
+uint16_t val;
+
+g_assert(dev->msix_enabled);
+addr = qpci_find_capability(dev, PCI_CAP_ID_MSIX);
+g_assert_cmphex(addr, !=, 0);
+val = qpci_config_readw(dev, addr + PCI_MSIX_FLAGS);
+qpci_config_writew(dev, addr + PCI_MSIX_FLAGS,
+val & ~PCI_MSIX_FLAGS_ENABLE);
+
+qpci_iounmap(dev, dev->msix_table);
+qpci_iounmap(dev, dev->msix_pba);
+dev->msix_enabled = 0;
+dev->msix_table = NULL;
+dev->msix_pba = NULL;
+}
+
+bool qpci_msix_pending(QPCIDevice *dev, uint16_t entry)
+{
+uint32_t pba_entry;
+uint8_t bit_n = entry % 32;
+void *addr = dev->msix_pba + (entry / 32) * PCI_MSIX_ENTRY_SIZE / 4;
+
+g_assert(dev->msix_enabled);
+pba_entry = qpci_io_readl(dev, addr);
+qpci_io_writel(dev, addr, pba_entry & ~(1 << bit_n));
+return (pba_entry & (1 << bit_n)) != 0;
+}
+
+bool qpci_msix_masked(QPCIDevice *dev, uint16_t entry)
+{
+uint8_t addr;
+uint16_t val;
+void *vector_addr = dev->msix_table + (entry * PCI_MSIX_ENTRY_SIZE);
+
+g_assert(dev->msix_enabled);
+addr = qpci_find_capability(dev, PCI_CAP_ID_MSIX);
+g_assert_cmphex(addr, !=, 0);
+val = qpci_config_readw(dev, addr + PCI_MSIX_FLAGS);
+
+if (val & PCI_MSIX_FLAGS_MASKALL) {
+return true;
+} else {
+return (qpci_io_readl(dev, vector_addr + PCI_MSIX_ENTRY_VECTOR_CTRL)
+& PCI_MSIX_ENTRY_CTRL_MASKBIT) != 0;
+}
+}
+
+uint16_t qpci_msix_table_size(QPCIDevice *dev)
+{
+uint8_t addr;
+uint16_t control;
+
+addr = qpci_find_capability(dev, PCI_CAP_ID_MSIX);
+g_assert_cmphex(addr, !=, 0);
+
+control = qpci_config_readw(dev, addr + PCI_MSIX_FLAGS);
+return (control & PCI_MSIX_FLAGS_QSIZE) + 1;
+}
+
 uint8_t qpci_config_readb(QPCIDevice *dev, uint8_t offset)
 {
 return dev->bus->config_readb(dev->bus, dev->devfn, offset);
diff --git a/tests/libqos/pci.h b/tests/libqos/pci.h
index 9ee048b..d51eb9e 100644
--- a/tests/libqos/pci.h
+++ b/tests/libqos/pci.h
@@ -14,6 +14,7 @@
 #define LIBQOS_PCI_H
 
 #include 
+#include "libqtest.h"
 
 #define QPCI_DEVFN(dev, fn) (((dev) << 3) | (fn))
 
@@ -49,6 +50,9 @@ struct QPCIDevice
 {
 QPCIBus *bus;
 int devfn;
+bool msix_enabled;
+void *msix_table;
+void *msix_pba;
 };
 
 void qpci_device_foreach(QPCIBus *bus, int vendor_id, int device_id,
@@ -57,6 +61,12 @@ void qpci_device_foreach(QPCIBus *bus, int vendor_
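
The capability walk done by qpci_find_capability above can be tried against a
fake 256-byte configuration space. This sketch is not the libqos function: it
takes an in-memory array instead of a QPCIDevice, the EXAMPLE_ constants are
local copies of the usual PCI values (capability pointer at 0x34, next pointer
at offset 1, MSI-X capability ID 0x11), and it returns 0 for an empty or
exhausted list rather than reading the capability at address 0.

```c
#include <stdint.h>

#define EXAMPLE_CAPABILITY_LIST 0x34  /* offset of the first capability pointer */
#define EXAMPLE_CAP_LIST_NEXT   1     /* offset of the "next" pointer in a capability */
#define EXAMPLE_CAP_ID_MSIX     0x11  /* MSI-X capability ID */
#define EXAMPLE_MAX_CAPS        48    /* guard against a looping, malformed list */

/*
 * Walk the capability linked list in cfg[] and return the config-space
 * offset of the capability with the given id, or 0 if it is absent.
 */
uint8_t example_find_capability(const uint8_t cfg[256], uint8_t id)
{
    uint8_t addr = cfg[EXAMPLE_CAPABILITY_LIST];

    for (int hops = 0; addr != 0 && hops < EXAMPLE_MAX_CAPS; ++hops) {
        if (cfg[addr] == id) {
            return addr;
        }
        /* Follow the next pointer; uint8_t arithmetic keeps it in range. */
        addr = cfg[(uint8_t)(addr + EXAMPLE_CAP_LIST_NEXT)];
    }

    return 0;
}
```

Returning 0 for "not found" matches how the callers in the patch use the result:
they assert that the returned offset is non-zero before touching PCI_MSIX_FLAGS.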

[Qemu-devel] [PATCH v8 7/7] libqos: Added EVENT_IDX support

2014-09-01 Thread Marc Marí

Added avail_event and NO_NOTIFY check before notifying.
Added used_event setting.

Signed-off-by: Marc Marí 
---
 tests/libqos/virtio-pci.c |1 +
 tests/libqos/virtio.c |   27 +-
 tests/libqos/virtio.h |5 ++
 tests/virtio-blk-test.c   |  124 +
 4 files changed, 156 insertions(+), 1 deletion(-)

diff --git a/tests/libqos/virtio-pci.c b/tests/libqos/virtio-pci.c
index ab28717..788ebaf 100644
--- a/tests/libqos/virtio-pci.c
+++ b/tests/libqos/virtio-pci.c
@@ -203,6 +203,7 @@ static QVirtQueue *qvirtio_pci_virtqueue_setup(QVirtioDevice *d,
 vqpci->vq.num_free = vqpci->vq.size;
 vqpci->vq.align = QVIRTIO_PCI_ALIGN;
 vqpci->vq.indirect = (feat & QVIRTIO_F_RING_INDIRECT_DESC) != 0;
+vqpci->vq.event = (feat & QVIRTIO_F_RING_EVENT_IDX) != 0;
 
 vqpci->msix_entry = -1;
 vqpci->msix_addr = 0;
diff --git a/tests/libqos/virtio.c b/tests/libqos/virtio.c
index 16eaf79..128dbd0 100644
--- a/tests/libqos/virtio.c
+++ b/tests/libqos/virtio.c
@@ -124,9 +124,13 @@ void qvring_init(const QGuestAllocator *alloc, QVirtQueue *vq, uint64_t addr)
 writew(vq->avail, 0);
 /* vq->avail->idx */
 writew(vq->avail + 2, 0);
+/* vq->avail->used_event */
+writew(vq->avail + 4 + (2 * vq->size), 0);
 
 /* vq->used->flags */
 writew(vq->used, 0);
+/* vq->used->avail_event */
+writew(vq->used+2+(sizeof(struct QVRingUsedElem)*vq->size), 0);
 }
 
 QVRingIndirectDesc *qvring_indirect_desc_setup(QVirtioDevice *d,
@@ -222,11 +226,32 @@ void qvirtqueue_kick(const QVirtioBus *bus, QVirtioDevice *d, QVirtQueue *vq,
 {
 /* vq->avail->idx */
 uint16_t idx = readl(vq->avail + 2);
+/* vq->used->flags */
+uint16_t flags;
+/* vq->used->avail_event */
+uint16_t avail_event;
 
 /* vq->avail->ring[idx % vq->size] */
 writel(vq->avail + 4 + (2 * (idx % vq->size)), free_head);
 /* vq->avail->idx */
 writel(vq->avail + 2, idx + 1);
 
-bus->virtqueue_kick(d, vq);
+/* Must read after idx is updated */
+flags = readw(vq->avail);
+avail_event = readw(vq->used + 4 +
+(sizeof(struct QVRingUsedElem) * vq->size));
+
+/* < 1 because we add elements to avail queue one by one */
+if ((flags & QVRING_USED_F_NO_NOTIFY) == 0 &&
+(!vq->event || (uint16_t)(idx-avail_event) < 1)) {
+bus->virtqueue_kick(d, vq);
+}
+}
+
+void qvirtqueue_set_used_event(QVirtQueue *vq, uint16_t idx)
+{
+g_assert(vq->event);
+
+/* vq->avail->used_event */
+writew(vq->avail + 4 + (2 * vq->size), idx);
 }
diff --git a/tests/libqos/virtio.h b/tests/libqos/virtio.h
index cebccd2..70b3376 100644
--- a/tests/libqos/virtio.h
+++ b/tests/libqos/virtio.h
@@ -26,6 +26,7 @@
 #define QVIRTIO_F_ANY_LAYOUT0x0800
 #define QVIRTIO_F_RING_INDIRECT_DESC0x1000
 #define QVIRTIO_F_RING_EVENT_IDX0x2000
+#define QVIRTIO_F_BAD_FEATURE   0x4000
 
 #define QVRING_DESC_F_NEXT  0x1
 #define QVRING_DESC_F_WRITE 0x2
@@ -57,6 +58,7 @@ typedef struct QVRingAvail {
 uint16_t flags;
 uint16_t idx;
 uint16_t ring[0]; /* This is an array of uint16_t */
+uint16_t used_event;
 } QVRingAvail;
 
 typedef struct QVRingUsedElem {
@@ -68,6 +70,7 @@ typedef struct QVRingUsed {
 uint16_t flags;
 uint16_t idx;
 QVRingUsedElem ring[0]; /* This is an array of QVRingUsedElem structs */
+uint16_t avail_event;
 } QVRingUsed;
 
 typedef struct QVirtQueue {
@@ -80,6 +83,7 @@ typedef struct QVirtQueue {
 uint32_t num_free;
 uint32_t align;
 bool indirect;
+bool event;
 } QVirtQueue;
 
 typedef struct QVRingIndirectDesc {
@@ -174,4 +178,5 @@ uint32_t qvirtqueue_add_indirect(QVirtQueue *vq, QVRingIndirectDesc *indirect);
 void qvirtqueue_kick(const QVirtioBus *bus, QVirtioDevice *d, QVirtQueue *vq,
 uint32_t free_head);
 
+void qvirtqueue_set_used_event(QVirtQueue *vq, uint16_t idx);
 #endif
diff --git a/tests/virtio-blk-test.c b/tests/virtio-blk-test.c
index 026abc2..fdc6ffe 100644
--- a/tests/virtio-blk-test.c
+++ b/tests/virtio-blk-test.c
@@ -478,6 +478,129 @@ static void pci_msix(void)
 
 qvirtqueue_kick(&qvirtio_pci, &dev->vdev, &vqpci->vq, free_head);
 
+
+g_assert(qvirtio_wait_queue_isr(&qvirtio_pci, &dev->vdev, &vqpci->vq,
+QVIRTIO_BLK_TIMEOUT));
+
+status = readb(req_addr + 528);
+g_assert_cmpint(status, ==, 0);
+
+data = g_malloc0(512);
+memread(req_addr + 16, data, 512);
+g_assert_cmpstr(data, ==, "TEST");
+g_free(data);
+
+guest_free(alloc, req_addr);
+
+/* End test */
+guest_free(alloc, (uint64_t)vqpci->vq.desc);
+qpci_msix_disable(dev->pdev);
+qvirtio_pci_device_disable(dev);
+g_free(dev);
+test_end();
+}
+
+static void pci_idx(void)
+{
+QVirtioPCIDevice *dev;
+QPCIBus *bus;
+QVirtQ
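
The notification check added to qvirtqueue_kick above is a special case of the
general EVENT_IDX test used by virtio drivers. A minimal sketch of the general
form, assuming 16-bit wrapping ring indices; the function name
example_need_event is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether the other side asked to be notified.
 *   event_idx: index published by the other side (e.g. avail_event)
 *   new_idx:   ring index after adding the new entries
 *   old_idx:   ring index before adding them
 * A notification is needed when event_idx lies in [old_idx, new_idx),
 * evaluated with 16-bit wrap-around arithmetic.
 */
bool example_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
    return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}
```

With one entry added per kick, new_idx is old_idx + 1, so the expression reduces
to (uint16_t)(old_idx - event_idx) < 1, which is the form used in the patch.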

Re: [Qemu-devel] [PATCH v1 2/2] block/archipelago: Use QEMU atomic builtins

2014-09-01 Thread Chrysostomos Nanakos


On 09/01/2014 12:43 PM, Paolo Bonzini wrote:

On 01/09/2014 10:58, Chrysostomos Nanakos wrote:

Replace __sync builtins with the ones provided by QEMU
for atomic operations.

Signed-off-by: Chrysostomos Nanakos 
---
  block/archipelago.c |   11 ++-
  1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/block/archipelago.c b/block/archipelago.c
index 34f72dc..fa8cd29 100644
--- a/block/archipelago.c
+++ b/block/archipelago.c
@@ -57,6 +57,7 @@
  #include "qapi/qmp/qint.h"
  #include "qapi/qmp/qstring.h"
  #include "qapi/qmp/qjson.h"
+#include "qemu/atomic.h"
  
  #include 

  #include 
@@ -214,7 +215,7 @@ static void xseg_request_handler(void *state)
  
  xseg_put_request(s->xseg, req, s->srcport);
  
-if ((__sync_add_and_fetch(&segreq->ref, -1)) == 0) {

+if ((atomic_add_fetch(&segreq->ref, -1)) == 0) {

Why not just use "== 1" and avoid patch 1? :)

(Also, you could use atomic_fetch_dec).


Nice catch!



  if (!segreq->failed) {
  reqdata->aio_cb->ret = segreq->count;
  archipelago_finish_aiocb(reqdata);
@@ -233,7 +234,7 @@ static void xseg_request_handler(void *state)
  segreq->count += req->serviced;
  xseg_put_request(s->xseg, req, s->srcport);
  
-if ((__sync_add_and_fetch(&segreq->ref, -1)) == 0) {

+if ((atomic_add_fetch(&segreq->ref, -1)) == 0) {
  if (!segreq->failed) {
  reqdata->aio_cb->ret = segreq->count;
  archipelago_finish_aiocb(reqdata);
@@ -885,13 +886,13 @@ static int archipelago_aio_segmented_rw(BDRVArchipelagoState *s,
  return 0;
  
  err_exit:

-__sync_add_and_fetch(&segreq->failed, 1);
+atomic_add_fetch(&segreq->failed, 1);

You can use atomic_inc here.


Yes, atomic_inc seems to be a lot better. Thanks!


Paolo


  if (segments_nr == 1) {
-if (__sync_add_and_fetch(&segreq->ref, -1) == 0) {
+if (atomic_add_fetch(&segreq->ref, -1) == 0) {
  g_free(segreq);
  }
  } else {
-if ((__sync_add_and_fetch(&segreq->ref, -segments_nr + i)) == 0) {
+if ((atomic_add_fetch(&segreq->ref, -segments_nr + i)) == 0) {



What about this one? It seems easier to me to read the above and 
understand what is going on.


By any means, it's up to you to accept patch 1 :)

Regards,
Chrysostomos.



  g_free(segreq);
  }
  }
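
As a side note on the "== 1" suggestion above: a fetch-then-decrement primitive returns the old value, so testing it against 1 is equivalent to testing an add-then-fetch result against 0. The following is a minimal sketch of that equivalence, assuming C11 <stdatomic.h> as a stand-in for QEMU's qemu/atomic.h; the SegReq type and both helper names are hypothetical and exist only for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the reference count kept in a segmented request. */
typedef struct {
    atomic_int ref;
} SegReq;

/* Add-then-fetch style: true when the NEW value has reached zero. */
static bool put_ref_add_fetch(SegReq *r)
{
    /* atomic_fetch_add returns the old value; apply the delta to get the new one. */
    return atomic_fetch_add(&r->ref, -1) + (-1) == 0;
}

/* Fetch-then-decrement style: true when the OLD value was one. */
static bool put_ref_fetch_dec(SegReq *r)
{
    return atomic_fetch_sub(&r->ref, 1) == 1;
}
```

Both helpers report "last reference dropped" on exactly the same call, which is why the second form makes a separate add-then-fetch primitive unnecessary for this use.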

Re: [Qemu-devel] [PATCH 5/5] gdb: provide the name of the architecture in the target.xml

2014-09-01 Thread Peter Maydell

[ccing Andreas in case he wants to review the QOM aspects of this,
though they're fairly straightforward I think.]

On 29 August 2014 14:52, Jens Freimann  wrote:
> From: David Hildenbrand 
>
> This patch provides the name of the architecture in the target.xml if 
> available.
>
> This allows the remote gdb to detect the target architecture on its own - so
> there is no need to specify it manually (e.g. if gdb is started without a
> binary) using "set arch *arch_name*".

This is neat; I didn't realise gdb let you do this.

> The name of the architecture has been added to all archs that provide a
> target.xml (by supplying a gdb_core_xml_file) and have a unique architecture
> name in gdb's feature xml files.

What about 32-bit ARM? You set the architecture name for AArch64
but not the 32 bit case.

Are there architectures that might need to specify something
more complicated than "always the same string"? (ie is there
a case for having the target provide a "return architecture name"
method rather than a constant string?)

> Signed-off-by: David Hildenbrand 
> Acked-by: Cornelia Huck 
> Acked-by: Christian Borntraeger 
> Signed-off-by: Jens Freimann 
> Cc: Andrzej Zaborowski 
> Cc: Peter Maydell 
> Cc: Vassili Karpov (malc) 
> ---
>  gdbstub.c   | 19 ---
>  include/qom/cpu.h   |  2 ++
>  target-arm/cpu64.c  |  1 +
>  target-ppc/translate_init.c |  2 ++
>  target-s390x/cpu.c  |  1 +
>  5 files changed, 18 insertions(+), 7 deletions(-)
>
> diff --git a/gdbstub.c b/gdbstub.c
> index 8afe0b7..af82259 100644
> --- a/gdbstub.c
> +++ b/gdbstub.c
> @@ -523,13 +523,18 @@ static const char *get_feature_xml(const char *p, const 
> char **newp,
>  GDBRegisterState *r;
>  CPUState *cpu = first_cpu;
>
> -snprintf(target_xml, sizeof(target_xml),
> - "<?xml version=\"1.0\"?>"
> - "<!DOCTYPE target SYSTEM \"gdb-target.dtd\">"
> - "<target>"
> - "<xi:include href=\"%s\"/>",
> - cc->gdb_core_xml_file);
> -
> +pstrcat(target_xml, sizeof(target_xml),
> +"<?xml version=\"1.0\"?>"
> +"<!DOCTYPE target SYSTEM \"gdb-target.dtd\">"
> +"<target>");
> +if (cc->gdb_arch_name) {
> +pstrcat(target_xml, sizeof(target_xml), "<architecture>");
> +pstrcat(target_xml, sizeof(target_xml), cc->gdb_arch_name);
> +pstrcat(target_xml, sizeof(target_xml), "</architecture>");
> +}
> +pstrcat(target_xml, sizeof(target_xml), "<xi:include href=\"");
> +pstrcat(target_xml, sizeof(target_xml), cc->gdb_core_xml_file);
> +pstrcat(target_xml, sizeof(target_xml), "\"/>");
>  for (r = cpu->gdb_regs; r; r = r->next) {
>  pstrcat(target_xml, sizeof(target_xml), "<xi:include href=\"");
>  pstrcat(target_xml, sizeof(target_xml), r->xml);
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index 1aafbf5..8828b16 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -98,6 +98,7 @@ struct TranslationBlock;
>   * @vmsd: State description for migration.
>   * @gdb_num_core_regs: Number of core registers accessible to GDB.
>   * @gdb_core_xml_file: File name for core registers GDB XML description.
> + * @gdb_arch_name: Architecture name known to GDB.
>   *
>   * Represents a CPU family or model.
>   */
> @@ -147,6 +148,7 @@ typedef struct CPUClass {
>  const struct VMStateDescription *vmsd;
>  int gdb_num_core_regs;
>  const char *gdb_core_xml_file;
> +const char *gdb_arch_name;
>  } CPUClass;
>
>  #ifdef HOST_WORDS_BIGENDIAN
> diff --git a/target-arm/cpu64.c b/target-arm/cpu64.c
> index 38d2b84..9df7492 100644
> --- a/target-arm/cpu64.c
> +++ b/target-arm/cpu64.c
> @@ -201,6 +201,7 @@ static void aarch64_cpu_class_init(ObjectClass *oc, void 
> *data)
>  cc->gdb_write_register = aarch64_cpu_gdb_write_register;
>  cc->gdb_num_core_regs = 34;
>  cc->gdb_core_xml_file = "aarch64-core.xml";
> +cc->gdb_arch_name = "aarch64";
>  }
>
>  static void aarch64_cpu_register(const ARMCPUInfo *info)
> diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
> index 48177ed..7165347 100644
> --- a/target-ppc/translate_init.c
> +++ b/target-ppc/translate_init.c
> @@ -9649,8 +9649,10 @@ static void ppc_cpu_class_init(ObjectClass *oc, void 
> *data)
>
>  #if defined(TARGET_PPC64)
>  cc->gdb_core_xml_file = "power64-core.xml";
> +cc->gdb_arch_name = "powerpc:common64";
>  #else
>  cc->gdb_core_xml_file = "power-core.xml";
> +cc->gdb_arch_name = "powerpc:common";
>  #endif
>  #ifndef CONFIG_USER_ONLY
>  cc->virtio_is_big_endian = ppc_cpu_is_big_endian;
> diff --git a/target-s390x/cpu.c b/target-s390x/cpu.c
> index 4b03e42..5dae93c 100644
> --- a/target-s390x/cpu.c
> +++ b/target-s390x/cpu.c
> @@ -262,6 +262,7 @@ static void s390_cpu_class_init(ObjectClass *oc, void 
> *data)
>  dc->vmsd = &vmstate_s390_cpu;
>  cc->gdb_num_core_regs = S390_NUM_CORE_REGS;
>  cc->gdb_core_xml_file = "s390x-core64.xml";
> +cc->gdb_arch_name

Re: [Qemu-devel] [PATCH 5/5] gdb: provide the name of the architecture in the target.xml

2014-09-01 Thread Andreas Färber

On 01.09.2014 12:19, Peter Maydell wrote:
> [ccing Andreas in case he wants to review the QOM aspects of this,
> though they're fairly straightforward I think.]
> 
> On 29 August 2014 14:52, Jens Freimann  wrote:
>> From: David Hildenbrand 
>>
>> This patch provides the name of the architecture in the target.xml if 
>> available.
>>
>> This allows the remote gdb to detect the target architecture on its own - so
>> there is no need to specify it manually (e.g. if gdb is started without a
>> binary) using "set arch *arch_name*".
> 
> This is neat; I didn't realise gdb let you do this.
> 
>> The name of the architecture has been added to all archs that provide a
>> target.xml (by supplying a gdb_core_xml_file) and have a unique architecture
>> name in gdb's feature xml files.
> 
> What about 32-bit ARM? You set the architecture name for AArch64
> but not the 32 bit case.
> 
> Are there architectures that might need to specify something
> more complicated than "always the same string"? (ie is there
> a case for having the target provide a "return architecture name"
> method rather than a constant string?)
> 
>> Signed-off-by: David Hildenbrand 
>> Acked-by: Cornelia Huck 
>> Acked-by: Christian Borntraeger 
>> Signed-off-by: Jens Freimann 
>> Cc: Andrzej Zaborowski 
>> Cc: Peter Maydell 
>> Cc: Vassili Karpov (malc) 
>> ---
>>  gdbstub.c   | 19 ---
>>  include/qom/cpu.h   |  2 ++
>>  target-arm/cpu64.c  |  1 +
>>  target-ppc/translate_init.c |  2 ++
>>  target-s390x/cpu.c  |  1 +
>>  5 files changed, 18 insertions(+), 7 deletions(-)
>>
>> diff --git a/gdbstub.c b/gdbstub.c
>> index 8afe0b7..af82259 100644
>> --- a/gdbstub.c
>> +++ b/gdbstub.c
>> @@ -523,13 +523,18 @@ static const char *get_feature_xml(const char *p, 
>> const char **newp,
>>  GDBRegisterState *r;
>>  CPUState *cpu = first_cpu;
>>
>> -snprintf(target_xml, sizeof(target_xml),
>> - "<?xml version=\"1.0\"?>"
>> - "<!DOCTYPE target SYSTEM \"gdb-target.dtd\">"
>> - "<target>"
>> - "<xi:include href=\"%s\"/>",
>> - cc->gdb_core_xml_file);
>> -
>> +pstrcat(target_xml, sizeof(target_xml),
>> +"<?xml version=\"1.0\"?>"
>> +"<!DOCTYPE target SYSTEM \"gdb-target.dtd\">"
>> +"<target>");
>> +if (cc->gdb_arch_name) {
>> +pstrcat(target_xml, sizeof(target_xml), "<architecture>");
>> +pstrcat(target_xml, sizeof(target_xml), cc->gdb_arch_name);
>> +pstrcat(target_xml, sizeof(target_xml), "</architecture>");
>> +}
>> +pstrcat(target_xml, sizeof(target_xml), "<xi:include href=\"");
>> +pstrcat(target_xml, sizeof(target_xml), cc->gdb_core_xml_file);
>> +pstrcat(target_xml, sizeof(target_xml), "\"/>");
>>  for (r = cpu->gdb_regs; r; r = r->next) {
>>  pstrcat(target_xml, sizeof(target_xml), "<xi:include href=\"");
>>  pstrcat(target_xml, sizeof(target_xml), r->xml);
>> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
>> index 1aafbf5..8828b16 100644
>> --- a/include/qom/cpu.h
>> +++ b/include/qom/cpu.h
>> @@ -98,6 +98,7 @@ struct TranslationBlock;
>>   * @vmsd: State description for migration.
>>   * @gdb_num_core_regs: Number of core registers accessible to GDB.
>>   * @gdb_core_xml_file: File name for core registers GDB XML description.
>> + * @gdb_arch_name: Architecture name known to GDB.
>>   *
>>   * Represents a CPU family or model.
>>   */
>> @@ -147,6 +148,7 @@ typedef struct CPUClass {
>>  const struct VMStateDescription *vmsd;
>>  int gdb_num_core_regs;
>>  const char *gdb_core_xml_file;
>> +const char *gdb_arch_name;
>>  } CPUClass;
>>
>>  #ifdef HOST_WORDS_BIGENDIAN
>> diff --git a/target-arm/cpu64.c b/target-arm/cpu64.c
>> index 38d2b84..9df7492 100644
>> --- a/target-arm/cpu64.c
>> +++ b/target-arm/cpu64.c
>> @@ -201,6 +201,7 @@ static void aarch64_cpu_class_init(ObjectClass *oc, void 
>> *data)
>>  cc->gdb_write_register = aarch64_cpu_gdb_write_register;
>>  cc->gdb_num_core_regs = 34;
>>  cc->gdb_core_xml_file = "aarch64-core.xml";
>> +cc->gdb_arch_name = "aarch64";
>>  }
>>
>>  static void aarch64_cpu_register(const ARMCPUInfo *info)
>> diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
>> index 48177ed..7165347 100644
>> --- a/target-ppc/translate_init.c
>> +++ b/target-ppc/translate_init.c
>> @@ -9649,8 +9649,10 @@ static void ppc_cpu_class_init(ObjectClass *oc, void 
>> *data)
>>
>>  #if defined(TARGET_PPC64)
>>  cc->gdb_core_xml_file = "power64-core.xml";
>> +cc->gdb_arch_name = "powerpc:common64";
>>  #else
>>  cc->gdb_core_xml_file = "power-core.xml";
>> +cc->gdb_arch_name = "powerpc:common";
>>  #endif
>>  #ifndef CONFIG_USER_ONLY
>>  cc->virtio_is_big_endian = ppc_cpu_is_big_endian;
>> diff --git a/target-s390x/cpu.c b/target-s390x/cpu.c
>> index 4b03e42..5dae93c 100644
>> --- a/target-s390x/cpu.c
>> +++ b/target-s390x/cpu.c
>> @@ -262,6 +262,7 @@ static void s390_cpu_class_init(ObjectCl

Re: [Qemu-devel] [PATCH 5/5] gdb: provide the name of the architecture in the target.xml

2014-09-01 Thread Christian Borntraeger

On 01/09/14 12:19, Peter Maydell wrote:
> [ccing Andreas in case he wants to review the QOM aspects of this,
> though they're fairly straightforward I think.]
> 
> On 29 August 2014 14:52, Jens Freimann  wrote:
>> From: David Hildenbrand 
>>
>> This patch provides the name of the architecture in the target.xml if 
>> available.
>>
>> This allows the remote gdb to detect the target architecture on its own - so
>> there is no need to specify it manually (e.g. if gdb is started without a
>> binary) using "set arch *arch_name*".
> 
> This is neat; I didn't realise gdb let you do this.
> 
>> The name of the architecture has been added to all archs that provide a
>> target.xml (by supplying a gdb_core_xml_file) and have a unique architecture
>> name in gdb's feature xml files.
> 
> What about 32-bit ARM? You set the architecture name for AArch64
> but not the 32 bit case.
> 
> Are there architectures that might need to specify something
> more complicated than "always the same string"? (ie is there
> a case for having the target provide a "return architecture name"
> method rather than a constant string?)

I don't know, and David is on vacation.
Would it make sense to do the other architectures as add-on patches, or shall we
wait with this patch until we know what is necessary for all supported
platforms?


> 
>> Signed-off-by: David Hildenbrand 
>> Acked-by: Cornelia Huck 
>> Acked-by: Christian Borntraeger 
>> Signed-off-by: Jens Freimann 
>> Cc: Andrzej Zaborowski 
>> Cc: Peter Maydell 
>> Cc: Vassili Karpov (malc) 
>> ---
>>  gdbstub.c   | 19 ---
>>  include/qom/cpu.h   |  2 ++
>>  target-arm/cpu64.c  |  1 +
>>  target-ppc/translate_init.c |  2 ++
>>  target-s390x/cpu.c  |  1 +
>>  5 files changed, 18 insertions(+), 7 deletions(-)
>>
>> diff --git a/gdbstub.c b/gdbstub.c
>> index 8afe0b7..af82259 100644
>> --- a/gdbstub.c
>> +++ b/gdbstub.c
>> @@ -523,13 +523,18 @@ static const char *get_feature_xml(const char *p, 
>> const char **newp,
>>  GDBRegisterState *r;
>>  CPUState *cpu = first_cpu;
>>
>> -snprintf(target_xml, sizeof(target_xml),
>> - "<?xml version=\"1.0\"?>"
>> - "<!DOCTYPE target SYSTEM \"gdb-target.dtd\">"
>> - "<target>"
>> - "<xi:include href=\"%s\"/>",
>> - cc->gdb_core_xml_file);
>> -
>> +pstrcat(target_xml, sizeof(target_xml),
>> +"<?xml version=\"1.0\"?>"
>> +"<!DOCTYPE target SYSTEM \"gdb-target.dtd\">"
>> +"<target>");
>> +if (cc->gdb_arch_name) {
>> +pstrcat(target_xml, sizeof(target_xml), "<architecture>");
>> +pstrcat(target_xml, sizeof(target_xml), cc->gdb_arch_name);
>> +pstrcat(target_xml, sizeof(target_xml), "</architecture>");
>> +}
>> +pstrcat(target_xml, sizeof(target_xml), "<xi:include href=\"");
>> +pstrcat(target_xml, sizeof(target_xml), cc->gdb_core_xml_file);
>> +pstrcat(target_xml, sizeof(target_xml), "\"/>");
>>  for (r = cpu->gdb_regs; r; r = r->next) {
>>  pstrcat(target_xml, sizeof(target_xml), "<xi:include href=\"");
>>  pstrcat(target_xml, sizeof(target_xml), r->xml);
>> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
>> index 1aafbf5..8828b16 100644
>> --- a/include/qom/cpu.h
>> +++ b/include/qom/cpu.h
>> @@ -98,6 +98,7 @@ struct TranslationBlock;
>>   * @vmsd: State description for migration.
>>   * @gdb_num_core_regs: Number of core registers accessible to GDB.
>>   * @gdb_core_xml_file: File name for core registers GDB XML description.
>> + * @gdb_arch_name: Architecture name known to GDB.
>>   *
>>   * Represents a CPU family or model.
>>   */
>> @@ -147,6 +148,7 @@ typedef struct CPUClass {
>>  const struct VMStateDescription *vmsd;
>>  int gdb_num_core_regs;
>>  const char *gdb_core_xml_file;
>> +const char *gdb_arch_name;
>>  } CPUClass;
>>
>>  #ifdef HOST_WORDS_BIGENDIAN
>> diff --git a/target-arm/cpu64.c b/target-arm/cpu64.c
>> index 38d2b84..9df7492 100644
>> --- a/target-arm/cpu64.c
>> +++ b/target-arm/cpu64.c
>> @@ -201,6 +201,7 @@ static void aarch64_cpu_class_init(ObjectClass *oc, void 
>> *data)
>>  cc->gdb_write_register = aarch64_cpu_gdb_write_register;
>>  cc->gdb_num_core_regs = 34;
>>  cc->gdb_core_xml_file = "aarch64-core.xml";
>> +cc->gdb_arch_name = "aarch64";
>>  }
>>
>>  static void aarch64_cpu_register(const ARMCPUInfo *info)
>> diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
>> index 48177ed..7165347 100644
>> --- a/target-ppc/translate_init.c
>> +++ b/target-ppc/translate_init.c
>> @@ -9649,8 +9649,10 @@ static void ppc_cpu_class_init(ObjectClass *oc, void 
>> *data)
>>
>>  #if defined(TARGET_PPC64)
>>  cc->gdb_core_xml_file = "power64-core.xml";
>> +cc->gdb_arch_name = "powerpc:common64";
>>  #else
>>  cc->gdb_core_xml_file = "power-core.xml";
>> +cc->gdb_arch_name = "powerpc:common";
>>  #endif
>>  #ifndef CONFIG_USER_ONLY
>>  cc->virtio_is_big_endian = ppc_cpu_is_big_endian;
>> diff

Re: [Qemu-devel] [PATCH 1/1] hw/pci-assign: split pci-assign.c

2014-09-01 Thread Chen, Tiejun


On 2014/9/1 17:53, Michael S. Tsirkin wrote:

On Mon, Sep 01, 2014 at 05:26:24PM +0800, Chen, Tiejun wrote:

On 2014/9/1 16:27, Michael S. Tsirkin wrote:

On Mon, Sep 01, 2014 at 10:07:19AM +0800, Tiejun Chen wrote:

We want to reuse assigned_dev_load_option_rom on the Xen side, and
this is also a good first step towards unifying the PCI assignment code
for KVM and Xen in the future.

Signed-off-by: Tiejun Chen 
---


[snip]


+ */
+#ifndef PCI_ASSIGN_H
+#define PCI_ASSIGN_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "hw/hw.h"
+#include "hw/i386/pc.h"
+#include "qemu/error-report.h"
+#include "ui/console.h"
+#include "hw/loader.h"
+#include "monitor/monitor.h"
+#include "qemu/range.h"
+#include "sysemu/sysemu.h"
+#include "hw/pci/pci.h"
+#include "hw/pci/msi.h"
+#include "kvm_i386.h"


Why are you pulling all these headers here?
Please include the minimum required.


So I'll leave just #include "hw/pci/pci.h".




+
+#define MSIX_PAGE_SIZE 0x1000
+
+/* From linux/ioport.h */
+#define IORESOURCE_IO   0x0100  /* Resource type */
+#define IORESOURCE_MEM  0x0200


[snip]


+uint8_t emulate_config_read[PCI_CONFIG_SPACE_SIZE];
+uint8_t emulate_config_write[PCI_CONFIG_SPACE_SIZE];
+int msi_virq_nr;
+int *msi_virq;
+MSIXTableEntry *msix_table;
+hwaddr msix_table_addr;
+uint16_t msix_max;
+MemoryRegion mmio;
+char *configfd_name;
+int32_t bootindex;
+} AssignedDevice;
+


Why are you moving the above here?


As I said in the patch description, I think this is a good first step towards
unifying pci-assign for both KVM and Xen, so I tried to move this common
code. Although we might not use it directly in the future, I guess we
still need to move it into this header file.

If you think we should do this strictly on demand, I can move it back to
pci-assign.c.


Yes, I think this is better on demand.


Okay, I'll restore this in the next revision.

Thanks
Tiejun







+int dev_load_option_rom(PCIDevice *dev, struct Object *owner, void *ptr,
+unsigned int domain, unsigned int bus,
+unsigned int slot, unsigned int function);


Please use a header-specific prefix to avoid global namespace pollution.
pci_assign_dev_load_option_rom?


Looks good, so I will follow your suggestion.

Thanks
Tiejun




+#endif /* PCI_ASSIGN_H */
--
1.9.1

Re: [Qemu-devel] [PATCH 5/5] gdb: provide the name of the architecture in the target.xml

2014-09-01 Thread Peter Maydell

On 1 September 2014 11:31, Christian Borntraeger  wrote:
> On 01/09/14 12:19, Peter Maydell wrote:
>> [ccing Andreas in case he wants to review the QOM aspects of this,
>> though they're fairly straightforward I think.]
>> On 29 August 2014 14:52, Jens Freimann  wrote:
>>> The name of the architecture has been added to all archs that provide a
>>> target.xml (by supplying a gdb_core_xml_file) and have a unique architecture
>>> name in gdb's feature xml files.
>>
>> What about 32-bit ARM? You set the architecture name for AArch64
>> but not the 32 bit case.
>>
>> Are there architectures that might need to specify something
>> more complicated than "always the same string"? (ie is there
>> a case for having the target provide a "return architecture name"
>> method rather than a constant string?)
>
> I don't know and David is on vacation.
> Would it make sense to do the other architectures as addon patches
> or shall we wait with this patch until we know what is necessary for
> all supported platforms?

We don't have to implement all the architectures at once, but
I would prefer it if we can get the QOM API for it right from
the start rather than having to change it (and all the targets
which implemented it the old way) later.

thanks
-- PMM
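
To make the "constant string versus method" question concrete, here is a minimal sketch of the method variant. It is not the QEMU QOM API: DemoCPU, DemoCPUClass, demo_ppc_arch_name and demo_format_arch are hypothetical names, and the is_64bit field is an assumed piece of per-CPU state. The only parts taken from the thread are the architecture strings and the idea that the name ends up in an <architecture> element of target.xml.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-CPU state; a real CPU object carries far more than this. */
typedef struct DemoCPU {
    int is_64bit;
} DemoCPU;

/* Hypothetical class struct: a hook instead of a "const char *" field. */
typedef struct DemoCPUClass {
    const char *(*gdb_arch_name)(const DemoCPU *cpu);
} DemoCPUClass;

/* A target whose architecture name depends on the state of the CPU. */
static const char *demo_ppc_arch_name(const DemoCPU *cpu)
{
    return cpu->is_64bit ? "powerpc:common64" : "powerpc:common";
}

/*
 * Format the <architecture> element into buf.
 * Returns the number of characters written, 0 if the class provides no
 * hook (buf is then left empty), or -1 if buf is too small.
 */
static int demo_format_arch(const DemoCPUClass *cc, const DemoCPU *cpu,
                            char *buf, size_t len)
{
    int n;

    if (!cc->gdb_arch_name) {
        if (len > 0) {
            buf[0] = '\0';
        }
        return 0;
    }
    n = snprintf(buf, len, "<architecture>%s</architecture>",
                 cc->gdb_arch_name(cpu));
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

With a hook, targets that always report the same string simply return a literal, while targets that need a run-time decision can make one without a later API change.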

Re: [Qemu-devel] [PATCH v1 2/2] block/archipelago: Use QEMU atomic builtins

2014-09-01 Thread Paolo Bonzini

On 01/09/2014 12:13, Chrysostomos Nanakos wrote:
>>>
>>> -if ((__sync_add_and_fetch(&segreq->ref, -segments_nr + i))
>>> == 0) {
>>> +if ((atomic_add_fetch(&segreq->ref, -segments_nr + i)) == 0) {
> 
> 
> What about this one? It seems easier to me to read the above and
> understand what is going on.
> 
> By any means, it's up to you to accept patch 1 :)

Yeah, you win on this one. :)  But this code could use some
refactoring...

Also, there is a missing memory barrier before the
first archipelago_submit_request call, and setting "failed" does
not need an atomic operation.

Something like this (untested):

diff --git a/block/archipelago.c b/block/archipelago.c
index 34f72dc..c71898a 100644
--- a/block/archipelago.c
+++ b/block/archipelago.c
@@ -824,76 +824,47 @@ static int 
archipelago_aio_segmented_rw(BDRVArchipelagoState *s,
 ArchipelagoAIOCB *aio_cb,
 int op)
 {
-int i, ret, segments_nr, last_segment_size;
+int i, ret, segments_nr;
+size_t pos;
 ArchipelagoSegmentedRequest *segreq;
 
-segreq = g_new(ArchipelagoSegmentedRequest, 1);
+segreq = g_new0(ArchipelagoSegmentedRequest, 1);
 
 if (op == ARCHIP_OP_FLUSH) {
 segments_nr = 1;
-segreq->ref = segments_nr;
-segreq->total = count;
-segreq->count = 0;
-segreq->failed = 0;
-ret = archipelago_submit_request(s, 0, count, offset, aio_cb,
-   segreq, ARCHIP_OP_FLUSH);
-if (ret < 0) {
-goto err_exit;
-}
-return 0;
+} else {
+segments_nr = (int)(count / MAX_REQUEST_SIZE) + \
+  ((count % MAX_REQUEST_SIZE) ? 1 : 0);
 }
 
-segments_nr = (int)(count / MAX_REQUEST_SIZE) + \
-  ((count % MAX_REQUEST_SIZE) ? 1 : 0);
-last_segment_size = (int)(count % MAX_REQUEST_SIZE);
-
-segreq->ref = segments_nr;
 segreq->total = count;
-segreq->count = 0;
-segreq->failed = 0;
-
-for (i = 0; i < segments_nr - 1; i++) {
-ret = archipelago_submit_request(s, i * MAX_REQUEST_SIZE,
-   MAX_REQUEST_SIZE,
-   offset + i * MAX_REQUEST_SIZE,
-   aio_cb, segreq, op);
-
+atomic_mb_set(&segreq->ref, segments_nr);
+
+pos = 0;
+for (; segments_nr > 1; segments_nr--) {
+ret = archipelago_submit_request(s, pos,
+ MAX_REQUEST_SIZE,
+ offset + pos,
+ aio_cb, segreq, op);
 if (ret < 0) {
 goto err_exit;
 }
+count -= MAX_REQUEST_SIZE;
+pos += MAX_REQUEST_SIZE;
 }
 
-if ((segments_nr > 1) && last_segment_size) {
-ret = archipelago_submit_request(s, i * MAX_REQUEST_SIZE,
-   last_segment_size,
-   offset + i * MAX_REQUEST_SIZE,
-   aio_cb, segreq, op);
-} else if ((segments_nr > 1) && !last_segment_size) {
-ret = archipelago_submit_request(s, i * MAX_REQUEST_SIZE,
-   MAX_REQUEST_SIZE,
-   offset + i * MAX_REQUEST_SIZE,
-   aio_cb, segreq, op);
-} else if (segments_nr == 1) {
-ret = archipelago_submit_request(s, 0, count, offset, aio_cb,
-   segreq, op);
-}
-
+ret = archipelago_submit_request(s, pos, count,
+ offset + pos,
+ aio_cb, segreq, op);
 if (ret < 0) {
 goto err_exit;
 }
-
 return 0;
 
 err_exit:
-__sync_add_and_fetch(&segreq->failed, 1);
-if (segments_nr == 1) {
-if (__sync_add_and_fetch(&segreq->ref, -1) == 0) {
-g_free(segreq);
-}
-} else {
-if ((__sync_add_and_fetch(&segreq->ref, -segments_nr + i)) == 0) {
-g_free(segreq);
-}
+segreq->failed = 1;
+if (atomic_fetch_sub(&segreq->ref, segments_nr) == segments_nr) {
+g_free(segreq);
 }
 
 return ret;
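
The splitting logic in the refactoring above can be tried in isolation. The following is a minimal sketch, not the driver code: DEMO_MAX_REQUEST_SIZE is a placeholder value, and demo_split, demo_submit_fn, DemoLog and demo_record are hypothetical names. It only mirrors the loop structure of the diff: full-size segments are submitted while more than one remains, and the final call submits whatever is left.

```c
#include <stddef.h>

#define DEMO_MAX_REQUEST_SIZE 4096u   /* placeholder, not the driver's value */

typedef void (*demo_submit_fn)(size_t pos, size_t len, void *opaque);

/* Records what was submitted so the split can be inspected afterwards. */
typedef struct {
    size_t next_pos;    /* offset expected for the next segment */
    size_t last_len;    /* length of the most recent segment */
    int calls;          /* number of segments submitted */
    int contiguous;     /* cleared if a segment does not follow the previous one */
} DemoLog;

static void demo_record(size_t pos, size_t len, void *opaque)
{
    DemoLog *log = opaque;

    if (pos != log->next_pos) {
        log->contiguous = 0;
    }
    log->next_pos = pos + len;
    log->last_len = len;
    log->calls++;
}

/* Split count bytes into segments; returns the number of segments submitted. */
static int demo_split(size_t count, demo_submit_fn submit, void *opaque)
{
    int segments_nr = (int)(count / DEMO_MAX_REQUEST_SIZE) +
                      ((count % DEMO_MAX_REQUEST_SIZE) ? 1 : 0);
    int submitted = 0;
    size_t pos = 0;

    /* All segments except the last one are full-size. */
    for (; segments_nr > 1; segments_nr--) {
        submit(pos, DEMO_MAX_REQUEST_SIZE, opaque);
        count -= DEMO_MAX_REQUEST_SIZE;
        pos += DEMO_MAX_REQUEST_SIZE;
        submitted++;
    }
    /* The last segment takes the remainder, which may itself be full-size. */
    submit(pos, count, opaque);
    return submitted + 1;
}
```

Because count is decremented inside the loop, the final call needs no special case for "exact multiple of the maximum size" versus "short tail", which is what lets the three-way if/else in the old code collapse into a single call.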

[Qemu-devel] [v3][PATCH 1/1] hw/pci-assign: split pci-assign.c

2014-09-01 Thread Tiejun Chen

We want to reuse assigned_dev_load_option_rom on the Xen side, and
this is also a good first step towards unifying the PCI assignment code
for KVM and Xen in the future.

Signed-off-by: Tiejun Chen 
---
 hw/i386/kvm/pci-assign.c| 46 +
 include/hw/pci/pci-assign.h | 16 
 2 files changed, 46 insertions(+), 16 deletions(-)
 create mode 100644 include/hw/pci/pci-assign.h

diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
index 17c7d6dc..fdc7b64 100644
--- a/hw/i386/kvm/pci-assign.c
+++ b/hw/i386/kvm/pci-assign.c
@@ -37,6 +37,7 @@
 #include "hw/pci/pci.h"
 #include "hw/pci/msi.h"
 #include "kvm_i386.h"
+#include "hw/pci/pci-assign.h"
 
 #define MSIX_PAGE_SIZE 0x1000
 
@@ -1896,37 +1897,39 @@ type_init(assign_register_types)
  * load the corresponding ROM data to RAM. If an error occurs while loading an
  * option ROM, we just ignore that option ROM and continue with the next one.
  */
-static void assigned_dev_load_option_rom(AssignedDevice *dev)
+int pci_assign_dev_load_option_rom(PCIDevice *dev, struct Object *owner,
+   void *ptr, unsigned int domain,
+   unsigned int bus, unsigned int slot,
+   unsigned int function)
 {
 char name[32], rom_file[64];
 FILE *fp;
 uint8_t val;
 struct stat st;
-void *ptr;
+int size = 0;
 
 /* If loading ROM from file, pci handles it */
-if (dev->dev.romfile || !dev->dev.rom_bar) {
-return;
+if (dev->romfile || !dev->rom_bar) {
+return -1;
 }
 
 snprintf(rom_file, sizeof(rom_file),
  "/sys/bus/pci/devices/%04x:%02x:%02x.%01x/rom",
- dev->host.domain, dev->host.bus, dev->host.slot,
- dev->host.function);
+ domain, bus, slot, function);
 
 if (stat(rom_file, &st)) {
-return;
+return -1;
 }
 
 if (access(rom_file, F_OK)) {
 error_report("pci-assign: Insufficient privileges for %s", rom_file);
-return;
+return -1;
 }
 
 /* Write "1" to the ROM file to enable it */
 fp = fopen(rom_file, "r+");
 if (fp == NULL) {
-return;
+return -1;
 }
 val = 1;
 if (fwrite(&val, 1, 1, fp) != 1) {
@@ -1934,11 +1937,10 @@ static void assigned_dev_load_option_rom(AssignedDevice 
*dev)
 }
 fseek(fp, 0, SEEK_SET);
 
-snprintf(name, sizeof(name), "%s.rom",
-object_get_typename(OBJECT(dev)));
-memory_region_init_ram(&dev->dev.rom, OBJECT(dev), name, st.st_size);
-vmstate_register_ram(&dev->dev.rom, &dev->dev.qdev);
-ptr = memory_region_get_ram_ptr(&dev->dev.rom);
+snprintf(name, sizeof(name), "%s.rom", object_get_typename(owner));
+memory_region_init_ram(&dev->rom, owner, name, st.st_size);
+vmstate_register_ram(&dev->rom, &dev->qdev);
+ptr = memory_region_get_ram_ptr(&dev->rom);
 memset(ptr, 0xff, st.st_size);
 
 if (!fread(ptr, 1, st.st_size, fp)) {
@@ -1949,8 +1951,9 @@ static void assigned_dev_load_option_rom(AssignedDevice 
*dev)
 goto close_rom;
 }
 
-pci_register_bar(&dev->dev, PCI_ROM_SLOT, 0, &dev->dev.rom);
-dev->dev.has_rom = true;
+pci_register_bar(dev, PCI_ROM_SLOT, 0, &dev->rom);
+dev->has_rom = true;
+size = st.st_size;
 close_rom:
 /* Write "0" to disable ROM */
 fseek(fp, 0, SEEK_SET);
@@ -1959,4 +1962,15 @@ close_rom:
 DEBUG("%s\n", "Failed to disable pci-sysfs rom file");
 }
 fclose(fp);
+
+return size;
+}
+
+static void assigned_dev_load_option_rom(AssignedDevice *dev)
+{
+void *ptr = NULL;
+
+pci_assign_dev_load_option_rom(&dev->dev, OBJECT(dev), ptr,
+   dev->host.domain, dev->host.bus,
+   dev->host.slot, dev->host.function);
 }
diff --git a/include/hw/pci/pci-assign.h b/include/hw/pci/pci-assign.h
new file mode 100644
index 000..6d6bf93
--- /dev/null
+++ b/include/hw/pci/pci-assign.h
@@ -0,0 +1,16 @@
+/*
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Just split from hw/i386/kvm/pci-assign.c.
+ */
+#ifndef PCI_ASSIGN_H
+#define PCI_ASSIGN_H
+
+#include "hw/pci/pci.h"
+
+int pci_assign_dev_load_option_rom(PCIDevice *dev, struct Object *owner,
+   void *ptr, unsigned int domain,
+   unsigned int bus, unsigned int slot,
+   unsigned int function);
+#endif /* PCI_ASSIGN_H */
-- 
1.9.1
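
For readers unfamiliar with the sysfs layout used by the function above, the path construction can be exercised on its own. This is a minimal sketch under the assumption that only the format string from the patch matters; demo_rom_path is a hypothetical helper, not part of the patch, and it adds a truncation check that the patch itself does not perform.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build the sysfs ROM path for a host PCI device, using the same format
 * string as the patch. Returns 0 on success, -1 if buf is too small.
 */
static int demo_rom_path(char *buf, size_t len, unsigned int domain,
                         unsigned int bus, unsigned int slot,
                         unsigned int function)
{
    int n = snprintf(buf, len,
                     "/sys/bus/pci/devices/%04x:%02x:%02x.%01x/rom",
                     domain, bus, slot, function);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For domain 0, bus 0, slot 2, function 0 this yields /sys/bus/pci/devices/0000:00:02.0/rom, which fits comfortably in the 64-byte rom_file buffer used by the patch.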

[Qemu-devel] [v3][PATCH 0/1] qemu:pci-assign: try to pci-assign.c

2014-09-01 Thread Tiejun Chen

v3:

* Don't move out the common structures.
* Include only the minimum required header file.
* Rename dev_load_option_rom to pci_assign_dev_load_option_rom to avoid
  global namespace pollution.

v2:

* v1 made too much code inline, so move it to an out-of-line file.
* Rename to pci-assign, not pci_assign.

As you know, I'm working on supporting IGD passthrough.

Here we need to load the VGA BIOS to make the IGD case work. Obviously some of
this duplicates the KVM code; we should unify the code, but it looks like that
is not easy to finish in a short time. So, as Michael suggested, we should at
least split out assigned_dev_load_option_rom so it can be reused on both KVM
and Xen.

I haven't finished all the IGD patches, but here I'd like to post
some related code to show how assigned_dev_load_option_rom()
will be used later.

+static int get_vgabios(XenPCIPassthroughState *s, void *ptr,
+   XenHostPCIDevice *dev)
+{
+int size = 0;
+
+size = dev_load_option_rom(&s->dev, OBJECT(dev), ptr, dev->domain,
+   dev->bus, dev->dev, dev->func);
+
+return size;
+}
+
+int xen_pt_setup_vga(XenPCIPassthroughState *s, XenHostPCIDevice *dev)
+{
+void *bios = NULL;
+int bios_size = 0;
+int rc = 0;
+
+if (!is_vga_passthrough(dev)) {
+return rc;
+}
+
+bios_size = get_vgabios(s, bios, dev);
+if (!bios || !bios_size) {
+XEN_PT_ERR(NULL, "VGA: getting VBIOS!\n");
+rc = -1;
+goto out;
+}
...


Tiejun Chen (1):
  hw/pci-assign: split pci-assign.c
 
 hw/i386/kvm/pci-assign.c| 46 ++
 include/hw/pci/pci-assign.h | 16 
 2 files changed, 46 insertions(+), 16 deletions(-)
 create mode 100644 include/hw/pci/pci-assign.h

Thanks
Tiejun

Re: [Qemu-devel] [PATCH v1 2/2] block/archipelago: Use QEMU atomic builtins

2014-09-01 Thread Chrysostomos Nanakos


On 09/01/2014 01:33 PM, Paolo Bonzini wrote:

On 01/09/2014 12:13, Chrysostomos Nanakos wrote:

-if ((__sync_add_and_fetch(&segreq->ref, -segments_nr + i))
== 0) {
+if ((atomic_add_fetch(&segreq->ref, -segments_nr + i)) == 0) {


What about this one? It seems easier to me to read the above and
understand what is going on.

By any means, it's up to you to accept patch 1 :)

Yeah, you win on this one. :)  But this code could use some
refactoring...

Also, there is a missing memory barrier before the
first archipelago_submit_request call, and setting "failed" does
not need an atomic operation.


I have to check the refactored code thoroughly, but at first glance
it seems to be fine.


I'll come back with a new patch for block/archipelago.c then.

Thanks very much for your time and effort!

Regards,
Chrysostomos.


Something like this (untested):

diff --git a/block/archipelago.c b/block/archipelago.c
index 34f72dc..c71898a 100644
--- a/block/archipelago.c
+++ b/block/archipelago.c
@@ -824,76 +824,47 @@ static int 
archipelago_aio_segmented_rw(BDRVArchipelagoState *s,
  ArchipelagoAIOCB *aio_cb,
  int op)
  {
-int i, ret, segments_nr, last_segment_size;
+int i, ret, segments_nr;
+size_t pos;
  ArchipelagoSegmentedRequest *segreq;
  
-segreq = g_new(ArchipelagoSegmentedRequest, 1);

+segreq = g_new0(ArchipelagoSegmentedRequest, 1);
  
  if (op == ARCHIP_OP_FLUSH) {

  segments_nr = 1;
-segreq->ref = segments_nr;
-segreq->total = count;
-segreq->count = 0;
-segreq->failed = 0;
-ret = archipelago_submit_request(s, 0, count, offset, aio_cb,
-   segreq, ARCHIP_OP_FLUSH);
-if (ret < 0) {
-goto err_exit;
-}
-return 0;
+} else {
+segments_nr = (int)(count / MAX_REQUEST_SIZE) + \
+  ((count % MAX_REQUEST_SIZE) ? 1 : 0);
  }
  
-segments_nr = (int)(count / MAX_REQUEST_SIZE) + \

-  ((count % MAX_REQUEST_SIZE) ? 1 : 0);
-last_segment_size = (int)(count % MAX_REQUEST_SIZE);
-
-segreq->ref = segments_nr;
  segreq->total = count;
-segreq->count = 0;
-segreq->failed = 0;
-
-for (i = 0; i < segments_nr - 1; i++) {
-ret = archipelago_submit_request(s, i * MAX_REQUEST_SIZE,
-   MAX_REQUEST_SIZE,
-   offset + i * MAX_REQUEST_SIZE,
-   aio_cb, segreq, op);
-
+atomic_mb_set(&segreq->ref, segments_nr);
+
+pos = 0;
+for (; segments_nr > 1; segments_nr--) {
+ret = archipelago_submit_request(s, pos,
+ MAX_REQUEST_SIZE,
+ offset + pos,
+ aio_cb, segreq, op);
  if (ret < 0) {
  goto err_exit;
  }
+count -= MAX_REQUEST_SIZE;
+pos += MAX_REQUEST_SIZE;
  }
  
-if ((segments_nr > 1) && last_segment_size) {

-ret = archipelago_submit_request(s, i * MAX_REQUEST_SIZE,
-   last_segment_size,
-   offset + i * MAX_REQUEST_SIZE,
-   aio_cb, segreq, op);
-} else if ((segments_nr > 1) && !last_segment_size) {
-ret = archipelago_submit_request(s, i * MAX_REQUEST_SIZE,
-   MAX_REQUEST_SIZE,
-   offset + i * MAX_REQUEST_SIZE,
-   aio_cb, segreq, op);
-} else if (segments_nr == 1) {
-ret = archipelago_submit_request(s, 0, count, offset, aio_cb,
-   segreq, op);
-}
-
+ret = archipelago_submit_request(s, pos, count,
+ offset + pos,
+ aio_cb, segreq, op);
  if (ret < 0) {
  goto err_exit;
  }
-
  return 0;
  
  err_exit:

-__sync_add_and_fetch(&segreq->failed, 1);
-if (segments_nr == 1) {
-if (__sync_add_and_fetch(&segreq->ref, -1) == 0) {
-g_free(segreq);
-}
-} else {
-if ((__sync_add_and_fetch(&segreq->ref, -segments_nr + i)) == 0) {
-g_free(segreq);
-}
+segreq->failed = 1;
+if (atomic_fetch_sub(&segreq->ref, segments_nr) == segments_nr) {
+g_free(segreq);
  }
  
  return ret;

Re: [Qemu-devel] [libvirt] IO accounting overhaul

2014-09-01 Thread Benoît Canet

On Monday 01 Sep 2014 at 11:52:00 (+0200), Markus Armbruster wrote:
> Cc'ing libvirt following Stefan's lead.
> 
> Benoît Canet  writes:
> 
> > Hi,
> >
> > I collected some items of a cloud provider wishlist regarding I/O accounting.
> 
> Feedback from real power-users, lovely!
> 
> > In a cloud, I/O accounting can serve three purposes: billing, helping the
> > customers, and doing metrology to help the cloud provider find hidden costs.
> >
> > I'll cover the first two topics in this mail because they are the most
> > important business-wise.
> >
> > 1) preferred place to collect billing I/O accounting data:
> > 
> > For billing purposes, the collected data must be as close as possible to
> > what the customer would see by using iostat in their VM.
> 
> Good point.
> 
> > The first conclusion we can draw is that the choice of collecting I/O
> > accounting data used for billing in the block device models is right.
> 
> Slightly rephrasing: doing I/O accounting in the block device models is
> right for billing.
> 
> There may be other uses for I/O accounting, with different preferences.
> For instance, data on how exactly guest I/O gets translated to host I/O
> as it flows through the nodes in the block graph could be useful.

I think this is the third point, which I named metrology.
Basically it boils down to: "Where are the hidden I/O costs of the QEMU block
layer?"

> 
> Doesn't diminish the need for accurate billing information, of course.
> 
> > 2) what to do with occurrences of rare events:
> > -
> >
> > Another point is that QEMU developers agree that they don't know which
> > policy to apply to some I/O accounting events.
> > Must QEMU discard invalid write I/Os or account them as done?
> > Must QEMU count a failed read I/O as done?
> >
> > When discussing this with a cloud provider, the following came up: these
> > decisions are really specific to each cloud provider, and QEMU should not
> > implement them.
> 
> Good point, consistent with the old advice to avoid baking policy into
> inappropriately low levels of the stack.
> 
> > The right thing to do is to add accounting counters to collect these events.
> >
> > Moreover, these rare events are precious troubleshooting data, so that is
> > an additional reason not to toss them.
> 
> Another good point.
> 
> > 3) list of block I/O accounting metrics wished for billing and helping
> > the customers
> > ---
> >
> > Basic I/O accounting data will end up making up the customers' bills.
> > Extra I/O accounting information would be a precious help for the cloud
> > provider to implement a monitoring panel like Amazon CloudWatch.
> 
> These are the first two from your list of three purposes, i.e. the ones
> you promised to cover here.
> 
> > Here is the list of counters and statistics I would like to help
> > implement in QEMU.
> >
> > This is the most important part of the mail and the one I would most like
> > the community to review.
> >
> > Once this list is settled, I would proceed to implement the required
> > infrastructure in QEMU before using it in the device models.
> 
> For context, let me recap how I/O accounting works now.
> 
> The BlockDriverState abstract data type (short: BDS) can hold the
> following accounting data:
> 
> uint64_t nr_bytes[BDRV_MAX_IOTYPE];
> uint64_t nr_ops[BDRV_MAX_IOTYPE];
> uint64_t total_time_ns[BDRV_MAX_IOTYPE];
> uint64_t wr_highest_sector;
> 
> where BDRV_MAX_IOTYPE enumerates read, write, flush.
> 
> wr_highest_sector is a high watermark updated by the block layer as it
> writes sectors.
> 
> The other three are *not* touched by the block layer.  Instead, the
> block layer provides a pair of functions for device models to update
> them:
> 
> void bdrv_acct_start(BlockDriverState *bs, BlockAcctCookie *cookie,
> int64_t bytes, enum BlockAcctType type);
> void bdrv_acct_done(BlockDriverState *bs, BlockAcctCookie *cookie);
> 
> bdrv_acct_start() initializes cookie for a read, write, or flush
> operation of a certain size.  The size of a flush is always zero.
> 
> bdrv_acct_done() adds the operations to the BDS's accounting data.
> total_time_ns is incremented by the time between _start() and _done().
> 
> You may call _start() without calling _done().  That's a feature.
> Device models use it to avoid accounting some requests.
> 
> Device models are not supposed to mess with cookie directly, only
> through these two functions.
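
The start/done pairing described above can be sketched as a small stand-alone model. This is an illustration under stated assumptions, not QEMU's code: the type and function names (AcctStats, AcctCookie, acct_start, acct_done) are hypothetical, and timestamps are passed in explicitly so the example stays deterministic, whereas the two functions quoted above take no timestamp argument.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of the accounting data; not QEMU's definitions. */
enum AcctType { ACCT_READ, ACCT_WRITE, ACCT_FLUSH, ACCT_MAX };

typedef struct AcctStats {
    uint64_t nr_bytes[ACCT_MAX];
    uint64_t nr_ops[ACCT_MAX];
    uint64_t total_time_ns[ACCT_MAX];
} AcctStats;

typedef struct AcctCookie {
    int64_t bytes;
    int64_t start_ns;
    enum AcctType type;
} AcctCookie;

/* Remember what is about to happen; the statistics are not touched yet. */
static void acct_start(AcctCookie *cookie, int64_t bytes,
                       enum AcctType type, int64_t now_ns)
{
    assert(type < ACCT_MAX);
    cookie->bytes = bytes;
    cookie->start_ns = now_ns;
    cookie->type = type;
}

/* Fold a finished operation into the statistics. */
static void acct_done(AcctStats *stats, const AcctCookie *cookie,
                      int64_t now_ns)
{
    stats->nr_bytes[cookie->type] += (uint64_t)cookie->bytes;
    stats->nr_ops[cookie->type]++;
    stats->total_time_ns[cookie->type] += (uint64_t)(now_ns - cookie->start_ns);
}
```

Because only the done step updates the counters, a caller that never invokes it leaves the statistics unchanged, which matches the "start without done" behaviour described above.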
> 
> Some device models implement accounting, some don't.  The ones that do
> don't agree on how to count invalid guest requests (the ones not passed
> to block layer) and failed requests (passed to block layer and failed
> there).  It's a mess in part caused by us never writing down what
> exactly device models are expected to do.
> 
> Accounting data is used by "q

Re: [Qemu-devel] [v3][PATCH 1/1] hw/pci-assign: split pci-assign.c

2014-09-01 Thread Michael S. Tsirkin

On Mon, Sep 01, 2014 at 06:36:47PM +0800, Tiejun Chen wrote:
> We will try to reuse assign_dev_load_option_rom on the Xen side, and
> it is also a good first step toward unifying the PCI assignment code of
> KVM and Xen in the future.
> 
> Signed-off-by: Tiejun Chen 
> ---
>  hw/i386/kvm/pci-assign.c| 46 
> +
>  include/hw/pci/pci-assign.h | 16 
>  2 files changed, 46 insertions(+), 16 deletions(-)
>  create mode 100644 include/hw/pci/pci-assign.h
> 
> diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
> index 17c7d6dc..fdc7b64 100644
> --- a/hw/i386/kvm/pci-assign.c
> +++ b/hw/i386/kvm/pci-assign.c
> @@ -37,6 +37,7 @@
>  #include "hw/pci/pci.h"
>  #include "hw/pci/msi.h"
>  #include "kvm_i386.h"
> +#include "hw/pci/pci-assign.h"
>  
>  #define MSIX_PAGE_SIZE 0x1000
>  
> @@ -1896,37 +1897,39 @@ type_init(assign_register_types)
>   * load the corresponding ROM data to RAM. If an error occurs while loading an
>   * option ROM, we just ignore that option ROM and continue with the next one.
>   */
> -static void assigned_dev_load_option_rom(AssignedDevice *dev)
> +int pci_assign_dev_load_option_rom(PCIDevice *dev, struct Object *owner,
> +   void *ptr, unsigned int domain,

ptr parameter seems unused.

> +   unsigned int bus, unsigned int slot,
> +   unsigned int function)
>  {
>  char name[32], rom_file[64];
>  FILE *fp;
>  uint8_t val;
>  struct stat st;
> -void *ptr;
> +int size = 0;
>  
>  /* If loading ROM from file, pci handles it */
> -if (dev->dev.romfile || !dev->dev.rom_bar) {
> -return;
> +if (dev->romfile || !dev->rom_bar) {
> +return -1;
>  }
>  
>  snprintf(rom_file, sizeof(rom_file),
>   "/sys/bus/pci/devices/%04x:%02x:%02x.%01x/rom",
> - dev->host.domain, dev->host.bus, dev->host.slot,
> - dev->host.function);
> + domain, bus, slot, function);
>  
>  if (stat(rom_file, &st)) {
> -return;
> +return -1;
>  }
>  
>  if (access(rom_file, F_OK)) {
>  error_report("pci-assign: Insufficient privileges for %s", rom_file);
> -return;
> +return -1;
>  }
>  
>  /* Write "1" to the ROM file to enable it */
>  fp = fopen(rom_file, "r+");
>  if (fp == NULL) {
> -return;
> +return -1;
>  }
>  val = 1;
>  if (fwrite(&val, 1, 1, fp) != 1) {
> @@ -1934,11 +1937,10 @@ static void assigned_dev_load_option_rom(AssignedDevice *dev)
>  }
>  fseek(fp, 0, SEEK_SET);
>  
> -snprintf(name, sizeof(name), "%s.rom",
> -object_get_typename(OBJECT(dev)));
> -memory_region_init_ram(&dev->dev.rom, OBJECT(dev), name, st.st_size);
> -vmstate_register_ram(&dev->dev.rom, &dev->dev.qdev);
> -ptr = memory_region_get_ram_ptr(&dev->dev.rom);
> +snprintf(name, sizeof(name), "%s.rom", object_get_typename(owner));
> +memory_region_init_ram(&dev->rom, owner, name, st.st_size);
> +vmstate_register_ram(&dev->rom, &dev->qdev);
> +ptr = memory_region_get_ram_ptr(&dev->rom);
>  memset(ptr, 0xff, st.st_size);
>  
>  if (!fread(ptr, 1, st.st_size, fp)) {
> @@ -1949,8 +1951,9 @@ static void assigned_dev_load_option_rom(AssignedDevice *dev)
>  goto close_rom;
>  }
>  
> -pci_register_bar(&dev->dev, PCI_ROM_SLOT, 0, &dev->dev.rom);
> -dev->dev.has_rom = true;
> +pci_register_bar(dev, PCI_ROM_SLOT, 0, &dev->rom);
> +dev->has_rom = true;
> +size = st.st_size;
>  close_rom:
>  /* Write "0" to disable ROM */
>  fseek(fp, 0, SEEK_SET);
> @@ -1959,4 +1962,15 @@ close_rom:
>  DEBUG("%s\n", "Failed to disable pci-sysfs rom file");
>  }
>  fclose(fp);
> +
> +return size;
> +}
> +
> +static void assigned_dev_load_option_rom(AssignedDevice *dev)
> +{
> +void *ptr = NULL;
> +

This is never modified, I don't think you need this
variable.

> +pci_assign_dev_load_option_rom(&dev->dev, OBJECT(dev), ptr,
> +   dev->host.domain, dev->host.bus,
> +   dev->host.slot, dev->host.function);
>  }
> diff --git a/include/hw/pci/pci-assign.h b/include/hw/pci/pci-assign.h
> new file mode 100644
> index 000..6d6bf93
> --- /dev/null
> +++ b/include/hw/pci/pci-assign.h
> @@ -0,0 +1,16 @@
> +/*
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + *
> + * Just split from hw/i386/kvm/pci-assign.c.
> + */
> +#ifndef PCI_ASSIGN_H
> +#define PCI_ASSIGN_H
> +
> +#include "hw/pci/pci.h"
> +
> +int pci_assign_dev_load_option_rom(PCIDevice *dev, struct Object *owner,
> +   void *ptr, unsigned int domain,
> +   unsigned int bus, unsigned int slot,
> +   un

Re: [Qemu-devel] [PATCH] KVM_CAP_IRQFD and KVM_CAP_IRQFD_RESAMPLE checks

2014-09-01 Thread Paolo Bonzini

On 29/08/2014 19:38, Eric Auger wrote:
> Compute kvm_irqfds_allowed by checking the KVM_CAP_IRQFD extension.
> Remove direct settings in architecture specific files.
> 
> Add a new kvm_resamplefds_allowed variable, initialized by
> checking the KVM_CAP_IRQFD_RESAMPLE extension. Add a corresponding
> kvm_resamplefds_enabled() function.

Please add a user too (in hw/misc/vfio.c).

Otherwise looks good, thanks!

Paolo

> Signed-off-by: Eric Auger 
> 
> ---
> 
> in practice KVM_CAP_IRQFD_RESAMPLE seems to be always enabled
> as soon as kernel has HAVE_KVM_IRQFD so the resamplefd check
> may be unnecessary.
> ---
>  hw/intc/openpic_kvm.c |  1 -
>  hw/intc/xics_kvm.c|  1 -
>  include/sysemu/kvm.h  | 10 ++
>  kvm-all.c |  7 +++
>  target-i386/kvm.c |  1 -
>  target-s390x/kvm.c|  1 -
>  6 files changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/intc/openpic_kvm.c b/hw/intc/openpic_kvm.c
> index e3bce04..6cef3b1 100644
> --- a/hw/intc/openpic_kvm.c
> +++ b/hw/intc/openpic_kvm.c
> @@ -229,7 +229,6 @@ static void kvm_openpic_realize(DeviceState *dev, Error **errp)
>  kvm_irqchip_add_irq_route(kvm_state, i, 0, i);
>  }
>  
> -kvm_irqfds_allowed = true;
>  kvm_msi_via_irqfd_allowed = true;
>  kvm_gsi_routing_allowed = true;
>  
> diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
> index 20b19e9..c15453f 100644
> --- a/hw/intc/xics_kvm.c
> +++ b/hw/intc/xics_kvm.c
> @@ -448,7 +448,6 @@ static void xics_kvm_realize(DeviceState *dev, Error **errp)
>  }
>  
>  kvm_kernel_irqchip = true;
> -kvm_irqfds_allowed = true;
>  kvm_msi_via_irqfd_allowed = true;
>  kvm_gsi_direct_mapping = true;
>  
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index 174ea36..69c4d0f 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -45,6 +45,7 @@ extern bool kvm_async_interrupts_allowed;
>  extern bool kvm_halt_in_kernel_allowed;
>  extern bool kvm_eventfds_allowed;
>  extern bool kvm_irqfds_allowed;
> +extern bool kvm_resamplefds_allowed;
>  extern bool kvm_msi_via_irqfd_allowed;
>  extern bool kvm_gsi_routing_allowed;
>  extern bool kvm_gsi_direct_mapping;
> @@ -102,6 +103,15 @@ extern bool kvm_readonly_mem_allowed;
>  #define kvm_irqfds_enabled() (kvm_irqfds_allowed)
>  
>  /**
> + * kvm_resamplefds_enabled:
> + *
> + * Returns: true if we can use resamplefds to inject interrupts into
> + * a KVM CPU (ie the kernel supports resamplefds and we are running
> + * with a configuration where it is meaningful to use them).
> + */
> +#define kvm_resamplefds_enabled() (kvm_resamplefds_allowed)
> +
> +/**
>   * kvm_msi_via_irqfd_enabled:
>   *
>   * Returns: true if we can route a PCI MSI (Message Signaled Interrupt)
> diff --git a/kvm-all.c b/kvm-all.c
> index 1402f4f..fdc97d6 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -116,6 +116,7 @@ bool kvm_async_interrupts_allowed;
>  bool kvm_halt_in_kernel_allowed;
>  bool kvm_eventfds_allowed;
>  bool kvm_irqfds_allowed;
> +bool kvm_resamplefds_allowed;
>  bool kvm_msi_via_irqfd_allowed;
>  bool kvm_gsi_routing_allowed;
>  bool kvm_gsi_direct_mapping;
> @@ -1548,6 +1549,12 @@ int kvm_init(MachineClass *mc)
>  kvm_eventfds_allowed =
>  (kvm_check_extension(s, KVM_CAP_IOEVENTFD) > 0);
>  
> +kvm_irqfds_allowed =
> +(kvm_check_extension(s, KVM_CAP_IRQFD) > 0);
> +
> +kvm_resamplefds_allowed =
> +(kvm_check_extension(s, KVM_CAP_IRQFD_RESAMPLE) > 0);
> +
>  ret = kvm_arch_init(s);
>  if (ret < 0) {
>  goto err;
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 097fe11..4bc2d80 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -2447,7 +2447,6 @@ void kvm_arch_init_irq_routing(KVMState *s)
>   * irqchip, so we can use irqfds, and on x86 we know
>   * we can use msi via irqfd and GSI routing.
>   */
> -kvm_irqfds_allowed = true;
>  kvm_msi_via_irqfd_allowed = true;
>  kvm_gsi_routing_allowed = true;
>  }
> diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
> index a32d91a..4d2bca6 100644
> --- a/target-s390x/kvm.c
> +++ b/target-s390x/kvm.c
> @@ -1281,7 +1281,6 @@ void kvm_arch_init_irq_routing(KVMState *s)
>   * have to override the common code kvm_halt_in_kernel_allowed setting.
>   */
>  if (kvm_check_extension(s, KVM_CAP_IRQ_ROUTING)) {
> -kvm_irqfds_allowed = true;
>  kvm_gsi_routing_allowed = true;
>  kvm_halt_in_kernel_allowed = false;
>  }
>

Re: [Qemu-devel] [PATCH] block: kill tail whitespace in block.c

2014-09-01 Thread Benoît Canet

On Monday 01 Sep 2014 at 13:35:21 (+0800), Liu Yuan wrote:
> Cc: Kevin Wolf 
> Cc: Stefan Hajnoczi 
> Signed-off-by: Liu Yuan 
> ---
>  block.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block.c b/block.c
> index e9380f6..c12b8de 100644
> --- a/block.c
> +++ b/block.c
> @@ -2239,7 +2239,7 @@ int bdrv_commit(BlockDriverState *bs)
>  
>  if (!drv)
>  return -ENOMEDIUM;
> -
> +
>  if (!bs->backing_hd) {
>  return -ENOTSUP;
>  }
> -- 
> 1.9.1
> 
> 

Reviewed-by: Benoît Canet 

You should probably CC:  qemu-triv...@nongnu.org

See http://wiki.qemu.org/Contribute/TrivialPatches

[Qemu-devel] [PATCH v2] qcow2: add update refcount table realization for update_refcount

2014-09-01 Thread Jun Li

When every item of a refcount block is NULL, free the refcount block and reset
the corresponding item of the refcount table to NULL.

Signed-off-by: Jun Li 
---

v2 makes the following changes to fix some potential issues.

 +--- Here should start from "0".
 |
for (k = 0; k < refcount_block_entries; k++) {
if (refcount_block[k] != cpu_to_be16(0)) {
...| |
}  | |
}  | + Using "0" is safer.
   |
   + This should be "k" not "++k".
---
 block/qcow2-refcount.c | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 43665b8..63f36e6 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -586,6 +586,37 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
 if (refcount == 0 && s->discard_passthrough[type]) {
 update_refcount_discard(bs, cluster_offset, s->cluster_size);
 }
+
+/* When refcount block is NULL, update refcount table */
+if (block_index == 0) {
+int k = block_index;
+int refcount_block_entries = s->cluster_size / sizeof(uint16_t);
+for (k = 0; k < refcount_block_entries; k++) {
+if (refcount_block[k] != cpu_to_be16(0)) {
+break;
+}
+}
+
+if (k == refcount_block_entries) {
+qemu_vfree(refcount_block);
+/* update refcount table */
+unsigned int refcount_table_index;
+uint64_t data64 = cpu_to_be64(0);
+refcount_table_index = cluster_index >> (s->cluster_bits -
+   REFCOUNT_SHIFT);
+ret = bdrv_pwrite_sync(bs->file,
+   s->refcount_table_offset +
+   refcount_table_index *
+   sizeof(uint64_t),
+   &data64, sizeof(data64));
+if (ret < 0) {
+goto fail;
+}
+
+s->refcount_table[refcount_table_index] = data64;
+
+}
+}
 }
 
 ret = 0;
-- 
1.9.3

[Qemu-devel] [PATCH v7 02/15] target-tricore: Add board for systemmode

2014-09-01 Thread Bastian Koppelmann

Add basic board to allow systemmode emulation

Signed-off-by: Bastian Koppelmann 
---
v6 -> v7:
- TRICORECPU -> TriCoreCPU.
- CPUTRICOREState -> CPUTriCoreState.
- tricore_testboard.c: Change Licence to GPL v2.
- fprintf(stderr, ..) -> error_report(..).
- tricore_boot_info: Remove unused fields.
- tricore_testboard.c: Remove flash drive.
- tricore_testboard.c: Is not default anymore, change desc.

 hw/tricore/Makefile.objs   |   1 +
 hw/tricore/tricore_testboard.c | 124 +
 include/hw/tricore/tricore.h   |  11 
 3 files changed, 136 insertions(+)
 create mode 100644 hw/tricore/Makefile.objs
 create mode 100644 hw/tricore/tricore_testboard.c
 create mode 100644 include/hw/tricore/tricore.h

diff --git a/hw/tricore/Makefile.objs b/hw/tricore/Makefile.objs
new file mode 100644
index 000..435e095
--- /dev/null
+++ b/hw/tricore/Makefile.objs
@@ -0,0 +1 @@
+obj-y += tricore_testboard.o
diff --git a/hw/tricore/tricore_testboard.c b/hw/tricore/tricore_testboard.c
new file mode 100644
index 000..f412e27
--- /dev/null
+++ b/hw/tricore/tricore_testboard.c
@@ -0,0 +1,124 @@
+/*
+ * TriCore Baseboard System emulation.
+ *
+ * Copyright (c) 2013-2014 Bastian Koppelmann C-Lab/University Paderborn
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+
+#include "hw/hw.h"
+#include "hw/devices.h"
+#include "net/net.h"
+#include "sysemu/sysemu.h"
+#include "hw/boards.h"
+#include "hw/loader.h"
+#include "sysemu/blockdev.h"
+#include "exec/address-spaces.h"
+#include "hw/block/flash.h"
+#include "elf.h"
+#include "hw/tricore/tricore.h"
+#include "qemu/error-report.h"
+
+
+/* Board init.  */
+
+static struct tricore_boot_info tricoretb_binfo;
+
+static void tricore_load_kernel(CPUTriCoreState *env)
+{
+uint64_t entry;
+long kernel_size;
+
+kernel_size = load_elf(tricoretb_binfo.kernel_filename, NULL,
+   NULL, (uint64_t *)&entry, NULL,
+   NULL, 0,
+   ELF_MACHINE, 1);
+if (kernel_size <= 0) {
+error_report("qemu: no kernel file '%s'",
+tricoretb_binfo.kernel_filename);
+exit(1);
+}
+env->PC = entry;
+
+}
+
+static void tricore_testboard_init(MachineState *machine, int board_id)
+{
+TriCoreCPU *cpu;
+CPUTriCoreState *env;
+
+MemoryRegion *sysmem = get_system_memory();
+MemoryRegion *ext_cram = g_new(MemoryRegion, 1);
+MemoryRegion *ext_dram = g_new(MemoryRegion, 1);
+MemoryRegion *int_cram = g_new(MemoryRegion, 1);
+MemoryRegion *int_dram = g_new(MemoryRegion, 1);
+MemoryRegion *pcp_data = g_new(MemoryRegion, 1);
+MemoryRegion *pcp_text = g_new(MemoryRegion, 1);
+
+if (!machine->cpu_model) {
+machine->cpu_model = "tc1796";
+}
+cpu = cpu_tricore_init(machine->cpu_model);
+if (!cpu) {
+error_report("Unable to find CPU definition");
+exit(1);
+}
+env = &cpu->env;
+memory_region_init_ram(ext_cram, NULL, "powerlink_ext_c.ram", 2*1024*1024);
+vmstate_register_ram_global(ext_cram);
+memory_region_init_ram(ext_dram, NULL, "powerlink_ext_d.ram", 4*1024*1024);
+vmstate_register_ram_global(ext_dram);
+memory_region_init_ram(int_cram, NULL, "powerlink_int_c.ram", 48*1024);
+vmstate_register_ram_global(int_cram);
+memory_region_init_ram(int_dram, NULL, "powerlink_int_d.ram", 48*1024);
+vmstate_register_ram_global(int_dram);
+memory_region_init_ram(pcp_data, NULL, "powerlink_pcp_data.ram", 16*1024);
+vmstate_register_ram_global(pcp_data);
+memory_region_init_ram(pcp_text, NULL, "powerlink_pcp_text.ram", 32*1024);
+vmstate_register_ram_global(pcp_text);
+
+memory_region_add_subregion(sysmem, 0x8000, ext_cram);
+memory_region_add_subregion(sysmem, 0xa100, ext_dram);
+memory_region_add_subregion(sysmem, 0xd400, int_cram);
+memory_region_add_subregion(sysmem, 0xd000, int_dram);
+memory_region_add_subregion(sysmem, 0xf005, pcp_data);
+memory_region_add_subregion(sysmem, 0xf006, pcp_text);
+
+tricoretb_binfo.ram_size = machine->ram_size;
+tricoretb_binfo.kernel_filename = machine->kernel_filename;
+
+if (machine->kernel_filename) {
+tricore_load_kernel(env);
+}
+}
+
+static void tricoreboard_init(MachineState *mach

[Qemu-devel] [PATCH v7 01/15] target-tricore: Add target stubs and qom-cpu

2014-09-01 Thread Bastian Koppelmann

Add TriCore target stubs, QOM CPU, and maintainer entry

Signed-off-by: Bastian Koppelmann 
---
v6 -> v7:
- TRICORECPU -> TriCoreCPU.
- TRICORECPUClass -> TriCoreCPUClass.
- CPUTRICOREState -> CPUTriCoreState.
- TRICORECPUInfo: Add terminator.
- TRICORECPUInfo -> TriCoreCPUInfo.
- Remove ARM-style IRQ and FIQ lines.
- CPUTRICOREState: target_ulong -> uint32_t.
- CPUTRICOREState: Move mask defines below the struct.

 MAINTAINERS   |   6 +
 arch_init.c   |   2 +
 cpu-exec.c|  11 +-
 cpus.c|   6 +
 include/elf.h |   2 +
 include/sysemu/arch_init.h|   1 +
 target-tricore/Makefile.objs  |   1 +
 target-tricore/cpu-qom.h  |  71 
 target-tricore/cpu.c  | 192 
 target-tricore/cpu.h  | 405 ++
 target-tricore/helper.c   |  92 ++
 target-tricore/helper.h   |   0
 target-tricore/op_helper.c|  27 +++
 target-tricore/translate.c| 100 +++
 target-tricore/tricore-defs.h |  28 +++
 15 files changed, 943 insertions(+), 1 deletion(-)
 create mode 100644 target-tricore/Makefile.objs
 create mode 100644 target-tricore/cpu-qom.h
 create mode 100644 target-tricore/cpu.c
 create mode 100644 target-tricore/cpu.h
 create mode 100644 target-tricore/helper.c
 create mode 100644 target-tricore/helper.h
 create mode 100644 target-tricore/op_helper.c
 create mode 100644 target-tricore/translate.c
 create mode 100644 target-tricore/tricore-defs.h

diff --git a/MAINTAINERS b/MAINTAINERS
index a74c04c..142f68a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -161,6 +161,12 @@ S: Maintained
 F: target-xtensa/
 F: hw/xtensa/
 
+TriCore
+M: Bastian Koppelmann 
+S: Maintained
+F: target-tricore/
+F: hw/tricore/
+
 Guest CPU Cores (KVM):
 --
 
diff --git a/arch_init.c b/arch_init.c
index 28ece76..c974f3f 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -104,6 +104,8 @@ int graphic_depth = 32;
 #define QEMU_ARCH QEMU_ARCH_XTENSA
 #elif defined(TARGET_UNICORE32)
 #define QEMU_ARCH QEMU_ARCH_UNICORE32
+#elif defined(TARGET_TRICORE)
+#define QEMU_ARCH QEMU_ARCH_TRICORE
 #endif
 
 const uint32_t arch_type = QEMU_ARCH;
diff --git a/cpu-exec.c b/cpu-exec.c
index c6aad74..7b5d2e2 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -387,6 +387,7 @@ int cpu_exec(CPUArchState *env)
 #elif defined(TARGET_CRIS)
 #elif defined(TARGET_S390X)
 #elif defined(TARGET_XTENSA)
+#elif defined(TARGET_TRICORE)
 /* X */
 #else
 #error unsupported target CPU
@@ -444,7 +445,8 @@ int cpu_exec(CPUArchState *env)
 }
 #if defined(TARGET_ARM) || defined(TARGET_SPARC) || defined(TARGET_MIPS) || \
 defined(TARGET_PPC) || defined(TARGET_ALPHA) || defined(TARGET_CRIS) || \
-defined(TARGET_MICROBLAZE) || defined(TARGET_LM32) || defined(TARGET_UNICORE32)
+defined(TARGET_MICROBLAZE) || defined(TARGET_LM32) ||   \
+defined(TARGET_UNICORE32) || defined(TARGET_TRICORE)
 if (interrupt_request & CPU_INTERRUPT_HALT) {
 cpu->interrupt_request &= ~CPU_INTERRUPT_HALT;
 cpu->halted = 1;
@@ -560,6 +562,12 @@ int cpu_exec(CPUArchState *env)
 cc->do_interrupt(cpu);
 next_tb = 0;
 }
+#elif defined(TARGET_TRICORE)
+if ((interrupt_request & CPU_INTERRUPT_HARD)) {
+cc->do_interrupt(cpu);
+next_tb = 0;
+}
+
 #elif defined(TARGET_OPENRISC)
 {
 int idx = -1;
@@ -846,6 +854,7 @@ int cpu_exec(CPUArchState *env)
   | env->cc_dest | (env->cc_x << 4);
 #elif defined(TARGET_MICROBLAZE)
 #elif defined(TARGET_MIPS)
+#elif defined(TARGET_TRICORE)
 #elif defined(TARGET_MOXIE)
 #elif defined(TARGET_OPENRISC)
 #elif defined(TARGET_SH4)
diff --git a/cpus.c b/cpus.c
index eb1ac85..0f7d0ea 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1410,6 +1410,9 @@ CpuInfoList *qmp_query_cpus(Error **errp)
 #elif defined(TARGET_MIPS)
 MIPSCPU *mips_cpu = MIPS_CPU(cpu);
 CPUMIPSState *env = &mips_cpu->env;
+#elif defined(TARGET_TRICORE)
+TriCoreCPU *tricore_cpu = TRICORE_CPU(cpu);
+CPUTriCoreState *env = &tricore_cpu->env;
 #endif
 
 cpu_synchronize_state(cpu);
@@ -1434,6 +1437,9 @@ CpuInfoList *qmp_query_cpus(Error **errp)
 #elif defined(TARGET_MIPS)
 info->value->has_PC = true;
 info->value->PC = env->active_tc.PC;
+#elif defined(TARGET_TRICORE)
+info->value->has_PC = true;
+info->value->PC = env->PC;
 #endif
 
 /* XXX: waiting for the qapi to support GSList */
diff --git a/include/elf.h b/include/elf.h
index e88d52f..70107f0 100644
--- a/include/elf.h
+++ b/include/elf.h
@@ -92,6 +92,8 @@ typedef int64_t  Elf64_Sxword;
 
 #define EM_SPARCV9 43  /* SPARC v9 64-bit */

Re: [Qemu-devel] [PATCH 2/2] docs/qcow2: Correct refcount_block_entries

2014-09-01 Thread Benoît Canet

On Friday 29 Aug 2014 at 23:45:27 (+0200), Max Reitz wrote:
> A refblock entry may have a different size than 16 bits, it may even be
> smaller than a byte. Correct the refcount_block_entries calculation

Now if the refblock entry size is smaller than a byte

> accordingly.
> 
> Signed-off-by: Max Reitz 
> ---
>  docs/specs/qcow2.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
> index cfbc8b0..531c478 100644
> --- a/docs/specs/qcow2.txt
> +++ b/docs/specs/qcow2.txt
> @@ -183,7 +183,7 @@ blocks and are exactly one cluster in size.
>  Given a offset into the image file, the refcount of its cluster can be obtained
>  as follows:
>  
> -refcount_block_entries = (cluster_size / sizeof(uint16_t))
> +refcount_block_entries = (cluster_size / (refcount_bits / 8))

This gives a divide by zero error.  ^

>  
>  refcount_block_index = (offset / cluster_size) % refcount_block_entries
>  refcount_table_index = (offset / cluster_size) / refcount_block_entries
> -- 
> 2.1.0
> 
>
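
The objection can be checked numerically: with integer division, refcount_bits / 8 is 0 for any entry width below 8 bits. One way around it, offered here as an assumption and not necessarily what a later revision of the patch does, is to work in bits and multiply before dividing. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Entries per refcount block, computed in bits so that widths below
 * one byte (1, 2 or 4 bits) do not truncate the divisor to zero.
 */
static uint64_t refcount_block_entries(uint64_t cluster_size,
                                       unsigned refcount_bits)
{
    assert(refcount_bits > 0);
    return cluster_size * 8 / refcount_bits;
}

/* The divisor used by the formula under review, kept for comparison. */
static unsigned bytes_per_entry_truncated(unsigned refcount_bits)
{
    return refcount_bits / 8;
}
```

For a 64 KiB cluster this gives 32768 entries at 16 bits per entry and 524288 entries at 1 bit per entry, while the truncated divisor is 0 for a 4-bit entry.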

[Qemu-devel] [PATCH v7 05/15] target-tricore: Add masks and opcodes for decoding

2014-09-01 Thread Bastian Koppelmann

Add masks and opcodes for decoding TriCore instructions.

Signed-off-by: Bastian Koppelmann 

Reviewed-by: Richard Henderson 
---
 target-tricore/translate.c   |1 +
 target-tricore/tricore-opcodes.h | 1406 ++
 2 files changed, 1407 insertions(+)
 create mode 100644 target-tricore/tricore-opcodes.h

diff --git a/target-tricore/translate.c b/target-tricore/translate.c
index 7691b11..e06cf41 100644
--- a/target-tricore/translate.c
+++ b/target-tricore/translate.c
@@ -26,6 +26,7 @@
 #include "exec/helper-proto.h"
 #include "exec/helper-gen.h"
 
+#include "tricore-opcodes.h"
 /*
  * TCG registers
  */
diff --git a/target-tricore/tricore-opcodes.h b/target-tricore/tricore-opcodes.h
new file mode 100644
index 000..9c6ec01
--- /dev/null
+++ b/target-tricore/tricore-opcodes.h
@@ -0,0 +1,1406 @@
+/*
+ *  Copyright (c) 2012-2014 Bastian Koppelmann C-Lab/University Paderborn
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+/*
+ * Opcode Masks for Tricore
+ * Format MASK_OP_InstrFormatName_Field
+ */
+
+/* This creates a mask with bits start .. end set to 1 and applies it to op */
+#define MASK_BITS_SHIFT(op, start, end) (extract32(op, (start), \
+(end) - (start) + 1))
+#define MASK_BITS_SHIFT_SEXT(op, start, end) (sextract32(op, (start),\
+ (end) - (start) + 1))
+
+/* new opcode masks */
+
+#define MASK_OP_MAJOR(op)  MASK_BITS_SHIFT(op, 0, 7)
+
+/* 16-Bit Formats */
+#define MASK_OP_SB_DISP8(op)   MASK_BITS_SHIFT(op, 8, 15)
+#define MASK_OP_SB_DISP8_SEXT(op) MASK_BITS_SHIFT_SEXT(op, 8, 15)
+
+#define MASK_OP_SBC_CONST4(op) MASK_BITS_SHIFT(op, 12, 15)
+#define MASK_OP_SBC_CONST4_SEXT(op) MASK_BITS_SHIFT_SEXT(op, 12, 15)
+#define MASK_OP_SBC_DISP4(op)  MASK_BITS_SHIFT(op, 8, 11)
+
+#define MASK_OP_SBR_S2(op) MASK_BITS_SHIFT(op, 12, 15)
+#define MASK_OP_SBR_DISP4(op)  MASK_BITS_SHIFT(op, 8, 11)
+
+#define MASK_OP_SBRN_N(op) MASK_BITS_SHIFT(op, 12, 15)
+#define MASK_OP_SBRN_DISP4(op) MASK_BITS_SHIFT(op, 8, 11)
+
+#define MASK_OP_SC_CONST8(op)  MASK_BITS_SHIFT(op, 8, 15)
+
+#define MASK_OP_SLR_S2(op) MASK_BITS_SHIFT(op, 12, 15)
+#define MASK_OP_SLR_D(op)  MASK_BITS_SHIFT(op, 8, 11)
+
+#define MASK_OP_SLRO_OFF4(op)  MASK_BITS_SHIFT(op, 12, 15)
+#define MASK_OP_SLRO_D(op) MASK_BITS_SHIFT(op, 8, 11)
+
+#define MASK_OP_SR_OP2(op) MASK_BITS_SHIFT(op, 12, 15)
+#define MASK_OP_SR_S1D(op) MASK_BITS_SHIFT(op, 8, 11)
+
+#define MASK_OP_SRC_CONST4(op) MASK_BITS_SHIFT(op, 12, 15)
+#define MASK_OP_SRC_CONST4_SEXT(op) MASK_BITS_SHIFT_SEXT(op, 12, 15)
+#define MASK_OP_SRC_S1D(op)MASK_BITS_SHIFT(op, 8, 11)
+
+#define MASK_OP_SRO_S2(op) MASK_BITS_SHIFT(op, 12, 15)
+#define MASK_OP_SRO_OFF4(op)   MASK_BITS_SHIFT(op, 8, 11)
+
+#define MASK_OP_SRR_S2(op) MASK_BITS_SHIFT(op, 12, 15)
+#define MASK_OP_SRR_S1D(op)MASK_BITS_SHIFT(op, 8, 11)
+
+#define MASK_OP_SRRS_S2(op)MASK_BITS_SHIFT(op, 12, 15)
+#define MASK_OP_SRRS_S1D(op)   MASK_BITS_SHIFT(op, 8, 11)
+#define MASK_OP_SRRS_N(op) MASK_BITS_SHIFT(op, 6, 7)
+
+#define MASK_OP_SSR_S2(op) MASK_BITS_SHIFT(op, 12, 15)
+#define MASK_OP_SSR_S1(op) MASK_BITS_SHIFT(op, 8, 11)
+
+#define MASK_OP_SSRO_OFF4(op)  MASK_BITS_SHIFT(op, 12, 15)
+#define MASK_OP_SSRO_S1(op)MASK_BITS_SHIFT(op, 8, 11)
+
+/* 32-Bit Formats */
+
+/* ABS Format */
+#define MASK_OP_ABS_OFF18(op)  (MASK_BITS_SHIFT(op, 16, 21) +   \
+   (MASK_BITS_SHIFT(op, 28, 31) << 6) + \
+   (MASK_BITS_SHIFT(op, 22, 25) << 10) +\
+   (MASK_BITS_SHIFT(op, 12, 15) << 14))
+#define MASK_OP_ABS_OP2(op)MASK_BITS_SHIFT(op, 26, 27)
+#define MASK_OP_ABS_S1D(op)MASK_BITS_SHIFT(op, 8, 11)
+
+/* ABSB Format */
+#define MASK_OP_ABSB_OFF18(op) MASK_OP_ABS_OFF18(op)
+#define MASK_OP_ABSB_OP2(op)   MASK_BITS_SHIFT(op, 26, 27)
+#define MASK_OP_ABSB_B(op) MASK_BITS_SHIFT(op, 11, 11)
+#define MASK_OP_ABSB_BPOS(op)  MASK_BITS_SHIFT(op, 7, 10)
+
+/* B Format   */
+#define MASK_OP_B_DISP24(op)   (MASK_BITS_SHIFT(op, 16, 31) + \
+   (MASK_BITS_SHIFT(op, 8, 15) << 16))
+/* BIT Format */
+#define MASK_OP_BIT_D(op)  MASK_BITS_SHIFT(op, 28, 31)
+#define MASK_OP_BIT_POS2(op)   MASK_BITS_SHIFT(op, 23, 27)
+#define MASK_OP_B
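
The field macros in this header all reduce to one operation: take bits start..end (inclusive) of the opcode and shift them down to bit 0, optionally sign-extending. A minimal sketch of that behaviour follows. The helpers extract_field() and sextract_field() are local stand-ins written for this example, following the (value, start, length) convention the macros rely on; the opcode value in the usage note is arbitrary, not a real TriCore instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned field: shift down, then keep the low 'length' bits. */
static uint32_t extract_field(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

/*
 * Signed field: move the field to the top of the word, then shift back
 * so the field's top bit is replicated. This assumes the usual
 * two's-complement conversion and arithmetic right shift for signed
 * values, which C leaves implementation-defined.
 */
static int32_t sextract_field(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (int32_t)(value << (32 - length - start)) >> (32 - length);
}

/* Same shape as MASK_BITS_SHIFT: bits start..end inclusive. */
#define FIELD(op, start, end) \
    extract_field((op), (start), (end) - (start) + 1)
#define FIELD_SEXT(op, start, end) \
    sextract_field((op), (start), (end) - (start) + 1)
```

With op = 0xA53C, FIELD(op, 0, 7) yields 0x3C and FIELD(op, 8, 15) yields 0xA5, while FIELD_SEXT(op, 8, 15) interprets the same eight bits as a signed displacement and yields -91.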

[Qemu-devel] [PATCH v7 00/15] TriCore architecture guest implementation

2014-09-01 Thread Bastian Koppelmann

Hi,

my aim is to add Infineon's TriCore architecture to QEMU. This series of
patches adds the target stubs, a basic testboard and a softmmu for system mode
emulation. Furthermore, it adds all the 16-bit instructions of the
architecture, grouped by opcode format.

After this series of patches, another one will follow, which adds many of the
32-bit instructions.

All the best

Bastian

v6 -> v7:
- TRICORECPU -> TriCoreCPU.
- TRICORECPUClass -> TriCoreCPUClass.
- CPUTRICOREState -> CPUTriCoreState.
- TRICORECPUInfo: Add terminator.
- TRICORECPUInfo -> TriCoreCPUInfo.
- Remove ARM-style IRQ and FIQ lines.
- CPUTRICOREState: target_ulong -> uint32_t.
- CPUTRICOREState: Move mask defines below the struct.
- tricore_testboard.c: Change Licence to GPL v2.
- fprintf(stderr, ..) -> error_report(..).
- tricore_boot_info: Remove unused fields.
- tricore_testboard.c: Remove flash drive.
- tricore_testboard.c: Is not default anymore, change desc.
- configure: Remove empty disas case. Remove target_phys_bits=32.
- tricore-softmmu.mak: Remove pci, SMC91C111 and PFLASH_CFI01.


Bastian Koppelmann (15):
  target-tricore: Add target stubs and qom-cpu
  target-tricore: Add board for systemmode
  target-tricore: Add softmmu support
  target-tricore: Add initialization for translation and activate target
  target-tricore: Add masks and opcodes for decoding
  target-tricore: Add instructions of SRC opcode format
  target-tricore: Add instructions of SRR opcode format
  target-tricore: Add instructions of SSR opcode format
  target-tricore: Add instructions of SRRS and SLRO opcode format
  target-tricore: Add instructions of SB opcode format
  target-tricore: Add instructions of SBC and SBRN opcode format
  target-tricore: Add instructions of SBR opcode format
  target-tricore: Add instructions of SC opcode format
  target-tricore: Add instructions of SLR, SSRO and SRO opcode format
  target-tricore: Add instructions of SR opcode format

 MAINTAINERS                         |    6 +
 arch_init.c                         |    2 +
 configure                           |    2 +
 cpu-exec.c                          |   11 +-
 cpus.c                              |    6 +
 default-configs/tricore-softmmu.mak |    0
 hw/tricore/Makefile.objs            |    1 +
 hw/tricore/tricore_testboard.c      |  124 +++
 include/elf.h                       |    2 +
 include/hw/tricore/tricore.h        |   11 +
 include/sysemu/arch_init.h          |    1 +
 target-tricore/Makefile.objs        |    1 +
 target-tricore/cpu-qom.h            |   71 ++
 target-tricore/cpu.c                |  192 +
 target-tricore/cpu.h                |  405 ++
 target-tricore/helper.c             |  144
 target-tricore/helper.h             |   25 +
 target-tricore/op_helper.c          |  392 ++
 target-tricore/translate.c          | 1263 +++
 target-tricore/tricore-defs.h       |   28 +
 target-tricore/tricore-opcodes.h    | 1406 +++
 21 files changed, 4092 insertions(+), 1 deletion(-)
 create mode 100644 default-configs/tricore-softmmu.mak
 create mode 100644 hw/tricore/Makefile.objs
 create mode 100644 hw/tricore/tricore_testboard.c
 create mode 100644 include/hw/tricore/tricore.h
 create mode 100644 target-tricore/Makefile.objs
 create mode 100644 target-tricore/cpu-qom.h
 create mode 100644 target-tricore/cpu.c
 create mode 100644 target-tricore/cpu.h
 create mode 100644 target-tricore/helper.c
 create mode 100644 target-tricore/helper.h
 create mode 100644 target-tricore/op_helper.c
 create mode 100644 target-tricore/translate.c
 create mode 100644 target-tricore/tricore-defs.h
 create mode 100644 target-tricore/tricore-opcodes.h

-- 
2.1.0

[Qemu-devel] [PATCH v7 11/15] target-tricore: Add instructions of SBC and SBRN opcode format

2014-09-01 Thread Bastian Koppelmann

Add instructions of SBC and SBRN opcode format.

Signed-off-by: Bastian Koppelmann 

Reviewed-by: Richard Henderson 
---
 target-tricore/translate.c | 36 
 1 file changed, 36 insertions(+)

diff --git a/target-tricore/translate.c b/target-tricore/translate.c
index e13fba7..1bd2642 100644
--- a/target-tricore/translate.c
+++ b/target-tricore/translate.c
@@ -391,6 +391,8 @@ static inline void gen_branch_condi(DisasContext *ctx, TCGCond cond, TCGv r1,
 static void gen_compute_branch(DisasContext *ctx, uint32_t opc, int r1,
                                int r2 , int32_t constant , int32_t offset)
 {
+    TCGv temp;
+
     switch (opc) {
     /* SB-format jumps */
     case OPC1_16_SB_J:
@@ -407,6 +409,26 @@ static void gen_compute_branch(DisasContext *ctx, uint32_t opc, int r1,
     case OPC1_16_SB_JNZ:
         gen_branch_condi(ctx, TCG_COND_NE, cpu_gpr_d[15], 0, offset);
         break;
+    /* SBC-format jumps */
+    case OPC1_16_SBC_JEQ:
+        gen_branch_condi(ctx, TCG_COND_EQ, cpu_gpr_d[15], constant, offset);
+        break;
+    case OPC1_16_SBC_JNE:
+        gen_branch_condi(ctx, TCG_COND_NE, cpu_gpr_d[15], constant, offset);
+        break;
+    /* SBRN-format jumps */
+    case OPC1_16_SBRN_JZ_T:
+        temp = tcg_temp_new();
+        tcg_gen_andi_tl(temp, cpu_gpr_d[15], 0x1u << constant);
+        gen_branch_condi(ctx, TCG_COND_EQ, temp, 0, offset);
+        tcg_temp_free(temp);
+        break;
+    case OPC1_16_SBRN_JNZ_T:
+        temp = tcg_temp_new();
+        tcg_gen_andi_tl(temp, cpu_gpr_d[15], 0x1u << constant);
+        gen_branch_condi(ctx, TCG_COND_NE, temp, 0, offset);
+        tcg_temp_free(temp);
+        break;
     default:
         printf("Branch Error at %x\n", ctx->pc);
     }
@@ -716,6 +738,20 @@ static void decode_16Bit_opc(CPUTriCoreState *env, DisasContext *ctx)
         address = MASK_OP_SB_DISP8_SEXT(ctx->opcode);
         gen_compute_branch(ctx, op1, 0, 0, 0, address);
         break;
+    /* SBC-format */
+    case OPC1_16_SBC_JEQ:
+    case OPC1_16_SBC_JNE:
+        address = MASK_OP_SBC_DISP4(ctx->opcode);
+        const16 = MASK_OP_SBC_CONST4_SEXT(ctx->opcode);
+        gen_compute_branch(ctx, op1, 0, 0, const16, address);
+        break;
+    /* SBRN-format */
+    case OPC1_16_SBRN_JNZ_T:
+    case OPC1_16_SBRN_JZ_T:
+        address = MASK_OP_SBRN_DISP4(ctx->opcode);
+        const16 = MASK_OP_SBRN_N(ctx->opcode);
+        gen_compute_branch(ctx, op1, 0, 0, const16, address);
+        break;
     }
 }
 
-- 
2.1.0
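
For readers following the decoder, the four branch conditions added in this
patch can be written down as a plain C reference model. This is only a sketch
derived from the TCG calls in the hunks of this patch, not code from the
series. The function names (sbc_jeq_taken and so on) are made up for
illustration, the bit index n is assumed to be below 32 so that the shift is
well defined, and the branch target is left out because the patch does not
show how gen_branch_condi scales the displacement.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reference model; none of these names exist in the series. */

/* SBC format: compare D[15] with the sign-extended 4-bit constant. */
static bool sbc_jeq_taken(uint32_t d15, int32_t const4)
{
    return d15 == (uint32_t)const4;
}

static bool sbc_jne_taken(uint32_t d15, int32_t const4)
{
    return d15 != (uint32_t)const4;
}

/* SBRN format: test bit n of D[15] (assumes n < 32). */
static bool sbrn_jz_t_taken(uint32_t d15, unsigned n)
{
    return (d15 & (UINT32_C(1) << n)) == 0;
}

static bool sbrn_jnz_t_taken(uint32_t d15, unsigned n)
{
    return (d15 & (UINT32_C(1) << n)) != 0;
}
```

A model like this could serve as an oracle when writing tests for the
translated code, since each function depends only on D[15] and the decoded
immediate.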

[Qemu-devel] [PATCH v7 06/15] target-tricore: Add instructions of SRC opcode format

2014-09-01 Thread Bastian Koppelmann

Add instructions of SRC opcode format.
Add micro-op generator functions for add, conditional add/sub and shi/shai.

Signed-off-by: Bastian Koppelmann 

Reviewed-by: Richard Henderson 
---
 target-tricore/helper.h|  16 +++
 target-tricore/translate.c | 251 +
 2 files changed, 267 insertions(+)

diff --git a/target-tricore/helper.h b/target-tricore/helper.h
index e69de29..5884240 100644
--- a/target-tricore/helper.h
+++ b/target-tricore/helper.h
@@ -0,0 +1,16 @@
+/*
+ *  Copyright (c) 2012-2014 Bastian Koppelmann C-Lab/University Paderborn
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
diff --git a/target-tricore/translate.c b/target-tricore/translate.c
index e06cf41..607a066 100644
--- a/target-tricore/translate.c
+++ b/target-tricore/translate.c
@@ -27,6 +27,7 @@
 #include "exec/helper-gen.h"
 
 #include "tricore-opcodes.h"
+
 /*
  * TCG registers
  */
@@ -102,8 +103,258 @@ void tricore_cpu_dump_state(CPUState *cs, FILE *f,
 
 }
 
+/*
+ * Functions to generate micro-ops
+ */
+
+/* Functions for arithmetic instructions  */
+
+static inline void gen_add_d(TCGv ret, TCGv r1, TCGv r2)
+{
+    TCGv t0 = tcg_temp_new_i32();
+    TCGv result = tcg_temp_new_i32();
+    /* Addition and set V/SV bits */
+    tcg_gen_add_tl(result, r1, r2);
+    /* calc V bit */
+    tcg_gen_xor_tl(cpu_PSW_V, result, r1);
+    tcg_gen_xor_tl(t0, r1, r2);
+    tcg_gen_andc_tl(cpu_PSW_V, cpu_PSW_V, t0);
+    /* Calc SV bit */
+    tcg_gen_or_tl(cpu_PSW_SV, cpu_PSW_SV, cpu_PSW_V);
+    /* Calc AV/SAV bits */
+    tcg_gen_add_tl(cpu_PSW_AV, result, result);
+    tcg_gen_xor_tl(cpu_PSW_AV, result, cpu_PSW_AV);
+    /* calc SAV */
+    tcg_gen_or_tl(cpu_PSW_SAV, cpu_PSW_SAV, cpu_PSW_AV);
+    /* write back result */
+    tcg_gen_mov_tl(ret, result);
+
+    tcg_temp_free(result);
+    tcg_temp_free(t0);
+}
+
+static inline void gen_addi_d(TCGv ret, TCGv r1, target_ulong r2)
+{
+    TCGv temp = tcg_const_i32(r2);
+    gen_add_d(ret, r1, temp);
+    tcg_temp_free(temp);
+}
+
+static inline void gen_cond_add(TCGCond cond, TCGv r1, TCGv r2, TCGv r3,
+                                TCGv r4)
+{
+    TCGv temp = tcg_temp_new();
+    TCGv temp2 = tcg_temp_new();
+    TCGv result = tcg_temp_new();
+    TCGv mask = tcg_temp_new();
+    TCGv t0 = tcg_const_i32(0);
+
+    /* create mask for sticky bits */
+    tcg_gen_setcond_tl(cond, mask, r4, t0);
+    tcg_gen_shli_tl(mask, mask, 31);
+
+    tcg_gen_add_tl(result, r1, r2);
+    /* Calc PSW_V */
+    tcg_gen_xor_tl(temp, result, r1);
+    tcg_gen_xor_tl(temp2, r1, r2);
+    tcg_gen_andc_tl(temp, temp, temp2);
+    tcg_gen_movcond_tl(cond, cpu_PSW_V, r4, t0, temp, cpu_PSW_V);
+    /* Set PSW_SV */
+    tcg_gen_and_tl(temp, temp, mask);
+    tcg_gen_or_tl(cpu_PSW_SV, temp, cpu_PSW_SV);
+    /* calc AV bit */
+    tcg_gen_add_tl(temp, result, result);
+    tcg_gen_xor_tl(temp, temp, result);
+    tcg_gen_movcond_tl(cond, cpu_PSW_AV, r4, t0, temp, cpu_PSW_AV);
+    /* calc SAV bit */
+    tcg_gen_and_tl(temp, temp, mask);
+    tcg_gen_or_tl(cpu_PSW_SAV, temp, cpu_PSW_SAV);
+    /* write back result */
+    tcg_gen_movcond_tl(cond, r3, r4, t0, result, r3);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(temp);
+    tcg_temp_free(temp2);
+    tcg_temp_free(result);
+    tcg_temp_free(mask);
+}
+
+static inline void gen_condi_add(TCGCond cond, TCGv r1, int32_t r2,
+                                 TCGv r3, TCGv r4)
+{
+    TCGv temp = tcg_const_i32(r2);
+    gen_cond_add(cond, r1, temp, r3, r4);
+    tcg_temp_free(temp);
+}
+
+static void gen_shi(TCGv ret, TCGv r1, int32_t shift_count)
+{
+    if (shift_count == -32) {
+        tcg_gen_movi_tl(ret, 0);
+    } else if (shift_count >= 0) {
+        tcg_gen_shli_tl(ret, r1, shift_count);
+    } else {
+        tcg_gen_shri_tl(ret, r1, -shift_count);
+    }
+}
+
+static void gen_shaci(TCGv ret, TCGv r1, int32_t shift_count)
+{
+    uint32_t msk, msk_start;
+    TCGv temp = tcg_temp_new();
+    TCGv temp2 = tcg_temp_new();
+    TCGv t_0 = tcg_const_i32(0);
+
+    if (shift_count == 0) {
+        /* Clear PSW.C and PSW.V */
+        tcg_gen_movi_tl(cpu_PSW_C, 0);
+        tcg_gen_mov_tl(cpu_PSW_V, cpu_PSW_C);
+        tcg_gen_mov_tl(ret, r1);
+    } else if (shift_count == -32) {
+        /* set PSW.C */
+        tcg_gen_mov_tl(cpu_PSW_C, r1);
+        /* fill ret completely with sign bit */
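
The flag arithmetic in gen_add_d earlier in this patch is easier to check
against a scalar model. The sketch below mirrors its xor/andc/add sequence in
plain C. It assumes, as the shift by 31 in gen_cond_add suggests, that each
flag is read from bit 31 of the computed word. The struct and function names
are illustrative only and do not exist in the series, and the sticky SV/SAV
bits are left out because they only accumulate V and AV with a bitwise or.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scalar model of gen_add_d; not part of the patch. */
struct add_flags {
    uint32_t result;
    bool v;   /* signed overflow */
    bool av;  /* advance overflow: bits 31 and 30 of the result differ */
};

static struct add_flags model_add_d(uint32_t r1, uint32_t r2)
{
    struct add_flags f;
    uint32_t v_word, av_word;

    f.result = r1 + r2;
    /* V: operands share a sign and the result's sign differs from r1. */
    v_word = (f.result ^ r1) & ~(r1 ^ r2);
    /* AV: result xor (result << 1), written as result + result. */
    av_word = f.result ^ (f.result + f.result);
    /* Assumption: the flag is bit 31 of each word. */
    f.v = (v_word >> 31) != 0;
    f.av = (av_word >> 31) != 0;
    return f;
}
```

For example, adding 1 to 0x7FFFFFFF wraps to 0x80000000 and sets V in this
model, while adding 1 to 1 leaves both flags clear.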
