Re: [Qemu-devel] Error handling for KVM_GET_DIRTY_LOG
On 02/16/2017 03:51 PM, Janosch Frank wrote:
> While trying to fix a bug in the s390 migration code, I noticed that
> QEMU ignores practically all errors returned from that VM ioctl. QEMU
> behaves as specified in the KVM API and only processes -1 (-EPERM) as an
> error.
>
> Unfortunately the documentation is wrong/old and KVM may return -EFAULT,
> -EINVAL, -ENOTSUPP (BookE) and -ENOENT. This bugs me, as I found a case
> where I want to return -EFAULT because of guest memory problems and QEMU
> will still happily migrate the VM.
>
> I currently don't see a reason why we continue to migrate on EFAULT and
> EINVAL. But returning -error from kvm_physical_sync_dirty_bitmap might
> also be a bit hard, as it kills QEMU.
>
> Do we want to fix this and if so, how do we want it done?
> If not, we at least have a definitive mail to point to when the next one
> comes around. I also have a KVM patch to update the API documentation if
> wanted (maybe we should dust that off a bit anyhow).

I think we want to handle _ALL_ errors of that ioctl. Instead of aborting
QEMU we might just want to abort the migration in that case?

> This has been brought up in 2009 [1] the first time and was more or less
> fixed and then reverted in 2014 [2].
>
> The reason in [1] was that PPC hadn't settled yet on a valid return code.
> In [2] it was too close to the v2 release to handle it properly.
>
> [1] https://lists.nongnu.org/archive/html/qemu-devel/2009-07/msg01772.html
> [2] https://lists.nongnu.org/archive/html/qemu-devel/2014-04/msg01993.html

So back then it was just too close to 2.0 and should have been revisited
for 2.1. Let's now fix it for 2.9?
Re: [Qemu-devel] [PATCH v7 00/17] VT-d: vfio enablement and misc enhances
> -Original Message- > From: Qemu-devel [mailto:qemu-devel-bounces+yi.l.liu=intel@nongnu.org] > On Behalf Of Peter Xu > Sent: Monday, February 20, 2017 3:48 PM > To: Alex Williamson > Cc: Lan, Tianyu ; Tian, Kevin ; > m...@redhat.com; jan.kis...@siemens.com; jasow...@redhat.com; qemu- > de...@nongnu.org; bd.a...@gmail.com; David Gibson > > Subject: Re: [Qemu-devel] [PATCH v7 00/17] VT-d: vfio enablement and misc > enhances > > On Fri, Feb 17, 2017 at 10:18:35AM -0700, Alex Williamson wrote: > > On Tue, 7 Feb 2017 16:28:02 +0800 > > Peter Xu wrote: > > > > > This is v7 of vt-d vfio enablement series. > > [snip] > > > = > > > Test Done > > > = > > > > > > Build test passed for x86_64/arm/ppc64. > > > > > > Simply tested with x86_64, assigning two PCI devices to a single VM, > > > boot the VM using: > > > > > > bin=x86_64-softmmu/qemu-system-x86_64 > > > $bin -M q35,accel=kvm,kernel-irqchip=split -m 1G \ > > > -device intel-iommu,intremap=on,eim=off,caching-mode=on \ > > > -netdev user,id=net0,hostfwd=tcp::-:22 \ > > > -device virtio-net-pci,netdev=net0 \ > > > -device vfio-pci,host=03:00.0 \ > > > -device vfio-pci,host=02:00.0 \ > > > -trace events=".trace.vfio" \ > > > /var/lib/libvirt/images/vm1.qcow2 > > > > > > pxdev:bin [vtd-vfio-enablement]# cat .trace.vfio > > > vtd_page_walk* > > > vtd_replay* > > > vtd_inv_desc* > > > > > > Then, in the guest, run the following tool: > > > > > > > > > https://github.com/xzpeter/clibs/blob/master/gpl/userspace/vfio-bind > > > -group/vfio-bind-group.c > > > > > > With parameter: > > > > > > ./vfio-bind-group 00:03.0 00:04.0 > > > > > > Check host side trace log, I can see pages are replayed and mapped > > > in > > > 00:04.0 device address space, like: > > > > > > ... 
> > > vtd_replay_ce_valid replay valid context device 00:04.00 hi 0x401 lo > > > 0x38fe1001 vtd_page_walk Page walk for ce (0x401, 0x38fe1001) iova > > > range 0x0 - 0x80 vtd_page_walk_level Page walk > > > (base=0x38fe1000, level=3) iova range 0x0 - 0x80 > > > vtd_page_walk_level Page walk (base=0x35d31000, level=2) iova range > > > 0x0 - 0x4000 vtd_page_walk_level Page walk (base=0x34979000, > > > level=1) iova range 0x0 - 0x20 vtd_page_walk_one Page walk > > > detected map level 0x1 iova 0x0 -> gpa 0x22dc3000 mask 0xfff perm 3 > > > vtd_page_walk_one Page walk detected map level 0x1 iova 0x1000 -> > > > gpa 0x22e25000 mask 0xfff perm 3 vtd_page_walk_one Page walk > > > detected map level 0x1 iova 0x2000 -> gpa 0x22e12000 mask 0xfff perm > > > 3 vtd_page_walk_one Page walk detected map level 0x1 iova 0x3000 -> > > > gpa 0x22e2d000 mask 0xfff perm 3 vtd_page_walk_one Page walk > > > detected map level 0x1 iova 0x4000 -> gpa 0x12a49000 mask 0xfff perm > > > 3 vtd_page_walk_one Page walk detected map level 0x1 iova 0x5000 -> > > > gpa 0x129bb000 mask 0xfff perm 3 vtd_page_walk_one Page walk > > > detected map level 0x1 iova 0x6000 -> gpa 0x128db000 mask 0xfff perm 3 > vtd_page_walk_one Page walk detected map level 0x1 iova 0x7000 -> gpa > 0x12a8 mask 0xfff perm 3 vtd_page_walk_one Page walk detected map > level 0x1 iova 0x8000 -> gpa 0x12a7e000 mask 0xfff perm 3 > vtd_page_walk_one Page walk detected map level 0x1 iova 0x9000 -> gpa > 0x12b22000 mask 0xfff perm 3 vtd_page_walk_one Page walk detected map > level 0x1 iova 0xa000 -> gpa 0x12b41000 mask 0xfff perm 3 ... > > > > Hi Peter, > > > > I'm trying to make use of this, with your vtd-vfio-enablement-v7 > > branch (HEAD 0c1c4e738095). I'm assigning an 82576 PF to a VM. It > > works with iommu=pt, but if I remove that option, the device does not > > work and vfio_iommu_map_notify is never called. Any suggestions? My > > commandline is below. 
Thanks, > > > > Alex > > > > /usr/local/bin/qemu-system-x86_64 \ > > -name guest=l1,debug-threads=on -S \ > > -machine pc-q35-2.9,accel=kvm,usb=off,dump-guest-core=off,kernel- > irqchip=split \ > > -cpu host -m 10240 -realtime mlock=off -smp > 4,sockets=1,cores=2,threads=2 \ > > -no-user-config -nodefaults -monitor stdio -rtc > > base=utc,driftfix=slew \ > > -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown \ > > -global ICH9-LPC.disable_s3=1 -global ICH9-LPC.disable_s4=1 \ > > -boot strict=on \ > > -device ioh3420,port=0x10,chassis=1,id=pci.1,bus=pcie.0,addr=0x2 \ > > -device i82801b11-bridge,id=pci.2,bus=pcie.0,addr=0x1e \ > > -device pci-bridge,chassis_nr=3,id=pci.3,bus=pci.2,addr=0x0 \ > > -device ioh3420,port=0x18,chassis=4,id=pci.4,bus=pcie.0,addr=0x3 \ > > -device ioh3420,port=0x20,chassis=5,id=pci.5,bus=pcie.0,addr=0x4 \ > > -device ioh3420,port=0x28,chassis=6,id=pci.6,bus=pcie.0,addr=0x5 \ > > -device ioh3420,port=0x30,chassis=7,id=pci.7,bus=pcie.0,addr=0x6 \ > > -device ioh3420,port=0x38,chassis=8,id=pci.8,bus=pcie.0,addr=0x7 \ > > -device ich9-usb-ehci1,id=usb,bu
[Qemu-devel] [PATCH 0/3] filter-rewriter: fix two bugs and one optimization
Hi,

Patch 1 fixes a double free bug, and patch 2 fixes a memory leak bug.
Patch 3 is an optimization for filter-rewriter.

Please review, thanks.

zhanghailiang (3):
  net/colo: fix memory double free error
  filter-rewriter: fix memory leak for connection in connection_track_table
  filter-rewriter: skip net_checksum_calculate() while offset = 0

 net/colo.c            |  2 --
 net/colo.h            |  4 +++
 net/filter-rewriter.c | 86 +++
 3 files changed, 77 insertions(+), 15 deletions(-)

--
1.8.3.1
[Qemu-devel] [PATCH 3/3] filter-rewriter: skip net_checksum_calculate() while offset = 0
While the offset between the packet sequence numbers of the primary side
and the secondary side is zero, it is unnecessary to call
net_checksum_calculate() to recalculate the checksum of the packets.

Signed-off-by: zhanghailiang
---
 net/filter-rewriter.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index 7e7ec35..c9a6d43 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -93,10 +93,12 @@ static int handle_primary_tcp_pkt(RewriterState *rf,
             conn->offset -= (ntohl(tcp_pkt->th_ack) - 1);
             conn->syn_flag = 0;
         }
-        /* handle packets to the secondary from the primary */
-        tcp_pkt->th_ack = htonl(ntohl(tcp_pkt->th_ack) + conn->offset);
+        if (conn->offset) {
+            /* handle packets to the secondary from the primary */
+            tcp_pkt->th_ack = htonl(ntohl(tcp_pkt->th_ack) + conn->offset);

-        net_checksum_calculate((uint8_t *)pkt->data, pkt->size);
+            net_checksum_calculate((uint8_t *)pkt->data, pkt->size);
+        }
         /*
          * Case 1:
          * The *server* side of this connect is VM, *client* tries to close
@@ -112,7 +114,6 @@ static int handle_primary_tcp_pkt(RewriterState *rf,
          */
         if ((conn->tcp_state == TCPS_LAST_ACK) &&
             (ntohl(tcp_pkt->th_ack) == (conn->fin_ack_seq + 1))) {
-            fprintf(stderr, "Remove conn "
             g_hash_table_remove(rf->connection_track_table, key);
         }
     }
@@ -159,10 +160,13 @@ static int handle_secondary_tcp_pkt(RewriterState *rf,
     }

     if ((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_ACK) {
-        /* handle packets to the primary from the secondary*/
-        tcp_pkt->th_seq = htonl(ntohl(tcp_pkt->th_seq) - conn->offset);
+        /* Only need to adjust seq while offset is Non-zero */
+        if (conn->offset) {
+            /* handle packets to the primary from the secondary*/
+            tcp_pkt->th_seq = htonl(ntohl(tcp_pkt->th_seq) - conn->offset);

-        net_checksum_calculate((uint8_t *)pkt->data, pkt->size);
+            net_checksum_calculate((uint8_t *)pkt->data, pkt->size);
+        }
         /*
          * Case 2:
          * The *server* side of this connect is VM, *server* tries to close
--
1.8.3.1
[Qemu-devel] [PATCH 1/3] net/colo: fix memory double free error
The 'primary_list' and 'secondary_list' members of struct Connection are
not allocated dynamically with g_queue_new(), but we free them with
g_queue_free(), which will lead to a double-free bug.

Signed-off-by: zhanghailiang
---
 net/colo.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/colo.c b/net/colo.c
index 6a6eacd..7d5c423 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -147,9 +147,7 @@ void connection_destroy(void *opaque)
     Connection *conn = opaque;

     g_queue_foreach(&conn->primary_list, packet_destroy, NULL);
-    g_queue_free(&conn->primary_list);
     g_queue_foreach(&conn->secondary_list, packet_destroy, NULL);
-    g_queue_free(&conn->secondary_list);
     g_slice_free(Connection, conn);
 }
--
1.8.3.1
[Qemu-devel] [PATCH 2/3] filter-rewriter: fix memory leak for connection in connection_track_table
After a net connection is closed, we didn't clear its related resources
in connection_track_table, which will lead to a memory leak.

Let's track the state of the net connection; if it is closed, its
related resources will be cleared up.

Signed-off-by: zhanghailiang
---
 net/colo.h            |  4 +++
 net/filter-rewriter.c | 70 +--
 2 files changed, 67 insertions(+), 7 deletions(-)

diff --git a/net/colo.h b/net/colo.h
index 7c524f3..cd9027f 100644
--- a/net/colo.h
+++ b/net/colo.h
@@ -18,6 +18,7 @@
 #include "slirp/slirp.h"
 #include "qemu/jhash.h"
 #include "qemu/timer.h"
+#include "slirp/tcp.h"

 #define HASHTABLE_MAX_SIZE 16384

@@ -69,6 +70,9 @@ typedef struct Connection {
      * run once in independent tcp connection
      */
     int syn_flag;
+
+    int tcp_state; /* TCP FSM state */
+    tcp_seq fin_ack_seq; /* the seq of 'fin=1,ack=1' */
 } Connection;

 uint32_t connection_key_hash(const void *opaque);
diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index c4ab91c..7e7ec35 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -60,9 +60,9 @@ static int is_tcp_packet(Packet *pkt)
 }

 /* handle tcp packet from primary guest */
-static int handle_primary_tcp_pkt(NetFilterState *nf,
+static int handle_primary_tcp_pkt(RewriterState *rf,
                                   Connection *conn,
-                                  Packet *pkt)
+                                  Packet *pkt, ConnectionKey *key)
 {
     struct tcphdr *tcp_pkt;

@@ -97,15 +97,45 @@
         tcp_pkt->th_ack = htonl(ntohl(tcp_pkt->th_ack) + conn->offset);

         net_checksum_calculate((uint8_t *)pkt->data, pkt->size);
+        /*
+         * Case 1:
+         * The *server* side of this connect is VM, *client* tries to close
+         * the connection.
+         *
+         * We got 'ack=1' packets from client side, it acks 'fin=1, ack=1'
+         * packet from server side. From this point, we can ensure that there
+         * will be no packets in the connection, except that, some errors
+         * happen between the path of 'filter object' and vNIC, if this rare
+         * case really happen, we can still create a new connection,
+         * So it is safe to remove the connection from connection_track_table.
+         *
+         */
+        if ((conn->tcp_state == TCPS_LAST_ACK) &&
+            (ntohl(tcp_pkt->th_ack) == (conn->fin_ack_seq + 1))) {
+            fprintf(stderr, "Remove conn "
+            g_hash_table_remove(rf->connection_track_table, key);
+        }
+    }
+    /*
+     * Case 2:
+     * The *server* side of this connect is VM, *server* tries to close
+     * the connection.
+     *
+     * We got 'fin=1, ack=1' packet from client side, we need to
+     * record the seq of 'fin=1, ack=1' packet.
+     */
+    if ((tcp_pkt->th_flags & (TH_ACK | TH_FIN)) == (TH_ACK | TH_FIN)) {
+        conn->fin_ack_seq = htonl(tcp_pkt->th_seq);
+        conn->tcp_state = TCPS_LAST_ACK;
     }

     return 0;
 }

 /* handle tcp packet from secondary guest */
-static int handle_secondary_tcp_pkt(NetFilterState *nf,
+static int handle_secondary_tcp_pkt(RewriterState *rf,
                                     Connection *conn,
-                                    Packet *pkt)
+                                    Packet *pkt, ConnectionKey *key)
 {
     struct tcphdr *tcp_pkt;

@@ -133,8 +163,34 @@
         tcp_pkt->th_seq = htonl(ntohl(tcp_pkt->th_seq) - conn->offset);

         net_checksum_calculate((uint8_t *)pkt->data, pkt->size);
+        /*
+         * Case 2:
+         * The *server* side of this connect is VM, *server* tries to close
+         * the connection.
+         *
+         * We got 'ack=1' packets from server side, it acks 'fin=1, ack=1'
+         * packet from client side. Like Case 1, there should be no packets
+         * in the connection from now know, But the difference here is
+         * if the packet is lost, We will get the resent 'fin=1,ack=1' packet.
+         * TODO: Fix above case.
+         */
+        if ((conn->tcp_state == TCPS_LAST_ACK) &&
+            (ntohl(tcp_pkt->th_ack) == (conn->fin_ack_seq + 1))) {
+            g_hash_table_remove(rf->connection_track_table, key);
+        }
+    }
+    /*
+     * Case 1:
+     * The *server* side of this connect is VM, *client* tries to close
+     * the connection.
+     *
+     * We got 'fin=1, ack=1' packet from server side, we need to
+     * record the seq of 'fin=1, ack=1' packet.
+     */
+    if ((tcp_pkt->th_flags & (TH_ACK | TH_FIN)) == (TH_ACK | TH_FIN)) {
+        conn->fin_ack_seq = ntohl(tcp_pkt->th_seq);
+        conn->tcp_state = TCPS_LAST_ACK;
     }
-
     return 0;
 }

@@ -178,7 +234,7 @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,

     if (sender == nf->netdev) {
         /* NET_FILTER_DIRECTION_TX */
-
Re: [Qemu-devel] [PULL 08/41] intel_iommu: support device iotlb descriptor
> -Original Message- > From: Jason Wang [mailto:jasow...@redhat.com] > Sent: Friday, February 17, 2017 2:43 PM > To: Liu, Yi L ; Michael S. Tsirkin ; > qemu- > de...@nongnu.org > Cc: Peter Maydell ; Eduardo Habkost > ; Peter Xu ; Paolo Bonzini > ; Richard Henderson ; Tian, Kevin > ; Lan, Tianyu ; Alex Williamson > > Subject: Re: [Qemu-devel] [PULL 08/41] intel_iommu: support device iotlb > descriptor > > > > On 2017年02月17日 14:18, Liu, Yi L wrote: > >> -Original Message- > >> From: Jason Wang [mailto:jasow...@redhat.com] > >> Sent: Thursday, February 16, 2017 1:44 PM > >> To: Liu, Yi L ; Michael S. Tsirkin > >> ; qemu- de...@nongnu.org > >> Cc: Peter Maydell ; Eduardo Habkost > >> ; Peter Xu ; Paolo Bonzini > >> ; Richard Henderson ; Tian, > >> Kevin ; Lan, Tianyu ; > >> Alex Williamson > >> Subject: Re: [Qemu-devel] [PULL 08/41] intel_iommu: support device > >> iotlb descriptor > >> > >> > >> > >> On 2017年02月16日 13:36, Liu, Yi L wrote: > -Original Message- > From: Qemu-devel > [mailto:qemu-devel-bounces+yi.l.liu=intel@nongnu.org] > On Behalf Of Michael S. Tsirkin > Sent: Tuesday, January 10, 2017 1:40 PM > To: qemu-devel@nongnu.org > Cc: Peter Maydell ; Eduardo Habkost > ; Jason Wang ; Peter > Xu > ; Paolo Bonzini ; Richard > Henderson > Subject: [Qemu-devel] [PULL 08/41] intel_iommu: support device > iotlb descriptor > > From: Jason Wang > > This patch enables device IOTLB support for intel iommu. The major > work is to implement QI device IOTLB descriptor processing and > notify the device through iommu notifier. > > >>> Hi Jason/Michael, > >>> > >>> Recently Peter Xu's patch also touched intel-iommu emulation. His > >>> patch shadows second-level page table by capturing iotlb flush from > >>> guest. It would result in page table updating in host. Does this > >>> patch also use the same map/umap API provided by VFIO? > >> Yes, it depends on the iommu notifier too. > >> > >>> If it is, then I think it would also update page table in host. 
It > >>> looks to be a duplicate update. Pls refer to the following snapshot > >>> captured from section 6.5.2.5 of vtd spec. > >>> > >>> "Since translation requests from a device may be serviced by > >>> hardware from the IOTLB, software must always request IOTLB > >>> invalidation > >>> (iotlb_inv_dsc) before requesting corresponding Device-TLB > >>> (dev_tlb_inv_dsc) invalidation." > >>> > >>> Maybe for device-iotlb, we need a separate API which just pass down > >>> the invalidate info without updating page table. Any thoughts? > >> cc Alex. > >> > >> If we want ATS to be visible for guest (but I'm not sure if VFIO > >> support this), we probably need another notifier or a new flag. > > Jason, for assigned device, I think guest could see ATS if the > > assigned device supports ATS. I can see it when passthru iGPU. > > > > Regards, > > Yi L > > Good to know this. > > If I understand your suggestion correctly, you want a dedicated API to flush a > hardware device IOTLB. I'm not sure this is really needed. yes, I'd like to have an extra API besides the current MAP/UNMAP. > There's some discussion of similar issue in the past (when ATS is used for > virtio- > net/vhost), looks like we could solve this by not trigger the UNMAP notifier > unless it was device IOTLB inv desc if ATS is enabled for the device? With > this > remote IOMMU/IOTLB can only get invalidation request once. For VFIO, the > under layer IOMMU can deal with hardware device IOTLB without any extra > efforts. If I catch the background, I think it should be "not trigger the UNMAP notifier when unless it was device IOTLB inv desc if ATS is enabled for the device" Feel free to correct me. Thanks, Yi L > Does this make sense? > > Thanks
Re: [Qemu-devel] [PATCH v7 00/17] VT-d: vfio enablement and misc enhances
On Mon, Feb 20, 2017 at 08:17:32AM +, Liu, Yi L wrote: > > -Original Message- > > From: Qemu-devel [mailto:qemu-devel-bounces+yi.l.liu=intel@nongnu.org] > > On Behalf Of Peter Xu > > Sent: Monday, February 20, 2017 3:48 PM > > To: Alex Williamson > > Cc: Lan, Tianyu ; Tian, Kevin ; > > m...@redhat.com; jan.kis...@siemens.com; jasow...@redhat.com; qemu- > > de...@nongnu.org; bd.a...@gmail.com; David Gibson > > > > Subject: Re: [Qemu-devel] [PATCH v7 00/17] VT-d: vfio enablement and misc > > enhances > > > > On Fri, Feb 17, 2017 at 10:18:35AM -0700, Alex Williamson wrote: > > > On Tue, 7 Feb 2017 16:28:02 +0800 > > > Peter Xu wrote: > > > > > > > This is v7 of vt-d vfio enablement series. > > > [snip] > > > > = > > > > Test Done > > > > = > > > > > > > > Build test passed for x86_64/arm/ppc64. > > > > > > > > Simply tested with x86_64, assigning two PCI devices to a single VM, > > > > boot the VM using: > > > > > > > > bin=x86_64-softmmu/qemu-system-x86_64 > > > > $bin -M q35,accel=kvm,kernel-irqchip=split -m 1G \ > > > > -device intel-iommu,intremap=on,eim=off,caching-mode=on \ > > > > -netdev user,id=net0,hostfwd=tcp::-:22 \ > > > > -device virtio-net-pci,netdev=net0 \ > > > > -device vfio-pci,host=03:00.0 \ > > > > -device vfio-pci,host=02:00.0 \ > > > > -trace events=".trace.vfio" \ > > > > /var/lib/libvirt/images/vm1.qcow2 > > > > > > > > pxdev:bin [vtd-vfio-enablement]# cat .trace.vfio > > > > vtd_page_walk* > > > > vtd_replay* > > > > vtd_inv_desc* > > > > > > > > Then, in the guest, run the following tool: > > > > > > > > > > > > https://github.com/xzpeter/clibs/blob/master/gpl/userspace/vfio-bind > > > > -group/vfio-bind-group.c > > > > > > > > With parameter: > > > > > > > > ./vfio-bind-group 00:03.0 00:04.0 > > > > > > > > Check host side trace log, I can see pages are replayed and mapped > > > > in > > > > 00:04.0 device address space, like: > > > > > > > > ... 
> > > > vtd_replay_ce_valid replay valid context device 00:04.00 hi 0x401 lo > > > > 0x38fe1001 vtd_page_walk Page walk for ce (0x401, 0x38fe1001) iova > > > > range 0x0 - 0x80 vtd_page_walk_level Page walk > > > > (base=0x38fe1000, level=3) iova range 0x0 - 0x80 > > > > vtd_page_walk_level Page walk (base=0x35d31000, level=2) iova range > > > > 0x0 - 0x4000 vtd_page_walk_level Page walk (base=0x34979000, > > > > level=1) iova range 0x0 - 0x20 vtd_page_walk_one Page walk > > > > detected map level 0x1 iova 0x0 -> gpa 0x22dc3000 mask 0xfff perm 3 > > > > vtd_page_walk_one Page walk detected map level 0x1 iova 0x1000 -> > > > > gpa 0x22e25000 mask 0xfff perm 3 vtd_page_walk_one Page walk > > > > detected map level 0x1 iova 0x2000 -> gpa 0x22e12000 mask 0xfff perm > > > > 3 vtd_page_walk_one Page walk detected map level 0x1 iova 0x3000 -> > > > > gpa 0x22e2d000 mask 0xfff perm 3 vtd_page_walk_one Page walk > > > > detected map level 0x1 iova 0x4000 -> gpa 0x12a49000 mask 0xfff perm > > > > 3 vtd_page_walk_one Page walk detected map level 0x1 iova 0x5000 -> > > > > gpa 0x129bb000 mask 0xfff perm 3 vtd_page_walk_one Page walk > > > > detected map level 0x1 iova 0x6000 -> gpa 0x128db000 mask 0xfff perm 3 > > vtd_page_walk_one Page walk detected map level 0x1 iova 0x7000 -> gpa > > 0x12a8 mask 0xfff perm 3 vtd_page_walk_one Page walk detected map > > level 0x1 iova 0x8000 -> gpa 0x12a7e000 mask 0xfff perm 3 > > vtd_page_walk_one Page walk detected map level 0x1 iova 0x9000 -> gpa > > 0x12b22000 mask 0xfff perm 3 vtd_page_walk_one Page walk detected map > > level 0x1 iova 0xa000 -> gpa 0x12b41000 mask 0xfff perm 3 ... > > > > > > Hi Peter, > > > > > > I'm trying to make use of this, with your vtd-vfio-enablement-v7 > > > branch (HEAD 0c1c4e738095). I'm assigning an 82576 PF to a VM. It > > > works with iommu=pt, but if I remove that option, the device does not > > > work and vfio_iommu_map_notify is never called. Any suggestions? My > > > commandline is below. 
Thanks, > > > > > > Alex > > > > > > /usr/local/bin/qemu-system-x86_64 \ > > > -name guest=l1,debug-threads=on -S \ > > > -machine pc-q35-2.9,accel=kvm,usb=off,dump-guest-core=off,kernel- > > irqchip=split \ > > > -cpu host -m 10240 -realtime mlock=off -smp > > 4,sockets=1,cores=2,threads=2 \ > > > -no-user-config -nodefaults -monitor stdio -rtc > > > base=utc,driftfix=slew \ > > > -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown \ > > > -global ICH9-LPC.disable_s3=1 -global ICH9-LPC.disable_s4=1 \ > > > -boot strict=on \ > > > -device ioh3420,port=0x10,chassis=1,id=pci.1,bus=pcie.0,addr=0x2 \ > > > -device i82801b11-bridge,id=pci.2,bus=pcie.0,addr=0x1e \ > > > -device pci-bridge,chassis_nr=3,id=pci.3,bus=pci.2,addr=0x0 \ > > > -device ioh3420,port=0x18,chassis=4,id=pci.4,bus=pcie.0,addr=0x3 \ > > > -device ioh3420,port=0x20,chassis=5,id=pci.5,bus=pcie.0,addr=0x4 \ > > >
Re: [Qemu-devel] [Qemu-trivial] [PATCH] lm32: milkymist-tmu2: fix a third integer overflow
On 2017-02-18 00:00, Philippe Mathieu-Daudé wrote:
> On 02/16/2017 02:26 PM, Peter Maydell wrote:
>> Don't truncate the multiplication and do a 64 bit one instead because
>> the result is stored in a 64 bit variable. This fixes a similar
>> coverity warning to commits 237a8650d640 and 4382fa655498, in a
>> similar way, and is the final third of the fix for coverity CID
>> 1167561 (hopefully!).
>>
>> Signed-off-by: Peter Maydell
>
> Reviewed-by: Philippe Mathieu-Daudé

Acked-by: Michael Walle

>> ---
>> Third time lucky -- I checked and this is the last of these multiply
>> lines.
>>
>>  hw/display/milkymist-tmu2.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/hw/display/milkymist-tmu2.c b/hw/display/milkymist-tmu2.c
>> index 7528665..59120dd 100644
>> --- a/hw/display/milkymist-tmu2.c
>> +++ b/hw/display/milkymist-tmu2.c
>> @@ -293,7 +293,7 @@ static void tmu2_start(MilkymistTMU2State *s)
>>      cpu_physical_memory_unmap(mesh, mesh_len, 0, mesh_len);
>>
>>      /* Write back the OpenGL framebuffer to the QEMU framebuffer */
>> -    fb_len = 2 * s->regs[R_DSTHRES] * s->regs[R_DSTVRES];
>> +    fb_len = 2ULL * s->regs[R_DSTHRES] * s->regs[R_DSTVRES];
>>      fb = cpu_physical_memory_map(s->regs[R_DSTFBUF], &fb_len, 1);
>>      if (fb == NULL) {
>>          glDeleteTextures(1, &texture);
Re: [Qemu-devel] [PULL 08/41] intel_iommu: support device iotlb descriptor
> -Original Message- > From: Peter Xu [mailto:pet...@redhat.com] > Sent: Friday, February 17, 2017 3:00 PM > To: Liu, Yi L > Cc: Michael S. Tsirkin ; qemu-devel@nongnu.org; Peter > Maydell ; Eduardo Habkost > ; Jason Wang ; Paolo Bonzini > ; Richard Henderson ; Tian, Kevin > ; Lan, Tianyu > Subject: Re: [Qemu-devel] [PULL 08/41] intel_iommu: support device iotlb > descriptor > > On Fri, Feb 17, 2017 at 06:36:41AM +, Liu, Yi L wrote: > > > -Original Message- > > > From: Peter Xu [mailto:pet...@redhat.com] > > > Sent: Friday, February 17, 2017 11:26 AM > > > To: Liu, Yi L > > > Cc: Michael S. Tsirkin ; qemu-devel@nongnu.org; > > > Peter Maydell ; Eduardo Habkost > > > ; Jason Wang ; Paolo > > > Bonzini ; Richard Henderson ; > > > Tian, Kevin ; Lan, Tianyu > > > > > > Subject: Re: [Qemu-devel] [PULL 08/41] intel_iommu: support device > > > iotlb descriptor > > > > > > On Thu, Feb 16, 2017 at 05:36:06AM +, Liu, Yi L wrote: > > > > > -Original Message- > > > > > From: Qemu-devel > > > > > [mailto:qemu-devel-bounces+yi.l.liu=intel@nongnu.org] > > > > > On Behalf Of Michael S. Tsirkin > > > > > Sent: Tuesday, January 10, 2017 1:40 PM > > > > > To: qemu-devel@nongnu.org > > > > > Cc: Peter Maydell ; Eduardo Habkost > > > > > ; Jason Wang ; Peter > > > > > Xu ; Paolo Bonzini ; > > > > > Richard Henderson > > > > > Subject: [Qemu-devel] [PULL 08/41] intel_iommu: support device > > > > > iotlb descriptor > > > > > > > > > > From: Jason Wang > > > > > > > > > > This patch enables device IOTLB support for intel iommu. The > > > > > major work is to implement QI device IOTLB descriptor processing > > > > > and notify the device through iommu notifier. > > > > > > > > > Hi Jason/Michael, > > > > > > > > Recently Peter Xu's patch also touched intel-iommu emulation. His > > > > patch shadows second-level page table by capturing iotlb flush > > > > from guest. It would result in page table updating in host. Does > > > > this patch also use the same map/umap API provided by VFIO? 
If it > > > > is, then I think it > > > would also update page table in host. > > > > > > I haven't considered complex scenarios, but if simply we have a VM > > > with one vhost device and one vfio-pci device, imho that should not > > > be a problem - device iotlb is targeting SID rather than DOMAIN. So > > > device iotlb invalidations for vhost will be sent to vhost device only. > > > > Peter, I think for assigned device, all guest iotlb flush should be > > translated to be targeting device in the scope of host. Besides the > > scenario which has vhost and vfio-pci device at the same time, how > > about only having vfio-pci device and this device has ATS support. Then > > there > should be device-iotlb flushing. > > With this patch and your patch, it would also introduce two flushing. > > Hmm possibly. I'm still not quite familiar with ATS, but here what we need to > do may be that we forward these device-iotlb invalidations to the hardware > below, instead of sending UNMAP notifies, right? yes, that wouldn’t result in duplicate page table updating in host. > > > > > > However, vhost may receive two invalidations here, but it won't > > > matter much since vhost is just flushing caches twice. > > > > yeah, so far I didn’t see functional issue, may just reduce > > performance^_^ > > > > > > It looks to be > > > > a duplicate update. Pls refer to the following snapshot captured > > > > from section 6.5.2.5 of vtd spec. > > > > > > > > "Since translation requests from a device may be serviced by > > > > hardware from the IOTLB, software must always request IOTLB > > > > invalidation > > > > (iotlb_inv_dsc) before requesting corresponding Device-TLB > > > > (dev_tlb_inv_dsc) invalidation." > > > > > > > > Maybe for device-iotlb, we need a separate API which just pass > > > > down the invalidate info without updating page table. Any thoughts? > > > > > > Now imho I slightly prefer just use the current UNMAP notifier type > > > even for device iotlb device. 
But, we may need to do one more check > > > that we skip sending general iotlb invalidations to ATS enabled > > > devices like vhost, to avoid duplicated cache flushing. From > > > virt-svm side, do we have specific requirement to introduce a new flag > > > for it? > > I think it is a practical solution to do such a check to avoid duplicate > > flushing. > > > > For virt-svm, requirement is a little bit different since it's not > > shadowing any guest page table. It needs to shadow invalidate > > operations. So virt-svm will not use MAP/UNMAP notifier. Instead, it > > may require notifier which passdown invalidate info and then submit > invalidation directly.(no page table updating in host). > > Again, just want to know whether my above understanding is correct. If so, > instead of introducing a new flag, maybe we just need to enhance current > vtd_process_device_iotlb_desc() to behave differently depending on which > device the SID belongs to. E.g.: >
Re: [Qemu-devel] [PATCH v4 1/4] tests/docker: add basic user mapping support
Philippe Mathieu-Daudé writes: > Hi Alex, > > I first tried "make docker-image-debian-armhf-cross NOUSER=1 V=1" > which worked fine, then "make docker-image-debian-armhf-cross > NOUSER=0" but got a "Image is up to date." I thought I should have to > remove the image manually so I typed "docker rmi > qemu:debian-armhf-cross" and tried again but still no change in my > tests/docker/dockerfiles/debian-armhf-cross.docker adding my username, > then I reviewed your change in the Makefile and got it, setting NOUSER > to '0' has the same behavior, it's just set. To build the image with > my user I had to _not_ use the flag or use it unset "NOUSER=" and it > worked like charm. Ahh yes. I followed the same pattern as NOCACHE which does the same thing. I've made it clearer: @echo 'NOUSER Define to disable adding current user to containers passwd.' > > > On 02/16/2017 09:34 AM, Alex Bennée wrote: >> Currently all docker builds are done by exporting a tarball to the >> docker container and running the build as the containers root user. >> Other use cases are possible however and it is possible to map a part >> of users file-system to the container. This is useful for example for >> doing cross-builds of arbitrary source trees. For this to work >> smoothly the container needs to have a user created that maps cleanly >> to the host system. 
>> >> This adds a -u option to the docker script so that: >> >> DEB_ARCH=armhf DEB_TYPE=stable ./tests/docker/docker.py build \ >> -u --include-executable=arm-linux-user/qemu-arm \ >> debian:armhf ./tests/docker/dockerfiles/debian-bootstrap.docker >> >> Will build a container that can then be run like: >> >> docker run --rm -it -v /home/alex/lsrc/qemu/risu.git/:/src \ >> --user=alex:alex -w /src/ debian:armhf \ >> sh -c "make clean && ./configure -s && make" >> >> All docker containers built will add the current user unless >> explicitly disabled by specifying NOUSER when invoking the Makefile: >> >> make docker-image-debian-armhf-cross NOUSER=1 >> >> Signed-off-by: Alex Bennée >> Reviewed-by: Fam Zheng > > Tested-by: Philippe Mathieu-Daudé > Reviewed-by: Philippe Mathieu-Daudé Thanks > >> --- >> v2 >> - write the useradd directly >> - change long option to --add-current-user >> v3 >> - images -> image's >> - add r-b >> - add USER to Makefile >> v4 >> - s/USER/NOUSER/ and default to on >> - fix the add-user code to skip if user already setup (for chained images) >> --- >> tests/docker/Makefile.include | 2 ++ >> tests/docker/docker.py| 16 ++-- >> 2 files changed, 16 insertions(+), 2 deletions(-) >> >> diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include >> index 3f15d5aea8..4778b27ca8 100644 >> --- a/tests/docker/Makefile.include >> +++ b/tests/docker/Makefile.include >> @@ -50,6 +50,7 @@ docker-image-%: $(DOCKER_FILES_DIR)/%.docker >> $(call quiet-command,\ >> $(SRC_PATH)/tests/docker/docker.py build qemu:$* $< \ >> $(if $V,,--quiet) $(if $(NOCACHE),--no-cache) \ >> +$(if $(NOUSER),,--add-current-user) \ >> $(if $(EXECUTABLE),--include-executable=$(EXECUTABLE)),\ >> "BUILD","$*") >> >> @@ -99,6 +100,7 @@ docker: >> @echo ' (default is 1)' >> @echo 'DEBUG=1 Stop and drop to shell in the created >> container' >> @echo ' before running the command.' >> +@echo 'NOUSER=1 Disable adding current user to >> containers passwd.' 
>> @echo 'NOCACHE=1Ignore cache when build images.' >> @echo 'EXECUTABLE=Include executable in image.' >> >> diff --git a/tests/docker/docker.py b/tests/docker/docker.py >> index 37d83199e7..d277a2268f 100755 >> --- a/tests/docker/docker.py >> +++ b/tests/docker/docker.py >> @@ -25,6 +25,7 @@ import signal >> from tarfile import TarFile, TarInfo >> from StringIO import StringIO >> from shutil import copy, rmtree >> +from pwd import getpwuid >> >> >> DEVNULL = open(os.devnull, 'wb') >> @@ -149,13 +150,21 @@ class Docker(object): >> labels = json.loads(resp)[0]["Config"].get("Labels", {}) >> return labels.get("com.qemu.dockerfile-checksum", "") >> >> -def build_image(self, tag, docker_dir, dockerfile, quiet=True, >> argv=None): >> +def build_image(self, tag, docker_dir, dockerfile, >> +quiet=True, user=False, argv=None): >> if argv == None: >> argv = [] >> >> tmp_df = tempfile.NamedTemporaryFile(dir=docker_dir, >> suffix=".docker") >> tmp_df.write(dockerfile) >> >> +if user: >> +uid = os.getuid() >> +uname = getpwuid(uid).pw_name >> +tmp_df.write("\n") >> +tmp_df.write("RUN id %s || useradd -u %d -U %s" % >> + (uname, uid, uname)) >> + >> tmp_df.write("\n") >> tmp_df
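The dockerfile fragment that `build_image()` appends can be sanity-checked outside docker entirely; the sketch below (standalone, not part of the patch) rebuilds the same `RUN id || useradd` line and shows why the `id %s ||` guard matters: it makes the step idempotent, so chained images that already contain the user skip the `useradd`.

```python
# Rebuild the dockerfile line appended when --add-current-user is given.
# "id <user> ||" short-circuits useradd when the user already exists,
# which is what keeps chained image builds from failing.
import os
from pwd import getpwuid

uid = os.getuid()
uname = getpwuid(uid).pw_name
fragment = "RUN id %s || useradd -u %d -U %s" % (uname, uid, uname)
print(fragment.startswith("RUN id "))  # True
```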
Re: [Qemu-devel] [PATCH v4 2/4] new: debian docker targets for cross-compiling
Philippe Mathieu-Daudé writes: > On 02/16/2017 09:34 AM, Alex Bennée wrote: >> This provides a basic Debian install with access to the emdebian cross >> compilers. The debian-armhf-cross and debian-arm64-cross targets build >> on the basic Debian image to allow cross compiling to those targets. >> >> A new environment variable (QEMU_CONFIGURE_OPTS) is set as part of the >> docker container and passed to the build to specify the >> --cross-prefix. The user still calls the build in the usual way, for >> example: >> >> make docker-test-build@debian-arm64-cross \ >> TARGET_LIST="aarch64-softmmu,aarch64-linux-user" >> >> Signed-off-by: Alex Bennée >> >> --- >> v2 >> - add clang (keep shippable happy) >> - rm adduser code (done direct now) >> - add aptitude (useful for debugging package clashes) >> v3 >> - split into debian, debian-armhf-cross and debian-aarch64-cross >> v4 >> - Add QEMU_CONFIGURE_OPTS >> --- >> tests/docker/Makefile.include | 4 >> tests/docker/common.rc | 2 +- >> tests/docker/dockerfiles/debian-arm64-cross.docker | 15 + >> tests/docker/dockerfiles/debian-armhf-cross.docker | 15 + >> tests/docker/dockerfiles/debian.docker | 25 >> ++ >> 5 files changed, 60 insertions(+), 1 deletion(-) >> create mode 100644 tests/docker/dockerfiles/debian-arm64-cross.docker >> create mode 100644 tests/docker/dockerfiles/debian-armhf-cross.docker >> create mode 100644 tests/docker/dockerfiles/debian.docker >> >> diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include >> index 4778b27ca8..84bdcc944a 100644 >> --- a/tests/docker/Makefile.include >> +++ b/tests/docker/Makefile.include >> @@ -54,6 +54,10 @@ docker-image-%: $(DOCKER_FILES_DIR)/%.docker >> $(if $(EXECUTABLE),--include-executable=$(EXECUTABLE)),\ >> "BUILD","$*") >> >> +# Enforce dependancies for composite images >> +docker-image-debian-armhf-cross: docker-image-debian >> +docker-image-debian-arm64-cross: docker-image-debian >> + >> # Expand all the pre-requistes for each docker image and test 
combination >> $(foreach i,$(DOCKER_IMAGES), \ >> $(foreach t,$(DOCKER_TESTS) $(DOCKER_TOOLS), \ >> diff --git a/tests/docker/common.rc b/tests/docker/common.rc >> index 21657e87c6..6865689bb5 100755 >> --- a/tests/docker/common.rc >> +++ b/tests/docker/common.rc >> @@ -29,7 +29,7 @@ build_qemu() >> config_opts="--enable-werror \ >> ${TARGET_LIST:+--target-list=${TARGET_LIST}} \ >> --prefix=$PWD/install \ >> - $EXTRA_CONFIGURE_OPTS \ >> + $QEMU_CONFIGURE_OPTS $EXTRA_CONFIGURE_OPTS \ >> $@" >> echo "Configure options:" >> echo $config_opts >> diff --git a/tests/docker/dockerfiles/debian-arm64-cross.docker >> b/tests/docker/dockerfiles/debian-arm64-cross.docker >> new file mode 100644 >> index 00..ce90da943f >> --- /dev/null >> +++ b/tests/docker/dockerfiles/debian-arm64-cross.docker >> @@ -0,0 +1,15 @@ >> +# >> +# Docker arm64 cross-compiler target >> +# >> +# This docker target builds on the base debian image. >> +# >> +FROM qemu:debian >> + >> +# Add the foreign architecture we want and install dependacies > > typo "dependencies" Fixed, thanks. > > Reviewed-by: Philippe Mathieu-Daudé > >> +RUN dpkg --add-architecture arm64 >> +RUN apt update >> +RUN apt install -yy crossbuild-essential-arm64 >> +RUN apt-get build-dep -yy -a arm64 qemu >> + >> +# Specify the cross prefix for this image (see tests/docker/common.rc) >> +ENV QEMU_CONFIGURE_OPTS --cross-prefix=aarch64-linux-gnu- >> diff --git a/tests/docker/dockerfiles/debian-armhf-cross.docker >> b/tests/docker/dockerfiles/debian-armhf-cross.docker >> new file mode 100644 >> index 00..e0f81a556d >> --- /dev/null >> +++ b/tests/docker/dockerfiles/debian-armhf-cross.docker >> @@ -0,0 +1,15 @@ >> +# >> +# Docker armhf cross-compiler target >> +# >> +# This docker target builds on the base debian image. 
>> +# >> +FROM qemu:debian >> + >> +# Add the foreign architecture we want and install dependacies >> +RUN dpkg --add-architecture armhf >> +RUN apt update >> +RUN apt install -yy crossbuild-essential-armhf >> +RUN apt-get build-dep -yy -a armhf qemu >> + >> +# Specify the cross prefix for this image (see tests/docker/common.rc) >> +ENV QEMU_CONFIGURE_OPTS --cross-prefix=arm-linux-gnueabihf- >> diff --git a/tests/docker/dockerfiles/debian.docker >> b/tests/docker/dockerfiles/debian.docker >> new file mode 100644 >> index 00..52bd79938e >> --- /dev/null >> +++ b/tests/docker/dockerfiles/debian.docker >> @@ -0,0 +1,25 @@ >> +# >> +# Docker multiarch cross-compiler target >> +# >> +# This docker target is builds on Debian and Emdebian's cross compiler >> targets >> +# to build distro with a selection of cross compilers for building test >> binaries. >> +# >> +# On its own you can't build much but the docke
Re: [Qemu-devel] [PULL 08/41] intel_iommu: support device iotlb descriptor
On 2017年02月20日 16:27, Liu, Yi L wrote: -Original Message- From: Jason Wang [mailto:jasow...@redhat.com] Sent: Friday, February 17, 2017 2:43 PM To: Liu, Yi L ; Michael S. Tsirkin ; qemu- de...@nongnu.org Cc: Peter Maydell ; Eduardo Habkost ; Peter Xu ; Paolo Bonzini ; Richard Henderson ; Tian, Kevin ; Lan, Tianyu ; Alex Williamson Subject: Re: [Qemu-devel] [PULL 08/41] intel_iommu: support device iotlb descriptor On 2017年02月17日 14:18, Liu, Yi L wrote: -Original Message- From: Jason Wang [mailto:jasow...@redhat.com] Sent: Thursday, February 16, 2017 1:44 PM To: Liu, Yi L ; Michael S. Tsirkin ; qemu- de...@nongnu.org Cc: Peter Maydell ; Eduardo Habkost ; Peter Xu ; Paolo Bonzini ; Richard Henderson ; Tian, Kevin ; Lan, Tianyu ; Alex Williamson Subject: Re: [Qemu-devel] [PULL 08/41] intel_iommu: support device iotlb descriptor On 2017年02月16日 13:36, Liu, Yi L wrote: -Original Message- From: Qemu-devel [mailto:qemu-devel-bounces+yi.l.liu=intel@nongnu.org] On Behalf Of Michael S. Tsirkin Sent: Tuesday, January 10, 2017 1:40 PM To: qemu-devel@nongnu.org Cc: Peter Maydell ; Eduardo Habkost ; Jason Wang ; Peter Xu ; Paolo Bonzini ; Richard Henderson Subject: [Qemu-devel] [PULL 08/41] intel_iommu: support device iotlb descriptor From: Jason Wang This patch enables device IOTLB support for intel iommu. The major work is to implement QI device IOTLB descriptor processing and notify the device through iommu notifier. Hi Jason/Michael, Recently Peter Xu's patch also touched intel-iommu emulation. His patch shadows second-level page table by capturing iotlb flush from guest. It would result in page table updating in host. Does this patch also use the same map/umap API provided by VFIO? Yes, it depends on the iommu notifier too. If it is, then I think it would also update page table in host. It looks to be a duplicate update. Pls refer to the following snapshot captured from section 6.5.2.5 of vtd spec. 
"Since translation requests from a device may be serviced by hardware from the IOTLB, software must always request IOTLB invalidation (iotlb_inv_dsc) before requesting corresponding Device-TLB (dev_tlb_inv_dsc) invalidation." Maybe for device-iotlb, we need a separate API which just pass down the invalidate info without updating page table. Any thoughts? cc Alex. If we want ATS to be visible for guest (but I'm not sure if VFIO support this), we probably need another notifier or a new flag. Jason, for assigned device, I think guest could see ATS if the assigned device supports ATS. I can see it when passthru iGPU. Regards, Yi L Good to know this. If I understand your suggestion correctly, you want a dedicated API to flush a hardware device IOTLB. I'm not sure this is really needed. yes, I'd like to have an extra API besides the current MAP/UNMAP. I'm think whether or not we can do this without extra API or even don't need to care about this. There's some discussion of similar issue in the past (when ATS is used for virtio- net/vhost), looks like we could solve this by not trigger the UNMAP notifier unless it was device IOTLB inv desc if ATS is enabled for the device? With this remote IOMMU/IOTLB can only get invalidation request once. For VFIO, the under layer IOMMU can deal with hardware device IOTLB without any extra efforts. If I catch the background, I think it should be "not trigger the UNMAP notifier when unless it was device IOTLB inv desc if ATS is enabled for the device" Seems not :) I mean, if ATS is enabled for the device, only trigger UNMAP notifier when processing device IOTLB. Then we can only have flush once. And host IOMMU driver will take care of device IOTLB flush too. Thanks Feel free to correct me. Thanks, Yi L Does this make sense? Thanks
Re: [Qemu-devel] [PULL 08/41] intel_iommu: support device iotlb descriptor
> -Original Message- > From: Jason Wang [mailto:jasow...@redhat.com] > Sent: Monday, February 20, 2017 5:04 PM > To: Liu, Yi L ; Michael S. Tsirkin ; > qemu- > de...@nongnu.org > Cc: Lan, Tianyu ; Peter Maydell > ; Tian, Kevin ; Eduardo > Habkost ; Peter Xu ; Alex > Williamson ; Paolo Bonzini > ; Richard Henderson > Subject: Re: [Qemu-devel] [PULL 08/41] intel_iommu: support device iotlb > descriptor > > > > On 2017年02月20日 16:27, Liu, Yi L wrote: > >> -Original Message- > >> From: Jason Wang [mailto:jasow...@redhat.com] > >> Sent: Friday, February 17, 2017 2:43 PM > >> To: Liu, Yi L ; Michael S. Tsirkin > >> ; qemu- de...@nongnu.org > >> Cc: Peter Maydell ; Eduardo Habkost > >> ; Peter Xu ; Paolo Bonzini > >> ; Richard Henderson ; Tian, > >> Kevin ; Lan, Tianyu ; > >> Alex Williamson > >> Subject: Re: [Qemu-devel] [PULL 08/41] intel_iommu: support device > >> iotlb descriptor > >> > >> > >> > >> On 2017年02月17日 14:18, Liu, Yi L wrote: > -Original Message- > From: Jason Wang [mailto:jasow...@redhat.com] > Sent: Thursday, February 16, 2017 1:44 PM > To: Liu, Yi L ; Michael S. Tsirkin > ; qemu- de...@nongnu.org > Cc: Peter Maydell ; Eduardo Habkost > ; Peter Xu ; Paolo Bonzini > ; Richard Henderson ; Tian, > Kevin ; Lan, Tianyu ; > Alex Williamson > Subject: Re: [Qemu-devel] [PULL 08/41] intel_iommu: support device > iotlb descriptor > > > > On 2017年02月16日 13:36, Liu, Yi L wrote: > >> -Original Message- > >> From: Qemu-devel > >> [mailto:qemu-devel-bounces+yi.l.liu=intel@nongnu.org] > >> On Behalf Of Michael S. Tsirkin > >> Sent: Tuesday, January 10, 2017 1:40 PM > >> To: qemu-devel@nongnu.org > >> Cc: Peter Maydell ; Eduardo Habkost > >> ; Jason Wang ; Peter > >> Xu > >> ; Paolo Bonzini ; Richard > >> Henderson > >> Subject: [Qemu-devel] [PULL 08/41] intel_iommu: support device > >> iotlb descriptor > >> > >> From: Jason Wang > >> > >> This patch enables device IOTLB support for intel iommu. 
The > >> major work is to implement QI device IOTLB descriptor processing > >> and notify the device through iommu notifier. > >> > > Hi Jason/Michael, > > > > Recently Peter Xu's patch also touched intel-iommu emulation. His > > patch shadows second-level page table by capturing iotlb flush > > from guest. It would result in page table updating in host. Does > > this patch also use the same map/umap API provided by VFIO? > Yes, it depends on the iommu notifier too. > > > If it is, then I think it would also update page table in host. It > > looks to be a duplicate update. Pls refer to the following > > snapshot captured from section 6.5.2.5 of vtd spec. > > > > "Since translation requests from a device may be serviced by > > hardware from the IOTLB, software must always request IOTLB > > invalidation > > (iotlb_inv_dsc) before requesting corresponding Device-TLB > > (dev_tlb_inv_dsc) invalidation." > > > > Maybe for device-iotlb, we need a separate API which just pass > > down the invalidate info without updating page table. Any thoughts? > cc Alex. > > If we want ATS to be visible for guest (but I'm not sure if VFIO > support this), we probably need another notifier or a new flag. > >>> Jason, for assigned device, I think guest could see ATS if the > >>> assigned device supports ATS. I can see it when passthru iGPU. > >>> > >>> Regards, > >>> Yi L > >> Good to know this. > >> > >> If I understand your suggestion correctly, you want a dedicated API > >> to flush a hardware device IOTLB. I'm not sure this is really needed. > > yes, I'd like to have an extra API besides the current MAP/UNMAP. > > I'm think whether or not we can do this without extra API or even don't need > to > care about this. > > > > >> There's some discussion of similar issue in the past (when ATS is > >> used for virtio- net/vhost), looks like we could solve this by not > >> trigger the UNMAP notifier unless it was device IOTLB inv desc if ATS > >> is enabled for the device? 
With this remote IOMMU/IOTLB can only get > >> invalidation request once. For VFIO, the under layer IOMMU can deal > >> with hardware device IOTLB without any extra efforts. > > If I catch the background, I think it should be "not trigger the UNMAP > > notifier when unless it was device IOTLB inv desc if ATS is enabled for the > device" > > Seems not :) > > I mean, if ATS is enabled for the device, only trigger UNMAP notifier when > processing device IOTLB. Then we can only have flush once. And host IOMMU > driver will take care of device IOTLB flush too. hmmm, how about the iotlb inv desc which comes before the device-iotlb one? I'm not sure it is practical to ignore the iotlb inv desc, since there is no SID info in it. Regards, Yi L
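Jason's proposal can be sketched as a toy dispatcher (the function name and descriptor strings below are illustrative, not QEMU or VT-d API): with ATS enabled, the UNMAP notifier fires only for the Device-TLB descriptor, so even though the guest must queue iotlb_inv_dsc before dev_tlb_inv_dsc (section 6.5.2.5), the VFIO side sees a single flush per invalidation pair.

```python
# Toy model of "only trigger UNMAP on device-IOTLB inv when ATS is on".
# All names are hypothetical; this is not the QEMU notifier API.
def handle_inv_desc(desc_type, ats_enabled, notified):
    if desc_type == "iotlb_inv_dsc" and not ats_enabled:
        notified.append("UNMAP")  # no ATS: flush on the plain IOTLB inv
    elif desc_type == "dev_tlb_inv_dsc" and ats_enabled:
        notified.append("UNMAP")  # ATS: single flush, on the device-TLB inv

notified = []
# per the spec, the guest queues the IOTLB inv before the device-TLB inv
for desc in ("iotlb_inv_dsc", "dev_tlb_inv_dsc"):
    handle_inv_desc(desc, ats_enabled=True, notified=notified)
print(notified)  # ['UNMAP'] -- one flush, not two, reaches VFIO
```

Either way exactly one UNMAP is emitted; ATS only moves it from the IOTLB descriptor to the Device-TLB descriptor, which is why the host IOMMU driver can still take care of the hardware device-IOTLB flush.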
[Qemu-devel] [Bug 741115] Re: Add support of coprocessor cp15, cp14 registers exposion in the embedded gdb server
** Changed in: qemu Importance: Undecided => Wishlist -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/741115 Title: Add support of coprocessor cp15, cp14 registers exposion in the embedded gdb server Status in QEMU: New Bug description: Please add support of exposion of ARM coprocesor registers/logic at the embedded gdb server, for example of cp15, cp14, etc registers. Related project http://jtagarmgdbsrvr.sourceforge.net/index.html Also filled bug in the GDB http://sourceware.org/bugzilla/show_bug.cgi?id=12602 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/741115/+subscriptions
Re: [Qemu-devel] [PULL 08/41] intel_iommu: support device iotlb descriptor
On 2017年02月20日 17:13, Liu, Yi L wrote: -Original Message- From: Jason Wang [mailto:jasow...@redhat.com] Sent: Monday, February 20, 2017 5:04 PM To: Liu, Yi L ; Michael S. Tsirkin ; qemu- de...@nongnu.org Cc: Lan, Tianyu ; Peter Maydell ; Tian, Kevin ; Eduardo Habkost ; Peter Xu ; Alex Williamson ; Paolo Bonzini ; Richard Henderson Subject: Re: [Qemu-devel] [PULL 08/41] intel_iommu: support device iotlb descriptor On 2017年02月20日 16:27, Liu, Yi L wrote: -Original Message- From: Jason Wang [mailto:jasow...@redhat.com] Sent: Friday, February 17, 2017 2:43 PM To: Liu, Yi L ; Michael S. Tsirkin ; qemu- de...@nongnu.org Cc: Peter Maydell ; Eduardo Habkost ; Peter Xu ; Paolo Bonzini ; Richard Henderson ; Tian, Kevin ; Lan, Tianyu ; Alex Williamson Subject: Re: [Qemu-devel] [PULL 08/41] intel_iommu: support device iotlb descriptor On 2017年02月17日 14:18, Liu, Yi L wrote: -Original Message- From: Jason Wang [mailto:jasow...@redhat.com] Sent: Thursday, February 16, 2017 1:44 PM To: Liu, Yi L ; Michael S. Tsirkin ; qemu- de...@nongnu.org Cc: Peter Maydell ; Eduardo Habkost ; Peter Xu ; Paolo Bonzini ; Richard Henderson ; Tian, Kevin ; Lan, Tianyu ; Alex Williamson Subject: Re: [Qemu-devel] [PULL 08/41] intel_iommu: support device iotlb descriptor On 2017年02月16日 13:36, Liu, Yi L wrote: -Original Message- From: Qemu-devel [mailto:qemu-devel-bounces+yi.l.liu=intel@nongnu.org] On Behalf Of Michael S. Tsirkin Sent: Tuesday, January 10, 2017 1:40 PM To: qemu-devel@nongnu.org Cc: Peter Maydell ; Eduardo Habkost ; Jason Wang ; Peter Xu ; Paolo Bonzini ; Richard Henderson Subject: [Qemu-devel] [PULL 08/41] intel_iommu: support device iotlb descriptor From: Jason Wang This patch enables device IOTLB support for intel iommu. The major work is to implement QI device IOTLB descriptor processing and notify the device through iommu notifier. Hi Jason/Michael, Recently Peter Xu's patch also touched intel-iommu emulation. 
His patch shadows second-level page table by capturing iotlb flush from guest. It would result in page table updating in host. Does this patch also use the same map/umap API provided by VFIO? Yes, it depends on the iommu notifier too. If it is, then I think it would also update page table in host. It looks to be a duplicate update. Pls refer to the following snapshot captured from section 6.5.2.5 of vtd spec. "Since translation requests from a device may be serviced by hardware from the IOTLB, software must always request IOTLB invalidation (iotlb_inv_dsc) before requesting corresponding Device-TLB (dev_tlb_inv_dsc) invalidation." Maybe for device-iotlb, we need a separate API which just pass down the invalidate info without updating page table. Any thoughts? cc Alex. If we want ATS to be visible for guest (but I'm not sure if VFIO support this), we probably need another notifier or a new flag. Jason, for assigned device, I think guest could see ATS if the assigned device supports ATS. I can see it when passthru iGPU. Regards, Yi L Good to know this. If I understand your suggestion correctly, you want a dedicated API to flush a hardware device IOTLB. I'm not sure this is really needed. yes, I'd like to have an extra API besides the current MAP/UNMAP. I'm think whether or not we can do this without extra API or even don't need to care about this. There's some discussion of similar issue in the past (when ATS is used for virtio- net/vhost), looks like we could solve this by not trigger the UNMAP notifier unless it was device IOTLB inv desc if ATS is enabled for the device? With this remote IOMMU/IOTLB can only get invalidation request once. For VFIO, the under layer IOMMU can deal with hardware device IOTLB without any extra efforts. 
If I catch the background, I think it should be "not trigger the UNMAP notifier when unless it was device IOTLB inv desc if ATS is enabled for the device" Seems not :) I mean, if ATS is enabled for the device, only trigger UNMAP notifier when processing device IOTLB. Then we can only have flush once. And host IOMMU driver will take care of device IOTLB flush too. hmmm, how about the iotlb inv desc which is prior to device-iotlb? Any issue in this case? I'm not sure if it is practical to ignore the iotlb inv des since there is no SID info in it. Yes, this needs some changes maybe. Thanks Regards, Yi L
Re: [Qemu-devel] [Help] Windows2012 as Guest 64+cores on KVM Halts
Hi Paolo, > > > On 16/02/2017 02:31, Gonglei (Arei) wrote: > > And the below patch works for me, I can support max 255 vcpus for WS2012 > > with hyper-v enlightenments. > > > > diff --git a/target/i386/kvm.c b/target/i386/kvm.c > > index 27fd050..efe3cbc 100644 > > --- a/target/i386/kvm.c > > +++ b/target/i386/kvm.c > > @@ -772,7 +772,7 @@ int kvm_arch_init_vcpu(CPUState *cs) > > > > c = &cpuid_data.entries[cpuid_i++]; > > c->function = HYPERV_CPUID_IMPLEMENT_LIMITS; > > -c->eax = 0x40; > > +c->eax = -1; > > c->ebx = 0x40; > > > > kvm_base = KVM_CPUID_SIGNATURE_NEXT; > This needs to depend on the machine type, but apart from that I think > you should submit the patch for 2.9. I don't see why it needs to depend on the machine type: IIUC this change has no negative effects on current QEMU, and there are no compatibility problems for live migration. Thanks, -Gonglei
[Qemu-devel] [PULL 03/24] block-backend: allow blk_prw from coroutine context
From: Paolo Bonzini qcow2_create2 calls this. Do not run a nested event loop, as that breaks when aio_co_wake tries to queue the coroutine on the co_queue_wakeup list of the currently running one. Reviewed-by: Stefan Hajnoczi Signed-off-by: Paolo Bonzini Reviewed-by: Fam Zheng Message-id: 20170213135235.12274-4-pbonz...@redhat.com Signed-off-by: Stefan Hajnoczi --- block/block-backend.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/block/block-backend.c b/block/block-backend.c index efbf398..1177598 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -880,7 +880,6 @@ static int blk_prw(BlockBackend *blk, int64_t offset, uint8_t *buf, { QEMUIOVector qiov; struct iovec iov; -Coroutine *co; BlkRwCo rwco; iov = (struct iovec) { @@ -897,9 +896,14 @@ static int blk_prw(BlockBackend *blk, int64_t offset, uint8_t *buf, .ret= NOT_DONE, }; -co = qemu_coroutine_create(co_entry, &rwco); -qemu_coroutine_enter(co); -BDRV_POLL_WHILE(blk_bs(blk), rwco.ret == NOT_DONE); +if (qemu_in_coroutine()) { +/* Fast-path if already in coroutine context */ +co_entry(&rwco); +} else { +Coroutine *co = qemu_coroutine_create(co_entry, &rwco); +qemu_coroutine_enter(co); +BDRV_POLL_WHILE(blk_bs(blk), rwco.ret == NOT_DONE); +} return rwco.ret; } -- 2.9.3
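The control flow of this fast path can be modeled with plain Python generators (a sketch only; QEMU coroutines and BDRV_POLL_WHILE are richer than this): when the caller is already in coroutine context the body is driven directly, otherwise a fresh coroutine is created and polled until `rwco.ret` leaves the NOT_DONE state.

```python
# Generator-based sketch of blk_prw()'s two paths; not QEMU's API.
NOT_DONE = object()

def co_entry(rwco):
    yield                          # stand-in for "I/O submitted, yield"
    rwco["ret"] = len(rwco["buf"])

def blk_prw(buf, in_coroutine):
    rwco = {"buf": buf, "ret": NOT_DONE}
    co = co_entry(rwco)
    if in_coroutine:
        # fast path: run the body inline, no nested event loop
        for _ in co:
            pass
    else:
        # slow path: mirrors qemu_coroutine_enter + BDRV_POLL_WHILE
        while rwco["ret"] is NOT_DONE:
            next(co, None)
    return rwco["ret"]

print(blk_prw(b"hello", True), blk_prw(b"hello", False))  # 5 5
```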
[Qemu-devel] [PULL 01/24] block: move AioContext, QEMUTimer, main-loop to libqemuutil
From: Paolo Bonzini AioContext is fairly self contained, the only dependency is QEMUTimer but that in turn doesn't need anything else. So move them out of block-obj-y to avoid introducing a dependency from io/ to block-obj-y. main-loop and its dependency iohandler also need to be moved, because later in this series io/ will call iohandler_get_aio_context. [Changed copyright "the QEMU team" to "other QEMU contributors" as suggested by Daniel Berrange and agreed by Paolo. --Stefan] Signed-off-by: Paolo Bonzini Reviewed-by: Fam Zheng Message-id: 20170213135235.12274-2-pbonz...@redhat.com Signed-off-by: Stefan Hajnoczi --- Makefile.objs | 4 --- stubs/Makefile.objs | 1 + tests/Makefile.include | 11 util/Makefile.objs | 6 +++- block/io.c | 29 --- stubs/linux-aio.c | 32 + stubs/set-fd-handler.c | 11 aio-posix.c => util/aio-posix.c | 2 +- aio-win32.c => util/aio-win32.c | 0 util/aiocb.c| 55 + async.c => util/async.c | 3 +- iohandler.c => util/iohandler.c | 0 main-loop.c => util/main-loop.c | 0 qemu-timer.c => util/qemu-timer.c | 0 thread-pool.c => util/thread-pool.c | 2 +- trace-events| 11 util/trace-events | 11 17 files changed, 114 insertions(+), 64 deletions(-) create mode 100644 stubs/linux-aio.c rename aio-posix.c => util/aio-posix.c (99%) rename aio-win32.c => util/aio-win32.c (100%) create mode 100644 util/aiocb.c rename async.c => util/async.c (99%) rename iohandler.c => util/iohandler.c (100%) rename main-loop.c => util/main-loop.c (100%) rename qemu-timer.c => util/qemu-timer.c (100%) rename thread-pool.c => util/thread-pool.c (99%) diff --git a/Makefile.objs b/Makefile.objs index 431fc59..b4b29c2 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -9,12 +9,8 @@ chardev-obj-y = chardev/ ### # block-obj-y is code used by both qemu system emulation and qemu-img -block-obj-y = async.o thread-pool.o block-obj-y += nbd/ block-obj-y += block.o blockjob.o -block-obj-y += main-loop.o iohandler.o qemu-timer.o -block-obj-$(CONFIG_POSIX) += aio-posix.o 
-block-obj-$(CONFIG_WIN32) += aio-win32.o block-obj-y += block/ block-obj-y += qemu-io-cmds.o block-obj-$(CONFIG_REPLICATION) += replication.o diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs index a187295..aa6050f 100644 --- a/stubs/Makefile.objs +++ b/stubs/Makefile.objs @@ -16,6 +16,7 @@ stub-obj-y += get-vm-name.o stub-obj-y += iothread.o stub-obj-y += iothread-lock.o stub-obj-y += is-daemonized.o +stub-obj-$(CONFIG_LINUX_AIO) += linux-aio.o stub-obj-y += machine-init-done.o stub-obj-y += migr-blocker.o stub-obj-y += monitor.o diff --git a/tests/Makefile.include b/tests/Makefile.include index 634394a..fd9c70a 100644 --- a/tests/Makefile.include +++ b/tests/Makefile.include @@ -45,6 +45,9 @@ check-unit-y += tests/test-visitor-serialization$(EXESUF) check-unit-y += tests/test-iov$(EXESUF) gcov-files-test-iov-y = util/iov.c check-unit-y += tests/test-aio$(EXESUF) +gcov-files-test-aio-y = util/async.c util/qemu-timer.o +gcov-files-test-aio-$(CONFIG_WIN32) += util/aio-win32.c +gcov-files-test-aio-$(CONFIG_POSIX) += util/aio-posix.c check-unit-y += tests/test-throttle$(EXESUF) gcov-files-test-aio-$(CONFIG_WIN32) = aio-win32.c gcov-files-test-aio-$(CONFIG_POSIX) = aio-posix.c @@ -517,8 +520,7 @@ tests/check-qjson$(EXESUF): tests/check-qjson.o $(test-util-obj-y) tests/check-qom-interface$(EXESUF): tests/check-qom-interface.o $(test-qom-obj-y) tests/check-qom-proplist$(EXESUF): tests/check-qom-proplist.o $(test-qom-obj-y) -tests/test-char$(EXESUF): tests/test-char.o qemu-timer.o \ - $(test-util-obj-y) $(qtest-obj-y) $(test-block-obj-y) $(chardev-obj-y) +tests/test-char$(EXESUF): tests/test-char.o $(test-util-obj-y) $(qtest-obj-y) $(test-io-obj-y) $(chardev-obj-y) tests/test-coroutine$(EXESUF): tests/test-coroutine.o $(test-block-obj-y) tests/test-aio$(EXESUF): tests/test-aio.o $(test-block-obj-y) tests/test-throttle$(EXESUF): tests/test-throttle.o $(test-block-obj-y) @@ -551,8 +553,7 @@ tests/test-vmstate$(EXESUF): tests/test-vmstate.o \ migration/vmstate.o 
migration/qemu-file.o \ migration/qemu-file-channel.o migration/qjson.o \ $(test-io-obj-y) -tests/test-timed-average$(EXESUF): tests/test-timed-average.o qemu-timer.o \ - $(test-util-obj-y) +tests/test-timed-average$(EXESUF): tests/test-timed-average.o $(test-util-obj-y) tests/test-base64$(EXESUF): tests/test-base64.o \ libqemuutil.a libqemustub.a tests/ptimer-test$(EXESUF): tests/ptimer-test.o tests/ptimer-test-stubs.o hw/core/ptimer.o libqemustub.a @@ -712,7 +713,7 @@ tests/usb-hcd-ehci-test$(EXESUF)
[Qemu-devel] [PULL 00/24] Block patches
The following changes since commit 5dae13cd71f0755a1395b5a4cde635b8a6ee3f58: Merge remote-tracking branch 'remotes/rth/tags/pull-or-20170214' into staging (2017-02-14 09:55:48 +) are available in the git repository at: git://github.com/stefanha/qemu.git tags/block-pull-request for you to fetch changes up to decc18f33adecb1316437a47fff0cf0a7665906a: coroutine-lock: make CoRwlock thread-safe and fair (2017-02-16 17:17:34 +) Paolo Bonzini (24): block: move AioContext, QEMUTimer, main-loop to libqemuutil aio: introduce aio_co_schedule and aio_co_wake block-backend: allow blk_prw from coroutine context test-thread-pool: use generic AioContext infrastructure io: add methods to set I/O handlers on AioContext io: make qio_channel_yield aware of AioContexts nbd: convert to use qio_channel_yield coroutine-lock: reschedule coroutine on the AioContext it was running on blkdebug: reschedule coroutine on the AioContext it is running on qed: introduce qed_aio_start_io and qed_aio_next_io_cb aio: push aio_context_acquire/release down to dispatching block: explicitly acquire aiocontext in timers that need it block: explicitly acquire aiocontext in callbacks that need it block: explicitly acquire aiocontext in bottom halves that need it block: explicitly acquire aiocontext in aio callbacks that need it aio-posix: partially inline aio_dispatch into aio_poll async: remove unnecessary inc/dec pairs block: document fields protected by AioContext lock coroutine-lock: make CoMutex thread-safe coroutine-lock: add limited spinning to CoMutex test-aio-multithread: add performance comparison with thread-based mutexes coroutine-lock: place CoMutex before CoQueue in header coroutine-lock: add mutex argument to CoQueue APIs coroutine-lock: make CoRwlock thread-safe and fair Makefile.objs | 4 - stubs/Makefile.objs | 1 + tests/Makefile.include | 19 +- util/Makefile.objs | 6 +- block/nbd-client.h | 2 +- block/qed.h | 3 + include/block/aio.h | 38 ++- include/block/block_int.h | 64 +++-- 
include/io/channel.h| 72 +- include/qemu/coroutine.h| 84 --- include/qemu/coroutine_int.h| 11 +- include/sysemu/block-backend.h | 14 +- tests/iothread.h| 25 ++ block/backup.c | 2 +- block/blkdebug.c| 9 +- block/blkreplay.c | 2 +- block/block-backend.c | 13 +- block/curl.c| 44 +++- block/gluster.c | 9 +- block/io.c | 42 +--- block/iscsi.c | 15 +- block/linux-aio.c | 10 +- block/mirror.c | 12 +- block/nbd-client.c | 119 + block/nfs.c | 9 +- block/qcow2-cluster.c | 4 +- block/qed-cluster.c | 2 + block/qed-table.c | 12 +- block/qed.c | 58 +++-- block/sheepdog.c| 31 +-- block/ssh.c | 29 +-- block/throttle-groups.c | 4 +- block/win32-aio.c | 9 +- dma-helpers.c | 2 + hw/9pfs/9p.c| 2 +- hw/block/virtio-blk.c | 19 +- hw/scsi/scsi-bus.c | 2 + hw/scsi/scsi-disk.c | 15 ++ hw/scsi/scsi-generic.c | 20 +- hw/scsi/virtio-scsi.c | 6 + io/channel-command.c| 13 + io/channel-file.c | 11 + io/channel-socket.c | 16 +- io/channel-tls.c| 12 + io/channel-watch.c | 6 + io/channel.c| 97 ++-- nbd/client.c| 2 +- nbd/common.c| 9 +- nbd/server.c| 94 +++- stubs/linux-aio.c | 32 +++ stubs/set-fd-handler.c | 11 - tests/iothread.c| 91 +++ tests/test-aio-multithread.c| 463 tests/test-thread-pool.c| 12 +- aio-posix.c => util/aio-posix.c | 62 ++--- aio-win32.c => util/aio-win32.c | 30 +-- util/aiocb.c| 55 + async.c => util/async.c | 84 ++- iohandler.c => util/iohandler.c | 0 main-loop.c => util/main-loop.c | 0 util/qemu-coroutine-lock.c | 254 ++-- util/qemu-coroutine-sleep.c | 2 +- util/qemu-coroutine.c | 8 + qemu-timer.c => util/qem
[Qemu-devel] [PULL 07/24] nbd: convert to use qio_channel_yield
From: Paolo Bonzini In the client, read the reply headers from a coroutine, switching the read side between the "read header" coroutine and the I/O coroutine that reads the body of the reply. In the server, if the server can read more requests it will create a new "read request" coroutine as soon as a request has been read. Otherwise, the new coroutine is created in nbd_request_put. Reviewed-by: Stefan Hajnoczi Signed-off-by: Paolo Bonzini Reviewed-by: Fam Zheng Reviewed-by: Daniel P. Berrange Message-id: 20170213135235.12274-8-pbonz...@redhat.com Signed-off-by: Stefan Hajnoczi --- block/nbd-client.h | 2 +- block/nbd-client.c | 117 - nbd/client.c | 2 +- nbd/common.c | 9 + nbd/server.c | 94 +- 5 files changed, 83 insertions(+), 141 deletions(-) diff --git a/block/nbd-client.h b/block/nbd-client.h index f8d6006..8cdfc92 100644 --- a/block/nbd-client.h +++ b/block/nbd-client.h @@ -25,7 +25,7 @@ typedef struct NBDClientSession { CoMutex send_mutex; CoQueue free_sema; -Coroutine *send_coroutine; +Coroutine *read_reply_co; int in_flight; Coroutine *recv_coroutine[MAX_NBD_REQUESTS]; diff --git a/block/nbd-client.c b/block/nbd-client.c index 06f1532..10fcc9e 100644 --- a/block/nbd-client.c +++ b/block/nbd-client.c @@ -33,8 +33,9 @@ #define HANDLE_TO_INDEX(bs, handle) ((handle) ^ ((uint64_t)(intptr_t)bs)) #define INDEX_TO_HANDLE(bs, index) ((index) ^ ((uint64_t)(intptr_t)bs)) -static void nbd_recv_coroutines_enter_all(NBDClientSession *s) +static void nbd_recv_coroutines_enter_all(BlockDriverState *bs) { +NBDClientSession *s = nbd_get_client_session(bs); int i; for (i = 0; i < MAX_NBD_REQUESTS; i++) { @@ -42,6 +43,7 @@ static void nbd_recv_coroutines_enter_all(NBDClientSession *s) qemu_coroutine_enter(s->recv_coroutine[i]); } } +BDRV_POLL_WHILE(bs, s->read_reply_co); } static void nbd_teardown_connection(BlockDriverState *bs) @@ -56,7 +58,7 @@ static void nbd_teardown_connection(BlockDriverState *bs) qio_channel_shutdown(client->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL); 
-nbd_recv_coroutines_enter_all(client); +nbd_recv_coroutines_enter_all(bs); nbd_client_detach_aio_context(bs); object_unref(OBJECT(client->sioc)); @@ -65,54 +67,43 @@ static void nbd_teardown_connection(BlockDriverState *bs) client->ioc = NULL; } -static void nbd_reply_ready(void *opaque) +static coroutine_fn void nbd_read_reply_entry(void *opaque) { -BlockDriverState *bs = opaque; -NBDClientSession *s = nbd_get_client_session(bs); +NBDClientSession *s = opaque; uint64_t i; int ret; -if (!s->ioc) { /* Already closed */ -return; -} - -if (s->reply.handle == 0) { -/* No reply already in flight. Fetch a header. It is possible - * that another thread has done the same thing in parallel, so - * the socket is not readable anymore. - */ +for (;;) { +assert(s->reply.handle == 0); ret = nbd_receive_reply(s->ioc, &s->reply); -if (ret == -EAGAIN) { -return; -} if (ret < 0) { -s->reply.handle = 0; -goto fail; +break; } -} -/* There's no need for a mutex on the receive side, because the - * handler acts as a synchronization point and ensures that only - * one coroutine is called until the reply finishes. */ -i = HANDLE_TO_INDEX(s, s->reply.handle); -if (i >= MAX_NBD_REQUESTS) { -goto fail; -} +/* There's no need for a mutex on the receive side, because the + * handler acts as a synchronization point and ensures that only + * one coroutine is called until the reply finishes. + */ +i = HANDLE_TO_INDEX(s, s->reply.handle); +if (i >= MAX_NBD_REQUESTS || !s->recv_coroutine[i]) { +break; +} -if (s->recv_coroutine[i]) { -qemu_coroutine_enter(s->recv_coroutine[i]); -return; +/* We're woken up by the recv_coroutine itself. Note that there + * is no race between yielding and reentering read_reply_co. This + * is because: + * + * - if recv_coroutine[i] runs on the same AioContext, it is only + * entered after we yield + * + * - if recv_coroutine[i] runs on a different AioContext, reentering + * read_reply_co happens through a bottom half, which can only + * run after we yield. 
+ */ +aio_co_wake(s->recv_coroutine[i]); +qemu_coroutine_yield(); } - -fail: -nbd_teardown_connection(bs); -} - -static void nbd_restart_write(void *opaque) -{ -BlockDriverState *bs = opaque; - -qemu_coroutine_enter(nbd_get_client_session(bs)->send_coroutine); +
[Qemu-devel] [PULL 04/24] test-thread-pool: use generic AioContext infrastructure
From: Paolo Bonzini Once the thread pool starts using aio_co_wake, it will also need qemu_get_current_aio_context(). Make test-thread-pool create an AioContext with qemu_init_main_loop, so that stubs/iothread.c and tests/iothread.c can provide the rest. Reviewed-by: Stefan Hajnoczi Signed-off-by: Paolo Bonzini Reviewed-by: Fam Zheng Message-id: 20170213135235.12274-5-pbonz...@redhat.com Signed-off-by: Stefan Hajnoczi --- tests/test-thread-pool.c | 12 +++- 1 file changed, 3 insertions(+), 9 deletions(-) diff --git a/tests/test-thread-pool.c b/tests/test-thread-pool.c index 8dbf66a..91b4ec5 100644 --- a/tests/test-thread-pool.c +++ b/tests/test-thread-pool.c @@ -6,6 +6,7 @@ #include "qapi/error.h" #include "qemu/timer.h" #include "qemu/error-report.h" +#include "qemu/main-loop.h" static AioContext *ctx; static ThreadPool *pool; @@ -224,15 +225,9 @@ static void test_cancel_async(void) int main(int argc, char **argv) { int ret; -Error *local_error = NULL; -init_clocks(); - -ctx = aio_context_new(&local_error); -if (!ctx) { -error_reportf_err(local_error, "Failed to create AIO Context: "); -exit(1); -} +qemu_init_main_loop(&error_abort); +ctx = qemu_get_current_aio_context(); pool = aio_get_thread_pool(ctx); g_test_init(&argc, &argv, NULL); @@ -245,6 +240,5 @@ int main(int argc, char **argv) ret = g_test_run(); -aio_context_unref(ctx); return ret; } -- 2.9.3
[Qemu-devel] [PULL 02/24] aio: introduce aio_co_schedule and aio_co_wake
From: Paolo Bonzini

aio_co_wake provides the infrastructure to start a coroutine on a "home" AioContext. It will be used by CoMutex and CoQueue, so that coroutines don't jump from one context to another when they go to sleep on a mutex or waitqueue. However, it can also be used as a more efficient alternative to one-shot bottom halves, and saves the effort of tracking which AioContext a coroutine is running on.

aio_co_schedule is the part of aio_co_wake that starts a coroutine on a remote AioContext, but it is also useful to implement e.g. bdrv_set_aio_context callbacks.

The implementation of aio_co_schedule is based on a lock-free multiple-producer, single-consumer queue. The multiple producers use cmpxchg to add to a LIFO stack. The consumer (a per-AioContext bottom half) grabs all items added so far, inverts the list to make it FIFO, and goes through it one item at a time until it's empty.

The data structure was inspired by OSv, which uses it in the very code we'll "port" to QEMU for the thread-safe CoMutex. Most of the new code is really tests.
Signed-off-by: Paolo Bonzini Reviewed-by: Fam Zheng Message-id: 20170213135235.12274-3-pbonz...@redhat.com Signed-off-by: Stefan Hajnoczi --- tests/Makefile.include | 8 +- include/block/aio.h | 32 +++ include/qemu/coroutine_int.h | 11 ++- tests/iothread.h | 25 + tests/iothread.c | 91 ++ tests/test-aio-multithread.c | 213 +++ util/async.c | 65 + util/qemu-coroutine.c| 8 ++ util/trace-events| 4 + 9 files changed, 453 insertions(+), 4 deletions(-) create mode 100644 tests/iothread.h create mode 100644 tests/iothread.c create mode 100644 tests/test-aio-multithread.c diff --git a/tests/Makefile.include b/tests/Makefile.include index fd9c70a..e60bb6c 100644 --- a/tests/Makefile.include +++ b/tests/Makefile.include @@ -48,9 +48,10 @@ check-unit-y += tests/test-aio$(EXESUF) gcov-files-test-aio-y = util/async.c util/qemu-timer.o gcov-files-test-aio-$(CONFIG_WIN32) += util/aio-win32.c gcov-files-test-aio-$(CONFIG_POSIX) += util/aio-posix.c +check-unit-y += tests/test-aio-multithread$(EXESUF) +gcov-files-test-aio-multithread-y = $(gcov-files-test-aio-y) +gcov-files-test-aio-multithread-y += util/qemu-coroutine.c tests/iothread.c check-unit-y += tests/test-throttle$(EXESUF) -gcov-files-test-aio-$(CONFIG_WIN32) = aio-win32.c -gcov-files-test-aio-$(CONFIG_POSIX) = aio-posix.c check-unit-y += tests/test-thread-pool$(EXESUF) gcov-files-test-thread-pool-y = thread-pool.c gcov-files-test-hbitmap-y = util/hbitmap.c @@ -508,7 +509,7 @@ test-qapi-obj-y = tests/test-qapi-visit.o tests/test-qapi-types.o \ $(test-qom-obj-y) test-crypto-obj-y = $(crypto-obj-y) $(test-qom-obj-y) test-io-obj-y = $(io-obj-y) $(test-crypto-obj-y) -test-block-obj-y = $(block-obj-y) $(test-io-obj-y) +test-block-obj-y = $(block-obj-y) $(test-io-obj-y) tests/iothread.o tests/check-qint$(EXESUF): tests/check-qint.o $(test-util-obj-y) tests/check-qstring$(EXESUF): tests/check-qstring.o $(test-util-obj-y) @@ -523,6 +524,7 @@ tests/check-qom-proplist$(EXESUF): tests/check-qom-proplist.o $(test-qom-obj-y) 
tests/test-char$(EXESUF): tests/test-char.o $(test-util-obj-y) $(qtest-obj-y) $(test-io-obj-y) $(chardev-obj-y) tests/test-coroutine$(EXESUF): tests/test-coroutine.o $(test-block-obj-y) tests/test-aio$(EXESUF): tests/test-aio.o $(test-block-obj-y) +tests/test-aio-multithread$(EXESUF): tests/test-aio-multithread.o $(test-block-obj-y) tests/test-throttle$(EXESUF): tests/test-throttle.o $(test-block-obj-y) tests/test-blockjob$(EXESUF): tests/test-blockjob.o $(test-block-obj-y) $(test-util-obj-y) tests/test-blockjob-txn$(EXESUF): tests/test-blockjob-txn.o $(test-block-obj-y) $(test-util-obj-y) diff --git a/include/block/aio.h b/include/block/aio.h index 7df271d..614cbc6 100644 --- a/include/block/aio.h +++ b/include/block/aio.h @@ -47,6 +47,7 @@ typedef void QEMUBHFunc(void *opaque); typedef bool AioPollFn(void *opaque); typedef void IOHandler(void *opaque); +struct Coroutine; struct ThreadPool; struct LinuxAioState; @@ -108,6 +109,9 @@ struct AioContext { bool notified; EventNotifier notifier; +QSLIST_HEAD(, Coroutine) scheduled_coroutines; +QEMUBH *co_schedule_bh; + /* Thread pool for performing work and receiving completion callbacks. * Has its own locking. */ @@ -483,6 +487,34 @@ static inline bool aio_node_check(AioContext *ctx, bool is_external) } /** + * aio_co_schedule: + * @ctx: the aio context + * @co: the coroutine + * + * Start a coroutine on a remote AioContext. + * + * The coroutine must not be entered by anyone else while aio_co_schedule() + * is active. In addition the coroutine must have yielded unless ctx + * is the context in which the coroutine is running (i.e. the value of + * qemu_get_current_aio_context() from the coroutine its
[Qemu-devel] [PULL 11/24] aio: push aio_context_acquire/release down to dispatching
From: Paolo Bonzini The AioContext data structures are now protected by list_lock and/or they are walked with FOREACH_RCU primitives. There is no need anymore to acquire the AioContext for the entire duration of aio_dispatch. Instead, just acquire it before and after invoking the callbacks. The next step is then to push it further down. Reviewed-by: Stefan Hajnoczi Signed-off-by: Paolo Bonzini Reviewed-by: Fam Zheng Reviewed-by: Daniel P. Berrange Message-id: 20170213135235.12274-12-pbonz...@redhat.com Signed-off-by: Stefan Hajnoczi --- util/aio-posix.c | 25 +++-- util/aio-win32.c | 15 +++ util/async.c | 2 ++ 3 files changed, 20 insertions(+), 22 deletions(-) diff --git a/util/aio-posix.c b/util/aio-posix.c index a8d7090..b590c5a 100644 --- a/util/aio-posix.c +++ b/util/aio-posix.c @@ -402,7 +402,9 @@ static bool aio_dispatch_handlers(AioContext *ctx) (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR)) && aio_node_check(ctx, node->is_external) && node->io_read) { +aio_context_acquire(ctx); node->io_read(node->opaque); +aio_context_release(ctx); /* aio_notify() does not count as progress */ if (node->opaque != &ctx->notifier) { @@ -413,7 +415,9 @@ static bool aio_dispatch_handlers(AioContext *ctx) (revents & (G_IO_OUT | G_IO_ERR)) && aio_node_check(ctx, node->is_external) && node->io_write) { +aio_context_acquire(ctx); node->io_write(node->opaque); +aio_context_release(ctx); progress = true; } @@ -450,7 +454,9 @@ bool aio_dispatch(AioContext *ctx, bool dispatch_fds) } /* Run our timers */ +aio_context_acquire(ctx); progress |= timerlistgroup_run_timers(&ctx->tlg); +aio_context_release(ctx); return progress; } @@ -597,9 +603,6 @@ bool aio_poll(AioContext *ctx, bool blocking) int64_t timeout; int64_t start = 0; -aio_context_acquire(ctx); -progress = false; - /* aio_notify can avoid the expensive event_notifier_set if * everything (file descriptors, bottom halves, timers) will * be re-evaluated before the next blocking poll(). 
This is @@ -617,9 +620,11 @@ bool aio_poll(AioContext *ctx, bool blocking) start = qemu_clock_get_ns(QEMU_CLOCK_REALTIME); } -if (try_poll_mode(ctx, blocking)) { -progress = true; -} else { +aio_context_acquire(ctx); +progress = try_poll_mode(ctx, blocking); +aio_context_release(ctx); + +if (!progress) { assert(npfd == 0); /* fill pollfds */ @@ -636,9 +641,6 @@ bool aio_poll(AioContext *ctx, bool blocking) timeout = blocking ? aio_compute_timeout(ctx) : 0; /* wait until next event */ -if (timeout) { -aio_context_release(ctx); -} if (aio_epoll_check_poll(ctx, pollfds, npfd, timeout)) { AioHandler epoll_handler; @@ -650,9 +652,6 @@ bool aio_poll(AioContext *ctx, bool blocking) } else { ret = qemu_poll_ns(pollfds, npfd, timeout); } -if (timeout) { -aio_context_acquire(ctx); -} } if (blocking) { @@ -717,8 +716,6 @@ bool aio_poll(AioContext *ctx, bool blocking) progress = true; } -aio_context_release(ctx); - return progress; } diff --git a/util/aio-win32.c b/util/aio-win32.c index 900524c..ab6d0e5 100644 --- a/util/aio-win32.c +++ b/util/aio-win32.c @@ -266,7 +266,9 @@ static bool aio_dispatch_handlers(AioContext *ctx, HANDLE event) (revents || event_notifier_get_handle(node->e) == event) && node->io_notify) { node->pfd.revents = 0; +aio_context_acquire(ctx); node->io_notify(node->e); +aio_context_release(ctx); /* aio_notify() does not count as progress */ if (node->e != &ctx->notifier) { @@ -278,11 +280,15 @@ static bool aio_dispatch_handlers(AioContext *ctx, HANDLE event) (node->io_read || node->io_write)) { node->pfd.revents = 0; if ((revents & G_IO_IN) && node->io_read) { +aio_context_acquire(ctx); node->io_read(node->opaque); +aio_context_release(ctx); progress = true; } if ((revents & G_IO_OUT) && node->io_write) { +aio_context_acquire(ctx); node->io_write(node->opaque); +aio_context_release(ctx); progress = true; } @@ -329,7 +335,6 @@ bool aio_poll(AioContext *ctx, bool blocking) int count; int timeout; -aio_context_acquire(ctx); progress = false; /* aio_notify 
can avoid the expensive event_notifier_set if @@ -371,17 +376,11 @@ bool aio_poll(AioContext *ctx, bool blocking) timeout = block
[Qemu-devel] [PULL 14/24] block: explicitly acquire aiocontext in bottom halves that need it
From: Paolo Bonzini Reviewed-by: Stefan Hajnoczi Signed-off-by: Paolo Bonzini Reviewed-by: Fam Zheng Reviewed-by: Daniel P. Berrange Message-id: 20170213135235.12274-15-pbonz...@redhat.com Signed-off-by: Stefan Hajnoczi --- block/archipelago.c | 3 +++ block/blkreplay.c | 2 +- block/block-backend.c | 6 ++ block/curl.c | 26 ++ block/gluster.c | 9 + block/io.c| 6 +- block/iscsi.c | 6 +- block/linux-aio.c | 15 +-- block/nfs.c | 3 ++- block/null.c | 4 block/qed.c | 3 +++ block/rbd.c | 4 dma-helpers.c | 2 ++ hw/block/virtio-blk.c | 2 ++ hw/scsi/scsi-bus.c| 2 ++ util/async.c | 4 ++-- util/thread-pool.c| 2 ++ 17 files changed, 71 insertions(+), 28 deletions(-) diff --git a/block/archipelago.c b/block/archipelago.c index 2449cfc..a624390 100644 --- a/block/archipelago.c +++ b/block/archipelago.c @@ -310,8 +310,11 @@ static void qemu_archipelago_complete_aio(void *opaque) { AIORequestData *reqdata = (AIORequestData *) opaque; ArchipelagoAIOCB *aio_cb = (ArchipelagoAIOCB *) reqdata->aio_cb; +AioContext *ctx = bdrv_get_aio_context(aio_cb->common.bs); +aio_context_acquire(ctx); aio_cb->common.cb(aio_cb->common.opaque, aio_cb->ret); +aio_context_release(ctx); aio_cb->status = 0; qemu_aio_unref(aio_cb); diff --git a/block/blkreplay.c b/block/blkreplay.c index a741654..cfc8c5b 100755 --- a/block/blkreplay.c +++ b/block/blkreplay.c @@ -60,7 +60,7 @@ static int64_t blkreplay_getlength(BlockDriverState *bs) static void blkreplay_bh_cb(void *opaque) { Request *req = opaque; -qemu_coroutine_enter(req->co); +aio_co_wake(req->co); qemu_bh_delete(req->bh); g_free(req); } diff --git a/block/block-backend.c b/block/block-backend.c index 1177598..bfc0e6b 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -939,9 +939,12 @@ int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags) static void error_callback_bh(void *opaque) { struct BlockBackendAIOCB *acb = opaque; +AioContext *ctx = bdrv_get_aio_context(acb->common.bs); bdrv_dec_in_flight(acb->common.bs); 
+aio_context_acquire(ctx); acb->common.cb(acb->common.opaque, acb->ret); +aio_context_release(ctx); qemu_aio_unref(acb); } @@ -983,9 +986,12 @@ static void blk_aio_complete(BlkAioEmAIOCB *acb) static void blk_aio_complete_bh(void *opaque) { BlkAioEmAIOCB *acb = opaque; +AioContext *ctx = bdrv_get_aio_context(acb->common.bs); assert(acb->has_returned); +aio_context_acquire(ctx); blk_aio_complete(acb); +aio_context_release(ctx); } static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes, diff --git a/block/curl.c b/block/curl.c index 05b9ca3..f3f063b 100644 --- a/block/curl.c +++ b/block/curl.c @@ -796,13 +796,18 @@ static void curl_readv_bh_cb(void *p) { CURLState *state; int running; +int ret = -EINPROGRESS; CURLAIOCB *acb = p; -BDRVCURLState *s = acb->common.bs->opaque; +BlockDriverState *bs = acb->common.bs; +BDRVCURLState *s = bs->opaque; +AioContext *ctx = bdrv_get_aio_context(bs); size_t start = acb->sector_num * BDRV_SECTOR_SIZE; size_t end; +aio_context_acquire(ctx); + // In case we have the requested data already (e.g. read-ahead), // we can just call the callback and be done. 
switch (curl_find_buf(s, start, acb->nb_sectors * BDRV_SECTOR_SIZE, acb)) { @@ -810,7 +815,7 @@ static void curl_readv_bh_cb(void *p) qemu_aio_unref(acb); // fall through case FIND_RET_WAIT: -return; +goto out; default: break; } @@ -818,9 +823,8 @@ static void curl_readv_bh_cb(void *p) // No cache found, so let's start a new request state = curl_init_state(acb->common.bs, s); if (!state) { -acb->common.cb(acb->common.opaque, -EIO); -qemu_aio_unref(acb); -return; +ret = -EIO; +goto out; } acb->start = 0; @@ -834,9 +838,8 @@ static void curl_readv_bh_cb(void *p) state->orig_buf = g_try_malloc(state->buf_len); if (state->buf_len && state->orig_buf == NULL) { curl_clean_state(state); -acb->common.cb(acb->common.opaque, -ENOMEM); -qemu_aio_unref(acb); -return; +ret = -ENOMEM; +goto out; } state->acb[0] = acb; @@ -849,6 +852,13 @@ static void curl_readv_bh_cb(void *p) /* Tell curl it needs to kick things off */ curl_multi_socket_action(s->multi, CURL_SOCKET_TIMEOUT, 0, &running); + +out: +if (ret != -EINPROGRESS) { +acb->common.cb(acb->common.opaque, ret); +qemu_aio_unref(acb); +} +aio_context_release(ctx); } static BlockAIOCB *curl_aio_readv(BlockDriverState *bs, diff --git a/block/glu
[Qemu-devel] [PULL 05/24] io: add methods to set I/O handlers on AioContext
From: Paolo Bonzini This is in preparation for making qio_channel_yield work on AioContexts other than the main one. Reviewed-by: Daniel P. Berrange Reviewed-by: Stefan Hajnoczi Signed-off-by: Paolo Bonzini Reviewed-by: Fam Zheng Message-id: 20170213135235.12274-6-pbonz...@redhat.com Signed-off-by: Stefan Hajnoczi --- include/io/channel.h | 25 + io/channel-command.c | 13 + io/channel-file.c| 11 +++ io/channel-socket.c | 16 +++- io/channel-tls.c | 12 io/channel-watch.c | 6 ++ io/channel.c | 11 +++ 7 files changed, 89 insertions(+), 5 deletions(-) diff --git a/include/io/channel.h b/include/io/channel.h index 32a9470..0bc7c3f 100644 --- a/include/io/channel.h +++ b/include/io/channel.h @@ -23,6 +23,7 @@ #include "qemu-common.h" #include "qom/object.h" +#include "block/aio.h" #define TYPE_QIO_CHANNEL "qio-channel" #define QIO_CHANNEL(obj)\ @@ -132,6 +133,11 @@ struct QIOChannelClass { off_t offset, int whence, Error **errp); +void (*io_set_aio_fd_handler)(QIOChannel *ioc, + AioContext *ctx, + IOHandler *io_read, + IOHandler *io_write, + void *opaque); }; /* General I/O handling functions */ @@ -525,4 +531,23 @@ void qio_channel_yield(QIOChannel *ioc, void qio_channel_wait(QIOChannel *ioc, GIOCondition condition); +/** + * qio_channel_set_aio_fd_handler: + * @ioc: the channel object + * @ctx: the AioContext to set the handlers on + * @io_read: the read handler + * @io_write: the write handler + * @opaque: the opaque value passed to the handler + * + * This is used internally by qio_channel_yield(). It can + * be used by channel implementations to forward the handlers + * to another channel (e.g. from #QIOChannelTLS to the + * underlying socket). 
+ */ +void qio_channel_set_aio_fd_handler(QIOChannel *ioc, +AioContext *ctx, +IOHandler *io_read, +IOHandler *io_write, +void *opaque); + #endif /* QIO_CHANNEL_H */ diff --git a/io/channel-command.c b/io/channel-command.c index ad25313..319c5ed 100644 --- a/io/channel-command.c +++ b/io/channel-command.c @@ -328,6 +328,18 @@ static int qio_channel_command_close(QIOChannel *ioc, } +static void qio_channel_command_set_aio_fd_handler(QIOChannel *ioc, + AioContext *ctx, + IOHandler *io_read, + IOHandler *io_write, + void *opaque) +{ +QIOChannelCommand *cioc = QIO_CHANNEL_COMMAND(ioc); +aio_set_fd_handler(ctx, cioc->readfd, false, io_read, NULL, NULL, opaque); +aio_set_fd_handler(ctx, cioc->writefd, false, NULL, io_write, NULL, opaque); +} + + static GSource *qio_channel_command_create_watch(QIOChannel *ioc, GIOCondition condition) { @@ -349,6 +361,7 @@ static void qio_channel_command_class_init(ObjectClass *klass, ioc_klass->io_set_blocking = qio_channel_command_set_blocking; ioc_klass->io_close = qio_channel_command_close; ioc_klass->io_create_watch = qio_channel_command_create_watch; +ioc_klass->io_set_aio_fd_handler = qio_channel_command_set_aio_fd_handler; } static const TypeInfo qio_channel_command_info = { diff --git a/io/channel-file.c b/io/channel-file.c index e1da243..b383273 100644 --- a/io/channel-file.c +++ b/io/channel-file.c @@ -186,6 +186,16 @@ static int qio_channel_file_close(QIOChannel *ioc, } +static void qio_channel_file_set_aio_fd_handler(QIOChannel *ioc, +AioContext *ctx, +IOHandler *io_read, +IOHandler *io_write, +void *opaque) +{ +QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc); +aio_set_fd_handler(ctx, fioc->fd, false, io_read, io_write, NULL, opaque); +} + static GSource *qio_channel_file_create_watch(QIOChannel *ioc, GIOCondition condition) { @@ -206,6 +216,7 @@ static void qio_channel_file_class_init(ObjectClass *klass, ioc_klass->io_seek = qio_channel_file_seek; ioc_klass->io_close = qio_channel_file_close; ioc_klass->io_create_watch = 
qio_channel_file_create_watch; +ioc_klass->io_set_aio_fd_handler = qio_channel_file_set_aio_fd_handler; } static const TypeInfo qio_channel_file_info = { diff --git a/io/channel-socket.c b/io/channel-socket.c index f385233..f5
[Qemu-devel] [PULL 06/24] io: make qio_channel_yield aware of AioContexts
From: Paolo Bonzini Support separate coroutines for reading and writing, and place the read/write handlers on the AioContext that the QIOChannel is registered with. Reviewed-by: Daniel P. Berrange Reviewed-by: Stefan Hajnoczi Signed-off-by: Paolo Bonzini Reviewed-by: Fam Zheng Message-id: 20170213135235.12274-7-pbonz...@redhat.com Signed-off-by: Stefan Hajnoczi --- include/io/channel.h | 47 ++-- io/channel.c | 86 +++- 2 files changed, 109 insertions(+), 24 deletions(-) diff --git a/include/io/channel.h b/include/io/channel.h index 0bc7c3f..5d48906 100644 --- a/include/io/channel.h +++ b/include/io/channel.h @@ -23,6 +23,7 @@ #include "qemu-common.h" #include "qom/object.h" +#include "qemu/coroutine.h" #include "block/aio.h" #define TYPE_QIO_CHANNEL "qio-channel" @@ -81,6 +82,9 @@ struct QIOChannel { Object parent; unsigned int features; /* bitmask of QIOChannelFeatures */ char *name; +AioContext *ctx; +Coroutine *read_coroutine; +Coroutine *write_coroutine; #ifdef _WIN32 HANDLE event; /* For use with GSource on Win32 */ #endif @@ -503,13 +507,50 @@ guint qio_channel_add_watch(QIOChannel *ioc, /** + * qio_channel_attach_aio_context: + * @ioc: the channel object + * @ctx: the #AioContext to set the handlers on + * + * Request that qio_channel_yield() sets I/O handlers on + * the given #AioContext. If @ctx is %NULL, qio_channel_yield() + * uses QEMU's main thread event loop. + * + * You can move a #QIOChannel from one #AioContext to another even if + * I/O handlers are set for a coroutine. However, #QIOChannel provides + * no synchronization between the calls to qio_channel_yield() and + * qio_channel_attach_aio_context(). + * + * Therefore you should first call qio_channel_detach_aio_context() + * to ensure that the coroutine is not entered concurrently. Then, + * while the coroutine has yielded, call qio_channel_attach_aio_context(), + * and then aio_co_schedule() to place the coroutine on the new + * #AioContext. 
The calls to qio_channel_detach_aio_context() + * and qio_channel_attach_aio_context() should be protected with + * aio_context_acquire() and aio_context_release(). + */ +void qio_channel_attach_aio_context(QIOChannel *ioc, +AioContext *ctx); + +/** + * qio_channel_detach_aio_context: + * @ioc: the channel object + * + * Disable any I/O handlers set by qio_channel_yield(). With the + * help of aio_co_schedule(), this allows moving a coroutine that was + * paused by qio_channel_yield() to another context. + */ +void qio_channel_detach_aio_context(QIOChannel *ioc); + +/** * qio_channel_yield: * @ioc: the channel object * @condition: the I/O condition to wait for * - * Yields execution from the current coroutine until - * the condition indicated by @condition becomes - * available. + * Yields execution from the current coroutine until the condition + * indicated by @condition becomes available. @condition must + * be either %G_IO_IN or %G_IO_OUT; it cannot contain both. In + * addition, no two coroutine can be waiting on the same condition + * and channel at the same time. 
* * This must only be called from coroutine context */ diff --git a/io/channel.c b/io/channel.c index ce470d7..cdf7454 100644 --- a/io/channel.c +++ b/io/channel.c @@ -21,7 +21,7 @@ #include "qemu/osdep.h" #include "io/channel.h" #include "qapi/error.h" -#include "qemu/coroutine.h" +#include "qemu/main-loop.h" bool qio_channel_has_feature(QIOChannel *ioc, QIOChannelFeature feature) @@ -238,36 +238,80 @@ off_t qio_channel_io_seek(QIOChannel *ioc, } -typedef struct QIOChannelYieldData QIOChannelYieldData; -struct QIOChannelYieldData { -QIOChannel *ioc; -Coroutine *co; -}; +static void qio_channel_set_aio_fd_handlers(QIOChannel *ioc); +static void qio_channel_restart_read(void *opaque) +{ +QIOChannel *ioc = opaque; +Coroutine *co = ioc->read_coroutine; + +ioc->read_coroutine = NULL; +qio_channel_set_aio_fd_handlers(ioc); +aio_co_wake(co); +} -static gboolean qio_channel_yield_enter(QIOChannel *ioc, -GIOCondition condition, -gpointer opaque) +static void qio_channel_restart_write(void *opaque) { -QIOChannelYieldData *data = opaque; -qemu_coroutine_enter(data->co); -return FALSE; +QIOChannel *ioc = opaque; +Coroutine *co = ioc->write_coroutine; + +ioc->write_coroutine = NULL; +qio_channel_set_aio_fd_handlers(ioc); +aio_co_wake(co); } +static void qio_channel_set_aio_fd_handlers(QIOChannel *ioc) +{ +IOHandler *rd_handler = NULL, *wr_handler = NULL; +AioContext *ctx; + +if (ioc->read_coroutine) { +rd_handler = qio_channel_restart_read; +} +if (ioc->write_coroutine) { +wr_handler = qio_channel_restart_write; +} + +ctx
[Qemu-devel] [PULL 12/24] block: explicitly acquire aiocontext in timers that need it
From: Paolo Bonzini Reviewed-by: Stefan Hajnoczi Signed-off-by: Paolo Bonzini Reviewed-by: Fam Zheng Reviewed-by: Daniel P. Berrange Message-id: 20170213135235.12274-13-pbonz...@redhat.com Signed-off-by: Stefan Hajnoczi --- block/qed.h | 3 +++ block/curl.c| 2 ++ block/io.c | 5 + block/iscsi.c | 8 ++-- block/null.c| 4 block/qed.c | 12 block/throttle-groups.c | 2 ++ util/aio-posix.c| 2 -- util/aio-win32.c| 2 -- util/qemu-coroutine-sleep.c | 2 +- 10 files changed, 35 insertions(+), 7 deletions(-) diff --git a/block/qed.h b/block/qed.h index 9676ab9..ce8c314 100644 --- a/block/qed.h +++ b/block/qed.h @@ -198,6 +198,9 @@ enum { */ typedef void QEDFindClusterFunc(void *opaque, int ret, uint64_t offset, size_t len); +void qed_acquire(BDRVQEDState *s); +void qed_release(BDRVQEDState *s); + /** * Generic callback for chaining async callbacks */ diff --git a/block/curl.c b/block/curl.c index 792fef8..65e6da1 100644 --- a/block/curl.c +++ b/block/curl.c @@ -424,9 +424,11 @@ static void curl_multi_timeout_do(void *arg) return; } +aio_context_acquire(s->aio_context); curl_multi_socket_action(s->multi, CURL_SOCKET_TIMEOUT, 0, &running); curl_multi_check_completion(s); +aio_context_release(s->aio_context); #else abort(); #endif diff --git a/block/io.c b/block/io.c index 76dfaf4..dd6c74f 100644 --- a/block/io.c +++ b/block/io.c @@ -2080,6 +2080,11 @@ void bdrv_aio_cancel(BlockAIOCB *acb) if (acb->aiocb_info->get_aio_context) { aio_poll(acb->aiocb_info->get_aio_context(acb), true); } else if (acb->bs) { +/* qemu_aio_ref and qemu_aio_unref are not thread-safe, so + * assert that we're not using an I/O thread. Thread-safe + * code should use bdrv_aio_cancel_async exclusively. 
+ */ +assert(bdrv_get_aio_context(acb->bs) == qemu_get_aio_context()); aio_poll(bdrv_get_aio_context(acb->bs), true); } else { abort(); diff --git a/block/iscsi.c b/block/iscsi.c index 1860f1b..664b71a 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -174,7 +174,7 @@ static void iscsi_retry_timer_expired(void *opaque) struct IscsiTask *iTask = opaque; iTask->complete = 1; if (iTask->co) { -qemu_coroutine_enter(iTask->co); +aio_co_wake(iTask->co); } } @@ -1392,16 +1392,20 @@ static void iscsi_nop_timed_event(void *opaque) { IscsiLun *iscsilun = opaque; +aio_context_acquire(iscsilun->aio_context); if (iscsi_get_nops_in_flight(iscsilun->iscsi) >= MAX_NOP_FAILURES) { error_report("iSCSI: NOP timeout. Reconnecting..."); iscsilun->request_timed_out = true; } else if (iscsi_nop_out_async(iscsilun->iscsi, NULL, NULL, 0, NULL) != 0) { error_report("iSCSI: failed to sent NOP-Out. Disabling NOP messages."); -return; +goto out; } timer_mod(iscsilun->nop_timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + NOP_INTERVAL); iscsi_set_events(iscsilun); + +out: +aio_context_release(iscsilun->aio_context); } static void iscsi_readcapacity_sync(IscsiLun *iscsilun, Error **errp) diff --git a/block/null.c b/block/null.c index b300390..356209a 100644 --- a/block/null.c +++ b/block/null.c @@ -141,7 +141,11 @@ static void null_bh_cb(void *opaque) static void null_timer_cb(void *opaque) { NullAIOCB *acb = opaque; +AioContext *ctx = bdrv_get_aio_context(acb->common.bs); + +aio_context_acquire(ctx); acb->common.cb(acb->common.opaque, 0); +aio_context_release(ctx); timer_deinit(&acb->timer); qemu_aio_unref(acb); } diff --git a/block/qed.c b/block/qed.c index 7f1c508..a21d025 100644 --- a/block/qed.c +++ b/block/qed.c @@ -345,10 +345,22 @@ static void qed_need_check_timer_cb(void *opaque) trace_qed_need_check_timer_cb(s); +qed_acquire(s); qed_plug_allocating_write_reqs(s); /* Ensure writes are on disk before clearing flag */ bdrv_aio_flush(s->bs->file->bs, qed_clear_need_check, s); +qed_release(s); 
+} + +void qed_acquire(BDRVQEDState *s) +{ +aio_context_acquire(bdrv_get_aio_context(s->bs)); +} + +void qed_release(BDRVQEDState *s) +{ +aio_context_release(bdrv_get_aio_context(s->bs)); } static void qed_start_need_check_timer(BDRVQEDState *s) diff --git a/block/throttle-groups.c b/block/throttle-groups.c index 17b2efb..aade5de 100644 --- a/block/throttle-groups.c +++ b/block/throttle-groups.c @@ -416,7 +416,9 @@ static void timer_cb(BlockBackend *blk, bool is_write) qemu_mutex_unlock(&tg->lock); /* Run the request that was waiting for this timer */ +aio_context_acquire(blk_get_aio_context(blk)); empty_queue = !qemu_co_enter_next(&blkp->throttled_reqs[is_write]); +aio_context_release(blk_get_aio_context(blk
[Qemu-devel] [PULL 15/24] block: explicitly acquire aiocontext in aio callbacks that need it
From: Paolo Bonzini Reviewed-by: Stefan Hajnoczi Signed-off-by: Paolo Bonzini Reviewed-by: Fam Zheng Reviewed-by: Daniel P. Berrange Message-id: 20170213135235.12274-16-pbonz...@redhat.com Signed-off-by: Stefan Hajnoczi --- block/archipelago.c| 3 --- block/block-backend.c | 7 --- block/curl.c | 2 +- block/io.c | 6 +- block/iscsi.c | 3 --- block/linux-aio.c | 5 + block/mirror.c | 12 +--- block/null.c | 8 block/qed-cluster.c| 2 ++ block/qed-table.c | 12 ++-- block/qed.c| 4 ++-- block/rbd.c| 4 block/win32-aio.c | 3 --- hw/block/virtio-blk.c | 12 +++- hw/scsi/scsi-disk.c| 15 +++ hw/scsi/scsi-generic.c | 20 +--- util/thread-pool.c | 4 +++- 17 files changed, 72 insertions(+), 50 deletions(-) diff --git a/block/archipelago.c b/block/archipelago.c index a624390..2449cfc 100644 --- a/block/archipelago.c +++ b/block/archipelago.c @@ -310,11 +310,8 @@ static void qemu_archipelago_complete_aio(void *opaque) { AIORequestData *reqdata = (AIORequestData *) opaque; ArchipelagoAIOCB *aio_cb = (ArchipelagoAIOCB *) reqdata->aio_cb; -AioContext *ctx = bdrv_get_aio_context(aio_cb->common.bs); -aio_context_acquire(ctx); aio_cb->common.cb(aio_cb->common.opaque, aio_cb->ret); -aio_context_release(ctx); aio_cb->status = 0; qemu_aio_unref(aio_cb); diff --git a/block/block-backend.c b/block/block-backend.c index bfc0e6b..819f272 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -939,12 +939,9 @@ int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags) static void error_callback_bh(void *opaque) { struct BlockBackendAIOCB *acb = opaque; -AioContext *ctx = bdrv_get_aio_context(acb->common.bs); bdrv_dec_in_flight(acb->common.bs); -aio_context_acquire(ctx); acb->common.cb(acb->common.opaque, acb->ret); -aio_context_release(ctx); qemu_aio_unref(acb); } @@ -986,12 +983,8 @@ static void blk_aio_complete(BlkAioEmAIOCB *acb) static void blk_aio_complete_bh(void *opaque) { BlkAioEmAIOCB *acb = opaque; -AioContext *ctx = bdrv_get_aio_context(acb->common.bs); - 
assert(acb->has_returned); -aio_context_acquire(ctx); blk_aio_complete(acb); -aio_context_release(ctx); } static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes, diff --git a/block/curl.c b/block/curl.c index f3f063b..2939cc7 100644 --- a/block/curl.c +++ b/block/curl.c @@ -854,11 +854,11 @@ static void curl_readv_bh_cb(void *p) curl_multi_socket_action(s->multi, CURL_SOCKET_TIMEOUT, 0, &running); out: +aio_context_release(ctx); if (ret != -EINPROGRESS) { acb->common.cb(acb->common.opaque, ret); qemu_aio_unref(acb); } -aio_context_release(ctx); } static BlockAIOCB *curl_aio_readv(BlockDriverState *bs, diff --git a/block/io.c b/block/io.c index 8486e27..a5c7d36 100644 --- a/block/io.c +++ b/block/io.c @@ -813,7 +813,7 @@ static void bdrv_co_io_em_complete(void *opaque, int ret) CoroutineIOCompletion *co = opaque; co->ret = ret; -qemu_coroutine_enter(co->coroutine); +aio_co_wake(co->coroutine); } static int coroutine_fn bdrv_driver_preadv(BlockDriverState *bs, @@ -2152,13 +2152,9 @@ static void bdrv_co_complete(BlockAIOCBCoroutine *acb) static void bdrv_co_em_bh(void *opaque) { BlockAIOCBCoroutine *acb = opaque; -BlockDriverState *bs = acb->common.bs; -AioContext *ctx = bdrv_get_aio_context(bs); assert(!acb->need_bh); -aio_context_acquire(ctx); bdrv_co_complete(acb); -aio_context_release(ctx); } static void bdrv_co_maybe_schedule_bh(BlockAIOCBCoroutine *acb) diff --git a/block/iscsi.c b/block/iscsi.c index 4fb43c2..2561be9 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -136,16 +136,13 @@ static void iscsi_bh_cb(void *p) { IscsiAIOCB *acb = p; -AioContext *ctx = bdrv_get_aio_context(acb->common.bs); qemu_bh_delete(acb->bh); g_free(acb->buf); acb->buf = NULL; -aio_context_acquire(ctx); acb->common.cb(acb->common.opaque, acb->status); -aio_context_release(ctx); if (acb->task != NULL) { scsi_free_scsi_task(acb->task); diff --git a/block/linux-aio.c b/block/linux-aio.c index f7ae38a..88b8d55 100644 --- a/block/linux-aio.c +++ b/block/linux-aio.c 
@@ -75,7 +75,6 @@ static inline ssize_t io_event_ret(struct io_event *ev) */ static void qemu_laio_process_completion(struct qemu_laiocb *laiocb) { -LinuxAioState *s = laiocb->ctx; int ret; ret = laiocb->ret; @@ -94,7 +93,6 @@ static void qemu_laio_process_completion(struct qemu_laiocb *laiocb) } laiocb->ret = ret; -aio_context_acquire(s->aio_context); if (laiocb->co) { /* If the coroutine is already entered it must be in ioq_submit() and * will noti
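The change in this pull request can be sketched with a plain C model (names and types are illustrative stand-ins, not QEMU's APIs; an ordinary pthread mutex plays the role of the AioContext lock): completion callbacks stop taking the lock themselves, and the dispatcher that invokes them holds it once around the whole batch.

```c
#include <pthread.h>

/* A plain pthread mutex stands in for the AioContext lock. */
typedef struct {
    pthread_mutex_t lock;
    int completed;
} Ctx;

typedef void (*completion_cb)(Ctx *ctx, int ret);

/* Callback body runs with ctx->lock already held by the caller,
 * so it no longer contains an acquire/release pair of its own. */
static void on_complete(Ctx *ctx, int ret)
{
    if (ret == 0) {
        ctx->completed++;
    }
}

/* The dispatcher takes the lock once around the whole batch, instead
 * of each callback acquiring and releasing it individually. */
static void dispatch(Ctx *ctx, completion_cb *cbs, int n)
{
    pthread_mutex_lock(&ctx->lock);
    for (int i = 0; i < n; i++) {
        cbs[i](ctx, 0);
    }
    pthread_mutex_unlock(&ctx->lock);
}
```

Moving the acquire/release outward like this is what lets the patch delete the per-callback `aio_context_acquire`/`aio_context_release` pairs shown in the diff.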
Re: [Qemu-devel] [PATCH 1/2] monitor.c: make mon_get_cpu return NULL when there is no CPU
Sorry for forgetting the "no irrelevant change" rule...won't do that next time! 2017-02-20 14:09 GMT+08:00 Thomas Huth : > On 19.02.2017 04:55, Philippe Mathieu-Daudé wrote: >> On 02/17/2017 05:27 AM, Ziyue Yang wrote: >>> From: Ziyue Yang >>> >>> Currently mon_get_cpu always dereferences first_cpu without checking >>> whether it's a valid pointer. This commit adds check before >>> dereferencing, >>> and reports "No CPU" info if there isn't any CPU then returns NULL. >>> >>> Signed-off-by: Ziyue Yang >> >> Reviewed-by: Philippe Mathieu-Daudé >> >>> --- >>> monitor.c | 10 +++--- >>> 1 file changed, 7 insertions(+), 3 deletions(-) >>> >>> diff --git a/monitor.c b/monitor.c >>> index 3cd72a9bab..6b25cf7a2b 100644 >>> --- a/monitor.c >>> +++ b/monitor.c >>> @@ -1026,6 +1026,10 @@ int monitor_set_cpu(int cpu_index) >>> CPUState *mon_get_cpu(void) >>> { >>> if (!cur_mon->mon_cpu) { >>> +if (!first_cpu) { >>> +monitor_printf(cur_mon, "No CPU available on this >>> machine\n"); >>> +return NULL; >>> +} >>> monitor_set_cpu(first_cpu->cpu_index); >>> } >>> cpu_synchronize_state(cur_mon->mon_cpu); >>> @@ -2495,11 +2499,11 @@ static int default_fmt_size = 4; >>> static int is_valid_option(const char *c, const char *typestr) >>> { >>> char option[3]; >>> - >>> + >>> option[0] = '-'; >>> option[1] = *c; >>> option[2] = '\0'; >>> - >>> + >>> typestr = strstr(typestr, option); >>> return (typestr != NULL); >>> } >>> @@ -2864,7 +2868,7 @@ static QDict *monitor_parse_arguments(Monitor *mon, >>> p++; >>> if(c != *p) { >>> if(!is_valid_option(p, typestr)) { >>> - >>> + >>> monitor_printf(mon, "%s: unsupported >>> option -%c\n", >>> cmd->name, *p); >>> goto fail; > > Your patch contains some unnecessary white space changes, please try to > avoid that! (or send a separate "beautification" patch to fix these). > > Thomas >
[Qemu-devel] [PULL 13/24] block: explicitly acquire aiocontext in callbacks that need it
From: Paolo Bonzini This covers both file descriptor callbacks and polling callbacks, since they execute related code. Reviewed-by: Stefan Hajnoczi Signed-off-by: Paolo Bonzini Reviewed-by: Fam Zheng Reviewed-by: Daniel P. Berrange Message-id: 20170213135235.12274-14-pbonz...@redhat.com Signed-off-by: Stefan Hajnoczi --- block/curl.c | 16 +--- block/iscsi.c | 4 block/linux-aio.c | 4 block/nfs.c | 6 ++ block/sheepdog.c | 29 +++-- block/ssh.c | 29 + block/win32-aio.c | 10 ++ hw/block/virtio-blk.c | 5 - hw/scsi/virtio-scsi.c | 6 ++ util/aio-posix.c | 7 --- util/aio-win32.c | 6 -- 11 files changed, 67 insertions(+), 55 deletions(-) diff --git a/block/curl.c b/block/curl.c index 65e6da1..05b9ca3 100644 --- a/block/curl.c +++ b/block/curl.c @@ -386,9 +386,8 @@ static void curl_multi_check_completion(BDRVCURLState *s) } } -static void curl_multi_do(void *arg) +static void curl_multi_do_locked(CURLState *s) { -CURLState *s = (CURLState *)arg; CURLSocket *socket, *next_socket; int running; int r; @@ -406,12 +405,23 @@ static void curl_multi_do(void *arg) } } +static void curl_multi_do(void *arg) +{ +CURLState *s = (CURLState *)arg; + +aio_context_acquire(s->s->aio_context); +curl_multi_do_locked(s); +aio_context_release(s->s->aio_context); +} + static void curl_multi_read(void *arg) { CURLState *s = (CURLState *)arg; -curl_multi_do(arg); +aio_context_acquire(s->s->aio_context); +curl_multi_do_locked(s); curl_multi_check_completion(s->s); +aio_context_release(s->s->aio_context); } static void curl_multi_timeout_do(void *arg) diff --git a/block/iscsi.c b/block/iscsi.c index 664b71a..303b108 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -394,8 +394,10 @@ iscsi_process_read(void *arg) IscsiLun *iscsilun = arg; struct iscsi_context *iscsi = iscsilun->iscsi; +aio_context_acquire(iscsilun->aio_context); iscsi_service(iscsi, POLLIN); iscsi_set_events(iscsilun); +aio_context_release(iscsilun->aio_context); } static void @@ -404,8 +406,10 @@ iscsi_process_write(void *arg) 
IscsiLun *iscsilun = arg; struct iscsi_context *iscsi = iscsilun->iscsi; +aio_context_acquire(iscsilun->aio_context); iscsi_service(iscsi, POLLOUT); iscsi_set_events(iscsilun); +aio_context_release(iscsilun->aio_context); } static int64_t sector_lun2qemu(int64_t sector, IscsiLun *iscsilun) diff --git a/block/linux-aio.c b/block/linux-aio.c index 03ab741..277c016 100644 --- a/block/linux-aio.c +++ b/block/linux-aio.c @@ -251,7 +251,9 @@ static void qemu_laio_completion_cb(EventNotifier *e) LinuxAioState *s = container_of(e, LinuxAioState, e); if (event_notifier_test_and_clear(&s->e)) { +aio_context_acquire(s->aio_context); qemu_laio_process_completions_and_submit(s); +aio_context_release(s->aio_context); } } @@ -265,7 +267,9 @@ static bool qemu_laio_poll_cb(void *opaque) return false; } +aio_context_acquire(s->aio_context); qemu_laio_process_completions_and_submit(s); +aio_context_release(s->aio_context); return true; } diff --git a/block/nfs.c b/block/nfs.c index 689eaa7..5ce968c 100644 --- a/block/nfs.c +++ b/block/nfs.c @@ -208,15 +208,21 @@ static void nfs_set_events(NFSClient *client) static void nfs_process_read(void *arg) { NFSClient *client = arg; + +aio_context_acquire(client->aio_context); nfs_service(client->context, POLLIN); nfs_set_events(client); +aio_context_release(client->aio_context); } static void nfs_process_write(void *arg) { NFSClient *client = arg; + +aio_context_acquire(client->aio_context); nfs_service(client->context, POLLOUT); nfs_set_events(client); +aio_context_release(client->aio_context); } static void nfs_co_init_task(BlockDriverState *bs, NFSRPC *task) diff --git a/block/sheepdog.c b/block/sheepdog.c index f757157..32c4e4c 100644 --- a/block/sheepdog.c +++ b/block/sheepdog.c @@ -575,13 +575,6 @@ static coroutine_fn int send_co_req(int sockfd, SheepdogReq *hdr, void *data, return ret; } -static void restart_co_req(void *opaque) -{ -Coroutine *co = opaque; - -qemu_coroutine_enter(co); -} - typedef struct SheepdogReqCo { int sockfd; 
BlockDriverState *bs; @@ -592,12 +585,19 @@ typedef struct SheepdogReqCo { unsigned int *rlen; int ret; bool finished; +Coroutine *co; } SheepdogReqCo; +static void restart_co_req(void *opaque) +{ +SheepdogReqCo *srco = opaque; + +aio_co_wake(srco->co); +} + static coroutine_fn void do_co_req(void *opaque) { int ret; -Coroutine *co; SheepdogReqCo *srco = opaque; int sockfd = srco->sockfd; SheepdogReq *hdr = srco->hdr; @@ -605,9 +605,9 @@ st
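The wrapper split applied to `curl_multi_do` above follows a general shape worth making explicit. Here is a minimal model (the "lock" is reduced to a depth counter; function names echo the patch but the code is an illustration, not QEMU's):

```c
/* The real work moves into a *_locked helper that requires the context
 * lock; thin wrappers acquire it. A caller that needs two locked steps
 * can then run both under a single acquire/release. */
static int lock_depth;   /* stands in for aio_context_acquire/release */
static int steps_run;

static void ctx_acquire(void) { lock_depth++; }
static void ctx_release(void) { lock_depth--; }

static void multi_do_locked(void)        /* caller must hold the lock */
{
    if (lock_depth > 0) {
        steps_run++;
    }
}

static void multi_check_completion(void) /* also requires the lock */
{
    if (lock_depth > 0) {
        steps_run++;
    }
}

static void multi_do(void)               /* write-ready handler */
{
    ctx_acquire();
    multi_do_locked();
    ctx_release();
}

static void multi_read(void)             /* read-ready handler */
{
    ctx_acquire();
    multi_do_locked();
    multi_check_completion();            /* same critical section */
    ctx_release();
}
```

The payoff is that `multi_read` no longer re-enters the lock between its two steps, which is exactly the restructuring the curl hunk performs.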
[Qemu-devel] [PULL 08/24] coroutine-lock: reschedule coroutine on the AioContext it was running on
From: Paolo Bonzini As a small step towards the introduction of multiqueue, we want coroutines to remain on the same AioContext that started them, unless they are moved explicitly with e.g. aio_co_schedule. This patch avoids that coroutines switch AioContext when they use a CoMutex. For now it does not make much of a difference, because the CoMutex is not thread-safe and the AioContext itself is used to protect the CoMutex from concurrent access. However, this is going to change. Reviewed-by: Stefan Hajnoczi Signed-off-by: Paolo Bonzini Reviewed-by: Fam Zheng Reviewed-by: Daniel P. Berrange Message-id: 20170213135235.12274-9-pbonz...@redhat.com Signed-off-by: Stefan Hajnoczi --- util/qemu-coroutine-lock.c | 5 ++--- util/trace-events | 1 - 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c index 14cf9ce..e6afd1a 100644 --- a/util/qemu-coroutine-lock.c +++ b/util/qemu-coroutine-lock.c @@ -27,6 +27,7 @@ #include "qemu/coroutine.h" #include "qemu/coroutine_int.h" #include "qemu/queue.h" +#include "block/aio.h" #include "trace.h" void qemu_co_queue_init(CoQueue *queue) @@ -63,7 +64,6 @@ void qemu_co_queue_run_restart(Coroutine *co) static bool qemu_co_queue_do_restart(CoQueue *queue, bool single) { -Coroutine *self = qemu_coroutine_self(); Coroutine *next; if (QSIMPLEQ_EMPTY(&queue->entries)) { @@ -72,8 +72,7 @@ static bool qemu_co_queue_do_restart(CoQueue *queue, bool single) while ((next = QSIMPLEQ_FIRST(&queue->entries)) != NULL) { QSIMPLEQ_REMOVE_HEAD(&queue->entries, co_queue_next); -QSIMPLEQ_INSERT_TAIL(&self->co_queue_wakeup, next, co_queue_next); -trace_qemu_co_queue_next(next); +aio_co_wake(next); if (single) { break; } diff --git a/util/trace-events b/util/trace-events index 53bd70c..65c9787 100644 --- a/util/trace-events +++ b/util/trace-events @@ -28,7 +28,6 @@ qemu_coroutine_terminate(void *co) "self %p" # util/qemu-coroutine-lock.c qemu_co_queue_run_restart(void *co) "co %p" 
-qemu_co_queue_next(void *nxt) "next %p" qemu_co_mutex_lock_entry(void *mutex, void *self) "mutex %p self %p" qemu_co_mutex_lock_return(void *mutex, void *self) "mutex %p self %p" qemu_co_mutex_unlock_entry(void *mutex, void *self) "mutex %p self %p" -- 2.9.3
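The behavioral point of replacing the local wakeup-queue insert with `aio_co_wake` can be shown with a toy model (structures are hypothetical stand-ins for QEMU's Coroutine and AioContext): a woken coroutine is queued on the context it was running on, not on the waker's.

```c
typedef struct Ctx Ctx;
typedef struct Coro Coro;

struct Coro {
    Ctx *home;              /* context the coroutine last ran on */
};

struct Ctx {
    Coro *runnable[8];      /* simplistic run queue */
    int n;
};

/* Old behavior appended the coroutine to the *caller's* wakeup list;
 * the new behavior, modeled here, schedules it on its own context. */
static void co_wake(Coro *co)
{
    Ctx *ctx = co->home;
    ctx->runnable[ctx->n++] = co;
}
```

With this, a CoMutex handoff between contexts no longer drags the waiter over to the unlocker's AioContext, which is the property the commit message describes.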
[Qemu-devel] [PULL 17/24] async: remove unnecessary inc/dec pairs
From: Paolo Bonzini Pull the increment/decrement pair out of aio_bh_poll and into the callers. Reviewed-by: Stefan Hajnoczi Signed-off-by: Paolo Bonzini Reviewed-by: Fam Zheng Reviewed-by: Daniel P. Berrange Message-id: 20170213135235.12274-18-pbonz...@redhat.com Signed-off-by: Stefan Hajnoczi --- util/aio-posix.c | 8 +++- util/aio-win32.c | 8 util/async.c | 12 ++-- 3 files changed, 13 insertions(+), 15 deletions(-) diff --git a/util/aio-posix.c b/util/aio-posix.c index 2173378..2d51239 100644 --- a/util/aio-posix.c +++ b/util/aio-posix.c @@ -425,9 +425,8 @@ static bool aio_dispatch_handlers(AioContext *ctx) void aio_dispatch(AioContext *ctx) { +qemu_lockcnt_inc(&ctx->list_lock); aio_bh_poll(ctx); - -qemu_lockcnt_inc(&ctx->list_lock); aio_dispatch_handlers(ctx); qemu_lockcnt_dec(&ctx->list_lock); @@ -679,16 +678,15 @@ bool aio_poll(AioContext *ctx, bool blocking) } npfd = 0; -qemu_lockcnt_dec(&ctx->list_lock); progress |= aio_bh_poll(ctx); if (ret > 0) { -qemu_lockcnt_inc(&ctx->list_lock); progress |= aio_dispatch_handlers(ctx); -qemu_lockcnt_dec(&ctx->list_lock); } +qemu_lockcnt_dec(&ctx->list_lock); + progress |= timerlistgroup_run_timers(&ctx->tlg); return progress; diff --git a/util/aio-win32.c b/util/aio-win32.c index 442a179..bca496a 100644 --- a/util/aio-win32.c +++ b/util/aio-win32.c @@ -253,8 +253,6 @@ static bool aio_dispatch_handlers(AioContext *ctx, HANDLE event) bool progress = false; AioHandler *tmp; -qemu_lockcnt_inc(&ctx->list_lock); - /* * We have to walk very carefully in case aio_set_fd_handler is * called while we're walking. 
@@ -305,14 +303,15 @@ static bool aio_dispatch_handlers(AioContext *ctx, HANDLE event) } } -qemu_lockcnt_dec(&ctx->list_lock); return progress; } void aio_dispatch(AioContext *ctx) { +qemu_lockcnt_inc(&ctx->list_lock); aio_bh_poll(ctx); aio_dispatch_handlers(ctx, INVALID_HANDLE_VALUE); +qemu_lockcnt_dec(&ctx->list_lock); timerlistgroup_run_timers(&ctx->tlg); } @@ -349,7 +348,6 @@ bool aio_poll(AioContext *ctx, bool blocking) } } -qemu_lockcnt_dec(&ctx->list_lock); first = true; /* ctx->notifier is always registered. */ @@ -392,6 +390,8 @@ bool aio_poll(AioContext *ctx, bool blocking) progress |= aio_dispatch_handlers(ctx, event); } while (count > 0); +qemu_lockcnt_dec(&ctx->list_lock); + progress |= timerlistgroup_run_timers(&ctx->tlg); return progress; } diff --git a/util/async.c b/util/async.c index 187bc5b..7d469eb 100644 --- a/util/async.c +++ b/util/async.c @@ -90,15 +90,16 @@ void aio_bh_call(QEMUBH *bh) bh->cb(bh->opaque); } -/* Multiple occurrences of aio_bh_poll cannot be called concurrently */ +/* Multiple occurrences of aio_bh_poll cannot be called concurrently. + * The count in ctx->list_lock is incremented before the call, and is + * not affected by the call. + */ int aio_bh_poll(AioContext *ctx) { QEMUBH *bh, **bhp, *next; int ret; bool deleted = false; -qemu_lockcnt_inc(&ctx->list_lock); - ret = 0; for (bh = atomic_rcu_read(&ctx->first_bh); bh; bh = next) { next = atomic_rcu_read(&bh->next); @@ -123,11 +124,10 @@ int aio_bh_poll(AioContext *ctx) /* remove deleted bhs */ if (!deleted) { -qemu_lockcnt_dec(&ctx->list_lock); return ret; } -if (qemu_lockcnt_dec_and_lock(&ctx->list_lock)) { +if (qemu_lockcnt_dec_if_lock(&ctx->list_lock)) { bhp = &ctx->first_bh; while (*bhp) { bh = *bhp; @@ -138,7 +138,7 @@ int aio_bh_poll(AioContext *ctx) bhp = &bh->next; } } -qemu_lockcnt_unlock(&ctx->list_lock); +qemu_lockcnt_inc_and_unlock(&ctx->list_lock); } return ret; } -- 2.9.3
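The shape of this patch, with `qemu_lockcnt` modeled as a plain counter, looks like the following sketch (names are illustrative): instead of `aio_bh_poll` and the handler walk each taking their own inc/dec pair, the caller brackets the whole dispatch once.

```c
static int list_lock_count;
static int peak_count;

static void lockcnt_inc(void)
{
    if (++list_lock_count > peak_count) {
        peak_count = list_lock_count;
    }
}

static void lockcnt_dec(void)
{
    --list_lock_count;
}

static int bh_poll(void)            /* precondition: count already held */
{
    return 1;                       /* pretend a bottom half made progress */
}

static int dispatch_handlers(void)  /* precondition: count already held */
{
    return 1;
}

static int aio_dispatch(void)
{
    int progress = 0;

    lockcnt_inc();                  /* one pair around both phases */
    progress |= bh_poll();
    progress |= dispatch_handlers();
    lockcnt_dec();

    return progress;
}
```

Holding the count across both phases removes redundant inc/dec traffic and keeps the list stable for the whole dispatch.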
[Qemu-devel] [PULL 10/24] qed: introduce qed_aio_start_io and qed_aio_next_io_cb
From: Paolo Bonzini qed_aio_start_io and qed_aio_next_io will not have to acquire/release the AioContext, while qed_aio_next_io_cb will. Split the functionality and gain a little type-safety in the process. Reviewed-by: Stefan Hajnoczi Signed-off-by: Paolo Bonzini Reviewed-by: Fam Zheng Reviewed-by: Daniel P. Berrange Message-id: 20170213135235.12274-11-pbonz...@redhat.com Signed-off-by: Stefan Hajnoczi --- block/qed.c | 39 +-- 1 file changed, 25 insertions(+), 14 deletions(-) diff --git a/block/qed.c b/block/qed.c index 1a7ef0a..7f1c508 100644 --- a/block/qed.c +++ b/block/qed.c @@ -273,7 +273,19 @@ static CachedL2Table *qed_new_l2_table(BDRVQEDState *s) return l2_table; } -static void qed_aio_next_io(void *opaque, int ret); +static void qed_aio_next_io(QEDAIOCB *acb, int ret); + +static void qed_aio_start_io(QEDAIOCB *acb) +{ +qed_aio_next_io(acb, 0); +} + +static void qed_aio_next_io_cb(void *opaque, int ret) +{ +QEDAIOCB *acb = opaque; + +qed_aio_next_io(acb, ret); +} static void qed_plug_allocating_write_reqs(BDRVQEDState *s) { @@ -292,7 +304,7 @@ static void qed_unplug_allocating_write_reqs(BDRVQEDState *s) acb = QSIMPLEQ_FIRST(&s->allocating_write_reqs); if (acb) { -qed_aio_next_io(acb, 0); +qed_aio_start_io(acb); } } @@ -959,7 +971,7 @@ static void qed_aio_complete(QEDAIOCB *acb, int ret) QSIMPLEQ_REMOVE_HEAD(&s->allocating_write_reqs, next); acb = QSIMPLEQ_FIRST(&s->allocating_write_reqs); if (acb) { -qed_aio_next_io(acb, 0); +qed_aio_start_io(acb); } else if (s->header.features & QED_F_NEED_CHECK) { qed_start_need_check_timer(s); } @@ -984,7 +996,7 @@ static void qed_commit_l2_update(void *opaque, int ret) acb->request.l2_table = qed_find_l2_cache_entry(&s->l2_cache, l2_offset); assert(acb->request.l2_table != NULL); -qed_aio_next_io(opaque, ret); +qed_aio_next_io(acb, ret); } /** @@ -1032,11 +1044,11 @@ static void qed_aio_write_l2_update(QEDAIOCB *acb, int ret, uint64_t offset) if (need_alloc) { /* Write out the whole new L2 table */ 
qed_write_l2_table(s, &acb->request, 0, s->table_nelems, true, -qed_aio_write_l1_update, acb); + qed_aio_write_l1_update, acb); } else { /* Write out only the updated part of the L2 table */ qed_write_l2_table(s, &acb->request, index, acb->cur_nclusters, false, -qed_aio_next_io, acb); + qed_aio_next_io_cb, acb); } return; @@ -1088,7 +1100,7 @@ static void qed_aio_write_main(void *opaque, int ret) } if (acb->find_cluster_ret == QED_CLUSTER_FOUND) { -next_fn = qed_aio_next_io; +next_fn = qed_aio_next_io_cb; } else { if (s->bs->backing) { next_fn = qed_aio_write_flush_before_l2_update; @@ -1201,7 +1213,7 @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len) if (acb->flags & QED_AIOCB_ZERO) { /* Skip ahead if the clusters are already zero */ if (acb->find_cluster_ret == QED_CLUSTER_ZERO) { -qed_aio_next_io(acb, 0); +qed_aio_start_io(acb); return; } @@ -1321,18 +1333,18 @@ static void qed_aio_read_data(void *opaque, int ret, /* Handle zero cluster and backing file reads */ if (ret == QED_CLUSTER_ZERO) { qemu_iovec_memset(&acb->cur_qiov, 0, 0, acb->cur_qiov.size); -qed_aio_next_io(acb, 0); +qed_aio_start_io(acb); return; } else if (ret != QED_CLUSTER_FOUND) { qed_read_backing_file(s, acb->cur_pos, &acb->cur_qiov, - &acb->backing_qiov, qed_aio_next_io, acb); + &acb->backing_qiov, qed_aio_next_io_cb, acb); return; } BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO); bdrv_aio_readv(bs->file, offset / BDRV_SECTOR_SIZE, &acb->cur_qiov, acb->cur_qiov.size / BDRV_SECTOR_SIZE, - qed_aio_next_io, acb); + qed_aio_next_io_cb, acb); return; err: @@ -1342,9 +1354,8 @@ err: /** * Begin next I/O or complete the request */ -static void qed_aio_next_io(void *opaque, int ret) +static void qed_aio_next_io(QEDAIOCB *acb, int ret) { -QEDAIOCB *acb = opaque; BDRVQEDState *s = acb_to_s(acb); QEDFindClusterFunc *io_fn = (acb->flags & QED_AIOCB_WRITE) ? 
qed_aio_write_data : qed_aio_read_data; @@ -1400,7 +1411,7 @@ static BlockAIOCB *qed_aio_setup(BlockDriverState *bs, qemu_iovec_init(&acb->cur_qiov, qiov->niov); /* Start request */ -qed_aio_next_io(acb, 0); +qed_aio_start_io(acb); return &acb->common; } -- 2.9.3
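The type-safety split this patch performs is a small, reusable pattern: the worker takes a typed pointer, a convenience entry point covers the common `(acb, 0)` case, and a thin `void *` shim survives only for generic-callback call sites. A miniature version (the struct is a stand-in for QEDAIOCB):

```c
typedef struct {
    int steps;
} AioCb;

static void aio_next_io(AioCb *acb, int ret)   /* typed worker */
{
    if (ret == 0) {
        acb->steps++;
    }
}

static void aio_start_io(AioCb *acb)           /* typed entry point */
{
    aio_next_io(acb, 0);
}

static void aio_next_io_cb(void *opaque, int ret)  /* generic shim */
{
    AioCb *acb = opaque;

    aio_next_io(acb, ret);
}
```

Direct callers go through the typed functions and get compile-time checking; only the callback-registration sites pay the `void *` cost, mirroring `qed_aio_start_io` / `qed_aio_next_io_cb` above.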
[Qemu-devel] [PULL 09/24] blkdebug: reschedule coroutine on the AioContext it is running on
From: Paolo Bonzini Keep the coroutine on the same AioContext. Without this change, there would be a race between yielding the coroutine and reentering it. While the race cannot happen now, because the code only runs from a single AioContext, this will change with multiqueue support in the block layer. While doing the change, replace custom bottom half with aio_co_schedule. Signed-off-by: Paolo Bonzini Reviewed-by: Fam Zheng Reviewed-by: Stefan Hajnoczi Reviewed-by: Daniel P. Berrange Message-id: 20170213135235.12274-10-pbonz...@redhat.com Signed-off-by: Stefan Hajnoczi --- block/blkdebug.c | 9 + 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/block/blkdebug.c b/block/blkdebug.c index acccf85..d8eee1b 100644 --- a/block/blkdebug.c +++ b/block/blkdebug.c @@ -405,12 +405,6 @@ out: return ret; } -static void error_callback_bh(void *opaque) -{ -Coroutine *co = opaque; -qemu_coroutine_enter(co); -} - static int inject_error(BlockDriverState *bs, BlkdebugRule *rule) { BDRVBlkdebugState *s = bs->opaque; @@ -423,8 +417,7 @@ static int inject_error(BlockDriverState *bs, BlkdebugRule *rule) } if (!immediately) { -aio_bh_schedule_oneshot(bdrv_get_aio_context(bs), error_callback_bh, -qemu_coroutine_self()); +aio_co_schedule(qemu_get_current_aio_context(), qemu_coroutine_self()); qemu_coroutine_yield(); } -- 2.9.3
[Qemu-devel] [PULL 24/24] coroutine-lock: make CoRwlock thread-safe and fair
From: Paolo Bonzini This adds a CoMutex around the existing CoQueue. Because the write-side can just take CoMutex, the old "writer" field is not necessary anymore. Instead of removing it altogether, count the number of pending writers during a read-side critical section and forbid further readers from entering. Signed-off-by: Paolo Bonzini Reviewed-by: Fam Zheng Message-id: 20170213181244.16297-7-pbonz...@redhat.com Signed-off-by: Stefan Hajnoczi --- include/qemu/coroutine.h | 3 ++- util/qemu-coroutine-lock.c | 35 --- 2 files changed, 26 insertions(+), 12 deletions(-) diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h index d2de268..e60beaf 100644 --- a/include/qemu/coroutine.h +++ b/include/qemu/coroutine.h @@ -204,8 +204,9 @@ bool qemu_co_queue_empty(CoQueue *queue); typedef struct CoRwlock { -bool writer; +int pending_writer; int reader; +CoMutex mutex; CoQueue queue; } CoRwlock; diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c index b0a554f..6328eed 100644 --- a/util/qemu-coroutine-lock.c +++ b/util/qemu-coroutine-lock.c @@ -346,16 +346,22 @@ void qemu_co_rwlock_init(CoRwlock *lock) { memset(lock, 0, sizeof(*lock)); qemu_co_queue_init(&lock->queue); +qemu_co_mutex_init(&lock->mutex); } void qemu_co_rwlock_rdlock(CoRwlock *lock) { Coroutine *self = qemu_coroutine_self(); -while (lock->writer) { -qemu_co_queue_wait(&lock->queue, NULL); +qemu_co_mutex_lock(&lock->mutex); +/* For fairness, wait if a writer is in line. */ +while (lock->pending_writer) { +qemu_co_queue_wait(&lock->queue, &lock->mutex); } lock->reader++; +qemu_co_mutex_unlock(&lock->mutex); + +/* The rest of the read-side critical section is run without the mutex. */ self->locks_held++; } @@ -364,10 +370,13 @@ void qemu_co_rwlock_unlock(CoRwlock *lock) Coroutine *self = qemu_coroutine_self(); assert(qemu_in_coroutine()); -if (lock->writer) { -lock->writer = false; +if (!lock->reader) { +/* The critical section started in qemu_co_rwlock_wrlock. 
*/ qemu_co_queue_restart_all(&lock->queue); } else { +self->locks_held--; + +qemu_co_mutex_lock(&lock->mutex); lock->reader--; assert(lock->reader >= 0); /* Wakeup only one waiting writer */ @@ -375,16 +384,20 @@ void qemu_co_rwlock_unlock(CoRwlock *lock) qemu_co_queue_next(&lock->queue); } } -self->locks_held--; +qemu_co_mutex_unlock(&lock->mutex); } void qemu_co_rwlock_wrlock(CoRwlock *lock) { -Coroutine *self = qemu_coroutine_self(); - -while (lock->writer || lock->reader) { -qemu_co_queue_wait(&lock->queue, NULL); +qemu_co_mutex_lock(&lock->mutex); +lock->pending_writer++; +while (lock->reader) { +qemu_co_queue_wait(&lock->queue, &lock->mutex); } -lock->writer = true; -self->locks_held++; +lock->pending_writer--; + +/* The rest of the write-side critical section is run with + * the mutex taken, so that lock->reader remains zero. + * There is no need to update self->locks_held. + */ } -- 2.9.3
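The fairness rule this patch introduces — count pending writers and refuse new readers while any writer is queued — can be captured in a non-blocking model where "would block" is returning 0 (names are illustrative; the real code sleeps on a CoQueue under the CoMutex instead):

```c
typedef struct {
    int reader;           /* active readers */
    int pending_writer;   /* writers queued or about to hold the lock */
} RwLock;

static int rdlock_try(RwLock *l)
{
    if (l->pending_writer) {
        return 0;             /* for fairness, wait if a writer is in line */
    }
    l->reader++;
    return 1;
}

static void rd_unlock(RwLock *l)
{
    l->reader--;
}

static int wrlock_enqueue(RwLock *l)   /* first attempt joins the line */
{
    l->pending_writer++;
    return l->reader == 0;
}

static int wrlock_retry(RwLock *l)     /* later attempts while queued */
{
    return l->reader == 0;
}

static void wr_unlock(RwLock *l)
{
    l->pending_writer--;
}
```

Because `pending_writer` stays raised while the writer waits, a continuous stream of readers can no longer starve it — the property the old boolean `writer` flag could not express.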
[Qemu-devel] [PULL 20/24] coroutine-lock: add limited spinning to CoMutex
From: Paolo Bonzini Running a very small critical section on pthread_mutex_t and CoMutex shows that pthread_mutex_t is much faster because it doesn't actually go to sleep. What happens is that the critical section is shorter than the latency of entering the kernel and thus FUTEX_WAIT always fails. With CoMutex there is no such latency but you still want to avoid wait and wakeup. So introduce it artificially. This only works with one waiter; because CoMutex is fair, it will always have more waits and wakeups than a pthread_mutex_t. Signed-off-by: Paolo Bonzini Reviewed-by: Fam Zheng Message-id: 20170213181244.16297-3-pbonz...@redhat.com Signed-off-by: Stefan Hajnoczi --- include/qemu/coroutine.h | 5 + util/qemu-coroutine-lock.c | 51 -- util/qemu-coroutine.c | 2 +- 3 files changed, 51 insertions(+), 7 deletions(-) diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h index fce228f..12ce8e1 100644 --- a/include/qemu/coroutine.h +++ b/include/qemu/coroutine.h @@ -167,6 +167,11 @@ typedef struct CoMutex { */ unsigned locked; +/* Context that is holding the lock. Useful to avoid spinning + * when two coroutines on the same AioContext try to get the lock. :) + */ +AioContext *ctx; + /* A queue of waiters. Elements are added atomically in front of * from_push. to_pop is only populated, and popped from, by whoever * is in charge of the next wakeup. 
This can be an unlocker or, diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c index 25da9fa..73fe77c 100644 --- a/util/qemu-coroutine-lock.c +++ b/util/qemu-coroutine-lock.c @@ -30,6 +30,7 @@ #include "qemu-common.h" #include "qemu/coroutine.h" #include "qemu/coroutine_int.h" +#include "qemu/processor.h" #include "qemu/queue.h" #include "block/aio.h" #include "trace.h" @@ -181,7 +182,18 @@ void qemu_co_mutex_init(CoMutex *mutex) memset(mutex, 0, sizeof(*mutex)); } -static void coroutine_fn qemu_co_mutex_lock_slowpath(CoMutex *mutex) +static void coroutine_fn qemu_co_mutex_wake(CoMutex *mutex, Coroutine *co) +{ +/* Read co before co->ctx; pairs with smp_wmb() in + * qemu_coroutine_enter(). + */ +smp_read_barrier_depends(); +mutex->ctx = co->ctx; +aio_co_wake(co); +} + +static void coroutine_fn qemu_co_mutex_lock_slowpath(AioContext *ctx, + CoMutex *mutex) { Coroutine *self = qemu_coroutine_self(); CoWaitRecord w; @@ -206,10 +218,11 @@ static void coroutine_fn qemu_co_mutex_lock_slowpath(CoMutex *mutex) if (co == self) { /* We got the lock ourselves! */ assert(to_wake == &w); +mutex->ctx = ctx; return; } -aio_co_wake(co); +qemu_co_mutex_wake(mutex, co); } qemu_coroutine_yield(); @@ -218,13 +231,39 @@ static void coroutine_fn qemu_co_mutex_lock_slowpath(CoMutex *mutex) void coroutine_fn qemu_co_mutex_lock(CoMutex *mutex) { +AioContext *ctx = qemu_get_current_aio_context(); Coroutine *self = qemu_coroutine_self(); +int waiters, i; -if (atomic_fetch_inc(&mutex->locked) == 0) { +/* Running a very small critical section on pthread_mutex_t and CoMutex + * shows that pthread_mutex_t is much faster because it doesn't actually + * go to sleep. What happens is that the critical section is shorter + * than the latency of entering the kernel and thus FUTEX_WAIT always + * fails. With CoMutex there is no such latency but you still want to + * avoid wait and wakeup. So introduce it artificially. 
+ */ +i = 0; +retry_fast_path: +waiters = atomic_cmpxchg(&mutex->locked, 0, 1); +if (waiters != 0) { +while (waiters == 1 && ++i < 1000) { +if (atomic_read(&mutex->ctx) == ctx) { +break; +} +if (atomic_read(&mutex->locked) == 0) { +goto retry_fast_path; +} +cpu_relax(); +} +waiters = atomic_fetch_inc(&mutex->locked); +} + +if (waiters == 0) { /* Uncontended. */ trace_qemu_co_mutex_lock_uncontended(mutex, self); +mutex->ctx = ctx; } else { -qemu_co_mutex_lock_slowpath(mutex); +qemu_co_mutex_lock_slowpath(ctx, mutex); } mutex->holder = self; self->locks_held++; @@ -240,6 +279,7 @@ void coroutine_fn qemu_co_mutex_unlock(CoMutex *mutex) assert(mutex->holder == self); assert(qemu_in_coroutine()); +mutex->ctx = NULL; mutex->holder = NULL; self->locks_held--; if (atomic_fetch_dec(&mutex->locked) == 1) { @@ -252,8 +292,7 @@ void coroutine_fn qemu_co_mutex_unlock(CoMutex *mutex) unsigned our_handoff; if (to_wake) { -Coroutine *co = to_wake->co; -aio_co_wake(co); +qemu_co_mutex_wake(mutex, to_wake->co); break;
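The bounded spin added to the fast path above can be sketched with C11 atomics. This is a simplified stand-in, not QEMU's exact logic — in particular it omits the `mutex->ctx` check that stops a coroutine from spinning on a holder that runs on its own AioContext, and the `cpu_relax()` hint:

```c
#include <stdatomic.h>

/* Before falling back to the sleeping slow path, retry the
 * compare-and-swap a limited number of times while exactly one holder
 * owns the lock. Returns 1 if the lock was taken without sleeping;
 * 0 means the caller must queue up and sleep. */
static int mutex_lock_fast(atomic_int *locked, int max_spins)
{
    for (int i = 0; i < max_spins; i++) {
        int expected = 0;
        if (atomic_compare_exchange_strong(locked, &expected, 1)) {
            return 1;               /* uncontended: we own the lock */
        }
        if (expected != 1) {
            break;                  /* waiters already queued: stop spinning */
        }
        /* expected == 1: a single holder, likely to release soon */
    }
    return 0;                       /* slow path for the caller */
}
```

The spin only pays off when the holder is about to release; once a second waiter appears (`locked > 1`), spinning is pointless because the fair handoff will wake waiters in order anyway.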
[Qemu-devel] [PULL 18/24] block: document fields protected by AioContext lock
From: Paolo Bonzini Reviewed-by: Stefan Hajnoczi Signed-off-by: Paolo Bonzini Reviewed-by: Fam Zheng Reviewed-by: Daniel P. Berrange Message-id: 20170213135235.12274-19-pbonz...@redhat.com Signed-off-by: Stefan Hajnoczi --- include/block/block_int.h | 64 +- include/sysemu/block-backend.h | 14 ++--- 2 files changed, 49 insertions(+), 29 deletions(-) diff --git a/include/block/block_int.h b/include/block/block_int.h index 2d92d7e..1670941 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -430,8 +430,9 @@ struct BdrvChild { * copied as well. */ struct BlockDriverState { -int64_t total_sectors; /* if we are reading a disk image, give its - size in sectors */ +/* Protected by big QEMU lock or read-only after opening. No special + * locking needed during I/O... + */ int open_flags; /* flags used to open the file, re-used for re-open */ bool read_only; /* if true, the media is read only */ bool encrypted; /* if true, the media is encrypted */ @@ -439,14 +440,6 @@ struct BlockDriverState { bool sg;/* if true, the device is a /dev/sg* */ bool probed;/* if true, format was probed rather than specified */ -int copy_on_read; /* if nonzero, copy read backing sectors into image. - note this is a reference count */ - -CoQueue flush_queue;/* Serializing flush queue */ -bool active_flush_req; /* Flush request in flight? 
*/ -unsigned int write_gen; /* Current data generation */ -unsigned int flushed_gen; /* Flushed write generation */ - BlockDriver *drv; /* NULL means no media */ void *opaque; @@ -468,18 +461,6 @@ struct BlockDriverState { BdrvChild *backing; BdrvChild *file; -/* Callback before write request is processed */ -NotifierWithReturnList before_write_notifiers; - -/* number of in-flight requests; overall and serialising */ -unsigned int in_flight; -unsigned int serialising_in_flight; - -bool wakeup; - -/* Offset after the highest byte written to */ -uint64_t wr_highest_offset; - /* I/O Limits */ BlockLimits bl; @@ -497,11 +478,8 @@ struct BlockDriverState { QTAILQ_ENTRY(BlockDriverState) bs_list; /* element of the list of monitor-owned BDS */ QTAILQ_ENTRY(BlockDriverState) monitor_list; -QLIST_HEAD(, BdrvDirtyBitmap) dirty_bitmaps; int refcnt; -QLIST_HEAD(, BdrvTrackedRequest) tracked_requests; - /* operation blockers */ QLIST_HEAD(, BdrvOpBlocker) op_blockers[BLOCK_OP_TYPE_MAX]; @@ -522,6 +500,31 @@ struct BlockDriverState { /* The error object in use for blocking operations on backing_hd */ Error *backing_blocker; +/* Protected by AioContext lock */ + +/* If true, copy read backing sectors into image. Can be >1 if more + * than one client has requested copy-on-read. + */ +int copy_on_read; + +/* If we are reading a disk image, give its size in sectors. + * Generally read-only; it is written to by load_vmstate and save_vmstate, + * but the block layer is quiescent during those. + */ +int64_t total_sectors; + +/* Callback before write request is processed */ +NotifierWithReturnList before_write_notifiers; + +/* number of in-flight requests; overall and serialising */ +unsigned int in_flight; +unsigned int serialising_in_flight; + +bool wakeup; + +/* Offset after the highest byte written to */ +uint64_t wr_highest_offset; + /* threshold limit for writes, in bytes. "High water mark". 
*/ uint64_t write_threshold_offset; NotifierWithReturn write_threshold_notifier; @@ -529,6 +532,17 @@ struct BlockDriverState { /* counter for nested bdrv_io_plug */ unsigned io_plugged; +QLIST_HEAD(, BdrvTrackedRequest) tracked_requests; +CoQueue flush_queue; /* Serializing flush queue */ +bool active_flush_req;/* Flush request in flight? */ +unsigned int write_gen; /* Current data generation */ +unsigned int flushed_gen; /* Flushed write generation */ + +QLIST_HEAD(, BdrvDirtyBitmap) dirty_bitmaps; + +/* do we need to tell the guest if we have a volatile write cache? */ +int enable_write_cache; + int quiesce_counter; }; diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h index 6444e41..f365a51 100644 --- a/include/sysemu/block-backend.h +++ b/include/sysemu/block-backend.h @@ -64,14 +64,20 @@ typedef struct BlockDevOps { * fields that must be public. This is in particular for QLIST_ENTRY() and * friends so that BlockBackends can be kept in lists outside block-backend.c */ typedef struct BlockBackendPublic { -/* I/O throttling. - * throttle_state tells us if this BlockBackend has I/O limits configured. - * io_limits_disabled tell
[Qemu-devel] [PULL 22/24] coroutine-lock: place CoMutex before CoQueue in header
From: Paolo Bonzini This will avoid forward references in the next patch. It is also more logical because CoQueue is not anymore the basic primitive. Signed-off-by: Paolo Bonzini Reviewed-by: Fam Zheng Message-id: 20170213181244.16297-5-pbonz...@redhat.com Signed-off-by: Stefan Hajnoczi --- include/qemu/coroutine.h | 89 1 file changed, 44 insertions(+), 45 deletions(-) diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h index 12ce8e1..9f68579 100644 --- a/include/qemu/coroutine.h +++ b/include/qemu/coroutine.h @@ -112,51 +112,6 @@ bool qemu_in_coroutine(void); */ bool qemu_coroutine_entered(Coroutine *co); - -/** - * CoQueues are a mechanism to queue coroutines in order to continue executing - * them later. They provide the fundamental primitives on which coroutine locks - * are built. - */ -typedef struct CoQueue { -QSIMPLEQ_HEAD(, Coroutine) entries; -} CoQueue; - -/** - * Initialise a CoQueue. This must be called before any other operation is used - * on the CoQueue. - */ -void qemu_co_queue_init(CoQueue *queue); - -/** - * Adds the current coroutine to the CoQueue and transfers control to the - * caller of the coroutine. - */ -void coroutine_fn qemu_co_queue_wait(CoQueue *queue); - -/** - * Restarts the next coroutine in the CoQueue and removes it from the queue. - * - * Returns true if a coroutine was restarted, false if the queue is empty. - */ -bool coroutine_fn qemu_co_queue_next(CoQueue *queue); - -/** - * Restarts all coroutines in the CoQueue and leaves the queue empty. - */ -void coroutine_fn qemu_co_queue_restart_all(CoQueue *queue); - -/** - * Enter the next coroutine in the queue - */ -bool qemu_co_enter_next(CoQueue *queue); - -/** - * Checks if the CoQueue is empty. 
- */ -bool qemu_co_queue_empty(CoQueue *queue); - - /** * Provides a mutex that can be used to synchronise coroutines */ @@ -202,6 +157,50 @@ void coroutine_fn qemu_co_mutex_lock(CoMutex *mutex); */ void coroutine_fn qemu_co_mutex_unlock(CoMutex *mutex); + +/** + * CoQueues are a mechanism to queue coroutines in order to continue executing + * them later. + */ +typedef struct CoQueue { +QSIMPLEQ_HEAD(, Coroutine) entries; +} CoQueue; + +/** + * Initialise a CoQueue. This must be called before any other operation is used + * on the CoQueue. + */ +void qemu_co_queue_init(CoQueue *queue); + +/** + * Adds the current coroutine to the CoQueue and transfers control to the + * caller of the coroutine. + */ +void coroutine_fn qemu_co_queue_wait(CoQueue *queue); + +/** + * Restarts the next coroutine in the CoQueue and removes it from the queue. + * + * Returns true if a coroutine was restarted, false if the queue is empty. + */ +bool coroutine_fn qemu_co_queue_next(CoQueue *queue); + +/** + * Restarts all coroutines in the CoQueue and leaves the queue empty. + */ +void coroutine_fn qemu_co_queue_restart_all(CoQueue *queue); + +/** + * Enter the next coroutine in the queue + */ +bool qemu_co_enter_next(CoQueue *queue); + +/** + * Checks if the CoQueue is empty. + */ +bool qemu_co_queue_empty(CoQueue *queue); + + typedef struct CoRwlock { bool writer; int reader; -- 2.9.3
[Qemu-devel] [PULL 16/24] aio-posix: partially inline aio_dispatch into aio_poll
From: Paolo Bonzini This patch prepares for the removal of unnecessary lockcnt inc/dec pairs. Extract the dispatching loop for file descriptor handlers into a new function aio_dispatch_handlers, and then inline aio_dispatch into aio_poll. aio_dispatch can now become void. Reviewed-by: Stefan Hajnoczi Signed-off-by: Paolo Bonzini Reviewed-by: Fam Zheng Reviewed-by: Daniel P. Berrange Message-id: 20170213135235.12274-17-pbonz...@redhat.com Signed-off-by: Stefan Hajnoczi --- include/block/aio.h | 6 +- util/aio-posix.c| 44 ++-- util/aio-win32.c| 13 - util/async.c| 2 +- 4 files changed, 20 insertions(+), 45 deletions(-) diff --git a/include/block/aio.h b/include/block/aio.h index 614cbc6..677b6ff 100644 --- a/include/block/aio.h +++ b/include/block/aio.h @@ -310,12 +310,8 @@ bool aio_pending(AioContext *ctx); /* Dispatch any pending callbacks from the GSource attached to the AioContext. * * This is used internally in the implementation of the GSource. - * - * @dispatch_fds: true to process fds, false to skip them - *(can be used as an optimization by callers that know there - *are no fds ready) */ -bool aio_dispatch(AioContext *ctx, bool dispatch_fds); +void aio_dispatch(AioContext *ctx); /* Progress in completing AIO work to occur. This can issue new pending * aio as a result of executing I/O completion or bh callbacks. diff --git a/util/aio-posix.c b/util/aio-posix.c index 84cee43..2173378 100644 --- a/util/aio-posix.c +++ b/util/aio-posix.c @@ -386,12 +386,6 @@ static bool aio_dispatch_handlers(AioContext *ctx) AioHandler *node, *tmp; bool progress = false; -/* - * We have to walk very carefully in case aio_set_fd_handler is - * called while we're walking. 
- */ -qemu_lockcnt_inc(&ctx->list_lock); - QLIST_FOREACH_SAFE_RCU(node, &ctx->aio_handlers, node, tmp) { int revents; @@ -426,33 +420,18 @@ static bool aio_dispatch_handlers(AioContext *ctx) } } -qemu_lockcnt_dec(&ctx->list_lock); return progress; } -/* - * Note that dispatch_fds == false has the side-effect of post-poning the - * freeing of deleted handlers. - */ -bool aio_dispatch(AioContext *ctx, bool dispatch_fds) +void aio_dispatch(AioContext *ctx) { -bool progress; +aio_bh_poll(ctx); -/* - * If there are callbacks left that have been queued, we need to call them. - * Do not call select in this case, because it is possible that the caller - * does not need a complete flush (as is the case for aio_poll loops). - */ -progress = aio_bh_poll(ctx); +qemu_lockcnt_inc(&ctx->list_lock); +aio_dispatch_handlers(ctx); +qemu_lockcnt_dec(&ctx->list_lock); -if (dispatch_fds) { -progress |= aio_dispatch_handlers(ctx); -} - -/* Run our timers */ -progress |= timerlistgroup_run_timers(&ctx->tlg); - -return progress; +timerlistgroup_run_timers(&ctx->tlg); } /* These thread-local variables are used only in a small part of aio_poll @@ -702,11 +681,16 @@ bool aio_poll(AioContext *ctx, bool blocking) npfd = 0; qemu_lockcnt_dec(&ctx->list_lock); -/* Run dispatch even if there were no readable fds to run timers */ -if (aio_dispatch(ctx, ret > 0)) { -progress = true; +progress |= aio_bh_poll(ctx); + +if (ret > 0) { +qemu_lockcnt_inc(&ctx->list_lock); +progress |= aio_dispatch_handlers(ctx); +qemu_lockcnt_dec(&ctx->list_lock); } +progress |= timerlistgroup_run_timers(&ctx->tlg); + return progress; } diff --git a/util/aio-win32.c b/util/aio-win32.c index 20b63ce..442a179 100644 --- a/util/aio-win32.c +++ b/util/aio-win32.c @@ -309,16 +309,11 @@ static bool aio_dispatch_handlers(AioContext *ctx, HANDLE event) return progress; } -bool aio_dispatch(AioContext *ctx, bool dispatch_fds) +void aio_dispatch(AioContext *ctx) { -bool progress; - -progress = aio_bh_poll(ctx); -if (dispatch_fds) { 
-progress |= aio_dispatch_handlers(ctx, INVALID_HANDLE_VALUE); -} -progress |= timerlistgroup_run_timers(&ctx->tlg); -return progress; +aio_bh_poll(ctx); +aio_dispatch_handlers(ctx, INVALID_HANDLE_VALUE); +timerlistgroup_run_timers(&ctx->tlg); } bool aio_poll(AioContext *ctx, bool blocking) diff --git a/util/async.c b/util/async.c index c54da71..187bc5b 100644 --- a/util/async.c +++ b/util/async.c @@ -258,7 +258,7 @@ aio_ctx_dispatch(GSource *source, AioContext *ctx = (AioContext *) source; assert(callback == NULL); -aio_dispatch(ctx, true); +aio_dispatch(ctx); return true; } -- 2.9.3
[Qemu-devel] [PULL 21/24] test-aio-multithread: add performance comparison with thread-based mutexes
From: Paolo Bonzini Add two implementations of the same benchmark as the previous patch, but using pthreads. One uses a normal QemuMutex, the other is Linux only and implements a fair mutex based on MCS locks and futexes. This shows that the slower performance of the 5-thread case is due to the fairness of CoMutex, rather than to coroutines. If fairness does not matter, as is the case with two threads, CoMutex can actually be faster than pthreads. Signed-off-by: Paolo Bonzini Reviewed-by: Fam Zheng Message-id: 20170213181244.16297-4-pbonz...@redhat.com Signed-off-by: Stefan Hajnoczi --- tests/test-aio-multithread.c | 164 +++ 1 file changed, 164 insertions(+) diff --git a/tests/test-aio-multithread.c b/tests/test-aio-multithread.c index 4fa2e9b..f11e990 100644 --- a/tests/test-aio-multithread.c +++ b/tests/test-aio-multithread.c @@ -278,6 +278,162 @@ static void test_multi_co_mutex_2_30(void) test_multi_co_mutex(2, 30); } +/* Same test with fair mutexes, for performance comparison. */ + +#ifdef CONFIG_LINUX +#include "qemu/futex.h" + +/* The nodes for the mutex reside in this structure (on which we try to avoid + * false sharing). The head of the mutex is in the "mutex_head" variable. + */ +static struct { +int next, locked; +int padding[14]; +} nodes[NUM_CONTEXTS] __attribute__((__aligned__(64))); + +static int mutex_head = -1; + +static void mcs_mutex_lock(void) +{ +int prev; + +nodes[id].next = -1; +nodes[id].locked = 1; +prev = atomic_xchg(&mutex_head, id); +if (prev != -1) { +atomic_set(&nodes[prev].next, id); +qemu_futex_wait(&nodes[id].locked, 1); +} +} + +static void mcs_mutex_unlock(void) +{ +int next; +if (nodes[id].next == -1) { +if (atomic_read(&mutex_head) == id && +atomic_cmpxchg(&mutex_head, id, -1) == id) { +/* Last item in the list, exit. */ +return; +} +while (atomic_read(&nodes[id].next) == -1) { +/* mcs_mutex_lock did the xchg, but has not updated + * nodes[prev].next yet. + */ +} +} + +/* Wake up the next in line. 
*/ +next = nodes[id].next; +nodes[next].locked = 0; +qemu_futex_wake(&nodes[next].locked, 1); +} + +static void test_multi_fair_mutex_entry(void *opaque) +{ +while (!atomic_mb_read(&now_stopping)) { +mcs_mutex_lock(); +counter++; +mcs_mutex_unlock(); +atomic_inc(&atomic_counter); +} +atomic_dec(&running); +} + +static void test_multi_fair_mutex(int threads, int seconds) +{ +int i; + +assert(mutex_head == -1); +counter = 0; +atomic_counter = 0; +now_stopping = false; + +create_aio_contexts(); +assert(threads <= NUM_CONTEXTS); +running = threads; +for (i = 0; i < threads; i++) { +Coroutine *co1 = qemu_coroutine_create(test_multi_fair_mutex_entry, NULL); +aio_co_schedule(ctx[i], co1); +} + +g_usleep(seconds * 100); + +atomic_mb_set(&now_stopping, true); +while (running > 0) { +g_usleep(10); +} + +join_aio_contexts(); +g_test_message("%d iterations/second\n", counter / seconds); +g_assert_cmpint(counter, ==, atomic_counter); +} + +static void test_multi_fair_mutex_1(void) +{ +test_multi_fair_mutex(NUM_CONTEXTS, 1); +} + +static void test_multi_fair_mutex_10(void) +{ +test_multi_fair_mutex(NUM_CONTEXTS, 10); +} +#endif + +/* Same test with pthread mutexes, for performance comparison and + * portability. 
*/ + +static QemuMutex mutex; + +static void test_multi_mutex_entry(void *opaque) +{ +while (!atomic_mb_read(&now_stopping)) { +qemu_mutex_lock(&mutex); +counter++; +qemu_mutex_unlock(&mutex); +atomic_inc(&atomic_counter); +} +atomic_dec(&running); +} + +static void test_multi_mutex(int threads, int seconds) +{ +int i; + +qemu_mutex_init(&mutex); +counter = 0; +atomic_counter = 0; +now_stopping = false; + +create_aio_contexts(); +assert(threads <= NUM_CONTEXTS); +running = threads; +for (i = 0; i < threads; i++) { +Coroutine *co1 = qemu_coroutine_create(test_multi_mutex_entry, NULL); +aio_co_schedule(ctx[i], co1); +} + +g_usleep(seconds * 100); + +atomic_mb_set(&now_stopping, true); +while (running > 0) { +g_usleep(10); +} + +join_aio_contexts(); +g_test_message("%d iterations/second\n", counter / seconds); +g_assert_cmpint(counter, ==, atomic_counter); +} + +static void test_multi_mutex_1(void) +{ +test_multi_mutex(NUM_CONTEXTS, 1); +} + +static void test_multi_mutex_10(void) +{ +test_multi_mutex(NUM_CONTEXTS, 10); +} + /* End of tests. */ int main(int argc, char **argv) @@ -290,10 +446,18 @@ int main(int argc, char **argv) g_test_add_func("/aio/multi/schedule", test_multi_co_schedule_1); g_test_add_func("/aio/multi/mute
Re: [Qemu-devel] Error handling for KVM_GET_DIRTY_LOG
* Christian Borntraeger (borntrae...@de.ibm.com) wrote: > On 02/16/2017 03:51 PM, Janosch Frank wrote: > > While trying to fix a bug in the s390 migration code, I noticed that > > QEMU ignores practically all errors returned from that VM ioctl. QEMU > > behaves as specified in the KVM API and only processes -1 (-EPERM) as an > > error. > > > > Unfortunately the documentation is wrong/old and KVM may return -EFAULT, > > -EINVAL, -ENOTSUPP (BookE) and -ENOENT. This bugs me, as I found a case > > where I want to return -EFAULT because of guest memory problems and QEMU > > will still happily migrate the VM. > > > > I currently don't see a reason why we continue to migrate on EFAULT and > > EINVAL. But returning -error from kvm_physical_sync_dirty_bitmap might > > also be a bit hard, as it kills QEMU. > > > > Do we want to fix this and if so, how do we want it done? > > If not, we at least have a definitive mail to point to when the next one > > comes around. I also have a KVM patch to update the API documentation if > > wanted (maybe we should dust that off a bit anyhow). > > I think we want to handle _ALL_ errors of that ioctl. Instead of aborting > QEMU we might just want to abort the migration in that case? Yes, I don't see any reason to kill the source guest. > > This has been brought up in 2009 [1] the first time and was more or less > > fixed and then reverted in 2014 [2]. > > > > The reason in [1] was that PPC hadn't settled yet on a valid return code. > > > > In [2] it was too close to the v2 to handle it properly. > > > > > > [1] https://lists.nongnu.org/archive/html/qemu-devel/2009-07/msg01772.html > > > > [2] https://lists.nongnu.org/archive/html/qemu-devel/2014-04/msg01993.html > > So back then it was just too close to 2.0 and should have been revisited for > 2.1. Let's now fix it for 2.9? Yes Dave > > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
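The policy the thread converges on — propagate all KVM_GET_DIRTY_LOG errors, but abort only the migration rather than the whole QEMU process — could be captured by a helper along these lines (a standalone sketch, not QEMU code; the function name and classification are illustrative, and ENOTSUPP is omitted because it is a kernel-internal errno not exposed in <errno.h>):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical policy helper: given the negative errno that the
 * KVM_GET_DIRTY_LOG ioctl path returned, decide whether the caller
 * should abort the migration (keeping the source guest alive).
 * The error set mirrors the ones named in the thread. */
static bool dirty_log_error_aborts_migration(int err)
{
    switch (-err) {
    case 0:
        return false;   /* success: nothing to do */
    case EFAULT:        /* guest memory problems */
    case EINVAL:        /* bad slot or arguments */
    case ENOENT:        /* no such memslot */
        return true;    /* fail the migration, not the VM */
    default:
        return true;    /* unknown errors: be conservative */
    }
}
```

The key design point is that the classification lives in one place, so future KVM return codes (the documentation update mentioned above) only need one switch arm.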
Re: [Qemu-devel] [PATCH v8 1/2] block/vxhs.c: Add support for a new block device type called "vxhs"
On Fri, Feb 17, 2017 at 04:42:15PM -0500, Jeff Cody wrote: > On Thu, Feb 16, 2017 at 02:24:19PM -0800, ashish mittal wrote: > > Hi, > > > > I am getting the following error with checkpatch.pl > > > > ERROR: externs should be avoided in .c files > > #78: FILE: block/vxhs.c:28: > > +QemuUUID qemu_uuid __attribute__ ((weak)); > > > > Is there any way to get around this, or does it mean that I would have > > to add a vxhs.h just for this one entry? > > > > I remain skeptical on the use of the qemu_uuid as a way to select the TLS > cert. Yes, we should not be hardcoding arbitrary path lookup policies like that in QEMU. The libqnio API should allow QEMU to specify what paths it wants to use for certs directly. That allows the admin the flexibility to decide their own policy for where to put certs and the policy on which certs are used for which purpose. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o-http://search.cpan.org/~danberr/ :|
[Qemu-devel] [PULL 23/24] coroutine-lock: add mutex argument to CoQueue APIs
From: Paolo Bonzini All that CoQueue needs in order to become thread-safe is help from an external mutex. Add this to the API. Signed-off-by: Paolo Bonzini Reviewed-by: Fam Zheng Message-id: 20170213181244.16297-6-pbonz...@redhat.com Signed-off-by: Stefan Hajnoczi --- include/qemu/coroutine.h | 8 +--- block/backup.c | 2 +- block/io.c | 4 ++-- block/nbd-client.c | 2 +- block/qcow2-cluster.c | 4 +--- block/sheepdog.c | 2 +- block/throttle-groups.c| 2 +- hw/9pfs/9p.c | 2 +- util/qemu-coroutine-lock.c | 24 +--- 9 files changed, 34 insertions(+), 16 deletions(-) diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h index 9f68579..d2de268 100644 --- a/include/qemu/coroutine.h +++ b/include/qemu/coroutine.h @@ -160,7 +160,8 @@ void coroutine_fn qemu_co_mutex_unlock(CoMutex *mutex); /** * CoQueues are a mechanism to queue coroutines in order to continue executing - * them later. + * them later. They are similar to condition variables, but they need help + * from an external mutex in order to maintain thread-safety. */ typedef struct CoQueue { QSIMPLEQ_HEAD(, Coroutine) entries; @@ -174,9 +175,10 @@ void qemu_co_queue_init(CoQueue *queue); /** * Adds the current coroutine to the CoQueue and transfers control to the - * caller of the coroutine. + * caller of the coroutine. The mutex is unlocked during the wait and + * locked again afterwards. */ -void coroutine_fn qemu_co_queue_wait(CoQueue *queue); +void coroutine_fn qemu_co_queue_wait(CoQueue *queue, CoMutex *mutex); /** * Restarts the next coroutine in the CoQueue and removes it from the queue. 
diff --git a/block/backup.c b/block/backup.c index ea38733..fe010e7 100644 --- a/block/backup.c +++ b/block/backup.c @@ -64,7 +64,7 @@ static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job, retry = false; QLIST_FOREACH(req, &job->inflight_reqs, list) { if (end > req->start && start < req->end) { -qemu_co_queue_wait(&req->wait_queue); +qemu_co_queue_wait(&req->wait_queue, NULL); retry = true; break; } diff --git a/block/io.c b/block/io.c index a5c7d36..d5c4544 100644 --- a/block/io.c +++ b/block/io.c @@ -539,7 +539,7 @@ static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self) * (instead of producing a deadlock in the former case). */ if (!req->waiting_for) { self->waiting_for = req; -qemu_co_queue_wait(&req->wait_queue); +qemu_co_queue_wait(&req->wait_queue, NULL); self->waiting_for = NULL; retry = true; waited = true; @@ -2275,7 +2275,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs) /* Wait until any previous flushes are completed */ while (bs->active_flush_req) { -qemu_co_queue_wait(&bs->flush_queue); +qemu_co_queue_wait(&bs->flush_queue, NULL); } bs->active_flush_req = true; diff --git a/block/nbd-client.c b/block/nbd-client.c index 10fcc9e..0dc12c2 100644 --- a/block/nbd-client.c +++ b/block/nbd-client.c @@ -182,7 +182,7 @@ static void nbd_coroutine_start(NBDClientSession *s, /* Poor man semaphore. The free_sema is locked when no other request * can be accepted, and unlocked after receiving one reply. */ if (s->in_flight == MAX_NBD_REQUESTS) { -qemu_co_queue_wait(&s->free_sema); +qemu_co_queue_wait(&s->free_sema, NULL); assert(s->in_flight < MAX_NBD_REQUESTS); } s->in_flight++; diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c index 928c1e2..78c11d4 100644 --- a/block/qcow2-cluster.c +++ b/block/qcow2-cluster.c @@ -932,9 +932,7 @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset, if (bytes == 0) { /* Wait for the dependency to complete. 
We need to recheck * the free/allocated clusters when we continue. */ -qemu_co_mutex_unlock(&s->lock); -qemu_co_queue_wait(&old_alloc->dependent_requests); -qemu_co_mutex_lock(&s->lock); +qemu_co_queue_wait(&old_alloc->dependent_requests, &s->lock); return -EAGAIN; } } diff --git a/block/sheepdog.c b/block/sheepdog.c index 32c4e4c..860ba61 100644 --- a/block/sheepdog.c +++ b/block/sheepdog.c @@ -486,7 +486,7 @@ static void wait_for_overlapping_aiocb(BDRVSheepdogState *s, SheepdogAIOCB *acb) retry: QLIST_FOREACH(cb, &s->inflight_aiocb_head, aiocb_siblings) { if (AIOCBOverlapping(acb, cb)) { -qemu_co_queue_wait(&s->overlapping_queue); +qemu_co_queue_wait(&s->overlapping_queue, NULL); goto retry; } } diff --git a/block/throttle-g
[Qemu-devel] [PATCH 05/12] virtio-ccw: add virtio-crypto-ccw device
From: Halil Pasic Wire up virtio-crypto for the CCW based VIRTIO. Signed-off-by: Halil Pasic Signed-off-by: Cornelia Huck --- hw/s390x/virtio-ccw.c | 61 +++ hw/s390x/virtio-ccw.h | 12 ++ 2 files changed, 73 insertions(+) diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c index 32c414f23d..613d8c6615 100644 --- a/hw/s390x/virtio-ccw.c +++ b/hw/s390x/virtio-ccw.c @@ -894,6 +894,24 @@ static void virtio_ccw_rng_realize(VirtioCcwDevice *ccw_dev, Error **errp) NULL); } +static void virtio_ccw_crypto_realize(VirtioCcwDevice *ccw_dev, Error **errp) +{ +VirtIOCryptoCcw *dev = VIRTIO_CRYPTO_CCW(ccw_dev); +DeviceState *vdev = DEVICE(&dev->vdev); +Error *err = NULL; + +qdev_set_parent_bus(vdev, BUS(&ccw_dev->bus)); +object_property_set_bool(OBJECT(vdev), true, "realized", &err); +if (err) { +error_propagate(errp, err); +return; +} + +object_property_set_link(OBJECT(vdev), + OBJECT(dev->vdev.conf.cryptodev), "cryptodev", + NULL); +} + /* DeviceState to VirtioCcwDevice. Note: used on datapath, * be careful and test performance if you change this. 
*/ @@ -1534,6 +1552,48 @@ static const TypeInfo virtio_ccw_rng = { .class_init= virtio_ccw_rng_class_init, }; +static Property virtio_ccw_crypto_properties[] = { +DEFINE_PROP_CSS_DEV_ID("devno", VirtioCcwDevice, parent_obj.bus_id), +DEFINE_PROP_BIT("ioeventfd", VirtioCcwDevice, flags, +VIRTIO_CCW_FLAG_USE_IOEVENTFD_BIT, true), +DEFINE_PROP_UINT32("max_revision", VirtioCcwDevice, max_rev, + VIRTIO_CCW_MAX_REV), +DEFINE_PROP_END_OF_LIST(), +}; + +static void virtio_ccw_crypto_instance_init(Object *obj) +{ +VirtIOCryptoCcw *dev = VIRTIO_CRYPTO_CCW(obj); +VirtioCcwDevice *ccw_dev = VIRTIO_CCW_DEVICE(obj); + +ccw_dev->force_revision_1 = true; +virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev), +TYPE_VIRTIO_CRYPTO); + +object_property_add_alias(obj, "cryptodev", OBJECT(&dev->vdev), + "cryptodev", &error_abort); +} + +static void virtio_ccw_crypto_class_init(ObjectClass *klass, void *data) +{ +DeviceClass *dc = DEVICE_CLASS(klass); +VirtIOCCWDeviceClass *k = VIRTIO_CCW_DEVICE_CLASS(klass); + +k->realize = virtio_ccw_crypto_realize; +k->exit = virtio_ccw_exit; +dc->reset = virtio_ccw_reset; +dc->props = virtio_ccw_crypto_properties; +set_bit(DEVICE_CATEGORY_MISC, dc->categories); +} + +static const TypeInfo virtio_ccw_crypto = { +.name = TYPE_VIRTIO_CRYPTO_CCW, +.parent= TYPE_VIRTIO_CCW_DEVICE, +.instance_size = sizeof(VirtIOCryptoCcw), +.instance_init = virtio_ccw_crypto_instance_init, +.class_init= virtio_ccw_crypto_class_init, +}; + static void virtio_ccw_busdev_realize(DeviceState *dev, Error **errp) { VirtioCcwDevice *_dev = (VirtioCcwDevice *)dev; @@ -1736,6 +1796,7 @@ static void virtio_ccw_register(void) #ifdef CONFIG_VHOST_VSOCK type_register_static(&vhost_vsock_ccw_info); #endif +type_register_static(&virtio_ccw_crypto); } type_init(virtio_ccw_register) diff --git a/hw/s390x/virtio-ccw.h b/hw/s390x/virtio-ccw.h index 6a04e94f66..41d4010378 100644 --- a/hw/s390x/virtio-ccw.h +++ b/hw/s390x/virtio-ccw.h @@ -22,6 +22,7 @@ #endif #include 
"hw/virtio/virtio-balloon.h" #include "hw/virtio/virtio-rng.h" +#include "hw/virtio/virtio-crypto.h" #include "hw/virtio/virtio-bus.h" #ifdef CONFIG_VHOST_VSOCK #include "hw/virtio/vhost-vsock.h" @@ -183,6 +184,17 @@ typedef struct VirtIORNGCcw { VirtIORNG vdev; } VirtIORNGCcw; +/* virtio-crypto-ccw */ + +#define TYPE_VIRTIO_CRYPTO_CCW "virtio-crypto-ccw" +#define VIRTIO_CRYPTO_CCW(obj) \ +OBJECT_CHECK(VirtIOCryptoCcw, (obj), TYPE_VIRTIO_CRYPTO_CCW) + +typedef struct VirtIOCryptoCcw { +VirtioCcwDevice parent_obj; +VirtIOCrypto vdev; +} VirtIOCryptoCcw; + VirtIODevice *virtio_ccw_get_vdev(SubchDev *sch); #ifdef CONFIG_VIRTFS -- 2.11.0
[Qemu-devel] [PATCH] egl-helpers: Support newer MESA versions
According to https://www.khronos.org/registry/EGL/extensions/MESA/EGL_MESA_platform_gbm.txt if MESA_platform_gbm is supported display should be initialized from a GBM handle using eglGetPlatformDisplayEXT. Signed-off-by: Frediano Ziglio --- This should fix http://www.spinics.net/linux/fedora/libvir/msg142837.html Tested on Fedora rawhide. --- ui/egl-helpers.c | 4 1 file changed, 4 insertions(+) diff --git a/ui/egl-helpers.c b/ui/egl-helpers.c index cd24568..964c5a5 100644 --- a/ui/egl-helpers.c +++ b/ui/egl-helpers.c @@ -219,7 +219,11 @@ int qemu_egl_init_dpy(EGLNativeDisplayType dpy, bool gles, bool debug) } egl_dbg("eglGetDisplay (dpy %p) ...\n", dpy); +#ifdef EGL_MESA_platform_gbm +qemu_egl_display = eglGetPlatformDisplayEXT(EGL_PLATFORM_GBM_MESA, dpy, NULL); +#else qemu_egl_display = eglGetDisplay(dpy); +#endif if (qemu_egl_display == EGL_NO_DISPLAY) { error_report("egl: eglGetDisplay failed"); return -1; -- 2.9.3
[Qemu-devel] [Bug 1278166] Re: Last commit to exec.c causes BSOD installing WinXP on i386-softmmu
Can you still reproduce this issue with the latest version of QEMU? ** Changed in: qemu Status: New => Incomplete https://bugs.launchpad.net/bugs/1278166 Title: Last commit to exec.c causes BSOD installing WinXP on i386-softmmu Status in QEMU: Incomplete Bug description: The last commit to exec.c (360e607b88a23d378f6efaa769c76d26f538234d) causes a BSOD when trying to install a 32-bit Windows XP SP-3 image using the pure-emulation version of i386-softmmu. A checkout of the previous version of the file (committed in 0169c511554cb0014a00290b0d3d26c31a49818f) solves the problem. Nevertheless, this last commit was intended to solve a BSOD when Xen was used as a hypervisor.
[Qemu-devel] [PULL 19/24] coroutine-lock: make CoMutex thread-safe
From: Paolo Bonzini This uses the lock-free mutex described in the paper '"Blocking without Locking", or LFTHREADS: A lock-free thread library' by Gidenstam and Papatriantafilou. The same technique is used in OSv, and in fact the code is essentially a conversion to C of OSv's code. [Added missing coroutine_fn in tests/test-aio-multithread.c. --Stefan] Signed-off-by: Paolo Bonzini Reviewed-by: Fam Zheng Message-id: 20170213181244.16297-2-pbonz...@redhat.com Signed-off-by: Stefan Hajnoczi --- include/qemu/coroutine.h | 17 - tests/test-aio-multithread.c | 86 util/qemu-coroutine-lock.c | 155 --- util/trace-events| 1 + 4 files changed, 246 insertions(+), 13 deletions(-) diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h index 12584ed..fce228f 100644 --- a/include/qemu/coroutine.h +++ b/include/qemu/coroutine.h @@ -160,10 +160,23 @@ bool qemu_co_queue_empty(CoQueue *queue); /** * Provides a mutex that can be used to synchronise coroutines */ +struct CoWaitRecord; typedef struct CoMutex { -bool locked; +/* Count of pending lockers; 0 for a free mutex, 1 for an + * uncontended mutex. + */ +unsigned locked; + +/* A queue of waiters. Elements are added atomically in front of + * from_push. to_pop is only populated, and popped from, by whoever + * is in charge of the next wakeup. This can be an unlocker or, + * through the handoff protocol, a locker that is about to go to sleep. + */ +QSLIST_HEAD(, CoWaitRecord) from_push, to_pop; + +unsigned handoff, sequence; + Coroutine *holder; -CoQueue queue; } CoMutex; /** diff --git a/tests/test-aio-multithread.c b/tests/test-aio-multithread.c index 534807d..4fa2e9b 100644 --- a/tests/test-aio-multithread.c +++ b/tests/test-aio-multithread.c @@ -196,6 +196,88 @@ static void test_multi_co_schedule_10(void) test_multi_co_schedule(10); } +/* CoMutex thread-safety. 
*/ + +static uint32_t atomic_counter; +static uint32_t running; +static uint32_t counter; +static CoMutex comutex; + +static void coroutine_fn test_multi_co_mutex_entry(void *opaque) +{ +while (!atomic_mb_read(&now_stopping)) { +qemu_co_mutex_lock(&comutex); +counter++; +qemu_co_mutex_unlock(&comutex); + +/* Increase atomic_counter *after* releasing the mutex. Otherwise + * there is a chance (it happens about 1 in 3 runs) that the iothread + * exits before the coroutine is woken up, causing a spurious + * assertion failure. + */ +atomic_inc(&atomic_counter); +} +atomic_dec(&running); +} + +static void test_multi_co_mutex(int threads, int seconds) +{ +int i; + +qemu_co_mutex_init(&comutex); +counter = 0; +atomic_counter = 0; +now_stopping = false; + +create_aio_contexts(); +assert(threads <= NUM_CONTEXTS); +running = threads; +for (i = 0; i < threads; i++) { +Coroutine *co1 = qemu_coroutine_create(test_multi_co_mutex_entry, NULL); +aio_co_schedule(ctx[i], co1); +} + +g_usleep(seconds * 100); + +atomic_mb_set(&now_stopping, true); +while (running > 0) { +g_usleep(10); +} + +join_aio_contexts(); +g_test_message("%d iterations/second\n", counter / seconds); +g_assert_cmpint(counter, ==, atomic_counter); +} + +/* Testing with NUM_CONTEXTS threads focuses on the queue. The mutex however + * is too contended (and the threads spend too much time in aio_poll) + * to actually stress the handoff protocol. + */ +static void test_multi_co_mutex_1(void) +{ +test_multi_co_mutex(NUM_CONTEXTS, 1); +} + +static void test_multi_co_mutex_10(void) +{ +test_multi_co_mutex(NUM_CONTEXTS, 10); +} + +/* Testing with fewer threads stresses the handoff protocol too. Still, the + * case where the locker _can_ pick up a handoff is very rare, happening + * about 10 times in 1 million, so increase the runtime a bit compared to + * other "quick" testcases that only run for 1 second. 
+ */ +static void test_multi_co_mutex_2_3(void) +{ +test_multi_co_mutex(2, 3); +} + +static void test_multi_co_mutex_2_30(void) +{ +test_multi_co_mutex(2, 30); +} + /* End of tests. */ int main(int argc, char **argv) @@ -206,8 +288,12 @@ int main(int argc, char **argv) g_test_add_func("/aio/multi/lifecycle", test_lifecycle); if (g_test_quick()) { g_test_add_func("/aio/multi/schedule", test_multi_co_schedule_1); +g_test_add_func("/aio/multi/mutex/contended", test_multi_co_mutex_1); +g_test_add_func("/aio/multi/mutex/handoff", test_multi_co_mutex_2_3); } else { g_test_add_func("/aio/multi/schedule", test_multi_co_schedule_10); +g_test_add_func("/aio/multi/mutex/contended", test_multi_co_mutex_10); +g_test_add_func("/aio/multi/mutex/handoff", test_multi_co_mutex_2_30); } return g_test_run(); } diff --gi
[Qemu-devel] [PATCH 08/12] virtio-ccw: check flic->adapter_routes_max_batch
From: Halil Pasic Currently VIRTIO_CCW_QUEUE_MAX is defined as ADAPTER_ROUTES_MAX_GSI. That is, when checking queue max we implicitly check the constraint concerning the number of adapter routes. This won't be satisfactory any more (due to backward migration considerations) if ADAPTER_ROUTES_MAX_GSI changes (ADAPTER_ROUTES_MAX_GSI is going to change because we want to support up to VIRTIO_QUEUE_MAX queues per virtio-ccw device). Let us introduce a check on a recently introduced flic property, which gives us the compatibility-machine-aware limit on adapter routes. Signed-off-by: Halil Pasic Signed-off-by: Cornelia Huck --- hw/s390x/virtio-ccw.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c index 771411ea3c..a2ea95947f 100644 --- a/hw/s390x/virtio-ccw.c +++ b/hw/s390x/virtio-ccw.c @@ -1319,6 +1319,7 @@ static void virtio_ccw_device_plugged(DeviceState *d, Error **errp) CcwDevice *ccw_dev = CCW_DEVICE(d); SubchDev *sch = ccw_dev->sch; int n = virtio_get_num_queues(vdev); +S390FLICState *flic = s390_get_flic(); if (!virtio_has_feature(vdev->host_features, VIRTIO_F_VERSION_1)) { dev->max_rev = 0; @@ -1330,6 +1331,12 @@ static void virtio_ccw_device_plugged(DeviceState *d, Error **errp) VIRTIO_CCW_QUEUE_MAX); return; } +if (virtio_get_num_queues(vdev) > flic->adapter_routes_max_batch) { +error_setg(errp, "The number of virtqueues %d " + "exceeds flic adapter route limit %d", n, + flic->adapter_routes_max_batch); +return; +} sch->id.cu_model = virtio_bus_get_vdev_id(&dev->bus); -- 2.11.0
[Qemu-devel] [PATCH 03/12] s390x/flic: fail migration on source already
Current code puts a 'FLIC_FAILED' marker into the migration stream to indicate something went wrong while saving flic state and fails load if it encounters that marker. VMState's put routine recently gained the ability to return error codes (but did not wire it up yet). In order to be able to reap the benefits of returning an error and failing migration on the source already once this gets wired up in core, return an error in addition to storing 'FLIC_FAILED'. Suggested-by: Dr. David Alan Gilbert Signed-off-by: Cornelia Huck Reviewed-by: Jens Freimann Reviewed-by: Christian Borntraeger --- hw/intc/s390_flic_kvm.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/hw/intc/s390_flic_kvm.c b/hw/intc/s390_flic_kvm.c index e86a84e49a..cc44bc4e1e 100644 --- a/hw/intc/s390_flic_kvm.c +++ b/hw/intc/s390_flic_kvm.c @@ -293,6 +293,7 @@ static int kvm_flic_save(QEMUFile *f, void *opaque, size_t size, int len = FLIC_SAVE_INITIAL_SIZE; void *buf; int count; +int r = 0; flic_disable_wait_pfault((struct KVMS390FLICState *) opaque); @@ -303,7 +304,7 @@ static int kvm_flic_save(QEMUFile *f, void *opaque, size_t size, * migration state */ error_report("flic: couldn't allocate memory"); qemu_put_be64(f, FLIC_FAILED); -return 0; +return -ENOMEM; } count = __get_all_irqs(flic, &buf, len); @@ -314,6 +315,7 @@ static int kvm_flic_save(QEMUFile *f, void *opaque, size_t size, * target system to fail when attempting to load irqs from the * migration state */ qemu_put_be64(f, FLIC_FAILED); +r = count; } else { qemu_put_be64(f, count); qemu_put_buffer(f, (uint8_t *) buf, @@ -321,7 +323,7 @@ static int kvm_flic_save(QEMUFile *f, void *opaque, size_t size, } g_free(buf); -return 0; +return r; } /** -- 2.11.0
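The pattern here — keep emitting the FLIC_FAILED sentinel so old destinations still detect the failure, while also returning a negative errno so the source can abort once core wires that up — can be modeled standalone like this (illustrative only; the out-parameter stands in for the QEMUFile stream, and the names are invented):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define FAILED_MARKER UINT64_MAX   /* stand-in for FLIC_FAILED */

/* Save either a count or the failure sentinel into *out.  Returns 0 on
 * success and a negative errno on failure, so a new source can fail the
 * migration even though the sentinel is still written for old targets. */
static int toy_save(int64_t count, uint64_t *out)
{
    if (count < 0) {
        *out = FAILED_MARKER;      /* old destinations still detect this */
        return (int)count;         /* new sources can abort right away */
    }
    *out = (uint64_t)count;
    return 0;
}

/* Load side: the sentinel alone is enough to fail the incoming side. */
static int toy_load(uint64_t v)
{
    return v == FAILED_MARKER ? -EINVAL : 0;
}
```

Carrying both signals is what keeps the stream format backward compatible: dropping the sentinel in favour of the return code alone would break loading on destinations that predate the VMState put-error plumbing.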
Re: [Qemu-devel] [PATCH v9 1/2] block/vxhs.c: Add support for a new block device type called "vxhs"
On Sun, Feb 19, 2017 at 02:30:53PM -0800, Ashish Mittal wrote: > Source code for the qnio library that this code loads can be downloaded from: > https://github.com/VeritasHyperScale/libqnio.git > > Sample command line using JSON syntax: > ./x86_64-softmmu/qemu-system-x86_64 -name instance-0008 -S -vnc 0.0.0.0:0 > -k en-us -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 > -msg timestamp=on > 'json:{"driver":"vxhs","vdisk-id":"c3e9095a-a5ee-4dce-afeb-2a59fb387410", > "server":{"host":"172.172.17.4","port":""}}' > > Sample command line using URI syntax: > qemu-img convert -f raw -O raw -n > /var/lib/nova/instances/_base/0c5eacd5ebea5ed914b6a3e7b18f1ce734c386ad > vxhs://192.168.0.1:/c6718f6b-0401-441d-a8c3-1f0064d75ee0 > > Signed-off-by: Ashish Mittal > --- > > v9 changelog: > (1) Fixes for all the review comments from v8. I have left the definition > of VXHS_UUID_DEF unchanged pending a better suggestion. > (2) qcow2 tests now pass on the vxhs test server. > (3) Packaging changes for libvxhs will be checked in to the git repo soon. > (4) I have not moved extern QemuUUID qemu_uuid to a separate header file. > > v8 changelog: > (1) Security implementation for libqnio present in branch 'securify'. > Please use 'securify' branch for building libqnio and testing > with this patch. > (2) Renamed libqnio to libvxhs. > (3) Pass instance ID to libvxhs for SSL authentication. > > v7 changelog: > (1) IO failover code has moved out to the libqnio library. > (2) Fixes for issues reported by Stefan on v6. > (3) Incorporated the QEMUBH patch provided by Stefan. > This is a replacement for the pipe mechanism used earlier. > (4) Fixes to the buffer overflows reported in libqnio. > (5) Input validations in vxhs.c to prevent any buffer overflows for > arguments passed to libqnio. > > v6 changelog: > (1) Added qemu-iotests for VxHS as a new patch in the series. > (2) Replaced release version from 2.8 to 2.9 in block-core.json. 
> > v5 changelog: > (1) Incorporated v4 review comments. > > v4 changelog: > (1) Incorporated v3 review comments on QAPI changes. > (2) Added refcounting for device open/close. > Free library resources on last device close. > > v3 changelog: > (1) Added QAPI schema for the VxHS driver. > > v2 changelog: > (1) Changes done in response to v1 comments. > > block/Makefile.objs | 2 + > block/trace-events | 16 ++ > block/vxhs.c | 527 > +++ > configure| 40 > qapi/block-core.json | 20 +- > 5 files changed, 603 insertions(+), 2 deletions(-) > create mode 100644 block/vxhs.c > > diff --git a/block/vxhs.c b/block/vxhs.c > new file mode 100644 > index 000..4f0633e > --- /dev/null > +++ b/block/vxhs.c > @@ -0,0 +1,527 @@ > +/* > + * QEMU Block driver for Veritas HyperScale (VxHS) > + * > + * This work is licensed under the terms of the GNU GPL, version 2 or later. > + * See the COPYING file in the top-level directory. > + * > + */ > + > +#include "qemu/osdep.h" > +#include > +#include > +#include "block/block_int.h" > +#include "qapi/qmp/qerror.h" > +#include "qapi/qmp/qdict.h" > +#include "qapi/qmp/qstring.h" > +#include "trace.h" > +#include "qemu/uri.h" > +#include "qapi/error.h" > +#include "qemu/uuid.h" > + > +#define VXHS_OPT_FILENAME "filename" > +#define VXHS_OPT_VDISK_ID "vdisk-id" > +#define VXHS_OPT_SERVER "server" > +#define VXHS_OPT_HOST "host" > +#define VXHS_OPT_PORT "port" > +#define VXHS_UUID_DEF "12345678-1234-1234-1234-123456789012" Hardcoding a default UUID like this is really dubious. 
If the qemu_uuid is unset, and a UUID is required, then it should simply report an error.= > +QemuUUID qemu_uuid __attribute__ ((weak)); This is already defined in include/sysemu/system.h > + > +static uint32_t vxhs_ref; > + > +typedef enum { > +VDISK_AIO_READ, > +VDISK_AIO_WRITE, > +} VDISKAIOCmd; > + > +/* > + * HyperScale AIO callbacks structure > + */ > +typedef struct VXHSAIOCB { > +BlockAIOCB common; > +int err; > +QEMUIOVector *qiov; > +} VXHSAIOCB; > + > +typedef struct VXHSvDiskHostsInfo { > +void *dev_handle; /* Device handle */ > +char *host; /* Host name or IP */ > +int port; /* Host's port number */ > +} VXHSvDiskHostsInfo; > + > +/* > + * Structure per vDisk maintained for state > + */ > +typedef struct BDRVVXHSState { > +VXHSvDiskHostsInfo vdisk_hostinfo; /* Per host info */ > +char *vdisk_guid; > +} BDRVVXHSState; > + > +static void vxhs_complete_aio_bh(void *opaque) > +{ > +VXHSAIOCB *acb = opaque; > +BlockCompletionFunc *cb = acb->common.cb; > +void *cb_opaque = acb->common.opaque; > +int ret = 0; > + > +if (acb->err != 0) { > +trace_vxhs_complete_aio(acb, acb->err); > +ret = (-EIO); > +} > + > +qemu_aio_unref(acb); > +cb(cb_opaque, ret); > +} > + > +/* > + * Called from a libqnio thre
Re: [Qemu-devel] [PATCH v8 1/2] block/vxhs.c: Add support for a new block device type called "vxhs"
On Sat, Feb 18, 2017 at 12:30:31AM +, Ketan Nilangekar wrote: > On 2/17/17, 1:42 PM, "Jeff Cody" wrote: > > On Thu, Feb 16, 2017 at 02:24:19PM -0800, ashish mittal wrote: > > Hi, > > > > I am getting the following error with checkpatch.pl > > > > ERROR: externs should be avoided in .c files > > #78: FILE: block/vxhs.c:28: > > +QemuUUID qemu_uuid __attribute__ ((weak)); > > > > Is there any way to get around this, or does it mean that I would have > > to add a vxhs.h just for this one entry? > > > > I remain skeptical on the use of the qemu_uuid as a way to select the TLS > cert. > > [ketan] > Is there another identity that can be used for uniquely identifying instances? > The requirement was to enforce vdisk access to owner instances. The UUID is a bad way to do any kind of access control as QEMU could simply lie about its UUID. If the server needs to identify the client to do access control you need something non-spoofable. In the absence of having an authentication protocol built into the libqnio protocol, the best you could do would be to use the TLS client certificate distinguished name. QEMU can't lie about that without having access to the other certificate file - which would be blocked by SELinux Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o-http://search.cpan.org/~danberr/ :|
[Qemu-devel] [PATCH 00/12] more s390x patches for 2.9
A selection of s390x patches: - cleanups and improvements - program check loop detection (useful with the corresponding kernel patch) - wire up virtio-crypto for ccw - and finally support many virtqueues for virtio-ccw Christian Borntraeger (3): s390x/kvm: detect some program check loops s390x/arch_dump: use proper note name and note size s390x/arch_dump: pass cpuid into notes sections Cornelia Huck (2): s390x/s390-virtio: get rid of DPRINTF s390x/flic: fail migration on source already Halil Pasic (7): virtio-ccw: handle virtio 1 only devices virtio-ccw: add virtio-crypto-ccw device virtio-ccw: Check the number of vqs in CCW_CMD_SET_IND s390x: add property adapter_routes_max_batch virtio-ccw: check flic->adapter_routes_max_batch s390x: bump ADAPTER_ROUTES_MAX_GSI virtio-ccw: support VIRTIO_QUEUE_MAX virtqueues hw/intc/s390_flic.c | 28 +++ hw/intc/s390_flic_kvm.c | 6 ++- hw/s390x/s390-virtio-ccw.c | 9 +++- hw/s390x/s390-virtio.c | 10 hw/s390x/virtio-ccw.c| 109 +++ hw/s390x/virtio-ccw.h| 13 ++ include/hw/s390x/s390_flic.h | 11 - target/s390x/arch_dump.c | 66 +++--- target/s390x/kvm.c | 43 +++-- 9 files changed, 241 insertions(+), 54 deletions(-) -- 2.11.0
Re: [Qemu-devel] [PATCH v8 2/5] hw/intc/arm_gicv3_kvm: Add ICC_SRE_EL1 register to vmstate
On 20 February 2017 at 06:21, Vijay Kilari wrote:
> Hi Peter,
>
> On Fri, Feb 17, 2017 at 7:25 PM, Peter Maydell wrote:

[on the guest-visible ICC_SRE_EL1 value]

>> Is there a situation where KVM might allow a value other
>> than 0x7?
>
> In KVM, the SRE_EL1 value is 0x1. During save, value
> read from KVM is 0x1 though we reset to 0x7.

0x1 means "System Register Interface enabled, IRQ bypass enabled, FIQ bypass enabled". This seems rather a weird setting, because it means "the GICv3 CPU interface functionality is disabled and the GICv3 should signal interrupts via legacy IRQ and FIQ".

Does KVM really support IRQ/FIQ bypass, and does Linux really leave it enabled rather than turning it off by writing those bits to 1? My expectation was that the KVM GICv3 emulation would make these bits RAO/WI like the TCG implementation. Is there maybe a bug in the kernel side where it doesn't implement bypass but has made these bits be RAZ/WI rather than RAO/WI?

thanks
-- PMM
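For reference, ICC_SRE_EL1's low bits are SRE (bit 0, system register interface enable), DFB (bit 1, disable FIQ bypass) and DIB (bit 2, disable IRQ bypass) per the GICv3 architecture, so 0x1 leaves both bypass paths enabled while 0x7 disables them. A minimal decoder sketch (the function name is invented for illustration):

```c
#include <stdint.h>
#include <stdbool.h>

/* ICC_SRE_EL1 bit layout, per the GICv3 architecture specification:
 *   bit 0  SRE -- system register interface enable
 *   bit 1  DFB -- disable FIQ bypass
 *   bit 2  DIB -- disable IRQ bypass
 * 0x7 = sysregs on, both bypass paths disabled (the RAO/WI behaviour);
 * 0x1 = sysregs on, but legacy IRQ/FIQ bypass still enabled. */
static bool bypass_enabled(uint64_t icc_sre_el1)
{
    bool dfb = icc_sre_el1 & (1u << 1);
    bool dib = icc_sre_el1 & (1u << 2);
    return !dfb || !dib;   /* some bypass path is still live */
}
```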
[Qemu-devel] [PATCH 07/12] s390x: add property adapter_routes_max_batch
From: Halil Pasic

To make virtio-ccw support more than 64 virtqueues we will have to increase ADAPTER_ROUTES_MAX_GSI, which currently limits the number of possible adapter routes. Of course, increasing the number of supported routes can break backward migration.

Let us introduce a compatibility property adapter_routes_max_batch so client code can use the same old limit when in compatibility mode and retain migration compatibility.

Signed-off-by: Halil Pasic
Signed-off-by: Cornelia Huck
---
 hw/intc/s390_flic.c          | 28 ++++++++++++++++++++++++++++
 include/hw/s390x/s390_flic.h |  2 ++
 2 files changed, 30 insertions(+)

diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c
index 6ab29efc65..bef4caf980 100644
--- a/hw/intc/s390_flic.c
+++ b/hw/intc/s390_flic.c
@@ -16,6 +16,8 @@
 #include "migration/qemu-file.h"
 #include "hw/s390x/s390_flic.h"
 #include "trace.h"
+#include "hw/qdev.h"
+#include "qapi/error.h"
 
 S390FLICState *s390_get_flic(void)
 {
@@ -85,6 +87,30 @@ static void qemu_s390_flic_class_init(ObjectClass *oc, void *data)
     fsc->clear_io_irq = qemu_s390_clear_io_flic;
 }
 
+static Property s390_flic_common_properties[] = {
+    DEFINE_PROP_UINT32("adapter_routes_max_batch", S390FLICState,
+                       adapter_routes_max_batch, ADAPTER_ROUTES_MAX_GSI),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void s390_flic_common_realize(DeviceState *dev, Error **errp)
+{
+    uint32_t max_batch = S390_FLIC_COMMON(dev)->adapter_routes_max_batch;
+
+    if (max_batch > ADAPTER_ROUTES_MAX_GSI) {
+        error_setg(errp, "flic adapter_routes_max_batch too big"
+                   "%d (%d allowed)", max_batch, ADAPTER_ROUTES_MAX_GSI);
+    }
+}
+
+static void s390_flic_class_init(ObjectClass *oc, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(oc);
+
+    dc->props = s390_flic_common_properties;
+    dc->realize = s390_flic_common_realize;
+}
+
 static const TypeInfo qemu_s390_flic_info = {
     .name          = TYPE_QEMU_S390_FLIC,
     .parent        = TYPE_S390_FLIC_COMMON,
     .class_init    = qemu_s390_flic_class_init,
 };
 
+
 static const TypeInfo s390_flic_common_info = {
     .name          = TYPE_S390_FLIC_COMMON,
     .parent        = TYPE_SYS_BUS_DEVICE,
     .instance_size = sizeof(S390FLICState),
+    .class_init    = s390_flic_class_init,
     .class_size    = sizeof(S390FLICStateClass),
 };

diff --git a/include/hw/s390x/s390_flic.h b/include/hw/s390x/s390_flic.h
index 9094edadf5..9f0b05c71b 100644
--- a/include/hw/s390x/s390_flic.h
+++ b/include/hw/s390x/s390_flic.h
@@ -32,6 +32,8 @@ typedef struct AdapterRoutes {
 
 typedef struct S390FLICState {
     SysBusDevice parent_obj;
+    /* to limit AdapterRoutes.num_routes for compat */
+    uint32_t adapter_routes_max_batch;
 } S390FLICState;
-- 
2.11.0
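The realize check above only rejects values larger than the built-in maximum, so a compat machine may pin a smaller limit but can never raise it. A one-function sketch of that rule (the constant is assumed to be the pre-bump value of 64; the function name is invented):

```c
#include <stdint.h>
#include <stdbool.h>

#define ADAPTER_ROUTES_MAX_GSI 64   /* assumed pre-bump value */

/* Mirror of the s390_flic_common_realize() check: the compat property
 * may only lower the adapter route limit, never exceed the built-in
 * maximum of the running QEMU. */
static bool validate_max_batch(uint32_t max_batch)
{
    return max_batch <= ADAPTER_ROUTES_MAX_GSI;
}
```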
[Qemu-devel] [PATCH v1 03/10] target/ppc: move subf logic block
Signed-off-by: Nikunj A Dadhania
---
 target/ppc/translate.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 2a2d071..77045be 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -1389,17 +1389,19 @@ static inline void gen_op_arith_subf(DisasContext *ctx, TCGv ret, TCGv arg1,
             tcg_temp_free(t1);
             tcg_gen_shri_tl(cpu_ca, cpu_ca, 32);    /* extract bit 32 */
             tcg_gen_andi_tl(cpu_ca, cpu_ca, 1);
-        } else if (add_ca) {
-            TCGv zero, inv1 = tcg_temp_new();
-            tcg_gen_not_tl(inv1, arg1);
-            zero = tcg_const_tl(0);
-            tcg_gen_add2_tl(t0, cpu_ca, arg2, zero, cpu_ca, zero);
-            tcg_gen_add2_tl(t0, cpu_ca, t0, cpu_ca, inv1, zero);
-            tcg_temp_free(zero);
-            tcg_temp_free(inv1);
         } else {
-            tcg_gen_setcond_tl(TCG_COND_GEU, cpu_ca, arg2, arg1);
-            tcg_gen_sub_tl(t0, arg2, arg1);
+            if (add_ca) {
+                TCGv zero, inv1 = tcg_temp_new();
+                tcg_gen_not_tl(inv1, arg1);
+                zero = tcg_const_tl(0);
+                tcg_gen_add2_tl(t0, cpu_ca, arg2, zero, cpu_ca, zero);
+                tcg_gen_add2_tl(t0, cpu_ca, t0, cpu_ca, inv1, zero);
+                tcg_temp_free(zero);
+                tcg_temp_free(inv1);
+            } else {
+                tcg_gen_setcond_tl(TCG_COND_GEU, cpu_ca, arg2, arg1);
+                tcg_gen_sub_tl(t0, arg2, arg1);
+            }
         }
     } else if (add_ca) {
         /* Since we're ignoring carry-out, we can simplify the
-- 
2.7.4
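The moved block implements subtraction through the identity arg2 - arg1 = ~arg1 + arg2 + 1: CA is set via an unsigned >= compare (TCG_COND_GEU) in the no-carry-in case, and via two chained add2 operations when add_ca is set. A plain-C sketch of both cases (helper names invented for illustration, not QEMU code):

```c
#include <stdint.h>

/* subfc-style: PPC defines CA for subtraction as the carry out of
 * arg2 + ~arg1 + 1, which is 1 exactly when arg2 >= arg1 (unsigned)
 * -- the TCG_COND_GEU setcond in the translated code. */
static uint64_t subf_carry(uint64_t arg1, uint64_t arg2)
{
    return arg2 >= arg1;
}

/* subfe-style: with carry-in, result = ~arg1 + arg2 + ca, computed as
 * two chained adds exactly like the add2 sequence in the patch. */
static uint64_t subf_with_ca(uint64_t arg1, uint64_t arg2, uint64_t ca,
                             uint64_t *ca_out)
{
    uint64_t inv1 = ~arg1;
    uint64_t t = arg2 + ca;
    uint64_t c1 = t < arg2;    /* carry from the first add */
    uint64_t res = t + inv1;
    uint64_t c2 = res < t;     /* carry from the second add */
    *ca_out = c1 | c2;         /* at most one of the adds can carry */
    return res;
}
```

With ca = 1 this reduces to the plain subtraction, which is why subfc can take the cheaper setcond path.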
[Qemu-devel] [PATCH 01/12] s390x/s390-virtio: get rid of DPRINTF
The DPRINTF approach is likely to introduce bitrot, and the preferred way for debugging is tracing anyway. Fortunately, there are no users (left), so nuke it.

Signed-off-by: Cornelia Huck
Reviewed-by: Halil Pasic
---
 hw/s390x/s390-virtio.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/hw/s390x/s390-virtio.c b/hw/s390x/s390-virtio.c
index 7a3a7fe5fd..9cfb09057e 100644
--- a/hw/s390x/s390-virtio.c
+++ b/hw/s390x/s390-virtio.c
@@ -44,16 +44,6 @@
 #include "hw/s390x/ipl.h"
 #include "cpu.h"
 
-//#define DEBUG_S390
-
-#ifdef DEBUG_S390
-#define DPRINTF(fmt, ...) \
-    do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
-#else
-#define DPRINTF(fmt, ...) \
-    do { } while (0)
-#endif
-
 #define MAX_BLK_DEVS    10
 
 #define S390_TOD_CLOCK_VALUE_MISSING    0x00
-- 
2.11.0
[Qemu-devel] [PATCH 11/12] s390x/arch_dump: use proper note name and note size
From: Christian Borntraeger In binutils/libbfd (bfd/elf.c) it is enforced that all s390 specific ELF notes like e.g. NT_S390_PREFIX or NT_S390_CTRS have "LINUX" specified as note name and that the namesz is 6. Otherwise the notes are ignored. QEMU currently uses "CORE" for these notes. Up to now this has not been a real problem because the dump analysis tool "crash" does handle that. But it will break all programs that use libbfd for processing ELF notes. So fix this and use "LINUX" for all s390 specific notes to comply with libbfd. Also set the correct namesz. Reported-by: Philipp Rudo Signed-off-by: Christian Borntraeger Signed-off-by: Cornelia Huck --- target/s390x/arch_dump.c | 43 --- 1 file changed, 28 insertions(+), 15 deletions(-) diff --git a/target/s390x/arch_dump.c b/target/s390x/arch_dump.c index 4731869f6b..887cae947e 100644 --- a/target/s390x/arch_dump.c +++ b/target/s390x/arch_dump.c @@ -59,8 +59,7 @@ typedef struct S390xElfVregsHiStruct S390xElfVregsHi; typedef struct noteStruct { Elf64_Nhdr hdr; -char name[5]; -char pad3[3]; +char name[8]; union { S390xElfPrstatus prstatus; S390xElfFpregset fpregset; @@ -162,13 +161,19 @@ static void s390x_write_elf64_prefix(Note *note, S390CPU *cpu) } -static const struct NoteFuncDescStruct { +typedef struct NoteFuncDescStruct { int contents_size; void (*note_contents_func)(Note *note, S390CPU *cpu); -} note_func[] = { +} NoteFuncDesc; + +static const NoteFuncDesc note_core[] = { {sizeof(((Note *)0)->contents.prstatus), s390x_write_elf64_prstatus}, -{sizeof(((Note *)0)->contents.prefix), s390x_write_elf64_prefix}, {sizeof(((Note *)0)->contents.fpregset), s390x_write_elf64_fpregset}, +{ 0, NULL} +}; + +static const NoteFuncDesc note_linux[] = { +{sizeof(((Note *)0)->contents.prefix), s390x_write_elf64_prefix}, {sizeof(((Note *)0)->contents.ctrs), s390x_write_elf64_ctrs}, {sizeof(((Note *)0)->contents.timer),s390x_write_elf64_timer}, {sizeof(((Note *)0)->contents.todcmp), s390x_write_elf64_todcmp}, @@ -178,22 +183,20 
@@ static const struct NoteFuncDescStruct { { 0, NULL} }; -typedef struct NoteFuncDescStruct NoteFuncDesc; - - -static int s390x_write_all_elf64_notes(const char *note_name, +static int s390x_write_elf64_notes(const char *note_name, WriteCoreDumpFunction f, S390CPU *cpu, int id, - void *opaque) + void *opaque, + const NoteFuncDesc *funcs) { Note note; const NoteFuncDesc *nf; int note_size; int ret = -1; -for (nf = note_func; nf->note_contents_func; nf++) { +for (nf = funcs; nf->note_contents_func; nf++) { memset(¬e, 0, sizeof(note)); -note.hdr.n_namesz = cpu_to_be32(sizeof(note.name)); +note.hdr.n_namesz = cpu_to_be32(strlen(note_name) + 1); note.hdr.n_descsz = cpu_to_be32(nf->contents_size); strncpy(note.name, note_name, sizeof(note.name)); (*nf->note_contents_func)(¬e, cpu); @@ -215,7 +218,13 @@ int s390_cpu_write_elf64_note(WriteCoreDumpFunction f, CPUState *cs, int cpuid, void *opaque) { S390CPU *cpu = S390_CPU(cs); -return s390x_write_all_elf64_notes("CORE", f, cpu, cpuid, opaque); +int r; + +r = s390x_write_elf64_notes("CORE", f, cpu, cpuid, opaque, note_core); +if (r) { +return r; +} +return s390x_write_elf64_notes("LINUX", f, cpu, cpuid, opaque, note_linux); } int cpu_get_dump_info(ArchDumpInfo *info, @@ -230,7 +239,7 @@ int cpu_get_dump_info(ArchDumpInfo *info, ssize_t cpu_get_note_size(int class, int machine, int nr_cpus) { -int name_size = 8; /* "CORE" or "QEMU" rounded */ +int name_size = 8; /* "LINUX" or "CORE" + pad */ size_t elf_note_size = 0; int note_head_size; const NoteFuncDesc *nf; @@ -240,7 +249,11 @@ ssize_t cpu_get_note_size(int class, int machine, int nr_cpus) note_head_size = sizeof(Elf64_Nhdr); -for (nf = note_func; nf->note_contents_func; nf++) { +for (nf = note_core; nf->note_contents_func; nf++) { +elf_note_size = elf_note_size + note_head_size + name_size + +nf->contents_size; +} +for (nf = note_linux; nf->note_contents_func; nf++) { elf_note_size = elf_note_size + note_head_size + name_size + nf->contents_size; } -- 2.11.0
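The namesz fix matters because libbfd checks the exact value: strlen("LINUX") + 1 == 6. In the note file image, both the name and the descriptor are padded to 4-byte alignment, which is why "CORE" (namesz 5) and "LINUX" (namesz 6) occupy the same space on disk. A sketch of the size computation (a local struct stands in for Elf64_Nhdr so the example is self-contained):

```c
#include <stdint.h>
#include <string.h>

/* Same layout as Elf64_Nhdr from <elf.h>: three 32-bit words. */
typedef struct {
    uint32_t n_namesz, n_descsz, n_type;
} Elf64_NhdrLocal;

/* File-image size of one ELF note: header, then name and descriptor
 * each padded up to 4-byte alignment. */
static size_t note_file_size(const char *name, size_t descsz)
{
    size_t namesz = strlen(name) + 1;   /* includes the NUL, per spec */
    return sizeof(Elf64_NhdrLocal)
           + ((namesz + 3) & ~(size_t)3)
           + ((descsz + 3) & ~(size_t)3);
}
```

Both "CORE" and "LINUX" pad out to 8 name bytes, matching the fixed name_size of 8 kept in cpu_get_note_size().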
[Qemu-devel] [PATCH 02/12] s390x/kvm: detect some program check loops
From: Christian Borntraeger

Sometimes (e.g. during early boot) a guest is broken in such a way that it loops 100% of the time delivering operation exceptions (illegal operation), but the pgm new PSW is not set properly. This will result in code being read from address zero, which usually contains another illegal op. Let's detect this case and put the guest into the crashed state.

Instead of only detecting this for address zero, apply a heuristic that will work for any program check new PSW, so that the crashed state is also reached if you provide some random ELF file to the -kernel option. We do not want guest problem state to be able to trigger a guest panic, e.g. by faulting on an address that is the same as the program check new PSW, so we check for the problem state bit being off.

With this we:
a) get rid of the CPU consumption of such broken guests
b) keep the program old PSW, which allows finding out the original illegal operation - making debugging such early boot issues much easier than with single stepping

This relies on the kernel using a similar heuristic and passing such operation exceptions to user space.
Signed-off-by: Christian Borntraeger Signed-off-by: Cornelia Huck --- target/s390x/kvm.c | 43 --- 1 file changed, 40 insertions(+), 3 deletions(-) diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c index 25367807f4..5ec050cf89 100644 --- a/target/s390x/kvm.c +++ b/target/s390x/kvm.c @@ -1867,6 +1867,40 @@ static void unmanageable_intercept(S390CPU *cpu, const char *str, int pswoffset) qemu_system_guest_panicked(NULL); } +/* try to detect pgm check loops */ +static int handle_oper_loop(S390CPU *cpu, struct kvm_run *run) +{ +CPUState *cs = CPU(cpu); +PSW oldpsw, newpsw; + +cpu_synchronize_state(cs); +newpsw.mask = ldq_phys(cs->as, cpu->env.psa + + offsetof(LowCore, program_new_psw)); +newpsw.addr = ldq_phys(cs->as, cpu->env.psa + + offsetof(LowCore, program_new_psw) + 8); +oldpsw.mask = run->psw_mask; +oldpsw.addr = run->psw_addr; +/* + * Avoid endless loops of operation exceptions, if the pgm new + * PSW will cause a new operation exception. + * The heuristic checks if the pgm new psw is within 6 bytes before + * the faulting psw address (with same DAT, AS settings) and the + * new psw is not a wait psw and the fault was not triggered by + * problem state. In that case go into crashed state. 
+ */ + +if (oldpsw.addr - newpsw.addr <= 6 && +!(newpsw.mask & PSW_MASK_WAIT) && +!(oldpsw.mask & PSW_MASK_PSTATE) && +(newpsw.mask & PSW_MASK_ASC) == (oldpsw.mask & PSW_MASK_ASC) && +(newpsw.mask & PSW_MASK_DAT) == (oldpsw.mask & PSW_MASK_DAT)) { +unmanageable_intercept(cpu, "operation exception loop", + offsetof(LowCore, program_new_psw)); +return EXCP_HALTED; +} +return 0; +} + static int handle_intercept(S390CPU *cpu) { CPUState *cs = CPU(cpu); @@ -1914,11 +1948,14 @@ static int handle_intercept(S390CPU *cpu) r = EXCP_HALTED; break; case ICPT_OPEREXC: -/* currently only instr 0x after enabled via capability */ +/* check for break points */ r = handle_sw_breakpoint(cpu, run); if (r == -ENOENT) { -enter_pgmcheck(cpu, PGM_OPERATION); -r = 0; +/* Then check for potential pgm check loops */ +r = handle_oper_loop(cpu, run); +if (r == 0) { +enter_pgmcheck(cpu, PGM_OPERATION); +} } break; case ICPT_SOFT_INTERCEPT: -- 2.11.0
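The heuristic can be expressed as a standalone predicate. The PSW mask constants below are assumed values mirroring QEMU's target/s390x definitions (PSW bits 5, 14, 15 and 16-17 of the 64-bit mask), and the function name is invented:

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed values matching QEMU's s390x PSW mask bits. */
#define PSW_MASK_DAT    0x0400000000000000ULL
#define PSW_MASK_WAIT   0x0002000000000000ULL
#define PSW_MASK_PSTATE 0x0001000000000000ULL
#define PSW_MASK_ASC    0x0000C00000000000ULL

typedef struct { uint64_t mask, addr; } PSW;

/* True when delivering the operation exception would immediately
 * re-execute the faulting area.  Note the unsigned subtraction:
 * oldpsw.addr - newpsw.addr <= 6 only matches when the new PSW points
 * at most 6 bytes *before* the faulting address; a new PSW pointing
 * after it wraps to a huge value and never matches. */
static bool looks_like_oper_loop(PSW oldpsw, PSW newpsw)
{
    return oldpsw.addr - newpsw.addr <= 6 &&
           !(newpsw.mask & PSW_MASK_WAIT) &&
           !(oldpsw.mask & PSW_MASK_PSTATE) &&
           (newpsw.mask & PSW_MASK_ASC) == (oldpsw.mask & PSW_MASK_ASC) &&
           (newpsw.mask & PSW_MASK_DAT) == (oldpsw.mask & PSW_MASK_DAT);
}
```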
[Qemu-devel] [PATCH 04/12] virtio-ccw: handle virtio 1 only devices
From: Halil Pasic As a preparation for wiring-up virtio-crypto, the first non-transitional virtio device on the ccw transport, let us introduce a mechanism for disabling revision 0. This is more or less equivalent with disabling legacy as revision 0 is legacy only, and legacy drivers use the revision 0 exclusively. Signed-off-by: Halil Pasic Signed-off-by: Cornelia Huck --- hw/s390x/virtio-ccw.c | 18 +- hw/s390x/virtio-ccw.h | 1 + 2 files changed, 18 insertions(+), 1 deletion(-) diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c index 63c46373fb..32c414f23d 100644 --- a/hw/s390x/virtio-ccw.c +++ b/hw/s390x/virtio-ccw.c @@ -280,6 +280,15 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw) ccw.cmd_code); check_len = !((ccw.flags & CCW_FLAG_SLI) && !(ccw.flags & CCW_FLAG_DC)); +if (dev->force_revision_1 && dev->revision < 0 && +ccw.cmd_code != CCW_CMD_SET_VIRTIO_REV) { +/* + * virtio-1 drivers must start with negotiating to a revision >= 1, + * so post a command reject for all other commands + */ +return -ENOSYS; +} + /* Look at the command. */ switch (ccw.cmd_code) { case CCW_CMD_SET_VQ: @@ -638,7 +647,8 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw) * need to fetch it here. Nothing to do for now, though. 
*/ if (dev->revision >= 0 || -revinfo.revision > virtio_ccw_rev_max(dev)) { +revinfo.revision > virtio_ccw_rev_max(dev) || +(dev->force_revision_1 && !revinfo.revision)) { ret = -ENOSYS; break; } @@ -669,6 +679,12 @@ static void virtio_ccw_device_realize(VirtioCcwDevice *dev, Error **errp) if (!sch) { return; } +if (!virtio_ccw_rev_max(dev) && dev->force_revision_1) { +error_setg(&err, "Invalid value of property max_rev " + "(is %d expected >= 1)", virtio_ccw_rev_max(dev)); +error_propagate(errp, err); +return; +} sch->driver_data = dev; sch->ccw_cb = virtio_ccw_cb; diff --git a/hw/s390x/virtio-ccw.h b/hw/s390x/virtio-ccw.h index 77d10f1671..6a04e94f66 100644 --- a/hw/s390x/virtio-ccw.h +++ b/hw/s390x/virtio-ccw.h @@ -94,6 +94,7 @@ struct VirtioCcwDevice { IndAddr *indicators2; IndAddr *summary_indicator; uint64_t ind_bit; +bool force_revision_1; }; /* The maximum virtio revision we support. */ -- 2.11.0
[Qemu-devel] [PATCH v1 01/10] target/ppc: support for 32-bit carry and overflow
POWER ISA 3.0 adds CA32 and OV32 status flags in 64-bit mode. Add the flags and corresponding defines. Moreover, CA32 is updated whenever CA is updated, and OV32 is updated whenever OV is updated.

Arithmetic instructions:

* Addition and subtraction: addic, addic., subfic, addc, subfc, adde, subfe, addme, subfme, addze and subfze always update CA and CA32.
  => CA reflects the carry out of bit 0 in 64-bit mode and out of bit 32 in 32-bit mode.
  => CA32 reflects the carry out of bit 32 independent of the mode.
  => SO and OV reflect overflow of the 64-bit result in 64-bit mode and overflow of the low-order 32-bit result in 32-bit mode.
  => OV32 reflects overflow of the low-order 32 bits independent of the mode.

* Multiply low and divide:
  For mulld, divd, divde, divdu and divdeu: SO, OV and OV32 reflect overflow of the 64-bit result.
  For mullw, divw, divwe, divwu and divweu: SO, OV and OV32 reflect overflow of the 32-bit result.

* Negate with OE=1 (nego):
  In 64-bit mode, if register RA contains 0x8000_0000_0000_0000, OV and OV32 are set to 1.
  In 32-bit mode, if register RA contains 0x8000_0000, OV and OV32 are set to 1.
Signed-off-by: Nikunj A Dadhania --- target/ppc/cpu.h | 30 ++ target/ppc/translate.c | 17 - 2 files changed, 46 insertions(+), 1 deletion(-) diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h index 425e79d..ef392f0 100644 --- a/target/ppc/cpu.h +++ b/target/ppc/cpu.h @@ -965,6 +965,8 @@ struct CPUPPCState { target_ulong so; target_ulong ov; target_ulong ca; +target_ulong ov32; +target_ulong ca32; /* Reservation address */ target_ulong reserve_addr; /* Reservation value */ @@ -1372,11 +1374,15 @@ int ppc_compat_max_threads(PowerPCCPU *cpu); #define XER_SO 31 #define XER_OV 30 #define XER_CA 29 +#define XER_OV32 19 +#define XER_CA32 18 #define XER_CMP 8 #define XER_BC 0 #define xer_so (env->so) #define xer_ov (env->ov) #define xer_ca (env->ca) +#define xer_ov32 (env->ov) +#define xer_ca32 (env->ca) #define xer_cmp ((env->xer >> XER_CMP) & 0xFF) #define xer_bc ((env->xer >> XER_BC) & 0x7F) @@ -2343,11 +2349,21 @@ enum { /*/ +#ifndef TARGET_PPC64 static inline target_ulong cpu_read_xer(CPUPPCState *env) { return env->xer | (env->so << XER_SO) | (env->ov << XER_OV) | (env->ca << XER_CA); } +#else +static inline target_ulong cpu_read_xer(CPUPPCState *env) +{ +return env->xer | (env->so << XER_SO) | +(env->ov << XER_OV) | (env->ca << XER_CA) | +(env->ov32 << XER_OV32) | (env->ca32 << XER_CA32); +} +#endif +#ifndef TARGET_PPC64 static inline void cpu_write_xer(CPUPPCState *env, target_ulong xer) { env->so = (xer >> XER_SO) & 1; @@ -2355,6 +2371,20 @@ static inline void cpu_write_xer(CPUPPCState *env, target_ulong xer) env->ca = (xer >> XER_CA) & 1; env->xer = xer & ~((1u << XER_SO) | (1u << XER_OV) | (1u << XER_CA)); } +#else +static inline void cpu_write_xer(CPUPPCState *env, target_ulong xer) +{ +env->so = (xer >> XER_SO) & 1; +env->ov = (xer >> XER_OV) & 1; +env->ca = (xer >> XER_CA) & 1; +env->ov32 = (xer >> XER_OV32) & 1; +env->ca32 = (xer >> XER_CA32) & 1; +env->xer = xer & ~((1ul << XER_SO) | + (1ul << XER_OV) | (1ul << XER_CA) | + (1ul << XER_OV32) | (1ul << 
XER_CA32)); +} +#endif + static inline void cpu_get_tb_cpu_state(CPUPPCState *env, target_ulong *pc, target_ulong *cs_base, uint32_t *flags) diff --git a/target/ppc/translate.c b/target/ppc/translate.c index 3ba2616..498b095 100644 --- a/target/ppc/translate.c +++ b/target/ppc/translate.c @@ -71,7 +71,7 @@ static TCGv cpu_lr; #if defined(TARGET_PPC64) static TCGv cpu_cfar; #endif -static TCGv cpu_xer, cpu_so, cpu_ov, cpu_ca; +static TCGv cpu_xer, cpu_so, cpu_ov, cpu_ca, cpu_ov32, cpu_ca32; static TCGv cpu_reserve; static TCGv cpu_fpscr; static TCGv_i32 cpu_access_type; @@ -173,6 +173,10 @@ void ppc_translate_init(void) offsetof(CPUPPCState, ov), "OV"); cpu_ca = tcg_global_mem_new(cpu_env, offsetof(CPUPPCState, ca), "CA"); +cpu_ov32 = tcg_global_mem_new(cpu_env, + offsetof(CPUPPCState, ov32), "OV32"); +cpu_ca32 = tcg_global_mem_new(cpu_env, + offsetof(CPUPPCState, ca32), "CA32"); cpu_reserve = tcg_global_mem_new(cpu_env, offsetof(CPUPPCState, reserve_addr), @@ -3715,6 +3719,12 @@ static void gen_read_xer(TCGv dst) tcg_gen_or_tl(t0, t0, t1); tcg_gen_or_tl(dst, dst, t2); tcg_gen_or_tl(dst, dst, t0
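The cpu_read_xer()/cpu_write_xer() pair in the patch folds the five flag fields in and out of the architected XER image. A host-side sketch of the same packing (struct and function names invented; the bit positions are the ones the patch defines):

```c
#include <stdint.h>

#define XER_SO   31
#define XER_OV   30
#define XER_CA   29
#define XER_OV32 19
#define XER_CA32 18

/* The flag bits live in separate one-bit fields (cheap for TCG to
 * update) and are only folded into the XER image on read. */
typedef struct {
    uint64_t xer, so, ov, ca, ov32, ca32;
} XerState;

static uint64_t xer_read(const XerState *e)
{
    return e->xer | (e->so << XER_SO) | (e->ov << XER_OV) |
           (e->ca << XER_CA) | (e->ov32 << XER_OV32) |
           (e->ca32 << XER_CA32);
}

static void xer_write(XerState *e, uint64_t xer)
{
    e->so   = (xer >> XER_SO) & 1;
    e->ov   = (xer >> XER_OV) & 1;
    e->ca   = (xer >> XER_CA) & 1;
    e->ov32 = (xer >> XER_OV32) & 1;
    e->ca32 = (xer >> XER_CA32) & 1;
    /* keep everything else (e.g. the BC and CMP fields) in xer itself */
    e->xer  = xer & ~((1ull << XER_SO) | (1ull << XER_OV) |
                      (1ull << XER_CA) | (1ull << XER_OV32) |
                      (1ull << XER_CA32));
}
```

A write followed by a read is the identity, which is the invariant migration relies on.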
[Qemu-devel] [PATCH 12/12] s390x/arch_dump: pass cpuid into notes sections
From: Christian Borntraeger we need to pass the cpuid into the pid field of the notes section, otherwise the notes for different CPUs all have 0: e.g. objdump -h shows: old: 5 .reg-s390-prefix/0 0004 6 .reg-s390-prefix 0004 21 .reg-s390-prefix/0 0004 new: 5 .reg-s390-prefix/1 0004 6 .reg-s390-prefix 0004 21 .reg-s390-prefix/2 0004 Reported-by: Philipp Rudo Signed-off-by: Christian Borntraeger Signed-off-by: Cornelia Huck --- target/s390x/arch_dump.c | 23 --- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/target/s390x/arch_dump.c b/target/s390x/arch_dump.c index 887cae947e..105ae9a5d8 100644 --- a/target/s390x/arch_dump.c +++ b/target/s390x/arch_dump.c @@ -73,7 +73,7 @@ typedef struct noteStruct { } contents; } QEMU_PACKED Note; -static void s390x_write_elf64_prstatus(Note *note, S390CPU *cpu) +static void s390x_write_elf64_prstatus(Note *note, S390CPU *cpu, int id) { int i; S390xUserRegs *regs; @@ -87,9 +87,10 @@ static void s390x_write_elf64_prstatus(Note *note, S390CPU *cpu) regs->acrs[i] = cpu_to_be32(cpu->env.aregs[i]); regs->gprs[i] = cpu_to_be64(cpu->env.regs[i]); } +note->contents.prstatus.pid = id; } -static void s390x_write_elf64_fpregset(Note *note, S390CPU *cpu) +static void s390x_write_elf64_fpregset(Note *note, S390CPU *cpu, int id) { int i; CPUS390XState *cs = &cpu->env; @@ -101,7 +102,7 @@ static void s390x_write_elf64_fpregset(Note *note, S390CPU *cpu) } } -static void s390x_write_elf64_vregslo(Note *note, S390CPU *cpu) +static void s390x_write_elf64_vregslo(Note *note, S390CPU *cpu, int id) { int i; @@ -111,7 +112,7 @@ static void s390x_write_elf64_vregslo(Note *note, S390CPU *cpu) } } -static void s390x_write_elf64_vregshi(Note *note, S390CPU *cpu) +static void s390x_write_elf64_vregshi(Note *note, S390CPU *cpu, int id) { int i; S390xElfVregsHi *temp_vregshi; @@ -125,25 +126,25 @@ static void s390x_write_elf64_vregshi(Note *note, S390CPU *cpu) } } -static void s390x_write_elf64_timer(Note *note, S390CPU *cpu) +static void 
s390x_write_elf64_timer(Note *note, S390CPU *cpu, int id) { note->hdr.n_type = cpu_to_be32(NT_S390_TIMER); note->contents.timer = cpu_to_be64((uint64_t)(cpu->env.cputm)); } -static void s390x_write_elf64_todcmp(Note *note, S390CPU *cpu) +static void s390x_write_elf64_todcmp(Note *note, S390CPU *cpu, int id) { note->hdr.n_type = cpu_to_be32(NT_S390_TODCMP); note->contents.todcmp = cpu_to_be64((uint64_t)(cpu->env.ckc)); } -static void s390x_write_elf64_todpreg(Note *note, S390CPU *cpu) +static void s390x_write_elf64_todpreg(Note *note, S390CPU *cpu, int id) { note->hdr.n_type = cpu_to_be32(NT_S390_TODPREG); note->contents.todpreg = cpu_to_be32((uint32_t)(cpu->env.todpr)); } -static void s390x_write_elf64_ctrs(Note *note, S390CPU *cpu) +static void s390x_write_elf64_ctrs(Note *note, S390CPU *cpu, int id) { int i; @@ -154,7 +155,7 @@ static void s390x_write_elf64_ctrs(Note *note, S390CPU *cpu) } } -static void s390x_write_elf64_prefix(Note *note, S390CPU *cpu) +static void s390x_write_elf64_prefix(Note *note, S390CPU *cpu, int id) { note->hdr.n_type = cpu_to_be32(NT_S390_PREFIX); note->contents.prefix = cpu_to_be32((uint32_t)(cpu->env.psa)); @@ -163,7 +164,7 @@ static void s390x_write_elf64_prefix(Note *note, S390CPU *cpu) typedef struct NoteFuncDescStruct { int contents_size; -void (*note_contents_func)(Note *note, S390CPU *cpu); +void (*note_contents_func)(Note *note, S390CPU *cpu, int id); } NoteFuncDesc; static const NoteFuncDesc note_core[] = { @@ -199,7 +200,7 @@ static int s390x_write_elf64_notes(const char *note_name, note.hdr.n_namesz = cpu_to_be32(strlen(note_name) + 1); note.hdr.n_descsz = cpu_to_be32(nf->contents_size); strncpy(note.name, note_name, sizeof(note.name)); -(*nf->note_contents_func)(¬e, cpu); +(*nf->note_contents_func)(¬e, cpu, id); note_size = sizeof(note) - sizeof(note.contents) + nf->contents_size; ret = f(¬e, note_size, opaque); -- 2.11.0
[Qemu-devel] [PATCH 06/12] virtio-ccw: Check the number of vqs in CCW_CMD_SET_IND
From: Halil Pasic We cannot support more than 64 virtqueues with the 64 bits provided by classic indicators. If a driver tries to setup classic indicators (which it is free to do even for virtio-1 devices) for a device with more than 64 virtqueues, we should reject the attempt so that the driver does not end up with an unusable device. This is in preparation for bumping the number of supported virtqueues on the ccw transport. Signed-off-by: Halil Pasic Reviewed-by: Cornelia Huck Reviewed-by: Christian Borntraeger Signed-off-by: Cornelia Huck --- hw/s390x/virtio-ccw.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c index 613d8c6615..771411ea3c 100644 --- a/hw/s390x/virtio-ccw.c +++ b/hw/s390x/virtio-ccw.c @@ -35,6 +35,8 @@ #include "trace.h" #include "hw/s390x/css-bridge.h" +#define NR_CLASSIC_INDICATOR_BITS 64 + static void virtio_ccw_bus_new(VirtioBusState *bus, size_t bus_size, VirtioCcwDevice *dev); @@ -509,6 +511,11 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw) ret = -ENOSYS; break; } +if (virtio_get_num_queues(vdev) > NR_CLASSIC_INDICATOR_BITS) { +/* More queues than indicator bits --> trigger a reject */ +ret = -ENOSYS; +break; +} if (!ccw.cda) { ret = -EFAULT; } else { -- 2.11.0
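The reason for the limit is that classic indicators are a single 64-bit guest bitmap with one bit per virtqueue, so queue numbers of 64 and above simply have no bit to set. A minimal sketch of that constraint (the helper name is invented):

```c
#include <stdint.h>
#include <stdbool.h>

#define NR_CLASSIC_INDICATOR_BITS 64

/* Classic (pre-adapter-interrupt) virtio-ccw indicators are one
 * 64-bit bitmap, one bit per virtqueue -- hence the hard limit the
 * patch enforces in CCW_CMD_SET_IND. */
static bool set_queue_indicator(uint64_t *indicators, unsigned int queue)
{
    if (queue >= NR_CLASSIC_INDICATOR_BITS) {
        return false;            /* would not fit: reject, like -ENOSYS */
    }
    *indicators |= 1ULL << queue;
    return true;
}
```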
[Qemu-devel] [PATCH 10/12] virtio-ccw: support VIRTIO_QUEUE_MAX virtqueues
From: Halil Pasic The maximal number of virtqueues per device can be limited on a per transport basis. For virtio-ccw this limit is defined by VIRTIO_CCW_QUEUE_MAX, however the limitation used to come from the number of adapter routes supported by flic (via notifiers). Recently the limitation of the flic was adjusted so that it can accommodate VIRTIO_QUEUE_MAX queues, and is now also checked separately. Let us remove the transport-specific limitation of virtio-ccw by dropping VIRTIO_CCW_QUEUE_MAX and using VIRTIO_QUEUE_MAX instead. Signed-off-by: Halil Pasic Signed-off-by: Cornelia Huck --- hw/s390x/s390-virtio-ccw.c | 2 +- hw/s390x/virtio-ccw.c| 16 include/hw/s390x/s390_flic.h | 1 - 3 files changed, 9 insertions(+), 10 deletions(-) diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c index ea244bbf55..4f0d62b2d8 100644 --- a/hw/s390x/s390-virtio-ccw.c +++ b/hw/s390x/s390-virtio-ccw.c @@ -63,7 +63,7 @@ static int virtio_ccw_hcall_notify(const uint64_t *args) if (!sch || !css_subch_visible(sch)) { return -EINVAL; } -if (queue >= VIRTIO_CCW_QUEUE_MAX) { +if (queue >= VIRTIO_QUEUE_MAX) { return -EINVAL; } virtio_queue_notify(virtio_ccw_get_vdev(sch), queue); diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c index a2ea95947f..00b3bde4e9 100644 --- a/hw/s390x/virtio-ccw.c +++ b/hw/s390x/virtio-ccw.c @@ -128,7 +128,7 @@ static int virtio_ccw_set_vqs(SubchDev *sch, VqInfoBlock *info, uint16_t num = info ? info->num : linfo->num; uint64_t desc = info ? 
info->desc : linfo->queue; -if (index >= VIRTIO_CCW_QUEUE_MAX) { +if (index >= VIRTIO_QUEUE_MAX) { return -EINVAL; } @@ -164,7 +164,7 @@ static int virtio_ccw_set_vqs(SubchDev *sch, VqInfoBlock *info, virtio_queue_set_vector(vdev, index, index); } /* tell notify handler in case of config change */ -vdev->config_vector = VIRTIO_CCW_QUEUE_MAX; +vdev->config_vector = VIRTIO_QUEUE_MAX; return 0; } @@ -565,7 +565,7 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw) ccw.cda, MEMTXATTRS_UNSPECIFIED, NULL); -if (vq_config.index >= VIRTIO_CCW_QUEUE_MAX) { +if (vq_config.index >= VIRTIO_QUEUE_MAX) { ret = -EINVAL; break; } @@ -960,11 +960,11 @@ static void virtio_ccw_notify(DeviceState *d, uint16_t vector) uint64_t indicators; /* queue indicators + secondary indicators */ -if (vector >= VIRTIO_CCW_QUEUE_MAX + 64) { +if (vector >= VIRTIO_QUEUE_MAX + 64) { return; } -if (vector < VIRTIO_CCW_QUEUE_MAX) { +if (vector < VIRTIO_QUEUE_MAX) { if (!dev->indicators) { return; } @@ -1325,10 +1325,10 @@ static void virtio_ccw_device_plugged(DeviceState *d, Error **errp) dev->max_rev = 0; } -if (virtio_get_num_queues(vdev) > VIRTIO_CCW_QUEUE_MAX) { +if (virtio_get_num_queues(vdev) > VIRTIO_QUEUE_MAX) { error_setg(errp, "The number of virtqueues %d " - "exceeds ccw limit %d", n, - VIRTIO_CCW_QUEUE_MAX); + "exceeds virtio limit %d", n, + VIRTIO_QUEUE_MAX); return; } if (virtio_get_num_queues(vdev) > flic->adapter_routes_max_batch) { diff --git a/include/hw/s390x/s390_flic.h b/include/hw/s390x/s390_flic.h index 7f8ec7541b..f9e6890c90 100644 --- a/include/hw/s390x/s390_flic.h +++ b/include/hw/s390x/s390_flic.h @@ -24,7 +24,6 @@ * maximum right now. */ #define ADAPTER_ROUTES_MAX_GSI VIRTIO_QUEUE_MAX -#define VIRTIO_CCW_QUEUE_MAX 64 typedef struct AdapterRoutes { AdapterInfo adapter; -- 2.11.0
[Qemu-devel] [PATCH v1 05/10] target/ppc: update overflow flags for add/sub
* SO and OV reflect overflow of the 64-bit result in 64-bit mode and overflow of the low-order 32-bit result in 32-bit mode * OV32 reflects overflow of the low-order 32-bit result independent of the mode Signed-off-by: Nikunj A Dadhania --- target/ppc/translate.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/target/ppc/translate.c b/target/ppc/translate.c index dd413de..5d8d109 100644 --- a/target/ppc/translate.c +++ b/target/ppc/translate.c @@ -809,10 +809,11 @@ static inline void gen_op_arith_compute_ov(DisasContext *ctx, TCGv arg0, tcg_gen_andc_tl(cpu_ov, cpu_ov, t0); } tcg_temp_free(t0); +tcg_gen_extract_tl(cpu_ov32, cpu_ov, 31, 1); +tcg_gen_extract_tl(cpu_ov, cpu_ov, 63, 1); if (NARROW_MODE(ctx)) { -tcg_gen_ext32s_tl(cpu_ov, cpu_ov); +tcg_gen_mov_tl(cpu_ov, cpu_ov32); } -tcg_gen_shri_tl(cpu_ov, cpu_ov, TARGET_LONG_BITS - 1); tcg_gen_or_tl(cpu_so, cpu_so, cpu_ov); } -- 2.7.4
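The OV/OV32 semantics described in this commit message can be modeled in plain host C (hypothetical helper names; a sketch of the architectural rule, not QEMU code): a signed add overflows when both operands share a sign that differs from the result's sign, checked at full width for OV and on the low-order 32 bits for OV32.

```c
#include <assert.h>
#include <stdint.h>

/* OV in 64-bit mode: signed overflow of the full 64-bit add. */
static int add_overflows64(int64_t a, int64_t b)
{
    int64_t r = (int64_t)((uint64_t)a + (uint64_t)b);
    /* negative iff a and b share a sign that differs from r's sign */
    return ((a ^ r) & (b ^ r)) < 0;
}

/* OV32 (and OV in 32-bit mode): signed overflow of the low-order 32 bits. */
static int add_overflows32(int64_t a, int64_t b)
{
    int32_t a32 = (int32_t)a, b32 = (int32_t)b;
    int32_t r = (int32_t)((uint32_t)a32 + (uint32_t)b32);
    return ((a32 ^ r) & (b32 ^ r)) < 0;
}
```

For example, INT32_MAX + 1 overflows the 32-bit result but not the 64-bit one, so OV32 is set while OV (in 64-bit mode) stays clear; this is why the patch extracts the two flags from different bit positions of the widened sum.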
Re: [Qemu-devel] [PATCH v15 08/25] block: introduce auto-loading bitmaps
17.02.2017 17:24, Kevin Wolf wrote: Am 17.02.2017 um 14:48 hat Denis V. Lunev geschrieben: On 02/17/2017 04:34 PM, Kevin Wolf wrote: Am 17.02.2017 um 14:22 hat Denis V. Lunev geschrieben: But for sure this is bad from the downtime point of view. On migrate you will have to write to the image and re-read it again on the target. This would be very slow. This will not help for the migration with non-shared disk too. That is why we have specifically worked in a migration, which for a good does not influence downtime at all now. With a write we are issuing several write requests + sync. Our measurements shows that bdrv_drain could take around a second on an averagely loaded conventional system, which seems unacceptable addition to me. I'm not arguing against optimising migration, I fully agree with you. I just think that we should start with a correct if slow base version and then add optimisation to that, instead of starting with a broken base version and adding to that. Look, whether you do the expensive I/O on open/close and make that a slow operation or whether you do it on invalidate_cache/inactivate doesn't really make a difference in term of slowness because in general both operations are called exactly once. But it does make a difference in terms of correctness. Once you do the optimisation, of course, you'll skip writing those bitmaps that you transfer using a different channel, no matter whether you skip it in bdrv_close() or in bdrv_inactivate(). Kevin I do not understand this point as in order to optimize this we will have to create specific code path or option from the migration code and keep this as an ugly kludge forever. The point that I don't understand is why it makes any difference for the follow-up migration series whether the writeout is in bdrv_close() or bdrv_inactivate(). I don't really see the difference between the two from a migration POV; both need to be skipped if we transfer the bitmap using a different channel. 
Maybe I would see the reason if I could find the time to look at the migration patches first, but unfortunately I don't have this time at the moment. My point is just that generally we want to have a correctly working qemu after every single patch, and even more importantly after every series. As the migration series is separate from this, I don't think it's a good excuse for doing worse than we could easily do here. Kevin Bitmaps are not just qcow2 metadata. They belong to the generic block layer, and qcow2's ability to store bitmaps is used to realize the common interface. After a bitmap is loaded and turned into a BdrvDirtyBitmap, it no longer belongs to qcow2, and the qcow2 layer should not touch it on intermediate reopens. We can introduce additional flags to control loading of autoloading bitmaps. Isn't it better to handle them in a common place for all formats? (Yes, for now only qcow2 can store bitmaps, but the parallels format defines this ability too; in the future we may have some way to load bitmaps for the raw format. Also, there is a bitmap-related extension for the NBD protocol.) bdrv_load_dirty_bitmap, defined in my series about NBD BLOCK_STATUS, is definitely generic, as it should be called from a QMP command. (Just a note, not a concrete argument.) -- Best regards, Vladimir
[Qemu-devel] [PATCH 09/12] s390x: bump ADAPTER_ROUTES_MAX_GSI
From: Halil Pasic Let's increase ADAPTER_ROUTES_MAX_GSI to VIRTIO_QUEUE_MAX which is the largest demand foreseeable at the moment. Let us add a compatibility macro for the previous machines so client code can maintain backwards migration compatibility. To not mess up migration compatibility for virtio-ccw, VIRTIO_CCW_QUEUE_MAX is left at its current value, and will be dropped when virtio-ccw is converted to use the capability of the flic introduced by this patch. Signed-off-by: Halil Pasic Signed-off-by: Cornelia Huck --- hw/s390x/s390-virtio-ccw.c | 7 ++- include/hw/s390x/s390_flic.h | 10 -- 2 files changed, 14 insertions(+), 3 deletions(-) diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c index e9a676797a..ea244bbf55 100644 --- a/hw/s390x/s390-virtio-ccw.c +++ b/hw/s390x/s390-virtio-ccw.c @@ -336,7 +336,12 @@ static const TypeInfo ccw_machine_info = { type_init(ccw_machine_register_##suffix) #define CCW_COMPAT_2_8 \ -HW_COMPAT_2_8 +HW_COMPAT_2_8 \ +{\ +.driver = TYPE_S390_FLIC_COMMON,\ +.property = "adapter_routes_max_batch",\ +.value= "64",\ +}, #define CCW_COMPAT_2_7 \ HW_COMPAT_2_7 diff --git a/include/hw/s390x/s390_flic.h b/include/hw/s390x/s390_flic.h index 9f0b05c71b..7f8ec7541b 100644 --- a/include/hw/s390x/s390_flic.h +++ b/include/hw/s390x/s390_flic.h @@ -17,8 +17,14 @@ #include "hw/s390x/adapter.h" #include "hw/virtio/virtio.h" -#define ADAPTER_ROUTES_MAX_GSI 64 -#define VIRTIO_CCW_QUEUE_MAX ADAPTER_ROUTES_MAX_GSI +/* + * Reserve enough gsis to accommodate all virtio devices. + * If any other user of adapter routes needs more of these, + * we need to bump the value; but virtio looks like the + * maximum right now. + */ +#define ADAPTER_ROUTES_MAX_GSI VIRTIO_QUEUE_MAX +#define VIRTIO_CCW_QUEUE_MAX 64 typedef struct AdapterRoutes { AdapterInfo adapter; -- 2.11.0
[Qemu-devel] [PATCH v1 00/10] POWER9 TCG enablements - part15
This series contains the implementation of the CA32 and OV32 bits added in ISA 3.0. Various fixed-point arithmetic instructions are updated to take care of the newer flags. Finally, the last patch adds the new mcrxrx instruction, which helps read the carry (CA and CA32) and overflow (OV and OV32) flags. Nikunj A Dadhania (10): target/ppc: support for 32-bit carry and overflow target/ppc: Update ca32 in arithmetic add target/ppc: move subf logic block target/ppc: compute ca32 for arithmetic subtract target/ppc: update overflow flags for add/sub target/ppc: use tcg ops for neg instruction target/ppc: update ov/ov32 for nego target/ppc: add ov32 flag for multiply low insns target/ppc: add ov32 flag in divide operations target/ppc: add mcrxrx instruction target/ppc/cpu.h| 30 +++ target/ppc/int_helper.c | 49 ++ target/ppc/translate.c | 134 +++- 3 files changed, 155 insertions(+), 58 deletions(-) -- 2.7.4
[Qemu-devel] [PATCH v1 06/10] target/ppc: use tcg ops for neg instruction
Signed-off-by: Nikunj A Dadhania --- target/ppc/translate.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target/ppc/translate.c b/target/ppc/translate.c index 5d8d109..9fa3b5a 100644 --- a/target/ppc/translate.c +++ b/target/ppc/translate.c @@ -1483,7 +1483,7 @@ static inline void gen_op_arith_neg(DisasContext *ctx, bool compute_ov) static void gen_neg(DisasContext *ctx) { -gen_op_arith_neg(ctx, 0); +tcg_gen_neg_tl(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rA(ctx->opcode)]); } static void gen_nego(DisasContext *ctx) -- 2.7.4
Re: [Qemu-devel] [PATCH v15 25/25] qcow2-bitmap: improve check_constraints_on_bitmap
17.02.2017 18:48, Eric Blake wrote: On 02/17/2017 04:18 AM, Vladimir Sementsov-Ogievskiy wrote: 16.02.2017 17:21, Kevin Wolf wrote: Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben: Add detailed error messages. Signed-off-by: Vladimir Sementsov-Ogievskiy Why not merge this patch into the one that originally introduced the function? Just to not create extra work for reviewers It's extra work for reviewers if you don't rebase obvious fixes where they belong - a new reviewer may flag the issue in the earlier patch only to find out later in the series that you've already fixed it. Avoiding needless code churn is part of what rebasing is all about - you want each step of the series to be self-contained and as correct as possible, by adding in the fixes at the point where they make sense, rather than at the end of the series. Ok, I'll rebase, It's not a problem. -- Best regards, Vladimir
[Qemu-devel] [PATCH v1 08/10] target/ppc: add ov32 flag for multiply low insns
For Multiply Word: SO, OV, and OV32 bits reflect overflow of the 32-bit result. For Multiply DoubleWord: SO, OV, and OV32 bits reflect overflow of the 64-bit result. Signed-off-by: Nikunj A Dadhania --- target/ppc/translate.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/target/ppc/translate.c b/target/ppc/translate.c index 0168e1c..69ec0b2 100644 --- a/target/ppc/translate.c +++ b/target/ppc/translate.c @@ -1286,6 +1286,7 @@ static void gen_mullwo(DisasContext *ctx) tcg_gen_sari_i32(t0, t0, 31); tcg_gen_setcond_i32(TCG_COND_NE, t0, t0, t1); tcg_gen_extu_i32_tl(cpu_ov, t0); +tcg_gen_mov_tl(cpu_ov32, cpu_ov); tcg_gen_or_tl(cpu_so, cpu_so, cpu_ov); tcg_temp_free_i32(t0); @@ -1347,6 +1348,7 @@ static void gen_mulldo(DisasContext *ctx) tcg_gen_sari_i64(t0, t0, 63); tcg_gen_setcond_i64(TCG_COND_NE, cpu_ov, t0, t1); +tcg_gen_mov_tl(cpu_ov32, cpu_ov); tcg_gen_or_tl(cpu_so, cpu_so, cpu_ov); tcg_temp_free_i64(t0); -- 2.7.4
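The overflow check this patch relies on can be sketched in host C (hypothetical helper name; a model of the architectural rule, not the TCG code itself): compute the product at double width and test whether the truncated result sign-extends back to the same value.

```c
#include <assert.h>
#include <stdint.h>

/* mullw-style overflow: the signed 32x32 product does not fit in 32 bits.
 * Equivalent to the sari/setcond comparison in gen_mullwo(). */
static int mullw_overflows(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a * (int64_t)b;
    return wide != (int64_t)(int32_t)wide;
}
```

Since the result width here equals the width OV is defined over, OV32 is simply a copy of OV, which is all the added tcg_gen_mov_tl does.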
Re: [Qemu-devel] [PATCH v8 4/8] ACPI: Add Virtual Machine Generation ID support
* Laszlo Ersek (ler...@redhat.com) wrote: > CC Dave This isn't an area I really understand; but if I'm reading this right then vmgenid is stored in fw_cfg? fw_cfg isn't migrated So why should any changes to it get migrated, except if it's already been read by the guest (and if the guest reads it again aftwards what's it expected to read?) Dave > On 02/17/17 11:43, Igor Mammedov wrote: > > On Thu, 16 Feb 2017 15:15:36 -0800 > > b...@skyportsystems.com wrote: > > > >> From: Ben Warren > >> > >> This implements the VM Generation ID feature by passing a 128-bit > >> GUID to the guest via a fw_cfg blob. > >> Any time the GUID changes, an ACPI notify event is sent to the guest > >> > >> The user interface is a simple device with one parameter: > >> - guid (string, must be "auto" or in UUID format > >>----) > > I've given it some testing with WS2012R2 and v4 patches for Seabios, > > > > Windows is able to read initial GUID allocation and writeback > > seems to work somehow: > > > > (qemu) info vm-generation-id > > c109c09b-0e8b-42d5-9b33-8409c9dcd16c > > > > vmgenid client in Windows reads it as 2 following 64bit integers: > > 42d50e8bc109c09b:6cd1dcc90984339b > > > > However update path/restore from snapshot doesn't > > here is as I've tested it: > > > > qemu-system-x86_64 -device vmgenid,id=testvgid,guid=auto -monitor stdio > > (qemu) info vm-generation-id > > c109c09b-0e8b-42d5-9b33-8409c9dcd16c > > (qemu) stop > > (qemu) migrate "exec:gzip -c > STATEFILE.gz" > > (qemu) quit > > > > qemu-system-x86_64 -device vmgenid,id=testvgid,guid=auto -monitor stdio > > -incoming "exec: gzip -c -d STATEFILE.gz" > > (qemu) info vm-generation-id > > 28b587fa-991b-4267-80d7-9cf28b746fe9 > > > > guest > > 1. doesn't get GPE notification that it must receive > > 2. vmgenid client in Windows reads the same value > > 42d50e8bc109c09b:6cd1dcc90984339b > > Hmmm, I wonder if we need something like this, in vmgenid_post_load(): > > commit 90c647db8d59e47c9000affc0d81754eb346e939 > Author: Dr. 
David Alan Gilbert > Date: Fri Apr 15 12:41:30 2016 +0100 > > Fix pflash migration > > with the idea being that in a single device's post_load callback, we > shouldn't perform machine-wide actions (post_load is likely for fixing > up the device itself). If machine-wide actions are necessary, we should > temporarily register a "vm change state handler", and do the thing once > that handler is called (when the machine has been loaded fully and is > about to continue execution). > > Can you please try the attached patch on top? (Build tested only.) > > Thanks! > Laszlo > diff --git a/include/hw/acpi/vmgenid.h b/include/hw/acpi/vmgenid.h > index db7fa0e63303..a2ae450b1f56 100644 > --- a/include/hw/acpi/vmgenid.h > +++ b/include/hw/acpi/vmgenid.h > @@ -4,6 +4,7 @@ > #include "hw/acpi/bios-linker-loader.h" > #include "hw/qdev.h" > #include "qemu/uuid.h" > +#include "sysemu/sysemu.h" > > #define VMGENID_DEVICE "vmgenid" > #define VMGENID_GUID "guid" > @@ -21,6 +22,7 @@ typedef struct VmGenIdState { > DeviceClass parent_obj; > QemuUUID guid;/* The 128-bit GUID seen by the guest */ > uint8_t vmgenid_addr_le[8]; /* Address of the GUID (little-endian) */ > +VMChangeStateEntry *vmstate; > } VmGenIdState; > > static inline Object *find_vmgenid_dev(void) > diff --git a/hw/acpi/vmgenid.c b/hw/acpi/vmgenid.c > index 9f97b722761b..0ae1d56ff297 100644 > --- a/hw/acpi/vmgenid.c > +++ b/hw/acpi/vmgenid.c > @@ -177,10 +177,20 @@ static void vmgenid_set_guid(Object *obj, const char > *value, Error **errp) > /* After restoring an image, we need to update the guest memory and notify > * it of a potential change to VM Generation ID > */ > +static void postload_update_guest_cb(void *opaque, int running, RunState > state) > +{ > +VmGenIdState *vms = opaque; > + > +qemu_del_vm_change_state_handler(vms->vmstate); > +vms->vmstate = NULL; > +vmgenid_update_guest(vms); > +} > + > static int vmgenid_post_load(void *opaque, int version_id) > { > VmGenIdState *vms = opaque; > 
-vmgenid_update_guest(vms); > +vms->vmstate = qemu_add_vm_change_state_handler(postload_update_guest_cb, > +vms); > return 0; > } > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
[Qemu-devel] [PATCH v1 10/10] target/ppc: add mcrxrx instruction
mcrxrx: Move to CR from XER Extended Signed-off-by: Nikunj A Dadhania --- target/ppc/translate.c | 23 +++ 1 file changed, 23 insertions(+) diff --git a/target/ppc/translate.c b/target/ppc/translate.c index ee44205..36d59a5 100644 --- a/target/ppc/translate.c +++ b/target/ppc/translate.c @@ -3816,6 +3816,28 @@ static void gen_mcrxr(DisasContext *ctx) tcg_gen_movi_tl(cpu_ca, 0); } +#ifdef TARGET_PPC64 +/* mcrxrx */ +static void gen_mcrxrx(DisasContext *ctx) +{ +TCGv t0 = tcg_temp_new(); +TCGv t1 = tcg_temp_new(); +TCGv_i32 dst = cpu_crf[crfD(ctx->opcode)]; + +/* copy OV and OV32 */ +tcg_gen_shli_tl(t0, cpu_ov, 1); +tcg_gen_or_tl(t0, t0, cpu_ov32); +tcg_gen_shli_tl(t0, t0, 2); +/* copy CA and CA32 */ +tcg_gen_shli_tl(t1, cpu_ca, 1); +tcg_gen_or_tl(t1, t1, cpu_ca32); +tcg_gen_or_tl(t0, t0, t1); +tcg_gen_trunc_tl_i32(dst, t0); +tcg_temp_free(t0); +tcg_temp_free(t1); +} +#endif + /* mfcr mfocrf */ static void gen_mfcr(DisasContext *ctx) { @@ -6485,6 +6507,7 @@ GEN_HANDLER(mtcrf, 0x1F, 0x10, 0x04, 0x0801, PPC_MISC), #if defined(TARGET_PPC64) GEN_HANDLER(mtmsrd, 0x1F, 0x12, 0x05, 0x001EF801, PPC_64B), GEN_HANDLER_E(setb, 0x1F, 0x00, 0x04, 0x0003F801, PPC_NONE, PPC2_ISA300), +GEN_HANDLER_E(mcrxrx, 0x1F, 0x00, 0x12, 0x007FF801, PPC_NONE, PPC2_ISA300), #endif GEN_HANDLER(mtmsr, 0x1F, 0x12, 0x04, 0x001EF801, PPC_MISC), GEN_HANDLER(mtspr, 0x1F, 0x13, 0x0E, 0x, PPC_MISC), -- 2.7.4
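The bit layout produced by gen_mcrxrx() above can be sketched as ordinary C (hypothetical helper name): the destination CR field holds OV, OV32, CA, CA32 from most to least significant bit.

```c
#include <assert.h>
#include <stdint.h>

/* Pack OV:OV32:CA:CA32 into a 4-bit CR field, mirroring the
 * shift/or sequence in gen_mcrxrx(). */
static uint32_t mcrxrx_field(int ov, int ov32, int ca, int ca32)
{
    uint32_t t0 = ((uint32_t)ov << 1) | (uint32_t)ov32;  /* OV:OV32 */
    uint32_t t1 = ((uint32_t)ca << 1) | (uint32_t)ca32;  /* CA:CA32 */
    return (t0 << 2) | t1;
}
```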
Re: [Qemu-devel] [RFC PATCH 15/41] block: Add permissions to blk_new()
On 13.02.2017 18:22, Kevin Wolf wrote: > We want every user to be specific about the permissions it needs, so > we'll pass the initial permissions as parameters to blk_new(). A user > only needs to call blk_set_perm() if it wants to change the permissions > after the fact. > > The permissions are stored in the BlockBackend and applied whenever a > BlockDriverState should be attached in blk_insert_bs(). > > This does not include actually choosing the right set of permissions > yet. Instead, the usual FIXME comment is added to each place and will be > addressed in individual patches. > > Signed-off-by: Kevin Wolf > --- > block/backup.c | 3 ++- > block/block-backend.c| 12 ++-- > block/commit.c | 12 > block/mirror.c | 3 ++- > blockdev.c | 2 +- > blockjob.c | 3 ++- > hmp.c| 3 ++- > hw/block/fdc.c | 3 ++- > hw/core/qdev-properties-system.c | 3 ++- > hw/ide/qdev.c| 3 ++- > hw/scsi/scsi-disk.c | 3 ++- > include/sysemu/block-backend.h | 2 +- > migration/block.c| 3 ++- > nbd/server.c | 3 ++- > tests/test-blockjob.c| 3 ++- > tests/test-throttle.c| 7 --- > 16 files changed, 42 insertions(+), 26 deletions(-) [...] > diff --git a/block/block-backend.c b/block/block-backend.c > index 8f0348d..a219f0b 100644 > --- a/block/block-backend.c > +++ b/block/block-backend.c > @@ -123,14 +123,14 @@ static const BdrvChildRole child_root = { > * Store an error through @errp on failure, unless it's null. > * Return the new BlockBackend on success, null on failure. > */ > -BlockBackend *blk_new(void) > +BlockBackend *blk_new(uint64_t perm, uint64_t shared_perm) I think it would be a good idea to document what these parameters do, at least by pointing to the definition of the permission flags. Also, this makes me think that the permission flags should not be in block_int.h but in block.h. This function is available outside of the core block layer. 
> { > BlockBackend *blk; > > blk = g_new0(BlockBackend, 1); > blk->refcnt = 1; > -blk->perm = 0; > -blk->shared_perm = BLK_PERM_ALL; > +blk->perm = perm; > +blk->shared_perm = shared_perm; > blk_set_enable_write_cache(blk, true); > > qemu_co_queue_init(&blk->public.throttled_reqs[0]); > @@ -161,7 +161,7 @@ BlockBackend *blk_new_open(const char *filename, const > char *reference, > BlockBackend *blk; > BlockDriverState *bs; > > -blk = blk_new(); > +blk = blk_new(0, BLK_PERM_ALL); > bs = bdrv_open(filename, reference, options, flags, errp); > if (!bs) { > blk_unref(blk); > @@ -505,9 +505,9 @@ void blk_remove_bs(BlockBackend *blk) > void blk_insert_bs(BlockBackend *blk, BlockDriverState *bs) > { > bdrv_ref(bs); > -/* FIXME Use real permissions */ > blk->root = bdrv_root_attach_child(bs, "root", &child_root, > - 0, BLK_PERM_ALL, blk, &error_abort); > + blk->perm, blk->shared_perm, blk, > + &error_abort); And the error_abort does not qualify as a FIXME? > > notifier_list_notify(&blk->insert_bs_notifiers, blk); > if (blk->public.throttle_state) { [...] > diff --git a/tests/test-blockjob.c b/tests/test-blockjob.c > index 068c9e4..1dd1cfa 100644 > --- a/tests/test-blockjob.c > +++ b/tests/test-blockjob.c > @@ -53,7 +53,8 @@ static BlockJob *do_test_id(BlockBackend *blk, const char > *id, > * BlockDriverState inserted. */ > static BlockBackend *create_blk(const char *name) > { > -BlockBackend *blk = blk_new(); > +/* FIXME Use real permissions */ > +BlockBackend *blk = blk_new(0, BLK_PERM_ALL); > BlockDriverState *bs; > > bs = bdrv_open("null-co://", NULL, NULL, 0, &error_abort); Pretty much independent of this patch, but: blk_new_open() would probably be the better choice then to call blk_new() + bdrv_open() + blk_insert_bs(). Max signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH v8 4/8] ACPI: Add Virtual Machine Generation ID support
On 02/20/17 11:23, Dr. David Alan Gilbert wrote: > * Laszlo Ersek (ler...@redhat.com) wrote: >> CC Dave > > This isn't an area I really understand; but if I'm > reading this right then >vmgenid is stored in fw_cfg? >fw_cfg isn't migrated > > So why should any changes to it get migrated, except if it's already > been read by the guest (and if the guest reads it again aftwards what's > it expected to read?) This is what we have here: - QEMU formats read-only fw_cfg blob with GUID - guest downloads blob, places it in guest RAM - guest tells QEMU the guest-side address of the blob - during migration, guest RAM is transferred - after migration, in the device's post_load callback, QEMU overwrites the GUID in guest RAM with a different value, and injects an SCI I CC'd you for the following reason: Igor reported that he didn't see either the fresh GUID or the SCI in the guest, on the target host, after migration. I figured that perhaps there was an ordering issue between RAM loading and post_load execution on the target host, and so I proposed to delay the RAM overwrite + SCI injection a bit more; following the pattern seen in your commit 90c647db8d59. However, since then, both Ben and myself tested the code with migration (using "virsh save" (Ben) and "virsh managedsave" (myself)), with Windows and Linux guests, and it works for us; there seems to be no ordering issue with the current code (= overwrite RAM + inject SCI in the post_load callback()). For now we don't understand why it doesn't work for Igor (Igor used exec/gzip migration to/from a local file using direct QEMU monitor commands / options, no libvirt). And, copying the pattern seen in your commit 90c647db8d59 didn't help in his case (while it wasn't even necessary for success in Ben's and my testing). So it seems that delaying the deed with qemu_add_vm_change_state_handler() is neither needed nor effective in this case; but then we still don't know why it doesn't work for Igor. 
Thanks Laszlo > > Dave > >> On 02/17/17 11:43, Igor Mammedov wrote: >>> On Thu, 16 Feb 2017 15:15:36 -0800 >>> b...@skyportsystems.com wrote: >>> From: Ben Warren This implements the VM Generation ID feature by passing a 128-bit GUID to the guest via a fw_cfg blob. Any time the GUID changes, an ACPI notify event is sent to the guest The user interface is a simple device with one parameter: - guid (string, must be "auto" or in UUID format ----) >>> I've given it some testing with WS2012R2 and v4 patches for Seabios, >>> >>> Windows is able to read initial GUID allocation and writeback >>> seems to work somehow: >>> >>> (qemu) info vm-generation-id >>> c109c09b-0e8b-42d5-9b33-8409c9dcd16c >>> >>> vmgenid client in Windows reads it as 2 following 64bit integers: >>> 42d50e8bc109c09b:6cd1dcc90984339b >>> >>> However update path/restore from snapshot doesn't >>> here is as I've tested it: >>> >>> qemu-system-x86_64 -device vmgenid,id=testvgid,guid=auto -monitor stdio >>> (qemu) info vm-generation-id >>> c109c09b-0e8b-42d5-9b33-8409c9dcd16c >>> (qemu) stop >>> (qemu) migrate "exec:gzip -c > STATEFILE.gz" >>> (qemu) quit >>> >>> qemu-system-x86_64 -device vmgenid,id=testvgid,guid=auto -monitor stdio >>> -incoming "exec: gzip -c -d STATEFILE.gz" >>> (qemu) info vm-generation-id >>> 28b587fa-991b-4267-80d7-9cf28b746fe9 >>> >>> guest >>> 1. doesn't get GPE notification that it must receive >>> 2. vmgenid client in Windows reads the same value >>> 42d50e8bc109c09b:6cd1dcc90984339b >> >> Hmmm, I wonder if we need something like this, in vmgenid_post_load(): >> >> commit 90c647db8d59e47c9000affc0d81754eb346e939 >> Author: Dr. David Alan Gilbert >> Date: Fri Apr 15 12:41:30 2016 +0100 >> >> Fix pflash migration >> >> with the idea being that in a single device's post_load callback, we >> shouldn't perform machine-wide actions (post_load is likely for fixing >> up the device itself). 
If machine-wide actions are necessary, we should >> temporarily register a "vm change state handler", and do the thing once >> that handler is called (when the machine has been loaded fully and is >> about to continue execution). >> >> Can you please try the attached patch on top? (Build tested only.) >> >> Thanks! >> Laszlo > >> diff --git a/include/hw/acpi/vmgenid.h b/include/hw/acpi/vmgenid.h >> index db7fa0e63303..a2ae450b1f56 100644 >> --- a/include/hw/acpi/vmgenid.h >> +++ b/include/hw/acpi/vmgenid.h >> @@ -4,6 +4,7 @@ >> #include "hw/acpi/bios-linker-loader.h" >> #include "hw/qdev.h" >> #include "qemu/uuid.h" >> +#include "sysemu/sysemu.h" >> >> #define VMGENID_DEVICE "vmgenid" >> #define VMGENID_GUID "guid" >> @@ -21,6 +22,7 @@ typedef struct VmGenIdState { >> DeviceClass parent_obj; >> QemuUUID guid;/* The 128-bit GUID seen by the guest */ >> uint8_t vmgenid_addr_le[8]; /* Address of the GUID (little-endian) */ >> +VMC
[Qemu-devel] [PATCH v1 02/10] target/ppc: Update ca32 in arithmetic add
Adds routine to compute ca32 - gen_op_arith_compute_ca32 For 64-bit mode use the compute ca32 routine. While for 32-bit mode, CA and CA32 will have same value. Signed-off-by: Nikunj A Dadhania --- target/ppc/translate.c | 32 1 file changed, 32 insertions(+) diff --git a/target/ppc/translate.c b/target/ppc/translate.c index 498b095..2a2d071 100644 --- a/target/ppc/translate.c +++ b/target/ppc/translate.c @@ -816,6 +816,36 @@ static inline void gen_op_arith_compute_ov(DisasContext *ctx, TCGv arg0, tcg_gen_or_tl(cpu_so, cpu_so, cpu_ov); } +static inline void gen_op_arith_compute_ca32(DisasContext *ctx, TCGv arg0, + TCGv arg1, bool add_ca, int sub) +{ +TCGv t0 = tcg_temp_new(); +TCGv t1 = tcg_temp_new(); +TCGv inv0 = tcg_temp_new(); + +tcg_gen_extract_tl(t0, arg0, 0, 32); +tcg_gen_extract_tl(t1, arg1, 0, 32); +if (sub) { +tcg_gen_not_tl(inv0, t0); +if (add_ca) { +tcg_gen_add_tl(t1, t1, cpu_ca32); +} else { +tcg_gen_addi_tl(t1, t1, 1); +} +tcg_gen_add_tl(t0, t1, inv0); +tcg_gen_extract_tl(cpu_ca32, t0, 32, 1); +} else { +tcg_gen_add_tl(t0, t0, t1); +if (add_ca) { +tcg_gen_add_tl(t0, t0, cpu_ca32); +} +tcg_gen_extract_tl(cpu_ca32, t0, 32, 1); +} +tcg_temp_free(t0); +tcg_temp_free(t1); +} + + /* Common add function */ static inline void gen_op_arith_add(DisasContext *ctx, TCGv ret, TCGv arg1, TCGv arg2, bool add_ca, bool compute_ca, @@ -842,6 +872,7 @@ static inline void gen_op_arith_add(DisasContext *ctx, TCGv ret, TCGv arg1, tcg_temp_free(t1); tcg_gen_shri_tl(cpu_ca, cpu_ca, 32); /* extract bit 32 */ tcg_gen_andi_tl(cpu_ca, cpu_ca, 1); +tcg_gen_mov_tl(cpu_ca32, cpu_ca); } else { TCGv zero = tcg_const_tl(0); if (add_ca) { @@ -850,6 +881,7 @@ static inline void gen_op_arith_add(DisasContext *ctx, TCGv ret, TCGv arg1, } else { tcg_gen_add2_tl(t0, cpu_ca, arg1, zero, arg2, zero); } +gen_op_arith_compute_ca32(ctx, arg1, arg2, add_ca, 0); tcg_temp_free(zero); } } else { -- 2.7.4
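The CA32 computation added above can be modeled in host C (hypothetical helper name; a sketch of what the add branch of gen_op_arith_compute_ca32() emits): widen the low 32 bits of both operands, add the optional carry-in, and take bit 32 of the sum.

```c
#include <assert.h>
#include <stdint.h>

/* CA32 for add: carry out of bit 31 of the low-order 32-bit addition. */
static int add_carry32(uint64_t arg1, uint64_t arg2, int add_ca, int ca32_in)
{
    uint64_t t0 = (uint32_t)arg1;
    uint64_t t1 = (uint32_t)arg2;
    uint64_t sum = t0 + t1 + (add_ca ? (uint64_t)ca32_in : 0);
    return (int)((sum >> 32) & 1);   /* the extracted bit 32 */
}
```

In 32-bit mode the full-width carry already is the 32-bit carry, hence the simple tcg_gen_mov_tl(cpu_ca32, cpu_ca) in the narrow path.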
[Qemu-devel] [PATCH v1 04/10] target/ppc: compute ca32 for arithmetic subtract
For 64-bit mode, use the compute-ca32 routine; for 32-bit mode, CA and CA32 will have the same value. Signed-off-by: Nikunj A Dadhania --- target/ppc/translate.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/target/ppc/translate.c b/target/ppc/translate.c index 77045be..dd413de 100644 --- a/target/ppc/translate.c +++ b/target/ppc/translate.c @@ -1389,6 +1389,7 @@ static inline void gen_op_arith_subf(DisasContext *ctx, TCGv ret, TCGv arg1, tcg_temp_free(t1); tcg_gen_shri_tl(cpu_ca, cpu_ca, 32);/* extract bit 32 */ tcg_gen_andi_tl(cpu_ca, cpu_ca, 1); +tcg_gen_mov_tl(cpu_ca32, cpu_ca); } else { if (add_ca) { TCGv zero, inv1 = tcg_temp_new(); @@ -1402,6 +1403,7 @@ static inline void gen_op_arith_subf(DisasContext *ctx, TCGv ret, TCGv arg1, tcg_gen_setcond_tl(TCG_COND_GEU, cpu_ca, arg2, arg1); tcg_gen_sub_tl(t0, arg2, arg1); } +gen_op_arith_compute_ca32(ctx, arg1, arg2, add_ca, 1); } } else if (add_ca) { /* Since we're ignoring carry-out, we can simplify the -- 2.7.4
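The subtract case follows the same pattern as the add case (hypothetical helper name; a sketch of the sub branch of gen_op_arith_compute_ca32()): subf computes arg2 - arg1 as ~arg1 + arg2 + 1 (or + CA32 for the extended forms), and CA32 is bit 32 of that sum.

```c
#include <assert.h>
#include <stdint.h>

/* CA32 for subf: carry out of bit 31 of ~arg1 + arg2 + carry-in. */
static int subf_carry32(uint64_t arg1, uint64_t arg2, int add_ca, int ca32_in)
{
    uint64_t inv0 = (uint32_t)~(uint32_t)arg1;                         /* ~arg1, low 32 bits */
    uint64_t t1 = (uint64_t)(uint32_t)arg2 + (add_ca ? (uint64_t)ca32_in : 1);
    return (int)(((inv0 + t1) >> 32) & 1);
}
```

For the non-extended subf this reduces to the unsigned comparison arg2 >= arg1, matching the TCG_COND_GEU setcond used for the full-width CA.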
[Qemu-devel] [PATCH v1 07/10] target/ppc: update ov/ov32 for nego
For 64-bit mode, if the register RA contains 0x8000_0000_0000_0000, OV and OV32 are set to 1. For 32-bit mode, if the register RA contains 0x8000_0000, OV and OV32 are set to 1. Use the tcg-ops for negation (neg_tl) and drop gen_op_arith_neg() as nego was the last user. Signed-off-by: Nikunj A Dadhania --- target/ppc/translate.c | 23 ++- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/target/ppc/translate.c b/target/ppc/translate.c index 9fa3b5a..0168e1c 100644 --- a/target/ppc/translate.c +++ b/target/ppc/translate.c @@ -1473,14 +1473,6 @@ static void gen_subfic(DisasContext *ctx) } /* neg neg. nego nego. */ -static inline void gen_op_arith_neg(DisasContext *ctx, bool compute_ov) -{ -TCGv zero = tcg_const_tl(0); -gen_op_arith_subf(ctx, cpu_gpr[rD(ctx->opcode)], cpu_gpr[rA(ctx->opcode)], - zero, 0, 0, compute_ov, Rc(ctx->opcode)); -tcg_temp_free(zero); -} - static void gen_neg(DisasContext *ctx) { tcg_gen_neg_tl(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rA(ctx->opcode)]); @@ -1488,7 +1480,20 @@ static void gen_neg(DisasContext *ctx) static void gen_nego(DisasContext *ctx) { -gen_op_arith_neg(ctx, 1); +TCGv t0 = tcg_temp_new(); +TCGv zero = tcg_const_tl(0); + +if (NARROW_MODE(ctx)) { +tcg_gen_xori_tl(t0, cpu_gpr[rA(ctx->opcode)], INT32_MIN); +} else { +tcg_gen_xori_tl(t0, cpu_gpr[rA(ctx->opcode)], (target_ulong)INT64_MIN); +} + +tcg_gen_setcond_tl(TCG_COND_EQ, cpu_ov, t0, zero); +tcg_gen_mov_tl(cpu_ov32, cpu_ov); +tcg_gen_neg_tl(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rA(ctx->opcode)]); +tcg_temp_free(t0); +tcg_temp_free(zero); } /***Integer logical***/ -- 2.7.4
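The overflow condition for nego can be modeled directly (hypothetical helper names): negation overflows exactly for the one value whose negative is unrepresentable, which is what the xor-with-INT_MIN / compare-to-zero sequence in the patch tests.

```c
#include <assert.h>
#include <stdint.h>

/* nego overflow in 64-bit mode: RA == 0x8000_0000_0000_0000. */
static int nego_overflows64(int64_t ra)
{
    return (ra ^ INT64_MIN) == 0;
}

/* nego overflow in 32-bit mode: low 32 bits of RA == 0x8000_0000. */
static int nego_overflows32(int32_t ra)
{
    return (ra ^ INT32_MIN) == 0;
}
```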
[Qemu-devel] [PATCH v1 09/10] target/ppc: add ov32 flag in divide operations
Add helper_div_compute_ov() in int_helper.c for updating the overflow flags.

For Divide Word: the SO, OV, and OV32 bits reflect overflow of the 32-bit result.
For Divide DoubleWord: the SO, OV, and OV32 bits reflect overflow of the 64-bit result.

Signed-off-by: Nikunj A Dadhania
---
 target/ppc/int_helper.c | 49 -
 target/ppc/translate.c  |  6 --
 2 files changed, 20 insertions(+), 35 deletions(-)

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index dd0a892..34b54e1 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -28,6 +28,18 @@
 /*****************************************************************************/
 /* Fixed point operations helpers */
 
+static inline void helper_div_compute_ov(CPUPPCState *env, uint32_t oe,
+                                         int overflow)
+{
+    if (oe) {
+        if (unlikely(overflow)) {
+            env->so = env->ov = env->ov32 = 1;
+        } else {
+            env->ov = env->ov32 = 0;
+        }
+    }
+}
+
 target_ulong helper_divweu(CPUPPCState *env, target_ulong ra, target_ulong rb,
                            uint32_t oe)
 {
@@ -48,14 +60,7 @@ target_ulong helper_divweu(CPUPPCState *env, target_ulong ra, target_ulong rb,
         rt = 0; /* Undefined */
     }
 
-    if (oe) {
-        if (unlikely(overflow)) {
-            env->so = env->ov = 1;
-        } else {
-            env->ov = 0;
-        }
-    }
-
+    helper_div_compute_ov(env, oe, overflow);
     return (target_ulong)rt;
 }
 
@@ -80,14 +85,7 @@ target_ulong helper_divwe(CPUPPCState *env, target_ulong ra, target_ulong rb,
         rt = 0; /* Undefined */
     }
 
-    if (oe) {
-        if (unlikely(overflow)) {
-            env->so = env->ov = 1;
-        } else {
-            env->ov = 0;
-        }
-    }
-
+    helper_div_compute_ov(env, oe, overflow);
     return (target_ulong)rt;
 }
 
@@ -104,14 +102,7 @@ uint64_t helper_divdeu(CPUPPCState *env, uint64_t ra, uint64_t rb, uint32_t oe)
         rt = 0; /* Undefined */
     }
 
-    if (oe) {
-        if (unlikely(overflow)) {
-            env->so = env->ov = 1;
-        } else {
-            env->ov = 0;
-        }
-    }
-
+    helper_div_compute_ov(env, oe, overflow);
     return rt;
 }
 
@@ -126,15 +117,7 @@ uint64_t helper_divde(CPUPPCState *env, uint64_t rau, uint64_t rbu, uint32_t oe)
         rt = 0; /* Undefined */
     }
 
-    if (oe) {
-
-        if (unlikely(overflow)) {
-            env->so = env->ov = 1;
-        } else {
-            env->ov = 0;
-        }
-    }
-
+    helper_div_compute_ov(env, oe, overflow);
     return rt;
 }
 
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 69ec0b2..ee44205 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -1022,6 +1022,7 @@ static inline void gen_op_arith_divw(DisasContext *ctx, TCGv ret, TCGv arg1,
     }
     if (compute_ov) {
         tcg_gen_extu_i32_tl(cpu_ov, t2);
+        tcg_gen_extu_i32_tl(cpu_ov32, t2);
         tcg_gen_or_tl(cpu_so, cpu_so, cpu_ov);
     }
     tcg_temp_free_i32(t0);
@@ -1093,6 +1094,7 @@ static inline void gen_op_arith_divd(DisasContext *ctx, TCGv ret, TCGv arg1,
     }
     if (compute_ov) {
         tcg_gen_mov_tl(cpu_ov, t2);
+        tcg_gen_mov_tl(cpu_ov32, t2);
         tcg_gen_or_tl(cpu_so, cpu_so, cpu_ov);
     }
     tcg_temp_free_i64(t0);
@@ -,10 +1113,10 @@ static void glue(gen_, name)(DisasContext *ctx)
                      cpu_gpr[rA(ctx->opcode)], cpu_gpr[rB(ctx->opcode)],  \
                      sign, compute_ov);                                   \
 }
-/* divwu divwu. divwuo divwuo. */
+/* divdu divdu. divduo divduo. */
 GEN_INT_ARITH_DIVD(divdu, 0x0E, 0, 0);
 GEN_INT_ARITH_DIVD(divduo, 0x1E, 0, 1);
-/* divw divw. divwo divwo. */
+/* divd divd. divdo divdo. */
 GEN_INT_ARITH_DIVD(divd, 0x0F, 1, 0);
 GEN_INT_ARITH_DIVD(divdo, 0x1F, 1, 1);
-- 
2.7.4
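The refactored helper's behavior — including the fact that SO is sticky (set on overflow, never cleared by a later clean divide) — can be sketched in Python. This is an illustrative model of the C helper above, not QEMU code:

```python
def div_compute_ov(env, oe, overflow):
    """Model of helper_div_compute_ov(): when OE is set, an overflowing
    divide raises SO, OV and OV32 together; a non-overflowing divide
    clears OV/OV32 but leaves SO sticky. With OE clear, nothing changes."""
    if oe:
        if overflow:
            env['so'] = env['ov'] = env['ov32'] = 1
        else:
            env['ov'] = env['ov32'] = 0
    return env

env = {'so': 0, 'ov': 0, 'ov32': 0}
assert div_compute_ov(env, oe=1, overflow=1) == {'so': 1, 'ov': 1, 'ov32': 1}
# A subsequent clean divide clears OV/OV32, but SO stays set:
assert div_compute_ov(env, oe=1, overflow=0) == {'so': 1, 'ov': 0, 'ov32': 0}
```

For the divide instructions, OV32 simply mirrors OV (a 32-bit divide overflows as a 32-bit result, a 64-bit divide as a 64-bit result), which is why the translate.c hunks just copy t2 into cpu_ov32.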
[Qemu-devel] [PATCH v5 4/4] MAINTAINERS: merge Build and test automation with Docker tests
The docker framework is really just another piece in the build automation puzzle, so let's merge the two sections together. As an added bonus I've also included the Travis and Patchew status links. The Shippable links will be added later once mainline tests have been configured and set up.

Signed-off-by: Alex Bennée
Reviewed-by: Fam Zheng

---
v4
  - fix merge fail (.shippable.yml)
---
 MAINTAINERS | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6dcbebf072..d3782691dc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1801,10 +1801,14 @@ F: docs/block-replication.txt
 Build and test automation
 -------------------------
 M: Alex Bennée
+M: Fam Zheng
 L: qemu-devel@nongnu.org
-S: Supported
+S: Maintained
 F: .travis.yml
 F: .shippable.yml
+F: tests/docker/
+W: https://travis-ci.org/qemu/qemu
+W: http://patchew.org/QEMU/
 
 Documentation
 -------------
@@ -1813,9 +1817,3 @@ M: Daniel P. Berrange
 S: Odd Fixes
 F: docs/build-system.txt
 
-Docker testing
---------------
-Docker based testing framework and cases
-M: Fam Zheng
-S: Maintained
-F: tests/docker/
-- 
2.11.0
[Qemu-devel] [PATCH v5 3/4] .shippable.yml: new CI provider
Ostensibly Shippable offers a similar set of services as Travis. However they are focused on Docker container based work-flows so we can use our existing containers to run a few extra builds - in this case a bunch of cross-compiled targets on a Debian multiarch system. Signed-off-by: Alex Bennée Reviewed-by: Fam Zheng --- v3 - reduce matrix to armhf/arm64 which currently work - use the make docker-image-* build stanzas - add TARGET_LIST to each build v5 - add .shippable to MAINTAINER - drop centos6 build, shippable guests still need to be apt based --- .shippable.yml | 19 +++ MAINTAINERS| 1 + 2 files changed, 20 insertions(+) create mode 100644 .shippable.yml diff --git a/.shippable.yml b/.shippable.yml new file mode 100644 index 00..1a1fd7a91d --- /dev/null +++ b/.shippable.yml @@ -0,0 +1,19 @@ +language: c +env: + matrix: +- IMAGE=debian-armhf-cross + TARGET_LIST=arm-softmmu,arm-linux-user +- IMAGE=debian-arm64-cross + TARGET_LIST=aarch64-softmmu,aarch64-linux-user +build: + pre_ci: +- make docker-image-${IMAGE} + pre_ci_boot: +image_name: qemu +image_tag: ${IMAGE} +pull: false +options: "-e HOME=/root" + ci: +- unset CC +- ./configure ${QEMU_CONFIGURE_OPTS} --target-list=${TARGET_LIST} +- make -j2 diff --git a/MAINTAINERS b/MAINTAINERS index fb57d8eb45..6dcbebf072 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1804,6 +1804,7 @@ M: Alex Bennée L: qemu-devel@nongnu.org S: Supported F: .travis.yml +F: .shippable.yml Documentation - -- 2.11.0
[Qemu-devel] [PATCH v5 0/4] Docker cross-compile targets and user build support
Hi Fam,

Hopefully this is the final iteration. A couple of minor typo fixes and
your suggestions taken into account. I have also added some
review/testing tags from Philippe.

Alex.

Alex Bennée (4):
  tests/docker: add basic user mapping support
  new: debian docker targets for cross-compiling
  .shippable.yml: new CI provider
  MAINTAINERS: merge Build and test automation with Docker tests

 .shippable.yml                                     | 19
 MAINTAINERS                                        | 13 ++-
 tests/docker/Makefile.include                      |  6 ++
 tests/docker/common.rc                             |  2 +-
 tests/docker/docker.py                             | 16 --
 tests/docker/dockerfiles/debian-arm64-cross.docker | 15 +
 tests/docker/dockerfiles/debian-armhf-cross.docker | 15 +
 tests/docker/dockerfiles/debian.docker             | 25 ++
 8 files changed, 101 insertions(+), 10 deletions(-)
 create mode 100644 .shippable.yml
 create mode 100644 tests/docker/dockerfiles/debian-arm64-cross.docker
 create mode 100644 tests/docker/dockerfiles/debian-armhf-cross.docker
 create mode 100644 tests/docker/dockerfiles/debian.docker

-- 
2.11.0
[Qemu-devel] [PATCH v5 1/4] tests/docker: add basic user mapping support
Currently all docker builds are done by exporting a tarball to the docker container and running the build as the containers root user. Other use cases are possible however and it is possible to map a part of users file-system to the container. This is useful for example for doing cross-builds of arbitrary source trees. For this to work smoothly the container needs to have a user created that maps cleanly to the host system. This adds a -u option to the docker script so that: DEB_ARCH=armhf DEB_TYPE=stable ./tests/docker/docker.py build \ -u --include-executable=arm-linux-user/qemu-arm \ debian:armhf ./tests/docker/dockerfiles/debian-bootstrap.docker Will build a container that can then be run like: docker run --rm -it -v /home/alex/lsrc/qemu/risu.git/:/src \ --user=alex:alex -w /src/ debian:armhf \ sh -c "make clean && ./configure -s && make" All docker containers built will add the current user unless explicitly disabled by specifying NOUSER when invoking the Makefile: make docker-image-debian-armhf-cross NOUSER=1 Signed-off-by: Alex Bennée Reviewed-by: Fam Zheng Tested-by: Philippe Mathieu-Daudé Reviewed-by: Philippe Mathieu-Daudé --- v2 - write the useradd directly - change long option to --add-current-user v3 - images -> image's - add r-b - add USER to Makefile v4 - s/USER/NOUSER/ and default to on - fix the add-user code to skip if user already setup (for chained images) v5 - fix whitespace damage in Makefile - hide error messages from id when adding user - minor re-phrasing of NOUSER line --- tests/docker/Makefile.include | 2 ++ tests/docker/docker.py| 16 ++-- 2 files changed, 16 insertions(+), 2 deletions(-) diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include index 3f15d5aea8..3b5ffecb04 100644 --- a/tests/docker/Makefile.include +++ b/tests/docker/Makefile.include @@ -50,6 +50,7 @@ docker-image-%: $(DOCKER_FILES_DIR)/%.docker $(call quiet-command,\ $(SRC_PATH)/tests/docker/docker.py build qemu:$* $< \ $(if $V,,--quiet) $(if 
$(NOCACHE),--no-cache) \ + $(if $(NOUSER),,--add-current-user) \ $(if $(EXECUTABLE),--include-executable=$(EXECUTABLE)),\ "BUILD","$*") @@ -99,6 +100,7 @@ docker: @echo ' (default is 1)' @echo 'DEBUG=1 Stop and drop to shell in the created container' @echo ' before running the command.' + @echo 'NOUSER Define to disable adding current user to containers passwd.' @echo 'NOCACHE=1Ignore cache when build images.' @echo 'EXECUTABLE=Include executable in image.' diff --git a/tests/docker/docker.py b/tests/docker/docker.py index 37d83199e7..9fd32ab5fa 100755 --- a/tests/docker/docker.py +++ b/tests/docker/docker.py @@ -25,6 +25,7 @@ import signal from tarfile import TarFile, TarInfo from StringIO import StringIO from shutil import copy, rmtree +from pwd import getpwuid DEVNULL = open(os.devnull, 'wb') @@ -149,13 +150,21 @@ class Docker(object): labels = json.loads(resp)[0]["Config"].get("Labels", {}) return labels.get("com.qemu.dockerfile-checksum", "") -def build_image(self, tag, docker_dir, dockerfile, quiet=True, argv=None): +def build_image(self, tag, docker_dir, dockerfile, +quiet=True, user=False, argv=None): if argv == None: argv = [] tmp_df = tempfile.NamedTemporaryFile(dir=docker_dir, suffix=".docker") tmp_df.write(dockerfile) +if user: +uid = os.getuid() +uname = getpwuid(uid).pw_name +tmp_df.write("\n") +tmp_df.write("RUN id %s 2>/dev/null || useradd -u %d -U %s" % + (uname, uid, uname)) + tmp_df.write("\n") tmp_df.write("LABEL com.qemu.dockerfile-checksum=%s" % _text_checksum(dockerfile)) @@ -225,6 +234,9 @@ class BuildCommand(SubCommand): help="""Specify a binary that will be copied to the container together with all its dependent libraries""") +parser.add_argument("--add-current-user", "-u", dest="user", +action="store_true", +help="Add the current user to image's passwd") parser.add_argument("tag", help="Image Tag") parser.add_argument("dockerfile", @@ -261,7 +273,7 @@ class BuildCommand(SubCommand): docker_dir) dkr.build_image(tag, docker_dir, dockerfile, 
-quiet=args.quiet, argv=argv) +quiet=args.quiet, user=args.user, argv=argv) rmtree(docker_dir) -- 2.11.0
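The core of the user-mapping patch is the Dockerfile fragment that docker.py appends when -u/--add-current-user is given. A standalone Python sketch of that fragment generation (simplified from the build_image() hunk above):

```python
import os
from pwd import getpwuid

def user_mapping_snippet():
    """Generate the Dockerfile line the -u option appends: recreate the
    invoking host user inside the image so that bind-mounted source trees
    keep sensible ownership. The 'id' check makes the RUN a no-op when a
    chained image has already added the user."""
    uid = os.getuid()
    uname = getpwuid(uid).pw_name
    return "RUN id %s 2>/dev/null || useradd -u %d -U %s" % (uname, uid, uname)

snippet = user_mapping_snippet()
assert snippet.startswith("RUN id ")
assert "useradd -u" in snippet
```

Matching the UID (not just the name) to the host user is what makes `docker run -v $SRC:/src --user=$USER ...` produce files owned by the invoking user rather than root.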
[Qemu-devel] [PATCH v5 2/4] new: debian docker targets for cross-compiling
This provides a basic Debian install with access to the emdebian cross compilers. The debian-armhf-cross and debian-arm64-cross targets build on the basic Debian image to allow cross compiling to those targets. A new environment variable (QEMU_CONFIGURE_OPTS) is set as part of the docker container and passed to the build to specify the --cross-prefix. The user still calls the build in the usual way, for example: make docker-test-build@debian-arm64-cross \ TARGET_LIST="aarch64-softmmu,aarch64-linux-user" Signed-off-by: Alex Bennée Reviewed-by: Fam Zheng Reviewed-by: Philippe Mathieu-Daudé --- v2 - add clang (keep shippable happy) - rm adduser code (done direct now) - add aptitude (useful for debugging package clashes) v3 - split into debian, debian-armhf-cross and debian-aarch64-cross v4 - Add QEMU_CONFIGURE_OPTS v5 - s/dependacies/dependencies/ - add r-b --- tests/docker/Makefile.include | 4 tests/docker/common.rc | 2 +- tests/docker/dockerfiles/debian-arm64-cross.docker | 15 + tests/docker/dockerfiles/debian-armhf-cross.docker | 15 + tests/docker/dockerfiles/debian.docker | 25 ++ 5 files changed, 60 insertions(+), 1 deletion(-) create mode 100644 tests/docker/dockerfiles/debian-arm64-cross.docker create mode 100644 tests/docker/dockerfiles/debian-armhf-cross.docker create mode 100644 tests/docker/dockerfiles/debian.docker diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include index 3b5ffecb04..03eda37bf4 100644 --- a/tests/docker/Makefile.include +++ b/tests/docker/Makefile.include @@ -54,6 +54,10 @@ docker-image-%: $(DOCKER_FILES_DIR)/%.docker $(if $(EXECUTABLE),--include-executable=$(EXECUTABLE)),\ "BUILD","$*") +# Enforce dependancies for composite images +docker-image-debian-armhf-cross: docker-image-debian +docker-image-debian-arm64-cross: docker-image-debian + # Expand all the pre-requistes for each docker image and test combination $(foreach i,$(DOCKER_IMAGES), \ $(foreach t,$(DOCKER_TESTS) $(DOCKER_TOOLS), \ diff --git 
a/tests/docker/common.rc b/tests/docker/common.rc index 21657e87c6..6865689bb5 100755 --- a/tests/docker/common.rc +++ b/tests/docker/common.rc @@ -29,7 +29,7 @@ build_qemu() config_opts="--enable-werror \ ${TARGET_LIST:+--target-list=${TARGET_LIST}} \ --prefix=$PWD/install \ - $EXTRA_CONFIGURE_OPTS \ + $QEMU_CONFIGURE_OPTS $EXTRA_CONFIGURE_OPTS \ $@" echo "Configure options:" echo $config_opts diff --git a/tests/docker/dockerfiles/debian-arm64-cross.docker b/tests/docker/dockerfiles/debian-arm64-cross.docker new file mode 100644 index 00..592b5d7055 --- /dev/null +++ b/tests/docker/dockerfiles/debian-arm64-cross.docker @@ -0,0 +1,15 @@ +# +# Docker arm64 cross-compiler target +# +# This docker target builds on the base debian image. +# +FROM qemu:debian + +# Add the foreign architecture we want and install dependencies +RUN dpkg --add-architecture arm64 +RUN apt update +RUN apt install -yy crossbuild-essential-arm64 +RUN apt-get build-dep -yy -a arm64 qemu + +# Specify the cross prefix for this image (see tests/docker/common.rc) +ENV QEMU_CONFIGURE_OPTS --cross-prefix=aarch64-linux-gnu- diff --git a/tests/docker/dockerfiles/debian-armhf-cross.docker b/tests/docker/dockerfiles/debian-armhf-cross.docker new file mode 100644 index 00..668d60aeb3 --- /dev/null +++ b/tests/docker/dockerfiles/debian-armhf-cross.docker @@ -0,0 +1,15 @@ +# +# Docker armhf cross-compiler target +# +# This docker target builds on the base debian image. 
+# +FROM qemu:debian + +# Add the foreign architecture we want and install dependencies +RUN dpkg --add-architecture armhf +RUN apt update +RUN apt install -yy crossbuild-essential-armhf +RUN apt-get build-dep -yy -a armhf qemu + +# Specify the cross prefix for this image (see tests/docker/common.rc) +ENV QEMU_CONFIGURE_OPTS --cross-prefix=arm-linux-gnueabihf- diff --git a/tests/docker/dockerfiles/debian.docker b/tests/docker/dockerfiles/debian.docker new file mode 100644 index 00..52bd79938e --- /dev/null +++ b/tests/docker/dockerfiles/debian.docker @@ -0,0 +1,25 @@ +# +# Docker multiarch cross-compiler target +# +# This docker target is builds on Debian and Emdebian's cross compiler targets +# to build distro with a selection of cross compilers for building test binaries. +# +# On its own you can't build much but the docker-foo-cross targets +# build on top of the base debian image. +# +FROM debian:stable-slim + +# Setup some basic tools we need +RUN apt update +RUN apt install -yy curl aptitude + +# Setup Emdebian +RUN echo "deb http://emdebian.org/tools/debian/ jessie main" >> /etc/apt/sources.list +RUN curl http://emdebian.org/tools/debian/emdebian-toolchain-archive.key | apt-key add - + +# Duplicate deb line as deb-s
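To illustrate how the ENV line in the cross images takes effect, here is a simplified Python model of how build_qemu() in tests/docker/common.rc composes the configure call after this patch (illustrative only; the real shell function also passes --prefix and --enable-werror handling as shown in the diff):

```python
def configure_cmdline(qemu_configure_opts, extra_configure_opts="",
                      target_list=""):
    """Model of the configure invocation in tests/docker/common.rc:
    the image-baked QEMU_CONFIGURE_OPTS (e.g. the --cross-prefix set by
    the ENV line in the cross dockerfiles) is folded in ahead of any
    caller-supplied EXTRA_CONFIGURE_OPTS."""
    opts = ["--enable-werror"]
    if target_list:
        opts.append("--target-list=" + target_list)
    opts += qemu_configure_opts.split()
    opts += extra_configure_opts.split()
    return "./configure " + " ".join(opts)

cmd = configure_cmdline("--cross-prefix=aarch64-linux-gnu-",
                        target_list="aarch64-softmmu,aarch64-linux-user")
assert "--cross-prefix=aarch64-linux-gnu-" in cmd
assert cmd.startswith("./configure --enable-werror")
```

This is why the user can keep calling `make docker-test-build@debian-arm64-cross` exactly as before: the cross prefix rides along in the container environment rather than in the Makefile invocation.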
Re: [Qemu-devel] [PATCH v8 4/8] ACPI: Add Virtual Machine Generation ID support
* Laszlo Ersek (ler...@redhat.com) wrote: > On 02/20/17 11:23, Dr. David Alan Gilbert wrote: > > * Laszlo Ersek (ler...@redhat.com) wrote: > >> CC Dave > > > > This isn't an area I really understand; but if I'm > > reading this right then > >vmgenid is stored in fw_cfg? > >fw_cfg isn't migrated > > > > So why should any changes to it get migrated, except if it's already > > been read by the guest (and if the guest reads it again aftwards what's > > it expected to read?) > > This is what we have here: > - QEMU formats read-only fw_cfg blob with GUID > - guest downloads blob, places it in guest RAM > - guest tells QEMU the guest-side address of the blob > - during migration, guest RAM is transferred > - after migration, in the device's post_load callback, QEMU overwrites > the GUID in guest RAM with a different value, and injects an SCI > > I CC'd you for the following reason: Igor reported that he didn't see > either the fresh GUID or the SCI in the guest, on the target host, after > migration. I figured that perhaps there was an ordering issue between > RAM loading and post_load execution on the target host, and so I > proposed to delay the RAM overwrite + SCI injection a bit more; > following the pattern seen in your commit 90c647db8d59. > > However, since then, both Ben and myself tested the code with migration > (using "virsh save" (Ben) and "virsh managedsave" (myself)), with > Windows and Linux guests, and it works for us; there seems to be no > ordering issue with the current code (= overwrite RAM + inject SCI in > the post_load callback()). > > For now we don't understand why it doesn't work for Igor (Igor used > exec/gzip migration to/from a local file using direct QEMU monitor > commands / options, no libvirt). And, copying the pattern seen in your > commit 90c647db8d59 didn't help in his case (while it wasn't even > necessary for success in Ben's and my testing). 
One thing I noticed in Igor's test was that he did a 'stop' on the source before the migate, and so it's probably still paused on the destination after the migration is loaded, so anything the guest needs to do might not have happened until it's started. You say; 'guest tells QEMU the guest-side address of the blob' how is that stored/migrated/etc ? > So it seems that delaying the deed with > qemu_add_vm_change_state_handler() is neither needed nor effective in > this case; but then we still don't know why it doesn't work for Igor. Nod. Dave > > Thanks > Laszlo > > > > > Dave > > > >> On 02/17/17 11:43, Igor Mammedov wrote: > >>> On Thu, 16 Feb 2017 15:15:36 -0800 > >>> b...@skyportsystems.com wrote: > >>> > From: Ben Warren > > This implements the VM Generation ID feature by passing a 128-bit > GUID to the guest via a fw_cfg blob. > Any time the GUID changes, an ACPI notify event is sent to the guest > > The user interface is a simple device with one parameter: > - guid (string, must be "auto" or in UUID format > ----) > >>> I've given it some testing with WS2012R2 and v4 patches for Seabios, > >>> > >>> Windows is able to read initial GUID allocation and writeback > >>> seems to work somehow: > >>> > >>> (qemu) info vm-generation-id > >>> c109c09b-0e8b-42d5-9b33-8409c9dcd16c > >>> > >>> vmgenid client in Windows reads it as 2 following 64bit integers: > >>> 42d50e8bc109c09b:6cd1dcc90984339b > >>> > >>> However update path/restore from snapshot doesn't > >>> here is as I've tested it: > >>> > >>> qemu-system-x86_64 -device vmgenid,id=testvgid,guid=auto -monitor stdio > >>> (qemu) info vm-generation-id > >>> c109c09b-0e8b-42d5-9b33-8409c9dcd16c > >>> (qemu) stop > >>> (qemu) migrate "exec:gzip -c > STATEFILE.gz" > >>> (qemu) quit > >>> > >>> qemu-system-x86_64 -device vmgenid,id=testvgid,guid=auto -monitor stdio > >>> -incoming "exec: gzip -c -d STATEFILE.gz" > >>> (qemu) info vm-generation-id > >>> 28b587fa-991b-4267-80d7-9cf28b746fe9 > >>> > >>> guest > >>> 1. 
doesn't get GPE notification that it must receive > >>> 2. vmgenid client in Windows reads the same value > >>> 42d50e8bc109c09b:6cd1dcc90984339b > >> > >> Hmmm, I wonder if we need something like this, in vmgenid_post_load(): > >> > >> commit 90c647db8d59e47c9000affc0d81754eb346e939 > >> Author: Dr. David Alan Gilbert > >> Date: Fri Apr 15 12:41:30 2016 +0100 > >> > >> Fix pflash migration > >> > >> with the idea being that in a single device's post_load callback, we > >> shouldn't perform machine-wide actions (post_load is likely for fixing > >> up the device itself). If machine-wide actions are necessary, we should > >> temporarily register a "vm change state handler", and do the thing once > >> that handler is called (when the machine has been loaded fully and is > >> about to continue execution). > >> > >> Can you please try the attached patch on top? (Build tested only.) > >> > >> Thanks! > >> Laszlo > > > >> diff --git a/include/hw/acpi/vmgen
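Incidentally, the two 64-bit integers Igor quotes are consistent with the GUID being stored in Microsoft's mixed-endian GUID layout and read back by the Windows vmgenid client as two little-endian 64-bit words. A Python sketch of that byte-order relationship (an illustration only, not a statement about QEMU's actual fw_cfg blob code):

```python
import struct
import uuid

def windows_view(guid_str):
    """Show a GUID the way the Windows vmgenid client prints it, assuming
    the in-memory blob uses the Microsoft mixed-endian layout (Python's
    bytes_le) and is read as two little-endian 64-bit integers."""
    blob = uuid.UUID(guid_str).bytes_le
    lo, hi = struct.unpack("<QQ", blob)
    return "%x:%x" % (lo, hi)

# Matches the value pair Igor observed in the guest:
assert windows_view("c109c09b-0e8b-42d5-9b33-8409c9dcd16c") == \
    "42d50e8bc109c09b:6cd1dcc90984339b"
```

So the "same value after migration" symptom really does mean the guest RAM copy of the GUID was never overwritten, not that the client is misreading it.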
Re: [Qemu-devel] [PATCH v8 1/2] block/vxhs.c: Add support for a new block device type called "vxhs"
On Sat, Feb 18, 2017 at 12:30:31AM +, Ketan Nilangekar wrote: > On 2/17/17, 1:42 PM, "Jeff Cody" wrote: > > On Thu, Feb 16, 2017 at 02:24:19PM -0800, ashish mittal wrote: > > Hi, > > > > I am getting the following error with checkpatch.pl > > > > ERROR: externs should be avoided in .c files > > #78: FILE: block/vxhs.c:28: > > +QemuUUID qemu_uuid __attribute__ ((weak)); > > > > Is there any way to get around this, or does it mean that I would have > > to add a vxhs.h just for this one entry? > > > > I remain skeptical on the use of the qemu_uuid as a way to select the TLS > cert. > > [ketan] > Is there another identity that can be used for uniquely identifying instances? > The requirement was to enforce vdisk access to owner instances. The qemu_uuid weak attribute looks suspect. What is going to provide a strong qemu_uuid symbol? Why aren't configuration parameters like the UUID coming from the QEMU command-line? Stefan signature.asc Description: PGP signature
Re: [Qemu-devel] [RFC PATCH 16/41] block: Add error parameter to blk_insert_bs()
On 13.02.2017 18:22, Kevin Wolf wrote: > Mow that blk_insert_bs() requests the BlockBackend permissions for the > node it attaches to, it can fail. Instead of aborting, pass the errors > to the callers. So it does qualify as a FIXME, but just for a single patch. Good. :-) > Signed-off-by: Kevin Wolf > --- > block/backup.c | 5 - > block/block-backend.c| 12 > block/commit.c | 38 ++ > block/mirror.c | 15 --- > blockdev.c | 6 +- > blockjob.c | 7 ++- > hmp.c| 6 +- > hw/core/qdev-properties-system.c | 7 ++- > include/sysemu/block-backend.h | 2 +- > migration/block.c| 2 +- > nbd/server.c | 6 +- > tests/test-blockjob.c| 2 +- > 12 files changed, 84 insertions(+), 24 deletions(-) [...] > diff --git a/migration/block.c b/migration/block.c > index 6b7ffd4..d259936 100644 > --- a/migration/block.c > +++ b/migration/block.c > @@ -446,7 +446,7 @@ static void init_blk_migration(QEMUFile *f) > BlockDriverState *bs = bmds_bs[i].bs; > > if (bmds) { > -blk_insert_bs(bmds->blk, bs); > +blk_insert_bs(bmds->blk, bs, &error_abort); I don't think it's obvious why this is correct. I assume it is because this was a legal configuration on the source, so it must be a legal configuration on the destination. But I'd certainly appreciate a comment to make that explicitly clear, especially considering that it isn't obvious that blk_insert_bs() can fail only because of op blockers and thus will always work if the configuration is known to be legal. Max > > alloc_aio_bitmap(bmds); > error_setg(&bmds->blocker, "block device is in use by > migration"); signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [Help] Windows2012 as Guest 64+cores on KVM Halts
On 20/02/2017 10:19, Gonglei (Arei) wrote: > Hi Paolo, > >> >> >> On 16/02/2017 02:31, Gonglei (Arei) wrote: >>> And the below patch works for me, I can support max 255 vcpus for WS2012 >>> with hyper-v enlightenments. >>> >>> diff --git a/target/i386/kvm.c b/target/i386/kvm.c >>> index 27fd050..efe3cbc 100644 >>> --- a/target/i386/kvm.c >>> +++ b/target/i386/kvm.c >>> @@ -772,7 +772,7 @@ int kvm_arch_init_vcpu(CPUState *cs) >>> >>> c = &cpuid_data.entries[cpuid_i++]; >>> c->function = HYPERV_CPUID_IMPLEMENT_LIMITS; >>> -c->eax = 0x40; >>> +c->eax = -1; >>> c->ebx = 0x40; >>> >>> kvm_base = KVM_CPUID_SIGNATURE_NEXT; >> >> This needs to depend on the machine type, but apart from that I think > > I don't know why. Because the negative effects for this change don't exist > on current QEMU IIUC, and we don't have compatible problems for live > migration. CPUID should never change with the same machine type and command line. Paolo >> you should submit the patch for 2.9. >> > > Thanks, > -Gonglei >
Re: [Qemu-devel] Estimation of qcow2 image size converted from raw image
On Wed, Feb 15, 2017 at 05:49:58PM +0200, Nir Soffer wrote: > On Wed, Feb 15, 2017 at 5:14 PM, Stefan Hajnoczi wrote: > > On Mon, Feb 13, 2017 at 05:46:19PM +0200, Maor Lipchuk wrote: > >> I was wondering if that is possible to provide a new API that > >> estimates the size of > >> qcow2 image converted from a raw image. We could use this new API to > >> allocate the > >> size more precisely before the convert operation. > >> > > [...] > >> We think that the best way to solve this issue is to return this info > >> from qemu-img, maybe as a flag to qemu-img convert that will > >> calculate the size of the converted image without doing any writes. > > > > Sounds reasonable. qcow2 actually already does some of this calculation > > internally for image preallocation in qcow2_create2(). > > > > Let's try this syntax: > > > > $ qemu-img query-max-size -f raw -O qcow2 input.raw > > 1234678000 > > This is little bit verbose compared to other commands > (e.g. info, check, convert) > > Since this is needed only during convert, maybe this can be > a convert flag? > > qemu-img convert -f xxx -O yyy src dst --estimate-size --output json > { > "estimated size": 1234678000 > } What is dst? It's a dummy argument. Let's not try to shoehorn this new sub-command into qemu-img convert. > > As John explained, it is only an estimate. But it will be a > > conservative maximum. > > > > Internally BlockDriver needs a new interface: > > > > struct BlockDriver { > > /* > > * Return a conservative estimate of the maximum host file size > > * required by a new image given an existing BlockDriverState (not > > * necessarily opened with this BlockDriver). > > */ > > uint64_t (*bdrv_query_max_size)(BlockDriverState *other_bs, > > Error **errp); > > }; > > > > This interface allows individual block drivers to probe other_bs in > > whatever way necessary (e.g. querying block allocation status). 
> > > > Since this is a conservative max estimate there's no need to read all > > data to check for zero regions. We should give the best estimate that > > can be generated quickly. > > I think we need to check allocation (e.g. with SEEK_DATA), I hope this > is what you mean by not read all data. Yes, allocation data must be checked. But it will not read data clusters from disk to check if they contains only zeroes. Stefan signature.asc Description: PGP signature
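For a rough idea of what such a conservative estimate involves, here is a hypothetical Python sketch of an upper bound for a fully allocated qcow2 image with default settings (64 KiB clusters, 8-byte L1/L2 entries, 16-bit refcounts). It is emphatically not qemu-img's actual algorithm — it ignores allocation holes, compression, and internal snapshots — just the shape of the metadata arithmetic:

```python
def qcow2_max_size(virtual_size, cluster_bits=16):
    """Conservative upper bound on host file size for a fully allocated
    qcow2 image: data clusters plus L1/L2 tables, refcount structures,
    and the header, each rounded up to whole clusters."""
    cluster_size = 1 << cluster_bits
    clusters = -(-virtual_size // cluster_size)      # ceil division
    l2_entries = cluster_size // 8                   # 8-byte L2 entries
    l2_tables = -(-clusters // l2_entries)
    l1_clusters = -(-(l2_tables * 8) // cluster_size)
    refblock_entries = cluster_size // 2             # 16-bit refcounts
    refblocks = -(-clusters // refblock_entries)
    meta_clusters = (1                               # header
                     + l1_clusters + l2_tables
                     + 1 + refblocks)                # refcount table + blocks
    return (clusters + meta_clusters) * cluster_size

# A 1 GiB image needs slightly more than 1 GiB once metadata is counted:
est = qcow2_max_size(1 << 30)
assert est > (1 << 30)
assert est < (1 << 30) + (16 << 20)
```

Note the bound stops being valid the moment savevm or internal snapshots are used, which is exactly the caveat raised later in this thread.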
Re: [Qemu-devel] Estimation of qcow2 image size converted from raw image
On Wed, Feb 15, 2017 at 04:07:43PM +, Daniel P. Berrange wrote: > On Wed, Feb 15, 2017 at 05:57:12PM +0200, Nir Soffer wrote: > > On Wed, Feb 15, 2017 at 5:20 PM, Daniel P. Berrange > > wrote: > > > On Wed, Feb 15, 2017 at 03:14:19PM +, Stefan Hajnoczi wrote: > > >> On Mon, Feb 13, 2017 at 05:46:19PM +0200, Maor Lipchuk wrote: > > >> > I was wondering if that is possible to provide a new API that > > >> > estimates the size of > > >> > qcow2 image converted from a raw image. We could use this new API to > > >> > allocate the > > >> > size more precisely before the convert operation. > > >> > > > >> [...] > > >> > We think that the best way to solve this issue is to return this info > > >> > from qemu-img, maybe as a flag to qemu-img convert that will > > >> > calculate the size of the converted image without doing any writes. > > >> > > >> Sounds reasonable. qcow2 actually already does some of this calculation > > >> internally for image preallocation in qcow2_create2(). > > >> > > >> Let's try this syntax: > > >> > > >> $ qemu-img query-max-size -f raw -O qcow2 input.raw > > >> 1234678000 > > >> > > >> As John explained, it is only an estimate. But it will be a > > >> conservative maximum. > > > > > > This forces you to have an input file. It would be nice to be able to > > > get the same information by merely giving the desired capacity e.g > > > > > > $ qemu-img query-max-size -O qcow2 20G > > > > Without a file, this will have to assume that all clusters will be > > allocated. > > > > Do you have a use case for not using existing file? > > If you want to format a new qcow2 file in a pre-created block device you > want to know how big the block device should be. Or you want to validate > that the filesystem you're about to created it in will not become > overrcomitted wrt pre-existing guests. 
So you need to consider the FS > free space, vs query-max-size for all existing guest, combined with > query-max-size for the new disk you wan tto create QEMU can certainly provide a --size 20G mode which returns data size (20G) + metadata size. Of course, an empty qcow2 file will start off much smaller since no data clusters are in use yet. It's worth remembering that operations like savem and internal snapshots can increase image size beyond the conservative max estimate. So this estimate isn't an upper bound for the future, just an upper bound for qemu-img convert. Stefan signature.asc Description: PGP signature
Re: [Qemu-devel] [PATCH v15 08/25] block: introduce auto-loading bitmaps
Am 18.02.2017 um 11:54 hat Denis V. Lunev geschrieben: > On 02/17/2017 03:54 PM, Vladimir Sementsov-Ogievskiy wrote: > > 17.02.2017 17:24, Kevin Wolf wrote: > >> Am 17.02.2017 um 14:48 hat Denis V. Lunev geschrieben: > >>> On 02/17/2017 04:34 PM, Kevin Wolf wrote: > Am 17.02.2017 um 14:22 hat Denis V. Lunev geschrieben: > > But for sure this is bad from the downtime point of view. > > On migrate you will have to write to the image and re-read > > it again on the target. This would be very slow. This will > > not help for the migration with non-shared disk too. > > > > That is why we have specifically worked in a migration, > > which for a good does not influence downtime at all now. > > > > With a write we are issuing several write requests + sync. > > Our measurements shows that bdrv_drain could take around > > a second on an averagely loaded conventional system, which > > seems unacceptable addition to me. > I'm not arguing against optimising migration, I fully agree with > you. I > just think that we should start with a correct if slow base version > and > then add optimisation to that, instead of starting with a broken base > version and adding to that. > > Look, whether you do the expensive I/O on open/close and make that a > slow operation or whether you do it on invalidate_cache/inactivate > doesn't really make a difference in term of slowness because in > general > both operations are called exactly once. But it does make a difference > in terms of correctness. > > Once you do the optimisation, of course, you'll skip writing those > bitmaps that you transfer using a different channel, no matter whether > you skip it in bdrv_close() or in bdrv_inactivate(). > > Kevin > >>> I do not understand this point as in order to optimize this > >>> we will have to create specific code path or option from > >>> the migration code and keep this as an ugly kludge forever. 
> >> The point that I don't understand is why it makes any difference for the > >> follow-up migration series whether the writeout is in bdrv_close() or > >> bdrv_inactivate(). I don't really see the difference between the two > >> from a migration POV; both need to be skipped if we transfer the bitmap > >> using a different channel. > >> > >> Maybe I would see the reason if I could find the time to look at the > >> migration patches first, but unfortunately I don't have this time at the > >> moment. > >> > >> My point is just that generally we want to have a correctly working qemu > >> after every single patch, and even more importantly after every series. > >> As the migration series is separate from this, I don't think it's a good > >> excuse for doing worse than we could easily do here. > >> > >> Kevin > > > > With open/close all is already ok - bitmaps will not be saved because > > of BDRV_O_INACTIVE and will not be loaded because of IN_USE. If the contents of so-called persistent bitmaps is lost across migration and stale after bdrv_invalidate_cache, that's not "all ok" in my book. It's buggy. > in any case load/store happens outside of VM downtime. > The target is running at the moment of close on source, > the source is running at the moment of open on target. Which makes the operation in open/close meaningless: open reads data which may still by changed, and close cannot write to the image any more because ownership is already given up. Anyway, all of this isn't really productive. Please, can't you just answer that simple question that I asked you above: What problems would you get if you used invalidate_cache/inactivate instead of open/close for triggering these actions? If you can bring up some "this would break X", "it would be very hard to implement Y correctly then" or "in scenario Z, this would have unwanted effects", then we can figure out what to do. 
But not putting things in the proper place just because you don't feel like
it isn't a very strong argument.

Kevin
Re: [Qemu-devel] [RFC PATCH 17/41] block: Request real permissions in blk_new_open()
On 13.02.2017 18:22, Kevin Wolf wrote:
> We can figure out the necessary permissions from the flags that the
> caller passed.
> 
> Signed-off-by: Kevin Wolf
> ---
>  block/block-backend.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 1f80854..e10a278 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -160,6 +160,7 @@ BlockBackend *blk_new_open(const char *filename, const char *reference,
>  {
>      BlockBackend *blk;
>      BlockDriverState *bs;
> +    uint64_t perm;
> 
>      blk = blk_new(0, BLK_PERM_ALL);

What about this?

Max

>      bs = bdrv_open(filename, reference, options, flags, errp);
> @@ -168,9 +169,20 @@ BlockBackend *blk_new_open(const char *filename, const char *reference,
>          return NULL;
>      }
> 
> -    /* FIXME Use real permissions */
> +    /* blk_new_open() is mainly used in .bdrv_create implementations and the
> +     * tools where sharing isn't a concern because the BDS stays private, so we
> +     * just request permission according to the flags.
> +     *
> +     * The exceptions are xen_disk and blockdev_init(); in these cases, the
> +     * caller of blk_new_open() doesn't make use of the permissions, but they
> +     * shouldn't hurt either. We can still share everything here because the
> +     * guest devices will add their own blockers if they can't share. */
> +    perm = BLK_PERM_CONSISTENT_READ;
> +    if (flags & BDRV_O_RDWR) {
> +        perm |= BLK_PERM_WRITE;
> +    }
>      blk->root = bdrv_root_attach_child(bs, "root", &child_root,
> -                                       0, BLK_PERM_ALL, blk, &error_abort);
> +                                       perm, BLK_PERM_ALL, blk, &error_abort);
> 
>      return blk;
>  }
Re: [Qemu-devel] [RFC PATCH 16/41] block: Add error parameter to blk_insert_bs()
Am 20.02.2017 um 12:04 hat Max Reitz geschrieben:
> On 13.02.2017 18:22, Kevin Wolf wrote:
> > Now that blk_insert_bs() requests the BlockBackend permissions for the
> > node it attaches to, it can fail. Instead of aborting, pass the errors
> > to the callers.
> 
> So it does qualify as a FIXME, but just for a single patch. Good. :-)
> 
> > Signed-off-by: Kevin Wolf
> > ---
> >  block/backup.c                   |  5 -
> >  block/block-backend.c            | 12
> >  block/commit.c                   | 38 ++
> >  block/mirror.c                   | 15 ---
> >  blockdev.c                       |  6 +-
> >  blockjob.c                       |  7 ++-
> >  hmp.c                            |  6 +-
> >  hw/core/qdev-properties-system.c |  7 ++-
> >  include/sysemu/block-backend.h   |  2 +-
> >  migration/block.c                |  2 +-
> >  nbd/server.c                     |  6 +-
> >  tests/test-blockjob.c            |  2 +-
> >  12 files changed, 84 insertions(+), 24 deletions(-)
> 
> [...]
> 
> > diff --git a/migration/block.c b/migration/block.c
> > index 6b7ffd4..d259936 100644
> > --- a/migration/block.c
> > +++ b/migration/block.c
> > @@ -446,7 +446,7 @@ static void init_blk_migration(QEMUFile *f)
> >          BlockDriverState *bs = bmds_bs[i].bs;
> > 
> >          if (bmds) {
> > -            blk_insert_bs(bmds->blk, bs);
> > +            blk_insert_bs(bmds->blk, bs, &error_abort);
> 
> I don't think it's obvious why this is correct. I assume it is because
> this was a legal configuration on the source, so it must be a legal
> configuration on the destination.
> 
> But I'd certainly appreciate a comment to make that explicitly clear,
> especially considering that it isn't obvious that blk_insert_bs() can
> fail only because of op blockers and thus will always work if the
> configuration is known to be legal.

Actually, it's just because the requested permissions for bmds->blk are
still 0, BLK_PERM_ALL here. Once the FIXME is addressed (patch 37) and
the real permissions are requested, we get some error handling here.

Kevin
[Qemu-devel] [PATCH] crypto: assert cipher algorithm is always valid
From: Prasad J Pandit

Crypto routines 'qcrypto_cipher_get_block_len' and
'qcrypto_cipher_get_key_len' return non-zero cipher block and key
lengths from static arrays 'alg_block_len[]' and 'alg_key_len[]'
respectively. Returning a 'zero(0)' value from either of them would
likely lead to an error condition.

Signed-off-by: Prasad J Pandit
---
 crypto/cipher.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/crypto/cipher.c b/crypto/cipher.c
index 9ecaff7..5a96489 100644
--- a/crypto/cipher.c
+++ b/crypto/cipher.c
@@ -63,18 +63,14 @@ static bool mode_need_iv[QCRYPTO_CIPHER_MODE__MAX] = {
 
 size_t qcrypto_cipher_get_block_len(QCryptoCipherAlgorithm alg)
 {
-    if (alg >= G_N_ELEMENTS(alg_key_len)) {
-        return 0;
-    }
+    assert(alg < G_N_ELEMENTS(alg_key_len));
     return alg_block_len[alg];
 }
 
 
 size_t qcrypto_cipher_get_key_len(QCryptoCipherAlgorithm alg)
 {
-    if (alg >= G_N_ELEMENTS(alg_key_len)) {
-        return 0;
-    }
+    assert(alg < G_N_ELEMENTS(alg_key_len));
     return alg_key_len[alg];
 }
-- 
2.9.3
Re: [Qemu-devel] [RFC PATCH 16/41] block: Add error parameter to blk_insert_bs()
On 20.02.2017 12:22, Kevin Wolf wrote:
> Am 20.02.2017 um 12:04 hat Max Reitz geschrieben:
>> On 13.02.2017 18:22, Kevin Wolf wrote:
>>> Now that blk_insert_bs() requests the BlockBackend permissions for the
>>> node it attaches to, it can fail. Instead of aborting, pass the errors
>>> to the callers.
>>
>> So it does qualify as a FIXME, but just for a single patch. Good. :-)
>>
>>> Signed-off-by: Kevin Wolf
>>> ---
>>>  block/backup.c                   |  5 -
>>>  block/block-backend.c            | 12
>>>  block/commit.c                   | 38 ++
>>>  block/mirror.c                   | 15 ---
>>>  blockdev.c                       |  6 +-
>>>  blockjob.c                       |  7 ++-
>>>  hmp.c                            |  6 +-
>>>  hw/core/qdev-properties-system.c |  7 ++-
>>>  include/sysemu/block-backend.h   |  2 +-
>>>  migration/block.c                |  2 +-
>>>  nbd/server.c                     |  6 +-
>>>  tests/test-blockjob.c            |  2 +-
>>>  12 files changed, 84 insertions(+), 24 deletions(-)
>>
>> [...]
>>
>>> diff --git a/migration/block.c b/migration/block.c
>>> index 6b7ffd4..d259936 100644
>>> --- a/migration/block.c
>>> +++ b/migration/block.c
>>> @@ -446,7 +446,7 @@ static void init_blk_migration(QEMUFile *f)
>>>          BlockDriverState *bs = bmds_bs[i].bs;
>>> 
>>>          if (bmds) {
>>> -            blk_insert_bs(bmds->blk, bs);
>>> +            blk_insert_bs(bmds->blk, bs, &error_abort);
>>
>> I don't think it's obvious why this is correct. I assume it is because
>> this was a legal configuration on the source, so it must be a legal
>> configuration on the destination.
>>
>> But I'd certainly appreciate a comment to make that explicitly clear,
>> especially considering that it isn't obvious that blk_insert_bs() can
>> fail only because of op blockers and thus will always work if the
>> configuration is known to be legal.
> 
> Actually, it's just because the requested permissions for bmds->blk are
> still 0, BLK_PERM_ALL here. Once the FIXME is addressed (patch 37) and
> the real permissions are requested, we get some error handling here.

OK, good, thanks.

Max
Re: [Qemu-devel] [RFC PATCH 18/41] block: Allow error return in BlockDevOps.change_media_cb()
On 13.02.2017 18:22, Kevin Wolf wrote:
> Some devices allow a media change between read-only and read-write
> media. They need to adapt the permissions in their .change_media_cb()
> implementation, which can fail. So add an Error parameter to the
> function.
> 
> Signed-off-by: Kevin Wolf
> ---
>  block/block-backend.c          | 12 +---
>  blockdev.c                     | 19 +++
>  hw/block/fdc.c                 |  2 +-
>  hw/ide/core.c                  |  2 +-
>  hw/scsi/scsi-disk.c            |  2 +-
>  hw/sd/sd.c                     |  2 +-
>  include/block/block_int.h      |  2 +-
>  include/sysemu/block-backend.h |  2 +-
>  8 files changed, 30 insertions(+), 13 deletions(-)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index e10a278..0c23add 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -671,15 +671,21 @@ void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops,
>   * Else, notify of media eject.
>   * Also send DEVICE_TRAY_MOVED events as appropriate.
>   */
> -void blk_dev_change_media_cb(BlockBackend *blk, bool load)
> +void blk_dev_change_media_cb(BlockBackend *blk, bool load, Error **errp)

May deserve a comment that this function never fails with load == false.
The assertion is not too hidden, but it's not extremely visible either.

Max

> {
>     if (blk->dev_ops && blk->dev_ops->change_media_cb) {
>         bool tray_was_open, tray_is_open;
> +        Error *local_err = NULL;
> 
>         assert(!blk->legacy_dev);
> 
>         tray_was_open = blk_dev_is_tray_open(blk);
> -        blk->dev_ops->change_media_cb(blk->dev_opaque, load);
> +        blk->dev_ops->change_media_cb(blk->dev_opaque, load, &local_err);
> +        if (local_err) {
> +            assert(load == true);
> +            error_propagate(errp, local_err);
> +            return;
> +        }
>         tray_is_open = blk_dev_is_tray_open(blk);
> 
>         if (tray_was_open != tray_is_open) {
Re: [Qemu-devel] Missing rewrite rule for qemu.org/images/ links
On Thu, Feb 16, 2017 at 12:07:02PM -0500, Jeff Cody wrote:
> On Thu, Feb 16, 2017 at 10:47:07AM -0500, Jeff Cody wrote:
> > On Thu, Feb 16, 2017 at 03:30:25PM +0000, Stefan Hajnoczi wrote:
> > > Hi Jeff,
> > > The old qemu.org/images/ links are not being rewritten to
> > > wiki.qemu-project.org/images/.
> > > 
> > > For example, http://www.qemu-project.org/images/4/4e/Q35.pdf no longer
> > > works:
> > > https://bugs.launchpad.net/qemu/+bug/1665344
> > > 
> > > I do notice that qemu.org/Features/Q35 is working. Is it possible to
> > > set up a rewrite rule or HTTP redirect so existing /images/ links keep
> > > working?
> > > 
> > > Thanks,
> > > Stefan
> > 
> > Thanks for noticing, and pointing that out. I'll get right on that.
> 
> Fixed now.

Excellent, thanks for the quick fix!

Stefan
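The thread doesn't say how the fix was implemented or even which web server qemu.org runs. Purely as an illustration of the kind of rule Stefan asked for, here is a hypothetical Apache httpd fragment (all directives and hostnames here are assumptions about the setup, not a description of the actual fix):

```apache
# Hypothetical sketch -- assumes Apache httpd in front of qemu.org and that
# wiki uploads now live under wiki.qemu-project.org/images/.

# Simple prefix redirect via mod_alias:
Redirect permanent /images/ https://wiki.qemu-project.org/images/

# Equivalent mod_rewrite form, if other rewrite rules already exist:
RewriteEngine On
RewriteRule ^/?images/(.*)$ https://wiki.qemu-project.org/images/$1 [R=301,L]
```

Either form preserves the path suffix, so a link like /images/4/4e/Q35.pdf keeps working after the move.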
Re: [Qemu-devel] [PATCH v8 1/2] block/vxhs.c: Add support for a new block device type called "vxhs"
On Mon, Feb 20, 2017 at 3:02 AM, Stefan Hajnoczi wrote:
> On Sat, Feb 18, 2017 at 12:30:31AM +0000, Ketan Nilangekar wrote:
>> On 2/17/17, 1:42 PM, "Jeff Cody" wrote:
>>
>>     On Thu, Feb 16, 2017 at 02:24:19PM -0800, ashish mittal wrote:
>>     > Hi,
>>     >
>>     > I am getting the following error with checkpatch.pl
>>     >
>>     > ERROR: externs should be avoided in .c files
>>     > #78: FILE: block/vxhs.c:28:
>>     > +QemuUUID qemu_uuid __attribute__ ((weak));
>>     >
>>     > Is there any way to get around this, or does it mean that I would
>>     > have to add a vxhs.h just for this one entry?
>>
>>     I remain skeptical on the use of the qemu_uuid as a way to select
>>     the TLS cert.
>>
>> [ketan]
>> Is there another identity that can be used for uniquely identifying
>> instances?
>> The requirement was to enforce vdisk access to owner instances.
>
> The qemu_uuid weak attribute looks suspect. What is going to provide a
> strong qemu_uuid symbol?
>
> Why aren't configuration parameters like the UUID coming from the QEMU
> command-line?
>
> Stefan

The UUID will in fact come from the QEMU command line. VxHS is not doing
anything special here. It will just use the value already available to the
qemu-kvm process.

QemuUUID qemu_uuid;
bool qemu_uuid_set;

Both of the above are defined in vl.c, and vl.c will provide the strong
symbol when available. There are certain binaries that do not get linked
with vl.c (e.g. qemu-img). The weak symbol will come into effect for such
binaries, and in this case, the default VXHS UUID will get picked up.

I had, in a previous email, explained how we plan to use the default UUID.
In the regular case, the VxHS controller will not allow access to the
default UUID (non qemu-kvm) binaries, but it may choose to grant temporary
access to specific vdisks for these binaries depending on the workflow.

Regards,
Ashish
Re: [Qemu-devel] [PATCH v8 4/8] ACPI: Add Virtual Machine Generation ID support
On 02/20/17 12:00, Dr. David Alan Gilbert wrote:
> * Laszlo Ersek (ler...@redhat.com) wrote:
>> On 02/20/17 11:23, Dr. David Alan Gilbert wrote:
>>> * Laszlo Ersek (ler...@redhat.com) wrote:
>>>> CC Dave
>>>
>>> This isn't an area I really understand; but if I'm reading this
>>> right then
>>>    vmgenid is stored in fw_cfg?
>>>    fw_cfg isn't migrated
>>>
>>> So why should any changes to it get migrated, except if it's already
>>> been read by the guest (and if the guest reads it again afterwards
>>> what's it expected to read?)
>>
>> This is what we have here:
>> - QEMU formats a read-only fw_cfg blob with the GUID
>> - guest downloads the blob, places it in guest RAM
>> - guest tells QEMU the guest-side address of the blob
>> - during migration, guest RAM is transferred
>> - after migration, in the device's post_load callback, QEMU overwrites
>>   the GUID in guest RAM with a different value, and injects an SCI
>>
>> I CC'd you for the following reason: Igor reported that he didn't see
>> either the fresh GUID or the SCI in the guest, on the target host, after
>> migration. I figured that perhaps there was an ordering issue between
>> RAM loading and post_load execution on the target host, and so I
>> proposed to delay the RAM overwrite + SCI injection a bit more,
>> following the pattern seen in your commit 90c647db8d59.
>>
>> However, since then, both Ben and myself tested the code with migration
>> (using "virsh save" (Ben) and "virsh managedsave" (myself)), with
>> Windows and Linux guests, and it works for us; there seems to be no
>> ordering issue with the current code (= overwrite RAM + inject SCI in
>> the post_load callback).
>>
>> For now we don't understand why it doesn't work for Igor (Igor used
>> exec/gzip migration to/from a local file using direct QEMU monitor
>> commands / options, no libvirt). And copying the pattern seen in your
>> commit 90c647db8d59 didn't help in his case (while it wasn't even
>> necessary for success in Ben's and my testing).
> 
> One thing I noticed in Igor's test was that he did a 'stop' on the source
> before the migrate, and so it's probably still paused on the destination
> after the migration is loaded, so anything the guest needs to do might
> not have happened until it's started.

Interesting! I hope Igor can double-check this!

In the virsh docs, before doing my tests, I read that "managedsave"
optionally took --running or --paused:

    Normally, starting a managed save will decide between running or
    paused based on the state the domain was in when the save was done;
    passing either the --running or --paused flag will allow overriding
    which state the start should use.

I didn't pass any such flag ultimately, and I didn't stop the guests
before the managedsave. Indeed they continued execution right after being
loaded with "virsh start". (Side point: managedsave is awesome. :))

> 
> You say:
>    'guest tells QEMU the guest-side address of the blob'
> how is that stored/migrated/etc?

It is a uint8_t[8] array (little-endian representation), linked into
another (writeable) fw_cfg entry, and it's migrated explicitly (it has a
descriptor in the device's vmstate descriptor). The post_load callback
relies on this array being restored before the migration infrastructure
calls post_load.

Thanks
Laszlo
Re: [Qemu-devel] [PATCH] crypto: assert cipher algorithm is always valid
On Mon, Feb 20, 2017 at 04:53:07PM +0530, P J P wrote:
> From: Prasad J Pandit
> 
> Crypto routines 'qcrypto_cipher_get_block_len' and
> 'qcrypto_cipher_get_key_len' return non-zero cipher block and key
> lengths from static arrays 'alg_block_len[]' and 'alg_key_len[]'
> respectively. Returning 'zero(0)' value from either of them would
> likely lead to an error condition.

Well, callers are supposed to check for the 0 condition and report an
error really. In practice none of them do, and the alg parameters they
pass in all come from constants. So we're not validating user input here -
we're catching programming bugs, and thus assert makes sense.

> 
> Signed-off-by: Prasad J Pandit
> ---
>  crypto/cipher.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)

Reviewed-by: Daniel P. Berrange

> 
> diff --git a/crypto/cipher.c b/crypto/cipher.c
> index 9ecaff7..5a96489 100644
> --- a/crypto/cipher.c
> +++ b/crypto/cipher.c
> @@ -63,18 +63,14 @@ static bool mode_need_iv[QCRYPTO_CIPHER_MODE__MAX] = {
> 
>  size_t qcrypto_cipher_get_block_len(QCryptoCipherAlgorithm alg)
>  {
> -    if (alg >= G_N_ELEMENTS(alg_key_len)) {
> -        return 0;
> -    }
> +    assert(alg < G_N_ELEMENTS(alg_key_len));
>      return alg_block_len[alg];
>  }
> 
> 
>  size_t qcrypto_cipher_get_key_len(QCryptoCipherAlgorithm alg)
>  {
> -    if (alg >= G_N_ELEMENTS(alg_key_len)) {
> -        return 0;
> -    }
> +    assert(alg < G_N_ELEMENTS(alg_key_len));
>      return alg_key_len[alg];
>  }

Adding to the crypto/ queue

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|