date:20160328

Re: [Qemu-devel] [PATCH v2] migration: skip sending ram pages released by virtio-balloon driver.

2016-03-28 Thread Michael S. Tsirkin

On Mon, Mar 28, 2016 at 09:46:05AM +0530, Jitendra Kolhe wrote:
> While measuring live migration performance for qemu/kvm guest, it
> was observed that the qemu doesn’t maintain any intelligence for the
> guest ram pages which are released by the guest balloon driver and
> treat such pages as any other normal guest ram pages. This has direct
> impact on overall migration time for the guest which has released
> (ballooned out) memory to the host.
> 
> In case of large systems, where we can configure large guests with 1TB
> and with considerable amount of memory release by balloon driver to the,
> host the migration time gets worse.
> 
> The solution proposed below is local only to qemu (and does not require
> any modification to Linux kernel or any guest driver). We have verified
> the fix for large guests =1TB on HPE Superdome X (which can support up
> to 240 cores and 12TB of memory) and in case where 90% of memory is
> released by balloon driver the migration time for an idle guests reduces
> to ~600 sec's from ~1200 sec’s.
> 
> Detail: During live migration, as part of 1st iteration in ram_save_iterate()
> -> ram_find_and_save_block () will try to migrate ram pages which are
> released by vitrio-balloon driver as part of dynamic memory delete.
> Even though the pages which are returned to the host by virtio-balloon
> driver are zero pages, the migration algorithm will still end up
> scanning the entire page ram_find_and_save_block() -> ram_save_page/
> ram_save_compressed_page -> save_zero_page() -> is_zero_range().  We
> also end-up sending some control information over network for these
> page during migration. This adds to total migration time.
> 
> The proposed fix, uses the existing bitmap infrastructure to create
> a virtio-balloon bitmap. The bits in the bitmap represent a guest ram
> page of size 1UL<< VIRTIO_BALLOON_PFN_SHIFT. The bitmap represents
> entire guest ram memory till max configured memory. Guest ram pages
> claimed by the virtio-balloon driver will be represented by 1 in the
> bitmap. During live migration, each guest ram page (host VA offset)
> is checked against the virtio-balloon bitmap, if the bit is set the
> corresponding ram page will be excluded from scanning and sending
> control information during migration. The bitmap is also migrated to
> the target as part of every ram_save_iterate loop and after the
> guest is stopped remaining balloon bitmap is migrated as part of
> balloon driver save / load interface.
> 
> With the proposed fix, the average migration time for an idle guest
> with 1TB maximum memory and 64vCpus
>  - reduces from ~1200 secs to ~600 sec, with guest memory ballooned
>down to 128GB (~10% of 1TB).
>  - reduces from ~1300 to ~1200 sec (7%), with guest memory ballooned
>down to 896GB (~90% of 1TB),
>  - with no ballooning configured, we don’t expect to see any impact
>on total migration time.
> 
> The optimization gets temporarily disabled, if the balloon operation is
> in progress. Since the optimization skips scanning and migrating control
> information for ballooned out pages, we might skip guest ram pages in
> cases where the guest balloon driver has freed the ram page to the guest
> but not yet informed the host/qemu about the ram page
> (VIRTIO_BALLOON_F_MUST_TELL_HOST). In such case with optimization, we
> might skip migrating ram pages which the guest is using. Since this
> problem is specific to balloon leak, we can restrict balloon operation in
> progress check to only balloon leak operation in progress check.
> 
> The optimization also get permanently disabled (for all subsequent
> migrations) in case any of the migration uses postcopy capability. In case
> of postcopy the balloon bitmap would be required to send after vm_stop,
> which has significant impact on the downtime. Moreover, the applications
> in the guest space won’t be actually faulting on the ram pages which are
> already ballooned out, the proposed optimization will not show any
> improvement in migration time during postcopy.
> 
> Signed-off-by: Jitendra Kolhe 
> ---
> Changed in v2:
>  - Resolved compilation issue for qemu-user binaries in exec.c
>  - Localize balloon bitmap test to save_zero_page().
>  - Updated version string for newly added migration capability to 2.7.
>  - Made minor modifications to patch commit text.
> 
>  balloon.c  | 253 
> -
>  exec.c |   3 +
>  hw/virtio/virtio-balloon.c |  35 -
>  include/hw/virtio/virtio-balloon.h |   1 +
>  include/migration/migration.h  |   1 +
>  include/sysemu/balloon.h   |  15 ++-
>  migration/migration.c  |   9 ++
>  migration/ram.c|  31 -
>  qapi-schema.json   |   5 +-
>  9 files changed, 341 insertions(+), 12 deletions(-)
> 
> diff --git a/balloon.c b/balloon.c
> index f2ef50c..1c2d228 100644
> --- a/balloon.c
> +++ b/balloon.c
> @@ -33,11 +33,34 @@
>  #i

[Qemu-devel] [PATCH] pci: fix compilation warnings

2016-03-28 Thread Marcel Apfelbaum

Fix 'error: shift exponent -1 is negative' warning
by adding a corresponding assert.

Reported-by: Peter Maydell 
Signed-off-by: Markus Armbruster 
Signed-off-by: Marcel Apfelbaum 
---
 hw/pci/pci.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index e67664d..a1d41aa 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -163,11 +163,15 @@ int pci_bar(PCIDevice *d, int reg)
 
 static inline int pci_irq_state(PCIDevice *d, int irq_num)
 {
+assert(irq_num >= 0);
+
return (d->irq_state >> irq_num) & 0x1;
 }
 
 static inline void pci_set_irq_state(PCIDevice *d, int irq_num, int level)
 {
+assert(irq_num >= 0);
+
d->irq_state &= ~(0x1 << irq_num);
d->irq_state |= level << irq_num;
 }
-- 
2.4.3

[Qemu-devel] [PATCH] pci: fix identation

2016-03-28 Thread Marcel Apfelbaum

Signed-off-by: Marcel Apfelbaum 
---
 hw/pci/pci.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index a1d41aa..da8b0b6 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -165,15 +165,15 @@ static inline int pci_irq_state(PCIDevice *d, int irq_num)
 {
 assert(irq_num >= 0);
 
-   return (d->irq_state >> irq_num) & 0x1;
+return (d->irq_state >> irq_num) & 0x1;
 }
 
 static inline void pci_set_irq_state(PCIDevice *d, int irq_num, int level)
 {
 assert(irq_num >= 0);
 
-   d->irq_state &= ~(0x1 << irq_num);
-   d->irq_state |= level << irq_num;
+d->irq_state &= ~(0x1 << irq_num);
+d->irq_state |= level << irq_num;
 }
 
 static void pci_change_irq_level(PCIDevice *pci_dev, int irq_num, int change)
-- 
2.4.3

Re: [Qemu-devel] [PATCH] pci: fix compilation warnings

2016-03-28 Thread Stefan Weil

Am 28.03.2016 um 09:23 schrieb Marcel Apfelbaum:
> Fix 'error: shift exponent -1 is negative' warning
> by adding a corresponding assert.
>
> Reported-by: Peter Maydell 
> Signed-off-by: Markus Armbruster 
> Signed-off-by: Marcel Apfelbaum 
> ---
>  hw/pci/pci.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index e67664d..a1d41aa 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -163,11 +163,15 @@ int pci_bar(PCIDevice *d, int reg)
>  
>  static inline int pci_irq_state(PCIDevice *d, int irq_num)
>  {
> +assert(irq_num >= 0);
> +
>   return (d->irq_state >> irq_num) & 0x1;
>  }
>  
>  static inline void pci_set_irq_state(PCIDevice *d, int irq_num, int level)
>  {
> +assert(irq_num >= 0);
> +
>   d->irq_state &= ~(0x1 << irq_num);
>   d->irq_state |= level << irq_num;
>  }

Do we use negative values for irq_num anywhere?
If not, using an unsigned irq_num might be a better solution.

Stefan



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [Bug 1488363] Re: qemu 2.4.0 hangs using vfio for pci passthrough of graphics card

2016-03-28 Thread DiQ

With U14.04_64 and qemu-2.5.0 complied from
http://wiki.qemu.org/Download this is a problem unless I use linux-
generic-lts-xenial (4.4.0.13.7) so it seems there's a kernel issue here

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1488363

Title:
  qemu 2.4.0 hangs using vfio for pci passthrough of graphics card

Status in QEMU:
  New

Bug description:
  2.3.0 (manjaro distro package) works fine. 2.4.0 (manjaro or the arch
  vanilla one) hangs on the SeaBIOS screen when saying "Press F12 for
  boot menu". All tested with the same hardware, OS, command and
  configuration. It also starts without the GPU passed through, even
  with the USB passed through. I am using the latest SeaBIOS 1.8.2.

  The release notes say:
   VFIO
  Support for resetting AMD Bonaire and Hawaii GPUs
  Platform device passthrough support for Calxeda xgmac devices 
  
  So maybe something there broke it.
  
  I am using the arch qemu 2.4.0 PKGBUILD (modified to have make -j8 and 
removed iscsi, gluster, ceph, etc.), which uses vanilla sources and no patches. 
https://projects.archlinux.org/svntogit/packages.git/tree/trunk?h=packages/qemu

  I am not using a frontend. I am using a script I wrote that generates
  the command below.

  Guest OS here would be 64 bit windows 7, but it didn't start so that's
  not relevant. Also a Manjaro Linux VM won't start.

  CPU is AMD FX-8150; board is Gigabyte GA-990FXA-UD5 (990FX chipset).

  full command line (without the \ after each line) is:

  qemu-system-x86_64
  -enable-kvm
  -M q35
  -m 3584
  -cpu host
  -boot c
  -smp 7,sockets=1,cores=7,threads=1
  -vga none
  -device ioh3420,bus=pcie.0,addr=1c.0,port=1,chassis=1,id=root.1
  -device 
vfio-pci,host=04:00.0,bus=root.1,multifunction=on,x-vga=on,addr=0.0,romfile=Sapphire.R7260X.1024.131106.rom
  -device vfio-pci,host=00:14.2,bus=pcie.0
  -device vfio-pci,host=00:16.0,bus=root.1
  -device vfio-pci,host=00:16.2,bus=root.1
  -usb
  -device ahci,bus=pcie.0,id=ahci
  -drive 
file=/dev/data/vm1,id=disk1,format=raw,if=virtio,index=0,media=disk,discard=on
  -drive media=cdrom,id=cdrom,index=5,media=cdrom
  -netdev type=tap,id=net0,ifname=tap-vm1
  -device virtio-net-pci,netdev=net0,mac=00:01:02:03:04:05
  -monitor stdio
  -boot menu=on

  
  $ lspci -nn | grep -E "04:00.0|00:14.2|00:16.0|00:16.2"
  00:14.2 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 
Azalia (Intel HDA) [1002:4383] (rev 40)
  00:16.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD/ATI] 
SB7x0/SB8x0/SB9x0 USB OHCI0 Controller [1002:4397]
  00:16.2 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD/ATI] 
SB7x0/SB8x0/SB9x0 USB EHCI Controller [1002:4396]
  04:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. 
[AMD/ATI] Bonaire XTX [Radeon R7 260X] [1002:6658]

  
  Also I have this one that also hangs:
  05:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. 
[AMD/ATI] Juniper XT [Radeon HD 6770] [1002:68ba]

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1488363/+subscriptions

Re: [Qemu-devel] [PATCH] pci: fix compilation warnings

2016-03-28 Thread Marcel Apfelbaum


On 03/28/2016 11:43 AM, Stefan Weil wrote:

Am 28.03.2016 um 09:23 schrieb Marcel Apfelbaum:

Fix 'error: shift exponent -1 is negative' warning
by adding a corresponding assert.

Reported-by: Peter Maydell 
Signed-off-by: Markus Armbruster 
Signed-off-by: Marcel Apfelbaum 
---
  hw/pci/pci.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index e67664d..a1d41aa 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -163,11 +163,15 @@ int pci_bar(PCIDevice *d, int reg)

  static inline int pci_irq_state(PCIDevice *d, int irq_num)
  {
+assert(irq_num >= 0);
+
return (d->irq_state >> irq_num) & 0x1;
  }

  static inline void pci_set_irq_state(PCIDevice *d, int irq_num, int level)
  {
+assert(irq_num >= 0);
+
d->irq_state &= ~(0x1 << irq_num);
d->irq_state |= level << irq_num;
  }


Do we use negative values for irq_num anywhere?


Hi Stefan,

I didn't see any irq_nu assignments to a negative value but there are
some cases where it can be negative due to arithmetical operations like:

   hw/pci-host/bonito.c:651:int internal_irq = irq_num - BONITO_IRQ_BASE;

On other cases we look for negative value:

   hw/ppc/ppc4xx_devs.c:171:if (irq_num < 0 || irq_num > 3)
or
   hw/ppc/ppc4xx_pci.c:263:if (irq_num < 0)



If not, using an unsigned irq_num might be a better solution.


All of the above are manageable, of course, but we are close to
hard freeze (tomorrow) and the scope of this patch is much smaller (fix a 
compilation warning)

I think is definitely worth looking into it, but maybe as part of QEMU 2.7.

Thanks,
Marcel


Stefan

Re: [Qemu-devel] [PATCH v2] migration: skip sending ram pages released by virtio-balloon driver.

2016-03-28 Thread Denis V. Lunev


On 03/28/2016 07:16 AM, Jitendra Kolhe wrote:

While measuring live migration performance for qemu/kvm guest, it
was observed that the qemu doesn’t maintain any intelligence for the
guest ram pages which are released by the guest balloon driver and
treat such pages as any other normal guest ram pages. This has direct
impact on overall migration time for the guest which has released
(ballooned out) memory to the host.

In case of large systems, where we can configure large guests with 1TB
and with considerable amount of memory release by balloon driver to the,
host the migration time gets worse.

The solution proposed below is local only to qemu (and does not require
any modification to Linux kernel or any guest driver). We have verified
the fix for large guests =1TB on HPE Superdome X (which can support up
to 240 cores and 12TB of memory) and in case where 90% of memory is
released by balloon driver the migration time for an idle guests reduces
to ~600 sec's from ~1200 sec’s.

Detail: During live migration, as part of 1st iteration in ram_save_iterate()
-> ram_find_and_save_block () will try to migrate ram pages which are
released by vitrio-balloon driver as part of dynamic memory delete.
Even though the pages which are returned to the host by virtio-balloon
driver are zero pages, the migration algorithm will still end up
scanning the entire page ram_find_and_save_block() -> ram_save_page/
ram_save_compressed_page -> save_zero_page() -> is_zero_range().  We
also end-up sending some control information over network for these
page during migration. This adds to total migration time.

The proposed fix, uses the existing bitmap infrastructure to create
a virtio-balloon bitmap. The bits in the bitmap represent a guest ram
page of size 1UL<< VIRTIO_BALLOON_PFN_SHIFT. The bitmap represents
entire guest ram memory till max configured memory. Guest ram pages
claimed by the virtio-balloon driver will be represented by 1 in the
bitmap. During live migration, each guest ram page (host VA offset)
is checked against the virtio-balloon bitmap, if the bit is set the
corresponding ram page will be excluded from scanning and sending
control information during migration. The bitmap is also migrated to
the target as part of every ram_save_iterate loop and after the
guest is stopped remaining balloon bitmap is migrated as part of
balloon driver save / load interface.

With the proposed fix, the average migration time for an idle guest
with 1TB maximum memory and 64vCpus
  - reduces from ~1200 secs to ~600 sec, with guest memory ballooned
down to 128GB (~10% of 1TB).
  - reduces from ~1300 to ~1200 sec (7%), with guest memory ballooned
down to 896GB (~90% of 1TB),
  - with no ballooning configured, we don’t expect to see any impact
on total migration time.

The optimization gets temporarily disabled, if the balloon operation is
in progress. Since the optimization skips scanning and migrating control
information for ballooned out pages, we might skip guest ram pages in
cases where the guest balloon driver has freed the ram page to the guest
but not yet informed the host/qemu about the ram page
(VIRTIO_BALLOON_F_MUST_TELL_HOST). In such case with optimization, we
might skip migrating ram pages which the guest is using. Since this
problem is specific to balloon leak, we can restrict balloon operation in
progress check to only balloon leak operation in progress check.

The optimization also get permanently disabled (for all subsequent
migrations) in case any of the migration uses postcopy capability. In case
of postcopy the balloon bitmap would be required to send after vm_stop,
which has significant impact on the downtime. Moreover, the applications
in the guest space won’t be actually faulting on the ram pages which are
already ballooned out, the proposed optimization will not show any
improvement in migration time during postcopy.

I think that you could start the guest without the knowledge of the
ballooned pages and that would be completely fine as these pages
will not be touched at all by the guest. They will come into play
when the host will deflate balloon a bit. In this case QEMU can safely
give local zeroed page for the guest.

Thus you could send bitmap of ballooned pages when the guest
on dest side will be started and in this case this will not influence
downtime.

Den

[Qemu-devel] [PATCH v3] Add param Error ** for msi_init()

2016-03-28 Thread Cao jin

Add param Error **errp, and change pci_add_capability() to
pci_add_capability2(), because pci_add_capability() report error, and
msi_init() is widely used in realize(), so it is not suitable for realize().

Also fix all the callers who should deal with the msi_init() failure but
actually not. The affected devices are as following:

  1. intel hd audio: move msi_init() above, save the strength to free
the MemoryRegion when it fails.
  2. usb-xhci: move msi_init() above, save the strength to free the
MemoryRegion when it fails.
  3. ich9 ahci: it is a on-board device created during machine
initialization, when it fail, qemu will exit, so, no need to free
resource manually.
  4. megasas_scsi: move msi_init() above, save the strength to free
the MemoryRegion when it fails . Also fix a bug: when user enable
msi, and msi_init success(return positive offset value), the code
disable msi in the flags. But it seems this bug doesn`t do anything
bad.
  5. mptsas: when msi_init() fail, it will use INTx. So, catch the
error & report it right there, don`t propagate it. Move msi_init()
above,  save the strength to free the MemoryRegion when it fails.
  6. pci_bridge_dev/ioh3420/xio3130_downstream/xio3130_upstream: catch
error & report it right there.
  7. vmxnet3: move msi_init() above, save the strength to free the
MemoryRegion when it fails. Remove the unecessary vmxnet3_init_msi().
When msi_init() fail, it will use INTx, so msi_init()`s failure should not
break the realize. Just catch & report msi_init()`s failure message.
  8. pvscsi: when msi_init fail, it will use INTx. so msi_init failure
should not break the realize. Report the error when msi_init fail.
Nobody use the return value of pvscsi_init_msi(), so change its type
to void.
  9. vfio-pci: it ignores the config space corruption error, so,
catch & report it right there.

Signed-off-by: Cao jin 
---
The patch has been compiled. No further test.

 hw/audio/intel-hda.c   | 11 +++---
 hw/ide/ich.c   |  2 +-
 hw/net/vmxnet3.c   | 41 ++
 hw/pci-bridge/ioh3420.c|  6 +-
 hw/pci-bridge/pci_bridge_dev.c |  8 +++-
 hw/pci-bridge/xio3130_downstream.c |  7 ++-
 hw/pci-bridge/xio3130_upstream.c   |  7 ++-
 hw/pci/msi.c   | 23 +++--
 hw/scsi/megasas.c  | 12 +++
 hw/scsi/mptsas.c   | 17 +++-
 hw/scsi/vmw_pvscsi.c   | 10 ++
 hw/usb/hcd-xhci.c  | 10 +++---
 hw/vfio/pci.c  |  6 --
 include/hw/pci/msi.h   |  3 ++-
 14 files changed, 108 insertions(+), 55 deletions(-)

diff --git a/hw/audio/intel-hda.c b/hw/audio/intel-hda.c
index d372d4a..c3856cc 100644
--- a/hw/audio/intel-hda.c
+++ b/hw/audio/intel-hda.c
@@ -1139,12 +1139,17 @@ static void intel_hda_realize(PCIDevice *pci, Error 
**errp)
 /* HDCTL off 0x40 bit 0 selects signaling mode (1-HDA, 0 - Ac97) 18.1.19 */
 conf[0x40] = 0x01;
 
+if (d->msi) {
+msi_init(&d->pci, d->old_msi_addr ? 0x50 : 0x60, 1,
+true, false, errp);
+if (*errp) {
+return;
+}
+}
+
 memory_region_init_io(&d->mmio, OBJECT(d), &intel_hda_mmio_ops, d,
   "intel-hda", 0x4000);
 pci_register_bar(&d->pci, 0, 0, &d->mmio);
-if (d->msi) {
-msi_init(&d->pci, d->old_msi_addr ? 0x50 : 0x60, 1, true, false);
-}
 
 hda_codec_bus_init(DEVICE(pci), &d->codecs, sizeof(d->codecs),
intel_hda_response, intel_hda_xfer);
diff --git a/hw/ide/ich.c b/hw/ide/ich.c
index 0a13334..db4fdb5 100644
--- a/hw/ide/ich.c
+++ b/hw/ide/ich.c
@@ -146,7 +146,7 @@ static void pci_ich9_ahci_realize(PCIDevice *dev, Error 
**errp)
 /* Although the AHCI 1.3 specification states that the first capability
  * should be PMCAP, the Intel ICH9 data sheet specifies that the ICH9
  * AHCI device puts the MSI capability first, pointing to 0x80. */
-msi_init(dev, ICH9_MSI_CAP_OFFSET, 1, true, false);
+msi_init(dev, ICH9_MSI_CAP_OFFSET, 1, true, false, errp);
 }
 
 static void pci_ich9_uninit(PCIDevice *dev)
diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 093a71e..f3614f2 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -348,7 +348,7 @@ typedef struct {
 /* Interrupt management */
 
 /*
- *This function returns sign whether interrupt line is in asserted state
+ * This function returns sign whether interrupt line is in asserted state
  * This depends on the type of interrupt used. For INTX interrupt line will
  * be asserted until explicit deassertion, for MSI(X) interrupt line will
  * be deasserted automatically due to notification semantics of the MSI(X)
@@ -2189,27 +2189,6 @@ vmxnet3_cleanup_msix(VMXNET3State *s)
 }
 }
 
-#define VMXNET3_USE_64BIT (true)
-#define VMXNET3_PER_VECTOR_MASK   (false)
-
-static bool
-vmxnet3_init_msi(VMXNET3Stat

[Qemu-devel] [PATCH 3/3] NBD proto: add "Command flags" section

2016-03-28 Thread Denis V. Lunev

From: Pavel Borzenkov 

Add separate "Command flags" section to make it clear which flags are
currently defined by the protocol.

Signed-off-by: Pavel Borzenkov 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Wouter Verhelst 
CC: Eric Blake 
CC: Alex Bligh 
---
 doc/proto.md | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/doc/proto.md b/doc/proto.md
index 036d6d9..662f741 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -485,6 +485,16 @@ The following request types exist:
 Currently one such message is known: `NBD_CMD_CACHE`, with type set to
 5, implemented by xnbd.
 
+ Command flags
+
+This field of 16 bits is sent by the client with every request and provides
+additional information to the server to execute the command. Refer to
+aforementioned "Request types" section for information about the flags
+supported by particular commands.
+
+- bit 0, `NBD_CMD_FLAG_FUA`; should be set to 1 if the client requires
+  "Force Unit Access" mode of operation
+
  Error values
 
 The error values are used for the error field in the reply message.
-- 
2.1.4

[Qemu-devel] [PATCH 1/3] NBD proto: forbid TRIM command without negotiation

2016-03-28 Thread Denis V. Lunev

From: Pavel Borzenkov 

There is a loophole in the protocol that allows a client to send TRIM
request even if support for it wasn't negotiated with the server. State
explicitly that the client MUST NOT send such command without prior
successful negotiation.

Signed-off-by: Pavel Borzenkov 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Wouter Verhelst 
CC: Eric Blake 
CC: Alex Bligh 
---
 doc/proto.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/proto.md b/doc/proto.md
index 6d1cb34..d54ed19 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -471,6 +471,9 @@ The following request types exist:
 about the contents of the export affected by this command, until
 overwriting it again with `NBD_CMD_WRITE`.
 
+A client MUST NOT send a trim request unless `NBD_FLAG_SEND_TRIM`
+was set in the export flags field.
+
 * Other requests
 
 Some third-party implementations may require additional protocol
-- 
2.1.4

[Qemu-devel] [PATCH 2/3] NBD proto: document additional error conditions

2016-03-28 Thread Denis V. Lunev

From: Pavel Borzenkov 

It is unclear what the behaviour of a server should be if it receives
an unknown command. Similar uncertainty exists for command flags.

Make it explicit that the server should return EINVAL in all such cases.

Signed-off-by: Pavel Borzenkov 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Wouter Verhelst 
CC: Eric Blake 
CC: Alex Bligh 
---
 doc/proto.md | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/doc/proto.md b/doc/proto.md
index d54ed19..036d6d9 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -512,6 +512,13 @@ return `EINVAL` if it receives a read or trim request 
including one or
 more sectors beyond the size of the device.  It also SHOULD map the
 `EDQUOT` and `EFBIG` errors to `ENOSPC`.  Finally, it SHOULD return
 `EPERM` if it receives a write or trim request on a read-only export.
+
+The server SHOULD return `EINVAL` if it receives an unknown command.
+
+The server SHOULD return `EINVAL` if it receives an unknown command flag. It
+also SHOULD return `EINVAL` if it receives a request with a flag not explicitly
+documented as applicable to the given request.
+
 Which error to return in any other case is not specified by the NBD
 protocol.
 
-- 
2.1.4

[Qemu-devel] [PATCH 0/3] Fix some ambiguities in the NBD protocol

2016-03-28 Thread Denis V. Lunev

This patch set attempts to fix some ambiguities in the NBD protocol.

Signed-off-by: Pavel Borzenkov 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Wouter Verhelst 
CC: Eric Blake 
CC: Alex Bligh 

Pavel Borzenkov (3):
  NBD proto: forbid TRIM command without negotiation
  NBD proto: document additional error conditions
  NBD proto: add "Command flags" section

 doc/proto.md | 20 
 1 file changed, 20 insertions(+)

[Qemu-devel] KVM call for 2016-03-29

2016-03-28 Thread Juan Quintela



Hi

Please, send any topic that you are interested in covering.

At the end of Monday I will send an email with the agenda or the
cancellation of the call, so hurry up.

After discussions on the QEMU Summit, we are going to have always open a
KVM call where you can add topics.

 Call details:

By popular demand, a google calendar public entry with it

  
https://www.google.com/calendar/embed?src=dG9iMXRqcXAzN3Y4ZXZwNzRoMHE4a3BqcXNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ

(Let me know if you have any problems with the calendar entry.  I just
gave up about getting right at the same time CEST, CET, EDT and DST).

If you need phone number details,  contact me privately

Thanks, Juan.

[Qemu-devel] [PATCH 3/3] ide: really restart pending and in-flight atapi dma

2016-03-28 Thread Denis V. Lunev

From: Pavel Butsykin 

Restart of ATAPI DMA used to be unreachable, because the request to do
so wasn't indicated in bus->error_status due to the lack of spare bits, and
ide_restart_bh() would return early doing nothing.

This patch makes use of the observation that not all bit combinations were
possible in ->error_status. In particular, IDE_RETRY_READ only made sense
together with IDE_RETRY_DMA or IDE_RETRY_PIO. This allows to re-use
IDE_RETRY_READ alone as an indicator of ATAPI DMA restart request.

To makes things more uniform, ATAPI DMA gets its own value for ->dma_cmd.
As a means against confusion, macros are added to test the state of
->error_status.

The patch fixes the restart of both in-flight and pending ATAPI DMA,
following the scheme similar to that of IDE DMA.

Signed-off-by: Pavel Butsykin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: John Snow 
---
 hw/ide/atapi.c| 15 ---
 hw/ide/core.c | 27 ---
 hw/ide/internal.h | 21 +
 3 files changed, 41 insertions(+), 22 deletions(-)

diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index acc52cd..fb9ae43 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -342,6 +342,7 @@ static void ide_atapi_cmd_reply(IDEState *s, int size, int 
max_size)
 block_acct_start(blk_get_stats(s->blk), &s->acct, size,
  BLOCK_ACCT_READ);
 s->status = READY_STAT | SEEK_STAT | DRQ_STAT;
+s->dma_cmd = IDE_DMA_ATAPI;
 ide_start_dma(s, ide_atapi_cmd_read_dma_cb);
 } else {
 s->status = READY_STAT | SEEK_STAT;
@@ -375,15 +376,18 @@ static void ide_atapi_cmd_check_status(IDEState *s)
 }
 /* ATAPI DMA support */
 
-/* XXX: handle read errors */
 static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret)
 {
 IDEState *s = opaque;
 int data_offset, n;
 
 if (ret < 0) {
-ide_atapi_io_error(s, ret);
-goto eot;
+if (ide_handle_rw_error(s, -ret, ide_dma_cmd_to_retry(s->dma_cmd))) {
+if (s->bus->error_status) {
+return;
+}
+goto eot;
+}
 }
 
 if (s->io_buffer_size > 0) {
@@ -464,6 +468,7 @@ static void ide_atapi_cmd_read_dma(IDEState *s, int lba, 
int nb_sectors,
 
 /* XXX: check if BUSY_STAT should be set */
 s->status = READY_STAT | SEEK_STAT | DRQ_STAT | BUSY_STAT;
+s->dma_cmd = IDE_DMA_ATAPI;
 ide_start_dma(s, ide_atapi_cmd_read_dma_cb);
 }
 
@@ -481,10 +486,6 @@ static void ide_atapi_cmd_read(IDEState *s, int lba, int 
nb_sectors,
 }
 }
 
-
-/* Called by *_restart_bh when the transfer function points
- * to ide_atapi_cmd
- */
 void ide_atapi_dma_restart(IDEState *s)
 {
 /*
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 8f86036..0425d86 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -56,7 +56,6 @@ static const int smart_attributes[][12] = {
 { 190,  0x03, 0x00, 0x45, 0x45, 0x1f, 0x00, 0x1f, 0x1f, 0x00, 0x00, 0x32},
 };
 
-static int ide_handle_rw_error(IDEState *s, int error, int op);
 static void ide_dummy_transfer_stop(IDEState *s);
 
 static void padstr(char *str, const char *src, int len)
@@ -772,7 +771,7 @@ void ide_dma_error(IDEState *s)
 ide_set_irq(s->bus);
 }
 
-static int ide_handle_rw_error(IDEState *s, int error, int op)
+int ide_handle_rw_error(IDEState *s, int error, int op)
 {
 bool is_read = (op & IDE_RETRY_READ) != 0;
 BlockErrorAction action = blk_get_error_action(s->blk, is_read, error);
@@ -782,8 +781,10 @@ static int ide_handle_rw_error(IDEState *s, int error, int 
op)
 s->bus->error_status = op;
 } else if (action == BLOCK_ERROR_ACTION_REPORT) {
 block_acct_failed(blk_get_stats(s->blk), &s->acct);
-if (op & IDE_RETRY_DMA) {
+if (IS_IDE_RETRY_DMA(op)) {
 ide_dma_error(s);
+} else if (IS_IDE_RETRY_ATAPI(op)) {
+ide_atapi_io_error(s, -error);
 } else {
 ide_rw_error(s);
 }
@@ -871,6 +872,8 @@ static void ide_dma_cb(void *opaque, int ret)
 ide_issue_trim, ide_dma_cb, s,
 DMA_DIRECTION_TO_DEVICE);
 break;
+default:
+abort();
 }
 return;
 
@@ -2517,15 +2520,13 @@ static void ide_restart_bh(void *opaque)
 if (s->bus->dma->ops->restart) {
 s->bus->dma->ops->restart(s->bus->dma);
 }
-}
-
-if (error_status & IDE_RETRY_DMA) {
+} else if (IS_IDE_RETRY_DMA(error_status)) {
 if (error_status & IDE_RETRY_TRIM) {
 ide_restart_dma(s, IDE_DMA_TRIM);
 } else {
 ide_restart_dma(s, is_read ? IDE_DMA_READ : IDE_DMA_WRITE);
 }
-} else if (error_status & IDE_RETRY_PIO) {
+} else if (IS_IDE_RETRY_PIO(error_status)) {
 if (is_read) {
 ide_sector_read(s);
 } else {
@@ -2533,15 +2534,11 @@ static void ide_restart_bh(void *opaque)
 }
 } else if (error_status & IDE_RETR

[Qemu-devel] [PATCH 1/3] ide: don't lose pending dma state

2016-03-28 Thread Denis V. Lunev

From: Pavel Butsykin 

If the migration occurs after the IDE DMA has been set up but before it
has been initiated, the state gets lost upon save/restore. Specifically,
->dma_cb callback gets cleared, so, when the guest eventually starts bus
mastering, the DMA never completes, causing the guest to time out the
operation.

OTOH all the infrastructure is already in place to restart the DMA if
the migration happens while the DMA is in progress.

So reuse that infrastructure, by setting bus->error_status based on
->dma_cmd in pre_save if ->dma_cb callback is already set but DMAING is
clear. This will indicate the need for restart and make sure ->dma_cb is
restored in ide_restart_bh(); however since DMAING is clear the state
upon restore will be exactly "ready for DMA" as before the save.

Signed-off-by: Pavel Butsykin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: John Snow 
---
 hw/ide/core.c |  9 +
 hw/ide/internal.h | 15 +++
 hw/ide/pci.c  |  4 
 3 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 241e840..8f86036 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -803,14 +803,7 @@ static void ide_dma_cb(void *opaque, int ret)
 return;
 }
 if (ret < 0) {
-int op = IDE_RETRY_DMA;
-
-if (s->dma_cmd == IDE_DMA_READ)
-op |= IDE_RETRY_READ;
-else if (s->dma_cmd == IDE_DMA_TRIM)
-op |= IDE_RETRY_TRIM;
-
-if (ide_handle_rw_error(s, -ret, op)) {
+if (ide_handle_rw_error(s, -ret, ide_dma_cmd_to_retry(s->dma_cmd))) {
 return;
 }
 }
diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index 86bde26..68c7d0d 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -513,6 +513,21 @@ struct IDEDevice {
 #define IDE_RETRY_TRIM 0x80
 #define IDE_RETRY_HBA  0x100
 
+static inline uint8_t ide_dma_cmd_to_retry(uint8_t dma_cmd)
+{
+switch (dma_cmd) {
+case IDE_DMA_READ:
+return IDE_RETRY_DMA | IDE_RETRY_READ;
+case IDE_DMA_WRITE:
+return IDE_RETRY_DMA;
+case IDE_DMA_TRIM:
+return IDE_RETRY_DMA | IDE_RETRY_TRIM;
+default:
+break;
+}
+return 0;
+}
+
 static inline IDEState *idebus_active_if(IDEBus *bus)
 {
 return bus->ifs + bus->unit;
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index 92ffee7..8d56a00 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -308,6 +308,10 @@ static void ide_bmdma_pre_save(void *opaque)
 BMDMAState *bm = opaque;
 uint8_t abused_bits = BM_MIGRATION_COMPAT_STATUS_BITS;
 
+if (!(bm->status & BM_STATUS_DMAING) && bm->dma_cb) {
+bm->bus->error_status =
+ide_dma_cmd_to_retry(bmdma_active_if(bm)->dma_cmd);
+}
 bm->migration_retry_unit = bm->bus->retry_unit;
 bm->migration_retry_sector_num = bm->bus->retry_sector_num;
 bm->migration_retry_nsector = bm->bus->retry_nsector;
-- 
2.1.4

[Qemu-devel] [PATCH 2/3] ide: restart atapi dma by re-evaluating command packet

2016-03-28 Thread Denis V. Lunev

From: Pavel Butsykin 

ide_atapi_dma_restart() used to just complete the DMA with an error,
under the assumption that there isn't enough information to restart it.

However, as the contents of the ->io_buffer is preserved, it looks safe to
just re-evaluate it and dispatch the ATAPI command again.

Signed-off-by: Pavel Butsykin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: John Snow 
---
 hw/ide/atapi.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index 1fe58ab..acc52cd 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -488,14 +488,13 @@ static void ide_atapi_cmd_read(IDEState *s, int lba, int 
nb_sectors,
 void ide_atapi_dma_restart(IDEState *s)
 {
 /*
- * I'm not sure we have enough stored to restart the command
- * safely, so give the guest an error it should recover from.
- * I'm assuming most guests will try to recover from something
- * listed as a medium error on a CD; it seems to work on Linux.
- * This would be more of a problem if we did any other type of
- * DMA operation.
+ * At this point we can just re-evaluate the packet command and start over.
+ * The presence of ->dma_cb callback in the pre_save ensures that the 
packet
+ * command has been completely sent and we can safely restart command.
  */
-ide_atapi_cmd_error(s, MEDIUM_ERROR, ASC_NO_SEEK_COMPLETE);
+s->unit = s->bus->retry_unit;
+s->bus->dma->ops->restart_dma(s->bus->dma);
+ide_atapi_cmd(s);
 }
 
 static inline uint8_t ide_atapi_set_profile(uint8_t *buf, uint8_t *index,
-- 
2.1.4

[Qemu-devel] [PATCH v2 for 2.6 0/3] ide: fix loss of the dma/atapi state during migration

2016-03-28 Thread Denis V. Lunev

This patch set fixes bugs in the IDE DMA and the IDE ATAPI on operations to
save/restore the state.

>From the user point of view this results in IDE timeouts in the guest
when the user reads from the DVD like the following:

[424332.169229] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[424332.170423] sr 0:0:0:0: [sr0] CDB:
[424332.171234] Read(10): 28 00 00 00 02 e4 00 00 01 00
[424332.172418] ata1.00: cmd a0/01:00:00:00:08/00:00:00:00:00/a0 tag 0 dma 2048 
in
 res 40/00:02:00:0c:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
[424332.174877] ata1.00: status: { DRDY }
[424337.212099] ata1: link is slow to respond, please be patient (ready=0)
[424342.220084] ata1: device not ready (errno=-16), forcing hardreset
[424342.222700] ata1: soft resetting link
[424342.381059] ata1.00: configured for MWDMA2
[424342.383693] ata1: EH complete

Another similar nasty effects are possible.

Changes from v1:
- added converter of IDE_DMA_* to IDE_RETRY_* (1)
- fixed handling of the IDE_RETRY_HBA at the ide_restart_bh function (3)

Signed-off-by: Pavel Butsykin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: John Snow 

Pavel Butsykin (3):
  ide: don't lose pending dma state
  ide: restart atapi dma by re-evaluating command packet
  ide: really restart pending and in-flight atapi dma

 hw/ide/atapi.c| 28 ++--
 hw/ide/core.c | 36 +---
 hw/ide/internal.h | 36 
 hw/ide/pci.c  |  4 
 4 files changed, 67 insertions(+), 37 deletions(-)

-- 
2.1.4

Re: [Qemu-devel] [PATCH v4 00/13] tests: Introducing docker tests

2016-03-28 Thread Alex Bennée


Fam Zheng  writes:

> Ping?

I'll do a sweep through this week, probably tomorrow.

>
> On Thu, 03/17 14:34, Fam Zheng wrote:
>> v4: Dropped the .gitignore patch in favor of tempfile [Alex];
>> Added one EXTRA_CONFIGURE_OPTS patch [Alex];
>>
>> 01: Fix commit message, and improve help text;
>> Fix pylint warnings, mostly long lines and some refactoring;
>> "--verbose" is now replaced with the shared args "--quiet";
>> 02: Update commit message;
>> Use "--quiet", drop "--verbose";
>> Fix typo;
>> 05: Mention "build_qemu" in commit message;
>> Add Alex's rev-by;
>> 10: Fix stale commit message;
>>
>> Add Alex's rev-by to v3 except above.
>>
>> This series adds a new "docker" make target family to run tests in created
>> docker containers.
>>
>> To begin with, this can be a place to store standard env/command 
>> combinations to
>> build and test QEMU.
>>
>> Secondly, CI usually provides "docker" capability, where we specify
>> standard/repeatable test environments, and run tests in them.  However, what
>> tests to cover is better maintained in-tree, in order to keep in sync with 
>> the
>> code development.
>>
>> Lastly, this makes it very simple for developers to replicate such tests
>> themselves.
>>
>>
>> Fam Zheng (13):
>>   tests: Add utilities for docker testing
>>   Makefile: Rules for docker testing
>>   docker: Add images
>>   docker: Add test runner
>>   docker: Add common.rc
>>   docker: Add quick test
>>   docker: Add full test
>>   docker: Add clang test
>>   docker: Add mingw test
>>   docker: Add travis tool
>>   docs: Add text for tests/docker in build-system.txt
>>   docker: Add EXTRA_CONFIGURE_OPTS
>>   MAINTAINERS: Add tests/docker
>>
>>  MAINTAINERS |   7 ++
>>  Makefile|   4 +-
>>  docs/build-system.txt   |   5 +
>>  tests/docker/Makefile.include   | 124 +
>>  tests/docker/common.rc  |  32 ++
>>  tests/docker/docker.py  | 191 
>> 
>>  tests/docker/dockerfiles/centos6.docker |   6 +
>>  tests/docker/dockerfiles/fedora.docker  |   7 ++
>>  tests/docker/dockerfiles/ubuntu.docker  |  11 ++
>>  tests/docker/run|  58 ++
>>  tests/docker/test-clang |  25 +
>>  tests/docker/test-full  |  17 +++
>>  tests/docker/test-mingw |  34 ++
>>  tests/docker/test-quick |  19 
>>  tests/docker/travis |  21 
>>  tests/docker/travis.py  |  48 
>>  16 files changed, 608 insertions(+), 1 deletion(-)
>>  create mode 100644 tests/docker/Makefile.include
>>  create mode 100755 tests/docker/common.rc
>>  create mode 100755 tests/docker/docker.py
>>  create mode 100644 tests/docker/dockerfiles/centos6.docker
>>  create mode 100644 tests/docker/dockerfiles/fedora.docker
>>  create mode 100644 tests/docker/dockerfiles/ubuntu.docker
>>  create mode 100755 tests/docker/run
>>  create mode 100755 tests/docker/test-clang
>>  create mode 100755 tests/docker/test-full
>>  create mode 100755 tests/docker/test-mingw
>>  create mode 100755 tests/docker/test-quick
>>  create mode 100755 tests/docker/travis
>>  create mode 100755 tests/docker/travis.py
>>
>> --
>> 2.4.3
>>
>>


--
Alex Bennée

[Qemu-devel] [PATCH v5] net: Allocating Large sized arrays to heap

2016-03-28 Thread Pooja Dhannawat

nc_sendv_compat has a huge stack usage of 69680 bytes approx.
Moving large arrays to heap to reduce stack usage.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Pooja Dhannawat 
---
 net/net.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/net/net.c b/net/net.c
index b0c832e..663da13 100644
--- a/net/net.c
+++ b/net/net.c
@@ -709,23 +709,28 @@ ssize_t qemu_send_packet_raw(NetClientState *nc, const 
uint8_t *buf, int size)
 static ssize_t nc_sendv_compat(NetClientState *nc, const struct iovec *iov,
int iovcnt, unsigned flags)
 {
-uint8_t buf[NET_BUFSIZE];
+uint8_t *buf = NULL;
 uint8_t *buffer;
 size_t offset;
+ssize_t ret;
 
 if (iovcnt == 1) {
 buffer = iov[0].iov_base;
 offset = iov[0].iov_len;
 } else {
+buf = g_new(uint8_t, NET_BUFSIZE);
 buffer = buf;
-offset = iov_to_buf(iov, iovcnt, 0, buf, sizeof(buf));
+offset = iov_to_buf(iov, iovcnt, 0, buf, NET_BUFSIZE);
 }
 
 if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) {
-return nc->info->receive_raw(nc, buffer, offset);
+ret = nc->info->receive_raw(nc, buffer, offset);
 } else {
-return nc->info->receive(nc, buffer, offset);
+ret = nc->info->receive(nc, buffer, offset);
 }
+
+g_free(buf);
+return ret;
 }
 
 ssize_t qemu_deliver_packet_iov(NetClientState *sender,
-- 
2.5.0

Re: [Qemu-devel] [Patch 1/1] omap_lcdc: Remove support for DEPTH != 32

2016-03-28 Thread Pooja Dhannawat

On Wed, Mar 23, 2016 at 11:35 AM, Pooja Dhannawat  wrote:

>
>
> On Wed, Mar 23, 2016 at 3:55 AM, Eric Blake  wrote:
>
>> On 03/12/2016 03:23 AM, Pooja Dhannawat wrote:
>> > As only DEPTH ==32 case is used, removing other dead code
>> > which is based on DEPTH !== 32
>> >
>> > Signed-off-by: Pooja Dhannawat 
>> > ---
>>
>> > +++ b/hw/display/omap_lcdc.c
>> > @@ -71,44 +71,23 @@ static void omap_lcd_interrupts(struct
>> omap_lcd_panel_s *s)
>> >
>> >  #define draw_line_func drawfn
>> >
>> > -#define DEPTH 8
>> > -#include "omap_lcd_template.h"
>> > -#define DEPTH 15
>> > -#include "omap_lcd_template.h"
>> > -#define DEPTH 16
>> > -#include "omap_lcd_template.h"
>> >  #define DEPTH 32
>> >  #include "omap_lcd_template.h"
>>
>> Umm, the old code WAS using DEPTH != 32.  I'm not a display expert, so
>> there may be justification in nuking that code; but the commit message
>> needs a better argument than "the code wasn't used" when it sure seems
>> to be used.  If the argument is that surface_bits_per_pixel() always
>> returns 32, that would be a good argument.
>>
>>Correct me If I am wrong, I dig down on this and I found internally it
> used PIXMAN_FORMAT which uses bits_per_pixel = 32 only.
>
> so surface_bits_per_pixel() always returns 32 (just for reference :
Message-ID: <56f43a06.9090...@redhat.com> ) as discussed in this mail
chain. I will make desired changed for this patch too. is it ok to go ahead
with changes ?


> >
>> >  static draw_line_func draw_line_table2[33] = {
>> >  [0 ... 32]   = NULL,
>> > -[8]  = draw_line2_8,
>> > -[15] = draw_line2_15,
>> > -[16] = draw_line2_16,
>> >  [32] = draw_line2_32,
>>
>> This array is now wasteful.  If surface_bits_per_pixel() always returns
>> 32, then just ditch this array, and later on, change:
>>
>> Yes sure.
>
>
>> /* Colour depth */
>> switch ((omap_lcd->palette[0] >> 12) & 7) {
>> case 1:
>> -   draw_line = draw_line_table2[surface_bits_per_pixel(surface)];
>> +   draw_line = draw_line2_32;
>> bpp = 2;
>> break;
>>
>> In other words, your cleanup, if justified, is incomplete.
>>
>> --
>> Eric Blake   eblake redhat com+1-919-301-3266
>> Libvirt virtualization library http://libvirt.org
>>
>>
>

Re: [Qemu-devel] [PATCH 1/3] NBD proto: forbid TRIM command without negotiation

2016-03-28 Thread Eric Blake

On 03/28/2016 04:43 AM, Denis V. Lunev wrote:
> From: Pavel Borzenkov 
> 
> There is a loophole in the protocol that allows a client to send TRIM
> request even if support for it wasn't negotiated with the server. State
> explicitly that the client MUST NOT send such command without prior
> successful negotiation.
> 
> Signed-off-by: Pavel Borzenkov 
> Reviewed-by: Roman Kagan 
> Signed-off-by: Denis V. Lunev 
> CC: Wouter Verhelst 
> CC: Eric Blake 
> CC: Alex Bligh 
> ---
>  doc/proto.md | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/doc/proto.md b/doc/proto.md
> index 6d1cb34..d54ed19 100644
> --- a/doc/proto.md
> +++ b/doc/proto.md
> @@ -471,6 +471,9 @@ The following request types exist:
>  about the contents of the export affected by this command, until
>  overwriting it again with `NBD_CMD_WRITE`.
>  
> +A client MUST NOT send a trim request unless `NBD_FLAG_SEND_TRIM`
> +was set in the export flags field.
> +

Do we also want to mention that the server SHOULD fail with EINVAL if
the client sends it anyway, and similarly if NBD_CMD_FLUSH was sent
without the appropriate export flag (but that the client should not rely
on that particular failure)?

But as this is a strict improvement,
Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 2/3] NBD proto: document additional error conditions

2016-03-28 Thread Eric Blake

On 03/28/2016 04:43 AM, Denis V. Lunev wrote:
> From: Pavel Borzenkov 
> 
> It is unclear what the behaviour of a server should be if it receives
> an unknown command. Similar uncertainty exists for command flags.
> 
> Make it explicit that the server should return EINVAL in all such cases.

I'd go one step further and document that for backwards compatibility,
clients should not rely on this particular error.

> 
> Signed-off-by: Pavel Borzenkov 
> Reviewed-by: Roman Kagan 
> Signed-off-by: Denis V. Lunev 
> CC: Wouter Verhelst 
> CC: Eric Blake 
> CC: Alex Bligh 
> ---
>  doc/proto.md | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/doc/proto.md b/doc/proto.md
> index d54ed19..036d6d9 100644
> --- a/doc/proto.md
> +++ b/doc/proto.md
> @@ -512,6 +512,13 @@ return `EINVAL` if it receives a read or trim request 
> including one or
>  more sectors beyond the size of the device.  It also SHOULD map the
>  `EDQUOT` and `EFBIG` errors to `ENOSPC`.  Finally, it SHOULD return
>  `EPERM` if it receives a write or trim request on a read-only export.
> +
> +The server SHOULD return `EINVAL` if it receives an unknown command.
> +
> +The server SHOULD return `EINVAL` if it receives an unknown command flag. It
> +also SHOULD return `EINVAL` if it receives a request with a flag not 
> explicitly
> +documented as applicable to the given request.

In other words, while this is good for the server side, we have proven
that existing server implementations did not follow this, and therefore
it would probably be good to add a sentence along the lines of:

However, clients should not rely on this particular error, as some
historical servers silently ignored invalid commands or flags.

But as what you have proposed is a strict improvement,
Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v12 0/9] PTimer fixes/features and ARM MPTimer conversion

2016-03-28 Thread Dmitry Osipenko


Hello Peter,

16.03.2016 17:36, Peter Maydell пишет:

On 30 January 2016 at 16:43, Dmitry Osipenko  wrote:

Changelog for ARM MPTimer QEMUTimer to ptimer conversion:


So, where are we with this series? It looked from the mailing list
threads as if there were still a few things Peter C hadn't got
closure on, but we're rapidly running out of time before hard
freeze :-(  I'd really rather not have to push it out by yet
another release, especially since it got posted back in January..

thanks
-- PMM



It still pending for the review and the due date is tomorrow, maybe you would 
want to pick some of the PTimer patches for 2.6.


PTimer fixes ready for submission:
[PATCH v12 1/9] hw/ptimer: Fix issues caused by the adjusted timer limit value
[PATCH v12 2/9] hw/ptimer: Perform counter wrap around if timer already expired
[PATCH v12 3/9] hw/ptimer: Update .delta on period/freq change

The rest of the patches are either prerequisites for the ARM MPtimer conversion 
or pending for the review, we will work them out once Peter Crosthwaite would 
get some spare time.


--
Dmitry

Re: [Qemu-devel] [PATCH 3/3] NBD proto: add "Command flags" section

2016-03-28 Thread Eric Blake

On 03/28/2016 04:43 AM, Denis V. Lunev wrote:
> From: Pavel Borzenkov 
> 
> Add separate "Command flags" section to make it clear which flags are
> currently defined by the protocol.
> 
> Signed-off-by: Pavel Borzenkov 
> Reviewed-by: Roman Kagan 
> Signed-off-by: Denis V. Lunev 
> CC: Wouter Verhelst 
> CC: Eric Blake 
> CC: Alex Bligh 
> ---
>  doc/proto.md | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/doc/proto.md b/doc/proto.md
> index 036d6d9..662f741 100644
> --- a/doc/proto.md
> +++ b/doc/proto.md
> @@ -485,6 +485,16 @@ The following request types exist:
>  Currently one such message is known: `NBD_CMD_CACHE`, with type set to
>  5, implemented by xnbd.
>  
> + Command flags
> +
> +This field of 16 bits is sent by the client with every request and provides
> +additional information to the server to execute the command. Refer to
> +aforementioned "Request types" section for information about the flags

Maybe:

s/aforementioned "Request types" section/the "Request types" section above/

> +supported by particular commands.
> +
> +- bit 0, `NBD_CMD_FLAG_FUA`; should be set to 1 if the client requires
> +  "Force Unit Access" mode of operation

Trailing dot?  Should you also mention which command(s) it is valid
with? (NBD_CMD_WRITE for now, until other extension commands are added)
 It might also be worth mentioning that the flag should not be sent
unless export flags included NBD_FLAG_SEND_FUA.

> +
>   Error values
>  
>  The error values are used for the error field in the reply message.
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH] doc: Mention proper use of handle

2016-03-28 Thread Eric Blake

Although the proper use of the handle field during transmission
phase was implied, it never hurts to make it more explicit that
clients should alter the handle on each message, and the server
repeat the handle unchanged, in order for the client to track
when the server is sending replies out of order.

Signed-off-by: Eric Blake 
---
 doc/proto.md | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/doc/proto.md b/doc/proto.md
index 6d1cb34..d0102e0 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -200,7 +200,11 @@ S: 64 bits, handle
 S: (*length* bytes of data if the request is of type `NBD_CMD_READ`)

 Replies need not be sent in the same order as requests (i.e., requests
-may be handled by the server asynchronously).
+may be handled by the server asynchronously).  Clients SHOULD send a
+different value of handle for each request, and the server MUST use the
+same value for handle as was sent by the client for each request that
+the server is replying to, so that the client may correlate which
+request is receiving a response.

 ## Values

-- 
2.5.5

Re: [Qemu-devel] [PATCH v2] migration: skip sending ram pages released by virtio-balloon driver.

2016-03-28 Thread Eric Blake

On 03/27/2016 10:16 PM, Jitendra Kolhe wrote:
> While measuring live migration performance for qemu/kvm guest, it
> was observed that the qemu doesn’t maintain any intelligence for the
> guest ram pages which are released by the guest balloon driver and
> treat such pages as any other normal guest ram pages. This has direct
> impact on overall migration time for the guest which has released
> (ballooned out) memory to the host.
> 
> In case of large systems, where we can configure large guests with 1TB
> and with considerable amount of memory release by balloon driver to the,
> host the migration time gets worse.

s/the, host/the host,/

> 
> The optimization gets temporarily disabled, if the balloon operation is

s/disabled,/disabled/

> in progress. Since the optimization skips scanning and migrating control
> information for ballooned out pages, we might skip guest ram pages in
> cases where the guest balloon driver has freed the ram page to the guest
> but not yet informed the host/qemu about the ram page
> (VIRTIO_BALLOON_F_MUST_TELL_HOST). In such case with optimization, we
> might skip migrating ram pages which the guest is using. Since this
> problem is specific to balloon leak, we can restrict balloon operation in
> progress check to only balloon leak operation in progress check.
> 
> The optimization also get permanently disabled (for all subsequent

s/get/gets/

> migrations) in case any of the migration uses postcopy capability. In case
> of postcopy the balloon bitmap would be required to send after vm_stop,
> which has significant impact on the downtime. Moreover, the applications
> in the guest space won’t be actually faulting on the ram pages which are
> already ballooned out, the proposed optimization will not show any
> improvement in migration time during postcopy.
> 
> Signed-off-by: Jitendra Kolhe 
> ---
> Changed in v2:
>  - Resolved compilation issue for qemu-user binaries in exec.c
>  - Localize balloon bitmap test to save_zero_page().
>  - Updated version string for newly added migration capability to 2.7.
>  - Made minor modifications to patch commit text.

I'll leave the technical review to others.

> +++ b/qapi-schema.json
> @@ -544,11 +544,14 @@
>  #  been migrated, pulling the remaining pages along as needed. NOTE: 
> If
>  #  the migration fails during postcopy the VM will fail.  (since 2.6)
>  #
> +# @skip-balloon: Skip scanning ram pages released by virtio-balloon driver.
> +#  (since 2.7)
> +#
>  # Since: 1.2
>  ##
>  { 'enum': 'MigrationCapability',
>'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
> -   'compress', 'events', 'postcopy-ram'] }
> +   'compress', 'events', 'postcopy-ram', 'skip-balloon'] }

Does this flag make sense to always have enabled (in which case we don't
need it as a flag), or are there cases where we'd explicitly want to
disable it?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH v5] Sort the fw_cfg file list

2016-03-28 Thread minyard

From: Gerd Hoffmann 

Entries are inserted in filename order instead of being
appended to the end in case sorting is enabled.

This will avoid any future issues of moving the file creation
around, it doesn't matter what order they are created now,
the will always be in filename order.

Signed-off-by: Gerd Hoffmann 

Added machine type handling for compatibility.  This was
a fairly complex change, this will preserve the order of fw_cfg
for older versions no matter what order the firmware files
actually come in.  A list is kept of the correct legacy order
and the entries will be inserted based upon their order in
the list.  Except that some entries are ordered (in a specific
area of the list) based upon what order they appear on the
command line.  Special handling is added for those entries.

Signed-off-by: Corey Minyard 
---

Resend, as I left out the v5 in the header.

Changes from v4:

 * Add a space in fw_cfg.h to improve readability.

 * Move the set order override for user file out of the main function
   in vl.c to parse_fw_cfg().  That way it has the fw_cfg and can
   call the fww_cfg_set_order_override() function directly.

 * Added a ROM I missed: genroms/kvmvapic.bin.  I looked around
   some more and I couldn't find anything else.  I spent some time
   trying lots of different configurations and didn't find anything
   new.

I'm not sure who should merge this.

 hw/core/loader.c  |  10 
 hw/i386/pc.c  |   4 ++
 hw/i386/pc_piix.c |   1 +
 hw/i386/pc_q35.c  |   1 +
 hw/nvram/fw_cfg.c | 131 +++---
 include/hw/boards.h   |   3 +-
 include/hw/loader.h   |   2 +
 include/hw/nvram/fw_cfg.h |   8 +++
 vl.c  |  10 +++-
 9 files changed, 159 insertions(+), 11 deletions(-)

diff --git a/hw/core/loader.c b/hw/core/loader.c
index 8e8031c..8a3d518 100644
--- a/hw/core/loader.c
+++ b/hw/core/loader.c
@@ -1051,6 +1051,16 @@ void rom_set_fw(FWCfgState *f)
 fw_cfg = f;
 }
 
+void rom_set_order_override(int order)
+{
+fw_cfg_set_order_override(fw_cfg, order);
+}
+
+void rom_reset_order_override(void)
+{
+fw_cfg_reset_order_override(fw_cfg);
+}
+
 static Rom *find_rom(hwaddr addr)
 {
 Rom *rom;
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 56ec6cd..aa3f4f3 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1406,6 +1406,7 @@ DeviceState *pc_vga_init(ISABus *isa_bus, PCIBus *pci_bus)
 {
 DeviceState *dev = NULL;
 
+rom_set_order_override(FW_CFG_ORDER_OVERRIDE_VGA);
 if (pci_bus) {
 PCIDevice *pcidev = pci_vga_init(pci_bus);
 dev = pcidev ? &pcidev->qdev : NULL;
@@ -1413,6 +1414,7 @@ DeviceState *pc_vga_init(ISABus *isa_bus, PCIBus *pci_bus)
 ISADevice *isadev = isa_vga_init(isa_bus);
 dev = isadev ? DEVICE(isadev) : NULL;
 }
+rom_reset_order_override();
 return dev;
 }
 
@@ -1541,6 +1543,7 @@ void pc_nic_init(ISABus *isa_bus, PCIBus *pci_bus)
 {
 int i;
 
+rom_set_order_override(FW_CFG_ORDER_OVERRIDE_NIC);
 for (i = 0; i < nb_nics; i++) {
 NICInfo *nd = &nd_table[i];
 
@@ -1550,6 +1553,7 @@ void pc_nic_init(ISABus *isa_bus, PCIBus *pci_bus)
 pci_nic_init_nofail(nd, pci_bus, "e1000", NULL);
 }
 }
+rom_reset_order_override();
 }
 
 void pc_pci_device_init(PCIBus *pci_bus)
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 6f8c2cd..447703b 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -429,6 +429,7 @@ static void pc_i440fx_2_5_machine_options(MachineClass *m)
 m->alias = NULL;
 m->is_default = 0;
 pcmc->save_tsc_khz = false;
+m->legacy_fw_cfg_order = 1;
 SET_MACHINE_COMPAT(m, PC_COMPAT_2_5);
 }
 
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 46522c9..04f3609 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -291,6 +291,7 @@ static void pc_q35_2_5_machine_options(MachineClass *m)
 pc_q35_2_6_machine_options(m);
 m->alias = NULL;
 pcmc->save_tsc_khz = false;
+m->legacy_fw_cfg_order = 1;
 SET_MACHINE_COMPAT(m, PC_COMPAT_2_5);
 }
 
diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
index 7866248..a58efe4 100644
--- a/hw/nvram/fw_cfg.c
+++ b/hw/nvram/fw_cfg.c
@@ -28,6 +28,7 @@
 #include "hw/isa/isa.h"
 #include "hw/nvram/fw_cfg.h"
 #include "hw/sysbus.h"
+#include "hw/boards.h"
 #include "trace.h"
 #include "qemu/error-report.h"
 #include "qemu/config-file.h"
@@ -68,11 +69,14 @@ struct FWCfgState {
 /*< public >*/
 
 FWCfgEntry entries[2][FW_CFG_MAX_ENTRY];
+int entry_order[FW_CFG_MAX_ENTRY];
 FWCfgFiles *files;
 uint16_t cur_entry;
 uint32_t cur_offset;
 Notifier machine_ready;
 
+int fw_cfg_order_override;
+
 bool dma_enabled;
 dma_addr_t dma_addr;
 AddressSpace *dma_as;
@@ -664,12 +668,87 @@ void fw_cfg_add_i64(FWCfgState *s, uint16_t key, uint64_t 
value)
 fw_cfg_add_bytes(s, key, copy, sizeof(value));
 }
 
+void fw_cfg_set_order_override(FWCfgState *s, int order)
+{
+

Re: [Qemu-devel] [PATCHv4 1/2] Rework ipv6 options

2016-03-28 Thread Eric Blake

On 03/24/2016 05:34 PM, Samuel Thibault wrote:
> Rename the recently-added ip6-foo options into ipv6-foo options, to make
> them coherent with other ipv6 options.
> 
> Also rework the documentation.

Needs to go in to 2.6 before we bake in the wrong names.

> 
> Signed-off-by: Samuel Thibault 
> ---
>  net/slirp.c  |  6 +++---
>  qapi-schema.json | 25 -
>  qemu-options.hx  | 18 ++
>  3 files changed, 29 insertions(+), 20 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCHv4 2/2] slirp: Allow disabling IPv4 or IPv6

2016-03-28 Thread Eric Blake

On 03/24/2016 05:34 PM, Samuel Thibault wrote:
> Add ipv4 and ipv6 boolean options, so the user can setup IPv4-only and
> IPv6-only network environments.
> 
> Signed-off-by: Samuel Thibault 
> 
> ---
> 

> @@ -812,10 +822,18 @@ int net_init_slirp(const NetClientOptions *opts, const 
> char *name,
>  int ret;
>  const NetdevUserOptions *user;
>  const char **dnssearch;
> +int ipv4 = 1, ipv6 = 1;

These should be bool, and set to true.

>  
>  assert(opts->type == NET_CLIENT_OPTIONS_KIND_USER);
>  user = opts->u.user.data;
>  
> +if ((user->has_ipv6 && user->ipv6 && !user->has_ipv4)
> +|| (user->has_ipv4 && !user->ipv4))
> +ipv4 = 0;

Inconsistent with current qemu style.  Should be:

if ((user->has_ipv6 && user->ipv6 && !user->has_ipv4) ||
(user->has_ipv4 && !user->ipv4)) {
ipv4 = false;
}

In particular, the missing {} (should) fail ./scripts/checkpatch.pl.

> +if ((user->has_ipv4 && user->ipv4 && !user->has_ipv6)
> +|| (user->has_ipv6 && !user->ipv6))
> +ipv6 = 0;

More missing {} and unusual indentation.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 4/5] tcg: reorder removal from lists in tb_phys_invalidate

2016-03-28 Thread Sergey Fedorov

On 17/03/16 18:14, Sergey Fedorov wrote:
> On 17/03/16 18:09, Paolo Bonzini wrote:
>>
>> On 17/03/2016 14:46, sergey.fedo...@linaro.org wrote:
>>>   void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t
>>> page_addr)
>>>   {
>>> -CPUState *cpu;
>>>   PageDesc *p;
>>>   unsigned int h, n1;
>>> +tb_page_addr_t pc;
>>>   tb_page_addr_t phys_pc;
>>>   TranslationBlock *tb1, *tb2;
>>>   -/* remove the TB from the hash list */
>>> -phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
>>> -h = tb_phys_hash_func(phys_pc);
>>> -tb_hash_remove(&tcg_ctx.tb_ctx.tb_phys_hash[h], tb);
>>> -
>>> -/* remove the TB from the page list */
>>> -if (tb->page_addr[0] != page_addr) {
>>> -p = page_find(tb->page_addr[0] >> TARGET_PAGE_BITS);
>>> -tb_page_remove(&p->first_tb, tb);
>>> -invalidate_page_bitmap(p);
>>> -}
>>> -if (tb->page_addr[1] != -1 && tb->page_addr[1] != page_addr) {
>>> -p = page_find(tb->page_addr[1] >> TARGET_PAGE_BITS);
>>> -tb_page_remove(&p->first_tb, tb);
>>> -invalidate_page_bitmap(p);
>>> -}
>>> -
>>> -tcg_ctx.tb_ctx.tb_invalidated_flag = 1;
>>> -
>> Did you investigate the removal of this setting of tb_invalidated_flag?
>>
>> My recollection is that it is okay to remove it because at worse it
>> would cause a tb_add_jump from an invalidated source to a valid
>> destination.  This should be harmless as long as the source has been
>> tb_phys_invalidated and not tb_flushed.  But this needs to be checked.
>
> Thanks for pointing that. I should investigate it to make sure,
> although arm32/arm64/x86_64 Linux boots fine as well as the latest
> Alex's kmv-unit-tests pass.

The use pattern of 'tb_invalidated_flag' is a bit intricate; correct me,
if I'm wrong about the following. Basically, 'tb_invalidated_flag' was
meant to catch two events:
 * some TB has been invalidated by tb_phys_invalidate();
 * the whole translation buffer has been flushed by tb_flush().

Then it is checked to ensure:
 * the last executed TB can be safely patched to directly call the next
   one in cpu_exec();
 * the original TB should be provided for further possible invalidation
   along with the temporarily generated TB when in cpu_exec_nocache().

What, I think, we couldn't be sure in is the cpu_exec_nocache() case. It
could look like a kind of corner case, but it could result in stalls, if
the original TB isn't invalidated properly, see b4ac20b4df0d ("cpu-exec:
fix cpu_exec_nocache").

So I would suggest the following solution:s
 (1) Use 'tb->pc' as an indicator of whether TB is valid; check for it
 in cpu_exec() when deciding on whether to patch the last executed
 TB or not
 (2) Use 'tcg_ctx.tb_ctx.tb_flush_count' to check for translation buffer
 flushes; capture it before calling tb_gen_code() and compare to it
 afterwards to check if tb_flush() has been called in between

What do you think?

Kind regards,
Sergey

Re: [Qemu-devel] [RFC for-2.7 1/1] block/qapi: Add query-block-node-tree

2016-03-28 Thread Eric Blake

On 03/26/2016 10:33 AM, Max Reitz wrote:

>> We insert the new child to the head, not the tail...
> 
> Well, the idea is that the order of children doesn't really matter; The
> only thing that describes the behavior of a child is its role. For
> instance, for qcow2 it doesn't matter whether the "file" or the
> "backing" BDS is the first child.
> 
> However, for quorum, the order might matter (e.g. in FIFO mode). But
> then again, the order is clearly specified by the role again: The first
> child is the one with the "children.0" role.
> 
> So I don't think this is real problem as long as I add a note to the
> documentation that the order of objects in the @children array is
> undefined and does not have any significance.

What happens when we hot-add/-delete children from a FIFO mode quorum?
Obviously, we can select which child to delete, so if we delete the
child.0 node, do we have guaranteed semantics on which node takes over
the role as the primary child?  Conversely, when adding a node, do we
have a way to specify whether the new node must be at the front (assume
the child.0 role) or back (must not affect the current FIFO roles)?

I think order is important enough for FIFO mode that we ought to try and
represent it explicitly in the command, rather than getting it backwards
or stating it is undefined.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v5 1/2] QMP: add query-hotpluggable-cpus

2016-03-28 Thread Eric Blake

On 03/26/2016 12:56 PM, Igor Mammedov wrote:
> it will allow mgmt to query present and hotpluggable
> CPU objects, it is required from a target platform that
> wish to support command to implement and set
>  MachineClass.query_hotpluggable_cpus
> callback, which will return a list of possible CPU objects
> with options that would be needed for hotplugging possible
> CPU objects.
> 
> There are:
> 'type': 'str' - QOM CPU object type for usage with device_add
> 'vcpus-count': 'int' - number of logical VCPU threads per
> CPU object (mgmt needs to know)
> 
> and a set of optional fields that are to used for hotplugging
> a CPU objects and would allows mgmt tools to know what/where
> it could be hotplugged;
> [node],[socket],[core],[thread]
> 
> For present CPUs there is a 'qom-path' field which
> would allow mgmt to inspect whatever object/abstraction
> the target platform considers as CPU object.
> 
> Signed-off-by: Igor Mammedov 
> ---

> +++ b/qapi-schema.json
> @@ -4126,3 +4126,44 @@
>  ##
>  { 'enum': 'ReplayMode',
>'data': [ 'none', 'record', 'play' ] }
> +
> +##
> +# CpuInstanceProperties
> +#
> +# @node: NUMA node ID the CPU belongs to, optional

Mark this with '#optional', like you do in the other members.

> +# @socket: #optional socket number within node/board the CPU belongs to
> +# @core: #optional core number within socket the CPU belongs to
> +# @thread: #optional thread number within core the CPU belongs to
> +#
> +# Since: 2.7
> +{ 'struct': 'CpuInstanceProperties',

Missing ## trailing doc marker.  Doesn't matter quite yet, but will once
Marc-Andre's patches for automated doc generation land.

> +  'data': { '*node': 'int',
> +'*socket': 'int',
> +'*core': 'int',
> +'*thread': 'int'
> +  }
> +}
> +
> +##
> +# @HotpluggableCPU
> +#
> +# @type: CPU object type for usage with device_add command
> +# @props: list of properties to be used for hotplugging CPU
> +# @vcpus-count: number of logical VCPU threads @HotpluggableCPU provides
> +# @qom-path: #optional link to existing CPU object if CPU is present or
> +#omitted if CPU is not present.
> +#
> +# Since: 2.7
> +{ 'struct': 'HotpluggableCPU',

Another missing ##

> +  'data': { 'type': 'str',
> +'vcpus-count': 'int',
> +'props': 'CpuInstanceProperties',
> +'*qom-path': 'str'
> +  }
> +}
> +
> +##
> +# @query-hotpluggable-cpus
> +#
> +# Since: 2.7
> +{ 'command': 'query-hotpluggable-cpus', 'returns': ['HotpluggableCPU'] }

Looks okay.

> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index 9e05365..85ffba3 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -4853,3 +4853,46 @@ Example:
>   {"type": 0, "out-pport": 0, "pport": 0, "vlan-id": 3840,
>"pop-vlan": 1, "id": 251658240}
> ]}
> +
> +EQMP
> +
> +{
> +.name   = "query-hotpluggable-cpus",
> +.args_type  = "",
> +.mhandler.cmd_new = qmp_marshal_query_hotpluggable_cpus,
> +},
> +
> +SQMP
> +Show  existing/possible CPUs

Why two spaces?

> +---

Line is too long.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v5] Sort the fw_cfg file list

2016-03-28 Thread Michael S. Tsirkin

On Mon, Mar 28, 2016 at 09:42:18AM -0500, miny...@acm.org wrote:
> From: Gerd Hoffmann 
> 
> Entries are inserted in filename order instead of being
> appended to the end in case sorting is enabled.
> 
> This will avoid any future issues of moving the file creation
> around, it doesn't matter what order they are created now,
> the will always be in filename order.
> 
> Signed-off-by: Gerd Hoffmann 
> 
> Added machine type handling for compatibility.  This was
> a fairly complex change, this will preserve the order of fw_cfg
> for older versions no matter what order the firmware files
> actually come in.  A list is kept of the correct legacy order
> and the entries will be inserted based upon their order in
> the list.  Except that some entries are ordered (in a specific
> area of the list) based upon what order they appear on the
> command line.  Special handling is added for those entries.
> 
> Signed-off-by: Corey Minyard 


Reviewed-by: Michael S. Tsirkin 

Gerd, will you pick this up, or should I?


> ---
> 
> Resend, as I left out the v5 in the header.
> 
> Changes from v4:
> 
>  * Add a space in fw_cfg.h to improve readability.
> 
>  * Move the set order override for user file out of the main function
>in vl.c to parse_fw_cfg().  That way it has the fw_cfg and can
>call the fww_cfg_set_order_override() function directly.
> 
>  * Added a ROM I missed: genroms/kvmvapic.bin.  I looked around
>some more and I couldn't find anything else.  I spent some time
>trying lots of different configurations and didn't find anything
>new.
> 
> I'm not sure who should merge this.
> 
>  hw/core/loader.c  |  10 
>  hw/i386/pc.c  |   4 ++
>  hw/i386/pc_piix.c |   1 +
>  hw/i386/pc_q35.c  |   1 +
>  hw/nvram/fw_cfg.c | 131 
> +++---
>  include/hw/boards.h   |   3 +-
>  include/hw/loader.h   |   2 +
>  include/hw/nvram/fw_cfg.h |   8 +++
>  vl.c  |  10 +++-
>  9 files changed, 159 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/core/loader.c b/hw/core/loader.c
> index 8e8031c..8a3d518 100644
> --- a/hw/core/loader.c
> +++ b/hw/core/loader.c
> @@ -1051,6 +1051,16 @@ void rom_set_fw(FWCfgState *f)
>  fw_cfg = f;
>  }
>  
> +void rom_set_order_override(int order)
> +{
> +fw_cfg_set_order_override(fw_cfg, order);
> +}
> +
> +void rom_reset_order_override(void)
> +{
> +fw_cfg_reset_order_override(fw_cfg);
> +}
> +
>  static Rom *find_rom(hwaddr addr)
>  {
>  Rom *rom;
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 56ec6cd..aa3f4f3 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1406,6 +1406,7 @@ DeviceState *pc_vga_init(ISABus *isa_bus, PCIBus 
> *pci_bus)
>  {
>  DeviceState *dev = NULL;
>  
> +rom_set_order_override(FW_CFG_ORDER_OVERRIDE_VGA);
>  if (pci_bus) {
>  PCIDevice *pcidev = pci_vga_init(pci_bus);
>  dev = pcidev ? &pcidev->qdev : NULL;
> @@ -1413,6 +1414,7 @@ DeviceState *pc_vga_init(ISABus *isa_bus, PCIBus 
> *pci_bus)
>  ISADevice *isadev = isa_vga_init(isa_bus);
>  dev = isadev ? DEVICE(isadev) : NULL;
>  }
> +rom_reset_order_override();
>  return dev;
>  }
>  
> @@ -1541,6 +1543,7 @@ void pc_nic_init(ISABus *isa_bus, PCIBus *pci_bus)
>  {
>  int i;
>  
> +rom_set_order_override(FW_CFG_ORDER_OVERRIDE_NIC);
>  for (i = 0; i < nb_nics; i++) {
>  NICInfo *nd = &nd_table[i];
>  
> @@ -1550,6 +1553,7 @@ void pc_nic_init(ISABus *isa_bus, PCIBus *pci_bus)
>  pci_nic_init_nofail(nd, pci_bus, "e1000", NULL);
>  }
>  }
> +rom_reset_order_override();
>  }
>  
>  void pc_pci_device_init(PCIBus *pci_bus)
> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> index 6f8c2cd..447703b 100644
> --- a/hw/i386/pc_piix.c
> +++ b/hw/i386/pc_piix.c
> @@ -429,6 +429,7 @@ static void pc_i440fx_2_5_machine_options(MachineClass *m)
>  m->alias = NULL;
>  m->is_default = 0;
>  pcmc->save_tsc_khz = false;
> +m->legacy_fw_cfg_order = 1;
>  SET_MACHINE_COMPAT(m, PC_COMPAT_2_5);
>  }
>  
> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> index 46522c9..04f3609 100644
> --- a/hw/i386/pc_q35.c
> +++ b/hw/i386/pc_q35.c
> @@ -291,6 +291,7 @@ static void pc_q35_2_5_machine_options(MachineClass *m)
>  pc_q35_2_6_machine_options(m);
>  m->alias = NULL;
>  pcmc->save_tsc_khz = false;
> +m->legacy_fw_cfg_order = 1;
>  SET_MACHINE_COMPAT(m, PC_COMPAT_2_5);
>  }
>  
> diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
> index 7866248..a58efe4 100644
> --- a/hw/nvram/fw_cfg.c
> +++ b/hw/nvram/fw_cfg.c
> @@ -28,6 +28,7 @@
>  #include "hw/isa/isa.h"
>  #include "hw/nvram/fw_cfg.h"
>  #include "hw/sysbus.h"
> +#include "hw/boards.h"
>  #include "trace.h"
>  #include "qemu/error-report.h"
>  #include "qemu/config-file.h"
> @@ -68,11 +69,14 @@ struct FWCfgState {
>  /*< public >*/
>  
>  FWCfgEntry entries[2][FW_CFG_MAX_ENTRY];
> +int entry_

Re: [Qemu-devel] [Nbd] [PATCH 2/2] NBD proto: add GET_LBA_STATUS extension

2016-03-28 Thread Eric Blake

On 03/25/2016 02:49 AM, Wouter Verhelst wrote:

>> You may also want to add a rule that for all future extensions, any
>> command that requires data in the server response (other than the server
>> response to NBD_CMD_READ) must include a 32-bit length as the first
>> field of its data payload, so that the server reply is always structured.
> 
> Right.
> 
>> Hmm, I still think it would be hard to write a wireshark dissector for
>> server replies, since there is no indication whether a given reply will
>> include data or not.
> 
> Well, a wireshark dissector already exists. However, it is very old; it
> doesn't support the NBD_CMD_TRIM or NBD_CMD_FLUSH commands, it doesn't
> support the NBD_CMD_FLAG_FUA option to NBD_CMD_WRITE, and it doesn't
> deal with negotiation at all. It was written when newstyle negotiation
> didn't exist yet, and scanning for INIT_PASSWD at the time was too hard,
> according to the guy who wrote it (don't remember who that was).

And I've never written a wireshark dissector myself, to know how easy or
hard it would be to extend that work.

> 
>> The _client_ knows (well, any well-written client
>> that uses a different value for 'handle' for every command sent to the
>> server), because it can read the returned 'handle' field to know what
>> command it matches to and therefore what data to expect; but a
>> third-party observer seeing _just_ the server messages has no idea which
>> server responses should have payload.
> 
> It can if it knows enough about the protocol, but granted, that makes it
> harder for us to extend the protocol without breaking the dissector.
> 
>> Scanning the stream and assuming that a new reply starts each time
>> NBD_REPLY_MAGIC is encountered is a close approximation, but hits
>> false positives if the data being returned for NBD_CMD_READ contains
>> the same magic number as part of its contents.
> 
> Indeed.

One benefit of TCP: we can rely on packet boundaries (whether or not
fragmentation is also happening); I'm not sure if UDP shares the same
benefits (then again, I'm not even sure if UDP is usable for the NBD
protocol, since we definitely rely on in-order delivery of packets: a
read command can definitely return more bytes than even a jumbo frame
can contain, even though we do allow out-of-order delivery of replies).
 So if the server always sends each reply as the start of its own packet
(rather than coalescing multiple replies into a single network packet),
and the dissector only looks for magic numbers at the start of a packet,
then any server packet not starting with the magic number must be data
payload, and you could potentially even avoid the false positives by
choosing to break data replies into packets by adjusting packet lengths
by one byte any time the next data chunk would otherwise happen to start
with the same two bytes as the magic number.  But I haven't actually
tested any of this, to know if packet coalescing goes on, and I
certainly don't think it is worth complicating the server to avoid false
positive magic number detection by splitting read payloads across packet
boundaries, just for the benefit of wire-sniffing.

> I like this. However, before committing to it, I would like to see
> Markus' feedback on that (explicit Cc added, even though he's on the
> list, too).
> 
> We'd also need a way to communicate the ability to speak this protocol
> from the kernel to userspace before telling the server that the client
> understands something which maybe its kernel doesn't.

proto.md already documents that ioctl()s exist for the user space to
inform the kernel about options sent by the server prior to kicking off
transmission phase, and that the NBD protocol itself does not go into
full detail about available ioctl()s (the kernel/userspace coordination
does not affect the protocol).  It seems fairly straightforward for the
kernel to supply another ioctl() that userspace can use to learn whether
it is allowed or forbidden from advertising structured reply status
during the handshake phase (including the case where the ioctl() is not
present being treated as the client must not enable structured replies).

> 
> Markus?

Markus is on vacation for a bit, so we'll just have to wait for a reply.

> 
>> I could write up a negotiation of global flags for structured reply
>> lengths as an extension proposal, if you think it is worth it.
> 
> I think it is worth it...

Okay, I'll give it a shot, in a separate thread.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [FreeBSD] Host build i386 failing to build aarch64 targets

2016-03-28 Thread Sean Bruno

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA512

This seems to at least allow things to compile, qemu-system seems to
be able to boot images on i386, but I may have missed something:

diff --git a/cpus.c b/cpus.c
index 23cf7aa..11f8bab 100644
- --- a/cpus.c
+++ b/cpus.c
@@ -338,15 +338,8 @@ static int64_t qemu_icount_round(int64_t count)

 static void icount_warp_rt(void)
 {
- -/* The icount_warp_timer is rescheduled soon after
vm_clock_warp_start
- - * changes from -1 to another value, so the race here is okay.
- - */
- -if (atomic_read(&vm_clock_warp_start) == -1) {
- -return;
- -}
- -
 seqlock_write_lock(&timers_state.vm_clock_seqlock);
- -if (runstate_is_running()) {
+if ((vm_clock_warp_start != -1) && runstate_is_running()) {
 int64_t clock = REPLAY_CLOCK(REPLAY_CLOCK_VIRTUAL_RT,
  cpu_get_clock_locked());
 int64_t warp_delta;
-BEGIN PGP SIGNATURE-
Version: GnuPG v2

iQF8BAEBCgBmBQJW+WNBXxSAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w
ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXRCQUFENDYzMkU3MTIxREU4RDIwOTk3REQx
MjAxRUZDQTFFNzI3RTY0AAoJEBIB78oecn5kyMwH/0Gq6gHKhjtwSCInMNJgG0hv
oAZmv0/2p0U1S3Kp84ALRU90z3K7nwKjShw7uBK6On5Z0mT7CpsFrrVLb1FErJAW
VP3IDlTxUUpY2fvMlJjYRHcBmvVUpiKbFQ9eFBBdYK1erT7lVK4/lGtoCaltoqr+
HRY8XBV5m3IXY/MIypUuTq9hv4/KjFs/Jg5pDYbCGTvXgj3xbqRDdNpEKGLtmKHg
UD3MpzD1LTVsTsT6t8ys6CMU3ykeWKwRc1vY33kfOCalK1VVVIWoq2XUmQK8DUGj
U6FjXuPmA7+jY7C9RQL5uzJnz+gLG3JZE0j8LfLeRzA9fxAMe+tg8ZbhFU9vohw=
=PQrC
-END PGP SIGNATURE-

[Qemu-devel] [PATCH v15 0/4][WIP] block/gluster: add support for multiple gluster servers

2016-03-28 Thread pkalever

From: Prasanna Kumar Kalever 

WIP: As soon as discriminated union support is added use it with GlusterServer

This version of patches are rebased on master branch.

Prasanna Kumar Kalever (4):
  block/gluster: rename [server, volname, image] -> [host, volume, path]
  block/gluster: code cleanup
  block/gluster: using new qapi schema
  block/gluster: add support for multiple gluster servers

v1:
multiple host addresses but common port number and transport type
pattern: URI syntax with query (?) delimitor
syntax:
file=gluster[+transport-type]://host1:24007/testvol/a.img\
 ?server=host2&server=host3

v2:
multiple host addresses each have their own port number, but all use
 common transport type
pattern: URI syntax  with query (?) delimiter
syntax:
file=gluster[+transport-type]://[host[:port]]/testvol/a.img\
 [?server=host1[:port]\
  &server=host2[:port]]

v3:
multiple host addresses each have their own port number and transport type
pattern: changed to json
syntax:
'json:{"driver":"qcow2","file":{"driver":"gluster","volume":"testvol",
   "path":"/path/a.qcow2","server":
 [{"host":"1.2.3.4","port":"24007","transport":"tcp"},
  {"host":"4.5.6.7","port":"24008","transport":"rdma"}] } }'

v4, v5:
address comments from "Eric Blake" 
renamed:
'backup-volfile-servers' -> 'volfile-servers'

v6:
address comments from Peter Krempa 
renamed:
 'volname'->  'volume'
 'image-path' ->  'path'
 'server' ->  'host'

v7:
fix for v6 (initialize num_servers to 1 and other typos)

v8:
split patch set v7 into series of 3 as per Peter Krempa 
review comments

v9:
reorder the series of patches addressing "Eric Blake" 
review comments

v10:
fix mem-leak as per Peter Krempa  review comments

v11:
using qapi-types* defined structures as per "Eric Blake" 
review comments.

v12:
fix crash caused in qapi_free_BlockdevOptionsGluster

v13:
address comments from "Jeff Cody" 

v14:
address comments from "Eric Blake" 
split patch 3/3 into two
rename input option and variable from 'servers' to 'server'

v15:
patch 1/4 changed the commit message as per Eric's comment
patch 2/4 are unchanged
patch 3/4 addressed Jeff's comments
patch 4/4 concentrates on unix transport related help info,
rename 'parse_transport_option()' to 'qapi_enum_parse()',
address memory leaks and other comments given by Jeff and Eric

 block/gluster.c  | 476 +--
 qapi/block-core.json |  60 ++-
 2 files changed, 409 insertions(+), 127 deletions(-)

-- 
2.5.0

[Qemu-devel] [PATCH v15 3/4] block/gluster: using new qapi schema

2016-03-28 Thread pkalever

From: Prasanna Kumar Kalever 

this patch adds 'GlusterServer' related schema in qapi/block-core.json

Signed-off-by: Prasanna Kumar Kalever 
---
 block/gluster.c  | 101 ++-
 qapi/block-core.json |  60 --
 2 files changed, 108 insertions(+), 53 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index 6c73e62..9b360e3 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -12,6 +12,10 @@
 #include "block/block_int.h"
 #include "qemu/uri.h"
 
+#define GLUSTER_OPT_FILENAME"filename"
+#define GLUSTER_DEFAULT_PORT24007
+
+
 typedef struct GlusterAIOCB {
 int64_t size;
 int ret;
@@ -30,15 +34,6 @@ typedef struct BDRVGlusterReopenState {
 struct glfs_fd *fd;
 } BDRVGlusterReopenState;
 
-typedef struct GlusterConf {
-char *host;
-int port;
-char *volume;
-char *path;
-char *transport;
-} GlusterConf;
-
-
 static QemuOptsList qemu_gluster_create_opts = {
 .name = "qemu-gluster-create-opts",
 .head = QTAILQ_HEAD_INITIALIZER(qemu_gluster_create_opts.head),
@@ -62,7 +57,7 @@ static QemuOptsList runtime_opts = {
 .head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
 .desc = {
 {
-.name = "filename",
+.name = GLUSTER_OPT_FILENAME,
 .type = QEMU_OPT_STRING,
 .help = "URL to the gluster image",
 },
@@ -71,18 +66,7 @@ static QemuOptsList runtime_opts = {
 };
 
 
-static void qemu_gluster_gconf_free(GlusterConf *gconf)
-{
-if (gconf) {
-g_free(gconf->host);
-g_free(gconf->volume);
-g_free(gconf->path);
-g_free(gconf->transport);
-g_free(gconf);
-}
-}
-
-static int parse_volume_options(GlusterConf *gconf, char *path)
+static int parse_volume_options(BlockdevOptionsGluster *gconf, char *path)
 {
 char *p, *q;
 
@@ -144,8 +128,10 @@ static int parse_volume_options(GlusterConf *gconf, char 
*path)
  * file=gluster+unix:///testvol/dir/a.img?socket=/tmp/glusterd.socket
  * file=gluster+rdma://1.2.3.4:24007/testvol/a.img
  */
-static int qemu_gluster_parseuri(GlusterConf *gconf, const char *filename)
+static int qemu_gluster_parseuri(BlockdevOptionsGluster **pgconf,
+ const char *filename)
 {
+BlockdevOptionsGluster *gconf;
 URI *uri;
 QueryParams *qp = NULL;
 bool is_unix = false;
@@ -156,20 +142,24 @@ static int qemu_gluster_parseuri(GlusterConf *gconf, 
const char *filename)
 return -EINVAL;
 }
 
+gconf = g_new0(BlockdevOptionsGluster, 1);
+gconf->server = g_new0(GlusterServer, 1);
+
 /* transport */
 if (!uri->scheme || !strcmp(uri->scheme, "gluster")) {
-gconf->transport = g_strdup("tcp");
+gconf->server->transport = GLUSTER_TRANSPORT_TCP;
 } else if (!strcmp(uri->scheme, "gluster+tcp")) {
-gconf->transport = g_strdup("tcp");
+gconf->server->transport = GLUSTER_TRANSPORT_TCP;
 } else if (!strcmp(uri->scheme, "gluster+unix")) {
-gconf->transport = g_strdup("unix");
+gconf->server->transport = GLUSTER_TRANSPORT_UNIX;
 is_unix = true;
 } else if (!strcmp(uri->scheme, "gluster+rdma")) {
-gconf->transport = g_strdup("rdma");
+gconf->server->transport = GLUSTER_TRANSPORT_RDMA;
 } else {
 ret = -EINVAL;
 goto out;
 }
+gconf->server->has_transport = true;
 
 ret = parse_volume_options(gconf, uri->path);
 if (ret < 0) {
@@ -191,13 +181,23 @@ static int qemu_gluster_parseuri(GlusterConf *gconf, 
const char *filename)
 ret = -EINVAL;
 goto out;
 }
-gconf->host = g_strdup(qp->p[0].value);
+gconf->server->host = g_strdup(qp->p[0].value);
 } else {
-gconf->host = g_strdup(uri->server ? uri->server : "localhost");
-gconf->port = uri->port;
+gconf->server->host = g_strdup(uri->server ? uri->server : 
"localhost");
+if (uri->port) {
+gconf->server->port = uri->port;
+} else {
+gconf->server->port = GLUSTER_DEFAULT_PORT;
+}
+gconf->server->has_port = true;
 }
 
+*pgconf = gconf;
+
 out:
+if (ret < 0) {
+qapi_free_BlockdevOptionsGluster(gconf);
+}
 if (qp) {
 query_params_free(qp);
 }
@@ -205,14 +205,15 @@ out:
 return ret;
 }
 
-static struct glfs *qemu_gluster_init(GlusterConf *gconf, const char *filename,
-  Error **errp)
+static struct glfs *qemu_gluster_init(BlockdevOptionsGluster **pgconf,
+  const char *filename, Error **errp)
 {
 struct glfs *glfs = NULL;
 int ret;
 int old_errno;
+BlockdevOptionsGluster *gconf = NULL;
 
-ret = qemu_gluster_parseuri(gconf, filename);
+ret = qemu_gluster_parseuri(&gconf, filename);
 if (ret < 0) {
 error_setg(errp, "Usage: file=gluster[+transport]://[host[:port]]/"

[Qemu-devel] [PATCH v15 2/4] block/gluster: code cleanup

2016-03-28 Thread pkalever

From: Prasanna Kumar Kalever 

unified coding styles of multiline function arguments and other error functions
moved random declarations of structures and other list variables

Signed-off-by: Prasanna Kumar Kalever 
Reviewed-by: Eric Blake 
---
 block/gluster.c | 113 ++--
 1 file changed, 60 insertions(+), 53 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index afdd509..6c73e62 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -25,6 +25,11 @@ typedef struct BDRVGlusterState {
 struct glfs_fd *fd;
 } BDRVGlusterState;
 
+typedef struct BDRVGlusterReopenState {
+struct glfs *glfs;
+struct glfs_fd *fd;
+} BDRVGlusterReopenState;
+
 typedef struct GlusterConf {
 char *host;
 int port;
@@ -33,6 +38,39 @@ typedef struct GlusterConf {
 char *transport;
 } GlusterConf;
 
+
+static QemuOptsList qemu_gluster_create_opts = {
+.name = "qemu-gluster-create-opts",
+.head = QTAILQ_HEAD_INITIALIZER(qemu_gluster_create_opts.head),
+.desc = {
+{
+.name = BLOCK_OPT_SIZE,
+.type = QEMU_OPT_SIZE,
+.help = "Virtual disk size"
+},
+{
+.name = BLOCK_OPT_PREALLOC,
+.type = QEMU_OPT_STRING,
+.help = "Preallocation mode (allowed values: off, full)"
+},
+{ /* end of list */ }
+}
+};
+
+static QemuOptsList runtime_opts = {
+.name = "gluster",
+.head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
+.desc = {
+{
+.name = "filename",
+.type = QEMU_OPT_STRING,
+.help = "URL to the gluster image",
+},
+{ /* end of list */ }
+},
+};
+
+
 static void qemu_gluster_gconf_free(GlusterConf *gconf)
 {
 if (gconf) {
@@ -177,7 +215,7 @@ static struct glfs *qemu_gluster_init(GlusterConf *gconf, 
const char *filename,
 ret = qemu_gluster_parseuri(gconf, filename);
 if (ret < 0) {
 error_setg(errp, "Usage: file=gluster[+transport]://[host[:port]]/"
-   "volume/path[?socket=...]");
+ "volume/path[?socket=...]");
 errno = -ret;
 goto out;
 }
@@ -255,20 +293,6 @@ static void gluster_finish_aiocb(struct glfs_fd *fd, 
ssize_t ret, void *arg)
 qemu_bh_schedule(acb->bh);
 }
 
-/* TODO Convert to fine grained options */
-static QemuOptsList runtime_opts = {
-.name = "gluster",
-.head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
-.desc = {
-{
-.name = "filename",
-.type = QEMU_OPT_STRING,
-.help = "URL to the gluster image",
-},
-{ /* end of list */ }
-},
-};
-
 static void qemu_gluster_parse_flags(int bdrv_flags, int *open_flags)
 {
 assert(open_flags != NULL);
@@ -286,7 +310,7 @@ static void qemu_gluster_parse_flags(int bdrv_flags, int 
*open_flags)
 }
 }
 
-static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
+static int qemu_gluster_open(BlockDriverState *bs, QDict *options,
  int bdrv_flags, Error **errp)
 {
 BDRVGlusterState *s = bs->opaque;
@@ -335,12 +359,6 @@ out:
 return ret;
 }
 
-typedef struct BDRVGlusterReopenState {
-struct glfs *glfs;
-struct glfs_fd *fd;
-} BDRVGlusterReopenState;
-
-
 static int qemu_gluster_reopen_prepare(BDRVReopenState *state,
BlockReopenQueue *queue, Error **errp)
 {
@@ -427,7 +445,9 @@ static void qemu_gluster_reopen_abort(BDRVReopenState 
*state)
 
 #ifdef CONFIG_GLUSTERFS_ZEROFILL
 static coroutine_fn int qemu_gluster_co_write_zeroes(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, BdrvRequestFlags flags)
+ int64_t sector_num,
+ int nb_sectors,
+ BdrvRequestFlags flags)
 {
 int ret;
 GlusterAIOCB acb;
@@ -455,7 +475,7 @@ static inline bool gluster_supports_zerofill(void)
 }
 
 static inline int qemu_gluster_zerofill(struct glfs_fd *fd, int64_t offset,
-int64_t size)
+int64_t size)
 {
 return glfs_zerofill(fd, offset, size);
 }
@@ -467,7 +487,7 @@ static inline bool gluster_supports_zerofill(void)
 }
 
 static inline int qemu_gluster_zerofill(struct glfs_fd *fd, int64_t offset,
-int64_t size)
+int64_t size)
 {
 return 0;
 }
@@ -496,19 +516,17 @@ static int qemu_gluster_create(const char *filename,
 tmp = qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC);
 if (!tmp || !strcmp(tmp, "off")) {
 prealloc = 0;
-} else if (!strcmp(tmp, "full") &&
-   gluster_supports_zerofill()) {
+} else if (!strcmp(tmp, "full") && gluster_supports_zerofill()) {
 prealloc = 1;
 } else {
 error_setg(errp, "Invalid preallocation mode: '%s'"
-" or Glus

[Qemu-devel] [PATCH v15 4/4] block/gluster: add support for multiple gluster servers

2016-03-28 Thread pkalever

From: Prasanna Kumar Kalever 

This patch adds a way to specify multiple volfile servers to the gluster
block backend of QEMU with tcp|rdma transport types and their port numbers.

Problem:

Currently VM Image on gluster volume is specified like this:

file=gluster[+tcp]://host[:port]/testvol/a.img

Assuming we have three hosts in trusted pool with replica 3 volume
in action and unfortunately host (mentioned in the command above) went down
for some reason, since the volume is replica 3 we now have other 2 hosts
active from which we can boot the VM.

But currently there is no mechanism to pass the other 2 gluster host
addresses to qemu.

Solution:

New way of specifying VM Image on gluster volume with volfile servers:
(We still support old syntax to maintain backward compatibility)

Basic command line syntax looks like:

Pattern I:
 -drive driver=gluster,
volume=testvol,path=/path/a.raw,
server.0.host=1.2.3.4,
   [server.0.port=24007,]
   [server.0.transport=tcp,]
server.1.host=5.6.7.8,
   [server.1.port=24008,]
   [server.1.transport=rdma,]
server.2.host=/var/run/glusterd.socket,
server.2.transport=unix ...

Pattern II:
 'json:{"driver":"qcow2","file":{"driver":"gluster",
   "volume":"testvol","path":"/path/a.qcow2",
   "server":[{tuple0},{tuple1}, ...{tupleN}]}}'

   driver  => 'gluster' (protocol name)
   volume  => name of gluster volume where our VM image resides
   path=> absolute path of image in gluster volume

  {tuple}  => {"host":"1.2.3.4"[,"port":"24007","transport":"tcp"]}

   host=> host address (hostname/ipv4/ipv6 addresses/socket path)
   port=> port number on which glusterd is listening. (default 24007)
   transport   => transport type used to connect to gluster management daemon,
   it can be tcp|rdma|unix (default 'tcp')

Examples:
1.
 -drive driver=qcow2,file.driver=gluster,
file.volume=testvol,file.path=/path/a.qcow2,
file.server.0.host=1.2.3.4,
file.server.0.port=24007,
file.server.0.transport=tcp,
file.server.1.host=5.6.7.8,
file.server.1.port=24008,
file.server.1.transport=rdma,
file.server.2.host=/var/run/glusterd.socket
file.server.1.transport=unix
2.
 'json:{"driver":"qcow2","file":{"driver":"gluster","volume":"testvol",
 "path":"/path/a.qcow2","server":
 [{"host":"1.2.3.4","port":"24007","transport":"tcp"},
  {"host":"4.5.6.7","port":"24008","transport":"rdma"},
  {"host":"/var/run/glusterd.socket","transport":"unix"}] } }'

This patch gives a mechanism to provide all the server addresses, which are in
replica set, so in case host1 is down VM can still boot from any of the
active hosts.

This is equivalent to the backup-volfile-servers option supported by
mount.glusterfs (FUSE way of mounting gluster volume)

Credits: Sincere thanks to Kevin Wolf  and
"Deepak C Shetty"  for inputs and all their support

Signed-off-by: Prasanna Kumar Kalever 
---
 block/gluster.c  | 298 ---
 qapi/block-core.json |   4 +-
 2 files changed, 261 insertions(+), 41 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index 9b360e3..ad3e683 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -11,8 +11,16 @@
 #include 
 #include "block/block_int.h"
 #include "qemu/uri.h"
+#include "qemu/error-report.h"
 
 #define GLUSTER_OPT_FILENAME"filename"
+#define GLUSTER_OPT_VOLUME  "volume"
+#define GLUSTER_OPT_PATH"path"
+#define GLUSTER_OPT_HOST"host"
+#define GLUSTER_OPT_PORT"port"
+#define GLUSTER_OPT_TRANSPORT   "transport"
+#define GLUSTER_OPT_SERVER_PATTERN  "server."
+
 #define GLUSTER_DEFAULT_PORT24007
 
 
@@ -65,6 +73,46 @@ static QemuOptsList runtime_opts = {
 },
 };
 
+static QemuOptsList runtime_json_opts = {
+.name = "gluster_json",
+.head = QTAILQ_HEAD_INITIALIZER(runtime_json_opts.head),
+.desc = {
+{
+.name = GLUSTER_OPT_VOLUME,
+.type = QEMU_OPT_STRING,
+.help = "name of gluster volume where VM image resides",
+},
+{
+.name = GLUSTER_OPT_PATH,
+.type = QEMU_OPT_STRING,
+.help = "absolute path to image file in gluster volume",
+},
+{ /* end of list */ }
+},
+};
+
+static QemuOptsList runtime_tuple_opts = {
+.name = "gluster_tuple",
+.head = QTAILQ_HEAD_INITIALIZER(runtime_tuple_opts.head),
+.desc = {
+{
+.name = GLUSTER_OPT_HOST,
+.type = QEMU_OPT_STRING,
+.help = "host address (hostname/ipv4/ipv6 addresses/socket path)",
+},
+{
+.name = GLUSTER_OPT_PORT,
+.type = QEMU_OPT_NUMBER,
+.help = "port number on which glusterd is listening (default 
24007)",
+},
+{
+.name = GLUSTER_OPT_TRANSPORT,
+

[Qemu-devel] [PATCH v15 1/4] block/gluster: rename [server, volname, image] -> [host, volume, path]

2016-03-28 Thread pkalever

From: Prasanna Kumar Kalever 

A future patch will add support for multiple gluster servers. Existing
terminology is a bit unusual in relation to what names are used by
other networked devices, and doesn't map very well to the terminology
we expect to use for multiple servers.  Therefore, rename the following
options:
'server'  -> 'host'
'image'   -> 'path'
'volname' -> 'volume'

Signed-off-by: Prasanna Kumar Kalever 
Reviewed-by: Eric Blake 
---
 block/gluster.c | 54 +++---
 1 file changed, 27 insertions(+), 27 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index 65077a0..afdd509 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -26,19 +26,19 @@ typedef struct BDRVGlusterState {
 } BDRVGlusterState;
 
 typedef struct GlusterConf {
-char *server;
+char *host;
 int port;
-char *volname;
-char *image;
+char *volume;
+char *path;
 char *transport;
 } GlusterConf;
 
 static void qemu_gluster_gconf_free(GlusterConf *gconf)
 {
 if (gconf) {
-g_free(gconf->server);
-g_free(gconf->volname);
-g_free(gconf->image);
+g_free(gconf->host);
+g_free(gconf->volume);
+g_free(gconf->path);
 g_free(gconf->transport);
 g_free(gconf);
 }
@@ -58,19 +58,19 @@ static int parse_volume_options(GlusterConf *gconf, char 
*path)
 if (*p == '\0') {
 return -EINVAL;
 }
-gconf->volname = g_strndup(q, p - q);
+gconf->volume = g_strndup(q, p - q);
 
-/* image */
+/* path */
 p += strspn(p, "/");
 if (*p == '\0') {
 return -EINVAL;
 }
-gconf->image = g_strdup(p);
+gconf->path = g_strdup(p);
 return 0;
 }
 
 /*
- * file=gluster[+transport]://[server[:port]]/volname/image[?socket=...]
+ * file=gluster[+transport]://[host[:port]]/volume/path[?socket=...]
  *
  * 'gluster' is the protocol.
  *
@@ -79,10 +79,10 @@ static int parse_volume_options(GlusterConf *gconf, char 
*path)
  * tcp, unix and rdma. If a transport type isn't specified, then tcp
  * type is assumed.
  *
- * 'server' specifies the server where the volume file specification for
+ * 'host' specifies the host where the volume file specification for
  * the given volume resides. This can be either hostname, ipv4 address
  * or ipv6 address. ipv6 address needs to be within square brackets [ ].
- * If transport type is 'unix', then 'server' field should not be specified.
+ * If transport type is 'unix', then 'host' field should not be specified.
  * The 'socket' field needs to be populated with the path to unix domain
  * socket.
  *
@@ -91,9 +91,9 @@ static int parse_volume_options(GlusterConf *gconf, char 
*path)
  * default port. If the transport type is unix, then 'port' should not be
  * specified.
  *
- * 'volname' is the name of the gluster volume which contains the VM image.
+ * 'volume' is the name of the gluster volume which contains the VM image.
  *
- * 'image' is the path to the actual VM image that resides on gluster volume.
+ * 'path' is the path to the actual VM image that resides on gluster volume.
  *
  * Examples:
  *
@@ -102,7 +102,7 @@ static int parse_volume_options(GlusterConf *gconf, char 
*path)
  * file=gluster+tcp://1.2.3.4:24007/testvol/dir/a.img
  * file=gluster+tcp://[1:2:3:4:5:6:7:8]/testvol/dir/a.img
  * file=gluster+tcp://[1:2:3:4:5:6:7:8]:24007/testvol/dir/a.img
- * file=gluster+tcp://server.domain.com:24007/testvol/dir/a.img
+ * file=gluster+tcp://host.domain.com:24007/testvol/dir/a.img
  * file=gluster+unix:///testvol/dir/a.img?socket=/tmp/glusterd.socket
  * file=gluster+rdma://1.2.3.4:24007/testvol/a.img
  */
@@ -153,9 +153,9 @@ static int qemu_gluster_parseuri(GlusterConf *gconf, const 
char *filename)
 ret = -EINVAL;
 goto out;
 }
-gconf->server = g_strdup(qp->p[0].value);
+gconf->host = g_strdup(qp->p[0].value);
 } else {
-gconf->server = g_strdup(uri->server ? uri->server : "localhost");
+gconf->host = g_strdup(uri->server ? uri->server : "localhost");
 gconf->port = uri->port;
 }
 
@@ -176,18 +176,18 @@ static struct glfs *qemu_gluster_init(GlusterConf *gconf, 
const char *filename,
 
 ret = qemu_gluster_parseuri(gconf, filename);
 if (ret < 0) {
-error_setg(errp, "Usage: file=gluster[+transport]://[server[:port]]/"
-   "volname/image[?socket=...]");
+error_setg(errp, "Usage: file=gluster[+transport]://[host[:port]]/"
+   "volume/path[?socket=...]");
 errno = -ret;
 goto out;
 }
 
-glfs = glfs_new(gconf->volname);
+glfs = glfs_new(gconf->volume);
 if (!glfs) {
 goto out;
 }
 
-ret = glfs_set_volfile_server(glfs, gconf->transport, gconf->server,
+ret = glfs_set_volfile_server(glfs, gconf->transport, gconf->host,
 gconf->port);
 if (ret < 0) {
 goto out;
@@ -205,9 +205,9 @@ static struct glfs *qemu_gluster_

Re: [Qemu-devel] [PATCH 0/6] virtio: refactor host notifiers

2016-03-28 Thread Paolo Bonzini

On 28/03/2016 05:55, TU BO wrote:
> Hi Cornelia:
> 
> I got two crash with qemu master + "[PATCH 0/6] virtio: refactor host
> notifiers",

Hi Tu Bo,

please always include the assertion patch at
https://lists.gnu.org/archive/html/qemu-block/2016-03/msg00546.html in
your tests.  Can you include the backtrace from all threads with that patch?

Thanks,

Paolo

Re: [Qemu-devel] [PING][PATCH] Added the -m flag feature to stellaris boards

2016-03-28 Thread Aurelio Remonda

>>  #define NUM_IRQ_LINES 64
>> +#define LM3S811EVB_DEFAULT_DC0 0x1f00 /* Default value for dc0 
>> sram_size half */
>> +#define LM3S6965EVB_DEFAULT_DC0 0xff00 /* Default value for dc0 
>> sram_size half */
>
> These don't seem to be the same as the default values we had previously ?

I thought it will be easier to see in hexadecimal rather than decimal,
these are ram_size just like
user-given are.

>> +/* RAM size should be divided by 256 in order to get a valid 16 bits 
>> dc0 value */
>> +ram_size = (ram_size >> 8) - 1;
>> +
>> +if (ram_size > DC0_MAX_SRAM) {
>> +error_report("Requested RAM size is too big for this board. The 
>> maximum allowed is 16M.");
>> +exit(EXIT_FAILURE);
>> +}
>> +
>> +board->dc0 |= ram_size << DC0_SRAM_SHIFT;
>>  flash_size = (((board->dc0 & 0x) + 1) << 1) * 1024;
>>  sram_size = ((board->dc0 >> 18) + 1) * 1024;
>
> Do you know why your DC0_SRAM_SHIFT is 16 but this line which
> calculates sram_size from board->dc0 is doing a shift by 18 ?

DC0_SRAM_SHIFT will just place the ram_size, prevously calculated
based on the decimal ram_size
value, in the correct dc0 half. Then the sram_size will be calculated as always.

>> @@ -1391,6 +1405,7 @@ static void lm3s811evb_class_init(ObjectClass *oc, 
>> void *data)
>>
>>  mc->desc = "Stellaris LM3S811EVB";
>>  mc->init = lm3s811evb_init;
>> +mc->default_ram_size = LM3S811EVB_DEFAULT_DC0;
>
> The default_ram_size should be a size in bytes, not a DC0 value.

As I said before, I thought it was easier to see ff00 rather than
65280, or 1f00 instead of 7936.
These values are fixed as default values to match the dc0 default
value on each board.

In the case of LM3S811EVB the value is aligned up and you get a ram
size of 8192 but your
dc0 will still be 0x001f001f.

In the other board LM3S6965EVB the value is aligned up to a ram size
of 65536 but your dc0
will be the default 0x00ff007f.

Thank you.
-- 
Aurelio Remonda

Taller Technologies Argentina

Software Engineer

San Lorenzo 47, 3rd Floor, Office 5
Córdoba, Argentina
Phone: +54-351-4217888 / 4218211

Re: [Qemu-devel] [PATCH 4/5] tcg: reorder removal from lists in tb_phys_invalidate

2016-03-28 Thread Sergey Fedorov

On 17/03/16 16:46, sergey.fedo...@linaro.org wrote:
> First the translation block is invalidated, for which a simple write
> to tb->pc is enough.  This means that cpu-exec will not pick up anymore
> the block, though it may still execute it through chained jumps.  This
> also replaces the NULLing out of the pointer in the CPUs' local cache.

Although, using 'tb->pc' to mark a TB as invalid is probably not such a
good idea. There may be some cases when PC could become equal to -1. For
example, ARMv6-M uses PC >= 0xFFF0 to perform exception return. So
we'd better introduce a separate 'tb->valid' or 'tb->invalid' flag.

Kind regards,
Sergey

Re: [Qemu-devel] [REPOST v2] [PATCH] tpm: adapt sysfs cancel path for new TPM driver

2016-03-28 Thread Stefan Berger


Peter, Michael,

  I have 3 TPM related patches on the mailing list that are fixing 
buges. They are either being overlooked or ignored. How can we make 
progress with them?


Regards,
   Stefan



On 03/21/2016 10:39 AM, Stefan Berger wrote:

This patch addresses BZ 1281413.

Adapt the sysfs TPM command cancel path for the TPM driver that
does not use a miscdevice anymore since Linux 4.0. Support old
and new paths.

Signed-off-by: Stefan Berger 
---
  hw/tpm/tpm_passthrough.c | 23 +++
  1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/hw/tpm/tpm_passthrough.c b/hw/tpm/tpm_passthrough.c
index e98efb7..e1edb38 100644
--- a/hw/tpm/tpm_passthrough.c
+++ b/hw/tpm/tpm_passthrough.c
@@ -377,6 +377,8 @@ static int tpm_passthrough_open_sysfs_cancel(TPMBackend *tb)
  int fd = -1;
  char *dev;
  char path[PATH_MAX];
+const char *prefix[] = {"misc/", "tpm/"};
+int i;

  if (tb->cancel_path) {
  fd = qemu_open(tb->cancel_path, O_WRONLY);
@@ -390,16 +392,21 @@ static int tpm_passthrough_open_sysfs_cancel(TPMBackend 
*tb)
  dev = strrchr(tpm_pt->tpm_dev, '/');
  if (dev) {
  dev++;
-if (snprintf(path, sizeof(path), "/sys/class/misc/%s/device/cancel",
- dev) < sizeof(path)) {
-fd = qemu_open(path, O_WRONLY);
-if (fd >= 0) {
-tb->cancel_path = g_strdup(path);
-} else {
-error_report("tpm_passthrough: Could not open TPM cancel "
- "path %s : %s", path, strerror(errno));
+for (i = 0; i < ARRAY_SIZE(prefix); i++) {
+if (snprintf(path, sizeof(path),
+ "/sys/class/%s%s/device/cancel",
+ prefix[i], dev) < sizeof(path)) {
+fd = qemu_open(path, O_WRONLY);
+if (fd >= 0) {
+tb->cancel_path = g_strdup(path);
+break;
+}
  }
  }
+if (fd < 0) {
+error_report("tpm_passthrough: Could not open TPM cancel "
+ "path %s : %s", path, strerror(errno));
+}
  } else {
 error_report("tpm_passthrough: Bad TPM device path %s",
  tpm_pt->tpm_dev);

Re: [Qemu-devel] [PATCH] hw/display contains files named *_template.h. These are included many times with different values of the DEPTH macro. However, only the DEPTH == 32 case is used. Removed suppo

2016-03-28 Thread John Snow



On 03/27/2016 03:33 AM, Nutan Shinde wrote:
> Hi,
> 
> So, this patch won't be considered, because there is already a patch
> related to the same issue, right?
> 

You'd have to follow the other thread to see if the patch was accepted
or not. Even if it isn't, etiquette is generally that it'd be rude to
not allow them a chance to spin their own v2/revision.

So if your patch was sent later than theirs, it's likely this one won't
be considered, correct.

--js

> 
> On Wed, Mar 23, 2016 at 7:40 PM, John Snow  > wrote:
> 
> 
> 
> On 03/23/2016 01:30 AM, Nutan Shinde wrote:
> > Hi John,
> >
> > My patch is same as the patch mentioned in the link. I just picked up
> > the task from BitSizedTask page, as it was not crossed.
> >
> > I guess the issue is resolved then, what should I do in this case?
> Also,
> > please tell me, how do I know which tasks are already taken up?
> >
> > Regards,
> > Nutan Shinde.
> >
> 
> In some cases, since GSoC and Outreachy are approaching, a lot of people
> are trying to pick off the Bite Sizes Tasks right now. You may do well
> by checking
> https://lists.nongnu.org/archive/html/qemu-devel/2016-03/threads.html to
> make sure that nobody else has submitted a fix for the task you wish
> to fix.
> 
> In some cases, fixes may have been submitted recently, but aren't
> accepted into QEMU just yet. Searching the list archives will help you
> determine which is which.
> 
> Thanks,
> --js
> 
>

Re: [Qemu-devel] [PATCH 06/22] hbitmap: load/store

2016-03-28 Thread John Snow



On 03/23/2016 04:22 AM, Vladimir Sementsov-Ogievskiy wrote:
> On 23.03.2016 00:49, John Snow wrote:
>>
>> On 03/15/2016 04:04 PM, Vladimir Sementsov-Ogievskiy wrote:
>>> Add functions for load/store HBitmap to BDS, using clusters table:
>>> Last level of the bitmap is splitted into chunks of 'cluster_size'
>>> size. Each cell of the table contains offset in bds, to load/store
>>> corresponding chunk.
>>>
>>> Also,
>>>  0 in cell means all-zeroes-chunk (should not be saved)
>>>  1 in cell means all-ones-chunk (should not be saved)
>>>  hbitmap_prepare_store() fills table with
>>>0 for all-zeroes chunks
>>>1 for all-ones chunks
>>>2 for others
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> ---
>>>   block/dirty-bitmap.c |  23 +
>>>   include/block/dirty-bitmap.h |  11 +++
>>>   include/qemu/hbitmap.h   |  12 +++
>>>   util/hbitmap.c   | 209
>>> +++
>>>   4 files changed, 255 insertions(+)
>>>
>>> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
>>> index e68c177..816c6ee 100644
>>> --- a/block/dirty-bitmap.c
>>> +++ b/block/dirty-bitmap.c
>>> @@ -396,3 +396,26 @@ int64_t bdrv_get_dirty_count(BdrvDirtyBitmap
>>> *bitmap)
>>>   {
>>>   return hbitmap_count(bitmap->bitmap);
>>>   }
>>> +
>>> +int bdrv_dirty_bitmap_load(BdrvDirtyBitmap *bitmap, BlockDriverState
>>> *bs,
>>> +   const uint64_t *table, uint32_t table_size,
>>> +   uint32_t cluster_size)
>>> +{
>>> +return hbitmap_load(bitmap->bitmap, bs, table, table_size,
>>> cluster_size);
>>> +}
>>> +
>>> +int bdrv_dirty_bitmap_prepare_store(const BdrvDirtyBitmap *bitmap,
>>> +uint32_t cluster_size,
>>> +uint64_t *table,
>>> +uint32_t *table_size)
>>> +{
>>> +return hbitmap_prepare_store(bitmap->bitmap, cluster_size,
>>> + table, table_size);
>>> +}
>>> +
>>> +int bdrv_dirty_bitmap_store(const BdrvDirtyBitmap *bitmap,
>>> BlockDriverState *bs,
>>> +const uint64_t *table, uint32_t table_size,
>>> +uint32_t cluster_size)
>>> +{
>>> +return hbitmap_store(bitmap->bitmap, bs, table, table_size,
>>> cluster_size);
>>> +}
>>> diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
>>> index 27515af..20cb540 100644
>>> --- a/include/block/dirty-bitmap.h
>>> +++ b/include/block/dirty-bitmap.h
>>> @@ -43,4 +43,15 @@ void bdrv_set_dirty_iter(struct HBitmapIter *hbi,
>>> int64_t offset);
>>>   int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap);
>>>   void bdrv_dirty_bitmap_truncate(BlockDriverState *bs);
>>>   +int bdrv_dirty_bitmap_load(BdrvDirtyBitmap *bitmap,
>>> BlockDriverState *bs,
>>> +   const uint64_t *table, uint32_t table_size,
>>> +   uint32_t cluster_size);
>>> +int bdrv_dirty_bitmap_prepare_store(const BdrvDirtyBitmap *bitmap,
>>> +uint32_t cluster_size,
>>> +uint64_t *table,
>>> +uint32_t *table_size);
>>> +int bdrv_dirty_bitmap_store(const BdrvDirtyBitmap *bitmap,
>>> BlockDriverState *bs,
>>> +const uint64_t *table, uint32_t table_size,
>>> +uint32_t cluster_size);
>>> +
>>>   #endif
>>> diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
>>> index 6d1da4d..d83bb79 100644
>>> --- a/include/qemu/hbitmap.h
>>> +++ b/include/qemu/hbitmap.h
>>> @@ -241,5 +241,17 @@ static inline size_t
>>> hbitmap_iter_next_word(HBitmapIter *hbi, unsigned long *p_c
>>>   return hbi->pos;
>>>   }
>>>   +typedef struct BlockDriverState BlockDriverState;
>>> +
>>> +int hbitmap_load(HBitmap *bitmap, BlockDriverState *bs,
>>> + const uint64_t *table, uint32_t table_size,
>>> + uint32_t cluster_size);
>>> +
>>> +int hbitmap_prepare_store(const HBitmap *bitmap, uint32_t cluster_size,
>>> +  uint64_t *table, uint32_t *table_size);
>>> +
>>> +int hbitmap_store(HBitmap *bitmap, BlockDriverState *bs,
>>> +  const uint64_t *table, uint32_t table_size,
>>> +  uint32_t cluster_size);
>>> #endif
>>> diff --git a/util/hbitmap.c b/util/hbitmap.c
>>> index 28595fb..1960e4f 100644
>>> --- a/util/hbitmap.c
>>> +++ b/util/hbitmap.c
>>> @@ -15,6 +15,8 @@
>>>   #include "qemu/host-utils.h"
>>>   #include "trace.h"
>>>   +#include "block/block.h"
>>> +
>>>   /* HBitmaps provides an array of bits.  The bits are stored as
>>> usual in an
>>>* array of unsigned longs, but HBitmap is also optimized to
>>> provide fast
>>>* iteration over set bits; going from one bit to the next is
>>> O(logB n)
>>> @@ -499,3 +501,210 @@ char *hbitmap_md5(const HBitmap *bitmap)
>>>   const guchar *data

Re: [Qemu-devel] [PATCH v15 4/4] block/gluster: add support for multiple gluster servers

2016-03-28 Thread Jeff Cody

On Mon, Mar 28, 2016 at 10:59:12PM +0530, pkale...@redhat.com wrote:
> From: Prasanna Kumar Kalever 
> 
> This patch adds a way to specify multiple volfile servers to the gluster
> block backend of QEMU with tcp|rdma transport types and their port numbers.
> 
> Problem:
> 
> Currently VM Image on gluster volume is specified like this:
> 
> file=gluster[+tcp]://host[:port]/testvol/a.img
> 
> Assuming we have three hosts in trusted pool with replica 3 volume
> in action and unfortunately host (mentioned in the command above) went down
> for some reason, since the volume is replica 3 we now have other 2 hosts
> active from which we can boot the VM.
> 
> But currently there is no mechanism to pass the other 2 gluster host
> addresses to qemu.
> 
> Solution:
> 
> New way of specifying VM Image on gluster volume with volfile servers:
> (We still support old syntax to maintain backward compatibility)
> 
> Basic command line syntax looks like:
> 
> Pattern I:
>  -drive driver=gluster,
> volume=testvol,path=/path/a.raw,
> server.0.host=1.2.3.4,
>[server.0.port=24007,]
>[server.0.transport=tcp,]
> server.1.host=5.6.7.8,
>[server.1.port=24008,]
>[server.1.transport=rdma,]
> server.2.host=/var/run/glusterd.socket,
> server.2.transport=unix ...
> 
> Pattern II:
>  'json:{"driver":"qcow2","file":{"driver":"gluster",
>"volume":"testvol","path":"/path/a.qcow2",
>"server":[{tuple0},{tuple1}, ...{tupleN}]}}'
> 
>driver  => 'gluster' (protocol name)
>volume  => name of gluster volume where our VM image resides
>path=> absolute path of image in gluster volume
> 
>   {tuple}  => {"host":"1.2.3.4"[,"port":"24007","transport":"tcp"]}
> 
>host=> host address (hostname/ipv4/ipv6 addresses/socket path)
>port=> port number on which glusterd is listening. (default 24007)
>transport   => transport type used to connect to gluster management daemon,
>it can be tcp|rdma|unix (default 'tcp')
> 
> Examples:
> 1.
>  -drive driver=qcow2,file.driver=gluster,
> file.volume=testvol,file.path=/path/a.qcow2,
> file.server.0.host=1.2.3.4,
> file.server.0.port=24007,
> file.server.0.transport=tcp,
> file.server.1.host=5.6.7.8,
> file.server.1.port=24008,
> file.server.1.transport=rdma,
> file.server.2.host=/var/run/glusterd.socket
> file.server.1.transport=unix
> 2.
>  'json:{"driver":"qcow2","file":{"driver":"gluster","volume":"testvol",
>  "path":"/path/a.qcow2","server":
>  [{"host":"1.2.3.4","port":"24007","transport":"tcp"},
>   {"host":"4.5.6.7","port":"24008","transport":"rdma"},
>   {"host":"/var/run/glusterd.socket","transport":"unix"}] } }'
> 
> This patch gives a mechanism to provide all the server addresses, which are in
> replica set, so in case host1 is down VM can still boot from any of the
> active hosts.
> 
> This is equivalent to the backup-volfile-servers option supported by
> mount.glusterfs (FUSE way of mounting gluster volume)
> 
> Credits: Sincere thanks to Kevin Wolf  and
> "Deepak C Shetty"  for inputs and all their support
> 
> Signed-off-by: Prasanna Kumar Kalever 
> ---
>  block/gluster.c  | 298 
> ---
>  qapi/block-core.json |   4 +-
>  2 files changed, 261 insertions(+), 41 deletions(-)
> 
> diff --git a/block/gluster.c b/block/gluster.c
> index 9b360e3..ad3e683 100644
> --- a/block/gluster.c
> +++ b/block/gluster.c
> @@ -11,8 +11,16 @@
>  #include 
>  #include "block/block_int.h"
>  #include "qemu/uri.h"
> +#include "qemu/error-report.h"
>  
>  #define GLUSTER_OPT_FILENAME"filename"
> +#define GLUSTER_OPT_VOLUME  "volume"
> +#define GLUSTER_OPT_PATH"path"
> +#define GLUSTER_OPT_HOST"host"
> +#define GLUSTER_OPT_PORT"port"
> +#define GLUSTER_OPT_TRANSPORT   "transport"
> +#define GLUSTER_OPT_SERVER_PATTERN  "server."
> +
>  #define GLUSTER_DEFAULT_PORT24007
>  
>  
> @@ -65,6 +73,46 @@ static QemuOptsList runtime_opts = {
>  },
>  };
>  
> +static QemuOptsList runtime_json_opts = {
> +.name = "gluster_json",
> +.head = QTAILQ_HEAD_INITIALIZER(runtime_json_opts.head),
> +.desc = {
> +{
> +.name = GLUSTER_OPT_VOLUME,
> +.type = QEMU_OPT_STRING,
> +.help = "name of gluster volume where VM image resides",
> +},
> +{
> +.name = GLUSTER_OPT_PATH,
> +.type = QEMU_OPT_STRING,
> +.help = "absolute path to image file in gluster volume",
> +},
> +{ /* end of list */ }
> +},
> +};
> +
> +static QemuOptsList runtime_tuple_opts = {
> +.name = "gluster_tuple",
> +.head = QTAILQ_HEAD_INITIALIZER(runtime_tuple_opts.head),
> +.desc = {
> +{
> +.name = GLUSTER_OPT_HOST,
> +.type = QEM

Re: [Qemu-devel] [PATCH 4/5] tcg: reorder removal from lists in tb_phys_invalidate

2016-03-28 Thread Paolo Bonzini



On 28/03/2016 20:42, Sergey Fedorov wrote:
> On 17/03/16 16:46, sergey.fedo...@linaro.org wrote:
>> First the translation block is invalidated, for which a simple write
>> to tb->pc is enough.  This means that cpu-exec will not pick up anymore
>> the block, though it may still execute it through chained jumps.  This
>> also replaces the NULLing out of the pointer in the CPUs' local cache.
> 
> Although, using 'tb->pc' to mark a TB as invalid is probably not such a
> good idea. There may be some cases when PC could become equal to -1. For
> example, ARMv6-M uses PC >= 0xFFF0 to perform exception return. So
> we'd better introduce a separate 'tb->valid' or 'tb->invalid' flag.

It is also possible to use tb->flags for that.  I suspect that all-ones
tb flags is never valid, but it could also be a #define.

Paolo

Re: [Qemu-devel] [PATCH 4/5] tcg: reorder removal from lists in tb_phys_invalidate

2016-03-28 Thread Paolo Bonzini

On 28/03/2016 17:18, Sergey Fedorov wrote:
> The use pattern of 'tb_invalidated_flag' is a bit intricate; correct me,
> if I'm wrong about the following. Basically, 'tb_invalidated_flag' was
> meant to catch two events:
>  * some TB has been invalidated by tb_phys_invalidate();

This is patch 4.

>  * the whole translation buffer has been flushed by tb_flush().

This is patch 5.

> Then it is checked to ensure:
>  * the last executed TB can be safely patched to directly call the next
>one in cpu_exec();
>  * the original TB should be provided for further possible invalidation
>along with the temporarily generated TB when in cpu_exec_nocache().
>
> [...] I would suggest the following solution:
>  (1) Use 'tb->pc' as an indicator of whether TB is valid; check for it
>  in cpu_exec() when deciding on whether to patch the last executed
>  TB or not
>  (2) Use 'tcg_ctx.tb_ctx.tb_flush_count' to check for translation buffer
>  flushes; capture it before calling tb_gen_code() and compare to it
>  afterwards to check if tb_flush() has been called in between

Of course that would work, but it would be slower.  I think it is
unnecessary for two reasons:

1) There are two calls to cpu_exec_nocache.  One exits immediately with
"break;", the other always sets "next_tb = 0;".  Therefore it is safe in
both cases for cpu_exec_nocache to hijack cpu->tb_invalidated_flag.

2) if it were broken, it would _also_ be broken before these patches
because cpu_exec_nocache always runs with tb_lock taken.  So I think
documenting the assumptions is better than changing them at the same
time as doing other changes.

Your observation that tb->pc==-1 is not necessarily safe still holds of
course.  Probably the best thing is an inline that can do one of:

1) set cs_base to an invalid value (anything nonzero is enough except on
x86 and SPARC; SPARC can use all-ones)

2) sets the flags to an invalid combination (x86 can use all ones)

3) sets the PC to an invalid value (no one really needs it)

Paolo

Re: [Qemu-devel] [PATCH 1/2] softfloat: Enable run-time-configurable meaning of signaling NaN bit

2016-03-28 Thread Richard Henderson


On 03/25/2016 05:50 AM, Aleksandar Markovic wrote:

+float16 float16_default_nan(float_status *status) {


{ on the next line.


+return const_float64(LIT64( 0xFFF8 ));


Let's please fix the horrible formatting in this file as we touch the lines, 
please.



-#define floatx80_default_nan_high 0x7FFF
-#define floatx80_default_nan_low  LIT64(0xBFFF)
-#else
-#define floatx80_default_nan_high 0x
-#define floatx80_default_nan_low  LIT64( 0xC000 )
-#endif
+uint16_t floatx80_default_nan_high(float_status *status) {
+uint64_t floatx80_default_nan_low(float_status *status) {


Why do you need two separate functions for this?


+floatx80 floatx80_default_nan(float_status *status) {


Seems to me this one is good enough, and indeed preferable.


-#define float128_default_nan_high LIT64(0x7FFF7FFF)
-#define float128_default_nan_low  LIT64(0x)

...

+float128 float128_default_nan(float_status *status) {


Likewise.



diff --git a/target-alpha/cpu.c b/target-alpha/cpu.c
index 2eab060..1714387 100644
--- a/target-alpha/cpu.c
+++ b/target-alpha/cpu.c
@@ -277,6 +277,8 @@ static void alpha_cpu_initfn(Object *obj)
  #endif
  env->lock_addr = -1;
  env->fen = 1;
+
+set_snan_bit_is_one(0, &env->fp_status);


We've just done a memset of (most of) the entire cpu struct.  We don't need to 
re-initialize this field to zero here.  Same with most of the other cpus.



r~

Re: [Qemu-devel] [PATCH 2/2] target-mips: Implement IEEE 754-2008 functionality for R6 and MSA instructions

2016-03-28 Thread Richard Henderson


On 03/25/2016 05:50 AM, Aleksandar Markovic wrote:

@@ -2621,9 +2621,23 @@ uint64_t helper_float_cvtl_d(CPUMIPSState *env, uint64_t 
fdt0)
  uint64_t dt2;

  dt2 = float64_to_int64(fdt0, &env->active_fpu.fp_status);
-if (get_float_exception_flags(&env->active_fpu.fp_status)
-& (float_flag_invalid | float_flag_overflow)) {
-dt2 = FP_TO_INT64_OVERFLOW;
+if (env->active_fpu.fcr31 & (1 << FCR31_NAN2008)) {
+if (get_float_exception_flags(&env->active_fpu.fp_status)
+& (float_flag_invalid | float_flag_overflow)) {
+if (float64_is_any_nan(fdt0)) {
+dt2 = 0;
+} else {
+if (float64_is_neg(fdt0))
+dt2 = INT64_MIN;
+else
+dt2 = INT64_MAX;
+}
+}
+} else {
+if (get_float_exception_flags(&env->active_fpu.fp_status)
+& (float_flag_invalid | float_flag_overflow)) {
+dt2 = FP_TO_INT64_OVERFLOW;
+}


Better to swap the tests here, so that you test the exception flags first (and 
once).  That is the exceptional condition, the one that will be true least 
often.  After that, FCR31_NAN2008 will be tested only when needed.


But also, this pattern is replicated so many times you'd do well to pull this 
sequence out to helper functions (one for s, one for d).



+uint64_t helper_float_abs_d(CPUMIPSState *env, uint64_t fdt0)
+{
+uint64_t fdt1;
+
+if (env->active_fpu.fcr31 & (1 << FCR31_ABS2008)) {
+fdt1 = float64_abs(fdt0);
+} else {
+if (float64_is_neg(fdt0)) {
+fdt1 = float64_sub(0, fdt0, &env->active_fpu.fp_status);
+} else {
+fdt1 = float64_add(0, fdt0, &env->active_fpu.fp_status);
+}
+update_fcr31(env, GETPC());


Here you're better off using two separate helper functions, and chose the 
correct one during translation.  Indeed, since the 2008 version is a simple 
bit-flip, you needn't actually have a helper; just expand the sequence inline.



r~

Re: [Qemu-devel] [PATCH 1/8] tcg: Clean up direct block chaining data fields

2016-03-28 Thread Richard Henderson


On 03/24/2016 08:11 AM, Paolo Bonzini wrote:

There is also a case where a TB jumps to itself; it then appears twice
in the list with different values in the low bits, such as this:

 tb->jmp_list_first = tb | 0;
  .'   |
  |.---'
 tb->jmp_list_next[0] = tb | 2;


Of course, it begs the question of why TB would be in its own list, even if it 
does jump to itself.  We only need the points-to list in order to invalidate a 
TB and unlink it.  But if TB is being invalidated, we don't need to reset the 
jump within TB itself.



r~

Re: [Qemu-devel] [PATCH] linux-user/signal.c: Generate opcode data for restorer in setup_rt_frame

2016-03-28 Thread Laurent Vivier



Le 27/03/2016 11:44, Chen Gang a écrit :
> Hello All:
> 
> Please help check this patch when you have time.
> 
> After this patch, we can let gcc testsuite cleanup-10 run successfully.
> 
> Next, I shall continue to implement floating point instructions: remove
> (u)int64_to_float64 from fdouble implementation.
> 
> 
> Thanks.
> 
> On 3/15/16 05:51, cheng...@emindsoft.com.cn wrote:
>> From: Chen Gang 
>>
>> Original implementation uses do_rt_sigreturn directly in host space,
>> when a guest program is in unwind procedure in guest space, it will get
>> an incorrect restore address, then causes unwind failure.
>>
>> Also cleanup the original incorrect indentation.
>>
>> Signed-off-by: Chen Gang 
>> ---
>>  linux-user/signal.c | 12 ++--
>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/linux-user/signal.c b/linux-user/signal.c
>> index 919aa83..0e3b1c6 100644
>> --- a/linux-user/signal.c
>> +++ b/linux-user/signal.c
>> @@ -5566,8 +5566,13 @@ struct target_rt_sigframe {
>>  unsigned char save_area[16]; /* caller save area */
>>  struct target_siginfo info;
>>  struct target_ucontext uc;
>> +abi_ulong retcode[2];
>>  };
>>  
>> +#define INSN_MOVELI_R10_139  0x00045fe551483000ULL /* { moveli r10, 139 } */
>> +#define INSN_SWINT1  0x286b180051485000ULL /* { swint1 } */
>> +
>> +
>>  static void setup_sigcontext(struct target_sigcontext *sc,
>>   CPUArchState *env, int signo)
>>  {
>> @@ -5643,9 +5648,12 @@ static void setup_rt_frame(int sig, struct 
>> target_sigaction *ka,
>>  __put_user(target_sigaltstack_used.ss_size, 
>> &frame->uc.tuc_stack.ss_size);
>>  setup_sigcontext(&frame->uc.tuc_mcontext, env, info->si_signo);
>>  
>> -restorer = (unsigned long) do_rt_sigreturn;
>>  if (ka->sa_flags & TARGET_SA_RESTORER) {
>> -restorer = (unsigned long) ka->sa_restorer;
>> +restorer = (unsigned long) ka->sa_restorer;
>> +} else {
>> +__put_user(INSN_MOVELI_R10_139, &frame->retcode[0]);
>> +__put_user(INSN_SWINT1, &frame->retcode[1]);
>> +restorer = (unsigned long)frame->retcode;

The address of retcode in host and guest can differ.
You need something like:

restorer = (unsigned long)(frame_addr + offsetof(struct
target_rt_sigframe, retcode));

I've experienced this on sh4 (see commit 2a0fa68)

>>  }
>>  env->pc = (unsigned long) ka->_sa_handler;
>>  env->regs[TILEGX_R_SP] = (unsigned long) frame;
>>
> 

Laurent

[Qemu-devel] [PULL] vfio: convert to 128 bit arithmetic calculations when adding mem regions

2016-03-28 Thread Alex Williamson

From: Bandan Das 

vfio_listener_region_add for a iommu mr results in
an overflow assert since iommu memory region is initialized
with UINT64_MAX. Convert calculations to 128 bit arithmetic
for iommu memory regions and let int128_get64 assert for non iommu
regions if there's an overflow.

Suggested-by: Alex Williamson 
Signed-off-by: Bandan Das 
[missed (end - 1) on 2nd trace call, move llsize closer to use]
Signed-off-by: Alex Williamson 
---
 hw/vfio/common.c |   19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index fb588d8..f27db36 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -323,7 +323,7 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 {
 VFIOContainer *container = container_of(listener, VFIOContainer, listener);
 hwaddr iova, end;
-Int128 llend;
+Int128 llend, llsize;
 void *vaddr;
 int ret;
 
@@ -349,12 +349,12 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 if (int128_ge(int128_make64(iova), llend)) {
 return;
 }
-end = int128_get64(llend);
+end = int128_get64(int128_sub(llend, int128_one()));
 
-if ((iova < container->min_iova) || ((end - 1) > container->max_iova)) {
+if ((iova < container->min_iova) || (end > container->max_iova)) {
 error_report("vfio: IOMMU container %p can't map guest IOVA region"
  " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx,
- container, iova, end - 1);
+ container, iova, end);
 ret = -EFAULT;
 goto fail;
 }
@@ -364,7 +364,7 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 if (memory_region_is_iommu(section->mr)) {
 VFIOGuestIOMMU *giommu;
 
-trace_vfio_listener_region_add_iommu(iova, end - 1);
+trace_vfio_listener_region_add_iommu(iova, end);
 /*
  * FIXME: We should do some checking to see if the
  * capabilities of the host VFIO IOMMU are adequate to model
@@ -395,13 +395,16 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 section->offset_within_region +
 (iova - section->offset_within_address_space);
 
-trace_vfio_listener_region_add_ram(iova, end - 1, vaddr);
+trace_vfio_listener_region_add_ram(iova, end, vaddr);
 
-ret = vfio_dma_map(container, iova, end - iova, vaddr, section->readonly);
+llsize = int128_sub(llend, int128_make64(iova));
+
+ret = vfio_dma_map(container, iova, int128_get64(llsize),
+   vaddr, section->readonly);
 if (ret) {
 error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
  "0x%"HWADDR_PRIx", %p) = %d (%m)",
- container, iova, end - iova, vaddr, ret);
+ container, iova, int128_get64(llsize), vaddr, ret);
 goto fail;
 }

[Qemu-devel] [PULL] VFIO updates 2016-03-28

2016-03-28 Thread Alex Williamson

The following changes since commit b68a80139e37e806f004237e55311ebc42151434:

  Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20160324' into 
staging (2016-03-24 16:24:02 +)

are available in the git repository at:


  git://github.com/awilliam/qemu-vfio.git tags/vfio-update-20160328.0

for you to fetch changes up to 55efcc537d330f1659d3fa171a66eebc9327e91b:

  vfio: convert to 128 bit arithmetic calculations when adding mem regions 
(2016-03-28 13:27:49 -0600)


VFIO updates 2016-03-28

 - Use 128bit math to avoid asserts with IOMMU regions (Bandan Das)


Bandan Das (1):
  vfio: convert to 128 bit arithmetic calculations when adding mem regions

 hw/vfio/common.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

Re: [Qemu-devel] [PATCH] linux-user/signal.c: Generate opcode data for restorer in setup_rt_frame

2016-03-28 Thread Chen Gang

On 3/29/16 06:17, Laurent Vivier wrote:
>> On 3/15/16 05:51, cheng...@emindsoft.com.cn wrote:
>>>
>>> Original implementation uses do_rt_sigreturn directly in host space,
>>> when a guest program is in unwind procedure in guest space, it will get
>>> an incorrect restore address, then causes unwind failure.
>>>
>>> Also cleanup the original incorrect indentation.
>>>
>>> Signed-off-by: Chen Gang 
>>> ---
>>>  linux-user/signal.c | 12 ++--
>>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/linux-user/signal.c b/linux-user/signal.c
>>> index 919aa83..0e3b1c6 100644
>>> --- a/linux-user/signal.c
>>> +++ b/linux-user/signal.c
>>> @@ -5566,8 +5566,13 @@ struct target_rt_sigframe {
>>>  unsigned char save_area[16]; /* caller save area */
>>>  struct target_siginfo info;
>>>  struct target_ucontext uc;
>>> +abi_ulong retcode[2];
>>>  };
>>>  
>>> +#define INSN_MOVELI_R10_139  0x00045fe551483000ULL /* { moveli r10, 139 } 
>>> */
>>> +#define INSN_SWINT1  0x286b180051485000ULL /* { swint1 } */
>>> +
>>> +
>>>  static void setup_sigcontext(struct target_sigcontext *sc,
>>>   CPUArchState *env, int signo)
>>>  {
>>> @@ -5643,9 +5648,12 @@ static void setup_rt_frame(int sig, struct 
>>> target_sigaction *ka,
>>>  __put_user(target_sigaltstack_used.ss_size, 
>>> &frame->uc.tuc_stack.ss_size);
>>>  setup_sigcontext(&frame->uc.tuc_mcontext, env, info->si_signo);
>>>  
>>> -restorer = (unsigned long) do_rt_sigreturn;
>>>  if (ka->sa_flags & TARGET_SA_RESTORER) {
>>> -restorer = (unsigned long) ka->sa_restorer;
>>> +restorer = (unsigned long) ka->sa_restorer;
>>> +} else {
>>> +__put_user(INSN_MOVELI_R10_139, &frame->retcode[0]);
>>> +__put_user(INSN_SWINT1, &frame->retcode[1]);
>>> +restorer = (unsigned long)frame->retcode;
> 
> The address of retcode in host and guest can differ.
> You need something like:
> 
>   restorer = (unsigned long)(frame_addr + offsetof(struct
> target_rt_sigframe, retcode));
> 
> I've experienced this on sh4 (see commit 2a0fa68)
> 

OK, thanks. What you said above sounds reasonable to me. :-)

I shall send patch v2 for it (although tilegx is a pure 64-bit target,
with this patch, I guess, tilegx target should still run correctly under
32-bit host).

By the way, it looks that s390x and microblaze targets also have the
same issue.


Thanks.
-- 
Chen Gang (陈刚)

Managing Natural Environments is the Duty of Human Beings.

[Qemu-devel] [PULL 1/5] slirp: Fix memory leak on small incoming ipv4 packet

2016-03-28 Thread Samuel Thibault

Signed-off-by: Samuel Thibault 
Reviewed-by: Thomas Huth 
---
 slirp/ip_input.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/slirp/ip_input.c b/slirp/ip_input.c
index 12f173d..b464f6b 100644
--- a/slirp/ip_input.c
+++ b/slirp/ip_input.c
@@ -85,7 +85,7 @@ ip_input(struct mbuf *m)
DEBUG_ARG("m_len = %d", m->m_len);
 
if (m->m_len < sizeof (struct ip)) {
-   return;
+   goto bad;
}
 
ip = mtod(m, struct ip *);
-- 
2.8.0.rc3

[Qemu-devel] [PULL 4/5] Use C99 flexible array instead of 1-byte trailing array

2016-03-28 Thread Samuel Thibault

From: Peter Maydell 

Signed-off-by: Peter Maydell 
Signed-off-by: Samuel Thibault 
---
 slirp/mbuf.h | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/slirp/mbuf.h b/slirp/mbuf.h
index 38fedf4..36fb814 100644
--- a/slirp/mbuf.h
+++ b/slirp/mbuf.h
@@ -81,11 +81,9 @@ struct mbuf {
Slirp *slirp;
boolresolution_requested;
uint64_t expiration_date;
+   char   *m_ext;
/* start of dynamic buffer area, must be last element */
-   union {
-   charm_dat[1]; /* ANSI don't like 0 sized arrays */
-   char*m_ext;
-   };
+   charm_dat[];
 };
 
 #define ifq_prev m_prev
-- 
2.8.0.rc3

[Qemu-devel] [PULL 5/5] Rework ipv6 options

2016-03-28 Thread Samuel Thibault

Rename the recently-added ip6-foo options into ipv6-foo options, to make
them coherent with other ipv6 options.

Also rework the documentation.

Signed-off-by: Samuel Thibault 
Reviewed-by: Eric Blake 
---
 net/net.c| 16 
 net/slirp.c  |  6 +++---
 qapi-schema.json | 25 -
 qemu-options.hx  | 18 ++
 4 files changed, 37 insertions(+), 28 deletions(-)

diff --git a/net/net.c b/net/net.c
index 1a78edf..60ad9e6 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1051,33 +1051,33 @@ int net_client_init(QemuOpts *opts, int is_netdev, 
Error **errp)
 Visitor *v = opts_get_visitor(ov);
 
 {
-/* Parse convenience option format ip6-net=fec0::0[/64] */
-const char *ip6_net = qemu_opt_get(opts, "ip6-net");
+/* Parse convenience option format ipv6-net=fec0::0[/64] */
+const char *ip6_net = qemu_opt_get(opts, "ipv6-net");
 
 if (ip6_net) {
 char buf[strlen(ip6_net) + 1];
 
 if (get_str_sep(buf, sizeof(buf), &ip6_net, '/') < 0) {
 /* Default 64bit prefix length.  */
-qemu_opt_set(opts, "ip6-prefix", ip6_net, &error_abort);
-qemu_opt_set_number(opts, "ip6-prefixlen", 64, &error_abort);
+qemu_opt_set(opts, "ipv6-prefix", ip6_net, &error_abort);
+qemu_opt_set_number(opts, "ipv6-prefixlen", 64, &error_abort);
 } else {
 /* User-specified prefix length.  */
 unsigned long len;
 int err;
 
-qemu_opt_set(opts, "ip6-prefix", buf, &error_abort);
+qemu_opt_set(opts, "ipv6-prefix", buf, &error_abort);
 err = qemu_strtoul(ip6_net, NULL, 10, &len);
 
 if (err) {
 error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
-  "ip6-prefix", "a number");
+  "ipv6-prefix", "a number");
 } else {
-qemu_opt_set_number(opts, "ip6-prefixlen", len,
+qemu_opt_set_number(opts, "ipv6-prefixlen", len,
 &error_abort);
 }
 }
-qemu_opt_unset(opts, "ip6-net");
+qemu_opt_unset(opts, "ipv6-net");
 }
 }
 
diff --git a/net/slirp.c b/net/slirp.c
index 95239bc..0013c27 100644
--- a/net/slirp.c
+++ b/net/slirp.c
@@ -828,10 +828,10 @@ int net_init_slirp(const NetClientOptions *opts, const 
char *name,
 net_init_slirp_configs(user->guestfwd, 0);
 
 ret = net_slirp_init(peer, "user", name, user->q_restrict, vnet,
- user->host, user->ip6_prefix, user->ip6_prefixlen,
- user->ip6_host, user->hostname, user->tftp,
+ user->host, user->ipv6_prefix, user->ipv6_prefixlen,
+ user->ipv6_host, user->hostname, user->tftp,
  user->bootfile, user->dhcpstart,
- user->dns, user->ip6_dns, user->smb,
+ user->dns, user->ipv6_dns, user->smb,
  user->smbserver, dnssearch);
 
 while (slirp_configs) {
diff --git a/qapi-schema.json b/qapi-schema.json
index 7f8d799..8907790 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -2427,7 +2427,10 @@
 #
 # @ip: #optional legacy parameter, use net= instead
 #
-# @net: #optional IP address and optional netmask
+# @net: #optional IP network address that the guest will see, in the
+#   form addr[/netmask] The netmask is optional, and can be
+#   either in the form a.b.c.d or as a number of valid top-most
+#   bits. Default is 10.0.2.0/24.
 #
 # @host: #optional guest-visible address of the host
 #
@@ -2443,13 +2446,17 @@
 # @dnssearch: #optional list of DNS suffixes to search, passed as DHCP option
 # to the guest
 #
-# @ip6-prefix: #optional IPv6 network prefix (default is fec0::) (since 2.6)
+# @ipv6-prefix: #optional IPv6 network prefix (default is fec0::) (since
+#   2.6). The network prefix is given in the usual
+#   hexadecimal IPv6 address notation.
 #
-# @ip6-prefixlen: #optional IPv6 network prefix length (default is 64) (since 
2.6)
+# @ipv6-prefixlen: #optional IPv6 network prefix length (default is 64)
+#  (since 2.6)
 #
-# @ip6-host: #optional guest-visible IPv6 address of the host (since 2.6)
+# @ipv6-host: #optional guest-visible IPv6 address of the host (since 2.6)
 #
-# @ip6-dns: #optional guest-visible IPv6 address of the virtual nameserver 
(since 2.6)
+# @ipv6-dns: #optional guest-visible IPv6 address of the virtual
+#nameserver (since 2.6)
 #
 # @smb: #optional root directory of the built-in SMB server
 #
@@ -2474,10 +2481,10 @@
 '*dhcpstart': 'str',
 '*dns':   'str',
 '*dnssearch': ['String'],
-'*ip6-prefix':  'str',
-'*ip6-prefixlen':   'int',
-'*ip6-host':'

[Qemu-devel] [PULL 0/5] slirp updates

2016-03-28 Thread Samuel Thibault

  Merge remote-tracking branch 'remotes/mdroth/tags/qga-pull-2016-03-21-tag' 
into staging (2016-03-22 17:39:48 +)

are available in the git repository at:

  http://people.debian.org/~sthibault/qemu.git tags/samuel-thibault

for you to fetch changes up to d8eb38649587c58d767c3bc6a1075bfeabda9e8a:

  Rework ipv6 options (2016-03-29 01:15:43 +0200)


slirp updates


Peter Maydell (1):
  Use C99 flexible array instead of 1-byte trailing array

Samuel Thibault (4):
  slirp: Fix memory leak on small incoming ipv4 packet
  slirp: send icmp6 errors when UDP send failed
  Avoid embedding struct mbuf in other structures
  Rework ipv6 options

 net/slirp.c  |  6 +++---
 qapi-schema.json | 25 -
 qemu-options.hx  | 18 ++
 slirp/if.c   | 27 ++-
 slirp/ip_input.c |  2 +-
 slirp/mbuf.c | 19 ++-
 slirp/mbuf.h |  6 ++
 slirp/misc.c |  5 -
 slirp/misc.h |  5 +
 slirp/slirp.h|  8 +---
 slirp/udp6.c |  3 +--
 11 files changed, 67 insertions(+), 57 deletions(-)

[Qemu-devel] [PULL 2/5] slirp: send icmp6 errors when UDP send failed

2016-03-28 Thread Samuel Thibault

Signed-off-by: Samuel Thibault 
Reviewed-by: Thomas Huth 
---
 slirp/udp6.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/slirp/udp6.c b/slirp/udp6.c
index 60a91c9..a23026f 100644
--- a/slirp/udp6.c
+++ b/slirp/udp6.c
@@ -113,8 +113,7 @@ void udp6_input(struct mbuf *m)
 m->m_data -= iphlen;
 *ip = save_ip;
 DEBUG_MISC((dfd, "udp tx errno = %d-%s\n", errno, strerror(errno)));
-/* TODO: ICMPv6 error */
-/*icmp_error(m, ICMP_UNREACH,ICMP_UNREACH_NET, 0,strerror(errno));*/
+icmp6_send_error(m, ICMP6_UNREACH, ICMP6_UNREACH_NO_ROUTE);
 goto bad;
 }
 
-- 
2.8.0.rc3

[Qemu-devel] [PULL 3/5] Avoid embedding struct mbuf in other structures

2016-03-28 Thread Samuel Thibault

struct mbuf uses a C99 open char array to allow inlining data. Inlining
this in another structure is however a GNU extension. The inlines used
so far in struct Slirp were actually only needed as head of struct
mbuf lists. This replaces these inline with mere struct quehead,
and use casts as appropriate.

Signed-off-by: Samuel Thibault 
Reviewed-by: Peter Maydell 
---
 slirp/if.c| 27 ++-
 slirp/mbuf.c  | 19 ++-
 slirp/misc.c  |  5 -
 slirp/misc.h  |  5 +
 slirp/slirp.h |  8 +---
 5 files changed, 34 insertions(+), 30 deletions(-)

diff --git a/slirp/if.c b/slirp/if.c
index 2e21f43..9b02180 100644
--- a/slirp/if.c
+++ b/slirp/if.c
@@ -28,9 +28,9 @@ ifs_remque(struct mbuf *ifm)
 void
 if_init(Slirp *slirp)
 {
-slirp->if_fastq.ifq_next = slirp->if_fastq.ifq_prev = &slirp->if_fastq;
-slirp->if_batchq.ifq_next = slirp->if_batchq.ifq_prev = &slirp->if_batchq;
-slirp->next_m = &slirp->if_batchq;
+slirp->if_fastq.qh_link = slirp->if_fastq.qh_rlink = &slirp->if_fastq;
+slirp->if_batchq.qh_link = slirp->if_batchq.qh_rlink = &slirp->if_batchq;
+slirp->next_m = (struct mbuf *) &slirp->if_batchq;
 }
 
 /*
@@ -74,7 +74,8 @@ if_output(struct socket *so, struct mbuf *ifm)
 * We mustn't put this packet back on the fastq (or we'll send it out 
of order)
 * XXX add cache here?
 */
-   for (ifq = slirp->if_batchq.ifq_prev; ifq != &slirp->if_batchq;
+   for (ifq = (struct mbuf *) slirp->if_batchq.qh_rlink;
+(struct quehead *) ifq != &slirp->if_batchq;
 ifq = ifq->ifq_prev) {
if (so == ifq->ifq_so) {
/* A match! */
@@ -86,7 +87,7 @@ if_output(struct socket *so, struct mbuf *ifm)
 
/* No match, check which queue to put it on */
if (so && (so->so_iptos & IPTOS_LOWDELAY)) {
-   ifq = slirp->if_fastq.ifq_prev;
+   ifq = (struct mbuf *) slirp->if_fastq.qh_rlink;
on_fastq = 1;
/*
 * Check if this packet is a part of the last
@@ -98,9 +99,9 @@ if_output(struct socket *so, struct mbuf *ifm)
goto diddit;
}
 } else {
-   ifq = slirp->if_batchq.ifq_prev;
+   ifq = (struct mbuf *) slirp->if_batchq.qh_rlink;
 /* Set next_m if the queue was empty so far */
-if (slirp->next_m == &slirp->if_batchq) {
+if ((struct quehead *) slirp->next_m == &slirp->if_batchq) {
 slirp->next_m = ifm;
 }
 }
@@ -166,10 +167,10 @@ void if_start(Slirp *slirp)
 }
 slirp->if_start_busy = true;
 
-if (slirp->if_fastq.ifq_next != &slirp->if_fastq) {
-ifm_next = slirp->if_fastq.ifq_next;
+if (slirp->if_fastq.qh_link != &slirp->if_fastq) {
+ifm_next = (struct mbuf *) slirp->if_fastq.qh_link;
 next_from_batchq = false;
-} else if (slirp->next_m != &slirp->if_batchq) {
+} else if ((struct quehead *) slirp->next_m != &slirp->if_batchq) {
 /* Nothing on fastq, pick up from batchq via next_m */
 ifm_next = slirp->next_m;
 next_from_batchq = true;
@@ -182,12 +183,12 @@ void if_start(Slirp *slirp)
 from_batchq = next_from_batchq;
 
 ifm_next = ifm->ifq_next;
-if (ifm_next == &slirp->if_fastq) {
+if ((struct quehead *) ifm_next == &slirp->if_fastq) {
 /* No more packets in fastq, switch to batchq */
 ifm_next = slirp->next_m;
 next_from_batchq = true;
 }
-if (ifm_next == &slirp->if_batchq) {
+if ((struct quehead *) ifm_next == &slirp->if_batchq) {
 /* end of batchq */
 ifm_next = NULL;
 }
@@ -218,7 +219,7 @@ void if_start(Slirp *slirp)
 /* Next packet in fastq is from the same session */
 ifm_next = next;
 next_from_batchq = false;
-} else if (slirp->next_m == &slirp->if_batchq) {
+} else if ((struct quehead *) slirp->next_m == &slirp->if_batchq) {
 /* Set next_m and ifm_next if the session packet is now the
  * only one on batchq */
 slirp->next_m = ifm_next = next;
diff --git a/slirp/mbuf.c b/slirp/mbuf.c
index d688dd4..d136988 100644
--- a/slirp/mbuf.c
+++ b/slirp/mbuf.c
@@ -29,16 +29,16 @@
 void
 m_init(Slirp *slirp)
 {
-slirp->m_freelist.m_next = slirp->m_freelist.m_prev = &slirp->m_freelist;
-slirp->m_usedlist.m_next = slirp->m_usedlist.m_prev = &slirp->m_usedlist;
+slirp->m_freelist.qh_link = slirp->m_freelist.qh_rlink = 
&slirp->m_freelist;
+slirp->m_usedlist.qh_link = slirp->m_usedlist.qh_rlink = 
&slirp->m_usedlist;
 }
 
 void m_cleanup(Slirp *slirp)
 {
 struct mbuf *m, *next;
 
-m = slirp->m_usedlist.m_next;
-while (m != &slirp->m_usedlist) {
+m = (struct mbuf *) slirp->m_usedlist.qh_link;
+while ((struc

[Qemu-devel] [PATCH 2/5] slirp: Split get_dns_addr

2016-03-28 Thread Samuel Thibault

Separate get_dns_addr into get_dns_addr_cached and get_dns_addr_resolv_conf
to make conversion to IPv6 easier.

Signed-off-by: Samuel Thibault 
---
 slirp/slirp.c | 53 ++---
 1 file changed, 34 insertions(+), 19 deletions(-)

diff --git a/slirp/slirp.c b/slirp/slirp.c
index 6256c89..2f0d3a3 100644
--- a/slirp/slirp.c
+++ b/slirp/slirp.c
@@ -108,7 +108,28 @@ static void winsock_cleanup(void)
 
 static struct stat dns_addr_stat;
 
-int get_dns_addr(struct in_addr *pdns_addr)
+static int get_dns_addr_cached(struct in_addr *pdns_addr)
+{
+struct stat old_stat;
+if ((curtime - dns_addr_time) < TIMEOUT_DEFAULT) {
+*pdns_addr = dns_addr;
+return 0;
+}
+old_stat = dns_addr_stat;
+if (stat("/etc/resolv.conf", &dns_addr_stat) != 0) {
+return -1;
+}
+if ((dns_addr_stat.st_dev == old_stat.st_dev)
+&& (dns_addr_stat.st_ino == old_stat.st_ino)
+&& (dns_addr_stat.st_size == old_stat.st_size)
+&& (dns_addr_stat.st_mtime == old_stat.st_mtime)) {
+*pdns_addr = dns_addr;
+return 0;
+}
+return 1;
+}
+
+static int get_dns_addr_resolv_conf(struct in_addr *pdns_addr)
 {
 char buff[512];
 char buff2[257];
@@ -116,24 +137,6 @@ int get_dns_addr(struct in_addr *pdns_addr)
 int found = 0;
 struct in_addr tmp_addr;
 
-if (dns_addr.s_addr != 0) {
-struct stat old_stat;
-if ((curtime - dns_addr_time) < TIMEOUT_DEFAULT) {
-*pdns_addr = dns_addr;
-return 0;
-}
-old_stat = dns_addr_stat;
-if (stat("/etc/resolv.conf", &dns_addr_stat) != 0)
-return -1;
-if ((dns_addr_stat.st_dev == old_stat.st_dev)
-&& (dns_addr_stat.st_ino == old_stat.st_ino)
-&& (dns_addr_stat.st_size == old_stat.st_size)
-&& (dns_addr_stat.st_mtime == old_stat.st_mtime)) {
-*pdns_addr = dns_addr;
-return 0;
-}
-}
-
 f = fopen("/etc/resolv.conf", "r");
 if (!f)
 return -1;
@@ -173,6 +176,18 @@ int get_dns_addr(struct in_addr *pdns_addr)
 return 0;
 }
 
+int get_dns_addr(struct in_addr *pdns_addr)
+{
+if (dns_addr.s_addr != 0) {
+int ret;
+ret = get_dns_addr_cached(pdns_addr);
+if (ret <= 0) {
+return ret;
+}
+}
+return get_dns_addr_resolv_conf(pdns_addr);
+}
+
 #endif
 
 static void slirp_init_once(void)
-- 
2.8.0.rc3

[Qemu-devel] [PATCH 5/5] slirp: Add RDNSS advertisement

2016-03-28 Thread Samuel Thibault

This adds the RDNSS option to IPv6 router advertisements, so that the guest
can autoconfigure the DNS server address.

Signed-off-by: Samuel Thibault 
---
 slirp/ip6_icmp.c | 19 ---
 slirp/ip6_icmp.h | 12 ++--
 2 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/slirp/ip6_icmp.c b/slirp/ip6_icmp.c
index 09571bc..74585d1 100644
--- a/slirp/ip6_icmp.c
+++ b/slirp/ip6_icmp.c
@@ -149,7 +149,8 @@ void ndp_send_ra(Slirp *slirp)
 rip->ip_nh = IPPROTO_ICMPV6;
 rip->ip_pl = htons(ICMP6_NDP_RA_MINLEN
 + NDPOPT_LINKLAYER_LEN
-+ NDPOPT_PREFIXINFO_LEN);
++ NDPOPT_PREFIXINFO_LEN
++ NDPOPT_RDNSS_LEN);
 t->m_len = sizeof(struct ip6) + ntohs(rip->ip_pl);
 
 /* Build ICMPv6 packet */
@@ -167,16 +168,16 @@ void ndp_send_ra(Slirp *slirp)
 ricmp->icmp6_nra.lifetime = htons(NDP_AdvDefaultLifetime);
 ricmp->icmp6_nra.reach_time = htonl(NDP_AdvReachableTime);
 ricmp->icmp6_nra.retrans_time = htonl(NDP_AdvRetransTime);
+t->m_data += ICMP6_NDP_RA_MINLEN;
 
 /* Source link-layer address (NDP option) */
-t->m_data += ICMP6_NDP_RA_MINLEN;
 struct ndpopt *opt = mtod(t, struct ndpopt *);
 opt->ndpopt_type = NDPOPT_LINKLAYER_SOURCE;
 opt->ndpopt_len = NDPOPT_LINKLAYER_LEN / 8;
 in6_compute_ethaddr(rip->ip_src, opt->ndpopt_linklayer);
+t->m_data += NDPOPT_LINKLAYER_LEN;
 
 /* Prefix information (NDP option) */
-t->m_data += NDPOPT_LINKLAYER_LEN;
 struct ndpopt *opt2 = mtod(t, struct ndpopt *);
 opt2->ndpopt_type = NDPOPT_PREFIX_INFO;
 opt2->ndpopt_len = NDPOPT_PREFIXINFO_LEN / 8;
@@ -188,8 +189,20 @@ void ndp_send_ra(Slirp *slirp)
 opt2->ndpopt_prefixinfo.pref_lt = htonl(NDP_AdvPrefLifetime);
 opt2->ndpopt_prefixinfo.reserved2 = 0;
 opt2->ndpopt_prefixinfo.prefix = slirp->vprefix_addr6;
+t->m_data += NDPOPT_PREFIXINFO_LEN;
+
+/* Prefix information (NDP option) */
+struct ndpopt *opt3 = mtod(t, struct ndpopt *);
+opt3->ndpopt_type = NDPOPT_RDNSS;
+opt3->ndpopt_len = NDPOPT_RDNSS_LEN / 8;
+opt3->ndpopt_rdnss.reserved = 0;
+opt3->ndpopt_rdnss.lifetime = htonl(2 * NDP_MaxRtrAdvInterval);
+opt3->ndpopt_rdnss.addr = slirp->vnameserver_addr6;
+t->m_data += NDPOPT_RDNSS_LEN;
 
 /* ICMPv6 Checksum */
+t->m_data -= NDPOPT_RDNSS_LEN;
+t->m_data -= NDPOPT_PREFIXINFO_LEN;
 t->m_data -= NDPOPT_LINKLAYER_LEN;
 t->m_data -= ICMP6_NDP_RA_MINLEN;
 t->m_data -= sizeof(struct ip6);
diff --git a/slirp/ip6_icmp.h b/slirp/ip6_icmp.h
index 9460bf8..2282d29 100644
--- a/slirp/ip6_icmp.h
+++ b/slirp/ip6_icmp.h
@@ -122,6 +122,7 @@ struct ndpopt {
 uint8_t ndpopt_len; /* /!\ In units of 8 octets */
 union {
 unsigned char   linklayer_addr[6];  /* Source/Target Link-layer */
+#define ndpopt_linklayer ndpopt_body.linklayer_addr
 struct prefixinfo { /* Prefix Information */
 uint8_t prefix_length;
 #ifdef HOST_WORDS_BIGENDIAN
@@ -134,19 +135,26 @@ struct ndpopt {
 uint32_treserved2;
 struct in6_addr prefix;
 } QEMU_PACKED prefixinfo;
-} ndpopt_body;
-#define ndpopt_linklayer ndpopt_body.linklayer_addr
 #define ndpopt_prefixinfo ndpopt_body.prefixinfo
+struct rdnss {
+uint16_t reserved;
+uint32_t lifetime;
+struct in6_addr addr;
+} QEMU_PACKED rdnss;
+#define ndpopt_rdnss ndpopt_body.rdnss
+} ndpopt_body;
 } QEMU_PACKED;
 
 /* NDP options type */
 #define NDPOPT_LINKLAYER_SOURCE 1   /* Source Link-Layer Address */
 #define NDPOPT_LINKLAYER_TARGET 2   /* Target Link-Layer Address */
 #define NDPOPT_PREFIX_INFO  3   /* Prefix Information */
+#define NDPOPT_RDNSS25  /* Recursive DNS Server Address */
 
 /* NDP options size, in octets. */
 #define NDPOPT_LINKLAYER_LEN8
 #define NDPOPT_PREFIXINFO_LEN   32
+#define NDPOPT_RDNSS_LEN24
 
 /*
  * Definition of type and code field values.
-- 
2.8.0.rc3

[Qemu-devel] [PATCH 0/5] ipv4-only and ipv6-only support

2016-03-28 Thread Samuel Thibault

Hello,

This series gathers the patches to enable ipv4-only and ipv6-only support: it
adds the discussed ipv4 and ipv6 options to select which is enabled, and adds
support for ipv6 dns translation.

Samuel Thibault (5):
  slirp: Allow disabling IPv4 or IPv6
  slirp: Split get_dns_addr
  slirp: Add dns6 resolution
  slirp: Support link-local DNS addresses
  slirp: Add RDNSS advertisement

 net/slirp.c   |  36 +---
 qapi-schema.json  |   8 
 qemu-options.hx   |   8 +++-
 slirp/ip6.h   |   9 
 slirp/ip6_icmp.c  |  27 ++--
 slirp/ip6_icmp.h  |  12 +-
 slirp/ip6_input.c |   5 +++
 slirp/ip_input.c  |   4 ++
 slirp/libslirp.h  |   4 +-
 slirp/slirp.c | 127 ++
 slirp/slirp.h |   2 +
 slirp/socket.c|   4 +-
 12 files changed, 203 insertions(+), 43 deletions(-)

-- 
2.8.0.rc3

[Qemu-devel] [PATCH 3/5] slirp: Add dns6 resolution

2016-03-28 Thread Samuel Thibault

This makes get_dns_addr address family-agnostic, thus allowing to add the
IPv6 case.

Signed-off-by: Samuel Thibault 
---
 slirp/ip6.h  |  9 +++
 slirp/libslirp.h |  1 +
 slirp/slirp.c| 72 
 slirp/socket.c   |  4 ++--
 4 files changed, 64 insertions(+), 22 deletions(-)

diff --git a/slirp/ip6.h b/slirp/ip6.h
index 8ddfa24..da23de6 100644
--- a/slirp/ip6.h
+++ b/slirp/ip6.h
@@ -26,6 +26,12 @@
 0x00, 0x00, 0x00, 0x00,\
 0x00, 0x00, 0x00, 0x02 } }
 
+#define ZERO_ADDR  { .s6_addr = \
+{ 0x00, 0x00, 0x00, 0x00,\
+0x00, 0x00, 0x00, 0x00,\
+0x00, 0x00, 0x00, 0x00,\
+0x00, 0x00, 0x00, 0x00 } }
+
 static inline bool in6_equal(const struct in6_addr *a, const struct in6_addr 
*b)
 {
 return memcmp(a, b, sizeof(*a)) == 0;
@@ -84,6 +90,9 @@ static inline bool in6_equal_mach(const struct in6_addr *a,
 #define in6_solicitednode_multicast(a)\
 (in6_equal_net(a, &(struct in6_addr)SOLICITED_NODE_PREFIX, 104))
 
+#define in6_zero(a)\
+(in6_equal(a, &(struct in6_addr)ZERO_ADDR))
+
 /* Compute emulated host MAC address from its ipv6 address */
 static inline void in6_compute_ethaddr(struct in6_addr ip,
uint8_t eth[ETH_ALEN])
diff --git a/slirp/libslirp.h b/slirp/libslirp.h
index 127aa41..b0cfbc5 100644
--- a/slirp/libslirp.h
+++ b/slirp/libslirp.h
@@ -7,6 +7,7 @@ struct Slirp;
 typedef struct Slirp Slirp;
 
 int get_dns_addr(struct in_addr *pdns_addr);
+int get_dns6_addr(struct in6_addr *pdns6_addr);
 
 Slirp *slirp_init(int restricted, bool in_enabled, struct in_addr vnetwork,
   struct in_addr vnetmask, struct in_addr vhost,
diff --git a/slirp/slirp.c b/slirp/slirp.c
index 2f0d3a3..3558b47 100644
--- a/slirp/slirp.c
+++ b/slirp/slirp.c
@@ -45,7 +45,9 @@ static QTAILQ_HEAD(slirp_instances, Slirp) slirp_instances =
 QTAILQ_HEAD_INITIALIZER(slirp_instances);
 
 static struct in_addr dns_addr;
+static struct in6_addr dns6_addr;
 static u_int dns_addr_time;
+static u_int dns6_addr_time;
 
 #define TIMEOUT_FAST 2  /* milliseconds */
 #define TIMEOUT_SLOW 499  /* milliseconds */
@@ -99,6 +101,11 @@ int get_dns_addr(struct in_addr *pdns_addr)
 return 0;
 }
 
+int get_dns6_addr(struct in6_addr *pdns_addr6)
+{
+return -1;
+}
+
 static void winsock_cleanup(void)
 {
 WSACleanup();
@@ -106,36 +113,40 @@ static void winsock_cleanup(void)
 
 #else
 
-static struct stat dns_addr_stat;
-
-static int get_dns_addr_cached(struct in_addr *pdns_addr)
+static int get_dns_addr_cached(void *pdns_addr, void *cached_addr,
+   socklen_t addrlen,
+   struct stat *cached_stat, u_int *cached_time)
 {
 struct stat old_stat;
 if ((curtime - dns_addr_time) < TIMEOUT_DEFAULT) {
-*pdns_addr = dns_addr;
+memcpy(pdns_addr, cached_addr, addrlen);
 return 0;
 }
-old_stat = dns_addr_stat;
-if (stat("/etc/resolv.conf", &dns_addr_stat) != 0) {
+old_stat = *cached_stat;
+if (stat("/etc/resolv.conf", cached_stat) != 0) {
 return -1;
 }
-if ((dns_addr_stat.st_dev == old_stat.st_dev)
-&& (dns_addr_stat.st_ino == old_stat.st_ino)
-&& (dns_addr_stat.st_size == old_stat.st_size)
-&& (dns_addr_stat.st_mtime == old_stat.st_mtime)) {
-*pdns_addr = dns_addr;
+if ((cached_stat->st_dev == old_stat.st_dev)
+&& (cached_stat->st_ino == old_stat.st_ino)
+&& (cached_stat->st_size == old_stat.st_size)
+&& (cached_stat->st_mtime == old_stat.st_mtime)) {
+memcpy(pdns_addr, cached_addr, addrlen);
 return 0;
 }
 return 1;
 }
 
-static int get_dns_addr_resolv_conf(struct in_addr *pdns_addr)
+static int get_dns_addr_resolv_conf(int af, void *pdns_addr, void *cached_addr,
+socklen_t addrlen, u_int *cached_time)
 {
 char buff[512];
 char buff2[257];
 FILE *f;
 int found = 0;
-struct in_addr tmp_addr;
+void *tmp_addr = alloca(addrlen);
+#ifdef DEBUG
+char s[INET6_ADDRSTRLEN];
+#endif
 
 f = fopen("/etc/resolv.conf", "r");
 if (!f)
@@ -146,13 +157,14 @@ static int get_dns_addr_resolv_conf(struct in_addr 
*pdns_addr)
 #endif
 while (fgets(buff, 512, f) != NULL) {
 if (sscanf(buff, "nameserver%*[ \t]%256s", buff2) == 1) {
-if (!inet_aton(buff2, &tmp_addr))
+if (!inet_pton(af, buff2, tmp_addr)) {
 continue;
+}
 /* If it's the first one, set it to dns_addr */
 if (!found) {
-*pdns_addr = tmp_addr;
-dns_addr = tmp_addr;
-dns_addr_time = curtime;
+memcpy(pdns_addr, tmp_addr, addrlen);
+memcpy(cached_addr, tmp_addr, addrlen);
+*cached_time = curtime;

[Qemu-devel] [PATCH 1/5] slirp: Allow disabling IPv4 or IPv6

2016-03-28 Thread Samuel Thibault

Add ipv4 and ipv6 boolean options, so the user can setup IPv4-only and
IPv6-only network environments.

Signed-off-by: Samuel Thibault 

---

Changes since previous versions:

- fix coding style
---
 net/slirp.c   | 36 ++--
 qapi-schema.json  |  8 
 qemu-options.hx   |  8 ++--
 slirp/ip6_icmp.c  |  8 
 slirp/ip6_input.c |  5 +
 slirp/ip_input.c  |  4 
 slirp/libslirp.h  |  3 ++-
 slirp/slirp.c | 10 +-
 slirp/slirp.h |  2 ++
 9 files changed, 74 insertions(+), 10 deletions(-)

diff --git a/net/slirp.c b/net/slirp.c
index 0013c27..c810dc4 100644
--- a/net/slirp.c
+++ b/net/slirp.c
@@ -135,8 +135,8 @@ static NetClientInfo net_slirp_info = {
 
 static int net_slirp_init(NetClientState *peer, const char *model,
   const char *name, int restricted,
-  const char *vnetwork, const char *vhost,
-  const char *vprefix6, int vprefix6_len,
+  bool ipv4, const char *vnetwork, const char *vhost,
+  bool ipv6, const char *vprefix6, int vprefix6_len,
   const char *vhost6,
   const char *vhostname, const char *tftp_export,
   const char *bootfile, const char *vdhcp_start,
@@ -164,6 +164,19 @@ static int net_slirp_init(NetClientState *peer, const char 
*model,
 char *end;
 struct slirp_config_str *config;
 
+if (!ipv4 && (vnetwork || vhost || vnameserver)) {
+return -1;
+}
+
+if (!ipv6 && (vprefix6 || vhost6 || vnameserver6)) {
+return -1;
+}
+
+if (!ipv4 && !ipv6) {
+/* It doesn't make sense to disable both */
+return -1;
+}
+
 if (!tftp_export) {
 tftp_export = legacy_tftp_prefix;
 }
@@ -308,8 +321,8 @@ static int net_slirp_init(NetClientState *peer, const char 
*model,
 
 s = DO_UPCAST(SlirpState, nc, nc);
 
-s->slirp = slirp_init(restricted, net, mask, host,
-  ip6_prefix, vprefix6_len, ip6_host,
+s->slirp = slirp_init(restricted, ipv4, net, mask, host,
+  ipv6, ip6_prefix, vprefix6_len, ip6_host,
   vhostname, tftp_export, bootfile, dhcp,
   dns, ip6_dns, dnssearch, s);
 QTAILQ_INSERT_TAIL(&slirp_stacks, s, entry);
@@ -812,10 +825,20 @@ int net_init_slirp(const NetClientOptions *opts, const 
char *name,
 int ret;
 const NetdevUserOptions *user;
 const char **dnssearch;
+bool ipv4 = true, ipv6 = true;
 
 assert(opts->type == NET_CLIENT_OPTIONS_KIND_USER);
 user = opts->u.user.data;
 
+if ((user->has_ipv6 && user->ipv6 && !user->has_ipv4) ||
+(user->has_ipv4 && !user->ipv4)) {
+ipv4 = 0;
+}
+if ((user->has_ipv4 && user->ipv4 && !user->has_ipv6) ||
+(user->has_ipv6 && !user->ipv6)) {
+ipv6 = 0;
+}
+
 vnet = user->has_net ? g_strdup(user->net) :
user->has_ip  ? g_strdup_printf("%s/24", user->ip) :
NULL;
@@ -827,8 +850,9 @@ int net_init_slirp(const NetClientOptions *opts, const char 
*name,
 net_init_slirp_configs(user->hostfwd, SLIRP_CFG_HOSTFWD);
 net_init_slirp_configs(user->guestfwd, 0);
 
-ret = net_slirp_init(peer, "user", name, user->q_restrict, vnet,
- user->host, user->ipv6_prefix, user->ipv6_prefixlen,
+ret = net_slirp_init(peer, "user", name, user->q_restrict,
+ ipv4, vnet, user->host,
+ ipv6, user->ipv6_prefix, user->ipv6_prefixlen,
  user->ipv6_host, user->hostname, user->tftp,
  user->bootfile, user->dhcpstart,
  user->dns, user->ipv6_dns, user->smb,
diff --git a/qapi-schema.json b/qapi-schema.json
index 8907790..d83caa3 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -2425,6 +2425,12 @@
 #
 # @restrict: #optional isolate the guest from the host
 #
+# @ipv4: #optional whether to support IPv4, default true for enabled
+#(since 2.6)
+#
+# @ipv6: #optional whether to support IPv6, default true for enabled
+#(since 2.6)
+#
 # @ip: #optional legacy parameter, use net= instead
 #
 # @net: #optional IP network address that the guest will see, in the
@@ -2473,6 +2479,8 @@
   'data': {
 '*hostname':  'str',
 '*restrict':  'bool',
+'*ipv4':  'bool',
+'*ipv6':  'bool',
 '*ip':'str',
 '*net':   'str',
 '*host':  'str',
diff --git a/qemu-options.hx b/qemu-options.hx
index 09162f5..705f162 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1551,8 +1551,9 @@ DEF("smb", HAS_ARG, QEMU_OPTION_smb, "", QEMU_ARCH_ALL)
 
 DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
 #ifdef CONFIG_SLIRP
-"-netdev user,id=str[,net=addr[/mask]][,host=addr][,ipv6-net=addr[/int]]\n"
-" 
[,ipv6-host=addr][,restrict=on|off][,hostname=host][,dhcpstart

[Qemu-devel] [PATCH 4/5] slirp: Support link-local DNS addresses

2016-03-28 Thread Samuel Thibault

They look like fe80::%eth0

Signed-off-by: Samuel Thibault 
---
 slirp/libslirp.h |  2 +-
 slirp/slirp.c| 26 ++
 slirp/socket.c   |  2 +-
 3 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/slirp/libslirp.h b/slirp/libslirp.h
index b0cfbc5..81bd139 100644
--- a/slirp/libslirp.h
+++ b/slirp/libslirp.h
@@ -7,7 +7,7 @@ struct Slirp;
 typedef struct Slirp Slirp;
 
 int get_dns_addr(struct in_addr *pdns_addr);
-int get_dns6_addr(struct in6_addr *pdns6_addr);
+int get_dns6_addr(struct in6_addr *pdns6_addr, unsigned *scope_id);
 
 Slirp *slirp_init(int restricted, bool in_enabled, struct in_addr vnetwork,
   struct in_addr vnetmask, struct in_addr vhost,
diff --git a/slirp/slirp.c b/slirp/slirp.c
index 3558b47..eaf843a 100644
--- a/slirp/slirp.c
+++ b/slirp/slirp.c
@@ -29,6 +29,10 @@
 #include "slirp.h"
 #include "hw/hw.h"
 
+#ifndef _WIN32
+#include 
+#endif
+
 /* host loopback address */
 struct in_addr loopback_addr;
 /* host loopback network mask */
@@ -137,13 +141,15 @@ static int get_dns_addr_cached(void *pdns_addr, void 
*cached_addr,
 }
 
 static int get_dns_addr_resolv_conf(int af, void *pdns_addr, void *cached_addr,
-socklen_t addrlen, u_int *cached_time)
+socklen_t addrlen, unsigned *scope_id,
+u_int *cached_time)
 {
 char buff[512];
 char buff2[257];
 FILE *f;
 int found = 0;
 void *tmp_addr = alloca(addrlen);
+unsigned if_index;
 #ifdef DEBUG
 char s[INET6_ADDRSTRLEN];
 #endif
@@ -157,6 +163,14 @@ static int get_dns_addr_resolv_conf(int af, void 
*pdns_addr, void *cached_addr,
 #endif
 while (fgets(buff, 512, f) != NULL) {
 if (sscanf(buff, "nameserver%*[ \t]%256s", buff2) == 1) {
+char *c = strchr(buff2, '%');
+if (c) {
+if_index = if_nametoindex(c + 1);
+*c = '\0';
+} else {
+if_index = 0;
+}
+
 if (!inet_pton(af, buff2, tmp_addr)) {
 continue;
 }
@@ -164,6 +178,9 @@ static int get_dns_addr_resolv_conf(int af, void 
*pdns_addr, void *cached_addr,
 if (!found) {
 memcpy(pdns_addr, tmp_addr, addrlen);
 memcpy(cached_addr, tmp_addr, addrlen);
+if (scope_id) {
+*scope_id = if_index;
+}
 *cached_time = curtime;
 }
 #ifdef DEBUG
@@ -201,12 +218,12 @@ int get_dns_addr(struct in_addr *pdns_addr)
 }
 }
 return get_dns_addr_resolv_conf(AF_INET, pdns_addr, &dns_addr,
-sizeof(dns_addr), &dns_addr_time);
+sizeof(dns_addr), NULL, &dns_addr_time);
 }
 
 static struct stat dns6_addr_stat;
 
-int get_dns6_addr(struct in6_addr *pdns6_addr)
+int get_dns6_addr(struct in6_addr *pdns6_addr, unsigned *scope_id)
 {
 if (!in6_zero(&dns6_addr)) {
 int ret;
@@ -217,7 +234,8 @@ int get_dns6_addr(struct in6_addr *pdns6_addr)
 }
 }
 return get_dns_addr_resolv_conf(AF_INET6, pdns6_addr, &dns6_addr,
-sizeof(dns6_addr), &dns6_addr_time);
+sizeof(dns6_addr),
+scope_id, &dns6_addr_time);
 }
 
 #endif
diff --git a/slirp/socket.c b/slirp/socket.c
index 653257d..896c27e 100644
--- a/slirp/socket.c
+++ b/slirp/socket.c
@@ -796,7 +796,7 @@ void sotranslate_out(struct socket *so, struct 
sockaddr_storage *addr)
 if (in6_equal_net(&so->so_faddr6, &slirp->vprefix_addr6,
 slirp->vprefix_len)) {
 if (in6_equal(&so->so_faddr6, &slirp->vnameserver_addr6)) {
-if (get_dns6_addr(&sin6->sin6_addr) < 0) {
+if (get_dns6_addr(&sin6->sin6_addr, &sin6->sin6_scope_id) < 0) 
{
 sin6->sin6_addr = in6addr_loopback;
 }
 } else {
-- 
2.8.0.rc3

Re: [Qemu-devel] [PATCHv4 2/2] slirp: Allow disabling IPv4 or IPv6

2016-03-28 Thread Samuel Thibault

Eric Blake, on Mon 28 Mar 2016 09:12:04 -0600, wrote:
> > +if ((user->has_ipv6 && user->ipv6 && !user->has_ipv4)
> > +|| (user->has_ipv4 && !user->ipv4))
> > +ipv4 = 0;
> 
> Inconsistent with current qemu style.  Should be:
> 
> if ((user->has_ipv6 && user->ipv6 && !user->has_ipv4) ||
> (user->has_ipv4 && !user->ipv4)) {

Ah, I didn't know about that one.  I personally prefer it the other way,
but alright.

> In particular, the missing {} (should) fail ./scripts/checkpatch.pl.

Right, sorry, it seems I forgot to re-run the script after the last
rework.

Samuel

Re: [Qemu-devel] [PATCH 4/5] tcg: reorder removal from lists in tb_phys_invalidate

2016-03-28 Thread Richard Henderson


On 03/28/2016 01:58 PM, Paolo Bonzini wrote:



On 28/03/2016 20:42, Sergey Fedorov wrote:

On 17/03/16 16:46, sergey.fedo...@linaro.org wrote:

First the translation block is invalidated, for which a simple write
to tb->pc is enough.  This means that cpu-exec will not pick up anymore
the block, though it may still execute it through chained jumps.  This
also replaces the NULLing out of the pointer in the CPUs' local cache.


Although, using 'tb->pc' to mark a TB as invalid is probably not such a
good idea. There may be some cases when PC could become equal to -1. For
example, ARMv6-M uses PC >= 0xFFF0 to perform exception return. So
we'd better introduce a separate 'tb->valid' or 'tb->invalid' flag.


It is also possible to use tb->flags for that.  I suspect that all-ones
tb flags is never valid, but it could also be a #define.


That might work by accident, but it might not.  You'd need to reserve a bit 
across all of the targets.



r~

[Qemu-devel] [Bug 1563152] [NEW] general protection fault running VirtualBox in KVM guest

2016-03-28 Thread Richard Hansen

Public bug reported:

I'm trying to run nested VMs using qemu-kvm on the physical host and VirtualBox 
on the guest host:
  * physical host: Ubuntu 14.04 running Linux 4.2.0, qemu-kvm 2.0.0
  * guest host: Ubuntu 16.04 beta 2 running Linux 4.4.0, VirtualBox 5.0.16

When I try to start up a VirtualBox VM in the guest host, I get a
general protection fault (see below for dmesg output).  According to
https://www.virtualbox.org/ticket/14965 this is caused by a bug in
QEMU/KVM:

The problem in more detail:  As written above, VirtualBox tries to
read the MSR 0x9B (IA32_SMM_MONITOR_CTL).  This is an
architectural MSR which is present if CPUID.01 / ECX bit 5 or bit
6 are set (VMX or SMX).  As KVM has nested virtualization enabled
and therefore pretends to support VT-x, this MSR must be
accessible and reading from this MSR must not raise a
#GP.  KVM/QEmu does not behave like real hardware in this case.

dmesg output:

SUPR0GipMap: fGetGipCpu=0x3
general protection fault:  [#1] SMP
Modules linked in: pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) 
vboxdrv(OE) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack 
xt_tcpudp bridge stp llc iptable_filter ip_tables x_tables ppdev kvm_intel kvm 
irqbypass snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core 
snd_hwdep snd_pcm snd_timer i2c_piix4 snd input_leds soundcore joydev 
8250_fintek mac_hid serio_raw pvpanic parport_pc parport ib_iser rdma_cm iw_cm 
ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi 
scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov 
async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 
multipath linear crct10dif_pclmul crc32_pclmul qxl ttm drm_kms_helper 
syscopyarea sysfillrect aesni_intel sysimgblt fb_sys_fops aes_x86_64 lrw 
gf128mul glue_helper ablk_helper cryptd psmouse floppy drm pata_acpi
CPU: 0 PID: 31507 Comm: EMT Tainted: G   OE   4.4.0-15-generic 
#31-Ubuntu
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 880034c0a580 ti: 880002e0 task.ti: 880002e0
RIP: 0010:[]  [] 0xc067e506
RSP: 0018:880002e03d70  EFLAGS: 00010206
RAX: 06f0 RBX: ffdb RCX: 009b
RDX:  RSI: 880002e03d00 RDI: 880002e03cc8
RBP: 880002e03d90 R08: 0004 R09: 06f0
R10: 49656e69 R11: 0f8bfbff R12: 0020
R13:  R14: c957407c R15: c0645260
FS:  7f89b8f6b700() GS:88007fc0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f89b8d1 CR3: 35ae1000 CR4: 06f0
Stack:
    
 880002e03db0 c0693e93 c9574010 880035aae550
 880002e03e30 c060a3e7 880002e03e10 0282
Call Trace:
 [] ? supdrvIOCtl+0x2de7/0x3250 [vboxdrv]
 [] ? VBoxDrvLinuxIOCtl_5_0_16+0x150/0x250 [vboxdrv]
 [] ? do_vfs_ioctl+0x29f/0x490
 [] ? __do_page_fault+0x1b4/0x400
 [] ? SyS_ioctl+0x79/0x90
 [] ? entry_SYSCALL_64_fastpath+0x16/0x71
Code: 88 e4 fc ff ff b9 3a 00 00 00 0f 32 48 c1 e2 20 89 c0 48 09 d0 48 89 05 
f9 db 0e 00 0f 20 e0 b9 9b 00 00 00 48 89 05 d2 db 0e 00 <0f> 32 48 c1 e2 20 89 
c0 b9 80 00 00 c0 48 09 d0 48 89 05 cb db
RIP  [] 0xc067e506
 RSP 
---[ end trace b3284b6520f49e0d ]---

** Affects: qemu
 Importance: Undecided
 Status: New

** Description changed:

  I'm trying to run nested VMs using qemu-kvm on the physical host and 
VirtualBox on the guest host:
-   * physical host: Ubuntu 14.04 running Linux 4.2.0, qemu-kvm 2.0.0
-   * guest host: Ubuntu 16.04 beta 2 running Linux 4.2.0, VirtualBox 5.0.16
+   * physical host: Ubuntu 14.04 running Linux 4.2.0, qemu-kvm 2.0.0
+   * guest host: Ubuntu 16.04 beta 2 running Linux 4.4.0, VirtualBox 5.0.16
  
  When I try to start up a VirtualBox VM in the guest host, I get a
  general protection fault (see below for dmesg output).  According to
  https://www.virtualbox.org/ticket/14965 this is caused by a bug in
  QEMU/KVM:
  
- The problem in more detail:  As written above, VirtualBox tries to
- read the MSR 0x9B (IA32_SMM_MONITOR_CTL).  This is an
- architectural MSR which is present if CPUID.01 / ECX bit 5 or bit
- 6 are set (VMX or SMX).  As KVM has nested virtualization enabled
- and therefore pretends to support VT-x, this MSR must be
- accessible and reading from this MSR must not raise a
- #GP.  KVM/QEmu does not behave like real hardware in this case.
+ The problem in more detail:  As written above, VirtualBox tries to
+ read the MSR 0x9B (IA32_SMM_MONITOR_CTL).  This is an
+ architectural MSR which is present if CPUID.01 / ECX bit 5 or bit
+ 6 are set (VMX or SMX).  As KVM has nested virtualization enabled

[Qemu-devel] [Bug 1563152] Re: general protection fault running VirtualBox in KVM guest

2016-03-28 Thread Richard Hansen

** Also affects: qemu (Ubuntu)
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1563152

Title:
  general protection fault running VirtualBox in KVM guest

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  New

Bug description:
  I'm trying to run nested VMs using qemu-kvm on the physical host and 
VirtualBox on the guest host:
    * physical host: Ubuntu 14.04 running Linux 4.2.0, qemu-kvm 2.0.0
    * guest host: Ubuntu 16.04 beta 2 running Linux 4.4.0, VirtualBox 5.0.16

  When I try to start up a VirtualBox VM in the guest host, I get a
  general protection fault (see below for dmesg output).  According to
  https://www.virtualbox.org/ticket/14965 this is caused by a bug in
  QEMU/KVM:

  The problem in more detail:  As written above, VirtualBox tries to
  read the MSR 0x9B (IA32_SMM_MONITOR_CTL).  This is an
  architectural MSR which is present if CPUID.01 / ECX bit 5 or bit
  6 are set (VMX or SMX).  As KVM has nested virtualization enabled
  and therefore pretends to support VT-x, this MSR must be
  accessible and reading from this MSR must not raise a
  #GP.  KVM/QEmu does not behave like real hardware in this case.

  dmesg output:

  SUPR0GipMap: fGetGipCpu=0x3
  general protection fault:  [#1] SMP
  Modules linked in: pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) 
vboxdrv(OE) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack 
xt_tcpudp bridge stp llc iptable_filter ip_tables x_tables ppdev kvm_intel kvm 
irqbypass snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core 
snd_hwdep snd_pcm snd_timer i2c_piix4 snd input_leds soundcore joydev 
8250_fintek mac_hid serio_raw pvpanic parport_pc parport ib_iser rdma_cm iw_cm 
ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi 
scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov 
async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 
multipath linear crct10dif_pclmul crc32_pclmul qxl ttm drm_kms_helper 
syscopyarea sysfillrect aesni_intel sysimgblt fb_sys_fops aes_x86_64 lrw 
gf128mul glue_helper ablk_helper cryptd psmouse floppy drm pata_acpi
  CPU: 0 PID: 31507 Comm: EMT Tainted: G   OE   4.4.0-15-generic 
#31-Ubuntu
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  task: 880034c0a580 ti: 880002e0 task.ti: 880002e0
  RIP: 0010:[]  [] 0xc067e506
  RSP: 0018:880002e03d70  EFLAGS: 00010206
  RAX: 06f0 RBX: ffdb RCX: 009b
  RDX:  RSI: 880002e03d00 RDI: 880002e03cc8
  RBP: 880002e03d90 R08: 0004 R09: 06f0
  R10: 49656e69 R11: 0f8bfbff R12: 0020
  R13:  R14: c957407c R15: c0645260
  FS:  7f89b8f6b700() GS:88007fc0() knlGS:
  CS:  0010 DS:  ES:  CR0: 80050033
  CR2: 7f89b8d1 CR3: 35ae1000 CR4: 06f0
  Stack:
      
   880002e03db0 c0693e93 c9574010 880035aae550
   880002e03e30 c060a3e7 880002e03e10 0282
  Call Trace:
   [] ? supdrvIOCtl+0x2de7/0x3250 [vboxdrv]
   [] ? VBoxDrvLinuxIOCtl_5_0_16+0x150/0x250 [vboxdrv]
   [] ? do_vfs_ioctl+0x29f/0x490
   [] ? __do_page_fault+0x1b4/0x400
   [] ? SyS_ioctl+0x79/0x90
   [] ? entry_SYSCALL_64_fastpath+0x16/0x71
  Code: 88 e4 fc ff ff b9 3a 00 00 00 0f 32 48 c1 e2 20 89 c0 48 09 d0 48 89 05 
f9 db 0e 00 0f 20 e0 b9 9b 00 00 00 48 89 05 d2 db 0e 00 <0f> 32 48 c1 e2 20 89 
c0 b9 80 00 00 c0 48 09 d0 48 89 05 cb db
  RIP  [] 0xc067e506
   RSP 
  ---[ end trace b3284b6520f49e0d ]---

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1563152/+subscriptions

[Qemu-devel] [PATCH v3 1/3] hw/mips: implement GIC Interval Timer

2016-03-28 Thread Yongbok Kim

The interval timer is similar to the CP0 Count/Compare timer within
each processor. The difference is the GIC_SH_COUNTER register is global
to the system so that all processors have the same time reference.

To ease implementation, all VPs are having its own QEMU timer but sharing
global settings and registers such as GIC_SH_CONFIG.COUTNSTOP and
GIC_SH_COUNTER.

MIPS GIC Interval Timer does support upto 64 bits of Count register but
in this implementation it is limited to 32 bits only.

Signed-off-by: Yongbok Kim 
---
 hw/timer/Makefile.objs   |1 +
 hw/timer/mips_gictimer.c |  141 ++
 include/hw/timer/mips_gictimer.h |   46 
 3 files changed, 188 insertions(+), 0 deletions(-)
 create mode 100644 hw/timer/mips_gictimer.c
 create mode 100644 include/hw/timer/mips_gictimer.h

diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs
index 003c14f..7ba8c23 100644
--- a/hw/timer/Makefile.objs
+++ b/hw/timer/Makefile.objs
@@ -26,6 +26,7 @@ obj-$(CONFIG_OMAP) += omap_synctimer.o
 obj-$(CONFIG_PXA2XX) += pxa2xx_timer.o
 obj-$(CONFIG_SH4) += sh_timer.o
 obj-$(CONFIG_DIGIC) += digic-timer.o
+obj-$(CONFIG_MIPS_CPS) += mips_gictimer.o
 
 obj-$(CONFIG_MC146818RTC) += mc146818rtc.o
 
diff --git a/hw/timer/mips_gictimer.c b/hw/timer/mips_gictimer.c
new file mode 100644
index 000..f1ea9ba
--- /dev/null
+++ b/hw/timer/mips_gictimer.c
@@ -0,0 +1,141 @@
+/*
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+ *
+ * Copyright (C) 2016 Imagination Technologies
+ */
+
+#include "qemu/osdep.h"
+#include "hw/hw.h"
+#include "hw/sysbus.h"
+#include "hw/timer/mips_gictimer.h"
+
+#define TIMER_PERIOD 10 /* 10 ns period for 100 Mhz frequency */
+
+static void gic_vptimer_update(MIPSGICTimerState *gictimer,
+   uint32_t vp_index, uint64_t now)
+{
+uint64_t next;
+uint32_t wait;
+
+wait = gictimer->vptimers[vp_index].comparelo - gictimer->sh_counterlo -
+   (uint32_t)(now / TIMER_PERIOD);
+next = now + (uint64_t)wait * TIMER_PERIOD;
+
+timer_mod(gictimer->vptimers[vp_index].qtimer, next);
+}
+
+static void gic_vptimer_expire(MIPSGICTimerState *gictimer, uint32_t vp_index,
+   uint64_t now)
+{
+if (gictimer->countstop) {
+/* timer stopped */
+return;
+}
+gictimer->cb(gictimer->opaque, vp_index);
+gic_vptimer_update(gictimer, vp_index, now);
+}
+
+static void gic_vptimer_cb(void *opaque)
+{
+MIPSGICTimerVPState *vptimer = opaque;
+MIPSGICTimerState *gictimer = vptimer->gictimer;
+gic_vptimer_expire(gictimer, vptimer->vp_index,
+   qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL));
+}
+
+uint32_t mips_gictimer_get_sh_count(MIPSGICTimerState *gictimer)
+{
+int i;
+if (gictimer->countstop) {
+return gictimer->sh_counterlo;
+} else {
+uint64_t now;
+now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+for (i = 0; i < gictimer->num_vps; i++) {
+if (timer_pending(gictimer->vptimers[i].qtimer)
+&& timer_expired(gictimer->vptimers[i].qtimer, now)) {
+/* The timer has already expired.  */
+gic_vptimer_expire(gictimer, i, now);
+}
+}
+return gictimer->sh_counterlo + (uint32_t)(now / TIMER_PERIOD);
+}
+}
+
+void mips_gictimer_store_sh_count(MIPSGICTimerState *gictimer, uint64_t count)
+{
+int i;
+uint64_t now;
+
+if (gictimer->countstop || !gictimer->vptimers[0].qtimer) {
+gictimer->sh_counterlo = count;
+} else {
+/* Store new count register */
+now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+gictimer->sh_counterlo = count - (uint32_t)(now / TIMER_PERIOD);
+/* Update timer timer */
+for (i = 0; i < gictimer->num_vps; i++) {
+gic_vptimer_update(gictimer, i, now);
+}
+}
+}
+
+uint32_t mips_gictimer_get_vp_compare(MIPSGICTimerState *gictimer,
+  uint32_t vp_index)
+{
+return gictimer->vptimers[vp_index].comparelo;
+}
+
+void mips_gictimer_store_vp_compare(MIPSGICTimerState *gictimer,
+uint32_t vp_index, uint64_t compare)
+{
+gictimer->vptimers[vp_index].comparelo = (uint32_t) compare;
+gic_vptimer_update(gictimer, vp_index,
+   qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL));
+}
+
+uint8_t mips_gictimer_get_countstop(MIPSGICTimerState *gictimer)
+{
+return gictimer->countstop;
+}
+
+void mips_gictimer_start_count(MIPSGICTimerState *gictimer)
+{
+gictimer->countstop = 0;
+mips_gictimer_store_sh_count(gictimer, gictimer->sh_counterlo);
+}
+
+void mips_gictimer_stop_count(MIPSGICTimerState *gictimer)
+{
+int i;
+
+gictimer->countstop = 1;
+/* Store the current value */
+gictimer->sh_coun

[Qemu-devel] [PATCH v3 0/3] mips: add Global Interrupt Controller

2016-03-28 Thread Yongbok Kim

This patchset implement MIPS Global Interrupt Controller.

The Global Interrupt Controller (GIC) is responsible for mapping each
internal and external interrupt to the correct location for servicing.

Limitation:
Level triggering only
GIC CounterHi not implemented (Countbits = 32bits)
DINT not implemented
Local WatchDog, Fast Debug Channel, Perf Counter not implemented

It is based on the earlier un-merged GIC implementation.
https://lists.gnu.org/archive/html/qemu-devel/2013-03/msg00194.html

For more information, 
http://imgtec.com/mips/warrior/p-class-p5600-multiprocessor-core/
http://imgtec.com/mips/warrior/i-class-i6400-multiprocessor-core/

v3:
* rebased to top of CPS changes
* removed GCR from the patchset as CPS patchset includes it
* added checking boundaries
* fixed racy code (James)
* split gic interval timer into separate file hw/timer/mips_gictimer.c
* updated review comments (James, Leon)
* cosmetic changes

v2:
* added user mode section (James)
* moved mips_gic.c into hw/intc, mips_gcr.c into hw/misc (PeterM, PeterC)
* renamed obvious duplications (Leon)
* renamed gic_irqs into irq_state (Leon)
* removed pointer to gic IRQs from env (Leon)
* fixed loading target_ulong CMGCRBase (Leon)
* removed unimplemented registers (Leon)
* fixed writing to wedge register (Leon)
* removed magic numbers
* updated usage of map_vp to indicate not mapped
* cosmetic changes and other review comments

Leon Alrae (1):
  hw/mips/cps: create GIC block inside CPS

Yongbok Kim (2):
  hw/mips: implement GIC Interval Timer
  hw/mips: implement Global Interrupt Controller

 hw/intc/Makefile.objs|1 +
 hw/intc/mips_gic.c   |  459 ++
 hw/mips/cps.c|   25 ++-
 hw/mips/mips_malta.c |4 +-
 hw/misc/mips_cmgcr.c |   33 +++
 hw/timer/Makefile.objs   |1 +
 hw/timer/mips_gictimer.c |  141 
 include/hw/intc/mips_gic.h   |  215 ++
 include/hw/mips/cps.h|2 +
 include/hw/misc/mips_cmgcr.h |9 +
 include/hw/timer/mips_gictimer.h |   46 
 11 files changed, 926 insertions(+), 10 deletions(-)
 create mode 100644 hw/intc/mips_gic.c
 create mode 100644 hw/timer/mips_gictimer.c
 create mode 100644 include/hw/intc/mips_gic.h
 create mode 100644 include/hw/timer/mips_gictimer.h

[Qemu-devel] [PATCH v3 3/3] hw/mips/cps: create GIC block inside CPS

2016-03-28 Thread Yongbok Kim

From: Leon Alrae 

Add GIC to CPS and expose its interrupt pins instead of CPU's.

Signed-off-by: Leon Alrae 
---
 hw/mips/cps.c|   25 ++---
 hw/mips/mips_malta.c |4 +---
 hw/misc/mips_cmgcr.c |   33 +
 include/hw/mips/cps.h|2 ++
 include/hw/misc/mips_cmgcr.h |9 +
 5 files changed, 63 insertions(+), 10 deletions(-)

diff --git a/hw/mips/cps.c b/hw/mips/cps.c
index 59e7926..ddf2294 100644
--- a/hw/mips/cps.c
+++ b/hw/mips/cps.c
@@ -24,13 +24,8 @@
 
 qemu_irq get_cps_irq(MIPSCPSState *s, int pin_number)
 {
-MIPSCPU *cpu = MIPS_CPU(first_cpu);
-CPUMIPSState *env = &cpu->env;
-
 assert(pin_number < s->num_irq);
-
-/* TODO: return GIC pins once implemented */
-return env->irq[pin_number];
+return s->gic.irq_state[pin_number].irq;
 }
 
 static void mips_cps_init(Object *obj)
@@ -96,6 +91,21 @@ static void mips_cps_realize(DeviceState *dev, Error **errp)
 memory_region_add_subregion(&s->container, 0,
 sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->cpc), 
0));
 
+/* Global Interrupt Controller */
+object_initialize(&s->gic, sizeof(s->gic), TYPE_MIPS_GIC);
+qdev_set_parent_bus(DEVICE(&s->gic), sysbus_get_default());
+
+object_property_set_int(OBJECT(&s->gic), s->num_vp, "num-vp", &err);
+object_property_set_int(OBJECT(&s->gic), 128, "num-irq", &err);
+object_property_set_bool(OBJECT(&s->gic), true, "realized", &err);
+if (err != NULL) {
+error_propagate(errp, err);
+return;
+}
+
+memory_region_add_subregion(&s->container, 0,
+sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->gic), 
0));
+
 /* Global Configuration Registers */
 gcr_base = env->CP0_CMGCRBase << 4;
 
@@ -105,6 +115,7 @@ static void mips_cps_realize(DeviceState *dev, Error **errp)
 object_property_set_int(OBJECT(&s->gcr), s->num_vp, "num-vp", &err);
 object_property_set_int(OBJECT(&s->gcr), 0x800, "gcr-rev", &err);
 object_property_set_int(OBJECT(&s->gcr), gcr_base, "gcr-base", &err);
+object_property_set_link(OBJECT(&s->gcr), OBJECT(&s->gic.mr), "gic", &err);
 object_property_set_link(OBJECT(&s->gcr), OBJECT(&s->cpc.mr), "cpc", &err);
 object_property_set_bool(OBJECT(&s->gcr), true, "realized", &err);
 if (err != NULL) {
@@ -118,7 +129,7 @@ static void mips_cps_realize(DeviceState *dev, Error **errp)
 
 static Property mips_cps_properties[] = {
 DEFINE_PROP_UINT32("num-vp", MIPSCPSState, num_vp, 1),
-DEFINE_PROP_UINT32("num-irq", MIPSCPSState, num_irq, 8),
+DEFINE_PROP_UINT32("num-irq", MIPSCPSState, num_irq, 256),
 DEFINE_PROP_STRING("cpu-model", MIPSCPSState, cpu_model),
 DEFINE_PROP_END_OF_LIST()
 };
diff --git a/hw/mips/mips_malta.c b/hw/mips/mips_malta.c
index c26e953..cb55b20 100644
--- a/hw/mips/mips_malta.c
+++ b/hw/mips/mips_malta.c
@@ -954,9 +954,7 @@ static void create_cps(MaltaState *s, const char *cpu_model,
 
 sysbus_mmio_map_overlap(SYS_BUS_DEVICE(s->cps), 0, 0, 1);
 
-/* FIXME: When GIC is present then we should use GIC's IRQ 3.
-   Until then CPS exposes CPU's IRQs thus use the default IRQ 2. */
-*i8259_irq = get_cps_irq(s->cps, 2);
+*i8259_irq = get_cps_irq(s->cps, 3);
 *cbus_irq = NULL;
 }
 
diff --git a/hw/misc/mips_cmgcr.c b/hw/misc/mips_cmgcr.c
index c43b61b..1d72694 100644
--- a/hw/misc/mips_cmgcr.c
+++ b/hw/misc/mips_cmgcr.c
@@ -15,12 +15,18 @@
 #include "sysemu/sysemu.h"
 #include "hw/misc/mips_cmgcr.h"
 #include "hw/misc/mips_cpc.h"
+#include "hw/intc/mips_gic.h"
 
 static inline bool is_cpc_connected(MIPSGCRState *s)
 {
 return s->cpc_mr != NULL;
 }
 
+static inline bool is_gic_connected(MIPSGCRState *s)
+{
+return s->gic_mr != NULL;
+}
+
 static inline void update_cpc_base(MIPSGCRState *gcr, uint64_t val)
 {
 if (is_cpc_connected(gcr)) {
@@ -34,6 +40,19 @@ static inline void update_cpc_base(MIPSGCRState *gcr, 
uint64_t val)
 }
 }
 
+static inline void update_gic_base(MIPSGCRState *gcr, uint64_t val)
+{
+if (is_gic_connected(gcr)) {
+gcr->gic_base = val & GCR_GIC_BASE_MSK;
+memory_region_transaction_begin();
+memory_region_set_address(gcr->gic_mr,
+  gcr->gic_base & GCR_GIC_BASE_GICBASE_MSK);
+memory_region_set_enabled(gcr->gic_mr,
+  gcr->gic_base & GCR_GIC_BASE_GICEN_MSK);
+memory_region_transaction_commit();
+}
+}
+
 /* Read GCR registers */
 static uint64_t gcr_read(void *opaque, hwaddr addr, unsigned size)
 {
@@ -48,8 +67,12 @@ static uint64_t gcr_read(void *opaque, hwaddr addr, unsigned 
size)
 return gcr->gcr_base;
 case GCR_REV_OFS:
 return gcr->gcr_rev;
+case GCR_GIC_BASE_OFS:
+return gcr->gic_base;
 case GCR_CPC_BASE_OFS:
 return gcr->cpc_base;
+case GCR_GIC_STATUS_OFS:
+return is_gic_connected(gcr);
 case GCR_CPC_STATUS_

[Qemu-devel] [PATCH v3 2/3] hw/mips: implement Global Interrupt Controller

2016-03-28 Thread Yongbok Kim

The Global Interrupt Controller (GIC) is responsible for mapping each
internal and external interrupt to the correct location for servicing.

The internal representation of registers is different from the specification
in order to consolidate information for each GIC Interrupt Sources and Virtual
Processors with same functionalities. For example SH_MAP00_VP00 registers are
defined like each bit represents a VP but in this implementation the equivalent
map_vp contains VP number in integer form for ease accesses. When it is being
accessed via read write functions an internal data is converted back into the
original format as the specification.

Limitations:
Level triggering only
GIC CounterHi not implemented (Countbits = 32bits)
DINT not implemented
Local WatchDog, Fast Debug Channel, Perf Counter not implemented

Signed-off-by: Yongbok Kim 
---
 hw/intc/Makefile.objs  |1 +
 hw/intc/mips_gic.c |  459 
 include/hw/intc/mips_gic.h |  215 +
 3 files changed, 675 insertions(+), 0 deletions(-)
 create mode 100644 hw/intc/mips_gic.c
 create mode 100644 include/hw/intc/mips_gic.h

diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
index 0e47f0f..fa95f77 100644
--- a/hw/intc/Makefile.objs
+++ b/hw/intc/Makefile.objs
@@ -32,3 +32,4 @@ obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
 obj-$(CONFIG_S390_FLIC) += s390_flic.o
 obj-$(CONFIG_S390_FLIC_KVM) += s390_flic_kvm.o
 obj-$(CONFIG_ASPEED_SOC) += aspeed_vic.o
+obj-$(CONFIG_MIPS_CPS) += mips_gic.o
diff --git a/hw/intc/mips_gic.c b/hw/intc/mips_gic.c
new file mode 100644
index 000..322ee94
--- /dev/null
+++ b/hw/intc/mips_gic.c
@@ -0,0 +1,459 @@
+/*
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+ *
+ * Copyright (C) 2012  MIPS Technologies, Inc.  All rights reserved.
+ * Authors: Sanjay Lal 
+ *
+ * Copyright (C) 2016 Imagination Technologies
+ */
+
+#include "qemu/osdep.h"
+#include "hw/hw.h"
+#include "hw/sysbus.h"
+#include "exec/memory.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/kvm.h"
+#include "kvm_mips.h"
+#include "hw/intc/mips_gic.h"
+
+static void mips_gic_set_vp_irq(MIPSGICState *gic, int vp, int pin, int level)
+{
+int ored_level = level;
+int i;
+
+/* ORing pending registers sharing same pin */
+if (!ored_level) {
+for (i = 0; i < gic->num_irq; i++) {
+if ((gic->irq_state[i].map_pin & GIC_MAP_MSK) == pin &&
+gic->irq_state[i].map_vp == vp &&
+gic->irq_state[i].enabled) {
+ored_level |= gic->irq_state[i].pending;
+}
+if (ored_level) {
+/* no need to iterate all interrupts */
+break;
+}
+}
+if (((gic->vps[vp].compare_map & GIC_MAP_MSK) == pin) &&
+(gic->vps[vp].mask & GIC_VP_MASK_CMP_MSK)) {
+/* ORing with local pending register (count/compare) */
+ored_level |= (gic->vps[vp].pend & GIC_VP_MASK_CMP_MSK) >>
+  GIC_VP_MASK_CMP_SHF;
+}
+}
+if (kvm_enabled())  {
+kvm_mips_set_ipi_interrupt(mips_env_get_cpu(gic->vps[vp].env),
+   pin + GIC_CPU_PIN_OFFSET,
+   ored_level);
+} else {
+qemu_set_irq(gic->vps[vp].env->irq[pin + GIC_CPU_PIN_OFFSET],
+ ored_level);
+}
+}
+
+static void gic_set_irq(void *opaque, int n_IRQ, int level)
+{
+MIPSGICState *gic = (MIPSGICState *) opaque;
+int vp = gic->irq_state[n_IRQ].map_vp;
+int pin = gic->irq_state[n_IRQ].map_pin & GIC_MAP_MSK;
+
+gic->irq_state[n_IRQ].pending = (uint8_t) level;
+if (!gic->irq_state[n_IRQ].enabled) {
+/* GIC interrupt source disabled */
+return;
+}
+if (vp < 0 || vp >= gic->num_vps) {
+return;
+}
+mips_gic_set_vp_irq(gic, vp, pin, level);
+}
+
+#define OFFSET_CHECK(c) \
+do {\
+if (!(c)) { \
+goto bad_offset;\
+}   \
+} while (0)
+
+/* GIC Read VP Local/Other Registers */
+static uint64_t gic_read_vp(MIPSGICState *gic, uint32_t vp_index, hwaddr addr,
+unsigned size)
+{
+switch (addr) {
+case GIC_VP_CTL_OFS:
+return gic->vps[vp_index].ctl;
+case GIC_VP_PEND_OFS:
+mips_gictimer_get_sh_count(gic->gic_timer);
+return gic->vps[vp_index].pend;
+case GIC_VP_MASK_OFS:
+return gic->vps[vp_index].mask;
+case GIC_VP_COMPARE_MAP_OFS:
+return gic->vps[vp_index].compare_map;
+case GIC_VP_OTHER_ADDR_OFS:
+return gic->vps[vp_index].other_addr;
+case GIC_VP_IDENT_OFS:
+return vp_index;
+

[Qemu-devel] [PULL 1/3] block: never cancel a streaming job without running stream_complete()

2016-03-28 Thread Jeff Cody

From: Alberto Garcia 

We need to call stream_complete() in order to do all the necessary
clean-ups, even if there's an early failure. At the moment it's only
useful to make sure that s->backing_file_str is not leaked, but it
will become more important if we introduce support for streaming to
any intermediate node.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Reviewed-by: Fam Zheng 
Reviewed-by: Stefan Hajnoczi 
Message-id: 
2abedf2debc65c250560237f31a8e6756883c8fc.1458566441.git.be...@igalia.com
Signed-off-by: Jeff Cody 
---
 block/stream.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index cafaa07..eea3938 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -89,21 +89,21 @@ static void coroutine_fn stream_run(void *opaque)
 StreamCompleteData *data;
 BlockDriverState *bs = s->common.bs;
 BlockDriverState *base = s->base;
-int64_t sector_num, end;
+int64_t sector_num = 0;
+int64_t end = -1;
 int error = 0;
 int ret = 0;
 int n = 0;
 void *buf;
 
 if (!bs->backing) {
-block_job_completed(&s->common, 0);
-return;
+goto out;
 }
 
 s->common.len = bdrv_getlength(bs);
 if (s->common.len < 0) {
-block_job_completed(&s->common, s->common.len);
-return;
+ret = s->common.len;
+goto out;
 }
 
 end = s->common.len >> BDRV_SECTOR_BITS;
@@ -190,6 +190,7 @@ wait:
 
 qemu_vfree(buf);
 
+out:
 /* Modify backing chain and close BDSes in main loop */
 data = g_malloc(sizeof(*data));
 data->ret = ret;
-- 
1.9.3

[Qemu-devel] [PULL 2/3] qemu-iotests: fix test_stream_partial()

2016-03-28 Thread Jeff Cody

From: Alberto Garcia 

This test is streaming to the top layer using the intermediate image
as the base. This is a mistake since block-stream never copies data
from the base image and its backing chain, so this is effectively a
no-op.

In addition to fixing the base parameter, this patch also writes some
data to the intermediate image before the test, so there's something
to copy and the test is meaningful.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Message-id: 
2efa304da38b32d47c120ce728568a589c5a3afc.1458566441.git.be...@igalia.com
Signed-off-by: Jeff Cody 
---
 tests/qemu-iotests/030 | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index 32469ef..48a924c 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -35,6 +35,7 @@ class TestSingleDrive(iotests.QMPTestCase):
 qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % 
backing_img, mid_img)
 qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % 
mid_img, test_img)
 qemu_io('-f', 'raw', '-c', 'write -P 0x1 0 512', backing_img)
+qemu_io('-f', iotests.imgfmt, '-c', 'write -P 0x1 524288 512', mid_img)
 self.vm = iotests.VM().add_drive("blkdebug::" + test_img)
 self.vm.launch()
 
@@ -93,7 +94,7 @@ class TestSingleDrive(iotests.QMPTestCase):
 def test_stream_partial(self):
 self.assert_no_active_block_jobs()
 
-result = self.vm.qmp('block-stream', device='drive0', base=mid_img)
+result = self.vm.qmp('block-stream', device='drive0', base=backing_img)
 self.assert_qmp(result, 'return', {})
 
 self.wait_until_completed()
-- 
1.9.3

[Qemu-devel] [PULL 0/3] Block patches for 2.6

2016-03-28 Thread Jeff Cody

The following changes since commit b68a80139e37e806f004237e55311ebc42151434:

  Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20160324' into 
staging (2016-03-24 16:24:02 +)

are available in the git repository at:


  g...@github.com:codyprime/qemu-kvm-jtc.git tags/block-pull-request

for you to fetch changes up to 409d54986d47b8279c70591e65ee4f3b1771944a:

  qemu-iotests: add no-op streaming test (2016-03-28 13:56:44 -0400)


Block patches for 2.6


Alberto Garcia (3):
  block: never cancel a streaming job without running stream_complete()
  qemu-iotests: fix test_stream_partial()
  qemu-iotests: add no-op streaming test

 block/stream.c | 11 ++-
 tests/qemu-iotests/030 | 21 -
 tests/qemu-iotests/030.out |  4 ++--
 3 files changed, 28 insertions(+), 8 deletions(-)

-- 
1.9.3

[Qemu-devel] [PULL 3/3] qemu-iotests: add no-op streaming test

2016-03-28 Thread Jeff Cody

From: Alberto Garcia 

This patch tests that in a partial block-stream operation, no data is
ever copied from the base image.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Message-id: 
5272a2aa57bc0b3f981f8b3e0c813e58a88c974b.1458566441.git.be...@igalia.com
Signed-off-by: Jeff Cody 
---
 tests/qemu-iotests/030 | 18 ++
 tests/qemu-iotests/030.out |  4 ++--
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index 48a924c..3ac2443 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -91,6 +91,24 @@ class TestSingleDrive(iotests.QMPTestCase):
  qemu_io('-f', iotests.imgfmt, '-c', 'map', test_img),
  'image file map does not match backing file after 
streaming')
 
+def test_stream_no_op(self):
+self.assert_no_active_block_jobs()
+
+# The image map is empty before the operation
+empty_map = qemu_io('-f', iotests.imgfmt, '-c', 'map', test_img)
+
+# This is a no-op: no data should ever be copied from the base image
+result = self.vm.qmp('block-stream', device='drive0', base=mid_img)
+self.assert_qmp(result, 'return', {})
+
+self.wait_until_completed()
+
+self.assert_no_active_block_jobs()
+self.vm.shutdown()
+
+self.assertEqual(qemu_io('-f', iotests.imgfmt, '-c', 'map', test_img),
+ empty_map, 'image file map changed after a no-op')
+
 def test_stream_partial(self):
 self.assert_no_active_block_jobs()
 
diff --git a/tests/qemu-iotests/030.out b/tests/qemu-iotests/030.out
index fa16b5c..6323079 100644
--- a/tests/qemu-iotests/030.out
+++ b/tests/qemu-iotests/030.out
@@ -1,5 +1,5 @@
-.
+..
 --
-Ran 13 tests
+Ran 14 tests
 
 OK
-- 
1.9.3

[Qemu-devel] [PATCH 3/1] doc: Propose Structured Replies extension

2016-03-28 Thread Eric Blake

The existing transmission phase protocol is difficult to sniff,
because correct interpretation of the server stream requires
context from the client stream (or risks false positives if
data payloads happen to contain the protocol magic numbers).  It
also prohibits the ability to do short reads, or to return a
read error without also sending length bytes of data.

Remedy this by adding a new global/client flag negotiation,
which affects the response of the NBD_CMD_READ command, and sets
forth rules for how future command responses must behave when
they carry a data payload.

Signed-off-by: Eric Blake 
---
 doc/proto.md | 123 +++
 1 file changed, 123 insertions(+)

diff --git a/doc/proto.md b/doc/proto.md
index 44579fc..f687e3e 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -209,6 +209,10 @@ same value for handle as was sent by the client for each 
request that
 the server is replying to, so that the client may correlate which
 request is receiving a response.

+Note that it is impossible to tell by reading just the server traffic
+whether a data field will be present.  To remedy this, the experimental
+`Structured Reply` extension has been introduced; see below.
+
 ## Values

 This section describes the value and meaning of constants (other than
@@ -231,6 +235,8 @@ the first magic number.
 - bit 1, `NBD_FLAG_NO_ZEROES`; if set, and if the client replies with
   `NBD_FLAG_C_NO_ZEROES` in the client flags field, the server MUST NOT
   send the 124 bytes of zero at the end of the negotiation.
+- bit 2, `NBD_FLAG_STRUCTURED_REPLY`; defined by the experimental
+  `Structured Reply` extension; see below.

 The server MUST NOT set any other flags, and SHOULD NOT change behaviour
 unless the client responds with a corresponding flag.  The server MUST
@@ -265,6 +271,8 @@ receiving the global flags from the server.
 - bit 1, `NBD_FLAG_C_NO_ZEROES`; MUST NOT be set if the server did not
   set `NBD_FLAG_NO_ZEROES`. If set, the server MUST NOT send the 124
   bytes of zeroes at the end of the negotiation.
+- bit 2, `NBD_FLAG_C_STRUCTURED_REPLY`; defined by the experimental
+  `Structured Reply` extension; see below.

 Clients SHOULD NOT set any other flags; the server MUST drop the
 connection if the client sets an unknown flag, or a flag that does
@@ -435,6 +443,10 @@ The following request types exist:
 signalling no error), the server MUST immediately close the
 connection; it MUST NOT send any further data to the client.

+The experimental `Structured Reply` extension changes the set of
+valid replies, in part to allow recovery after a partial read; see
+below.
+
 * `NBD_CMD_WRITE` (1)

 A write request. Length and offset define the location and amount of
@@ -609,6 +621,117 @@ option reply type.
   message if they do not also send it as a reply to the
   `NBD_OPT_SELECT` message.

+### `Structured Reply` extension
+
+Some major downsides of the default reply to `NBD_CMD_READ` is that it
+is not possible to support partial reads (the command must succeed or
+fail as a whole, and len bytes of data must be sent even on an error,
+unless the connection is closed); nor is it possible to decode the
+server traffic without also knowing what pending read requests were
+sent by the client.
+
+To remedy this, a `Structured Reply` extension is envisioned. This
+extension adds a global flag, a client flag, a new reply type during
+the transmission phase, and alters the set of valid replies to an
+existing command.
+
+* `NBD_FLAG_STRUCTURED_REPLY` (bit 2)
+
+  This is a global flag; if set, and if the client replies with
+  `NBD_FLAG_C_STRUCTURED_REPLY` in the client flags field, the server
+  MUST use structured replies to the `NBD_CMD_READ` command.
+
+* `NBD_FLAG_C_STRUCTURED_REPLY` (bit 2)
+
+  This is a client flag; MUST NOT be set if the server did not set
+  `NBD_FLAG_STRUCTURED_REPLY`. If set, the server must use structured
+  replies to the `NBD_CMD_READ` command.
+
+* Transmission phase
+
+  The transmission phase includes a third message type: the structured
+  reply.  If `NBD_FLAG_C_STRUCTURED_REPLY` was negotiated, then the
+  normal server reply will never contain a data payload, and all
+  server replies that send data will be in the following form:
+
+  S: 32 bits, 0x668e33ef, magic (`NBD_STRUCTURED_REPLY_MAGIC`)  
+  S: 32 bits, error  
+  S: 64 bits, handle  
+  S: 32 bits, length of payload (unsigned)  
+  S: *length* bytes of payload data
+
+  If `NBD_FLAG_C_STRUCTURED_REPLY` was not negotiated, then the normal
+  server reply with a data payload will be used for `NBD_CMD_READ`;
+  but any other replies with a data payload will still use a
+  structured reply (that is, only `NBD_CMD_READ` is allowed to send
+  data in the non-structured form, and negotiating
+  `NBD_FLAG_C_STRUCTURED_REPLY` only affects replies to
+  `NBD_CMD_READ`).  This implies that any new commands that require
+  data in the reply should be gated by

[Qemu-devel] [PATCH 2/1] doc: More details on flag negotiation

2016-03-28 Thread Eric Blake

Add documentation that makes it clear that the server may add
flags that the client does not recognize, but that the client
may ignore those flags because the server will not change
behavior without agreement; meanwhile, the client must not set
flags the server does not recognize (since there is no second
round of server reply, the only sane course of action is for
the server to disconnect).

Also, give a forward reference to the effect of negotiating
NO_ZEROES on the server's reply to NBD_OPT_EXPORT_NAME, and
call out the fact that none of the server's global flags should
be used during oldstyle negotiation since a client has no chance
to respond with the corresponding client flag.

Signed-off-by: Eric Blake 
---
 doc/proto.md | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/doc/proto.md b/doc/proto.md
index d0102e0..44579fc 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -103,8 +103,10 @@ C: 32 bits, flags

 This completes the initial phase of negotiation; the client and server
 now both know they understand the first version of the newstyle
-handshake, with no options. What follows is a repeating group of
-options. In non-fixed newstyle only one option can be set
+handshake, with no options. The client SHOULD ignore any global flags
+it does not recognize, while the server MUST close the connection if
+it does not recognize the client's flags.  What follows is a repeating
+group of options. In non-fixed newstyle only one option can be set
 (`NBD_OPT_EXPORT_NAME`), and it is not optional.

 At this point, we move on to option haggling, during which point the
@@ -126,7 +128,8 @@ about the used export:

 S: 64 bits, size of the export in bytes (unsigned)  
 S: 16 bits, export flags  
-S: 124 bytes, zeroes (reserved)
+S: 124 bytes, zeroes (reserved) (unless `NBD_FLAG_C_NO_ZEROES` was
+   negotiated by the client)

 If the server is unwilling to allow the export, it should close the
 connection.
@@ -229,6 +232,10 @@ the first magic number.
   `NBD_FLAG_C_NO_ZEROES` in the client flags field, the server MUST NOT
   send the 124 bytes of zero at the end of the negotiation.

+The server MUST NOT set any other flags, and SHOULD NOT change behaviour
+unless the client responds with a corresponding flag.  The server MUST
+NOT set any of these flags during oldstyle negotiation.
+
 # Export flags

 This field of 16 bits is sent by the server after option haggling, or
@@ -259,6 +266,10 @@ receiving the global flags from the server.
   set `NBD_FLAG_NO_ZEROES`. If set, the server MUST NOT send the 124
   bytes of zeroes at the end of the negotiation.

+Clients SHOULD NOT set any other flags; the server MUST drop the
+connection if the client sets an unknown flag, or a flag that does
+not match something advertised by the server.
+
  Option types

 These values are used in the "option" field during the option haggling
-- 
2.5.5

Re: [Qemu-devel] [PATCH] spapr: compute interrupt vector address from LPCR

2016-03-28 Thread David Gibson

On Thu, Mar 24, 2016 at 04:28:53PM +0100, Cédric Le Goater wrote:
> This address is changed by the linux kernel using the H_SET_MODE hcall
> and needs to be migrated in order to restart a spapr VM running in
> TCG. This can be done using the AIL bits from the LPCR register.
> 
> The patch introduces a spapr_h_set_mode_resource_addr() helper to
> share some code with the H_SET_MODE hcall.
> 
> Signed-off-by: Cédric Le Goater 
> ---
>  hw/ppc/spapr.c |   21 +
>  hw/ppc/spapr_hcall.c   |   13 ++---
>  include/hw/ppc/spapr.h |   14 ++
>  3 files changed, 37 insertions(+), 11 deletions(-)
> 
> Index: qemu-dgibson-for-2.6.git/hw/ppc/spapr.c
> ===
> --- qemu-dgibson-for-2.6.git.orig/hw/ppc/spapr.c
> +++ qemu-dgibson-for-2.6.git/hw/ppc/spapr.c
> @@ -1244,6 +1244,24 @@ static bool spapr_vga_init(PCIBus *pci_b
>  }
>  }
>  
> +static int load_excp_prefix(void)
> +{
> +CPUState *cs;
> +
> +CPU_FOREACH(cs) {
> +CPUPPCState *env = &POWERPC_CPU(cs)->env;
> +int ail = (env->spr[SPR_LPCR] & LPCR_AIL) >> LPCR_AIL_SHIFT;
> +
> +env->excp_prefix = spapr_h_set_mode_resource_addr(ail);
> +if (env->excp_prefix == H_UNSUPPORTED_FLAG) {
> +error_report("LPCR has an invalid AIL value");
> +return -EINVAL;
> +}
> +}
> +
> +return 0;
> +}
> +
>  static int spapr_post_load(void *opaque, int version_id)
>  {
>  sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
> @@ -1257,6 +1275,9 @@ static int spapr_post_load(void *opaque,
>  err = spapr_rtc_import_offset(spapr->rtc, spapr->rtc_offset);
>  }
>  
> +if (!err) {
> +err = load_excp_prefix();
> +}
>  return err;
>  }

As Greg says, it seems like this would make more sense in
cpu_post_load().


> Index: qemu-dgibson-for-2.6.git/include/hw/ppc/spapr.h
> ===
> --- qemu-dgibson-for-2.6.git.orig/include/hw/ppc/spapr.h
> +++ qemu-dgibson-for-2.6.git/include/hw/ppc/spapr.h
> @@ -561,6 +561,20 @@ struct sPAPREventLogEntry {
>  QTAILQ_ENTRY(sPAPREventLogEntry) next;
>  };
>  
> +static inline target_ulong spapr_h_set_mode_resource_addr(target_ulong 
> mflags)
> +{
> +switch (mflags) {
> +case H_SET_MODE_ADDR_TRANS_NONE:
> +return 0;
> +case H_SET_MODE_ADDR_TRANS_0001_8000:
> +return 0x18000;
> +case H_SET_MODE_ADDR_TRANS_C000___4000:
> +return 0xC0004000ULL;
> +default:
> +return H_UNSUPPORTED_FLAG;
> +}
> +}

I'd like to see a different name for this function, and to move it
into target-ppc, since I imagine we'll want to re-use it for mtlpcr
(and/or mtmsr) once we do the powernv machine type.

>  void spapr_events_init(sPAPRMachineState *sm);
>  void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq);
>  int spapr_h_cas_compose_response(sPAPRMachineState *sm,
> Index: qemu-dgibson-for-2.6.git/hw/ppc/spapr_hcall.c
> ===
> --- qemu-dgibson-for-2.6.git.orig/hw/ppc/spapr_hcall.c
> +++ qemu-dgibson-for-2.6.git/hw/ppc/spapr_hcall.c
> @@ -835,17 +835,8 @@ static target_ulong h_set_mode_resource_
>  return H_P4;
>  }
>  
> -switch (mflags) {
> -case H_SET_MODE_ADDR_TRANS_NONE:
> -prefix = 0;
> -break;
> -case H_SET_MODE_ADDR_TRANS_0001_8000:
> -prefix = 0x18000;
> -break;
> -case H_SET_MODE_ADDR_TRANS_C000___4000:
> -prefix = 0xC0004000ULL;
> -break;
> -default:
> +prefix = spapr_h_set_mode_resource_addr(mflags);
> +if (prefix == H_UNSUPPORTED_FLAG) {
>  return H_UNSUPPORTED_FLAG;
>  }
>  
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [RFC PATCH v2 2/2] spapr: Memory hot-unplug support

2016-03-28 Thread David Gibson

On Thu, Mar 24, 2016 at 09:15:58AM -0500, Nathan Fontenot wrote:
> On 03/22/2016 10:22 PM, David Gibson wrote:
> > On Wed, Mar 16, 2016 at 10:11:54AM +0530, Bharata B Rao wrote:
> >> On Wed, Mar 16, 2016 at 12:36:05PM +1100, David Gibson wrote:
> >>> On Tue, Mar 15, 2016 at 10:08:56AM +0530, Bharata B Rao wrote:
>  Add support to hot remove pc-dimm memory devices.
> 
>  Signed-off-by: Bharata B Rao 
> >>>
> >>> Reviewed-by: David Gibson 
> >>>
> >>> Looks correct, but again, needs to wait on the PAPR change.
> >>>
> >>> Have you thought any further on the idea of sending an index message,
> >>> then a count message as an interim approach to fixing this without
> >>> requiring a PAPR change?
> >>
> >> Removal by index and removal by count are valid messages by themselves
> >> and drmgr would go ahead and start the removal in reponse to those
> >> calls. IIUC, you are suggesting that lets remove one LMB by index in
> >> response to 1st message and remove (count -1) LMBs from where the last
> >> removal was done in the previous message.
> > 
> > Yes, that's the idea.
> > 
> >> Since the same code base of powerpc-utils works on PowerVM too, I am not
> >> sure if such an approach would impact PowerVM in any undesirable manner.
> >> May be Nathan can clarify ?
> 
> The issue I see with this approach is that there is no way in the current
> drmgr code to correlate the two memory remove requests. If I understand
> what you are asking to do correctly, this would result in two separate
> invocations of drmgr. The first to remove a specific LMB by index, this
> index then needing to be saved somewhere, then a second invocation that
> would retrieve the index and remove count-1 LMBs.

Ah.. yes.. I had forgotten that this would be two separate drmgr
invocations, and therefore we'd need a way to carry data between
them.  That does complicate this rather.

> Would there be anything tying these two requests together? or would we
> assume that two requests received in this order are correlated?

My assumption was that it would be based simply on order.

> What happens if another request comes in in between these two requests?
> I see this as being a pretty rare possibility, but it is a possibility.

I'm not sure it actually is possible under KVM - I think the qemu side
processes the requests synchronously.  I'm not 100% certain about that
though.

 The plan was that the qemu HV would not permit LMBs to be removed if
they're not the ones that are supposed to be removed, and so drmgr
would keep scanning until it finds the right ones.

So, even if the request order is jumbled, the behaviour should be
still technically correct - it could be *very* slow though as drmgr
might end up vacating (piece by piece) large areas of the guest's RAM
while it scans for the right LMBs to remove.

> > Heard anything from Nathan?  I don't really see how it would be bad
> > under PowerVM.  Under PowerVM it generally doesn't matter which LMBs
> > you remove, right?  So removing the ones immediately after the last
> > one you removed should be as good a choice as any.
> 
> This shouldn't hurt anything for PowerVM systems. In general the only
> time a specific LMB is specified for PowerVM systems is on memory guard
> operations.

Ok.

> >> I see that this can be done, but the changes in drmgr code specially the
> >> code related to LMB list handling/removal can be non-trivial. So not sure
> >> if the temporary approach is all that worth here and hence I feel it is 
> >> better
> >> to wait and do it the count-indexed way.
> > 
> > Really?  drmgr is already scanning LMBs to find ones it can remove.
> > Seeding that scan with the last removed LMB shouldn't be too hard.
> 
> This shouldn't be difficult to implement in the drmgr code. We already
> search a list of LMBs to find ones to remove, updating to just return
> the LMB with the next sequential index shouldn't be difficult.
> 
> -Nathan
> 
> > 
> >> While we are here, I would also like to get some opinion on the real
> >> need for memory unplug. Is there anything that memory unplug gives us
> >> which memory ballooning (shrinking mem via ballooning) can't give ?
> > 
> > Hmm.. that's an interesting question.  
> > 
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [Xen-devel] [PATCH v2 0/2] usb, xen: add pvUSB backend

2016-03-28 Thread Juergen Gross

On 18/03/16 15:47, Juergen Gross wrote:
> On 18/03/16 13:52, Gerd Hoffmann wrote:
>> On Do, 2016-03-10 at 16:19 +0100, Juergen Gross wrote:
>>> This series adds a Xen pvUSB backend driver to qemu. USB devices
>>> connected to the host can be passed through to a Xen guest. The
>>> devices are specified via Xenstore. Access to the devices is done
>>> via host-libusb.c
>>
>>> I've tested the backend with various USB devices (memory sticks,
>>> keyboard, ...).
>>
>> Patches look sane to me.
>>
>> Have you tested both virtual and physical devices?  Given how it is
>> written devices such as the virtual usb tablet should work just fine
>> too.
> 
> I tested with physical devices only.
> 
> TBH, I don't think a virtual device would work, given how the to be used
> device is selected (driver="usb-host", hostbus, hostport).
> 
>> I can take that through the usb queue, but I'd like to see someone from
>> xen have a look at this too.  Reviews anyone?

Stefano, could you please have a look?


Juergen

> 
> Awesome, thanks for the thumbs up!
> 
> 
> Juergen
> 
> ___
> Xen-devel mailing list
> xen-de...@lists.xen.org
> http://lists.xen.org/xen-devel
>

Re: [Qemu-devel] [PATCH qemu v14 18/18] spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW)

2016-03-28 Thread David Gibson

On Thu, Mar 24, 2016 at 01:32:48PM +1100, Alexey Kardashevskiy wrote:
> On 03/23/2016 05:11 PM, David Gibson wrote:
> >On Wed, Mar 23, 2016 at 02:28:01PM +1100, Alexey Kardashevskiy wrote:
> >>On 03/23/2016 01:13 PM, David Gibson wrote:
> >>>On Mon, Mar 21, 2016 at 06:47:06PM +1100, Alexey Kardashevskiy wrote:
> This adds support for Dynamic DMA Windows (DDW) option defined by
> the SPAPR specification which allows to have additional DMA window(s)
> 
> This implements DDW for emulated and VFIO devices.
> This reserves RTAS token numbers for DDW calls.
> 
> This changes the TCE table migration descriptor to support dynamic
> tables as from now on, PHB will create as many stub TCE table objects
> as PHB can possibly support but not all of them might be initialized at
> the time of migration because DDW might or might not be requested by
> the guest.
> 
> The "ddw" property is enabled by default on a PHB but for compatibility
> the pseries-2.5 machine and older disable it.
> 
> This implements DDW for VFIO. The host kernel support is required.
> This adds a "levels" property to PHB to control the number of levels
> in the actual TCE table allocated by the host kernel, 0 is the default
> value to tell QEMU to calculate the correct value. Current hardware
> supports up to 5 levels.
> 
> The existing linux guests try creating one additional huge DMA window
> with 64K or 16MB pages and map the entire guest RAM to. If succeeded,
> the guest switches to dma_direct_ops and never calls TCE hypercalls
> (H_PUT_TCE,...) again. This enables VFIO devices to use the entire RAM
> and not waste time on map/unmap later. This adds a "dma64_win_addr"
> property which is a bus address for the 64bit window and by default
> set to 0x800... as this is what the modern POWER8 hardware
> uses and this allows having emulated and VFIO devices on the same bus.
> 
> This adds 4 RTAS handlers:
> * ibm,query-pe-dma-window
> * ibm,create-pe-dma-window
> * ibm,remove-pe-dma-window
> * ibm,reset-pe-dma-window
> These are registered from type_init() callback.
> 
> These RTAS handlers are implemented in a separate file to avoid polluting
> spapr_iommu.c with PCI.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>   hw/ppc/Makefile.objs|   1 +
>   hw/ppc/spapr.c  |   7 +-
>   hw/ppc/spapr_pci.c  |  73 ---
>   hw/ppc/spapr_rtas_ddw.c | 300 
>  
>   hw/vfio/common.c|   5 -
>   include/hw/pci-host/spapr.h |  13 ++
>   include/hw/ppc/spapr.h  |  16 ++-
>   trace-events|   4 +
>   8 files changed, 395 insertions(+), 24 deletions(-)
>   create mode 100644 hw/ppc/spapr_rtas_ddw.c
> 
> diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
> index c1ffc77..986b36f 100644
> --- a/hw/ppc/Makefile.objs
> +++ b/hw/ppc/Makefile.objs
> @@ -7,6 +7,7 @@ obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o 
> spapr_drc.o spapr_rng.o
>   ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
>   obj-y += spapr_pci_vfio.o
>   endif
> +obj-$(CONFIG_PSERIES) += spapr_rtas_ddw.o
>   # PowerPC 4xx boards
>   obj-y += ppc405_boards.o ppc4xx_devs.o ppc405_uc.o ppc440_bamboo.o
>   obj-y += ppc4xx_pci.o
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index d0bb423..ef4c637 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2362,7 +2362,12 @@ DEFINE_SPAPR_MACHINE(2_6, "2.6", true);
>    * pseries-2.5
>    */
>   #define SPAPR_COMPAT_2_5 \
> -HW_COMPAT_2_5
> +HW_COMPAT_2_5 \
> +{\
> +.driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,\
> +.property = "ddw",\
> +.value= stringify(off),\
> +},
> 
>   static void spapr_machine_2_5_instance_options(MachineState *machine)
>   {
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index af99a36..3bb294a 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -803,12 +803,12 @@ static char *spapr_phb_get_loc_code(sPAPRPHBState 
> *sphb, PCIDevice *pdev)
>   return buf;
>   }
> 
> -static void spapr_phb_dma_window_enable(sPAPRPHBState *sphb,
> -   uint32_t liobn,
> -   uint32_t page_shift,
> -   uint64_t window_addr,
> -   uint64_t window_size,
> -   Error **errp)
> +void spapr_phb_dma_window_enable(sPAPRPHBState *sphb,
> + uint32_t liobn,
> + uint32_t page_shift,
> +

Re: [Qemu-devel] [PATCH qemu v14 17/18] vfio/spapr: Use VFIO_SPAPR_TCE_v2_IOMMU

2016-03-28 Thread David Gibson

On Thu, Mar 24, 2016 at 08:10:44PM +1100, Alexey Kardashevskiy wrote:
> On 03/24/2016 11:03 AM, Alexey Kardashevskiy wrote:
> >On 03/23/2016 05:03 PM, David Gibson wrote:
> >>On Wed, Mar 23, 2016 at 02:06:36PM +1100, Alexey Kardashevskiy wrote:
> >>>On 03/23/2016 01:53 PM, David Gibson wrote:
> On Wed, Mar 23, 2016 at 01:12:59PM +1100, Alexey Kardashevskiy wrote:
> >On 03/23/2016 12:08 PM, David Gibson wrote:
> >>On Tue, Mar 22, 2016 at 04:54:07PM +1100, Alexey Kardashevskiy wrote:
> >>>On 03/22/2016 04:14 PM, David Gibson wrote:
> On Mon, Mar 21, 2016 at 06:47:05PM +1100, Alexey Kardashevskiy wrote:
> >New VFIO_SPAPR_TCE_v2_IOMMU type supports dynamic DMA window
> >management.
> >This adds ability to VFIO common code to dynamically allocate/remove
> >DMA windows in the host kernel when new VFIO container is
> >added/removed.
> >
> >This adds VFIO_IOMMU_SPAPR_TCE_CREATE ioctl to
> >vfio_listener_region_add
> >and adds just created IOMMU into the host IOMMU list; the opposite
> >action is taken in vfio_listener_region_del.
> >
> >When creating a new window, this uses euristic to decide on the
> >TCE table
> >levels number.
> >
> >This should cause no guest visible change in behavior.
> >
> >Signed-off-by: Alexey Kardashevskiy 
> >---
> >Changes:
> >v14:
> >* new to the series
> >
> >---
> >TODO:
> >* export levels to PHB
> >---
> >  hw/vfio/common.c | 108
> >---
> >  trace-events |   2 ++
> >  2 files changed, 105 insertions(+), 5 deletions(-)
> >
> >diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> >index 4e873b7..421d6eb 100644
> >--- a/hw/vfio/common.c
> >+++ b/hw/vfio/common.c
> >@@ -279,6 +279,14 @@ static int vfio_host_iommu_add(VFIOContainer
> >*container,
> >  return 0;
> >  }
> >
> >+static void vfio_host_iommu_del(VFIOContainer *container, hwaddr
> >min_iova)
> >+{
> >+VFIOHostIOMMU *hiommu = vfio_host_iommu_lookup(container,
> >min_iova, 0x1000);
> 
> The hard-coded 0x1000 looks dubious..
> >>>
> >>>Well, that's the minimal page size...
> >>
> >>Really?  Some BookE CPUs support 1KiB page size..
> >
> >Hm. For IOMMU? Ok. s/0x1000/1/ should do then :)
> 
> Uh.. actually I don't think those CPUs generally had an IOMMU.  But if
> it's been done for CPU MMU I wouldn't count on it not being done for
> IOMMU.
> 
> 1 is a safer choice.
> 
> >
> >
> >>
> >+g_assert(hiommu);
> >+QLIST_REMOVE(hiommu, hiommu_next);
> >+}
> >+
> >  static bool vfio_listener_skipped_section(MemoryRegionSection
> >*section)
> >  {
> >  return (!memory_region_is_ram(section->mr) &&
> >@@ -392,6 +400,61 @@ static void
> >vfio_listener_region_add(MemoryListener *listener,
> >  }
> >  end = int128_get64(llend);
> >
> >+if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
> 
> I think this would be clearer split out into a helper function,
> vfio_create_host_window() or something.
> >>>
> >>>
> >>>It is rather vfio_spapr_create_host_window() and we were avoiding
> >>>xxx_spapr_xxx so far. I'd cut-n-paste the SPAPR PCI AS listener to a
> >>>separate file but this usually triggers more discussion and never
> >>>ends well.
> >>>
> >>>
> >>>
> >+unsigned entries, pages;
> >+struct vfio_iommu_spapr_tce_create create = { .argsz =
> >sizeof(create) };
> >+
> >+g_assert(section->mr->iommu_ops);
> >+g_assert(memory_region_is_iommu(section->mr));
> 
> I don't think you need these asserts.  AFAICT the same logic should
> work if a RAM MR was added directly to PCI address space - this would
> create the new host window, then the existing code for adding a RAM MR
> would map that block of RAM statically into the new window.
> >>>
> >>>In what configuration/machine can we do that on SPAPR?
> >>
> >>spapr guests won't ever do that.  But you can run an x86 guest on a
> >>powernv host and this situation could come up.
> >
> >
> >I am pretty sure VFIO won't work in this case anyway.
> 
> I'm not.  There's no fundamental reason VFIO shouldn't work with TCG.
> >>>
> >>>This is not about TCG (pseries TCG guest works with VFIO on powernv host),
> >>>this is about things like VFIO_IOMMU_GET_INFO vs.
> >>>VFIO_IOMMU_SPAPR_TCE_GET_INFO ioctls but yes, fundamentally, it can work.

Re: [Qemu-devel] [PATCH qemu v14 17/18] vfio/spapr: Use VFIO_SPAPR_TCE_v2_IOMMU

2016-03-28 Thread Alexey Kardashevskiy


On 03/29/2016 04:30 PM, David Gibson wrote:

On Thu, Mar 24, 2016 at 08:10:44PM +1100, Alexey Kardashevskiy wrote:

On 03/24/2016 11:03 AM, Alexey Kardashevskiy wrote:

On 03/23/2016 05:03 PM, David Gibson wrote:

On Wed, Mar 23, 2016 at 02:06:36PM +1100, Alexey Kardashevskiy wrote:

On 03/23/2016 01:53 PM, David Gibson wrote:

On Wed, Mar 23, 2016 at 01:12:59PM +1100, Alexey Kardashevskiy wrote:

On 03/23/2016 12:08 PM, David Gibson wrote:

On Tue, Mar 22, 2016 at 04:54:07PM +1100, Alexey Kardashevskiy wrote:

On 03/22/2016 04:14 PM, David Gibson wrote:

On Mon, Mar 21, 2016 at 06:47:05PM +1100, Alexey Kardashevskiy wrote:

New VFIO_SPAPR_TCE_v2_IOMMU type supports dynamic DMA window
management.
This adds ability to VFIO common code to dynamically allocate/remove
DMA windows in the host kernel when new VFIO container is
added/removed.

This adds VFIO_IOMMU_SPAPR_TCE_CREATE ioctl to
vfio_listener_region_add
and adds just created IOMMU into the host IOMMU list; the opposite
action is taken in vfio_listener_region_del.

When creating a new window, this uses euristic to decide on the
TCE table
levels number.

This should cause no guest visible change in behavior.

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v14:
* new to the series

---
TODO:
* export levels to PHB
---
  hw/vfio/common.c | 108
---
  trace-events |   2 ++
  2 files changed, 105 insertions(+), 5 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 4e873b7..421d6eb 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -279,6 +279,14 @@ static int vfio_host_iommu_add(VFIOContainer
*container,
  return 0;
  }

+static void vfio_host_iommu_del(VFIOContainer *container, hwaddr
min_iova)
+{
+VFIOHostIOMMU *hiommu = vfio_host_iommu_lookup(container,
min_iova, 0x1000);


The hard-coded 0x1000 looks dubious..


Well, that's the minimal page size...


Really?  Some BookE CPUs support 1KiB page size..


Hm. For IOMMU? Ok. s/0x1000/1/ should do then :)


Uh.. actually I don't think those CPUs generally had an IOMMU.  But if
it's been done for CPU MMU I wouldn't count on it not being done for
IOMMU.

1 is a safer choice.







+g_assert(hiommu);
+QLIST_REMOVE(hiommu, hiommu_next);
+}
+
  static bool vfio_listener_skipped_section(MemoryRegionSection
*section)
  {
  return (!memory_region_is_ram(section->mr) &&
@@ -392,6 +400,61 @@ static void
vfio_listener_region_add(MemoryListener *listener,
  }
  end = int128_get64(llend);

+if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {


I think this would be clearer split out into a helper function,
vfio_create_host_window() or something.



It is rather vfio_spapr_create_host_window() and we were avoiding
xxx_spapr_xxx so far. I'd cut-n-paste the SPAPR PCI AS listener to a
separate file but this usually triggers more discussion and never
ends well.




+unsigned entries, pages;
+struct vfio_iommu_spapr_tce_create create = { .argsz =
sizeof(create) };
+
+g_assert(section->mr->iommu_ops);
+g_assert(memory_region_is_iommu(section->mr));


I don't think you need these asserts.  AFAICT the same logic should
work if a RAM MR was added directly to PCI address space - this would
create the new host window, then the existing code for adding a RAM MR
would map that block of RAM statically into the new window.


In what configuration/machine can we do that on SPAPR?


spapr guests won't ever do that.  But you can run an x86 guest on a
powernv host and this situation could come up.



I am pretty sure VFIO won't work in this case anyway.


I'm not.  There's no fundamental reason VFIO shouldn't work with TCG.


This is not about TCG (pseries TCG guest works with VFIO on powernv host),
this is about things like VFIO_IOMMU_GET_INFO vs.
VFIO_IOMMU_SPAPR_TCE_GET_INFO ioctls but yes, fundamentally, it can work.

Should I add such support in this patchset?


Unless adding the generality is really complex, and so far I haven't
seen a reason for it to be.


Seriously? :(



So, I tried.

With q35 machine (pc-i440fx-2.6 have even worse memory tree), there are
several RAM blocks - 0..0xc, then pc.rom, then RAM again till 2GB, then
a gap (PCI MMIO?), then PC BIOS at 0xfffc (which is RAM), then after 4GB
the rest of the RAM:

memory-region: system
   - (prio 0, RW): system
 -7fff (prio 0, RW): alias ram-below-4g
@pc.ram -7fff
 - (prio -1, RW): pci
   000c-000d (prio 1, RW): pc.rom
   000e-000f (prio 1, R-): alias isa-bios
@pc.bios 0002-0003
   febe-febe (prio 1, RW): 0003:09:00.0 BAR 0
 febe-febe (prio 0, RW): 0003:09:00.0 BAR 0
mmaps[0]
   febf-febf1fff (prio 1, RW): 0003:09:00.0 BAR 2
 0

Re: [Qemu-devel] [PATCH qemu v14 18/18] spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW)

2016-03-28 Thread Alexey Kardashevskiy


On 03/29/2016 04:22 PM, David Gibson wrote:

On Thu, Mar 24, 2016 at 01:32:48PM +1100, Alexey Kardashevskiy wrote:

On 03/23/2016 05:11 PM, David Gibson wrote:

On Wed, Mar 23, 2016 at 02:28:01PM +1100, Alexey Kardashevskiy wrote:

On 03/23/2016 01:13 PM, David Gibson wrote:

On Mon, Mar 21, 2016 at 06:47:06PM +1100, Alexey Kardashevskiy wrote:

This adds support for Dynamic DMA Windows (DDW) option defined by
the SPAPR specification which allows to have additional DMA window(s)

This implements DDW for emulated and VFIO devices.
This reserves RTAS token numbers for DDW calls.

This changes the TCE table migration descriptor to support dynamic
tables as from now on, PHB will create as many stub TCE table objects
as PHB can possibly support but not all of them might be initialized at
the time of migration because DDW might or might not be requested by
the guest.

The "ddw" property is enabled by default on a PHB but for compatibility
the pseries-2.5 machine and older disable it.

This implements DDW for VFIO. The host kernel support is required.
This adds a "levels" property to PHB to control the number of levels
in the actual TCE table allocated by the host kernel, 0 is the default
value to tell QEMU to calculate the correct value. Current hardware
supports up to 5 levels.

The existing linux guests try creating one additional huge DMA window
with 64K or 16MB pages and map the entire guest RAM to. If succeeded,
the guest switches to dma_direct_ops and never calls TCE hypercalls
(H_PUT_TCE,...) again. This enables VFIO devices to use the entire RAM
and not waste time on map/unmap later. This adds a "dma64_win_addr"
property which is a bus address for the 64bit window and by default
set to 0x800... as this is what the modern POWER8 hardware
uses and this allows having emulated and VFIO devices on the same bus.

This adds 4 RTAS handlers:
* ibm,query-pe-dma-window
* ibm,create-pe-dma-window
* ibm,remove-pe-dma-window
* ibm,reset-pe-dma-window
These are registered from type_init() callback.

These RTAS handlers are implemented in a separate file to avoid polluting
spapr_iommu.c with PCI.

Signed-off-by: Alexey Kardashevskiy 
---
  hw/ppc/Makefile.objs|   1 +
  hw/ppc/spapr.c  |   7 +-
  hw/ppc/spapr_pci.c  |  73 ---
  hw/ppc/spapr_rtas_ddw.c | 300 
  hw/vfio/common.c|   5 -
  include/hw/pci-host/spapr.h |  13 ++
  include/hw/ppc/spapr.h  |  16 ++-
  trace-events|   4 +
  8 files changed, 395 insertions(+), 24 deletions(-)
  create mode 100644 hw/ppc/spapr_rtas_ddw.c

diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index c1ffc77..986b36f 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -7,6 +7,7 @@ obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o 
spapr_rng.o
  ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
  obj-y += spapr_pci_vfio.o
  endif
+obj-$(CONFIG_PSERIES) += spapr_rtas_ddw.o
  # PowerPC 4xx boards
  obj-y += ppc405_boards.o ppc4xx_devs.o ppc405_uc.o ppc440_bamboo.o
  obj-y += ppc4xx_pci.o
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index d0bb423..ef4c637 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2362,7 +2362,12 @@ DEFINE_SPAPR_MACHINE(2_6, "2.6", true);
   * pseries-2.5
   */
  #define SPAPR_COMPAT_2_5 \
-HW_COMPAT_2_5
+HW_COMPAT_2_5 \
+{\
+.driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,\
+.property = "ddw",\
+.value= stringify(off),\
+},

  static void spapr_machine_2_5_instance_options(MachineState *machine)
  {
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index af99a36..3bb294a 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -803,12 +803,12 @@ static char *spapr_phb_get_loc_code(sPAPRPHBState *sphb, 
PCIDevice *pdev)
  return buf;
  }

-static void spapr_phb_dma_window_enable(sPAPRPHBState *sphb,
-   uint32_t liobn,
-   uint32_t page_shift,
-   uint64_t window_addr,
-   uint64_t window_size,
-   Error **errp)
+void spapr_phb_dma_window_enable(sPAPRPHBState *sphb,
+ uint32_t liobn,
+ uint32_t page_shift,
+ uint64_t window_addr,
+ uint64_t window_size,
+ Error **errp)
  {
  sPAPRTCETable *tcet;
  uint32_t nb_table = window_size >> page_shift;
@@ -825,10 +825,16 @@ static void spapr_phb_dma_window_enable(sPAPRPHBState 
*sphb,
  return;
  }

+if (SPAPR_PCI_DMA_WINDOW_NUM(liobn) && !sphb->ddw_enabled) {
+error_setg(errp,
+   "Attempt to use second window when DDW is disabled on PHB");
+return;
+}


This should never happen unless something

Re: [Qemu-devel] [Qemu-ppc] [PATCHv2 2/3] target-ppc: Add helpers for updating a CPU's SDR1 and external HPT

2016-03-28 Thread David Gibson

On Fri, Mar 25, 2016 at 10:13:59AM +0100, Greg Kurz wrote:
> Hi Laurent,
> 
> On Thu, 24 Mar 2016 09:41:59 +0100
> Laurent Vivier  wrote:
> 
> > On 24/03/2016 06:35, David Gibson wrote:
> > > On Tue, Mar 22, 2016 at 05:33:45PM +0100, Laurent Vivier wrote:  
> > >> Hi David,
> > >>
> > >> using kvm-unit-tests, I've found a side effect of your patches: the MSR
> > >> is cleared (and perhaps some others).
> > >>
> > >> I was trying to test my patch on top of QEMU master:
> > >>
> > >> "ppc64: set MSR_SF bit"
> > >> http://patchwork.ozlabs.org/patch/598198/
> > >>
> > >> and it was not working anymore.
> > >>
> > >> By bisecting, I've found this commit.
> > >>
> > >> I think "cpu_synchronize_state()" in "ppc_hash64_set_external_hpt()"
> > >> restores the MSR from KVM whereas the one from QEMU has not been saved,
> > >> because cpu_synchronize_all_post_reset() is called later.
> > >>
> > >> So it is cleared.
> > >>
> > >> You can test this by applying the MSR_SF patch and using the "emulator"
> > >> test of kvm-unit-tests (the "emulator: 64bit" test case)  
> > > 
> > > Ugh, you're right of course.  But, I'm having a bit of trouble
> > > figuring out how to fix it propertly.  
> > 
> > Perhaps you can just remove the cpu_synchronize_state()?
> > 
> > As this is in the reset phase (spapr_cpu_reset()), I think the content
> > of the QEMU side registers are correct, and they will be synchronized at
> > the end of the reset phase.
> > 
> 
> I think this is right because qemu_system_reset() is called either:
> - during system startup:
> 
> cpu_synchronize_all_post_init();   => push regs to KVM, kvm_vcpu_dirty = 
> false
> ...
> qemu_system_reset(VMRESET_SILENT);
> 
> QEMU still have good values for the registers though, since KVM hasn't run 
> yet.
> 
> But ppc_hash64_set_external_hpt()->cpu_synchronize_state() will indeed pull
> the registers from KVM and clear MSR_SF, since we have kvm_vcpu_dirty == 
> false.
> 
> - or from main_loop_should_exit() and we have:
> 
> cpu_synchronize_all_states();
> qemu_system_reset(VMRESET_REPORT);
> and
> cpu_synchronize_all_states();
> qemu_system_reset(VMRESET_SILENT);
> 
> In which case ppc_hash64_set_external_hpt()->cpu_synchronize_state() isn't
> needed.
> 
> Makes sense ?

Ugh.  So, I think this is the simplest short term fix, and I've
applied a change to ppc-for-2.6 removing the cpu_synchronize_state().


But, longer term I don't think this is really right.  Removing the
cpu_synchronize_state() corrects using this in the reset path, but
breaks it if it is used outside the reset path.  At the moment we
don't do that, but I have code which will do so in my HPT resizing
branch.  Obviously I can fix that by putting cpu_synchronize_state()
in the caller, but it still seems a bit odd having a put_sregs().

Really, it seems to me that kvm_vcpu_dirty should be set to true
(either via cpu_synchronize_state() or directly) *before* the core CPU
reset path, in the same way that it is set to true in kvm_init_vcpu().

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH qemu v14 17/18] vfio/spapr: Use VFIO_SPAPR_TCE_v2_IOMMU

2016-03-28 Thread David Gibson

On Tue, Mar 29, 2016 at 04:44:04PM +1100, Alexey Kardashevskiy wrote:
> On 03/29/2016 04:30 PM, David Gibson wrote:
> >On Thu, Mar 24, 2016 at 08:10:44PM +1100, Alexey Kardashevskiy wrote:
> >>On 03/24/2016 11:03 AM, Alexey Kardashevskiy wrote:
> >>>On 03/23/2016 05:03 PM, David Gibson wrote:
> On Wed, Mar 23, 2016 at 02:06:36PM +1100, Alexey Kardashevskiy wrote:
> >On 03/23/2016 01:53 PM, David Gibson wrote:
> >>On Wed, Mar 23, 2016 at 01:12:59PM +1100, Alexey Kardashevskiy wrote:
> >>>On 03/23/2016 12:08 PM, David Gibson wrote:
> On Tue, Mar 22, 2016 at 04:54:07PM +1100, Alexey Kardashevskiy wrote:
> >On 03/22/2016 04:14 PM, David Gibson wrote:
> >>On Mon, Mar 21, 2016 at 06:47:05PM +1100, Alexey Kardashevskiy 
> >>wrote:
> >>>New VFIO_SPAPR_TCE_v2_IOMMU type supports dynamic DMA window
> >>>management.
> >>>This adds ability to VFIO common code to dynamically 
> >>>allocate/remove
> >>>DMA windows in the host kernel when new VFIO container is
> >>>added/removed.
> >>>
> >>>This adds VFIO_IOMMU_SPAPR_TCE_CREATE ioctl to
> >>>vfio_listener_region_add
> >>>and adds just created IOMMU into the host IOMMU list; the opposite
> >>>action is taken in vfio_listener_region_del.
> >>>
> >>>When creating a new window, this uses euristic to decide on the
> >>>TCE table
> >>>levels number.
> >>>
> >>>This should cause no guest visible change in behavior.
> >>>
> >>>Signed-off-by: Alexey Kardashevskiy 
> >>>---
> >>>Changes:
> >>>v14:
> >>>* new to the series
> >>>
> >>>---
> >>>TODO:
> >>>* export levels to PHB
> >>>---
> >>>  hw/vfio/common.c | 108
> >>>---
> >>>  trace-events |   2 ++
> >>>  2 files changed, 105 insertions(+), 5 deletions(-)
> >>>
> >>>diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> >>>index 4e873b7..421d6eb 100644
> >>>--- a/hw/vfio/common.c
> >>>+++ b/hw/vfio/common.c
> >>>@@ -279,6 +279,14 @@ static int vfio_host_iommu_add(VFIOContainer
> >>>*container,
> >>>  return 0;
> >>>  }
> >>>
> >>>+static void vfio_host_iommu_del(VFIOContainer *container, hwaddr
> >>>min_iova)
> >>>+{
> >>>+VFIOHostIOMMU *hiommu = vfio_host_iommu_lookup(container,
> >>>min_iova, 0x1000);
> >>
> >>The hard-coded 0x1000 looks dubious..
> >
> >Well, that's the minimal page size...
> 
> Really?  Some BookE CPUs support 1KiB page size..
> >>>
> >>>Hm. For IOMMU? Ok. s/0x1000/1/ should do then :)
> >>
> >>Uh.. actually I don't think those CPUs generally had an IOMMU.  But if
> >>it's been done for CPU MMU I wouldn't count on it not being done for
> >>IOMMU.
> >>
> >>1 is a safer choice.
> >>
> >>>
> >>>
> 
> >>>+g_assert(hiommu);
> >>>+QLIST_REMOVE(hiommu, hiommu_next);
> >>>+}
> >>>+
> >>>  static bool vfio_listener_skipped_section(MemoryRegionSection
> >>>*section)
> >>>  {
> >>>  return (!memory_region_is_ram(section->mr) &&
> >>>@@ -392,6 +400,61 @@ static void
> >>>vfio_listener_region_add(MemoryListener *listener,
> >>>  }
> >>>  end = int128_get64(llend);
> >>>
> >>>+if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
> >>
> >>I think this would be clearer split out into a helper function,
> >>vfio_create_host_window() or something.
> >
> >
> >It is rather vfio_spapr_create_host_window() and we were avoiding
> >xxx_spapr_xxx so far. I'd cut-n-paste the SPAPR PCI AS listener to a
> >separate file but this usually triggers more discussion and never
> >ends well.
> >
> >
> >
> >>>+unsigned entries, pages;
> >>>+struct vfio_iommu_spapr_tce_create create = { .argsz =
> >>>sizeof(create) };
> >>>+
> >>>+g_assert(section->mr->iommu_ops);
> >>>+g_assert(memory_region_is_iommu(section->mr));
> >>
> >>I don't think you need these asserts.  AFAICT the same logic should
> >>work if a RAM MR was added directly to PCI address space - this 
> >>would
> >>create the new host window, then the existing code for adding a RAM 
> >>MR
> >>would map that block of RAM statically into the new window.
> >
> >In what configuration/machine can we do that on SPAPR?
> 
> spapr guests won't ever do that.  But you can run an x86 guest on a
> powernv host and this si

91 matches

Mail list logo