[Qemu-devel] [PATCH] ppc/pnv: add dummy XSCOM registers for PRD initialization

2019-05-27 Thread Cédric Le Goater
PRD (Processor recovery diagnostics) is a service available on
OpenPower systems. The opal-prd daemon initializes the PowerPC
Processor through the XSCOM bus and then waits for hardware diagnostic
events.

Signed-off-by: Cédric Le Goater 
---
 hw/ppc/pnv_xscom.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/hw/ppc/pnv_xscom.c b/hw/ppc/pnv_xscom.c
index c285ef514e88..f53a6d7a9457 100644
--- a/hw/ppc/pnv_xscom.c
+++ b/hw/ppc/pnv_xscom.c
@@ -29,6 +29,12 @@
 
 #include 
 
+/* PRD registers */
+#define PRD_P8_IPOLL_REG_MASK   0x01020013
+#define PRD_P8_IPOLL_REG_STATUS 0x01020014
+#define PRD_P9_IPOLL_REG_MASK   0x000F0033
+#define PRD_P9_IPOLL_REG_STATUS 0x000F0034
+
 static void xscom_complete(CPUState *cs, uint64_t hmer_bits)
 {
 /*
@@ -70,6 +76,12 @@ static uint64_t xscom_read_default(PnvChip *chip, uint32_t pcba)
 case 0x1010c00: /* PIBAM FIR */
 case 0x1010c03: /* PIBAM FIR MASK */
 
+/* PRD registers */
+case PRD_P8_IPOLL_REG_MASK:
+case PRD_P8_IPOLL_REG_STATUS:
+case PRD_P9_IPOLL_REG_MASK:
+case PRD_P9_IPOLL_REG_STATUS:
+
 /* P9 xscom reset */
 case 0x0090018: /* Receive status reg */
 case 0x0090012: /* log register */
@@ -124,6 +136,12 @@ static bool xscom_write_default(PnvChip *chip, uint32_t pcba, uint64_t val)
 case 0x201302a: /* CAPP stuff */
 case 0x2013801: /* CAPP stuff */
 case 0x2013802: /* CAPP stuff */
+
+/* PRD registers */
+case PRD_P8_IPOLL_REG_MASK:
+case PRD_P8_IPOLL_REG_STATUS:
+case PRD_P9_IPOLL_REG_MASK:
+case PRD_P9_IPOLL_REG_STATUS:
 return true;
 default:
 return false;
-- 
2.21.0
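
To make the effect of the patch above easier to see, here is a small Python model of the two default handlers. It is only a sketch: the register addresses are copied from the patch, but the return value of a read (0) is an assumption, since the quoted hunk shows only the case labels and not what the read path returns.

```python
# Register addresses copied from the patch; everything else is illustrative.
PRD_P8_IPOLL_REG_MASK = 0x01020013
PRD_P8_IPOLL_REG_STATUS = 0x01020014
PRD_P9_IPOLL_REG_MASK = 0x000F0033
PRD_P9_IPOLL_REG_STATUS = 0x000F0034

DUMMY_REGS = frozenset({
    PRD_P8_IPOLL_REG_MASK,
    PRD_P8_IPOLL_REG_STATUS,
    PRD_P9_IPOLL_REG_MASK,
    PRD_P9_IPOLL_REG_STATUS,
})


def xscom_read_default(pcba):
    """Return a dummy value for a known register, None if unhandled.

    Assumption: a dummy register reads as 0.
    """
    return 0 if pcba in DUMMY_REGS else None


def xscom_write_default(pcba, val):
    """Accept and discard writes to known dummy registers.

    Mirrors the 'return true' / 'return false' split in the patch.
    """
    return pcba in DUMMY_REGS
```

With such dummies in place, a guest service like opal-prd can poke these registers during initialization without the access being reported as unimplemented.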




[Qemu-devel] [PATCH] ppc/pnv: introduce new skiboot platform properties

2019-05-27 Thread Cédric Le Goater
Newer skiboots (after 6.3) support QEMU platforms that have
characteristics closer to real OpenPOWER systems. The CPU type is used
to define the BMC drivers: Aspeed AST2400 for POWER8 processors and
AST2500 for POWER9s.

Advertise the new platform property names, "qemu,powernv8" and
"qemu,powernv9", using the CPU type chosen for the QEMU PowerNV
machine. Also, advertise the original platform name "qemu,powernv" in
case of POWER8 processors for compatibility with older skiboots.

Signed-off-by: Cédric Le Goater 
---
 hw/ppc/pnv.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index dfb4ea5742c1..1f22cbf833a8 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -450,7 +450,8 @@ static void pnv_dt_power_mgt(void *fdt)
 
 static void *pnv_dt_create(MachineState *machine)
 {
-const char plat_compat[] = "qemu,powernv\0ibm,powernv";
+const char plat_compat8[] = "qemu,powernv8\0qemu,powernv\0ibm,powernv";
+const char plat_compat9[] = "qemu,powernv9\0ibm,powernv";
 PnvMachineState *pnv = PNV_MACHINE(machine);
 void *fdt;
 char *buf;
@@ -465,8 +466,14 @@ static void *pnv_dt_create(MachineState *machine)
 _FDT((fdt_setprop_cell(fdt, 0, "#size-cells", 0x2)));
 _FDT((fdt_setprop_string(fdt, 0, "model",
  "IBM PowerNV (emulated by qemu)")));
-_FDT((fdt_setprop(fdt, 0, "compatible", plat_compat,
-  sizeof(plat_compat))));
+if (pnv_is_power9(pnv)) {
+_FDT((fdt_setprop(fdt, 0, "compatible", plat_compat9,
+  sizeof(plat_compat9))));
+} else {
+_FDT((fdt_setprop(fdt, 0, "compatible", plat_compat8,
+  sizeof(plat_compat8))));
+}
+
 
 buf =  qemu_uuid_unparse_strdup(&qemu_uuid);
 _FDT((fdt_setprop_string(fdt, 0, "vm,uuid", buf)));
-- 
2.21.0
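
The "compatible" property above is a device-tree string list: each name is NUL-terminated and the names are concatenated. In C, sizeof on the char array includes the final NUL, which is why the patch passes sizeof(plat_compat8) rather than a strlen. A rough Python model of the resulting bytes (the helper names are made up for illustration):

```python
def compatible_prop(names):
    """Encode a device-tree string list: every name is NUL-terminated."""
    return b"".join(name.encode("ascii") + b"\0" for name in names)


def platform_compatible(is_power9):
    """Pick the platform names the way the patch does, by CPU type."""
    if is_power9:
        return compatible_prop(["qemu,powernv9", "ibm,powernv"])
    # POWER8 keeps the original "qemu,powernv" for older skiboots.
    return compatible_prop(["qemu,powernv8", "qemu,powernv", "ibm,powernv"])
```

A consumer such as skiboot walks the list from most to least specific, so putting "qemu,powernv8" ahead of "qemu,powernv" lets newer firmware pick the specific platform while older firmware still matches.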




Re: [Qemu-devel] [PATCH] i386: Fix nested SVM on older Opterons

2019-05-27 Thread Bernhard M. Wiedemann
On 23/05/2019 23.27, Eduardo Habkost wrote:
> On Thu, May 23, 2019 at 07:57:38PM +0100, Dr. David Alan Gilbert wrote:
>> * Bernhard M. Wiedemann (bwiedem...@suse.de) wrote:
>>> Without this patch, a VM on an Opteron G3 host will have the svm flag, but
>>> the kvm-amd module fails to load in there, complaining that it needs
>>> cpuid 0x8000000a
>>>
>>> I have successfully built and tested this for 3+ years in production
>>> on Opteron G3 servers.
> 
> Have you reproduced the bug on QEMU 2.8 or newer?  The problem
> you describe should be fixed by the following commit (from ~2.5
> years ago).
> 
> commit 0c3d7c0051576d220e6da0a8ac08f2d8482e2f0b
> target-i386: Enable CPUID[0x8000000A] if SVM is enabled

I was still on qemu-2.6.2 so it is good to know that this might work out
of the box with 2.8+

Thanks for the pointer.
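
For readers following along, the idea behind that commit can be sketched in a few lines of Python. This is a model only, assuming the leaf meant in this thread is 0x8000000A (the SVM feature leaf); the function name and the sample xlevel values are illustrative and not taken from QEMU.

```python
SVM_LEAF = 0x8000000A  # assumed: the extended CPUID leaf describing SVM features


def effective_xlevel(xlevel, svm_enabled):
    """Raise the highest extended CPUID leaf so it covers the SVM leaf.

    If the guest sees the svm flag but the maximum extended leaf is
    below SVM_LEAF, a nested hypervisor cannot query SVM details.
    """
    if svm_enabled and xlevel < SVM_LEAF:
        return SVM_LEAF
    return xlevel
```

A CPU model that advertises svm with a lower maximum leaf would then be bumped automatically, which is why no extra patch should be needed on newer QEMU.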



Re: [Qemu-devel] hw/s390x/ipl: Dubious use of qdev_reset_all_fn

2019-05-27 Thread Markus Armbruster
Peter Maydell  writes:

> On Fri, 24 May 2019 at 20:47, Christian Borntraeger
>  wrote:
>> While this patch is certainly ok, I find it disturbing that qdev devices are
>> being reset, but QOM devices are not.
>
> It's not a qdev-vs-QOM thing. Anything which is a DeviceState
> has a reset method, but only devices which are somewhere
> rooted in the bus-tree that starts with the "main system
> bus" (aka sysbus) get reset by the vl.c-registered "reset
> everything on the system bus". Devices which are SysBusDevice
> get auto-parented onto the sysbus, and so get reset. Devices
> like PCI devices or SCSI devices get put onto the PCI
> bus or the SCSI bus, and those buses are in turn children
> of some host-controller device which is on the sysbus, so
> they all get reset. The things that don't get reset are
> "orphan" devices which are neither (a) of a type that gets
> automatically parented onto a bus like SysBusDevice nor
> (b) put specifically onto some other bus.
>
> CPU objects are the other common thing that doesn't get
> reset 'automatically'.
>
> Suggestions for how to restructure reset so this doesn't
> happen are welcome... "reset follows the bus hierarchy"
> works well in some places but is a bit weird in others
> (for SoC containers and the like "follow the QOM
> hierarchy" would make more sense, but I have no idea
> how to usefully transition to a model where you could
> say "for these devices, follow QOM tree for reset" or
> what an API for that would look like).
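
The "reset follows the bus hierarchy" behaviour described above can be modelled in a few lines of Python. This is a toy sketch, not QEMU code: the Bus and Device classes and reset_bus_tree() are invented for illustration.

```python
class Bus:
    def __init__(self, name):
        self.name = name
        self.devices = []


class Device:
    def __init__(self, name, parent_bus=None):
        self.name = name
        self.child_buses = []
        self.reset_count = 0
        # A device without a parent bus is an "orphan".
        if parent_bus is not None:
            parent_bus.devices.append(self)

    def add_bus(self, name):
        bus = Bus(name)
        self.child_buses.append(bus)
        return bus


def reset_bus_tree(bus):
    """Reset every device reachable from 'bus' via the bus hierarchy."""
    for dev in bus.devices:
        dev.reset_count += 1
        for child in dev.child_buses:
            reset_bus_tree(child)


sysbus = Bus("main-system-bus")
pcihost = Device("gpex-pcihost", sysbus)
pcie = pcihost.add_bus("pcie.0")
e1000 = Device("e1000", pcie)
orphan = Device("orphan")  # never placed on any bus

reset_bus_tree(sysbus)
```

After the walk, pcihost and e1000 have been reset once each, while orphan has not been touched at all; that is the gap being discussed.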

Here's a QOM composition tree for the ARM virt machine (-nodefaults
-device e1000) as visible in qom-fuse under /machine, with irq and
qemu:memory-region omitted for brevity:

machine  virt-4.1-machine
  +-- fw_cfg  fw_cfg_mem
  +-- peripheral  container
  +-- peripheral-anon  container
  | +-- device[0]  e1000
  +-- unattached  container
  | +-- device[0]  cortex-a15-arm-cpu
  | +-- device[1]  arm_gic
  | +-- device[2]  arm-gicv2m
  | +-- device[3]  pl011
  | +-- device[4]  pl031
  | +-- device[5]  gpex-pcihost
  | | +-- pcie.0  PCIE
  | | +-- gpex_root  gpex-root
  | +-- device[6]  pl061
  | +-- device[7]  gpio-key
  | +-- device[8]  virtio-mmio
  | | +-- virtio-mmio-bus.0  virtio-mmio-bus
  | .
  | .  more virtio-mmio
  | .
  | +-- device[39]  virtio-mmio
  | | +-- virtio-mmio-bus.31  virtio-mmio-bus
  | +-- device[40]  platform-bus-device
  | +-- sysbus  System
  +-- virt.flash0  cfi.pflash01
  +-- virt.flash1  cfi.pflash01

Observations:

* Some components of the machine are direct children of machine: fw_cfg,
  virt.flash0, virt.flash1

* machine additionally has a few containers: peripheral,
  peripheral-anon, unattached.

* machine/peripheral and machine/peripheral-anon contain the -device
  with and without ID, respectively.

* machine/unattached contains everything else created by code without an
  explicit parent device.  Some (all?) of them should perhaps be direct
  children of machine instead.

Compare to the qdev tree shown by info qtree:

bus: main-system-bus
  type System
  dev: platform-bus-device, id "platform-bus-device"
  dev: fw_cfg_mem, id ""
  dev: virtio-mmio, id ""
    bus: virtio-mmio-bus.31
      type virtio-mmio-bus
  ... more virtio-mmio
  dev: virtio-mmio, id ""
    bus: virtio-mmio-bus.0
      type virtio-mmio-bus
  dev: gpio-key, id ""
  dev: pl061, id ""
  dev: gpex-pcihost, id ""
    bus: pcie.0
      type PCIE
      dev: e1000, id ""
      dev: gpex-root, id ""
  dev: pl031, id ""
  dev: pl011, id ""
  dev: arm-gicv2m, id ""
  dev: arm_gic, id ""
  dev: cfi.pflash01, id ""
  dev: cfi.pflash01, id ""

Observations:

* Composition tree root machine's containers are not in the qtree.

* Composition tree node cortex-a15-arm-cpu is not in the qtree.  That's
  because it's not a qdev (in QOM parlance: not a TYPE_DEVICE).

* In the qtree, every other inner node is a qbus.  These are *leaves* in
  the composition tree.  The qtree's edge from qbus to qdev is a
  *link* in the composition tree.

  Example: main-system-bus -> pl011 is
  machine/unattached/sysbus/child[4] ->
  ../../../machine/unattached/device[3].

  Example: main-system-bus/gpex-pcihost/pcie.0 -> e1000 is
  machine/unattached/device[5]/pcie.0//child[1] ->
  ../../../../machine/peripheral-anon/device[0].

Now let me ramble a bit on reset.

We could model the reset wiring explicitly: every QOM object that wants
to participate in reset has a reset input pin.  We represent the wiring
as links.  The reset links form a reset tree.

Example: object virt-4.1-machine at machine gets a link reset[4]
pointing to its child object cfi.pflash01 at machine/virt.flash0.

Example: object PCIE at machine/unattached/device[5]/pcie.0 gets a link
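
The explicit reset wiring proposed above could look roughly like the following Python sketch. It is purely hypothetical: Obj, link_reset() and propagate_reset() are invented names, not an existing QOM API, and the object paths are borrowed from the composition tree shown earlier.

```python
class Obj:
    def __init__(self, path):
        self.path = path
        self.reset_links = []  # objects whose reset input this object drives
        self.was_reset = False

    def link_reset(self, target):
        """Wire this object's reset output to target's reset input."""
        self.reset_links.append(target)


def propagate_reset(root, order=None):
    """Walk the reset links depth-first and return the visit order."""
    if order is None:
        order = []
    root.was_reset = True
    order.append(root.path)
    for target in root.reset_links:
        propagate_reset(target, order)
    return order


machine = Obj("machine")
flash0 = Obj("machine/virt.flash0")
cpu = Obj("machine/unattached/device[0]")
machine.link_reset(flash0)
machine.link_reset(cpu)

order = propagate_reset(machine)
```

Because the wiring is explicit, an object such as the CPU gets reset simply by being linked in, regardless of whether it sits on any bus.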

Re: [Qemu-devel] Question about wrong ram-node0 reference

2019-05-27 Thread Igor Mammedov
On Sat, 25 May 2019 03:35:20 +
"liujunjie (A)"  wrote:

> Hi, I have met a problem:
> 
> The QEMU version is 2.8.1, the virtual machine is configured with 1G huge 
> pages, two NUMA nodes and four pass-through NVME SSDs.
> 
> After we started the VM, nothing was done apart from some QMP queries, yet
> QEMU aborted some months later.
> After that, the VM was restarted, and the problem has not reproduced since.
> The backtrace of the RCU thread is as follows:
> (gdb) bt
> #0  0x7fd2695f0197 in raise () from /usr/lib64/libc.so.6
> #1  0x7fd2695f1888 in abort () from /usr/lib64/libc.so.6
> #2  0x7fd2695e9206 in __assert_fail_base () from /usr/lib64/libc.so.6
> #3  0x7fd2695e92b2 in __assert_fail () from /usr/lib64/libc.so.6
> #4  0x00476a84 in memory_region_finalize (obj=)
> at /home/abuild/rpmbuild/BUILD/qemu-kvm-2.8.1/memory.c:1512
> #5  0x00763105 in object_deinit (obj=obj@entry=0x1dc1fd0,
> type=type@entry=0x1d065b0) at qom/object.c:448
> #6  0x00763153 in object_finalize (data=0x1dc1fd0) at qom/object.c:462
> #7  0x007627cc in object_property_del_all (obj=obj@entry=0x1dc1f70)
> at qom/object.c:399
> #8  0x00763148 in object_finalize (data=0x1dc1f70) at qom/object.c:461
> #9  0x00764426 in object_unref (obj=) at qom/object.c:897
> #10 0x00473b6b in memory_region_unref (mr=)
> at /home/abuild/rpmbuild/BUILD/qemu-kvm-2.8.1/memory.c:1560
> #11 0x00473bc7 in flatview_destroy (view=0x7fc188b9cb90)
> at /home/abuild/rpmbuild/BUILD/qemu-kvm-2.8.1/memory.c:289
> #12 0x00843be0 in call_rcu_thread (opaque=)
> at util/rcu.c:279
> #13 0x008325c2 in qemu_thread_start (args=args@entry=0x1d00810)
> at util/qemu_thread_posix.c:496
> #14 0x7fd269983dc5 in start_thread () from /usr/lib64/libpthread.so.0
> #15 0x7fd2696b27bd in clone () from /usr/lib64/libc.so.6
> 
> In this core, I found that the reference count of "/objects/ram-node0" (the
> type of ram-node0 is struct "HostMemoryBackendFile") equals 0, while the
> reference count of "/objects/ram-node1" equals 129; more details can be seen
> at the end of this email.
> 
> I searched through the community, and found a case that had the same error 
> report: https://mail.coreboot.org/pipermail/seabios/2017-September/011799.html
> However, I did not configure pcie_pci_bridge. Besides, qemu aborted in device 
> initialization phase in this case.
That case doesn't seem relevant.

> 
> Also, I tried to find out what can reference "/objects/ram-node0", so as to
> look for the one that may unreference it improperly. Most of the references
> are taken in "render_memory_region" or "phys_section_add" when the memory
> topology changes.
> Later, the temporary flatviews are destroyed by the RCU thread, so the
> unreference happens there and the backtrace is similar to the one shown above.
> But I am not familiar with the details of this process, and it is hard to keep
> track of these memory topology changes.
> 
> My question is:
> How can ram-node0's reference count drop to 0 while the virtual machine is
> still running?
> 
> Maybe someone who is familiar with memory_region_ref or memory-backend-file
> can help me figure this out.
> Any idea is appreciated.

Could you provide steps to reproduce (incl. command line)? 

[...]
> Thanks,
> Junjie Liu
> 
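
As background for the question above, here is a toy Python model of reference counting. It does not diagnose the reported crash; it only illustrates, under simplified assumptions, how a single unbalanced unref finalizes an object while its owner still believes it holds a reference. The class and method names are invented and are not QOM functions.

```python
class RefCounted:
    def __init__(self, name):
        self.name = name
        self.ref = 1  # the creator holds the initial reference
        self.finalized = False

    def get(self):
        """Take a reference (think of a flatview pinning a memory region)."""
        if self.finalized:
            raise RuntimeError("ref taken on finalized object")
        self.ref += 1

    def put(self):
        """Drop a reference; the object is finalized when the count hits 0."""
        if self.ref == 0:
            raise RuntimeError("unref with no reference left")
        self.ref -= 1
        if self.ref == 0:
            self.finalized = True
```

With balanced get()/put() pairs the count returns to 1 and the object survives. One extra put() anywhere drives it to 0 early, and the failure may only surface much later, when some unrelated path (here, the RCU thread destroying a flatview) drops what it thinks is its own reference.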




Re: [Qemu-devel] qapi/misc.json is too big, let's bite off a few chunks

2019-05-27 Thread Markus Armbruster
Eduardo Habkost  writes:

> On Thu, May 23, 2019 at 06:14:18PM +0200, Markus Armbruster wrote:
>> It's nice when QAPI schema modules clearly belong to a single subsystem
>> in addition to "QAPI Schema".  misc.json doesn't, and it's grown fat:
>> 3000+ lines.  Let's move out some stuff.  Here are a few candidates:
>> 
>> * Dump (Marc-André)
>> 
>>   dump-guest-memory, query-dump, DUMP_COMPLETED,
>>   query-dump-guest-memory-capability
>> 
>>   ~200 lines.
>> 
>> * Machine core (Eduardo, Marcel)
>> 
>>   query-machines, query-current-machine, 
>> 
>>   ~60 lines.  Hardly worthwhile from a "let's shrink misc.json" point of
>>   view.  Might be worthwhile from a "let's make get_maintainers.pl
>>   work".
>> 
>> * CPUs (Paolo, Richard)
>> 
>>   query-cpus, query-cpus-fast
>> 
>>   ~300 lines.  The commands are implemented in cpus.c, which MAINTAINERS
>>   covers both under "Main loop" and under "Guest CPU cores (TCG) /
>>   Overall".  Neither feels right to me for these QMP commands.
>
> Should it include query-cpu-* (currently on target.json),
> and query-hotpluggable-cpus?

Interesting question.  We might need both cpu.json and cpu-target.json,
to keep target-independent and target-dependent separated.

>> * NUMA (Eduardo)
>> 
>>   query-memdev, set-numa-node
>> 
>>   ~200 lines.
>> 
>> Opinions?
>> 
>> Additional candidates?
>
> QOM: qom-list, qom-get, qom-set, qom-list-properties, object-add,
> object-del.

Also qom-list-types.

~230 lines.

As long as we don't have an active QOM maintainer[*], the benefit is
low.


[*] We need one.  I'm not volunteering.



Re: [Qemu-devel] [PATCH v2 5/6] qapi: Allow documentation for features

2019-05-27 Thread Markus Armbruster
Kevin Wolf  writes:

> Features will be documented in a new part introduced by a "Features:"
> line, after arguments and before named sections.
>
> Signed-off-by: Kevin Wolf 
> ---
>  scripts/qapi/common.py | 43 ++
>  scripts/qapi/doc.py| 11 +++
>  2 files changed, 50 insertions(+), 4 deletions(-)
>
> diff --git a/scripts/qapi/common.py b/scripts/qapi/common.py
> index 1d0f4847db..6a1ec87d41 100644
> --- a/scripts/qapi/common.py
> +++ b/scripts/qapi/common.py
> @@ -132,6 +132,9 @@ class QAPIDoc(object):
>  ARGS means that we're parsing the arguments section. Any symbol name is
>  interpreted as an argument and an ArgSection is created for it.
>  
> +FEATURES means that we're parsing features sections. Any symbol name is
> +interpreted as a feature.
> +
>  VARIOUS is the final part where freeform sections may appear. This
>  includes named sections such as "Return:" as well as unnamed
>  paragraphs. No symbols are allowed any more in this part.
> @@ -139,7 +142,8 @@ class QAPIDoc(object):
>  # Can't make it a subclass of Enum because of Python 2
>  BODY = 0
>  ARGS = 1
> -VARIOUS = 2
> +FEATURES = 2
> +VARIOUS = 3
>  
>  def __init__(self, parser, info):
>  # self._parser is used to report errors with QAPIParseError.  The
> @@ -152,6 +156,7 @@ class QAPIDoc(object):
>  self.body = QAPIDoc.Section()
>  # dict mapping parameter name to ArgSection
>  self.args = OrderedDict()
> +self.features = OrderedDict()
>  # a list of Section
>  self.sections = []
>  # the current section
> @@ -180,6 +185,8 @@ class QAPIDoc(object):
>  self._append_body_line(line)
>  elif self._part == QAPIDoc.SymbolPart.ARGS:
>  self._append_args_line(line)
> +elif self._part == QAPIDoc.SymbolPart.FEATURES:
> +self._append_features_line(line)
>  elif self._part == QAPIDoc.SymbolPart.VARIOUS:
>  self._append_various_line(line)
>  else:
> @@ -215,6 +222,8 @@ class QAPIDoc(object):
>  if name.startswith('@') and name.endswith(':'):
>  self._part = QAPIDoc.SymbolPart.ARGS
>  self._append_args_line(line)
> +elif line == 'Features:':
> +self._part = QAPIDoc.SymbolPart.FEATURES
>  elif self.symbol and self._check_named_section(line, name):
>  self._append_various_line(line)
>  else:
> @@ -231,6 +240,26 @@ class QAPIDoc(object):
>  self._start_args_section(name[1:-1])
>  elif self._check_named_section(line, name):
>  return self._append_various_line(line)

Here you return something...

> +elif (self._section.text.endswith('\n\n')
> +  and line and not line[0].isspace()):
> +if line == 'Features:':
> +self._part = QAPIDoc.SymbolPart.FEATURES
> +return

... and here you don't.

> +else:

Unnecessary else after return.  Let's scratch the else.

> +self._start_section()
> +self._part = QAPIDoc.SymbolPart.VARIOUS
> +return self._append_various_line(line)
> +
> +self._append_freeform(line.strip())
> +
> +def _append_features_line(self, line):
> +name = line.split(' ', 1)[0]
> +
> +if name.startswith('@') and name.endswith(':'):
> +line = line[len(name)+1:]
> +self._start_features_section(name[1:-1])
> +elif self._check_named_section(line, name):
> +return self._append_various_line(line)
>  elif (self._section.text.endswith('\n\n')
>and line and not line[0].isspace()):
>  self._start_section()
> @@ -256,17 +285,23 @@ class QAPIDoc(object):
>  
>  self._append_freeform(line)
>  
> -def _start_args_section(self, name):
> +def _start_symbol_section(self, symbols_dict, name):
>  # FIXME invalid names other than the empty string aren't flagged
>  if not name:
>  raise QAPIParseError(self._parser, "Invalid parameter name")
> -if name in self.args:
> +if name in symbols_dict:
>  raise QAPIParseError(self._parser,
>   "'%s' parameter name duplicated" % name)
>  assert not self.sections
>  self._end_section()
>  self._section = QAPIDoc.ArgSection(name)
> -self.args[name] = self._section
> +symbols_dict[name] = self._section
> +
> +def _start_args_section(self, name):
> +self._start_symbol_section(self.args, name)
> +
> +def _start_features_section(self, name):
> +self._start_symbol_section(self.features, name)
>  
>  def _start_section(self, name=None):
>  if name in ('Returns', 'Since') and self.has_section(nam
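
To summarize the part ordering the patch introduces (body, then arguments, then "Features:", then freeform sections), here is a heavily simplified Python sketch. It is not the QAPIDoc code: classify() is an invented helper, it assumes the input starts after the symbol line, and it recognizes only a few named sections. It also uses early exits instead of else-after-return, in the spirit of the review comment above.

```python
BODY, ARGS, FEATURES, VARIOUS = range(4)

NAMED_SECTIONS = ('Returns:', 'Since:', 'Notes:')  # illustrative subset


def classify(lines):
    """Return (part, line) pairs for a simplified doc comment."""
    part = BODY
    out = []
    for line in lines:
        name = line.split(' ', 1)[0]
        if line == 'Features:':
            # The marker line only switches part; it carries no text.
            part = FEATURES
            continue
        if name.startswith('@') and name.endswith(':'):
            if part == BODY:
                part = ARGS
        elif name in NAMED_SECTIONS:
            part = VARIOUS
        out.append((part, line))
    return out
```

Feeding it an argument line, a "Features:" marker, a feature line and a "Since:" line yields the parts ARGS, FEATURES and VARIOUS in that order.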

[Qemu-devel] [PATCH v3 1/8] configure: permit use of io_uring

2019-05-27 Thread Aarushi Mehta
Signed-off-by: Aarushi Mehta 
Reviewed-by: Stefan Hajnoczi 
---
 configure | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/configure b/configure
index 528b9ff705..acbdf04168 100755
--- a/configure
+++ b/configure
@@ -365,6 +365,7 @@ xen=""
 xen_ctrl_version=""
 xen_pci_passthrough=""
 linux_aio=""
+linux_io_uring=""
 cap_ng=""
 attr=""
 libattr=""
@@ -1255,6 +1256,10 @@ for opt do
   ;;
   --enable-linux-aio) linux_aio="yes"
   ;;
+  --disable-linux-io-uring) linux_io_uring="no"
+  ;;
+  --enable-linux-io-uring) linux_io_uring="yes"
+  ;;
   --disable-attr) attr="no"
   ;;
   --enable-attr) attr="yes"
@@ -1773,6 +1778,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   vde support for vde network
   netmap  support for netmap network
   linux-aio   Linux AIO support
+  linux-io-uring  Linux io_uring support
   cap-ng  libcap-ng support
   attr    attr and xattr support
   vhost-net   vhost-net kernel acceleration support
@@ -3962,6 +3968,21 @@ EOF
 linux_aio=no
   fi
 fi
+##
+# linux-io-uring probe
+
+if test "$linux_io_uring" != "no" ; then
+  if $pkg_config liburing; then
+linux_io_uring_cflags=$($pkg_config --cflags liburing)
+linux_io_uring_libs=$($pkg_config --libs liburing)
+linux_io_uring=yes
+  else
+if test "$linux_io_uring" = "yes" ; then
+  feature_not_found "linux io_uring" "Install liburing devel"
+fi
+linux_io_uring=no
+  fi
+fi
 
 ##
 # TPM emulation is only on POSIX
@@ -6378,6 +6399,7 @@ echo "PIE   $pie"
 echo "vde support   $vde"
 echo "netmap support    $netmap"
 echo "Linux AIO support $linux_aio"
+echo "Linux io_uring support $linux_io_uring"
 echo "ATTR/XATTR support $attr"
 echo "Install blobs $blobs"
 echo "KVM support   $kvm"
@@ -6858,6 +6880,11 @@ fi
 if test "$linux_aio" = "yes" ; then
   echo "CONFIG_LINUX_AIO=y" >> $config_host_mak
 fi
+if test "$linux_io_uring" = "yes" ; then
+  echo "CONFIG_LINUX_IO_URING=y" >> $config_host_mak
+  echo "LINUX_IO_URING_CFLAGS=$linux_io_uring_cflags" >> $config_host_mak
+  echo "LINUX_IO_URING_LIBS=$linux_io_uring_libs" >> $config_host_mak
+fi
 if test "$attr" = "yes" ; then
   echo "CONFIG_ATTR=y" >> $config_host_mak
 fi
-- 
2.17.1
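
The probe added above follows configure's usual tri-state convention: empty means "auto", "yes" means "required", "no" means "disabled". A small Python model of that decision, with an invented probe() helper standing in for the shell logic and a boolean standing in for the pkg-config result:

```python
def probe(requested, available):
    """Model the linux-io-uring probe.

    requested: "" (auto-detect), "yes" (--enable) or "no" (--disable)
    available: True if pkg-config found liburing
    """
    if requested == "no":
        return "no"  # explicitly disabled: do not even probe
    if available:
        return "yes"
    if requested == "yes":
        # Explicitly requested but missing: configure stops with an error.
        raise RuntimeError("linux io_uring not found: install liburing devel")
    return "no"  # auto-detect quietly falls back to disabled
```

The only case that aborts is an explicit --enable-linux-io-uring on a host without liburing; everything else degrades silently.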




[Qemu-devel] [PATCH v3 0/8] Add support for io_uring

2019-05-27 Thread Aarushi Mehta
This patch series adds support for the newly developed io_uring Linux AIO 
interface. Linux io_uring is faster than Linux's AIO asynchronous I/O code, 
offers efficient buffered asynchronous I/O support, the ability to do I/O 
without performing a system call via polled I/O, and other efficiency 
enhancements. Testing it requires a host kernel (5.1+) and the liburing 
library. Use the option -drive aio=io_uring to enable it.

v2:
- Fix Patchew errors
- Option now enumerates only for CONFIG_LINUX in qapi
- Removed redundant and broken code in io_uring
- io_uring now aborts on sqe leak

v3:
- Fix major errors in io_uring (sorry)
- Option now enumerates for CONFIG_LINUX_IO_URING
- pkg config support added

Aarushi Mehta (8):
  configure: permit use of io_uring
  qapi/block-core: add option for io_uring
  block/block: add BDRV flag for io_uring
  block/io_uring: implements interfaces for io_uring
  stubs: add stubs for io_uring interface
  util/async: add aio interfaces for io_uring
  blockdev: accept io_uring as option
  block/fileposix: extend to use io_uring

 MAINTAINERS |   8 ++
 block/Makefile.objs |   3 +
 block/file-posix.c  |  65 -
 block/io_uring.c| 301 
 blockdev.c  |   4 +-
 configure   |  27 
 include/block/aio.h |  16 ++-
 include/block/block.h   |   1 +
 include/block/raw-aio.h |  15 ++
 qapi/block-core.json|   6 +-
 stubs/Makefile.objs |   1 +
 stubs/io_uring.c|  32 +
 util/async.c|  36 +
 13 files changed, 506 insertions(+), 9 deletions(-)
 create mode 100644 block/io_uring.c
 create mode 100644 stubs/io_uring.c

-- 
2.17.1




[Qemu-devel] [PATCH v3 3/8] block/block: add BDRV flag for io_uring

2019-05-27 Thread Aarushi Mehta
Signed-off-by: Aarushi Mehta 
---
 include/block/block.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/block/block.h b/include/block/block.h
index 9b083e2bca..60f7c6c01c 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -121,6 +121,7 @@ typedef struct HDGeometry {
   ignoring the format layer */
 #define BDRV_O_NO_IO       0x10000 /* don't initialize for I/O */
 #define BDRV_O_AUTO_RDONLY 0x20000 /* degrade to read-only if opening read-write fails */
+#define BDRV_O_IO_URING    0x40000 /* use io_uring instead of the thread pool */
 
 #define BDRV_O_CACHE_MASK  (BDRV_O_NOCACHE | BDRV_O_NO_FLUSH)
 
-- 
2.17.1




[Qemu-devel] [PATCH v3 2/8] qapi/block-core: add option for io_uring

2019-05-27 Thread Aarushi Mehta
Signed-off-by: Aarushi Mehta 
---
 qapi/block-core.json | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 7ccbfff9d0..2773803890 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2776,11 +2776,13 @@
 #
 # @threads: Use qemu's thread pool
 # @native:  Use native AIO backend (only Linux and Windows)
+# @io_uring: Use linux io_uring (only Linux)
 #
-# Since: 2.9
+# Since: 2.9 @iouring Since: 4.1
 ##
 { 'enum': 'BlockdevAioOptions',
-  'data': [ 'threads', 'native' ] }
+  'data': [ 'threads', 'native',
+{ 'name': 'io_uring', 'if': 'defined(CONFIG_LINUX_IO_URING)' } ] }
 
 ##
 # @BlockdevCacheOptions:
-- 
2.17.1
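
The 'if' key in the enum above makes the io_uring member conditional on a build-time symbol. The real QAPI generator emits preprocessor guards in the generated C; the Python sketch below only models the visible effect (which members exist for a given configuration). The helper names and the restriction to a single defined(NAME) condition are simplifying assumptions.

```python
import re


def member_enabled(cond, defined):
    """Evaluate a condition of the form 'defined(NAME)'."""
    match = re.fullmatch(r"defined\((\w+)\)", cond)
    if match is None:
        raise ValueError("unsupported condition: %s" % cond)
    return match.group(1) in defined


def enum_members(data, defined):
    """Return the member names visible for the given set of symbols."""
    names = []
    for item in data:
        if isinstance(item, str):
            names.append(item)  # unconditional member
        elif 'if' not in item or member_enabled(item['if'], defined):
            names.append(item['name'])
    return names


DATA = ['threads', 'native',
        {'name': 'io_uring', 'if': 'defined(CONFIG_LINUX_IO_URING)'}]
```

On a build without CONFIG_LINUX_IO_URING the enum has only 'threads' and 'native', so aio=io_uring is rejected at the schema level rather than failing later.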




[Qemu-devel] [PATCH v3 4/8] block/io_uring: implements interfaces for io_uring

2019-05-27 Thread Aarushi Mehta
Signed-off-by: Aarushi Mehta 
---
We need nested loops in ioq_submit because overflowed requests may be
permitted to submit if existing ones are cleared. Hence, failure to 
fulfill an overflow request must break separately from normal submission.

For now, to prevent any infinite loops, if the kernel fails to submit
for any reason, we break (i.e. when the number of submissions is zero).

This has now been tested with a Kali image, with trace events enabled to
ensure it is actually running. The initramfs boots switched to threads.
 
 MAINTAINERS |   7 +
 block/Makefile.objs |   3 +
 block/io_uring.c| 301 
 include/block/aio.h |  16 ++-
 include/block/raw-aio.h |  15 ++
 5 files changed, 341 insertions(+), 1 deletion(-)
 create mode 100644 block/io_uring.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 3cacd751bf..462c00a021 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2504,6 +2504,13 @@ F: block/file-posix.c
 F: block/file-win32.c
 F: block/win32-aio.c
 
+Linux io_uring
+M: Aarushi Mehta 
+R: Stefan Hajnoczi 
+L: qemu-bl...@nongnu.org
+S: Maintained
+F: block/io_uring.c
+
 qcow2
 M: Kevin Wolf 
 M: Max Reitz 
diff --git a/block/Makefile.objs b/block/Makefile.objs
index 7a81892a52..348a003af5 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -18,6 +18,7 @@ block-obj-y += block-backend.o snapshot.o qapi.o
 block-obj-$(CONFIG_WIN32) += file-win32.o win32-aio.o
 block-obj-$(CONFIG_POSIX) += file-posix.o
 block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
+block-obj-$(CONFIG_LINUX_IO_URING) += io_uring.o
 block-obj-y += null.o mirror.o commit.o io.o create.o
 block-obj-y += throttle-groups.o
 block-obj-$(CONFIG_LINUX) += nvme.o
@@ -61,5 +62,7 @@ block-obj-$(if $(CONFIG_LZFSE),m,n) += dmg-lzfse.o
 dmg-lzfse.o-libs   := $(LZFSE_LIBS)
 qcow.o-libs:= -lz
 linux-aio.o-libs   := -laio
+io_uring.o-cflags  := $(LINUX_IO_URING_CFLAGS)
+io_uring.o-libs:= $(LINUX_IO_URING_LIBS)
 parallels.o-cflags := $(LIBXML2_CFLAGS)
 parallels.o-libs   := $(LIBXML2_LIBS)
diff --git a/block/io_uring.c b/block/io_uring.c
new file mode 100644
index 0000000000..2a8c48a7dc
--- /dev/null
+++ b/block/io_uring.c
@@ -0,0 +1,301 @@
+/*
+ * Linux io_uring support.
+ *
+ * Copyright (C) 2009 IBM, Corp.
+ * Copyright (C) 2009 Red Hat, Inc.
+ * Copyright (C) 2019 Aarushi Mehta
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#include "qemu/osdep.h"
+#include 
+#include "qemu-common.h"
+#include "block/aio.h"
+#include "qemu/queue.h"
+#include "block/block.h"
+#include "block/raw-aio.h"
+#include "qemu/coroutine.h"
+#include "qapi/error.h"
+
+#define MAX_EVENTS 128
+
+typedef struct LuringAIOCB {
+BlockAIOCB common;
+Coroutine *co;
+struct io_uring_sqe sqeq;
+int ret;
+QSIMPLEQ_ENTRY(LuringAIOCB) next;
+} LuringAIOCB;
+
+typedef struct LuringQueue {
+int plugged;
+unsigned int in_queue;
+unsigned int in_flight;
+bool blocked;
+QSIMPLEQ_HEAD(, LuringAIOCB) sq_overflow;
+} LuringQueue;
+
+typedef struct LuringState {
+AioContext *aio_context;
+
+struct io_uring ring;
+
+/* io queue for submit at batch.  Protected by AioContext lock. */
+LuringQueue io_q;
+
+/* I/O completion processing.  Only runs in I/O thread.  */
+QEMUBH *completion_bh;
+} LuringState;
+
+static void ioq_submit(LuringState *s);
+
+static inline int io_cqe_ret(struct io_uring_cqe *cqe)
+{
+return cqe->res;
+}
+
+/**
+ * qemu_luring_process_completions:
+ * @s: AIO state
+ *
+ * Fetches completed I/O requests, consumes cqes and invokes their callbacks.
+ *
+ */
+static void qemu_luring_process_completions(LuringState *s)
+{
+struct io_uring_cqe *cqes;
+/*
+ * Request completion callbacks can run the nested event loop.
+ * Schedule ourselves so the nested event loop will "see" remaining
+ * completed requests and process them.  Without this, completion
+ * callbacks that wait for other requests using a nested event loop
+ * would hang forever.
+ */
+qemu_bh_schedule(s->completion_bh);
+
+while (!io_uring_peek_cqe(&s->ring, &cqes)) {
+io_uring_cqe_seen(&s->ring, cqes);
+
+LuringAIOCB *luringcb = io_uring_cqe_get_data(cqes);
+luringcb->ret = io_cqe_ret(cqes);
+if (luringcb->co) {
+/*
+ * If the coroutine is already entered it must be in ioq_submit()
+ * and will notice luringcb->ret has been filled in when it
+ * eventually runs later. Coroutines cannot be entered recursively
+ * so avoid doing that!
+ */
+if (!qemu_coroutine_entered(luringcb->co)) {
+aio_co_wake(luringcb->co);
+}
+} else {
+luringcb->common.cb(luringcb->common.opaque, luringcb->ret);
+qemu_aio_unref(luringcb);
+}
+/* Change counters one-by-one because we can be nested. */

[Qemu-devel] [PATCH v3 5/8] stubs: add stubs for io_uring interface

2019-05-27 Thread Aarushi Mehta
Signed-off-by: Aarushi Mehta 
---
 MAINTAINERS |  1 +
 stubs/Makefile.objs |  1 +
 stubs/io_uring.c| 32 
 3 files changed, 34 insertions(+)
 create mode 100644 stubs/io_uring.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 462c00a021..6c6672bda3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2510,6 +2510,7 @@ R: Stefan Hajnoczi 
 L: qemu-bl...@nongnu.org
 S: Maintained
 F: block/io_uring.c
+F: stubs/io_uring.c
 
 qcow2
 M: Kevin Wolf 
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index 73452ad265..ea158cf0ee 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -13,6 +13,7 @@ stub-obj-y += iothread.o
 stub-obj-y += iothread-lock.o
 stub-obj-y += is-daemonized.o
 stub-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
+stub-obj-$(CONFIG_LINUX_IO_URING) += io_uring.o
 stub-obj-y += machine-init-done.o
 stub-obj-y += migr-blocker.o
 stub-obj-y += change-state-handler.o
diff --git a/stubs/io_uring.c b/stubs/io_uring.c
new file mode 100644
index 0000000000..622d1e4648
--- /dev/null
+++ b/stubs/io_uring.c
@@ -0,0 +1,32 @@
+/*
+ * Linux io_uring support.
+ *
+ * Copyright (C) 2009 IBM, Corp.
+ * Copyright (C) 2009 Red Hat, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#include "qemu/osdep.h"
+#include "block/aio.h"
+#include "block/raw-aio.h"
+
+void luring_detach_aio_context(LuringState *s, AioContext *old_context)
+{
+abort();
+}
+
+void luring_attach_aio_context(LuringState *s, AioContext *new_context)
+{
+abort();
+}
+
+LuringState *luring_init(Error **errp)
+{
+abort();
+}
+
+void luring_cleanup(LuringState *s)
+{
+abort();
+}
-- 
2.17.1




[Qemu-devel] [PATCH v3 7/8] blockdev: accept io_uring as option

2019-05-27 Thread Aarushi Mehta
Signed-off-by: Aarushi Mehta 
Reviewed-by: Stefan Hajnoczi 
---
 blockdev.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index 79fbac8450..b44b9d660d 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -386,6 +386,8 @@ static void extract_common_blockdev_options(QemuOpts *opts, int *bdrv_flags,
 if ((aio = qemu_opt_get(opts, "aio")) != NULL) {
 if (!strcmp(aio, "native")) {
 *bdrv_flags |= BDRV_O_NATIVE_AIO;
+} else if (!strcmp(aio, "io_uring")) {
+*bdrv_flags |= BDRV_O_IO_URING;
 } else if (!strcmp(aio, "threads")) {
 /* this is the default */
 } else {
@@ -4547,7 +4549,7 @@ QemuOptsList qemu_common_drive_opts = {
 },{
 .name = "aio",
 .type = QEMU_OPT_STRING,
-.help = "host AIO implementation (threads, native)",
+.help = "host AIO implementation (threads, native, io_uring)",
 },{
 .name = BDRV_OPT_CACHE_WB,
 .type = QEMU_OPT_BOOL,
-- 
2.17.1




[Qemu-devel] [PATCH v3 6/8] util/async: add aio interfaces for io_uring

2019-05-27 Thread Aarushi Mehta
Signed-off-by: Aarushi Mehta 
Reviewed-by: Stefan Hajnoczi 
---
 util/async.c | 36 
 1 file changed, 36 insertions(+)

diff --git a/util/async.c b/util/async.c
index c10642a385..2709f0edc3 100644
--- a/util/async.c
+++ b/util/async.c
@@ -277,6 +277,14 @@ aio_ctx_finalize(GSource *source)
 }
 #endif
 
+#ifdef CONFIG_LINUX_IO_URING
+if (ctx->linux_io_uring) {
+luring_detach_aio_context(ctx->linux_io_uring, ctx);
+luring_cleanup(ctx->linux_io_uring);
+ctx->linux_io_uring = NULL;
+}
+#endif
+
 assert(QSLIST_EMPTY(&ctx->scheduled_coroutines));
 qemu_bh_delete(ctx->co_schedule_bh);
 
@@ -341,6 +349,29 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)
 }
 #endif
 
+#ifdef CONFIG_LINUX_IO_URING
+LuringState *aio_setup_linux_io_uring(AioContext *ctx, Error **errp)
+{
+if (ctx->linux_io_uring) {
+return ctx->linux_io_uring;
+}
+
+ctx->linux_io_uring = luring_init(errp);
+if (!ctx->linux_io_uring) {
+return NULL;
+}
+
+luring_attach_aio_context(ctx->linux_io_uring, ctx);
+return ctx->linux_io_uring;
+}
+
+LuringState *aio_get_linux_io_uring(AioContext *ctx)
+{
+assert(ctx->linux_io_uring);
+return ctx->linux_io_uring;
+}
+#endif
+
 void aio_notify(AioContext *ctx)
 {
 /* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
@@ -432,6 +463,11 @@ AioContext *aio_context_new(Error **errp)
 #ifdef CONFIG_LINUX_AIO
 ctx->linux_aio = NULL;
 #endif
+
+#ifdef CONFIG_LINUX_IO_URING
+ctx->linux_io_uring = NULL;
+#endif
+
 ctx->thread_pool = NULL;
 qemu_rec_mutex_init(&ctx->lock);
 timerlistgroup_init(&ctx->tlg, aio_timerlist_notify, ctx);
-- 
2.17.1




[Qemu-devel] [PATCH v3 8/8] block/fileposix: extend to use io_uring

2019-05-27 Thread Aarushi Mehta
Signed-off-by: Aarushi Mehta 
---
 block/file-posix.c | 65 ++
 1 file changed, 60 insertions(+), 5 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index d018429672..50899064df 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -154,6 +154,7 @@ typedef struct BDRVRawState {
 bool has_write_zeroes:1;
 bool discard_zeroes:1;
 bool use_linux_aio:1;
+bool use_linux_io_uring:1;
 bool page_cache_inconsistent:1;
 bool has_fallocate;
 bool needs_alignment;
@@ -423,7 +424,7 @@ static QemuOptsList raw_runtime_opts = {
 {
 .name = "aio",
 .type = QEMU_OPT_STRING,
-.help = "host AIO implementation (threads, native)",
+.help = "host AIO implementation (threads, native, io_uring)",
 },
 {
 .name = "locking",
@@ -494,6 +495,9 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 goto fail;
 }
 s->use_linux_aio = (aio == BLOCKDEV_AIO_OPTIONS_NATIVE);
+#ifdef CONFIG_LINUX_IO_URING
+s->use_linux_io_uring = (aio == BLOCKDEV_AIO_OPTIONS_IO_URING);
+#endif
 
 locking = qapi_enum_parse(&OnOffAuto_lookup,
   qemu_opt_get(opts, "locking"),
@@ -557,7 +561,9 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 s->shared_perm = BLK_PERM_ALL;
 
 #ifdef CONFIG_LINUX_AIO
- /* Currently Linux does AIO only for files opened with O_DIRECT */
+/*
+ * Currently Linux does AIO only for files opened with O_DIRECT
+ */
 if (s->use_linux_aio) {
 if (!(s->open_flags & O_DIRECT)) {
 error_setg(errp, "aio=native was specified, but it requires "
@@ -578,6 +584,21 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 goto fail;
 }
 #endif /* !defined(CONFIG_LINUX_AIO) */
+#ifdef CONFIG_LINUX_IO_URING
+if (s->use_linux_io_uring) {
+if (!aio_setup_linux_io_uring(bdrv_get_aio_context(bs), errp)) {
+error_prepend(errp, "Unable to use io_uring: ");
+goto fail;
+}
+}
+#else
+if (s->use_linux_io_uring) {
+error_setg(errp, "aio=io_uring was specified, but is not supported "
+ "in this build.");
+ret = -EINVAL;
+goto fail;
+}
+#endif /* !defined(CONFIG_LINUX_IO_URING) */
 
 s->has_discard = true;
 s->has_write_zeroes = true;
@@ -1883,6 +1904,12 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, 
uint64_t offset,
 LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
 assert(qiov->size == bytes);
 return laio_co_submit(bs, aio, s->fd, offset, qiov, type);
+#endif
+#ifdef CONFIG_LINUX_IO_URING
+} else if (s->use_linux_io_uring) {
+LuringState *aio = 
aio_get_linux_io_uring(bdrv_get_aio_context(bs));
+assert(qiov->size == bytes);
+return luring_co_submit(bs, aio, s->fd, offset, qiov, type);
 #endif
 }
 }
@@ -1920,24 +1947,40 @@ static int coroutine_fn raw_co_pwritev(BlockDriverState 
*bs, uint64_t offset,
 
 static void raw_aio_plug(BlockDriverState *bs)
 {
-#ifdef CONFIG_LINUX_AIO
+#if defined CONFIG_LINUX_AIO || defined CONFIG_LINUX_IO_URING
 BDRVRawState *s = bs->opaque;
+#endif
+#ifdef CONFIG_LINUX_AIO
 if (s->use_linux_aio) {
 LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
 laio_io_plug(bs, aio);
 }
 #endif
+#ifdef CONFIG_LINUX_IO_URING
+if (s->use_linux_io_uring) {
+LuringState *aio = aio_get_linux_io_uring(bdrv_get_aio_context(bs));
+luring_io_plug(bs, aio);
+}
+#endif
 }
 
 static void raw_aio_unplug(BlockDriverState *bs)
 {
-#ifdef CONFIG_LINUX_AIO
+#if defined CONFIG_LINUX_AIO || defined CONFIG_LINUX_IO_URING
 BDRVRawState *s = bs->opaque;
+#endif
+#ifdef CONFIG_LINUX_AIO
 if (s->use_linux_aio) {
 LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
 laio_io_unplug(bs, aio);
 }
 #endif
+#ifdef CONFIG_LINUX_IO_URING
+if (s->use_linux_io_uring) {
+LuringState *aio = aio_get_linux_io_uring(bdrv_get_aio_context(bs));
+luring_io_unplug(bs, aio);
+}
+#endif
 }
 
 static int raw_co_flush_to_disk(BlockDriverState *bs)
@@ -1963,8 +2006,10 @@ static int raw_co_flush_to_disk(BlockDriverState *bs)
 static void raw_aio_attach_aio_context(BlockDriverState *bs,
AioContext *new_context)
 {
+#if defined CONFIG_LINUX_AIO || defined CONFIG_LINUX_IO_URING
+BDRVRawState *s = bs->opaque;
+#endif
 #ifdef CONFIG_LINUX_AIO
-BDRVRawState *s = bs->opaque;
 if (s->use_linux_aio) {
 Error *local_err;
 if (!aio_setup_linux_aio(new_context, &local_err)) {
@@ -1974,6 +2019,16 @@ static void raw_aio_attach_aio_context(BlockDriverState 
*bs,
 }
 }
 #endif
+#ifdef CONFIG_LINUX_IO_URING
+if (s->use_linux_io_uring) {
+

Re: [Qemu-devel] [PATCH v3] monitor: Fix return type of monitor_fdset_dup_fd_find

2019-05-27 Thread Yury Kotov
Ping

23.05.2019, 12:45, "Yury Kotov" :
> monitor_fdset_dup_fd_find_remove() and monitor_fdset_dup_fd_find()
> return mon_fdset->id which is int64_t. Downcasting from int64_t to int
> leads to a bug with removing fd from fdset with id >= 2^32.
> So, fix return types for these functions.
>
> Signed-off-by: Yury Kotov 
> Reviewed-by: Markus Armbruster 
> ---
>  include/monitor/monitor.h | 2 +-
>  monitor.c | 4 ++--
>  stubs/fdset.c | 2 +-
>  3 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
> index 86656297f1..51f048d61f 100644
> --- a/include/monitor/monitor.h
> +++ b/include/monitor/monitor.h
> @@ -45,6 +45,6 @@ AddfdInfo *monitor_fdset_add_fd(int fd, bool has_fdset_id, 
> int64_t fdset_id,
>  int monitor_fdset_get_fd(int64_t fdset_id, int flags);
>  int monitor_fdset_dup_fd_add(int64_t fdset_id, int dup_fd);
>  void monitor_fdset_dup_fd_remove(int dup_fd);
> -int monitor_fdset_dup_fd_find(int dup_fd);
> +int64_t monitor_fdset_dup_fd_find(int dup_fd);
>
>  #endif /* MONITOR_H */
> diff --git a/monitor.c b/monitor.c
> index 6428eb3b7e..a0e637f7d6 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -2602,7 +2602,7 @@ err:
>  return -1;
>  }
>
> -static int monitor_fdset_dup_fd_find_remove(int dup_fd, bool remove)
> +static int64_t monitor_fdset_dup_fd_find_remove(int dup_fd, bool remove)
>  {
>  MonFdset *mon_fdset;
>  MonFdsetFd *mon_fdset_fd_dup;
> @@ -2630,7 +2630,7 @@ err:
>  return -1;
>  }
>
> -int monitor_fdset_dup_fd_find(int dup_fd)
> +int64_t monitor_fdset_dup_fd_find(int dup_fd)
>  {
>  return monitor_fdset_dup_fd_find_remove(dup_fd, false);
>  }
> diff --git a/stubs/fdset.c b/stubs/fdset.c
> index 4f3edf2ea4..a1b8f41f62 100644
> --- a/stubs/fdset.c
> +++ b/stubs/fdset.c
> @@ -7,7 +7,7 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int dup_fd)
>  return -1;
>  }
>
> -int monitor_fdset_dup_fd_find(int dup_fd)
> +int64_t monitor_fdset_dup_fd_find(int dup_fd)
>  {
>  return -1;
>  }
> --
> 2.21.0
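
Editorial illustration of the truncation described in the commit message above. This is a minimal sketch, not code from monitor.c: stored_fdset_id, find_id_narrow() and find_id_wide() are hypothetical stand-ins for mon_fdset->id and the old and new return types, and the narrowing result assumes a platform with a 32-bit int.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for mon_fdset->id: the first id that needs 33 bits. */
static int64_t stored_fdset_id = INT64_C(1) << 32;

/* Old shape: the int64_t id is narrowed to int on return.
 * The out-of-range conversion is implementation-defined; common
 * compilers keep only the low 32 bits, so 2^32 comes back as 0. */
static int find_id_narrow(void)
{
    return (int)stored_fdset_id;
}

/* New shape: the return type matches the id's type, so no bits are lost. */
static int64_t find_id_wide(void)
{
    return stored_fdset_id;
}

/* Returns 1 when the narrow variant no longer identifies the same fdset. */
static int narrow_loses_id(void)
{
    return (int64_t)find_id_narrow() != stored_fdset_id;
}

/* Small self-check: the wide variant always round-trips the id. */
static void self_check(void)
{
    assert(find_id_wide() == stored_fdset_id);
}
```

A caller that compares the narrowed value against the real id would look up the wrong fdset (or none at all), which matches the removal failure the patch describes.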



Re: [Qemu-devel] Failure to submit patches, two questions - what should I do?

2019-05-27 Thread Thomas Huth
On 26/05/2019 10.09, Lucien Anti-Spam via Qemu-devel wrote:
>
>> On Sunday, May 26, 2019, 4:45:26 PM GMT+9,  wrote:
>> Subject; [Qemu-devel] [PATCH] Incorrect Stack Pointer shadow register
>> support on some m68k CPUs
>> .
>> snip
>> .
>> === OUTPUT BEGIN ===
>>  ERROR: Author email address is mangled by the mailing list
>>  #2:
>>  Author: Lucien Murray-Pitts via Qemu-devel
>>
>>  WARNING: Block comments use a leading /* on a separate line
>>  #46: FILE: target/m68k/cpu.h:465:
>>  +/* The ColdFire core ISA is a RISC-style reduction of the 68000 series
>>
>>  WARNING: Block comments use * on subsequent lines
>>  #47: FILE: target/m68k/cpu.h:466:
>>  +/* The ColdFire core ISA is a RISC-style reduction of the 68000 series
>> +  Whilst the 68000 flourished by adding extended stack/instructions in
>> .
>> snip
> Q1:  Name mangling seems to be a bug, whats going on - how should I be
> submiting now? ( perl script didnt catch it AND there seems to already
> be a patch from half year or more ago ..
> https://patchwork.kernel.org/patch/10662525/ )  whats the correct action here?

It's a problem with your mail provider (yahoo.com), you personally can't
do anything about this (except complaining to your provider or to switch
to another one). See this URL for some details:

https://lists.gnu.org/archive/html/qemu-devel/2017-03/msg05625.html

Unless you are bothered and want to switch your provider, you can ignore
the warning here, it's rather a note to the maintainer that they've got
to adjust the "author" of the patch manually when they pick up the patch.

> Q2:  I am getting a WARNING but I believe it is an exception in this case.
> yes I know it breaks the coding style BUT this coding style was already
> there for these comments. Should I submit this patch with a move to
> the RIGHT coding style? or will this patch be accepted as the code is older
> style?

It's up to the maintainer of the subsystem (Laurent?) - IMHO it's ok to
ask for an exception in this case, but a separate clean-up patch is
certainly also welcome.

 Thomas



Re: [Qemu-devel] [PATCH v4 00/20] monitor: add asynchronous command type

2019-05-27 Thread Markus Armbruster
Marc-André Lureau  writes:

> Hi
>
> On Thu, May 23, 2019 at 9:52 AM Markus Armbruster  wrote:
>> I'm not sure how asynchronous commands could support reconnect and
>> resume.
>
> The same way as current commands, including job commands.

Consider the following scenario: a management application such as
libvirt starts a long-running task with the intent to monitor it until
it finishes.  Half-way through, the management application needs to
disconnect and reconnect for some reason (systemctl restart, or crash &
recover, or whatever).

If the long-running task is a job, the management application can resume
after reconnect: the job's ID is as valid as it was before, and the
commands to query and control the job work as before.

What if it's an asynchronous command?

>> >> I'm ignoring "etc" unless you expand it into something specific.
>> >>
>> >> I'm also not taking the "weird" bait :)
>> >> > The following series implements an async command solution instead. By
>> >> > introducing a session context and a command return handler, it can:
>> >> > - defer the return, allowing the mainloop to reenter
>> >> > - return only to the caller (instead of broadcast events for reply)
>> >> > - optionally allow cancellation when the client is gone
>> >> > - track on-going qapi command(s) per client/session
>> >> >
>> >> > and without introduction of new QMP APIs or client visible change.
>> >>
>> >> What do async commands provide that jobs lack?
>> >>
>> >> Why do we want both?
>> >
>> > They are different things, last we discussed it: jobs are geared
>> > toward block device operations,
>>
>> Historical accident.  We've discussed using them for non-blocky stuff,
>> such as migration.  Of course, discussions are cheap, code is what
>> counts.
>
> Using job API means providing new (& more complex) APIs to client.
>
> The screendump fix here doesn't need new API, it needs new internal
> dispatch of QMP commands: the purpose of this series.
>
> Whenever we can solve things on qemu side, I would rather not
> deprecate current API.

Making a synchronous command asynchronous definitely changes API.

You could still argue the change is easier to handle for QMP clients
than a replacement by a job.

[...]



Re: [Qemu-devel] Failure to submit patches, two questions - what should I do?

2019-05-27 Thread Laurent Vivier

On 27/05/2019 10:13, Thomas Huth wrote:
> On 26/05/2019 10.09, Lucien Anti-Spam via Qemu-devel wrote:
>>> On Sunday, May 26, 2019, 4:45:26 PM GMT+9,  wrote:
>>> Subject; [Qemu-devel] [PATCH] Incorrect Stack Pointer shadow register
>>> support on some m68k CPUs
>>> .
>>> snip
>>> .
>>> === OUTPUT BEGIN ===
>>>  ERROR: Author email address is mangled by the mailing list
>>>  #2:
>>>  Author: Lucien Murray-Pitts via Qemu-devel
>>>
>>>  WARNING: Block comments use a leading /* on a separate line
>>>  #46: FILE: target/m68k/cpu.h:465:
>>>  +/* The ColdFire core ISA is a RISC-style reduction of the 68000 series
>>>
>>>  WARNING: Block comments use * on subsequent lines
>>>  #47: FILE: target/m68k/cpu.h:466:
>>>  +/* The ColdFire core ISA is a RISC-style reduction of the 68000 series
>>> +  Whilst the 68000 flourished by adding extended stack/instructions in
>>> .
>>> snip
>> Q1:  Name mangling seems to be a bug, whats going on - how should I be
>> submiting now? ( perl script didnt catch it AND there seems to already
>> be a patch from half year or more ago ..
>> https://patchwork.kernel.org/patch/10662525/ )  whats the correct action here?
>
> It's a problem with your mail provider (yahoo.com), you personally can't
> do anything about this (except complaining to your provider or to switch
> to another one). See this URL for some details:
>
> https://lists.gnu.org/archive/html/qemu-devel/2017-03/msg05625.html
>
> Unless you are bothered and want to switch your provider, you can ignore
> the warning here, it's rather a note to the maintainer that they've got
> to adjust the "author" of the patch manually when they pick up the patch.
>
>> Q2:  I am getting a WARNING but I believe it is an exception in this case.
>> yes I know it breaks the coding style BUT this coding style was already there
>> for these comments. Should I submit this patch with a move to the RIGHT
>> coding style? or will this patch be accepted as the code is older style?
>
> It's up to the maintainer of the subsystem (Laurent?) - IMHO it's ok to
> ask for an exception in this case, but a separate clean-up patch is
> certainly also welcome.


In this case I thought it was just a missing carriage-return on the 
first line, but in fact we have a missing '*' on every line, so, yes, I 
agree it can stay as-is and a separate clean-up patch can be sent later.


Thanks,
Laurent





Re: [Qemu-devel] [PATCH 0/2]: vmdk: Add read-only support for the new seSparse format

2019-05-27 Thread Sam Eiderman
Gentle ping

> On 12 May 2019, at 11:14, Sam  wrote:
> 
> Gentle ping on "[PATCH 2/2] vmdk: Add read-only support for seSparse 
> snapshots”.
> Yuchenlin reviewed "[PATCH 1/2] vmdk: Fix comment regarding max l1_size 
> coverage”.
> 
> Thanks, Sam
> 
>> On 24 Apr 2019, at 10:48, Sam Eiderman > > wrote:
>> 
>> VMware introduced a new snapshot format in VMFS6 - seSparse (Space
>> Efficient Sparse) which is the default format available in ESXi 6.7.
>> Add read-only support for the new snapshot format.
>> 
> 



Re: [Qemu-devel] [Qemu-block] [PATCH v3 2/8] qapi/block-core: add option for io_uring

2019-05-27 Thread Stefan Hajnoczi
On Mon, May 27, 2019 at 9:04 AM Aarushi Mehta  wrote:
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 7ccbfff9d0..2773803890 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2776,11 +2776,13 @@
>  #
>  # @threads: Use qemu's thread pool
>  # @native:  Use native AIO backend (only Linux and Windows)
> +# @io_uring:Use linux io_uring (only Linux)
>  #
> -# Since: 2.9
> +# Since: 2.9 @iouring Since: 4.1

The convention in QAPI schema files is to mark the newly added parameter:

+# @io_uring:Use linux io_uring (only Linux, since 4.1)



Re: [Qemu-devel] [PATCH v3 3/8] block/block: add BDRV flag for io_uring

2019-05-27 Thread Stefan Hajnoczi
Reviewed-by: Stefan Hajnoczi 



Re: [Qemu-devel] [PATCH v3 5/8] stubs: add stubs for io_uring interface

2019-05-27 Thread Stefan Hajnoczi
On Mon, May 27, 2019 at 9:09 AM Aarushi Mehta  wrote:
>
> Signed-off-by: Aarushi Mehta 

Reviewed-by: Stefan Hajnoczi 



Re: [Qemu-devel] [PATCH] Incorrect Stack Pointer shadow register support on some m68k CPUs

2019-05-27 Thread Laurent Vivier

On 27/05/2019 05:43, Lucien Anti-Spam wrote:


 > On Sunday, May 26, 2019, 10:10:39 PM GMT+9, Laurent Vivier 
 wrote:

 > On 26/05/2019 09:28, Lucien Murray-Pitts wrote:
 >> On CPU32 and the early 68000 and 68010 the ISP doesnt exist.
 >> These CPUs only have SSP/USP.
 >>
[SNIP]
 >> The movec instruction when accessing these shadow registers
 >> in some configurations should issue a TRAP.  This patch does not
 >> add this functionality to the helpers.
 >>

 >I think it's better to also update movec in the same patch.
[LMP] Movec should be undefined (per the ColdFire manual) for registers it
doesn't know.  The MC680X0 manual is less clear.
Technically this could mean just leaving the operation of the instruction
alone and allowing it to pass back MSP/ISP/USP as it currently does.  My
thinking is that this is less likely to break anything.


In fact, the code in m68k_movec_from()/m68k_movec_to() needs rework too,
because it triggers a cpu_abort() on an unknown code. So I think we can
just do as you propose for the moment.


Thanks,
Laurent



Re: [Qemu-devel] [PATCH v11 02/20] gdbstub: Implement deatch (D pkt) with new infra

2019-05-27 Thread Alex Bennée


Jon Doron  writes:

> Signed-off-by: Jon Doron 

Reviewed-by: Alex Bennée 

> ---
>  gdbstub.c | 93 +++
>  1 file changed, 53 insertions(+), 40 deletions(-)
>
> diff --git a/gdbstub.c b/gdbstub.c
> index e6d895177b..307366b250 100644
> --- a/gdbstub.c
> +++ b/gdbstub.c
> @@ -1413,11 +1413,6 @@ static inline int startswith(const char *string, const 
> char *pattern)
>return !strncmp(string, pattern, strlen(pattern));
>  }
>
> -static int process_string_cmd(
> -GDBState *s, void *user_ctx, const char *data,
> -const GdbCmdParseEntry *cmds, int num_cmds)
> -__attribute__((unused));
> -
>  static int process_string_cmd(GDBState *s, void *user_ctx, const char *data,
>const GdbCmdParseEntry *cmds, int num_cmds)
>  {
> @@ -1463,6 +1458,41 @@ static int process_string_cmd(GDBState *s, void 
> *user_ctx, const char *data,
>  return -1;
>  }
>
> +static void handle_detach(GdbCmdContext *gdb_ctx, void *user_ctx)
> +{
> +GDBProcess *process;
> +GDBState *s = gdb_ctx->s;
> +uint32_t pid = 1;
> +
> +if (s->multiprocess) {
> +if (!gdb_ctx->num_params) {
> +put_packet(s, "E22");
> +return;
> +}
> +
> +pid = gdb_ctx->params[0].val_ul;
> +}
> +
> +process = gdb_get_process(s, pid);
> +gdb_process_breakpoint_remove_all(s, process);
> +process->attached = false;
> +
> +if (pid == gdb_get_cpu_pid(s, s->c_cpu)) {
> +s->c_cpu = gdb_first_attached_cpu(s);
> +}
> +
> +if (pid == gdb_get_cpu_pid(s, s->g_cpu)) {
> +s->g_cpu = gdb_first_attached_cpu(s);
> +}
> +
> +if (!s->c_cpu) {
> +/* No more process attached */
> +gdb_syscall_mode = GDB_SYS_DISABLED;
> +gdb_continue(s);
> +}
> +put_packet(s, "OK");
> +}
> +
>  static int gdb_handle_packet(GDBState *s, const char *line_buf)
>  {
>  CPUState *cpu;
> @@ -1477,6 +1507,7 @@ static int gdb_handle_packet(GDBState *s, const char 
> *line_buf)
>  uint8_t *registers;
>  target_ulong addr, len;
>  GDBThreadIdKind thread_kind;
> +const GdbCmdParseEntry *cmd_parser = NULL;
>
>  trace_gdbstub_io_command(line_buf);
>
> @@ -1577,42 +1608,15 @@ static int gdb_handle_packet(GDBState *s, const char 
> *line_buf)
>  error_report("QEMU: Terminated via GDBstub");
>  exit(0);
>  case 'D':
> -/* Detach packet */
> -pid = 1;
> -
> -if (s->multiprocess) {
> -unsigned long lpid;
> -if (*p != ';') {
> -put_packet(s, "E22");
> -break;
> -}
> -
> -if (qemu_strtoul(p + 1, &p, 16, &lpid)) {
> -put_packet(s, "E22");
> -break;
> -}
> -
> -pid = lpid;
> -}
> -
> -process = gdb_get_process(s, pid);
> -gdb_process_breakpoint_remove_all(s, process);
> -process->attached = false;
> -
> -if (pid == gdb_get_cpu_pid(s, s->c_cpu)) {
> -s->c_cpu = gdb_first_attached_cpu(s);
> -}
> -
> -if (pid == gdb_get_cpu_pid(s, s->g_cpu)) {
> -s->g_cpu = gdb_first_attached_cpu(s);
> -}
> -
> -if (s->c_cpu == NULL) {
> -/* No more process attached */
> -gdb_syscall_mode = GDB_SYS_DISABLED;
> -gdb_continue(s);
> +{
> +static const GdbCmdParseEntry detach_cmd_desc = {
> +.handler = handle_detach,
> +.cmd = "D",
> +.cmd_startswith = 1,
> +.schema = "?.l0"
> +};
> +cmd_parser = &detach_cmd_desc;
>  }
> -put_packet(s, "OK");
>  break;
>  case 's':
>  if (*p != '\0') {
> @@ -1985,6 +1989,15 @@ static int gdb_handle_packet(GDBState *s, const char 
> *line_buf)
>  put_packet(s, buf);
>  break;
>  }
> +
> +if (cmd_parser) {
> +/* helper will respond */
> +process_string_cmd(s, NULL, line_buf, cmd_parser, 1);
> +} else {
> +/* unknown command, empty response */
> +put_packet(s, "");
> +}
> +
>  return RS_IDLE;
>  }


--
Alex Bennée



Re: [Qemu-devel] [PATCH v11 02/20] gdbstub: Implement deatch (D pkt) with new infra

2019-05-27 Thread Alex Bennée


Alex Bennée  writes:

> Jon Doron  writes:
>
>> Signed-off-by: Jon Doron 
>
> Reviewed-by: Alex Bennée 

Hmm although I bisected to this patch which fails on:

09:49 alex@zen/x86_64  [linux.git/master@origin] >gdb ./builds/arm64/vmlinux -x 
~/lsrc/qemu.git/tests/guest-debug/test-gdbstub.py
GNU gdb (GDB) 8.3.50.20190424-git
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Executed .gdbinit
Reading symbols from ./builds/arm64/vmlinux...
Traceback (most recent call last):
  File "/home/alex/lsrc/linux.git/builds/arm64/vmlinux-gdb.py", line 30, in 

import linux.config
ImportError: No module named config
Connecting to remote
0x4000 in ?? ()
Checking we can step the first few instructions
warning: Invalid remote reply:

Thread 1 received signal SIGINT, Interrupt.
0x4000 in ?? ()
warning: Invalid remote reply:

Thread 1 received signal SIGINT, Interrupt.
0x4000 in ?? ()
warning: Invalid remote reply:

Thread 1 received signal SIGINT, Interrupt.
0x4000 in ?? ()
FAIL: single step in boot code
Checking HW breakpoint works
Hardware assisted breakpoint 1 at 0xff8010778f0c: file 
/home/alex/lsrc/linux.git/init/main.c, line 1068.
warning: Invalid remote reply:

Thread 1 received signal SIGINT, Interrupt.
0x4000 in ?? ()
0x4000 == {int (void *)} 0xff8010778f0c 
FAIL: hbreak @ kernel_init
Setup catch-all for run_init_process
Breakpoint 2 at 0xff8010083dc4: file /home/alex/lsrc/linux.git/init/main.c, 
line 1009.
Breakpoint 3 at 0xff8010083e10: file /home/alex/lsrc/linux.git/init/main.c, 
line 1020.
Checking Normal breakpoint works
Breakpoint 4 at 0xff801077b300: file 
/home/alex/lsrc/linux.git/kernel/sched/completion.c, line 136.
warning: Invalid remote reply:

Thread 1 received signal SIGINT, Interrupt.
0x4000 in ?? ()
0x4000 == {void (struct completion *)} 0xff801077b300 
 0
FAIL: break @ wait_for_completion
Checking watchpoint works
Hardware access (read/write) watchpoint 5: *(enum system_states 
*)(&system_state)
warning: Invalid remote reply:

Thread 1 received signal SIGINT, Interrupt.
0x4000 in ?? ()
FAIL: awatch for system_state (SYSTEM_BOOTING)
Hardware read watchpoint 6: *(enum system_states *)(&system_state)
warning: Invalid remote reply:

Thread 1 received signal SIGINT, Interrupt.
0x4000 in ?? ()
FAIL: rwatch for system_state (SYSTEM_BOOTING)
Hardware watchpoint 7: *(enum system_states *)(&system_state)
warning: Invalid remote reply:

Thread 1 received signal SIGINT, Interrupt.
0x4000 in ?? ()
FAIL: watch for system_state (SYSTEM_BOOTING)
[Inferior 1 (process 1) killed]


>
>> ---
>>  gdbstub.c | 93 +++
>>  1 file changed, 53 insertions(+), 40 deletions(-)
>>
>> diff --git a/gdbstub.c b/gdbstub.c
>> index e6d895177b..307366b250 100644
>> --- a/gdbstub.c
>> +++ b/gdbstub.c
>> @@ -1413,11 +1413,6 @@ static inline int startswith(const char *string, 
>> const char *pattern)
>>return !strncmp(string, pattern, strlen(pattern));
>>  }
>>
>> -static int process_string_cmd(
>> -GDBState *s, void *user_ctx, const char *data,
>> -const GdbCmdParseEntry *cmds, int num_cmds)
>> -__attribute__((unused));
>> -
>>  static int process_string_cmd(GDBState *s, void *user_ctx, const char *data,
>>const GdbCmdParseEntry *cmds, int num_cmds)
>>  {
>> @@ -1463,6 +1458,41 @@ static int process_string_cmd(GDBState *s, void 
>> *user_ctx, const char *data,
>>  return -1;
>>  }
>>
>> +static void handle_detach(GdbCmdContext *gdb_ctx, void *user_ctx)
>> +{
>> +GDBProcess *process;
>> +GDBState *s = gdb_ctx->s;
>> +uint32_t pid = 1;
>> +
>> +if (s->multiprocess) {
>> +if (!gdb_ctx->num_params) {
>> +put_packet(s, "E22");
>> +return;
>> +}
>> +
>> +pid = gdb_ctx->params[0].val_ul;
>> +}
>> +
>> +process = gdb_get_process(s, pid);
>> +gdb_process_breakpoint_remove_all(s, process);
>> +process->attached = false;
>> +
>> +if (pid == gdb_get_cpu_pid(s, s->c_cpu)) {
>> +s->c_cpu = gdb_first_attached_cpu(s);
>> +}
>> +
>> +if (pid == gdb_get_cpu_pid(s, s->g_cpu)) {
>> +s->g_cpu = gdb_first_attached_cpu(s);
>> +}
>> +
>> +if (!s->c_cpu) {

[Qemu-devel] [Bug 1590322] Re: mouse_button 0 takes back to initial position

2019-05-27 Thread Thomas Huth
Did you mean "mouse_button 0" instead of "mouse_move 0" here?

** Changed in: qemu
   Status: New => Incomplete

https://bugs.launchpad.net/bugs/1590322

Title:
  mouse_button 0 takes back to initial position

Status in QEMU:
  Incomplete

Bug description:
  i wrote a python script to perform some drag function in the Qemu simulator.
  mouse_move x , y
  mouse_button 1
  mouse_move new_x,new_y
  mouse_move 0

  
  The mouse_move 0 doesn't release the mouse in the position new_x,new_y 
instead it takes  it back to the point x,y and then releases the mouse




[Qemu-devel] [Bug 1585533] Re: cache-miss-rate / Invalid JSON

2019-05-27 Thread Thomas Huth
Is there still something to be done for upstream QEMU here? ...
otherwise, I assume we can close this bug now?

** Changed in: qemu
   Status: New => Incomplete

https://bugs.launchpad.net/bugs/1585533

Title:
  cache-miss-rate / Invalid JSON

Status in QEMU:
  Incomplete

Bug description:
  Hi,

  We have VMs which were started with an older version than qemu 2.1
  which added "cache-miss-rate" property for XBZRLECacheStats. While
  trying to migrate the VM to a new host which is running a higher
  version (2.3) of Qemu we got an exception:

  virJSONValueFromString:1642 : internal error: cannot parse json {"return": 
{"expected-downtime": 1, "xbzrle-cache": {"bytes": 0, "cache-size": 67108864, 
"cache-miss-rate": -nan, "pages": 0, "overflow": 0, "cache-miss": 8933}, 
"status": "active", "disk": {"total": 429496729600, "dirty-sync-count": 0, 
"remaining": 193896382464, "mbps": 0, "transferred": 235600347136, "duplicate": 
0, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 0, "normal": 0}, 
"setup-time": 13, "total-time": 1543124, "ram": {"total": 8599183360, 
"dirty-sync-count": 4, "remaining": 30695424, "mbps": 830.636997, 
"transferred": 3100448901, "duplicate": 1358341, "dirty-pages-rate": 7, 
"skipped": 0, "normal-bytes": 3082199040, "normal": 752490}}, "id": 
"libvirt-186200"}: lexical error: malformed number, a digit is required after 
the minus sign.
67108864, "cache-miss-rate": -nan, "pages": 0, "overflow": 0
   (right here) --^

  virNetClientStreamRaiseError:191 : stream aborted at client request

  
  Would it be possible to improve the JSON parser to skip the key if the value 
is incorrect instead of throwing an exception? Then hopefully qemu 2.3 or 
higher is able to handle the data without this property, falling back to its 
default.
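
Editorial illustration of why "-nan" ends up in the reply quoted above. JSON has no representation for NaN or infinity, so a producer that prints a double with a printf-style format must special-case non-finite values. The sketch below is an assumption about one way to do that on the producing side; format_json_number() and cache_miss_rate() are hypothetical names, and this is not claimed to be how QEMU or libvirt resolved the bug.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write a double as a JSON value.
 * Non-finite values (NaN, +/-inf) are not valid JSON numbers,
 * so they are emitted as null instead. */
static int format_json_number(char *buf, size_t len, double v)
{
    if (!isfinite(v)) {
        return snprintf(buf, len, "null");
    }
    return snprintf(buf, len, "%g", v);
}

/* Hypothetical rate computation that avoids 0/0 (which yields NaN)
 * by reporting 0 when nothing has been looked up yet. */
static double cache_miss_rate(uint64_t misses, uint64_t lookups)
{
    return lookups ? (double)misses / (double)lookups : 0.0;
}

/* Small self-check: a non-finite input never reaches the output as text. */
static void self_check(void)
{
    char buf[32];

    format_json_number(buf, sizeof(buf), NAN);
    assert(strcmp(buf, "null") == 0);
}
```

Either guard alone would keep the reply parseable: the first sanitises at serialisation time, the second avoids producing the NaN in the first place.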




Re: [Qemu-devel] [PATCH v11 02/20] gdbstub: Implement deatch (D pkt) with new infra

2019-05-27 Thread Alex Bennée


Alex Bennée  writes:

> Alex Bennée  writes:
>
>> Jon Doron  writes:
>>
>>> Signed-off-by: Jon Doron 
>>
>> Reviewed-by: Alex Bennée 
>
> Hmm although I bisected to this patch which fails on:
>
> 09:49 alex@zen/x86_64  [linux.git/master@origin] >gdb ./builds/arm64/vmlinux 
> -x ~/lsrc/qemu.git/tests/guest-debug/test-gdbstub.py

> Connecting to remote
> 0x4000 in ?? ()
> Checking we can step the first few instructions
> warning: Invalid remote reply:

>>>  }
>>> +
>>> +if (cmd_parser) {
>>> +/* helper will respond */
>>> +process_string_cmd(s, NULL, line_buf, cmd_parser, 1);
>>> +} else {
>>> +/* unknown command, empty response */
>>> +put_packet(s, "");
>>> +}
>>> +

We can't default to this empty response until we have converted the
table otherwise we get strangeness and double responses.

>>>  return RS_IDLE;
>>>  }


--
Alex Bennée
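
Editorial illustration of the double response described above. This is a toy model, not gdbstub.c: put_packet_model(), the 'x' command and both dispatch functions are hypothetical. It assumes that a not-yet-converted case replies inline inside the switch, which is what makes an unconditional empty reply after the switch send a second packet. The guarded variant is only one possible way to avoid that and is not necessarily what the series ended up doing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static int packets_sent;   /* number of replies sent in this toy model */

static void put_packet_model(const char *reply)
{
    (void)reply;
    packets_sent++;
}

typedef void (*cmd_handler)(void);

static void handle_detach_model(void)
{
    put_packet_model("OK");
}

/* Shape of the hunk under review: anything without a handler gets "". */
static void handle_packet_unconditional(char cmd)
{
    cmd_handler handler = NULL;

    switch (cmd) {
    case 'D':                       /* converted: handler replies later */
        handler = handle_detach_model;
        break;
    case 'x':                       /* hypothetical legacy case: replies inline */
        put_packet_model("OK");
        break;
    default:
        break;
    }

    if (handler) {
        handler();
    } else {
        put_packet_model("");       /* also runs for 'x': second reply */
    }
}

/* One possible guard: remember whether a legacy case already replied. */
static void handle_packet_guarded(char cmd)
{
    cmd_handler handler = NULL;
    bool replied = false;

    switch (cmd) {
    case 'D':
        handler = handle_detach_model;
        break;
    case 'x':
        put_packet_model("OK");
        replied = true;
        break;
    default:
        break;
    }

    if (handler) {
        handler();
    } else if (!replied) {
        put_packet_model("");       /* only for genuinely unknown commands */
    }
}

/* Small self-check: a converted command replies exactly once either way. */
static void self_check(void)
{
    packets_sent = 0;
    handle_packet_unconditional('D');
    assert(packets_sent == 1);
}
```

In the model, the legacy command produces two packets with the unconditional fallback and one with the guard, while converted and unknown commands produce exactly one in both variants.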



Re: [Qemu-devel] [PATCH v3 0/8] Add support for io_uring

2019-05-27 Thread Stefan Hajnoczi
On Mon, May 27, 2019 at 01:25:56AM -0700, no-re...@patchew.org wrote:
> Patchew URL: 
> https://patchew.org/QEMU/20190527080327.10780-1-mehta.aar...@gmail.com/
> 
> 
> 
> Hi,
> 
> This series seems to have some coding style problems. See output below for
> more information:
> 
> Subject: [Qemu-devel] [PATCH v3 0/8] Add support for io_uring
> Type: series
> Message-id: 20190527080327.10780-1-mehta.aar...@gmail.com
> 
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
> git rev-parse base > /dev/null || exit 0
> git config --local diff.renamelimit 0
> git config --local diff.renames True
> git config --local diff.algorithm histogram
> ./scripts/checkpatch.pl --mailback base..
> === TEST SCRIPT END ===
> 
> Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
> Switched to a new branch 'test'
> 75fc7f1 block/fileposix: extend to use io_uring
> d03ae39 blockdev: accept io_uring as option
> cae30ee util/async: add aio interfaces for io_uring
> f3be807 stubs: add stubs for io_uring interface
> 85c03de block/io_uring: implements interfaces for io_uring
> 5c4a14a block/block: add BDRV flag for io_uring
> 9a6594d qapi/block-core: add option for io_uring
> 460c72d configure: permit use of io_uring
> 
> === OUTPUT BEGIN ===
> 1/8 Checking commit 460c72d1a8df (configure: permit use of io_uring)
> 2/8 Checking commit 9a6594daa76c (qapi/block-core: add option for io_uring)
> 3/8 Checking commit 5c4a14a301f5 (block/block: add BDRV flag for io_uring)
> 4/8 Checking commit 85c03de16186 (block/io_uring: implements interfaces for 
> io_uring)
> WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
> #49: 
> new file mode 100644
> 
> ERROR: space required before the open parenthesis '('
> #196: FILE: block/io_uring.c:143:
> +while(!s->io_q.in_queue) {
> 
> ERROR: trailing whitespace
> #209: FILE: block/io_uring.c:156:
> +if (ret <= 0) { $
> 
> total: 2 errors, 1 warnings, 387 lines checked
> 
> Patch 4/8 has style problems, please review.  If any of these errors
> are false positives report them to the maintainer, see
> CHECKPATCH in MAINTAINERS.
> 
> 5/8 Checking commit f3be80708ad1 (stubs: add stubs for io_uring interface)
> WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
> #35: 
> new file mode 100644
> 
> total: 0 errors, 1 warnings, 46 lines checked
> 
> Patch 5/8 has style problems, please review.  If any of these errors
> are false positives report them to the maintainer, see
> CHECKPATCH in MAINTAINERS.
> 6/8 Checking commit cae30ee1388f (util/async: add aio interfaces for io_uring)
> 7/8 Checking commit d03ae39c331c (blockdev: accept io_uring as option)
> 8/8 Checking commit 75fc7f1d8a3e (block/fileposix: extend to use io_uring)
> === OUTPUT END ===

Hi Aarushi,
I use this git hook to identify checkpatch.pl issues at git-commit(1)
time:
http://blog.vmsplice.net/2011/03/how-to-automatically-run-checkpatchpl.html

This way I don't need to resend patch series because the issues were
already taken care of earlier in the development process.

Stefan




Re: [Qemu-devel] [PATCH v4 00/20] monitor: add asynchronous command type

2019-05-27 Thread Gerd Hoffmann
On Mon, May 27, 2019 at 10:18:42AM +0200, Markus Armbruster wrote:
> Marc-André Lureau  writes:
> 
> > Hi
> >
> > On Thu, May 23, 2019 at 9:52 AM Markus Armbruster  wrote:
> >> I'm not sure how asynchronous commands could support reconnect and
> >> resume.
> >
> > The same way as current commands, including job commands.
> 
> Consider the following scenario: a management application such as
> libvirt starts a long-running task with the intent to monitor it until
> it finishes.  Half-way through, the management application needs to
> disconnect and reconnect for some reason (systemctl restart, or crash &
> recover, or whatever).
> 
> If the long-running task is a job, the management application can resume
> after reconnect: the job's ID is as valid as it was before, and the
> commands to query and control the job work as before.
> 
> What if it's an asynchronous command?

This is not meant for some long-running job which you have to manage.

Allowing commands to be asynchronous makes sense for things which (a)
typically don't take long, and (b) don't need any management.

So, if the connection goes down the job is simply canceled, and after
reconnecting the management can simply send the same command again.

> > Whenever we can solve things on qemu side, I would rather not
> > deprecate current API.
> 
> Making a synchronous command asynchronous definitely changes API.

Inside qemu yes, sure.  But for the QMP client nothing changes.

cheers,
  Gerd




[Qemu-devel] [PATCH] virtio-gpu: add sanity check

2019-05-27 Thread Gerd Hoffmann
Require a minimum 16x16 size for the scanout, to make sure the guest
can't set either width or height to zero.  A zero-sized scanout (a)
doesn't make sense at all and (b) causes problems in some UI code.  When
using spice it triggers an assert().

Reported-by: Tyler Slabinski 
Signed-off-by: Gerd Hoffmann 
---
 hw/display/virtio-gpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 9e37e0ac96b7..372b31ef0af2 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -677,6 +677,8 @@ static void virtio_gpu_set_scanout(VirtIOGPU *g,
 
 if (ss.r.x > res->width ||
 ss.r.y > res->height ||
+ss.r.width < 16 ||
+ss.r.height < 16 ||
 ss.r.width > res->width ||
 ss.r.height > res->height ||
 ss.r.x + ss.r.width > res->width ||
-- 
2.18.1
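
The check the patch extends can be read as a standalone predicate. The
following is only a sketch under assumptions: struct rect,
SCANOUT_MIN_SIZE and scanout_rect_valid() are hypothetical names, not
QEMU identifiers, and the sums are widened to 64 bits here to rule out
wrap-around, which the hunk above does not do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SCANOUT_MIN_SIZE 16  /* minimum width/height taken from the patch */

/* Hypothetical stand-in for the guest-supplied scanout rectangle. */
struct rect {
    uint32_t x, y, width, height;
};

/*
 * Return true if the rectangle is an acceptable scanout for a resource of
 * res_w x res_h pixels: at least 16x16 and fully inside the resource.
 */
static bool scanout_rect_valid(const struct rect *r,
                               uint32_t res_w, uint32_t res_h)
{
    if (r->x > res_w || r->y > res_h) {
        return false;
    }
    if (r->width < SCANOUT_MIN_SIZE || r->height < SCANOUT_MIN_SIZE) {
        return false;  /* also rejects zero width or height */
    }
    if (r->width > res_w || r->height > res_h) {
        return false;
    }
    /* Widen before adding so the sum cannot wrap around. */
    if ((uint64_t)r->x + r->width > res_w ||
        (uint64_t)r->y + r->height > res_h) {
        return false;
    }
    return true;
}
```

A caller would reject the command with an invalid-parameter error
whenever the predicate returns false.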




Re: [Qemu-devel] [PATCH] tests/docker: Update the Fedora image to Fedora 30

2019-05-27 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20190504060336.21060-1-phi...@redhat.com/



Hi,

This series failed build test on s390x host. Please find the details below.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
# Testing script will be invoked under the git checkout with
# HEAD pointing to a commit that has the patches applied on top of "base"
# branch
set -e
CC=$HOME/bin/cc
INSTALL=$PWD/install
BUILD=$PWD/build
mkdir -p $BUILD $INSTALL
SRC=$PWD
cd $BUILD
$SRC/configure --cc=$CC --prefix=$INSTALL
make -j4
# XXX: we need reliable clean up
# make check -j4 V=1
make install

echo
echo "=== ENV ==="
env

echo
echo "=== PACKAGES ==="
rpm -qa
=== TEST SCRIPT END ===

inlined from ‘fill_psinfo’ at 
/var/tmp/patchew-tester-tmp-3g8u4fv2/src/linux-user/elfload.c:3208:12,
inlined from ‘fill_note_info’ at 
/var/tmp/patchew-tester-tmp-3g8u4fv2/src/linux-user/elfload.c:3390:5,
inlined from ‘elf_core_dump’ at 
/var/tmp/patchew-tester-tmp-3g8u4fv2/src/linux-user/elfload.c:3539:9:
/usr/include/bits/string_fortified.h:106:10: error: ‘__builtin_strncpy’ 
specified bound 16 equals destination size [-Werror=stringop-truncation]
  106 |   return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest));
  |  ^~
cc1: all warnings being treated as errors


The full log is available at
http://patchew.org/logs/20190504060336.21060-1-phi...@redhat.com/testing.s390x/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

[Qemu-devel] [PATCH] block/linux-aio: explicitly clear laiocb->co

2019-05-27 Thread Stefan Hajnoczi
qemu_aio_get() does not zero allocated memory.  Explicitly initialize
laiocb->co to prevent an uninitialized memory access in
qemu_laio_process_completion().

Note that this bug has never manifested itself.  I guess we're lucky!

Signed-off-by: Stefan Hajnoczi 
---
I challenge you to find a place where laiocb->co is initialized and then
we can drop this patch.  I've double-checked and cannot find it...

 block/linux-aio.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index d4b61fb251..a097653be6 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -440,6 +440,7 @@ BlockAIOCB *laio_submit(BlockDriverState *bs, LinuxAioState 
*s, int fd,
 int ret;
 
 laiocb = qemu_aio_get(&laio_aiocb_info, bs, cb, opaque);
+laiocb->co = NULL;
 laiocb->nbytes = nb_sectors * BDRV_SECTOR_SIZE;
 laiocb->ctx = s;
 laiocb->ret = -EINPROGRESS;
-- 
2.21.0




Re: [Qemu-devel] [PATCH v3 4/8] block/io_uring: implements interfaces for io_uring

2019-05-27 Thread Stefan Hajnoczi
On Mon, May 27, 2019 at 01:33:23PM +0530, Aarushi Mehta wrote:
> +static void qemu_luring_process_completions(LuringState *s)
> +{
> +struct io_uring_cqe *cqes;
> +/*
> + * Request completion callbacks can run the nested event loop.
> + * Schedule ourselves so the nested event loop will "see" remaining
> + * completed requests and process them.  Without this, completion
> + * callbacks that wait for other requests using a nested event loop
> + * would hang forever.
> + */
> +qemu_bh_schedule(s->completion_bh);
> +
> +while (!io_uring_peek_cqe(&s->ring, &cqes)) {
> +io_uring_cqe_seen(&s->ring, cqes);

The kernel may overwrite the cqe once we've marked it seen.  Therefore
the cqe must only be marked seen after the last access to it.  This is
analogous to a use-after-free bug: we're not allowed to access fields of
an object after it has been freed.

The place to do so is...

> +
> +LuringAIOCB *luringcb = io_uring_cqe_get_data(cqes);
> +luringcb->ret = io_cqe_ret(cqes);

...here:

  io_uring_cqe_seen(&s->ring, cqes);
  cqes = NULL; /* returned to ring, don't access it anymore */
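
The ordering rule can be tried out without liburing. This is a minimal
sketch, assuming a mock single-slot ring: mock_ring, mock_peek_cqe() and
mock_cqe_seen() are hypothetical stand-ins, and poisoning the slot only
models the kernel reusing it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the completion ring; this is not liburing. */
struct mock_cqe {
    void *user_data;
    int res;
};

struct mock_ring {
    struct mock_cqe slot;  /* a single-entry ring keeps the sketch small */
    int pending;           /* 1 while slot holds an unconsumed entry */
};

struct request {
    int ret;
};

/* Peek at the next completion without consuming it; 0 on success. */
static int mock_peek_cqe(struct mock_ring *r, struct mock_cqe **cqe)
{
    if (!r->pending) {
        *cqe = NULL;
        return -1;
    }
    *cqe = &r->slot;
    return 0;
}

/* Hand the slot back.  Poisoning models the producer overwriting it. */
static void mock_cqe_seen(struct mock_ring *r, struct mock_cqe *cqe)
{
    cqe->user_data = NULL;
    cqe->res = -9999;
    r->pending = 0;
}

/* Drain the ring and return the number of completions processed. */
static int process_completions(struct mock_ring *r)
{
    struct mock_cqe *cqe;
    int n = 0;

    while (mock_peek_cqe(r, &cqe) == 0) {
        /* Copy everything we need out of the entry first ... */
        struct request *req = cqe->user_data;
        int res = cqe->res;

        /* ... and only then return it to the ring. */
        mock_cqe_seen(r, cqe);
        cqe = NULL;  /* returned to ring, don't access it anymore */

        req->ret = res;
        n++;
    }
    return n;
}
```

Calling mock_cqe_seen() before the two reads would make the sketch
dereference a NULL user_data pointer, which is the mock's version of the
use-after-free described above.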

> +if (luringcb->co) {
> +/*
> + * If the coroutine is already entered it must be in ioq_submit()
> + * and will notice luringcb->ret has been filled in when it
> + * eventually runs later. Coroutines cannot be entered 
> recursively
> + * so avoid doing that!
> + */
> +if (!qemu_coroutine_entered(luringcb->co)) {
> +aio_co_wake(luringcb->co);
> +}
> +} else {
> +luringcb->common.cb(luringcb->common.opaque, luringcb->ret);
> +qemu_aio_unref(luringcb);
> +}
> +/* Change counters one-by-one because we can be nested. */
> +s->io_q.in_flight--;

This counter must be decremented before invoking luringcb's callback.
That way the nested event loop doesn't consider this completed request
in flight anymore.
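
A small sketch of why the order matters, with hypothetical names
(queue_state, record_in_flight, complete_one): the callback stands in for
a nested event loop that looks at the counter while it runs.

```c
#include <assert.h>

/* Hypothetical queue bookkeeping; only the counter matters here. */
struct queue_state {
    int in_flight;         /* requests submitted but not yet completed */
    int seen_by_callback;  /* what the callback observed, for inspection */
};

typedef void completion_fn(struct queue_state *q);

/*
 * Stand-in for a completion callback that runs a nested event loop: it
 * records how many requests still look in flight while it executes.
 */
static void record_in_flight(struct queue_state *q)
{
    q->seen_by_callback = q->in_flight;
}

/* Complete one request: update the counter first, then run the callback. */
static void complete_one(struct queue_state *q, completion_fn *cb)
{
    q->in_flight--;  /* the request is done before any nested code runs */
    cb(q);
}
```

With the decrement placed after cb(q), the callback would still see the
completed request counted as in flight.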

> +static void ioq_submit(LuringState *s)
> +{
> +int ret;
> +LuringAIOCB *luringcb, *luringcb_next;
> +
> +while(!s->io_q.in_queue) {

Should this be while (s->io_q.in_queue > 0)?

> +QSIMPLEQ_FOREACH_SAFE(luringcb, &s->io_q.sq_overflow, next,
> +  luringcb_next) {
> +struct io_uring_sqe *sqes = io_uring_get_sqe(&s->ring);
> +if (!sqes) {
> +break;
> +}
> +/* Prep sqe for submission */
> +*sqes = luringcb->sqeq;
> +io_uring_sqe_set_data(sqes, luringcb);

This is unnecessary: the data field has already been set in
luring_do_submit() and copied to *sqes in the previous line.

> +BlockAIOCB *luring_submit(BlockDriverState *bs, LuringState *s, int fd,
> +int64_t sector_num, QEMUIOVector *qiov, BlockCompletionFunc *cb,
> +void *opaque, int type)
> +{
> +LuringAIOCB *luringcb;
> +off_t offset = sector_num * BDRV_SECTOR_SIZE;
> +int ret;
> +
> +luringcb = qemu_aio_get(&luring_aiocb_info, bs, cb, opaque);
> +luringcb->ret = -EINPROGRESS;

luringcb isn't zeroed by qemu_aio_get().  luringcb->co must be
explicitly set to NULL to prevent undefined behavior in
qemu_luring_process_completions() (uninitialized memory access).

  luringcb->co = NULL;

By the way, this bug originates from linux-aio.c.  I have sent a patch
to fix it there!
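
The pattern in isolation, as a sketch under assumptions: aiocb_get() is a
hypothetical allocator that, like the one discussed here, hands back
memory without zeroing it (the 0xAA fill just makes stale contents
visible), and struct aiocb is a reduced stand-in for the real control
block.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Reduced, hypothetical control block: just the two fields of interest. */
struct aiocb {
    void *co;  /* coroutine to wake, or NULL for callback-style requests */
    int ret;
};

/* Hypothetical allocator that does not zero what it returns. */
static struct aiocb *aiocb_get(void)
{
    struct aiocb *acb = malloc(sizeof(*acb));

    if (acb) {
        memset(acb, 0xAA, sizeof(*acb));  /* simulate stale contents */
    }
    return acb;
}

/* Submit path: every field read later must be written here. */
static struct aiocb *aiocb_submit(void)
{
    struct aiocb *acb = aiocb_get();

    if (!acb) {
        return NULL;
    }
    acb->co = NULL;            /* explicit: the allocator did not clear it */
    acb->ret = -EINPROGRESS;
    return acb;
}
```

Without the explicit assignment, a later "if (acb->co)" test would read
whatever the allocator left behind.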

> +ret = luring_do_submit(fd, luringcb, s, offset, qiov, type);
> +if (ret < 0) {
> +qemu_aio_unref(luringcb);
> +return NULL;
> +}
> +
> +return &luringcb->common;
> +}
> +
> +void luring_detach_aio_context(LuringState *s, AioContext *old_context)
> +{
> +aio_set_fd_handler(old_context, s->ring.ring_fd, false, NULL, NULL, NULL,
> +   &s);
> +qemu_bh_delete(s->completion_bh);
> +s->aio_context = NULL;
> +}
> +
> +void luring_attach_aio_context(LuringState *s, AioContext *new_context)
> +{
> +s->aio_context = new_context;
> +s->completion_bh = aio_bh_new(new_context, qemu_luring_completion_bh, s);
> +aio_set_fd_handler(s->aio_context, s->ring.ring_fd, false,
> +   qemu_luring_completion_cb, NULL, NULL, &s);
> +}
> +
> +LuringState *luring_init(Error **errp)
> +{
> +int rc;
> +LuringState *s;
> +s = g_malloc0(sizeof(*s));
> +struct io_uring *ring = &s->ring;
> +rc =  io_uring_queue_init(MAX_EVENTS, ring, 0);
> +if (rc == -1) {
> +error_setg_errno(errp, -rc, "failed to init linux io_uring ring");

Why was this changed from error_setg_errno(errp, errno, "failed to init
linux io_uring ring") to -rc in v3?

rc is -1 here, not an errno value, so the error message will be
incorrect.
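
To make the difference concrete, here is a sketch assuming an init
function that follows the "return -1 and set errno" convention.
mock_queue_init() and try_init() are hypothetical; nothing here claims
how io_uring_queue_init() itself reports errors.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical init function: returns -1 and sets errno on failure. */
static int mock_queue_init(int fail_with)
{
    if (fail_with) {
        errno = fail_with;
        return -1;
    }
    return 0;
}

/* Return 0 on success, or the positive errno value worth reporting. */
static int try_init(int fail_with)
{
    int rc = mock_queue_init(fail_with);

    if (rc == -1) {
        /*
         * Report errno.  Reporting -rc would always yield 1, which is
         * EPERM on Linux, whatever the real cause was.
         */
        return errno;
    }
    return 0;
}
```

The returned value is what would be passed as the errno argument of the
error-reporting call.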




[Qemu-devel] [PATCH 0/2] Deferred incoming migration through fd

2019-05-27 Thread Yury Kotov
Hi,

This series is a continuation of the previous two:
* migration: Fix handling fd protocol
* Add 'inline-fd:' protocol for migration

It addresses the following use case:
1. Target VM: exec ...,-incoming defer
2. Target VM: getfd("fd-mig")
3. Target VM: migrate-incoming("fd:fd-mig")
4. Source VM: getfd("fd-mig")
5. Source VM: migrate("fd-mig")

Currently, step 3 is not possible because, for incoming migration, the
"fd:" protocol expects an integer, not the name of an fd.

Yury Kotov (2):
  migration: Fix fd protocol for incoming defer
  migration-test: Add a test for fd protocol

 migration/fd.c |   8 +--
 migration/fd.h |   2 +-
 tests/libqtest.c   |  83 ++--
 tests/libqtest.h   |  51 +++-
 tests/migration-test.c | 107 +++--
 5 files changed, 239 insertions(+), 12 deletions(-)

-- 
2.21.0




[Qemu-devel] [PATCH 1/2] migration: Fix fd protocol for incoming defer

2019-05-27 Thread Yury Kotov
Currently, incoming migration through fd supports only the command-line
case, e.g.:
fork();
fd = open();
exec("qemu ... -incoming fd:%d", fd);

It's possible to use the add-fd command to pass an fd for migration, but
that is an invalid case: add-fd works with fdsets, not with particular fds.

To work with getfd in the incoming defer case, it's enough to use
monitor_fd_param instead of strtol. monitor_fd_param supports both cases:
* fd:123
* fd:fd_name (added by getfd).

Signed-off-by: Yury Kotov 
---
 migration/fd.c | 8 +---
 migration/fd.h | 2 +-
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/migration/fd.c b/migration/fd.c
index a7c13df4ad..0a29ecdebf 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -52,12 +52,14 @@ static gboolean fd_accept_incoming_migration(QIOChannel 
*ioc,
 return G_SOURCE_REMOVE;
 }
 
-void fd_start_incoming_migration(const char *infd, Error **errp)
+void fd_start_incoming_migration(const char *fdname, Error **errp)
 {
 QIOChannel *ioc;
-int fd;
+int fd = monitor_fd_param(cur_mon, fdname, errp);
+if (fd == -1) {
+return;
+}
 
-fd = strtol(infd, NULL, 0);
 trace_migration_fd_incoming(fd);
 
 ioc = qio_channel_new_fd(fd, errp);
diff --git a/migration/fd.h b/migration/fd.h
index a14a63ce2e..b901bc014e 100644
--- a/migration/fd.h
+++ b/migration/fd.h
@@ -16,7 +16,7 @@
 
 #ifndef QEMU_MIGRATION_FD_H
 #define QEMU_MIGRATION_FD_H
-void fd_start_incoming_migration(const char *path, Error **errp);
+void fd_start_incoming_migration(const char *fdname, Error **errp);
 
 void fd_start_outgoing_migration(MigrationState *s, const char *fdname,
  Error **errp);
-- 
2.21.0
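
The two accepted forms, "fd:123" and "fd:fd_name", can be illustrated
with a small resolver. This is a sketch only and not the
monitor_fd_param() implementation: resolve_fd_param() and struct named_fd
are hypothetical, and the table stands in for the fds previously
registered with getfd.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical registry entry: a name given via getfd and its fd. */
struct named_fd {
    const char *name;
    int fd;
};

/*
 * Resolve the part after "fd:".  A leading digit means a literal fd
 * number, anything else is looked up by name.  Returns -1 on failure.
 */
static int resolve_fd_param(const char *param,
                            const struct named_fd *table, size_t n)
{
    size_t i;

    if (isdigit((unsigned char)param[0])) {
        char *end;
        long v;

        errno = 0;
        v = strtol(param, &end, 10);
        if (errno || *end != '\0' || v > INT_MAX) {
            return -1;  /* trailing garbage or out of range */
        }
        return (int)v;
    }

    for (i = 0; i < n; i++) {
        if (strcmp(table[i].name, param) == 0) {
            return table[i].fd;
        }
    }
    return -1;  /* unknown name */
}
```

A bare strtol(), as in the removed line, would turn "fd-mig" into 0
without reporting any error, which is why the name form could not work
before.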




[Qemu-devel] New HAXM maintainer

2019-05-27 Thread Yu Ning

Hello,

I am leaving Intel, and will soon lose access to my Intel email 
accounts. Effective today, I am no longer maintainer of the HAXM open 
source project (https://github.com/intel/haxm). Colin Xu (colin DOT xu 
AT intel DOT com) will take my place, and he will be helped by Henry 
Yuan (hang DOT yuan AT intel DOT com) as well as the rest of the HAXM 
team at Intel (team email: haxm DASH team AT intel DOT com).


I am grateful for all the support this community has given to HAXM and 
myself over the past few years. I would appreciate your continued 
support for the project and the Intel HAXM team.


Thanks,
Yu



[Qemu-devel] [PATCH 2/2] migration-test: Add a test for fd protocol

2019-05-27 Thread Yury Kotov
Signed-off-by: Yury Kotov 
---
 tests/libqtest.c   |  83 ++--
 tests/libqtest.h   |  51 +++-
 tests/migration-test.c | 107 +++--
 3 files changed, 233 insertions(+), 8 deletions(-)

diff --git a/tests/libqtest.c b/tests/libqtest.c
index 8ac0c02af4..de8468d213 100644
--- a/tests/libqtest.c
+++ b/tests/libqtest.c
@@ -32,6 +32,7 @@
 
 #define MAX_IRQ 256
 #define SOCKET_TIMEOUT 50
+#define SOCKET_MAX_FDS 16
 
 QTestState *global_qtest;
 
@@ -391,6 +392,43 @@ static void GCC_FMT_ATTR(2, 3) qtest_sendf(QTestState *s, 
const char *fmt, ...)
 va_end(ap);
 }
 
+static void socket_send_fds(int fd, int *fds, size_t fds_num,
+const char *buf, size_t buf_size)
+{
+#ifndef WIN32
+ssize_t ret;
+struct msghdr msg = { 0 };
+char control[CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS)] = { 0 };
+size_t fdsize = sizeof(int) * fds_num;
+struct cmsghdr *cmsg;
+struct iovec iov = { .iov_base = (char *)buf, .iov_len = buf_size };
+
+msg.msg_iov = &iov;
+msg.msg_iovlen = 1;
+
+if (fds && fds_num > 0) {
+g_assert_cmpuint(fds_num, <, SOCKET_MAX_FDS);
+
+msg.msg_control = control;
+msg.msg_controllen = CMSG_SPACE(fdsize);
+
+cmsg = CMSG_FIRSTHDR(&msg);
+cmsg->cmsg_len = CMSG_LEN(fdsize);
+cmsg->cmsg_level = SOL_SOCKET;
+cmsg->cmsg_type = SCM_RIGHTS;
+memcpy(CMSG_DATA(cmsg), fds, fdsize);
+}
+
+do {
+ret = sendmsg(fd, &msg, 0);
+} while (ret < 0 && errno == EINTR);
+g_assert_cmpint(ret, >, 0);
+#else
+g_test_skip("sendmsg is not supported under Win32");
+return;
+#endif
+}
+
 static GString *qtest_recv_line(QTestState *s)
 {
 GString *line;
@@ -545,7 +583,8 @@ QDict *qtest_qmp_receive(QTestState *s)
  * in the case that they choose to discard all replies up until
  * a particular EVENT is received.
  */
-void qmp_fd_vsend(int fd, const char *fmt, va_list ap)
+void qmp_fd_vsend_fds(int fd, int *fds, size_t fds_num,
+  const char *fmt, va_list ap)
 {
 QObject *qobj;
 
@@ -569,25 +608,49 @@ void qmp_fd_vsend(int fd, const char *fmt, va_list ap)
 fprintf(stderr, "%s", str);
 }
 /* Send QMP request */
-socket_send(fd, str, qstring_get_length(qstr));
+if (fds && fds_num > 0) {
+socket_send_fds(fd, fds, fds_num, str, qstring_get_length(qstr));
+} else {
+socket_send(fd, str, qstring_get_length(qstr));
+}
 
 qobject_unref(qstr);
 qobject_unref(qobj);
 }
 }
 
+void qmp_fd_vsend(int fd, const char *fmt, va_list ap)
+{
+qmp_fd_vsend_fds(fd, NULL, 0, fmt, ap);
+}
+
+void qtest_qmp_vsend_fds(QTestState *s, int *fds, size_t fds_num,
+ const char *fmt, va_list ap)
+{
+qmp_fd_vsend_fds(s->qmp_fd, fds, fds_num, fmt, ap);
+}
+
 void qtest_qmp_vsend(QTestState *s, const char *fmt, va_list ap)
 {
-qmp_fd_vsend(s->qmp_fd, fmt, ap);
+qmp_fd_vsend_fds(s->qmp_fd, NULL, 0, fmt, ap);
 }
 
 QDict *qmp_fdv(int fd, const char *fmt, va_list ap)
 {
-qmp_fd_vsend(fd, fmt, ap);
+qmp_fd_vsend_fds(fd, NULL, 0, fmt, ap);
 
 return qmp_fd_receive(fd);
 }
 
+QDict *qtest_vqmp_fds(QTestState *s, int *fds, size_t fds_num,
+  const char *fmt, va_list ap)
+{
+qtest_qmp_vsend_fds(s, fds, fds_num, fmt, ap);
+
+/* Receive reply */
+return qtest_qmp_receive(s);
+}
+
 QDict *qtest_vqmp(QTestState *s, const char *fmt, va_list ap)
 {
 qtest_qmp_vsend(s, fmt, ap);
@@ -616,6 +679,18 @@ void qmp_fd_send(int fd, const char *fmt, ...)
 va_end(ap);
 }
 
+QDict *qtest_qmp_fds(QTestState *s, int *fds, size_t fds_num,
+ const char *fmt, ...)
+{
+va_list ap;
+QDict *response;
+
+va_start(ap, fmt);
+response = qtest_vqmp_fds(s, fds, fds_num, fmt, ap);
+va_end(ap);
+return response;
+}
+
 QDict *qtest_qmp(QTestState *s, const char *fmt, ...)
 {
 va_list ap;
diff --git a/tests/libqtest.h b/tests/libqtest.h
index a98ea15b7d..e61ebaced1 100644
--- a/tests/libqtest.h
+++ b/tests/libqtest.h
@@ -84,6 +84,21 @@ QTestState *qtest_init_with_serial(const char *extra_args, 
int *sock_fd);
  */
 void qtest_quit(QTestState *s);
 
+/**
+ * qtest_qmp_fds:
+ * @s: #QTestState instance to operate on.
+ * @fds: array of file descriptors
+ * @fds_num: number of elements in @fds
+ * @fmt...: QMP message to send to qemu, formatted like
+ * qobject_from_jsonf_nofail().  See parse_escape() for what's
+ * supported after '%'.
+ *
+ * Sends a QMP message to QEMU with fds and returns the response.
+ */
+QDict *qtest_qmp_fds(QTestState *s, int *fds, size_t fds_num,
+ const char *fmt, ...)
+GCC_FMT_ATTR(4, 5);
+
 /**
  * qtest_qmp:
  * @s: #QTestState instance to operate on.
@@ -120,7 +135,23 @@ void qtest_qmp_send_raw(QTestState *s, const char *fmt, 
...)
 GCC_FMT_ATTR(2, 3);
 
 /**
-

Re: [Qemu-devel] [PATCH 2/2] migration-test: Add a test for fd protocol

2019-05-27 Thread Yury Kotov



27.05.2019, 12:35, "Yury Kotov" :
> Signed-off-by: Yury Kotov 
> ---
>  tests/libqtest.c | 83 ++--
>  tests/libqtest.h | 51 +++-
>  tests/migration-test.c | 107 +++--
>  3 files changed, 233 insertions(+), 8 deletions(-)
>
> diff --git a/tests/libqtest.c b/tests/libqtest.c
> index 8ac0c02af4..de8468d213 100644
> --- a/tests/libqtest.c
> +++ b/tests/libqtest.c
> @@ -32,6 +32,7 @@
>
>  #define MAX_IRQ 256
>  #define SOCKET_TIMEOUT 50
> +#define SOCKET_MAX_FDS 16
>
>  QTestState *global_qtest;
>
> @@ -391,6 +392,43 @@ static void GCC_FMT_ATTR(2, 3) qtest_sendf(QTestState 
> *s, const char *fmt, ...)
>  va_end(ap);
>  }
>
> +static void socket_send_fds(int fd, int *fds, size_t fds_num,
> + const char *buf, size_t buf_size)
> +{
> +#ifndef WIN32
> + ssize_t ret;
> + struct msghdr msg = { 0 };
> + char control[CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS)] = { 0 };
> + size_t fdsize = sizeof(int) * fds_num;
> + struct cmsghdr *cmsg;
> + struct iovec iov = { .iov_base = (char *)buf, .iov_len = buf_size };
> +
> + msg.msg_iov = &iov;
> + msg.msg_iovlen = 1;
> +
> + if (fds && fds_num > 0) {
> + g_assert_cmpuint(fds_num, <, SOCKET_MAX_FDS);
> +
> + msg.msg_control = control;
> + msg.msg_controllen = CMSG_SPACE(fdsize);
> +
> + cmsg = CMSG_FIRSTHDR(&msg);
> + cmsg->cmsg_len = CMSG_LEN(fdsize);
> + cmsg->cmsg_level = SOL_SOCKET;
> + cmsg->cmsg_type = SCM_RIGHTS;
> + memcpy(CMSG_DATA(cmsg), fds, fdsize);
> + }
> +
> + do {
> + ret = sendmsg(fd, &msg, 0);
> + } while (ret < 0 && errno == EINTR);
> + g_assert_cmpint(ret, >, 0);
> +#else
> + g_test_skip("sendmsg is not supported under Win32");
> + return;
> +#endif
> +}
> +
>  static GString *qtest_recv_line(QTestState *s)
>  {
>  GString *line;
> @@ -545,7 +583,8 @@ QDict *qtest_qmp_receive(QTestState *s)
>   * in the case that they choose to discard all replies up until
>   * a particular EVENT is received.
>   */
> -void qmp_fd_vsend(int fd, const char *fmt, va_list ap)
> +void qmp_fd_vsend_fds(int fd, int *fds, size_t fds_num,
> + const char *fmt, va_list ap)
>  {
>  QObject *qobj;
>
> @@ -569,25 +608,49 @@ void qmp_fd_vsend(int fd, const char *fmt, va_list ap)
>  fprintf(stderr, "%s", str);
>  }
>  /* Send QMP request */
> - socket_send(fd, str, qstring_get_length(qstr));
> + if (fds && fds_num > 0) {
> + socket_send_fds(fd, fds, fds_num, str, qstring_get_length(qstr));
> + } else {
> + socket_send(fd, str, qstring_get_length(qstr));
> + }
>
>  qobject_unref(qstr);
>  qobject_unref(qobj);
>  }
>  }
>
> +void qmp_fd_vsend(int fd, const char *fmt, va_list ap)
> +{
> + qmp_fd_vsend_fds(fd, NULL, 0, fmt, ap);
> +}
> +
> +void qtest_qmp_vsend_fds(QTestState *s, int *fds, size_t fds_num,
> + const char *fmt, va_list ap)
> +{
> + qmp_fd_vsend_fds(s->qmp_fd, fds, fds_num, fmt, ap);
> +}
> +
>  void qtest_qmp_vsend(QTestState *s, const char *fmt, va_list ap)
>  {
> - qmp_fd_vsend(s->qmp_fd, fmt, ap);
> + qmp_fd_vsend_fds(s->qmp_fd, NULL, 0, fmt, ap);
>  }
>
>  QDict *qmp_fdv(int fd, const char *fmt, va_list ap)
>  {
> - qmp_fd_vsend(fd, fmt, ap);
> + qmp_fd_vsend_fds(fd, NULL, 0, fmt, ap);
>
>  return qmp_fd_receive(fd);
>  }
>
> +QDict *qtest_vqmp_fds(QTestState *s, int *fds, size_t fds_num,
> + const char *fmt, va_list ap)
> +{
> + qtest_qmp_vsend_fds(s, fds, fds_num, fmt, ap);
> +
> + /* Receive reply */
> + return qtest_qmp_receive(s);
> +}
> +
>  QDict *qtest_vqmp(QTestState *s, const char *fmt, va_list ap)
>  {
>  qtest_qmp_vsend(s, fmt, ap);
> @@ -616,6 +679,18 @@ void qmp_fd_send(int fd, const char *fmt, ...)
>  va_end(ap);
>  }
>
> +QDict *qtest_qmp_fds(QTestState *s, int *fds, size_t fds_num,
> + const char *fmt, ...)
> +{
> + va_list ap;
> + QDict *response;
> +
> + va_start(ap, fmt);
> + response = qtest_vqmp_fds(s, fds, fds_num, fmt, ap);
> + va_end(ap);
> + return response;
> +}
> +
>  QDict *qtest_qmp(QTestState *s, const char *fmt, ...)
>  {
>  va_list ap;
> diff --git a/tests/libqtest.h b/tests/libqtest.h
> index a98ea15b7d..e61ebaced1 100644
> --- a/tests/libqtest.h
> +++ b/tests/libqtest.h
> @@ -84,6 +84,21 @@ QTestState *qtest_init_with_serial(const char *extra_args, 
> int *sock_fd);
>   */
>  void qtest_quit(QTestState *s);
>
> +/**
> + * qtest_qmp_fds:
> + * @s: #QTestState instance to operate on.
> + * @fds: array of file descriptors
> + * @fds_num: number of elements in @fds
> + * @fmt...: QMP message to send to qemu, formatted like
> + * qobject_from_jsonf_nofail(). See parse_escape() for what's
> + * supported after '%'.
> + *
> + * Sends a QMP message to QEMU with fds and returns the response.
> + */
> +QDict *qtest_qmp_fds(QTestState *s, int *fds, size_t fds_num,
> + const char *fmt, ...)
> + GCC_FMT_ATTR(4, 5);
> +
>  /**
>   * qtest_qmp:
>   * @s: #QTestState instance to operate on.
> @@ -120,7 +135,23 @@ void qtest_qmp_send_raw(QTestState *s, const char *fmt, 
> ...)
>  GCC_FMT_ATTR(2, 3)

Re: [Qemu-devel] [PATCH v3 8/8] block/fileposix: extend to use io_uring

2019-05-27 Thread Stefan Hajnoczi
On Mon, May 27, 2019 at 01:33:27PM +0530, Aarushi Mehta wrote:
> @@ -1920,24 +1947,40 @@ static int coroutine_fn 
> raw_co_pwritev(BlockDriverState *bs, uint64_t offset,
>  
>  static void raw_aio_plug(BlockDriverState *bs)
>  {
> -#ifdef CONFIG_LINUX_AIO
> +#if defined CONFIG_LINUX_AIO || defined CONFIG_LINUX_IO_URING
>  BDRVRawState *s = bs->opaque;
> +#endif

It would be nice to avoid the extra ifdefs.  Here is an alternative
without #ifdef:

  BDRVRawState __attribute__((unused)) *s = bs->opaque;
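
A reduced sketch of that alternative, assuming GCC or Clang (the
attribute is a compiler extension). struct raw_state, plug() and
HAVE_BACKEND_A are hypothetical; the macro is deliberately left undefined
so the variable really is unused.

```c
#include <assert.h>

/* Hypothetical driver state. */
struct raw_state {
    int use_backend;
};

/*
 * The declaration is unconditional; the attribute silences the
 * unused-variable warning in builds where no backend is compiled in.
 */
static int plug(void *opaque)
{
    struct raw_state __attribute__((unused)) *s = opaque;
    int plugged = 0;

#ifdef HAVE_BACKEND_A
    if (s->use_backend) {
        plugged = 1;
    }
#endif
    return plugged;
}
```

Only the uses of s stay conditional, so each additional backend needs one
#ifdef around its own code and none around the declaration.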

> @@ -1963,8 +2006,10 @@ static int raw_co_flush_to_disk(BlockDriverState *bs)
>  static void raw_aio_attach_aio_context(BlockDriverState *bs,
> AioContext *new_context)
>  {
> +#if defined CONFIG_LINUX_AIO || defined CONFIG_LINUX_IO_URING
> +BDRVRawState *s = bs->opaque;

Indentation?




Re: [Qemu-devel] [PATCH 2/2] migration-test: Add a test for fd protocol

2019-05-27 Thread Thomas Huth
On 27/05/2019 11.33, Yury Kotov wrote:
> Signed-off-by: Yury Kotov 
> ---
>  tests/libqtest.c   |  83 ++--
>  tests/libqtest.h   |  51 +++-
>  tests/migration-test.c | 107 +++--
>  3 files changed, 233 insertions(+), 8 deletions(-)
> 
> diff --git a/tests/libqtest.c b/tests/libqtest.c
> index 8ac0c02af4..de8468d213 100644
> --- a/tests/libqtest.c
> +++ b/tests/libqtest.c
> @@ -32,6 +32,7 @@
>  
>  #define MAX_IRQ 256
>  #define SOCKET_TIMEOUT 50
> +#define SOCKET_MAX_FDS 16
>  
>  QTestState *global_qtest;
>  
> @@ -391,6 +392,43 @@ static void GCC_FMT_ATTR(2, 3) qtest_sendf(QTestState 
> *s, const char *fmt, ...)
>  va_end(ap);
>  }

A short description in front of the function about its purpose would be
nice.

> +static void socket_send_fds(int fd, int *fds, size_t fds_num,
> +const char *buf, size_t buf_size)
> +{
> +#ifndef WIN32
> +ssize_t ret;
> +struct msghdr msg = { 0 };
> +char control[CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS)] = { 0 };
> +size_t fdsize = sizeof(int) * fds_num;
> +struct cmsghdr *cmsg;
> +struct iovec iov = { .iov_base = (char *)buf, .iov_len = buf_size };
> +
> +msg.msg_iov = &iov;
> +msg.msg_iovlen = 1;
> +
> +if (fds && fds_num > 0) {
> +g_assert_cmpuint(fds_num, <, SOCKET_MAX_FDS);
> +
> +msg.msg_control = control;
> +msg.msg_controllen = CMSG_SPACE(fdsize);
> +
> +cmsg = CMSG_FIRSTHDR(&msg);
> +cmsg->cmsg_len = CMSG_LEN(fdsize);
> +cmsg->cmsg_level = SOL_SOCKET;
> +cmsg->cmsg_type = SCM_RIGHTS;
> +memcpy(CMSG_DATA(cmsg), fds, fdsize);
> +}
> +
> +do {
> +ret = sendmsg(fd, &msg, 0);
> +} while (ret < 0 && errno == EINTR);
> +g_assert_cmpint(ret, >, 0);
> +#else
> +g_test_skip("sendmsg is not supported under Win32");
> +return;
> +#endif
> +}

We're only compiling the qtests if CONFIG_POSIX=y, so I think you don't
need to check for WIN32 here.
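
For readers unfamiliar with the mechanism the helper relies on, here is a
POSIX-only sketch of passing one descriptor as SCM_RIGHTS ancillary data.
send_one_fd() and recv_one_fd() are hypothetical names and are not part
of libqtest.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer for exactly one fd, aligned for struct cmsghdr. */
union fd_control {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send len bytes of buf plus one fd.  Returns 0 on success, -1 on error. */
static int send_one_fd(int sock, int fd_to_pass, const char *buf, size_t len)
{
    union fd_control control;
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    struct msghdr msg;
    struct cmsghdr *cmsg;
    ssize_t ret;

    memset(&msg, 0, sizeof(msg));
    memset(&control, 0, sizeof(control));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));

    do {
        ret = sendmsg(sock, &msg, 0);
    } while (ret < 0 && errno == EINTR);
    return ret < 0 ? -1 : 0;
}

/* Receive payload into buf and return the passed fd, or -1 on error. */
static int recv_one_fd(int sock, char *buf, size_t len)
{
    union fd_control control;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;
    struct cmsghdr *cmsg;
    ssize_t ret;
    int fd;

    memset(&msg, 0, sizeof(msg));
    memset(&control, 0, sizeof(control));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    do {
        ret = recvmsg(sock, &msg, 0);
    } while (ret < 0 && errno == EINTR);
    if (ret < 0) {
        return -1;
    }

    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;  /* no descriptor attached */
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets a new descriptor number that refers to the same open
file, which is what getfd relies on when the test passes an fd over the
QMP socket.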

 Thomas



Re: [Qemu-devel] [PATCH v11 07/20] gdbstub: Implement breakpoint commands (Z/z pkt) with new infra

2019-05-27 Thread Alex Bennée


Jon Doron  writes:

> Signed-off-by: Jon Doron 

With the fix to avoid double responses this commit still regresses:

10:46 alex@zen/x86_64  [linux.git/master@origin] >gdb ./builds/arm64/vmlinux -x 
~/lsrc/qemu.git/tests/guest-debug/test-gdbstub.py
GNU gdb (GDB) 8.3.50.20190424-git
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Executed .gdbinit
Reading symbols from ./builds/arm64/vmlinux...
Traceback (most recent call last):
  File "/home/alex/lsrc/linux.git/builds/arm64/vmlinux-gdb.py", line 30, in 

import linux.config
ImportError: No module named config
Connecting to remote
0x4000 in ?? ()
Checking we can step the first few instructions
0x4004 in ?? ()
0x4008 in ?? ()
0x400c in ?? ()
PASS: single step in boot code
Checking HW breakpoint works
Hardware assisted breakpoint 1 at 0xff8010778f0c: file 
/home/alex/lsrc/linux.git/init/main.c, line 1068.
Cannot remove breakpoints because program is no longer writable.
Further execution is probably impossible.

Thread 1 hit Breakpoint 1, kernel_init (unused=0x0) at 
/home/alex/lsrc/linux.git/init/main.c:1068
warning: Source file is more recent than executable.
1068} else
0xff8010778f0c  == {int (void *)} 0xff8010778f0c 

warning: Error removing breakpoint 1
PASS: hbreak @ kernel_init

Something might be broken here due to the BP type?

Setup catch-all for run_init_process
Breakpoint 2 at 0xff8010083dc4: file /home/alex/lsrc/linux.git/init/main.c, 
line 1009.
Breakpoint 3 at 0xff8010083e10: file /home/alex/lsrc/linux.git/init/main.c, 
line 1020.
Checking Normal breakpoint works
Breakpoint 4 at 0xff801077b300: file 
/home/alex/lsrc/linux.git/kernel/sched/completion.c, line 136.

Thread 1 received signal SIGTRAP, Trace/breakpoint trap.
Cannot remove breakpoints because program is no longer writable.
Further execution is probably impossible.
kernel_init (unused=0x0) at /home/alex/lsrc/linux.git/init/main.c:1068
1068} else
0xff8010778f0c  == {void (struct completion *)} 
0xff801077b300  0
warning: Error removing breakpoint 4
FAIL: break @ wait_for_completion
Checking watchpoint works
Hardware access (read/write) watchpoint 5: *(enum system_states 
*)(&system_state)

Thread 1 received signal SIGTRAP, Trace/breakpoint trap.
Cannot remove breakpoints because program is no longer writable.
Further execution is probably impossible.
kernel_init (unused=0x0) at /home/alex/lsrc/linux.git/init/main.c:1068
1068} else
FAIL: awatch for system_state (SYSTEM_BOOTING)
Hardware read watchpoint 6: *(enum system_states *)(&system_state)

Thread 1 received signal SIGTRAP, Trace/breakpoint trap.
Cannot remove breakpoints because program is no longer writable.
Further execution is probably impossible.
kernel_init (unused=0x0) at /home/alex/lsrc/linux.git/init/main.c:1068
1068} else
FAIL: rwatch for system_state (SYSTEM_BOOTING)
Hardware watchpoint 7: *(enum system_states *)(&system_state)

Thread 1 received signal SIGTRAP, Trace/breakpoint trap.
Cannot remove breakpoints because program is no longer writable.
Further execution is probably impossible.
kernel_init (unused=0x0) at /home/alex/lsrc/linux.git/init/main.c:1068
1068} else
FAIL: watch for system_state (SYSTEM_BOOTING)
[Inferior 1 (process 1) killed]


> ---
>  gdbstub.c | 84 +++
>  1 file changed, 66 insertions(+), 18 deletions(-)
>
> diff --git a/gdbstub.c b/gdbstub.c
> index 129a47230f..c59a6765cd 100644
> --- a/gdbstub.c
> +++ b/gdbstub.c
> @@ -950,7 +950,7 @@ static inline int xlat_gdb_type(CPUState *cpu, int 
> gdbtype)
>  }
>  #endif
>
> -static int gdb_breakpoint_insert(target_ulong addr, target_ulong len, int 
> type)
> +static int gdb_breakpoint_insert(int type, target_ulong addr, target_ulong 
> len)
>  {
>  CPUState *cpu;
>  int err = 0;
> @@ -1591,6 +1591,52 @@ static void handle_set_thread(GdbCmdContext *gdb_ctx, 
> void *user_ctx)
>  }
>  }
>
> +static void handle_insert_bp(GdbCmdContext *gdb_ctx, void *user_ctx)
> +{
> +int res;
> +
> +if (gdb_ctx->num_params != 3) {
> +put_packet(gdb_ctx->s, "E22");
> +return;
> +}
> +
> +res = gdb_breakpoint_insert(gdb_ctx->params[0].val_ul,
> +gdb_ctx->params[1].val_ull,

Re: [Qemu-devel] [RFC v4 5/7] tests: New make target check-source

2019-05-27 Thread Paolo Bonzini
On 27/05/19 07:10, Markus Armbruster wrote:
>> Another suggestion: are there headers that cannot even be included once
>> (due to dependencies)?  Is it worth including a test for those even in
>> the first iteration?
>>
> I'm not sure I get what you mean.
> 
> Most headers failing the test fail it in the first #include: they fail
> to conform to 2. Headers should normally include everything they need
> beyond osdep.h.

Ok, good to know.

> The only way to fail in the second #include is a missing header guard.
> If it's missing intentionally, it's "_meant_ to be included many times",
> and you propose renaming to .inc.h.  Else, easy fix.
> 
> I think I'll make a list of headers that fail in the second #include,
> and try to sort them into "intentional" and "bug" buckets.

The proposal is to make two tests, but it can come later.

Another idea could be to make it print the result as TAP.  But I could
work on that later.

Paolo



[Qemu-devel] [PATCH v2] vfio/common: Introduce vfio_set_irq_signaling helper

2019-05-27 Thread Eric Auger
The code used to assign an interrupt index/subindex to an
eventfd is duplicated many times. Let's introduce a helper that
sets or unsets the signaling for an ACTION_TRIGGER,
ACTION_MASK or ACTION_UNMASK action.

Signed-off-by: Eric Auger 

---

v1 -> v2:
- don't call GET_IRQ_INFO in vfio_set_irq_signaling()
  and restore quiet check in vfio_register_req_notifier.
  Nicer display of the IRQ name.

This is a follow-up to
[PATCH v2 0/2] vfio-pci: Introduce vfio_set_event_handler().
It looks to me that introducing vfio_set_irq_signaling() has more
benefits in terms of code reduction, and the helper abstraction
looks cleaner.
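
A side note on the error path, offered as a sketch rather than a change
request: ioctl() returns -1 on failure and reports the reason in errno, so
a helper that hands the negated return value to error_setg_errno() would
report errno 1 (EPERM) whatever the actual cause. The wrapper below is
hypothetical (ioctl_or_neg_errno is not a QEMU function); it only shows the
usual pattern of capturing -errno right after the call.

```c
#include <errno.h>
#include <sys/ioctl.h>

/*
 * Hypothetical wrapper, not part of QEMU: issue an ioctl and return 0 on
 * success or a negative errno value on failure.
 */
static int ioctl_or_neg_errno(int fd, unsigned long request, void *arg)
{
    if (ioctl(fd, request, arg) < 0) {
        /* Save errno immediately; later calls such as free() may clobber it. */
        return -errno;
    }
    return 0;
}
```

With a result in that form, passing -ret to an errno-based error reporter
yields the real error code.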
---
 hw/vfio/common.c  |  78 
 hw/vfio/pci.c | 217 --
 hw/vfio/platform.c|  54 +++--
 include/hw/vfio/vfio-common.h |   2 +
 4 files changed, 150 insertions(+), 201 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 4374cc6176..1f1deff360 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -95,6 +95,84 @@ void vfio_mask_single_irqindex(VFIODevice *vbasedev, int 
index)
 ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
 }
 
+static inline const char *action_to_str(int action)
+{
+switch (action) {
+case VFIO_IRQ_SET_ACTION_MASK:
+return "MASK";
+case VFIO_IRQ_SET_ACTION_UNMASK:
+return "UNMASK";
+case VFIO_IRQ_SET_ACTION_TRIGGER:
+return "TRIGGER";
+default:
+return "UNKNOWN ACTION";
+}
+}
+
+static char *irq_to_str(int index, int subindex)
+{
+char *str;
+
+switch (index) {
+case VFIO_PCI_INTX_IRQ_INDEX:
+str = g_strdup_printf("INTX-%d", subindex);
+break;
+case VFIO_PCI_MSI_IRQ_INDEX:
+str = g_strdup_printf("MSI-%d", subindex);
+break;
+case VFIO_PCI_MSIX_IRQ_INDEX:
+str = g_strdup_printf("MSIX-%d", subindex);
+break;
+case VFIO_PCI_ERR_IRQ_INDEX:
+str = g_strdup_printf("ERR-%d", subindex);
+break;
+case VFIO_PCI_REQ_IRQ_INDEX:
+str = g_strdup_printf("REQ-%d", subindex);
+break;
+default:
+str = g_strdup_printf("index %d (unknown)", index);
+break;
+}
+return str;
+}
+
+int vfio_set_irq_signaling(VFIODevice *vbasedev, int index, int subindex,
+   int action, int fd, Error **errp)
+{
+struct vfio_irq_set *irq_set;
+int argsz, ret = 0;
+int32_t *pfd;
+char *irq_name;
+
+argsz = sizeof(*irq_set) + sizeof(*pfd);
+
+irq_set = g_malloc0(argsz);
+irq_set->argsz = argsz;
+irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | action;
+irq_set->index = index;
+irq_set->start = subindex;
+irq_set->count = 1;
+pfd = (int32_t *)&irq_set->data;
+*pfd = fd;
+
+ret = ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+g_free(irq_set);
+
+if (!ret) {
+return 0;
+}
+
+error_setg_errno(errp, -ret, "VFIO_DEVICE_SET_IRQS failure");
+irq_name = irq_to_str(index, subindex);
+error_prepend(errp,
+  "Failed to %s %s eventfd signaling for interrupt %s: ",
+  fd < 0 ? "tear down" : "set up", action_to_str(action),
+  irq_name);
+g_free(irq_name);
+return ret;
+}
+
 /*
  * IO Port/MMIO - Beware of the endians, VFIO is always little endian
  */
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 8cecb53d5c..e42901bd66 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -113,9 +113,7 @@ static void vfio_intx_enable_kvm(VFIOPCIDevice *vdev, Error 
**errp)
 .gsi = vdev->intx.route.irq,
 .flags = KVM_IRQFD_FLAG_RESAMPLE,
 };
-struct vfio_irq_set *irq_set;
-int ret, argsz;
-int32_t *pfd;
+Error *err = NULL;
 
 if (vdev->no_kvm_intx || !kvm_irqfds_enabled() ||
 vdev->intx.route.mode != PCI_INTX_ENABLED ||
@@ -143,22 +141,10 @@ static void vfio_intx_enable_kvm(VFIOPCIDevice *vdev, 
Error **errp)
 goto fail_irqfd;
 }
 
-argsz = sizeof(*irq_set) + sizeof(*pfd);
-
-irq_set = g_malloc0(argsz);
-irq_set->argsz = argsz;
-irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_UNMASK;
-irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
-irq_set->start = 0;
-irq_set->count = 1;
-pfd = (int32_t *)&irq_set->data;
-
-*pfd = irqfd.resamplefd;
-
-ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
-g_free(irq_set);
-if (ret) {
-error_setg_errno(errp, -ret, "failed to setup INTx unmask fd");
+if (vfio_set_irq_signaling(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX, 0,
+   VFIO_IRQ_SET_ACTION_UNMASK,
+   irqfd.resamplefd, &err)) {
+error_propagate(errp, err);
 goto fail_vfio;
 }
 
@@ -262,10 +248,10 @@ static void vfio_intx_update(PCIDevice *pdev)
 static int vfio_intx_enable(VFIOPCIDevice *vdev, Error **errp)
 {
 uint8_t pin = vfio_pci_read_config(&vdev->pdev, PCI_INTERRUPT

Re: [Qemu-devel] hw/s390x/ipl: Dubious use of qdev_reset_all_fn

2019-05-27 Thread Philippe Mathieu-Daudé
Cc'ing Damien who is working on "multi-phase reset mechanism".

On 5/27/19 9:52 AM, Markus Armbruster wrote:
> Peter Maydell  writes:
> 
>> On Fri, 24 May 2019 at 20:47, Christian Borntraeger
>>  wrote:
>>> While this patch is certainly ok, I find it disturbing that qdev devices 
>>> are being reset,
>>> but qom devices not.
>>
>> It's not a qdev-vs-QOM thing. Anything which is a DeviceState
>> has a reset method, but only devices which are somewhere
>> rooted in the bus-tree that starts with the "main system
>> bus" (aka sysbus) get reset by the vl.c-registered "reset
>> everything on the system bus". Devices which are SysBusDevice
>> get auto-parented onto the sysbus, and so get reset. Devices
>> like PCI devices or SCSI devices get put onto the PCI
>> bus or the SCSI bus, and those buses are in turn children
>> of some host-controller device which is on the sysbus, so
>> they all get reset. The things that don't get reset are
>> "orphan" devices which are neither (a) of a type that gets
>> automatically parented onto a bus like SysBusDevice nor
>> (b) put specifically onto some other bus.
>>
>> CPU objects are the other common thing that doesn't get
>> reset 'automatically'.
>>
>> Suggestions for how to restructure reset so this doesn't
>> happen are welcome... "reset follows the bus hierarchy"
>> works well in some places but is a bit weird in others
>> (for SoC containers and the like "follow the QOM
>> hierarchy" would make more sense, but I have no idea
>> how to usefully transition to a model where you could
>> say "for these devices, follow QOM tree for reset" or
>> what an API for that would look like).
> 
> Here's a QOM composition tree for the ARM virt machine (-nodefaults
> -device e1000) as visible in qom-fuse under /machine, with irq and
> qemu:memory-region omitted for brevity:
> 
> machine  virt-4.1-machine
>   +-- fw_cfg  fw_cfg_mem
>   +-- peripheral  container
>   +-- peripheral-anon  container
>   | +-- device[0]  e1000
>   +-- unattached  container
>   | +-- device[0]  cortex-a15-arm-cpu
>   | +-- device[1]  arm_gic
>   | +-- device[2]  arm-gicv2m
>   | +-- device[3]  pl011
>   | +-- device[4]  pl031
>   | +-- device[5]  gpex-pcihost
>   | | +-- pcie.0  PCIE
>   | | +-- gpex_root  gpex-root
>   | +-- device[6]  pl061
>   | +-- device[7]  gpio-key
>   | +-- device[8]  virtio-mmio
>   | | +-- virtio-mmio-bus.0  virtio-mmio-bus
>   | .
>   | .  more virtio-mmio
>   | .
>   | +-- device[39]  virtio-mmio
>   | | +-- virtio-mmio-bus.31  virtio-mmio-bus
>   | +-- device[40]  platform-bus-device
>   | +-- sysbus  System
>   +-- virt.flash0  cfi.pflash01
>   +-- virt.flash1  cfi.pflash01
> 
> Observations:
> 
> * Some components of the machine are direct children of machine: fw_cfg,
>   virt.flash0, virt.flash1
> 
> * machine additionally has a few containers: peripheral,
>   peripheral-anon, unattached.
> 
> * machine/peripheral and machine/peripheral-anon contain the -device
>   with and without ID, respectively.
> 
> * machine/unattached contains everything else created by code without an
>   explicit parent device.  Some (all?) of them should perhaps be direct
>   children of machine instead.
> 
> Compare to the qdev tree shown by info qtree:
> 
> bus: main-system-bus
>   type System
>   dev: platform-bus-device, id "platform-bus-device"
>   dev: fw_cfg_mem, id ""
>   dev: virtio-mmio, id ""
> bus: virtio-mmio-bus.31
>   type virtio-mmio-bus
>   ... more virtio-mmio
>   dev: virtio-mmio, id ""
> bus: virtio-mmio-bus.0
>   type virtio-mmio-bus
>   dev: gpio-key, id ""
>   dev: pl061, id ""
>   dev: gpex-pcihost, id ""
> bus: pcie.0
>   type PCIE
>   dev: e1000, id ""
>   dev: gpex-root, id ""
>   dev: pl031, id ""
>   dev: pl011, id ""
>   dev: arm-gicv2m, id ""
>   dev: arm_gic, id ""
>   dev: cfi.pflash01, id ""
>   dev: cfi.pflash01, id ""
> 
> Observations:
> 
> * Composition tree root machine's containers are not in the qtree.
> 
> * Composition tree node cortex-a15-arm-cpu is not in the qtree.  That's
>   because it's not a qdev (in QOM parlance: not a TYPE_DEVICE).
> 
> * In the qtree, every other inner node is a qbus.  These are *leaves* in
>   the composition tree.  The qtree's vertex from qbus to qdev is a
>   *link* in the composition tree.
> 
>   Example: main-system-bus -> pl011 is
>   machine/unattached/sysbus/child[4] ->
>   ../../../machine/unattached/device[3].
> 
>   Example: main-system-bus/gpex-pcihost/pcie.0 -> e1000 is
>   machine/unattached/device[5]/pcie.0//child[1] ->
>   ../../../../machine/peripheral-anon/device[0].
> 
> Now let me ramble a bit on reset.
> 
> We could model the reset wiring explicitly: every QOM object

Re: [Qemu-devel] qapi/misc.json is too big, let's bite off a few chunks

2019-05-27 Thread Paolo Bonzini
On 27/05/19 10:00, Markus Armbruster wrote:
> As long as we don't have an active QOM maintainer[*], the benefit is
> low.
> 
> 
> [*] We need one.  I'm not volunteering.

I think Daniel, Eduardo and I could count as de facto maintainers.  I
guess I could maintain it if I get two partners in crime as reviewers.

Paolo




Re: [Qemu-devel] [PATCH v3 0/8] Add support for io_uring

2019-05-27 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/20190527080327.10780-1-mehta.aar...@gmail.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [PATCH v3 0/8] Add support for io_uring
Type: series
Message-id: 20190527080327.10780-1-mehta.aar...@gmail.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
75fc7f1 block/fileposix: extend to use io_uring
d03ae39 blockdev: accept io_uring as option
cae30ee util/async: add aio interfaces for io_uring
f3be807 stubs: add stubs for io_uring interface
85c03de block/io_uring: implements interfaces for io_uring
5c4a14a block/block: add BDRV flag for io_uring
9a6594d qapi/block-core: add option for io_uring
460c72d configure: permit use of io_uring

=== OUTPUT BEGIN ===
1/8 Checking commit 460c72d1a8df (configure: permit use of io_uring)
2/8 Checking commit 9a6594daa76c (qapi/block-core: add option for io_uring)
3/8 Checking commit 5c4a14a301f5 (block/block: add BDRV flag for io_uring)
4/8 Checking commit 85c03de16186 (block/io_uring: implements interfaces for 
io_uring)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#49: 
new file mode 100644

ERROR: space required before the open parenthesis '('
#196: FILE: block/io_uring.c:143:
+while(!s->io_q.in_queue) {

ERROR: trailing whitespace
#209: FILE: block/io_uring.c:156:
+if (ret <= 0) { $

total: 2 errors, 1 warnings, 387 lines checked

Patch 4/8 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

5/8 Checking commit f3be80708ad1 (stubs: add stubs for io_uring interface)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#35: 
new file mode 100644

total: 0 errors, 1 warnings, 46 lines checked

Patch 5/8 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
6/8 Checking commit cae30ee1388f (util/async: add aio interfaces for io_uring)
7/8 Checking commit d03ae39c331c (blockdev: accept io_uring as option)
8/8 Checking commit 75fc7f1d8a3e (block/fileposix: extend to use io_uring)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20190527080327.10780-1-mehta.aar...@gmail.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

[Qemu-devel] [Bug 1585533] Re: cache-miss-rate / Invalid JSON

2019-05-27 Thread Marc Brothier
I'm not able to test that issue anymore, you can close the ticket.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1585533

Title:
  cache-miss-rate / Invalid JSON

Status in QEMU:
  Incomplete

Bug description:
  Hi,

  We have VMs which were started with an older version than qemu 2.1
  which added "cache-miss-rate" property for XBZRLECacheStats. While
  trying to migrate the VM to a new host which is running a higher
  version (2.3) of Qemu we got an exception:

  virJSONValueFromString:1642 : internal error: cannot parse json {"return": 
{"expected-downtime": 1, "xbzrle-cache": {"bytes": 0, "cache-size": 67108864, 
"cache-miss-rate": -nan, "pages": 0, "overflow": 0, "cache-miss": 8933}, 
"status": "active", "disk": {"total": 429496729600, "dirty-sync-count": 0, 
"remaining": 193896382464, "mbps": 0, "transferred": 235600347136, "duplicate": 
0, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 0, "normal": 0}, 
"setup-time": 13, "total-time": 1543124, "ram": {"total": 8599183360, 
"dirty-sync-count": 4, "remaining": 30695424, "mbps": 830.636997, 
"transferred": 3100448901, "duplicate": 1358341, "dirty-pages-rate": 7, 
"skipped": 0, "normal-bytes": 3082199040, "normal": 752490}}, "id": 
"libvirt-186200"}: lexical error: malformed number, a digit is required after 
the minus sign.
67108864, "cache-miss-rate": -nan, "pages": 0, "overflow": 0
   (right here) --^

  virNetClientStreamRaiseError:191 : stream aborted at client request

  
  Would it be possible to improve the JSON parser to skip the key if the value 
is incorrect instead of throwing an exception? Then hopefully qemu 2.3 or 
higher is able to handle the data without this property, falling back to its 
default.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1585533/+subscriptions
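
For readers following the report: JSON has no representation for NaN or
infinity, so a "-nan" token makes the whole reply unparseable. One plausible
source (an assumption; the report does not confirm it) is a ratio computed
with a zero denominator. The sketch below uses hypothetical helper names and
is not QEMU or libvirt code; it shows the producer-side guard that keeps
such a value out of the output.

```c
#include <math.h>
#include <stddef.h>
#include <stdio.h>

/* Miss rate as a plain ratio; 0 misses over 0 lookups yields NaN. */
static double miss_rate(unsigned long misses, unsigned long lookups)
{
    return (double)misses / (double)lookups;
}

/*
 * Hypothetical formatter: write a JSON number into buf, substituting 0 for
 * NaN and infinities, which JSON cannot represent.
 */
static int format_json_number(char *buf, size_t len, double value)
{
    if (!isfinite(value)) {
        value = 0.0;
    }
    return snprintf(buf, len, "%f", value);
}
```

Substituting 0 is a design choice made for this sketch; omitting the key
altogether would be another way to keep the output valid.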



Re: [Qemu-devel] [PULL v2 04/36] virtio: Introduce started flag to VirtioDevice

2019-05-27 Thread Greg Kurz
On Fri, 24 May 2019 19:56:06 +0800
Yongji Xie  wrote:

> On Fri, 24 May 2019 at 18:20, Greg Kurz  wrote:
> >
> > On Mon, 20 May 2019 19:10:35 -0400
> > "Michael S. Tsirkin"  wrote:
> >  
> > > From: Xie Yongji 
> > >
> > > > The virtio 1.0 transitional devices support drivers that use the device
> > > > before setting the DRIVER_OK status bit. So we introduce a started
> > > > flag to indicate whether the driver has started the device or not.
> > >
> > > Signed-off-by: Xie Yongji 
> > > Signed-off-by: Zhang Yu 
> > > Message-Id: <20190320112646.3712-2-xieyon...@baidu.com>
> > > Reviewed-by: Michael S. Tsirkin 
> > > Signed-off-by: Michael S. Tsirkin 
> > > ---
> > >  include/hw/virtio/virtio.h |  2 ++
> > >  hw/virtio/virtio.c | 52 --
> > >  2 files changed, 52 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> > > index 7140381e3a..27c0efc3d0 100644
> > > --- a/include/hw/virtio/virtio.h
> > > +++ b/include/hw/virtio/virtio.h
> > > @@ -105,6 +105,8 @@ struct VirtIODevice
> > >  uint16_t device_id;
> > >  bool vm_running;
> > >  bool broken; /* device in invalid state, needs reset */
> > > +bool started;
> > > +bool start_on_kick; /* virtio 1.0 transitional devices support that 
> > > */
> > >  VMChangeStateEntry *vmstate;
> > >  char *bus_name;
> > >  uint8_t device_endian;
> > > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > > index 28056a7ef7..5d533ac74e 100644
> > > --- a/hw/virtio/virtio.c
> > > +++ b/hw/virtio/virtio.c
> > > @@ -1162,10 +1162,16 @@ int virtio_set_status(VirtIODevice *vdev, uint8_t 
> > > val)
> > >  }
> > >  }
> > >  }
> > > +vdev->started = val & VIRTIO_CONFIG_S_DRIVER_OK;
> > > +if (unlikely(vdev->start_on_kick && vdev->started)) {
> > > +vdev->start_on_kick = false;
> > > +}
> > > +
> > >  if (k->set_status) {
> > >  k->set_status(vdev, val);
> > >  }
> > >  vdev->status = val;
> > > +
> > >  return 0;
> > >  }
> > >
> > > @@ -1208,6 +1214,9 @@ void virtio_reset(void *opaque)
> > >  k->reset(vdev);
> > >  }
> > >
> > > +vdev->start_on_kick = (virtio_host_has_feature(vdev, 
> > > VIRTIO_F_VERSION_1) &&
> > > +  !virtio_vdev_has_feature(vdev, 
> > > VIRTIO_F_VERSION_1));
> > > +vdev->started = false;
> > >  vdev->broken = false;
> > >  vdev->guest_features = 0;
> > >  vdev->queue_sel = 0;
> > > @@ -1518,14 +1527,21 @@ void virtio_queue_set_align(VirtIODevice *vdev, 
> > > int n, int align)
> > >
> > >  static bool virtio_queue_notify_aio_vq(VirtQueue *vq)
> > >  {
> > > +bool ret = false;
> > > +
> > >  if (vq->vring.desc && vq->handle_aio_output) {
> > >  VirtIODevice *vdev = vq->vdev;
> > >
> > >  trace_virtio_queue_notify(vdev, vq - vdev->vq, vq);
> > > -return vq->handle_aio_output(vdev, vq);
> > > +ret = vq->handle_aio_output(vdev, vq);
> > > +
> > > +if (unlikely(vdev->start_on_kick)) {
> > > +vdev->started = true;
> > > +vdev->start_on_kick = false;
> > > +}
> > >  }
> > >
> > > -return false;
> > > +return ret;
> > >  }
> > >
> > >  static void virtio_queue_notify_vq(VirtQueue *vq)
> > > @@ -1539,6 +1555,11 @@ static void virtio_queue_notify_vq(VirtQueue *vq)
> > >
> > >  trace_virtio_queue_notify(vdev, vq - vdev->vq, vq);
> > >  vq->handle_output(vdev, vq);
> > > +
> > > +if (unlikely(vdev->start_on_kick)) {
> > > +vdev->started = true;
> > > +vdev->start_on_kick = false;
> > > +}
> > >  }
> > >  }
> > >
> > > @@ -1556,6 +1577,11 @@ void virtio_queue_notify(VirtIODevice *vdev, int n)
> > >  } else if (vq->handle_output) {
> > >  vq->handle_output(vdev, vq);
> > >  }
> > > +
> > > +if (unlikely(vdev->start_on_kick)) {
> > > +vdev->started = true;
> > > +vdev->start_on_kick = false;
> > > +}
> > >  }
> > >
> > >  uint16_t virtio_queue_vector(VirtIODevice *vdev, int n)
> > > @@ -1770,6 +1796,13 @@ static bool virtio_broken_needed(void *opaque)
> > >  return vdev->broken;
> > >  }
> > >
> > > +static bool virtio_started_needed(void *opaque)
> > > +{
> > > +VirtIODevice *vdev = opaque;
> > > +
> > > +return vdev->started;  
> >
> > Existing machine types don't know about the "virtio/started" subsection.
> > This breaks migration to older QEMUs if the driver has started the device,
> > i.e. most probably always when it comes to live migration.
> >
> > My understanding is that we do try to support backward migration though. It
> > is a regular practice in datacenters to migrate workloads without having to
> > take care of the QEMU version. FWIW I had to fix similar issues downstream
> > many times in the past because customers had filed bugs.
> >  
> 
> If we do need to support backward migration, for this patch, what I

[Qemu-devel] [PATCH] target-i386: adds PV_SCHED_YIELD CPUID feature bit

2019-05-27 Thread Wanpeng Li
From: Wanpeng Li 

Adds PV_SCHED_YIELD CPUID feature bit.

Signed-off-by: Wanpeng Li 
---
 target/i386/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 5f07d68..f4c4b6b 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -902,7 +902,7 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 "kvmclock", "kvm-nopiodelay", "kvm-mmu", "kvmclock",
 "kvm-asyncpf", "kvm-steal-time", "kvm-pv-eoi", "kvm-pv-unhalt",
 NULL, "kvm-pv-tlb-flush", NULL, "kvm-pv-ipi",
-NULL, NULL, NULL, NULL,
+"kvm-pv-sched-yield", NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 "kvmclock-stable-bit", NULL, NULL, NULL,
-- 
2.7.4




Re: [Qemu-devel] [PATCH] virtio-gpu: add sanity check

2019-05-27 Thread Marc-André Lureau
On Mon, May 27, 2019 at 11:13 AM Gerd Hoffmann  wrote:
>
> Require a minimum 16x16 size for the scanout, to make sure the guest
> can't set either width or height to zero.  This (a) doesn't make sense
> at all and (b) causes problems in some UI code.  When using spice this
> triggers an assert().
>
> Reported-by: Tyler Slabinski 
> Signed-off-by: Gerd Hoffmann 

Reviewed-by: Marc-André Lureau 

> ---
>  hw/display/virtio-gpu.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
> index 9e37e0ac96b7..372b31ef0af2 100644
> --- a/hw/display/virtio-gpu.c
> +++ b/hw/display/virtio-gpu.c
> @@ -677,6 +677,8 @@ static void virtio_gpu_set_scanout(VirtIOGPU *g,
>
>  if (ss.r.x > res->width ||
>  ss.r.y > res->height ||
> +ss.r.width < 16 ||
> +ss.r.height < 16 ||
>  ss.r.width > res->width ||
>  ss.r.height > res->height ||
>  ss.r.x + ss.r.width > res->width ||
> --
> 2.18.1
>
>


-- 
Marc-André Lureau
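
For illustration, the bounds check touched by the patch can be modelled in
isolation. This is a sketch under assumptions: struct rect and struct
resource are simplified stand-ins for the virtio-gpu types, and widening to
64 bits before the addition is a choice made here, not something the patch
does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the virtio-gpu rectangle and resource types. */
struct rect {
    uint32_t x, y, width, height;
};

struct resource {
    uint32_t width, height;
};

#define MIN_SCANOUT_DIM 16u /* minimum size taken from the patch */

/* Return true when the scanout rectangle is acceptable for the resource. */
static bool scanout_rect_valid(const struct rect *r, const struct resource *res)
{
    if (r->width < MIN_SCANOUT_DIM || r->height < MIN_SCANOUT_DIM) {
        return false; /* also rejects zero-sized scanouts */
    }
    if (r->x > res->width || r->y > res->height ||
        r->width > res->width || r->height > res->height) {
        return false;
    }
    /* Widen before adding so the sum cannot wrap around in 32 bits. */
    return (uint64_t)r->x + r->width <= res->width &&
           (uint64_t)r->y + r->height <= res->height;
}
```

A 16x16 rectangle at the bottom-right corner of a 64x64 resource passes,
while a zero-width rectangle is rejected before it can reach any UI code.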



Re: [Qemu-devel] [RFC PATCH 0/2] establish nesting rule of BQL vs cpu-exclusive

2019-05-27 Thread Roman Kagan
On Thu, May 23, 2019 at 12:31:16PM +0100, Alex Bennée wrote:
> 
> Roman Kagan  writes:
> 
> > I came across the following AB-BA deadlock:
> >
> > vCPU thread main thread
> > --- ---
> > async_safe_run_on_cpu(self,
> >   async_synic_update)
> > ... [cpu hot-add]
> > process_queued_cpu_work()
> >   qemu_mutex_unlock_iothread()
> > [grab BQL]
> >   start_exclusive() cpu_list_add()
> >   async_synic_update()finish_safe_work()
> > qemu_mutex_lock_iothread()  cpu_exec_start()
> >
> > ATM async_synic_update seems to be the only async safe work item that
> > grabs BQL.  However it isn't quite obvious that it shouldn't; in the
> > past there were more examples of this (e.g.
> > memory_region_do_invalidate_mmio_ptr).
> >
> > It looks like the problem is generally in the lack of the nesting rule
> > for cpu-exclusive sections against BQL, so I thought I would try to
> > address that.  This patchset is my feeble attempt at this; I'm not sure
> > I fully comprehend all the consequences (rather, I'm sure I don't) hence
> > RFC.
> 
> Hmm I think this is an area touched by:
> 
>   Subject: [PATCH v7 00/73] per-CPU locks
>   Date: Mon,  4 Mar 2019 13:17:00 -0500
>   Message-Id: <20190304181813.8075-1-c...@braap.org>
> 
> which has stalled on its path into the tree. Last time I checked it
> explicitly handled the concept of work that needed the BQL and work that
> didn't.

I'm still trying to get my head around that patchset, but it looks like
it changes nothing in regards to cpu-exclusive sections and safe work,
so it doesn't make the problem go away.

> How do you trigger your deadlock? Just hot-plugging CPUs?

Yes.  The window is pretty narrow, so I only saw it once, although this
test (where the VMs are started and stopped and the CPUs are plugged in
and out) has been in our test loop for quite a while (probably 2+ years).

Roman.
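
To make the idea of a nesting rule concrete, here is a minimal sketch of a
rank-based lock-order check. Everything in it is hypothetical: the names are
not QEMU APIs, the bookkeeping would have to be per-thread in real code, and
the chosen order (BQL outside cpu-exclusive) is only an assumption for the
example, since which lock should be the outer one is what the RFC is trying
to settle.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical lock ranks: a thread may only take a lock whose rank is
 * higher than every lock it already holds.
 */
enum lock_rank {
    RANK_BQL = 1,
    RANK_CPU_EXCLUSIVE = 2,
};

#define MAX_HELD 8

/* Locks currently held by one thread, in acquisition order. */
struct held_locks {
    enum lock_rank stack[MAX_HELD];
    int depth;
};

/* Would taking a lock of this rank respect the nesting rule? */
static bool may_acquire(const struct held_locks *h, enum lock_rank rank)
{
    /* Ranks only grow along the stack, so the top entry is the maximum. */
    return h->depth == 0 || h->stack[h->depth - 1] < rank;
}

static void note_acquire(struct held_locks *h, enum lock_rank rank)
{
    assert(h->depth < MAX_HELD);
    assert(may_acquire(h, rank));
    h->stack[h->depth++] = rank;
}

static void note_release(struct held_locks *h, enum lock_rank rank)
{
    /* Locks are released in reverse order of acquisition. */
    assert(h->depth > 0 && h->stack[h->depth - 1] == rank);
    h->depth--;
}
```

Under this assumed order, the vCPU-side sequence from the trace above
(enter the exclusive section, then take the BQL) is the one the check would
flag.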



[Qemu-devel] [RFC v4 00/27] vSMMUv3/pSMMUv3 2 stage VFIO integration

2019-05-27 Thread Eric Auger
Up to now vSMMUv3 has not been integrated with VFIO. VFIO
integration requires programming the physical IOMMU consistently
with the guest mappings. However, as opposed to VTD, SMMUv3 has
no "Caching Mode" which allows easy trapping of guest mappings.
This means the vSMMUV3 cannot use the same VFIO integration as VTD.

However SMMUv3 has 2 translation stages. This was devised with
the virtualization use case in mind, where stage 1 is "owned" by the
guest whereas the host uses stage 2 for VM isolation.

This series sets up this nested translation stage. It only works
if there is one physical SMMUv3 used along with QEMU vSMMUv3 (in
other words, it does not work if there is a physical SMMUv2).

The series uses a new kernel user API [1], still under definition.

- We force the host to use stage 2 instead of stage 1, when we
  detect a vSMMUV3 is behind a VFIO device. For a VFIO device
  without any virtual IOMMU, we still use stage 1 as many existing
  SMMUs expect this behavior.
- We introduce new IOTLB "config" notifiers, requested to notify
  changes in the config of a given iommu memory region. So now
  we have notifiers for IOTLB changes and config changes.
- vSMMUv3 calls config notifiers when STE (Stream Table Entries)
  are updated by the guest.
- We implement a specific UNMAP notifier that conveys guest
  IOTLB invalidations to the host
- We implement a new MAP notifier, used only for MSI IOVAs, so
  that the host can build a nested stage translation for MSI IOVAs
- As the legacy MAP notifier is not called anymore, we must make
  sure stage 2 mappings are set. This is achieved through another
  memory listener.
- Physical SMMU faults are reported to the guest via an eventfd
  mechanism and reinjected into it.

Note: The first patch is a code cleanup and was sent separately.

Best Regards

Eric

This series can be found at:
https://github.com/eauger/qemu/tree/v4.0.0-2stage-rfcv4

Compatible with kernel series:
[PATCH v8 00/29] SMMUv3 Nested Stage Setup
(https://lkml.org/lkml/2019/5/26/95)

History:
v3 -> v4:
- adapt to changes in uapi (asid cache invalidation)
- check VFIO_PCI_DMA_FAULT_IRQ_INDEX is supported at kernel level
  before attempting to set signaling for it.
- sync on 5.2-rc1 kernel headers + Drew's patch that imports sve_context.h
- fix MSI binding for MSI (not MSIX)
- fix mingw compilation

v2 -> v3:
- rework fault handling
- MSI binding registration done in vfio-pci. MSI binding tear down called
  on container cleanup path
- leaf parameter propagated

v1 -> v2:
- Fixed dual assignment (asid now correctly propagated on TLB invalidations)
- Integrated fault reporting


Andrew Jones (1):
  update-linux-headers: Add sve_context.h to asm-arm64

Eric Auger (26):
  vfio/common: Introduce vfio_set_irq_signaling helper
  update-linux-headers: Import iommu.h
  header update against 5.2.0-rc1 and IOMMU/VFIO nested stage APIs
  memory: add IOMMU_ATTR_VFIO_NESTED IOMMU memory region attribute
  memory: add IOMMU_ATTR_MSI_TRANSLATE IOMMU memory region attribute
  hw/arm/smmuv3: Advertise VFIO_NESTED and MSI_TRANSLATE attributes
  hw/vfio/common: Force nested if iommu requires it
  memory: Prepare for different kinds of IOMMU MR notifiers
  memory: Add IOMMUConfigNotifier
  memory: Add arch_id and leaf fields in IOTLBEntry
  hw/arm/smmuv3: Store the PASID table GPA in the translation config
  hw/arm/smmuv3: Implement dummy replay
  hw/arm/smmuv3: Fill the IOTLBEntry arch_id on NH_VA invalidation
  hw/arm/smmuv3: Fill the IOTLBEntry leaf field on NH_VA invalidation
  hw/arm/smmuv3: Notify on config changes
  hw/vfio/common: Introduce vfio_alloc_guest_iommu helper
  hw/vfio/common: Introduce hostwin_from_range helper
  hw/vfio/common: Introduce helpers to DMA map/unmap a RAM section
  hw/vfio/common: Setup nested stage mappings
  hw/vfio/common: Register a MAP notifier for MSI binding
  vfio-pci: Expose MSI stage 1 bindings to the host
  memory: Introduce IOMMU Memory Region inject_faults API
  hw/arm/smmuv3: Implement fault injection
  vfio-pci: register handler for iommu fault
  vfio-pci: Set up fault regions
  vfio-pci: Implement the DMA fault handler

 exec.c  |  12 +-
 hw/arm/smmu-common.c|  10 +-
 hw/arm/smmuv3.c | 198 +--
 hw/arm/trace-events |   3 +-
 hw/i386/amd_iommu.c |   2 +-
 hw/i386/intel_iommu.c   |  25 +-
 hw/misc/tz-mpc.c|   8 +-
 hw/ppc/spapr_iommu.c|   2 +-
 hw/s390x/s390-pci-inst.c|   4 +-
 hw/vfio/common.c| 572 ++--
 hw/vfio/pci.c   | 471 --
 hw/vfio/pci.h   |   4 +
 hw/vfio/platform.c  |  54 ++-
 hw/vfio/trace-events|   8 +-
 hw/virtio/vhost.c   |  14 +-
 include/exec/memory.h   | 158 +++--
 include/hw/arm/smmu-common.h|   1 +
 include/hw/vfio/vfio-common.h   |  10 +
 linux-headers/linux/iommu.h | 280 +

[Qemu-devel] [RFC v4 05/27] memory: add IOMMU_ATTR_VFIO_NESTED IOMMU memory region attribute

2019-05-27 Thread Eric Auger
We introduce a new IOMMU Memory Region attribute, IOMMU_ATTR_VFIO_NESTED
which tells whether the virtual IOMMU requires physical nested
stages for VFIO integration. Intel virtual IOMMU supports Caching
Mode and does not require 2 stages at physical level. However virtual
ARM SMMU does not implement such a caching mode and requires the use of
physical stage 1 for VFIO integration.

Signed-off-by: Eric Auger 
---
 include/exec/memory.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 9144a47f57..352a00169f 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -204,7 +204,8 @@ struct MemoryRegionOps {
 };
 
 enum IOMMUMemoryRegionAttr {
-IOMMU_ATTR_SPAPR_TCE_FD
+IOMMU_ATTR_SPAPR_TCE_FD,
+IOMMU_ATTR_VFIO_NESTED,
 };
 
 /**
-- 
2.20.1




[Qemu-devel] [RFC v4 02/27] update-linux-headers: Import iommu.h

2019-05-27 Thread Eric Auger
Update the script to import the new iommu.h uapi header.

Signed-off-by: Eric Auger 
---
 scripts/update-linux-headers.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index f76d77363b..dfdfdfddcf 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -141,7 +141,7 @@ done
 
 rm -rf "$output/linux-headers/linux"
 mkdir -p "$output/linux-headers/linux"
-for header in kvm.h vfio.h vfio_ccw.h vhost.h \
+for header in kvm.h vfio.h vfio_ccw.h vhost.h iommu.h \
   psci.h psp-sev.h userfaultfd.h mman.h; do
 cp "$tmpdir/include/linux/$header" "$output/linux-headers/linux"
 done
-- 
2.20.1




[Qemu-devel] [RFC v4 07/27] hw/arm/smmuv3: Advertise VFIO_NESTED and MSI_TRANSLATE attributes

2019-05-27 Thread Eric Auger
Virtual SMMUv3 requires physical nested stages for VFIO integration
and translates MSIs. So let's advertise those attributes.

Signed-off-by: Eric Auger 

---

v2 -> v3:
- also advertise MSI_TRANSLATE
---
 hw/arm/smmuv3.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index fd8ec7860e..761d722395 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1490,6 +1490,20 @@ static void smmuv3_notify_flag_changed(IOMMUMemoryRegion 
*iommu,
 }
 }
 
+static int smmuv3_get_attr(IOMMUMemoryRegion *iommu,
+   enum IOMMUMemoryRegionAttr attr,
+   void *data)
+{
+if (attr == IOMMU_ATTR_VFIO_NESTED) {
+*(bool *) data = true;
+return 0;
+} else if (attr == IOMMU_ATTR_MSI_TRANSLATE) {
+*(bool *) data = true;
+return 0;
+}
+return -EINVAL;
+}
+
 static void smmuv3_iommu_memory_region_class_init(ObjectClass *klass,
   void *data)
 {
@@ -1497,6 +1511,7 @@ static void 
smmuv3_iommu_memory_region_class_init(ObjectClass *klass,
 
 imrc->translate = smmuv3_translate;
 imrc->notify_flag_changed = smmuv3_notify_flag_changed;
+imrc->get_attr = smmuv3_get_attr;
 }
 
 static const TypeInfo smmuv3_type_info = {
-- 
2.20.1
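
As an illustration of how such an attribute might be consumed, here is a
standalone model. It is a sketch under assumptions: the enum, struct and
function names are stand-ins that only mirror the shape of smmuv3_get_attr()
from the patch, and none of it is QEMU code.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the attribute enum; the names echo those in the series. */
enum iommu_attr {
    ATTR_SPAPR_TCE_FD,
    ATTR_VFIO_NESTED,
    ATTR_MSI_TRANSLATE,
};

/* Stand-in for an IOMMU region with an optional get_attr callback. */
struct iommu_region {
    int (*get_attr)(struct iommu_region *mr, enum iommu_attr attr, void *data);
};

/* Mirrors the shape of smmuv3_get_attr() from the patch. */
static int model_smmuv3_get_attr(struct iommu_region *mr,
                                 enum iommu_attr attr, void *data)
{
    (void)mr;
    if (attr == ATTR_VFIO_NESTED || attr == ATTR_MSI_TRANSLATE) {
        *(bool *)data = true;
        return 0;
    }
    return -EINVAL;
}

/* Caller side: a missing callback or an error means "not required". */
static bool region_needs_nested(struct iommu_region *mr)
{
    bool nested = false;

    if (!mr->get_attr || mr->get_attr(mr, ATTR_VFIO_NESTED, &nested) < 0) {
        return false;
    }
    return nested;
}
```

Treating an unsupported attribute as "not required" keeps IOMMU models that
never implement the callback on their existing path.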




[Qemu-devel] [RFC v4 01/27] vfio/common: Introduce vfio_set_irq_signaling helper

2019-05-27 Thread Eric Auger
The code used to assign an interrupt index/subindex to an
eventfd is duplicated many times. Let's introduce a helper that
sets or unsets the signaling for an ACTION_TRIGGER,
ACTION_MASK or ACTION_UNMASK action.

Signed-off-by: Eric Auger 

---

v1 -> v2:
- don't call GET_IRQ_INFO in vfio_set_irq_signaling()
  and restore quiet check in vfio_register_req_notifier.
  Nicer display of the IRQ name.

This is a follow-up to
[PATCH v2 0/2] vfio-pci: Introduce vfio_set_event_handler().
It looks to me that introducing vfio_set_irq_signaling() has more
benefits in terms of code reduction, and the helper abstraction
looks cleaner.
---
 hw/vfio/common.c  |  78 
 hw/vfio/pci.c | 217 --
 hw/vfio/platform.c|  54 +++--
 include/hw/vfio/vfio-common.h |   2 +
 4 files changed, 150 insertions(+), 201 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 4374cc6176..1f1deff360 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -95,6 +95,84 @@ void vfio_mask_single_irqindex(VFIODevice *vbasedev, int 
index)
 ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
 }
 
+static inline const char *action_to_str(int action)
+{
+switch (action) {
+case VFIO_IRQ_SET_ACTION_MASK:
+return "MASK";
+case VFIO_IRQ_SET_ACTION_UNMASK:
+return "UNMASK";
+case VFIO_IRQ_SET_ACTION_TRIGGER:
+return "TRIGGER";
+default:
+return "UNKNOWN ACTION";
+}
+}
+
+static char *irq_to_str(int index, int subindex)
+{
+char *str;
+
+switch (index) {
+case VFIO_PCI_INTX_IRQ_INDEX:
+str = g_strdup_printf("INTX-%d", subindex);
+break;
+case VFIO_PCI_MSI_IRQ_INDEX:
+str = g_strdup_printf("MSI-%d", subindex);
+break;
+case VFIO_PCI_MSIX_IRQ_INDEX:
+str = g_strdup_printf("MSIX-%d", subindex);
+break;
+case VFIO_PCI_ERR_IRQ_INDEX:
+str = g_strdup_printf("ERR-%d", subindex);
+break;
+case VFIO_PCI_REQ_IRQ_INDEX:
+str = g_strdup_printf("REQ-%d", subindex);
+break;
+default:
+str = g_strdup_printf("index %d (unknown)", index);
+break;
+}
+return str;
+}
+
+int vfio_set_irq_signaling(VFIODevice *vbasedev, int index, int subindex,
+   int action, int fd, Error **errp)
+{
+struct vfio_irq_set *irq_set;
+int argsz, ret = 0;
+int32_t *pfd;
+char *irq_name;
+
+argsz = sizeof(*irq_set) + sizeof(*pfd);
+
+irq_set = g_malloc0(argsz);
+irq_set->argsz = argsz;
+irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | action;
+irq_set->index = index;
+irq_set->start = subindex;
+irq_set->count = 1;
+pfd = (int32_t *)&irq_set->data;
+*pfd = fd;
+
+ret = ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+g_free(irq_set);
+
+if (!ret) {
+return 0;
+}
+
+error_setg_errno(errp, -ret, "VFIO_DEVICE_SET_IRQS failure");
+irq_name = irq_to_str(index, subindex);
+error_prepend(errp,
+  "Failed to %s %s eventfd signaling for interrupt %s: ",
+  fd < 0 ? "tear down" : "set up", action_to_str(action),
+  irq_name);
+g_free(irq_name);
+return ret;
+}
+
 /*
  * IO Port/MMIO - Beware of the endians, VFIO is always little endian
  */
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 8e555db12e..3095379747 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -113,9 +113,7 @@ static void vfio_intx_enable_kvm(VFIOPCIDevice *vdev, Error 
**errp)
 .gsi = vdev->intx.route.irq,
 .flags = KVM_IRQFD_FLAG_RESAMPLE,
 };
-struct vfio_irq_set *irq_set;
-int ret, argsz;
-int32_t *pfd;
+Error *err = NULL;
 
 if (vdev->no_kvm_intx || !kvm_irqfds_enabled() ||
 vdev->intx.route.mode != PCI_INTX_ENABLED ||
@@ -143,22 +141,10 @@ static void vfio_intx_enable_kvm(VFIOPCIDevice *vdev, 
Error **errp)
 goto fail_irqfd;
 }
 
-argsz = sizeof(*irq_set) + sizeof(*pfd);
-
-irq_set = g_malloc0(argsz);
-irq_set->argsz = argsz;
-irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_UNMASK;
-irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
-irq_set->start = 0;
-irq_set->count = 1;
-pfd = (int32_t *)&irq_set->data;
-
-*pfd = irqfd.resamplefd;
-
-ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
-g_free(irq_set);
-if (ret) {
-error_setg_errno(errp, -ret, "failed to setup INTx unmask fd");
+if (vfio_set_irq_signaling(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX, 0,
+   VFIO_IRQ_SET_ACTION_UNMASK,
+   irqfd.resamplefd, &err)) {
+error_propagate(errp, err);
 goto fail_vfio;
 }
 
@@ -262,10 +248,10 @@ static void vfio_intx_update(PCIDevice *pdev)
 static int vfio_intx_enable(VFIOPCIDevice *vdev, Error **errp)
 {
 uint8_t pin = vfio_pci_read_config(&vdev->pdev, PCI_INTERRUPT

[Qemu-devel] [RFC v4 03/27] update-linux-headers: Add sve_context.h to asm-arm64

2019-05-27 Thread Eric Auger
From: Andrew Jones 

Signed-off-by: Andrew Jones 
---
 scripts/update-linux-headers.sh | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index dfdfdfddcf..c97d485b08 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -99,6 +99,9 @@ for arch in $ARCHLIST; do
 cp "$tmpdir/include/asm/$header" "$output/linux-headers/asm-$arch"
 done
 
+if [ $arch = arm64 ]; then
+cp "$tmpdir/include/asm/sve_context.h" 
"$output/linux-headers/asm-arm64/"
+fi
 if [ $arch = mips ]; then
 cp "$tmpdir/include/asm/sgidefs.h" "$output/linux-headers/asm-mips/"
 cp "$tmpdir/include/asm/unistd_o32.h" "$output/linux-headers/asm-mips/"
-- 
2.20.1




[Qemu-devel] [RFC v4 08/27] hw/vfio/common: Force nested if iommu requires it

2019-05-27 Thread Eric Auger
In case we detect the address space is translated by
a virtual IOMMU which requires nested stages, let's set up
the container with the VFIO_TYPE1_NESTING_IOMMU iommu_type.
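
The selection rule can be sketched as a priority list in which the
nesting type is only eligible when explicitly requested. The demo_*
names, the numeric type values and the bitmask standing in for the
VFIO_CHECK_EXTENSION probe are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative type identifiers, assumed for this sketch only. */
enum demo_iommu_type {
    DEMO_TYPE1   = 1,
    DEMO_TYPE1V2 = 3,
    DEMO_NESTING = 6,
};

/*
 * Return the first supported type in priority order.
 * @supported_mask has bit N set when type N is supported; it stands in
 * for probing the container. Nesting is skipped unless @force_nested.
 * Returns -1 when nothing matches.
 */
static int demo_pick_iommu_type(unsigned supported_mask, bool force_nested)
{
    static const int prio[] = { DEMO_NESTING, DEMO_TYPE1V2, DEMO_TYPE1 };
    size_t i;

    for (i = 0; i < sizeof(prio) / sizeof(prio[0]); i++) {
        if (!(supported_mask & (1u << prio[i]))) {
            continue;
        }
        if (prio[i] == DEMO_NESTING && !force_nested) {
            continue;
        }
        return prio[i];
    }
    return -1;
}

/* The caller still has to check that a forced request was honoured. */
static bool demo_nested_request_met(int picked, bool force_nested)
{
    return !force_nested || picked == DEMO_NESTING;
}
```

When nesting is forced but unsupported, the picker falls back to a
non-nested type, which is why the second check is needed.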

Signed-off-by: Eric Auger 

---

v2 -> v3:
- add "nested only is selected if requested by @force_nested"
  comment in this patch
---
 hw/vfio/common.c | 27 +++
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 1f1deff360..99ade21056 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1136,14 +1136,19 @@ static void vfio_put_address_space(VFIOAddressSpace 
*space)
  * vfio_get_iommu_type - selects the richest iommu_type (v2 first)
  */
 static int vfio_get_iommu_type(VFIOContainer *container,
+   bool force_nested,
Error **errp)
 {
-int iommu_types[] = { VFIO_TYPE1v2_IOMMU, VFIO_TYPE1_IOMMU,
+int iommu_types[] = { VFIO_TYPE1_NESTING_IOMMU,
+  VFIO_TYPE1v2_IOMMU, VFIO_TYPE1_IOMMU,
   VFIO_SPAPR_TCE_v2_IOMMU, VFIO_SPAPR_TCE_IOMMU };
 int i;
 
 for (i = 0; i < ARRAY_SIZE(iommu_types); i++) {
 if (ioctl(container->fd, VFIO_CHECK_EXTENSION, iommu_types[i])) {
+if (iommu_types[i] == VFIO_TYPE1_NESTING_IOMMU && !force_nested) {
+continue;
+}
 return iommu_types[i];
 }
 }
@@ -1152,11 +1157,11 @@ static int vfio_get_iommu_type(VFIOContainer *container,
 }
 
 static int vfio_init_container(VFIOContainer *container, int group_fd,
-   Error **errp)
+   bool force_nested, Error **errp)
 {
 int iommu_type, ret;
 
-iommu_type = vfio_get_iommu_type(container, errp);
+iommu_type = vfio_get_iommu_type(container, force_nested, errp);
 if (iommu_type < 0) {
 return iommu_type;
 }
@@ -1192,6 +1197,14 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as,
 VFIOContainer *container;
 int ret, fd;
 VFIOAddressSpace *space;
+IOMMUMemoryRegion *iommu_mr;
+bool force_nested = false;
+
+if (as != &address_space_memory && memory_region_is_iommu(as->root)) {
+iommu_mr = IOMMU_MEMORY_REGION(as->root);
+memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_VFIO_NESTED,
+ (void *)&force_nested);
+}
 
 space = vfio_get_address_space(as);
 
@@ -1252,12 +1265,18 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as,
 QLIST_INIT(&container->giommu_list);
 QLIST_INIT(&container->hostwin_list);
 
-ret = vfio_init_container(container, group->fd, errp);
+ret = vfio_init_container(container, group->fd, force_nested, errp);
 if (ret) {
 goto free_container_exit;
 }
 
+if (force_nested && container->iommu_type != VFIO_TYPE1_NESTING_IOMMU) {
+error_setg(errp, "nested mode requested by the virtual IOMMU "
+   "but not supported by the vfio iommu");
+}
+
 switch (container->iommu_type) {
+case VFIO_TYPE1_NESTING_IOMMU:
 case VFIO_TYPE1v2_IOMMU:
 case VFIO_TYPE1_IOMMU:
 {
-- 
2.20.1




[Qemu-devel] [RFC v4 09/27] memory: Prepare for different kinds of IOMMU MR notifiers

2019-05-27 Thread Eric Auger
Current IOMMUNotifiers are dedicated to IOTLB-related notifications,
i.e. MAP/UNMAP. We plan to introduce new types of notifiers, for
instance to notify vIOMMU configuration changes. Those new
notifiers may not be characterized by any associated address
space range.

So let's create a specialized IOMMUIOLTBNotifier datatype.
The base IOMMUNotifier will be able to encapsulate either of
the notifier types, including the upcoming IOMMUConfigNotifier.

We also rename:
- IOMMU_NOTIFIER_* into IOMMU_NOTIFIER_IOTLB_*
- *_notify_* into *iotlb_notify_*

All calling sites are updated.
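
The resulting shape can be sketched as a base notifier carrying flags
plus a union whose IOTLB member holds the range. All Demo* and demo_*
names are hypothetical, and the callback signature is simplified for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    DEMO_NOTIFIER_IOTLB_UNMAP = 0x1,
    DEMO_NOTIFIER_IOTLB_MAP   = 0x2,
} DemoNotifierFlag;

#define DEMO_NOTIFIER_IOTLB_ALL \
    (DEMO_NOTIFIER_IOTLB_MAP | DEMO_NOTIFIER_IOTLB_UNMAP)

struct DemoNotifier;
typedef void (*DemoIOTLBNotify)(struct DemoNotifier *n, uint64_t iova);

/* IOTLB-specific part: callback plus the address range of interest. */
typedef struct DemoIOTLBNotifier {
    DemoIOTLBNotify notify;
    uint64_t start;
    uint64_t end;
} DemoIOTLBNotifier;

/* Base notifier: the flags tell which union member is meaningful. */
typedef struct DemoNotifier {
    unsigned flags;
    union {
        DemoIOTLBNotifier iotlb_notifier;
        /* further notifier kinds would be added here */
    };
} DemoNotifier;

static void demo_iotlb_notifier_init(DemoNotifier *n, DemoIOTLBNotify fn,
                                     unsigned flags,
                                     uint64_t start, uint64_t end)
{
    assert(flags && !(flags & ~DEMO_NOTIFIER_IOTLB_ALL));
    assert(start <= end);
    n->flags = flags;
    n->iotlb_notifier.notify = fn;
    n->iotlb_notifier.start = start;
    n->iotlb_notifier.end = end;
}

static bool demo_is_iotlb_notifier(const DemoNotifier *n)
{
    return (n->flags & DEMO_NOTIFIER_IOTLB_ALL) != 0;
}

/* Address mask covering the whole registered range. */
static uint64_t demo_range_mask(const DemoNotifier *n)
{
    assert(demo_is_iotlb_notifier(n));
    return n->iotlb_notifier.end - n->iotlb_notifier.start;
}
```

Code that used to read n->start now goes through the IOTLB member, and
should first check the flags since other notifier kinds carry no range.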

Signed-off-by: Eric Auger 
---
 exec.c   | 12 -
 hw/arm/smmu-common.c | 10 ---
 hw/arm/smmuv3.c  |  8 +++---
 hw/i386/amd_iommu.c  |  2 +-
 hw/i386/intel_iommu.c| 25 ++
 hw/misc/tz-mpc.c |  8 +++---
 hw/ppc/spapr_iommu.c |  2 +-
 hw/s390x/s390-pci-inst.c |  4 +--
 hw/vfio/common.c | 13 -
 hw/virtio/vhost.c| 14 +-
 include/exec/memory.h| 57 +---
 memory.c | 32 --
 12 files changed, 107 insertions(+), 80 deletions(-)

diff --git a/exec.c b/exec.c
index 4e734770c2..ed4c5149ac 100644
--- a/exec.c
+++ b/exec.c
@@ -686,12 +686,12 @@ static void tcg_register_iommu_notifier(CPUState *cpu,
  * just register interest in the whole thing, on the assumption
  * that iommu reconfiguration will be rare.
  */
-iommu_notifier_init(&notifier->n,
-tcg_iommu_unmap_notify,
-IOMMU_NOTIFIER_UNMAP,
-0,
-HWADDR_MAX,
-iommu_idx);
+iommu_iotlb_notifier_init(&notifier->n,
+  tcg_iommu_unmap_notify,
+  IOMMU_NOTIFIER_IOTLB_UNMAP,
+  0,
+  HWADDR_MAX,
+  iommu_idx);
 memory_region_register_iommu_notifier(notifier->mr, &notifier->n);
 }
 
diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index e94be6db6c..ee81038fc0 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -391,11 +391,11 @@ static void smmu_unmap_notifier_range(IOMMUNotifier *n)
 IOMMUTLBEntry entry;
 
 entry.target_as = &address_space_memory;
-entry.iova = n->start;
+entry.iova = n->iotlb_notifier.start;
 entry.perm = IOMMU_NONE;
-entry.addr_mask = n->end - n->start;
+entry.addr_mask = n->iotlb_notifier.end - n->iotlb_notifier.start;
 
-memory_region_notify_one(n, &entry);
+memory_region_iotlb_notify_one(n, &entry);
 }
 
 /* Unmap all notifiers attached to @mr */
@@ -405,7 +405,9 @@ inline void smmu_inv_notifiers_mr(IOMMUMemoryRegion *mr)
 
 trace_smmu_inv_notifiers_mr(mr->parent_obj.name);
 IOMMU_NOTIFIER_FOREACH(n, mr) {
-smmu_unmap_notifier_range(n);
+if (n->notifier_flags & IOMMU_NOTIFIER_IOTLB_UNMAP) {
+smmu_unmap_notifier_range(n);
+}
 }
 }
 
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 761d722395..1744874e72 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -822,7 +822,7 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
 entry.addr_mask = (1 << tt->granule_sz) - 1;
 entry.perm = IOMMU_NONE;
 
-memory_region_notify_one(n, &entry);
+memory_region_iotlb_notify_one(n, &entry);
 }
 
 /* invalidate an asid/iova tuple in all mr's */
@@ -837,7 +837,9 @@ static void smmuv3_inv_notifiers_iova(SMMUState *s, int 
asid, dma_addr_t iova)
 trace_smmuv3_inv_notifiers_iova(mr->parent_obj.name, asid, iova);
 
 IOMMU_NOTIFIER_FOREACH(n, mr) {
-smmuv3_notify_iova(mr, n, asid, iova);
+if (n->notifier_flags & IOMMU_NOTIFIER_IOTLB_UNMAP) {
+smmuv3_notify_iova(mr, n, asid, iova);
+}
 }
 }
 }
@@ -1473,7 +1475,7 @@ static void smmuv3_notify_flag_changed(IOMMUMemoryRegion 
*iommu,
 SMMUv3State *s3 = sdev->smmu;
 SMMUState *s = &(s3->smmu_state);
 
-if (new & IOMMU_NOTIFIER_MAP) {
+if (new & IOMMU_NOTIFIER_IOTLB_MAP) {
 int bus_num = pci_bus_num(sdev->bus);
 PCIDevice *pcidev = pci_find_device(sdev->bus, bus_num, sdev->devfn);
 
diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 4a4e2c7fd4..7479e74a5c 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -1470,7 +1470,7 @@ static void 
amdvi_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
 {
 AMDVIAddressSpace *as = container_of(iommu, AMDVIAddressSpace, iommu);
 
-if (new & IOMMU_NOTIFIER_MAP) {
+if (new & IOMMU_NOTIFIER_IOTLB_MAP) {
 error_report("device %02x.%02x.%x requires iommu notifier which is not 
"
  "currently supported", as->bus_num, PCI_SLOT(as->devfn),
  PCI_FUNC(as->devfn));
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_i

[Qemu-devel] [RFC v4 06/27] memory: add IOMMU_ATTR_MSI_TRANSLATE IOMMU memory region attribute

2019-05-27 Thread Eric Auger
We introduce a new IOMMU Memory Region attribute, IOMMU_ATTR_MSI_TRANSLATE,
which tells whether the virtual IOMMU translates MSIs. ARM SMMU
will expose this attribute since, as opposed to Intel DMAR, MSIs
are translated like any other DMA request.

Signed-off-by: Eric Auger 
---
 include/exec/memory.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 352a00169f..146a6096da 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -206,6 +206,7 @@ struct MemoryRegionOps {
 enum IOMMUMemoryRegionAttr {
 IOMMU_ATTR_SPAPR_TCE_FD,
 IOMMU_ATTR_VFIO_NESTED,
+IOMMU_ATTR_MSI_TRANSLATE,
 };
 
 /**
-- 
2.20.1




[Qemu-devel] [RFC v4 04/27] header update against 5.2.0-rc1 and IOMMU/VFIO nested stage APIs

2019-05-27 Thread Eric Auger
This is an update against the following development branch:
https://github.com/eauger/linux/tree/v5.2.0-rc1-2stage-v8.

Signed-off-by: Eric Auger 
---
 linux-headers/linux/iommu.h | 280 
 linux-headers/linux/vfio.h  | 107 ++
 2 files changed, 387 insertions(+)
 create mode 100644 linux-headers/linux/iommu.h

diff --git a/linux-headers/linux/iommu.h b/linux-headers/linux/iommu.h
new file mode 100644
index 00..0a59d6439c
--- /dev/null
+++ b/linux-headers/linux/iommu.h
@@ -0,0 +1,280 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * IOMMU user API definitions
+ */
+
+#ifndef _IOMMU_H
+#define _IOMMU_H
+
+#include 
+
+#define IOMMU_FAULT_PERM_READ  (1 << 0) /* read */
+#define IOMMU_FAULT_PERM_WRITE (1 << 1) /* write */
+#define IOMMU_FAULT_PERM_EXEC  (1 << 2) /* exec */
+#define IOMMU_FAULT_PERM_PRIV  (1 << 3) /* privileged */
+
+/* Generic fault types, can be expanded IRQ remapping fault */
+enum iommu_fault_type {
+   IOMMU_FAULT_DMA_UNRECOV = 1,/* unrecoverable fault */
+   IOMMU_FAULT_PAGE_REQ,   /* page request fault */
+};
+
+enum iommu_fault_reason {
+   IOMMU_FAULT_REASON_UNKNOWN = 0,
+
+   /* Could not access the PASID table (fetch caused external abort) */
+   IOMMU_FAULT_REASON_PASID_FETCH,
+
+   /* PASID entry is invalid or has configuration errors */
+   IOMMU_FAULT_REASON_BAD_PASID_ENTRY,
+
+   /*
+* PASID is out of range (e.g. exceeds the maximum PASID
+* supported by the IOMMU) or disabled.
+*/
+   IOMMU_FAULT_REASON_PASID_INVALID,
+
+   /*
+* An external abort occurred fetching (or updating) a translation
+* table descriptor
+*/
+   IOMMU_FAULT_REASON_WALK_EABT,
+
+   /*
+* Could not access the page table entry (Bad address),
+* actual translation fault
+*/
+   IOMMU_FAULT_REASON_PTE_FETCH,
+
+   /* Protection flag check failed */
+   IOMMU_FAULT_REASON_PERMISSION,
+
+   /* access flag check failed */
+   IOMMU_FAULT_REASON_ACCESS,
+
+   /* Output address of a translation stage caused Address Size fault */
+   IOMMU_FAULT_REASON_OOR_ADDRESS,
+};
+
+/**
+ * struct iommu_fault_unrecoverable - Unrecoverable fault data
+ * @reason: reason of the fault, from &enum iommu_fault_reason
+ * @flags: parameters of this fault (IOMMU_FAULT_UNRECOV_* values)
+ * @pasid: Process Address Space ID
+ * @perm: Requested permission access using by the incoming transaction
+ *(IOMMU_FAULT_PERM_* values)
+ * @addr: offending page address
+ * @fetch_addr: address that caused a fetch abort, if any
+ */
+struct iommu_fault_unrecoverable {
+   __u32   reason;
+#define IOMMU_FAULT_UNRECOV_PASID_VALID(1 << 0)
+#define IOMMU_FAULT_UNRECOV_ADDR_VALID (1 << 1)
+#define IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID   (1 << 2)
+   __u32   flags;
+   __u32   pasid;
+   __u32   perm;
+   __u64   addr;
+   __u64   fetch_addr;
+};
+
+/**
+ * struct iommu_fault_page_request - Page Request data
+ * @flags: encodes whether the corresponding fields are valid and whether this
+ * is the last page in group (IOMMU_FAULT_PAGE_REQUEST_* values)
+ * @pasid: Process Address Space ID
+ * @grpid: Page Request Group Index
+ * @perm: requested page permissions (IOMMU_FAULT_PERM_* values)
+ * @addr: page address
+ * @private_data: device-specific private information
+ */
+struct iommu_fault_page_request {
+#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID   (1 << 0)
+#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE (1 << 1)
+#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA (1 << 2)
+   __u32   flags;
+   __u32   pasid;
+   __u32   grpid;
+   __u32   perm;
+   __u64   addr;
+   __u64   private_data[2];
+};
+
+/**
+ * struct iommu_fault - Generic fault data
+ * @type: fault type from &enum iommu_fault_type
+ * @padding: reserved for future use (should be zero)
+ * @event: Fault event, when @type is %IOMMU_FAULT_DMA_UNRECOV
+ * @prm: Page Request message, when @type is %IOMMU_FAULT_PAGE_REQ
+ */
+struct iommu_fault {
+   __u32   type;
+   __u32   padding;
+   union {
+   struct iommu_fault_unrecoverable event;
+   struct iommu_fault_page_request prm;
+   };
+};
+
+/**
+ * struct iommu_pasid_smmuv3 - ARM SMMUv3 Stream Table Entry stage 1 related
+ * information
+ * @version: API version of this structure
+ * @s1fmt: STE s1fmt (format of the CD table: single CD, linear table
+ * or 2-level table)
+ * @s1dss: STE s1dss (specifies the behavior when @pasid_bits != 0
+ * and no PASID is passed along with the incoming transaction)
+ * @padding: reserved for future use (should be zero)
+ *
+ * The PASID table is referred to as the Context Descriptor (CD) table on ARM
+ * SMMUv3. Please refer to the ARM SMMU 3.x spec (ARM IHI 0070A) for full
+ * details.
+ */
+struct iommu_pasid_smmuv3 {
+#de

[Qemu-devel] [RFC v4 12/27] hw/arm/smmuv3: Store the PASID table GPA in the translation config

2019-05-27 Thread Eric Auger
For VFIO integration we will need to pass the Context Descriptor (CD)
table GPA to the host. The CD table is also referred to as the PASID
table. Its GPA corresponds to the s1ctrptr field of the Stream Table
Entry. So let's decode and store it in the configuration structure.

Signed-off-by: Eric Auger 
---
 hw/arm/smmuv3.c  | 1 +
 include/hw/arm/smmu-common.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 1744874e72..96d4147533 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -351,6 +351,7 @@ static int decode_ste(SMMUv3State *s, SMMUTransCfg *cfg,
   "SMMUv3 S1 stalling fault model not allowed yet\n");
 goto bad_ste;
 }
+cfg->s1ctxptr = STE_CTXPTR(ste);
 return 0;
 
 bad_ste:
diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
index 1f37844e5c..353668f4ea 100644
--- a/include/hw/arm/smmu-common.h
+++ b/include/hw/arm/smmu-common.h
@@ -68,6 +68,7 @@ typedef struct SMMUTransCfg {
 uint8_t tbi;   /* Top Byte Ignore */
 uint16_t asid;
 SMMUTransTableInfo tt[2];
+dma_addr_t s1ctxptr;
 uint32_t iotlb_hits;   /* counts IOTLB hits for this asid */
 uint32_t iotlb_misses; /* counts IOTLB misses for this asid */
 } SMMUTransCfg;
-- 
2.20.1




[Qemu-devel] [RFC v4 13/27] hw/arm/smmuv3: Implement dummy replay

2019-05-27 Thread Eric Auger
The default implementation of memory_region_iommu_replay() shall
not be used as it forces the translation of the whole RAM range.
The purpose of this function is to update the shadow page tables.
However, in the nested stage case there is no shadow page table, so
we can simply return.

Signed-off-by: Eric Auger 
---
 hw/arm/smmuv3.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 96d4147533..8db605adab 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1507,6 +1507,11 @@ static int smmuv3_get_attr(IOMMUMemoryRegion *iommu,
 return -EINVAL;
 }
 
+static inline void
+smmuv3_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
+{
+}
+
 static void smmuv3_iommu_memory_region_class_init(ObjectClass *klass,
   void *data)
 {
@@ -1515,6 +1520,7 @@ static void 
smmuv3_iommu_memory_region_class_init(ObjectClass *klass,
 imrc->translate = smmuv3_translate;
 imrc->notify_flag_changed = smmuv3_notify_flag_changed;
 imrc->get_attr = smmuv3_get_attr;
+imrc->replay = smmuv3_replay;
 }
 
 static const TypeInfo smmuv3_type_info = {
-- 
2.20.1




[Qemu-devel] [RFC v4 10/27] memory: Add IOMMUConfigNotifier

2019-05-27 Thread Eric Auger
With this patch, an IOMMUNotifier can now be either
an IOTLB notifier or a config notifier. A config notifier
is supposed to be called on guest translation config change.
This gives the host a chance to update the physical IOMMU
configuration so that it is consistent with the guest view.

The notifier is passed an IOMMUConfig. The first type of
configuration introduced here consists in the PASID
configuration.

We introduce the associated helpers, iommu_config_notifier_init
and memory_region_config_notify_iommu.
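
Dispatching a config change can be sketched as walking the registered
notifiers and calling only those whose flags match the event. The
Demo* and demo_* names, the array-based list and the reduced config
struct are assumptions for this illustration, not the QEMU
implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    DEMO_NOTIFIER_IOTLB_UNMAP  = 0x1,
    DEMO_NOTIFIER_IOTLB_MAP    = 0x2,
    DEMO_NOTIFIER_CONFIG_PASID = 0x4,
};

/* Reduced stand-in for the configuration handed to config notifiers. */
typedef struct DemoConfig {
    uint64_t base_ptr;
} DemoConfig;

struct DemoNotifier;
typedef void (*DemoConfigNotify)(struct DemoNotifier *n,
                                 const DemoConfig *cfg);

typedef struct DemoNotifier {
    unsigned flags;
    DemoConfigNotify notify;
    uint64_t last_base_ptr; /* bookkeeping used by the demo callback */
    int calls;
} DemoNotifier;

/* Demo callback: remember what was notified. */
static void demo_record_config(DemoNotifier *n, const DemoConfig *cfg)
{
    n->last_base_ptr = cfg->base_ptr;
    n->calls++;
}

/* Call every notifier registered for @flag; return how many were called. */
static int demo_config_notify(DemoNotifier *list, size_t count,
                              unsigned flag, const DemoConfig *cfg)
{
    int called = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        if ((list[i].flags & flag) && list[i].notify) {
            list[i].notify(&list[i], cfg);
            called++;
        }
    }
    return called;
}
```

A notifier registered only for IOTLB events is left untouched by a
PASID config notification.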

Signed-off-by: Eric Auger 

---

v1 -> v2:
- use pasid_table config
- pass IOMMUNotifierFlag flags to iommu_config_notifier_init
  to prepare for other config flags
- Introduce IOMMUConfig
- s/IOMMU_NOTIFIER_S1_CFG/IOMMU_NOTIFIER_PASID_CFG
- remove unused IOMMUStage1ConfigType
---
 hw/vfio/common.c  | 15 -
 include/exec/memory.h | 52 ++-
 memory.c  | 25 +
 3 files changed, 86 insertions(+), 6 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 4183772618..75fb568f95 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -720,11 +720,16 @@ static void vfio_listener_region_del(MemoryListener 
*listener,
 VFIOGuestIOMMU *giommu;
 
 QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
-if (MEMORY_REGION(giommu->iommu) == section->mr &&
-is_iommu_iotlb_notifier(&giommu->n) &&
-giommu->n.iotlb_notifier.start == 
section->offset_within_region) {
-memory_region_unregister_iommu_notifier(section->mr,
-&giommu->n);
+if (MEMORY_REGION(giommu->iommu) == section->mr) {
+if (is_iommu_iotlb_notifier(&giommu->n) &&
+giommu->n.iotlb_notifier.start ==
+section->offset_within_region) {
+memory_region_unregister_iommu_notifier(section->mr,
+&giommu->n);
+} else {
+memory_region_unregister_iommu_notifier(section->mr,
+&giommu->n);
+}
 QLIST_REMOVE(giommu, giommu_next);
 g_free(giommu);
 break;
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 42d10b29ef..701cb83367 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -26,6 +26,9 @@
 #include "qom/object.h"
 #include "qemu/rcu.h"
 #include "hw/qdev-core.h"
+#ifdef CONFIG_LINUX
+#include 
+#endif
 
 #define RAM_ADDR_INVALID (~(ram_addr_t)0)
 
@@ -74,6 +77,14 @@ struct IOMMUTLBEntry {
 IOMMUAccessFlags perm;
 };
 
+typedef struct IOMMUConfig {
+union {
+#ifdef __linux__
+struct iommu_pasid_table_config pasid_cfg;
+#endif
+  };
+} IOMMUConfig;
+
 /*
  * Bitmap for different IOMMUNotifier capabilities. Each notifier can
  * register with one or multiple IOMMU Notifier capability bit(s).
@@ -84,13 +95,18 @@ typedef enum {
 IOMMU_NOTIFIER_IOTLB_UNMAP = 0x1,
 /* Notify entry changes (newly created entries) */
 IOMMU_NOTIFIER_IOTLB_MAP = 0x2,
+/* Notify stage 1 config changes */
+IOMMU_NOTIFIER_CONFIG_PASID = 0x4,
 } IOMMUNotifierFlag;
 
 #define IOMMU_NOTIFIER_IOTLB_ALL (IOMMU_NOTIFIER_IOTLB_MAP | 
IOMMU_NOTIFIER_IOTLB_UNMAP)
+#define IOMMU_NOTIFIER_CONFIG_ALL (IOMMU_NOTIFIER_CONFIG_PASID)
 
 struct IOMMUNotifier;
 typedef void (*IOMMUNotify)(struct IOMMUNotifier *notifier,
 IOMMUTLBEntry *data);
+typedef void (*IOMMUConfigNotify)(struct IOMMUNotifier *notifier,
+  IOMMUConfig *cfg);
 
 typedef struct IOMMUIOLTBNotifier {
 IOMMUNotify notify;
@@ -99,10 +115,15 @@ typedef struct IOMMUIOLTBNotifier {
 hwaddr end;
 } IOMMUIOLTBNotifier;
 
+typedef struct IOMMUConfigNotifier {
+IOMMUConfigNotify notify;
+} IOMMUConfigNotifier;
+
 struct IOMMUNotifier {
 IOMMUNotifierFlag notifier_flags;
 union {
 IOMMUIOLTBNotifier iotlb_notifier;
+IOMMUConfigNotifier config_notifier;
 };
 int iommu_idx;
 QLIST_ENTRY(IOMMUNotifier) node;
@@ -147,6 +168,16 @@ static inline void iommu_iotlb_notifier_init(IOMMUNotifier 
*n, IOMMUNotify fn,
 n->iommu_idx = iommu_idx;
 }
 
+static inline void iommu_config_notifier_init(IOMMUNotifier *n,
+  IOMMUConfigNotify fn,
+  IOMMUNotifierFlag flags,
+  int iommu_idx)
+{
+n->notifier_flags = flags;
+n->iommu_idx = iommu_idx;
+n->config_notifier.notify = fn;
+}
+
 /*
  * Memory region callbacks
  */
@@ -647,6 +678,12 @@ static inline bool is_iommu_iotlb_notifier(IOMMUNotifier 
*n)
 {
 return n->notifier_flags & IOMMU_NOTIFIER_IOTLB_ALL;
 }
+
+static inline bool is_iommu_config_notifier(IOMMUNotifier *

[Qemu-devel] [RFC v4 11/27] memory: Add arch_id and leaf fields in IOTLBEntry

2019-05-27 Thread Eric Auger
TLB entries are usually tagged with some ids such as the asid
or pasid. When propagating an invalidation command from the
guest to the host, we need to pass this id.

We also add a leaf field which indicates, in case of an invalidation
notification, whether only cache entries for the last level of
translation are required to be invalidated.
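
A minimal sketch of an invalidation entry tagged this way follows. The
Demo* and demo_* names are hypothetical, and deriving the mask from a
granule shift is an assumption borrowed from how an SMMU-style caller
might fill it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reduced stand-in for an IOMMU TLB entry used as an invalidation. */
typedef struct DemoTLBEntry {
    uint64_t iova;
    uint64_t addr_mask; /* 0xfff means a 4K range */
    uint32_t arch_id;   /* e.g. the asid tagging the TLB entries */
    bool leaf;          /* only last-level entries need invalidating */
} DemoTLBEntry;

static DemoTLBEntry demo_make_invalidation(uint64_t iova,
                                           unsigned granule_shift,
                                           uint32_t arch_id, bool leaf)
{
    DemoTLBEntry e;

    assert(granule_shift < 64);
    e.iova = iova;
    e.addr_mask = (UINT64_C(1) << granule_shift) - 1;
    e.arch_id = arch_id;
    e.leaf = leaf;
    return e;
}

/* True when @addr falls in the range described by the entry. */
static bool demo_entry_covers(const DemoTLBEntry *e, uint64_t addr)
{
    return (addr & ~e->addr_mask) == (e->iova & ~e->addr_mask);
}
```

A consumer forwarding the invalidation to the host would pass arch_id
and leaf along with the range instead of flushing everything.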

Signed-off-by: Eric Auger 
---
 include/exec/memory.h | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 701cb83367..9f107ebedb 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -69,12 +69,30 @@ typedef enum {
 
 #define IOMMU_ACCESS_FLAG(r, w) (((r) ? IOMMU_RO : 0) | ((w) ? IOMMU_WO : 0))
 
+/**
+ * IOMMUTLBEntry - IOMMU TLB entry
+ *
+ * Structure used when performing a translation or when notifying MAP or
+ * UNMAP (invalidation) events
+ *
+ * @target_as: target address space
+ * @iova: IO virtual address (input)
+ * @translated_addr: translated address (output)
+ * @addr_mask: address mask (0xfff means 4K binding), must be multiple of 2
+ * @perm: permission flag of the mapping (NONE encodes no mapping or
+ * invalidation notification)
+ * @arch_id: architecture specific ID tagging the TLB
+ * @leaf: when @perm is NONE, indicates whether only caches for the last
+ * level of translation need to be invalidated.
+ */
 struct IOMMUTLBEntry {
 AddressSpace*target_as;
 hwaddr   iova;
 hwaddr   translated_addr;
-hwaddr   addr_mask;  /* 0xfff = 4k translation */
+hwaddr   addr_mask;
 IOMMUAccessFlags perm;
+uint32_t arch_id;
+bool leaf;
 };
 
 typedef struct IOMMUConfig {
-- 
2.20.1




[Qemu-devel] [RFC v4 14/27] hw/arm/smmuv3: Fill the IOTLBEntry arch_id on NH_VA invalidation

2019-05-27 Thread Eric Auger
When the guest invalidates one S1 entry, it passes the asid.
When propagating this invalidation down to the host, the asid
information must also be passed. So let's fill the arch_id field
introduced for that purpose.

Signed-off-by: Eric Auger 
---
 hw/arm/smmuv3.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 8db605adab..b6eb61304d 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -822,6 +822,7 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
 entry.iova = iova;
 entry.addr_mask = (1 << tt->granule_sz) - 1;
 entry.perm = IOMMU_NONE;
+entry.arch_id = asid;
 
 memory_region_iotlb_notify_one(n, &entry);
 }
-- 
2.20.1




[Qemu-devel] [RFC v4 17/27] hw/vfio/common: Introduce vfio_alloc_guest_iommu helper

2019-05-27 Thread Eric Auger
Soon this code will be called several times. So let's introduce
a helper.

Signed-off-by: Eric Auger 
---
 hw/vfio/common.c | 25 -
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 75fb568f95..7df8b92563 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -24,6 +24,7 @@
 #include 
 #endif
 #include 
+#include 
 
 #include "hw/vfio/vfio-common.h"
 #include "hw/vfio/vfio.h"
@@ -497,6 +498,19 @@ out:
 rcu_read_unlock();
 }
 
+static VFIOGuestIOMMU *vfio_alloc_guest_iommu(VFIOContainer *container,
+  IOMMUMemoryRegion *iommu,
+  hwaddr offset)
+{
+VFIOGuestIOMMU *giommu = g_new0(VFIOGuestIOMMU, 1);
+
+giommu->container = container;
+giommu->iommu = iommu;
+giommu->iommu_offset = offset;
+/* notifier will be registered separately */
+return giommu;
+}
+
 static void vfio_listener_region_add(MemoryListener *listener,
  MemoryRegionSection *section)
 {
@@ -604,6 +618,7 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 if (memory_region_is_iommu(section->mr)) {
 VFIOGuestIOMMU *giommu;
 IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(section->mr);
+hwaddr offset;
 int iommu_idx;
 
 trace_vfio_listener_region_add_iommu(iova, end);
@@ -613,11 +628,11 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
  * would be the right place to wire that up (tell the KVM
  * device emulation the VFIO iommu handles to use).
  */
-giommu = g_malloc0(sizeof(*giommu));
-giommu->iommu = iommu_mr;
-giommu->iommu_offset = section->offset_within_address_space -
-   section->offset_within_region;
-giommu->container = container;
+
+offset = section->offset_within_address_space -
+section->offset_within_region;
+giommu = vfio_alloc_guest_iommu(container, iommu_mr, offset);
+
 llend = int128_add(int128_make64(section->offset_within_region),
section->size);
 llend = int128_sub(llend, int128_one());
-- 
2.20.1




[Qemu-devel] [RFC v4 15/27] hw/arm/smmuv3: Fill the IOTLBEntry leaf field on NH_VA invalidation

2019-05-27 Thread Eric Auger
Let's propagate the leaf attribute throughout the invalidation path.
This hint is used to reduce the scope of the invalidations to the
last level of translation. Not enforcing it induces large performance
penalties in nested mode.

Signed-off-by: Eric Auger 
---
 hw/arm/smmuv3.c | 16 +---
 hw/arm/trace-events |  2 +-
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index b6eb61304d..f2f3724686 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -792,8 +792,7 @@ epilogue:
  */
 static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
IOMMUNotifier *n,
-   int asid,
-   dma_addr_t iova)
+   int asid, dma_addr_t iova, bool leaf)
 {
 SMMUDevice *sdev = container_of(mr, SMMUDevice, iommu);
 SMMUEventInfo event = {};
@@ -823,12 +822,14 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
 entry.addr_mask = (1 << tt->granule_sz) - 1;
 entry.perm = IOMMU_NONE;
 entry.arch_id = asid;
+entry.leaf = leaf;
 
 memory_region_iotlb_notify_one(n, &entry);
 }
 
 /* invalidate an asid/iova tuple in all mr's */
-static void smmuv3_inv_notifiers_iova(SMMUState *s, int asid, dma_addr_t iova)
+static void smmuv3_inv_notifiers_iova(SMMUState *s, int asid,
+  dma_addr_t iova, bool leaf)
 {
 SMMUDevice *sdev;
 
@@ -840,7 +841,7 @@ static void smmuv3_inv_notifiers_iova(SMMUState *s, int 
asid, dma_addr_t iova)
 
 IOMMU_NOTIFIER_FOREACH(n, mr) {
 if (n->notifier_flags & IOMMU_NOTIFIER_IOTLB_UNMAP) {
-smmuv3_notify_iova(mr, n, asid, iova);
+smmuv3_notify_iova(mr, n, asid, iova, leaf);
 }
 }
 }
@@ -979,9 +980,10 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
 {
 dma_addr_t addr = CMD_ADDR(&cmd);
 uint16_t vmid = CMD_VMID(&cmd);
+bool leaf = CMD_LEAF(&cmd);
 
-trace_smmuv3_cmdq_tlbi_nh_vaa(vmid, addr);
-smmuv3_inv_notifiers_iova(bs, -1, addr);
+trace_smmuv3_cmdq_tlbi_nh_vaa(vmid, addr, leaf);
+smmuv3_inv_notifiers_iova(bs, -1, addr, leaf);
 smmu_iotlb_inv_all(bs);
 break;
 }
@@ -993,7 +995,7 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
 bool leaf = CMD_LEAF(&cmd);
 
 trace_smmuv3_cmdq_tlbi_nh_va(vmid, asid, addr, leaf);
-smmuv3_inv_notifiers_iova(bs, asid, addr);
+smmuv3_inv_notifiers_iova(bs, asid, addr, leaf);
 smmu_iotlb_inv_iova(bs, asid, addr);
 break;
 }
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index 0acedcedc6..3809005cba 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -43,7 +43,7 @@ smmuv3_cmdq_cfgi_cd(uint32_t sid) "streamid = %d"
 smmuv3_config_cache_hit(uint32_t sid, uint32_t hits, uint32_t misses, uint32_t 
perc) "Config cache HIT for sid %d (hits=%d, misses=%d, hit rate=%d)"
 smmuv3_config_cache_miss(uint32_t sid, uint32_t hits, uint32_t misses, 
uint32_t perc) "Config cache MISS for sid %d (hits=%d, misses=%d, hit rate=%d)"
 smmuv3_cmdq_tlbi_nh_va(int vmid, int asid, uint64_t addr, bool leaf) "vmid =%d 
asid =%d addr=0x%"PRIx64" leaf=%d"
-smmuv3_cmdq_tlbi_nh_vaa(int vmid, uint64_t addr) "vmid =%d addr=0x%"PRIx64
+smmuv3_cmdq_tlbi_nh_vaa(int vmid, uint64_t addr, bool leaf) "vmid =%d 
addr=0x%"PRIx64" leaf=%d"
 smmuv3_cmdq_tlbi_nh(void) ""
 smmuv3_cmdq_tlbi_nh_asid(uint16_t asid) "asid=%d"
 smmu_iotlb_cache_hit(uint16_t asid, uint64_t addr, uint32_t hit, uint32_t 
miss, uint32_t p) "IOTLB cache HIT asid=%d addr=0x%"PRIx64" hit=%d miss=%d hit 
rate=%d"
-- 
2.20.1




[Qemu-devel] [RFC v4 16/27] hw/arm/smmuv3: Notify on config changes

2019-05-27 Thread Eric Auger
In case IOMMU config notifiers are attached to the
IOMMU memory region, we execute them, passing as argument
the iommu_pasid_table_config struct updated with the new
viommu translation config. Config notifiers are called on
STE changes. At physical level, they translate into
CMD_CFGI_STE_* commands.
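
The mapping from the decoded translation config to the mode reported to
the notifiers can be sketched as below. The Demo* and demo_* names and
the enum values are hypothetical; only the disabled/bypassed/aborted
distinction is taken from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative modes, assumed for this sketch only. */
typedef enum {
    DEMO_PASID_CONFIG_TRANSLATE = 1,
    DEMO_PASID_CONFIG_BYPASS,
    DEMO_PASID_CONFIG_ABORT,
} DemoPasidConfig;

/* Reduced stand-in for the decoded stream table entry state. */
typedef struct DemoTransCfg {
    bool disabled;
    bool bypassed;
    bool aborted;
} DemoTransCfg;

/* Disabled or bypassed wins over aborted; otherwise translate. */
static DemoPasidConfig demo_pasid_config(const DemoTransCfg *cfg)
{
    assert(cfg != NULL);
    if (cfg->disabled || cfg->bypassed) {
        return DEMO_PASID_CONFIG_BYPASS;
    }
    if (cfg->aborted) {
        return DEMO_PASID_CONFIG_ABORT;
    }
    return DEMO_PASID_CONFIG_TRANSLATE;
}
```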

Signed-off-by: Eric Auger 

---
v3 -> v4:
- fix compile issue with mingw

v2 -> v3:
- adapt to pasid_cfg field changes. Use local variable
- add trace event
- set version fields
- use CONFIG_PASID

v1 -> v2:
- do not notify anymore on CD change. Anyway the smmuv3 linux
  driver is not sending any CD invalidation commands. If we were
  to propagate CD invalidation commands, we would use the
  CACHE_INVALIDATE VFIO ioctl.
- notify precise config flags to prepare for the addition of new
  flags
---
 hw/arm/smmuv3.c | 76 +++--
 hw/arm/trace-events |  1 +
 2 files changed, 60 insertions(+), 17 deletions(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index f2f3724686..db03313672 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -16,6 +16,10 @@
  * with this program; if not, see .
  */
 
+#ifdef __linux__
+#include "linux/iommu.h"
+#endif
+
 #include "qemu/osdep.h"
 #include "hw/boards.h"
 #include "sysemu/sysemu.h"
@@ -847,6 +851,59 @@ static void smmuv3_inv_notifiers_iova(SMMUState *s, int 
asid,
 }
 }
 
+static void smmuv3_notify_config_change(SMMUState *bs, uint32_t sid)
+{
+#ifdef __linux__
+IOMMUMemoryRegion *mr = smmu_iommu_mr(bs, sid);
+SMMUEventInfo event = {.type = SMMU_EVT_NONE, .sid = sid};
+SMMUTransCfg *cfg;
+SMMUDevice *sdev;
+
+if (!mr) {
+return;
+}
+
+sdev = container_of(mr, SMMUDevice, iommu);
+
+/* flush QEMU config cache */
+smmuv3_flush_config(sdev);
+
+if (mr->iommu_notify_flags & IOMMU_NOTIFIER_CONFIG_PASID) {
+/* force a guest RAM config structure decoding */
+cfg = smmuv3_get_config(sdev, &event);
+
+if (cfg) {
+IOMMUConfig iommu_config = {
+.pasid_cfg.version = PASID_TABLE_CFG_VERSION_1,
+.pasid_cfg.format = IOMMU_PASID_FORMAT_SMMUV3,
+.pasid_cfg.base_ptr = cfg->s1ctxptr,
+.pasid_cfg.smmuv3.version = PASID_TABLE_SMMUV3_CFG_VERSION_1,
+};
+
+if (cfg->disabled || cfg->bypassed) {
+iommu_config.pasid_cfg.config = IOMMU_PASID_CONFIG_BYPASS;
+} else if (cfg->aborted) {
+iommu_config.pasid_cfg.config = IOMMU_PASID_CONFIG_ABORT;
+} else {
+iommu_config.pasid_cfg.config = IOMMU_PASID_CONFIG_TRANSLATE;
+}
+
+trace_smmuv3_notify_config_change(mr->parent_obj.name,
+  iommu_config.pasid_cfg.config,
+  iommu_config.pasid_cfg.base_ptr);
+
+memory_region_config_notify_iommu(mr, 0,
+  IOMMU_NOTIFIER_CONFIG_PASID,
+  &iommu_config);
+} else {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s error decoding the configuration for iommu 
mr=%s\n",
+ __func__, mr->parent_obj.name);
+}
+}
+#endif
+}
+
 static int smmuv3_cmdq_consume(SMMUv3State *s)
 {
 SMMUState *bs = ARM_SMMU(s);
@@ -897,22 +954,14 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
 case SMMU_CMD_CFGI_STE:
 {
 uint32_t sid = CMD_SID(&cmd);
-IOMMUMemoryRegion *mr = smmu_iommu_mr(bs, sid);
-SMMUDevice *sdev;
 
 if (CMD_SSEC(&cmd)) {
 cmd_error = SMMU_CERROR_ILL;
 break;
 }
 
-if (!mr) {
-break;
-}
-
 trace_smmuv3_cmdq_cfgi_ste(sid);
-sdev = container_of(mr, SMMUDevice, iommu);
-smmuv3_flush_config(sdev);
-
+smmuv3_notify_config_change(bs, sid);
 break;
 }
 case SMMU_CMD_CFGI_STE_RANGE: /* same as SMMU_CMD_CFGI_ALL */
@@ -929,14 +978,7 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
 trace_smmuv3_cmdq_cfgi_ste_range(start, end);
 
 for (i = start; i <= end; i++) {
-IOMMUMemoryRegion *mr = smmu_iommu_mr(bs, i);
-SMMUDevice *sdev;
-
-if (!mr) {
-continue;
-}
-sdev = container_of(mr, SMMUDevice, iommu);
-smmuv3_flush_config(sdev);
+smmuv3_notify_config_change(bs, i);
 }
 break;
 }
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index 3809005cba..741e645ae2 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -52,4 +52,5 @@ smmuv3_config_cache_inv(uint32_t sid) "Config cache INV for 
sid %d"
 smmuv3_notify_flag_add(const char *io

[Qemu-devel] [RFC v4 25/27] vfio-pci: register handler for iommu fault

2019-05-27 Thread Eric Auger
We use the VFIO_PCI_DMA_FAULT_IRQ_INDEX "irq" index to set/unset
a notifier for physical DMA faults. The associated eventfd is
triggered, in nested mode, whenever a fault is detected at the
physical IOMMU level.

As this is the first use of this new IRQ index, also handle it
in irq_to_str() in case the signaling setup fails.

The actual handler will be implemented in subsequent patches.
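
To make the eventfd-based signaling more concrete, here is a minimal, Linux-only sketch of the "test and clear" pattern that the placeholder handler relies on. It uses eventfd(2) directly; the function names are hypothetical and only approximate what QEMU's EventNotifier helpers do.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Create a non-blocking eventfd with an initial count of zero. */
int fault_notifier_init(void)
{
    return eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
}

/* Signal the notifier once, standing in for the kernel reporting a fault. */
int fault_notifier_signal(int fd)
{
    uint64_t one = 1;

    return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Return 1 if events were pending (the counter is reset), else 0. */
int fault_notifier_test_and_clear(int fd)
{
    uint64_t count;

    return read(fd, &count, sizeof(count)) == (ssize_t)sizeof(count) ? 1 : 0;
}
```

Several signals collapse into one pending event, so a real handler would have to drain every queued fault each time it runs.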

Signed-off-by: Eric Auger 

---

v3 -> v4:
- check VFIO_PCI_DMA_FAULT_IRQ_INDEX is supported at kernel level
  before attempting to set signaling for it.
---
 hw/vfio/common.c |  3 +++
 hw/vfio/pci.c| 52 
 hw/vfio/pci.h|  1 +
 3 files changed, 56 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 532ede0e70..cf0087321e 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -130,6 +130,9 @@ static char *irq_to_str(int index, int subindex)
 case VFIO_PCI_REQ_IRQ_INDEX:
 str = g_strdup_printf("REQ-%d", subindex);
 break;
+case VFIO_PCI_DMA_FAULT_IRQ_INDEX:
+str = g_strdup_printf("DMA-FAULT-%d", subindex);
+break;
 default:
 str = g_strdup_printf("index %d (unknown)", index);
 break;
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index b613b20501..29d4f633b0 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2736,6 +2736,56 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice 
*vdev)
 vdev->req_enabled = false;
 }
 
+static void vfio_dma_fault_notifier_handler(void *opaque)
+{
+VFIOPCIDevice *vdev = opaque;
+
+if (!event_notifier_test_and_clear(&vdev->dma_fault_notifier)) {
+return;
+}
+}
+
+static void vfio_register_dma_fault_notifier(VFIOPCIDevice *vdev)
+{
+struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info),
+  .index = VFIO_PCI_DMA_FAULT_IRQ_INDEX };
+Error *err = NULL;
+int32_t fd;
+
+if (ioctl(vdev->vbasedev.fd,
+  VFIO_DEVICE_GET_IRQ_INFO, &irq_info) < 0 || irq_info.count < 1) {
+return;
+}
+
+if (event_notifier_init(&vdev->dma_fault_notifier, 0)) {
+error_report("vfio: Unable to init event notifier for dma fault");
+return;
+}
+
+fd = event_notifier_get_fd(&vdev->dma_fault_notifier);
+qemu_set_fd_handler(fd, vfio_dma_fault_notifier_handler, NULL, vdev);
+
+if (vfio_set_irq_signaling(&vdev->vbasedev, VFIO_PCI_DMA_FAULT_IRQ_INDEX, 
0,
+   VFIO_IRQ_SET_ACTION_TRIGGER, fd, &err)) {
+error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
+qemu_set_fd_handler(fd, NULL, NULL, vdev);
+event_notifier_cleanup(&vdev->dma_fault_notifier);
+}
+}
+
+static void vfio_unregister_dma_fault_notifier(VFIOPCIDevice *vdev)
+{
+Error *err = NULL;
+
+if (vfio_set_irq_signaling(&vdev->vbasedev, VFIO_PCI_DMA_FAULT_IRQ_INDEX, 
0,
+   VFIO_IRQ_SET_ACTION_TRIGGER, -1, &err)) {
+error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
+}
+qemu_set_fd_handler(event_notifier_get_fd(&vdev->dma_fault_notifier),
+NULL, NULL, vdev);
+event_notifier_cleanup(&vdev->dma_fault_notifier);
+}
+
 static void vfio_realize(PCIDevice *pdev, Error **errp)
 {
 VFIOPCIDevice *vdev = PCI_VFIO(pdev);
@@ -3035,6 +3085,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 
 vfio_register_err_notifier(vdev);
 vfio_register_req_notifier(vdev);
+vfio_register_dma_fault_notifier(vdev);
 vfio_setup_resetfn_quirk(vdev);
 
 return;
@@ -3073,6 +3124,7 @@ static void vfio_exitfn(PCIDevice *pdev)
 
 vfio_unregister_req_notifier(vdev);
 vfio_unregister_err_notifier(vdev);
+vfio_unregister_dma_fault_notifier(vdev);
 pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
 vfio_disable_interrupts(vdev);
 if (vdev->intx.mmap_timer) {
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index cfcd1a81b8..96d29d667b 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -135,6 +135,7 @@ typedef struct VFIOPCIDevice {
 PCIHostDeviceAddress host;
 EventNotifier err_notifier;
 EventNotifier req_notifier;
+EventNotifier dma_fault_notifier;
 int (*resetfn)(struct VFIOPCIDevice *);
 uint32_t vendor_id;
 uint32_t device_id;
-- 
2.20.1




[Qemu-devel] [RFC v4 22/27] vfio-pci: Expose MSI stage 1 bindings to the host

2019-05-27 Thread Eric Auger
When a virtual IOMMU that translates MSIs is exposed to the
guest, the guest allocates an IOVA (gIOVA) that maps the virtual
doorbell (gDB). In nested mode, when the MSI is set up, we pass
this stage 1 mapping to the host so that it can use this stage 1
binding to create a nested stage translating into the physical
doorbell. Conversely, when the MSI setup is torn down, we
unregister this binding.

For registration, we directly use the IOMMU memory region
translate() callback since the addr_mask is returned in the
IOTLB entry. address_space_translate() does not return this information.

Now that we use a MAP notifier, let's remove the warning against
the usage of map notifiers (historically used along with Intel's
caching mode).

Signed-off-by: Eric Auger 

---
v3 -> v4:
- move the MSI binding registration in vfio_enable_vectors
  to address the MSI use case
---
 hw/arm/smmuv3.c  |  8 ---
 hw/vfio/pci.c| 50 +++-
 hw/vfio/trace-events |  2 ++
 3 files changed, 51 insertions(+), 9 deletions(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index db03313672..a697968ace 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1521,14 +1521,6 @@ static void smmuv3_notify_flag_changed(IOMMUMemoryRegion 
*iommu,
 SMMUv3State *s3 = sdev->smmu;
 SMMUState *s = &(s3->smmu_state);
 
-if (new & IOMMU_NOTIFIER_IOTLB_MAP) {
-int bus_num = pci_bus_num(sdev->bus);
-PCIDevice *pcidev = pci_find_device(sdev->bus, bus_num, sdev->devfn);
-
-warn_report("SMMUv3 does not support notification on MAP: "
- "device %s will not function properly", pcidev->name);
-}
-
 if (old == IOMMU_NOTIFIER_NONE) {
 trace_smmuv3_notify_flag_add(iommu->parent_obj.name);
 QLIST_INSERT_HEAD(&s->devices_with_notifiers, sdev, next);
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 3095379747..b613b20501 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -358,6 +358,48 @@ static void vfio_msi_interrupt(void *opaque)
 notify(&vdev->pdev, nr);
 }
 
+static int vfio_register_msi_binding(VFIOPCIDevice *vdev, int vector_n)
+{
+PCIDevice *dev = &vdev->pdev;
+AddressSpace *as = pci_device_iommu_address_space(dev);
+MSIMessage msg = pci_get_msi_message(dev, vector_n);
+IOMMUMemoryRegionClass *imrc;
+IOMMUMemoryRegion *iommu_mr;
+bool msi_translate = false, nested = false;
+IOMMUTLBEntry entry;
+
+if (as == &address_space_memory) {
+return 0;
+}
+
+iommu_mr = IOMMU_MEMORY_REGION(as->root);
+memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_MSI_TRANSLATE,
+ (void *)&msi_translate);
+memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_VFIO_NESTED,
+ (void *)&nested);
+imrc = memory_region_get_iommu_class_nocheck(iommu_mr);
+
+if (!nested || !msi_translate) {
+return 0;
+}
+
+/* MSI doorbell address is translated by an IOMMU */
+
+rcu_read_lock();
+entry = imrc->translate(iommu_mr, msg.address, IOMMU_WO, 0);
+rcu_read_unlock();
+
+if (entry.perm == IOMMU_NONE) {
+return -ENOENT;
+}
+
+trace_vfio_register_msi_binding(vdev->vbasedev.name, vector_n,
+msg.address, entry.translated_addr);
+
+memory_region_iotlb_notify_iommu(iommu_mr, 0, entry);
+return 0;
+}
+
 static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
 {
 struct vfio_irq_set *irq_set;
@@ -375,7 +417,7 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool 
msix)
 fds = (int32_t *)&irq_set->data;
 
 for (i = 0; i < vdev->nr_vectors; i++) {
-int fd = -1;
+int ret, fd = -1;
 
 /*
  * MSI vs MSI-X - The guest has direct access to MSI mask and pending
@@ -390,6 +432,12 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool 
msix)
 } else {
 fd = 
event_notifier_get_fd(&vdev->msi_vectors[i].kvm_interrupt);
 }
+ret = vfio_register_msi_binding(vdev, i);
+if (ret) {
+error_report("%s failed to register S1 MSI binding "
+ "for vector %d(%d)", __func__, i, ret);
+return ret;
+}
 }
 
 fds[i] = fd;
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 9f1868af2d..5de97a8882 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -117,6 +117,8 @@ vfio_get_dev_region(const char *name, int index, uint32_t 
type, uint32_t subtype
 vfio_dma_unmap_overflow_workaround(void) ""
 vfio_iommu_addr_inv_iotlb(int asid, uint64_t addr, uint64_t size, uint64_t 
nb_granules, bool leaf) "nested IOTLB invalidate asid=%d, addr=0x%"PRIx64" 
granule_size=0x%"PRIx64" nb_granules=0x%"PRIx64" leaf=%d"
 vfio_iommu_asid_inv_iotlb(int asid) "nested IOTLB invalidate asid=%d"
+vfio_register_msi_binding(const char *name, int vector, uint64_t giova, 
uint64_t gdb) "%s: register ve

[Qemu-devel] [RFC v4 20/27] hw/vfio/common: Setup nested stage mappings

2019-05-27 Thread Eric Auger
In nested mode, the legacy vfio_iommu_map_notify cannot be used as
there is no "caching" mode and we do not trap on map.

On Intel, vfio_iommu_map_notify was used to DMA map the RAM
through the host single stage.

With nested mode, we need to set up the stage 2 and the stage 1
separately. This patch introduces a prereg_listener to set up
the stage 2 mapping.

The stage 1 mapping, owned by the guest, is passed to the host
when the guest invalidates the stage 1 configuration, through
a dedicated config IOMMU notifier. Guest IOTLB invalidations
are cascaded down to the host through another IOMMU MR UNMAP
notifier.

Signed-off-by: Eric Auger 

---

v3 -> v4:
- use iommu_inv_pasid_info for ASID invalidation

v2 -> v3:
- use VFIO_IOMMU_ATTACH_PASID_TABLE
- new user API
- handle leaf

v1 -> v2:
- adapt to uapi changes
- pass the asid
- pass IOMMU_NOTIFIER_S1_CFG when initializing the config notifier
---
 hw/vfio/common.c | 151 +++
 hw/vfio/trace-events |   2 +
 2 files changed, 142 insertions(+), 11 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 26bc2ab19f..084e3f30e6 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -445,6 +445,71 @@ static bool vfio_get_vaddr(IOMMUTLBEntry *iotlb, void 
**vaddr,
 return true;
 }
 
+/* Pass the guest stage 1 config to the host */
+static void vfio_iommu_nested_notify(IOMMUNotifier *n, IOMMUConfig *cfg)
+{
+VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
+VFIOContainer *container = giommu->container;
+struct vfio_iommu_type1_attach_pasid_table info;
+int ret;
+
+info.argsz = sizeof(info);
+info.flags = 0;
+memcpy(&info.config, &cfg->pasid_cfg, sizeof(cfg->pasid_cfg));
+
+ret = ioctl(container->fd, VFIO_IOMMU_ATTACH_PASID_TABLE, &info);
+if (ret) {
+error_report("%p: failed to pass S1 config to the host (%d)",
+ container, ret);
+}
+}
+
+/* Propagate a guest IOTLB invalidation to the host (nested mode) */
+static void vfio_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
+{
+VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
+hwaddr start = iotlb->iova + giommu->iommu_offset;
+
+VFIOContainer *container = giommu->container;
+struct vfio_iommu_type1_cache_invalidate ustruct;
+size_t size = iotlb->addr_mask + 1;
+int ret;
+
+assert(iotlb->perm == IOMMU_NONE);
+
+ustruct.argsz = sizeof(ustruct);
+ustruct.flags = 0;
+ustruct.info.version = IOMMU_CACHE_INVALIDATE_INFO_VERSION_1;
+
+if (size <= 0x1) {
+ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
+ustruct.info.granularity = IOMMU_INV_GRANU_ADDR;
+ustruct.info.addr_info.flags = IOMMU_INV_ADDR_FLAGS_ARCHID;
+if (iotlb->leaf) {
+ustruct.info.addr_info.flags |= IOMMU_INV_ADDR_FLAGS_LEAF;
+}
+ustruct.info.addr_info.archid = iotlb->arch_id;
+ustruct.info.addr_info.addr = start;
+ustruct.info.addr_info.granule_size = size;
+ustruct.info.addr_info.nb_granules = 1;
+trace_vfio_iommu_addr_inv_iotlb(iotlb->arch_id, start, size, 1,
+iotlb->leaf);
+} else {
+ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
+ustruct.info.granularity = IOMMU_INV_GRANU_PASID;
+ustruct.info.pasid_info.archid = iotlb->arch_id;
+ustruct.info.pasid_info.flags = IOMMU_INV_PASID_FLAGS_ARCHID;
+trace_vfio_iommu_asid_inv_iotlb(iotlb->arch_id);
+}
+
+ret = ioctl(container->fd, VFIO_IOMMU_CACHE_INVALIDATE, &ustruct);
+if (ret) {
+error_report("%p: failed to invalidate CACHE for 0x%"PRIx64
+ " mask=0x%"PRIx64" (%d)",
+ container, start, iotlb->addr_mask, ret);
+}
+}
+
 static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
 {
 VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
@@ -628,6 +693,32 @@ static void vfio_dma_unmap_ram_section(VFIOContainer 
*container,
 }
 }
 
+static void vfio_prereg_listener_region_add(MemoryListener *listener,
+MemoryRegionSection *section)
+{
+VFIOContainer *container =
+container_of(listener, VFIOContainer, prereg_listener);
+
+if (!memory_region_is_ram(section->mr)) {
+return;
+}
+
+vfio_dma_map_ram_section(container, section);
+
+}
+static void vfio_prereg_listener_region_del(MemoryListener *listener,
+ MemoryRegionSection *section)
+{
+VFIOContainer *container =
+container_of(listener, VFIOContainer, prereg_listener);
+
+if (!memory_region_is_ram(section->mr)) {
+return;
+}
+
+vfio_dma_unmap_ram_section(container, section);
+}
+
 static void vfio_listener_region_add(MemoryListener *listener,
  MemoryRegionSection *section)
 {
@@ -739,21 +830,40 @@ static void vfio_listener_region

[Qemu-devel] [RFC v4 18/27] hw/vfio/common: Introduce hostwin_from_range helper

2019-05-27 Thread Eric Auger
Let's introduce a hostwin_from_range() helper that returns the
hostwin encapsulating an IOVA range or NULL if none is found.

This improves the readability of callers and removes the usage
of hostwin_found.
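
The lookup pattern can be sketched outside QEMU as follows. This is a simplified illustration that assumes a plain array instead of a QLIST; struct dma_window and window_from_range() are hypothetical names. The containment test is the same as in the diff below.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified host DMA window: an inclusive IOVA range. */
struct dma_window {
    uint64_t min_iova;
    uint64_t max_iova;
};

/* Return the first window that fully contains [iova, end], or NULL. */
const struct dma_window *window_from_range(const struct dma_window *wins,
                                           size_t nr_wins,
                                           uint64_t iova, uint64_t end)
{
    for (size_t i = 0; i < nr_wins; i++) {
        if (wins[i].min_iova <= iova && end <= wins[i].max_iova) {
            return &wins[i];
        }
    }
    return NULL;
}
```

Returning the window itself, rather than setting a separate "found" flag, lets callers test the pointer directly.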

Signed-off-by: Eric Auger 
---
 hw/vfio/common.c | 37 ++---
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 7df8b92563..5c4b444f24 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -511,6 +511,19 @@ static VFIOGuestIOMMU 
*vfio_alloc_guest_iommu(VFIOContainer *container,
 return giommu;
 }
 
+static VFIOHostDMAWindow *
+hostwin_from_range(VFIOContainer *container, hwaddr iova, hwaddr end)
+{
+VFIOHostDMAWindow *hostwin;
+
+QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
+if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
+return hostwin;
+}
+}
+return NULL;
+}
+
 static void vfio_listener_region_add(MemoryListener *listener,
  MemoryRegionSection *section)
 {
@@ -520,7 +533,6 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 void *vaddr;
 int ret;
 VFIOHostDMAWindow *hostwin;
-bool hostwin_found;
 
 if (vfio_listener_skipped_section(section)) {
 trace_vfio_listener_region_add_skip(
@@ -597,15 +609,8 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 #endif
 }
 
-hostwin_found = false;
-QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
-if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
-hostwin_found = true;
-break;
-}
-}
-
-if (!hostwin_found) {
+hostwin = hostwin_from_range(container, iova, end);
+if (!hostwin) {
 error_report("vfio: IOMMU container %p can't map guest IOVA region"
  " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx,
  container, iova, end);
@@ -776,16 +781,10 @@ static void vfio_listener_region_del(MemoryListener 
*listener,
 
 if (memory_region_is_ram_device(section->mr)) {
 hwaddr pgmask;
-VFIOHostDMAWindow *hostwin;
-bool hostwin_found = false;
+VFIOHostDMAWindow *hostwin =
+hostwin_from_range(container, iova, end);
 
-QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
-if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
-hostwin_found = true;
-break;
-}
-}
-assert(hostwin_found); /* or region_add() would have failed */
+assert(hostwin); /* or region_add() would have failed */
 
 pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
 try_unmap = !((iova & pgmask) || (int128_get64(llsize) & pgmask));
-- 
2.20.1




[Qemu-devel] [RFC v4 21/27] hw/vfio/common: Register a MAP notifier for MSI binding

2019-05-27 Thread Eric Auger
Instantiate a MAP notifier to register the MSI stage 1
binding (gIOVA -> gDB) with the host. This allows the host
to build a nested mapping towards the physical doorbell:
guest IOVA -> guest Doorbell -> physical doorbell.
           Stage 1           Stage 2

The unregistration is done on VFIO container deallocation.
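
The bookkeeping side of this (record each binding once, release all of them when the container goes away) can be sketched as below. It is a standalone illustration with hypothetical names and a hand-rolled singly linked list. The actual patch uses QLIST and issues VFIO ioctls, which are left out here.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record of one registered gIOVA -> gDB binding. */
struct msi_binding {
    uint64_t iova;
    uint64_t gpa;
    uint64_t size;
    struct msi_binding *next;
};

/*
 * Record a binding unless one already exists for this IOVA.
 * Returns 1 if a new binding was added, 0 if it was already known,
 * -1 on allocation failure.
 */
int msi_binding_add(struct msi_binding **head, uint64_t iova,
                    uint64_t gpa, uint64_t addr_mask)
{
    struct msi_binding *b;

    for (b = *head; b; b = b->next) {
        if (b->iova == iova) {
            return 0;
        }
    }
    b = calloc(1, sizeof(*b));
    if (!b) {
        return -1;
    }
    b->iova = iova;
    b->gpa = gpa;
    b->size = addr_mask + 1;    /* addr_mask is size - 1 */
    b->next = *head;
    *head = b;
    return 1;
}

/* Drop every binding and return how many were released. */
int msi_binding_remove_all(struct msi_binding **head)
{
    int n = 0;

    while (*head) {
        struct msi_binding *b = *head;

        *head = b->next;
        free(b);
        n++;
    }
    return n;
}
```

Keying the list on the IOVA makes repeated MAP notifications for the same doorbell harmless.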

Signed-off-by: Eric Auger 

---

v2 -> v3:
- only register the notifier if the IOMMU translates MSIs
- record the msi bindings in a container list and unregister on
  container release
---
 hw/vfio/common.c  | 69 +++
 include/hw/vfio/vfio-common.h |  8 
 2 files changed, 77 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 084e3f30e6..532ede0e70 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -510,6 +510,56 @@ static void vfio_iommu_unmap_notify(IOMMUNotifier *n, 
IOMMUTLBEntry *iotlb)
 }
 }
 
+static void vfio_iommu_msi_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
+{
+VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
+VFIOContainer *container = giommu->container;
+int ret;
+
+struct vfio_iommu_type1_bind_msi ustruct;
+VFIOMSIBinding *binding;
+
+QLIST_FOREACH(binding, &container->msibinding_list, next) {
+if (binding->iova == iotlb->iova) {
+return;
+}
+}
+ustruct.argsz = sizeof(struct vfio_iommu_type1_bind_msi);
+ustruct.flags = 0;
+
+ustruct.iova = iotlb->iova;
+ustruct.gpa = iotlb->translated_addr;
+ustruct.size = iotlb->addr_mask + 1;
+ret = ioctl(container->fd, VFIO_IOMMU_BIND_MSI , &ustruct);
+if (ret) {
+error_report("%s: failed to register the stage1 MSI binding (%d)",
+ __func__, ret);
+}
+binding =  g_new0(VFIOMSIBinding, 1);
+binding->iova = ustruct.iova;
+binding->gpa = ustruct.gpa;
+binding->size = ustruct.size;
+
+QLIST_INSERT_HEAD(&container->msibinding_list, binding, next);
+}
+
+static void vfio_container_unbind_msis(VFIOContainer *container)
+{
+VFIOMSIBinding *binding, *tmp;
+
+QLIST_FOREACH_SAFE(binding, &container->msibinding_list, next, tmp) {
+struct vfio_iommu_type1_unbind_msi ustruct;
+
+/* the MSI doorbell is not used anymore, unregister it */
+ustruct.argsz = sizeof(struct vfio_iommu_type1_unbind_msi);
+ustruct.flags = 0;
+ustruct.iova = binding->iova;
+ioctl(container->fd, VFIO_IOMMU_UNBIND_MSI , &ustruct);
+QLIST_REMOVE(binding, next);
+g_free(binding);
+}
+}
+
 static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
 {
 VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
@@ -837,6 +887,8 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
MEMTXATTRS_UNSPECIFIED);
 
 if (container->iommu_type == VFIO_TYPE1_NESTING_IOMMU) {
+bool translate_msi;
+
 /* Config notifier to propagate guest stage 1 config changes */
 giommu = vfio_alloc_guest_iommu(container, iommu_mr, offset);
 iommu_config_notifier_init(&giommu->n, vfio_iommu_nested_notify,
@@ -853,6 +905,21 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
   iommu_idx);
 QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
 memory_region_register_iommu_notifier(section->mr, &giommu->n);
+
+memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_MSI_TRANSLATE,
+ (void *)&translate_msi);
+if (translate_msi) {
+giommu = vfio_alloc_guest_iommu(container, iommu_mr, offset);
+iommu_iotlb_notifier_init(&giommu->n,
+  vfio_iommu_msi_map_notify,
+  IOMMU_NOTIFIER_IOTLB_MAP,
+  section->offset_within_region,
+  int128_get64(llend),
+  iommu_idx);
+QLIST_INSERT_HEAD(&container->giommu_list, giommu,
+  giommu_next);
+memory_region_register_iommu_notifier(section->mr, &giommu->n);
+}
 } else {
 /* MAP/UNMAP IOTLB notifier */
 giommu = vfio_alloc_guest_iommu(container, iommu_mr, offset);
@@ -1629,6 +1696,8 @@ static void vfio_disconnect_container(VFIOGroup *group)
 g_free(giommu);
 }
 
+vfio_container_unbind_msis(container);
+
 trace_vfio_disconnect_container(container->fd);
 close(container->fd);
 g_free(container);
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 686d99ff8c..c862d87725 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -64,6 +64,13 @@ typedef struct VFIOAddressSpace {

[Qemu-devel] [RFC v4 19/27] hw/vfio/common: Introduce helpers to DMA map/unmap a RAM section

2019-05-27 Thread Eric Auger
Let's introduce two helpers that DMA map/unmap a RAM
section. Those helpers will be called for nested stage setup in
another call site. This may also make the structure of
vfio_listener_region_add/del() clearer.
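
Both helpers start by trimming the section to whole pages. A minimal sketch of that arithmetic follows, assuming plain 64-bit values instead of QEMU's Int128 and a power-of-two page size; section_to_iova_range() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Compute the page-aligned IOVA range covered by a section.
 * page_size must be a power of two. Returns false when no whole page
 * fits, in which case *iova and *end are left untouched.
 * Assumes offset + size does not overflow 64 bits.
 */
bool section_to_iova_range(uint64_t offset, uint64_t size, uint64_t page_size,
                           uint64_t *iova, uint64_t *end)
{
    uint64_t mask = page_size - 1;
    uint64_t start = (offset + mask) & ~mask;   /* round the start up */
    uint64_t limit = (offset + size) & ~mask;   /* round the end down */

    if (start >= limit) {
        return false;
    }
    *iova = start;
    *end = limit - 1;                           /* inclusive end */
    return true;
}
```

For example, a section at offset 0x1800 of size 0x3000 with 4 KiB pages covers the IOVA range 0x2000..0x3fff.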

Signed-off-by: Eric Auger 
---
 hw/vfio/common.c | 178 ++-
 hw/vfio/trace-events |   4 +-
 2 files changed, 109 insertions(+), 73 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 5c4b444f24..26bc2ab19f 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -524,13 +524,116 @@ hostwin_from_range(VFIOContainer *container, hwaddr 
iova, hwaddr end)
 return NULL;
 }
 
+static int vfio_dma_map_ram_section(VFIOContainer *container,
+MemoryRegionSection *section)
+{
+VFIOHostDMAWindow *hostwin;
+Int128 llend, llsize;
+hwaddr iova, end;
+void *vaddr;
+int ret;
+
+assert(memory_region_is_ram(section->mr));
+
+iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+llend = int128_make64(section->offset_within_address_space);
+llend = int128_add(llend, section->size);
+llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
+end = int128_get64(int128_sub(llend, int128_one()));
+
+vaddr = memory_region_get_ram_ptr(section->mr) +
+section->offset_within_region +
+(iova - section->offset_within_address_space);
+
+hostwin = hostwin_from_range(container, iova, end);
+if (!hostwin) {
+error_report("vfio: IOMMU container %p can't map guest IOVA region"
+ " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx,
+ container, iova, end);
+return -EFAULT;
+}
+
+trace_vfio_dma_map_ram(iova, end, vaddr);
+
+llsize = int128_sub(llend, int128_make64(iova));
+
+if (memory_region_is_ram_device(section->mr)) {
+hwaddr pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
+
+if ((iova & pgmask) || (int128_get64(llsize) & pgmask)) {
+trace_vfio_listener_region_add_no_dma_map(
+memory_region_name(section->mr),
+section->offset_within_address_space,
+int128_getlo(section->size),
+pgmask + 1);
+return 0;
+}
+}
+
+ret = vfio_dma_map(container, iova, int128_get64(llsize),
+   vaddr, section->readonly);
+if (ret) {
+error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
+ "0x%"HWADDR_PRIx", %p) = %d (%m)",
+ container, iova, int128_get64(llsize), vaddr, ret);
+if (memory_region_is_ram_device(section->mr)) {
+/* Allow unexpected mappings not to be fatal for RAM devices */
+return 0;
+}
+return ret;
+}
+return 0;
+}
+
+static void vfio_dma_unmap_ram_section(VFIOContainer *container,
+   MemoryRegionSection *section)
+{
+Int128 llend, llsize;
+hwaddr iova, end;
+bool try_unmap = true;
+int ret;
+
+iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+llend = int128_make64(section->offset_within_address_space);
+llend = int128_add(llend, section->size);
+llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
+
+if (int128_ge(int128_make64(iova), llend)) {
+return;
+}
+end = int128_get64(int128_sub(llend, int128_one()));
+
+llsize = int128_sub(llend, int128_make64(iova));
+
+trace_vfio_dma_unmap_ram(iova, end);
+
+if (memory_region_is_ram_device(section->mr)) {
+hwaddr pgmask;
+VFIOHostDMAWindow *hostwin =
+hostwin_from_range(container, iova, end);
+
+assert(hostwin); /* or region_add() would have failed */
+
+pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
+try_unmap = !((iova & pgmask) || (int128_get64(llsize) & pgmask));
+}
+
+if (try_unmap) {
+ret = vfio_dma_unmap(container, iova, int128_get64(llsize));
+if (ret) {
+error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
+ "0x%"HWADDR_PRIx") = %d (%m)",
+ container, iova, int128_get64(llsize), ret);
+}
+}
+}
+
 static void vfio_listener_region_add(MemoryListener *listener,
  MemoryRegionSection *section)
 {
 VFIOContainer *container = container_of(listener, VFIOContainer, listener);
 hwaddr iova, end;
-Int128 llend, llsize;
-void *vaddr;
+Int128 llend;
 int ret;
 VFIOHostDMAWindow *hostwin;
 
@@ -657,41 +760,10 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 }
 
 /* Here we assume that memory_region_is_ram(section->mr)==true */
-
-vaddr = memory_region_get_ram_ptr(section->mr) +
-section->offset_within_region +
-(iova - section->offset_within_address_space);
-
-trace_vfio_listener_region_add_ram(iova, end, vaddr);
-
-  

[Qemu-devel] [RFC v4 23/27] memory: Introduce IOMMU Memory Region inject_faults API

2019-05-27 Thread Eric Auger
This new API allows injecting @count iommu_faults into
the IOMMU memory region.
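
The "optional method" dispatch used by this API can be illustrated in isolation. The sketch below uses hypothetical types (struct region_class, struct fault) rather than QEMU's QOM classes. Only the fallback to -ENOENT when no method is provided mirrors the diff that follows.

```c
#include <errno.h>
#include <stddef.h>

struct fault;                   /* opaque here, like struct iommu_fault */

/* Hypothetical class with an optional inject_faults method. */
struct region_class {
    int (*inject_faults)(void *region, int count, struct fault *buf);
};

/* Forward to the method if the class provides one, else return -ENOENT. */
int region_inject_faults(const struct region_class *cls, void *region,
                         int count, struct fault *buf)
{
    if (!cls->inject_faults) {
        return -ENOENT;
    }
    return cls->inject_faults(region, count, buf);
}

/* Example method that simply reports how many faults it accepted. */
int count_faults(void *region, int count, struct fault *buf)
{
    (void)region;
    (void)buf;
    return count;
}
```

Callers can then treat -ENOENT as "this IOMMU model does not support fault injection" and handle it separately from errors returned by an actual implementation.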

Signed-off-by: Eric Auger 
---
 include/exec/memory.h | 25 +
 memory.c  | 10 ++
 2 files changed, 35 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 9f107ebedb..593ee7fc50 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -57,6 +57,8 @@ struct MemoryRegionMmio {
 CPUWriteMemoryFunc *write[3];
 };
 
+struct iommu_fault;
+
 typedef struct IOMMUTLBEntry IOMMUTLBEntry;
 
 /* See address_space_translate: bit 0 is read, bit 1 is write.  */
@@ -400,6 +402,19 @@ typedef struct IOMMUMemoryRegionClass {
  * @iommu: the IOMMUMemoryRegion
  */
 int (*num_indexes)(IOMMUMemoryRegion *iommu);
+
+/*
+ * Inject @count faults into the IOMMU memory region
+ *
+ * Optional method: if this method is not provided, then
+ * memory_region_inject_faults() will return -ENOENT
+ *
+ * @iommu: the IOMMU memory region to inject the faults in
+ * @count: number of faults to inject
+ * @buf: fault buffer
+ */
+int (*inject_faults)(IOMMUMemoryRegion *iommu, int count,
+ struct iommu_fault *buf);
 } IOMMUMemoryRegionClass;
 
 typedef struct CoalescedMemoryRange CoalescedMemoryRange;
@@ -1216,6 +1231,16 @@ int memory_region_iommu_attrs_to_index(IOMMUMemoryRegion 
*iommu_mr,
  */
 int memory_region_iommu_num_indexes(IOMMUMemoryRegion *iommu_mr);
 
+/**
+ * memory_region_inject_faults : inject @count faults stored in @buf
+ *
+ * @iommu_mr: the IOMMU memory region
+ * @count: number of faults to be injected
+ * @buf: buffer containing the faults
+ */
+int memory_region_inject_faults(IOMMUMemoryRegion *iommu_mr, int count,
+struct iommu_fault *buf);
+
 /**
  * memory_region_name: get a memory region's name
  *
diff --git a/memory.c b/memory.c
index d90d8ea67e..16996ef14e 100644
--- a/memory.c
+++ b/memory.c
@@ -2038,6 +2038,16 @@ int memory_region_iommu_num_indexes(IOMMUMemoryRegion 
*iommu_mr)
 return imrc->num_indexes(iommu_mr);
 }
 
+int memory_region_inject_faults(IOMMUMemoryRegion *iommu_mr, int count,
+struct iommu_fault *buf)
+{
+IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_GET_CLASS(iommu_mr);
+if (!imrc->inject_faults) {
+return -ENOENT;
+}
+return imrc->inject_faults(iommu_mr, count, buf);
+}
+
 void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
 {
 uint8_t mask = 1 << client;
-- 
2.20.1




[Qemu-devel] [Bug 1829682] Re: QEMU PPC SYSTEM regression - 3.1.0 and GIT - Fail to boot AIX

2019-05-27 Thread Ivan Warren via Qemu-devel
According to git bisect :

 git bisect bad
c24ba3d0a34f68ad2c6bf1a15bc43770005f6cc0 is the first bad commit
commit c24ba3d0a34f68ad2c6bf1a15bc43770005f6cc0
Author: Laurent Vivier 
Date:   Wed Dec 19 17:35:41 2018 +0100

spapr: Add H-Call H_HOME_NODE_ASSOCIATIVITY

H_HOME_NODE_ASSOCIATIVITY H-Call returns the associativity domain
designation associated with the identifier input parameter

This fixes a crash when we try to hotplug a CPU in memory-less and
CPU-less numa node. In this case, the kernel tries to online the
node, but without the information provided by this h-call, the node id,
it cannot and the CPU is started while the node is not onlined.

It also removes the warning message from the kernel:
  VPHN is not supported. Disabling polling..

Signed-off-by: Laurent Vivier 
Reviewed-by: Greg Kurz 
Signed-off-by: David Gibson 

:04 04 97fe7c5db103c5426f25f2741db918e8cbc03b75 
ed55cf6abd483aa01974c18d613461cc9e80e2c3 M  hw
:04 04 4d51166be64bc71a72bd60eaa412aadc2117fc4c 
614be9f9c87d20f7a2c23921a37d771a8956ee7c M  include

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1829682

Title:
  QEMU PPC SYSTEM regression - 3.1.0 and GIT - Fail to boot AIX

Status in QEMU:
  New

Bug description:
  Built from source on a debian system

  Linux db08 4.9.0-8-amd64 #1 SMP Debian 4.9.130-2 (2018-10-27) x86_64 GNU/Linux
  gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)

  Last git commit (from queued dgibson repository)

  starting AIX 7.2 TL 2 SP 2 with the following : (the install was done
  under qemu 3.1.0)

  qemu-system-ppc64 -M pseries \
  -cpu power7 \
  -cdrom AIX_v7.2_Install_7200-02-02-1806_DVD_1_of_2_32018.iso \
  -net nic \
  -net tap,ifname=tap2,script=no \
  -drive file=DISK1.IMG,if=none,id=drive-virtio-disk0 \
  -device virtio-scsi-pci,id=scsi -device scsi-hd,drive=drive-virtio-disk0 \
  -m 4G \
  -serial stdio \
  -monitor unix:ms,server,nowait \
  -accel tcg \
  -k fr \
  -nographic \
  -prom-env input-device=/vdevice/vty@7100 \
  -prom-env output-device=/vdevice/vty@7100 \
  -prom-env diag-switch?=false \
  -prom-env boot-command="boot /pci@8002000/scsi@2/disk@100 -s verbose"

  Yields this :

  
  SLOF **
  QEMU Starting
   Build Date = Jan 14 2019 18:00:39
   FW Version = git-a5b428e1c1eae703
   Press "s" to enter Open Firmware.

  C C0100 C0120 C0140 C0200 C0240 C0260 C02E0 C0300 C0320 C0340 C0360 C0370
  C0380 C0371 C0372 C0373 C0374 C03F0 C0400 C0480 C04C0 C04D0 C0500
  Populating /vdevice methods
  Populating /vdevice/vty@7100
  Populating /vdevice/nvram@7101
  Populating /vdevice/l-lan@7102
  Populating /vdevice/v-scsi@7103
   SCSI: Looking for devices
  8200 CD-ROM   : "QEMU QEMU CD-ROM  2.5+"
  C05A0
  Populating /pci@8002000
   00  (D) : 1234 qemu vga
   00 0800 (D) : 1033 0194serial bus [ usb-xhci ]
   00 1000 (D) : 1af4 1004virtio [ scsi ]
  Populating /pci@8002000/scsi@2
   SCSI: Looking for devices
  100 DISK : "QEMU QEMU HARDDISK2.5+"
  C0600 C06C0 C0700 C0800 C0880 C0890 C08A0 C08A8
  Installing QEMU fb

  C08B0
  Scanning USB
  XHCI: Initializing
  USB Keyboard
  USB mouse
  C08C0 C08D0
  No console specified using screen & keyboard
  User selected input-device console: /vdevice/vty@7100
  User selected output-device console: /vdevice/vty@7100
  C08E0 C08E8 C08FF

  Welcome to Open Firmware

  Copyright (c) 2004, 2017 IBM Corporation All rights reserved.
  This program and the accompanying materials are made available
  under the terms of the BSD License available at
  http://www.opensource.org/licenses/bsd-license.php

  Trying to load: -s verbose from: /pci@8002000/scsi@2/disk@100 ...   Successfully loaded

  ---> qemu,pseries detected <---

  ---
  Welcome to AIX.
   boot image timestamp: 05:56:13 04/20/2019
  processor count: 1;  memory size: 4096MB;  kernel size: 38426884
   boot device: /pci@8002000/scsi@2/disk@100

  8000FFEC bytes of free memory remain at address 7FFF0014
  load address: 0x4000   aixmon size: 0x000D2C00   boot image size: 0x01A6B430
  AIX vm,uuid property contains inval

[Qemu-devel] [RFC v4 24/27] hw/arm/smmuv3: Implement fault injection

2019-05-27 Thread Eric Auger
We convert iommu_fault structs received from the kernel
into the data struct used by the emulation code and record
the events into the virtual event queue.

Signed-off-by: Eric Auger 

---

v3 -> v4:
- fix compilation issue on mingw

Exhaustive mapping remains to be done
---
 hw/arm/smmuv3.c | 71 +
 1 file changed, 71 insertions(+)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index a697968ace..4b6480bec0 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1549,6 +1549,76 @@ smmuv3_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
 {
 }
 
+struct iommu_fault;
+
+static inline int
+smmuv3_inject_faults(IOMMUMemoryRegion *iommu_mr, int count,
+ struct iommu_fault *buf)
+{
+#ifdef __linux__
+SMMUDevice *sdev = container_of(iommu_mr, SMMUDevice, iommu);
+SMMUv3State *s3 = sdev->smmu;
+uint32_t sid = smmu_get_sid(sdev);
+int i;
+
+for (i = 0; i < count; i++) {
+SMMUEventInfo info = {};
+struct iommu_fault_unrecoverable *record;
+
+if (buf[i].type != IOMMU_FAULT_DMA_UNRECOV) {
+continue;
+}
+
+info.sid = sid;
+record = &buf[i].event;
+
+switch (record->reason) {
+case IOMMU_FAULT_REASON_PASID_INVALID:
+info.type = SMMU_EVT_C_BAD_SUBSTREAMID;
+/* TODO further fill info.u.c_bad_substream */
+break;
+case IOMMU_FAULT_REASON_PASID_FETCH:
+info.type = SMMU_EVT_F_CD_FETCH;
+break;
+case IOMMU_FAULT_REASON_BAD_PASID_ENTRY:
+info.type = SMMU_EVT_C_BAD_CD;
+/* TODO further fill info.u.c_bad_cd */
+break;
+case IOMMU_FAULT_REASON_WALK_EABT:
+info.type = SMMU_EVT_F_WALK_EABT;
+info.u.f_walk_eabt.addr = record->addr;
+info.u.f_walk_eabt.addr2 = record->fetch_addr;
+break;
+case IOMMU_FAULT_REASON_PTE_FETCH:
+info.type = SMMU_EVT_F_TRANSLATION;
+info.u.f_translation.addr = record->addr;
+break;
+case IOMMU_FAULT_REASON_OOR_ADDRESS:
+info.type = SMMU_EVT_F_ADDR_SIZE;
+info.u.f_addr_size.addr = record->addr;
+break;
+case IOMMU_FAULT_REASON_ACCESS:
+info.type = SMMU_EVT_F_ACCESS;
+info.u.f_access.addr = record->addr;
+break;
+case IOMMU_FAULT_REASON_PERMISSION:
+info.type = SMMU_EVT_F_PERMISSION;
+info.u.f_permission.addr = record->addr;
+break;
+default:
+warn_report("%s Unexpected fault reason received from host: %d",
+__func__, record->reason);
+continue;
+}
+
+smmuv3_record_event(s3, &info);
+}
+return 0;
+#else
+return -1;
+#endif
+}
+
 static void smmuv3_iommu_memory_region_class_init(ObjectClass *klass,
   void *data)
 {
@@ -1558,6 +1628,7 @@ static void smmuv3_iommu_memory_region_class_init(ObjectClass *klass,
 imrc->notify_flag_changed = smmuv3_notify_flag_changed;
 imrc->get_attr = smmuv3_get_attr;
 imrc->replay = smmuv3_replay;
+imrc->inject_faults = smmuv3_inject_faults;
 }
 
 static const TypeInfo smmuv3_type_info = {
-- 
2.20.1




[Qemu-devel] [RFC v4 26/27] vfio-pci: Set up fault regions

2019-05-27 Thread Eric Auger
We set up two fault regions. The producer fault region is read-only from
the user-space perspective. It is composed of the fault queue (mmappable)
and a header written by the kernel, located in a separate page.

The consumer fault region is write-only from the user-space perspective.

Signed-off-by: Eric Auger 

---
---
 hw/vfio/pci.c | 99 +++
 hw/vfio/pci.h |  2 ++
 2 files changed, 101 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 29d4f633b0..8208171f92 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2505,11 +2505,100 @@ int vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp)
 return 0;
 }
 
+static void vfio_init_fault_regions(VFIOPCIDevice *vdev, Error **errp)
+{
+struct vfio_region_info *fault_region_info = NULL;
+struct vfio_region_info_cap_fault *cap_fault;
+VFIODevice *vbasedev = &vdev->vbasedev;
+struct vfio_info_cap_header *hdr;
+char *fault_region_name = NULL;
+uint32_t max_version;
+ssize_t bytes;
+int ret;
+
+/* Producer Fault Region */
+ret = vfio_get_dev_region_info(&vdev->vbasedev,
+   VFIO_REGION_TYPE_NESTED,
+   VFIO_REGION_SUBTYPE_NESTED_FAULT_PROD,
+   &fault_region_info);
+if (!ret) {
+hdr = vfio_get_region_info_cap(fault_region_info,
+   VFIO_REGION_INFO_CAP_PRODUCER_FAULT);
+if (!hdr) {
+error_setg(errp, "failed to retrieve fault ABI max version");
+g_free(fault_region_info);
+return;
+}
+cap_fault = container_of(hdr, struct vfio_region_info_cap_fault,
+ header);
+max_version = cap_fault->version;
+
+fault_region_name = g_strdup_printf("%s FAULT PROD %d",
+vbasedev->name,
+fault_region_info->index);
+
+ret = vfio_region_setup(OBJECT(vdev), vbasedev,
+&vdev->fault_prod_region,
+fault_region_info->index,
+fault_region_name);
+if (ret) {
+error_setg_errno(errp, -ret,
+ "failed to setup the fault prod region %d",
+ fault_region_info->index);
+goto out;
+}
+
+ret = vfio_region_mmap(&vdev->fault_prod_region);
+if (ret) {
+error_report("Failed to mmap fault queue(%d)", ret);
+}
+
+g_free(fault_region_info);
+g_free(fault_region_name);
+} else {
+goto out;
+}
+
+/* Consumer Fault Region */
+ret = vfio_get_dev_region_info(&vdev->vbasedev,
+   VFIO_REGION_TYPE_NESTED,
+   VFIO_REGION_SUBTYPE_NESTED_FAULT_CONS,
+   &fault_region_info);
+if (!ret) {
+fault_region_name = g_strdup_printf("%s FAULT CONS %d",
+vbasedev->name,
+fault_region_info->index);
+
+ret = vfio_region_setup(OBJECT(vdev), vbasedev,
+&vdev->fault_cons_region,
+fault_region_info->index,
+fault_region_name);
+if (ret) {
+error_setg_errno(errp, -ret,
+ "failed to setup the fault cons region %d",
+ fault_region_info->index);
+}
+
+/* Set the chosen fault ABI version in the consume header*/
+bytes = pwrite(vdev->vbasedev.fd, &max_version, 4,
+   vdev->fault_cons_region.fd_offset);
+if (bytes != 4) {
+error_setg(errp,
+   "Unable to set the chosen fault ABI version (%d)",
+   max_version);
+}
+}
+out:
+g_free(fault_region_name);
+g_free(fault_region_info);
+}
+
 static void vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
 {
 VFIODevice *vbasedev = &vdev->vbasedev;
 struct vfio_region_info *reg_info;
 struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
+Error *err = NULL;
 int i, ret = -1;
 
 /* Sanity check device */
@@ -2573,6 +2662,12 @@ static void vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
 }
 }
 
+vfio_init_fault_regions(vdev, &err);
+if (err) {
+error_propagate(errp, err);
+return;
+}
+
 irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
 
 ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
@@ -3105,6 +3200,8 @@ static void vfio_instance_finalize(Object *obj)
 
 vfio_display_finalize(vdev);
 vfio_bars_finalize(vdev);
+vfio_region_finalize(&vdev->fault_prod_region);
+vfio_region_finalize(&vdev->fault_cons_regio

[Qemu-devel] [RFC v4 27/27] vfio-pci: Implement the DMA fault handler

2019-05-27 Thread Eric Auger
Whenever the eventfd is triggered, we retrieve the DMA faults
from the mmapped fault region and inject them into the IOMMU
memory region.
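
The consume step can be pictured as draining a circular queue from a
consumer index up to the producer index. The following is a minimal,
self-contained sketch for illustration only, not the code of this patch:
demo_fault, demo_drain_faults() and demo_count_fault() are hypothetical
names, and the record layout is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fault record; the real layout is defined by the kernel ABI. */
struct demo_fault {
    uint32_t reason;
    uint64_t addr;
};

/* Hypothetical callback standing in for the injection into the IOMMU MR. */
typedef void (*demo_inject_fn)(const struct demo_fault *fault, void *opaque);

/* Example callback: counts the faults it is handed. */
void demo_count_fault(const struct demo_fault *fault, void *opaque)
{
    (void)fault;
    (*(unsigned int *)opaque)++;
}

/*
 * Hand every pending entry between 'cons' and 'prod' to 'inject',
 * wrapping around at 'nb_entries'. Returns the new consumer index,
 * which the caller would then write back to the consumer region.
 */
uint32_t demo_drain_faults(const struct demo_fault *queue,
                           uint32_t nb_entries,
                           uint32_t prod, uint32_t cons,
                           demo_inject_fn inject, void *opaque)
{
    assert(nb_entries > 0 && prod < nb_entries && cons < nb_entries);

    while (cons != prod) {
        inject(&queue[cons], opaque);
        cons = (cons + 1) % nb_entries;
    }
    return cons;
}
```

With a 4-entry queue, cons = 3 and prod = 1, the sketch visits entries 3
and 0 and returns 1 as the new consumer index.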

Signed-off-by: Eric Auger 
---
 hw/vfio/pci.c | 53 +++
 hw/vfio/pci.h |  1 +
 2 files changed, 54 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 8208171f92..a07acf98c7 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2834,10 +2834,63 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
 static void vfio_dma_fault_notifier_handler(void *opaque)
 {
 VFIOPCIDevice *vdev = opaque;
+PCIDevice *pdev = &vdev->pdev;
+AddressSpace *as = pci_device_iommu_address_space(pdev);
+IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(as->root);
+struct vfio_region_fault_prod header;
+struct iommu_fault *queue;
+char *queue_buffer = NULL;
+ssize_t bytes;
 
 if (!event_notifier_test_and_clear(&vdev->dma_fault_notifier)) {
 return;
 }
+
+if (!vdev->fault_prod_region.size || !vdev->fault_cons_region.size) {
+return;
+}
+
+bytes = pread(vdev->vbasedev.fd, &header, sizeof(header),
+  vdev->fault_prod_region.fd_offset);
+if (bytes != sizeof(header)) {
+error_report("%s unable to read the fault region header (0x%lx)",
+ __func__, bytes);
+return;
+}
+
+/* Normally the fault queue is mmapped */
+queue = (struct iommu_fault *)vdev->fault_prod_region.mmaps[0].mmap;
+if (!queue) {
+size_t queue_size = header.nb_entries * header.entry_size;
+
+error_report("%s: fault queue not mmapped: slower fault handling",
+ vdev->vbasedev.name);
+
+queue_buffer = g_malloc(queue_size);
+bytes =  pread(vdev->vbasedev.fd, queue_buffer, queue_size,
+   vdev->fault_prod_region.fd_offset + header.offset);
+if (bytes != queue_size) {
+error_report("%s unable to read the fault queue (0x%lx)",
+ __func__, bytes);
+return;
+}
+
+queue = (struct iommu_fault *)queue_buffer;
+}
+
+while (vdev->fault_cons_index != header.prod) {
+memory_region_inject_faults(iommu_mr, 1,
+&queue[vdev->fault_cons_index]);
+vdev->fault_cons_index =
+(vdev->fault_cons_index + 1) % header.nb_entries;
+}
+bytes = pwrite(vdev->vbasedev.fd, &vdev->fault_cons_index, 4,
+   vdev->fault_cons_region.fd_offset + 4);
+if (bytes != 4) {
+error_report("%s unable to write the fault region cons index (0x%lx)",
+ __func__, bytes);
+}
+g_free(queue_buffer);
 }
 
 static void vfio_register_dma_fault_notifier(VFIOPCIDevice *vdev)
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index ee64081b47..01737d9372 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -138,6 +138,7 @@ typedef struct VFIOPCIDevice {
 EventNotifier dma_fault_notifier;
 VFIORegion fault_prod_region;
 VFIORegion fault_cons_region;
+uint32_t fault_cons_index;
 int (*resetfn)(struct VFIOPCIDevice *);
 uint32_t vendor_id;
 uint32_t device_id;
-- 
2.20.1




Re: [Qemu-devel] [PATCH v14 0/1] qcow2: cluster space preallocation

2019-05-27 Thread Max Reitz
On 26.05.19 17:01, Alberto Garcia wrote:
> On Fri 24 May 2019 03:56:21 PM CEST, Max Reitz  wrote:
>>> +------+----------+----------+------+
>>> | file |  before  |  after   | gain |
>>> +------+----------+----------+------+
>>> | ssd  |  61.153  |  36.313  |  41% |
>>> | hdd  | 112.676  | 122.056  |  -8% |
>>> +------+----------+----------+------+
>>
>> I’ve done a few more tests, and I’ve seen more slowdown on an HDD.
>> (Like 30 % when doing 64 kB requests that are not aligned to
>> clusters.)  On the other hand, the SSD gain is generally in the same
>> ballpark (38 % when issuing the same kind of requests.)
>   [...]
>> [1] Hm.  We can probably investigate whether the file is stored on a
>> rotational medium or not.  Is there a fundamental reason why this
>> patch seems to degrade performance on an HDD but improves it on an
>> SSD?  If so, we can probably make a choice based on that.
> 
> This is when writing to an unallocated cluster with no existing data on
> the backing image, right? Then it's probably because you need 2
> operations (write zeros + write data) instead of just one.

Hm, yes.  I didn’t test writing tail and head separately, which should
be even worse.

Max





Re: [Qemu-devel] [RFC 1/3] block: Add ImageRotationalInfo

2019-05-27 Thread Max Reitz
On 26.05.19 17:08, Alberto Garcia wrote:
> On Fri 24 May 2019 07:28:10 PM CEST, Max Reitz  wrote:
>> +##
>> +# @ImageRotationalInfo:
>> +#
>> +# Indicates whether an image is stored on a rotating disk or not.
>> +#
>> +# @solid-state: Image is stored on a solid-state drive
>> +#
>> +# @rotating:Image is stored on a rotating disk
> 
> What happens when you cannot tell? You assume it's solid-state?

When *I* cannot tell?  This field is generally optional, so in that case
it just will not be present.

(When Linux cannot tell?  I don’t know :-))

Do you think there should be an explicit value for that?

Max





Re: [Qemu-devel] [PATCH] hw/i386/pc: check apci hotplug capability before nvdimm's

2019-05-27 Thread Igor Mammedov
On Thu, 11 Apr 2019 15:17:39 +0800
Wei Yang  wrote:

> pc_memory_pre_plug() is called during hotplug for both pc-dimm and
> nvdimm. This is more proper to check apci hotplug capability before
> check nvdimm specific capability.
Not sure what this is about.
Currently we check whether ACPI is enabled
  if (!pcms->acpi_dev || !acpi_enabled) { ...
before the nvdimm check, and it looks better to me that we cancel
nvdimm hotplug early rather than passing it to
hotplug_handler_pre_plug(pcms->acpi_dev, dev, &local_err).
With this patch the ACPI device handler will be called before the
nvdimm check happens, so it's one more unnecessary call chain in
the case of nvdimm, which I'd rather not have.
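
To illustrate the ordering argument, here is a minimal sketch with made-up
types. It is not the pc.c code: demo_machine, demo_memory_pre_plug() and the
handler_calls counter are hypothetical, with the counter standing in for the
call into the ACPI device handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_result {
    DEMO_OK = 0,
    DEMO_ERR_NO_ACPI,
    DEMO_ERR_NVDIMM_DISABLED
};

/* Hypothetical machine state, reduced to what the example needs. */
struct demo_machine {
    bool acpi_enabled;
    bool nvdimm_enabled;
    int handler_calls;      /* how often the "ACPI handler" was invoked */
};

/*
 * Cheap local checks come first, so a request that is going to be
 * rejected never reaches the handler.
 */
enum demo_result demo_memory_pre_plug(struct demo_machine *m, bool is_nvdimm)
{
    assert(m != NULL);

    if (!m->acpi_enabled) {
        return DEMO_ERR_NO_ACPI;
    }
    if (is_nvdimm && !m->nvdimm_enabled) {
        return DEMO_ERR_NVDIMM_DISABLED;   /* rejected before any handler call */
    }

    m->handler_calls++;   /* stands in for hotplug_handler_pre_plug() */
    return DEMO_OK;
}
```

In this ordering an nvdimm request on a machine without nvdimm support
fails with handler_calls still at 0; with the two checks swapped, the
handler would run once before the request is rejected.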

Are there any issues with the current call flow?
(The commit message doesn't really explain why we need this patch.)

> 
> Signed-off-by: Wei Yang 
> ---
>  hw/i386/pc.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index f2c15bf1f2..d48b6f9582 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -2091,17 +2091,17 @@ static void pc_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>  return;
>  }
>  
> -if (is_nvdimm && !ms->nvdimms_state->is_enabled) {
> -error_setg(errp, "nvdimm is not enabled: missing 'nvdimm' in '-M'");
> -return;
> -}
> -
>  hotplug_handler_pre_plug(pcms->acpi_dev, dev, &local_err);
>  if (local_err) {
>  error_propagate(errp, local_err);
>  return;
>  }
>  
> +if (is_nvdimm && !ms->nvdimms_state->is_enabled) {
> +error_setg(errp, "nvdimm is not enabled: missing 'nvdimm' in '-M'");
> +return;
> +}
> +
>  pc_dimm_pre_plug(PC_DIMM(dev), MACHINE(hotplug_dev),
>   pcmc->enforce_aligned_dimm ? NULL : &legacy_align, 
> errp);
>  }




Re: [Qemu-devel] [PATCH v7 10/10] hw/m68k: define Macintosh Quadra 800

2019-05-27 Thread Mark Cave-Ayland
On 25/05/2019 23:50, Laurent Vivier wrote:

> If you want to test the machine, it doesn't yet boot a MacROM, but you can
> boot a linux kernel from the command line.
> 
> You can install your own disk using debian-installer with:
> 
> ./qemu-system-m68k \
> -M q800 \
> -serial none -serial mon:stdio \
> -m 1000M -drive file=m68k.qcow2,format=qcow2 \
> -net nic,model=dp83932,addr=09:00:07:12:34:57 \
> -append "console=ttyS0 vga=off" \
> -kernel vmlinux-4.15.0-2-m68k \
> -initrd initrd.gz \
> -drive file=debian-9.0-m68k-NETINST-1.iso \
> -drive file=m68k.qcow2,format=qcow2 \
> -nographic
> 
> If you use a graphic adapter instead of "-nographic", you can use "-g" to
> set the size of the display (I use "-g 1600x800x24").
> 
> Co-developed-by: Mark Cave-Ayland 
> Signed-off-by: Mark Cave-Ayland 
> Signed-off-by: Laurent Vivier 
> ---
>  MAINTAINERS  |  14 ++
>  default-configs/m68k-softmmu.mak |   1 +
>  hw/m68k/Kconfig  |  12 +
>  hw/m68k/Makefile.objs|   1 +
>  hw/m68k/bootinfo.h   | 100 +
>  hw/m68k/q800.c   | 369 +++
>  6 files changed, 497 insertions(+)
>  create mode 100644 hw/m68k/bootinfo.h
>  create mode 100644 hw/m68k/q800.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 3cacd751bf..274dfd6e19 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -906,6 +906,20 @@ F: hw/char/mcf_uart.c
>  F: hw/net/mcf_fec.c
>  F: include/hw/m68k/mcf*.h
>  
> +q800
> +M: Laurent Vivier 
> +S: Maintained
> +F: hw/block/swim.c
> +F: hw/m68k/bootinfo.h
> +F: hw/display/macfb.c
> +F: hw/m68k/q800.c
> +F: hw/misc/mac_via.c
> +F: hw/nubus/*
> +F: include/hw/block/swim.h
> +F: include/hw/display/macfb.h
> +F: include/hw/misc/mac_via.h
> +F: include/hw/nubus/*
> +
>  MicroBlaze Machines
>  ---
>  petalogix_s3adsp1800
> diff --git a/default-configs/m68k-softmmu.mak b/default-configs/m68k-softmmu.mak
> index e17495e2a0..7e3649c1b8 100644
> --- a/default-configs/m68k-softmmu.mak
> +++ b/default-configs/m68k-softmmu.mak
> @@ -4,3 +4,4 @@
>  #
>  CONFIG_AN5206=y
>  CONFIG_MCF5208=y
> +CONFIG_Q800=y
> diff --git a/hw/m68k/Kconfig b/hw/m68k/Kconfig
> index 49ef0b3f6d..ffa8e48fd8 100644
> --- a/hw/m68k/Kconfig
> +++ b/hw/m68k/Kconfig
> @@ -7,3 +7,15 @@ config MCF5208
>  bool
>  select COLDFIRE
>  select PTIMER
> +
> +config Q800
> +bool
> +select FRAMEBUFFER
> +select ADB
> +select MAC_VIA
> +select ESCC
> +select ESP
> +select MACFB
> +select NUBUS
> +select DP8393X
> +select SWIM
> diff --git a/hw/m68k/Makefile.objs b/hw/m68k/Makefile.objs
> index 482f8477b4..cfd13fae53 100644
> --- a/hw/m68k/Makefile.objs
> +++ b/hw/m68k/Makefile.objs
> @@ -1,2 +1,3 @@
>  obj-$(CONFIG_AN5206) += an5206.o mcf5206.o
>  obj-$(CONFIG_MCF5208) += mcf5208.o mcf_intc.o
> +obj-$(CONFIG_Q800) += q800.o
> diff --git a/hw/m68k/bootinfo.h b/hw/m68k/bootinfo.h
> new file mode 100644
> index 00..6584775f6d
> --- /dev/null
> +++ b/hw/m68k/bootinfo.h
> @@ -0,0 +1,100 @@
> +struct bi_record {
> +uint16_t tag;/* tag ID */
> +uint16_t size;   /* size of record */
> +uint32_t data[0];/* data */
> +};
> +
> +/* machine independent tags */
> +
> +#define BI_LAST 0x0000 /* last record */
> +#define BI_MACHTYPE 0x0001 /* machine type (u_long) */
> +#define BI_CPUTYPE  0x0002 /* cpu type (u_long) */
> +#define BI_FPUTYPE  0x0003 /* fpu type (u_long) */
> +#define BI_MMUTYPE  0x0004 /* mmu type (u_long) */
> +#define BI_MEMCHUNK 0x0005 /* memory chunk address and size */
> +   /* (struct mem_info) */
> +#define BI_RAMDISK  0x0006 /* ramdisk address and size */
> +   /* (struct mem_info) */
> +#define BI_COMMAND_LINE 0x0007 /* kernel command line parameters */
> +   /* (string) */
> +
> +/*  Macintosh-specific tags (all u_long) */
> +
> +#define BI_MAC_MODEL0x8000  /* Mac Gestalt ID (model type) */
> +#define BI_MAC_VADDR0x8001  /* Mac video base address */
> +#define BI_MAC_VDEPTH   0x8002  /* Mac video depth */
> +#define BI_MAC_VROW 0x8003  /* Mac video rowbytes */
> +#define BI_MAC_VDIM 0x8004  /* Mac video dimensions */
> +#define BI_MAC_VLOGICAL 0x8005  /* Mac video logical base */
> +#define BI_MAC_SCCBASE  0x8006  /* Mac SCC base address */
> +#define BI_MAC_BTIME0x8007  /* Mac boot time */
> +#define BI_MAC_GMTBIAS  0x8008  /* Mac GMT timezone offset */
> +#define BI_MAC_MEMSIZE  0x8009  /* Mac RAM size (sanity check) */
> +#define BI_MAC_CPUID0x800a  /* Mac CPU type (sanity check) */
> +#define BI_MAC_ROMBASE  0x800b  /* Mac system ROM base address */
> +
> +/*  Macintosh hardware profile data */
> +
> +#define BI_MAC_VIA1BASE 0x8010  /* Mac VIA1 base address (always present) */
> +#define BI_MAC_VIA2BASE 0x8011  /* Mac VIA2 base address (type varies) */
> +#def

Re: [Qemu-devel] [RFC 1/3] block: Add ImageRotationalInfo

2019-05-27 Thread Alberto Garcia
On Mon 27 May 2019 02:16:53 PM CEST, Max Reitz wrote:
> On 26.05.19 17:08, Alberto Garcia wrote:
>> On Fri 24 May 2019 07:28:10 PM CEST, Max Reitz  wrote:
>>> +##
>>> +# @ImageRotationalInfo:
>>> +#
>>> +# Indicates whether an image is stored on a rotating disk or not.
>>> +#
>>> +# @solid-state: Image is stored on a solid-state drive
>>> +#
>>> +# @rotating:Image is stored on a rotating disk
>> 
>> What happens when you cannot tell? You assume it's solid-state?
>
> When *I* cannot tell?  This field is generally optional, so in that case
> it just will not be present.
>
> (When Linux cannot tell?  I don’t know :-))
>
> Do you think there should be an explicit value for that?

I'll try to rephrase:

we have a new optimization that improves performance on SSDs but reduces
performance on HDDs, so this series would detect where an image is
stored in order to enable the faster code path for each case.

What happens if QEMU cannot detect if we have a solid drive or a
rotational drive? (e.g. a remote storage backend). Will it default to
enabling preallocation using write_zeroes()?
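
To make the "cannot tell" case concrete, here is a minimal sketch of a
three-way answer. It is an assumption for illustration and not code from
the series: the demo_* names are hypothetical, and the only real interface
it relies on is the Linux sysfs attribute /sys/block/<dev>/queue/rotational,
which reads "0" or "1".

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum demo_rotational {
    DEMO_ROT_UNKNOWN,       /* no information, e.g. remote storage */
    DEMO_ROT_SOLID_STATE,
    DEMO_ROT_ROTATING
};

/* Interpret the text read from a "rotational" attribute ("0" or "1"). */
enum demo_rotational demo_parse_rotational(const char *text)
{
    if (text == NULL || (text[0] != '0' && text[0] != '1')) {
        return DEMO_ROT_UNKNOWN;
    }
    if (text[1] != '\n' && text[1] != '\0') {
        return DEMO_ROT_UNKNOWN;
    }
    return text[0] == '1' ? DEMO_ROT_ROTATING : DEMO_ROT_SOLID_STATE;
}

/* Read the attribute file; any failure maps to "unknown". */
enum demo_rotational demo_query_rotational(const char *sysfs_path)
{
    char buf[8] = "";
    FILE *f;

    assert(sysfs_path != NULL);

    f = fopen(sysfs_path, "r");
    if (f == NULL) {
        return DEMO_ROT_UNKNOWN;
    }
    if (fgets(buf, sizeof(buf), f) == NULL) {
        buf[0] = '\0';
    }
    fclose(f);
    return demo_parse_rotational(buf);
}
```

A caller then has to pick a policy for DEMO_ROT_UNKNOWN explicitly, which
is exactly the default being asked about.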

Berto



Re: [Qemu-devel] Running linux on qemu omap

2019-05-27 Thread Tony Lindgren
Hi,

* Philippe Mathieu-Daudé  [190523 12:01]:
> What I use as reference for testing ARM boards [*] is the work of
> Guenter Roeck:
> https://github.com/groeck/linux-build-test/blob/master/rootfs/arm/run-qemu-arm.sh

I think Guenter also has a v2.3.50-local-linaro branch in his
github repo that has support for a few extra boards like Beagleboard.
Not sure what the current branch to use is, though.

Regards,

Tony







Re: [Qemu-devel] [PATCH v3] numa: improve cpu hotplug error message with a wrong node-id

2019-05-27 Thread Igor Mammedov
On Mon, 27 May 2019 08:55:49 +0200
Laurent Vivier  wrote:

> On 24/05/2019 22:14, Eduardo Habkost wrote:
> > On Fri, May 24, 2019 at 04:39:12PM +0200, Laurent Vivier wrote:  
> >> On 24/05/2019 16:10, Igor Mammedov wrote:  
> >>> On Fri, 24 May 2019 12:35:21 +0200
> >>> Laurent Vivier  wrote:
> >>>  
>  On pseries, core-ids are strongly binded to a node-id by the command
>  line option. If an user tries to add a CPU to the wrong node, he has
>  an error but it is not really helpful:
> 
>  qemu-system-ppc64 ... -smp 1,maxcpus=64,cores=1,threads=1,sockets=1 \
>    -numa node,nodeid=0 -numa node,nodeid=1 ...
> 
>  (qemu) device_add power9_v2.0-spapr-cpu-core,core-id=30,node-id=1
>  Error: node-id=1 must match numa node specified with -numa option
> 
>  This patch improves this error message by giving to the user the good
>  topology information (node-id, socket-id and thread-id if they are
>  available) to use with the core-id he's providing:
> 
>  Error: node-id=1 must match numa node specified with -numa option 
>  'node-id 0'
> 
>  Signed-off-by: Laurent Vivier 
>  ---
> 
>  Notes:
>    v3: only add the topology to the existing message
>    As suggested by Igor replace
>  Error: core-id 30 can only be plugged into node-id 0
>    by
>  Error: node-id=1 must match numa node specified with -numa 
>  option 'node-id 0'
>    v2: display full topology in the error message  
> >>>numa.c | 25 -  
> 1 file changed, 24 insertions(+), 1 deletion(-)
> 
>  diff --git a/numa.c b/numa.c
>  index 3875e1efda3a..7882ec294be4 100644
>  --- a/numa.c
>  +++ b/numa.c
>  @@ -458,6 +458,27 @@ void qmp_set_numa_node(NumaOptions *cmd, Error 
>  **errp)
> set_numa_options(MACHINE(qdev_get_machine()), cmd, errp);
> }
>  +static char *cpu_topology_to_string(const CPUArchId *cpu)
>  +{
>  +GString *s = g_string_new(NULL);
>  +if (cpu->props.has_socket_id) {
>  +g_string_append_printf(s, "socket-id %"PRId64, 
>  cpu->props.socket_id);
>  +}
>  +if (cpu->props.has_node_id) {
>  +if (s->len) {
>  +g_string_append_printf(s, ", ");
>  +}
>  +g_string_append_printf(s, "node-id %"PRId64, 
>  cpu->props.node_id);
>  +}
>  +if (cpu->props.has_thread_id) {
>  +if (s->len) {
>  +g_string_append_printf(s, ", ");
>  +}
>  +g_string_append_printf(s, "thread-id %"PRId64, 
>  cpu->props.thread_id);
>  +}
>  +return g_string_free(s, false);
>  +}  
> >>>
> >>> turns out we already have such helper: cpu_slot_to_string()  
> >>
> >> It doesn't display the node-id but the core-id. And node-id is what we need
> >> to know.  
> > 
> > I'm confused about what you are trying to do here.
> > 
> > On v1, the message looked like:
> >Error: core-id 30 can only be plugged into node-id 0
> > 
> > which is probably good for spapr.
> > 
> > 
> > Then I suggested you added the other cpu->props fields.  e.g. on
> > PC the message would look like:
> >Error: socket-id 20, core-id 30, thread-id 40 can only be plugged into 
> > node-id 0
> > 
> > 
> > But you sent a v2 patch that would print this on PC:
> >Error: core-id 30 can only be plugged into socket-id 20, node-id 0, 
> > thread-id 40
> > 
> > which doesn't make sense to me.
> > 
> > 
> > Then in a reply to v2, Igor suggested:
> > 
> >   error_setg(errp, "node-id=%d must match numa node specified "
> > "with -numa option '%s'", node_id, topology);
> > 
> > 
> > Igor suggest would address the problem above.  I expected it to become:
> >node-id=0 must match numa node specified with -numa option core-id=30
> > and on PC:
> >node-id=0 must match numa node specified with -numa option 
> > socket-id=20,core-id=30,thread-id=40
> > 
> > Or maybe it could include the input node-id too:
> >node-id=0 must match numa node specified with -numa option 
> > node-id=1,core-id=30
> > and on PC:
> >node-id=0 must match numa node specified with -numa option 
> > node-id=1,socket-id=20,core-id=30,thread-id=40
> > 
> > Both options would work.
> > 
> > 
> > But you implemented code that would print:
> >Error: node-id=0 must match numa node specified with -numa option 
> > 'node-id 1'
> > and on PC it would print:
> >Error: node-id=0 must match numa node specified with -numa option 
> > 'socket-id 20 node-id 1 thread-id=40'
> > 
> > which doesn't make sense to me.
> > 
> > 
> > I was expecting something like:
> >Error: CPU slot core-id=30 is bound to node-id 0, but node-id 1 was 
> > specified
> > and on PC:
> >Error: CPU slot socket-id=20,core-id=30,thread-id=40 is bound to node-id 
> > 0, bu

Re: [Qemu-devel] Question about wrong ram-node0 reference

2019-05-27 Thread liujunjie (A)
We found only one VM aborted among at least 20 VMs with the same
configuration, and we have not been able to reproduce the problem yet.
(We are aware that reproducing it is important for figuring out the
problem; we have already tried adding more VMs to reproduce it, but with
no results so far.)
The qemu cmdline is as follows:
/usr/bin/qemu-kvm -name guest=instance-00025bf8,debug-threads=on -S -object 
secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain-118-instance-00025bf8/master-key.aes
 -machine 
pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off,max-ram-below-4g=2G -cpu 
host,host-cache-info=on -m 131072 -realtime min_guarantee=131072,mlock=off -smp 
16,sockets=2,cores=4,threads=2 -object iothread,id=iothread1 -object 
iothread,id=iothread2 -object iothread,id=iothread3 -object 
iothread,id=iothread4 -object iothread,id=iothread5 -object 
iothread,id=iothread6 -object iothread,id=iothread7 -object 
iothread,id=iothread8 -object iothread,id=iothread9 -object 
iothread,id=iothread10 -object iothread,id=iothread11 -object 
iothread,id=iothread12 -object iothread,id=iothread13 -object 
iothread,id=iothread14 -object iothread,id=iothread15 -object 
iothread,id=iothread16 -object iothread,id=iothread17 -object 
iothread,id=iothread18 -object iothread,id=iothread19 -object 
iothread,id=iothread20 -object iothread,id=iothread21 -object 
iothread,id=iothread22 -object iothread,id=iothread23 -object 
iothread,id=iothread24 -object iothread,id=iothread25 -object 
iothread,id=iothread26 -object iothread,id=iothread27 -object 
iothread,id=iothread28 -object iothread,id=iothread29 -object 
memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/118-instance-00025bf8,share=yes,size=68719476736,host-nodes=0,policy=bind
 -numa node,nodeid=0,cpus=0-7,memdev=ram-node0 -object 
memory-backend-file,id=ram-node1,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/118-instance-00025bf8,share=yes,size=68719476736,host-nodes=1,policy=bind
 -numa node,nodeid=1,cpus=8-15,memdev=ram-node1 -uuid 
6952c043-4e0c-4267-80c1-fac2e302443f -smbios type=1,manufacturer=OpenStack 
Foundation,product=OpenStack 
Nova,version=13.2.1-20181119144459,serial=c5cc21e6-1d3b-4587-8c1e-208a1d19a47e,uuid=6952c043-4e0c-4267-80c1-fac2e302443f,family=Virtual
 Machine -no-user-config -nodefaults -chardev 
socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-118-instance-00025bf8/monitor.sock,server,nowait
 -mon chardev=charmonitor,id=monitor,mode=control -rtc 
base=2019-01-21T06:59:37,clock=vm,driftfix=slew -global 
kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -boot strict=on -device 
pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x3 -device 
pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x4 -device 
pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0x5 -device 
pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0x6 -device 
pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0x7 -device 
pci-bridge,chassis_nr=6,id=pci.6,bus=pci.0,addr=0x8 -device 
pci-bridge,chassis_nr=7,id=pci.7,bus=pci.0,addr=0x9 -device 
pci-bridge,chassis_nr=8,id=pci.8,bus=pci.0,addr=0xa -device 
piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device 
virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0xb -drive 
file=/dev/mapper/648d06e72e68404a9401854e21409f3d-dm,format=raw,if=none,id=drive-virtio-disk0,serial=648d06e7-2e68-404a-9401-854e21409f3d,cache=none,aio=native
 -device 
virtio-blk-pci,scsi=off,bus=pci.2,addr=0x1,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
 -chardev socket,id=charnet0,path=/var/run/vhost-user/tap4ba9f4eb-19 -netdev 
vhost-user,chardev=charnet0,queues=4,id=hostnet0 -device 
virtio-net-pci,mq=on,vectors=10,netdev=hostnet0,id=net0,mac=fa:16:3e:0f:ed:94,bus=pci.4,addr=0x3,bootindex=2
 -add-fd set=0,fd=45 -chardev file,id=charserial0,path=/dev/fdset/0,append=on 
-device isa-serial,chardev=charserial0,id=serial0 -chardev 
socket,id=charchannel0,path=/var/run/libvirt/qemu/instance-00025bf8.extend,server,nowait
 -device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.1
 -chardev 
socket,id=charchannel1,path=/var/run/libvirt/qemu/instance-00025bf8.agent,server,nowait
 -device 
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0
 -chardev 
socket,id=charchannel2,path=/var/run/libvirt/qemu/instance-00025bf8.hostd,server,nowait
 -device 
virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=org.qemu.guest_agent.2
 -chardev 
socket,id=charchannel3,path=/var/run/libvirt/qemu/instance-00025bf8.upgraded,server,nowait
 -device 
virtserialport,bus=virtio-serial0.0,nr=4,chardev=charchannel3,id=channel3,name=org.qemu.guest_agent.3
 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 172.28.5.246:3,password -k 
en-us -device cirrus-vga,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2 -device 
vfio-pci,host=95:00.0,id=hostdev0,bus=pci.5,addr=0x1 -device 
vfio-pci,host=99:00.0,id=hostdev1,bus=pci.5,addr=0x2 -device 
vfio-pci,host=35:00.0,id=hostdev2,

[Qemu-devel] [PATCH] qcow2-bitmap: initialize bitmap directory alignment

2019-05-27 Thread Andrey Shinkevich
Valgrind detects multiple issues in QEMU iotests when memory is
used without being initialized. Valgrind may dump lots of unnecessary
reports, which makes the memory issue analysis harder. In particular,
that is true for the aligned bitmap directory and can be seen while
running iotest #169. Padding the aligned space with zeros eases
the pain.

Signed-off-by: Andrey Shinkevich 
---
 block/qcow2-bitmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
index 8a75366..4941764 100644
--- a/block/qcow2-bitmap.c
+++ b/block/qcow2-bitmap.c
@@ -754,7 +754,7 @@ static int bitmap_list_store(BlockDriverState *bs, Qcow2BitmapList *bm_list,
         dir_offset = *offset;
     }
 
-    dir = g_try_malloc(dir_size);
+    dir = g_try_malloc0(dir_size);
     if (dir == NULL) {
         return -ENOMEM;
     }
-- 
1.8.3.1




Re: [Qemu-devel] [RFC 1/3] block: Add ImageRotationalInfo

2019-05-27 Thread Max Reitz
On 27.05.19 14:37, Alberto Garcia wrote:
> On Mon 27 May 2019 02:16:53 PM CEST, Max Reitz wrote:
>> On 26.05.19 17:08, Alberto Garcia wrote:
>>> On Fri 24 May 2019 07:28:10 PM CEST, Max Reitz  wrote:
>>>> +##
>>>> +# @ImageRotationalInfo:
>>>> +#
>>>> +# Indicates whether an image is stored on a rotating disk or not.
>>>> +#
>>>> +# @solid-state: Image is stored on a solid-state drive
>>>> +#
>>>> +# @rotating:    Image is stored on a rotating disk
>>>
>>> What happens when you cannot tell? You assume it's solid-state?
>>
>> When *I* cannot tell?  This field is generally optional, so in that case
>> it just will not be present.
>>
>> (When Linux cannot tell?  I don’t know :-))
>>
>> Do you think there should be an explicit value for that?
> 
> I'll try to rephrase:
> 
> we have a new optimization that improves performance on SSDs but reduces
> performance on HDDs, so this series would detect where an image is
> stored in order to enable the faster code path for each case.
> 
> What happens if QEMU cannot detect if we have a solid drive or a
> rotational drive? (e.g. a remote storage backend). Will it default to
> enabling preallocation using write_zeroes()?

In this series, yes.  That is the default I chose.

We have to make a separate decision for each case.  In the case of
filling newly allocated areas with zeroes, I think the performance gain
for SSDs is more important than the performance loss for HDDs.  That is
what I wrote in my response to Anton’s series.  So I took the series
even without it being able to distinguish both cases at all.
Consequently, I believe it is reasonable for that to be the default
behavior if we cannot tell.

I think in general optimizing for SSDs should probably be the default.
HDDs are slow anyway, so whoever uses them probably doesn’t care about
performance too much anyway...?  Whereas people using SSDs probably do.
But as I said, we can and should always make a separate decision for
each case.

Max
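
The default described above can be sketched as follows. This is a minimal
sketch, not code from the series: the function names are hypothetical, and it
assumes a Linux host where the block layer exposes
/sys/block/<dev>/queue/rotational ("1" for a rotating disk, "0" otherwise).
When the attribute cannot be read (a remote storage backend, for example), the
answer is unknown and the sketch falls back to the SSD-friendly path.

```python
from pathlib import Path
from typing import Optional


def parse_rotational(text: Optional[str]) -> Optional[bool]:
    """Map the sysfs 'rotational' content to True/False, or None if unknown."""
    if text is None:
        return None
    value = text.strip()
    if value == "1":
        return True
    if value == "0":
        return False
    return None


def read_rotational(dev: str, sysfs_root: str = "/sys/block") -> Optional[bool]:
    """Hypothetical helper: ask Linux sysfs about one block device."""
    try:
        text = Path(sysfs_root, dev, "queue", "rotational").read_text()
    except OSError:
        return None  # no such device, or not a local block device
    return parse_rotational(text)


def preallocate_with_write_zeroes(rotational: Optional[bool]) -> bool:
    """Use the SSD-friendly path unless the disk is known to rotate."""
    return rotational is not True
```

Keeping "unknown" as a distinct third value (None) makes the default an
explicit, per-optimization decision instead of a side effect of detection.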



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PULL v2 04/36] virtio: Introduce started flag to VirtioDevice

2019-05-27 Thread Yongji Xie
On Mon, 27 May 2019 at 18:44, Greg Kurz  wrote:
>
> On Fri, 24 May 2019 19:56:06 +0800
> Yongji Xie  wrote:
>
> > On Fri, 24 May 2019 at 18:20, Greg Kurz  wrote:
> > >
> > > On Mon, 20 May 2019 19:10:35 -0400
> > > "Michael S. Tsirkin"  wrote:
> > >
> > > > From: Xie Yongji 
> > > >
> > > > > Virtio 1.0 transitional devices allow the driver to use the device
> > > > > before setting the DRIVER_OK status bit. So we introduce a started
> > > > > flag to indicate whether the driver has started the device or not.
> > > >
> > > > Signed-off-by: Xie Yongji 
> > > > Signed-off-by: Zhang Yu 
> > > > Message-Id: <20190320112646.3712-2-xieyon...@baidu.com>
> > > > Reviewed-by: Michael S. Tsirkin 
> > > > Signed-off-by: Michael S. Tsirkin 
> > > > ---
> > > >  include/hw/virtio/virtio.h |  2 ++
> > > >  hw/virtio/virtio.c | 52 --
> > > >  2 files changed, 52 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> > > > index 7140381e3a..27c0efc3d0 100644
> > > > --- a/include/hw/virtio/virtio.h
> > > > +++ b/include/hw/virtio/virtio.h
> > > > @@ -105,6 +105,8 @@ struct VirtIODevice
> > > >  uint16_t device_id;
> > > >  bool vm_running;
> > > >  bool broken; /* device in invalid state, needs reset */
> > > > +bool started;
> > > > +bool start_on_kick; /* virtio 1.0 transitional devices support 
> > > > that */
> > > >  VMChangeStateEntry *vmstate;
> > > >  char *bus_name;
> > > >  uint8_t device_endian;
> > > > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > > > index 28056a7ef7..5d533ac74e 100644
> > > > --- a/hw/virtio/virtio.c
> > > > +++ b/hw/virtio/virtio.c
> > > > @@ -1162,10 +1162,16 @@ int virtio_set_status(VirtIODevice *vdev, 
> > > > uint8_t val)
> > > >  }
> > > >  }
> > > >  }
> > > > +vdev->started = val & VIRTIO_CONFIG_S_DRIVER_OK;
> > > > +if (unlikely(vdev->start_on_kick && vdev->started)) {
> > > > +vdev->start_on_kick = false;
> > > > +}
> > > > +
> > > >  if (k->set_status) {
> > > >  k->set_status(vdev, val);
> > > >  }
> > > >  vdev->status = val;
> > > > +
> > > >  return 0;
> > > >  }
> > > >
> > > > @@ -1208,6 +1214,9 @@ void virtio_reset(void *opaque)
> > > >  k->reset(vdev);
> > > >  }
> > > >
> > > > +vdev->start_on_kick = (virtio_host_has_feature(vdev, 
> > > > VIRTIO_F_VERSION_1) &&
> > > > +  !virtio_vdev_has_feature(vdev, 
> > > > VIRTIO_F_VERSION_1));
> > > > +vdev->started = false;
> > > >  vdev->broken = false;
> > > >  vdev->guest_features = 0;
> > > >  vdev->queue_sel = 0;
> > > > @@ -1518,14 +1527,21 @@ void virtio_queue_set_align(VirtIODevice *vdev, 
> > > > int n, int align)
> > > >
> > > >  static bool virtio_queue_notify_aio_vq(VirtQueue *vq)
> > > >  {
> > > > +bool ret = false;
> > > > +
> > > >  if (vq->vring.desc && vq->handle_aio_output) {
> > > >  VirtIODevice *vdev = vq->vdev;
> > > >
> > > >  trace_virtio_queue_notify(vdev, vq - vdev->vq, vq);
> > > > -return vq->handle_aio_output(vdev, vq);
> > > > +ret = vq->handle_aio_output(vdev, vq);
> > > > +
> > > > +if (unlikely(vdev->start_on_kick)) {
> > > > +vdev->started = true;
> > > > +vdev->start_on_kick = false;
> > > > +}
> > > >  }
> > > >
> > > > -return false;
> > > > +return ret;
> > > >  }
> > > >
> > > >  static void virtio_queue_notify_vq(VirtQueue *vq)
> > > > @@ -1539,6 +1555,11 @@ static void virtio_queue_notify_vq(VirtQueue *vq)
> > > >
> > > >  trace_virtio_queue_notify(vdev, vq - vdev->vq, vq);
> > > >  vq->handle_output(vdev, vq);
> > > > +
> > > > +if (unlikely(vdev->start_on_kick)) {
> > > > +vdev->started = true;
> > > > +vdev->start_on_kick = false;
> > > > +}
> > > >  }
> > > >  }
> > > >
> > > > @@ -1556,6 +1577,11 @@ void virtio_queue_notify(VirtIODevice *vdev, int 
> > > > n)
> > > >  } else if (vq->handle_output) {
> > > >  vq->handle_output(vdev, vq);
> > > >  }
> > > > +
> > > > +if (unlikely(vdev->start_on_kick)) {
> > > > +vdev->started = true;
> > > > +vdev->start_on_kick = false;
> > > > +}
> > > >  }
> > > >
> > > >  uint16_t virtio_queue_vector(VirtIODevice *vdev, int n)
> > > > @@ -1770,6 +1796,13 @@ static bool virtio_broken_needed(void *opaque)
> > > >  return vdev->broken;
> > > >  }
> > > >
> > > > +static bool virtio_started_needed(void *opaque)
> > > > +{
> > > > +VirtIODevice *vdev = opaque;
> > > > +
> > > > +return vdev->started;
> > >
> > > > Existing machine types don't know about the "virtio/started" subsection.
> > > > This breaks migration to older QEMUs if the driver has started the device,
> > > > i.e. most probably always when it comes to live migration.
> > >
> > > My understanding is that we do try to suppor

Re: [Qemu-devel] Our use of #include is undisciplined, and what to do about it

2019-05-27 Thread Markus Armbruster
It's been three years, let's examine how things have evolved.

I'm using commit db3d11ee3f0, which is a bit behind current master, just
so I can apply my "[PATCH 0/4] Cleanups around qemu-common.h" cleanly.

Markus Armbruster  writes:

[...]
> = The status quo and why I hate it =
>
> I've seen several schools of thought on use of #include.
>
> There's the "no #include in headers" school: every .c file includes
> exactly the headers it needs, and the prerequisites they need.  Cyclic
> inclusion becomes impossible.  You can't sweep cyclic dependencies under
> the rug.  Headers are read just once per compilation unit.  The amount
> of crap you include is clearly visible.  However, maintaining the
> #include directives is a drag, not least because their order matters.
> Especially when headers neglect to spell out their dependencies.  Or
> they do, but it's wrong.
>
> There's the "headers must be self-contained" school: every header
> includes everything it needs.  Headers can be included in any order.
> Sorted #include directives are tidy and easy to navigate.  Headers can
> be read multiple times, which can only hurt compilation time.

Our compilers avoid this for headers with proper header guards.

>You need
> to make an effort to avoid cyclic dependencies and excessive inclusion.
>
> And then there's the school of non-thought: when it doesn't compile,
> sprinkle #include on the mess semi-randomly until it does.
>
> We do a bit of all three, but the result looks awfully close to what the
> school of non-thought produces.
>
> Every .c file includes qemu/osdep.h first.  For me, a .c file that
> includes nothing but that comes out well over half a Megabyte in >23k
> lines preprocessed.  Where does all this crap come from?
>
>   #lines  KiBytes  #files  source
>     5233      102       5  QEMU
>     8035      159      70  system
>     7915      224      73  GLib
>     2458       89       1  # lines
>    23641      576     149  total
>
> "# lines" are lines added by the preprocessor so the rest of the
> compiler can keep track of source locations.

  #lines  KiBytes  #files  source
     375        8       5  QEMU
    9722      230     113  system
    8212      254      74  GLib
    1517       65     N/A  # lines
   19826      557     192  total

The weight QEMU lost, system + GLib put on.

> Having the compiler wade through almost half a Megabyte of system+GLib
> crap before it begins to consider the code we care about feels wasteful.
> Perhaps we should rethink our approach to including library headers.

No change.

> Of the 102K that are actually our own, just 7K come from include/.  95K
> come from qapi-types.h.

Fixed.

> Judging from the .d files in my build tree, 95% of the .c files include
> qemu-common.h.  That makes things a good deal worse.

Down to 90%.  My "[PATCH 0/4] Cleanups around qemu-common.h" shrinks it
to less than 10%.  Small enough for me not to repeat the measurements
below.

>   Without
> NEED_CPU_H, this adds a modest 44K of our own headers, but almost 100K
> of system headers:
>
>   #lines  KiBytes  #files  source
>     6938      146      16  QEMU
>    11426      254      74  system
>     7915      224      73  GLib
>     2658      100       1  # lines
>    28937      726     164  total
>
> NEED_CPU_H adds another 120K of our own headers:
>
>   #lines  KiBytes  #files  source
>    11534      263      43  QEMU
>    11548      256      78  system
>     7915      225      72  GLib
>     3370      138       1  # lines
>    34367      883     194  total
>
> The average size of a .c file is just over 15KiB.  To get to the actual
> C code there, the compiler has to wade through at least 550-880KiB of
> headers.  In other words, roughly 2% of the source comes from .c in the
> best case.
>
> But that's not even the worst part.  The worst part by far are our
> "touch this and recompile the world" headers.
>
> I find just short of 4000 .d files in my build tree.

Some 6400 now, ignoring the .d that don't contain ".o:".

>Guess how many of our
> headers are listed as prerequisites in more than 90% of them (thus
> touching them will recompile the .c file)?  *Twenty-two*.

Down to 12 before my "[PATCH 0/4] Cleanups around qemu-common.h", and to
10 afterwards.

>Almost fifty
> recompile more than half of the world.

No significant change.
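
The "touch this and recompile the world" count can be reproduced with a short
script. The following is a rough sketch, not the script used for the numbers
above: it assumes make-style .d files as written by the compiler (a
"target.o: prerequisites" rule with backslash continuations), ignores files
that do not contain ".o:", and counts every prerequisite, not only headers.
The function names and the threshold parameter are illustrative.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable, List, Set


def prerequisites(dep_text: str) -> Set[str]:
    """Return the prerequisites of the .o rule in one .d file's text."""
    if ".o:" not in dep_text:
        return set()
    # Join backslash-continued lines into one logical rule line.
    joined = dep_text.replace("\\\n", " ")
    prereqs: Set[str] = set()
    for line in joined.splitlines():
        if ".o:" in line:
            prereqs.update(line.split(".o:", 1)[1].split())
    return prereqs


def widely_included(dep_texts: Iterable[str], threshold: float = 0.9) -> List[str]:
    """List prerequisites that occur in more than `threshold` of the rules."""
    counts: Counter = Counter()
    total = 0
    for text in dep_texts:
        prereqs = prerequisites(text)
        if not prereqs:
            continue
        total += 1
        counts.update(prereqs)
    return sorted(p for p, n in counts.items() if n > threshold * total)


def scan_build_tree(root: str, threshold: float = 0.9) -> List[str]:
    """Apply widely_included() to every .d file below `root`."""
    texts = (p.read_text(errors="replace") for p in Path(root).rglob("*.d"))
    return widely_included(texts, threshold)
```

Calling scan_build_tree() on a build directory with threshold=0.5 would give
the "more than half of the world" list instead.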

> Naturally, touching osdep.h or anything it includes recompiles the
> world.  These are:
>
> config-host.h
> include/glib-compat.h
> include/qapi/error.h
> include/qemu/compiler.h
> include/qemu/osdep.h
> include/qemu/typedefs.h
> include/sysemu/os-posix.h
> qapi-types.h
>
> NEED_CPU_H adds
>
> config-target.h
>
> Fine, except for qapi/error.h and qapi-types.h.  The latter is an itch I
> need to scratc

Re: [Qemu-devel] [PATCH v3 0/5] blockdev-backup: don't check aio_context too early

2019-05-27 Thread Max Reitz
On 23.05.19 19:06, John Snow wrote:
> See patch one's commit message for justification.
> Patches 2-5 are for testing, because that's always how these things go.
> 
> 001/5:[----] [--] 'blockdev-backup: don't check aio_context too early'
> 002/5:[0004] [FC] 'iotests.py: do not use infinite waits'
> 003/5:[down]  'QEMUMachine: add events_wait method'
> 004/5:[0022] [FC] 'iotests.py: rewrite run_job to be pickier'
> 005/5:[0017] [FC] 'iotests: add iotest 250 for testing blockdev-backup
>across iothread contexts'
> 
> v3: Rebased on Max's staging branch:
> Rebase patch 2
> added patch 3, to add events_wait.
> Rework patch 4 to make run_job consume legacy events, too
> Minorly edit patch 5 due to the two above.
> v2: added patch 4, with iotest framework adjustments in patches 2/3.

Thanks, applied to my block branch:

https://git.xanclic.moe/XanClic/qemu/commits/branch/block
https://github.com/XanClic/qemu/commits/block

(:-P)

Max





Re: [Qemu-devel] [PATCH v4 00/20] monitor: add asynchronous command type

2019-05-27 Thread Markus Armbruster
Gerd Hoffmann  writes:

> On Mon, May 27, 2019 at 10:18:42AM +0200, Markus Armbruster wrote:
>> Marc-André Lureau  writes:
>> 
>> > Hi
>> >
>> > On Thu, May 23, 2019 at 9:52 AM Markus Armbruster  
>> > wrote:
>> >> I'm not sure how asynchronous commands could support reconnect and
>> >> resume.
>> >
>> > The same way as current commands, including job commands.
>> 
>> Consider the following scenario: a management application such as
>> libvirt starts a long-running task with the intent to monitor it until
>> it finishes.  Half-way through, the management application needs to
>> disconnect and reconnect for some reason (systemctl restart, or crash &
>> recover, or whatever).
>> 
>> If the long-running task is a job, the management application can resume
>> after reconnect: the job's ID is as valid as it was before, and the
>> commands to query and control the job work as before.
>> 
>> What if it's an asynchronous command?
>
> This is not meant for some long-running job which you have to manage.
>
> Allowing commands being asynchronous makes sense for things which (a)
> typically don't take long, and (b) don't need any management.
>
> So, if the connection goes down the job is simply canceled, and after
> reconnecting the management can simply send the same command again.

Is this worth its own infrastructure?

Would you hazard a guess on how many commands can take long enough to
demand a conversion to asynchronous, yet not need any management?

>> > Whenever we can solve things on qemu side, I would rather not
>> > deprecate current API.
>> 
>> Making a synchronous command asynchronous definitely changes API.
>
> Inside qemu yes, sure.  But for the QMP client nothing changes.

Command replies can arrive out of order, can't they?
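
If replies can arrive out of order, a client has to correlate them with
commands itself. QMP lets a command carry an "id" member that is echoed in the
response, so a client could track pending commands as in the sketch below. The
class and method names are hypothetical, and nothing here is taken from the
series under review.

```python
import itertools
import json
from typing import Any, Dict, Optional


class PendingCommands:
    """Track in-flight QMP commands by their "id" member."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self._pending: Dict[int, str] = {}

    def encode(self, name: str,
               arguments: Optional[Dict[str, Any]] = None) -> str:
        """Build the JSON text for a command and remember its id."""
        cmd_id = next(self._next_id)
        command: Dict[str, Any] = {"execute": name, "id": cmd_id}
        if arguments:
            command["arguments"] = arguments
        self._pending[cmd_id] = name
        return json.dumps(command)

    def complete(self, reply_text: str) -> Optional[str]:
        """Return the name of the command a reply belongs to.

        Returns None for events and for replies whose id is not pending.
        """
        reply = json.loads(reply_text)
        if "return" not in reply and "error" not in reply:
            return None  # asynchronous event, not a command reply
        return self._pending.pop(reply.get("id"), None)
```

A client that omits "id" today and relies on reply order would need this kind
of bookkeeping before it could cope with out-of-order completion.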



Re: [Qemu-devel] [PATCH] event_match: always match on None value

2019-05-27 Thread Max Reitz
On 24.05.19 20:02, John Snow wrote:
> Before, event_match didn't always recurse if the event value was not a
> dictionary, and would instead check for equality immediately.
> 
> By delaying equality checking to post-recursion, we can allow leaf
> values like "5" to match "None" and take advantage of the generic
> None-returns-True clause.
> 
> This makes the matching a little more obviously consistent at the
> expense of being able to check for explicit None values, which is
> probably not that important given what this function is used for.
> 
> Signed-off-by: John Snow 
> ---
>  python/qemu/__init__.py | 27 +++
>  1 file changed, 15 insertions(+), 12 deletions(-)
> 
> diff --git a/python/qemu/__init__.py b/python/qemu/__init__.py
> index 98ed8a2e28..77d45f88fe 100644
> --- a/python/qemu/__init__.py
> +++ b/python/qemu/__init__.py
> @@ -409,27 +409,30 @@ class QEMUMachine(object):
>  
>  The match criteria takes the form of a matching subdict. The event is
>  checked to be a superset of the subdict, recursively, with matching
> -values whenever those values are not None.
> +values whenever the subdict values are not None.
> +
> +This has a limitation that you cannot explicitly check for None 
> values.
>  
>  Examples, with the subdict queries on the left:
>   - None matches any object.
>   - {"foo": None} matches {"foo": {"bar": 1}}
> - - {"foo": {"baz": None}} does not match {"foo": {"bar": 1}}
> - - {"foo": {"baz": 2}} matches {"foo": {"bar": 1, "baz": 2}}
> + - {"foo": None} matches {"foo": 5}
> + - {"foo": {"abc": None}} does not match {"foo": {"bar": 1}}
> + - {"foo": {"rab": 2}} matches {"foo": {"bar": 1, "rab": 2}}
>  """
>  if match is None:
>  return True
>  
> -for key in match:
> -if key in event:
> -if isinstance(event[key], dict):
> -if not QEMUMachine.event_match(event[key], match[key]):
> -return False
> -elif event[key] != match[key]:
> +try:
> +for key in match:
> +if key in event:
> +return QEMUMachine.event_match(event[key], match[key])

With this change, we only check a single key that is both in @match and
@event.  I think we want to keep the "if not -- return False" pattern,
don’t we?

Max

> +else:
>  return False
> -else:
> -return False
> -return True
> +return True
> +except TypeError:
> +# either match or event wasn't iterable (not a dict)
> +return match == event
>  
>  def event_wait(self, name, timeout=60.0, match=None):
>  """
> 
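
The review comment above can be illustrated with a standalone sketch of the
matcher that keeps the "if not ... return False" pattern, so that every key of
the match subdict is checked instead of only the first one. This is not the
patch that was eventually applied; it is written as a free function for
brevity, and it tests for dict explicitly instead of catching TypeError.

```python
def event_match(event, match=None):
    """Check whether `event` is a superset of `match`, recursively.

    A match value of None matches any event value, so explicit None
    values in an event cannot be checked for.
    """
    if match is None:
        return True

    if isinstance(match, dict):
        if not isinstance(event, dict):
            return False
        for key in match:
            if key not in event:
                return False
            # Keep looping on success; only a failed key ends the search.
            if not event_match(event[key], match[key]):
                return False
        return True

    # Leaf value: plain equality.
    return match == event
```

With the early "return QEMUMachine.event_match(...)" from the quoted hunk, a
query such as {"a": 1, "b": 3} would match the event {"a": 1, "b": 2} because
only the first key is examined; the loop above rejects it.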






Re: [Qemu-devel] [PATCH] qcow2-bitmap: initialize bitmap directory alignment

2019-05-27 Thread Max Reitz
On 27.05.19 14:52, Andrey Shinkevich wrote:
> Valgrind detects multiple issues in QEMU iotests when the memory is
> used without being initialized. Valgrind may dump lots of unnecessary
> reports, which makes the memory issue analysis harder. In particular,
> that is true for the aligned bitmap directory and can be seen while
> running the iotest #169. Padding the aligned space with zeros eases
> the pain.
> 
> Signed-off-by: Andrey Shinkevich 
> ---
>  block/qcow2-bitmap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Thanks, applied to my block branch:

https://git.xanclic.moe/XanClic/qemu/commits/branch/block

Max





Re: [Qemu-devel] [PATCH v7 10/10] hw/m68k: define Macintosh Quadra 800

2019-05-27 Thread Aleksandar Markovic
On May 26, 2019 1:07 AM, "Laurent Vivier"  wrote:
>
> If you want to test the machine, it doesn't yet boot a MacROM, but you can
> boot a linux kernel from the command line.
>
> You can install your own disk using debian-installer with:
>
> ./qemu-system-m68k \
> -M q800 \
> -serial none -serial mon:stdio \
> -m 1000M -drive file=m68k.qcow2,format=qcow2 \
> -net nic,model=dp83932,addr=09:00:07:12:34:57 \
> -append "console=ttyS0 vga=off" \
> -kernel vmlinux-4.15.0-2-m68k \
> -initrd initrd.gz \
> -drive file=debian-9.0-m68k-NETINST-1.iso \
> -drive file=m68k.qcow2,format=qcow2 \
> -nographic
>

Hello Laurent,

How does one obtain vmlinux-4.15.0-2-m68k and initrd.gz?

Greetings, Aleksandar

> If you use a graphic adapter instead of "-nographic", you can use "-g" to
> set the size of the display (I use "-g 1600x800x24").
>
> Co-developed-by: Mark Cave-Ayland 
> Signed-off-by: Mark Cave-Ayland 
> Signed-off-by: Laurent Vivier 
> ---
>  MAINTAINERS  |  14 ++
>  default-configs/m68k-softmmu.mak |   1 +
>  hw/m68k/Kconfig  |  12 +
>  hw/m68k/Makefile.objs|   1 +
>  hw/m68k/bootinfo.h   | 100 +
>  hw/m68k/q800.c   | 369 +++
>  6 files changed, 497 insertions(+)
>  create mode 100644 hw/m68k/bootinfo.h
>  create mode 100644 hw/m68k/q800.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 3cacd751bf..274dfd6e19 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -906,6 +906,20 @@ F: hw/char/mcf_uart.c
>  F: hw/net/mcf_fec.c
>  F: include/hw/m68k/mcf*.h
>
> +q800
> +M: Laurent Vivier 
> +S: Maintained
> +F: hw/block/swim.c
> +F: hw/m68k/bootinfo.h
> +F: hw/display/macfb.c
> +F: hw/m68k/q800.c
> +F: hw/misc/mac_via.c
> +F: hw/nubus/*
> +F: include/hw/block/swim.h
> +F: include/hw/display/macfb.h
> +F: include/hw/misc/mac_via.h
> +F: include/hw/nubus/*
> +
>  MicroBlaze Machines
>  ---
>  petalogix_s3adsp1800
> diff --git a/default-configs/m68k-softmmu.mak b/default-configs/m68k-softmmu.mak
> index e17495e2a0..7e3649c1b8 100644
> --- a/default-configs/m68k-softmmu.mak
> +++ b/default-configs/m68k-softmmu.mak
> @@ -4,3 +4,4 @@
>  #
>  CONFIG_AN5206=y
>  CONFIG_MCF5208=y
> +CONFIG_Q800=y
> diff --git a/hw/m68k/Kconfig b/hw/m68k/Kconfig
> index 49ef0b3f6d..ffa8e48fd8 100644
> --- a/hw/m68k/Kconfig
> +++ b/hw/m68k/Kconfig
> @@ -7,3 +7,15 @@ config MCF5208
>  bool
>  select COLDFIRE
>  select PTIMER
> +
> +config Q800
> +bool
> +select FRAMEBUFFER
> +select ADB
> +select MAC_VIA
> +select ESCC
> +select ESP
> +select MACFB
> +select NUBUS
> +select DP8393X
> +select SWIM
> diff --git a/hw/m68k/Makefile.objs b/hw/m68k/Makefile.objs
> index 482f8477b4..cfd13fae53 100644
> --- a/hw/m68k/Makefile.objs
> +++ b/hw/m68k/Makefile.objs
> @@ -1,2 +1,3 @@
>  obj-$(CONFIG_AN5206) += an5206.o mcf5206.o
>  obj-$(CONFIG_MCF5208) += mcf5208.o mcf_intc.o
> +obj-$(CONFIG_Q800) += q800.o
> diff --git a/hw/m68k/bootinfo.h b/hw/m68k/bootinfo.h
> new file mode 100644
> index 00..6584775f6d
> --- /dev/null
> +++ b/hw/m68k/bootinfo.h
> @@ -0,0 +1,100 @@
> +struct bi_record {
> +    uint16_t tag;        /* tag ID */
> +    uint16_t size;       /* size of record */
> +    uint32_t data[0];    /* data */
> +};
> +
> +/* machine independent tags */
> +
> +#define BI_LAST 0x0000 /* last record */
> +#define BI_MACHTYPE 0x0001 /* machine type (u_long) */
> +#define BI_CPUTYPE  0x0002 /* cpu type (u_long) */
> +#define BI_FPUTYPE  0x0003 /* fpu type (u_long) */
> +#define BI_MMUTYPE  0x0004 /* mmu type (u_long) */
> +#define BI_MEMCHUNK 0x0005 /* memory chunk address and size */
> +   /* (struct mem_info) */
> +#define BI_RAMDISK  0x0006 /* ramdisk address and size */
> +   /* (struct mem_info) */
> +#define BI_COMMAND_LINE 0x0007 /* kernel command line parameters */
> +   /* (string) */
> +
> +/*  Macintosh-specific tags (all u_long) */
> +
> +#define BI_MAC_MODEL    0x8000  /* Mac Gestalt ID (model type) */
> +#define BI_MAC_VADDR    0x8001  /* Mac video base address */
> +#define BI_MAC_VDEPTH   0x8002  /* Mac video depth */
> +#define BI_MAC_VROW     0x8003  /* Mac video rowbytes */
> +#define BI_MAC_VDIM     0x8004  /* Mac video dimensions */
> +#define BI_MAC_VLOGICAL 0x8005  /* Mac video logical base */
> +#define BI_MAC_SCCBASE  0x8006  /* Mac SCC base address */
> +#define BI_MAC_BTIME    0x8007  /* Mac boot time */
> +#define BI_MAC_GMTBIAS  0x8008  /* Mac GMT timezone offset */
> +#define BI_MAC_MEMSIZE  0x8009  /* Mac RAM size (sanity check) */
> +#define BI_MAC_CPUID    0x800a  /* Mac CPU type (sanity check) */
> +#define BI_MAC_ROMBASE  0x800b  /* Mac system ROM base address */
> +
> +/*  Macintosh hardware profile data */
> +
> +#define BI_MAC_VIA1BASE 0x8010  /* Mac VIA1 base address (always
presen
