Re: [PATCH v2 13/13] blockdev: Drop unused drive_get_next()

2021-11-18 Thread Hanna Reitz

On 17.11.21 17:34, Markus Armbruster wrote:

drive_get_next() is basically a bad idea.  It returns the "next" block
backend of a certain interface type.  "Next" means bus=0,unit=N, where
subsequent calls count N up from zero, per interface type.

This lets you define unit numbers implicitly by execution order.  If the
order changes, or new calls appear "in the middle", unit numbers change.
ABI break.  Hard to spot in review.

The previous commits eliminated all uses.  Drop the function.

Cc: Kevin Wolf 
Cc: Hanna Reitz 
Signed-off-by: Markus Armbruster 
---
  include/sysemu/blockdev.h |  1 -
  blockdev.c| 10 --
  2 files changed, 11 deletions(-)


Reviewed-by: Hanna Reitz 




Re: [PATCH] block/vvfat.c fix leak when failure occurs

2021-11-18 Thread Hanna Reitz

On 16.11.21 13:57, Daniella Lee wrote:

Function vvfat_open called function enable_write_target and init_directories,
and these functions malloc new memory for BDRVVVFATState::qcow_filename,
BDRVVVFATState::used_clusters, and BDRVVVFATState::cluster_buff.

When the specified folder does not exist ,it may contains memory leak.
After init_directories function is executed, the vvfat_open return -EIO,
and bdrv_open_driver goto label open_failed,
the program use g_free(bs->opaque) to release BDRVVVFATState struct
without members mentioned.

command line:
qemu-system-x86_64 -hdb   -usb -device usb-storage,drive=fat16
-drive file=fat:rw:fat-type=16:"",
id=fat16,format=raw,if=none

enable_write_target called:
(gdb) bt
#0  enable_write_target (bs=0x56f9f000, errp=0x7fffd780)
 at ../block/vvfat.c:3114
#1  vvfat_open (bs=0x56f9f000, options=0x56fa45d0,
 flags=155650, errp=0x7fffd780) at ../block/vvfat.c:1236
#2  bdrv_open_driver (bs=0x56f9f000, drv=0x56c47920 ,
 node_name=0x0, options=0x56fa45d0, open_flags=155650,
 errp=0x7fffd890) at ../block.c:1558
#3  bdrv_open_common (bs=0x56f9f000, file=0x0, options=0x56fa45d0,
 errp=0x7fffd890) at ../block.c:1852
#4  bdrv_open_inherit (filename=0x56f73310 "fat:rw:",
 reference=0x0, options=0x56fa45d0, flags=40962, parent=0x56f98cd0,
 child_class=0x56b1d6a0 , child_role=19,
 errp=0x7fffda90) at ../block.c:3779
#5  bdrv_open_child_bs (filename=0x56f73310 "fat:rw:",
 options=0x56f9cfc0, bdref_key=0x56239bb8 "file",
 parent=0x56f98cd0, child_class=0x56b1d6a0 ,
 child_role=19, allow_none=true, errp=0x7fffda90) at ../block.c:3419
#6  bdrv_open_inherit (filename=0x56f73310 "fat:rw:",
 reference=0x0, options=0x56f9cfc0, flags=8194, parent=0x0,
 child_class=0x0, child_role=0, errp=0x56c98c40 )
 at ../block.c:3726
#7  bdrv_open (filename=0x56f73310 "fat:rw:", reference=0x0,
 options=0x56f757b0, flags=0, errp=0x56c98c40 )
 at ../block.c:3872
#8  blk_new_open (filename=0x56f73310 "fat:rw:", reference=0x0,
 options=0x56f757b0, flags=0, errp=0x56c98c40 )
 at ../block/block-backend.c:436
#9  blockdev_init (file=0x56f73310 "fat:rw:",
 bs_opts=0x56f757b0, errp=0x56c98c40 )
 at ../blockdev.c:608
#10 drive_new (all_opts=0x56d2b700, block_default_type=IF_IDE,
 errp=0x56c98c40 ) at ../blockdev.c:992
..

Signed-off-by: Daniella Lee 
---
  block/vvfat.c | 15 +++
  1 file changed, 15 insertions(+)


Hi,

Thanks for your patch!  Yes, that makes sense.

I believe there are some issues that should be addressed, though:


diff --git a/block/vvfat.c b/block/vvfat.c
index 05e78e3c27..454a74c5d5 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -1280,7 +1280,22 @@ static int vvfat_open(BlockDriverState *bs, QDict 
*options, int flags,
  qemu_co_mutex_init(&s->lock);
  
  ret = 0;

+
+qemu_opts_del(opts);
+return ret;


Optional: I’d drop the `ret = 0;` line and just `return 0;` here.


  fail:
+if(s->qcow_filename) {


Our coding style requires a space between `if` and the opening parenthesis.


+g_free(s->qcow_filename);


`g_free()` checks whether the parameter given to it is `NULL`, and if 
so, performs a no-op.  So checking whether `s->qcow_filename != NULL` 
before calling `g_free()` is not necessary.


We have a script under scripts/checkpatch.pl that takes patch files as 
input and checks whether they conform to our coding style.  It’s really 
helpful, for example in these two cases it does report the issues.



+s->qcow_filename = NULL;
+}
+if(s->cluster_buffer) {
+g_free(s->cluster_buffer);
+s->cluster_buffer = NULL;
+}
+if(s->used_clusters) {
+g_free(s->used_clusters);


`s->used_clusters` is allocated with `calloc()`, so it can’t be freed 
with `g_free()`.  But you’re right, it should be `g_free()`-able, so the 
fix is to have `enable_write_target()` allocate it with `g_malloc0(size)`.


(And this made me notice that we free neither `s->used_clusters` nor 
`s->qcow_filename` in vvfat_close()...  Oops.)



+s->used_clusters = NULL;
+}
  qemu_opts_del(opts);
  return ret;
  }


Finally, `enable_write_target()` frees `s->qcow_filename` on error.  
That seems unnecessary now, though not wrong.  (It’s just weird that it 
frees that one, but not `s->used_clusters`...)


Hanna




Re: [PATCH] docs: Introducing pseries documentation.

2021-11-18 Thread Cédric Le Goater




+ * Multi processor support for many Power processors generations: POWER5+,
+   POWER7, POWER7+, POWER8, POWER8NVL, Power9, and Power10 (there is no support
+   for POWER6 processors).


I wouldn't trust the POWER5+ cpu emulation with pseries; only POWER7
and later has been tested at all.  Actually.. I wouldn't trust the
POWER5+ CPU emulation much at all, if it's been tested, it's not for a
long, long time.


SLOF starts and gets a machine check but it's alive.

Thanks,

C.




Re: [PATCH] docs: Introducing pseries documentation.

2021-11-18 Thread Cédric Le Goater

Hello Leonardo,

On 11/17/21 21:14, lagar...@linux.ibm.com wrote:

From: Leonardo Garcia 

The purpose of this document is to substitute the content currently
available in the QEMU wiki at [0]. This initial version does contain
some additional content as well. Whenever this documentation gets
upstream and is reflected in [1], the QEMU wiki will be edited to point
to this documentation, so that we only need to keep it updated in one
place.

0. https://wiki.qemu.org/Documentation/Platforms/POWER
1. https://qemu.readthedocs.io/en/latest/system/ppc/pseries.html

Signed-off-by: Leonardo Garcia 



Thanks for this update,

Some general comments,

There are other ppc documents :

  docs/specs/ppc-spapr-hcalls.txt
  docs/specs/ppc-spapr-hotplug.txt
  docs/specs/ppc-spapr-numa.rst
  docs/specs/ppc-spapr-uv-hcalls.txt
  docs/specs/ppc-spapr-xive.rst
  docs/specs/ppc-xive.rst

that should be moved maybe and/or referenced by this file ? Feel free
to do a follow up.

On the terminology, I think we should use in the text :

   pSeries, PowerNV, POWER[0-9]

When it comes to QEMU machines names, it's 'pseries', 'powernv'

Some small comments,



---
  docs/system/ppc/pseries.rst | 185 
  1 file changed, 185 insertions(+)

diff --git a/docs/system/ppc/pseries.rst b/docs/system/ppc/pseries.rst
index 932d4dd17d..2de3fb4d51 100644
--- a/docs/system/ppc/pseries.rst
+++ b/docs/system/ppc/pseries.rst
@@ -1,12 +1,197 @@
  pSeries family boards (``pseries``)
  ===
  
+The Power machine virtualized environment described by the `Linux on Power


para-virtualized ?


+Architecture Reference document (LoPAR)
+`_
+is called pseries. This environment is also known as sPAPR, System p guests, or
+simply Power Linux guests (although it is capable of running other operating
+systems, such as AIX).
+
+Even though pseries is designed to behave as a guest environment, it is also
+capable of acting as a hypervisor OS, providing, on that role, nested
+virtualization capabilities.


on POWER9 and above processors. Maybe we should start introducing the
KVM-pseries term.


+
  Supported devices
  -
  
+ * Multi processor support for many Power processors generations: POWER5+,

+   POWER7, POWER7+, POWER8, POWER8NVL, Power9, and Power10 (there is no support
+   for POWER6 processors).


POWER8NVL is for baremetal only.


+ * Interrupt Controller, XICS (POWER8) and XIVE (Power9 and Power10)
+ * vPHB PCIe Host bridge.
+ * vscsi and vnet devices, compatible with the same devices available on a
+   PowerVM hypervisor with VIOS managing LPARs.
+ * Virtio based devices.
+ * PCIe device pass through.
+
  Missing devices
  ---
  
+ * SPICE support.
  
  Firmware

  
+
+`SLOF `_ (Slimline Open Firmware) is an
+implementation of the `IEEE 1275-1994, Standard for Boot (Initialization
+Configuration) Firmware: Core Requirements and Practices
+`_.
+
+QEMU includes a prebuilt image of SLOF which is updated when a more recent
+version is required.
+
+Build directions
+
+
+.. code-block:: bash
+
+  ./configure --target-list=ppc64-softmmu && make
+
+Running instructions
+
+
+Someone can select the pseries machine type by running QEMU with the following
+options:
+
+.. code-block:: bash
+
+  qemu-system-ppc64 -M pseries 
+
+sPAPR devices
+-
+
+The sPAPR specification defines a set of para-virtualized devices, which are
+also supported by the pseries machine in QEMU and can be instantiated with the
+`-device` option:
+
+* spapr-vlan : A virtual network interface.
+* spapr-vscsi : A virtual SCSI disk interface.
+* spapr-rng : A pseudo-device for passing random number generator data to the
+  guest (see the `H_RANDOM hypercall feature
+  `_ for details).


spapr-vty and tpm.


+
+These are compatible with the devices historically available for use when
+running the IBM PowerVM hypervisor with LPARs.
+
+However, since these devices have originally been specified with another
+hypervisor and non-Linux guests in mind, you should use the virtio counterparts
+(virtio-net, virtio-blk/scsi and virtio-rng) if possible instead, since they
+will most probably give you better performance with Linux guests in a QEMU
+environment.
+
+The pseries machine in QEMU is always instantiated with a NVRAM device
+(``spapr-nvram``), so it is not needed to add this manually. 


also a spapr-vty and a spapr-pci-host-bridge


However, if someone
+wants to make the contents of the NVRAM device persistent, they will need to
+specify a PFLASH device when starting QEMU, i.e. either use
+``-drive if=pflash,file=,format=raw`` to set the default PFLASH
+device, or specify one with an ID
+(``-drive if=none,file=,format=raw,id=pfid``) and pass that 

Re: [PATCH] docs: Minor updates on the powernv documentation.

2021-11-18 Thread Cédric Le Goater

On 11/17/21 21:16, lagar...@linux.ibm.com wrote:

From: Leonardo Garcia 

Signed-off-by: Leonardo Garcia 
---
  docs/system/ppc/powernv.rst | 56 +++--
  1 file changed, 29 insertions(+), 27 deletions(-)

diff --git a/docs/system/ppc/powernv.rst b/docs/system/ppc/powernv.rst
index 86186b7d2c..907c4ce4f9 100644
--- a/docs/system/ppc/powernv.rst
+++ b/docs/system/ppc/powernv.rst
@@ -1,7 +1,7 @@
-PowerNV family boards (``powernv8``, ``powernv9``)
+PowerNV family boards (``powernv8``, ``powernv9``, ``power10``)


powernv10.

And I still haven't resent the XIVE2/PHB5 patches to enable full support.


  ==
  
-PowerNV (as Non-Virtualized) is the "baremetal" platform using the

+PowerNV (as Non-Virtualized) is the "bare metal" platform using the
  OPAL firmware. It runs Linux on IBM and OpenPOWER systems and it can
  be used as an hypervisor OS, running KVM guests, or simply as a host
  OS.
@@ -15,17 +15,15 @@ beyond the scope of what QEMU addresses today.
  Supported devices
  -
  
- * Multi processor support for POWER8, POWER8NVL and POWER9.

- * XSCOM, serial communication sideband bus to configure chiplets
- * Simple LPC Controller
- * Processor Service Interface (PSI) Controller
- * Interrupt Controller, XICS (POWER8) and XIVE (POWER9)
- * POWER8 PHB3 PCIe Host bridge and POWER9 PHB4 PCIe Host bridge
- * Simple OCC is an on-chip microcontroller used for power management
-   tasks
- * iBT device to handle BMC communication, with the internal BMC
-   simulator provided by QEMU or an external BMC such as an Aspeed
-   QEMU machine.
+ * Multi processor support for POWER8, POWER8NVL and Power9.


s/Power9/POWER9/ right ? I am starting to doubt :)


+ * XSCOM, serial communication sideband bus to configure chiplets.
+ * Simple LPC Controller.
+ * Processor Service Interface (PSI) Controller.
+ * Interrupt Controller, XICS (POWER8) and XIVE (Power9) and XIVE2 (Power10).
+ * POWER8 PHB3 PCIe Host bridge and POWER9 PHB4 PCIe Host bridge.
+ * Simple OCC is an on-chip micro-controller used for power management tasks.
+ * iBT device to handle BMC communication, with the internal BMC simulator
+   provided by QEMU or an external BMC such as an Aspeed QEMU machine.
   * PNOR containing the different firmware partitions.
  
  Missing devices

@@ -33,27 +31,25 @@ Missing devices
  
  A lot is missing, among which :
  
- * POWER10 processor

- * XIVE2 (POWER10) interrupt controller
- * I2C controllers (yet to be merged)
- * NPU/NPU2/NPU3 controllers
- * EEH support for PCIe Host bridge controllers
- * NX controller
- * VAS controller
- * chipTOD (Time Of Day)
+ * I2C controllers (yet to be merged).
+ * NPU/NPU2/NPU3 controllers.
+ * EEH support for PCIe Host bridge controllers.
+ * NX controller.
+ * VAS controller.
+ * chipTOD (Time Of Day).
   * Self Boot Engine (SBE).
- * FSI bus
+ * FSI bus.
  
  Firmware

  
  
  The OPAL firmware (OpenPower Abstraction Layer) for OpenPower systems

  includes the runtime services ``skiboot`` and the bootloader kernel and
-initramfs ``skiroot``. Source code can be found on GitHub:
+initramfs ``skiroot``. Source code can be found on the `OpenPOWER account at
+GitHub `_.
  
-  https://github.com/open-power.

-
-Prebuilt images of ``skiboot`` and ``skiroot`` are made available on the `OpenPOWER 
`__ site.
+Prebuilt images of ``skiboot`` and ``skiroot`` are made available on the
+`OpenPOWER `__ site.
  
  QEMU includes a prebuilt image of ``skiboot`` which is updated when a

  more recent version is required by the models.
@@ -83,6 +79,7 @@ and a SATA disk :
  
  Complex PCIe configuration

  ~~
+
  Six PHBs are defined per chip (POWER9) but no default PCI layout is
  provided (to be compatible with libvirt). One PCI device can be added
  on any of the available PCIe slots using command line options such as:
@@ -157,7 +154,7 @@ one on the command line :
  The files `palmetto-SDR.bin 
`__
  and `palmetto-FRU.bin `__
  define a Sensor Data Record repository and a Field Replaceable Unit
-inventory for a palmetto BMC. They can be used to extend the QEMU BMC
+inventory for a Palmetto BMC. They can be used to extend the QEMU BMC
  simulator.
  
  .. code-block:: bash

@@ -190,3 +187,8 @@ CAVEATS
  
   * No support for multiple HW threads (SMT=1). Same as pseries.

   * CPU can hang when doing intensive I/Os. Use ``-append powersave=off`` in 
that case.


The last caveat is not true. I found the issue since in the decrementer
model.

Thanks

C.


+
+Maintainer contact information
+--
+
+Cédric Le Goater 
\ No newline at end of file






Re: [PATCH v3] failover: fix unplug pending detection

2021-11-18 Thread Laurent Vivier

On 18/10/2021 10:27, Michael S. Tsirkin wrote:

On Mon, Oct 18, 2021 at 09:19:16AM +0200, Laurent Vivier wrote:

Hi,

I don't understand if there are some issues


Gerd did identify some issues, you felt they aren't related to the patch
and need to be addressed separately.

Gerd posted patches that are supposed to address them since.
"try improve native hotplug for pcie root ports"
Could you please either
- test and report that your series depend on
   Gerd's one to now work without the issues.
   preferably by reposting a patch that applies on top.
- test and report that the functionality is still partially
   broken but explain in the commit log that this is not due
   to the patch itself, and not made worse.

in both cases please CC reviewers: Daniel, Gerd.



I'm writing a test in tests/qtest that tests virtio-net failover, and I've added a test 
that checks the migration doesn't start while the card is not unplugged.


I've run the test on top of current qemu master (where Gerd's series is merged) and the 
problem still exists.


I will re-send this fix and the test in the same series.

Thanks,
Laurent




Re: [PATCH v1 1/3] migration/dirtyrate: implement vCPU dirtyrate calculation periodically

2021-11-18 Thread Juan Quintela
huang...@chinatelecom.cn wrote:
> From: Hyman Huang(黄勇) 
>
> introduce the third method GLOBAL_DIRTY_RESTRAINT of dirty
> tracking for calculate dirtyrate periodly for dirty restraint.
>
> implement thread for calculate dirtyrate periodly, which will
> be used for dirty restraint.
>
> add dirtyrestraint.h to introduce the util function for dirty
> restrain.
>
> Signed-off-by: Hyman Huang(黄勇) 

Some comentes:

> +void dirtyrestraint_calc_start(void);
> +
> +void dirtyrestraint_calc_state_init(int max_cpus);

dirtylimit_? instead of restraint.

We have a start function, but I can't see a finish/end/stop functions.

> +#define DIRTYRESTRAINT_CALC_TIME_MS 1000/* 1000ms */
> +
> +struct {
> +DirtyRatesData data;
> +int64_t period;
> +bool enable;

Related to previous comment.  I can't see where we set enable to 1, but
nowhere were we set it back to 0, so this never finish.

> +QemuCond ready_cond;
> +QemuMutex ready_mtx;

This is a question of style, but when you only have a mutex and a cond
in one struct, you can use the "cond" and "mutex" names.

But as said, it is a question of style, if you preffer do it this way.

> +static inline void record_dirtypages(DirtyPageRecord *dirty_pages,
> + CPUState *cpu, bool start);

You have put the code at the beggining of the file, if you put it at the
end of it, I think you can avoid this forward declaration.

> +static void dirtyrestraint_calc_func(void)
> +{
> +CPUState *cpu;
> +DirtyPageRecord *dirty_pages;
> +int64_t start_time, end_time, calc_time;
> +DirtyRateVcpu rate;
> +int i = 0;
> +
> +dirty_pages = g_malloc0(sizeof(*dirty_pages) *
> +dirtyrestraint_calc_state->data.nvcpu);
> +
> +dirtyrestraint_global_dirty_log_start();
> +
> +CPU_FOREACH(cpu) {
> +record_dirtypages(dirty_pages, cpu, true);
> +}
> +
> +start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +g_usleep(DIRTYRESTRAINT_CALC_TIME_MS * 1000);
> +end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +calc_time = end_time - start_time;
> +
> +dirtyrestraint_global_dirty_log_stop();
> +
> +CPU_FOREACH(cpu) {
> +record_dirtypages(dirty_pages, cpu, false);
> +}
> +
> +for (i = 0; i < dirtyrestraint_calc_state->data.nvcpu; i++) {
> +uint64_t increased_dirty_pages =
> +dirty_pages[i].end_pages - dirty_pages[i].start_pages;
> +uint64_t memory_size_MB =
> +(increased_dirty_pages * TARGET_PAGE_SIZE) >> 20;
> +int64_t dirtyrate = (memory_size_MB * 1000) / calc_time;
> +
> +rate.id = i;
> +rate.dirty_rate  = dirtyrate;
> +dirtyrestraint_calc_state->data.rates[i] = rate;
> +
> +trace_dirtyrate_do_calculate_vcpu(i,
> +dirtyrestraint_calc_state->data.rates[i].dirty_rate);
> +}
> +
> +return;

unnecesary return;

> +}
> +
> +static void *dirtyrestraint_calc_thread(void *opaque)
> +{
> +rcu_register_thread();
> +
> +while (qatomic_read(&dirtyrestraint_calc_state->enable)) {
> +dirtyrestraint_calc_func();
> +dirtyrestraint_calc_state->ready = true;

   You really need this to be a global variable?  You can pass
   it on the opaque, no?

Later, Juan.




Re: [PATCH v4 08/25] block: introduce assert_bdrv_graph_writable

2021-11-18 Thread Emanuele Giuseppe Esposito



On 12/11/2021 15:40, Hanna Reitz wrote:

On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:

We want to be sure that the functions that write the child and
parent list of a bs are under BQL and drain.

BQL prevents from concurrent writings from the GS API, while
drains protect from I/O.

TODO: drains are missing in some functions using this assert.
Therefore a proper assertion will fail. Because adding drains
requires additional discussions, they will be added in future
series.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Stefan Hajnoczi 
---
  block.c    |  5 +
  block/io.c | 11 +++
  include/block/block_int-global-state.h | 10 +-
  3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 41c5883c5c..94bff5c757 100644
--- a/block.c
+++ b/block.c
@@ -2734,12 +2734,14 @@ static void 
bdrv_replace_child_noperm(BdrvChild *child,

  if (child->klass->detach) {
  child->klass->detach(child);
  }
+    assert_bdrv_graph_writable(old_bs);
  QLIST_REMOVE(child, next_parent);


I think this belongs above the .detach() call (and the QLIST_REMOVE() 
belongs into the .detach() implementation, as done in 
https://lists.nongnu.org/archive/html/qemu-block/2021-11/msg00240.html, 
which has been merged to Kevin’s block branch).


Yes, I rebased on kwolf/block branch. Thank you for pointing that out.



  }
  child->bs = new_bs;
  if (new_bs) {
+    assert_bdrv_graph_writable(new_bs);
  QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);


In both these places it’s a bit strange that the assertion is done on 
the child nodes.  The subgraph starting from them isn’t modified after 
all, so their subgraph technically doesn’t need to be writable.  I think 
a single assertion on the parent node would be preferable.


I presume the problem with that is that we don’t have the parent node 
here?  Do we need a new BdrvChildClass method that performs this 
assertion on the parent node?




Uhm I am not sure what you mean here.

Just to recap on how I see this: the assertion 
assert_bdrv_graph_writable(bs) is basically used to make sure we are 
protecting the write on some fields (childrens and parents lists in this 
patch) of a given @bs. It should work like a rwlock: reading is allowed 
to be concurrent, but a write should stop all readers to prevent 
concurrency issues. This is achieved by draining.


Let's use the first case that you point out, old_bs (it's specular for 
new_bs):


>> +assert_bdrv_graph_writable(old_bs);
>>   QLIST_REMOVE(child, next_parent);

So old_bs should be the child "son" (child->bs), meaning old_bs->parents 
contains the child. Therefore when a child is removed by old_bs, we need 
to be sure we are doing it safely.


So we should check that if old_bs exists, old_bs should be drained, to 
prevent any other iothread from reading the ->parents list that is being 
updated.


The only thing to keep in mind in this case is that just wrapping a 
drain around that won't be enough, because then the child won't be 
included in the drain_end(old_bs). Therefore the right way to cover this 
drain-wise once the assertion also checks for drains is:


drain_begin(old_bs)
assert_bdrv_graph_writable(old_bs)
QLIST_REMOVE(child, next_parent)
/* old_bs will be under drain_end, but not the child */
bdrv_parent_drained_end_single(child);
bdrv_drained_end(old_bs);

I think you agree on this so far.

Now I think your concern is related to the child "parent", namely 
child->opaque. The problem is that in the .detach and .attach callbacks 
we are firstly adding/removing the child from the list, and then calling 
drain on the subtree. We would ideally need to do the opposite:


assert_bdrv_graph_writable(bs);
QLIST_REMOVE(child, next);
bdrv_unapply_subtree_drain(child, bs);

In this case I think this would actually work, because removing/adding 
the child from the ->children list beforehand just prevents an 
additional recursion call (I think, and the fact that tests are passing 
seems to confirm my theory).


Of course you know this stuff better than me, so let me know if 
something here is wrong.



  /*
@@ -2940,6 +2942,7 @@ static int 
bdrv_attach_child_noperm(BlockDriverState *parent_bs,

  return ret;
  }
+    assert_bdrv_graph_writable(parent_bs);
  QLIST_INSERT_HEAD(&parent_bs->children, *child, next);
  /*
   * child is removed in bdrv_attach_child_common_abort(), so 
don't care to
@@ -3140,6 +3143,7 @@ static void 
bdrv_unset_inherits_from(BlockDriverState *root, BdrvChild *child,

  void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child)
  {
  assert(qemu_in_main_thread());
+    assert_bdrv_graph_writable(parent);


It looks to me like we have this assertion mainly because 
bdrv_replace_child_noperm() doesn’t have a pointer to this parent node. 
I

[PATCH for-6.2 0/2] esp: add fix for reset before transfer

2021-11-18 Thread Mark Cave-Ayland
This is the fix for Gitlab issue #724 discovered by fuzzing which I think is
worth including in 6.2 for 2 reasons: firstly the fix is to zero out
an extra field during chip reset which normally only occurs during driver
initialisation and durring IO timeouts, and secondly the bug causes a stale
SCSI data buffer pointer dereference rather than triggering a FIFO assert.

The first patch contains the very simple fix, whilst the second patch adds a
qtest based upon the original Gitlab issue.

Signed-off-by: Mark Cave-Ayland 


Mark Cave-Ayland (2):
  esp: ensure that async_len is reset to 0 during esp_hard_reset()
  qtest/am53c974-test: add test for reset before transfer

 hw/scsi/esp.c   |  1 +
 tests/qtest/am53c974-test.c | 30 ++
 2 files changed, 31 insertions(+)

-- 
2.20.1




[PATCH for-6.2 2/2] qtest/am53c974-test: add test for reset before transfer

2021-11-18 Thread Mark Cave-Ayland
Based upon the qtest reproducer posted to Gitlab issue #724 at
https://gitlab.com/qemu-project/qemu/-/issues/724.

Signed-off-by: Mark Cave-Ayland 
---
 tests/qtest/am53c974-test.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/tests/qtest/am53c974-test.c b/tests/qtest/am53c974-test.c
index 9b1e4211bd..d214a912b3 100644
--- a/tests/qtest/am53c974-test.c
+++ b/tests/qtest/am53c974-test.c
@@ -223,6 +223,34 @@ static void test_inflight_cancel_ok(void)
 qtest_quit(s);
 }
 
+static void test_reset_before_transfer_ok(void)
+{
+QTestState *s = qtest_init(
+"-device am53c974,id=scsi "
+"-device scsi-hd,drive=disk0 -drive "
+"id=disk0,if=none,file=null-co://,format=raw -nodefaults");
+
+qtest_outl(s, 0xcf8, 0x80001010);
+qtest_outl(s, 0xcfc, 0xc000);
+qtest_outl(s, 0xcf8, 0x80001004);
+qtest_outw(s, 0xcfc, 0x01);
+qtest_outl(s, 0xc007, 0x2500);
+qtest_outl(s, 0xc00a, 0x41);
+qtest_outl(s, 0xc00a, 0x41);
+qtest_outw(s, 0xc00b, 0x0200);
+qtest_outw(s, 0xc040, 0x03);
+qtest_outw(s, 0xc009, 0x00);
+qtest_outw(s, 0xc00b, 0x00);
+qtest_outw(s, 0xc009, 0x00);
+qtest_outw(s, 0xc00b, 0x00);
+qtest_outw(s, 0xc009, 0x00);
+qtest_outw(s, 0xc003, 0x1000);
+qtest_outw(s, 0xc00b, 0x1000);
+qtest_outl(s, 0xc00b, 0x9000);
+qtest_outw(s, 0xc00b, 0x1000);
+qtest_quit(s);
+}
+
 int main(int argc, char **argv)
 {
 const char *arch = qtest_get_arch();
@@ -248,6 +276,8 @@ int main(int argc, char **argv)
test_cancelled_request_ok);
 qtest_add_func("am53c974/test_inflight_cancel_ok",
test_inflight_cancel_ok);
+qtest_add_func("am53c974/test_reset_before_transfer_ok",
+   test_reset_before_transfer_ok);
 }
 
 return g_test_run();
-- 
2.20.1




[PATCH for-6.2 1/2] esp: ensure that async_len is reset to 0 during esp_hard_reset()

2021-11-18 Thread Mark Cave-Ayland
If a reset command is sent after data has been transferred into the SCSI buffer
ensure that async_len is reset to 0. Otherwise a subsequent TI command assumes
the SCSI buffer contains data to be transferred to the device causing it to
dereference the stale async_buf pointer.

Signed-off-by: Mark Cave-Ayland 
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/724
---
 hw/scsi/esp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index 84f935b549..58d0edbd56 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -894,6 +894,7 @@ void esp_hard_reset(ESPState *s)
 memset(s->wregs, 0, ESP_REGS);
 s->tchi_written = 0;
 s->ti_size = 0;
+s->async_len = 0;
 fifo8_reset(&s->fifo);
 fifo8_reset(&s->cmdfifo);
 s->dma = 0;
-- 
2.20.1




Re: [PATCH v3] failover: fix unplug pending detection

2021-11-18 Thread Ani Sinha
On Thu, Nov 18, 2021 at 2:45 PM Laurent Vivier  wrote:
>
> On 18/10/2021 10:27, Michael S. Tsirkin wrote:
> > On Mon, Oct 18, 2021 at 09:19:16AM +0200, Laurent Vivier wrote:
> >> Hi,
> >>
> >> I don't understand if there are some issues
> >
> > Gerd did identify some issues, you felt they aren't related to the patch
> > and need to be addressed separately.
> >
> > Gerd posted patches that are supposed to address them since.
> > "try improve native hotplug for pcie root ports"
> > Could you please either
> > - test and report that your series depend on
> >Gerd's one to now work without the issues.
> >preferably by reposting a patch that applies on top.
> > - test and report that the functionality is still partially
> >broken but explain in the commit log that this is not due
> >to the patch itself, and not made worse.
> >
> > in both cases please CC reviewers: Daniel, Gerd.
> >
>
> I'm writing a test in tests/qtest that tests virtio-net failover, and I've 
> added a test
> that checks the migration doesn't start while the card is not unplugged.
>
> I've run the test on top of current qemu master (where Gerd's series is 
> merged) and the
> problem still exists.

btw, for the records, we have decided to continue to use acpi hotplug
as default and not revert to native for 6.2. So regardless of what
Gerd's patches does, we need to address issues around acpi hotplug if
it exists, for failover.

>
> I will re-send this fix and the test in the same series.
>
> Thanks,
> Laurent
>



Re: [PULL 0/1] VFIO fixes 2021-11-17 (for v6.2)

2021-11-18 Thread Richard Henderson

On 11/17/21 8:17 PM, Alex Williamson wrote:

The following changes since commit 3bb87484e77d22cf4e580a78856529c982195d32:

   Merge tag 'pull-request-2021-11-17' of https://gitlab.com/thuth/qemu into 
staging (2021-11-17 12:35:51 +0100)

are available in the Git repository at:

   git://github.com/awilliam/qemu-vfio.git tags/vfio-fixes-2027.0

for you to fetch changes up to f3bc3a73c908df15966e66f88d5a633bd42fd029:

   vfio: Fix memory leak of hostwin (2021-11-17 11:25:55 -0700)


VFIO fixes 2021-11-17

  * Fix hostwin memory leak (Peng Liang)


Peng Liang (1):
   vfio: Fix memory leak of hostwin

  hw/vfio/common.c | 8 
  1 file changed, 8 insertions(+)


Applied, thanks.

r~



Re: [PATCH for-6.2 2/2] qtest/am53c974-test: add test for reset before transfer

2021-11-18 Thread Thomas Huth

On 18/11/2021 11.03, Mark Cave-Ayland wrote:

Based upon the qtest reproducer posted to Gitlab issue #724 at
https://gitlab.com/qemu-project/qemu/-/issues/724.

Signed-off-by: Mark Cave-Ayland 
---
  tests/qtest/am53c974-test.c | 30 ++
  1 file changed, 30 insertions(+)

diff --git a/tests/qtest/am53c974-test.c b/tests/qtest/am53c974-test.c
index 9b1e4211bd..d214a912b3 100644
--- a/tests/qtest/am53c974-test.c
+++ b/tests/qtest/am53c974-test.c
@@ -223,6 +223,34 @@ static void test_inflight_cancel_ok(void)
  qtest_quit(s);
  }
  
+static void test_reset_before_transfer_ok(void)

+{
+QTestState *s = qtest_init(
+"-device am53c974,id=scsi "
+"-device scsi-hd,drive=disk0 -drive "
+"id=disk0,if=none,file=null-co://,format=raw -nodefaults");
+
+qtest_outl(s, 0xcf8, 0x80001010);
+qtest_outl(s, 0xcfc, 0xc000);
+qtest_outl(s, 0xcf8, 0x80001004);
+qtest_outw(s, 0xcfc, 0x01);
+qtest_outl(s, 0xc007, 0x2500);
+qtest_outl(s, 0xc00a, 0x41);
+qtest_outl(s, 0xc00a, 0x41);
+qtest_outw(s, 0xc00b, 0x0200);
+qtest_outw(s, 0xc040, 0x03);
+qtest_outw(s, 0xc009, 0x00);
+qtest_outw(s, 0xc00b, 0x00);
+qtest_outw(s, 0xc009, 0x00);
+qtest_outw(s, 0xc00b, 0x00);
+qtest_outw(s, 0xc009, 0x00);
+qtest_outw(s, 0xc003, 0x1000);
+qtest_outw(s, 0xc00b, 0x1000);
+qtest_outl(s, 0xc00b, 0x9000);
+qtest_outw(s, 0xc00b, 0x1000);
+qtest_quit(s);
+}
+
  int main(int argc, char **argv)
  {
  const char *arch = qtest_get_arch();
@@ -248,6 +276,8 @@ int main(int argc, char **argv)
 test_cancelled_request_ok);
  qtest_add_func("am53c974/test_inflight_cancel_ok",
 test_inflight_cancel_ok);
+qtest_add_func("am53c974/test_reset_before_transfer_ok",
+   test_reset_before_transfer_ok);
  }
  
  return g_test_run();




Acked-by: Thomas Huth 




Re: [PATCH v4 08/25] block: introduce assert_bdrv_graph_writable

2021-11-18 Thread Emanuele Giuseppe Esposito




On 18/11/2021 10:55, Emanuele Giuseppe Esposito wrote:


On 12/11/2021 15:40, Hanna Reitz wrote:

On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:

We want to be sure that the functions that write the child and
parent list of a bs are under BQL and drain.

BQL prevents from concurrent writings from the GS API, while
drains protect from I/O.

TODO: drains are missing in some functions using this assert.
Therefore a proper assertion will fail. Because adding drains
requires additional discussions, they will be added in future
series.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Stefan Hajnoczi 
---
  block.c    |  5 +
  block/io.c | 11 +++
  include/block/block_int-global-state.h | 10 +-
  3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 41c5883c5c..94bff5c757 100644
--- a/block.c
+++ b/block.c
@@ -2734,12 +2734,14 @@ static void 
bdrv_replace_child_noperm(BdrvChild *child,

  if (child->klass->detach) {
  child->klass->detach(child);
  }
+    assert_bdrv_graph_writable(old_bs);
  QLIST_REMOVE(child, next_parent);


I think this belongs above the .detach() call (and the QLIST_REMOVE() 
belongs into the .detach() implementation, as done in 
https://lists.nongnu.org/archive/html/qemu-block/2021-11/msg00240.html, which 
has been merged to Kevin’s block branch).


Yes, I rebased on kwolf/block branch. Thank you for pointing that out.



  }
  child->bs = new_bs;
  if (new_bs) {
+    assert_bdrv_graph_writable(new_bs);
  QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);


In both these places it’s a bit strange that the assertion is done on 
the child nodes.  The subgraph starting from them isn’t modified after 
all, so their subgraph technically doesn’t need to be writable.  I 
think a single assertion on the parent node would be preferable.


I presume the problem with that is that we don’t have the parent node 
here?  Do we need a new BdrvChildClass method that performs this 
assertion on the parent node?




Uhm I am not sure what you mean here.

Just to recap on how I see this: the assertion 
assert_bdrv_graph_writable(bs) is basically used to make sure we are 
protecting the write on some fields (childrens and parents lists in this 
patch) of a given @bs. It should work like a rwlock: reading is allowed 
to be concurrent, but a write should stop all readers to prevent 
concurrency issues. This is achieved by draining.


I am thinking to add an additional explanation to 
assert_bdrv_graph_writable header comment by saying
"Drains act as a rwlock: while reading is allowed to be concurrent from 
all iothreads, when a write needs to be performed we need to stop 
(drain) all involved iothreads from reading the graph, to avoid race 
conditions."


Somethink like that.

Emanuele


Let's use the first case that you point out, old_bs (it's specular for 
new_bs):


 >> +    assert_bdrv_graph_writable(old_bs);
 >>   QLIST_REMOVE(child, next_parent);

So old_bs should be the child "son" (child->bs), meaning old_bs->parents 
contains the child. Therefore when a child is removed by old_bs, we need 
to be sure we are doing it safely.


So we should check that if old_bs exists, old_bs should be drained, to 
prevent any other iothread from reading the ->parents list that is being 
updated.


The only thing to keep in mind in this case is that just wrapping a 
drain around that won't be enough, because then the child won't be 
included in the drain_end(old_bs). Therefore the right way to cover this 
drain-wise once the assertion also checks for drains is:


drain_begin(old_bs)
assert_bdrv_graph_writable(old_bs)
QLIST_REMOVE(child, next_parent)
/* old_bs will be under drain_end, but not the child */
bdrv_parent_drained_end_single(child);
bdrv_drained_end(old_bs);

I think you agree on this so far.

Now I think your concern is related to the child "parent", namely 
child->opaque. The problem is that in the .detach and .attach callbacks 
we are firstly adding/removing the child from the list, and then calling 
drain on the subtree. We would ideally need to do the opposite:


assert_bdrv_graph_writable(bs);
QLIST_REMOVE(child, next);
bdrv_unapply_subtree_drain(child, bs);

In this case I think this would actually work, because removing/adding 
the child from the ->children list beforehand just prevents an 
additional recursion call (I think, and the fact that tests are passing 
seems to confirm my theory).


Of course you know this stuff better than me, so let me know if 
something here is wrong.



  /*
@@ -2940,6 +2942,7 @@ static int 
bdrv_attach_child_noperm(BlockDriverState *parent_bs,

  return ret;
  }
+    assert_bdrv_graph_writable(parent_bs);
  QLIST_INSERT_HEAD(&parent_bs->children, *child, next);
  /*
   * child is removed in bdrv_attach_child_c

Re: [PATCH v2] hw/arm/virt: Expose empty NUMA nodes through ACPI

2021-11-18 Thread Jonathan Cameron
On Wed, 17 Nov 2021 19:08:28 +0100
David Hildenbrand  wrote:

> On 17.11.21 15:30, Jonathan Cameron wrote:
> > On Tue, 16 Nov 2021 12:11:29 +0100
> > David Hildenbrand  wrote:
> >   
> 
>  Examples include exposing HBM or PMEM to the VM. Just like on real HW,
>  this memory is exposed via cpu-less, special nodes. In contrast to real
>  HW, the memory is hotplugged later (I don't think HW supports hotplug
>  like that yet, but it might just be a matter of time).
> >>>
> >>> I suppose some of that maybe covered by GENERIC_AFFINITY entries in SRAT
> >>> some by MEMORY entries. Or nodes created dynamically like with normal
> >>> hotplug memory.
> >>> 
> >   
> 
> Hi Jonathan,
> 
> > The naming of the define is unhelpful.  GENERIC_AFFINITY here corresponds
> > to Generic Initiator Affinity.  So no good for memory. This is meant for
> > representation of accelerators / network cards etc so you can get the NUMA
> > characteristics for them accessing Memory in other nodes.
> > 
> > My understanding of 'traditional' memory hotplug is that typically the
> > PA into which memory is hotplugged is known at boot time whether or not
> > the memory is physically present.  As such, you present that in SRAT and 
> > rely
> > on the EFI memory map / other information sources to know the memory isn't
> > there.  When it is hotplugged later the address is looked up in SRAT to 
> > identify
> > the NUMA node.  
> 
> in virtualized environments we use the SRAT only to indicate the hotpluggable
> region (-> indicate maximum possible PFN to the guest OS), the actual present
> memory+PXM assignment is not done via SRAT. We differ quite a lot here from
> actual hardware I think.
> 
> > 
> > That model is less useful for more flexible entities like virtio-mem or
> > indeed physical hardware such as CXL type 3 memory devices which typically
> > need their own nodes.
> > 
> > For the CXL type 3 option, currently proposal is to use the CXL table 
> > entries
> > representing Physical Address space regions to work out how many NUMA nodes
> > are needed and just create extra ones at boot.
> > https://lore.kernel.org/linux-cxl/163553711933.2509508.2203471175679990.st...@dwillia2-desk3.amr.corp.intel.com
> > 
> > It's a heuristic as we might need more nodes to represent things well kernel
> > side, but it's better than nothing and less effort that true dynamic node 
> > creation.
> > If you chase through the earlier versions of Alison's patch you will find 
> > some
> > discussion of that.
> > 
> > I wonder if virtio-mem should just grow a CDAT instance via a DOE?
> > 
> > That would make all this stuff discoverable via PCI config space rather 
> > than ACPI
> > CDAT is at:
> > https://uefi.org/sites/default/files/resources/Coherent%20Device%20Attribute%20Table_1.01.pdf
> > but the table access protocol over PCI DOE is currently in the CXL 2.0 spec
> > (nothing stops others using it though AFAIK).
> > 
> > However, then we'd actually need either dynamic node creation in the OS, or
> > some sort of reserved pool of extra nodes.  Long term it may be the most
> > flexible option.  
> 
> 
> I think for virtio-mem it's actually a bit simpler:
> 
> a) The user defined on the QEMU cmdline an empty node
> b) The user assigned a virtio-mem device to a node, either when 
>coldplugging or hotplugging the device.
> 
> So we don't actually "hotplug" a new node, the (possible) node is already 
> known
> to QEMU right when starting up. It's just a matter of exposing that fact to 
> the
> guest OS -- similar to how we expose the maximum possible PFN to the guest OS.
> It's seems to boild down to an ACPI limitation.
> 
> Conceptually, virtio-mem on an empty node in QEMU is not that different from
> hot/coldplugging a CPU to an empty node or hot/coldplugging a DIMM/NVDIMM to
> an empty node. But I guess it all just doesn't work with QEMU as of now.

A side distraction perhaps, but there is a code first acpi proposal to add
a 'softer' form of CPU hotplug 
https://bugzilla.tianocore.org/show_bug.cgi?id=3706

Whilst the reason for that proposal was for arm64 systems where there is no 
architected
physical hotplug, it might partly solve the empty node question for CPUs.  They 
won't
be empty, there will simply be CPUs that are marked as 'Online capable'.

> 
> In current x86-64 code, we define the "hotpluggable region" in 
> hw/i386/acpi-build.c via
> 
>   build_srat_memory(table_data, machine->device_memory->base,
> hotpluggable_address_space_size, nb_numa_nodes - 1,
> MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
> 
> So we tell the guest OS "this range is hotpluggable" and "it contains to
> this node unless the device says something different". From both values we
> can -- when under QEMU -- conclude the maximum possible PFN and the maximum
> possible node. But the latter is not what Linux does: it simply maps the last
> numa node (indicated in the memory entry) to a PXM

Re: [PATCH v2 for-6.2] meson.build: Support ncurses on MacOS and OpenBSD

2021-11-18 Thread Daniel P . Berrangé
On Wed, Nov 17, 2021 at 09:53:55PM +0100, Stefan Weil wrote:
> MacOS provides header files for curses 5.7 with support
> for wide characters, but requires _XOPEN_SOURCE_EXTENDED=1
> to activate that.
> 
> By default those old header files are used even if there
> is a newer Homebrew installation of ncurses 6.2 available.
> 
> Change also the old macro definition of NCURSES_WIDECHAR
> and set it to 1 like it is done in newer versions of
> curses.h when _XOPEN_SOURCE_EXTENDED=1 is defined.
> 
> OpenBSD has the same version of ncurses and needs the same fix.
> 
> Suggested-by: Daniel P. Berrangé 
> Signed-off-by: Stefan Weil 
> ---
> 
> v2:
> - Only define _XOPEN_SOURCE_EXTENDED when curses.h is used.
> - Extended to fix OpenBSD, too (untested!)
> 
>  meson.build | 5 -
>  ui/curses.c | 4 
>  2 files changed, 8 insertions(+), 1 deletion(-)

Reviewed-by: Daniel P. Berrangé 


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [qemu-web PATCH] remove deployment phase from CI

2021-11-18 Thread Daniel P . Berrangé
On Thu, Nov 18, 2021 at 08:47:01AM +0100, Paolo Bonzini wrote:
> qemu.org is now served via a reverse proxy from qemu-project.gitlab.io; it 
> does
> not need anymore the rsync step to the QEMU project's shell server.
> Remove it from the CI.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  .gitlab-ci.yml | 24 
>  1 file changed, 24 deletions(-)

Reviewed-by: Daniel P. Berrangé 


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH-for-6.2 v2 1/2] hw/block/fdc: Kludge missing floppy drive to fix CVE-2021-20196

2021-11-18 Thread Darren Kenny
Hi Philippe,

A small nit below, but otherwise looks good.

On Thursday, 2021-11-18 at 00:24:21 +01, Philippe Mathieu-Daudé wrote:
> Guest might select another drive on the bus by setting the
> DRIVE_SEL bit of the DIGITAL OUTPUT REGISTER (DOR).
> The current controller model doesn't expect a BlockBackend
> to be NULL. A simple way to fix CVE-2021-20196 is to create
> an empty BlockBackend when it is missing. All further
> accesses will be safely handled, and the controller state
> machines keep behaving correctly.
>
> Fixes: CVE-2021-20196
> Reported-by: Gaoning Pan (Ant Security Light-Year Lab) 
> BugLink: https://bugs.launchpad.net/qemu/+bug/1912780
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/338
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/block/fdc.c | 14 +-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/hw/block/fdc.c b/hw/block/fdc.c
> index fa933cd3263..eab17e946d6 100644
> --- a/hw/block/fdc.c
> +++ b/hw/block/fdc.c
> @@ -1161,7 +1161,19 @@ static FDrive *get_drv(FDCtrl *fdctrl, int unit)
>  
>  static FDrive *get_cur_drv(FDCtrl *fdctrl)
>  {
> -return get_drv(fdctrl, fdctrl->cur_drv);
> +FDrive *cur_drv = get_drv(fdctrl, fdctrl->cur_drv);
> +
> +if (!cur_drv->blk) {
> +/*
> + * Kludge: empty drive line selected. Create an anonymous
> + * BlockBackend to avoid NULL deref with various BlockBackend
> + * API calls within this model (CVE-2021-20196).
> + * Due to the controller QOM model limitations, we don't
> + * attach the created to the controller.
>
The created  to the controller

Something is missing here, maybe 'device'?

Thanks,

Darren.

> + */
> +cur_drv->blk = blk_new(qemu_get_aio_context(), 0, BLK_PERM_ALL);
> +}
> +return cur_drv;
>  }
>  
>  /* Status A register : 0x00 (read-only) */
> -- 
> 2.31.1



Re: [PATCH-for-6.2] hw/i386/microvm: Reduce annoying debug message in dt_setup_microvm()

2021-11-18 Thread Darren Kenny
On Wednesday, 2021-11-17 at 18:43:31 +01, Philippe Mathieu-Daudé wrote:
> Fixes: f5918a99283 ("microvm: add device tree support.")
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Darren Kenny 

> ---
>  hw/i386/microvm-dt.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/hw/i386/microvm-dt.c b/hw/i386/microvm-dt.c
> index 875ba919639..6ee6c42904d 100644
> --- a/hw/i386/microvm-dt.c
> +++ b/hw/i386/microvm-dt.c
> @@ -327,7 +327,9 @@ void dt_setup_microvm(MicrovmMachineState *mms)
>  dt_setup_sys_bus(mms);
>  
>  /* add to fw_cfg */
> -fprintf(stderr, "%s: add etc/fdt to fw_cfg\n", __func__);
> +if (debug) {
> +fprintf(stderr, "%s: add etc/fdt to fw_cfg\n", __func__);
> +}
>  fw_cfg_add_file(x86ms->fw_cfg, "etc/fdt", mms->fdt, size);
>  
>  if (debug) {
> -- 
> 2.31.1



Re: [qemu-web PATCH] update links to the SubmitAPatch wiki page

2021-11-18 Thread Daniel P . Berrangé
On Thu, Nov 18, 2021 at 08:47:02AM +0100, Paolo Bonzini wrote:
> The page is now part of the documentation, but it also has a redirect
> in the qemu.org web server to provide a stable URL.  Use it instead
> of linking out to wiki.qemu.org.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  CONTRIBUTING.md| 2 +-
>  contribute.md  | 2 +-
>  contribute/report-a-bug.md | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)

Reviewed-by: Daniel P. Berrangé 


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH-for-6.2 v2 0/2] hw/block/fdc: Fix CVE-2021-20196

2021-11-18 Thread Darren Kenny
Hi Philippe,

Apart from a nit on patch 1, all looks good, so:

Reviewed-by: Darren Kenny 

Thanks,

Darren.

On Thursday, 2021-11-18 at 00:24:20 +01, Philippe Mathieu-Daudé wrote:
> I'm not sure what happened to v1 from Prasad, so since we are
> at rc2 I took a simpler approach to fix this CVE: create an
> empty drive to satisfy the BlockBackend API calls.
>
> Added Alexander's reproducer along.
>
> v1: 
> https://lore.kernel.org/qemu-devel/20210123100345.642933-1-ppan...@redhat.com/
>
> Alexander Bulekov (1):
>   tests/qtest/fdc-test: Add a regression test for CVE-2021-20196
>
> Philippe Mathieu-Daudé (1):
>   hw/block/fdc: Kludge missing floppy drive to fix CVE-2021-20196
>
>  hw/block/fdc.c | 14 +-
>  tests/qtest/fdc-test.c | 21 +
>  2 files changed, 34 insertions(+), 1 deletion(-)
>
> -- 
> 2.31.1



Re: [PATCH-for-6.2] hw/i386/microvm: Reduce annoying debug message in dt_setup_microvm()

2021-11-18 Thread Sergio Lopez
On Wed, Nov 17, 2021 at 06:43:31PM +0100, Philippe Mathieu-Daudé wrote:
> Fixes: f5918a99283 ("microvm: add device tree support.")
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/i386/microvm-dt.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

Reviewed-by: Sergio Lopez 

> diff --git a/hw/i386/microvm-dt.c b/hw/i386/microvm-dt.c
> index 875ba919639..6ee6c42904d 100644
> --- a/hw/i386/microvm-dt.c
> +++ b/hw/i386/microvm-dt.c
> @@ -327,7 +327,9 @@ void dt_setup_microvm(MicrovmMachineState *mms)
>  dt_setup_sys_bus(mms);
>  
>  /* add to fw_cfg */
> -fprintf(stderr, "%s: add etc/fdt to fw_cfg\n", __func__);
> +if (debug) {
> +fprintf(stderr, "%s: add etc/fdt to fw_cfg\n", __func__);
> +}
>  fw_cfg_add_file(x86ms->fw_cfg, "etc/fdt", mms->fdt, size);
>  
>  if (debug) {
> -- 
> 2.31.1
> 


signature.asc
Description: PGP signature


Re: [PATCH-for-6.2? 1/3] docs/devel/style: Improve GLib functions rST rendering

2021-11-18 Thread Darren Kenny
Hi Philippe,

There are some inconsistencies in the use of '()' when referring to
functions or macros below...

On Tuesday, 2021-11-16 at 16:13:15 +01, Philippe Mathieu-Daudé wrote:
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  docs/devel/style.rst | 31 ---
>  1 file changed, 16 insertions(+), 15 deletions(-)
>
> diff --git a/docs/devel/style.rst b/docs/devel/style.rst
> index 260e3263fa0..415a6b9d700 100644
> --- a/docs/devel/style.rst
> +++ b/docs/devel/style.rst
> @@ -413,13 +413,14 @@ multiple exist paths you can also improve the 
> readability of the code
>  by using ``g_autofree`` and related annotations. See :ref:`autofree-ref`
>  for more details.
>  
> -Calling ``g_malloc`` with a zero size is valid and will return NULL.
> +Calling ``g_malloc`` with a zero size is valid and will return ``NULL``.
>

g_malloc() ?

>  
>  Prefer ``g_new(T, n)`` instead of ``g_malloc(sizeof(T) * n)`` for the 
> following
>  reasons:
>  
> -* It catches multiplication overflowing size_t;
> -* It returns T ``*`` instead of void ``*``, letting compiler catch more type 
> errors.
> +* It catches multiplication overflowing ``size_t``;
> +* It returns ``T *`` instead of ``void *``, letting compiler catch more type
> +  errors.
>  
>  Declarations like
>  
> @@ -444,14 +445,14 @@ use this similar function when possible, but note its 
> different signature:
>  
>  void pstrcpy(char *dest, int dest_buf_size, const char *src)
>  
> -Don't use strcat because it can't check for buffer overflows, but:
> +Don't use ``strcat`` because it can't check for buffer overflows, but:
>

strcat() ?

>  
>  .. code-block:: c
>  
>  char *pstrcat(char *buf, int buf_size, const char *s)
>  
> -The same limitation exists with sprintf and vsprintf, so use snprintf and
> -vsnprintf.
> +The same limitation exists with ``sprintf`` and ``vsprintf``, so use

sprintf() and vsprintf()?

> +``snprintf`` and ``vsnprintf``.
>

snprintf() and vsnprintf()?

>  
>  QEMU provides other useful string functions:
>  
> @@ -464,8 +465,8 @@ QEMU provides other useful string functions:
>  There are also replacement character processing macros for isxyz and toxyz,
>  so instead of e.g. isalnum you should use qemu_isalnum.
>

Should this be isalnum() and qemu_isalnum()?

>  
> -Because of the memory management rules, you must use g_strdup/g_strndup
> -instead of plain strdup/strndup.
> +Because of the memory management rules, you must use ``g_strdup/g_strndup``
>

Wonder should this be ``g_strdup()``/``g_strndup()``

> +instead of plain ``strdup/strndup``.
>

And ``strdup()``/``strndup()``

>  
>  Printf-style functions
>  ==
> @@ -524,10 +525,10 @@ automatic cleanup:
>  
>  Most notably:
>  
> -* g_autofree - will invoke g_free() on the variable going out of scope
> +* ``g_autofree`` - will invoke ``g_free()`` on the variable going out of 
> scope
>

g_autofree() ?

>  
> -* g_autoptr - for structs / objects, will invoke the cleanup func created
> -  by a previous use of G_DEFINE_AUTOPTR_CLEANUP_FUNC. This is
> +* ``g_autoptr`` - for structs / objects, will invoke the cleanup func created
>

g_autoptr() ?

> +  by a previous use of ``G_DEFINE_AUTOPTR_CLEANUP_FUNC``. This is
>supported for most GLib data types and GObjects
>  
>  For example, instead of
> @@ -551,7 +552,7 @@ For example, instead of
>  return ret;
>  }
>  
> -Using g_autofree/g_autoptr enables the code to be written as:
> +Using ``g_autofree/g_autoptr`` enables the code to be written as:
>

``g_autofree()``/``g_autoptr()`` ?

>  
>  .. code-block:: c
>  
> @@ -569,13 +570,13 @@ Using g_autofree/g_autoptr enables the code to be 
> written as:
>  While this generally results in simpler, less leak-prone code, there
>  are still some caveats to beware of
>  
> -* Variables declared with g_auto* MUST always be initialized,
> +* Variables declared with ``g_auto*`` MUST always be initialized,
>

g_auto*() ?

>otherwise the cleanup function will use uninitialized stack memory
>  
> -* If a variable declared with g_auto* holds a value which must
> +* If a variable declared with ``g_auto*`` holds a value which must
>

g_auto*() ?

>live beyond the life of the function, that value must be saved
>and the original variable NULL'd out. This can be simpler using
> -  g_steal_pointer
> +  ``g_steal_pointer``
>

g_steal_pointer() ?

Thanks,

Darren.



Re: [PATCH-for-6.2? 3/3] docs/devel/style: Improve types/qualifiers rST rendering

2021-11-18 Thread Darren Kenny
Hi Philippe,

A couple here too w.r.t. function/macros...

On Tuesday, 2021-11-16 at 16:13:17 +01, Philippe Mathieu-Daudé wrote:
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  docs/devel/style.rst | 111 ++-
>  1 file changed, 56 insertions(+), 55 deletions(-)
>
> diff --git a/docs/devel/style.rst b/docs/devel/style.rst
> index 21f0f213193..f9f063ed8cb 100644
> --- a/docs/devel/style.rst
> +++ b/docs/devel/style.rst
> @@ -111,7 +111,7 @@ Variables are lower_case_with_underscores; easy to type 
> and read.  Structured
>  type names are in CamelCase; harder to type but standing out.  Enum type
>  names and function type names should also be in CamelCase.  Scalar type
>  names are lower_case_with_underscores_ending_with_a_t, like the POSIX
> -uint64_t and family.  Note that this last convention contradicts POSIX
> +``uint64_t`` and family.  Note that this last convention contradicts POSIX
>  and is therefore likely to be changed.
>  
>  Variable Naming Conventions
> @@ -195,9 +195,9 @@ blocks) are generally not allowed; declarations should be 
> at the beginning
>  of blocks.
>  
>  Every now and then, an exception is made for declarations inside a
> -#ifdef or #ifndef block: if the code looks nicer, such declarations can
> +``#ifdef`` or ``#ifndef`` block: if the code looks nicer, such declarations 
> can
>  be placed at the top of the block even if there are statements above.
> -On the other hand, however, it's often best to move that #ifdef/#ifndef
> +On the other hand, however, it's often best to move that ``#ifdef/#ifndef``
>  block to a separate function altogether.
>  
>  Conditional statements
> @@ -220,13 +220,13 @@ even when the constant is on the right.
>  Comment style
>  =
>  
> -We use traditional C-style /``*`` ``*``/ comments and avoid // comments.
> +We use traditional C-style ``/*`` ``*/`` comments and avoid ``//`` comments.
>  
> -Rationale: The // form is valid in C99, so this is purely a matter of
> +Rationale: The ``//`` form is valid in C99, so this is purely a matter of
>  consistency of style. The checkpatch script will warn you about this.
>  
>  Multiline comment blocks should have a row of stars on the left,
> -and the initial /``*`` and terminating ``*``/ both on their own lines:
> +and the initial ``/*`` and terminating ``*/`` both on their own lines:
>  
>  .. code-block:: c
>  
> @@ -290,57 +290,57 @@ a few useful guidelines here.
>  Scalars
>  ---
>  
> -If you're using "int" or "long", odds are good that there's a better type.
> -If a variable is counting something, it should be declared with an
> -unsigned type.
> +If you're using '``int``' or '``long``', odds are good that there's a better
> +type.  If a variable is counting something, it should be declared with an
> +*unsigned* type.
>  
> -If it's host memory-size related, size_t should be a good choice (use
> -ssize_t only if required). Guest RAM memory offsets must use ram_addr_t,
> +If it's host memory-size related, ``size_t`` should be a good choice (use
> +``ssize_t`` only if required). Guest RAM memory offsets must use 
> ``ram_addr_t``,
>  but only for RAM, it may not cover whole guest address space.
>  
> -If it's file-size related, use off_t.
> -If it's file-offset related (i.e., signed), use off_t.
> -If it's just counting small numbers use "unsigned int";
> +If it's file-size related, use ``off_t``.
> +If it's file-offset related (i.e., signed), use ``off_t``.
> +If it's just counting small numbers use '``unsigned int``';
>  (on all but oddball embedded systems, you can assume that that
>  type is at least four bytes wide).
>  
>  In the event that you require a specific width, use a standard type
> -like int32_t, uint32_t, uint64_t, etc.  The specific types are
> +like ``int32_t``, ``uint32_t``, ``uint64_t``, etc.  The specific types are
>  mandatory for VMState fields.
>  
> -Don't use Linux kernel internal types like u32, __u32 or __le32.
> +Don't use Linux kernel internal types like ``u32``, ``__u32`` or ``__le32``.
>  
> -Use hwaddr for guest physical addresses except pcibus_t
> -for PCI addresses.  In addition, ram_addr_t is a QEMU internal address
> +Use ``hwaddr`` for guest physical addresses except ``pcibus_t``
> +for PCI addresses.  In addition, ``ram_addr_t`` is a QEMU internal address
>  space that maps guest RAM physical addresses into an intermediate
>  address space that can map to host virtual address spaces.  Generally
> -speaking, the size of guest memory can always fit into ram_addr_t but
> +speaking, the size of guest memory can always fit into ``ram_addr_t`` but
>  it would not be correct to store an actual guest physical address in a
> -ram_addr_t.
> +``ram_addr_t``.
>  
>  For CPU virtual addresses there are several possible types.
> -vaddr is the best type to use to hold a CPU virtual address in
> +``vaddr`` is the best type to use to hold a CPU virtual address in
>  target-independent code. It is guaranteed to be large enough to 

Re: [PATCH v2] hw/arm/virt: Expose empty NUMA nodes through ACPI

2021-11-18 Thread David Hildenbrand
On 18.11.21 11:28, Jonathan Cameron wrote:
> On Wed, 17 Nov 2021 19:08:28 +0100
> David Hildenbrand  wrote:
> 
>> On 17.11.21 15:30, Jonathan Cameron wrote:
>>> On Tue, 16 Nov 2021 12:11:29 +0100
>>> David Hildenbrand  wrote:
>>>   
>>
>> Examples include exposing HBM or PMEM to the VM. Just like on real HW,
>> this memory is exposed via cpu-less, special nodes. In contrast to real
>> HW, the memory is hotplugged later (I don't think HW supports hotplug
>> like that yet, but it might just be a matter of time).
>
> I suppose some of that maybe covered by GENERIC_AFFINITY entries in SRAT
> some by MEMORY entries. Or nodes created dynamically like with normal
> hotplug memory.
> 
>>>   
>>
>> Hi Jonathan,
>>
>>> The naming of the define is unhelpful.  GENERIC_AFFINITY here corresponds
>>> to Generic Initiator Affinity.  So no good for memory. This is meant for
>>> representation of accelerators / network cards etc so you can get the NUMA
>>> characteristics for them accessing Memory in other nodes.
>>>
>>> My understanding of 'traditional' memory hotplug is that typically the
>>> PA into which memory is hotplugged is known at boot time whether or not
>>> the memory is physically present.  As such, you present that in SRAT and 
>>> rely
>>> on the EFI memory map / other information sources to know the memory isn't
>>> there.  When it is hotplugged later the address is looked up in SRAT to 
>>> identify
>>> the NUMA node.  
>>
>> in virtualized environments we use the SRAT only to indicate the hotpluggable
>> region (-> indicate maximum possible PFN to the guest OS), the actual present
>> memory+PXM assignment is not done via SRAT. We differ quite a lot here from
>> actual hardware I think.
>>
>>>
>>> That model is less useful for more flexible entities like virtio-mem or
>>> indeed physical hardware such as CXL type 3 memory devices which typically
>>> need their own nodes.
>>>
>>> For the CXL type 3 option, currently proposal is to use the CXL table 
>>> entries
>>> representing Physical Address space regions to work out how many NUMA nodes
>>> are needed and just create extra ones at boot.
>>> https://lore.kernel.org/linux-cxl/163553711933.2509508.2203471175679990.st...@dwillia2-desk3.amr.corp.intel.com
>>>
>>> It's a heuristic as we might need more nodes to represent things well kernel
>>> side, but it's better than nothing and less effort that true dynamic node 
>>> creation.
>>> If you chase through the earlier versions of Alison's patch you will find 
>>> some
>>> discussion of that.
>>>
>>> I wonder if virtio-mem should just grow a CDAT instance via a DOE?
>>>
>>> That would make all this stuff discoverable via PCI config space rather 
>>> than ACPI
>>> CDAT is at:
>>> https://uefi.org/sites/default/files/resources/Coherent%20Device%20Attribute%20Table_1.01.pdf
>>> but the table access protocol over PCI DOE is currently in the CXL 2.0 spec
>>> (nothing stops others using it though AFAIK).
>>>
>>> However, then we'd actually need either dynamic node creation in the OS, or
>>> some sort of reserved pool of extra nodes.  Long term it may be the most
>>> flexible option.  
>>
>>
>> I think for virtio-mem it's actually a bit simpler:
>>
>> a) The user defined on the QEMU cmdline an empty node
>> b) The user assigned a virtio-mem device to a node, either when 
>>coldplugging or hotplugging the device.
>>
>> So we don't actually "hotplug" a new node, the (possible) node is already 
>> known
>> to QEMU right when starting up. It's just a matter of exposing that fact to 
>> the
>> guest OS -- similar to how we expose the maximum possible PFN to the guest 
>> OS.
>> It's seems to boild down to an ACPI limitation.
>>
>> Conceptually, virtio-mem on an empty node in QEMU is not that different from
>> hot/coldplugging a CPU to an empty node or hot/coldplugging a DIMM/NVDIMM to
>> an empty node. But I guess it all just doesn't work with QEMU as of now.
> 
> A side distraction perhaps, but there is a code first acpi proposal to add
> a 'softer' form of CPU hotplug 
> https://bugzilla.tianocore.org/show_bug.cgi?id=3706
> 
> Whilst the reason for that proposal was for arm64 systems where there is no 
> architected
> physical hotplug, it might partly solve the empty node question for CPUs.  
> They won't
> be empty, there will simply be CPUs that are marked as 'Online capable'.
> 
>>
>> In current x86-64 code, we define the "hotpluggable region" in 
>> hw/i386/acpi-build.c via
>>
>>  build_srat_memory(table_data, machine->device_memory->base,
>>hotpluggable_address_space_size, nb_numa_nodes - 1,
>>MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
>>
>> So we tell the guest OS "this range is hotpluggable" and "it contains to
>> this node unless the device says something different". From both values we
>> can -- when under QEMU -- conclude the maximum possible PFN and the maximum
>> possible node. But the latter is not w

Re: [PATCH v2 1/3] icount: preserve cflags when custom tb is about to execute

2021-11-18 Thread Pavel Dovgalyuk

On 17.11.2021 12:47, Alex Bennée wrote:


Pavel Dovgalyuk  writes:


When debugging with the watchpoints, qemu may need to create
TB with single instruction. This is achieved by setting cpu->cflags_next_tb.
But when this block is about to execute, it may be interrupted by another
thread. In this case cflags will be lost and next executed TB will not
be the special one.
This patch checks TB exit reason and restores cflags_next_tb to allow
finding the interrupted block.


How about this alternative?


I checked all cflags_next_tb assignments.
Looks that this variant should work.



--8<---cut here---start->8---
accel/tcg: suppress IRQ check for special TBs

Generally when we set cpu->cflags_next_tb it is because we want to
carefully control the execution of the next TB. Currently there is a
race that causes cflags_next_tb to get ignored if an IRQ is processed
before we execute any actual instructions.

To avoid this we introduce a new compiler flag: CF_NOIRQ to suppress
this check in the generated code so we know we will definitely execute
the next block.

Signed-off-by: Alex Bennée 
Cc: Pavel Dovgalyuk 
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/245

3 files changed, 22 insertions(+), 3 deletions(-)
include/exec/exec-all.h   |  1 +
include/exec/gen-icount.h | 19 ---
accel/tcg/cpu-exec.c  |  5 +

modified   include/exec/exec-all.h
@@ -503,6 +503,7 @@ struct TranslationBlock {
  #define CF_USE_ICOUNT0x0002
  #define CF_INVALID   0x0004 /* TB is stale. Set with @jmp_lock held */
  #define CF_PARALLEL  0x0008 /* Generate code for a parallel context */
+#define CF_NOIRQ 0x0010 /* Generate an uninterruptible TB */
  #define CF_CLUSTER_MASK  0xff00 /* Top 8 bits are cluster ID */
  #define CF_CLUSTER_SHIFT 24
  
modified   include/exec/gen-icount.h

@@ -21,7 +21,6 @@ static inline void gen_tb_start(const TranslationBlock *tb)
  {
  TCGv_i32 count;
  
-tcg_ctx->exitreq_label = gen_new_label();

  if (tb_cflags(tb) & CF_USE_ICOUNT) {
  count = tcg_temp_local_new_i32();
  } else {
@@ -42,7 +41,19 @@ static inline void gen_tb_start(const TranslationBlock *tb)
  icount_start_insn = tcg_last_op();
  }
  
-tcg_gen_brcondi_i32(TCG_COND_LT, count, 0, tcg_ctx->exitreq_label);

+/*
+ * Emit the check against icount_decr.u32 to see if we should exit
+ * unless we suppress the check with CF_NOIRQ. If we are using
+ * icount and have suppressed interruption the higher level code
+ * should have ensured we don't run more instructions than the
+ * budget.
+ */
+if (tb_cflags(tb) & CF_NOIRQ) {
+tcg_ctx->exitreq_label = NULL;
+} else {
+tcg_ctx->exitreq_label = gen_new_label();
+tcg_gen_brcondi_i32(TCG_COND_LT, count, 0, tcg_ctx->exitreq_label);
+}
  
  if (tb_cflags(tb) & CF_USE_ICOUNT) {

  tcg_gen_st16_i32(count, cpu_env,
@@ -74,7 +85,9 @@ static inline void gen_tb_end(const TranslationBlock *tb, int 
num_insns)
 tcgv_i32_arg(tcg_constant_i32(num_insns)));
  }
  
-gen_set_label(tcg_ctx->exitreq_label);

+if (tcg_ctx->exitreq_label) {
+gen_set_label(tcg_ctx->exitreq_label);
+}
  tcg_gen_exit_tb(tb, TB_EXIT_REQUESTED);
  }
  
modified   accel/tcg/cpu-exec.c

@@ -954,11 +954,16 @@ int cpu_exec(CPUState *cpu)
   * after-access watchpoints.  Since this request should never
   * have CF_INVALID set, -1 is a convenient invalid value that
   * does not require tcg headers for cpu_common_reset.
+ *
+ * As we don't want this special TB being interrupted by
+ * some sort of asynchronous event we apply CF_NOIRQ to
+ * disable the usual event checking.
   */
  cflags = cpu->cflags_next_tb;
  if (cflags == -1) {
  cflags = curr_cflags(cpu);
  } else {
+cflags |= CF_NOIRQ;
  cpu->cflags_next_tb = -1;
  }
  
--8<---cut here---end--->8---




Signed-off-by: Pavel Dovgalyuk 
---
  accel/tcg/cpu-exec.c |   10 ++
  1 file changed, 10 insertions(+)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 2d14d02f6c..df12452b8f 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -846,6 +846,16 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, 
TranslationBlock *tb,
   * cpu_handle_interrupt.  cpu_handle_interrupt will also
   * clear cpu->icount_decr.u16.high.
   */
+if (cpu->cflags_next_tb == -1
+&& (!use_icount || !(tb->cflags & CF_USE_ICOUNT)
+|| cpu_neg(cpu)->icount_decr.u16.low >= tb->icount)) {
+/*
+ * icount is disabled or there are enough instructions
+ * in the budget, do not retranslate this block with
+   

Re: [PATCH v1 0/9] hw/mos6522: VIA timer emulation fixes and improvements

2021-11-18 Thread Mark Cave-Ayland

On 17/11/2021 03:03, Finn Thain wrote:


On Fri, 24 Sep 2021, I wrote:


This is a patch series for QEMU that I started last year. The aim was to
try to get a monotonic clocksource for Linux/m68k guests. That hasn't
been achieved yet (for q800 machines). I'm submitting the patch series
because,

  - it improves 6522 emulation fidelity, although slightly slower, and



I did some more benchmarking to examine the performance implications.

I measured a performance improvement with this patch series. For a
Linux/m68k guest, the execution time for a gettimeofday syscall dropped
from 9 us to 5 us. (This is a fairly common syscall.)

The host CPU time consumed by qemu in starting the guest kernel and
executing a benchmark involving 20 million gettimeofday calls was as
follows.

qemu-system-m68k mainline (42f6c9179b):
 real 198 s
 sys  123 s
 user 73 s
 sys/user 1.68

qemu-system-m68k patched (0a0bca4711):
 real 112 s
 sys  63 s
 user 47 s
 sys/user 1.34

As with any microbenchmark, this workload is not a real-world one. For
comparison, here are measurements of the time to fully startup and
immediately shut down Debian Linux/m68k SID (systemd):

qemu-system-m68k mainline (42f6c9179b)
 real 31.5 s
 sys  1.59 s
 user 27.4 s
 sys/user 0.06

qemu-system-m68k patched (0a0bca4711)
 real 31.2 s
 sys  1.17 s
 user 27.6 s
 sys/user 0.04

The decrease in sys/user ratio reflects the small cost that has to be paid
for the improvement in 6522 emulation fidelity and timer accuracy. But
note that in both benchmarks wallclock execution time dropped, meaning
that the system is faster overall.

The gettimeofday testing revealed that the Linux kernel does not properly
protect userland from pathological hardware timers, and the gettimeofday
result was seen to jump backwards (that was unexpected, though Mark did
predict it).

This backwards jump was often observed in the mainline build during the
gettimeofday benchmark and is result of bugs in mos6522.c. The patched
build did not exhibit this problem (as yet).

The two benefits described here are offered in addition to all of the
other benefits described in the patches themselves. Please consider
merging this series.


Hi Finn,

I've not forgotten about this series - we're now in 6.2 freeze, but it's on my TODO 
list to revisit in the next development cycle this along with the ESP stress-ng 
changes which I've also been looking at. As mentioned in my previous reviews the 
patch will need some further analysis: particularly the logic in mos6522_read() that 
can generate a spurious interrupt on a register read needs to be removed, and also 
testing is required to ensure that these changes don't affect the CUDA clock warping 
which allows OS X to boot under qemu-system-ppc.


I'm confident that qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) is monotonic, since if it 
were not then there would be huge numbers of complaints from QEMU users. It appears 
that Linux can potentially alter the ticks in mac_read_clk() at 
https://github.com/torvalds/linux/blob/master/arch/m68k/mac/via.c#L624 which suggests 
the issue is related to timer wraparound. I'd like to confirm exactly which part of 
your series fixes the specific problem of the clock jumping backwards before merging 
these changes.


I realized that I could easily cross-compile a 5.14 kernel to test this theory with 
the test root image and .config you supplied at 
https://gitlab.com/qemu-project/qemu/-/issues/611 using the QEMU docker-m68k-cross 
image to avoid having to build a complete toolchain by hand. The kernel builds 
successfully using this method, but during boot it hangs sending the first SCSI CDB 
to the ESP device, failing to send the last byte using PDMA.


Are there known issues cross-compiling an m68k kernel on an x86 host? Or are your 
current kernels being built from a separate branch outside of mainline Linux git?



ATB,

Mark.



Re: [PATCH v2] hw/arm/virt: Expose empty NUMA nodes through ACPI

2021-11-18 Thread Jonathan Cameron
On Thu, 18 Nov 2021 12:06:27 +0100
David Hildenbrand  wrote:

> On 18.11.21 11:28, Jonathan Cameron wrote:
> > On Wed, 17 Nov 2021 19:08:28 +0100
> > David Hildenbrand  wrote:
> >   
> >> On 17.11.21 15:30, Jonathan Cameron wrote:  
> >>> On Tue, 16 Nov 2021 12:11:29 +0100
> >>> David Hildenbrand  wrote:
> >>> 
> >>
> >> Examples include exposing HBM or PMEM to the VM. Just like on real HW,
> >> this memory is exposed via cpu-less, special nodes. In contrast to real
> >> HW, the memory is hotplugged later (I don't think HW supports hotplug
> >> like that yet, but it might just be a matter of time).  
> >
> > I suppose some of that maybe covered by GENERIC_AFFINITY entries in SRAT
> > some by MEMORY entries. Or nodes created dynamically like with normal
> > hotplug memory.
> >   
> >>> 
> >>
> >> Hi Jonathan,
> >>  
> >>> The naming of the define is unhelpful.  GENERIC_AFFINITY here corresponds
> >>> to Generic Initiator Affinity.  So no good for memory. This is meant for
> >>> representation of accelerators / network cards etc so you can get the NUMA
> >>> characteristics for them accessing Memory in other nodes.
> >>>
> >>> My understanding of 'traditional' memory hotplug is that typically the
> >>> PA into which memory is hotplugged is known at boot time whether or not
> >>> the memory is physically present.  As such, you present that in SRAT and 
> >>> rely
> >>> on the EFI memory map / other information sources to know the memory isn't
> >>> there.  When it is hotplugged later the address is looked up in SRAT to 
> >>> identify
> >>> the NUMA node.
> >>
> >> in virtualized environments we use the SRAT only to indicate the 
> >> hotpluggable
> >> region (-> indicate maximum possible PFN to the guest OS), the actual 
> >> present
> >> memory+PXM assignment is not done via SRAT. We differ quite a lot here from
> >> actual hardware I think.
> >>  
> >>>
> >>> That model is less useful for more flexible entities like virtio-mem or
> >>> indeed physical hardware such as CXL type 3 memory devices which typically
> >>> need their own nodes.
> >>>
> >>> For the CXL type 3 option, currently proposal is to use the CXL table 
> >>> entries
> >>> representing Physical Address space regions to work out how many NUMA 
> >>> nodes
> >>> are needed and just create extra ones at boot.
> >>> https://lore.kernel.org/linux-cxl/163553711933.2509508.2203471175679990.st...@dwillia2-desk3.amr.corp.intel.com
> >>>
> >>> It's a heuristic as we might need more nodes to represent things well 
> >>> kernel
> >>> side, but it's better than nothing and less effort that true dynamic node 
> >>> creation.
> >>> If you chase through the earlier versions of Alison's patch you will find 
> >>> some
> >>> discussion of that.
> >>>
> >>> I wonder if virtio-mem should just grow a CDAT instance via a DOE?
> >>>
> >>> That would make all this stuff discoverable via PCI config space rather 
> >>> than ACPI
> >>> CDAT is at:
> >>> https://uefi.org/sites/default/files/resources/Coherent%20Device%20Attribute%20Table_1.01.pdf
> >>> but the table access protocol over PCI DOE is currently in the CXL 2.0 
> >>> spec
> >>> (nothing stops others using it though AFAIK).
> >>>
> >>> However, then we'd actually need either dynamic node creation in the OS, 
> >>> or
> >>> some sort of reserved pool of extra nodes.  Long term it may be the most
> >>> flexible option.
> >>
> >>
> >> I think for virtio-mem it's actually a bit simpler:
> >>
> >> a) The user defined on the QEMU cmdline an empty node
> >> b) The user assigned a virtio-mem device to a node, either when 
> >>coldplugging or hotplugging the device.
> >>
> >> So we don't actually "hotplug" a new node, the (possible) node is already 
> >> known
> >> to QEMU right when starting up. It's just a matter of exposing that fact 
> >> to the
> >> guest OS -- similar to how we expose the maximum possible PFN to the guest 
> >> OS.
> >> It's seems to boild down to an ACPI limitation.
> >>
> >> Conceptually, virtio-mem on an empty node in QEMU is not that different 
> >> from
> >> hot/coldplugging a CPU to an empty node or hot/coldplugging a DIMM/NVDIMM 
> >> to
> >> an empty node. But I guess it all just doesn't work with QEMU as of now.  
> > 
> > A side distraction perhaps, but there is a code first acpi proposal to add
> > a 'softer' form of CPU hotplug 
> > https://bugzilla.tianocore.org/show_bug.cgi?id=3706
> > 
> > Whilst the reason for that proposal was for arm64 systems where there is no 
> > architected
> > physical hotplug, it might partly solve the empty node question for CPUs.  
> > They won't
> > be empty, there will simply be CPUs that are marked as 'Online capable'.
> >   
> >>
> >> In current x86-64 code, we define the "hotpluggable region" in 
> >> hw/i386/acpi-build.c via
> >>
> >>build_srat_memory(table_data, machine->device_memory->base,
> >>  hotpluggable_address_space_size, nb_numa_nodes - 1,
> >>  

Re: [PATCH for-6.2 1/2] esp: ensure that async_len is reset to 0 during esp_hard_reset()

2021-11-18 Thread Philippe Mathieu-Daudé
On 11/18/21 11:03, Mark Cave-Ayland wrote:
> If a reset command is sent after data has been transferred into the SCSI 
> buffer
> ensure that async_len is reset to 0. Otherwise a subsequent TI command assumes
> the SCSI buffer contains data to be transferred to the device causing it to
> dereference the stale async_buf pointer.
> 
> Signed-off-by: Mark Cave-Ayland 
> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/724
> ---
>  hw/scsi/esp.c | 1 +
>  1 file changed, 1 insertion(+)

Reviewed-by: Philippe Mathieu-Daudé 





Re: [PATCH-for-6.2 v2 1/2] hw/block/fdc: Kludge missing floppy drive to fix CVE-2021-20196

2021-11-18 Thread Philippe Mathieu-Daudé
On 11/18/21 11:44, Darren Kenny wrote:
> Hi Philippe,
> 
> A small nit below, but otherwise looks good.
> 
> On Thursday, 2021-11-18 at 00:24:21 +01, Philippe Mathieu-Daudé wrote:
>> Guest might select another drive on the bus by setting the
>> DRIVE_SEL bit of the DIGITAL OUTPUT REGISTER (DOR).
>> The current controller model doesn't expect a BlockBackend
>> to be NULL. A simple way to fix CVE-2021-20196 is to create
>> an empty BlockBackend when it is missing. All further
>> accesses will be safely handled, and the controller state
>> machines keep behaving correctly.
>>
>> Fixes: CVE-2021-20196
>> Reported-by: Gaoning Pan (Ant Security Light-Year Lab) 
>> BugLink: https://bugs.launchpad.net/qemu/+bug/1912780
>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/338
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>>  hw/block/fdc.c | 14 +-
>>  1 file changed, 13 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/block/fdc.c b/hw/block/fdc.c
>> index fa933cd3263..eab17e946d6 100644
>> --- a/hw/block/fdc.c
>> +++ b/hw/block/fdc.c
>> @@ -1161,7 +1161,19 @@ static FDrive *get_drv(FDCtrl *fdctrl, int unit)
>>  
>>  static FDrive *get_cur_drv(FDCtrl *fdctrl)
>>  {
>> -return get_drv(fdctrl, fdctrl->cur_drv);
>> +FDrive *cur_drv = get_drv(fdctrl, fdctrl->cur_drv);
>> +
>> +if (!cur_drv->blk) {
>> +/*
>> + * Kludge: empty drive line selected. Create an anonymous
>> + * BlockBackend to avoid NULL deref with various BlockBackend
>> + * API calls within this model (CVE-2021-20196).
>> + * Due to the controller QOM model limitations, we don't
>> + * attach the created to the controller.
>>
> The created  to the controller
> 
> Something is missing here, maybe 'device'?

OK. I forgot to add Cc: qemu-sta...@nongnu.org so will respin.

Thanks,

Phil.




[PATCH-for-6.2 0/2] hw/block/fdc: Fix CVE-2021-3507

2021-11-18 Thread Philippe Mathieu-Daudé
Trivial fix for CVE-2021-3507.

Philippe Mathieu-Daudé (2):
  hw/block/fdc: Prevent end-of-track overrun (CVE-2021-3507)
  tests/qtest/fdc-test: Add a regression test for CVE-2021-3507

 hw/block/fdc.c |  8 
 tests/qtest/fdc-test.c | 20 
 2 files changed, 28 insertions(+)

-- 
2.31.1





[PATCH-for-6.2 1/2] hw/block/fdc: Prevent end-of-track overrun (CVE-2021-3507)

2021-11-18 Thread Philippe Mathieu-Daudé
Per the 82078 datasheet, if the end-of-track (EOT byte in
the FIFO) is more than the number of sectors per side, the
command is terminated unsuccessfully:

* 5.2.5 DATA TRANSFER TERMINATION

  The 82078 supports terminal count explicitly through
  the TC pin and implicitly through the underrun/over-
  run and end-of-track (EOT) functions. For full sector
  transfers, the EOT parameter can define the last
  sector to be transferred in a single or multisector
  transfer. If the last sector to be transferred is a par-
  tial sector, the host can stop transferring the data in
  mid-sector, and the 82078 will continue to complete
  the sector as if a hardware TC was received. The
  only difference between these implicit functions and
  TC is that they return "abnormal termination" result
  status. Such status indications can be ignored if they
  were expected.

* 6.1.3 READ TRACK

  This command terminates when the EOT specified
  number of sectors have been read. If the 82078
  does not find an I D Address Mark on the diskette
  after the second· occurrence of a pulse on the
  INDX# pin, then it sets the IC code in Status Regis-
  ter 0 to "01" (Abnormal termination), sets the MA bit
  in Status Register 1 to "1", and terminates the com-
  mand.

* 6.1.6 VERIFY

  Refer to Table 6-6 and Table 6-7 for information
  concerning the values of MT and EC versus SC and
  EOT value.

* Table 6·6. Result Phase Table

* Table 6-7. Verify Command Result Phase Table

Fix by aborting the transfer when EOT > # Sectors Per Side.

Cc: qemu-sta...@nongnu.org
Cc: Hervé Poussineau 
Fixes: baca51faff0 ("floppy driver: disk geometry auto detect")
Reported-by: Alexander Bulekov 
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/339
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/block/fdc.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index fa933cd3263..d21b717b7d6 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -1512,6 +1512,14 @@ static void fdctrl_start_transfer(FDCtrl *fdctrl, int 
direction)
 int tmp;
 fdctrl->data_len = 128 << (fdctrl->fifo[5] > 7 ? 7 : fdctrl->fifo[5]);
 tmp = (fdctrl->fifo[6] - ks + 1);
+if (tmp < 0) {
+FLOPPY_DPRINTF("invalid EOT: %d\n", tmp);
+fdctrl_stop_transfer(fdctrl, FD_SR0_ABNTERM, FD_SR1_MA, 0x00);
+fdctrl->fifo[3] = kt;
+fdctrl->fifo[4] = kh;
+fdctrl->fifo[5] = ks;
+return;
+}
 if (fdctrl->fifo[0] & 0x80)
 tmp += fdctrl->fifo[6];
 fdctrl->data_len *= tmp;
-- 
2.31.1




[PATCH-for-6.2 2/2] tests/qtest/fdc-test: Add a regression test for CVE-2021-3507

2021-11-18 Thread Philippe Mathieu-Daudé
Add the reproducer from https://gitlab.com/qemu-project/qemu/-/issues/339

Without the previous commit, when running 'make check-qtest-i386'
with QEMU configured with '--enable-sanitizers' we get:

  ==4028352==ERROR: AddressSanitizer: heap-buffer-overflow on address 
0x61962a00 at pc 0x5626d03c491a bp 0x7ffdb4199410 sp 0x7ffdb4198bc0
  READ of size 786432 at 0x61962a00 thread T0
  #0 0x5626d03c4919 in __asan_memcpy (qemu-system-i386+0x1e65919)
  #1 0x5626d1c023cc in flatview_write_continue softmmu/physmem.c:2787:13
  #2 0x5626d1bf0c0f in flatview_write softmmu/physmem.c:2822:14
  #3 0x5626d1bf0798 in address_space_write softmmu/physmem.c:2914:18
  #4 0x5626d1bf0f37 in address_space_rw softmmu/physmem.c:2924:16
  #5 0x5626d1bf14c8 in cpu_physical_memory_rw softmmu/physmem.c:2933:5
  #6 0x5626d0bd5649 in cpu_physical_memory_write 
include/exec/cpu-common.h:82:5
  #7 0x5626d0bd0a07 in i8257_dma_write_memory hw/dma/i8257.c:452:9
  #8 0x5626d09f825d in fdctrl_transfer_handler hw/block/fdc.c:1616:13
  #9 0x5626d0a048b4 in fdctrl_start_transfer hw/block/fdc.c:1539:13
  #10 0x5626d09f4c3e in fdctrl_write_data hw/block/fdc.c:2266:13
  #11 0x5626d09f22f7 in fdctrl_write hw/block/fdc.c:829:9
  #12 0x5626d1c20bc5 in portio_write softmmu/ioport.c:207:17

  0x61962a00 is located 0 bytes to the right of 512-byte region 
[0x61962800,0x61962a00)
  allocated by thread T0 here:
  #0 0x5626d03c66ec in posix_memalign (qemu-system-i386+0x1e676ec)
  #1 0x5626d2b988d4 in qemu_try_memalign util/oslib-posix.c:210:11
  #2 0x5626d2b98b0c in qemu_memalign util/oslib-posix.c:226:27
  #3 0x5626d09fbaf0 in fdctrl_realize_common hw/block/fdc.c:2341:20
  #4 0x5626d0a150ed in isabus_fdc_realize hw/block/fdc-isa.c:113:5
  #5 0x5626d2367935 in device_set_realized hw/core/qdev.c:531:13

  SUMMARY: AddressSanitizer: heap-buffer-overflow (qemu-system-i386+0x1e65919) 
in __asan_memcpy
  Shadow bytes around the buggy address:
0x0c32800044f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c3280004500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c3280004510: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c3280004520: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c3280004530: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  =>0x0c3280004540:[fa]fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c3280004550: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c3280004560: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c3280004570: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c3280004580: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c3280004590: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable:   00
Heap left redzone:   fa
Freed heap region:   fd
  ==4028352==ABORTING

Reported-by: Alexander Bulekov 
Signed-off-by: Philippe Mathieu-Daudé 
---
 tests/qtest/fdc-test.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/tests/qtest/fdc-test.c b/tests/qtest/fdc-test.c
index 26b69f7c5cd..f164d972d10 100644
--- a/tests/qtest/fdc-test.c
+++ b/tests/qtest/fdc-test.c
@@ -546,6 +546,25 @@ static void fuzz_registers(void)
 }
 }
 
+static void test_cve_2021_3507(void)
+{
+QTestState *s;
+
+s = qtest_initf("-nographic -m 32M -nodefaults "
+"-drive file=%s,format=raw,if=floppy", test_image);
+qtest_outl(s, 0x9, 0x0a0206);
+qtest_outw(s, 0x3f4, 0x1600);
+qtest_outw(s, 0x3f4, 0x);
+qtest_outw(s, 0x3f4, 0x);
+qtest_outw(s, 0x3f4, 0x);
+qtest_outw(s, 0x3f4, 0x0200);
+qtest_outw(s, 0x3f4, 0x0200);
+qtest_outw(s, 0x3f4, 0x);
+qtest_outw(s, 0x3f4, 0x);
+qtest_outw(s, 0x3f4, 0x);
+qtest_quit(s);
+}
+
 int main(int argc, char **argv)
 {
 int fd;
@@ -576,6 +595,7 @@ int main(int argc, char **argv)
 qtest_add_func("/fdc/read_no_dma_18", test_read_no_dma_18);
 qtest_add_func("/fdc/read_no_dma_19", test_read_no_dma_19);
 qtest_add_func("/fdc/fuzz-registers", fuzz_registers);
+qtest_add_func("/fdc/fuzz/cve_2021_3507", test_cve_2021_3507);
 
 ret = g_test_run();
 
-- 
2.31.1




Re: [PATCH v2 1/2] qemu-options: define -spice only #ifdef CONFIG_SPICE

2021-11-18 Thread Markus Armbruster
marcandre.lur...@redhat.com writes:

> From: Marc-André Lureau 
>
> Fixes:
> https://bugzilla.redhat.com/show_bug.cgi?id=1982600
>
> Signed-off-by: Marc-André Lureau 
> ---
>  softmmu/vl.c| 2 ++
>  qemu-options.hx | 2 ++
>  2 files changed, 4 insertions(+)
>
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index 1159a64bce4e..385465fbeb6d 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -3538,6 +3538,7 @@ void qemu_init(int argc, char **argv, char **envp)
>  case QEMU_OPTION_readconfig:
>  qemu_read_config_file(optarg, qemu_parse_config_group, 
> &error_fatal);
>  break;
> +#ifdef CONFIG_SPICE
>  case QEMU_OPTION_spice:
>  olist = qemu_find_opts_err("spice", NULL);
>  if (!olist) {
   error_report("spice support is disabled");
   exit(1);
   }

Is this error still reachable?

> @@ -3550,6 +3551,7 @@ void qemu_init(int argc, char **argv, char **envp)
>  }
>  display_remote++;
>  break;
> +#endif
>  case QEMU_OPTION_writeconfig:
>  {
>  FILE *fp;
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 7749f59300b5..323913945a5d 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -2017,6 +2017,7 @@ SRST
>  Enable SDL.
>  ERST
>  
> +#ifdef CONFIG_SPICE
>  DEF("spice", HAS_ARG, QEMU_OPTION_spice,
>  "-spice [port=port][,tls-port=secured-port][,x509-dir=]\n"
>  "   [,x509-key-file=][,x509-key-password=]\n"
> @@ -2038,6 +2039,7 @@ DEF("spice", HAS_ARG, QEMU_OPTION_spice,
>  "   enable spice\n"
>  "   at least one of {port, tls-port} is mandatory\n",
>  QEMU_ARCH_ALL)
> +#endif
>  SRST
>  ``-spice option[,option[,...]]``
>  Enable the spice remote desktop protocol. Valid options are




[PATCH-for-6.2 v3 0/2] hw/block/fdc: Fix CVE-2021-20196

2021-11-18 Thread Philippe Mathieu-Daudé
I'm not sure what happened to v1 from Prasad, so since we are
at rc2 I took a simpler approach to fix this CVE: create an
empty drive to satisfy the BlockBackend API calls.

Added Alexander's reproducer along.

Since v2:
- Reword comment (Darren)
- Add Darren R-b tag

v2: 
https://lore.kernel.org/qemu-devel/2027232422.1026411-1-phi...@redhat.com/
v1: 
https://lore.kernel.org/qemu-devel/20210123100345.642933-1-ppan...@redhat.com/
Based-on: <2028115733.4038610-1-phi...@redhat.com>

Alexander Bulekov (1):
  tests/qtest/fdc-test: Add a regression test for CVE-2021-20196

Philippe Mathieu-Daudé (1):
  hw/block/fdc: Kludge missing floppy drive to fix CVE-2021-20196

 hw/block/fdc.c | 14 +-
 tests/qtest/fdc-test.c | 21 +
 2 files changed, 34 insertions(+), 1 deletion(-)

-- 
2.31.1





[PATCH-for-6.2 v3 1/2] hw/block/fdc: Kludge missing floppy drive to fix CVE-2021-20196

2021-11-18 Thread Philippe Mathieu-Daudé
Guest might select another drive on the bus by setting the
DRIVE_SEL bit of the DIGITAL OUTPUT REGISTER (DOR).
The current controller model doesn't expect a BlockBackend
to be NULL. A simple way to fix CVE-2021-20196 is to create
an empty BlockBackend when it is missing. All further
accesses will be safely handled, and the controller state
machines keep behaving correctly.

Cc: qemu-sta...@nongnu.org
Fixes: CVE-2021-20196
Reported-by: Gaoning Pan (Ant Security Light-Year Lab) 
BugLink: https://bugs.launchpad.net/qemu/+bug/1912780
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/338
Reviewed-by: Darren Kenny 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/block/fdc.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index d21b717b7d6..6f94b6a6daa 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -1161,7 +1161,19 @@ static FDrive *get_drv(FDCtrl *fdctrl, int unit)
 
 static FDrive *get_cur_drv(FDCtrl *fdctrl)
 {
-return get_drv(fdctrl, fdctrl->cur_drv);
+FDrive *cur_drv = get_drv(fdctrl, fdctrl->cur_drv);
+
+if (!cur_drv->blk) {
+/*
+ * Kludge: empty drive line selected. Create an anonymous
+ * BlockBackend to avoid NULL deref with various BlockBackend
+ * API calls within this model (CVE-2021-20196).
+ * Due to the controller QOM model limitations, we don't
+ * attach the created to the controller device.
+ */
+cur_drv->blk = blk_new(qemu_get_aio_context(), 0, BLK_PERM_ALL);
+}
+return cur_drv;
 }
 
 /* Status A register : 0x00 (read-only) */
-- 
2.31.1




[PATCH-for-6.2 v3 2/2] tests/qtest/fdc-test: Add a regression test for CVE-2021-20196

2021-11-18 Thread Philippe Mathieu-Daudé
From: Alexander Bulekov 

Without the previous commit, when running 'make check-qtest-i386'
with QEMU configured with '--enable-sanitizers' we get:

  AddressSanitizer:DEADLYSIGNAL
  =
  ==287878==ERROR: AddressSanitizer: SEGV on unknown address 0x0344
  ==287878==The signal is caused by a WRITE memory access.
  ==287878==Hint: address points to the zero page.
  #0 0x564b2e5bac27 in blk_inc_in_flight block/block-backend.c:1346:5
  #1 0x564b2e5bb228 in blk_pwritev_part block/block-backend.c:1317:5
  #2 0x564b2e5bcd57 in blk_pwrite block/block-backend.c:1498:11
  #3 0x564b2ca1cdd3 in fdctrl_write_data hw/block/fdc.c:2221:17
  #4 0x564b2ca1b2f7 in fdctrl_write hw/block/fdc.c:829:9
  #5 0x564b2dc49503 in portio_write softmmu/ioport.c:201:9

Add the reproducer for CVE-2021-20196.

Signed-off-by: Alexander Bulekov 
Message-Id: <20210319050906.14875-2-alx...@bu.edu>
[PMD: Rebased, use global test_image]
Reviewed-by: Darren Kenny 
Signed-off-by: Philippe Mathieu-Daudé 
---
 tests/qtest/fdc-test.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/tests/qtest/fdc-test.c b/tests/qtest/fdc-test.c
index f164d972d10..0f8b9b753f4 100644
--- a/tests/qtest/fdc-test.c
+++ b/tests/qtest/fdc-test.c
@@ -565,6 +565,26 @@ static void test_cve_2021_3507(void)
 qtest_quit(s);
 }
 
+static void test_cve_2021_20196(void)
+{
+QTestState *s;
+
+s = qtest_initf("-nographic -m 32M -nodefaults "
+"-drive file=%s,format=raw,if=floppy", test_image);
+qtest_outw(s, 0x3f2, 0x0004);
+qtest_outw(s, 0x3f4, 0x0200);
+qtest_outw(s, 0x3f4, 0x);
+qtest_outw(s, 0x3f4, 0x);
+qtest_outw(s, 0x3f4, 0x);
+qtest_outw(s, 0x3f4, 0x);
+qtest_outw(s, 0x3f4, 0x);
+qtest_outw(s, 0x3f4, 0x);
+qtest_outw(s, 0x3f4, 0x);
+qtest_outw(s, 0x3f4, 0x);
+qtest_outw(s, 0x3f2, 0x0001);
+qtest_quit(s);
+}
+
 int main(int argc, char **argv)
 {
 int fd;
@@ -596,6 +616,7 @@ int main(int argc, char **argv)
 qtest_add_func("/fdc/read_no_dma_19", test_read_no_dma_19);
 qtest_add_func("/fdc/fuzz-registers", fuzz_registers);
 qtest_add_func("/fdc/fuzz/cve_2021_3507", test_cve_2021_3507);
+qtest_add_func("/fdc/fuzz/cve_2021_20196", test_cve_2021_20196);
 
 ret = g_test_run();
 
-- 
2.31.1




Re: [PATCH-for-6.2? 1/3] docs/devel/style: Improve GLib functions rST rendering

2021-11-18 Thread Philippe Mathieu-Daudé
On 11/18/21 11:58, Darren Kenny wrote:
> Hi Philippe,
> 
> There are some inconsistencies in the use of '()' when referring to
> functions or macros below...

Daniel, if you agree with Darren comments I can respin addressing them.

> On Tuesday, 2021-11-16 at 16:13:15 +01, Philippe Mathieu-Daudé wrote:
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>>  docs/devel/style.rst | 31 ---
>>  1 file changed, 16 insertions(+), 15 deletions(-)
>>
>> diff --git a/docs/devel/style.rst b/docs/devel/style.rst
>> index 260e3263fa0..415a6b9d700 100644
>> --- a/docs/devel/style.rst
>> +++ b/docs/devel/style.rst
>> @@ -413,13 +413,14 @@ multiple exist paths you can also improve the 
>> readability of the code
>>  by using ``g_autofree`` and related annotations. See :ref:`autofree-ref`
>>  for more details.
>>  
>> -Calling ``g_malloc`` with a zero size is valid and will return NULL.
>> +Calling ``g_malloc`` with a zero size is valid and will return ``NULL``.
>>
> 
> g_malloc() ?
> 
>>  
>>  Prefer ``g_new(T, n)`` instead of ``g_malloc(sizeof(T) * n)`` for the 
>> following
>>  reasons:
>>  
>> -* It catches multiplication overflowing size_t;
>> -* It returns T ``*`` instead of void ``*``, letting compiler catch more 
>> type errors.
>> +* It catches multiplication overflowing ``size_t``;
>> +* It returns ``T *`` instead of ``void *``, letting compiler catch more type
>> +  errors.
>>  
>>  Declarations like
>>  
>> @@ -444,14 +445,14 @@ use this similar function when possible, but note its 
>> different signature:
>>  
>>  void pstrcpy(char *dest, int dest_buf_size, const char *src)
>>  
>> -Don't use strcat because it can't check for buffer overflows, but:
>> +Don't use ``strcat`` because it can't check for buffer overflows, but:
>>
> 
> strcat() ?
> 
>>  
>>  .. code-block:: c
>>  
>>  char *pstrcat(char *buf, int buf_size, const char *s)
>>  
>> -The same limitation exists with sprintf and vsprintf, so use snprintf and
>> -vsnprintf.
>> +The same limitation exists with ``sprintf`` and ``vsprintf``, so use
> 
> sprintf() and vsprintf()?
> 
>> +``snprintf`` and ``vsnprintf``.
>>
> 
> snprintf() and vsnprintf()?
> 
>>  
>>  QEMU provides other useful string functions:
>>  
>> @@ -464,8 +465,8 @@ QEMU provides other useful string functions:
>>  There are also replacement character processing macros for isxyz and toxyz,
>>  so instead of e.g. isalnum you should use qemu_isalnum.
>>
> 
> Should this be isalnum() and qemu_isalnum()?
> 
>>  
>> -Because of the memory management rules, you must use g_strdup/g_strndup
>> -instead of plain strdup/strndup.
>> +Because of the memory management rules, you must use ``g_strdup/g_strndup``
>>
> 
> Wonder should this be ``g_strdup()``/``g_strndup()``
> 
>> +instead of plain ``strdup/strndup``.
>>
> 
> And ``strdup()``/``strndup()``
> 
>>  
>>  Printf-style functions
>>  ==
>> @@ -524,10 +525,10 @@ automatic cleanup:
>>  
>>  Most notably:
>>  
>> -* g_autofree - will invoke g_free() on the variable going out of scope
>> +* ``g_autofree`` - will invoke ``g_free()`` on the variable going out of 
>> scope
>>
> 
> g_autofree() ?
> 
>>  
>> -* g_autoptr - for structs / objects, will invoke the cleanup func created
>> -  by a previous use of G_DEFINE_AUTOPTR_CLEANUP_FUNC. This is
>> +* ``g_autoptr`` - for structs / objects, will invoke the cleanup func 
>> created
>>
> 
> g_autoptr() ?
> 
>> +  by a previous use of ``G_DEFINE_AUTOPTR_CLEANUP_FUNC``. This is
>>supported for most GLib data types and GObjects
>>  
>>  For example, instead of
>> @@ -551,7 +552,7 @@ For example, instead of
>>  return ret;
>>  }
>>  
>> -Using g_autofree/g_autoptr enables the code to be written as:
>> +Using ``g_autofree/g_autoptr`` enables the code to be written as:
>>
> 
> ``g_autofree()``/``g_autoptr()`` ?
> 
>>  
>>  .. code-block:: c
>>  
>> @@ -569,13 +570,13 @@ Using g_autofree/g_autoptr enables the code to be 
>> written as:
>>  While this generally results in simpler, less leak-prone code, there
>>  are still some caveats to beware of
>>  
>> -* Variables declared with g_auto* MUST always be initialized,
>> +* Variables declared with ``g_auto*`` MUST always be initialized,
>>
> 
> g_auto*() ?
> 
>>otherwise the cleanup function will use uninitialized stack memory
>>  
>> -* If a variable declared with g_auto* holds a value which must
>> +* If a variable declared with ``g_auto*`` holds a value which must
>>
> 
> g_auto*() ?
> 
>>live beyond the life of the function, that value must be saved
>>and the original variable NULL'd out. This can be simpler using
>> -  g_steal_pointer
>> +  ``g_steal_pointer``
>>
> 
> g_steal_pointer() ?
> 
> Thanks,
> 
> Darren.
> 




Re: [PATCH v3 0/6] SEV: add kernel-hashes=on for measured -kernel launch

2021-11-18 Thread Dov Murik
Pinging again -- Daniel said this should be added to 6.2.

Is there anything I should do?

Thanks,
-Dov

On 14/11/2021 20:02, Dov Murik wrote:
> Paolo,
> 
> Can you please add this series (already reviewed) to the fixes in 6.2?
> 
> Thanks,
> -Dov
> 
> 
> On 11/11/2021 12:00, Dov Murik wrote:
>> Tom Lendacky and Brijesh Singh reported two issues with launching SEV
>> guests with the -kernel QEMU option when an old [1] or wrongly configured [2]
>> OVMF images are used.
>>
>> To fix these issues, these series "hides" the whole kernel hashes
>> additions behind a kernel-hashes=on option (with default value of
>> "off").  This allows existing scenarios to work without change, and
>> explicitly forces kernel hashes additions for guests that require that.
>>
>> Patch 1 introduces a new boolean option "kernel-hashes" on the sev-guest
>> object, and patch 2 causes QEMU to add kernel hashes only if its
>> explicitly set to "on".  This will mitigate both experienced issues
>> because the default of the new setting is off, and therefore is backward
>> compatible with older OVMF images (which don't have a designated hashes
>> table area) or with guests that don't wish to measure the kernel/initrd.
>>
>> Patch 3 fixes the wording on the error message displayed when no hashes
>> table is found in the guest firmware.
>>
>> Patch 4 detects incorrect address and length of the guest firmware
>> hashes table area and fails the boot.
>>
>> Patch 5 is a refactoring of parts of the same function
>> sev_add_kernel_loader_hashes() to calculate all padding sizes at
>> compile-time.  Patch 6 also changes the same function and replaces the
>> call to qemu_map_ram_ptr() with address_space_map() to allow for error
>> detection.  Patches 5-6 are not required to fix the issues above, but
>> are suggested as an improvement (no functional change intended).
>>
>> To enable addition of kernel/initrd/cmdline hashes into the SEV guest at
>> launch time, specify:
>>
>> qemu-system-x86_64 ... -object sev-guest,...,kernel-hashes=on
>>
>>
>> [1] 
>> https://lore.kernel.org/qemu-devel/3b9d10d9-5d9c-da52-f18c-cd93c1931...@amd.com/
>> [2] 
>> https://lore.kernel.org/qemu-devel/001dd81a-282d-c307-a657-e228480d4...@amd.com/
>>
>>
>> Changes in v3:
>>  - Patch 1/6: Add "(since 6.2)" in the documentation of the
>>kernel-hashes option (thanks Markus)
>>  - Patch 3/6: Change error string use "kernel" instead of "-kernel"
>>(thanks Daniel)
>>
>> v2: 
>> https://lore.kernel.org/qemu-devel/20211108134840.2757206-1-dovmu...@linux.ibm.com/
>> Changes in v2:
>>  - Instead of trying to figure out whether to add hashes or not,
>>explicity declare an option (kernel-hashes=on) for that.  When that
>>option is turned on, fail if the hashes cannot be added.
>>  - Rephrase error message when no hashes table GUID is found.
>>  - Replace qemu_map_ram_ptr with address_space_map
>>
>> v1: 
>> https://lore.kernel.org/qemu-devel/20211101102136.1706421-1-dovmu...@linux.ibm.com/
>>
>>
>> Dov Murik (6):
>>   qapi/qom,target/i386: sev-guest: Introduce kernel-hashes=on|off option
>>   target/i386/sev: Add kernel hashes only if sev-guest.kernel-hashes=on
>>   target/i386/sev: Rephrase error message when no hashes table in guest
>> firmware
>>   target/i386/sev: Fail when invalid hashes table area detected
>>   target/i386/sev: Perform padding calculations at compile-time
>>   target/i386/sev: Replace qemu_map_ram_ptr with address_space_map
>>
>>  qapi/qom.json |  7 -
>>  target/i386/sev.c | 77 +++
>>  qemu-options.hx   |  6 +++-
>>  3 files changed, 75 insertions(+), 15 deletions(-)
>>
>>
>> base-commit: af531756d25541a1b3b3d9a14e72e7fedd941a2e
>>



[qemu-web PATCH v2] Add Sponsors page

2021-11-18 Thread Philippe Mathieu-Daudé
Add a page listing QEMU sponsors.

For now, only mention Fosshost which requested to be listed:
https://lists.gnu.org/archive/html/qemu-devel/2021-06/msg05381.html

Cc: Thomas Markey 
Resolves: https://gitlab.com/qemu-project/qemu-web/-/issues/2
Signed-off-by: Philippe Mathieu-Daudé 
---
Since v1:
- move to footer (Daniel)
- only list sponsor who asked to be listed (Stefan)
---
 _includes/footer.html | 3 +++
 assets/css/style.css  | 6 +-
 sponsors.md   | 9 +
 3 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 sponsors.md

diff --git a/_includes/footer.html b/_includes/footer.html
index 1a0b960..b1f3f0c 100644
--- a/_includes/footer.html
+++ b/_includes/footer.html
@@ -15,6 +15,9 @@

QEMU is a member of Software Freedom Conservancy

+   
+   QEMU has sponsors
+   

Website licenses

diff --git a/assets/css/style.css b/assets/css/style.css
index 88f7e85..aede79a 100644
--- a/assets/css/style.css
+++ b/assets/css/style.css
@@ -533,7 +533,7 @@
padding-left: 1em;
}
 
-   #licenses, #conservancy, #edit-page {
+   #licenses, #conservancy, #sponsors, #edit-page {
padding: 0em;
padding-left: 1em;
padding-right: 1em;
@@ -552,6 +552,10 @@
float: left;
}
 
+   #sponsors {
+   float: left;
+   }
+
#edit-page a {
overflow: hidden;
background: url(../images/edit-page.png);
diff --git a/sponsors.md b/sponsors.md
new file mode 100644
index 000..1c097c8
--- /dev/null
+++ b/sponsors.md
@@ -0,0 +1,9 @@
+---
+title: QEMU sponsors
+permalink: /sponsors/
+---
+
+QEMU has sponsors!
+
+For continuous integration and testing, hardware is provided by:
+- [Fosshost](https://fosshost.org/)
-- 
2.31.1




Re: [PATCH-for-6.2] net: vmxnet3: validate configuration values during activate (CVE-2021-20203)

2021-11-18 Thread Philippe Mathieu-Daudé
ping?

On 10/18/21 11:09, P J P wrote:
> On Monday, 18 October, 2021, 12:20:55 pm IST, Thomas Huth  
> wrote:
> On 30/01/2021 14.16, P J P wrote:
>>> diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
>>> index eff299f629..4a910ca971 100644
>>> --- a/hw/net/vmxnet3.c
>>> +++ b/hw/net/vmxnet3.c
>>> @@ -1420,6 +1420,7 @@ static void vmxnet3_activate_device(VMXNET3State *s)
>>>       vmxnet3_setup_rx_filtering(s);
>>>       /* Cache fields from shared memory */
>>>       s->mtu = VMXNET3_READ_DRV_SHARED32(d, s->drv_shmem, devRead.misc.mtu);
>>> +    assert(VMXNET3_MIN_MTU <= s->mtu && s->mtu < VMXNET3_MAX_MTU);
>>>       VMW_CFPRN("MTU is %u", s->mtu);
>>>   
>>>       s->max_rx_frags =
>>> @@ -1473,6 +1474,9 @@ static void vmxnet3_activate_device(VMXNET3State *s)
>>>           /* Read rings memory locations for TX queues */
>>>           pa = VMXNET3_READ_TX_QUEUE_DESCR64(d, qdescr_pa, 
>>> conf.txRingBasePA);
>>>           size = VMXNET3_READ_TX_QUEUE_DESCR32(d, qdescr_pa, 
>>> conf.txRingSize);
>>> +        if (size > VMXNET3_TX_RING_MAX_SIZE) {
>>> +            size = VMXNET3_TX_RING_MAX_SIZE;
>>> +        }
>>>   
>>>           vmxnet3_ring_init(d, &s->txq_descr[i].tx_ring, pa, size,
>>>                             sizeof(struct Vmxnet3_TxDesc), false);
>>> @@ -1483,6 +1487,9 @@ static void vmxnet3_activate_device(VMXNET3State *s)
>>>           /* TXC ring */
>>>           pa = VMXNET3_READ_TX_QUEUE_DESCR64(d, qdescr_pa, 
>>> conf.compRingBasePA);
>>>           size = VMXNET3_READ_TX_QUEUE_DESCR32(d, qdescr_pa, 
>>> conf.compRingSize);
>>> +        if (size > VMXNET3_TC_RING_MAX_SIZE) {
>>> +            size = VMXNET3_TC_RING_MAX_SIZE;
>>> +        }
>>>           vmxnet3_ring_init(d, &s->txq_descr[i].comp_ring, pa, size,
>>>                             sizeof(struct Vmxnet3_TxCompDesc), true);
>>>           VMXNET3_RING_DUMP(VMW_CFPRN, "TXC", i, 
>>> &s->txq_descr[i].comp_ring);
>>> @@ -1524,6 +1531,9 @@ static void vmxnet3_activate_device(VMXNET3State *s)
>>>               /* RX rings */
>>>               pa = VMXNET3_READ_RX_QUEUE_DESCR64(d, qd_pa, 
>>> conf.rxRingBasePA[j]);
>>>               size = VMXNET3_READ_RX_QUEUE_DESCR32(d, qd_pa, 
>>> conf.rxRingSize[j]);
>>> +            if (size > VMXNET3_RX_RING_MAX_SIZE) {
>>> +                size = VMXNET3_RX_RING_MAX_SIZE;
>>> +            }
>>>               vmxnet3_ring_init(d, &s->rxq_descr[i].rx_ring[j], pa, size,
>>>                                 sizeof(struct Vmxnet3_RxDesc), false);
>>>               VMW_CFPRN("RX queue %d:%d: Base: %" PRIx64 ", Size: %d",
>>> @@ -1533,6 +1543,9 @@ static void vmxnet3_activate_device(VMXNET3State *s)
>>>           /* RXC ring */
>>>           pa = VMXNET3_READ_RX_QUEUE_DESCR64(d, qd_pa, conf.compRingBasePA);
>>>           size = VMXNET3_READ_RX_QUEUE_DESCR32(d, qd_pa, conf.compRingSize);
>>> +        if (size > VMXNET3_RC_RING_MAX_SIZE) {
>>> +            size = VMXNET3_RC_RING_MAX_SIZE;
>>> +        }
>>>           vmxnet3_ring_init(d, &s->rxq_descr[i].comp_ring, pa, size,
>>>                             sizeof(struct Vmxnet3_RxCompDesc), true);
>>>           VMW_CFPRN("RXC queue %d: Base: %" PRIx64 ", Size: %d", i, pa, 
>>> size);
>>>
>>>
>> Ping!
>>
>> According to 
>> https://gitlab.com/qemu-project/qemu/-/issues/308#note_705736713 this is 
>> still an issue...
>>
>> Patch looks fine to me ... maybe just add some 
>> qemu_log_mask(LOG_GUEST_ERROR, ...) statements before correcting the values?
> 
> 
> * Oops! Not sure how I missed it, thought it was pulled upstream.
>   Will send a revised patch.
> 
> 
> Thank you.
> ---
>   - P J P
> 




Re: [PATCH v4 19/25] block_int-common.h: split function pointers in BlockDriver

2021-11-18 Thread Emanuele Giuseppe Esposito




On 15/11/2021 13:00, Hanna Reitz wrote:

+
+    /*
+ * I/O API functions. These functions are thread-safe.
+ *
+ * See include/block/block-io.h for more information about
+ * the I/O API.
+ */
+
+    int coroutine_fn (*bdrv_co_create)(BlockdevCreateOptions *opts,
+   Error **errp);
+    int coroutine_fn (*bdrv_co_create_opts)(BlockDriver *drv,
+    const char *filename,
+    QemuOpts *opts,
+    Error **errp);


Now this is really interesting.  Technically I suppose these should work 
in any thread, but trying to do so results in:


$ touch /tmp/iothread-create-test.qcow2
$ ./qemu-system-x86_64 -object iothread,id=iothr0 -qmp stdio <{"execute":"blockdev-add","arguments":{"node-name":"proto","driver":"file","filename":"/tmp/iothread-create-test.qcow2"}} 

{"execute":"x-blockdev-set-iothread","arguments":{"node-name":"proto","iothread":"iothr0"}} 

{"execute":"blockdev-create","arguments":{"job-id":"create","options":{"driver":"qcow2","file":"proto","size":0}}} 


EOF
{"QMP": {"version": {"qemu": {"micro": 90, "minor": 1, "major": 6}, 
"package": "v6.2.0-rc0-40-gd02d5fe5fb-dirty"}, "capabilities": ["oob"]}}

{"return": {}}
{"return": {}}
{"return": {}}
{"timestamp": {"seconds": 1636973542, "microseconds": 338117}, "event": 
"JOB_STATUS_CHANGE", "data": {"status": "created", "id": "create"}}
{"timestamp": {"seconds": 1636973542, "microseconds": 338197}, "event": 
"JOB_STATUS_CHANGE", "data": {"status": "running", "id": "create"}}

{"return": {}}
qemu: qemu_mutex_unlock_impl: Operation not permitted
[1]    86154 IOT instruction (core dumped)  ./qemu-system-x86_64 -object 
iothread,id=iothr0 -qmp stdio <<<''


So something’s fishy and perhaps we should investigate this...  I mean, 
I can’t really imagine a case where someone would need to run a 
blockdev-create job in an I/O thread, but right now the interface allows 
for it.


And then bdrv_create() is classified as global state, and also 
bdrv_co_create_opts_simple(), which is supposed to be a drop-in function 
for this .bdrv_co_create_opts function.  So that can’t work.


Also, I believe there might have been some functions you classified as 
GS that are called from .create* implementations.  I accepted that, 
given the abort I sketched above.  However, if we classify image 
creation as I/O, then those would need to be re-evaluated. For example, 
qcow2_co_create_opts() calls bdrv_create_file(), which is a GS function.


Some of this issues could be addressed by making .bdrv_co_create_opts a 
GS function and .bdrv_co_create an I/O function.  I believe that would 
be the ideal split, even though as shown above .bdrv_co_create doesn’t 
work in an I/O thread, and then you have the issue of probably all 
format drivers’ .bdrv_co_create implementations calling 
bdrv_open_blockdev_ref(), which is a GS function.


(VMDK even calls blk_new_open(), blk_new_with_bs(), and blk_unref(), 
none of which can ever be I/O functions, I think.)


I believe in practice the best is to for now classify all create-related 
functions as GS functions.  This is supported by the fact that 
qmp_blockdev_create() specifically creates the create job in the main 
context (with a TODO comment) and requires block drivers to error out 
when they encounter a node in a different AioContext.




Ok after better reviewing this I agree with you:
- .bdrv_co_create_opts is for sure a GS function. It is called by 
bdrv_create and it is asserted to be under BQL.
- .bdrv_co_create should also be a GS, and the easiest thing to do would 
be to follow the existing TODO and make sure we cannot run it outside 
the main loop. I think that I will put it as GS, and add the BQL 
assertion to blockdev_create_run, so that if for some reasons someone 
tries to do what you did above, will crash because of the assertion, and 
not because of the aiocontext lock missing.


Emanuele




Re: [qemu-web PATCH v2] Add Sponsors page

2021-11-18 Thread Thomas Huth

On 18/11/2021 13.29, Philippe Mathieu-Daudé wrote:

Add a page listing QEMU sponsors.

For now, only mention Fosshost which requested to be listed:
https://lists.gnu.org/archive/html/qemu-devel/2021-06/msg05381.html

...

+QEMU has sponsors!
+
+For continuous integration and testing, hardware is provided by:
+- [Fosshost](https://fosshost.org/)


Are we finally using the server now? ... the last time I asked, it was just 
idle and we talked about returning it?


 Thomas




Re: [PATCH-for-6.2? 1/3] docs/devel/style: Improve GLib functions rST rendering

2021-11-18 Thread Daniel P . Berrangé
On Thu, Nov 18, 2021 at 01:12:26PM +0100, Philippe Mathieu-Daudé wrote:
> On 11/18/21 11:58, Darren Kenny wrote:
> > Hi Philippe,
> > 
> > There are some inconsistencies in the use of '()' when referring to
> > functions or macros below...
> 
> Daniel, if you agree with Darren comments I can respin addressing them.

It is fine with me.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v3 0/6] SEV: add kernel-hashes=on for measured -kernel launch

2021-11-18 Thread Daniel P . Berrangé
On Thu, Nov 18, 2021 at 02:21:09PM +0200, Dov Murik wrote:
> Pinging again -- Daniel said this should be added to 6.2.
> 
> Is there anything I should do?

I'm going to take care of sending a PULL to relieve Paolo's
workload.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: does drive_get_next(IF_NONE) make sense?

2021-11-18 Thread Alistair Francis
On Tue, Nov 16, 2021 at 2:10 AM Thomas Huth  wrote:
>
> On 15/11/2021 08.12, Alistair Francis wrote:
> > On Mon, Nov 15, 2021 at 3:32 PM Markus Armbruster  wrote:
> >>
> >> Peter Maydell  writes:
> >>
> >>> On Fri, 12 Nov 2021 at 13:34, Markus Armbruster  wrote:
> 
>  Thomas Huth  writes:
> 
> > On 03/11/2021 09.41, Markus Armbruster wrote:
> >> Peter Maydell  writes:
> >>
> >>> Does it make sense for a device/board to do drive_get_next(IF_NONE) ?
> >> Short answer: hell, no!  ;)
> >
> > Would it make sense to add an "assert(type != IF_NONE)" to drive_get()
> > to avoid such mistakes in the future?
> 
>  Worth a try.
> >>>
> >>> You need to fix the sifive_u_otp device first :-)
> >>
> >> And for that, we may want Hao Wu's "[PATCH v4 5/7] blockdev: Add a new
> >> IF type IF_OTHER" first.
> >
> > I can fixup sifive_u_otp, just let me know what the prefered solution is
>
> What kind of device is that OTP exactly? If it is some kind of non-serial
> flash device, maybe you could simply use IF_PFLASH instead?

It just says "one time programmable memory". I'm guessing it's
sometype of eFuse.

Alistair

>
>   Thomas
>



Re: [PATCH v3 0/6] SEV: add kernel-hashes=on for measured -kernel launch

2021-11-18 Thread Dov Murik



On 18/11/2021 15:02, Daniel P. Berrangé wrote:
> On Thu, Nov 18, 2021 at 02:21:09PM +0200, Dov Murik wrote:
>> Pinging again -- Daniel said this should be added to 6.2.
>>
>> Is there anything I should do?
> 
> I'm going to take care of sending a PULL to relieve Paolo's
> workload.
> 

Thanks Daniel.

-Dov




Re: does drive_get_next(IF_NONE) make sense?

2021-11-18 Thread Peter Maydell
On Thu, 18 Nov 2021 at 13:04, Alistair Francis  wrote:
>
> On Tue, Nov 16, 2021 at 2:10 AM Thomas Huth  wrote:
> > What kind of device is that OTP exactly? If it is some kind of non-serial
> > flash device, maybe you could simply use IF_PFLASH instead?
>
> It just says "one time programmable memory". I'm guessing it's
> sometype of eFuse.

We used IF_PFLASH for the Xilinx efuse devices we recently added.
So we should probably use that for consistency, unless we want
to instead say that efuses should be something other than IF_PFLASH.

Either way it's a compatibility break for command lines, so we
should probably try to have only one rather than two :-)

-- PMM



[PATCH v2 0/3] Fix mtfsf, mtfsfi and mtfsb1 bug

2021-11-18 Thread Lucas Mateus Castro (alqotel)
The instructions mtfsf, mtfsfi and mtfsb1, when called, fail to set the FI
bit (bit 46 in the FPSCR) and can set to 1 the reserved bit 52 of the
FPSCR, as reported in https://gitlab.com/qemu-project/qemu/-/issues/266
(although the bug report is only for mtfsf, the bug applies to mtfsfi and
mtfsb1 as well).

These instructions also fail to throw an exception when the exception
and enabling bits are set, this can be tested by adding
'prctl(PR_SET_FPEXC, PR_FP_EXC_PRECISE);' before the __builtin_mtfsf
call in the test case of the bug report.

These patches aim to fix these issues.

Changes from v1:
- added a test for mtfsf (patch 3)
- moved "Resolves" to second patch
- removed gen_reset_fpstatus() from mtfsf,mtfsfi and mtfsb1 instructions

Lucas Mateus Castro (alqotel) (3):
  target/ppc: Fixed call to deferred exception
  target/ppc: ppc_store_fpscr doesn't update bit 52
  test/tcg/ppc64le: test mtfsf

 target/ppc/cpu.c   |  2 +-
 target/ppc/cpu.h   |  3 ++
 target/ppc/fpu_helper.c| 41 ++
 target/ppc/helper.h|  1 +
 target/ppc/translate/fp-impl.c.inc |  9 ++---
 tests/tcg/ppc64/Makefile.target|  1 +
 tests/tcg/ppc64le/Makefile.target  |  1 +
 tests/tcg/ppc64le/mtfsf.c  | 56 ++
 8 files changed, 107 insertions(+), 7 deletions(-)
 create mode 100644 tests/tcg/ppc64le/mtfsf.c

-- 
2.31.1




[PATCH v2 1/3] target/ppc: Fixed call to deferred exception

2021-11-18 Thread Lucas Mateus Castro (alqotel)
mtfsf, mtfsfi and mtfsb1 instructions call helper_float_check_status
after updating the value of FPSCR, but helper_float_check_status
checks fp_status and fp_status isn't updated based on FPSCR and
since the value of fp_status is reset earlier in the instruction,
it's always 0.

Because of this helper_float_check_status would change the FI bit to 0
as this bit checks if the last operation was inexact and
float_flag_inexact is always 0.

These instructions also don't throw exceptions correctly since
helper_float_check_status throw exceptions based on fp_status.

This commit created a new helper, helper_fpscr_check_status that checks
FPSCR value instead of fp_status and checks for a larger variety of
exceptions than do_float_check_status.

Since fp_status isn't used, gen_reset_fpstatus() was removed.

The hardware used to compare QEMU's behavior to was a Power9.

Signed-off-by: Lucas Mateus Castro (alqotel) 
---
 target/ppc/fpu_helper.c| 41 ++
 target/ppc/helper.h|  1 +
 target/ppc/translate/fp-impl.c.inc |  9 +++
 3 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index c4896cecc8..f086cb503f 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -414,6 +414,47 @@ void helper_store_fpscr(CPUPPCState *env, uint64_t val, 
uint32_t nibbles)
 ppc_store_fpscr(env, val);
 }
 
+void helper_fpscr_check_status(CPUPPCState *env)
+{
+CPUState *cs = env_cpu(env);
+target_ulong fpscr = env->fpscr;
+int error = 0;
+
+if ((fpscr & FP_VXSOFT) && (fpscr_ve != 0)) {
+error = POWERPC_EXCP_FP_VXSOFT;
+} else if ((fpscr & FP_OX) && (fpscr & FP_OE)) {
+error = POWERPC_EXCP_FP_OX;
+} else if ((fpscr & FP_UX) && (fpscr & FP_UE)) {
+error = POWERPC_EXCP_FP_UX;
+} else if ((fpscr & FP_XX) && (fpscr & FP_XE)) {
+error = POWERPC_EXCP_FP_XX;
+} else if ((fpscr & FP_ZX) && (fpscr & FP_ZE)) {
+error = POWERPC_EXCP_FP_ZX;
+} else if ((fpscr & FP_VXSNAN) && (fpscr_ve != 0)) {
+error = POWERPC_EXCP_FP_VXSNAN;
+} else if ((fpscr & FP_VXISI) && (fpscr_ve != 0)) {
+error = POWERPC_EXCP_FP_VXISI;
+} else if ((fpscr & FP_VXIDI) && (fpscr_ve != 0)) {
+error = POWERPC_EXCP_FP_VXIDI;
+} else if ((fpscr & FP_VXZDZ) && (fpscr_ve != 0)) {
+error = POWERPC_EXCP_FP_VXZDZ;
+} else if ((fpscr & FP_VXIMZ) && (fpscr_ve != 0)) {
+error = POWERPC_EXCP_FP_VXIMZ;
+} else if ((fpscr & FP_VXVC) && (fpscr_ve != 0)) {
+error = POWERPC_EXCP_FP_VXVC;
+}
+
+if (error) {
+cs->exception_index = POWERPC_EXCP_PROGRAM;
+env->error_code = error | POWERPC_EXCP_FP;
+/* Deferred floating-point exception after target FPSCR update */
+if (fp_exceptions_enabled(env)) {
+raise_exception_err_ra(env, cs->exception_index,
+   env->error_code, GETPC());
+}
+}
+}
+
 static void do_float_check_status(CPUPPCState *env, uintptr_t raddr)
 {
 CPUState *cs = env_cpu(env);
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 627811cefc..632a81c676 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -63,6 +63,7 @@ DEF_HELPER_FLAGS_1(cntlzw32, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_2(brinc, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 
 DEF_HELPER_1(float_check_status, void, env)
+DEF_HELPER_1(fpscr_check_status, void, env)
 DEF_HELPER_1(reset_fpstatus, void, env)
 DEF_HELPER_2(compute_fprf_float64, void, env, i64)
 DEF_HELPER_3(store_fpscr, void, env, i64, i32)
diff --git a/target/ppc/translate/fp-impl.c.inc 
b/target/ppc/translate/fp-impl.c.inc
index d1dbb1b96b..ca195fd9d2 100644
--- a/target/ppc/translate/fp-impl.c.inc
+++ b/target/ppc/translate/fp-impl.c.inc
@@ -769,7 +769,6 @@ static void gen_mtfsb1(DisasContext *ctx)
 return;
 }
 crb = 31 - crbD(ctx->opcode);
-gen_reset_fpstatus();
 /* XXX: we pretend we can only do IEEE floating-point computations */
 if (likely(crb != FPSCR_FEX && crb != FPSCR_VX && crb != FPSCR_NI)) {
 TCGv_i32 t0;
@@ -782,7 +781,7 @@ static void gen_mtfsb1(DisasContext *ctx)
 tcg_gen_shri_i32(cpu_crf[1], cpu_crf[1], FPSCR_OX);
 }
 /* We can raise a deferred exception */
-gen_helper_float_check_status(cpu_env);
+gen_helper_fpscr_check_status(cpu_env);
 }
 
 /* mtfsf */
@@ -803,7 +802,6 @@ static void gen_mtfsf(DisasContext *ctx)
 gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
 return;
 }
-gen_reset_fpstatus();
 if (l) {
 t0 = tcg_const_i32((ctx->insns_flags2 & PPC2_ISA205) ? 0x : 0xff);
 } else {
@@ -818,7 +816,7 @@ static void gen_mtfsf(DisasContext *ctx)
 tcg_gen_shri_i32(cpu_crf[1], cpu_crf[1], FPSCR_OX);
 }
 /* We can raise a deferred exception */
-gen_helper_float_check_status(cpu_env);
+gen_helper_fpscr_check_status(cpu_env);
 tcg_temp

[PATCH v2 2/3] target/ppc: ppc_store_fpscr doesn't update bit 52

2021-11-18 Thread Lucas Mateus Castro (alqotel)
This commit fixes the difference reported in the bug in the reserved
bit 52, it does this by adding this bit to the mask of bits to not be
directly altered in the ppc_store_fpscr function (the hardware used to
compare to QEMU was a Power9).

Although this is a difference reported in the bug, since it's a reserved
bit it may be a "don't care" case, as put in the bug report. Looking at
the ISA it doesn't explicitly mentions this bit can't be set, like it
does for FEX and VX, so I'm unsure if this is necessary.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/266
Signed-off-by: Lucas Mateus Castro (alqotel) 
---
 target/ppc/cpu.c | 2 +-
 target/ppc/cpu.h | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/target/ppc/cpu.c b/target/ppc/cpu.c
index f933d9f2bd..d7b42bae52 100644
--- a/target/ppc/cpu.c
+++ b/target/ppc/cpu.c
@@ -112,7 +112,7 @@ static inline void fpscr_set_rounding_mode(CPUPPCState *env)
 
 void ppc_store_fpscr(CPUPPCState *env, target_ulong val)
 {
-val &= ~(FP_VX | FP_FEX);
+val &= FPSCR_MTFS_MASK;
 if (val & FPSCR_IX) {
 val |= FP_VX;
 }
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index e946da5f3a..53463092ab 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -759,6 +759,9 @@ enum {
   FP_VXZDZ  | FP_VXIMZ  | FP_VXVC   | FP_VXSOFT | \
   FP_VXSQRT | FP_VXCVI)
 
+/* FPSCR bits that can be set by mtfsf, mtfsfi and mtfsb1 */
+#define FPSCR_MTFS_MASK (~((1ull << 11) | FP_VX | FP_FEX))
+
 /*/
 /* Vector status and control register */
 #define VSCR_NJ 16 /* Vector non-java */
-- 
2.31.1




[PATCH v2 3/3] test/tcg/ppc64le: test mtfsf

2021-11-18 Thread Lucas Mateus Castro (alqotel)
Added tests for the mtfsf to check if FI bit of FPSCR is being set
and if exception calls are being made correctly.

Signed-off-by: Lucas Mateus Castro (alqotel) 
---
 tests/tcg/ppc64/Makefile.target   |  1 +
 tests/tcg/ppc64le/Makefile.target |  1 +
 tests/tcg/ppc64le/mtfsf.c | 56 +++
 3 files changed, 58 insertions(+)
 create mode 100644 tests/tcg/ppc64le/mtfsf.c

diff --git a/tests/tcg/ppc64/Makefile.target b/tests/tcg/ppc64/Makefile.target
index 6ab7934fdf..8f4c7ac4ed 100644
--- a/tests/tcg/ppc64/Makefile.target
+++ b/tests/tcg/ppc64/Makefile.target
@@ -11,6 +11,7 @@ endif
 bcdsub: CFLAGS += -mpower8-vector
 
 PPC64_TESTS += byte_reverse
+PPC64_TESTS += mtfsf
 ifneq ($(DOCKER_IMAGE)$(CROSS_CC_HAS_POWER10),)
 run-byte_reverse: QEMU_OPTS+=-cpu POWER10
 run-plugin-byte_reverse-with-%: QEMU_OPTS+=-cpu POWER10
diff --git a/tests/tcg/ppc64le/Makefile.target 
b/tests/tcg/ppc64le/Makefile.target
index 5e65b1590d..b8cd9bf73a 100644
--- a/tests/tcg/ppc64le/Makefile.target
+++ b/tests/tcg/ppc64le/Makefile.target
@@ -10,6 +10,7 @@ endif
 bcdsub: CFLAGS += -mpower8-vector
 
 PPC64LE_TESTS += byte_reverse
+PPC64LE_TESTS += mtfsf
 ifneq ($(DOCKER_IMAGE)$(CROSS_CC_HAS_POWER10),)
 run-byte_reverse: QEMU_OPTS+=-cpu POWER10
 run-plugin-byte_reverse-with-%: QEMU_OPTS+=-cpu POWER10
diff --git a/tests/tcg/ppc64le/mtfsf.c b/tests/tcg/ppc64le/mtfsf.c
new file mode 100644
index 00..9b3290d94c
--- /dev/null
+++ b/tests/tcg/ppc64le/mtfsf.c
@@ -0,0 +1,56 @@
+#include 
+#include 
+#include 
+#include 
+
+#define FPSCR_VE 7  /* Floating-point invalid operation exception enable */
+#define FPSCR_VXSOFT 10 /* Floating-point invalid operation exception (soft) */
+#define FPSCR_FI 17 /* Floating-point fraction inexact   */
+
+#define FP_VE   (1ull << FPSCR_VE)
+#define FP_VXSOFT   (1ull << FPSCR_VXSOFT)
+#define FP_FI   (1ull << FPSCR_FI)
+
+void sigfpe_handler(int sig, siginfo_t *si, void *ucontext)
+{
+exit(0);
+}
+
+int main(void)
+{
+union {
+double d;
+long long ll;
+} fpscr;
+
+struct sigaction sa = {
+.sa_sigaction = sigfpe_handler,
+.sa_flags = SA_SIGINFO
+};
+
+/*
+ * Enable the MSR bits F0 and F1 to enable exceptions.
+ * This shouldn't be needed in linux-user as these bits are enabled by
+ * default, but this allows to execute either in a VM or a real machine
+ * to compare the behaviors.
+ */
+prctl(PR_SET_FPEXC, PR_FP_EXC_PRECISE);
+
+/* First test if the FI bit is being set correctly */
+fpscr.ll = FP_FI;
+__builtin_mtfsf(0b, fpscr.d);
+fpscr.d = __builtin_mffs();
+assert((fpscr.ll & FP_FI) != 0);
+
+/* Then test if the deferred exception is being called correctly */
+sigaction(SIGFPE, &sa, NULL);
+
+/*
+ * Although the VXSOFT exception has been chosen, based on test in a Power9
+ * any combination of exception bit + its enabling bit should work.
+ */
+fpscr.ll = FP_VE | FP_VXSOFT;
+__builtin_mtfsf(0b, fpscr.d);
+
+return 1;
+}
-- 
2.31.1




[PULL 0/6 for-6.2] AMD SEV patches

2021-11-18 Thread Daniel P . Berrangé
The following changes since commit 0055ecca84cb948c935224b4f7ca1ceb26209790:

  Merge tag 'vfio-fixes-2027.0' of git://github.com/awilliam/qemu-vfio into 
staging (2021-11-18 09:39:47 +0100)

are available in the Git repository at:

  https://gitlab.com/berrange/qemu tags/sev-hashes-pull-request

for you to fetch changes up to 58603ba2680fa35eade630e4b040e96953a11021:

  target/i386/sev: Replace qemu_map_ram_ptr with address_space_map (2021-11-18 
13:28:32 +)


Add property for requesting AMD SEV measured kernel launch

 - The 'sev-guest' object gains a boolean 'kernel-hashes' property
   which must be enabled to request a measured kernel launch.



Dov Murik (6):
  qapi/qom,target/i386: sev-guest: Introduce kernel-hashes=on|off option
  target/i386/sev: Add kernel hashes only if sev-guest.kernel-hashes=on
  target/i386/sev: Rephrase error message when no hashes table in guest
firmware
  target/i386/sev: Fail when invalid hashes table area detected
  target/i386/sev: Perform padding calculations at compile-time
  target/i386/sev: Replace qemu_map_ram_ptr with address_space_map

 qapi/qom.json |  7 -
 qemu-options.hx   |  6 +++-
 target/i386/sev.c | 79 +++
 3 files changed, 77 insertions(+), 15 deletions(-)

-- 
2.31.1





[PULL 3/6] target/i386/sev: Rephrase error message when no hashes table in guest firmware

2021-11-18 Thread Daniel P . Berrangé
From: Dov Murik 

Signed-off-by: Dov Murik 
Acked-by: Brijesh Singh 
Signed-off-by: Daniel P. Berrangé 
---
 target/i386/sev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index e3abbeef68..6ff196f7ad 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -1232,7 +1232,8 @@ bool sev_add_kernel_loader_hashes(SevKernelLoaderContext 
*ctx, Error **errp)
 }
 
 if (!pc_system_ovmf_table_find(SEV_HASH_TABLE_RV_GUID, &data, NULL)) {
-error_setg(errp, "SEV: kernel specified but OVMF has no hash table 
guid");
+error_setg(errp, "SEV: kernel specified but guest firmware "
+ "has no hashes table GUID");
 return false;
 }
 area = (SevHashTableDescriptor *)data;
-- 
2.31.1




[PULL 2/6] target/i386/sev: Add kernel hashes only if sev-guest.kernel-hashes=on

2021-11-18 Thread Daniel P . Berrangé
From: Dov Murik 

Commit cff03145ed3c ("sev/i386: Introduce sev_add_kernel_loader_hashes
for measured linux boot", 2021-09-30) introduced measured direct boot
with -kernel, using an OVMF-designated hashes table which QEMU fills.

However, if OVMF doesn't designate such an area, QEMU would completely
abort the VM launch.  This breaks launching with -kernel using older
OVMF images which don't publish the SEV_HASH_TABLE_RV_GUID.

Fix that so QEMU will only look for the hashes table if the sev-guest
kernel-hashes option is set to on.  Otherwise, QEMU won't look for the
designated area in OVMF and won't fill that area.

To enable addition of kernel hashes, launch the guest with:

-object sev-guest,...,kernel-hashes=on

Signed-off-by: Dov Murik 
Reported-by: Tom Lendacky 
Acked-by: Brijesh Singh 
Signed-off-by: Daniel P. Berrangé 
---
 target/i386/sev.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index cad32812f5..e3abbeef68 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -1223,6 +1223,14 @@ bool sev_add_kernel_loader_hashes(SevKernelLoaderContext 
*ctx, Error **errp)
 size_t hash_len = HASH_SIZE;
 int aligned_len;
 
+/*
+ * Only add the kernel hashes if the sev-guest configuration explicitly
+ * stated kernel-hashes=on.
+ */
+if (!sev_guest->kernel_hashes) {
+return false;
+}
+
 if (!pc_system_ovmf_table_find(SEV_HASH_TABLE_RV_GUID, &data, NULL)) {
 error_setg(errp, "SEV: kernel specified but OVMF has no hash table 
guid");
 return false;
-- 
2.31.1




[PATCH v3 0/5] tests/qtest: add some tests for virtio-net failover

2021-11-18 Thread Laurent Vivier
This series adds a qtest entry to test virtio-net failover feature.

We check following error cases:

- check missing id on device with failover_pair_id triggers an error
- check a primary device plugged on a bus that doesn't support hotplug
  triggers an error

We check the status of the machine before and after hotplugging cards and
feature negotiation:

- check we don't see the primary device at boot if failover is on
- check we see the primary device at boot if failover is off
- check we don't see the primary device if failover is on
  but failover_pair_id is not the one with on (I think this should be changed)
- check the primary device is plugged after the feature negotiation
- check the result if the primary device is plugged before standby device and
  vice-versa
- check the if the primary device is coldplugged and the standy device
  hotplugged and vice-versa
- check the migration triggers the unplug and the hotplug

There is one preliminary patch in the series:

- PATCH 1 introduces a function to enable PCI bridge.
  Failover needs to be plugged on a pcie-root-port and while
  the root port is not configured the cards behind it are not
  available

v3:
- fix a bug with ACPI unplug and add the related test

v2:
- remove PATCH 1 that introduced a function that can be replaced by
  qobject_to_json_pretty() (Markus)
- Add migration to a file and from the file to check the card is
  correctly unplugged on the source, and hotplugged on the dest
- Add an ACPI call to eject the card as the kernel would do

Laurent Vivier (5):
  qtest/libqos: add a function to initialize secondary PCI buses
  tests/qtest: add some tests for virtio-net failover
  failover: fix unplug pending detection
  libqtest: add a function to use a timeout when waiting for an event
  tests/libqtest: update virtio-net failover test

 hw/acpi/pcihp.c   |  30 +-
 include/hw/pci/pci_bridge.h   |   8 +
 tests/qtest/libqos/libqtest.h |  26 +-
 tests/qtest/libqos/pci.c  | 118 ++
 tests/qtest/libqos/pci.h  |   1 +
 tests/qtest/libqtest.c|  37 +-
 tests/qtest/meson.build   |   3 +
 tests/qtest/virtio-net-failover.c | 644 ++
 tests/unit/test-qga.c |   2 +-
 9 files changed, 859 insertions(+), 10 deletions(-)
 create mode 100644 tests/qtest/virtio-net-failover.c

-- 
2.33.1





[PULL 4/6] target/i386/sev: Fail when invalid hashes table area detected

2021-11-18 Thread Daniel P . Berrangé
From: Dov Murik 

Commit cff03145ed3c ("sev/i386: Introduce sev_add_kernel_loader_hashes
for measured linux boot", 2021-09-30) introduced measured direct boot
with -kernel, using an OVMF-designated hashes table which QEMU fills.

However, no checks are performed on the validity of the hashes area
designated by OVMF.  Specifically, if OVMF publishes the
SEV_HASH_TABLE_RV_GUID entry but it is filled with zeroes, this will
cause QEMU to write the hashes entries over the first page of the
guest's memory (GPA 0).

Add validity checks to the published area.  If the hashes table area's
base address is zero, or its size is too small to fit the aligned hashes
table, display an error and stop the guest launch.  In such case, the
following error will be displayed:

qemu-system-x86_64: SEV: guest firmware hashes table area is invalid 
(base=0x0 size=0x0)

Signed-off-by: Dov Murik 
Reported-by: Brijesh Singh 
Acked-by: Brijesh Singh 
Signed-off-by: Daniel P. Berrangé 
---
 target/i386/sev.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 6ff196f7ad..d11b512361 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -1221,7 +1221,7 @@ bool sev_add_kernel_loader_hashes(SevKernelLoaderContext 
*ctx, Error **errp)
 uint8_t kernel_hash[HASH_SIZE];
 uint8_t *hashp;
 size_t hash_len = HASH_SIZE;
-int aligned_len;
+int aligned_len = ROUND_UP(sizeof(SevHashTable), 16);
 
 /*
  * Only add the kernel hashes if the sev-guest configuration explicitly
@@ -1237,6 +1237,11 @@ bool sev_add_kernel_loader_hashes(SevKernelLoaderContext 
*ctx, Error **errp)
 return false;
 }
 area = (SevHashTableDescriptor *)data;
+if (!area->base || area->size < aligned_len) {
+error_setg(errp, "SEV: guest firmware hashes table area is invalid "
+ "(base=0x%x size=0x%x)", area->base, area->size);
+return false;
+}
 
 /*
  * Calculate hash of kernel command-line with the terminating null byte. If
@@ -1295,7 +1300,6 @@ bool sev_add_kernel_loader_hashes(SevKernelLoaderContext 
*ctx, Error **errp)
 memcpy(ht->kernel.hash, kernel_hash, sizeof(ht->kernel.hash));
 
 /* When calling sev_encrypt_flash, the length has to be 16 byte aligned */
-aligned_len = ROUND_UP(ht->len, 16);
 if (aligned_len != ht->len) {
 /* zero the excess data so the measurement can be reliably calculated 
*/
 memset(ht->padding, 0, aligned_len - ht->len);
-- 
2.31.1




Re: [qemu-web PATCH v2] Add Sponsors page

2021-11-18 Thread Paolo Bonzini

On 11/18/21 13:29, Philippe Mathieu-Daudé wrote:

Add a page listing QEMU sponsors.

For now, only mention Fosshost which requested to be listed:
https://lists.gnu.org/archive/html/qemu-devel/2021-06/msg05381.html

Cc: Thomas Markey 
Resolves: https://gitlab.com/qemu-project/qemu-web/-/issues/2
Signed-off-by: Philippe Mathieu-Daudé 


Microsoft is also sponsoring QEMU with free Azure credits, though we 
aren't using them yet...


Paolo




[PATCH v3 2/5] tests/qtest: add some tests for virtio-net failover

2021-11-18 Thread Laurent Vivier
Add test cases to test several error cases that must be
generated by invalid failover configuration.

Add a combination of coldplug and hotplug test cases to be
sure the primary is correctly managed according the
presence or not of the STANDBY feature.

Signed-off-by: Laurent Vivier 
---
 tests/qtest/meson.build   |   3 +
 tests/qtest/virtio-net-failover.c | 635 ++
 2 files changed, 638 insertions(+)
 create mode 100644 tests/qtest/virtio-net-failover.c

diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index c9d8458062ff..6d66bf522156 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -22,6 +22,9 @@ qtests_generic = \
   (config_all_devices.has_key('CONFIG_VIRTIO_SCSI') ? 
['fuzz-virtio-scsi-test'] : []) + \
   (config_all_devices.has_key('CONFIG_SB16') ? ['fuzz-sb16-test'] : []) + \
   (config_all_devices.has_key('CONFIG_SDHCI_PCI') ? ['fuzz-sdcard-test'] : []) 
+ \
+  (config_all_devices.has_key('CONFIG_VIRTIO_NET') and \
+   config_all_devices.has_key('CONFIG_Q35') and \
+   config_all_devices.has_key('CONFIG_VIRTIO_PCI') ? ['virtio-net-failover'] : 
[]) + \
   [
   'cdrom-test',
   'device-introspect-test',
diff --git a/tests/qtest/virtio-net-failover.c 
b/tests/qtest/virtio-net-failover.c
new file mode 100644
index ..576d4454c3c3
--- /dev/null
+++ b/tests/qtest/virtio-net-failover.c
@@ -0,0 +1,635 @@
+#include "qemu/osdep.h"
+#include "libqos/libqtest.h"
+#include "libqos/pci.h"
+#include "libqos/pci-pc.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qlist.h"
+#include "qapi/qmp/qjson.h"
+#include "libqos/malloc-pc.h"
+#include "libqos/virtio-pci.h"
+#include "hw/pci/pci.h"
+
+#define ACPI_PCIHP_ADDR_ICH90x0cc0
+#define PCI_EJ_BASE 0x0008
+
+#define BASE_MACHINE "-M q35 -nodefaults " \
+"-device pcie-root-port,id=root0,addr=0x1,bus=pcie.0,chassis=1 " \
+"-device pcie-root-port,id=root1,addr=0x2,bus=pcie.0,chassis=2 "
+
+#define MAC_PRIMARY "52:54:00:11:11:11"
+#define MAC_STANDBY "52:54:00:22:22:22"
+
+static QGuestAllocator guest_malloc;
+static QPCIBus *pcibus;
+
+static QTestState *machine_start(const char *args)
+{
+QTestState *qts;
+QPCIDevice *dev;
+
+qts = qtest_init(args);
+
+pc_alloc_init(&guest_malloc, qts, 0);
+pcibus = qpci_new_pc(qts, &guest_malloc);
+g_assert(qpci_secondary_buses_init(pcibus) == 2);
+
+dev = qpci_device_find(pcibus, QPCI_DEVFN(1, 0)); /* root0 */
+g_assert_nonnull(dev);
+
+qpci_device_enable(dev);
+qpci_iomap(dev, 4, NULL);
+
+g_free(dev);
+
+dev = qpci_device_find(pcibus, QPCI_DEVFN(2, 0)); /* root1 */
+g_assert_nonnull(dev);
+
+qpci_device_enable(dev);
+qpci_iomap(dev, 4, NULL);
+
+g_free(dev);
+
+return qts;
+}
+
+static void machine_stop(QTestState *qts)
+{
+qpci_free_pc(pcibus);
+alloc_destroy(&guest_malloc);
+qtest_quit(qts);
+}
+
+static void test_error_id(void)
+{
+QTestState *qts;
+QDict *resp;
+QDict *err;
+
+qts = machine_start(BASE_MACHINE
+"-device 
virtio-net,bus=root0,id=standby0,failover=on");
+
+resp = qtest_qmp(qts, "{'execute': 'device_add',"
+  "'arguments': {"
+  "'driver': 'virtio-net',"
+  "'bus': 'root1',"
+  "'failover_pair_id': 'standby0'"
+  "} }");
+g_assert(qdict_haskey(resp, "error"));
+
+err = qdict_get_qdict(resp, "error");
+g_assert(qdict_haskey(err, "desc"));
+
+g_assert_cmpstr(qdict_get_str(err, "desc"), ==,
+"Device with failover_pair_id needs to have id");
+
+qobject_unref(resp);
+
+machine_stop(qts);
+}
+
+static void test_error_pcie(void)
+{
+QTestState *qts;
+QDict *resp;
+QDict *err;
+
+qts = machine_start(BASE_MACHINE
+"-device 
virtio-net,bus=root0,id=standby0,failover=on");
+
+resp = qtest_qmp(qts, "{'execute': 'device_add',"
+  "'arguments': {"
+  "'driver': 'virtio-net',"
+  "'id': 'primary0',"
+  "'bus': 'pcie.0',"
+  "'failover_pair_id': 'standby0'"
+  "} }");
+g_assert(qdict_haskey(resp, "error"));
+
+err = qdict_get_qdict(resp, "error");
+g_assert(qdict_haskey(err, "desc"));
+
+g_assert_cmpstr(qdict_get_str(err, "desc"), ==,
+"Bus 'pcie.0' does not support hotplugging");
+
+qobject_unref(resp);
+
+machine_stop(qts);
+}
+
+static QDict *find_device(QDict *bus, const char *name)
+{
+const QObject *obj;
+QList *devices;
+QList *list;
+
+devices = qdict_get_qlist(bus, "devices");
+if (devices == NULL) {
+return NULL;
+}
+
+list = qlist_copy(devices);
+while ((obj = qlist_pop(list))) {
+QDict *device;
+
+device = qobject_to(QDict, obj);
+
+if (qdi

[PULL 1/6] qapi/qom, target/i386: sev-guest: Introduce kernel-hashes=on|off option

2021-11-18 Thread Daniel P . Berrangé
From: Dov Murik 

Introduce new boolean 'kernel-hashes' option on the sev-guest object.
It will be used to to decide whether to add the hashes of
kernel/initrd/cmdline to SEV guest memory when booting with -kernel.
The default value is 'off'.

Signed-off-by: Dov Murik 
Acked-by: Brijesh Singh 
Signed-off-by: Daniel P. Berrangé 
---
 qapi/qom.json |  7 ++-
 qemu-options.hx   |  6 +-
 target/i386/sev.c | 20 
 3 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/qapi/qom.json b/qapi/qom.json
index ccd1167808..eeb5395ff3 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -769,6 +769,10 @@
 # @reduced-phys-bits: number of bits in physical addresses that become
 # unavailable when SEV is enabled
 #
+# @kernel-hashes: if true, add hashes of kernel/initrd/cmdline to a
+# designated guest firmware page for measured boot
+# with -kernel (default: false) (since 6.2)
+#
 # Since: 2.12
 ##
 { 'struct': 'SevGuestProperties',
@@ -778,7 +782,8 @@
 '*policy': 'uint32',
 '*handle': 'uint32',
 '*cbitpos': 'uint32',
-'reduced-phys-bits': 'uint32' } }
+'reduced-phys-bits': 'uint32',
+'*kernel-hashes': 'bool' } }
 
 ##
 # @ObjectType:
diff --git a/qemu-options.hx b/qemu-options.hx
index 7749f59300..ae2c6dbbfc 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -5189,7 +5189,7 @@ SRST
  -object secret,id=sec0,keyid=secmaster0,format=base64,\\
  data=$SECRET,iv=$(sev_device = g_strdup(value);
 }
 
+static bool sev_guest_get_kernel_hashes(Object *obj, Error **errp)
+{
+SevGuestState *sev = SEV_GUEST(obj);
+
+return sev->kernel_hashes;
+}
+
+static void sev_guest_set_kernel_hashes(Object *obj, bool value, Error **errp)
+{
+SevGuestState *sev = SEV_GUEST(obj);
+
+sev->kernel_hashes = value;
+}
+
 static void
 sev_guest_class_init(ObjectClass *oc, void *data)
 {
@@ -345,6 +360,11 @@ sev_guest_class_init(ObjectClass *oc, void *data)
   sev_guest_set_session_file);
 object_class_property_set_description(oc, "session-file",
 "guest owners session parameters (encoded with base64)");
+object_class_property_add_bool(oc, "kernel-hashes",
+   sev_guest_get_kernel_hashes,
+   sev_guest_set_kernel_hashes);
+object_class_property_set_description(oc, "kernel-hashes",
+"add kernel hashes to guest firmware for measured Linux boot");
 }
 
 static void
-- 
2.31.1




Re: [qemu-web PATCH v2] Add Sponsors page

2021-11-18 Thread Philippe Mathieu-Daudé
+project maintainers

On 11/18/21 13:54, Thomas Huth wrote:
> On 18/11/2021 13.29, Philippe Mathieu-Daudé wrote:
>> Add a page listing QEMU sponsors.
>>
>> For now, only mention Fosshost which requested to be listed:
>> https://lists.gnu.org/archive/html/qemu-devel/2021-06/msg05381.html
> ...
>> +QEMU has sponsors!
>> +
>> +For continuous integration and testing, hardware is provided by:
>> +- [Fosshost](https://fosshost.org/)
> 
> Are we finally using the server now? ... the last time I asked, it was
> just idle and we talked about returning it?

I frankly have no idea whether we are using it and who has access
to it.

Thomas, who is your contact with the QEMU community?

Anyhow whether the community is using this resource or wasting it
is orthogonal to this patch purpose, which is to address Fosshost
request to be listed, and we can not deny we had access to their
resources for months.

Thanks,

Phil.




[PULL 5/6] target/i386/sev: Perform padding calculations at compile-time

2021-11-18 Thread Daniel P . Berrangé
From: Dov Murik 

In sev_add_kernel_loader_hashes, the sizes of structs are known at
compile-time, so calculate needed padding at compile-time.

No functional change intended.

Signed-off-by: Dov Murik 
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Philippe Mathieu-Daudé 
Acked-by: Brijesh Singh 
Signed-off-by: Daniel P. Berrangé 
---
 target/i386/sev.c | 28 ++--
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index d11b512361..4fd258a570 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -110,9 +110,19 @@ typedef struct QEMU_PACKED SevHashTable {
 SevHashTableEntry cmdline;
 SevHashTableEntry initrd;
 SevHashTableEntry kernel;
-uint8_t padding[];
 } SevHashTable;
 
+/*
+ * Data encrypted by sev_encrypt_flash() must be padded to a multiple of
+ * 16 bytes.
+ */
+typedef struct QEMU_PACKED PaddedSevHashTable {
+SevHashTable ht;
+uint8_t padding[ROUND_UP(sizeof(SevHashTable), 16) - sizeof(SevHashTable)];
+} PaddedSevHashTable;
+
+QEMU_BUILD_BUG_ON(sizeof(PaddedSevHashTable) % 16 != 0);
+
 static SevGuestState *sev_guest;
 static Error *sev_mig_blocker;
 
@@ -1216,12 +1226,12 @@ bool 
sev_add_kernel_loader_hashes(SevKernelLoaderContext *ctx, Error **errp)
 uint8_t *data;
 SevHashTableDescriptor *area;
 SevHashTable *ht;
+PaddedSevHashTable *padded_ht;
 uint8_t cmdline_hash[HASH_SIZE];
 uint8_t initrd_hash[HASH_SIZE];
 uint8_t kernel_hash[HASH_SIZE];
 uint8_t *hashp;
 size_t hash_len = HASH_SIZE;
-int aligned_len = ROUND_UP(sizeof(SevHashTable), 16);
 
 /*
  * Only add the kernel hashes if the sev-guest configuration explicitly
@@ -1237,7 +1247,7 @@ bool sev_add_kernel_loader_hashes(SevKernelLoaderContext 
*ctx, Error **errp)
 return false;
 }
 area = (SevHashTableDescriptor *)data;
-if (!area->base || area->size < aligned_len) {
+if (!area->base || area->size < sizeof(PaddedSevHashTable)) {
 error_setg(errp, "SEV: guest firmware hashes table area is invalid "
  "(base=0x%x size=0x%x)", area->base, area->size);
 return false;
@@ -1282,7 +1292,8 @@ bool sev_add_kernel_loader_hashes(SevKernelLoaderContext 
*ctx, Error **errp)
  * Populate the hashes table in the guest's memory at the OVMF-designated
  * area for the SEV hashes table
  */
-ht = qemu_map_ram_ptr(NULL, area->base);
+padded_ht = qemu_map_ram_ptr(NULL, area->base);
+ht = &padded_ht->ht;
 
 ht->guid = sev_hash_table_header_guid;
 ht->len = sizeof(*ht);
@@ -1299,13 +1310,10 @@ bool 
sev_add_kernel_loader_hashes(SevKernelLoaderContext *ctx, Error **errp)
 ht->kernel.len = sizeof(ht->kernel);
 memcpy(ht->kernel.hash, kernel_hash, sizeof(ht->kernel.hash));
 
-/* When calling sev_encrypt_flash, the length has to be 16 byte aligned */
-if (aligned_len != ht->len) {
-/* zero the excess data so the measurement can be reliably calculated 
*/
-memset(ht->padding, 0, aligned_len - ht->len);
-}
+/* zero the excess data so the measurement can be reliably calculated */
+memset(padded_ht->padding, 0, sizeof(padded_ht->padding));
 
-if (sev_encrypt_flash((uint8_t *)ht, aligned_len, errp) < 0) {
+if (sev_encrypt_flash((uint8_t *)padded_ht, sizeof(*padded_ht), errp) < 0) 
{
 return false;
 }
 
-- 
2.31.1




Re: [qemu-web PATCH v2] Add Sponsors page

2021-11-18 Thread Philippe Mathieu-Daudé
On 11/18/21 14:41, Paolo Bonzini wrote:
> On 11/18/21 13:29, Philippe Mathieu-Daudé wrote:
>> Add a page listing QEMU sponsors.
>>
>> For now, only mention Fosshost which requested to be listed:
>> https://lists.gnu.org/archive/html/qemu-devel/2021-06/msg05381.html
>>
>> Cc: Thomas Markey 
>> Resolves: https://gitlab.com/qemu-project/qemu-web/-/issues/2
>> Signed-off-by: Philippe Mathieu-Daudé 
> 
> Microsoft is also sponsoring QEMU with free Azure credits, though we
> aren't using them yet...

For now I'm following Stefan recommendation from v1 [*]:

  "I would only include organizations that ask us to display
   sponsorship information to keep things simple."

I am not aware of Microsoft asking to be listed, should I add it in v3?

[*] https://www.mail-archive.com/qemu-devel@nongnu.org/msg836665.html




[PATCH v3 1/5] qtest/libqos: add a function to initialize secondary PCI buses

2021-11-18 Thread Laurent Vivier
Scan the PCI devices to find bridge and set PCI_SECONDARY_BUS and
PCI_SUBORDINATE_BUS (algorithm from seabios)

Signed-off-by: Laurent Vivier 
---
 include/hw/pci/pci_bridge.h |   8 +++
 tests/qtest/libqos/pci.c| 118 
 tests/qtest/libqos/pci.h|   1 +
 3 files changed, 127 insertions(+)

diff --git a/include/hw/pci/pci_bridge.h b/include/hw/pci/pci_bridge.h
index a94d350034bf..30691a6e5728 100644
--- a/include/hw/pci/pci_bridge.h
+++ b/include/hw/pci/pci_bridge.h
@@ -138,6 +138,7 @@ typedef struct PCIBridgeQemuCap {
 uint64_t mem_pref_64; /* Prefetchable memory to reserve (64-bit MMIO) */
 } PCIBridgeQemuCap;
 
+#define REDHAT_PCI_CAP_TYPE_OFFSET  3
 #define REDHAT_PCI_CAP_RESOURCE_RESERVE 1
 
 /*
@@ -152,6 +153,13 @@ typedef struct PCIResReserve {
 uint64_t mem_pref_64;
 } PCIResReserve;
 
+#define REDHAT_PCI_CAP_RES_RESERVE_BUS_RES 4
+#define REDHAT_PCI_CAP_RES_RESERVE_IO  8
+#define REDHAT_PCI_CAP_RES_RESERVE_MEM 16
+#define REDHAT_PCI_CAP_RES_RESERVE_PREF_MEM_32 20
+#define REDHAT_PCI_CAP_RES_RESERVE_PREF_MEM_64 24
+#define REDHAT_PCI_CAP_RES_RESERVE_CAP_SIZE32
+
 int pci_bridge_qemu_reserve_cap_init(PCIDevice *dev, int cap_offset,
PCIResReserve res_reserve, Error **errp);
 
diff --git a/tests/qtest/libqos/pci.c b/tests/qtest/libqos/pci.c
index e1e96189c821..3f0b18f4750b 100644
--- a/tests/qtest/libqos/pci.c
+++ b/tests/qtest/libqos/pci.c
@@ -13,6 +13,8 @@
 #include "qemu/osdep.h"
 #include "pci.h"
 
+#include "hw/pci/pci.h"
+#include "hw/pci/pci_bridge.h"
 #include "hw/pci/pci_regs.h"
 #include "qemu/host-utils.h"
 #include "qgraph.h"
@@ -99,6 +101,122 @@ void qpci_device_init(QPCIDevice *dev, QPCIBus *bus, 
QPCIAddress *addr)
 g_assert(!addr->device_id || device_id == addr->device_id);
 }
 
+static uint8_t qpci_find_resource_reserve_capability(QPCIDevice *dev)
+{
+uint16_t device_id;
+uint8_t cap = 0;
+
+if (qpci_config_readw(dev, PCI_VENDOR_ID) != PCI_VENDOR_ID_REDHAT) {
+return 0;
+}
+
+device_id = qpci_config_readw(dev, PCI_DEVICE_ID);
+
+if (device_id != PCI_DEVICE_ID_REDHAT_PCIE_RP &&
+device_id != PCI_DEVICE_ID_REDHAT_BRIDGE) {
+return 0;
+}
+
+do {
+cap = qpci_find_capability(dev, PCI_CAP_ID_VNDR, cap);
+} while (cap &&
+ qpci_config_readb(dev, cap + REDHAT_PCI_CAP_TYPE_OFFSET) !=
+ REDHAT_PCI_CAP_RESOURCE_RESERVE);
+if (cap) {
+uint8_t cap_len = qpci_config_readb(dev, cap + PCI_CAP_FLAGS);
+if (cap_len < REDHAT_PCI_CAP_RES_RESERVE_CAP_SIZE) {
+return 0;
+}
+}
+return cap;
+}
+
+static void qpci_secondary_buses_rec(QPCIBus *qbus, int bus, int *pci_bus)
+{
+QPCIDevice *dev;
+uint16_t class;
+uint8_t pribus, secbus, subbus;
+int i;
+
+for (i = 0; i < 32; i++) {
+dev = qpci_device_find(qbus, QPCI_DEVFN(bus + i, 0));
+if (dev == NULL) {
+continue;
+}
+class = qpci_config_readw(dev, PCI_CLASS_DEVICE);
+if (class == PCI_CLASS_BRIDGE_PCI) {
+qpci_config_writeb(dev, PCI_SECONDARY_BUS, 255);
+qpci_config_writeb(dev, PCI_SUBORDINATE_BUS, 0);
+}
+g_free(dev);
+}
+
+for (i = 0; i < 32; i++) {
+dev = qpci_device_find(qbus, QPCI_DEVFN(bus + i, 0));
+if (dev == NULL) {
+continue;
+}
+class = qpci_config_readw(dev, PCI_CLASS_DEVICE);
+if (class != PCI_CLASS_BRIDGE_PCI) {
+continue;
+}
+
+pribus = qpci_config_readb(dev, PCI_PRIMARY_BUS);
+if (pribus != bus) {
+qpci_config_writeb(dev, PCI_PRIMARY_BUS, bus);
+}
+
+secbus = qpci_config_readb(dev, PCI_SECONDARY_BUS);
+(*pci_bus)++;
+if (*pci_bus != secbus) {
+secbus = *pci_bus;
+qpci_config_writeb(dev, PCI_SECONDARY_BUS, secbus);
+}
+
+subbus = qpci_config_readb(dev, PCI_SUBORDINATE_BUS);
+qpci_config_writeb(dev, PCI_SUBORDINATE_BUS, 255);
+
+qpci_secondary_buses_rec(qbus, secbus << 5, pci_bus);
+
+if (subbus != *pci_bus) {
+uint8_t res_bus = *pci_bus;
+uint8_t cap = qpci_find_resource_reserve_capability(dev);
+
+if (cap) {
+uint32_t tmp_res_bus;
+
+tmp_res_bus = qpci_config_readl(dev, cap +
+
REDHAT_PCI_CAP_RES_RESERVE_BUS_RES);
+if (tmp_res_bus != (uint32_t)-1) {
+res_bus = tmp_res_bus & 0xFF;
+if ((uint8_t)(res_bus + secbus) < secbus ||
+(uint8_t)(res_bus + secbus) < res_bus) {
+res_bus = 0;
+}
+if (secbus + res_bus > *pci_bus) {
+res_bus = secbus + res_bus;
+}
+}
+}
+   

Re: [PATCH v3 0/6] SEV: add kernel-hashes=on for measured -kernel launch

2021-11-18 Thread Paolo Bonzini

On 11/18/21 14:02, Daniel P. Berrangé wrote:

On Thu, Nov 18, 2021 at 02:21:09PM +0200, Dov Murik wrote:

Pinging again -- Daniel said this should be added to 6.2.

Is there anything I should do?


I'm going to take care of sending a PULL to relieve Paolo's
workload.


Apologies, I ignored the series last week because it sounded like 7.0 
material.


Paolo




[PATCH v3 3/5] failover: fix unplug pending detection

2021-11-18 Thread Laurent Vivier
Failover needs to detect the end of the PCI unplug to start migration
after the VFIO card has been unplugged.

To do that, a flag is set in pcie_cap_slot_unplug_request_cb() and reset in
pcie_unplug_device().

But since
17858a169508 ("hw/acpi/ich9: Set ACPI PCI hot-plug as default on Q35")
we have switched to ACPI unplug and these functions are not called anymore
and the flag not set. So failover migration is not able to detect if card
is really unplugged and acts as it's done as soon as it's started. So it
doesn't wait the end of the unplug to start the migration. We don't see any
problem when we test that because ACPI unplug is faster than PCIe native
hotplug and when the migration really starts the unplug operation is
already done.

See c000a9bd06ea ("pci: mark device having guest unplug request pending")
a99c4da9fc2a ("pci: mark devices partially unplugged")

Signed-off-by: Laurent Vivier 
Reviewed-by: Ani Sinha 
---
 hw/acpi/pcihp.c | 30 +++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
index f610a25d2ef9..30405b5113d7 100644
--- a/hw/acpi/pcihp.c
+++ b/hw/acpi/pcihp.c
@@ -222,9 +222,27 @@ static void acpi_pcihp_eject_slot(AcpiPciHpState *s, 
unsigned bsel, unsigned slo
 PCIDevice *dev = PCI_DEVICE(qdev);
 if (PCI_SLOT(dev->devfn) == slot) {
 if (!acpi_pcihp_pc_no_hotplug(s, dev)) {
-hotplug_ctrl = qdev_get_hotplug_handler(qdev);
-hotplug_handler_unplug(hotplug_ctrl, qdev, &error_abort);
-object_unparent(OBJECT(qdev));
+/*
+ * partially_hotplugged is used by virtio-net failover:
+ * failover has asked the guest OS to unplug the device
+ * but we need to keep some references to the device
+ * to be able to plug it back in case of failure so
+ * we don't execute hotplug_handler_unplug().
+ */
+if (dev->partially_hotplugged) {
+/*
+ * pending_deleted_event is set to true when
+ * virtio-net failover asks to unplug the device,
+ * and set to false here when the operation is done
+ * This is used by the migration loop to detect the
+ * end of the operation and really start the migration.
+ */
+qdev->pending_deleted_event = false;
+} else {
+hotplug_ctrl = qdev_get_hotplug_handler(qdev);
+hotplug_handler_unplug(hotplug_ctrl, qdev, &error_abort);
+object_unparent(OBJECT(qdev));
+}
 }
 }
 }
@@ -396,6 +414,12 @@ void acpi_pcihp_device_unplug_request_cb(HotplugHandler 
*hotplug_dev,
 return;
 }
 
+/*
+ * pending_deleted_event is used by virtio-net failover to detect the
+ * end of the unplug operation, the flag is set to false in
+ * acpi_pcihp_eject_slot() when the operation is completed.
+ */
+pdev->qdev.pending_deleted_event = true;
 s->acpi_pcihp_pci_status[bsel].down |= (1U << slot);
 acpi_send_event(DEVICE(hotplug_dev), ACPI_PCI_HOTPLUG_STATUS);
 }
-- 
2.33.1




[PATCH v3 4/5] libqtest: add a function to use a timeout when waiting for an event

2021-11-18 Thread Laurent Vivier
To be able to check we _don't_ receive a given event, we need to
be able to stop to wait for it after a given amount of time.

To do that, introduce a timeout value in qtest_qmp_eventwait().
The new version of the function is qtest_qmp_eventwait_timeout(),
that uses the new function qtest_qmp_receive_dict_timeout().

Signed-off-by: Laurent Vivier 
---
 tests/qtest/libqos/libqtest.h | 26 +++-
 tests/qtest/libqtest.c| 37 ++-
 tests/unit/test-qga.c |  2 +-
 3 files changed, 58 insertions(+), 7 deletions(-)

diff --git a/tests/qtest/libqos/libqtest.h b/tests/qtest/libqos/libqtest.h
index 59e927119563..3f96afa5f431 100644
--- a/tests/qtest/libqos/libqtest.h
+++ b/tests/qtest/libqos/libqtest.h
@@ -209,6 +209,17 @@ void qtest_qmp_vsend_fds(QTestState *s, int *fds, size_t 
fds_num,
 void qtest_qmp_vsend(QTestState *s, const char *fmt, va_list ap)
 GCC_FMT_ATTR(2, 0);
 
+/**
+ * qtest_qmp_receive_dict_timeout:
+ * @s: #QTestState instance to operate on.
+ * @timeout: time to wait before aborting read
+ *
+ * Reads a QMP message from QEMU and returns the response.
+ * If a timeout is provided, return NULL if message is
+ * not received during the given amount of time
+ */
+QDict *qtest_qmp_receive_dict_timeout(QTestState *s, struct timeval *timeout);
+
 /**
  * qtest_qmp_receive_dict:
  * @s: #QTestState instance to operate on.
@@ -236,6 +247,19 @@ QDict *qtest_qmp_receive(QTestState *s);
  */
 void qtest_qmp_eventwait(QTestState *s, const char *event);
 
+/**
+ * qtest_qmp_eventwait_timeout:
+ * @s: #QTestState instance to operate on.
+ * @timeout: time to wait before aborting wait
+ * @event: event to wait for.
+ *
+ * Continuously polls for QMP responses until it receives the desired event
+ * or the timeout exausts.
+ * Returns a copy of the event for further investigation or NULL on timeout
+ */
+QDict *qtest_qmp_eventwait_timeout(QTestState *s, struct timeval *timeout,
+   const char *event);
+
 /**
  * qtest_qmp_eventwait_ref:
  * @s: #QTestState instance to operate on.
@@ -690,7 +714,7 @@ void qtest_remove_abrt_handler(void *data);
 void qtest_qmp_assert_success(QTestState *qts, const char *fmt, ...)
 GCC_FMT_ATTR(2, 3);
 
-QDict *qmp_fd_receive(int fd);
+QDict *qmp_fd_receive(int fd, struct timeval *timeout);
 void qmp_fd_vsend_fds(int fd, int *fds, size_t fds_num,
   const char *fmt, va_list ap) GCC_FMT_ATTR(4, 0);
 void qmp_fd_vsend(int fd, const char *fmt, va_list ap) GCC_FMT_ATTR(2, 0);
diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index 25aeea385bfa..10e05d0c7aa8 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -599,7 +599,7 @@ static void qmp_response(void *opaque, QObject *obj, Error 
*err)
 g_assert(qmp->response);
 }
 
-QDict *qmp_fd_receive(int fd)
+QDict *qmp_fd_receive(int fd, struct timeval *timeout)
 {
 QMPResponseParser qmp;
 bool log = getenv("QTEST_LOG") != NULL;
@@ -610,6 +610,18 @@ QDict *qmp_fd_receive(int fd)
 ssize_t len;
 char c;
 
+if (timeout) {
+fd_set set;
+int ret;
+
+FD_ZERO(&set);
+FD_SET(fd, &set);
+ret = select(fd + 1, &set, NULL, NULL, timeout);
+if (ret == 0) {
+/* timeout */
+return NULL;
+}
+}
 len = read(fd, &c, 1);
 if (len == -1 && errno == EINTR) {
 continue;
@@ -643,9 +655,14 @@ QDict *qtest_qmp_receive(QTestState *s)
 }
 }
 
+QDict *qtest_qmp_receive_dict_timeout(QTestState *s, struct timeval *timeout)
+{
+return qmp_fd_receive(s->qmp_fd, timeout);
+}
+
 QDict *qtest_qmp_receive_dict(QTestState *s)
 {
-return qmp_fd_receive(s->qmp_fd);
+return qtest_qmp_receive_dict_timeout(s, NULL);
 }
 
 int qtest_socket_server(const char *socket_path)
@@ -729,7 +746,7 @@ QDict *qmp_fdv(int fd, const char *fmt, va_list ap)
 {
 qmp_fd_vsend_fds(fd, NULL, 0, fmt, ap);
 
-return qmp_fd_receive(fd);
+return qmp_fd_receive(fd, NULL);
 }
 
 QDict *qtest_vqmp_fds(QTestState *s, int *fds, size_t fds_num,
@@ -848,7 +865,8 @@ QDict *qtest_qmp_event_ref(QTestState *s, const char *event)
 return NULL;
 }
 
-QDict *qtest_qmp_eventwait_ref(QTestState *s, const char *event)
+QDict *qtest_qmp_eventwait_timeout(QTestState *s, struct timeval *timeout,
+   const char *event)
 {
 QDict *response = qtest_qmp_event_ref(s, event);
 
@@ -857,7 +875,11 @@ QDict *qtest_qmp_eventwait_ref(QTestState *s, const char 
*event)
 }
 
 for (;;) {
-response = qtest_qmp_receive_dict(s);
+response = qtest_qmp_receive_dict_timeout(s, timeout);
+if (timeout != NULL && response == NULL) {
+/* exit on timeout */
+return NULL;
+}
 if ((qdict_haskey(response, "event")) &&
 (strcmp(qdict_get_str(response, "event"), event) == 0)) {
 

Re: [PATCH v4 00/25] block layer: split block APIs in global state and I/O

2021-11-18 Thread Paolo Bonzini

On 11/15/21 17:03, Hanna Reitz wrote:


I only really see four solutions for this:
(1) We somehow make the amend job run in the main context under the BQL 
and have it prevent all concurrent I/O access (seems bad)
(2) We can make the permission functions part of the I/O path (seems 
wrong and probably impossible?)
(3) We can drop the permissions update and permanently require the 
permissions that we need when updating keys (I think this might break 
existing use cases)
(4) We can acquire the BQL around the permission update call and perhaps 
that works?


I don’t know how (4) would work but it’s basically the only reasonable 
solution I can come up with.  Would this be a way to call a BQL function 
from an I/O function?


I think that would deadlock:

mainI/O thread
-
start bdrv_co_amend
take BQL
bdrv_drain
... hangs ...

(2) is definitely wrong.

(3) I have no idea.

Would it be possible or meaningful to do the bdrv_child_refresh_perms in 
qmp_x_blockdev_amend?  It seems that all users need it, and in general 
it seems weird to amend a qcow2 or luks header (and thus the meaning of 
parts of the file) while others can write to the same file.


Paolo




[PULL 6/6] target/i386/sev: Replace qemu_map_ram_ptr with address_space_map

2021-11-18 Thread Daniel P . Berrangé
From: Dov Murik 

Use address_space_map/unmap and check for errors.

Signed-off-by: Dov Murik 
Acked-by: Brijesh Singh 
[Two lines wrapped for length - Daniel]
Signed-off-by: Daniel P. Berrangé 
---
 target/i386/sev.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 4fd258a570..025ff7a6f8 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -37,6 +37,7 @@
 #include "qapi/qmp/qerror.h"
 #include "exec/confidential-guest-support.h"
 #include "hw/i386/pc.h"
+#include "exec/address-spaces.h"
 
 #define TYPE_SEV_GUEST "sev-guest"
 OBJECT_DECLARE_SIMPLE_TYPE(SevGuestState, SEV_GUEST)
@@ -1232,6 +1233,9 @@ bool sev_add_kernel_loader_hashes(SevKernelLoaderContext 
*ctx, Error **errp)
 uint8_t kernel_hash[HASH_SIZE];
 uint8_t *hashp;
 size_t hash_len = HASH_SIZE;
+hwaddr mapped_len = sizeof(*padded_ht);
+MemTxAttrs attrs = { 0 };
+bool ret = true;
 
 /*
  * Only add the kernel hashes if the sev-guest configuration explicitly
@@ -1292,7 +1296,12 @@ bool sev_add_kernel_loader_hashes(SevKernelLoaderContext 
*ctx, Error **errp)
  * Populate the hashes table in the guest's memory at the OVMF-designated
  * area for the SEV hashes table
  */
-padded_ht = qemu_map_ram_ptr(NULL, area->base);
+padded_ht = address_space_map(&address_space_memory, area->base,
+  &mapped_len, true, attrs);
+if (!padded_ht || mapped_len != sizeof(*padded_ht)) {
+error_setg(errp, "SEV: cannot map hashes table guest memory area");
+return false;
+}
 ht = &padded_ht->ht;
 
 ht->guid = sev_hash_table_header_guid;
@@ -1314,10 +1323,13 @@ bool 
sev_add_kernel_loader_hashes(SevKernelLoaderContext *ctx, Error **errp)
 memset(padded_ht->padding, 0, sizeof(padded_ht->padding));
 
 if (sev_encrypt_flash((uint8_t *)padded_ht, sizeof(*padded_ht), errp) < 0) 
{
-return false;
+ret = false;
 }
 
-return true;
+address_space_unmap(&address_space_memory, padded_ht,
+mapped_len, true, mapped_len);
+
+return ret;
 }
 
 static void
-- 
2.31.1




Re: [PATCH v4 00/25] block layer: split block APIs in global state and I/O

2021-11-18 Thread Paolo Bonzini

On 11/15/21 17:03, Hanna Reitz wrote:

and second fuse_do_truncate(), which calls blk_set_perm().


Here it seems that a non-growable export is still growable as long as 
nobody is watching. :)  Is this the desired behavior?


Paolo




[PATCH v3 5/5] tests/libqtest: update virtio-net failover test

2021-11-18 Thread Laurent Vivier
Update the migration test to check we correctly wait the end
of the card unplug before doing the migration.

Signed-off-by: Laurent Vivier 
---
 tests/qtest/virtio-net-failover.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/tests/qtest/virtio-net-failover.c 
b/tests/qtest/virtio-net-failover.c
index 576d4454c3c3..7cc276e6ffc1 100644
--- a/tests/qtest/virtio-net-failover.c
+++ b/tests/qtest/virtio-net-failover.c
@@ -478,6 +478,7 @@ static void test_outmigrate(gconstpointer opaque)
 QTestState *qts;
 QDict *resp, *args, *data;
 g_autofree gchar *uri = g_strdup_printf("exec: cat > %s", (gchar *)opaque);
+struct timeval timeout;
 
 qts = machine_start(BASE_MACHINE
  "-netdev user,id=hs0 "
@@ -525,11 +526,19 @@ static void test_outmigrate(gconstpointer opaque)
 
 qobject_unref(resp);
 
-qtest_outl(qts, ACPI_PCIHP_ADDR_ICH9 + PCI_EJ_BASE, 1);
+/*
+ * The migration cannot start if the card is not ejected,
+ * so we check it cannot end ("STOP") before the card is ejected
+ */
+/* 10s is enough for ACPI, PCIe native would need at least 30s */
+timeout.tv_sec = 10;
+timeout.tv_usec = 0;
+resp = qtest_qmp_eventwait_timeout(qts, &timeout, "STOP");
+g_assert_null(resp);
 
-resp = qtest_qmp_eventwait_ref(qts, "STOP");
-qobject_unref(resp);
+qtest_outl(qts, ACPI_PCIHP_ADDR_ICH9 + PCI_EJ_BASE, 1);
 
+qtest_qmp_eventwait(qts, "STOP");
 /*
  * in fact, the card is ejected from the point of view of kernel
  * but not really from QEMU to be able to hotplug it back if
-- 
2.33.1




Re: [qemu-web PATCH v2] Add Sponsors page

2021-11-18 Thread Philippe Mathieu-Daudé
On 11/18/21 14:59, Daniel P. Berrangé wrote:
> On Thu, Nov 18, 2021 at 01:54:38PM +0100, Thomas Huth wrote:
>> On 18/11/2021 13.29, Philippe Mathieu-Daudé wrote:
>>> Add a page listing QEMU sponsors.
>>>
>>> For now, only mention Fosshost which requested to be listed:
>>> https://lists.gnu.org/archive/html/qemu-devel/2021-06/msg05381.html
>> ...
>>> +QEMU has sponsors!
>>> +
>>> +For continuous integration and testing, hardware is provided by:
>>> +- [Fosshost](https://fosshost.org/)
>>
>> Are we finally using the server now? ... the last time I asked, it was just
>> idle and we talked about returning it?
> 
> IMHO, whether we're currently using it or not is tangential.
> 
> The sponsor has granted / reserved resources for QEMU, which can have
> a direct cost for them, and/or take away from what the can grant other
> projects. As such we should be creditting them for giving us this grant,
> even if we've not got around to using it.
> 
> If we do decide to give up any particular resources, it is a quick patch
> to update this page again.
> 
> My only suggestion would be that we don't explicitly state /what/ we're
> using the resource for, just that we've been granted it
> 
> IOW say something more like
> 
> [Fosshost](https://fosshost.org/) has provided QEMU
> access to a dedicated physical compute host.
> 
> [Microsoft](https://microsoft.com) has provided QEMU
> with credits for use on Azure Cloud.

This sounds good to me. I will respin using your suggested
scheme.




Re: [qemu-web PATCH v2] Add Sponsors page

2021-11-18 Thread Daniel P . Berrangé
On Thu, Nov 18, 2021 at 01:54:38PM +0100, Thomas Huth wrote:
> On 18/11/2021 13.29, Philippe Mathieu-Daudé wrote:
> > Add a page listing QEMU sponsors.
> > 
> > For now, only mention Fosshost which requested to be listed:
> > https://lists.gnu.org/archive/html/qemu-devel/2021-06/msg05381.html
> ...
> > +QEMU has sponsors!
> > +
> > +For continuous integration and testing, hardware is provided by:
> > +- [Fosshost](https://fosshost.org/)
> 
> Are we finally using the server now? ... the last time I asked, it was just
> idle and we talked about returning it?

IMHO, whether we're currently using it or not is tangential.

The sponsor has granted / reserved resources for QEMU, which can have
a direct cost for them, and/or take away from what the can grant other
projects. As such we should be creditting them for giving us this grant,
even if we've not got around to using it.

If we do decide to give up any particular resources, it is a quick patch
to update this page again.

My only suggestion would be that we don't explicitly state /what/ we're
using the resource for, just that we've been granted it

IOW say something more like

[Fosshost](https://fosshost.org/) has provided QEMU
access to a dedicated physical compute host.

[Microsoft](https://microsoft.com) has provided QEMU
with credits for use on Azure Cloud.

etc

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [qemu-web PATCH v2] Add Sponsors page

2021-11-18 Thread Philippe Mathieu-Daudé
Cc'ing Alistair regarding the RISC-V foundation help:
https://www.cnx-software.com/2021/05/03/the-risc-v-foundation-to-give-away-1000-risc-v-development-boards/

On 11/18/21 13:29, Philippe Mathieu-Daudé wrote:
> Add a page listing QEMU sponsors.
> 
> For now, only mention Fosshost which requested to be listed:
> https://lists.gnu.org/archive/html/qemu-devel/2021-06/msg05381.html
> 
> Cc: Thomas Markey 
> Resolves: https://gitlab.com/qemu-project/qemu-web/-/issues/2
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> Since v1:
> - move to footer (Daniel)
> - only list sponsor who asked to be listed (Stefan)
> ---

> diff --git a/sponsors.md b/sponsors.md
> new file mode 100644
> index 000..1c097c8
> --- /dev/null
> +++ b/sponsors.md
> @@ -0,0 +1,9 @@
> +---
> +title: QEMU sponsors
> +permalink: /sponsors/
> +---
> +
> +QEMU has sponsors!
> +
> +For continuous integration and testing, hardware is provided by:
> +- [Fosshost](https://fosshost.org/)
> 




Re: [PATCH v2 1/2] qemu-options: define -spice only #ifdef CONFIG_SPICE

2021-11-18 Thread Marc-André Lureau
Hi

On Thu, Nov 18, 2021 at 3:58 PM Markus Armbruster  wrote:
>
> marcandre.lur...@redhat.com writes:
>
> > From: Marc-André Lureau 
> >
> > Fixes:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1982600
> >
> > Signed-off-by: Marc-André Lureau 
> > ---
> >  softmmu/vl.c| 2 ++
> >  qemu-options.hx | 2 ++
> >  2 files changed, 4 insertions(+)
> >
> > diff --git a/softmmu/vl.c b/softmmu/vl.c
> > index 1159a64bce4e..385465fbeb6d 100644
> > --- a/softmmu/vl.c
> > +++ b/softmmu/vl.c
> > @@ -3538,6 +3538,7 @@ void qemu_init(int argc, char **argv, char **envp)
> >  case QEMU_OPTION_readconfig:
> >  qemu_read_config_file(optarg, qemu_parse_config_group, 
> > &error_fatal);
> >  break;
> > +#ifdef CONFIG_SPICE
> >  case QEMU_OPTION_spice:
> >  olist = qemu_find_opts_err("spice", NULL);
> >  if (!olist) {
>error_report("spice support is disabled");
>exit(1);
>}
>
> Is this error still reachable?

I wonder if module loading failed, it's non fatal and will report this error.

>
> > @@ -3550,6 +3551,7 @@ void qemu_init(int argc, char **argv, char **envp)
> >  }
> >  display_remote++;
> >  break;
> > +#endif
> >  case QEMU_OPTION_writeconfig:
> >  {
> >  FILE *fp;
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index 7749f59300b5..323913945a5d 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -2017,6 +2017,7 @@ SRST
> >  Enable SDL.
> >  ERST
> >
> > +#ifdef CONFIG_SPICE
> >  DEF("spice", HAS_ARG, QEMU_OPTION_spice,
> >  "-spice [port=port][,tls-port=secured-port][,x509-dir=]\n"
> >  "   [,x509-key-file=][,x509-key-password=]\n"
> > @@ -2038,6 +2039,7 @@ DEF("spice", HAS_ARG, QEMU_OPTION_spice,
> >  "   enable spice\n"
> >  "   at least one of {port, tls-port} is mandatory\n",
> >  QEMU_ARCH_ALL)
> > +#endif
> >  SRST
> >  ``-spice option[,option[,...]]``
> >  Enable the spice remote desktop protocol. Valid options are
>




[PATCH-for-6.2?] docs: Spell QEMU all caps

2021-11-18 Thread Philippe Mathieu-Daudé
Replace Qemu -> QEMU.

Signed-off-by: Philippe Mathieu-Daudé 
---
 docs/devel/modules.rst|  2 +-
 docs/devel/multi-thread-tcg.rst   |  2 +-
 docs/devel/style.rst  |  2 +-
 docs/devel/ui.rst |  4 ++--
 docs/interop/nbd.txt  |  6 +++---
 docs/interop/qcow2.txt|  8 
 docs/multiseat.txt|  2 +-
 docs/system/device-url-syntax.rst.inc |  2 +-
 docs/system/i386/sgx.rst  | 26 +-
 docs/u2f.txt  |  2 +-
 10 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/docs/devel/modules.rst b/docs/devel/modules.rst
index 066f347b89b..8e999c4fa48 100644
--- a/docs/devel/modules.rst
+++ b/docs/devel/modules.rst
@@ -1,5 +1,5 @@
 
-Qemu modules
+QEMU modules
 
 
 .. kernel-doc:: include/qemu/module.h
diff --git a/docs/devel/multi-thread-tcg.rst b/docs/devel/multi-thread-tcg.rst
index 5b446ee08b6..c9541a7b20a 100644
--- a/docs/devel/multi-thread-tcg.rst
+++ b/docs/devel/multi-thread-tcg.rst
@@ -228,7 +228,7 @@ Emulated hardware state
 
 Currently thanks to KVM work any access to IO memory is automatically
 protected by the global iothread mutex, also known as the BQL (Big
-Qemu Lock). Any IO region that doesn't use global mutex is expected to
+QEMU Lock). Any IO region that doesn't use global mutex is expected to
 do its own locking.
 
 However IO memory isn't the only way emulated hardware state can be
diff --git a/docs/devel/style.rst b/docs/devel/style.rst
index 260e3263fa0..e00af62e763 100644
--- a/docs/devel/style.rst
+++ b/docs/devel/style.rst
@@ -686,7 +686,7 @@ Rationale: hex numbers are hard to read in logs when there 
is no 0x prefix,
 especially when (occasionally) the representation doesn't contain any letters
 and especially in one line with other decimal numbers. Number groups are 
allowed
 to not use '0x' because for some things notations like %x.%x.%x are used not
-only in Qemu. Also dumping raw data bytes with '0x' is less readable.
+only in QEMU. Also dumping raw data bytes with '0x' is less readable.
 
 '#' printf flag
 ---
diff --git a/docs/devel/ui.rst b/docs/devel/ui.rst
index 06c7d622ce7..17fb667dec4 100644
--- a/docs/devel/ui.rst
+++ b/docs/devel/ui.rst
@@ -1,8 +1,8 @@
 =
-Qemu UI subsystem
+QEMU UI subsystem
 =
 
-Qemu Clipboard
+QEMU Clipboard
 --
 
 .. kernel-doc:: include/ui/clipboard.h
diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
index 10ce098a29b..bdb0f2a41ac 100644
--- a/docs/interop/nbd.txt
+++ b/docs/interop/nbd.txt
@@ -1,4 +1,4 @@
-Qemu supports the NBD protocol, and has an internal NBD client (see
+QEMU supports the NBD protocol, and has an internal NBD client (see
 block/nbd.c), an internal NBD server (see blockdev-nbd.c), and an
 external NBD server tool (see qemu-nbd.c). The common code is placed
 in nbd/*.
@@ -7,11 +7,11 @@ The NBD protocol is specified here:
 https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md
 
 The following paragraphs describe some specific properties of NBD
-protocol realization in Qemu.
+protocol realization in QEMU.
 
 = Metadata namespaces =
 
-Qemu supports the "base:allocation" metadata context as defined in the
+QEMU supports the "base:allocation" metadata context as defined in the
 NBD protocol specification, and also defines an additional metadata
 namespace "qemu".
 
diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index 0463f761efb..f7dc304ff69 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -313,7 +313,7 @@ The fields of the bitmaps extension are:
The number of bitmaps contained in the image. Must be
greater than or equal to 1.
 
-   Note: Qemu currently only supports up to 65535 bitmaps per
+   Note: QEMU currently only supports up to 65535 bitmaps per
image.
 
   4 -  7:  Reserved, must be zero.
@@ -775,7 +775,7 @@ Structure of a bitmap directory entry:
   2: extra_data_compatible
  This flags is meaningful when the extra data is
  unknown to the software (currently any extra data is
- unknown to Qemu).
+ unknown to QEMU).
  If it is set, the bitmap may be used as expected, 
extra
  data must be left as is.
  If it is not set, the bitmap must not be used, but
@@ -793,7 +793,7 @@ Structure of a bitmap directory entry:
  17:granularity_bits
 Granularity bits. Valid values: 0 - 63.
 
-Note: Qemu currently supports only values 9 - 31.
+Note: QEMU currently supports only values 9 - 31.
 
 Granularity is calculated as
 granularity = 1 << granula

[PATCH 0/3] block: misc fixes & improvements for SSH block driver key fingerprints

2021-11-18 Thread Daniel P . Berrangé
 * The docs were pointing people towards the obsolete and insecure
   MD5 fingerprint config instead of preferred sha256
 * The sha256 fingerprint handling wasn't wired up into the legacy
   CLI parsing code
 * Finger print check failures were hard to diagnose due to limited
   info reported on error.

Daniel P. Berrangé (3):
  block: better document SSH host key fingerprint checking
  block: support sha256 fingerprint with pre-blockdev options
  block: print the server key type and fingerprint on failure

 block/ssh.c| 42 +-
 docs/system/qemu-block-drivers.rst.inc | 30 +++---
 2 files changed, 61 insertions(+), 11 deletions(-)

-- 
2.31.1





[PATCH 1/3] block: better document SSH host key fingerprint checking

2021-11-18 Thread Daniel P . Berrangé
The docs still illustrate host key fingerprint checking using the old
md5 hashes which are considered insecure and obsolete. Change it to
illustrate using a sha256 hash. Also show how to extract the hash
value from the known_hosts file.

Signed-off-by: Daniel P. Berrangé 
---
 docs/system/qemu-block-drivers.rst.inc | 30 ++
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/docs/system/qemu-block-drivers.rst.inc 
b/docs/system/qemu-block-drivers.rst.inc
index 16225710eb..2aeeaf6361 100644
--- a/docs/system/qemu-block-drivers.rst.inc
+++ b/docs/system/qemu-block-drivers.rst.inc
@@ -778,10 +778,32 @@ The optional *HOST_KEY_CHECK* parameter controls how the 
remote
 host's key is checked.  The default is ``yes`` which means to use
 the local ``.ssh/known_hosts`` file.  Setting this to ``no``
 turns off known-hosts checking.  Or you can check that the host key
-matches a specific fingerprint:
-``host_key_check=md5:78:45:8e:14:57:4f:d5:45:83:0a:0e:f3:49:82:c9:c8``
-(``sha1:`` can also be used as a prefix, but note that OpenSSH
-tools only use MD5 to print fingerprints).
+matches a specific fingerprint. The fingerprint can be provided in
+``md5``, ``sha1``, or ``sha256`` format, however, it is strongly
+recommended to only use ``sha256``, since the other options are
+considered insecure by modern standards. The fingerprint value
+must be given as a hex encoded string::
+
+  
host_key_check=sha256:04ce2ae89ff4295a6b9c4111640bdcb3297858ee55cb434d9dd88796e93aa795``
+
+The key string may optionally contain ":" separators between
+each pair of hex digits.
+
+The ``$HOME/.ssh/known_hosts`` file contains the base64 encoded
+host keys. These can be converted into the format needed for
+QEMU using a command such as::
+
+   $ for key in `grep 10.33.8.112 known_hosts | awk '{print $3}'`
+ do
+   echo $key | base64 -d | sha256sum
+ done
+ 6c3aa525beda9dc83eadfbd7e5ba7d976ecb59575d1633c87cd06ed2ed6e366f  -
+ 12214fd9ea5b408086f98ecccd9958609bd9ac7c0ea316734006bc7818b45dc8  -
+ d36420137bcbd101209ef70c3b15dc07362fbe0fa53c5b135eba6e6afa82f0ce  -
+
+Note that there can be multiple keys present per host, each with
+different key ciphers. Care is needed to pick the key fingerprint
+that matches the cipher QEMU will negotiate with the remote server.
 
 Currently authentication must be done using ssh-agent.  Other
 authentication methods may be supported in future.
-- 
2.31.1




[PATCH 2/3] block: support sha256 fingerprint with pre-blockdev options

2021-11-18 Thread Daniel P . Berrangé
When support for sha256 fingerprint checking was aded in

  commit bf783261f0aee6e81af3916bff7606d71ccdc153
  Author: Daniel P. Berrangé 
  Date:   Tue Jun 22 12:51:56 2021 +0100

block/ssh: add support for sha256 host key fingerprints

it was only made to work with -blockdev. Getting it working with
-drive requires some extra custom parsing.

Signed-off-by: Daniel P. Berrangé 
---
 block/ssh.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/block/ssh.c b/block/ssh.c
index e0fbb4934b..fcc0ab765a 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -556,6 +556,11 @@ static bool ssh_process_legacy_options(QDict *output_opts,
 qdict_put_str(output_opts, "host-key-check.type", "sha1");
 qdict_put_str(output_opts, "host-key-check.hash",
   &host_key_check[5]);
+} else if (strncmp(host_key_check, "sha256:", 7) == 0) {
+qdict_put_str(output_opts, "host-key-check.mode", "hash");
+qdict_put_str(output_opts, "host-key-check.type", "sha256");
+qdict_put_str(output_opts, "host-key-check.hash",
+  &host_key_check[7]);
 } else if (strcmp(host_key_check, "yes") == 0) {
 qdict_put_str(output_opts, "host-key-check.mode", "known_hosts");
 } else {
-- 
2.31.1




[PATCH 3/3] block: print the server key type and fingerprint on failure

2021-11-18 Thread Daniel P . Berrangé
When validating the server key fingerprint fails, it is difficult for
the user to know what they got wrong. The fingerprint accepted by QEMU
is received in a different format than openssh displays. There can also
be keys for multiple different ciphers in known_hosts. It may not be
obvious which cipher QEMU will use and whether it will be the same
as openssh. Address this by printing the server key type and its
corresponding fingerprint in the format QEMU accepts.

Signed-off-by: Daniel P. Berrangé 
---
 block/ssh.c | 37 ++---
 1 file changed, 30 insertions(+), 7 deletions(-)

diff --git a/block/ssh.c b/block/ssh.c
index fcc0ab765a..967a2b971e 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -386,14 +386,28 @@ static int compare_fingerprint(const unsigned char 
*fingerprint, size_t len,
 return *host_key_check - '\0';
 }
 
+static char *format_fingerprint(const unsigned char *fingerprint, size_t len)
+{
+static const char *hex = "0123456789abcdef";
+char *ret = g_new0(char, (len * 2) + 1);
+for (size_t i = 0; i < len; i++) {
+ret[i * 2] = hex[((fingerprint[i] >> 4) & 0xf)];
+ret[(i * 2) + 1] = hex[(fingerprint[i] & 0xf)];
+}
+ret[len * 2] = '\0';
+return ret;
+}
+
 static int
 check_host_key_hash(BDRVSSHState *s, const char *hash,
-enum ssh_publickey_hash_type type, Error **errp)
+enum ssh_publickey_hash_type type, const char *typestr,
+Error **errp)
 {
 int r;
 ssh_key pubkey;
 unsigned char *server_hash;
 size_t server_hash_len;
+const char *keytype;
 
 r = ssh_get_server_publickey(s->session, &pubkey);
 if (r != SSH_OK) {
@@ -401,6 +415,8 @@ check_host_key_hash(BDRVSSHState *s, const char *hash,
 return -EINVAL;
 }
 
+keytype = ssh_key_type_to_char(ssh_key_type(pubkey));
+
 r = ssh_get_publickey_hash(pubkey, type, &server_hash, &server_hash_len);
 ssh_key_free(pubkey);
 if (r != 0) {
@@ -410,12 +426,16 @@ check_host_key_hash(BDRVSSHState *s, const char *hash,
 }
 
 r = compare_fingerprint(server_hash, server_hash_len, hash);
-ssh_clean_pubkey_hash(&server_hash);
 if (r != 0) {
-error_setg(errp, "remote host key does not match host_key_check '%s'",
-   hash);
+g_autofree char *server_fp = format_fingerprint(server_hash,
+server_hash_len);
+error_setg(errp, "remote host %s key fingerprint '%s:%s' "
+   "does not match host_key_check '%s:%s'",
+   keytype, typestr, server_fp, typestr, hash);
+ssh_clean_pubkey_hash(&server_hash);
 return -EPERM;
 }
+ssh_clean_pubkey_hash(&server_hash);
 
 return 0;
 }
@@ -436,13 +456,16 @@ static int check_host_key(BDRVSSHState *s, 
SshHostKeyCheck *hkc, Error **errp)
 case SSH_HOST_KEY_CHECK_MODE_HASH:
 if (hkc->u.hash.type == SSH_HOST_KEY_CHECK_HASH_TYPE_MD5) {
 return check_host_key_hash(s, hkc->u.hash.hash,
-   SSH_PUBLICKEY_HASH_MD5, errp);
+   SSH_PUBLICKEY_HASH_MD5, "md5",
+   errp);
 } else if (hkc->u.hash.type == SSH_HOST_KEY_CHECK_HASH_TYPE_SHA1) {
 return check_host_key_hash(s, hkc->u.hash.hash,
-   SSH_PUBLICKEY_HASH_SHA1, errp);
+   SSH_PUBLICKEY_HASH_SHA1, "sha1",
+   errp);
 } else if (hkc->u.hash.type == SSH_HOST_KEY_CHECK_HASH_TYPE_SHA256) {
 return check_host_key_hash(s, hkc->u.hash.hash,
-   SSH_PUBLICKEY_HASH_SHA256, errp);
+   SSH_PUBLICKEY_HASH_SHA256, "sha256",
+   errp);
 }
 g_assert_not_reached();
 break;
-- 
2.31.1




Re: [PATCH v2 12/13] hw/arm/aspeed: Replace drive_get_next() by drive_get()

2021-11-18 Thread Cédric Le Goater

On 11/17/21 17:34, Markus Armbruster wrote:

drive_get_next() is basically a bad idea.  It returns the "next" block
backend of a certain interface type.  "Next" means bus=0,unit=N, where
subsequent calls count N up from zero, per interface type.

This lets you define unit numbers implicitly by execution order.  If the
order changes, or new calls appear "in the middle", unit numbers change.
ABI break.  Hard to spot in review.

The aspeed machines connects backends with drive_get_next() in several
counting loops, one of them in a helper function, and a conditional.
Change it to use drive_get() directly.  This makes the unit numbers
explicit in the code.


I hope we can introduce some bus id. At least for the SPI controllers.
 

Cc: "Cédric Le Goater" 
Cc: Peter Maydell 
Cc: Andrew Jeffery 
Cc: Joel Stanley 
Cc: qemu-...@nongnu.org
Signed-off-by: Markus Armbruster 
---


Reviewed-by: Cédric Le Goater 

Thanks,

C.


  hw/arm/aspeed.c | 21 +
  1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index a77f46b3ad..cf20ae0db5 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -284,12 +284,13 @@ static void write_boot_rom(DriveInfo *dinfo, hwaddr addr, 
size_t rom_size,
  }
  
  static void aspeed_board_init_flashes(AspeedSMCState *s,

-  const char *flashtype)
+  const char *flashtype,
+  int unit0)
  {
  int i ;
  
  for (i = 0; i < s->num_cs; ++i) {

-DriveInfo *dinfo = drive_get_next(IF_MTD);
+DriveInfo *dinfo = drive_get(IF_MTD, 0, unit0 + i);
  qemu_irq cs_line;
  DeviceState *dev;
  
@@ -382,10 +383,12 @@ static void aspeed_machine_init(MachineState *machine)

"max_ram", max_ram_size  - machine->ram_size);
  memory_region_add_subregion(&bmc->ram_container, machine->ram_size, 
&bmc->max_ram);
  
-aspeed_board_init_flashes(&bmc->soc.fmc, bmc->fmc_model ?

-  bmc->fmc_model : amc->fmc_model);
-aspeed_board_init_flashes(&bmc->soc.spi[0], bmc->spi_model ?
-  bmc->spi_model : amc->spi_model);
+aspeed_board_init_flashes(&bmc->soc.fmc,
+  bmc->fmc_model ? bmc->fmc_model : amc->fmc_model,
+  0);
+aspeed_board_init_flashes(&bmc->soc.spi[0],
+  bmc->spi_model ? bmc->spi_model : amc->spi_model,
+  bmc->soc.fmc.num_cs);
  
  /* Install first FMC flash content as a boot rom. */

  if (drive0) {
@@ -435,11 +438,13 @@ static void aspeed_machine_init(MachineState *machine)
  }
  
  for (i = 0; i < bmc->soc.sdhci.num_slots; i++) {

-sdhci_attach_drive(&bmc->soc.sdhci.slots[i], drive_get_next(IF_SD));
+sdhci_attach_drive(&bmc->soc.sdhci.slots[i],
+   drive_get(IF_SD, 0, i));
  }
  
  if (bmc->soc.emmc.num_slots) {

-sdhci_attach_drive(&bmc->soc.emmc.slots[0], drive_get_next(IF_SD));
+sdhci_attach_drive(&bmc->soc.emmc.slots[0],
+   drive_get(IF_SD, 0, bmc->soc.sdhci.num_slots));
  }
  
  arm_load_kernel(ARM_CPU(first_cpu), machine, &aspeed_board_binfo);







Re: [PULL SUBSYSTEM qemu-pseries] pseries: Update SLOF firmware image

2021-11-18 Thread Cédric Le Goater
 

No need to rebase if Alexey sends his pullreq once v6.2 is out.


Yeah. We might do that Alexey. No hurries.

Thanks,

C.



[PATCH-for-6.2?] docs: Render binary names as monospaced text

2021-11-18 Thread Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé 
---
 docs/about/removed-features.rst|  8 
 docs/devel/build-system.rst|  6 +++---
 docs/devel/multi-process.rst   |  6 +++---
 docs/devel/testing.rst |  8 
 docs/image-fuzzer.txt  |  6 +++---
 docs/system/arm/orangepi.rst   |  2 +-
 docs/system/images.rst |  2 +-
 docs/system/qemu-block-drivers.rst.inc |  6 +++---
 docs/system/tls.rst|  2 +-
 docs/tools/qemu-img.rst| 18 +-
 docs/tools/qemu-nbd.rst|  4 ++--
 docs/tools/qemu-storage-daemon.rst |  7 ---
 docs/tools/virtiofsd.rst   |  4 ++--
 13 files changed, 40 insertions(+), 39 deletions(-)

diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst
index 9d0d90c90d9..c02d1f6d494 100644
--- a/docs/about/removed-features.rst
+++ b/docs/about/removed-features.rst
@@ -658,8 +658,8 @@ enforce that any failure to open the backing image 
(including if the
 backing file is missing or an incorrect format was specified) is an
 error when ``-u`` is not used.
 
-qemu-img amend to adjust backing file (removed in 6.1)
-''
+``qemu-img`` amend to adjust backing file (removed in 6.1)
+''
 
 The use of ``qemu-img amend`` to modify the name or format of a qcow2
 backing image was never fully documented or tested, and interferes
@@ -670,8 +670,8 @@ backing chain should be performed with ``qemu-img rebase 
-u`` either
 before or after the remaining changes being performed by amend, as
 appropriate.
 
-qemu-img backing file without format (removed in 6.1)
-'
+``qemu-img`` backing file without format (removed in 6.1)
+'
 
 The use of ``qemu-img create``, ``qemu-img rebase``, or ``qemu-img
 convert`` to create or modify an image that depends on a backing file
diff --git a/docs/devel/build-system.rst b/docs/devel/build-system.rst
index 7a83f5fc0db..431caba7aa0 100644
--- a/docs/devel/build-system.rst
+++ b/docs/devel/build-system.rst
@@ -121,11 +121,11 @@ process for:
 
 1) executables, which include:
 
-   - Tools - qemu-img, qemu-nbd, qga (guest agent), etc
+   - Tools - ``qemu-img``, ``qemu-nbd``, ``qga`` (guest agent), etc
 
-   - System emulators - qemu-system-$ARCH
+   - System emulators - ``qemu-system-$ARCH``
 
-   - Userspace emulators - qemu-$ARCH
+   - Userspace emulators - ``qemu-$ARCH``
 
- Unit tests
 
diff --git a/docs/devel/multi-process.rst b/docs/devel/multi-process.rst
index e5758a79aba..2c5ec977df8 100644
--- a/docs/devel/multi-process.rst
+++ b/docs/devel/multi-process.rst
@@ -187,9 +187,9 @@ desired, in which the emulation application should only be 
allowed to
 access the files or devices the VM it's running on behalf of can access.
  qemu-io model
 
-Qemu-io is a test harness used to test changes to the QEMU block backend
-object code. (e.g., the code that implements disk images for disk driver
-emulation) Qemu-io is not a device emulation application per se, but it
+``qemu-io`` is a test harness used to test changes to the QEMU block backend
+object code (e.g., the code that implements disk images for disk driver
+emulation). ``qemu-io`` is not a device emulation application per se, but it
 does compile the QEMU block objects into a separate binary from the main
 QEMU one. This could be useful for disk device emulation, since its
 emulation applications will need to include the QEMU block objects.
diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst
index 60c59023e58..755343c7dd0 100644
--- a/docs/devel/testing.rst
+++ b/docs/devel/testing.rst
@@ -564,11 +564,11 @@ exploiting a QEMU security bug to compromise the host.
 QEMU binaries
 ~
 
-By default, qemu-system-x86_64 is searched in $PATH to run the guest. If there
-isn't one, or if it is older than 2.10, the test won't work. In this case,
+By default, ``qemu-system-x86_64`` is searched in $PATH to run the guest. If
+there isn't one, or if it is older than 2.10, the test won't work. In this 
case,
 provide the QEMU binary in env var: ``QEMU=/path/to/qemu-2.10+``.
 
-Likewise the path to qemu-img can be set in QEMU_IMG environment variable.
+Likewise the path to ``qemu-img`` can be set in QEMU_IMG environment variable.
 
 Make jobs
 ~
@@ -650,7 +650,7 @@ supported. To start the fuzzer, run
 
   tests/image-fuzzer/runner.py -c '[["qemu-img", "info", "$test_img"]]' 
/tmp/test qcow2
 
-Alternatively, some command different from "qemu-img info" can be tested, by
+Alternatively, some command different from ``qemu-img info`` can be tested, by
 changing the ``-c`` option.
 
 Integration tests using the Avocado Framework
diff --git a/docs/image-fuzzer.txt b/docs/image-fuzzer.txt
index 3e23ebec331..279cc8c807f 100644
--- a/d

Re: [PATCH 1/3] ppc/pnv: Tune the POWER9 PCIe Host bridge model

2021-11-18 Thread Cédric Le Goater

On 11/16/21 18:01, Frederic Barrat wrote:

The PHB v4 found on POWER9 doesn't request any LSI, so let's clear the
Interrupt Pin register in the config space so that the model matches
the hardware.

If we don't, then we inherit from the default pcie root bridge, which
requests a LSI. And because we don't map it correctly in the device
tree, all PHBs allocate the same bogus hw interrupt. We end up with
inconsistent interrupt controller (xive) data. The problem goes away
if we don't allocate the LSI in the first place.

Signed-off-by: Frederic Barrat 


Reviewed-by: Cédric Le Goater 

Thanks,

C.



---
  hw/pci-host/pnv_phb4.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/pci-host/pnv_phb4.c b/hw/pci-host/pnv_phb4.c
index 5c375a9f28..1659d55b4f 100644
--- a/hw/pci-host/pnv_phb4.c
+++ b/hw/pci-host/pnv_phb4.c
@@ -1234,10 +1234,13 @@ static void pnv_phb4_reset(DeviceState *dev)
  PCIDevice *root_dev = PCI_DEVICE(&phb->root);
  
  /*

- * Configure PCI device id at reset using a property.
+ * Configure the PCI device at reset:
+ *   - set the Vendor and Device ID to for the root bridge
+ *   - no LSI
   */
  pci_config_set_vendor_id(root_dev->config, PCI_VENDOR_ID_IBM);
  pci_config_set_device_id(root_dev->config, phb->device_id);
+pci_config_set_interrupt_pin(root_dev->config, 0);
  }
  
  static const char *pnv_phb4_root_bus_path(PCIHostState *host_bridge,







Re: [PATCH 2/3] pci: Export the pci_intx() function

2021-11-18 Thread Cédric Le Goater

On 11/16/21 18:01, Frederic Barrat wrote:

Move the pci_intx() definition to the PCI header file, so that it can
be called from other PCI files. It is used by the next patch.

Signed-off-by: Frederic Barrat 



Reviewed-by: Cédric Le Goater 

Thanks,

C.



---
  hw/pci/pci.c | 5 -
  include/hw/pci/pci.h | 5 +
  2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index e5993c1ef5..249d7e4cf6 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1497,11 +1497,6 @@ static void pci_irq_handler(void *opaque, int irq_num, 
int level)
  pci_change_irq_level(pci_dev, irq_num, change);
  }
  
-static inline int pci_intx(PCIDevice *pci_dev)

-{
-return pci_get_byte(pci_dev->config + PCI_INTERRUPT_PIN) - 1;
-}
-
  qemu_irq pci_allocate_irq(PCIDevice *pci_dev)
  {
  int intx = pci_intx(pci_dev);
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index e7cdf2d5ec..35f8eb67bd 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -735,6 +735,11 @@ void lsi53c8xx_handle_legacy_cmdline(DeviceState *lsi_dev);
  qemu_irq pci_allocate_irq(PCIDevice *pci_dev);
  void pci_set_irq(PCIDevice *pci_dev, int level);
  
+static inline int pci_intx(PCIDevice *pci_dev)

+{
+return pci_get_byte(pci_dev->config + PCI_INTERRUPT_PIN) - 1;
+}
+
  static inline void pci_irq_assert(PCIDevice *pci_dev)
  {
  pci_set_irq(pci_dev, 1);






Re: [PATCH v2 10/13] hw/arm/xlnx-zcu102: Replace drive_get_next() by drive_get()

2021-11-18 Thread Edgar E. Iglesias
On Wed, Nov 17, 2021 at 05:34:06PM +0100, Markus Armbruster wrote:
> drive_get_next() is basically a bad idea.  It returns the "next" block
> backend of a certain interface type.  "Next" means bus=0,unit=N, where
> subsequent calls count N up from zero, per interface type.
> 
> This lets you define unit numbers implicitly by execution order.  If the
> order changes, or new calls appear "in the middle", unit numbers change.
> ABI break.  Hard to spot in review.
> 
> Machine "xlnx-zcu102" connects backends with drive_get_next() in
> several counting loops.  Change it to use drive_get() directly.  This
> makes the unit numbers explicit in the code.


Acked-by: Edgar E. Iglesias 



> 
> Cc: Alistair Francis 
> Cc: "Edgar E. Iglesias" 
> Cc: Peter Maydell 
> Cc: qemu-...@nongnu.org
> Signed-off-by: Markus Armbruster 
> ---
>  hw/arm/xlnx-zcu102.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/arm/xlnx-zcu102.c b/hw/arm/xlnx-zcu102.c
> index 3dc2b5e8ca..45eb19ab3b 100644
> --- a/hw/arm/xlnx-zcu102.c
> +++ b/hw/arm/xlnx-zcu102.c
> @@ -169,7 +169,7 @@ static void xlnx_zcu102_init(MachineState *machine)
>  /* Create and plug in the SD cards */
>  for (i = 0; i < XLNX_ZYNQMP_NUM_SDHCI; i++) {
>  BusState *bus;
> -DriveInfo *di = drive_get_next(IF_SD);
> +DriveInfo *di = drive_get(IF_SD, 0, i);
>  BlockBackend *blk = di ? blk_by_legacy_dinfo(di) : NULL;
>  DeviceState *carddev;
>  char *bus_name;
> @@ -190,7 +190,7 @@ static void xlnx_zcu102_init(MachineState *machine)
>  BusState *spi_bus;
>  DeviceState *flash_dev;
>  qemu_irq cs_line;
> -DriveInfo *dinfo = drive_get_next(IF_MTD);
> +DriveInfo *dinfo = drive_get(IF_MTD, 0, i);
>  gchar *bus_name = g_strdup_printf("spi%d", i);
>  
>  spi_bus = qdev_get_child_bus(DEVICE(&s->soc), bus_name);
> @@ -212,7 +212,7 @@ static void xlnx_zcu102_init(MachineState *machine)
>  BusState *spi_bus;
>  DeviceState *flash_dev;
>  qemu_irq cs_line;
> -DriveInfo *dinfo = drive_get_next(IF_MTD);
> +DriveInfo *dinfo = drive_get(IF_MTD, 0, XLNX_ZYNQMP_NUM_SPIS + i);
>  int bus = i / XLNX_ZYNQMP_NUM_QSPI_BUS_CS;
>  gchar *bus_name = g_strdup_printf("qspi%d", bus);
>  
> -- 
> 2.31.1
> 



Re: [PATCH v2 11/13] hw/arm/xilinx_zynq: Replace drive_get_next() by drive_get()

2021-11-18 Thread Edgar E. Iglesias
On Wed, Nov 17, 2021 at 05:34:07PM +0100, Markus Armbruster wrote:
> drive_get_next() is basically a bad idea.  It returns the "next" block
> backend of a certain interface type.  "Next" means bus=0,unit=N, where
> subsequent calls count N up from zero, per interface type.
> 
> This lets you define unit numbers implicitly by execution order.  If the
> order changes, or new calls appear "in the middle", unit numbers change.
> ABI break.  Hard to spot in review.
> 
> Machine "xlnx-zcu102" connects backends with drive_get_next() in two
> counting loops, one of them in a helper function.  Change it to use
> drive_get() directly.  This makes the unit numbers explicit in the
> code.


Acked-by: Edgar E. Iglesias 


> 
> Cc: "Edgar E. Iglesias" 
> Cc: Alistair Francis 
> Cc: Peter Maydell 
> Cc: qemu-...@nongnu.org
> Signed-off-by: Markus Armbruster 
> ---
>  hw/arm/xilinx_zynq.c | 16 +---
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
> index 69c333e91b..50e7268396 100644
> --- a/hw/arm/xilinx_zynq.c
> +++ b/hw/arm/xilinx_zynq.c
> @@ -125,9 +125,10 @@ static void gem_init(NICInfo *nd, uint32_t base, 
> qemu_irq irq)
>  sysbus_connect_irq(s, 0, irq);
>  }
>  
> -static inline void zynq_init_spi_flashes(uint32_t base_addr, qemu_irq irq,
> - bool is_qspi)
> +static inline int zynq_init_spi_flashes(uint32_t base_addr, qemu_irq irq,
> +bool is_qspi, int unit0)
>  {
> +int unit = unit0;
>  DeviceState *dev;
>  SysBusDevice *busdev;
>  SSIBus *spi;
> @@ -156,7 +157,7 @@ static inline void zynq_init_spi_flashes(uint32_t 
> base_addr, qemu_irq irq,
>  spi = (SSIBus *)qdev_get_child_bus(dev, bus_name);
>  
>  for (j = 0; j < num_ss; ++j) {
> -DriveInfo *dinfo = drive_get_next(IF_MTD);
> +DriveInfo *dinfo = drive_get(IF_MTD, 0, unit++);
>  flash_dev = qdev_new("n25q128");
>  if (dinfo) {
>  qdev_prop_set_drive_err(flash_dev, "drive",
> @@ -170,6 +171,7 @@ static inline void zynq_init_spi_flashes(uint32_t 
> base_addr, qemu_irq irq,
>  }
>  }
>  
> +return unit;
>  }
>  
>  static void zynq_init(MachineState *machine)
> @@ -247,9 +249,9 @@ static void zynq_init(MachineState *machine)
>  pic[n] = qdev_get_gpio_in(dev, n);
>  }
>  
> -zynq_init_spi_flashes(0xE0006000, pic[58-IRQ_OFFSET], false);
> -zynq_init_spi_flashes(0xE0007000, pic[81-IRQ_OFFSET], false);
> -zynq_init_spi_flashes(0xE000D000, pic[51-IRQ_OFFSET], true);
> +n = zynq_init_spi_flashes(0xE0006000, pic[58 - IRQ_OFFSET], false, 0);
> +n = zynq_init_spi_flashes(0xE0007000, pic[81 - IRQ_OFFSET], false, n);
> +n = zynq_init_spi_flashes(0xE000D000, pic[51 - IRQ_OFFSET], true, n);
>  
>  sysbus_create_simple(TYPE_CHIPIDEA, 0xE0002000, pic[53 - IRQ_OFFSET]);
>  sysbus_create_simple(TYPE_CHIPIDEA, 0xE0003000, pic[76 - IRQ_OFFSET]);
> @@ -298,7 +300,7 @@ static void zynq_init(MachineState *machine)
>  sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, hci_addr);
>  sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[hci_irq - 
> IRQ_OFFSET]);
>  
> -di = drive_get_next(IF_SD);
> +di = drive_get(IF_SD, 0, n);
>  blk = di ? blk_by_legacy_dinfo(di) : NULL;
>  carddev = qdev_new(TYPE_SD_CARD);
>  qdev_prop_set_drive_err(carddev, "drive", blk, &error_fatal);
> -- 
> 2.31.1
> 



Re: [PATCH 3/3] pcie_aer: Don't trigger a LSI if none are defined

2021-11-18 Thread Cédric Le Goater

On 11/16/21 18:01, Frederic Barrat wrote:

Skip triggering an LSI when the AER root error status is updated if no
LSI is defined for the device. We can have a root bridge with no LSI,
MSI and MSI-X defined, for example on POWER systems.

Signed-off-by: Frederic Barrat 
---



Reviewed-by: Cédric Le Goater 

Thanks,

C.




  hw/pci/pcie_aer.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c
index 27f9cc56af..e1a8a88c8c 100644
--- a/hw/pci/pcie_aer.c
+++ b/hw/pci/pcie_aer.c
@@ -774,7 +774,9 @@ void pcie_aer_root_write_config(PCIDevice *dev,
  uint32_t root_cmd = pci_get_long(aer_cap + PCI_ERR_ROOT_COMMAND);
  /* 6.2.4.1.2 Interrupt Generation */
  if (!msix_enabled(dev) && !msi_enabled(dev)) {
-pci_set_irq(dev, !!(root_cmd & enabled_cmd));
+if (pci_intx(dev) != -1) {
+pci_set_irq(dev, !!(root_cmd & enabled_cmd));
+}
  return;
  }
  






Re: [PATCH] block/vvfat.c fix leak when failure occurs

2021-11-18 Thread Hanna Reitz

On 18.11.21 10:33, Daniella Lee wrote:

Thanks for your reply and your suggestion is useful.
This is my first submission, and I will pay attention to these issues 
in the future.

There are many other places you mentioned need to be modified,
do I need to resubmit the patch, or you want to modify them with other 
codes?


Hi,

Yes, you should send a v2 that addresses the issues.

As for the suggestions that concern places outside of `vvfat_open()`:  
Most of them need not be your concern if you don’t want them to be, but 
we definitely need to have `s->used_clusters` be allocated with 
`g_malloc0()`, or we can’t free it with `g_free()`.  (We could free it 
with `free()`, but that would be a suboptimal solution...)  So that 
allocation line in `enable_write_target()` should be fixed in v2.  The 
best way to do that would be to do it in an own patch (i.e. patch 1 
changes that allocation, and patch 2 is this patch), but I wouldn’t mind 
it too much if you do both changes in a single patch.


Regarding the other suggestions for other places: It would be nice to 
drop the clean-up path in `enable_write_target()`, but isn’t necessary.  
If you want to do it, you can do it as part of this patch, or on top in 
another patch, but you can also choose just to ignore that bit.  (And 
maybe then I’ll send a patch later.)


The fact that we’re leaking two of these buffers in `vvfat_close()` is 
definitely unrelated to this patch, so that’s also nothing you have to 
care about if you don’t want to.


I hope that made it clear...?  Don’t hesitate to ask more if it didn’t 
(or for any other questions you might have).


Hanna


On Thu, Nov 18, 2021 at 4:34 PM Hanna Reitz  wrote:

On 16.11.21 13:57, Daniella Lee wrote:
> Function vvfat_open called function enable_write_target and
init_directories,
> and these functions malloc new memory for
BDRVVVFATState::qcow_filename,
> BDRVVVFATState::used_clusters, and BDRVVVFATState::cluster_buff.
>
> When the specified folder does not exist ,it may contains memory
leak.
> After init_directories function is executed, the vvfat_open
return -EIO,
> and bdrv_open_driver goto label open_failed,
> the program use g_free(bs->opaque) to release BDRVVVFATState struct
> without members mentioned.
>
> command line:
> qemu-system-x86_64 -hdb   -usb -device
usb-storage,drive=fat16
> -drive file=fat:rw:fat-type=16:"",
> id=fat16,format=raw,if=none
>
> enable_write_target called:
> (gdb) bt
> #0  enable_write_target (bs=0x56f9f000, errp=0x7fffd780)
>      at ../block/vvfat.c:3114
> #1  vvfat_open (bs=0x56f9f000, options=0x56fa45d0,
>      flags=155650, errp=0x7fffd780) at ../block/vvfat.c:1236
> #2  bdrv_open_driver (bs=0x56f9f000, drv=0x56c47920
,
>      node_name=0x0, options=0x56fa45d0, open_flags=155650,
>      errp=0x7fffd890) at ../block.c:1558
> #3  bdrv_open_common (bs=0x56f9f000, file=0x0,
options=0x56fa45d0,
>      errp=0x7fffd890) at ../block.c:1852
> #4  bdrv_open_inherit (filename=0x56f73310 "fat:rw:",
>      reference=0x0, options=0x56fa45d0, flags=40962,
parent=0x56f98cd0,
>      child_class=0x56b1d6a0 , child_role=19,
>      errp=0x7fffda90) at ../block.c:3779
> #5  bdrv_open_child_bs (filename=0x56f73310 "fat:rw:",
>      options=0x56f9cfc0, bdref_key=0x56239bb8 "file",
>      parent=0x56f98cd0, child_class=0x56b1d6a0
,
>      child_role=19, allow_none=true, errp=0x7fffda90) at
../block.c:3419
> #6  bdrv_open_inherit (filename=0x56f73310 "fat:rw:",
>      reference=0x0, options=0x56f9cfc0, flags=8194, parent=0x0,
>      child_class=0x0, child_role=0, errp=0x56c98c40
)
>      at ../block.c:3726
> #7  bdrv_open (filename=0x56f73310 "fat:rw:",
reference=0x0,
>      options=0x56f757b0, flags=0, errp=0x56c98c40
)
>      at ../block.c:3872
> #8  blk_new_open (filename=0x56f73310 "fat:rw:",
reference=0x0,
>      options=0x56f757b0, flags=0, errp=0x56c98c40
)
>      at ../block/block-backend.c:436
> #9  blockdev_init (file=0x56f73310 "fat:rw:",
>      bs_opts=0x56f757b0, errp=0x56c98c40 )
>      at ../blockdev.c:608
> #10 drive_new (all_opts=0x56d2b700, block_default_type=IF_IDE,
>      errp=0x56c98c40 ) at ../blockdev.c:992
> ..
>
> Signed-off-by: Daniella Lee 
> ---
>   block/vvfat.c | 15 +++
>   1 file changed, 15 insertions(+)

Hi,

Thanks for your patch!  Yes, that makes sense.

I believe there are some issues that should be addressed, though:

> diff --git a/block/vvfat.c b/block/vvfat.c
> index 05e78e3c27..454a74c5d5 100644
> --- a/block/vvfat.c
> +++ b/block/vvfat.c
> @@ -1280,7 +1280,22 @@ static int vvfat_op

Re: [PATCH v2 09/13] hw/microblaze: Replace drive_get_next() by drive_get()

2021-11-18 Thread Edgar E. Iglesias
On Wed, Nov 17, 2021 at 05:34:05PM +0100, Markus Armbruster wrote:
> drive_get_next() is basically a bad idea.  It returns the "next" block
> backend of a certain interface type.  "Next" means bus=0,unit=N, where
> subsequent calls count N up from zero, per interface type.
> 
> This lets you define unit numbers implicitly by execution order.  If the
> order changes, or new calls appear "in the middle", unit numbers change.
> ABI break.  Hard to spot in review.
> 
> Machine "petalogix-ml605" connects backends with drive_get_next() in a
> counting loop.  Change it to use drive_get() directly.  This makes the
> unit numbers explicit in the code.


Acked-by: Edgar E. Iglesias 


> 
> Cc: "Edgar E. Iglesias" 
> Signed-off-by: Markus Armbruster 
> ---
>  hw/microblaze/petalogix_ml605_mmu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/microblaze/petalogix_ml605_mmu.c 
> b/hw/microblaze/petalogix_ml605_mmu.c
> index 159db6cbe2..a24fadddca 100644
> --- a/hw/microblaze/petalogix_ml605_mmu.c
> +++ b/hw/microblaze/petalogix_ml605_mmu.c
> @@ -183,7 +183,7 @@ petalogix_ml605_init(MachineState *machine)
>  spi = (SSIBus *)qdev_get_child_bus(dev, "spi");
>  
>  for (i = 0; i < NUM_SPI_FLASHES; i++) {
> -DriveInfo *dinfo = drive_get_next(IF_MTD);
> +DriveInfo *dinfo = drive_get(IF_MTD, 0, i);
>  qemu_irq cs_line;
>  
>  dev = qdev_new("n25q128");
> -- 
> 2.31.1
> 



Re: [PATCH v2 08/13] hw/arm/xlnx-versal-virt: Replace drive_get_next() by drive_get()

2021-11-18 Thread Edgar E. Iglesias
On Wed, Nov 17, 2021 at 05:34:04PM +0100, Markus Armbruster wrote:
> drive_get_next() is basically a bad idea.  It returns the "next" block
> backend of a certain interface type.  "Next" means bus=0,unit=N, where
> subsequent calls count N up from zero, per interface type.
> 
> This lets you define unit numbers implicitly by execution order.  If the
> order changes, or new calls appear "in the middle", unit numbers change.
> ABI break.  Hard to spot in review.
> 
> Machine "xlnx-versal-virt" connects backends with drive_get_next() in
> a counting loop.  Change it to use drive_get() directly.  This makes
> the unit numbers explicit in the code.

Acked-by: Edgar E. Iglesias 


> 
> Cc: Alistair Francis 
> Cc: "Edgar E. Iglesias" 
> Cc: Peter Maydell 
> Cc: qemu-...@nongnu.org
> Signed-off-by: Markus Armbruster 
> ---
>  hw/arm/xlnx-versal-virt.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
> index d2f55e29b6..0c5edc898e 100644
> --- a/hw/arm/xlnx-versal-virt.c
> +++ b/hw/arm/xlnx-versal-virt.c
> @@ -669,7 +669,8 @@ static void versal_virt_init(MachineState *machine)
>  
>  /* Plugin SD cards.  */
>  for (i = 0; i < ARRAY_SIZE(s->soc.pmc.iou.sd); i++) {
> -sd_plugin_card(&s->soc.pmc.iou.sd[i], drive_get_next(IF_SD));
> +sd_plugin_card(&s->soc.pmc.iou.sd[i],
> +   drive_get(IF_SD, 0, i));
>  }
>  
>  s->binfo.ram_size = machine->ram_size;
> -- 
> 2.31.1
> 



[PATCH-for-6.2? v2 1/5] docs/devel/style: Render C types as monospaced text

2021-11-18 Thread Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé 
---
 docs/devel/style.rst | 59 ++--
 1 file changed, 30 insertions(+), 29 deletions(-)

diff --git a/docs/devel/style.rst b/docs/devel/style.rst
index e00af62e763..3e519dc6ade 100644
--- a/docs/devel/style.rst
+++ b/docs/devel/style.rst
@@ -111,7 +111,7 @@ Variables are lower_case_with_underscores; easy to type and 
read.  Structured
 type names are in CamelCase; harder to type but standing out.  Enum type
 names and function type names should also be in CamelCase.  Scalar type
 names are lower_case_with_underscores_ending_with_a_t, like the POSIX
-uint64_t and family.  Note that this last convention contradicts POSIX
+``uint64_t`` and family.  Note that this last convention contradicts POSIX
 and is therefore likely to be changed.
 
 Variable Naming Conventions
@@ -290,57 +290,57 @@ a few useful guidelines here.
 Scalars
 ---
 
-If you're using "int" or "long", odds are good that there's a better type.
-If a variable is counting something, it should be declared with an
-unsigned type.
+If you're using '``int``' or '``long``', odds are good that there's a better
+type.  If a variable is counting something, it should be declared with an
+*unsigned* type.
 
-If it's host memory-size related, size_t should be a good choice (use
-ssize_t only if required). Guest RAM memory offsets must use ram_addr_t,
+If it's host memory-size related, ``size_t`` should be a good choice (use
+``ssize_t`` only if required). Guest RAM memory offsets must use 
``ram_addr_t``,
 but only for RAM, it may not cover whole guest address space.
 
-If it's file-size related, use off_t.
-If it's file-offset related (i.e., signed), use off_t.
-If it's just counting small numbers use "unsigned int";
+If it's file-size related, use ``off_t``.
+If it's file-offset related (i.e., signed), use ``off_t``.
+If it's just counting small numbers use '``unsigned int``';
 (on all but oddball embedded systems, you can assume that that
 type is at least four bytes wide).
 
 In the event that you require a specific width, use a standard type
-like int32_t, uint32_t, uint64_t, etc.  The specific types are
+like ``int32_t``, ``uint32_t``, ``uint64_t``, etc.  The specific types are
 mandatory for VMState fields.
 
-Don't use Linux kernel internal types like u32, __u32 or __le32.
+Don't use Linux kernel internal types like ``u32``, ``__u32`` or ``__le32``.
 
-Use hwaddr for guest physical addresses except pcibus_t
-for PCI addresses.  In addition, ram_addr_t is a QEMU internal address
+Use ``hwaddr`` for guest physical addresses except ``pcibus_t``
+for PCI addresses.  In addition, ``ram_addr_t`` is a QEMU internal address
 space that maps guest RAM physical addresses into an intermediate
 address space that can map to host virtual address spaces.  Generally
-speaking, the size of guest memory can always fit into ram_addr_t but
+speaking, the size of guest memory can always fit into ``ram_addr_t`` but
 it would not be correct to store an actual guest physical address in a
-ram_addr_t.
+``ram_addr_t``.
 
 For CPU virtual addresses there are several possible types.
-vaddr is the best type to use to hold a CPU virtual address in
+``vaddr`` is the best type to use to hold a CPU virtual address in
 target-independent code. It is guaranteed to be large enough to hold a
 virtual address for any target, and it does not change size from target
 to target. It is always unsigned.
-target_ulong is a type the size of a virtual address on the CPU; this means
+``target_ulong`` is a type the size of a virtual address on the CPU; this means
 it may be 32 or 64 bits depending on which target is being built. It should
 therefore be used only in target-specific code, and in some
 performance-critical built-per-target core code such as the TLB code.
-There is also a signed version, target_long.
-abi_ulong is for the ``*``-user targets, and represents a type the size of
-'void ``*``' in that target's ABI. (This may not be the same as the size of a
+There is also a signed version, ``target_long``.
+``abi_ulong`` is for the ``*-user`` targets, and represents a type the size of
+'``void *``' in that target's ABI. (This may not be the same as the size of a
 full CPU virtual address in the case of target ABIs which use 32 bit pointers
-on 64 bit CPUs, like sparc32plus.) Definitions of structures that must match
+on 64 bit CPUs, like *sparc32plus*.) Definitions of structures that must match
 the target's ABI must use this type for anything that on the target is defined
-to be an 'unsigned long' or a pointer type.
-There is also a signed version, abi_long.
+to be an '``unsigned long``' or a pointer type.
+There is also a signed version, ``abi_long``.
 
 Of course, take all of the above with a grain of salt.  If you're about
-to use some system interface that requires a type like size_t, pid_t or
-off_t, use matching types for any corresponding variables.
+to use some system interface that requires a type 

[PATCH-for-6.2? v2 4/5] docs/devel/style: Render C function names as monospaced text

2021-11-18 Thread Philippe Mathieu-Daudé
Add trailing parenthesis to functions and render
them as monospaced text.

Signed-off-by: Philippe Mathieu-Daudé 
---
 docs/devel/style.rst | 66 +++-
 1 file changed, 34 insertions(+), 32 deletions(-)

diff --git a/docs/devel/style.rst b/docs/devel/style.rst
index a7487d867e6..0397971e528 100644
--- a/docs/devel/style.rst
+++ b/docs/devel/style.rst
@@ -130,13 +130,13 @@ Function Naming Conventions
 
 Wrapped version of standard library or GLib functions use a ``qemu_``
 prefix to alert readers that they are seeing a wrapped version, for
-example ``qemu_strtol`` or ``qemu_mutex_lock``.  Other utility functions
+example ``qemu_strtol()`` or ``qemu_mutex_lock()``.  Other utility functions
 that are widely called from across the codebase should not have any
-prefix, for example ``pstrcpy`` or bit manipulation functions such as
-``find_first_bit``.
+prefix, for example ``pstrcpy()`` or bit manipulation functions such as
+``find_first_bit()``.
 
 The ``qemu_`` prefix is also used for functions that modify global
-emulator state, for example ``qemu_add_vm_change_state_handler``.
+emulator state, for example ``qemu_add_vm_change_state_handler()``.
 However, if there is an obvious subsystem-specific prefix it should be
 used instead.
 
@@ -385,15 +385,16 @@ avoided.
 Low level memory management
 ===
 
-Use of the ``malloc/free/realloc/calloc/valloc/memalign/posix_memalign``
+Use of the
+``malloc()/free()/realloc()/calloc()/valloc()/memalign()/posix_memalign()``
 APIs is not allowed in the QEMU codebase. Instead of these routines,
 use the GLib memory allocation routines
-``g_malloc/g_malloc0/g_new/g_new0/g_realloc/g_free``
-or QEMU's ``qemu_memalign/qemu_blockalign/qemu_vfree`` APIs.
+``g_malloc()/g_malloc0()/g_new()/g_new0()/g_realloc()/g_free()``
+or QEMU's ``qemu_memalign()/qemu_blockalign()/qemu_vfree()`` APIs.
 
-Please note that ``g_malloc`` will exit on allocation failure, so
+Please note that ``g_malloc()`` will exit on allocation failure, so
 there is no need to test for failure (as you would have to with
-``malloc``). Generally using ``g_malloc`` on start-up is fine as the
+``malloc()``). Generally using ``g_malloc()`` on start-up is fine as the
 result of a failure to allocate memory is going to be a fatal exit
 anyway. There may be some start-up cases where failing is unreasonable
 (for example speculatively loading a large debug symbol table).
@@ -401,11 +402,11 @@ anyway. There may be some start-up cases where failing is 
unreasonable
 Care should be taken to avoid introducing places where the guest could
 trigger an exit by causing a large allocation. For small allocations,
 of the order of 4k, a failure to allocate is likely indicative of an
-overloaded host and allowing ``g_malloc`` to ``exit`` is a reasonable
+overloaded host and allowing ``g_malloc()`` to ``exit()`` is a reasonable
 approach. However for larger allocations where we could realistically
 fall-back to a smaller one if need be we should use functions like
-``g_try_new`` and check the result. For example this is valid approach
-for a time/space trade-off like ``tlb_mmu_resize_locked`` in the
+``g_try_new()`` and check the result. For example this is valid approach
+for a time/space trade-off like ``tlb_mmu_resize_locked()`` in the
 SoftMMU TLB code.
 
 If the lifetime of the allocation is within the function and there are
@@ -413,7 +414,7 @@ multiple exist paths you can also improve the readability 
of the code
 by using ``g_autofree`` and related annotations. See :ref:`autofree-ref`
 for more details.
 
-Calling ``g_malloc`` with a zero size is valid and will return NULL.
+Calling ``g_malloc()`` with a zero size is valid and will return ``NULL``.
 
 Prefer ``g_new(T, n)`` instead of ``g_malloc(sizeof(T) * n)`` for the following
 reasons:
@@ -430,14 +431,15 @@ Declarations like
 
 are acceptable, though.
 
-Memory allocated by ``qemu_memalign`` or ``qemu_blockalign`` must be freed with
-``qemu_vfree``, since breaking this will cause problems on Win32.
+Memory allocated by ``qemu_memalign()`` or ``qemu_blockalign()`` must be freed
+with ``qemu_vfree()``, since breaking this will cause problems on Win32.
 
 String manipulation
 ===
 
-Do not use the strncpy function.  As mentioned in the man page, it does *not*
-guarantee a NULL-terminated buffer, which makes it extremely dangerous to use.
+Do not use the ``strncpy()`` function.  As mentioned in the man page, it does
+*not* guarantee a ``NULL``-terminated buffer, which makes it extremely
+dangerous to use.
 It also zeros trailing destination bytes out to the specified length.  Instead,
 use this similar function when possible, but note its different signature:
 
@@ -445,14 +447,14 @@ use this similar function when possible, but note its 
different signature:
 
 void pstrcpy(char *dest, int dest_buf_size, const char *src)
 
-Don't use strcat because it can't check for buffer overflows, but:
+Don't use ``s

[PATCH-for-6.2? v2 0/5] docs/devel/style: Improve rST rendering

2021-11-18 Thread Philippe Mathieu-Daudé
Various changes in docs/devel/style.rst to improve its
rST rendering (around C types/qualifiers/functions).

Since v1:
- Addressed Darren Kenny comments on function names

Based-on: <2028144317.4106651-1-phi...@redhat.com>

Philippe Mathieu-Daudé (5):
  docs/devel/style: Render C types as monospaced text
  docs/devel/style: Improve Error** functions rST rendering
  docs/devel/style: Improve string format rST rendering
  docs/devel/style: Render C function names as monospaced text
  docs/devel/style: Misc rST rendering improvements

 docs/devel/style.rst | 222 ++-
 1 file changed, 113 insertions(+), 109 deletions(-)

-- 
2.31.1





[PATCH-for-6.2? v2 2/5] docs/devel/style: Improve Error** functions rST rendering

2021-11-18 Thread Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé 
---
 docs/devel/style.rst | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/docs/devel/style.rst b/docs/devel/style.rst
index 3e519dc6ade..1a23021bc3e 100644
--- a/docs/devel/style.rst
+++ b/docs/devel/style.rst
@@ -602,16 +602,16 @@ Error handling and reporting
 Reporting errors to the human user
 --
 
-Do not use printf(), fprintf() or monitor_printf().  Instead, use
-error_report() or error_vreport() from error-report.h.  This ensures the
-error is reported in the right place (current monitor or stderr), and in
-a uniform format.
+Do not use ``printf()``, ``fprintf()`` or ``monitor_printf()``.  Instead, use
+``error_report()`` or ``error_vreport()`` from error-report.h.  This ensures
+the error is reported in the right place (current monitor or ``stderr``), and
+in a uniform format.
 
-Use error_printf() & friends to print additional information.
+Use ``error_printf()`` & friends to print additional information.
 
-error_report() prints the current location.  In certain common cases
+``error_report()`` prints the current location.  In certain common cases
 like command line parsing, the current location is tracked
-automatically.  To manipulate it manually, use the loc_``*``() from
+automatically.  To manipulate it manually, use the ``loc_*()`` from
 error-report.h.
 
 Propagating errors
@@ -621,7 +621,7 @@ An error can't always be reported to the user right where 
it's detected,
 but often needs to be propagated up the call chain to a place that can
 handle it.  This can be done in various ways.
 
-The most flexible one is Error objects.  See error.h for usage
+The most flexible one is ``Error`` objects.  See error.h for usage
 information.
 
 Use the simplest suitable method to communicate success / failure to
@@ -631,10 +631,10 @@ error, non-negative / -errno, non-null / null, or Error 
objects.
 Example: when a function returns a non-null pointer on success, and it
 can fail only in one way (as far as the caller is concerned), returning
 null on failure is just fine, and certainly simpler and a lot easier on
-the eyes than propagating an Error object through an Error ``**`` 
parameter.
+the eyes than propagating an Error object through an ``Error **`` parameter.
 
 Example: when a function's callers need to report details on failure
-only the function really knows, use Error ``**``, and set suitable errors.
+only the function really knows, use ``Error **``, and set suitable errors.
 
 Do not report an error to the user when you're also returning an error
 for somebody else to handle.  Leave the reporting to the place that
@@ -643,17 +643,17 @@ consumes the error returned.
 Handling errors
 ---
 
-Calling exit() is fine when handling configuration errors during
+Calling ``exit()`` is fine when handling configuration errors during
 startup.  It's problematic during normal operation.  In particular,
-monitor commands should never exit().
+monitor commands should never ``exit()``.
 
-Do not call exit() or abort() to handle an error that can be triggered
+Do not call ``exit()`` or ``abort()`` to handle an error that can be triggered
 by the guest (e.g., some unimplemented corner case in guest code
 translation or device emulation).  Guests should not be able to
 terminate QEMU.
 
-Note that &error_fatal is just another way to exit(1), and &error_abort
-is just another way to abort().
+Note that ``&error_fatal`` is just another way to ``exit(1)``, and
+``&error_abort`` is just another way to ``abort()``.
 
 
 trace-events style
-- 
2.31.1




[PATCH-for-6.2? v2 5/5] docs/devel/style: Misc rST rendering improvements

2021-11-18 Thread Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé 
---
 docs/devel/style.rst | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/docs/devel/style.rst b/docs/devel/style.rst
index 0397971e528..1db50b70544 100644
--- a/docs/devel/style.rst
+++ b/docs/devel/style.rst
@@ -4,7 +4,7 @@ QEMU Coding Style
 
 .. contents:: Table of Contents
 
-Please use the script checkpatch.pl in the scripts directory to check
+Please use the script ``checkpatch.pl`` in the scripts directory to check
 patches before submitting.
 
 Formatting and style
@@ -195,9 +195,9 @@ blocks) are generally not allowed; declarations should be 
at the beginning
 of blocks.
 
 Every now and then, an exception is made for declarations inside a
-#ifdef or #ifndef block: if the code looks nicer, such declarations can
+``#ifdef`` or ``#ifndef`` block: if the code looks nicer, such declarations can
 be placed at the top of the block even if there are statements above.
-On the other hand, however, it's often best to move that #ifdef/#ifndef
+On the other hand, however, it's often best to move that ``#ifdef/#ifndef``
 block to a separate function altogether.
 
 Conditional statements
@@ -220,13 +220,13 @@ even when the constant is on the right.
 Comment style
 =
 
-We use traditional C-style /``*`` ``*``/ comments and avoid // comments.
+We use traditional C-style ``/* */`` comments and avoid ``//`` comments.
 
-Rationale: The // form is valid in C99, so this is purely a matter of
+Rationale: The ``//`` form is valid in C99, so this is purely a matter of
 consistency of style. The checkpatch script will warn you about this.
 
 Multiline comment blocks should have a row of stars on the left,
-and the initial /``*`` and terminating ``*``/ both on their own lines:
+and the initial ``/*`` and terminating ``*/`` both on their own lines:
 
 .. code-block:: c
 
@@ -274,11 +274,11 @@ Order include directives as follows:
 #include "..."   /* and finally QEMU headers. */
 
 The "qemu/osdep.h" header contains preprocessor macros that affect the behavior
-of core system headers like .  It must be the first include so that
-core system headers included by external libraries get the preprocessor macros
-that QEMU depends on.
+of core system headers like .  It must be the first include so
+that core system headers included by external libraries get the preprocessor
+macros that QEMU depends on.
 
-Do not include "qemu/osdep.h" from header files since the .c file will have
+Do not include ``"qemu/osdep.h"`` from header files since the .c file will have
 already included it.
 
 C types
@@ -352,9 +352,9 @@ casts, then reconsider or ask for help.
 Pointers
 
 
-Ensure that all of your pointers are "const-correct".
+Ensure that all of your pointers are "``const``-correct".
 Unless a pointer is used to modify the pointed-to storage,
-give it the "const" attribute.  That way, the reader knows
+give it the '``const``' attribute.  That way, the reader knows
 up-front that this is a read-only pointer.  Perhaps more
 importantly, if we're diligent about this, when you see a non-const
 pointer, you're guaranteed that it is used to modify the storage
@@ -505,7 +505,7 @@ painful. These are:
   the sign bit (ie it is an arithmetic shift, not a logical shift)
 
 In addition, QEMU assumes that the compiler does not use the latitude
-given in C99 and C11 to treat aspects of signed '<<' as undefined, as
+given in C99 and C11 to treat aspects of signed '``<<``' as undefined, as
 documented in the GNU Compiler Collection manual starting at version 4.0.
 
 .. _autofree-ref:
-- 
2.31.1




[PATCH-for-6.2? v2 3/5] docs/devel/style: Improve string format rST rendering

2021-11-18 Thread Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé 
---
 docs/devel/style.rst | 41 +
 1 file changed, 21 insertions(+), 20 deletions(-)

diff --git a/docs/devel/style.rst b/docs/devel/style.rst
index 1a23021bc3e..a7487d867e6 100644
--- a/docs/devel/style.rst
+++ b/docs/devel/style.rst
@@ -471,11 +471,11 @@ instead of plain strdup/strndup.
 Printf-style functions
 ==
 
-Whenever you add a new printf-style function, i.e., one with a format
-string argument and following "..." in its prototype, be sure to use
-gcc's printf attribute directive in the prototype.
+Whenever you add a new ``printf()``-style function, i.e., one with a format
+string argument and following '``...``' in its prototype, be sure to use
+gcc's ``printf()`` attribute directive in the prototype.
 
-This makes it so gcc's -Wformat and -Wformat-security options can do
+This makes it so gcc's ``-Wformat`` and ``-Wformat-security`` options can do
 their jobs and cross-check format strings with the number and types
 of arguments.
 
@@ -659,10 +659,10 @@ Note that ``&error_fatal`` is just another way to 
``exit(1)``, and
 trace-events style
 ==
 
-0x prefix
--
+``0x`` prefix
+-
 
-In trace-events files, use a '0x' prefix to specify hex numbers, as in:
+In trace-events files, use a '``0x``' prefix to specify hex numbers, as in:
 
 .. code-block:: c
 
@@ -676,27 +676,28 @@ PCI bus id):
 
 another_trace(int cssid, int ssid, int dev_num) "bus id: %x.%x.%04x"
 
-However, you can use '0x' for such groups if you want. Anyway, be sure that
+However, you can use '``0x``' for such groups if you want. Anyway, be sure that
 it is obvious that numbers are in hex, ex.:
 
 .. code-block:: c
 
 data_dump(uint8_t c1, uint8_t c2, uint8_t c3) "bytes (in hex): %02x %02x 
%02x"
 
-Rationale: hex numbers are hard to read in logs when there is no 0x prefix,
-especially when (occasionally) the representation doesn't contain any letters
-and especially in one line with other decimal numbers. Number groups are 
allowed
-to not use '0x' because for some things notations like %x.%x.%x are used not
-only in QEMU. Also dumping raw data bytes with '0x' is less readable.
+Rationale: hex numbers are hard to read in logs when there is no '``0x``'
+prefix, especially when (occasionally) the representation doesn't contain any
+letters and especially in one line with other decimal numbers. Number groups
+are allowed to not use '``0x``' because for some things notations like
+'``%x.%x.%x``' are used not only in QEMU. Also dumping raw data bytes with
+'``0x``' is less readable.
 
-'#' printf flag

+'``#``' printf flag
+---
 
-Do not use printf flag '#', like '%#x'.
+Do not use printf flag '``#``', like '``%#x``'.
 
-Rationale: there are two ways to add a '0x' prefix to printed number: '0x%...'
-and '%#...'. For consistency the only one way should be used. Arguments for
-'0x%' are:
+Rationale: there are two ways to add a '``0x``' prefix to printed number:
+'``0x%...``' and '``%#...``'. For consistency the only one way should be used.
+Arguments for '``0x%``' are:
 
 * it is more popular
-* '%#' omits the 0x for the value 0 which makes output inconsistent
+* '``%#``' omits the ``0x`` for the value ``0`` which makes output inconsistent
-- 
2.31.1




Re: [PATCH-for-6.2 1/2] hw/misc/bcm2835_property: Fix framebuffer with recent RPi kernels

2021-11-18 Thread Philippe Mathieu-Daudé
On 10/17/21 09:48, Benjamin Herrenschmidt wrote:
> The framebuffer driver fails to initialize with recent Raspberry Pi
> kernels, such as the ones shipped in the current RaspiOS images
> (with the out of tree bcm2708_fb.c driver)
> 
> The reason is that this driver uses a new firmware call to query the
> number of displays, and the fallback when this call fails is broken.
> 
> So implement the call and claim we have exactly one display
> 
> Signed-off-by: Benjamin Herrenschmidt 
> ---
>  hw/misc/bcm2835_property.c | 4 
>  1 file changed, 4 insertions(+)

Reviewed-by: Philippe Mathieu-Daudé 



Re: [PATCH v4 08/25] block: introduce assert_bdrv_graph_writable

2021-11-18 Thread Hanna Reitz

On 18.11.21 10:55, Emanuele Giuseppe Esposito wrote:


On 12/11/2021 15:40, Hanna Reitz wrote:

On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:

We want to be sure that the functions that write the child and
parent list of a bs are under BQL and drain.

BQL prevents from concurrent writings from the GS API, while
drains protect from I/O.

TODO: drains are missing in some functions using this assert.
Therefore a proper assertion will fail. Because adding drains
requires additional discussions, they will be added in future
series.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Stefan Hajnoczi 
---
  block.c    |  5 +
  block/io.c | 11 +++
  include/block/block_int-global-state.h | 10 +-
  3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 41c5883c5c..94bff5c757 100644
--- a/block.c
+++ b/block.c
@@ -2734,12 +2734,14 @@ static void 
bdrv_replace_child_noperm(BdrvChild *child,

  if (child->klass->detach) {
  child->klass->detach(child);
  }
+    assert_bdrv_graph_writable(old_bs);
  QLIST_REMOVE(child, next_parent);


I think this belongs above the .detach() call (and the QLIST_REMOVE() 
belongs into the .detach() implementation, as done in 
https://lists.nongnu.org/archive/html/qemu-block/2021-11/msg00240.html, 
which has been merged to Kevin’s block branch).


Yes, I rebased on kwolf/block branch. Thank you for pointing that out.



  }
  child->bs = new_bs;
  if (new_bs) {
+    assert_bdrv_graph_writable(new_bs);
  QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);


In both these places it’s a bit strange that the assertion is done on 
the child nodes.  The subgraph starting from them isn’t modified 
after all, so their subgraph technically doesn’t need to be 
writable.  I think a single assertion on the parent node would be 
preferable.


I presume the problem with that is that we don’t have the parent node 
here?  Do we need a new BdrvChildClass method that performs this 
assertion on the parent node?




Uhm I am not sure what you mean here.

Just to recap on how I see this: the assertion 
assert_bdrv_graph_writable(bs) is basically used to make sure we are 
protecting the write on some fields (childrens and parents lists in 
this patch) of a given @bs.


Oh, OK.  I understood it to mean that the subgraph starting at `bs` is 
mutable, i.e. including all of its recursive children.


And yes, you’re right, the child BDSs are indeed modified, too, so we 
should check them, too.


It should work like a rwlock: reading is allowed to be concurrent, but 
a write should stop all readers to prevent concurrency issues. This is 
achieved by draining.


Draining works on a subgraph, so I suppose it does mean that the whole 
subgraph will be mutable.  But no matter, at least new_bs is not in the 
parent’s subgraph, so it wouldn’t be included in the check if we only 
checked the parent.


Let's use the first case that you point out, old_bs (it's specular for 
new_bs):


>> +    assert_bdrv_graph_writable(old_bs);
>>   QLIST_REMOVE(child, next_parent);

So old_bs should be the child "son" (child->bs), meaning 
old_bs->parents contains the child. Therefore when a child is removed 
by old_bs, we need to be sure we are doing it safely.


So we should check that if old_bs exists, old_bs should be drained, to 
prevent any other iothread from reading the ->parents list that is 
being updated.


The only thing to keep in mind in this case is that just wrapping a 
drain around that won't be enough, because then the child won't be 
included in the drain_end(old_bs). Therefore the right way to cover 
this drain-wise once the assertion also checks for drains is:


drain_begin(old_bs)
assert_bdrv_graph_writable(old_bs)
QLIST_REMOVE(child, next_parent)
/* old_bs will be under drain_end, but not the child */
bdrv_parent_drained_end_single(child);
bdrv_drained_end(old_bs);

I think you agree on this so far.

Now I think your concern is related to the child "parent", namely 
child->opaque. The problem is that in the .detach and .attach 
callbacks we are firstly adding/removing the child from the list, and 
then calling drain on the subtree.


It was my impression that you’d want bdrv_replace_child_noperm() to 
always be called in a section where the subgraph starting from the 
parent BDS is drained, not just the child BDSs that are swapped (old_bs 
and new_bs).


My abstract concern is that bdrv_replace_child_noperm() does not modify 
only old_bs and new_bs, but actually modifies the whole subgraph 
starting at the parent node.  A concrete example for this is that we 
modify not only the children’s parent lists, but also the parent’s 
children list.


That’s why I’m asking why we only check that the graph is writable for 
the children, when actually I feel like we’re modifying the parent, too.



We

  1   2   3   >