date:20231016

Re: [PATCH v4 03/10] migration: migrate 'inc' command option is deprecated.

2023-10-16 Thread Juan Quintela

Markus Armbruster  wrote:
> Juan Quintela  writes:
>
>> Set the 'block_incremental' migration parameter to 'true' instead.
>>
>> Reviewed-by: Thomas Huth 
>> Acked-by: Stefan Hajnoczi 
>> Signed-off-by: Juan Quintela 
>>
>> ---
>>
>> Improve documentation and style (thanks Markus)
>> ---
>>  docs/about/deprecated.rst | 7 +++
>>  qapi/migration.json   | 8 +++-
>>  migration/migration.c | 6 ++
>>  3 files changed, 20 insertions(+), 1 deletion(-)
>>
>> diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
>> index 1c4d7f36f0..1b6b2870cf 100644
>> --- a/docs/about/deprecated.rst
>> +++ b/docs/about/deprecated.rst
>> @@ -452,3 +452,10 @@ Migration
>>  ``skipped`` field in Migration stats has been deprecated.  It hasn't
>>  been used for more than 10 years.
>>  
>> +``inc`` migrate command option (since 8.2)
>> +''''''''''''''''''''''''''''''''''''''''''
>> +
>> +The new way to modify migration is using migration parameters.
>> +``inc`` functionality can be achieved by setting the
>> +``block-incremental`` migration parameter to ``true``.
>> +
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index 6865fea3c5..56bbd55b87 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -1492,6 +1492,11 @@
>>  #
>>  # @resume: resume one paused migration, default "off". (since 3.0)
>>  #
>> +# Features:
>> +#
>> +# @deprecated: Member @inc is deprecated.  Use migration parameter
>> +# @block-incremental instead.
>
> This is fine now.  It becomes bad advice in PATCH 05, which deprecates
> @block-incremental.  Two solutions:
>
> 1. Change this patch to point to an alternative that will *not* be
> deprecated.

Ok, clearly I am not explaining myself properly O:-)

History of block migration:
* In the beginning there were the -b and -i migrate options.
  That was the only way to do storage migration.
* We moved to using parameters and capabilities for migration,
  so we created @block-incremental and @block.
  But we didn't remove the command line options (for backward
  compatibility).
* We were asked to modify migration so that some storage was migrated
  and some was not.  But the block people found that
  it was a good idea to allow storage migration without migrating the
  VM, so they created this blockdev-mirror mechanism that is shiny,
  fun, faster, better.

So now we have old code that basically nobody uses (the last big user
was COLO, but now it can use multifd).  So we want to drop it, but we
don't care about a direct replacement.

So, why am I interested in removing this?
- @block and @block-incremental: If you don't use block migration, their
  existence doesn't bother you.  They are "quite" independent of the rest
  of the migration code (they could be better integrated, but that is no
  big trouble here).
- migrate options -i/-b: These hurt us each time we need to
  change the options.  Notice that we have "perfect" replacements
  in @block and @block-incremental, with exactly the same
  result/semantics/...
  You can see the troubles in the RFC patches:

 * [PATCH v4 07/10] [RFC] migration: Make -i/-b an error for hmp and qmp
 * [PATCH v4 08/10] [RFC] migration: Remove helpers needed for -i/-b migrate options

So what I want is to remove -i/-b in the next version (9.0?).  As for
the others, I want to remove them too, but I don't care if the code stays
around in "deprecated" state for another couple of years if there are still
people who feel that they want it.

This is why I pointed -i/-b to
@block/@block-incremental.  They are "perfect" replacements.

I could point here to blockdev-mirror + NBD instead, but that replacement
is not so direct.

Does this make sense?
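
The "perfect replacement" argument above can be sketched as a QMP command
sequence.  This is only an illustration, assuming the 'block' migration
capability and the 'block-incremental' migration parameter as they exist
in QEMU 8.1; the helper name block_migration_commands and the example URI
are hypothetical and not part of the series.

```python
import json


def block_migration_commands(uri, incremental=False):
    """Build QMP commands that stand in for 'migrate -b' / 'migrate -i'.

    Hypothetical helper: it only builds the JSON messages, it does not
    talk to a QEMU instance.
    """
    cmds = [
        # Replacement for -b: enable the 'block' capability.
        {"execute": "migrate-set-capabilities",
         "arguments": {"capabilities": [
             {"capability": "block", "state": True}]}},
    ]
    if incremental:
        # Replacement for -i: set the 'block-incremental' parameter.
        cmds.append({"execute": "migrate-set-parameters",
                     "arguments": {"block-incremental": True}})
    # The migrate command itself no longer carries 'blk' or 'inc'.
    cmds.append({"execute": "migrate", "arguments": {"uri": uri}})
    return cmds


if __name__ == "__main__":
    for cmd in block_migration_commands("tcp:192.0.2.1:4444",
                                        incremental=True):
        print(json.dumps(cmd))
```

The point of the sketch is that the final migrate command needs no
block-specific members once the capability and parameter are set.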


> 2. Change PATCH 05.
>
> Same end result.
>
>> +#
>>  # Returns: nothing on success
>>  #
>>  # Since: 0.14
>> @@ -1513,7 +1518,8 @@
>>  # <- { "return": {} }
>>  ##
>>  { 'command': 'migrate',
>> -  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool',
>> +  'data': {'uri': 'str', '*blk': 'bool',
>> +   '*inc': { 'type': 'bool', 'features': [ 'deprecated' ] },
>> '*detach': 'bool', '*resume': 'bool' } }
>>  
>>  ##
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 1c6c81ad49..ac4897fe0d 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -1601,6 +1601,12 @@ static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc,
>>  {
>>  Error *local_err = NULL;
>>  
>> +if (blk_inc) {
>> +warn_report("@inc/-i migrate option is deprecated, set the"
>
> This is either about QMP migrate's parameter "inc", or HMP migrate's
> flags -i.

Needs to be @inc.  I warn about the "-i" command option in another place.

> In the former case, we want something like "parameter 'inc' is
> deprecated".

This one.

> In the latter case, we want something like "-i is deprecated".

Ok, changing.

> Trying to do both in a single message results in a sub-par message.  If
> you want to do better, y

Re: [PATCH 3/4] hw/pci-host/bonito: Access memory regions via pci_address_space[_io]()

2023-10-16 Thread Philippe Mathieu-Daudé


On 16/10/23 00:19, Bernhard Beschow wrote:

On 11 October 2023 at 18:59:53 UTC, "Philippe Mathieu-Daudé" wrote:

PCI functions are plugged on a PCI bus. They can only access
external memory regions via the bus.

Signed-off-by: Philippe Mathieu-Daudé 
---
hw/pci-host/bonito.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)




@@ -719,7 +719,7 @@ static void bonito_pci_realize(PCIDevice *dev, Error **errp)

 memory_region_init_alias(pcimem_alias, NULL, "pci.mem.alias",
  &bs->pci_mem, 0, BONITO_PCIHI_SIZE);
-memory_region_add_subregion(get_system_memory(),
+memory_region_add_subregion(pci_address_space(dev),


I need to keep `get_system_memory()` here to get the same results for
`info mtree` in the QEMU console before and after this patch when running
`qemu-system-mips64el -M fuloong2e -S`. The other two changes above seem to
work as expected.


Good catch, thank you Bernhard!

Re: [PATCH v2 1/4] migration: check for rate_limit_max for RATE_LIMIT_DISABLED

2023-10-16 Thread Juan Quintela

Elena Ufimtseva  wrote:
> In migration rate limiting atomic operations are used
> to read the rate limit variables and transferred bytes and
> they are expensive. Check first if rate_limit_max is equal
> to RATE_LIMIT_DISABLED and return false immediately if so.
>
> Note that with this patch we will also stop flushing
> by not calling qemu_fflush() from migration_transferred_bytes()
> if the migration rate is not exceeded.
> This should be fine since the migration thread calls
> migration_update_counters() in its loop from migration_rate_limit(),
> which calls migration_transferred_bytes() and flushes there.
>
> Signed-off-by: Elena Ufimtseva 
> Reviewed-by: Fabiano Rosas 
> Reviewed-by: Peter Xu 

Reviewed-by: Juan Quintela 

queued.
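
The early return described in the commit message can be modelled in a few
lines.  This is a sketch only, not the QEMU code: the class RateLimiter,
its fields and the expensive_reads counter are hypothetical, and the value
0 for RATE_LIMIT_DISABLED is an assumption made for the illustration.

```python
RATE_LIMIT_DISABLED = 0  # assumption: sentinel meaning "no limit"


class RateLimiter:
    """Toy model of the rate-limit check with an early exit."""

    def __init__(self, rate_limit_max):
        self.rate_limit_max = rate_limit_max
        self.rate_limit_start = 0
        self._transferred = 0
        self.expensive_reads = 0  # counts the costly counter reads

    def add(self, nbytes):
        self._transferred += nbytes

    def transferred_bytes(self):
        # Stands in for the atomic reads (and the flush) that the
        # commit message calls expensive.
        self.expensive_reads += 1
        return self._transferred

    def rate_exceeded(self):
        # Cheap check first: with limiting disabled, skip the
        # expensive counter read entirely.
        if self.rate_limit_max == RATE_LIMIT_DISABLED:
            return False
        used = self.transferred_bytes() - self.rate_limit_start
        return used > self.rate_limit_max
```

With limiting disabled, rate_exceeded() never touches the counters, which
is also why the flush side effect mentioned above goes away on that path.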

Re: [PATCH v2 3/4] multifd: fix counters in multifd_send_thread

2023-10-16 Thread Juan Quintela

Elena Ufimtseva  wrote:
> Previous commit cbec7eb76879d419e7dbf531ee2506ec0722e825
> "migration/multifd: Compute transferred bytes correctly"
> removed accounting for packet_len in the non-rdma
> case, but next_packet_size only accounts for the pages
> (normal_pages * PAGE_SIZE), not for the header packet that is being
> sent as iov[0]. The packet_len part should be added to account for
> the size of MultiFDPacket and the array of the offsets.
>
> Signed-off-by: Elena Ufimtseva 
> Reviewed-by: Fabiano Rosas 

Reviewed-by: Juan Quintela 

queued.
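
The accounting fix boils down to one sum.  A minimal sketch, assuming a
4 KiB page size; the function name bytes_for_packet and the example header
size are hypothetical and do not reflect the real MultiFDPacket layout.

```python
PAGE_SIZE = 4096  # assumption for the example


def bytes_for_packet(packet_len, normal_pages, page_size=PAGE_SIZE):
    """Bytes to account for one multifd send.

    packet_len       -- size of the header packet sent as iov[0]
    next_packet_size -- payload, normal_pages * page_size
    """
    next_packet_size = normal_pages * page_size
    return packet_len + next_packet_size


if __name__ == "__main__":
    # Hypothetical header size of 1088 bytes with 128 pages of payload.
    print(bytes_for_packet(packet_len=1088, normal_pages=128))
```

Counting only next_packet_size would under-report every send by exactly
packet_len.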

Re: [PATCH v2 4/4] multifd: reset next_packet_len after sending pages

2023-10-16 Thread Juan Quintela

Elena Ufimtseva  wrote:
> Sometimes multifd sends just a sync packet with no pages
> (normal_num is 0). In this case the old value is
> preserved and accounted for, while only packet_len
> is transferred.
> Reset it to 0 after sending and accounting for it.
>
> Signed-off-by: Elena Ufimtseva 
> Reviewed-by: Fabiano Rosas 

Reviewed-by: Juan Quintela
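
The stale-value problem can be reproduced with a toy model.  This is a
sketch under stated assumptions, not the multifd code: the class
SendChannel and the reset_after_send switch are hypothetical and exist
only to compare the behaviour before and after the patch.

```python
class SendChannel:
    """Toy model of per-channel byte accounting."""

    def __init__(self, packet_len, reset_after_send, page_size=4096):
        self.packet_len = packet_len
        self.page_size = page_size
        self.reset_after_send = reset_after_send
        self.next_packet_size = 0
        self.accounted = 0

    def send(self, normal_num):
        # Only packets that carry pages update next_packet_size; a
        # sync packet (normal_num == 0) leaves the old value in place.
        if normal_num:
            self.next_packet_size = normal_num * self.page_size
        self.accounted += self.packet_len + self.next_packet_size
        if self.reset_after_send:
            # The fix: forget the payload size once it is accounted.
            self.next_packet_size = 0


if __name__ == "__main__":
    for fixed in (False, True):
        ch = SendChannel(packet_len=100, reset_after_send=fixed)
        ch.send(2)  # data packet with two pages
        ch.send(0)  # sync packet, header only
        print("fixed" if fixed else "buggy", ch.accounted)
```

Without the reset, the sync packet is charged the payload of the previous
data packet a second time.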

Re: [PATCH RESEND 3/7] migration/ram: Remove RAMState from xbzrle_cache_zero_page

2023-10-16 Thread Juan Quintela

Fabiano Rosas  wrote:
> 'rs' is not used in that function. It's a leftover from commit
> 9360447d34 ("ram: Use MigrationStats for statistics").
>
> Reviewed-by: Peter Xu 
> Signed-off-by: Fabiano Rosas 

Reviewed-by: Juan Quintela 

queued.

Re: [PATCH RESEND 2/7] migration/ram: Refactor precopy ram loading code

2023-10-16 Thread Juan Quintela

Fabiano Rosas  wrote:
> From: Nikolay Borisov 
>
> Extract the ramblock parsing code into a routine that operates on the
> sequence of headers from the stream and another that parses the
> individual ramblock. This makes ram_load_precopy() easier to
> comprehend.
>
> Signed-off-by: Nikolay Borisov 
> Reviewed-by: Philippe Mathieu-Daudé 
> Reviewed-by: Peter Xu 
> Signed-off-by: Fabiano Rosas 

Reviewed-by: Juan Quintela 
queued.

Re: [PATCH RESEND 4/7] migration/ram: Stop passing QEMUFile around in save_zero_page

2023-10-16 Thread Juan Quintela

Fabiano Rosas  wrote:
> We don't need the QEMUFile when we're already passing the
> PageSearchStatus.
>
> Reviewed-by: Peter Xu 
> Signed-off-by: Fabiano Rosas 

Reviewed-by: Juan Quintela

Re: [PATCH 2/2] riscv: zicond: make default

2023-10-16 Thread Andrew Jones

On Mon, Oct 16, 2023 at 03:39:40PM +1000, Alistair Francis wrote:
> On Fri, Aug 11, 2023 at 5:01 PM Andrew Jones  wrote:
> >
> > On Thu, Aug 10, 2023 at 02:07:17PM -0400, Alistair Francis wrote:
> > > On Tue, Aug 8, 2023 at 6:10 PM Vineet Gupta  wrote:
> > > >
> > > >
> > > >
> > > > On 8/8/23 14:06, Daniel Henrique Barboza wrote:
> > > > > (CCing Alistair and other reviewers)
> > > > >
> > > > > On 8/8/23 15:17, Vineet Gupta wrote:
> > > > >> Again this helps with better testing and something qemu has been
> > > > >> doing with newer features anyways.
> > > > >>
> > > > >> Signed-off-by: Vineet Gupta 
> > > > >> ---
> > > > >
> > > > > Even if we can reach a consensus about removing the experimental
> > > > > (x- prefix) status from an extension that is Frozen instead of
> > > > > ratified, enabling stuff in the default CPUs because it's easier to
> > > > > test is something we would like to avoid. The rv64 CPU has a random
> > > > > set of extensions enabled for the most different and undocumented
> > > > > reasons, and users don't know what they'll get because we keep
> > > > > beefing up the generic CPUs arbitrarily.
> > >
> > > The idea was to enable "most" extensions for the virt machine. It's a
> > > bit wishy-washy, but the idea was to enable as much as possible by
> > > default on the virt machine, as long as it doesn't conflict. The goal
> > > being to allow users to get the "best" experience as all their
> > > favourite extensions are enabled.
> > >
> > > It's harder to do in practice, so we are in a weird state where users
> > > don't know what is and isn't enabled.
> > >
> > > We probably want to revisit this. We should try to enable what is
> > > useful for users and make it clear what is and isn't enabled. I'm not
> > > clear on how best to do that though.
> > >
> > > Again, I think this comes back to we need to version the virt machine.
> > > I might do that as a starting point, that allows us to make changes in
> > > a clear way.
> >
> > While some extensions will impact the machine model, as well as cpu
> > models, versioning the machine model won't help much with ambiguity in
> > cpu model extension support. Daniel's proposal of having a base cpu model,
> > which, on top, users can explicitly enable what they want (including with
> > profile support which works like a shorthand to enable many extensions at
> > once), is, IMO, the best way for users to know what they get. Also, the
> > 'max' cpu model is the best way to "quickly get as much as possible" for
> > testing. To know what's in 'max', or named cpu models, we need to
> > implement qmp_query_cpu_model_expansion(). Something that could be used
> > from the command line would also be nice, but neither x86 nor arm provide
> > that (they have '-cpu help', but arm doesn't output anything for cpu
> > features and x86 dumps all features out without saying what's enabled for
> > any particular cpu model...)
> >
> > I know x86 people have in the past discussed versioning cpu models, but
> > I don't think that should be necessary for riscv with the base+profile
> > approach. A profile would effectively be a versioned cpu model in that
> > case.
> >
> > Finally, I'd discourage versioning the virt machine type until we need
> > to worry about users creating riscv guest images that they are unwilling
> > to modify, despite wanting to update their QEMU versions. And, even then,
> 
> What's the problem with versioning the virt machine though?

The initial versioning support is no big deal, just a couple new functions
and macros. And, new versions which don't change anything or just change
the current state of preexisting properties and attributes also require very
little developer work. However, when changes require adding compat
variables, which scatter around if's to manage things in one way for
one machine version and another way for others, then code starts to get
more difficult to maintain. And, since adding compat variables is known to
cause a maintenance burden, then just about every change the machine model
gets will lead to time spent discussing whether or not a compat variable
is necessary. But, none of that is the worst part of versioned machines.
The worst part is that the test matrix explodes. Typically all test cases
will get run N times where N is the number of supported machine types,
even if for most test cases the machine type doesn't make a difference. So
that's a waste of time and energy. And, if nobody really cares about the
old machine types, then running test cases which do depend on the machine
type is still a waste of time and energy. And, of course, machine types
which are just a number bump, definitely lead to waste.

> 
> I'm thinking that in the future we would want to switch from PLIC to
> AIA; change the memory map; or change the default extensions (maybe to
> a profile). All of those would require a versioned virt machine.

Having never versioned the machine type befo

Re: [PATCH RESEND 7/7] tests/qtest: Re-enable multifd cancel test

2023-10-16 Thread Juan Quintela

Fabiano Rosas  wrote:
> We've found the source of flakiness in this test, so re-enable it.
>
> Reviewed-by: Juan Quintela 
> Signed-off-by: Fabiano Rosas 

One test is still missing the cleanup of the serial file.

Will send it later and then we can reenable it.

Later, Juan.

Re: [RFC PATCH 05/11] testing/avocado: ppc add new BookE boot_linux_console.py tests

2023-10-16 Thread Cédric Le Goater


On 10/10/23 22:53, Nicholas Piggin wrote:

On Tue Oct 10, 2023 at 10:03 PM AEST, Joel Stanley wrote:

On Tue, 10 Oct 2023 at 18:23, Nicholas Piggin  wrote:


Add simple Linux kernel boot tests for BookE 64-bit and 32-bit CPUs
using Guenter Roeck's rootfs images for Linux testing, and a gitlab
repository with kernel images that I built since there are very few
sources of modern BookE images now.

Signed-off-by: Nicholas Piggin 


Reviewed-by: Joel Stanley 

Should we get mpe to add a https://github.com/linuxppc/qemu-ci-images
for you to keep those kernel images? But perhaps you'd prefer to keep
them on gitlab. Just a suggestion.


Not a bad idea. Or we could try for gitlab/qemu/ci-images I suppose.


Feel free to take these :

  https://github.com/legoater/qemu-ppc-boot/tree/main/buildroot

Supported machines

prep/ppc 604 CPU
ref405ep/ppc 405EP CPU
bamboo/ppc 440EP CPU
sam460ex/ppc 460EX CPU (equivalent to a 440)
g3beige/ppc G3 CPU
mac99/ppc G4 CPU
e500mc/ppc e500mc CPU
mpc8544ds/ppc e500v2 CPU
ppce500/ppc64 e5500, e6500
mac99/ppc64 970 CPU with 64bit and 32bit user space
pseries/ppc64 POWER5+, 970, 970MP, POWER7
pseries/ppc64le POWER8, POWER9, POWER10
powernv8/ppc64le POWER8 HV CPU
powernv9/ppc64le POWER9 HV CPU

Thanks,

C.







---
  tests/avocado/boot_linux_console.py | 53 +
  1 file changed, 53 insertions(+)

diff --git a/tests/avocado/boot_linux_console.py b/tests/avocado/boot_linux_console.py
index 9434304cd3..dc3346ef49 100644
--- a/tests/avocado/boot_linux_console.py
+++ b/tests/avocado/boot_linux_console.py
@@ -1355,6 +1355,59 @@ def test_ppc64_e500(self):
  tar_hash = '6951d86d644b302898da2fd701739c9406527fe1'
  self.do_test_advcal_2018('19', tar_hash, 'uImage')

+def test_ppc64_e6500(self):
+"""
+:avocado: tags=arch:ppc64
+:avocado: tags=machine:ppce500
+:avocado: tags=cpu:e6500
+:avocado: tags=accel:tcg
+"""
+kernel_url = ('https://gitlab.com/npiggin/qemu-ci-images/-/raw/main/ppc/corenet64_vmlinux?ref_type=heads&inline=false')


Is the ?ref_type=heads&inline=false required? I seem to get the file
successfully with wget with those omitted.


I just copied the download link, so if it works without then
I'll remove it.

Thanks,
Nick
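
For reference, dropping the query string from the copied download link can
be done mechanically.  A small sketch using only the standard library; the
helper name strip_query is hypothetical, and whether GitLab serves the same
file without the parameters is the observation reported above, not
something this snippet checks.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return url without its query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(query="", fragment=""))


if __name__ == "__main__":
    copied = ("https://gitlab.com/npiggin/qemu-ci-images/-/raw/main/ppc/"
              "corenet64_vmlinux?ref_type=heads&inline=false")
    print(strip_query(copied))
```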

Re: [RFC PATCH 11/11] ppc/pnv: Change powernv default to powernv10

2023-10-16 Thread Cédric Le Goater


On 10/10/23 09:52, Nicholas Piggin wrote:

POWER10 is the latest IBM Power machine. Although it is not offered in
"OPAL mode" (i.e., powernv configuration), so there is a case that it
should remain at powernv9, most of the development work is going into
powernv10 at the moment.

Signed-off-by: Nicholas Piggin 


I think we should update skiboot to v7.1 as well.

Reviewed-by: Cédric Le Goater 

Thanks,

C.




---
  hw/ppc/pnv.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index eb54f93986..f3dad5ae05 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -2195,8 +2195,6 @@ static void pnv_machine_power9_class_init(ObjectClass *oc, void *data)
  
  xfc->match_nvt = pnv_match_nvt;
  
-mc->alias = "powernv";
-
  pmc->compat = compat;
  pmc->compat_size = sizeof(compat);
  pmc->dt_power_mgt = pnv_dt_power_mgt;
@@ -2220,6 +2218,8 @@ static void pnv_machine_power10_class_init(ObjectClass *oc, void *data)
  mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power10_v2.0");
  compat_props_add(mc->compat_props, phb_compat, G_N_ELEMENTS(phb_compat));
  
+mc->alias = "powernv";
+
  pmc->compat = compat;
  pmc->compat_size = sizeof(compat);
  pmc->dt_power_mgt = pnv_dt_power_mgt;

Re: [RFC PATCH 10/11] ppc/spapr: change pseries machine default to POWER10 CPU

2023-10-16 Thread Cédric Le Goater


On 10/10/23 09:52, Nicholas Piggin wrote:

POWER10 is the latest pseries CPU.

Signed-off-by: Nicholas Piggin 



Reviewed-by: Cédric Le Goater 

Thanks,

C.



---
  hw/ppc/spapr.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index d4230d3647..9d3475d64b 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -4661,7 +4661,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
  
  smc->dr_lmb_enabled = true;
  smc->update_dt_enabled = true;
-mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power9_v2.2");
+mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power10_v2.0");
  mc->has_hotpluggable_cpus = true;
  mc->nvdimm_supported = true;
  smc->resize_hpt_default = SPAPR_RESIZE_HPT_ENABLED;

[PATCH] vhost-user: Fix protocol feature bit conflict

2023-10-16 Thread Hanna Czenczek

The VHOST_USER_PROTOCOL_F_XEN_MMAP feature bit was defined in
f21e95ee97d, which has been part of qemu's 8.1.0 release.  However, it
seems it was never added to qemu's code, but it is well possible that it
is already used by different front-ends outside of qemu (i.e., Xen).

VHOST_USER_PROTOCOL_F_SHARED_OBJECT in contrast was added to qemu's code
in 16094766627, but never defined in the vhost-user specification.  As a
consequence, both bits were defined to be 17, which cannot work.

Regardless of whether actual code or the specification should take
precedence, F_XEN_MMAP is already part of a qemu release, while
F_SHARED_OBJECT is not.  Therefore, bump the latter to take number 18
instead of 17, and add this to the specification.

Take the opportunity to add at least a little note on the
VhostUserShared structure to the specification.  This structure is
referenced by the new commands introduced in 16094766627, but was not
defined.

Fixes: 160947666276c5b7f6bca4d746bcac2966635d79
   ("vhost-user: add shared_object msg")
Signed-off-by: Hanna Czenczek 
---
 docs/interop/vhost-user.rst   | 11 +++
 include/hw/virtio/vhost-user.h|  3 ++-
 subprojects/libvhost-user/libvhost-user.h |  3 ++-
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index 415bb47a19..768fb5c28c 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -275,6 +275,16 @@ Inflight description
 
 :queue size: a 16-bit size of virtqueues
 
+VhostUserShared
+^^^^^^^^^^^^^^^
+
++------+
+| UUID |
++------+
+
+:UUID: 16 bytes UUID, whose first three components (a 32-bit value, then
+  two 16-bit values) are stored in big endian.
+
 C structure
 ---
 
@@ -885,6 +895,7 @@ Protocol features
   #define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS  15
   #define VHOST_USER_PROTOCOL_F_STATUS   16
   #define VHOST_USER_PROTOCOL_F_XEN_MMAP 17
+  #define VHOST_USER_PROTOCOL_F_SHARED_OBJECT18
 
 Front-end message types
 ---
diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
index 9f9ddf878d..1d4121431b 100644
--- a/include/hw/virtio/vhost-user.h
+++ b/include/hw/virtio/vhost-user.h
@@ -29,7 +29,8 @@ enum VhostUserProtocolFeature {
 VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS = 14,
 VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS = 15,
 VHOST_USER_PROTOCOL_F_STATUS = 16,
-VHOST_USER_PROTOCOL_F_SHARED_OBJECT = 17,
+/* Feature 17 reserved for VHOST_USER_PROTOCOL_F_XEN_MMAP. */
+VHOST_USER_PROTOCOL_F_SHARED_OBJECT = 18,
 VHOST_USER_PROTOCOL_F_MAX
 };
 
diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
index b36a42a7ca..c2352904f0 100644
--- a/subprojects/libvhost-user/libvhost-user.h
+++ b/subprojects/libvhost-user/libvhost-user.h
@@ -65,7 +65,8 @@ enum VhostUserProtocolFeature {
 VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS = 14,
 VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS = 15,
 /* Feature 16 is reserved for VHOST_USER_PROTOCOL_F_STATUS. */
-VHOST_USER_PROTOCOL_F_SHARED_OBJECT = 17,
+/* Feature 17 reserved for VHOST_USER_PROTOCOL_F_XEN_MMAP. */
+VHOST_USER_PROTOCOL_F_SHARED_OBJECT = 18,
 VHOST_USER_PROTOCOL_F_MAX
 };
 
-- 
2.41.0
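
Both points of the patch can be illustrated outside of C.  The sketch
below is an assumption-laden model, not the vhost-user implementation: the
Python enum mirrors only three of the protocol feature bits, and the
helper feature_mask is hypothetical.  It shows how a duplicate bit number
can be rejected at definition time, and what "first three components in
big endian" means for the 16-byte UUID.

```python
import uuid
from enum import IntEnum, unique


@unique  # rejects two names that share one bit number
class ProtocolFeature(IntEnum):
    STATUS = 16
    XEN_MMAP = 17
    SHARED_OBJECT = 18  # moved from 17 to 18 by the patch


def feature_mask(*features):
    """Combine feature bit numbers into a feature bitmask."""
    mask = 0
    for feature in features:
        mask |= 1 << feature
    return mask


# The pre-patch situation: both features claim bit 17.
try:
    @unique
    class Conflicting(IntEnum):
        XEN_MMAP = 17
        SHARED_OBJECT = 17
except ValueError as err:
    conflict = str(err)

# UUID.bytes stores time_low (32 bit), time_mid and time_hi_version
# (16 bit each) in big endian, matching the layout described for
# VhostUserShared; UUID.bytes_le is the little-endian variant.
example = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")
wire_bytes = example.bytes

if __name__ == "__main__":
    print(hex(feature_mask(ProtocolFeature.XEN_MMAP,
                           ProtocolFeature.SHARED_OBJECT)))
    print(conflict)
    print(wire_bytes.hex())
```

With both features on bit 17, a peer reading the mask cannot tell which of
the two was negotiated, which is why the conflict "cannot work".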

Re: [PATCH v3 0/8] qemu-img: rebase: add compression support

2023-10-16 Thread Andrey Drobyshev

On 10/2/23 09:35, Andrey Drobyshev wrote:
> On 9/19/23 20:57, Andrey Drobyshev wrote:
>> v2 --> v3:
>>  * Patch 3/8: fixed logic in the if statement, so that we align on blk
>>when blk_old_backing == NULL;
>>  * Patch 4/8: comment fix;
>>  * Patch 5/8: comment fix; dropped redundant "if (blk_new_backing)"
>>statements.
>>
>> v2: https://lists.nongnu.org/archive/html/qemu-block/2023-09/msg00448.html
>>
>> Andrey Drobyshev (8):
>>   qemu-img: rebase: stop when reaching EOF of old backing file
>>   qemu-iotests: 024: add rebasing test case for overlay_size >
>> backing_size
>>   qemu-img: rebase: use backing files' BlockBackend for buffer alignment
>>   qemu-img: add chunk size parameter to compare_buffers()
>>   qemu-img: rebase: avoid unnecessary COW operations
>>   iotests/{024, 271}: add testcases for qemu-img rebase
>>   qemu-img: add compression option to rebase subcommand
>>   iotests: add tests for "qemu-img rebase" with compression
>>
>>  docs/tools/qemu-img.rst|   6 +-
>>  qemu-img-cmds.hx   |   4 +-
>>  qemu-img.c | 136 ++
>>  tests/qemu-iotests/024 | 117 ++
>>  tests/qemu-iotests/024.out |  73 
>>  tests/qemu-iotests/271 | 131 +
>>  tests/qemu-iotests/271.out |  82 ++
>>  tests/qemu-iotests/314 | 165 +
>>  tests/qemu-iotests/314.out |  75 +
>>  9 files changed, 752 insertions(+), 37 deletions(-)
>>  create mode 100755 tests/qemu-iotests/314
>>  create mode 100644 tests/qemu-iotests/314.out
>>
> 
> Ping

Friendly ping

Re: [RFC PATCH 05/11] testing/avocado: ppc add new BookE boot_linux_console.py tests

2023-10-16 Thread Cédric Le Goater


On 10/10/23 09:52, Nicholas Piggin wrote:

Add simple Linux kernel boot tests for BookE 64-bit and 32-bit CPUs
using Guenter Roeck's rootfs images for Linux testing, and a gitlab
repository with kernel images that I built since there are very few
sources of modern BookE images now.

Signed-off-by: Nicholas Piggin 
---
  tests/avocado/boot_linux_console.py | 53 +
  1 file changed, 53 insertions(+)

diff --git a/tests/avocado/boot_linux_console.py b/tests/avocado/boot_linux_console.py
index 9434304cd3..dc3346ef49 100644
--- a/tests/avocado/boot_linux_console.py
+++ b/tests/avocado/boot_linux_console.py
@@ -1355,6 +1355,59 @@ def test_ppc64_e500(self):
  tar_hash = '6951d86d644b302898da2fd701739c9406527fe1'
  self.do_test_advcal_2018('19', tar_hash, 'uImage')
  
+def test_ppc64_e6500(self):
+"""
+:avocado: tags=arch:ppc64
+:avocado: tags=machine:ppce500
+:avocado: tags=cpu:e6500
+:avocado: tags=accel:tcg
+"""
+kernel_url = ('https://gitlab.com/npiggin/qemu-ci-images/-/raw/main/ppc/corenet64_vmlinux?ref_type=heads&inline=false')
+kernel_hash = '01051590b083fec66cb3b9e2e553e95d4cf47691'
+kernel_path = self.fetch_asset(kernel_url, asset_hash=kernel_hash)
+
+initrd_url = ('https://github.com/groeck/linux-build-test/raw/master/rootfs/ppc64/rootfs.cpio.gz')



I think you should use buildroot images from the qemu_ppc64_e5500 defconfig:

  https://github.com/buildroot/buildroot/tree/master/board/qemu/ppc64-e5500

The question is where to store them. I did so under my GH account for PPC and
Aspeed, but that is not satisfactory in the long term.

Maybe we could have a common repo for all the buildroot QEMU board images,
and rebuild them once a year?

Thanks,

C.



+initrd_hash = '798acffc036c3b1ae6cacf95c869bba2'
+initrd_path = self.fetch_asset(initrd_url, asset_hash=initrd_hash,
+   algorithm="md5")
+
+self.vm.set_console()
+kernel_command_line = self.KERNEL_COMMON_COMMAND_LINE
+self.vm.add_args('-smp', '2',
+ '-kernel', kernel_path,
+ '-initrd', initrd_path,
+ '-append', kernel_command_line,
+ '-no-reboot')
+self.vm.launch()
+# Wait for VM to shut down gracefully
+self.vm.wait()
+
+def test_ppc32_mpc85xx(self):
+"""
+:avocado: tags=arch:ppc
+:avocado: tags=machine:ppce500
+:avocado: tags=cpu:mpc8568
+:avocado: tags=accel:tcg
+"""
+kernel_url = ('https://gitlab.com/npiggin/qemu-ci-images/-/raw/main/ppc/mpc85xx_vmlinux?ref_type=heads&inline=false')
+kernel_hash = '726f7f574a491282454850b48546b3827593142b'
+kernel_path = self.fetch_asset(kernel_url, asset_hash=kernel_hash)
+
+initrd_url = ('https://github.com/groeck/linux-build-test/raw/master/rootfs/ppc/rootfs.cpio.gz')
+initrd_hash = '4d30fa93b742c493e8cf2140e49bbd9a'
+initrd_path = self.fetch_asset(initrd_url, asset_hash=initrd_hash,
+   algorithm="md5")
+
+self.vm.set_console()
+kernel_command_line = self.KERNEL_COMMON_COMMAND_LINE
+self.vm.add_args('-kernel', kernel_path,
+ '-initrd', initrd_path,
+ '-append', kernel_command_line,
+ '-no-reboot')
+self.vm.launch()
+# Wait for VM to shut down gracefully
+self.vm.wait()
+
  def do_test_ppc64_powernv(self, proc):
  self.require_accelerator("tcg")
  images_url = ('https://github.com/open-power/op-build/releases/download/v2.7/')

Re: [RFC PATCH 02/11] tests/avocado: Add ppc pseries and powernv Hash MMU tests

2023-10-16 Thread Cédric Le Goater


On 10/10/23 09:52, Nicholas Piggin wrote:

The Hash MMU mode is supported alongside Radix in POWER hardware, and
Linux supports running in either mode. Radix is the default, so to keep
up testing of QEMU Hash MMU, add some explicit Hash MMU tests.

Signed-off-by: Nicholas Piggin 


Nice! Could we do the same with XICS and XIVE (xive=off), since XIVE
is the default now?

Maybe we should add a boot test for all CPUs supported by pseries:

  970, 970mp, POWER5+, P8, P9, P10.

Same for PowerNV.

Thanks,

C.


---
  tests/avocado/ppc_powernv.py | 21 ++---
  tests/avocado/ppc_pseries.py | 20 +---
  2 files changed, 35 insertions(+), 6 deletions(-)

diff --git a/tests/avocado/ppc_powernv.py b/tests/avocado/ppc_powernv.py
index d0e5c07bde..2be322c47d 100644
--- a/tests/avocado/ppc_powernv.py
+++ b/tests/avocado/ppc_powernv.py
@@ -12,11 +12,11 @@
  class powernvMachine(QemuSystemTest):
  
  timeout = 90
-KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 '
+KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 console=hvc0 '
  panic_message = 'Kernel panic - not syncing'
  good_message = 'VFS: Cannot open root device'
  
-def do_test_linux_boot(self):
+def do_test_linux_boot(self, kernel_command_line = KERNEL_COMMON_COMMAND_LINE):
  self.require_accelerator("tcg")
  kernel_url = ('https://archives.fedoraproject.org/pub/archive'
'/fedora-secondary/releases/29/Everything/ppc64le/os'
@@ -25,7 +25,6 @@ def do_test_linux_boot(self):
  kernel_path = self.fetch_asset(kernel_url, asset_hash=kernel_hash)
  
  self.vm.set_console()
-kernel_command_line = self.KERNEL_COMMON_COMMAND_LINE + 'console=hvc0'
  self.vm.add_args('-kernel', kernel_path,
   '-append', kernel_command_line)
  self.vm.launch()
@@ -54,6 +53,22 @@ def test_linux_smp_boot(self):
  wait_for_console_pattern(self, console_pattern, self.panic_message)
  wait_for_console_pattern(self, self.good_message, self.panic_message)
  
+def test_linux_smp_hpt_boot(self):
+"""
+:avocado: tags=arch:ppc64
+:avocado: tags=machine:powernv
+:avocado: tags=accel:tcg
+"""
+
+self.vm.add_args('-smp', '4')
+self.do_test_linux_boot(self.KERNEL_COMMON_COMMAND_LINE +
+'disable_radix')
+console_pattern = 'smp: Brought up 1 node, 4 CPUs'
+wait_for_console_pattern(self, 'hash-mmu: Initializing hash mmu',
+ self.panic_message)
+wait_for_console_pattern(self, console_pattern, self.panic_message)
+wait_for_console_pattern(self, self.good_message, self.panic_message)
+
  def test_linux_smt_boot(self):
  """
  :avocado: tags=arch:ppc64
diff --git a/tests/avocado/ppc_pseries.py b/tests/avocado/ppc_pseries.py
index a8311e6555..74aaa4ac4a 100644
--- a/tests/avocado/ppc_pseries.py
+++ b/tests/avocado/ppc_pseries.py
@@ -12,11 +12,11 @@
  class pseriesMachine(QemuSystemTest):
  
  timeout = 90
-KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 '
+KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 console=hvc0 '
  panic_message = 'Kernel panic - not syncing'
  good_message = 'VFS: Cannot open root device'
  
-def do_test_ppc64_linux_boot(self):
+def do_test_ppc64_linux_boot(self, kernel_command_line = KERNEL_COMMON_COMMAND_LINE):
  kernel_url = ('https://archives.fedoraproject.org/pub/archive'
'/fedora-secondary/releases/29/Everything/ppc64le/os'
'/ppc/ppc64/vmlinuz')
@@ -24,7 +24,6 @@ def do_test_ppc64_linux_boot(self):
  kernel_path = self.fetch_asset(kernel_url, asset_hash=kernel_hash)
  
  self.vm.set_console()
-kernel_command_line = self.KERNEL_COMMON_COMMAND_LINE + 'console=hvc0'
  self.vm.add_args('-kernel', kernel_path,
   '-append', kernel_command_line)
  self.vm.launch()
@@ -62,6 +61,21 @@ def test_ppc64_linux_smp_boot(self):
  wait_for_console_pattern(self, console_pattern, self.panic_message)
  wait_for_console_pattern(self, self.good_message, self.panic_message)
  
+def test_ppc64_linux_hpt_smp_boot(self):
+"""
+:avocado: tags=arch:ppc64
+:avocado: tags=machine:pseries
+"""
+
+self.vm.add_args('-smp', '4')
+self.do_test_ppc64_linux_boot(self.KERNEL_COMMON_COMMAND_LINE +
+  'disable_radix')
+console_pattern = 'smp: Brought up 1 node, 4 CPUs'
+wait_for_console_pattern(self, 'hash-mmu: Initializing hash mmu',
+ self.panic_message)
+wait_for_console_pattern(self, console_pattern, self.panic_message)
+wait_for_console_pattern(self, self.good_message, self.panic_message)
+
  def test_ppc64_linux_smt_boot(self):
  """
  :avocado:

Re: [PATCH 1/3] migration/multifd: Remove direct "socket" references

2023-10-16 Thread Juan Quintela

Fabiano Rosas  wrote:
> We're about to enable support for other transports in multifd, so
> remove direct references to sockets.
>
> Signed-off-by: Fabiano Rosas 

Reviewed-by: Juan Quintela 

queued.

[PATCH v2 01/27] vfio: Rename VFIOContainer into VFIOLegacyContainer

2023-10-16 Thread Zhenzhong Duan

From: Eric Auger 

In preparation for introducing a base object for VFIOContainer, from
which the existing legacy container and the iommufd-based container
will derive, rename the existing one to VFIOLegacyContainer. This is
just an incremental step to ease the migration. Soon there won't be
any reference to the legacy container in the common.c code; only
container.c should handle the VFIOLegacyContainer object.

No functional change intended.

Signed-off-by: Eric Auger 
Signed-off-by: Yi Liu 
Signed-off-by: Yi Sun 
Signed-off-by: Zhenzhong Duan 
---
 include/hw/vfio/vfio-common.h | 46 -
 hw/vfio/common.c  | 63 ---
 hw/vfio/container.c   | 45 +
 hw/vfio/spapr.c   | 12 +++
 4 files changed, 89 insertions(+), 77 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 7780b9073a..34648e518e 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -74,13 +74,13 @@ typedef struct VFIOMigration {
 
 typedef struct VFIOAddressSpace {
 AddressSpace *as;
-QLIST_HEAD(, VFIOContainer) containers;
+QLIST_HEAD(, VFIOLegacyContainer) containers;
 QLIST_ENTRY(VFIOAddressSpace) list;
 } VFIOAddressSpace;
 
 struct VFIOGroup;
 
-typedef struct VFIOContainer {
+typedef struct VFIOLegacyContainer {
 VFIOAddressSpace *space;
 int fd; /* /dev/vfio/vfio, empowered by the attached groups */
 MemoryListener listener;
@@ -97,12 +97,12 @@ typedef struct VFIOContainer {
 QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
 QLIST_HEAD(, VFIOGroup) group_list;
 QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
-QLIST_ENTRY(VFIOContainer) next;
+QLIST_ENTRY(VFIOLegacyContainer) next;
 QLIST_HEAD(, VFIODevice) device_list;
-} VFIOContainer;
+} VFIOLegacyContainer;
 
 typedef struct VFIOGuestIOMMU {
-VFIOContainer *container;
+VFIOLegacyContainer *container;
 IOMMUMemoryRegion *iommu_mr;
 hwaddr iommu_offset;
 IOMMUNotifier n;
@@ -110,7 +110,7 @@ typedef struct VFIOGuestIOMMU {
 } VFIOGuestIOMMU;
 
 typedef struct VFIORamDiscardListener {
-VFIOContainer *container;
+VFIOLegacyContainer *container;
 MemoryRegion *mr;
 hwaddr offset_within_address_space;
 hwaddr size;
@@ -133,7 +133,7 @@ typedef struct VFIODevice {
 QLIST_ENTRY(VFIODevice) container_next;
 QLIST_ENTRY(VFIODevice) global_next;
 struct VFIOGroup *group;
-VFIOContainer *container;
+VFIOLegacyContainer *container;
 char *sysfsdev;
 char *name;
 DeviceState *dev;
@@ -167,7 +167,7 @@ struct VFIODeviceOps {
 typedef struct VFIOGroup {
 int fd;
 int groupid;
-VFIOContainer *container;
+VFIOLegacyContainer *container;
 QLIST_HEAD(, VFIODevice) device_list;
 QLIST_ENTRY(VFIOGroup) next;
 QLIST_ENTRY(VFIOGroup) container_next;
@@ -206,28 +206,28 @@ typedef struct {
 hwaddr pages;
 } VFIOBitmap;
 
-void vfio_host_win_add(VFIOContainer *container,
+void vfio_host_win_add(VFIOLegacyContainer *container,
hwaddr min_iova, hwaddr max_iova,
uint64_t iova_pgsizes);
-int vfio_host_win_del(VFIOContainer *container, hwaddr min_iova,
+int vfio_host_win_del(VFIOLegacyContainer *container, hwaddr min_iova,
   hwaddr max_iova);
 VFIOAddressSpace *vfio_get_address_space(AddressSpace *as);
 void vfio_put_address_space(VFIOAddressSpace *space);
-bool vfio_devices_all_running_and_saving(VFIOContainer *container);
+bool vfio_devices_all_running_and_saving(VFIOLegacyContainer *container);
 
 /* container->fd */
-int vfio_dma_unmap(VFIOContainer *container, hwaddr iova,
+int vfio_dma_unmap(VFIOLegacyContainer *container, hwaddr iova,
ram_addr_t size, IOMMUTLBEntry *iotlb);
-int vfio_dma_map(VFIOContainer *container, hwaddr iova,
+int vfio_dma_map(VFIOLegacyContainer *container, hwaddr iova,
  ram_addr_t size, void *vaddr, bool readonly);
-int vfio_set_dirty_page_tracking(VFIOContainer *container, bool start);
-int vfio_query_dirty_bitmap(VFIOContainer *container, VFIOBitmap *vbmap,
+int vfio_set_dirty_page_tracking(VFIOLegacyContainer *container, bool start);
+int vfio_query_dirty_bitmap(VFIOLegacyContainer *container, VFIOBitmap *vbmap,
 hwaddr iova, hwaddr size);
 
-int vfio_container_add_section_window(VFIOContainer *container,
+int vfio_container_add_section_window(VFIOLegacyContainer *container,
   MemoryRegionSection *section,
   Error **errp);
-void vfio_container_del_section_window(VFIOContainer *container,
+void vfio_container_del_section_window(VFIOLegacyContainer *container,
MemoryRegionSection *section);
 
 void vfio_disable_irqindex(VFIODevice *vbasedev, int index);
@@ -290,21 +290,21 @@ vfio_get_cap(void *ptr, uint32_t cap_o

[PATCH v2 00/27] vfio: Adopt iommufd

2023-10-16 Thread Zhenzhong Duan

Hi,

Thanks all for the guidance and comments on the previous series. Here is
the pure iommufd support part.


PATCH 1-15: Abstract out base container
PATCH 16: Add --enable/--disable-iommufd config support
PATCH 17: Introduce iommufd object
PATCH 18-21: add IOMMUFD container and cdev support
PATCH 22-27: fd passing for IOMMUFD object and cdev


We have tested widely with different combinations, e.g.:
- PCI devices were tested
- FD passing and hot reset, with some tricks
- device hotplug test with legacy and iommufd backends
- with or without vIOMMU for legacy and iommufd backends
- devices linked to different iommufd backends
- VFIO migration with an E800 net card (no dirty sync support) passthrough
- platform, ccw and ap were only compile-tested due to environment limits


Given some iommufd kernel limitations, the iommufd backend is
not yet fully on par with the legacy backend w.r.t. features like:
- p2p mappings (you will see related error traces)
- dirty page sync
- etc.


qemu code: https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_cdev_v2

--

Below is some background and a diagram of the design:

With the introduction of iommufd, the Linux kernel provides a generic
interface for userspace drivers to propagate their DMA mappings to the
kernel for assigned devices. This series ports the VFIO devices onto the
/dev/iommu uapi and lets them coexist with the legacy implementation.

At QEMU level, interactions with the /dev/iommu are abstracted by a new
iommufd object (compiled in with the CONFIG_IOMMUFD option).

Any QEMU device (e.g. vfio device) wishing to use /dev/iommu must be
linked with an iommufd object. In this series, only the vfio-pci device
is given this capability (other VFIO devices are not yet ready).

It gets a new optional parameter named iommufd, which allows passing
an iommufd object:

-object iommufd,id=iommufd0
-device vfio-pci,host=:02:00.0,iommufd=iommufd0

Note that /dev/iommu and the vfio cdev can be opened externally by a
management layer. In such a case the fd is passed:

-object iommufd,id=iommufd0,fd=22
-device vfio-pci,iommufd=iommufd0,fd=23

If the fd parameter is not passed, the fd is opened by QEMU.
See https://www.mail-archive.com/qemu-devel@nongnu.org/msg937155.html
for a detailed discussion of this requirement.

If no iommufd option is passed to the vfio-pci device, iommufd is not
used and the end-user gets the behavior based on the legacy vfio iommu
interfaces:

-device vfio-pci,host=:02:00.0

While the legacy kernel interface is group-centric, the new iommufd
interface is device-centric, relying on device fd and iommufd.

To support both interfaces in the QEMU VFIO device we reworked the vfio
container abstraction so that the generic VFIO code can use either
backend.

The VFIOContainer object becomes a base object derived into
a) the legacy VFIO container and
b) the new iommufd based container.
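
As a rough illustration of the base/derived layout described above, the
sketch below embeds a base object in two backend-specific structs and
recovers the derived struct with a container_of-style macro. All names
here (BaseContainer, LegacyContainer, IommufdContainer, to_legacy,
to_iommufd, base_account_mapping) are hypothetical stand-ins and not the
types from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Recover a pointer to the enclosing struct from a pointer to a member. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical base object: state shared by every backend. */
typedef struct BaseContainer {
    int nr_mappings;            /* generic bookkeeping kept in the base */
} BaseContainer;

/* Hypothetical legacy backend: base object plus a container fd. */
typedef struct LegacyContainer {
    BaseContainer bcontainer;
    int fd;
} LegacyContainer;

/* Hypothetical iommufd backend: base object plus an IOAS id. */
typedef struct IommufdContainer {
    BaseContainer bcontainer;
    uint32_t ioas_id;
} IommufdContainer;

/* Generic code only ever sees the base pointer. */
static void base_account_mapping(BaseContainer *bcontainer)
{
    bcontainer->nr_mappings++;
}

/* Backend code downcasts when it needs its private fields. */
static LegacyContainer *to_legacy(BaseContainer *bcontainer)
{
    return CONTAINER_OF(bcontainer, LegacyContainer, bcontainer);
}

static IommufdContainer *to_iommufd(BaseContainer *bcontainer)
{
    return CONTAINER_OF(bcontainer, IommufdContainer, bcontainer);
}
```

Generic code would call base_account_mapping(&lc.bcontainer) without
knowing which backend it is dealing with, while backend code uses
to_legacy() or to_iommufd() to get back to its private fields.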

The base object implements generic code, such as the code related to
memory_listener and address space management, whereas the derived
objects implement callbacks specific to each backend (legacy or
iommufd). Each backend has its own way to set up the secure context
and the DMA management interface. The diagram below shows how it looks
with both backends.

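                    VFIO                           AddressSpace/Memory
    +-------+  +----------+  +-----+  +-----+
    |  pci  |  | platform |  |  ap |  | ccw |
    +---+---+  +----+-----+  +--+--+  +--+--+     +----------------------+
        |           |           |        |        |   AddressSpace       |
        |           |           |        |        +------------+---------+
    +---V-----------V-----------V--------V----+               /
    |           VFIOAddressSpace              | <------------+
    |                  |                      |  MemoryListener
    |          VFIOContainer list             |
    +-------+----------------------------+----+
            |                            |
            |                            |
    +-------V------+            +--------V----------+
    |   iommufd    |            |    vfio legacy    |
    |  container   |            |     container     |
    +-------+------+            +--------+----------+
            |                            |
            | /dev/iommu                 | /dev/vfio/vfio
            | /dev/vfio/devices/vfioX    | /dev/vfio/$group_id
Userspace   |                            |
============+============================+===========================
Kernel      |  device fd                 |
            +---------------+            | group/container fd
            | (BIND_IOMMUFD |            | (SET_CONTAINER/SET_IOMMU)
            |  ATTACH_IOAS) |            | device fd
            |               |            |
            |       +-------V------------V-----------------+
    iommufd |       |                vfio                  |
(map/unmap  |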

[PATCH v2 05/27] vfio/common: Move giommu_list in base container

2023-10-16 Thread Zhenzhong Duan

From: Eric Auger 

Move the giommu_list field into the base object and store the
base container in the VFIOGuestIOMMU.

We introduce vfio_container_init/destroy helpers on the base
container.
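
To make the init/destroy pairing concrete, here is a minimal standalone
sketch. It assumes a plain singly linked list instead of QEMU's QLIST
macros, and every name in it (BaseContainer, GuestIOMMU,
base_container_init, base_container_add_giommu, base_container_destroy)
is hypothetical; it only shows the shape of the helpers, not the code in
this patch.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct GuestIOMMU GuestIOMMU;
typedef struct BaseContainer BaseContainer;

/* Hypothetical per-IOMMU-region record; points back at the base object. */
struct GuestIOMMU {
    BaseContainer *bcontainer;
    GuestIOMMU *next;
};

/* Hypothetical base object owning the list of guest IOMMU records. */
struct BaseContainer {
    const void *ops;            /* opaque backend ops, unused in this sketch */
    GuestIOMMU *giommu_list;
};

/* Init helper: record the ops and start with an empty list. */
static void base_container_init(BaseContainer *bcontainer, const void *ops)
{
    bcontainer->ops = ops;
    bcontainer->giommu_list = NULL;
}

/* Allocate a record and insert it at the head of the base object's list. */
static GuestIOMMU *base_container_add_giommu(BaseContainer *bcontainer)
{
    GuestIOMMU *giommu = calloc(1, sizeof(*giommu));

    if (!giommu) {
        return NULL;
    }
    giommu->bcontainer = bcontainer;
    giommu->next = bcontainer->giommu_list;
    bcontainer->giommu_list = giommu;
    return giommu;
}

/* Destroy helper: free every record; returns how many were released. */
static int base_container_destroy(BaseContainer *bcontainer)
{
    GuestIOMMU *giommu = bcontainer->giommu_list;
    int freed = 0;

    while (giommu) {
        GuestIOMMU *next = giommu->next;  /* save before freeing */
        free(giommu);
        giommu = next;
        freed++;
    }
    bcontainer->giommu_list = NULL;
    return freed;
}
```

Keeping the list in the base object means the teardown loop is written
once and is shared by every backend.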

No functional change intended.

Signed-off-by: Eric Auger 
Signed-off-by: Yi Liu 
Signed-off-by: Yi Sun 
Signed-off-by: Zhenzhong Duan 
---
 include/hw/vfio/vfio-common.h |  9 -
 include/hw/vfio/vfio-container-base.h | 13 +
 hw/vfio/common.c  | 18 --
 hw/vfio/container-base.c  | 19 +++
 hw/vfio/container.c   | 13 +++--
 5 files changed, 47 insertions(+), 25 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index f2aa122c47..884d1627f4 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -95,7 +95,6 @@ typedef struct VFIOLegacyContainer {
 uint64_t max_dirty_bitmap_size;
 unsigned long pgsizes;
 unsigned int dma_max_mappings;
-QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
 QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
 QLIST_HEAD(, VFIOGroup) group_list;
 QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
@@ -103,14 +102,6 @@ typedef struct VFIOLegacyContainer {
 QLIST_HEAD(, VFIODevice) device_list;
 } VFIOLegacyContainer;
 
-typedef struct VFIOGuestIOMMU {
-VFIOLegacyContainer *container;
-IOMMUMemoryRegion *iommu_mr;
-hwaddr iommu_offset;
-IOMMUNotifier n;
-QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
-} VFIOGuestIOMMU;
-
 typedef struct VFIORamDiscardListener {
 VFIOLegacyContainer *container;
 MemoryRegion *mr;
diff --git a/include/hw/vfio/vfio-container-base.h 
b/include/hw/vfio/vfio-container-base.h
index 1483e77441..b6c8eb2313 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -33,6 +33,14 @@ typedef struct VFIOContainer VFIOContainer;
 typedef struct VFIODevice VFIODevice;
 typedef struct VFIOIOMMUBackendOpsClass VFIOIOMMUBackendOpsClass;
 
+typedef struct VFIOGuestIOMMU {
+VFIOContainer *bcontainer;
+IOMMUMemoryRegion *iommu_mr;
+hwaddr iommu_offset;
+IOMMUNotifier n;
+QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
+} VFIOGuestIOMMU;
+
 typedef struct {
 unsigned long *bitmap;
 hwaddr size;
@@ -44,6 +52,7 @@ typedef struct {
  */
 struct VFIOContainer {
 VFIOIOMMUBackendOpsClass *ops;
+QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
 };
 
 int vfio_container_dma_map(VFIOContainer *bcontainer,
@@ -53,6 +62,10 @@ int vfio_container_dma_unmap(VFIOContainer *bcontainer,
  hwaddr iova, ram_addr_t size,
  IOMMUTLBEntry *iotlb);
 
+void vfio_container_init(VFIOContainer *bcontainer,
+ struct VFIOIOMMUBackendOpsClass *ops);
+void vfio_container_destroy(VFIOContainer *bcontainer);
+
 #define TYPE_VFIO_IOMMU_BACKEND_LEGACY_OPS "vfio-iommu-backend-legacy-ops"
 #define TYPE_VFIO_IOMMU_BACKEND_OPS "vfio-iommu-backend-ops"
 
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 6be1526d79..1adfdca4f5 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -337,7 +337,7 @@ static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void 
**vaddr,
 static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
 {
 VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
-VFIOContainer *bcontainer = &giommu->container->bcontainer;
+VFIOContainer *bcontainer = giommu->bcontainer;
 hwaddr iova = iotlb->iova + giommu->iommu_offset;
 void *vaddr;
 int ret;
@@ -632,6 +632,7 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 VFIOLegacyContainer *container = container_of(listener,
   VFIOLegacyContainer,
   listener);
+VFIOContainer *bcontainer = &container->bcontainer;
 hwaddr iova, end;
 Int128 llend, llsize;
 void *vaddr;
@@ -683,7 +684,7 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 giommu->iommu_mr = iommu_mr;
 giommu->iommu_offset = section->offset_within_address_space -
section->offset_within_region;
-giommu->container = container;
+giommu->bcontainer = bcontainer;
 llend = int128_add(int128_make64(section->offset_within_region),
section->size);
 llend = int128_sub(llend, int128_one());
@@ -709,7 +710,7 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 g_free(giommu);
 goto fail;
 }
-QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
+QLIST_INSERT_HEAD(&bcontainer->giommu_list, giommu, giommu_next);
 memory_region_iommu_replay(giommu->iommu_mr, &giommu->n);
 
 return;
@@ -796,6 +797,7 @@ static void vfio_listener_region_del(MemoryListener 
*listener,
 VFIOLegacyC

[PATCH v2 03/27] VFIO/container: Introduce dummy VFIOContainerClass implementation

2023-10-16 Thread Zhenzhong Duan

From: Eric Auger 

Let's instantiate a dummy VFIOContainerClass implementation whose
functions are not yet implemented.

No functional change intended.

Signed-off-by: Eric Auger 
Signed-off-by: Yi Liu 
Signed-off-by: Yi Sun 
Signed-off-by: Zhenzhong Duan 
---
 include/hw/vfio/vfio-container-base.h |  1 +
 hw/vfio/container-base.c  | 40 +++
 hw/vfio/container.c   | 22 +++
 hw/vfio/meson.build   |  1 +
 4 files changed, 64 insertions(+)
 create mode 100644 hw/vfio/container-base.c

diff --git a/include/hw/vfio/vfio-container-base.h 
b/include/hw/vfio/vfio-container-base.h
index afc8543d22..226e960fb5 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -46,6 +46,7 @@ struct VFIOContainer {
 VFIOIOMMUBackendOpsClass *ops;
 };
 
+#define TYPE_VFIO_IOMMU_BACKEND_LEGACY_OPS "vfio-iommu-backend-legacy-ops"
 #define TYPE_VFIO_IOMMU_BACKEND_OPS "vfio-iommu-backend-ops"
 
 DECLARE_CLASS_CHECKERS(VFIOIOMMUBackendOpsClass,
diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
new file mode 100644
index 00..0c21e77039
--- /dev/null
+++ b/hw/vfio/container-base.c
@@ -0,0 +1,40 @@
+/*
+ * VFIO BASE CONTAINER
+ *
+ * Copyright (C) 2023 Intel Corporation.
+ * Copyright Red Hat, Inc. 2023
+ *
+ * Authors: Yi Liu 
+ *  Eric Auger 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "hw/vfio/vfio-container-base.h"
+
+static const TypeInfo vfio_iommu_backend_ops_type_info = {
+.name = TYPE_VFIO_IOMMU_BACKEND_OPS,
+.parent = TYPE_OBJECT,
+.abstract = true,
+.class_size = sizeof(VFIOIOMMUBackendOpsClass),
+};
+
+static void vfio_iommu_backend_ops_register_types(void)
+{
+type_register_static(&vfio_iommu_backend_ops_type_info);
+}
+type_init(vfio_iommu_backend_ops_register_types);
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 8fde302ae9..acc4a6bf8a 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -539,6 +539,9 @@ static void 
vfio_get_iommu_info_migration(VFIOLegacyContainer *container,
 static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
   Error **errp)
 {
+VFIOIOMMUBackendOpsClass *ops = VFIO_IOMMU_BACKEND_OPS_CLASS(
+object_class_by_name(TYPE_VFIO_IOMMU_BACKEND_LEGACY_OPS));
+VFIOContainer *bcontainer;
 VFIOLegacyContainer *container;
 int ret, fd;
 VFIOAddressSpace *space;
@@ -620,6 +623,8 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as,
 QLIST_INIT(&container->giommu_list);
 QLIST_INIT(&container->hostwin_list);
 QLIST_INIT(&container->vrdl_list);
+bcontainer = &container->bcontainer;
+bcontainer->ops = ops;
 
 ret = vfio_init_container(container, group->fd, errp);
 if (ret) {
@@ -1160,3 +1165,20 @@ void vfio_detach_device(VFIODevice *vbasedev)
 vfio_put_base_device(vbasedev);
 vfio_put_group(group);
 }
+
+static void vfio_iommu_backend_legacy_ops_class_init(ObjectClass *oc,
+ void *data) {
+}
+
+static const TypeInfo vfio_iommu_backend_legacy_ops_type = {
+.name = TYPE_VFIO_IOMMU_BACKEND_LEGACY_OPS,
+
+.parent = TYPE_VFIO_IOMMU_BACKEND_OPS,
+.class_init = vfio_iommu_backend_legacy_ops_class_init,
+.abstract = true,
+};
+static void vfio_iommu_backend_legacy_ops_register_types(void)
+{
+type_register_static(&vfio_iommu_backend_legacy_ops_type);
+}
+type_init(vfio_iommu_backend_legacy_ops_register_types);
diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
index 2a6912c940..eb6ce6229d 100644
--- a/hw/vfio/meson.build
+++ b/hw/vfio/meson.build
@@ -2,6 +2,7 @@ vfio_ss = ss.source_set()
 vfio_ss.add(files(
   'helpers.c',
   'common.c',
+  'container-base.c',
   'container.c',
   'spapr.c',
   'migration.c',
-- 
2.34.1

[PATCH v2 11/27] vfio/container: Convert functions to base container

2023-10-16 Thread Zhenzhong Duan

From: Eric Auger 

In preparation for getting rid of VFIOLegacyContainer references
in common.c, let's convert miscellaneous functions to use the base
container object instead:

vfio_devices_all_dirty_tracking
vfio_devices_all_device_dirty_tracking
vfio_devices_all_running_and_mig_active
vfio_devices_query_dirty_bitmap
vfio_get_dirty_bitmap

Signed-off-by: Eric Auger 
Signed-off-by: Zhenzhong Duan 
---
 include/hw/vfio/vfio-common.h |  9 
 hw/vfio/common.c  | 42 +++
 hw/vfio/container.c   |  6 ++---
 hw/vfio/trace-events  |  2 +-
 4 files changed, 26 insertions(+), 33 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 6979359457..7bb75bc7cd 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -182,7 +182,6 @@ int vfio_host_win_del(VFIOContainer *bcontainer, hwaddr 
min_iova,
   hwaddr max_iova);
 VFIOAddressSpace *vfio_get_address_space(AddressSpace *as);
 void vfio_put_address_space(VFIOAddressSpace *space);
-bool vfio_devices_all_running_and_saving(VFIOLegacyContainer *container);
 
 void vfio_disable_irqindex(VFIODevice *vbasedev, int index);
 void vfio_unmask_single_irqindex(VFIODevice *vbasedev, int index);
@@ -254,11 +253,11 @@ bool vfio_migration_realize(VFIODevice *vbasedev, Error 
**errp);
 void vfio_migration_exit(VFIODevice *vbasedev);
 
 int vfio_bitmap_alloc(VFIOBitmap *vbmap, hwaddr size);
-bool vfio_devices_all_running_and_mig_active(VFIOLegacyContainer *container);
-bool vfio_devices_all_device_dirty_tracking(VFIOLegacyContainer *container);
-int vfio_devices_query_dirty_bitmap(VFIOLegacyContainer *container,
+bool vfio_devices_all_running_and_mig_active(VFIOContainer *bcontainer);
+bool vfio_devices_all_device_dirty_tracking(VFIOContainer *bcontainer);
+int vfio_devices_query_dirty_bitmap(VFIOContainer *bcontainer,
 VFIOBitmap *vbmap, hwaddr iova,
 hwaddr size);
-int vfio_get_dirty_bitmap(VFIOLegacyContainer *container, uint64_t iova,
+int vfio_get_dirty_bitmap(VFIOContainer *bcontainer, uint64_t iova,
  uint64_t size, ram_addr_t ram_addr);
 #endif /* HW_VFIO_VFIO_COMMON_H */
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 7350af038a..1c47bcc478 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -184,9 +184,8 @@ bool vfio_device_state_is_precopy(VFIODevice *vbasedev)
migration->device_state == VFIO_DEVICE_STATE_PRE_COPY_P2P;
 }
 
-static bool vfio_devices_all_dirty_tracking(VFIOLegacyContainer *container)
+static bool vfio_devices_all_dirty_tracking(VFIOContainer *bcontainer)
 {
-VFIOContainer *bcontainer = &container->bcontainer;
 VFIODevice *vbasedev;
 MigrationState *ms = migrate_get_current();
 
@@ -211,9 +210,8 @@ static bool 
vfio_devices_all_dirty_tracking(VFIOLegacyContainer *container)
 return true;
 }
 
-bool vfio_devices_all_device_dirty_tracking(VFIOLegacyContainer *container)
+bool vfio_devices_all_device_dirty_tracking(VFIOContainer *bcontainer)
 {
-VFIOContainer *bcontainer = &container->bcontainer;
 VFIODevice *vbasedev;
 
 QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
@@ -229,9 +227,8 @@ bool 
vfio_devices_all_device_dirty_tracking(VFIOLegacyContainer *container)
  * Check if all VFIO devices are running and migration is active, which is
  * essentially equivalent to the migration being in pre-copy phase.
  */
-bool vfio_devices_all_running_and_mig_active(VFIOLegacyContainer *container)
+bool vfio_devices_all_running_and_mig_active(VFIOContainer *bcontainer)
 {
-VFIOContainer *bcontainer = &container->bcontainer;
 VFIODevice *vbasedev;
 
 if (!migration_is_active(migrate_get_current())) {
@@ -1152,7 +1149,7 @@ static void vfio_listener_log_global_start(MemoryListener 
*listener)
   listener);
 int ret;
 
-if (vfio_devices_all_device_dirty_tracking(container)) {
+if (vfio_devices_all_device_dirty_tracking(&container->bcontainer)) {
 ret = vfio_devices_dma_logging_start(container);
 } else {
 ret = vfio_container_set_dirty_page_tracking(&container->bcontainer,
@@ -1173,7 +1170,7 @@ static void vfio_listener_log_global_stop(MemoryListener 
*listener)
   listener);
 int ret = 0;
 
-if (vfio_devices_all_device_dirty_tracking(container)) {
+if (vfio_devices_all_device_dirty_tracking(&container->bcontainer)) {
 vfio_devices_dma_logging_stop(container);
 } else {
 ret = vfio_container_set_dirty_page_tracking(&container->bcontainer,
@@ -1213,11 +1210,10 @@ static int vfio_device_dma_logging_report(VFIODevice 
*vbasedev, hwaddr iova,
 return 0;
 }
 
-int vfio_devices_query_dirty_bitmap(VFIOLegacyContainer *container,
+int vfio_devices_query_dirty_bitmap(VFIOContainer *bcontainer,

[PATCH v2 02/27] vfio: Introduce base object for VFIOContainer and targetted interface

2023-10-16 Thread Zhenzhong Duan

From: Eric Auger 

Introduce a dumb VFIOContainer base object and its targeted interface.
This is deliberately not a QOM object because we don't want it to be
visible from the user interface.  The VFIOContainer, as well as its
interfaces, will be gradually populated in subsequent patches.
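
For readers unfamiliar with this pattern, the sketch below shows how
generic code can dispatch through an ops table stored in the base
object, with one required and one optional callback. The names
(BackendOps, BaseContainer, base_dma_map, base_set_dirty_page_tracking,
toy_ops) and the simplified signatures are assumptions made for
illustration; they are not the interface defined in this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct BaseContainer BaseContainer;

/* Hypothetical ops table; each backend fills in what it supports. */
typedef struct BackendOps {
    /* required */
    int (*dma_map)(BaseContainer *bc, uint64_t iova, uint64_t size);
    /* optional, e.g. a migration feature */
    int (*set_dirty_page_tracking)(BaseContainer *bc, bool start);
} BackendOps;

/* Hypothetical base object: the ops pointer plus some generic state. */
struct BaseContainer {
    const BackendOps *ops;
    uint64_t mapped_bytes;
};

/* Generic wrapper: a missing required callback is treated as an error. */
static int base_dma_map(BaseContainer *bc, uint64_t iova, uint64_t size)
{
    if (!bc->ops->dma_map) {
        return -EINVAL;
    }
    return bc->ops->dma_map(bc, iova, size);
}

/* Generic wrapper: a missing optional callback reports "not supported". */
static int base_set_dirty_page_tracking(BaseContainer *bc, bool start)
{
    if (!bc->ops->set_dirty_page_tracking) {
        return -ENOTSUP;
    }
    return bc->ops->set_dirty_page_tracking(bc, start);
}

/* Toy backend that only counts mapped bytes. */
static int toy_dma_map(BaseContainer *bc, uint64_t iova, uint64_t size)
{
    (void)iova;                 /* unused in this toy backend */
    bc->mapped_bytes += size;
    return 0;
}

/* The toy backend leaves the optional callback unset (NULL). */
static const BackendOps toy_ops = {
    .dma_map = toy_dma_map,
};
```

Callers only use the base_* wrappers, so a new backend can be added by
providing another BackendOps instance without touching the generic code.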

No functional change intended.

Signed-off-by: Eric Auger 
Signed-off-by: Yi Liu 
Signed-off-by: Yi Sun 
Signed-off-by: Zhenzhong Duan 
---
 include/hw/vfio/vfio-common.h |  8 +--
 include/hw/vfio/vfio-container-base.h | 82 +++
 2 files changed, 84 insertions(+), 6 deletions(-)
 create mode 100644 include/hw/vfio/vfio-container-base.h

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 34648e518e..9651cf921c 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -30,6 +30,7 @@
 #include 
 #endif
 #include "sysemu/sysemu.h"
+#include "hw/vfio/vfio-container-base.h"
 
 #define VFIO_MSG_PREFIX "vfio %s: "
 
@@ -81,6 +82,7 @@ typedef struct VFIOAddressSpace {
 struct VFIOGroup;
 
 typedef struct VFIOLegacyContainer {
+VFIOContainer bcontainer;
 VFIOAddressSpace *space;
 int fd; /* /dev/vfio/vfio, empowered by the attached groups */
 MemoryListener listener;
@@ -200,12 +202,6 @@ typedef struct VFIODisplay {
 } dmabuf;
 } VFIODisplay;
 
-typedef struct {
-unsigned long *bitmap;
-hwaddr size;
-hwaddr pages;
-} VFIOBitmap;
-
 void vfio_host_win_add(VFIOLegacyContainer *container,
hwaddr min_iova, hwaddr max_iova,
uint64_t iova_pgsizes);
diff --git a/include/hw/vfio/vfio-container-base.h 
b/include/hw/vfio/vfio-container-base.h
new file mode 100644
index 00..afc8543d22
--- /dev/null
+++ b/include/hw/vfio/vfio-container-base.h
@@ -0,0 +1,82 @@
+/*
+ * VFIO BASE CONTAINER
+ *
+ * Copyright (C) 2023 Intel Corporation.
+ * Copyright Red Hat, Inc. 2023
+ *
+ * Authors: Yi Liu 
+ *  Eric Auger 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ */
+
+#ifndef HW_VFIO_VFIO_BASE_CONTAINER_H
+#define HW_VFIO_VFIO_BASE_CONTAINER_H
+
+#include "exec/memory.h"
+#ifndef CONFIG_USER_ONLY
+#include "exec/hwaddr.h"
+#endif
+
+typedef struct VFIOContainer VFIOContainer;
+typedef struct VFIODevice VFIODevice;
+typedef struct VFIOIOMMUBackendOpsClass VFIOIOMMUBackendOpsClass;
+
+typedef struct {
+unsigned long *bitmap;
+hwaddr size;
+hwaddr pages;
+} VFIOBitmap;
+
+/*
+ * This is the base object for vfio container backends
+ */
+struct VFIOContainer {
+VFIOIOMMUBackendOpsClass *ops;
+};
+
+#define TYPE_VFIO_IOMMU_BACKEND_OPS "vfio-iommu-backend-ops"
+
+DECLARE_CLASS_CHECKERS(VFIOIOMMUBackendOpsClass,
+   VFIO_IOMMU_BACKEND_OPS, TYPE_VFIO_IOMMU_BACKEND_OPS)
+
+struct VFIOIOMMUBackendOpsClass {
+/*< private >*/
+ObjectClass parent_class;
+
+/*< public >*/
+/* required */
+int (*dma_map)(VFIOContainer *bcontainer,
+   hwaddr iova, ram_addr_t size,
+   void *vaddr, bool readonly);
+int (*dma_unmap)(VFIOContainer *bcontainer,
+ hwaddr iova, ram_addr_t size,
+ IOMMUTLBEntry *iotlb);
+int (*attach_device)(char *name, VFIODevice *vbasedev,
+ AddressSpace *as, Error **errp);
+void (*detach_device)(VFIODevice *vbasedev);
+/* migration feature */
+int (*set_dirty_page_tracking)(VFIOContainer *bcontainer, bool start);
+int (*query_dirty_bitmap)(VFIOContainer *bcontainer, VFIOBitmap *vbmap,
+  hwaddr iova, hwaddr size);
+
+/* SPAPR specific */
+int (*add_window)(VFIOContainer *bcontainer,
+  MemoryRegionSection *section,
+  Error **errp);
+void (*del_window)(VFIOContainer *bcontainer,
+   MemoryRegionSection *section);
+};
+
+#endif /* HW_VFIO_VFIO_BASE_CONTAINER_H */
-- 
2.34.1

[PATCH v2 20/27] vfio/container: Bypass EEH if iommufd backend

2023-10-16 Thread Zhenzhong Duan

IBM EEH is currently only supported by the legacy backend, so bypass it
for the IOMMUFD backend.
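
The check amounts to comparing the container's ops pointer against the
legacy ops. A simplified standalone sketch of that idea follows; the
types and names (BackendOps, BaseContainer, legacy_ops, iommufd_ops,
eeh_pick_container) are hypothetical, and a plain next pointer stands
in for the QLIST linkage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops descriptor; only its address matters here. */
typedef struct BackendOps {
    const char *name;
} BackendOps;

static const BackendOps legacy_ops = { "legacy" };
static const BackendOps iommufd_ops = { "iommufd" };

/* Hypothetical base container, linked into a per-address-space list. */
typedef struct BaseContainer {
    const BackendOps *ops;
    struct BaseContainer *next;
} BaseContainer;

/*
 * Return the container EEH may act on, or NULL when there is none:
 * the list is empty, it holds more than one container, or the single
 * container is not backed by the legacy ops.
 */
static BaseContainer *eeh_pick_container(BaseContainer *first)
{
    if (!first) {
        return NULL;
    }
    if (first->next || first->ops != &legacy_ops) {
        return NULL;
    }
    return first;
}
```

Comparing ops pointers avoids adding a separate backend-type field to
the base object, at the cost of tying the check to one specific ops
instance.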

Signed-off-by: Zhenzhong Duan 
---
 hw/vfio/container.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index c86accdb38..dd9534afab 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -1047,6 +1047,8 @@ static VFIOLegacyContainer 
*vfio_eeh_as_container(AddressSpace *as)
 {
 VFIOAddressSpace *space = vfio_get_address_space(as);
 VFIOContainer *bcontainer = NULL;
+const VFIOIOMMUBackendOpsClass *ops = VFIO_IOMMU_BACKEND_OPS_CLASS(
+  
object_class_by_name(TYPE_VFIO_IOMMU_BACKEND_LEGACY_OPS));
 
 if (QLIST_EMPTY(&space->containers)) {
 /* No containers to act on */
@@ -1055,7 +1057,7 @@ static VFIOLegacyContainer 
*vfio_eeh_as_container(AddressSpace *as)
 
 bcontainer = QLIST_FIRST(&space->containers);
 
-if (QLIST_NEXT(bcontainer, next)) {
+if (QLIST_NEXT(bcontainer, next) || bcontainer->ops != ops) {
 /*
  * We don't yet have logic to synchronize EEH state across
  * multiple containers
-- 
2.34.1

[PATCH v2 06/27] vfio/container: Move space field to base container

2023-10-16 Thread Zhenzhong Duan

From: Eric Auger 

Move the space field to the base object. Also the VFIOAddressSpace
now contains a list of base containers.

No functional change intended.

Signed-off-by: Eric Auger 
Signed-off-by: Yi Liu 
Signed-off-by: Yi Sun 
Signed-off-by: Zhenzhong Duan 
---
 include/hw/vfio/vfio-common.h |  8 
 include/hw/vfio/vfio-container-base.h |  9 +
 hw/vfio/common.c  |  4 ++--
 hw/vfio/container-base.c  |  4 
 hw/vfio/container.c   | 28 +--
 5 files changed, 29 insertions(+), 24 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 884d1627f4..33f475957c 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -73,17 +73,10 @@ typedef struct VFIOMigration {
 bool initial_data_sent;
 } VFIOMigration;
 
-typedef struct VFIOAddressSpace {
-AddressSpace *as;
-QLIST_HEAD(, VFIOLegacyContainer) containers;
-QLIST_ENTRY(VFIOAddressSpace) list;
-} VFIOAddressSpace;
-
 struct VFIOGroup;
 
 typedef struct VFIOLegacyContainer {
 VFIOContainer bcontainer;
-VFIOAddressSpace *space;
 int fd; /* /dev/vfio/vfio, empowered by the attached groups */
 MemoryListener listener;
 MemoryListener prereg_listener;
@@ -98,7 +91,6 @@ typedef struct VFIOLegacyContainer {
 QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
 QLIST_HEAD(, VFIOGroup) group_list;
 QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
-QLIST_ENTRY(VFIOLegacyContainer) next;
 QLIST_HEAD(, VFIODevice) device_list;
 } VFIOLegacyContainer;
 
diff --git a/include/hw/vfio/vfio-container-base.h 
b/include/hw/vfio/vfio-container-base.h
index b6c8eb2313..9504564f4e 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -33,6 +33,12 @@ typedef struct VFIOContainer VFIOContainer;
 typedef struct VFIODevice VFIODevice;
 typedef struct VFIOIOMMUBackendOpsClass VFIOIOMMUBackendOpsClass;
 
+typedef struct VFIOAddressSpace {
+AddressSpace *as;
+QLIST_HEAD(, VFIOContainer) containers;
+QLIST_ENTRY(VFIOAddressSpace) list;
+} VFIOAddressSpace;
+
 typedef struct VFIOGuestIOMMU {
 VFIOContainer *bcontainer;
 IOMMUMemoryRegion *iommu_mr;
@@ -52,7 +58,9 @@ typedef struct {
  */
 struct VFIOContainer {
 VFIOIOMMUBackendOpsClass *ops;
+VFIOAddressSpace *space;
 QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
+QLIST_ENTRY(VFIOContainer) next;
 };
 
 int vfio_container_dma_map(VFIOContainer *bcontainer,
@@ -63,6 +71,7 @@ int vfio_container_dma_unmap(VFIOContainer *bcontainer,
  IOMMUTLBEntry *iotlb);
 
 void vfio_container_init(VFIOContainer *bcontainer,
+ VFIOAddressSpace *space,
  struct VFIOIOMMUBackendOpsClass *ops);
 void vfio_container_destroy(VFIOContainer *bcontainer);
 
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 1adfdca4f5..c92af34eed 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -152,7 +152,7 @@ void vfio_unblock_multiple_devices_migration(void)
 
 bool vfio_viommu_preset(VFIODevice *vbasedev)
 {
-return vbasedev->container->space->as != &address_space_memory;
+return vbasedev->container->bcontainer.space->as != &address_space_memory;
 }
 
 static void vfio_set_migration_error(int err)
@@ -990,7 +990,7 @@ static void vfio_dirty_tracking_init(VFIOLegacyContainer 
*container,
 dirty.container = container;
 
 memory_listener_register(&dirty.listener,
- container->space->as);
+ container->bcontainer.space->as);
 
 *ranges = dirty.ranges;
 
diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index 6da50e8151..e1056dd78e 100644
--- a/hw/vfio/container-base.c
+++ b/hw/vfio/container-base.c
@@ -49,9 +49,11 @@ int vfio_container_dma_unmap(VFIOContainer *bcontainer,
 }
 
 void vfio_container_init(VFIOContainer *bcontainer,
+ VFIOAddressSpace *space,
  struct VFIOIOMMUBackendOpsClass *ops)
 {
 bcontainer->ops = ops;
+bcontainer->space = space;
 QLIST_INIT(&bcontainer->giommu_list);
 }
 
@@ -59,6 +61,8 @@ void vfio_container_destroy(VFIOContainer *bcontainer)
 {
 VFIOGuestIOMMU *giommu, *tmp;
 
+QLIST_REMOVE(bcontainer, next);
+
 QLIST_FOREACH_SAFE(giommu, &bcontainer->giommu_list, giommu_next, tmp) {
 memory_region_unregister_iommu_notifier(
 MEMORY_REGION(giommu->iommu_mr), &giommu->n);
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index de6b018eeb..fd2d602fb9 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -588,7 +588,8 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as,
  * details once we know which type of IOMMU we are using.
  */
 
-QLIST_FOREACH(container, &space->containers, next) {
+QLIST_FOREACH(bcontainer, &space->containers, next) {
+container = con

[PATCH v2 19/27] vfio/iommufd: Implement the iommufd backend

2023-10-16 Thread Zhenzhong Duan

From: Yi Liu 

Add the iommufd backend. The IOMMUFD container class is implemented
based on the new /dev/iommu user API. This backend obviously depends
on CONFIG_IOMMUFD.

The iommufd backend doesn't support dirty page sync yet, due to
missing support in the host kernel.

Co-authored-by: Eric Auger 
Signed-off-by: Eric Auger 
Signed-off-by: Yi Liu 
Signed-off-by: Zhenzhong Duan 
---
 include/hw/vfio/vfio-common.h |  22 ++
 include/hw/vfio/vfio-container-base.h |   3 +
 hw/vfio/common.c  |  19 +-
 hw/vfio/iommufd.c | 535 ++
 hw/vfio/meson.build   |   3 +
 hw/vfio/trace-events  |  12 +
 6 files changed, 590 insertions(+), 4 deletions(-)
 create mode 100644 hw/vfio/iommufd.c

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 9f2b86581b..e72f5962ee 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -83,6 +83,23 @@ typedef struct VFIOLegacyContainer {
 QLIST_HEAD(, VFIOGroup) group_list;
 } VFIOLegacyContainer;
 
+#ifdef CONFIG_IOMMUFD
+typedef struct VFIOIOASHwpt {
+uint32_t hwpt_id;
+QLIST_HEAD(, VFIODevice) device_list;
+QLIST_ENTRY(VFIOIOASHwpt) next;
+} VFIOIOASHwpt;
+
+typedef struct IOMMUFDBackend IOMMUFDBackend;
+
+typedef struct VFIOIOMMUFDContainer {
+VFIOContainer bcontainer;
+IOMMUFDBackend *be;
+uint32_t ioas_id;
+QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
+} VFIOIOMMUFDContainer;
+#endif
+
 typedef struct VFIODeviceOps VFIODeviceOps;
 
 typedef struct VFIODevice {
@@ -110,6 +127,11 @@ typedef struct VFIODevice {
 OnOffAuto pre_copy_dirty_page_tracking;
 bool dirty_pages_supported;
 bool dirty_tracking;
+#ifdef CONFIG_IOMMUFD
+int devid;
+VFIOIOASHwpt *hwpt;
+IOMMUFDBackend *iommufd;
+#endif
 } VFIODevice;
 
 struct VFIODeviceOps {
diff --git a/include/hw/vfio/vfio-container-base.h 
b/include/hw/vfio/vfio-container-base.h
index 9a5971a00a..5345986993 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -114,6 +114,9 @@ void vfio_container_init(VFIOContainer *bcontainer,
 void vfio_container_destroy(VFIOContainer *bcontainer);
 
 #define TYPE_VFIO_IOMMU_BACKEND_LEGACY_OPS "vfio-iommu-backend-legacy-ops"
+#ifdef CONFIG_IOMMUFD
+#define TYPE_VFIO_IOMMU_BACKEND_IOMMUFD_OPS "vfio-iommu-backend-iommufd-ops"
+#endif
 #define TYPE_VFIO_IOMMU_BACKEND_OPS "vfio-iommu-backend-ops"
 
 DECLARE_CLASS_CHECKERS(VFIOIOMMUBackendOpsClass,
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index ee2ebf4be9..6901573c32 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1520,10 +1520,13 @@ VFIOAddressSpace *vfio_get_address_space(AddressSpace 
*as)
 
 void vfio_put_address_space(VFIOAddressSpace *space)
 {
-if (QLIST_EMPTY(&space->containers)) {
-QLIST_REMOVE(space, list);
-g_free(space);
+if (!QLIST_EMPTY(&space->containers)) {
+return;
 }
+
+QLIST_REMOVE(space, list);
+g_free(space);
+
 if (QLIST_EMPTY(&vfio_address_spaces)) {
 qemu_unregister_reset(vfio_reset_handler, NULL);
 }
@@ -1558,8 +1561,16 @@ int vfio_attach_device(char *name, VFIODevice *vbasedev,
 {
 const VFIOIOMMUBackendOpsClass *ops;
 
-ops = VFIO_IOMMU_BACKEND_OPS_CLASS(
+#ifdef CONFIG_IOMMUFD
+if (vbasedev->iommufd) {
+ops = VFIO_IOMMU_BACKEND_OPS_CLASS(
+  object_class_by_name(TYPE_VFIO_IOMMU_BACKEND_IOMMUFD_OPS));
+} else
+#endif
+{
+ops = VFIO_IOMMU_BACKEND_OPS_CLASS(
   object_class_by_name(TYPE_VFIO_IOMMU_BACKEND_LEGACY_OPS));
+}
 if (!ops) {
 error_setg(errp, "VFIO IOMMU Backend not found!");
 return -ENODEV;
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
new file mode 100644
index 00..ee8c4620b6
--- /dev/null
+++ b/hw/vfio/iommufd.c
@@ -0,0 +1,535 @@
+/*
+ * iommufd container backend
+ *
+ * Copyright (C) 2023 Intel Corporation.
+ * Copyright Red Hat, Inc. 2023
+ *
+ * Authors: Yi Liu 
+ *  Eric Auger 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include 
+#include 
+#include 
+
+#include "hw/vfio/vfio-common.h"
+#include "qemu/error-report.h"
+#include "trace.h"
+#include "qapi/error.h"
+#include "sysemu/iommufd.h"
+#include "hw/qdev-core.h"
+#include "sysemu/reset.h"
+#inc

[PATCH v2 17/27] backends/iommufd: Introduce the iommufd object

2023-10-16 Thread Zhenzhong Duan

From: Eric Auger 

Introduce an iommufd object which allows the interaction
with the host /dev/iommu device.

/dev/iommu may already have been opened outside of QEMU, in which
case the fd can be passed directly along with the iommufd object:

This allows the iommufd object to be shared across several
subsystems (VFIO, VDPA, ...). For example, libvirt would open
/dev/iommu once.

If no fd is passed along with the iommufd object, /dev/iommu
is opened by QEMU itself.
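
The two cases above (fd passed in from outside, or opened internally)
can be sketched as follows. This is only an illustration, assuming a
simplified model of the "fd", "owned" and "users" fields declared in
include/sysemu/iommufd.h; DemoBackend and the demo_* functions are
hypothetical names and not the code from this patch. The device path
is a parameter so the sketch can be tried against any device node.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical stand-in for IOMMUFDBackend; not the QEMU definition. */
typedef struct {
    int fd;          /* device file descriptor, or -1 when not open */
    bool owned;      /* true when this code opens the fd itself */
    uint32_t users;  /* number of active connections */
} DemoBackend;

/* A non-negative passed_fd is used as-is; otherwise we open lazily. */
static void demo_backend_init(DemoBackend *be, int passed_fd)
{
    assert(be != NULL);
    be->fd = passed_fd;
    be->owned = passed_fd < 0;
    be->users = 0;
}

/* The first user opens the node if we own it. Returns 0 or -1. */
static int demo_backend_connect(DemoBackend *be, const char *path)
{
    assert(be != NULL && path != NULL);
    if (be->owned && be->users == 0) {
        int fd = open(path, O_RDWR);
        if (fd < 0) {
            return -1;
        }
        be->fd = fd;
    }
    be->users++;
    return 0;
}

/* The last user closes the fd, but only if it was opened here. */
static void demo_backend_disconnect(DemoBackend *be)
{
    assert(be != NULL);
    if (be->users == 0) {
        return;
    }
    be->users--;
    if (be->users == 0 && be->owned) {
        close(be->fd);
        be->fd = -1;
    }
}
```

A passed-in fd is never closed by the sketch, which matches the idea
that whoever opened it (for example libvirt) keeps ownership.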

The CONFIG_IOMMUFD option must be set to compile this new object.

Suggested-by: Alex Williamson 
Signed-off-by: Eric Auger 
Signed-off-by: Yi Liu 
Signed-off-by: Zhenzhong Duan 
---
 MAINTAINERS  |   7 +
 qapi/qom.json|  18 ++-
 include/sysemu/iommufd.h |  46 +++
 backends/iommufd-stub.c  |  59 +
 backends/iommufd.c   | 268 +++
 backends/Kconfig |   4 +
 backends/meson.build |   5 +
 backends/trace-events|  12 ++
 qemu-options.hx  |  13 ++
 9 files changed, 431 insertions(+), 1 deletion(-)
 create mode 100644 include/sysemu/iommufd.h
 create mode 100644 backends/iommufd-stub.c
 create mode 100644 backends/iommufd.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 9e7dec4a58..a7cdeb7825 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2081,6 +2081,13 @@ F: hw/vfio/ap.c
 F: docs/system/s390x/vfio-ap.rst
 L: qemu-s3...@nongnu.org
 
+iommufd
+M: Yi Liu 
+M: Eric Auger 
+S: Supported
+F: backends/iommufd.c
+F: include/sysemu/iommufd.h
+
 vhost
 M: Michael S. Tsirkin 
 S: Supported
diff --git a/qapi/qom.json b/qapi/qom.json
index c53ef978ff..3f964e57f5 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -794,6 +794,18 @@
 { 'struct': 'VfioUserServerProperties',
   'data': { 'socket': 'SocketAddress', 'device': 'str' } }
 
+##
+# @IOMMUFDProperties:
+#
+# Properties for IOMMUFDbackend objects.
+#
+# fd: file descriptor name
+#
+# Since: 7.2
+##
+{ 'struct': 'IOMMUFDProperties',
+'data': { '*fd': 'str' } }
+
 ##
 # @RngProperties:
 #
@@ -948,6 +960,8 @@
 'qtest',
 'rng-builtin',
 'rng-egd',
+{ 'name': 'iommufd',
+  'if': 'CONFIG_IOMMUFD' },
 { 'name': 'rng-random',
   'if': 'CONFIG_POSIX' },
 'secret',
@@ -1029,7 +1043,9 @@
   'tls-creds-x509': 'TlsCredsX509Properties',
   'tls-cipher-suites':  'TlsCredsProperties',
   'x-remote-object':'RemoteObjectProperties',
-  'x-vfio-user-server': 'VfioUserServerProperties'
+  'x-vfio-user-server': 'VfioUserServerProperties',
+  'iommufd':{ 'type': 'IOMMUFDProperties',
+  'if': 'CONFIG_IOMMUFD' }
   } }
 
 ##
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
new file mode 100644
index 00..f0e5c7eeb8
--- /dev/null
+++ b/include/sysemu/iommufd.h
@@ -0,0 +1,46 @@
+#ifndef SYSEMU_IOMMUFD_H
+#define SYSEMU_IOMMUFD_H
+
+#include "qom/object.h"
+#include "qemu/thread.h"
+#include "exec/hwaddr.h"
+#include "exec/cpu-common.h"
+
+#define TYPE_IOMMUFD_BACKEND "iommufd"
+OBJECT_DECLARE_TYPE(IOMMUFDBackend, IOMMUFDBackendClass,
+IOMMUFD_BACKEND)
+#define IOMMUFD_BACKEND(obj) \
+OBJECT_CHECK(IOMMUFDBackend, (obj), TYPE_IOMMUFD_BACKEND)
+#define IOMMUFD_BACKEND_GET_CLASS(obj) \
+OBJECT_GET_CLASS(IOMMUFDBackendClass, (obj), TYPE_IOMMUFD_BACKEND)
+#define IOMMUFD_BACKEND_CLASS(klass) \
+OBJECT_CLASS_CHECK(IOMMUFDBackendClass, (klass), TYPE_IOMMUFD_BACKEND)
+struct IOMMUFDBackendClass {
+ObjectClass parent_class;
+};
+
+struct IOMMUFDBackend {
+Object parent;
+
+/*< protected >*/
+int fd;/* /dev/iommu file descriptor */
+bool owned;/* is the /dev/iommu opened internally */
+QemuMutex lock;
+uint32_t users;
+
+/*< public >*/
+};
+
+int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp);
+void iommufd_backend_disconnect(IOMMUFDBackend *be);
+
+int iommufd_backend_get_ioas(IOMMUFDBackend *be, uint32_t *ioas_id);
+void iommufd_backend_put_ioas(IOMMUFDBackend *be, uint32_t ioas_id);
+void iommufd_backend_free_id(int fd, uint32_t id);
+int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
+ram_addr_t size, void *vaddr, bool readonly);
+int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
+  hwaddr iova, ram_addr_t size);
+int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
+   uint32_t pt_id, uint32_t *out_hwpt);
+#endif
diff --git a/backends/iommufd-stub.c b/backends/iommufd-stub.c
new file mode 100644
index 00..cfb9a87859
--- /dev/null
+++ b/backends/iommufd-stub.c
@@ -0,0 +1,59 @@
+/*
+ * iommufd container backend stub
+ *
+ * Copyright (C) 2023 Intel Corporation.
+ * Copyright Red Hat, Inc. 2023
+ *
+ * Authors: Yi Liu 
+ *  Eric Auger 
+ *
+ * This program is free software; you can redistribute it

[PATCH v2 16/27] Add iommufd configure option

2023-10-16 Thread Zhenzhong Duan

This adds "--enable-iommufd/--disable-iommufd" to enable or disable
iommufd support, enabled by default.

Signed-off-by: Zhenzhong Duan 
---
 meson.build   | 6 ++
 meson_options.txt | 2 ++
 scripts/meson-buildoptions.sh | 3 +++
 3 files changed, 11 insertions(+)

diff --git a/meson.build b/meson.build
index 79aef19bdc..e8d285aa5b 100644
--- a/meson.build
+++ b/meson.build
@@ -560,6 +560,10 @@ have_tpm = get_option('tpm') \
   .require(targetos != 'windows', error_message: 'TPM emulation only available 
on POSIX systems') \
   .allowed()
 
+have_iommufd = get_option('iommufd') \
+  .require(targetos == 'linux', error_message: 'iommufd is supported only on 
Linux') \
+  .allowed()
+
 # vhost
 have_vhost_user = get_option('vhost_user') \
   .disable_auto_if(targetos != 'linux') \
@@ -2126,6 +2130,7 @@ if get_option('tcg').allowed()
 endif
 config_host_data.set('CONFIG_TPM', have_tpm)
 config_host_data.set('CONFIG_TSAN', get_option('tsan'))
+config_host_data.set('CONFIG_IOMMUFD', have_iommufd)
 config_host_data.set('CONFIG_USB_LIBUSB', libusb.found())
 config_host_data.set('CONFIG_VDE', vde.found())
 config_host_data.set('CONFIG_VHOST_NET', have_vhost_net)
@@ -4061,6 +4066,7 @@ summary_info += {'vhost-user-crypto support': 
have_vhost_user_crypto}
 summary_info += {'vhost-user-blk server support': have_vhost_user_blk_server}
 summary_info += {'vhost-vdpa support': have_vhost_vdpa}
 summary_info += {'build guest agent': have_ga}
+summary_info += {'iommufd support': have_iommufd}
 summary(summary_info, bool_yn: true, section: 'Configurable features')
 
 # Compilation information
diff --git a/meson_options.txt b/meson_options.txt
index 6a17b90968..62bd75284b 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -107,6 +107,8 @@ option('dbus_display', type: 'feature', value: 'auto',
description: '-display dbus support')
 option('tpm', type : 'feature', value : 'auto',
description: 'TPM support')
+option('iommufd', type : 'feature', value : 'auto',
+   description: 'iommufd support')
 
 # Do not enable it by default even for Mingw32, because it doesn't
 # work on Wine.
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 2a74b0275b..86909dc2cc 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -114,6 +114,7 @@ meson_options_help() {
   printf "%s\n" '  guest-agent-msi Build MSI package for the QEMU Guest Agent'
   printf "%s\n" '  hvf HVF acceleration support'
   printf "%s\n" '  iconv   Font glyph conversion support'
+  printf "%s\n" '  iommufd iommufd support'
   printf "%s\n" '  jackJACK sound support'
   printf "%s\n" '  keyring Linux keyring support'
   printf "%s\n" '  kvm KVM acceleration support'
@@ -327,6 +328,8 @@ _meson_option_parse() {
 --enable-install-blobs) printf "%s" -Dinstall_blobs=true ;;
 --disable-install-blobs) printf "%s" -Dinstall_blobs=false ;;
 --interp-prefix=*) quote_sh "-Dinterp_prefix=$2" ;;
+--enable-iommufd) printf "%s" -Diommufd=enabled ;;
+--disable-iommufd) printf "%s" -Diommufd=disabled ;;
 --enable-jack) printf "%s" -Djack=enabled ;;
 --disable-jack) printf "%s" -Djack=disabled ;;
 --enable-keyring) printf "%s" -Dkeyring=enabled ;;
-- 
2.34.1

[PATCH v2 22/27] vfio/pci: Allow the selection of a given iommu backend

2023-10-16 Thread Zhenzhong Duan

From: Eric Auger 

Now that we support two types of IOMMU backends, add the capability
to select one of them. The choice depends on whether an iommufd object
has been linked with the vfio-pci device.

If the user wants to use the legacy backend, they must not
link the vfio-pci device with any iommufd object:

-device vfio-pci,host=:02:00.0

This is called the legacy mode/backend.

If the user wants to use the iommufd backend (/dev/iommu), they
must pass an iommufd object id in the vfio-pci device options:

 -object iommufd,id=iommufd0
 -device vfio-pci,host=:02:00.0,iommufd=iommufd0
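
The selection rule described above boils down to a NULL check on the
link property. A minimal sketch, assuming simplified stand-in types
(DemoVfioDevice, DemoBackendOps and demo_select_backend are
hypothetical names; the real code looks the ops class up by type name
in vfio_attach_device):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; these are not QEMU's types. */
typedef struct { int fd; } DemoIommufd;

typedef struct {
    const char *name;      /* device name, for illustration only */
    DemoIommufd *iommufd;  /* NULL unless an iommufd object was linked */
} DemoVfioDevice;

typedef struct { const char *name; } DemoBackendOps;

static const DemoBackendOps demo_legacy_ops  = { "legacy" };
static const DemoBackendOps demo_iommufd_ops = { "iommufd" };

/* A linked iommufd object selects the iommufd backend, else legacy. */
static const DemoBackendOps *demo_select_backend(const DemoVfioDevice *dev)
{
    assert(dev != NULL);
    return dev->iommufd ? &demo_iommufd_ops : &demo_legacy_ops;
}
```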

Suggested-by: Alex Williamson 
Signed-off-by: Eric Auger 
Signed-off-by: Yi Liu 
Signed-off-by: Zhenzhong Duan 
---
 hw/vfio/pci.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 24fc047423..15e1b771b0 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -43,6 +43,7 @@
 #include "migration/blocker.h"
 #include "migration/qemu-file.h"
 #include "linux/iommufd.h"
+#include "sysemu/iommufd.h"
 
 #define TYPE_VFIO_PCI_NOHOTPLUG "vfio-pci-nohotplug"
 
@@ -3700,6 +3701,10 @@ static Property vfio_pci_dev_properties[] = {
  * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
  * DEFINE_PROP_STRING("vfiogroupfd, VFIOPCIDevice, vfiogroupfd_name),
  */
+#ifdef CONFIG_IOMMUFD
+DEFINE_PROP_LINK("iommufd", VFIOPCIDevice, vbasedev.iommufd,
+ TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
+#endif
 DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.34.1

[PATCH v2 07/27] vfio/container: switch to IOMMU BE add/del_section_window

2023-10-16 Thread Zhenzhong Duan

From: Eric Auger 

No functional change intended.
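
The wrappers added to container-base.c in the diff treat add_window
and del_window as optional backend callbacks: a missing callback
means "nothing to do". A reduced sketch of that dispatch pattern,
assuming hypothetical Demo* types and an int standing in for
MemoryRegionSection:

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoContainer DemoContainer;

/* Backend callbacks; either one may be left NULL. */
typedef struct {
    int (*add_window)(DemoContainer *c, int section);
    void (*del_window)(DemoContainer *c, int section);
} DemoOps;

struct DemoContainer {
    const DemoOps *ops;
    int windows;  /* bookkeeping for the sample backend below */
};

/* A missing add_window callback is reported as success. */
static int demo_add_window(DemoContainer *c, int section)
{
    assert(c != NULL && c->ops != NULL);
    if (!c->ops->add_window) {
        return 0;
    }
    return c->ops->add_window(c, section);
}

/* A missing del_window callback is silently skipped. */
static void demo_del_window(DemoContainer *c, int section)
{
    assert(c != NULL && c->ops != NULL);
    if (!c->ops->del_window) {
        return;
    }
    c->ops->del_window(c, section);
}

/* Sample backend that only counts its windows. */
static int demo_counting_add(DemoContainer *c, int section)
{
    (void)section;
    c->windows++;
    return 0;
}

static void demo_counting_del(DemoContainer *c, int section)
{
    (void)section;
    c->windows--;
}

static const DemoOps demo_counting_ops = {
    demo_counting_add, demo_counting_del
};
static const DemoOps demo_empty_ops = { NULL, NULL };
```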

Signed-off-by: Eric Auger 
Signed-off-by: Yi Liu 
Signed-off-by: Yi Sun 
Signed-off-by: Zhenzhong Duan 
---
 include/hw/vfio/vfio-common.h |  6 --
 include/hw/vfio/vfio-container-base.h |  5 +
 hw/vfio/common.c  |  4 ++--
 hw/vfio/container-base.c  | 21 +
 hw/vfio/container.c   | 19 ++-
 5 files changed, 42 insertions(+), 13 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 33f475957c..b83ae4b3b6 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -199,12 +199,6 @@ int vfio_set_dirty_page_tracking(VFIOLegacyContainer 
*container, bool start);
 int vfio_query_dirty_bitmap(VFIOLegacyContainer *container, VFIOBitmap *vbmap,
 hwaddr iova, hwaddr size);
 
-int vfio_container_add_section_window(VFIOLegacyContainer *container,
-  MemoryRegionSection *section,
-  Error **errp);
-void vfio_container_del_section_window(VFIOLegacyContainer *container,
-   MemoryRegionSection *section);
-
 void vfio_disable_irqindex(VFIODevice *vbasedev, int index);
 void vfio_unmask_single_irqindex(VFIODevice *vbasedev, int index);
 void vfio_mask_single_irqindex(VFIODevice *vbasedev, int index);
diff --git a/include/hw/vfio/vfio-container-base.h 
b/include/hw/vfio/vfio-container-base.h
index 9504564f4e..1f6d5fd229 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -69,6 +69,11 @@ int vfio_container_dma_map(VFIOContainer *bcontainer,
 int vfio_container_dma_unmap(VFIOContainer *bcontainer,
  hwaddr iova, ram_addr_t size,
  IOMMUTLBEntry *iotlb);
+int vfio_container_add_section_window(VFIOContainer *bcontainer,
+  MemoryRegionSection *section,
+  Error **errp);
+void vfio_container_del_section_window(VFIOContainer *bcontainer,
+   MemoryRegionSection *section);
 
 void vfio_container_init(VFIOContainer *bcontainer,
  VFIOAddressSpace *space,
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index c92af34eed..49cb5b6958 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -655,7 +655,7 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 return;
 }
 
-if (vfio_container_add_section_window(container, section, &err)) {
+if (vfio_container_add_section_window(bcontainer, section, &err)) {
 goto fail;
 }
 
@@ -879,7 +879,7 @@ static void vfio_listener_region_del(MemoryListener 
*listener,
 
 memory_region_unref(section->mr);
 
-vfio_container_del_section_window(container, section);
+vfio_container_del_section_window(&container->bcontainer, section);
 }
 
 typedef struct VFIODirtyRanges {
diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index e1056dd78e..f2a9a33465 100644
--- a/hw/vfio/container-base.c
+++ b/hw/vfio/container-base.c
@@ -48,6 +48,27 @@ int vfio_container_dma_unmap(VFIOContainer *bcontainer,
 return bcontainer->ops->dma_unmap(bcontainer, iova, size, iotlb);
 }
 
+int vfio_container_add_section_window(VFIOContainer *bcontainer,
+  MemoryRegionSection *section,
+  Error **errp)
+{
+if (!bcontainer->ops->add_window) {
+return 0;
+}
+
+return bcontainer->ops->add_window(bcontainer, section, errp);
+}
+
+void vfio_container_del_section_window(VFIOContainer *bcontainer,
+   MemoryRegionSection *section)
+{
+if (!bcontainer->ops->del_window) {
+return;
+}
+
+return bcontainer->ops->del_window(bcontainer, section);
+}
+
 void vfio_container_init(VFIOContainer *bcontainer,
  VFIOAddressSpace *space,
  struct VFIOIOMMUBackendOpsClass *ops)
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index fd2d602fb9..7ca61a7d36 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -215,10 +215,13 @@ static int vfio_legacy_dma_map(VFIOContainer *bcontainer, 
hwaddr iova,
 return -errno;
 }
 
-int vfio_container_add_section_window(VFIOLegacyContainer *container,
-  MemoryRegionSection *section,
-  Error **errp)
+static int vfio_legacy_add_section_window(VFIOContainer *bcontainer,
+  MemoryRegionSection *section,
+  Error **errp)
 {
+VFIOLegacyContainer *container = container_of(bcontainer,
+  VFIOLegacyContainer,
+  bcontainer);

[PATCH v2 09/27] vfio/container: Switch to IOMMU BE set_dirty_page_tracking/query_dirty_bitmap API

2023-10-16 Thread Zhenzhong Duan

From: Eric Auger 

The dirty_pages_supported field is also moved to the base container.

No functional change intended.

Signed-off-by: Eric Auger 
Signed-off-by: Yi Liu 
Signed-off-by: Yi Sun 
Signed-off-by: Zhenzhong Duan 
---
 include/hw/vfio/vfio-common.h |  6 --
 include/hw/vfio/vfio-container-base.h |  6 ++
 hw/vfio/common.c  | 12 
 hw/vfio/container-base.c  | 23 +++
 hw/vfio/container.c   | 23 ---
 5 files changed, 53 insertions(+), 17 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 85dbda296a..39bcc7ec33 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -83,7 +83,6 @@ typedef struct VFIOLegacyContainer {
 unsigned iommu_type;
 Error *error;
 bool initialized;
-bool dirty_pages_supported;
 uint64_t dirty_pgsizes;
 uint64_t max_dirty_bitmap_size;
 unsigned long pgsizes;
@@ -186,11 +185,6 @@ VFIOAddressSpace *vfio_get_address_space(AddressSpace *as);
 void vfio_put_address_space(VFIOAddressSpace *space);
 bool vfio_devices_all_running_and_saving(VFIOLegacyContainer *container);
 
-/* container->fd */
-int vfio_set_dirty_page_tracking(VFIOLegacyContainer *container, bool start);
-int vfio_query_dirty_bitmap(VFIOLegacyContainer *container, VFIOBitmap *vbmap,
-hwaddr iova, hwaddr size);
-
 void vfio_disable_irqindex(VFIODevice *vbasedev, int index);
 void vfio_unmask_single_irqindex(VFIODevice *vbasedev, int index);
 void vfio_mask_single_irqindex(VFIODevice *vbasedev, int index);
diff --git a/include/hw/vfio/vfio-container-base.h 
b/include/hw/vfio/vfio-container-base.h
index 03bffbff73..5ab52774b5 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -66,6 +66,7 @@ typedef struct {
 struct VFIOContainer {
 VFIOIOMMUBackendOpsClass *ops;
 VFIOAddressSpace *space;
+bool dirty_pages_supported;
 QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
 QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
 QLIST_ENTRY(VFIOContainer) next;
@@ -77,6 +78,11 @@ int vfio_container_dma_map(VFIOContainer *bcontainer,
 int vfio_container_dma_unmap(VFIOContainer *bcontainer,
  hwaddr iova, ram_addr_t size,
  IOMMUTLBEntry *iotlb);
+int vfio_container_set_dirty_page_tracking(VFIOContainer *bcontainer,
+   bool start);
+int vfio_container_query_dirty_bitmap(VFIOContainer *bcontainer,
+  VFIOBitmap *vbmap,
+  hwaddr iova, hwaddr size);
 int vfio_container_add_section_window(VFIOContainer *bcontainer,
   MemoryRegionSection *section,
   Error **errp);
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 511f538c00..855d6d82d0 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1149,7 +1149,8 @@ static void vfio_listener_log_global_start(MemoryListener 
*listener)
 if (vfio_devices_all_device_dirty_tracking(container)) {
 ret = vfio_devices_dma_logging_start(container);
 } else {
-ret = vfio_set_dirty_page_tracking(container, true);
+ret = vfio_container_set_dirty_page_tracking(&container->bcontainer,
+ true);
 }
 
 if (ret) {
@@ -1169,7 +1170,8 @@ static void vfio_listener_log_global_stop(MemoryListener 
*listener)
 if (vfio_devices_all_device_dirty_tracking(container)) {
 vfio_devices_dma_logging_stop(container);
 } else {
-ret = vfio_set_dirty_page_tracking(container, false);
+ret = vfio_container_set_dirty_page_tracking(&container->bcontainer,
+ false);
 }
 
 if (ret) {
@@ -1237,7 +1239,8 @@ int vfio_get_dirty_bitmap(VFIOLegacyContainer *container, 
uint64_t iova,
 VFIOBitmap vbmap;
 int ret;
 
-if (!container->dirty_pages_supported && !all_device_dirty_tracking) {
+if (!container->bcontainer.dirty_pages_supported &&
+!all_device_dirty_tracking) {
 cpu_physical_memory_set_dirty_range(ram_addr, size,
 tcg_enabled() ? DIRTY_CLIENTS_ALL :
 DIRTY_CLIENTS_NOCODE);
@@ -1252,7 +1255,8 @@ int vfio_get_dirty_bitmap(VFIOLegacyContainer *container, 
uint64_t iova,
 if (all_device_dirty_tracking) {
 ret = vfio_devices_query_dirty_bitmap(container, &vbmap, iova, size);
 } else {
-ret = vfio_query_dirty_bitmap(container, &vbmap, iova, size);
+ret = vfio_container_query_dirty_bitmap(&container->bcontainer, &vbmap,
+iova, size);
 }
 
 if (ret) {
diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index 12b2

[PATCH v2 08/27] vfio/container: Move hostwin_list in base container

2023-10-16 Thread Zhenzhong Duan

From: Eric Auger 

Move hostwin_list into the base container. As a result,
vfio_host_win_add/del and vfio_find_hostwin now take a
base container.
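
Callers that hold a legacy container pass the address of its embedded
base container, and backend code can get back to the outer structure
with container_of. A minimal sketch of that relationship, assuming
hypothetical Demo* types and a local demo_container_of macro
(QEMU provides its own container_of):

```c
#include <assert.h>
#include <stddef.h>

/* Local equivalent of container_of, for illustration only. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for the base container that now owns the window list. */
typedef struct {
    int hostwin_count;
} DemoBase;

/* Stand-in for the legacy container embedding the base container. */
typedef struct {
    int fd;
    DemoBase bcontainer;
} DemoLegacy;

/* Generic code only needs the base container. */
static void demo_host_win_add(DemoBase *b)
{
    assert(b != NULL);
    b->hostwin_count++;
}

/* Backend code recovers the outer structure from the base pointer. */
static DemoLegacy *demo_to_legacy(DemoBase *b)
{
    assert(b != NULL);
    return demo_container_of(b, DemoLegacy, bcontainer);
}
```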

No functional change intended.

Signed-off-by: Eric Auger 
Signed-off-by: Yi Liu 
Signed-off-by: Yi Sun 
Signed-off-by: Zhenzhong Duan 
---
 include/hw/vfio/vfio-common.h | 12 ++--
 include/hw/vfio/vfio-container-base.h |  8 
 hw/vfio/common.c  | 18 +-
 hw/vfio/container-base.c  |  8 
 hw/vfio/container.c   | 18 +-
 5 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index b83ae4b3b6..85dbda296a 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -88,7 +88,6 @@ typedef struct VFIOLegacyContainer {
 uint64_t max_dirty_bitmap_size;
 unsigned long pgsizes;
 unsigned int dma_max_mappings;
-QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
 QLIST_HEAD(, VFIOGroup) group_list;
 QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
 QLIST_HEAD(, VFIODevice) device_list;
@@ -104,13 +103,6 @@ typedef struct VFIORamDiscardListener {
 QLIST_ENTRY(VFIORamDiscardListener) next;
 } VFIORamDiscardListener;
 
-typedef struct VFIOHostDMAWindow {
-hwaddr min_iova;
-hwaddr max_iova;
-uint64_t iova_pgsizes;
-QLIST_ENTRY(VFIOHostDMAWindow) hostwin_next;
-} VFIOHostDMAWindow;
-
 typedef struct VFIODeviceOps VFIODeviceOps;
 
 typedef struct VFIODevice {
@@ -185,10 +177,10 @@ typedef struct VFIODisplay {
 } dmabuf;
 } VFIODisplay;
 
-void vfio_host_win_add(VFIOLegacyContainer *container,
+void vfio_host_win_add(VFIOContainer *bcontainer,
hwaddr min_iova, hwaddr max_iova,
uint64_t iova_pgsizes);
-int vfio_host_win_del(VFIOLegacyContainer *container, hwaddr min_iova,
+int vfio_host_win_del(VFIOContainer *bcontainer, hwaddr min_iova,
   hwaddr max_iova);
 VFIOAddressSpace *vfio_get_address_space(AddressSpace *as);
 void vfio_put_address_space(VFIOAddressSpace *space);
diff --git a/include/hw/vfio/vfio-container-base.h 
b/include/hw/vfio/vfio-container-base.h
index 1f6d5fd229..03bffbff73 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -47,6 +47,13 @@ typedef struct VFIOGuestIOMMU {
 QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
 } VFIOGuestIOMMU;
 
+typedef struct VFIOHostDMAWindow {
+hwaddr min_iova;
+hwaddr max_iova;
+uint64_t iova_pgsizes;
+QLIST_ENTRY(VFIOHostDMAWindow) hostwin_next;
+} VFIOHostDMAWindow;
+
 typedef struct {
 unsigned long *bitmap;
 hwaddr size;
@@ -60,6 +67,7 @@ struct VFIOContainer {
 VFIOIOMMUBackendOpsClass *ops;
 VFIOAddressSpace *space;
 QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
+QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
 QLIST_ENTRY(VFIOContainer) next;
 };
 
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 49cb5b6958..511f538c00 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -252,12 +252,12 @@ bool 
vfio_devices_all_running_and_mig_active(VFIOLegacyContainer *container)
 return true;
 }
 
-void vfio_host_win_add(VFIOLegacyContainer *container, hwaddr min_iova,
+void vfio_host_win_add(VFIOContainer *bcontainer, hwaddr min_iova,
hwaddr max_iova, uint64_t iova_pgsizes)
 {
 VFIOHostDMAWindow *hostwin;
 
-QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
+QLIST_FOREACH(hostwin, &bcontainer->hostwin_list, hostwin_next) {
 if (ranges_overlap(hostwin->min_iova,
hostwin->max_iova - hostwin->min_iova + 1,
min_iova,
@@ -271,15 +271,15 @@ void vfio_host_win_add(VFIOLegacyContainer *container, 
hwaddr min_iova,
 hostwin->min_iova = min_iova;
 hostwin->max_iova = max_iova;
 hostwin->iova_pgsizes = iova_pgsizes;
-QLIST_INSERT_HEAD(&container->hostwin_list, hostwin, hostwin_next);
+QLIST_INSERT_HEAD(&bcontainer->hostwin_list, hostwin, hostwin_next);
 }
 
-int vfio_host_win_del(VFIOLegacyContainer *container,
+int vfio_host_win_del(VFIOContainer *bcontainer,
   hwaddr min_iova, hwaddr max_iova)
 {
 VFIOHostDMAWindow *hostwin;
 
-QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
+QLIST_FOREACH(hostwin, &bcontainer->hostwin_list, hostwin_next) {
 if (hostwin->min_iova == min_iova && hostwin->max_iova == max_iova) {
 QLIST_REMOVE(hostwin, hostwin_next);
 g_free(hostwin);
@@ -540,13 +540,13 @@ static void 
vfio_unregister_ram_discard_listener(VFIOLegacyContainer *container,
 g_free(vrdl);
 }
 
-static VFIOHostDMAWindow *vfio_find_hostwin(VFIOLegacyContainer *container,
+static VFIOHostDMAWindow *vfio_find_hostwin(VFIOContainer *bcontainer,
 hwaddr iova, hwaddr end)
 {
 VFIOHostDMAWindow

[PATCH v2 10/27] vfio/container: Move per container device list in base container

2023-10-16 Thread Zhenzhong Duan

VFIODevice is also changed to point to the base container instead of
the legacy container.

No functional change intended.

Signed-off-by: Zhenzhong Duan 
---
 include/hw/vfio/vfio-common.h |  3 +--
 include/hw/vfio/vfio-container-base.h |  1 +
 hw/vfio/common.c  | 23 +++
 hw/vfio/container.c   | 12 ++--
 4 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 39bcc7ec33..6979359457 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -89,7 +89,6 @@ typedef struct VFIOLegacyContainer {
 unsigned int dma_max_mappings;
 QLIST_HEAD(, VFIOGroup) group_list;
 QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
-QLIST_HEAD(, VFIODevice) device_list;
 } VFIOLegacyContainer;
 
 typedef struct VFIORamDiscardListener {
@@ -109,7 +108,7 @@ typedef struct VFIODevice {
 QLIST_ENTRY(VFIODevice) container_next;
 QLIST_ENTRY(VFIODevice) global_next;
 struct VFIOGroup *group;
-VFIOLegacyContainer *container;
+VFIOContainer *bcontainer;
 char *sysfsdev;
 char *name;
 DeviceState *dev;
diff --git a/include/hw/vfio/vfio-container-base.h 
b/include/hw/vfio/vfio-container-base.h
index 5ab52774b5..49637a1e6c 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -70,6 +70,7 @@ struct VFIOContainer {
 QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
 QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
 QLIST_ENTRY(VFIOContainer) next;
+QLIST_HEAD(, VFIODevice) device_list;
 };
 
 int vfio_container_dma_map(VFIOContainer *bcontainer,
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 855d6d82d0..7350af038a 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -152,7 +152,7 @@ void vfio_unblock_multiple_devices_migration(void)
 
 bool vfio_viommu_preset(VFIODevice *vbasedev)
 {
-return vbasedev->container->bcontainer.space->as != &address_space_memory;
+return vbasedev->bcontainer->space->as != &address_space_memory;
 }
 
 static void vfio_set_migration_error(int err)
@@ -186,6 +186,7 @@ bool vfio_device_state_is_precopy(VFIODevice *vbasedev)
 
 static bool vfio_devices_all_dirty_tracking(VFIOLegacyContainer *container)
 {
+VFIOContainer *bcontainer = &container->bcontainer;
 VFIODevice *vbasedev;
 MigrationState *ms = migrate_get_current();
 
@@ -194,7 +195,7 @@ static bool 
vfio_devices_all_dirty_tracking(VFIOLegacyContainer *container)
 return false;
 }
 
-QLIST_FOREACH(vbasedev, &container->device_list, container_next) {
+QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
 VFIOMigration *migration = vbasedev->migration;
 
 if (!migration) {
@@ -212,9 +213,10 @@ static bool 
vfio_devices_all_dirty_tracking(VFIOLegacyContainer *container)
 
 bool vfio_devices_all_device_dirty_tracking(VFIOLegacyContainer *container)
 {
+VFIOContainer *bcontainer = &container->bcontainer;
 VFIODevice *vbasedev;
 
-QLIST_FOREACH(vbasedev, &container->device_list, container_next) {
+QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
 if (!vbasedev->dirty_pages_supported) {
 return false;
 }
@@ -229,13 +231,14 @@ bool 
vfio_devices_all_device_dirty_tracking(VFIOLegacyContainer *container)
  */
 bool vfio_devices_all_running_and_mig_active(VFIOLegacyContainer *container)
 {
+VFIOContainer *bcontainer = &container->bcontainer;
 VFIODevice *vbasedev;
 
 if (!migration_is_active(migrate_get_current())) {
 return false;
 }
 
-QLIST_FOREACH(vbasedev, &container->device_list, container_next) {
+QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
 VFIOMigration *migration = vbasedev->migration;
 
 if (!migration) {
@@ -901,12 +904,13 @@ static bool vfio_section_is_vfio_pci(MemoryRegionSection 
*section,
  VFIOLegacyContainer *container)
 {
 VFIOPCIDevice *pcidev;
+VFIOContainer *bcontainer = &container->bcontainer;
 VFIODevice *vbasedev;
 Object *owner;
 
 owner = memory_region_owner(section->mr);
 
-QLIST_FOREACH(vbasedev, &container->device_list, container_next) {
+QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
 if (vbasedev->type != VFIO_DEVICE_TYPE_PCI) {
 continue;
 }
@@ -1007,13 +1011,14 @@ static void 
vfio_devices_dma_logging_stop(VFIOLegacyContainer *container)
 uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature),
   sizeof(uint64_t))] = {};
 struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
+VFIOContainer *bcontainer = &container->bcontainer;
 VFIODevice *vbasedev;
 
 feature->argsz = sizeof(buf);
 feature->flags = VFIO_DEVICE_FEATURE_SET |
  VFIO_DEVICE_FEATURE_DMA_LOGGING_STOP;

Re: [PATCH 2/3] migration/multifd: Unify multifd_send_thread error paths

2023-10-16 Thread Juan Quintela

Fabiano Rosas  wrote:
> The preferred usage of the Error type is to always set both the return
> code and the error when a failure happens. As all code called from the
> send thread follows this pattern, we'll always have the return code
> and the error set at the same time.
>
> Aside from the convention, in this piece of code this must be the
> case, otherwise the if (ret != 0) would be exiting the thread without
> calling multifd_send_terminate_threads() which is incorrect.
>
> Unify both paths to make it clear that both are taken when there's an
> error.
>
> Signed-off-by: Fabiano Rosas 

Reviewed-by: Juan Quintela 
queued.
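
The convention discussed in the quoted commit message (a failing callee sets both the return code and the error, so one check covers both) can be sketched as below. This is only an illustration under stated assumptions, not QEMU code: `MiniError`, `mini_error_set()`, `send_one()`, `send_loop()` and `terminate_calls` are hypothetical stand-ins for the Error type, the send path and multifd_send_terminate_threads().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for QEMU's Error type; not the real API. */
typedef struct MiniError {
    char msg[64];
} MiniError;

/* Hypothetical helper: record an error only if none is set yet. */
static void mini_error_set(MiniError **errp, const char *msg)
{
    if (errp && !*errp) {
        *errp = calloc(1, sizeof(**errp));
        if (*errp) {
            snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
        }
    }
}

/* Counts how often the cleanup step ran (stands in for thread termination). */
static int terminate_calls;

/* Hypothetical callee: on failure it sets BOTH the return code and the error. */
static int send_one(int packet, int failing_packet, MiniError **errp)
{
    if (packet == failing_packet) {
        mini_error_set(errp, "send failed");
        return -1;
    }
    return 0;
}

/* Hypothetical send loop with a single, unified error path. */
static int send_loop(int npackets, int failing_packet)
{
    MiniError *local_err = NULL;
    int ret = 0;

    for (int i = 0; i < npackets; i++) {
        ret = send_one(i, failing_packet, &local_err);
        if (ret != 0) {
            break;
        }
    }

    if (ret != 0) {
        /* By convention, a non-zero code implies an error object exists. */
        assert(local_err != NULL);
        terminate_calls++;
        free(local_err);
    }
    return ret;
}
```

Because every callee follows the same convention, the loop needs only the `ret != 0` check to reach the cleanup step; there is no second path that could exit without it.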

[PATCH v2 26/27] vfio/ap: Make vfio cdev pre-openable by passing a file handle

2023-10-16 Thread Zhenzhong Duan

This gives management tools like libvirt a chance to open the vfio
cdev with privilege and pass FD to qemu. This way qemu never needs
to have privilege to open a VFIO or iommu cdev node.

Opportunistically, remove some unnecessary double-casts.

Signed-off-by: Zhenzhong Duan 
---
 hw/vfio/ap.c | 32 +++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
index 1f8e88aeb3..a34cae31a2 100644
--- a/hw/vfio/ap.c
+++ b/hw/vfio/ap.c
@@ -30,6 +30,7 @@
 #include "hw/s390x/ap-bridge.h"
 #include "exec/address-spaces.h"
 #include "qom/object.h"
+#include "monitor/monitor.h"
 
 #define TYPE_VFIO_AP_DEVICE  "vfio-ap"
 
@@ -160,7 +161,10 @@ static void vfio_ap_realize(DeviceState *dev, Error **errp)
 VFIOAPDevice *vapdev = VFIO_AP_DEVICE(dev);
 VFIODevice *vbasedev = &vapdev->vdev;
 
-vbasedev->name = g_path_get_basename(vbasedev->sysfsdev);
+if (vfio_device_get_name(vbasedev, errp)) {
+return;
+}
+
 vbasedev->ops = &vfio_ap_ops;
 vbasedev->type = VFIO_DEVICE_TYPE_AP;
 vbasedev->dev = dev;
@@ -230,11 +234,36 @@ static const VMStateDescription vfio_ap_vmstate = {
 .unmigratable = 1,
 };
 
+static void vfio_ap_instance_init(Object *obj)
+{
+VFIOAPDevice *vapdev = VFIO_AP_DEVICE(obj);
+
+vapdev->vdev.fd = -1;
+}
+
+#ifdef CONFIG_IOMMUFD
+static void vfio_ap_set_fd(Object *obj, const char *str, Error **errp)
+{
+VFIOAPDevice *vapdev = VFIO_AP_DEVICE(obj);
+int fd = -1;
+
+fd = monitor_fd_param(monitor_cur(), str, errp);
+if (fd == -1) {
+error_prepend(errp, "Could not parse remote object fd %s:", str);
+return;
+}
+vapdev->vdev.fd = fd;
+}
+#endif
+
 static void vfio_ap_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
 
 device_class_set_props(dc, vfio_ap_properties);
+#ifdef CONFIG_IOMMUFD
+object_class_property_add_str(klass, "fd", NULL, vfio_ap_set_fd);
+#endif
 dc->vmsd = &vfio_ap_vmstate;
 dc->desc = "VFIO-based AP device assignment";
 set_bit(DEVICE_CATEGORY_MISC, dc->categories);
@@ -249,6 +278,7 @@ static const TypeInfo vfio_ap_info = {
 .name = TYPE_VFIO_AP_DEVICE,
 .parent = TYPE_AP_DEVICE,
 .instance_size = sizeof(VFIOAPDevice),
+.instance_init = vfio_ap_instance_init,
 .class_init = vfio_ap_class_init,
 };
 
-- 
2.34.1

[PATCH v2 15/27] vfio/container: Implement attach/detach_device

2023-10-16 Thread Zhenzhong Duan

From: Eric Auger 

No functional change intended.

Signed-off-by: Eric Auger 
Signed-off-by: Yi Liu 
Signed-off-by: Yi Sun 
Signed-off-by: Zhenzhong Duan 
---
 hw/vfio/common.c| 22 ++
 hw/vfio/container.c | 12 +---
 2 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index da1d64efca..ee2ebf4be9 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1552,3 +1552,25 @@ retry:
 
 return info;
 }
+
+int vfio_attach_device(char *name, VFIODevice *vbasedev,
+   AddressSpace *as, Error **errp)
+{
+const VFIOIOMMUBackendOpsClass *ops;
+
+ops = VFIO_IOMMU_BACKEND_OPS_CLASS(
+  object_class_by_name(TYPE_VFIO_IOMMU_BACKEND_LEGACY_OPS));
+if (!ops) {
+error_setg(errp, "VFIO IOMMU Backend not found!");
+return -ENODEV;
+}
+return ops->attach_device(name, vbasedev, as, errp);
+}
+
+void vfio_detach_device(VFIODevice *vbasedev)
+{
+if (!vbasedev->bcontainer) {
+return;
+}
+vbasedev->bcontainer->ops->detach_device(vbasedev);
+}
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 9d5be749c7..c86accdb38 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -1117,8 +1117,8 @@ static int vfio_device_groupid(VFIODevice *vbasedev, Error **errp)
  * @name and @vbasedev->name are likely to be different depending
  * on the type of the device, hence the need for passing @name
  */
-int vfio_attach_device(char *name, VFIODevice *vbasedev,
-   AddressSpace *as, Error **errp)
+static int vfio_legacy_attach_device(char *name, VFIODevice *vbasedev,
+ AddressSpace *as, Error **errp)
 {
 int groupid = vfio_device_groupid(vbasedev, errp);
 VFIODevice *vbasedev_iter;
@@ -1158,14 +1158,10 @@ int vfio_attach_device(char *name, VFIODevice *vbasedev,
 return ret;
 }
 
-void vfio_detach_device(VFIODevice *vbasedev)
+static void vfio_legacy_detach_device(VFIODevice *vbasedev)
 {
 VFIOGroup *group = vbasedev->group;
 
-if (!vbasedev->bcontainer) {
-return;
-}
-
 QLIST_REMOVE(vbasedev, global_next);
 QLIST_REMOVE(vbasedev, container_next);
 vbasedev->bcontainer = NULL;
@@ -1180,6 +1176,8 @@ static void vfio_iommu_backend_legacy_ops_class_init(ObjectClass *oc,
 
 ops->dma_map = vfio_legacy_dma_map;
 ops->dma_unmap = vfio_legacy_dma_unmap;
+ops->attach_device = vfio_legacy_attach_device;
+ops->detach_device = vfio_legacy_detach_device;
 ops->set_dirty_page_tracking = vfio_legacy_set_dirty_page_tracking;
 ops->query_dirty_bitmap = vfio_legacy_query_dirty_bitmap;
 ops->add_window = vfio_legacy_add_section_window;
-- 
2.34.1

[PATCH v2 14/27] vfio/container: Move dirty_pgsizes and max_dirty_bitmap_size to base container

2023-10-16 Thread Zhenzhong Duan

From: Eric Auger 

No functional change intended.

Signed-off-by: Eric Auger 
Signed-off-by: Yi Liu 
Signed-off-by: Yi Sun 
Signed-off-by: Zhenzhong Duan 
---
 include/hw/vfio/vfio-common.h |  2 --
 include/hw/vfio/vfio-container-base.h |  2 ++
 hw/vfio/container.c   | 11 ++-
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 8771160849..9f2b86581b 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -80,8 +80,6 @@ typedef struct VFIOLegacyContainer {
 int fd; /* /dev/vfio/vfio, empowered by the attached groups */
 MemoryListener prereg_listener;
 unsigned iommu_type;
-uint64_t dirty_pgsizes;
-uint64_t max_dirty_bitmap_size;
 QLIST_HEAD(, VFIOGroup) group_list;
 } VFIOLegacyContainer;
 
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 96d33495c1..9a5971a00a 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -79,6 +79,8 @@ struct VFIOContainer {
 MemoryListener listener;
 Error *error;
 bool initialized;
+uint64_t dirty_pgsizes;
+uint64_t max_dirty_bitmap_size;
 unsigned long pgsizes;
 unsigned int dma_max_mappings;
 bool dirty_pages_supported;
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 5b14a9b307..9d5be749c7 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -70,6 +70,7 @@ static int vfio_dma_unmap_bitmap(VFIOLegacyContainer *container,
  hwaddr iova, ram_addr_t size,
  IOMMUTLBEntry *iotlb)
 {
+VFIOContainer *bcontainer = &container->bcontainer;
 struct vfio_iommu_type1_dma_unmap *unmap;
 struct vfio_bitmap *bitmap;
 VFIOBitmap vbmap;
@@ -97,7 +98,7 @@ static int vfio_dma_unmap_bitmap(VFIOLegacyContainer *container,
 bitmap->size = vbmap.size;
 bitmap->data = (__u64 *)vbmap.bitmap;
 
-if (vbmap.size > container->max_dirty_bitmap_size) {
+if (vbmap.size > bcontainer->max_dirty_bitmap_size) {
 error_report("UNMAP: Size of bitmap too big 0x%"PRIx64, vbmap.size);
 ret = -E2BIG;
 goto unmap_exit;
@@ -139,7 +140,7 @@ static int vfio_legacy_dma_unmap(VFIOContainer *bcontainer, hwaddr iova,
 
 if (iotlb && vfio_devices_all_running_and_mig_active(bcontainer)) {
 if (!vfio_devices_all_device_dirty_tracking(bcontainer) &&
-container->bcontainer.dirty_pages_supported) {
+bcontainer->dirty_pages_supported) {
 return vfio_dma_unmap_bitmap(container, iova, size, iotlb);
 }
 
@@ -162,7 +163,7 @@ static int vfio_legacy_dma_unmap(VFIOContainer *bcontainer, hwaddr iova,
 if (errno == EINVAL && unmap.size && !(unmap.iova + unmap.size) &&
 container->iommu_type == VFIO_TYPE1v2_IOMMU) {
 trace_vfio_legacy_dma_unmap_overflow_workaround();
-unmap.size -= 1ULL << ctz64(container->bcontainer.pgsizes);
+unmap.size -= 1ULL << ctz64(bcontainer->pgsizes);
 continue;
 }
 error_report("VFIO_UNMAP_DMA failed: %s", strerror(errno));
@@ -558,8 +559,8 @@ static void vfio_get_iommu_info_migration(VFIOLegacyContainer *container,
  */
 if (cap_mig->pgsize_bitmap & qemu_real_host_page_size()) {
 bcontainer->dirty_pages_supported = true;
-container->max_dirty_bitmap_size = cap_mig->max_dirty_bitmap_size;
-container->dirty_pgsizes = cap_mig->pgsize_bitmap;
+bcontainer->max_dirty_bitmap_size = cap_mig->max_dirty_bitmap_size;
+bcontainer->dirty_pgsizes = cap_mig->pgsize_bitmap;
 }
 }
 
-- 
2.34.1

[PATCH v2 25/27] vfio/platform: Make vfio cdev pre-openable by passing a file handle

2023-10-16 Thread Zhenzhong Duan

This gives management tools like libvirt a chance to open the vfio
cdev with privilege and pass FD to qemu. This way qemu never needs
to have privilege to open a VFIO or iommu cdev node.

Signed-off-by: Zhenzhong Duan 
---
 hw/vfio/platform.c | 41 +
 1 file changed, 33 insertions(+), 8 deletions(-)

diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index a1c25e0337..aa0b2b9583 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -35,6 +35,7 @@
 #include "hw/platform-bus.h"
 #include "hw/qdev-properties.h"
 #include "sysemu/kvm.h"
+#include "monitor/monitor.h"
 
 /*
  * Functions used whatever the injection method
@@ -529,14 +530,13 @@ static VFIODeviceOps vfio_platform_ops = {
  */
 static int vfio_base_device_init(VFIODevice *vbasedev, Error **errp)
 {
-struct stat st;
 int ret;
 
-/* @sysfsdev takes precedence over @host */
-if (vbasedev->sysfsdev) {
+/* @fd takes precedence over @sysfsdev which takes precedence over @host */
+if (vbasedev->fd < 0 && vbasedev->sysfsdev) {
 g_free(vbasedev->name);
 vbasedev->name = g_path_get_basename(vbasedev->sysfsdev);
-} else {
+} else if (vbasedev->fd < 0) {
 if (!vbasedev->name || strchr(vbasedev->name, '/')) {
 error_setg(errp, "wrong host device name");
 return -EINVAL;
@@ -546,10 +546,9 @@ static int vfio_base_device_init(VFIODevice *vbasedev, Error **errp)
  vbasedev->name);
 }
 
-if (stat(vbasedev->sysfsdev, &st) < 0) {
-error_setg_errno(errp, errno,
- "failed to get the sysfs host device file status");
-return -errno;
+ret = vfio_device_get_name(vbasedev, errp);
+if (ret) {
+return ret;
 }
 
 ret = vfio_attach_device(vbasedev->name, vbasedev,
@@ -656,6 +655,28 @@ static Property vfio_platform_dev_properties[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
+static void vfio_platform_instance_init(Object *obj)
+{
+VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(obj);
+
+vdev->vbasedev.fd = -1;
+}
+
+#ifdef CONFIG_IOMMUFD
+static void vfio_platform_set_fd(Object *obj, const char *str, Error **errp)
+{
+VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(obj);
+int fd = -1;
+
+fd = monitor_fd_param(monitor_cur(), str, errp);
+if (fd == -1) {
+error_prepend(errp, "Could not parse remote object fd %s:", str);
+return;
+}
+vdev->vbasedev.fd = fd;
+}
+#endif
+
 static void vfio_platform_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
@@ -663,6 +684,9 @@ static void vfio_platform_class_init(ObjectClass *klass, void *data)
 
 dc->realize = vfio_platform_realize;
 device_class_set_props(dc, vfio_platform_dev_properties);
+#ifdef CONFIG_IOMMUFD
+object_class_property_add_str(klass, "fd", NULL, vfio_platform_set_fd);
+#endif
 dc->vmsd = &vfio_platform_vmstate;
 dc->desc = "VFIO-based platform device assignment";
 sbc->connect_irq_notifier = vfio_start_irqfd_injection;
@@ -675,6 +699,7 @@ static const TypeInfo vfio_platform_dev_info = {
 .name = TYPE_VFIO_PLATFORM,
 .parent = TYPE_SYS_BUS_DEVICE,
 .instance_size = sizeof(VFIOPlatformDevice),
+.instance_init = vfio_platform_instance_init,
 .class_init = vfio_platform_class_init,
 .class_size = sizeof(VFIOPlatformDeviceClass),
 };
-- 
2.34.1

[PATCH v2 18/27] util/char_dev: Add open_cdev()

2023-10-16 Thread Zhenzhong Duan

From: Yi Liu 

/dev/vfio/devices/vfioX may not exist. In that case it is still possible
to open /dev/char/$major:$minor instead. Add helper function to abstract
the cdev open.
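
As a rough illustration of the fallback path named above, the sketch below only builds the /dev/char/$major:$minor path from a device number. It is Linux/glibc specific, and `format_char_dev_path()` is a hypothetical helper introduced for this example; it is not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/sysmacros.h>

/*
 * Hypothetical helper (not part of the patch): format the
 * /dev/char/<major>:<minor> path for a character device number.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int format_char_dev_path(dev_t cdev, char *buf, size_t len)
{
    int n = snprintf(buf, len, "/dev/char/%u:%u", major(cdev), minor(cdev));

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A caller that knows the device number, for example from `st_rdev` of a previous stat() on the node, could pass it as the second argument to open_cdev() so that the fallback is available when the primary path cannot be opened.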

Suggested-by: Jason Gunthorpe 
Signed-off-by: Yi Liu 
Signed-off-by: Zhenzhong Duan 
---
 MAINTAINERS |  6 +++
 include/qemu/chardev_open.h | 16 
 util/chardev_open.c | 81 +
 util/meson.build|  1 +
 4 files changed, 104 insertions(+)
 create mode 100644 include/qemu/chardev_open.h
 create mode 100644 util/chardev_open.c

diff --git a/MAINTAINERS b/MAINTAINERS
index a7cdeb7825..eb6b7d274c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3408,6 +3408,12 @@ S: Maintained
 F: include/qemu/iova-tree.h
 F: util/iova-tree.c
 
+cdev Open
+M: Yi Liu 
+S: Maintained
+F: include/qemu/chardev_open.h
+F: util/chardev_open.c
+
 elf2dmp
 M: Viktor Prutyanov 
 S: Maintained
diff --git a/include/qemu/chardev_open.h b/include/qemu/chardev_open.h
new file mode 100644
index 00..6580d351c6
--- /dev/null
+++ b/include/qemu/chardev_open.h
@@ -0,0 +1,16 @@
+/*
+ * QEMU Chardev Helper
+ *
+ * Copyright (C) 2023 Intel Corporation.
+ *
+ * Authors: Yi Liu 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_CHARDEV_HELPERS_H
+#define QEMU_CHARDEV_HELPERS_H
+
+int open_cdev(const char *devpath, dev_t cdev);
+#endif
diff --git a/util/chardev_open.c b/util/chardev_open.c
new file mode 100644
index 00..005d2b81bd
--- /dev/null
+++ b/util/chardev_open.c
@@ -0,0 +1,81 @@
+/*
+ * Copyright (c) 2019, Mellanox Technologies. All rights reserved.
+ * Copyright (C) 2023 Intel Corporation.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *  Redistribution and use in source and binary forms, with or
+ *  without modification, are permitted provided that the following
+ *  conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * Authors: Yi Liu 
+ *
+ * Copied from
+ * https://github.com/linux-rdma/rdma-core/blob/master/util/open_cdev.c
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/chardev_open.h"
+
+static int open_cdev_internal(const char *path, dev_t cdev)
+{
+struct stat st;
+int fd;
+
+fd = qemu_open_old(path, O_RDWR);
+if (fd == -1) {
+return -1;
+}
+if (fstat(fd, &st) || !S_ISCHR(st.st_mode) ||
+(cdev != 0 && st.st_rdev != cdev)) {
+close(fd);
+return -1;
+}
+return fd;
+}
+
+static int open_cdev_robust(dev_t cdev)
+{
+g_autofree char *devpath;
+
+/*
+ * This assumes that udev is being used and is creating the /dev/char/
+ * symlinks.
+ */
+devpath = g_strdup_printf("/dev/char/%u:%u", major(cdev), minor(cdev));
+return open_cdev_internal(devpath, cdev);
+}
+
+int open_cdev(const char *devpath, dev_t cdev)
+{
+int fd;
+
+fd = open_cdev_internal(devpath, cdev);
+if (fd == -1 && cdev != 0) {
+return open_cdev_robust(cdev);
+}
+return fd;
+}
diff --git a/util/meson.build b/util/meson.build
index c4827fd70a..654f4528fb 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -106,6 +106,7 @@ if have_block
 util_ss.add(files('filemonitor-stub.c'))
   endif
   util_ss.add(when: 'CONFIG_LINUX', if_true: files('vfio-helpers.c'))
+  util_ss.add(when: 'CONFIG_LINUX', if_true: files('chardev_open.c'))
 endif
 
 if cpu == 'aarch64'
-- 
2.34.1

[PATCH v2 27/27] vfio/ccw: Make vfio cdev pre-openable by passing a file handle

2023-10-16 Thread Zhenzhong Duan

This gives management tools like libvirt a chance to open the vfio
cdev with privilege and pass FD to qemu. This way qemu never needs
to have privilege to open a VFIO or iommu cdev node.

Opportunistically, remove a redundant definition of TYPE_VFIO_CCW.

Signed-off-by: Zhenzhong Duan 
---
 hw/vfio/ccw.c | 34 +++---
 1 file changed, 31 insertions(+), 3 deletions(-)

diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index c7f8e70783..f151652bc2 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -31,6 +31,7 @@
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
+#include "monitor/monitor.h"
 
 struct VFIOCCWDevice {
 S390CCWDevice cdev;
@@ -590,11 +591,12 @@ static void vfio_ccw_realize(DeviceState *dev, Error **errp)
 }
 }
 
+if (vfio_device_get_name(vbasedev, errp)) {
+return;
+}
+
 vbasedev->ops = &vfio_ccw_ops;
 vbasedev->type = VFIO_DEVICE_TYPE_CCW;
-vbasedev->name = g_strdup_printf("%x.%x.%04x", vcdev->cdev.hostid.cssid,
-   vcdev->cdev.hostid.ssid,
-   vcdev->cdev.hostid.devid);
 vbasedev->dev = dev;
 
 /*
@@ -691,12 +693,37 @@ static const VMStateDescription vfio_ccw_vmstate = {
 .unmigratable = 1,
 };
 
+static void vfio_ccw_instance_init(Object *obj)
+{
+VFIOCCWDevice *vcdev = VFIO_CCW(obj);
+
+vcdev->vdev.fd = -1;
+}
+
+#ifdef CONFIG_IOMMUFD
+static void vfio_ccw_set_fd(Object *obj, const char *str, Error **errp)
+{
+VFIOCCWDevice *vcdev = VFIO_CCW(obj);
+int fd = -1;
+
+fd = monitor_fd_param(monitor_cur(), str, errp);
+if (fd == -1) {
+error_prepend(errp, "Could not parse remote object fd %s:", str);
+return;
+}
+vcdev->vdev.fd = fd;
+}
+#endif
+
 static void vfio_ccw_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
 S390CCWDeviceClass *cdc = S390_CCW_DEVICE_CLASS(klass);
 
 device_class_set_props(dc, vfio_ccw_properties);
+#ifdef CONFIG_IOMMUFD
+object_class_property_add_str(klass, "fd", NULL, vfio_ccw_set_fd);
+#endif
 dc->vmsd = &vfio_ccw_vmstate;
 dc->desc = "VFIO-based subchannel assignment";
 set_bit(DEVICE_CATEGORY_MISC, dc->categories);
@@ -714,6 +741,7 @@ static const TypeInfo vfio_ccw_info = {
 .name = TYPE_VFIO_CCW,
 .parent = TYPE_S390_CCW,
 .instance_size = sizeof(VFIOCCWDevice),
+.instance_init = vfio_ccw_instance_init,
 .class_init = vfio_ccw_class_init,
 };
 
-- 
2.34.1

Re: [PATCH 3/3] migration/multifd: Clarify Error usage in multifd_channel_connect

2023-10-16 Thread Juan Quintela

Fabiano Rosas  wrote:
> The function is currently called from two sites, one always gives it a
> NULL Error and the other always gives it a non-NULL Error.
>
> In the non-NULL case, all it does is trace the error and return. One
> of the callers already has tracing; add a tracepoint to the other and
> stop passing the error into the function.
>
> Cc: Markus Armbruster 
> Signed-off-by: Fabiano Rosas 

Reviewed-by: Juan Quintela 

queued.

[PATCH v2 13/27] vfio/container: Move listener to base container

2023-10-16 Thread Zhenzhong Duan

From: Eric Auger 

Move listener to base container. Also error and initialized fields
are moved at the same time.

No functional change intended.
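
For readers less familiar with the pattern, the sketch below shows why moving the listener into the base container lets a callback recover the base object directly. It is a simplified illustration only: the struct names, the fields and the local `container_of` macro are hypothetical stand-ins, not the QEMU definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified local macro for this sketch; QEMU provides its own. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for MemoryListener, VFIOContainer, VFIOLegacyContainer. */
typedef struct Listener {
    int priority;
} Listener;

typedef struct BaseContainer {
    Listener listener;      /* now lives in the base object */
    int initialized;
} BaseContainer;

typedef struct LegacyContainer {
    BaseContainer bcontainer;   /* base object embedded in the legacy one */
    int fd;
} LegacyContainer;

/* A listener callback can recover the base container from the listener. */
static BaseContainer *base_from_listener(Listener *l)
{
    return container_of(l, BaseContainer, listener);
}

/* Backend-specific code can still get from the base to the legacy object. */
static LegacyContainer *legacy_from_base(BaseContainer *b)
{
    return container_of(b, LegacyContainer, bcontainer);
}
```

With this layout, generic code only needs the base type, and only the legacy backend performs the second step back to its own structure.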

Signed-off-by: Eric Auger 
Signed-off-by: Yi Liu 
Signed-off-by: Yi Sun 
Signed-off-by: Zhenzhong Duan 
---
 include/hw/vfio/vfio-common.h |   3 -
 include/hw/vfio/vfio-container-base.h |   3 +
 hw/vfio/common.c  | 116 +++---
 hw/vfio/container-base.c  |   1 +
 hw/vfio/container.c   |  31 +++
 hw/vfio/spapr.c   |   7 +-
 6 files changed, 72 insertions(+), 89 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 18dd676a2a..8771160849 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -78,11 +78,8 @@ struct VFIOGroup;
 typedef struct VFIOLegacyContainer {
 VFIOContainer bcontainer;
 int fd; /* /dev/vfio/vfio, empowered by the attached groups */
-MemoryListener listener;
 MemoryListener prereg_listener;
 unsigned iommu_type;
-Error *error;
-bool initialized;
 uint64_t dirty_pgsizes;
 uint64_t max_dirty_bitmap_size;
 QLIST_HEAD(, VFIOGroup) group_list;
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index d6ffd7efc4..96d33495c1 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -76,6 +76,9 @@ typedef struct {
 struct VFIOContainer {
 VFIOIOMMUBackendOpsClass *ops;
 VFIOAddressSpace *space;
+MemoryListener listener;
+Error *error;
+bool initialized;
 unsigned long pgsizes;
 unsigned int dma_max_mappings;
 bool dirty_pages_supported;
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index b833def682..da1d64efca 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -602,7 +602,7 @@ static bool vfio_listener_valid_section(MemoryRegionSection *section,
 return true;
 }
 
-static bool vfio_get_section_iova_range(VFIOLegacyContainer *container,
+static bool vfio_get_section_iova_range(VFIOContainer *bcontainer,
 MemoryRegionSection *section,
 hwaddr *out_iova, hwaddr *out_end,
 Int128 *out_llend)
@@ -630,10 +630,7 @@ static bool vfio_get_section_iova_range(VFIOLegacyContainer *container,
 static void vfio_listener_region_add(MemoryListener *listener,
  MemoryRegionSection *section)
 {
-VFIOLegacyContainer *container = container_of(listener,
-  VFIOLegacyContainer,
-  listener);
-VFIOContainer *bcontainer = &container->bcontainer;
+VFIOContainer *bcontainer = container_of(listener, VFIOContainer, listener);
 hwaddr iova, end;
 Int128 llend, llsize;
 void *vaddr;
@@ -645,7 +642,8 @@ static void vfio_listener_region_add(MemoryListener *listener,
 return;
 }
 
-if (!vfio_get_section_iova_range(container, section, &iova, &end, &llend)) {
+if (!vfio_get_section_iova_range(bcontainer, section, &iova, &end,
+ &llend)) {
 if (memory_region_is_ram_device(section->mr)) {
 trace_vfio_listener_region_add_no_dma_map(
 memory_region_name(section->mr),
@@ -663,7 +661,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
 hostwin = vfio_find_hostwin(bcontainer, iova, end);
 if (!hostwin) {
 error_setg(&err, "Container %p can't map guest IOVA region"
-   " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
+   " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, bcontainer, iova, end);
 goto fail;
 }
 
@@ -750,13 +748,12 @@ static void vfio_listener_region_add(MemoryListener *listener,
 }
 }
 
-ret = vfio_container_dma_map(&container->bcontainer,
- iova, int128_get64(llsize), vaddr,
- section->readonly);
+ret = vfio_container_dma_map(bcontainer, iova, int128_get64(llsize),
+ vaddr, section->readonly);
 if (ret) {
 error_setg(&err, "vfio_container_dma_map(%p, 0x%"HWADDR_PRIx", "
"0x%"HWADDR_PRIx", %p) = %d (%s)",
-   container, iova, int128_get64(llsize), vaddr, ret,
+   bcontainer, iova, int128_get64(llsize), vaddr, ret,
strerror(-ret));
 if (memory_region_is_ram_device(section->mr)) {
 /* Allow unexpected mappings not to be fatal for RAM devices */
@@ -778,9 +775,9 @@ fail:
  * can gracefully fail.  Runtime, there's not much we can do other
  * than throw a hardware error.
  */
-if (!container->initialized) {
-if (!container->error) {
-error_propagate_prepend(&container->e

[PATCH v2 12/27] vfio/container: Move vrdl_list, pgsizes and dma_max_mappings to base container

2023-10-16 Thread Zhenzhong Duan

From: Eric Auger 

Move vrdl_list, pgsizes and dma_max_mappings to the base
container object.

No functional change intended.

Signed-off-by: Eric Auger 
Signed-off-by: Yi Liu 
Signed-off-by: Yi Sun 
Signed-off-by: Zhenzhong Duan 
---
 include/hw/vfio/vfio-common.h | 13 
 include/hw/vfio/vfio-container-base.h | 13 
 hw/vfio/common.c  | 48 +--
 hw/vfio/container-base.c  | 12 +++
 hw/vfio/container.c   | 18 +-
 hw/vfio/spapr.c   |  4 +--
 6 files changed, 59 insertions(+), 49 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 7bb75bc7cd..18dd676a2a 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -85,22 +85,9 @@ typedef struct VFIOLegacyContainer {
 bool initialized;
 uint64_t dirty_pgsizes;
 uint64_t max_dirty_bitmap_size;
-unsigned long pgsizes;
-unsigned int dma_max_mappings;
 QLIST_HEAD(, VFIOGroup) group_list;
-QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
 } VFIOLegacyContainer;
 
-typedef struct VFIORamDiscardListener {
-VFIOLegacyContainer *container;
-MemoryRegion *mr;
-hwaddr offset_within_address_space;
-hwaddr size;
-uint64_t granularity;
-RamDiscardListener listener;
-QLIST_ENTRY(VFIORamDiscardListener) next;
-} VFIORamDiscardListener;
-
 typedef struct VFIODeviceOps VFIODeviceOps;
 
 typedef struct VFIODevice {
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 49637a1e6c..d6ffd7efc4 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -47,6 +47,16 @@ typedef struct VFIOGuestIOMMU {
 QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
 } VFIOGuestIOMMU;
 
+typedef struct VFIORamDiscardListener {
+VFIOContainer *bcontainer;
+MemoryRegion *mr;
+hwaddr offset_within_address_space;
+hwaddr size;
+uint64_t granularity;
+RamDiscardListener listener;
+QLIST_ENTRY(VFIORamDiscardListener) next;
+} VFIORamDiscardListener;
+
 typedef struct VFIOHostDMAWindow {
 hwaddr min_iova;
 hwaddr max_iova;
@@ -66,9 +76,12 @@ typedef struct {
 struct VFIOContainer {
 VFIOIOMMUBackendOpsClass *ops;
 VFIOAddressSpace *space;
+unsigned long pgsizes;
+unsigned int dma_max_mappings;
 bool dirty_pages_supported;
 QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
 QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
+QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
 QLIST_ENTRY(VFIOContainer) next;
 QLIST_HEAD(, VFIODevice) device_list;
 };
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 1c47bcc478..b833def682 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -396,13 +396,13 @@ static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
 {
 VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
 listener);
+VFIOContainer *bcontainer = vrdl->bcontainer;
 const hwaddr size = int128_get64(section->size);
 const hwaddr iova = section->offset_within_address_space;
 int ret;
 
 /* Unmap with a single call. */
-ret = vfio_container_dma_unmap(&vrdl->container->bcontainer,
-   iova, size , NULL);
+ret = vfio_container_dma_unmap(bcontainer, iova, size , NULL);
 if (ret) {
 error_report("%s: vfio_container_dma_unmap() failed: %s", __func__,
  strerror(-ret));
@@ -414,6 +414,7 @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
 {
 VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
 listener);
+VFIOContainer *bcontainer = vrdl->bcontainer;
 const hwaddr end = section->offset_within_region +
int128_get64(section->size);
 hwaddr start, next, iova;
@@ -432,8 +433,8 @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
section->offset_within_address_space;
 vaddr = memory_region_get_ram_ptr(section->mr) + start;
 
-ret = vfio_container_dma_map(&vrdl->container->bcontainer, iova,
- next - start, vaddr, section->readonly);
+ret = vfio_container_dma_map(bcontainer, iova, next - start,
+ vaddr, section->readonly);
 if (ret) {
 /* Rollback */
 vfio_ram_discard_notify_discard(rdl, section);
@@ -443,7 +444,7 @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
 return 0;
 }
 
-static void vfio_register_ram_discard_listener(VFIOLegacyContainer *container,
+static void vfio_register_ram_discard_listener(VFIOContainer *bcontainer,
MemoryRegionSection *section)
 {
 RamDiscardManager *rdm = 
memory_region_ge

[PATCH v2 04/27] vfio/container: Switch to dma_map|unmap API

2023-10-16 Thread Zhenzhong Duan

From: Eric Auger 

No functional change intended.

Signed-off-by: Eric Auger 
Signed-off-by: Yi Liu 
Signed-off-by: Yi Sun 
Signed-off-by: Zhenzhong Duan 
---
 include/hw/vfio/vfio-common.h |  4 ---
 include/hw/vfio/vfio-container-base.h |  7 +
 hw/vfio/common.c  | 45 +++
 hw/vfio/container-base.c  | 22 +
 hw/vfio/container.c   | 25 +++
 hw/vfio/trace-events  |  2 +-
 6 files changed, 74 insertions(+), 31 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 9651cf921c..f2aa122c47 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -212,10 +212,6 @@ void vfio_put_address_space(VFIOAddressSpace *space);
 bool vfio_devices_all_running_and_saving(VFIOLegacyContainer *container);
 
 /* container->fd */
-int vfio_dma_unmap(VFIOLegacyContainer *container, hwaddr iova,
-   ram_addr_t size, IOMMUTLBEntry *iotlb);
-int vfio_dma_map(VFIOLegacyContainer *container, hwaddr iova,
- ram_addr_t size, void *vaddr, bool readonly);
 int vfio_set_dirty_page_tracking(VFIOLegacyContainer *container, bool start);
 int vfio_query_dirty_bitmap(VFIOLegacyContainer *container, VFIOBitmap *vbmap,
 hwaddr iova, hwaddr size);
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 226e960fb5..1483e77441 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -46,6 +46,13 @@ struct VFIOContainer {
 VFIOIOMMUBackendOpsClass *ops;
 };
 
+int vfio_container_dma_map(VFIOContainer *bcontainer,
+   hwaddr iova, ram_addr_t size,
+   void *vaddr, bool readonly);
+int vfio_container_dma_unmap(VFIOContainer *bcontainer,
+ hwaddr iova, ram_addr_t size,
+ IOMMUTLBEntry *iotlb);
+
 #define TYPE_VFIO_IOMMU_BACKEND_LEGACY_OPS "vfio-iommu-backend-legacy-ops"
 #define TYPE_VFIO_IOMMU_BACKEND_OPS "vfio-iommu-backend-ops"
 
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index b51ef3a15a..6be1526d79 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -337,7 +337,7 @@ static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
 static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
 {
 VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
-VFIOLegacyContainer *container = giommu->container;
+VFIOContainer *bcontainer = &giommu->container->bcontainer;
 hwaddr iova = iotlb->iova + giommu->iommu_offset;
 void *vaddr;
 int ret;
@@ -367,21 +367,22 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
  * of vaddr will always be there, even if the memory object is
  * destroyed and its backing memory munmap-ed.
  */
-ret = vfio_dma_map(container, iova,
-   iotlb->addr_mask + 1, vaddr,
-   read_only);
+ret = vfio_container_dma_map(bcontainer, iova,
+ iotlb->addr_mask + 1, vaddr,
+ read_only);
 if (ret) {
-error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
+error_report("vfio_container_dma_map(%p, 0x%"HWADDR_PRIx", "
  "0x%"HWADDR_PRIx", %p) = %d (%s)",
- container, iova,
+ bcontainer, iova,
  iotlb->addr_mask + 1, vaddr, ret, strerror(-ret));
 }
 } else {
-ret = vfio_dma_unmap(container, iova, iotlb->addr_mask + 1, iotlb);
+ret = vfio_container_dma_unmap(bcontainer, iova,
+   iotlb->addr_mask + 1, iotlb);
 if (ret) {
-error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
+error_report("vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
  "0x%"HWADDR_PRIx") = %d (%s)",
- container, iova,
+ bcontainer, iova,
  iotlb->addr_mask + 1, ret, strerror(-ret));
 vfio_set_migration_error(ret);
 }
@@ -400,9 +401,10 @@ static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
 int ret;
 
 /* Unmap with a single call. */
-ret = vfio_dma_unmap(vrdl->container, iova, size , NULL);
+ret = vfio_container_dma_unmap(&vrdl->container->bcontainer,
+   iova, size , NULL);
 if (ret) {
-error_report("%s: vfio_dma_unmap() failed: %s", __func__,
+error_report("%s: vfio_container_dma_unmap() failed: %s", __func__,
  strerror(-ret));
 }
 }
@@ -430,8 +432,8 @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
section->offset_within_address_spa

[PATCH v2 24/27] vfio: Allow the selection of a given iommu backend for platform ap and ccw

2023-10-16 Thread Zhenzhong Duan

Previously we added support for selecting the iommu backend for the vfio
pci device. Now add the same support for the other device types:
platform, ap and ccw.

Signed-off-by: Zhenzhong Duan 
---
 include/hw/vfio/vfio-platform.h | 1 +
 hw/vfio/ap.c| 5 +
 hw/vfio/ccw.c   | 5 +
 hw/vfio/platform.c  | 4 
 4 files changed, 15 insertions(+)

diff --git a/include/hw/vfio/vfio-platform.h b/include/hw/vfio/vfio-platform.h
index c414c3dffc..f57f4276f2 100644
--- a/include/hw/vfio/vfio-platform.h
+++ b/include/hw/vfio/vfio-platform.h
@@ -18,6 +18,7 @@
 
 #include "hw/sysbus.h"
 #include "hw/vfio/vfio-common.h"
+#include "sysemu/iommufd.h"
 #include "qemu/event_notifier.h"
 #include "qemu/queue.h"
 #include "qom/object.h"
diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
index 5f257bffb9..1f8e88aeb3 100644
--- a/hw/vfio/ap.c
+++ b/hw/vfio/ap.c
@@ -16,6 +16,7 @@
 #include "qapi/error.h"
 #include "hw/vfio/vfio.h"
 #include "hw/vfio/vfio-common.h"
+#include "sysemu/iommufd.h"
 #include "hw/s390x/ap-device.h"
 #include "qemu/error-report.h"
 #include "qemu/event_notifier.h"
@@ -205,6 +206,10 @@ static void vfio_ap_unrealize(DeviceState *dev)
 
 static Property vfio_ap_properties[] = {
 DEFINE_PROP_STRING("sysfsdev", VFIOAPDevice, vdev.sysfsdev),
+#ifdef CONFIG_IOMMUFD
+DEFINE_PROP_LINK("iommufd", VFIOAPDevice, vdev.iommufd,
+ TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
+#endif
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index 6623ae237b..c7f8e70783 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -22,6 +22,7 @@
 #include "qapi/error.h"
 #include "hw/vfio/vfio.h"
 #include "hw/vfio/vfio-common.h"
+#include "sysemu/iommufd.h"
 #include "hw/s390x/s390-ccw.h"
 #include "hw/s390x/vfio-ccw.h"
 #include "hw/qdev-properties.h"
@@ -678,6 +679,10 @@ static void vfio_ccw_unrealize(DeviceState *dev)
 static Property vfio_ccw_properties[] = {
 DEFINE_PROP_STRING("sysfsdev", VFIOCCWDevice, vdev.sysfsdev),
 DEFINE_PROP_BOOL("force-orb-pfch", VFIOCCWDevice, force_orb_pfch, false),
+#ifdef CONFIG_IOMMUFD
+DEFINE_PROP_LINK("iommufd", VFIOCCWDevice, vdev.iommufd,
+ TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
+#endif
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index 8e3d4ac458..a1c25e0337 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -649,6 +649,10 @@ static Property vfio_platform_dev_properties[] = {
 DEFINE_PROP_UINT32("mmap-timeout-ms", VFIOPlatformDevice,
mmap_timeout, 1100),
 DEFINE_PROP_BOOL("x-irqfd", VFIOPlatformDevice, irqfd_allowed, true),
+#ifdef CONFIG_IOMMUFD
+DEFINE_PROP_LINK("iommufd", VFIOPlatformDevice, vbasedev.iommufd,
+ TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
+#endif
 DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.34.1

[PATCH v2 23/27] vfio/pci: Make vfio cdev pre-openable by passing a file handle

2023-10-16 Thread Zhenzhong Duan

This gives management tools like libvirt a chance to open the vfio
cdev with privilege and pass FD to qemu. This way qemu never needs
to have privilege to open a VFIO or iommu cdev node.

Together with the earlier support for pre-opening the /dev/iommu device,
we now have full support for a management tool passing a vfio device to
an unprivileged qemu. This mode is no longer considered for the
legacy backend, so let's remove the "TODO" comment.

Add a helper function vfio_device_get_name() to check the fd and get the
device name; it will also be used by other vfio devices.

There is no easy way to check if a device is mdev with FD passing,
so fail the x-balloon-allowed check unconditionally in this case.

There is also no easy way to get the BDF as a name with FD passing, so
we fake a name of the form VFIO_FD[fd].
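
The naming rule described above can be sketched in Python as follows. This is
only an illustration of the fallback logic, not the C helper from the patch:
the function name vfio_device_name() and the has_iommufd flag are hypothetical,
and the stat() existence check on the sysfs path is left out for brevity.

```python
import os


def vfio_device_name(fd, sysfsdev=None, name=None, has_iommufd=False):
    """Pick a printable device name (hypothetical sketch of the rule above)."""
    if fd < 0:
        # No FD was passed: the device is identified by its sysfs path.
        # A name supplied by the user is kept as-is.
        return name or os.path.basename(sysfsdev)
    if not has_iommufd:
        # FD passing is only meant to be used with the iommufd backend.
        raise ValueError("Use FD passing only with iommufd backend")
    # With FD passing there is no BDF to use, so fake a name from the fd.
    return name or "VFIO_FD%d" % fd
```

For example, vfio_device_name(-1, "/sys/bus/pci/devices/0000:01:00.0") yields
the last path component, while vfio_device_name(23, has_iommufd=True) yields
a name derived from the descriptor number.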

Signed-off-by: Zhenzhong Duan 
---
 include/hw/vfio/vfio-common.h |  1 +
 hw/vfio/helpers.c | 33 +
 hw/vfio/iommufd.c | 12 +++
 hw/vfio/pci.c | 40 ---
 4 files changed, 70 insertions(+), 16 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index e72f5962ee..e6804baa6d 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -244,6 +244,7 @@ struct vfio_info_cap_header *
 vfio_get_device_info_cap(struct vfio_device_info *info, uint16_t id);
 struct vfio_info_cap_header *
 vfio_get_cap(void *ptr, uint32_t cap_offset, uint16_t id);
+int vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
 #endif
 extern const MemoryListener vfio_prereg_listener;
 
diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
index 7e5da21b31..70c65cf71d 100644
--- a/hw/vfio/helpers.c
+++ b/hw/vfio/helpers.c
@@ -610,3 +610,36 @@ bool vfio_has_region_cap(VFIODevice *vbasedev, int region, uint16_t cap_type)
 
 return ret;
 }
+
+int vfio_device_get_name(VFIODevice *vbasedev, Error **errp)
+{
+struct stat st;
+
+if (vbasedev->fd < 0) {
+if (stat(vbasedev->sysfsdev, &st) < 0) {
+error_setg_errno(errp, errno, "no such host device");
+error_prepend(errp, VFIO_MSG_PREFIX, vbasedev->sysfsdev);
+return -errno;
+}
+/* User may specify a name, e.g: VFIO platform device */
+if (!vbasedev->name) {
+vbasedev->name = g_path_get_basename(vbasedev->sysfsdev);
+}
+}
+#ifdef CONFIG_IOMMUFD
+else {
+if (!vbasedev->iommufd) {
+error_setg(errp, "Use FD passing only with iommufd backend");
+return -EINVAL;
+}
+/*
+ * Give a name with fd so any function printing out vbasedev->name
+ * will not break.
+ */
+if (!vbasedev->name) {
+vbasedev->name = g_strdup_printf("VFIO_FD%d", vbasedev->fd);
+}
+}
+#endif
+return 0;
+}
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index ee8c4620b6..aabc1d1024 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -361,11 +361,15 @@ static int iommufd_attach_device(char *name, VFIODevice *vbasedev,
 uint32_t ioas_id;
 Error *err = NULL;
 
-devfd = vfio_get_devicefd(vbasedev->sysfsdev, errp);
-if (devfd < 0) {
-return devfd;
+if (vbasedev->fd < 0) {
+devfd = vfio_get_devicefd(vbasedev->sysfsdev, errp);
+if (devfd < 0) {
+return devfd;
+}
+vbasedev->fd = devfd;
+} else {
+devfd = vbasedev->fd;
 }
-vbasedev->fd = devfd;
 
 ret = iommufd_connect_and_bind(vbasedev, errp);
 if (ret) {
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 15e1b771b0..edb787d3d1 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -44,6 +44,7 @@
 #include "migration/qemu-file.h"
 #include "linux/iommufd.h"
 #include "sysemu/iommufd.h"
+#include "monitor/monitor.h"
 
 #define TYPE_VFIO_PCI_NOHOTPLUG "vfio-pci-nohotplug"
 
@@ -3257,18 +3258,23 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 VFIODevice *vbasedev = &vdev->vbasedev;
 char *tmp, *subsys;
 Error *err = NULL;
-struct stat st;
 int i, ret;
 bool is_mdev;
 char uuid[UUID_FMT_LEN];
 char *name;
 
-if (!vbasedev->sysfsdev) {
+if (vbasedev->fd < 0 && !vbasedev->sysfsdev) {
 if (!(~vdev->host.domain || ~vdev->host.bus ||
   ~vdev->host.slot || ~vdev->host.function)) {
 error_setg(errp, "No provided host device");
+#ifdef CONFIG_IOMMUFD
+error_append_hint(errp, "Use -device vfio-pci,host=:BB:DD.F, "
+  "-device vfio-pci,sysfsdev=PATH_TO_DEVICE "
+  "or -device vfio-pci,fd=DEVICE_FD\n");
+#else
 error_append_hint(errp, "Use -device vfio-pci,host=:BB:DD.F "
   "or -device vfio-pci,sysfsdev=PATH_TO_DEVICE\n");
+#endif
 return;
 }
 vbasedev->sysfsdev =
@@ -3277,13 +3283,9 @@ static void vfio_realize(PCIDevice *pdev, Err

[PATCH v2 21/27] vfio/pci: Adapt vfio pci hot reset support with iommufd BE

2023-10-16 Thread Zhenzhong Duan

As the pci hot reset path needs to reference pci specific functions
and data structures, adding container level callback functions
for the legacy and iommufd BEs and referencing those pci specific
functions/data is no better than implementing reset support for the
iommufd BE directly in pci.c.

This way we can also share the common bus reset and system reset
path for both BEs.

A helper function vfio_pci_get_pci_hot_reset_info() is extracted
for use by both BEs.

Signed-off-by: Zhenzhong Duan 
---
 hw/vfio/pci.c| 212 +++
 hw/vfio/trace-events |   1 +
 2 files changed, 196 insertions(+), 17 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index b27011cee7..24fc047423 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -42,6 +42,7 @@
 #include "qapi/error.h"
 #include "migration/blocker.h"
 #include "migration/qemu-file.h"
+#include "linux/iommufd.h"
 
 #define TYPE_VFIO_PCI_NOHOTPLUG "vfio-pci-nohotplug"
 
@@ -2445,22 +2446,13 @@ static bool vfio_pci_host_match(PCIHostDeviceAddress *addr, const char *name)
 return (strcmp(tmp, name) == 0);
 }
 
-static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
+static int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
+   struct vfio_pci_hot_reset_info **info_p)
 {
-VFIOGroup *group;
 struct vfio_pci_hot_reset_info *info;
-struct vfio_pci_dependent_device *devices;
-struct vfio_pci_hot_reset *reset;
-int32_t *fds;
-int ret, i, count;
-bool multi = false;
+int ret, count;
 
-trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
-
-if (!single) {
-vfio_pci_pre_reset(vdev);
-}
-vdev->vbasedev.needs_reset = false;
+assert(info_p && !*info_p);
 
 info = g_malloc0(sizeof(*info));
 info->argsz = sizeof(*info);
@@ -2468,24 +2460,53 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
 ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
 if (ret && errno != ENOSPC) {
 ret = -errno;
+g_free(info);
 if (!vdev->has_pm_reset) {
 error_report("vfio: Cannot reset device %s, "
  "no available reset mechanism.", vdev->vbasedev.name);
 }
-goto out_single;
+return ret;
 }
 
 count = info->count;
-info = g_realloc(info, sizeof(*info) + (count * sizeof(*devices)));
-info->argsz = sizeof(*info) + (count * sizeof(*devices));
-devices = &info->devices[0];
+info = g_realloc(info, sizeof(*info) + (count * sizeof(info->devices[0])));
+info->argsz = sizeof(*info) + (count * sizeof(info->devices[0]));
 
 ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
 if (ret) {
 ret = -errno;
+g_free(info);
 error_report("vfio: hot reset info failed: %m");
+return ret;
+}
+
+*info_p = info;
+return 0;
+}
+
+static int vfio_pci_hot_reset_legacy(VFIOPCIDevice *vdev, bool single)
+{
+VFIOGroup *group;
+struct vfio_pci_hot_reset_info *info = NULL;
+struct vfio_pci_dependent_device *devices;
+struct vfio_pci_hot_reset *reset;
+int32_t *fds;
+int ret, i, count;
+bool multi = false;
+
+trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
+
+if (!single) {
+vfio_pci_pre_reset(vdev);
+}
+vdev->vbasedev.needs_reset = false;
+
+ret = vfio_pci_get_pci_hot_reset_info(vdev, &info);
+
+if (ret) {
 goto out_single;
 }
+devices = &info->devices[0];
 
 trace_vfio_pci_hot_reset_has_dep_devices(vdev->vbasedev.name);
 
@@ -2627,6 +2648,163 @@ out_single:
 return ret;
 }
 
+#ifdef CONFIG_IOMMUFD
+static VFIODevice *vfio_pci_find_by_iommufd_devid(__u32 devid)
+{
+VFIODevice *vbasedev_iter;
+VFIOIOMMUBackendOpsClass *ops = VFIO_IOMMU_BACKEND_OPS_CLASS(
+object_class_by_name(TYPE_VFIO_IOMMU_BACKEND_IOMMUFD_OPS));
+
+QLIST_FOREACH(vbasedev_iter, &vfio_device_list, global_next) {
+if (vbasedev_iter->bcontainer->ops != ops) {
+continue;
+}
+if (devid == vbasedev_iter->devid) {
+return vbasedev_iter;
+}
+}
+return NULL;
+}
+
+static int vfio_pci_hot_reset_iommufd(VFIOPCIDevice *vdev, bool single)
+{
+struct vfio_pci_hot_reset_info *info = NULL;
+struct vfio_pci_dependent_device *devices;
+struct vfio_pci_hot_reset *reset;
+int ret, i;
+bool multi = false;
+
+trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
+
+if (!single) {
+vfio_pci_pre_reset(vdev);
+}
+vdev->vbasedev.needs_reset = false;
+
+ret = vfio_pci_get_pci_hot_reset_info(vdev, &info);
+
+if (ret) {
+goto out_single;
+}
+
+assert(info->flags & VFIO_PCI_HOT_RESET_FLAG_DEV_ID);
+
+devices = &info->devices[0];
+
+if (!(info->flags & VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED)) {
+if (!vdev->

Re: [PATCH] vhost-user: Fix protocol feature bit conflict

2023-10-16 Thread Manos Pitsidianakis


On Mon, 16 Oct 2023 11:32, Hanna Czenczek  wrote:
diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
index 9f9ddf878d..1d4121431b 100644
--- a/include/hw/virtio/vhost-user.h
+++ b/include/hw/virtio/vhost-user.h
@@ -29,7 +29,8 @@ enum VhostUserProtocolFeature {
VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS = 14,
VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS = 15,
VHOST_USER_PROTOCOL_F_STATUS = 16,
-VHOST_USER_PROTOCOL_F_SHARED_OBJECT = 17,
+/* Feature 17 reserved for VHOST_USER_PROTOCOL_F_XEN_MMAP. */
+VHOST_USER_PROTOCOL_F_SHARED_OBJECT = 18,
VHOST_USER_PROTOCOL_F_MAX
};


May I ask, why not define VHOST_USER_PROTOCOL_F_XEN_MMAP as well instead 
of only mentioning it in a comment?


Otherwise:

Reviewed-by: Emmanouil Pitsidianakis
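
To illustrate the question above, here is a small Python sketch of the feature
numbering with the reserved bit defined explicitly rather than mentioned in a
comment. It is not QEMU code: the shortened member names and the
feature_mask() helper are hypothetical, and XEN_MMAP = 17 is taken only from
the comment in the quoted patch.

```python
from enum import IntEnum, unique


@unique  # refuses two members with the same bit number
class VhostUserProtocolFeature(IntEnum):
    INBAND_NOTIFICATIONS = 14
    CONFIGURE_MEM_SLOTS = 15
    STATUS = 16
    XEN_MMAP = 17       # defined explicitly here (hypothetical, see lead-in)
    SHARED_OBJECT = 18  # moved from 17 to 18 by the quoted patch


def feature_mask(*features):
    """Build the protocol feature bitmask for the given feature numbers."""
    mask = 0
    for feature in features:
        mask |= 1 << feature
    return mask
```

With the decorator in place, accidentally giving two features the same number
fails when the enum is defined, which is the kind of conflict the patch fixes.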

Re: -drive if=none: can't we make this the default?

2023-10-16 Thread Daniel P . Berrangé

On Sat, Oct 14, 2023 at 10:16:16PM +0300, Michael Tokarev wrote:
> Can't we make -drive if=none the default?
> 
> Yes, I know current default is ide, and whole world have to use if=none
> explicitly to undo this.  I think at this point we can deprecate if=ide
> default and switch to if=none in the next release.  I think it will be a
> welcome change.

IMHO we'd be better off investing more effort in pushing people towards
-blockdev through better documentation of the latter.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH] tests/vm: netbsd: install dtc

2023-10-16 Thread Thomas Huth


On 13/10/2023 17.30, Paolo Bonzini wrote:

Install dtc as it is now a mandatory external dependency in order to build QEMU.

Signed-off-by: Paolo Bonzini 
---
  tests/vm/netbsd | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/tests/vm/netbsd b/tests/vm/netbsd
index 939dc1b22a1..3ef1ec2d9cc 100755
--- a/tests/vm/netbsd
+++ b/tests/vm/netbsd
@@ -40,6 +40,9 @@ class NetBSDVM(basevm.BaseVM):
  "gsed",
  "gettext-tools",
  
+# libs: basic
+"dtc",
+
  # libs: crypto
  "gnutls",
  


Does this work for you? When I run "make vm-build-netbsd", I'm still getting 
a failure:


76 packages to install:
  git-base-2.41.0nb1 pkgconf-1.9.5 xz-5.4.3 python310-3.10.12 
py310-expat-3.10.12nb1 ninja-build-1.11.1
  bash-5.2.15 gmake-4.4.1 gsed-4.9nb1 gettext-tools-0.21.1 dtc-1.7.0 
gnutls-3.8.0nb3 jpeg-9e png-1.6.39
  capstone-4.0.2 SDL2-2.26.5nb1 gtk3+-3.24.38 zstd-1.5.5 libslirp-4.7.0nb1 
pcre2-10.42 curl-8.1.2

  libuuid-2.32.1nb1 libffi-3.4.4 gettext-lib-0.21.1 p11-kit-0.24.1 nettle-3.9.1
  mozilla-rootcerts-1.0.20230505 libtasn1-4.19.0 libcfg+-0.7.0 gmp-6.2.1nb3 
wayland-protocols-1.31nb1
  wayland-1.21.0nb2 libxkbcommon-1.5.0nb1 libsamplerate-0.2.2nb4 
shared-mime-info-2.2nb2 pango-1.50.12nb1
  libcups-2.4.6nb1 libXft-2.3.8 hicolor-icon-theme-0.17nb1 glib2-2.74.6nb1 
gdk-pixbuf2-2.42.10nb2
  fribidi-1.0.13 freetype2-2.13.0nb1 fontconfig-2.14.2nb1 
cairo-gobject-1.16.0nb7 cairo-1.16.0nb9 atk-2.38.0
  at-spi2-atk-2.38.0nb1 lz4-1.9.4 nghttp2-1.54.0 libidn2-2.3.4 
readline-8.2nb2 libsndfile-1.2.0nb2
  fftw-3.3.10nb1 libxslt-1.1.38 libepoll-shim-0.0.20230411 
at-spi2-core-2.40.3nb2 lzo-2.10 brotli-1.0.9
  tiff-4.5.1nb1 libpaper-2.1.0nb2 dbus-1.14.6 harfbuzz-7.3.0 
graphite2-1.3.14nb1 libunistring-1.1
  libxml2-2.10.4nb1 libgcrypt-1.10.2 mpg123-1.31.3 libvorbis-1.3.7 
libopus-1.4 libogg-1.3.5nb1 lame-3.100nb5

  flac-1.4.2 jbigkit-2.1nb1 xmlcatmgr-2.2nb1 libgpg-error-1.47
[...]
installing dtc-1.7.0...
[...]
The Meson build system
Version: 0.63.3
Source dir: /home/qemu/qemu-test.Li0spd/src
Build dir: /home/qemu/qemu-test.Li0spd/build
Build type: native build
Project name: qemu
Project version: 8.1.50
C compiler for the host machine: cc -m64 -mcx16 (gcc 7.5.0 "cc (nb4 
20200810) 7.5.0")

C linker for the host machine: cc -m64 -mcx16 ld.bfd 2.31.1
[...]
Run-time dependency capstone found: YES 4.0.2
Library fdt found: NO
Initialized empty Git repository in 
/home/qemu/qemu-test.Li0spd/src/subprojects/dtc/.git/
fatal: unable to access 'https://gitlab.com/qemu-project/dtc.git/': SSL 
certificate problem: unable to get local issuer certificate


../src/meson.build:3076:4: ERROR: Git command failed: ['/usr/pkg/bin/git', 
'fetch', '--depth', '1', 'origin', 'b6910bec11614980a21e46fbccc35934b671bd81']


A full log can be found at 
/home/qemu/qemu-test.Li0spd/build/meson-logs/meson-log.txt


ERROR: meson setup failed

... so though the NetBSD people finally upgraded their dtc to a usable 
level, our meson.build seems to be unable to detect it?


 Thomas

Re: [PATCH] tests/docker: avoid invalid escape in Python string

2023-10-16 Thread Manos Pitsidianakis


On Mon, 16 Oct 2023 09:23, Paolo Bonzini  wrote:

This is an error in Python 3.12; fix it by using a raw string literal.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Paolo Bonzini 
---
tests/docker/docker.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/docker/docker.py b/tests/docker/docker.py
index 688ef62989c..3b8a26704df 100755
--- a/tests/docker/docker.py
+++ b/tests/docker/docker.py
@@ -186,7 +186,7 @@ def _check_binfmt_misc(executable):
  (binary))
return None, True

-m = re.search("interpreter (\S+)\n", entry)
+m = re.search(r"interpreter (\S+)\n", entry)
interp = m.group(1)
if interp and interp != executable:
print("binfmt_misc for %s does not point to %s, using %s" %
--
2.41.0




Reviewed-by: Emmanouil Pitsidianakis
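
A minimal sketch of the fixed pattern from the patch quoted above, for
illustration only. The helper name find_interpreter() and the sample
binfmt_misc entry text are made up; the regular expression is the one from
the patch. The raw string matters because "\S" in a normal string literal is
an invalid escape sequence, which recent Python versions flag.

```python
import re


def find_interpreter(entry):
    """Return the interpreter path from a binfmt_misc entry, or None."""
    # Raw string: the backslashes reach the regex engine untouched.
    m = re.search(r"interpreter (\S+)\n", entry)
    return m.group(1) if m else None


# Hypothetical entry text, shaped like a binfmt_misc status file.
sample = "enabled\ninterpreter /usr/bin/qemu-aarch64\nflags: OCF\n"
interp = find_interpreter(sample)
```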

Re: [PATCH v2 00/10] riscv: RVA22U64 profile support

2023-10-16 Thread Andrew Jones

On Thu, Oct 12, 2023 at 04:07:50PM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 10/11/23 00:01, Alistair Francis wrote:
> > On Sat, Oct 7, 2023 at 12:23 AM Daniel Henrique Barboza
> >  wrote:
> > > 
> > > Hi,
> > > 
> > > Several design changes were made in this version after the reviews and
> > > feedback in the v1 [1]. The high-level summary is:
> > > 
> > > - we'll no longer allow users to set profile flags for vendor CPUs. If
> > >we're to adhere to the current policy of not allowing users to enable
> > >extensions for vendor CPUs, the profile support would become a
> > >glorified way of checking if the vendor CPU happens to support a
> > >specific profile. If a future vendor CPU supports a profile the CPU
> > >can declare it manually in its cpu_init() function, the flag will
> > >still be set, but users can't change it;
> > > 
> > > - disabling a profile will now disable all the mandatory extensions from
> > >the CPU;
> > 
> > What happens if you enable one profile and disable a different one?
> 
> With this implementation as is the profiles will be evaluated by the order
> they're declared in riscv_cpu_profiles[]. Which isn't exactly ideal since
> we're exchanging a left-to-right ordering in the command line by an
> arbitrary order that we happened to set in the code.
> 
> I can make some tweaks to make the profiles sensitive to left-to-right order
> between them, while keeping regular extension with higher priority. e.g.:
> 
> 
> -cpu rv64,zicbom=true,profileA=false,profileB=true,zicboz=false
> -cpu rv64,profileA=false,zicbom=true,zicboz=false,profileB=true
> -cpu rv64,profileA=false,profileB=true,zicbom=true,zicboz=false
> 
> These would all do the same thing: "keeping zicbom=true and zicboz=false,
> disable profileA and then enable profile B"
> 
> Switching the profiles order would have a different result:
> 
> -cpu rv64,profileB=true,profileA=false,zicbom=true,zicboz=false
> 
> "keeping zicbom=true and zicboz=false, enable profile B and then disable
> profile A"
> 
> 
> I'm happy to hear any other alternative/ideas. We'll either deal with some
> left-to-right ordering w.r.t profiles or deal with an internal profile
> commit ordering. TBH I think it's sensible to demand left-to-right command
> line ordering for profiles only.

left-to-right ordering is how the rest of QEMU properties work and scripts
depend on it. For example, one can do -cpu $MODEL,$DEFAULT_PROPS,$MORE_PROPS
where $MORE_PROPS can not only add more props but also override default
props (DEFAULT_PROPS='foo=off', MORE_PROPS='foo=on' - foo will be on).
left-to-right also works with multiple -cpu parameters, i.e. -cpu
$MODEL,$DEFAULT_PROPS -cpu $MODEL,$MY_PROPS will replace default props
with my props.

I don't think profiles should be treated specially with regard to this. They
should behave the same as any property. If one does
profileA=off,profileB=on and there are overlapping extensions then a
sanity check in cpu-finalize should catch that and error out. Otherwise,
why not? Profiles are just like big 'G' extensions and 'G' would behave
the same way.
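
As a toy model of the left-to-right behaviour described above (not QEMU's
actual option parser; parse_cpu_props() and its return shape are made up for
illustration), the rule "later settings win, and a later -cpu replaces an
earlier one" could be sketched as:

```python
def parse_cpu_props(*cpu_args):
    """Apply -cpu arguments left to right and return (model, props)."""
    model, props = None, {}
    for arg in cpu_args:
        model, _, rest = arg.partition(",")
        props = {}  # a later -cpu replaces the props of an earlier one
        for item in rest.split(","):
            if not item:
                continue
            key, _, value = item.partition("=")
            props[key] = value  # a later key=value overrides an earlier one
    return model, props


# DEFAULT_PROPS='foo=off' followed by MORE_PROPS='foo=on': foo ends up on.
model, props = parse_cpu_props("rv64,foo=off,foo=on")
```

Under this model profiles need no special casing: profileA=off,profileB=on is
simply two properties applied in the order given.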

Thanks,
drew

Re: [PATCH 0/2] Move Fuloong2e PCI IRQ mapping to board code

2023-10-16 Thread Philippe Mathieu-Daudé


On 5/1/23 16:44, Bernhard Beschow wrote:


Bernhard Beschow (2):
   hw/pci-host/bonito: Inline pci_register_root_bus()
   hw/pci-host/bonito: Map PCI IRQs in board code


Thanks, queued to mips-next.

Re: [PATCH 01/17] meson: do not build shaders by default

2023-10-16 Thread Manos Pitsidianakis


On Mon, 16 Oct 2023 09:31, Paolo Bonzini  wrote:

They are not needed when building user-mode emulators.

Signed-off-by: Paolo Bonzini 


Reviewed-by: Emmanouil Pitsidianakis

Re: [PATCH] MAINTAINERS: Add a general architecture section for x86

2023-10-16 Thread Thomas Huth


On 29/09/2023 15.45, Thomas Huth wrote:

It's a little bit weird that the files in target/i386/ which
are not in a subfolder there do not have any associated
maintainer (and thus nobody might be CC:-ed on changes to
these files). We should have a general x86 section for these
files, similar to what we already have for s390x and mips.
Since Paolo is already listed as maintainer for both, the
x86 KVM and TCG CPUs, I'd like to suggest him as maintainer
for the general files, too.

Signed-off-by: Thomas Huth 
---
  Richard, being listed as x86 TCG CPU maintainer, do you
  want to be listed here, too?

  MAINTAINERS | 11 +++
  1 file changed, 11 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 3914bbd85b..5b4ab7d142 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -131,6 +131,17 @@ K: ^Subject:.*(?i)mips
  F: docs/system/target-mips.rst
  F: configs/targets/mips*
  
+X86 general architecture support

+M: Paolo Bonzini 
+S: Maintained
+F: configs/devices/i386-softmmu/default.mak
+F: configs/targets/i386-softmmu.mak
+F: configs/targets/x86_64-softmmu.mak
+F: docs/system/target-i386*
+F: target/i386/*.[ch]
+F: target/i386/Kconfig
+F: target/i386/meson.build
+
  Guest CPU cores (TCG)
  -
  Overall TCG CPUs


Friendly Ping!

Paolo, Richard, what do you think about this?

 Thomas

Re: [PATCH 0/3] hw/mips: Cleanup in preparation of heterogenous prototype

2023-10-16 Thread Philippe Mathieu-Daudé


On 9/10/23 19:14, Philippe Mathieu-Daudé wrote:


Philippe Mathieu-Daudé (3):
   hw/mips: Merge 'hw/mips/cpudevs.h' with 'target/mips/cpu.h'
   hw/misc/mips_itu: Declare itc_reconfigure() in 'hw/misc/mips_itu.h'
   hw/misc/mips_itu: Make MIPSITUState target agnostic


Queued to mips-next.

Re: [PATCH] tests/vm: netbsd: install dtc

2023-10-16 Thread Daniel P . Berrangé

On Mon, Oct 16, 2023 at 11:00:14AM +0200, Thomas Huth wrote:
> On 13/10/2023 17.30, Paolo Bonzini wrote:
> > Install dtc as it is now a mandatory external dependency in order to build 
> > QEMU.
> > 
> > Signed-off-by: Paolo Bonzini 
> > ---
> >   tests/vm/netbsd | 3 +++
> >   1 file changed, 3 insertions(+)
> > 
> > diff --git a/tests/vm/netbsd b/tests/vm/netbsd
> > index 939dc1b22a1..3ef1ec2d9cc 100755
> > --- a/tests/vm/netbsd
> > +++ b/tests/vm/netbsd
> > @@ -40,6 +40,9 @@ class NetBSDVM(basevm.BaseVM):
> >   "gsed",
> >   "gettext-tools",
> > +# libs: basic
> > +"dtc",
> > +
> >   # libs: crypto
> >   "gnutls",
> 
> Does this work for you? When I run "make vm-build-netbsd", I'm still getting
> a failure:
> 
> 76 packages to install:
>   git-base-2.41.0nb1 pkgconf-1.9.5 xz-5.4.3 python310-3.10.12
> py310-expat-3.10.12nb1 ninja-build-1.11.1
>   bash-5.2.15 gmake-4.4.1 gsed-4.9nb1 gettext-tools-0.21.1 dtc-1.7.0
> gnutls-3.8.0nb3 jpeg-9e png-1.6.39
>   capstone-4.0.2 SDL2-2.26.5nb1 gtk3+-3.24.38 zstd-1.5.5 libslirp-4.7.0nb1
> pcre2-10.42 curl-8.1.2
>   libuuid-2.32.1nb1 libffi-3.4.4 gettext-lib-0.21.1 p11-kit-0.24.1 
> nettle-3.9.1
>   mozilla-rootcerts-1.0.20230505 libtasn1-4.19.0 libcfg+-0.7.0 gmp-6.2.1nb3
> wayland-protocols-1.31nb1
>   wayland-1.21.0nb2 libxkbcommon-1.5.0nb1 libsamplerate-0.2.2nb4
> shared-mime-info-2.2nb2 pango-1.50.12nb1
>   libcups-2.4.6nb1 libXft-2.3.8 hicolor-icon-theme-0.17nb1 glib2-2.74.6nb1
> gdk-pixbuf2-2.42.10nb2
>   fribidi-1.0.13 freetype2-2.13.0nb1 fontconfig-2.14.2nb1
> cairo-gobject-1.16.0nb7 cairo-1.16.0nb9 atk-2.38.0
>   at-spi2-atk-2.38.0nb1 lz4-1.9.4 nghttp2-1.54.0 libidn2-2.3.4
> readline-8.2nb2 libsndfile-1.2.0nb2
>   fftw-3.3.10nb1 libxslt-1.1.38 libepoll-shim-0.0.20230411
> at-spi2-core-2.40.3nb2 lzo-2.10 brotli-1.0.9
>   tiff-4.5.1nb1 libpaper-2.1.0nb2 dbus-1.14.6 harfbuzz-7.3.0
> graphite2-1.3.14nb1 libunistring-1.1
>   libxml2-2.10.4nb1 libgcrypt-1.10.2 mpg123-1.31.3 libvorbis-1.3.7
> libopus-1.4 libogg-1.3.5nb1 lame-3.100nb5
>   flac-1.4.2 jbigkit-2.1nb1 xmlcatmgr-2.2nb1 libgpg-error-1.47
> [...]
> installing dtc-1.7.0...
> [...]
> The Meson build system
> Version: 0.63.3
> Source dir: /home/qemu/qemu-test.Li0spd/src
> Build dir: /home/qemu/qemu-test.Li0spd/build
> Build type: native build
> Project name: qemu
> Project version: 8.1.50
> C compiler for the host machine: cc -m64 -mcx16 (gcc 7.5.0 "cc (nb4
> 20200810) 7.5.0")
> C linker for the host machine: cc -m64 -mcx16 ld.bfd 2.31.1
> [...]
> Run-time dependency capstone found: YES 4.0.2
> Library fdt found: NO
> Initialized empty Git repository in
> /home/qemu/qemu-test.Li0spd/src/subprojects/dtc/.git/
> fatal: unable to access 'https://gitlab.com/qemu-project/dtc.git/': SSL
> certificate problem: unable to get local issuer certificate
> 
> ../src/meson.build:3076:4: ERROR: Git command failed: ['/usr/pkg/bin/git',
> 'fetch', '--depth', '1', 'origin',
> 'b6910bec11614980a21e46fbccc35934b671bd81']
> 
> A full log can be found at
> /home/qemu/qemu-test.Li0spd/build/meson-logs/meson-log.txt
> 
> ERROR: meson setup failed
> 
> ... so though the NetBSD people finally upgraded their dtc to a usable
> level, our meson.build seems to be unable to detect it?

They claim to have version 1.7.0

  https://ftp.netbsd.org/pub/pkgsrc/current/pkgsrc/sysutils/dtc/index.html

and we claim to want 1.5.0, so should be OK.

That suggests our detection or test compilation is failing. The
meson-log.txt might have more info, if you can access it?
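
For reference, the version reasoning above amounts to a plain numeric
comparison. The sketch below is not meson's version check; version_tuple()
and satisfies() are hypothetical helpers that only handle dotted numeric
versions such as the two mentioned here.

```python
def version_tuple(version):
    """Turn '1.7.0' into (1, 7, 0) so components compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies(found, wanted):
    """True if the found version is at least the wanted minimum."""
    return version_tuple(found) >= version_tuple(wanted)


# pkgsrc ships dtc 1.7.0 and the build wants at least 1.5.0.
dtc_ok = satisfies("1.7.0", "1.5.0")
```

Since the version itself is new enough, the "Library fdt found: NO" line would
have to come from something other than the version number.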


Also separately it appears we're missing the public CA cert bundle,
so we should not see a cert error from gitlab.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH] hw/mips/malta: Use sdram_type enum from 'hw/i2c/smbus_eeprom.h'

2023-10-16 Thread Philippe Mathieu-Daudé


On 9/10/23 11:21, Philippe Mathieu-Daudé wrote:

Since commit 93198b6cad ("i2c: Split smbus into parts") the SDRAM
types are enumerated as sdram_type in "hw/i2c/smbus_eeprom.h".

Using the enum removes this global shadow warning:

   hw/mips/malta.c:209:12: error: declaration shadows a variable in the global scope [-Werror,-Wshadow]
   enum { SDR = 0x4, DDR2 = 0x8 } type;
  ^
   include/hw/i2c/smbus_eeprom.h:33:19: note: previous declaration is here
   enum sdram_type { SDR = 0x4, DDR = 0x7, DDR2 = 0x8 };
 ^

Signed-off-by: Philippe Mathieu-Daudé 
---
  hw/mips/malta.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Queued to mips-next.

Re: [PATCH] tests/vm: netbsd: install dtc

2023-10-16 Thread Daniel P . Berrangé

On Mon, Oct 16, 2023 at 10:06:11AM +0100, Daniel P. Berrangé wrote:
> On Mon, Oct 16, 2023 at 11:00:14AM +0200, Thomas Huth wrote:
> > On 13/10/2023 17.30, Paolo Bonzini wrote:
> > > Install dtc as it is now a mandatory external dependency in order to 
> > > build QEMU.
> > > 
> > > Signed-off-by: Paolo Bonzini 
> > > ---
> > >   tests/vm/netbsd | 3 +++
> > >   1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/tests/vm/netbsd b/tests/vm/netbsd
> > > index 939dc1b22a1..3ef1ec2d9cc 100755
> > > --- a/tests/vm/netbsd
> > > +++ b/tests/vm/netbsd
> > > @@ -40,6 +40,9 @@ class NetBSDVM(basevm.BaseVM):
> > >   "gsed",
> > >   "gettext-tools",
> > > +# libs: basic
> > > +"dtc",
> > > +
> > >   # libs: crypto
> > >   "gnutls",
> > 
> > Does this work for you? When I run "make vm-build-netbsd", I'm still getting
> > a failure:

snip

> > Library fdt found: NO
> > Initialized empty Git repository in
> > /home/qemu/qemu-test.Li0spd/src/subprojects/dtc/.git/
> > fatal: unable to access 'https://gitlab.com/qemu-project/dtc.git/': SSL
> > certificate problem: unable to get local issuer certificate
> > 
> > ../src/meson.build:3076:4: ERROR: Git command failed: ['/usr/pkg/bin/git',
> > 'fetch', '--depth', '1', 'origin',
> > 'b6910bec11614980a21e46fbccc35934b671bd81']
> > 
> > A full log can be found at
> > /home/qemu/qemu-test.Li0spd/build/meson-logs/meson-log.txt
> > 
> > ERROR: meson setup failed
> > 
> > ... so though the NetBSD people finally upgraded their dtc to a usable
> > level, our meson.build seems to be unable to detect it?
> 
> They claim to have version 1.7.0
> 
>   https://ftp.netbsd.org/pub/pkgsrc/current/pkgsrc/sysutils/dtc/index.html
> 
> and we claim to want 1.5.0, so should be OK.
> 
> Suggests that our detection, or test compilation is failing. The
> meson-log.txt might have more info, if you can access that ?
> 
> 
> Also separately it appears we're missing the public CA cert bundle,
> so we should not see a cert error from gitlab.

The latter is presumably solvable with this:

  https://ftp.netbsd.org/pub/pkgsrc/current/pkgsrc/security/ca-certificates/index.html


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH 03/17] meson, cutils: allow non-relocatable installs

2023-10-16 Thread Manos Pitsidianakis


On Mon, 16 Oct 2023 09:31, Paolo Bonzini  wrote:

diff --git a/meson.build b/meson.build
index 010d2c649c2..251838f2609 100644
--- a/meson.build
+++ b/meson.build
@@ -2111,6 +2111,7 @@ config_host_data.set('CONFIG_OPENGL', opengl.found())
config_host_data.set('CONFIG_PLUGIN', get_option('plugins'))
config_host_data.set('CONFIG_RBD', rbd.found())
config_host_data.set('CONFIG_RDMA', rdma.found())
+config_host_data.set('CONFIG_RELOCATABLE', get_option('relocatable'))
config_host_data.set('CONFIG_SAFESTACK', get_option('safe_stack'))
config_host_data.set('CONFIG_SDL', sdl.found())
config_host_data.set('CONFIG_SDL_IMAGE', sdl_image.found())


Is relocatable a good choice here? The term is used in linking and might 
be confusing (when I read the subject that's what I thought it'd be 
about). How about 'movable'?


Otherwise:

Reviewed-by: Emmanouil Pitsidianakis

Re: [PATCH v2] MAINTANERS: Split vt82c686 out of fuloong2e

2023-10-16 Thread Philippe Mathieu-Daudé


On 15/10/23 16:15, BALATON Zoltan wrote:

The VIA south bridges are now mostly used by other machines, not just
fuloong2e, so split them off into a separate section and take maintainership.

Signed-off-by: BALATON Zoltan 
---




@@ -2491,6 +2488,15 @@ S: Maintained
  F: hw/isa/piix4.c
  F: include/hw/southbridge/piix.h
  
+VIA South Bridges (VT82C686B, VT8231)

+M: BALATON Zoltan 
+S: Maintained
+R: Philippe Mathieu-Daudé 


Thanks, queued to mips-next, keeping a 'M:' tag here
and fixing typos in subject / bridge.

Re: [PATCH] tests/vm: netbsd: install dtc

2023-10-16 Thread Thomas Huth


On 16/10/2023 11.06, Daniel P. Berrangé wrote:

On Mon, Oct 16, 2023 at 11:00:14AM +0200, Thomas Huth wrote:

On 13/10/2023 17.30, Paolo Bonzini wrote:

Install dtc as it is now a mandatory external dependency in order to build QEMU.

Signed-off-by: Paolo Bonzini 
---
   tests/vm/netbsd | 3 +++
   1 file changed, 3 insertions(+)

diff --git a/tests/vm/netbsd b/tests/vm/netbsd
index 939dc1b22a1..3ef1ec2d9cc 100755
--- a/tests/vm/netbsd
+++ b/tests/vm/netbsd
@@ -40,6 +40,9 @@ class NetBSDVM(basevm.BaseVM):
   "gsed",
   "gettext-tools",
+# libs: basic
+"dtc",
+
   # libs: crypto
   "gnutls",


Does this work for you? When I run "make vm-build-netbsd", I'm still getting
a failure:

76 packages to install:
   git-base-2.41.0nb1 pkgconf-1.9.5 xz-5.4.3 python310-3.10.12
py310-expat-3.10.12nb1 ninja-build-1.11.1
   bash-5.2.15 gmake-4.4.1 gsed-4.9nb1 gettext-tools-0.21.1 dtc-1.7.0
gnutls-3.8.0nb3 jpeg-9e png-1.6.39
   capstone-4.0.2 SDL2-2.26.5nb1 gtk3+-3.24.38 zstd-1.5.5 libslirp-4.7.0nb1
pcre2-10.42 curl-8.1.2
   libuuid-2.32.1nb1 libffi-3.4.4 gettext-lib-0.21.1 p11-kit-0.24.1 nettle-3.9.1
   mozilla-rootcerts-1.0.20230505 libtasn1-4.19.0 libcfg+-0.7.0 gmp-6.2.1nb3
wayland-protocols-1.31nb1
   wayland-1.21.0nb2 libxkbcommon-1.5.0nb1 libsamplerate-0.2.2nb4
shared-mime-info-2.2nb2 pango-1.50.12nb1
   libcups-2.4.6nb1 libXft-2.3.8 hicolor-icon-theme-0.17nb1 glib2-2.74.6nb1
gdk-pixbuf2-2.42.10nb2
   fribidi-1.0.13 freetype2-2.13.0nb1 fontconfig-2.14.2nb1
cairo-gobject-1.16.0nb7 cairo-1.16.0nb9 atk-2.38.0
   at-spi2-atk-2.38.0nb1 lz4-1.9.4 nghttp2-1.54.0 libidn2-2.3.4
readline-8.2nb2 libsndfile-1.2.0nb2
   fftw-3.3.10nb1 libxslt-1.1.38 libepoll-shim-0.0.20230411
at-spi2-core-2.40.3nb2 lzo-2.10 brotli-1.0.9
   tiff-4.5.1nb1 libpaper-2.1.0nb2 dbus-1.14.6 harfbuzz-7.3.0
graphite2-1.3.14nb1 libunistring-1.1
   libxml2-2.10.4nb1 libgcrypt-1.10.2 mpg123-1.31.3 libvorbis-1.3.7
libopus-1.4 libogg-1.3.5nb1 lame-3.100nb5
   flac-1.4.2 jbigkit-2.1nb1 xmlcatmgr-2.2nb1 libgpg-error-1.47
[...]
installing dtc-1.7.0...
[...]
The Meson build system
Version: 0.63.3
Source dir: /home/qemu/qemu-test.Li0spd/src
Build dir: /home/qemu/qemu-test.Li0spd/build
Build type: native build
Project name: qemu
Project version: 8.1.50
C compiler for the host machine: cc -m64 -mcx16 (gcc 7.5.0 "cc (nb4
20200810) 7.5.0")
C linker for the host machine: cc -m64 -mcx16 ld.bfd 2.31.1
[...]
Run-time dependency capstone found: YES 4.0.2
Library fdt found: NO
Initialized empty Git repository in
/home/qemu/qemu-test.Li0spd/src/subprojects/dtc/.git/
fatal: unable to access 'https://gitlab.com/qemu-project/dtc.git/': SSL
certificate problem: unable to get local issuer certificate

../src/meson.build:3076:4: ERROR: Git command failed: ['/usr/pkg/bin/git',
'fetch', '--depth', '1', 'origin',
'b6910bec11614980a21e46fbccc35934b671bd81']

A full log can be found at
/home/qemu/qemu-test.Li0spd/build/meson-logs/meson-log.txt

ERROR: meson setup failed

... so though the NetBSD people finally upgraded their dtc to a usable
level, our meson.build seems to be unable to detect it?


They claim to have version 1.7.0

   https://ftp.netbsd.org/pub/pkgsrc/current/pkgsrc/sysutils/dtc/index.html

and we claim to want 1.5.0, so should be OK.

Suggests that our detection, or test compilation is failing. The
meson-log.txt might have more info, if you can access that ?


Looks like libfdt is installed there in an unusual location?
I can make it work with this patch on top:

diff --git a/tests/vm/netbsd b/tests/vm/netbsd
index fdf8064cef..2ccc7f2cdd 100755
--- a/tests/vm/netbsd
+++ b/tests/vm/netbsd
@@ -69,8 +69,9 @@ class NetBSDVM(basevm.BaseVM):
 cd $(mktemp -d /home/qemu/qemu-test.XX);
 mkdir src build; cd src;
 tar -xf /dev/rld1a;
-cd ../build
-../src/configure --disable-opengl {configure_opts};
+cd ../build;
+../src/configure --disable-opengl --extra-ldflags=-L/usr/pkg/lib \
+ --extra-cflags=-I/usr/pkg/include {configure_opts};
 gmake --output-sync -j{jobs} {target} {verbose};
 """
 poweroff = "/sbin/poweroff"

Could you add that to your patch, Paolo?

 Thanks,
  Thomas

Re: [PATCH 04/17] configure: clean up handling of CFI option

2023-10-16 Thread Philippe Mathieu-Daudé


On 16/10/23 08:31, Paolo Bonzini wrote:

Avoid that --enable-cfi --disable-cfi leaves b_lto set to true.

Signed-off-by: Paolo Bonzini 
---
  configure | 7 +++
  1 file changed, 3 insertions(+), 4 deletions(-)




@@ -1845,6 +1843,7 @@ if test "$skip_meson" = no; then
  
# QEMU options

test "$cfi" != false && meson_option_add "-Dcfi=$cfi"
+  test "$cfi" != false && meson_option_add "-Db_lto=$cfi"


Merge as "-Dcfi=$cfi -Db_lto=$cfi"?


test "$docs" != auto && meson_option_add "-Ddocs=$docs"
test -n "${LIB_FUZZING_ENGINE+xxx}" && meson_option_add "-Dfuzzing_engine=$LIB_FUZZING_ENGINE"
test "$plugins" = yes && meson_option_add "-Dplugins=true"

Re: [PATCH 08/17] configure, tests/tcg: simplify GDB conditionals

2023-10-16 Thread Manos Pitsidianakis


On Mon, 16 Oct 2023 09:31, Paolo Bonzini  wrote:

Unify HAVE_GDB_BIN (currently in config-host.mak) and
HOST_GDB_SUPPORTS_ARCH into a single GDB variable in
config-target.mak.

Signed-off-by: Paolo Bonzini 


Reviewed-by: Emmanouil Pitsidianakis

[PATCH] contrib/plugins: Close file descriptor on connect failure

2023-10-16 Thread Cong Liu

This patch closes the file descriptor fd on connect failure to avoid
resource leak.

Signed-off-by: Cong Liu 
---
 contrib/plugins/lockstep.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/contrib/plugins/lockstep.c b/contrib/plugins/lockstep.c
index f0cb8792c6fa..3c0f2b485181 100644
--- a/contrib/plugins/lockstep.c
+++ b/contrib/plugins/lockstep.c
@@ -303,6 +303,7 @@ static bool connect_socket(const char *path)
 sockaddr.sun_family = AF_UNIX;
 if (g_strlcpy(sockaddr.sun_path, path, pathlen) >= pathlen) {
 perror("bad path");
+close(fd);
 return false;
 }
 
-- 
2.34.1

Re: [PATCH 11/17] configure: remove some dead cruft

2023-10-16 Thread Thomas Huth


On 16/10/2023 08.31, Paolo Bonzini wrote:

print_error is only invoked in one place, and $git is unused.

Signed-off-by: Paolo Bonzini 
---
  configure | 7 +--
  1 file changed, 1 insertion(+), 6 deletions(-)



Reviewed-by: Thomas Huth

Re: [PATCH] hw/pci: modify pci_setup_iommu() to set PCIIOMMUOps

2023-10-16 Thread Cédric Le Goater


On 10/16/23 07:03, Philippe Mathieu-Daudé wrote:

Hi Cédric, Liu, Joao,

On 13/10/23 16:56, Cédric Le Goater wrote:

From: Liu Yi L 

This patch modifies pci_setup_iommu() to set PCIIOMMUOps
instead of setting PCIIOMMUFunc. PCIIOMMUFunc is used to
get an address space for a PCI device in vendor specific
way. The PCIIOMMUOps still offers this functionality. But
using PCIIOMMUOps leaves space to add more iommu related
vendor specific operations.

Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Peter Xu 
Cc: Eric Auger 
Cc: Yi Sun 
Cc: David Gibson 
Cc: "Michael S. Tsirkin" 
Cc: Eric Auger 
Cc: Peter Maydell 
Cc: Paolo Bonzini 
Cc: Peter Xu 
Cc: Jason Wang 
Cc: Andrey Smirnov 
Cc: Helge Deller 
Cc: "Hervé Poussineau" 
Cc: Mark Cave-Ayland 
Cc: BALATON Zoltan 
Cc: Elena Ufimtseva 
Cc: Jagannathan Raman 
Cc: Matthew Rosato 
Cc: Eric Farman 
Cc: Halil Pasic 
Cc: Christian Borntraeger 
Cc: Thomas Huth 
Reviewed-by: David Gibson 
Reviewed-by: Peter Xu 
Signed-off-by: Liu Yi L 
[ clg: - refreshed on latest QEMU
    - included hw/remote/iommu.c   ]
Signed-off-by: Cédric Le Goater 
---


  Hello,

  Initially sent by Yi Liu as part of series "intel_iommu: expose
  Shared Virtual Addressing to VMs" [1], this patch would also simplify
  the changes Joao wants to introduce in "vfio: VFIO migration support
  with vIOMMU" [2].

  Does anyone have objections?

  Thanks,

  C.

  [1] https://lore.kernel.org/qemu-devel/20210302203827.437645-5-yi.l@intel.com/
  [2] https://lore.kernel.org/qemu-devel/20230622214845.3980-1-joao.m.mart...@oracle.com/



  include/hw/pci/pci.h |  8 ++--
  include/hw/pci/pci_bus.h |  2 +-
  hw/alpha/typhoon.c   |  6 +-
  hw/arm/smmu-common.c |  6 +-
  hw/i386/amd_iommu.c  |  6 +-
  hw/i386/intel_iommu.c    |  6 +-
  hw/pci-host/designware.c |  6 +-
  hw/pci-host/dino.c   |  6 +-
  hw/pci-host/pnv_phb3.c   |  6 +-
  hw/pci-host/pnv_phb4.c   |  6 +-
  hw/pci-host/ppce500.c    |  6 +-
  hw/pci-host/raven.c  |  6 +-
  hw/pci-host/sabre.c  |  6 +-
  hw/pci/pci.c | 18 +-
  hw/ppc/ppc440_pcix.c |  6 +-
  hw/ppc/spapr_pci.c   |  6 +-
  hw/remote/iommu.c    |  6 +-
  hw/s390x/s390-pci-bus.c  |  8 ++--
  hw/virtio/virtio-iommu.c |  6 +-
  19 files changed, 101 insertions(+), 25 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index b70a0b95ff5ae367ed7f98483ec8d1d1b6274530..486e54174b1755995328f2352fd4571d01e107dc 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -366,10 +366,14 @@ void pci_bus_get_w64_range(PCIBus *bus, Range *range);
  void pci_device_deassert_intx(PCIDevice *dev);
-typedef AddressSpace *(*PCIIOMMUFunc)(PCIBus *, void *, int);
+typedef struct PCIIOMMUOps PCIIOMMUOps;


Preferably:

typedef ...


+struct PCIIOMMUOps {


yes.


    /* documentation ... */


+    AddressSpace * (*get_address_space)(PCIBus *bus,
+    void *opaque, int32_t devfn);
+};


... PCIIOMMUOps;

Should this be PciIommuOps?


I think this is one of the exceptions to the QEMU coding style and
the other PCI types keep a capital PCI, PCIIORegion, PCIINTxRoute,
PCIEAERErr, etc.

Let's be consistent with the existing naming scheme, PCIIOMMUOps.


Do we need 'int32_t' for devfn or 'int' is enough?


int is enough.


Would "lookup_address_space" be clearer?


The calling routine is pci_device_iommu_address_space(). Let's keep
get_address_space() for now.


  AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
-void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque);


Since the prototype is modified, we can take the opportunity to
document it :)


OK. That would be the first documentation entry in pci.h. I guess it
won't do any harm but will it be collected in the documentation under
"Internal QEMU APIs" ?
 

+void pci_setup_iommu(PCIBus *bus, const PCIIOMMUOps *iommu_ops, void *opaque);


Otherwise the change makes sense.



Thanks,

C.

Re: [PATCH v2 03/16] target/arm: Move internal declarations from 'cpu-qom.h' to 'cpu.h'

2023-10-16 Thread Philippe Mathieu-Daudé


On 13/10/23 16:27, Richard Henderson wrote:

On 10/13/23 07:01, Philippe Mathieu-Daudé wrote:

These definitions and declarations are only used by
target/arm/, no need to expose them to generic hw/.

Signed-off-by: Philippe Mathieu-Daudé 
---
  target/arm/cpu-qom.h | 28 
  target/arm/cpu.h | 28 
  2 files changed, 28 insertions(+), 28 deletions(-)




diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index fb1b08371c..06f92dacb9 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1116,11 +1116,39 @@ struct ArchCPU {
  uint64_t gt_cntfrq_hz;
  };
+/* Callback functions for the generic timer's timers. */
+void arm_gt_ptimer_cb(void *opaque);
+void arm_gt_vtimer_cb(void *opaque);
+void arm_gt_htimer_cb(void *opaque);
+void arm_gt_stimer_cb(void *opaque);
+void arm_gt_hvtimer_cb(void *opaque);
+
  unsigned int gt_cntfrq_period_ns(ARMCPU *cpu);
  void gt_rme_post_el_change(ARMCPU *cpu, void *opaque);
  void arm_cpu_post_init(Object *obj);
+void arm_cpu_register(const ARMCPUInfo *info);
+void aarch64_cpu_register(const ARMCPUInfo *info);
+
+void register_cp_regs_for_features(ARMCPU *cpu);
+void init_cpreg_list(ARMCPU *cpu);


These can go to internals.h.


OK, I'm squashing:

-- >8 --
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index ad2f32efd5..2bd8aaff3d 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1153,12 +1153,6 @@ void gt_rme_post_el_change(ARMCPU *cpu, void *opaque);


 void arm_cpu_post_init(Object *obj);

-void arm_cpu_register(const ARMCPUInfo *info);
-void aarch64_cpu_register(const ARMCPUInfo *info);
-
-void register_cp_regs_for_features(ARMCPU *cpu);
-void init_cpreg_list(ARMCPU *cpu);
-
 #define ARM_AFF0_SHIFT 0
 #define ARM_AFF0_MASK  (0xFFULL << ARM_AFF0_SHIFT)
 #define ARM_AFF1_SHIFT 8
diff --git a/target/arm/internals.h b/target/arm/internals.h
index 1dd9182a54..cfd64145ea 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -182,6 +182,12 @@ static inline int r14_bank_number(int mode)
 return (mode == ARM_CPU_MODE_HYP) ? BANK_USRSYS : bank_number(mode);
 }

+void arm_cpu_register(const ARMCPUInfo *info);
+void aarch64_cpu_register(const ARMCPUInfo *info);
+
+void register_cp_regs_for_features(ARMCPU *cpu);
+void init_cpreg_list(ARMCPU *cpu);
+
 void arm_cpu_register_gdb_regs_for_features(ARMCPU *cpu);
 void arm_translate_init(void);

---



Otherwise,
Reviewed-by: Richard Henderson 


Thanks!

Re: [PATCH] tests/vm: netbsd: install dtc

2023-10-16 Thread Paolo Bonzini

On Mon, Oct 16, 2023 at 11:21 AM Thomas Huth  wrote:
>
> On 16/10/2023 11.06, Daniel P. Berrangé wrote:
> > On Mon, Oct 16, 2023 at 11:00:14AM +0200, Thomas Huth wrote:
> >> On 13/10/2023 17.30, Paolo Bonzini wrote:
> >>> Install dtc as it is now a mandatory external dependency in order to 
> >>> build QEMU.
> >>>
> >>> Signed-off-by: Paolo Bonzini 
> >>> ---
> >>>tests/vm/netbsd | 3 +++
> >>>1 file changed, 3 insertions(+)
> >>>
> >>> diff --git a/tests/vm/netbsd b/tests/vm/netbsd
> >>> index 939dc1b22a1..3ef1ec2d9cc 100755
> >>> --- a/tests/vm/netbsd
> >>> +++ b/tests/vm/netbsd
> >>> @@ -40,6 +40,9 @@ class NetBSDVM(basevm.BaseVM):
> >>>"gsed",
> >>>"gettext-tools",
> >>> +# libs: basic
> >>> +"dtc",
> >>> +
> >>># libs: crypto
> >>>"gnutls",
> >>
> >> Does this work for you? When I run "make vm-build-netbsd", I'm still 
> >> getting
> >> a failure:
> >>
> >> 76 packages to install:
> >>git-base-2.41.0nb1 pkgconf-1.9.5 xz-5.4.3 python310-3.10.12
> >> py310-expat-3.10.12nb1 ninja-build-1.11.1
> >>bash-5.2.15 gmake-4.4.1 gsed-4.9nb1 gettext-tools-0.21.1 dtc-1.7.0
> >> gnutls-3.8.0nb3 jpeg-9e png-1.6.39
> >>capstone-4.0.2 SDL2-2.26.5nb1 gtk3+-3.24.38 zstd-1.5.5 libslirp-4.7.0nb1
> >> pcre2-10.42 curl-8.1.2
> >>libuuid-2.32.1nb1 libffi-3.4.4 gettext-lib-0.21.1 p11-kit-0.24.1 
> >> nettle-3.9.1
> >>mozilla-rootcerts-1.0.20230505 libtasn1-4.19.0 libcfg+-0.7.0 
> >> gmp-6.2.1nb3
> >> wayland-protocols-1.31nb1
> >>wayland-1.21.0nb2 libxkbcommon-1.5.0nb1 libsamplerate-0.2.2nb4
> >> shared-mime-info-2.2nb2 pango-1.50.12nb1
> >>libcups-2.4.6nb1 libXft-2.3.8 hicolor-icon-theme-0.17nb1 glib2-2.74.6nb1
> >> gdk-pixbuf2-2.42.10nb2
> >>fribidi-1.0.13 freetype2-2.13.0nb1 fontconfig-2.14.2nb1
> >> cairo-gobject-1.16.0nb7 cairo-1.16.0nb9 atk-2.38.0
> >>at-spi2-atk-2.38.0nb1 lz4-1.9.4 nghttp2-1.54.0 libidn2-2.3.4
> >> readline-8.2nb2 libsndfile-1.2.0nb2
> >>fftw-3.3.10nb1 libxslt-1.1.38 libepoll-shim-0.0.20230411
> >> at-spi2-core-2.40.3nb2 lzo-2.10 brotli-1.0.9
> >>tiff-4.5.1nb1 libpaper-2.1.0nb2 dbus-1.14.6 harfbuzz-7.3.0
> >> graphite2-1.3.14nb1 libunistring-1.1
> >>libxml2-2.10.4nb1 libgcrypt-1.10.2 mpg123-1.31.3 libvorbis-1.3.7
> >> libopus-1.4 libogg-1.3.5nb1 lame-3.100nb5
> >>flac-1.4.2 jbigkit-2.1nb1 xmlcatmgr-2.2nb1 libgpg-error-1.47
> >> [...]
> >> installing dtc-1.7.0...
> >> [...]
> >> The Meson build system
> >> Version: 0.63.3
> >> Source dir: /home/qemu/qemu-test.Li0spd/src
> >> Build dir: /home/qemu/qemu-test.Li0spd/build
> >> Build type: native build
> >> Project name: qemu
> >> Project version: 8.1.50
> >> C compiler for the host machine: cc -m64 -mcx16 (gcc 7.5.0 "cc (nb4
> >> 20200810) 7.5.0")
> >> C linker for the host machine: cc -m64 -mcx16 ld.bfd 2.31.1
> >> [...]
> >> Run-time dependency capstone found: YES 4.0.2
> >> Library fdt found: NO
> >> Initialized empty Git repository in
> >> /home/qemu/qemu-test.Li0spd/src/subprojects/dtc/.git/
> >> fatal: unable to access 'https://gitlab.com/qemu-project/dtc.git/': SSL
> >> certificate problem: unable to get local issuer certificate
> >>
> >> ../src/meson.build:3076:4: ERROR: Git command failed: ['/usr/pkg/bin/git',
> >> 'fetch', '--depth', '1', 'origin',
> >> 'b6910bec11614980a21e46fbccc35934b671bd81']
> >>
> >> A full log can be found at
> >> /home/qemu/qemu-test.Li0spd/build/meson-logs/meson-log.txt
> >>
> >> ERROR: meson setup failed
> >>
> >> ... so though the NetBSD people finally upgraded their dtc to a usable
> >> level, our meson.build seems to be unable to detect it?
> >
> > They claim to have version 1.7.0
> >
> >https://ftp.netbsd.org/pub/pkgsrc/current/pkgsrc/sysutils/dtc/index.html
> >
> > and we claim to want 1.5.0, so should be OK.
> >
> > Suggests that our detection, or test compilation is failing. The
> > meson-log.txt might have more info, if you can access that ?
>
> Looks like libfdt is installed there in an unusual location?

Indeed, and it looks like it's intentional; from https://pkgin.net/,
for example:

> Invoke the configure script, for example:
>
> $ ./configure --prefix=/usr/pkg --with-libraries=/usr/pkg/lib 
> --with-includes=/usr/pkg/include
>
> And finally build the binary:
>
> $ make

Paolo

[PATCH] tests/vm: avoid invalid escape in Python string

2023-10-16 Thread Paolo Bonzini

This is an error in Python 3.12; fix it by using a raw string literal
or by double-escaping the backslash.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Paolo Bonzini 
---
 tests/vm/basevm.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tests/vm/basevm.py b/tests/vm/basevm.py
index a97e23b0ce0..6e31698906b 100644
--- a/tests/vm/basevm.py
+++ b/tests/vm/basevm.py
@@ -331,8 +331,8 @@ def console_init(self, timeout = None):
 def console_log(self, text):
 for line in re.split("[\r\n]", text):
 # filter out terminal escape sequences
-line = re.sub("\x1b\[[0-9;?]*[a-zA-Z]", "", line)
-line = re.sub("\x1b\([0-9;?]*[a-zA-Z]", "", line)
+line = re.sub("\x1b\\[[0-9;?]*[a-zA-Z]", "", line)
+line = re.sub("\x1b\\([0-9;?]*[a-zA-Z]", "", line)
 # replace unprintable chars
 line = re.sub("\x1b", "", line)
 line = re.sub("[\x00-\x1f]", ".", line)
@@ -530,7 +530,7 @@ def get_qemu_version(qemu_path):
and return the major number."""
 output = subprocess.check_output([qemu_path, '--version'])
 version_line = output.decode("utf-8")
-version_num = re.split(' |\(', version_line)[3].split('.')[0]
+version_num = re.split(r' |\(', version_line)[3].split('.')[0]
 return int(version_num)
 
 def parse_config(config, args):
-- 
2.41.0
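
For illustration only, the two spellings used in the patch above can be
sketched as follows. The helper names and the sample version string are
assumptions made for this example, not code from basevm.py. A doubled
backslash is used where the literal also contains "\x1b", and a raw string
is used where the pattern has no other escapes.

```python
import re


def strip_terminal_escapes(line):
    # "\x1b" is a valid Python escape (ESC). "\\[" puts the two characters
    # backslash and "[" into the pattern, which the regex engine then reads
    # as a literal "[".
    line = re.sub("\x1b\\[[0-9;?]*[a-zA-Z]", "", line)
    line = re.sub("\x1b\\([0-9;?]*[a-zA-Z]", "", line)
    return line


def major_version(version_line):
    # A raw string keeps the backslash as-is, so r'\(' and '\\(' are the
    # same two-character pattern.
    return int(re.split(r' |\(', version_line)[3].split('.')[0])


if __name__ == "__main__":
    print(strip_terminal_escapes("\x1b[31mred\x1b[0m"))
    # The version line below is a made-up sample in the usual layout.
    print(major_version("QEMU emulator version 8.1.50 (v8.1.0)"))
```

Both forms produce the same pattern text, so the choice is mostly about
readability.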

[PATCH] target/hexagon: avoid invalid escape in Python string

2023-10-16 Thread Paolo Bonzini

This is an error in Python 3.12; fix it by using a raw string literal.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Paolo Bonzini 
---
 target/hexagon/hex_common.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/hexagon/hex_common.py b/target/hexagon/hex_common.py
index dce1b852a7b..0da65d6dd6a 100755
--- a/target/hexagon/hex_common.py
+++ b/target/hexagon/hex_common.py
@@ -45,7 +45,7 @@ def uniquify(seq):
 immre = re.compile(r"[#]([rRsSuUm])(\d+)(?:[:](\d+))?")
 reg_or_immre = re.compile(
 r"(((?

[PATCH] tracetool: avoid invalid escape in Python string

2023-10-16 Thread Paolo Bonzini

This is an error in Python 3.12; fix it by using a raw string literal.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Paolo Bonzini 
---
 scripts/tracetool/__init__.py| 14 +++---
 scripts/tracetool/format/log_stap.py |  2 +-
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/scripts/tracetool/__init__.py b/scripts/tracetool/__init__.py
index 33cf85e2b04..b29594d75e0 100644
--- a/scripts/tracetool/__init__.py
+++ b/scripts/tracetool/__init__.py
@@ -210,12 +210,12 @@ class Event(object):
 
 """
 
-_CRE = re.compile("((?P[\w\s]+)\s+)?"
-  "(?P\w+)"
-  "\((?P[^)]*)\)"
-  "\s*"
-  "(?:(?:(?P\".+),)?\s*(?P\".+))?"
-  "\s*")
+_CRE = re.compile(r"((?P[\w\s]+)\s+)?"
+  r"(?P\w+)"
+  r"\((?P[^)]*)\)"
+  r"\s*"
+  r"(?:(?:(?P\".+),)?\s*(?P\".+))?"
+  r"\s*")
 
 _VALID_PROPS = set(["disable", "vcpu"])
 
@@ -326,7 +326,7 @@ def __repr__(self):
   fmt)
 # Star matching on PRI is dangerous as one might have multiple
 # arguments with that format, hence the non-greedy version of it.
-_FMT = re.compile("(%[\d\.]*\w+|%.*?PRI\S+)")
+_FMT = re.compile(r"(%[\d\.]*\w+|%.*?PRI\S+)")
 
 def formats(self):
 """List conversion specifiers in the argument print format string."""
diff --git a/scripts/tracetool/format/log_stap.py b/scripts/tracetool/format/log_stap.py
index 0b6549d534a..b49afababd6 100644
--- a/scripts/tracetool/format/log_stap.py
+++ b/scripts/tracetool/format/log_stap.py
@@ -83,7 +83,7 @@ def c_fmt_to_stap(fmt):
 # and "%ll" is not valid at all. Similarly the size_t
 # based "%z" size qualifier is not valid. We just
 # strip all size qualifiers for sanity.
-fmt = re.sub("%(\d*)(l+|z)(x|u|d)", "%\\1\\3", "".join(bits))
+fmt = re.sub(r"%(\d*)(l+|z)(x|u|d)", r"%\1\3", "".join(bits))
 return fmt
 
 def generate(events, backend, group):
-- 
2.41.0
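
As a rough check of the log_stap.py hunk above, the sketch below applies the
same substitution to a sample format string. The function name and the
sample input are assumptions for this example. With raw strings the group
references can be written as \1 and \3 instead of \\1 and \\3.

```python
import re


def strip_size_qualifiers(fmt):
    # Group 1 keeps an optional width, group 2 (l, ll, z) is dropped,
    # group 3 keeps the conversion character.
    return re.sub(r"%(\d*)(l+|z)(x|u|d)", r"%\1\3", fmt)


if __name__ == "__main__":
    print(strip_size_qualifiers("%lu %08llx %zd"))
```

The raw replacement string r"%\1\3" is the same text as the previous
"%\\1\\3", so behaviour should not change.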

[PATCH] tests/avocado: avoid invalid escape in Python string

2023-10-16 Thread Paolo Bonzini

This is an error in Python 3.12; fix it by using a raw string literal.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Paolo Bonzini 
---
 tests/avocado/virtio_check_params.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/avocado/virtio_check_params.py b/tests/avocado/virtio_check_params.py
index 4093da8a674..0b1e99fc24b 100644
--- a/tests/avocado/virtio_check_params.py
+++ b/tests/avocado/virtio_check_params.py
@@ -43,7 +43,7 @@
 class VirtioMaxSegSettingsCheck(QemuSystemTest):
 @staticmethod
 def make_pattern(props):
-pattern_items = ['{0} = \w+'.format(prop) for prop in props]
+pattern_items = [r'{0} = \w+'.format(prop) for prop in props]
 return '|'.join(pattern_items)
 
 def query_virtqueue(self, vm, dev_type_name):
-- 
2.41.0

[PATCH] docs/sphinx: avoid invalid escape in Python string

2023-10-16 Thread Paolo Bonzini

This is an error in Python 3.12; fix it by using a raw string literal.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Paolo Bonzini 
---
 docs/sphinx/hxtool.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/sphinx/hxtool.py b/docs/sphinx/hxtool.py
index fb0649a3d5b..9f6b9d87dcc 100644
--- a/docs/sphinx/hxtool.py
+++ b/docs/sphinx/hxtool.py
@@ -49,7 +49,7 @@ def serror(file, lnum, errtext):
 
 def parse_directive(line):
 """Return first word of line, if any"""
-return re.split('\W', line)[0]
+return re.split(r'\W', line)[0]
 
 def parse_defheading(file, lnum, line):
 """Handle a DEFHEADING directive"""
-- 
2.41.0

Re: [PATCH 0/3] hw/pci-host/sh_pcic: Style cleanup

2023-10-16 Thread Philippe Mathieu-Daudé


On 12/10/23 06:12, Philippe Mathieu-Daudé wrote:


Philippe Mathieu-Daudé (3):
   hw/pci-host/sh_pcic: Declare CPU QOM types using DEFINE_TYPES() macro
   hw/pci-host/sh_pcic: Correct PCI host / devfn#0 function names
   hw/pci-host/sh_pcic: Replace magic value by proper definition


Series queued, thanks.

Re: [PATCH 04/17] configure: clean up handling of CFI option

2023-10-16 Thread Paolo Bonzini


On 10/16/23 11:22, Philippe Mathieu-Daudé wrote:

On 16/10/23 08:31, Paolo Bonzini wrote:

Avoid that --enable-cfi --disable-cfi leaves b_lto set to true.

Signed-off-by: Paolo Bonzini 
---
  configure | 7 +++
  1 file changed, 3 insertions(+), 4 deletions(-)




@@ -1845,6 +1843,7 @@ if test "$skip_meson" = no; then
    # QEMU options
    test "$cfi" != false && meson_option_add "-Dcfi=$cfi"
+  test "$cfi" != false && meson_option_add "-Db_lto=$cfi"


Merge as "-Dcfi=$cfi -Db_lto=$cfi"?


Sure, it also needs a little change to meson_option_add though:

diff --git a/configure b/configure
index 3da46ed202d..fd88ef3fec2 100755
--- a/configure
+++ b/configure
@@ -624,7 +624,10 @@ meson_option_build_array() {
 
 meson_options=

 meson_option_add() {
-  meson_options="$meson_options $(quote_sh "$1")"
+  local arg
+  for arg; do
+meson_options="$meson_options $(quote_sh "$arg")"
+  done
 }
 meson_option_parse() {
   meson_options="$meson_options $(_meson_option_parse "$@")"
@@ -1842,8 +1845,7 @@ if test "$skip_meson" = no; then
   test "$werror" = yes && meson_option_add -Dwerror=true
 
   # QEMU options

-  test "$cfi" != false && meson_option_add "-Dcfi=$cfi"
-  test "$cfi" != false && meson_option_add "-Db_lto=$cfi"
+  test "$cfi" != false && meson_option_add "-Dcfi=$cfi" "-Db_lto=$cfi"
   test "$docs" != auto && meson_option_add "-Ddocs=$docs"
   test -n "${LIB_FUZZING_ENGINE+xxx}" && meson_option_add "-Dfuzzing_engine=$LIB_FUZZING_ENGINE"
   test "$plugins" = yes && meson_option_add "-Dplugins=true"

Ok to squash that in?

Paolo




    test "$docs" != auto && meson_option_add "-Ddocs=$docs"
    test -n "${LIB_FUZZING_ENGINE+xxx}" && meson_option_add "-Dfuzzing_engine=$LIB_FUZZING_ENGINE"

    test "$plugins" = yes && meson_option_add "-Dplugins=true"

Re: [PATCH 03/17] meson, cutils: allow non-relocatable installs

2023-10-16 Thread Paolo Bonzini


On 10/16/23 11:08, Manos Pitsidianakis wrote:

On Mon, 16 Oct 2023 09:31, Paolo Bonzini  wrote:

diff --git a/meson.build b/meson.build
index 010d2c649c2..251838f2609 100644
--- a/meson.build
+++ b/meson.build
@@ -2111,6 +2111,7 @@ config_host_data.set('CONFIG_OPENGL', opengl.found())

config_host_data.set('CONFIG_PLUGIN', get_option('plugins'))
config_host_data.set('CONFIG_RBD', rbd.found())
config_host_data.set('CONFIG_RDMA', rdma.found())
+config_host_data.set('CONFIG_RELOCATABLE', get_option('relocatable'))
config_host_data.set('CONFIG_SAFESTACK', get_option('safe_stack'))
config_host_data.set('CONFIG_SDL', sdl.found())
config_host_data.set('CONFIG_SDL_IMAGE', sdl_image.found())


Is relocatable a good choice here? The term is used in linking and might
be confusing (when I read the subject that's what I thought it'd be
about). How about 'movable'?


I think it's a relatively common usage.  Google finds many uses for RPM 
but also in a lot of random forums (CMake, Julia, FreeBSD).  See also 
https://nehckl0.medium.com/creating-relocatable-linux-executables-by-setting-rpath-with-origin-45de573a2e98 
or 
https://www.gnu.org/software/gnulib/manual/html_node/Supporting-Relocation.html.


Paolo

[PATCH] scripts: Mark feature_to_c.py as non-executable to fix a build issue

2023-10-16 Thread Thomas Huth

Meson tries to run scripts via the shebang line if the files are
marked as executable. If "python3" is not in the $PATH, or if it
is a version that is too old, then the script execution fails.
We should make sure to run scripts via the python3 interpreter
that is used for Meson itself. For this, the files need to be marked
as non-executable, then meson will use the python3 binary that has
been used to run itself.

Fixes: 956af7daad ("gdbstub: Introduce GDBFeature structure")
Signed-off-by: Thomas Huth 
---
 scripts/feature_to_c.py | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 mode change 100755 => 100644 scripts/feature_to_c.py

diff --git a/scripts/feature_to_c.py b/scripts/feature_to_c.py
old mode 100755
new mode 100644
-- 
2.41.0
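
The mode change above (100755 to 100644) only clears the execute bits. A
minimal sketch of checking and clearing them from Python follows. The helper
names are hypothetical and this is not part of the patch; it assumes a POSIX
file system.

```python
import os
import stat

EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def has_exec_bit(path):
    # True if any of the user/group/other execute bits is set.
    return bool(os.stat(path).st_mode & EXEC_BITS)


def clear_exec_bits(path):
    # Keep the permission bits but drop every execute bit (0755 -> 0644).
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~EXEC_BITS)
```

A check like has_exec_bit() could be run over scripts/ to spot other files
that would be launched through their shebang line.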

RE: [PATCH V6 0/9] Add architecture agnostic code to support vCPU Hotplug

2023-10-16 Thread Salil Mehta via

Hi Miguel,

> From: Miguel Luis 
> Sent: Friday, October 13, 2023 5:34 PM
> To: Salil Mehta 
> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; Marc Zyngier
> ; jean-phili...@linaro.org; Jonathan Cameron
> ; lpieral...@kernel.org; Peter Maydell
> ; Richard Henderson
> ; imamm...@redhat.com;
> andrew.jo...@linux.dev; da...@redhat.com; phi...@linaro.org;
> eric.au...@redhat.com; oliver.up...@linux.dev; pbonz...@redhat.com;
> m...@redhat.com; w...@kernel.org; gs...@redhat.com; raf...@kernel.org;
> alex.ben...@linaro.org; li...@armlinux.org.uk;
> dar...@os.amperecomputing.com; il...@os.amperecomputing.com;
> vis...@os.amperecomputing.com; Karl Heubaum ;
> salil.me...@opnsrc.net; zhukeqian ; wangxiongfeng
> (C) ; wangyanan (Y) ;
> jiakern...@gmail.com; maob...@loongson.cn; lixiang...@loongson.cn; Linuxarm
> 
> Subject: Re: [PATCH V6 0/9] Add architecture agnostic code to support vCPU
> Hotplug
> 
> Hi Salil,
> 
> > On 13 Oct 2023, at 10:51, Salil Mehta  wrote:
> >
> > Virtual CPU hotplug support is being added across various
> architectures[1][3].
> > This series adds various code bits common across all architectures:


[...]


> I tested it for Arm64, make check, boot/reboot, live migration and found no
> issues,
> so for this, please feel free to add:
> 
> Tested-by: Miguel Luis 

Great. Many thanks for the confirmation. 

I guess you are repeating the same for x86 as well?

Salil.

Re: Performance Issue with CXL-emulation

2023-10-16 Thread Jonathan Cameron via

On Sun, 15 Oct 2023 10:39:46 -0700
lokesh jaliminche  wrote:

> Hi Everyone,
> 
> I am facing performance issues while copying data to the CXL device
> (Emulated with QEMU). I get approximately 500KB/Sec. Any suggestion on how
> to improve this?

Hi Lokesh,

The focus so far of QEMU emulation of CXL devices has been on functionality.
I'm in favour of work to improve on that, but it isn't likely to be my focus
- can offer some pointers on where to look though!

The fundamental problem (probably) is that address decoding in CXL for
interleaving is at a sub-page granularity. That means we can't use page
tables to perform the address lookups in hardware. Note this also has the
side effect that KVM won't work if there is any chance that you will run
instructions out of the CXL memory - it's fine if you are interested in data
only (DAX etc). (I've had a note in my todo list to add a warning message
about the KVM limitations for a while).
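
To make the sub-page point concrete, here is a simplified model of
interleave decoding. It is an assumption-laden sketch: it uses plain modulo
arithmetic over a single window, whereas real CXL decoders work on address
bit fields and support more configurations. The function names are made up
for this example.

```python
def decode(hpa_offset, ways, granularity):
    """Map an offset into a window to (target index, device-local offset)."""
    chunk = hpa_offset // granularity
    target = chunk % ways
    dpa = (chunk // ways) * granularity + hpa_offset % granularity
    return target, dpa


def targets_in_page(page_offset, ways, granularity, page_size=4096):
    """Set of targets touched by one page-aligned host page."""
    step = min(granularity, page_size)
    return {decode(off, ways, granularity)[0]
            for off in range(page_offset, page_offset + page_size, step)}


if __name__ == "__main__":
    # 2-way interleave at 256 bytes: one 4k page spans both targets, so no
    # single page table entry can describe it.
    print(targets_in_page(0, ways=2, granularity=256))
    # 2-way interleave at 4k: each page lands on exactly one target.
    print(targets_in_page(0, ways=2, granularity=4096))
```

With the granularity at or above the page size, each host page maps to one
target, which is the case a page table mapping could in principle cover.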

There have been a few discussions (mostly when we were debugging some TCG
issues and considering KVM support) about how we 'might' be able to improve
this. That focused on a general 'fix', but there may be some lower hanging
fruit.

The options I think might work are:

1) Special case configurations where there is no interleave going on.
   I'm not entirely sure how this would fit together and it won't deal with the
   more interesting cases - if it does work I'd want it to be minimally 
invasive because
   those complex cases are the main focus of testing etc.  There is an 
extension of this
   where we handle interleave, but only if it is 4k or above (on appropriately 
configured
   host).

2) Add caching layer to the CXL fixed memory windows.  That would hold copies 
of a
   number of pages that have been accessed in a software cache and setup the 
mappings for
   the hardware page table walkers to find them. If the page isn't cached we'd 
trigger
   a pagefault and have to bring it into the cache. If the configuration of the 
interleave
   is touched, all caches would need to be written back etc. This would need to 
be optional
   because I don't want to have to add cache coherency protocols etc when we 
add shared
   memory support (fun though it would be ;) 

3) Might be worth looking at the critical paths for lookups in your 
configuration.
   Maybe we can optimize the address decoders (basically a software TLB for HPA 
to DPA).
   I've not looked at the performance of those paths.  For your example the 
lookup is
   * CFMWS - nothing to do
   * Host bridge - nothing to do beyond a sanity check on range I think.
   * Nothing to to do.
   * Type 3 device - basic range match.
   So I'm not sure it is worth while - but you could do a really simple test by 
detecting
   no interleave is going on and caching the offset needed to go HPA to DPA + a 
device reference
   for the first time cxl_cfmws_find_device() is called. 
   https://elixir.bootlin.com/qemu/latest/source/hw/cxl/cxl-host.c#L129

   Then just match on hwaddr on another call of cxl_cfmws_find_device() and
   return the device directly.  Maybe also shortcut lookups in
   cxl_type3_hpa_to_as_and_dpa() which does the endpoint decoding part. A
   quick hack would let you know if it was worth looking at something more
   general.

   Gut feeling is this last approach might get you some perf uptick but is
   not going to solve the fundamental problem that in general we can't do the
   translation in hardware (unlike most other memory accesses in QEMU).

   Note I believe all writes to file backed memory will go all the way to the
   file. So you might want to try backing it with RAM but, as with the above,
   that's not going to address the fundamental problem.
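
   The single-entry cache suggested in 3) could be sketched roughly as
   follows. This is a toy model in Python, not QEMU code: the names Decode,
   CachedDecoder and walk are hypothetical, the window base and size are
   taken from the "resource" and "size" fields of the cxl list output quoted
   below, and it assumes no interleave with the region starting at DPA 0.

```python
# Hypothetical model of the "cache the offset" quick hack; not QEMU code.
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Decode:
    base: int      # start of the HPA window
    size: int      # window length in bytes
    offset: int    # value to subtract from an HPA to get the DPA
    device: str    # stand-in for a device reference


class CachedDecoder:
    """Single-entry cache in front of a slow, full decoder walk."""

    def __init__(self, slow_lookup: Callable[[int], Optional[Decode]]):
        self.slow_lookup = slow_lookup
        self.cached: Optional[Decode] = None
        self.slow_calls = 0

    def translate(self, hpa: int) -> Optional[Tuple[str, int]]:
        c = self.cached
        if c is None or not (c.base <= hpa < c.base + c.size):
            # Miss: do the full walk once and remember the result.
            self.slow_calls += 1
            c = self.slow_lookup(hpa)
            if c is None:
                return None
            self.cached = c
        return c.device, hpa - c.offset

    def invalidate(self) -> None:
        # Would have to run whenever decoder programming changes.
        self.cached = None


def walk(hpa: int) -> Optional[Decode]:
    # Stand-in for the CFMWS -> host bridge -> root port -> type 3 walk.
    base, size = 45365592064, 268435456
    if base <= hpa < base + size:
        return Decode(base, size, base, "cxl-pmem0")
    return None


dec = CachedDecoder(walk)
print(dec.translate(45365592064 + 0x1000))
print(dec.translate(45365592064 + 0x2000), dec.slow_calls)
```

   The second translate() call is served from the cache, so slow_calls stays
   at one. Comparing timings with and without such a shortcut is the kind of
   quick test meant above; invalidate() marks where a real version would
   need to react to decoder reprogramming.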

Jonathan

> 
> Steps to reproduce :
> ===
> 1. QEMU Command:
> sudo /opt/qemu-cxl/bin/qemu-system-x86_64 \
> -hda ./images/ubuntu-22.04-server-cloudimg-amd64.img \
> -hdb ./images/user-data.img \
> -M q35,cxl=on,accel=kvm,nvdimm=on \
> -smp 16 \
> -m 16G,maxmem=32G,slots=8 \
> -object memory-backend-file,id=cxl-mem1,share=on,mem-path=/mnt/qemu_files/cxltest.raw,size=256M \
> -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/mnt/qemu_files/lsa.raw,size=256M \
> -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
> -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
> -device cxl-type3,bus=root_port13,persistent-memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0 \
> -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G \
> -nographic \
> 
> 2. Configure device with fsdax mode
> ubuntu@ubuntu:~$ cxl list
> [
>   {
> "memdevs":[
>   {
> "memdev":"mem0",
> "pmem_size":268435456,
> "serial":0,
> "host":":0d:00.0"
>   }
> ]
>   },
>   {
> "regions":[
>   {
> "region":"region0",
> "resource":45365592064,
> "size":268435456,
> "type":"pmem",
> "interleave_ways":1,
> "interleave_granularity":1024,
> "decode_state":"commit"
>

Re: -drive if=none: can't we make this the default?

2023-10-16 Thread Paolo Bonzini


On 10/14/23 21:16, Michael Tokarev wrote:

> Can't we make -drive if=none the default?
>
> Yes, I know current default is ide, and whole world have to use if=none
> explicitly to undo this.  I think at this point we can deprecate if=ide
> default and switch to if=none in the next release.  I think it will be a
> welcome change.


I think if anything we should have no default at all.  But if I had my way:

1) if=none would be deprecated (but with a much longer cycle than 1 year,
probably), and everything that uses it would have to use -blockdev.

2) -drive would be limited to a very small set of suboptions (file, cache,
if, and the ones in qemu_common_drive_opts) and anything that specifies the
driver would go through -blockdev.


Paolo

RE: [PATCH V5 4/9] hw/acpi: Init GED framework with CPU hotplug events

2023-10-16 Thread Salil Mehta via

> From: Shaoqin Huang 
> Sent: Monday, October 16, 2023 3:54 AM
> To: Salil Mehta ; qemu-devel@nongnu.org; qemu-
> a...@nongnu.org
> Cc: m...@kernel.org; jean-phili...@linaro.org; Jonathan Cameron
> ; lpieral...@kernel.org;
> peter.mayd...@linaro.org; richard.hender...@linaro.org;
> imamm...@redhat.com; andrew.jo...@linux.dev; da...@redhat.com;
> phi...@linaro.org; eric.au...@redhat.com; oliver.up...@linux.dev;
> pbonz...@redhat.com; m...@redhat.com; w...@kernel.org; gs...@redhat.com;
> raf...@kernel.org; alex.ben...@linaro.org; li...@armlinux.org.uk;
> dar...@os.amperecomputing.com; il...@os.amperecomputing.com;
> vis...@os.amperecomputing.com; karl.heub...@oracle.com;
> miguel.l...@oracle.com; salil.me...@opnsrc.net; zhukeqian
> ; wangxiongfeng (C) ;
> wangyanan (Y) ; jiakern...@gmail.com;
> maob...@loongson.cn; lixiang...@loongson.cn; Linuxarm 
> Subject: Re: [PATCH V5 4/9] hw/acpi: Init GED framework with CPU hotplug
> events
> 
> 
> 
> On 10/12/23 03:43, Salil Mehta via wrote:
> > ACPI GED (as described in the ACPI 6.2 spec) can be used to generate ACPI
> > events when OSPM/guest receives an interrupt listed in the _CRS object of
> > GED. OSPM then maps or demultiplexes the event by evaluating _EVT method.
> >
> > This change adds the support of CPU hotplug event initialization in the
> > existing GED framework.
> >
> > Co-developed-by: Keqian Zhu 
> > Signed-off-by: Keqian Zhu 
> > Signed-off-by: Salil Mehta 
> > Reviewed-by: Jonathan Cameron 
> > Reviewed-by: Gavin Shan 
> > Reviewed-by: David Hildenbrand 
> > Tested-by: Vishnu Pajjuri 
> Reviewed-by: Shaoqin Huang 

Thanks.

Please use the latest version for any further reviews.

https://lore.kernel.org/qemu-devel/4764cf47-47ca-4685-805c-bbe6310be...@oracle.com/T/#m563b7fb4690998c72cee7a41b215224e1cc53cc0

Thanks
Salil.

Re: [PATCH V6 0/9] Add architecture agnostic code to support vCPU Hotplug

2023-10-16 Thread Miguel Luis

Hi Salil,

> On 16 Oct 2023, at 09:52, Salil Mehta  wrote:
> 
> Hi Miguel,
> 
>> From: Miguel Luis 
>> Sent: Friday, October 13, 2023 5:34 PM
>> To: Salil Mehta 
>> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; Marc Zyngier
>> ; jean-phili...@linaro.org; Jonathan Cameron
>> ; lpieral...@kernel.org; Peter Maydell
>> ; Richard Henderson
>> ; imamm...@redhat.com;
>> andrew.jo...@linux.dev; da...@redhat.com; phi...@linaro.org;
>> eric.au...@redhat.com; oliver.up...@linux.dev; pbonz...@redhat.com;
>> m...@redhat.com; w...@kernel.org; gs...@redhat.com; raf...@kernel.org;
>> alex.ben...@linaro.org; li...@armlinux.org.uk;
>> dar...@os.amperecomputing.com; il...@os.amperecomputing.com;
>> vis...@os.amperecomputing.com; Karl Heubaum ;
>> salil.me...@opnsrc.net; zhukeqian ; wangxiongfeng
>> (C) ; wangyanan (Y) ;
>> jiakern...@gmail.com; maob...@loongson.cn; lixiang...@loongson.cn; Linuxarm
>> 
>> Subject: Re: [PATCH V6 0/9] Add architecture agnostic code to support vCPU
>> Hotplug
>> 
>> Hi Salil,
>> 
>>> On 13 Oct 2023, at 10:51, Salil Mehta  wrote:
>>> 
>>> Virtual CPU hotplug support is being added across various
>>> architectures[1][3].
>>> This series adds various code bits common across all architectures:
> 
> 
> [...]
> 
> 
>> I tested it for Arm64, make check, boot/reboot, live migration and found no
>> issues, so for this, please feel free to add:
>> 
>> Tested-by: Miguel Luis 
> 
> Great. Many thanks for the confirmation. 
> 
> I guess you are repeating the same for x86 as well?
> 

You are welcome!

Absolutely, I’m repeating those same tests for x86.

Thanks
Miguel

> Salil.

[PULL 04/38] migration: fix RAMBlock add NULL check

2023-10-16 Thread Juan Quintela

From: Dmitry Frolov 

qemu_ram_block_from_host() may return NULL, which will be dereferenced
without a check. Usually the return value of this function is checked.
Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Dmitry Frolov 
Reviewed-by: Fabiano Rosas 
Reviewed-by: Peter Xu 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Message-ID: <20231010104851.802947-1-fro...@swemel.ru>
---
 migration/ram.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 24d91de8b3..e8df4dc862 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4285,6 +4285,11 @@ static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host,
 RAMBlock *rb = qemu_ram_block_from_host(host, false, &offset);
 Error *err = NULL;
 
+if (!rb) {
+error_report("RAM block not found");
+return;
+}
+
 if (migrate_ram_is_ignored(rb)) {
 return;
 }
-- 
2.41.0

[PULL 02/38] migration: Use g_autofree to simplify ram_dirty_bitmap_reload()

2023-10-16 Thread Juan Quintela

From: Philippe Mathieu-Daudé 

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Markus Armbruster 
Reviewed-by: Fabiano Rosas 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Message-ID: <20231011023627.86691-1-phi...@linaro.org>
---
 migration/ram.c | 17 ++---
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 2f5ce4d60b..24d91de8b3 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4159,7 +4159,8 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
 int ret = -EINVAL;
 /* from_dst_file is always valid because we're within rp_thread */
 QEMUFile *file = s->rp_state.from_dst_file;
-unsigned long *le_bitmap, nbits = block->used_length >> TARGET_PAGE_BITS;
+g_autofree unsigned long *le_bitmap = NULL;
+unsigned long nbits = block->used_length >> TARGET_PAGE_BITS;
 uint64_t local_size = DIV_ROUND_UP(nbits, 8);
 uint64_t size, end_mark;
 RAMState *rs = ram_state;
@@ -4188,8 +4189,7 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
 error_report("%s: ramblock '%s' bitmap size mismatch "
  "(0x%"PRIx64" != 0x%"PRIx64")", __func__,
  block->idstr, size, local_size);
-ret = -EINVAL;
-goto out;
+return -EINVAL;
 }
 
 size = qemu_get_buffer(file, (uint8_t *)le_bitmap, local_size);
@@ -4200,15 +4200,13 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
 error_report("%s: read bitmap failed for ramblock '%s': %d"
  " (size 0x%"PRIx64", got: 0x%"PRIx64")",
  __func__, block->idstr, ret, local_size, size);
-ret = -EIO;
-goto out;
+return -EIO;
 }
 
 if (end_mark != RAMBLOCK_RECV_BITMAP_ENDING) {
 error_report("%s: ramblock '%s' end mark incorrect: 0x%"PRIx64,
  __func__, block->idstr, end_mark);
-ret = -EINVAL;
-goto out;
+return -EINVAL;
 }
 
 /*
@@ -4240,10 +4238,7 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
  */
 migration_rp_kick(s);
 
-ret = 0;
-out:
-g_free(le_bitmap);
-return ret;
+return 0;
 }
 
 static int ram_resume_prepare(MigrationState *s, void *opaque)
-- 
2.41.0

[PULL 05/38] migration: Add the configuration vmstate to the json writer

2023-10-16 Thread Juan Quintela

From: Nikolay Borisov 

Make the migration json writer part of MigrationState struct, allowing
the 'configuration' object to be serialized to json.

This will facilitate the parsing of the 'configuration' object in the
next patch that fixes analyze-migration.py for arm.
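
As a rough illustration of the resulting layout (a sketch only, when
send_configuration is set): the diff below starts the main object and the
"configuration" object together in qemu_savevm_state_header(), and
qemu_savevm_state_setup() then appends "page_size" and "devices". The
helper build_vmdesc() and all field contents here are hypothetical
placeholders; only the three top-level keys and their order come from the
patch.

```python
import json


def build_vmdesc(configuration, page_size, devices):
    """Assemble the description in the order the writer calls are made."""
    vmdesc = {}                                     # start of the main object
    vmdesc["configuration"] = dict(configuration)   # written with the header
    vmdesc["page_size"] = page_size                 # written during setup
    vmdesc["devices"] = list(devices)               # filled as devices are saved
    return vmdesc


# Placeholder contents: the field name inside "configuration" and the
# device entry are invented for illustration.
doc = build_vmdesc({"example_field": 1}, 4096, [{"name": "example-device"}])
print(json.dumps(doc, indent=2))
```

With "configuration" sitting next to "devices" in the same document, a
consumer such as analyze-migration.py can look it up by key.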

Signed-off-by: Nikolay Borisov 
Signed-off-by: Fabiano Rosas 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Message-ID: <20231009184326.15777-2-faro...@suse.de>
---
 migration/migration.c |  1 +
 migration/savevm.c| 20 
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index ed04ca3b1c..98151b1424 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1442,6 +1442,7 @@ int migrate_init(MigrationState *s, Error **errp)
 error_free(s->error);
 s->error = NULL;
 s->hostname = NULL;
+s->vmdesc = NULL;
 
 migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 497ce02bd7..bce698b0af 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1217,13 +1217,27 @@ void qemu_savevm_non_migratable_list(strList **reasons)
 
 void qemu_savevm_state_header(QEMUFile *f)
 {
+MigrationState *s = migrate_get_current();
+
+s->vmdesc = json_writer_new(false);
+
 trace_savevm_state_header();
 qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
 qemu_put_be32(f, QEMU_VM_FILE_VERSION);
 
-if (migrate_get_current()->send_configuration) {
+if (s->send_configuration) {
 qemu_put_byte(f, QEMU_VM_CONFIGURATION);
-vmstate_save_state(f, &vmstate_configuration, &savevm_state, 0);
+
+/*
+ * This starts the main json object and is paired with the
+ * json_writer_end_object in
+ * qemu_savevm_state_complete_precopy_non_iterable
+ */
+json_writer_start_object(s->vmdesc, NULL);
+
+json_writer_start_object(s->vmdesc, "configuration");
+vmstate_save_state(f, &vmstate_configuration, &savevm_state, s->vmdesc);
+json_writer_end_object(s->vmdesc);
 }
 }
 
@@ -1272,8 +1286,6 @@ void qemu_savevm_state_setup(QEMUFile *f)
 Error *local_err = NULL;
 int ret;
 
-ms->vmdesc = json_writer_new(false);
-json_writer_start_object(ms->vmdesc, NULL);
 json_writer_int64(ms->vmdesc, "page_size", qemu_target_page_size());
 json_writer_start_array(ms->vmdesc, "devices");
 
-- 
2.41.0

[PULL 15/38] migration/rdma: Unfold ram_control_before_iterate()

2023-10-16 Thread Juan Quintela

Once there:
- Remove unused data parameter
- unfold it in its callers.
- change all callers to call qemu_rdma_registration_start()
- We need to call QIO_CHANNEL_RDMA() after we check for migrate_rdma()

Reviewed-by: Li Zhijian 
Reviewed-by: Fabiano Rosas 
Signed-off-by: Juan Quintela 
Message-ID: <20231011203527.9061-3-quint...@redhat.com>
---
 migration/qemu-file.h |  2 --
 migration/rdma.h  |  7 +++
 migration/qemu-file.c | 13 +
 migration/ram.c   | 16 +---
 migration/rdma.c  | 12 
 5 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 03e718c264..d6a370c569 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -55,7 +55,6 @@ typedef int (QEMURamSaveFunc)(QEMUFile *f,
   size_t size);
 
 typedef struct QEMUFileHooks {
-QEMURamHookFunc *before_ram_iterate;
 QEMURamHookFunc *after_ram_iterate;
 QEMURamHookFunc *hook_ram_load;
 QEMURamSaveFunc *save_page;
@@ -127,7 +126,6 @@ void qemu_fflush(QEMUFile *f);
 void qemu_file_set_blocking(QEMUFile *f, bool block);
 int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size);
 
-void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data);
 
diff --git a/migration/rdma.h b/migration/rdma.h
index de2ba09dc5..670c67a8cb 100644
--- a/migration/rdma.h
+++ b/migration/rdma.h
@@ -22,4 +22,11 @@ void rdma_start_outgoing_migration(void *opaque, const char *host_port,
 
 void rdma_start_incoming_migration(const char *host_port, Error **errp);
 
+
+#ifdef CONFIG_RDMA
+int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags);
+#else
+static inline
+int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags) { return 0; }
+#endif
 #endif
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 7fb659296f..5e2d73fd68 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -32,6 +32,7 @@
 #include "trace.h"
 #include "options.h"
 #include "qapi/error.h"
+#include "rdma.h"
 
 #define IO_BUF_SIZE 32768
 #define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 64)
@@ -297,18 +298,6 @@ void qemu_fflush(QEMUFile *f)
 f->iovcnt = 0;
 }
 
-void ram_control_before_iterate(QEMUFile *f, uint64_t flags)
-{
-int ret = 0;
-
-if (f->hooks && f->hooks->before_ram_iterate) {
-ret = f->hooks->before_ram_iterate(f, flags, NULL);
-if (ret < 0) {
-qemu_file_set_error(f, ret);
-}
-}
-}
-
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags)
 {
 int ret = 0;
diff --git a/migration/ram.c b/migration/ram.c
index acb8f95f00..6592431a4e 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -59,6 +59,7 @@
 #include "qemu/iov.h"
 #include "multifd.h"
 #include "sysemu/runstate.h"
+#include "rdma.h"
 #include "options.h"
 #include "sysemu/dirtylimit.h"
 #include "sysemu/kvm.h"
@@ -3060,7 +3061,10 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 }
 }
 
-ram_control_before_iterate(f, RAM_CONTROL_SETUP);
+ret = qemu_rdma_registration_start(f, RAM_CONTROL_SETUP);
+if (ret < 0) {
+qemu_file_set_error(f, ret);
+}
 ram_control_after_iterate(f, RAM_CONTROL_SETUP);
 
 migration_ops = g_malloc0(sizeof(MigrationOps));
@@ -3123,7 +3127,10 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 /* Read version before ram_list.blocks */
 smp_rmb();
 
-ram_control_before_iterate(f, RAM_CONTROL_ROUND);
+ret = qemu_rdma_registration_start(f, RAM_CONTROL_ROUND);
+if (ret < 0) {
+qemu_file_set_error(f, ret);
+}
 
 t0 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 i = 0;
@@ -3228,7 +3235,10 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 migration_bitmap_sync_precopy(rs, true);
 }
 
-ram_control_before_iterate(f, RAM_CONTROL_FINISH);
+ret = qemu_rdma_registration_start(f, RAM_CONTROL_FINISH);
+if (ret < 0) {
+qemu_file_set_error(f, ret);
+}
 
 /* try transferring iterative blocks of memory */
 
diff --git a/migration/rdma.c b/migration/rdma.c
index f155f3e1c8..3d74ad6db0 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3850,18 +3850,15 @@ static int rdma_load_hook(QEMUFile *f, uint64_t flags, void *data)
 }
 }
 
-static int qemu_rdma_registration_start(QEMUFile *f,
-uint64_t flags, void *data)
+int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags)
 {
-QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
-RDMAContext *rdma;
-
-if (migration_in_postcopy()) {
+if (!migrate_rdma() || migration_in_postcopy()) {
 return 0;
 }
 
+QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
 RCU_READ_LOCK_GUARD();
-rdma = qatomic_rcu_read(&rioc->rdm

[PULL 12/38] migration: hold the BQL during setup

2023-10-16 Thread Juan Quintela

From: Fiona Ebner 

This is intended to be a semantic revert of commit 9b09503752
("migration: run setup callbacks out of big lock"). There have been so
many changes since that commit (e.g. a new setup callback
dirty_bitmap_save_setup() that also needs to be adapted now), it's
easier to do the revert manually.

For snapshots, the bdrv_writev_vmstate() function is used during setup
(in QIOChannelBlock backing the QEMUFile), but not holding the BQL
while calling it could lead to an assertion failure. To understand
how, first note the following:

1. Generated coroutine wrappers for block layer functions spawn the
coroutine and use AIO_WAIT_WHILE()/aio_poll() to wait for it.
2. If the host OS switches threads at an inconvenient time, it can
happen that a bottom half scheduled for the main thread's AioContext
is executed as part of a vCPU thread's aio_poll().

An example leading to the assertion failure is as follows:

main thread:
1. A snapshot-save QMP command gets issued.
2. snapshot_save_job_bh() is scheduled.

vCPU thread:
3. aio_poll() for the main thread's AioContext is called (e.g. when
the guest writes to a pflash device, as part of blk_pwrite which is a
generated coroutine wrapper).
4. snapshot_save_job_bh() is executed as part of aio_poll().
5. qemu_savevm_state() is called.
6. qemu_mutex_unlock_iothread() is called. Now
qemu_get_current_aio_context() returns 0x0.
7. bdrv_writev_vmstate() is executed during the usual savevm setup
via qemu_fflush(). But this function is a generated coroutine wrapper,
so it uses AIO_WAIT_WHILE. There, the assertion
assert(qemu_get_current_aio_context() == qemu_get_aio_context());
will fail.

To fix it, ensure that the BQL is held during setup. While it would
only be needed for snapshots, adapting migration too avoids additional
logic for conditional locking/unlocking in the setup callbacks.
Writing the header could (in theory) also trigger qemu_fflush() and
thus bdrv_writev_vmstate(), so the locked section also covers the
qemu_savevm_state_header() call, even for migration for consistency.

The section around multifd_send_sync_main() needs to be unlocked to
avoid a deadlock. In particular, the multifd_save_setup() function calls
socket_send_channel_create() using multifd_new_send_channel_async() as a
callback and then waits for the callback to signal via the
channels_ready semaphore. The connection happens via
qio_task_run_in_thread(), but the callback is only executed via
qio_task_thread_result() which is scheduled for the main event loop.
Without unlocking the section, the main thread would never get to
process the task result and the callback, meaning there would be no
signal via the channels_ready semaphore.

The comment in ram_init_bitmaps() was introduced by 4987783400
("migration: fix incorrect memory_global_dirty_log_start outside BQL")
and is removed, because it referred to the qemu_mutex_lock_iothread()
call.

Signed-off-by: Fiona Ebner 
Reviewed-by: Fabiano Rosas 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Message-ID: <20231013105839.415989-1-f.eb...@proxmox.com>
---
 include/migration/register.h   | 2 +-
 migration/block-dirty-bitmap.c | 3 ---
 migration/block.c  | 5 -
 migration/migration.c  | 6 ++
 migration/ram.c| 6 +++---
 migration/savevm.c | 2 --
 6 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/include/migration/register.h b/include/migration/register.h
index 2b12c6adec..fed1d04a3c 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -25,6 +25,7 @@ typedef struct SaveVMHandlers {
  * used to perform early checks.
  */
 int (*save_prepare)(void *opaque, Error **errp);
+int (*save_setup)(QEMUFile *f, void *opaque);
 void (*save_cleanup)(void *opaque);
 int (*save_live_complete_postcopy)(QEMUFile *f, void *opaque);
 int (*save_live_complete_precopy)(QEMUFile *f, void *opaque);
@@ -50,7 +51,6 @@ typedef struct SaveVMHandlers {
 int (*save_live_iterate)(QEMUFile *f, void *opaque);
 
 /* This runs outside the iothread lock!  */
-int (*save_setup)(QEMUFile *f, void *opaque);
 /* Note for save_live_pending:
  * must_precopy:
  * - must be migrated in precopy or in stopped state
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 032fc5f405..03cb2e72ee 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -1214,9 +1214,7 @@ static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
 DBMSaveState *s = &((DBMState *)opaque)->save;
 SaveBitmapState *dbms = NULL;
 
-qemu_mutex_lock_iothread();
 if (init_dirty_bitmap_migration(s) < 0) {
-qemu_mutex_unlock_iothread();
 return -1;
 }
 
@@ -1224,7 +1222,6 @@ static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
 send_bitmap_start(f, s, dbms);
 }
 qemu_put_bitmap_flags(f, DIRTY_BITMAP_MIG_FLAG_EOS);
-qemu_mutex_unlock_iothread()

[PULL 18/38] migration/rdma: Unfold hook_ram_load()

2023-10-16 Thread Juan Quintela

The hook is only ever called with one flag: RAM_CONTROL_BLOCK_REG.

Reviewed-by: Li Zhijian 
Signed-off-by: Juan Quintela 
Message-ID: <20231011203527.9061-6-quint...@redhat.com>
---
 migration/qemu-file.h | 11 ---
 migration/rdma.h  |  3 +++
 migration/qemu-file.c | 10 --
 migration/ram.c   |  6 --
 migration/rdma.c  | 34 +++---
 5 files changed, 18 insertions(+), 46 deletions(-)

diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 14ff0d9cc4..80c30631dc 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -29,20 +29,12 @@
 #include "exec/cpu-common.h"
 #include "io/channel.h"
 
-/*
- * This function provides hooks around different
- * stages of RAM migration.
- * 'data' is call specific data associated with the 'flags' value
- */
-typedef int (QEMURamHookFunc)(QEMUFile *f, uint64_t flags, void *data);
-
 /*
  * Constants used by ram_control_* hooks
  */
 #define RAM_CONTROL_SETUP 0
 #define RAM_CONTROL_ROUND 1
 #define RAM_CONTROL_FINISH    3
-#define RAM_CONTROL_BLOCK_REG 4
 
 /*
  * This function allows override of where the RAM page
@@ -54,7 +46,6 @@ typedef int (QEMURamSaveFunc)(QEMUFile *f,
   size_t size);
 
 typedef struct QEMUFileHooks {
-QEMURamHookFunc *hook_ram_load;
 QEMURamSaveFunc *save_page;
 } QEMUFileHooks;
 
@@ -124,8 +115,6 @@ void qemu_fflush(QEMUFile *f);
 void qemu_file_set_blocking(QEMUFile *f, bool block);
 int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size);
 
-void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data);
-
 /* Whenever this is found in the data stream, the flags
  * will be passed to ram_control_load_hook in the incoming-migration
  * side. This lets before_ram_iterate/after_ram_iterate add
diff --git a/migration/rdma.h b/migration/rdma.h
index 8bd277efb9..8df8b4089a 100644
--- a/migration/rdma.h
+++ b/migration/rdma.h
@@ -27,6 +27,7 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp);
 int qemu_rdma_registration_handle(QEMUFile *f);
 int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags);
 int qemu_rdma_registration_stop(QEMUFile *f, uint64_t flags);
+int rdma_block_notification_handle(QEMUFile *f, const char *name);
 #else
 static inline
 int qemu_rdma_registration_handle(QEMUFile *f) { return 0; }
@@ -34,5 +35,7 @@ static inline
 int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags) { return 0; }
 static inline
 int qemu_rdma_registration_stop(QEMUFile *f, uint64_t flags) { return 0; }
+static inline
+int rdma_block_notification_handle(QEMUFile *f, const char *name) { return 0; }
 #endif
 #endif
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index e7dba2a849..4a414b8976 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -298,16 +298,6 @@ void qemu_fflush(QEMUFile *f)
 f->iovcnt = 0;
 }
 
-void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data)
-{
-if (f->hooks && f->hooks->hook_ram_load) {
-int ret = f->hooks->hook_ram_load(f, flags, data);
-if (ret < 0) {
-qemu_file_set_error(f, ret);
-}
-}
-}
-
 int ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
   ram_addr_t offset, size_t size)
 {
diff --git a/migration/ram.c b/migration/ram.c
index f6ea1831b5..8c462276cd 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4025,8 +4025,10 @@ static int ram_load_precopy(QEMUFile *f)
 ret = -EINVAL;
 }
 }
-ram_control_load_hook(f, RAM_CONTROL_BLOCK_REG,
-  block->idstr);
+ret = rdma_block_notification_handle(f, block->idstr);
+if (ret < 0) {
+qemu_file_set_error(f, ret);
+}
 } else {
 error_report("Unknown ramblock \"%s\", cannot "
  "accept migration", id);
diff --git a/migration/rdma.c b/migration/rdma.c
index 5c20f425a9..0b1cb03b2b 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3799,22 +3799,23 @@ err:
 }
 
 /* Destination:
- * Called via a ram_control_load_hook during the initial RAM load section which
- * lists the RAMBlocks by name.  This lets us know the order of the RAMBlocks
- * on the source.
- * We've already built our local RAMBlock list, but not yet sent the list to
- * the source.
+ * Called during the initial RAM load section which lists the
+ * RAMBlocks by name.  This lets us know the order of the RAMBlocks on
+ * the source.  We've already built our local RAMBlock list, but not
+ * yet sent the list to the source.
  */
-static int
-rdma_block_notification_handle(QEMUFile *f, const char *name)
+int rdma_block_notification_handle(QEMUFile *f, const char *name)
 {
-RDMAContext *rdma;
-QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
 int curr;
 i

[PULL 22/38] migration/rdma: Remove qemu_ prefix from exported functions

2023-10-16 Thread Juan Quintela

Functions are long enough even without this.

Reviewed-by: Peter Xu 
Reviewed-by: Li Zhijian 
Signed-off-by: Juan Quintela 
Message-ID: <20231011203527.9061-10-quint...@redhat.com>
---
 migration/rdma.h   | 12 ++--
 migration/ram.c| 14 +++---
 migration/rdma.c   | 40 +++-
 migration/trace-events | 28 ++--
 4 files changed, 46 insertions(+), 48 deletions(-)

diff --git a/migration/rdma.h b/migration/rdma.h
index 1ff3718a76..30b15b4466 100644
--- a/migration/rdma.h
+++ b/migration/rdma.h
@@ -42,19 +42,19 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp);
 #define RAM_SAVE_CONTROL_DELAYED  -2000
 
 #ifdef CONFIG_RDMA
-int qemu_rdma_registration_handle(QEMUFile *f);
-int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags);
-int qemu_rdma_registration_stop(QEMUFile *f, uint64_t flags);
+int rdma_registration_handle(QEMUFile *f);
+int rdma_registration_start(QEMUFile *f, uint64_t flags);
+int rdma_registration_stop(QEMUFile *f, uint64_t flags);
 int rdma_block_notification_handle(QEMUFile *f, const char *name);
 int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
ram_addr_t offset, size_t size);
 #else
 static inline
-int qemu_rdma_registration_handle(QEMUFile *f) { return 0; }
+int rdma_registration_handle(QEMUFile *f) { return 0; }
 static inline
-int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags) { return 0; }
+int rdma_registration_start(QEMUFile *f, uint64_t flags) { return 0; }
 static inline
-int qemu_rdma_registration_stop(QEMUFile *f, uint64_t flags) { return 0; }
+int rdma_registration_stop(QEMUFile *f, uint64_t flags) { return 0; }
 static inline
 int rdma_block_notification_handle(QEMUFile *f, const char *name) { return 0; }
 static inline
diff --git a/migration/ram.c b/migration/ram.c
index 6a4aed2a75..a9bc6ae1f1 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3061,12 +3061,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 }
 }
 
-ret = qemu_rdma_registration_start(f, RAM_CONTROL_SETUP);
+ret = rdma_registration_start(f, RAM_CONTROL_SETUP);
 if (ret < 0) {
 qemu_file_set_error(f, ret);
 }
 
-ret = qemu_rdma_registration_stop(f, RAM_CONTROL_SETUP);
+ret = rdma_registration_stop(f, RAM_CONTROL_SETUP);
 if (ret < 0) {
 qemu_file_set_error(f, ret);
 }
@@ -3131,7 +3131,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 /* Read version before ram_list.blocks */
 smp_rmb();
 
-ret = qemu_rdma_registration_start(f, RAM_CONTROL_ROUND);
+ret = rdma_registration_start(f, RAM_CONTROL_ROUND);
 if (ret < 0) {
 qemu_file_set_error(f, ret);
 }
@@ -3191,7 +3191,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
  * Must occur before EOS (or any QEMUFile operation)
  * because of RDMA protocol.
  */
-ret = qemu_rdma_registration_stop(f, RAM_CONTROL_ROUND);
+ret = rdma_registration_stop(f, RAM_CONTROL_ROUND);
 if (ret < 0) {
 qemu_file_set_error(f, ret);
 }
@@ -3242,7 +3242,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 migration_bitmap_sync_precopy(rs, true);
 }
 
-ret = qemu_rdma_registration_start(f, RAM_CONTROL_FINISH);
+ret = rdma_registration_start(f, RAM_CONTROL_FINISH);
 if (ret < 0) {
 qemu_file_set_error(f, ret);
 }
@@ -3268,7 +3268,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 
 ram_flush_compressed_data(rs);
 
-int ret = qemu_rdma_registration_stop(f, RAM_CONTROL_FINISH);
+int ret = rdma_registration_stop(f, RAM_CONTROL_FINISH);
 if (ret < 0) {
 qemu_file_set_error(f, ret);
 }
@@ -4077,7 +4077,7 @@ static int ram_load_precopy(QEMUFile *f)
 }
 break;
 case RAM_SAVE_FLAG_HOOK:
-ret = qemu_rdma_registration_handle(f);
+ret = rdma_registration_handle(f);
 if (ret < 0) {
 qemu_file_set_error(f, ret);
 }
diff --git a/migration/rdma.c b/migration/rdma.c
index 9883b0a250..c147c94b08 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3540,7 +3540,7 @@ static int dest_ram_sort_func(const void *a, const void *b)
  *
  * Keep doing this until the source tells us to stop.
  */
-int qemu_rdma_registration_handle(QEMUFile *f)
+int rdma_registration_handle(QEMUFile *f)
 {
 RDMAControlHeader reg_resp = { .len = sizeof(RDMARegisterResult),
.type = RDMA_CONTROL_REGISTER_RESULT,
@@ -3586,7 +3586,7 @@ int qemu_rdma_registration_handle(QEMUFile *f)
 
 local = &rdma->local_ram_blocks;
 do {
-trace_qemu_rdma_registration_handle_wait();
+trace_rdma_registration_handle_wait();
 
 ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_NONE, &err);
 
@@ -3606,

[PULL 07/38] migration: Add capability parsing to analyze-migration.py

2023-10-16 Thread Juan Quintela

From: Fabiano Rosas 

The script is broken when the configuration/capabilities section is
present. Add support for parsing the capabilities so we can fix it in
the next patch.

Signed-off-by: Fabiano Rosas 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Message-ID: <20231009184326.15777-4-faro...@suse.de>
---
 scripts/analyze-migration.py | 38 
 1 file changed, 38 insertions(+)

diff --git a/scripts/analyze-migration.py b/scripts/analyze-migration.py
index 24687db497..c700fed64d 100755
--- a/scripts/analyze-migration.py
+++ b/scripts/analyze-migration.py
@@ -264,6 +264,24 @@ class ConfigurationSection(object):
 def __init__(self, file, desc):
 self.file = file
 self.desc = desc
+self.caps = []
+
+def parse_capabilities(self, vmsd_caps):
+if not vmsd_caps:
+return
+
+ncaps = vmsd_caps.data['caps_count'].data
+self.caps = vmsd_caps.data['capabilities']
+
+if type(self.caps) != list:
+self.caps = [self.caps]
+
+if len(self.caps) != ncaps:
+raise Exception("Number of capabilities doesn't match "
+"caps_count field")
+
+def has_capability(self, cap):
+return any([str(c) == cap for c in self.caps])
 
 def read(self):
 if self.desc:
@@ -271,6 +289,8 @@ def read(self):
 section = VMSDSection(self.file, version_id, self.desc,
   'configuration')
 section.read()
+self.parse_capabilities(
+section.data.get("configuration/capabilities"))
 else:
 # backward compatibility for older streams that don't have
 # the configuration section in the json
@@ -297,6 +317,23 @@ def read(self):
 self.data = self.file.readvar(size)
 return self.data
 
+class VMSDFieldCap(object):
+def __init__(self, desc, file):
+self.file = file
+self.desc = desc
+self.data = ""
+
+def __repr__(self):
+return self.data
+
+def __str__(self):
+return self.data
+
+def read(self):
+len = self.file.read8()
+self.data = self.file.readstr(len)
+
+
 class VMSDFieldInt(VMSDFieldGeneric):
 def __init__(self, desc, file):
 super(VMSDFieldInt, self).__init__(desc, file)
@@ -471,6 +508,7 @@ def getDict(self):
 "unused_buffer" : VMSDFieldGeneric,
 "bitmap" : VMSDFieldGeneric,
 "struct" : VMSDFieldStruct,
+"capability": VMSDFieldCap,
 "unknown" : VMSDFieldGeneric,
 }
 
-- 
2.41.0

[PULL 03/38] migration: Allow user to specify available switchover bandwidth

2023-10-16 Thread Juan Quintela

From: Peter Xu 

Migration bandwidth is a very important value for live migration, because
it is one of the major factors in deciding when to switch over to the
destination in a precopy process.

QEMU currently estimates this value throughout the live migration by
monitoring how fast data is being sent.  In an ideal world, where we always
feed unlimited data to the migration channel and throughput is limited only
by the available bandwidth, this would be the most accurate estimate.

In reality it can be very different.  For example, over a 10Gbps network
query-migrate can show a migration bandwidth of only a few tens of MB/s,
just because the migration thread has plenty of other things to do.  It
can be busy scanning zero pages, or fetching the dirty bitmap from
external dirty sources (like vhost or KVM).  This means we may not be
pushing as much data as possible into the migration channel, so the
bandwidth estimated from "how much data we sent in the channel" can
sometimes be dramatically inaccurate.

This affects the switchover decision: QEMU assumes that it cannot switch
over at all with such a low bandwidth, when in reality it can.

With that wrong bandwidth estimate, the migration may never converge
within the specified downtime and may keep iterating forever.

The issue is that QEMU itself may not be able to avoid those uncertainties
when measuring the real "available migration bandwidth".  At least not in
any way I can think of so far.

One way to fix this: when the user is fully aware of the available
bandwidth, allow the user to help by providing an accurate value.

For example, if the user has a dedicated channel of 10Gbps for migration
for this specific VM, the user can specify this bandwidth so QEMU can
always do the calculation based on this fact, trusting the user as long as
specified.  It may not be the exact bandwidth when switching over (in which
case qemu will push migration data as fast as possible), but much better
than QEMU trying to wildly guess, especially when very wrong.

A new parameter "avail-switchover-bandwidth" is introduced just for this.
When the user specifies this parameter, QEMU uses it to decide when to
switch over, instead of trusting its own estimate (based on the QEMUFile
send speed), assuming that this much bandwidth will be available then.
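
The reasoning above can be illustrated with a simplified model (not the
code in migration/migration.c): switchover is possible once the remaining
dirty data fits into what can be sent within the downtime limit at the
assumed bandwidth.  The function names below are hypothetical, and a value
of zero for the user-supplied bandwidth means "use the estimate".

```python
def switchover_threshold(bandwidth_bytes_per_s, downtime_limit_ms):
    """Bytes that can be sent within the downtime limit at this bandwidth."""
    return bandwidth_bytes_per_s * downtime_limit_ms // 1000


def effective_bandwidth(estimated, avail_switchover_bandwidth=0):
    """Prefer the user-supplied value; zero means fall back to the estimate."""
    if avail_switchover_bandwidth:
        return avail_switchover_bandwidth
    return estimated


def can_switchover(pending_bytes, estimated_bw, downtime_limit_ms,
                   avail_switchover_bandwidth=0):
    """Simplified decision: is the remaining data below the threshold?"""
    bw = effective_bandwidth(estimated_bw, avail_switchover_bandwidth)
    return pending_bytes < switchover_threshold(bw, downtime_limit_ms)


# Illustrative numbers only: 200 MB still dirty, 300 ms downtime limit,
# a measured rate of 30 MB/s, and a dedicated 10Gbps link (1.25 GB/s).
pending = 200_000_000
estimated = 30_000_000
dedicated = 1_250_000_000

print(can_switchover(pending, estimated, 300))
print(can_switchover(pending, estimated, 300, dedicated))
```

With the low estimate the threshold is only 9 MB, so the model keeps
iterating; with the user-supplied value it is 375 MB and switchover can
proceed.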

Note that specifying this value will not throttle the bandwidth for
switchover yet, so QEMU will always use the full bandwidth possible for
sending switchover data, assuming that should always be the most important
way to use the network at that time.

This can resolve issues like non-converging migrations caused by an
absurdly low "migration bandwidth" being detected for whatever reason.

Reported-by: Zhiyi Guo 
Reviewed-by: Joao Martins 
Reviewed-by: Juan Quintela 
Signed-off-by: Peter Xu 
Signed-off-by: Juan Quintela 
Message-ID: <20231010221922.40638-1-pet...@redhat.com>
---
 qapi/migration.json| 34 +-
 migration/migration.h  |  2 +-
 migration/options.h|  1 +
 migration/migration-hmp-cmds.c | 14 ++
 migration/migration.c  | 24 +---
 migration/options.c| 28 
 migration/trace-events |  2 +-
 7 files changed, 99 insertions(+), 6 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index d7dfaa5db9..360e609f66 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -758,6 +758,16 @@
 # @max-bandwidth: to set maximum speed for migration.  maximum speed
 # in bytes per second.  (Since 2.8)
 #
+# @avail-switchover-bandwidth: to set the available bandwidth that
+# migration can use during switchover phase.  NOTE!  This does not
+# limit the bandwidth during switchover, but only for calculations when
+# making decisions to switchover.  By default, this value is zero,
+# which means QEMU will estimate the bandwidth automatically.  This can
+# be set when the estimated value is not accurate, while the user is
+# able to guarantee such bandwidth is available when switching over.
+# When specified correctly, this can make the switchover decision much
+# more accurate.  (Since 8.2)
+#
 # @downtime-limit: set maximum tolerated downtime for migration.
 # maximum downtime in milliseconds (Since 2.8)
 #
@@ -839,7 +849,7 @@
'cpu-throttle-initial', 'cpu-throttle-increment',
'cpu-throttle-tailslow',
'tls-creds', 'tls-hostname', 'tls-authz', 'max-bandwidth',
-   'downtime-limit',
+   'avail-switchover-bandwidth', 'downtime-limit',
{ 'name': 'x-checkpoint-delay', 'features': [ 'unstable' ] },
'block-incremental',
'multifd-c

[PULL 16/38] migration/rdma: Unfold ram_control_after_iterate()

2023-10-16 Thread Juan Quintela

Once there:
- Remove unused data parameter
- unfold it in its callers
- change all callers to call qemu_rdma_registration_stop()
- We need to call QIO_CHANNEL_RDMA() after we check for migrate_rdma()

Reviewed-by: Li Zhijian 
Signed-off-by: Juan Quintela 
Message-ID: <20231011203527.9061-4-quint...@redhat.com>
---
 migration/qemu-file.h |  2 --
 migration/rdma.h  |  3 +++
 migration/qemu-file.c | 12 
 migration/ram.c   | 17 ++---
 migration/rdma.c  |  9 -
 5 files changed, 21 insertions(+), 22 deletions(-)

diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index d6a370c569..35e671a01e 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -55,7 +55,6 @@ typedef int (QEMURamSaveFunc)(QEMUFile *f,
   size_t size);
 
 typedef struct QEMUFileHooks {
-QEMURamHookFunc *after_ram_iterate;
 QEMURamHookFunc *hook_ram_load;
 QEMURamSaveFunc *save_page;
 } QEMUFileHooks;
@@ -126,7 +125,6 @@ void qemu_fflush(QEMUFile *f);
 void qemu_file_set_blocking(QEMUFile *f, bool block);
 int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size);
 
-void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data);
 
 /* Whenever this is found in the data stream, the flags
diff --git a/migration/rdma.h b/migration/rdma.h
index 670c67a8cb..c13b94c782 100644
--- a/migration/rdma.h
+++ b/migration/rdma.h
@@ -25,8 +25,11 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp);
 
 #ifdef CONFIG_RDMA
 int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags);
+int qemu_rdma_registration_stop(QEMUFile *f, uint64_t flags);
 #else
 static inline
 int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags) { return 0; }
+static inline
+int qemu_rdma_registration_stop(QEMUFile *f, uint64_t flags) { return 0; }
 #endif
 #endif
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 5e2d73fd68..e7dba2a849 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -298,18 +298,6 @@ void qemu_fflush(QEMUFile *f)
 f->iovcnt = 0;
 }
 
-void ram_control_after_iterate(QEMUFile *f, uint64_t flags)
-{
-int ret = 0;
-
-if (f->hooks && f->hooks->after_ram_iterate) {
-ret = f->hooks->after_ram_iterate(f, flags, NULL);
-if (ret < 0) {
-qemu_file_set_error(f, ret);
-}
-}
-}
-
 void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data)
 {
 if (f->hooks && f->hooks->hook_ram_load) {
diff --git a/migration/ram.c b/migration/ram.c
index 6592431a4e..f1ddc1f9fa 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3065,7 +3065,11 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 if (ret < 0) {
 qemu_file_set_error(f, ret);
 }
-ram_control_after_iterate(f, RAM_CONTROL_SETUP);
+
+ret = qemu_rdma_registration_stop(f, RAM_CONTROL_SETUP);
+if (ret < 0) {
+qemu_file_set_error(f, ret);
+}
 
 migration_ops = g_malloc0(sizeof(MigrationOps));
 migration_ops->ram_save_target_page = ram_save_target_page_legacy;
@@ -3187,7 +3191,10 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
  * Must occur before EOS (or any QEMUFile operation)
  * because of RDMA protocol.
  */
-ram_control_after_iterate(f, RAM_CONTROL_ROUND);
+ret = qemu_rdma_registration_stop(f, RAM_CONTROL_ROUND);
+if (ret < 0) {
+qemu_file_set_error(f, ret);
+}
 
 out:
 if (ret >= 0
@@ -3260,7 +3267,11 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 qemu_mutex_unlock(&rs->bitmap_mutex);
 
 ram_flush_compressed_data(rs);
-ram_control_after_iterate(f, RAM_CONTROL_FINISH);
+
+int ret = qemu_rdma_registration_stop(f, RAM_CONTROL_FINISH);
+if (ret < 0) {
+qemu_file_set_error(f, ret);
+}
 }
 
 if (ret < 0) {
diff --git a/migration/rdma.c b/migration/rdma.c
index 3d74ad6db0..4b32d375ec 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3878,20 +3878,20 @@ int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags)
  * Inform dest that dynamic registrations are done for now.
  * First, flush writes, if any.
  */
-static int qemu_rdma_registration_stop(QEMUFile *f,
-   uint64_t flags, void *data)
+int qemu_rdma_registration_stop(QEMUFile *f, uint64_t flags)
 {
-QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
+QIOChannelRDMA *rioc;
 Error *err = NULL;
 RDMAContext *rdma;
 RDMAControlHeader head = { .len = 0, .repeat = 1 };
 int ret;
 
-if (migration_in_postcopy()) {
+if (!migrate_rdma() || migration_in_postcopy()) {
 return 0;
 }
 
 RCU_READ_LOCK_GUARD();
+rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
 rdma = qatomic_rcu_read(&rioc->rdmaout);
 if (!rdma) {
 return -1;
@@ -3999,7 +3999,6 @@ static const QEMUFileHooks rdma_read_hooks = {
 };
 
 sta

[PULL 11/38] tests/qtest: migration-test: Add tests for file-based migration

2023-10-16 Thread Juan Quintela

From: Fabiano Rosas 

Add basic tests for file-based migration.

Note that we cannot use test_precopy_common because that routine
expects it to be possible to run the migration live. With the file
transport there is no live migration because we must wait for the
source to finish writing the migration data to the file before the
destination can start reading. Add a new migration function
specifically to handle the file migration.
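
A minimal sketch of the kind of check the offset test performs, assuming
the stream starts with the 32-bit magic value stored big-endian: the bytes
before the offset must all be zero and the magic must follow immediately.
`write_sample` and `check_offset` are hypothetical helpers written for this
illustration; only the magic value and the 0x1000 offset come from the
patch.

```python
import os
import struct
import tempfile

QEMU_VM_FILE_MAGIC = 0x5145564d  # value defined in the patch


def write_sample(path, offset):
    """Write a sample file: `offset` zero bytes, then the magic."""
    with open(path, "wb") as f:
        f.write(b"\0" * offset)
        f.write(struct.pack(">I", QEMU_VM_FILE_MAGIC))


def check_offset(path, offset):
    """Return True if the skipped region is zeroed and the magic follows."""
    with open(path, "rb") as f:
        padding = f.read(offset)
        magic_bytes = f.read(4)
    if len(padding) != offset or len(magic_bytes) != 4:
        return False  # file is too short
    if any(padding):
        return False  # something was written into the skipped region
    (magic,) = struct.unpack(">I", magic_bytes)
    return magic == QEMU_VM_FILE_MAGIC


with tempfile.TemporaryDirectory() as tmpdir:
    sample = os.path.join(tmpdir, "migfile")
    write_sample(sample, 0x1000)
    print(check_offset(sample, 0x1000))
```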

Reviewed-by: Peter Xu 
Reviewed-by: Juan Quintela 
Signed-off-by: Fabiano Rosas 
Signed-off-by: Juan Quintela 
Message-ID: <20230712190742.22294-7-faro...@suse.de>
---
 tests/qtest/migration-test.c | 147 +++
 1 file changed, 147 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index cef5081f8c..da02b6d692 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -68,6 +68,10 @@ static bool got_dst_resume;
 
 #define ANALYZE_SCRIPT "scripts/analyze-migration.py"
 
+#define QEMU_VM_FILE_MAGIC 0x5145564d
+#define FILE_TEST_FILENAME "migfile"
+#define FILE_TEST_OFFSET 0x1000
+
 #if defined(__linux__)
 #include 
 #include 
@@ -884,6 +888,7 @@ static void test_migrate_end(QTestState *from, QTestState *to, bool test_dest)
 cleanup("migsocket");
 cleanup("src_serial");
 cleanup("dest_serial");
+cleanup(FILE_TEST_FILENAME);
 }
 
 #ifdef CONFIG_GNUTLS
@@ -1667,6 +1672,70 @@ finish:
 test_migrate_end(from, to, args->result == MIG_TEST_SUCCEED);
 }
 
+static void test_file_common(MigrateCommon *args, bool stop_src)
+{
+QTestState *from, *to;
+void *data_hook = NULL;
+g_autofree char *connect_uri = g_strdup(args->connect_uri);
+
+if (test_migrate_start(&from, &to, args->listen_uri, &args->start)) {
+return;
+}
+
+/*
+ * File migration is never live. We can keep the source VM running
+ * during migration, but the destination will not be running
+ * concurrently.
+ */
+g_assert_false(args->live);
+
+if (args->start_hook) {
+data_hook = args->start_hook(from, to);
+}
+
+migrate_ensure_converge(from);
+wait_for_serial("src_serial");
+
+if (stop_src) {
+qtest_qmp_assert_success(from, "{ 'execute' : 'stop'}");
+if (!got_src_stop) {
+qtest_qmp_eventwait(from, "STOP");
+}
+}
+
+if (args->result == MIG_TEST_QMP_ERROR) {
+migrate_qmp_fail(from, connect_uri, "{}");
+goto finish;
+}
+
+migrate_qmp(from, connect_uri, "{}");
+wait_for_migration_complete(from);
+
+/*
+ * We need to wait for the source to finish before starting the
+ * destination.
+ */
+migrate_incoming_qmp(to, connect_uri, "{}");
+wait_for_migration_complete(to);
+
+if (stop_src) {
+qtest_qmp_assert_success(to, "{ 'execute' : 'cont'}");
+}
+
+if (!got_dst_resume) {
+qtest_qmp_eventwait(to, "RESUME");
+}
+
+wait_for_serial("dest_serial");
+
+finish:
+if (args->finish_hook) {
+args->finish_hook(from, to, data_hook);
+}
+
+test_migrate_end(from, to, args->result == MIG_TEST_SUCCEED);
+}
+
 static void test_precopy_unix_plain(void)
 {
 g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
@@ -1862,6 +1931,76 @@ static void test_precopy_unix_compress_nowait(void)
 test_precopy_common(&args);
 }
 
+static void test_precopy_file(void)
+{
+g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs,
+   FILE_TEST_FILENAME);
+MigrateCommon args = {
+.connect_uri = uri,
+.listen_uri = "defer",
+};
+
+test_file_common(&args, true);
+}
+
+static void file_offset_finish_hook(QTestState *from, QTestState *to,
+void *opaque)
+{
+#if defined(__linux__)
+g_autofree char *path = g_strdup_printf("%s/%s", tmpfs, FILE_TEST_FILENAME);
+size_t size = FILE_TEST_OFFSET + sizeof(QEMU_VM_FILE_MAGIC);
+uintptr_t *addr, *p;
+int fd;
+
+fd = open(path, O_RDONLY);
+g_assert(fd != -1);
+addr = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
+g_assert(addr != MAP_FAILED);
+
+/*
+ * Ensure the skipped offset contains zeros and the migration
+ * stream starts at the right place.
+ */
+p = addr;
+while (p < addr + FILE_TEST_OFFSET / sizeof(uintptr_t)) {
+g_assert(*p == 0);
+p++;
+}
+g_assert_cmpint(cpu_to_be32(*p), ==, QEMU_VM_FILE_MAGIC);
+
+munmap(addr, size);
+close(fd);
+#endif
+}
+
+static void test_precopy_file_offset(void)
+{
+g_autofree char *uri = g_strdup_printf("file:%s/%s,offset=%d", tmpfs,
+   FILE_TEST_FILENAME,
+   FILE_TEST_OFFSET);
+MigrateCommon args = {
+.connect_uri = uri,
+.listen_uri = "defer",
+.finish_hook = file_offset_finish_hook,
+};
+
+test_file_common(&args, false);
+}
+
+stat

[PULL 19/38] migration/rdma: Create rdma_control_save_page()

2023-10-16 Thread Juan Quintela

The only user of ram_control_save_page() and save_page() hook was
rdma. Just move the function to rdma.c, rename it to
rdma_control_save_page().

Reviewed-by: Peter Xu 
Reviewed-by: Li Zhijian 
Signed-off-by: Juan Quintela 
Message-ID: <20231011203527.9061-7-quint...@redhat.com>
---
 migration/qemu-file.h | 12 
 migration/rdma.h  | 10 ++
 migration/qemu-file.c | 20 
 migration/ram.c   |  4 ++--
 migration/rdma.c  | 19 ++-
 5 files changed, 30 insertions(+), 35 deletions(-)

diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 80c30631dc..60510a2819 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -36,17 +36,7 @@
 #define RAM_CONTROL_ROUND 1
 #define RAM_CONTROL_FINISH3
 
-/*
- * This function allows override of where the RAM page
- * is saved (such as RDMA, for example.)
- */
-typedef int (QEMURamSaveFunc)(QEMUFile *f,
-  ram_addr_t block_offset,
-  ram_addr_t offset,
-  size_t size);
-
 typedef struct QEMUFileHooks {
-QEMURamSaveFunc *save_page;
 } QEMUFileHooks;
 
 QEMUFile *qemu_file_new_input(QIOChannel *ioc);
@@ -125,8 +115,6 @@ int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size);
 #define RAM_SAVE_CONTROL_NOT_SUPP -1000
 #define RAM_SAVE_CONTROL_DELAYED  -2000
 
-int ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
-  ram_addr_t offset, size_t size);
 QIOChannel *qemu_file_get_ioc(QEMUFile *file);
 
 #endif
diff --git a/migration/rdma.h b/migration/rdma.h
index 8df8b4089a..09a16c1e3c 100644
--- a/migration/rdma.h
+++ b/migration/rdma.h
@@ -17,6 +17,8 @@
 #ifndef QEMU_MIGRATION_RDMA_H
 #define QEMU_MIGRATION_RDMA_H
 
+#include "exec/memory.h"
+
 void rdma_start_outgoing_migration(void *opaque, const char *host_port,
Error **errp);
 
@@ -28,6 +30,8 @@ int qemu_rdma_registration_handle(QEMUFile *f);
 int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags);
 int qemu_rdma_registration_stop(QEMUFile *f, uint64_t flags);
 int rdma_block_notification_handle(QEMUFile *f, const char *name);
+int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
+   ram_addr_t offset, size_t size);
 #else
 static inline
 int qemu_rdma_registration_handle(QEMUFile *f) { return 0; }
@@ -37,5 +41,11 @@ static inline
 int qemu_rdma_registration_stop(QEMUFile *f, uint64_t flags) { return 0; }
 static inline
 int rdma_block_notification_handle(QEMUFile *f, const char *name) { return 0; }
+static inline
+int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
+   ram_addr_t offset, size_t size)
+{
+return RAM_SAVE_CONTROL_NOT_SUPP;
+}
 #endif
 #endif
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 4a414b8976..745eaf7a5b 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -298,26 +298,6 @@ void qemu_fflush(QEMUFile *f)
 f->iovcnt = 0;
 }
 
-int ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
-  ram_addr_t offset, size_t size)
-{
-if (f->hooks && f->hooks->save_page) {
-int ret = f->hooks->save_page(f, block_offset, offset, size);
-/*
- * RAM_SAVE_CONTROL_* are negative values
- */
-if (ret != RAM_SAVE_CONTROL_DELAYED &&
-ret != RAM_SAVE_CONTROL_NOT_SUPP) {
-if (ret < 0) {
-qemu_file_set_error(f, ret);
-}
-}
-return ret;
-}
-
-return RAM_SAVE_CONTROL_NOT_SUPP;
-}
-
 /*
  * Attempt to fill the buffer from the underlying file
  * Returns the number of bytes read, or negative value for an error.
diff --git a/migration/ram.c b/migration/ram.c
index 8c462276cd..3b4b09f6ff 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1197,8 +1197,8 @@ static bool control_save_page(PageSearchStatus *pss, RAMBlock *block,
 {
 int ret;
 
-ret = ram_control_save_page(pss->pss_channel, block->offset, offset,
-TARGET_PAGE_SIZE);
+ret = rdma_control_save_page(pss->pss_channel, block->offset, offset,
+ TARGET_PAGE_SIZE);
 if (ret == RAM_SAVE_CONTROL_NOT_SUPP) {
 return false;
 }
diff --git a/migration/rdma.c b/migration/rdma.c
index 0b1cb03b2b..f66bd939d7 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3314,6 +3314,24 @@ err:
 return -1;
 }
 
+int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
+   ram_addr_t offset, size_t size)
+{
+if (!migrate_rdma()) {
+return RAM_SAVE_CONTROL_NOT_SUPP;
+}
+
+int ret = qemu_rdma_save_page(f, block_offset, offset, size);
+
+if (ret != RAM_SAVE_CONTROL_DELAYED &&
+ret != RAM_SAVE_CONTROL_NOT_SUPP) {
+if (ret < 0) {
+qemu_file_set_error(f, ret);
+}
+}
+return ret;
+}
+
 static void rdma

[PULL 10/38] tests/qtest/migration: Add a test for the analyze-migration script

2023-10-16 Thread Juan Quintela

From: Fabiano Rosas 

Add a smoke test that migrates to a file and gives it to the
script. It should catch the most annoying errors such as changes in
the ram flags.

After code has been merged, it becomes much harder to figure out what is
causing the script to fail.  The person making the change is the most
likely to know right away what the problem is.
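
The smoke-test idea can be sketched as follows: run a command with its
standard output discarded and treat a non-zero exit status as a failure.
`run_quietly` is a hypothetical helper used only for illustration; the
test in this patch does the equivalent in C with fork(), execl() and
waitpid().

```python
import subprocess
import sys


def run_quietly(argv):
    """Run a command, discard its stdout, and return its exit status."""
    result = subprocess.run(argv, stdout=subprocess.DEVNULL)
    return result.returncode


# Stand-in commands: a script that succeeds and one that exits with 3.
ok = run_quietly([sys.executable, "-c", "print('parsed')"])
failed = run_quietly([sys.executable, "-c", "import sys; sys.exit(3)"])
print(ok, failed)
```

Only the exit status is inspected, so any output the script prints while
parsing the stream does not clutter the test log.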

Signed-off-by: Fabiano Rosas 
Acked-by: Thomas Huth 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Message-ID: <20231009184326.15777-7-faro...@suse.de>
---
 tests/qtest/migration-test.c | 60 
 tests/qtest/meson.build  |  2 ++
 2 files changed, 62 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 8eb2053dbb..cef5081f8c 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -66,6 +66,8 @@ static bool got_dst_resume;
  */
 #define DIRTYLIMIT_TOLERANCE_RANGE  25  /* MB/s */
 
+#define ANALYZE_SCRIPT "scripts/analyze-migration.py"
+
 #if defined(__linux__)
 #include 
 #include 
@@ -1501,6 +1503,61 @@ static void test_baddest(void)
 test_migrate_end(from, to, false);
 }
 
+#ifndef _WIN32
+static void test_analyze_script(void)
+{
+MigrateStart args = {
+.opts_source = "-uuid ----",
+};
+QTestState *from, *to;
+g_autofree char *uri = NULL;
+g_autofree char *file = NULL;
+int pid, wstatus;
+const char *python = g_getenv("PYTHON");
+
+if (!python) {
+g_test_skip("PYTHON variable not set");
+return;
+}
+
+/* dummy url */
+if (test_migrate_start(&from, &to, "tcp:127.0.0.1:0", &args)) {
+return;
+}
+
+/*
+ * Setting these two capabilities causes the "configuration"
+ * vmstate to include subsections for them. The script needs to
+ * parse those subsections properly.
+ */
+migrate_set_capability(from, "validate-uuid", true);
+migrate_set_capability(from, "x-ignore-shared", true);
+
+file = g_strdup_printf("%s/migfile", tmpfs);
+uri = g_strdup_printf("exec:cat > %s", file);
+
+migrate_ensure_converge(from);
+migrate_qmp(from, uri, "{}");
+wait_for_migration_complete(from);
+
+pid = fork();
+if (!pid) {
+close(1);
+open("/dev/null", O_WRONLY);
+execl(python, python, ANALYZE_SCRIPT, "-f", file, NULL);
+g_assert_not_reached();
+}
+
+g_assert(waitpid(pid, &wstatus, 0) == pid);
+if (WIFEXITED(wstatus) && WEXITSTATUS(wstatus) != 0) {
+g_test_message("Failed to analyze the migration stream");
+g_test_fail();
+}
+test_migrate_end(from, to, false);
+cleanup("migfile");
+}
+#endif
+
 static void test_precopy_common(MigrateCommon *args)
 {
 QTestState *from, *to;
@@ -2837,6 +2894,9 @@ int main(int argc, char **argv)
 }
 
 qtest_add_func("/migration/bad_dest", test_baddest);
+#ifndef _WIN32
+qtest_add_func("/migration/analyze-script", test_analyze_script);
+#endif
 qtest_add_func("/migration/precopy/unix/plain", test_precopy_unix_plain);
 qtest_add_func("/migration/precopy/unix/xbzrle", test_precopy_unix_xbzrle);
 /*
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 66795cfcd2..d6022ebd64 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -357,6 +357,8 @@ foreach dir : target_dirs
 test_deps += [qsd]
   endif
 
+  qtest_env.set('PYTHON', python.full_path())
+
   foreach test : target_qtests
 # Executables are shared across targets, declare them only the first time we
 # encounter them
-- 
2.41.0

[PULL 00/38] Migration 20231016 patches

2023-10-16 Thread Juan Quintela

The following changes since commit 63011373ad22c794a013da69663c03f1297a5c56:

  Merge tag 'pull-riscv-to-apply-20231012-1' of https://github.com/alistair23/qemu into staging (2023-10-12 10:24:44 -0400)

are available in the Git repository at:

  https://gitlab.com/juan.quintela/qemu.git tags/migration-20231016-pull-request

for you to fetch changes up to f39b0f42753635b0f2d8b00a26d11bb197bf51e2:

  migration/multifd: Clarify Error usage in multifd_channel_connect (2023-10-16 11:01:33 +0200)


Migration Pull request (20231016)

In this pull request:
- rdma cleanups
- removal of QEMUFileHook
- test for analyze-migration.py
- test for multifd file
- multifd cleanups
- available switchover bandwidth
- lots of cleanups.

CI: https://gitlab.com/juan.quintela/qemu/-/pipelines/1037878829

Please, apply.



Dmitry Frolov (1):
  migration: fix RAMBlock add NULL check

Elena Ufimtseva (3):
  migration: check for rate_limit_max for RATE_LIMIT_DISABLED
  multifd: fix counters in multifd_send_thread
  multifd: reset next_packet_len after sending pages

Fabiano Rosas (13):
  migration: Fix analyze-migration.py 'configuration' parsing
  migration: Add capability parsing to analyze-migration.py
  migration: Fix analyze-migration.py when ignore-shared is used
  migration: Fix analyze-migration read operation signedness
  tests/qtest/migration: Add a test for the analyze-migration script
  tests/qtest: migration-test: Add tests for file-based migration
  migration/ram: Remove RAMState from xbzrle_cache_zero_page
  migration/ram: Stop passing QEMUFile around in save_zero_page
  migration/ram: Move xbzrle zero page handling into save_zero_page
  migration/ram: Merge save_zero_page functions
  migration/multifd: Remove direct "socket" references
  migration/multifd: Unify multifd_send_thread error paths
  migration/multifd: Clarify Error usage in multifd_channel_connect

Fiona Ebner (1):
  migration: hold the BQL during setup

Juan Quintela (15):
  migration: Non multifd migration don't care about multifd flushes
  migration: Create migrate_rdma()
  migration/rdma: Unfold ram_control_before_iterate()
  migration/rdma: Unfold ram_control_after_iterate()
  migration/rdma: Remove all uses of RAM_CONTROL_HOOK
  migration/rdma: Unfold hook_ram_load()
  migration/rdma: Create rdma_control_save_page()
  qemu-file: Remove QEMUFileHooks
  migration/rdma: Move rdma constants from qemu-file.h to rdma.h
  migration/rdma: Remove qemu_ prefix from exported functions
  migration/rdma: Check sooner if we are in postcopy for save_page()
  migration/rdma: Use i as for index instead of idx
  migration/rdma: Declare for index variables local
  migration/rdma: Remove all "ret" variables that are used only once
  migration: Improve json and formatting

Nikolay Borisov (2):
  migration: Add the configuration vmstate to the json writer
  migration/ram: Refactor precopy ram loading code

Peter Xu (1):
  migration: Allow user to specify available switchover bandwidth

Philippe Mathieu-Daudé (1):
  migration: Use g_autofree to simplify ram_dirty_bitmap_reload()

Wei Wang (1):
  migration: refactor migration_completion

 qapi/migration.json|  41 -
 include/migration/register.h   |   2 +-
 migration/migration.h  |   4 +-
 migration/options.h|   2 +
 migration/qemu-file.h  |  49 --
 migration/rdma.h   |  42 +
 migration/block-dirty-bitmap.c |   3 -
 migration/block.c  |   5 -
 migration/migration-hmp-cmds.c |  14 ++
 migration/migration-stats.c|   9 +-
 migration/migration.c  | 199 +
 migration/multifd.c| 101 +--
 migration/options.c|  35 
 migration/qemu-file.c  |  61 +--
 migration/ram.c| 306 ++---
 migration/rdma.c   | 259 
 migration/savevm.c |  22 ++-
 tests/qtest/migration-test.c   | 207 ++
 migration/trace-events |  33 ++--
 scripts/analyze-migration.py   |  67 +++-
 tests/qtest/meson.build|   2 +
 21 files changed, 895 insertions(+), 568 deletions(-)

-- 
2.41.0

[PULL 20/38] qemu-file: Remove QEMUFileHooks

2023-10-16 Thread Juan Quintela

The only user was rdma, and its use is gone.

Reviewed-by: Peter Xu 
Reviewed-by: Li Zhijian 
Signed-off-by: Juan Quintela 
Message-ID: <20231011203527.9061-8-quint...@redhat.com>
---
 migration/qemu-file.h | 4 
 migration/qemu-file.c | 6 --
 migration/rdma.c  | 9 -
 3 files changed, 19 deletions(-)

diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 60510a2819..0b22d8335f 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -36,12 +36,8 @@
 #define RAM_CONTROL_ROUND 1
 #define RAM_CONTROL_FINISH3
 
-typedef struct QEMUFileHooks {
-} QEMUFileHooks;
-
 QEMUFile *qemu_file_new_input(QIOChannel *ioc);
 QEMUFile *qemu_file_new_output(QIOChannel *ioc);
-void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks);
 int qemu_fclose(QEMUFile *f);
 
 /*
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 745eaf7a5b..3fb25148d1 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -38,7 +38,6 @@
 #define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 64)
 
 struct QEMUFile {
-const QEMUFileHooks *hooks;
 QIOChannel *ioc;
 bool is_writable;
 
@@ -133,11 +132,6 @@ QEMUFile *qemu_file_new_input(QIOChannel *ioc)
 return qemu_file_new_impl(ioc, false);
 }
 
-void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks)
-{
-f->hooks = hooks;
-}
-
 /*
  * Get last error for stream f with optional Error*
  *
diff --git a/migration/rdma.c b/migration/rdma.c
index f66bd939d7..9883b0a250 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -4003,13 +4003,6 @@ err:
 return -1;
 }
 
-static const QEMUFileHooks rdma_read_hooks = {
-};
-
-static const QEMUFileHooks rdma_write_hooks = {
-};
-
-
 static void qio_channel_rdma_finalize(Object *obj)
 {
 QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(obj);
@@ -4061,7 +4054,6 @@ static QEMUFile *rdma_new_input(RDMAContext *rdma)
 rioc->file = qemu_file_new_input(QIO_CHANNEL(rioc));
 rioc->rdmain = rdma;
 rioc->rdmaout = rdma->return_path;
-qemu_file_set_hooks(rioc->file, &rdma_read_hooks);
 
 return rioc->file;
 }
@@ -4073,7 +4065,6 @@ static QEMUFile *rdma_new_output(RDMAContext *rdma)
 rioc->file = qemu_file_new_output(QIO_CHANNEL(rioc));
 rioc->rdmaout = rdma;
 rioc->rdmain = rdma->return_path;
-qemu_file_set_hooks(rioc->file, &rdma_write_hooks);
 
 return rioc->file;
 }
-- 
2.41.0

[PULL 26/38] migration/rdma: Remove all "ret" variables that are used only once

2023-10-16 Thread Juan Quintela

Change code that is:

int ret;
...

ret = foo();
if (ret[ < 0]?) {

to:

if (foo()[ < 0]) {

Reviewed-by: Fabiano Rosas 
Reviewed-by: Li Zhijian 
Signed-off-by: Juan Quintela 
Message-ID: <20231011203527.9061-14-quint...@redhat.com>
---
 migration/rdma.c | 29 -
 1 file changed, 8 insertions(+), 21 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 09015fbd1a..2a1852ec7f 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1107,7 +1107,6 @@ err_alloc_pd_cq:
 static int qemu_rdma_alloc_qp(RDMAContext *rdma)
 {
 struct ibv_qp_init_attr attr = { 0 };
-int ret;
 
 attr.cap.max_send_wr = RDMA_SIGNALED_SEND_MAX;
 attr.cap.max_recv_wr = 3;
@@ -1117,8 +1116,7 @@ static int qemu_rdma_alloc_qp(RDMAContext *rdma)
 attr.recv_cq = rdma->recv_cq;
 attr.qp_type = IBV_QPT_RC;
 
-ret = rdma_create_qp(rdma->cm_id, rdma->pd, &attr);
-if (ret < 0) {
+if (rdma_create_qp(rdma->cm_id, rdma->pd, &attr) < 0) {
 return -1;
 }
 
@@ -1130,8 +1128,8 @@ static int qemu_rdma_alloc_qp(RDMAContext *rdma)
 static bool rdma_support_odp(struct ibv_context *dev)
 {
 struct ibv_device_attr_ex attr = {0};
-int ret = ibv_query_device_ex(dev, NULL, &attr);
-if (ret) {
+
+if (ibv_query_device_ex(dev, NULL, &attr)) {
 return false;
 }
 
@@ -1508,7 +1506,6 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
struct ibv_comp_channel *comp_channel)
 {
 struct rdma_cm_event *cm_event;
-int ret;
 
 /*
  * Coroutine doesn't start until migration_fd_process_incoming()
@@ -1544,8 +1541,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
 }
 
 if (pfds[1].revents) {
-ret = rdma_get_cm_event(rdma->channel, &cm_event);
-if (ret < 0) {
+if (rdma_get_cm_event(rdma->channel, &cm_event) < 0) {
 return -1;
 }
 
@@ -2317,12 +2313,10 @@ static int qemu_rdma_write(RDMAContext *rdma,
 uint64_t current_addr = block_offset + offset;
 uint64_t index = rdma->current_index;
 uint64_t chunk = rdma->current_chunk;
-int ret;
 
 /* If we cannot merge it, we flush the current buffer first. */
 if (!qemu_rdma_buffer_mergeable(rdma, current_addr, len)) {
-ret = qemu_rdma_write_flush(rdma, errp);
-if (ret < 0) {
+if (qemu_rdma_write_flush(rdma, errp) < 0) {
 return -1;
 }
 rdma->current_length = 0;
@@ -2936,7 +2930,6 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
 static int qemu_rdma_drain_cq(RDMAContext *rdma)
 {
 Error *err = NULL;
-int ret;
 
 if (qemu_rdma_write_flush(rdma, &err) < 0) {
 error_report_err(err);
@@ -2944,8 +2937,7 @@ static int qemu_rdma_drain_cq(RDMAContext *rdma)
 }
 
 while (rdma->nb_sent) {
-ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_RDMA_WRITE, NULL);
-if (ret < 0) {
+if (qemu_rdma_block_for_wrid(rdma, RDMA_WRID_RDMA_WRITE, NULL) < 0) {
 error_report("rdma migration: complete polling error!");
 return -1;
 }
@@ -3323,12 +3315,10 @@ static void rdma_accept_incoming_migration(void *opaque);
 static void rdma_cm_poll_handler(void *opaque)
 {
 RDMAContext *rdma = opaque;
-int ret;
 struct rdma_cm_event *cm_event;
 MigrationIncomingState *mis = migration_incoming_get_current();
 
-ret = rdma_get_cm_event(rdma->channel, &cm_event);
-if (ret < 0) {
+if (rdma_get_cm_event(rdma->channel, &cm_event) < 0) {
 error_report("get_cm_event failed %d", errno);
 return;
 }
@@ -4053,14 +4043,11 @@ static QEMUFile *rdma_new_output(RDMAContext *rdma)
 static void rdma_accept_incoming_migration(void *opaque)
 {
 RDMAContext *rdma = opaque;
-int ret;
 QEMUFile *f;
 Error *local_err = NULL;
 
 trace_qemu_rdma_accept_incoming_migration();
-ret = qemu_rdma_accept(rdma);
-
-if (ret < 0) {
+if (qemu_rdma_accept(rdma) < 0) {
 error_report("RDMA ERROR: Migration initialization failed");
 return;
 }
-- 
2.41.0
