date:20250225

Re: [PATCH v2 02/10] loongarch: reset vcpu after it's created

2025-02-25 Thread bibo mao





On 2025/2/8 上午12:20, Igor Mammedov wrote:

Reseting vcpu before its thread is created, caused various issues in the past
for other targets. It doesn't cause issues for loongarch at the moment but
to be consistent with the rest of targets, move reset during realize time
after qemu_init_vcpu().

That basically prevents reset being run when when vCPU is in incositent state
(i.e. accelerator hasn't initialized vCPU yet).

Signed-off-by: Igor Mammedov 
---
CC: Song Gao 
---
  target/loongarch/cpu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index e91f4a5239..15018d43ae 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -634,8 +634,8 @@ static void loongarch_cpu_realizefn(DeviceState *dev, Error 
**errp)
  
  loongarch_cpu_register_gdb_regs_for_features(cs);
  
-cpu_reset(cs);

  qemu_init_vcpu(cs);
+cpu_reset(cs);
  
  lacc->parent_realize(dev, errp);

  }

yes, that is actually a problem in kvm mode. vCPU thread should be 
created and kvm_fd for the vCPU need valid before cpu_reset().


Reviewed-by: Bibo Mao

Re: [PATCH] hw/i386: introduce x86_firmware_reconfigure api

2025-02-25 Thread Gerd Hoffmann

On Tue, Feb 25, 2025 at 09:10:40AM +0530, Ani Sinha wrote:
> On Mon, Feb 24, 2025 at 9:01 PM Gerd Hoffmann  wrote:
> >
> >   Hi,
> >
> > >  /* should only be called once */
> > > -if (ovmf_flash_parsed) {
> > > +if (ovmf_flash_parsed && !force) {
> >
> > I think it makes more sense to clear ovmf_flash_parsed when replacing
> > the firmware (instead of adding the force override).
> 
> I thought about that but wondered if that is an internal
> variable/implementation specific detail which should not be exposed
> outside.

You don't have to expose the variable directly, you can also provide a
helper function to clear it.  Best with a name which documents the
purpose.

take care,
  Gerd

RE: [PATCH v3 00/28] Support AST2700 A1

2025-02-25 Thread Jamin Lin

Hi Cedric, 

> Cc: Troy Lee 
> Subject: Re: [PATCH v3 00/28] Support AST2700 A1
> 
> On 2/20/25 06:11, Jamin Lin wrote:
> > Hi Cedric,
> >
> >> Subject: Re: [PATCH v3 00/28] Support AST2700 A1
> >>
> >> Hello Jamin,
> >>
> >>
> >> On 2/13/25 04:35, Jamin Lin wrote:
> >>> v1:
> >>>1. Refactor INTC model to support both INTC0 and INTC1.
> >>>2. Support AST2700 A1.
> >>>3. Create ast2700a0-evb machine.
> >>>
> >>> v2:
> >>> To streamline the review process, split the following patch series 
> >>> into
> >>> three parts.
> >>>
> >> https://patchwork.kernel.org/project/qemu-devel/cover/20250121070424.
> >> 246
> >> 5942-1-jamin_...@aspeedtech.com/
> >>> This patch series focuses on cleaning up the INTC model to
> >>> facilitate future support for the INTC_IO model.
> >>>
> >>> v3:
> >>>1. Update and add functional test for AST2700
> >>>2. Add AST2700 INTC design guidance and its block diagram.
> >>>3. Retaining the INTC naming and introducing a new INTCIO model
> >>> to
> >> support the AST2700 A1.
> >>>4. Create ast2700a1-evb machine and rename ast2700a0-evb
> machine
> >>>5. Fix silicon revision issue and support AST2700 A1.
> >>>
> >>> With the patch applied, QEMU now supports two machines for running
> >> AST2700 SoCs:
> >>> ast2700a0-evb: Designed for AST2700 A0
> >>> ast2700a1-evb: Designed for AST2700 A1
> >>>
> >>> Test information
> >>> 1. QEMU version:
> >>>
> >>
> https://github.com/qemu/qemu/commit/ffaf7f0376f8040ce9068d71ae9ae872
> >> 25
> >>> 05c42e
> >>> 2. ASPEED SDK v09.05 pre-built image
> >>>  https://github.com/AspeedTech-BMC/openbmc/releases/tag/v09.05
> >>>  ast2700-default-obmc.tar.gz (AST2700 A1)
> >>>
> >>
> https://github.com/AspeedTech-BMC/openbmc/releases/download/v09.05/as
> >> t
> >> 2700-default-obmc.tar.gz
> >>>  ast2700-a0-default-obmc.tar.gz (AST2700 A0)
> >>>
> >>>
> >>
> https://github.com/AspeedTech-BMC/openbmc/releases/download/v09.05/as
> >> t
> >>> 2700-a0-default-obmc.tar.gz
> >>
> >> The part adding new functional tests needs a rework. See comment.
> >>
> >>> Known Issue:
> >>> The HACE crypto and hash engine is enable by default since AST2700 A1.
> >>> However, aspeed_hace.c(HACE model) currently does not support the
> >> CRYPTO command.
> >>> To boot AST2700 A1, I have created a Patch 21 which temporarily
> >>> resolves the issue by sending an interrupt to notify the firmware
> >>> that the cryptographic command has completed. It is a temporary
> >>> workaround to resolve the boot issue in the Crypto Manager SelfTest.
> >>>
> >>> As a result, you will encounter the following kernel warning due to
> >>> the Crypto Manager test failure. If you don't want to see these
> >>> kernel warning, please add the following settings in your kernel config.
> >>>
> >>> ```
> >>> CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y
> >>> ```
> >>
> >> Would it be possible to send the hace changes in its own series ?
> >>
> >>
> > Currently, the HACE HW engine and crypto self-tests are enabled by
> > default in SDK v09.05 FW. To boot QEMU for AST2700 A1 with the SDK
> > v09.05 pre-built image, a CRYPTO workaround patch is required.
> 
> can we upstream the HACE changes before having full AST2700 A1 support ?
> 

I send HACE patch here, 
https://patchwork.kernel.org/project/qemu-devel/cover/20250225075622.305515-1-jamin_...@aspeedtech.com/
Thanks-Jamin

> Thanks,
> 
> C.
> 
> 
> 
> 
> >
> > The AST2700 A1 patch series includes functional tests. To make the
> > functional tests pass for AST2700 A1, I have included the HACE patch in the
> same patch series.
> >
> > There are two ways to split this patch series:
> >
> > Solution A:
> >
> > 1. Create series 1 to support AST2700 A1.
> > 2. Create series 2 to support HACE.
> > 3. Create series 3 to support AST2700 A1 functional tests.
> >
> > Series 3 should depend on series 1 and 2.
> >
> > Solution B:
> >
> > 1. Place a pre-built image called "ast2700-a1-qemu-disable-self-test" in the
> SDK v09.05 Github repository.
> > https://github.com/AspeedTech-BMC/openbmc/releases/tag/v09.05
> > 2. Create one patch series to support AST2700 A1 with its functional tests.
> > 3. Create series 2 to support HACE.
> >
> > Could you tell me which solution you prefer or could you please give me any
> suggestion?
> >
> > Thanks-Jamin
> >
> >>>
> >>> Jamin Lin (28):
> >>> hw/intc/aspeed: Support setting different memory and register size
> >>> hw/intc/aspeed: Introduce helper functions for enable and status
> >>>   registers
> >>> hw/intc/aspeed: Add object type name to trace events for better
> >>>   debugging
> >>> hw/arm/aspeed: Rename IRQ table and machine name for AST2700
> A0
> >>> hw/arm/aspeed_ast27x0: Sort the IRQ table by IRQ number
> >>> hw/intc/aspeed: Support different memory region ops
> >>> hw/intc/aspeed: Rename num_ints to num_inpins for clarity
> >>> hw/intc/aspeed: Add support for multiple output pins in INTC
> >>> hw/intc/aspeed: Refactor INTC to suppo

Re: [PATCH 09/15] rust: irq: wrap IRQState with Opaque<>

2025-02-25 Thread Zhao Liu

> +/// An opaque wrapper around [`bindings::IRQState`].
> +#[repr(transparent)]
> +#[derive(Debug, qemu_api_macros::Wrapper)]
> +pub struct IRQState(Opaque);
> +
>  /// Interrupt sources are used by devices to pass changes to a value 
> (typically
>  /// a boolean).  The interrupt sink is usually an interrupt controller or
>  /// GPIO controller.
> @@ -22,8 +28,7 @@
>  /// method sends a `true` value to the sink.  If the guest has to see a
>  /// different polarity, that change is performed by the board between the
>  /// device and the interrupt controller.
> -pub type IRQState = bindings::IRQState;
> -
> +///
>  /// Interrupts are implemented as a pointer to the interrupt "sink", which 
> has
>  /// type [`IRQState`].  A device exposes its source as a QOM link property 
> using
>  /// a function such as [`SysBusDeviceMethods::init_irq`], and
> @@ -41,7 +46,7 @@ pub struct InterruptSource
>  where
>  c_int: From,
>  {
> -cell: BqlCell<*mut IRQState>,
> +cell: BqlCell<*mut bindings::IRQState>,

Once we've already wrapper IRQState in Opaque<>, should we still use
bindings::IRQState?

Although InterruptSource just stores a pointer. However, I think we can
use wrapped IRQState here instead of the native binding type, since this
item is also crossing the FFI boundary. What do you think?

>  _marker: PhantomData,
>  }
>

Re: [PATCH 05/10] qapi/parser: adjust info location for doc body section

2025-02-25 Thread Markus Armbruster

John Snow  writes:

> Instead of using the info object for the doc block as a whole (which
> always points to the very first line of the block), update the info
> pointer for each call to ensure_untagged_section when the existing
> section is otherwise empty. This way, Sphinx error information will
> match precisely to where the text actually starts.
>
> For example, this patch will move the info pointer for the "Hello!"
> untagged section ...
>
>> ##   <-- from here ...
>> # Hello! <-- ... to here.
>> ##
>
> Signed-off-by: John Snow 

This patch would be easier to accept with a test case where it improves
the error location.  I tried to construct one quickly, but failed.  Can
you help?

Re: [PATCH 10/15] rust: qom: wrap Object with Opaque<>

2025-02-25 Thread Zhao Liu

> @@ -621,7 +629,7 @@ pub trait ObjectImpl: ObjectType + IsA {
>  /// We expect the FFI user of this function to pass a valid pointer that
>  /// can be downcasted to type `T`. We also expect the device is
>  /// readable/writeable from one thread at any time.
> -unsafe extern "C" fn rust_unparent_fn(dev: *mut Object) {
> +unsafe extern "C" fn rust_unparent_fn(dev: *mut 
> bindings::Object) {
>  let state = NonNull::new(dev).unwrap().cast::();
>  T::UNPARENT.unwrap()(unsafe { state.as_ref() });
>  }
> @@ -796,8 +804,9 @@ fn new() -> Owned {
>  // SAFETY: the object created by object_new is allocated on
>  // the heap and has a reference count of 1
>  unsafe {
> -let obj = &*object_new(Self::TYPE_NAME.as_ptr());
> -Owned::from_raw(obj.unsafe_cast::())
> +let obj = object_new(Self::TYPE_NAME.as_ptr());

Having the same name is always not ideal, so let's name the first one raw_obj.

> +let obj = Object::from_raw(obj).unsafe_cast::();
> +Owned::from_raw(obj)
>  }
>  }
>  }

Others look good to me,

Reviewed-by: Zhao Liu 

>

Re: [PATCH 09/15] rust: irq: wrap IRQState with Opaque<>

2025-02-25 Thread Paolo Bonzini


On 2/25/25 09:26, Zhao Liu wrote:

+/// An opaque wrapper around [`bindings::IRQState`].
+#[repr(transparent)]
+#[derive(Debug, qemu_api_macros::Wrapper)]
+pub struct IRQState(Opaque);
+
  /// Interrupt sources are used by devices to pass changes to a value 
(typically
  /// a boolean).  The interrupt sink is usually an interrupt controller or
  /// GPIO controller.
@@ -22,8 +28,7 @@
  /// method sends a `true` value to the sink.  If the guest has to see a
  /// different polarity, that change is performed by the board between the
  /// device and the interrupt controller.
-pub type IRQState = bindings::IRQState;
-
+///
  /// Interrupts are implemented as a pointer to the interrupt "sink", which has
  /// type [`IRQState`].  A device exposes its source as a QOM link property 
using
  /// a function such as [`SysBusDeviceMethods::init_irq`], and
@@ -41,7 +46,7 @@ pub struct InterruptSource
  where
  c_int: From,
  {
-cell: BqlCell<*mut IRQState>,
+cell: BqlCell<*mut bindings::IRQState>,


Once we've already wrapper IRQState in Opaque<>, should we still use
bindings::IRQState?

Although InterruptSource just stores a pointer. However, I think we can
use wrapped IRQState here instead of the native binding type, since this
item is also crossing the FFI boundary. What do you think?


Using the wrapped IRQState would be a bit more code because you have to 
cast the pointer all the time when calling C code.  As far as 
correctness is concerned, it does not really matter because as you said 
it only stores a pointer.


However, if needed, InterruptSource could have a function that converts 
from IRQState to Option<&Opaque>.  Since the accessor is 
needed anyway, that would be a good place to put the conversion.


Paolo

Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support

2025-02-25 Thread Gerd Hoffman

On Tue, Feb 25, 2025 at 10:51:08AM +0530, Ani Sinha wrote:
> On Mon, Feb 24, 2025 at 9:17 PM Gerd Hoffman  wrote:
> >
> > Works nicely for me.  Test case:
> >   https://kraxel.gitlab.io/uefi-tools-rs/seabios.efi
> 
> yeah if I can't get my unit test working we can make this an
> integration test. or best case scenario, we can have both.

Do you have a branch with your unit test somewhere?

> > Also this adds five fw_cfg files.  Given that the number of fw_cfg files
> > we can have is limited I'm not convinced this is the best idea to move
> > forward.
> 
> Right, For implementation, I suggest we combine FILE_VMFWUPDATE_OBLOB
> and FILE_VMFWUPDATE_FWBLOB together and also make
> FILE_VMFWUPDATE_CONTROL part of the same structure. These are all
> writable by the guest so makes sense to have one fwcfg for it. We can
> have another read-only fwcfg for FILE_VMFWUPDATE_BIOS_SIZE and
> FILE_VMFWUPDATE_CAP.

I'd prefer to put everything into one file.  Maybe except the opaque
blob page.  A struct along the lines of:

struct vmfwupdate {
u64 capabilities;   // 'cap' file
u64 firmware_size;  // 'bios-size' file
u64 control;// disable bit etc. 
u64 update_paddr;   // 'fw-blob' file, paddr field
u64 update_size;// 'fw-blob' file, size field
}

On the host side you'll need two copies of the struct then: one where
the guest can read from and write to, and one shadow struct where the
actual values are stored.  The write callback goes sanity-check the
guest-written data, takes everything which passes into the shadow
struct, finally goes copy back the shadow struct to the guest struct
so the guest can see what the host has accepted.

Part of the verification process can be that you already copy the
firmware to a host buffer.  Best using dma_* functions, these return
errors in case something went wrong (say the paddr passed is not backed
by guest ram).  Which gives you the chance to propagate those errors
back to the guest before it'll actually try to launch the updated
firmware via reset.

take care,
  Gerd

Re: [PATCH 12/15] rust: sysbus: wrap SysBusDevice with Opaque<>

2025-02-25 Thread Zhao Liu

On Fri, Feb 21, 2025 at 06:03:39PM +0100, Paolo Bonzini wrote:
> Date: Fri, 21 Feb 2025 18:03:39 +0100
> From: Paolo Bonzini 
> Subject: [PATCH 12/15] rust: sysbus: wrap SysBusDevice with Opaque<>
> X-Mailer: git-send-email 2.48.1
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  rust/hw/timer/hpet/src/hpet.rs |  2 +-
>  rust/qemu-api/src/bindings.rs  |  3 ---
>  rust/qemu-api/src/sysbus.rs| 25 ++---
>  3 files changed, 19 insertions(+), 11 deletions(-)
> 
> diff --git a/rust/hw/timer/hpet/src/hpet.rs b/rust/hw/timer/hpet/src/hpet.rs
> index be27eb0eff4..19e63465cff 100644
> --- a/rust/hw/timer/hpet/src/hpet.rs
> +++ b/rust/hw/timer/hpet/src/hpet.rs
> @@ -741,7 +741,7 @@ fn reset_hold(&self, _type: ResetType) {
>  HPETFwConfig::update_hpet_cfg(
>  self.hpet_id.get(),
>  self.capability.get() as u32,
> -sbd.mmio[0].addr,
> +unsafe { *sbd.as_ptr() }.mmio[0].addr,

We can wrap this unsafe code into SysBusDeviceMethods...then the device
won't introduce more unsafe code.

>  );
>  
>  // to document that the RTC lowers its output on reset as well

Others, fine for me,

Reviewed-by: Zhao Liu

Re: [PATCH v2 0/2] Emulated AMD IOMMU cleanup and fixes

2025-02-25 Thread Michael Tokarev


07.02.2025 07:53, Sairaj Kodilkar wrote:

This series provides few bug fixes for emulated AMD IOMMU. The series is based
on top of qemu upstream master commit d922088eb4ba.

Patch 1: The code was using wrong DTE field to determine interrupt passthrough.
  Hence replaced it with correct field according to [1].

Patch 2: Current code sets the PCI capability BAR low and high to the
  lower and upper 16 bits of AMDVI_BASE_ADDR respectively, which is
  wrong. Instead use 32 bit mask to set the PCI capability BAR low and
  high.
  The guest IOMMU driver works with current qemu code because it uses
  base address from the IVRS table and not the one provided by
  PCI capability.

Sairaj Kodilkar (2):
   amd_iommu: Use correct DTE field for interrupt passthrough
   amd_iommu: Use correct bitmask to set capability BAR

  hw/i386/amd_iommu.c | 10 +-
  hw/i386/amd_iommu.h |  2 +-
  2 files changed, 6 insertions(+), 6 deletions(-)


Is this qemu-stable material (current series: 7.2, 8.2, 9.2)?

3684717b74 "amd_iommu: Use correct bitmask to set capability BAR" does
not apply to 7.2, since v8.0.0-10-g6291a28645 "hw/i386/amd_iommu: Explicit
use of AMDVI_BASE_ADDR in amdvi_init" in not in 7.2, but the change can be
adjusted for 7.2 easily, or 6291a28645 can be picked up too.

Thanks,

/mjt

Re: [PATCH v2 0/9] target/loongarch: LoongArch32 fixes 1

2025-02-25 Thread bibo mao





On 2025/2/25 上午8:40, Jiaxun Yang wrote:

Hi all,

This series is a collection of small fixes I made to TCG for
LoongArch32.

There are still many thing broken, especially on CSRs. More
series following. However this is sufficient to boot 32bit
kernel.
Is there any product introduction about LoongArch32 board? such as MMU 
type, memory type(DDR or SRAM), interrupt controller type.


Regards
Bibo Mao


Thanks for revivewing!

Signed-off-by: Jiaxun Yang 
---
Changes in v2:
- Addressing minor review comments
- Don't create 32bit vairant, simply allow 32bit CPU on qemu-loongarch64
- Link to v1: 
https://lore.kernel.org/r/20241222-la32-fixes1-v1-0-8c62b7e59...@flygoat.com

---
Jiaxun Yang (9):
   target/loongarch: Enable rotr.w/rotri.w for LoongArch32
   target/loongarch: Fix address generation for gen_sc
   target/loongarch: Fix PGD CSR for LoongArch32
   target/loongarch: Perform sign extension for IOCSR reads
   target/loongarch: Use target_ulong for iocsrrd helper results
   target/loongarch: Fix some modifiers for log formatting
   target/loongarch: Use target_ulong for CSR helpers
   target/loongarch: Fix load type for gen_ll
   target/loongarch: Introduce max32 CPU type

  target/loongarch/cpu.c | 152 +
  target/loongarch/helper.h  |  22 +--
  target/loongarch/tcg/csr_helper.c  |   2 +-
  target/loongarch/tcg/insn_trans/trans_atomic.c.inc |   8 +-
  target/loongarch/tcg/insn_trans/trans_shift.c.inc  |   4 +-
  target/loongarch/tcg/iocsr_helper.c|  20 +--
  target/loongarch/tcg/op_helper.c   |   4 +-
  target/loongarch/tcg/tlb_helper.c  |   2 +-
  target/loongarch/tcg/translate.c   |   5 +-
  9 files changed, 155 insertions(+), 64 deletions(-)
---
base-commit: 65cb7129f4160c7e07a0da107f888ec73ae96776
change-id: 20241222-la32-fixes1-368cc14d0986

Best regards,

Re: [PATCH 11/15] rust: qdev: wrap Clock and DeviceState with Opaque<>

2025-02-25 Thread Zhao Liu

> -unsafe extern "C" fn rust_realize_fn(dev: *mut DeviceState, 
> _errp: *mut *mut Error) {
> +unsafe extern "C" fn rust_realize_fn(
> +dev: *mut bindings::DeviceState,
> +_errp: *mut *mut Error,
> +) {
>  let state = NonNull::new(dev).unwrap().cast::();
>  T::REALIZE.unwrap()(unsafe { state.as_ref() });
>  }
> @@ -251,7 +270,7 @@ fn init_clock_in FnCall<(&'a Self::Target, 
> ClockEvent)>>(
>  events: ClockEvent,
>  ) -> Owned {
>  fn do_init_clock_in(
> -dev: *mut DeviceState,
> +dev: &DeviceState,
>  name: &str,
>  cb: Option,
>  events: ClockEvent,
> @@ -265,14 +284,15 @@ fn do_init_clock_in(
>  unsafe {
>  let cstr = CString::new(name).unwrap();
>  let clk = bindings::qdev_init_clock_in(
> -dev,
> +dev.0.as_mut_ptr(),

This can be simplfied as dev.as_mut_ptr().

>  cstr.as_ptr(),
>  cb,
> -dev.cast::(),
> +dev.0.as_void_ptr(),

If Wrapper provides as_void_ptr(), then this can also be simplified.

>  events.0,
>  );
>  
> -Owned::from(&*clk)
> +let clk: &Clock = Clock::from_raw(clk);
> +Owned::from(clk)
>  }
>  }

LGTM,

Reviewed-by: Zhao Liu

Re: [PATCH 13/15] rust: memory: wrap MemoryRegion with Opaque<>

2025-02-25 Thread Zhao Liu

>  impl MemoryRegion {
>  // inline to ensure that it is not included in tests, which only
> @@ -174,13 +174,15 @@ pub fn init_io>(
>  size: u64,
>  ) {
>  unsafe {
> -Self::do_init_io(&mut self.inner, owner.cast::(), 
> &ops.0, name, size);
> +Self::do_init_io(
> +self.0.as_mut_ptr(),

I'm not sure why the wrapper doesn't work here.

If I change this to `self.as_mut_ptr()`, then compiler tries to apply
`ObjectDeref::as_mut_ptr` and complains:

the trait `qom::ObjectType` is not implemented for `bindings::MemoryRegion`


But when I modify the function signature to &self like:

diff --git a/rust/qemu-api/src/memory.rs b/rust/qemu-api/src/memory.rs
index fdb1ea11fcf9..a82348c4a564 100644
--- a/rust/qemu-api/src/memory.rs
+++ b/rust/qemu-api/src/memory.rs
@@ -167,7 +167,7 @@ unsafe fn do_init_io(
 }

 pub fn init_io>(
-&mut self,
+&self,
 owner: *mut T,
 ops: &'static MemoryRegionOps,
 name: &'static str,

Then the Wrapper's as_mut_ptr() can work.

> +owner.cast::(),
> +&ops.0,
> +name,
> +size,
> +);
>  }
>  }
> -
> -pub(crate) const fn as_mut_ptr(&self) -> *mut bindings::MemoryRegion {
> -addr_of!(self.inner) as *mut _
> -}
>  }
>  
>  unsafe impl ObjectType for MemoryRegion {
> -- 
> 2.48.1
> 
>

Re: [PATCH v7 05/52] i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES

2025-02-25 Thread Xiaoyao Li


On 2/19/2025 3:21 AM, Francesco Lavra wrote:

On Fri, 24 Jan 2025 08:20:01 -0500, Xiaoyao Li wrote:

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 4ff94860815d..bd212abab865 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -10,17 +10,122 @@
   */
  
  #include "qemu/osdep.h"

+#include "qemu/error-report.h"
+#include "qapi/error.h"
  #include "qom/object_interfaces.h"
  
  #include "hw/i386/x86.h"

  #include "kvm_i386.h"
  #include "tdx.h"
  
+static struct kvm_tdx_capabilities *tdx_caps;


Instead of a static variable, this should be a member of the TdxGuest
struct.


I don't think so.

tdx_caps is reported from KVM, which indicates what XFAM/Attributes and 
configurable CPUID bits that can be configured for a TD under the KVM.


It's not the specific properties of the TD.

So I would keep it as it.

Re: [RFC 1/2] system/memory: Allow creating IOMMU mappings from RAM discard populate notifiers

2025-02-25 Thread David Hildenbrand


On 25.02.25 03:00, Chenyi Qiang wrote:



On 2/21/2025 6:04 PM, Chenyi Qiang wrote:



On 2/21/2025 4:09 PM, David Hildenbrand wrote:

On 21.02.25 03:25, Chenyi Qiang wrote:



On 2/21/2025 3:39 AM, David Hildenbrand wrote:

On 20.02.25 17:13, Jean-Philippe Brucker wrote:

For Arm CCA we'd like the guest_memfd discard notifier to call the
IOMMU
notifiers and create e.g. VFIO mappings. The default VFIO discard
notifier isn't sufficient for CCA because the DMA addresses need a
translation (even without vIOMMU).

At the moment:
* guest_memfd_state_change() calls the populate() notifier
* the populate notifier() calls IOMMU notifiers
* the IOMMU notifier handler calls memory_get_xlat_addr() to get a VA
* it calls ram_discard_manager_is_populated() which fails.

guest_memfd_state_change() only changes the section's state after
calling the populate() notifier. We can't easily invert the order of
operation because it uses the old state bitmap to know which pages need
the populate() notifier.


I assume we talk about this code: [1]

[1] https://lkml.kernel.org/r/20250217081833.21568-1-
chenyi.qi...@intel.com


+static int memory_attribute_state_change(MemoryAttributeManager *mgr,
uint64_t offset,
+ uint64_t size, bool
shared_to_private)
+{
+    int block_size = memory_attribute_manager_get_block_size(mgr);
+    int ret = 0;
+
+    if (!memory_attribute_is_valid_range(mgr, offset, size)) {
+    error_report("%s, invalid range: offset 0x%lx, size 0x%lx",
+ __func__, offset, size);
+    return -1;
+    }
+
+    if ((shared_to_private && memory_attribute_is_range_discarded(mgr,
offset, size)) ||
+    (!shared_to_private && memory_attribute_is_range_populated(mgr,
offset, size))) {
+    return 0;
+    }
+
+    if (shared_to_private) {
+    memory_attribute_notify_discard(mgr, offset, size);
+    } else {
+    ret = memory_attribute_notify_populate(mgr, offset, size);
+    }
+
+    if (!ret) {
+    unsigned long first_bit = offset / block_size;
+    unsigned long nbits = size / block_size;
+
+    g_assert((first_bit + nbits) <= mgr->bitmap_size);
+
+    if (shared_to_private) {
+    bitmap_clear(mgr->shared_bitmap, first_bit, nbits);
+    } else {
+    bitmap_set(mgr->shared_bitmap, first_bit, nbits);
+    }
+
+    return 0;
+    }
+
+    return ret;
+}

Then, in memory_attribute_notify_populate(), we walk the bitmap again.

Why?

We just checked that it's all in the expected state, no?


virtio-mem doesn't handle it that way, so I'm curious why we would have
to do it here?


I was concerned about the case where the guest issues a request that
only partial of the range is in the desired state.
I think the main problem is the policy for the guest conversion request.
My current handling is:

1. When a conversion request is made for a range already in the desired
    state, the helper simply returns success.


Yes.


2. For requests involving a range partially in the desired state, only
    the necessary segments are converted, ensuring the entire range
    complies with the request efficiently.



Ah, now I get:

+    if ((shared_to_private && memory_attribute_is_range_discarded(mgr,
offset, size)) ||
+    (!shared_to_private && memory_attribute_is_range_populated(mgr,
offset, size))) {
+    return 0;
+    }
+

We're not failing if it might already partially be in the other state.


3. In scenarios where a conversion request is declined by other systems,
    such as a failure from VFIO during notify_populate(), the helper will
    roll back the request, maintaining consistency.

And the policy of virtio-mem is to refuse the state change if not all
blocks are in the opposite state.


Yes.



Actually, this part is still a uncertain to me.



IIUC, the problem does not exist if we only convert a single page at a
time.

Is there a known use case where such partial conversions could happen?


I don't see such case yet. Actually, I'm trying to follow the behavior
of KVM_SET_MEMORY_ATTRIBUTES ioctl during page conversion. In KVM, it
doesn't reject the request if the whole range isn't in the opposite
state. It just uses xa_store() to update it. Also, I don't see the spec
says how to handle such case. To be robust, I just allow this special case.




BTW, per the status/bitmap track, the virtio-mem also changes the bitmap
after the plug/unplug notifier. This is the same, correct?

Right. But because we reject these partial requests, we don't have to
traverse the bitmap and could just adjust the bitmap operations.


Yes, If we treat it as a guest error/bug, we can adjust it.


Hi David, do you think which option is better? If prefer to reject the
partial requests, I'll change it in my next version.


Hi,

still scratching my head. Having to work around it as in this patch here is
suboptimal.

Could we simplify the whole thing while still allowing for (unexpected) partial
conversions?

Essentially

Re: [PATCH 09/15] rust: irq: wrap IRQState with Opaque<>

2025-02-25 Thread Zhao Liu

On Tue, Feb 25, 2025 at 09:28:52AM +0100, Paolo Bonzini wrote:
> Date: Tue, 25 Feb 2025 09:28:52 +0100
> From: Paolo Bonzini 
> Subject: Re: [PATCH 09/15] rust: irq: wrap IRQState with Opaque<>
> 
> On 2/25/25 09:26, Zhao Liu wrote:
> > > +/// An opaque wrapper around [`bindings::IRQState`].
> > > +#[repr(transparent)]
> > > +#[derive(Debug, qemu_api_macros::Wrapper)]
> > > +pub struct IRQState(Opaque);
> > > +
> > >   /// Interrupt sources are used by devices to pass changes to a value 
> > > (typically
> > >   /// a boolean).  The interrupt sink is usually an interrupt controller 
> > > or
> > >   /// GPIO controller.
> > > @@ -22,8 +28,7 @@
> > >   /// method sends a `true` value to the sink.  If the guest has to see a
> > >   /// different polarity, that change is performed by the board between 
> > > the
> > >   /// device and the interrupt controller.
> > > -pub type IRQState = bindings::IRQState;
> > > -
> > > +///
> > >   /// Interrupts are implemented as a pointer to the interrupt "sink", 
> > > which has
> > >   /// type [`IRQState`].  A device exposes its source as a QOM link 
> > > property using
> > >   /// a function such as [`SysBusDeviceMethods::init_irq`], and
> > > @@ -41,7 +46,7 @@ pub struct InterruptSource
> > >   where
> > >   c_int: From,
> > >   {
> > > -cell: BqlCell<*mut IRQState>,
> > > +cell: BqlCell<*mut bindings::IRQState>,
> > 
> > Once we've already wrapper IRQState in Opaque<>, should we still use
> > bindings::IRQState?
> > 
> > Although InterruptSource just stores a pointer. However, I think we can
> > use wrapped IRQState here instead of the native binding type, since this
> > item is also crossing the FFI boundary. What do you think?
> 
> Using the wrapped IRQState would be a bit more code because you have to cast
> the pointer all the time when calling C code.  As far as correctness is
> concerned, it does not really matter because as you said it only stores a
> pointer.

Yes, it makes sense. This conversion doesn't block the current patch. The
correctness has been guaranteed.

> However, if needed, InterruptSource could have a function that converts from
> IRQState to Option<&Opaque>.  Since the accessor is needed
> anyway, that would be a good place to put the conversion. 

Then I understand we still need `assert!(bql_locked())` in assessor as
the doc said: "it is possible to assert in the code that the right lock
is taken, to use it together with a custom lock guard type, or to let C
code take the lock, as appropriate."

Re: [PATCH v3 03/14] acpi/ghes: Use HEST table offsets when preparing GHES records

2025-02-25 Thread Igor Mammedov

On Fri, 21 Feb 2025 07:02:21 +0100
Mauro Carvalho Chehab  wrote:

> Em Mon, 3 Feb 2025 15:34:23 +0100
> Igor Mammedov  escreveu:
> 
> > On Fri, 31 Jan 2025 18:42:44 +0100
> > Mauro Carvalho Chehab  wrote:
> >   
> > > There are two pointers that are needed during error injection:
> > > 
> > > 1. The start address of the CPER block to be stored;
> > > 2. The address of the ack.
> > > 
> > > It is preferable to calculate them from the HEST table.  This allows
> > > checking the source ID, the size of the table and the type of the
> > > HEST error block structures.
> > > 
> > > Yet, keep the old code, as this is needed for migration purposes.
> > > 
> > > Signed-off-by: Mauro Carvalho Chehab 
> > > ---
> > >  hw/acpi/ghes.c | 132 -
> > >  include/hw/acpi/ghes.h |   1 +
> > >  2 files changed, 119 insertions(+), 14 deletions(-)
> > > 
> > > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > > index 27478f2d5674..8f284fd191a6 100644
> > > --- a/hw/acpi/ghes.c
> > > +++ b/hw/acpi/ghes.c
> > > @@ -41,6 +41,12 @@
> > >  /* Address offset in Generic Address Structure(GAS) */
> > >  #define GAS_ADDR_OFFSET 4
> > >  
> > > +/*
> > > + * ACPI spec 1.0b
> > > + * 5.2.3 System Description Table Header
> > > + */
> > > +#define ACPI_DESC_HEADER_OFFSET 36
> > > +
> > >  /*
> > >   * The total size of Generic Error Data Entry
> > >   * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
> > > @@ -61,6 +67,25 @@
> > >   */
> > >  #define ACPI_GHES_GESB_SIZE 20
> > >  
> > > +/*
> > > + * Offsets with regards to the start of the HEST table stored at
> > > + * ags->hest_addr_le,
> > 
> > If I read this literary, then offsets above are not what
> > declared later in this patch.
> > I'd really drop this comment altogether as it's confusing,
> > and rather get variables/macro naming right
> >   
> > > according with the memory layout map at
> > > + * docs/specs/acpi_hest_ghes.rst.
> > > + */
> > 
> > what we need is update to above doc, describing new and old ways.
> > a separate patch.  
> 
> I can't see anything that should be changed at
> docs/specs/acpi_hest_ghes.rst, as this series doesn't change the
> firmware layout: we're still using two firmware tables:
> 
> - etc/acpi/tables, with HEST on it;
> - etc/hardware_errors, with:
>   - error block addresses;
>   - read_ack registers;
>   - CPER records.
> 
> The only changes that this series introduce are related to how
> the error generation logic navigates between HEST and hw_errors
> firmware. This is not described at acpi_hest_ghes.rst, and both
> ways follow ACPI specs to the letter.
> 
> The only difference is that the code which populates the CPER
> record and the error/read offsets doesn't require to know how
> the HEST table generation placed offsets, as it will basically
> reproduce what OSPM firmware does when handling   HEST events.

section 8 describes old way to get to address to record old CPER,
so it needs to amended to also describe a new approach and say
which way is used for which version.

possibly section 11 might need some messaging as well.


> 
> Thanks,
> Mauro
>

Re: [PATCH v2] vhost-user-snd: correct the calculation of config_size

2025-02-25 Thread Michael Tokarev


17.02.2025 17:07, Stefano Garzarella wrote:

On Mon, Feb 17, 2025 at 02:12:55PM +0100, Matias Ezequiel Vara Larsen wrote:

Use virtio_get_config_size() rather than sizeof(struct
virtio_snd_config) for the config_size in the vhost-user-snd frontend.
The frontend shall rely on device features for the size of the device
configuration space. The presence of `controls` in the config space
depends on VIRTIO_SND_F_CTLS according to the specification (v1.3):
`
5.14.4 Device Configuration Layout
...

controls
(driver-read-only) indicates a total number of all available control
elements if VIRTIO_SND_F_CTLS has been negotiated.
`
This fixes an issue introduced by commit ab0c7fb2 ("linux-headers:
update to current kvm/next") in which the optional field `controls` is
added to the virtio_snd_config structure. This breaks vhost-user-device
backends that do not implement the `controls` field.

Fixes: ab0c7fb22b ("linux-headers: update to current kvm/next")


If we want to backport this on stable branches, better to add:

Cc: qemu-sta...@nongnu.org


There's a problem with stable branches though:


static const Property vsnd_properties[] = {
    DEFINE_PROP_CHR("chardev", VHostUserBase, chardev),
+    DEFINE_PROP_BIT64("controls", VHostUserBase,
+  parent_obj.host_features, VIRTIO_SND_F_CTLS, false),
};


This change introduces an incompatible change in migration stream,
it looks like.  I'm not sure how to handle this for stable branches,
is there a way?

Thanks,

/mjt

[PATCH] chardev: use remoteAddr if the chardev is client

2025-02-25 Thread Haoqian He

If the chardev is client, the socket file path in localAddr may be NULL.
This is because the socket path comes from getsockname(), according
to man page, getsockname() returns the current address bound by the
socket sockfd. If the chardev is client, it's socket is unbound sockfd.

Therefore, when computing the client chardev socket file path, using
remoteAddr is more appropriate.

Signed-off-by: Haoqian He 
---
 chardev/char-socket.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 91496ceda9..2f842f9f88 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -571,9 +571,13 @@ static char *qemu_chr_compute_filename(SocketChardev *s)
 
 switch (ss->ss_family) {
 case AF_UNIX:
-return g_strdup_printf("unix:%s%s",
-   ((struct sockaddr_un *)(ss))->sun_path,
-   s->is_listen ? ",server=on" : "");
+if (s->is_listen) {
+return g_strdup_printf("unix:%s,server=on",
+   ((struct sockaddr_un *)(ss))->sun_path);
+} else {
+return g_strdup_printf("unix:%s",
+   ((struct sockaddr_un *)(ps))->sun_path);
+}
 case AF_INET6:
 left  = "[";
 right = "]";
-- 
2.48.1

Re: [PATCH v4] vfio: Add property documentation

2025-02-25 Thread Anthony Krowiak






On 2/17/25 12:34 PM, Cédric Le Goater wrote:

Investigate the git history to uncover when and why the VFIO
properties were introduced and update the models. This is mostly
targeting vfio-pci device, since vfio-platform, vfio-ap and vfio-ccw
devices are simpler.

Sort the properties based on the QEMU version in which they were
introduced.

Cc: Tony Krowiak 
Cc: Eric Farman 
Cc: Eric Auger 
Signed-off-by: Cédric Le Goater 
---

  Should we introduce documentation for properties like the kernel has
  in Documentation/ABI/*/sysfs-* ?

  Changes in v4:

  - Latest improvements from Alex

  Changes in v3:

  - Re-organized the vfio-pci properties based on the QEMU version in
which they were introduced
  - Added property labels
  - Improved description as suggested by Alex, Tomita and Corvin

  Changes in v2:

  - Fixed version numbers
  - Fixed #ifdef in vfio/ccw.c
  - Addressed vfio-pci-nohotplug
  - Organize the vfio-pci properties in topics

  hw/vfio/ap.c   |   9 
  hw/vfio/ccw.c  |  15 ++
  hw/vfio/pci.c  | 125 +
  hw/vfio/platform.c |  24 +
  4 files changed, 173 insertions(+)

diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
index 
30b08ad375d5ecae886c5000fbaa364799fe76d0..c7ab4ff57ada0ed0e5a76f52b5a05c86ca4fe0b4
 100644
--- a/hw/vfio/ap.c
+++ b/hw/vfio/ap.c
@@ -257,6 +257,15 @@ static void vfio_ap_class_init(ObjectClass *klass, void 
*data)
  dc->hotpluggable = true;
  device_class_set_legacy_reset(dc, vfio_ap_reset);
  dc->bus_type = TYPE_AP_BUS;
+
+object_class_property_set_description(klass, /* 3.1 */
+  "sysfsdev",
+  "Host sysfs path of assigned 
device");
+#ifdef CONFIG_IOMMUFD
+object_class_property_set_description(klass, /* 9.0 */
+  "iommufd",
+  "Set host IOMMUFD backend device");
+#endif
  }
  
  static const TypeInfo vfio_ap_info = {


For ap.c:
Reviewed-by: Anthony Krowiak

Re: [PATCH v2] vhost-user-snd: correct the calculation of config_size

2025-02-25 Thread Stefano Garzarella

On Tue, 25 Feb 2025 at 10:40, Michael Tokarev  wrote:
>
> 17.02.2025 17:07, Stefano Garzarella wrote:
> > On Mon, Feb 17, 2025 at 02:12:55PM +0100, Matias Ezequiel Vara Larsen wrote:
> >> Use virtio_get_config_size() rather than sizeof(struct
> >> virtio_snd_config) for the config_size in the vhost-user-snd frontend.
> >> The frontend shall rely on device features for the size of the device
> >> configuration space. The presence of `controls` in the config space
> >> depends on VIRTIO_SND_F_CTLS according to the specification (v1.3):
> >> `
> >> 5.14.4 Device Configuration Layout
> >> ...
> >>
> >> controls
> >> (driver-read-only) indicates a total number of all available control
> >> elements if VIRTIO_SND_F_CTLS has been negotiated.
> >> `
> >> This fixes an issue introduced by commit ab0c7fb2 ("linux-headers:
> >> update to current kvm/next") in which the optional field `controls` is
> >> added to the virtio_snd_config structure. This breaks vhost-user-device
> >> backends that do not implement the `controls` field.
> >>
> >> Fixes: ab0c7fb22b ("linux-headers: update to current kvm/next")
> >
> > If we want to backport this on stable branches, better to add:
> >
> > Cc: qemu-sta...@nongnu.org
>
> There's a problem with stable branches though:
>
> >> static const Property vsnd_properties[] = {
> >> DEFINE_PROP_CHR("chardev", VHostUserBase, chardev),
> >> +DEFINE_PROP_BIT64("controls", VHostUserBase,
> >> +  parent_obj.host_features, VIRTIO_SND_F_CTLS, false),
> >> };
>
> This change introduces an incompatible change in migration stream,
> it looks like.  I'm not sure how to handle this for stable branches,
> is there a way?

Good point, but in hw/virtio/vhost-user-snd.c IIUC the device is
marked unmigratable:

static const VMStateDescription vu_snd_vmstate = {
.name = "vhost-user-snd",
.unmigratable = 1,
};

So should we care about migration stream for this change?

Thanks,
Stefano

Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support

2025-02-25 Thread Gerd Hoffman

  Hi,

> > See 
> > https://lore.kernel.org/qemu-devel/20250219071431.50626-2-kra...@redhat.com/
> 
> After looking at it, it seems to me that data will be in host byte order
> and guest has no idea what that is.
> Probably it should advertise byteorder as part of the structure,
> and guest side should then do conversion if necessary.  

The structure is already defined and in use by edk2, so changing it is
not an option.  Guest is little endian (there is no big endian UEFI).
So simply declaring the struct fields being little endian byte order
and convert them if needed on the host side makes alot more sense here.
I'll fix the patch accordingly for v5.

take care,
  Gerd

Re: [PATCH 14/15] rust: chardev: wrap Chardev with Opaque<>

2025-02-25 Thread Zhao Liu

On Fri, Feb 21, 2025 at 06:03:41PM +0100, Paolo Bonzini wrote:
> Date: Fri, 21 Feb 2025 18:03:41 +0100
> From: Paolo Bonzini 
> Subject: [PATCH 14/15] rust: chardev: wrap Chardev with Opaque<>
> X-Mailer: git-send-email 2.48.1
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  rust/qemu-api/src/bindings.rs | 3 ---
>  rust/qemu-api/src/chardev.rs  | 8 ++--
>  rust/qemu-api/src/qdev.rs | 1 +
>  3 files changed, 7 insertions(+), 5 deletions(-)

Reviewed-by: Zhao Liu

Re: [PATCH 15/15] rust: bindings: remove more unnecessary Send/Sync impls

2025-02-25 Thread Zhao Liu

On Fri, Feb 21, 2025 at 06:03:42PM +0100, Paolo Bonzini wrote:
> Date: Fri, 21 Feb 2025 18:03:42 +0100
> From: Paolo Bonzini 
> Subject: [PATCH 15/15] rust: bindings: remove more unnecessary Send/Sync
>  impls
> X-Mailer: git-send-email 2.48.1
> 
> Send and Sync are now implemented on the opaque wrappers.  Remove them
> from the bindings module, unless the structs are pure data containers
> and/or have no C functions defined on them.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  rust/qemu-api/src/bindings.rs | 8 ++--
>  1 file changed, 2 insertions(+), 6 deletions(-)

Reviewed-by: Zhao Liu

Re: [PATCH] QIOChannelSocket: Flush zerocopy socket error queue on ENOBUF failure for sendmsg

2025-02-25 Thread Daniel P . Berrangé

On Fri, Feb 21, 2025 at 04:44:48AM -0500, Manish Mishra wrote:
> We allocate extra metadata SKBs in case of zerocopy send. This metadata memory
> is accounted for in the OPTMEM limit. If there is any error with sending
> zerocopy data or if zerocopy was skipped, these metadata SKBs are queued in 
> the
> socket error queue. This error queue is freed when userspace reads it.
> 
> Usually, if there are continuous failures, we merge the metadata into a single
> SKB and free another one. However, if there is any out-of-order processing or
> an intermittent zerocopy failures, this error chain can grow significantly,
> exhausting the OPTMEM limit. As a result, all new sendmsg requests fail to
> allocate any new SKB, leading to an ENOBUF error.

IIUC, you are effectively saying that the migration code is calling
qio_channel_write() too many times, before it calls qio_channel_flush(.)

Can you clarify what yu mean by the "OPTMEM limit" here ? I'm wondering
if this is potentially triggered by suboptimal tuning of the deployment
environment or we need to document tuning better. 

> To workaround this, if we encounter an ENOBUF error with a zerocopy sendmsg,
> we flush the error queue and retry once more.
> 
> Signed-off-by: Manish Mishra 
> ---
>  include/io/channel-socket.h |  1 +
>  io/channel-socket.c | 52 -
>  2 files changed, 46 insertions(+), 7 deletions(-)
> 
> diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
> index ab15577d38..6cfc66eb5b 100644
> --- a/include/io/channel-socket.h
> +++ b/include/io/channel-socket.h
> @@ -49,6 +49,7 @@ struct QIOChannelSocket {
>  socklen_t remoteAddrLen;
>  ssize_t zero_copy_queued;
>  ssize_t zero_copy_sent;
> +bool new_zero_copy_sent_success;
>  };
>  
>  
> diff --git a/io/channel-socket.c b/io/channel-socket.c
> index 608bcf066e..c7f576290f 100644
> --- a/io/channel-socket.c
> +++ b/io/channel-socket.c
> @@ -37,6 +37,11 @@
>  
>  #define SOCKET_MAX_FDS 16
>  
> +#ifdef QEMU_MSG_ZEROCOPY
> +static int qio_channel_socket_flush_internal(QIOChannel *ioc,
> + Error **errp);
> +#endif
> +
>  SocketAddress *
>  qio_channel_socket_get_local_address(QIOChannelSocket *ioc,
>   Error **errp)
> @@ -65,6 +70,7 @@ qio_channel_socket_new(void)
>  sioc->fd = -1;
>  sioc->zero_copy_queued = 0;
>  sioc->zero_copy_sent = 0;
> +sioc->new_zero_copy_sent_success = FALSE;
>  
>  ioc = QIO_CHANNEL(sioc);
>  qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN);
> @@ -566,6 +572,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
>  size_t fdsize = sizeof(int) * nfds;
>  struct cmsghdr *cmsg;
>  int sflags = 0;
> +bool zero_copy_flush_pending = TRUE;
>  
>  memset(control, 0, CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS));
>  
> @@ -612,9 +619,21 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
>  goto retry;
>  case ENOBUFS:
>  if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
> -error_setg_errno(errp, errno,
> - "Process can't lock enough memory for using 
> MSG_ZEROCOPY");
> -return -1;
> +if (zero_copy_flush_pending) {
> +ret = qio_channel_socket_flush_internal(ioc, errp);

Calling this is problematic, because qio_channel_socket_flush is
designed to block execution until all buffers are flushed. When
ioc is in non-blocking mode, this breaks the required semantics
of qio_channel_socket_writev which must return EAGAIN and not
block.

IOW, if we need to be able to flush at this point, we must be
able to do a partial flush, rather than blocking on a full
flush

> +if (ret < 0) {
> +error_setg_errno(errp, errno,
> + "Zerocopy flush failed");
> +return -1;
> +}
> +zero_copy_flush_pending = FALSE;
> +goto retry;
> +} else {
> +error_setg_errno(errp, errno,
> + "Process can't lock enough memory for "
> + "using MSG_ZEROCOPY");
> +return -1;
> +}
>  }
>  break;
>  }
> @@ -725,8 +744,8 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
>  
>  
>  #ifdef QEMU_MSG_ZEROCOPY
> -static int qio_channel_socket_flush(QIOChannel *ioc,
> -Error **errp)
> +static int qio_channel_socket_flush_internal(QIOChannel *ioc,
> + Error **errp)
>  {
>  QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
>  struct msghdr msg = {};
> @@ -791,15 +810,34 @@ static int qio_channel_socket_flush(QIOChannel *ioc,
>  /* No errors, count s

Re: [PATCH] hw/virtio/virtio-nsm: Respond with correct length

2025-02-25 Thread Michael Tokarev


13.02.2025 14:45, Alexander Graf wrote:

When we return a response packet from NSM, we need to indicate its
length according to the content of the response. Prior to this patch, we
returned the length of the source buffer, which may confuse guest code
that relies on the response size.

Fix it by returning the response payload size instead.

Fixes: bb154e3e0cc715 ("device/virtio-nsm: Support for Nitro Secure Module 
device")
Reported-by: Vikrant Garg 
Signed-off-by: Alexander Graf 


This looks like qemu-stable material (9.2.x).
Please let me know if it is not.

Thanks,

/mjt

Re: [PATCH] hw/virtio/virtio-nsm: Respond with correct length

2025-02-25 Thread Michael Tokarev


25.02.2025 12:32, Michael Tokarev wrote:


This looks like qemu-stable material (9.2.x).


Ah, it is already Cc'd to qemu-stable@, -- n/m.

Re: Seeking help on implementing sync over ivshmem shared memory

2025-02-25 Thread Alex Bennée

Jayakrishnan A  writes:

> Hi Team ,
>
> Seeking help on implementing sync over ivshmem shared memory , As part of 
> internal project we could able to achieve
> shared ivshmem with doorbell mechanism ,But in order to achieve shared memory 
> synchronisation we are trying to add
> atomic operation  over this shared memory area variables , Just wanted to 
> analyse whether this atomic variable between
> VMs shared memory will work as expected , If not is there any suggested way 
> ahead to implement synchronisation over
> ivshmem shared memory for threads running in multiple VMs.

Atomic accesses are properly modelled for all QEMU TCG guests that
support MTTCG using the hosts underlying atomic support. The shared
memory region ivshmem_bar2/server_bar2 are just pages shared between the
two QEMU processes so atomic accesses should behave the same way.

I'm unfamiliar with if both sides see the MMIO region but there is no
intrinsic synchronisation for MMIO regions which are terminated by a
MemoryRegionOps structure although vCPUS on the same QEMU will be
serialised by the BQL.

>
> Thanks and Regards , 
> Jayakrishnan A

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH 1/5] hw/pci: Basic support for PCI power management

2025-02-25 Thread Eric Auger

Hi Alex,


On 2/25/25 6:24 AM, Alex Williamson wrote:
> On Mon, 24 Feb 2025 20:03:56 +0100
> Eric Auger  wrote:
>
>> Hi Alex,
>>
>> On 2/20/25 11:48 PM, Alex Williamson wrote:
>>> The memory and IO BARs for devices are only accessible in the D0
>>> power state.  In other power states the PCI spec defines that the
>>> device should respond to TLPs and messages with an Unsupported
>>> Request response.  The closest we can come to emulating this
>>> behavior is to consider the BARs as unmapped, removing them from
>>> the address space.
>>>
>>> Introduce an interface to register the PM capability such that
>>> the device power state is considered relative to the BAR mapping
>>> state.  
>> "the device power state is considered relative to the BAR mapping
>> state."
>> I rather understood that you want the BAR mapping state to depend on the 
>> power state but maybe I misunderstand the langage here.
> Yes, for the BAR to be mapped, both the memory enable state of the
> command register and the device power state are relevant, they are both
> considered.
thanks. That sounds clearer to me.
>
>>> Cc: Michael S. Tsirkin 
>>> Cc: Marcel Apfelbaum 
>>> Signed-off-by: Alex Williamson 
>>> ---
>>>  hw/pci/pci.c| 83 -
>>>  hw/pci/trace-events |  2 +
>>>  include/hw/pci/pci.h|  3 ++
>>>  include/hw/pci/pci_device.h |  3 ++
>>>  4 files changed, 89 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>> index 2afa423925c5..4f2554dd63ac 100644
>>> --- a/hw/pci/pci.c
>>> +++ b/hw/pci/pci.c
>>> @@ -435,6 +435,74 @@ static void pci_msi_trigger(PCIDevice *dev, MSIMessage 
>>> msg)
>>>   attrs, NULL);
>>>  }
>>>  
>>> +int pci_pm_init(PCIDevice *d, uint8_t offset, Error **errp)
>>> +{
>>> +int cap = pci_add_capability(d, PCI_CAP_ID_PM, offset, PCI_PM_SIZEOF, 
>>> errp);
>>> +
>>> +if (cap < 0) {
>>> +return cap;
>>> +}
>>> +
>>> +d->pm_cap = cap;
>>> +d->cap_present |= QEMU_PCI_CAP_PM;
>>> +pci_set_word(d->wmask + cap + PCI_PM_CTRL, PCI_PM_CTRL_STATE_MASK);
>>> +
>>> +return cap;
>>> +}
>>> +
>>> +static uint8_t pci_pm_state(PCIDevice *d)
>>> +{
>>> +uint16_t pmcsr;
>>> +
>>> +if (!(d->cap_present & QEMU_PCI_CAP_PM)) {
>>> +return 0;
>>> +}
>>> +
>>> +pmcsr = pci_get_word(d->config + d->pm_cap + PCI_PM_CTRL);
>>> +
>>> +return pmcsr & PCI_PM_CTRL_STATE_MASK;
>>> +}
>>> +
>>> +static uint8_t pci_pm_update(PCIDevice *d, uint32_t addr, int l, uint8_t 
>>> old)  
>> the old/new terminology sounds weird to me. generally when one updates
>> we pass the new value.
> The new state lives in config space, the old state is recorded before
> config space is updated.  Otherwise we interfere with or duplicate the
> update of config space, which seems error prone and unnecessarily
> complicated.  pci_update_irq_disable() uses the same strategy.
ok
>
>>> +{
>>> +uint16_t pmc;
>>> +uint8_t new;
>>> +
>>> +if (!(d->cap_present & QEMU_PCI_CAP_PM) ||
>>> +!range_covers_byte(addr, l, d->pm_cap + PCI_PM_CTRL)) {
>>> +return old;
>>> +}
>>> +
>>> +new = pci_pm_state(d);  
>> and new sounds to be the current state.
> The state after the guest config write is first applied, yes.
>
>> pci_pm_update() does a kind of validation of the current value? 
> Yes.  The PCI spec indicates that invalid transitions are ignored by
> the device.  That's currently all this does, but if a device wanted a
> callback to effect some behavior at state change, this would be a
> logical place to do that.
ok
>
>>> +if (new == old) {
>>> +return old;
>>> +}
>>> +
>>> +pmc = pci_get_word(d->config + d->pm_cap + PCI_PM_PMC);
>>> +
>>> +/*
>>> + * Transitions to D1 & D2 are only allowed if supported.  Devices may
>>> + * only transition to higher D-states or to D0.  The write is dropped
>>> + * for rejected transitions by restoring the old state.
>>> + */
>>> +if ((!(pmc & PCI_PM_CAP_D1) && new == 1) ||
>>> +(!(pmc & PCI_PM_CAP_D2) && new == 2) ||
>>> +(old && new && new < old)) {
>>> +pci_word_test_and_clear_mask(d->config + d->pm_cap + PCI_PM_CTRL,
>>> + PCI_PM_CTRL_STATE_MASK);
>>> +pci_word_test_and_set_mask(d->config + d->pm_cap + PCI_PM_CTRL,
>>> +   old);
>>> +trace_pci_pm_bad_transition(d->name, pci_dev_bus_num(d),
>>> +PCI_SLOT(d->devfn), PCI_FUNC(d->devfn),
>>> +old, new);  
>> the trace does not output that the old state is kept. This may be helpful.
> I was trying to keep the trace log brief.  The trace indicates it's a
> "bad" transition versus the trace below indicates a successful
> transition.  AIUI, most trace logs require some degree of familiarity
> with the call path to fully comprehend and it seemed unnecessary to
> print th

Re: [PATCH v2] vdpa: Fix endian bugs in shadow virtqueue

2025-02-25 Thread Michael Tokarev


12.02.2025 19:49, Konstantin Shkolnyy wrote:

VDPA didn't work on a big-endian machine due to missing/incorrect
CPU<->LE data format conversions.

Signed-off-by: Konstantin Shkolnyy 


This looks like a qemu-stable material.
Please let me know if it is not.

Thanks,

/mjt

Re: [PATCH 13/15] rust: memory: wrap MemoryRegion with Opaque<>

2025-02-25 Thread Paolo Bonzini


On 2/25/25 10:14, Zhao Liu wrote:

  impl MemoryRegion {
  // inline to ensure that it is not included in tests, which only
@@ -174,13 +174,15 @@ pub fn init_io>(
  size: u64,
  ) {
  unsafe {
-Self::do_init_io(&mut self.inner, owner.cast::(), &ops.0, 
name, size);
+Self::do_init_io(
+self.0.as_mut_ptr(),


I'm not sure why the wrapper doesn't work here.

If I change this to `self.as_mut_ptr()`, then compiler tries to apply
`ObjectDeref::as_mut_ptr` and complains:

the trait `qom::ObjectType` is not implemented for `bindings::MemoryRegion`


It's because the implementation of ObjectDeref for "&mut MemoryRegion" 
wins over coercing "&mut MemoryRegion" to "&MemoryRegion" and then 
calling MemoryRegion::as_mut_ptr().


I added a comment

// self.0.as_mut_ptr() needed because Rust tries to call
// ObjectDeref::as_mut_ptr() on "&mut Self", instead of coercing
// to "&Self" and then calling MemoryRegion::as_mut_ptr().
// Revisit if/when ObjectCastMut is not needed anymore; it is
// only used in a couple places for initialization.

Paolo



But when I modify the function signature to &self like:

diff --git a/rust/qemu-api/src/memory.rs b/rust/qemu-api/src/memory.rs
index fdb1ea11fcf9..a82348c4a564 100644
--- a/rust/qemu-api/src/memory.rs
+++ b/rust/qemu-api/src/memory.rs
@@ -167,7 +167,7 @@ unsafe fn do_init_io(
  }

  pub fn init_io>(
-&mut self,
+&self,
  owner: *mut T,
  ops: &'static MemoryRegionOps,
  name: &'static str,

Then the Wrapper's as_mut_ptr() can work.


+owner.cast::(),
+&ops.0,
+name,
+size,
+);
  }
  }
-
-pub(crate) const fn as_mut_ptr(&self) -> *mut bindings::MemoryRegion {
-addr_of!(self.inner) as *mut _
-}
  }
  
  unsafe impl ObjectType for MemoryRegion {

--
2.48.1

Re: [PATCH v7 08/52] i386/tdx: Initialize TDX before creating TD vcpus

2025-02-25 Thread Xiaoyao Li


On 2/19/2025 6:14 PM, Francesco Lavra wrote:

On Fri, 2025-01-24 at 08:20 -0500, Xiaoyao Li wrote:

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 45867dbe0839..e35a9fbd687e 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -540,8 +540,15 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
  
  trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
  
+    /*

+ * tdx_pre_create_vcpu() may call cpu_x86_cpuid(). It in turn
may call
+ * kvm_vm_ioctl(). Set cpu->kvm_state in advance to avoid NULL
pointer
+ * dereference.
+ */
+    cpu->kvm_state = s;


This assignment should be removed from kvm_create_vcpu(), as now it's
redundant there.


I'll just drop the change in this patch since there is no dependency in 
cpu_x86_cpuid() in current upstream QEMU.



  ret = kvm_arch_pre_create_vcpu(cpu, errp);
  if (ret < 0) {
+    cpu->kvm_state = NULL;


No need to reset cpu->kvm_state to NULL, there already are other error
conditions under which cpu->kvm_state remains initialized.


  goto err;
  }
  
@@ -550,6 +557,7 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)

  error_setg_errno(errp, -ret,
   "kvm_init_vcpu: kvm_create_vcpu failed
(%lu)",
   kvm_arch_vcpu_id(cpu));
+    cpu->kvm_state = NULL;


Same here.

Re: [PATCH 5/5] ui/console-vc: implement DCH (delete) and ICH (insert) commands

2025-02-25 Thread Roman Penyaev

On Mon, Feb 24, 2025 at 8:25 AM Marc-André Lureau
 wrote:
>
> Hi
>
> On Sun, Feb 23, 2025 at 6:56 PM Roman Penyaev  wrote:
> >
> > This patch implements DCH (delete character) and ICH (insert
> > character) commands.
> >
> > DCH - Delete Character:
> >"As characters are deleted, the remaining characters between the
> > cursor and right margin move to the left. Character attributes move
> > with the characters. The terminal adds blank spaces with no visual
> > character attributes at the right margin. DCH has no effect outside
> > the scrolling margins" [1].
> >
> > ICH - Insert Character:
> >"The ICH sequence inserts Pn blank characters with the normal
> > character attribute. The cursor remains at the beginning of the
> > blank characters. Text between the cursor and right margin moves to
> > the right. Characters scrolled past the right margin are lost. ICH
> > has no effect outside the scrolling margins" [2].
> >
> > Without these commands console is barely usable.
> >
> > [1] https://vt100.net/docs/vt510-rm/DCH.html
> > [1] https://vt100.net/docs/vt510-rm/ICH.html
> >
> > Signed-off-by: Roman Penyaev 
> > Cc: "Marc-André Lureau" 
> > Cc: qemu-devel@nongnu.org
> > ---
> >  ui/console-vc.c | 108 +---
> >  1 file changed, 102 insertions(+), 6 deletions(-)
> >
> > diff --git a/ui/console-vc.c b/ui/console-vc.c
> > index 522adc2806c8..bc667897d1bc 100644
> > --- a/ui/console-vc.c
> > +++ b/ui/console-vc.c
> > @@ -598,21 +598,29 @@ static void vc_clear_xy(VCChardev *vc, int x, int y)
> >  vc_update_xy(vc, x, y);
> >  }
> >
> > -static void vc_put_one(VCChardev *vc, int ch)
> > +static void vc_insert_xy(VCChardev *vc, int ch, int x, int y)
> >  {
> >  QemuTextConsole *s = vc->console;
> >  TextCell *c;
> >  int y1;
> > +
> > +y1 = (s->y_base + y) % s->total_height;
> > +c = &s->cells[y1 * s->width + x];
> > +c->ch = ch;
> > +c->t_attrib = vc->t_attrib;
> > +vc_update_xy(vc, x, y);
> > +}
> > +
> > +static void vc_put_one(VCChardev *vc, int ch)
> > +{
> > +QemuTextConsole *s = vc->console;
> > +
> >  if (s->x >= s->width) {
> >  /* line wrap */
> >  s->x = 0;
> >  vc_put_lf(vc);
> >  }
> > -y1 = (s->y_base + s->y) % s->total_height;
> > -c = &s->cells[y1 * s->width + s->x];
> > -c->ch = ch;
> > -c->t_attrib = vc->t_attrib;
> > -vc_update_xy(vc, s->x, s->y);
> > +vc_insert_xy(vc, ch, s->x, s->y);
> >  s->x++;
> >  }
> >
> > @@ -645,6 +653,88 @@ static void vc_set_cursor(VCChardev *vc, int x, int y)
> >  s->y = y;
> >  }
> >
> > +/**
> > + * vc_csi_P() - (DCH) deletes one or more characters from the cursor
> > + * position to the right. As characters are deleted, the remaining
> > + * characters between the cursor and right margin move to the
> > + * left. Character attributes move with the characters.
> > + */
> > +static void vc_csi_P(struct VCChardev *vc, unsigned int nr)
> > +{
> > +QemuTextConsole *s = vc->console;
> > +TextCell *c1, *c2;
> > +unsigned int x1, x2, y;
> > +unsigned int end, len;
> > +
> > +if (!nr) {
> > +nr = 1;
> > +}
> > +if (nr > s->width - s->x) {
> > +nr = s->width - s->x;
> > +if (!nr) {
> > +return;
> > +}
> > +}
> > +
> > +x1 = s->x;
> > +x2 = s->x + nr;
> > +len = s->width - x2;
> > +if (len) {
> > +y = (s->y_base + s->y) % s->total_height;
> > +c1 = &s->cells[y * s->width + x1];
> > +c2 = &s->cells[y * s->width + x2];
> > +memmove(c1, c2, len * sizeof(*c1));
> > +for (end = x1 + len; x1 < end; x1++) {
> > +vc_update_xy(vc, x1, s->y);
> > +}
> > +}
> > +/* Clear the rest */
> > +for (; x1 < s->width; x1++) {
> > +vc_clear_xy(vc, x1, s->y);
> > +}
> > +}
> > +
> > +/**
> > + * vc_csi_at() - (ICH) inserts `nr` blank characters with the normal
> > + * character attribute. The cursor remains at the beginning of the
>
> What is "normal character attribute"?
>
> Should you set it to  TEXT_ATTRIBUTES_DEFAULT while inserting blank chars?

Hi,

The wording "default attributes" is more precise, I like it. The
original VT spec
and ECMA-48 are rather vague about exactly what attributes should be set
for "erased state", but various terminal implementations (including vt.c from
the Linux kernel) simply reset the character to the default state. From the
perspective of the existing console-vc.c API, a more correct call is
vt_clear_xt(),
which clears attributes to default rather than inserting a "space"
with vt_put_xy().
I'll fix that. Thanks.

--
Roman

Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support

2025-02-25 Thread Ani Sinha

On Tue, Feb 25, 2025 at 2:09 PM Gerd Hoffman  wrote:
>
> On Tue, Feb 25, 2025 at 10:51:08AM +0530, Ani Sinha wrote:
> > On Mon, Feb 24, 2025 at 9:17 PM Gerd Hoffman  wrote:
> > >
> > > Works nicely for me.  Test case:
> > >   https://kraxel.gitlab.io/uefi-tools-rs/seabios.efi
> >
> > yeah if I can't get my unit test working we can make this an
> > integration test. or best case scenario, we can have both.
>
> Do you have a branch with your unit test somewhere?
>
> > > Also this adds five fw_cfg files.  Given that the number of fw_cfg files
> > > we can have is limited I'm not convinced this is the best idea to move
> > > forward.
> >
> > Right, For implementation, I suggest we combine FILE_VMFWUPDATE_OBLOB
> > and FILE_VMFWUPDATE_FWBLOB together and also make
> > FILE_VMFWUPDATE_CONTROL part of the same structure. These are all
> > writable by the guest so makes sense to have one fwcfg for it. We can
> > have another read-only fwcfg for FILE_VMFWUPDATE_BIOS_SIZE and
> > FILE_VMFWUPDATE_CAP.
>
> I'd prefer to put everything into one file.  Maybe except the opaque
> blob page.  A struct along the lines of:
>
> struct vmfwupdate {
> u64 capabilities;   // 'cap' file
> u64 firmware_size;  // 'bios-size' file
> u64 control;// disable bit etc.
> u64 update_paddr;   // 'fw-blob' file, paddr field
> u64 update_size;// 'fw-blob' file, size field
> }
>
> On the host side you'll need two copies of the struct then: one where
> the guest can read from and write to, and one shadow struct where the
> actual values are stored.  The write callback goes sanity-check the
> guest-written data, takes everything which passes into the shadow
> struct, finally goes copy back the shadow struct to the guest struct
> so the guest can see what the host has accepted.
>
> Part of the verification process can be that you already copy the
> firmware to a host buffer.

I think we decided early on that we would not want to do that - that
is consume extra memory on the host side for boot components.
right alex?

Best using dma_* functions, these return
> errors in case something went wrong (say the paddr passed is not backed
> by guest ram).  Which gives you the chance to propagate those errors
> back to the guest before it'll actually try to launch the updated
> firmware via reset.
>
> take care,
>   Gerd
>

Re: [PATCH v3 00/14] Change ghes to use HEST-based offsets and add support for error inject

2025-02-25 Thread Igor Mammedov

On Fri, 21 Feb 2025 10:21:27 +
Jonathan Cameron  wrote:

> On Fri, 21 Feb 2025 07:38:23 +0100
> Mauro Carvalho Chehab  wrote:
> 
> > Em Mon, 3 Feb 2025 16:22:36 +0100
> > Igor Mammedov  escreveu:
> >   
> > > On Mon, 3 Feb 2025 11:09:34 +
> > > Jonathan Cameron  wrote:
> > > 
> > > > On Fri, 31 Jan 2025 18:42:41 +0100
> > > > Mauro Carvalho Chehab  wrote:
> > > >   
> > > > > Now that the ghes preparation patches were merged, let's add support
> > > > > for error injection.
> > > > > 
> > > > > On this series, the first 6 patches chang to the math used to 
> > > > > calculate offsets at HEST
> > > > > table and hardware_error firmware file, together with its migration 
> > > > > code. Migration tested
> > > > > with both latest QEMU released kernel and upstream, on both 
> > > > > directions.
> > > > > 
> > > > > The next patches add a new QAPI to allow injecting GHESv2 errors, and 
> > > > > a script using such QAPI
> > > > >to inject ARM Processor Error records.
> > > > > 
> > > > > If I'm counting well, this is the 19th submission of my error inject 
> > > > > patches.
> > > > 
> > > > Looks good to me. All remaining trivial things are in the category
> > > > of things to consider only if you are doing another spin.  The code
> > > > ends up how I'd like it at the end of the series anyway, just
> > > > a question of the precise path to that state!  
> > > 
> > > if you look at series as a whole it's more or less fine (I guess you
> > > and me got used to it)
> > > 
> > > however if you take it patch by patch (as if you've never seen it)
> > > ordering is messed up (the same would apply to everyone after a while
> > > when it's forgotten)
> > > 
> > > So I'd strongly suggest to restructure the series (especially 2-6/14).
> > > re sum up my comments wrt ordering:
> > > 
> > > 0  add testcase for HEST table with current HEST as expected blob
> > >(currently missing), so that we can be sure that we haven't messed
> > >existing tables during refactoring.
> 
> To potentially save time I think Igor is asking that before you do anything
> at all you plug the existing test hole which is that we don't test HEST
> at all.   Even after this series I think we don't test HEST.  You add
> a stub hest and exclusion but then in patch 12 the HEST stub is deleted 
> whereas
> it should be replaced with the example data for the test.

that's what I was saying.
HEST table should be in DSDT, but it's optional and one has to use
'ras=on' option to enable that, which we aren't doing ATM.
So whatever changes are happening we aren't seeing them in tests
nor will we see any regression for the same reason.

While white listing tables before change should happen and then updating them
is the right thing to do, it's not sufficient since none of tests
run with 'ras' enabled, hence code is not actually executed. 

> 
> That indeed doesn't address testing the error data storage which would be
> a different problem.

I'd skip hardware_errors/CPER testing from QEMU unit tests.
That's basically requires functioning 'APEI driver' to test that.

Maybe we can use Ani's framework to parse HEST and all the way
towards CPER record(s) traversal, but that's certainly out of
scope of this series.
It could be done on top, but I won't insist even on that
since Mauro's out of tree error injection testing will
cover that using actual guest (which I assume he would
like to run periodically).

> > 
> > Not sure if I got this one. The HEST table is part of etc/acpi/tables,
> > which is already tested, as you pointed at the previous reviews. Doing
> > changes there is already detected. That's basically why we added patches
> > 10 and 12:
> > 
> > [PATCH v3 10/14] tests/acpi: virt: allow acpi table changes for a new 
> > table: HEST
> > [PATCH v3 12/14] tests/acpi: virt: add a HEST table to aarch64 virt and 
> > update DSDT
> > 
> > What tests don't have is a check for etc/hardware_errors firmware inside 
> > tests/data/acpi/aarch64/virt/, but, IMO, we shouldn't add it there.
> > 
> > See, hardware_errors table contains only some skeleton space to
> > store:
> > 
> > - 1 or more error block address offsets;
> > - 1 or more read ack register;
> > - 1 or more HEST source entries containing CPER blocks.
> > 
> > There's nothing there to be actually checked: it is just some
> > empty spaces with a variable number of fields.
> > 
> > With the new code, the actual number of CPER blocks and their
> > corresponding offsets and read ack registers can be different on 
> > different architectures. So, for instance, when we add x86 support,
> > we'll likely start with just one error source entry, while arm will
> > have two after this changeset.
> > 
> > Also, one possibility to address the issues reported by Gavin Shan at
> > https://lore.kernel.org/qemu-devel/20250214041635.608012-1-gs...@redhat.com/
> > would be to have one entry per each CPU. So, the size of such firmware
> > could be dependent on the

Re: Problem with iotest 233

2025-02-25 Thread Kevin Wolf

Am 25.02.2025 um 08:20 hat Thomas Huth geschrieben:
> 
>  Hi!
> 
> I'm facing a weird hang in iotest 233 on my Fedora 41 laptop. When running
> 
>  ./check -raw 233
> 
> the test simply hangs. Looking at the log, the last message is "== check
> plain client to TLS server fails ==". I added some debug messages, and it
> seems like the previous NBD server is not correctly terminated here.
> The test works fine again if I apply this patch:
> 
> diff --git a/tests/qemu-iotests/common.nbd b/tests/qemu-iotests/common.nbd
> --- a/tests/qemu-iotests/common.nbd
> +++ b/tests/qemu-iotests/common.nbd
> @@ -35,7 +35,7 @@ nbd_server_stop()
>  read NBD_PID < "$nbd_pid_file"
>  rm -f "$nbd_pid_file"
>  if [ -n "$NBD_PID" ]; then
> -kill "$NBD_PID"
> +kill -9 "$NBD_PID"
>  fi
>  fi
>  rm -f "$nbd_unix_socket" "$nbd_stderr_fifo"
> 
> ... but that does not look like the right solution to me. What could prevent
> the qemu-nbd from correctly shutting down when it receives a normal SIGTERM
> signal?

Not sure. In theory, qemu_system_killed() should set state = TERMINATE
and make main_loop_wait() return through the notification, which should
then make it shut down. Maybe you can attach gdb and check what 'state'
is when it hangs and if it's still in the main loop?

I can't reproduce the problem, though I'm on F40. I tried it both on my
working branch and with current git master (b69801dd).

Kevin

Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support

2025-02-25 Thread Gerd Hoffman

  Hi,

> > Part of the verification process can be that you already copy the
> > firmware to a host buffer.
> 
> I think we decided early on that we would not want to do that - that
> is consume extra memory on the host side for boot components.

Fine with me, was just an idea, certainly not critical.  Just have qemu
log errors is probably good enough.

take care,
  Gerd

Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support

2025-02-25 Thread Igor Mammedov

On Mon, 24 Feb 2025 16:47:31 +0100
Gerd Hoffman  wrote:

> On Fri, Feb 14, 2025 at 09:04:07PM +0530, Ani Sinha wrote:
> > VM firmware update is a mechanism where the virtual machines can use their
> > preferred and trusted firmware image in their execution environment without
> > having to depend on a untrusted party to provide the firmware bundle. This 
> > is
> > particularly useful for confidential virtual machines that are deployed in 
> > the
> > cloud where the tenant and the cloud provider are two different entities. In
> > this scenario, virtual machines can bring their own trusted firmware image
> > bundled as a part of their filesystem (using UKIs for example[1]) and then 
> > use
> > this hypervisor interface to update to their trusted firmware image. This 
> > also
> > allows the guests to have a consistent measurements on the firmware image.  
> 
> Works nicely for me.  Test case:
>   https://kraxel.gitlab.io/uefi-tools-rs/seabios.efi
> 
> > +Fw-cfg Files
> > +
> > +
> > +Guests drive vmfwupdate through special ``fw-cfg`` files that control its 
> > flow
> > +followed by a standard system reset operation. When vmfwupdate is 
> > available,
> > +it provides the following ``fw-cfg`` files:
> > +
> > +* ``vmfwupdate/cap`` (``u64``) - Read-only Little Endian encoded bitmap of 
> > additional
> > +* ``vmfwupdate/bios-size`` (``u64``) - Little Endian encoded size of the 
> > BIOS region.
> > +* ``vmfwupdate/opaque`` (``4096 bytes``) - A 4 KiB buffer that survives 
> > the BIOS replacement
> > +* ``vmfwupdate/disable`` (``u8``) - Indicates whether the interface is 
> > disabled.
> > +* ``vmfwupdate/bios-addr`` (``u64``) - A 64bit Little Endian encoded guest 
> > physical address  
> 
> This is out of sync with the actual code (vmfwupdate/bios-addr does not
> exist there).
> 
> Also this adds five fw_cfg files.  Given that the number of fw_cfg files
> we can have is limited I'm not convinced this is the best idea to move
> forward.
> 
> Alternatives I see:
> 
>  (1) Place everything in a single fw_cfg file.  ramfb does this for
>  example, the guest writes a struct with a bunch of values.
> 
>  (2) Go for a mmio register interface.  The EFI variable store I'm
>  working on does this.  fw_cfg is used for hardware discovery,
>  via etc/hardware-info file (which can carry entries for multiple
>  devices).
> 
> See 
> https://lore.kernel.org/qemu-devel/20250219071431.50626-2-kra...@redhat.com/

After looking at it, it seems to me that data will be in host byte order
and guest has no idea what that is.
Probably it should advertise byteorder as part of the structure,
and guest side should then do conversion if necessary.  

> and 
> https://lore.kernel.org/qemu-devel/20250219071431.50626-21-kra...@redhat.com/
> (v4 has a double-free bug in patch #1 which will be fixed in v5 of the
> series).
> 
> take care,
>   Gerd
>

qemu-devel@nongnu.org

2025-02-25 Thread Igor Mammedov

On Fri,  7 Feb 2025 17:20:38 +0100
Igor Mammedov  wrote:

> Changelog:
>   * drop wire/unwire hooks patches
>   * drop unrealize related patches
>   * include fixed up patches from
>[PATCH 0/6] tcg: fix qemu crash when add assert_cpu_is_self() is 
> enabled
> and cleanups related to cpu->created check
> https://patchew.org/QEMU/20250129134436.1240740-1-imamm...@redhat.com/
> as it's related to the topic (well, modulo bsd cleanup)
>   * CI mostly green modulo rust failure on Ubuntu
>  https://gitlab.com/imammedo/qemu/-/pipelines/1660855467
> 
> The goal of this series is to expose vCPUs in a stable state
> to the accelerators, in particular the QDev 'REALIZED' step.
> 
> To do this we split out cpu_index assignment into a separate step,
> and move call cpu_list_add() to the end of CPU realize stage.
> 
> I expect these changes to allow CPUState::cpu_index clarifications
> and simplifications, but this will be addressed (and commented) in
> a separate series.
> 
> As result, the series also
>  * fix regression intoroduced by
>   30933c4fb4f3d ("tcg/cputlb: remove other-cpu capability from TLB 
> flushing")
>for deatials see 'tcg:tlb: use tcg_debug_assert() in assert_cpu_is_self()'
>  * drops no longer needed workaround 'cpu->check' due to vCPU being exposed
>too early in cpus_queue.

Richard,
gentle ping,
can you pick it up is it looks reasonable.
 
> CC: Paolo Bonzini 
> CC: Richard Henderson 
> CC: "Philippe Mathieu-Daudé" 
> CC: Alex Bennée 
> 
> Igor Mammedov (7):
>   bsd-user: drop not longer used target_reset_cpu()
>   loongarch: reset vcpu after it's created
>   m68k: reset vcpu after it's created
>   tcg:tlb: use tcg_debug_assert() in assert_cpu_is_self()
>   Revert "tcg/cputlb: remove other-cpu capability from TLB flushing"
>   tcg: drop cpu->created check
>   cpus: expose only realized vCPUs to global &cpus_queue
> 
> Philippe Mathieu-Daudé (3):
>   accel/tcg: Simplify use of &first_cpu in rr_cpu_thread_fn()
>   accel/kvm: Assert vCPU is created when calling kvm_dirty_ring_reap*()
>   accel/kvm: Remove unreachable assertion in kvm_dirty_ring_reap*()
> 
>  bsd-user/aarch64/target_arch_cpu.h |  5 ---
>  bsd-user/arm/target_arch_cpu.h |  4 ---
>  bsd-user/i386/target_arch_cpu.h|  5 ---
>  bsd-user/riscv/target_arch_cpu.h   |  4 ---
>  bsd-user/x86_64/target_arch_cpu.h  |  5 ---
>  include/hw/core/cpu.h  |  6 
>  accel/kvm/kvm-all.c|  9 --
>  accel/tcg/cputlb.c | 49 +-
>  accel/tcg/tcg-accel-ops-rr.c   | 13 +---
>  cpu-common.c   | 23 --
>  cpu-target.c   |  2 +-
>  hw/core/cpu-common.c   |  2 ++
>  target/loongarch/cpu.c |  2 +-
>  target/m68k/cpu.c  |  2 +-
>  14 files changed, 68 insertions(+), 63 deletions(-)
>

[PATCH 08/10] plugins/api: split out the vaddr/hwaddr helpers

2025-02-25 Thread Alex Bennée

These only work for system-mode and are NOPs for user-mode.

Signed-off-by: Alex Bennée 
---
 plugins/api-system.c | 58 
 plugins/api-user.c   | 40 +
 plugins/api.c| 70 
 plugins/meson.build  |  2 +-
 4 files changed, 99 insertions(+), 71 deletions(-)
 create mode 100644 plugins/api-user.c

diff --git a/plugins/api-system.c b/plugins/api-system.c
index cb0dd8f730..38560de342 100644
--- a/plugins/api-system.c
+++ b/plugins/api-system.c
@@ -12,6 +12,10 @@
 
 #include "qemu/osdep.h"
 #include "qemu/main-loop.h"
+#include "qapi/error.h"
+#include "migration/blocker.h"
+#include "hw/boards.h"
+#include "qemu/plugin-memory.h"
 #include "qemu/plugin.h"
 
 /*
@@ -37,3 +41,57 @@ uint64_t qemu_plugin_entry_code(void)
 {
 return 0;
 }
+
+/*
+ * Virtual Memory queries
+ */
+
+static __thread struct qemu_plugin_hwaddr hwaddr_info;
+
+struct qemu_plugin_hwaddr *qemu_plugin_get_hwaddr(qemu_plugin_meminfo_t info,
+  uint64_t vaddr)
+{
+CPUState *cpu = current_cpu;
+unsigned int mmu_idx = get_mmuidx(info);
+enum qemu_plugin_mem_rw rw = get_plugin_meminfo_rw(info);
+hwaddr_info.is_store = (rw & QEMU_PLUGIN_MEM_W) != 0;
+
+assert(mmu_idx < NB_MMU_MODES);
+
+if (!tlb_plugin_lookup(cpu, vaddr, mmu_idx,
+   hwaddr_info.is_store, &hwaddr_info)) {
+error_report("invalid use of qemu_plugin_get_hwaddr");
+return NULL;
+}
+
+return &hwaddr_info;
+}
+
+bool qemu_plugin_hwaddr_is_io(const struct qemu_plugin_hwaddr *haddr)
+{
+return haddr->is_io;
+}
+
+uint64_t qemu_plugin_hwaddr_phys_addr(const struct qemu_plugin_hwaddr *haddr)
+{
+if (haddr) {
+return haddr->phys_addr;
+}
+return 0;
+}
+
+const char *qemu_plugin_hwaddr_device_name(const struct qemu_plugin_hwaddr *h)
+{
+if (h && h->is_io) {
+MemoryRegion *mr = h->mr;
+if (!mr->name) {
+unsigned maddr = (uintptr_t)mr;
+g_autofree char *temp = g_strdup_printf("anon%08x", maddr);
+return g_intern_string(temp);
+} else {
+return g_intern_string(mr->name);
+}
+} else {
+return g_intern_static_string("RAM");
+}
+}
diff --git a/plugins/api-user.c b/plugins/api-user.c
new file mode 100644
index 00..867b420339
--- /dev/null
+++ b/plugins/api-user.c
@@ -0,0 +1,40 @@
+/*
+ * QEMU Plugin API - user-mode only implementations
+ *
+ * This provides the APIs that have a user-mode specific
+ * implementations or are only relevant to user-mode.
+ *
+ * Copyright (C) 2017, Emilio G. Cota 
+ * Copyright (C) 2019-2025, Linaro
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/plugin.h"
+
+/*
+ * Virtual Memory queries - these are all NOPs for user-mode which
+ * only ever has visibility of virtual addresses.
+ */
+
+struct qemu_plugin_hwaddr *qemu_plugin_get_hwaddr(qemu_plugin_meminfo_t info,
+  uint64_t vaddr)
+{
+return NULL;
+}
+
+bool qemu_plugin_hwaddr_is_io(const struct qemu_plugin_hwaddr *haddr)
+{
+return false;
+}
+
+uint64_t qemu_plugin_hwaddr_phys_addr(const struct qemu_plugin_hwaddr *haddr)
+{
+return 0;
+}
+
+const char *qemu_plugin_hwaddr_device_name(const struct qemu_plugin_hwaddr *h)
+{
+return g_intern_static_string("Invalid");
+}
diff --git a/plugins/api.c b/plugins/api.c
index 12a3b6a66d..b04577424f 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -382,76 +382,6 @@ qemu_plugin_mem_value 
qemu_plugin_mem_get_value(qemu_plugin_meminfo_t info)
 return value;
 }
 
-/*
- * Virtual Memory queries
- */
-
-#ifdef CONFIG_SOFTMMU
-static __thread struct qemu_plugin_hwaddr hwaddr_info;
-#endif
-
-struct qemu_plugin_hwaddr *qemu_plugin_get_hwaddr(qemu_plugin_meminfo_t info,
-  uint64_t vaddr)
-{
-#ifdef CONFIG_SOFTMMU
-CPUState *cpu = current_cpu;
-unsigned int mmu_idx = get_mmuidx(info);
-enum qemu_plugin_mem_rw rw = get_plugin_meminfo_rw(info);
-hwaddr_info.is_store = (rw & QEMU_PLUGIN_MEM_W) != 0;
-
-assert(mmu_idx < NB_MMU_MODES);
-
-if (!tlb_plugin_lookup(cpu, vaddr, mmu_idx,
-   hwaddr_info.is_store, &hwaddr_info)) {
-error_report("invalid use of qemu_plugin_get_hwaddr");
-return NULL;
-}
-
-return &hwaddr_info;
-#else
-return NULL;
-#endif
-}
-
-bool qemu_plugin_hwaddr_is_io(const struct qemu_plugin_hwaddr *haddr)
-{
-#ifdef CONFIG_SOFTMMU
-return haddr->is_io;
-#else
-return false;
-#endif
-}
-
-uint64_t qemu_plugin_hwaddr_phys_addr(const struct qemu_plugin_hwaddr *haddr)
-{
-#ifdef CONFIG_SOFTMMU
-if (haddr) {
-return haddr->phys_addr;
-}
-#endif
-return 0;
-}
-
-const char *qemu_plugin_hwaddr_device_name(const struct qemu_plugin_hwaddr *h)
-{
-#ifdef CONFIG_SOFTM

[PATCH 04/10] plugins/api: clean-up the includes

2025-02-25 Thread Alex Bennée

Thanks to re-factoring and clean-up work (especially to exec-all) we
no longer need such broad headers for the api.

Signed-off-by: Alex Bennée 
---
 plugins/api.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/plugins/api.c b/plugins/api.c
index 10b258b08d..3e1aac7bfb 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -39,9 +39,7 @@
 #include "qemu/main-loop.h"
 #include "qemu/plugin.h"
 #include "qemu/log.h"
-#include "qemu/timer.h"
 #include "tcg/tcg.h"
-#include "exec/exec-all.h"
 #include "exec/gdbstub.h"
 #include "exec/translation-block.h"
 #include "exec/translator.h"
@@ -50,7 +48,6 @@
 #ifndef CONFIG_USER_ONLY
 #include "qapi/error.h"
 #include "migration/blocker.h"
-#include "exec/ram_addr.h"
 #include "qemu/plugin-memory.h"
 #include "hw/boards.h"
 #else
-- 
2.39.5

[PATCH 01/10] plugins/api: use tcg_ctx to get TARGET_PAGE_MASK

2025-02-25 Thread Alex Bennée

Requiring TARGET_PAGE_MASK to be defined gets in the way of building
this unit once. As tcg_ctx has the value lets use it.

Signed-off-by: Alex Bennée 
---
 plugins/api.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/plugins/api.c b/plugins/api.c
index cf8cdf076a..10b258b08d 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -287,7 +287,7 @@ uint64_t qemu_plugin_insn_vaddr(const struct 
qemu_plugin_insn *insn)
 void *qemu_plugin_insn_haddr(const struct qemu_plugin_insn *insn)
 {
 const DisasContextBase *db = tcg_ctx->plugin_db;
-vaddr page0_last = db->pc_first | ~TARGET_PAGE_MASK;
+vaddr page0_last = db->pc_first | ~tcg_ctx->page_mask;
 
 if (db->fake_insn) {
 return NULL;
-- 
2.39.5

[PATCH 10/10] plugins/api: build only once

2025-02-25 Thread Alex Bennée

Now all the softmmu/user-mode stuff has been split out we can build
this compilation unit only once.

Signed-off-by: Alex Bennée 
---
 plugins/api.c   | 11 ---
 plugins/meson.build |  3 +--
 2 files changed, 1 insertion(+), 13 deletions(-)

diff --git a/plugins/api.c b/plugins/api.c
index 61480d3dc1..a292b5bff3 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -45,17 +45,6 @@
 #include "exec/translator.h"
 #include "disas/disas.h"
 #include "plugin.h"
-#ifndef CONFIG_USER_ONLY
-#include "qapi/error.h"
-#include "migration/blocker.h"
-#include "qemu/plugin-memory.h"
-#include "hw/boards.h"
-#else
-#include "qemu.h"
-#ifdef CONFIG_LINUX
-#include "loader.h"
-#endif
-#endif
 
 /* Uninstall and Reset handlers */
 
diff --git a/plugins/meson.build b/plugins/meson.build
index 942b59e904..d27220d5ff 100644
--- a/plugins/meson.build
+++ b/plugins/meson.build
@@ -61,9 +61,8 @@ endif
 user_ss.add(files('user.c', 'api-user.c'))
 system_ss.add(files('system.c', 'api-system.c'))
 
-common_ss.add(files('loader.c'))
+common_ss.add(files('loader.c', 'api.c'))
 
 specific_ss.add(files(
   'core.c',
-  'api.c',
 ))
-- 
2.39.5

[PATCH 05/10] plugins/plugin.h: include queue.h

2025-02-25 Thread Alex Bennée

Headers should bring in what they need so don't rely on getting
queue.h by side effects. This will help with clean-ups in the
following patches.

Signed-off-by: Alex Bennée 
---
 plugins/plugin.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/plugins/plugin.h b/plugins/plugin.h
index 30e2299a54..9ed20b5c41 100644
--- a/plugins/plugin.h
+++ b/plugins/plugin.h
@@ -13,6 +13,7 @@
 #define PLUGIN_H
 
 #include 
+#include "qemu/queue.h"
 #include "qemu/qht.h"
 
 #define QEMU_PLUGIN_MIN_VERSION 2
-- 
2.39.5

[PATCH 00/10] plugins: reduce total number of build objects

2025-02-25 Thread Alex Bennée

As we move towards a more modular build this series converts both
loader and api to build once objects. For both objects the only real
difference is between user mode and system emulation so those bits
have been hived off into those source sets.

The remaining core plugin is more intimately aligned with the TCG
backend so requires definitions like TCG_TARGET_LONG. Hopefully this
can been cleaned up once Richards TCG rationalisation code is added.

Please review.

Alex.

Alex Bennée (10):
  plugins/api: use tcg_ctx to get TARGET_PAGE_MASK
  plugins/loader: populate target_name with target_name()
  include/qemu: plugin-memory.h doesn't need cpu-defs.h
  plugins/api: clean-up the includes
  plugins/plugin.h: include queue.h
  plugins/loader: compile loader only once
  plugins/api: split out binary path/start/end/entry code
  plugins/api: split out the vaddr/hwaddr helpers
  plugins/api: split out time control helpers
  plugins/api: build only once

 include/qemu/plugin-memory.h |   1 -
 plugins/plugin.h |   7 ++
 linux-user/plugin-api.c  |  43 +
 plugins/api-system.c | 131 +++
 plugins/api-user.c   |  57 
 plugins/api.c| 170 +--
 plugins/loader.c |  15 +---
 plugins/system.c |  24 +
 plugins/user.c   |  19 
 linux-user/meson.build   |   1 +
 plugins/meson.build  |   8 +-
 11 files changed, 292 insertions(+), 184 deletions(-)
 create mode 100644 linux-user/plugin-api.c
 create mode 100644 plugins/api-system.c
 create mode 100644 plugins/api-user.c
 create mode 100644 plugins/system.c
 create mode 100644 plugins/user.c

-- 
2.39.5

[PATCH 02/10] plugins/loader: populate target_name with target_name()

2025-02-25 Thread Alex Bennée

We have a function we can call for this, lets not rely on macros that
stop us building once.

Signed-off-by: Alex Bennée 
---
 plugins/loader.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/plugins/loader.c b/plugins/loader.c
index 99686b5466..827473c8b6 100644
--- a/plugins/loader.c
+++ b/plugins/loader.c
@@ -297,7 +297,7 @@ int qemu_plugin_load_list(QemuPluginList *head, Error 
**errp)
 struct qemu_plugin_desc *desc, *next;
 g_autofree qemu_info_t *info = g_new0(qemu_info_t, 1);
 
-info->target_name = TARGET_NAME;
+info->target_name = target_name();
 info->version.min = QEMU_PLUGIN_MIN_VERSION;
 info->version.cur = QEMU_PLUGIN_VERSION;
 #ifndef CONFIG_USER_ONLY
-- 
2.39.5

[PATCH 03/10] include/qemu: plugin-memory.h doesn't need cpu-defs.h

2025-02-25 Thread Alex Bennée

hwaddr is a fixed size on all builds.

Signed-off-by: Alex Bennée 
---
 include/qemu/plugin-memory.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/qemu/plugin-memory.h b/include/qemu/plugin-memory.h
index 71c1123308..6065ec7aaf 100644
--- a/include/qemu/plugin-memory.h
+++ b/include/qemu/plugin-memory.h
@@ -9,7 +9,6 @@
 #ifndef PLUGIN_MEMORY_H
 #define PLUGIN_MEMORY_H
 
-#include "exec/cpu-defs.h"
 #include "exec/hwaddr.h"
 
 struct qemu_plugin_hwaddr {
-- 
2.39.5

[PATCH 07/10] plugins/api: split out binary path/start/end/entry code

2025-02-25 Thread Alex Bennée

To move the main api.c to a single build compilation object we need to
start splitting out user and system specific code. As we need to grob
around host headers we move these particular helpers into the *-user
mode directories.

The binary/start/end/entry helpers are all NOPs for system mode.

Signed-off-by: Alex Bennée 
---
 linux-user/plugin-api.c | 43 +
 plugins/api-system.c| 39 +
 plugins/api.c   | 43 -
 linux-user/meson.build  |  1 +
 plugins/meson.build |  2 +-
 5 files changed, 84 insertions(+), 44 deletions(-)
 create mode 100644 linux-user/plugin-api.c
 create mode 100644 plugins/api-system.c

diff --git a/linux-user/plugin-api.c b/linux-user/plugin-api.c
new file mode 100644
index 00..42e977c339
--- /dev/null
+++ b/linux-user/plugin-api.c
@@ -0,0 +1,43 @@
+/*
+ * QEMU Plugin API - linux-user-mode only implementations
+ *
+ * Common user-mode only APIs are in plugins/api-user. These helpers
+ * are only specific to linux-user.
+ *
+ * Copyright (C) 2017, Emilio G. Cota 
+ * Copyright (C) 2019-2025, Linaro
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/main-loop.h"
+#include "qemu/plugin.h"
+#include "qemu.h"
+
+/*
+ * Binary path, start and end locations. Host specific due to TaskState.
+ */
+const char *qemu_plugin_path_to_binary(void)
+{
+TaskState *ts = get_task_state(current_cpu);
+return g_strdup(ts->bprm->filename);
+}
+
+uint64_t qemu_plugin_start_code(void)
+{
+TaskState *ts = get_task_state(current_cpu);
+return ts->info->start_code;
+}
+
+uint64_t qemu_plugin_end_code(void)
+{
+TaskState *ts = get_task_state(current_cpu);
+return ts->info->end_code;
+}
+
+uint64_t qemu_plugin_entry_code(void)
+{
+TaskState *ts = get_task_state(current_cpu);
+return ts->info->entry;
+}
diff --git a/plugins/api-system.c b/plugins/api-system.c
new file mode 100644
index 00..cb0dd8f730
--- /dev/null
+++ b/plugins/api-system.c
@@ -0,0 +1,39 @@
+/*
+ * QEMU Plugin API - System specific implementations
+ *
+ * This provides the APIs that have a specific system implementation
+ * or are only relevant to system-mode.
+ *
+ * Copyright (C) 2017, Emilio G. Cota 
+ * Copyright (C) 2019-2025, Linaro
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/main-loop.h"
+#include "qemu/plugin.h"
+
+/*
+ * In system mode we cannot trace the binary being executed so the
+ * helpers all return NULL/0.
+ */
+const char *qemu_plugin_path_to_binary(void)
+{
+return NULL;
+}
+
+uint64_t qemu_plugin_start_code(void)
+{
+return 0;
+}
+
+uint64_t qemu_plugin_end_code(void)
+{
+return 0;
+}
+
+uint64_t qemu_plugin_entry_code(void)
+{
+return 0;
+}
diff --git a/plugins/api.c b/plugins/api.c
index 3e1aac7bfb..12a3b6a66d 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -470,49 +470,6 @@ bool qemu_plugin_bool_parse(const char *name, const char 
*value, bool *ret)
 return name && value && qapi_bool_parse(name, value, ret, NULL);
 }
 
-/*
- * Binary path, start and end locations
- */
-const char *qemu_plugin_path_to_binary(void)
-{
-char *path = NULL;
-#ifdef CONFIG_USER_ONLY
-TaskState *ts = get_task_state(current_cpu);
-path = g_strdup(ts->bprm->filename);
-#endif
-return path;
-}
-
-uint64_t qemu_plugin_start_code(void)
-{
-uint64_t start = 0;
-#ifdef CONFIG_USER_ONLY
-TaskState *ts = get_task_state(current_cpu);
-start = ts->info->start_code;
-#endif
-return start;
-}
-
-uint64_t qemu_plugin_end_code(void)
-{
-uint64_t end = 0;
-#ifdef CONFIG_USER_ONLY
-TaskState *ts = get_task_state(current_cpu);
-end = ts->info->end_code;
-#endif
-return end;
-}
-
-uint64_t qemu_plugin_entry_code(void)
-{
-uint64_t entry = 0;
-#ifdef CONFIG_USER_ONLY
-TaskState *ts = get_task_state(current_cpu);
-entry = ts->info->entry;
-#endif
-return entry;
-}
-
 /*
  * Create register handles.
  *
diff --git a/linux-user/meson.build b/linux-user/meson.build
index f75b4fe0e3..f47a213ca3 100644
--- a/linux-user/meson.build
+++ b/linux-user/meson.build
@@ -27,6 +27,7 @@ linux_user_ss.add(libdw)
 linux_user_ss.add(when: 'TARGET_HAS_BFLT', if_true: files('flatload.c'))
 linux_user_ss.add(when: 'TARGET_I386', if_true: files('vm86.c'))
 linux_user_ss.add(when: 'CONFIG_ARM_COMPATIBLE_SEMIHOSTING', if_true: 
files('semihost.c'))
+linux_user_ss.add(when: 'CONFIG_TCG_PLUGINS', if_true: files('plugin-api.c'))
 
 syscall_nr_generators = {}
 
diff --git a/plugins/meson.build b/plugins/meson.build
index f7820806d3..9c9bc9e5bb 100644
--- a/plugins/meson.build
+++ b/plugins/meson.build
@@ -59,7 +59,7 @@ if host_os == 'windows'
 endif
 
 user_ss.add(files('user.c'))
-system_ss.add(files('system.c'))
+system_ss.add(files('system.c', 'api-system.c'))
 
 common_ss.add(files('loader.c'))
 
-- 
2.39.5

[PATCH 06/10] plugins/loader: compile loader only once

2025-02-25 Thread Alex Bennée

There is very little in loader that is different between builds save
for a tiny user/system mode difference in the plugin_info structure.
Create two new files, user and system to hold mode specific helpers
and move loader into common_ss.

Signed-off-by: Alex Bennée 
---
 plugins/plugin.h|  6 ++
 plugins/loader.c| 13 ++---
 plugins/system.c| 24 
 plugins/user.c  | 19 +++
 plugins/meson.build |  7 ++-
 5 files changed, 57 insertions(+), 12 deletions(-)
 create mode 100644 plugins/system.c
 create mode 100644 plugins/user.c

diff --git a/plugins/plugin.h b/plugins/plugin.h
index 9ed20b5c41..6fbc443b96 100644
--- a/plugins/plugin.h
+++ b/plugins/plugin.h
@@ -119,4 +119,10 @@ struct qemu_plugin_scoreboard 
*plugin_scoreboard_new(size_t element_size);
 
 void plugin_scoreboard_free(struct qemu_plugin_scoreboard *score);
 
+/**
+ * qemu_plugin_fillin_mode_info() - populate mode specific info
+ * info: pointer to qemu_info_t structure
+ */
+void qemu_plugin_fillin_mode_info(qemu_info_t *info);
+
 #endif /* PLUGIN_H */
diff --git a/plugins/loader.c b/plugins/loader.c
index 827473c8b6..7523d554f0 100644
--- a/plugins/loader.c
+++ b/plugins/loader.c
@@ -31,9 +31,6 @@
 #include "qemu/memalign.h"
 #include "hw/core/cpu.h"
 #include "exec/tb-flush.h"
-#ifndef CONFIG_USER_ONLY
-#include "hw/boards.h"
-#endif
 
 #include "plugin.h"
 
@@ -300,14 +297,8 @@ int qemu_plugin_load_list(QemuPluginList *head, Error 
**errp)
 info->target_name = target_name();
 info->version.min = QEMU_PLUGIN_MIN_VERSION;
 info->version.cur = QEMU_PLUGIN_VERSION;
-#ifndef CONFIG_USER_ONLY
-MachineState *ms = MACHINE(qdev_get_machine());
-info->system_emulation = true;
-info->system.smp_vcpus = ms->smp.cpus;
-info->system.max_vcpus = ms->smp.max_cpus;
-#else
-info->system_emulation = false;
-#endif
+
+qemu_plugin_fillin_mode_info(info);
 
 QTAILQ_FOREACH_SAFE(desc, head, entry, next) {
 int err;
diff --git a/plugins/system.c b/plugins/system.c
new file mode 100644
index 00..b3ecc33ba5
--- /dev/null
+++ b/plugins/system.c
@@ -0,0 +1,24 @@
+/*
+ * QEMU Plugin system-emulation helpers
+ *
+ * Helpers that are specific to system emulation.
+ *
+ * Copyright (C) 2017, Emilio G. Cota 
+ * Copyright (C) 2019-2025, Linaro
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/plugin.h"
+#include "hw/boards.h"
+
+#include "plugin.h"
+
+void qemu_plugin_fillin_mode_info(qemu_info_t *info)
+{
+MachineState *ms = MACHINE(qdev_get_machine());
+info->system_emulation = true;
+info->system.smp_vcpus = ms->smp.cpus;
+info->system.max_vcpus = ms->smp.max_cpus;
+}
diff --git a/plugins/user.c b/plugins/user.c
new file mode 100644
index 00..250d542502
--- /dev/null
+++ b/plugins/user.c
@@ -0,0 +1,19 @@
+/*
+ * QEMU Plugin user-mode helpers
+ *
+ * Helpers that are specific to user-mode.
+ *
+ * Copyright (C) 2017, Emilio G. Cota 
+ * Copyright (C) 2019-2025, Linaro
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/plugin.h"
+#include "plugin.h"
+
+void qemu_plugin_fillin_mode_info(qemu_info_t *info)
+{
+info->system_emulation = false;
+}
diff --git a/plugins/meson.build b/plugins/meson.build
index d60be2a4d6..f7820806d3 100644
--- a/plugins/meson.build
+++ b/plugins/meson.build
@@ -57,8 +57,13 @@ if host_os == 'windows'
 command: dlltool_cmd
   )
 endif
+
+user_ss.add(files('user.c'))
+system_ss.add(files('system.c'))
+
+common_ss.add(files('loader.c'))
+
 specific_ss.add(files(
-  'loader.c',
   'core.c',
   'api.c',
 ))
-- 
2.39.5

[PATCH 09/10] plugins/api: split out time control helpers

2025-02-25 Thread Alex Bennée

These are only usable in system mode where we control the timer. For
user-mode make them NOPs.

Signed-off-by: Alex Bennée 
---
 plugins/api-system.c | 34 ++
 plugins/api-user.c   | 17 +
 plugins/api.c| 41 -
 3 files changed, 51 insertions(+), 41 deletions(-)

diff --git a/plugins/api-system.c b/plugins/api-system.c
index 38560de342..cc190b167e 100644
--- a/plugins/api-system.c
+++ b/plugins/api-system.c
@@ -95,3 +95,37 @@ const char *qemu_plugin_hwaddr_device_name(const struct 
qemu_plugin_hwaddr *h)
 return g_intern_static_string("RAM");
 }
 }
+
+/*
+ * Time control
+ */
+static bool has_control;
+static Error *migration_blocker;
+
+const void *qemu_plugin_request_time_control(void)
+{
+if (!has_control) {
+has_control = true;
+error_setg(&migration_blocker,
+   "TCG plugin time control does not support migration");
+migrate_add_blocker(&migration_blocker, NULL);
+return &has_control;
+}
+return NULL;
+}
+
+static void advance_virtual_time__async(CPUState *cpu, run_on_cpu_data data)
+{
+int64_t new_time = data.host_ulong;
+qemu_clock_advance_virtual_time(new_time);
+}
+
+void qemu_plugin_update_ns(const void *handle, int64_t new_time)
+{
+if (handle == &has_control) {
+/* Need to execute out of cpu_exec, so bql can be locked. */
+async_run_on_cpu(current_cpu,
+ advance_virtual_time__async,
+ RUN_ON_CPU_HOST_ULONG(new_time));
+}
+}
diff --git a/plugins/api-user.c b/plugins/api-user.c
index 867b420339..28704a89e8 100644
--- a/plugins/api-user.c
+++ b/plugins/api-user.c
@@ -12,6 +12,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/plugin.h"
+#include "exec/log.h"
 
 /*
  * Virtual Memory queries - these are all NOPs for user-mode which
@@ -38,3 +39,19 @@ const char *qemu_plugin_hwaddr_device_name(const struct 
qemu_plugin_hwaddr *h)
 {
 return g_intern_static_string("Invalid");
 }
+
+/*
+ * Time control - for user mode the only real time is wall clock time
+ * so realistically all you can do in user mode is slow down execution
+ * which doesn't require the ability to mess with the clock.
+ */
+
+const void *qemu_plugin_request_time_control(void)
+{
+return NULL;
+}
+
+void qemu_plugin_update_ns(const void *handle, int64_t new_time)
+{
+qemu_log_mask(LOG_UNIMP, "user-mode can't control time");
+}
diff --git a/plugins/api.c b/plugins/api.c
index b04577424f..61480d3dc1 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -525,44 +525,3 @@ uint64_t qemu_plugin_u64_sum(qemu_plugin_u64 entry)
 return total;
 }
 
-/*
- * Time control
- */
-static bool has_control;
-#ifdef CONFIG_SOFTMMU
-static Error *migration_blocker;
-#endif
-
-const void *qemu_plugin_request_time_control(void)
-{
-if (!has_control) {
-has_control = true;
-#ifdef CONFIG_SOFTMMU
-error_setg(&migration_blocker,
-   "TCG plugin time control does not support migration");
-migrate_add_blocker(&migration_blocker, NULL);
-#endif
-return &has_control;
-}
-return NULL;
-}
-
-#ifdef CONFIG_SOFTMMU
-static void advance_virtual_time__async(CPUState *cpu, run_on_cpu_data data)
-{
-int64_t new_time = data.host_ulong;
-qemu_clock_advance_virtual_time(new_time);
-}
-#endif
-
-void qemu_plugin_update_ns(const void *handle, int64_t new_time)
-{
-#ifdef CONFIG_SOFTMMU
-if (handle == &has_control) {
-/* Need to execute out of cpu_exec, so bql can be locked. */
-async_run_on_cpu(current_cpu,
- advance_virtual_time__async,
- RUN_ON_CPU_HOST_ULONG(new_time));
-}
-#endif
-}
-- 
2.39.5

Re: [PATCH v2] hw/misc/macio/gpio.c: Add constants for register bits

2025-02-25 Thread Mark Cave-Ayland


On 24/02/2025 14:10, BALATON Zoltan wrote:


Add named constants for register bit values that should make it easier
to understand what these mean.

Signed-off-by: BALATON Zoltan 
Reviewed-by: Philippe Mathieu-Daudé 
---
  hw/misc/macio/gpio.c | 19 ---
  1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/hw/misc/macio/gpio.c b/hw/misc/macio/gpio.c
index 4364afc84a..e87bfca1f5 100644
--- a/hw/misc/macio/gpio.c
+++ b/hw/misc/macio/gpio.c
@@ -34,6 +34,11 @@
  #include "qemu/module.h"
  #include "trace.h"
  
+enum MacioGPIORegisterBits {

+OUT_DATA   = 1,
+IN_DATA= 2,
+OUT_ENABLE = 4,
+};
  
  void macio_set_gpio(MacIOGPIOState *s, uint32_t gpio, bool state)

  {
@@ -41,14 +46,14 @@ void macio_set_gpio(MacIOGPIOState *s, uint32_t gpio, bool 
state)
  
  trace_macio_set_gpio(gpio, state);
  
-if (s->gpio_regs[gpio] & 4) {

+if (s->gpio_regs[gpio] & OUT_ENABLE) {
  qemu_log_mask(LOG_GUEST_ERROR,
"GPIO: Setting GPIO %d while it's an output\n", gpio);
  }
  
-new_reg = s->gpio_regs[gpio] & ~2;

+new_reg = s->gpio_regs[gpio] & ~IN_DATA;
  if (state) {
-new_reg |= 2;
+new_reg |= IN_DATA;
  }
  
  if (new_reg == s->gpio_regs[gpio]) {

@@ -107,12 +112,12 @@ static void macio_gpio_write(void *opaque, hwaddr addr, 
uint64_t value,
  
  addr -= 8;

  if (addr < 36) {
-value &= ~2;
+value &= ~IN_DATA;
  
-if (value & 4) {

-ibit = (value & 1) << 1;
+if (value & OUT_ENABLE) {
+ibit = (value & OUT_DATA) << 1;
  } else {
-ibit = s->gpio_regs[addr] & 2;
+ibit = s->gpio_regs[addr] & IN_DATA;
  }
  
  s->gpio_regs[addr] = value | ibit;


Reviewed-by: Mark Cave-Ayland 


ATB,

Mark.

[PATCH V4] migration: ram block cpr blockers

2025-02-25 Thread Steve Sistare

Unlike cpr-reboot mode, cpr-transfer mode cannot save volatile ram blocks
in the migration stream file and recreate them later, because the physical
memory for the blocks is pinned and registered for vfio.  Add a blocker
for volatile ram blocks.

Also add a blocker for RAM_GUEST_MEMFD.  Preserving guest_memfd may be
sufficient for CPR, but it has not been tested yet.

Signed-off-by: Steve Sistare 
Reviewed-by: Fabiano Rosas 
Reviewed-by: Peter Xu 
Reviewed-by: David Hildenbrand 
---
 include/exec/memory.h   |  3 +++
 include/exec/ramblock.h |  1 +
 migration/savevm.c  |  2 ++
 system/physmem.c| 68 +
 4 files changed, 74 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 9f73b59..ea5d33a 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -3184,6 +3184,9 @@ bool ram_block_discard_is_disabled(void);
  */
 bool ram_block_discard_is_required(void);
 
+void ram_block_add_cpr_blocker(RAMBlock *rb, Error **errp);
+void ram_block_del_cpr_blocker(RAMBlock *rb);
+
 #endif
 
 #endif
diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 0babd10..64484cd 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -39,6 +39,7 @@ struct RAMBlock {
 /* RCU-enabled, writes protected by the ramlist lock */
 QLIST_ENTRY(RAMBlock) next;
 QLIST_HEAD(, RAMBlockNotifier) ramblock_notifiers;
+Error *cpr_blocker;
 int fd;
 uint64_t fd_offset;
 int guest_memfd;
diff --git a/migration/savevm.c b/migration/savevm.c
index bc375db..85a3559 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -3315,12 +3315,14 @@ void vmstate_register_ram(MemoryRegion *mr, DeviceState 
*dev)
 qemu_ram_set_idstr(mr->ram_block,
memory_region_name(mr), dev);
 qemu_ram_set_migratable(mr->ram_block);
+ram_block_add_cpr_blocker(mr->ram_block, &error_fatal);
 }
 
 void vmstate_unregister_ram(MemoryRegion *mr, DeviceState *dev)
 {
 qemu_ram_unset_idstr(mr->ram_block);
 qemu_ram_unset_migratable(mr->ram_block);
+ram_block_del_cpr_blocker(mr->ram_block);
 }
 
 void vmstate_register_ram_global(MemoryRegion *mr)
diff --git a/system/physmem.c b/system/physmem.c
index 67c9db9..4ffa977 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -70,7 +70,10 @@
 
 #include "qemu/pmem.h"
 
+#include "qapi/qapi-types-migration.h"
+#include "migration/blocker.h"
 #include "migration/cpr.h"
+#include "migration/options.h"
 #include "migration/vmstate.h"
 
 #include "qemu/range.h"
@@ -1899,6 +1902,14 @@ static void ram_block_add(RAMBlock *new_block, Error 
**errp)
 qemu_mutex_unlock_ramlist();
 goto out_free;
 }
+
+error_setg(&new_block->cpr_blocker,
+   "Memory region %s uses guest_memfd, "
+   "which is not supported with CPR.",
+   memory_region_name(new_block->mr));
+migrate_add_blocker_modes(&new_block->cpr_blocker, errp,
+  MIG_MODE_CPR_TRANSFER,
+  -1);
 }
 
 ram_size = (new_block->offset + new_block->max_length) >> TARGET_PAGE_BITS;
@@ -4059,3 +4070,60 @@ bool ram_block_discard_is_required(void)
 return qatomic_read(&ram_block_discard_required_cnt) ||
qatomic_read(&ram_block_coordinated_discard_required_cnt);
 }
+
+/*
+ * Return true if ram is compatible with CPR.  Do not exclude rom,
+ * because the rom file could change in new QEMU.
+ */
+static bool ram_is_cpr_compatible(RAMBlock *rb)
+{
+MemoryRegion *mr = rb->mr;
+
+if (!mr || !memory_region_is_ram(mr)) {
+return true;
+}
+
+/* Ram device is remapped in new QEMU */
+if (memory_region_is_ram_device(mr)) {
+return true;
+}
+
+/* Named files are remapped in new QEMU, same contents if shared (no COW) 
*/
+if (qemu_ram_is_shared(rb) && qemu_ram_is_named_file(rb)) {
+return true;
+}
+
+/* A file descriptor is remapped in new QEMU */
+if (rb->fd >= 0 && qemu_ram_is_shared(rb)) {
+return true;
+}
+
+return false;
+}
+
+/*
+ * Add a blocker for each volatile ram block.  This function should only be
+ * called after we know that the block is migratable.  Non-migratable blocks
+ * are either re-created in new QEMU, or are handled specially, or are covered
+ * by a device-level CPR blocker.
+ */
+void ram_block_add_cpr_blocker(RAMBlock *rb, Error **errp)
+{
+assert(qemu_ram_is_migratable(rb));
+
+if (ram_is_cpr_compatible(rb)) {
+return;
+}
+
+error_setg(&rb->cpr_blocker,
+   "Memory region %s is not compatible with CPR. share=on is "
+   "required for memory-backend objects, and aux-ram-share=on is "
+   "required.", memory_region_name(rb->mr));
+migrate_add_blocker_modes(&rb->cpr_blocker, errp, MIG_MODE_CPR_TRANSFER,
+  -1);
+}
+
+void ram_block_del_cpr_blocker(RAM

[PATCH v1] virtio-gpu-virgl: Correct virgl_cmd_context_create()

2025-02-25 Thread liweishi

From: Weishi Li 

Due to the fact that g->parent_obj.conf only adds
VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED setting when
VIRGL_VERSION_MAJOR >= 1, virgl_cmd_comtext_create()
will always return by error=VIRTIO_GPU_RESP_ERR_UNSPEC
when VIRGL_VERSION_MAJOR < 1, resulting in gl context
initialization failure.

Signed-off-by: Weishi Li 
---
 hw/display/virtio-gpu-virgl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 145a0b3879..48f6121e16 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -338,6 +338,7 @@ static void virgl_cmd_context_create(VirtIOGPU *g,
 cc.debug_name);
 
 if (cc.context_init) {
+#if VIRGL_VERSION_MAJOR >= 1
 if (!virtio_gpu_context_init_enabled(g->parent_obj.conf)) {
 qemu_log_mask(LOG_GUEST_ERROR, "%s: context_init disabled",
   __func__);
@@ -345,7 +346,6 @@ static void virgl_cmd_context_create(VirtIOGPU *g,
 return;
 }
 
-#if VIRGL_VERSION_MAJOR >= 1
 virgl_renderer_context_create_with_flags(cc.hdr.ctx_id,
  cc.context_init,
  cc.nlen,
-- 
2.25.1

RE: [PATCH v3 01/28] hw/intc/aspeed: Support setting different memory and register size

2025-02-25 Thread Jamin Lin

Hi Cedric, 

> >> and the register array as:
> >>
> >> uint32_t regs[ASPEED_INTC_NR_REGS];
> >>
> >> The number of regs looks pretty big for me. Are the registers
> >> covering the whole MMIO aperture ?
> >>
> > According to the datasheet, the entire register address space size of
> > INTC (CPU DIE) is 16KB (0x1210-0x12103FFF). Therefore, I set the
> memory size to 0x4000.
> > Currently, we need to use the "GICINT192-201 raw status and clear" register
> INTC1B04.
> > Thus, an array size of 0x2000 is sufficient.
> 
> yes but we are only using these regs :
> 
> REG32(GICINT128_EN, 0x1000)
> REG32(GICINT128_STATUS, 0x1004)
> REG32(GICINT129_EN, 0x1100)
> REG32(GICINT129_STATUS, 0x1104)
> REG32(GICINT130_EN, 0x1200)
> REG32(GICINT130_STATUS, 0x1204)
> REG32(GICINT131_EN, 0x1300)
> REG32(GICINT131_STATUS, 0x1304)
> REG32(GICINT132_EN, 0x1400)
> REG32(GICINT132_STATUS, 0x1404)
> REG32(GICINT133_EN, 0x1500)
> REG32(GICINT133_STATUS, 0x1504)
> REG32(GICINT134_EN, 0x1600)
> REG32(GICINT134_STATUS, 0x1604)
> REG32(GICINT135_EN, 0x1700)
> REG32(GICINT135_STATUS, 0x1704)
> REG32(GICINT136_EN, 0x1800)
> REG32(GICINT136_STATUS, 0x1804)
> REG32(GICINT192_201_EN, 0x1B00)
> REG32(GICINT192_201_STATUS, 0x1B04)
> 
> so the first 0x1000 are unused.
> 
> 
> >
> > However, we are going to increase the size to 0x3000 to support the
> > co-processors SSP and TSP In the another patch series.
> > We also need to include the following register definitions:
> >
> > /* SSP INTC Registers */
> > REG32(SSPINT128_EN, 0x2000)
> > REG32(SSPINT128_STATUS, 0x2004)
> > REG32(SSPINT129_EN, 0x2100)
> > REG32(SSPINT129_STATUS, 0x2104)
> > REG32(SSPINT130_EN, 0x2200)
> > REG32(SSPINT130_STATUS, 0x2204)
> > REG32(SSPINT131_EN, 0x2300)
> > REG32(SSPINT131_STATUS, 0x2304)
> > REG32(SSPINT132_EN, 0x2400)
> > REG32(SSPINT132_STATUS, 0x2404)
> > REG32(SSPINT133_EN, 0x2500)
> > REG32(SSPINT133_STATUS, 0x2504)
> > REG32(SSPINT134_EN, 0x2600)
> > REG32(SSPINT134_STATUS, 0x2604)
> > REG32(SSPINT135_EN, 0x2700)
> > REG32(SSPINT135_STATUS, 0x2704)
> > REG32(SSPINT136_EN, 0x2800)
> > REG32(SSPINT136_STATUS, 0x2804)
> > REG32(SSPINT137_EN, 0x2900)
> > REG32(SSPINT137_STATUS, 0x2904)
> > REG32(SSPINT138_EN, 0x2A00)
> > REG32(SSPINT138_STATUS, 0x2A04)
> > REG32(SSPINT160_169_EN, 0x2B00)
> > REG32(SSPINT160_169_STATUS, 0x2B04)
> >
> >>
> >>> +if (offset >= aic->reg_size) {
> >>
> >> This is dead code since the MMIO aperture has the same size. You
> >> could remove the check.
> >
> > Will remove.
> >>
> >>>qemu_log_mask(LOG_GUEST_ERROR,
> >>>  "%s: Out-of-bounds read at offset 0x%"
> >> HWADDR_PRIx "\n",
> >>>  __func__, offset); @@ -143,7 +144,7 @@
> >> static
> >>> void aspeed_intc_write(void *opaque, hwaddr offset, uint64_t data,
> >>>uint32_t change;
> >>>uint32_t irq;
> >>>
> >>> -if (addr >= ASPEED_INTC_NR_REGS) {
> >>> +if (offset >= aic->reg_size) {
> >>>qemu_log_mask(LOG_GUEST_ERROR,
> >>>  "%s: Out-of-bounds write at offset 0x%"
> >> HWADDR_PRIx "\n",
> >>>  __func__, offset); @@ -302,10 +303,16
> @@
> >>> static void aspeed_intc_realize(DeviceState *dev, Error **errp)
> >>>AspeedINTCClass *aic = ASPEED_INTC_GET_CLASS(s);
> >>>int i;
> >>>
> >>> +memory_region_init(&s->iomem_container, OBJECT(s),
> >>> +TYPE_ASPEED_INTC ".container", aic->mem_size);
> >>> +
> >>> +sysbus_init_mmio(sbd, &s->iomem_container);
> >>
> >> Why introduce a container ? Do you plan to have multiple sub-regions ?
> >>
> > I created a container to save the entire register address space of
> > INTC and its sub-region to place the actual used register address space.
> > 1210-12103fff (prio 0, i/o): aspeed.intc.container
> >1210-12101fff (prio 0, i/o):
> > aspeed.intc.regs 14c18000-14c183ff (prio 0, i/o):
> aspeed.intc.container
> >14c18000-14c183d7 (prio 0, i/o):
> > aspeed.intc.regs
> >
> > If I misunderstood this design, I will change it to have two memory regions
> for INTC and INTCIO, respectively.
> > If need, I will change to the following memory regions.  --> it removes
> containers.
> >1210-12101fff (prio 0, i/o):
> aspeed.intc.regs
> >14c18000-14c183d7 (prio 0, i/o):
> > aspeed.intc.regs
> 
> I don't think the region container is useful in that case since there is only 
> a
> single region per model.
> 
> However, we could "optimize" a bit the MMIO apertures to avoid mapping
> huge unused gaps and only m

[PATCH v2 2/5] pci: Use PCI PM capability initializer

2025-02-25 Thread Alex Williamson

Switch callers directly initializing the PCI PM capability with
pci_add_capability() to use pci_pm_init().

Cc: Dmitry Fleytman 
Cc: Akihiko Odaki 
Cc: Jason Wang 
Cc: Stefan Weil 
Cc: Sriram Yagnaraman 
Cc: Keith Busch 
Cc: Klaus Jensen 
Cc: Jesper Devantier 
Cc: Michael S. Tsirkin 
Cc: Marcel Apfelbaum 
Cc: Cédric Le Goater 
Signed-off-by: Alex Williamson 
---
 hw/net/e1000e.c | 3 +--
 hw/net/eepro100.c   | 4 +---
 hw/net/igb.c| 3 +--
 hw/nvme/ctrl.c  | 3 +--
 hw/pci-bridge/pcie_pci_bridge.c | 2 +-
 hw/vfio/pci.c   | 7 ++-
 hw/virtio/virtio-pci.c  | 3 +--
 7 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c
index f637853073e2..b72cbab7e889 100644
--- a/hw/net/e1000e.c
+++ b/hw/net/e1000e.c
@@ -372,8 +372,7 @@ static int
 e1000e_add_pm_capability(PCIDevice *pdev, uint8_t offset, uint16_t pmc)
 {
 Error *local_err = NULL;
-int ret = pci_add_capability(pdev, PCI_CAP_ID_PM, offset,
- PCI_PM_SIZEOF, &local_err);
+int ret = pci_pm_init(pdev, offset, &local_err);
 
 if (local_err) {
 error_report_err(local_err);
diff --git a/hw/net/eepro100.c b/hw/net/eepro100.c
index 6d853229aec2..29a39865a608 100644
--- a/hw/net/eepro100.c
+++ b/hw/net/eepro100.c
@@ -551,9 +551,7 @@ static void e100_pci_reset(EEPRO100State *s, Error **errp)
 if (info->power_management) {
 /* Power Management Capabilities */
 int cfg_offset = 0xdc;
-int r = pci_add_capability(&s->dev, PCI_CAP_ID_PM,
-   cfg_offset, PCI_PM_SIZEOF,
-   errp);
+int r = pci_pm_init(&s->dev, cfg_offset, errp);
 if (r < 0) {
 return;
 }
diff --git a/hw/net/igb.c b/hw/net/igb.c
index 4d93ce629f95..700dbc746d3d 100644
--- a/hw/net/igb.c
+++ b/hw/net/igb.c
@@ -356,8 +356,7 @@ static int
 igb_add_pm_capability(PCIDevice *pdev, uint8_t offset, uint16_t pmc)
 {
 Error *local_err = NULL;
-int ret = pci_add_capability(pdev, PCI_CAP_ID_PM, offset,
- PCI_PM_SIZEOF, &local_err);
+int ret = pci_pm_init(pdev, offset, &local_err);
 
 if (local_err) {
 error_report_err(local_err);
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 68903d1d7067..1faea3d2b85b 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -8503,8 +8503,7 @@ static int nvme_add_pm_capability(PCIDevice *pci_dev, 
uint8_t offset)
 Error *err = NULL;
 int ret;
 
-ret = pci_add_capability(pci_dev, PCI_CAP_ID_PM, offset,
- PCI_PM_SIZEOF, &err);
+ret = pci_pm_init(pci_dev, offset, &err);
 if (err) {
 error_report_err(err);
 return ret;
diff --git a/hw/pci-bridge/pcie_pci_bridge.c b/hw/pci-bridge/pcie_pci_bridge.c
index fd4514a595ce..9fa656b43b42 100644
--- a/hw/pci-bridge/pcie_pci_bridge.c
+++ b/hw/pci-bridge/pcie_pci_bridge.c
@@ -52,7 +52,7 @@ static void pcie_pci_bridge_realize(PCIDevice *d, Error 
**errp)
 goto cap_error;
 }
 
-pos = pci_add_capability(d, PCI_CAP_ID_PM, 0, PCI_PM_SIZEOF, errp);
+pos = pci_pm_init(d, 0, errp);
 if (pos < 0) {
 goto pm_error;
 }
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 89d900e9cf0c..1a4a0b4b15b4 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2216,7 +2216,12 @@ static bool vfio_add_std_cap(VFIOPCIDevice *vdev, 
uint8_t pos, Error **errp)
 case PCI_CAP_ID_PM:
 vfio_check_pm_reset(vdev, pos);
 vdev->pm_cap = pos;
-ret = pci_add_capability(pdev, cap_id, pos, size, errp) >= 0;
+ret = pci_pm_init(pdev, pos, errp) >= 0;
+/*
+ * PCI-core config space emulation needs write access to the power
+ * state enabled for tracking BAR mapping relative to PM state.
+ */
+pci_set_word(pdev->wmask + pos + PCI_PM_CTRL, PCI_PM_CTRL_STATE_MASK);
 break;
 case PCI_CAP_ID_AF:
 vfio_check_af_flr(vdev, pos);
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index c773a9130c7e..afe8b5551c5c 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -2204,8 +2204,7 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error 
**errp)
 pos = pcie_endpoint_cap_init(pci_dev, 0);
 assert(pos > 0);
 
-pos = pci_add_capability(pci_dev, PCI_CAP_ID_PM, 0,
- PCI_PM_SIZEOF, errp);
+pos = pci_pm_init(pci_dev, 0, errp);
 if (pos < 0) {
 return;
 }
-- 
2.48.1

[PATCH v2 3/5] vfio/pci: Delete local pm_cap

2025-02-25 Thread Alex Williamson

This is now redundant to PCIDevice.pm_cap.

Cc: Cédric Le Goater 
Reviewed-by: Zhenzhong Duan 
Reviewed-by: Eric Auger 
Signed-off-by: Alex Williamson 
---
 hw/vfio/pci.c | 9 -
 hw/vfio/pci.h | 1 -
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 1a4a0b4b15b4..eab8974e9b48 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2215,7 +2215,6 @@ static bool vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t 
pos, Error **errp)
 break;
 case PCI_CAP_ID_PM:
 vfio_check_pm_reset(vdev, pos);
-vdev->pm_cap = pos;
 ret = pci_pm_init(pdev, pos, errp) >= 0;
 /*
  * PCI-core config space emulation needs write access to the power
@@ -2412,17 +2411,17 @@ void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
 vfio_disable_interrupts(vdev);
 
 /* Make sure the device is in D0 */
-if (vdev->pm_cap) {
+if (pdev->pm_cap) {
 uint16_t pmcsr;
 uint8_t state;
 
-pmcsr = vfio_pci_read_config(pdev, vdev->pm_cap + PCI_PM_CTRL, 2);
+pmcsr = vfio_pci_read_config(pdev, pdev->pm_cap + PCI_PM_CTRL, 2);
 state = pmcsr & PCI_PM_CTRL_STATE_MASK;
 if (state) {
 pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
-vfio_pci_write_config(pdev, vdev->pm_cap + PCI_PM_CTRL, pmcsr, 2);
+vfio_pci_write_config(pdev, pdev->pm_cap + PCI_PM_CTRL, pmcsr, 2);
 /* vfio handles the necessary delay here */
-pmcsr = vfio_pci_read_config(pdev, vdev->pm_cap + PCI_PM_CTRL, 2);
+pmcsr = vfio_pci_read_config(pdev, pdev->pm_cap + PCI_PM_CTRL, 2);
 state = pmcsr & PCI_PM_CTRL_STATE_MASK;
 if (state) {
 error_report("vfio: Unable to power on device, stuck in D%d",
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 43c166680abb..d638c781f6f1 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -160,7 +160,6 @@ struct VFIOPCIDevice {
 int32_t bootindex;
 uint32_t igd_gms;
 OffAutoPCIBAR msix_relo;
-uint8_t pm_cap;
 uint8_t nv_gpudirect_clique;
 bool pci_aer;
 bool req_enabled;
-- 
2.48.1

Re: [PATCH] hw/misc/macio: Improve trace logs

2025-02-25 Thread Mark Cave-Ayland


On 22/02/2025 12:28, BALATON Zoltan wrote:


Add macio_gpio_read trace event and use that in macio_gpio_read()
instead of macio_gpio_write. Also change log message to match
macio_timer_{read,write}.

Signed-off-by: BALATON Zoltan 
---
  hw/misc/macio/gpio.c   | 2 +-
  hw/misc/macio/trace-events | 3 ++-
  2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/misc/macio/gpio.c b/hw/misc/macio/gpio.c
index 7cad62819a..4364afc84a 100644
--- a/hw/misc/macio/gpio.c
+++ b/hw/misc/macio/gpio.c
@@ -135,7 +135,7 @@ static uint64_t macio_gpio_read(void *opaque, hwaddr addr, 
unsigned size)
  }
  }
  
-trace_macio_gpio_write(addr, val);

+trace_macio_gpio_read(addr, val);
  return val;
  }
  
diff --git a/hw/misc/macio/trace-events b/hw/misc/macio/trace-events

index ad4b9d1c08..055a407aeb 100644
--- a/hw/misc/macio/trace-events
+++ b/hw/misc/macio/trace-events
@@ -18,7 +18,8 @@ macio_timer_read(uint64_t addr, unsigned len, uint32_t val) "read 
addr 0x%"PRIx6
  macio_set_gpio(int gpio, bool state) "setting GPIO %d to %d"
  macio_gpio_irq_assert(int gpio) "asserting GPIO %d"
  macio_gpio_irq_deassert(int gpio) "deasserting GPIO %d"
-macio_gpio_write(uint64_t addr, uint64_t val) "addr: 0x%"PRIx64" value: 
0x%"PRIx64
+macio_gpio_write(uint64_t addr, uint64_t val) "addr 0x%"PRIx64" val 0x%"PRIx64
+macio_gpio_read(uint64_t addr, uint64_t val) "addr 0x%"PRIx64" val 0x%"PRIx64
  
  # pmu.c

  pmu_adb_poll(int olen) "ADB autopoll, olen=%d"


Reviewed-by: Mark Cave-Ayland 


ATB,

Mark.

Re: [PATCH] chardev: use remoteAddr if the chardev is client

2025-02-25 Thread Haoqian He



> 2025年2月25日 21:29，Marc-André Lureau  写道：
> 
> Hi Haoqian
> 
> On Tue, Feb 25, 2025 at 5:19 PM Haoqian He  wrote:
>> 
>> I use chardev to connect with a vhost-user target, the chardev backend type 
>> is
>> socket, part of QEMU boot parameter:
>>  -device vhost-user-blk-pci,chardev=my-vhost-blk-0,id=my-vhost-blk-0,\
>>  bus=pci.1,addr=0x1,bootindex=1,num-queues=4 \
>>  -chardev socket,id=my-vhost-blk-0,path=/tmp/vhost-blk.1
>> 
>> I want to log the chardev’s socket path when socket successfully connected
>> in tcp_chr_connect (chr->filename), but i got "unix:" instead of the expected
>> "unix:/tmp/vhost-blk.1".
>> 
>> The chr->filename was computed from the function qemu_chr_compute_filename,
>> which always return the unix path store by local QIOChannelSocket localAddr.
>> 
>> Unfortunately, the localAddr is obtained via getsockname (see the function
>> qio_channel_socket_set_fd), and according to the man page:
>> "getsockname() returns the current address bound by the socket sockfd".
>> 
>> In this scenario, the socket client's sockfd is unbonded, so the socket
>> filename in localAddr(ss) is NULL.
>> As a solution, we can use remoteAddr(ps) obtained by getpeername (see the
>> function qio_channel_socket_set_fd).
>> 
> 
> thanks a lot, I didn't notice ps != ss in the patch. That makes sense.
> 
> Reviewed-by: Marc-André Lureau 
> 
> 
> 
> -- 
> Marc-André Lureau

Thanks, Lureau.

--
Haoqian

Re: [PATCH v3 012/162] tcg: Convert or to TCGOutOpBinary

2025-02-25 Thread Philippe Mathieu-Daudé


On 17/2/25 00:07, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---
  tcg/tcg.c|  4 +++
  tcg/aarch64/tcg-target.c.inc | 31 -
  tcg/arm/tcg-target.c.inc | 24 
  tcg/i386/tcg-target.c.inc| 25 +
  tcg/loongarch64/tcg-target.c.inc | 29 
  tcg/mips/tcg-target.c.inc| 25 -
  tcg/ppc/tcg-target.c.inc | 29 
  tcg/riscv/tcg-target.c.inc   | 29 
  tcg/s390x/tcg-target.c.inc   | 47 +---
  tcg/sparc64/tcg-target.c.inc | 23 
  tcg/tci/tcg-target.c.inc | 14 --
  11 files changed, 186 insertions(+), 94 deletions(-)




diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc




+static void tgen_ori(TCGContext *s, TCGType type,
+ TCGReg a0, TCGReg a1, tcg_target_long a2)
+{
+int rexw = type == TCG_TYPE_I32 ? 0 : P_REXW;
+tgen_arithi(s, ARITH_OR + rexw, a0, a2, 0);


s/0/false/


+}




diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc




+static const TCGOutOpBinary outop_or = {
+.base.static_constraint = C_O1_I2(r, r, rU),


So 32-bit gets s/i/U/ which is TCG_CT_CONST_U32, a no-op, OK.


+.out_rrr = tgen_or,
+.out_rri = tgen_ori,
+};




-case INDEX_op_or_i32:
  case INDEX_op_xor_i32:
  case INDEX_op_orc_i32:
  case INDEX_op_eqv_i32:
@@ -4172,7 +4180,6 @@ tcg_target_op_def(TCGOpcode op, TCGType type, unsigned 
flags)
  
  case INDEX_op_sub_i32:

  return C_O1_I2(r, rI, ri);
-case INDEX_op_or_i64:
  case INDEX_op_xor_i64:
  return C_O1_I2(r, r, rU);
  case INDEX_op_sub_i64:


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 03/10] qapi: delete un-needed python static analysis configs

2025-02-25 Thread Markus Armbruster

John Snow  writes:

> The pylint config is being left in place because the settings differ
> enough from the python/ directory settings that we need a chit-chat on
> how to merge them O:-)
>
> Everything else can go.
>
> Signed-off-by: John Snow 
> ---
>  scripts/qapi/.flake8| 3 ---
>  scripts/qapi/.isort.cfg | 7 ---
>  scripts/qapi/mypy.ini   | 4 
>  3 files changed, 14 deletions(-)
>  delete mode 100644 scripts/qapi/.flake8
>  delete mode 100644 scripts/qapi/.isort.cfg
>  delete mode 100644 scripts/qapi/mypy.ini
>
> diff --git a/scripts/qapi/.flake8 b/scripts/qapi/.flake8
> deleted file mode 100644
> index a873ff67309..000
> --- a/scripts/qapi/.flake8
> +++ /dev/null
> @@ -1,3 +0,0 @@
> -[flake8]
> -# Prefer pylint's bare-except checks to flake8's
> -extend-ignore = E722

python/setup.cfg has:

   [flake8]
   # Prefer pylint's bare-except checks to flake8's
   extend-ignore = E722
   exclude = __pycache__,

Good.

> diff --git a/scripts/qapi/.isort.cfg b/scripts/qapi/.isort.cfg
> deleted file mode 100644
> index 643caa1fbd6..000
> --- a/scripts/qapi/.isort.cfg
> +++ /dev/null
> @@ -1,7 +0,0 @@
> -[settings]
> -force_grid_wrap=4
> -force_sort_within_sections=True
> -include_trailing_comma=True
> -line_length=72
> -lines_after_imports=2
> -multi_line_output=3

python/setup.cfg has:

   [isort]
   force_grid_wrap=4
   force_sort_within_sections=True
   include_trailing_comma=True
   line_length=72
   lines_after_imports=2
   multi_line_output=3

Good.

> diff --git a/scripts/qapi/mypy.ini b/scripts/qapi/mypy.ini
> deleted file mode 100644
> index 8109470a031..000
> --- a/scripts/qapi/mypy.ini
> +++ /dev/null
> @@ -1,4 +0,0 @@
> -[mypy]
> -strict = True
> -disallow_untyped_calls = False
> -python_version = 3.8

python/setup.cfg has:

   [mypy]
   strict = True
   python_version = 3.8
   warn_unused_configs = True
   namespace_packages = True
   warn_unused_ignores = False

Can you briefly explain the differences?

python/setup.cfg additionally has a bunch of ignore_missing_imports that
don't apply here, as far as I can tell.

Re: [PATCH v2] qapi: pluggable backend code generators

2025-02-25 Thread Markus Armbruster

Daniel P. Berrangé  writes:

> On Tue, Feb 25, 2025 at 01:31:56PM +0100, Markus Armbruster wrote:
>> Daniel P. Berrangé  writes:
>> 
>> > The 'qapi.backend.QAPIBackend' class defines an API contract for code
>> > generators. The current generator is put into a new class
>> > 'qapi.backend.QAPICBackend' and made to be the default impl.
>> >
>> > A custom generator can be requested using the '-k' arg which takes a
>> 
>> Missed an instance of -k.  Can fix this myself.
>> 
>> > fully qualified python class name
>> >
>> >qapi-gen.py -B the.python.module.QAPIMyBackend
>> >
>> > This allows out of tree code to use the QAPI generator infrastructure
>> > to create new language bindings for QAPI schemas. This has the caveat
>> > that the QAPI generator APIs are not guaranteed stable, so consumers
>> > of this feature may have to update their code to be compatible with
>> > future QEMU releases.
>> >
>> > Signed-off-by: Daniel P. Berrangé 
>> > ---
>
>> > -def generate(schema_file: str,
>> > - output_dir: str,
>> > - prefix: str,
>> > - unmask: bool = False,
>> > - builtins: bool = False,
>> > - gen_tracing: bool = False) -> None:
>> > -"""
>> > -Generate C code for the given schema into the target directory.
>> > +def create_backend(path: str) -> QAPIBackend:
>> > +if path is None:
>> > +return QAPICBackend()
>> >  
>> > -:param schema_file: The primary QAPI schema file.
>> > -:param output_dir: The output directory to store generated code.
>> > -:param prefix: Optional C-code prefix for symbol names.
>> > -:param unmask: Expose non-ABI names through introspection?
>> > -:param builtins: Generate code for built-in types?
>> > +if "." not in path:
>> > +print(f"Missing qualified module path in '{path}'", 
>> > file=sys.stderr)
>> > +sys.exit(1)
>> >  
>> > -:raise QAPIError: On failures.
>> > -"""
>> > -assert invalid_prefix_char(prefix) is None
>> > +module_path, _, class_name = path.rpartition('.')
>> 
>> I'd like to tweak this to
>> 
>>module_path, dot, class_name = path.rpartition('.')
>>if not dot:
>>print(f"argument of -B must be of the form MODULE.CLASS",
>>  file=sys.stderr)
>>sys.exit(1)
>
> Yep, sure thing.
>
>> 
>> > +try:
>> > +mod = import_module(module_path)
>> > +except Exception as ex:
>> 
>> pylint complains:
>> 
>> scripts/qapi/main.py:39:11: W0718: Catching too general exception 
>> Exception (broad-exception-caught)
>> 
>> I can't see offhand what exception(s) we're supposed to catch here, so
>> let's not worry about this now.
>
> Yeah, I was concerned that by putting specialized exception
> classes here, we would miss some possible scenarios, and IMHO
> the completeness of catching Exception is better than the
> technical purity of pylint's complaint. 

Cleaner code would require a stronger contract.

We'll be just fine.

>> > +if not hasattr(mod, class_name):
>> > +print(f"Module '{module_path}' has no class '{class_name}'", 
>> > file=sys.stderr)
>> 
>> pycodestyle-3 complains:
>> 
>> scripts/qapi/main.py:44:80: E501 line too long (85 > 79 characters)
>> 
>> Let's break the line after the comma, and also start the error message
>> in lower case for consistency with error messages elsewhere.
>> 
>> > +sys.exit(1)
>> > +klass = getattr(mod, class_name)
>> 
>> Alternatively
>> 
>>try:
>>klass = getattr(mod, class_name)
>>except AttributeError:
>>print(f"module '{module_path}' has no class '{class_name}'",
>>  file=sys.stderr)
>>sys.exit(1)
>> 
>> Admittedly a matter of taste.  I tend to avoid the
>> 
>> if frobnicate would fail:
>> error out
>> frobnicate
>> 
>> pattern when practical.
>
> I guess I tend to avoid using exception catching for such flow
> control, but I don't mind that much either way.

I'm not a fan of using exceptions for mundane failures, but it's how
Python rolls.

>> > -schema = QAPISchema(schema_file)
>> > -gen_types(schema, output_dir, prefix, builtins)
>> > -gen_features(schema, output_dir, prefix)
>> > -gen_visit(schema, output_dir, prefix, builtins)
>> > -gen_commands(schema, output_dir, prefix, gen_tracing)
>> > -gen_events(schema, output_dir, prefix)
>> > -gen_introspect(schema, output_dir, prefix, unmask)
>> > +return backend
>> > +except Exception as ex:
>> 
>> Likewise too general exception.
>> 
>> I'd like to shrink the try block and reduce the nesting:
>> 
>>try:
>>backend = klass()
>>except Exception as ex:
>>print(f"Backend '{path}' cannot be instantiated: {ex}", 
>> file=sys.stderr)
>>sys.exit(1)
>> 
>>if not isinstance(backend, QAPIBackend):
>>print(f"Backend '{path}' must be an instance of QAPIBackend", 
>> file=sys.stderr)
>>

Re: [PATCH v5 14/24] hw/uefi: add var-service-json.c + qapi for NV vars.

2025-02-25 Thread Gerd Hoffmann

  Hi,

> > +# @data: variable value, encoded as hex string.
> 
> I understand this is a blob.  We commonly use base64 for that.  Why not
> here?

It's an existing format already supported by other tools.  Guess I
should add that to the preamble.

> > +# @digest: variable certificate digest.  Used to verify the signature
> > +# of updates for authenticated variables.
> 
> How to create and verify these digests will be obvious enough to users
> of this interface?

Well, no.  It's a somewhat complicated story ...

UEFI has two kinds of authenticated variables.  First, the secure boot
variables (PK, KEK, db, dbx).  These have hard-coded rules, the rule
most relevant in practice is that signed update requests for the 'db'
and 'dbx' variables are checked against the 'KEK' certificate database.

For other authenticated variables UEFI essentially remembers the
certificate when the variable is created, and any update / delete
requests need a signature from the same certificate (simplified a
bit to keep it short, the actual rules are more complicated, details
are in the UEFI spec).

This digest handles the "remembers the certificate" part.

In practice the second kind of authenticated variables is rarely used.
I have yet to see a piece of software actually using that in practice
(other than the test case I have written).


Also note that this is *not* part of the QMP interface.  The driver uses
this to store the efi variable store in json format on disk (see
var-service-json.c in this patch).

Adding support for reading/writing uefi variables is something we might
add to the monitor in the future.   Should that happen it is not clear
whenever such an interface would actually use the raw 'UefiVariable'
struct or if higher-level interfaces would be more useful.  For example
a command to query whenever the guest has secure boot turned on.

take care,
  Gerd

Re: [PATCH v3 017/162] tcg: Convert xor to TCGOutOpBinary

2025-02-25 Thread Philippe Mathieu-Daudé


On 17/2/25 00:07, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---
  tcg/tcg.c|  4 +++
  tcg/aarch64/tcg-target.c.inc | 31 +++-
  tcg/arm/tcg-target.c.inc | 25 +++-
  tcg/i386/tcg-target.c.inc| 27 -
  tcg/loongarch64/tcg-target.c.inc | 29 +++---
  tcg/mips/tcg-target.c.inc| 28 +++---
  tcg/ppc/tcg-target.c.inc | 30 +++
  tcg/riscv/tcg-target.c.inc   | 29 +++---
  tcg/s390x/tcg-target.c.inc   | 50 
  tcg/sparc64/tcg-target.c.inc | 23 +++
  tcg/tci/tcg-target.c.inc | 14 +++--
  11 files changed, 186 insertions(+), 104 deletions(-)




diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc



+static void tgen_xori(TCGContext *s, TCGType type,
+  TCGReg a0, TCGReg a1, tcg_target_long a2)
+{
+int rexw = type == TCG_TYPE_I32 ? 0 : P_REXW;
+tgen_arithi(s, ARITH_XOR + rexw, a0, a2, 0);


s/0/false/


+}


Reviewed-by: Philippe Mathieu-Daudé

[PATCH v4 6/6] migration: Add qtest for migration over RDMA

2025-02-25 Thread Li Zhijian via

This qtest requires there is a RDMA(RoCE) link in the host.
In order to make the test work smoothly, introduce a
scripts/rdma-migration-helper.sh to
- setup a new Soft-RoCE(aka RXE) if it's root
- detect existing RoCE link

Test will be skipped if there is no available RoCE link.
 # Start of rdma tests
 # Running /x86_64/migration/precopy/rdma/plain
 ok 1 /x86_64/migration/precopy/rdma/plain # SKIP
 There is no available rdma link to run RDMA migration test.
 To enable the test:
 (1) Run 'scripts/rdma-migration-helper.sh setup' with root and rerun the test
 or
 (2) Run the test with root privilege

 # End of rdma tests

Reviewed-by: Peter Xu 
Signed-off-by: Li Zhijian 
---
 MAINTAINERS   |  1 +
 scripts/rdma-migration-helper.sh  | 41 +
 tests/qtest/migration/precopy-tests.c | 64 +++
 3 files changed, 106 insertions(+)
 create mode 100755 scripts/rdma-migration-helper.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index 3848d37a38d..15360fcdc4b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3480,6 +3480,7 @@ R: Li Zhijian 
 R: Peter Xu 
 S: Odd Fixes
 F: migration/rdma*
+F: scripts/rdma-migration-helper.sh
 
 Migration dirty limit and dirty page rate
 M: Hyman Huang 
diff --git a/scripts/rdma-migration-helper.sh b/scripts/rdma-migration-helper.sh
new file mode 100755
index 000..66557d9e267
--- /dev/null
+++ b/scripts/rdma-migration-helper.sh
@@ -0,0 +1,41 @@
+#!/bin/bash
+
+# Copied from blktests
+get_ipv4_addr()
+{
+ip -4 -o addr show dev "$1" |
+sed -n 's/.*[[:blank:]]inet[[:blank:]]*\([^[:blank:]/]*\).*/\1/p' |
+tr -d '\n'
+}
+
+has_soft_rdma()
+{
+rdma link | grep -q " netdev $1[[:blank:]]*\$"
+}
+
+rdma_rxe_setup_detect()
+{
+(
+cd /sys/class/net &&
+for i in *; do
+[ -e "$i" ] || continue
+[ "$i" = "lo" ] && continue
+[ "$(<"$i/addr_len")" = 6 ] || continue
+[ "$(<"$i/carrier")" = 1 ] || continue
+
+has_soft_rdma "$i" && break
+[ "$operation" = "setup" ] &&
+rdma link add "${i}_rxe" type rxe netdev "$i" && break
+done
+has_soft_rdma "$i" || return
+get_ipv4_addr "$i"
+)
+}
+
+operation=${1:-setup}
+
+if [ "$operation" == "setup" ] || [ "$operation" == "detect" ]; then
+rdma_rxe_setup_detect
+else
+echo "Usage: $0 [setup | detect]"
+fi
diff --git a/tests/qtest/migration/precopy-tests.c 
b/tests/qtest/migration/precopy-tests.c
index ba273d10b9a..bf97f4e9325 100644
--- a/tests/qtest/migration/precopy-tests.c
+++ b/tests/qtest/migration/precopy-tests.c
@@ -99,6 +99,66 @@ static void test_precopy_unix_dirty_ring(void)
 test_precopy_common(&args);
 }
 
+#ifdef CONFIG_RDMA
+
+#define RDMA_MIGRATION_HELPER "scripts/rdma-migration-helper.sh"
+static int new_rdma_link(char *buffer)
+{
+const char *argument = (geteuid() == 0) ? "setup" : "detect";
+char cmd[1024];
+
+snprintf(cmd, sizeof(cmd), "%s %s", RDMA_MIGRATION_HELPER, argument);
+
+FILE *pipe = popen(cmd, "r");
+if (pipe == NULL) {
+perror("Failed to run script");
+return -1;
+}
+
+int idx = 0;
+while (fgets(buffer + idx, 128 - idx, pipe) != NULL) {
+idx += strlen(buffer);
+}
+
+int status = pclose(pipe);
+if (status == -1) {
+perror("Error reported by pclose()");
+return -1;
+} else if (WIFEXITED(status)) {
+return WEXITSTATUS(status);
+}
+
+return -1;
+}
+
+static void test_precopy_rdma_plain(void)
+{
+char buffer[128] = {};
+
+if (new_rdma_link(buffer)) {
+g_test_skip("\nThere is no available rdma link to run RDMA migration 
test.\n"
+"To enable the test:\n"
+"(1) Run \'" RDMA_MIGRATION_HELPER " setup\' with root and 
rerun the test\n"
+"or\n"
+"(2) Run the test with root privilege\n");
+return;
+}
+
+/*
+ * TODO: query a free port instead of hard code.
+ * 29200=('R'+'D'+'M'+'A')*100
+ **/
+g_autofree char *uri = g_strdup_printf("rdma:%s:29200", buffer);
+
+MigrateCommon args = {
+.listen_uri = uri,
+.connect_uri = uri,
+};
+
+test_precopy_common(&args);
+}
+#endif
+
 static void test_precopy_tcp_plain(void)
 {
 MigrateCommon args = {
@@ -1124,6 +1184,10 @@ static void 
migration_test_add_precopy_smoke(MigrationTestEnv *env)
test_multifd_tcp_uri_none);
 migration_test_add("/migration/multifd/tcp/plain/cancel",
test_multifd_tcp_cancel);
+#ifdef CONFIG_RDMA
+migration_test_add("/migration/precopy/rdma/plain",
+   test_precopy_rdma_plain);
+#endif
 }
 
 void migration_test_add_precopy(MigrationTestEnv *env)
-- 
2.44.0

[PATCH v4 2/6] migration: check RDMA and capabilities are compatible on both sides

2025-02-25 Thread Li Zhijian via

Depending on the order of starting RDMA and setting capability,
the following scenarios can be categorized into the following scenarios:
Source:
 S1: [set capabilities] -> [Start RDMA outgoing]
Destination:
 D1: [set capabilities] -> [Start RDMA incoming]
 D2: [Start RDMA incoming] -> [set capabilities]

Previously, compatibility between RDMA and capabilities was verified only
in scenario D1, potentially causing migration failures in other situations.

For scenarios S1 and D1, we can seamlessly incorporate
migration_transport_compatible() to address compatibility between
channels and capabilities vs transport.

For scenario D2, ensure compatibility within migrate_caps_check().

Signed-off-by: Li Zhijian 
---

V4:
  - Remove Reviewed-tag and cover above D2 scenario

V3:
  - collect Reviewed tag
  - reorder: 5th -> 2nd
---
 migration/migration.c | 30 --
 migration/options.c   | 21 +
 migration/options.h   |  1 +
 3 files changed, 42 insertions(+), 10 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index c597aa707e5..0736d3a6728 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -238,6 +238,24 @@ 
migration_channels_and_transport_compatible(MigrationAddress *addr,
 return true;
 }
 
+static bool
+migration_capabilities_and_transport_compatible(MigrationAddress *addr,
+Error **errp)
+{
+if (addr->transport == MIGRATION_ADDRESS_TYPE_RDMA) {
+return migrate_rdma_caps_check(migrate_get_current()->capabilities,
+   errp);
+}
+
+return true;
+}
+
+static bool migration_transport_compatible(MigrationAddress *addr, Error 
**errp)
+{
+return migration_channels_and_transport_compatible(addr, errp) &&
+   migration_capabilities_and_transport_compatible(addr, errp);
+}
+
 static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
 {
 uintptr_t a = (uintptr_t) ap, b = (uintptr_t) bp;
@@ -716,7 +734,7 @@ static void qemu_start_incoming_migration(const char *uri, 
bool has_channels,
 }
 
 /* transport mechanism not suitable for migration? */
-if (!migration_channels_and_transport_compatible(addr, errp)) {
+if (!migration_transport_compatible(addr, errp)) {
 return;
 }
 
@@ -735,14 +753,6 @@ static void qemu_start_incoming_migration(const char *uri, 
bool has_channels,
 }
 #ifdef CONFIG_RDMA
 } else if (addr->transport == MIGRATION_ADDRESS_TYPE_RDMA) {
-if (migrate_xbzrle()) {
-error_setg(errp, "RDMA and XBZRLE can't be used together");
-return;
-}
-if (migrate_multifd()) {
-error_setg(errp, "RDMA and multifd can't be used together");
-return;
-}
 rdma_start_incoming_migration(&addr->u.rdma, errp);
 #endif
 } else if (addr->transport == MIGRATION_ADDRESS_TYPE_EXEC) {
@@ -2159,7 +2169,7 @@ void qmp_migrate(const char *uri, bool has_channels,
 }
 
 /* transport mechanism not suitable for migration? */
-if (!migration_channels_and_transport_compatible(addr, errp)) {
+if (!migration_transport_compatible(addr, errp)) {
 return;
 }
 
diff --git a/migration/options.c b/migration/options.c
index bb259d192a9..c6f18df5864 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -439,6 +439,20 @@ static bool migrate_incoming_started(void)
 return !!migration_incoming_get_current()->transport_data;
 }
 
+bool migrate_rdma_caps_check(bool *caps, Error **errp)
+{
+if (caps[MIGRATION_CAPABILITY_XBZRLE]) {
+error_setg(errp, "RDMA and XBZRLE can't be used together");
+return false;
+}
+if (caps[MIGRATION_CAPABILITY_MULTIFD]) {
+error_setg(errp, "RDMA and multifd can't be used together");
+return false;
+}
+
+return true;
+}
+
 /**
  * @migration_caps_check - check capability compatibility
  *
@@ -602,6 +616,13 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, 
Error **errp)
 }
 }
 
+/*
+ * On destination side, check the cases that capability is being set
+ * after incoming thread has started.
+*/
+if (migrate_rdma() && !migrate_rdma_caps_check(new_caps, errp)) {
+return false;
+}
 return true;
 }
 
diff --git a/migration/options.h b/migration/options.h
index 762be4e641a..82d839709e7 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -57,6 +57,7 @@ bool migrate_tls(void);
 
 /* capabilities helpers */
 
+bool migrate_rdma_caps_check(bool *caps, Error **errp);
 bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp);
 
 /* parameters */
-- 
2.44.0

[PATCH v4 0/6] migration/rdma: fixes, refactor and cleanup

2025-02-25 Thread Li Zhijian via

- It fix the RDMA migration broken issue
- disable RDMA + postcopy
- some cleanups
- Add a qtest for RDMA at last

Changes since V3:
- check RDMA and capabilities are compatible on both sides # renamed from
  previous V3's "migration: Add 
migration_capabilities_and_transport_compatible()"

Changes since V2:
- squash previous 2/3/4 to '[PATCH v3 5/6] migration: Unfold  
control_save_page()'
- reorder the patch layout to prevent recently added code from being deleted 
again.
- collect Reviewed tags from Peter

Changes since V1[0]:
Add some saparate patches to refactor and cleanup based on V1

[0] 
https://lore.kernel.org/qemu-devel/20250218074345.638203-1-lizhij...@fujitsu.com/


Li Zhijian (6):
  migration: Prioritize RDMA in ram_save_target_page()
  migration: check RDMA and capabilities are compatible on both sides
  migration: disable RDMA + postcopy-ram
  migration/rdma: Remove redundant migration_in_postcopy checks
  migration: Unfold control_save_page()
  migration: Add qtest for migration over RDMA

 MAINTAINERS   |  1 +
 migration/migration.c | 30 -
 migration/options.c   | 25 +++
 migration/options.h   |  1 +
 migration/ram.c   | 41 +
 migration/rdma.c  | 11 ++---
 migration/rdma.h  |  3 +-
 scripts/rdma-migration-helper.sh  | 41 +
 tests/qtest/migration/precopy-tests.c | 64 +++
 9 files changed, 168 insertions(+), 49 deletions(-)
 create mode 100755 scripts/rdma-migration-helper.sh

-- 
2.44.0

[PATCH v4 4/6] migration/rdma: Remove redundant migration_in_postcopy checks

2025-02-25 Thread Li Zhijian via

Since we have disabled RDMA + postcopy, it's safe to remove
the migration_in_postcopy() that follows the migrate_rdma().

Signed-off-by: Li Zhijian 
---
V3:
  reorder: 7th->4th
---
 migration/rdma.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 76fb0349238..e5b4ac599b1 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3284,7 +3284,7 @@ err:
 int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
ram_addr_t offset, size_t size)
 {
-if (!migrate_rdma() || migration_in_postcopy()) {
+if (!migrate_rdma()) {
 return RAM_SAVE_CONTROL_NOT_SUPP;
 }
 
@@ -3829,7 +3829,7 @@ int rdma_block_notification_handle(QEMUFile *f, const 
char *name)
 
 int rdma_registration_start(QEMUFile *f, uint64_t flags)
 {
-if (!migrate_rdma() || migration_in_postcopy()) {
+if (!migrate_rdma()) {
 return 0;
 }
 
@@ -3861,7 +3861,7 @@ int rdma_registration_stop(QEMUFile *f, uint64_t flags)
 RDMAControlHeader head = { .len = 0, .repeat = 1 };
 int ret;
 
-if (!migrate_rdma() || migration_in_postcopy()) {
+if (!migrate_rdma()) {
 return 0;
 }
 
-- 
2.44.0

[PATCH v4 5/6] migration: Unfold control_save_page()

2025-02-25 Thread Li Zhijian via

control_save_page() is for RDMA only, unfold it to make the code more
clear.
In addition:
 - Similar to other branches style in ram_save_target_page(), involve RDMA
   only if the condition 'migrate_rdma()' is true.
 - Further simplify the code by removing the RAM_SAVE_CONTROL_NOT_SUPP.

Signed-off-by: Li Zhijian 
---
V3:
  squash previous 2nd, 3th, 4th into one patch
---
 migration/ram.c  | 34 +++---
 migration/rdma.c |  7 ++-
 migration/rdma.h |  3 +--
 3 files changed, 10 insertions(+), 34 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 424df6d9f13..c363034c882 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1143,32 +1143,6 @@ static int save_zero_page(RAMState *rs, PageSearchStatus 
*pss,
 return len;
 }
 
-/*
- * @pages: the number of pages written by the control path,
- *< 0 - error
- *> 0 - number of pages written
- *
- * Return true if the pages has been saved, otherwise false is returned.
- */
-static bool control_save_page(PageSearchStatus *pss,
-  ram_addr_t offset, int *pages)
-{
-int ret;
-
-ret = rdma_control_save_page(pss->pss_channel, pss->block->offset, offset,
- TARGET_PAGE_SIZE);
-if (ret == RAM_SAVE_CONTROL_NOT_SUPP) {
-return false;
-}
-
-if (ret == RAM_SAVE_CONTROL_DELAYED) {
-*pages = 1;
-return true;
-}
-*pages = ret;
-return true;
-}
-
 /*
  * directly send the page to the stream
  *
@@ -1965,7 +1939,13 @@ static int ram_save_target_page(RAMState *rs, 
PageSearchStatus *pss)
 int res;
 
 /* Hand over to RDMA first */
-if (control_save_page(pss, offset, &res)) {
+if (migrate_rdma()) {
+res = rdma_control_save_page(pss->pss_channel, pss->block->offset,
+ offset, TARGET_PAGE_SIZE);
+
+if (res == RAM_SAVE_CONTROL_DELAYED) {
+res = 1;
+}
 return res;
 }
 
diff --git a/migration/rdma.c b/migration/rdma.c
index e5b4ac599b1..08eb924ffaa 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3284,14 +3284,11 @@ err:
 int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
ram_addr_t offset, size_t size)
 {
-if (!migrate_rdma()) {
-return RAM_SAVE_CONTROL_NOT_SUPP;
-}
+assert(migrate_rdma());
 
 int ret = qemu_rdma_save_page(f, block_offset, offset, size);
 
-if (ret != RAM_SAVE_CONTROL_DELAYED &&
-ret != RAM_SAVE_CONTROL_NOT_SUPP) {
+if (ret != RAM_SAVE_CONTROL_DELAYED) {
 if (ret < 0) {
 qemu_file_set_error(f, ret);
 }
diff --git a/migration/rdma.h b/migration/rdma.h
index f55f28bbed1..8eeb0117b91 100644
--- a/migration/rdma.h
+++ b/migration/rdma.h
@@ -33,7 +33,6 @@ void rdma_start_incoming_migration(InetSocketAddress 
*host_port, Error **errp);
 #define RAM_CONTROL_ROUND 1
 #define RAM_CONTROL_FINISH3
 
-#define RAM_SAVE_CONTROL_NOT_SUPP -1000
 #define RAM_SAVE_CONTROL_DELAYED  -2000
 
 #ifdef CONFIG_RDMA
@@ -56,7 +55,7 @@ static inline
 int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
ram_addr_t offset, size_t size)
 {
-return RAM_SAVE_CONTROL_NOT_SUPP;
+g_assert_not_reached();
 }
 #endif
 #endif
-- 
2.44.0

[PATCH v4 1/6] migration: Prioritize RDMA in ram_save_target_page()

2025-02-25 Thread Li Zhijian via

Address an error in RDMA-based migration by ensuring RDMA is prioritized
when saving pages in `ram_save_target_page()`.

Previously, the RDMA protocol's page-saving step was placed after other
protocols due to a refactoring in commit bc38dc2f5f3. This led to migration
failures characterized by unknown control messages and state loading errors
destination:
(qemu) qemu-system-x86_64: Unknown control message QEMU FILE
qemu-system-x86_64: error while loading state section id 1(ram)
qemu-system-x86_64: load of migration failed: Operation not permitted
source:
(qemu) qemu-system-x86_64: RDMA is in an error state waiting migration to abort!
qemu-system-x86_64: failed to save SaveStateEntry with id(name): 1(ram): -1
qemu-system-x86_64: rdma migration: recv polling control error!
qemu-system-x86_64: warning: Early error. Sending error.
qemu-system-x86_64: warning: rdma migration: send polling control error

RDMA migration implemented its own protocol/method to send pages to
destination side, hand over to RDMA first to prevent pages being saved by
other protocol.

Fixes: bc38dc2f5f3 ("migration: refactor ram_save_target_page functions")
Reviewed-by: Peter Xu 
Signed-off-by: Li Zhijian 
---
V3:
   collect Reviewed tags
---
 migration/ram.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 589b6505eb2..424df6d9f13 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1964,6 +1964,11 @@ static int ram_save_target_page(RAMState *rs, 
PageSearchStatus *pss)
 ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
 int res;
 
+/* Hand over to RDMA first */
+if (control_save_page(pss, offset, &res)) {
+return res;
+}
+
 if (!migrate_multifd()
 || migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) {
 if (save_zero_page(rs, pss, offset)) {
@@ -1976,10 +1981,6 @@ static int ram_save_target_page(RAMState *rs, 
PageSearchStatus *pss)
 return ram_save_multifd_page(block, offset);
 }
 
-if (control_save_page(pss, offset, &res)) {
-return res;
-}
-
 return ram_save_page(rs, pss);
 }
 
-- 
2.44.0

Re: [PATCH v2 5/8] migration: Add migration_capabilities_and_transport_compatible() helper

2025-02-25 Thread Zhijian Li (Fujitsu)



On 25/02/2025 22:48, Peter Xu wrote:
> On Tue, Feb 25, 2025 at 06:37:21AM +, Zhijian Li (Fujitsu) wrote:
>>
>>
>> On 25/02/2025 03:58, Peter Xu wrote:
>>> On Fri, Feb 21, 2025 at 02:36:09PM +0800, Li Zhijian wrote:
 Similar to migration_channels_and_transport_compatible(), introduce a
 new helper migration_capabilities_and_transport_compatible() to check if
 the capabilites is compatible with the transport.

 Currently, only move the capabilities vs RDMA transport to this
 function.

 Signed-off-by: Li Zhijian 
>>>
>>> Yeah this is even better, thanks.
>>>
>>> Reviewed-by: Peter Xu 
>>
>> Hi Peter,
>>
>> Thinking one step further, this patch looks promising and can check
>> most situations. However, on the destination side, if the user first
>> specifies '-incoming' (with the startup parameter -incoming xxx or
>> migrate_incoming xxx) and then 'migrate_set_capability xxx on',
>> the function migration_capabilities_and_transport_compatible() will
>> not be called to check compatibility, which might lead to migration failure.
>>
>> To address this, without introducing a new member 'transport' into the 
>> MigrationState
>> structure, the code might need to be adjusted to this:
>>
>> The question is whether we need to consider it now(in this patch set)?
> 
> We can do that in one patch.

Okay, please ignore the V3 and take another look at the V4 which integrated your
below suggestion.


Thanks

> 
>>
>>static bool migration_transport_compatible(MigrationAddress *addr, Error 
>> **errp)
>>{
>>return migration_channels_and_transport_compatible(addr, errp) &&
>> -   migration_capabilities_and_transport_compatible(addr, errp);
>> +   migration_capabilities_and_transport_compatible(addr->transport,
>> +   migrate_get_current()->capabilities, errp);
> 
> Here IMHO we could make migration_capabilities_and_transport_compatible()
> taking addr+errp like before, then..
> 
>>}
>>
>>static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
>> diff --git a/migration/options.c b/migration/options.c
>> index bb259d192a9..59f0ed5b528 100644
>> --- a/migration/options.c
>> +++ b/migration/options.c
>> @@ -439,6 +439,29 @@ static bool migrate_incoming_started(void)
>>return !!migration_incoming_get_current()->transport_data;
>>}
>>
>> +bool
>> +migration_capabilities_and_transport_compatible(MigrationAddressType 
>> transport,
>> +bool *new_caps,
>> +Error **errp)
>> +{
> 
> ..  here fetch global capability list and feed it.
> 
>> +if (transport == MIGRATION_ADDRESS_TYPE_RDMA) {
> 
> [1]
> 
>> +if (new_caps[MIGRATION_CAPABILITY_XBZRLE]) {
>> +error_setg(errp, "RDMA and XBZRLE can't be used together");
>> +return false;
>> +}
>> +if (new_caps[MIGRATION_CAPABILITY_MULTIFD]) {
>> +error_setg(errp, "RDMA and multifd can't be used together");
>> +return false;
>> +}
>> +if (new_caps[MIGRATION_CAPABILITY_POSTCOPY_RAM]) {
>> +error_setg(errp, "RDMA and postcopy-ram can't be used 
>> together");
>> +return false;
>> +}
> 
> We could introduce migration_rdma_caps_check(&caps, errp) for this chunk
> (since [1]), then...
> 
>> +}
>> +
>> +return true;
>> +}
>> +
>>/**
>> * @migration_caps_check - check capability compatibility
>> *
>> @@ -602,6 +625,15 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, 
>> Error **errp)
>>}
>>}
>>
>> +/*
>> + * In destination side, check the cases that capability is being set
>> + * after incoming thread has started.
>> +*/
>> +if (migrate_rdma() &&
>> +!migration_capabilities_and_transport_compatible(
>> +MIGRATION_ADDRESS_TYPE_RDMA, new_caps, errp)) {
>> +return false;
>> +}
> 
> ... use migration_rdma_caps_check() here, might be slightly more readable:
> 
>if (migrate_rdma() && !migration_rdma_caps_check(new_caps, errp)) {
>return false;
>}
> 
>>return true;
>>}
>>
>> diff --git a/migration/options.h b/migration/options.h
>> index 762be4e641a..ca6a40e7545 100644
>> --- a/migration/options.h
>> +++ b/migration/options.h
>> @@ -58,6 +58,9 @@ bool migrate_tls(void);
>>/* capabilities helpers */
>>
>>bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp);
>> +bool
>> +migration_capabilities_and_transport_compatible(MigrationAddressType 
>> transport,
>> +bool *new_caps, Error 
>> **errp);
>>
>>>
>

Re: [PATCH v2 04/18] hw/arm: Add i.MX 8M Plus EVK board

2025-02-25 Thread Bernhard Beschow




Am 25. Februar 2025 17:00:53 UTC schrieb Peter Maydell 
:
>On Tue, 25 Feb 2025 at 15:42, Peter Maydell  wrote:
>> The C compiler for the OpenSUSE CI job doesn't seem to like this:
>> https://gitlab.com/pm215/qemu/-/jobs/9239416833
>>
>> ../hw/arm/fsl-imx8mp.c: In function ‘fsl_imx8mp_realize’:
>> ../hw/arm/fsl-imx8mp.c:382:15: error: initializer element is not constant
>>  { fsl_imx8mp_memmap[FSL_IMX8MP_UART1].addr, 
>> FSL_IMX8MP_UART1_IRQ },
>>^
>> ../hw/arm/fsl-imx8mp.c:382:15: note: (near initialization for
>> ‘serial_table[0].addr’)
>> ../hw/arm/fsl-imx8mp.c:383:15: error: initializer element is not constant
>>  { fsl_imx8mp_memmap[FSL_IMX8MP_UART2].addr, 
>> FSL_IMX8MP_UART2_IRQ },
>>^
>>
>> This is (gcc 7.5.0 "cc (SUSE Linux) 7.5.0") apparently. That's
>> a pretty old compiler, only just within the bounds of our
>> version requirements (which are 7.4 or better), so I'm guessing
>> it's just not as smart about figuring out that the
>> initializer here really is a constant value.
>>
>> I'll fix this up by dropping the "const" from the serial_table[]
>> etc definitions.
>
>More specifically, you have to drop 'static const', leaving just 'struct'.
>Minimal repro: https://godbolt.org/z/5css4hv67

I haven't checked, but this might be caused by the multiplications (... * KiB) 
which the old compiler might not perform at compile time. Thanks for looking 
into it without another iteration of the series!

Best regards,
Bernhard


>
>-- PMM

RE: [PATCH v3 15/28] hw/misc/aspeed_scu: Fix the revision ID cannot be set in the SOC layer for AST2700

2025-02-25 Thread Jamin Lin

Hi Cedric, 

> 
> On 2/13/25 04:35, Jamin Lin wrote:
> > According to the design of the AST2600, it has a Silicon Revision ID
> > Register, specifically SCU004 and SCU014, to set the Revision ID for the
> AST2600.
> > For the AST2600 A3, SCU004 is set to 0x05030303 and SCU014 is set to
> 0x05030303.
> > In the "aspeed_ast2600_scu_reset" function, the hardcoded value
> > "AST2600_A3_SILICON_REV" is set in SCU004, and "s->silicon_rev" is set
> > in SCU014. The value of "s->silicon_rev" is set by the SOC layer via
> > the "silicon-rev" property.
> >
> > However, the design of the AST2700 is different. There are two SCU
> controllers:
> > SCU0 (CPU Die) and SCU1 (IO Die). In the AST2700, the firmware reads
> > the SCU Silicon Revision ID register (SCU0_000) and the SCUIO Silicon
> > Revision ID register (SCU1_000) and combines them into a 64-bit value.
> > The combined value of SCU0_000[23:16] and SCU1_000[23:16] represents
> > the silicon revision. For example, the AST2700-A1 revision is
> > "0x0601010306010103", where
> > SCU0_000 should be 06010103 and SCU1_000 should be 06010103.
> 
> Are both these values supposed to be identical ? if not, we should plan for
> changes at machine/SoC level too.
> 

Currently, these values are supposed to be identical. Therefore, we can reuse 
the current design of the 
silicon_rev in the machine/SoC layer for AST2700.

> >   static const uint32_t ast2700_a0_resets[ASPEED_AST2700_SCU_NR_REGS]
> = {
> > -[AST2700_SILICON_REV]   = AST2700_A0_SILICON_REV,
> >   [AST2700_HW_STRAP1] = 0x0800,
> >   [AST2700_HW_STRAP1_CLR] = 0xFFF0FFF0,
> >   [AST2700_HW_STRAP1_LOCK]= 0x0FFF,
> > @@ -940,6 +939,7 @@ static void aspeed_ast2700_scu_reset(DeviceState
> *dev)
> >   AspeedSCUClass *asc = ASPEED_SCU_GET_CLASS(dev);
> >
> >   memcpy(s->regs, asc->resets, asc->nr_regs * 4);
> > +s->regs[AST2700_SILICON_REV] = s->silicon_rev;
> >
> >   }
> >
> >   static void aspeed_2700_scu_class_init(ObjectClass *klass, void
> > *data) @@ -1032,7 +1032,6 @@ static const MemoryRegionOps
> aspeed_ast2700_scuio_ops = {
> >   };
> >
> >   static const uint32_t
> ast2700_a0_resets_io[ASPEED_AST2700_SCU_NR_REGS] = {
> > -[AST2700_SILICON_REV]   = 0x0603,
> >   [AST2700_HW_STRAP1] = 0x0504,
> 
> why isn't AST2700_HW_STRAP1 assigned with s->hw_strap1 property ?
> 

This is a bug. The design of the HW_STRAP register has changed in the AST2700. 
There is one hw_strap1 register in the SCU (CPU DIE) and another hw_strap1 
register in the SCUIO (IO DIE). 
The values of these two hw_strap1 registers should not be the same.

To fix this issue, I made the following changes. Do you have any suggestions?
Additionally, would it be possible to submit a separate patch for the SCU 
silicon_rev and SCU hw_strap fix?
The patch series for supporting AST2700 A1 is quite large.
Thanks-Jamin

1. I dumped the real values of both registers on the EVB

root@ast2700-a0-default:~# md 14c02010 1  > SCUIO hw_strap1
14c02010: 0500
root@ast2700-a0-default:~# md 12c02010 1 --> SCU hw_strap1 
12c02010: 0800

The value AST2700_EVB_HW_STRAP1 0x0800 is used for the SCU hw_strap1.
The value AST2700_EVB_HW_STRAP2 0x0500 is used for the SCUIO hw_strap1

--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -181,8 +181,8 @@ struct AspeedMachineState {

 #ifdef TARGET_AARCH64
 /* AST2700 evb hardware value */
-#define AST2700_EVB_HW_STRAP1 0x00C0
-#define AST2700_EVB_HW_STRAP2 0x0003
+#define AST2700_EVB_HW_STRAP1 0x0800
+#define AST2700_EVB_HW_STRAP2 0x0500
 #endif

2.  Change to set hw_strap2 in the SCUIO model. Note this will modify the 
hw_strap1 register of the SCUIO.

+++ b/hw/arm/aspeed_ast27x0.c
@@ -410,14 +410,14 @@ static void aspeed_soc_ast2700_init(Object *obj)
  sc->silicon_rev);
 object_property_add_alias(obj, "hw-strap1", OBJECT(&s->scu),
   "hw-strap1");
-object_property_add_alias(obj, "hw-strap2", OBJECT(&s->scu),
-  "hw-strap2");
 object_property_add_alias(obj, "hw-prot-key", OBJECT(&s->scu),
   "hw-prot-key");

 object_initialize_child(obj, "scuio", &s->scuio, TYPE_ASPEED_2700_SCUIO);
 qdev_prop_set_uint32(DEVICE(&s->scuio), "silicon-rev",
  sc->silicon_rev);
+object_property_add_alias(obj, "hw-strap2", OBJECT(&s->scuio),
+  "hw-strap2");

3. I introduced a new aspeed_ast2700_scuio_reset function for the SCUIO model 
and 
set the value of the hw_strap2 property to the HW_STRAP1 register of the SCUIO 
model.

s->regs[AST2700_HW_STRAP1] = s->hw_strap2;

diff --git a/hw/misc/aspeed_scu.c b/hw/misc/aspeed_scu.c
index 259b142850..23ab7526f5 100644
--- a/hw/misc/aspeed_scu.c
+++ b/hw/misc/aspeed_scu.c
@@ -912,7 +912,6 @@ static const MemoryRegionOps aspeed_ast2700_scu_ops = {
 };

 static const uint32_t ast270

[PATCH v4 3/6] migration: disable RDMA + postcopy-ram

2025-02-25 Thread Li Zhijian via

It's believed that RDMA + postcopy-ram has been broken for a while.
Rather than spending time re-enabling it, let's simply disable it as a
trade-off.

Reviewed-by: Peter Xu 
Signed-off-by: Li Zhijian 
---
V3:
  - collect Reviewed tag
  - reoder: 6th -> 3th
---
 migration/options.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/migration/options.c b/migration/options.c
index c6f18df5864..527ba05d413 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -449,6 +449,10 @@ bool migrate_rdma_caps_check(bool *caps, Error **errp)
 error_setg(errp, "RDMA and multifd can't be used together");
 return false;
 }
+if (caps[MIGRATION_CAPABILITY_POSTCOPY_RAM]) {
+error_setg(errp, "RDMA and postcopy-ram can't be used together");
+return false;
+}
 
 return true;
 }
-- 
2.44.0

Re: Problem with iotest 233

2025-02-25 Thread Thomas Huth


On 25/02/2025 22.00, Thomas Huth wrote:

On 25/02/2025 21.35, Thomas Huth wrote:

On 25/02/2025 18.57, Daniel P. Berrangé wrote:

On Tue, Feb 25, 2025 at 06:52:43PM +0100, Thomas Huth wrote:

On 25/02/2025 18.44, Thomas Huth wrote:

On 25/02/2025 11.12, Kevin Wolf wrote:

Am 25.02.2025 um 08:20 hat Thomas Huth geschrieben:


   Hi!

I'm facing a weird hang in iotest 233 on my Fedora 41 laptop. When 
running


   ./check -raw 233

the test simply hangs. Looking at the log, the last message is "== check
plain client to TLS server fails ==". I added some debug messages, 
and it

seems like the previous NBD server is not correctly terminated here.
The test works fine again if I apply this patch:

diff --git a/tests/qemu-iotests/common.nbd b/tests/qemu-iotests/ 
common.nbd

--- a/tests/qemu-iotests/common.nbd
+++ b/tests/qemu-iotests/common.nbd
@@ -35,7 +35,7 @@ nbd_server_stop()
   read NBD_PID < "$nbd_pid_file"
   rm -f "$nbd_pid_file"
   if [ -n "$NBD_PID" ]; then
-    kill "$NBD_PID"
+    kill -9 "$NBD_PID"
   fi
   fi
   rm -f "$nbd_unix_socket" "$nbd_stderr_fifo"

... but that does not look like the right solution to me. What could 
prevent
the qemu-nbd from correctly shutting down when it receives a normal 
SIGTERM

signal?


Not sure. In theory, qemu_system_killed() should set state = TERMINATE
and make main_loop_wait() return through the notification, which should
then make it shut down. Maybe you can attach gdb and check what 'state'
is when it hangs and if it's still in the main loop?


I attached a gdb and ran "bt", and it looks like it is hanging in an
exit() handler:

(gdb) bt
#0  0x7f127f8fff1d in syscall () from /lib64/libc.so.6
#1  0x7f127fd32e1d in g_cond_wait () from /lib64/libglib-2.0.so.0
#2  0x5583df3048b2 in flush_trace_file (wait=true) at
../../devel/qemu/ trace/simple.c:140
#3  st_flush_trace_buffer () at ../../devel/qemu/trace/simple.c:383
#4  0x7f127f8296c1 in __run_exit_handlers () from /lib64/libc.so.6
#5  0x7f127f82978e in exit () from /lib64/libc.so.6
#6  0x5583df1ae9e1 in main (argc=, argv=) at ../../devel/qemu/qemu-nbd.c:1242


Ah, now that I wrote that: I recently ran "configure" with
--enable-trace-backends=simple ... when I remove that from "config.status"
again, then the test works fine again 8-)

Still, I think it should not hang with the simple trace backend here, 
should it?


IIUC this is waiting on trace_empty_cond.

This condition should be signalled from wait_for_trace_records_available
which is in turn called from writeout_thread.

This thread is started from st_init, which is called from 
trace_init_backends

which should be called from qemu-nbd. I would expect this thread to still
be running when exit() handlers are run.

Does GDB show any other threads running at the time of this hang ?


There is indeed a second thread running:

(gdb) thread apply all bt

Thread 2 (Thread 0x7f657096b6c0 (LWP 1117884) "qemu-nbd"):
#0  0x7f6573419f1d in syscall () from /lib64/libc.so.6
#1  0x562bbad9b783 in qemu_futex_wait (f=0x562bbaed25d8 
, val=4294967295) at ../../devel/qemu/include/qemu/ 
futex.h:29
#2  0x562bbad9b9af in qemu_event_wait (ev=0x562bbaed25d8 
) at ../../devel/qemu/util/qemu-thread-posix.c:465
#3  0x562bbada86a6 in call_rcu_thread (opaque=0x0) at ../../devel/ 
qemu/ util/rcu.c:278
#4  0x562bbad9bba3 in qemu_thread_start (args=0x562bf958a5c0) 
at ../../ devel/qemu/util/qemu-thread-posix.c:542

#5  0x7f6573398168 in start_thread () from /lib64/libc.so.6
#6  0x7f657341c14c in __clone3 () from /lib64/libc.so.6


Though, that does not look like the thread from the simpletrace, but the the 
QEMU RCU thread instead ... so no clue where that writer thread might have 
gone...


OK, I think I now understood the problem: qemu-nbd is calling 
trace_init_backends() first, which creates the simpletrace threads and 
installs the atexit() handler. Then it is calling fork() since the test uses 
the --fork command line option. But fork() does not clone the simpletrace 
thread into the new process, only the main thread (see man-page of fork, the 
new process starts single-threaded). So when the new child process exits, 
the exit handler calls the simple trace flush function which tries to wait 
for a thread that has never been created in that process.


The test works when I move the trace_init_backends() behind the fork() in 
the main function... but I am not sure if we would miss some logs this way, 
so I don't know whether that's the right solution. Could a qemu-nbd expert 
please have a look at this?


 Thanks,
  Thomas

Re: [PATCH v5 14/24] hw/uefi: add var-service-json.c + qapi for NV vars.

2025-02-25 Thread Markus Armbruster

Gerd Hoffmann  writes:

> Define qapi schema for the uefi variable store state.
>
> Use it and the generated visitor helper functions to store persistent
> (EFI_VARIABLE_NON_VOLATILE) variables in JSON format on disk.
>
> Signed-off-by: Gerd Hoffmann 

[...]

> diff --git a/qapi/meson.build b/qapi/meson.build
> index e7bc54e5d047..eadde4db307f 100644
> --- a/qapi/meson.build
> +++ b/qapi/meson.build
> @@ -65,6 +65,7 @@ if have_system
>  'pci',
>  'rocker',
>  'tpm',
> +'uefi',
>]
>  endif
>  if have_system or have_tools
> diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
> index b1581988e4eb..2877aff73d0c 100644
> --- a/qapi/qapi-schema.json
> +++ b/qapi/qapi-schema.json
> @@ -81,3 +81,4 @@
>  { 'include': 'vfio.json' }
>  { 'include': 'cryptodev.json' }
>  { 'include': 'cxl.json' }
> +{ 'include': 'uefi.json' }
> diff --git a/qapi/uefi.json b/qapi/uefi.json
> new file mode 100644
> index ..c1dfa76b6eb2
> --- /dev/null
> +++ b/qapi/uefi.json
> @@ -0,0 +1,55 @@
> +# -*- Mode: Python -*-
> +# vim: filetype=python
> +#
> +
> +##
> +# = UEFI Variable Store
> +#
> +# The qemu efi variable store implementation (hw/uefi/) uses this to
> +# store non-volatile variables on disk.
> +##
> +
> +##
> +# @UefiVariable:
> +#
> +# UEFI Variable.  Check the UEFI specifification for more detailed
> +# information on the fields.
> +#
> +# @guid: variable namespace GUID
> +#
> +# @name: variable name, in UTF-8 encoding.
> +#
> +# @attr: variable attributes.
> +#
> +# @data: variable value, encoded as hex string.

I understand this is a blob.  We commonly use base64 for that.  Why not
here?

> +#
> +# @time: variable modification time.  EFI_TIME struct, encoded as hex
> +# string.  Used only for authenticated variables, where the
> +# EFI_VARIABLE_TIME_BASED_AUTHENTICATED_WRITE_ACCESS attribute bit
> +# is set.
> +#
> +# @digest: variable certificate digest.  Used to verify the signature
> +# of updates for authenticated variables.

How to create and verify these digests will be obvious enough to users
of this interface?

> +#
> +# Since: 10.0
> +##
> +{ 'struct' : 'UefiVariable',
> +  'data' : { 'guid'  : 'str',
> + 'name'  : 'str',
> + 'attr'  : 'int',
> + 'data'  : 'str',
> + '*time' : 'str',
> + '*digest' : 'str'}}
> +
> +##
> +# @UefiVarStore:
> +#
> +# @version: currently allways 2

always

> +#
> +# @variables: list of UEFI variables
> +#
> +# Since: 10.0
> +##
> +{ 'struct' : 'UefiVarStore',
> +  'data' : { 'version'   : 'int',
> + 'variables' : [ 'UefiVariable' ] }}

[PATCH] tests/functional: Replace the ppc64 e500 advent calendar test

2025-02-25 Thread Cédric Le Goater

Replace the advent calendar test with a buildroot image built with
qemu_ppc64_e5500_defconfig. Boot a ppce500 machine from kernel and
disk, test network and poweroff. Add '-no-shutdown' to the command
line to avoid exiting from QEMU as it seems to bother the functional
framework.

Signed-off-by: Cédric Le Goater 
---
 tests/functional/test_ppc64_e500.py | 33 +++--
 1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/tests/functional/test_ppc64_e500.py 
b/tests/functional/test_ppc64_e500.py
index b92fe0b0e75e..9ce7ae6c4798 100755
--- a/tests/functional/test_ppc64_e500.py
+++ b/tests/functional/test_ppc64_e500.py
@@ -5,20 +5,39 @@
 # SPDX-License-Identifier: GPL-2.0-or-later
 
 from qemu_test import LinuxKernelTest, Asset
+from qemu_test import exec_command_and_wait_for_pattern
 
 
 class E500Test(LinuxKernelTest):
 
-ASSET_DAY19 = Asset(
-
'https://qemu-advcal.gitlab.io/qac-best-of-multiarch/download/day19.tar.xz',
-'20b1bb5a8488c664defbb5d283addc91a05335a936c63b3f5ff7eee74b725755')
+ASSET_BR2_E5500_UIMAGE = Asset(
+
'https://github.com/legoater/qemu-ppc-boot/raw/refs/heads/main/buildroot/qemu_ppc64_e5500-2023.11-8-gdcd9f0f6eb-20240104/uImage',
+'2478187c455d6cca3984e9dfde9c635d824ea16236b85fd6b4809f744706deda')
 
-def test_ppc64_e500(self):
+ASSET_BR2_E5500_ROOTFS = Asset(
+
'https://github.com/legoater/qemu-ppc-boot/raw/refs/heads/main//buildroot/qemu_ppc64_e5500-2023.11-8-gdcd9f0f6eb-20240104/rootfs.ext2',
+'9035ef97237c84c7522baaff17d25cdfca4bb7a053d5e296e902919473423d76')
+
+def test_ppc64_e500_buildroot(self):
 self.set_machine('ppce500')
 self.cpu = 'e5500'
-self.archive_extract(self.ASSET_DAY19)
-self.launch_kernel(self.scratch_file('day19', 'uImage'),
-   wait_for='QEMU advent calendar')
+
+uimage_path = self.ASSET_BR2_E5500_UIMAGE.fetch()
+rootfs_path = self.ASSET_BR2_E5500_ROOTFS.fetch()
+
+self.vm.set_console()
+self.vm.add_args('-kernel', uimage_path,
+ '-append', 'root=/dev/vda',
+ '-drive', f'file={rootfs_path},if=virtio,format=raw',
+ '-snapshot', '-no-shutdown')
+self.vm.launch()
+
+self.wait_for_console_pattern('Linux version')
+self.wait_for_console_pattern('/init as init process')
+self.wait_for_console_pattern('lease of 10.0.2.15')
+self.wait_for_console_pattern('buildroot login:')
+exec_command_and_wait_for_pattern(self, 'root', '#')
+exec_command_and_wait_for_pattern(self, 'poweroff', 'Power down')
 
 if __name__ == '__main__':
 LinuxKernelTest.main()
-- 
2.48.1

[PATCH] tests/functional: Update the ppc64 pseries and pnv tests

2025-02-25 Thread Cédric Le Goater

The tests are using a now archived Fedora29 release. Switch to the
most recent Fedora41 release.

Signed-off-by: Cédric Le Goater 
---
 tests/functional/test_ppc64_powernv.py | 6 +++---
 tests/functional/test_ppc64_pseries.py | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/tests/functional/test_ppc64_powernv.py 
b/tests/functional/test_ppc64_powernv.py
index 685e2178ed78..a9da7905366e 100755
--- a/tests/functional/test_ppc64_powernv.py
+++ b/tests/functional/test_ppc64_powernv.py
@@ -18,9 +18,9 @@ class powernvMachine(LinuxKernelTest):
 good_message = 'VFS: Cannot open root device'
 
 ASSET_KERNEL = Asset(
-('https://archives.fedoraproject.org/pub/archive/fedora-secondary/'
- 'releases/29/Everything/ppc64le/os/ppc/ppc64/vmlinuz'),
-'383c2f5c23bc0d9d32680c3924d3fd7ee25cc5ef97091ac1aa5e1d853422fc5f')
+('https://archives.fedoraproject.org/pub/fedora-secondary/'
+ 'releases/41/Everything/ppc64le/os/ppc/ppc64/vmlinuz'),
+'eca627adbe42437cacea169beeb4c3c12a5cfbca1a6b1ba5218d28139d2143c4')
 
 def do_test_linux_boot(self, command_line = KERNEL_COMMON_COMMAND_LINE):
 self.require_accelerator("tcg")
diff --git a/tests/functional/test_ppc64_pseries.py 
b/tests/functional/test_ppc64_pseries.py
index fdc404ed033d..92317cddead1 100755
--- a/tests/functional/test_ppc64_pseries.py
+++ b/tests/functional/test_ppc64_pseries.py
@@ -18,9 +18,9 @@ class pseriesMachine(QemuSystemTest):
 good_message = 'VFS: Cannot open root device'
 
 ASSET_KERNEL = Asset(
-('https://archives.fedoraproject.org/pub/archive/fedora-secondary/'
- 'releases/29/Everything/ppc64le/os/ppc/ppc64/vmlinuz'),
-'383c2f5c23bc0d9d32680c3924d3fd7ee25cc5ef97091ac1aa5e1d853422fc5f')
+('https://archives.fedoraproject.org/pub/fedora-secondary/'
+ 'releases/41/Everything/ppc64le/os/ppc/ppc64/vmlinuz'),
+'eca627adbe42437cacea169beeb4c3c12a5cfbca1a6b1ba5218d28139d2143c4')
 
 def do_test_ppc64_linux_boot(self, kernel_command_line = 
KERNEL_COMMON_COMMAND_LINE):
 kernel_path = self.ASSET_KERNEL.fetch()
-- 
2.48.1

Re: [PATCH 4/4] target/arm: Retry pushing CPER error if necessary

2025-02-25 Thread Gavin Shan


On 2/21/25 9:04 PM, Jonathan Cameron wrote:

On Fri, 21 Feb 2025 15:27:36 +1000
Gavin Shan  wrote:


[...]
  


I would say #1 is the ideal model because the read_ack_register is the 
bottleneck
and it should be scaled up to max_cpus. In that way, the bottleneck can be 
avoided
from the bottom. Another benefit with #1 is the error can be delivered 
immediately
to the vCPU where the error was raised. This matches with the syntax of SEA to 
me.


I don't think it helps for the bottleneck in linux at least.  A whole bunch of 
locks
are taken on each SEA because of the novel use of the fixmap.  There is only one
VA ever used to access the error status blocks we just change what PA it points 
to
under a spin lock. Maybe that can be improved on if we can persuade people that 
error
handling performance is a thing to care about!



Right, it doesn't helps for the bottleneck in guest kernel due to 
@ghes_notify_lock_sea.
With the lock, all existing GHES devices and error statuses are serialized for 
access. I
was actually talking about the benefit to avoid the bottleneck regarding the 
read_ack_regsiter,
which is the synchronization mechanism between guest kernel and QEMU. For 
example, an error
has been raised on vCPU-0, but not acknowledged at (A). Another error raised on 
vCPU-1
can't be delivered because we have only one GHES device and error status block, 
which
has been reserved for the error raised on vCPU-0.  With solution #1, the 
bottleneck can
be avoided with multiple GHES devices and error status blocks.

  vCPU-0   vCPU-1
  ==   ==
  kvm_cpu_exec kvm_cpu_exec
kvm_vcpu_ioctl(RUN)  kvm_vcpu_ioctl(RUN)
kvm_arch_on_sigbus_vcpu  kvm_arch_on_sigbus_vcpu
  acpi_ghes_memory_errors  acpi_ghes_memory_errors  
 (B)
  kvm_inject_arm_sea
kvm_vcpu_ioctl(RUN)
  :
do_mem_abort
  do_sea
apei_claim_sea
  ghes_notify_sea
raw_spin_lock(&ghes_notify_lock_sea)
ghes_in_nmi_spool_from_list
  ghes_in_nmi_queue_one_entry
ghes_clear_estatus (A)
raw_spin_unlock(&ghes_notify_lock_sea)

Thanks,
Gavin

Re: [PATCH v5 19/36] vfio/migration: Convert bytes_transferred counter to atomic

2025-02-25 Thread Cédric Le Goater


On 2/19/25 21:34, Maciej S. Szmigiero wrote:

From: "Maciej S. Szmigiero" 

So it can be safety accessed from multiple threads.

This variable type needs to be changed to unsigned long since
32-bit host platforms lack the necessary addition atomics on 64-bit
variables.

Using 32-bit counters on 32-bit host platforms should not be a problem
in practice since they can't realistically address more memory anyway.


Is it useful to have VFIO on 32-bit host platforms ?

If not, VFIO PCI should depend on (AARCH64 || PPC64 || X86_64) and we
could drop this patch. Let's address that independently.

Thanks,

C.








Signed-off-by: Maciej S. Szmigiero 
---
  hw/vfio/migration.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 03890eaa48a9..5532787be63b 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -55,7 +55,7 @@
   */
  #define VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE (1 * MiB)
  
-static int64_t bytes_transferred;

+static unsigned long bytes_transferred;
  
  static const char *mig_state_to_str(enum vfio_device_mig_state state)

  {
@@ -391,7 +391,7 @@ static ssize_t vfio_save_block(QEMUFile *f, VFIOMigration 
*migration)
  qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE);
  qemu_put_be64(f, data_size);
  qemu_put_buffer(f, migration->data_buffer, data_size);
-bytes_transferred += data_size;
+qatomic_add(&bytes_transferred, data_size);
  
  trace_vfio_save_block(migration->vbasedev->name, data_size);
  
@@ -1013,12 +1013,12 @@ static int vfio_block_migration(VFIODevice *vbasedev, Error *err, Error **errp)
  
  int64_t vfio_mig_bytes_transferred(void)

  {
-return bytes_transferred;
+return MIN(qatomic_read(&bytes_transferred), INT64_MAX);
  }
  
  void vfio_reset_bytes_transferred(void)

  {
-bytes_transferred = 0;
+qatomic_set(&bytes_transferred, 0);
  }
  
  /*

Re: [PATCH] tests/functional: Update the ppc64 pseries and pnv tests

2025-02-25 Thread Cédric Le Goater


On 2/26/25 08:01, Thomas Huth wrote:

On 26/02/2025 07.54, Cédric Le Goater wrote:

The tests are using a now archived Fedora29 release. Switch to the
most recent Fedora41 release.

Signed-off-by: Cédric Le Goater 
---
  tests/functional/test_ppc64_powernv.py | 6 +++---
  tests/functional/test_ppc64_pseries.py | 6 +++---
  2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/tests/functional/test_ppc64_powernv.py 
b/tests/functional/test_ppc64_powernv.py
index 685e2178ed78..a9da7905366e 100755
--- a/tests/functional/test_ppc64_powernv.py
+++ b/tests/functional/test_ppc64_powernv.py
@@ -18,9 +18,9 @@ class powernvMachine(LinuxKernelTest):
  good_message = 'VFS: Cannot open root device'
  ASSET_KERNEL = Asset(
-    ('https://archives.fedoraproject.org/pub/archive/fedora-secondary/'
- 'releases/29/Everything/ppc64le/os/ppc/ppc64/vmlinuz'),
-    '383c2f5c23bc0d9d32680c3924d3fd7ee25cc5ef97091ac1aa5e1d853422fc5f')
+    ('https://archives.fedoraproject.org/pub/fedora-secondary/'
+ 'releases/41/Everything/ppc64le/os/ppc/ppc64/vmlinuz'),
+    'eca627adbe42437cacea169beeb4c3c12a5cfbca1a6b1ba5218d28139d2143c4')


I think we should rather avoid the very latest and greatest Fedora URLs here... 
they will be invalid in a couple of months after Fedora 43 has been released. 
And if we keep switching the test assets all the time, this will make it more 
difficult to bisect regressions in the future.

  Thomas



So we should point to the latest archive (fedora38) then ?


Thanks,

C.

Re: [PATCH] hw/gpio: npcm7xx: fixup out-of-bounds access

2025-02-25 Thread Philippe Mathieu-Daudé


On 26/2/25 03:46, Patrick Venture wrote:

The reg isn't validated to be a possible register before
it's dereferenced for one case.  The mmio space registered
for the gpio device is 4KiB but there aren't that many
registers in the struct.

Google-Bug-Id: 397469048
Change-Id: I2fb8d0d3d41422baab22e8fc7e9fadd0f2ee7068


^ Both lines are irrelevant on mainstream git history, otherwise:

Reviewed-by: Philippe Mathieu-Daudé 

Cc: qemu-sta...@nongnu.org
Fixes: 526dbbe0874 ("hw/gpio: Add GPIO model for Nuvoton NPCM7xx")


Signed-off-by: Patrick Venture 
---
  hw/gpio/npcm7xx_gpio.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

Re: [PATCH] tests/functional: Update the ppc64 pseries and pnv tests

2025-02-25 Thread Thomas Huth


On 26/02/2025 07.54, Cédric Le Goater wrote:

The tests are using a now archived Fedora29 release. Switch to the
most recent Fedora41 release.

Signed-off-by: Cédric Le Goater 
---
  tests/functional/test_ppc64_powernv.py | 6 +++---
  tests/functional/test_ppc64_pseries.py | 6 +++---
  2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/tests/functional/test_ppc64_powernv.py 
b/tests/functional/test_ppc64_powernv.py
index 685e2178ed78..a9da7905366e 100755
--- a/tests/functional/test_ppc64_powernv.py
+++ b/tests/functional/test_ppc64_powernv.py
@@ -18,9 +18,9 @@ class powernvMachine(LinuxKernelTest):
  good_message = 'VFS: Cannot open root device'
  
  ASSET_KERNEL = Asset(

-('https://archives.fedoraproject.org/pub/archive/fedora-secondary/'
- 'releases/29/Everything/ppc64le/os/ppc/ppc64/vmlinuz'),
-'383c2f5c23bc0d9d32680c3924d3fd7ee25cc5ef97091ac1aa5e1d853422fc5f')
+('https://archives.fedoraproject.org/pub/fedora-secondary/'
+ 'releases/41/Everything/ppc64le/os/ppc/ppc64/vmlinuz'),
+'eca627adbe42437cacea169beeb4c3c12a5cfbca1a6b1ba5218d28139d2143c4')


I think we should rather avoid the very latest and greatest Fedora URLs 
here... they will be invalid in a couple of months after Fedora 43 has been 
released. And if we keep switching the test assets all the time, this will 
make it more difficult to bisect regressions in the future.


 Thomas

Re: [PATCH] tests/functional: Replace the ppc64 e500 advent calendar test

2025-02-25 Thread Thomas Huth


On 26/02/2025 07.50, Cédric Le Goater wrote:

Replace the advent calendar test with a buildroot image built with
qemu_ppc64_e5500_defconfig.


When picking this up, I'll add a "Unlike the advent calendar image, this 
newer buildroot image supports networking, too, so we can check whether it 
gets assigned with an IP address" as justification for the replacement, ok?



Boot a ppce500 machine from kernel and
disk, test network and poweroff. Add '-no-shutdown' to the command
line to avoid exiting from QEMU as it seems to bother the functional
framework.

Signed-off-by: Cédric Le Goater 


Reviewed-by: Thomas Huth 



---
  tests/functional/test_ppc64_e500.py | 33 +++--
  1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/tests/functional/test_ppc64_e500.py 
b/tests/functional/test_ppc64_e500.py
index b92fe0b0e75e..9ce7ae6c4798 100755
--- a/tests/functional/test_ppc64_e500.py
+++ b/tests/functional/test_ppc64_e500.py
@@ -5,20 +5,39 @@
  # SPDX-License-Identifier: GPL-2.0-or-later
  
  from qemu_test import LinuxKernelTest, Asset

+from qemu_test import exec_command_and_wait_for_pattern
  
  
  class E500Test(LinuxKernelTest):
  
-ASSET_DAY19 = Asset(

-
'https://qemu-advcal.gitlab.io/qac-best-of-multiarch/download/day19.tar.xz',
-'20b1bb5a8488c664defbb5d283addc91a05335a936c63b3f5ff7eee74b725755')
+ASSET_BR2_E5500_UIMAGE = Asset(
+
'https://github.com/legoater/qemu-ppc-boot/raw/refs/heads/main/buildroot/qemu_ppc64_e5500-2023.11-8-gdcd9f0f6eb-20240104/uImage',
+'2478187c455d6cca3984e9dfde9c635d824ea16236b85fd6b4809f744706deda')
  
-def test_ppc64_e500(self):

+ASSET_BR2_E5500_ROOTFS = Asset(
+
'https://github.com/legoater/qemu-ppc-boot/raw/refs/heads/main//buildroot/qemu_ppc64_e5500-2023.11-8-gdcd9f0f6eb-20240104/rootfs.ext2',
+'9035ef97237c84c7522baaff17d25cdfca4bb7a053d5e296e902919473423d76')
+
+def test_ppc64_e500_buildroot(self):
  self.set_machine('ppce500')
  self.cpu = 'e5500'
-self.archive_extract(self.ASSET_DAY19)
-self.launch_kernel(self.scratch_file('day19', 'uImage'),
-   wait_for='QEMU advent calendar')
+
+uimage_path = self.ASSET_BR2_E5500_UIMAGE.fetch()
+rootfs_path = self.ASSET_BR2_E5500_ROOTFS.fetch()
+
+self.vm.set_console()
+self.vm.add_args('-kernel', uimage_path,
+ '-append', 'root=/dev/vda',
+ '-drive', f'file={rootfs_path},if=virtio,format=raw',
+ '-snapshot', '-no-shutdown')
+self.vm.launch()
+
+self.wait_for_console_pattern('Linux version')
+self.wait_for_console_pattern('/init as init process')
+self.wait_for_console_pattern('lease of 10.0.2.15')
+self.wait_for_console_pattern('buildroot login:')
+exec_command_and_wait_for_pattern(self, 'root', '#')
+exec_command_and_wait_for_pattern(self, 'poweroff', 'Power down')
  
  if __name__ == '__main__':

  LinuxKernelTest.main()

Re: [PATCH v2 02/10] loongarch: reset vcpu after it's created

2025-02-25 Thread Philippe Mathieu-Daudé


On 7/2/25 17:20, Igor Mammedov wrote:

Reseting vcpu before its thread is created, caused various issues in the past
for other targets. It doesn't cause issues for loongarch at the moment but
to be consistent with the rest of targets, move reset during realize time
after qemu_init_vcpu().

That basically prevents reset being run when when vCPU is in incositent state


"inconsistent"


(i.e. accelerator hasn't initialized vCPU yet).

Signed-off-by: Igor Mammedov 


Reviewed-by: Philippe Mathieu-Daudé 

Cc: qemu-sta...@nongnu.org


---
CC: Song Gao 
---
  target/loongarch/cpu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index e91f4a5239..15018d43ae 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -634,8 +634,8 @@ static void loongarch_cpu_realizefn(DeviceState *dev, Error 
**errp)
  
  loongarch_cpu_register_gdb_regs_for_features(cs);
  
-cpu_reset(cs);

  qemu_init_vcpu(cs);
+cpu_reset(cs);
  
  lacc->parent_realize(dev, errp);

  }

Re: [PATCH v2 03/10] m68k: reset vcpu after it's created

2025-02-25 Thread Philippe Mathieu-Daudé


On 7/2/25 17:20, Igor Mammedov wrote:

Reseting vcpu before its thread is created, caused various issues in the past
for other targets. It doesn't cause issues for m68k at the moment but
to be consistent with the rest of targets, move reset during realize time
after qemu_init_vcpu().

That basically prevents reset being run when when vCPU is in incositent state


(typo)


(i.e. accelerator hasn't initialized vCPU yet).

Signed-off-by: Igor Mammedov 


Cc: qemu-sta...@nongnu.org
Reviewed-by: Philippe Mathieu-Daudé 


---
CC: Laurent Vivier 
---
  target/m68k/cpu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

qemu-devel@nongnu.org

2025-02-25 Thread Philippe Mathieu-Daudé


On 7/2/25 17:20, Igor Mammedov wrote:

cpu_list_add() was doing 2 distinct things:
- assign some index to vCPU
- add unrealized (thus in inconsistent state) vCPU to &cpus_queue

Code using CPU_FOREACH() macro would iterate over possibly
unrealized vCPUs, often dealt with special casing.

Instead of working around of vCPU existence in cpus_queue,
split out cpu_index assignment from cpu_list_add(),


Better split 2 distinct changes in 2 patches for clarity.


and move the later to the end of realize stage,
right before vCPU is let run.

Signed-off-by: Igor Mammedov 
---
CC: Yanan Wang 
CC: Zhao Liu 
---
  include/hw/core/cpu.h |  6 ++
  cpu-common.c  | 23 ++-
  cpu-target.c  |  2 +-
  hw/core/cpu-common.c  |  2 ++
  4 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index fb397cdfc5..c338fd31bd 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -750,6 +750,12 @@ bool cpu_virtio_is_big_endian(CPUState *cpu);
  
  #endif /* CONFIG_USER_ONLY */
  
+/**

+ * cpu_auto_assign_cpu_index:
+ * @cpu: The CPU to be assigned a cpu_index
+ */
+void cpu_auto_assign_cpu_index(CPUState *cpu);
+
  /**
   * cpu_list_add:
   * @cpu: The CPU to be added to the list of CPUs.
diff --git a/cpu-common.c b/cpu-common.c
index 4248b2d727..92f3d00e56 100644
--- a/cpu-common.c
+++ b/cpu-common.c
@@ -71,15 +71,7 @@ int cpu_get_free_index(void)
  return max_cpu_index;
  }
  
-CPUTailQ cpus_queue = QTAILQ_HEAD_INITIALIZER(cpus_queue);

-static unsigned int cpu_list_generation_id;
-
-unsigned int cpu_list_generation_id_get(void)
-{
-return cpu_list_generation_id;
-}
-
-void cpu_list_add(CPUState *cpu)
+void cpu_auto_assign_cpu_index(CPUState *cpu)
  {
  static bool cpu_index_auto_assigned;
  
@@ -91,6 +83,19 @@ void cpu_list_add(CPUState *cpu)

  } else {
  assert(!cpu_index_auto_assigned);
  }
+}
+
+CPUTailQ cpus_queue = QTAILQ_HEAD_INITIALIZER(cpus_queue);
+static unsigned int cpu_list_generation_id;
+
+unsigned int cpu_list_generation_id_get(void)
+{
+return cpu_list_generation_id;
+}
+
+void cpu_list_add(CPUState *cpu)
+{
+QEMU_LOCK_GUARD(&qemu_cpu_list_lock);
  QTAILQ_INSERT_TAIL_RCU(&cpus_queue, cpu, node);
  cpu_list_generation_id++;
  }
diff --git a/cpu-target.c b/cpu-target.c
index 667688332c..0c86c18a50 100644
--- a/cpu-target.c
+++ b/cpu-target.c
@@ -142,7 +142,7 @@ bool cpu_exec_realizefn(CPUState *cpu, Error **errp)
  }
  
  /* Wait until cpu initialization complete before exposing cpu. */

-cpu_list_add(cpu);
+cpu_auto_assign_cpu_index(cpu);
  
  #ifdef CONFIG_USER_ONLY

  assert(qdev_get_vmsd(DEVICE(cpu)) == NULL ||
diff --git a/hw/core/cpu-common.c b/hw/core/cpu-common.c
index cb79566cc5..c29737e5e3 100644
--- a/hw/core/cpu-common.c
+++ b/hw/core/cpu-common.c
@@ -211,6 +211,8 @@ static void cpu_common_realizefn(DeviceState *dev, Error 
**errp)
  }
  }
  
+cpu_list_add(cpu);

+
  if (dev->hotplugged) {
  cpu_synchronize_post_init(cpu);
  cpu_resume(cpu);

Re: [PATCH 4/4] target/arm: Retry pushing CPER error if necessary

2025-02-25 Thread Gavin Shan


On 2/25/25 9:19 PM, Igor Mammedov wrote:

On Fri, 21 Feb 2025 11:04:35 +
Jonathan Cameron  wrote:


Ideally I'd like whatever we choose to look like what a bare metal machine
does - mostly because we are less likely to hit untested OS paths.


Ack for that but,
that would need someone from hw/firmware side since error status block
handling is done by firmware.

right now we are just making things up based on spec interpretation.



It's a good point. I think it's worthwhile to understand how the RAS event
is processed and turned to CPER by firmware.

I didn't figure out how CPER is generated by edk2 after looking into tf-a (trust
firmware ARM) and edk2 for a while. I will consult to EDK2 developers to seek
their helps. However, there is a note in tf-a that briefly explaining how RAS
event is handled.

  From tf-a/plat/arm/board/fvp/aarch64/fvp_lsp_ras_sp.c:
  (g...@github.com:ARM-software/arm-trusted-firmware.git)

  /*
   * Note: Typical RAS error handling flow with Firmware First Handling
   *
   * Step 1: Exception resulting from a RAS error in the normal world is routed 
to
   * EL3.
   * Step 2: This exception is typically signaled as either a synchronous 
external
   * abort or SError or interrupt. TF-A (EL3 firmware) delegates the
   * control to platform specific handler built on top of the RAS helper
   * utilities.
   * Step 3: With the help of a Logical Secure Partition, TF-A sends a direct
   * message to dedicated S-EL0 (or S-EL1) RAS Partition managed by 
SPMC.
   * TF-A also populates a shared buffer with a data structure 
containing
   * enough information (such as system registers) to identify and 
triage
   * the RAS error.
   * Step 4: RAS SP generates the Common Platform Error Record (CPER) and shares
   * it with normal world firmware and/or OS kernel through a reserved
   * buffer memory.
   * Step 5: RAS SP responds to the direct message with information necessary 
for
   * TF-A to notify the OS kernel.
   * Step 6: Consequently, TF-A dispatches an SDEI event to notify the OS kernel
   * about the CPER records for further logging.
   */

According to the note, RAS SP (Secure Partition) is the black box where the RAS
event raised by tf-a is turned to CPER. Unfortunately, I didn't find the source
code to understand the details yet.

Thanks,
Gavin

Re: [PATCH 03/10] qapi: delete un-needed python static analysis configs

2025-02-25 Thread Markus Armbruster

Markus Armbruster  writes:

> John Snow  writes:
>
>> The pylint config is being left in place because the settings differ
>> enough from the python/ directory settings that we need a chit-chat on
>> how to merge them O:-)
>>
>> Everything else can go.
>>
>> Signed-off-by: John Snow 
>
> I tried to compare the contents of the deleted files to "the python/
> directory settings", but I can't find the latter.  Am I confused?

John told me: it's in python/setup.cfg.

[PATCH v3 2/6] migration: Add migration_capabilities_and_transport_compatible() helper

2025-02-25 Thread Li Zhijian via

Similar to migration_channels_and_transport_compatible(), introduce a
new helper migration_capabilities_and_transport_compatible() to check if
the capabilites is compatible with the transport.

Currently, only move the capabilities vs RDMA transport to this
function.

Reviewed-by: Peter Xu 
Signed-off-by: Li Zhijian 
---
V3:
  - collect Reviewed tag
  - reorder: 5th -> 2nd
---
 migration/migration.c | 36 ++--
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index c597aa707e5..2eacae25e0e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -238,6 +238,30 @@ 
migration_channels_and_transport_compatible(MigrationAddress *addr,
 return true;
 }
 
+static bool
+migration_capabilities_and_transport_compatible(MigrationAddress *addr,
+Error **errp)
+{
+if (addr->transport == MIGRATION_ADDRESS_TYPE_RDMA) {
+if (migrate_xbzrle()) {
+error_setg(errp, "RDMA and XBZRLE can't be used together");
+return false;
+}
+if (migrate_multifd()) {
+error_setg(errp, "RDMA and multifd can't be used together");
+return false;
+}
+}
+
+return true;
+}
+
+static bool migration_transport_compatible(MigrationAddress *addr, Error 
**errp)
+{
+return migration_channels_and_transport_compatible(addr, errp) &&
+   migration_capabilities_and_transport_compatible(addr, errp);
+}
+
 static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
 {
 uintptr_t a = (uintptr_t) ap, b = (uintptr_t) bp;
@@ -716,7 +740,7 @@ static void qemu_start_incoming_migration(const char *uri, 
bool has_channels,
 }
 
 /* transport mechanism not suitable for migration? */
-if (!migration_channels_and_transport_compatible(addr, errp)) {
+if (!migration_transport_compatible(addr, errp)) {
 return;
 }
 
@@ -735,14 +759,6 @@ static void qemu_start_incoming_migration(const char *uri, 
bool has_channels,
 }
 #ifdef CONFIG_RDMA
 } else if (addr->transport == MIGRATION_ADDRESS_TYPE_RDMA) {
-if (migrate_xbzrle()) {
-error_setg(errp, "RDMA and XBZRLE can't be used together");
-return;
-}
-if (migrate_multifd()) {
-error_setg(errp, "RDMA and multifd can't be used together");
-return;
-}
 rdma_start_incoming_migration(&addr->u.rdma, errp);
 #endif
 } else if (addr->transport == MIGRATION_ADDRESS_TYPE_EXEC) {
@@ -2159,7 +2175,7 @@ void qmp_migrate(const char *uri, bool has_channels,
 }
 
 /* transport mechanism not suitable for migration? */
-if (!migration_channels_and_transport_compatible(addr, errp)) {
+if (!migration_transport_compatible(addr, errp)) {
 return;
 }
 
-- 
2.44.0

[PATCH v3 6/6] migration: Add qtest for migration over RDMA

2025-02-25 Thread Li Zhijian via

This qtest requires there is a RDMA(RoCE) link in the host.
In order to make the test work smoothly, introduce a
scripts/rdma-migration-helper.sh to
- setup a new Soft-RoCE(aka RXE) if it's root
- detect existing RoCE link

Test will be skipped if there is no available RoCE link.
 # Start of rdma tests
 # Running /x86_64/migration/precopy/rdma/plain
 ok 1 /x86_64/migration/precopy/rdma/plain # SKIP
 There is no available rdma link to run RDMA migration test.
 To enable the test:
 (1) Run 'scripts/rdma-migration-helper.sh setup' with root and rerun the test
 or
 (2) Run the test with root privilege

 # End of rdma tests

Reviewed-by: Peter Xu 
Signed-off-by: Li Zhijian 
---
 MAINTAINERS   |  1 +
 scripts/rdma-migration-helper.sh  | 41 +
 tests/qtest/migration/precopy-tests.c | 64 +++
 3 files changed, 106 insertions(+)
 create mode 100755 scripts/rdma-migration-helper.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index 3848d37a38d..15360fcdc4b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3480,6 +3480,7 @@ R: Li Zhijian 
 R: Peter Xu 
 S: Odd Fixes
 F: migration/rdma*
+F: scripts/rdma-migration-helper.sh
 
 Migration dirty limit and dirty page rate
 M: Hyman Huang 
diff --git a/scripts/rdma-migration-helper.sh b/scripts/rdma-migration-helper.sh
new file mode 100755
index 000..66557d9e267
--- /dev/null
+++ b/scripts/rdma-migration-helper.sh
@@ -0,0 +1,41 @@
+#!/bin/bash
+
+# Copied from blktests
+get_ipv4_addr()
+{
+ip -4 -o addr show dev "$1" |
+sed -n 's/.*[[:blank:]]inet[[:blank:]]*\([^[:blank:]/]*\).*/\1/p' |
+tr -d '\n'
+}
+
+has_soft_rdma()
+{
+rdma link | grep -q " netdev $1[[:blank:]]*\$"
+}
+
+rdma_rxe_setup_detect()
+{
+(
+cd /sys/class/net &&
+for i in *; do
+[ -e "$i" ] || continue
+[ "$i" = "lo" ] && continue
+[ "$(<"$i/addr_len")" = 6 ] || continue
+[ "$(<"$i/carrier")" = 1 ] || continue
+
+has_soft_rdma "$i" && break
+[ "$operation" = "setup" ] &&
+rdma link add "${i}_rxe" type rxe netdev "$i" && break
+done
+has_soft_rdma "$i" || return
+get_ipv4_addr "$i"
+)
+}
+
+operation=${1:-setup}
+
+if [ "$operation" == "setup" ] || [ "$operation" == "detect" ]; then
+rdma_rxe_setup_detect
+else
+echo "Usage: $0 [setup | detect]"
+fi
diff --git a/tests/qtest/migration/precopy-tests.c 
b/tests/qtest/migration/precopy-tests.c
index ba273d10b9a..bf97f4e9325 100644
--- a/tests/qtest/migration/precopy-tests.c
+++ b/tests/qtest/migration/precopy-tests.c
@@ -99,6 +99,66 @@ static void test_precopy_unix_dirty_ring(void)
 test_precopy_common(&args);
 }
 
+#ifdef CONFIG_RDMA
+
+#define RDMA_MIGRATION_HELPER "scripts/rdma-migration-helper.sh"
+static int new_rdma_link(char *buffer)
+{
+const char *argument = (geteuid() == 0) ? "setup" : "detect";
+char cmd[1024];
+
+snprintf(cmd, sizeof(cmd), "%s %s", RDMA_MIGRATION_HELPER, argument);
+
+FILE *pipe = popen(cmd, "r");
+if (pipe == NULL) {
+perror("Failed to run script");
+return -1;
+}
+
+int idx = 0;
+while (fgets(buffer + idx, 128 - idx, pipe) != NULL) {
+idx += strlen(buffer);
+}
+
+int status = pclose(pipe);
+if (status == -1) {
+perror("Error reported by pclose()");
+return -1;
+} else if (WIFEXITED(status)) {
+return WEXITSTATUS(status);
+}
+
+return -1;
+}
+
+static void test_precopy_rdma_plain(void)
+{
+char buffer[128] = {};
+
+if (new_rdma_link(buffer)) {
+g_test_skip("\nThere is no available rdma link to run RDMA migration 
test.\n"
+"To enable the test:\n"
+"(1) Run \'" RDMA_MIGRATION_HELPER " setup\' with root and 
rerun the test\n"
+"or\n"
+"(2) Run the test with root privilege\n");
+return;
+}
+
+/*
+ * TODO: query a free port instead of hard code.
+ * 29200=('R'+'D'+'M'+'A')*100
+ **/
+g_autofree char *uri = g_strdup_printf("rdma:%s:29200", buffer);
+
+MigrateCommon args = {
+.listen_uri = uri,
+.connect_uri = uri,
+};
+
+test_precopy_common(&args);
+}
+#endif
+
 static void test_precopy_tcp_plain(void)
 {
 MigrateCommon args = {
@@ -1124,6 +1184,10 @@ static void 
migration_test_add_precopy_smoke(MigrationTestEnv *env)
test_multifd_tcp_uri_none);
 migration_test_add("/migration/multifd/tcp/plain/cancel",
test_multifd_tcp_cancel);
+#ifdef CONFIG_RDMA
+migration_test_add("/migration/precopy/rdma/plain",
+   test_precopy_rdma_plain);
+#endif
 }
 
 void migration_test_add_precopy(MigrationTestEnv *env)
-- 
2.44.0

[PATCH v3 5/6] migration: Unfold control_save_page()

2025-02-25 Thread Li Zhijian via

control_save_page() is for RDMA only, unfold it to make the code more
clear.
In addition:
 - Similar to other branches style in ram_save_target_page(), involve RDMA
   only if the condition 'migrate_rdma()' is true.
 - Further simplify the code by removing the RAM_SAVE_CONTROL_NOT_SUPP.

Signed-off-by: Li Zhijian 
---
V3:
  squash previous 2nd, 3th, 4th into one patch
---
 migration/ram.c  | 34 +++---
 migration/rdma.c |  7 ++-
 migration/rdma.h |  3 +--
 3 files changed, 10 insertions(+), 34 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 424df6d9f13..c363034c882 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1143,32 +1143,6 @@ static int save_zero_page(RAMState *rs, PageSearchStatus 
*pss,
 return len;
 }
 
-/*
- * @pages: the number of pages written by the control path,
- *< 0 - error
- *> 0 - number of pages written
- *
- * Return true if the pages has been saved, otherwise false is returned.
- */
-static bool control_save_page(PageSearchStatus *pss,
-  ram_addr_t offset, int *pages)
-{
-int ret;
-
-ret = rdma_control_save_page(pss->pss_channel, pss->block->offset, offset,
- TARGET_PAGE_SIZE);
-if (ret == RAM_SAVE_CONTROL_NOT_SUPP) {
-return false;
-}
-
-if (ret == RAM_SAVE_CONTROL_DELAYED) {
-*pages = 1;
-return true;
-}
-*pages = ret;
-return true;
-}
-
 /*
  * directly send the page to the stream
  *
@@ -1965,7 +1939,13 @@ static int ram_save_target_page(RAMState *rs, 
PageSearchStatus *pss)
 int res;
 
 /* Hand over to RDMA first */
-if (control_save_page(pss, offset, &res)) {
+if (migrate_rdma()) {
+res = rdma_control_save_page(pss->pss_channel, pss->block->offset,
+ offset, TARGET_PAGE_SIZE);
+
+if (res == RAM_SAVE_CONTROL_DELAYED) {
+res = 1;
+}
 return res;
 }
 
diff --git a/migration/rdma.c b/migration/rdma.c
index e5b4ac599b1..08eb924ffaa 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3284,14 +3284,11 @@ err:
 int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
ram_addr_t offset, size_t size)
 {
-if (!migrate_rdma()) {
-return RAM_SAVE_CONTROL_NOT_SUPP;
-}
+assert(migrate_rdma());
 
 int ret = qemu_rdma_save_page(f, block_offset, offset, size);
 
-if (ret != RAM_SAVE_CONTROL_DELAYED &&
-ret != RAM_SAVE_CONTROL_NOT_SUPP) {
+if (ret != RAM_SAVE_CONTROL_DELAYED) {
 if (ret < 0) {
 qemu_file_set_error(f, ret);
 }
diff --git a/migration/rdma.h b/migration/rdma.h
index f55f28bbed1..8eeb0117b91 100644
--- a/migration/rdma.h
+++ b/migration/rdma.h
@@ -33,7 +33,6 @@ void rdma_start_incoming_migration(InetSocketAddress 
*host_port, Error **errp);
 #define RAM_CONTROL_ROUND 1
 #define RAM_CONTROL_FINISH3
 
-#define RAM_SAVE_CONTROL_NOT_SUPP -1000
 #define RAM_SAVE_CONTROL_DELAYED  -2000
 
 #ifdef CONFIG_RDMA
@@ -56,7 +55,7 @@ static inline
 int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
ram_addr_t offset, size_t size)
 {
-return RAM_SAVE_CONTROL_NOT_SUPP;
+g_assert_not_reached();
 }
 #endif
 #endif
-- 
2.44.0

[PATCH v3 1/6] migration: Prioritize RDMA in ram_save_target_page()

2025-02-25 Thread Li Zhijian via

Address an error in RDMA-based migration by ensuring RDMA is prioritized
when saving pages in `ram_save_target_page()`.

Previously, the RDMA protocol's page-saving step was placed after other
protocols due to a refactoring in commit bc38dc2f5f3. This led to migration
failures characterized by unknown control messages and state loading errors
destination:
(qemu) qemu-system-x86_64: Unknown control message QEMU FILE
qemu-system-x86_64: error while loading state section id 1(ram)
qemu-system-x86_64: load of migration failed: Operation not permitted
source:
(qemu) qemu-system-x86_64: RDMA is in an error state waiting migration to abort!
qemu-system-x86_64: failed to save SaveStateEntry with id(name): 1(ram): -1
qemu-system-x86_64: rdma migration: recv polling control error!
qemu-system-x86_64: warning: Early error. Sending error.
qemu-system-x86_64: warning: rdma migration: send polling control error

RDMA migration implemented its own protocol/method to send pages to
destination side, hand over to RDMA first to prevent pages being saved by
other protocol.

Fixes: bc38dc2f5f3 ("migration: refactor ram_save_target_page functions")
Reviewed-by: Peter Xu 
Signed-off-by: Li Zhijian 
---
V3:
   collect Reviewed tags
---
 migration/ram.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 589b6505eb2..424df6d9f13 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1964,6 +1964,11 @@ static int ram_save_target_page(RAMState *rs, 
PageSearchStatus *pss)
 ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
 int res;
 
+/* Hand over to RDMA first */
+if (control_save_page(pss, offset, &res)) {
+return res;
+}
+
 if (!migrate_multifd()
 || migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) {
 if (save_zero_page(rs, pss, offset)) {
@@ -1976,10 +1981,6 @@ static int ram_save_target_page(RAMState *rs, 
PageSearchStatus *pss)
 return ram_save_multifd_page(block, offset);
 }
 
-if (control_save_page(pss, offset, &res)) {
-return res;
-}
-
 return ram_save_page(rs, pss);
 }
 
-- 
2.44.0

[PATCH v3 3/6] migration: disable RDMA + postcopy-ram

2025-02-25 Thread Li Zhijian via

It's believed that RDMA + postcopy-ram has been broken for a while.
Rather than spending time re-enabling it, let's simply disable it as a
trade-off.

Reviewed-by: Peter Xu 
Signed-off-by: Li Zhijian 
---
V3:
  - collect Reviewed tag
  - reoder: 6th -> 3th
---
 migration/migration.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 2eacae25e0e..d414a4b1379 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -251,6 +251,10 @@ 
migration_capabilities_and_transport_compatible(MigrationAddress *addr,
 error_setg(errp, "RDMA and multifd can't be used together");
 return false;
 }
+if (migrate_postcopy_ram()) {
+error_setg(errp, "RDMA and postcopy-ram can't be used together");
+return false;
+}
 }
 
 return true;
-- 
2.44.0

[PATCH v3 0/6] migration/rdma: fixes, refactor and cleanup

2025-02-25 Thread Li Zhijian via

- It fix the RDMA migration broken issue
- disable RDMA + postcopy
- some cleanups
- Add a qtest for RDMA at last

Chnages since V2:
- squash previous 2/3/4 to '[PATCH v3 5/6] migration: Unfold  
control_save_page()'
- reorder the patch layout to prevent recently added code from being deleted 
again.
- collect Reviewed tags from Peter

Changes since V1[0]:
Add some saparate patches to refactor and cleanup based on V1

[0] 
https://lore.kernel.org/qemu-devel/20250218074345.638203-1-lizhij...@fujitsu.com/


Li Zhijian (6):
  migration: Prioritize RDMA in ram_save_target_page()
  migration: Add migration_capabilities_and_transport_compatible()
helper
  migration: disable RDMA + postcopy-ram
  migration/rdma: Remove redundant migration_in_postcopy checks
  migration: Unfold control_save_page()
  migration: Add qtest for migration over RDMA

 MAINTAINERS   |  1 +
 migration/migration.c | 40 -
 migration/ram.c   | 41 +
 migration/rdma.c  | 11 ++---
 migration/rdma.h  |  3 +-
 scripts/rdma-migration-helper.sh  | 41 +
 tests/qtest/migration/precopy-tests.c | 64 +++
 7 files changed, 152 insertions(+), 49 deletions(-)
 create mode 100755 scripts/rdma-migration-helper.sh

-- 
2.44.0

[PATCH v2 5/5] hw/vfio/pci: Re-order pre-reset

2025-02-25 Thread Alex Williamson

We want the device in the D0 power state going into reset, but the
config write can enable the BARs in the address space, which are
then removed from the address space once we clear the memory enable
bit in the command register.  Re-order to clear the command bit
first, so the power state change doesn't enable the BARs.

Cc: Cédric Le Goater 
Reviewed-by: Zhenzhong Duan 
Reviewed-by: Eric Auger 
Signed-off-by: Alex Williamson 
---
 hw/vfio/pci.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index eab8974e9b48..153455fae85d 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2410,6 +2410,15 @@ void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
 
 vfio_disable_interrupts(vdev);
 
+/*
+ * Stop any ongoing DMA by disconnecting I/O, MMIO, and bus master.
+ * Also put INTx Disable in known state.
+ */
+cmd = vfio_pci_read_config(pdev, PCI_COMMAND, 2);
+cmd &= ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER |
+ PCI_COMMAND_INTX_DISABLE);
+vfio_pci_write_config(pdev, PCI_COMMAND, cmd, 2);
+
 /* Make sure the device is in D0 */
 if (pdev->pm_cap) {
 uint16_t pmcsr;
@@ -2429,15 +2438,6 @@ void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
 }
 }
 }
-
-/*
- * Stop any ongoing DMA by disconnecting I/O, MMIO, and bus master.
- * Also put INTx Disable in known state.
- */
-cmd = vfio_pci_read_config(pdev, PCI_COMMAND, 2);
-cmd &= ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER |
- PCI_COMMAND_INTX_DISABLE);
-vfio_pci_write_config(pdev, PCI_COMMAND, cmd, 2);
 }
 
 void vfio_pci_post_reset(VFIOPCIDevice *vdev)
-- 
2.48.1

[PATCH v2 0/5] PCI: Implement basic PCI PM capability backing

2025-02-25 Thread Alex Williamson

v2:

Eric noted in v1 that one of the drivers had a redundant wmask setting
since pci_pm_init() enabled writes to the power state field.  This was
added because vfio-pci was not setting wmask for this capability but
is allowing writes to the PM state field through to the device.  For
vfio-pci, QEMU emulated config space is rather secondary to the config
space through vfio.

It turns out therefore, that vfio-pci is nearly unique in not already
managing the wmask of the PM capability state and if we embrace that
it's the pci_pm_init() caller's responsibility to manage the remaining
contents and write-access of the capability, then I think we also
solve the question of migration compatibility.  The new infrastructure
here is not changing whether any fields were previously writable, it's
only effecting a mapping change based on the value found there.

This requires only a slight change to patch 1/, removing setting of
the wmask, but commit log is also updated and comments added.  I also
made the bad transition trace a little more obvious given Eric's
comments.  Patch 2/ is also updated so that vfio-pci effects the wmask
change locally.  The couple drivers that don't currently update wmask
simply don't get this new BAR unmapped when not in D0 behavior.

Incorporated reviews for the unmodified patches.  Please re-review and
report any noted issues.  Thanks,

Alex

v1:

https://lore.kernel.org/all/20250220224918.2520417-1-alex.william...@redhat.com/

Eric recently identified an issue[1] where during graceful shutdown
of a VM in a vIOMMU configuration, the guest driver places the device
into the D3 power state, the vIOMMU is then disabled, triggering an
AddressSpace update.  The device BARs are still mapped into the AS,
but the vfio host driver refuses to DMA map the MMIO space due to the
device power state.

The proposed solution in [1] was to skip mappings based on the
device power state.  Here we take a different approach.  The PCI spec
defines that devices in D1/2/3 power state should respond only to
configuration and message requests and all other requests should be
handled as an Unsupported Request.  In other words, the memory and
IO BARs are not accessible except when the device is in the D0 power
state.

To emulate this behavior, we can factor the device power state into
the mapping state of the device BARs.  Therefore the BAR is marked
as unmapped if either the respective command register enable bit is
clear or the device is not in the D0 power state.

In order to implement this, the PowerState field of the PMCSR
register becomes writable, which allows the device to appear in
lower power states.  This also therefore implements D3 support
(insofar as the BAR behavior) for all devices implementing the PM
capability.  The PCI spec requires D3 support.

An aspect that needs attention here is whether this change in the
wmask and PMCSR bits becomes a problem for migration, and how we
might solve it.  For a guest migrating old->new, the device would
always be in the D0 power state, but the register becomes writable.
In the opposite direction, is it possible that a device could
migrate in a low power state and be stuck there since the bits are
read-only in old QEMU?  Do we need an option for this behavior and a
machine state bump, or are there alternatives?

Thanks,
Alex

[1]https://lore.kernel.org/all/20250219175941.135390-1-eric.au...@redhat.com/


Alex Williamson (5):
  hw/pci: Basic support for PCI power management
  pci: Use PCI PM capability initializer
  vfio/pci: Delete local pm_cap
  pcie, virtio: Remove redundant pm_cap
  hw/vfio/pci: Re-order pre-reset

 hw/net/e1000e.c |  3 +-
 hw/net/eepro100.c   |  4 +-
 hw/net/igb.c|  3 +-
 hw/nvme/ctrl.c  |  3 +-
 hw/pci-bridge/pcie_pci_bridge.c |  3 +-
 hw/pci/pci.c| 93 -
 hw/pci/trace-events |  2 +
 hw/vfio/pci.c   | 34 ++--
 hw/vfio/pci.h   |  1 -
 hw/virtio/virtio-pci.c  | 11 ++--
 include/hw/pci/pci.h|  3 ++
 include/hw/pci/pci_device.h |  3 ++
 include/hw/pci/pcie.h   |  2 -
 13 files changed, 127 insertions(+), 38 deletions(-)

-- 
2.48.1

[PATCH v2 1/5] hw/pci: Basic support for PCI power management

2025-02-25 Thread Alex Williamson

The memory and IO BARs for devices are only accessible in the D0 power
state.  In other power states the PCI spec defines that the device
responds to TLPs and messages with an Unsupported Request response.

To approximate this behavior, consider the BARs as unmapped when the
device is not in the D0 power state.  This makes the BARs inaccessible
and has the additional bonus for vfio-pci that we don't attempt to DMA
map BARs for devices in a non-D0 power state.

To support this, an interface is added for devices to register the PM
capability, which allows central tracking to enforce valid transitions
and unmap BARs in non-D0 states.

NB. We currently have device models (eepro100 and pcie_pci_bridge)
that register a PM capability but do not set wmask to enable writes to
the power state field.  In order to maintain migration compatibility,
this new helper does not manage the wmask to enable guest writes to
initiate a power state change.  The contents and write access of the
PM capability are still managed by the caller.

Cc: Michael S. Tsirkin 
Cc: Marcel Apfelbaum 
Signed-off-by: Alex Williamson 
---
 hw/pci/pci.c| 93 -
 hw/pci/trace-events |  2 +
 include/hw/pci/pci.h|  3 ++
 include/hw/pci/pci_device.h |  3 ++
 4 files changed, 99 insertions(+), 2 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 2afa423925c5..24629807de82 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -435,6 +435,84 @@ static void pci_msi_trigger(PCIDevice *dev, MSIMessage msg)
  attrs, NULL);
 }
 
+/*
+ * Register and track a PM capability.  If wmask is also enabled for the power
+ * state field of the pmcsr register, guest writes may change the device PM
+ * state.  BAR access is only enabled while the device is in the D0 state.
+ * Return the capability offset or negative error code.
+ */
+int pci_pm_init(PCIDevice *d, uint8_t offset, Error **errp)
+{
+int cap = pci_add_capability(d, PCI_CAP_ID_PM, offset, PCI_PM_SIZEOF, 
errp);
+
+if (cap < 0) {
+return cap;
+}
+
+d->pm_cap = cap;
+d->cap_present |= QEMU_PCI_CAP_PM;
+
+return cap;
+}
+
+static uint8_t pci_pm_state(PCIDevice *d)
+{
+uint16_t pmcsr;
+
+if (!(d->cap_present & QEMU_PCI_CAP_PM)) {
+return 0;
+}
+
+pmcsr = pci_get_word(d->config + d->pm_cap + PCI_PM_CTRL);
+
+return pmcsr & PCI_PM_CTRL_STATE_MASK;
+}
+
+/*
+ * Update the PM capability state based on the new value stored in config
+ * space respective to the old, pre-write state provided.  If the new value
+ * is rejected (unsupported or invalid transition) restore the old value.
+ * Return the resulting PM state.
+ */
+static uint8_t pci_pm_update(PCIDevice *d, uint32_t addr, int l, uint8_t old)
+{
+uint16_t pmc;
+uint8_t new;
+
+if (!(d->cap_present & QEMU_PCI_CAP_PM) ||
+!range_covers_byte(addr, l, d->pm_cap + PCI_PM_CTRL)) {
+return old;
+}
+
+new = pci_pm_state(d);
+if (new == old) {
+return old;
+}
+
+pmc = pci_get_word(d->config + d->pm_cap + PCI_PM_PMC);
+
+/*
+ * Transitions to D1 & D2 are only allowed if supported.  Devices may
+ * only transition to higher D-states or to D0.
+ */
+if ((!(pmc & PCI_PM_CAP_D1) && new == 1) ||
+(!(pmc & PCI_PM_CAP_D2) && new == 2) ||
+(old && new && new < old)) {
+pci_word_test_and_clear_mask(d->config + d->pm_cap + PCI_PM_CTRL,
+ PCI_PM_CTRL_STATE_MASK);
+pci_word_test_and_set_mask(d->config + d->pm_cap + PCI_PM_CTRL,
+   old);
+trace_pci_pm_bad_transition(d->name, pci_dev_bus_num(d),
+PCI_SLOT(d->devfn), PCI_FUNC(d->devfn),
+old, new);
+return old;
+}
+
+trace_pci_pm_transition(d->name, pci_dev_bus_num(d), PCI_SLOT(d->devfn),
+PCI_FUNC(d->devfn), old, new);
+return new;
+}
+
 static void pci_reset_regions(PCIDevice *dev)
 {
 int r;
@@ -474,6 +552,11 @@ static void pci_do_device_reset(PCIDevice *dev)
   pci_get_word(dev->wmask + PCI_INTERRUPT_LINE) |
   pci_get_word(dev->w1cmask + PCI_INTERRUPT_LINE));
 dev->config[PCI_CACHE_LINE_SIZE] = 0x0;
+/* Default PM state is D0 */
+if (dev->cap_present & QEMU_PCI_CAP_PM) {
+pci_word_test_and_clear_mask(dev->config + dev->pm_cap + PCI_PM_CTRL,
+ PCI_PM_CTRL_STATE_MASK);
+}
 pci_reset_regions(dev);
 pci_update_mappings(dev);
 
@@ -1598,7 +1681,7 @@ static void pci_update_mappings(PCIDevice *d)
 continue;
 
 new_addr = pci_bar_address(d, i, r->type, r->size);
-if (!d->enabled) {
+if (!d->enabled || pci_pm_state(d)) {
 new_addr = PCI_BAR_UNMAPPED;
 }
 
@@ -1664,6 +1747,7 @@ uint32_t pci_default_read_config(

[PATCH v2 4/5] pcie, virtio: Remove redundant pm_cap

2025-02-25 Thread Alex Williamson

The pm_cap on the PCIExpressDevice object can be distilled down
to the new instance on the PCIDevice object.

Cc: Michael S. Tsirkin 
Cc: Marcel Apfelbaum 
Reviewed-by: Michael S. Tsirkin 
Reviewed-by: Zhenzhong Duan 
Reviewed-by: Eric Auger 
Signed-off-by: Alex Williamson 
---
 hw/pci-bridge/pcie_pci_bridge.c | 1 -
 hw/virtio/virtio-pci.c  | 8 +++-
 include/hw/pci/pcie.h   | 2 --
 3 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/hw/pci-bridge/pcie_pci_bridge.c b/hw/pci-bridge/pcie_pci_bridge.c
index 9fa656b43b42..2429503cfbbf 100644
--- a/hw/pci-bridge/pcie_pci_bridge.c
+++ b/hw/pci-bridge/pcie_pci_bridge.c
@@ -56,7 +56,6 @@ static void pcie_pci_bridge_realize(PCIDevice *d, Error 
**errp)
 if (pos < 0) {
 goto pm_error;
 }
-d->exp.pm_cap = pos;
 pci_set_word(d->config + pos + PCI_PM_PMC, 0x3);
 
 pcie_cap_arifwd_init(d);
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index afe8b5551c5c..3ca3f849d391 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -2209,8 +2209,6 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error 
**errp)
 return;
 }
 
-pci_dev->exp.pm_cap = pos;
-
 /*
  * Indicates that this function complies with revision 1.2 of the
  * PCI Power Management Interface Specification.
@@ -2309,11 +2307,11 @@ static bool virtio_pci_no_soft_reset(PCIDevice *dev)
 {
 uint16_t pmcsr;
 
-if (!pci_is_express(dev) || !dev->exp.pm_cap) {
+if (!pci_is_express(dev) || !(dev->cap_present & QEMU_PCI_CAP_PM)) {
 return false;
 }
 
-pmcsr = pci_get_word(dev->config + dev->exp.pm_cap + PCI_PM_CTRL);
+pmcsr = pci_get_word(dev->config + dev->pm_cap + PCI_PM_CTRL);
 
 /*
  * When No_Soft_Reset bit is set and the device
@@ -2342,7 +2340,7 @@ static void virtio_pci_bus_reset_hold(Object *obj, 
ResetType type)
 
 if (proxy->flags & VIRTIO_PCI_FLAG_INIT_PM) {
 pci_word_test_and_clear_mask(
-dev->config + dev->exp.pm_cap + PCI_PM_CTRL,
+dev->config + dev->pm_cap + PCI_PM_CTRL,
 PCI_PM_CTRL_STATE_MASK);
 }
 }
diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index b8d59732bc63..70a5de09de39 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -58,8 +58,6 @@ typedef enum {
 struct PCIExpressDevice {
 /* Offset of express capability in config space */
 uint8_t exp_cap;
-/* Offset of Power Management capability in config space */
-uint8_t pm_cap;
 
 /* SLOT */
 bool hpev_notified; /* Logical AND of conditions for hot plug event.
-- 
2.48.1

[PATCH v3 4/6] migration/rdma: Remove redundant migration_in_postcopy checks

2025-02-25 Thread Li Zhijian via

Since we have disabled RDMA + postcopy, it's safe to remove
the migration_in_postcopy() that follows the migrate_rdma().

Signed-off-by: Li Zhijian 
---
V3:
  reorder: 7th->4th
---
 migration/rdma.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 76fb0349238..e5b4ac599b1 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3284,7 +3284,7 @@ err:
 int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
ram_addr_t offset, size_t size)
 {
-if (!migrate_rdma() || migration_in_postcopy()) {
+if (!migrate_rdma()) {
 return RAM_SAVE_CONTROL_NOT_SUPP;
 }
 
@@ -3829,7 +3829,7 @@ int rdma_block_notification_handle(QEMUFile *f, const 
char *name)
 
 int rdma_registration_start(QEMUFile *f, uint64_t flags)
 {
-if (!migrate_rdma() || migration_in_postcopy()) {
+if (!migrate_rdma()) {
 return 0;
 }
 
@@ -3861,7 +3861,7 @@ int rdma_registration_stop(QEMUFile *f, uint64_t flags)
 RDMAControlHeader head = { .len = 0, .repeat = 1 };
 int ret;
 
-if (!migrate_rdma() || migration_in_postcopy()) {
+if (!migrate_rdma()) {
 return 0;
 }
 
-- 
2.44.0

Re: [PATCH v2 00/11] riscv: IOMMU HPM support

2025-02-25 Thread Alistair Francis

On Tue, Feb 25, 2025 at 5:13 AM Daniel Henrique Barboza
 wrote:
>
> Hi,
>
> In this version no major changes were made. Just a rebase with
> alistair/riscv-to-apply.next and acks from Alistair.
>
> All patches acked.
>
> v1 link: 
> https://lore.kernel.org/qemu-riscv/20241205133003.184581-1-dbarb...@ventanamicro.com/
>
> Daniel Henrique Barboza (3):
>   hw/riscv/riscv-iommu.h: add missing headers
>   hw/riscv: add IOMMU HPM trace events
>   docs/specs/riscv-iommu.rst: add HPM support info
>
> Tomasz Jeznach (8):
>   hw/riscv/riscv-iommu-bits.h: HPM bits
>   hw/riscv/riscv-iommu: add riscv-iommu-hpm file
>   hw/riscv/riscv-iommu: add riscv_iommu_hpm_incr_ctr()
>   hw/riscv/riscv-iommu: instantiate hpm_timer
>   hw/riscv/riscv-iommu: add IOCOUNTINH mmio writes
>   hw/riscv/riscv-iommu: add IOHPMCYCLES mmio write
>   hw/riscv/riscv-iommu: add hpm events mmio write
>   hw/riscv/riscv-iommu.c: add RISCV_IOMMU_CAP_HPM cap

Thanks!

Applied to riscv-to-apply.next

Alistair

>
>  docs/specs/riscv-iommu.rst  |   2 +
>  hw/riscv/meson.build|   3 +-
>  hw/riscv/riscv-iommu-bits.h |  47 +
>  hw/riscv/riscv-iommu-hpm.c  | 381 
>  hw/riscv/riscv-iommu-hpm.h  |  33 
>  hw/riscv/riscv-iommu.c  | 131 +++--
>  hw/riscv/riscv-iommu.h  |  27 +++
>  hw/riscv/trace-events   |   5 +
>  8 files changed, 612 insertions(+), 17 deletions(-)
>  create mode 100644 hw/riscv/riscv-iommu-hpm.c
>  create mode 100644 hw/riscv/riscv-iommu-hpm.h
>
> --
> 2.48.1
>
>

1 2 3 4 >

1 - 100 of 309 matches

Mail list logo