Re: [RFC 0/2] Identify aliased maps in vdpa SVQ iova_tree

2024-05-09 Thread Eugenio Perez Martin
On Thu, May 9, 2024 at 8:27 AM Jason Wang  wrote:
>
> On Thu, May 9, 2024 at 1:16 AM Eugenio Perez Martin  
> wrote:
> >
> > On Wed, May 8, 2024 at 4:29 AM Jason Wang  wrote:
> > >
> > > On Tue, May 7, 2024 at 6:57 PM Eugenio Perez Martin  
> > > wrote:
> > > >
> > > > On Tue, May 7, 2024 at 9:29 AM Jason Wang  wrote:
> > > > >
> > > > > On Fri, Apr 12, 2024 at 3:56 PM Eugenio Perez Martin
> > > > >  wrote:
> > > > > >
> > > > > > On Fri, Apr 12, 2024 at 8:47 AM Jason Wang  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Wed, Apr 10, 2024 at 6:03 PM Eugenio Pérez 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > The guest may have overlapped memory regions, where different 
> > > > > > > > GPA leads
> > > > > > > > to the same HVA.  This causes a problem when overlapped regions
> > > > > > > > (different GPA but same translated HVA) exists in the tree, as 
> > > > > > > > looking
> > > > > > > > them by HVA will return them twice.
> > > > > > >
> > > > > > > I think I don't understand if there's any side effect for shadow 
> > > > > > > virtqueue?
> > > > > > >
> > > > > >
> > > > > > My bad, I totally forgot to put a reference to where this comes 
> > > > > > from.
> > > > > >
> > > > > > Si-Wei found that during initialization this sequences of maps /
> > > > > > unmaps happens [1]:
> > > > > >
> > > > > > HVAGPAIOVA
> > > > > > -
> > > > > > Map
> > > > > > [0x7f7903e0, 0x7f7983e0)[0x0, 0x8000) [0x1000, 
> > > > > > 0x8000)
> > > > > > [0x7f7983e0, 0x7f9903e0)[0x1, 0x208000)
> > > > > > [0x80001000, 0x201000)
> > > > > > [0x7f7903ea, 0x7f7903ec)[0xfeda, 0xfedc)
> > > > > > [0x201000, 0x221000)
> > > > > >
> > > > > > Unmap
> > > > > > [0x7f7903ea, 0x7f7903ec)[0xfeda, 0xfedc) 
> > > > > > [0x1000,
> > > > > > 0x2) ???
> > > > > >
> > > > > > The third HVA range is contained in the first one, but exposed 
> > > > > > under a
> > > > > > different GVA (aliased). This is not "flattened" by QEMU, as GPA 
> > > > > > does
> > > > > > not overlap, only HVA.
> > > > > >
> > > > > > At the third chunk unmap, the current algorithm finds the first 
> > > > > > chunk,
> > > > > > not the second one. This series is the way to tell the difference at
> > > > > > unmap time.
> > > > > >
> > > > > > [1] 
> > > > > > https://lists.nongnu.org/archive/html/qemu-devel/2024-04/msg00079.html
> > > > > >
> > > > > > Thanks!
> > > > >
> > > > > Ok, I was wondering if we need to store GPA(GIOVA) to HVA mappings in
> > > > > the iova tree to solve this issue completely. Then there won't be
> > > > > aliasing issues.
> > > > >
> > > >
> > > > I'm ok to explore that route but this has another problem. Both SVQ
> > > > vrings and CVQ buffers also need to be addressable by VhostIOVATree,
> > > > and they do not have GPA.
> > > >
> > > > At this moment vhost_svq_translate_addr is able to handle this
> > > > transparently as we translate vaddr to SVQ IOVA. How can we store
> > > > these new entries? Maybe a (hwaddr)-1 GPA to signal it has no GPA and
> > > > then a list to go through other entries (SVQ vaddr and CVQ buffers).
> > >
> > > This seems to be tricky.
> > >
> > > As discussed, it could be another iova tree.
> > >
> >
> > Yes but there are many ways to add another IOVATree. Let me expand & recap.
> >
> > Option 1 is to simply add another iova tree to VhostShadowVirtqueue.
> > Let's call it gpa_iova_tree, as opposed to the current iova_tree that
> > translates from vaddr to SVQ IOVA. To know which one to use is easy at
> > adding or removing, like in the memory listener, but how to know at
> > vhost_svq_translate_addr?
>
> Then we won't use virtqueue_pop() at all, we need a SVQ version of
> virtqueue_pop() to translate GPA to SVQ IOVA directly?
>

The problem is not virtqueue_pop, that's out of the
vhost_svq_translate_addr. The problem is the need of adding
conditionals / complexity in all the callers of

> >
> > The easiest way for me is to rely on memory_region_from_host(). When
> > vaddr is from the guest, it returns a valid MemoryRegion. When it is
> > not, it returns NULL. I'm not sure if this is a valid use case, it
> > just worked in my tests so far.
> >
> > Now we have the second problem: The GPA values of the regions of the
> > two IOVA tree must be unique. We need to be able to find unallocated
> > regions in SVQ IOVA. At this moment there is only one IOVATree, so
> > this is done easily by vhost_iova_tree_map_alloc. But it is very
> > complicated with two trees.
>
> Would it be simpler if we decouple the IOVA allocator? For example, we
> can have a dedicated gtree to track the allocated IOVA ranges. It is
> shared by both
>
> 1) Guest memory (GPA)
> 2) SVQ virtqueue and buffers
>
> And another gtree to track the GPA to IOVA.
>
> The SVQ code could use either

[PULL 1/3] hw/loongarch: Refine default numa id calculation

2024-05-09 Thread Song Gao
From: Bibo Mao 

With numa_test test case, there is subcase named test_def_cpu_split(),
there are 8 sockets and 2 numa nodes. Here is command line:
"-machine smp.cpus=8,smp.sockets=8 -numa node,memdev=ram -numa node"

The required result is:
  node 0 cpus: 0 2 4 6
  node 1 cpus: 1 3 5 7
Test case numa_test fails on LoongArch, since the actual result is:
  node 0 cpus: 0 1 2 3
  node 1 cpus: 4 5 6 7

It will be better if all the cpus in one socket share the same numa
node. Here socket id is used to calculate numa id in function
virt_get_default_cpu_node_id().

Signed-off-by: Bibo Mao 
Reviewed-by: Song Gao 
Message-Id: <20240319022606.2994565-1-maob...@loongson.cn>
Signed-off-by: Song Gao 
---
 hw/loongarch/virt.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index c0999878df..12d20055fe 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -1192,15 +1192,14 @@ virt_cpu_index_to_props(MachineState *ms, unsigned 
cpu_index)
 
 static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx)
 {
-int64_t nidx = 0;
+int64_t socket_id;
 
 if (ms->numa_state->num_nodes) {
-nidx = idx / (ms->smp.cpus / ms->numa_state->num_nodes);
-if (ms->numa_state->num_nodes <= nidx) {
-nidx = ms->numa_state->num_nodes - 1;
-}
+socket_id = ms->possible_cpus->cpus[idx].props.socket_id;
+return socket_id % ms->numa_state->num_nodes;
+} else {
+return 0;
 }
-return nidx;
 }
 
 static void loongarch_class_init(ObjectClass *oc, void *data)
-- 
2.25.1




[PULL 3/3] target/loongarch: Put cpucfg operation before CSR register

2024-05-09 Thread Song Gao
From: Bibo Mao 

On Loongarch, cpucfg is register for cpu feature, some other registers
depend on cpucfg feature such as perf CSR registers. Here put cpucfg
read/write operations before CSR register, so that KVM knows how many
perf CSR registers are valid from pre-set cpucfg feature information.

Signed-off-by: Bibo Mao 
Reviewed-by: Song Gao 
Message-Id: <20240428031651.1354587-1-maob...@loongson.cn>
Signed-off-by: Song Gao 
---
 target/loongarch/kvm/kvm.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/target/loongarch/kvm/kvm.c b/target/loongarch/kvm/kvm.c
index 8224d94333..bc75552d0f 100644
--- a/target/loongarch/kvm/kvm.c
+++ b/target/loongarch/kvm/kvm.c
@@ -587,22 +587,22 @@ int kvm_arch_get_registers(CPUState *cs)
 return ret;
 }
 
-ret = kvm_loongarch_get_csr(cs);
+ret = kvm_loongarch_get_cpucfg(cs);
 if (ret) {
 return ret;
 }
 
-ret = kvm_loongarch_get_regs_fp(cs);
+ret = kvm_loongarch_get_csr(cs);
 if (ret) {
 return ret;
 }
 
-ret = kvm_loongarch_get_mpstate(cs);
+ret = kvm_loongarch_get_regs_fp(cs);
 if (ret) {
 return ret;
 }
 
-ret = kvm_loongarch_get_cpucfg(cs);
+ret = kvm_loongarch_get_mpstate(cs);
 return ret;
 }
 
@@ -615,22 +615,22 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 return ret;
 }
 
-ret = kvm_loongarch_put_csr(cs, level);
+ret = kvm_loongarch_put_cpucfg(cs);
 if (ret) {
 return ret;
 }
 
-ret = kvm_loongarch_put_regs_fp(cs);
+ret = kvm_loongarch_put_csr(cs, level);
 if (ret) {
 return ret;
 }
 
-ret = kvm_loongarch_put_mpstate(cs);
+ret = kvm_loongarch_put_regs_fp(cs);
 if (ret) {
 return ret;
 }
 
-ret = kvm_loongarch_put_cpucfg(cs);
+ret = kvm_loongarch_put_mpstate(cs);
 return ret;
 }
 
-- 
2.25.1




[PULL 2/3] target/loongarch: Add TCG macro in structure CPUArchState

2024-05-09 Thread Song Gao
From: Bibo Mao 

In structure CPUArchState some struct elements are only used in TCG
mode, and it is not used in KVM mode. Macro CONFIG_TCG is added to
make it simpiler in KVM mode, also there is the same modification
in c code when these structure elements are used.

When VM runs in KVM mode, TLB entries are not used and do not need
migrate. It is only useful when it runs in TCG mode.

Signed-off-by: Bibo Mao 
Reviewed-by: Richard Henderson 
Message-Id: <20240506011912.2108842-1-maob...@loongson.cn>
Signed-off-by: Song Gao 
---
 target/loongarch/cpu.c|  7 +--
 target/loongarch/cpu.h| 16 ++--
 target/loongarch/cpu_helper.c |  9 +
 target/loongarch/machine.c| 30 +-
 4 files changed, 49 insertions(+), 13 deletions(-)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 96da1a685e..a0cad53676 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -505,7 +505,9 @@ static void loongarch_cpu_reset_hold(Object *obj, ResetType 
type)
 lacc->parent_phases.hold(obj, type);
 }
 
+#ifdef CONFIG_TCG
 env->fcsr0_mask = FCSR0_M1 | FCSR0_M2 | FCSR0_M3;
+#endif
 env->fcsr0 = 0x0;
 
 int n;
@@ -550,7 +552,9 @@ static void loongarch_cpu_reset_hold(Object *obj, ResetType 
type)
 
 #ifndef CONFIG_USER_ONLY
 env->pc = 0x1c00;
+#ifdef CONFIG_TCG
 memset(env->tlb, 0, sizeof(env->tlb));
+#endif
 if (kvm_enabled()) {
 kvm_arch_reset_vcpu(env);
 }
@@ -686,8 +690,7 @@ void loongarch_cpu_dump_state(CPUState *cs, FILE *f, int 
flags)
 int i;
 
 qemu_fprintf(f, " PC=%016" PRIx64 " ", env->pc);
-qemu_fprintf(f, " FCSR0 0x%08x  fp_status 0x%02x\n", env->fcsr0,
- get_float_exception_flags(&env->fp_status));
+qemu_fprintf(f, " FCSR0 0x%08x\n", env->fcsr0);
 
 /* gpr */
 for (i = 0; i < 32; i++) {
diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index c5722670f5..41b8e6d96d 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -270,6 +270,7 @@ union fpr_t {
 VReg  vreg;
 };
 
+#ifdef CONFIG_TCG
 struct LoongArchTLB {
 uint64_t tlb_misc;
 /* Fields corresponding to CSR_TLBELO0/1 */
@@ -277,23 +278,18 @@ struct LoongArchTLB {
 uint64_t tlb_entry1;
 };
 typedef struct LoongArchTLB LoongArchTLB;
+#endif
 
 typedef struct CPUArchState {
 uint64_t gpr[32];
 uint64_t pc;
 
 fpr_t fpr[32];
-float_status fp_status;
 bool cf[8];
-
 uint32_t fcsr0;
-uint32_t fcsr0_mask;
 
 uint32_t cpucfg[21];
 
-uint64_t lladdr; /* LL virtual address compared against SC */
-uint64_t llval;
-
 /* LoongArch CSRs */
 uint64_t CSR_CRMD;
 uint64_t CSR_PRMD;
@@ -350,8 +346,16 @@ typedef struct CPUArchState {
 uint64_t CSR_DERA;
 uint64_t CSR_DSAVE;
 
+#ifdef CONFIG_TCG
+float_status fp_status;
+uint32_t fcsr0_mask;
+uint64_t lladdr; /* LL virtual address compared against SC */
+uint64_t llval;
+#endif
 #ifndef CONFIG_USER_ONLY
+#ifdef CONFIG_TCG
 LoongArchTLB  tlb[LOONGARCH_TLB_MAX];
+#endif
 
 AddressSpace *address_space_iocsr;
 bool load_elf;
diff --git a/target/loongarch/cpu_helper.c b/target/loongarch/cpu_helper.c
index 960eec9567..580362ac3e 100644
--- a/target/loongarch/cpu_helper.c
+++ b/target/loongarch/cpu_helper.c
@@ -11,6 +11,7 @@
 #include "internals.h"
 #include "cpu-csr.h"
 
+#ifdef CONFIG_TCG
 static int loongarch_map_tlb_entry(CPULoongArchState *env, hwaddr *physical,
int *prot, target_ulong address,
int access_type, int index, int mmu_idx)
@@ -154,6 +155,14 @@ static int loongarch_map_address(CPULoongArchState *env, 
hwaddr *physical,
 
 return TLBRET_NOMATCH;
 }
+#else
+static int loongarch_map_address(CPULoongArchState *env, hwaddr *physical,
+ int *prot, target_ulong address,
+ MMUAccessType access_type, int mmu_idx)
+{
+return TLBRET_NOMATCH;
+}
+#endif
 
 static hwaddr dmw_va2pa(CPULoongArchState *env, target_ulong va,
 target_ulong dmw)
diff --git a/target/loongarch/machine.c b/target/loongarch/machine.c
index c7029fb9b4..9cd9e848d6 100644
--- a/target/loongarch/machine.c
+++ b/target/loongarch/machine.c
@@ -8,6 +8,7 @@
 #include "qemu/osdep.h"
 #include "cpu.h"
 #include "migration/cpu.h"
+#include "sysemu/tcg.h"
 #include "vec.h"
 
 static const VMStateDescription vmstate_fpu_reg = {
@@ -109,9 +110,15 @@ static const VMStateDescription vmstate_lasx = {
 },
 };
 
+#if defined(CONFIG_TCG) && !defined(CONFIG_USER_ONLY)
+static bool tlb_needed(void *opaque)
+{
+return tcg_enabled();
+}
+
 /* TLB state */
-const VMStateDescription vmstate_tlb = {
-.name = "cpu/tlb",
+static const VMStateDescription vmstate_tlb_entry = {
+.name = "cpu/tlb_entry",
 .version_id = 0,
 .minimum_version_id = 0,
 .fields = (const VMStateField[]) {
@@ -122,6 +129,19 @@ c

[PULL 0/3] loongarch-to-apply queue

2024-05-09 Thread Song Gao
The following changes since commit 4e66a08546a2588a4667766a1edab9caccf24ce3:

  Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging 
(2024-05-07 09:26:30 -0700)

are available in the Git repository at:

  https://gitlab.com/gaosong/qemu.git tags/pull-loongarch-20240509

for you to fetch changes up to 5872966db7abaa7f8753541b7a9f242df9752b50:

  target/loongarch: Put cpucfg operation before CSR register (2024-05-09 
15:19:22 +0800)


pull-loongarch-20240509


Bibo Mao (3):
  hw/loongarch: Refine default numa id calculation
  target/loongarch: Add TCG macro in structure CPUArchState
  target/loongarch: Put cpucfg operation before CSR register

 hw/loongarch/virt.c   | 11 +--
 target/loongarch/cpu.c|  7 +--
 target/loongarch/cpu.h| 16 ++--
 target/loongarch/cpu_helper.c |  9 +
 target/loongarch/kvm/kvm.c| 16 
 target/loongarch/machine.c| 30 +-
 6 files changed, 62 insertions(+), 27 deletions(-)




Re: [PATCH 8/9] migration: Add support for fdset with multifd + file

2024-05-09 Thread Daniel P . Berrangé
On Wed, May 08, 2024 at 05:39:53PM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > On Wed, May 08, 2024 at 09:53:48AM +0100, Daniel P. Berrangé wrote:
> >> On Fri, Apr 26, 2024 at 11:20:41AM -0300, Fabiano Rosas wrote:
> >> > Allow multifd to use an fdset when migrating to a file. This is useful
> >> > for the scenario where the management layer wants to have control over
> >> > the migration file.
> >> > 
> >> > By receiving the file descriptors directly, QEMU can delegate some
> >> > high level operating system operations to the management layer (such
> >> > as mandatory access control). The management layer might also want to
> >> > add its own headers before the migration stream.
> >> > 
> >> > Enable the "file:/dev/fdset/#" syntax for the multifd migration with
> >> > mapped-ram. The requirements for the fdset mechanism are:
> >> > 
> >> > On the migration source side:
> >> > 
> >> > - the fdset must contain two fds that are not duplicates between
> >> >   themselves;
> >> > - if direct-io is to be used, exactly one of the fds must have the
> >> >   O_DIRECT flag set;
> >> > - the file must be opened with WRONLY both times.
> >> > 
> >> > On the migration destination side:
> >> > 
> >> > - the fdset must contain one fd;
> >> > - the file must be opened with RDONLY.
> >> > 
> >> > Signed-off-by: Fabiano Rosas 
> >> > ---
> >> >  docs/devel/migration/main.rst   | 18 ++
> >> >  docs/devel/migration/mapped-ram.rst |  6 -
> >> >  migration/file.c| 38 -
> >> >  3 files changed, 60 insertions(+), 2 deletions(-)
> >> > 
> >> > diff --git a/docs/devel/migration/main.rst 
> >> > b/docs/devel/migration/main.rst
> >> > index 54385a23e5..50f6096470 100644
> >> > --- a/docs/devel/migration/main.rst
> >> > +++ b/docs/devel/migration/main.rst
> >> > @@ -47,6 +47,24 @@ over any transport.
> >> >QEMU interference. Note that QEMU does not flush cached file
> >> >data/metadata at the end of migration.
> >> >  
> >> > +  The file migration also supports using a file that has already been
> >> > +  opened. A set of file descriptors is passed to QEMU via an "fdset"
> >> > +  (see add-fd QMP command documentation). This method allows a
> >> > +  management application to have control over the migration file
> >> > +  opening operation. There are, however, strict requirements to this
> >> > +  interface:
> >> > +
> >> > +  On the migration source side:
> >> > +- if the multifd capability is to be used, the fdset must contain
> >> > +  two file descriptors that are not duplicates between themselves;
> >> > +- if the direct-io capability is to be used, exactly one of the
> >> > +  file descriptors must have the O_DIRECT flag set;
> >> > +- the file must be opened with WRONLY.
> >> > +
> >> > +  On the migration destination side:
> >> > +- the fdset must contain one file descriptor;
> >> > +- the file must be opened with RDONLY.
> >> > +
> >> >  In addition, support is included for migration using RDMA, which
> >> >  transports the page data using ``RDMA``, where the hardware takes care 
> >> > of
> >> >  transporting the pages, and the load on the CPU is much lower.  While 
> >> > the
> >> > diff --git a/docs/devel/migration/mapped-ram.rst 
> >> > b/docs/devel/migration/mapped-ram.rst
> >> > index fa4cefd9fc..e6505511f0 100644
> >> > --- a/docs/devel/migration/mapped-ram.rst
> >> > +++ b/docs/devel/migration/mapped-ram.rst
> >> > @@ -16,7 +16,7 @@ location in the file, rather than constantly being 
> >> > added to a
> >> >  sequential stream. Having the pages at fixed offsets also allows the
> >> >  usage of O_DIRECT for save/restore of the migration stream as the
> >> >  pages are ensured to be written respecting O_DIRECT alignment
> >> > -restrictions (direct-io support not yet implemented).
> >> > +restrictions.
> >> >  
> >> >  Usage
> >> >  -
> >> > @@ -35,6 +35,10 @@ Use a ``file:`` URL for migration:
> >> >  Mapped-ram migration is best done non-live, i.e. by stopping the VM on
> >> >  the source side before migrating.
> >> >  
> >> > +For best performance enable the ``direct-io`` capability as well:
> >> > +
> >> > +``migrate_set_capability direct-io on``
> >> > +
> >> >  Use-cases
> >> >  -
> >> >  
> >> > diff --git a/migration/file.c b/migration/file.c
> >> > index b9265b14dd..3bc8bc7463 100644
> >> > --- a/migration/file.c
> >> > +++ b/migration/file.c
> >> > @@ -17,6 +17,7 @@
> >> >  #include "io/channel-file.h"
> >> >  #include "io/channel-socket.h"
> >> >  #include "io/channel-util.h"
> >> > +#include "monitor/monitor.h"
> >> >  #include "options.h"
> >> >  #include "trace.h"
> >> >  
> >> > @@ -54,10 +55,18 @@ static void file_remove_fdset(void)
> >> >  }
> >> >  }
> >> >  
> >> > +/*
> >> > + * With multifd, due to the behavior of the dup() system call, we need
> >> > + * the fdset to have two non-duplicate fds so we can enable direct IO
> >> > + * in the secondary channels without 

Re: [PULL v2 00/28] Misc HW patches for 2024-05-08

2024-05-09 Thread Richard Henderson

On 5/9/24 00:15, Philippe Mathieu-Daudé wrote:

v2: Updated Bernhard's patches

The following changes since commit 4e66a08546a2588a4667766a1edab9caccf24ce3:

   Merge tag 'for-upstream' ofhttps://gitlab.com/bonzini/qemu  into staging 
(2024-05-07 09:26:30 -0700)

are available in the Git repository at:

   https://github.com/philmd/qemu.git  tags/hw-misc-20240508

for you to fetch changes up to 8b4d80bb53af30db5de91749216d0bb73fa93cab:

   misc: Use QEMU header path relative to include/ directory (2024-05-09 
00:07:21 +0200)


Misc HW patches

- Few more g_memdup() replaced by safer g_memdup2() wrapper (Phil)
- Endianness access fixed in vfio-user config space (Mattias)
- Replace qemu_mutex_lock() -> QEMU_LOCK_GUARD in system/physmem (Phil)
- Per-AddressSpace bounce buffering (Mattias)
- Allow to compile x86 PC machines without Floppy Controller (Thomas)
- Cleanups around i386 "isa-bios" memory regions (Bernhard)
- Remove unused usb rndis_config_parameter structure (David)
- Migrate missing clock in STM32L4x5 GPIOs (Inès)
- Deprecate PPC 'ref405ep' machine and 405 CPUs (Cédric)
- Memory leak fixed in Loongarch Virt machine (Song Gao)
- hw/loongarch/ code moved around (Paolo & Bibo Mao)
- Emulate S3 suspend in loongson3_virt machine (Jiaxun)
- Implement IOCSR address space in Loongson IPI (Jiaxun)
- Use QEMU header path relative to include/ directory (Phil)


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/9.1 as 
appropriate.


r~




Re: [PATCH] gitlab: Update msys2-64bit runner tags

2024-05-09 Thread Richard Henderson

On 5/7/24 19:53, Richard Henderson wrote:

Gitlab has deprecated and removed support for windows-1809
and shared-windows.  Update to saas-windows-medium-amd64 per

https://about.gitlab.com/blog/2024/01/22/windows-2022-support-for-gitlab-saas-runners/

Signed-off-by: Richard Henderson 
---
  .gitlab-ci.d/windows.yml | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/.gitlab-ci.d/windows.yml b/.gitlab-ci.d/windows.yml
index d26dbdd0c0..a83f23a786 100644
--- a/.gitlab-ci.d/windows.yml
+++ b/.gitlab-ci.d/windows.yml
@@ -1,9 +1,7 @@
  msys2-64bit:
extends: .base_job_template
tags:
-  - shared-windows
-  - windows
-  - windows-1809
+  - saas-windows-medium-amd64
cache:
  key: "$CI_JOB_NAME"
  paths:


Applied directly to master as a build fix.


r~



Re: CXL numa error on arm64 qemu virt machine

2024-05-09 Thread Yuquan Wang
On Wed, May 08, 2024 at 01:02:52PM +0100, Jonathan Cameron wrote:
> 
> > [0.00] ACPI: SRAT: Node 0 PXM 0 [mem 0x4000-0xbfff]
> > [0.00] ACPI: SRAT: Node 1 PXM 1 [mem 0xc000-0x13fff]
> > [0.00] ACPI: Unknown target node for memory at 0x100, 
> > assuming node 0
> > [0.00] NUMA: Warning: invalid memblk node 16 [mem 
> > 0x0400-0x07ff]
> > [0.00] NUMA: Faking a node at [mem 
> > 0x0400-0x00013fff]
> > [0.00] NUMA: NODE_DATA [mem 0x13f7f89c0-0x13f7fafff]
> > 
> > Previous discussion: 
> > https://lore.kernel.org/linux-cxl/20231011150620.2...@huawei.com/
> > 
> > root@debian-bullseye-arm64:~# cxl create-region -d decoder0.0 -t ram
> > [   68.653873] cxl region0: Bypassing cpu_cache_invalidate_memregion() for 
> > testing!
> > [   68.660568] Unknown target node for memory at 0x100, assuming 
> > node 0
> 
> You need a load of kernel changes for NUMA nodes to work correctly with
> CXL memory on arm64 platforms.  I have some working code but need to tidy
> up a few corners that came up in an internal review earlier this week.
> 
> I have some travel coming up so may take a week or so to get those out.
> 
> Curiously that invalid memblk has nothing to do with the CXL fixed memory 
> window
> Could you check if that is happening for you without the CXL patches?
> 

Thanks.

I have checked it, the problem is caused by my bios firmware file. I
change the bios file and the numa topology can works.

BTW, if it is convenient, could you help to post the link of the patches of CXL 
memory NUMA nodes on arm64 platforms?

> 
> Whilst it doesn't work yet (because of missing kernel support)
> you'll need something that looks more like the generic ports test added in 
> https://gitlab.com/jic23/qemu/-/commit/6589c527920ba22fe0923b60b58d33a8e9fd371e
> 
> Most importantly
> -numa node,nodeid=2 -object acpi-generic-port,id=gp0,pci-bus-cxl.1,node=2
> + the bits setting distances etc.  Note CXL memory does not provide SLIT like
> data at the moment, so the test above won't help you identify if it is 
> correctly
> set up.  That's a gap in general in the kernel support. Whilst we'd love
> it if everyone moved to hmat derived information we may need to provide
> some fallback.
> 
> Jonathan
> 

Many thanks
Yuquan




[PATCH v2] tests/qtest: Add some test cases support on LoongArch

2024-05-09 Thread Bibo Mao
Add boot-serial-test and filter test cases support on LoongArch system.

Signed-off-by: Bibo Mao 
---
v1 ... v2:
 1. Refresh the changelog, adding filter test case support also.
 2. Adjust order of loongarch qtest in alphabetical order.
---
 tests/qtest/boot-serial-test.c | 10 ++
 tests/qtest/meson.build|  3 +++
 2 files changed, 13 insertions(+)

diff --git a/tests/qtest/boot-serial-test.c b/tests/qtest/boot-serial-test.c
index e3b7d65fe5..df389adeeb 100644
--- a/tests/qtest/boot-serial-test.c
+++ b/tests/qtest/boot-serial-test.c
@@ -26,6 +26,14 @@ static const uint8_t bios_avr[] = {
 0x80, 0x93, 0xc6, 0x00, /* sts 0x00C6, r24 ; Output 'T' */
 };
 
+static const uint8_t bios_loongarch64[] = {
+0x0c, 0xc0, 0x3f, 0x14, /* lu12i.w $t0, 0x1fe00*/
+0x8c, 0x81, 0x87, 0x03, /* ori $t0, $t0, 0x1e0 */
+0x0d, 0x50, 0x81, 0x03, /* li.w$t1, 'T'*/
+0x8d, 0x01, 0x00, 0x29, /* st.b$t1, $t0, 0 */
+0xff, 0xf3, 0xff, 0x53, /*  b  -16  # loop */
+};
+
 static const uint8_t kernel_mcf5208[] = {
 0x41, 0xf9, 0xfc, 0x06, 0x00, 0x00, /* lea 0xfc06,%a0 */
 0x10, 0x3c, 0x00, 0x54, /* move.b #'T',%d0 */
@@ -167,6 +175,8 @@ static const testdef_t tests[] = {
 { "sparc", "SS-600MP", "", "TMS390Z55" },
 { "sparc64", "sun4u", "", "UltraSPARC" },
 { "s390x", "s390-ccw-virtio", "", "device" },
+{ "loongarch64", "virt", "-cpu max", "TT", sizeof(bios_loongarch64),
+  NULL, bios_loongarch64 },
 { "m68k", "mcf5208evb", "", "TT", sizeof(kernel_mcf5208), kernel_mcf5208 },
 { "m68k", "next-cube", "", "TT", sizeof(bios_nextcube), 0, bios_nextcube },
 { "microblaze", "petalogix-s3adsp1800", "", "TT",
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 6f2f594ace..86293051dc 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -139,6 +139,9 @@ qtests_hppa = ['boot-serial-test'] + \
   qtests_filter + \
   (config_all_devices.has_key('CONFIG_VGA') ? ['display-vga-test'] : [])
 
+qtests_loongarch64 = qtests_filter + \
+  ['boot-serial-test']
+
 qtests_m68k = ['boot-serial-test'] + \
   qtests_filter
 

base-commit: 4e66a08546a2588a4667766a1edab9caccf24ce3
-- 
2.39.3




Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-05-09 Thread Zheng Chuan via
Hi, Peter,Lei,Jinpu.

On 2024/5/8 0:28, Peter Xu wrote:
> On Tue, May 07, 2024 at 01:50:43AM +, Gonglei (Arei) wrote:
>> Hello,
>>
>>> -Original Message-
>>> From: Peter Xu [mailto:pet...@redhat.com]
>>> Sent: Monday, May 6, 2024 11:18 PM
>>> To: Gonglei (Arei) 
>>> Cc: Daniel P. Berrangé ; Markus Armbruster
>>> ; Michael Galaxy ; Yu Zhang
>>> ; Zhijian Li (Fujitsu) ; Jinpu 
>>> Wang
>>> ; Elmar Gerdes ;
>>> qemu-devel@nongnu.org; Yuval Shaia ; Kevin Wolf
>>> ; Prasanna Kumar Kalever
>>> ; Cornelia Huck ;
>>> Michael Roth ; Prasanna Kumar Kalever
>>> ; integrat...@gluster.org; Paolo Bonzini
>>> ; qemu-bl...@nongnu.org; de...@lists.libvirt.org;
>>> Hanna Reitz ; Michael S. Tsirkin ;
>>> Thomas Huth ; Eric Blake ; Song
>>> Gao ; Marc-André Lureau
>>> ; Alex Bennée ;
>>> Wainer dos Santos Moschetta ; Beraldo Leal
>>> ; Pannengyuan ;
>>> Xiexiangyou 
>>> Subject: Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling
>>>
>>> On Mon, May 06, 2024 at 02:06:28AM +, Gonglei (Arei) wrote:
 Hi, Peter
>>>
>>> Hey, Lei,
>>>
>>> Happy to see you around again after years.
>>>
>> Haha, me too.
>>
 RDMA features high bandwidth, low latency (in non-blocking lossless
 network), and direct remote memory access by bypassing the CPU (As you
 know, CPU resources are expensive for cloud vendors, which is one of
 the reasons why we introduced offload cards.), which TCP does not have.
>>>
>>> It's another cost to use offload cards, v.s. preparing more cpu resources?
>>>
>> Software and hardware offload converged architecture is the way to go for 
>> all cloud vendors 
>> (Including comprehensive benefits in terms of performance, cost, security, 
>> and innovation speed), 
>> it's not just a matter of adding the resource of a DPU card.
>>
 In some scenarios where fast live migration is needed (extremely short
 interruption duration and migration duration) is very useful. To this
 end, we have also developed RDMA support for multifd.
>>>
>>> Will any of you upstream that work?  I'm curious how intrusive would it be
>>> when adding it to multifd, if it can keep only 5 exported functions like 
>>> what
>>> rdma.h does right now it'll be pretty nice.  We also want to make sure it 
>>> works
>>> with arbitrary sized loads and buffers, e.g. vfio is considering to add IO 
>>> loads to
>>> multifd channels too.
>>>
>>
>> In fact, we sent the patchset to the community in 2021. Pls see:
>> https://lore.kernel.org/all/20210203185906.GT2950@work-vm/T/
> 

Yes, I have sent the patchset of multifd support for rdma migration by taking 
over my colleague, and also
sorry for not keeping on this work at that time due to some reasons.
And also I am strongly agree with Lei that the RDMA protocol has some special 
advantages against with TCP
in some scenario, and we are indeed to use it in our product.

> I wasn't aware of that for sure in the past..
> 
> Multifd has changed quite a bit in the last 9.0 release, that may not apply
> anymore.  One thing to mention is please look at Dan's comment on possible
> use of rsocket.h:
> 
> https://lore.kernel.org/all/zjjm6rcqs5eho...@redhat.com/
> 
> And Jinpu did help provide an initial test result over the library:
> 
> https://lore.kernel.org/qemu-devel/camgffek8wiknqmouyxcathgtiem2dwocf_w7t0vmcd-i30t...@mail.gmail.com/
> 
> It looks like we have a chance to apply that in QEMU.
> 
>>
>>
>>> One thing to note that the question here is not about a pure performance
>>> comparison between rdma and nics only.  It's about help us make a decision
>>> on whether to drop rdma, iow, even if rdma performs well, the community 
>>> still
>>> has the right to drop it if nobody can actively work and maintain it.
>>> It's just that if nics can perform as good it's more a reason to drop, 
>>> unless
>>> companies can help to provide good support and work together.
>>>
>>
>> We are happy to provide the necessary review and maintenance work for RDMA
>> if the community needs it.
>>
>> CC'ing Chuan Zheng.
> 
> I'm not sure whether you and Jinpu's team would like to work together and
> provide a final solution for rdma over multifd.  It could be much simpler
> than the original 2021 proposal if the rsocket API will work out.
> 
> Thanks,
> 
That's a good news to see the socket abstraction for RDMA!
When I was developed the series above, the most pain is the RDMA migration has 
no QIOChannel abstraction and i need to take a 'fake channel'
for it which is awkward in code implementation.
So, as far as I know, we can do this by
i. the first thing is that we need to evaluate the rsocket is good enough to 
satisfy our QIOChannel fundamental abstraction
ii. if it works right, then we will continue to see if it can give us 
opportunity to hide the detail of rdma protocol
into rsocket by remove most of code in rdma.c and also some hack in 
migration main process.
iii. implement the advanced features like multi-fd and multi-uri for rdma 
migration.

Since I am not familiar wi

[PATCH v2 3/3] vfio/migration: Don't emit STOP_COPY VFIO migration QAPI event twice

2024-05-09 Thread Avihai Horon
When migrating a VFIO device that supports pre-copy, it is transitioned
to STOP_COPY twice: once in vfio_vmstate_change() and second time in
vfio_save_complete_precopy().

The second transition is harmless, as it's a STOP_COPY->STOP_COPY no-op
transition. However, with the newly added VFIO migration QAPI event, the
STOP_COPY event is undesirably emitted twice.

Prevent this by returning early in vfio_migration_set_state() if
new_state is the same as current device state.

Note that the STOP_COPY transition in vfio_save_complete_precopy() is
essential for VFIO devices that don't support pre-copy, for migrating an
already stopped guest and for snapshots.

Signed-off-by: Avihai Horon 
---
 hw/vfio/migration.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 5a359c4c78..14ef9c924e 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -143,6 +143,10 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
 (struct vfio_device_feature_mig_state *)feature->data;
 int ret;
 
+if (new_state == migration->device_state) {
+return 0;
+}
+
 feature->argsz = sizeof(buf);
 feature->flags =
 VFIO_DEVICE_FEATURE_SET | VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE;
-- 
2.26.3




[PATCH v2 2/3] vfio/migration: Emit VFIO migration QAPI event

2024-05-09 Thread Avihai Horon
Emit VFIO migration QAPI event when a VFIO device changes its migration
state. This can be used by management applications to get updates on the
current state of the VFIO device for their own purposes.

A new per VFIO device capability, "migration-events", is added so events
can be enabled only for the required devices. It is disabled by default.

Signed-off-by: Avihai Horon 
---
 include/hw/vfio/vfio-common.h |  1 +
 hw/vfio/migration.c   | 56 +--
 hw/vfio/pci.c |  2 ++
 3 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index b9da6c08ef..3ec5f2425e 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -115,6 +115,7 @@ typedef struct VFIODevice {
 bool no_mmap;
 bool ram_block_discard_allowed;
 OnOffAuto enable_migration;
+bool migration_events;
 VFIODeviceOps *ops;
 unsigned int num_irqs;
 unsigned int num_regions;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 06ae40969b..5a359c4c78 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -24,6 +24,7 @@
 #include "migration/register.h"
 #include "migration/blocker.h"
 #include "qapi/error.h"
+#include "qapi/qapi-events-vfio.h"
 #include "exec/ramlist.h"
 #include "exec/ram_addr.h"
 #include "pci.h"
@@ -80,6 +81,55 @@ static const char *mig_state_to_str(enum 
vfio_device_mig_state state)
 }
 }
 
+static VfioMigrationState
+mig_state_to_qapi_state(enum vfio_device_mig_state state)
+{
+switch (state) {
+case VFIO_DEVICE_STATE_STOP:
+return QAPI_VFIO_MIGRATION_STATE_STOP;
+case VFIO_DEVICE_STATE_RUNNING:
+return QAPI_VFIO_MIGRATION_STATE_RUNNING;
+case VFIO_DEVICE_STATE_STOP_COPY:
+return QAPI_VFIO_MIGRATION_STATE_STOP_COPY;
+case VFIO_DEVICE_STATE_RESUMING:
+return QAPI_VFIO_MIGRATION_STATE_RESUMING;
+case VFIO_DEVICE_STATE_RUNNING_P2P:
+return QAPI_VFIO_MIGRATION_STATE_RUNNING_P2P;
+case VFIO_DEVICE_STATE_PRE_COPY:
+return QAPI_VFIO_MIGRATION_STATE_PRE_COPY;
+case VFIO_DEVICE_STATE_PRE_COPY_P2P:
+return QAPI_VFIO_MIGRATION_STATE_PRE_COPY_P2P;
+default:
+g_assert_not_reached();
+}
+}
+
+static void vfio_migration_send_event(VFIODevice *vbasedev)
+{
+VFIOMigration *migration = vbasedev->migration;
+DeviceState *dev = vbasedev->dev;
+g_autofree char *qom_path = NULL;
+Object *obj;
+
+if (!vbasedev->migration_events) {
+return;
+}
+
+obj = vbasedev->ops->vfio_get_object(vbasedev);
+qom_path = object_get_canonical_path(obj);
+
+qapi_event_send_vfio_migration(
+dev->id, qom_path, mig_state_to_qapi_state(migration->device_state));
+}
+
+static void set_state(VFIODevice *vbasedev, enum vfio_device_mig_state state)
+{
+VFIOMigration *migration = vbasedev->migration;
+
+migration->device_state = state;
+vfio_migration_send_event(vbasedev);
+}
+
 static int vfio_migration_set_state(VFIODevice *vbasedev,
 enum vfio_device_mig_state new_state,
 enum vfio_device_mig_state recover_state)
@@ -125,12 +175,12 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
 goto reset_device;
 }
 
-migration->device_state = recover_state;
+set_state(vbasedev, recover_state);
 
 return ret;
 }
 
-migration->device_state = new_state;
+set_state(vbasedev, new_state);
 if (mig_state->data_fd != -1) {
 if (migration->data_fd != -1) {
 /*
@@ -156,7 +206,7 @@ reset_device:
  strerror(errno));
 }
 
-migration->device_state = VFIO_DEVICE_STATE_RUNNING;
+set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING);
 
 return ret;
 }
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 64780d1b79..8840602c50 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3362,6 +3362,8 @@ static Property vfio_pci_dev_properties[] = {
 VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT, false),
 DEFINE_PROP_ON_OFF_AUTO("enable-migration", VFIOPCIDevice,
 vbasedev.enable_migration, ON_OFF_AUTO_AUTO),
+DEFINE_PROP_BOOL("migration-events", VFIOPCIDevice,
+ vbasedev.migration_events, false),
 DEFINE_PROP_BOOL("x-no-mmap", VFIOPCIDevice, vbasedev.no_mmap, false),
 DEFINE_PROP_BOOL("x-balloon-allowed", VFIOPCIDevice,
  vbasedev.ram_block_discard_allowed, false),
-- 
2.26.3




[PATCH v2 0/3] qapi/vfio: Add VFIO migration QAPI event

2024-05-09 Thread Avihai Horon
Hello,

This series adds a new QAPI event for VFIO device migration state 
change. This event will be emitted when a VFIO device changes its
state, for example, during migration or when stopping/starting the
guest.

This event can be used by management applications to get updates on the
current state of the VFIO device for their own purposes.

A new per VFIO device capability, "migration-events", is added so events
can be enabled only for the required devices. It is disabled by default.

Feedback/comments are appreciated,

Thanks.

Changes from v1 [1]:
* Added more info to patch #1 commit mesasge. (Markus)
* Renamed VFIODeviceMigState to VfioMigrationState and
  VFIO_DEVICE_MIG_STATE_CHANGED to VFIO_MIGRATION. (Joao, Markus)
* Added qom-path and qdev id to VFIO_MIGRATION event data. (Markus)
* Handled no-op state transitions in vfio_migration_set_state().
  (Cedric)
* Added helper to set VFIO state and emit VFIO event. (Peter)

[1]
https://lore.kernel.org/qemu-devel/20240430051621.19597-1-avih...@nvidia.com/

Avihai Horon (3):
  qapi/vfio: Add VFIO migration QAPI event
  vfio/migration: Emit VFIO migration QAPI event
  vfio/migration: Don't emit STOP_COPY VFIO migration QAPI event twice

 MAINTAINERS   |  1 +
 qapi/qapi-schema.json |  1 +
 qapi/vfio.json| 67 +++
 include/hw/vfio/vfio-common.h |  1 +
 hw/vfio/migration.c   | 60 +--
 hw/vfio/pci.c |  2 ++
 qapi/meson.build  |  1 +
 7 files changed, 130 insertions(+), 3 deletions(-)
 create mode 100644 qapi/vfio.json

-- 
2.26.3




[PATCH v2 1/3] qapi/vfio: Add VFIO migration QAPI event

2024-05-09 Thread Avihai Horon
Add a new QAPI event for VFIO migration. This event will be emitted when
a VFIO device changes its migration state, for example, during migration
or when stopping/starting the guest.

This event can be used by management applications to get updates on the
current state of the VFIO device for their own purposes.

Note that this new event is introduced since VFIO devices have a unique
set of migration states which cannot be described as accurately by other
existing events such as run state or migration status.

Signed-off-by: Avihai Horon 
---
 MAINTAINERS   |  1 +
 qapi/qapi-schema.json |  1 +
 qapi/vfio.json| 67 +++
 qapi/meson.build  |  1 +
 4 files changed, 70 insertions(+)
 create mode 100644 qapi/vfio.json

diff --git a/MAINTAINERS b/MAINTAINERS
index 84391777db..b5f1de459e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2160,6 +2160,7 @@ F: hw/vfio/*
 F: include/hw/vfio/
 F: docs/igd-assign.txt
 F: docs/devel/migration/vfio.rst
+F: qapi/vfio.json
 
 vfio-ccw
 M: Eric Farman 
diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
index 5e33da7228..b1581988e4 100644
--- a/qapi/qapi-schema.json
+++ b/qapi/qapi-schema.json
@@ -78,5 +78,6 @@
 { 'include': 'pci.json' }
 { 'include': 'stats.json' }
 { 'include': 'virtio.json' }
+{ 'include': 'vfio.json' }
 { 'include': 'cryptodev.json' }
 { 'include': 'cxl.json' }
diff --git a/qapi/vfio.json b/qapi/vfio.json
new file mode 100644
index 00..a0e5013188
--- /dev/null
+++ b/qapi/vfio.json
@@ -0,0 +1,67 @@
+# -*- Mode: Python -*-
+# vim: filetype=python
+#
+
+##
+# = VFIO devices
+##
+
+##
+# @VfioMigrationState:
+#
+# An enumeration of the VFIO device migration states.
+#
+# @stop: The device is stopped.
+#
+# @running: The device is running.
+#
+# @stop-copy: The device is stopped and its internal state is available
+# for reading.
+#
+# @resuming: The device is stopped and its internal state is available
+# for writing.
+#
+# @running-p2p: The device is running in the P2P quiescent state.
+#
+# @pre-copy: The device is running, tracking its internal state and its
+# internal state is available for reading.
+#
+# @pre-copy-p2p: The device is running in the P2P quiescent state,
+# tracking its internal state and its internal state is available
+# for reading.
+#
+# Since: 9.1
+##
+{ 'enum': 'VfioMigrationState',
+  'data': [ 'stop', 'running', 'stop-copy', 'resuming', 'running-p2p',
+'pre-copy', 'pre-copy-p2p' ],
+  'prefix': 'QAPI_VFIO_MIGRATION_STATE' }
+
+##
+# @VFIO_MIGRATION:
+#
+# This event is emitted when a VFIO device migration state is changed.
+#
+# @device-id: The device's id, if it has one.
+#
+# @qom-path: The device's QOM path.
+#
+# @device-state: The new changed device migration state.
+#
+# Since: 9.1
+#
+# Example:
+#
+# <- { "timestamp": { "seconds": 1713771323, "microseconds": 212268 },
+#  "event": "VFIO_MIGRATION",
+#  "data": {
+#  "device-id": "vfio_dev1",
+#  "qom-path": "/machine/peripheral/vfio_dev1",
+#  "device-state": "stop" } }
+##
+{ 'event': 'VFIO_MIGRATION',
+  'data': {
+  'device-id': 'str',
+  'qom-path': 'str',
+  'device-state': 'VfioMigrationState'
+  } }
diff --git a/qapi/meson.build b/qapi/meson.build
index c92af6e063..e7bc54e5d0 100644
--- a/qapi/meson.build
+++ b/qapi/meson.build
@@ -52,6 +52,7 @@ qapi_all_modules = [
   'stats',
   'trace',
   'transaction',
+  'vfio',
   'virtio',
   'yank',
 ]
-- 
2.26.3




Re: [PATCH v9] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER

2024-05-09 Thread Zhao Liu
Hi Daniel & Shaoqin,

Since x86 also needs to implement PMU filter feature, though it uses
the different KVM ioctl, we can still make the QEMU API as general as
possible.

To move forward with both ARM and x86, I'd like to discuss my API
thinking with you. ;-)

On Mon, Apr 15, 2024 at 06:29:25PM +0100, Daniel P. Berrangé wrote:
> Date: Mon, 15 Apr 2024 18:29:25 +0100
> From: "Daniel P. Berrangé" 
> Subject: Re: [PATCH v9] arm/kvm: Enable support for
>  KVM_ARM_VCPU_PMU_V3_FILTER
> 
> On Mon, Apr 08, 2024 at 10:49:40PM -0400, Shaoqin Huang wrote:
> > The KVM_ARM_VCPU_PMU_V3_FILTER provides the ability to let the VMM decide
> > which PMU events are provided to the guest. Add a new option
> > `kvm-pmu-filter` as -cpu sub-option to set the PMU Event Filtering.
> > Without the filter, all PMU events are exposed from host to guest by
> > default. The usage of the new sub-option can be found from the updated
> > document (docs/system/arm/cpu-features.rst).
> > 
> > Here is an example which shows how to use the PMU Event Filtering, when
> > we launch a guest by use kvm, add such command line:
> > 
> >   # qemu-system-aarch64 \
> > -accel kvm \
> > -cpu host,kvm-pmu-filter="D:0x11-0x11"
> 
> I'm still against implementing this one-off custom parsed syntax
> for kvm-pmu-filter values. Once this syntax exists, we're locked
> into back-compatibility for multiple releases, and it will make
> a conversion to QAPI/JSON harder.

Daniel, I understand you mean the new specific string format makes
external API support more complicated, right?

What about the following options:

1. Firstly, add a feature flag option in "-cpu" to enable kvm_filter
feature for CPU:

-cpu host,kvm-pmu-filter

2. Then use "-object kvm-pmu-event" to configure PMU event properties.
Since x86's PMU filter has very complex encoding rules, we need the
following three variants (one for general case, the other two are x86
specific):

- General format:
  -object kvm-pmu-event,action=[allowed|denied],events=[event-list]

  e.g, as Shaoqin's example,
  -object kvm-pmu-event,action=allowed,events=0x11-0x11,0x23-0x23
  -object kvm-pmu-event,action=denied,events=0x23-0x3a

- x86 raw_event encoding format (for single raw format event encoding):
  -object kvm-pmu-event,action=[allowed|denied],mode=0,select="0x01",
  umask="0x3c",fixed-bitmap="0x"

- x86 masked_event encoding format (for mutiple masked event encoding):
  -object kvm-pmu-event,action=[allowed|denied],mode=masked,select="0x01",
  mask="0x3c",match="0x11",exclude=true|false

The whole API architecture looks more complex, but has the advantage of
being as general as possible and avoiding the introduction of new string
format parsing.

What do you think? Because the most important thing about this feature
is the API design, welcome your comments!

Regards,
Zhao





Re: [PATCH v9] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER

2024-05-09 Thread Daniel P . Berrangé
On Thu, May 09, 2024 at 05:48:19PM +0800, Zhao Liu wrote:
> Hi Daniel & Shaoqin,
> 
> Since x86 also needs to implement PMU filter feature, though it uses
> the different KVM ioctl, we can still make the QEMU API as general as
> possible.
> 
> To move forward with both ARM and x86, I'd like to discuss my API
> thinking with you. ;-)
> 
> On Mon, Apr 15, 2024 at 06:29:25PM +0100, Daniel P. Berrangé wrote:
> > Date: Mon, 15 Apr 2024 18:29:25 +0100
> > From: "Daniel P. Berrangé" 
> > Subject: Re: [PATCH v9] arm/kvm: Enable support for
> >  KVM_ARM_VCPU_PMU_V3_FILTER
> > 
> > On Mon, Apr 08, 2024 at 10:49:40PM -0400, Shaoqin Huang wrote:
> > > The KVM_ARM_VCPU_PMU_V3_FILTER provides the ability to let the VMM decide
> > > which PMU events are provided to the guest. Add a new option
> > > `kvm-pmu-filter` as -cpu sub-option to set the PMU Event Filtering.
> > > Without the filter, all PMU events are exposed from host to guest by
> > > default. The usage of the new sub-option can be found from the updated
> > > document (docs/system/arm/cpu-features.rst).
> > > 
> > > Here is an example which shows how to use the PMU Event Filtering, when
> > > we launch a guest by use kvm, add such command line:
> > > 
> > >   # qemu-system-aarch64 \
> > > -accel kvm \
> > > -cpu host,kvm-pmu-filter="D:0x11-0x11"
> > 
> > I'm still against implementing this one-off custom parsed syntax
> > for kvm-pmu-filter values. Once this syntax exists, we're locked
> > into back-compatibility for multiple releases, and it will make
> > a conversion to QAPI/JSON harder.
> 
> Daniel, I understand you mean the new specific string format makes
> external API support more complicated, right?
> 
> What about the following options:
> 
> 1. Firstly, add a feature flag option in "-cpu" to enable kvm_filter
> feature for CPU:
> 
> -cpu host,kvm-pmu-filter
> 
> 2. Then use "-object kvm-pmu-event" to configure PMU event properties.
> Since x86's PMU filter has very complex encoding rules, we need the
> following three variants (one for general case, the other two are x86
> specific):
> 
> - General format:
>   -object kvm-pmu-event,action=[allowed|denied],events=[event-list]
> 
>   e.g, as Shaoqin's example,
>   -object kvm-pmu-event,action=allowed,events=0x11-0x11,0x23-0x23
>   -object kvm-pmu-event,action=denied,events=0x23-0x3a
> 
> - x86 raw_event encoding format (for single raw format event encoding):
>   -object kvm-pmu-event,action=[allowed|denied],mode=0,select="0x01",
>   umask="0x3c",fixed-bitmap="0x"
> 
> - x86 masked_event encoding format (for mutiple masked event encoding):
>   -object kvm-pmu-event,action=[allowed|denied],mode=masked,select="0x01",
>   mask="0x3c",match="0x11",exclude=true|false
> 
> The whole API architecture looks more complex, but has the advantage of
> being as general as possible and avoiding the introduction of new string
> format parsing.
> 
> What do you think? Because the most important thing about this feature
> is the API design, welcome your comments!

Please describe it in terms of a QAPI definition, as that's what we're
striving for with all QEMU public interfaces. Once the QAPI design is
agreed, then the -object mapping is trivial, as -object's JSON format
supports arbitrary QAPI structures.


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




[PATCH v2] hw/virtio: Fix obtain the buffer id from the last descriptor

2024-05-09 Thread Wafer
The virtio-1.3 specification
 writes:
2.8.6 Next Flag: Descriptor Chaining
  Buffer ID is included in the last descriptor in the list.

If the feature (_F_INDIRECT_DESC) has been negotiated, install only
one descriptor in the virtqueue.
Therefor the buffer id should be obtained from the first descriptor.

In descriptor chaining scenarios, the buffer id should be obtained
from the last descriptor.

Fixes: 86044b24e8 ("virtio: basic packed virtqueue support")

Signed-off-by: Wafer 
Reviewed-by: Jason Wang 
Reviewed-by: Eugenio Pérez 

--
Changes in v2:
 - Use Jason suggestion: Move the code out of the loop.
---
 hw/virtio/virtio.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 871674f9be..e9e8447878 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1744,6 +1744,11 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t 
sz)
  &indirect_desc_cache);
 } while (rc == VIRTQUEUE_READ_DESC_MORE);
 
+if (desc_cache != &indirect_desc_cache) {
+/* Buffer ID is included in the last descriptor in the list. */
+id = desc.id;
+}
+
 /* Now copy what we have collected and mapped */
 elem = virtqueue_alloc_element(sz, out_num, in_num);
 for (i = 0; i < out_num; i++) {
-- 
2.27.0




Re: [PATCH v8 08/11] virtio-gpu: Handle resource blob commands

2024-05-09 Thread Dmitry Osipenko
On 5/5/24 09:47, Akihiko Odaki wrote:
> On 2024/05/02 4:20, Dmitry Osipenko wrote:
>> On 4/27/24 08:52, Akihiko Odaki wrote:
>>> On 2024/04/24 19:30, Dmitry Osipenko wrote:
 On 4/19/24 12:18, Akihiko Odaki wrote:
>> @@ -61,6 +61,10 @@ struct virtio_gpu_simple_resource {
>>     int dmabuf_fd;
>>     uint8_t *remapped;
>>     +    MemoryRegion *mr;
>> +    bool async_unmap_completed;
>> +    bool async_unmap_in_progress;
>> +
>
> Don't add fields to virtio_gpu_simple_resource but instead create a
> struct that embeds virtio_gpu_simple_resource in virtio-gpu-virgl.c.

 Please give a justification. I'd rather rename
 virtio_gpu_simple_resource s/_simple//. Simple resource already
 supports
 blob and the added fields are directly related to the blob. Don't see
 why another struct is needed.

>>>
>>> Because mapping is only implemented in virtio-gpu-gl while blob itself
>>> is implemented also in virtio-gpu.
>>
>> Rutubaga maps blobs and it should do unmapping blobs asynchronously as
>> well, AFAICT.
>>
> 
> Right. It makes sense to put mr in struct virtio_gpu_simple_resource in
> preparation for such a situation.
> 
> Based on this discussion, I think it is fine to put mr either in struct
> virtio_gpu_simple_resource or a distinct struct. However if you put mr
> in struct virtio_gpu_simple_resource, the logic that manages
> MemoryRegion should also be moved to virtio-gpu.c for consistency.

Rutabaga uses static MRs. It will either need a different workaround or
will have to move to dynamic MRs. I'll keep using distinct struct for now.

AFAICT, its a lesser problem for rutabaga because static MR isn't
subjected to the dynamic MR UAF problem that virgl has. On the other
hand, rutabaga is re-initing already inited static MR object on each new
mapping, that looks like a bug and it will need to move to dynamic MRs.

-- 
Best regards,
Dmitry




Re: [PATCH 1/6] virtio: Add bool to VirtQueueElement

2024-05-09 Thread Eugenio Perez Martin
On Mon, May 6, 2024 at 5:06 PM Jonah Palmer  wrote:
>
> Add the boolean 'filled' member to the VirtQueueElement structure. The
> use of this boolean will signify if the element has been written to the
> used / descriptor ring or not. This boolean is used to support the
> VIRTIO_F_IN_ORDER feature.
>
> Tested-by: Lei Yang 
> Signed-off-by: Jonah Palmer 
> ---
>  include/hw/virtio/virtio.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index 7d5ffdc145..9ed9c3763c 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -69,6 +69,7 @@ typedef struct VirtQueueElement
>  unsigned int ndescs;
>  unsigned int out_num;
>  unsigned int in_num;
> +bool filled;

in_order_filled? I cannot come with a good name for this. Maybe we can
add a comment on top of the variable so we know what it is used for?

>  hwaddr *in_addr;
>  hwaddr *out_addr;
>  struct iovec *in_sg;
> --
> 2.39.3
>




Re: [PATCH v8 07/11] virtio-gpu: Support suspension of commands processing

2024-05-09 Thread Dmitry Osipenko
On 5/5/24 09:37, Akihiko Odaki wrote:
> On 2024/05/02 4:02, Dmitry Osipenko wrote:
>> On 4/27/24 08:48, Akihiko Odaki wrote:

 The VIRTIO_GPU_FILL_CMD() macro returns void and this macro is used by
 every function processing commands. Changing process_cmd() to return
 bool will require to change all those functions. Not worthwhile to
 change it, IMO. >
 The flag reflects the exact command status. The !finished + !suspended
 means that command is fenced, i.e. these flags don't have exactly same
 meaning.
>>>
>>> It is not necessary to change the signature of process_cmd(). You can
>>> just refer to !finished. No need to have the suspended flag.
>>
>> Not sure what you're meaning. The !finished says that cmd is fenced,
>> this fenced command is added to the polling list and the fence is
>> checked periodically by the fence_poll timer, meanwhile next virgl
>> commands are executed in the same time.
>>
>> This is completely different from the suspension where whole cmd
>> processing is blocked until command is resumed.
>>
> 
> !finished means you have not sent a response with
> virtio_gpu_ctrl_response(). Currently such a situation only happens when
> a fence is requested and virtio_gpu_process_cmdq() exploits the fact,
> but we are adding a new case without a fence.
> 
> So we need to add code to check if we are fencing or not in
> virtio_gpu_process_cmdq(). This can be achieved by evaluating the
> following expression as done in virtio_gpu_virgl_process_cmd():
> cmd->cmd_hdr.flags & VIRTIO_GPU_FLAG_FENCE

This works, but then I'll add back the res->async_unmap_in_progress
because we need to know whether unmapping has been started.

-- 
Best regards,
Dmitry




[PATCH] hw/nvme: Add CLI options for PCI vendor/device IDs and IEEE-OUI ID

2024-05-09 Thread Saif Abrar
Add CLI options for user specified
- PCI vendor, device, subsystem vendor and subsystem IDs
- IEEE-OUI ID

e.g. PCI IDs to be specified as follows:
-device 
nvme,id_vendor=0xABCD,id_device=0xA0B0,id_subsys_vendor=0xEF00,id_subsys=0xEF01

IEEE-OUI ID (Identify Controller bytes 75:73) is to be
specified in LE format.
(e.g. ieee_oui=0xABCDEF => Byte[73]=0xEF, Byte[74]=0xCD, Byte[75]=0xAB).

Signed-off-by: Saif Abrar 
---
 hw/nvme/nvme.h |  5 +
 hw/nvme/ctrl.c | 44 
 2 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index bed8191bd5..6e19a479d1 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -537,6 +537,11 @@ typedef struct NvmeParams {
 uint8_t  sriov_max_vq_per_vf;
 uint8_t  sriov_max_vi_per_vf;
 bool msix_exclusive_bar;
+uint16_t id_vendor;
+uint16_t id_device;
+uint16_t id_subsys_vendor;
+uint16_t id_subsys;
+uint32_t ieee_oui;
 } NvmeParams;
 
 typedef struct NvmeCtrl {
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 127c3d2383..35aeb48e0b 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -8050,8 +8050,9 @@ out:
 
 static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset)
 {
-uint16_t vf_dev_id = n->params.use_intel_id ?
- PCI_DEVICE_ID_INTEL_NVME : PCI_DEVICE_ID_REDHAT_NVME;
+uint16_t vf_dev_id = n->params.id_device ? n->params.id_device :
+(n->params.use_intel_id ?
+ PCI_DEVICE_ID_INTEL_NVME : PCI_DEVICE_ID_REDHAT_NVME);
 NvmePriCtrlCap *cap = &n->pri_ctrl_cap;
 uint64_t bar_size = nvme_mbar_size(le16_to_cpu(cap->vqfrsm),
   le16_to_cpu(cap->vifrsm),
@@ -8098,7 +8099,13 @@ static bool nvme_init_pci(NvmeCtrl *n, PCIDevice 
*pci_dev, Error **errp)
 pci_conf[PCI_INTERRUPT_PIN] = 1;
 pci_config_set_prog_interface(pci_conf, 0x2);
 
-if (n->params.use_intel_id) {
+if (n->params.id_vendor) {
+pci_config_set_vendor_id(pci_conf, n->params.id_vendor);
+pci_config_set_device_id(pci_conf, n->params.id_device);
+pci_set_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID,
+
n->params.id_subsys_vendor);
+pci_set_word(pci_conf + PCI_SUBSYSTEM_ID, n->params.id_subsys);
+} else if (n->params.use_intel_id) {
 pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_INTEL);
 pci_config_set_device_id(pci_conf, PCI_DEVICE_ID_INTEL_NVME);
 } else {
@@ -8206,7 +8213,11 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 
 id->rab = 6;
 
-if (n->params.use_intel_id) {
+if (n->params.ieee_oui) {
+id->ieee[0] = extract32(n->params.ieee_oui, 0,  8);
+id->ieee[1] = extract32(n->params.ieee_oui, 8,  8);
+id->ieee[2] = extract32(n->params.ieee_oui, 16, 8);
+} else if (n->params.use_intel_id) {
 id->ieee[0] = 0xb3;
 id->ieee[1] = 0x02;
 id->ieee[2] = 0x00;
@@ -8419,6 +8430,24 @@ static void nvme_exit(PCIDevice *pci_dev)
 memory_region_del_subregion(&n->bar0, &n->iomem);
 }
 
+static void nvme_prop_ieee_set(Object *obj, Visitor *v, const char *name,
+void *opaque, Error **errp)
+{
+Property *prop = opaque;
+uint32_t *val = object_field_prop_ptr(obj, prop);
+if (!visit_type_uint32(v, name, val, errp)) {
+return;
+}
+}
+
+static const PropertyInfo nvme_prop_ieee = {
+.name  = "uint32",
+.description = "IEEE OUI: Identify Controller bytes 75:73\
+ in LE format. (e.g. ieee_oui=0xABCDEF => Byte[73]=0xEF, Byte[74]=0xCD,\
+ Byte[75]=0xAB)",
+.set = nvme_prop_ieee_set,
+};
+
 static Property nvme_props[] = {
 DEFINE_BLOCK_PROPERTIES(NvmeCtrl, namespace.blkconf),
 DEFINE_PROP_LINK("pmrdev", NvmeCtrl, pmr.dev, TYPE_MEMORY_BACKEND,
@@ -8451,6 +8480,13 @@ static Property nvme_props[] = {
   params.sriov_max_vq_per_vf, 0),
 DEFINE_PROP_BOOL("msix-exclusive-bar", NvmeCtrl, params.msix_exclusive_bar,
  false),
+DEFINE_PROP_UINT16("id_vendor", NvmeCtrl, params.id_vendor, 0),
+DEFINE_PROP_UINT16("id_device", NvmeCtrl, params.id_device, 0),
+DEFINE_PROP_UINT16("id_subsys_vendor", NvmeCtrl,
+params.id_subsys_vendor, 
0),
+DEFINE_PROP_UINT16("id_subsys", NvmeCtrl, params.id_subsys, 0),
+DEFINE_PROP("ieee_oui", NvmeCtrl, params.ieee_oui, nvme_prop_ieee,
+  
uint32_t),
 DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.39.3




Re: [PATCH 2/6] virtio: virtqueue_pop - VIRTIO_F_IN_ORDER support

2024-05-09 Thread Eugenio Perez Martin
On Mon, May 6, 2024 at 5:06 PM Jonah Palmer  wrote:
>
> Add VIRTIO_F_IN_ORDER feature support in virtqueue_split_pop and
> virtqueue_packed_pop.
>
> VirtQueueElements popped from the available/descritpor ring are added to
> the VirtQueue's used_elems array in-order and in the same fashion as
> they would be added the used and descriptor rings, respectively.
>
> This will allow us to keep track of the current order, what elements
> have been written, as well as an element's essential data after being
> processed.
>
> Tested-by: Lei Yang 
> Signed-off-by: Jonah Palmer 
> ---
>  hw/virtio/virtio.c | 17 -
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 893a072c9d..e6eb1bb453 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -1506,7 +1506,7 @@ static void *virtqueue_alloc_element(size_t sz, 
> unsigned out_num, unsigned in_nu
>
>  static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
>  {
> -unsigned int i, head, max;
> +unsigned int i, j, head, max;
>  VRingMemoryRegionCaches *caches;
>  MemoryRegionCache indirect_desc_cache;
>  MemoryRegionCache *desc_cache;
> @@ -1539,6 +1539,8 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t 
> sz)
>  goto done;
>  }
>
> +j = vq->last_avail_idx;
> +
>  if (!virtqueue_get_head(vq, vq->last_avail_idx++, &head)) {
>  goto done;
>  }
> @@ -1630,6 +1632,12 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t 
> sz)
>  elem->in_sg[i] = iov[out_num + i];
>  }
>
> +if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
> +vq->used_elems[j].index = elem->index;
> +vq->used_elems[j].len = elem->len;
> +vq->used_elems[j].ndescs = elem->ndescs;
> +}
> +
>  vq->inuse++;
>
>  trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
> @@ -1758,6 +1766,13 @@ static void *virtqueue_packed_pop(VirtQueue *vq, 
> size_t sz)
>
>  elem->index = id;
>  elem->ndescs = (desc_cache == &indirect_desc_cache) ? 1 : elem_entries;
> +
> +if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
> +vq->used_elems[vq->last_avail_idx].index = elem->index;
> +vq->used_elems[vq->last_avail_idx].len = elem->len;
> +vq->used_elems[vq->last_avail_idx].ndescs = elem->ndescs;
> +}
> +

I suggest using a consistent style between packed and split: Either
always use vq->last_avail_idx or j. If you use j, please rename to
something more related to the usage, as j is usually for iterations.

In my opinion I think vq->last_avail_idx is better.


>  vq->last_avail_idx += elem->ndescs;
>  vq->inuse += elem->ndescs;
>
> --
> 2.39.3
>




Re: [PATCH v2] target/loongarch/kvm: Fix VM recovery from disk failures

2024-05-09 Thread Peter Xu
On Wed, May 08, 2024 at 10:47:32AM +0800, Song Gao wrote:
> vmstate does not save kvm_state_conter,
> which can cause VM recovery from disk to fail.
> 
> Signed-off-by: Song Gao 

Acked-by: Peter Xu 

-- 
Peter Xu




[PATCH] hw/loongarch/virt.c: Fixes memory leak in ramName during loop iterations

2024-05-09 Thread R.Samarasekara
This patch fixes a memory leak in the ramName variable within the
hw/loongarch/virt.c file. The leak occurs due to repeated calls to
g_strdup_printf within a loop, causing memory allocated for ramName on
previous iterations to be unfreed.

Signed-off-by: R.Samarasekara 
---
 hw/loongarch/virt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index c0999878df..1fe02f8501 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -954,6 +954,7 @@ static void loongarch_init(MachineState *machine)
 fdt_add_memory_node(machine, phyAddr, numa_info[i].node_mem, i);
 offset += numa_info[i].node_mem;
 phyAddr += numa_info[i].node_mem;
+g_free(ramName);
 }
 
 /* initialize device memory address space */
-- 
2.40.1




Re: [PATCH 1/3] migration/colo: Minor fix for colo error message

2024-05-09 Thread Peter Xu
On Thu, May 09, 2024 at 11:31:04AM +0800, Li Zhijian wrote:
> - Explicitly show the missing module name: replication
> - Fix capability name to x-colo
> 
> Signed-off-by: Li Zhijian 

Reviewed-by: Peter Xu 

-- 
Peter Xu




Re: [PATCH 2/3] migration/colo: make colo_incoming_co() return void

2024-05-09 Thread Peter Xu
On Thu, May 09, 2024 at 11:31:05AM +0800, Li Zhijian via wrote:
> Currently, it always returns 0, no need to check the return value at all.
> In addition, enter colo coroutine only if migration_incoming_colo_enabled()
> is true.
> Once the destination side enters the COLO* state, the COLO process will
> take over the remaining processes until COLO exits.
> 
> Signed-off-by: Li Zhijian 

Reviewed-by: Peter Xu 

-- 
Peter Xu




Re: [PATCH 3/3] migration/colo: Tidy up bql_unlock() around bdrv_activate_all()

2024-05-09 Thread Peter Xu
On Thu, May 09, 2024 at 11:31:06AM +0800, Li Zhijian via wrote:
> Make the code more tight.
> 
> Cc: Michael Tokarev 
> Signed-off-by: Li Zhijian 

Reviewed-by: Peter Xu 

> ---
> This change/comment suggested by "Michael Tokarev " came
> a bit late at that time, let's update it together in these minor set
> this time.

You can use a tag next time:

Suggested-by: Michael Tokarev 

> ---
>  migration/colo.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index 991806c06a..1b6d9da1c8 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -838,12 +838,11 @@ static void *colo_process_incoming_thread(void *opaque)
>  /* Make sure all file formats throw away their mutable metadata */
>  bql_lock();
>  bdrv_activate_all(&local_err);
> +bql_unlock();
>  if (local_err) {
> -bql_unlock();
>  error_report_err(local_err);
>  return NULL;
>  }
> -bql_unlock();
>  
>  failover_init_state();
>  
> -- 
> 2.31.1
> 
> 

-- 
Peter Xu




Re: [PATCH] target/i386: remove PCOMMIT from TCG, deprecate property

2024-05-09 Thread Richard Henderson

On 5/8/24 17:44, Paolo Bonzini wrote:

The PCOMMIT instruction was never included in any physical processor.
TCG implements it as a no-op instruction, but its utility is debatable
to say the least.  Drop it from the decoder since it is only available
with "-cpu max", which does not guarantee migration compatibility
across versions, and deprecate the property just in case someone is
using it as "pcommit=off".

Signed-off-by: Paolo Bonzini
---
  docs/about/deprecated.rst   |  8 
  target/i386/cpu.h   |  2 --
  target/i386/cpu.c   |  2 +-
  target/i386/tcg/translate.c | 12 +---
  4 files changed, 10 insertions(+), 14 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH] scripts/simpletrace: Mark output with unstable timestamp as WARN

2024-05-09 Thread Stefan Hajnoczi
On Thu, May 09, 2024 at 11:59:10AM +0800, Zhao Liu wrote:
> On Wed, May 08, 2024 at 02:05:04PM -0400, Stefan Hajnoczi wrote:
> > Date: Wed, 8 May 2024 14:05:04 -0400
> > From: Stefan Hajnoczi 
> > Subject: Re: [PATCH] scripts/simpletrace: Mark output with unstable
> >  timestamp as WARN
> > 
> > On Wed, 8 May 2024 at 00:19, Zhao Liu  wrote:
> > >
> > > In some trace log, there're unstable timestamp breaking temporal
> > > ordering of trace records. For example:
> > >
> > > kvm_run_exit -0.015 pid=3289596 cpu_index=0x0 reason=0x6
> > > kvm_vm_ioctl -0.020 pid=3289596 type=0xc008ae67 arg=0x7ffeefb5aa60
> > > kvm_vm_ioctl -0.021 pid=3289596 type=0xc008ae67 arg=0x7ffeefb5aa60
> > >
> > > Negative delta intervals tend to get drowned in the massive trace logs,
> > > and an unstable timestamp can corrupt the calculation of intervals
> > > between two events adjacent to it.
> > >
> > > Therefore, mark the outputs with unstable timestamps as WARN like:
> > 
> > Why are the timestamps non-monotonic?
> > 
> > In a situation like that maybe not only the negative timestamps are
> > useless but even some positive timestamps are incorrect. I think it's
> > worth understanding the nature of the instability before merging a
> > fix.
> 
> I grabbed more traces (by -trace "*" to cover as many events as possible
> ) and have a couple observations:
> 
> * It's not an issue with kvm's ioctl, and that qemu_mutex_lock/
>   object_dynamic_cast_assert accounted for the vast majority of all
>   exception timestamps.
> * The total exception timestamp occurrence probability was roughly 0.013%
>   (608 / 4,616,938) in a 398M trace file.
> * And the intervals between the "wrong" timestamp and its pre event are
>   almost all within 50ns, even more concentrated within 20ns (there are
>   even quite a few 1~10ns).
> 
> Just a guess:
> 
> Would it be possible that a trace event which is too short of an interval,
> and happen to meet a delay in signaling to send to writeout_thread?
> 
> It seems that this short interval like a lack of real-time guarantees in
> the underlying mechanism...
> 
> If it's QEMU's own issue, I feel like the intervals should be randomized,
> not just within 50ns...
> 
> May I ask what you think? Any suggestions for researching this situation
> ;-)

QEMU uses clock_gettime(CLOCK_MONOTONIC) on Linux hosts. The man page
says:

  All CLOCK_MONOTONIC variants guarantee that the time returned by
  consecutive  calls  will  not go backwards, but successive calls
  may—depending  on  the  architecture—return  identical  (not-in‐
  creased) time values.

trace_record_start() calls clock_gettime(CLOCK_MONOTONIC) so trace events
should have monotonically increasing timestamps.

I don't see a scenario where trace record A's timestamp is greater than
trace record B's timestamp unless the clock is non-monotonic.

Which host CPU architecture and operating system are you running?

Please attach to the QEMU process with gdb and print out the value of
the use_rt_clock variable or add a printf in init_get_clock(). The value
should be 1.

Stefan


signature.asc
Description: PGP signature


Re: [PATCH v3] target/i386: Fix CPUID encoding of Fn8000001E_ECX

2024-05-09 Thread Michael Tokarev

03.05.2024 20:46, Babu Moger wrote:

Observed the following failure while booting the SEV-SNP guest and the
guest fails to boot with the smp parameters:
"-smp 192,sockets=1,dies=12,cores=8,threads=2".

qemu-system-x86_64: sev_snp_launch_update: SNP_LAUNCH_UPDATE ret=-5 fw_error=22 
'Invalid parameter'
qemu-system-x86_64: SEV-SNP: CPUID validation failed for function 0x801e, 
index: 0x0.
provided: eax:0x, ebx: 0x0100, ecx: 0x0b00, edx: 0x
expected: eax:0x, ebx: 0x0100, ecx: 0x0300, edx: 0x
qemu-system-x86_64: SEV-SNP: failed update CPUID page

...

Cc: qemu-sta...@nongnu.org
Fixes: 31ada106d891 ("Simplify CPUID_8000_001E for AMD")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Reviewed-by: Zhao Liu 
Signed-off-by: Babu Moger 
---
v3:
   Rebased to the latest tree.
   Updated the pc_compat_9_0 for the new flag.



diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 08c7de416f..46235466d7 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -81,6 +81,7 @@
  GlobalProperty pc_compat_9_0[] = {
  { TYPE_X86_CPU, "guest-phys-bits", "0" },
  { "sev-guest", "legacy-vm-type", "true" },
+{ TYPE_X86_CPU, "legacy-multi-node", "on" },
  };


Should this legacy-multi-node property be added to previous
machine types when applying to stable?  How about stable-8.2
and stable-7.2?

Thanks,

/mjt




Re: [PATCH 3/6] virtio: virtqueue_ordered_fill - VIRTIO_F_IN_ORDER support

2024-05-09 Thread Eugenio Perez Martin
On Mon, May 6, 2024 at 5:05 PM Jonah Palmer  wrote:
>
> Add VIRTIO_F_IN_ORDER feature support for virtqueue_fill operations.
>
> The goal of the virtqueue_fill operation when the VIRTIO_F_IN_ORDER
> feature has been negotiated is to search for this now-used element,
> set its length, and mark the element as filled in the VirtQueue's
> used_elems array.
>
> By marking the element as filled, it will indicate that this element is
> ready to be flushed, so long as the element is in-order.
>
> Tested-by: Lei Yang 
> Signed-off-by: Jonah Palmer 
> ---
>  hw/virtio/virtio.c | 26 +-
>  1 file changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index e6eb1bb453..064046b5e2 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -873,6 +873,28 @@ static void virtqueue_packed_fill(VirtQueue *vq, const 
> VirtQueueElement *elem,
>  vq->used_elems[idx].ndescs = elem->ndescs;
>  }
>
> +static void virtqueue_ordered_fill(VirtQueue *vq, const VirtQueueElement 
> *elem,
> +   unsigned int len)
> +{
> +unsigned int i = vq->used_idx;
> +
> +/* Search for element in vq->used_elems */
> +while (i != vq->last_avail_idx) {
> +/* Found element, set length and mark as filled */
> +if (vq->used_elems[i].index == elem->index) {
> +vq->used_elems[i].len = len;
> +vq->used_elems[i].filled = true;
> +break;
> +}
> +
> +i += vq->used_elems[i].ndescs;
> +
> +if (i >= vq->vring.num) {
> +i -= vq->vring.num;
> +}
> +}

This has a subtle problem: ndescs and elems->id are controlled by the
guest, so it could make QEMU to loop forever looking for the right
descriptor. For each iteration, the code must control that the
variable "i" will be different for the next iteration, and that there
will be no more than vq->last_avail_idx - vq->used_idx iterations.

Apart of that, I think it makes more sense to split the logical
sections of the function this way:
/* declarations */
i = vq->used_idx

/* Search for element in vq->used_elems */
while (vq->used_elems[i].index != elem->index &&
vq->used_elems[i].index i != vq->last_avail_idx && ...) {
...
}

/* Set length and mark as filled */
vq->used_elems[i].len = len;
vq->used_elems[i].filled = true;
---

But I'm ok either way.

> +}
> +
>  static void virtqueue_packed_fill_desc(VirtQueue *vq,
> const VirtQueueElement *elem,
> unsigned int idx,
> @@ -923,7 +945,9 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement 
> *elem,
>  return;
>  }
>
> -if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
> +if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
> +virtqueue_ordered_fill(vq, elem, len);
> +} else if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
>  virtqueue_packed_fill(vq, elem, len, idx);
>  } else {
>  virtqueue_split_fill(vq, elem, len, idx);
> --
> 2.39.3
>




Re: [PATCH v3] target/i386: Fix CPUID encoding of Fn8000001E_ECX

2024-05-09 Thread Daniel P . Berrangé
On Thu, May 09, 2024 at 04:54:16PM +0300, Michael Tokarev wrote:
> 03.05.2024 20:46, Babu Moger wrote:
> > Observed the following failure while booting the SEV-SNP guest and the
> > guest fails to boot with the smp parameters:
> > "-smp 192,sockets=1,dies=12,cores=8,threads=2".
> > 
> > qemu-system-x86_64: sev_snp_launch_update: SNP_LAUNCH_UPDATE ret=-5 
> > fw_error=22 'Invalid parameter'
> > qemu-system-x86_64: SEV-SNP: CPUID validation failed for function 
> > 0x801e, index: 0x0.
> > provided: eax:0x, ebx: 0x0100, ecx: 0x0b00, edx: 0x
> > expected: eax:0x, ebx: 0x0100, ecx: 0x0300, edx: 0x
> > qemu-system-x86_64: SEV-SNP: failed update CPUID page
> ...
> > Cc: qemu-sta...@nongnu.org
> > Fixes: 31ada106d891 ("Simplify CPUID_8000_001E for AMD")
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> > Reviewed-by: Zhao Liu 
> > Signed-off-by: Babu Moger 
> > ---
> > v3:
> >Rebased to the latest tree.
> >Updated the pc_compat_9_0 for the new flag.
> 
> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > index 08c7de416f..46235466d7 100644
> > --- a/hw/i386/pc.c
> > +++ b/hw/i386/pc.c
> > @@ -81,6 +81,7 @@
> >   GlobalProperty pc_compat_9_0[] = {
> >   { TYPE_X86_CPU, "guest-phys-bits", "0" },
> >   { "sev-guest", "legacy-vm-type", "true" },
> > +{ TYPE_X86_CPU, "legacy-multi-node", "on" },
> >   };
> 
> Should this legacy-multi-node property be added to previous
> machine types when applying to stable?  How about stable-8.2
> and stable-7.2?

machine types are considered to express a fixed guest ABI
once part of a QEMU release. Given that we should not be
changing existing machine types in stable branches.

In theory we could create new "bug fix" machine types in stable
branches. To support live migration, we would then need to also
add those same stable branch "bug fix" machine type versions in
all future QEMU versions. This is generally not worth the hassle
of exploding the number of machine types.

If you backport the patch, minus the machine type, then users
can still get the fix but they'll need to manually set the
property to enable it.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-05-09 Thread Peter Xu
On Thu, May 09, 2024 at 04:58:34PM +0800, Zheng Chuan via wrote:
> That's a good news to see the socket abstraction for RDMA!
> When I was developed the series above, the most pain is the RDMA migration 
> has no QIOChannel abstraction and i need to take a 'fake channel'
> for it which is awkward in code implementation.
> So, as far as I know, we can do this by
> i. the first thing is that we need to evaluate the rsocket is good enough to 
> satisfy our QIOChannel fundamental abstraction
> ii. if it works right, then we will continue to see if it can give us 
> opportunity to hide the detail of rdma protocol
> into rsocket by remove most of code in rdma.c and also some hack in 
> migration main process.
> iii. implement the advanced features like multi-fd and multi-uri for rdma 
> migration.
> 
> Since I am not familiar with rsocket, I need some times to look at it and do 
> some quick verify with rdma migration based on rsocket.
> But, yes, I am willing to involved in this refactor work and to see if we can 
> make this migration feature more better:)

Based on what we have now, it looks like we'd better halt the deprecation
process a bit, so I think we shouldn't need to rush it at least in 9.1
then, and we'll need to see how it goes on the refactoring.

It'll be perfect if rsocket works, otherwise supporting multifd with little
overhead / exported APIs would also be a good thing in general with
whatever approach.  And obviously all based on the facts that we can get
resources from companies to support this feature first.

Note that so far nobody yet compared with rdma v.s. nic perf, so I hope if
any of us can provide some test results please do so.  Many people are
saying RDMA is better, but I yet didn't see any numbers comparing it with
modern TCP networks.  I don't want to have old impressions floating around
even if things might have changed..  When we have consolidated results, we
should share them out and also reflect that in QEMU's migration docs when a
rdma document page is ready.

Chuan, please check the whole thread discussion, it may help to understand
what we are looking for on rdma migrations [1].  Meanwhile please feel free
to sync with Jinpu's team and see how to move forward with such a project.

[1] https://lore.kernel.org/qemu-devel/87frwatp7n@suse.de/

Thanks,

-- 
Peter Xu




Re: [PATCH V1 09/26] migration: vmstate_register_named

2024-05-09 Thread Fabiano Rosas
Steve Sistare  writes:

> Define vmstate_register_named which takes the instance name as its first
> parameter, instead of generating the name from VMStateIf of the Object.
> This will be needed to register objects that are not Objects.  Pass the
> new name parameter to vmstate_register_with_alias_id.
>
> Signed-off-by: Steve Sistare 

Reviewed-by: Fabiano Rosas 



Re: [PATCH V1 09/26] migration: vmstate_register_named

2024-05-09 Thread Fabiano Rosas
Fabiano Rosas  writes:

> Steve Sistare  writes:
>
>> Define vmstate_register_named which takes the instance name as its first
>> parameter, instead of generating the name from VMStateIf of the Object.
>> This will be needed to register objects that are not Objects.  Pass the
>> new name parameter to vmstate_register_with_alias_id.
>>
>> Signed-off-by: Steve Sistare 
>
> Reviewed-by: Fabiano Rosas 

Actually, can't we define a wrapper type just for this purpose? For
example, looking at dbus-vmstate.c:

static void dbus_vmstate_class_init(ObjectClass *oc, void *data)
{
...
VMStateIfClass *vc = VMSTATE_IF_CLASS(oc);

vc->get_id = dbus_vmstate_get_id;
...
}

static const TypeInfo dbus_vmstate_info = {
.name = TYPE_DBUS_VMSTATE,
.parent = TYPE_OBJECT,
.instance_size = sizeof(DBusVMState),
.instance_finalize = dbus_vmstate_finalize,
.class_init = dbus_vmstate_class_init,
.interfaces = (InterfaceInfo[]) {
{ TYPE_USER_CREATABLE },   // without this one
{ TYPE_VMSTATE_IF },
{ }
}
};

static void register_types(void)
{
type_register_static(&dbus_vmstate_info);
}
type_init(register_types);



Re: [PATCH 01/14] include/hw: add helpers for defining versioned machine types

2024-05-09 Thread Daniel P . Berrangé
On Thu, May 02, 2024 at 12:34:49PM +0200, Thomas Huth wrote:
> On 01/05/2024 20.27, Daniel P. Berrangé wrote:
> > The various targets which define versioned machine types have
> > a bunch of obfuscated macro code for defining unique function
> > and variable names using string concatenation.
> > 
> > This addes a couple of helpers to improve the clarity of such
> > code macro.
> > 
> > Signed-off-by: Daniel P. Berrangé 
> > ---
> >   include/hw/boards.h | 166 
> >   1 file changed, 166 insertions(+)
> > 
> > diff --git a/include/hw/boards.h b/include/hw/boards.h
> > index 2fa800f11a..47ca450fca 100644
> > --- a/include/hw/boards.h
> > +++ b/include/hw/boards.h
> > @@ -414,6 +414,172 @@ struct MachineState {
> >   struct NumaState *numa_state;
> >   };
> > +/*
> > + * The macros which follow are intended to facilitate the
> > + * definition of versioned machine types, using a somewhat
> > + * similar pattern across targets:
> 
> I'd suggest to add a sentence at the end saying something like this: "For
> example, to create the macro for setting up a versioned "virt" machine could
> look like this:" (otherwise it's not immediately clear whether the example
> is about only the "macros which follow" or whether it's about how to set up
> a machine type)

Yes, will do

> 
> > + *  #define DEFINE_VIRT_MACHINE_IMPL(latest, ...) \
> > + *  static void MACHINE_VER_SYM(class_init, virt, __VA_ARGS__)( \
> > + *  ObjectClass *oc, \
> > + *  void *data) \

> > + *  #define DEFINE_VIRT_MACHINE_TAGGED(major, minor, micro, tag) \
> > + *  DEFINE_VIRT_MACHINE_IMPL(false, major, minor, micro, _, tag)
> > + */
> 
> I'd suggest to add a separate comment for the macro below, explaining that
> it is supposed to be used with __VA_ARGS__ to pick a certain other macro
> depending on the amount of entries in __VA_ARGS__.

Yep, Eric had the same suggestion

> 
> > +#define _MACHINE_VER_PICK(x1, x2, x3, x4, x5, x6, ...) x6
> > +
> > +/*
> > + * Construct a human targetted machine version string.
> 
> s/targetted/targeted/ according to my spell checker ?
> 
> Apart from the nits:
> Reviewed-by: Thomas Huth 
> 
>  Thomas
> 

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 01/14] include/hw: add helpers for defining versioned machine types

2024-05-09 Thread Daniel P . Berrangé
On Thu, May 02, 2024 at 09:57:21AM -0500, Eric Blake wrote:
> On Wed, May 01, 2024 at 07:27:46PM +0100, Daniel P. Berrangé wrote:
> > The various targets which define versioned machine types have
> > a bunch of obfuscated macro code for defining unique function
> > and variable names using string concatenation.
> > 
> > This addes a couple of helpers to improve the clarity of such
> > code macro.
> > 
> > Signed-off-by: Daniel P. Berrangé 
> > ---
> >  include/hw/boards.h | 166 
> >  1 file changed, 166 insertions(+)
> > 
> > diff --git a/include/hw/boards.h b/include/hw/boards.h
> > index 2fa800f11a..47ca450fca 100644
> > --- a/include/hw/boards.h
> > +++ b/include/hw/boards.h
> > @@ -414,6 +414,172 @@ struct MachineState {
> >  struct NumaState *numa_state;
> >  };
> >  
> > +/*
> > + * The macros which follow are intended to facilitate the
> > + * definition of versioned machine types, using a somewhat
> > + * similar pattern across targets:
> > + *
> > + *  #define DEFINE_VIRT_MACHINE_IMPL(latest, ...) \
> > + *  static void MACHINE_VER_SYM(class_init, virt, __VA_ARGS__)( \
> > + *  ObjectClass *oc, \
> > + *  void *data) \
> > + *  { \
> > + *  MachineClass *mc = MACHINE_CLASS(oc); \
> > + *  MACHINE_VER_SYM(options, virt, __VA_ARGS__)(mc); \
> 
> Nice to include example usage of the macros.  __VA_ARGS__ is getting
> expanded quite a few times here, but I don't see that as being too
> much of a problem.
> 
> > + *  // Normal 2 digit eg 'virt-9.0'
> > + *  #define DEFINE_VIRT_MACHINE(major, minor) \
> > + *  DEFINE_VIRT_MACHINE_IMPL(false, major, minor)
> > + *
> > + *  // Bugfix 3 digit  eg 'virt-9.0.1'
> 
> Inconsistent on whether you are using one or two spaces before 'eg'.

Will fix

> 
> > +
> > +#define _MACHINE_VER_PICK(x1, x2, x3, x4, x5, x6, ...) x6
> 
> This helper macro is powerful; it may be worth a short comment, maybe
> along the lines of:
> 
> /* Helper macro used to pick the right macro name based on the number
>  * of extra arguments passed to the containing macro; usage:
>  *
>  *  _MACHINE_VER_PICK(__VA_ARGS__, \
>  *MACRO_FOR_5_ARGS, \
>  *MACRO_FOR_4_ARGS, \
>  *MACRO_FOR_3_ARGS, \
>  *MACRO_FOR_2_ARGS)(optional prefix args, __VA_ARGS__)
>  */

Yes, will add something like this

> 
> But once understood, I see it comes in handy in several places below.
> 
> Reviewed-by: Eric Blake 
> 
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.
> Virtualization:  qemu.org | libguestfs.org
> 

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 08/14] include/hw: add macros for deprecation & removal of versioned machines

2024-05-09 Thread Daniel P . Berrangé
On Thu, May 02, 2024 at 12:59:05PM +0200, Thomas Huth wrote:
> On 01/05/2024 20.27, Daniel P. Berrangé wrote:
> > Versioned machines live for a long time to provide back compat for
> > incoming migration and restore of saved images. To guide users away from
> > usage of old machines, however, we want to deprecate any older than 3
> > years (equiv of 9 releases), and delete any older than 6 years (equiva
> > of 18 releases).
> > 
> > To get a standardized deprecation message and avoid having to remember
> > to manually add it after three years, this introduces two macros to be
> > used by targets when defining versioned machines.
> > 
> > * MACHINE_VER_DEPRECATION(major, minor)
> > 
> >Automates the task of setting the 'deprecation_reason' field on the
> >machine, if-and-only-if the major/minor version is older than 3 years.
> > 
> > * MACHINE_VER_DEPRECATION(major, minor)
> 
> That should be MACHINE_VER_DELETION instead.

Opps, yes.

> 
> >Simulates the deletion of by skipping registration of the QOM type
> >for a versioned machine, if-and-only-if the major/minor version is
> >older than 6 years.
> > 
> > By using these two macros there is no longer any manual work required
> > per-release to deprecate old machines. By preventing the use of machines
> > that have reached their deletion date, it is also no neccessary to
> 
> s/neccessary/necessary/
> 
> > manually delete machines per-release. Deletion can be batched up once a
> > year or whenever makes most sense.
> > 
> > Signed-off-by: Daniel P. Berrangé 
> > ---
> >   include/hw/boards.h | 84 +
> >   1 file changed, 84 insertions(+)
> 
> With the typos fixed:
> Reviewed-by: Thomas Huth 
> 

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




[PATCH] target/i386: rdpkru/wrpkru are no-prefix instructions

2024-05-09 Thread Paolo Bonzini
Reject 0x66/0xf3/0xf2 in front of them.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Paolo Bonzini 
---
 target/i386/tcg/translate.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 5366dc32dd3..3da4fdf64cc 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -3907,7 +3907,8 @@ static void disas_insn_old(DisasContext *s, CPUState 
*cpu, int b)
 gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 1);
 break;
 case 0xee: /* rdpkru */
-if (prefixes & PREFIX_LOCK) {
+if (s->prefix & (PREFIX_LOCK | PREFIX_DATA
+ | PREFIX_REPZ | PREFIX_REPNZ)) {
 goto illegal_op;
 }
 tcg_gen_trunc_tl_i32(s->tmp2_i32, cpu_regs[R_ECX]);
@@ -3915,7 +3916,8 @@ static void disas_insn_old(DisasContext *s, CPUState 
*cpu, int b)
 tcg_gen_extr_i64_tl(cpu_regs[R_EAX], cpu_regs[R_EDX], s->tmp1_i64);
 break;
 case 0xef: /* wrpkru */
-if (prefixes & PREFIX_LOCK) {
+if (s->prefix & (PREFIX_LOCK | PREFIX_DATA
+ | PREFIX_REPZ | PREFIX_REPNZ)) {
 goto illegal_op;
 }
 tcg_gen_concat_tl_i64(s->tmp1_i64, cpu_regs[R_EAX],
-- 
2.45.0




[PATCH] tests/tcg: cover lzcnt/tzcnt/popcnt

2024-05-09 Thread Paolo Bonzini
Signed-off-by: Paolo Bonzini 
---
 tests/tcg/i386/test-i386.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/tests/tcg/i386/test-i386.c b/tests/tcg/i386/test-i386.c
index 864c4e620d5..ce3bf74b5a8 100644
--- a/tests/tcg/i386/test-i386.c
+++ b/tests/tcg/i386/test-i386.c
@@ -715,6 +715,30 @@ void test_mul(void)
 printf("%-10s A=" FMTLX " R=" FMTLX " %ld\n", #op, val, res, resz);\
 }
 
+void test_xcnt(void)
+{
+TEST_BSX(tzcntw, "w", 0);
+TEST_BSX(tzcntw, "w", 0x12340128);
+TEST_BSX(lzcntw, "w", 0);
+TEST_BSX(lzcntw, "w", 0x12340128);
+TEST_BSX(popcntw, "w", 0);
+TEST_BSX(popcntw, "w", 0x12340128);
+TEST_BSX(tzcntl, "k", 0);
+TEST_BSX(tzcntl, "k", 0x00340128);
+TEST_BSX(lzcntl, "k", 0);
+TEST_BSX(lzcntl, "k", 0x00340128);
+TEST_BSX(popcntl, "k", 0);
+TEST_BSX(popcntl, "k", 0x00340128);
+#if defined(__x86_64__)
+TEST_BSX(tzcntq, "", 0);
+TEST_BSX(tzcntq, "", 0x003401281234);
+TEST_BSX(lzcntq, "", 0);
+TEST_BSX(lzcntq, "", 0x003401281234);
+TEST_BSX(popcntq, "", 0);
+TEST_BSX(popcntq, "", 0x003401281234);
+#endif
+}
+
 void test_bsx(void)
 {
 TEST_BSX(bsrw, "w", 0);
@@ -2162,6 +2186,7 @@ int main(int argc, char **argv)
 func();
 }
 test_bsx();
+test_xcnt();
 test_mul();
 test_jcc();
 test_loop();
-- 
2.45.0




[PATCH] target/i386: fix operand size for DATA16 REX.W POPCNT

2024-05-09 Thread Paolo Bonzini
According to the manual, 32-bit vs 64-bit is governed by REX.W
and REX ignores the 0x66 prefix.  This can be confirmed with this
program:

#include 
int main()
{
   int x = 0x1234;
   int y;
   asm("popcntl %1, %0" : "=r" (y) : "r" (x)); printf("%x\n", y);
   asm("mov $-1, %0; .byte 0x66; popcntl %1, %0" : "+r" (y) : "r" (x)); 
printf("%x\n", y);
   asm("mov $-1, %0; .byte 0x66; popcntq %q1, %q0" : "+r" (y) : "r" (x)); 
printf("%x\n", y);
}

which prints 5//5 on real hardware and 5//
on QEMU.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Paolo Bonzini 
---
 target/i386/tcg/translate.c | 17 +
 1 file changed, 1 insertion(+), 16 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 7d9f6b5c55b..5366dc32dd3 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -411,16 +411,6 @@ static inline MemOp mo_stacksize(DisasContext *s)
 return CODE64(s) ? MO_64 : SS32(s) ? MO_32 : MO_16;
 }
 
-/* Select only size 64 else 32.  Used for SSE operand sizes.  */
-static inline MemOp mo_64_32(MemOp ot)
-{
-#ifdef TARGET_X86_64
-return ot == MO_64 ? MO_64 : MO_32;
-#else
-return MO_32;
-#endif
-}
-
 /* Select size 8 if lsb of B is clear, else OT.  Used for decoding
byte vs word opcodes.  */
 static inline MemOp mo_b_d(int b, MemOp ot)
@@ -4545,12 +4535,7 @@ static void disas_insn_old(DisasContext *s, CPUState 
*cpu, int b)
 modrm = x86_ldub_code(env, s);
 reg = ((modrm >> 3) & 7) | REX_R(s);
 
-if (s->prefix & PREFIX_DATA) {
-ot = MO_16;
-} else {
-ot = mo_64_32(dflag);
-}
-
+ot = dflag;
 gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 0);
 gen_extu(ot, s->T0);
 tcg_gen_mov_tl(cpu_cc_src, s->T0);
-- 
2.45.0




[PATCH] target/i386: move prefetch and multi-byte UD/NOP to new decoder

2024-05-09 Thread Paolo Bonzini
These are trivial to add, and moving them to the new decoder fixes some
corner cases: raising #UD instead of an instruction fetch page fault for
the undefined opcodes, and incorrectly rejecting 0F 18 prefetches with
register operands (which are treated as reserved NOPs).

Signed-off-by: Paolo Bonzini 
---
 target/i386/tcg/decode-new.h |  1 +
 target/i386/tcg/translate.c  | 30 --
 target/i386/tcg/decode-new.c.inc | 24 +---
 target/i386/tcg/emit.c.inc   |  5 +
 4 files changed, 27 insertions(+), 33 deletions(-)

diff --git a/target/i386/tcg/decode-new.h b/target/i386/tcg/decode-new.h
index 2ea06b44787..51ef0e621b9 100644
--- a/target/i386/tcg/decode-new.h
+++ b/target/i386/tcg/decode-new.h
@@ -50,6 +50,7 @@ typedef enum X86OpType {
 X86_TYPE_EM, /* modrm byte selects an ALU memory operand */
 X86_TYPE_WM, /* modrm byte selects an XMM/YMM memory operand */
 X86_TYPE_I_unsigned, /* Immediate, zero-extended */
+X86_TYPE_nop, /* modrm operand decoded but not loaded into s->T{0,1} */
 X86_TYPE_2op, /* 2-operand RMW instruction */
 X86_TYPE_LoBits, /* encoded in bits 0-2 of the operand + REX.B */
 X86_TYPE_0, /* Hard-coded GPRs (RAX..RDI) */
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 3da4fdf64cc..de87775016b 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -4019,25 +4019,6 @@ static void disas_insn_old(DisasContext *s, CPUState 
*cpu, int b)
 set_cc_op(s, CC_OP_EFLAGS);
 }
 break;
-case 0x118:
-modrm = x86_ldub_code(env, s);
-mod = (modrm >> 6) & 3;
-op = (modrm >> 3) & 7;
-switch(op) {
-case 0: /* prefetchnta */
-case 1: /* prefetchnt0 */
-case 2: /* prefetchnt0 */
-case 3: /* prefetchnt0 */
-if (mod == 3)
-goto illegal_op;
-gen_nop_modrm(env, s, modrm);
-/* nothing more to do */
-break;
-default: /* nop (multi byte) */
-gen_nop_modrm(env, s, modrm);
-break;
-}
-break;
 case 0x11a:
 modrm = x86_ldub_code(env, s);
 if (s->flags & HF_MPX_EN_MASK) {
@@ -4229,10 +4210,6 @@ static void disas_insn_old(DisasContext *s, CPUState 
*cpu, int b)
 }
 gen_nop_modrm(env, s, modrm);
 break;
-case 0x119: case 0x11c ... 0x11f: /* nop (multi byte) */
-modrm = x86_ldub_code(env, s);
-gen_nop_modrm(env, s, modrm);
-break;
 
 case 0x120: /* mov reg, crN */
 case 0x122: /* mov crN, reg */
@@ -4506,13 +4483,6 @@ static void disas_insn_old(DisasContext *s, CPUState 
*cpu, int b)
 }
 break;
 
-case 0x10d: /* 3DNow! prefetch(w) */
-modrm = x86_ldub_code(env, s);
-mod = (modrm >> 6) & 3;
-if (mod == 3)
-goto illegal_op;
-gen_nop_modrm(env, s, modrm);
-break;
 case 0x1aa: /* rsm */
 gen_svm_check_intercept(s, SVM_EXIT_RSM);
 if (!(s->flags & HF_SMM_MASK))
diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index 0e1811399f8..4baf7672158 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -55,6 +55,10 @@
  * mask could be applied (and the original sign-extended value would be
  * optimized away by TCG) in the emitter function.
  *
+ * Finally, a "nop" operand type is used for multi-byte NOPs.  It accepts
+ * any value of mod including 11b (unlike M) but it does not try to
+ * interpret the operand (like M).
+ *
  * Vector operands
  * ---
  *
@@ -1056,6 +1060,16 @@ static const X86OpEntry opcodes_0F[256] = {
 [0xa0] = X86_OP_ENTRYr(PUSH, FS, w),
 [0xa1] = X86_OP_ENTRYw(POP, FS, w),
 
+[0x0b] = X86_OP_ENTRY0(UD),   /* UD2 */
+[0x0d] = X86_OP_ENTRY1(NOP,  M,v),/* 3DNow! prefetch */
+
+[0x18] = X86_OP_ENTRY1(NOP,  nop,v),  /* prefetch/reserved NOP */
+[0x19] = X86_OP_ENTRY1(NOP,  nop,v),  /* reserved NOP */
+[0x1c] = X86_OP_ENTRY1(NOP,  nop,v),  /* reserved NOP */
+[0x1d] = X86_OP_ENTRY1(NOP,  nop,v),  /* reserved NOP */
+[0x1e] = X86_OP_ENTRY1(NOP,  nop,v),  /* reserved NOP */
+[0x1f] = X86_OP_ENTRY1(NOP,  nop,v),  /* reserved NOP */
+
 [0x28] = X86_OP_ENTRY3(MOVDQ,  V,x,  None,None, W,x, vex1 p_00_66), /* 
MOVAPS */
 [0x29] = X86_OP_ENTRY3(MOVDQ,  W,x,  None,None, V,x, vex1 p_00_66), /* 
MOVAPS */
 [0x2A] = X86_OP_GROUP0(0F2A),
@@ -1135,6 +1149,8 @@ static const X86OpEntry opcodes_0F[256] = {
 [0xb6] = X86_OP_ENTRY3(MOV,G,v, E,b, None, None, zextT0), /* MOVZX */
 [0xb7] = X86_OP_ENTRY3(MOV,G,v, E,w, None, None, zextT0), /* MOVZX */
 
+/* decoded as modrm, which is visible as a difference between page fault 
and #UD */
+[0xb9] = X86_OP_ENTRYr(UD, nop,v),/* UD1 */
 [0xbe] = X86_OP_ENTRY3(MOV,G,v, E,b, None, None, s

[PATCH] target/i386: fix feature dependency for WAITPKG

2024-05-09 Thread Paolo Bonzini
The VMX feature bit depends on general availability of WAITPKG,
not the other way round.

Fixes: 33cc88261c3 ("target/i386: add support for 
VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE", 2023-08-28)
Cc: qemu-sta...@nongnu.org
Signed-off-by: Paolo Bonzini 
---
 target/i386/cpu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 1058b6803fd..f2ea6899e39 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1551,8 +1551,8 @@ static FeatureDep feature_dependencies[] = {
 .to = { FEAT_SVM,   ~0ull },
 },
 {
-.from = { FEAT_VMX_SECONDARY_CTLS,  
VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE },
-.to = { FEAT_7_0_ECX,   CPUID_7_0_ECX_WAITPKG },
+.from = { FEAT_7_0_ECX, CPUID_7_0_ECX_WAITPKG },
+.to = { FEAT_VMX_SECONDARY_CTLS,
VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE },
 },
 };
 
-- 
2.45.0




[PATCH] target/i386: add feature dependency for XSAVE

2024-05-09 Thread Paolo Bonzini
The XSAVEOPT, XSAVEC, XGETBV1, XSAVES features make no sense if you
cannot enable XSAVE in the first place.

Signed-off-by: Paolo Bonzini 
---
 target/i386/cpu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index f2ea6899e39..6f5ff71c6ee 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1550,6 +1550,10 @@ static FeatureDep feature_dependencies[] = {
 .from = { FEAT_8000_0001_ECX,   CPUID_EXT3_SVM },
 .to = { FEAT_SVM,   ~0ull },
 },
+{
+.from = { FEAT_1_ECX,   CPUID_EXT_XSAVE },
+.to = { FEAT_XSAVE, ~0ull },
+},
 {
 .from = { FEAT_7_0_ECX, CPUID_7_0_ECX_WAITPKG },
 .to = { FEAT_VMX_SECONDARY_CTLS,
VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE },
-- 
2.45.0




[PATCH 04/13] s390x: select correct components for no-board build

2024-05-09 Thread Paolo Bonzini
Signed-off-by: Paolo Bonzini 
---
 .gitlab-ci.d/buildtest.yml | 4 ++--
 target/s390x/Kconfig   | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index 13afd0df1f0..f8502905203 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -650,7 +650,7 @@ build-tci:
 # Check our reduced build configurations
 # requires libfdt: aarch64, arm, i386, loongarch64, microblaze, microblazeel,
 #   mips64el, or1k, ppc, ppc64, riscv32, riscv64, rx, x86_64
-# does not build without boards: i386, s390x, x86_64
+# does not build without boards: i386, x86_64
 build-without-defaults:
   extends: .native_build_job_template
   needs:
@@ -666,7 +666,7 @@ build-without-defaults:
   --disable-strip
 TARGETS: alpha-softmmu avr-softmmu cris-softmmu hppa-softmmu m68k-softmmu
   mips-softmmu mips64-softmmu mipsel-softmmu
-  sh4-softmmu sh4eb-softmmu sparc-softmmu
+  s390x-softmmu sh4-softmmu sh4eb-softmmu sparc-softmmu
   sparc64-softmmu tricore-softmmu xtensa-softmmu xtensaeb-softmmu
   hexagon-linux-user i386-linux-user s390x-linux-user
 MAKE_CHECK_ARGS: check
diff --git a/target/s390x/Kconfig b/target/s390x/Kconfig
index 72da48136c6..d886be48b47 100644
--- a/target/s390x/Kconfig
+++ b/target/s390x/Kconfig
@@ -1,2 +1,4 @@
 config S390X
 bool
+select PCI
+select S390_FLIC
-- 
2.45.0




[PATCH 09/13] i386: pc: remove unnecessary MachineClass overrides

2024-05-09 Thread Paolo Bonzini
There is no need to override these fields of MachineClass because they are
already set to the right value in the superclass.

Signed-off-by: Paolo Bonzini 
---
 include/hw/i386/x86.h | 4 
 hw/i386/pc.c  | 3 ---
 hw/i386/x86.c | 6 +++---
 3 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index d7b7d3f3ce0..c2062db13f5 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -114,10 +114,6 @@ uint32_t x86_cpu_apic_id_from_index(X86MachineState *pcms,
 
 void x86_cpu_new(X86MachineState *pcms, int64_t apic_id, Error **errp);
 void x86_cpus_init(X86MachineState *pcms, int default_cpu_version);
-CpuInstanceProperties x86_cpu_index_to_props(MachineState *ms,
- unsigned cpu_index);
-int64_t x86_get_default_cpu_node_id(const MachineState *ms, int idx);
-const CPUArchIdList *x86_possible_cpu_arch_ids(MachineState *ms);
 CPUArchId *x86_find_cpu_slot(MachineState *ms, uint32_t id, int *idx);
 void x86_rtc_set_cpus_count(ISADevice *rtc, uint16_t cpus_count);
 void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 19f21953b4a..bfb46e9b548 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1826,9 +1826,6 @@ static void pc_machine_class_init(ObjectClass *oc, void 
*data)
 assert(!mc->get_hotplug_handler);
 mc->get_hotplug_handler = pc_get_hotplug_handler;
 mc->hotplug_allowed = pc_hotplug_allowed;
-mc->cpu_index_to_instance_props = x86_cpu_index_to_props;
-mc->get_default_cpu_node_id = x86_get_default_cpu_node_id;
-mc->possible_cpu_arch_ids = x86_possible_cpu_arch_ids;
 mc->auto_enable_numa_with_memhp = true;
 mc->auto_enable_numa_with_memdev = true;
 mc->has_hotpluggable_cpus = true;
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index c61f4ebfa6a..fcef652c1e3 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -443,7 +443,7 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 numa_cpu_pre_plug(cpu_slot, dev, errp);
 }
 
-CpuInstanceProperties
+static CpuInstanceProperties
 x86_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
 {
 MachineClass *mc = MACHINE_GET_CLASS(ms);
@@ -453,7 +453,7 @@ x86_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
 return possible_cpus->cpus[cpu_index].props;
 }
 
-int64_t x86_get_default_cpu_node_id(const MachineState *ms, int idx)
+static int64_t x86_get_default_cpu_node_id(const MachineState *ms, int idx)
 {
X86CPUTopoIDs topo_ids;
X86MachineState *x86ms = X86_MACHINE(ms);
@@ -467,7 +467,7 @@ int64_t x86_get_default_cpu_node_id(const MachineState *ms, 
int idx)
return topo_ids.pkg_id % ms->numa_state->num_nodes;
 }
 
-const CPUArchIdList *x86_possible_cpu_arch_ids(MachineState *ms)
+static const CPUArchIdList *x86_possible_cpu_arch_ids(MachineState *ms)
 {
 X86MachineState *x86ms = X86_MACHINE(ms);
 unsigned int max_cpus = ms->smp.max_cpus;
-- 
2.45.0




[PATCH 10/13] hw/i386: split x86.c in multiple parts

2024-05-09 Thread Paolo Bonzini
Keep the basic X86MachineState definition in x86.c.  Move out functions that
are only needed by other files: x86-common.c for the pc and microvm machines,
x86-cpu.c for those used by accelerator code.

Signed-off-by: Paolo Bonzini 
---
 include/hw/i386/x86.h |6 +-
 hw/i386/x86-common.c  | 1007 +++
 hw/i386/x86-cpu.c |   97 
 hw/i386/x86.c | 1052 +
 hw/i386/meson.build   |4 +-
 5 files changed, 1113 insertions(+), 1053 deletions(-)
 create mode 100644 hw/i386/x86-common.c
 create mode 100644 hw/i386/x86-cpu.c

diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index c2062db13f5..b006f16b8d3 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -21,6 +21,7 @@
 #include "exec/memory.h"
 
 #include "hw/boards.h"
+#include "hw/i386/topology.h"
 #include "hw/intc/ioapic.h"
 #include "hw/isa/isa.h"
 #include "qom/object.h"
@@ -109,12 +110,11 @@ struct X86MachineState {
 #define TYPE_X86_MACHINE   MACHINE_TYPE_NAME("x86")
 OBJECT_DECLARE_TYPE(X86MachineState, X86MachineClass, X86_MACHINE)
 
-uint32_t x86_cpu_apic_id_from_index(X86MachineState *pcms,
+void init_topo_info(X86CPUTopoInfo *topo_info, const X86MachineState *x86ms);
+uint32_t x86_cpu_apic_id_from_index(X86MachineState *x86ms,
 unsigned int cpu_index);
 
-void x86_cpu_new(X86MachineState *pcms, int64_t apic_id, Error **errp);
 void x86_cpus_init(X86MachineState *pcms, int default_cpu_version);
-CPUArchId *x86_find_cpu_slot(MachineState *ms, uint32_t id, int *idx);
 void x86_rtc_set_cpus_count(ISADevice *rtc, uint16_t cpus_count);
 void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
   DeviceState *dev, Error **errp);
diff --git a/hw/i386/x86-common.c b/hw/i386/x86-common.c
new file mode 100644
index 000..67b03c913a5
--- /dev/null
+++ b/hw/i386/x86-common.c
@@ -0,0 +1,1007 @@
+/*
+ * Copyright (c) 2003-2004 Fabrice Bellard
+ * Copyright (c) 2019, 2024 Red Hat, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "qemu/cutils.h"
+#include "qemu/units.h"
+#include "qemu/datadir.h"
+#include "qapi/error.h"
+#include "sysemu/numa.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/xen.h"
+#include "trace.h"
+
+#include "hw/i386/x86.h"
+#include "target/i386/cpu.h"
+#include "hw/rtc/mc146818rtc.h"
+#include "target/i386/sev.h"
+
+#include "hw/acpi/cpu_hotplug.h"
+#include "hw/irq.h"
+#include "hw/loader.h"
+#include "multiboot.h"
+#include "elf.h"
+#include "standard-headers/asm-x86/bootparam.h"
+#include CONFIG_DEVICES
+#include "kvm/kvm_i386.h"
+
+#ifdef CONFIG_XEN_EMU
+#include "hw/xen/xen.h"
+#include "hw/i386/kvm/xen_evtchn.h"
+#endif
+
+/* Physical Address of PVH entry point read from kernel ELF NOTE */
+static size_t pvh_start_addr;
+
+static void x86_cpu_new(X86MachineState *x86ms, int64_t apic_id, Error **errp)
+{
+Object *cpu = object_new(MACHINE(x86ms)->cpu_type);
+
+if (!object_property_set_uint(cpu, "apic-id", apic_id, errp)) {
+goto out;
+}
+qdev_realize(DEVICE(cpu), NULL, errp);
+
+out:
+object_unref(cpu);
+}
+
+void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
+{
+int i;
+const CPUArchIdList *possible_cpus;
+MachineState *ms = MACHINE(x86ms);
+MachineClass *mc = MACHINE_GET_CLASS(x86ms);
+
+x86_cpu_set_default_version(default_cpu_version);
+
+/*
+ * Calculates the limit to CPU APIC ID values
+ *
+ * Limit for the APIC ID value, so that all
+ * CPU APIC IDs are < x86ms->apic_id_limit.
+ *
+ * This is used for FW_CFG_MAX_CPUS. See comments on fw_cfg_arch_create().
+ */
+x86ms->apic_id_limit = x86_cpu_apic_id_from_index(x86ms,
+  ms->smp.max_cpus - 1) + 
1;
+
+/*
+ * Can we support APIC ID 255 or higher?  With KVM,

[PATCH 11/13] hw/i386: move rtc-reset-reinjection command out of hw/rtc

2024-05-09 Thread Paolo Bonzini
The rtc-reset-reinjection QMP command is specific to x86, other boards do not
have the ACK tracking functionality that is needed for RTC interrupt
reinjection.  Therefore the QMP command is only included in x86, but
qmp_rtc_reset_reinjection() is implemented by hw/rtc/mc146818rtc.c
and requires tracking of all created RTC devices.  Move the implementation
to hw/i386, so that 1) it is available even if no RTC device exist
2) the only RTC that exists is easily found in x86ms->rtc.

Signed-off-by: Paolo Bonzini 
---
 include/hw/rtc/mc146818rtc.h |  2 +-
 hw/i386/monitor.c| 46 
 hw/rtc/mc146818rtc.c | 12 ++
 hw/i386/meson.build  |  1 +
 4 files changed, 50 insertions(+), 11 deletions(-)
 create mode 100644 hw/i386/monitor.c

diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.h
index 97cec0b3e84..64893be1515 100644
--- a/include/hw/rtc/mc146818rtc.h
+++ b/include/hw/rtc/mc146818rtc.h
@@ -55,6 +55,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, int 
base_year,
 qemu_irq intercept_irq);
 void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int val);
 int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr);
-void qmp_rtc_reset_reinjection(Error **errp);
+void rtc_reset_reinjection(MC146818RtcState *rtc);
 
 #endif /* HW_RTC_MC146818RTC_H */
diff --git a/hw/i386/monitor.c b/hw/i386/monitor.c
new file mode 100644
index 000..1ebd3564bf2
--- /dev/null
+++ b/hw/i386/monitor.c
@@ -0,0 +1,46 @@
+/*
+ * QEMU monitor
+ *
+ * Copyright (c) 2003-2004 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "monitor/monitor.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/error.h"
+#include "qapi/qapi-commands-misc-target.h"
+#include "hw/i386/x86.h"
+#include "hw/rtc/mc146818rtc.h"
+
+#include CONFIG_DEVICES
+
+void qmp_rtc_reset_reinjection(Error **errp)
+{
+X86MachineState *x86ms = X86_MACHINE(qdev_get_machine());
+
+#ifdef CONFIG_MC146818RTC
+if (x86ms->rtc) {
+rtc_reset_reinjection(MC146818_RTC(x86ms->rtc));
+}
+#else
+assert(!x86ms->rtc);
+#endif
+}
diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c
index 3379f92748b..8ccee9a385d 100644
--- a/hw/rtc/mc146818rtc.c
+++ b/hw/rtc/mc146818rtc.c
@@ -104,16 +104,9 @@ static void rtc_coalesced_timer_update(MC146818RtcState *s)
 }
 }
 
-static QLIST_HEAD(, MC146818RtcState) rtc_devices =
-QLIST_HEAD_INITIALIZER(rtc_devices);
-
-void qmp_rtc_reset_reinjection(Error **errp)
+void rtc_reset_reinjection(MC146818RtcState *rtc)
 {
-MC146818RtcState *s;
-
-QLIST_FOREACH(s, &rtc_devices, link) {
-s->irq_coalesced = 0;
-}
+rtc->irq_coalesced = 0;
 }
 
 static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s)
@@ -941,7 +934,6 @@ static void rtc_realizefn(DeviceState *dev, Error **errp)
 object_property_add_tm(OBJECT(s), "date", rtc_get_date);
 
 qdev_init_gpio_out(dev, &s->irq, 1);
-QLIST_INSERT_HEAD(&rtc_devices, s, link);
 }
 
 MC146818RtcState *mc146818_rtc_init(ISABus *bus, int base_year,
diff --git a/hw/i386/meson.build b/hw/i386/meson.build
index 3437da0aad1..03aad10df7a 100644
--- a/hw/i386/meson.build
+++ b/hw/i386/meson.build
@@ -2,6 +2,7 @@ i386_ss = ss.source_set()
 i386_ss.add(files(
   'fw_cfg.c',
   'e820_memory_layout.c',
+  'monitor.c',
   'multiboot.c',
   'x86.c',
   'x86-cpu.c',
-- 
2.45.0




[PATCH 05/13] tests/qtest: s390x: fix operation in a build without any boards or devices

2024-05-09 Thread Paolo Bonzini
Do the bare minimum to ensure that at least a vanilla
--without-default-devices build works for all targets except i386,
x86_64 and ppc64.  In particular this fixes s390x-softmmu; i386 and
x86_64 have about a dozen failing tests that do not pass -M and therefore
require a default machine type; ppc64 has the same issue, though only
with numa-test.

If we can for now ignore the cases where boards and devices are picked
by hand, drive_del-test however can be fixed easily; almost all tests
check for the virtio-blk or virtio-scsi device that they use, and are
already skipped.  Only one didn't get the memo; plus another one does
not need a machine at all and can be run with -M none.

Signed-off-by: Paolo Bonzini 
---
 tests/qtest/drive_del-test.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/drive_del-test.c b/tests/qtest/drive_del-test.c
index 8a6f3ac963d..7b67a4bbee4 100644
--- a/tests/qtest/drive_del-test.c
+++ b/tests/qtest/drive_del-test.c
@@ -173,7 +173,7 @@ static void test_drive_without_dev(void)
 QTestState *qts;
 
 /* Start with an empty drive */
-qts = qtest_init("-drive if=none,id=drive0");
+qts = qtest_init("-drive if=none,id=drive0 -M none");
 
 /* Delete the drive */
 drive_del(qts);
@@ -192,6 +192,11 @@ static void test_after_failed_device_add(void)
 QDict *response;
 QTestState *qts;
 
+if (!has_device_builtin("virtio-blk")) {
+g_test_skip("Device virtio-blk is not available");
+return;
+}
+
 snprintf(driver, sizeof(driver), "virtio-blk-%s",
  qvirtio_get_dev_type());
 
-- 
2.45.0




[PATCH 00/13] fix --without-default-devices build and (mostly) tests

2024-05-09 Thread Paolo Bonzini
The recent change to make boards "default y" made them go away from
a --without-default-devices build, because the boards are not anymore
enabled explicitly in configs/devices/.

This is a problem for some targets that were not fully ready for this
and have generic target code that needs symbols from boards or devices.
The more complicated ones are s390x and i386.  In some cases some
components simply have to be required by target/ARCH/, but often
it is better to move some code around, associating it with boards
instead of targets or vice versa.

--without-default-devices builds in practice will always use a
custom config, but let's keep things tidy.  This series does
this for s390x (patches 1 to 5) and x86 (patches 6 to 12).  As a
small addendum, patch 13 fixes qtest for ARM (32- and 64-bit) on a
--without-default-devices build.

The series seems huge, but it's mostly code movement.  Patch 10
in particular moves board building code, which is unrelated to the
X86MachineState superclass and has many dependencies on NUMA or
hw/i386/pc-sysfw.c.

I suspect that there are more issues, for example when building
a CONFIG_MICROVM-only binary.  Fixing builds without boards on vanilla
upstream configs is the more pressing problem, though.

Patches 6 and 7 were tested with the Avocado Xen-on-KVM tests.

Paolo


Paolo Bonzini (13):
  s390x: move s390_cpu_addr2state to target/s390x/sigp.c
  s390_flic: add migration-enabled property
  s390: move css_migration_enabled from machine to css.c
  s390x: select correct components for no-board build
  tests/qtest: s390x: fix operation in a build without any boards or
devices
  xen: initialize legacy backends from xen_bus_init()
  xen: register legacy backends via xen_backend_init
  i386: correctly select code in hw/i386 that depends on other
components
  i386: pc: remove unnecessary MachineClass overrides
  hw/i386: split x86.c in multiple parts
  hw/i386: move rtc-reset-reinjection command out of hw/rtc
  i386: select correct components for no-board build
  tests/qtest: arm: fix operation in a build without any boards or
devices

 include/hw/i386/x86.h   |   10 +-
 include/hw/rtc/mc146818rtc.h|2 +-
 include/hw/s390x/css.h  |6 +
 include/hw/s390x/s390-virtio-ccw.h  |7 -
 include/hw/s390x/s390_flic.h|1 +
 include/hw/xen/xen-legacy-backend.h |   14 +-
 include/hw/xen/xen_pvdev.h  |1 -
 hw/9pfs/xen-9p-backend.c|8 +-
 hw/display/xenfb.c  |8 +-
 hw/i386/fw_cfg.c|2 +
 hw/i386/monitor.c   |   46 ++
 hw/i386/pc.c|4 -
 hw/i386/x86-common.c| 1007 +
 hw/i386/x86-cpu.c   |   97 +++
 hw/i386/x86.c   | 1058 +--
 hw/intc/ioapic-stub.c   |   29 +
 hw/intc/s390_flic.c |6 +-
 hw/rtc/mc146818rtc.c|   12 +-
 hw/s390x/css.c  |   10 +-
 hw/s390x/s390-virtio-ccw.c  |   32 +-
 hw/usb/xen-usb.c|   14 +-
 hw/xen/xen-bus.c|4 +
 hw/xen/xen-hvm-common.c |2 -
 hw/xen/xen-legacy-backend.c |   16 -
 hw/xenpv/xen_machine_pv.c   |5 +-
 target/s390x/sigp.c |   17 +
 tests/qtest/arm-cpu-features.c  |4 +
 tests/qtest/drive_del-test.c|7 +-
 tests/qtest/migration-test.c|6 +
 tests/qtest/numa-test.c |4 +
 .gitlab-ci.d/buildtest.yml  |4 +-
 hw/i386/meson.build |7 +-
 hw/intc/meson.build |2 +-
 target/i386/Kconfig |1 +
 target/s390x/Kconfig|2 +
 35 files changed, 1289 insertions(+), 1166 deletions(-)
 create mode 100644 hw/i386/monitor.c
 create mode 100644 hw/i386/x86-common.c
 create mode 100644 hw/i386/x86-cpu.c
 create mode 100644 hw/intc/ioapic-stub.c

-- 
2.45.0




[PATCH 02/13] s390_flic: add migration-enabled property

2024-05-09 Thread Paolo Bonzini
Instead of mucking with css_migration_enabled(), add a property specific to
the FLIC device, similar to what is done for TYPE_S390_STATTRIB.

Signed-off-by: Paolo Bonzini 
---
 include/hw/s390x/s390_flic.h | 1 +
 hw/intc/s390_flic.c  | 6 +-
 hw/s390x/s390-virtio-ccw.c   | 1 +
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/hw/s390x/s390_flic.h b/include/hw/s390x/s390_flic.h
index 3907a13d076..bcb081def58 100644
--- a/include/hw/s390x/s390_flic.h
+++ b/include/hw/s390x/s390_flic.h
@@ -47,6 +47,7 @@ struct S390FLICState {
 /* to limit AdapterRoutes.num_routes for compat */
 uint32_t adapter_routes_max_batch;
 bool ais_supported;
+bool migration_enabled;
 };
 
 
diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c
index f4a848460b8..7f930800877 100644
--- a/hw/intc/s390_flic.c
+++ b/hw/intc/s390_flic.c
@@ -405,6 +405,8 @@ static void qemu_s390_flic_class_init(ObjectClass *oc, void 
*data)
 static Property s390_flic_common_properties[] = {
 DEFINE_PROP_UINT32("adapter_routes_max_batch", S390FLICState,
adapter_routes_max_batch, ADAPTER_ROUTES_MAX_GSI),
+DEFINE_PROP_BOOL("migration-enabled", S390FLICState,
+ migration_enabled, true),
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -457,7 +459,9 @@ type_init(qemu_s390_flic_register_types)
 
 static bool adapter_info_so_needed(void *opaque)
 {
-return css_migration_enabled();
+S390FLICState *fs = S390_FLIC_COMMON(opaque);
+
+return fs->migration_enabled;
 }
 
 const VMStateDescription vmstate_adapter_info_so = {
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index feabc173eb3..1383e47eeb5 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -1174,6 +1174,7 @@ static void ccw_machine_2_9_class_options(MachineClass 
*mc)
 S390CcwMachineClass *s390mc = S390_CCW_MACHINE_CLASS(mc);
 static GlobalProperty compat[] = {
 { TYPE_S390_STATTRIB, "migration-enabled", "off", },
+{ TYPE_S390_FLIC_COMMON, "migration-enabled", "off", },
 };
 
 ccw_machine_2_10_class_options(mc);
-- 
2.45.0




[PATCH 03/13] s390: move css_migration_enabled from machine to css.c

2024-05-09 Thread Paolo Bonzini
The CSS subsystem uses global variables, just face the truth and use
a variable also for whether the CSS vmstate is in use; remove the
indirection of fetching it from the machine type, which makes the
TCG code depend unnecessarily on the virtio-ccw machine.

Signed-off-by: Paolo Bonzini 
---
 include/hw/s390x/css.h |  6 ++
 include/hw/s390x/s390-virtio-ccw.h |  7 ---
 hw/s390x/css.c | 10 +++---
 hw/s390x/s390-virtio-ccw.c | 15 +++
 4 files changed, 16 insertions(+), 22 deletions(-)

diff --git a/include/hw/s390x/css.h b/include/hw/s390x/css.h
index ba72ee3dd20..8289e458370 100644
--- a/include/hw/s390x/css.h
+++ b/include/hw/s390x/css.h
@@ -333,4 +333,10 @@ static inline int ccw_dstream_read_buf(CcwDataStream *cds, 
void *buff, int len)
 #define ccw_dstream_read(cds, v) ccw_dstream_read_buf((cds), &(v), sizeof(v))
 #define ccw_dstream_write(cds, v) ccw_dstream_write_buf((cds), &(v), sizeof(v))
 
+/**
+ * true if (vmstate based) migration of the channel subsystem
+ * is enabled, false if it is disabled.
+ */
+extern bool css_migration_enabled;
+
 #endif
diff --git a/include/hw/s390x/s390-virtio-ccw.h 
b/include/hw/s390x/s390-virtio-ccw.h
index c1d46e78af8..c0494e511cb 100644
--- a/include/hw/s390x/s390-virtio-ccw.h
+++ b/include/hw/s390x/s390-virtio-ccw.h
@@ -43,7 +43,6 @@ struct S390CcwMachineClass {
 /*< public >*/
 bool ri_allowed;
 bool cpu_model_allowed;
-bool css_migration_enabled;
 bool hpage_1m_allowed;
 int max_threads;
 };
@@ -55,10 +54,4 @@ bool cpu_model_allowed(void);
 /* 1M huge page mappings allowed by the machine */
 bool hpage_1m_allowed(void);
 
-/**
- * Returns true if (vmstate based) migration of the channel subsystem
- * is enabled, false if it is disabled.
- */
-bool css_migration_enabled(void);
-
 #endif
diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index 295530963a6..b2d5327dbf4 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -23,6 +23,8 @@
 #include "hw/s390x/s390-virtio-ccw.h"
 #include "hw/s390x/s390-ccw.h"
 
+bool css_migration_enabled = true;
+
 typedef struct CrwContainer {
 CRW crw;
 QTAILQ_ENTRY(CrwContainer) sibling;
@@ -180,7 +182,7 @@ static const VMStateDescription vmstate_orb = {
 
 static bool vmstate_schdev_orb_needed(void *opaque)
 {
-return css_migration_enabled();
+return css_migration_enabled;
 }
 
 static const VMStateDescription vmstate_schdev_orb = {
@@ -388,7 +390,7 @@ static int subch_dev_post_load(void *opaque, int version_id)
 css_subch_assign(s->cssid, s->ssid, s->schid, s->devno, s);
 }
 
-if (css_migration_enabled()) {
+if (css_migration_enabled) {
 /* No compat voodoo to do ;) */
 return 0;
 }
@@ -412,7 +414,9 @@ static int subch_dev_post_load(void *opaque, int version_id)
 
 void css_register_vmstate(void)
 {
-vmstate_register(NULL, 0, &vmstate_css, &channel_subsys);
+if (css_migration_enabled) {
+vmstate_register(NULL, 0, &vmstate_css, &channel_subsys);
+}
 }
 
 IndAddr *get_indicator(hwaddr ind_addr, int len)
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 1383e47eeb5..aa90703d518 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -275,11 +275,9 @@ static void ccw_init(MachineState *machine)
 s390_enable_css_support(s390_cpu_addr2state(0));
 
 ret = css_create_css_image(VIRTUAL_CSSID, true);
-
 assert(ret == 0);
-if (css_migration_enabled()) {
-css_register_vmstate();
-}
+
+css_register_vmstate();
 
 /* Create VirtIO network adapters */
 s390_create_virtio_net(BUS(css_bus), mc->default_nic);
@@ -741,7 +739,6 @@ static void ccw_machine_class_init(ObjectClass *oc, void 
*data)
 
 s390mc->ri_allowed = true;
 s390mc->cpu_model_allowed = true;
-s390mc->css_migration_enabled = true;
 s390mc->hpage_1m_allowed = true;
 s390mc->max_threads = 1;
 mc->init = ccw_init;
@@ -811,11 +808,6 @@ static const TypeInfo ccw_machine_info = {
 },
 };
 
-bool css_migration_enabled(void)
-{
-return get_machine_class()->css_migration_enabled;
-}
-
 #define DEFINE_CCW_MACHINE(suffix, verstr, latest)\
 static void ccw_machine_##suffix##_class_init(ObjectClass *oc,\
   void *data) \
@@ -1171,7 +1163,6 @@ static void ccw_machine_2_9_instance_options(MachineState 
*machine)
 
 static void ccw_machine_2_9_class_options(MachineClass *mc)
 {
-S390CcwMachineClass *s390mc = S390_CCW_MACHINE_CLASS(mc);
 static GlobalProperty compat[] = {
 { TYPE_S390_STATTRIB, "migration-enabled", "off", },
 { TYPE_S390_FLIC_COMMON, "migration-enabled", "off", },
@@ -1180,7 +1171,7 @@ static void ccw_machine_2_9_class_options(MachineClass 
*mc)
 ccw_machine_2_10_class_options(mc);
 compat_props_add(mc->compat_props, hw_compat_2_9, hw_compat_2_9_len);
 compat_props_add(mc->compa

[PATCH 06/13] xen: initialize legacy backends from xen_bus_init()

2024-05-09 Thread Paolo Bonzini
Prepare for moving the calls to xen_be_register() under the
control of xen_bus_init(), using the normal xen_backend_init()
method that is used by the "modern" backends.

This requires the xenstore global variable to be initialized,
which is done by xen_be_init().  To ensure that everything is
ready at the time the xen_backend_init() functions are called,
remove the xen_be_init() function from all the boards and
place it directly in xen_bus_init().

Signed-off-by: Paolo Bonzini 
---
 hw/i386/pc.c  | 1 -
 hw/xen/xen-bus.c  | 4 
 hw/xen/xen-hvm-common.c   | 2 --
 hw/xenpv/xen_machine_pv.c | 5 +
 4 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 505ea750f4d..19f21953b4a 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1250,7 +1250,6 @@ void pc_basic_device_init(struct PCMachineState *pcms,
 pci_create_simple(pcms->pcibus, -1, "xen-platform");
 }
 xen_bus_init();
-xen_be_init();
 }
 #endif
 
diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c
index fb82cc33e48..95b207ac8b4 100644
--- a/hw/xen/xen-bus.c
+++ b/hw/xen/xen-bus.c
@@ -13,6 +13,7 @@
 #include "hw/sysbus.h"
 #include "hw/xen/xen.h"
 #include "hw/xen/xen-backend.h"
+#include "hw/xen/xen-legacy-backend.h" /* xen_be_init() */
 #include "hw/xen/xen-bus.h"
 #include "hw/xen/xen-bus-helper.h"
 #include "monitor/monitor.h"
@@ -329,6 +330,9 @@ static void xen_bus_realize(BusState *bus, Error **errp)
 goto fail;
 }
 
+/* Initialize legacy backend core & drivers */
+xen_be_init();
+
 if (xs_node_scanf(xenbus->xsh, XBT_NULL, "", /* domain root node */
   "domid", NULL, "%u", &domid) == 1) {
 xenbus->backend_id = domid;
diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
index 1627da73982..2d1b0321214 100644
--- a/hw/xen/xen-hvm-common.c
+++ b/hw/xen/xen-hvm-common.c
@@ -872,8 +872,6 @@ void xen_register_ioreq(XenIOState *state, unsigned int 
max_cpus,
 
 xen_bus_init();
 
-xen_be_init();
-
 return;
 
 err:
diff --git a/hw/xenpv/xen_machine_pv.c b/hw/xenpv/xen_machine_pv.c
index 1130d1a1479..b500ce09891 100644
--- a/hw/xenpv/xen_machine_pv.c
+++ b/hw/xenpv/xen_machine_pv.c
@@ -34,8 +34,7 @@ static void xen_init_pv(MachineState *machine)
 {
 setup_xen_backend_ops();
 
-/* Initialize backend core & drivers */
-xen_be_init();
+xen_bus_init();
 
 switch (xen_mode) {
 case XEN_ATTACH:
@@ -60,8 +59,6 @@ static void xen_init_pv(MachineState *machine)
 vga_interface_created = true;
 }
 
-xen_bus_init();
-
 /* config cleanup hook */
 atexit(xen_config_cleanup);
 }
-- 
2.45.0




[PATCH 07/13] xen: register legacy backends via xen_backend_init

2024-05-09 Thread Paolo Bonzini
It is okay to register legacy backends in the middle of xen_bus_init().
All that the registration does is record the existence of the backend
in xenstore.

This makes it possible to remove them from the build without introducing
undefined symbols in xen_be_init().  It also removes the need for the
backend_register callback, whose only purpose is to avoid registering
nonfunctional backends.

Signed-off-by: Paolo Bonzini 
---
 include/hw/xen/xen-legacy-backend.h | 14 ++
 include/hw/xen/xen_pvdev.h  |  1 -
 hw/9pfs/xen-9p-backend.c|  8 +++-
 hw/display/xenfb.c  |  8 +++-
 hw/usb/xen-usb.c| 14 --
 hw/xen/xen-legacy-backend.c | 16 
 6 files changed, 20 insertions(+), 41 deletions(-)

diff --git a/include/hw/xen/xen-legacy-backend.h 
b/include/hw/xen/xen-legacy-backend.h
index 2cca1747786..979c4ea04c5 100644
--- a/include/hw/xen/xen-legacy-backend.h
+++ b/include/hw/xen/xen-legacy-backend.h
@@ -66,18 +66,8 @@ static inline void xen_be_unmap_grant_ref(struct 
XenLegacyDevice *xendev,
 return xen_be_unmap_grant_refs(xendev, ptr, &ref, 1);
 }
 
-/* actual backend drivers */
-extern struct XenDevOps xen_console_ops;  /* xen_console.c */
-extern struct XenDevOps xen_kbdmouse_ops; /* xen_framebuffer.c */
-extern struct XenDevOps xen_framebuffer_ops;  /* xen_framebuffer.c */
-extern struct XenDevOps xen_blkdev_ops;   /* xen_disk.c*/
-#ifdef CONFIG_VIRTFS
-extern struct XenDevOps xen_9pfs_ops;   /* xen-9p-backend.c*/
-#endif
-extern struct XenDevOps xen_netdev_ops;   /* xen_nic.c */
-#ifdef CONFIG_USB_LIBUSB
-extern struct XenDevOps xen_usb_ops;  /* xen-usb.c */
-#endif
+/* backend drivers not included in all machines */
+extern struct XenDevOps xen_framebuffer_ops;  /* xenfb.c */
 
 /* configuration (aka xenbus setup) */
 void xen_config_cleanup(void);
diff --git a/include/hw/xen/xen_pvdev.h b/include/hw/xen/xen_pvdev.h
index ddad4b9f36a..fdf84f47af1 100644
--- a/include/hw/xen/xen_pvdev.h
+++ b/include/hw/xen/xen_pvdev.h
@@ -29,7 +29,6 @@ struct XenDevOps {
  const char *node);
 void  (*frontend_changed)(struct XenLegacyDevice *xendev,
   const char *node);
-int   (*backend_register)(void);
 };
 
 struct XenLegacyDevice {
diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c
index 4aa9c8c736d..a3ac53f989e 100644
--- a/hw/9pfs/xen-9p-backend.c
+++ b/hw/9pfs/xen-9p-backend.c
@@ -513,7 +513,7 @@ static void xen_9pfs_alloc(struct XenLegacyDevice *xendev)
 xenstore_write_be_int(xendev, "max-ring-page-order", MAX_RING_ORDER);
 }
 
-struct XenDevOps xen_9pfs_ops = {
+static struct XenDevOps xen_9pfs_ops = {
 .size   = sizeof(Xen9pfsDev),
 .flags  = DEVOPS_FLAG_NEED_GNTDEV,
 .alloc  = xen_9pfs_alloc,
@@ -522,3 +522,9 @@ struct XenDevOps xen_9pfs_ops = {
 .disconnect = xen_9pfs_disconnect,
 .free   = xen_9pfs_free,
 };
+
+static void xen_9pfs_register_backend(void)
+{
+xen_be_register("9pfs", &xen_9pfs_ops);
+}
+xen_backend_init(xen_9pfs_register_backend);
diff --git a/hw/display/xenfb.c b/hw/display/xenfb.c
index b2130a0d700..27536bfce0c 100644
--- a/hw/display/xenfb.c
+++ b/hw/display/xenfb.c
@@ -972,7 +972,7 @@ static void fb_event(struct XenLegacyDevice *xendev)
 
 /*  */
 
-struct XenDevOps xen_kbdmouse_ops = {
+static struct XenDevOps xen_kbdmouse_ops = {
 .size   = sizeof(struct XenInput),
 .init   = input_init,
 .initialise = input_initialise,
@@ -995,3 +995,9 @@ static const GraphicHwOps xenfb_ops = {
 .gfx_update  = xenfb_update,
 .ui_info = xenfb_ui_info,
 };
+
+static void xen_vkbd_register_backend(void)
+{
+xen_be_register("vkbd", &xen_kbdmouse_ops);
+}
+xen_backend_init(xen_vkbd_register_backend);
diff --git a/hw/usb/xen-usb.c b/hw/usb/xen-usb.c
index 09ec326aeae..416623f956a 100644
--- a/hw/usb/xen-usb.c
+++ b/hw/usb/xen-usb.c
@@ -1083,7 +1083,7 @@ static void usbback_event(struct XenLegacyDevice *xendev)
 qemu_bh_schedule(usbif->bh);
 }
 
-struct XenDevOps xen_usb_ops = {
+static struct XenDevOps xen_usb_ops = {
 .size= sizeof(struct usbback_info),
 .flags   = DEVOPS_FLAG_NEED_GNTDEV,
 .init= usbback_init,
@@ -1095,15 +1095,9 @@ struct XenDevOps xen_usb_ops = {
 .event   = usbback_event,
 };
 
-#else /* USBIF_SHORT_NOT_OK */
-
-static int usbback_not_supported(void)
+static void xen_usb_register_backend(void)
 {
-return -EINVAL;
+xen_be_register("qusb", &xen_usb_ops);
 }
-
-struct XenDevOps xen_usb_ops = {
-.backend_register = usbback_not_supported,
-};
-
+xen_backend_init(xen_usb_register_backend);
 #endif
diff --git a/hw/xen/xen-legacy-backend.c b/hw/xen/xen-legacy-backend.c
index 124dd5f3d68..6f0b300a421 100644
--- a/hw/xen/xe

[PATCH 12/13] i386: select correct components for no-board build

2024-05-09 Thread Paolo Bonzini
The local APIC is a part of the CPU and has callbacks that are invoked
from multiple accelerators.

The IOAPIC on the other hand is optional, but ioapic_eoi_broadcast is
used by common x86 code to implement the IOAPIC's implicit EOI mode.
Add a stub in case the IOAPIC device is not included but the APIC is.

Signed-off-by: Paolo Bonzini 
---
 hw/intc/ioapic-stub.c  | 29 +
 .gitlab-ci.d/buildtest.yml |  2 +-
 hw/intc/meson.build|  2 +-
 target/i386/Kconfig|  1 +
 4 files changed, 32 insertions(+), 2 deletions(-)
 create mode 100644 hw/intc/ioapic-stub.c

diff --git a/hw/intc/ioapic-stub.c b/hw/intc/ioapic-stub.c
new file mode 100644
index 000..4dcd86248da
--- /dev/null
+++ b/hw/intc/ioapic-stub.c
@@ -0,0 +1,29 @@
+/*
+ *  ioapic.c IOAPIC emulation logic
+ *
+ *  Copyright (c) 2004-2005 Fabrice Bellard
+ *
+ *  Split the ioapic logic from apic.c
+ *  Xiantao Zhang 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/intc/ioapic.h"
+
+void ioapic_eoi_broadcast(int vector)
+{
+}
diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index f8502905203..62616157206 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -650,7 +650,7 @@ build-tci:
 # Check our reduced build configurations
 # requires libfdt: aarch64, arm, i386, loongarch64, microblaze, microblazeel,
 #   mips64el, or1k, ppc, ppc64, riscv32, riscv64, rx, x86_64
-# does not build without boards: i386, x86_64
+# fails qtest without boards: i386, x86_64
 build-without-defaults:
   extends: .native_build_job_template
   needs:
diff --git a/hw/intc/meson.build b/hw/intc/meson.build
index f4b540e6a8b..0d1b7d0a432 100644
--- a/hw/intc/meson.build
+++ b/hw/intc/meson.build
@@ -20,7 +20,7 @@ system_ss.add(when: 'CONFIG_GOLDFISH_PIC', if_true: 
files('goldfish_pic.c'))
 system_ss.add(when: 'CONFIG_HEATHROW_PIC', if_true: files('heathrow_pic.c'))
 system_ss.add(when: 'CONFIG_I8259', if_true: files('i8259_common.c', 
'i8259.c'))
 system_ss.add(when: 'CONFIG_IMX', if_true: files('imx_avic.c', 'imx_gpcv2.c'))
-system_ss.add(when: 'CONFIG_IOAPIC', if_true: files('ioapic_common.c'))
+system_ss.add(when: 'CONFIG_IOAPIC', if_true: files('ioapic_common.c'), 
if_false: files('ioapic-stub.c'))
 system_ss.add(when: 'CONFIG_OMAP', if_true: files('omap_intc.c'))
 system_ss.add(when: 'CONFIG_OPENPIC', if_true: files('openpic.c'))
 system_ss.add(when: 'CONFIG_PL190', if_true: files('pl190.c'))
diff --git a/target/i386/Kconfig b/target/i386/Kconfig
index ad9291d3b8f..6b0feef0299 100644
--- a/target/i386/Kconfig
+++ b/target/i386/Kconfig
@@ -1,5 +1,6 @@
 config I386
 bool
+select APIC
 # kvm_arch_fixup_msi_route() needs to access PCIDevice
 select PCI if KVM
 
-- 
2.45.0




[PATCH 08/13] i386: correctly select code in hw/i386 that depends on other components

2024-05-09 Thread Paolo Bonzini
fw_cfg.c and vapic.c are currently included unconditionally but
depend on other components.  vapic.c depends on the local APIC,
while fw_cfg.c includes a piece of AML builder code that depends
on CONFIG_ACPI.

Signed-off-by: Paolo Bonzini 
---
 hw/i386/fw_cfg.c| 2 ++
 hw/i386/meson.build | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/i386/fw_cfg.c b/hw/i386/fw_cfg.c
index d802d2787f0..6e0d9945d07 100644
--- a/hw/i386/fw_cfg.c
+++ b/hw/i386/fw_cfg.c
@@ -203,6 +203,7 @@ void fw_cfg_build_feature_control(MachineState *ms, 
FWCfgState *fw_cfg)
 fw_cfg_add_file(fw_cfg, "etc/msr_feature_control", val, sizeof(*val));
 }
 
+#ifdef CONFIG_ACPI
 void fw_cfg_add_acpi_dsdt(Aml *scope, FWCfgState *fw_cfg)
 {
 /*
@@ -229,3 +230,4 @@ void fw_cfg_add_acpi_dsdt(Aml *scope, FWCfgState *fw_cfg)
 aml_append(dev, aml_name_decl("_CRS", crs));
 aml_append(scope, dev);
 }
+#endif
diff --git a/hw/i386/meson.build b/hw/i386/meson.build
index d8b70ef3e9c..d9da676038c 100644
--- a/hw/i386/meson.build
+++ b/hw/i386/meson.build
@@ -1,12 +1,12 @@
 i386_ss = ss.source_set()
 i386_ss.add(files(
   'fw_cfg.c',
-  'vapic.c',
   'e820_memory_layout.c',
   'multiboot.c',
   'x86.c',
 ))
 
+i386_ss.add(when: 'CONFIG_APIC', if_true: files('vapic.c'))
 i386_ss.add(when: 'CONFIG_X86_IOMMU', if_true: files('x86-iommu.c'),
   if_false: files('x86-iommu-stub.c'))
 i386_ss.add(when: 'CONFIG_AMD_IOMMU', if_true: files('amd_iommu.c'),
-- 
2.45.0




[PATCH 01/13] s390x: move s390_cpu_addr2state to target/s390x/sigp.c

2024-05-09 Thread Paolo Bonzini
This function has no dependency on the virtio-ccw machine type, though it
assumes that the CPU address corresponds to the core_id and the index.

If there is any need of something different or more fancy (unlikely)
S390 can include a MachineClass subclass and implement it there.  For
now, move it to sigp.c for simplicity.

Signed-off-by: Paolo Bonzini 
---
 hw/s390x/s390-virtio-ccw.c | 16 
 target/s390x/sigp.c| 17 +
 2 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 4dcc2138200..feabc173eb3 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -50,22 +50,6 @@
 
 static Error *pv_mig_blocker;
 
-S390CPU *s390_cpu_addr2state(uint16_t cpu_addr)
-{
-static MachineState *ms;
-
-if (!ms) {
-ms = MACHINE(qdev_get_machine());
-g_assert(ms->possible_cpus);
-}
-
-/* CPU address corresponds to the core_id and the index */
-if (cpu_addr >= ms->possible_cpus->len) {
-return NULL;
-}
-return S390_CPU(ms->possible_cpus->cpus[cpu_addr].cpu);
-}
-
 static S390CPU *s390x_new_cpu(const char *typename, uint32_t core_id,
   Error **errp)
 {
diff --git a/target/s390x/sigp.c b/target/s390x/sigp.c
index 9dd977349ab..ad0ad61177d 100644
--- a/target/s390x/sigp.c
+++ b/target/s390x/sigp.c
@@ -11,6 +11,7 @@
 #include "qemu/osdep.h"
 #include "cpu.h"
 #include "s390x-internal.h"
+#include "hw/boards.h"
 #include "sysemu/hw_accel.h"
 #include "sysemu/runstate.h"
 #include "exec/address-spaces.h"
@@ -435,6 +436,22 @@ static int sigp_set_architecture(S390CPU *cpu, uint32_t 
param,
 return SIGP_CC_STATUS_STORED;
 }
 
+S390CPU *s390_cpu_addr2state(uint16_t cpu_addr)
+{
+static MachineState *ms;
+
+if (!ms) {
+ms = MACHINE(qdev_get_machine());
+g_assert(ms->possible_cpus);
+}
+
+/* CPU address corresponds to the core_id and the index */
+if (cpu_addr >= ms->possible_cpus->len) {
+return NULL;
+}
+return S390_CPU(ms->possible_cpus->cpus[cpu_addr].cpu);
+}
+
 int handle_sigp(CPUS390XState *env, uint8_t order, uint64_t r1, uint64_t r3)
 {
 uint64_t *status_reg = &env->regs[r1];
-- 
2.45.0




[PATCH 13/13] tests/qtest: arm: fix operation in a build without any boards or devices

2024-05-09 Thread Paolo Bonzini
ARM/aarch64 are easy to fix because they already have to pass a machine
type by hand.  Just guard the tests with a check that the machine actually
exists.

Signed-off-by: Paolo Bonzini 
---
 tests/qtest/arm-cpu-features.c | 4 
 tests/qtest/migration-test.c   | 6 ++
 tests/qtest/numa-test.c| 4 
 3 files changed, 14 insertions(+)

diff --git a/tests/qtest/arm-cpu-features.c b/tests/qtest/arm-cpu-features.c
index 9d6e6190d55..966c65d5c3e 100644
--- a/tests/qtest/arm-cpu-features.c
+++ b/tests/qtest/arm-cpu-features.c
@@ -632,6 +632,10 @@ int main(int argc, char **argv)
 {
 g_test_init(&argc, &argv, NULL);
 
+if (!qtest_has_machine("virt")) {
+goto out;
+}
+
 if (qtest_has_accel("tcg")) {
 qtest_add_data_func("/arm/query-cpu-model-expansion",
 NULL, test_query_cpu_model_expansion);
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 5d6d8cd6343..31045b69fa7 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -813,6 +813,12 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 kvm_opts = ",dirty-ring-size=4096";
 }
 
+if (!qtest_has_machine(machine_alias)) {
+g_autofree char *msg = g_strdup_printf("machine %s not supported", 
machine_alias);
+g_test_skip(msg);
+return -1;
+}
+
 machine = resolve_machine_version(machine_alias, QEMU_ENV_SRC,
   QEMU_ENV_DST);
 
diff --git a/tests/qtest/numa-test.c b/tests/qtest/numa-test.c
index 4f4404a4b14..7aa262dbb99 100644
--- a/tests/qtest/numa-test.c
+++ b/tests/qtest/numa-test.c
@@ -558,6 +558,9 @@ int main(int argc, char **argv)
 }
 
 if (g_str_equal(arch, "aarch64")) {
+if (!qtest_has_machine("virt")) {
+goto out;
+}
 g_string_append(args, " -machine virt");
 }
 
@@ -590,5 +593,6 @@ int main(int argc, char **argv)
 aarch64_numa_cpu);
 }
 
+out:
 return g_test_run();
 }
-- 
2.45.0




Re: [PATCH] target/riscv: Remove experimental prefix from "B" extension

2024-05-09 Thread Daniel Henrique Barboza




On 5/8/24 08:22, Andrew Jones wrote:

On Tue, May 07, 2024 at 11:27:21AM GMT, Rob Bradford wrote:

This extension has now been ratified:
https://jira.riscv.org/browse/RVS-2006 so the "x-" prefix can be
removed.

Signed-off-by: Rob Bradford 
---
  target/riscv/cpu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index eb1a2e7d6d..861d9f4350 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1396,7 +1396,7 @@ static const MISAExtInfo misa_ext_info_arr[] = {
  MISA_EXT_INFO(RVJ, "x-j", "Dynamic translated languages"),
  MISA_EXT_INFO(RVV, "v", "Vector operations"),
  MISA_EXT_INFO(RVG, "g", "General purpose (IMAFD_Zicsr_Zifencei)"),
-MISA_EXT_INFO(RVB, "x-b", "Bit manipulation (Zba_Zbb_Zbs)")
+MISA_EXT_INFO(RVB, "b", "Bit manipulation (Zba_Zbb_Zbs)")
  };
  
  static void riscv_cpu_validate_misa_mxl(RISCVCPUClass *mcc)

--
2.44.0




Reviewed-by: Andrew Jones 

I think we should also either change the false to true for RVB in
misa_ext_cfgs[] or at least ensure RVB is set for the 'max' cpu
type in riscv_init_max_cpu_extensions().


I prefer if we keep misa_ext_cfgs[] as is. Changing the defaults in this array
will also change the defaults for rv64. IMO we should enable RVB manually in
riscv_init_max_cpu_extensions().

We already have some precedence for it: RVV is enabled in 'max' while is default
'false' for rv64.


Thanks,

Daniel




Thanks,
drew




[PATCH v5 8/8] hw/arm: xen: Enable use of grant mappings

2024-05-09 Thread Edgar E. Iglesias
From: "Edgar E. Iglesias" 

Signed-off-by: Edgar E. Iglesias 
Reviewed-by: Stefano Stabellini 
---
 hw/arm/xen_arm.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/arm/xen_arm.c b/hw/arm/xen_arm.c
index 15fa7dfa84..6fad829ede 100644
--- a/hw/arm/xen_arm.c
+++ b/hw/arm/xen_arm.c
@@ -125,6 +125,11 @@ static void xen_init_ram(MachineState *machine)
  GUEST_RAM1_BASE, ram_size[1]);
 memory_region_add_subregion(sysmem, GUEST_RAM1_BASE, &ram_hi);
 }
+
+/* Setup support for grants.  */
+memory_region_init_ram(&xen_grants, NULL, "xen.grants", block_len,
+   &error_fatal);
+memory_region_add_subregion(sysmem, XEN_GRANT_ADDR_OFF, &xen_grants);
 }
 
 void arch_handle_ioreq(XenIOState *state, ioreq_t *req)
-- 
2.40.1




[PATCH v5 1/8] xen: mapcache: Make MCACHE_BUCKET_SHIFT runtime configurable

2024-05-09 Thread Edgar E. Iglesias
From: "Edgar E. Iglesias" 

Make MCACHE_BUCKET_SHIFT runtime configurable per cache instance.

Signed-off-by: Edgar E. Iglesias 
Reviewed-by: Stefano Stabellini 
---
 hw/xen/xen-mapcache.c | 54 ++-
 1 file changed, 33 insertions(+), 21 deletions(-)

diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c
index fa6813b1ad..bc860f4373 100644
--- a/hw/xen/xen-mapcache.c
+++ b/hw/xen/xen-mapcache.c
@@ -23,13 +23,10 @@
 
 
 #if HOST_LONG_BITS == 32
-#  define MCACHE_BUCKET_SHIFT 16
 #  define MCACHE_MAX_SIZE (1UL<<31) /* 2GB Cap */
 #else
-#  define MCACHE_BUCKET_SHIFT 20
 #  define MCACHE_MAX_SIZE (1UL<<35) /* 32GB Cap */
 #endif
-#define MCACHE_BUCKET_SIZE (1UL << MCACHE_BUCKET_SHIFT)
 
 /* This is the size of the virtual address space reserve to QEMU that will not
  * be use by MapCache.
@@ -65,7 +62,8 @@ typedef struct MapCache {
 /* For most cases (>99.9%), the page address is the same. */
 MapCacheEntry *last_entry;
 unsigned long max_mcache_size;
-unsigned int mcache_bucket_shift;
+unsigned int bucket_shift;
+unsigned long bucket_size;
 
 phys_offset_to_gaddr_t phys_offset_to_gaddr;
 QemuMutex lock;
@@ -95,11 +93,14 @@ static inline int test_bits(int nr, int size, const 
unsigned long *addr)
 
 static MapCache *xen_map_cache_init_single(phys_offset_to_gaddr_t f,
void *opaque,
+   unsigned int bucket_shift,
unsigned long max_size)
 {
 unsigned long size;
 MapCache *mc;
 
+assert(bucket_shift >= XC_PAGE_SHIFT);
+
 mc = g_new0(MapCache, 1);
 
 mc->phys_offset_to_gaddr = f;
@@ -108,12 +109,14 @@ static MapCache 
*xen_map_cache_init_single(phys_offset_to_gaddr_t f,
 
 QTAILQ_INIT(&mc->locked_entries);
 
+mc->bucket_shift = bucket_shift;
+mc->bucket_size = 1UL << bucket_shift;
 mc->max_mcache_size = max_size;
 
 mc->nr_buckets =
 (((mc->max_mcache_size >> XC_PAGE_SHIFT) +
-  (1UL << (MCACHE_BUCKET_SHIFT - XC_PAGE_SHIFT)) - 1) >>
- (MCACHE_BUCKET_SHIFT - XC_PAGE_SHIFT));
+  (1UL << (bucket_shift - XC_PAGE_SHIFT)) - 1) >>
+ (bucket_shift - XC_PAGE_SHIFT));
 
 size = mc->nr_buckets * sizeof(MapCacheEntry);
 size = (size + XC_PAGE_SIZE - 1) & ~(XC_PAGE_SIZE - 1);
@@ -126,6 +129,13 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f, void 
*opaque)
 {
 struct rlimit rlimit_as;
 unsigned long max_mcache_size;
+unsigned int bucket_shift;
+
+if (HOST_LONG_BITS == 32) {
+bucket_shift = 16;
+} else {
+bucket_shift = 20;
+}
 
 if (geteuid() == 0) {
 rlimit_as.rlim_cur = RLIM_INFINITY;
@@ -146,7 +156,9 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f, void 
*opaque)
 }
 }
 
-mapcache = xen_map_cache_init_single(f, opaque, max_mcache_size);
+mapcache = xen_map_cache_init_single(f, opaque,
+ bucket_shift,
+ max_mcache_size);
 setrlimit(RLIMIT_AS, &rlimit_as);
 }
 
@@ -195,7 +207,7 @@ static void xen_remap_bucket(MapCache *mc,
 entry->valid_mapping = NULL;
 
 for (i = 0; i < nb_pfn; i++) {
-pfns[i] = (address_index << (MCACHE_BUCKET_SHIFT-XC_PAGE_SHIFT)) + i;
+pfns[i] = (address_index << (mc->bucket_shift - XC_PAGE_SHIFT)) + i;
 }
 
 /*
@@ -266,8 +278,8 @@ static uint8_t *xen_map_cache_unlocked(MapCache *mc,
 bool dummy = false;
 
 tryagain:
-address_index  = phys_addr >> MCACHE_BUCKET_SHIFT;
-address_offset = phys_addr & (MCACHE_BUCKET_SIZE - 1);
+address_index  = phys_addr >> mc->bucket_shift;
+address_offset = phys_addr & (mc->bucket_size - 1);
 
 trace_xen_map_cache(phys_addr);
 
@@ -294,14 +306,14 @@ tryagain:
 return mc->last_entry->vaddr_base + address_offset;
 }
 
-/* size is always a multiple of MCACHE_BUCKET_SIZE */
+/* size is always a multiple of mc->bucket_size */
 if (size) {
 cache_size = size + address_offset;
-if (cache_size % MCACHE_BUCKET_SIZE) {
-cache_size += MCACHE_BUCKET_SIZE - (cache_size % 
MCACHE_BUCKET_SIZE);
+if (cache_size % mc->bucket_size) {
+cache_size += mc->bucket_size - (cache_size % mc->bucket_size);
 }
 } else {
-cache_size = MCACHE_BUCKET_SIZE;
+cache_size = mc->bucket_size;
 }
 
 entry = &mc->entry[address_index % mc->nr_buckets];
@@ -422,7 +434,7 @@ static ram_addr_t 
xen_ram_addr_from_mapcache_single(MapCache *mc, void *ptr)
 trace_xen_ram_addr_from_mapcache_not_in_cache(ptr);
 raddr = RAM_ADDR_INVALID;
 } else {
-raddr = (reventry->paddr_index << MCACHE_BUCKET_SHIFT) +
+raddr = (reventry->paddr_index << mc->bucket_shift) +
  ((unsigned long) ptr - (unsigned long) entry->vaddr_base);
 }
 mapcache_unlock(mc);
@@ -585,8 +

[PATCH v5 4/8] softmmu: xen: Always pass offset + addr to xen_map_cache

2024-05-09 Thread Edgar E. Iglesias
From: "Edgar E. Iglesias" 

Always pass address with offset to xen_map_cache().
This is in preparation for support for grant mappings.

Since this is within a block that checks for offset == 0,
this has no functional changes.

Signed-off-by: Edgar E. Iglesias 
---
 system/physmem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/system/physmem.c b/system/physmem.c
index 342b7a8fd4..5e6257ef65 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -2230,7 +2230,8 @@ static void *qemu_ram_ptr_length(RAMBlock *block, 
ram_addr_t addr,
  * In that case just map the requested area.
  */
 if (block->offset == 0) {
-return xen_map_cache(block->mr, addr, len, lock, lock,
+return xen_map_cache(block->mr, block->offset + addr,
+ len, lock, lock,
  is_write);
 }
 
-- 
2.40.1




[PATCH v5 0/8] xen: Support grant mappings

2024-05-09 Thread Edgar E. Iglesias
From: "Edgar E. Iglesias" 

Hi,

Grant mappings are a mechanism in Xen for guests to grant each other
permissions to map and share pages. These grants can be temporary
so both map and unmaps must be respected. See here for more info:
https://github.com/xen-project/xen/blob/master/docs/misc/grant-tables.txt

Currently, the primary use-case for grants in QEMU, is with VirtIO backends.
Grant mappings will only work with models that use the address_space_map/unmap
interfaces, any other access will fail with appropriate error messages.

In response to feedback we got on v3, later version switch approach
from adding new MemoryRegion types and map/unmap hooks to instead reusing
the existing xen_map_cache() hooks (with extensions). Almost all of the
changes are now contained to the Xen modules.

This approach also refactors the mapcache to support multiple instances
(one for existing foreign mappings and another for grant mappings).

I've only enabled grants for the ARM PVH machine since that is what
I can currently test on.

Cheers,
Edgar

ChangeLog:

v4 -> v5:
* Compute grant_ref from address_index to xen_remap_bucket().
* Rename grant_is_write to is_write and comment on why foreign
  mappings don't yet use it.
* Remove unnecessary + mc->bucket_size - 1 in
  xen_invalidate_map_cache_entry_unlocked().
* Remove use of global mapcache in refactor of
  xen_replace_cache_entry_unlocked().
* Add error checking for xengnttab_unmap().
* Add assert in xen_replace_cache_entry_unlocked() against grant mappings.
* Fix memory leak when freeing first entry in mapcache buckets.
* Assert that bucket_shift is >= XC_PAGE_SHIFT when creating mapcache.
* Add missing use of xen_mr_is_memory() in hw/xen/xen-hvm-common.c.
* Rebase with master.

v3 -> v4:
* Reuse existing xen_map_cache hooks.
* Reuse existing map-cache for both foreign and grant mappings.
* Only enable grants for the ARM PVH machine (removed i386).

v2 -> v3:
* Drop patch 1/7. This was done because device unplug is an x86-only case.
* Add missing qemu_mutex_unlock() before return.

v1 -> v2:
* Split patch 2/7 to keep phymem.c changes in a separate.
* In patch "xen: add map and unmap callbacks for grant" add check for total
  allowed grant < XEN_MAX_VIRTIO_GRANTS.
* Fix formatting issues and re-based with master latest.


Edgar E. Iglesias (8):
  xen: mapcache: Make MCACHE_BUCKET_SHIFT runtime configurable
  xen: mapcache: Unmap first entries in buckets
  xen: Add xen_mr_is_memory()
  softmmu: xen: Always pass offset + addr to xen_map_cache
  softmmu: Replace check for RAMBlock offset 0 with xen_mr_is_memory
  xen: mapcache: Pass the ram_addr offset to xen_map_cache()
  xen: mapcache: Add support for grant mappings
  hw/arm: xen: Enable use of grant mappings

 hw/arm/xen_arm.c|   5 +
 hw/xen/xen-hvm-common.c |  18 ++-
 hw/xen/xen-mapcache.c   | 232 
 include/hw/xen/xen-hvm-common.h |   3 +
 include/sysemu/xen-mapcache.h   |   2 +
 include/sysemu/xen.h|  15 +++
 system/physmem.c|  12 +-
 7 files changed, 226 insertions(+), 61 deletions(-)

-- 
2.40.1




[PATCH v5 7/8] xen: mapcache: Add support for grant mappings

2024-05-09 Thread Edgar E. Iglesias
From: "Edgar E. Iglesias" 

Add a second mapcache for grant mappings. The mapcache for
grants needs to work with XC_PAGE_SIZE granularity since
we can't map larger ranges than what has been granted to us.

Like with foreign mappings (xen_memory), machines using grants
are expected to initialize the xen_grants MR and map it
into their address-map accordingly.

Signed-off-by: Edgar E. Iglesias 
---
 hw/xen/xen-hvm-common.c |  12 ++-
 hw/xen/xen-mapcache.c   | 163 ++--
 include/hw/xen/xen-hvm-common.h |   3 +
 include/sysemu/xen.h|   7 ++
 4 files changed, 152 insertions(+), 33 deletions(-)

diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
index c94f1990c5..7a1e2ce4b3 100644
--- a/hw/xen/xen-hvm-common.c
+++ b/hw/xen/xen-hvm-common.c
@@ -10,12 +10,18 @@
 #include "hw/boards.h"
 #include "hw/xen/arch_hvm.h"
 
-MemoryRegion xen_memory;
+MemoryRegion xen_memory, xen_grants;
 
-/* Check for xen memory.  */
+/* Check for any kind of xen memory, foreign mappings or grants.  */
 bool xen_mr_is_memory(MemoryRegion *mr)
 {
-return mr == &xen_memory;
+return mr == &xen_memory || mr == &xen_grants;
+}
+
+/* Check specifically for grants.  */
+bool xen_mr_is_grants(MemoryRegion *mr)
+{
+return mr == &xen_grants;
 }
 
 void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr,
diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c
index 26bc38a9e3..25041ab02d 100644
--- a/hw/xen/xen-mapcache.c
+++ b/hw/xen/xen-mapcache.c
@@ -14,6 +14,7 @@
 
 #include 
 
+#include "hw/xen/xen-hvm-common.h"
 #include "hw/xen/xen_native.h"
 #include "qemu/bitmap.h"
 
@@ -21,6 +22,8 @@
 #include "sysemu/xen-mapcache.h"
 #include "trace.h"
 
+#include 
+#include 
 
 #if HOST_LONG_BITS == 32
 #  define MCACHE_MAX_SIZE (1UL<<31) /* 2GB Cap */
@@ -41,6 +44,7 @@ typedef struct MapCacheEntry {
 unsigned long *valid_mapping;
 uint32_t lock;
 #define XEN_MAPCACHE_ENTRY_DUMMY (1 << 0)
+#define XEN_MAPCACHE_ENTRY_GRANT (1 << 1)
 uint8_t flags;
 hwaddr size;
 struct MapCacheEntry *next;
@@ -71,6 +75,8 @@ typedef struct MapCache {
 } MapCache;
 
 static MapCache *mapcache;
+static MapCache *mapcache_grants;
+static xengnttab_handle *xen_region_gnttabdev;
 
 static inline void mapcache_lock(MapCache *mc)
 {
@@ -131,6 +137,12 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f, void 
*opaque)
 unsigned long max_mcache_size;
 unsigned int bucket_shift;
 
+xen_region_gnttabdev = xengnttab_open(NULL, 0);
+if (xen_region_gnttabdev == NULL) {
+error_report("mapcache: Failed to open gnttab device");
+exit(EXIT_FAILURE);
+}
+
 if (HOST_LONG_BITS == 32) {
 bucket_shift = 16;
 } else {
@@ -159,6 +171,15 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f, void 
*opaque)
 mapcache = xen_map_cache_init_single(f, opaque,
  bucket_shift,
  max_mcache_size);
+
+/*
+ * Grant mappings must use XC_PAGE_SIZE granularity since we can't
+ * map anything beyond the number of pages granted to us.
+ */
+mapcache_grants = xen_map_cache_init_single(f, opaque,
+XC_PAGE_SHIFT,
+max_mcache_size);
+
 setrlimit(RLIMIT_AS, &rlimit_as);
 }
 
@@ -168,17 +189,24 @@ static void xen_remap_bucket(MapCache *mc,
  hwaddr size,
  hwaddr address_index,
  bool dummy,
+ bool grant,
+ bool is_write,
  ram_addr_t ram_offset)
 {
 uint8_t *vaddr_base;
-xen_pfn_t *pfns;
+uint32_t *refs = NULL;
+xen_pfn_t *pfns = NULL;
 int *err;
 unsigned int i;
 hwaddr nb_pfn = size >> XC_PAGE_SHIFT;
 
 trace_xen_remap_bucket(address_index);
 
-pfns = g_new0(xen_pfn_t, nb_pfn);
+if (grant) {
+refs = g_new0(uint32_t, nb_pfn);
+} else {
+pfns = g_new0(xen_pfn_t, nb_pfn);
+}
 err = g_new0(int, nb_pfn);
 
 if (entry->vaddr_base != NULL) {
@@ -207,21 +235,51 @@ static void xen_remap_bucket(MapCache *mc,
 g_free(entry->valid_mapping);
 entry->valid_mapping = NULL;
 
-for (i = 0; i < nb_pfn; i++) {
-pfns[i] = (address_index << (mc->bucket_shift - XC_PAGE_SHIFT)) + i;
+if (grant) {
+hwaddr grant_base = address_index - (ram_offset >> XC_PAGE_SHIFT);
+
+for (i = 0; i < nb_pfn; i++) {
+refs[i] = grant_base + i;
+}
+} else {
+for (i = 0; i < nb_pfn; i++) {
+pfns[i] = (address_index << (mc->bucket_shift - XC_PAGE_SHIFT)) + 
i;
+}
 }
 
-/*
- * If the caller has requested the mapping at a specific address use
- * MAP_FIXED to make sure it's honored.
- */
+entry->flags &= ~XEN_MAPCACHE_ENTRY_GRANT;
+
 if (!

[PATCH v5 3/8] xen: Add xen_mr_is_memory()

2024-05-09 Thread Edgar E. Iglesias
From: "Edgar E. Iglesias" 

Add xen_mr_is_memory() to abstract away tests for the
xen_memory MR.

No functional changes.

Signed-off-by: Edgar E. Iglesias 
Reviewed-by: Stefano Stabellini 
Acked-by: David Hildenbrand 
---
 hw/xen/xen-hvm-common.c | 10 --
 include/sysemu/xen.h|  8 
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
index 1627da7398..c94f1990c5 100644
--- a/hw/xen/xen-hvm-common.c
+++ b/hw/xen/xen-hvm-common.c
@@ -12,6 +12,12 @@
 
 MemoryRegion xen_memory;
 
+/* Check for xen memory.  */
+bool xen_mr_is_memory(MemoryRegion *mr)
+{
+return mr == &xen_memory;
+}
+
 void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr,
Error **errp)
 {
@@ -28,7 +34,7 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, 
MemoryRegion *mr,
 return;
 }
 
-if (mr == &xen_memory) {
+if (xen_mr_is_memory(mr)) {
 return;
 }
 
@@ -55,7 +61,7 @@ static void xen_set_memory(struct MemoryListener *listener,
 {
 XenIOState *state = container_of(listener, XenIOState, memory_listener);
 
-if (section->mr == &xen_memory) {
+if (xen_mr_is_memory(section->mr)) {
 return;
 } else {
 if (add) {
diff --git a/include/sysemu/xen.h b/include/sysemu/xen.h
index 754ec2e6cb..dc72f83bcb 100644
--- a/include/sysemu/xen.h
+++ b/include/sysemu/xen.h
@@ -34,6 +34,8 @@ void xen_hvm_modified_memory(ram_addr_t start, ram_addr_t 
length);
 void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size,
struct MemoryRegion *mr, Error **errp);
 
+bool xen_mr_is_memory(MemoryRegion *mr);
+
 #else /* !CONFIG_XEN_IS_POSSIBLE */
 
 #define xen_enabled() 0
@@ -47,6 +49,12 @@ static inline void xen_ram_alloc(ram_addr_t ram_addr, 
ram_addr_t size,
 g_assert_not_reached();
 }
 
+static inline bool xen_mr_is_memory(MemoryRegion *mr)
+{
+g_assert_not_reached();
+return false;
+}
+
 #endif /* CONFIG_XEN_IS_POSSIBLE */
 
 #endif
-- 
2.40.1




[PATCH v5 2/8] xen: mapcache: Unmap first entries in buckets

2024-05-09 Thread Edgar E. Iglesias
From: "Edgar E. Iglesias" 

When invalidating memory ranges, if we happen to hit the first
entry in a bucket we were never unmapping it. This was harmless
for foreign mappings but now that we're looking to reuse the
mapcache for transient grant mappings, we must unmap entries
when invalidated.

Signed-off-by: Edgar E. Iglesias 
Reviewed-by: Stefano Stabellini 
---
 hw/xen/xen-mapcache.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c
index bc860f4373..ec95445696 100644
--- a/hw/xen/xen-mapcache.c
+++ b/hw/xen/xen-mapcache.c
@@ -491,18 +491,23 @@ static void 
xen_invalidate_map_cache_entry_unlocked(MapCache *mc,
 return;
 }
 entry->lock--;
-if (entry->lock > 0 || pentry == NULL) {
+if (entry->lock > 0) {
 return;
 }
 
-pentry->next = entry->next;
 ram_block_notify_remove(entry->vaddr_base, entry->size, entry->size);
 if (munmap(entry->vaddr_base, entry->size) != 0) {
 perror("unmap fails");
 exit(-1);
 }
+
 g_free(entry->valid_mapping);
-g_free(entry);
+if (pentry) {
+pentry->next = entry->next;
+g_free(entry);
+} else {
+memset(entry, 0, sizeof *entry);
+}
 }
 
 typedef struct XenMapCacheData {
-- 
2.40.1




[PATCH v5 6/8] xen: mapcache: Pass the ram_addr offset to xen_map_cache()

2024-05-09 Thread Edgar E. Iglesias
From: "Edgar E. Iglesias" 

Pass the ram_addr offset to xen_map_cache.
This is in preparation for adding grant mappings that need
to compute the address within the RAMBlock.

No functional changes.

Signed-off-by: Edgar E. Iglesias 
---
 hw/xen/xen-mapcache.c | 16 +++-
 include/sysemu/xen-mapcache.h |  2 ++
 system/physmem.c  |  9 +
 3 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c
index ec95445696..26bc38a9e3 100644
--- a/hw/xen/xen-mapcache.c
+++ b/hw/xen/xen-mapcache.c
@@ -167,7 +167,8 @@ static void xen_remap_bucket(MapCache *mc,
  void *vaddr,
  hwaddr size,
  hwaddr address_index,
- bool dummy)
+ bool dummy,
+ ram_addr_t ram_offset)
 {
 uint8_t *vaddr_base;
 xen_pfn_t *pfns;
@@ -266,6 +267,7 @@ static void xen_remap_bucket(MapCache *mc,
 
 static uint8_t *xen_map_cache_unlocked(MapCache *mc,
hwaddr phys_addr, hwaddr size,
+   ram_addr_t ram_offset,
uint8_t lock, bool dma, bool is_write)
 {
 MapCacheEntry *entry, *pentry = NULL,
@@ -337,14 +339,16 @@ tryagain:
 if (!entry) {
 entry = g_new0(MapCacheEntry, 1);
 pentry->next = entry;
-xen_remap_bucket(mc, entry, NULL, cache_size, address_index, dummy);
+xen_remap_bucket(mc, entry, NULL, cache_size, address_index, dummy,
+ ram_offset);
 } else if (!entry->lock) {
 if (!entry->vaddr_base || entry->paddr_index != address_index ||
 entry->size != cache_size ||
 !test_bits(address_offset >> XC_PAGE_SHIFT,
 test_bit_size >> XC_PAGE_SHIFT,
 entry->valid_mapping)) {
-xen_remap_bucket(mc, entry, NULL, cache_size, address_index, 
dummy);
+xen_remap_bucket(mc, entry, NULL, cache_size, address_index, dummy,
+ ram_offset);
 }
 }
 
@@ -391,13 +395,15 @@ tryagain:
 
 uint8_t *xen_map_cache(MemoryRegion *mr,
hwaddr phys_addr, hwaddr size,
+   ram_addr_t ram_addr_offset,
uint8_t lock, bool dma,
bool is_write)
 {
 uint8_t *p;
 
 mapcache_lock(mapcache);
-p = xen_map_cache_unlocked(mapcache, phys_addr, size, lock, dma, is_write);
+p = xen_map_cache_unlocked(mapcache, phys_addr, size, ram_addr_offset,
+   lock, dma, is_write);
 mapcache_unlock(mapcache);
 return p;
 }
@@ -632,7 +638,7 @@ static uint8_t *xen_replace_cache_entry_unlocked(MapCache 
*mc,
 trace_xen_replace_cache_entry_dummy(old_phys_addr, new_phys_addr);
 
 xen_remap_bucket(mc, entry, entry->vaddr_base,
- cache_size, address_index, false);
+ cache_size, address_index, false, new_phys_addr);
 if (!test_bits(address_offset >> XC_PAGE_SHIFT,
 test_bit_size >> XC_PAGE_SHIFT,
 entry->valid_mapping)) {
diff --git a/include/sysemu/xen-mapcache.h b/include/sysemu/xen-mapcache.h
index 1ec9e66752..b5e3ea1bc0 100644
--- a/include/sysemu/xen-mapcache.h
+++ b/include/sysemu/xen-mapcache.h
@@ -19,6 +19,7 @@ typedef hwaddr (*phys_offset_to_gaddr_t)(hwaddr phys_offset,
 void xen_map_cache_init(phys_offset_to_gaddr_t f,
 void *opaque);
 uint8_t *xen_map_cache(MemoryRegion *mr, hwaddr phys_addr, hwaddr size,
+   ram_addr_t ram_addr_offset,
uint8_t lock, bool dma,
bool is_write);
 ram_addr_t xen_ram_addr_from_mapcache(void *ptr);
@@ -37,6 +38,7 @@ static inline void xen_map_cache_init(phys_offset_to_gaddr_t 
f,
 static inline uint8_t *xen_map_cache(MemoryRegion *mr,
  hwaddr phys_addr,
  hwaddr size,
+ ram_addr_t ram_addr_offset,
  uint8_t lock,
  bool dma,
  bool is_write)
diff --git a/system/physmem.c b/system/physmem.c
index b7847db1a2..33d09f7571 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -2231,13 +2231,14 @@ static void *qemu_ram_ptr_length(RAMBlock *block, 
ram_addr_t addr,
  */
 if (xen_mr_is_memory(block->mr)) {
 return xen_map_cache(block->mr, block->offset + addr,
- len, lock, lock,
- is_write);
+ len, block->offset,
+ lock, lock, is_write);
 }
 
 block->host = xen_map_cache(block->mr, block->offset,
- 

[PATCH v5 5/8] softmmu: Replace check for RAMBlock offset 0 with xen_mr_is_memory

2024-05-09 Thread Edgar E. Iglesias
From: "Edgar E. Iglesias" 

For xen, when checking for the first RAM (xen_memory), use
xen_mr_is_memory() rather than checking for a RAMBlock with
offset 0.

All Xen machines create xen_memory first so this has no
functional change for existing machines.

Signed-off-by: Edgar E. Iglesias 
---
 system/physmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/system/physmem.c b/system/physmem.c
index 5e6257ef65..b7847db1a2 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -2229,7 +2229,7 @@ static void *qemu_ram_ptr_length(RAMBlock *block, 
ram_addr_t addr,
  * because we don't want to map the entire memory in QEMU.
  * In that case just map the requested area.
  */
-if (block->offset == 0) {
+if (xen_mr_is_memory(block->mr)) {
 return xen_map_cache(block->mr, block->offset + addr,
  len, lock, lock,
  is_write);
-- 
2.40.1




Re: [PATCH V1 21/26] migration: migrate_add_blocker_mode

2024-05-09 Thread Fabiano Rosas
Steve Sistare  writes:

> Define a convenience function to add a migration blocker for a single mode.
>
> Signed-off-by: Steve Sistare 

Reviewed-by: Fabiano Rosas 



Re: hw/usb/hcd-ohci: Fix #1510, #303: pid not IN or OUT

2024-05-09 Thread Peter Maydell
On Wed, 8 May 2024 at 16:29, Cord Amfmgm  wrote:
> On Wed, May 8, 2024 at 3:45 AM Thomas Huth  wrote:
>>
>> Your Signed-off-by line does not match the From: line ... could you please
>> fix this? (see
>> https://www.qemu.org/docs/master/devel/submitting-a-patch.html#patch-emails-must-include-a-signed-off-by-line
>> , too)
>
>
> I'll submit the new patch request with my pseudonym in the From: and 
> Signed-off-by: lines, per your request. Doesn't matter to me. However, this 
> arises simply because I don't give gmail my real name - 
> https://en.wikipedia.org/wiki/Nymwars

I'm confused now. Of the two names you've used in this
patch (Cord Amfmgm and David Hubbard), are they both
pseudonyms, or is one a pseudonym and one your real name?

thanks
-- PMM



Re: [PATCH V1 22/26] migration: ram block cpr-exec blockers

2024-05-09 Thread Fabiano Rosas
Steve Sistare  writes:

> Unlike cpr-reboot mode, cpr-exec mode cannot save volatile ram blocks in the
> migration stream file and recreate them later, because the physical memory for
> the blocks is pinned and registered for vfio.  Add an exec-mode blocker for
> volatile ram blocks.
>
> Also add a blocker for RAM_GUEST_MEMFD.  Preserving guest_memfd may be
> sufficient for cpr-exec, but it has not been tested yet.
>
> - Steve

extra text here

>
> Signed-off-by: Steve Sistare 

Reviewed-by: Fabiano Rosas 




Re: [PATCH V1 23/26] migration: misc cpr-exec blockers

2024-05-09 Thread Fabiano Rosas
Steve Sistare  writes:

> Add blockers for cpr-exec migration mode for devices and options that do
> not support it.
>
> Signed-off-by: Steve Sistare 

Reviewed-by: Fabiano Rosas 



Re: [PATCH V1 24/26] seccomp: cpr-exec blocker

2024-05-09 Thread Fabiano Rosas
Steve Sistare  writes:

> cpr-exec mode needs permission to exec.  Block it if permission is denied.
>
> Signed-off-by: Steve Sistare 

Reviewed-by: Fabiano Rosas 



Re: hw/usb/hcd-ohci: Fix #1510, #303: pid not IN or OUT

2024-05-09 Thread Cord Amfmgm
On Thu, May 9, 2024 at 12:48 PM Peter Maydell 
wrote:

> On Wed, 8 May 2024 at 16:29, Cord Amfmgm  wrote:
> > On Wed, May 8, 2024 at 3:45 AM Thomas Huth  wrote:
> >>
> >> Your Signed-off-by line does not match the From: line ... could you
> please
> >> fix this? (see
> >>
> https://www.qemu.org/docs/master/devel/submitting-a-patch.html#patch-emails-must-include-a-signed-off-by-line
> >> , too)
> >
> >
> > I'll submit the new patch request with my pseudonym in the From: and
> Signed-off-by: lines, per your request. Doesn't matter to me. However, this
> arises simply because I don't give gmail my real name -
> https://en.wikipedia.org/wiki/Nymwars
>
> I'm confused now. Of the two names you've used in this
> patch (Cord Amfmgm and David Hubbard), are they both
> pseudonyms, or is one a pseudonym and one your real name?
>
>
Hi Peter,

I am attempting to submit a small patch. For context, I'm getting broader
attention now because apparently OHCI is one of the less used components of
qemu and maybe the review process was taking a while. That's relevant
because I wasn't able to get prompt feedback and am now choosing what
appears to be the most expeditious approach -- all I want is to get this
patch done and be out of your hair. If Thomas Huth wants me to use a
consistent name, have I not complied? Are you asking out of curiosity or is
there a valid reason why I should answer your question in order to get the
patch submitted? Would you like to have a friendly chat over virtual coffee
sometime (but off-list)?

If you could please clarify I'm sure the answer is an easy one.

Thanks


Re: [PATCH 1/9] block: add persistent reservation in/out api

2024-05-09 Thread Stefan Hajnoczi
On Wed, May 08, 2024 at 05:36:21PM +0800, Changqi Lu wrote:
> Add persistent reservation in/out operations
> at the block level. The following operations
> are included:
> 
> - read_keys:retrieves the list of registered keys.
> - read_reservation: retrieves the current reservation status.
> - register: registers a new reservation key.
> - reserve:  initiates a reservation for a specific key.
> - release:  releases a reservation for a specific key.
> - clear:clears all existing reservations.
> - preempt:  preempts a reservation held by another key.
> 
> Signed-off-by: Changqi Lu 
> Signed-off-by: zhenwei pi 
> ---
>  block/block-backend.c | 386 ++
>  block/io.c| 161 +
>  include/block/block-common.h  |   9 +
>  include/block/block-io.h  |  19 ++
>  include/block/block_int-common.h  |  31 +++
>  include/sysemu/block-backend-io.h |  22 ++
>  6 files changed, 628 insertions(+)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index db6f9b92a3..ec67937d28 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1770,6 +1770,392 @@ BlockAIOCB *blk_aio_ioctl(BlockBackend *blk, unsigned 
> long int req, void *buf,
>  return blk_aio_prwv(blk, req, 0, buf, blk_aio_ioctl_entry, 0, cb, 
> opaque);
>  }
>  
> +typedef struct BlkPrInCo {
> +BlockBackend *blk;
> +uint32_t *generation;
> +uint32_t num_keys;
> +BlockPrType *type;
> +uint64_t *keys;
> +int ret;
> +} BlkPrInCo;
> +
> +typedef struct BlkPrInCB {
> +BlockAIOCB common;
> +BlkPrInCo prco;
> +bool has_returned;
> +} BlkPrInCB;
> +
> +static const AIOCBInfo blk_pr_in_aiocb_info = {
> +.aiocb_size = sizeof(BlkPrInCB),
> +};
> +
> +static void blk_pr_in_complete(BlkPrInCB *acb)
> +{
> +if (acb->has_returned) {
> +acb->common.cb(acb->common.opaque, acb->prco.ret);
> +blk_dec_in_flight(acb->prco.blk);

Please add a comment identifying which blk_inc_in_flight() call this dec
is paired with. That makes it easier for people trying to understand
in-flight reference counting.

> +qemu_aio_unref(acb);
> +}
> +}
> +
> +static void blk_pr_in_complete_bh(void *opaque)
> +{
> +BlkPrInCB *acb = opaque;
> +assert(acb->has_returned);
> +blk_pr_in_complete(acb);
> +}
> +
> +static BlockAIOCB *blk_aio_pr_in(BlockBackend *blk, uint32_t *generation,
> + uint32_t num_keys, BlockPrType *type,
> + uint64_t *keys, CoroutineEntry co_entry,
> + BlockCompletionFunc *cb, void *opaque)
> +{
> +BlkPrInCB *acb;
> +Coroutine *co;
> +
> +blk_inc_in_flight(blk);

Please add a comment identifying which blk_dec_in_flight() call this inc
is paired with.

> +acb = blk_aio_get(&blk_pr_in_aiocb_info, blk, cb, opaque);
> +acb->prco = (BlkPrInCo) {
> +.blk= blk,
> +.generation = generation,
> +.num_keys   = num_keys,
> +.type   = type,
> +.ret= NOT_DONE,
> +.keys   = keys,
> +};
> +acb->has_returned = false;
> +
> +co = qemu_coroutine_create(co_entry, acb);
> +aio_co_enter(qemu_get_current_aio_context(), co);
> +
> +acb->has_returned = true;
> +if (acb->prco.ret != NOT_DONE) {
> +replay_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
> + blk_pr_in_complete_bh, acb);
> +}
> +
> +return &acb->common;
> +}
> +
> +
> +static int coroutine_fn
> +blk_aio_pr_do_read_keys(BlockBackend *blk, uint32_t *generation,
> +uint32_t num_keys, uint64_t *keys)
> +{
> +IO_CODE();
> +
> +blk_wait_while_drained(blk);
> +GRAPH_RDLOCK_GUARD();
> +
> +if (!blk_co_is_available(blk)) {
> +return -ENOMEDIUM;
> +}
> +
> +return bdrv_co_pr_read_keys(blk_bs(blk), generation, num_keys, keys);
> +}
> +
> +static void coroutine_fn blk_aio_pr_read_keys_entry(void *opaque)
> +{
> +BlkPrInCB *acb = opaque;
> +BlkPrInCo *prco = &acb->prco;
> +
> +prco->ret = blk_aio_pr_do_read_keys(prco->blk, prco->generation,
> +prco->num_keys, prco->keys);
> +blk_pr_in_complete(acb);
> +}
> +
> +BlockAIOCB *blk_aio_pr_read_keys(BlockBackend *blk, uint32_t *generation,
> + uint32_t num_keys, uint64_t *keys,
> + BlockCompletionFunc *cb, void *opaque)
> +{
> +IO_CODE();
> +return blk_aio_pr_in(blk, generation, num_keys, NULL, keys,
> + blk_aio_pr_read_keys_entry, cb, opaque);
> +}
> +
> +
> +static int coroutine_fn
> +blk_aio_pr_do_read_reservation(BlockBackend *blk, uint32_t *generation,
> +   uint64_t *key, BlockPrType *type)
> +{
> +IO_CODE();
> +
> +blk_wait_while_drained(blk);
> + 

Re: [PATCH 2/9] block/raw: add persistent reservation in/out driver

2024-05-09 Thread Stefan Hajnoczi
On Wed, May 08, 2024 at 05:36:22PM +0800, Changqi Lu wrote:
> Add persistent reservation in/out operations for raw driver.
> The following methods are implemented: bdrv_co_pr_read_keys,
> bdrv_co_pr_read_reservation, bdrv_co_pr_register, bdrv_co_pr_reserve,
> bdrv_co_pr_release, bdrv_co_pr_clear and bdrv_co_pr_preempt.
> 
> Signed-off-by: Changqi Lu 
> Signed-off-by: zhenwei pi 
> ---
>  block/raw-format.c | 55 ++
>  1 file changed, 55 insertions(+)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH 3/9] scsi/constant: add persistent reservation in/out protocol constants

2024-05-09 Thread Stefan Hajnoczi
On Wed, May 08, 2024 at 05:36:23PM +0800, Changqi Lu wrote:
> Add constants for the persistent reservation in/out protocol
> in the scsi/constant module. The constants include the persistent
> reservation command, type, and scope values defined in sections
> 6.13 and 6.14 of the SCSI Primary Commands-4 (SPC-4) specification.
> 
> Signed-off-by: Changqi Lu 
> Signed-off-by: zhenwei pi 
> ---
>  include/scsi/constants.h | 29 +
>  1 file changed, 29 insertions(+)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH 4/9] scsi/util: add helper functions for persistent reservation types conversion

2024-05-09 Thread Stefan Hajnoczi
On Wed, May 08, 2024 at 05:36:24PM +0800, Changqi Lu wrote:
> This commit introduces two helper functions
> that facilitate the conversion between the
> persistent reservation types used in the SCSI
> protocol and those used in the block layer.
> 
> Signed-off-by: Changqi Lu 
> Signed-off-by: zhenwei pi 
> ---
>  include/scsi/utils.h |  5 +
>  scsi/utils.c | 40 
>  2 files changed, 45 insertions(+)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH V1 25/26] migration: fix mismatched GPAs during cpr-exec

2024-05-09 Thread Fabiano Rosas
Steve Sistare  writes:

> For cpr-exec mode, ramblock_is_ignored is always true, and the address of
> each migrated memory region must match the address of the statically
> initialized region on the target.  However, for a PCI rom block, the region
> address is set when the guest writes to a BAR on the source, which does not
> occur on the target, causing a "Mismatched GPAs" error during cpr-exec
> migration.
>
> To fix, unconditionally set the target's address to the source's address
> if the region does not have an address yet.
>
> Signed-off-by: Steve Sistare 

Just a detail below.

Reviewed-by: Fabiano Rosas 

> ---
>  include/exec/memory.h | 12 
>  migration/ram.c   | 15 +--
>  system/memory.c   | 10 --
>  3 files changed, 29 insertions(+), 8 deletions(-)
>
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index d337737..4f654b0 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -801,6 +801,7 @@ struct MemoryRegion {
>  bool unmergeable;
>  uint8_t dirty_log_mask;
>  bool is_iommu;
> +bool has_addr;

This field is not used during memory access, maybe move it down below to
preserve the hole for future usage.

>  RAMBlock *ram_block;
>  Object *owner;
>  /* owner as TYPE_DEVICE. Used for re-entrancy checks in MR access 
> hotpath */
> @@ -2402,6 +2403,17 @@ void memory_region_set_enabled(MemoryRegion *mr, bool 
> enabled);
>  void memory_region_set_address(MemoryRegion *mr, hwaddr addr);
>  
>  /*
> + * memory_region_set_address_only: set the address of a region.
> + *
> + * Same as memory_region_set_address, but without causing transaction side
> + * effects.
> + *
> + * @mr: the region to be updated
> + * @addr: new address, relative to container region
> + */
> +void memory_region_set_address_only(MemoryRegion *mr, hwaddr addr);
> +
> +/*
>   * memory_region_set_size: dynamically update the size of a region.
>   *
>   * Dynamically updates the size of a region.
> diff --git a/migration/ram.c b/migration/ram.c
> index add285b..7b8d7f6 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -4196,12 +4196,15 @@ static int parse_ramblock(QEMUFile *f, RAMBlock 
> *block, ram_addr_t length)
>  }
>  if (migrate_ignore_shared()) {
>  hwaddr addr = qemu_get_be64(f);
> -if (migrate_ram_is_ignored(block) &&
> -block->mr->addr != addr) {
> -error_report("Mismatched GPAs for block %s "
> - "%" PRId64 "!= %" PRId64, block->idstr,
> - (uint64_t)addr, (uint64_t)block->mr->addr);
> -return -EINVAL;
> +if (migrate_ram_is_ignored(block)) {
> +if (!block->mr->has_addr) {
> +memory_region_set_address_only(block->mr, addr);
> +} else if (block->mr->addr != addr) {
> +error_report("Mismatched GPAs for block %s "
> + "%" PRId64 "!= %" PRId64, block->idstr,
> + (uint64_t)addr, (uint64_t)block->mr->addr);
> +return -EINVAL;
> +}
>  }
>  }
>  ret = rdma_block_notification_handle(f, block->idstr);
> diff --git a/system/memory.c b/system/memory.c
> index ca04a0e..3c72504 100644
> --- a/system/memory.c
> +++ b/system/memory.c
> @@ -2665,7 +2665,7 @@ static void 
> memory_region_add_subregion_common(MemoryRegion *mr,
>  for (alias = subregion->alias; alias; alias = alias->alias) {
>  alias->mapped_via_alias++;
>  }
> -subregion->addr = offset;
> +memory_region_set_address_only(subregion, offset);
>  memory_region_update_container_subregions(subregion);
>  }
>  
> @@ -2745,10 +2745,16 @@ static void 
> memory_region_readd_subregion(MemoryRegion *mr)
>  }
>  }
>  
> +void memory_region_set_address_only(MemoryRegion *mr, hwaddr addr)
> +{
> +mr->addr = addr;
> +mr->has_addr = true;
> +}
> +
>  void memory_region_set_address(MemoryRegion *mr, hwaddr addr)
>  {
>  if (addr != mr->addr) {
> -mr->addr = addr;
> +memory_region_set_address_only(mr, addr);
>  memory_region_readd_subregion(mr);
>  }
>  }



Re: [PATCH 5/9] hw/scsi: add persistent reservation in/out api for scsi device

2024-05-09 Thread Stefan Hajnoczi
On Wed, May 08, 2024 at 05:36:25PM +0800, Changqi Lu wrote:
> Add persistent reservation in/out operations in the
> SCSI device layer. By introducing the persistent
> reservation in/out api, this enables the SCSI device
> to perform reservation-related tasks, including querying
> keys, querying reservation status, registering reservation
> keys, initiating and releasing reservations, as well as
> clearing and preempting reservations held by other keys.
> 
> These operations are crucial for management and control of
> shared storage resources in a persistent manner.
> 
> Signed-off-by: Changqi Lu 
> Signed-off-by: zhenwei pi 
> ---
>  hw/scsi/scsi-disk.c | 302 
>  1 file changed, 302 insertions(+)

Paolo: Please review for buffer overflows. I find the buffer size
assumption in the SCSI layer mysterious (e.g. scsi_req_get_buf() returns
a void* and it's not obvious how large the buffer is), so I didn't check
for buffer overflows.

> 
> diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
> index 4bd7af9d0c..bdd66c4026 100644
> --- a/hw/scsi/scsi-disk.c
> +++ b/hw/scsi/scsi-disk.c
> @@ -32,6 +32,7 @@
>  #include "migration/vmstate.h"
>  #include "hw/scsi/emulation.h"
>  #include "scsi/constants.h"
> +#include "scsi/utils.h"
>  #include "sysemu/block-backend.h"
>  #include "sysemu/blockdev.h"
>  #include "hw/block/block.h"
> @@ -1474,6 +1475,296 @@ static void scsi_disk_emulate_read_data(SCSIRequest 
> *req)
>  scsi_req_complete(&r->req, GOOD);
>  }
>  
> +typedef struct SCSIPrReadKeys {
> +uint32_t generation;
> +uint32_t num_keys;
> +uint64_t *keys;
> +void *req;
> +} SCSIPrReadKeys;
> +
> +typedef struct SCSIPrReadReservation {
> +uint32_t generation;
> +uint64_t key;
> +BlockPrType type;
> +void *req;
> +} SCSIPrReadReservation;
> +
> +static void scsi_pr_read_keys_complete(void *opaque, int ret)
> +{
> +int num_keys;
> +uint8_t *buf;
> +SCSIPrReadKeys *blk_keys = (SCSIPrReadKeys *)opaque;
> +SCSIDiskReq *r = (SCSIDiskReq *)blk_keys->req;
> +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
> +
> +assert(blk_get_aio_context(s->qdev.conf.blk) ==
> +qemu_get_current_aio_context());
> +
> +assert(r->req.aiocb != NULL);
> +r->req.aiocb = NULL;
> +
> +if (scsi_disk_req_check_error(r, ret, true)) {
> +goto done;
> +}
> +
> +buf = scsi_req_get_buf(&r->req);
> +num_keys = MIN(blk_keys->num_keys, ret);
> +blk_keys->generation = cpu_to_be32(blk_keys->generation);
> +memcpy(&buf[0], &blk_keys->generation, 4);
> +for (int i = 0; i < num_keys; i++) {
> +blk_keys->keys[i] = cpu_to_be64(blk_keys->keys[i]);
> +memcpy(&buf[8 + i * 8], &blk_keys->keys[i], 8);
> +}
> +num_keys = cpu_to_be32(num_keys * 8);
> +memcpy(&buf[4], &num_keys, 4);
> +
> +scsi_req_data(&r->req, r->buflen);
> +done:
> +scsi_req_unref(&r->req);
> +g_free(blk_keys->keys);
> +g_free(blk_keys);
> +}
> +
> +static int scsi_disk_emulate_pr_read_keys(SCSIRequest *req)
> +{
> +SCSIPrReadKeys *blk_keys;
> +SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
> +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, req->dev);
> +int buflen = MIN(r->req.cmd.xfer, r->buflen);
> +int num_keys = (buflen - sizeof(uint32_t) * 2) / sizeof(uint64_t);
> +
> +blk_keys = g_new0(SCSIPrReadKeys, 1);
> +blk_keys->generation = 0;
> +/* num_keys is the maximum number of keys that can be transmitted */
> +blk_keys->num_keys = num_keys;
> +blk_keys->keys = g_malloc(sizeof(uint64_t) * num_keys);
> +blk_keys->req = r;
> +
> +scsi_req_ref(&r->req);
> +r->req.aiocb = blk_aio_pr_read_keys(s->qdev.conf.blk, 
> &blk_keys->generation,
> +blk_keys->num_keys, blk_keys->keys,
> +scsi_pr_read_keys_complete, 
> blk_keys);
> +return 0;
> +}
> +
> +static void scsi_pr_read_reservation_complete(void *opaque, int ret)
> +{
> +uint8_t *buf;
> +uint32_t num_keys = 0;
> +SCSIPrReadReservation *blk_rsv = (SCSIPrReadReservation *)opaque;
> +SCSIDiskReq *r = (SCSIDiskReq *)blk_rsv->req;
> +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
> +
> +assert(blk_get_aio_context(s->qdev.conf.blk) ==
> +qemu_get_current_aio_context());
> +
> +assert(r->req.aiocb != NULL);
> +r->req.aiocb = NULL;
> +
> +if (scsi_disk_req_check_error(r, ret, true)) {
> +goto done;
> +}
> +
> +buf = scsi_req_get_buf(&r->req);
> +blk_rsv->generation = cpu_to_be32(blk_rsv->generation);
> +memcpy(&buf[0], &blk_rsv->generation, 4);
> +if (ret) {
> +num_keys = cpu_to_be32(16);

The SPC-6 calls this field Additional Length, which is clearer than
num_keys (there is only 1 key here!). Could you rename it to
additional_len to avoid confusion?

> +blk_rsv->key = cpu_to_be64(blk_rsv->key);
> 

Re: [PATCH 6/9] block/nvme: add reservation command protocol constants

2024-05-09 Thread Stefan Hajnoczi
On Wed, May 08, 2024 at 05:36:26PM +0800, Changqi Lu wrote:
> Add constants for the NVMe persistent command protocol.
> The constants include the reservation command opcode and
> reservation type values defined in section 7 of the NVMe
> 2.0 specification.
> 
> Signed-off-by: Changqi Lu 
> Signed-off-by: zhenwei pi 
> ---
>  include/block/nvme.h | 30 ++
>  1 file changed, 30 insertions(+)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH 7/9] hw/nvme: add helper functions for converting reservation types

2024-05-09 Thread Stefan Hajnoczi
On Wed, May 08, 2024 at 05:36:27PM +0800, Changqi Lu wrote:
> This commit introduces two helper functions
> that facilitate the conversion between the
> reservation types used in the NVME protocol
> and those used in the block layer.
> 
> Signed-off-by: Changqi Lu 
> Signed-off-by: zhenwei pi 
> ---
>  hw/nvme/nvme.h | 40 
>  1 file changed, 40 insertions(+)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH 0/9] Support persistent reservation operations

2024-05-09 Thread Stefan Hajnoczi
On Wed, May 08, 2024 at 05:36:20PM +0800, Changqi Lu wrote:
> Hi,
> 
> I am going to introduce persistent reservation for QEMU block.
> There are three parts in this series:
> 
> Firstly, at the block layer, the commit abstracts seven APIs related to
> the persistent reservation command. These APIs including reading keys,
> reading reservations, registering, reserving, releasing, clearing and 
> preempting.
> 
> Next, the commit implements the necessary pr-related operation APIs for both 
> the
> SCSI protocol and NVMe protocol at the device layer. This ensures that the 
> necessary
> functionality is available for handling persistent reservations in these 
> protocols.
> 
> Finally, the commit includes adaptations to the iscsi driver at the driver 
> layer
> to verify the correct implementation and functionality of the changes.
> 
> With these changes, GFS works fine in the guest. Also, sg-utils(for SCSI 
> block) and
> nvme-cli(for NVMe block) work fine too.

What is the relationship to the existing PRManager functionality
(docs/interop/pr-helper.rst) where block/file-posix.c interprets SCSI
ioctls and sends persistent reservation requests to an external helper
process?

I wonder if block/file-posix.c can implement the new block driver
callbacks using pr_mgr (while keeping the existing scsi-generic
support).

Stefan


signature.asc
Description: PGP signature


Re: [PATCH 5/9] hw/scsi: add persistent reservation in/out api for scsi device

2024-05-09 Thread Stefan Hajnoczi
On Wed, May 08, 2024 at 05:36:25PM +0800, Changqi Lu wrote:
> Add persistent reservation in/out operations in the
> SCSI device layer. By introducing the persistent
> reservation in/out api, this enables the SCSI device
> to perform reservation-related tasks, including querying
> keys, querying reservation status, registering reservation
> keys, initiating and releasing reservations, as well as
> clearing and preempting reservations held by other keys.
> 
> These operations are crucial for management and control of
> shared storage resources in a persistent manner.
> 
> Signed-off-by: Changqi Lu 
> Signed-off-by: zhenwei pi 
> ---
>  hw/scsi/scsi-disk.c | 302 
>  1 file changed, 302 insertions(+)

Does sg_persist --report-capabilities
(https://linux.die.net/man/8/sg_persist) show that persistent
reservations are supported? Maybe some INQUIRY data still needs to be
added.

> 
> diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
> index 4bd7af9d0c..bdd66c4026 100644
> --- a/hw/scsi/scsi-disk.c
> +++ b/hw/scsi/scsi-disk.c
> @@ -32,6 +32,7 @@
>  #include "migration/vmstate.h"
>  #include "hw/scsi/emulation.h"
>  #include "scsi/constants.h"
> +#include "scsi/utils.h"
>  #include "sysemu/block-backend.h"
>  #include "sysemu/blockdev.h"
>  #include "hw/block/block.h"
> @@ -1474,6 +1475,296 @@ static void scsi_disk_emulate_read_data(SCSIRequest 
> *req)
>  scsi_req_complete(&r->req, GOOD);
>  }
>  
> +typedef struct SCSIPrReadKeys {
> +uint32_t generation;
> +uint32_t num_keys;
> +uint64_t *keys;
> +void *req;
> +} SCSIPrReadKeys;
> +
> +typedef struct SCSIPrReadReservation {
> +uint32_t generation;
> +uint64_t key;
> +BlockPrType type;
> +void *req;
> +} SCSIPrReadReservation;
> +
> +static void scsi_pr_read_keys_complete(void *opaque, int ret)
> +{
> +int num_keys;
> +uint8_t *buf;
> +SCSIPrReadKeys *blk_keys = (SCSIPrReadKeys *)opaque;
> +SCSIDiskReq *r = (SCSIDiskReq *)blk_keys->req;
> +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
> +
> +assert(blk_get_aio_context(s->qdev.conf.blk) ==
> +qemu_get_current_aio_context());
> +
> +assert(r->req.aiocb != NULL);
> +r->req.aiocb = NULL;
> +
> +if (scsi_disk_req_check_error(r, ret, true)) {
> +goto done;
> +}
> +
> +buf = scsi_req_get_buf(&r->req);
> +num_keys = MIN(blk_keys->num_keys, ret);
> +blk_keys->generation = cpu_to_be32(blk_keys->generation);
> +memcpy(&buf[0], &blk_keys->generation, 4);
> +for (int i = 0; i < num_keys; i++) {
> +blk_keys->keys[i] = cpu_to_be64(blk_keys->keys[i]);
> +memcpy(&buf[8 + i * 8], &blk_keys->keys[i], 8);
> +}
> +num_keys = cpu_to_be32(num_keys * 8);
> +memcpy(&buf[4], &num_keys, 4);
> +
> +scsi_req_data(&r->req, r->buflen);
> +done:
> +scsi_req_unref(&r->req);
> +g_free(blk_keys->keys);
> +g_free(blk_keys);
> +}
> +
> +static int scsi_disk_emulate_pr_read_keys(SCSIRequest *req)
> +{
> +SCSIPrReadKeys *blk_keys;
> +SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
> +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, req->dev);
> +int buflen = MIN(r->req.cmd.xfer, r->buflen);
> +int num_keys = (buflen - sizeof(uint32_t) * 2) / sizeof(uint64_t);
> +
> +blk_keys = g_new0(SCSIPrReadKeys, 1);
> +blk_keys->generation = 0;
> +/* num_keys is the maximum number of keys that can be transmitted */
> +blk_keys->num_keys = num_keys;
> +blk_keys->keys = g_malloc(sizeof(uint64_t) * num_keys);
> +blk_keys->req = r;
> +
> +scsi_req_ref(&r->req);
> +r->req.aiocb = blk_aio_pr_read_keys(s->qdev.conf.blk, 
> &blk_keys->generation,
> +blk_keys->num_keys, blk_keys->keys,
> +scsi_pr_read_keys_complete, 
> blk_keys);
> +return 0;
> +}
> +
> +static void scsi_pr_read_reservation_complete(void *opaque, int ret)
> +{
> +uint8_t *buf;
> +uint32_t num_keys = 0;
> +SCSIPrReadReservation *blk_rsv = (SCSIPrReadReservation *)opaque;
> +SCSIDiskReq *r = (SCSIDiskReq *)blk_rsv->req;
> +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
> +
> +assert(blk_get_aio_context(s->qdev.conf.blk) ==
> +qemu_get_current_aio_context());
> +
> +assert(r->req.aiocb != NULL);
> +r->req.aiocb = NULL;
> +
> +if (scsi_disk_req_check_error(r, ret, true)) {
> +goto done;
> +}
> +
> +buf = scsi_req_get_buf(&r->req);
> +blk_rsv->generation = cpu_to_be32(blk_rsv->generation);
> +memcpy(&buf[0], &blk_rsv->generation, 4);
> +if (ret) {
> +num_keys = cpu_to_be32(16);
> +blk_rsv->key = cpu_to_be64(blk_rsv->key);
> +memcpy(&buf[8], &blk_rsv->key, 8);
> +buf[21] = block_pr_type_to_scsi(blk_rsv->type) & 0xf;
> +} else {
> +num_keys = cpu_to_be32(0);
> +}
> +
> +memcpy(&buf[4], &num_keys, 4);
> +sc

[PATCH] vfio: container: Fix missing allocation of VFIOSpaprContainer

2024-05-09 Thread Shivaprasad G Bhat
The commit 6ad359ec29 "(vfio/spapr: Move prereg_listener into
spapr container)" began to use the newly introduced VFIOSpaprContainer
structure.

After several refactors, today the container_of(container,
VFIOSpaprContainer, ABC) is used when VFIOSpaprContainer is actually
not allocated. On PPC64 systems, this dereference is leading to corruption
showing up as glibc malloc assertion during guest start when using vfio.

Patch adds the missing allocation while also making the structure movement
to vfio common header file.

Fixes: 6ad359ec29 "(vfio/spapr: Move prereg_listener into spapr container)"
Signed-off-by: Shivaprasad G Bhat 
---
 hw/vfio/container.c   |6 --
 hw/vfio/spapr.c   |6 --
 include/hw/vfio/vfio-common.h |6 ++
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 77bdec276e..ecaf5786d9 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -539,6 +539,7 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as,
 {
 VFIOContainer *container;
 VFIOContainerBase *bcontainer;
+VFIOSpaprContainer *scontainer;
 int ret, fd;
 VFIOAddressSpace *space;

@@ -611,7 +612,8 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as,
 goto close_fd_exit;
 }

-container = g_malloc0(sizeof(*container));
+scontainer = g_malloc0(sizeof(*scontainer));
+container = &scontainer->container;
 container->fd = fd;
 bcontainer = &container->bcontainer;

@@ -675,7 +677,7 @@ unregister_container_exit:
 vfio_cpr_unregister_container(bcontainer);

 free_container_exit:
-g_free(container);
+g_free(scontainer);

 close_fd_exit:
 close(fd);
diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
index 0d949bb728..78d218b7e7 100644
--- a/hw/vfio/spapr.c
+++ b/hw/vfio/spapr.c
@@ -24,12 +24,6 @@
 #include "qapi/error.h"
 #include "trace.h"

-typedef struct VFIOSpaprContainer {
-VFIOContainer container;
-MemoryListener prereg_listener;
-QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
-} VFIOSpaprContainer;
-
 static bool vfio_prereg_listener_skipped_section(MemoryRegionSection *section)
 {
 if (memory_region_is_iommu(section->mr)) {
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index b9da6c08ef..010fa68ac6 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -82,6 +82,12 @@ typedef struct VFIOContainer {
 QLIST_HEAD(, VFIOGroup) group_list;
 } VFIOContainer;

+typedef struct VFIOSpaprContainer {
+VFIOContainer container;
+MemoryListener prereg_listener;
+QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
+} VFIOSpaprContainer;
+
 typedef struct VFIOHostDMAWindow {
 hwaddr min_iova;
 hwaddr max_iova;





Re: [PATCH V1 26/26] migration: only-migratable-modes

2024-05-09 Thread Fabiano Rosas
Steve Sistare  writes:

> Add the only-migratable-modes option as a generalization of only-migratable.
> Only devices that support all requested modes are allowed.
>
> Signed-off-by: Steve Sistare 
> ---
>  include/migration/misc.h   |  3 +++
>  include/sysemu/sysemu.h|  1 -
>  migration/migration-hmp-cmds.c | 26 +-
>  migration/migration.c  | 22 +-
>  migration/savevm.c |  2 +-
>  qemu-options.hx| 16 ++--
>  system/globals.c   |  1 -
>  system/vl.c| 13 -
>  target/s390x/cpu_models.c  |  4 +++-
>  9 files changed, 75 insertions(+), 13 deletions(-)
>
> diff --git a/include/migration/misc.h b/include/migration/misc.h
> index 5b963ba..3ad2cd9 100644
> --- a/include/migration/misc.h
> +++ b/include/migration/misc.h
> @@ -119,6 +119,9 @@ bool migration_incoming_postcopy_advised(void);
>  /* True if background snapshot is active */
>  bool migration_in_bg_snapshot(void);
>  
> +void migration_set_required_mode(MigMode mode);
> +bool migration_mode_required(MigMode mode);
> +
>  /* migration/block-dirty-bitmap.c */
>  void dirty_bitmap_mig_init(void);
>  
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index 5b4397e..0a9c4b4 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -8,7 +8,6 @@
>  
>  /* vl.c */
>  
> -extern int only_migratable;
>  extern const char *qemu_name;
>  extern QemuUUID qemu_uuid;
>  extern bool qemu_uuid_set;
> diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
> index 414c7e8..ca913b7 100644
> --- a/migration/migration-hmp-cmds.c
> +++ b/migration/migration-hmp-cmds.c
> @@ -16,6 +16,7 @@
>  #include "qemu/osdep.h"
>  #include "block/qapi.h"
>  #include "migration/snapshot.h"
> +#include "migration/misc.h"
>  #include "monitor/hmp.h"
>  #include "monitor/monitor.h"
>  #include "qapi/error.h"
> @@ -33,6 +34,28 @@
>  #include "options.h"
>  #include "migration.h"
>  
> +static void migration_dump_modes(Monitor *mon)
> +{
> +int mode, n = 0;
> +
> +monitor_printf(mon, "only-migratable-modes: ");
> +
> +for (mode = 0; mode < MIG_MODE__MAX; mode++) {
> +if (migration_mode_required(mode)) {
> +if (n++) {
> +monitor_printf(mon, ",");
> +}
> +monitor_printf(mon, "%s", MigMode_str(mode));
> +}
> +}
> +
> +if (!n) {
> +monitor_printf(mon, "none\n");
> +} else {
> +monitor_printf(mon, "\n");
> +}
> +}
> +
>  static void migration_global_dump(Monitor *mon)
>  {
>  MigrationState *ms = migrate_get_current();
> @@ -41,7 +64,7 @@ static void migration_global_dump(Monitor *mon)
>  monitor_printf(mon, "store-global-state: %s\n",
> ms->store_global_state ? "on" : "off");
>  monitor_printf(mon, "only-migratable: %s\n",
> -   only_migratable ? "on" : "off");
> +   migration_mode_required(MIG_MODE_NORMAL) ? "on" : "off");
>  monitor_printf(mon, "send-configuration: %s\n",
> ms->send_configuration ? "on" : "off");
>  monitor_printf(mon, "send-section-footer: %s\n",
> @@ -50,6 +73,7 @@ static void migration_global_dump(Monitor *mon)
> ms->decompress_error_check ? "on" : "off");
>  monitor_printf(mon, "clear-bitmap-shift: %u\n",
> ms->clear_bitmap_shift);
> +migration_dump_modes(mon);
>  }
>  
>  void hmp_info_migrate(Monitor *mon, const QDict *qdict)
> diff --git a/migration/migration.c b/migration/migration.c
> index 4984dee..5535b84 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1719,17 +1719,29 @@ static bool is_busy(Error **reasonp, Error **errp)
>  return false;
>  }
>  
> -static bool is_only_migratable(Error **reasonp, Error **errp, int modes)
> +static int migration_modes_required;
> +
> +void migration_set_required_mode(MigMode mode)
> +{
> +migration_modes_required |= BIT(mode);
> +}
> +
> +bool migration_mode_required(MigMode mode)
> +{
> +return !!(migration_modes_required & BIT(mode));
> +}
> +
> +static bool modes_are_required(Error **reasonp, Error **errp, int modes)
>  {
>  ERRP_GUARD();
>  
> -if (only_migratable && (modes & BIT(MIG_MODE_NORMAL))) {
> +if (migration_modes_required & modes) {
>  error_propagate_prepend(errp, *reasonp,
> -"disallowing migration blocker "
> -"(--only-migratable) for: ");
> +"-only-migratable{-modes}  specified, but: 
> ");

extra space before 'specified'

>  *reasonp = NULL;
>  return true;
>  }
> +
>  return false;
>  }
>  
> @@ -1783,7 +1795,7 @@ int migrate_add_blocker_modes(Error **reasonp, Error 
> **errp, MigMode mode, ...)
>  modes = get_modes(mode, ap);
>  va_end(ap);
>  
> -if (is_only_migratable(reasonp, e

Re: [PATCH v4 00/12] vhost-user: support any POSIX system (tested on macOS, FreeBSD, OpenBSD)

2024-05-09 Thread Stefan Hajnoczi
On Wed, May 08, 2024 at 09:44:44AM +0200, Stefano Garzarella wrote:
> v1: https://patchew.org/QEMU/20240228114759.44758-1-sgarz...@redhat.com/
> v2: https://patchew.org/QEMU/20240326133936.125332-1-sgarz...@redhat.com/
> v3: https://patchew.org/QEMU/20240404122330.92710-1-sgarz...@redhat.com/
> v4:
>   - rebased on master (commit e116b92d01c2cd75957a9f8ad1d4932292867b81)
>   - added patch 6 to move using QEMU bswap helper functions in a separate
> patch (Phil)
>   - fail if we find "share=off" in shm_backend_memory_alloc() (David)
>   - added Phil's R-b and David's A-b
> 
> The vhost-user protocol is not really Linux-specific, so let's try support
> QEMU's frontends and backends (including libvhost-user) in any POSIX system
> with this series. The main use case is to be able to use virtio devices that
> we don't have built-in in QEMU (e.g. virtiofsd, vhost-user-vsock, etc.) even
> in non-Linux systems.
> 
> The first 5 patches are more like fixes discovered at runtime on macOS or
> FreeBSD that could go even independently of this series.
> 
> Patches 6, 7, 8, and 9 enable building of frontends and backends (including
> libvhost-user) with associated code changes to succeed in compilation.
> 
> Patch 10 adds `memory-backend-shm` that uses the POSIX shm_open() API to
> create shared memory which is identified by an fd that can be shared with
> vhost-user backends. This is useful on those systems (like macOS) where
> we don't have memfd_create() or special filesystems like "/dev/shm".
> 
> Patches 11 and 12 use `memory-backend-shm` in some vhost-user tests.
> 
> Maybe the first 5 patches can go separately, but I only discovered those
> problems after testing patches 6 - 9, so I have included them in this series
> for now. Please let me know if you prefer that I send them separately.
> 
> I tested this series using vhost-user-blk and QSD on macOS Sonoma 14.4
> (aarch64), FreeBSD 14 (x86_64), OpenBSD 7.4 (x86_64), and Fedora 39 (x86_64)
> in this way:
> 
> - Start vhost-user-blk or QSD (same commands for all systems)
> 
>   vhost-user-blk -s /tmp/vhost.socket \
> -b Fedora-Cloud-Base-39-1.5.x86_64.raw
> 
>   qemu-storage-daemon \
> --blockdev 
> file,filename=Fedora-Cloud-Base-39-1.5.x86_64.qcow2,node-name=file \
> --blockdev qcow2,file=file,node-name=qcow2 \
> --export 
> vhost-user-blk,addr.type=unix,addr.path=/tmp/vhost.socket,id=vub,num-queues=1,node-name=qcow2,writable=on
> 
> - macOS (aarch64): start QEMU (using hvf accelerator)
> 
>   qemu-system-aarch64 -smp 2 -cpu host -M virt,accel=hvf,memory-backend=mem \
> -drive 
> file=./build/pc-bios/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on \
> -device virtio-net-device,netdev=net0 -netdev user,id=net0 \
> -device ramfb -device usb-ehci -device usb-kbd \
> -object memory-backend-shm,id=mem,size=512M \
> -device vhost-user-blk-pci,num-queues=1,disable-legacy=on,chardev=char0 \
> -chardev socket,id=char0,path=/tmp/vhost.socket
> 
> - FreeBSD/OpenBSD (x86_64): start QEMU (no accelerators available)
> 
>   qemu-system-x86_64 -smp 2 -M q35,memory-backend=mem \
> -object memory-backend-shm,id=mem,size="512M" \
> -device vhost-user-blk-pci,num-queues=1,chardev=char0 \
> -chardev socket,id=char0,path=/tmp/vhost.socket
> 
> - Fedora (x86_64): start QEMU (using kvm accelerator)
> 
>   qemu-system-x86_64 -smp 2 -M q35,accel=kvm,memory-backend=mem \
> -object memory-backend-shm,size="512M" \
> -device vhost-user-blk-pci,num-queues=1,chardev=char0 \
> -chardev socket,id=char0,path=/tmp/vhost.socket
> 
> Branch pushed (and CI started) at 
> https://gitlab.com/sgarzarella/qemu/-/tree/macos-vhost-user?ref_type=heads
> 
> Thanks,
> Stefano
> 
> Stefano Garzarella (12):
>   libvhost-user: set msg.msg_control to NULL when it is empty
>   libvhost-user: fail vu_message_write() if sendmsg() is failing
>   libvhost-user: mask F_INFLIGHT_SHMFD if memfd is not supported
>   vhost-user-server: do not set memory fd non-blocking
>   contrib/vhost-user-blk: fix bind() using the right size of the address
>   contrib/vhost-user-*: use QEMU bswap helper functions
>   vhost-user: enable frontends on any POSIX system
>   libvhost-user: enable it on any POSIX system
>   contrib/vhost-user-blk: enable it on any POSIX system
>   hostmem: add a new memory backend based on POSIX shm_open()
>   tests/qtest/vhost-user-blk-test: use memory-backend-shm
>   tests/qtest/vhost-user-test: add a test case for memory-backend-shm
> 
>  docs/system/devices/vhost-user.rst|   5 +-
>  meson.build   |   5 +-
>  qapi/qom.json |  17 +++
>  subprojects/libvhost-user/libvhost-user.h |   2 +-
>  backends/hostmem-shm.c| 123 ++
>  contrib/vhost-user-blk/vhost-user-blk.c   |  27 +++--
>  contrib/vhost-user-input/main.c   |  16 +--
>  hw/net/vhost_net.c|   5 +
>  subprojects/libvhost-user/libvhost-user.c |  76 +

[PATCH v5 00/32] Misc PPC exception and BookE MMU clean ups

2024-05-09 Thread BALATON Zoltan
This series does some further clean up mostly around BookE MMU to
untangle it from other MMU models. It also contains some other changes
that I've come up with while working on this. The Simplify
ppc_booke_xlate() part 1 and part 2 patches could be squashed together
but left them separate for easier review.

v5:
- drop sc patches from this series
- eliminate uninit warning work arounds and also get rid of
get_physical_address_wtlb() (one memset is still needed temporarily
but can be removed at the end)
- use function instead of macro

v4:
- Add a (probably redundant) check for MPC8xx case in ppc_xlate so we
don't have to care about it in lower levels
- Detangle BookE related functions from mmu_ctx_t to avoid some used
uninit work arounds and allow these to be moved out to mmu-booke.c
- Some other tweaks asked during review

v3:
- Address review comments from Nick
- Rebase on master
- Squashed some patches together
- Add some more patches I've done since last version

v2:
- Fix user mode issue in patch 1 by keeping old behaviour for user mode
- Add some more MMU clean up patches

Regards,
BALATON Zoltan


BALATON Zoltan (32):
  target/ppc: Remove unused helper
  target/ppc/mmu_common.c: Move calculation of a value closer to its
usage
  target/ppc/mmu_common.c: Remove unneeded local variable
  target/ppc/mmu_common.c: Simplify checking for real mode
  target/ppc/mmu_common.c: Drop cases for unimplemented MPC8xx MMU
  target/ppc/mmu_common.c: Introduce mmu6xx_get_physical_address()
  target/ppc/mmu_common.c: Move else branch to avoid large if block
  target/ppc/mmu_common.c: Move some debug logging
  target/ppc/mmu_common.c: Eliminate ret from
mmu6xx_get_physical_address()
  target/ppc/mmu_common.c: Split out BookE cases before checking real
mode
  target/ppc/mmu_common.c: Split off real mode cases in
get_physical_address_wtlb()
  target/ppc/mmu_common.c: Inline and remove check_physical()
  target/ppc/mmu_common.c: Fix misindented qemu_log_mask() calls
  target/ppc/mmu_common.c: Deindent ppc_jumbo_xlate()
  target/ppc/mmu_common.c: Replace hard coded constants in
ppc_jumbo_xlate()
  target/ppc/mmu_common.c: Don't use mmu_ctx_t for
mmu40x_get_physical_address()
  target/ppc/mmu_common.c: Don't use mmu_ctx_t in
mmubooke_get_physical_address()
  target/ppc/mmu_common.c: Don't use mmu_ctx_t in
mmubooke206_get_physical_address()
  target/ppc: Remove pp_check() and reuse ppc_hash32_pp_prot()
  target/ppc/mmu_common.c: Remove BookE from direct store handling
  target/ppc/mmu_common.c: Split off BookE handling from
ppc_jumbo_xlate()
  target/ppc/mmu_common.c: Eliminate get_physical_address_wtlb()
  target/ppc/mmu_common.c: Move mmu_ctx_t type to mmu_common.c
  target/ppc/mmu_common.c: Simplify ppc_booke_xlate() part 1
  target/ppc/mmu_common.c: Simplify ppc_booke_xlate() part 2
  target/ppc: Remove id_tlbs flag from CPU env
  target/ppc: Split off common embedded TLB init
  target/ppc/mmu-hash32.c: Drop a local variable
  target/ppc/mmu-radix64.c: Drop a local variable
  target/ppc: Add a function to check for page protection bit
  target/ppc: Move out BookE and related MMU functions from mmu_common.c
  target/ppc/mmu_common.c: Remove work around for spurious warnings

 hw/ppc/pegasos2.c|2 +-
 target/ppc/cpu.h |9 +-
 target/ppc/cpu_init.c|   70 +--
 target/ppc/helper.h  |2 -
 target/ppc/helper_regs.c |1 -
 target/ppc/internal.h|   75 +--
 target/ppc/meson.build   |1 +
 target/ppc/mmu-booke.c   |  531 +
 target/ppc/mmu-booke.h   |   17 +
 target/ppc/mmu-hash32.c  |   54 +-
 target/ppc/mmu-hash64.c  |2 +-
 target/ppc/mmu-radix64.c |5 +-
 target/ppc/mmu_common.c  | 1158 +-
 target/ppc/mmu_helper.c  |   37 +-
 14 files changed, 891 insertions(+), 1073 deletions(-)
 create mode 100644 target/ppc/mmu-booke.c
 create mode 100644 target/ppc/mmu-booke.h

-- 
2.30.9




[PATCH v5 09/32] target/ppc/mmu_common.c: Eliminate ret from mmu6xx_get_physical_address()

2024-05-09 Thread BALATON Zoltan
Return directly, which is simpler than dragging a return value through
multpile if and else blocks.

Signed-off-by: BALATON Zoltan 
Reviewed-by: Nicholas Piggin 
---
 target/ppc/mmu_common.c | 84 +++--
 1 file changed, 39 insertions(+), 45 deletions(-)

diff --git a/target/ppc/mmu_common.c b/target/ppc/mmu_common.c
index 89bfd9aa45..03d9e6bfda 100644
--- a/target/ppc/mmu_common.c
+++ b/target/ppc/mmu_common.c
@@ -386,7 +386,6 @@ static int mmu6xx_get_physical_address(CPUPPCState *env, 
mmu_ctx_t *ctx,
 target_ulong vsid, sr, pgidx;
 int ds, target_page_bits;
 bool pr;
-int ret;
 
 /* First try to find a BAT entry if there are any */
 if (env->nb_BATs && get_bat_6xx_tlb(env, ctx, eaddr, access_type) == 0) {
@@ -419,7 +418,6 @@ static int mmu6xx_get_physical_address(CPUPPCState *env, 
mmu_ctx_t *ctx,
 qemu_log_mask(CPU_LOG_MMU,
 "pte segment: key=%d ds %d nx %d vsid " TARGET_FMT_lx "\n",
 ctx->key, ds, ctx->nx, vsid);
-ret = -1;
 if (!ds) {
 /* Check if instruction fetch is allowed, if needed */
 if (type == ACCESS_CODE && ctx->nx) {
@@ -436,51 +434,47 @@ static int mmu6xx_get_physical_address(CPUPPCState *env, 
mmu_ctx_t *ctx,
 /* Initialize real address with an invalid value */
 ctx->raddr = (hwaddr)-1ULL;
 /* Software TLB search */
-ret = ppc6xx_tlb_check(env, ctx, eaddr, access_type);
-} else {
-qemu_log_mask(CPU_LOG_MMU, "direct store...\n");
-/* Direct-store segment : absolutely *BUGGY* for now */
-
-switch (type) {
-case ACCESS_INT:
-/* Integer load/store : only access allowed */
-break;
-case ACCESS_CODE:
-/* No code fetch is allowed in direct-store areas */
-return -4;
-case ACCESS_FLOAT:
-/* Floating point load/store */
-return -4;
-case ACCESS_RES:
-/* lwarx, ldarx or srwcx. */
-return -4;
-case ACCESS_CACHE:
-/*
- * dcba, dcbt, dcbtst, dcbf, dcbi, dcbst, dcbz, or icbi
- *
- * Should make the instruction do no-op.  As it already do
- * no-op, it's quite easy :-)
- */
-ctx->raddr = eaddr;
-return 0;
-case ACCESS_EXT:
-/* eciwx or ecowx */
-return -4;
-default:
-qemu_log_mask(CPU_LOG_MMU, "ERROR: instruction should not need "
-  "address translation\n");
-return -4;
-}
-if ((access_type == MMU_DATA_STORE || ctx->key != 1) &&
-(access_type == MMU_DATA_LOAD || ctx->key != 0)) {
-ctx->raddr = eaddr;
-ret = 2;
-} else {
-ret = -2;
-}
+return ppc6xx_tlb_check(env, ctx, eaddr, access_type);
 }
 
-return ret;
+/* Direct-store segment : absolutely *BUGGY* for now */
+qemu_log_mask(CPU_LOG_MMU, "direct store...\n");
+switch (type) {
+case ACCESS_INT:
+/* Integer load/store : only access allowed */
+break;
+case ACCESS_CODE:
+/* No code fetch is allowed in direct-store areas */
+return -4;
+case ACCESS_FLOAT:
+/* Floating point load/store */
+return -4;
+case ACCESS_RES:
+/* lwarx, ldarx or srwcx. */
+return -4;
+case ACCESS_CACHE:
+/*
+ * dcba, dcbt, dcbtst, dcbf, dcbi, dcbst, dcbz, or icbi
+ *
+ * Should make the instruction do no-op.  As it already do
+ * no-op, it's quite easy :-)
+ */
+ctx->raddr = eaddr;
+return 0;
+case ACCESS_EXT:
+/* eciwx or ecowx */
+return -4;
+default:
+qemu_log_mask(CPU_LOG_MMU, "ERROR: instruction should not need address"
+   " translation\n");
+return -4;
+}
+if ((access_type == MMU_DATA_STORE || ctx->key != 1) &&
+(access_type == MMU_DATA_LOAD || ctx->key != 0)) {
+ctx->raddr = eaddr;
+return 2;
+}
+return -2;
 }
 
 /* Generic TLB check function for embedded PowerPC implementations */
-- 
2.30.9




[PATCH v5 04/32] target/ppc/mmu_common.c: Simplify checking for real mode

2024-05-09 Thread BALATON Zoltan
In get_physical_address_wtlb() the real_mode flag depends on either
the MSR[IR] or MSR[DR] bit depending on access_type. Extract just the
needed bit in a more straight forward way instead of doing unnecessary
computation.

Signed-off-by: BALATON Zoltan 
Reviewed-by: Nicholas Piggin 
---
 target/ppc/mmu_common.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/target/ppc/mmu_common.c b/target/ppc/mmu_common.c
index 09cbeb0052..886fb6a657 100644
--- a/target/ppc/mmu_common.c
+++ b/target/ppc/mmu_common.c
@@ -1184,8 +1184,10 @@ int get_physical_address_wtlb(CPUPPCState *env, 
mmu_ctx_t *ctx,
  int mmu_idx)
 {
 int ret = -1;
-bool real_mode = (type == ACCESS_CODE && !FIELD_EX64(env->msr, MSR, IR)) ||
- (type != ACCESS_CODE && !FIELD_EX64(env->msr, MSR, DR));
+bool real_mode;
+
+real_mode = (type == ACCESS_CODE) ? !FIELD_EX64(env->msr, MSR, IR)
+  : !FIELD_EX64(env->msr, MSR, DR);
 
 switch (env->mmu_model) {
 case POWERPC_MMU_SOFT_6xx:
-- 
2.30.9




[PATCH v5 02/32] target/ppc/mmu_common.c: Move calculation of a value closer to its usage

2024-05-09 Thread BALATON Zoltan
In mmubooke_check_tlb() and mmubooke206_check_tlb() prot2 is
calculated first but only used after an unrelated check that can
return before tha value is used. Move the calculation after the check,
closer to where it is used, to keep them together and avoid computing
it when not needed.

Signed-off-by: BALATON Zoltan 
Reviwed-by: Nicholas Piggin 
---
 target/ppc/mmu_common.c | 36 +---
 1 file changed, 17 insertions(+), 19 deletions(-)

diff --git a/target/ppc/mmu_common.c b/target/ppc/mmu_common.c
index 4fde7fd3bf..f79e390306 100644
--- a/target/ppc/mmu_common.c
+++ b/target/ppc/mmu_common.c
@@ -635,12 +635,6 @@ static int mmubooke_check_tlb(CPUPPCState *env, 
ppcemb_tlb_t *tlb,
 return -1;
 }
 
-if (FIELD_EX64(env->msr, MSR, PR)) {
-prot2 = tlb->prot & 0xF;
-} else {
-prot2 = (tlb->prot >> 4) & 0xF;
-}
-
 /* Check the address space */
 if ((access_type == MMU_INST_FETCH ?
 FIELD_EX64(env->msr, MSR, IR) :
@@ -649,6 +643,11 @@ static int mmubooke_check_tlb(CPUPPCState *env, 
ppcemb_tlb_t *tlb,
 return -1;
 }
 
+if (FIELD_EX64(env->msr, MSR, PR)) {
+prot2 = tlb->prot & 0xF;
+} else {
+prot2 = (tlb->prot >> 4) & 0xF;
+}
 *prot = prot2;
 if (prot2 & prot_for_access_type(access_type)) {
 qemu_log_mask(CPU_LOG_MMU, "%s: good TLB!\n", __func__);
@@ -830,6 +829,18 @@ static int mmubooke206_check_tlb(CPUPPCState *env, 
ppcmas_tlb_t *tlb,
 
 found_tlb:
 
+/* Check the address space and permissions */
+if (access_type == MMU_INST_FETCH) {
+/* There is no way to fetch code using epid load */
+assert(!use_epid);
+as = FIELD_EX64(env->msr, MSR, IR);
+}
+
+if (as != ((tlb->mas1 & MAS1_TS) >> MAS1_TS_SHIFT)) {
+qemu_log_mask(CPU_LOG_MMU, "%s: AS doesn't match\n", __func__);
+return -1;
+}
+
 if (pr) {
 if (tlb->mas7_3 & MAS3_UR) {
 prot2 |= PAGE_READ;
@@ -851,19 +862,6 @@ found_tlb:
 prot2 |= PAGE_EXEC;
 }
 }
-
-/* Check the address space and permissions */
-if (access_type == MMU_INST_FETCH) {
-/* There is no way to fetch code using epid load */
-assert(!use_epid);
-as = FIELD_EX64(env->msr, MSR, IR);
-}
-
-if (as != ((tlb->mas1 & MAS1_TS) >> MAS1_TS_SHIFT)) {
-qemu_log_mask(CPU_LOG_MMU, "%s: AS doesn't match\n", __func__);
-return -1;
-}
-
 *prot = prot2;
 if (prot2 & prot_for_access_type(access_type)) {
 qemu_log_mask(CPU_LOG_MMU, "%s: good TLB!\n", __func__);
-- 
2.30.9




[PATCH v5 01/32] target/ppc: Remove unused helper

2024-05-09 Thread BALATON Zoltan
The helper_rac function is defined but not used, remove it.

Fixes: 005b69fdcc (target/ppc: Remove PowerPC 601 CPUs)
Signed-off-by: BALATON Zoltan 
Reviwed-by: Nicholas Piggin 
---
 target/ppc/helper.h |  2 --
 target/ppc/mmu_helper.c | 24 
 2 files changed, 26 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 86f97ee1e7..f769e01c3d 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -700,8 +700,6 @@ DEF_HELPER_2(book3s_msgclr, void, env, tl)
 
 DEF_HELPER_4(dlmzb, tl, env, tl, tl, i32)
 #if !defined(CONFIG_USER_ONLY)
-DEF_HELPER_2(rac, tl, env, tl)
-
 DEF_HELPER_2(load_dcr, tl, env, tl)
 DEF_HELPER_3(store_dcr, void, env, tl, tl)
 #endif
diff --git a/target/ppc/mmu_helper.c b/target/ppc/mmu_helper.c
index b35a93c198..421e777ee6 100644
--- a/target/ppc/mmu_helper.c
+++ b/target/ppc/mmu_helper.c
@@ -596,30 +596,6 @@ void helper_6xx_tlbi(CPUPPCState *env, target_ulong EPN)
 do_6xx_tlb(env, EPN, 1);
 }
 
-/*/
-/* PowerPC 601 specific instructions (POWER bridge) */
-
-target_ulong helper_rac(CPUPPCState *env, target_ulong addr)
-{
-mmu_ctx_t ctx;
-int nb_BATs;
-target_ulong ret = 0;
-
-/*
- * We don't have to generate many instances of this instruction,
- * as rac is supervisor only.
- *
- * XXX: FIX THIS: Pretend we have no BAT
- */
-nb_BATs = env->nb_BATs;
-env->nb_BATs = 0;
-if (get_physical_address_wtlb(env, &ctx, addr, 0, ACCESS_INT, 0) == 0) {
-ret = ctx.raddr;
-}
-env->nb_BATs = nb_BATs;
-return ret;
-}
-
 static inline target_ulong booke_tlb_to_page_size(int size)
 {
 return 1024 << (2 * size);
-- 
2.30.9




[PATCH v5 31/32] target/ppc: Move out BookE and related MMU functions from mmu_common.c

2024-05-09 Thread BALATON Zoltan
Add a new mmu-booke.c file for BookE and related MMU bits from
mmu_common.c.

Signed-off-by: BALATON Zoltan 
Acked-by: Nicholas Piggin 
---
 target/ppc/cpu.h|   4 -
 target/ppc/meson.build  |   1 +
 target/ppc/mmu-booke.c  | 531 
 target/ppc/mmu-booke.h  |  17 ++
 target/ppc/mmu_common.c | 508 +-
 target/ppc/mmu_helper.c |   1 +
 6 files changed, 551 insertions(+), 511 deletions(-)
 create mode 100644 target/ppc/mmu-booke.c
 create mode 100644 target/ppc/mmu-booke.h

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index cfb3ba5ac8..92b50a1be2 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1606,10 +1606,6 @@ void ppc_tlb_invalidate_all(CPUPPCState *env);
 void ppc_tlb_invalidate_one(CPUPPCState *env, target_ulong addr);
 void cpu_ppc_set_vhyp(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp);
 void cpu_ppc_set_1lpar(PowerPCCPU *cpu);
-int ppcmas_tlb_check(CPUPPCState *env, ppcmas_tlb_t *tlb, hwaddr *raddrp,
- target_ulong address, uint32_t pid);
-int ppcemb_tlb_search(CPUPPCState *env, target_ulong address, uint32_t pid);
-hwaddr booke206_tlb_to_page_size(CPUPPCState *env, ppcmas_tlb_t *tlb);
 #endif
 
 void ppc_store_fpscr(CPUPPCState *env, target_ulong val);
diff --git a/target/ppc/meson.build b/target/ppc/meson.build
index 0b89f9b89f..db3b7a0c33 100644
--- a/target/ppc/meson.build
+++ b/target/ppc/meson.build
@@ -37,6 +37,7 @@ ppc_system_ss.add(files(
   'arch_dump.c',
   'machine.c',
   'mmu-hash32.c',
+  'mmu-booke.c',
   'mmu_common.c',
   'ppc-qmp-cmds.c',
 ))
diff --git a/target/ppc/mmu-booke.c b/target/ppc/mmu-booke.c
new file mode 100644
index 00..55e5dd7c6b
--- /dev/null
+++ b/target/ppc/mmu-booke.c
@@ -0,0 +1,531 @@
+/*
+ *  PowerPC BookE MMU, TLB emulation helpers for QEMU.
+ *
+ *  Copyright (c) 2003-2007 Jocelyn Mayer
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "exec/page-protection.h"
+#include "exec/log.h"
+#include "cpu.h"
+#include "internal.h"
+#include "mmu-booke.h"
+
+/* Generic TLB check function for embedded PowerPC implementations */
+static bool ppcemb_tlb_check(CPUPPCState *env, ppcemb_tlb_t *tlb,
+ hwaddr *raddrp,
+ target_ulong address, uint32_t pid, int i)
+{
+target_ulong mask;
+
+/* Check valid flag */
+if (!(tlb->prot & PAGE_VALID)) {
+return false;
+}
+mask = ~(tlb->size - 1);
+qemu_log_mask(CPU_LOG_MMU, "%s: TLB %d address " TARGET_FMT_lx
+  " PID %u <=> " TARGET_FMT_lx " " TARGET_FMT_lx " %u %x\n",
+  __func__, i, address, pid, tlb->EPN,
+  mask, (uint32_t)tlb->PID, tlb->prot);
+/* Check PID */
+if (tlb->PID != 0 && tlb->PID != pid) {
+return false;
+}
+/* Check effective address */
+if ((address & mask) != tlb->EPN) {
+return false;
+}
+*raddrp = (tlb->RPN & mask) | (address & ~mask);
+return true;
+}
+
+/* Generic TLB search function for PowerPC embedded implementations */
+int ppcemb_tlb_search(CPUPPCState *env, target_ulong address, uint32_t pid)
+{
+ppcemb_tlb_t *tlb;
+hwaddr raddr;
+int i;
+
+for (i = 0; i < env->nb_tlb; i++) {
+tlb = &env->tlb.tlbe[i];
+if (ppcemb_tlb_check(env, tlb, &raddr, address, pid, i)) {
+return i;
+}
+}
+return -1;
+}
+
+int mmu40x_get_physical_address(CPUPPCState *env, hwaddr *raddr, int *prot,
+target_ulong address,
+MMUAccessType access_type)
+{
+ppcemb_tlb_t *tlb;
+int i, ret, zsel, zpr, pr;
+
+ret = -1;
+pr = FIELD_EX64(env->msr, MSR, PR);
+for (i = 0; i < env->nb_tlb; i++) {
+tlb = &env->tlb.tlbe[i];
+if (!ppcemb_tlb_check(env, tlb, raddr, address,
+  env->spr[SPR_40x_PID], i)) {
+continue;
+}
+zsel = (tlb->attr >> 4) & 0xF;
+zpr = (env->spr[SPR_40x_ZPR] >> (30 - (2 * zsel))) & 0x3;
+qemu_log_mask(CPU_LOG_MMU,
+  "%s: TLB %d zsel %d zpr %d ty %d attr %08x\n",
+  __func__, i, zsel, zpr, access_type, tlb->attr);
+/* Check execute enable bit */
+switch (zpr) {
+case 0x2:
+  

[PATCH v5 14/32] target/ppc/mmu_common.c: Deindent ppc_jumbo_xlate()

2024-05-09 Thread BALATON Zoltan
Instead of putting a large block of code in an if, invert the
condition and return early to be able to deindent the code block.

Signed-off-by: BALATON Zoltan 
Acked-by: Nicholas Piggin 
---
 target/ppc/mmu_common.c | 319 
 1 file changed, 159 insertions(+), 160 deletions(-)

diff --git a/target/ppc/mmu_common.c b/target/ppc/mmu_common.c
index 124148b3da..f40481b4b1 100644
--- a/target/ppc/mmu_common.c
+++ b/target/ppc/mmu_common.c
@@ -1264,187 +1264,186 @@ static bool ppc_jumbo_xlate(PowerPCCPU *cpu, vaddr 
eaddr,
 *protp = ctx.prot;
 *psizep = TARGET_PAGE_BITS;
 return true;
+} else if (!guest_visible) {
+return false;
 }
 
-if (guest_visible) {
-log_cpu_state_mask(CPU_LOG_MMU, cs, 0);
-if (type == ACCESS_CODE) {
-switch (ret) {
-case -1:
-/* No matches in page tables or TLB */
-switch (env->mmu_model) {
-case POWERPC_MMU_SOFT_6xx:
-cs->exception_index = POWERPC_EXCP_IFTLB;
-env->error_code = 1 << 18;
-env->spr[SPR_IMISS] = eaddr;
-env->spr[SPR_ICMP] = 0x8000 | ctx.ptem;
-goto tlb_miss;
-case POWERPC_MMU_SOFT_4xx:
-cs->exception_index = POWERPC_EXCP_ITLB;
-env->error_code = 0;
-env->spr[SPR_40x_DEAR] = eaddr;
-env->spr[SPR_40x_ESR] = 0x;
-break;
-case POWERPC_MMU_BOOKE206:
-booke206_update_mas_tlb_miss(env, eaddr, 2, mmu_idx);
-/* fall through */
-case POWERPC_MMU_BOOKE:
-cs->exception_index = POWERPC_EXCP_ITLB;
-env->error_code = 0;
-env->spr[SPR_BOOKE_DEAR] = eaddr;
-env->spr[SPR_BOOKE_ESR] = mmubooke206_esr(mmu_idx, 
MMU_DATA_LOAD);
-break;
-case POWERPC_MMU_REAL:
-cpu_abort(cs, "PowerPC in real mode should never raise "
-  "any MMU exceptions\n");
-default:
-cpu_abort(cs, "Unknown or invalid MMU model\n");
-}
+log_cpu_state_mask(CPU_LOG_MMU, cs, 0);
+if (type == ACCESS_CODE) {
+switch (ret) {
+case -1:
+/* No matches in page tables or TLB */
+switch (env->mmu_model) {
+case POWERPC_MMU_SOFT_6xx:
+cs->exception_index = POWERPC_EXCP_IFTLB;
+env->error_code = 1 << 18;
+env->spr[SPR_IMISS] = eaddr;
+env->spr[SPR_ICMP] = 0x8000 | ctx.ptem;
+goto tlb_miss;
+case POWERPC_MMU_SOFT_4xx:
+cs->exception_index = POWERPC_EXCP_ITLB;
+env->error_code = 0;
+env->spr[SPR_40x_DEAR] = eaddr;
+env->spr[SPR_40x_ESR] = 0x;
 break;
-case -2:
-/* Access rights violation */
-cs->exception_index = POWERPC_EXCP_ISI;
-if ((env->mmu_model == POWERPC_MMU_BOOKE) ||
-(env->mmu_model == POWERPC_MMU_BOOKE206)) {
-env->error_code = 0;
-} else {
-env->error_code = 0x0800;
-}
+case POWERPC_MMU_BOOKE206:
+booke206_update_mas_tlb_miss(env, eaddr, 2, mmu_idx);
+/* fall through */
+case POWERPC_MMU_BOOKE:
+cs->exception_index = POWERPC_EXCP_ITLB;
+env->error_code = 0;
+env->spr[SPR_BOOKE_DEAR] = eaddr;
+env->spr[SPR_BOOKE_ESR] = mmubooke206_esr(mmu_idx, 
MMU_DATA_LOAD);
 break;
-case -3:
-/* No execute protection violation */
-if ((env->mmu_model == POWERPC_MMU_BOOKE) ||
-(env->mmu_model == POWERPC_MMU_BOOKE206)) {
-env->spr[SPR_BOOKE_ESR] = 0x;
-env->error_code = 0;
+case POWERPC_MMU_REAL:
+cpu_abort(cs, "PowerPC in real mode should never raise "
+  "any MMU exceptions\n");
+default:
+cpu_abort(cs, "Unknown or invalid MMU model\n");
+}
+break;
+case -2:
+/* Access rights violation */
+cs->exception_index = POWERPC_EXCP_ISI;
+if ((env->mmu_model == POWERPC_MMU_BOOKE) ||
+(env->mmu_model == POWERPC_MMU_BOOKE206)) {
+env->error_code = 0;
+} else {
+env->error_code = 0x0800;
+}
+break;
+case -3:
+/* No execute protection violation */
+if ((env->mmu_model == POWERPC_M

[PATCH v5 28/32] target/ppc/mmu-hash32.c: Drop a local variable

2024-05-09 Thread BALATON Zoltan
In ppc_hash32_xlate() the value of need_prop is checked in two places
but precalculating it does not help because when we reach the first
check we always return and not reach the second place so the value
will only be used once. We can drop the local variable and calculate
it when needed, which makes these checks using it similar to other
places with such checks.

Signed-off-by: BALATON Zoltan 
Reviewed-by: Nicholas Piggin 
---
 target/ppc/mmu-hash32.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/target/ppc/mmu-hash32.c b/target/ppc/mmu-hash32.c
index 960751a50e..b5d7aeed4e 100644
--- a/target/ppc/mmu-hash32.c
+++ b/target/ppc/mmu-hash32.c
@@ -347,7 +347,6 @@ bool ppc_hash32_xlate(PowerPCCPU *cpu, vaddr eaddr, 
MMUAccessType access_type,
 hwaddr pte_offset;
 ppc_hash_pte32_t pte;
 int prot;
-int need_prot;
 hwaddr raddr;
 
 /* There are no hash32 large pages. */
@@ -361,13 +360,11 @@ bool ppc_hash32_xlate(PowerPCCPU *cpu, vaddr eaddr, 
MMUAccessType access_type,
 return true;
 }
 
-need_prot = prot_for_access_type(access_type);
-
 /* 2. Check Block Address Translation entries (BATs) */
 if (env->nb_BATs != 0) {
 raddr = ppc_hash32_bat_lookup(cpu, eaddr, access_type, protp, mmu_idx);
 if (raddr != -1) {
-if (need_prot & ~*protp) {
+if (prot_for_access_type(access_type) & ~*protp) {
 if (guest_visible) {
 if (access_type == MMU_INST_FETCH) {
 cs->exception_index = POWERPC_EXCP_ISI;
@@ -435,7 +432,7 @@ bool ppc_hash32_xlate(PowerPCCPU *cpu, vaddr eaddr, 
MMUAccessType access_type,
 
 prot = ppc_hash32_pte_prot(mmu_idx, sr, pte);
 
-if (need_prot & ~prot) {
+if (prot_for_access_type(access_type) & ~prot) {
 /* Access right violation */
 qemu_log_mask(CPU_LOG_MMU, "PTE access rejected\n");
 if (guest_visible) {
-- 
2.30.9




[PATCH v5 22/32] target/ppc/mmu_common.c: Eliminate get_physical_address_wtlb()

2024-05-09 Thread BALATON Zoltan
Inline get_physical_address_wtlb() in its only caller and remove it.

Signed-off-by: BALATON Zoltan 
---
 target/ppc/internal.h   |  5 +---
 target/ppc/mmu_common.c | 66 +
 2 files changed, 28 insertions(+), 43 deletions(-)

diff --git a/target/ppc/internal.h b/target/ppc/internal.h
index ffdf6c075d..5b28e8f3b0 100644
--- a/target/ppc/internal.h
+++ b/target/ppc/internal.h
@@ -297,10 +297,7 @@ typedef struct mmu_ctx_t mmu_ctx_t;
 bool ppc_xlate(PowerPCCPU *cpu, vaddr eaddr, MMUAccessType access_type,
   hwaddr *raddrp, int *psizep, int *protp,
   int mmu_idx, bool guest_visible);
-int get_physical_address_wtlb(CPUPPCState *env, mmu_ctx_t *ctx,
- target_ulong eaddr,
- MMUAccessType access_type, int type,
- int mmu_idx);
+
 /* Software driven TLB helpers */
 int ppc6xx_tlb_getnum(CPUPPCState *env, target_ulong eaddr,
 int way, int is_code);
diff --git a/target/ppc/mmu_common.c b/target/ppc/mmu_common.c
index d0d49c696b..d7e4baa9cf 100644
--- a/target/ppc/mmu_common.c
+++ b/target/ppc/mmu_common.c
@@ -1072,35 +1072,6 @@ void dump_mmu(CPUPPCState *env)
 }
 }
 
-int get_physical_address_wtlb(CPUPPCState *env, mmu_ctx_t *ctx,
- target_ulong eaddr,
- MMUAccessType access_type, int type,
- int mmu_idx)
-{
-bool real_mode = (type == ACCESS_CODE) ? !FIELD_EX64(env->msr, MSR, IR)
-   : !FIELD_EX64(env->msr, MSR, DR);
-if (real_mode) {
-ctx->raddr = eaddr;
-ctx->prot = PAGE_RWX;
-return 0;
-}
-
-switch (env->mmu_model) {
-case POWERPC_MMU_SOFT_6xx:
-return mmu6xx_get_physical_address(env, ctx, eaddr, access_type, type);
-case POWERPC_MMU_SOFT_4xx:
-/* avoid maybe used uninitialized warnings for unused fields in ctx */
-memset(ctx, 0, sizeof(*ctx));
-return mmu40x_get_physical_address(env, &ctx->raddr, &ctx->prot, eaddr,
-   access_type);
-case POWERPC_MMU_REAL:
-cpu_abort(env_cpu(env),
-  "PowerPC in real mode do not do any translation\n");
-default:
-cpu_abort(env_cpu(env), "Unknown or invalid MMU model\n");
-}
-}
-
 static void booke206_update_mas_tlb_miss(CPUPPCState *env, target_ulong 
address,
  MMUAccessType access_type, int 
mmu_idx)
 {
@@ -1255,6 +1226,15 @@ static bool ppc_jumbo_xlate(PowerPCCPU *cpu, vaddr eaddr,
 int type;
 int ret;
 
+/* If in real_mode then no translation */
+if (access_type == MMU_INST_FETCH ? !FIELD_EX64(env->msr, MSR, IR)
+  : !FIELD_EX64(env->msr, MSR, DR)) {
+*raddrp = eaddr;
+*protp = PAGE_RWX;
+*psizep = TARGET_PAGE_BITS;
+return true;
+}
+
 if (access_type == MMU_INST_FETCH) {
 /* code access */
 type = ACCESS_CODE;
@@ -1265,8 +1245,22 @@ static bool ppc_jumbo_xlate(PowerPCCPU *cpu, vaddr eaddr,
 type = ACCESS_INT;
 }
 
-ret = get_physical_address_wtlb(env, &ctx, eaddr, access_type,
-type, mmu_idx);
+switch (env->mmu_model) {
+case POWERPC_MMU_SOFT_6xx:
+ret = mmu6xx_get_physical_address(env, &ctx, eaddr, access_type, type);
+break;
+case POWERPC_MMU_SOFT_4xx:
+/* avoid maybe used uninitialized warnings for unused fields in ctx */
+memset(&ctx, 0, sizeof(ctx));
+ret = mmu40x_get_physical_address(env, &ctx.raddr, &ctx.prot, eaddr,
+  access_type);
+break;
+case POWERPC_MMU_REAL:
+cpu_abort(cs, "PowerPC in real mode do not do any translation\n");
+default:
+cpu_abort(cs, "Unknown or invalid MMU model\n");
+}
+
 if (ret == 0) {
 *raddrp = ctx.raddr;
 *protp = ctx.prot;
@@ -1294,11 +1288,8 @@ static bool ppc_jumbo_xlate(PowerPCCPU *cpu, vaddr eaddr,
 env->spr[SPR_40x_DEAR] = eaddr;
 env->spr[SPR_40x_ESR] = 0x;
 break;
-case POWERPC_MMU_REAL:
-cpu_abort(cs, "PowerPC in real mode should never raise "
-  "any MMU exceptions\n");
 default:
-cpu_abort(cs, "Unknown or invalid MMU model\n");
+g_assert_not_reached();
 }
 break;
 case -2:
@@ -1350,11 +1341,8 @@ static bool ppc_jumbo_xlate(PowerPCCPU *cpu, vaddr eaddr,
 env->spr[SPR_40x_ESR] = 0x;
 }
 break;
-case POWERPC_MMU_REAL:
-cpu_abort(cs, "PowerPC in real mode should never raise "
- 

Re: [PATCH 1/3] target/riscv: Save counter values during countinhibit update

2024-05-09 Thread Atish Kumar Patra
On Thu, May 2, 2024 at 5:39 AM Andrew Jones  wrote:
>
> On Tue, Apr 30, 2024 at 03:00:45PM GMT, Daniel Henrique Barboza wrote:
> >
> >
> > On 4/29/24 16:28, Atish Patra wrote:
> > > Currently, if a counter monitoring cycle/instret is stopped via
> > > mcountinhibit we just update the state while the value is saved
> > > during the next read. This is not accurate as the read may happen
> > > many cycles after the counter is stopped. Ideally, the read should
> > > return the value saved when the counter is stopped.
> > >
> > > Thus, save the value of the counter during the inhibit update
> > > operation and return that value during the read if corresponding bit
> > > in mcountihibit is set.
> > >
> > > Signed-off-by: Atish Patra 
> > > ---
> > >   target/riscv/cpu.h |  1 -
> > >   target/riscv/csr.c | 32 
> > >   target/riscv/machine.c |  1 -
> > >   3 files changed, 20 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> > > index 3b1a02b9449a..09bbf7ce9880 100644
> > > --- a/target/riscv/cpu.h
> > > +++ b/target/riscv/cpu.h
> > > @@ -153,7 +153,6 @@ typedef struct PMUCTRState {
> > >   target_ulong mhpmcounter_prev;
> > >   /* Snapshort value of a counter in RV32 */
> > >   target_ulong mhpmcounterh_prev;
> > > -bool started;
> > >   /* Value beyond UINT32_MAX/UINT64_MAX before overflow interrupt 
> > > trigger */
> > >   target_ulong irq_overflow_left;
> > >   } PMUCTRState;
> > > diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> > > index 726096444fae..68ca31aff47d 100644
> > > --- a/target/riscv/csr.c
> > > +++ b/target/riscv/csr.c
> > > @@ -929,17 +929,11 @@ static RISCVException 
> > > riscv_pmu_read_ctr(CPURISCVState *env, target_ulong *val,
> > >   if (get_field(env->mcountinhibit, BIT(ctr_idx))) {
> > >   /*
> > > - * Counter should not increment if inhibit bit is set. We can't 
> > > really
> > > - * stop the icount counting. Just return the counter value 
> > > written by
> > > - * the supervisor to indicate that counter was not incremented.
> > > + * Counter should not increment if inhibit bit is set. Just 
> > > return the
> > > + * current counter value.
> > >*/
> > > -if (!counter->started) {
> > > -*val = ctr_val;
> > > -return RISCV_EXCP_NONE;
> > > -} else {
> > > -/* Mark that the counter has been stopped */
> > > -counter->started = false;
> > > -}
> > > + *val = ctr_val;
> > > + return RISCV_EXCP_NONE;
> > >   }
> > >   /*
> > > @@ -1973,9 +1967,23 @@ static RISCVException 
> > > write_mcountinhibit(CPURISCVState *env, int csrno,
> > >   /* Check if any other counter is also monitoring 
> > > cycles/instructions */
> > >   for (cidx = 0; cidx < RV_MAX_MHPMCOUNTERS; cidx++) {
> > > -if (!get_field(env->mcountinhibit, BIT(cidx))) {
> > >   counter = &env->pmu_ctrs[cidx];
> > > -counter->started = true;
> > > +if (get_field(env->mcountinhibit, BIT(cidx)) && (val & 
> > > BIT(cidx))) {
> > > +   /*
> > > + * Update the counter value for cycle/instret as we can't 
> > > stop the
> > > + * host ticks. But we should show the current value at this 
> > > moment.
> > > + */
> > > +if (riscv_pmu_ctr_monitor_cycles(env, cidx) ||
> > > +riscv_pmu_ctr_monitor_instructions(env, cidx)) {
> > > +counter->mhpmcounter_val = get_ticks(false) -
> > > +   counter->mhpmcounter_prev +
> > > +   counter->mhpmcounter_val;
> > > +if (riscv_cpu_mxl(env) == MXL_RV32) {
> > > +counter->mhpmcounterh_val = get_ticks(false) -
> > > +
> > > counter->mhpmcounterh_prev +
> > > +
> > > counter->mhpmcounterh_val;
> > > +   }
> > > +}
> > >   }
> > >   }
> > > diff --git a/target/riscv/machine.c b/target/riscv/machine.c
> > > index 76f2150f78b5..3e0f2dd2ce2a 100644
> > > --- a/target/riscv/machine.c
> > > +++ b/target/riscv/machine.c
> > > @@ -328,7 +328,6 @@ static const VMStateDescription vmstate_pmu_ctr_state 
> > > = {
> > >   VMSTATE_UINTTL(mhpmcounterh_val, PMUCTRState),
> > >   VMSTATE_UINTTL(mhpmcounter_prev, PMUCTRState),
> > >   VMSTATE_UINTTL(mhpmcounterh_prev, PMUCTRState),
> > > -VMSTATE_BOOL(started, PMUCTRState),
> >
> > Unfortunately we can't remove fields from the VMStateDescription without 
> > breaking
> > migration backward compatibility. Older QEMUs will attempt to read a field 
> > that
> > doesn't exist and migration will fail.
> >
> > I'm assuming that we care about backward compat. If we're not up to this 
> > point yet
> > then w

[PATCH v5 30/32] target/ppc: Add a function to check for page protection bit

2024-05-09 Thread BALATON Zoltan
Checking if a page protection bit is set for a given access type is a
common operation. Add a function to avoid repeating the same check at
multiple places. As this relies on access type and page protection bit
values having certain relation also add an assert to ensure that this
assumption holds.

Signed-off-by: BALATON Zoltan 
---
 target/ppc/cpu_init.c|  5 +
 target/ppc/internal.h| 23 +--
 target/ppc/mmu-hash32.c  |  6 +++---
 target/ppc/mmu-hash64.c  |  2 +-
 target/ppc/mmu-radix64.c |  2 +-
 target/ppc/mmu_common.c  | 26 +-
 6 files changed, 28 insertions(+), 36 deletions(-)

diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 92c71b2a09..d3b92d9f0e 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -7388,6 +7388,11 @@ static void ppc_cpu_class_init(ObjectClass *oc, void 
*data)
 #ifndef CONFIG_USER_ONLY
 cc->sysemu_ops = &ppc_sysemu_ops;
 INTERRUPT_STATS_PROVIDER_CLASS(oc)->get_statistics = ppc_get_irq_stats;
+
+/* check_prot_access_type relies on MMU access and PAGE bits relations */
+qemu_build_assert(MMU_DATA_LOAD == 0 && MMU_DATA_STORE == 1 &&
+  MMU_INST_FETCH == 2 && PAGE_READ == 1 &&
+  PAGE_WRITE == 2 && PAGE_EXEC == 4);
 #endif
 
 cc->gdb_num_core_regs = 71;
diff --git a/target/ppc/internal.h b/target/ppc/internal.h
index 46176c4711..9cdb797dc0 100644
--- a/target/ppc/internal.h
+++ b/target/ppc/internal.h
@@ -234,27 +234,14 @@ void destroy_ppc_opcodes(PowerPCCPU *cpu);
 void ppc_gdb_init(CPUState *cs, PowerPCCPUClass *ppc);
 const gchar *ppc_gdb_arch_name(CPUState *cs);
 
-/**
- * prot_for_access_type:
- * @access_type: Access type
- *
- * Return the protection bit required for the given access type.
- */
-static inline int prot_for_access_type(MMUAccessType access_type)
+#ifndef CONFIG_USER_ONLY
+
+/* Check if permission bit required for the access_type is set in prot */
+static inline int check_prot_access_type(int prot, MMUAccessType access_type)
 {
-switch (access_type) {
-case MMU_INST_FETCH:
-return PAGE_EXEC;
-case MMU_DATA_LOAD:
-return PAGE_READ;
-case MMU_DATA_STORE:
-return PAGE_WRITE;
-}
-g_assert_not_reached();
+return prot & (1 << access_type);
 }
 
-#ifndef CONFIG_USER_ONLY
-
 /* PowerPC MMU emulation */
 static inline int ppc_hash32_pp_prot(int key, int pp, int nx)
 {
diff --git a/target/ppc/mmu-hash32.c b/target/ppc/mmu-hash32.c
index b5d7aeed4e..0b54f923d0 100644
--- a/target/ppc/mmu-hash32.c
+++ b/target/ppc/mmu-hash32.c
@@ -213,7 +213,7 @@ static bool ppc_hash32_direct_store(PowerPCCPU *cpu, 
target_ulong sr,
 }
 
 *prot = key ? PAGE_READ | PAGE_WRITE : PAGE_READ;
-if (*prot & prot_for_access_type(access_type)) {
+if (check_prot_access_type(*prot, access_type)) {
 *raddr = eaddr;
 return true;
 }
@@ -364,7 +364,7 @@ bool ppc_hash32_xlate(PowerPCCPU *cpu, vaddr eaddr, 
MMUAccessType access_type,
 if (env->nb_BATs != 0) {
 raddr = ppc_hash32_bat_lookup(cpu, eaddr, access_type, protp, mmu_idx);
 if (raddr != -1) {
-if (prot_for_access_type(access_type) & ~*protp) {
+if (!check_prot_access_type(*protp, access_type)) {
 if (guest_visible) {
 if (access_type == MMU_INST_FETCH) {
 cs->exception_index = POWERPC_EXCP_ISI;
@@ -432,7 +432,7 @@ bool ppc_hash32_xlate(PowerPCCPU *cpu, vaddr eaddr, 
MMUAccessType access_type,
 
 prot = ppc_hash32_pte_prot(mmu_idx, sr, pte);
 
-if (prot_for_access_type(access_type) & ~prot) {
+if (!check_prot_access_type(prot, access_type)) {
 /* Access right violation */
 qemu_log_mask(CPU_LOG_MMU, "PTE access rejected\n");
 if (guest_visible) {
diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
index 0966422a55..d9626f6aab 100644
--- a/target/ppc/mmu-hash64.c
+++ b/target/ppc/mmu-hash64.c
@@ -1097,7 +1097,7 @@ bool ppc_hash64_xlate(PowerPCCPU *cpu, vaddr eaddr, 
MMUAccessType access_type,
 amr_prot = ppc_hash64_amr_prot(cpu, pte);
 prot = exec_prot & pp_prot & amr_prot;
 
-need_prot = prot_for_access_type(access_type);
+need_prot = check_prot_access_type(PAGE_RWX, access_type);
 if (need_prot & ~prot) {
 /* Access right violation */
 qemu_log_mask(CPU_LOG_MMU, "PTE access rejected\n");
diff --git a/target/ppc/mmu-radix64.c b/target/ppc/mmu-radix64.c
index 395ce3b782..2c5ade5cea 100644
--- a/target/ppc/mmu-radix64.c
+++ b/target/ppc/mmu-radix64.c
@@ -209,7 +209,7 @@ static bool ppc_radix64_check_prot(PowerPCCPU *cpu, 
MMUAccessType access_type,
 }
 
 /* Check if requested access type is allowed */
-if (prot_for_access_type(access_type) & ~*prot) {
+if (!check_prot_access_type(*prot, access_type)) {
 /* Page Protected for that Access */
 *fault_cause |= access_type == MMU_INST_FETCH ? SRR1_NOEXEC_GUARD :
 

[PATCH v5 12/32] target/ppc/mmu_common.c: Inline and remove check_physical()

2024-05-09 Thread BALATON Zoltan
This function just does two assignments and and unnecessary check that
is always true so inline it in the only caller left and remove it.

Signed-off-by: BALATON Zoltan 
Reviewed-by: Nicholas Piggin 
---
 target/ppc/mmu_common.c | 26 +++---
 1 file changed, 3 insertions(+), 23 deletions(-)

diff --git a/target/ppc/mmu_common.c b/target/ppc/mmu_common.c
index b13150ce23..2f412dd7c5 100644
--- a/target/ppc/mmu_common.c
+++ b/target/ppc/mmu_common.c
@@ -1145,28 +1145,6 @@ void dump_mmu(CPUPPCState *env)
 }
 }
 
-static int check_physical(CPUPPCState *env, mmu_ctx_t *ctx, target_ulong eaddr,
-  MMUAccessType access_type)
-{
-ctx->raddr = eaddr;
-ctx->prot = PAGE_READ | PAGE_EXEC;
-
-switch (env->mmu_model) {
-case POWERPC_MMU_SOFT_6xx:
-case POWERPC_MMU_SOFT_4xx:
-case POWERPC_MMU_REAL:
-case POWERPC_MMU_BOOKE:
-ctx->prot |= PAGE_WRITE;
-break;
-
-default:
-/* Caller's checks mean we should never get here for other models */
-g_assert_not_reached();
-}
-
-return 0;
-}
-
 int get_physical_address_wtlb(CPUPPCState *env, mmu_ctx_t *ctx,
  target_ulong eaddr,
  MMUAccessType access_type, int type,
@@ -1186,7 +1164,9 @@ int get_physical_address_wtlb(CPUPPCState *env, mmu_ctx_t 
*ctx,
 if (real_mode && (env->mmu_model == POWERPC_MMU_SOFT_6xx ||
   env->mmu_model == POWERPC_MMU_SOFT_4xx ||
   env->mmu_model == POWERPC_MMU_REAL)) {
-return check_physical(env, ctx, eaddr, access_type);
+ctx->raddr = eaddr;
+ctx->prot = PAGE_RWX;
+return 0;
 }
 
 switch (env->mmu_model) {
-- 
2.30.9




  1   2   >