date:20230808

Re: [PULL 15/47] include/exec/user: Set ABI_LLONG_ALIGNMENT to 4 for nios2

2023-08-08 Thread Michael Tokarev


15.07.2023 16:52, Richard Henderson wrote:

Based on gcc's nios2.h setting BIGGEST_ALIGNMENT to 32 bits.

Signed-off-by: Richard Henderson 
---
  include/exec/user/abitypes.h | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/exec/user/abitypes.h b/include/exec/user/abitypes.h
index beba0a48c7..6191ce9f74 100644
--- a/include/exec/user/abitypes.h
+++ b/include/exec/user/abitypes.h
@@ -17,7 +17,8 @@
  
  #if (defined(TARGET_I386) && !defined(TARGET_X86_64)) \

  || defined(TARGET_SH4) \
-|| defined(TARGET_MICROBLAZE)
+|| defined(TARGET_MICROBLAZE) \
+|| defined(TARGET_NIOS2)
  #define ABI_LLONG_ALIGNMENT 4
  #endif


Hi!

It smells like we should pick a few of these changes for -stable too, no?

6ee960823d Fixed incorrect LLONG alignment for openrisc and cris
ea9812d93f include/exec/user: Set ABI_LLONG_ALIGNMENT to 4 for nios2
e73f27003e include/exec/user: Set ABI_LLONG_ALIGNMENT to 4 for microblaze

Thanks,

/mjt

Re: [RFC PATCH] hw/riscv: hart: allow other cpu instance

2023-08-08 Thread Nikita Shubin

Hello Deniel!

On Mon, 2023-07-31 at 11:12 -0300, Daniel Henrique Barboza wrote:
> > 
> > 
> > On 7/27/23 05:05, Nikita Shubin wrote:
> > > > From: Nikita Shubin 
> > > > 
> > > > Allow using instances derivative from RISCVCPU
> > > > 
> > > > Signed-off-by: Nikita Shubin 
> > > > ---
> > > > Currently it is not possible to overload instance of RISCVCPU,
> > > > i.e. something like this:
> > > > 
> > > > static const TypeInfo riscv_cpu_type_infos[] = {
> > > >   {
> > > >  .name = TYPE_ANOTHER_RISCV_CPU,
> > > >  .parent = TYPE_RISCV_CPU,
> > > >  .instance_size = sizeof(MyCPUState),
> > > >  .instance_init = riscv_my_cpu_init,
> > > >  }
> > > > };
> > > > 
> > > > Because we have RISCVHartArrayState.harts with exactly
> > > > the size of RISCVCPU.
> > > > 
> > > > Using own instances can be used to store some internal hart
> > > > state.
> > > > ---
> > > >   hw/riscv/boot.c   |  5 +++--
> > > >   hw/riscv/riscv_hart.c | 20 
> > > >   hw/riscv/sifive_u.c   |  7 +--
> > > >   hw/riscv/spike.c  |  4 +++-
> > > >   hw/riscv/virt-acpi-build.c    |  2 +-
> > > >   hw/riscv/virt.c   |  6 +++---
> > > >   include/hw/riscv/riscv_hart.h | 18 +-
> > > >   7 files changed, 44 insertions(+), 18 deletions(-)
> > > > 
> > > > diff --git a/hw/riscv/boot.c b/hw/riscv/boot.c
> > > > index 52bf8e67de..c0456dcc2e 100644
> > > > --- a/hw/riscv/boot.c
> > > > +++ b/hw/riscv/boot.c
> > > > @@ -36,7 +36,8 @@
> > > >   
> > > >   bool riscv_is_32bit(RISCVHartArrayState *harts)
> > > >   {
> > > > -    return harts->harts[0].env.misa_mxl_max == MXL_RV32;
> > > > +    RISCVCPU *hart = riscv_array_get_hart(harts, 0);
> > > > +    return hart->env.misa_mxl_max == MXL_RV32;
> > > >   }
> > > >   
> > > >   /*
> > > > @@ -414,7 +415,7 @@ void riscv_setup_rom_reset_vec(MachineState
> > > > *machine, RISCVHartArrayState *harts
> > > >   reset_vec[4] = 0x0182b283;   /* ld t0, 24(t0)
> > > > */
> > > >   }
> > > >   
> > > > -    if (!harts->harts[0].cfg.ext_icsr) {
> > > > +    if (!riscv_array_get_hart(harts, 0)->cfg.ext_icsr) {
> > > >   /*
> > > >    * The Zicsr extension has been disabled, so let's
> > > > ensure
> > > > we don't
> > > >    * run the CSR instruction. Let's fill the address
> > > > with a
> > > > non
> > > > diff --git a/hw/riscv/riscv_hart.c b/hw/riscv/riscv_hart.c
> > > > index 613ea2aaa0..74fc10ef48 100644
> > > > --- a/hw/riscv/riscv_hart.c
> > > > +++ b/hw/riscv/riscv_hart.c
> > > > @@ -43,24 +43,28 @@ static void riscv_harts_cpu_reset(void
> > > > *opaque)
> > > >   }
> > > >   
> > > >   static bool riscv_hart_realize(RISCVHartArrayState *s, int
> > > > idx,
> > > > -   char *cpu_type, Error **errp)
> > > > +   char *cpu_type, size_t size,
> > > > Error
> > > > **errp)
> > > >   {
> > > > -    object_initialize_child(OBJECT(s), "harts[*]", &s-
> > > > >harts[idx],
> > > > cpu_type);
> > > > -    qdev_prop_set_uint64(DEVICE(&s->harts[idx]), "resetvec",
> > > > s->resetvec);
> > > > -    s->harts[idx].env.mhartid = s->hartid_base + idx;
> > > > -    qemu_register_reset(riscv_harts_cpu_reset, &s-
> > > > >harts[idx]);
> > > > -    return qdev_realize(DEVICE(&s->harts[idx]), NULL, errp);
> > > > +    RISCVCPU *hart = riscv_array_get_hart(s, idx);
> > > > +    object_initialize_child_internal(OBJECT(s), "harts[*]",
> > > > +    hart, size, cpu_type);
> > > > +    qdev_prop_set_uint64(DEVICE(hart), "resetvec", s-
> > > > >resetvec);
> > > > +    hart->env.mhartid = s->hartid_base + idx;
> > > > +    qemu_register_reset(riscv_harts_cpu_reset, hart);
> > > > +    return qdev_realize(DEVICE(hart), NULL, errp);
> > > >   }
> > > >   
> > > >   static void riscv_harts_realize(DeviceState *dev, Error
> > > > **errp)
> > > >   {
> > > >   RISCVHartArrayState *s = RISCV_HART_ARRAY(dev);
> > > > +    size_t size = object_type_get_instance_size(s->cpu_type);
> > > >   int n;
> > > >   
> > > > -    s->harts = g_new0(RISCVCPU, s->num_harts);
> > > > +    s->harts = g_new0(RISCVCPU *, s->num_harts);
> > > >   
> > > >   for (n = 0; n < s->num_harts; n++) {
> > > > -    if (!riscv_hart_realize(s, n, s->cpu_type, errp)) {
> > > > +    s->harts[n] = RISCV_CPU(object_new(s->cpu_type));
> > > > +    if (!riscv_hart_realize(s, n, s->cpu_type, size,
> > > > errp)) {
> > > >   return;
> > 
> > Not an issue with this patch but riscv_hart_realize() can use some
> > review. I
> > I think that we're doing stuff in the wrong place. Perhaps I'll
> > look
> > into it.

Shoudn't allocation and mhartid assignment happen earlier in
instance_init() ?

> > 
> > 
> > > >   }
> > > >   }
> > > > diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
> > > > index 35a335b8d0..b8a54db81b 100644
> > > > --- a/hw/riscv/sifive_u.c
> > > > +++ b/hw/

Re: [PATCH] block-migration: Ensure we don't crash during migration cleanup

2023-08-08 Thread Claudio Fontana

added Kevin and Hanna for block, since this seems still untouched?

Thanks,

Claudio

On 7/31/23 22:33, Fabiano Rosas wrote:
> We can fail the blk_insert_bs() at init_blk_migration(), leaving the
> BlkMigDevState without a dirty_bitmap and BlockDriverState. Account
> for the possibly missing elements when doing cleanup.
> 
> Fix the following crashes:
> 
> Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> 0x55ec83ef in bdrv_release_dirty_bitmap (bitmap=0x0) at 
> ../block/dirty-bitmap.c:359
> 359 BlockDriverState *bs = bitmap->bs;
>  #0  0x55ec83ef in bdrv_release_dirty_bitmap (bitmap=0x0) at 
> ../block/dirty-bitmap.c:359
>  #1  0x55bba331 in unset_dirty_tracking () at ../migration/block.c:371
>  #2  0x55bbad98 in block_migration_cleanup_bmds () at 
> ../migration/block.c:681
> 
> Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> 0x55e971ff in bdrv_op_unblock (bs=0x0, 
> op=BLOCK_OP_TYPE_BACKUP_SOURCE, reason=0x0) at ../block.c:7073
> 7073QLIST_FOREACH_SAFE(blocker, &bs->op_blockers[op], list, next) {
>  #0  0x55e971ff in bdrv_op_unblock (bs=0x0, 
> op=BLOCK_OP_TYPE_BACKUP_SOURCE, reason=0x0) at ../block.c:7073
>  #1  0x55e9734a in bdrv_op_unblock_all (bs=0x0, reason=0x0) at 
> ../block.c:7095
>  #2  0x55bbae13 in block_migration_cleanup_bmds () at 
> ../migration/block.c:690
> 
> Signed-off-by: Fabiano Rosas 
> ---
>  migration/block.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/block.c b/migration/block.c
> index b9580a6c7e..86c2256a2b 100644
> --- a/migration/block.c
> +++ b/migration/block.c
> @@ -368,7 +368,9 @@ static void unset_dirty_tracking(void)
>  BlkMigDevState *bmds;
>  
>  QSIMPLEQ_FOREACH(bmds, &block_mig_state.bmds_list, entry) {
> -bdrv_release_dirty_bitmap(bmds->dirty_bitmap);
> +if (bmds->dirty_bitmap) {
> +bdrv_release_dirty_bitmap(bmds->dirty_bitmap);
> +}
>  }
>  }
>  
> @@ -676,13 +678,18 @@ static int64_t get_remaining_dirty(void)
>  static void block_migration_cleanup_bmds(void)
>  {
>  BlkMigDevState *bmds;
> +BlockDriverState *bs;
>  AioContext *ctx;
>  
>  unset_dirty_tracking();
>  
>  while ((bmds = QSIMPLEQ_FIRST(&block_mig_state.bmds_list)) != NULL) {
>  QSIMPLEQ_REMOVE_HEAD(&block_mig_state.bmds_list, entry);
> -bdrv_op_unblock_all(blk_bs(bmds->blk), bmds->blocker);
> +
> +bs = blk_bs(bmds->blk);
> +if (bs) {
> +bdrv_op_unblock_all(bs, bmds->blocker);
> +}
>  error_free(bmds->blocker);
>  
>  /* Save ctx, because bmds->blk can disappear during blk_unref.  */

[PATCH for-8.2 0/3] pnv/lpc: Hook up xscoms for LPC

2023-08-08 Thread Joel Stanley

P8 used xscoms for accessing the LPC bus. In P9 the LPC bus was memory
mapped, simplifying access, however the xscom registers remained. Some
firmwares use this method on P9 and P10, so wire it up in qemu.

The third patch is a hack that shows how to test this access method with
skiboot. It should not be applied.

Joel Stanley (3):
  pnv/lpc: Place mmio regs in their own memory region
  pnv/lpc: Hook up xscom region for P9/P10
  HACK: pnv/lpc: Set up XSCOM dt for P9

 include/hw/ppc/pnv_lpc.h   |  3 ++-
 include/hw/ppc/pnv_xscom.h |  6 ++
 hw/ppc/pnv.c   |  8 ++--
 hw/ppc/pnv_lpc.c   | 13 -
 4 files changed, 26 insertions(+), 4 deletions(-)

-- 
2.40.1

[PATCH for-8.2 2/3] pnv/lpc: Hook up xscom region for P9/P10

2023-08-08 Thread Joel Stanley

>From P9 on the LPC bus is memory mapped. However the xscom access still
is possible, so add it too.

Signed-off-by: Joel Stanley 
---
 include/hw/ppc/pnv_xscom.h | 6 ++
 hw/ppc/pnv.c   | 4 
 hw/ppc/pnv_lpc.c   | 6 ++
 3 files changed, 16 insertions(+)

diff --git a/include/hw/ppc/pnv_xscom.h b/include/hw/ppc/pnv_xscom.h
index 9bc64635471e..42601bdf419d 100644
--- a/include/hw/ppc/pnv_xscom.h
+++ b/include/hw/ppc/pnv_xscom.h
@@ -96,6 +96,9 @@ struct PnvXScomInterfaceClass {
 #define PNV9_XSCOM_SBE_CTRL_BASE  0x00050008
 #define PNV9_XSCOM_SBE_CTRL_SIZE  0x1
 
+#define PNV9_XSCOM_LPC_BASE   0x00090040
+#define PNV9_XSCOM_LPC_SIZE   PNV_XSCOM_LPC_SIZE
+
 #define PNV9_XSCOM_SBE_MBOX_BASE  0x000D0050
 #define PNV9_XSCOM_SBE_MBOX_SIZE  0x16
 
@@ -155,6 +158,9 @@ struct PnvXScomInterfaceClass {
 #define PNV10_XSCOM_SBE_CTRL_BASE  PNV9_XSCOM_SBE_CTRL_BASE
 #define PNV10_XSCOM_SBE_CTRL_SIZE  PNV9_XSCOM_SBE_CTRL_SIZE
 
+#define PNV10_XSCOM_LPC_BASE   PNV9_XSCOM_LPC_BASE
+#define PNV10_XSCOM_LPC_SIZE   PNV9_XSCOM_LPC_SIZE
+
 #define PNV10_XSCOM_SBE_MBOX_BASE  PNV9_XSCOM_SBE_MBOX_BASE
 #define PNV10_XSCOM_SBE_MBOX_SIZE  PNV9_XSCOM_SBE_MBOX_SIZE
 
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index afdaa25c2b26..a5db655b41b6 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -1566,6 +1566,8 @@ static void pnv_chip_power9_realize(DeviceState *dev, 
Error **errp)
 }
 memory_region_add_subregion(get_system_memory(), PNV9_LPCM_BASE(chip),
 &chip9->lpc.mmio_regs);
+pnv_xscom_add_subregion(chip, PNV9_XSCOM_LPC_BASE,
+&chip9->lpc.xscom_regs);
 
 chip->fw_mr = &chip9->lpc.isa_fw;
 chip->dt_isa_nodename = g_strdup_printf("/lpcm-opb@%" PRIx64 "/lpc@0",
@@ -1785,6 +1787,8 @@ static void pnv_chip_power10_realize(DeviceState *dev, 
Error **errp)
 }
 memory_region_add_subregion(get_system_memory(), PNV10_LPCM_BASE(chip),
 &chip10->lpc.mmio_regs);
+pnv_xscom_add_subregion(chip, PNV10_XSCOM_LPC_BASE,
+&chip10->lpc.xscom_regs);
 
 chip->fw_mr = &chip10->lpc.isa_fw;
 chip->dt_isa_nodename = g_strdup_printf("/lpcm-opb@%" PRIx64 "/lpc@0",
diff --git a/hw/ppc/pnv_lpc.c b/hw/ppc/pnv_lpc.c
index caf5e10a5f96..6c6a3134087f 100644
--- a/hw/ppc/pnv_lpc.c
+++ b/hw/ppc/pnv_lpc.c
@@ -666,6 +666,12 @@ static void pnv_lpc_power9_realize(DeviceState *dev, Error 
**errp)
 /* P9 uses a MMIO region */
 memory_region_init_io(&lpc->mmio_regs, OBJECT(lpc), &pnv_lpc_mmio_ops,
   lpc, "lpcm", PNV9_LPCM_SIZE);
+
+/* but the XSCOM region still exists */
+pnv_xscom_region_init(&lpc->xscom_regs, OBJECT(lpc),
+  &pnv_lpc_xscom_ops, lpc, "xscom-lpc",
+  PNV_XSCOM_LPC_SIZE);
+
 }
 
 static void pnv_lpc_power9_class_init(ObjectClass *klass, void *data)
-- 
2.40.1

[PATCH for-8.2 1/3] pnv/lpc: Place mmio regs in their own memory region

2023-08-08 Thread Joel Stanley

The P9 and P10 models re-used the xscom_regs memory region for the mmio
access, which is confusing.

Add a separate memory region in preparation for enabling both xscom and
mmio access.

Signed-off-by: Joel Stanley 
---
 include/hw/ppc/pnv_lpc.h | 3 ++-
 hw/ppc/pnv.c | 4 ++--
 hw/ppc/pnv_lpc.c | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/hw/ppc/pnv_lpc.h b/include/hw/ppc/pnv_lpc.h
index 5d22c4557041..3000964f8999 100644
--- a/include/hw/ppc/pnv_lpc.h
+++ b/include/hw/ppc/pnv_lpc.h
@@ -81,8 +81,9 @@ struct PnvLpcController {
 uint32_t lpc_hc_irqstat;
 uint32_t lpc_hc_error_addr;
 
-/* XSCOM registers */
+/* Registers */
 MemoryRegion xscom_regs;
+MemoryRegion mmio_regs;
 
 /* PSI to generate interrupts */
 qemu_irq psi_irq;
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index eb54f93986df..afdaa25c2b26 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -1565,7 +1565,7 @@ static void pnv_chip_power9_realize(DeviceState *dev, 
Error **errp)
 return;
 }
 memory_region_add_subregion(get_system_memory(), PNV9_LPCM_BASE(chip),
-&chip9->lpc.xscom_regs);
+&chip9->lpc.mmio_regs);
 
 chip->fw_mr = &chip9->lpc.isa_fw;
 chip->dt_isa_nodename = g_strdup_printf("/lpcm-opb@%" PRIx64 "/lpc@0",
@@ -1784,7 +1784,7 @@ static void pnv_chip_power10_realize(DeviceState *dev, 
Error **errp)
 return;
 }
 memory_region_add_subregion(get_system_memory(), PNV10_LPCM_BASE(chip),
-&chip10->lpc.xscom_regs);
+&chip10->lpc.mmio_regs);
 
 chip->fw_mr = &chip10->lpc.isa_fw;
 chip->dt_isa_nodename = g_strdup_printf("/lpcm-opb@%" PRIx64 "/lpc@0",
diff --git a/hw/ppc/pnv_lpc.c b/hw/ppc/pnv_lpc.c
index d692858bee78..caf5e10a5f96 100644
--- a/hw/ppc/pnv_lpc.c
+++ b/hw/ppc/pnv_lpc.c
@@ -664,7 +664,7 @@ static void pnv_lpc_power9_realize(DeviceState *dev, Error 
**errp)
 }
 
 /* P9 uses a MMIO region */
-memory_region_init_io(&lpc->xscom_regs, OBJECT(lpc), &pnv_lpc_mmio_ops,
+memory_region_init_io(&lpc->mmio_regs, OBJECT(lpc), &pnv_lpc_mmio_ops,
   lpc, "lpcm", PNV9_LPCM_SIZE);
 }
 
-- 
2.40.1

[PATCH for-8.2 3/3] HACK: pnv/lpc: Set up XSCOM dt for P9

2023-08-08 Thread Joel Stanley

To test qemu's model of the xscom interface, apply this patch to qemu
and the following change to skiboot:

  --- a/hw/lpc.c
  +++ b/hw/lpc.c
  @@ -1266,7 +1266,7 @@ static void lpc_init_chip_p9(struct dt_node *opb_node)
  lpc = zalloc(sizeof(struct lpcm));
  assert(lpc);
  lpc->chip_id = gcid;
  -   lpc->mbase = (void *)addr;
  +   lpc->xbase = dt_get_address(lpc_node, 0, NULL);
  lpc->fw_idsel = 0xff;
  lpc->fw_rdsz = 0xff;
  lpc->node = lpc_node;

Signed-off-by: Joel Stanley 
---
 hw/ppc/pnv_lpc.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/ppc/pnv_lpc.c b/hw/ppc/pnv_lpc.c
index 6c6a3134087f..62ab688407a3 100644
--- a/hw/ppc/pnv_lpc.c
+++ b/hw/ppc/pnv_lpc.c
@@ -218,6 +218,11 @@ int pnv_dt_lpc(PnvChip *chip, void *fdt, int root_offset, 
uint64_t lpcm_addr,
 offset = fdt_add_subnode(fdt, lpcm_offset, name);
 _FDT(offset);
 g_free(name);
+uint32_t lpc_pcba = PNV9_XSCOM_LPC_BASE;
+reg[0] = cpu_to_be32(lpc_pcba);
+reg[1] = cpu_to_be32(PNV_XSCOM_LPC_SIZE);
+
+_FDT((fdt_setprop(fdt, offset, "reg", reg, sizeof(reg;
 _FDT((fdt_setprop_cell(fdt, offset, "#address-cells", 2)));
 _FDT((fdt_setprop_cell(fdt, offset, "#size-cells", 1)));
 _FDT((fdt_setprop(fdt, offset, "compatible", lpc_compat,
-- 
2.40.1

Re: [PATCH v2 00/12] tests: enable meson test timeouts to improve debuggability

2023-08-08 Thread Alex Bennée



Daniel P. Berrangé  writes:

> Perhaps the most painful of all the GitLab CI failures we see are
> the enforced job timeouts:
>
>"ERROR: Job failed: execution took longer than 1h15m0s seconds"
>
>https://gitlab.com/qemu-project/qemu/-/jobs/4387047648
>
> when that hits the CI log shows what has *already* run, but figuring
> out what was currently running (or rather stuck) is an horrendously
> difficult.

I had this in my tree but I see there are a number of review comments to
take into account. Will there be a v3 and do we want it this late in the
cycle?

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

[PATCH v3] target/riscv: don't read CSR in riscv_csrrw_do64

2023-08-08 Thread Nikita Shubin

From: Nikita Shubin 

As per ISA:

"For CSRRWI, if rd=x0, then the instruction shall not read the CSR and
shall not cause any of the side effects that might occur on a CSR read."

trans_csrrwi() and trans_csrrw() call do_csrw() if rd=x0, do_csrw() calls
riscv_csrrw_do64(), via helper_csrw() passing NULL as *ret_value.

Signed-off-by: Nikita Shubin 
---
Changelog v2:
- fixed uninitialized old_value

Changelog v3:
- reword comment and commit message as Deniel suggested

---
 target/riscv/csr.c | 24 +++-
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index ea7585329e..c5564d6d53 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -3908,21 +3908,27 @@ static RISCVException riscv_csrrw_do64(CPURISCVState 
*env, int csrno,
target_ulong write_mask)
 {
 RISCVException ret;
-target_ulong old_value;
+target_ulong old_value = 0;
 
 /* execute combined read/write operation if it exists */
 if (csr_ops[csrno].op) {
 return csr_ops[csrno].op(env, csrno, ret_value, new_value, write_mask);
 }
 
-/* if no accessor exists then return failure */
-if (!csr_ops[csrno].read) {
-return RISCV_EXCP_ILLEGAL_INST;
-}
-/* read old value */
-ret = csr_ops[csrno].read(env, csrno, &old_value);
-if (ret != RISCV_EXCP_NONE) {
-return ret;
+/*
+ * ret_value == NULL means that rd=x0 and we're coming from helper_csrw()
+ * and we can't throw side effects caused by CSR reads.
+ */
+if (ret_value) {
+/* if no accessor exists then return failure */
+if (!csr_ops[csrno].read) {
+return RISCV_EXCP_ILLEGAL_INST;
+}
+/* read old value */
+ret = csr_ops[csrno].read(env, csrno, &old_value);
+if (ret != RISCV_EXCP_NONE) {
+return ret;
+}
 }
 
 /* write value if writable and write mask set, otherwise drop writes */
-- 
2.39.2

Re: [PATCH for-8.1 v10 01/14] linux-user: Adjust task_unmapped_base for reserved_va

2023-08-08 Thread Alex Bennée



Richard Henderson  writes:

> Ensure that the chosen values for mmap_next_start and
> task_unmapped_base are within the guest address space.
>
> Tested-by: Helge Deller 
> Reviewed-by: Akihiko Odaki 
> Signed-off-by: Richard Henderson 
> ---
>  linux-user/user-mmap.h | 18 +-
>  linux-user/main.c  | 28 
>  linux-user/mmap.c  | 18 +++---
>  3 files changed, 48 insertions(+), 16 deletions(-)
>
> diff --git a/linux-user/user-mmap.h b/linux-user/user-mmap.h
> index 7265c2c116..fd456e024e 100644
> --- a/linux-user/user-mmap.h
> +++ b/linux-user/user-mmap.h
> @@ -18,6 +18,23 @@
>  #ifndef LINUX_USER_USER_MMAP_H
>  #define LINUX_USER_USER_MMAP_H
>  
> +#if HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 64
> +#ifdef TARGET_AARCH64
> +# define TASK_UNMAPPED_BASE  0x55
> +#else
> +# define TASK_UNMAPPED_BASE  (1ul << 38)
> +#endif
> +#else
> +#ifdef TARGET_HPPA
> +# define TASK_UNMAPPED_BASE  0xfa00
> +#else
> +# define TASK_UNMAPPED_BASE  0x4000
> +#endif
> +#endif
> +
> +extern abi_ulong task_unmapped_base;
> +extern abi_ulong mmap_next_start;
> +
>  int target_mprotect(abi_ulong start, abi_ulong len, int prot);
>  abi_long target_mmap(abi_ulong start, abi_ulong len, int prot,
>   int flags, int fd, off_t offset);
> @@ -26,7 +43,6 @@ abi_long target_mremap(abi_ulong old_addr, abi_ulong 
> old_size,
> abi_ulong new_size, unsigned long flags,
> abi_ulong new_addr);
>  abi_long target_madvise(abi_ulong start, abi_ulong len_in, int advice);
> -extern abi_ulong mmap_next_start;
>  abi_ulong mmap_find_vma(abi_ulong, abi_ulong, abi_ulong);
>  void mmap_fork_start(void);
>  void mmap_fork_end(int child);
> diff --git a/linux-user/main.c b/linux-user/main.c
> index 556956c363..be621dc792 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -821,6 +821,34 @@ int main(int argc, char **argv, char **envp)
>  reserved_va = max_reserved_va;
>  }
>  
> +/*
> + * Temporarily disable
> + *   "comparison is always false due to limited range of data type"
> + * due to comparison between (possible) uint64_t and uintptr_t.
> + */
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Wtype-limits"
> +
> +/*
> + * Select an initial value for task_unmapped_base that is in range.
> + */
> +if (reserved_va) {
> +if (TASK_UNMAPPED_BASE < reserved_va) {
> +task_unmapped_base = TASK_UNMAPPED_BASE;
> +} else {
> +/* The most common default formula is TASK_SIZE / 3. */
> +task_unmapped_base = TARGET_PAGE_ALIGN(reserved_va / 3);
> +}
> +} else if (TASK_UNMAPPED_BASE < UINTPTR_MAX) {
> +task_unmapped_base = TASK_UNMAPPED_BASE;
> +} else {
> +/* 32-bit host: pick something medium size. */
> +task_unmapped_base = 0x1000;
> +}
> +mmap_next_start = task_unmapped_base;
> +
> +#pragma GCC diagnostic pop
> +
>  {
>  Error *err = NULL;
>  if (seed_optarg != NULL) {
> diff --git a/linux-user/mmap.c b/linux-user/mmap.c
> index eb04fab8ab..84436d45c8 100644
> --- a/linux-user/mmap.c
> +++ b/linux-user/mmap.c
> @@ -299,20 +299,8 @@ static bool mmap_frag(abi_ulong real_start, abi_ulong 
> start, abi_ulong last,
>  return true;
>  }
>  
> -#if HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 64
> -#ifdef TARGET_AARCH64
> -# define TASK_UNMAPPED_BASE  0x55
> -#else
> -# define TASK_UNMAPPED_BASE  (1ul << 38)
> -#endif
> -#else
> -#ifdef TARGET_HPPA
> -# define TASK_UNMAPPED_BASE  0xfa00
> -#else
> -# define TASK_UNMAPPED_BASE  0x4000
> -#endif
> -#endif
> -abi_ulong mmap_next_start = TASK_UNMAPPED_BASE;
> +abi_ulong task_unmapped_base;
> +abi_ulong mmap_next_start;

I feel we could help ourselves a bit more by documenting these globals
and what they mean:

  task_unmapped_base represents the start of unmapped memory in the
  guests programs address space. It is generally a function of the size
  of the address space and it defined at the start of execution.
  mmap_next_start is the base address for the next anonymous mmap and is
  increased after each successful map, starting at task_unmapped_base.

One thing I'm slightly confused by is the ELF_ET_DYN_BASE can be above
this (or sometimes the same). Should the mapping of ELF segments be
handled with mmap_next_start? I assume once mmap_next_start meets the
mappings for the ELF segments we skip over until we get to more free
space after the program code?

>  
>  /*
>   * Subroutine of mmap_find_vma, used when we have pre-allocated
> @@ -391,7 +379,7 @@ abi_ulong mmap_find_vma(abi_ulong start, abi_ulong size, 
> abi_ulong align)
>  
>  if ((addr & (align - 1)) == 0) {
>  /* Success.  */
> -if (start == mmap_next_start && addr >= TASK_UNMAPPED_BASE) {
> +if (start == mmap_next_start && addr >= task_unmapped_ba

Re: [PATCH for-8.1 v10 02/14] linux-user: Define TASK_UNMAPPED_BASE in $guest/target_mman.h

2023-08-08 Thread Alex Bennée



Richard Henderson  writes:

> Provide default values that are as close as possible to the
> values used by the guest's kernel.
>
> Tested-by: Helge Deller 
> Reviewed-by: Helge Deller 
> Reviewed-by: Akihiko Odaki 
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

[PATCH] linux-user/riscv: Use abi_ulong for target_ucontext

2023-08-08 Thread LIU Zhiwei

We should not use types dependend on host arch for target_ucontext.
This bug is found when run rv32 applications.

Signed-off-by: LIU Zhiwei 
---
 linux-user/riscv/signal.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/linux-user/riscv/signal.c b/linux-user/riscv/signal.c
index eaa168199a..ff8634a272 100644
--- a/linux-user/riscv/signal.c
+++ b/linux-user/riscv/signal.c
@@ -38,8 +38,8 @@ struct target_sigcontext {
 }; /* cf. riscv-linux:arch/riscv/include/uapi/asm/ptrace.h */
 
 struct target_ucontext {
-unsigned long uc_flags;
-struct target_ucontext *uc_link;
+abi_ulong uc_flags;
+abi_ulong uc_link;
 target_stack_t uc_stack;
 target_sigset_t uc_sigmask;
 uint8_t   __unused[1024 / 8 - sizeof(target_sigset_t)];
-- 
2.17.1

Re: [PATCH for-8.1 v10 04/14] linux-user: Use MAP_FIXED_NOREPLACE for initial image mmap

2023-08-08 Thread Alex Bennée



Richard Henderson  writes:

> Use this as extra protection for the guest mapping over
> any qemu host mappings.
>
> Tested-by: Helge Deller 
> Reviewed-by: Helge Deller 
> Reviewed-by: Akihiko Odaki 
> Signed-off-by: Richard Henderson 
> ---
>  linux-user/elfload.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/linux-user/elfload.c b/linux-user/elfload.c
> index 36e4026f05..1b4bb2d5af 100644
> --- a/linux-user/elfload.c
> +++ b/linux-user/elfload.c
> @@ -3147,8 +3147,11 @@ static void load_elf_image(const char *image_name, int 
> image_fd,
>  /*
>   * Reserve address space for all of this.
>   *
> - * In the case of ET_EXEC, we supply MAP_FIXED so that we get
> - * exactly the address range that is required.
> + * In the case of ET_EXEC, we supply MAP_FIXED_NOREPLACE so that we get
> + * exactly the address range that is required.  Without reserved_va,
> + * the guest address space is not isolated.  We have attempted to avoid
> + * conflict with the host program itself via probe_guest_base, but using
> + * MAP_FIXED_NOREPLACE instead of MAP_FIXED provides an extra check.
>   *
>   * Otherwise this is ET_DYN, and we are searching for a location
>   * that can hold the memory space required.  If the image is
> @@ -3160,7 +3163,7 @@ static void load_elf_image(const char *image_name, int 
> image_fd,
>   */
>  load_addr = target_mmap(loaddr, (size_t)hiaddr - loaddr + 1, PROT_NONE,
>  MAP_PRIVATE | MAP_ANON | MAP_NORESERVE |
> -(ehdr->e_type == ET_EXEC ? MAP_FIXED : 0),
> +(ehdr->e_type == ET_EXEC ? MAP_FIXED_NOREPLACE : 
> 0),
>  -1, 0);

We should probably also check the result == load_addr for the places
where MAP_FIXED_NOREPLACE isn't supported as we have this in osdep.h:

  #ifndef MAP_FIXED_NOREPLACE
  #define MAP_FIXED_NOREPLACE 0
  #endif

See 2667e069e7 (linux-user: don't use MAP_FIXED in pgd_find_hole_fallback)

>  if (load_addr == -1) {
>  goto exit_mmap;


-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH for-8.1 v10 05/14] linux-user: Use elf_et_dyn_base for ET_DYN with interpreter

2023-08-08 Thread Alex Bennée



Richard Henderson  writes:

> Follow the lead of the linux kernel in fs/binfmt_elf.c,
> in which an ET_DYN executable which uses an interpreter
> (usually a PIE executable) is loaded away from where the
> interpreter itself will be loaded.
>
> Tested-by: Helge Deller 
> Reviewed-by: Helge Deller 
> Reviewed-by: Akihiko Odaki 
> Signed-off-by: Richard Henderson 

> @@ -3155,13 +3178,13 @@ static void load_elf_image(const char *image_name, 
> int image_fd,
>   *
>   * Otherwise this is ET_DYN, and we are searching for a location
>   * that can hold the memory space required.  If the image is
> - * pre-linked, LOADDR will be non-zero, and the kernel should
> + * pre-linked, LOAD_ADDR will be non-zero, and the kernel should
>   * honor that address if it happens to be free.
>   *
>   * In both cases, we will overwrite pages in this range with mappings
>   * from the executable.
>   */
> -load_addr = target_mmap(loaddr, (size_t)hiaddr - loaddr + 1, PROT_NONE,
> +load_addr = target_mmap(load_addr, (size_t)hiaddr - loaddr + 1, 
> PROT_NONE,
>  MAP_PRIVATE | MAP_ANON | MAP_NORESERVE |
>  (ehdr->e_type == ET_EXEC ? MAP_FIXED_NOREPLACE : 
> 0),
>  -1, 0);

See previous comment about verifying address.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH] target/loongarch: Split fcc register to fcc0-7 in gdbstub

2023-08-08 Thread Jiajie Chen




On 2023/8/8 14:10, bibo mao wrote:

I am not familiar with gdb, is there  abi breakage?
I do not know how gdb client works with gdb server with different versions.


Not abi breakage, but gdb will complain:

warning: while parsing target description (at line 1): Target 
description specified unknown architecture "loongarch64"

warning: Could not load XML target description; ignoring
warning: No executable has been specified and target does not support
determining executable automatically.  Try using the "file" command.
Truncated register 38 in remote 'g' packet

And gdb can no longer debug kernel running in qemu. You can reproduce 
this error using latest qemu(without this patch) and gdb(13.1 or later).




Regards
Bibo Mao


在 2023/8/8 13:42, Jiajie Chen 写道:

Since GDB 13.1(GDB commit ea3352172), GDB LoongArch changed to use
fcc0-7 instead of fcc register. This commit partially reverts commit
2f149c759 (`target/loongarch: Update gdb_set_fpu() and gdb_get_fpu()`)
to match the behavior of GDB.

Note that it is a breaking change for GDB 13.0 or earlier, but it is
also required for GDB 13.1 or later to work.

Signed-off-by: Jiajie Chen 
---
  gdb-xml/loongarch-fpu.xml  |  9 -
  target/loongarch/gdbstub.c | 16 +++-
  2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/gdb-xml/loongarch-fpu.xml b/gdb-xml/loongarch-fpu.xml
index 78e42cf5dd..e81e3382e7 100644
--- a/gdb-xml/loongarch-fpu.xml
+++ b/gdb-xml/loongarch-fpu.xml
@@ -45,6 +45,13 @@



-  
+  
+  
+  
+  
+  
+  
+  
+  

  
diff --git a/target/loongarch/gdbstub.c b/target/loongarch/gdbstub.c
index 0752fff924..15ad6778f1 100644
--- a/target/loongarch/gdbstub.c
+++ b/target/loongarch/gdbstub.c
@@ -70,10 +70,9 @@ static int loongarch_gdb_get_fpu(CPULoongArchState *env,
  {
  if (0 <= n && n < 32) {
  return gdb_get_reg64(mem_buf, env->fpr[n].vreg.D(0));
-} else if (n == 32) {
-uint64_t val = read_fcc(env);
-return gdb_get_reg64(mem_buf, val);
-} else if (n == 33) {
+} else if (32 <= n && n < 40) {
+return gdb_get_reg8(mem_buf, env->cf[n - 32]);
+} else if (n == 40) {
  return gdb_get_reg32(mem_buf, env->fcsr0);
  }
  return 0;
@@ -87,11 +86,10 @@ static int loongarch_gdb_set_fpu(CPULoongArchState *env,
  if (0 <= n && n < 32) {
  env->fpr[n].vreg.D(0) = ldq_p(mem_buf);
  length = 8;
-} else if (n == 32) {
-uint64_t val = ldq_p(mem_buf);
-write_fcc(env, val);
-length = 8;
-} else if (n == 33) {
+} else if (32 <= n && n < 40) {
+env->cf[n - 32] = ldub_p(mem_buf);
+length = 1;
+} else if (n == 40) {
  env->fcsr0 = ldl_p(mem_buf);
  length = 4;
  }

Re: [PATCH RFC 1/1] tcg: Always pass the full write size to notdirty_write()

2023-08-08 Thread Ilya Leoshkevich

On Mon, 2023-08-07 at 11:21 -0700, Richard Henderson wrote:
> On 8/7/23 06:56, Ilya Leoshkevich wrote:
> > One of notdirty_write()'s responsibilities is detecting self-
> > modifying
> > code. Some functions pass the full size of a write to it, some pass
> > 1.
> > When a write to a code section begins before a TB start, but then
> > overlaps the TB, the paths that pass 1 don't flush a TB and don't
> > return to the translator loop.
> > 
> > This may be masked, one example being HELPER(vstl). There,
> > probe_write_access() ultimately calls notdirty_write() with a size
> > of
> > 1 and misses self-modifying code. However, cpu_stq_be_data_ra()
> > ultimately calls mmu_watch_or_dirty(), which in turn calls
> > notdirty_write() with the full size.
> > 
> > It's still worth improving this, because there may still be
> > user-visible adverse effects in other helpers.
> > 
> > Signed-off-by: Ilya Leoshkevich 
> 
> IIRC there are some uses of probe_access_* that set size == 0.
> Should we adjust addr+size to cover the whole page for that case?
> That seems to be the intent, anyway.

There is a comment that says we shouldn't do watchpoint/smc detection
in this case:

/* Per the interface, size == 0 merely faults the access. */
if (size == 0) {
return NULL;
}

Come to think of it, qemu-user works this way too: SMC is detected on
the actual access, not the probe:

helper_vstl()
  cpu_stq_be_data_ra()
...
   stq_he_p()
 
   host_signal_handler()
 handle_sigsegv_accerr_write()
   page_unprotect()
 tb_invalidate_phys_page_unwind()
   cpu_loop_exit_noexc()

So all this is probably fine, I now think it's better to leave the code
as is, especially given that I cannot reproduce the original problem
anymore.

Re: [PATCH 1/2] target/s390x: Use a 16-bit immediate in VREP

2023-08-08 Thread Ilya Leoshkevich

On Mon, 2023-08-07 at 20:27 +0300, Michael Tokarev wrote:
> > On 07.08.23 19:02, Ilya Leoshkevich wrote:
> ..>> None that I know of, but I thought this was still nice to have,
> and at
> > > the same time small enough to not cause any trouble.
> 
> Ilya, do you have an idea why your messages don't reach qemu-stable@?
> 
> I see e.g. David's replies to yuor messages in qemu-stable@ - this
> way
> I know you sent something, and copy that message from qemu-devel@.
> 
> For example, this thread is shown in the archive with your messages
> unavailable:
> 
> https://mail.gnu.org/archive/html/qemu-stable/2023-08/threads.html#00118

No, unfortunately I don't know. Everything looks normal on my end,
qemu-stable@ is properly Cc:ed and I don't see any typos.
I've sent a question about this to the mailing list owner.

Re: [PATCH] target/loongarch: Split fcc register to fcc0-7 in gdbstub

2023-08-08 Thread Jiajie Chen




On 2023/8/8 17:55, Jiajie Chen wrote:


On 2023/8/8 14:10, bibo mao wrote:

I am not familiar with gdb, is there  abi breakage?
I do not know how gdb client works with gdb server with different 
versions.
There seemed no versioning in the process, but rather in-code xml 
validation. In gdb, the code only allows new xml (fcc0-7) and rejects 
old one (fcc), so gdb breaks qemu first and do not consider backward 
compatibility with qemu.


Not abi breakage, but gdb will complain:

warning: while parsing target description (at line 1): Target 
description specified unknown architecture "loongarch64"

warning: Could not load XML target description; ignoring
warning: No executable has been specified and target does not support
determining executable automatically.  Try using the "file" command.
Truncated register 38 in remote 'g' packet


Sorry, to be clear, the actual error message is:

(gdb) target extended-remote localhost:1234
Remote debugging using localhost:1234
warning: Architecture rejected target-supplied description
warning: No executable has been specified and target does not support

It rejects the target description xml given by qemu, thus using the 
builtin one. However, there is a mismatch in fcc registers, so it will 
not work if we list floating point registers.


At the same time, if we are using loongarch32 target(I recently posted 
patches to support this), it will reject the target description and 
fallback to loongarch64, making gcc not usable.




And gdb can no longer debug kernel running in qemu. You can reproduce 
this error using latest qemu(without this patch) and gdb(13.1 or later).




Regards
Bibo Mao


在 2023/8/8 13:42, Jiajie Chen 写道:

Since GDB 13.1(GDB commit ea3352172), GDB LoongArch changed to use
fcc0-7 instead of fcc register. This commit partially reverts commit
2f149c759 (`target/loongarch: Update gdb_set_fpu() and gdb_get_fpu()`)
to match the behavior of GDB.

Note that it is a breaking change for GDB 13.0 or earlier, but it is
also required for GDB 13.1 or later to work.

Signed-off-by: Jiajie Chen 
---
  gdb-xml/loongarch-fpu.xml  |  9 -
  target/loongarch/gdbstub.c | 16 +++-
  2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/gdb-xml/loongarch-fpu.xml b/gdb-xml/loongarch-fpu.xml
index 78e42cf5dd..e81e3382e7 100644
--- a/gdb-xml/loongarch-fpu.xml
+++ b/gdb-xml/loongarch-fpu.xml
@@ -45,6 +45,13 @@
    
    
    
-  
+  
+  
+  
+  
+  
+  
+  
+  
    
  
diff --git a/target/loongarch/gdbstub.c b/target/loongarch/gdbstub.c
index 0752fff924..15ad6778f1 100644
--- a/target/loongarch/gdbstub.c
+++ b/target/loongarch/gdbstub.c
@@ -70,10 +70,9 @@ static int 
loongarch_gdb_get_fpu(CPULoongArchState *env,

  {
  if (0 <= n && n < 32) {
  return gdb_get_reg64(mem_buf, env->fpr[n].vreg.D(0));
-    } else if (n == 32) {
-    uint64_t val = read_fcc(env);
-    return gdb_get_reg64(mem_buf, val);
-    } else if (n == 33) {
+    } else if (32 <= n && n < 40) {
+    return gdb_get_reg8(mem_buf, env->cf[n - 32]);
+    } else if (n == 40) {
  return gdb_get_reg32(mem_buf, env->fcsr0);
  }
  return 0;
@@ -87,11 +86,10 @@ static int 
loongarch_gdb_set_fpu(CPULoongArchState *env,

  if (0 <= n && n < 32) {
  env->fpr[n].vreg.D(0) = ldq_p(mem_buf);
  length = 8;
-    } else if (n == 32) {
-    uint64_t val = ldq_p(mem_buf);
-    write_fcc(env, val);
-    length = 8;
-    } else if (n == 33) {
+    } else if (32 <= n && n < 40) {
+    env->cf[n - 32] = ldub_p(mem_buf);
+    length = 1;
+    } else if (n == 40) {
  env->fcsr0 = ldl_p(mem_buf);
  length = 4;
  }

Re: [PATCH for-8.1 v10 06/14] linux-user: Adjust initial brk when interpreter is close to executable

2023-08-08 Thread Alex Bennée



Richard Henderson  writes:

> From: Helge Deller 
>
> While we attempt to load a ET_DYN executable far away from
> TASK_UNMAPPED_BASE, we are not completely in control of the
> address space layout.  If the interpreter lands close to
> the executable, leaving insufficient heap space, move brk.
>
> Tested-by: Helge Deller 
> Signed-off-by: Helge Deller 
> [rth: Re-order after ELF_ET_DYN_BASE patch so that we do not
>  "temporarily break" tsan, and also to minimize the changes required.
>  Remove image_info.reserve_brk as unused.]
> Reviewed-by: Akihiko Odaki 
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH for-8.1 v10 07/14] linux-user: Do not adjust image mapping for host page size

2023-08-08 Thread Alex Bennée



Richard Henderson  writes:

> Remove TARGET_ELF_EXEC_PAGESIZE, and 3 other TARGET_ELF_PAGE* macros
> based off of that.  Rely on target_mmap to handle guest vs host page
> size mismatch.
>
> Tested-by: Helge Deller 
> Reviewed-by: Helge Deller 
> Reviewed-by: Akihiko Odaki 
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [RFC PATCH] hw/riscv: hart: allow other cpu instance

2023-08-08 Thread Daniel Henrique Barboza





On 8/8/23 04:56, Nikita Shubin wrote:

Hello Deniel!

On Mon, 2023-07-31 at 11:12 -0300, Daniel Henrique Barboza wrote:



On 7/27/23 05:05, Nikita Shubin wrote:

From: Nikita Shubin 

Allow using instances derivative from RISCVCPU

Signed-off-by: Nikita Shubin 
---
Currently it is not possible to overload instance of RISCVCPU,
i.e. something like this:

static const TypeInfo riscv_cpu_type_infos[] = {
   {
  .name = TYPE_ANOTHER_RISCV_CPU,
  .parent = TYPE_RISCV_CPU,
  .instance_size = sizeof(MyCPUState),
  .instance_init = riscv_my_cpu_init,
  }
};

Because we have RISCVHartArrayState.harts with exactly
the size of RISCVCPU.

Using own instances can be used to store some internal hart
state.
---
   hw/riscv/boot.c   |  5 +++--
   hw/riscv/riscv_hart.c | 20 
   hw/riscv/sifive_u.c   |  7 +--
   hw/riscv/spike.c  |  4 +++-
   hw/riscv/virt-acpi-build.c    |  2 +-
   hw/riscv/virt.c   |  6 +++---
   include/hw/riscv/riscv_hart.h | 18 +-
   7 files changed, 44 insertions(+), 18 deletions(-)

diff --git a/hw/riscv/boot.c b/hw/riscv/boot.c
index 52bf8e67de..c0456dcc2e 100644
--- a/hw/riscv/boot.c
+++ b/hw/riscv/boot.c
@@ -36,7 +36,8 @@
   
   bool riscv_is_32bit(RISCVHartArrayState *harts)

   {
-    return harts->harts[0].env.misa_mxl_max == MXL_RV32;
+    RISCVCPU *hart = riscv_array_get_hart(harts, 0);
+    return hart->env.misa_mxl_max == MXL_RV32;
   }
   
   /*

@@ -414,7 +415,7 @@ void riscv_setup_rom_reset_vec(MachineState
*machine, RISCVHartArrayState *harts
   reset_vec[4] = 0x0182b283;   /* ld t0, 24(t0)
*/
   }
   
-    if (!harts->harts[0].cfg.ext_icsr) {

+    if (!riscv_array_get_hart(harts, 0)->cfg.ext_icsr) {
   /*
    * The Zicsr extension has been disabled, so let's
ensure
we don't
    * run the CSR instruction. Let's fill the address
with a
non
diff --git a/hw/riscv/riscv_hart.c b/hw/riscv/riscv_hart.c
index 613ea2aaa0..74fc10ef48 100644
--- a/hw/riscv/riscv_hart.c
+++ b/hw/riscv/riscv_hart.c
@@ -43,24 +43,28 @@ static void riscv_harts_cpu_reset(void
*opaque)
   }
   
   static bool riscv_hart_realize(RISCVHartArrayState *s, int

idx,
-   char *cpu_type, Error **errp)
+   char *cpu_type, size_t size,
Error
**errp)
   {
-    object_initialize_child(OBJECT(s), "harts[*]", &s-

harts[idx],

cpu_type);
-    qdev_prop_set_uint64(DEVICE(&s->harts[idx]), "resetvec",
s->resetvec);
-    s->harts[idx].env.mhartid = s->hartid_base + idx;
-    qemu_register_reset(riscv_harts_cpu_reset, &s-

harts[idx]);

-    return qdev_realize(DEVICE(&s->harts[idx]), NULL, errp);
+    RISCVCPU *hart = riscv_array_get_hart(s, idx);
+    object_initialize_child_internal(OBJECT(s), "harts[*]",
+    hart, size, cpu_type);
+    qdev_prop_set_uint64(DEVICE(hart), "resetvec", s-

resetvec);

+    hart->env.mhartid = s->hartid_base + idx;
+    qemu_register_reset(riscv_harts_cpu_reset, hart);
+    return qdev_realize(DEVICE(hart), NULL, errp);
   }
   
   static void riscv_harts_realize(DeviceState *dev, Error

**errp)
   {
   RISCVHartArrayState *s = RISCV_HART_ARRAY(dev);
+    size_t size = object_type_get_instance_size(s->cpu_type);
   int n;
   
-    s->harts = g_new0(RISCVCPU, s->num_harts);

+    s->harts = g_new0(RISCVCPU *, s->num_harts);
   
   for (n = 0; n < s->num_harts; n++) {

-    if (!riscv_hart_realize(s, n, s->cpu_type, errp)) {
+    s->harts[n] = RISCV_CPU(object_new(s->cpu_type));
+    if (!riscv_hart_realize(s, n, s->cpu_type, size,
errp)) {
   return;


Not an issue with this patch but riscv_hart_realize() can use some
review. I
I think that we're doing stuff in the wrong place. Perhaps I'll
look
into it.


Shoudn't allocation and mhartid assignment happen earlier in
instance_init() ?


Usually we use instance_init() to validate properties that the device will use 
later on
during realize(). We're not doing any validation ATM so we can live without 
instance_init().

What's out of place is the way we're doing with riscv_hart_realize(). For 
starters, we're
passing a RISCVCPUHartState to it because of resetvec while at the same time 
we're passing
cpu_type as a separate parameter, but both can be derived from 
RISCVCPUHartState. There's
also object_initialize_child() which is an initializer but is called under a 
function
named realize(), which is weird. Last but not the least we already have a 
realize() function
for each hart - it's riscv_cpu_realize(), from target/riscv/cpu.c, and we're 
calling it
via "qdev_realize(DEVICE(&s->harts[idx]), NULL, errp);".

It seems to me that the code would be clearer if we get rid of 
riscv_hart_realize() and just
open code it in the body of riscv_harts_realize().








   }
   }
diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index 35a335b8d0..b8a5

Re: [PATCH] linux-user/riscv: Use abi_ulong for target_ucontext

2023-08-08 Thread Daniel Henrique Barboza





On 8/8/23 06:34, LIU Zhiwei wrote:

We should not use types dependend on host arch for target_ucontext.
This bug is found when run rv32 applications.

Signed-off-by: LIU Zhiwei 
---


Reviewed-by: Daniel Henrique Barboza 



  linux-user/riscv/signal.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/linux-user/riscv/signal.c b/linux-user/riscv/signal.c
index eaa168199a..ff8634a272 100644
--- a/linux-user/riscv/signal.c
+++ b/linux-user/riscv/signal.c
@@ -38,8 +38,8 @@ struct target_sigcontext {
  }; /* cf. riscv-linux:arch/riscv/include/uapi/asm/ptrace.h */
  
  struct target_ucontext {

-unsigned long uc_flags;
-struct target_ucontext *uc_link;
+abi_ulong uc_flags;
+abi_ulong uc_link;
  target_stack_t uc_stack;
  target_sigset_t uc_sigmask;
  uint8_t   __unused[1024 / 8 - sizeof(target_sigset_t)];

Re: [PATCH for-8.1 v10 08/14] linux-user: Do not adjust zero_bss for host page size

2023-08-08 Thread Alex Bennée



Richard Henderson  writes:

> Rely on target_mmap to handle guest vs host page size mismatch.
>
> Tested-by: Helge Deller 
> Reviewed-by: Helge Deller 
> Reviewed-by: Akihiko Odaki 
> Signed-off-by: Richard Henderson 
> ---
>  linux-user/elfload.c | 54 +++-
>  1 file changed, 23 insertions(+), 31 deletions(-)
>
> diff --git a/linux-user/elfload.c b/linux-user/elfload.c
> index 964b21f997..6c28cb70ef 100644
> --- a/linux-user/elfload.c
> +++ b/linux-user/elfload.c
> @@ -2213,44 +2213,36 @@ static abi_ulong setup_arg_pages(struct linux_binprm 
> *bprm,
>  
>  /* Map and zero the bss.  We need to explicitly zero any fractional pages
> after the data section (i.e. bss).  */
> -static void zero_bss(abi_ulong elf_bss, abi_ulong last_bss, int prot)
> +static void zero_bss(abi_ulong start_bss, abi_ulong end_bss, int prot)
>  {
> -uintptr_t host_start, host_map_start, host_end;
> +abi_ulong align_bss;
>  
> -last_bss = TARGET_PAGE_ALIGN(last_bss);
> +align_bss = TARGET_PAGE_ALIGN(start_bss);
> +end_bss = TARGET_PAGE_ALIGN(end_bss);
>  
> -/* ??? There is confusion between qemu_real_host_page_size and
> -   qemu_host_page_size here and elsewhere in target_mmap, which
> -   may lead to the end of the data section mapping from the file
> -   not being mapped.  At least there was an explicit test and
> -   comment for that here, suggesting that "the file size must
> -   be known".  The comment probably pre-dates the introduction
> -   of the fstat system call in target_mmap which does in fact
> -   find out the size.  What isn't clear is if the workaround
> -   here is still actually needed.  For now, continue with it,
> -   but merge it with the "normal" mmap that would allocate the bss.  */
> +if (start_bss < align_bss) {
> +int flags = page_get_flags(start_bss);
>  
> -host_start = (uintptr_t) g2h_untagged(elf_bss);
> -host_end = (uintptr_t) g2h_untagged(last_bss);
> -host_map_start = REAL_HOST_PAGE_ALIGN(host_start);
> -
> -if (host_map_start < host_end) {
> -void *p = mmap((void *)host_map_start, host_end - host_map_start,
> -   prot, MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> -if (p == MAP_FAILED) {
> -perror("cannot mmap brk");
> -exit(-1);
> +if (!(flags & PAGE_VALID)) {
> +/* Map the start of the bss. */
> +align_bss -= TARGET_PAGE_SIZE;
> +} else if (flags & PAGE_WRITE) {
> +/* The page is already mapped writable. */
> +memset(g2h_untagged(start_bss), 0, align_bss - start_bss);
> +} else {
> +/* Read-only zeros? */
> +g_assert_not_reached();
>  }
>  }
>  
> -/* Ensure that the bss page(s) are valid */
> -if ((page_get_flags(last_bss-1) & prot) != prot) {
> -page_set_flags(elf_bss & TARGET_PAGE_MASK, last_bss - 1,
> -   prot | PAGE_VALID);
> -}
> -
> -if (host_start < host_map_start) {
> -memset((void *)host_start, 0, host_map_start - host_start);
> +if (align_bss < end_bss) {
> +abi_long err = target_mmap(align_bss, end_bss - align_bss, prot,
> +   MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS,
> +   -1, 0);
> +if (err == -1) {
> +perror("cannot mmap brk");
> +exit(-1);

brk != bss even if brk generally comes after the bss section.

> +}
>  }
>  }


-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH] target/loongarch: Split fcc register to fcc0-7 in gdbstub

2023-08-08 Thread bibo mao

I think that it is problem of loongarch gdb, rather qemu.
If so, everytime when gdb changes register layout, qemu need modify.
There should be compatible requirements between gdb client and gdb server.

Tiezhu,

what is your opition?

Regards
Bibo Mao

在 2023/8/8 18:03, Jiajie Chen 写道:
> 
> On 2023/8/8 17:55, Jiajie Chen wrote:
>>
>> On 2023/8/8 14:10, bibo mao wrote:
>>> I am not familiar with gdb, is there  abi breakage?
>>> I do not know how gdb client works with gdb server with different versions.
> There seemed no versioning in the process, but rather in-code xml validation. 
> In gdb, the code only allows new xml (fcc0-7) and rejects old one (fcc), so 
> gdb breaks qemu first and do not consider backward compatibility with qemu.
>>
>> Not abi breakage, but gdb will complain:
>>
>> warning: while parsing target description (at line 1): Target description 
>> specified unknown architecture "loongarch64"
>> warning: Could not load XML target description; ignoring
>> warning: No executable has been specified and target does not support
>> determining executable automatically.  Try using the "file" command.
>> Truncated register 38 in remote 'g' packet
> 
> Sorry, to be clear, the actual error message is:
> 
> (gdb) target extended-remote localhost:1234
> Remote debugging using localhost:1234
> warning: Architecture rejected target-supplied description
> warning: No executable has been specified and target does not support
> 
> It rejects the target description xml given by qemu, thus using the builtin 
> one. However, there is a mismatch in fcc registers, so it will not work if we 
> list floating point registers.
> 
> At the same time, if we are using loongarch32 target(I recently posted 
> patches to support this), it will reject the target description and fallback 
> to loongarch64, making gcc not usable.
> 
>>
>> And gdb can no longer debug kernel running in qemu. You can reproduce this 
>> error using latest qemu(without this patch) and gdb(13.1 or later).
>>
>>>
>>> Regards
>>> Bibo Mao
>>>
>>>
>>> 在 2023/8/8 13:42, Jiajie Chen 写道:
 Since GDB 13.1(GDB commit ea3352172), GDB LoongArch changed to use
 fcc0-7 instead of fcc register. This commit partially reverts commit
 2f149c759 (`target/loongarch: Update gdb_set_fpu() and gdb_get_fpu()`)
 to match the behavior of GDB.

 Note that it is a breaking change for GDB 13.0 or earlier, but it is
 also required for GDB 13.1 or later to work.

 Signed-off-by: Jiajie Chen 
 ---
   gdb-xml/loongarch-fpu.xml  |  9 -
   target/loongarch/gdbstub.c | 16 +++-
   2 files changed, 15 insertions(+), 10 deletions(-)

 diff --git a/gdb-xml/loongarch-fpu.xml b/gdb-xml/loongarch-fpu.xml
 index 78e42cf5dd..e81e3382e7 100644
 --- a/gdb-xml/loongarch-fpu.xml
 +++ b/gdb-xml/loongarch-fpu.xml
 @@ -45,6 +45,13 @@
     
     
     
 -  
 +  
 +  
 +  
 +  
 +  
 +  
 +  
 +  
     
   
 diff --git a/target/loongarch/gdbstub.c b/target/loongarch/gdbstub.c
 index 0752fff924..15ad6778f1 100644
 --- a/target/loongarch/gdbstub.c
 +++ b/target/loongarch/gdbstub.c
 @@ -70,10 +70,9 @@ static int loongarch_gdb_get_fpu(CPULoongArchState *env,
   {
   if (0 <= n && n < 32) {
   return gdb_get_reg64(mem_buf, env->fpr[n].vreg.D(0));
 -    } else if (n == 32) {
 -    uint64_t val = read_fcc(env);
 -    return gdb_get_reg64(mem_buf, val);
 -    } else if (n == 33) {
 +    } else if (32 <= n && n < 40) {
 +    return gdb_get_reg8(mem_buf, env->cf[n - 32]);
 +    } else if (n == 40) {
   return gdb_get_reg32(mem_buf, env->fcsr0);
   }
   return 0;
 @@ -87,11 +86,10 @@ static int loongarch_gdb_set_fpu(CPULoongArchState 
 *env,
   if (0 <= n && n < 32) {
   env->fpr[n].vreg.D(0) = ldq_p(mem_buf);
   length = 8;
 -    } else if (n == 32) {
 -    uint64_t val = ldq_p(mem_buf);
 -    write_fcc(env, val);
 -    length = 8;
 -    } else if (n == 33) {
 +    } else if (32 <= n && n < 40) {
 +    env->cf[n - 32] = ldub_p(mem_buf);
 +    length = 1;
 +    } else if (n == 40) {
   env->fcsr0 = ldl_p(mem_buf);
   length = 4;
   }

Re: [PATCH for-8.1 v10 09/14] linux-user: Use zero_bss for PT_LOAD with no file contents too

2023-08-08 Thread Alex Bennée



Richard Henderson  writes:

> If p_filesz == 0, then vaddr_ef == vaddr.  We can reuse the
> code in zero_bss rather than incompletely duplicating it in
> load_elf_image.
>
> Tested-by: Helge Deller 
> Reviewed-by: Helge Deller 
> Reviewed-by: Akihiko Odaki 
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH for-8.1 v10 11/14] linux-user: Remove duplicate CPU_LOG_PAGE from probe_guest_base

2023-08-08 Thread Alex Bennée



Richard Henderson  writes:

> The proper logging for probe_guest_base is in the main function.
> There is no need to duplicate that in the subroutines.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH for-8.1 v10 12/14] linux-user: Consolidate guest bounds check in probe_guest_base

2023-08-08 Thread Alex Bennée



Richard Henderson  writes:

> The three sets of checks are identical, logically.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH v3 00/17] Support smp.clusters for x86

2023-08-08 Thread Jonathan Cameron via

On Fri, 4 Aug 2023 21:17:59 +0800
Zhao Liu  wrote:

> Hi Jonathan,
> 
> On Tue, Aug 01, 2023 at 04:35:27PM +0100, Jonathan Cameron via wrote:
> > >   
> 
> [snip]
> 
> > > 
> > > ## New property: x-l2-cache-topo
> > > 
> > > The property l2-cache-topo will be used to change the L2 cache topology
> > > in CPUID.04H.
> > > 
> > > Now it allows user to set the L2 cache is shared in core level or
> > > cluster level.
> > > 
> > > If user passes "-cpu x-l2-cache-topo=[core|cluster]" then older L2 cache
> > > topology will be overrided by the new topology setting.
> > > 
> > > Since CPUID.04H is used by intel cpus, this property is available on
> > > intel cpus as for now.
> > > 
> > > When necessary, it can be extended to CPUID[0x801D] for amd cpus.  
> > 
> > Hi Zhao Liu,
> > 
> > As part of emulating arm's MPAM (cache partitioning controls) I needed
> > to add the missing cache description in the ACPI PPTT table. As such I ran
> > into a very similar problem to the one you are addressing.  
> 
> May I ask if the cache topology you need is symmetric or heterogeneous?
> 
> I had the discussion with Yanan [5] about heterogeneous cache. If you
> need a "symmetric" cache topology, maybe we could consider trying make
> this x-l2-cache-topo more generic.

For now, I'm interested in symmetric, but heterogeneous is certain
to pop up at some point as people will build MPAM mobile systems.
Right now there needs to be a lot of other work to emulate those well
in QEMU given the cores are likely to be quite different.

> 
> But if you need a heterogeneous cache topo, e.g., some cores have its
> own l2 cache, and other cores share the same l2 cache, only this command
> is not enough.
> 
> Intel hybrid platforms have the above case I mentioned, we used "hybrid
> CPU topology" [6] + "x-l2-cache-topo=cluster" to solve this:
> 
> For example, AdlerLake has 2 types of core, one type is P-core with L2 per
> P-core, and another type is E-core that 4 E-cores share a L2.
> 
> So we set a CPU topology like this:
> 
> Set 2 kinds of clusters:
> * 1 P-core is in a cluster.
> * 4 E-cores in a cluster.
> 
> Then we use "x-l2-cache-topo" to make l2 is shared at cluster. In this
> way, a P-core could own a L2 because its cluster only has 1 P-core, and
> 4 E-cores could could share a L2.

That can work if you aren't using clusters to describe other elements of
the topology.  We originally added PPTT based cluster support to Linux
to enable sharing of l3 tags (but not l3 data) among a cluster of
CPUs. So we'd need it for some of our platforms independent of this
particular aspect.  Sometimes the cluster will have associated caches
sometimes it won't and they will be at a different level.  PPTT represents
that cache topology well but is complex.

> 
> For more general way to set cache topology, Yanan and me discussed 2
> ways ([7] [8]). [8] depends on the QOM CPU topology mechanism that I'm
> working on.
> 
> [5]: https://mail.gnu.org/archive/html/qemu-devel/2023-02/msg04795.html
> [6]: https://mail.gnu.org/archive/html/qemu-devel/2023-02/msg03205.html
> [7]: https://mail.gnu.org/archive/html/qemu-devel/2023-02/msg05139.html
> [8]: https://mail.gnu.org/archive/html/qemu-devel/2023-02/msg05167.html

Great. I'll have a read!

> 
> > 
> > I wonder if a more generic description is possible? We can rely on ordering
> > of the cache levels, so what I was planning to propose was the rather 
> > lengthy
> > but flexible (and with better names ;)
> > 
> > -smp 
> > 16,sockets=1,clusters=4,threads=2,cache-cluster-start-level=2,cache-node-start-level=3
> >   
> 
> Could you explain more about this command?

I'll cc you on the RFC patch set in a few mins.

> I don't understand what 
> "cache-cluster-start-level=2,cache-node-start-level=3" mean.

Assume hierarchical max Nth level cache. 
Cache levels 1 to (cache-cluster-start-level - 1) are private to a core (shared 
by threads).
Cache levels cache-cluster-start-level to (cache-node-start-level - 1) are 
shared at cluster level.
Cache levels cache cach-node-start-level to N are at the Numa node level which 
may or may not
be the physical package.

It very much assumes non heterogeneous though which I don't like about this 
scheme.

> 
> > 
> > Perhaps we can come up with a common scheme that covers both usecases?
> > It gets more fiddly to define if we have variable topology across different 
> > clusters
> > - and that was going to be an open question in the RFC proposing this - our 
> > current
> > definition of the more basic topology doesn't cover those cases anyway.
> > 
> > What I want:
> > 
> > 1) No restriction on maximum cache levels - ...  
> 
> Hmmm, if there's no cache name, it would be difficult to define in cli.

Define by level number rather than name.

> 
> > ... some systems have more than 3  
> 
> What about L4? A name can simplify a lot of issues.

True but if we can make it take a number as a parameter it extends to
any level.

> 
> > 2) Easy ability to express everything from all

[PATCH for-8.1] linux-user: Define real MAP_FIXED_NOREPLACE value

2023-08-08 Thread Akihiko Odaki

do_brk() assumes target_mmap() emulates MAP_FIXED_NOREPLACE even when
the host does not support it. However, such emulation is not possible
if MAP_FIXED_NOREPLACE is defined as zero.

Define MAP_FIXED_NOREPLACE with the real value instead of zero if it is
not defined.

Fixes: e69e032d1a ("linux-user: Use MAP_FIXED_NOREPLACE for do_brk()")
Signed-off-by: Akihiko Odaki 
---
 include/qemu/osdep.h | 8 ++--
 linux-user/elfload.c | 1 -
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index cc61b00ba9..1aac17ec2f 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -289,8 +289,12 @@ void QEMU_ERROR("code path is reachable")
 #ifndef MAP_ANONYMOUS
 #define MAP_ANONYMOUS MAP_ANON
 #endif
-#ifndef MAP_FIXED_NOREPLACE
-#define MAP_FIXED_NOREPLACE 0
+#if defined(__linux__) && !defined(MAP_FIXED_NOREPLACE)
+#if HOST_ALPHA
+#define MAP_FIXED_NOREPLACE 0x20
+#else
+#define MAP_FIXED_NOREPLACE 0x10
+#endif
 #endif
 #ifndef MAP_NORESERVE
 #define MAP_NORESERVE 0
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 36e4026f05..9d9c79a653 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -2807,7 +2807,6 @@ static void pgb_reserved_va(const char *image_name, 
abi_ulong guest_loaddr,
 /* Widen the "image" to the entire reserved address space. */
 pgb_static(image_name, 0, reserved_va, align);
 
-/* osdep.h defines this as 0 if it's missing */
 flags |= MAP_FIXED_NOREPLACE;
 
 /* Reserve the memory on the host. */
-- 
2.41.0

[RFC PATCH 0/5] hw/arm: MPAM Emulation + PPTT cache description.

2023-08-08 Thread Jonathan Cameron via

Aim of this bit of emulation is to use it for testing James Morse's
kernel tree - in particularly letting us poke the corner cases.
Right now I'm not that focused on upstreaming this (too many other things
in my backlog), but any feedback on the approach etc welcome and perhaps
the PPTT part is useful independent of MPAM support.

Current kernel branch (one outstanding bug reported but that's hard to hit
and requires setting the narrowing target number of IDs to 1 which is bonkers):
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/log/?h=mpam/snapshot/v6.5-rc1

Supported:
* PPTT cache description - this is necessary for the cross references MPAM
  table entries use to establish which Cache any given control set influences.
  I included option for generating shared tables which were a common choice
  prior to MPAM needing those cross references.
* CPU emulation for MPAM. Given we aren't doing anything with the content
  this is just a case of adding the MPAM_IDR register and read/write registers
  to control current PARTID / PMG group.
* MPAM MSC emulation for caches and memory controllers.
  Multiple RIS support allows up to 16 such elements to be controlled via
  a single interface (used only for memory currently.
  Most controls wired up, though introspection interface and sanity checks
  only cover some of them so far. No monitoring yet.
* ACPI tables and device instantiation in ARM Virt. ACPI only because the
  kernel patches clearly state the DT binding is a WIP.
* A hack to add lots of caches to the MAX cpu via the relevant CPU registers
  - these are read back to generate the PPTT Table and MPAM devices.

TODO:
- Dealing with case of no NUMA nodes. Currently we don't start if NUMA
  nodes aren't specified and mpam=on.  Defaulting to a single NUMA
  node if MPAM is enabled may make more sense.
- Error injection / reporting on invalid parameters.
- Monitor support.
- Wire up the interrupts properly.
- Tighten checks on unexpected values to further help with catching
  bugs in kernel code (a few already found and fixed by James).
- ACPI table test (yeah I'm lazy).
- Remove remaining 'fixed' constraints on number of partitions etc
  so they can be different across controllers / different levels
  of the hierarchy.
- Expand the qmp introspection interface to cover the missing parts.

Example command line (who doesn't love SMT arm machines?):
aarch64-softmmu/qemu-system-aarch64 -D test.log -d unimp \
 -M virt,nvdimm=on,gic-version=3,mpam=on,mpam_min_msc=on \
 -m 4g -cpu max,core-count=2 \
 -smp 
16,sockets=1,clusters=4,threads=2,cache-cluster-start-level=2,cache-node-start-level=3
 \
 -kernel Image \
 -drive if=none,file=full.qcow2,format=qcow2,id=hd \
 -device pcie-root-port,id=root_port1 -device virtio-blk-pci,drive=hd \
 -qmp-pretty tcp:localhost:4445,server=on,wait=off \
...
 -nographic -no-reboot -append 'earlycon root=/dev/vda2' \
 -bios QEMU_EFI.fd \
 -object memory-backend-ram,size=1G,id=mem0 \
 -object memory-backend-ram,size=1G,id=mem1 \
 -object memory-backend-ram,size=1G,id=mem2 \
 -object memory-backend-ram,size=1G,id=mem3 \
 -numa node,nodeid=0,cpus=0-3,memdev=mem0 \
 -numa node,nodeid=1,cpus=4-7,memdev=mem1 \
 -numa node,nodeid=2,cpus=8-11,memdev=mem2 \
 -numa node,nodeid=3,cpus=12-15,memdev=mem3
 
QMP comamnds:

{ "execute": "qmp_capabilities" }
{ "execute": "query-mpam-cache",
  "arguments": {
"level": 3
  }
}

Will return something like (reformatted as the pretty version is 'long')
An 'ideal' version of this interface will take some more thought as it
needs to balance readability and clarity with complex implementation of
the code to 'interpret' the register values.

{
"return": [
{
"cpu": 0,
"level": 3,
"regs": [
{
"mbwumon-idr": 0,
"idr": 758514712831,
"cfg-cpbm": [
{ "words": [ 4294967295 ] },
{ "words": [ 0 ] },
{ "words": [ 0 ] },
{ "words": [ 0 ] }, 
{ "words": [ 0 ] },

{ "words": [ 0 ] }
],
"partid-nrw-idr": 31,
"mbw-idr": 0,
"csumon-idr": 0,
"esr": 0,
"ecr": 1,
"cfg-part-sel": 0,
"iidr": 44042038,
"cpor-idr": 32,
"msmon-idr": 0,
"ccap-idr": 2952791044,
"aidr": 17,
"pri-idr": 35
}
],
"type": 3
}
]
}

Jonathan Cameron (5):
  hw/acpi: Add PPTT cache descriptions
  HACK: target/arm/tcg: Add some more caches to cpu=max
  target/arm: Add support for MPAM CPU registers
  hw/arm: Add MPAM emulation.
  hw/arm/virt: Add MPAM MSCs for memory controllers and caches.

 qap

[RFC PATCH 1/5] hw/acpi: Add PPTT cache descriptions

2023-08-08 Thread Jonathan Cameron via

Current PPTT tables generated by QEMU only provide information on CPU
topology and neglect the description of Caches.

This patch adds flexible definition of those caches and updates the
table version to 3 to allow for the per CPU cache instance IDs needed
for cross references from the MPAM table.

If MPAM is not being used, then a unified description can be used,
greatly reducing the resulting table size.

New machine parameters are used to control the cache toplogy.
cache-cluster-start-level: Which caches are associated with the cluster
  level of the topology. e.g cache-cluster-start-level=2 results in shared
  l2 cache across a cluster.
cache-numa-start-level: Which caches are associate with the NUMA (in qemu
  this is currently the physical package level).  For example
  cache-cluster-start-level=2,cache-numa-start-level=3 gives
  private l1, cluster shared l2 and package shared L3.

FIXME: Test updates.

Signed-off-by: Jonathan Cameron 
---
 qapi/machine.json   |   8 +-
 include/hw/acpi/aml-build.h |  19 +++-
 include/hw/boards.h |   4 +
 hw/acpi/aml-build.c | 189 ++--
 hw/arm/virt-acpi-build.c| 130 -
 hw/core/machine-smp.c   |   8 ++
 hw/loongarch/acpi-build.c   |   2 +-
 7 files changed, 350 insertions(+), 10 deletions(-)

diff --git a/qapi/machine.json b/qapi/machine.json
index a08b6576ca..cc86784641 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -1494,6 +1494,10 @@
 # @maxcpus: maximum number of hotpluggable virtual CPUs in the virtual
 # machine
 #
+# @cache-cluster-start-level: Level of first cache attached to cluster
+#
+# @cache-node-start-level: Level of first cache attached to cluster
+#
 # Since: 6.1
 ##
 { 'struct': 'SMPConfiguration', 'data': {
@@ -1503,7 +1507,9 @@
  '*clusters': 'int',
  '*cores': 'int',
  '*threads': 'int',
- '*maxcpus': 'int' } }
+ '*maxcpus': 'int',
+ '*cache-cluster-start-level': 'int',
+ '*cache-node-start-level': 'int'} }
 
 ##
 # @x-query-irq:
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index d1fb08514b..055b74820d 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -489,8 +489,25 @@ void build_srat_memory(GArray *table_data, uint64_t base,
 void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms,
 const char *oem_id, const char *oem_table_id);
 
+typedef enum ACPIPPTTCacheType {
+DATA,
+INSTRUCTION,
+UNIFIED,
+} ACPIPPTTCacheType;
+
+typedef struct ACPIPPTTCache {
+ACPIPPTTCacheType type;
+int sets;
+int size;
+int associativity;
+int linesize;
+unsigned int pptt_id;
+int level;
+} ACPIPPTTCache;
+
 void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms,
-const char *oem_id, const char *oem_table_id);
+const char *oem_id, const char *oem_table_id,
+int num_caches, ACPIPPTTCache *caches);
 
 void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
 const char *oem_id, const char *oem_table_id);
diff --git a/include/hw/boards.h b/include/hw/boards.h
index ed83360198..6e8ab92684 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -316,6 +316,8 @@ typedef struct DeviceMemoryState {
  * @cores: the number of cores in one cluster
  * @threads: the number of threads in one core
  * @max_cpus: the maximum number of logical processors on the machine
+ * @cache_cluster_start_level: First cache level attached to cluster
+ * @cache_node_start_level: First cache level attached to node
  */
 typedef struct CpuTopology {
 unsigned int cpus;
@@ -325,6 +327,8 @@ typedef struct CpuTopology {
 unsigned int cores;
 unsigned int threads;
 unsigned int max_cpus;
+unsigned int cache_cluster_start_level;
+unsigned int cache_node_start_level;
 } CpuTopology;
 
 /**
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index ea331a20d1..e103cd638f 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1994,32 +1994,175 @@ static void build_processor_hierarchy_node(GArray 
*tbl, uint32_t flags,
 }
 }
 
+static void build_cache_nodes(GArray *tbl, ACPIPPTTCache *cache,
+  uint32_t next_offset,
+  bool has_id, unsigned int id)
+{
+int val;
+
+/* Type 1 - cache */
+build_append_byte(tbl, 1);
+/* Length */
+build_append_byte(tbl, 28);
+/* Reserved */
+build_append_int_noprefix(tbl, 0, 2);
+/* Flags - everything except possibly the ID */
+build_append_int_noprefix(tbl, has_id ? 0xff : 0x7f, 4);
+/* Offset of next cache up */
+build_append_int_noprefix(tbl, next_offset, 4);
+build_append_int_noprefix(tbl, cache->size, 4);
+build_append_int_noprefix(tbl, cache->sets, 4);
+build_append_byte(tbl, cache->associativity);
+/* Read and Write allocate amd WB */
+val = 0x3 | (1 << 4);
+sw

Re: [PATCH for-8.1 v10 04/14] linux-user: Use MAP_FIXED_NOREPLACE for initial image mmap

2023-08-08 Thread Akihiko Odaki


On 2023/08/08 18:43, Alex Bennée wrote:


Richard Henderson  writes:


Use this as extra protection for the guest mapping over
any qemu host mappings.

Tested-by: Helge Deller 
Reviewed-by: Helge Deller 
Reviewed-by: Akihiko Odaki 
Signed-off-by: Richard Henderson 
---
  linux-user/elfload.c | 9 ++---
  1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 36e4026f05..1b4bb2d5af 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -3147,8 +3147,11 @@ static void load_elf_image(const char *image_name, int 
image_fd,
  /*
   * Reserve address space for all of this.
   *
- * In the case of ET_EXEC, we supply MAP_FIXED so that we get
- * exactly the address range that is required.
+ * In the case of ET_EXEC, we supply MAP_FIXED_NOREPLACE so that we get
+ * exactly the address range that is required.  Without reserved_va,
+ * the guest address space is not isolated.  We have attempted to avoid
+ * conflict with the host program itself via probe_guest_base, but using
+ * MAP_FIXED_NOREPLACE instead of MAP_FIXED provides an extra check.
   *
   * Otherwise this is ET_DYN, and we are searching for a location
   * that can hold the memory space required.  If the image is
@@ -3160,7 +3163,7 @@ static void load_elf_image(const char *image_name, int 
image_fd,
   */
  load_addr = target_mmap(loaddr, (size_t)hiaddr - loaddr + 1, PROT_NONE,
  MAP_PRIVATE | MAP_ANON | MAP_NORESERVE |
-(ehdr->e_type == ET_EXEC ? MAP_FIXED : 0),
+(ehdr->e_type == ET_EXEC ? MAP_FIXED_NOREPLACE : 
0),
  -1, 0);


We should probably also check the result == load_addr for the places
where MAP_FIXED_NOREPLACE isn't supported as we have this in osdep.h:

   #ifndef MAP_FIXED_NOREPLACE
   #define MAP_FIXED_NOREPLACE 0
   #endif

See 2667e069e7 (linux-user: don't use MAP_FIXED in pgd_find_hole_fallback)


It assumes target_mmap() emulates MAP_FIXED_NOREPLACE when the host does 
not support it as commit e69e032d1a ("linux-user: Use 
MAP_FIXED_NOREPLACE for do_brk()") already does, but defining 
MAP_FIXED_NOREPLACE zero breaks such emulation. I wrote a fix:

https://patchew.org/QEMU/20230808115242.73025-1-akihiko.od...@daynix.com/




  if (load_addr == -1) {
  goto exit_mmap;

[RFC PATCH 2/5] HACK: target/arm/tcg: Add some more caches to cpu=max

2023-08-08 Thread Jonathan Cameron via

Used to drive the MPAM cache intialization and to exercise more
of the PPTT cache entry generation code. Perhaps a default
L3 cache is acceptable for max?

Signed-off-by: Jonathan Cameron 
---
 target/arm/tcg/cpu64.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
index 8019f00bc3..2af67739f6 100644
--- a/target/arm/tcg/cpu64.c
+++ b/target/arm/tcg/cpu64.c
@@ -711,6 +711,17 @@ void aarch64_max_tcg_initfn(Object *obj)
 uint64_t t;
 uint32_t u;
 
+/*
+ * Expanded cache set
+ */
+cpu->clidr = 0x8204923; /* 4 4 4 4 3 in 3 bit fields */
+cpu->ccsidr[0] = 0x00ff001aull; /* 64KB L1 dcache */
+cpu->ccsidr[1] = 0x00ff001aull; /* 64KB L1 icache */
+cpu->ccsidr[2] = 0x07ff003aull; /* 1MB L2 unified cache */
+cpu->ccsidr[4] = 0x07ff007cull; /* 2MB L3 cache 128B line */
+cpu->ccsidr[6] = 0x7fff007cull; /* 16MB L4 cache 128B line */
+cpu->ccsidr[8] = 0x0007007cull; /* 2048MB L5 cache 128B line */
+
 /*
  * Reset MIDR so the guest doesn't mistake our 'max' CPU type for a real
  * one and try to apply errata workarounds or use impdef features we
@@ -828,6 +839,7 @@ void aarch64_max_tcg_initfn(Object *obj)
 t = FIELD_DP64(t, ID_AA64MMFR2, BBM, 2);  /* FEAT_BBM at level 2 */
 t = FIELD_DP64(t, ID_AA64MMFR2, EVT, 2);  /* FEAT_EVT */
 t = FIELD_DP64(t, ID_AA64MMFR2, E0PD, 1); /* FEAT_E0PD */
+t = FIELD_DP64(t, ID_AA64MMFR2, CCIDX, 1);  /* FEAT_TTCNP */
 cpu->isar.id_aa64mmfr2 = t;
 
 t = cpu->isar.id_aa64zfr0;
-- 
2.39.2

[RFC PATCH 3/5] target/arm: Add support for MPAM CPU registers

2023-08-08 Thread Jonathan Cameron via

It is common to support MPAM on CPU cores, but not in the rest
of the system, so there is little disadvantage in always enabling
these.

Signed-off-by: Jonathan Cameron 
---
 target/arm/cpu.h| 15 +++
 target/arm/cpu.c| 10 +-
 target/arm/helper.c | 30 ++
 3 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 88e5accda6..8d28e22291 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -735,6 +735,10 @@ typedef struct CPUArchState {
  * to keep the offsets into the rest of the structure smaller.
  */
 ARMVectorReg zarray[ARM_MAX_VQ * 16];
+
+uint64_t mpam0_el1;
+uint64_t mpam1_el1;
+
 #endif
 
 struct CPUBreakpoint *cpu_breakpoint[16];
@@ -1043,6 +1047,7 @@ struct ArchCPU {
 uint64_t id_aa64zfr0;
 uint64_t id_aa64smfr0;
 uint64_t reset_pmcr_el0;
+uint64_t mpamidr_el1;
 } isar;
 uint64_t midr;
 uint32_t revidr;
@@ -2327,6 +2332,16 @@ FIELD(DBGDEVID, DOUBLELOCK, 20, 4)
 FIELD(DBGDEVID, AUXREGS, 24, 4)
 FIELD(DBGDEVID, CIDMASK, 28, 4)
 
+FIELD(MPAMIDR, PARTID_MAX, 0, 16)
+FIELD(MPAMIDR, HAS_HCR, 17, 1)
+FIELD(MPAMIDR, VMR_MAX, 18, 3)
+FIELD(MPAMIDR, PMG_MAX, 32, 8)
+FIELD(MPAMIDR, HAS_ALTSP, 57, 1)
+FIELD(MPAMIDR, HAS_TIDR, 58, 1)
+FIELD(MPAMIDR, SP4, 59, 1)
+FIELD(MPAMIDR, HAS_FORCE_NS, 60, 1)
+FIELD(MPAMIDR, HAS_SDEFLT, 61, 1)
+
 FIELD(MVFR0, SIMDREG, 0, 4)
 FIELD(MVFR0, FPSP, 4, 4)
 FIELD(MVFR0, FPDP, 8, 4)
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 93c28d50e5..d85a3ec8a2 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -305,6 +305,9 @@ static void arm_cpu_reset_hold(Object *obj)
 env->cp15.rvbar = cpu->rvbar_prop;
 env->pc = env->cp15.rvbar;
 #endif
+
+env->mpam1_el1 = 1ULL << 63;
+
 } else {
 #if defined(CONFIG_USER_ONLY)
 /* Userspace expects access to cp10 and cp11 for FP/Neon */
@@ -2097,7 +2100,12 @@ static void arm_cpu_realizefn(DeviceState *dev, Error 
**errp)
 FIELD_DP32(cpu->isar.id_pfr0, ID_PFR0, AMU, 0);
 /* FEAT_MPAM (Memory Partitioning and Monitoring Extension) */
 cpu->isar.id_aa64pfr0 =
-FIELD_DP64(cpu->isar.id_aa64pfr0, ID_AA64PFR0, MPAM, 0);
+FIELD_DP64(cpu->isar.id_aa64pfr0, ID_AA64PFR0, MPAM, 1);
+cpu->isar.mpamidr_el1 =
+FIELD_DP64(cpu->isar.mpamidr_el1, MPAMIDR, PARTID_MAX, 63);
+cpu->isar.mpamidr_el1 =
+FIELD_DP64(cpu->isar.mpamidr_el1, MPAMIDR, PMG_MAX, 3);
+
 /* FEAT_NV (Nested Virtualization) */
 cpu->isar.id_aa64mmfr2 =
 FIELD_DP64(cpu->isar.id_aa64mmfr2, ID_AA64MMFR2, NV, 0);
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 50f61e42ca..dbeb8d9fa6 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -8072,7 +8072,17 @@ static const ARMCPRegInfo actlr2_hactlr2_reginfo[] = {
   .access = PL2_RW, .type = ARM_CP_CONST,
   .resetvalue = 0 },
 };
+/*
+static uint64_t mpam_el1_read(CPUARMState *env, const ARMCPRegInfo *ri)
+{
+return 0;
+}
 
+static void mpam_el1_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t 
val)
+{
+return;
+}
+*/
 void register_cp_regs_for_features(ARMCPU *cpu)
 {
 /* Register all the coprocessor registers based on feature bits */
@@ -8404,6 +8414,26 @@ void register_cp_regs_for_features(ARMCPU *cpu)
   .access = PL1_R, .type = ARM_CP_CONST,
   .accessfn = access_aa64_tid3,
   .resetvalue = 0 },
+
+/* Should be separate feature */
+{ .name = "MPAMIDR_EL1", .state = ARM_CP_STATE_AA64,
+  .opc0 = 3, .opc1 = 0, .crn = 0xa, .crm = 4, .opc2 = 4,
+  .access = PL1_R, .type = ARM_CP_CONST,
+  .accessfn = access_aa64_tid3,
+  .resetvalue = cpu->isar.mpamidr_el1 },
+/* TODO: check the accessfn and whether we need a reset value for 
these */
+{ .name = "MPAM0_EL1", .state = ARM_CP_STATE_AA64,
+  .opc0 = 3, .opc1 = 0, .crn = 0xa, .crm = 5, .opc2 = 1,
+  .access = PL1_RW, .type = ARM_CP_ALIAS,
+  .accessfn = access_aa64_tid3,
+  .fieldoffset = offsetof(CPUARMState, mpam0_el1),
+},
+{ .name = "MPAM1_EL1", .state = ARM_CP_STATE_AA64,
+  .opc0 = 3, .opc1 = 0, .crn = 0xa, .crm = 5, .opc2 = 0,
+  .access = PL1_RW, .type = ARM_CP_ALIAS,
+  .accessfn = access_aa64_tid3,
+  .fieldoffset = offsetof(CPUARMState, mpam1_el1),
+},
 { .name = "MVFR0_EL1", .state = ARM_CP_STATE_AA64,
   .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 3, .opc2 = 0,
   .access = PL1_R, .type = ARM_CP_CONST,
-- 
2.39.2

[RFC PATCH 4/5] hw/arm: Add MPAM emulation.

2023-08-08 Thread Jonathan Cameron via

Note this doesn't 'do' anything other than provide an introspection
interface.  The intent here is to support work on the Linux kernel support
and for that a functional emulation of the interface is useful.

Signed-off-by: Jonathan Cameron 
---
 qapi/mpam.json   |  78 
 qapi/qapi-schema.json|   1 +
 include/hw/arm/mpam.h|  12 +
 include/hw/arm/virt.h|   2 +
 hw/arm/mpam-qapi-stubs.c |   9 +
 hw/arm/mpam-qapi.c   |  58 +++
 hw/arm/mpam.c| 886 +++
 hw/arm/Kconfig   |   4 +
 hw/arm/meson.build   |   4 +
 qapi/meson.build |   1 +
 10 files changed, 1055 insertions(+)

diff --git a/qapi/mpam.json b/qapi/mpam.json
new file mode 100644
index 00..f4990ef96b
--- /dev/null
+++ b/qapi/mpam.json
@@ -0,0 +1,78 @@
+# -*- Mode: Python -*-
+# vim: filetype=python
+
+##
+# = ARM MPAM State Introspection
+##
+
+##
+# @MpamBm:
+##
+{ 'struct': 'MpamBm',
+  'data' : { 'words' : [ 'int' ]
+   }
+}
+
+##
+# @MpamRegs:
+#
+# Per RIS Register State
+#
+##
+{ 'struct' : 'MpamRegs',
+  'data' : { 'idr' : 'int',
+ 'iidr' : 'int',
+ 'aidr' : 'int',
+ 'cpor-idr': 'int',
+ 'ccap-idr': 'int',
+ 'mbw-idr': 'int',
+ 'pri-idr': 'int',
+ 'partid-nrw-idr': 'int',
+ 'msmon-idr': 'int',
+ 'csumon-idr': 'int',
+ 'mbwumon-idr': 'int',
+ 'ecr': 'int',
+ 'esr': 'int',
+ 'cfg-part-sel': 'int',
+ 'cfg-cpbm': ['MpamBm']#lazy
+  }
+}
+
+##
+# @MpamCacheInfo:
+#
+# Information about MPAM Cache MSCs
+#
+# @cpu: First CPU of the set associated with the cache
+#
+# @level: Level of cache
+#
+# @type: type of cache - make an enum
+#
+# Since: 9.0
+##
+
+{ 'struct': 'MpamCacheInfo',
+  'data' : { 'cpu' : 'int',
+ 'level' : 'int',
+ 'type' : 'int',
+ 'regs' : ['MpamRegs']
+   }
+}
+
+##
+# @query-mpam-cache:
+#
+# Get a list of MpamInfo for all Cache Related MSC
+#
+# @level: Provide a cache level to filter against
+#
+# Returns: a list of @MpamCacheInfo describing all
+#   the MPAM cache MSC instances
+#
+# Since: 9.0
+##
+{ 'command': 'query-mpam-cache',
+  'data': { '*level': 'int' },
+  'returns' : ['MpamCacheInfo'],
+  'allow-preconfig': true }
diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
index 6594afba31..ea3ee75841 100644
--- a/qapi/qapi-schema.json
+++ b/qapi/qapi-schema.json
@@ -79,3 +79,4 @@
 { 'include': 'virtio.json' }
 { 'include': 'cryptodev.json' }
 { 'include': 'cxl.json' }
+{ 'include': 'mpam.json' }
diff --git a/include/hw/arm/mpam.h b/include/hw/arm/mpam.h
new file mode 100644
index 00..7bd88d57bc
--- /dev/null
+++ b/include/hw/arm/mpam.h
@@ -0,0 +1,12 @@
+#ifndef _MPAM_H_
+#define _MPAM_H_
+
+#include "qom/object.h"
+#include "qapi/qapi-commands-mpam.h"
+
+#define TYPE_MPAM_MSC_MEM "mpam-msc-mem"
+#define TYPE_MPAM_MSC_CACHE "mpam-msc-cache"
+
+void mpam_cache_fill_info(Object *obj, MpamCacheInfo *info);
+
+#endif /* _MPAM_H_ */
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index e1ddbea96b..ac015a391a 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -71,6 +71,7 @@ enum {
 VIRT_SMMU,
 VIRT_UART,
 VIRT_MMIO,
+VIRT_MPAM_MSC,
 VIRT_RTC,
 VIRT_FW_CFG,
 VIRT_PCIE,
@@ -160,6 +161,7 @@ struct VirtMachineState {
 bool ras;
 bool mte;
 bool dtb_randomness;
+bool mpam, mpam_min_msc;
 OnOffAuto acpi;
 VirtGICType gic_version;
 VirtIOMMUType iommu;
diff --git a/hw/arm/mpam-qapi-stubs.c b/hw/arm/mpam-qapi-stubs.c
new file mode 100644
index 00..40ccb7de9a
--- /dev/null
+++ b/hw/arm/mpam-qapi-stubs.c
@@ -0,0 +1,9 @@
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qapi/qapi-commands-mpam.h"
+
+MpamCacheInfoList *qmp_query_mpam_cache(bool has_level, int64_t level, Error 
**errp)
+{
+return NULL;
+}
diff --git a/hw/arm/mpam-qapi.c b/hw/arm/mpam-qapi.c
new file mode 100644
index 00..cf027e0da9
--- /dev/null
+++ b/hw/arm/mpam-qapi.c
@@ -0,0 +1,58 @@
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/arm/mpam.h"
+#include "qom/object.h"
+#include "qapi/qapi-commands-mpam.h"
+
+typedef struct MPAMQueryState {
+Error **errp;
+MpamCacheInfoList **head;
+bool level_filter_on;
+int level;
+} MPAMQueryState;
+
+static int mpam_query_cache(Object *obj, void *opaque)
+{
+MPAMQueryState *state = opaque;
+MpamCacheInfoList *infolist;
+MpamCacheInfo *info;
+
+if (!object_dynamic_cast(obj, TYPE_MPAM_MSC_CACHE)) {
+return 0;
+}
+if (state->level_filter_on &&
+object_property_get_uint(obj, "cache-level", state->errp) !=
+state->level) {
+return 0;
+}
+
+infolist = g_malloc0(sizeof(*infolist));
+info = g_malloc0(sizeof(*info));
+
+mpam_cache_fill_info(obj, info);
+
+infolist->value = info;
+
+*state

[RFC PATCH 5/5] hw/arm/virt: Add MPAM MSCs for memory controllers and caches.

2023-08-08 Thread Jonathan Cameron via

Allow sharing of MSC instances (using RIS) for memory controllers.
Currently cached controllers are not using RIS (tend to be more
in a system so more clever logic needed to auto allocate them).

No DT support yet.  The kernel bindings are considered unstable
so premature to add too much on that front.

Note that for now MPAM MSC creation is dependent on having SRAT
and hence you need some numa nodes defined.

Signed-off-by: Jonathan Cameron 
---
 include/hw/arm/mpam.h|   1 +
 hw/arm/virt-acpi-build.c | 197 +++
 hw/arm/virt.c| 134 ++
 3 files changed, 332 insertions(+)

diff --git a/include/hw/arm/mpam.h b/include/hw/arm/mpam.h
index 7bd88d57bc..8f47c8806f 100644
--- a/include/hw/arm/mpam.h
+++ b/include/hw/arm/mpam.h
@@ -7,6 +7,7 @@
 #define TYPE_MPAM_MSC_MEM "mpam-msc-mem"
 #define TYPE_MPAM_MSC_CACHE "mpam-msc-cache"
 
+#define MPAM_SIZE 0x4000 /* Big enough for anyone ;) */
 void mpam_cache_fill_info(Object *obj, MpamCacheInfo *info);
 
 #endif /* _MPAM_H_ */
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index ec8fdcefff..b14dce3722 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -49,6 +49,7 @@
 #include "hw/pci/pci_bus.h"
 #include "hw/pci-host/gpex.h"
 #include "hw/arm/virt.h"
+#include "hw/arm/mpam.h"
 #include "hw/intc/arm_gicv3_its_common.h"
 #include "hw/mem/nvdimm.h"
 #include "hw/platform-bus.h"
@@ -515,6 +516,198 @@ build_spcr(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 acpi_table_end(linker, &table);
 }
 
+static void build_msc_memory_controller(GArray *table_data, int identifier,
+hwaddr base_addr, uint32_t mpam_id,
+int num_ris, uint64_t *nodes)
+{
+int length = 72 + 24 * num_ris;
+int i;
+
+build_append_int_noprefix(table_data, length, 2);
+/* Interface Type */
+build_append_int_noprefix(table_data, 0 /* MMIO */, 1);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 1);
+build_append_int_noprefix(table_data, identifier, 4);
+build_append_int_noprefix(table_data, base_addr, 8);
+build_append_int_noprefix(table_data, MPAM_SIZE, 4);
+/* Overflow interrupt - HACK */
+build_append_int_noprefix(table_data, 0x2C, 4);
+/* Edge - SPI */
+build_append_int_noprefix(table_data, 1, 4);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 4);
+/* Overflow Int Affinity - HACK */
+build_append_int_noprefix(table_data, 0, 4);
+/* Error interrupt - HACK */
+build_append_int_noprefix(table_data, 0x2D, 4);
+/* Edge - SPI */
+build_append_int_noprefix(table_data, 1, 4);
+/* More reserved */
+build_append_int_noprefix(table_data, 0, 4);
+/* Error Int Affinity */
+build_append_int_noprefix(table_data, 0, 4);
+/* MAX_NRDY_USEC */
+build_append_int_noprefix(table_data, 100, 4);
+/* Linked Device - none for now */
+build_append_int_noprefix(table_data, 0, 8);
+/* _UID of linked device */
+build_append_int_noprefix(table_data, 0, 4);
+build_append_int_noprefix(table_data, num_ris, 4);
+/* Build a memory controller resouce */
+for (i = 0; i < num_ris; i++) {
+build_append_int_noprefix(table_data, mpam_id, 4);
+build_append_int_noprefix(table_data, i, 1);
+/* Reserved1 */
+build_append_int_noprefix(table_data, 0, 2);
+/* Locator type - 1 = Memory */
+build_append_int_noprefix(table_data, 1, 1);
+/* Locator part 1 Node */
+build_append_int_noprefix(table_data, nodes[i], 8);
+/* Locator part 2 reserved */
+build_append_int_noprefix(table_data, 0, 4);
+/* Num functional dependencies */
+build_append_int_noprefix(table_data, 0, 4);
+}
+}
+
+static void build_msc_cache_controller(GArray *table_data, int identifier,
+   hwaddr base_addr, uint32_t mpam_id,
+   int num_ris, uint64_t *cache_id)
+{
+int length = 72 + 24 * num_ris;
+int i;
+
+build_append_int_noprefix(table_data, length, 2);
+/* Interface Type */
+build_append_int_noprefix(table_data, 0 /* MMIO */, 1);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 1);
+build_append_int_noprefix(table_data, identifier, 4);
+build_append_int_noprefix(table_data, base_addr, 8);
+build_append_int_noprefix(table_data, MPAM_SIZE, 4);
+/* Overflow interrupt - HACK */
+build_append_int_noprefix(table_data, 0x2C, 4);
+/* Edge - SPI */
+build_append_int_noprefix(table_data, 1, 4);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 4);
+/* Overflow Int Affinity */
+build_append_int_noprefix(table_data, 0, 4);
+/* Error interrupt - HACK */
+build_append_int_noprefix(table_data, 0x2D, 4);
+/* Edge - SPI */
+build_append_int_noprefix(table_data, 1,

[PATCH v2 0/3] linux-user, configure: fix CPU canonicalization

2023-08-08 Thread Paolo Bonzini

The CPU model has to be canonicalized to what Meson wants in the cross
file, to what Linux uses for its asm-$ARCH directories, and to what
QEMU uses for its user-mode emulation host/$ARCH directories.  Do
all three in a single case statement, and check that the Linux and
QEMU directories actually exist.

At a small cost in repeated lines, this ensures that there are no hidden
ordering requirements between the case statements.  In particular, commit
89e5b7935e9 ("configure: Fix linux-user host detection for riscv64",
2023-08-06) broke ppc64le because it assigned host_arch based on a
non-canonicalized version of $cpu.

While doing this, I noticed that linux-user won't work on x32, alpha
and 32-bit s390 these days, due to missing common-user/ fragments.
The first two patches clean up the directories.

v1->v2: fix s390x compilation; rearrange case terminators


Paolo Bonzini (3):
  configure: fix detection for x32 linux-user
  linux-user: cleanup unused linux-user/include/host directories
  configure: unify case statements for CPU canonicalization

 configure   | 173 
 linux-user/include/host/alpha/host-signal.h |  55 ---
 linux-user/include/host/s390/host-signal.h  | 138 
 linux-user/include/host/s390x/host-signal.h | 139 +++-
 linux-user/include/host/x32/host-signal.h   |   1 -
 5 files changed, 240 insertions(+), 266 deletions(-)
 delete mode 100644 linux-user/include/host/alpha/host-signal.h
 delete mode 100644 linux-user/include/host/s390/host-signal.h
 delete mode 100644 linux-user/include/host/x32/host-signal.h

-- 
2.41.0

[PATCH v2 1/3] configure: fix detection for x32 linux-user

2023-08-08 Thread Paolo Bonzini

x32 uses the same signal handling fragments as x86_64, since host_arch
is set to x86_64 when Meson runs.  Remove the unnecessary forwarder and
set the host_arch variable properly in configure.

Reviewed-by: Richard Henderson 
Signed-off-by: Paolo Bonzini 
---
 configure | 2 ++
 linux-user/include/host/x32/host-signal.h | 1 -
 2 files changed, 2 insertions(+), 1 deletion(-)
 delete mode 100644 linux-user/include/host/x32/host-signal.h

diff --git a/configure b/configure
index 98dc78280e6..484d38d81f4 100755
--- a/configure
+++ b/configure
@@ -472,6 +472,8 @@ fi
 case "$cpu" in
   riscv*)
 host_arch=riscv ;;
+  x32)
+host_arch=x86_64 ;;
   *)
 host_arch="$cpu" ;;
 esac
diff --git a/linux-user/include/host/x32/host-signal.h 
b/linux-user/include/host/x32/host-signal.h
deleted file mode 100644
index 26800591d3b..000
--- a/linux-user/include/host/x32/host-signal.h
+++ /dev/null
@@ -1 +0,0 @@
-#include "../x86_64/host-signal.h"
-- 
2.41.0

[PATCH v2 3/3] configure: unify case statements for CPU canonicalization

2023-08-08 Thread Paolo Bonzini

The CPU model has to be canonicalized to what Meson wants in the cross
file, to what Linux uses for its asm-$ARCH directories, and to what
QEMU uses for its user-mode emulation host/$ARCH directories.  Do
all three in a single case statement, and check that the Linux and
QEMU directories actually exist.

At a small cost in repeated lines, this ensures that there are no hidden
ordering requirements between the case statements.  In particular, commit
89e5b7935e9 ("configure: Fix linux-user host detection for riscv64",
2023-08-06) broke ppc64le because it assigned host_arch based on a
non-canonicalized version of $cpu.

Reported-by: Joel Stanley 
Fixes: 89e5b7935e9 ("configure: Fix linux-user host detection for riscv64", 
2023-08-06)
Reviewed-by: Richard Henderson 
Signed-off-by: Paolo Bonzini 
---
 configure | 175 +++---
 1 file changed, 102 insertions(+), 73 deletions(-)

diff --git a/configure b/configure
index 484d38d81f4..24bfb9277b9 100755
--- a/configure
+++ b/configure
@@ -469,59 +469,119 @@ else
   echo "WARNING: unrecognized host CPU, proceeding with 'uname -m' output 
'$cpu'"
 fi
 
-case "$cpu" in
-  riscv*)
-host_arch=riscv ;;
-  x32)
-host_arch=x86_64 ;;
-  *)
-host_arch="$cpu" ;;
-esac
-
-# Normalise host CPU name and set multilib cflags.  The canonicalization
-# isn't really necessary, because the architectures that we check for
-# should not hit the 'uname -m' case, but better safe than sorry.
+# Normalise host CPU name to the values used by Meson cross files and in source
+# directories, and set multilib cflags.  The canonicalization isn't really
+# necessary, because the architectures that we check for should not hit the
+# 'uname -m' case, but better safe than sorry in case --cpu= is used.
+#
 # Note that this case should only have supported host CPUs, not guests.
+# Please keep it sorted and synchronized with meson.build's host_arch.
+host_arch=
+linux_arch=
 case "$cpu" in
+  aarch64)
+host_arch=aarch64
+linux_arch=arm64
+;;
+
   armv*b|armv*l|arm)
-cpu="arm" ;;
+cpu=arm
+host_arch=arm
+linux_arch=arm
+;;
 
   i386|i486|i586|i686)
 cpu="i386"
-CPU_CFLAGS="-m32" ;;
+host_arch=i386
+linux_arch=x86
+CPU_CFLAGS="-m32"
+;;
+
+  loongarch*)
+cpu=loongarch64
+host_arch=loongarch64
+;;
+
+  mips64*)
+cpu=mips64
+host_arch=mips
+linux_arch=mips
+;;
+  mips*)
+cpu=mips
+host_arch=mips
+linux_arch=mips
+;;
+
+  ppc)
+host_arch=ppc
+linux_arch=powerpc
+CPU_CFLAGS="-m32"
+;;
+  ppc64)
+host_arch=ppc64
+linux_arch=powerpc
+CPU_CFLAGS="-m64 -mbig-endian"
+;;
+  ppc64le)
+cpu=ppc64
+host_arch=ppc64
+linux_arch=powerpc
+CPU_CFLAGS="-m64 -mlittle-endian"
+;;
+
+  riscv32 | riscv64)
+host_arch=riscv
+linux_arch=riscv
+;;
+
+  s390)
+linux_arch=s390
+CPU_CFLAGS="-m31"
+;;
+  s390x)
+host_arch=s390x
+linux_arch=s390
+CPU_CFLAGS="-m64"
+;;
+
+  sparc|sun4[cdmuv])
+cpu=sparc
+CPU_CFLAGS="-m32 -mv8plus -mcpu=ultrasparc"
+;;
+  sparc64)
+host_arch=sparc64
+CPU_CFLAGS="-m64 -mcpu=ultrasparc"
+;;
+
   x32)
 cpu="x86_64"
-CPU_CFLAGS="-mx32" ;;
+host_arch=x86_64
+linux_arch=x86
+CPU_CFLAGS="-mx32"
+;;
   x86_64|amd64)
 cpu="x86_64"
+host_arch=x86_64
+linux_arch=x86
 # ??? Only extremely old AMD cpus do not have cmpxchg16b.
 # If we truly care, we should simply detect this case at
 # runtime and generate the fallback to serial emulation.
-CPU_CFLAGS="-m64 -mcx16" ;;
-
-  mips*)
-cpu="mips" ;;
-
-  ppc)
-CPU_CFLAGS="-m32" ;;
-  ppc64)
-CPU_CFLAGS="-m64 -mbig-endian" ;;
-  ppc64le)
-cpu="ppc64"
-CPU_CFLAGS="-m64 -mlittle-endian" ;;
-
-  s390)
-CPU_CFLAGS="-m31" ;;
-  s390x)
-CPU_CFLAGS="-m64" ;;
-
-  sparc|sun4[cdmuv])
-cpu="sparc"
-CPU_CFLAGS="-m32 -mv8plus -mcpu=ultrasparc" ;;
-  sparc64)
-CPU_CFLAGS="-m64 -mcpu=ultrasparc" ;;
+CPU_CFLAGS="-m64 -mcx16"
+;;
 esac
 
+if test -n "$host_arch" && {
+! test -d "$source_path/linux-user/include/host/$host_arch" ||
+! test -d "$source_path/common-user/host/$host_arch"; }; then
+error_exit "linux-user/include/host/$host_arch does not exist." \
+   "This is a bug in the configure script, please report it."
+fi
+if test -n "$linux_arch" && ! test -d 
"$source_path/linux-headers/asm-$linux_arch"; then
+error_exit "linux-headers/asm-$linux_arch does not exist." \
+   "This is a bug in the configure script, please report it."
+fi
+
 check_py_version() {
 # We require python >= 3.7.
 # NB: a True python conditional creates a non-zero return code (Failure)
@@ -812,7 +872,7 @@ default_target_list=""
 mak_wilds=""
 
 if [ "$linux_user" != no ]; then
-if [ "$targetos" = linux ] && [ -d 
"$source_path/linux-user/include/host/$host_arch" ]; then
+if [ "$targetos" = linux ] && [ -n

[PATCH v2 2/3] linux-user: cleanup unused linux-user/include/host directories

2023-08-08 Thread Paolo Bonzini

Alpha and 31-bit s390 lack the assembly fragment to handle signals
occurring at the same time as system calls, so they cannot run
linux-user emulation anymore.  Drop the host-signal.h files for
them.

Signed-off-by: Paolo Bonzini 
---
 linux-user/include/host/alpha/host-signal.h |  55 
 linux-user/include/host/s390/host-signal.h  | 138 ---
 linux-user/include/host/s390x/host-signal.h | 139 +++-
 3 files changed, 138 insertions(+), 194 deletions(-)
 delete mode 100644 linux-user/include/host/alpha/host-signal.h
 delete mode 100644 linux-user/include/host/s390/host-signal.h

diff --git a/linux-user/include/host/alpha/host-signal.h 
b/linux-user/include/host/alpha/host-signal.h
deleted file mode 100644
index 4f9e2abc4b0..000
--- a/linux-user/include/host/alpha/host-signal.h
+++ /dev/null
@@ -1,55 +0,0 @@
-/*
- * host-signal.h: signal info dependent on the host architecture
- *
- * Copyright (c) 2003-2005 Fabrice Bellard
- * Copyright (c) 2021 Linaro Limited
- *
- * This work is licensed under the terms of the GNU LGPL, version 2.1 or later.
- * See the COPYING file in the top-level directory.
- */
-
-#ifndef ALPHA_HOST_SIGNAL_H
-#define ALPHA_HOST_SIGNAL_H
-
-/* The third argument to a SA_SIGINFO handler is ucontext_t. */
-typedef ucontext_t host_sigcontext;
-
-static inline uintptr_t host_signal_pc(host_sigcontext *uc)
-{
-return uc->uc_mcontext.sc_pc;
-}
-
-static inline void host_signal_set_pc(host_sigcontext *uc, uintptr_t pc)
-{
-uc->uc_mcontext.sc_pc = pc;
-}
-
-static inline void *host_signal_mask(host_sigcontext *uc)
-{
-return &uc->uc_sigmask;
-}
-
-static inline bool host_signal_write(siginfo_t *info, host_sigcontext *uc)
-{
-uint32_t *pc = (uint32_t *)host_signal_pc(uc);
-uint32_t insn = *pc;
-
-/* XXX: need kernel patch to get write flag faster */
-switch (insn >> 26) {
-case 0x0d: /* stw */
-case 0x0e: /* stb */
-case 0x0f: /* stq_u */
-case 0x24: /* stf */
-case 0x25: /* stg */
-case 0x26: /* sts */
-case 0x27: /* stt */
-case 0x2c: /* stl */
-case 0x2d: /* stq */
-case 0x2e: /* stl_c */
-case 0x2f: /* stq_c */
-return true;
-}
-return false;
-}
-
-#endif
diff --git a/linux-user/include/host/s390/host-signal.h 
b/linux-user/include/host/s390/host-signal.h
deleted file mode 100644
index e6d3ec26dc7..000
--- a/linux-user/include/host/s390/host-signal.h
+++ /dev/null
@@ -1,138 +0,0 @@
-/*
- * host-signal.h: signal info dependent on the host architecture
- *
- * Copyright (c) 2003-2005 Fabrice Bellard
- * Copyright (c) 2021 Linaro Limited
- *
- * This work is licensed under the terms of the GNU LGPL, version 2.1 or later.
- * See the COPYING file in the top-level directory.
- */
-
-#ifndef S390_HOST_SIGNAL_H
-#define S390_HOST_SIGNAL_H
-
-/* The third argument to a SA_SIGINFO handler is ucontext_t. */
-typedef ucontext_t host_sigcontext;
-
-static inline uintptr_t host_signal_pc(host_sigcontext *uc)
-{
-return uc->uc_mcontext.psw.addr;
-}
-
-static inline void host_signal_set_pc(host_sigcontext *uc, uintptr_t pc)
-{
-uc->uc_mcontext.psw.addr = pc;
-}
-
-static inline void *host_signal_mask(host_sigcontext *uc)
-{
-return &uc->uc_sigmask;
-}
-
-static inline bool host_signal_write(siginfo_t *info, host_sigcontext *uc)
-{
-uint16_t *pinsn = (uint16_t *)host_signal_pc(uc);
-
-/*
- * ??? On linux, the non-rt signal handler has 4 (!) arguments instead
- * of the normal 2 arguments.  The 4th argument contains the "Translation-
- * Exception Identification for DAT Exceptions" from the hardware (aka
- * "int_parm_long"), which does in fact contain the is_write value.
- * The rt signal handler, as far as I can tell, does not give this value
- * at all.  Not that we could get to it from here even if it were.
- * So fall back to parsing instructions.  Treat read-modify-write ones as
- * writes, which is not fully correct, but for tracking self-modifying code
- * this is better than treating them as reads.  Checking si_addr page flags
- * might be a viable improvement, albeit a racy one.
- */
-/* ??? This is not even close to complete.  */
-switch (pinsn[0] >> 8) {
-case 0x50: /* ST */
-case 0x42: /* STC */
-case 0x40: /* STH */
-case 0x44: /* EX */
-case 0xba: /* CS */
-case 0xbb: /* CDS */
-return true;
-case 0xc4: /* RIL format insns */
-switch (pinsn[0] & 0xf) {
-case 0xf: /* STRL */
-case 0xb: /* STGRL */
-case 0x7: /* STHRL */
-return true;
-}
-break;
-case 0xc6: /* RIL-b format insns */
-switch (pinsn[0] & 0xf) {
-case 0x0: /* EXRL */
-return true;
-}
-break;
-case 0xc8: /* SSF format insns */
-switch (pinsn[0] & 0xf) {
-case 0x2: /* CSST */
-return true;
-}
-break;
-case 0xe3: /* RXY format insns */

Re: [PATCH 3/8] Introduced a new function to disconnect GPIO connections

2023-08-08 Thread lixianglai


Hi,Peter Maydell :


On 7/28/23 8:38 PM, Peter Maydell wrote:

On Thu, 20 Jul 2023 at 08:16, xianglai li  wrote:

It introduces a new function to unwire the
vcpu<->exioi interrupts for the vcpu hot-(un)plug cases.

Cc: Xiaojuan Yang 
Cc: Song Gao 
Cc: "Michael S. Tsirkin" 
Cc: Igor Mammedov 
Cc: Ani Sinha 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: "Philippe Mathieu-Daudé" 
Cc: Yanan Wang 
Cc: "Daniel P. Berrangé" 
Cc: Peter Xu 
Cc: David Hildenbrand 
Signed-off-by: xianglai li 
---
  hw/core/gpio.c | 4 ++--
  include/hw/qdev-core.h | 2 ++
  2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/core/gpio.c b/hw/core/gpio.c
index 80d07a6ec9..4fc6409545 100644
--- a/hw/core/gpio.c
+++ b/hw/core/gpio.c
@@ -143,8 +143,8 @@ qemu_irq qdev_get_gpio_out_connector(DeviceState *dev, 
const char *name, int n)

  /* disconnect a GPIO output, returning the disconnected input (if any) */

-static qemu_irq qdev_disconnect_gpio_out_named(DeviceState *dev,
-   const char *name, int n)
+qemu_irq qdev_disconnect_gpio_out_named(DeviceState *dev,
+const char *name, int n)
  {
  char *propname = g_strdup_printf("%s[%d]",
   name ? name : "unnamed-gpio-out", n);

This function as currently written is intended only for
use in qdev_intercept_gpio_out(), which in turn is there
only for qtest test case use. I would be a bit cautious
about whether there are unexpected issues with trying
to use it in "production" functionality with a running
QEMU (eg when perhaps some device might be trying to
signal the gpio line in another thread while this one
is trying to disconnect it).


That's how I understand it:

Take Loongarch's extended interrupt controller, for example:

1.The interrupt pin of the extended interrupt controller is initialized in

static void loongarch_extioi_instance_init(Object *obj)

{



    for (cpu = 0; cpu < EXTIOI_CPUS; cpu++) {

    for (pin = 0; pin < LS3A_INTC_IP; pin++) {
    qdev_init_gpio_out(DEVICE(obj), &s->parent_irq[cpu][pin], 1);
    }
    }

.

}

The above function binds the extended interrupt pin to the variable 
s->parent_irq[cpu][pin] pointer.


2.The assignment of the interrupt pin of the extended interrupt 
controller operates in


static LoongArchCPU *loongarch_cpu_create(MachineState *machine, 
LoongArchCPU *cpu, Error **errp)

{
...
    for (pin = 0; pin < LS3A_INTC_IP; pin++) {
    qdev_connect_gpio_out(extioi, (cpu_index * 8 + pin), 
qdev_get_gpio_in(cpudev, pin + 2));

    }
...
}

The above function binds the extended interrupt pin to the return value 
of the qdev_get_gpio_in(cpudev, pin + 2).


You actually do the following:

s->parent_irq[cpu][pin] = qdev_get_gpio_in(cpudev, pin + 2);

Variables are only accessed when the interrupt controller's IO space is 
extended to read and write.


There are QEMU thread locks when reading and writing in IO space：

static inline void glue(address_space_stw_internal, SUFFIX)(ARG1_DECL, 
hwaddr addr, uint16_t val, MemTxAttrs attrs, MemTxResult *result, enum 
device_endian endian)

{
...
    RCU_READ_LOCK();
...
    release_lock |= prepare_mmio_access(mr);
    r = memory_region_dispatch_write(mr, addr1, val, MO_16 | 
devend_memop(endian), attrs);

...
    if (release_lock) {
    qemu_mutex_unlock_iothread();
    }
    RCU_READ_UNLOCK();
...
}

Therefore, both are locked by the thread of QEMU when the CPU is 
unplugged or interrupted sent, which can ensure that the two are 
mutually exclusive.


From the call stack below, it can be seen that interrupt sending and 
CPU unplugging are in two worker threads, but both worker threads are 
mutexe.


for example:

Thread 1 "qemu-system-loo" hit Breakpoint 5, qemu_set_irq
41  if (!irq)
(gdb) bt
#0  0x00aaab1e3630 in qemu_set_irq
#1  0x00aaab1324d8 in loongarch_msi_mem_write
#2  0x00aaab1835e0 in memory_region_write_accessor
#3  0x00aaab182e38 in access_with_adjusted_size
#4  0x00aaab18323c in memory_region_dispatch_write
#5  0x00aaab190cf8 in address_space_stl_internal
#6  0x00aaab190cf8 in address_space_stl_le
#7  0x00aaab3714a8 in aio_dispatch_handler
#8  0x00aaab371e20 in aio_dispatch_handlers
#9  0x00aaab371e20 in aio_dispatch
#10 0x00aaab388ae0 in aio_ctx_dispatch
#11 0x00fff724d334 in g_main_context_dispatch
#12 0x00aaab38a390 in glib_pollfds_poll
#13 0x00aaab38a390 in os_host_main_loop_wait
#14 0x00aaab38a390 in main_loop_wait
#15 0x00aaab05aa0c in qemu_main_loop
#16 0x00aaab1ddf2c in qemu_default_main
#17 0x00fff60d9670 in __libc_start_main
#18 0x00e719fc in _start ()
(gdb) call qemu_mutex_iothread_locked()
$5 = true

Thread 4 "qemu-system-loo" hit Breakpoint 2, loongarch_cpu_destroy
1177    LoongArchMachineState *lsms = LOONGARCH_MACHINE(machine);
(gdb) bt
#0  0x00aaab0eeb60 in

Re: [PATCH 6/8] Add support of unrealize for loongarch cpu

2023-08-08 Thread lixianglai


Hi Igor Mammedov:


On 7/28/23 9:23 PM, Igor Mammedov wrote:

On Thu, 20 Jul 2023 15:15:11 +0800
xianglai li  wrote:


1.Add the Unrealize function to the Loongarch CPU for cpu hot-(un)plug
2.Add CPU topology-related properties to the Loongarch CPU for cpu hot-(un)plug

Cc: Xiaojuan Yang 
Cc: Song Gao 
Cc: "Michael S. Tsirkin" 
Cc: Igor Mammedov 
Cc: Ani Sinha 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: "Philippe Mathieu-Daudé" 
Cc: Yanan Wang 
Cc: "Daniel P. Berrangé" 
Cc: Peter Xu 
Cc: David Hildenbrand 
Signed-off-by: xianglai li 
---
  target/loongarch/cpu.c | 33 +
  target/loongarch/cpu.h |  1 +
  2 files changed, 34 insertions(+)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index ad93ecac92..97c577820f 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -18,6 +18,7 @@
  #include "cpu-csr.h"
  #include "sysemu/reset.h"
  #include "tcg/tcg.h"
+#include "hw/qdev-properties.h"
  
  const char * const regnames[32] = {

  "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
@@ -540,6 +541,24 @@ static void loongarch_cpu_realizefn(DeviceState *dev, 
Error **errp)
  lacc->parent_realize(dev, errp);
  }
  
+static void loongarch_cpu_unrealizefn(DeviceState *dev)

+{
+LoongArchCPUClass *mcc = LOONGARCH_CPU_GET_CLASS(dev);
+
+#ifndef CONFIG_USER_ONLY
+CPUState *cs = CPU(dev);
+LoongArchCPU *cpu = LOONGARCH_CPU(dev);
+CPULoongArchState *env = &cpu->env;
+
+cpu_remove_sync(CPU(dev));
+cpu_address_space_destroy(cs, 0);
+address_space_destroy(&env->address_space_iocsr);
+memory_region_del_subregion(&env->system_iocsr, &env->iocsr_mem);
+#endif
+
+mcc->parent_unrealize(dev);
+}
+
  #ifndef CONFIG_USER_ONLY
  static void loongarch_qemu_write(void *opaque, hwaddr addr,
   uint64_t val, unsigned size)
@@ -697,6 +716,15 @@ static gchar *loongarch_gdb_arch_name(CPUState *cs)
  return g_strdup("loongarch64");
  }
  



+static Property loongarch_cpu_properties[] = {
+DEFINE_PROP_INT32("socket-id", LoongArchCPU, socket_id, 0),
+DEFINE_PROP_INT32("core-id", LoongArchCPU, core_id, 0),
+DEFINE_PROP_INT32("thread-id", LoongArchCPU, thread_id, 0),
+DEFINE_PROP_INT32("node-id", LoongArchCPU, node_id, 
CPU_UNSET_NUMA_NODE_ID),
+
+DEFINE_PROP_END_OF_LIST()
+};

this should be a part of topo patches



Ok, I'll move it to the topo patch.


Thanks,

xianglai


+
  static void loongarch_cpu_class_init(ObjectClass *c, void *data)
  {
  LoongArchCPUClass *lacc = LOONGARCH_CPU_CLASS(c);
@@ -704,8 +732,12 @@ static void loongarch_cpu_class_init(ObjectClass *c, void 
*data)
  DeviceClass *dc = DEVICE_CLASS(c);
  ResettableClass *rc = RESETTABLE_CLASS(c);
  
+device_class_set_props(dc, loongarch_cpu_properties);

  device_class_set_parent_realize(dc, loongarch_cpu_realizefn,
  &lacc->parent_realize);
+device_class_set_parent_unrealize(dc, loongarch_cpu_unrealizefn,
+  &lacc->parent_unrealize);
+
  resettable_class_set_parent_phases(rc, NULL, loongarch_cpu_reset_hold, 
NULL,
 &lacc->parent_phases);
  
@@ -730,6 +762,7 @@ static void loongarch_cpu_class_init(ObjectClass *c, void *data)

  #ifdef CONFIG_TCG
  cc->tcg_ops = &loongarch_tcg_ops;
  #endif
+dc->user_creatable = true;
  }
  
  #define DEFINE_LOONGARCH_CPU_TYPE(model, initfn) \

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index f4439c245f..32feee4fe6 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -397,6 +397,7 @@ struct LoongArchCPUClass {
  /*< public >*/
  
  DeviceRealize parent_realize;

+DeviceUnrealize parent_unrealize;
  ResettablePhases parent_phases;
  };

Re: [PATCH 7/8] Update the ACPI table for the Loongarch CPU

2023-08-08 Thread lixianglai


Hi Igor Mammedov:


On 7/28/23 9:26 PM, Igor Mammedov wrote:

On Thu, 20 Jul 2023 15:15:12 +0800
xianglai li  wrote:


1.Create a new GED device type for Loongarch,
mount cpu_madt function to update the ACPI table

madt changes should be its own patch



Okay, I'll put the Madt-related changes into a separate patch.


Thanks,

xianglai


2.Update the APIC table for loongarch based on
CPU information to support CPU hot-(un)plug

Cc: Xiaojuan Yang 
Cc: Song Gao 
Cc: "Michael S. Tsirkin" 
Cc: Igor Mammedov 
Cc: Ani Sinha 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: "Philippe Mathieu-Daudé" 
Cc: Yanan Wang 
Cc: "Daniel P. Berrangé" 
Cc: Peter Xu 
Cc: David Hildenbrand 
Signed-off-by: xianglai li 
---
  hw/acpi/acpi-cpu-hotplug-stub.c   |  9 +
  hw/loongarch/acpi-build.c | 35 --
  hw/loongarch/generic_event_device_loongarch.c | 36 +++
  hw/loongarch/meson.build  |  2 +-
  4 files changed, 79 insertions(+), 3 deletions(-)
  create mode 100644 hw/loongarch/generic_event_device_loongarch.c

diff --git a/hw/acpi/acpi-cpu-hotplug-stub.c b/hw/acpi/acpi-cpu-hotplug-stub.c
index 2aec90d968..af9fda2cf4 100644
--- a/hw/acpi/acpi-cpu-hotplug-stub.c
+++ b/hw/acpi/acpi-cpu-hotplug-stub.c
@@ -19,6 +19,15 @@ void legacy_acpi_cpu_hotplug_init(MemoryRegion *parent, 
Object *owner,
  return;
  }
  
+void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,

+hwaddr mmap_io_base,
+const char *res_root,
+const char *event_handler_method,
+AmlRegionSpace rs)
+{
+return;
+}
+
  void acpi_cpu_ospm_status(CPUHotplugState *cpu_st, ACPIOSTInfoList ***list)
  {
  return;
diff --git a/hw/loongarch/acpi-build.c b/hw/loongarch/acpi-build.c
index 0b62c3a2f7..312908fb2f 100644
--- a/hw/loongarch/acpi-build.c
+++ b/hw/loongarch/acpi-build.c
@@ -46,6 +46,23 @@
  #define ACPI_BUILD_DPRINTF(fmt, ...)
  #endif
  
+void virt_madt_cpu_entry(int uid,

+ const CPUArchIdList *apic_ids,
+ GArray *entry, bool force_enabled)
+{
+uint32_t apic_id = apic_ids->cpus[uid].arch_id;
+/* Flags – Local APIC Flags */
+uint32_t flags = apic_ids->cpus[uid].cpu != NULL || force_enabled ?
+ 1 /* Enabled */ : 0;
+
+/* Rev 1.0b, Table 5-13 Processor Local APIC Structure */
+build_append_int_noprefix(entry, 0, 1);   /* Type */
+build_append_int_noprefix(entry, 8, 1);   /* Length */
+build_append_int_noprefix(entry, uid, 1); /* ACPI Processor ID */
+build_append_int_noprefix(entry, apic_id, 1); /* APIC ID */
+build_append_int_noprefix(entry, flags, 4); /* Flags */
+}
+
  /* build FADT */
  static void init_common_fadt_data(AcpiFadtData *data)
  {
@@ -121,15 +138,18 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
LoongArchMachineState *lams)
  build_append_int_noprefix(table_data, 1 /* PCAT_COMPAT */, 4); /* Flags */
  
  for (i = 0; i < arch_ids->len; i++) {

+uint32_t flags;
+
  /* Processor Core Interrupt Controller Structure */
  arch_id = arch_ids->cpus[i].arch_id;
+flags = arch_ids->cpus[i].cpu ? 1 : 0;
  
  build_append_int_noprefix(table_data, 17, 1);/* Type */

  build_append_int_noprefix(table_data, 15, 1);/* Length */
  build_append_int_noprefix(table_data, 1, 1); /* Version */
-build_append_int_noprefix(table_data, i + 1, 4); /* ACPI Processor ID 
*/
+build_append_int_noprefix(table_data, i, 4); /* ACPI Processor ID 
*/
  build_append_int_noprefix(table_data, arch_id, 4); /* Core ID */
-build_append_int_noprefix(table_data, 1, 4); /* Flags */
+build_append_int_noprefix(table_data, flags, 4);   /* Flags */
  }
  
  /* Extend I/O Interrupt Controller Structure */

@@ -292,6 +312,17 @@ build_la_ged_aml(Aml *dsdt, MachineState *machine)
   AML_SYSTEM_MEMORY,
   VIRT_GED_MEM_ADDR);
  }
+
+if (event & ACPI_GED_CPU_HOTPLUG_EVT) {
+CPUHotplugFeatures opts = {
+.acpi_1_compatible = false,
+.has_legacy_cphp = false
+};
+
+build_cpus_aml(dsdt, machine, opts, VIRT_GED_CPUHP_ADDR,
+   "\\_SB", "\\_GPE._E01", AML_SYSTEM_MEMORY);
+
+}
  acpi_dsdt_add_power_button(dsdt);
  }
  
diff --git a/hw/loongarch/generic_event_device_loongarch.c b/hw/loongarch/generic_event_device_loongarch.c

new file mode 100644
index 00..1fe550239b
--- /dev/null
+++ b/hw/loongarch/generic_event_device_loongarch.c
@@ -0,0 +1,36 @@
+/*
+ * loongarch variant of the generic event device for hw reduced acpi
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * v

Re: [PATCH 8/8] Turn on CPU hot-(un)plug customization for loongarch

2023-08-08 Thread lixianglai


Hi, Igor Mammedov:


On 7/28/23 9:30 PM, Igor Mammedov wrote:

On Thu, 20 Jul 2023 15:15:13 +0800
xianglai li  wrote:


Turn on CPU hot-(un)plug custom for loongarch in the configuration file

Cc: Xiaojuan Yang 
Cc: Song Gao 
Cc: "Michael S. Tsirkin" 
Cc: Igor Mammedov 
Cc: Ani Sinha 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: "Philippe Mathieu-Daudé" 
Cc: Yanan Wang 
Cc: "Daniel P. Berrangé" 
Cc: Peter Xu 
Cc: David Hildenbrand 
Signed-off-by: xianglai li 
---
  configs/devices/loongarch64-softmmu/default.mak | 1 +
  1 file changed, 1 insertion(+)

diff --git a/configs/devices/loongarch64-softmmu/default.mak 
b/configs/devices/loongarch64-softmmu/default.mak
index 928bc117ef..e596706fab 100644
--- a/configs/devices/loongarch64-softmmu/default.mak
+++ b/configs/devices/loongarch64-softmmu/default.mak
@@ -1,3 +1,4 @@
  # Default configuration for loongarch64-softmmu
  
  CONFIG_LOONGARCH_VIRT=y

+CONFIG_ACPI_CPU_HOTPLUG=y

this likely shall be part of prior patch (one that starts to use generic cpu 
hotplug functions)
otherwise you risk a broke bisection in the middle of series
(aka try to build series after applying each patch)



Do you mean this patch should be inside the first patch?

Thanks,

xianglai

Re: [PATCH v2 2/3] linux-user: cleanup unused linux-user/include/host directories

2023-08-08 Thread Michael Tokarev


08.08.2023 15:03, Paolo Bonzini wrote:

Alpha and 31-bit s390 lack the assembly fragment to handle signals
occurring at the same time as system calls, so they cannot run
linux-user emulation anymore.  Drop the host-signal.h files for
them.

Signed-off-by: Paolo Bonzini 
---
  linux-user/include/host/alpha/host-signal.h |  55 
  linux-user/include/host/s390/host-signal.h  | 138 ---
  linux-user/include/host/s390x/host-signal.h | 139 +++-
  3 files changed, 138 insertions(+), 194 deletions(-)
  delete mode 100644 linux-user/include/host/alpha/host-signal.h
  delete mode 100644 linux-user/include/host/s390/host-signal.h


I think this cleanup can be postproned to after 8.1 release,
while other 2 patches in this series are needed for 8.1 to
fix current breakage with riscv.

linux-user/include/host/s390/ subdir will be be effectively "orphaned"
after the subsequent patch, just like alpha/ in there now, it can be
cleaned up later just as well.

But anyway,

Reviewed-by: Michael Tokarev 

git format-patch should probably have a way to detect renames, no?
Or does it not work if the target file did exist before the patch?

/mjt

Re: [PATCH v2 1/3] configure: fix detection for x32 linux-user

2023-08-08 Thread Michael Tokarev


08.08.2023 15:03, Paolo Bonzini wrote:

x32 uses the same signal handling fragments as x86_64, since host_arch
is set to x86_64 when Meson runs.  Remove the unnecessary forwarder and
set the host_arch variable properly in configure.


Reviewed-by: Michael Tokarev 

/mjt

Re: [PATCH v2 3/3] configure: unify case statements for CPU canonicalization

2023-08-08 Thread Michael Tokarev


08.08.2023 15:03, Paolo Bonzini wrote:

The CPU model has to be canonicalized to what Meson wants in the cross
file, to what Linux uses for its asm-$ARCH directories, and to what
QEMU uses for its user-mode emulation host/$ARCH directories.  Do
all three in a single case statement, and check that the Linux and
QEMU directories actually exist.

At a small cost in repeated lines, this ensures that there are no hidden
ordering requirements between the case statements.  In particular, commit
89e5b7935e9 ("configure: Fix linux-user host detection for riscv64",
2023-08-06) broke ppc64le because it assigned host_arch based on a
non-canonicalized version of $cpu.

Reported-by: Joel Stanley 
Fixes: 89e5b7935e9 ("configure: Fix linux-user host detection for riscv64", 
2023-08-06)
Reviewed-by: Richard Henderson 
Signed-off-by: Paolo Bonzini 
---
  configure | 175 +++---
  1 file changed, 102 insertions(+), 73 deletions(-)


Reviewed-by: Michael Tokarev 

A nice cleanup.

/mjt

Re: [PATCH for-8.2 v3 1/6] vfio/migration: Move from STOP_COPY to STOP in vfio_save_cleanup()

2023-08-08 Thread Jason Gunthorpe

On Tue, Aug 08, 2023 at 09:23:09AM +0300, Avihai Horon wrote:
> 
> On 07/08/2023 18:53, Cédric Le Goater wrote:
> > External email: Use caution opening links or attachments
> > 
> > 
> > [ Adding Juan and Peter for their awareness ]
> > 
> > On 8/2/23 10:14, Avihai Horon wrote:
> > > Changing the device state from STOP_COPY to STOP can take time as the
> > > device may need to free resources and do other operations as part of the
> > > transition. Currently, this is done in vfio_save_complete_precopy() and
> > > therefore it is counted in the migration downtime.
> > > 
> > > To avoid this, change the device state from STOP_COPY to STOP in
> > > vfio_save_cleanup(), which is called after migration has completed and
> > > thus is not part of migration downtime.
> > 
> > What bothers me is that this looks like a device specific optimization
> 
> True, currently it helps mlx5, but this change is based on the assumption
> that, in general, VFIO devices are likely to free resources when
> transitioning from STOP_COPY to STOP.
> So I think this is a good change to have in any case.

Yes, I agree with this idea. Kernel drivers should be designed to take
advantage of things like this and defer time consuming work to the
stop arc.

> > and we are loosing the error part.
> 
> I don't think we lose the error part.
> AFAIU, the crucial part is transitioning to STOP_COPY and sending the final
> data.
> If that's done successfully, then migration is successful.

Yes.

> The STOP_COPY->STOP transition is done as part of the cleanup flow, after
> the migration is completed -- i.e., failure in it does not affect the
> success of migration.
> Further more, if there is an error in the STOP_COPY->STOP transition, then
> it's reported by vfio_migration_set_state().

If STOP_COPY->STOP fails then the migration should succeed anyhow and
qemu has to FLR the VFIO device to recover it. This probably only
matters if the migration is aborted for some other reason and qemu has
to resume the VM. It will not be able to restore the device to running
until it has been FLRd.

However, qemu can probably ignore this error and eventually if it
tries to go to RUNNING the kernel will report failure and it can do
the FLR at that point. Otherwise on the migration success path qemu
should simply close the VFIO device.

Jason

Re: [PATCH v2 0/3] linux-user, configure: fix CPU canonicalization

2023-08-08 Thread Michael Tokarev


08.08.2023 15:03, Paolo Bonzini wrote:

The CPU model has to be canonicalized to what Meson wants in the cross
file, to what Linux uses for its asm-$ARCH directories, and to what
QEMU uses for its user-mode emulation host/$ARCH directories.  Do
all three in a single case statement, and check that the Linux and
QEMU directories actually exist.

At a small cost in repeated lines, this ensures that there are no hidden
ordering requirements between the case statements.  In particular, commit
89e5b7935e9 ("configure: Fix linux-user host detection for riscv64",
2023-08-06) broke ppc64le because it assigned host_arch based on a
non-canonicalized version of $cpu.

While doing this, I noticed that linux-user won't work on x32, alpha
and 32-bit s390 these days, due to missing common-user/ fragments.
The first two patches clean up the directories.

v1->v2: fix s390x compilation; rearrange case terminators


Paolo Bonzini (3):
   configure: fix detection for x32 linux-user
   linux-user: cleanup unused linux-user/include/host directories
   configure: unify case statements for CPU canonicalization

  configure   | 173 
  linux-user/include/host/alpha/host-signal.h |  55 ---
  linux-user/include/host/s390/host-signal.h  | 138 
  linux-user/include/host/s390x/host-signal.h | 139 +++-
  linux-user/include/host/x32/host-signal.h   |   1 -
  5 files changed, 240 insertions(+), 266 deletions(-)
  delete mode 100644 linux-user/include/host/alpha/host-signal.h
  delete mode 100644 linux-user/include/host/s390/host-signal.h
  delete mode 100644 linux-user/include/host/x32/host-signal.h


Tested-by: Michael Tokarev 

Tested on s390x and ppc64le hosts.

/mjt

Re: [PATCH v2 0/3] linux-user, configure: fix CPU canonicalization

2023-08-08 Thread Ilya Leoshkevich

On Tue, 2023-08-08 at 14:03 +0200, Paolo Bonzini wrote:
> The CPU model has to be canonicalized to what Meson wants in the
> cross
> file, to what Linux uses for its asm-$ARCH directories, and to what
> QEMU uses for its user-mode emulation host/$ARCH directories.  Do
> all three in a single case statement, and check that the Linux and
> QEMU directories actually exist.
> 
> At a small cost in repeated lines, this ensures that there are no
> hidden
> ordering requirements between the case statements.  In particular,
> commit
> 89e5b7935e9 ("configure: Fix linux-user host detection for riscv64",
> 2023-08-06) broke ppc64le because it assigned host_arch based on a
> non-canonicalized version of $cpu.
> 
> While doing this, I noticed that linux-user won't work on x32, alpha
> and 32-bit s390 these days, due to missing common-user/ fragments.
> The first two patches clean up the directories.
> 
> v1->v2: fix s390x compilation; rearrange case terminators
> 
> 
> Paolo Bonzini (3):
>   configure: fix detection for x32 linux-user
>   linux-user: cleanup unused linux-user/include/host directories
>   configure: unify case statements for CPU canonicalization
> 
>  configure   | 173 --
> --
>  linux-user/include/host/alpha/host-signal.h |  55 ---
>  linux-user/include/host/s390/host-signal.h  | 138 
>  linux-user/include/host/s390x/host-signal.h | 139 +++-
>  linux-user/include/host/x32/host-signal.h   |   1 -
>  5 files changed, 240 insertions(+), 266 deletions(-)
>  delete mode 100644 linux-user/include/host/alpha/host-signal.h
>  delete mode 100644 linux-user/include/host/s390/host-signal.h
>  delete mode 100644 linux-user/include/host/x32/host-signal.h

Now it works, thanks!

Acked-by: Ilya Leoshkevich 
Tested-by: Ilya Leoshkevich

[PULL v2] hw/nvme fixes

2023-08-08 Thread Klaus Jensen

From: Klaus Jensen 

Hi,

There was a small typo in the last pull. This replaces it.

The following changes since commit 0450cf08976f9036feaded438031b4cba94f6452:

  Merge tag 'fixes-pull-request' of https://gitlab.com/marcandre.lureau/qemu 
into staging (2023-08-07 13:55:00 -0700)

are available in the Git repository at:

  https://gitlab.com/birkelund/qemu.git tags/nvme-next-pull-request

for you to fetch changes up to ec5a138ce63ce460575a44cf9ec3172c33eb0fd6:

  docs: update hw/nvme documentation for protection information (2023-08-08 
15:28:05 +0200)


hw/nvme fixes

- fix for invalid protection information calculation
-BEGIN PGP SIGNATURE-

iQEzBAABCgAdFiEEUigzqnXi3OaiR2bATeGvMW1PDekFAmTSREoACgkQTeGvMW1P
DekH6Qf/e3gi0KloAUpbTQvGmBA6XmkJFAtOdZn7IJXVCowjYTIKU84DrdPyT1c1
rofL4w0klKG5c4Or/Cs4dH/ASxTWaQZRlFAYxsTW3nUX74MnaFDRZcN2geb30ws7
ryejVEKeHNWH/YYY4Ny55wO3tmy2ILAKnbiadiXhj4dQfCK1GzZnrx10PWxLNlkZ
KRhiXLNBHpPnDlrLq7/nLs+/0cMrrqEz6ISm/Ju4iUczAH/wmqEbR/yD3pAwmH07
PCaSeegOpwscovI5TWRelOJlzIXb6D8Xk9d3dGL5x/eeN7GlkgERX4MAcNYKwe8T
JNR8y2ErTEj2nLU/juES1EpiR2gYKw==
=vJlA
-END PGP SIGNATURE-



Ankit Kumar (2):
  hw/nvme: fix CRC64 for guard tag
  docs: update hw/nvme documentation for protection information

 docs/system/devices/nvme.rst | 12 +---
 hw/nvme/dif.c|  4 ++--
 2 files changed, 11 insertions(+), 5 deletions(-)

-- 
2.41.0

[PATCH] Add support of callbacks after instructions to plugin api

2023-08-08 Thread Mikhail Tyutin

Initially, we can only call the callback BEFORE instructions. This commit adds 
the ability to insert the callback AFTER instructions.

No callback call for control-flow instructions.

Signed-off-by: Aleksandr Anenkov 
Signed-off-by: Mikhail Tyutin 
---
 accel/tcg/plugin-gen.c   | 25 -
 accel/tcg/translator.c   | 18 +-
 include/qemu/plugin.h|  1 +
 include/qemu/qemu-plugin.h   | 33 -
 plugins/api.c| 26 --
 plugins/qemu-plugins.symbols |  2 ++
 tcg/tcg-op.c | 16 
 7 files changed, 108 insertions(+), 13 deletions(-)

diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index 5c13615112..88dcbda651 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -180,6 +180,8 @@ static void plugin_gen_empty_callback(enum plugin_gen_from 
from)
 {
 switch (from) {
 case PLUGIN_GEN_AFTER_INSN:
+gen_wrapped(from, PLUGIN_GEN_CB_UDATA, gen_empty_udata_cb);
+gen_wrapped(from, PLUGIN_GEN_CB_INLINE, gen_empty_inline_cb);
 gen_wrapped(from, PLUGIN_GEN_DISABLE_MEM_HELPER,
 gen_empty_mem_helper);
 break;
@@ -598,18 +600,21 @@ static void plugin_gen_tb_inline(const struct 
qemu_plugin_tb *ptb,
 }
 
 static void plugin_gen_insn_udata(const struct qemu_plugin_tb *ptb,
+  enum plugin_dyn_cb_type cb_type,
   TCGOp *begin_op, int insn_idx)
 {
+g_assert(cb_type == PLUGIN_CB_INSN || cb_type == PLUGIN_CB_AFTER_INSN);
 struct qemu_plugin_insn *insn = g_ptr_array_index(ptb->insns, insn_idx);
-
-inject_udata_cb(insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_REGULAR], begin_op);
+inject_udata_cb(insn->cbs[cb_type][PLUGIN_CB_REGULAR], begin_op);
 }
 
 static void plugin_gen_insn_inline(const struct qemu_plugin_tb *ptb,
+   enum plugin_dyn_cb_type cb_type,
TCGOp *begin_op, int insn_idx)
 {
+g_assert(cb_type == PLUGIN_CB_INSN || cb_type == PLUGIN_CB_AFTER_INSN);
 struct qemu_plugin_insn *insn = g_ptr_array_index(ptb->insns, insn_idx);
-inject_inline_cb(insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE],
+inject_inline_cb(insn->cbs[cb_type][PLUGIN_CB_INLINE],
  begin_op, op_ok);
 }
 
@@ -738,10 +743,12 @@ static void plugin_gen_inject(struct qemu_plugin_tb 
*plugin_tb)
 
 switch (type) {
 case PLUGIN_GEN_CB_UDATA:
-plugin_gen_insn_udata(plugin_tb, op, insn_idx);
+plugin_gen_insn_udata(plugin_tb, PLUGIN_CB_INSN,
+  op, insn_idx);
 break;
 case PLUGIN_GEN_CB_INLINE:
-plugin_gen_insn_inline(plugin_tb, op, insn_idx);
+plugin_gen_insn_inline(plugin_tb, PLUGIN_CB_INSN,
+   op, insn_idx);
 break;
 case PLUGIN_GEN_ENABLE_MEM_HELPER:
 plugin_gen_enable_mem_helper(plugin_tb, op, insn_idx);
@@ -773,6 +780,14 @@ static void plugin_gen_inject(struct qemu_plugin_tb 
*plugin_tb)
 g_assert(insn_idx >= 0);
 
 switch (type) {
+case PLUGIN_GEN_CB_UDATA:
+plugin_gen_insn_udata(plugin_tb, PLUGIN_CB_AFTER_INSN,
+  op, insn_idx);
+break;
+case PLUGIN_GEN_CB_INLINE:
+plugin_gen_insn_inline(plugin_tb, PLUGIN_CB_AFTER_INSN,
+   op, insn_idx);
+break;
 case PLUGIN_GEN_DISABLE_MEM_HELPER:
 plugin_gen_disable_mem_helper(plugin_tb, op, insn_idx);
 break;
diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index 1a6a5448c8..5e57dc754e 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -180,6 +180,12 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, 
int *max_insns,
 ops->translate_insn(db, cpu);
 }
 
+
+/* Stop translation if translate_insn so indicated.  */
+if (db->is_jmp != DISAS_NEXT) {
+break;
+}
+
 /*
  * We can't instrument after instructions that change control
  * flow although this only really affects post-load operations.
@@ -193,11 +199,6 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, 
int *max_insns,
 plugin_gen_insn_end();
 }
 
-/* Stop translation if translate_insn so indicated.  */
-if (db->is_jmp != DISAS_NEXT) {
-break;
-}
-
 /* Stop translation if the output buffer is full,
or we have executed all of the allowed instructions.  */
 if (tcg_op_buf_full() || db->num_insns >= db->max_insns) {
@@ -21

Re: [PATCH for-8.1 v10 04/14] linux-user: Use MAP_FIXED_NOREPLACE for initial image mmap

2023-08-08 Thread Alex Bennée



Akihiko Odaki  writes:

> On 2023/08/08 18:43, Alex Bennée wrote:
>> Richard Henderson  writes:
>> 
>>> Use this as extra protection for the guest mapping over
>>> any qemu host mappings.
>>>
>>> Tested-by: Helge Deller 
>>> Reviewed-by: Helge Deller 
>>> Reviewed-by: Akihiko Odaki 
>>> Signed-off-by: Richard Henderson 
>>> ---
>>>   linux-user/elfload.c | 9 ++---
>>>   1 file changed, 6 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/linux-user/elfload.c b/linux-user/elfload.c
>>> index 36e4026f05..1b4bb2d5af 100644
>>> --- a/linux-user/elfload.c
>>> +++ b/linux-user/elfload.c
>>> @@ -3147,8 +3147,11 @@ static void load_elf_image(const char *image_name, 
>>> int image_fd,
>>>   /*
>>>* Reserve address space for all of this.
>>>*
>>> - * In the case of ET_EXEC, we supply MAP_FIXED so that we get
>>> - * exactly the address range that is required.
>>> + * In the case of ET_EXEC, we supply MAP_FIXED_NOREPLACE so that we get
>>> + * exactly the address range that is required.  Without reserved_va,
>>> + * the guest address space is not isolated.  We have attempted to avoid
>>> + * conflict with the host program itself via probe_guest_base, but 
>>> using
>>> + * MAP_FIXED_NOREPLACE instead of MAP_FIXED provides an extra check.
>>>*
>>>* Otherwise this is ET_DYN, and we are searching for a location
>>>* that can hold the memory space required.  If the image is
>>> @@ -3160,7 +3163,7 @@ static void load_elf_image(const char *image_name, 
>>> int image_fd,
>>>*/
>>>   load_addr = target_mmap(loaddr, (size_t)hiaddr - loaddr + 1, 
>>> PROT_NONE,
>>>   MAP_PRIVATE | MAP_ANON | MAP_NORESERVE |
>>> -(ehdr->e_type == ET_EXEC ? MAP_FIXED : 0),
>>> +(ehdr->e_type == ET_EXEC ? MAP_FIXED_NOREPLACE 
>>> : 0),
>>>   -1, 0);
>> We should probably also check the result == load_addr for the places
>> where MAP_FIXED_NOREPLACE isn't supported as we have this in osdep.h:
>>#ifndef MAP_FIXED_NOREPLACE
>>#define MAP_FIXED_NOREPLACE 0
>>#endif
>> See 2667e069e7 (linux-user: don't use MAP_FIXED in
>> pgd_find_hole_fallback)
>
> It assumes target_mmap() emulates MAP_FIXED_NOREPLACE when the host
> does not support it as commit e69e032d1a ("linux-user: Use
> MAP_FIXED_NOREPLACE for do_brk()") already does, but defining
> MAP_FIXED_NOREPLACE zero breaks such emulation. I wrote a fix:
> https://patchew.org/QEMU/20230808115242.73025-1-akihiko.od...@daynix.com/

Hmm doesn't that push the problem to real mmap() calls to a host system
that doesn't support MAP_FIXED_NOREPLACE?

I wonder if we need an internal flag rather than overloading the host
flags?

>
>> 
>>>   if (load_addr == -1) {
>>>   goto exit_mmap;
>> 


-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH for-8.1 v10 04/14] linux-user: Use MAP_FIXED_NOREPLACE for initial image mmap

2023-08-08 Thread Akihiko Odaki


On 2023/08/08 22:48, Alex Bennée wrote:


Akihiko Odaki  writes:


On 2023/08/08 18:43, Alex Bennée wrote:

Richard Henderson  writes:


Use this as extra protection for the guest mapping over
any qemu host mappings.

Tested-by: Helge Deller 
Reviewed-by: Helge Deller 
Reviewed-by: Akihiko Odaki 
Signed-off-by: Richard Henderson 
---
   linux-user/elfload.c | 9 ++---
   1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 36e4026f05..1b4bb2d5af 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -3147,8 +3147,11 @@ static void load_elf_image(const char *image_name, int 
image_fd,
   /*
* Reserve address space for all of this.
*
- * In the case of ET_EXEC, we supply MAP_FIXED so that we get
- * exactly the address range that is required.
+ * In the case of ET_EXEC, we supply MAP_FIXED_NOREPLACE so that we get
+ * exactly the address range that is required.  Without reserved_va,
+ * the guest address space is not isolated.  We have attempted to avoid
+ * conflict with the host program itself via probe_guest_base, but using
+ * MAP_FIXED_NOREPLACE instead of MAP_FIXED provides an extra check.
*
* Otherwise this is ET_DYN, and we are searching for a location
* that can hold the memory space required.  If the image is
@@ -3160,7 +3163,7 @@ static void load_elf_image(const char *image_name, int 
image_fd,
*/
   load_addr = target_mmap(loaddr, (size_t)hiaddr - loaddr + 1, PROT_NONE,
   MAP_PRIVATE | MAP_ANON | MAP_NORESERVE |
-(ehdr->e_type == ET_EXEC ? MAP_FIXED : 0),
+(ehdr->e_type == ET_EXEC ? MAP_FIXED_NOREPLACE : 
0),
   -1, 0);

We should probably also check the result == load_addr for the places
where MAP_FIXED_NOREPLACE isn't supported as we have this in osdep.h:
#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0
#endif
See 2667e069e7 (linux-user: don't use MAP_FIXED in
pgd_find_hole_fallback)


It assumes target_mmap() emulates MAP_FIXED_NOREPLACE when the host
does not support it as commit e69e032d1a ("linux-user: Use
MAP_FIXED_NOREPLACE for do_brk()") already does, but defining
MAP_FIXED_NOREPLACE zero breaks such emulation. I wrote a fix:
https://patchew.org/QEMU/20230808115242.73025-1-akihiko.od...@daynix.com/


Hmm doesn't that push the problem to real mmap() calls to a host system
that doesn't support MAP_FIXED_NOREPLACE?


That can happen even without that patch if you run QEMU built with a new 
libc on old Linux so we should have prepared for that kind of situation.


The man page also says:
> Note that older kernels which do not recognize the MAP_FIXED_NOREPLACE
> flag will typically (upon detecting a collision with a preexisting
> mapping) fall back to a “non-MAP_FIXED” type of behavior: they will
> return an address that is different from the requested address.
> Therefore, backward-compatible software should check the returned
> address against the requested address.
https://man7.org/linux/man-pages/man2/mmap.2.html

It basically means MAP_FIXED_NOREPLACE has no effect on a host that 
doesn't support it, and the existing code checking the returned address 
should continue to work.




I wonder if we need an internal flag rather than overloading the host
flags?










   if (load_addr == -1) {
   goto exit_mmap;

[RFC v4 00/11] Native Library Calls

2023-08-08 Thread Yeqi Fu

Executing a program under QEMU's user mode subjects the entire
program, including all library calls, to translation. It's important
to understand that many of these library functions are optimized
specifically for the guest architecture. Therefore, their
translation might not yield the most efficient execution.

When the semantics of a library function are well defined, we can
capitalize on this by substituting the translated version with a call
to the native equivalent function.

To achieve tangible results, focus should be given to functions such
as memory-related ('mem*') and string-related ('str*') functions.
These subsets of functions often have the most significant effect
on overall performance, making them optimal candidates for
optimization.

Yeqi Fu (11):
  build: Implement logic for sharing cross-building config files
  build: Implement libnative library and the build machinery for
libnative
  linux-user: Implement envlist_appendenv and add tests for envlist
  linux-user: Implement native-bypass option support
  linux-user/elfload: Add support for parsing symbols of native
libraries.
  tcg: Add tcg opcodes and helpers for native library calls
  target/i386: Add support for native library calls
  target/mips: Add support for native library calls
  target/arm: Add support for native library calls
  tests/tcg/multiarch: Add nativecall.c test
  docs/user: Add doc for native library calls

 Makefile|   2 +
 accel/tcg/tcg-runtime.h |  22 
 common-user/native/Makefile.include |   9 ++
 common-user/native/Makefile.target  |  22 
 common-user/native/libnative.c  |  67 
 configs/targets/aarch64-linux-user.mak  |   1 +
 configs/targets/arm-linux-user.mak  |   1 +
 configs/targets/i386-linux-user.mak |   1 +
 configs/targets/mips-linux-user.mak |   1 +
 configs/targets/mips64-linux-user.mak   |   1 +
 configs/targets/x86_64-linux-user.mak   |   1 +
 configure   |  96 
 docs/user/index.rst |   1 +
 docs/user/native_calls.rst  |  90 +++
 include/native/libnative.h  |   8 ++
 include/native/native.h |   9 ++
 include/qemu/envlist.h  |  13 +++
 include/tcg/tcg-op-common.h |  11 ++
 include/tcg/tcg.h   |   9 ++
 linux-user/elfload.c|  85 +-
 linux-user/main.c   |  38 +++
 target/arm/tcg/translate-a64.c  |  14 +++
 target/arm/tcg/translate.c  |  11 ++
 target/i386/tcg/translate.c |  27 +
 target/mips/tcg/translate.c |  21 +++-
 tcg/tcg-op.c| 140 
 tests/tcg/multiarch/Makefile.target |  17 +++
 tests/tcg/multiarch/native/nativecall.c |  98 +
 tests/unit/meson.build  |   1 +
 tests/unit/test-envlist.c   |  94 
 util/envlist.c  |  71 ++--
 31 files changed, 943 insertions(+), 39 deletions(-)
 create mode 100644 common-user/native/Makefile.include
 create mode 100644 common-user/native/Makefile.target
 create mode 100644 common-user/native/libnative.c
 create mode 100644 docs/user/native_calls.rst
 create mode 100644 include/native/libnative.h
 create mode 100644 include/native/native.h
 create mode 100644 tests/tcg/multiarch/native/nativecall.c
 create mode 100644 tests/unit/test-envlist.c

-- 
2.34.1

[RFC v4 02/11] build: Implement libnative library and the build machinery for libnative

2023-08-08 Thread Yeqi Fu

This commit implements a shared library, where native functions are
rewritten as special instructions. At runtime, user programs load
the shared library, and special instructions are executed when
native functions are called.

Signed-off-by: Yeqi Fu 
---
 Makefile|  2 +
 common-user/native/Makefile.include |  9 
 common-user/native/Makefile.target  | 22 ++
 common-user/native/libnative.c  | 67 +
 configure   | 39 +
 include/native/libnative.h  |  8 
 6 files changed, 147 insertions(+)
 create mode 100644 common-user/native/Makefile.include
 create mode 100644 common-user/native/Makefile.target
 create mode 100644 common-user/native/libnative.c
 create mode 100644 include/native/libnative.h

diff --git a/Makefile b/Makefile
index 5d48dfac18..6f6147b40f 100644
--- a/Makefile
+++ b/Makefile
@@ -182,6 +182,8 @@ SUBDIR_MAKEFLAGS=$(if $(V),,--no-print-directory --quiet)
 
 include $(SRC_PATH)/tests/Makefile.include
 
+include $(SRC_PATH)/common-user/native/Makefile.include
+
 all: recurse-all
 
 ROMS_RULES=$(foreach t, all clean distclean, $(addsuffix /$(t), $(ROMS)))
diff --git a/common-user/native/Makefile.include 
b/common-user/native/Makefile.include
new file mode 100644
index 00..40d20bcd4c
--- /dev/null
+++ b/common-user/native/Makefile.include
@@ -0,0 +1,9 @@
+.PHONY: build-native
+build-native: $(NATIVE_TARGETS:%=build-native-library-%)
+$(NATIVE_TARGETS:%=build-native-library-%): build-native-library-%:
+   $(call quiet-command, \
+   $(MAKE) -C common-user/native/$* $(SUBDIR_MAKEFLAGS), \
+   "BUILD","$* native library")
+# endif
+
+all: build-native
diff --git a/common-user/native/Makefile.target 
b/common-user/native/Makefile.target
new file mode 100644
index 00..0c1241b368
--- /dev/null
+++ b/common-user/native/Makefile.target
@@ -0,0 +1,22 @@
+# -*- Mode: makefile -*-
+#
+# Library for native calls
+#
+
+all:
+-include ../../config-host.mak
+-include config-target.mak
+
+CFLAGS+=-O1 -fPIC -shared -fno-stack-protector -I$(SRC_PATH)/include 
-D$(TARGET_NAME)
+LDFLAGS+=
+
+SRC = $(SRC_PATH)/common-user/native/libnative.c
+LIBNATIVE = libnative.so
+
+all: $(LIBNATIVE)
+
+$(LIBNATIVE): $(SRC)
+   $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(EXTRA_NATIVE_CALL_FLAGS) $< -o $@ 
$(LDFLAGS)
+
+clean:
+   rm -f $(LIBNATIVE)
diff --git a/common-user/native/libnative.c b/common-user/native/libnative.c
new file mode 100644
index 00..662ae6fbfe
--- /dev/null
+++ b/common-user/native/libnative.c
@@ -0,0 +1,67 @@
+#include 
+#include 
+#include 
+
+#include "native/libnative.h"
+
+#define WRAP_NATIVE() \
+do {  \
+__asm__ volatile(__CALL_EXPR : : : "memory"); \
+} while (0)
+
+#if defined(i386) || defined(x86_64)
+/*
+ * An unused instruction is utilized to mark a native call.
+ */
+#define __CALL_EXPR ".byte 0x0f, 0xff;"
+#endif
+
+#if defined(arm) || defined(aarch64)
+/*
+ * HLT is an invalid instruction for userspace and usefully has 16
+ * bits of spare immeadiate data which we can stuff data in.
+ */
+#define __CALL_EXPR "hlt 0x;"
+#endif
+
+#if defined(mips) || defined(mips64)
+/*
+ * The syscall instruction contains 20 unused bits, which are typically
+ * set to 0. These bits can be used to store non-zero data,
+ * distinguishing them from a regular syscall instruction.
+ */
+#define __CALL_EXPR "syscall 0x;"
+#endif
+
+void *memcpy(void *dest, const void *src, size_t n)
+{
+WRAP_NATIVE();
+}
+int memcmp(const void *s1, const void *s2, size_t n)
+{
+WRAP_NATIVE();
+}
+void *memset(void *s, int c, size_t n)
+{
+WRAP_NATIVE();
+}
+char *strncpy(char *dest, const char *src, size_t n)
+{
+WRAP_NATIVE();
+}
+int strncmp(const char *s1, const char *s2, size_t n)
+{
+WRAP_NATIVE();
+}
+char *strcpy(char *dest, const char *src)
+{
+WRAP_NATIVE();
+}
+char *strcat(char *dest, const char *src)
+{
+WRAP_NATIVE();
+}
+int strcmp(const char *s1, const char *s2)
+{
+WRAP_NATIVE();
+}
diff --git a/configure b/configure
index a076583141..e02fc2c5c0 100755
--- a/configure
+++ b/configure
@@ -1822,6 +1822,45 @@ if test "$tcg" = "enabled"; then
 fi
 )
 
+# common-user/native configuration
+(mkdir -p common-user/native
+
+native_targets=
+for target in $target_list; do
+  case $target in
+*-softmmu)
+continue
+;;
+  esac
+
+  # native call is only supported on these architectures
+  arch=${target%%-*}
+  config_target_mak=common-user/native/$target/config-target.mak
+  case $arch in
+i386|x86_64|arm|aarch64|mips|mips64)
+  if test -f cross-build/$target/config-target.mak; then
+mkdir -p "common-user/native/$target"
+ln -srf cross-build/$target/config-target.mak "$config_target_mak"
+if test $arch = arm; then
+  echo "EXTRA_NATIVE_CALL_FLAGS=-marm" >> "$config_target_mak"
+fi
+

[RFC v4 08/11] target/mips: Add support for native library calls

2023-08-08 Thread Yeqi Fu

This commit introduces support for native library calls on the
mips target. When special instructions reserved for native calls
are encountered, the code now performs address translation and
generates the corresponding native call.

Signed-off-by: Yeqi Fu 
---
 configs/targets/mips-linux-user.mak   |  1 +
 configs/targets/mips64-linux-user.mak |  1 +
 target/mips/tcg/translate.c   | 21 -
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/configs/targets/mips-linux-user.mak 
b/configs/targets/mips-linux-user.mak
index b4569a9893..fa005d487a 100644
--- a/configs/targets/mips-linux-user.mak
+++ b/configs/targets/mips-linux-user.mak
@@ -3,3 +3,4 @@ TARGET_ABI_MIPSO32=y
 TARGET_SYSTBL_ABI=o32
 TARGET_SYSTBL=syscall_o32.tbl
 TARGET_BIG_ENDIAN=y
+CONFIG_NATIVE_CALL=y
diff --git a/configs/targets/mips64-linux-user.mak 
b/configs/targets/mips64-linux-user.mak
index d2ff509a11..ecfe6bcf73 100644
--- a/configs/targets/mips64-linux-user.mak
+++ b/configs/targets/mips64-linux-user.mak
@@ -4,3 +4,4 @@ TARGET_BASE_ARCH=mips
 TARGET_SYSTBL_ABI=n64
 TARGET_SYSTBL=syscall_n64.tbl
 TARGET_BIG_ENDIAN=y
+CONFIG_NATIVE_CALL=y
diff --git a/target/mips/tcg/translate.c b/target/mips/tcg/translate.c
index 74af91e4f5..51a5c1d49b 100644
--- a/target/mips/tcg/translate.c
+++ b/target/mips/tcg/translate.c
@@ -36,6 +36,7 @@
 #include "exec/helper-info.c.inc"
 #undef  HELPER_H
 
+#include "native/native.h"
 
 /*
  * Many sysemu-only helpers are not reachable for user-only.
@@ -13487,7 +13488,7 @@ static void decode_opc_special_legacy(CPUMIPSState 
*env, DisasContext *ctx)
 static void decode_opc_special(CPUMIPSState *env, DisasContext *ctx)
 {
 int rs, rt, rd, sa;
-uint32_t op1;
+uint32_t op1, sig;
 
 rs = (ctx->opcode >> 21) & 0x1f;
 rt = (ctx->opcode >> 16) & 0x1f;
@@ -13583,6 +13584,24 @@ static void decode_opc_special(CPUMIPSState *env, 
DisasContext *ctx)
 #endif
 break;
 case OPC_SYSCALL:
+sig = (ctx->opcode) >> 6;
+if ((sig == 0x) && native_bypass_enabled()) {
+TCGv arg1 = tcg_temp_new();
+TCGv arg2 = tcg_temp_new();
+TCGv arg3 = tcg_temp_new();
+TCGv ret = tcg_temp_new();
+const char *fun_name = lookup_symbol((ctx->base.pc_next) & 0xfff);
+tcg_gen_mov_tl(arg1, cpu_gpr[4]);
+tcg_gen_mov_tl(arg2, cpu_gpr[5]);
+tcg_gen_mov_tl(arg3, cpu_gpr[6]);
+#if defined(TARGET_MIPS64)
+gen_native_call_i64(fun_name, ret, arg1, arg2, arg3);
+#else
+gen_native_call_i32(fun_name, ret, arg1, arg2, arg3);
+#endif
+tcg_gen_mov_tl(cpu_gpr[2], ret);
+break;
+}
 generate_exception_end(ctx, EXCP_SYSCALL);
 break;
 case OPC_BREAK:
-- 
2.34.1

[RFC v4 06/11] tcg: Add tcg opcodes and helpers for native library calls

2023-08-08 Thread Yeqi Fu

This commit implements tcg opcodes and helpers for extracting and
invoke native functions.

Signed-off-by: Yeqi Fu 
---
 accel/tcg/tcg-runtime.h |  22 ++
 include/tcg/tcg-op-common.h |  11 +++
 include/tcg/tcg.h   |   9 +++
 tcg/tcg-op.c| 140 
 4 files changed, 182 insertions(+)

diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
index 39e68007f9..bda78b4489 100644
--- a/accel/tcg/tcg-runtime.h
+++ b/accel/tcg/tcg-runtime.h
@@ -37,6 +37,28 @@ DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, 
env)
  */
 #define helper_memset memset
 DEF_HELPER_FLAGS_3(memset, TCG_CALL_NO_RWG, ptr, ptr, int, ptr)
+
+#define helper_memcpy memcpy
+DEF_HELPER_FLAGS_3(memcpy, TCG_CALL_NO_RWG, ptr, ptr, ptr, ptr)
+
+#define helper_strncpy strncpy
+DEF_HELPER_FLAGS_3(strncpy, TCG_CALL_NO_RWG, ptr, ptr, ptr, ptr)
+
+#define helper_memcmp memcmp
+DEF_HELPER_FLAGS_3(memcmp, TCG_CALL_NO_RWG, int, ptr, ptr, ptr)
+
+#define helper_strncmp strncmp
+DEF_HELPER_FLAGS_3(strncmp, TCG_CALL_NO_RWG, int, ptr, ptr, ptr)
+
+#define helper_strcpy strcpy
+DEF_HELPER_FLAGS_2(strcpy, TCG_CALL_NO_RWG, ptr, ptr, ptr)
+
+#define helper_strcat strcat
+DEF_HELPER_FLAGS_2(strcat, TCG_CALL_NO_RWG, ptr, ptr, ptr)
+
+#define helper_strcmp strcmp
+DEF_HELPER_FLAGS_2(strcmp, TCG_CALL_NO_RWG, int, ptr, ptr)
+
 #endif /* IN_HELPER_PROTO */
 
 DEF_HELPER_FLAGS_3(ld_i128, TCG_CALL_NO_WG, i128, env, i64, i32)
diff --git a/include/tcg/tcg-op-common.h b/include/tcg/tcg-op-common.h
index be382bbf77..2e712f1573 100644
--- a/include/tcg/tcg-op-common.h
+++ b/include/tcg/tcg-op-common.h
@@ -903,6 +903,12 @@ void tcg_gen_ld_vec(TCGv_vec r, TCGv_ptr base, TCGArg 
offset);
 void tcg_gen_st_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset);
 void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType t);
 
+/* Host <-> guest conversions */
+void tcg_gen_g2h_i32(TCGv_ptr ret, TCGv_i32 arg);
+void tcg_gen_g2h_i64(TCGv_ptr ret, TCGv_i64 arg);
+void tcg_gen_h2g_i32(TCGv_i32 ret, TCGv_ptr arg);
+void tcg_gen_h2g_i64(TCGv_i64 ret, TCGv_ptr arg);
+
 /* Host pointer ops */
 
 #if UINTPTR_MAX == UINT32_MAX
@@ -938,6 +944,11 @@ static inline void tcg_gen_addi_ptr(TCGv_ptr r, TCGv_ptr 
a, intptr_t b)
 glue(tcg_gen_addi_,PTR)((NAT)r, (NAT)a, b);
 }
 
+static inline void tcg_gen_subi_ptr(TCGv_ptr r, TCGv_ptr a, intptr_t b)
+{
+glue(tcg_gen_subi_, PTR)((NAT)r, (NAT)a, b);
+}
+
 static inline void tcg_gen_mov_ptr(TCGv_ptr d, TCGv_ptr s)
 {
 glue(tcg_gen_mov_,PTR)((NAT)d, (NAT)s);
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 0875971719..7c7fafb1cd 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -35,6 +35,9 @@
 #include "tcg-target.h"
 #include "tcg/tcg-cond.h"
 #include "tcg/debug-assert.h"
+#ifdef CONFIG_USER_ONLY
+#include "exec/user/guest-base.h"
+#endif
 
 /* XXX: make safe guess about sizes */
 #define MAX_OP_PER_INSTR 266
@@ -1148,4 +1151,10 @@ static inline const TCGOpcode *tcg_swap_vecop_list(const 
TCGOpcode *n)
 
 bool tcg_can_emit_vecop_list(const TCGOpcode *, TCGType, unsigned);
 
+/* native call */
+void gen_native_call_i32(const char *fun_name, TCGv_i32 ret,
+ TCGv_i32 arg1, TCGv_i32 arg2, TCGv_i32 arg3);
+void gen_native_call_i64(const char *fun_name, TCGv_i64 ret,
+ TCGv_i64 arg1, TCGv_i64 arg2, TCGv_i64 arg3);
+
 #endif /* TCG_H */
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 7aadb37756..83e3a5682c 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -2852,3 +2852,143 @@ void tcg_gen_lookup_and_goto_ptr(void)
 tcg_gen_op1i(INDEX_op_goto_ptr, tcgv_ptr_arg(ptr));
 tcg_temp_free_ptr(ptr);
 }
+
+#ifdef CONFIG_USER_ONLY
+void tcg_gen_g2h_i32(TCGv_ptr ret, TCGv_i32 arg)
+{
+TCGv_ptr temp = tcg_temp_new_ptr();
+tcg_gen_ext_i32_ptr(temp, arg);
+tcg_gen_addi_ptr(ret, temp, guest_base);
+tcg_temp_free_ptr(temp);
+}
+
+void tcg_gen_g2h_i64(TCGv_ptr ret, TCGv_i64 arg)
+{
+TCGv_ptr temp = tcg_temp_new_ptr();
+tcg_gen_trunc_i64_ptr(temp, arg); /* Not sure */
+tcg_gen_addi_ptr(ret, temp, guest_base);
+tcg_temp_free_ptr(temp);
+}
+
+void tcg_gen_h2g_i32(TCGv_i32 ret, TCGv_ptr arg)
+{
+TCGv_ptr temp = tcg_temp_new_ptr();
+tcg_gen_subi_ptr(temp, arg, guest_base);
+tcg_gen_trunc_ptr_i32(ret, temp);
+tcg_temp_free_ptr(temp);
+}
+
+void tcg_gen_h2g_i64(TCGv_i64 ret, TCGv_ptr arg)
+{
+TCGv_ptr temp = tcg_temp_new_ptr();
+tcg_gen_subi_ptr(temp, arg, guest_base);
+tcg_gen_extu_ptr_i64(ret, temp);
+tcg_temp_free_ptr(temp);
+}
+
+#else
+void tcg_gen_g2h_i32(TCGv_ptr ret, TCGv_i32 arg)
+{
+}
+void tcg_gen_g2h_i64(TCGv_ptr ret, TCGv_i64 arg)
+{
+}
+void tcg_gen_h2g_i32(TCGv_i32 ret, TCGv_ptr arg)
+{
+}
+void tcg_gen_h2g_i64(TCGv_i64 ret, TCGv_ptr arg)
+{
+}
+#endif
+
+/* p: iptr ; i: i32 ; a: ptr(address) */
+void gen_native_call_i32(const char *fun_name, TCGv_i32 ret,
+ TCGv_i32 arg1, TCGv_i32 arg2, TCGv_i32 arg3)
+{
+TCGv_ptr arg1

[RFC v4 03/11] linux-user: Implement envlist_appendenv and add tests for envlist

2023-08-08 Thread Yeqi Fu

Signed-off-by: Yeqi Fu 
---
 include/qemu/envlist.h| 13 ++
 tests/unit/meson.build|  1 +
 tests/unit/test-envlist.c | 94 +++
 util/envlist.c| 71 -
 4 files changed, 169 insertions(+), 10 deletions(-)
 create mode 100644 tests/unit/test-envlist.c

diff --git a/include/qemu/envlist.h b/include/qemu/envlist.h
index 6006dfae44..9eb1459e79 100644
--- a/include/qemu/envlist.h
+++ b/include/qemu/envlist.h
@@ -1,12 +1,25 @@
 #ifndef ENVLIST_H
 #define ENVLIST_H
 
+#include "qemu/queue.h"
+
+struct envlist_entry {
+const char *ev_var;/* actual env value */
+QLIST_ENTRY(envlist_entry) ev_link;
+};
+
+struct envlist {
+QLIST_HEAD(, envlist_entry) el_entries; /* actual entries */
+size_t el_count;/* number of entries */
+};
+
 typedef struct envlist envlist_t;
 
 envlist_t *envlist_create(void);
 void envlist_free(envlist_t *);
 int envlist_setenv(envlist_t *, const char *);
 int envlist_unsetenv(envlist_t *, const char *);
+int envlist_appendenv(envlist_t *, const char *, const char *);
 int envlist_parse_set(envlist_t *, const char *);
 int envlist_parse_unset(envlist_t *, const char *);
 char **envlist_to_environ(const envlist_t *, size_t *);
diff --git a/tests/unit/meson.build b/tests/unit/meson.build
index 93977cc32d..9b25b45271 100644
--- a/tests/unit/meson.build
+++ b/tests/unit/meson.build
@@ -50,6 +50,7 @@ tests = {
   'test-qapi-util': [],
   'test-interval-tree': [],
   'test-xs-node': [qom],
+  'test-envlist': [],
 }
 
 if have_system or have_tools
diff --git a/tests/unit/test-envlist.c b/tests/unit/test-envlist.c
new file mode 100644
index 00..d1e148f738
--- /dev/null
+++ b/tests/unit/test-envlist.c
@@ -0,0 +1,94 @@
+/*
+ * testenvlist unit-tests.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/envlist.h"
+
+static void envlist_create_free_test(void)
+{
+envlist_t *testenvlist;
+
+testenvlist = envlist_create();
+g_assert(testenvlist != NULL);
+g_assert(testenvlist->el_count == 0);
+
+envlist_free(testenvlist);
+}
+
+static void envlist_set_unset_test(void)
+{
+envlist_t *testenvlist;
+const char *env = "TEST=123";
+struct envlist_entry *entry;
+
+testenvlist = envlist_create();
+g_assert(envlist_setenv(testenvlist, env) == 0);
+g_assert(testenvlist->el_count == 1);
+entry = testenvlist->el_entries.lh_first;
+g_assert_cmpstr(entry->ev_var, ==, "TEST=123");
+g_assert(envlist_unsetenv(testenvlist, "TEST") == 0);
+g_assert(testenvlist->el_count == 0);
+
+envlist_free(testenvlist);
+}
+
+static void envlist_parse_set_unset_test(void)
+{
+envlist_t *testenvlist;
+const char *env = "TEST1=123,TEST2=456";
+
+testenvlist = envlist_create();
+g_assert(envlist_parse_set(testenvlist, env) == 0);
+g_assert(testenvlist->el_count == 2);
+g_assert(envlist_parse_unset(testenvlist, "TEST1,TEST2") == 0);
+g_assert(testenvlist->el_count == 0);
+
+envlist_free(testenvlist);
+}
+
+static void envlist_appendenv_test(void)
+{
+envlist_t *testenvlist;
+const char *env = "TEST=123";
+struct envlist_entry *entry;
+
+testenvlist = envlist_create();
+g_assert(envlist_setenv(testenvlist, env) == 0);
+g_assert(envlist_appendenv(testenvlist, "TEST=456", ";") == 0);
+entry = testenvlist->el_entries.lh_first;
+g_assert_cmpstr(entry->ev_var, ==, "TEST=123;456");
+
+envlist_free(testenvlist);
+}
+
+static void envlist_to_environ_test(void)
+{
+envlist_t *testenvlist;
+const char *env = "TEST1=123,TEST2=456";
+size_t count;
+char **environ;
+
+testenvlist = envlist_create();
+g_assert(envlist_parse_set(testenvlist, env) == 0);
+g_assert(testenvlist->el_count == 2);
+environ = envlist_to_environ(testenvlist, &count);
+g_assert(count == 2);
+g_assert_cmpstr(environ[1], ==, "TEST1=123");
+g_assert_cmpstr(environ[0], ==, "TEST2=456");
+
+envlist_free(testenvlist);
+}
+
+int main(int argc, char **argv)
+{
+g_test_init(&argc, &argv, NULL);
+
+g_test_add_func("/envlist/create_free", envlist_create_free_test);
+g_test_add_func("/envlist/set_unset", envlist_set_unset_test);
+g_test_add_func("/envlist/parse_set_unset", envlist_parse_set_unset_test);
+g_test_add_func("/envlist/appendenv", envlist_appendenv_test);
+g_test_add_func("/envlist/to_environ", envlist_to_environ_test);
+
+return g_test_run();
+}
diff --git a/util/envlist.c b/util/envlist.c
index db937c0427..2570f0e30c 100644
--- a/util/envlist.c
+++ b/util/envlist.c
@@ -2,16 +2,6 @@
 #include "qemu/queue.h"
 #include "qemu/envlist.h"
 
-struct envlist_entry {
-const char *ev_var;/* actual env value */
-QLIST_ENTRY(envlist_entry) ev_link;
-};
-
-struct envlist {
-QLIST_HEAD(, envlist_entry) el_entries; /* actual entries */
-size_t el_count;/* number of entries */
-};
-
 static int envlist_pars

[RFC v4 07/11] target/i386: Add support for native library calls

2023-08-08 Thread Yeqi Fu

This commit introduces support for native library calls on the
i386 target. When special instructions reserved for native calls
are encountered, the code now performs address translation and
generates the corresponding native call.

Signed-off-by: Yeqi Fu 
---
 configs/targets/i386-linux-user.mak   |  1 +
 configs/targets/x86_64-linux-user.mak |  1 +
 target/i386/tcg/translate.c   | 27 +++
 3 files changed, 29 insertions(+)

diff --git a/configs/targets/i386-linux-user.mak 
b/configs/targets/i386-linux-user.mak
index 5b2546a430..2d8bca8f93 100644
--- a/configs/targets/i386-linux-user.mak
+++ b/configs/targets/i386-linux-user.mak
@@ -2,3 +2,4 @@ TARGET_ARCH=i386
 TARGET_SYSTBL_ABI=i386
 TARGET_SYSTBL=syscall_32.tbl
 TARGET_XML_FILES= gdb-xml/i386-32bit.xml
+CONFIG_NATIVE_CALL=y
diff --git a/configs/targets/x86_64-linux-user.mak 
b/configs/targets/x86_64-linux-user.mak
index 9ceefbb615..a53b017454 100644
--- a/configs/targets/x86_64-linux-user.mak
+++ b/configs/targets/x86_64-linux-user.mak
@@ -3,3 +3,4 @@ TARGET_BASE_ARCH=i386
 TARGET_SYSTBL_ABI=common,64
 TARGET_SYSTBL=syscall_64.tbl
 TARGET_XML_FILES= gdb-xml/i386-64bit.xml
+CONFIG_NATIVE_CALL=y
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 90c7b32f36..28bf4477fb 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -33,6 +33,7 @@
 #include "helper-tcg.h"
 
 #include "exec/log.h"
+#include "native/native.h"
 
 #define HELPER_H "helper.h"
 #include "exec/helper-info.c.inc"
@@ -6810,6 +6811,32 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 case 0x1d0 ... 0x1fe:
 disas_insn_new(s, cpu, b);
 break;
+case 0x1ff:
+if (native_bypass_enabled()) {
+TCGv ret = tcg_temp_new();
+TCGv arg1 = tcg_temp_new();
+TCGv arg2 = tcg_temp_new();
+TCGv arg3 = tcg_temp_new();
+const char *fun_name = lookup_symbol((s->base.pc_next) & 0xfff);
+#ifdef TARGET_X86_64
+tcg_gen_mov_tl(arg1, cpu_regs[R_EDI]);
+tcg_gen_mov_tl(arg2, cpu_regs[R_ESI]);
+tcg_gen_mov_tl(arg3, cpu_regs[R_EDX]);
+gen_native_call_i64(fun_name, ret, arg1, arg2, arg3);
+#else
+uintptr_t ra = GETPC();
+uint32_t a1 = cpu_ldl_data_ra(env, env->regs[R_ESP] + 4, ra);
+uint32_t a2 = cpu_ldl_data_ra(env, env->regs[R_ESP] + 8, ra);
+uint32_t a3 = cpu_ldl_data_ra(env, env->regs[R_ESP] + 12, ra);
+tcg_gen_movi_tl(arg1, a1);
+tcg_gen_movi_tl(arg2, a2);
+tcg_gen_movi_tl(arg3, a3);
+gen_native_call_i32(fun_name, ret, arg1, arg2, arg3);
+#endif
+tcg_gen_mov_tl(cpu_regs[R_EAX], ret);
+break;
+}
+break;
 default:
 goto unknown_op;
 }
-- 
2.34.1

[RFC v4 11/11] docs/user: Add doc for native library calls

2023-08-08 Thread Yeqi Fu

Signed-off-by: Yeqi Fu 
---
 docs/user/index.rst|  1 +
 docs/user/native_calls.rst | 90 ++
 2 files changed, 91 insertions(+)
 create mode 100644 docs/user/native_calls.rst

diff --git a/docs/user/index.rst b/docs/user/index.rst
index 782d27cda2..d3fc9b7af1 100644
--- a/docs/user/index.rst
+++ b/docs/user/index.rst
@@ -12,3 +12,4 @@ processes compiled for one CPU on another CPU.
:maxdepth: 2
 
main
+   native_calls
diff --git a/docs/user/native_calls.rst b/docs/user/native_calls.rst
new file mode 100644
index 00..c4b854096a
--- /dev/null
+++ b/docs/user/native_calls.rst
@@ -0,0 +1,90 @@
+Native Library Calls Optimization
+=
+
+Description
+---
+
+Executing a program under QEMU's user mode subjects the entire
+program, including all library calls, to translation. It's important
+to understand that many of these library functions are optimized
+specifically for the guest architecture. Therefore, their
+translation might not yield the most efficient execution.
+
+When the semantics of a library function are well defined, we can
+capitalize on this by substituting the translated version with a call
+to the native equivalent function.
+
+To achieve tangible results, focus should be given to functions such
+as memory-related ('mem*') and string-related ('str*') functions.
+These subsets of functions often have the most significant effect
+on overall performance, making them optimal candidates for
+optimization.
+
+Implementation
+--
+
+Upon setting the LD_PRELOAD environment variable, the dynamic linker
+will load the library specified in LD_PRELOAD preferentially. If there
+exist functions in the LD_PRELOAD library that share names with those
+in other libraries, they will override the corresponding functions in
+those other libraries.
+
+To implement native library bypass, we created a shared library and
+re-implemented the native functions within it as a special
+instruction sequence. By means of the LD_PRELOAD environment
+variable, we load this shared library into the user program.
+Therefore, when the user program calls a native function, it actually
+executes this special instruction sequence. During execution, QEMU's
+translator captures these special instructions and executes the
+corresponding native functions.
+
+These special instructions are implemented using
+architecture-specific unused or invalid opcodes, ensuring that they
+do not conflict with existing instructions.
+
+
+i386 and x86_64
+---
+An unused instruction is utilized to mark a native call.
+
+arm and aarch64
+---
+HLT is an invalid instruction for userspace and usefully has 16
+bits of spare immeadiate data which we can stuff data in.
+
+mips and mips64
+---
+The syscall instruction contains 20 unused bits, which are typically
+set to 0. These bits can be used to store non-zero data,
+distinguishing them from a regular syscall instruction.
+
+Usage
+-
+
+1. Install cross-compilation tools
+
+Cross-compilation tools are required to build the shared libraries
+that can hook the necessary library functions. For example, a viable
+command on Ubuntu is:
+
+::
+
+apt install gcc-arm-linux-gnueabihf gcc-aarch64-linux-gnu \
+gcc-mips-linux-gnu gcc-mips64-linux-gnuabi64
+
+
+2. Locate the compiled libnative.so
+
+After compilation, the libnative.so file can be found in the
+``./build/common-user/native/-linux-user`` directory.
+
+3. Run the program with the ``--native-bypass`` option
+
+To run your program with native library bypass, use the
+``--native-bypass`` option to import libnative.so:
+
+::
+
+qemu- --native-bypass \
+./build/common-user/native/-linux-user/libnative.so ./program
+
-- 
2.34.1

[RFC v4 05/11] linux-user/elfload: Add support for parsing symbols of native libraries.

2023-08-08 Thread Yeqi Fu

This commit addresses the need to parse symbols of native libraries.
The base address of a shared library is determined by the dynamic
linker. To simplify the process, we focus on the last three digits,
which reside within the same page and remain unaffected by the base
address.

Signed-off-by: Yeqi Fu 
---
 linux-user/elfload.c | 85 +---
 1 file changed, 80 insertions(+), 5 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 9a2ec568b0..9448f91005 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -22,6 +22,9 @@
 #include "qemu/error-report.h"
 #include "target_signal.h"
 #include "accel/tcg/debuginfo.h"
+#if defined(CONFIG_NATIVE_CALL)
+#include "native/native.h"
+#endif
 
 #ifdef _ARCH_PPC64
 #undef ARCH_DLINFO
@@ -2038,8 +2041,10 @@ static inline void 
bswap_mips_abiflags(Mips_elf_abiflags_v0 *abiflags) { }
 #ifdef USE_ELF_CORE_DUMP
 static int elf_core_dump(int, const CPUArchState *);
 #endif /* USE_ELF_CORE_DUMP */
-static void load_symbols(struct elfhdr *hdr, int fd, abi_ulong load_bias);
-
+static void load_symbols(struct elfhdr *hdr, int fd, abi_ulong load_bias,
+ uint load_type);
+#define ELF_SYM 1
+#define NATIVE_LIB_SYM 2
 /* Verify the portions of EHDR within E_IDENT for the target.
This can be performed before bswapping the entire header.  */
 static bool elf_check_ident(struct elfhdr *ehdr)
@@ -3302,7 +3307,7 @@ static void load_elf_image(const char *image_name, int 
image_fd,
 }
 
 if (qemu_log_enabled()) {
-load_symbols(ehdr, image_fd, load_bias);
+load_symbols(ehdr, image_fd, load_bias, ELF_SYM);
 }
 
 debuginfo_report_elf(image_name, image_fd, load_bias);
@@ -3398,7 +3403,8 @@ static int symcmp(const void *s0, const void *s1)
 }
 
 /* Best attempt to load symbols from this ELF object. */
-static void load_symbols(struct elfhdr *hdr, int fd, abi_ulong load_bias)
+static void load_symbols(struct elfhdr *hdr, int fd, abi_ulong load_bias,
+ uint load_type)
 {
 int i, shnum, nsyms, sym_idx = 0, str_idx = 0;
 uint64_t segsz;
@@ -3456,7 +3462,21 @@ static void load_symbols(struct elfhdr *hdr, int fd, 
abi_ulong load_bias)
 for (i = 0; i < nsyms; ) {
 bswap_sym(syms + i);
 /* Throw away entries which we do not need.  */
-if (syms[i].st_shndx == SHN_UNDEF
+if (load_type == NATIVE_LIB_SYM)  {
+/*
+ * Load the native library.
+ * Since the base address of a shared library is determined
+ * by the dynamic linker, we only consider the last three
+ * digits here, which are within the same page and are not
+ * affected by the base address.
+ */
+#if defined(TARGET_ARM) || defined(TARGET_MIPS)
+/* The bottom address bit marks a Thumb or MIPS16 symbol.  */
+syms[i].st_value &= ~(target_ulong)1;
+#endif
+syms[i].st_value &= 0xfff;
+i++;
+} else if (syms[i].st_shndx == SHN_UNDEF
 || syms[i].st_shndx >= SHN_LORESERVE
 || ELF_ST_TYPE(syms[i].st_info) != STT_FUNC) {
 if (i < --nsyms) {
@@ -3542,14 +3562,63 @@ uint32_t get_elf_eflags(int fd)
 return ehdr.e_flags;
 }
 
+#if defined(CONFIG_NATIVE_CALL)
+static void load_native_library(const char *filename, struct image_info *info,
+ char bprm_buf[BPRM_BUF_SIZE])
+{
+int fd, retval;
+Error *err = NULL;
+
+fd = open(path(filename), O_RDONLY);
+if (fd < 0) {
+error_setg_file_open(&err, errno, filename);
+error_report_err(err);
+exit(-1);
+}
+
+retval = read(fd, bprm_buf, BPRM_BUF_SIZE);
+if (retval < 0) {
+error_setg_errno(&err, errno, "Error reading file header");
+error_reportf_err(err, "%s: ", filename);
+exit(-1);
+}
+
+if (retval < BPRM_BUF_SIZE) {
+memset(bprm_buf + retval, 0, BPRM_BUF_SIZE - retval);
+}
+
+struct elfhdr *ehdr = (struct elfhdr *)bprm_buf;
+
+if (!elf_check_ident(ehdr)) {
+error_setg(&err, "Invalid ELF image for this architecture");
+goto exit_errmsg;
+}
+bswap_ehdr(ehdr);
+if (!elf_check_ehdr(ehdr)) {
+error_setg(&err, "Invalid ELF image for this architecture");
+goto exit_errmsg;
+}
+
+/* We are only concerned with the symbols of native library. */
+load_symbols(ehdr, fd, 0, NATIVE_LIB_SYM);
+return;
+
+exit_errmsg:
+error_reportf_err(err, "%s: ", filename);
+exit(-1);
+}
+#endif
+
 int load_elf_binary(struct linux_binprm *bprm, struct image_info *info)
 {
 struct image_info interp_info;
+struct image_info nativelib_info;
 struct elfhdr elf_ex;
 char *elf_interpreter = NULL;
 char *scratch;
 
 memset(&interp_info, 0, sizeof(interp_info));
+memset(&nativelib_info, 0, sizeof(nativelib_info));
 #ifdef TARGET_MIPS
 interp_info.fp_abi = MIPS_AB

[RFC v4 04/11] linux-user: Implement native-bypass option support

2023-08-08 Thread Yeqi Fu

This commit implements the -native-bypass support in linux-user. The
native_calls_enabled() function can be true only when the
'-native-bypass' option is given.

Signed-off-by: Yeqi Fu 
---
 include/native/native.h |  9 +
 linux-user/main.c   | 38 ++
 2 files changed, 47 insertions(+)
 create mode 100644 include/native/native.h

diff --git a/include/native/native.h b/include/native/native.h
new file mode 100644
index 00..62951fafb1
--- /dev/null
+++ b/include/native/native.h
@@ -0,0 +1,9 @@
+/*
+ * Check if the native bypass feature is enabled.
+ */
+#if defined(CONFIG_USER_ONLY) && defined(CONFIG_NATIVE_CALL)
+extern char *native_lib_path;
+#define native_bypass_enabled() native_lib_path ? true : false
+#else
+#define native_bypass_enabled() false
+#endif
diff --git a/linux-user/main.c b/linux-user/main.c
index dba67ffa36..86ea0191f7 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -60,6 +60,11 @@
 #include "semihosting/semihost.h"
 #endif
 
+#if defined(CONFIG_NATIVE_CALL)
+#include "native/native.h"
+char *native_lib_path;
+#endif
+
 #ifndef AT_FLAGS_PRESERVE_ARGV0
 #define AT_FLAGS_PRESERVE_ARGV0_BIT 0
 #define AT_FLAGS_PRESERVE_ARGV0 (1 << AT_FLAGS_PRESERVE_ARGV0_BIT)
@@ -293,6 +298,17 @@ static void handle_arg_set_env(const char *arg)
 free(r);
 }
 
+#if defined(CONFIG_NATIVE_CALL)
+static void handle_arg_native_bypass(const char *arg)
+{
+if (access(arg, F_OK) != 0) {
+fprintf(stderr, "native library %s does not exist\n", arg);
+exit(EXIT_FAILURE);
+}
+native_lib_path = strdup(arg);
+}
+#endif
+
 static void handle_arg_unset_env(const char *arg)
 {
 char *r, *p, *token;
@@ -522,6 +538,10 @@ static const struct qemu_argument arg_table[] = {
  "",   "Generate a /tmp/perf-${pid}.map file for perf"},
 {"jitdump","QEMU_JITDUMP", false, handle_arg_jitdump,
  "",   "Generate a jit-${pid}.dump file for perf"},
+#if defined(CONFIG_NATIVE_CALL)
+{"native-bypass", "QEMU_NATIVE_BYPASS", true, handle_arg_native_bypass,
+ "",   "native bypass for library calls in user mode only."},
+#endif
 {NULL, NULL, false, NULL, NULL, NULL}
 };
 
@@ -834,6 +854,24 @@ int main(int argc, char **argv, char **envp)
 }
 }
 
+#if defined(CONFIG_NATIVE_CALL)
+/* Set the library for native bypass  */
+if (native_lib_path) {
+if (g_file_test(native_lib_path, G_FILE_TEST_EXISTS)) {
+GString *lib = g_string_new(native_lib_path);
+lib = g_string_prepend(lib, "LD_PRELOAD=");
+if (envlist_appendenv(envlist, g_string_free(lib, false), ":")) {
+fprintf(stderr,
+"failed to append the native library to environment.\n");
+exit(EXIT_FAILURE);
+}
+} else {
+fprintf(stderr, "native library %s does not exist.\n",
+native_lib_path);
+exit(EXIT_FAILURE);
+}
+}
+#endif
 target_environ = envlist_to_environ(envlist, NULL);
 envlist_free(envlist);
 
-- 
2.34.1

[RFC v4 10/11] tests/tcg/multiarch: Add nativecall.c test

2023-08-08 Thread Yeqi Fu

Introduce a new test for native calls to ensure their functionality.
The process involves cross-compiling the test cases, building them
as dynamically linked binaries, and running these binaries which
necessitates the addition of the appropriate interpreter prefix.

Signed-off-by: Yeqi Fu 
---
 tests/tcg/multiarch/Makefile.target | 17 +
 tests/tcg/multiarch/native/nativecall.c | 98 +
 2 files changed, 115 insertions(+)
 create mode 100644 tests/tcg/multiarch/native/nativecall.c

diff --git a/tests/tcg/multiarch/Makefile.target 
b/tests/tcg/multiarch/Makefile.target
index 43bddeaf21..5231df34ba 100644
--- a/tests/tcg/multiarch/Makefile.target
+++ b/tests/tcg/multiarch/Makefile.target
@@ -138,5 +138,22 @@ run-plugin-semiconsole-with-%:
 TESTS += semihosting semiconsole
 endif
 
+nativecall: native/nativecall.c
+   $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $< -o $@ $(filter-out 
-static,$(LDFLAGS))
+
+ifneq ($(LD_PREFIX),)
+ifneq ($(wildcard $(LIBNATIVE)),)
+run-nativecall: nativecall
+   $(call run-test,$<, $(QEMU) -L $(LD_PREFIX) --native-bypass 
$(LIBNATIVE) $<, "nativecall")
+else
+run-nativecall: nativecall
+   $(call skip-test, $<, "no native library found")
+endif
+else
+run-nativecall: nativecall
+   $(call skip-test, $<, "no elf interpreter prefix found")
+endif
+EXTRA_RUNS += run-nativecall
+
 # Update TESTS
 TESTS += $(MULTIARCH_TESTS)
diff --git a/tests/tcg/multiarch/native/nativecall.c 
b/tests/tcg/multiarch/native/nativecall.c
new file mode 100644
index 00..d3f6f49ed0
--- /dev/null
+++ b/tests/tcg/multiarch/native/nativecall.c
@@ -0,0 +1,98 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+
+void compare_memory(const void *a, const void *b, size_t n)
+{
+const unsigned char *p1 = a;
+const unsigned char *p2 = b;
+for (size_t i = 0; i < n; i++) {
+assert(p1[i] == p2[i]);
+}
+}
+
+void test_memcpy()
+{
+char src[] = "Hello, world!";
+char dest[20];
+memcpy(dest, src, 13);
+compare_memory(dest, src, 13);
+}
+
+void test_strncpy()
+{
+char src[] = "Hello, world!";
+char dest[20];
+strncpy(dest, src, 13);
+compare_memory(dest, src, 13);
+}
+
+void test_strcpy()
+{
+char src[] = "Hello, world!";
+char dest[20];
+strcpy(dest, src);
+compare_memory(dest, src, 13);
+}
+
+void test_strcat()
+{
+char src[20] = "Hello, ";
+char dst[] = "world!";
+char str[] = "Hello, world!";
+strcat(src, dest);
+compare_memory(src, str, 13);
+}
+
+void test_memcmp()
+{
+char str1[] = "abc";
+char str2[] = "abc";
+char str3[] = "def";
+assert(memcmp(str1, str2, 3) == 0);
+assert(memcmp(str1, str3, 3) != 0);
+}
+
+void test_strncmp()
+{
+char str1[] = "abc";
+char str2[] = "abc";
+char str3[] = "def";
+assert(strncmp(str1, str2, 2) == 0);
+assert(strncmp(str1, str3, 2) != 0);
+}
+
+void test_strcmp()
+{
+char str1[] = "abc";
+char str2[] = "abc";
+char str3[] = "def";
+assert(strcmp(str1, str2) == 0);
+assert(strcmp(str1, str3) != 0);
+}
+
+void test_memset()
+{
+char buffer[10];
+memset(buffer, 'A', 10);
+int i;
+for (i = 0; i < 10; i++) {
+assert(buffer[i] == 'A');
+}
+}
+
+int main()
+{
+test_memset();
+test_memcpy();
+test_strncpy();
+test_memcmp();
+test_strncmp();
+test_strcpy();
+test_strcmp();
+test_strcat();
+
+return EXIT_SUCCESS;
+}
-- 
2.34.1

[RFC v4 09/11] target/arm: Add support for native library calls

2023-08-08 Thread Yeqi Fu

This commit introduces support for native library calls on the
arm target. When special instructions reserved for native calls
are encountered, the code now performs address translation and
generates the corresponding native call.

Signed-off-by: Yeqi Fu 
---
 configs/targets/aarch64-linux-user.mak |  1 +
 configs/targets/arm-linux-user.mak |  1 +
 target/arm/tcg/translate-a64.c | 14 ++
 target/arm/tcg/translate.c | 11 +++
 4 files changed, 27 insertions(+)

diff --git a/configs/targets/aarch64-linux-user.mak 
b/configs/targets/aarch64-linux-user.mak
index ba8bc5fe3f..5a8fd98cd9 100644
--- a/configs/targets/aarch64-linux-user.mak
+++ b/configs/targets/aarch64-linux-user.mak
@@ -4,3 +4,4 @@ TARGET_XML_FILES= gdb-xml/aarch64-core.xml 
gdb-xml/aarch64-fpu.xml gdb-xml/aarch
 TARGET_HAS_BFLT=y
 CONFIG_SEMIHOSTING=y
 CONFIG_ARM_COMPATIBLE_SEMIHOSTING=y
+CONFIG_NATIVE_CALL=y
diff --git a/configs/targets/arm-linux-user.mak 
b/configs/targets/arm-linux-user.mak
index 7f5d65794c..f934fb82da 100644
--- a/configs/targets/arm-linux-user.mak
+++ b/configs/targets/arm-linux-user.mak
@@ -5,3 +5,4 @@ TARGET_XML_FILES= gdb-xml/arm-core.xml gdb-xml/arm-vfp.xml 
gdb-xml/arm-vfp3.xml
 TARGET_HAS_BFLT=y
 CONFIG_SEMIHOSTING=y
 CONFIG_ARM_COMPATIBLE_SEMIHOSTING=y
+CONFIG_NATIVE_CALL=y
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 3baab6aa60..422d943f92 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -25,6 +25,7 @@
 #include "arm_ldst.h"
 #include "semihosting/semihost.h"
 #include "cpregs.h"
+#include "native/native.h"
 
 static TCGv_i64 cpu_X[32];
 static TCGv_i64 cpu_pc;
@@ -2400,6 +2401,19 @@ static bool trans_HLT(DisasContext *s, arg_i *a)
  * it is required for halting debug disabled: it will UNDEF.
  * Secondly, "HLT 0xf000" is the A64 semihosting syscall instruction.
  */
+if (native_bypass_enabled() && (a->imm == 0x)) {
+TCGv_i64 arg1 = tcg_temp_new_i64();
+TCGv_i64 arg2 = tcg_temp_new_i64();
+TCGv_i64 arg3 = tcg_temp_new_i64();
+TCGv_i64 ret = tcg_temp_new_i64();
+const char *fun_name = lookup_symbol((s->base.pc_next) & 0xfff);
+tcg_gen_mov_i64(arg1, cpu_reg(s, 0));
+tcg_gen_mov_i64(arg2, cpu_reg(s, 1));
+tcg_gen_mov_i64(arg3, cpu_reg(s, 2));
+gen_native_call_i64(fun_name, ret, arg1, arg2, arg3);
+tcg_gen_mov_i64(cpu_reg(s, 0), ret);
+return true;
+}
 if (semihosting_enabled(s->current_el == 0) && a->imm == 0xf000) {
 gen_exception_internal_insn(s, EXCP_SEMIHOST);
 } else {
diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
index 13c88ba1b9..a095ebcea6 100644
--- a/target/arm/tcg/translate.c
+++ b/target/arm/tcg/translate.c
@@ -27,6 +27,7 @@
 #include "arm_ldst.h"
 #include "semihosting/semihost.h"
 #include "cpregs.h"
+#include "native/native.h"
 #include "exec/helper-proto.h"
 
 #define HELPER_H "helper.h"
@@ -1139,6 +1140,16 @@ static inline void gen_hlt(DisasContext *s, int imm)
  * semihosting, to provide some semblance of security
  * (and for consistency with our 32-bit semihosting).
  */
+if (native_bypass_enabled() && (imm == 0x)) {
+TCGv_i32 arg1 = load_reg(s, 0);
+TCGv_i32 arg2 = load_reg(s, 1);
+TCGv_i32 arg3 = load_reg(s, 2);
+TCGv_i32 ret = tcg_temp_new_i32();
+const char *fun_name = lookup_symbol((s->base.pc_next) & 0xfff);
+gen_native_call_i32(fun_name, ret, arg1, arg2, arg3);
+store_reg(s, 0, ret);
+return;
+}
 if (semihosting_enabled(s->current_el == 0) &&
 (imm == (s->thumb ? 0x3c : 0xf000))) {
 gen_exception_internal_insn(s, EXCP_SEMIHOST);
-- 
2.34.1

[RFC v4 01/11] build: Implement logic for sharing cross-building config files

2023-08-08 Thread Yeqi Fu

Signed-off-by: Yeqi Fu 
---
 configure | 57 +--
 1 file changed, 34 insertions(+), 23 deletions(-)

diff --git a/configure b/configure
index 2b41c49c0d..a076583141 100755
--- a/configure
+++ b/configure
@@ -1751,56 +1751,67 @@ if test "$ccache_cpp2" = "yes"; then
   echo "export CCACHE_CPP2=y" >> $config_host_mak
 fi
 
-# tests/tcg configuration
-(config_host_mak=tests/tcg/config-host.mak
-mkdir -p tests/tcg
-echo "# Automatically generated by configure - do not modify" > 
$config_host_mak
-echo "SRC_PATH=$source_path" >> $config_host_mak
-echo "HOST_CC=$host_cc" >> $config_host_mak
+# Prepare the config files for cross building.
+# This process generates 'cross-build//config-target.mak' files.
+# These files are then symlinked to the directories that need them which
+# including the TCG tests (tests/tcg/) and the libnative library
+# for linux-user (common/native//).
+mkdir -p cross-build
 
-# versioned checked in the main config_host.mak above
-if test -n "$gdb_bin"; then
-echo "HAVE_GDB_BIN=$gdb_bin" >> $config_host_mak
-fi
-if test "$plugins" = "yes" ; then
-echo "CONFIG_PLUGIN=y" >> $config_host_mak
-fi
-
-tcg_tests_targets=
 for target in $target_list; do
   arch=${target%%-*}
-
   case $target in
 xtensa*-linux-user)
-  # the toolchain is not complete with headers, only build softmmu tests
+  # the toolchain for tests/tcg is not complete with headers
   continue
   ;;
 *-softmmu)
-  test -f "$source_path/tests/tcg/$arch/Makefile.softmmu-target" || 
continue
   qemu="qemu-system-$arch"
   ;;
 *-linux-user|*-bsd-user)
   qemu="qemu-$arch"
   ;;
   esac
-
   if probe_target_compiler $target || test -n "$container_image"; then
   test -n "$container_image" && build_static=y
-  mkdir -p "tests/tcg/$target"
-  config_target_mak=tests/tcg/$target/config-target.mak
-  ln -sf "$source_path/tests/tcg/Makefile.target" 
"tests/tcg/$target/Makefile"
+  mkdir -p "cross-build/$target"
+  config_target_mak=cross-build/$target/config-target.mak
   echo "# Automatically generated by configure - do not modify" > 
"$config_target_mak"
   echo "TARGET_NAME=$arch" >> "$config_target_mak"
   echo "TARGET=$target" >> "$config_target_mak"
-  write_target_makefile "build-tcg-tests-$target" >> "$config_target_mak"
+  write_target_makefile "$target" >> "$config_target_mak"
   echo "BUILD_STATIC=$build_static" >> "$config_target_mak"
   echo "QEMU=$PWD/$qemu" >> "$config_target_mak"
 
+  # get the interpreter prefix and the path of libnative required for 
native call tests
+  if [ -d "/usr/$(echo "$target_cc" | sed 's/-gcc//')" ]; then
+  echo "LD_PREFIX=/usr/$(echo "$target_cc" | sed 's/-gcc//')" >> 
"$config_target_mak"
+  fi
+
   # will GDB work with these binaries?
   if test "${gdb_arches#*$arch}" != "$gdb_arches"; then
   echo "HOST_GDB_SUPPORTS_ARCH=y" >> "$config_target_mak"
   fi
+  fi
+done
+
+# tests/tcg configuration
+(mkdir -p tests/tcg
+# create a symlink to the config-host.mak file in the tests/tcg
+ln -srf $config_host_mak tests/tcg/config-host.mak
+
+tcg_tests_targets=
+for target in $target_list; do
+  case $target in
+*-softmmu)
+  test -f "$source_path/tests/tcg/$arch/Makefile.softmmu-target" || 
continue
+  ;;
+  esac
 
+  if test -f cross-build/$target/config-target.mak; then
+  mkdir -p "tests/tcg/$target"
+  ln -srf cross-build/$target/config-target.mak 
tests/tcg/$target/config-target.mak
+  ln -sf $source_path/tests/tcg/Makefile.target tests/tcg/$target/Makefile
   echo "run-tcg-tests-$target: $qemu\$(EXESUF)" >> Makefile.prereqs
   tcg_tests_targets="$tcg_tests_targets $target"
   fi
-- 
2.34.1

Re: [PATCH RFC 1/1] tcg: Always pass the full write size to notdirty_write()

2023-08-08 Thread Richard Henderson


On 8/8/23 02:59, Ilya Leoshkevich wrote:

On Mon, 2023-08-07 at 11:21 -0700, Richard Henderson wrote:

IIRC there are some uses of probe_access_* that set size == 0.
Should we adjust addr+size to cover the whole page for that case?
That seems to be the intent, anyway.


There is a comment that says we shouldn't do watchpoint/smc detection
in this case:

 /* Per the interface, size == 0 merely faults the access. */
 if (size == 0) {
 return NULL;
 }

Come to think of it, qemu-user works this way too: SMC is detected on
the actual access, not the probe:

 helper_vstl()
   cpu_stq_be_data_ra()
 ...
stq_he_p()
  
host_signal_handler()
  handle_sigsegv_accerr_write()
page_unprotect()
  tb_invalidate_phys_page_unwind()
cpu_loop_exit_noexc()

So all this is probably fine, I now think it's better to leave the code
as is, especially given that I cannot reproduce the original problem
anymore.


Ok then.


r~

Re: [PATCH] linux-user/riscv: Use abi_ulong for target_ucontext

2023-08-08 Thread Richard Henderson


On 8/8/23 02:34, LIU Zhiwei wrote:

We should not use types dependend on host arch for target_ucontext.
This bug is found when run rv32 applications.

Signed-off-by: LIU Zhiwei
---
  linux-user/riscv/signal.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)


Reviewed-by: Richard Henderson 

r~

Re: [PATCH for-8.1] linux-user: Define real MAP_FIXED_NOREPLACE value

2023-08-08 Thread Richard Henderson


On 8/8/23 04:52, Akihiko Odaki wrote:

do_brk() assumes target_mmap() emulates MAP_FIXED_NOREPLACE even when
the host does not support it. However, such emulation is not possible
if MAP_FIXED_NOREPLACE is defined as zero.

Define MAP_FIXED_NOREPLACE with the real value instead of zero if it is
not defined.

Fixes: e69e032d1a ("linux-user: Use MAP_FIXED_NOREPLACE for do_brk()")
Signed-off-by: Akihiko Odaki 
---
  include/qemu/osdep.h | 8 ++--
  linux-user/elfload.c | 1 -
  2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index cc61b00ba9..1aac17ec2f 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -289,8 +289,12 @@ void QEMU_ERROR("code path is reachable")
  #ifndef MAP_ANONYMOUS
  #define MAP_ANONYMOUS MAP_ANON
  #endif
-#ifndef MAP_FIXED_NOREPLACE
-#define MAP_FIXED_NOREPLACE 0
+#if defined(__linux__) && !defined(MAP_FIXED_NOREPLACE)
+#if HOST_ALPHA


HOST_ALPHA is not a thing.  Also, see

https://lore.kernel.org/qemu-devel/20230808120303.585509-3-pbonz...@redhat.com/

where host support for Alpha is completely removed.



+#define MAP_FIXED_NOREPLACE 0x20
+#else
+#define MAP_FIXED_NOREPLACE 0x10
+#endif


Which supported hosts do not define this value?  Can we simply remove the 
fallback?


r~

Re: [PATCH v3] hw/cxl: Fix CFMW config memory leak

2023-08-08 Thread Jonathan Cameron via

On Sat, 5 Aug 2023 08:46:29 +0300
Michael Tokarev  wrote:

> 31.05.2023 14:08, Jonathan Cameron via wrote:
> > On Wed, 31 May 2023 09:51:43 +0200
> > Philippe Mathieu-Daudé  wrote:
> >   
> >> On 31/5/23 08:07, Li Zhijian wrote:  
> >>> Allocate targets and targets[n] resources when all sanity checks are
> >>> passed to avoid memory leaks.
> >>>
> >>> Suggested-by: Philippe Mathieu-Daudé 
> >>> Signed-off-by: Li Zhijian 
> >>> ---
> >>> V3: allocte further resource when we can't fail # Philippe  
> >>
> >> Thanks for the v3!
> >>
> >> Reviewed-by: Philippe Mathieu-Daudé   
> > 
> > Thanks.  I've added this near the top of my queue so will send
> > it out along with other similar fixes as a series for Michael
> > to consider picking up.  
> 
> Hi!
> 
> Has this been forgotten? Is it still needed?
> 
> /mjt
> 

Sorry, I'm running a bit behind.  Have this one a few other fixes
still queued up - I didn't consider any of them particularly critical
for the release so wasn't rushing.

I'll aim to get them out as a series soon though so they are
available for start of next cycle if not for slipping in before
the release.

Jonathan

Re: [PATCH] Add support of callbacks after instructions to plugin api

2023-08-08 Thread Richard Henderson


On 8/8/23 06:44, Mikhail Tyutin wrote:

Initially, we can only call the callback BEFORE instructions. This commit adds 
the ability to insert the callback AFTER instructions.

No callback call for control-flow instructions.


You're going to miss whole categories of instructions, not just control-flow.  You're 
going to miss anything that raises an exception.  The list goes on and on.  This is why we 
didn't implement this "after" hook in the first place.



r~

[PATCH] vfio/pci: hide ROM BAR on SFC9220 10/40G Ethernet Controller PF

2023-08-08 Thread Laszlo Ersek

The Solarflare Communications SFC9220 NIC's physical function (PF) appears
to expose an expansion ROM with the following characteristics:

(1) Single-image ROM, with only a legacy BIOS image (no UEFI driver).
Alex's rom-parser utility dumps it like this:

> Valid ROM signature found @0h, PCIR offset 20h
> PCIR: type 0 (x86 PC-AT), vendor: 1924, device: 0a03, class: 02
> PCIR: revision 3, vendor revision: 1
> Last image

(2) The BIOS image crashes when booted on i440fx.

(3) The BIOS image prints the following messages on q35:

> Solarflare Boot Manager (v5.2.2.1006)
> Solarflare Communications 2008-2019
> gPXE (http://etherboot.org) - [...] PCI[...] PnP PMM[...]

So it appears like a modified derivative of old gPXE.

Alex surmised in advance that the BIOS image could be accessing
host-physical addresses rather than guest-phys ones, leading to the crash
on i440fx.

Don't expose the option ROM BAR to the VM by default. While this prevents
netbooting the VM off the PF on q35/SeaBIOS (a relatively rare scenario),
it does not make any difference for UEFI, and at least the VM doesn't
crash during boot on i440fx/SeaBIOS (a relatively frequent scenario).
Users can restore the original behavior via the QEMU cmdline and the
libvirt domain XML.

(In two years, we've not seen any customer interest in this bug, hence
there's no incentive to investigate (2).)

Cc: Alex Williamson  (supporter:VFIO)
Cc: "Cédric Le Goater"  (supporter:VFIO)
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1975776
Signed-off-by: Laszlo Ersek 
---
 hw/vfio/pci-quirks.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index f4ff83680572..270eb16b91fa 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -45,6 +45,10 @@ static const struct {
 uint32_t device;
 } rom_denylist[] = {
 { 0x14e4, 0x168e }, /* Broadcom BCM 57810 */
+{ 0x1924, 0x0a03 }, /* Solarflare Communications
+ * SFC9220 10/40G Ethernet Controller
+ * https://bugzilla.redhat.com/show_bug.cgi?id=1975776
+ */
 };
 
 bool vfio_opt_rom_in_denylist(VFIOPCIDevice *vdev)

[PATCH for-8.1 v2] linux-user: Define real MAP_FIXED_NOREPLACE value

2023-08-08 Thread Akihiko Odaki

do_brk() assumes target_mmap() emulates MAP_FIXED_NOREPLACE even when
the host does not support it. However, such emulation is not possible
if MAP_FIXED_NOREPLACE is defined as zero.

Define MAP_FIXED_NOREPLACE with the real value instead of zero if it is
not defined.

Fixes: e69e032d1a ("linux-user: Use MAP_FIXED_NOREPLACE for do_brk()")
Signed-off-by: Akihiko Odaki 
---
 include/qemu/osdep.h | 8 ++--
 linux-user/elfload.c | 1 -
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index cc61b00ba9..6c1050afcd 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -289,8 +289,12 @@ void QEMU_ERROR("code path is reachable")
 #ifndef MAP_ANONYMOUS
 #define MAP_ANONYMOUS MAP_ANON
 #endif
-#ifndef MAP_FIXED_NOREPLACE
-#define MAP_FIXED_NOREPLACE 0
+#if defined(__linux__) && !defined(MAP_FIXED_NOREPLACE)
+#if MAP_HUGETLB == 0x10
+#define MAP_FIXED_NOREPLACE 0x20
+#else
+#define MAP_FIXED_NOREPLACE 0x10
+#endif
 #endif
 #ifndef MAP_NORESERVE
 #define MAP_NORESERVE 0
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 36e4026f05..9d9c79a653 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -2807,7 +2807,6 @@ static void pgb_reserved_va(const char *image_name, 
abi_ulong guest_loaddr,
 /* Widen the "image" to the entire reserved address space. */
 pgb_static(image_name, 0, reserved_va, align);
 
-/* osdep.h defines this as 0 if it's missing */
 flags |= MAP_FIXED_NOREPLACE;
 
 /* Reserve the memory on the host. */
-- 
2.41.0

Re: [PATCH for-8.1 v10 04/14] linux-user: Use MAP_FIXED_NOREPLACE for initial image mmap

2023-08-08 Thread Alex Bennée



Akihiko Odaki  writes:

> On 2023/08/08 22:48, Alex Bennée wrote:
>> Akihiko Odaki  writes:
>> 
>>> On 2023/08/08 18:43, Alex Bennée wrote:
 Richard Henderson  writes:

> Use this as extra protection for the guest mapping over
> any qemu host mappings.
>
> Tested-by: Helge Deller 
> Reviewed-by: Helge Deller 
> Reviewed-by: Akihiko Odaki 
> Signed-off-by: Richard Henderson 
> ---
>linux-user/elfload.c | 9 ++---
>1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/linux-user/elfload.c b/linux-user/elfload.c
> index 36e4026f05..1b4bb2d5af 100644
> --- a/linux-user/elfload.c
> +++ b/linux-user/elfload.c
> @@ -3147,8 +3147,11 @@ static void load_elf_image(const char *image_name, 
> int image_fd,
>/*
> * Reserve address space for all of this.
> *
> - * In the case of ET_EXEC, we supply MAP_FIXED so that we get
> - * exactly the address range that is required.
> + * In the case of ET_EXEC, we supply MAP_FIXED_NOREPLACE so that we 
> get
> + * exactly the address range that is required.  Without reserved_va,
> + * the guest address space is not isolated.  We have attempted to 
> avoid
> + * conflict with the host program itself via probe_guest_base, but 
> using
> + * MAP_FIXED_NOREPLACE instead of MAP_FIXED provides an extra check.
> *
> * Otherwise this is ET_DYN, and we are searching for a location
> * that can hold the memory space required.  If the image is
> @@ -3160,7 +3163,7 @@ static void load_elf_image(const char *image_name, 
> int image_fd,
> */
>load_addr = target_mmap(loaddr, (size_t)hiaddr - loaddr + 1, 
> PROT_NONE,
>MAP_PRIVATE | MAP_ANON | MAP_NORESERVE |
> -(ehdr->e_type == ET_EXEC ? MAP_FIXED : 0),
> +(ehdr->e_type == ET_EXEC ? 
> MAP_FIXED_NOREPLACE : 0),
>-1, 0);
 We should probably also check the result == load_addr for the places
 where MAP_FIXED_NOREPLACE isn't supported as we have this in osdep.h:
 #ifndef MAP_FIXED_NOREPLACE
 #define MAP_FIXED_NOREPLACE 0
 #endif
 See 2667e069e7 (linux-user: don't use MAP_FIXED in
 pgd_find_hole_fallback)
>>>
>>> It assumes target_mmap() emulates MAP_FIXED_NOREPLACE when the host
>>> does not support it as commit e69e032d1a ("linux-user: Use
>>> MAP_FIXED_NOREPLACE for do_brk()") already does, but defining
>>> MAP_FIXED_NOREPLACE zero breaks such emulation. I wrote a fix:
>>> https://patchew.org/QEMU/20230808115242.73025-1-akihiko.od...@daynix.com/
>> Hmm doesn't that push the problem to real mmap() calls to a host
>> system
>> that doesn't support MAP_FIXED_NOREPLACE?
>
> That can happen even without that patch if you run QEMU built with a
> new libc on old Linux so we should have prepared for that kind of
> situation.
>
> The man page also says:
>> Note that older kernels which do not recognize the MAP_FIXED_NOREPLACE
>> flag will typically (upon detecting a collision with a preexisting
>> mapping) fall back to a “non-MAP_FIXED” type of behavior: they will
>> return an address that is different from the requested address.
>> Therefore, backward-compatible software should check the returned
>> address against the requested address.
> https://man7.org/linux/man-pages/man2/mmap.2.html
>
> It basically means MAP_FIXED_NOREPLACE has no effect on a host that
> doesn't support it, and the existing code checking the returned
> address should continue to work.

OK - as long as it doesn't barf on unknown flags.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

[PATCH 1/2] hw/nvme: fix null pointer access in directive receive

2023-08-08 Thread Klaus Jensen

From: Klaus Jensen 

nvme_directive_receive() does not check if an endurance group has been
configured (set) prior to testing if flexible data placement is enabled
or not.

Fix this.

Cc: qemu-sta...@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1815
Fixes: 73064edfb864 ("hw/nvme: flexible data placement emulation")
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index d217ae91b506..e5b5c7034d2b 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -6900,7 +6900,7 @@ static uint16_t nvme_directive_receive(NvmeCtrl *n, 
NvmeRequest *req)
 case NVME_DIRECTIVE_IDENTIFY:
 switch (doper) {
 case NVME_DIRECTIVE_RETURN_PARAMS:
-if (ns->endgrp->fdp.enabled) {
+if (ns->endgrp && ns->endgrp->fdp.enabled) {
 id.supported |= 1 << NVME_DIRECTIVE_DATA_PLACEMENT;
 id.enabled |= 1 << NVME_DIRECTIVE_DATA_PLACEMENT;
 id.persistent |= 1 << NVME_DIRECTIVE_DATA_PLACEMENT;
-- 
2.41.0

[PATCH 2/2] hw/nvme: fix null pointer access in ruh update

2023-08-08 Thread Klaus Jensen

From: Klaus Jensen 

The Reclaim Unit Update operation in I/O Management Receive does not
verify the presence of a configured endurance group prior to accessing
it.

Fix this.

Cc: qemu-sta...@nongnu.org
Fixes: 73064edfb864 ("hw/nvme: flexible data placement emulation")
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index e5b5c7034d2b..539d27355313 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4361,7 +4361,13 @@ static uint16_t nvme_io_mgmt_send_ruh_update(NvmeCtrl 
*n, NvmeRequest *req)
 uint32_t npid = (cdw10 >> 1) + 1;
 unsigned int i = 0;
 g_autofree uint16_t *pids = NULL;
-uint32_t maxnpid = n->subsys->endgrp.fdp.nrg * n->subsys->endgrp.fdp.nruh;
+uint32_t maxnpid;
+
+if (!ns->endgrp || !ns->endgrp->fdp.enabled) {
+return NVME_FDP_DISABLED | NVME_DNR;
+}
+
+maxnpid = n->subsys->endgrp.fdp.nrg * n->subsys->endgrp.fdp.nruh;
 
 if (unlikely(npid >= MIN(NVME_FDP_MAXPIDS, maxnpid))) {
 return NVME_INVALID_FIELD | NVME_DNR;
-- 
2.41.0

[PATCH 0/2] hw/nvme: two fixes

2023-08-08 Thread Klaus Jensen

From: Klaus Jensen 

Fix two potential accesses to null pointers.

Klaus Jensen (2):
  hw/nvme: fix null pointer access in directive receive
  hw/nvme: fix null pointer access in ruh update

 hw/nvme/ctrl.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

-- 
2.41.0

Re: [PATCH] Add support of callbacks after instructions to plugin api

2023-08-08 Thread Alex Bennée



Mikhail Tyutin  writes:

> Initially, we can only call the callback BEFORE instructions. This
> commit adds the ability to insert the callback AFTER instructions.

What is the use case for this? Because:


>  
> +
> +/* Stop translation if translate_insn so indicated.  */
> +if (db->is_jmp != DISAS_NEXT) {
> +break;
> +}
> +
>  /*
>   * We can't instrument after instructions that change control
>   * flow although this only really affects post-load operations.
> @@ -193,11 +199,6 @@ void translator_loop(CPUState *cpu, TranslationBlock 
> *tb, int *max_insns,
>  plugin_gen_insn_end();
>  }
>  
> -/* Stop translation if translate_insn so indicated.  */
> -if (db->is_jmp != DISAS_NEXT) {
> -break;
> -}
> -
>  /* Stop translation if the output buffer is full,
> or we have executed all of the allowed instructions.  */
>  if (tcg_op_buf_full() || db->num_insns >= db->max_insns) {
> @@ -211,6 +212,13 @@ void translator_loop(CPUState *cpu, TranslationBlock 
> *tb, int *max_insns,
>  gen_tb_end(tb, cflags, icount_start_insn, db->num_insns);
>  
>  if (plugin_enabled) {
> +/*
> + * Last chance to call plugin_gen_insn_end() if is skipped in 
> translation
> + * loop above.
> + */
> +if (db->is_jmp != DISAS_NEXT && tcg_ctx->exitreq_label == NULL) {
> +plugin_gen_insn_end();
> +}
>  plugin_gen_tb_end(cpu);
>  }
>  

> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -2819,6 +2819,22 @@ void tcg_gen_exit_tb(const TranslationBlock *tb, 
> unsigned idx)
>  tcg_debug_assert(idx == TB_EXIT_REQUESTED);
>  }
>  
> +#ifdef CONFIG_PLUGIN
> +/*
> + * Some of instruction generators insert exit_tb explicitelly to
> + * trigger early exit from translation block. On the other hand
> + * translation loop (translator_loop()) inserts plugin callbacks
> + * after instruction is generated, but it appears as dead code
> + * because of the explicit exit_tb insert.
> + *
> + * Calling plugin_gen_insn_end() here before the exit allows
> + * plugins to receive control before translation block exits.
> + */
> +if (tcg_ctx->plugin_insn) {
> +plugin_gen_insn_end();
> +}
> +#endif
> +

This isn't enough as we can exit the run loop in helpers. This is why
the execlog plugin jumps the hoops it does to complete handling of
execution on the next instruction.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH for-8.1] linux-user: Define real MAP_FIXED_NOREPLACE value

2023-08-08 Thread Akihiko Odaki


On 2023/08/08 23:42, Richard Henderson wrote:

On 8/8/23 04:52, Akihiko Odaki wrote:

do_brk() assumes target_mmap() emulates MAP_FIXED_NOREPLACE even when
the host does not support it. However, such emulation is not possible
if MAP_FIXED_NOREPLACE is defined as zero.

Define MAP_FIXED_NOREPLACE with the real value instead of zero if it is
not defined.

Fixes: e69e032d1a ("linux-user: Use MAP_FIXED_NOREPLACE for do_brk()")
Signed-off-by: Akihiko Odaki 
---
  include/qemu/osdep.h | 8 ++--
  linux-user/elfload.c | 1 -
  2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index cc61b00ba9..1aac17ec2f 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -289,8 +289,12 @@ void QEMU_ERROR("code path is reachable")
  #ifndef MAP_ANONYMOUS
  #define MAP_ANONYMOUS MAP_ANON
  #endif
-#ifndef MAP_FIXED_NOREPLACE
-#define MAP_FIXED_NOREPLACE 0
+#if defined(__linux__) && !defined(MAP_FIXED_NOREPLACE)
+#if HOST_ALPHA


HOST_ALPHA is not a thing.  Also, see

https://lore.kernel.org/qemu-devel/20230808120303.585509-3-pbonz...@redhat.com/

where host support for Alpha is completely removed.


I sent v2 with the condition HOST_ALPHA replaced with
MAP_HUGETLB == 0x10.

While Alpha is no longer supported, and linux-user will not work on 
Alpha, code outside linux-user also refers to osdep.h and the build 
script does not actively block building QEMU on an unsupported host so I 
left the definition for Alpha just in case.






+#define MAP_FIXED_NOREPLACE 0x20
+#else
+#define MAP_FIXED_NOREPLACE 0x10
+#endif


Which supported hosts do not define this value?  Can we simply remove 
the fallback?


glibc didn't have this defined until 2.28. The older releases still 
maintained are 2.26, and 2.27, according to:

https://sourceware.org/glibc/wiki/Release

The page says ALT Linux p9 and Ubuntu 18.04 LTS (Bionic Beaver) has 
glibc 2.27.





r~

Re: [PATCH for-8.1 v10 01/14] linux-user: Adjust task_unmapped_base for reserved_va

2023-08-08 Thread Richard Henderson


On 8/8/23 02:10, Alex Bennée wrote:

One thing I'm slightly confused by is the ELF_ET_DYN_BASE can be above
this (or sometimes the same). Should the mapping of ELF segments be
handled with mmap_next_start? I assume once mmap_next_start meets the
mappings for the ELF segments we skip over until we get to more free
space after the program code?


ELF_ET_DYN_BASE is a hack imported from the kernel to put separation between an ET_DYN 
main binary and TASK_UNMAPPED_BASE, so that the brk can follow the binary and have space 
to grow.


All of this is part of the "legacy" memory layout, for which there is a 
personality flag.

For 8.2, I think we should work on implementing the "new" memory layout, which places 
everything top-down.  But most importantly it completely separates brk from the binary.



r~

[PATCH] util: Delete a check for IA-64

2023-08-08 Thread Akihiko Odaki

HOST_IA64 will never be defined since IA-64 host is no longer supported.

Signed-off-by: Akihiko Odaki 
---
 util/oslib-posix.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 760390b31e..f7adb36dfb 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -618,10 +618,7 @@ void *qemu_alloc_stack(size_t *sz)
 abort();
 }
 
-#if defined(HOST_IA64)
-/* separate register stack */
-guardpage = ptr + (((*sz - pagesz) / 2) & ~pagesz);
-#elif defined(HOST_HPPA)
+#if defined(HOST_HPPA)
 /* stack grows up */
 guardpage = ptr + *sz - pagesz;
 #else
-- 
2.41.0

[PATCH] thunk: Delete checks for old host definitions

2023-08-08 Thread Akihiko Odaki

Alpha, IA-64, and PA-RISC hosts are no longer supported.

Signed-off-by: Akihiko Odaki 
---
 include/exec/user/thunk.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/exec/user/thunk.h b/include/exec/user/thunk.h
index 300a840d58..d9c131ec80 100644
--- a/include/exec/user/thunk.h
+++ b/include/exec/user/thunk.h
@@ -111,8 +111,7 @@ static inline int thunk_type_size(const argtype *type_ptr, 
int is_host)
 if (is_host) {
 #if defined(HOST_X86_64)
 return 8;
-#elif defined(HOST_ALPHA) || defined(HOST_IA64) || defined(HOST_MIPS) || \
-  defined(HOST_PARISC) || defined(HOST_SPARC64)
+#elif defined(HOST_MIPS) || defined(HOST_SPARC64)
 return 4;
 #elif defined(HOST_PPC)
 return sizeof(void *);
-- 
2.41.0

Re: [PATCH] util: Delete a check for IA-64

2023-08-08 Thread Peter Maydell

On Tue, 8 Aug 2023 at 16:21, Akihiko Odaki  wrote:
>
> HOST_IA64 will never be defined since IA-64 host is no longer supported.
>
> Signed-off-by: Akihiko Odaki 
> ---
>  util/oslib-posix.c | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index 760390b31e..f7adb36dfb 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -618,10 +618,7 @@ void *qemu_alloc_stack(size_t *sz)
>  abort();
>  }
>
> -#if defined(HOST_IA64)
> -/* separate register stack */
> -guardpage = ptr + (((*sz - pagesz) / 2) & ~pagesz);
> -#elif defined(HOST_HPPA)
> +#if defined(HOST_HPPA)
>  /* stack grows up */
>  guardpage = ptr + *sz - pagesz;
>  #else

Reviewed-by: Peter Maydell 

(There's another one in include/exec/user/thunk.h.)

thanks
-- PMM

Re: [PATCH] thunk: Delete checks for old host definitions

2023-08-08 Thread Peter Maydell

On Tue, 8 Aug 2023 at 16:23, Akihiko Odaki  wrote:
>
> Alpha, IA-64, and PA-RISC hosts are no longer supported.
>
> Signed-off-by: Akihiko Odaki 
> ---
>  include/exec/user/thunk.h | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/include/exec/user/thunk.h b/include/exec/user/thunk.h
> index 300a840d58..d9c131ec80 100644
> --- a/include/exec/user/thunk.h
> +++ b/include/exec/user/thunk.h
> @@ -111,8 +111,7 @@ static inline int thunk_type_size(const argtype 
> *type_ptr, int is_host)
>  if (is_host) {
>  #if defined(HOST_X86_64)
>  return 8;
> -#elif defined(HOST_ALPHA) || defined(HOST_IA64) || defined(HOST_MIPS) || \
> -  defined(HOST_PARISC) || defined(HOST_SPARC64)
> +#elif defined(HOST_MIPS) || defined(HOST_SPARC64)
>  return 4;
>  #elif defined(HOST_PPC)
>  return sizeof(void *);
> --

Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [PATCH for-8.1 v10 01/14] linux-user: Adjust task_unmapped_base for reserved_va

2023-08-08 Thread Helge Deller


Hi Richard,

On 8/7/23 18:36, Richard Henderson wrote:

Ensure that the chosen values for mmap_next_start and
task_unmapped_base are within the guest address space.

Tested-by: Helge Deller 
Reviewed-by: Akihiko Odaki 
Signed-off-by: Richard Henderson 


I've tested this whole series for quite some chroots.
Good thing is, that all targets do run and can execute a static hello-world
program.
So, overall that's good and I think the patch series should go in for 8.1.

Looking at the target's memmap I do see non-optimal heap-addresses for arm and
powerpc.
I know it's not directly related to your patches, but we should later
try to move the heap behind the executables where they can grow bigger.

Helge

armel-chroot  and armhf-chroot:
Linux p100 6.4.6-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Jul 24 20:51:12 UTC 
2023 armv7l GNU/Linux
00021000-00042000 rw-p  00:00 0  [heap]
0040-00408000 r-xp  fd:00 801471 
/usr/bin/cat
00408000-0041f000 ---p  00:00 0
0041f000-0042 r--p f000 fd:00 801471 
/usr/bin/cat
0042-00421000 rw-p 0001 fd:00 801471 
/usr/bin/cat
4000-40001000 ---p  00:00 0
40001000-40801000 rw-p  00:00 0  [stack]
40801000-40827000 r-xp  fd:00 839152 
/usr/lib/arm-linux-gnueabi/ld-linux.so.3
40827000-40828000 r--p 00026000 fd:00 839152 
/usr/lib/arm-linux-gnueabi/ld-linux.so.3
40828000-40829000 rw-p 00027000 fd:00 839152 
/usr/lib/arm-linux-gnueabi/ld-linux.so.3
...

powerpc-chroot
Linux p100 6.4.6-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Jul 24 20:51:12 UTC 
2023 ppc GNU/Linux
00021000-00043000 rw-p  00:00 0  [heap]
001c-003d5000 r-xp  fd:00 577250 
/usr/lib/powerpc-linux-gnu/libc.so.6
003d5000-003eb000 ---p 00215000 fd:00 577250 
/usr/lib/powerpc-linux-gnu/libc.so.6
003eb000-003f r--p 0021b000 fd:00 577250 
/usr/lib/powerpc-linux-gnu/libc.so.6
003f-003f1000 rw-p 0022 fd:00 577250 
/usr/lib/powerpc-linux-gnu/libc.so.6
003f1000-003fb000 rw-p  00:00 0
0040-0040b000 r-xp  fd:00 535994 
/usr/bin/cat
0040b000-0041f000 ---p  00:00 0
0041f000-0042 r--p f000 fd:00 535994 
/usr/bin/cat
0042-00421000 rw-p 0001 fd:00 535994 
/usr/bin/cat
4000-40001000 ---p  00:00 0
40001000-40801000 rw-p  00:00 0  [stack]
40801000-40834000 r-xp  fd:00 577246 
/usr/lib/powerpc-linux-gnu/ld.so.1
40834000-4084f000 ---p  00:00 0
4084f000-40851000 r--p 0003e000 fd:00 577246 
/usr/lib/powerpc-linux-gnu/ld.so.1
40851000-40852000 rw-p 0004 fd:00 577246 
/usr/lib/powerpc-linux-gnu/ld.so.1
40852000-40853000 r-xp  00:00 0
40853000-40855000 rw-p  00:00 0

Re: [PATCH] vfio/pci: hide ROM BAR on SFC9220 10/40G Ethernet Controller PF

2023-08-08 Thread Alex Williamson

On Tue,  8 Aug 2023 16:59:16 +0200
Laszlo Ersek  wrote:

> The Solarflare Communications SFC9220 NIC's physical function (PF) appears
> to expose an expansion ROM with the following characteristics:
> 
> (1) Single-image ROM, with only a legacy BIOS image (no UEFI driver).
> Alex's rom-parser utility dumps it like this:
> 
> > Valid ROM signature found @0h, PCIR offset 20h
> > PCIR: type 0 (x86 PC-AT), vendor: 1924, device: 0a03, class: 02
> > PCIR: revision 3, vendor revision: 1
> > Last image  
> 
> (2) The BIOS image crashes when booted on i440fx.
> 
> (3) The BIOS image prints the following messages on q35:
> 
> > Solarflare Boot Manager (v5.2.2.1006)
> > Solarflare Communications 2008-2019
> > gPXE (http://etherboot.org) - [...] PCI[...] PnP PMM[...]  
> 
> So it appears like a modified derivative of old gPXE.
> 
> Alex surmised in advance that the BIOS image could be accessing
> host-physical addresses rather than guest-phys ones, leading to the crash
> on i440fx.

ROMs sometimes take shortcuts around the standard interfaces to the
device and can therefore hit gaps in the virtualization, which is why
that's suspect to me.  However if it works on q35 but not 440fx it
might be more that we're not matching a PCI topology expectation of the
ROM.  Was it only tested on 440fx attached to the root bus or does it
also fail if the PF is attached downstream of a PCI-to-PCI bridge?

> Don't expose the option ROM BAR to the VM by default. While this prevents
> netbooting the VM off the PF on q35/SeaBIOS (a relatively rare scenario),
> it does not make any difference for UEFI, and at least the VM doesn't
> crash during boot on i440fx/SeaBIOS (a relatively frequent scenario).
> Users can restore the original behavior via the QEMU cmdline and the
> libvirt domain XML.
> 
> (In two years, we've not seen any customer interest in this bug, hence
> there's no incentive to investigate (2).)
> 
> Cc: Alex Williamson  (supporter:VFIO)
> Cc: "Cédric Le Goater"  (supporter:VFIO)
> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1975776
> Signed-off-by: Laszlo Ersek 
> ---
>  hw/vfio/pci-quirks.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> index f4ff83680572..270eb16b91fa 100644
> --- a/hw/vfio/pci-quirks.c
> +++ b/hw/vfio/pci-quirks.c
> @@ -45,6 +45,10 @@ static const struct {
>  uint32_t device;
>  } rom_denylist[] = {
>  { 0x14e4, 0x168e }, /* Broadcom BCM 57810 */
> +{ 0x1924, 0x0a03 }, /* Solarflare Communications
> + * SFC9220 10/40G Ethernet Controller
> + * 
> https://bugzilla.redhat.com/show_bug.cgi?id=1975776

Unfortunately this is not a public bz so there's not much point in
referencing it in public code or commit log :-\  Thanks,

Alex

> + */
>  };
>  
>  bool vfio_opt_rom_in_denylist(VFIOPCIDevice *vdev)

Re: [PATCH 09/24] target/tricore: Replace gen_cond_w with tcg_gen_negsetcond_tl

2023-08-08 Thread Bastian Koppelmann

On Mon, Aug 07, 2023 at 08:11:28PM -0700, Richard Henderson wrote:
> Signed-off-by: Richard Henderson 
> ---
>  target/tricore/translate.c | 16 ++--
>  1 file changed, 6 insertions(+), 10 deletions(-)
> 
> diff --git a/target/tricore/translate.c b/target/tricore/translate.c
> index 1947733870..6ae5ccbf72 100644
> --- a/target/tricore/translate.c
> +++ b/target/tricore/translate.c

Reviewed-by: Bastian Koppelmann 

Cheers,
Bastian

RE: [PATCH] Add support of callbacks after instructions to plugin api

2023-08-08 Thread Mikhail Tyutin

> On 8/8/23 06:44, Mikhail Tyutin wrote:
> > Initially, we can only call the callback BEFORE instructions. This commit 
> > adds the ability to insert the callback AFTER instructions.
> >
> > No callback call for control-flow instructions.
> 
> You're going to miss whole categories of instructions, not just control-flow. 
>  You're
> going to miss anything that raises an exception.  The list goes on and on.  
> This is why we
> didn't implement this "after" hook in the first place.
> 

To be fair it works quite well for code translations in user-mode and baremetal 
applications. At least we can intercept a set of instructions that have 
registers as operands and even syscall-like instructions. Logically it had to 
work identically to memory 'store' callbacks, but we had to add a shortcut to 
fix problem when some of code translators inserts exit_tb operation explicitly. 
Maybe there is better way to do it.

We use such AFTER callback in plugins to capture CPU state changes in generic 
way (using registers API patch I posted earlier). Without it, BEFORE callback 
has to be added to 'current' and 'following' instructions to achieve the same 
effect. Having callbacks on different instructions adds complexity to the 
callbacks itself to performs state dumps at appropriate conditions (e.g. was 
'previous' instruction the one we instrumented or it was some jump).

[RFC v1 0/3] Initial support for SPDM

2023-08-08 Thread Alistair Francis

The Security Protocol and Data Model (SPDM) Specification defines
messages, data objects, and sequences for performing message exchanges
over a variety of transport and physical media.
 - 
https://www.dmtf.org/sites/default/files/standards/documents/DSP0274_1.3.0.pdf

This series is a very initial start on adding SPDM support to QEMU.

This series uses libspdm [1] which is a reference implementation of
SPDM.

The series starts by adding support for building and linking with
libspdm. It then progresses to handling SPDM requests in the NVMe device
over the PCIe DOE protocol.

This is a very early attempt. The code quality is not super high, the C
code hasn't been tested at all. This is really just an RFC to see if
QEMU will accept linking with libspdm.

So, the main question is, how do people feel about linking with libspdm
to support SPDM?

1: https://github.com/DMTF/libspdm

Alistair Francis (3):
  subprojects: libspdm: Initial support
  hw: nvme: ctrl: Initial support for DOE
  hw: nvme: ctrl: Process SPDM requests

 meson.build   | 78 +++
 hw/nvme/nvme.h|  4 ++
 include/hw/pci/pcie_doe.h |  1 +
 hw/nvme/ctrl.c| 35 
 .gitmodules   |  3 ++
 meson_options.txt |  3 ++
 scripts/meson-buildoptions.sh |  3 ++
 subprojects/.gitignore|  1 +
 subprojects/libspdm.wrap  |  5 +++
 9 files changed, 133 insertions(+)
 create mode 100644 subprojects/libspdm.wrap

-- 
2.41.0

[RFC v1 3/3] hw: nvme: ctrl: Process SPDM requests

2023-08-08 Thread Alistair Francis

Signed-off-by: Alistair Francis 
---
 hw/nvme/ctrl.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index ec3d5d3c29..a299dc7175 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -206,6 +206,11 @@
 #include "hw/pci/pcie_sriov.h"
 #include "migration/vmstate.h"
 
+#ifdef CONFIG_LIBSPDM
+#include "library/spdm_common_lib.h"
+#include "library/spdm_responder_lib.h"
+#endif
+
 #include "nvme.h"
 #include "dif.h"
 #include "trace.h"
@@ -8092,6 +8097,16 @@ static int nvme_add_pm_capability(PCIDevice *pci_dev, 
uint8_t offset)
 #ifdef CONFIG_LIBSPDM
 static bool nvme_doe_spdm_rsp(DOECap *doe_cap)
 {
+void *context = (void *)malloc(libspdm_get_context_size());
+uint32_t *session_id;
+bool is_app_message;
+
+libspdm_init_context(context);
+
+libspdm_process_request(context, &session_id, &is_app_message,
+doe_cap->write_mbox_len,
+doe_cap->write_mbox);
+
 return false;
 }
 #endif
-- 
2.41.0

[RFC v1 2/3] hw: nvme: ctrl: Initial support for DOE

2023-08-08 Thread Alistair Francis

Signed-off-by: Alistair Francis 
---
 hw/nvme/nvme.h|  4 
 include/hw/pci/pcie_doe.h |  1 +
 hw/nvme/ctrl.c| 20 
 3 files changed, 25 insertions(+)

diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 209e8f5b4c..e0918516e3 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -20,6 +20,7 @@
 
 #include "qemu/uuid.h"
 #include "hw/pci/pci_device.h"
+#include "hw/pci/pcie_doe.h"
 #include "hw/block/block.h"
 
 #include "block/nvme.h"
@@ -597,6 +598,9 @@ typedef struct NvmeCtrl {
 uint16_tvqrfap;
 uint16_tvirfap;
 } next_pri_ctrl_cap;/* These override pri_ctrl_cap after reset */
+
+/* DOE */
+DOECap doe_spdm;
 } NvmeCtrl;
 
 typedef enum NvmeResetType {
diff --git a/include/hw/pci/pcie_doe.h b/include/hw/pci/pcie_doe.h
index 87dc17dcef..18e9492977 100644
--- a/include/hw/pci/pcie_doe.h
+++ b/include/hw/pci/pcie_doe.h
@@ -46,6 +46,7 @@ REG32(PCI_DOE_CAP_STATUS, 0)
 
 /* PCI-SIG defined Data Object Types - r6.0 Table 6-32 */
 #define PCI_SIG_DOE_DISCOVERY   0x00
+#define PCI_SIG_DOE_SPDM0x01
 
 #define PCI_DOE_DW_SIZE_MAX (1 << 18)
 #define PCI_DOE_PROTOCOL_NUM_MAX256
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index f2e5a2fa73..ec3d5d3c29 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -202,6 +202,7 @@
 #include "sysemu/block-backend.h"
 #include "sysemu/hostmem.h"
 #include "hw/pci/msix.h"
+#include "hw/pci/pcie_doe.h"
 #include "hw/pci/pcie_sriov.h"
 #include "migration/vmstate.h"
 
@@ -8088,6 +8089,13 @@ static int nvme_add_pm_capability(PCIDevice *pci_dev, 
uint8_t offset)
 return 0;
 }
 
+#ifdef CONFIG_LIBSPDM
+static bool nvme_doe_spdm_rsp(DOECap *doe_cap)
+{
+return false;
+}
+#endif
+
 static bool nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
 {
 ERRP_GUARD();
@@ -8317,6 +8325,13 @@ void nvme_attach_ns(NvmeCtrl *n, NvmeNamespace *ns)
 BDRV_REQUEST_MAX_BYTES / nvme_l2b(ns, 1));
 }
 
+#ifdef CONFIG_LIBSPDM
+static DOEProtocol doe_spdm_prot[] = {
+{ PCI_VENDOR_ID_PCI_SIG, PCI_SIG_DOE_SPDM, nvme_doe_spdm_rsp },
+{ }
+};
+#endif
+
 static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 {
 NvmeCtrl *n = NVME(pci_dev);
@@ -8359,6 +8374,11 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
**errp)
 
 nvme_attach_ns(n, ns);
 }
+
+#ifdef CONFIG_LIBSPDM
+/* DOE Initailization */
+pcie_doe_init(pci_dev, &n->doe_spdm, 0x190, doe_spdm_prot, true, 0);
+#endif
 }
 
 static void nvme_exit(PCIDevice *pci_dev)
-- 
2.41.0

[RFC v1 1/3] subprojects: libspdm: Initial support

2023-08-08 Thread Alistair Francis

Signed-off-by: Alistair Francis 
---
 meson.build   | 78 +++
 .gitmodules   |  3 ++
 meson_options.txt |  3 ++
 scripts/meson-buildoptions.sh |  3 ++
 subprojects/.gitignore|  1 +
 subprojects/libspdm.wrap  |  5 +++
 6 files changed, 93 insertions(+)
 create mode 100644 subprojects/libspdm.wrap

diff --git a/meson.build b/meson.build
index 98e68ef0b1..3ac91defbc 100644
--- a/meson.build
+++ b/meson.build
@@ -1864,6 +1864,15 @@ elif get_option('vduse_blk_export').disabled()
 have_vduse_blk_export = false
 endif
 
+have_libspdm = (targetos == 'linux')
+if get_option('libspdm').enabled()
+if targetos != 'linux'
+error('libspdm requires linux')
+endif
+elif get_option('libspdm').disabled()
+have_libspdm = false
+endif
+
 # libbpf
 libbpf = dependency('libbpf', required: get_option('bpf'), method: 
'pkg-config')
 if libbpf.found() and not cc.links('''
@@ -2141,6 +2150,7 @@ config_host_data.set('CONFIG_VHOST_VDPA', have_vhost_vdpa)
 config_host_data.set('CONFIG_VMNET', vmnet.found())
 config_host_data.set('CONFIG_VHOST_USER_BLK_SERVER', 
have_vhost_user_blk_server)
 config_host_data.set('CONFIG_VDUSE_BLK_EXPORT', have_vduse_blk_export)
+config_host_data.set('CONFIG_LIBSPDM', have_libspdm)
 config_host_data.set('CONFIG_PNG', png.found())
 config_host_data.set('CONFIG_VNC', vnc.found())
 config_host_data.set('CONFIG_VNC_JPEG', jpeg.found())
@@ -3172,6 +3182,7 @@ blockdev_ss = ss.source_set()
 block_ss = ss.source_set()
 chardev_ss = ss.source_set()
 common_ss = ss.source_set()
+libspdm_ss = ss.source_set()
 crypto_ss = ss.source_set()
 hwcore_ss = ss.source_set()
 io_ss = ss.source_set()
@@ -3321,6 +3332,56 @@ if have_libvduse
   libvduse = libvduse_proj.get_variable('libvduse_dep')
 endif
 
+libspdm = not_found
+if have_libspdm
+  cmake = import('cmake')
+  libspdm_opt_var = cmake.subproject_options()
+
+  libspdm_opt_var.add_cmake_defines({'ARCH': 'x64'})
+  libspdm_opt_var.add_cmake_defines({'TOOLCHAIN': 'NONE'})
+  libspdm_opt_var.add_cmake_defines({'TARGET': 'Release'})
+  libspdm_opt_var.add_cmake_defines({'CRYPTO': 'openssl'})
+  libspdm_opt_var.add_cmake_defines({'ENABLE_BINARY_BUILD': '1'})
+  libspdm_opt_var.add_cmake_defines({'COMPILED_LIBCRYPTO_PATH': '/usr/lib'})
+  libspdm_opt_var.add_cmake_defines({'COMPILED_LIBSSL_PATH': '/usr/lib'})
+  libspdm_opt_var.add_cmake_defines({'DISABLE_TESTS': '1'})
+
+  libspdm_proj = cmake.subproject('libspdm', options: libspdm_opt_var)
+
+  libspdm_lib = libspdm_proj.dependency('spdm_common_lib')
+  libspdm_ss.add(libspdm_lib)
+
+  libspdm_lib = libspdm_proj.dependency('memlib')
+  libspdm_ss.add(libspdm_lib)
+
+  libspdm_lib = libspdm_proj.dependency('malloclib')
+  libspdm_ss.add(libspdm_lib)
+
+  libspdm_lib = libspdm_proj.dependency('spdm_crypt_lib')
+  libspdm_ss.add(libspdm_lib)
+
+  libspdm_lib = libspdm_proj.dependency('spdm_responder_lib')
+  libspdm_ss.add(libspdm_lib)
+
+  libspdm_lib = libspdm_proj.dependency('cryptlib_openssl')
+  libspdm_ss.add(libspdm_lib)
+
+  libspdm_lib = libspdm_proj.dependency('spdm_device_secret_lib_null')
+  libspdm_ss.add(libspdm_lib)
+
+  libspdm_lib = libspdm_proj.dependency('platform_lib_null')
+  libspdm_ss.add(libspdm_lib)
+
+  libspdm_lib = libspdm_proj.dependency('spdm_secured_message_lib')
+  libspdm_ss.add(libspdm_lib)
+
+  libspdm_lib = libspdm_proj.dependency('spdm_crypt_lib')
+  libspdm_ss.add(libspdm_lib)
+
+  libspdm_lib = libspdm_proj.dependency('spdm_crypt_ext_lib')
+  libspdm_ss.add(libspdm_lib)
+endif
+
 # NOTE: the trace/ subdirectory needs the qapi_trace_events variable
 # that is filled in by qapi/.
 subdir('qapi')
@@ -3593,6 +3654,19 @@ libcrypto = static_library('crypto', crypto_ss.sources() 
+ genh,
 crypto = declare_dependency(link_whole: libcrypto,
 dependencies: [authz, qom])
 
+if have_libspdm
+  libspdm_ss = libspdm_ss.apply(config_host, strict: false)
+  libspdm = static_library('libspdm', libspdm_ss.sources() + genh,
+ dependencies: [libspdm_ss.dependencies()],
+ name_suffix: 'fa',
+ build_by_default: false)
+
+  openssl = dependency('openssl', version : '>=3.0.9')
+  libspdm = declare_dependency(link_whole: libspdm, dependencies: [openssl])
+
+  common_user_inc += 'subprojects/libspdm/include'
+endif
+
 io_ss = io_ss.apply(config_host, strict: false)
 libio = static_library('io', io_ss.sources() + genh,
dependencies: [io_ss.dependencies()],
@@ -3668,6 +3742,10 @@ if emulator_modules.length() > 0
   alias_target('modules', emulator_modules)
 endif
 
+if have_libspdm
+  system_ss.add(libspdm)
+endif
+
 system_ss.add(authz, blockdev, chardev, crypto, io, qmp)
 common_ss.add(qom, qemuutil)
 
diff --git a/.gitmodules b/.gitmodules
index 73cae4cd4d..1bf93427ad 100644
--- a/.gitmodules
+++ b/.gitmodules
@@ -43,3 +43,6 @@
 [submodule "tests/lcitool/libvirt-ci"]

Re: [PATCH 02/24] tcg: Use tcg_gen_negsetcond_*

2023-08-08 Thread Peter Maydell

On Tue, 8 Aug 2023 at 04:13, Richard Henderson
 wrote:
>
> Signed-off-by: Richard Henderson 
> ---
>  tcg/tcg-op-gvec.c | 6 ++
>  tcg/tcg-op.c  | 6 ++
>  2 files changed, 4 insertions(+), 8 deletions(-)
>
> diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
> index a062239804..e260a07c61 100644
> --- a/tcg/tcg-op-gvec.c
> +++ b/tcg/tcg-op-gvec.c
> @@ -3692,8 +3692,7 @@ static void expand_cmp_i32(uint32_t dofs, uint32_t 
> aofs, uint32_t bofs,
>  for (i = 0; i < oprsz; i += 4) {
>  tcg_gen_ld_i32(t0, cpu_env, aofs + i);
>  tcg_gen_ld_i32(t1, cpu_env, bofs + i);
> -tcg_gen_setcond_i32(cond, t0, t0, t1);
> -tcg_gen_neg_i32(t0, t0);
> +tcg_gen_negsetcond_i32(cond, t0, t0, t1);
>  tcg_gen_st_i32(t0, cpu_env, dofs + i);
>  }

Is it not possible for the optimizer to notice "you did
a setcond followed by a neg, let me turn it into negsetcond
for you" ?

thanks
-- PMM

Re: [PATCH for-8.1 v10 08/14] linux-user: Do not adjust zero_bss for host page size

2023-08-08 Thread Richard Henderson


On 8/8/23 04:38, Alex Bennée wrote:

-if (host_start < host_map_start) {
-memset((void *)host_start, 0, host_map_start - host_start);
+if (align_bss < end_bss) {
+abi_long err = target_mmap(align_bss, end_bss - align_bss, prot,
+   MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS,
+   -1, 0);
+if (err == -1) {
+perror("cannot mmap brk");
+exit(-1);


brk != bss even if brk generally comes after the bss section.


I've removed this error report entirely, returning failure to the caller, which will then 
handle it like any other mmap failure of the image.



r~

[PATCH 1/2] block: minimize bs->reqs_lock section in tracked_request_end()

2023-08-08 Thread Stefan Hajnoczi

Signed-off-by: Stefan Hajnoczi 
---
 block/io.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index 055fcf7438..85d5176256 100644
--- a/block/io.c
+++ b/block/io.c
@@ -593,8 +593,14 @@ static void coroutine_fn 
tracked_request_end(BdrvTrackedRequest *req)
 
 qemu_co_mutex_lock(&req->bs->reqs_lock);
 QLIST_REMOVE(req, list);
+qemu_co_mutex_unlock(&req->bs->reqs_lock);
+
+/*
+ * At this point qemu_co_queue_wait(&req->wait_queue, ...) won't be called
+ * anymore because the request has been removed from the list, so it's safe
+ * to restart the queue outside reqs_lock to minimize the critical section.
+ */
 qemu_co_queue_restart_all(&req->wait_queue);
-qemu_co_mutex_unlock(&req->bs->reqs_lock);
 }
 
 /**
-- 
2.41.0

cpu/i386: update xsave components after CPUID filtering

2023-08-08 Thread Huanyu Zhai

From: NikoZHAI 

On i386 platform, CPUID data are setup through three consecutive steps: CPU 
model definition, expansion and filtering.
XSAVE components are enabled during the expansion stage, by checking if they 
are enabled in CPUID. However, it is still
probable that some XSAVE features will be enabled/disabled during the filtering 
stage and the XSAVE components left unchanged.
Inconsistency between XSAVE features and enabled XSAVE components can lead to 
problems on some Linux guests in the absence of
the following patch in the kernel:

https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1452368.html

A simple case to reproduce this problem is to start a SUSE 12 SP3 guest with 
cpu model set to Skylake-Server:
$ qemu-system-x86_64 -cpu Skylake-Server ...

In the SUSE 12 SP3 guest, one can observe that PKRU will be enabled without 
Intel PKU's presence.
That's because on platform with Skylake-Server cpus, Intel PKU is disabled 
during x86_cpu_filter_features(),
but the XSAVE PKRU bit was enabled by x86_cpu_expand_features().

Signed-off-by: Huanyu Zhai 
Co-authored-by: Wang Xin 
---
 target/i386/cpu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 1242bd541a..1f6424bd80 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6901,6 +6901,9 @@ static void x86_cpu_filter_features(X86CPU *cpu, bool 
verbose)
 mark_unavailable_features(cpu, FEAT_7_0_EBX, 
CPUID_7_0_EBX_INTEL_PT, prefix);
 }
 }
+
+/* Update XSAVE components again based on the filtered CPU feature flags */
+x86_cpu_enable_xsave_components(cpu);
 }
 
 static void x86_cpu_hyperv_realize(X86CPU *cpu)
-- 
2.39.3

Re: [PATCH 02/24] tcg: Use tcg_gen_negsetcond_*

2023-08-08 Thread Richard Henderson


On 8/8/23 08:55, Peter Maydell wrote:

On Tue, 8 Aug 2023 at 04:13, Richard Henderson
 wrote:


Signed-off-by: Richard Henderson 
---
  tcg/tcg-op-gvec.c | 6 ++
  tcg/tcg-op.c  | 6 ++
  2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index a062239804..e260a07c61 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -3692,8 +3692,7 @@ static void expand_cmp_i32(uint32_t dofs, uint32_t aofs, 
uint32_t bofs,
  for (i = 0; i < oprsz; i += 4) {
  tcg_gen_ld_i32(t0, cpu_env, aofs + i);
  tcg_gen_ld_i32(t1, cpu_env, bofs + i);
-tcg_gen_setcond_i32(cond, t0, t0, t1);
-tcg_gen_neg_i32(t0, t0);
+tcg_gen_negsetcond_i32(cond, t0, t0, t1);
  tcg_gen_st_i32(t0, cpu_env, dofs + i);
  }


Is it not possible for the optimizer to notice "you did
a setcond followed by a neg, let me turn it into negsetcond
for you" ?


Not at present, no.  That sort of peephole optimization is a bit more difficult than what 
we do at present.



r~

[PATCH 0/2] block: change reqs_lock to QemuMutex

2023-08-08 Thread Stefan Hajnoczi

As part of the ongoing multi-queue QEMU block layer work, I found that CoMutex
reqs_lock scales poorly when more IOThreads are added. These patches double
IOPS in the 4 IOThreads randread benchmark that I have been running with my
out-of-tree virtio-blk-iothread-vq-mapping branch
(https://gitlab.com/stefanha/qemu/-/commits/virtio-blk-iothread-vq-mapping).

These patches can be merged in preparation for virtio-blk multi-queue block
layer support.

Stefan Hajnoczi (2):
  block: minimize bs->reqs_lock section in tracked_request_end()
  block: change reqs_lock to QemuMutex

 include/block/block_int-common.h |  2 +-
 block.c  |  4 +++-
 block/io.c   | 30 ++
 3 files changed, 22 insertions(+), 14 deletions(-)

-- 
2.41.0

Re: [PATCH for-8.1] linux-user: Define real MAP_FIXED_NOREPLACE value

2023-08-08 Thread Richard Henderson


On 8/8/23 08:16, Akihiko Odaki wrote:

Which supported hosts do not define this value?  Can we simply remove the 
fallback?


glibc didn't have this defined until 2.28. The older releases still maintained are 2.26, 
and 2.27, according to:

https://sourceware.org/glibc/wiki/Release


Thanks for digging into glibc versions.


The page says ALT Linux p9 and Ubuntu 18.04 LTS (Bionic Beaver) has glibc 2.27.


QEMU says

# Support for the previous major version will be dropped 2 years
# after the new major version is released or when the vendor itself
# drops support, whichever comes first.

Ubuntu 22.04 is the current major version, so 20.04 is still supported until 2024, but 
18.04 is unsupported.


Similarly, ALT 10.0 was released in 2021, so ALT 9 is now unsupported.

I have just run a patch to remove the fallback through gitlab CI and it has 
passed:

https://gitlab.com/rth7680/qemu/-/pipelines/959680899/


r~

Re: [PATCH for-8.1 v10 13/14] linux-user: Rewrite fixed probe_guest_base

2023-08-08 Thread Alex Bennée



Richard Henderson  writes:

> Create a set of subroutines to collect a set of guest addresses,
> all of which must be mappable on the host.  Use this within the
> renamed pgb_fixed subroutine to validate the user's choice of
> guest_base specified by the -B command-line option.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

1 2 3 >

1 - 100 of 275 matches

Mail list logo