date:20230705

[PATCH v2] linux-user/syscall: Implement execve without execveat

2023-07-05 Thread Pierrick Bouvier

Support for execveat syscall was implemented in 55bbe4 and is available
since QEMU 8.0.0. It relies on host execveat, which is widely available
on most of Linux kernels today.

However, this change breaks qemu-user self emulation, if "host" qemu
version is less than 8.0.0. Indeed, it does not implement yet execveat.
This strange use case happens with most of distribution today having
binfmt support.

With a concrete failing example:
$ qemu-x86_64-7.2 qemu-x86_64-8.0 /bin/bash -c /bin/ls
/bin/bash: line 1: /bin/ls: Function not implemented
-> not implemented means execve returned ENOSYS

qemu-user-static 7.2 and 8.0 can be conveniently grabbed from debian
packages qemu-user-static* [1].

One usage of this is running wine-arm64 from linux-x64 (details [2]).
This is by updating qemu embedded in docker image that we ran into this
issue.

The solution to update host qemu is not always possible. Either it's
complicated or ask you to recompile it, or simply is not accessible
(GitLab CI, GitHub Actions). Thus, it could be worth to implement execve
without relying on execveat, which is the goal of this patch.

This patch was tested with example presented in this commit message.

[1] http://ftp.us.debian.org/debian/pool/main/q/qemu/
[1] https://www.linaro.org/blog/emulate-windows-on-arm/

Signed-off-by: Pierrick Bouvier 
---
 linux-user/syscall.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 08162cc966..4945ddd7f2 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -659,6 +659,7 @@ safe_syscall4(pid_t, wait4, pid_t, pid, int *, status, int, 
options, \
 #endif
 safe_syscall5(int, waitid, idtype_t, idtype, id_t, id, siginfo_t *, infop, \
   int, options, struct rusage *, rusage)
+safe_syscall3(int, execve, const char *, filename, char **, argv, char **, 
envp)
 safe_syscall5(int, execveat, int, dirfd, const char *, filename,
   char **, argv, char **, envp, int, flags)
 #if defined(TARGET_NR_select) || defined(TARGET_NR__newselect) || \
@@ -8615,9 +8616,9 @@ ssize_t do_guest_readlink(const char *pathname, char 
*buf, size_t bufsiz)
 return ret;
 }
 
-static int do_execveat(CPUArchState *cpu_env, int dirfd,
-   abi_long pathname, abi_long guest_argp,
-   abi_long guest_envp, int flags)
+static int do_execv(CPUArchState *cpu_env, int dirfd,
+abi_long pathname, abi_long guest_argp,
+abi_long guest_envp, int flags, bool is_execveat)
 {
 int ret;
 char **argp, **envp;
@@ -8696,11 +8697,14 @@ static int do_execveat(CPUArchState *cpu_env, int dirfd,
 goto execve_efault;
 }
 
+const char* exe = p;
 if (is_proc_myself(p, "exe")) {
-ret = get_errno(safe_execveat(dirfd, exec_path, argp, envp, flags));
-} else {
-ret = get_errno(safe_execveat(dirfd, p, argp, envp, flags));
+exe = exec_path;
 }
+ret = is_execveat ?
+safe_execveat(dirfd, exe, argp, envp, flags):
+safe_execve(exe, argp, envp);
+ret = get_errno(ret);
 
 unlock_user(p, pathname, 0);
 
@@ -9251,9 +9255,9 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int 
num, abi_long arg1,
 return ret;
 #endif
 case TARGET_NR_execveat:
-return do_execveat(cpu_env, arg1, arg2, arg3, arg4, arg5);
+return do_execv(cpu_env, arg1, arg2, arg3, arg4, arg5, true);
 case TARGET_NR_execve:
-return do_execveat(cpu_env, AT_FDCWD, arg1, arg2, arg3, 0);
+return do_execv(cpu_env, AT_FDCWD, arg1, arg2, arg3, 0, false);
 case TARGET_NR_chdir:
 if (!(p = lock_user_string(arg1)))
 return -TARGET_EFAULT;
-- 
2.40.1

Re: [PULL 07/11] tests/tcg/aarch64: Add testcases for IC IVAU and dual-mapped code

2023-07-05 Thread Philippe Mathieu-Daudé


Cc'ing John.

On 5/7/23 06:53, Richard Henderson wrote:

On 7/4/23 18:36, Peter Maydell wrote:

+int main(int argc, char **argv)
+{
+    const char *shm_name = "qemu-test-tcg-aarch64-icivau";
+    int fd;
+
+    fd = shm_open(shm_name, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);


Build failures:

https://gitlab.com/qemu-project/qemu/-/jobs/4592433393#L3958
https://gitlab.com/qemu-project/qemu/-/jobs/4592433395#L4149
https://gitlab.com/qemu-project/qemu/-/jobs/4592433400#L3694


/usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/bin/ld: 
/usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/lib/../lib/librt.a(shm_open.o):
 in function `shm_open':
(.text+0x3c): undefined reference to `__shm_directory'
/usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/bin/ld: 
(.text+0xcc): undefined reference to `pthread_setcancelstate'
/usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/bin/ld: 
(.text+0xfc): undefined reference to `pthread_setcancelstate'
/usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/bin/ld: 
/usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/lib/../lib/librt.a(shm_unlink.o):
 in function `shm_unlink':
(.text+0x30): undefined reference to `__shm_directory'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:119: icivau] Error 1
make[1]: *** Waiting for unfinished jobs
make: *** [/builds/qemu-project/qemu/tests/Makefile.include:50: 
build-tcg-tests-aarch64-linux-user] Error 2


It looks like this test needs something else.


Maybe:

icivau: LDFLAGS+=-lrt -pthread

?

Re: [PATCH] pnv/xive2: Always pass a presenter object when accessing the TIMA

2023-07-05 Thread Cédric Le Goater


On 7/5/23 10:14, Frederic Barrat wrote:

The low-level functions to access the TIMA take a presenter object as
a first argument. When accessing the TIMA from the IC BAR,
i.e. indirect calls, we currently pass a NULL pointer for the
presenter argument. While it appears ok with the current usage, it's
dangerous. And it's pretty easy to figure out the presenter in that
context, so this patch fixes it.

Signed-off-by: Frederic Barrat 




Reviewed-by: Cédric Le Goater 

Thanks,

C.



---
  hw/intc/pnv_xive2.c | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/intc/pnv_xive2.c b/hw/intc/pnv_xive2.c
index 82fcd3ea22..bbb44a533c 100644
--- a/hw/intc/pnv_xive2.c
+++ b/hw/intc/pnv_xive2.c
@@ -1624,6 +1624,7 @@ static uint64_t pnv_xive2_ic_tm_indirect_read(void 
*opaque, hwaddr offset,
unsigned size)
  {
  PnvXive2 *xive = PNV_XIVE2(opaque);
+XivePresenter *xptr = XIVE_PRESENTER(xive);
  hwaddr hw_page_offset;
  uint32_t pir;
  XiveTCTX *tctx;
@@ -1633,7 +1634,7 @@ static uint64_t pnv_xive2_ic_tm_indirect_read(void 
*opaque, hwaddr offset,
  hw_page_offset = pnv_xive2_ic_tm_get_hw_page_offset(xive, offset);
  tctx = pnv_xive2_get_indirect_tctx(xive, pir);
  if (tctx) {
-val = xive_tctx_tm_read(NULL, tctx, hw_page_offset, size);
+val = xive_tctx_tm_read(xptr, tctx, hw_page_offset, size);
  }
  
  return val;

@@ -1643,6 +1644,7 @@ static void pnv_xive2_ic_tm_indirect_write(void *opaque, 
hwaddr offset,
 uint64_t val, unsigned size)
  {
  PnvXive2 *xive = PNV_XIVE2(opaque);
+XivePresenter *xptr = XIVE_PRESENTER(xive);
  hwaddr hw_page_offset;
  uint32_t pir;
  XiveTCTX *tctx;
@@ -1651,7 +1653,7 @@ static void pnv_xive2_ic_tm_indirect_write(void *opaque, 
hwaddr offset,
  hw_page_offset = pnv_xive2_ic_tm_get_hw_page_offset(xive, offset);
  tctx = pnv_xive2_get_indirect_tctx(xive, pir);
  if (tctx) {
-xive_tctx_tm_write(NULL, tctx, hw_page_offset, val, size);
+xive_tctx_tm_write(xptr, tctx, hw_page_offset, val, size);
  }
  }

Re: [PATCH] ppc/pegasos2: Add support for -initrd command line option

2023-07-05 Thread Cédric Le Goater


Hello,

On 7/4/23 20:19, BALATON Zoltan wrote:

Signed-off-by: BALATON Zoltan 

The patch is not a oneliner, it deserves a minimal commit log.

Thanks,

C.


---
  hw/ppc/pegasos2.c | 32 +++-
  1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/pegasos2.c b/hw/ppc/pegasos2.c
index af5489de26..9c9944188b 100644
--- a/hw/ppc/pegasos2.c
+++ b/hw/ppc/pegasos2.c
@@ -44,6 +44,8 @@
  #define PROM_ADDR 0xfff0
  #define PROM_SIZE 0x8
  
+#define INITRD_MIN_ADDR 0x60

+
  #define KVMPPC_HCALL_BASE0xf000
  #define KVMPPC_H_RTAS(KVMPPC_HCALL_BASE + 0x0)
  #define KVMPPC_H_VOF_CLIENT  (KVMPPC_HCALL_BASE + 0x5)
@@ -80,6 +82,8 @@ struct Pegasos2MachineState {
  uint64_t kernel_addr;
  uint64_t kernel_entry;
  uint64_t kernel_size;
+uint64_t initrd_addr;
+uint64_t initrd_size;
  };
  
  static void *build_fdt(MachineState *machine, int *fdt_size);

@@ -117,7 +121,8 @@ static void pegasos2_init(MachineState *machine)
  I2CBus *i2c_bus;
  const char *fwname = machine->firmware ?: PROM_FILENAME;
  char *filename;
-int i, sz;
+int i;
+ssize_t sz;
  uint8_t *spd_data;
  
  /* init CPU */

@@ -213,6 +218,20 @@ static void pegasos2_init(MachineState *machine)
  warn_report("Using Virtual OpenFirmware but no -kernel option.");
  }
  
+if (machine->initrd_filename) {

+pm->initrd_addr = pm->kernel_addr + pm->kernel_size + 64 * KiB;
+pm->initrd_addr = ROUND_UP(pm->initrd_addr, 4);
+pm->initrd_addr = MAX(pm->initrd_addr, INITRD_MIN_ADDR);
+sz = load_image_targphys(machine->initrd_filename, pm->initrd_addr,
+ machine->ram_size - pm->initrd_addr);
+if (sz <= 0) {
+error_report("Could not load initrd '%s'",
+ machine->initrd_filename);
+exit(1);
+}
+pm->initrd_size = sz;
+}
+
  if (!pm->vof && machine->kernel_cmdline && machine->kernel_cmdline[0]) {
  warn_report("Option -append may be ineffective with -bios.");
  }
@@ -335,6 +354,11 @@ static void pegasos2_machine_reset(MachineState *machine, 
ShutdownCause reason)
  error_report("Memory for kernel is in use");
  exit(1);
  }
+if (pm->initrd_size &&
+vof_claim(pm->vof, pm->initrd_addr, pm->initrd_size, 0) == -1) {
+error_report("Memory for initrd is in use");
+exit(1);
+}
  fdt = build_fdt(machine, &sz);
  /* FIXME: VOF assumes entry is same as load address */
  d[0] = cpu_to_be64(pm->kernel_entry);
@@ -966,6 +990,12 @@ static void *build_fdt(MachineState *machine, int 
*fdt_size)
  qemu_fdt_setprop_string(fdt, "/memory@0", "name", "memory");
  
  qemu_fdt_add_subnode(fdt, "/chosen");

+if (pm->initrd_addr && pm->initrd_size) {
+qemu_fdt_setprop_cell(fdt, "/chosen", "linux,initrd-end",
+  pm->initrd_addr + pm->initrd_size);
+qemu_fdt_setprop_cell(fdt, "/chosen", "linux,initrd-start",
+  pm->initrd_addr);
+}
  qemu_fdt_setprop_string(fdt, "/chosen", "bootargs",
  machine->kernel_cmdline ?: "");
  qemu_fdt_setprop_string(fdt, "/chosen", "name", "chosen");

Re: [PATCH] ppc/pegasos2: Add support for -initrd command line option

2023-07-05 Thread Cédric Le Goater


On 7/5/23 11:35, Cédric Le Goater wrote:

Hello,

On 7/4/23 20:19, BALATON Zoltan wrote:

Signed-off-by: BALATON Zoltan 

The patch is not a oneliner, it deserves a minimal commit log.


It was fixed later on. Forget my comment.

Thanks,

C.

Re: [PATCH v2] hw/ide/piix: properly initialize the BMIBA register

2023-07-05 Thread Olaf Hering

Tue, 4 Jul 2023 08:38:33 +0200 Paolo Bonzini :

> I agree that calling pci_device_reset() would be a better match for 
> pci_xen_ide_unplug().

This change works as well:

--- a/hw/i386/xen/xen_platform.c
+++ b/hw/i386/xen/xen_platform.c
@@ -164,8 +164,9 @@ static void pci_unplug_nics(PCIBus *bus)
  *
  * [1] 
https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/misc/hvm-emulated-unplug.pandoc
  */
-static void pci_xen_ide_unplug(DeviceState *dev, bool aux)
+static void pci_xen_ide_unplug(PCIDevice *d, bool aux)
 {
+DeviceState *dev = DEVICE(d);
 PCIIDEState *pci_ide;
 int i;
 IDEDevice *idedev;
@@ -195,7 +196,7 @@ static void pci_xen_ide_unplug(DeviceState *dev, bool aux)
 blk_unref(blk);
 }
 }
-device_cold_reset(dev);
+pci_device_reset(d);
 }
 
 static void unplug_disks(PCIBus *b, PCIDevice *d, void *opaque)
@@ -210,7 +211,7 @@ static void unplug_disks(PCIBus *b, PCIDevice *d, void 
*opaque)
 
 switch (pci_get_word(d->config + PCI_CLASS_DEVICE)) {
 case PCI_CLASS_STORAGE_IDE:
-pci_xen_ide_unplug(DEVICE(d), aux);
+pci_xen_ide_unplug(d, aux);
 break;
 
 case PCI_CLASS_STORAGE_SCSI:
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -118,7 +118,6 @@ static void piix_ide_reset(DeviceState *dev)
 pci_set_word(pci_conf + PCI_COMMAND, 0x);
 pci_set_word(pci_conf + PCI_STATUS,
  PCI_STATUS_DEVSEL_MEDIUM | PCI_STATUS_FAST_BACK);
-pci_set_byte(pci_conf + 0x20, 0x01);  /* BMIBA: 20-23h */
 }
 
 static bool pci_piix_init_bus(PCIIDEState *d, unsigned i, Error **errp)


Olaf


pgpkjwqzOkjC0.pgp
Description: Digitale Signatur von OpenPGP

Re: [PATCH v3] vhost-user: delay vhost_user_stop

2023-07-05 Thread Philippe Mathieu-Daudé


Hi Marc-André,

[very old patch...]

On 27/2/17 11:49, Marc-André Lureau wrote:

Since commit b0a335e351103bf92f3f9d0bd5759311be8156ac, a socket write
may trigger a disconnect events, calling vhost_user_stop() and clearing
all the vhost_dev strutures holding data that vhost.c functions expect
to remain valid. Delay the cleanup to keep the vhost_dev structure
valid during the vhost.c functions.

Signed-off-by: Marc-André Lureau 
---
v3:
  - use aio_bh_schedule_oneshot(), as suggest by Paolo
v2:
  - fix reconnect race

net/vhost-user.c | 53 ++---
  1 file changed, 46 insertions(+), 7 deletions(-)

diff --git a/net/vhost-user.c b/net/vhost-user.c
index 77b8110f8c..e7e63408a1 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -190,7 +190,35 @@ static gboolean net_vhost_user_watch(GIOChannel *chan, 
GIOCondition cond,
  
  qemu_chr_fe_disconnect(&s->chr);
  
-return FALSE;

+return TRUE;


Do you mind explaining this change again, it is not clear from
the commit description. We listen to G_IO_HUP, got a SIGHUP so
we disconnect the chardev but keep listening for future HUP?
In which case can that happen? How can we get a chardev connected
and initialized here without calling net_init_vhost_user() again?


+}
+
+static void net_vhost_user_event(void *opaque, int event);
+
+static void chr_closed_bh(void *opaque)
+{
+const char *name = opaque;
+NetClientState *ncs[MAX_QUEUE_NUM];
+VhostUserState *s;
+Error *err = NULL;
+int queues;
+
+queues = qemu_find_net_clients_except(name, ncs,
+  NET_CLIENT_DRIVER_NIC,
+  MAX_QUEUE_NUM);
+assert(queues < MAX_QUEUE_NUM);
+
+s = DO_UPCAST(VhostUserState, nc, ncs[0]);
+
+qmp_set_link(name, false, &err);
+vhost_user_stop(queues, ncs);
+
+qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, net_vhost_user_event,
+ opaque, NULL, true);
+
+if (err) {
+error_report_err(err);
+}
  }
  
  static void net_vhost_user_event(void *opaque, int event)

@@ -212,20 +240,31 @@ static void net_vhost_user_event(void *opaque, int event)
  trace_vhost_user_event(chr->label, event);
  switch (event) {
  case CHR_EVENT_OPENED:
-s->watch = qemu_chr_fe_add_watch(&s->chr, G_IO_HUP,
- net_vhost_user_watch, s);
  if (vhost_user_start(queues, ncs, &s->chr) < 0) {
  qemu_chr_fe_disconnect(&s->chr);
  return;
  }
+s->watch = qemu_chr_fe_add_watch(&s->chr, G_IO_HUP,
+ net_vhost_user_watch, s);
  qmp_set_link(name, true, &err);
  s->started = true;
  break;
  case CHR_EVENT_CLOSED:
-qmp_set_link(name, false, &err);
-vhost_user_stop(queues, ncs);
-g_source_remove(s->watch);
-s->watch = 0;
+/* a close event may happen during a read/write, but vhost
+ * code assumes the vhost_dev remains setup, so delay the
+ * stop & clear to idle.
+ * FIXME: better handle failure in vhost code, remove bh
+ */
+if (s->watch) {
+AioContext *ctx = qemu_get_current_aio_context();
+
+g_source_remove(s->watch);
+s->watch = 0;
+qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, NULL,
+ NULL, NULL, false);
+
+aio_bh_schedule_oneshot(ctx, chr_closed_bh, opaque);
+}
  break;
  }

Re: [PATCH] ppc/pnv: Set P10 core xscom region size to match hardware

2023-07-05 Thread Cédric Le Goater


On 7/5/23 04:05, Joel Stanley wrote:

On Wed, 5 Jul 2023 at 01:27, Nicholas Piggin  wrote:


The P10 core xscom memory regions overlap because the size is wrong.
The P10 core+L2 xscom region size is allocated as 0x1000 (with some
unused ranges). "EC" is used as a closer match, as "EX" includes L3
which has a disjoint xscom range that would require a different
region if it were implemented.

Signed-off-by: Nicholas Piggin 


Nice, that looks better:

0001-0001000f (prio 0, i/o): xscom-quad.0: 0x10
000100108000-00010010 (prio 0, i/o): xscom-core.3: 0x8000
00010011-000100117fff (prio 0, i/o): xscom-core.2: 0x8000
00010012-000100127fff (prio 0, i/o): xscom-core.1: 0x8000
00010014-000100147fff (prio 0, i/o): xscom-core.0: 0x8000
00010800-0001080f (prio 0, i/o): xscom-quad.4: 0x10
000108108000-00010810 (prio 0, i/o): xscom-core.7: 0x8000
00010811-000108117fff (prio 0, i/o): xscom-core.6: 0x8000
00010812-000108127fff (prio 0, i/o): xscom-core.5: 0x8000
00010814-000108147fff (prio 0, i/o): xscom-core.4: 0x8000

Reviewed-by: Joel Stanley 


It'd interesting to add some dummy SLW handlers to get rid of the
XSCOM errors at boot and shutdown on P10 :

[ 4824.393446266,3] XSCOM: write error gcid=0x0 pcb_addr=0x200e883c stat=0x0
[ 4824.393588777,5] Unable to log error
[ 4824.393650582,3] XSCOM: Write failed, ret =  -6
[ 4824.394124623,3] Could not set special wakeup on 0:0: Unable to write 
QME_SPWU_HYP.
[ 4824.394368459,3] XSCOM: write error gcid=0x0 pcb_addr=0x200e883c stat=0x0
[ 4824.394382007,5] Unable to log error
[ 4824.394384603,3] XSCOM: Write failed, ret =  -6

Thanks,

C.





---
  hw/ppc/pnv_core.c  | 3 +--
  include/hw/ppc/pnv_xscom.h | 2 +-
  2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/pnv_core.c b/hw/ppc/pnv_core.c
index b7223bb445..ffbc29cbf4 100644
--- a/hw/ppc/pnv_core.c
+++ b/hw/ppc/pnv_core.c
@@ -299,9 +299,8 @@ static void pnv_core_realize(DeviceState *dev, Error **errp)
  }

  snprintf(name, sizeof(name), "xscom-core.%d", cc->core_id);
-/* TODO: check PNV_XSCOM_EX_SIZE for p10 */
  pnv_xscom_region_init(&pc->xscom_regs, OBJECT(dev), pcc->xscom_ops,
-  pc, name, PNV_XSCOM_EX_SIZE);
+  pc, name, PNV10_XSCOM_EC_SIZE);

  qemu_register_reset(pnv_core_reset, pc);
  return;
diff --git a/include/hw/ppc/pnv_xscom.h b/include/hw/ppc/pnv_xscom.h
index f7da9a1dc6..a4c9d95dc5 100644
--- a/include/hw/ppc/pnv_xscom.h
+++ b/include/hw/ppc/pnv_xscom.h
@@ -133,7 +133,7 @@ struct PnvXScomInterfaceClass {

  #define PNV10_XSCOM_EC_BASE(core) \
  ((uint64_t) PNV10_XSCOM_EQ_BASE(core) | PNV10_XSCOM_EC(core & 0x3))
-#define PNV10_XSCOM_EC_SIZE0x10
+#define PNV10_XSCOM_EC_SIZE0x1000

  #define PNV10_XSCOM_PSIHB_BASE 0x3011D00
  #define PNV10_XSCOM_PSIHB_SIZE 0x100
--
2.40.1

Re: [PATCH v21 00/20] s390x: CPU Topology

2023-07-05 Thread Thomas Huth


On 30/06/2023 11.17, Pierre Morel wrote:
...

Testing
===

To use the QEMU patches, you will need Linux V6-rc1 or newer,
or use the following Linux mainline patches:

f5ecfee94493 2022-07-20 KVM: s390: resetting the Topology-Change-Report
24fe0195bc19 2022-07-20 KVM: s390: guest support for topology function
0130337ec45b 2022-07-20 KVM: s390: Cleanup ipte lock access and SIIF fac..

Currently this code is for KVM only, I have no idea if it is interesting
to provide a TCG patch. If ever it will be done in another series.

This series provide 12 avocado tests using Fedora-35 kernel and initrd
image.


 Hi Pierre,

the new avocado tests currently fail if you run them on a x86 host. Could 
you please add a check that they are properly skipped instead if the 
environment does not match? I guess a


 self.require_accelerator('kvm')

should do the job...

 Thomas

RE: [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO device assignment

2023-07-05 Thread Duan, Zhenzhong

>-Original Message-
>From: Jean-Philippe Brucker 
>Sent: Wednesday, July 5, 2023 4:29 PM
>Subject: Re: [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO device
>assignment
>
>On Wed, Jul 05, 2023 at 04:52:09AM +, Duan, Zhenzhong wrote:
>> Hi Eric,
>>
>> >-Original Message-
>> >From: Eric Auger 
>> >Sent: Tuesday, July 4, 2023 7:15 PM
>> >Subject: [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO
>> >device assignment
>> >
>> >When running on a 64kB page size host and protecting a VFIO device
>> >with the virtio-iommu, qemu crashes with this kind of message:
>> >
>> >qemu-kvm: virtio-iommu page mask 0xf000 is incompatible
>> >with mask 0x2001
>>
>> Does 0x2001 mean only  512MB and 64KB super page mapping is
>> supported for host iommu hw? 4KB mapping not supported?
>
>It's not a restriction by the HW IOMMU, but the host kernel. An Arm SMMU
>can implement 4KB, 16KB and/or 64KB granules, but the host kernel only
>advertises through VFIO the granule corresponding to host PAGE_SIZE. This
>restriction is done by arm_lpae_restrict_pgsizes() in order to choose a page
>size when a device is driven by the host.

Just curious why not advertises the Arm SMMU implemented granules to VFIO
Eg:4KB, 16KB or 64KB granules? But arm_lpae_restrict_pgsizes() restricted ones,
Eg: for SZ_4K, (SZ_4K | SZ_2M | SZ_1G).
(SZ_4K | SZ_2M | SZ_1G) looks not real hardware granules of Arm SMMU.

>
>>
>> There is a check in guest kernel side hint only 4KB is supported, with
>> this patch we force viommu->pgsize_bitmap to 0x2001 and fail below
>> check? Does this device work in guest?
>> Please correct me if I understand wrong.
>
>Right, a guest using 4KB pages under a host that uses 64KB is not supported,
>because if the guest attempts to dma_map a 4K page, the IOMMU cannot
>create a mapping small enough, the mapping would have to spill over
>neighbouring guest memory.
>
>One possible solution would be supporting multiple page granules. If we
>added a page granule negotiation through VFIO and virtio-iommu then the
>guest could pick the page size it wants. But this requires changes to Linux 
>UAPI
>so isn't a priority at the moment, because we're trying to enable nesting which
>would support 64K-host/4K-guest as well.
>
>See also the discussion on the patch that introduced the guest check
>https://lore.kernel.org/linux-iommu/20200318114047.1518048-1-jean-
>phili...@linaro.org/

Clear, thanks for sharing the history.

Regards
Zhenzhong

Re: [PATCH v21 16/20] tests/avocado: s390x cpu topology entitlement tests

2023-07-05 Thread Thomas Huth


On 30/06/2023 11.17, Pierre Morel wrote:

This test takes care to check the changes on different entitlements
when the guest requests a polarization change.

Signed-off-by: Pierre Morel 
---
  tests/avocado/s390_topology.py | 47 ++
  1 file changed, 47 insertions(+)

diff --git a/tests/avocado/s390_topology.py b/tests/avocado/s390_topology.py
index 2cf731cb1d..4855e5d7e4 100644
--- a/tests/avocado/s390_topology.py
+++ b/tests/avocado/s390_topology.py
@@ -240,3 +240,50 @@ def test_polarisation(self):
  res = self.vm.qmp('query-cpu-polarization')
  self.assertEqual(res['return']['polarization'], 'horizontal')
  self.check_topology(0, 0, 0, 0, 'medium', False)
+
+def test_entitlement(self):
+"""
+This test verifies that QEMU modifies the polarization
+after a guest request.

...

+self.check_topology(0, 0, 0, 0, 'low', False)
+self.check_topology(1, 0, 0, 0, 'medium', False)
+self.check_topology(2, 1, 0, 0, 'high', False)
+self.check_topology(3, 1, 0, 0, 'high', False)
+
+self.guest_set_dispatching('1');
+
+self.check_topology(0, 0, 0, 0, 'low', False)
+self.check_topology(1, 0, 0, 0, 'medium', False)
+self.check_topology(2, 1, 0, 0, 'high', False)
+self.check_topology(3, 1, 0, 0, 'high', False)
+
+self.guest_set_dispatching('0');
+
+self.check_topology(0, 0, 0, 0, 'low', False)
+self.check_topology(1, 0, 0, 0, 'medium', False)
+self.check_topology(2, 1, 0, 0, 'high', False)
+self.check_topology(3, 1, 0, 0, 'high', False)


Sorry, I think I'm too blind to see it, but what has changed after the guest 
changed the polarization?


 Thomas

Re: [PATCH v2] linux-user/syscall: Implement execve without execveat

2023-07-05 Thread Michael Tokarev


05.07.2023 12:00, Pierrick Bouvier wrote:
...

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 08162cc966..4945ddd7f2 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -659,6 +659,7 @@ safe_syscall4(pid_t, wait4, pid_t, pid, int *, status, int, 
options, \
  #endif
  safe_syscall5(int, waitid, idtype_t, idtype, id_t, id, siginfo_t *, infop, \
int, options, struct rusage *, rusage)
+safe_syscall3(int, execve, const char *, filename, char **, argv, char **, 
envp)
  safe_syscall5(int, execveat, int, dirfd, const char *, filename,
char **, argv, char **, envp, int, flags)
  #if defined(TARGET_NR_select) || defined(TARGET_NR__newselect) || \
@@ -8615,9 +8616,9 @@ ssize_t do_guest_readlink(const char *pathname, char 
*buf, size_t bufsiz)
  return ret;
  }
  
-static int do_execveat(CPUArchState *cpu_env, int dirfd,

-   abi_long pathname, abi_long guest_argp,
-   abi_long guest_envp, int flags)
+static int do_execv(CPUArchState *cpu_env, int dirfd,
+abi_long pathname, abi_long guest_argp,
+abi_long guest_envp, int flags, bool is_execveat)
  {
  int ret;
  char **argp, **envp;
@@ -8696,11 +8697,14 @@ static int do_execveat(CPUArchState *cpu_env, int dirfd,
  goto execve_efault;
  }
  
+const char* exe = p;

  if (is_proc_myself(p, "exe")) {
-ret = get_errno(safe_execveat(dirfd, exec_path, argp, envp, flags));
-} else {
-ret = get_errno(safe_execveat(dirfd, p, argp, envp, flags));
+exe = exec_path;
  }
+ret = is_execveat ?
+safe_execveat(dirfd, exe, argp, envp, flags):
+safe_execve(exe, argp, envp);
+ret = get_errno(ret);



This one has 2 issues reported by checkpatch.pl:

$ git show | ./scripts/checkpatch.pl -

ERROR: "foo* bar" should be "foo *bar"
#161: FILE: linux-user/syscall.c:8700:
+const char* exe = p;

ERROR: spaces required around that ':' (ctx:VxE)
#169: FILE: linux-user/syscall.c:8705:
+safe_execveat(dirfd, exe, argp, envp, flags):
 ^

total: 2 errors, 0 warnings, 47 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

As I mentioned in the v1, I don't remember offhand how the arithmetic if
should be styled in qemu.  At the very best, the v2 variant is difficult
to read because ":" is too close to ";" visually, an extra space before
it will make it more explicit.

Other than that, I'm fine with this version.

With the checkpatch issues fixed,

Reviewed-by: Michael Tokarev 

Thanks,

/mjt

Re: [PATCH v21 18/20] tests/avocado: s390x cpu topology test socket full

2023-07-05 Thread Thomas Huth


On 30/06/2023 11.17, Pierre Morel wrote:

This test verifies that QMP set-cpu-topology does not accept
to overload a socket.

Signed-off-by: Pierre Morel 
---
  tests/avocado/s390_topology.py | 25 +
  1 file changed, 25 insertions(+)

diff --git a/tests/avocado/s390_topology.py b/tests/avocado/s390_topology.py
index cba44bec91..0003b30702 100644
--- a/tests/avocado/s390_topology.py
+++ b/tests/avocado/s390_topology.py
@@ -315,3 +315,28 @@ def test_dedicated(self):
  self.guest_set_dispatching('0');
  
  self.check_topology(0, 0, 0, 0, 'high', True)

+
+def test_socket_full(self):
+"""
+This test verifies that QEMU does not accept to overload a socket.
+The socket-id 0 on book-id 0 already contains CPUs 0 and 1 and can
+not accept any new CPU while socket-id 0 on book-id 1 is free.
+
+:avocado: tags=arch:s390x
+:avocado: tags=machine:s390-ccw-virtio
+"""
+self.kernel_init()
+self.vm.add_args('-smp',
+ '3,drawers=2,books=2,sockets=3,cores=2,maxcpus=24')
+self.vm.launch()
+self.wait_until_booted()
+
+self.system_init()
+
+res = self.vm.qmp('set-cpu-topology',
+  {'core-id': 2, 'socket-id': 0, 'book-id': 0})
+self.assertEqual(res['error']['class'], 'GenericError')
+
+res = self.vm.qmp('set-cpu-topology',
+  {'core-id': 2, 'socket-id': 0, 'book-id': 1})
+self.assertEqual(res['return'], {})


Reviewed-by: Thomas Huth

Re: [PATCH v21 19/20] tests/avocado: s390x cpu topology dedicated errors

2023-07-05 Thread Thomas Huth


On 30/06/2023 11.17, Pierre Morel wrote:

Let's test that QEMU refuses to setup a dedicated CPU with
low or medium entitlement.

Signed-off-by: Pierre Morel 
---
  tests/avocado/s390_topology.py | 48 ++
  1 file changed, 48 insertions(+)


Reviewed-by: Thomas Huth

Re: [PATCH v21 20/20] tests/avocado: s390x cpu topology bad move

2023-07-05 Thread Thomas Huth


On 30/06/2023 11.17, Pierre Morel wrote:

This test verifies that QEMU refuses to move a CPU to an
unexistant location.


s/unexistant/nonexistent/ ?


Signed-off-by: Pierre Morel 
---
  tests/avocado/s390_topology.py | 25 +
  1 file changed, 25 insertions(+)

diff --git a/tests/avocado/s390_topology.py b/tests/avocado/s390_topology.py
index 99d9508cef..ea39168b53 100644
--- a/tests/avocado/s390_topology.py
+++ b/tests/avocado/s390_topology.py
@@ -388,3 +388,28 @@ def test_dedicated_error(self):
  res = self.vm.qmp('set-cpu-topology',
{'core-id': 0, 'entitlement': 'medium', 
'dedicated': False})
  self.assertEqual(res['return'], {})
+
+def test_move_error(self):
+"""
+This test verifies that QEMU refuses to move a CPU to an
+unexistant location


s/unexistant/nonexistent/ ?

With the words fixed:
Reviewed-by: Thomas Huth

Re: [PATCH v7 5/6] hw/pci: ensure PCIE devices are plugged into only slot 0 of PCIE port

2023-07-05 Thread Akihiko Odaki


On 2023/07/05 14:43, Ani Sinha wrote:




On 05-Jul-2023, at 7:09 AM, Akihiko Odaki  wrote:



On 2023/07/05 0:07, Ani Sinha wrote:

On 04-Jul-2023, at 7:58 PM, Igor Mammedov  wrote:

On Tue, 4 Jul 2023 19:20:00 +0530
Ani Sinha  wrote:


On 04-Jul-2023, at 6:18 PM, Igor Mammedov  wrote:

On Tue, 4 Jul 2023 21:02:09 +0900
Akihiko Odaki  wrote:


On 2023/07/04 20:59, Ani Sinha wrote:




On 04-Jul-2023, at 5:24 PM, Akihiko Odaki  wrote:

On 2023/07/04 20:25, Ani Sinha wrote:

PCI Express ports only have one slot, so PCI Express devices can only be
plugged into slot 0 on a PCIE port. Add a warning to let users know when the
invalid configuration is used. We may enforce this more strongly later on once
we get more clarity on whether we are introducing a bad regression for users
currenly using the wrong configuration.
The change has been tested to not break or alter behaviors of ARI capable
devices by instantiating seven vfs on an emulated igb device (the maximum
number of vfs the linux igb driver supports). The vfs instantiated correctly
and are seen to have non-zero device/slot numbers in the conventional PCI BDF
representation.
CC: jus...@redhat.com
CC: imamm...@redhat.com
CC: m...@redhat.com
CC: akihiko.od...@daynix.com
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
Signed-off-by: Ani Sinha 
Reviewed-by: Julia Suvorova 
---
hw/pci/pci.c | 15 +++
1 file changed, 15 insertions(+)
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index e2eb4c3b4a..47517ba3db 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -65,6 +65,7 @@ bool pci_available = true;
static char *pcibus_get_dev_path(DeviceState *dev);
static char *pcibus_get_fw_dev_path(DeviceState *dev);
static void pcibus_reset(BusState *qbus);
+static bool pcie_has_upstream_port(PCIDevice *dev);
   static Property pci_props[] = {
 DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
@@ -2121,6 +2122,20 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
**errp)
 }
 }
+/*
+ * With SRIOV and ARI, vfs can have non-zero slot in the conventional
+ * PCI interpretation as all five bits reserved for slot addresses are
+ * also used for function bits for the various vfs. Ignore that case.


You don't have to mention SR/IOV; it affects all ARI-capable devices. A PF can 
also have non-zero slot number in the conventional interpretation so you 
shouldn't call it vf either.


Can you please help write a comment that explains this properly for all cases - 
ARI/non-ARI, PFs and VFs? Once everyone agrees that its clear and correct, I 
will re-spin.


Simply, you can say:
With ARI, the slot number field in the conventional PCI interpretation
can have a non-zero value as the field bits are reused to extend the
function number bits. Ignore that case.


mentioning 'conventional PCI interpretation' in comment and then immediately
checking 'pci_is_express(pci_dev)' is confusing. Since comment belongs
only to PCIE branch it would be better to talk in only about PCIe stuff
and referring to relevant portions of spec.


Ok so how about this?

   * With ARI, devices can have non-zero slot in the traditional BDF
 * representation as all five bits reserved for slot addresses are
 * also used for function bits. Ignore that case.


you still refer to traditional (which I misread as 'conventional'),
steal the linux comment and argument it with ARI if necessary,
something like this (probably needs some more massaging):

The comment messaging in these patches seems to exceed the value of the patch 
itself :-)
How about this?
 /*
  * A PCIe Downstream Port normally leads to a Link with only Device
  * 0 on it (PCIe spec r3.1, sec 7.3.1).
  * With ARI, PCI_SLOT() can return non-zero value as all five bits
  * reserved for slot addresses are also used for function bits.
  * Hence, ignore ARI capable devices.
  */


Perhaps: s/normally leads to/must lead to/

 From the kernel perspective, they may need to deal with a quirky hardware that 
does not conform with the specification, but from QEMU perspective, it is what 
we *must* conform with.


PCI base spec 4.0, rev 3, section 7.3.1 says:

"
Downstream Ports that do not have ARI Forwarding enabled must associate only 
Device 0 with the device attached to the Logical Bus representing the Link from 
the Port. Configuration Requests 15 targeting the Bus Number associated with a 
Link specifying Device Number 0 are delivered to the device attached to the 
Link; Configuration Requests specifying all other Device Numbers (1-31) must be 
terminated by the Switch Downstream Port or the Root Port with an Unsupported 
Request Completion Status (equivalent to Master Abort in PCI). Non-ARI Devices 
must not assume that Device Number 0 is associated with their Upstream Port, 
but must capture their assigned Device Number as discussed in Section 2.2.6.2. 
Non-ARI Devices must respond to all Type 0 Configuration Read Requests, 
regardless of the Device Number specified in

[PATCH] riscv: add config for asid size

2023-07-05 Thread Ben Dooks

Add a config to the cpu state to control the size of the ASID area
in the SATP CSR to enable testing with smaller than the default (which
is currently maximum for both rv32 and rv64). It also adds the ability
to stop the ASID feature by using 0 to disable it.

For example, an rv64 with only 8 asid bits:
-cpu rv64,asid-bits=8

or no asids:
-cpu rv64,asid-bits=0

Signed-off-by: Ben Dooks 
---
 target/riscv/cpu.c  | 42 +
 target/riscv/cpu.h  |  1 +
 target/riscv/cpu_bits.h |  2 ++
 target/riscv/cpu_cfg.h  |  1 +
 target/riscv/csr.c  |  1 +
 5 files changed, 47 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 4035fe0e62..c703005aba 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1281,6 +1281,38 @@ static void riscv_cpu_satp_mode_finalize(RISCVCPU *cpu, 
Error **errp)
 }
 }
 }
+
+static void riscv_cpu_asid_finalized_features(RISCVCPU *cpu, Error **errp)
+{
+bool rv32 = riscv_cpu_mxl(&cpu->env) == MXL_RV32;
+target_ulong asid_mask, asid_shift;
+target_ulong calc_mask;
+
+if (rv32) {
+asid_mask = SATP32_ASID;
+asid_shift = SATP32_ASID_SHIFT;
+} else {
+asid_mask = SATP64_ASID;
+asid_shift = SATP64_ASID_SHIFT;
+}
+
+if (cpu->cfg.asid_bits < 0) {
+cpu->env.asid_clear = 0;
+return;
+}
+
+calc_mask = ((target_ulong)1 << cpu->cfg.asid_bits) - 1;
+calc_mask <<= asid_shift;
+
+if (calc_mask > asid_mask) {
+error_setg(errp, "invalid ASID bits [0 %d]",
+   __builtin_clz(asid_mask >> asid_shift));
+return;
+}
+
+cpu->env.asid_clear = calc_mask ^ asid_mask;
+}
+
 #endif
 
 static void riscv_cpu_finalize_features(RISCVCPU *cpu, Error **errp)
@@ -1293,6 +1325,12 @@ static void riscv_cpu_finalize_features(RISCVCPU *cpu, 
Error **errp)
 error_propagate(errp, local_err);
 return;
 }
+
+riscv_cpu_asid_finalized_features(cpu, &local_err);
+if (local_err != NULL) {
+error_propagate(errp, local_err);
+return;
+}
 #endif
 }
 
@@ -1648,6 +1686,10 @@ static Property riscv_cpu_extensions[] = {
 DEFINE_PROP_BOOL("zicboz", RISCVCPU, cfg.ext_icboz, true),
 DEFINE_PROP_UINT16("cboz_blocksize", RISCVCPU, cfg.cboz_blocksize, 64),
 
+#ifndef CONFIG_USER_ONLY
+DEFINE_PROP_INT32("asid-bits",  RISCVCPU, cfg.asid_bits, -1),
+#endif
+
 DEFINE_PROP_BOOL("zmmul", RISCVCPU, cfg.ext_zmmul, false),
 
 DEFINE_PROP_BOOL("zca", RISCVCPU, cfg.ext_zca, false),
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 7adb8706ac..5b35770795 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -194,6 +194,7 @@ struct CPUArchState {
 uint64_t mideleg;
 
 target_ulong satp;   /* since: priv-1.10.0 */
+target_ulong asid_clear;/* always clear these bits in satp */
 target_ulong stval;
 target_ulong medeleg;
 
diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index 59f0ffd9e1..fd753ce3f4 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -617,11 +617,13 @@ typedef enum {
 /* RV32 satp CSR field masks */
 #define SATP32_MODE 0x8000
 #define SATP32_ASID 0x7fc0
+#define SATP32_ASID_SHIFT   22
 #define SATP32_PPN  0x003f
 
 /* RV64 satp CSR field masks */
 #define SATP64_MODE 0xF000ULL
 #define SATP64_ASID 0x0000ULL
+#define SATP64_ASID_SHIFT   44
 #define SATP64_PPN  0x0FFFULL
 
 /* VM modes (satp.mode) privileged ISA 1.10 */
diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h
index c4a627d335..4d578797cc 100644
--- a/target/riscv/cpu_cfg.h
+++ b/target/riscv/cpu_cfg.h
@@ -128,6 +128,7 @@ struct RISCVCPUConfig {
 bool short_isa_string;
 
 #ifndef CONFIG_USER_ONLY
+int32_t asid_bits;
 RISCVSATPMap satp_mode;
 #endif
 };
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 58499b5afc..215b71bd31 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -2731,6 +2731,7 @@ static RISCVException write_satp(CPURISCVState *env, int 
csrno,
 return RISCV_EXCP_NONE;
 }
 
+val &= ~env->asid_clear;
 if (riscv_cpu_mxl(env) == MXL_RV32) {
 vm = validate_vm(env, get_field(val, SATP32_MODE));
 mask = (val ^ env->satp) & (SATP32_MODE | SATP32_ASID | SATP32_PPN);
-- 
2.40.1

[PATCH] pnv/xive: Print CPU target in all TIMA traces

2023-07-05 Thread Frederic Barrat

Add the CPU target in the trace when reading/writing the TIMA
space. It was already done for other TIMA ops (notify, accept, ...),
only missing for those 2. Useful for debug and even more now that we
experiment with SMT.

Signed-off-by: Frederic Barrat 
---
 hw/intc/trace-events | 4 ++--
 hw/intc/xive.c   | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/intc/trace-events b/hw/intc/trace-events
index 5c6094c457..36ff71f947 100644
--- a/hw/intc/trace-events
+++ b/hw/intc/trace-events
@@ -265,8 +265,8 @@ xive_source_esb_read(uint64_t addr, uint32_t srcno, 
uint64_t value) "@0x%"PRIx64
 xive_source_esb_write(uint64_t addr, uint32_t srcno, uint64_t value) 
"@0x%"PRIx64" IRQ 0x%x val=0x%"PRIx64
 xive_router_end_notify(uint8_t end_blk, uint32_t end_idx, uint32_t end_data) 
"END 0x%02x/0x%04x -> enqueue 0x%08x"
 xive_router_end_escalate(uint8_t end_blk, uint32_t end_idx, uint8_t esc_blk, 
uint32_t esc_idx, uint32_t end_data) "END 0x%02x/0x%04x -> escalate END 
0x%02x/0x%04x data 0x%08x"
-xive_tctx_tm_write(uint64_t offset, unsigned int size, uint64_t value) 
"@0x%"PRIx64" sz=%d val=0x%" PRIx64
-xive_tctx_tm_read(uint64_t offset, unsigned int size, uint64_t value) 
"@0x%"PRIx64" sz=%d val=0x%" PRIx64
+xive_tctx_tm_write(uint32_t index, uint64_t offset, unsigned int size, 
uint64_t value) "target=%d @0x%"PRIx64" sz=%d val=0x%" PRIx64
+xive_tctx_tm_read(uint32_t index, uint64_t offset, unsigned int size, uint64_t 
value) "target=%d @0x%"PRIx64" sz=%d val=0x%" PRIx64
 xive_presenter_notify(uint8_t nvt_blk, uint32_t nvt_idx, uint8_t ring) "found 
NVT 0x%x/0x%x ring=0x%x"
 xive_end_source_read(uint8_t end_blk, uint32_t end_idx, uint64_t addr) "END 
0x%x/0x%x @0x%"PRIx64
 
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index c014e961a4..56670b2cac 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -566,7 +566,7 @@ void xive_tctx_tm_write(XivePresenter *xptr, XiveTCTX 
*tctx, hwaddr offset,
 {
 const XiveTmOp *xto;
 
-trace_xive_tctx_tm_write(offset, size, value);
+trace_xive_tctx_tm_write(tctx->cs->cpu_index, offset, size, value);
 
 /*
  * TODO: check V bit in Q[0-3]W2
@@ -639,7 +639,7 @@ uint64_t xive_tctx_tm_read(XivePresenter *xptr, XiveTCTX 
*tctx, hwaddr offset,
  */
 ret = xive_tm_raw_read(tctx, offset, size);
 out:
-trace_xive_tctx_tm_read(offset, size, ret);
+trace_xive_tctx_tm_read(tctx->cs->cpu_index, offset, size, ret);
 return ret;
 }
 
-- 
2.41.0

Re: [PATCH v3 0/3] vdpa: Return -EIO if device ack is VIRTIO_NET_ERR

2023-07-05 Thread Hawkins Jiawei

On 2023/7/5 15:59, Lei Yang wrote:
> Hello Hawkins
>
> QE can help test this series before  it is merged into master, I would
> like to know what test steps can cover this series related scenario?
>

Hi, I would like to suggest the following steps to test this patch series:

1.  Modify the QEMU source code to make the device return a
VIRTIO_NET_ERR for the CVQ command. Please apply the patch
provided below:

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 373609216f..58ade6d4e0 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -642,7 +642,7 @@ static int vhost_vdpa_net_load_mac(VhostVDPAState
*s, const VirtIONet *n)
  if (virtio_vdev_has_feature(&n->parent_obj,
VIRTIO_NET_F_CTRL_MAC_ADDR)) {
  ssize_t dev_written = vhost_vdpa_net_load_cmd(s,
VIRTIO_NET_CTRL_MAC,

VIRTIO_NET_CTRL_MAC_ADDR_SET,
-  n->mac, sizeof(n->mac));
+  n->mac,
sizeof(n->mac) - 1);
  if (unlikely(dev_written < 0)) {
  return dev_written;
  }

2. Start QEMU with the vdpa device in default state.
Without the patch series, QEMU should not trigger any errors or warnings.
With the series applied, QEMU should trigger the warning like
"qemu-system-x86_64: unable to start vhost net: 5: falling back on
userspace virtio".

Thanks!


> Thanks
> Lei
>
> On Tue, Jul 4, 2023 at 11:36 AM Hawkins Jiawei  wrote:
>>
>> According to VirtIO standard, "The class, command and
>> command-specific-data are set by the driver,
>> and the device sets the ack byte.
>> There is little it can do except issue a diagnostic
>> if ack is not VIRTIO_NET_OK."
>>
>> Therefore, QEMU should stop sending the queued SVQ commands and
>> cancel the device startup if the device's ack is not VIRTIO_NET_OK.
>>
>> Yet the problem is that, vhost_vdpa_net_load_x() returns 1 based on
>> `*s->status != VIRTIO_NET_OK` when the device's ack is VIRTIO_NET_ERR.
>> As a result, net->nc->info->load() also returns 1, this makes
>> vhost_net_start_one() incorrectly assume the device state is
>> successfully loaded by vhost_vdpa_net_load() and return 0, instead of
>> goto `fail` label to cancel the device startup, as vhost_net_start_one()
>> only cancels the device startup when net->nc->info->load() returns a
>> negative value.
>>
>> This patchset fixes this problem by returning -EIO when the device's
>> ack is not VIRTIO_NET_OK.
>>
>> Changelog
>> =
>> v3:
>>   - split the fixes suggested by Eugenio
>>   - return -EIO suggested by Michael
>>
>> v2: 
>> https://lore.kernel.org/all/69010e9ebb5e3729aef595ed92840f43e48e53e5.1687875592.git.yin31...@gmail.com/
>>   - fix the same bug in vhost_vdpa_net_load_offloads()
>>
>> v1: https://lore.kernel.org/all/cover.1686746406.git.yin31...@gmail.com/
>>
>> Hawkins Jiawei (3):
>>vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_mac()
>>vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_mq()
>>vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_offloads()
>>
>>   net/vhost-vdpa.c | 15 +++
>>   1 file changed, 11 insertions(+), 4 deletions(-)
>>
>> --
>> 2.25.1
>>
>>
>

Re: [PATCH] pnv/xive: Print CPU target in all TIMA traces

2023-07-05 Thread Philippe Mathieu-Daudé


On 5/7/23 13:00, Frederic Barrat wrote:

Add the CPU target in the trace when reading/writing the TIMA
space. It was already done for other TIMA ops (notify, accept, ...),
only missing for those 2. Useful for debug and even more now that we
experiment with SMT.

Signed-off-by: Frederic Barrat 
---
  hw/intc/trace-events | 4 ++--
  hw/intc/xive.c   | 4 ++--
  2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/intc/trace-events b/hw/intc/trace-events
index 5c6094c457..36ff71f947 100644
--- a/hw/intc/trace-events
+++ b/hw/intc/trace-events
@@ -265,8 +265,8 @@ xive_source_esb_read(uint64_t addr, uint32_t srcno, uint64_t value) 
"@0x%"PRIx64
  xive_source_esb_write(uint64_t addr, uint32_t srcno, uint64_t value) "@0x%"PRIx64" 
IRQ 0x%x val=0x%"PRIx64
  xive_router_end_notify(uint8_t end_blk, uint32_t end_idx, uint32_t end_data) "END 
0x%02x/0x%04x -> enqueue 0x%08x"
  xive_router_end_escalate(uint8_t end_blk, uint32_t end_idx, uint8_t esc_blk, uint32_t 
esc_idx, uint32_t end_data) "END 0x%02x/0x%04x -> escalate END 0x%02x/0x%04x data 
0x%08x"
-xive_tctx_tm_write(uint64_t offset, unsigned int size, uint64_t value) 
"@0x%"PRIx64" sz=%d val=0x%" PRIx64
-xive_tctx_tm_read(uint64_t offset, unsigned int size, uint64_t value) "@0x%"PRIx64" 
sz=%d val=0x%" PRIx64
+xive_tctx_tm_write(uint32_t index, uint64_t offset, unsigned int size, uint64_t value) 
"target=%d @0x%"PRIx64" sz=%d val=0x%" PRIx64
+xive_tctx_tm_read(uint32_t index, uint64_t offset, unsigned int size, uint64_t value) 
"target=%d @0x%"PRIx64" sz=%d val=0x%" PRIx64


"target" is kinda confusing, what about:

xive_tctx_tm_read(uint32_t cpu_index, ...) "cpu=%d @0x%"PRIx64" ...

?

Re: [PATCH] pnv/xive: Print CPU target in all TIMA traces

2023-07-05 Thread Cédric Le Goater


On 7/5/23 13:00, Frederic Barrat wrote:

Add the CPU target in the trace when reading/writing the TIMA
space. It was already done for other TIMA ops (notify, accept, ...),
only missing for those 2. Useful for debug and even more now that we
experiment with SMT.

Signed-off-by: Frederic Barrat 



Reviewed-by: Cédric Le Goater 

Thanks,

C.



---
  hw/intc/trace-events | 4 ++--
  hw/intc/xive.c   | 4 ++--
  2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/intc/trace-events b/hw/intc/trace-events
index 5c6094c457..36ff71f947 100644
--- a/hw/intc/trace-events
+++ b/hw/intc/trace-events
@@ -265,8 +265,8 @@ xive_source_esb_read(uint64_t addr, uint32_t srcno, uint64_t value) 
"@0x%"PRIx64
  xive_source_esb_write(uint64_t addr, uint32_t srcno, uint64_t value) "@0x%"PRIx64" 
IRQ 0x%x val=0x%"PRIx64
  xive_router_end_notify(uint8_t end_blk, uint32_t end_idx, uint32_t end_data) "END 
0x%02x/0x%04x -> enqueue 0x%08x"
  xive_router_end_escalate(uint8_t end_blk, uint32_t end_idx, uint8_t esc_blk, uint32_t 
esc_idx, uint32_t end_data) "END 0x%02x/0x%04x -> escalate END 0x%02x/0x%04x data 
0x%08x"
-xive_tctx_tm_write(uint64_t offset, unsigned int size, uint64_t value) 
"@0x%"PRIx64" sz=%d val=0x%" PRIx64
-xive_tctx_tm_read(uint64_t offset, unsigned int size, uint64_t value) "@0x%"PRIx64" 
sz=%d val=0x%" PRIx64
+xive_tctx_tm_write(uint32_t index, uint64_t offset, unsigned int size, uint64_t value) 
"target=%d @0x%"PRIx64" sz=%d val=0x%" PRIx64
+xive_tctx_tm_read(uint32_t index, uint64_t offset, unsigned int size, uint64_t value) 
"target=%d @0x%"PRIx64" sz=%d val=0x%" PRIx64
  xive_presenter_notify(uint8_t nvt_blk, uint32_t nvt_idx, uint8_t ring) "found NVT 
0x%x/0x%x ring=0x%x"
  xive_end_source_read(uint8_t end_blk, uint32_t end_idx, uint64_t addr) "END 
0x%x/0x%x @0x%"PRIx64
  
diff --git a/hw/intc/xive.c b/hw/intc/xive.c

index c014e961a4..56670b2cac 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -566,7 +566,7 @@ void xive_tctx_tm_write(XivePresenter *xptr, XiveTCTX 
*tctx, hwaddr offset,
  {
  const XiveTmOp *xto;
  
-trace_xive_tctx_tm_write(offset, size, value);

+trace_xive_tctx_tm_write(tctx->cs->cpu_index, offset, size, value);
  
  /*

   * TODO: check V bit in Q[0-3]W2
@@ -639,7 +639,7 @@ uint64_t xive_tctx_tm_read(XivePresenter *xptr, XiveTCTX 
*tctx, hwaddr offset,
   */
  ret = xive_tm_raw_read(tctx, offset, size);
  out:
-trace_xive_tctx_tm_read(offset, size, ret);
+trace_xive_tctx_tm_read(tctx->cs->cpu_index, offset, size, ret);
  return ret;
  }

Re: [PATCH v2] linux-user/syscall: Implement execve without execveat

2023-07-05 Thread Philippe Mathieu-Daudé


On 5/7/23 12:26, Michael Tokarev wrote:

05.07.2023 12:00, Pierrick Bouvier wrote:
...



@@ -8696,11 +8697,14 @@ static int do_execveat(CPUArchState *cpu_env, 
int dirfd,

  goto execve_efault;
  }
+    const char* exe = p;
  if (is_proc_myself(p, "exe")) {
-    ret = get_errno(safe_execveat(dirfd, exec_path, argp, envp, 
flags));

-    } else {
-    ret = get_errno(safe_execveat(dirfd, p, argp, envp, flags));
+    exe = exec_path;
  }
+    ret = is_execveat ?
+    safe_execveat(dirfd, exe, argp, envp, flags):
+    safe_execve(exe, argp, envp);
+    ret = get_errno(ret);




ERROR: spaces required around that ':' (ctx:VxE)
#169: FILE: linux-user/syscall.c:8705:
+    safe_execveat(dirfd, exe, argp, envp, flags):
  ^

total: 2 errors, 0 warnings, 47 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

As I mentioned in the v1, I don't remember offhand how the arithmetic if
should be styled in qemu.  At the very best, the v2 variant is difficult
to read because ":" is too close to ";" visually, an extra space before
it will make it more explicit.


KISS, alternatively:

ret = is_execveat
  ? safe_execveat(dirfd, exe, argp, envp, flags)
  : safe_execve(exe, argp, envp);

Re: [PATCH] pnv/xive: Print CPU target in all TIMA traces

2023-07-05 Thread Cédric Le Goater


On 7/5/23 13:12, Philippe Mathieu-Daudé wrote:

On 5/7/23 13:00, Frederic Barrat wrote:

Add the CPU target in the trace when reading/writing the TIMA
space. It was already done for other TIMA ops (notify, accept, ...),
only missing for those 2. Useful for debug and even more now that we
experiment with SMT.

Signed-off-by: Frederic Barrat 
---
  hw/intc/trace-events | 4 ++--
  hw/intc/xive.c   | 4 ++--
  2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/intc/trace-events b/hw/intc/trace-events
index 5c6094c457..36ff71f947 100644
--- a/hw/intc/trace-events
+++ b/hw/intc/trace-events
@@ -265,8 +265,8 @@ xive_source_esb_read(uint64_t addr, uint32_t srcno, uint64_t value) 
"@0x%"PRIx64
  xive_source_esb_write(uint64_t addr, uint32_t srcno, uint64_t value) "@0x%"PRIx64" 
IRQ 0x%x val=0x%"PRIx64
  xive_router_end_notify(uint8_t end_blk, uint32_t end_idx, uint32_t end_data) "END 
0x%02x/0x%04x -> enqueue 0x%08x"
  xive_router_end_escalate(uint8_t end_blk, uint32_t end_idx, uint8_t esc_blk, uint32_t 
esc_idx, uint32_t end_data) "END 0x%02x/0x%04x -> escalate END 0x%02x/0x%04x data 
0x%08x"
-xive_tctx_tm_write(uint64_t offset, unsigned int size, uint64_t value) 
"@0x%"PRIx64" sz=%d val=0x%" PRIx64
-xive_tctx_tm_read(uint64_t offset, unsigned int size, uint64_t value) "@0x%"PRIx64" 
sz=%d val=0x%" PRIx64
+xive_tctx_tm_write(uint32_t index, uint64_t offset, unsigned int size, uint64_t value) 
"target=%d @0x%"PRIx64" sz=%d val=0x%" PRIx64
+xive_tctx_tm_read(uint32_t index, uint64_t offset, unsigned int size, uint64_t value) 
"target=%d @0x%"PRIx64" sz=%d val=0x%" PRIx64


"target" is kinda confusing, what about:

xive_tctx_tm_read(uint32_t cpu_index, ...) "cpu=%d @0x%"PRIx64" ...


An interrupt 'source' is served by a 'target', a target could be a CPU,
a vCPU id, a group of vCPU, a process id.

'target' is part of the XIVE nomenclature, in HW specs, in drivers, FW,
Linux, KVM, and models in QEMU. It is fine.

Thanks,

C.
 


?

Re: [PATCH v6 1/9] migration: introduced 'MigrateAddress' in QAPI for migration wire protocol.

2023-07-05 Thread Markus Armbruster

Het Gala  writes:

> This patch introduces well defined MigrateAddress struct and its related
> child objects.
>
> The existing argument of 'migrate' and 'migrate-incoming' QAPI - 'uri'
> is of string type. The current migration flow follows double encoding
> scheme for  fetching migration parameters such as 'uri' and this is
> not an ideal design.
>
> Motive for intoducing struct level design is to prevent double encoding
> of QAPI arguments, as Qemu should be able to directly use the QAPI
> arguments without any level of encoding.
>
> Suggested-by: Aravind Retnakaran 
> Signed-off-by: Het Gala 
> Reviewed-by: Juan Quintela 
> Reviewed-by: Daniel P. Berrangé 
> ---
>  qapi/migration.json | 45 +
>  1 file changed, 45 insertions(+)
>
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 179af0c4d8..e61d25eba2 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1407,6 +1407,51 @@
>  ##
>  { 'command': 'migrate-continue', 'data': {'state': 'MigrationStatus'} }
>  
> +##
> +# @MigrationAddressType:
> +#
> +# The migration stream transport mechanisms.
> +#
> +# @socket: Migrate via socket.
> +#
> +# @exec: Direct the migration stream to another process.
> +#
> +# @rdma: Migrate via RDMA.
> +#
> +# Since 8.1
> +##
> +{ 'enum': 'MigrationAddressType',
> +  'data': ['socket', 'exec', 'rdma'] }
> +
> +##
> +# @MigrationExecCommand:
> +#
> +# @args: list of commands for migraton stream execution to a file.

Typo: migration

> +#
> +# Notes:
> +#
> +# 1. @args[0] needs to be the path to the new program.

@args can't be a "list of commands", as we're spawning just one process.
So what is it?

Digging through the code with the entire series applied...  Member @args
is used in two places:

1. qemu_start_incoming_migration() passes it to
   exec_start_incoming_migration(), which translates it into an array
   and passes it to qio_channel_command_new_spawn().

2. qmp_migrate() passes it to exec_start_outgoing_migration(), which
   does the same.

qio_channel_command_new_spawn() passes it to
g_spawn_async_with_pipes().  A close read of the latter's documentation
leads me to:

* args[0] is the excutable's file name.  As usual, a relative name is
  relative to the QEMU process's current working directory.

* args[1..] are the arguments.

Unlike POSIX interfaces like execv() and posix_spawn(), this doesn't
separate the executable's file name and 0-th argument.

In short, the head of @args is the executable's filename, and the
remainder are the arguments.  The fact that the the executable's
filename is passed as 0-th argument to the child process is detail.

Perhaps this could do:

   ##
   # @MigrationExecCommand:
   #
   # @args: command and arguments to execute.

If we want more detail, perhaps:

   # @args: command (list head) and arguments (list tail) to execute.

Not sure we need it.  Thoughts?

> +#
> +# Since 8.1
> +##
> +{ 'struct': 'MigrationExecCommand',
> +  'data': {'args': [ 'str' ] } }
> +
> +##
> +# @MigrationAddress:
> +#
> +# Migration endpoint configuration.
> +#
> +# Since 8.1
> +##
> +{ 'union': 'MigrationAddress',
> +  'base': { 'transport' : 'MigrationAddressType'},
> +  'discriminator': 'transport',
> +  'data': {
> +'socket': 'SocketAddress',
> +'exec': 'MigrationExecCommand',
> +'rdma': 'InetSocketAddress' } }
> +
>  ##
>  # @migrate:
>  #

[PATCH 2/2] python: bump minimum requirements so they are compatible with 3.12

2023-07-05 Thread Paolo Bonzini

There are many Python 3.12 issues right now, but a particularly
problematic one when debugging them is that one cannot even use
minreqs.txt in a Python 3.12 virtual environment to test with
locked package versions.

Bump the mypy and wrapt versions to fix this, while remaining
within the realm of versions compatible with Python 3.7.

Signed-off-by: Paolo Bonzini 
---
 python/setup.cfg | 2 +-
 python/tests/minreqs.txt | 9 -
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/python/setup.cfg b/python/setup.cfg
index 42f0b0be07d..5d7e95f5d24 100644
--- a/python/setup.cfg
+++ b/python/setup.cfg
@@ -39,7 +39,7 @@ devel =
 flake8 >= 5.0.4
 fusepy >= 2.0.4
 isort >= 5.1.2
-mypy >= 0.780
+mypy >= 1.4.0
 pylint >= 2.17.3
 tox >= 3.18.0
 urwid >= 2.1.2
diff --git a/python/tests/minreqs.txt b/python/tests/minreqs.txt
index 1ce72cef6d8..979461be6bb 100644
--- a/python/tests/minreqs.txt
+++ b/python/tests/minreqs.txt
@@ -28,7 +28,7 @@ avocado-framework==90.0
 # Linters
 flake8==5.0.4
 isort==5.1.2
-mypy==0.780
+mypy==1.4.0
 pylint==2.17.3
 
 # Transitive flake8 dependencies
@@ -37,12 +37,11 @@ pycodestyle==2.9.1
 pyflakes==2.5.0
 
 # Transitive mypy dependencies
-mypy-extensions==0.4.3
-typed-ast==1.4.0
-typing-extensions==4.5.0
+mypy-extensions==1.0.0
+typing-extensions==4.7.1
 
 # Transitive pylint dependencies
 astroid==2.15.4
 lazy-object-proxy==1.4.0
 toml==0.10.0
-wrapt==1.12.1
+wrapt==1.14.0
-- 
2.41.0

[PATCH 0/2] python: first step towards Python 3.12 support

2023-07-05 Thread Paolo Bonzini

The Python 3.12 situation is a mess, with both flake8 and pylint giving
false positives that do not happen with Python 3.11.  As a first step
towards understanding these issues, drop support for old linter versions
that do not work with it.  This at least makes it possible to install
easily the same versions of the linters on any version of Python, and
put the blame on the interpreter.

Paolo

Paolo Bonzini (2):
  python: work around mypy false positive
  python: bump minimum requirements so they are compatible with 3.12

 python/qemu/qmp/qmp_tui.py | 3 ++-
 python/setup.cfg   | 2 +-
 python/tests/minreqs.txt   | 9 -
 3 files changed, 7 insertions(+), 7 deletions(-)

-- 
2.41.0

Re: [PATCH] pnv/xive: Print CPU target in all TIMA traces

2023-07-05 Thread Philippe Mathieu-Daudé


On 5/7/23 13:18, Cédric Le Goater wrote:

On 7/5/23 13:12, Philippe Mathieu-Daudé wrote:

On 5/7/23 13:00, Frederic Barrat wrote:

Add the CPU target in the trace when reading/writing the TIMA
space. It was already done for other TIMA ops (notify, accept, ...),
only missing for those 2. Useful for debug and even more now that we
experiment with SMT.

Signed-off-by: Frederic Barrat 
---
  hw/intc/trace-events | 4 ++--
  hw/intc/xive.c   | 4 ++--
  2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/intc/trace-events b/hw/intc/trace-events
index 5c6094c457..36ff71f947 100644
--- a/hw/intc/trace-events
+++ b/hw/intc/trace-events
@@ -265,8 +265,8 @@ xive_source_esb_read(uint64_t addr, uint32_t 
srcno, uint64_t value) "@0x%"PRIx64
  xive_source_esb_write(uint64_t addr, uint32_t srcno, uint64_t 
value) "@0x%"PRIx64" IRQ 0x%x val=0x%"PRIx64
  xive_router_end_notify(uint8_t end_blk, uint32_t end_idx, uint32_t 
end_data) "END 0x%02x/0x%04x -> enqueue 0x%08x"
  xive_router_end_escalate(uint8_t end_blk, uint32_t end_idx, uint8_t 
esc_blk, uint32_t esc_idx, uint32_t end_data) "END 0x%02x/0x%04x -> 
escalate END 0x%02x/0x%04x data 0x%08x"
-xive_tctx_tm_write(uint64_t offset, unsigned int size, uint64_t 
value) "@0x%"PRIx64" sz=%d val=0x%" PRIx64
-xive_tctx_tm_read(uint64_t offset, unsigned int size, uint64_t 
value) "@0x%"PRIx64" sz=%d val=0x%" PRIx64
+xive_tctx_tm_write(uint32_t index, uint64_t offset, unsigned int 
size, uint64_t value) "target=%d @0x%"PRIx64" sz=%d val=0x%" PRIx64
+xive_tctx_tm_read(uint32_t index, uint64_t offset, unsigned int 
size, uint64_t value) "target=%d @0x%"PRIx64" sz=%d val=0x%" PRIx64


"target" is kinda confusing, what about:

xive_tctx_tm_read(uint32_t cpu_index, ...) "cpu=%d @0x%"PRIx64" ...


An interrupt 'source' is served by a 'target', a target could be a CPU,
a vCPU id, a group of vCPU, a process id.

'target' is part of the XIVE nomenclature, in HW specs, in drivers, FW,
Linux, KVM, and models in QEMU. It is fine.


Ah OK. Then xive_tctx_tm_read(uint32_t target, ...).

[PATCH 1/2] python: work around mypy false positive

2023-07-05 Thread Paolo Bonzini

mypy 1.4.0 signals an error:

qemu/qmp/qmp_tui.py:350: error: Non-overlapping equality check (left operand 
type: "Literal[Runstate.DISCONNECTING]", right operand type: 
"Literal[Runstate.IDLE]")  [comparison-overlap]

This is because it does not realiez that self.disconnect() could change
the value of self.runstate.

Signed-off-by: Paolo Bonzini 
---
 python/qemu/qmp/qmp_tui.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/python/qemu/qmp/qmp_tui.py b/python/qemu/qmp/qmp_tui.py
index 83691447231..1b68a71397f 100644
--- a/python/qemu/qmp/qmp_tui.py
+++ b/python/qemu/qmp/qmp_tui.py
@@ -346,7 +346,8 @@ async def manage_connection(self) -> None:
 self._set_status('[Disconnected]')
 await self.disconnect()
 # check if a retry is needed
-if self.runstate == Runstate.IDLE:
+# mypy bug - doesn't realize self.runstate could change
+if self.runstate == Runstate.IDLE:  # type: ignore
 continue
 await self.runstate_changed()
 
-- 
2.41.0

Re: [PATCH v6 2/9] migration: convert uri parameter into 'MigrateAddress' struct

2023-07-05 Thread Markus Armbruster

Drive-by comment...

Het Gala  writes:

> This patch parses 'migrate' and 'migrate-incoming' QAPI's 'uri' parameter
> with all the migration connection related information and stores them
> inside well defined 'MigrateAddress' struct.
>
> Misc: limit line width in exec.c to 80 characters recommended by Qemu.
>
> Suggested-by: Aravind Retnakaran 
> Signed-off-by: Het Gala 

[...]

> diff --git a/migration/migration.c b/migration/migration.c
> index dc05c6f6ea..0eb25bb5a4 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -64,6 +64,7 @@
>  #include "yank_functions.h"
>  #include "sysemu/qtest.h"
>  #include "options.h"
> +#include "qemu/sockets.h"
>  
>  static NotifierList migration_state_notifiers =
>  NOTIFIER_LIST_INITIALIZER(migration_state_notifiers);
> @@ -420,15 +421,63 @@ void migrate_add_address(SocketAddress *address)
>QAPI_CLONE(SocketAddress, address));
>  }
>  
> +static bool migrate_uri_parse(const char *uri,
> +  MigrationAddress **channel,
> +  Error **errp)
> +{
> +g_autoptr(MigrationAddress) addrs = g_new0(MigrationAddress, 1);

@addrs suggests plural, yet it points to a single MigrationAddress.
Recommend @addr.

Same elsewhere.

[...]

Re: [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO device assignment

2023-07-05 Thread Jean-Philippe Brucker

On Wed, Jul 05, 2023 at 10:13:11AM +, Duan, Zhenzhong wrote:
> >-Original Message-
> >From: Jean-Philippe Brucker 
> >Sent: Wednesday, July 5, 2023 4:29 PM
> >Subject: Re: [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO device
> >assignment
> >
> >On Wed, Jul 05, 2023 at 04:52:09AM +, Duan, Zhenzhong wrote:
> >> Hi Eric,
> >>
> >> >-Original Message-
> >> >From: Eric Auger 
> >> >Sent: Tuesday, July 4, 2023 7:15 PM
> >> >Subject: [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO
> >> >device assignment
> >> >
> >> >When running on a 64kB page size host and protecting a VFIO device
> >> >with the virtio-iommu, qemu crashes with this kind of message:
> >> >
> >> >qemu-kvm: virtio-iommu page mask 0xf000 is incompatible
> >> >with mask 0x2001
> >>
> >> Does 0x2001 mean only  512MB and 64KB super page mapping is
> >> supported for host iommu hw? 4KB mapping not supported?
> >
> >It's not a restriction by the HW IOMMU, but the host kernel. An Arm SMMU
> >can implement 4KB, 16KB and/or 64KB granules, but the host kernel only
> >advertises through VFIO the granule corresponding to host PAGE_SIZE. This
> >restriction is done by arm_lpae_restrict_pgsizes() in order to choose a page
> >size when a device is driven by the host.
> 
> Just curious why not advertises the Arm SMMU implemented granules to VFIO
> Eg:4KB, 16KB or 64KB granules?

That's possible, but the difficulty is setting up the page table
configuration afterwards. At the moment the host installs the HW page
tables early, when QEMU sets up the VFIO container. That initializes the
page size bitmap because configuring the HW page tables requires picking
one of the supported granules (setting TG0 in the SMMU Context
Descriptor).

If the guest could pick a granule via an ATTACH request, then QEMU would
need to tell the host kernel to install page tables with the desired
granule at that point. That would require a new interface in VFIO to
reconfigure a live container and replace the existing HW page tables
configuration (before ATTACH, the container must already be configured
with working page tables in order to implement boot-bypass, I think).

> But arm_lpae_restrict_pgsizes() restricted ones,
> Eg: for SZ_4K, (SZ_4K | SZ_2M | SZ_1G).
> (SZ_4K | SZ_2M | SZ_1G) looks not real hardware granules of Arm SMMU.

Yes, the granule here is 4K, and other bits only indicate huge page sizes,
so the user can try to optimize large mappings to use fewer TLB entries
where possible.

Thanks,
Jean

Re: [PATCH v3 06/13] docs/devel: simplify the minimal checklist

2023-07-05 Thread Philippe Mathieu-Daudé


Hi Alex,

On 17/11/22 18:25, Alex Bennée wrote:

The bullet points are quite long and contain process tips. Move those
bits of the bullet to the relevant sections and link to them. Use a
table for nicer formatting of the checklist.

Signed-off-by: Alex Bennée 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Paolo Bonzini 
Message-Id: <2022145529.4020801-8-alex.ben...@linaro.org>
---
  docs/devel/submitting-a-patch.rst | 75 ---
  1 file changed, 49 insertions(+), 26 deletions(-)




@@ -314,10 +320,12 @@ git repository to fetch the original commit.
  Patch emails must include a ``Signed-off-by:`` line
  ~~~
  
-For more information see `SubmittingPatches 1.12

-`__.
-This is vital or we will not be able to apply your patch! Please use
-your real name to sign a patch (not an alias or acronym).


Revisiting this patch, asking for some real name instead of alias
was at least helpful during patch review, we could address the
contributor by its name. Addressing an acronym is socially weird
(at least in my culture netiquette).


+Your patches **must** include a Signed-off-by: line. This is a hard
+requirement because it's how you say "I'm legally okay to contribute
+this and happy for it to go into QEMU". The process is modelled after
+the `Linux kernel
+`__
+policy.

Re: [PATCH] pnv/xive: Print CPU target in all TIMA traces

2023-07-05 Thread Cédric Le Goater


On 7/5/23 13:26, Philippe Mathieu-Daudé wrote:

On 5/7/23 13:18, Cédric Le Goater wrote:

On 7/5/23 13:12, Philippe Mathieu-Daudé wrote:

On 5/7/23 13:00, Frederic Barrat wrote:

Add the CPU target in the trace when reading/writing the TIMA
space. It was already done for other TIMA ops (notify, accept, ...),
only missing for those 2. Useful for debug and even more now that we
experiment with SMT.

Signed-off-by: Frederic Barrat 
---
  hw/intc/trace-events | 4 ++--
  hw/intc/xive.c   | 4 ++--
  2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/intc/trace-events b/hw/intc/trace-events
index 5c6094c457..36ff71f947 100644
--- a/hw/intc/trace-events
+++ b/hw/intc/trace-events
@@ -265,8 +265,8 @@ xive_source_esb_read(uint64_t addr, uint32_t srcno, uint64_t value) 
"@0x%"PRIx64
  xive_source_esb_write(uint64_t addr, uint32_t srcno, uint64_t value) "@0x%"PRIx64" 
IRQ 0x%x val=0x%"PRIx64
  xive_router_end_notify(uint8_t end_blk, uint32_t end_idx, uint32_t end_data) "END 
0x%02x/0x%04x -> enqueue 0x%08x"
  xive_router_end_escalate(uint8_t end_blk, uint32_t end_idx, uint8_t esc_blk, uint32_t 
esc_idx, uint32_t end_data) "END 0x%02x/0x%04x -> escalate END 0x%02x/0x%04x data 
0x%08x"
-xive_tctx_tm_write(uint64_t offset, unsigned int size, uint64_t value) 
"@0x%"PRIx64" sz=%d val=0x%" PRIx64
-xive_tctx_tm_read(uint64_t offset, unsigned int size, uint64_t value) "@0x%"PRIx64" 
sz=%d val=0x%" PRIx64
+xive_tctx_tm_write(uint32_t index, uint64_t offset, unsigned int size, uint64_t value) 
"target=%d @0x%"PRIx64" sz=%d val=0x%" PRIx64
+xive_tctx_tm_read(uint32_t index, uint64_t offset, unsigned int size, uint64_t value) 
"target=%d @0x%"PRIx64" sz=%d val=0x%" PRIx64


"target" is kinda confusing, what about:

xive_tctx_tm_read(uint32_t cpu_index, ...) "cpu=%d @0x%"PRIx64" ...


An interrupt 'source' is served by a 'target', a target could be a CPU,
a vCPU id, a group of vCPU, a process id.

'target' is part of the XIVE nomenclature, in HW specs, in drivers, FW,
Linux, KVM, and models in QEMU. It is fine.


Ah OK. Then xive_tctx_tm_read(uint32_t target, ...).


better indeed. What would be good to know also, is which vCPU is currently
dispatched on the HW thread. The info is in W2 of the ring being accessed.

Thanks,

C.

Re: [PATCH v2] virtio: add a new vcpu watchdog

2023-07-05 Thread Philippe Mathieu-Daudé


Hi,

On 5/7/23 10:18, zhanghao1 wrote:

Each vcpu creates a corresponding timer task. The watchdog
is driven by a timer according to a certain period. Each time
the timer expires, the counter is decremented. When the counter
is "0", the watchdog considers the vcpu to be stalling and resets
the VM. To avoid watchdog expiration, the guest kernel driver
needs to periodically send a pet event to update the counter.

Signed-off-by: zhanghao1 
---
v2:
  - change the function name and remove the redundant word 'stall'
  - add trace-events to replace DPRINTF and qemu_log
  - call 'watchdog_perform_action()' to reset vm
  - update g_new0 to replace malloc
  - update only use '.generic_name'
  - update the bool variable 'is_initialized' to uint32_t

v1: 
https://lore.kernel.org/qemu-devel/20230615061302.301754-1-zhangh...@kylinos.cn/

  hw/virtio/Kconfig|   5 +
  hw/virtio/meson.build|   2 +
  hw/virtio/trace-events   |   6 +
  hw/virtio/virtio-vcpu-watchdog-pci.c |  86 +
  hw/virtio/virtio-vcpu-watchdog.c | 226 +++
  include/hw/virtio/virtio-vcpu-watchdog.h |  37 
  6 files changed, 362 insertions(+)
  create mode 100644 hw/virtio/virtio-vcpu-watchdog-pci.c
  create mode 100644 hw/virtio/virtio-vcpu-watchdog.c
  create mode 100644 include/hw/virtio/virtio-vcpu-watchdog.h


Applying: virtio: add a new vcpu watchdog
.git/rebase-apply/patch:258: trailing whitespace.

error: patch failed: hw/virtio/meson.build:33
error: hw/virtio/meson.build: patch does not apply
Patch failed at 0001 virtio: add a new vcpu watchdog

On what commit/branch/tree is your patch based?

Thanks,

Phil.

[PATCH v2] python: bump minimum requirements so they are compatible with 3.12

2023-07-05 Thread Paolo Bonzini

There are several Python 3.12 issues in QEMU's python/ directory, but
a particularly problematic one when debugging them is that one cannot
even use minreqs.txt in a Python 3.12 virtual environment to test with
locked package versions.

Bump the mypy and wrapt versions to fix this, while remaining
within the realm of versions compatible with Python 3.7.

This requires a workaround for a mypy false positive

qemu/qmp/qmp_tui.py:350: error: Non-overlapping equality check (left 
operand type: "Literal[Runstate.DISCONNECTING]", right operand type: 
"Literal[Runstate.IDLE]")  [comparison-overlap]

where mypy does not realize that self.disconnect() could change
the value of self.runstate.

The remaining issues are due to missing support for Python 3.12
in released versions of flake8 and pylint, which should be
related to https://github.com/PyCQA/pycodestyle/issues/1145 and
https://github.com/pylint-dev/astroid/issues/2201 respectively.

Supersedes: <20230705112536.54025-1-pbonz...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
v1->v2: switch to the python-qemu-qmp fix for the mypy issue,
which however requires merging the two patches into
one for bisectability

 python/qemu/qmp/qmp_tui.py | 5 -
 python/setup.cfg   | 2 +-
 python/tests/minreqs.txt   | 9 -
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/python/qemu/qmp/qmp_tui.py b/python/qemu/qmp/qmp_tui.py
index 83691447231..2d9ebbd20bc 100644
--- a/python/qemu/qmp/qmp_tui.py
+++ b/python/qemu/qmp/qmp_tui.py
@@ -346,7 +346,10 @@ async def manage_connection(self) -> None:
 self._set_status('[Disconnected]')
 await self.disconnect()
 # check if a retry is needed
-if self.runstate == Runstate.IDLE:
+# mypy 1.4.0 doesn't believe runstate can change after
+# disconnect(), hence the cast.
+state = cast(Runstate, self.runstate)
+if state == Runstate.IDLE:
 continue
 await self.runstate_changed()
 
diff --git a/python/setup.cfg b/python/setup.cfg
index 42f0b0be07d..5d7e95f5d24 100644
--- a/python/setup.cfg
+++ b/python/setup.cfg
@@ -39,7 +39,7 @@ devel =
 flake8 >= 5.0.4
 fusepy >= 2.0.4
 isort >= 5.1.2
-mypy >= 0.780
+mypy >= 1.4.0
 pylint >= 2.17.3
 tox >= 3.18.0
 urwid >= 2.1.2
diff --git a/python/tests/minreqs.txt b/python/tests/minreqs.txt
index 1ce72cef6d8..979461be6bb 100644
--- a/python/tests/minreqs.txt
+++ b/python/tests/minreqs.txt
@@ -28,7 +28,7 @@ avocado-framework==90.0
 # Linters
 flake8==5.0.4
 isort==5.1.2
-mypy==0.780
+mypy==1.4.0
 pylint==2.17.3
 
 # Transitive flake8 dependencies
@@ -37,12 +37,11 @@ pycodestyle==2.9.1
 pyflakes==2.5.0
 
 # Transitive mypy dependencies
-mypy-extensions==0.4.3
-typed-ast==1.4.0
-typing-extensions==4.5.0
+mypy-extensions==1.0.0
+typing-extensions==4.7.1
 
 # Transitive pylint dependencies
 astroid==2.15.4
 lazy-object-proxy==1.4.0
 toml==0.10.0
-wrapt==1.12.1
+wrapt==1.14.0

[PATCH v8 0/6] test and QEMU fixes to ensure proper PCIE device usage

2023-07-05 Thread Ani Sinha

Patches 1-4:
Fix tests so that devices do not use non-zero slots on the pcie root
ports. PCIE ports only have one slot, so PCIE devices can only be
plugged into slot 0 on a PCIE port.

Patch 5:
Enforce only one slot on PCIE port.

Patch 6: add a cosmetic comment addition for better clarity of the code.

The test fixes must be applied before the QEMU change that checks for use
of a single slot in PCIE port.

CC: m...@redhat.com
CC: imamm...@redhat.com
CC: jus...@redhat.com
CC: th...@redhat.com
CC: lviv...@redhat.com
CC: michael.lab...@virtuozzo.com

Changelog:
===
v8: more comment messaging. rebased to latest master. small changes in patch
description and title.

v7: added tags, rebased to latest master.
For patch 5, converted a hard error to a warning.
Added patch 6.

v6: make patch 5 ARI compliant. fix commit message 
(s/pcie-root-port/pcie-to-pci/)
in patch 4. Rebase patchset to latest master.

v5: no code changes - correct a mistake in the commit log message.

v4: reword commit log for patch 4.

v3: tags added. reword the error description in patch 5. Reword commit log in 
patch 4. 

v2: add hd-geo-test fix as well as the actual QEMU code fix to the patchset.

The patches are added in the right order.

Ani Sinha (6):
  tests/acpi: allow changes in DSDT.noacpihp table blob
  tests/acpi/bios-tables-test: use the correct slot on the
pcie-root-port
  tests/acpi/bios-tables-test: update acpi blob q35/DSDT.noacpihp
  tests/qtest/hd-geo-test: fix incorrect pcie-root-port usage and
simplify test
  hw/pci: warn when PCIe device is plugged into non-zero slot of
downstream port
  hw/pci: add comment to explain checking for available function 0 in
pci hotplug

 hw/pci/pci.c  |  31 +++---
 tests/data/acpi/q35/DSDT.noacpihp | Bin 8248 -> 8241 bytes
 tests/qtest/bios-tables-test.c|   4 ++--
 tests/qtest/hd-geo-test.c |  18 -
 4 files changed, 38 insertions(+), 15 deletions(-)

-- 
2.39.1

[PATCH v8 6/6] hw/pci: add comment explaining the reason for checking function 0 in hotplug

2023-07-05 Thread Ani Sinha

This change is cosmetic. A comment is added explaining why we need to check for
the availability of function 0 when we hotplug a device.

CC: m...@redhat.com
Signed-off-by: Ani Sinha 
---
 hw/pci/pci.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 62b393dfb7..7aee3a7f12 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1181,9 +1181,14 @@ static PCIDevice *do_pci_register_device(PCIDevice 
*pci_dev,
PCI_SLOT(devfn), PCI_FUNC(devfn), name,
bus->devices[devfn]->name, bus->devices[devfn]->qdev.id);
 return NULL;
-} else if (dev->hotplugged &&
-   !pci_is_vf(pci_dev) &&
-   pci_get_function_0(pci_dev)) {
+} /*
+   * Populating function 0 triggers a scan from the guest that
+   * exposes other non-zero functions. Hence we need to ensure that
+   * function 0 wasn't added yet.
+   */
+else if (dev->hotplugged &&
+ !pci_is_vf(pci_dev) &&
+ pci_get_function_0(pci_dev)) {
 error_setg(errp, "PCI: slot %d function 0 already occupied by %s,"
" new func %s cannot be exposed to guest.",
PCI_SLOT(pci_get_function_0(pci_dev)->devfn),
-- 
2.39.1

[PATCH v8 6/6] hw/pci: add comment to explain checking for available function 0 in pci hotplug

2023-07-05 Thread Ani Sinha

This change is cosmetic. A comment is added explaining why we need to check for
the availability of function 0 when we hotplug a device.

CC: m...@redhat.com
Signed-off-by: Ani Sinha 
---
 hw/pci/pci.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 62b393dfb7..7aee3a7f12 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1181,9 +1181,14 @@ static PCIDevice *do_pci_register_device(PCIDevice 
*pci_dev,
PCI_SLOT(devfn), PCI_FUNC(devfn), name,
bus->devices[devfn]->name, bus->devices[devfn]->qdev.id);
 return NULL;
-} else if (dev->hotplugged &&
-   !pci_is_vf(pci_dev) &&
-   pci_get_function_0(pci_dev)) {
+} /*
+   * Populating function 0 triggers a scan from the guest that
+   * exposes other non-zero functions. Hence we need to ensure that
+   * function 0 wasn't added yet.
+   */
+else if (dev->hotplugged &&
+ !pci_is_vf(pci_dev) &&
+ pci_get_function_0(pci_dev)) {
 error_setg(errp, "PCI: slot %d function 0 already occupied by %s,"
" new func %s cannot be exposed to guest.",
PCI_SLOT(pci_get_function_0(pci_dev)->devfn),
-- 
2.39.1

[PATCH v8 2/6] tests/acpi/bios-tables-test: use the correct slot on the pcie-root-port

2023-07-05 Thread Ani Sinha

PCIE ports only have one slot, slot 0. Hence, non-zero slots are not available
for PCIE devices on PCIE root ports. Fix test_acpi_q35_tcg_no_acpi_hotplug()
so that the test does not use them.

Signed-off-by: Ani Sinha 
Reviewed-by: Igor Mammedov 
---
 tests/qtest/bios-tables-test.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c
index ed1c69cf01..47ba20b957 100644
--- a/tests/qtest/bios-tables-test.c
+++ b/tests/qtest/bios-tables-test.c
@@ -1020,9 +1020,9 @@ static void test_acpi_q35_tcg_no_acpi_hotplug(void)
 " -device pci-testdev,bus=nohprp,acpi-index=501"
 " -device pcie-root-port,id=nohprpint,port=0x0,chassis=3,hotplug=off,"
  "multifunction=on,addr=8.0"
-" -device pci-testdev,bus=nohprpint,acpi-index=601,addr=8.1"
+" -device pci-testdev,bus=nohprpint,acpi-index=601,addr=0.1"
 " -device pcie-root-port,id=hprp2,port=0x0,chassis=4,bus=nohprpint,"
- "addr=9.0"
+ "addr=0.2"
 " -device pci-testdev,bus=hprp2,acpi-index=602"
 , &data);
 free_test_data(&data);
-- 
2.39.1

[PATCH v8 5/6] hw/pci: warn when PCIe device is plugged into non-zero slot of downstream port

2023-07-05 Thread Ani Sinha

PCIe downstream ports only have a single device 0, so PCI Express devices can
only be plugged into slot 0 on a PCIe port. Add a warning to let users know
when the invalid configuration is used. We may enforce this more strongly later
once we get more clarity on whether we are introducing a bad regression for
users currently using the wrong configuration.

The change has been tested to not break or alter behaviors of ARI capable
devices by instantiating seven vfs on an emulated igb device (the maximum
number of vfs the igb device supports). The vfs are instantiated correctly
and are seen to have non-zero device/slot numbers in the conventional PCI BDF
representation.

CC: jus...@redhat.com
CC: imamm...@redhat.com
CC: m...@redhat.com
CC: akihiko.od...@daynix.com

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
Signed-off-by: Ani Sinha 
Reviewed-by: Julia Suvorova 
---
 hw/pci/pci.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index e2eb4c3b4a..62b393dfb7 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -65,6 +65,7 @@ bool pci_available = true;
 static char *pcibus_get_dev_path(DeviceState *dev);
 static char *pcibus_get_fw_dev_path(DeviceState *dev);
 static void pcibus_reset(BusState *qbus);
+static bool pcie_has_upstream_port(PCIDevice *dev);
 
 static Property pci_props[] = {
 DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
@@ -2121,6 +2122,25 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
**errp)
 }
 }
 
+/*
+ * A PCIe Downstream Port that do not have ARI Forwarding enabled must
+ * associate only Device 0 with the device attached to the bus
+ * representing the Link from the Port (PCIe base spec rev 4.0 ver 0.3,
+ * sec 7.3.1).
+ * With ARI, PCI_SLOT() can return non-zero value as the traditional
+ * 5-bit Device Number and 3-bit Function Number fields in its associated
+ * Routing IDs, Requester IDs and Completer IDs are interpreted as a
+ * single 8-bit Function Number. Hence, ignore ARI capable devices.
+ */
+if (pci_is_express(pci_dev) &&
+!pcie_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI) &&
+pcie_has_upstream_port(pci_dev) &&
+PCI_SLOT(pci_dev->devfn)) {
+warn_report("PCI: slot %d is not valid for %s,"
+" parent device only allows plugging into slot 0.",
+PCI_SLOT(pci_dev->devfn), pci_dev->name);
+}
+
 if (pci_dev->failover_pair_id) {
 if (!pci_bus_is_express(pci_get_bus(pci_dev))) {
 error_setg(errp, "failover primary device must be on "
-- 
2.39.1

[PATCH v8 3/6] tests/acpi/bios-tables-test: update acpi blob q35/DSDT.noacpihp

2023-07-05 Thread Ani Sinha

Some fixes were committed in bios-tables-test in the previous commit. Update
the acpi blob and clear bios-tables-test-allowed-diff.h so that the test
continues to pass with the changes in the bios-tables-test.

Following is the asl diff between the old and the newly updated blob:

@@ -1,30 +1,30 @@
 /*
  * Intel ACPI Component Architecture
  * AML/ASL+ Disassembler version 20210604 (64-bit version)
  * Copyright (c) 2000 - 2021 Intel Corporation
  *
  * Disassembling to symbolic ASL+ operators
  *
- * Disassembly of tests/data/acpi/q35/DSDT.noacpihp, Wed Jun 21 18:26:52 2023
+ * Disassembly of /tmp/aml-O8SU61, Wed Jun 21 18:26:52 2023
  *
  * Original Table Header:
  * Signature"DSDT"
- * Length   0x2038 (8248)
+ * Length   0x2031 (8241)
  * Revision 0x01  32-bit table (V1), no 64-bit math support
- * Checksum 0x4A
+ * Checksum 0x89
  * OEM ID   "BOCHS "
  * OEM Table ID "BXPC"
  * OEM Revision 0x0001 (1)
  * Compiler ID  "BXPC"
  * Compiler Version 0x0001 (1)
  */
 DefinitionBlock ("", "DSDT", 1, "BOCHS ", "BXPC", 0x0001)
 {
 Scope (\)
 {
 OperationRegion (DBG, SystemIO, 0x0402, One)
 Field (DBG, ByteAcc, NoLock, Preserve)
 {
 DBGB,   8
 }

@@ -3148,48 +3148,48 @@
 {
 Name (_ADR, Zero)  // _ADR: Address
 Method (_DSM, 4, Serialized)  // _DSM: Device-Specific 
Method
 {
 Local0 = Package (0x01)
 {
 0x01F5
 }
 Return (EDSM (Arg0, Arg1, Arg2, Arg3, Local0))
 }
 }
 }

 Device (S40)
 {
 Name (_ADR, 0x0008)  // _ADR: Address
-Device (S41)
+Device (S01)
 {
-Name (_ADR, 0x00080001)  // _ADR: Address
+Name (_ADR, One)  // _ADR: Address
 Method (_DSM, 4, Serialized)  // _DSM: Device-Specific 
Method
 {
 Local0 = Package (0x01)
 {
 0x0259
 }
 Return (EDSM (Arg0, Arg1, Arg2, Arg3, Local0))
 }
 }

-Device (S48)
+Device (S02)
 {
-Name (_ADR, 0x0009)  // _ADR: Address
+Name (_ADR, 0x02)  // _ADR: Address
 Device (S00)
 {
 Name (_ADR, Zero)  // _ADR: Address
 }
 }
 }

 Device (SF8)
 {
 Name (_ADR, 0x001F)  // _ADR: Address
 OperationRegion (PIRQ, PCI_Config, 0x60, 0x0C)
 Scope (\_SB)
 {
 Field (PCI0.SF8.PIRQ, ByteAcc, NoLock, Preserve)
 {
 PRQA,   8,

Signed-off-by: Ani Sinha 
Acked-by: Igor Mammedov 
---
 tests/data/acpi/q35/DSDT.noacpihp   | Bin 8248 -> 8241 bytes
 tests/qtest/bios-tables-test-allowed-diff.h |   1 -
 2 files changed, 1 deletion(-)

diff --git a/tests/data/acpi/q35/DSDT.noacpihp 
b/tests/data/acpi/q35/DSDT.noacpihp
index 
6ab1f0e52543fcb7f84a7fd1327fe5aa42010565..8cab2f8eb9ae94e0165f3f17857ec7d080fb0e13
 100644
GIT binary patch
delta 109
zcmdntu+f3bCDi)r&-xoSoL
DyqFtK

delta 94
zcmdn!u)~4NCD

[PATCH v8 4/6] tests/qtest/hd-geo-test: fix incorrect pcie-root-port usage and simplify test

2023-07-05 Thread Ani Sinha

The test attaches a SCSI controller to a non-zero slot and a pcie-to-pci bridge
on slot 0 on the same pcie-root-port. Since a downstream device can be attached
to a pcie-root-port only on slot 0, the above test configuration is not allowed.
Additionally using pcie.0 as id for pcie-to-pci bridge is incorrect as that id
is reserved only for the root bus.

In the test scenario, there is no need to attach a pcie-root-port to the
root complex. A SCSI controller can be attached to a pcie-to-pci bridge
which can then be directly attached to the root bus (pcie.0).

Fix the test and simplify it.

CC: m...@redhat.com
CC: imamm...@redhat.com
CC: Michael Labiuk 
Acked-by: Thomas Huth 
Reviewed-by: Igor Mammedov 
Signed-off-by: Ani Sinha 
---
 tests/qtest/hd-geo-test.c | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/tests/qtest/hd-geo-test.c b/tests/qtest/hd-geo-test.c
index 5aa258a2b3..d08bffad91 100644
--- a/tests/qtest/hd-geo-test.c
+++ b/tests/qtest/hd-geo-test.c
@@ -784,14 +784,12 @@ static void test_override_scsi(void)
 test_override(args, "pc", expected);
 }
 
-static void setup_pci_bridge(TestArgs *args, const char *id, const char 
*rootid)
+static void setup_pci_bridge(TestArgs *args, const char *id)
 {
 
-char *root, *br;
-root = g_strdup_printf("-device pcie-root-port,id=%s", rootid);
-br = g_strdup_printf("-device pcie-pci-bridge,bus=%s,id=%s", rootid, id);
+char *br;
+br = g_strdup_printf("-device pcie-pci-bridge,bus=pcie.0,id=%s", id);
 
-args->argc = append_arg(args->argc, args->argv, ARGV_SIZE, root);
 args->argc = append_arg(args->argc, args->argv, ARGV_SIZE, br);
 }
 
@@ -811,8 +809,8 @@ static void test_override_scsi_q35(void)
 add_drive_with_mbr(args, empty_mbr, 1);
 add_drive_with_mbr(args, empty_mbr, 1);
 add_drive_with_mbr(args, empty_mbr, 1);
-setup_pci_bridge(args, "pcie.0", "br");
-add_scsi_controller(args, "lsi53c895a", "br", 3);
+setup_pci_bridge(args, "pcie-pci-br");
+add_scsi_controller(args, "lsi53c895a", "pcie-pci-br", 3);
 add_scsi_disk(args, 0, 0, 0, 0, 0, 1, 120, 30);
 add_scsi_disk(args, 1, 0, 0, 1, 0, 9000, 120, 30);
 add_scsi_disk(args, 2, 0, 0, 2, 0, 1, 0, 0);
@@ -868,9 +866,9 @@ static void test_override_virtio_blk_q35(void)
 };
 add_drive_with_mbr(args, empty_mbr, 1);
 add_drive_with_mbr(args, empty_mbr, 1);
-setup_pci_bridge(args, "pcie.0", "br");
-add_virtio_disk(args, 0, "br", 3, 1, 120, 30);
-add_virtio_disk(args, 1, "br", 4, 9000, 120, 30);
+setup_pci_bridge(args, "pcie-pci-br");
+add_virtio_disk(args, 0, "pcie-pci-br", 3, 1, 120, 30);
+add_virtio_disk(args, 1, "pcie-pci-br", 4, 9000, 120, 30);
 test_override(args, "q35", expected);
 }
 
-- 
2.39.1

[PATCH v8 1/6] tests/acpi: allow changes in DSDT.noacpihp table blob

2023-07-05 Thread Ani Sinha

We are going to fix bio-tables-test in the next patch and hence need to
make sure the acpi tests continue to pass.

Signed-off-by: Ani Sinha 
Acked-by: Igor Mammedov 
---
 tests/qtest/bios-tables-test-allowed-diff.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..31df9c6187 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,2 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/q35/DSDT.noacpihp",
-- 
2.39.1

Re: [PATCH v8 6/6] hw/pci: add comment explaining the reason for checking function 0 in hotplug

2023-07-05 Thread Ani Sinha




> On 05-Jul-2023, at 5:29 PM, Ani Sinha  wrote:
> 
> This change is cosmetic. A comment is added explaining why we need to check 
> for
> the availability of function 0 when we hotplug a device.

Please ignore this patch. Its a duplicate of one already sent with an updated 
patch summary.

> 
> CC: m...@redhat.com
> Signed-off-by: Ani Sinha 
> ---
> hw/pci/pci.c | 11 ---
> 1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 62b393dfb7..7aee3a7f12 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -1181,9 +1181,14 @@ static PCIDevice *do_pci_register_device(PCIDevice 
> *pci_dev,
>PCI_SLOT(devfn), PCI_FUNC(devfn), name,
>bus->devices[devfn]->name, bus->devices[devfn]->qdev.id);
> return NULL;
> -} else if (dev->hotplugged &&
> -   !pci_is_vf(pci_dev) &&
> -   pci_get_function_0(pci_dev)) {
> +} /*
> +   * Populating function 0 triggers a scan from the guest that
> +   * exposes other non-zero functions. Hence we need to ensure that
> +   * function 0 wasn't added yet.
> +   */
> +else if (dev->hotplugged &&
> + !pci_is_vf(pci_dev) &&
> + pci_get_function_0(pci_dev)) {
> error_setg(errp, "PCI: slot %d function 0 already occupied by %s,"
>" new func %s cannot be exposed to guest.",
>PCI_SLOT(pci_get_function_0(pci_dev)->devfn),
> -- 
> 2.39.1
>

[PATCH v2 0/4] ppc/pnv: SMT support for powernv

2023-07-05 Thread Nicholas Piggin

These patches implement enough to install a distro, boot, run SMP KVM
guests with libvirt with good performance using MTTCG (as reported by
Cedric).

There are a few more SPRs that need to be done, and per-LPAR SPRs are
mostly not annotated yet so it can't run in 1 LPAR mode. But those can
be added in time, it will take a bit of time to get everything exactly
as hardware does so I consider this good enough to run common
software usefully.

Since RFC:
- Rebased against ppc-next (no conflicts vs upstream anyway).
- Add patch 4 avocado boot test with SMT, as was added with pseries SMT.
- Renamed POWERPC_FLAG_1LPAR to POWERPC_FLAG_SMT_1LPAR since it implies
  SMT.
- Fixed typos, patch 1, 3 changelogs improvement (hopefully).

Since v1:
- Fix clang compile bug
- Fix LPAR-per-thread bug in CTRL/DPDES/msgsndp in patch 1
- Add 2-socket test case to powernv Linux boot avocado test
- Remove SMT caveat from docs/system/ppc/powernv.rst

Thanks,
Nick

Nicholas Piggin (4):
  target/ppc: Add LPAR-per-core vs per-thread mode flag
  target/ppc: SMT support for the HID SPR
  ppc/pnv: SMT support for powernv
  tests/avocado: Add powernv machine test script

 docs/system/ppc/powernv.rst  |  5 ---
 hw/ppc/pnv.c | 12 +
 hw/ppc/pnv_core.c| 13 +++---
 hw/ppc/spapr_cpu_core.c  |  2 +
 target/ppc/cpu.h |  3 ++
 target/ppc/cpu_init.c| 14 +-
 target/ppc/excp_helper.c |  4 ++
 target/ppc/helper.h  |  1 +
 target/ppc/misc_helper.c | 29 
 target/ppc/spr_common.h  |  1 +
 target/ppc/translate.c   | 27 ---
 tests/avocado/ppc_powernv.py | 87 
 12 files changed, 179 insertions(+), 19 deletions(-)
 create mode 100644 tests/avocado/ppc_powernv.py

-- 
2.40.1

[PATCH v2 1/4] target/ppc: Add LPAR-per-core vs per-thread mode flag

2023-07-05 Thread Nicholas Piggin

The Power ISA has the concept of sub-processors:

  Hardware is allowed to sub-divide a multi-threaded processor into
  "sub-processors" that appear to privileged programs as multi-threaded
  processors with fewer threads.

POWER9 and POWER10 have two modes, either every thread is a
sub-processor or all threads appear as one multi-threaded processor. In
the user manuals these are known as "LPAR per thread" / "Thread LPAR",
and "LPAR per core" / "1 LPAR", respectively.

The practical difference is: in thread LPAR mode, non-hypervisor SPRs
are not shared between threads and msgsndp can not be used to message
siblings. In 1 LPAR mode, some SPRs are shared and msgsndp is usable.
Thrad LPAR allows multiple partitions to run concurrently on the same
core, and is a requirement for KVM to run on POWER9/10 (which does not
gang-schedule an LPAR on all threads of a core like POWER8 KVM).

Traditionally, SMT in PAPR environments including PowerVM and the
pseries QEMU machine with KVM acceleration behaves as in 1 LPAR mode.
In OPAL systems, Thread LPAR is used. When adding SMT to the powernv
machine, it is therefore preferable to emulate Thread LPAR.

To account for this difference between pseries and powernv, an LPAR mode
flag is added such that SPRs can be implemented as per-LPAR shared, and
that becomes either per-thread or per-core depending on the flag.

Reviewed-by: Joel Stanley 
Reviewed-by: Cédric Le Goater 
Tested-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/spapr_cpu_core.c  |  2 ++
 target/ppc/cpu.h |  3 +++
 target/ppc/cpu_init.c| 12 
 target/ppc/excp_helper.c |  4 
 target/ppc/misc_helper.c |  8 
 target/ppc/translate.c   | 11 ++-
 6 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index a4e3c2fadd..b482d9754a 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -270,6 +270,8 @@ static bool spapr_realize_vcpu(PowerPCCPU *cpu, 
SpaprMachineState *spapr,
 env->spr_cb[SPR_PIR].default_value = cs->cpu_index;
 env->spr_cb[SPR_TIR].default_value = thread_index;
 
+cpu_ppc_set_1lpar(cpu);
+
 /* Set time-base frequency to 512 MHz. vhyp must be set first. */
 cpu_ppc_tb_init(env, SPAPR_TIMEBASE_FREQ);
 
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index af12c93ebc..951f73d89d 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -674,6 +674,8 @@ enum {
 POWERPC_FLAG_SCV  = 0x0020,
 /* Has >1 thread per core*/
 POWERPC_FLAG_SMT  = 0x0040,
+/* Using "LPAR per core" mode  (as opposed to per-thread)*/
+POWERPC_FLAG_SMT_1LPAR= 0x0080,
 };
 
 /*
@@ -1437,6 +1439,7 @@ void store_booke_tsr(CPUPPCState *env, target_ulong val);
 void ppc_tlb_invalidate_all(CPUPPCState *env);
 void ppc_tlb_invalidate_one(CPUPPCState *env, target_ulong addr);
 void cpu_ppc_set_vhyp(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp);
+void cpu_ppc_set_1lpar(PowerPCCPU *cpu);
 int ppcmas_tlb_check(CPUPPCState *env, ppcmas_tlb_t *tlb, hwaddr *raddrp,
  target_ulong address, uint32_t pid);
 int ppcemb_tlb_search(CPUPPCState *env, target_ulong address, uint32_t pid);
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 5f4969664e..905a59aea9 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -6629,6 +6629,18 @@ void cpu_ppc_set_vhyp(PowerPCCPU *cpu, 
PPCVirtualHypervisor *vhyp)
 env->msr_mask &= ~MSR_HVB;
 }
 
+void cpu_ppc_set_1lpar(PowerPCCPU *cpu)
+{
+CPUPPCState *env = &cpu->env;
+
+/*
+ * pseries SMT means "LPAR per core" mode, e.g., msgsndp is usable
+ * between threads.
+ */
+if (env->flags & POWERPC_FLAG_SMT) {
+env->flags |= POWERPC_FLAG_SMT_1LPAR;
+}
+}
 #endif /* !defined(CONFIG_USER_ONLY) */
 
 #endif /* defined(TARGET_PPC64) */
diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 354392668e..7683ea0fc9 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -3130,6 +3130,10 @@ void helper_book3s_msgsndp(CPUPPCState *env, 
target_ulong rb)
 
 helper_hfscr_facility_check(env, HFSCR_MSGP, "msgsndp", HFSCR_IC_MSGP);
 
+if (!(env->flags & POWERPC_FLAG_SMT_1LPAR)) {
+nr_threads = 1; /* msgsndp behaves as 1-thread in LPAR-per-thread 
mode*/
+}
+
 if (!dbell_type_server(rb) || ttir >= nr_threads) {
 return;
 }
diff --git a/target/ppc/misc_helper.c b/target/ppc/misc_helper.c
index 1f1af21f33..26e546cc9c 100644
--- a/target/ppc/misc_helper.c
+++ b/target/ppc/misc_helper.c
@@ -191,6 +191,10 @@ target_ulong helper_load_dpdes(CPUPPCState *env)
 
 helper_hfscr_facility_check(env, HFSCR_MSGP, "load DPDES", HFSCR_IC_MSGP);
 
+if (!(env->flags & POWERPC_FLAG_SMT_1LPAR)) {
+nr_threads = 1; /* DPDES behaves as 1-thread in LPAR-per-thread mode */
+}
+
 if (nr_threads == 1) {
 if (env->pending_interrupts &

[PATCH v2 3/4] ppc/pnv: SMT support for powernv

2023-07-05 Thread Nicholas Piggin

Set the TIR default value with the SMT thread index, and place some
standard limits on SMT configurations. Now powernv is able to boot
skiboot and Linux with a SMT topology, including booting a KVM guest.

There are several SPRs and other features (e.g., broadcast msgsnd)
that are not implemented, but not used by OPAL or Linux and can be
added incrementally.

Reviewed-by: Cédric Le Goater 
Tested-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 docs/system/ppc/powernv.rst |  5 -
 hw/ppc/pnv.c| 12 
 hw/ppc/pnv_core.c   | 13 +
 3 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/docs/system/ppc/powernv.rst b/docs/system/ppc/powernv.rst
index c8f9762342..09f3965858 100644
--- a/docs/system/ppc/powernv.rst
+++ b/docs/system/ppc/powernv.rst
@@ -195,11 +195,6 @@ Use a MTD drive to add a PNOR to the machine, and get a 
NVRAM :
 
   -drive file=./witherspoon.pnor,format=raw,if=mtd
 
-CAVEATS

-
- * No support for multiple HW threads (SMT=1). Same as pseries.
-
 Maintainer contact information
 --
 
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 5f25fe985a..23740f9d07 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -887,6 +887,18 @@ static void pnv_init(MachineState *machine)
 
 pnv->num_chips =
 machine->smp.max_cpus / (machine->smp.cores * machine->smp.threads);
+
+if (machine->smp.threads > 8) {
+error_report("Cannot support more than 8 threads/core "
+ "on a powernv machine");
+exit(1);
+}
+if (!is_power_of_2(machine->smp.threads)) {
+error_report("Cannot support %d threads/core on a powernv"
+ "machine because it must be a power of 2",
+ machine->smp.threads);
+exit(1);
+}
 /*
  * TODO: should we decide on how many chips we can create based
  * on #cores and Venice vs. Murano vs. Naples chip type etc...,
diff --git a/hw/ppc/pnv_core.c b/hw/ppc/pnv_core.c
index ffbc29cbf4..17c267fa61 100644
--- a/hw/ppc/pnv_core.c
+++ b/hw/ppc/pnv_core.c
@@ -218,12 +218,13 @@ static const MemoryRegionOps pnv_core_power10_xscom_ops = 
{
 .endianness = DEVICE_BIG_ENDIAN,
 };
 
-static void pnv_core_cpu_realize(PnvCore *pc, PowerPCCPU *cpu, Error **errp)
+static void pnv_core_cpu_realize(PnvCore *pc, PowerPCCPU *cpu, Error **errp,
+ int thread_index)
 {
 CPUPPCState *env = &cpu->env;
 int core_pir;
-int thread_index = 0; /* TODO: TCG supports only one thread */
 ppc_spr_t *pir = &env->spr_cb[SPR_PIR];
+ppc_spr_t *tir = &env->spr_cb[SPR_TIR];
 Error *local_err = NULL;
 PnvChipClass *pcc = PNV_CHIP_GET_CLASS(pc->chip);
 
@@ -239,11 +240,7 @@ static void pnv_core_cpu_realize(PnvCore *pc, PowerPCCPU 
*cpu, Error **errp)
 
 core_pir = object_property_get_uint(OBJECT(pc), "pir", &error_abort);
 
-/*
- * The PIR of a thread is the core PIR + the thread index. We will
- * need to find a way to get the thread index when TCG supports
- * more than 1. We could use the object name ?
- */
+tir->default_value = thread_index;
 pir->default_value = core_pir + thread_index;
 
 /* Set time-base frequency to 512 MHz */
@@ -292,7 +289,7 @@ static void pnv_core_realize(DeviceState *dev, Error **errp)
 }
 
 for (j = 0; j < cc->nr_threads; j++) {
-pnv_core_cpu_realize(pc, pc->threads[j], &local_err);
+pnv_core_cpu_realize(pc, pc->threads[j], &local_err, j);
 if (local_err) {
 goto err;
 }
-- 
2.40.1

[PATCH v2 2/4] target/ppc: SMT support for the HID SPR

2023-07-05 Thread Nicholas Piggin

HID is a per-core shared register, skiboot sets this (e.g., setting
HILE) on one thread and that must affect all threads of the core.

Reviewed-by: Cédric Le Goater 
Tested-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 target/ppc/cpu_init.c|  2 +-
 target/ppc/helper.h  |  1 +
 target/ppc/misc_helper.c | 21 +
 target/ppc/spr_common.h  |  1 +
 target/ppc/translate.c   | 16 
 5 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 905a59aea9..720aad9e05 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -5638,7 +5638,7 @@ static void register_power_common_book4_sprs(CPUPPCState 
*env)
 spr_register_hv(env, SPR_HID0, "HID0",
  SPR_NOACCESS, SPR_NOACCESS,
  SPR_NOACCESS, SPR_NOACCESS,
- &spr_read_generic, &spr_write_generic,
+ &spr_read_generic, &spr_core_write_generic,
  0x);
 spr_register_hv(env, SPR_TSCR, "TSCR",
  SPR_NOACCESS, SPR_NOACCESS,
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 828f7844c8..abec6fe341 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -704,6 +704,7 @@ DEF_HELPER_3(store_dcr, void, env, tl, tl)
 
 DEF_HELPER_2(load_dump_spr, void, env, i32)
 DEF_HELPER_2(store_dump_spr, void, env, i32)
+DEF_HELPER_3(spr_core_write_generic, void, env, i32, tl)
 DEF_HELPER_3(spr_write_CTRL, void, env, i32, tl)
 
 DEF_HELPER_4(fscr_facility_check, void, env, i32, i32, i32)
diff --git a/target/ppc/misc_helper.c b/target/ppc/misc_helper.c
index 26e546cc9c..692d058665 100644
--- a/target/ppc/misc_helper.c
+++ b/target/ppc/misc_helper.c
@@ -43,6 +43,27 @@ void helper_store_dump_spr(CPUPPCState *env, uint32_t sprn)
  env->spr[sprn]);
 }
 
+void helper_spr_core_write_generic(CPUPPCState *env, uint32_t sprn,
+   target_ulong val)
+{
+CPUState *cs = env_cpu(env);
+CPUState *ccs;
+uint32_t nr_threads = cs->nr_threads;
+uint32_t core_id = env->spr[SPR_PIR] & ~(nr_threads - 1);
+
+assert(core_id == env->spr[SPR_PIR] - env->spr[SPR_TIR]);
+
+if (nr_threads == 1) {
+env->spr[sprn] = val;
+return;
+}
+
+THREAD_SIBLING_FOREACH(cs, ccs) {
+CPUPPCState *cenv = &POWERPC_CPU(ccs)->env;
+cenv->spr[sprn] = val;
+}
+}
+
 void helper_spr_write_CTRL(CPUPPCState *env, uint32_t sprn,
target_ulong val)
 {
diff --git a/target/ppc/spr_common.h b/target/ppc/spr_common.h
index fbf52123b5..5995070eaf 100644
--- a/target/ppc/spr_common.h
+++ b/target/ppc/spr_common.h
@@ -82,6 +82,7 @@ void spr_noaccess(DisasContext *ctx, int gprn, int sprn);
 void spr_read_generic(DisasContext *ctx, int gprn, int sprn);
 void spr_write_generic(DisasContext *ctx, int sprn, int gprn);
 void spr_write_generic32(DisasContext *ctx, int sprn, int gprn);
+void spr_core_write_generic(DisasContext *ctx, int sprn, int gprn);
 void spr_write_MMCR0(DisasContext *ctx, int sprn, int gprn);
 void spr_write_MMCR1(DisasContext *ctx, int sprn, int gprn);
 void spr_write_PMC(DisasContext *ctx, int sprn, int gprn);
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 4556297ab5..e6a0709066 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -438,6 +438,22 @@ void spr_write_generic32(DisasContext *ctx, int sprn, int 
gprn)
 #endif
 }
 
+void spr_core_write_generic(DisasContext *ctx, int sprn, int gprn)
+{
+if (!(ctx->flags & POWERPC_FLAG_SMT)) {
+spr_write_generic(ctx, sprn, gprn);
+return;
+}
+
+if (!gen_serialize(ctx)) {
+return;
+}
+
+gen_helper_spr_core_write_generic(cpu_env, tcg_constant_i32(sprn),
+  cpu_gpr[gprn]);
+spr_store_dump_spr(sprn);
+}
+
 static void spr_write_CTRL_ST(DisasContext *ctx, int sprn, int gprn)
 {
 /* This does not implement >1 thread */
-- 
2.40.1

[PATCH v2 4/4] tests/avocado: Add powernv machine test script

2023-07-05 Thread Nicholas Piggin

This copies ppc_pseries.py to start a set of powernv tests, including
a Linux boot test for the newly added SMT mode.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
I didn't add the powernv10 support yet as Cedric suggested, and kept the
same vmlinuz because it's common with the pseries tests. We should do
that in later tests though. Might be time to update default to power10
soon if the model is becoming more complete...

Thanks,
Nick

 tests/avocado/ppc_powernv.py | 87 
 1 file changed, 87 insertions(+)
 create mode 100644 tests/avocado/ppc_powernv.py

diff --git a/tests/avocado/ppc_powernv.py b/tests/avocado/ppc_powernv.py
new file mode 100644
index 00..d0e5c07bde
--- /dev/null
+++ b/tests/avocado/ppc_powernv.py
@@ -0,0 +1,87 @@
+# Test that Linux kernel boots on ppc powernv machines and check the console
+#
+# Copyright (c) 2018, 2020 Red Hat, Inc.
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or
+# later.  See the COPYING file in the top-level directory.
+
+from avocado.utils import archive
+from avocado_qemu import QemuSystemTest
+from avocado_qemu import wait_for_console_pattern
+
+class powernvMachine(QemuSystemTest):
+
+timeout = 90
+KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 '
+panic_message = 'Kernel panic - not syncing'
+good_message = 'VFS: Cannot open root device'
+
+def do_test_linux_boot(self):
+self.require_accelerator("tcg")
+kernel_url = ('https://archives.fedoraproject.org/pub/archive'
+  '/fedora-secondary/releases/29/Everything/ppc64le/os'
+  '/ppc/ppc64/vmlinuz')
+kernel_hash = '3fe04abfc852b66653b8c3c897a59a689270bc77'
+kernel_path = self.fetch_asset(kernel_url, asset_hash=kernel_hash)
+
+self.vm.set_console()
+kernel_command_line = self.KERNEL_COMMON_COMMAND_LINE + 'console=hvc0'
+self.vm.add_args('-kernel', kernel_path,
+ '-append', kernel_command_line)
+self.vm.launch()
+
+def test_linux_boot(self):
+"""
+:avocado: tags=arch:ppc64
+:avocado: tags=machine:powernv
+:avocado: tags=accel:tcg
+"""
+
+self.do_test_linux_boot()
+console_pattern = 'VFS: Cannot open root device'
+wait_for_console_pattern(self, console_pattern, self.panic_message)
+
+def test_linux_smp_boot(self):
+"""
+:avocado: tags=arch:ppc64
+:avocado: tags=machine:powernv
+:avocado: tags=accel:tcg
+"""
+
+self.vm.add_args('-smp', '4')
+self.do_test_linux_boot()
+console_pattern = 'smp: Brought up 1 node, 4 CPUs'
+wait_for_console_pattern(self, console_pattern, self.panic_message)
+wait_for_console_pattern(self, self.good_message, self.panic_message)
+
+def test_linux_smt_boot(self):
+"""
+:avocado: tags=arch:ppc64
+:avocado: tags=machine:powernv
+:avocado: tags=accel:tcg
+"""
+
+self.vm.add_args('-smp', '4,threads=4')
+self.do_test_linux_boot()
+console_pattern = 'CPU maps initialized for 4 threads per core'
+wait_for_console_pattern(self, console_pattern, self.panic_message)
+console_pattern = 'smp: Brought up 1 node, 4 CPUs'
+wait_for_console_pattern(self, console_pattern, self.panic_message)
+wait_for_console_pattern(self, self.good_message, self.panic_message)
+
+def test_linux_big_boot(self):
+"""
+:avocado: tags=arch:ppc64
+:avocado: tags=machine:powernv
+:avocado: tags=accel:tcg
+"""
+
+self.vm.add_args('-smp', '16,threads=4,cores=2,sockets=2')
+
+# powernv does not support NUMA
+self.do_test_linux_boot()
+console_pattern = 'CPU maps initialized for 4 threads per core'
+wait_for_console_pattern(self, console_pattern, self.panic_message)
+console_pattern = 'smp: Brought up 2 nodes, 16 CPUs'
+wait_for_console_pattern(self, console_pattern, self.panic_message)
+wait_for_console_pattern(self, self.good_message, self.panic_message)
-- 
2.40.1

Re: [PATCH v2] linux-user/syscall: Implement execve without execveat

2023-07-05 Thread Pierrick Bouvier


On 7/5/23 12:26, Michael Tokarev wrote:

05.07.2023 12:00, Pierrick Bouvier wrote:
...

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 08162cc966..4945ddd7f2 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -659,6 +659,7 @@ safe_syscall4(pid_t, wait4, pid_t, pid, int *, status, int, 
options, \
   #endif
   safe_syscall5(int, waitid, idtype_t, idtype, id_t, id, siginfo_t *, infop, \
 int, options, struct rusage *, rusage)
+safe_syscall3(int, execve, const char *, filename, char **, argv, char **, 
envp)
   safe_syscall5(int, execveat, int, dirfd, const char *, filename,
 char **, argv, char **, envp, int, flags)
   #if defined(TARGET_NR_select) || defined(TARGET_NR__newselect) || \
@@ -8615,9 +8616,9 @@ ssize_t do_guest_readlink(const char *pathname, char 
*buf, size_t bufsiz)
   return ret;
   }
   
-static int do_execveat(CPUArchState *cpu_env, int dirfd,

-   abi_long pathname, abi_long guest_argp,
-   abi_long guest_envp, int flags)
+static int do_execv(CPUArchState *cpu_env, int dirfd,
+abi_long pathname, abi_long guest_argp,
+abi_long guest_envp, int flags, bool is_execveat)
   {
   int ret;
   char **argp, **envp;
@@ -8696,11 +8697,14 @@ static int do_execveat(CPUArchState *cpu_env, int dirfd,
   goto execve_efault;
   }
   
+const char* exe = p;

   if (is_proc_myself(p, "exe")) {
-ret = get_errno(safe_execveat(dirfd, exec_path, argp, envp, flags));
-} else {
-ret = get_errno(safe_execveat(dirfd, p, argp, envp, flags));
+exe = exec_path;
   }
+ret = is_execveat ?
+safe_execveat(dirfd, exe, argp, envp, flags):
+safe_execve(exe, argp, envp);
+ret = get_errno(ret);



This one has 2 issues reported by checkpatch.pl:

$ git show | ./scripts/checkpatch.pl -

ERROR: "foo* bar" should be "foo *bar"
#161: FILE: linux-user/syscall.c:8700:
+const char* exe = p;

ERROR: spaces required around that ':' (ctx:VxE)
#169: FILE: linux-user/syscall.c:8705:
+safe_execveat(dirfd, exe, argp, envp, flags):
   ^

total: 2 errors, 0 warnings, 47 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

As I mentioned in the v1, I don't remember offhand how the arithmetic if
should be styled in qemu.  At the very best, the v2 variant is difficult
to read because ":" is too close to ";" visually, an extra space before
it will make it more explicit.



Sorry about it.
I'll fix it, and use style advised by Philippe.


Other than that, I'm fine with this version.

With the checkpatch issues fixed,

Reviewed-by: Michael Tokarev 

Thanks,

/mjt

[PATCH v3] linux-user/syscall: Implement execve without execveat

2023-07-05 Thread Pierrick Bouvier

Support for execveat syscall was implemented in 55bbe4 and is available
since QEMU 8.0.0. It relies on host execveat, which is widely available
on most of Linux kernels today.

However, this change breaks qemu-user self emulation, if "host" qemu
version is less than 8.0.0. Indeed, it does not implement yet execveat.
This strange use case happens with most of distribution today having
binfmt support.

With a concrete failing example:
$ qemu-x86_64-7.2 qemu-x86_64-8.0 /bin/bash -c /bin/ls
/bin/bash: line 1: /bin/ls: Function not implemented
-> not implemented means execve returned ENOSYS

qemu-user-static 7.2 and 8.0 can be conveniently grabbed from debian
packages qemu-user-static* [1].

One usage of this is running wine-arm64 from linux-x64 (details [2]).
This is by updating qemu embedded in docker image that we ran into this
issue.

The solution to update host qemu is not always possible. Either it's
complicated or ask you to recompile it, or simply is not accessible
(GitLab CI, GitHub Actions). Thus, it could be worth to implement execve
without relying on execveat, which is the goal of this patch.

This patch was tested with example presented in this commit message.

[1] http://ftp.us.debian.org/debian/pool/main/q/qemu/
[1] https://www.linaro.org/blog/emulate-windows-on-arm/

Signed-off-by: Pierrick Bouvier 
---
 linux-user/syscall.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 08162cc966..90777c5833 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -659,6 +659,7 @@ safe_syscall4(pid_t, wait4, pid_t, pid, int *, status, int, 
options, \
 #endif
 safe_syscall5(int, waitid, idtype_t, idtype, id_t, id, siginfo_t *, infop, \
   int, options, struct rusage *, rusage)
+safe_syscall3(int, execve, const char *, filename, char **, argv, char **, 
envp)
 safe_syscall5(int, execveat, int, dirfd, const char *, filename,
   char **, argv, char **, envp, int, flags)
 #if defined(TARGET_NR_select) || defined(TARGET_NR__newselect) || \
@@ -8615,9 +8616,9 @@ ssize_t do_guest_readlink(const char *pathname, char 
*buf, size_t bufsiz)
 return ret;
 }
 
-static int do_execveat(CPUArchState *cpu_env, int dirfd,
-   abi_long pathname, abi_long guest_argp,
-   abi_long guest_envp, int flags)
+static int do_execv(CPUArchState *cpu_env, int dirfd,
+abi_long pathname, abi_long guest_argp,
+abi_long guest_envp, int flags, bool is_execveat)
 {
 int ret;
 char **argp, **envp;
@@ -8696,11 +8697,14 @@ static int do_execveat(CPUArchState *cpu_env, int dirfd,
 goto execve_efault;
 }
 
+const char *exe = p;
 if (is_proc_myself(p, "exe")) {
-ret = get_errno(safe_execveat(dirfd, exec_path, argp, envp, flags));
-} else {
-ret = get_errno(safe_execveat(dirfd, p, argp, envp, flags));
+exe = exec_path;
 }
+ret = is_execveat
+? safe_execveat(dirfd, exe, argp, envp, flags)
+: safe_execve(exe, argp, envp);
+ret = get_errno(ret);
 
 unlock_user(p, pathname, 0);
 
@@ -9251,9 +9255,9 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int 
num, abi_long arg1,
 return ret;
 #endif
 case TARGET_NR_execveat:
-return do_execveat(cpu_env, arg1, arg2, arg3, arg4, arg5);
+return do_execv(cpu_env, arg1, arg2, arg3, arg4, arg5, true);
 case TARGET_NR_execve:
-return do_execveat(cpu_env, AT_FDCWD, arg1, arg2, arg3, 0);
+return do_execv(cpu_env, AT_FDCWD, arg1, arg2, arg3, 0, false);
 case TARGET_NR_chdir:
 if (!(p = lock_user_string(arg1)))
 return -TARGET_EFAULT;
-- 
2.40.1

Re: [PATCH] Revert "virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events"

2023-07-05 Thread Mark Kanda


On 7/5/2023 2:15 AM, Stefano Garzarella wrote:

This reverts commit 8cc5583abe6419e7faaebc9fbd109f34f4c850f2.

That commit causes several problems in Linux as described in the BZ.
In particular, after a while, other devices on the bus are no longer
usable even if those devices are not affected by the hotunplug.
This may be a problem in Linux, but we have not been able to identify
it so far. So better to revert this patch until we find a solution.

Also, Oracle, which initially proposed this patch for a problem with
Solaris, seems to have already reversed it downstream:
 https://linux.oracle.com/errata/ELSA-2023-12065.html

Suggested-by: Thomas Huth 
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2176702
Cc: qemu-sta...@nongnu.org
Cc: Mark Kanda 
Signed-off-by: Stefano Garzarella 


Reviewed-by: Mark Kanda 

Thanks/regards,
-Mark


---
  include/hw/scsi/scsi.h |  1 -
  hw/scsi/scsi-bus.c | 18 --
  hw/scsi/virtio-scsi.c  |  2 --
  3 files changed, 21 deletions(-)

diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
index e2bb1a2fbf..7c8adf10b1 100644
--- a/include/hw/scsi/scsi.h
+++ b/include/hw/scsi/scsi.h
@@ -198,7 +198,6 @@ SCSIDevice *scsi_bus_legacy_add_drive(SCSIBus *bus, 
BlockBackend *blk,
BlockdevOnError rerror,
BlockdevOnError werror,
const char *serial, Error **errp);
-void scsi_bus_set_ua(SCSIBus *bus, SCSISense sense);
  void scsi_bus_legacy_handle_cmdline(SCSIBus *bus);
  
  SCSIRequest *scsi_req_alloc(const SCSIReqOps *reqops, SCSIDevice *d,

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index f80f4cb4fc..42a915f8b7 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -1617,24 +1617,6 @@ static int scsi_ua_precedence(SCSISense sense)
  return (sense.asc << 8) | sense.ascq;
  }
  
-void scsi_bus_set_ua(SCSIBus *bus, SCSISense sense)

-{
-int prec1, prec2;
-if (sense.key != UNIT_ATTENTION) {
-return;
-}
-
-/*
- * Override a pre-existing unit attention condition, except for a more
- * important reset condition.
- */
-prec1 = scsi_ua_precedence(bus->unit_attention);
-prec2 = scsi_ua_precedence(sense);
-if (prec2 < prec1) {
-bus->unit_attention = sense;
-}
-}
-
  void scsi_device_set_ua(SCSIDevice *sdev, SCSISense sense)
  {
  int prec1, prec2;
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 45b95ea070..1f56607100 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -1080,7 +1080,6 @@ static void virtio_scsi_hotplug(HotplugHandler 
*hotplug_dev, DeviceState *dev,
  
  virtio_scsi_acquire(s);

  virtio_scsi_push_event(s, &info);
-scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
  virtio_scsi_release(s);
  }
  }
@@ -1112,7 +,6 @@ static void virtio_scsi_hotunplug(HotplugHandler 
*hotplug_dev, DeviceState *dev,
  if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
  virtio_scsi_acquire(s);
  virtio_scsi_push_event(s, &info);
-scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
  virtio_scsi_release(s);
  }
  }

Re: [PATCH] Revert "virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events"

2023-07-05 Thread Stefano Garzarella


Hi Mark,

On Wed, Jul 05, 2023 at 07:28:05AM -0500, Mark Kanda wrote:

On 7/5/2023 2:15 AM, Stefano Garzarella wrote:

This reverts commit 8cc5583abe6419e7faaebc9fbd109f34f4c850f2.

That commit causes several problems in Linux as described in the BZ.
In particular, after a while, other devices on the bus are no longer
usable even if those devices are not affected by the hotunplug.
This may be a problem in Linux, but we have not been able to identify
it so far. So better to revert this patch until we find a solution.

Also, Oracle, which initially proposed this patch for a problem with
Solaris, seems to have already reversed it downstream:
https://linux.oracle.com/errata/ELSA-2023-12065.html

Suggested-by: Thomas Huth 
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2176702
Cc: qemu-sta...@nongnu.org
Cc: Mark Kanda 
Signed-off-by: Stefano Garzarella 


Reviewed-by: Mark Kanda 



Thanks for the review.

By any chance do you have any information you can share regarding
[Orabug: 34905939] mentioned in the errata?

I'd like to better understand why this patch created problems in Linux,
but solved others in Solaris.

Thanks,
Stefano

Re: [PATCH v4] virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events.

2023-07-05 Thread Mark Kanda


HI Stefano,

On 7/4/2023 9:14 AM, Stefano Garzarella wrote:

Hi Mark,
we have a bug [1] possibly related to this patch.

I saw this Oracle Linux errata [2] where you reverted this patch, but
there are no details.

Do you think we should revert it upstream as well?
Do you have any details about the problem it causes in Linux?


While the patch fixed an issue with Solaris VMs, we unfortunately discovered 
regressions with hotplug on Linux VMs. The symptoms were similar to what is 
reported in bz2176702. I think reverting it is the right thing to do until this 
can be investigated further.


Thanks/regards,
-Mark


[1] https://bugzilla.redhat.com/show_bug.cgi?id=2176702
[2] https://linux.oracle.com/errata/ELSA-2023-12065.html

Thanks,
Stefano

On Thu, Oct 6, 2022 at 9:53 PM Venu Busireddy  wrote:

Section 5.6.6.3 of VirtIO specification states, "Events will also
be reported via sense codes..." However, no sense data is sent when
VIRTIO_SCSI_EVT_RESET_RESCAN or VIRTIO_SCSI_EVT_RESET_REMOVED events
are reported (when disk hotplug/hotunplug events occur). SCSI layer
on Solaris depends on this sense data, and hence does not handle disk
hotplug/hotunplug events.

When the disk inventory changes, use the bus unit attention mechanism
to return a CHECK_CONDITION status with sense data of 0x06/0x3F/0x0E
(sense code REPORTED_LUNS_CHANGED).

Signed-off-by: Venu Busireddy 

v3 -> v4:
 - As suggested by Paolo Bonzini, use the bus unit attention mechanism
   instead of the device unit attention mechanism.

v2 -> v3:
 - Implement the suggestion from Paolo Bonzini .

v1 -> v2:
 - Send the sense data for VIRTIO_SCSI_EVT_RESET_REMOVED event too.
---
  hw/scsi/scsi-bus.c | 18 ++
  hw/scsi/virtio-scsi.c  |  2 ++
  include/hw/scsi/scsi.h |  1 +
  3 files changed, 21 insertions(+)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 4403717c4aaf..ceceafb2cdf3 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -1616,6 +1616,24 @@ static int scsi_ua_precedence(SCSISense sense)
  return (sense.asc << 8) | sense.ascq;
  }

+void scsi_bus_set_ua(SCSIBus *bus, SCSISense sense)
+{
+int prec1, prec2;
+if (sense.key != UNIT_ATTENTION) {
+return;
+}
+
+/*
+ * Override a pre-existing unit attention condition, except for a more
+ * important reset condition.
+ */
+prec1 = scsi_ua_precedence(bus->unit_attention);
+prec2 = scsi_ua_precedence(sense);
+if (prec2 < prec1) {
+bus->unit_attention = sense;
+}
+}
+
  void scsi_device_set_ua(SCSIDevice *sdev, SCSISense sense)
  {
  int prec1, prec2;
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 41f2a5630173..cf2721aa46c0 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -956,6 +956,7 @@ static void virtio_scsi_hotplug(HotplugHandler 
*hotplug_dev, DeviceState *dev,
  virtio_scsi_push_event(s, sd,
 VIRTIO_SCSI_T_TRANSPORT_RESET,
 VIRTIO_SCSI_EVT_RESET_RESCAN);
+scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
  virtio_scsi_release(s);
  }
  }
@@ -973,6 +974,7 @@ static void virtio_scsi_hotunplug(HotplugHandler 
*hotplug_dev, DeviceState *dev,
  virtio_scsi_push_event(s, sd,
 VIRTIO_SCSI_T_TRANSPORT_RESET,
 VIRTIO_SCSI_EVT_RESET_REMOVED);
+scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
  virtio_scsi_release(s);
  }

diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
index 001103488c23..3b1b3d278ee8 100644
--- a/include/hw/scsi/scsi.h
+++ b/include/hw/scsi/scsi.h
@@ -186,6 +186,7 @@ SCSIDevice *scsi_bus_legacy_add_drive(SCSIBus *bus, 
BlockBackend *blk,
BlockdevOnError rerror,
BlockdevOnError werror,
const char *serial, Error **errp);
+void scsi_bus_set_ua(SCSIBus *bus, SCSISense sense);
  void scsi_bus_legacy_handle_cmdline(SCSIBus *bus);
  void scsi_legacy_handle_cmdline(void);

Re: [PATCH] Hexagon: move GETPC() calls to top level helpers

2023-07-05 Thread Matheus Tavares Bernardino

> Taylor  wrote:
>
> > Matheus Tavares Bernardino  wrote:
> >
> > diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h index
> > 5451b061ee..efb8013912 100644
> > --- a/target/hexagon/macros.h
> > +++ b/target/hexagon/macros.h
> > +
> > +#define MEM_LOADn(SIZE, VA) ({ \
> > +check_noshuf(env, pkt_has_store_s1, slot, VA, SIZE); \
> > +cpu_ldub_data_ra(env, VA, GETPC()); \
> > +})
> 
> Note that check_noshuf calls HELPER(probe_noshuf_load) and
> HELPER(commit_store).  Both of those call GETPC() from within.  So, you'll
> need to pull the contents into separate functions that take ra as an
> argument.

Ah, good point. It was my understanding that, in case of a memory
exception in one of those nested helper calls, the GETPC() we would want
to use for unwinding was the one from the most recent helper. I'm still
trying to wrap my head around these concepts, though, so I might have
misunderstood it. Is this not the case?

> Does this pass the test suite?  You are only using the SIZE parameter in
> check_noshuf, but cpu_ldub_data_ra only reads a single byte.

My oversight, this has to be fixed, thanks.

Re: [PATCH v6 1/9] migration: introduced 'MigrateAddress' in QAPI for migration wire protocol.

2023-07-05 Thread Het Gala




On 05/07/23 4:51 pm, Markus Armbruster wrote:

Het Gala  writes:


This patch introduces well defined MigrateAddress struct and its related
child objects.

The existing argument of 'migrate' and 'migrate-incoming' QAPI - 'uri'
is of string type. The current migration flow follows double encoding
scheme for  fetching migration parameters such as 'uri' and this is
not an ideal design.

Motive for intoducing struct level design is to prevent double encoding
of QAPI arguments, as Qemu should be able to directly use the QAPI
arguments without any level of encoding.

Suggested-by: Aravind Retnakaran 
Signed-off-by: Het Gala 
Reviewed-by: Juan Quintela 
Reviewed-by: Daniel P. Berrangé 
---
  qapi/migration.json | 45 +
  1 file changed, 45 insertions(+)

diff --git a/qapi/migration.json b/qapi/migration.json
index 179af0c4d8..e61d25eba2 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1407,6 +1407,51 @@
  ##
  { 'command': 'migrate-continue', 'data': {'state': 'MigrationStatus'} }
  
+##

+# @MigrationAddressType:
+#
+# The migration stream transport mechanisms.
+#
+# @socket: Migrate via socket.
+#
+# @exec: Direct the migration stream to another process.
+#
+# @rdma: Migrate via RDMA.
+#
+# Since 8.1
+##
+{ 'enum': 'MigrationAddressType',
+  'data': ['socket', 'exec', 'rdma'] }
+
+##
+# @MigrationExecCommand:
+#
+# @args: list of commands for migraton stream execution to a file.

Typo: migration


+#
+# Notes:
+#
+# 1. @args[0] needs to be the path to the new program.

@args can't be a "list of commands", as we're spawning just one process.
So what is it?

Digging through the code with the entire series applied...  Member @args
is used in two places:

1. qemu_start_incoming_migration() passes it to
exec_start_incoming_migration(), which translates it into an array
and passes it to qio_channel_command_new_spawn().

2. qmp_migrate() passes it to exec_start_outgoing_migration(), which
does the same.

qio_channel_command_new_spawn() passes it to
g_spawn_async_with_pipes().  A close read of the latter's documentation
leads me to:

* args[0] is the excutable's file name.  As usual, a relative name is
   relative to the QEMU process's current working directory.

* args[1..] are the arguments.

Unlike POSIX interfaces like execv() and posix_spawn(), this doesn't
separate the executable's file name and 0-th argument.

In short, the head of @args is the executable's filename, and the
remainder are the arguments.  The fact that the the executable's
filename is passed as 0-th argument to the child process is detail.

Perhaps this could do:

##
# @MigrationExecCommand:
#
# @args: command and arguments to execute.

If we want more detail, perhaps:

# @args: command (list head) and arguments (list tail) to execute.

Not sure we need it.  Thoughts?
From a user prespective, I think defining that command / executable's 
filename is the list head would be good ? something like


@args: command (list head) and arguments to execute.


+#
+# Since 8.1
+##
+{ 'struct': 'MigrationExecCommand',
+  'data': {'args': [ 'str' ] } }
+
+##
+# @MigrationAddress:
+#
+# Migration endpoint configuration.
+#
+# Since 8.1
+##
+{ 'union': 'MigrationAddress',
+  'base': { 'transport' : 'MigrationAddressType'},
+  'discriminator': 'transport',
+  'data': {
+'socket': 'SocketAddress',
+'exec': 'MigrationExecCommand',
+'rdma': 'InetSocketAddress' } }
+
  ##
  # @migrate:
  #

Regards,
Het Gala

Re: [PATCH v6 1/9] migration: introduced 'MigrateAddress' in QAPI for migration wire protocol.

2023-07-05 Thread Markus Armbruster

Het Gala  writes:

> On 05/07/23 4:51 pm, Markus Armbruster wrote:
>> Het Gala  writes:
>>
>>> This patch introduces well defined MigrateAddress struct and its related
>>> child objects.
>>>
>>> The existing argument of 'migrate' and 'migrate-incoming' QAPI - 'uri'
>>> is of string type. The current migration flow follows double encoding
>>> scheme for  fetching migration parameters such as 'uri' and this is
>>> not an ideal design.
>>>
>>> Motive for intoducing struct level design is to prevent double encoding
>>> of QAPI arguments, as Qemu should be able to directly use the QAPI
>>> arguments without any level of encoding.
>>>
>>> Suggested-by: Aravind Retnakaran 
>>> Signed-off-by: Het Gala 
>>> Reviewed-by: Juan Quintela 
>>> Reviewed-by: Daniel P. Berrangé 
>>> ---
>>>   qapi/migration.json | 45 +
>>>   1 file changed, 45 insertions(+)
>>>
>>> diff --git a/qapi/migration.json b/qapi/migration.json
>>> index 179af0c4d8..e61d25eba2 100644
>>> --- a/qapi/migration.json
>>> +++ b/qapi/migration.json
>>> @@ -1407,6 +1407,51 @@
>>> ##
>>> { 'command': 'migrate-continue', 'data': {'state': 'MigrationStatus'} }
>>> +##
>>> +# @MigrationAddressType:
>>> +#
>>> +# The migration stream transport mechanisms.
>>> +#
>>> +# @socket: Migrate via socket.
>>> +#
>>> +# @exec: Direct the migration stream to another process.
>>> +#
>>> +# @rdma: Migrate via RDMA.
>>> +#
>>> +# Since 8.1
>>> +##
>>> +{ 'enum': 'MigrationAddressType',
>>> +  'data': ['socket', 'exec', 'rdma'] }
>>> +
>>> +##
>>> +# @MigrationExecCommand:
>>> +#
>>> +# @args: list of commands for migraton stream execution to a file.
>>
>> Typo: migration
>>
>>> +#
>>> +# Notes:
>>> +#
>>> +# 1. @args[0] needs to be the path to the new program.
>>
>> @args can't be a "list of commands", as we're spawning just one process.
>> So what is it?
>>
>> Digging through the code with the entire series applied...  Member @args
>> is used in two places:
>>
>> 1. qemu_start_incoming_migration() passes it to
>> exec_start_incoming_migration(), which translates it into an array
>> and passes it to qio_channel_command_new_spawn().
>>
>> 2. qmp_migrate() passes it to exec_start_outgoing_migration(), which
>> does the same.
>>
>> qio_channel_command_new_spawn() passes it to
>> g_spawn_async_with_pipes().  A close read of the latter's documentation
>> leads me to:
>>
>> * args[0] is the excutable's file name.  As usual, a relative name is
>>relative to the QEMU process's current working directory.
>>
>> * args[1..] are the arguments.
>>
>> Unlike POSIX interfaces like execv() and posix_spawn(), this doesn't
>> separate the executable's file name and 0-th argument.
>>
>> In short, the head of @args is the executable's filename, and the
>> remainder are the arguments.  The fact that the the executable's
>> filename is passed as 0-th argument to the child process is detail.
>>
>> Perhaps this could do:
>>
>> ##
>> # @MigrationExecCommand:
>> #
>> # @args: command and arguments to execute.
>>
>> If we want more detail, perhaps:
>>
>> # @args: command (list head) and arguments (list tail) to execute.
>>
>> Not sure we need it.  Thoughts?
>
> From a user prespective, I think defining that command / executable's 
> filename is the list head would be good ? something like
>
> @args: command (list head) and arguments to execute.

Works for me.

[...]

Re: [PATCH 2/2] virtio-iommu: Rework the trace in virtio_iommu_set_page_size_mask()

2023-07-05 Thread Eric Auger

Hi Zhenghong,

On 7/5/23 10:17, Duan, Zhenzhong wrote:
>
>> -Original Message-
>> From: Duan, Zhenzhong
>> Sent: Wednesday, July 5, 2023 12:56 PM
>> Subject: RE: [PATCH 2/2] virtio-iommu: Rework the trace in
>> virtio_iommu_set_page_size_mask()
>>
>>
>>
>>> -Original Message-
>>> From: Eric Auger 
>>> Sent: Tuesday, July 4, 2023 7:15 PM
>>> To: eric.auger@gmail.com; eric.au...@redhat.com; qemu-
>>> de...@nongnu.org; qemu-...@nongnu.org; m...@redhat.com; jean-
>>> phili...@linaro.org; Duan, Zhenzhong 
>>> Cc: alex.william...@redhat.com; c...@redhap.com;
>> bharat.bhus...@nxp.com;
>>> peter.mayd...@linaro.org
>>> Subject: [PATCH 2/2] virtio-iommu: Rework the trace in
>>> virtio_iommu_set_page_size_mask()
>>>
>>> The current error messages in virtio_iommu_set_page_size_mask() sound
>>> quite similar for different situations and miss the IOMMU memory region
>>> that causes the issue.
>>>
>>> Clarify them and rework the comment.
>>>
>>> Also remove the trace when the new page_size_mask is not applied as the
>>> current frozen granule is kept. This message is rather confusing for
>>> the end user and anyway the current granule would have been used by the
>>> driver
>>>
>>> Signed-off-by: Eric Auger 
>>> ---
>>> hw/virtio/virtio-iommu.c | 19 +++
>>> 1 file changed, 7 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c index
>>> 1eaf81bab5..0d9f7196fe 100644
>>> --- a/hw/virtio/virtio-iommu.c
>>> +++ b/hw/virtio/virtio-iommu.c
>>> @@ -1101,29 +1101,24 @@ static int
>>> virtio_iommu_set_page_size_mask(IOMMUMemoryRegion *mr,
>>>   new_mask);
>>>
>>> if ((cur_mask & new_mask) == 0) {
>>> -error_setg(errp, "virtio-iommu page mask 0x%"PRIx64
>>> -   " is incompatible with mask 0x%"PRIx64, cur_mask, 
>>> new_mask);
>>> +error_setg(errp, "virtio-iommu %s reports a page size mask 
>>> 0x%"PRIx64
>>> +   " incompatible with currently supported mask 0x%"PRIx64,
>>> +   mr->parent_obj.name, new_mask, cur_mask);
>>> return -1;
>>> }
>>>
>>> /*
>>>  * Once the granule is frozen we can't change the mask anymore. If by
>>>  * chance the hotplugged device supports the same granule, we can still
>>> - * accept it. Having a different masks is possible but the guest will 
>>> use
>>> - * sub-optimal block sizes, so warn about it.
>>> + * accept it.
>>>  */
>>> if (s->granule_frozen) {
>>> -int new_granule = ctz64(new_mask);
>>> int cur_granule = ctz64(cur_mask);
>>>
>>> -if (new_granule != cur_granule) {
>>> -error_setg(errp, "virtio-iommu page mask 0x%"PRIx64
>>> -   " is incompatible with mask 0x%"PRIx64, cur_mask,
>>> -   new_mask);
>>> +if (!(BIT(cur_granule) & new_mask)) {
> Sorry, I read this piece code again and got a question, if new_mask has finer
> granularity than cur_granule, should we allow it to pass even though
> BIT(cur_granule) is not set?
I think this should work but this is not straightforward to test.
virtio-iommu would use the current granule for map/unmap. In map/unmap
notifiers, this is split into pow2 ranges and cascaded to VFIO through
vfio_dma_map/unmap. The iova and size are aligned with the smaller
supported granule.

Jean, do you share this understanding or do I miss something.

Nevertheless the current code would have rejected that case and nobody
complained at that point ;-)

thanks

Eric

>
> Thanks
> Zhenzhong
>
>> Good catch.
>>
>> Reviewed-by: Zhenzhong Duan 
>>
>> Thanks
>> Zhenzhong
>>
>>> +error_setg(errp, "virtio-iommu %s does not support frozen
>>> + granule
>>> 0x%"PRIx64,
>>> +   mr->parent_obj.name, BIT(cur_granule));
>>> return -1;
>>> -} else if (new_mask != cur_mask) {
>>> -warn_report("virtio-iommu page mask 0x%"PRIx64
>>> -" does not match 0x%"PRIx64, cur_mask, new_mask);
>>> }
>>> return 0;
>>> }
>>> --
>>> 2.38.1

Re: [PATCH v6 2/9] migration: convert uri parameter into 'MigrateAddress' struct

2023-07-05 Thread Het Gala




On 05/07/23 4:59 pm, Markus Armbruster wrote:

Drive-by comment...

Het Gala  writes:


This patch parses 'migrate' and 'migrate-incoming' QAPI's 'uri' parameter
with all the migration connection related information and stores them
inside well defined 'MigrateAddress' struct.

Misc: limit line width in exec.c to 80 characters recommended by Qemu.

Suggested-by: Aravind Retnakaran 
Signed-off-by: Het Gala 

[...]


diff --git a/migration/migration.c b/migration/migration.c
index dc05c6f6ea..0eb25bb5a4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -64,6 +64,7 @@
  #include "yank_functions.h"
  #include "sysemu/qtest.h"
  #include "options.h"
+#include "qemu/sockets.h"
  
  static NotifierList migration_state_notifiers =

  NOTIFIER_LIST_INITIALIZER(migration_state_notifiers);
@@ -420,15 +421,63 @@ void migrate_add_address(SocketAddress *address)
QAPI_CLONE(SocketAddress, address));
  }
  
+static bool migrate_uri_parse(const char *uri,

+  MigrationAddress **channel,
+  Error **errp)
+{
+g_autoptr(MigrationAddress) addrs = g_new0(MigrationAddress, 1);

@addrs suggests plural, yet it points to a single MigrationAddress.
Recommend @addr.

Same elsewhere.

Ack.

[...]

Regards,
Het Gala

[PATCH v2 0/3] util/fifo8: Introduce fifo8_peek_buf()

2023-07-05 Thread Philippe Mathieu-Daudé

Extracted from "hw/char/pl011: Implement TX (async) FIFO":
https://lore.kernel.org/qemu-devel/20230522153144.30610-1-phi...@linaro.org/

All series reviewed.

Philippe Mathieu-Daudé (3):
  util/fifo8: Fix typo in fifo8_push_all() description
  util/fifo8: Allow fifo8_pop_buf() to not populate popped length
  util/fifo8: Introduce fifo8_peek_buf()

 include/qemu/fifo8.h | 39 +--
 util/fifo8.c | 28 +++-
 2 files changed, 56 insertions(+), 11 deletions(-)

-- 
2.38.1

[PATCH v2 1/3] util/fifo8: Fix typo in fifo8_push_all() description

2023-07-05 Thread Philippe Mathieu-Daudé

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Francisco Iglesias 
Reviewed-by: Alex Bennée 
---
 include/qemu/fifo8.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/qemu/fifo8.h b/include/qemu/fifo8.h
index 28bf2cee57..16be02f361 100644
--- a/include/qemu/fifo8.h
+++ b/include/qemu/fifo8.h
@@ -46,7 +46,7 @@ void fifo8_push(Fifo8 *fifo, uint8_t data);
  * fifo8_push_all:
  * @fifo: FIFO to push to
  * @data: data to push
- * @size: number of bytes to push
+ * @num: number of bytes to push
  *
  * Push a byte array to the FIFO. Behaviour is undefined if the FIFO is full.
  * Clients are responsible for checking the space left in the FIFO using
-- 
2.38.1

[PATCH v2 2/3] util/fifo8: Allow fifo8_pop_buf() to not populate popped length

2023-07-05 Thread Philippe Mathieu-Daudé

There might be cases where we know the number of bytes we can
pop from the FIFO, or we simply don't care how many bytes is
returned. Allow fifo8_pop_buf() to take a NULL numptr.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Francisco Iglesias 
Reviewed-by: Alex Bennée 
---
 include/qemu/fifo8.h | 10 +-
 util/fifo8.c | 12 
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/include/qemu/fifo8.h b/include/qemu/fifo8.h
index 16be02f361..d0d02bc73d 100644
--- a/include/qemu/fifo8.h
+++ b/include/qemu/fifo8.h
@@ -71,7 +71,7 @@ uint8_t fifo8_pop(Fifo8 *fifo);
  * fifo8_pop_buf:
  * @fifo: FIFO to pop from
  * @max: maximum number of bytes to pop
- * @num: actual number of returned bytes
+ * @numptr: pointer filled with number of bytes returned (can be NULL)
  *
  * Pop a number of elements from the FIFO up to a maximum of max. The buffer
  * containing the popped data is returned. This buffer points directly into
@@ -82,16 +82,16 @@ uint8_t fifo8_pop(Fifo8 *fifo);
  * around in the ring buffer; in this case only a contiguous part of the data
  * is returned.
  *
- * The number of valid bytes returned is populated in *num; will always return
- * at least 1 byte. max must not be 0 or greater than the number of bytes in
- * the FIFO.
+ * The number of valid bytes returned is populated in *numptr; will always
+ * return at least 1 byte. max must not be 0 or greater than the number of
+ * bytes in the FIFO.
  *
  * Clients are responsible for checking the availability of requested data
  * using fifo8_num_used().
  *
  * Returns: A pointer to popped data.
  */
-const uint8_t *fifo8_pop_buf(Fifo8 *fifo, uint32_t max, uint32_t *num);
+const uint8_t *fifo8_pop_buf(Fifo8 *fifo, uint32_t max, uint32_t *numptr);
 
 /**
  * fifo8_reset:
diff --git a/util/fifo8.c b/util/fifo8.c
index d4d1c135e0..032e985440 100644
--- a/util/fifo8.c
+++ b/util/fifo8.c
@@ -66,16 +66,20 @@ uint8_t fifo8_pop(Fifo8 *fifo)
 return ret;
 }
 
-const uint8_t *fifo8_pop_buf(Fifo8 *fifo, uint32_t max, uint32_t *num)
+const uint8_t *fifo8_pop_buf(Fifo8 *fifo, uint32_t max, uint32_t *numptr)
 {
 uint8_t *ret;
+uint32_t num;
 
 assert(max > 0 && max <= fifo->num);
-*num = MIN(fifo->capacity - fifo->head, max);
+num = MIN(fifo->capacity - fifo->head, max);
 ret = &fifo->data[fifo->head];
-fifo->head += *num;
+fifo->head += num;
 fifo->head %= fifo->capacity;
-fifo->num -= *num;
+fifo->num -= num;
+if (numptr) {
+*numptr = num;
+}
 return ret;
 }
 
-- 
2.38.1

[PATCH v2 3/3] util/fifo8: Introduce fifo8_peek_buf()

2023-07-05 Thread Philippe Mathieu-Daudé

To be able to peek at FIFO content without popping it,
introduce the fifo8_peek_buf() method by factoring
common content from fifo8_pop_buf().

Reviewed-by: Francisco Iglesias 
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/qemu/fifo8.h | 27 +++
 util/fifo8.c | 22 ++
 2 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/include/qemu/fifo8.h b/include/qemu/fifo8.h
index d0d02bc73d..c6295c6ff0 100644
--- a/include/qemu/fifo8.h
+++ b/include/qemu/fifo8.h
@@ -93,6 +93,33 @@ uint8_t fifo8_pop(Fifo8 *fifo);
  */
 const uint8_t *fifo8_pop_buf(Fifo8 *fifo, uint32_t max, uint32_t *numptr);
 
+/**
+ * fifo8_peek_buf: read upto max bytes from the fifo
+ * @fifo: FIFO to read from
+ * @max: maximum number of bytes to peek
+ * @numptr: pointer filled with number of bytes returned (can be NULL)
+ *
+ * Peek into a number of elements from the FIFO up to a maximum of max.
+ * The buffer containing the data peeked into is returned. This buffer points
+ * directly into the FIFO backing store. Since data is invalidated once any
+ * of the fifo8_* APIs are called on the FIFO, it is the caller responsibility
+ * to access it before doing further API calls.
+ *
+ * The function may return fewer bytes than requested when the data wraps
+ * around in the ring buffer; in this case only a contiguous part of the data
+ * is returned.
+ *
+ * The number of valid bytes returned is populated in *numptr; will always
+ * return at least 1 byte. max must not be 0 or greater than the number of
+ * bytes in the FIFO.
+ *
+ * Clients are responsible for checking the availability of requested data
+ * using fifo8_num_used().
+ *
+ * Returns: A pointer to peekable data.
+ */
+const uint8_t *fifo8_peek_buf(Fifo8 *fifo, uint32_t max, uint32_t *numptr);
+
 /**
  * fifo8_reset:
  * @fifo: FIFO to reset
diff --git a/util/fifo8.c b/util/fifo8.c
index 032e985440..e12477843e 100644
--- a/util/fifo8.c
+++ b/util/fifo8.c
@@ -66,7 +66,8 @@ uint8_t fifo8_pop(Fifo8 *fifo)
 return ret;
 }
 
-const uint8_t *fifo8_pop_buf(Fifo8 *fifo, uint32_t max, uint32_t *numptr)
+static const uint8_t *fifo8_peekpop_buf(Fifo8 *fifo, uint32_t max,
+uint32_t *numptr, bool do_pop)
 {
 uint8_t *ret;
 uint32_t num;
@@ -74,15 +75,28 @@ const uint8_t *fifo8_pop_buf(Fifo8 *fifo, uint32_t max, 
uint32_t *numptr)
 assert(max > 0 && max <= fifo->num);
 num = MIN(fifo->capacity - fifo->head, max);
 ret = &fifo->data[fifo->head];
-fifo->head += num;
-fifo->head %= fifo->capacity;
-fifo->num -= num;
+
+if (do_pop) {
+fifo->head += num;
+fifo->head %= fifo->capacity;
+fifo->num -= num;
+}
 if (numptr) {
 *numptr = num;
 }
 return ret;
 }
 
+const uint8_t *fifo8_peek_buf(Fifo8 *fifo, uint32_t max, uint32_t *numptr)
+{
+return fifo8_peekpop_buf(fifo, max, numptr, false);
+}
+
+const uint8_t *fifo8_pop_buf(Fifo8 *fifo, uint32_t max, uint32_t *numptr)
+{
+return fifo8_peekpop_buf(fifo, max, numptr, true);
+}
+
 void fifo8_reset(Fifo8 *fifo)
 {
 fifo->num = 0;
-- 
2.38.1

[PATCH 0/4] chardev/char-fe: Document FEWatchFunc and use G_SOURCE_CONTINUE/REMOVE

2023-07-05 Thread Philippe Mathieu-Daudé

Improve qio and chardev frontend documentation,
have FEWatchFunc handlers return G_SOURCE_CONTINUE/REMOVE.

Philippe Mathieu-Daudé (4):
  io/channel: Explicit QIOChannel doc is based on GLib's IOChannel
  chardev/char-fe: Clarify qemu_chr_fe_add_watch 'condition' arg is a
mask
  chardev/char-fe: Document FEWatchFunc typedef
  hw/char: Have FEWatchFunc handlers return G_SOURCE_CONTINUE/REMOVE

 include/chardev/char-fe.h | 18 --
 include/io/channel.h  |  2 +-
 hw/char/cadence_uart.c|  8 
 hw/char/cmsdk-apb-uart.c  |  6 +++---
 hw/char/ibex_uart.c   |  8 
 hw/char/nrf51_uart.c  |  4 ++--
 hw/char/serial.c  |  2 +-
 hw/char/virtio-console.c  |  2 +-
 hw/usb/redirect.c |  2 +-
 monitor/monitor.c |  2 +-
 net/vhost-user.c  |  2 +-
 11 files changed, 35 insertions(+), 21 deletions(-)

-- 
2.38.1

[PATCH 2/4] chardev/char-fe: Clarify qemu_chr_fe_add_watch 'condition' arg is a mask

2023-07-05 Thread Philippe Mathieu-Daudé

qemu_chr_fe_add_watch() can poll for multiple conditions.
It's @cond argument is a combination of all the condition bits.

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/chardev/char-fe.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/chardev/char-fe.h b/include/chardev/char-fe.h
index 8c420fa36e..309960046a 100644
--- a/include/chardev/char-fe.h
+++ b/include/chardev/char-fe.h
@@ -179,8 +179,8 @@ typedef gboolean (*FEWatchFunc)(void *do_not_use, 
GIOCondition condition, void *
 
 /**
  * qemu_chr_fe_add_watch:
- * @cond: the condition to poll for
- * @func: the function to call when the condition happens
+ * @cond: bitwise combination of conditions to poll for
+ * @func: the function to call when the conditions are satisfied
  * @user_data: the opaque pointer to pass to @func
  *
  * If the backend is connected, create and add a #GSource that fires
-- 
2.38.1

[PATCH 1/4] io/channel: Explicit QIOChannel doc is based on GLib's IOChannel

2023-07-05 Thread Philippe Mathieu-Daudé

One can get lost looking for "public API docs". Explicit
we are referring to GLib IOChannel documentation.

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/io/channel.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/io/channel.h b/include/io/channel.h
index 229bf36910..998718b470 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -97,7 +97,7 @@ struct QIOChannel {
  * The first five callbacks are mandatory to support, others
  * provide additional optional features.
  *
- * Consult the corresponding public API docs for a description
+ * Consult the corresponding GLib IOChannel public API docs for a description
  * of the semantics of each callback. io_shutdown in particular
  * must be thread-safe, terminate quickly and must not block.
  */
-- 
2.38.1

[PATCH 4/4] hw/char: Have FEWatchFunc handlers return G_SOURCE_CONTINUE/REMOVE

2023-07-05 Thread Philippe Mathieu-Daudé

GLib recommend to use G_SOURCE_REMOVE / G_SOURCE_CONTINUE
for GSourceFunc callbacks. Our FEWatchFunc is a GSourceFunc
returning such value. Use such definitions which are
"more memorable" [*].

[*] https://docs.gtk.org/glib/callback.SourceFunc.html#return-value

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/char/cadence_uart.c   | 8 
 hw/char/cmsdk-apb-uart.c | 6 +++---
 hw/char/ibex_uart.c  | 8 
 hw/char/nrf51_uart.c | 4 ++--
 hw/char/serial.c | 2 +-
 hw/char/virtio-console.c | 2 +-
 hw/usb/redirect.c| 2 +-
 monitor/monitor.c| 2 +-
 net/vhost-user.c | 2 +-
 9 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/hw/char/cadence_uart.c b/hw/char/cadence_uart.c
index 807e398541..eff0304a18 100644
--- a/hw/char/cadence_uart.c
+++ b/hw/char/cadence_uart.c
@@ -307,11 +307,11 @@ static gboolean cadence_uart_xmit(void *do_not_use, 
GIOCondition cond,
 /* instant drain the fifo when there's no back-end */
 if (!qemu_chr_fe_backend_connected(&s->chr)) {
 s->tx_count = 0;
-return FALSE;
+return G_SOURCE_REMOVE;
 }
 
 if (!s->tx_count) {
-return FALSE;
+return G_SOURCE_REMOVE;
 }
 
 ret = qemu_chr_fe_write(&s->chr, s->tx_fifo, s->tx_count);
@@ -326,12 +326,12 @@ static gboolean cadence_uart_xmit(void *do_not_use, 
GIOCondition cond,
 cadence_uart_xmit, s);
 if (!r) {
 s->tx_count = 0;
-return FALSE;
+return G_SOURCE_REMOVE;
 }
 }
 
 uart_update_status(s);
-return FALSE;
+return G_SOURCE_REMOVE;
 }
 
 static void uart_write_tx_fifo(CadenceUARTState *s, const uint8_t *buf,
diff --git a/hw/char/cmsdk-apb-uart.c b/hw/char/cmsdk-apb-uart.c
index f8dc89ee3d..d466cd93de 100644
--- a/hw/char/cmsdk-apb-uart.c
+++ b/hw/char/cmsdk-apb-uart.c
@@ -199,7 +199,7 @@ static gboolean uart_transmit(void *do_not_use, 
GIOCondition cond, void *opaque)
 s->watch_tag = 0;
 
 if (!(s->ctrl & R_CTRL_TX_EN_MASK) || !(s->state & R_STATE_TXFULL_MASK)) {
-return FALSE;
+return G_SOURCE_REMOVE;
 }
 
 ret = qemu_chr_fe_write(&s->chr, &s->txbuf, 1);
@@ -215,7 +215,7 @@ static gboolean uart_transmit(void *do_not_use, 
GIOCondition cond, void *opaque)
 }
 /* Transmit pending */
 trace_cmsdk_apb_uart_tx_pending();
-return FALSE;
+return G_SOURCE_REMOVE;
 }
 
 buffer_drained:
@@ -227,7 +227,7 @@ buffer_drained:
 s->intstatus |= R_INTSTATUS_TX_MASK;
 }
 cmsdk_apb_uart_update(s);
-return FALSE;
+return G_SOURCE_REMOVE;
 }
 
 static void uart_cancel_transmit(CMSDKAPBUART *s)
diff --git a/hw/char/ibex_uart.c b/hw/char/ibex_uart.c
index f70adb5308..51708c0836 100644
--- a/hw/char/ibex_uart.c
+++ b/hw/char/ibex_uart.c
@@ -147,7 +147,7 @@ static gboolean ibex_uart_xmit(void *do_not_use, 
GIOCondition cond,
 /* instant drain the fifo when there's no back-end */
 if (!qemu_chr_fe_backend_connected(&s->chr)) {
 s->tx_level = 0;
-return FALSE;
+return G_SOURCE_REMOVE;
 }
 
 if (!s->tx_level) {
@@ -156,7 +156,7 @@ static gboolean ibex_uart_xmit(void *do_not_use, 
GIOCondition cond,
 s->uart_intr_state |= R_INTR_STATE_TX_EMPTY_MASK;
 s->uart_intr_state &= ~R_INTR_STATE_TX_WATERMARK_MASK;
 ibex_uart_update_irqs(s);
-return FALSE;
+return G_SOURCE_REMOVE;
 }
 
 ret = qemu_chr_fe_write(&s->chr, s->tx_fifo, s->tx_level);
@@ -171,7 +171,7 @@ static gboolean ibex_uart_xmit(void *do_not_use, 
GIOCondition cond,
 ibex_uart_xmit, s);
 if (!r) {
 s->tx_level = 0;
-return FALSE;
+return G_SOURCE_REMOVE;
 }
 }
 
@@ -192,7 +192,7 @@ static gboolean ibex_uart_xmit(void *do_not_use, 
GIOCondition cond,
 }
 
 ibex_uart_update_irqs(s);
-return FALSE;
+return G_SOURCE_REMOVE;
 }
 
 static void uart_write_tx_fifo(IbexUartState *s, const uint8_t *buf,
diff --git a/hw/char/nrf51_uart.c b/hw/char/nrf51_uart.c
index 3c6f982de9..dfe2276d71 100644
--- a/hw/char/nrf51_uart.c
+++ b/hw/char/nrf51_uart.c
@@ -93,13 +93,13 @@ static gboolean uart_transmit(void *do_not_use, 
GIOCondition cond, void *opaque)
  */
 goto buffer_drained;
 }
-return FALSE;
+return G_SOURCE_REMOVE;
 }
 
 buffer_drained:
 s->reg[R_UART_TXDRDY] = 1;
 s->pending_tx_byte = false;
-return FALSE;
+return G_SOURCE_REMOVE;
 }
 
 static void uart_cancel_transmit(NRF51UARTState *s)
diff --git a/hw/char/serial.c b/hw/char/serial.c
index 270e1b1094..f3094f860f 100644
--- a/hw/char/serial.c
+++ b/hw/char/serial.c
@@ -226,7 +226,7 @@ static gboolean serial_watch_cb(void *do_not_use, 
GIOCondition cond,
 SerialState *s = opaque;
 s->watch_tag = 0;
 serial_xmit(s);
-return FALSE;
+return G_SOURCE_REMOVE;
 }
 
 static vo

[PATCH 3/4] chardev/char-fe: Document FEWatchFunc typedef

2023-07-05 Thread Philippe Mathieu-Daudé

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/chardev/char-fe.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/include/chardev/char-fe.h b/include/chardev/char-fe.h
index 309960046a..ec3f706a14 100644
--- a/include/chardev/char-fe.h
+++ b/include/chardev/char-fe.h
@@ -175,6 +175,20 @@ void qemu_chr_fe_printf(CharBackend *be, const char *fmt, 
...)
 G_GNUC_PRINTF(2, 3);
 
 
+/**
+ * FEWatchFunc: a #GSourceFunc called when any conditions requested by
+ *  qemu_chr_fe_add_watch() is satisfied.
+ * @do_not_use: depending on the underlying chardev, a GIOChannel or a
+ *  QIOChannel. DO NOT USE!
+ * @cond: bitwise combination of conditions watched and satisfied
+ *  before calling this callback.
+ * @data: user data passed at creation to qemu_chr_fe_add_watch(). Can
+ *  be NULL.
+ *
+ * Returns: G_SOURCE_REMOVE if the GSource should be removed from the
+ *  main loop, or G_SOURCE_CONTINUE to leave the GSource in
+ *  the main loop.
+ */
 typedef gboolean (*FEWatchFunc)(void *do_not_use, GIOCondition condition, void 
*data);
 
 /**
-- 
2.38.1

[PATCH] meson.build: Skip C++ detection unless we're targeting Windows

2023-07-05 Thread Thomas Huth

The only C++ code that we currently still have in the repository
is the code in qga/vss-win32/ - so we can skip the C++ detection
unless we are compiling binaries for Windows.

Signed-off-by: Thomas Huth 
---
 meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index a9ba0bfab3..5559b36859 100644
--- a/meson.build
+++ b/meson.build
@@ -20,7 +20,7 @@ config_host = keyval.load(meson.current_build_dir() / 
'config-host.mak')
 
 cc = meson.get_compiler('c')
 all_languages = ['c']
-if add_languages('cpp', required: false, native: false)
+if targetos == 'windows' and add_languages('cpp', required: false, native: 
false)
   all_languages += ['cpp']
   cxx = meson.get_compiler('cpp')
 endif
-- 
2.39.3

Re: [PULL 07/11] tests/tcg/aarch64: Add testcases for IC IVAU and dual-mapped code

2023-07-05 Thread Richard Henderson


On 7/5/23 11:25, Philippe Mathieu-Daudé wrote:
/usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/bin/ld: 
/usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/lib/../lib/librt.a(shm_open.o): in function `shm_open':

(.text+0x3c): undefined reference to `__shm_directory'
/usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/bin/ld: 
(.text+0xcc): undefined reference to `pthread_setcancelstate'
/usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/bin/ld: 
(.text+0xfc): undefined reference to `pthread_setcancelstate'
/usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/bin/ld: 
/usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/lib/../lib/librt.a(shm_unlink.o): in function `shm_unlink':

(.text+0x30): undefined reference to `__shm_directory'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:119: icivau] Error 1
make[1]: *** Waiting for unfinished jobs
make: *** [/builds/qemu-project/qemu/tests/Makefile.include:50: 
build-tcg-tests-aarch64-linux-user] Error 2


It looks like this test needs something else.


Maybe:

icivau: LDFLAGS+=-lrt -pthread


Yes, that's the ticket I'm sure.
Today is a travel day, so I'll still let someone else re-test.


r~

Re: [PATCH v7 14/20] target/riscv/kvm.c: add multi-letter extension KVM properties

2023-07-05 Thread Andrew Jones

On Fri, Jun 30, 2023 at 07:08:05AM -0300, Daniel Henrique Barboza wrote:
> Let's add KVM user properties for the multi-letter extensions that KVM
> currently supports: zicbom, zicboz, zihintpause, zbb, ssaia, sstc,
> svinval and svpbmt.
> 
> As with MISA extensions, we're using the KVMCPUConfig type to hold
> information about the state of each extension. However, multi-letter
> extensions have more cases to cover than MISA extensions, so we're
> adding an extra 'supported' flag as well. This flag will reflect if a
> given extension is supported by KVM, i.e. KVM knows how to handle it.
> This is determined during KVM extension discovery in
> kvm_riscv_init_multiext_cfg(), where we test for EINVAL errors. Any
> other error different from EINVAL will cause an abort.
> 
> The use of the 'user_set' is similar to what we already do with MISA
> extensions: the flag set only if the user is changing the extension
> state.
> 
> The 'supported' flag will be used later on to make an exception for
> users that are disabling multi-letter extensions that are unknown to
> KVM.
> 
> Signed-off-by: Daniel Henrique Barboza 
> Reviewed-by: Andrew Jones 
> ---
>  target/riscv/cpu.c |   8 +++
>  target/riscv/kvm.c | 119 +
>  2 files changed, 127 insertions(+)
> 
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index a9df61c9b4..f348424170 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -1778,6 +1778,14 @@ static void riscv_cpu_add_user_properties(Object *obj)
>  riscv_cpu_add_misa_properties(obj);
>  
>  for (prop = riscv_cpu_extensions; prop && prop->name; prop++) {
> +#ifndef CONFIG_USER_ONLY
> +if (kvm_enabled()) {
> +/* Check if KVM created the property already */
> +if (object_property_find(obj, prop->name)) {
> +continue;
> +}
> +}
> +#endif
>  qdev_property_add_static(dev, prop);
>  }
>  
> diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
> index 7afd6024e6..6ef81a6825 100644
> --- a/target/riscv/kvm.c
> +++ b/target/riscv/kvm.c
> @@ -113,6 +113,7 @@ typedef struct KVMCPUConfig {
>  target_ulong offset;
>  int kvm_reg_id;
>  bool user_set;
> +bool supported;
>  } KVMCPUConfig;
>  
>  #define KVM_MISA_CFG(_bit, _reg_id) \
> @@ -197,6 +198,81 @@ static void kvm_riscv_update_cpu_misa_ext(RISCVCPU *cpu, 
> CPUState *cs)
>  }
>  }
>  
> +#define CPUCFG(_prop) offsetof(struct RISCVCPUConfig, _prop)
> +
> +#define KVM_EXT_CFG(_name, _prop, _reg_id) \
> +{.name = _name, .offset = CPUCFG(_prop), \
> + .kvm_reg_id = _reg_id}
> +
> +static KVMCPUConfig kvm_multi_ext_cfgs[] = {
> +KVM_EXT_CFG("zicbom", ext_icbom, KVM_RISCV_ISA_EXT_ZICBOM),
> +KVM_EXT_CFG("zicboz", ext_icboz, KVM_RISCV_ISA_EXT_ZICBOZ),
> +KVM_EXT_CFG("zihintpause", ext_zihintpause, 
> KVM_RISCV_ISA_EXT_ZIHINTPAUSE),
> +KVM_EXT_CFG("zbb", ext_zbb, KVM_RISCV_ISA_EXT_ZBB),
> +KVM_EXT_CFG("ssaia", ext_ssaia, KVM_RISCV_ISA_EXT_SSAIA),
> +KVM_EXT_CFG("sstc", ext_sstc, KVM_RISCV_ISA_EXT_SSTC),
> +KVM_EXT_CFG("svinval", ext_svinval, KVM_RISCV_ISA_EXT_SVINVAL),
> +KVM_EXT_CFG("svpbmt", ext_svpbmt, KVM_RISCV_ISA_EXT_SVPBMT),
> +};
> +
> +static void kvm_cpu_cfg_set(RISCVCPU *cpu, KVMCPUConfig *multi_ext,
> +uint32_t val)
> +{
> +int cpu_cfg_offset = multi_ext->offset;
> +bool *ext_enabled = (void *)&cpu->cfg + cpu_cfg_offset;
> +
> +*ext_enabled = val;
> +}
> +
> +static uint32_t kvm_cpu_cfg_get(RISCVCPU *cpu,
> +KVMCPUConfig *multi_ext)
> +{
> +int cpu_cfg_offset = multi_ext->offset;
> +bool *ext_enabled = (void *)&cpu->cfg + cpu_cfg_offset;
> +
> +return *ext_enabled;
> +}
> +
> +static void kvm_cpu_set_multi_ext_cfg(Object *obj, Visitor *v,
> +  const char *name,
> +  void *opaque, Error **errp)
> +{
> +KVMCPUConfig *multi_ext_cfg = opaque;
> +RISCVCPU *cpu = RISCV_CPU(obj);
> +bool value, host_val;
> +
> +if (!visit_type_bool(v, name, &value, errp)) {
> +return;
> +}
> +
> +host_val = kvm_cpu_cfg_get(cpu, multi_ext_cfg);
> +
> +/*
> + * Ignore if the user is setting the same value
> + * as the host.
> + */
> +if (value == host_val) {
> +return;
> +}
> +
> +if (!multi_ext_cfg->supported) {
> +/*
> + * Error out if the user is trying to enable an
> + * extension that KVM doesn't support. Ignore
> + * option otherwise.
> + */
> +if (value) {
> +error_setg(errp, "KVM does not support disabling extension %s",
> +   multi_ext_cfg->name);
> +}
> +
> +return;
> +}
> +
> +multi_ext_cfg->user_set = true;
> +kvm_cpu_cfg_set(cpu, multi_ext_cfg, value);
> +}
> +
>  static void kvm_riscv_add_cpu_user_properties(Object *cpu_obj)
>  {
>  in

Re: GLibC AMD CPUID cache reporting regression (was Re: qemu-user self emulation broken with default CPU on x86/x64)

2023-07-05 Thread Karumanchi, Sajan

[AMD Official Use Only - General]

++ Prem


Thanks & Regards,

Sajan K.




From: Florian Weimer 
Sent: Tuesday, July 4, 2023 11:07 PM
To: Daniel P.Berrangé 
Cc: Pierrick Bouvier ; qemu-devel@nongnu.org 
; Richard Henderson ; 
laur...@vivier.eu ; Karumanchi, Sajan 

Subject: Re: GLibC AMD CPUID cache reporting regression (was Re: qemu-user self 
emulation broken with default CPU on x86/x64)

Caution: This message originated from an External Source. Use proper caution 
when opening attachments, clicking links, or responding.


* Daniel P. Berrangé:

> On Mon, Jul 03, 2023 at 06:03:08PM +0200, Pierrick Bouvier wrote:
>> Hi everyone,
>>
>> Recently (in d135f781 [1], between v7.0.0 and v8.0.0), qemu-user default cpu
>> was updated to "max" instead of qemu32/qemu64.
>>
>> This change "broke" qemu self emulation if this new default cpu is used.
>>
>> $ ./qemu-x86_64 ./qemu-x86_64 --version
>> qemu-x86_64: ../util/cacheflush.c:212: init_cache_info: Assertion `(isize &
>> (isize - 1)) == 0' failed.
>> qemu: uncaught target signal 6 (Aborted) - core dumped
>> Aborted
>>
>> By setting cpu back to qemu64, it works again.
>> $ ./qemu-x86_64 -cpu qemu64 ./qemu-x86_64  --version
>> qemu-x86_64 version 8.0.50 (v8.0.0-2317-ge125b08ed6)
>> Copyright (c) 2003-2023 Fabrice Bellard and the QEMU Project developers
>>
>> Commenting assert does not work, as qemu aligned malloc fail shortly after.
>>
>> I'm willing to fix it, but I'm not sure what is the issue with "max" cpu
>> exactly. Is it missing CPU cache line, or something else?
>
> I've observed GLibC is issuing CPUID leaf 0x8000_001d
>
> QEMU 'max' CPU model doesn't defnie xlevel, so QEMU makes it default
> to the same as min_xlevel, which is calculated to be 0x8000_000a.
>
> cpu_x86_cpuid() in QEMU sees CPUID leaf 0x8000_001d is above 0x8000_000a,
> and so  considers it an invaild CPUID and thus forces it to report
> 0x_000d which is supposedly what an invalid CPUID leaf should do.
>
>
> Net result: glibc is asking for 0x8000_001d, but getting back data
> for 0x_000d.
>
> This doesn't end happily for obvious reasons, getting garbage for
> the dcache sizes.
>
>
> The 'qemu64' CPU model also gets CPUID leaf 0x8000_001d capped back
> to 0x_000d, but crucially qemu64 lacks the 'xsave' feature bit,
> so QEMU returns all-zeroes for CPUID leaf 0x_000d. Still not
> good, but this makes glibc report 0 for DCACHE_*, which in turn
> avoids tripping up the nested qemu which queries DCACHE sysconf.
>
> So the problem is thus more widespread than just 'max' CPU model.
>
> Any QEMU CPU model with vendor=AuthenticAMD and the xsave feature,
> and the xlevel unset, will cause glibc to report garbage for the
> L1D cache info
>
> Any QEMU CPU model with vendor=AuthenticAMD and without the xsave
> feature, and the xlevel unset, will cause glibc to report zeroes
> for L1D cache info
>
> Neither is good, but the latter at least doesn't trip up the
> nested QEMU when it queries L1D cache info.
>
> I'm unsure if QEMU's behaviour is correct with calculating the
> default 'xlevel' values for 'max', but I'm assuming the xlevel
> was correct for Opteron_G4/5 since those are explicitly set
> in the code for along time.

We are tracking this as:

  New AMD cache size computation logic does not work for some CPUs,
  hypervisors
  

I filed it after we resolved the earlier crashes because the data is
clearly not accurate.  I was also able to confirm that impacts more than
just hypervisors.

Sajan posted a first patch:

  [PATCH] x86: Fix for cache computation on AMD legacy cpus.
  

However, it changes the reported cache sizes on some older CPUs compared
to what we had before (although the values are no longer zero at least).

Thanks,
Florian

Re: [PATCH v2] virtio: add a new vcpu watchdog

2023-07-05 Thread Alex Bennée



zhanghao1  writes:

> Each vcpu creates a corresponding timer task. The watchdog
> is driven by a timer according to a certain period. Each time
> the timer expires, the counter is decremented. When the counter
> is "0", the watchdog considers the vcpu to be stalling and resets
> the VM. To avoid watchdog expiration, the guest kernel driver
> needs to periodically send a pet event to update the counter.
>
> Signed-off-by: zhanghao1 
> ---
> v2:
>  - change the function name and remove the redundant word 'stall'
>  - add trace-events to replace DPRINTF and qemu_log
>  - call 'watchdog_perform_action()' to reset vm
>  - update g_new0 to replace malloc
>  - update only use '.generic_name'
>  - update the bool variable 'is_initialized' to uint32_t
>
> v1: 
> https://lore.kernel.org/qemu-devel/20230615061302.301754-1-zhangh...@kylinos.cn/

> +static void virtio_vcpu_watchdog_device_realize(DeviceState *dev, Error 
> **errp)
> +{
> +VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> +vwdt = VIRTIO_VCPU_WATCHDOG(dev);
> +
> +virtio_init(vdev, VIRTIO_ID_WATCHDOG, 0);

This will never compile because VIRTIO_ID_WATCHDOG isn't defined
anywhere.

Next time you post you need to also include a link to the kernel side of
the driver and the virtio specification (or inflight patch for it).

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH] Hexagon: move GETPC() calls to top level helpers

2023-07-05 Thread Richard Henderson


On 7/5/23 14:45, Matheus Tavares Bernardino wrote:

Taylor  wrote:
Note that check_noshuf calls HELPER(probe_noshuf_load) and
HELPER(commit_store).  Both of those call GETPC() from within.  So, you'll
need to pull the contents into separate functions that take ra as an
argument.


Ah, good point. It was my understanding that, in case of a memory
exception in one of those nested helper calls, the GETPC() we would want
to use for unwinding was the one from the most recent helper. I'm still
trying to wrap my head around these concepts, though, so I might have
misunderstood it. Is this not the case?


No, it is not the case.

GETPC fetches the return address from the current function.

The unwinder which uses this value needs the return address *into* the 
generated code.

Therefore the HELPER function that is directly called from generated code is the place at 
which GETPC must be used, and nowhere else.  The corollary is that one HELPER should avoid 
calling another HELPER, Just In Case -- use separate intermediate functions instead, as 
Taylor suggests above.



r~

Re: [PATCH 1/2] target/arm: Suppress more TCG unimplemented features in ID registers

2023-07-05 Thread Richard Henderson


On 7/4/23 15:06, Peter Maydell wrote:

We already squash the ID register field for FEAT_SPE (the Statistical
Profiling Extension) because TCG does not implement it and if we
advertise it to the guest the guest will crash trying to look at
non-existent system registers.  Do the same for some other features
which a real hardware Neoverse-V1 implements but which TCG doesn't:
  * FEAT_TRF (Self-hosted Trace Extension)
  * Trace Macrocell system register access
  * Memory mapped trace
  * FEAT_AMU (Activity Monitors Extension)
  * FEAT_MPAM (Memory Partitioning and Monitoring Extension)
  * FEAT_NV (Nested Virtualization)

Most of these, like FEAT_SPE, are "introspection/trace" type features
which QEMU is unlikely to ever implement.  The odd-one-out here is
FEAT_NV -- we could implement that and at some point we probably
will.

Signed-off-by: Peter Maydell
---
  target/arm/cpu.c | 33 +
  1 file changed, 29 insertions(+), 4 deletions(-)


Reviewed-by: Richard Henderson 

r~

Re: [PATCH 2/2] target/arm: Define neoverse-v1

2023-07-05 Thread Richard Henderson


On 7/4/23 15:06, Peter Maydell wrote:

If you're checking the values against the TRM, note that the
summary tables differ from the register description in the TRM
for ID_AA64DFR0_EL1, ID_AA64ZFR0_EL1 and ID_PFR0_EL1: we
trust the versions in the register descriptions. Also the
MIDR value in the r1p2 TRM isn't updated from r1p1.
The CCSIDR_EL1 values in the TRM unfortunately seem to be wrong:
the comment in the patch describes how I've calculated the
values used here.

...

+cpu->isar.id_aa64mmfr2 = 0x1011ull;


I see 0x0220011102101011, not in your list of exceptions above.

Otherwise,
Reviewed-by: Richard Henderson 


r~

[PATCH 4/4] QGA VSS: Add log in functions begin/end

2023-07-05 Thread Konstantin Kostiuk

Signed-off-by: Konstantin Kostiuk 
---
 qga/vss-win32/install.cpp   | 33 +
 qga/vss-win32/provider.cpp  |  3 +++
 qga/vss-win32/requester.cpp | 34 ++
 3 files changed, 70 insertions(+)

diff --git a/qga/vss-win32/install.cpp b/qga/vss-win32/install.cpp
index c10a397e51..8364f60a30 100644
--- a/qga/vss-win32/install.cpp
+++ b/qga/vss-win32/install.cpp
@@ -100,6 +100,8 @@ HRESULT put_Value(ICatalogObject *pObj, LPCWSTR name, T val)
 /* Lookup Administrators group name from winmgmt */
 static HRESULT GetAdminName(_bstr_t *name)
 {
+PRINT_DEBUG_BEGIN;
+
 HRESULT hr;
 COMPointer pLoc;
 COMPointer pSvc;
@@ -142,6 +144,7 @@ static HRESULT GetAdminName(_bstr_t *name)
 }
 
 out:
+PRINT_DEBUG_END;
 return hr;
 }
 
@@ -149,6 +152,8 @@ out:
 static HRESULT getNameByStringSID(
 const wchar_t *sid, LPWSTR buffer, LPDWORD bufferLen)
 {
+PRINT_DEBUG_BEGIN;
+
 HRESULT hr = S_OK;
 PSID psid = NULL;
 SID_NAME_USE groupType;
@@ -168,6 +173,7 @@ static HRESULT getNameByStringSID(
 LocalFree(psid);
 
 out:
+PRINT_DEBUG_END;
 return hr;
 }
 
@@ -175,6 +181,8 @@ out:
 static HRESULT QGAProviderFind(
 HRESULT (*found)(ICatalogCollection *, int, void *), void *arg)
 {
+PRINT_DEBUG_BEGIN;
+
 HRESULT hr;
 COMInitializer initializer;
 COMPointer pUnknown;
@@ -205,12 +213,15 @@ static HRESULT QGAProviderFind(
 chk(pColl->SaveChanges(&n));
 
 out:
+PRINT_DEBUG_END;
 return hr;
 }
 
 /* Count QGA VSS provider in COM+ Application Catalog */
 static HRESULT QGAProviderCount(ICatalogCollection *coll, int i, void *arg)
 {
+PRINT_DEBUG_BEGIN;
+
 (*(int *)arg)++;
 return S_OK;
 }
@@ -218,28 +229,35 @@ static HRESULT QGAProviderCount(ICatalogCollection *coll, 
int i, void *arg)
 /* Remove QGA VSS provider from COM+ Application Catalog Collection */
 static HRESULT QGAProviderRemove(ICatalogCollection *coll, int i, void *arg)
 {
+PRINT_DEBUG_BEGIN;
 HRESULT hr;
 
 PRINT_DEBUG("Removing COM+ Application: %s", QGA_PROVIDER_NAME);
 chk(coll->Remove(i));
 out:
+PRINT_DEBUG_END;
 return hr;
 }
 
 /* Unregister this module from COM+ Applications Catalog */
 STDAPI COMUnregister(void)
 {
+PRINT_DEBUG_BEGIN;
+
 HRESULT hr;
 
 DllUnregisterServer();
 chk(QGAProviderFind(QGAProviderRemove, NULL));
 out:
+PRINT_DEBUG_END;
 return hr;
 }
 
 /* Register this module to COM+ Applications Catalog */
 STDAPI COMRegister(void)
 {
+PRINT_DEBUG_BEGIN;
+
 HRESULT hr;
 COMInitializer initializer;
 COMPointer pUnknown;
@@ -259,12 +277,14 @@ STDAPI COMRegister(void)
 
 if (!g_hinstDll) {
 errmsg(E_FAIL, "Failed to initialize DLL");
+PRINT_DEBUG_END;
 return E_FAIL;
 }
 
 chk(QGAProviderFind(QGAProviderCount, (void *)&count));
 if (count) {
 errmsg(E_ABORT, "QGA VSS Provider is already installed");
+PRINT_DEBUG_END;
 return E_ABORT;
 }
 
@@ -355,6 +375,7 @@ out:
 COMUnregister();
 }
 
+PRINT_DEBUG_END;
 return hr;
 }
 
@@ -370,6 +391,8 @@ STDAPI_(void) CALLBACK DLLCOMUnregister(HWND, HINSTANCE, 
LPSTR, int)
 
 static BOOL CreateRegistryKey(LPCTSTR key, LPCTSTR value, LPCTSTR data)
 {
+PRINT_DEBUG_BEGIN;
+
 HKEY  hKey;
 LONG  ret;
 DWORD size;
@@ -390,6 +413,7 @@ static BOOL CreateRegistryKey(LPCTSTR key, LPCTSTR value, 
LPCTSTR data)
 RegCloseKey(hKey);
 
 out:
+PRINT_DEBUG_END;
 if (ret != ERROR_SUCCESS) {
 /* As we cannot printf within DllRegisterServer(), show a dialog. */
 errmsg_dialog(ret, "Cannot add registry", key);
@@ -401,6 +425,8 @@ out:
 /* Register this dll as a VSS provider */
 STDAPI DllRegisterServer(void)
 {
+PRINT_DEBUG_BEGIN;
+
 COMInitializer initializer;
 COMPointer pVssAdmin;
 HRESULT hr = E_FAIL;
@@ -479,12 +505,15 @@ out:
 DllUnregisterServer();
 }
 
+PRINT_DEBUG_END;
 return hr;
 }
 
 /* Unregister this VSS hardware provider from the system */
 STDAPI DllUnregisterServer(void)
 {
+PRINT_DEBUG_BEGIN;
+
 TCHAR key[256];
 COMInitializer initializer;
 COMPointer pVssAdmin;
@@ -502,6 +531,7 @@ STDAPI DllUnregisterServer(void)
 SHDeleteKey(HKEY_CLASSES_ROOT, key);
 SHDeleteKey(HKEY_CLASSES_ROOT, g_szProgid);
 
+PRINT_DEBUG_END;
 return S_OK; /* Uninstall should never fail */
 }
 
@@ -528,6 +558,8 @@ namespace _com_util
 /* Stop QGA VSS provider service using Winsvc API  */
 STDAPI StopService(void)
 {
+PRINT_DEBUG_BEGIN;
+
 HRESULT hr = S_OK;
 SC_HANDLE manager = OpenSCManager(NULL, NULL, SC_MANAGER_ALL_ACCESS);
 SC_HANDLE service = NULL;
@@ -552,5 +584,6 @@ STDAPI StopService(void)
 out:
 CloseServiceHandle(service);
 CloseServiceHandle(manager);
+PRINT_DEBUG_END;
 return hr;
 }
diff --git a/qga/vss-win32/provider.cpp b/qga/vss-win32/provider.cpp
index 1b885e24ee..4405fb00f1 100644
--- a/qga/vss

[PATCH 1/4] QGA VSS: Add wrapper to send log to debugger and stderr

2023-07-05 Thread Konstantin Kostiuk

Signed-off-by: Konstantin Kostiuk 
---
 qga/vss-win32/vss-debug.h | 31 +++
 1 file changed, 31 insertions(+)
 create mode 100644 qga/vss-win32/vss-debug.h

diff --git a/qga/vss-win32/vss-debug.h b/qga/vss-win32/vss-debug.h
new file mode 100644
index 00..c0bdf7a3fc
--- /dev/null
+++ b/qga/vss-win32/vss-debug.h
@@ -0,0 +1,31 @@
+/*
+ * QEMU Guest Agent VSS debug declarations
+ *
+ * Copyright (C) 2023 Red Hat Inc
+ *
+ * Authors:
+ *  Konstantin Kostiuk 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include 
+
+#ifndef VSS_DEBUG_H
+#define VSS_DEBUG_H
+
+#define PRINT_DEBUG(fmt, ...) {   \
+char user_sting[512] = { 0 }; \
+char full_string[640] = { 0 };\
+snprintf(user_sting, 512, fmt, ## __VA_ARGS__);   \
+snprintf(full_string, 640, QGA_PROVIDER_NAME"[%lu]: %s %s\n", \
+GetCurrentThreadId(), __func__, user_sting);  \
+OutputDebugString(full_string);   \
+fprintf(stderr, "%s", full_string);   \
+}
+
+#define PRINT_DEBUG_BEGIN PRINT_DEBUG("begin")
+#define PRINT_DEBUG_END PRINT_DEBUG("end")
+
+#endif
-- 
2.34.1

[PATCH 0/4] QGA VSS Logging

2023-07-05 Thread Konstantin Kostiuk

Print all VSS error and trace to debugger and stderr.

Konstantin Kostiuk (4):
  QGA VSS: Add wrapper to send log to debugger and stderr
  QGA VSS: Replace 'fprintf(stderr' with PRINT_DEBUG
  QGA VSS: Print error in err_set
  QGA VSS: Add log in functions begin/end

 qga/vss-win32/install.cpp   | 46 -
 qga/vss-win32/provider.cpp  |  3 +++
 qga/vss-win32/requester.cpp | 51 -
 qga/vss-win32/vss-debug.h   | 31 ++
 4 files changed, 118 insertions(+), 13 deletions(-)
 create mode 100644 qga/vss-win32/vss-debug.h

--
2.34.1

[PATCH 2/4] QGA VSS: Replace 'fprintf(stderr' with PRINT_DEBUG

2023-07-05 Thread Konstantin Kostiuk

Signed-off-by: Konstantin Kostiuk 
---
 qga/vss-win32/install.cpp   | 13 +++--
 qga/vss-win32/requester.cpp |  9 +
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/qga/vss-win32/install.cpp b/qga/vss-win32/install.cpp
index ff93b08a9e..c10a397e51 100644
--- a/qga/vss-win32/install.cpp
+++ b/qga/vss-win32/install.cpp
@@ -13,6 +13,7 @@
 #include "qemu/osdep.h"
 
 #include "vss-common.h"
+#include "vss-debug.h"
 #ifdef HAVE_VSS_SDK
 #include 
 #else
@@ -54,7 +55,7 @@ void errmsg(DWORD err, const char *text)
   FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
   NULL, err, MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT),
   (char *)&msg, 0, NULL);
-fprintf(stderr, "%.*s. (Error: %lx) %s\n", len, text, err, msg);
+PRINT_DEBUG("%.*s. (Error: %lx) %s\n", len, text, err, msg);
 LocalFree(msg);
 }
 
@@ -219,7 +220,7 @@ static HRESULT QGAProviderRemove(ICatalogCollection *coll, 
int i, void *arg)
 {
 HRESULT hr;
 
-fprintf(stderr, "Removing COM+ Application: %s\n", QGA_PROVIDER_NAME);
+PRINT_DEBUG("Removing COM+ Application: %s", QGA_PROVIDER_NAME);
 chk(coll->Remove(i));
 out:
 return hr;
@@ -304,9 +305,9 @@ STDAPI COMRegister(void)
 }
 strcpy(tlbPath, dllPath);
 strcpy(tlbPath+n-3, "tlb");
-fprintf(stderr, "Registering " QGA_PROVIDER_NAME ":\n");
-fprintf(stderr, "  %s\n", dllPath);
-fprintf(stderr, "  %s\n", tlbPath);
+PRINT_DEBUG("Registering " QGA_PROVIDER_NAME ":");
+PRINT_DEBUG("  %s", dllPath);
+PRINT_DEBUG("  %s", tlbPath);
 if (!PathFileExists(tlbPath)) {
 hr = HRESULT_FROM_WIN32(ERROR_FILE_NOT_FOUND);
 errmsg(hr, "Failed to lookup tlb");
@@ -517,7 +518,7 @@ namespace _com_util
 }
 
 if (mbstowcs(bstr, ascii, len) == (size_t)-1) {
-fprintf(stderr, "Failed to convert string '%s' into BSTR", ascii);
+PRINT_DEBUG("Failed to convert string '%s' into BSTR", ascii);
 bstr[0] = 0;
 }
 return bstr;
diff --git a/qga/vss-win32/requester.cpp b/qga/vss-win32/requester.cpp
index 3e998af4a8..e4f7013c62 100644
--- a/qga/vss-win32/requester.cpp
+++ b/qga/vss-win32/requester.cpp
@@ -12,6 +12,7 @@
 
 #include "qemu/osdep.h"
 #include "vss-common.h"
+#include "vss-debug.h"
 #include "requester.h"
 #include "install.h"
 #include 
@@ -59,13 +60,13 @@ STDAPI requester_init(void)
 NULL, -1, NULL, NULL, RPC_C_AUTHN_LEVEL_PKT_PRIVACY,
 RPC_C_IMP_LEVEL_IDENTIFY, NULL, EOAC_NONE, NULL);
 if (FAILED(hr)) {
-fprintf(stderr, "failed to CoInitializeSecurity (error %lx)\n", hr);
+PRINT_DEBUG("failed to CoInitializeSecurity (error %lx)", hr);
 return hr;
 }
 
 hLib = LoadLibraryA("VSSAPI.DLL");
 if (!hLib) {
-fprintf(stderr, "failed to load VSSAPI.DLL\n");
+PRINT_DEBUG("failed to load VSSAPI.DLL");
 return HRESULT_FROM_WIN32(GetLastError());
 }
 
@@ -78,14 +79,14 @@ STDAPI requester_init(void)
 #endif
 );
 if (!pCreateVssBackupComponents) {
-fprintf(stderr, "failed to get proc address from VSSAPI.DLL\n");
+PRINT_DEBUG("failed to get proc address from VSSAPI.DLL");
 return HRESULT_FROM_WIN32(GetLastError());
 }
 
 pVssFreeSnapshotProperties = (t_VssFreeSnapshotProperties)
 GetProcAddress(hLib, "VssFreeSnapshotProperties");
 if (!pVssFreeSnapshotProperties) {
-fprintf(stderr, "failed to get proc address from VSSAPI.DLL\n");
+PRINT_DEBUG("failed to get proc address from VSSAPI.DLL");
 return HRESULT_FROM_WIN32(GetLastError());
 }
 
-- 
2.34.1

[PATCH 3/4] QGA VSS: Print error in err_set

2023-07-05 Thread Konstantin Kostiuk

Signed-off-by: Konstantin Kostiuk 
---
 qga/vss-win32/requester.cpp | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/qga/vss-win32/requester.cpp b/qga/vss-win32/requester.cpp
index e4f7013c62..73c32381eb 100644
--- a/qga/vss-win32/requester.cpp
+++ b/qga/vss-win32/requester.cpp
@@ -26,9 +26,11 @@
 
 #define DEFAULT_VSS_BACKUP_TYPE VSS_BT_FULL
 
-#define err_set(e, err, fmt, ...)   \
-((e)->error_setg_win32_wrapper((e)->errp, __FILE__, __LINE__, __func__, \
-   err, fmt, ## __VA_ARGS__))
+#define err_set(e, err, fmt, ...) { \
+(e)->error_setg_win32_wrapper((e)->errp, __FILE__, __LINE__, __func__,  \
+   err, fmt, ## __VA_ARGS__);   \
+PRINT_DEBUG(fmt, ## __VA_ARGS__);   \
+}
 /* Bad idea, works only when (e)->errp != NULL: */
 #define err_is_set(e) ((e)->errp && *(e)->errp)
 /* To lift this restriction, error_propagate(), like we do in QEMU code */
-- 
2.34.1

Re: [PATCH] meson.build: Skip C++ detection unless we're targeting Windows

2023-07-05 Thread Philippe Mathieu-Daudé


On 5/7/23 15:36, Thomas Huth wrote:

The only C++ code that we currently still have in the repository
is the code in qga/vss-win32/ - so we can skip the C++ detection
unless we are compiling binaries for Windows.

Signed-off-by: Thomas Huth 
---
  meson.build | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v4 02/37] tests/multiarch: Add test-aes

2023-07-05 Thread Richard Henderson


On 7/3/23 14:08, Christoph Müllner wrote:

+config-cc.mak: Makefile
+   $(quiet-@)( \
+   $(call cc-option,-mrv64g_zk, CROSS_CC_HAS_ZK) \


I think this should be "-march=rv64g_zk" instead of "-mrv64g_zk"
Otherwise this will always fail.


*shrug* FWIW, something built for me...



But I would drop this gating mechanism and use the idea that you
proposed for Zfa:


... however, you're right, I should be using .insn here.
Thanks for the translations, I've c-n-p'd them in.


r~

Re: [PATCH v2 4/4] tests/avocado: Add powernv machine test script

2023-07-05 Thread Cédric Le Goater


On 7/5/23 14:06, Nicholas Piggin wrote:

This copies ppc_pseries.py to start a set of powernv tests, including
a Linux boot test for the newly added SMT mode.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
I didn't add the powernv10 support yet as Cedric suggested, and kept the
same vmlinuz because it's common with the pseries tests. We should do
that in later tests though. Might be time to update default to power10
soon if the model is becoming more complete...


power10 is as good as power9, I would say. power9 has been the default for
4 years now. power9 systems are publicly  available though. As you wish.

If we had some ways to deactivate the doorbells when booting Linux, we
would test XIVE with the CPU IPIs also with the tests you introduced.

Pity this wasn't merged :

  https://lore.kernel.org/all/20211105102636.1016378-12-...@kaod.org/

May be it is possible to tweak the CPU features in OPAL with a QEMU
DT property to remove doorbells ?

C.




Thanks,
Nick

  tests/avocado/ppc_powernv.py | 87 
  1 file changed, 87 insertions(+)
  create mode 100644 tests/avocado/ppc_powernv.py

diff --git a/tests/avocado/ppc_powernv.py b/tests/avocado/ppc_powernv.py
new file mode 100644
index 00..d0e5c07bde
--- /dev/null
+++ b/tests/avocado/ppc_powernv.py
@@ -0,0 +1,87 @@
+# Test that Linux kernel boots on ppc powernv machines and check the console
+#
+# Copyright (c) 2018, 2020 Red Hat, Inc.
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or
+# later.  See the COPYING file in the top-level directory.
+
+from avocado.utils import archive
+from avocado_qemu import QemuSystemTest
+from avocado_qemu import wait_for_console_pattern
+
+class powernvMachine(QemuSystemTest):
+
+timeout = 90
+KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 '
+panic_message = 'Kernel panic - not syncing'
+good_message = 'VFS: Cannot open root device'
+
+def do_test_linux_boot(self):
+self.require_accelerator("tcg")
+kernel_url = ('https://archives.fedoraproject.org/pub/archive'
+  '/fedora-secondary/releases/29/Everything/ppc64le/os'
+  '/ppc/ppc64/vmlinuz')
+kernel_hash = '3fe04abfc852b66653b8c3c897a59a689270bc77'
+kernel_path = self.fetch_asset(kernel_url, asset_hash=kernel_hash)
+
+self.vm.set_console()
+kernel_command_line = self.KERNEL_COMMON_COMMAND_LINE + 'console=hvc0'
+self.vm.add_args('-kernel', kernel_path,
+ '-append', kernel_command_line)
+self.vm.launch()
+
+def test_linux_boot(self):
+"""
+:avocado: tags=arch:ppc64
+:avocado: tags=machine:powernv
+:avocado: tags=accel:tcg
+"""
+
+self.do_test_linux_boot()
+console_pattern = 'VFS: Cannot open root device'
+wait_for_console_pattern(self, console_pattern, self.panic_message)
+
+def test_linux_smp_boot(self):
+"""
+:avocado: tags=arch:ppc64
+:avocado: tags=machine:powernv
+:avocado: tags=accel:tcg
+"""
+
+self.vm.add_args('-smp', '4')
+self.do_test_linux_boot()
+console_pattern = 'smp: Brought up 1 node, 4 CPUs'
+wait_for_console_pattern(self, console_pattern, self.panic_message)
+wait_for_console_pattern(self, self.good_message, self.panic_message)
+
+def test_linux_smt_boot(self):
+"""
+:avocado: tags=arch:ppc64
+:avocado: tags=machine:powernv
+:avocado: tags=accel:tcg
+"""
+
+self.vm.add_args('-smp', '4,threads=4')
+self.do_test_linux_boot()
+console_pattern = 'CPU maps initialized for 4 threads per core'
+wait_for_console_pattern(self, console_pattern, self.panic_message)
+console_pattern = 'smp: Brought up 1 node, 4 CPUs'
+wait_for_console_pattern(self, console_pattern, self.panic_message)
+wait_for_console_pattern(self, self.good_message, self.panic_message)
+
+def test_linux_big_boot(self):
+"""
+:avocado: tags=arch:ppc64
+:avocado: tags=machine:powernv
+:avocado: tags=accel:tcg
+"""
+
+self.vm.add_args('-smp', '16,threads=4,cores=2,sockets=2')
+
+# powernv does not support NUMA
+self.do_test_linux_boot()
+console_pattern = 'CPU maps initialized for 4 threads per core'
+wait_for_console_pattern(self, console_pattern, self.panic_message)
+console_pattern = 'smp: Brought up 2 nodes, 16 CPUs'
+wait_for_console_pattern(self, console_pattern, self.panic_message)
+wait_for_console_pattern(self, self.good_message, self.panic_message)

Re: [PATCH] target/arm: gdbstub: Guard M-profile code with CONFIG_TCG

2023-07-05 Thread Richard Henderson


On 7/4/23 17:44, Peter Maydell wrote:

IIUC tcg_enabled(), this guard shouldn't be necessary; if CONFIG_TCG
is not defined, tcg_enabled() evaluates to 0, and the compiler should
elide the whole block.


IME it's a bit optimistic to assume that the compiler will always
do that, especially with no optimisation enabled.


There's plenty of other places that we do.
The compiler is usually pretty good with "if (0)".

My question is if

  if (arm_feature(env, ARM_FEATURE_M) && tcg_enabled()) { 


needs to be written

if (tcg_enabled()) {
if (arm_feature(..., M) {
   ...
}
}


r~

Re: [PATCH] target/arm: Avoid over-length shift in arm_cpu_sve_finalize() error case

2023-07-05 Thread Richard Henderson


On 7/4/23 17:43, Peter Maydell wrote:

If you build QEMU with the clang sanitizer enabled, you can see it
fire when running the arm-cpu-features test:

$ QTEST_QEMU_BINARY=./build/arm-clang/qemu-system-aarch64 
./build/arm-clang/tests/qtest/arm-cpu-features
[...]
../../target/arm/cpu64.c:125:19: runtime error: shift exponent 64 is too large 
for 64-bit type 'unsigned long long'
[...]

This happens because the user can specify some incorrect SVE
properties that result in our calculating a max_vq of 0.  We catch
this and error out, but before we do that we calculate

  vq_mask = MAKE_64BIT_MASK(0, max_vq);$

and the MAKE_64BIT_MASK() call is only valid for lengths that are
greater than zero, so we hit the undefined behaviour.

Change the logic so that if max_vq is 0 we specifically set vq_mask
to 0 without going via MAKE_64BIT_MASK().  This lets us drop the
max_vq check from the error-exit logic, because if max_vq is 0 then
vq_map must now be 0.

The UB only happens in the case where the user passed us an incorrect
set of SVE properties, so it's not a big problem in practice.

Signed-off-by: Peter Maydell
---
  target/arm/cpu64.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)


Reviewed-by: Richard Henderson 

r~

Re: [PATCH] target/arm: Avoid over-length shift in arm_cpu_sve_finalize() error case

2023-07-05 Thread Richard Henderson


On 7/4/23 18:00, Alex Bennée wrote:


Peter Maydell  writes:


If you build QEMU with the clang sanitizer enabled, you can see it
fire when running the arm-cpu-features test:

$ QTEST_QEMU_BINARY=./build/arm-clang/qemu-system-aarch64 
./build/arm-clang/tests/qtest/arm-cpu-features
[...]
../../target/arm/cpu64.c:125:19: runtime error: shift exponent 64 is too large 
for 64-bit type 'unsigned long long'
[...]

This happens because the user can specify some incorrect SVE
properties that result in our calculating a max_vq of 0.  We catch
this and error out, but before we do that we calculate

  vq_mask = MAKE_64BIT_MASK(0, max_vq);$

and the MAKE_64BIT_MASK() call is only valid for lengths that are
greater than zero, so we hit the undefined behaviour.


Hmm that does make me worry we could have more land mines waiting to be
found. Would converting MAKE_64BIT_MASK into an inline function and
asserting be a better solution?


I'd be tempted to keep a macro, and use __builtin_constant_p to make sure this expands to 
a constant if possible.  Ideally constants would be diagnosed at compile-time and runtime 
values get runtime asserts.



r~

Re: [PATCH 1/4] io/channel: Explicit QIOChannel doc is based on GLib's IOChannel

2023-07-05 Thread Daniel P . Berrangé

On Wed, Jul 05, 2023 at 03:31:36PM +0200, Philippe Mathieu-Daudé wrote:
> One can get lost looking for "public API docs". Explicit
> we are referring to GLib IOChannel documentation.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  include/io/channel.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/io/channel.h b/include/io/channel.h
> index 229bf36910..998718b470 100644
> --- a/include/io/channel.h
> +++ b/include/io/channel.h
> @@ -97,7 +97,7 @@ struct QIOChannel {
>   * The first five callbacks are mandatory to support, others
>   * provide additional optional features.
>   *
> - * Consult the corresponding public API docs for a description
> + * Consult the corresponding GLib IOChannel public API docs for a description

This is refering to the public API docs in this very file.

ie for io_writev callback, the qio_channel_io_writev docs define
semantics.

>   * of the semantics of each callback. io_shutdown in particular
>   * must be thread-safe, terminate quickly and must not block.
>   */
> -- 
> 2.38.1
> 

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH v2] kconfig: Add PCIe devices to s390x machines

2023-07-05 Thread Matthew Rosato

On 7/4/23 9:33 AM, Philippe Mathieu-Daudé wrote:
> On 4/7/23 14:32, Cédric Le Goater wrote:
>> On 7/4/23 14:09, Philippe Mathieu-Daudé wrote:
>>> On 4/7/23 14:01, Cédric Le Goater wrote:
 It is useful to extend the number of available PCI devices to KVM guests
 for passthrough scenarios and also to expose these models to a different
 (big endian) architecture. Include models for Intel Ethernet adapters
 and one USB controller, which all support MSI-X. Devices only supporting
 INTx won't work on s390x.

 Signed-off-by: Cédric Le Goater 
 ---

   Tested under KVM as a machine device, under KVM nested as a passthrough
   device

   hw/s390x/Kconfig | 4 
   1 file changed, 4 insertions(+)

 diff --git a/hw/s390x/Kconfig b/hw/s390x/Kconfig
 index 5e7d8a2bae8b..7a82c58cdf6e 100644
 --- a/hw/s390x/Kconfig
 +++ b/hw/s390x/Kconfig
 @@ -10,3 +10,7 @@ config S390_CCW_VIRTIO
   select SCLPCONSOLE
   select VIRTIO_CCW
   select MSI_NONBROKEN
 +    imply PCI_EXPRESS
>>>
>>> No, PCIe is a bus, which is implemented in s390-pci-bus.c;
>>> S390_CCW_VIRTIO exposes this bus, so we Kconfig SELECT it.
>>>
 +    imply E1000E_PCI_EXPRESS
 +    imply IGB_PCI_EXPRESS
 +    imply USB_XHCI_PCI
>>>
>>> These are devices you can plug on a PCIe bus, so Kconfig
>>> IMPLY is correct.
>>
>> If I understand correctly, this should be ?
>>
>> @@ -5,8 +5,11 @@ config S390_CCW_VIRTIO
>>   imply VFIO_AP
>>   imply VFIO_CCW
>>   imply WDT_DIAG288
>> -    select PCI
>> +    select PCI_EXPRESS
>>   select S390_FLIC
>>   select SCLPCONSOLE
>>   select VIRTIO_CCW
>>   select MSI_NONBROKEN
>> +    imply E1000E_PCI_EXPRESS
>> +    imply IGB_PCI_EXPRESS
>> +    imply USB_XHCI_PCI
> 
> This is how I'd write this patch. Note I have zero knowledge of zPCI.
> 

Indeed, our s390x PCI emulation is lacking in some places (e.g. missing legacy 
interrupts as Thomas indicated in a prior thread) so we want to be selective 
about what we enable.  

I have no strong objection to adding them as long as you've tested them.

Based on the above comments, will there be a v3?  I don't have the imply'd 
devices readily available for test but I did do some passthrough and virtio 
sanity-testing with s390x hardware to make sure this changes doesn't regress 
anything there.  I used the diff just above (select PCI_EXPRESS + imply*3)

Thanks,
Matt

Re: [PATCH 1/2] block/file-posix: fix g_file_get_contents return path

2023-07-05 Thread Matthew Rosato

On 6/4/23 2:16 AM, Sam Li wrote:
> The g_file_get_contents() function returns a g_boolean. If it fails, the
> returned value will be 0 instead of -1. Solve the issue by skipping
> assigning ret value.
> 
> This issue was found by Matthew Rosato using virtio-blk-{pci,ccw} backed
> by an NVMe partition e.g. /dev/nvme0n1p1 on s390x.
> 
> Signed-off-by: Sam Li 

Polite ping on this patch -- this issue still exists in master as of today and 
this patch resolves it for me.  Just want to make sure it gets into 8.1

Thanks,
Matt

> ---
>  block/file-posix.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index ac1ed54811..0d9d179a35 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1232,7 +1232,6 @@ static int hdev_get_max_hw_transfer(int fd, struct stat 
> *st)
>  static int get_sysfs_str_val(struct stat *st, const char *attribute,
>   char **val) {
>  g_autofree char *sysfspath = NULL;
> -int ret;
>  size_t len;
>  
>  if (!S_ISBLK(st->st_mode)) {
> @@ -1242,8 +1241,7 @@ static int get_sysfs_str_val(struct stat *st, const 
> char *attribute,
>  sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/%s",
>  major(st->st_rdev), minor(st->st_rdev),
>  attribute);
> -ret = g_file_get_contents(sysfspath, val, &len, NULL);
> -if (ret == -1) {
> +if (!g_file_get_contents(sysfspath, val, &len, NULL)) {
>  return -ENOENT;
>  }
>  
> @@ -1253,7 +1251,7 @@ static int get_sysfs_str_val(struct stat *st, const 
> char *attribute,
>  if (*(p + len - 1) == '\n') {
>  *(p + len - 1) = '\0';
>  }
> -return ret;
> +return 0;
>  }
>  #endif
>

[PATCH v3 2/2] qtest/migration: Use switchover-hold to speedup

2023-07-05 Thread Peter Xu

This solution is heavily based on Daniel's original approach here, but
hopefully a cleaner way to impl:

https://lore.kernel.org/r/20230601161347.1803440-11-berra...@redhat.com

The difference is we use the switchover-hold flag rather than tuning
bw+downtime to guide test convergence, comparing to use the magic offset.

This can achieve similar goal of previous patch "tests/qtest: massively
speed up migration-test" but without magic offset to write or monitoring.

With this flag, we can safely always run migration tests with full
speed (bw=0).

However for postcopy tests, when with above bw=0, it's easy to happen that
right after switching to postcopy there're merely no page left, so the
postcopy paths are not well tested.

To remedy that, don't wait for a full iteration but switch to postcopy in
the 1st iteration, adding a precopy bw limit (200MB/s for now, running
100ms) so it should guarantee enough pages left for postcopy.

One pity is that normally postcopy switchover happens after 1-2 rounds of
precopy, so qtest doesn't follow that anymore (while it was trying to).
However it also means previous postcopy tests never tested the case where
there're holes in pages (pages that never got migrated during precopy), now
we cover that too which is actually also a valid scenario for postcopy.

The initial solution can reduce migration-test time from 8min to 1min40s,
this patch can further reduce it from 1m40s to <1m per my local test.

While at it, add migrate_set_bandwidth_[pre|post]copy() and use them.

Signed-off-by: Peter Xu 
---
 tests/qtest/migration-test.c | 39 +++-
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index b9cc194100..d5584d07a9 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -433,16 +433,23 @@ static void migrate_set_parameter_bool(QTestState *who, 
const char *parameter,
 
 static void migrate_ensure_non_converge(QTestState *who)
 {
-/* Can't converge with 1ms downtime + 3 mbs bandwidth limit */
-migrate_set_parameter_int(who, "max-bandwidth", 3 * 1000 * 1000);
-migrate_set_parameter_int(who, "downtime-limit", 1);
+/* Hold off switchover for precopy only */
+migrate_set_parameter_bool(who, "switchover-hold", true);
 }
 
 static void migrate_ensure_converge(QTestState *who)
 {
-/* Should converge with 30s downtime + 1 gbs bandwidth limit */
-migrate_set_parameter_int(who, "max-bandwidth", 1 * 1000 * 1000 * 1000);
-migrate_set_parameter_int(who, "downtime-limit", 30 * 1000);
+migrate_set_parameter_bool(who, "switchover-hold", false);
+}
+
+static void migrate_set_bandwidth_precopy(QTestState *who, int bw)
+{
+migrate_set_parameter_int(who, "max-bandwidth", bw);
+}
+
+static void migrate_set_bandwidth_postcopy(QTestState *who, int bw)
+{
+migrate_set_parameter_int(who, "max-postcopy-bandwidth", bw);
 }
 
 static void migrate_pause(QTestState *who)
@@ -736,6 +743,14 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 unlink(shmem_path);
 }
 
+/*
+ * By default, use full speed for precopy in qtests as that will reduce
+ * the time of testing. The default bandwidth (128MB/s) may be too slow
+ * in this case.  Specific test can overwrite this value after the
+ * function returns but before starting migration.
+ */
+migrate_set_bandwidth_precopy(*from, 0);
+
 return 0;
 }
 
@@ -1168,9 +1183,13 @@ static int migrate_postcopy_prepare(QTestState 
**from_ptr,
 /* Wait for the first serial output from the source */
 wait_for_serial("src_serial");
 
+/*
+ * Limit precopy to 200MB/s for 0.1 sec, so we guarantee to leave
+ * enough pages (total-20MB) for remote page fault processes later.
+ */
+migrate_set_bandwidth_precopy(from, 200 * 1024 * 1024);
 migrate_qmp(from, uri, "{}");
-
-wait_for_migration_pass(from);
+usleep(10);
 
 *from_ptr = from;
 *to_ptr = to;
@@ -1270,7 +1289,7 @@ static void test_postcopy_recovery_common(MigrateCommon 
*args)
 }
 
 /* Turn postcopy speed down, 4K/s is slow enough on any machines */
-migrate_set_parameter_int(from, "max-postcopy-bandwidth", 4096);
+migrate_set_bandwidth_postcopy(from, 4096);
 
 /* Now we start the postcopy */
 migrate_postcopy_start(from, to);
@@ -1314,7 +1333,7 @@ static void test_postcopy_recovery_common(MigrateCommon 
*args)
 migrate_qmp(from, uri, "{'resume': true}");
 
 /* Restore the postcopy bandwidth to unlimited */
-migrate_set_parameter_int(from, "max-postcopy-bandwidth", 0);
+migrate_set_bandwidth_postcopy(from, 0);
 
 migrate_postcopy_complete(from, to, args);
 }
-- 
2.41.0

[PATCH v3 1/2] migration: switchover-hold parameter

2023-07-05 Thread Peter Xu

Add a new migration parameter switchover-hold which can block src qemu
migration from switching over to dest from running.

One can set this flag to true so src qemu will keep iterating the VM data,
not switching over to dest even if it can.

It means now live migration works somehow like COLO; we keep syncing data
from src to dst without stopping.

When the user is ready for the switchover, one can set the parameter from
true->false.  That'll contain a implicit kick to migration thread to be
alive and re-evaluate the switchover decision.

This can be used in two cases so far in my mind:

  (1) One can use this parameter to start pre-heating migration (but not
  really migrating, so a migrate-cancel will cancel the preheat).  When
  the user wants to really migrate, just clear the flag.  It'll in most
  cases migrate immediately because most pages are already synced.

  (2) Can also be used as a clean way to do qtest, in many of the precopy
  tests we have requirement to run after 1 iteration without completing
  the precopy migration.  Before that we have either set bandwidth to
  ridiculous low value, or tricks on detecting guest memory change over
  some adhoc guest memory position.  Now we can simply set this flag
  then we know precopy won't complete and will just keep going.

Here we leveraged a sem to make sure migration thread won't busy spin on a
physical cpu, meanwhile provide a timedwait() of 10ms so it can still try
its best to sync with dest QEMU from time to time.  Note that the sem is
prone to outdated counts but it's benign, please refer to the comment above
the semaphore definition for more information.

Signed-off-by: Peter Xu 
---
 qapi/migration.json| 25 +++--
 migration/migration.h  | 17 +
 migration/migration-hmp-cmds.c |  3 ++
 migration/migration.c  | 68 --
 migration/options.c| 17 +
 5 files changed, 124 insertions(+), 6 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 47dfef0278..c050081555 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -789,6 +789,15 @@
 # Nodes are mapped to their block device name if there is one, and
 # to their node name otherwise.  (Since 5.2)
 #
+# @switchover-hold: Whether we should hold-off precopy switchover from
+# src to dest QEMU, even if we can finish migration in the
+# downtime specified.  By default off, so precopy migration will
+# complete as soon as possible.  One can set it to explicitly keep
+# iterating during precopy migration until set the flag to false
+# again to kick off the final switchover.  Note, this does not
+# affect postcopy switchover, because the user can control that
+# using "migrate-start-postcopy" command explicitly.  (Since 8.1)
+#
 # Features:
 #
 # @unstable: Member @x-checkpoint-delay is experimental.
@@ -810,7 +819,7 @@
'xbzrle-cache-size', 'max-postcopy-bandwidth',
'max-cpu-throttle', 'multifd-compression',
'multifd-zlib-level' ,'multifd-zstd-level',
-   'block-bitmap-mapping' ] }
+   'block-bitmap-mapping', 'switchover-hold' ] }
 
 ##
 # @MigrateSetParameters:
@@ -945,6 +954,10 @@
 # Nodes are mapped to their block device name if there is one, and
 # to their node name otherwise.  (Since 5.2)
 #
+# @switchover-hold: Whether we should hold-off precopy switchover from
+# src to dest QEMU.  For more details, please refer to
+# MigrationParameter entry of the same field.  (Since 8.1)
+#
 # Features:
 #
 # @unstable: Member @x-checkpoint-delay is experimental.
@@ -982,7 +995,8 @@
 '*multifd-compression': 'MultiFDCompression',
 '*multifd-zlib-level': 'uint8',
 '*multifd-zstd-level': 'uint8',
-'*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
+'*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ],
+'*switchover-hold': 'bool' } }
 
 ##
 # @migrate-set-parameters:
@@ -1137,6 +1151,10 @@
 # Nodes are mapped to their block device name if there is one, and
 # to their node name otherwise.  (Since 5.2)
 #
+# @switchover-hold: Whether we should hold-off precopy switchover from
+# src to dest QEMU.  For more details, please refer to
+# MigrationParameter entry of the same field.  (Since 8.1)
+#
 # Features:
 #
 # @unstable: Member @x-checkpoint-delay is experimental.
@@ -1171,7 +1189,8 @@
 '*multifd-compression': 'MultiFDCompression',
 '*multifd-zlib-level': 'uint8',
 '*multifd-zstd-level': 'uint8',
-'*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
+'*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ],
+'*switchover-hold': 'bool' } }
 
 ##
 # @query-migrate-parameters:
diff --git a/migration/migration.h b/migration/migration.h
index a80b22b703..6b31a4b371 100644
--- a/migration/mig

[PATCH v3 0/2] migration: switchover-hold flag

2023-07-05 Thread Peter Xu

This v3 patchset is based on master.  Since I'm not sure how long this
series will take for review, we could probably apply Dan's previous patch
10 first, then when I repost I can provide a revert patch when needed.

v3:
- Rebase only (v2 is not yet applicable after switchover-ack series merged)

A new flag "switchover-hold" is added to allow src qemu explicitly hold
switchover for precopy migration.  Note that this flag will not affect
postcopy switchover because src qemu already has migrate-start-postcopy,
which is a finer grained knob just for that.  In general this flag only
affects reaching migration completion phase, when set it'll block it from
happening while keep the migration iteration going.

This can be used in two cases so far in my mind:

  (1) One can use this parameter to start pre-heating migration (but not
  really migrating, so a migrate-cancel will cancel the preheat).  When
  the user wants to really migrate, just clear the flag.  It'll in most
  cases migrate immediately because most pages are already synced.

  (2) Can also be used as a clean way to do qtest, in many of the precopy
  tests we have requirement to run after 1 iteration without completing
  the precopy migration.  Before that we have either set bandwidth to
  ridiculous low value, or tricks on detecting guest memory change over
  some adhoc guest memory position.  Now we can simply set this flag
  then we know precopy won't complete and will just keep going.

The 1st use case may look a bit like COLO where we can actually keep both
QEMU _mostly_ in sync.  I'm not sure whether it can be useful anywhere,
though.

Patch 1 will introduce the new flag.

Patch 2 will leverage the new flag to speed up migration-test. An initial
test is this can make migration-test finish within a little bit less than
1m.

Please have a look, thanks.

Peter Xu (2):
  migration: switchover-hold parameter
  qtest/migration: Use switchover-hold to speedup

 qapi/migration.json| 25 +++--
 migration/migration.h  | 17 +
 migration/migration-hmp-cmds.c |  3 ++
 migration/migration.c  | 68 --
 migration/options.c| 17 +
 tests/qtest/migration-test.c   | 39 ++-
 6 files changed, 153 insertions(+), 16 deletions(-)

-- 
2.41.0

Re: [PATCH v3 0/2] migration: switchover-hold flag

2023-07-05 Thread Peter Xu

On Wed, Jul 05, 2023 at 11:08:22AM -0400, Peter Xu wrote:
> This v3 patchset is based on master.  Since I'm not sure how long this
> series will take for review, we could probably apply Dan's previous patch
> 10 first, then when I repost I can provide a revert patch when needed.
> 
> v3:
> - Rebase only (v2 is not yet applicable after switchover-ack series merged)

I got an unused var in this set, I'll repost again shortly, sorry.

-- 
Peter Xu

Re: [PATCH v2] kconfig: Add PCIe devices to s390x machines

2023-07-05 Thread Cédric Le Goater


On 7/5/23 16:54, Matthew Rosato wrote:

On 7/4/23 9:33 AM, Philippe Mathieu-Daudé wrote:

On 4/7/23 14:32, Cédric Le Goater wrote:

On 7/4/23 14:09, Philippe Mathieu-Daudé wrote:

On 4/7/23 14:01, Cédric Le Goater wrote:

It is useful to extend the number of available PCI devices to KVM guests
for passthrough scenarios and also to expose these models to a different
(big endian) architecture. Include models for Intel Ethernet adapters
and one USB controller, which all support MSI-X. Devices only supporting
INTx won't work on s390x.

Signed-off-by: Cédric Le Goater 
---

   Tested under KVM as a machine device, under KVM nested as a passthrough
   device

   hw/s390x/Kconfig | 4 
   1 file changed, 4 insertions(+)

diff --git a/hw/s390x/Kconfig b/hw/s390x/Kconfig
index 5e7d8a2bae8b..7a82c58cdf6e 100644
--- a/hw/s390x/Kconfig
+++ b/hw/s390x/Kconfig
@@ -10,3 +10,7 @@ config S390_CCW_VIRTIO
   select SCLPCONSOLE
   select VIRTIO_CCW
   select MSI_NONBROKEN
+    imply PCI_EXPRESS


No, PCIe is a bus, which is implemented in s390-pci-bus.c;
S390_CCW_VIRTIO exposes this bus, so we Kconfig SELECT it.


+    imply E1000E_PCI_EXPRESS
+    imply IGB_PCI_EXPRESS
+    imply USB_XHCI_PCI


These are devices you can plug on a PCIe bus, so Kconfig
IMPLY is correct.


If I understand correctly, this should be ?

@@ -5,8 +5,11 @@ config S390_CCW_VIRTIO
   imply VFIO_AP
   imply VFIO_CCW
   imply WDT_DIAG288
-    select PCI
+    select PCI_EXPRESS
   select S390_FLIC
   select SCLPCONSOLE
   select VIRTIO_CCW
   select MSI_NONBROKEN
+    imply E1000E_PCI_EXPRESS
+    imply IGB_PCI_EXPRESS
+    imply USB_XHCI_PCI


This is how I'd write this patch. Note I have zero knowledge of zPCI.



Indeed, our s390x PCI emulation is lacking in some places (e.g. missing legacy 
interrupts as Thomas indicated in a prior thread) so we want to be selective 
about what we enable.

I have no strong objection to adding them as long as you've tested them.

Based on the above comments, will there be a v3?  I don't have the imply'd 
devices readily available for test but I did do some passthrough and virtio 
sanity-testing with s390x hardware to make sure this changes doesn't regress 
anything there.  I used the diff just above (select PCI_EXPRESS + imply*3)


Good. The VM I use for tests has the following PCI devices :

 0001:00:00.0 Ethernet controller: Mellanox Technologies MT27710 Family 
[ConnectX-4 Lx Virtual Function]
 0002:00:00.0 Non-VGA unclassified device: IBM Internal Shared Memory (ISM) 
virtual PCI device
 0003:00:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network 
Connection
 0004:00:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network 
Connection (rev 01)

The first two devices are passthrough devices from the LPAR. Then in the nested,
I simply pass one of the last two devices, emulated in QEMU. Both kernels have
the Intel net drivers to check connectivity with the LPAR. These are not shipped
by default.

Sending a v3.

Thanks,

C.

[PATCH v3] kconfig: Add PCIe devices to s390x machines

2023-07-05 Thread Cédric Le Goater

It is useful to extend the number of available PCI devices to KVM guests
for passthrough scenarios and also to expose these models to a different
(big endian) architecture. Include models for Intel Ethernet adapters
and one USB controller, which all support MSI-X. Devices only supporting
INTx won't work on s390x.

Signed-off-by: Cédric Le Goater 
---

 v3: PCI -> PCI_EXPRESS
 v2: select -> imply
  
 hw/s390x/Kconfig | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/s390x/Kconfig b/hw/s390x/Kconfig
index 5e7d8a2bae8b..ab62c9120545 100644
--- a/hw/s390x/Kconfig
+++ b/hw/s390x/Kconfig
@@ -5,8 +5,11 @@ config S390_CCW_VIRTIO
 imply VFIO_AP
 imply VFIO_CCW
 imply WDT_DIAG288
-select PCI
+select PCI_EXPRESS
 select S390_FLIC
 select SCLPCONSOLE
 select VIRTIO_CCW
 select MSI_NONBROKEN
+imply E1000E_PCI_EXPRESS
+imply IGB_PCI_EXPRESS
+imply USB_XHCI_PCI
-- 
2.41.0

Re: [PATCH v2 01/19] hw/timer/arm_timer: Declare QOM types using DEFINE_TYPES() macro

2023-07-05 Thread Richard Henderson


On 7/4/23 16:49, Philippe Mathieu-Daudé wrote:

When multiple QOM types are registered in the same file,
it is simpler to use the the DEFINE_TYPES() macro. Replace
the type_init() / type_register_static() combination.

Signed-off-by: Philippe Mathieu-Daudé
Reviewed-by: Peter Maydell
---
  hw/timer/arm_timer.c | 35 +++
  1 file changed, 15 insertions(+), 20 deletions(-)


Reviewed-by: Richard Henderson 

r~

Re: [PATCH v2 02/19] hw/timer/arm_timer: Remove pointless cast from void *

2023-07-05 Thread Richard Henderson


On 7/4/23 16:49, Philippe Mathieu-Daudé wrote:

Signed-off-by: Philippe Mathieu-Daudé
Reviewed-by: Peter Maydell
---
  hw/timer/arm_timer.c | 16 
  1 file changed, 8 insertions(+), 8 deletions(-)


Reviewed-by: Richard Henderson 

r~

Re: [PATCH v3] linux-user/syscall: Implement execve without execveat

2023-07-05 Thread Michael Tokarev


05.07.2023 15:10, Pierrick Bouvier wrote:

Support for execveat syscall was implemented in 55bbe4 and is available
since QEMU 8.0.0. It relies on host execveat, which is widely available
on most of Linux kernels today.

However, this change breaks qemu-user self emulation, if "host" qemu
version is less than 8.0.0. Indeed, it does not implement yet execveat.
This strange use case happens with most of distribution today having
binfmt support.

With a concrete failing example:
$ qemu-x86_64-7.2 qemu-x86_64-8.0 /bin/bash -c /bin/ls
/bin/bash: line 1: /bin/ls: Function not implemented
-> not implemented means execve returned ENOSYS

qemu-user-static 7.2 and 8.0 can be conveniently grabbed from debian
packages qemu-user-static* [1].

One usage of this is running wine-arm64 from linux-x64 (details [2]).
This is by updating qemu embedded in docker image that we ran into this
issue.

The solution to update host qemu is not always possible. Either it's
complicated or ask you to recompile it, or simply is not accessible
(GitLab CI, GitHub Actions). Thus, it could be worth to implement execve
without relying on execveat, which is the goal of this patch.

This patch was tested with example presented in this commit message.

[1] http://ftp.us.debian.org/debian/pool/main/q/qemu/
[1] https://www.linaro.org/blog/emulate-windows-on-arm/

Signed-off-by: Pierrick Bouvier 


Reviewed-by: Michael Tokarev 
Cc: qemu-sta...@nongnu.org

Thanks!

/mjt

Re: [PATCH v2 03/19] hw/timer/arm_timer: Move SP804 code around

2023-07-05 Thread Richard Henderson


On 7/4/23 16:49, Philippe Mathieu-Daudé wrote:

Move sp804_properties[] and sp804_class_init() around
with the rest of SP804 code code. What follows the
"Integrator/CP timer module." is strictly ICP related.

Signed-off-by: Philippe Mathieu-Daudé
Reviewed-by: Peter Maydell
---
  hw/timer/arm_timer.c | 30 +++---
  1 file changed, 15 insertions(+), 15 deletions(-)


Reviewed-by: Richard Henderson 

r~

Re: [PATCH v2 04/19] hw/timer/arm_timer: CamelCase rename icp_pit_state -> IntegratorPIT

2023-07-05 Thread Richard Henderson


On 7/4/23 16:49, Philippe Mathieu-Daudé wrote:

Following docs/devel/style.rst guidelines, rename icp_pit_state
using CamelCase as IntegratorPIT (PIT is an acronym).

Signed-off-by: Philippe Mathieu-Daudé
Reviewed-by: Peter Maydell
---
  hw/timer/arm_timer.c | 12 ++--
  1 file changed, 6 insertions(+), 6 deletions(-)


Reviewed-by: Richard Henderson 

r~

1 2 3 >

1 - 100 of 254 matches

Mail list logo