[PATCH] cxl: update kernel config requirement

2024-09-09 Thread luzhixing12345
Add LIBNVDIMM as a dependency of the CXL_PMEM config option

The CXL_REGION config option does not exist in 5.18; it was introduced
into the kernel in 6.0 by commit 779dd20c for dynamic provisioning of new
memory regions.

Signed-off-by: luzhixing12345 
---
 docs/system/devices/cxl.rst | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/docs/system/devices/cxl.rst b/docs/system/devices/cxl.rst
index 882b036f5e..ccbe47749f 100644
--- a/docs/system/devices/cxl.rst
+++ b/docs/system/devices/cxl.rst
@@ -401,9 +401,12 @@ OS management of CXL memory devices as described here.
 * CONFIG_CXL_BUS
 * CONFIG_CXL_PCI
 * CONFIG_CXL_ACPI
-* CONFIG_CXL_PMEM
+* CONFIG_CXL_PMEM (depends on LIBNVDIMM)
 * CONFIG_CXL_MEM
 * CONFIG_CXL_PORT
+
+Dynamic provisioning of new memory regions (since Linux 6.0):
+
 * CONFIG_CXL_REGION
 
 References
-- 
2.34.1
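As a side note, the config requirements listed in the patch above can be checked mechanically. A hedged sketch follows — it reads a generated sample file so it is self-contained; on a real system you would point it at /boot/config-$(uname -r) or the kernel build's .config instead:

```shell
# Write a sample config so the snippet is self-contained; substitute a
# real kernel config file in practice.
cat > /tmp/sample-config <<'EOF'
CONFIG_CXL_BUS=y
CONFIG_CXL_PCI=y
CONFIG_CXL_ACPI=y
CONFIG_LIBNVDIMM=y
CONFIG_CXL_PMEM=m
CONFIG_CXL_MEM=y
CONFIG_CXL_PORT=y
CONFIG_CXL_REGION=y
EOF

# CXL_PMEM needs LIBNVDIMM; CXL_REGION needs Linux 6.0+.
for opt in CXL_BUS CXL_PCI CXL_ACPI LIBNVDIMM CXL_PMEM CXL_MEM CXL_PORT CXL_REGION; do
    if grep -q "^CONFIG_${opt}=[ym]" /tmp/sample-config; then
        echo "CONFIG_${opt}: present"
    else
        echo "CONFIG_${opt}: MISSING"
    fi
done
```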




Re: [PATCH v3 02/14] util: Add RISC-V vector extension probe in cpuinfo

2024-09-09 Thread LIU Zhiwei



On 2024/9/5 11:34, Richard Henderson wrote:

On 9/4/24 07:27, LIU Zhiwei wrote:

+    if (info & CPUINFO_ZVE64X) {
+        /*
+         * Get vlenb for Vector: vsetvli rd, x0, e64.
+         * VLMAX = LMUL * VLEN / SEW.
+         * The "vsetvli rd, x0, e64" means "LMUL = 1, SEW = 64, rd = VLMAX",
+         * so "vlenb = VLMAX * 64 / 8".
+         */
+        unsigned long vlmax = 0;
+        asm volatile(".insn i 0x57, 7, %0, zero, (3 << 3)" : "=r"(vlmax));
+        if (vlmax) {
+            riscv_vlenb = vlmax * 8;
+            assert(riscv_vlen >= 64 && !(riscv_vlen & (riscv_vlen - 1)));
+        } else {
+            info &= ~CPUINFO_ZVE64X;
+        }
+    }


Surely this does not compile, since the riscv_vlen referenced in the 
assert does not exist.

riscv_vlen is a macro defined in terms of riscv_vlenb. I think you missed it.


That said, I've done some experimentation and I believe there is a 
further simplification to be had in instead saving log2(vlenb).


    if (info & CPUINFO_ZVE64X) {
        /*
         * We are guaranteed by RVV-1.0 that VLEN is a power of 2.
         * We are guaranteed by Zve64x that VLEN >= 64, and that
         * EEW of {8,16,32,64} are supported.
         *
         * Cache VLEN in a convenient form.
         */
        unsigned long vlenb;
        asm("csrr %0, vlenb" : "=r"(vlenb));


Should we use the .insn format here too? The compiler may not support vector.



        riscv_lg2_vlenb = ctz32(vlenb);
    }


OK.

Thanks,
Zhiwei


I'll talk about how this can be used against the next patch with vsetvl.





r~




Re: [PATCH 1/3] ui/sdl2: reenable the SDL2 Windows keyboard hook procedure

2024-09-09 Thread Marc-André Lureau
Hi

On Mon, Sep 9, 2024 at 10:22 AM Volker Rümelin  wrote:
>
> Windows only:
>
> The libSDL2 Windows message loop needs the libSDL2 Windows low
> level keyboard hook procedure to grab the left and right Windows
> keys correctly. Reenable the SDL2 Windows keyboard hook procedure.
>
> Because the QEMU Windows keyboard hook procedure is still needed
> to filter out the special left Control key event for every Alt Gr
> key event, it's important to install the two keyboard hook
> procedures in the following order. First the SDL2 procedure, then
> the QEMU procedure.
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2139
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2323
> Tested-by: Howard Spoelstra 
> Signed-off-by: Volker Rümelin 
> ---
>  ui/sdl2.c   | 53 ++---
>  ui/win32-kbd-hook.c |  3 +++
>  2 files changed, 38 insertions(+), 18 deletions(-)
>
> diff --git a/ui/sdl2.c b/ui/sdl2.c
> index 98ed974371..ac37c173a1 100644
> --- a/ui/sdl2.c
> +++ b/ui/sdl2.c
> @@ -42,6 +42,7 @@ static SDL_Surface *guest_sprite_surface;
>  static int gui_grab; /* if true, all keyboard/mouse events are grabbed */
>  static bool alt_grab;
>  static bool ctrl_grab;
> +static bool win32_kbd_grab;
>
>  static int gui_saved_grab;
>  static int gui_fullscreen;
> @@ -202,6 +203,19 @@ static void sdl_update_caption(struct sdl2_console *scon)
>  }
>  }
>
> +static void *sdl2_win32_get_hwnd(struct sdl2_console *scon)
> +{
> +#ifdef CONFIG_WIN32
> +SDL_SysWMinfo info;
> +
> +SDL_VERSION(&info.version);
> +if (SDL_GetWindowWMInfo(scon->real_window, &info)) {
> +return info.info.win.window;
> +}
> +#endif
> +return NULL;
> +}
> +
>  static void sdl_hide_cursor(struct sdl2_console *scon)
>  {
>  if (scon->opts->has_show_cursor && scon->opts->show_cursor) {
> @@ -259,9 +273,16 @@ static void sdl_grab_start(struct sdl2_console *scon)
>  } else {
>  sdl_hide_cursor(scon);
>  }
> +/*
> + * Windows: To ensure that QEMU's low level keyboard hook procedure is
> + * called before SDL2's, the QEMU procedure must first be removed and
> + * then the SDL2 and QEMU procedures must be installed in this order.
> + */
> +win32_kbd_set_window(NULL);
>  SDL_SetWindowGrab(scon->real_window, SDL_TRUE);
> +win32_kbd_set_window(sdl2_win32_get_hwnd(scon));
>  gui_grab = 1;
> -win32_kbd_set_grab(true);
> +win32_kbd_set_grab(win32_kbd_grab);
>  sdl_update_caption(scon);
>  }
>
> @@ -370,19 +391,6 @@ static int get_mod_state(void)
>  }
>  }
>
> -static void *sdl2_win32_get_hwnd(struct sdl2_console *scon)
> -{
> -#ifdef CONFIG_WIN32
> -SDL_SysWMinfo info;
> -
> -SDL_VERSION(&info.version);
> -if (SDL_GetWindowWMInfo(scon->real_window, &info)) {
> -return info.info.win.window;
> -}
> -#endif
> -return NULL;
> -}
> -
>  static void handle_keydown(SDL_Event *ev)
>  {
>  int win;
> @@ -605,7 +613,7 @@ static void handle_windowevent(SDL_Event *ev)
>  sdl2_redraw(scon);
>  break;
>  case SDL_WINDOWEVENT_FOCUS_GAINED:
> -win32_kbd_set_grab(gui_grab);
> +win32_kbd_set_grab(win32_kbd_grab && gui_grab);
>  if (qemu_console_is_graphic(scon->dcl.con)) {
>  win32_kbd_set_window(sdl2_win32_get_hwnd(scon));
>  }
> @@ -849,6 +857,7 @@ static void sdl2_display_init(DisplayState *ds, 
> DisplayOptions *o)
>  uint8_t data = 0;
>  int i;
>  SDL_SysWMinfo info;
> +SDL_version ver;
>  SDL_Surface *icon = NULL;
>  char *dir;
>
> @@ -866,10 +875,7 @@ static void sdl2_display_init(DisplayState *ds, 
> DisplayOptions *o)
>  #ifdef SDL_HINT_VIDEO_X11_NET_WM_BYPASS_COMPOSITOR /* only available since 
> SDL 2.0.8 */
>  SDL_SetHint(SDL_HINT_VIDEO_X11_NET_WM_BYPASS_COMPOSITOR, "0");
>  #endif
> -#ifndef CONFIG_WIN32
> -/* QEMU uses its own low level keyboard hook procedure on Windows */
>  SDL_SetHint(SDL_HINT_GRAB_KEYBOARD, "1");
> -#endif
>  #ifdef SDL_HINT_ALLOW_ALT_TAB_WHILE_GRABBED
>  SDL_SetHint(SDL_HINT_ALLOW_ALT_TAB_WHILE_GRABBED, "0");
>  #endif
> @@ -877,6 +883,17 @@ static void sdl2_display_init(DisplayState *ds, 
> DisplayOptions *o)
>  SDL_EnableScreenSaver();
>  memset(&info, 0, sizeof(info));
>  SDL_VERSION(&info.version);
> +/*
> + * Since version 2.16.0 under Windows, SDL2 has its own low level
> + * keyboard hook procedure to grab the keyboard. The remaining task of
> + * QEMU's low level keyboard hook procedure is to filter out the special
> + * left Control up/down key event for every Alt Gr key event on keyboards
> + * with an international layout.
> + */
> +SDL_GetVersion(&ver);
> +if (ver.major == 2 && ver.minor < 16) {
> +win32_kbd_grab = true;
> +}
>

Note: there is no 2.16 release. They jumped from 2.0.22 to 2.24 (see
https://github.com/libsdl-org/SDL/releases/tag/release-2.24.0)

The windows hook was indeed added in 2.0.16, released on Aug 10, 2021.

Re: [PATCH] block: support locking on change medium

2024-09-09 Thread Akihiko Odaki

On 2024/09/09 10:58, Joelle van Dyne wrote:

New optional argument for 'blockdev-change-medium' QAPI command to allow
the caller to specify if they wish to enable file locking.

Signed-off-by: Joelle van Dyne 
---
  qapi/block.json| 23 ++-
  block/monitor/block-hmp-cmds.c |  2 +-
  block/qapi-sysemu.c| 22 ++
  ui/cocoa.m |  1 +
  4 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/qapi/block.json b/qapi/block.json
index e6f5c6..35e8e2e191 100644
--- a/qapi/block.json
+++ b/qapi/block.json
@@ -309,6 +309,23 @@
  { 'enum': 'BlockdevChangeReadOnlyMode',
'data': ['retain', 'read-only', 'read-write'] }
  
+##

+# @BlockdevChangeFileLockingMode:
+#
+# Specifies the new locking mode of a file image passed to the
+# @blockdev-change-medium command.
+#
+# @auto: Use locking if API is available
+#
+# @off: Disable file image locking
+#
+# @on: Enable file image locking
+#
+# Since: 9.2
+##
+{ 'enum': 'BlockdevChangeFileLockingMode',
+  'data': ['auto', 'off', 'on'] }


You can use OnOffAuto type instead of defining your own.


+
  ##
  # @blockdev-change-medium:
  #
@@ -330,6 +347,9 @@
  # @read-only-mode: change the read-only mode of the device; defaults
  # to 'retain'
  #
+# @file-locking-mode: change the locking mode of the file image; defaults
+# to 'auto' (since: 9.2)
+#
  # @force: if false (the default), an eject request through
  # blockdev-open-tray will be sent to the guest if it has locked
  # the tray (and the tray will not be opened immediately); if true,
@@ -378,7 +398,8 @@
  'filename': 'str',
  '*format': 'str',
  '*force': 'bool',
-'*read-only-mode': 'BlockdevChangeReadOnlyMode' } }
+'*read-only-mode': 'BlockdevChangeReadOnlyMode',
+'*file-locking-mode': 'BlockdevChangeFileLockingMode' } }
  
  ##

  # @DEVICE_TRAY_MOVED:
diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index bdf2eb50b6..ff64020a80 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -1007,5 +1007,5 @@ void hmp_change_medium(Monitor *mon, const char *device, 
const char *target,
  }
  
  qmp_blockdev_change_medium(device, NULL, target, arg, true, force,

-   !!read_only, read_only_mode, errp);
+   !!read_only, read_only_mode, false, 0, errp);
  }
diff --git a/block/qapi-sysemu.c b/block/qapi-sysemu.c
index e4282631d2..8064bdfb3a 100644
--- a/block/qapi-sysemu.c
+++ b/block/qapi-sysemu.c
@@ -311,6 +311,8 @@ void qmp_blockdev_change_medium(const char *device,
  bool has_force, bool force,
  bool has_read_only,
  BlockdevChangeReadOnlyMode read_only,
+bool has_file_locking_mode,
+BlockdevChangeFileLockingMode file_locking_mode,
  Error **errp)
  {
  BlockBackend *blk;
@@ -362,6 +364,26 @@ void qmp_blockdev_change_medium(const char *device,
  qdict_put_str(options, "driver", format);
  }
  
+if (!has_file_locking_mode) {

+file_locking_mode = BLOCKDEV_CHANGE_FILE_LOCKING_MODE_AUTO;
+}
+
+switch (file_locking_mode) {
+case BLOCKDEV_CHANGE_FILE_LOCKING_MODE_AUTO:
+break;
+
+case BLOCKDEV_CHANGE_FILE_LOCKING_MODE_OFF:
+qdict_put_str(options, "file.locking", "off");
+break;
+
+case BLOCKDEV_CHANGE_FILE_LOCKING_MODE_ON:
+qdict_put_str(options, "file.locking", "on");
+break;
+
+default:
+abort();
+}
+
  medium_bs = bdrv_open(filename, NULL, options, bdrv_flags, errp);
  
  if (!medium_bs) {

diff --git a/ui/cocoa.m b/ui/cocoa.m
index 4c2dd33532..6e73c6e13e 100644
--- a/ui/cocoa.m
+++ b/ui/cocoa.m
@@ -1611,6 +1611,7 @@ - (void)changeDeviceMedia:(id)sender
 "raw",
 true, false,
 false, 0,
+   false, 0,


This change is irrelevant.

Regards,
Akihiko Odaki



Re: [RFC PATCH 2/8] usb/uhci: Introduce and use register defines

2024-09-09 Thread Cédric Le Goater

On 9/6/24 14:25, Guenter Roeck wrote:

Introduce defines for UHCI registers to simplify adding register access
in subsequent patches of the series.

No functional change.

Signed-off-by: Guenter Roeck 



Reviewed-by: Cédric Le Goater 

Thanks,

C.



---
  hw/usb/hcd-uhci.c  | 32 
  include/hw/usb/uhci-regs.h | 11 +++
  2 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/hw/usb/hcd-uhci.c b/hw/usb/hcd-uhci.c
index dfcc3e05c0..8bc163f688 100644
--- a/hw/usb/hcd-uhci.c
+++ b/hw/usb/hcd-uhci.c
@@ -389,7 +389,7 @@ static void uhci_port_write(void *opaque, hwaddr addr,
  trace_usb_uhci_mmio_writew(addr, val);
  
  switch (addr) {

-case 0x00:
+case UHCI_USBCMD:
  if ((val & UHCI_CMD_RS) && !(s->cmd & UHCI_CMD_RS)) {
  /* start frame processing */
  trace_usb_uhci_schedule_start();
@@ -424,7 +424,7 @@ static void uhci_port_write(void *opaque, hwaddr addr,
  }
  }
  break;
-case 0x02:
+case UHCI_USBSTS:
  s->status &= ~val;
  /*
   * XXX: the chip spec is not coherent, so we add a hidden
@@ -435,27 +435,27 @@ static void uhci_port_write(void *opaque, hwaddr addr,
  }
  uhci_update_irq(s);
  break;
-case 0x04:
+case UHCI_USBINTR:
  s->intr = val;
  uhci_update_irq(s);
  break;
-case 0x06:
+case UHCI_USBFRNUM:
  if (s->status & UHCI_STS_HCHALTED) {
  s->frnum = val & 0x7ff;
  }
  break;
-case 0x08:
+case UHCI_USBFLBASEADD:
  s->fl_base_addr &= 0xffff0000;
  s->fl_base_addr |= val & ~0xfff;
  break;
-case 0x0a:
+case UHCI_USBFLBASEADD + 2:
  s->fl_base_addr &= 0x0000ffff;
  s->fl_base_addr |= (val << 16);
  break;
-case 0x0c:
+case UHCI_USBSOF:
  s->sof_timing = val & 0xff;
  break;
-case 0x10 ... 0x1f:
+case UHCI_USBPORTSC1 ... UHCI_USBPORTSC4:
  {
  UHCIPort *port;
  USBDevice *dev;
@@ -493,28 +493,28 @@ static uint64_t uhci_port_read(void *opaque, hwaddr addr, 
unsigned size)
  uint32_t val;
  
  switch (addr) {

-case 0x00:
+case UHCI_USBCMD:
  val = s->cmd;
  break;
-case 0x02:
+case UHCI_USBSTS:
  val = s->status;
  break;
-case 0x04:
+case UHCI_USBINTR:
  val = s->intr;
  break;
-case 0x06:
+case UHCI_USBFRNUM:
  val = s->frnum;
  break;
-case 0x08:
+case UHCI_USBFLBASEADD:
  val = s->fl_base_addr & 0xffff;
  break;
-case 0x0a:
+case UHCI_USBFLBASEADD + 2:
  val = (s->fl_base_addr >> 16) & 0xffff;
  break;
-case 0x0c:
+case UHCI_USBSOF:
  val = s->sof_timing;
  break;
-case 0x10 ... 0x1f:
+case UHCI_USBPORTSC1 ... UHCI_USBPORTSC4:
  {
  UHCIPort *port;
  int n;
diff --git a/include/hw/usb/uhci-regs.h b/include/hw/usb/uhci-regs.h
index fd45d29db0..5b81714e5c 100644
--- a/include/hw/usb/uhci-regs.h
+++ b/include/hw/usb/uhci-regs.h
@@ -1,6 +1,17 @@
  #ifndef HW_USB_UHCI_REGS_H
  #define HW_USB_UHCI_REGS_H
  
+#define UHCI_USBCMD   0

+#define UHCI_USBSTS   2
+#define UHCI_USBINTR  4
+#define UHCI_USBFRNUM 6
+#define UHCI_USBFLBASEADD 8
+#define UHCI_USBSOF   0x0c
+#define UHCI_USBPORTSC1   0x10
+#define UHCI_USBPORTSC2   0x12
+#define UHCI_USBPORTSC3   0x14
+#define UHCI_USBPORTSC4   0x16
+
  #define UHCI_CMD_FGR  (1 << 4)
  #define UHCI_CMD_EGSM (1 << 3)
  #define UHCI_CMD_GRESET   (1 << 2)





Re: [RFC PATCH 1/8] usb/uhci: checkpatch cleanup

2024-09-09 Thread Cédric Le Goater

On 9/6/24 14:25, Guenter Roeck wrote:

Fix reported checkpatch issues to prepare for next patches
in the series.

No functional change.

Signed-off-by: Guenter Roeck 



Reviewed-by: Cédric Le Goater 

Thanks,

C.



---
  hw/usb/hcd-uhci.c | 90 +--
  1 file changed, 56 insertions(+), 34 deletions(-)

diff --git a/hw/usb/hcd-uhci.c b/hw/usb/hcd-uhci.c
index a03cf22e69..dfcc3e05c0 100644
--- a/hw/usb/hcd-uhci.c
+++ b/hw/usb/hcd-uhci.c
@@ -67,7 +67,7 @@ struct UHCIPCIDeviceClass {
  UHCIInfo   info;
  };
  
-/*

+/*
   * Pending async transaction.
   * 'packet' must be the first field because completion
   * handler does "(UHCIAsync *) pkt" cast.
@@ -220,8 +220,9 @@ static void uhci_async_cancel(UHCIAsync *async)
  uhci_async_unlink(async);
  trace_usb_uhci_packet_cancel(async->queue->token, async->td_addr,
   async->done);
-if (!async->done)
+if (!async->done) {
  usb_cancel_packet(&async->packet);
+}
  uhci_async_free(async);
  }
  
@@ -322,7 +323,7 @@ static void uhci_reset(DeviceState *dev)

  s->fl_base_addr = 0;
  s->sof_timing = 64;
  
-for(i = 0; i < UHCI_PORTS; i++) {

+for (i = 0; i < UHCI_PORTS; i++) {
  port = &s->ports[i];
  port->ctrl = 0x0080;
  if (port->port.dev && port->port.dev->attached) {
@@ -387,7 +388,7 @@ static void uhci_port_write(void *opaque, hwaddr addr,
  
  trace_usb_uhci_mmio_writew(addr, val);
  
-switch(addr) {

+switch (addr) {
  case 0x00:
  if ((val & UHCI_CMD_RS) && !(s->cmd & UHCI_CMD_RS)) {
  /* start frame processing */
@@ -404,7 +405,7 @@ static void uhci_port_write(void *opaque, hwaddr addr,
  int i;
  
  /* send reset on the USB bus */

-for(i = 0; i < UHCI_PORTS; i++) {
+for (i = 0; i < UHCI_PORTS; i++) {
  port = &s->ports[i];
  usb_device_reset(port->port.dev);
  }
@@ -425,10 +426,13 @@ static void uhci_port_write(void *opaque, hwaddr addr,
  break;
  case 0x02:
  s->status &= ~val;
-/* XXX: the chip spec is not coherent, so we add a hidden
-   register to distinguish between IOC and SPD */
-if (val & UHCI_STS_USBINT)
+/*
+ * XXX: the chip spec is not coherent, so we add a hidden
+ * register to distinguish between IOC and SPD
+ */
+if (val & UHCI_STS_USBINT) {
  s->status2 = 0;
+}
  uhci_update_irq(s);
  break;
  case 0x04:
@@ -436,8 +440,9 @@ static void uhci_port_write(void *opaque, hwaddr addr,
  uhci_update_irq(s);
  break;
  case 0x06:
-if (s->status & UHCI_STS_HCHALTED)
+if (s->status & UHCI_STS_HCHALTED) {
  s->frnum = val & 0x7ff;
+}
  break;
  case 0x08:
  s->fl_base_addr &= 0x;
@@ -464,8 +469,8 @@ static void uhci_port_write(void *opaque, hwaddr addr,
  dev = port->port.dev;
  if (dev && dev->attached) {
  /* port reset */
-if ( (val & UHCI_PORT_RESET) &&
- !(port->ctrl & UHCI_PORT_RESET) ) {
+if ((val & UHCI_PORT_RESET) &&
+ !(port->ctrl & UHCI_PORT_RESET)) {
  usb_device_reset(dev);
  }
  }
@@ -487,7 +492,7 @@ static uint64_t uhci_port_read(void *opaque, hwaddr addr, 
unsigned size)
  UHCIState *s = opaque;
  uint32_t val;
  
-switch(addr) {

+switch (addr) {
  case 0x00:
  val = s->cmd;
  break;
@@ -533,12 +538,13 @@ static uint64_t uhci_port_read(void *opaque, hwaddr addr, 
unsigned size)
  }
  
  /* signal resume if controller suspended */

-static void uhci_resume (void *opaque)
+static void uhci_resume(void *opaque)
  {
  UHCIState *s = (UHCIState *)opaque;
  
-if (!s)

+if (!s) {
  return;
+}
  
  if (s->cmd & UHCI_CMD_EGSM) {

  s->cmd |= UHCI_CMD_FGR;
@@ -674,7 +680,8 @@ static int uhci_handle_td_error(UHCIState *s, UHCI_TD *td, 
uint32_t td_addr,
  return ret;
  }
  
-static int uhci_complete_td(UHCIState *s, UHCI_TD *td, UHCIAsync *async, uint32_t *int_mask)

+static int uhci_complete_td(UHCIState *s, UHCI_TD *td, UHCIAsync *async,
+uint32_t *int_mask)
  {
  int len = 0, max_len;
  uint8_t pid;
@@ -682,8 +689,9 @@ static int uhci_complete_td(UHCIState *s, UHCI_TD *td, 
UHCIAsync *async, uint32_
  max_len = ((td->token >> 21) + 1) & 0x7ff;
  pid = td->token & 0xff;
  
-if (td->ctrl & TD_CTRL_IOS)

+if (td->ctrl & TD_CTRL_IOS) {
  td->ctrl &= ~TD_CTRL_ACTIVE;
+}
  
  if (async->packet.status != USB_RET_SUCCESS) {

  return uhci_handle_td_error(s, td, async->td_addr,
@@ -693,12 +701,15 @@ static int uhci_complete_td(UHCIState *s, UHCI_TD *td, 
UHCIAsync

Re: [RFC PATCH 8/8] aspeed: Add uhci support for ast2400 and ast2500

2024-09-09 Thread Cédric Le Goater

On 9/6/24 14:25, Guenter Roeck wrote:

Enable UHCI support for ast2400 and ast2500 SoCs. With this patch,
the UHCI port is successfully instantiated on the ast2500-evb machine.

Signed-off-by: Guenter Roeck 



Reviewed-by: Cédric Le Goater 

Thanks,

C.



---
  hw/arm/aspeed_ast2400.c | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/hw/arm/aspeed_ast2400.c b/hw/arm/aspeed_ast2400.c
index d125886207..93bfe3e3dd 100644
--- a/hw/arm/aspeed_ast2400.c
+++ b/hw/arm/aspeed_ast2400.c
@@ -31,6 +31,7 @@ static const hwaddr aspeed_soc_ast2400_memmap[] = {
  [ASPEED_DEV_FMC]= 0x1E62,
  [ASPEED_DEV_SPI1]   = 0x1E63,
  [ASPEED_DEV_EHCI1]  = 0x1E6A1000,
+[ASPEED_DEV_UHCI]   = 0x1E6B,
  [ASPEED_DEV_VIC]= 0x1E6C,
  [ASPEED_DEV_SDMC]   = 0x1E6E,
  [ASPEED_DEV_SCU]= 0x1E6E2000,
@@ -68,6 +69,7 @@ static const hwaddr aspeed_soc_ast2500_memmap[] = {
  [ASPEED_DEV_SPI2]   = 0x1E631000,
  [ASPEED_DEV_EHCI1]  = 0x1E6A1000,
  [ASPEED_DEV_EHCI2]  = 0x1E6A3000,
+[ASPEED_DEV_UHCI]   = 0x1E6B,
  [ASPEED_DEV_VIC]= 0x1E6C,
  [ASPEED_DEV_SDMC]   = 0x1E6E,
  [ASPEED_DEV_SCU]= 0x1E6E2000,
@@ -107,6 +109,7 @@ static const int aspeed_soc_ast2400_irqmap[] = {
  [ASPEED_DEV_FMC]= 19,
  [ASPEED_DEV_EHCI1]  = 5,
  [ASPEED_DEV_EHCI2]  = 13,
+[ASPEED_DEV_UHCI]   = 14,
  [ASPEED_DEV_SDMC]   = 0,
  [ASPEED_DEV_SCU]= 21,
  [ASPEED_DEV_ADC]= 31,
@@ -199,6 +202,8 @@ static void aspeed_ast2400_soc_init(Object *obj)
  TYPE_PLATFORM_EHCI);
  }
  
+object_initialize_child(obj, "uhci", &s->uhci, TYPE_ASPEED_UHCI);

+
  snprintf(typename, sizeof(typename), "aspeed.sdmc-%s", socname);
  object_initialize_child(obj, "sdmc", &s->sdmc, typename);
  object_property_add_alias(obj, "ram-size", OBJECT(&s->sdmc),
@@ -393,6 +398,15 @@ static void aspeed_ast2400_soc_realize(DeviceState *dev, 
Error **errp)
 aspeed_soc_get_irq(s, ASPEED_DEV_EHCI1 + i));
  }
  
+/* UHCI */

+if (!sysbus_realize(SYS_BUS_DEVICE(&s->uhci), errp)) {
+return;
+}
+aspeed_mmio_map(s, SYS_BUS_DEVICE(&s->uhci), 0,
+sc->memmap[ASPEED_DEV_UHCI]);
+sysbus_connect_irq(SYS_BUS_DEVICE(&s->uhci), 0,
+   aspeed_soc_get_irq(s, ASPEED_DEV_UHCI));
+
  /* SDMC - SDRAM Memory Controller */
  if (!sysbus_realize(SYS_BUS_DEVICE(&s->sdmc), errp)) {
  return;





Re: [RFC PATCH 7/8] aspeed: Add uhci support for ast2600

2024-09-09 Thread Cédric Le Goater

On 9/6/24 14:25, Guenter Roeck wrote:

Enable UHCI support for the ast2600 SoC. With this patch, the UHCI port
is successfully instantiated on the rainier-bmc and ast2600-evb machines.

Signed-off-by: Guenter Roeck 



Reviewed-by: Cédric Le Goater 

Thanks,

C.



---
  hw/arm/aspeed_ast2600.c | 13 +
  include/hw/arm/aspeed_soc.h |  3 +++
  2 files changed, 16 insertions(+)

diff --git a/hw/arm/aspeed_ast2600.c b/hw/arm/aspeed_ast2600.c
index be3eb70cdd..cd7e4ae6c9 100644
--- a/hw/arm/aspeed_ast2600.c
+++ b/hw/arm/aspeed_ast2600.c
@@ -33,6 +33,7 @@ static const hwaddr aspeed_soc_ast2600_memmap[] = {
  [ASPEED_DEV_SPI2]  = 0x1E631000,
  [ASPEED_DEV_EHCI1] = 0x1E6A1000,
  [ASPEED_DEV_EHCI2] = 0x1E6A3000,
+[ASPEED_DEV_UHCI]  = 0x1E6B,
  [ASPEED_DEV_MII1]  = 0x1E65,
  [ASPEED_DEV_MII2]  = 0x1E650008,
  [ASPEED_DEV_MII3]  = 0x1E650010,
@@ -110,6 +111,7 @@ static const int aspeed_soc_ast2600_irqmap[] = {
  [ASPEED_DEV_SDHCI] = 43,
  [ASPEED_DEV_EHCI1] = 5,
  [ASPEED_DEV_EHCI2] = 9,
+[ASPEED_DEV_UHCI]  = 10,
  [ASPEED_DEV_EMMC]  = 15,
  [ASPEED_DEV_GPIO]  = 40,
  [ASPEED_DEV_GPIO_1_8V] = 11,
@@ -206,6 +208,8 @@ static void aspeed_soc_ast2600_init(Object *obj)
  TYPE_PLATFORM_EHCI);
  }
  
+object_initialize_child(obj, "uhci", &s->uhci, TYPE_ASPEED_UHCI);

+
  snprintf(typename, sizeof(typename), "aspeed.sdmc-%s", socname);
  object_initialize_child(obj, "sdmc", &s->sdmc, typename);
  object_property_add_alias(obj, "ram-size", OBJECT(&s->sdmc),
@@ -481,6 +485,15 @@ static void aspeed_soc_ast2600_realize(DeviceState *dev, 
Error **errp)
 aspeed_soc_get_irq(s, ASPEED_DEV_EHCI1 + i));
  }
  
+/* UHCI */

+if (!sysbus_realize(SYS_BUS_DEVICE(&s->uhci), errp)) {
+return;
+}
+aspeed_mmio_map(s, SYS_BUS_DEVICE(&s->uhci), 0,
+sc->memmap[ASPEED_DEV_UHCI]);
+sysbus_connect_irq(SYS_BUS_DEVICE(&s->uhci), 0,
+   aspeed_soc_get_irq(s, ASPEED_DEV_UHCI));
+
  /* SDMC - SDRAM Memory Controller */
  if (!sysbus_realize(SYS_BUS_DEVICE(&s->sdmc), errp)) {
  return;
diff --git a/include/hw/arm/aspeed_soc.h b/include/hw/arm/aspeed_soc.h
index 624d489e0d..b54849db72 100644
--- a/include/hw/arm/aspeed_soc.h
+++ b/include/hw/arm/aspeed_soc.h
@@ -34,6 +34,7 @@
  #include "hw/gpio/aspeed_gpio.h"
  #include "hw/sd/aspeed_sdhci.h"
  #include "hw/usb/hcd-ehci.h"
+#include "hw/usb/hcd-uhci-sysbus.h"
  #include "qom/object.h"
  #include "hw/misc/aspeed_lpc.h"
  #include "hw/misc/unimp.h"
@@ -72,6 +73,7 @@ struct AspeedSoCState {
  AspeedSMCState fmc;
  AspeedSMCState spi[ASPEED_SPIS_NUM];
  EHCISysBusState ehci[ASPEED_EHCIS_NUM];
+ASPEEDUHCIState uhci;
  AspeedSBCState sbc;
  AspeedSLIState sli;
  AspeedSLIState sliio;
@@ -193,6 +195,7 @@ enum {
  ASPEED_DEV_SPI2,
  ASPEED_DEV_EHCI1,
  ASPEED_DEV_EHCI2,
+ASPEED_DEV_UHCI,
  ASPEED_DEV_VIC,
  ASPEED_DEV_INTC,
  ASPEED_DEV_SDMC,





Re: [PATCH 1/3] ui/sdl2: reenable the SDL2 Windows keyboard hook procedure

2024-09-09 Thread Stefan Weil via

Am 09.09.24 um 09:26 schrieb Marc-André Lureau:


Hi

On Mon, Sep 9, 2024 at 10:22 AM Volker Rümelin  wrote:

Windows only:

The libSDL2 Windows message loop needs the libSDL2 Windows low
level keyboard hook procedure to grab the left and right Windows
keys correctly. Reenable the SDL2 Windows keyboard hook procedure.

Because the QEMU Windows keyboard hook procedure is still needed
to filter out the special left Control key event for every Alt Gr
key event, it's important to install the two keyboard hook
procedures in the following order. First the SDL2 procedure, then
the QEMU procedure.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2139
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2323
Tested-by: Howard Spoelstra 
Signed-off-by: Volker Rümelin 
---
  ui/sdl2.c   | 53 ++---
  ui/win32-kbd-hook.c |  3 +++
  2 files changed, 38 insertions(+), 18 deletions(-)

diff --git a/ui/sdl2.c b/ui/sdl2.c
index 98ed974371..ac37c173a1 100644
--- a/ui/sdl2.c
+++ b/ui/sdl2.c

[...]

@@ -877,6 +883,17 @@ static void sdl2_display_init(DisplayState *ds, 
DisplayOptions *o)
  SDL_EnableScreenSaver();
  memset(&info, 0, sizeof(info));
  SDL_VERSION(&info.version);
+/*
+ * Since version 2.16.0 under Windows, SDL2 has its own low level
+ * keyboard hook procedure to grab the keyboard. The remaining task of
+ * QEMU's low level keyboard hook procedure is to filter out the special
+ * left Control up/down key event for every Alt Gr key event on keyboards
+ * with an international layout.
+ */
+SDL_GetVersion(&ver);
+if (ver.major == 2 && ver.minor < 16) {
+win32_kbd_grab = true;
+}


Note: there is no 2.16 release. They jumped from 2.0.22 to 2.24 (see
https://github.com/libsdl-org/SDL/releases/tag/release-2.24.0)

The windows hook was indeed added in 2.0.16, released on Aug 10, 2021.

Given the distribution nature of the Windows binaries, I think we
could simply depend on a much more recent version without worrying about
compatibility with < 2.0.16. This would help reduce the potential
combinations of versions and bug reports.


[...]

I agree. My builds for Windows typically use the very latest versions, 
for example mingw-w64-i686-SDL2-2.30.7-1-any for the next build. So 
depending on a recent SDL version would be fine for me.


Stefan W.




[PATCH v2] target/riscv32: Fix masking of physical address

2024-09-09 Thread Andrew Jones
C doesn't extend the sign bit for unsigned types since there isn't a
sign bit to extend. This means a promotion of a u32 to a u64 results
in the upper 32 bits of the u64 being zero. If that result is then
used as a mask on another u64 the upper 32 bits will be cleared. rv32
physical addresses may be up to 34 bits wide, so we don't want to
clear the high bits while page aligning the address. The fix is to
use hwaddr for the mask, which, even on rv32, is 64-bits wide.

Fixes: af3fc195e3c8 ("target/riscv: Change the TLB page size depends on PMP entries.")
Signed-off-by: Andrew Jones 
---
-v2: Switch from signed long to hwaddr

 target/riscv/cpu_helper.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 395a1d914061..4b2c72780c36 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -1323,7 +1323,7 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int 
size,
 int ret = TRANSLATE_FAIL;
 int mode = mmuidx_priv(mmu_idx);
 /* default TLB page size */
-target_ulong tlb_size = TARGET_PAGE_SIZE;
+hwaddr tlb_size = TARGET_PAGE_SIZE;
 
 env->guest_phys_fault_addr = 0;
 
@@ -1375,7 +1375,7 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int 
size,
 
 qemu_log_mask(CPU_LOG_MMU,
   "%s PMP address=" HWADDR_FMT_plx " ret %d prot"
-  " %d tlb_size " TARGET_FMT_lu "\n",
+  " %d tlb_size %" HWADDR_PRIu "\n",
   __func__, pa, ret, prot_pmp, tlb_size);
 
 prot &= prot_pmp;
@@ -1409,7 +1409,7 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int 
size,
 
 qemu_log_mask(CPU_LOG_MMU,
   "%s PMP address=" HWADDR_FMT_plx " ret %d prot"
-  " %d tlb_size " TARGET_FMT_lu "\n",
+  " %d tlb_size %" HWADDR_PRIu "\n",
   __func__, pa, ret, prot_pmp, tlb_size);
 
 prot &= prot_pmp;
-- 
2.46.0




Re: [PATCH 2/2] hw/riscv/virt: Introduce strict-dt

2024-09-09 Thread Andrew Jones
On Mon, Sep 09, 2024 at 12:41:24PM GMT, Alistair Francis wrote:
> On Mon, Aug 19, 2024 at 5:50 PM Andrew Jones  wrote:
> >
> > On Mon, Aug 19, 2024 at 11:19:18AM GMT, Alistair Francis wrote:
> > > On Sat, Aug 17, 2024 at 2:08 AM Andrew Jones  
> > > wrote:
> > > >
> > > > Older firmwares and OS kernels which use deprecated device tree
> > > > properties or are missing support for new properties may not be
> > > > tolerant of fully compliant device trees. When divergence to the
> > > > bindings specifications is harmless for new firmwares and OS kernels
> > > > which are compliant, then it's probably better to also continue
> > > > supporting the old firmwares and OS kernels by generating
> > > > non-compliant device trees. The '#msi-cells=<0>' property of the
> > > > imsic is one such property. Generating that property doesn't provide
> > > > anything necessary (no '#msi-cells' property or an '#msi-cells'
> > > > property with a value of zero mean the same thing) but it does
> > > > cause PCI devices to fail to find the MSI controller on Linux and,
> > > > for that reason, riscv virt doesn't currently generate it despite
> > > > that putting the DT out of compliance. For users that want a
> > > > compliant DT and know their software supports it, introduce a machine
> > > > property 'strict-dt' to do so. We also drop the one redundant
> > > > property that uses a deprecated name when strict-dt is enabled.
> > > >
> > > > Signed-off-by: Andrew Jones 
> > > > ---
> > > >  docs/system/riscv/virt.rst | 11 ++
> > > >  hw/riscv/virt.c| 43 ++
> > > >  include/hw/riscv/virt.h|  1 +
> > > >  3 files changed, 46 insertions(+), 9 deletions(-)
> > > >
> > > > diff --git a/docs/system/riscv/virt.rst b/docs/system/riscv/virt.rst
> > > > index 9a06f95a3444..f08d0a053051 100644
> > > > --- a/docs/system/riscv/virt.rst
> > > > +++ b/docs/system/riscv/virt.rst
> > > > @@ -116,6 +116,17 @@ The following machine-specific options are 
> > > > supported:
> > > >having AIA IMSIC (i.e. "aia=aplic-imsic" selected). When not 
> > > > specified,
> > > >the default number of per-HART VS-level AIA IMSIC pages is 0.
> > > >
> > > > +- strict-dt=[on|off]
> > >
> > > Hmm... I don't love the idea of having yet another command line option.
> > >
> > > Does this really buy us a lot? Eventually we should deprecate the
> > > invalid DT bindings anyway
> >
> > I agree we should deprecate the invalid DT usage, with the goal of only
> > generating DTs that make the validator happy. I'm not sure how long that
> > deprecation period should be, though. It may need to be a while since
> > we'll need to decide when we've waited long enough to no longer care
> > about older kernels. In the meantime, we won't be making the validator
> > happy and may get bug reports due to that. With strct-dt we can just
> > direct people in that direction. Also, I wouldn't be surprised if
> > something else like this comes along some day, which is why I tried to
> > make the option as generic as possible. Finally, the 'if (strict_dt)'
> > self-documents to some extent. Otherwise we'll need to add comments
> > around explaining why we're diverging from the specs. Although we should
> > probably do that anyway, i.e. I should have put a comment on the
> > 'if (strict-dt) then #msi-cells' explaining why it's under strict-dt.
> > If we want strict-dt, then I'll send a v2 doing that. If we don't want
> > strict-dt then I'll send a v2 with just a comment explaining why
> > #msi-cells was left out.
> 
> I think go without strict-dt and add a comment.
> 
> In the future if we decide we really want to keep the validator happy
> then we can version the virt machine and use the older machine for
> backwards compatible kernels

OK, I'll post a patch with a comment as soon as I have an upstream
Linux commit to reference for the fix. So far the fix is only in
linux-next.

Thanks,
drew



RE: [PATCH v4 0/2] intel_iommu minor fixes

2024-09-09 Thread Duan, Zhenzhong
Hi Michael,

Kindly ping; it seems this small series was missed.

Thanks
Zhenzhong


>-Original Message-
>From: Duan, Zhenzhong 
>Subject: [PATCH v4 0/2] intel_iommu minor fixes
>
>Hi
>
>Fixes two minor issues in intel iommu.
>See patch for details.
>
>Tested scalable mode and legacy mode with vfio device passthrough: PASS
>Tested intel-iommu.flat in kvm-unit-test: PASS
>
>Thanks
>Zhenzhong
>
>Changelog:
>v4:
>- Use 12 bytes commit id in fix tag (Liu Yi)
>
>v3:
>- add fix tag (Liu Yi)
>- collect R-B
>
>v2:
>- s/take/taking/ (Liu Yi)
>- add patch2 (Liu Yi)
>
>Zhenzhong Duan (2):
>  intel_iommu: Fix invalidation descriptor type field
>  intel_iommu: Make PASID-cache and PIOTLB type invalid in legacy mode
>
> hw/i386/intel_iommu_internal.h | 11 ++-
> hw/i386/intel_iommu.c  | 24 
> 2 files changed, 18 insertions(+), 17 deletions(-)
>
>--
>2.34.1



Re: [PATCH v2 15/17] vfio/migration: Multifd device state transfer support - receive side

2024-09-09 Thread Avihai Horon



On 27/08/2024 20:54, Maciej S. Szmigiero wrote:



From: "Maciej S. Szmigiero" 

The multifd received data needs to be reassembled since device state
packets sent via different multifd channels can arrive out-of-order.

Therefore, each VFIO device state packet carries a header indicating
its position in the stream.

The last such VFIO device state packet should have
VFIO_DEVICE_STATE_CONFIG_STATE flag set and carry the device config
state.

Since it's important to finish loading device state transferred via
the main migration channel (via save_live_iterate handler) before
starting loading the data asynchronously transferred via multifd
a new VFIO_MIG_FLAG_DEV_DATA_STATE_COMPLETE flag is introduced to
mark the end of the main migration channel data.

The device state loading process waits until that flag is seen before
commencing loading of the multifd-transferred device state.

Signed-off-by: Maciej S. Szmigiero 
---
  hw/vfio/migration.c   | 338 +-
  hw/vfio/pci.c |   2 +
  hw/vfio/trace-events  |   9 +-
  include/hw/vfio/vfio-common.h |  17 ++
  4 files changed, 362 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 24679d8c5034..57c1542528dc 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -15,6 +15,7 @@
  #include 
  #include 

+#include "io/channel-buffer.h"
  #include "sysemu/runstate.h"
  #include "hw/vfio/vfio-common.h"
  #include "migration/misc.h"
@@ -47,6 +48,7 @@
  #define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xef13ULL)
  #define VFIO_MIG_FLAG_DEV_DATA_STATE(0xef14ULL)
  #define VFIO_MIG_FLAG_DEV_INIT_DATA_SENT (0xef15ULL)
+#define VFIO_MIG_FLAG_DEV_DATA_STATE_COMPLETE(0xef16ULL)

  /*
   * This is an arbitrary size based on migration of mlx5 devices, where 
typically
@@ -55,6 +57,15 @@
   */
  #define VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE (1 * MiB)

+#define VFIO_DEVICE_STATE_CONFIG_STATE (1)
+
+typedef struct VFIODeviceStatePacket {
+uint32_t version;
+uint32_t idx;
+uint32_t flags;
+uint8_t data[0];
+} QEMU_PACKED VFIODeviceStatePacket;
+
  static int64_t bytes_transferred;

  static const char *mig_state_to_str(enum vfio_device_mig_state state)
@@ -254,6 +265,188 @@ static int vfio_load_buffer(QEMUFile *f, VFIODevice 
*vbasedev,
  return ret;
  }

+typedef struct LoadedBuffer {
+bool is_present;
+char *data;
+size_t len;
+} LoadedBuffer;


Maybe rename LoadedBuffer to a more specific name, like VFIOStateBuffer?

I also feel like LoadedBuffer deserves a separate commit.
Plus, I think it will be good to add a full API for this, that wraps the 
g_array_* calls and holds the extra members.
E.g, VFIOStateBuffer, VFIOStateArray (will hold load_buf_idx, 
load_buf_idx_last, etc.), vfio_state_array_destroy(), 
vfio_state_array_alloc(), vfio_state_array_get(), etc...

IMHO, this will make it clearer.


+
+static void loaded_buffer_clear(gpointer data)
+{
+LoadedBuffer *lb = data;
+
+if (!lb->is_present) {
+return;
+}
+
+g_clear_pointer(&lb->data, g_free);
+lb->is_present = false;
+}
+
+static int vfio_load_state_buffer(void *opaque, char *data, size_t data_size,
+  Error **errp)
+{
+VFIODevice *vbasedev = opaque;
+VFIOMigration *migration = vbasedev->migration;
+VFIODeviceStatePacket *packet = (VFIODeviceStatePacket *)data;
+QEMU_LOCK_GUARD(&migration->load_bufs_mutex);


Move lock to where it's needed? I.e., after 
trace_vfio_load_state_device_buffer_incoming(vbasedev->name, packet->idx)



+LoadedBuffer *lb;
+
+if (data_size < sizeof(*packet)) {
+error_setg(errp, "packet too short at %zu (min is %zu)",
+   data_size, sizeof(*packet));
+return -1;
+}
+
+if (packet->version != 0) {
+error_setg(errp, "packet has unknown version %" PRIu32,
+   packet->version);
+return -1;
+}
+
+if (packet->idx == UINT32_MAX) {
+error_setg(errp, "packet has too high idx %" PRIu32,
+   packet->idx);
+return -1;
+}
+
+trace_vfio_load_state_device_buffer_incoming(vbasedev->name, packet->idx);
+
+/* config state packet should be the last one in the stream */
+if (packet->flags & VFIO_DEVICE_STATE_CONFIG_STATE) {
+migration->load_buf_idx_last = packet->idx;
+}
+
+assert(migration->load_bufs);
+if (packet->idx >= migration->load_bufs->len) {
+g_array_set_size(migration->load_bufs, packet->idx + 1);
+}
+
+lb = &g_array_index(migration->load_bufs, typeof(*lb), packet->idx);
+if (lb->is_present) {
+error_setg(errp, "state buffer %" PRIu32 " already filled", 
packet->idx);
+return -1;
+}
+
+assert(packet->idx >= migration->load_buf_idx);
+
+migration->load_buf_queued_pending_buffers++;
+if (migrat

Re: [PATCH 1/1] hw/intc/riscv_aplic: Check and update pending when write sourcecfg

2024-09-09 Thread Yong-Xuan Wang
Hi Alistair,

On Mon, Sep 9, 2024 at 10:32 AM Alistair Francis  wrote:
>
> On Thu, Aug 8, 2024 at 6:21 PM Yong-Xuan Wang  
> wrote:
> >
> > The section 4.5.2 of the RISC-V AIA specification says that any write
> > to a sourcecfg register of an APLIC might (or might not) cause the
> > corresponding interrupt-pending bit to be set to one if the rectified
> > input value is high (= 1) under the new source mode.
> >
> > If an interrupt is asserted before the driver configures its interrupt
> > type in the APLIC, its pending bit will not be set except by a relevant
> > write to a setip or setipnum register. When we write the interrupt
> > type to the sourcecfg register, if the APLIC device doesn't check the
> > rectified input value and update the pending bit, this interrupt
> > might never become pending.
> >
> > For APLIC.m, we can manually set the pending bit via the setip or
> > setipnum registers in the driver. But for APLIC.w, the pending status
> > depends entirely on the rectified input value; we can't control the
> > pending status via MMIO registers. In this case, hardware should check
> > and update the pending status for us when writing sourcecfg registers.
> >
> > Update QEMU emulation to handle "pre-existing" interrupts.
> >
> > Signed-off-by: Yong-Xuan Wang 
>
> Acked-by: Alistair Francis 
>
> > ---
> >  hw/intc/riscv_aplic.c | 49 +++
> >  1 file changed, 31 insertions(+), 18 deletions(-)
> >
> > diff --git a/hw/intc/riscv_aplic.c b/hw/intc/riscv_aplic.c
> > index 32edd6d07bb3..2a9ac76ce92e 100644
> > --- a/hw/intc/riscv_aplic.c
> > +++ b/hw/intc/riscv_aplic.c
> > @@ -159,31 +159,41 @@ static bool is_kvm_aia(bool msimode)
> >  return kvm_irqchip_in_kernel() && msimode;
> >  }
> >
> > +static bool riscv_aplic_irq_rectified_val(RISCVAPLICState *aplic,
> > +  uint32_t irq)
> > +{
> > +uint32_t sourcecfg, sm, raw_input, irq_inverted;
> > +
> > +if (!irq || aplic->num_irqs <= irq) {
> > +return false;
> > +}
> > +
> > +sourcecfg = aplic->sourcecfg[irq];
> > +if (sourcecfg & APLIC_SOURCECFG_D) {
> > +return false;
> > +}
> > +
> > +sm = sourcecfg & APLIC_SOURCECFG_SM_MASK;
> > +if (sm == APLIC_SOURCECFG_SM_INACTIVE) {
> > +return false;
> > +}
> > +
> > +raw_input = (aplic->state[irq] & APLIC_ISTATE_INPUT) ? 1 : 0;
> > +irq_inverted = (sm == APLIC_SOURCECFG_SM_LEVEL_LOW ||
> > +sm == APLIC_SOURCECFG_SM_EDGE_FALL) ? 1 : 0;
> > +return !!(raw_input ^ irq_inverted);
> > +}
> > +
> >  static uint32_t riscv_aplic_read_input_word(RISCVAPLICState *aplic,
> >  uint32_t word)
> >  {
> > -uint32_t i, irq, sourcecfg, sm, raw_input, irq_inverted, ret = 0;
> > +uint32_t i, irq, rectified_val, ret = 0;
> >
> >  for (i = 0; i < 32; i++) {
> >  irq = word * 32 + i;
> > -if (!irq || aplic->num_irqs <= irq) {
> > -continue;
> > -}
> > -
> > -sourcecfg = aplic->sourcecfg[irq];
> > -if (sourcecfg & APLIC_SOURCECFG_D) {
> > -continue;
> > -}
> > -
> > -sm = sourcecfg & APLIC_SOURCECFG_SM_MASK;
> > -if (sm == APLIC_SOURCECFG_SM_INACTIVE) {
> > -continue;
> > -}
> >
> > -raw_input = (aplic->state[irq] & APLIC_ISTATE_INPUT) ? 1 : 0;
> > -irq_inverted = (sm == APLIC_SOURCECFG_SM_LEVEL_LOW ||
> > -sm == APLIC_SOURCECFG_SM_EDGE_FALL) ? 1 : 0;
> > -ret |= (raw_input ^ irq_inverted) << i;
> > +rectified_val = riscv_aplic_irq_rectified_val(aplic, irq);
> > +ret |= rectified_val << i;
> >  }
> >
> >  return ret;
> > @@ -702,6 +712,9 @@ static void riscv_aplic_write(void *opaque, hwaddr 
> > addr, uint64_t value,
> >  (aplic->sourcecfg[irq] == 0)) {
> >  riscv_aplic_set_pending_raw(aplic, irq, false);
> >  riscv_aplic_set_enabled_raw(aplic, irq, false);
> > +} else {
> > +if (riscv_aplic_irq_rectified_val(aplic, irq))
> > +riscv_aplic_set_pending_raw(aplic, irq, true);
>
> You need curly braces for the if statement. You can run checkpatch.pl
> to catch issues like this
>

Thank you! I will fix it.

Regards,
Yong-Xuan

> Alistair
>
> >  }
> >  } else if (aplic->mmode && aplic->msimode &&
> > (addr == APLIC_MMSICFGADDR)) {
> > --
> > 2.17.1
> >
> >



Re: qemu emulation for USB ports of Allwinner H3

2024-09-09 Thread Gerd Hoffmann
On Sun, Sep 08, 2024 at 11:36:18AM GMT, Guenter Roeck wrote:
> Hi,
> 
> the Allwinner H3 USB port qemu emulation creates separate USB ports
> for its EHCI and OHCI controllers, resulting in a total of 8 USB ports.
> From the orangepi-pc emulation:
> 
> # lsusb
> Bus 005 Device 001: ID 1d6b:0002
> Bus 003 Device 001: ID 1d6b:0002
> Bus 001 Device 001: ID 1d6b:0002
> Bus 008 Device 001: ID 1d6b:0002
> Bus 006 Device 001: ID 1d6b:0001
> Bus 004 Device 001: ID 1d6b:0001
> Bus 002 Device 001: ID 1d6b:0002
> Bus 009 Device 001: ID 1d6b:0001
> Bus 007 Device 001: ID 1d6b:0001
> 
> The SoC supports EHCI companion interfaces, and my understanding is that
> it only has four physical USB ports. Does the real hardware instantiate
> separate EHCI and OHCI interfaces (for a total of 8 USB ports), or does it
> use the companion functionality ?

Well, on the guest side you'll see 8 ports even when using the companion
functionality.  Each physical usb port has one ehci port (used when you
plug in usb 2.0+ devices) and one ohci port (used when you plug in usb
1.1 devices).

The main difference is on the qemu backend side.  When using the
companion functionality you have a single qemu usb bus accepting both
1.1 and 2.0+ devices.  When not using the companion functionality you
have one usb bus accepting 2.0+ devices and another usb bus accepting
usb 1.1 devices ...

The guest-visible difference is a per-port bit in the ehci registers which
controls whether ehci or the companion manages the device plugged in.
This bit exists for backward compatibility (guests without ehci driver
can manage all devices via ohci, with usb 2.0+ devices being downgraded
to 1.1 compatibility mode then).
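
For reference, a hypothetical wiring sketch with QEMU's generic controllers (the masterbus/firstport properties pair a companion controller with specific EHCI ports; the device names below are illustrative, not the Allwinner H3 ones):

```shell
# One logical 4-port USB bus: 1.1 devices handled by the OHCI
# companion, 2.0+ devices by EHCI (sketch only -- adapt to the SoC).
qemu-system-arm ... \
    -device usb-ehci,id=ehci \
    -device pci-ohci,masterbus=ehci.0,firstport=0,num-ports=4
```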

> If the real hardware only instantiates four USB ports (or, in other words,
> if it utilizes EHCI companion functionality), would it make sense to
> reflect that in qemu ?

Yes.

take care,
  Gerd




Re: [PATCH] tcg/i386: Implement vector TST{EQ,NE} for avx512

2024-09-09 Thread Philippe Mathieu-Daudé

On 8/9/24 20:51, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---

Based-on: <20240908022632.459477-1-richard.hender...@linaro.org>
("tcg: Improve support for cmpsel_vec")

---
  tcg/i386/tcg-target.h |  2 +-
  tcg/i386/tcg-target.c.inc | 31 ---
  2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 342be30c4c..c68ac023d8 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -224,7 +224,7 @@ typedef enum {
  #define TCG_TARGET_HAS_minmax_vec   1
  #define TCG_TARGET_HAS_bitsel_vec   have_avx512vl
  #define TCG_TARGET_HAS_cmpsel_vec   1
-#define TCG_TARGET_HAS_tst_vec  0
+#define TCG_TARGET_HAS_tst_vec  have_avx512bw
  
  #define TCG_TARGET_deposit_i32_valid(ofs, len) \

  (((ofs) == 0 && ((len) == 8 || (len) == 16)) || \
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 8c363b7bfc..afeaab313a 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -462,6 +462,14 @@ static bool tcg_target_const_match(int64_t val, int ct,
  #define OPC_VPSRLVD (0x45 | P_EXT38 | P_DATA16)
  #define OPC_VPSRLVQ (0x45 | P_EXT38 | P_DATA16 | P_VEXW)
  #define OPC_VPTERNLOGQ  (0x25 | P_EXT3A | P_DATA16 | P_VEXW | P_EVEX)
+#define OPC_VPTESTMB(0x26 | P_EXT38 | P_DATA16 | P_EVEX)
+#define OPC_VPTESTMW(0x26 | P_EXT38 | P_DATA16 | P_VEXW | P_EVEX)
+#define OPC_VPTESTMD(0x27 | P_EXT38 | P_DATA16 | P_EVEX)
+#define OPC_VPTESTMQ(0x27 | P_EXT38 | P_DATA16 | P_VEXW | P_EVEX)
+#define OPC_VPTESTNMB   (0x26 | P_EXT38 | P_SIMDF3 | P_EVEX)
+#define OPC_VPTESTNMW   (0x26 | P_EXT38 | P_SIMDF3 | P_VEXW | P_EVEX)
+#define OPC_VPTESTNMD   (0x27 | P_EXT38 | P_SIMDF3 | P_EVEX)
+#define OPC_VPTESTNMQ   (0x27 | P_EXT38 | P_SIMDF3 | P_VEXW | P_EVEX)
  #define OPC_VZEROUPPER  (0x77 | P_EXT)
  #define OPC_XCHG_ax_r32   (0x90)
  #define OPC_XCHG_EvGv   (0x87)
@@ -3145,6 +3153,13 @@ static void tcg_out_cmp_vec_k1(TCGContext *s, TCGType 
type, unsigned vece,
  { OPC_VPCMPB, OPC_VPCMPW, OPC_VPCMPD, OPC_VPCMPQ },
  { OPC_VPCMPUB, OPC_VPCMPUW, OPC_VPCMPUD, OPC_VPCMPUQ }
  };
+static const int testm_insn[4] = {
+OPC_VPTESTMB, OPC_VPTESTMW, OPC_VPTESTMD, OPC_VPTESTMQ
+};
+static const int testnm_insn[4] = {
+OPC_VPTESTMB, OPC_VPTESTMW, OPC_VPTESTMD, OPC_VPTESTMQ


OPC_VPTESTNMB, OPC_VPTESTNMW, OPC_VPTESTNMD, OPC_VPTESTNMQ ;)

Otherwise LGTM.


+};
+
  static const int cond_ext[16] = {
  [TCG_COND_EQ] = 0,
  [TCG_COND_NE] = 4,
@@ -3160,9 +3175,19 @@ static void tcg_out_cmp_vec_k1(TCGContext *s, TCGType 
type, unsigned vece,
  [TCG_COND_ALWAYS] = 7,
  };
  
-tcg_out_vex_modrm_type(s, cmpm_insn[is_unsigned_cond(cond)][vece],

-   /* k1 */ 1, v1, v2, type);
-tcg_out8(s, cond_ext[cond]);
+switch (cond) {
+case TCG_COND_TSTNE:
+tcg_out_vex_modrm_type(s, testm_insn[vece], /* k1 */ 1, v1, v2, type);
+break;
+case TCG_COND_TSTEQ:
+tcg_out_vex_modrm_type(s, testnm_insn[vece], /* k1 */ 1, v1, v2, type);
+break;
+default:
+tcg_out_vex_modrm_type(s, cmpm_insn[is_unsigned_cond(cond)][vece],
+   /* k1 */ 1, v1, v2, type);
+tcg_out8(s, cond_ext[cond]);
+break;
+}
  }
  
  static void tcg_out_k1_to_vec(TCGContext *s, TCGType type,





Re: [PATCH] Remove unnecessary code in the interface accel_system_init_ops_interfaces

2024-09-09 Thread Daniel Henrique Barboza




On 9/9/24 12:17 AM, Andrew.Yuan wrote:

The code 'ops = ACCEL_OPS_CLASS(module_object_class_by_name(ops_name));' is 
unnecessary;

And, the following code :
1.has the same functionality;
2.includes error checking;

Signed-off-by: Andrew.Yuan 
---
  accel/accel-system.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/accel/accel-system.c b/accel/accel-system.c
index f6c947dd82..5d502c8fd8 100644
--- a/accel/accel-system.c
+++ b/accel/accel-system.c
@@ -73,7 +73,7 @@ void accel_system_init_ops_interfaces(AccelClass *ac)
  g_assert(ac_name != NULL);
  
  ops_name = g_strdup_printf("%s" ACCEL_OPS_SUFFIX, ac_name);

-ops = ACCEL_OPS_CLASS(module_object_class_by_name(ops_name));
+


The code you're changing was added by 5141e9a23f ("accel: abort if we fail to
load the accelerator plugin") and I think this repetition is intended. If I have
to guess (first time looking at this code), ACCEL_OPS_CLASS() is creating the class
type QOM functions that the second module_object_class_by_name() relies on to
catch the module load error the commit is trying to address.

I'm CCing Claudio to get a better idea of the intention here. At the very least
we should add a code comment explaining the reasoning behind initializing 'ops'
two times in a row and so on.


Thanks,

Daniel


  oc = module_object_class_by_name(ops_name);
  if (!oc) {
  error_report("fatal: could not load module for type '%s'", ops_name);




Re: [PATCH] block: support locking on change medium

2024-09-09 Thread Kevin Wolf
On 09.09.2024 at 03:58, Joelle van Dyne wrote:
> New optional argument for 'blockdev-change-medium' QAPI command to allow
> the caller to specify if they wish to enable file locking.
> 
> Signed-off-by: Joelle van Dyne 

I feel once you need to control such details of the backend, you should
really use a separate 'blockdev-add' command.

If it feels a bit too cumbersome to send explicit commands to open the
tray, remove the medium, insert the new medium referencing the node you
added with 'blockdev-add' and then close the tray again, I can
understand. Maybe what we should do is extend 'blockdev-change-medium'
so that it doesn't only accept a filename to specify the new images, but
alternatively also a node-name.
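
For illustration, that explicit sequence might look like this over QMP (the "cd0" device id, node name, and filename here are hypothetical; note that 'blockdev-add' is also where per-node options such as file locking would naturally be specified):

```json
{ "execute": "blockdev-add",
  "arguments": { "node-name": "new-medium", "driver": "raw",
                 "file": { "driver": "file",
                           "filename": "/tmp/new.iso",
                           "locking": "off" } } }
{ "execute": "blockdev-open-tray", "arguments": { "id": "cd0" } }
{ "execute": "blockdev-remove-medium", "arguments": { "id": "cd0" } }
{ "execute": "blockdev-insert-medium",
  "arguments": { "id": "cd0", "node-name": "new-medium" } }
{ "execute": "blockdev-close-tray", "arguments": { "id": "cd0" } }
```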

> +switch (file_locking_mode) {
> +case BLOCKDEV_CHANGE_FILE_LOCKING_MODE_AUTO:
> +break;
> +
> +case BLOCKDEV_CHANGE_FILE_LOCKING_MODE_OFF:
> +qdict_put_str(options, "file.locking", "off");
> +break;
> +
> +case BLOCKDEV_CHANGE_FILE_LOCKING_MODE_ON:
> +qdict_put_str(options, "file.locking", "on");
> +break;
> +
> +default:
> +abort();
> +}

Using "file.locking" makes assumptions about what the passed filename
string would result in. There is nothing that guarantees that the block
driver even has a "file" child, or that the "file" child is referring
to a file-posix driver rather than using a different protocol or being a
filter driver above yet another node. It also doesn't consider backing
files and other non-primary children of the opened node.

So this is not correct, and I don't think there is any realistic way of
making it correct with this approach.

Kevin




Re: [PATCH v3 2/2] hw/char: sifive_uart: Print uart characters async

2024-09-09 Thread Daniel Henrique Barboza




On 9/8/24 11:13 PM, Alistair Francis wrote:

The current approach of using qemu_chr_fe_write() and ignoring the
return values results in dropped characters [1].

Let's update the SiFive UART to use an async sifive_uart_xmit() function
to transmit the characters and apply back pressure to the guest with
the SIFIVE_UART_TXFIFO_FULL status.

This should avoid dropped characters and more realistically model the
hardware.

1: https://gitlab.com/qemu-project/qemu/-/issues/2114

Signed-off-by: Alistair Francis 
Tested-by: Thomas Huth 


Reviewed-by: Daniel Henrique Barboza 


---
  include/hw/char/sifive_uart.h | 17 ++-
  hw/char/sifive_uart.c | 88 +--
  2 files changed, 99 insertions(+), 6 deletions(-)

diff --git a/include/hw/char/sifive_uart.h b/include/hw/char/sifive_uart.h
index 7f6c79f8bd..b43109bb8b 100644
--- a/include/hw/char/sifive_uart.h
+++ b/include/hw/char/sifive_uart.h
@@ -24,6 +24,7 @@
  #include "hw/qdev-properties.h"
  #include "hw/sysbus.h"
  #include "qom/object.h"
+#include "qemu/fifo8.h"
  
  enum {

  SIFIVE_UART_TXFIFO= 0,
@@ -48,9 +49,13 @@ enum {
  SIFIVE_UART_IP_RXWM   = 2  /* Receive watermark interrupt pending */
  };
  
+#define SIFIVE_UART_TXFIFO_FULL0x8000

+
  #define SIFIVE_UART_GET_TXCNT(txctrl)   ((txctrl >> 16) & 0x7)
  #define SIFIVE_UART_GET_RXCNT(rxctrl)   ((rxctrl >> 16) & 0x7)
+
  #define SIFIVE_UART_RX_FIFO_SIZE 8
+#define SIFIVE_UART_TX_FIFO_SIZE 8
  
  #define TYPE_SIFIVE_UART "riscv.sifive.uart"

  OBJECT_DECLARE_SIMPLE_TYPE(SiFiveUARTState, SIFIVE_UART)
@@ -63,13 +68,21 @@ struct SiFiveUARTState {
  qemu_irq irq;
  MemoryRegion mmio;
  CharBackend chr;
-uint8_t rx_fifo[SIFIVE_UART_RX_FIFO_SIZE];
-uint8_t rx_fifo_len;
+
+uint32_t txfifo;
  uint32_t ie;
  uint32_t ip;
  uint32_t txctrl;
  uint32_t rxctrl;
  uint32_t div;
+
+uint8_t rx_fifo[SIFIVE_UART_RX_FIFO_SIZE];
+uint8_t rx_fifo_len;
+
+Fifo8 tx_fifo;
+
+QEMUTimer *fifo_trigger_handle;
+uint64_t char_tx_time;
  };
  
  SiFiveUARTState *sifive_uart_create(MemoryRegion *address_space, hwaddr base,

diff --git a/hw/char/sifive_uart.c b/hw/char/sifive_uart.c
index 7fc6787f69..ab899b60d6 100644
--- a/hw/char/sifive_uart.c
+++ b/hw/char/sifive_uart.c
@@ -64,6 +64,72 @@ static void sifive_uart_update_irq(SiFiveUARTState *s)
  }
  }
  
+static gboolean sifive_uart_xmit(void *do_not_use, GIOCondition cond,

+ void *opaque)
+{
+SiFiveUARTState *s = opaque;
+int ret;
+const uint8_t *charecters;
+uint32_t numptr = 0;
+
+/* instant drain the fifo when there's no back-end */
+if (!qemu_chr_fe_backend_connected(&s->chr)) {
+fifo8_reset(&s->tx_fifo);
+return G_SOURCE_REMOVE;
+}
+
+if (fifo8_is_empty(&s->tx_fifo)) {
+return G_SOURCE_REMOVE;
+}
+
+/* Don't pop the FIFO in case the write fails */
+charecters = fifo8_peek_bufptr(&s->tx_fifo,
+   fifo8_num_used(&s->tx_fifo), &numptr);
+ret = qemu_chr_fe_write(&s->chr, charecters, numptr);
+
+if (ret >= 0) {
+/* We wrote the data, actually pop the fifo */
+fifo8_pop_bufptr(&s->tx_fifo, ret, NULL);
+}
+
+if (!fifo8_is_empty(&s->tx_fifo)) {
+guint r = qemu_chr_fe_add_watch(&s->chr, G_IO_OUT | G_IO_HUP,
+sifive_uart_xmit, s);
+if (!r) {
+fifo8_reset(&s->tx_fifo);
+return G_SOURCE_REMOVE;
+}
+}
+
+/* Clear the TX Full bit */
+if (!fifo8_is_full(&s->tx_fifo)) {
+s->txfifo &= ~SIFIVE_UART_TXFIFO_FULL;
+}
+
+sifive_uart_update_irq(s);
+return G_SOURCE_REMOVE;
+}
+
+static void sifive_uart_write_tx_fifo(SiFiveUARTState *s, const uint8_t *buf,
+  int size)
+{
+uint64_t current_time = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+
+if (size > fifo8_num_free(&s->tx_fifo)) {
+size = fifo8_num_free(&s->tx_fifo);
+qemu_log_mask(LOG_GUEST_ERROR, "sifive_uart: TX FIFO overflow");
+}
+
+fifo8_push_all(&s->tx_fifo, buf, size);
+
+if (fifo8_is_full(&s->tx_fifo)) {
+s->txfifo |= SIFIVE_UART_TXFIFO_FULL;
+}
+
+timer_mod(s->fifo_trigger_handle, current_time +
+  (s->char_tx_time * 4));
+}
+
  static uint64_t
  sifive_uart_read(void *opaque, hwaddr addr, unsigned int size)
  {
@@ -82,7 +148,7 @@ sifive_uart_read(void *opaque, hwaddr addr, unsigned int 
size)
  return 0x8000;
  
  case SIFIVE_UART_TXFIFO:

-return 0; /* Should check tx fifo */
+return s->txfifo;
  case SIFIVE_UART_IE:
  return s->ie;
  case SIFIVE_UART_IP:
@@ -106,12 +172,10 @@ sifive_uart_write(void *opaque, hwaddr addr,
  {
  SiFiveUARTState *s = opaque;
  uint32_t value = val64;
-unsigned char ch = value;
  
  switch (addr) {

  case SIFIVE_UART_TXFIFO:
-qe

Re: [PATCH v7 0/6] plugins: access values during a memory read/write

2024-09-09 Thread Alex Bennée
Pierrick Bouvier  writes:

> On 9/5/24 08:21, Alex Bennée wrote:
>> Pierrick Bouvier  writes:
>> 
>>> This series allows plugins to know which value is read/written during a 
>>> memory
>>> access.
>>>
>>> For every memory access, we know copy this value before calling mem 
>>> callbacks,
>>> and those can query it using new API function:
>>> - qemu_plugin_mem_get_value
>> Queued to patches 1-5 to plugins/next, thanks.
>> You can send the re-spun version of 6 once the review comments have
>> been
>> done.
>> 
>
> Thanks Alex,
>
> right now, my attempts to run make check-tcg are blocked by the cross
> containers that don't compile, so I'll wait for this to be resolved.

Which ones?

> I still wonder if having a simple aarch64/x64 test is not enough, and
> covering 99.9% of the bugs we could introduce in the future on this.

Have you measured the code coverage of the test?

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [RFC PATCH 1/2] hw/arm/sbsa-ref: Enable CXL Host Bridge by pxb-cxl

2024-09-09 Thread Marcin Juszkiewicz

On 30.08.2024 06:15, Yuquan Wang wrote:

The memory layout places 1M space for 16 host bridge register regions
in the sbsa-ref memmap. In addition, this creates a default pxb-cxl
(bus_nr=0xfe) bridge with one cxl-rp on sbsa-ref.


With this patchset applied I can no longer add PCIe devices to sbsa-ref.

-device nvme,serial=deadbeef,bus=root_port_for_nvme1,drive=hdd
-drive file=disks/full-debian.hddimg,format=raw,id=hdd,if=none

Normally this adds the NVMe as a PCIe device, but now it probably ends up
on the pxb-cxl bus instead.


Also please bump platform_version.minor and document the CXL addition in
the docs/system/arm/sbsa.rst file.




Re: [PATCH] Remove unnecessary code in the interface accel_system_init_ops_interfaces

2024-09-09 Thread Claudio Fontana
On 9/9/24 11:54, Daniel Henrique Barboza wrote:
> 
> 
> On 9/9/24 12:17 AM, Andrew.Yuan wrote:
>> The code 'ops = ACCEL_OPS_CLASS(module_object_class_by_name(ops_name));' is 
>> unnecessary;
>>
>> And, the following code :
>> 1.has the same functionality;
>> 2.includes error checking;
>>
>> Signed-off-by: Andrew.Yuan 
>> ---
>>   accel/accel-system.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/accel/accel-system.c b/accel/accel-system.c
>> index f6c947dd82..5d502c8fd8 100644
>> --- a/accel/accel-system.c
>> +++ b/accel/accel-system.c
>> @@ -73,7 +73,7 @@ void accel_system_init_ops_interfaces(AccelClass *ac)
>>   g_assert(ac_name != NULL);
>>   
>>   ops_name = g_strdup_printf("%s" ACCEL_OPS_SUFFIX, ac_name);
>> -ops = ACCEL_OPS_CLASS(module_object_class_by_name(ops_name));
>> +
> 
> The code you're changing was added by 5141e9a23f ("accel: abort if we fail to
> load the accelerator plugin") and I think this repetition is intended. If I 
> have
> to guess (first time looking at this code), ACCEL_OPS_CLASS() is creating the class
> type QOM functions that the second module_object_class_by_name() relies on to
> catch the module load error the commit is trying to address.
> 
> I'm CCing Claudio to get a better idea of the intention here. At the very 
> least we
> should add a code comment explaining the reasoning behind initing 'ops' two 
> times
> in a row and so on.
> 
> 
> Thanks,
> 
> Daniel

Hi Daniel, just to signal that I've seen this message and will get to it when I 
am back to work later this week.

Ciao,

Claudio

> 
>>   oc = module_object_class_by_name(ops_name);
>>   if (!oc) {
>>   error_report("fatal: could not load module for type '%s'", 
>> ops_name);




Re: [PATCH v3 2/2] hw/char: sifive_uart: Print uart characters async

2024-09-09 Thread Philippe Mathieu-Daudé

Hi Alistair,

On 9/9/24 04:13, Alistair Francis wrote:

The current approach of using qemu_chr_fe_write() and ignoring the
return values results in dropped characters [1].

Let's update the SiFive UART to use an async sifive_uart_xmit() function
to transmit the characters and apply back pressure to the guest with
the SIFIVE_UART_TXFIFO_FULL status.

This should avoid dropped characters and more realistically model the
hardware.

1: https://gitlab.com/qemu-project/qemu/-/issues/2114

Signed-off-by: Alistair Francis 
Tested-by: Thomas Huth 
---
  include/hw/char/sifive_uart.h | 17 ++-
  hw/char/sifive_uart.c | 88 +--
  2 files changed, 99 insertions(+), 6 deletions(-)

diff --git a/include/hw/char/sifive_uart.h b/include/hw/char/sifive_uart.h
index 7f6c79f8bd..b43109bb8b 100644
--- a/include/hw/char/sifive_uart.h
+++ b/include/hw/char/sifive_uart.h
@@ -24,6 +24,7 @@
  #include "hw/qdev-properties.h"
  #include "hw/sysbus.h"
  #include "qom/object.h"
+#include "qemu/fifo8.h"
  
  enum {

  SIFIVE_UART_TXFIFO= 0,
@@ -48,9 +49,13 @@ enum {
  SIFIVE_UART_IP_RXWM   = 2  /* Receive watermark interrupt pending */
  };
  
+#define SIFIVE_UART_TXFIFO_FULL0x8000

+
  #define SIFIVE_UART_GET_TXCNT(txctrl)   ((txctrl >> 16) & 0x7)
  #define SIFIVE_UART_GET_RXCNT(rxctrl)   ((rxctrl >> 16) & 0x7)
+
  #define SIFIVE_UART_RX_FIFO_SIZE 8
+#define SIFIVE_UART_TX_FIFO_SIZE 8
  
  #define TYPE_SIFIVE_UART "riscv.sifive.uart"

  OBJECT_DECLARE_SIMPLE_TYPE(SiFiveUARTState, SIFIVE_UART)
@@ -63,13 +68,21 @@ struct SiFiveUARTState {
  qemu_irq irq;
  MemoryRegion mmio;
  CharBackend chr;
-uint8_t rx_fifo[SIFIVE_UART_RX_FIFO_SIZE];
-uint8_t rx_fifo_len;
+
+uint32_t txfifo;
  uint32_t ie;
  uint32_t ip;
  uint32_t txctrl;
  uint32_t rxctrl;
  uint32_t div;
+
+uint8_t rx_fifo[SIFIVE_UART_RX_FIFO_SIZE];
+uint8_t rx_fifo_len;
+
+Fifo8 tx_fifo;
+
+QEMUTimer *fifo_trigger_handle;
+uint64_t char_tx_time;


Unfortunately some fields now need to be migrated and
tracked in vmstate_sifive_uart[].


  };
  
  SiFiveUARTState *sifive_uart_create(MemoryRegion *address_space, hwaddr base,

diff --git a/hw/char/sifive_uart.c b/hw/char/sifive_uart.c
index 7fc6787f69..ab899b60d6 100644
--- a/hw/char/sifive_uart.c
+++ b/hw/char/sifive_uart.c
@@ -64,6 +64,72 @@ static void sifive_uart_update_irq(SiFiveUARTState *s)
  }
  }
  
+static gboolean sifive_uart_xmit(void *do_not_use, GIOCondition cond,

+ void *opaque)
+{
+SiFiveUARTState *s = opaque;
+int ret;
+const uint8_t *charecters;


"characters" ;)


+uint32_t numptr = 0;
+
+/* instant drain the fifo when there's no back-end */
+if (!qemu_chr_fe_backend_connected(&s->chr)) {
+fifo8_reset(&s->tx_fifo);
+return G_SOURCE_REMOVE;
+}
+
+if (fifo8_is_empty(&s->tx_fifo)) {
+return G_SOURCE_REMOVE;
+}
+
+/* Don't pop the FIFO in case the write fails */
+charecters = fifo8_peek_bufptr(&s->tx_fifo,
+   fifo8_num_used(&s->tx_fifo), &numptr);
+ret = qemu_chr_fe_write(&s->chr, charecters, numptr);
+
+if (ret >= 0) {
+/* We wrote the data, actually pop the fifo */
+fifo8_pop_bufptr(&s->tx_fifo, ret, NULL);
+}
+
+if (!fifo8_is_empty(&s->tx_fifo)) {
+guint r = qemu_chr_fe_add_watch(&s->chr, G_IO_OUT | G_IO_HUP,
+sifive_uart_xmit, s);
+if (!r) {
+fifo8_reset(&s->tx_fifo);
+return G_SOURCE_REMOVE;
+}
+}
+
+/* Clear the TX Full bit */
+if (!fifo8_is_full(&s->tx_fifo)) {
+s->txfifo &= ~SIFIVE_UART_TXFIFO_FULL;
+}
+
+sifive_uart_update_irq(s);
+return G_SOURCE_REMOVE;


Alex suggested to see if we can have a generic (abstract?) FIFO char
implementation. I might have a look later when I get the PL011 series
in.


+}
+
+static void sifive_uart_write_tx_fifo(SiFiveUARTState *s, const uint8_t *buf,
+  int size)
+{
+uint64_t current_time = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+
+if (size > fifo8_num_free(&s->tx_fifo)) {
+size = fifo8_num_free(&s->tx_fifo);
+qemu_log_mask(LOG_GUEST_ERROR, "sifive_uart: TX FIFO overflow");
+}
+
+fifo8_push_all(&s->tx_fifo, buf, size);
+
+if (fifo8_is_full(&s->tx_fifo)) {
+s->txfifo |= SIFIVE_UART_TXFIFO_FULL;
+}
+
+timer_mod(s->fifo_trigger_handle, current_time +
+  (s->char_tx_time * 4));
+}
+
  static uint64_t
  sifive_uart_read(void *opaque, hwaddr addr, unsigned int size)
  {
@@ -82,7 +148,7 @@ sifive_uart_read(void *opaque, hwaddr addr, unsigned int 
size)
  return 0x8000;
  
  case SIFIVE_UART_TXFIFO:

-return 0; /* Should check tx fifo */
+return s->txfifo;
  case SIFIVE_UART_IE:
  return s->ie;
  case SIFIVE_UART_IP:
@@ -106,12

+}
+
+sifive_uart_update_irq(s);
+return G_SOURCE_REMOVE;


Alex suggested to see if we can have a generic (abstract?) FIFO char
implementation. I might have a look later when I get the PL011 series
in.


+}
+
+static void sifive_uart_write_tx_fifo(SiFiveUARTState *s, const uint8_t *buf,
+  int size)
+{
+uint64_t current_time = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+
+if (size > fifo8_num_free(&s->tx_fifo)) {
+size = fifo8_num_free(&s->tx_fifo);
+qemu_log_mask(LOG_GUEST_ERROR, "sifive_uart: TX FIFO overflow");
+}
+
+fifo8_push_all(&s->tx_fifo, buf, size);
+
+if (fifo8_is_full(&s->tx_fifo)) {
+s->txfifo |= SIFIVE_UART_TXFIFO_FULL;
+}
+
+timer_mod(s->fifo_trigger_handle, current_time +
+  (s->char_tx_time * 4));
+}
+
  static uint64_t
  sifive_uart_read(void *opaque, hwaddr addr, unsigned int size)
  {
@@ -82,7 +148,7 @@ sifive_uart_read(void *opaque, hwaddr addr, unsigned int size)
  return 0x8000;
  
  case SIFIVE_UART_TXFIFO:

-return 0; /* Should check tx fifo */
+return s->txfifo;
  case SIFIVE_UART_IE:
  return s->ie;
  case SIFIVE_UART_IP:
@@ -106,12

Re: [PULL 27/34] migration/multifd: Move nocomp code into multifd-nocomp.c

2024-09-09 Thread Peter Maydell
On Wed, 4 Sept 2024 at 13:48, Fabiano Rosas  wrote:
>
> In preparation for adding new payload types to multifd, move most of
> the no-compression code into multifd-nocomp.c. Let's try to keep a
> semblance of layering by not mixing general multifd control flow with
> the details of transmitting pages of ram.
>
> There are still some pieces leftover, namely the p->normal, p->zero,
> etc variables that we use for zero page tracking and the packet
> allocation which is heavily dependent on the ram code.
>
> Reviewed-by: Peter Xu 
> Signed-off-by: Fabiano Rosas 

I know Coverity has only flagged this up because the
code has moved, but it seems like a good place to ask
the question:

> +void multifd_ram_fill_packet(MultiFDSendParams *p)
> +{
> +MultiFDPacket_t *packet = p->packet;
> +MultiFDPages_t *pages = &p->data->u.ram;
> +uint32_t zero_num = pages->num - pages->normal_num;
> +
> +packet->pages_alloc = cpu_to_be32(multifd_ram_page_count());
> +packet->normal_pages = cpu_to_be32(pages->normal_num);
> +packet->zero_pages = cpu_to_be32(zero_num);
> +
> +if (pages->block) {
> +strncpy(packet->ramblock, pages->block->idstr, 256);

Coverity points out that when we fill in the RAMBlock::idstr
here, if packet->ramblock is not NUL terminated then we won't
NUL-terminate idstr either (CID 1560071).

Is this really what is intended?

Perhaps
 pstrncpy(packet->ramblock, sizeof(packet->ramblock),
  pages->block->idstr);

would be better?

(pstrncpy will always NUL-terminate, and won't pointlessly
zero-fill the space after the string in the destination.)

> +}
> +
> +for (int i = 0; i < pages->num; i++) {
> +/* there are architectures where ram_addr_t is 32 bit */
> +uint64_t temp = pages->offset[i];
> +
> +packet->offset[i] = cpu_to_be64(temp);
> +}
> +
> +trace_multifd_send_ram_fill(p->id, pages->normal_num,
> +zero_num);
> +}

thanks
-- PMM



Re: [PULL 27/34] migration/multifd: Move nocomp code into multifd-nocomp.c

2024-09-09 Thread Peter Maydell
On Mon, 9 Sept 2024 at 11:28, Peter Maydell  wrote:
>
> On Wed, 4 Sept 2024 at 13:48, Fabiano Rosas  wrote:
> >
> > In preparation for adding new payload types to multifd, move most of
> > the no-compression code into multifd-nocomp.c. Let's try to keep a
> > semblance of layering by not mixing general multifd control flow with
> > the details of transmitting pages of ram.
> >
> > There are still some pieces leftover, namely the p->normal, p->zero,
> > etc variables that we use for zero page tracking and the packet
> > allocation which is heavily dependent on the ram code.
> >
> > Reviewed-by: Peter Xu 
> > Signed-off-by: Fabiano Rosas 
>
> I know Coverity has only flagged this up because the
> code has moved, but it seems like a good place to ask
> the question:
>
> > +void multifd_ram_fill_packet(MultiFDSendParams *p)
> > +{
> > +MultiFDPacket_t *packet = p->packet;
> > +MultiFDPages_t *pages = &p->data->u.ram;
> > +uint32_t zero_num = pages->num - pages->normal_num;
> > +
> > +packet->pages_alloc = cpu_to_be32(multifd_ram_page_count());
> > +packet->normal_pages = cpu_to_be32(pages->normal_num);
> > +packet->zero_pages = cpu_to_be32(zero_num);
> > +
> > +if (pages->block) {
> > +strncpy(packet->ramblock, pages->block->idstr, 256);
>
> Coverity points out that when we fill in the RAMBlock::idstr
> here, if packet->ramblock is not NUL terminated then we won't
> NUL-terminate idstr either (CID 1560071).
>
> Is this really what is intended?
>
> Perhaps
>  pstrncpy(packet->ramblock, sizeof(packet->ramblock),
>   pages->block->idstr);
>
> would be better?
>
> (pstrncpy will always NUL-terminate, and won't pointlessly
> zero-fill the space after the string in the destination.)

Whoops, the name of the function I'm trying to recommend
is "pstrcpy", not "pstrncpy" :-)

-- PMM



Re: [PATCH] virtio-9p: remove virtfs-proxy-helper

2024-09-09 Thread Greg Kurz
On Thu,  5 Sep 2024 10:22:59 +0200
Paolo Bonzini  wrote:

> It has been deprecated since 8.1; remove it and suggest using permission 
> mapping
> or virtiofsd.
> 
> Signed-off-by: Paolo Bonzini 
> ---

Thanks Paolo !

Acked-by: Greg Kurz 

>  MAINTAINERS|8 -
>  docs/about/deprecated.rst  |   23 -
>  docs/about/removed-features.rst|   14 +
>  docs/conf.py   |3 -
>  docs/meson.build   |1 -
>  docs/tools/index.rst   |1 -
>  docs/tools/virtfs-proxy-helper.rst |   75 --
>  meson.build|8 -
>  fsdev/qemu-fsdev.h |1 -
>  fsdev/qemu-fsdev.c |   19 -
>  fsdev/virtfs-proxy-helper.c| 1193 --
>  hw/9pfs/9p-proxy.c | 1279 
>  fsdev/meson.build  |8 -
>  hw/9pfs/meson.build|1 -
>  meson_options.txt  |2 -
>  qemu-options.hx|   46 -
>  scripts/meson-buildoptions.|0
>  scripts/meson-buildoptions.sh  |4 -
>  18 files changed, 14 insertions(+), 2672 deletions(-)
>  delete mode 100644 docs/tools/virtfs-proxy-helper.rst
>  delete mode 100644 fsdev/virtfs-proxy-helper.c
>  delete mode 100644 hw/9pfs/9p-proxy.c
>  create mode 100644 scripts/meson-buildoptions.
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 3584d6a6c6d..13e73987060 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2256,20 +2256,12 @@ S: Maintained
>  W: https://wiki.qemu.org/Documentation/9p
>  F: hw/9pfs/
>  X: hw/9pfs/xen-9p*
> -X: hw/9pfs/9p-proxy*
>  F: fsdev/
> -X: fsdev/virtfs-proxy-helper.c
>  F: tests/qtest/virtio-9p-test.c
>  F: tests/qtest/libqos/virtio-9p*
>  T: git https://gitlab.com/gkurz/qemu.git 9p-next
>  T: git https://github.com/cschoenebeck/qemu.git 9p.next
>  
> -virtio-9p-proxy
> -F: hw/9pfs/9p-proxy*
> -F: fsdev/virtfs-proxy-helper.c
> -F: docs/tools/virtfs-proxy-helper.rst
> -S: Obsolete
> -
>  virtio-blk
>  M: Stefan Hajnoczi 
>  L: qemu-bl...@nongnu.org
> diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
> index be62fa06c29..d45dc4fe62f 100644
> --- a/docs/about/deprecated.rst
> +++ b/docs/about/deprecated.rst
> @@ -316,29 +316,6 @@ the addition of volatile memory support, it is now 
> necessary to distinguish
>  between persistent and volatile memory backends.  As such, memdev is 
> deprecated
>  in favor of persistent-memdev.
>  
> -``-fsdev proxy`` and ``-virtfs proxy`` (since 8.1)
> -^^
> -
> -The 9p ``proxy`` filesystem backend driver has been deprecated and will be
> -removed (along with its proxy helper daemon) in a future version of QEMU. 
> Please
> -use ``-fsdev local`` or ``-virtfs local`` for using the 9p ``local`` 
> filesystem
> -backend, or alternatively consider deploying virtiofsd instead.
> -
> -The 9p ``proxy`` backend was originally developed as an alternative to the 9p
> -``local`` backend. The idea was to enhance security by dispatching actual low
> -level filesystem operations from 9p server (QEMU process) over to a separate
> -process (the virtfs-proxy-helper binary). However this alternative never 
> gained
> -momentum. The proxy backend is much slower than the local backend, hasn't 
> seen
> -any development in years, and showed to be less secure, especially due to the
> -fact that its helper daemon must be run as root, whereas with the local 
> backend
> -QEMU is typically run as unprivileged user and allows to tighten behaviour by
> -mapping permissions et al by using its 'mapped' security model option.
> -
> -Nowadays it would make sense to reimplement the ``proxy`` backend by using
> -QEMU's ``vhost`` feature, which would eliminate the high latency costs under
> -which the 9p ``proxy`` backend currently suffers. However as of to date 
> nobody
> -has indicated plans for such kind of reimplementation unfortunately.
> -
>  RISC-V 'any' CPU type ``-cpu any`` (since 8.2)
>  ^^
>  
> diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst
> index 5ae730d02ae..41d3affabfc 100644
> --- a/docs/about/removed-features.rst
> +++ b/docs/about/removed-features.rst
> @@ -517,6 +517,20 @@ The virtio-blk SCSI passthrough feature is a legacy 
> VIRTIO feature.  VIRTIO 1.0
>  and later do not support it because the virtio-scsi device was introduced for
>  full SCSI support.  Use virtio-scsi instead when SCSI passthrough is 
> required.
>  
> +``-fsdev proxy`` and ``-virtfs proxy`` (since 9.2)
> +^^
> +
> +The 9p ``proxy`` filesystem backend driver was originally developed to
> +enhance security by dispatching low level filesystem operations from 9p
> +server (QEMU process) over to a separate process (the virtfs-proxy-helper
> +binary). However the proxy backend was much slower than the local backend,
> +didn't 

Re: [PATCH v2] aspeed: Deprecate the tacoma-bmc machine

2024-09-09 Thread Joel Stanley
On Sat, 31 Aug 2024 at 05:41, Guenter Roeck  wrote:
>
> On Fri, Aug 30, 2024 at 10:09:25AM +0200, Cédric Le Goater wrote:
> > Hello,
> >
> >
> > > > > I solved the problem by adding support for IBM Bonnell (which 
> > > > > instantiates
> > > > > the TPM chip through its devicetree file, similar to tacoma-bmc) to 
> > > > > my local
> > > > > copy of qemu.
> > > >
> > > > Hmm, did you copy the rainier-bmc machine definition ?
> > > >
> > > For aspeed_machine_bonnell_class_init(), pretty much yes, since I don't 
> > > know
> > > the actual hardware. For I2C initialization I used the devicetree file.
> > > You can find the patch in the master-local or v9.1.0-local branches
> > > of my qemu clone at https://github.com/groeck/qemu if you are interested.
> >
> > Oh nice ! Let's merge the IBM Bonnell machine. We can ask IBM to help fixing
> > the definitions (strapping). Enabling the PCA9554 is good to have too.

Instead of adding Bonnell to qemu, could we use the Rainier machine? I
know the kernel device tree removed the i2c tpm, but there's no harm
in it being present in the qemu machine.

The bonnell device tree should boot fine on the rainier machine for
your purposes.

Cheers,

Joel



[PATCH v11 00/10] Support persistent reservation operations

2024-09-09 Thread Changqi Lu
Hi,

Patch v11 has been modified, thanks to Klaus for the code review.

v10->v11:
- Before executing the pr operation, check whether it is supported.
  If it is not supported, return NVME_INVALID_OPCODE directly.

v9->v10:
- When the driver does not support the pr operation, the error
  code returned by nvme changes to Invalid Command Opcode.

v8->v9:
- Fix double-free and remove persistent reservation operations at 
nvme_is_write().

v7->v8:
- Fix num_keys may be less than 0 at scsi_pr_read_keys_complete().
- Fix buf memory leak at iscsi driver.

v6->v7:
- Add buferlen size check at SCSI layer.
- Add pr_cap calculation in bdrv_merge_limits() function at block layer,
  so the ugly bs->file->bs->bl.pr_cap in scsi and nvme layers was
  changed to bs->bl.pr_cap.
- Fix memory leak at iscsi driver, and some other spelling errors.

v5->v6:
- Add relevant comments in the io layer.

v4->v5:
- Fixed a memory leak bug at hw/nvme/ctrl.c.

v3->v4:
- At the nvme layer, the two patches of enabling the ONCS
  function and enabling rescap are combined into one.
- At the nvme layer, add helper functions for pr capacity
  conversion between the block layer and the nvme layer.

v2->v3:
In v2 Persist Through Power Loss(PTPL) is enable default.
In v3 PTPL is supported, which is passed as a parameter.

v1->v2:
- Add sg_persist --report-capabilities for SCSI protocol and enable
  oncs and rescap for NVMe protocol.
- Add persistent reservation capabilities constants and helper functions for
  SCSI and NVMe protocol.
- Add comments for necessary APIs.

v1:
- Add seven APIs about persistent reservation command for block layer.
  These APIs including reading keys, reading reservations, registering,
  reserving, releasing, clearing and preempting.
- Add the necessary pr-related operation APIs for both the
  SCSI protocol and NVMe protocol at the device layer.
- Add scsi driver at the driver layer to verify the functions

Changqi Lu (10):
  block: add persistent reservation in/out api
  block/raw: add persistent reservation in/out driver
  scsi/constant: add persistent reservation in/out protocol constants
  scsi/util: add helper functions for persistent reservation types
conversion
  hw/scsi: add persistent reservation in/out api for scsi device
  block/nvme: add reservation command protocol constants
  hw/nvme: add helper functions for converting reservation types
  hw/nvme: enable ONCS and rescap function
  hw/nvme: add reservation protocol command
  block/iscsi: add persistent reservation in/out driver

 block/block-backend.c | 403 +++
 block/io.c| 164 +++
 block/iscsi.c | 433 ++
 block/raw-format.c|  56 
 hw/nvme/ctrl.c| 349 +++-
 hw/nvme/ns.c  |  11 +
 hw/nvme/nvme.h|  93 +++
 hw/scsi/scsi-disk.c   | 368 +
 include/block/block-common.h  |  40 +++
 include/block/block-io.h  |  20 ++
 include/block/block_int-common.h  |  84 ++
 include/block/nvme.h  | 107 +++-
 include/scsi/constants.h  |  52 
 include/scsi/utils.h  |   8 +
 include/sysemu/block-backend-io.h |  24 ++
 scsi/utils.c  |  81 ++
 16 files changed, 2291 insertions(+), 2 deletions(-)

-- 
2.20.1




[PATCH v11 01/10] block: add persistent reservation in/out api

2024-09-09 Thread Changqi Lu
Add persistent reservation in/out operations
at the block level. The following operations
are included:

- read_keys:retrieves the list of registered keys.
- read_reservation: retrieves the current reservation status.
- register: registers a new reservation key.
- reserve:  initiates a reservation for a specific key.
- release:  releases a reservation for a specific key.
- clear:clears all existing reservations.
- preempt:  preempts a reservation held by another key.

Signed-off-by: Changqi Lu 
Signed-off-by: zhenwei pi 
Reviewed-by: Stefan Hajnoczi 
---
 block/block-backend.c | 403 ++
 block/io.c| 164 
 include/block/block-common.h  |  40 +++
 include/block/block-io.h  |  20 ++
 include/block/block_int-common.h  |  84 +++
 include/sysemu/block-backend-io.h |  24 ++
 6 files changed, 735 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index db6f9b92a3..b74aaba23f 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1770,6 +1770,409 @@ BlockAIOCB *blk_aio_ioctl(BlockBackend *blk, unsigned long int req, void *buf,
 return blk_aio_prwv(blk, req, 0, buf, blk_aio_ioctl_entry, 0, cb, opaque);
 }
 
+typedef struct BlkPrInCo {
+BlockBackend *blk;
+uint32_t *generation;
+uint32_t num_keys;
+BlockPrType *type;
+uint64_t *keys;
+int ret;
+} BlkPrInCo;
+
+typedef struct BlkPrInCB {
+BlockAIOCB common;
+BlkPrInCo prco;
+bool has_returned;
+} BlkPrInCB;
+
+static const AIOCBInfo blk_pr_in_aiocb_info = {
+.aiocb_size = sizeof(BlkPrInCB),
+};
+
+static void blk_pr_in_complete(BlkPrInCB *acb)
+{
+if (acb->has_returned) {
+acb->common.cb(acb->common.opaque, acb->prco.ret);
+
+/* This is paired with blk_inc_in_flight() in blk_aio_pr_in(). */
+blk_dec_in_flight(acb->prco.blk);
+qemu_aio_unref(acb);
+}
+}
+
+static void blk_pr_in_complete_bh(void *opaque)
+{
+BlkPrInCB *acb = opaque;
+assert(acb->has_returned);
+blk_pr_in_complete(acb);
+}
+
+static BlockAIOCB *blk_aio_pr_in(BlockBackend *blk, uint32_t *generation,
+ uint32_t num_keys, BlockPrType *type,
+ uint64_t *keys, CoroutineEntry co_entry,
+ BlockCompletionFunc *cb, void *opaque)
+{
+BlkPrInCB *acb;
+Coroutine *co;
+
+/* This is paired with blk_dec_in_flight() in blk_pr_in_complete(). */
+blk_inc_in_flight(blk);
+acb = blk_aio_get(&blk_pr_in_aiocb_info, blk, cb, opaque);
+acb->prco = (BlkPrInCo) {
+.blk= blk,
+.generation = generation,
+.num_keys   = num_keys,
+.type   = type,
+.ret= NOT_DONE,
+.keys   = keys,
+};
+acb->has_returned = false;
+
+co = qemu_coroutine_create(co_entry, acb);
+aio_co_enter(qemu_get_current_aio_context(), co);
+
+acb->has_returned = true;
+if (acb->prco.ret != NOT_DONE) {
+replay_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
+ blk_pr_in_complete_bh, acb);
+}
+
+return &acb->common;
+}
+
+/* To be called between exactly one pair of blk_inc/dec_in_flight() */
+static int coroutine_fn
+blk_aio_pr_do_read_keys(BlockBackend *blk, uint32_t *generation,
+uint32_t num_keys, uint64_t *keys)
+{
+IO_CODE();
+
+blk_wait_while_drained(blk);
+GRAPH_RDLOCK_GUARD();
+
+if (!blk_co_is_available(blk)) {
+return -ENOMEDIUM;
+}
+
+return bdrv_co_pr_read_keys(blk_bs(blk), generation, num_keys, keys);
+}
+
+static void coroutine_fn blk_aio_pr_read_keys_entry(void *opaque)
+{
+BlkPrInCB *acb = opaque;
+BlkPrInCo *prco = &acb->prco;
+
+prco->ret = blk_aio_pr_do_read_keys(prco->blk, prco->generation,
+prco->num_keys, prco->keys);
+blk_pr_in_complete(acb);
+}
+
+BlockAIOCB *blk_aio_pr_read_keys(BlockBackend *blk, uint32_t *generation,
+ uint32_t num_keys, uint64_t *keys,
+ BlockCompletionFunc *cb, void *opaque)
+{
+IO_CODE();
+return blk_aio_pr_in(blk, generation, num_keys, NULL, keys,
+ blk_aio_pr_read_keys_entry, cb, opaque);
+}
+
+/* To be called between exactly one pair of blk_inc/dec_in_flight() */
+static int coroutine_fn
+blk_aio_pr_do_read_reservation(BlockBackend *blk, uint32_t *generation,
+   uint64_t *key, BlockPrType *type)
+{
+IO_CODE();
+
+blk_wait_while_drained(blk);
+GRAPH_RDLOCK_GUARD();
+
+if (!blk_co_is_available(blk)) {
+return -ENOMEDIUM;
+}
+
+return bdrv_co_pr_read_reservation(blk_bs(blk), generation, key, type);
+}
+
+static void coroutine_fn blk_aio_pr_read_reservation_entry(void *opaque)
+{
+BlkPrInCB *acb = opaqu

[PATCH v11 05/10] hw/scsi: add persistent reservation in/out api for scsi device

2024-09-09 Thread Changqi Lu
Add persistent reservation in/out operations in the
SCSI device layer. By introducing the persistent
reservation in/out api, this enables the SCSI device
to perform reservation-related tasks, including querying
keys, querying reservation status, registering reservation
keys, initiating and releasing reservations, as well as
clearing and preempting reservations held by other keys.

These operations are crucial for management and control of
shared storage resources in a persistent manner.

Signed-off-by: Changqi Lu 
Signed-off-by: zhenwei pi 
Reviewed-by: Stefan Hajnoczi 
---
 hw/scsi/scsi-disk.c | 368 
 1 file changed, 368 insertions(+)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 0812d39c02..2441e5ffca 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -32,6 +32,7 @@
 #include "migration/vmstate.h"
 #include "hw/scsi/emulation.h"
 #include "scsi/constants.h"
+#include "scsi/utils.h"
 #include "sysemu/block-backend.h"
 #include "sysemu/blockdev.h"
 #include "hw/block/block.h"
@@ -42,6 +43,7 @@
 #include "qemu/cutils.h"
 #include "trace.h"
 #include "qom/object.h"
+#include "block/block_int.h"
 
 #ifdef __linux
 #include 
@@ -1477,6 +1479,362 @@ static void scsi_disk_emulate_read_data(SCSIRequest *req)
 scsi_req_complete(&r->req, GOOD);
 }
 
+typedef struct SCSIPrReadKeys {
+uint32_t generation;
+uint32_t num_keys;
+uint64_t *keys;
+SCSIDiskReq *req;
+} SCSIPrReadKeys;
+
+typedef struct SCSIPrReadReservation {
+uint32_t generation;
+uint64_t key;
+BlockPrType type;
+SCSIDiskReq *req;
+} SCSIPrReadReservation;
+
+static void scsi_pr_read_keys_complete(void *opaque, int ret)
+{
+int num_keys;
+uint8_t *buf;
+SCSIPrReadKeys *blk_keys = (SCSIPrReadKeys *)opaque;
+SCSIDiskReq *r = blk_keys->req;
+SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+
+assert(blk_get_aio_context(s->qdev.conf.blk) ==
+qemu_get_current_aio_context());
+
+assert(r->req.aiocb != NULL);
+r->req.aiocb = NULL;
+
+if (scsi_disk_req_check_error(r, ret, true)) {
+goto done;
+}
+
+buf = scsi_req_get_buf(&r->req);
+num_keys = MIN(blk_keys->num_keys, ret > 0 ? ret : 0);
+blk_keys->generation = cpu_to_be32(blk_keys->generation);
+memcpy(&buf[0], &blk_keys->generation, 4);
+for (int i = 0; i < num_keys; i++) {
+blk_keys->keys[i] = cpu_to_be64(blk_keys->keys[i]);
+memcpy(&buf[8 + i * 8], &blk_keys->keys[i], 8);
+}
+num_keys = cpu_to_be32(num_keys * 8);
+memcpy(&buf[4], &num_keys, 4);
+
+scsi_req_data(&r->req, r->buflen);
+done:
+scsi_req_unref(&r->req);
+g_free(blk_keys->keys);
+g_free(blk_keys);
+}
+
+static void scsi_disk_emulate_pr_read_keys(SCSIRequest *req)
+{
+SCSIPrReadKeys *blk_keys;
+SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
+SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, req->dev);
+int buflen = MIN(r->req.cmd.xfer, r->buflen);
+int num_keys = (buflen - sizeof(uint32_t) * 2) / sizeof(uint64_t);
+
+if (num_keys <= 0) {
+scsi_check_condition(r, SENSE_CODE(INVALID_PARAM_LEN));
+return;
+}
+
+blk_keys = g_new0(SCSIPrReadKeys, 1);
+blk_keys->generation = 0;
+/* num_keys is the maximum number of keys that can be transmitted */
+blk_keys->num_keys = num_keys;
+blk_keys->keys = g_malloc(sizeof(uint64_t) * num_keys);
+blk_keys->req = r;
+
+/* The request is used as the AIO opaque value, so add a ref.  */
+scsi_req_ref(&r->req);
+r->req.aiocb = blk_aio_pr_read_keys(s->qdev.conf.blk, &blk_keys->generation,
+blk_keys->num_keys, blk_keys->keys,
+scsi_pr_read_keys_complete, blk_keys);
+return;
+}
+
+static void scsi_pr_read_reservation_complete(void *opaque, int ret)
+{
+uint8_t *buf;
+uint32_t additional_len = 0;
+SCSIPrReadReservation *blk_rsv = (SCSIPrReadReservation *)opaque;
+SCSIDiskReq *r = blk_rsv->req;
+SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+
+assert(blk_get_aio_context(s->qdev.conf.blk) ==
+qemu_get_current_aio_context());
+
+assert(r->req.aiocb != NULL);
+r->req.aiocb = NULL;
+
+if (scsi_disk_req_check_error(r, ret, true)) {
+goto done;
+}
+
+buf = scsi_req_get_buf(&r->req);
+blk_rsv->generation = cpu_to_be32(blk_rsv->generation);
+memcpy(&buf[0], &blk_rsv->generation, 4);
+if (ret) {
+additional_len = cpu_to_be32(16);
+blk_rsv->key = cpu_to_be64(blk_rsv->key);
+memcpy(&buf[8], &blk_rsv->key, 8);
+buf[21] = block_pr_type_to_scsi(blk_rsv->type) & 0xf;
+} else {
+additional_len = cpu_to_be32(0);
+}
+
+memcpy(&buf[4], &additional_len, 4);
+scsi_req_data(&r->req, r->buflen);
+
+done:
+scsi_req_unref(&r->req);
+g_free(blk_rsv);
+}
+
+static void scsi_disk_emulate_pr_read

[PATCH v11 06/10] block/nvme: add reservation command protocol constants

2024-09-09 Thread Changqi Lu
Add constants for the NVMe persistent command protocol.
The constants include the reservation command opcode and
reservation type values defined in section 7 of the NVMe
2.0 specification.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Changqi Lu 
Signed-off-by: zhenwei pi 
---
 include/block/nvme.h | 61 
 1 file changed, 61 insertions(+)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index bb231d0b9a..8b125f7769 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -633,7 +633,11 @@ enum NvmeIoCommands {
 NVME_CMD_WRITE_ZEROES   = 0x08,
 NVME_CMD_DSM= 0x09,
 NVME_CMD_VERIFY = 0x0c,
+NVME_CMD_RESV_REGISTER  = 0x0d,
+NVME_CMD_RESV_REPORT= 0x0e,
+NVME_CMD_RESV_ACQUIRE   = 0x11,
 NVME_CMD_IO_MGMT_RECV   = 0x12,
+NVME_CMD_RESV_RELEASE   = 0x15,
 NVME_CMD_COPY   = 0x19,
 NVME_CMD_IO_MGMT_SEND   = 0x1d,
 NVME_CMD_ZONE_MGMT_SEND = 0x79,
@@ -641,6 +645,63 @@ enum NvmeIoCommands {
 NVME_CMD_ZONE_APPEND= 0x7d,
 };
 
+typedef enum {
+NVME_RESV_REGISTER_ACTION_REGISTER  = 0x00,
+NVME_RESV_REGISTER_ACTION_UNREGISTER= 0x01,
+NVME_RESV_REGISTER_ACTION_REPLACE   = 0x02,
+} NvmeReservationRegisterAction;
+
+typedef enum {
+NVME_RESV_RELEASE_ACTION_RELEASE= 0x00,
+NVME_RESV_RELEASE_ACTION_CLEAR  = 0x01,
+} NvmeReservationReleaseAction;
+
+typedef enum {
+NVME_RESV_ACQUIRE_ACTION_ACQUIRE= 0x00,
+NVME_RESV_ACQUIRE_ACTION_PREEMPT= 0x01,
+NVME_RESV_ACQUIRE_ACTION_PREEMPT_AND_ABORT  = 0x02,
+} NvmeReservationAcquireAction;
+
+typedef enum {
+NVME_RESV_WRITE_EXCLUSIVE   = 0x01,
+NVME_RESV_EXCLUSIVE_ACCESS  = 0x02,
+NVME_RESV_WRITE_EXCLUSIVE_REGS_ONLY = 0x03,
+NVME_RESV_EXCLUSIVE_ACCESS_REGS_ONLY= 0x04,
+NVME_RESV_WRITE_EXCLUSIVE_ALL_REGS  = 0x05,
+NVME_RESV_EXCLUSIVE_ACCESS_ALL_REGS = 0x06,
+} NvmeResvType;
+
+typedef enum {
+NVME_RESV_PTPL_NO_CHANGE = 0x00,
+NVME_RESV_PTPL_DISABLE   = 0x02,
+NVME_RESV_PTPL_ENABLE= 0x03,
+} NvmeResvPTPL;
+
+typedef enum NVMEPrCap {
+/* Persist Through Power Loss */
+NVME_PR_CAP_PTPL = 1 << 0,
+/* Write Exclusive reservation type */
+NVME_PR_CAP_WR_EX = 1 << 1,
+/* Exclusive Access reservation type */
+NVME_PR_CAP_EX_AC = 1 << 2,
+/* Write Exclusive Registrants Only reservation type */
+NVME_PR_CAP_WR_EX_RO = 1 << 3,
+/* Exclusive Access Registrants Only reservation type */
+NVME_PR_CAP_EX_AC_RO = 1 << 4,
+/* Write Exclusive All Registrants reservation type */
+NVME_PR_CAP_WR_EX_AR = 1 << 5,
+/* Exclusive Access All Registrants reservation type */
+NVME_PR_CAP_EX_AC_AR = 1 << 6,
+
+NVME_PR_CAP_ALL = (NVME_PR_CAP_PTPL |
+  NVME_PR_CAP_WR_EX |
+  NVME_PR_CAP_EX_AC |
+  NVME_PR_CAP_WR_EX_RO |
+  NVME_PR_CAP_EX_AC_RO |
+  NVME_PR_CAP_WR_EX_AR |
+  NVME_PR_CAP_EX_AC_AR),
+} NvmePrCap;
+
 typedef struct QEMU_PACKED NvmeDeleteQ {
 uint8_t opcode;
 uint8_t flags;
-- 
2.20.1




[PATCH v11 03/10] scsi/constant: add persistent reservation in/out protocol constants

2024-09-09 Thread Changqi Lu
Add constants for the persistent reservation in/out protocol
in the scsi/constant module. The constants include the persistent
reservation command, type, and scope values defined in sections
6.13 and 6.14 of the SCSI Primary Commands-4 (SPC-4) specification.

Signed-off-by: Changqi Lu 
Signed-off-by: zhenwei pi 
Reviewed-by: Stefan Hajnoczi 
---
 include/scsi/constants.h | 52 
 1 file changed, 52 insertions(+)

diff --git a/include/scsi/constants.h b/include/scsi/constants.h
index 9b98451912..922a314535 100644
--- a/include/scsi/constants.h
+++ b/include/scsi/constants.h
@@ -319,4 +319,56 @@
 #define IDENT_DESCR_TGT_DESCR_SIZE 32
 #define XCOPY_BLK2BLK_SEG_DESC_SIZE 28
 
+typedef enum {
+SCSI_PR_WRITE_EXCLUSIVE = 0x01,
+SCSI_PR_EXCLUSIVE_ACCESS= 0x03,
+SCSI_PR_WRITE_EXCLUSIVE_REGS_ONLY   = 0x05,
+SCSI_PR_EXCLUSIVE_ACCESS_REGS_ONLY  = 0x06,
+SCSI_PR_WRITE_EXCLUSIVE_ALL_REGS= 0x07,
+SCSI_PR_EXCLUSIVE_ACCESS_ALL_REGS   = 0x08,
+} SCSIPrType;
+
+typedef enum {
+SCSI_PR_LU_SCOPE  = 0x00,
+} SCSIPrScope;
+
+typedef enum {
+SCSI_PR_OUT_REGISTER = 0x0,
+SCSI_PR_OUT_RESERVE  = 0x1,
+SCSI_PR_OUT_RELEASE  = 0x2,
+SCSI_PR_OUT_CLEAR= 0x3,
+SCSI_PR_OUT_PREEMPT  = 0x4,
+SCSI_PR_OUT_PREEMPT_AND_ABORT= 0x5,
+SCSI_PR_OUT_REG_AND_IGNORE_KEY   = 0x6,
+SCSI_PR_OUT_REG_AND_MOVE = 0x7,
+} SCSIPrOutAction;
+
+typedef enum {
+SCSI_PR_IN_READ_KEYS = 0x0,
+SCSI_PR_IN_READ_RESERVATION  = 0x1,
+SCSI_PR_IN_REPORT_CAPABILITIES   = 0x2,
+} SCSIPrInAction;
+
+typedef enum {
+/* Exclusive Access All Registrants reservation type */
+SCSI_PR_CAP_EX_AC_AR = 1 << 0,
+/* Write Exclusive reservation type */
+SCSI_PR_CAP_WR_EX = 1 << 9,
+/* Exclusive Access reservation type */
+SCSI_PR_CAP_EX_AC = 1 << 11,
+/* Write Exclusive Registrants Only reservation type */
+SCSI_PR_CAP_WR_EX_RO = 1 << 13,
+/* Exclusive Access Registrants Only reservation type */
+SCSI_PR_CAP_EX_AC_RO = 1 << 14,
+/* Write Exclusive All Registrants reservation type */
+SCSI_PR_CAP_WR_EX_AR = 1 << 15,
+
+SCSI_PR_CAP_ALL = (SCSI_PR_CAP_EX_AC_AR |
+  SCSI_PR_CAP_WR_EX |
+  SCSI_PR_CAP_EX_AC |
+  SCSI_PR_CAP_WR_EX_RO |
+  SCSI_PR_CAP_EX_AC_RO |
+  SCSI_PR_CAP_WR_EX_AR),
+} SCSIPrCap;
+
 #endif
-- 
2.20.1




[PATCH v11 02/10] block/raw: add persistent reservation in/out driver

2024-09-09 Thread Changqi Lu
Add persistent reservation in/out operations for raw driver.
The following methods are implemented: bdrv_co_pr_read_keys,
bdrv_co_pr_read_reservation, bdrv_co_pr_register, bdrv_co_pr_reserve,
bdrv_co_pr_release, bdrv_co_pr_clear and bdrv_co_pr_preempt.

Signed-off-by: Changqi Lu 
Signed-off-by: zhenwei pi 
Reviewed-by: Stefan Hajnoczi 
---
 block/raw-format.c | 56 ++
 1 file changed, 56 insertions(+)

diff --git a/block/raw-format.c b/block/raw-format.c
index ac7e8495f6..3746bc1bd3 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -454,6 +454,55 @@ raw_co_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
 return bdrv_co_ioctl(bs->file->bs, req, buf);
 }
 
+static int coroutine_fn GRAPH_RDLOCK
+raw_co_pr_read_keys(BlockDriverState *bs, uint32_t *generation,
+uint32_t num_keys, uint64_t *keys)
+{
+
+return bdrv_co_pr_read_keys(bs->file->bs, generation, num_keys, keys);
+}
+
+static int coroutine_fn GRAPH_RDLOCK
+raw_co_pr_read_reservation(BlockDriverState *bs, uint32_t *generation,
+   uint64_t *key, BlockPrType *type)
+{
+return bdrv_co_pr_read_reservation(bs->file->bs, generation, key, type);
+}
+
+static int coroutine_fn GRAPH_RDLOCK
+raw_co_pr_register(BlockDriverState *bs, uint64_t old_key,
+   uint64_t new_key, BlockPrType type,
+   bool ptpl, bool ignore_key)
+{
+return bdrv_co_pr_register(bs->file->bs, old_key, new_key,
+   type, ptpl, ignore_key);
+}
+
+static int coroutine_fn GRAPH_RDLOCK
+raw_co_pr_reserve(BlockDriverState *bs, uint64_t key, BlockPrType type)
+{
+return bdrv_co_pr_reserve(bs->file->bs, key, type);
+}
+
+static int coroutine_fn GRAPH_RDLOCK
+raw_co_pr_release(BlockDriverState *bs, uint64_t key, BlockPrType type)
+{
+return bdrv_co_pr_release(bs->file->bs, key, type);
+}
+
+static int coroutine_fn GRAPH_RDLOCK
+raw_co_pr_clear(BlockDriverState *bs, uint64_t key)
+{
+return bdrv_co_pr_clear(bs->file->bs, key);
+}
+
+static int coroutine_fn GRAPH_RDLOCK
+raw_co_pr_preempt(BlockDriverState *bs, uint64_t old_key,
+  uint64_t new_key, BlockPrType type, bool abort)
+{
+return bdrv_co_pr_preempt(bs->file->bs, old_key, new_key, type, abort);
+}
+
 static int GRAPH_RDLOCK raw_has_zero_init(BlockDriverState *bs)
 {
 return bdrv_has_zero_init(bs->file->bs);
@@ -672,6 +721,13 @@ BlockDriver bdrv_raw = {
 .strong_runtime_opts  = raw_strong_runtime_opts,
 .mutable_opts = mutable_opts,
 .bdrv_cancel_in_flight = raw_cancel_in_flight,
+.bdrv_co_pr_read_keys= raw_co_pr_read_keys,
+.bdrv_co_pr_read_reservation = raw_co_pr_read_reservation,
+.bdrv_co_pr_register = raw_co_pr_register,
+.bdrv_co_pr_reserve  = raw_co_pr_reserve,
+.bdrv_co_pr_release  = raw_co_pr_release,
+.bdrv_co_pr_clear= raw_co_pr_clear,
+.bdrv_co_pr_preempt  = raw_co_pr_preempt,
 };
 
 static void bdrv_raw_init(void)
-- 
2.20.1




[PATCH v11 04/10] scsi/util: add helper functions for persistent reservation types conversion

2024-09-09 Thread Changqi Lu
This commit introduces two helper functions
that facilitate the conversion between the
persistent reservation types used in the SCSI
protocol and those used in the block layer.

Signed-off-by: Changqi Lu 
Signed-off-by: zhenwei pi 
Reviewed-by: Stefan Hajnoczi 
---
 include/scsi/utils.h |  8 +
 scsi/utils.c | 81 
 2 files changed, 89 insertions(+)

diff --git a/include/scsi/utils.h b/include/scsi/utils.h
index d5c8efa16e..89a0b082fb 100644
--- a/include/scsi/utils.h
+++ b/include/scsi/utils.h
@@ -1,6 +1,8 @@
 #ifndef SCSI_UTILS_H
 #define SCSI_UTILS_H
 
+#include "block/block-common.h"
+#include "scsi/constants.h"
 #ifdef CONFIG_LINUX
 #include 
 #endif
@@ -135,6 +137,12 @@ uint32_t scsi_data_cdb_xfer(uint8_t *buf);
 uint32_t scsi_cdb_xfer(uint8_t *buf);
 int scsi_cdb_length(uint8_t *buf);
 
+BlockPrType scsi_pr_type_to_block(SCSIPrType type);
+SCSIPrType block_pr_type_to_scsi(BlockPrType type);
+
+uint8_t scsi_pr_cap_to_block(uint16_t scsi_pr_cap);
+uint16_t block_pr_cap_to_scsi(uint8_t block_pr_cap);
+
 /* Linux SG_IO interface.  */
 #ifdef CONFIG_LINUX
 #define SG_ERR_DRIVER_TIMEOUT  0x06
diff --git a/scsi/utils.c b/scsi/utils.c
index 357b036671..0dfdeb499d 100644
--- a/scsi/utils.c
+++ b/scsi/utils.c
@@ -658,3 +658,84 @@ int scsi_sense_from_host_status(uint8_t host_status,
 }
 return GOOD;
 }
+
+BlockPrType scsi_pr_type_to_block(SCSIPrType type)
+{
+switch (type) {
+case SCSI_PR_WRITE_EXCLUSIVE:
+return BLK_PR_WRITE_EXCLUSIVE;
+case SCSI_PR_EXCLUSIVE_ACCESS:
+return BLK_PR_EXCLUSIVE_ACCESS;
+case SCSI_PR_WRITE_EXCLUSIVE_REGS_ONLY:
+return BLK_PR_WRITE_EXCLUSIVE_REGS_ONLY;
+case SCSI_PR_EXCLUSIVE_ACCESS_REGS_ONLY:
+return BLK_PR_EXCLUSIVE_ACCESS_REGS_ONLY;
+case SCSI_PR_WRITE_EXCLUSIVE_ALL_REGS:
+return BLK_PR_WRITE_EXCLUSIVE_ALL_REGS;
+case SCSI_PR_EXCLUSIVE_ACCESS_ALL_REGS:
+return BLK_PR_EXCLUSIVE_ACCESS_ALL_REGS;
+}
+
+return 0;
+}
+
+SCSIPrType block_pr_type_to_scsi(BlockPrType type)
+{
+switch (type) {
+case BLK_PR_WRITE_EXCLUSIVE:
+return SCSI_PR_WRITE_EXCLUSIVE;
+case BLK_PR_EXCLUSIVE_ACCESS:
+return SCSI_PR_EXCLUSIVE_ACCESS;
+case BLK_PR_WRITE_EXCLUSIVE_REGS_ONLY:
+return SCSI_PR_WRITE_EXCLUSIVE_REGS_ONLY;
+case BLK_PR_EXCLUSIVE_ACCESS_REGS_ONLY:
+return SCSI_PR_EXCLUSIVE_ACCESS_REGS_ONLY;
+case BLK_PR_WRITE_EXCLUSIVE_ALL_REGS:
+return SCSI_PR_WRITE_EXCLUSIVE_ALL_REGS;
+case BLK_PR_EXCLUSIVE_ACCESS_ALL_REGS:
+return SCSI_PR_EXCLUSIVE_ACCESS_ALL_REGS;
+}
+
+return 0;
+}
+
+
+uint8_t scsi_pr_cap_to_block(uint16_t scsi_pr_cap)
+{
+uint8_t res = 0;
+
+res |= (scsi_pr_cap & SCSI_PR_CAP_WR_EX) ?
+   BLK_PR_CAP_WR_EX : 0;
+res |= (scsi_pr_cap & SCSI_PR_CAP_EX_AC) ?
+   BLK_PR_CAP_EX_AC : 0;
+res |= (scsi_pr_cap & SCSI_PR_CAP_WR_EX_RO) ?
+   BLK_PR_CAP_WR_EX_RO : 0;
+res |= (scsi_pr_cap & SCSI_PR_CAP_EX_AC_RO) ?
+   BLK_PR_CAP_EX_AC_RO : 0;
+res |= (scsi_pr_cap & SCSI_PR_CAP_WR_EX_AR) ?
+   BLK_PR_CAP_WR_EX_AR : 0;
+res |= (scsi_pr_cap & SCSI_PR_CAP_EX_AC_AR) ?
+   BLK_PR_CAP_EX_AC_AR : 0;
+
+return res;
+}
+
+uint16_t block_pr_cap_to_scsi(uint8_t block_pr_cap)
+{
+uint16_t res = 0;
+
+res |= (block_pr_cap & BLK_PR_CAP_WR_EX) ?
+  SCSI_PR_CAP_WR_EX : 0;
+res |= (block_pr_cap & BLK_PR_CAP_EX_AC) ?
+  SCSI_PR_CAP_EX_AC : 0;
+res |= (block_pr_cap & BLK_PR_CAP_WR_EX_RO) ?
+  SCSI_PR_CAP_WR_EX_RO : 0;
+res |= (block_pr_cap & BLK_PR_CAP_EX_AC_RO) ?
+  SCSI_PR_CAP_EX_AC_RO : 0;
+res |= (block_pr_cap & BLK_PR_CAP_WR_EX_AR) ?
+  SCSI_PR_CAP_WR_EX_AR : 0;
+res |= (block_pr_cap & BLK_PR_CAP_EX_AC_AR) ?
+  SCSI_PR_CAP_EX_AC_AR : 0;
+
+return res;
+}
-- 
2.20.1




[PATCH v11 09/10] hw/nvme: add reservation protocol commands

2024-09-09 Thread Changqi Lu
Add reservation acquire, reservation register,
reservation release and reservation report commands
in the nvme device layer.

These commands enable the nvme
device to perform reservation-related tasks, including
querying keys, querying reservation status, registering
reservation keys, initiating and releasing reservations,
as well as clearing and preempting reservations held by
other keys.

These commands are crucial for management and control of
shared storage resources in a persistent manner.
Signed-off-by: Changqi Lu 
Signed-off-by: zhenwei pi 
Acked-by: Klaus Jensen 
---
 hw/nvme/ctrl.c   | 346 +++
 hw/nvme/ns.c |   6 +
 hw/nvme/nvme.h   |   9 ++
 include/block/nvme.h |  44 ++
 4 files changed, 405 insertions(+)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index ad212de723..ba98b86f9c 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -294,6 +294,10 @@ static const uint32_t nvme_cse_iocs_nvm[256] = {
 [NVME_CMD_COMPARE]  = NVME_CMD_EFF_CSUPP,
 [NVME_CMD_IO_MGMT_RECV] = NVME_CMD_EFF_CSUPP,
 [NVME_CMD_IO_MGMT_SEND] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
+[NVME_CMD_RESV_REGISTER]= NVME_CMD_EFF_CSUPP,
+[NVME_CMD_RESV_REPORT]  = NVME_CMD_EFF_CSUPP,
+[NVME_CMD_RESV_ACQUIRE] = NVME_CMD_EFF_CSUPP,
+[NVME_CMD_RESV_RELEASE] = NVME_CMD_EFF_CSUPP,
 };
 
 static const uint32_t nvme_cse_iocs_zoned[256] = {
@@ -308,6 +312,10 @@ static const uint32_t nvme_cse_iocs_zoned[256] = {
 [NVME_CMD_ZONE_APPEND]  = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
 [NVME_CMD_ZONE_MGMT_SEND]   = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
 [NVME_CMD_ZONE_MGMT_RECV]   = NVME_CMD_EFF_CSUPP,
+[NVME_CMD_RESV_REGISTER]= NVME_CMD_EFF_CSUPP,
+[NVME_CMD_RESV_REPORT]  = NVME_CMD_EFF_CSUPP,
+[NVME_CMD_RESV_ACQUIRE] = NVME_CMD_EFF_CSUPP,
+[NVME_CMD_RESV_RELEASE] = NVME_CMD_EFF_CSUPP,
 };
 
 static void nvme_process_sq(void *opaque);
@@ -1747,6 +1755,13 @@ static void nvme_aio_err(NvmeRequest *req, int ret)
 case NVME_CMD_READ:
 status = NVME_UNRECOVERED_READ;
 break;
+case NVME_CMD_RESV_REPORT:
+if (ret == -ENOTSUP) {
+status = NVME_INVALID_OPCODE;
+} else {
+status = NVME_UNRECOVERED_READ;
+}
+break;
 case NVME_CMD_FLUSH:
 case NVME_CMD_WRITE:
 case NVME_CMD_WRITE_ZEROES:
@@ -1754,6 +1769,15 @@ static void nvme_aio_err(NvmeRequest *req, int ret)
 case NVME_CMD_COPY:
 status = NVME_WRITE_FAULT;
 break;
+case NVME_CMD_RESV_REGISTER:
+case NVME_CMD_RESV_ACQUIRE:
+case NVME_CMD_RESV_RELEASE:
+if (ret == -ENOTSUP) {
+status = NVME_INVALID_OPCODE;
+} else {
+status = NVME_WRITE_FAULT;
+}
+break;
 default:
 status = NVME_INTERNAL_DEV_ERROR;
 break;
@@ -2692,6 +2716,320 @@ static uint16_t nvme_verify(NvmeCtrl *n, NvmeRequest *req)
 return NVME_NO_COMPLETE;
 }
 
+typedef struct NvmeKeyInfo {
+uint64_t cr_key;
+uint64_t nr_key;
+} NvmeKeyInfo;
+
+static uint16_t nvme_resv_register(NvmeCtrl *n, NvmeRequest *req)
+{
+int ret;
+NvmeKeyInfo key_info;
+NvmeNamespace *ns = req->ns;
+uint32_t cdw10 = le32_to_cpu(req->cmd.cdw10);
+bool ignore_key = cdw10 >> 3 & 0x1;
+uint8_t action = cdw10 & 0x7;
+uint8_t ptpl = cdw10 >> 30 & 0x3;
+bool aptpl;
+
+if (!nvme_support_pr(ns)) {
+return NVME_INVALID_OPCODE;
+}
+
+switch (ptpl) {
+case NVME_RESV_PTPL_NO_CHANGE:
+aptpl = (ns->id_ns.rescap & NVME_PR_CAP_PTPL) ? true : false;
+break;
+case NVME_RESV_PTPL_DISABLE:
+aptpl = false;
+break;
+case NVME_RESV_PTPL_ENABLE:
+aptpl = true;
+break;
+default:
+return NVME_INVALID_FIELD;
+}
+
+ret = nvme_h2c(n, (uint8_t *)&key_info, sizeof(NvmeKeyInfo), req);
+if (ret) {
+return ret;
+}
+
+switch (action) {
+case NVME_RESV_REGISTER_ACTION_REGISTER:
+req->aiocb = blk_aio_pr_register(ns->blkconf.blk, 0,
+ key_info.nr_key, 0, aptpl,
+ ignore_key, nvme_misc_cb,
+ req);
+break;
+case NVME_RESV_REGISTER_ACTION_UNREGISTER:
+req->aiocb = blk_aio_pr_register(ns->blkconf.blk, key_info.cr_key, 0,
+ 0, aptpl, ignore_key,
+ nvme_misc_cb, req);
+break;
+case NVME_RESV_REGISTER_ACTION_REPLACE:
+req->aiocb = blk_aio_pr_register(ns->blkconf.blk, key_info.cr_key,
+ key_info.nr_key, 0, aptpl, ignore_key,
+ nvme_misc_cb, req);
+break;
+default:
+return NVME_INVA

[PATCH v11 10/10] block/iscsi: add persistent reservation in/out driver

2024-09-09 Thread Changqi Lu
Add persistent reservation in/out operations for iscsi driver.
The following methods are implemented: bdrv_co_pr_read_keys,
bdrv_co_pr_read_reservation, bdrv_co_pr_register, bdrv_co_pr_reserve,
bdrv_co_pr_release, bdrv_co_pr_clear and bdrv_co_pr_preempt.

Signed-off-by: Changqi Lu 
Signed-off-by: zhenwei pi 
Reviewed-by: Stefan Hajnoczi 
---
 block/iscsi.c | 433 ++
 1 file changed, 433 insertions(+)

diff --git a/block/iscsi.c b/block/iscsi.c
index 2ff14b7472..ea2f6e94a5 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -96,6 +96,7 @@ typedef struct IscsiLun {
 unsigned long *allocmap_valid;
 long allocmap_size;
 int cluster_size;
+uint8_t pr_cap;
 bool use_16_for_rw;
 bool write_protected;
 bool lbpme;
@@ -280,6 +281,10 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
 iTask->err_code = -error;
 iTask->err_str = g_strdup(iscsi_get_error(iscsi));
 }
+} else if (status == SCSI_STATUS_RESERVATION_CONFLICT) {
+iTask->err_code = -EBADE;
+error_report("iSCSI Persistent Reservation Conflict: %s",
+ iscsi_get_error(iscsi));
 }
 }
 }
@@ -1792,6 +1797,50 @@ static void iscsi_save_designator(IscsiLun *lun,
 }
 }
 
+/*
+ * Ensure that iscsi_open() succeeds whether or not the target
+ * implements SCSI_PR_IN_REPORT_CAPABILITIES.
+ */
+static void iscsi_get_pr_cap_sync(IscsiLun *iscsilun)
+{
+struct scsi_task *task = NULL;
+struct scsi_persistent_reserve_in_report_capabilities *rc = NULL;
+int retries = ISCSI_CMD_RETRIES;
+int xferlen = sizeof(struct scsi_persistent_reserve_in_report_capabilities);
+
+do {
+if (task != NULL) {
+scsi_free_scsi_task(task);
+task = NULL;
+}
+
+task = iscsi_persistent_reserve_in_sync(iscsilun->iscsi,
+   iscsilun->lun, SCSI_PR_IN_REPORT_CAPABILITIES, xferlen);
+if (task != NULL && task->status == SCSI_STATUS_GOOD) {
+rc = scsi_datain_unmarshall(task);
+if (rc == NULL) {
+error_report("iSCSI: Failed to unmarshall "
+ "report capabilities data.");
+} else {
+iscsilun->pr_cap =
+scsi_pr_cap_to_block(rc->persistent_reservation_type_mask);
+iscsilun->pr_cap |= (rc->ptpl_a) ? BLK_PR_CAP_PTPL : 0;
+}
+break;
+}
+} while (task != NULL && task->status == SCSI_STATUS_CHECK_CONDITION
+ && task->sense.key == SCSI_SENSE_UNIT_ATTENTION
+ && retries-- > 0);
+
+if (task == NULL || task->status != SCSI_STATUS_GOOD) {
+error_report("iSCSI: failed to send report capabilities command.");
+}
+
+if (task) {
+scsi_free_scsi_task(task);
+}
+}
+
 static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
   Error **errp)
 {
@@ -2024,6 +2073,7 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
 bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP;
 }
 
+iscsi_get_pr_cap_sync(iscsilun);
 out:
 qemu_opts_del(opts);
 g_free(initiator_name);
@@ -2110,6 +2160,8 @@ static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
 bs->bl.opt_transfer = pow2floor(iscsilun->bl.opt_xfer_len *
 iscsilun->block_size);
 }
+
+bs->bl.pr_cap = iscsilun->pr_cap;
 }
 
 /* Note that this will not re-establish a connection with an iSCSI target - it
@@ -2408,6 +2460,379 @@ out_unlock:
 return r;
 }
 
+static int coroutine_fn
+iscsi_co_pr_read_keys(BlockDriverState *bs, uint32_t *generation,
+  uint32_t num_keys, uint64_t *keys)
+{
+IscsiLun *iscsilun = bs->opaque;
+QEMUIOVector qiov;
+struct IscsiTask iTask;
+int xferlen = sizeof(struct scsi_persistent_reserve_in_read_keys) +
+  sizeof(uint64_t) * num_keys;
+g_autofree uint8_t *buf = g_malloc0(xferlen);
+int32_t num_collect_keys = 0;
+int r = 0;
+
+qemu_iovec_init_buf(&qiov, buf, xferlen);
+iscsi_co_init_iscsitask(iscsilun, &iTask);
+qemu_mutex_lock(&iscsilun->mutex);
+retry:
+iTask.task = iscsi_persistent_reserve_in_task(iscsilun->iscsi,
+ iscsilun->lun, SCSI_PR_IN_READ_KEYS, xferlen,
+ iscsi_co_generic_cb, &iTask);
+
+if (iTask.task == NULL) {
+qemu_mutex_unlock(&iscsilun->mutex);
+return -ENOMEM;
+}
+
+scsi_task_set_iov_in(iTask.task, (struct scsi_iovec *)qiov.iov, qiov.niov);
+iscsi_co_wait_for_task(&iTask, iscsilun);
+
+if (iTask.task != NULL) {
+scsi_free_scsi_task(iTask.task);
+iTask.task = NULL;
+}
+
+if (iTask.do_retry) {
+iTask.complete = 0;
+goto retry;
+}
+
+i

[PATCH v11 07/10] hw/nvme: add helper functions for converting reservation types

2024-09-09 Thread Changqi Lu
This commit introduces two helper functions
that facilitate the conversion between the
reservation types used in the NVME protocol
and those used in the block layer.

Reviewed-by: Klaus Jensen 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Changqi Lu 
Signed-off-by: zhenwei pi 
---
 hw/nvme/nvme.h | 84 ++
 1 file changed, 84 insertions(+)

diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index bed8191bd5..6d0e456348 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -474,6 +474,90 @@ static inline const char *nvme_io_opc_str(uint8_t opc)
 }
 }
 
+static inline NvmeResvType block_pr_type_to_nvme(BlockPrType type)
+{
+switch (type) {
+case BLK_PR_WRITE_EXCLUSIVE:
+return NVME_RESV_WRITE_EXCLUSIVE;
+case BLK_PR_EXCLUSIVE_ACCESS:
+return NVME_RESV_EXCLUSIVE_ACCESS;
+case BLK_PR_WRITE_EXCLUSIVE_REGS_ONLY:
+return NVME_RESV_WRITE_EXCLUSIVE_REGS_ONLY;
+case BLK_PR_EXCLUSIVE_ACCESS_REGS_ONLY:
+return NVME_RESV_EXCLUSIVE_ACCESS_REGS_ONLY;
+case BLK_PR_WRITE_EXCLUSIVE_ALL_REGS:
+return NVME_RESV_WRITE_EXCLUSIVE_ALL_REGS;
+case BLK_PR_EXCLUSIVE_ACCESS_ALL_REGS:
+return NVME_RESV_EXCLUSIVE_ACCESS_ALL_REGS;
+}
+
+return 0;
+}
+
+static inline BlockPrType nvme_pr_type_to_block(NvmeResvType type)
+{
+switch (type) {
+case NVME_RESV_WRITE_EXCLUSIVE:
+return BLK_PR_WRITE_EXCLUSIVE;
+case NVME_RESV_EXCLUSIVE_ACCESS:
+return BLK_PR_EXCLUSIVE_ACCESS;
+case NVME_RESV_WRITE_EXCLUSIVE_REGS_ONLY:
+return BLK_PR_WRITE_EXCLUSIVE_REGS_ONLY;
+case NVME_RESV_EXCLUSIVE_ACCESS_REGS_ONLY:
+return BLK_PR_EXCLUSIVE_ACCESS_REGS_ONLY;
+case NVME_RESV_WRITE_EXCLUSIVE_ALL_REGS:
+return BLK_PR_WRITE_EXCLUSIVE_ALL_REGS;
+case NVME_RESV_EXCLUSIVE_ACCESS_ALL_REGS:
+return BLK_PR_EXCLUSIVE_ACCESS_ALL_REGS;
+}
+
+return 0;
+}
+
+static inline uint8_t nvme_pr_cap_to_block(uint16_t nvme_pr_cap)
+{
+uint8_t res = 0;
+
+res |= (nvme_pr_cap & NVME_PR_CAP_PTPL) ?
+   BLK_PR_CAP_PTPL : 0;
+res |= (nvme_pr_cap & NVME_PR_CAP_WR_EX) ?
+   BLK_PR_CAP_WR_EX : 0;
+res |= (nvme_pr_cap & NVME_PR_CAP_EX_AC) ?
+   BLK_PR_CAP_EX_AC : 0;
+res |= (nvme_pr_cap & NVME_PR_CAP_WR_EX_RO) ?
+   BLK_PR_CAP_WR_EX_RO : 0;
+res |= (nvme_pr_cap & NVME_PR_CAP_EX_AC_RO) ?
+   BLK_PR_CAP_EX_AC_RO : 0;
+res |= (nvme_pr_cap & NVME_PR_CAP_WR_EX_AR) ?
+   BLK_PR_CAP_WR_EX_AR : 0;
+res |= (nvme_pr_cap & NVME_PR_CAP_EX_AC_AR) ?
+   BLK_PR_CAP_EX_AC_AR : 0;
+
+return res;
+}
+
+static inline uint8_t block_pr_cap_to_nvme(uint8_t block_pr_cap)
+{
+uint8_t res = 0;
+
+res |= (block_pr_cap & BLK_PR_CAP_PTPL) ?
+  NVME_PR_CAP_PTPL : 0;
+res |= (block_pr_cap & BLK_PR_CAP_WR_EX) ?
+  NVME_PR_CAP_WR_EX : 0;
+res |= (block_pr_cap & BLK_PR_CAP_EX_AC) ?
+  NVME_PR_CAP_EX_AC : 0;
+res |= (block_pr_cap & BLK_PR_CAP_WR_EX_RO) ?
+  NVME_PR_CAP_WR_EX_RO : 0;
+res |= (block_pr_cap & BLK_PR_CAP_EX_AC_RO) ?
+  NVME_PR_CAP_EX_AC_RO : 0;
+res |= (block_pr_cap & BLK_PR_CAP_WR_EX_AR) ?
+  NVME_PR_CAP_WR_EX_AR : 0;
+res |= (block_pr_cap & BLK_PR_CAP_EX_AC_AR) ?
+  NVME_PR_CAP_EX_AC_AR : 0;
+
+return res;
+}
+
 typedef struct NvmeSQueue {
 struct NvmeCtrl *ctrl;
 uint16_tsqid;
-- 
2.20.1




[PATCH v11 08/10] hw/nvme: enable ONCS and rescap function

2024-09-09 Thread Changqi Lu
This commit sets the reservations bit in ONCS to advertise
reservation support at the controller level. It also fills in the
namespace rescap field based on the reservation capabilities
reported by the backend driver.

Reviewed-by: Klaus Jensen 
Signed-off-by: Changqi Lu 
Signed-off-by: zhenwei pi 
Reviewed-by: Stefan Hajnoczi 
---
 hw/nvme/ctrl.c   | 3 ++-
 hw/nvme/ns.c | 5 +
 include/block/nvme.h | 2 +-
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 127c3d2383..ad212de723 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -8248,7 +8248,8 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
 id->nn = cpu_to_le32(NVME_MAX_NAMESPACES);
 id->oncs = cpu_to_le16(NVME_ONCS_WRITE_ZEROES | NVME_ONCS_TIMESTAMP |
NVME_ONCS_FEATURES | NVME_ONCS_DSM |
-   NVME_ONCS_COMPARE | NVME_ONCS_COPY);
+   NVME_ONCS_COMPARE | NVME_ONCS_COPY |
+   NVME_ONCS_RESERVATIONS);
 
 /*
  * NOTE: If this device ever supports a command set that does NOT use 0x0
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index ea8db175db..a5c903d727 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -20,6 +20,7 @@
 #include "qemu/bitops.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/block-backend.h"
+#include "block/block_int.h"
 
 #include "nvme.h"
 #include "trace.h"
@@ -33,6 +34,7 @@ void nvme_ns_init_format(NvmeNamespace *ns)
 BlockDriverInfo bdi;
 int npdg, ret;
 int64_t nlbas;
+uint8_t blk_pr_cap;
 
 ns->lbaf = id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)];
 ns->lbasz = 1 << ns->lbaf.ds;
@@ -55,6 +57,9 @@ void nvme_ns_init_format(NvmeNamespace *ns)
 }
 
 id_ns->npda = id_ns->npdg = npdg - 1;
+
+blk_pr_cap = blk_bs(ns->blkconf.blk)->bl.pr_cap;
+id_ns->rescap = block_pr_cap_to_nvme(blk_pr_cap);
 }
 
 static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 8b125f7769..9b9eaeb3a7 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -1251,7 +1251,7 @@ enum NvmeIdCtrlOncs {
 NVME_ONCS_DSM   = 1 << 2,
 NVME_ONCS_WRITE_ZEROES  = 1 << 3,
 NVME_ONCS_FEATURES  = 1 << 4,
-NVME_ONCS_RESRVATIONS   = 1 << 5,
+NVME_ONCS_RESERVATIONS  = 1 << 5,
 NVME_ONCS_TIMESTAMP = 1 << 6,
 NVME_ONCS_VERIFY= 1 << 7,
 NVME_ONCS_COPY  = 1 << 8,
-- 
2.20.1




Re: [PATCH v2 17/17] vfio/migration: Multifd device state transfer support - send side

2024-09-09 Thread Avihai Horon



On 27/08/2024 20:54, Maciej S. Szmigiero wrote:

External email: Use caution opening links or attachments


From: "Maciej S. Szmigiero" 

Implement the multifd device state transfer via additional per-device
thread inside save_live_complete_precopy_thread handler.

Switch between doing the data transfer in the new handler and doing it
in the old save_state handler depending on the
x-migration-multifd-transfer device property value.

Signed-off-by: Maciej S. Szmigiero 
---
  hw/vfio/migration.c   | 169 ++
  hw/vfio/trace-events  |   2 +
  include/hw/vfio/vfio-common.h |   1 +
  3 files changed, 172 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 57c1542528dc..67996aa2df8b 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -655,6 +655,16 @@ static int vfio_save_setup(QEMUFile *f, void *opaque, Error **errp)
  uint64_t stop_copy_size = VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE;
  int ret;

+/* Make a copy of this setting at the start in case it is changed mid-migration */
+migration->multifd_transfer = vbasedev->migration_multifd_transfer;


Should VFIO multifd be controlled by the main migration multifd 
capability, with the per-VFIO-device migration_multifd_transfer property 
made immutable and enabled by default?
Then we would have a single point of configuration (and an extra one per 
VFIO device just to disable it for backward compatibility).

Unless there are other benefits to having this property configurable?


+
+if (migration->multifd_transfer && !migration_has_device_state_support()) {
+error_setg(errp,
+   "%s: Multifd device transfer requested but unsupported in the current config",
+   vbasedev->name);
+return -EINVAL;
+}
+
  qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE);

  vfio_query_stop_copy_size(vbasedev, &stop_copy_size);
@@ -835,10 +845,20 @@ static int vfio_save_iterate(QEMUFile *f, void *opaque)
  static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
  {
  VFIODevice *vbasedev = opaque;
+VFIOMigration *migration = vbasedev->migration;
  ssize_t data_size;
  int ret;
  Error *local_err = NULL;

+if (migration->multifd_transfer) {
+/*
+ * Emit dummy NOP data, vfio_save_complete_precopy_thread()
+ * does the actual transfer.
+ */
+qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);


There are three places where we send this dummy end of state, maybe 
worth extracting it to a helper? I.e., vfio_send_end_of_state() and then 
document there the rationale.



+return 0;
+}
+
  trace_vfio_save_complete_precopy_started(vbasedev->name);

  /* We reach here with device state STOP or STOP_COPY only */
@@ -864,12 +884,159 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
  return ret;
  }

+static int vfio_save_complete_precopy_async_thread_config_state(VFIODevice *vbasedev,
+char *idstr,
+uint32_t instance_id,
+uint32_t idx)
+{
+g_autoptr(QIOChannelBuffer) bioc = NULL;
+QEMUFile *f = NULL;
+int ret;
+g_autofree VFIODeviceStatePacket *packet = NULL;
+size_t packet_len;
+
+bioc = qio_channel_buffer_new(0);
+qio_channel_set_name(QIO_CHANNEL(bioc), "vfio-device-config-save");
+
+f = qemu_file_new_output(QIO_CHANNEL(bioc));
+
+ret = vfio_save_device_config_state(f, vbasedev, NULL);
+if (ret) {
+return ret;


Need to close f in this case.


+}
+
+ret = qemu_fflush(f);
+if (ret) {
+goto ret_close_file;
+}
+
+packet_len = sizeof(*packet) + bioc->usage;
+packet = g_malloc0(packet_len);
+packet->idx = idx;
+packet->flags = VFIO_DEVICE_STATE_CONFIG_STATE;
+memcpy(&packet->data, bioc->data, bioc->usage);
+
+if (!multifd_queue_device_state(idstr, instance_id,
+(char *)packet, packet_len)) {
+ret = -1;


goto ret_close_file?


+}
+
+bytes_transferred += packet_len;


bytes_transferred is a global variable. Now that we access it from 
multiple threads it should be protected.
Note that the VFIO device data is now also reported in the multifd stats 
(if I am not mistaken). Is this the behavior we want? Maybe we should 
enhance multifd stats to distinguish between RAM data and device data?



+
+ret_close_file:


Rename to "out" as we only have one exit point?


+g_clear_pointer(&f, qemu_fclose);


f is a local variable, wouldn't qemu_fclose(f) be enough here?


+return ret;
+}
+
+static int vfio_save_complete_precopy_thread(char *idstr,
+ uint32_t instance_id,
+ bool *abort_flag,
+  

Re: [PATCH v4 1/2] target/loongarch: Add loongson binary translation feature

2024-09-09 Thread gaosong




在 2024/9/4 下午2:18, Bibo Mao 写道:

Loongson Binary Translation (LBT) is used to accelerate binary
translation, which contains 4 scratch registers (scr0 to scr3), x86/ARM
eflags (eflags) and x87 fpu stack pointer (ftop).

Now the LBT feature is added in KVM mode; it is not supported in TCG mode
since it is not emulated. The feature variable lbt is added with OnOffAuto
type. If the lbt feature is not supported by the KVM host, an error is
reported when lbt=on is given on the command line.

If there is no lbt parameter on the command line, it checks whether the
KVM host supports the lbt feature and sets the corresponding value in cpucfg.

Signed-off-by: Bibo Mao 
---
  target/loongarch/cpu.c| 24 +++
  target/loongarch/cpu.h|  6 +++
  target/loongarch/kvm/kvm.c| 57 ++-
  target/loongarch/loongarch-qmp-cmds.c |  2 +-
  4 files changed, 87 insertions(+), 2 deletions(-)



Reviewed-by: Song Gao 

Thanks
Song Gao




Re: [PATCH v4 2/2] target/loongarch: Implement lbt registers save/restore function

2024-09-09 Thread gaosong




在 2024/9/4 下午2:18, Bibo Mao 写道:

Six registers scr0 - scr3, eflags and ftop are added in percpu vmstate.
And two functions kvm_loongarch_get_lbt/kvm_loongarch_put_lbt are added
to save/restore lbt registers.

Signed-off-by: Bibo Mao 
---
  target/loongarch/cpu.h | 12 
  target/loongarch/kvm/kvm.c | 60 ++
  target/loongarch/machine.c | 24 +++
  3 files changed, 96 insertions(+)



Reviewed-by: Song Gao 

Thanks
Song Gao




Re: [PATCH v2] virtio: kconfig: memory devices are PCI only

2024-09-09 Thread David Hildenbrand

On 06.09.24 12:16, Paolo Bonzini wrote:

Virtio memory devices rely on PCI BARs to expose the contents of memory.
Because of this they cannot be used (yet) with virtio-mmio or virtio-ccw.
In fact the code that is common to virtio-mem and virtio-pmem, which
is in hw/virtio/virtio-md-pci.c, is only included if CONFIG_VIRTIO_PCI
is set.  Reproduce the same condition in the Kconfig file, only allowing
VIRTIO_MEM and VIRTIO_PMEM to be defined if the transport supports it.

Without this patch it is possible to create a configuration with
CONFIG_VIRTIO_PCI=n and CONFIG_VIRTIO_MEM=y, but that causes a
linking failure.


I'll queue this to

https://github.com/davidhildenbrand/qemu.git mem-next

and drop it if someone beats me to up-streaming it :)

--
Cheers,

David / dhildenb




Re: [PULL 0/3] m68k patches

2024-09-09 Thread Peter Maydell
On Sun, 8 Sept 2024 at 14:11, Thomas Huth  wrote:
>
>  Hi!
>
> The following changes since commit 1581a0bc928d61230ed6e43bcb83f2f6737d0bc0:
>
>   Merge tag 'pull-ufs-20240906' of https://gitlab.com/jeuk20.kim/qemu into 
> staging (2024-09-06 15:27:43 +0100)
>
> are available in the Git repository at:
>
>   https://gitlab.com/huth/qemu.git tags/pull-request-2024-09-08
>
> for you to fetch changes up to df827aace663fdd9c432e2ff76fb13d20cbc0ca4:
>
>   hw/nubus/nubus-device: Range check 'slot' property (2024-09-08 11:49:49 
> +0200)
>
> 
> * Fix Coverity issues in mcf5208evb and nubus machines
> * Add URLs for mcf5208evb datasheets
>
> 
>
> Peter Maydell (3):
>   hw/m68k/mcf5208: Avoid shifting off end of integer
>   hw/m68k/mcf5208: Add URLs for datasheets
>   hw/nubus/nubus-device: Range check 'slot' property


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/9.2
for any user-visible changes.

-- PMM



Re: [PATCH v3 0/4] virtio-mem: Implement support for suspend+wake-up with plugged memory

2024-09-09 Thread David Hildenbrand

On 04.09.24 12:37, Juraj Marcin wrote:

Currently, the virtio-mem device would unplug all the memory with any
reset request, including when the machine wakes up from a suspended
state (deep sleep). This would lead to a loss of the contents of the
guest memory and therefore is disabled by the virtio-mem Linux Kernel
driver unless the VIRTIO_MEM_F_PERSISTENT_SUSPEND virtio feature is
exposed. [1]

To make deep sleep with virtio-mem possible, we need to differentiate
cold start reset from wake-up reset. The first patch updates
qemu_system_reset() and MachineClass children to accept ResetType
instead of ShutdownCause, which then could be passed down the device
tree. The second patch then introduces the new reset type for the
wake-up event and updates the i386 wake-up method (only architecture
using the explicit wake-up method).

The third patch replaces LegacyReset with the Resettable interface in
virtio-mem, so the memory device can access the reset type in the hold
phase. The last patch of the series implements the final support in the
hold phase of the virtio-mem reset callback and exposes
VIRTIO_MEM_F_PERSISTENT_SUSPEND to the kernel.

[1]: https://lore.kernel.org/all/20240318120645.105664-1-da...@redhat.com/


Thanks, I'll queue this to

https://github.com/davidhildenbrand/qemu.git mem-next

@Peter, it would be great if you could have another look at patch #2, 
thanks.


--
Cheers,

David / dhildenb




Re: [RFC v3 3/3] vhost: Allocate memory for packed vring

2024-09-09 Thread Eugenio Perez Martin
On Sun, Sep 8, 2024 at 9:47 PM Sahil  wrote:
>
> Hi,
>
> On Friday, August 30, 2024 4:18:31 PM GMT+5:30 Eugenio Perez Martin wrote:
> > On Fri, Aug 30, 2024 at 12:20 PM Sahil  wrote:
> > > Hi,
> > >
> > > On Tuesday, August 27, 2024 9:00:36 PM GMT+5:30 Eugenio Perez Martin 
> > > wrote:
> > > > On Wed, Aug 21, 2024 at 2:20 PM Sahil  wrote:
> > > > > [...]
> > > > > I have been trying to test my changes so far as well. I am not very
> > > > > clear
> > > > > on a few things.
> > > > >
> > > > > Q1.
> > > > > I built QEMU from source with my changes and followed the vdpa_sim +
> > > > > vhost_vdpa tutorial [1]. The VM seems to be running fine. How do I
> > > > > check
> > > > > if the packed format is being used instead of the split vq format for
> > > > > shadow virtqueues? I know the packed format is used when virtio_vdev
> > > > > has
> > > > > got the VIRTIO_F_RING_PACKED bit enabled. Is there a way of checking
> > > > > that
> > > > > this is the case?
> > > >
> > > > You can see the features that the driver acked from the guest by
> > > > checking sysfs. Once you know the PCI BFN from lspci:
> > > > # lspci -nn|grep '\[1af4:1041\]'
> > > > 01:00.0 Ethernet controller [0200]: Red Hat, Inc. Virtio 1.0 network
> > > > device [1af4:1041] (rev 01)
> > > > # cut -c 35 /sys/devices/pci0000:00/0000:00:02.0/0000:01:00.0/virtio0/features
> > > > 0
> > > >
> > > > Also, you can check from QEMU by simply tracing if your functions are
> > > > being called.
> > > >
> > > > > Q2.
> > > > > What's the recommended way to see what's going on under the hood? I
> > > > > tried
> > > > > using the -D option so QEMU's logs are written to a file but the file
> > > > > was
> > > > > empty. Would using qemu with -monitor stdio or attaching gdb to the
> > > > > QEMU
> > > > > VM be worthwhile?
> > > >
> > > > You need to add --trace options with the regex you want to get to
> > > > enable any output. For example, --trace 'vhost_vdpa_*' print all the
> > > > trace_vhost_vdpa_* functions.
> > > >
> > > > If you want to speed things up, you can just replace the interesting
> > > > trace_... functions with fprintf(stderr, ...). We can add the trace
> > > > ones afterwards.
> > >
> > > Understood. I am able to trace the functions that are being called with
> > > fprintf. I'll stick with fprintf for now.
> > >
> > > I realized that packed vqs are not being used in the test environment. I
> > > see that in "hw/virtio/vhost-shadow-virtqueue.c", svq->is_packed is set
> > > to 0 and that calls vhost_svq_add_split(). I am not sure how one enables
> > > the packed feature bit. I don't know if this is an environment issue.
> > >
> > > I built qemu from the latest source with my changes on top of it. I
> > > followed this article [1] to set up the environment.
> > >
> > > On the host machine:
> > >
> > > $ uname -a
> > > Linux fedora 6.10.5-100.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Aug 14
> > > 15:49:25 UTC 2024 x86_64 GNU/Linux
> > >
> > > $ ./qemu/build/qemu-system-x86_64 --version
> > > QEMU emulator version 9.0.91
> > >
> > > $ vdpa -V
> > > vdpa utility, iproute2-6.4.0
> > >
> > > All the relevant vdpa modules have been loaded in accordance with [1].
> > >
> > > $ lsmod | grep -iE "(vdpa|virtio)"
> > > vdpa_sim_net    12288  0
> > > vdpa_sim        24576  1 vdpa_sim_net
> > > vringh          32768  2 vdpa_sim,vdpa_sim_net
> > > vhost_vdpa      32768  2
> > > vhost           65536  1 vhost_vdpa
> > > vhost_iotlb     16384  4 vdpa_sim,vringh,vhost_vdpa,vhost
> > > vdpa            36864  3 vdpa_sim,vhost_vdpa,vdpa_sim_net
> > >
> > > $ ls -l /sys/bus/vdpa/devices/vdpa0/driver
> > > lrwxrwxrwx. 1 root root 0 Aug 30 11:25 /sys/bus/vdpa/devices/vdpa0/driver
> > > -> ../../bus/vdpa/drivers/vhost_vdpa
> > >
> > > In the output of the following command, I see ANY_LAYOUT is supported.
> > > According to virtio_config.h [2] in the linux kernel, this represents the
> > > layout of descriptors. This refers to split and packed vqs, right?
> > >
> > > $ vdpa mgmtdev show
> > >
> > > vdpasim_net:
> > >   supported_classes net
> > >   max_supported_vqs 3
> > >   dev_features MTU MAC STATUS CTRL_VQ CTRL_MAC_ADDR ANY_LAYOUT VERSION_1
> > >   ACCESS_PLATFORM
> > > $ vdpa dev show -jp
> > > {
> > >     "dev": {
> > >         "vdpa0": {
> > >             "type": "network",
> > >             "mgmtdev": "vdpasim_net",
> > >             "vendor_id": 0,
> > >             "max_vqs": 3,
> > >             "max_vq_size": 256
> > >         }
> > >     }
> > > }
> > >
> > > I started the VM by running:
> > >
> > > $ sudo ./qemu/build/qemu-system-x86_64 \
> > > -enable-kvm \
> > > -drive file=//home/ig91/fedora_qemu_test_vm/L1.qcow2,media=disk,if=virtio
> > > \
> > > -net nic,model=virtio \
> > > -net user,hostfwd=tcp::2226-:22 \
> > > -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0 \
> > > -device
> > > virtio-net-pci,netdev=vhost-vdpa0,bus=pci.0,addr=0x7,disable-legacy=on,di
> > > sable-m

Re: [PATCH v11 08/11] vfio/migration: Implement VFIO migration protocol v2

2024-09-09 Thread Avihai Horon



On 05/09/2024 21:31, Peter Xu wrote:


On Thu, Sep 05, 2024 at 07:45:43PM +0300, Avihai Horon wrote:

Does it also mean then that the currently reported stop-size - precopy-size
will be very close to the constant non-iterable data size?

It's not constant; it can change while the VM is running.

I wonder how heavy is VFIO_DEVICE_FEATURE_MIG_DATA_SIZE ioctl.

I just gave it a quick shot with a busy VM migrating and estimate() is
invoked only every ~100ms.

VFIO might be different, but I wonder whether we can fetch stop-size in
estimate() somehow, so it's still a pretty fast estimate() meanwhile we
avoid the rest of exact() calls (which are destined to be useless without
VFIO).

IIUC so far the estimate()/exact() was because ram sync is heavy when
exact().  When idle it's 80+ms now for 32G VM with current master (which
has a bug and I'm fixing it up [1]..), even if after the fix it's 3ms (I
think both numbers contain dirty bitmap sync for both vfio and kvm).  So in
that case maybe we can still try fetching stop-size only for both
estimate() and exact(), but only sync bitmap in exact().


IIUC, the end goal is to prevent the migration thread from spinning 
uselessly in pre-copy in such scenarios, right?
If we do eventually call get stop-copy-size in estimate(), we will move 
the spinning from "exact() -> estimate() -> exact() -> estimate() ..." 
to "estimate() -> estimate() -> ...".
If so, what benefit would we get from this? We would only move the 
useless work to another place.
Shouldn't we go directly for the non-precopy-able vs. precopy-able 
report that you suggested?


Thanks.




Re: [PATCH] hw/loongarch: Add acpi SPCR table support

2024-09-09 Thread gaosong

在 2024/9/7 下午3:30, Bibo Mao 写道:

The Serial Port Console Redirection (SPCR) table can be used for
default serial port selection, similar to the chosen stdout-path
selection with the FDT method.

With the ACPI SPCR table added, the early debug console can be parsed
from the SPCR table with the simple kernel parameter earlycon rather
than the full earlycon=uart,mmio,0x1fe001e0.
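For illustration, the difference on the guest kernel command line looks
like this (using the MMIO address quoted above):

```
# Without an SPCR table, the early console must be fully specified:
earlycon=uart,mmio,0x1fe001e0
# With the SPCR table present, the kernel discovers it by itself:
earlycon
```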

Signed-off-by: Bibo Mao 
---
  hw/loongarch/acpi-build.c | 40 +++
  1 file changed, 40 insertions(+)

Reviewed-by: Song Gao 

Thanks
Song Gao

diff --git a/hw/loongarch/acpi-build.c b/hw/loongarch/acpi-build.c
index 2638f87434..3912c8d307 100644
--- a/hw/loongarch/acpi-build.c
+++ b/hw/loongarch/acpi-build.c
@@ -241,6 +241,44 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
  acpi_table_end(linker, &table);
  }
  
+/*
+ * Serial Port Console Redirection Table (SPCR)
+ * https://learn.microsoft.com/en-us/windows-hardware/drivers/serports/serial-port-console-redirection-table
+ */
+static void
+spcr_setup(GArray *table_data, BIOSLinker *linker, MachineState *machine)
+{
+LoongArchVirtMachineState *lvms;
+AcpiSpcrData serial = {
+.interface_type = 0,   /* 16550 compatible */
+.base_addr.id = AML_AS_SYSTEM_MEMORY,
+.base_addr.width = 32,
+.base_addr.offset = 0,
+.base_addr.size = 1,
+.base_addr.addr = VIRT_UART_BASE,
+.interrupt_type = 0,   /* Interrupt not supported */
+.pc_interrupt = 0,
+.interrupt = VIRT_UART_IRQ,
+.baud_rate = 7,/* 115200 */
+.parity = 0,
+.stop_bits = 1,
+.flow_control = 0,
+.terminal_type = 3,/* ANSI */
+.language = 0, /* Language */
+.pci_device_id = 0xffff,   /* not a PCI device */
+.pci_vendor_id = 0xffff,   /* not a PCI device */
+.pci_bus = 0,
+.pci_device = 0,
+.pci_function = 0,
+.pci_flags = 0,
+.pci_segment = 0,
+};
+
+lvms = LOONGARCH_VIRT_MACHINE(machine);
+build_spcr(table_data, linker, &serial, 2, lvms->oem_id,
+   lvms->oem_table_id);
+}
+
  typedef
  struct AcpiBuildState {
  /* Copy of table in RAM (for patching). */
@@ -477,6 +515,8 @@ static void acpi_build(AcpiBuildTables *tables, MachineState *machine)
  
  acpi_add_table(table_offsets, tables_blob);

  build_srat(tables_blob, tables->linker, machine);
+acpi_add_table(table_offsets, tables_blob);
+spcr_setup(tables_blob, tables->linker, machine);
  
  if (machine->numa_state->num_nodes) {

  if (machine->numa_state->have_numa_distance) {

base-commit: 7b87a25f49a301d3377f3e71e0b4a62540c6f6e4





Re: [PATCH v4 2/2] target/loongarch: Implement lbt registers save/restore function

2024-09-09 Thread gaosong

在 2024/9/9 下午7:52, gaosong 写道:



在 2024/9/4 下午2:18, Bibo Mao 写道:

Six registers scr0 - scr3, eflags and ftop are added in percpu vmstate.
And two functions kvm_loongarch_get_lbt/kvm_loongarch_put_lbt are added
to save/restore lbt registers.

Signed-off-by: Bibo Mao 
---
  target/loongarch/cpu.h | 12 
  target/loongarch/kvm/kvm.c | 60 ++
  target/loongarch/machine.c | 24 +++
  3 files changed, 96 insertions(+)



Reviewed-by: Song Gao 

Thanks
Song Gao

Hi, this patch needs a rebase.

Applying: target/loongarch: Implement lbt registers save/restore function
error: sha1 information is lacking or useless (target/loongarch/kvm/kvm.c).
error: could not build fake ancestor
Patch failed at 0001 target/loongarch: Implement lbt registers 
save/restore function



Thanks.
Song Gao.




Re: [PATCH v2 00/15] target/cris: Remove the deprecated CRIS target

2024-09-09 Thread Philippe Mathieu-Daudé

Hi Edgar,

On 4/9/24 16:35, Philippe Mathieu-Daudé wrote:

Since v1:
- Split in smaller patches (pm215)

The CRIS target is deprecated since v9.0 (commit
c7bbef40234 "docs: mark CRIS support as deprecated").

Remove:
- Buildsys / CI infra
- User emulation
- System emulation (axis-dev88 machine and ETRAX devices)
- Tests


You acked the deprecation commit (c7bbef4023).
No objection to the removal? I'd rather have your
explicit Acked-by before merging this.

Thanks,

Phil.


Philippe Mathieu-Daudé (15):
   tests/tcg: Remove CRIS libc test files
   tests/tcg: Remove CRIS bare test files
   buildsys: Remove CRIS cross container
   linux-user: Remove support for CRIS target
   hw/cris: Remove the axis-dev88 machine
   hw/cris: Remove image loader helper
   hw/intc: Remove TYPE_ETRAX_FS_PIC device
   hw/char: Remove TYPE_ETRAX_FS_SERIAL device
   hw/net: Remove TYPE_ETRAX_FS_ETH device
   hw/dma: Remove ETRAX_FS DMA device
   hw/timer: Remove TYPE_ETRAX_FS_TIMER device
   system: Remove support for CRIS target
   target/cris: Remove the deprecated CRIS target
   disas: Remove CRIS disassembler
   seccomp: Remove check for CRIS host





Re: [PATCH v2 0/5] tmp105: Improvements and fixes

2024-09-09 Thread Philippe Mathieu-Daudé

Hi Cédric,

On 6/9/24 17:49, Philippe Mathieu-Daudé wrote:

Respin of Guenter fixes with:
- Use registerfields API
- Clear OS bit in WRITE path


Since our mails crossed (you reviewed v1 while I was
posting v2), do you mind having another look at this
v2? At least patch #4 isn't yet reviewed.

Thanks,

Phil.


Supersedes: <20240906132912.3826089-1-li...@roeck-us.net>

Guenter Roeck (2):
   hw/sensor/tmp105: Coding style fixes
   hw/sensor/tmp105: Lower 4 bit of limit registers are always 0

Philippe Mathieu-Daudé (3):
   hw/sensor/tmp105: Use registerfields API
   hw/sensor/tmp105: Pass 'oneshot' argument to tmp105_alarm_update()
   hw/sensor/tmp105: OS (one-shot) bit in config register always returns
 0

  hw/sensor/tmp105.c | 66 ++
  1 file changed, 37 insertions(+), 29 deletions(-)






Re: [PATCH v5 00/16] hw/char/pl011: Implement TX (async) FIFO to avoid blocking the main loop

2024-09-09 Thread Philippe Mathieu-Daudé

On 7/9/24 12:38, Peter Maydell wrote:

On Sat, 7 Sept 2024 at 06:42, Philippe Mathieu-Daudé  wrote:


Hi Peter,

On 19/7/24 20:10, Philippe Mathieu-Daudé wrote:


Philippe Mathieu-Daudé (16):



hw/char/pl011: Remove unused 'readbuff' field
hw/char/pl011: Move pl011_put_fifo() earlier
hw/char/pl011: Move pl011_loopback_enabled|tx() around
hw/char/pl011: Split RX/TX path of pl011_reset_fifo()
hw/char/pl011: Extract pl011_write_txdata() from pl011_write()
hw/char/pl011: Extract pl011_read_rxdata() from pl011_read()
hw/char/pl011: Warn when using disabled transmitter



hw/char/pl011: Rename RX FIFO methods



If you don't mind I'll queue the reviewed 2-8 & 11 to ease my workflow,
before respinning the next version.


Sure, that's fine. I don't have anything pl011 related in
my queue that would conflict.


Great, thank you!




[RFC PATCH] tests/qtest: Don't parallelize migration-test

2024-09-09 Thread Peter Maydell
The migration-test is a long-running test whose subtests all launch
at least two QEMU processes.  This means that if for example the host
has 4 CPUs then 'make check' defaults to a parallelism of 5, and if
we launch 5 migration-tests in parallel then we will be running 10
QEMU instances on a 4 CPU system.  If the system is not very fast
then the test can spuriously time out because the different tests are
all stealing CPU from each other.  This seems to particularly be a
problem on our S390 CI job and the cross-i686-tci CI job.

Force meson to run migration-test non-parallel, so there is never any
other test running at the same time as it.  This will slow down
overall test execution time somewhat, but hopefully make our CI less
flaky.

The downside is that because each migration-test instance runs for
between 2 and 5 minutes and we run it for five architectures this
significantly increases the runtime.  For an all-architectures build
on my local machine 'make check -j8' goes from

 real8m19.127s
 user31m47.534s
 sys 19m42.650s

to

 real20m31.218s
 user32m48.712s
 sys 19m52.133s

more than doubling the wallclock runtime.

Signed-off-by: Peter Maydell 
---
Also, looking at these figures we spend a *lot* of our overall
'make check' time on migration-test. Do we really need to do
that much for every architecture?

It's unfortunate that meson doesn't let us say "parallel is
OK, but not very parallel". One other approach would be
to have mtest2make say "run tests at half the parallelism that
-jN suggests, rather than at that parallelism", I guess...
---
 tests/qtest/meson.build | 16 
 1 file changed, 16 insertions(+)

diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index fc852f3d8ba..dbf2b8e2be1 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -17,6 +17,21 @@ slow_qtests = {
   'vmgenid-test': 610,
 }
 
+# Tests which override the default of "can run in parallel".
+# Don't use this to work around test bugs which prevent parallelism.
+# Do document why we need to make a particular test serialized.
+# Do be sparing with use of this: tests listed here will not be
+# run in parallel with any other test, not merely not with other
+# instances of themselves.
+#
+# The migration-test's subtests will each kick off two QEMU processes,
+# so allowing multiple migration-tests in parallel can overload the
+# host system and result in intermittent timeouts. So we only want to
+# run one migration-test at once.
+qtests_parallelism = {
+  'migration-test': false,
+}
+
 qtests_generic = [
   'cdrom-test',
   'device-introspect-test',
@@ -411,6 +426,7 @@ foreach dir : target_dirs
  protocol: 'tap',
  timeout: slow_qtests.get(test, 60),
  priority: slow_qtests.get(test, 60),
+ is_parallel: qtests_parallelism.get(test, true),
  suite: ['qtest', 'qtest-' + target_base])
   endforeach
 endforeach
-- 
2.34.1




Re: [PATCH for-9.2 00/53] arm: Drop deprecated boards

2024-09-09 Thread Philippe Mathieu-Daudé

Hi,

On 3/9/24 18:06, Peter Maydell wrote:

This patchset removes the various Arm machines which we deprecated
for the 9.0 release and are therefore allowed to remove for the 9.2
release:
  akita, borzoi, cheetah, connex, mainstone, n800, n810,
  spitz, terrier, tosa, verdex, z2



The series includes removal of some code which while not strictly
specific to these machines was in practice used only by them:
  * the OneNAND flash memory device
  * the PCMCIA subsystem
  * the MUSB USB2.0 OTG USB controller chip (hcd-musb)



thanks
-- PMM

Peter Maydell (53):
   hw/input: Drop ADS7846 device
   hw/adc: Remove MAX111X device
   hw/gpio: Remove MAX7310 device
   hw/input: Remove tsc2005 touchscreen controller
   hw/input: Remove tsc210x device
   hw/rtc: Remove twl92230 device
   hw/input: Remove lm832x device
   hw/usb: Remove tusb6010 USB controller
   hw/usb: Remove MUSB USB host controller


Some of these devices are user-creatable and only rely on a bus
(not a particular removed machine), so could potentially be used
on other maintained machines which expose a similar bus.
We don't have in-tree (tests/) examples, but I wonder if it is OK
to remove them without first explicitly deprecating them in
docs/about/deprecated.rst. I wouldn't want to surprise users when
9.2 is released. Maybe this isn't an issue, but I prefer to mention
it now to be sure.

Regards,

Phil.



Re: [PATCH for-9.2 00/53] arm: Drop deprecated boards

2024-09-09 Thread Peter Maydell
On Mon, 9 Sept 2024 at 14:41, Philippe Mathieu-Daudé  wrote:
>
> Hi,
>
> On 3/9/24 18:06, Peter Maydell wrote:
> > This patchset removes the various Arm machines which we deprecated
> > for the 9.0 release and are therefore allowed to remove for the 9.2
> > release:
> >   akita, borzoi, cheetah, connex, mainstone, n800, n810,
> >   spitz, terrier, tosa, verdex, z2
>
> > The series includes removal of some code which while not strictly
> > specific to these machines was in practice used only by them:
> >   * the OneNAND flash memory device
> >   * the PCMCIA subsystem
> >   * the MUSB USB2.0 OTG USB controller chip (hcd-musb)
>
> > thanks
> > -- PMM
> >
> > Peter Maydell (53):
> >hw/input: Drop ADS7846 device
> >hw/adc: Remove MAX111X device
> >hw/gpio: Remove MAX7310 device
> >hw/input: Remove tsc2005 touchscreen controller
> >hw/input: Remove tsc210x device
> >hw/rtc: Remove twl92230 device
> >hw/input: Remove lm832x device
> >hw/usb: Remove tusb6010 USB controller
> >hw/usb: Remove MUSB USB host controller
>
> Some of these devices are user-creatable and only rely on a bus
> (not a particular removed machine), so could potentially be used
> on other maintained machines which expose a similar bus.

Which ones in particular? Almost all of them are sysbus.
At least one of them that I looked at (lm832x) is an I2C
device but it also requires the board to wire up a GPIO line
and to call a specific C function to inject key events, so it's
not actually generally usable.

> We don't have in-tree (tests/) examples, but I wonder if it is OK
> to remove them without first explicitly deprecating them in
> docs/about/deprecated.rst. I wouldn't surprise users when 9.2 is
> release. Maybe this isn't an issue, but I prefer to mention it
> now to be sure.

I think this is unlikely to be a problem, but if you have
a specific device you think might be a problem we can
look at whether it seems likely (e.g. whether a web search
turns up users using it in odd ways).

thanks
-- PMM



Re: [PATCH v2 00/11] s390: Convert virtio-ccw, cpu to three-phase reset, and followup cleanup

2024-09-09 Thread Nina Schoetterl-Glausch
On Fri, 2024-09-06 at 15:38 +0100, Peter Maydell wrote:
> On Fri, 30 Aug 2024 at 15:58, Peter Maydell  wrote:
> > 
> > The main aim of this patchseries is to remove the two remaining uses
> > of device_class_set_parent_reset() in the tree, which are virtio-ccw
> > and the s390 CPU class. Doing that lets us do some followup cleanup.
> > (The diffstat looks alarming but is almost all coccinelle automated
> > changes.)
> > 
> > Changes v1->v2:
> >  * new patch 1 to convert hw/s390/ccw-device
> >(fixes bug discovered via s390 CI testing in v1)
> >  * a couple of patches are already upstream
> >  * in the target/s390 cpu patch, fix sigp_cpu_reset() to use
> >RESET_TYPE_S390_CPU_NORMAL
> >  * new patches 10, 11 which take advantage of the new function
> >device_class_set_legacy_reset() to allow us to replace the
> >generic Resettable transitional_function machinery with a
> >simple wrapper that adapts from the API of the hold method
> >to the one used by the legacy reset method
> > 
> > Patches 1, 10, 11 need review. I believe that patch 1 should have
> > fixed the intermittent s390 issue we found with v1 of the patchset,
> > but if you could run these through the s390 CI again I'd
> > appreciate it.
> 
> I'm going to apply this series to my target-arm.next queue.
> 
> Let me know if you need more time to CI/test/whatever it on
> the s390 side before it goes upstream.

CI looks good.
> 
> thanks
> -- PMM




[PATCH RFC 05/10] migration: Introduce util functions for periodic CPU throttle

2024-09-09 Thread Hyman Huang
Provide utilities to manage the lifespan of periodic_throttle_thread.
Additionally, provide periodic_throttle_setup to set up the sync mode.
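The thread's "don't synchronize too often" guard used in the patch
below can be sketched in isolation; a minimal sketch with illustrative
names, not the patch's exact code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimum spacing between bitmap syncs, as in the patch below (1000 ms). */
#define SYNC_MIN_INTERVAL_MS 1000

/* Return true when enough time has elapsed since the last sync;
 * the caller skips the periodic sync otherwise. */
static bool periodic_sync_due(int64_t now_ms, int64_t last_sync_ms)
{
    return now_ms >= last_sync_ms + SYNC_MIN_INTERVAL_MS;
}
```

The patch applies this check both in migration_bitmap_sync() and in the
throttle thread, which is also where the skip_sleep flag comes from: a
sync attempt that came too early retries without sleeping again.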

Signed-off-by: Hyman Huang 
---
 migration/ram.c| 98 +-
 migration/ram.h|  4 ++
 migration/trace-events |  3 ++
 3 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index 23471c9e5a..d9d8ed0fda 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -416,6 +416,10 @@ struct RAMState {
  * RAM migration.
  */
 unsigned int postcopy_bmap_sync_requested;
+
+/* Periodic throttle information */
+bool throttle_running;
+QemuThread throttle_thread;
 };
 typedef struct RAMState RAMState;
 
@@ -1075,7 +1079,13 @@ static void migration_bitmap_sync(RAMState *rs,
 RAMBlock *block;
 int64_t end_time;
 
-if (!periodic) {
+if (periodic) {
+/* Be careful that we don't synchronize too often */
+int64_t curr_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+if (curr_time < rs->time_last_bitmap_sync + 1000) {
+return;
+}
+} else {
 stat64_add(&mig_stats.iteration_count, 1);
 }
 
@@ -1121,6 +1131,92 @@ static void migration_bitmap_sync(RAMState *rs,
 }
 }
 
+static void *periodic_throttle_thread(void *opaque)
+{
+RAMState *rs = opaque;
+bool skip_sleep = false;
+int sleep_duration = migrate_periodic_throttle_interval();
+
+rcu_register_thread();
+
+while (qatomic_read(&rs->throttle_running)) {
+int64_t curr_time;
+/*
+ * The first iteration copies all memory anyhow and has no
+ * effect on guest performance, therefore omit it to avoid
+ * paying extra for the sync penalty.
+ */
+if (stat64_get(&mig_stats.iteration_count) <= 1) {
+continue;
+}
+
+if (!skip_sleep) {
+sleep(sleep_duration);
+}
+
+/* Be careful that we don't synchronize too often */
+curr_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+if (curr_time > rs->time_last_bitmap_sync + 1000) {
+bql_lock();
+trace_migration_periodic_throttle();
+WITH_RCU_READ_LOCK_GUARD() {
+migration_bitmap_sync(rs, false, true);
+}
+bql_unlock();
+skip_sleep = false;
+} else {
+skip_sleep = true;
+}
+}
+
+rcu_unregister_thread();
+
+return NULL;
+}
+
+void periodic_throttle_start(void)
+{
+RAMState *rs = ram_state;
+
+if (!rs) {
+return;
+}
+
+if (qatomic_read(&rs->throttle_running)) {
+return;
+}
+
+trace_migration_periodic_throttle_start();
+
+qatomic_set(&rs->throttle_running, 1);
+qemu_thread_create(&rs->throttle_thread,
+   NULL, periodic_throttle_thread,
+   rs, QEMU_THREAD_JOINABLE);
+}
+
+void periodic_throttle_stop(void)
+{
+RAMState *rs = ram_state;
+
+if (!rs) {
+return;
+}
+
+if (!qatomic_read(&rs->throttle_running)) {
+return;
+}
+
+trace_migration_periodic_throttle_stop();
+
+qatomic_set(&rs->throttle_running, 0);
+qemu_thread_join(&rs->throttle_thread);
+}
+
+void periodic_throttle_setup(bool enable)
+{
+sync_mode = enable ? RAMBLOCK_SYN_MODERN : RAMBLOCK_SYN_LEGACY;
+}
+
 static void migration_bitmap_sync_precopy(RAMState *rs, bool last_stage)
 {
 Error *local_err = NULL;
diff --git a/migration/ram.h b/migration/ram.h
index bc0318b834..f7c7b2e7ad 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -93,4 +93,8 @@ void ram_write_tracking_prepare(void);
 int ram_write_tracking_start(void);
 void ram_write_tracking_stop(void);
 
+/* Periodic throttle */
+void periodic_throttle_start(void);
+void periodic_throttle_stop(void);
+void periodic_throttle_setup(bool enable);
 #endif
diff --git a/migration/trace-events b/migration/trace-events
index c65902f042..5b9db57c8f 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -95,6 +95,9 @@ get_queued_page_not_dirty(const char *block_name, uint64_t tmp_offset, unsigned
 migration_bitmap_sync_start(void) ""
 migration_bitmap_sync_end(uint64_t dirty_pages) "dirty_pages %" PRIu64
migration_bitmap_clear_dirty(char *str, uint64_t start, uint64_t size, unsigned long page) "rb %s start 0x%"PRIx64" size 0x%"PRIx64" page 0x%lx"
+migration_periodic_throttle(void) ""
+migration_periodic_throttle_start(void) ""
+migration_periodic_throttle_stop(void) ""
 migration_throttle(void) ""
migration_dirty_limit_guest(int64_t dirtyrate) "guest dirty page rate limit %" PRIi64 " MB/s"
ram_discard_range(const char *rbname, uint64_t start, size_t len) "%s: start: %" PRIx64 " %zx"
-- 
2.39.1




[PATCH RFC 04/10] qapi/migration: Introduce the iteration-count

2024-09-09 Thread Hyman Huang
The original migration stat dirty-sync-count can no longer
reflect the iteration count due to the introduction of
periodic synchronization in the next commit; add an
iteration count to compensate.
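The intended split between the two counters can be sketched as follows;
simplified illustrative types, not QEMU's actual stat64 code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* dirty_sync_count counts every bitmap sync, including the periodic
 * ones added by the next commit; iteration_count only counts real
 * migration iterations, preserving the old meaning for consumers such
 * as the XBZRLE cache generation and the migration_pass event. */
struct mig_stats_sketch {
    uint64_t dirty_sync_count;
    uint64_t iteration_count;
};

static void bitmap_sync(struct mig_stats_sketch *s, bool periodic)
{
    if (!periodic) {
        s->iteration_count++;
    }
    s->dirty_sync_count++;
}
```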

Signed-off-by: Hyman Huang 
---
 migration/migration-stats.h  |  4 
 migration/migration.c|  1 +
 migration/ram.c  | 12 
 qapi/migration.json  |  6 +-
 tests/qtest/migration-test.c |  2 +-
 5 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/migration/migration-stats.h b/migration/migration-stats.h
index 05290ade76..43ee0f4f05 100644
--- a/migration/migration-stats.h
+++ b/migration/migration-stats.h
@@ -50,6 +50,10 @@ typedef struct {
  * Number of times we have synchronized guest bitmaps.
  */
 Stat64 dirty_sync_count;
+/*
+ * Number of migration iterations processed.
+ */
+Stat64 iteration_count;
 /*
  * Number of times zero copy failed to send any page using zero
  * copy.
diff --git a/migration/migration.c b/migration/migration.c
index 3dea06d577..055d527ff6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1197,6 +1197,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
 info->ram->mbps = s->mbps;
 info->ram->dirty_sync_count =
 stat64_get(&mig_stats.dirty_sync_count);
+info->ram->iteration_count = stat64_get(&mig_stats.iteration_count);
 info->ram->dirty_sync_missed_zero_copy =
 stat64_get(&mig_stats.dirty_sync_missed_zero_copy);
 info->ram->postcopy_requests =
diff --git a/migration/ram.c b/migration/ram.c
index a56634eb46..23471c9e5a 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -594,7 +594,7 @@ static void xbzrle_cache_zero_page(ram_addr_t current_addr)
 /* We don't care if this fails to allocate a new cache page
  * as long as it updated an old one */
 cache_insert(XBZRLE.cache, current_addr, XBZRLE.zero_target_page,
- stat64_get(&mig_stats.dirty_sync_count));
+ stat64_get(&mig_stats.iteration_count));
 }
 
 #define ENCODING_FLAG_XBZRLE 0x1
@@ -620,7 +620,7 @@ static int save_xbzrle_page(RAMState *rs, PageSearchStatus *pss,
 int encoded_len = 0, bytes_xbzrle;
 uint8_t *prev_cached_page;
 QEMUFile *file = pss->pss_channel;
-uint64_t generation = stat64_get(&mig_stats.dirty_sync_count);
+uint64_t generation = stat64_get(&mig_stats.iteration_count);
 
 if (!cache_is_cached(XBZRLE.cache, current_addr, generation)) {
 xbzrle_counters.cache_miss++;
@@ -1075,6 +1075,10 @@ static void migration_bitmap_sync(RAMState *rs,
 RAMBlock *block;
 int64_t end_time;
 
+if (!periodic) {
+stat64_add(&mig_stats.iteration_count, 1);
+}
+
 stat64_add(&mig_stats.dirty_sync_count, 1);
 
 if (!rs->time_last_bitmap_sync) {
@@ -1111,8 +1115,8 @@ static void migration_bitmap_sync(RAMState *rs,
 rs->num_dirty_pages_period = 0;
 rs->bytes_xfer_prev = migration_transferred_bytes();
 }
-if (migrate_events()) {
-uint64_t generation = stat64_get(&mig_stats.dirty_sync_count);
+if (!periodic && migrate_events()) {
+uint64_t generation = stat64_get(&mig_stats.iteration_count);
 qapi_event_send_migration_pass(generation);
 }
 }
diff --git a/qapi/migration.json b/qapi/migration.json
index 8281d4a83b..6d8358c202 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -60,6 +60,9 @@
 # between 0 and @dirty-sync-count * @multifd-channels.  (since
 # 7.1)
 #
+# @iteration-count: The number of iterations since migration started.
+# (since 9.2)
+#
 # Since: 0.14
 ##
 { 'struct': 'MigrationStats',
@@ -72,7 +75,8 @@
'multifd-bytes': 'uint64', 'pages-per-second': 'uint64',
'precopy-bytes': 'uint64', 'downtime-bytes': 'uint64',
'postcopy-bytes': 'uint64',
-   'dirty-sync-missed-zero-copy': 'uint64' } }
+   'dirty-sync-missed-zero-copy': 'uint64',
+   'iteration-count' : 'int' } }
 
 ##
 # @XBZRLECacheStats:
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 9d08101643..2fb10658d4 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -278,7 +278,7 @@ static int64_t read_migrate_property_int(QTestState *who, const char *property)
 
 static uint64_t get_migration_pass(QTestState *who)
 {
-return read_ram_property_int(who, "dirty-sync-count");
+return read_ram_property_int(who, "iteration-count");
 }
 
 static void read_blocktime(QTestState *who)
-- 
2.39.1




[PATCH RFC 08/10] migration: Introduce cpu-responsive-throttle parameter

2024-09-09 Thread Hyman Huang
To enable the responsive throttle that will be implemented
in the next commit, introduce the cpu-responsive-throttle
parameter.

Signed-off-by: Hyman Huang 
---
 migration/migration-hmp-cmds.c |  8 
 migration/options.c| 20 
 migration/options.h|  1 +
 qapi/migration.json| 16 +++-
 4 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index f7b8e06bb4..a3d4d3f62f 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -273,6 +273,10 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
 MigrationParameter_str(
 MIGRATION_PARAMETER_CPU_PERIODIC_THROTTLE_INTERVAL),
 params->cpu_periodic_throttle_interval);
+assert(params->has_cpu_responsive_throttle);
+monitor_printf(mon, "%s: %s\n",
+MigrationParameter_str(MIGRATION_PARAMETER_CPU_RESPONSIVE_THROTTLE),
+params->cpu_responsive_throttle ? "on" : "off");
 assert(params->has_max_cpu_throttle);
 monitor_printf(mon, "%s: %u\n",
 MigrationParameter_str(MIGRATION_PARAMETER_MAX_CPU_THROTTLE),
@@ -529,6 +533,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
 p->has_cpu_periodic_throttle_interval = true;
 visit_type_uint8(v, param, &p->cpu_periodic_throttle_interval, &err);
 break;
+case MIGRATION_PARAMETER_CPU_RESPONSIVE_THROTTLE:
+p->has_cpu_responsive_throttle = true;
+visit_type_bool(v, param, &p->cpu_responsive_throttle, &err);
+break;
 case MIGRATION_PARAMETER_MAX_CPU_THROTTLE:
 p->has_max_cpu_throttle = true;
 visit_type_uint8(v, param, &p->max_cpu_throttle, &err);
diff --git a/migration/options.c b/migration/options.c
index 2dbe275ba0..aa233684ee 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -110,6 +110,8 @@ Property migration_properties[] = {
 DEFINE_PROP_UINT8("x-cpu-periodic-throttle-interval", MigrationState,
   parameters.cpu_periodic_throttle_interval,
   DEFAULT_MIGRATE_CPU_PERIODIC_THROTTLE_INTERVAL),
+DEFINE_PROP_BOOL("x-cpu-responsive-throttle", MigrationState,
+  parameters.cpu_responsive_throttle, false),
 DEFINE_PROP_SIZE("x-max-bandwidth", MigrationState,
   parameters.max_bandwidth, MAX_THROTTLE),
 DEFINE_PROP_SIZE("avail-switchover-bandwidth", MigrationState,
@@ -715,6 +717,13 @@ bool migrate_periodic_throttle(void)
 return s->parameters.cpu_periodic_throttle;
 }
 
+bool migrate_responsive_throttle(void)
+{
+MigrationState *s = migrate_get_current();
+
+return s->parameters.cpu_responsive_throttle;
+}
+
 bool migrate_cpu_throttle_tailslow(void)
 {
 MigrationState *s = migrate_get_current();
@@ -899,6 +908,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
 params->has_cpu_periodic_throttle_interval = true;
 params->cpu_periodic_throttle_interval =
 s->parameters.cpu_periodic_throttle_interval;
+params->has_cpu_responsive_throttle = true;
+params->cpu_responsive_throttle = s->parameters.cpu_responsive_throttle;
 params->tls_creds = g_strdup(s->parameters.tls_creds);
 params->tls_hostname = g_strdup(s->parameters.tls_hostname);
 params->tls_authz = g_strdup(s->parameters.tls_authz ?
@@ -967,6 +978,7 @@ void migrate_params_init(MigrationParameters *params)
 params->has_cpu_throttle_tailslow = true;
 params->has_cpu_periodic_throttle = true;
 params->has_cpu_periodic_throttle_interval = true;
+params->has_cpu_responsive_throttle = true;
 params->has_max_bandwidth = true;
 params->has_downtime_limit = true;
 params->has_x_checkpoint_delay = true;
@@ -1208,6 +1220,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
 params->cpu_periodic_throttle_interval;
 }
 
+if (params->has_cpu_responsive_throttle) {
+dest->cpu_responsive_throttle = params->cpu_responsive_throttle;
+}
+
 if (params->tls_creds) {
 assert(params->tls_creds->type == QTYPE_QSTRING);
 dest->tls_creds = params->tls_creds->u.s;
@@ -1325,6 +1341,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
 params->cpu_periodic_throttle_interval;
 }
 
+if (params->has_cpu_responsive_throttle) {
+s->parameters.cpu_responsive_throttle = params->cpu_responsive_throttle;
+}
+
 if (params->tls_creds) {
 g_free(s->parameters.tls_creds);
 assert(params->tls_creds->type == QTYPE_QSTRING);
diff --git a/migration/options.h b/migration/options.h
index efeac01470..613d675003 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -70,6 +70,7 @@ uint8_t migrate_cpu_throttle_increment(void);
 uint8_t migrate_cpu_throttle_initial(void);
 uint8_t migrate_p

[PATCH RFC 09/10] migration: Support responsive CPU throttle

2024-09-09 Thread Hyman Huang
Currently, the convergence algorithm determines that the migration
cannot converge according to the following principle:
The dirty pages generated in current iteration exceed a specific
percentage (throttle-trigger-threshold, 50 by default) of the number
of transmissions. Let's refer to this criteria as the "dirty rate".
If this criteria is met more than or equal to twice
(dirty_rate_high_cnt >= 2), the throttle percentage increased.

In most cases, above implementation is appropriate. However, for a
VM with high memory overload, each iteration is time-consuming.
The VM's computing performance may be throttled at a high percentage
and last for a long time due to the repeated confirmation behavior.
Which may be intolerable for some computationally sensitive software
in the VM.

As the comment mentioned in the migration_trigger_throttle function,
in order to avoid erroneous detection, the original algorithm confirms
the criteria repeatedly. Put differently, the criteria does not need
to be validated again once the detection is more reliable.

In the refinement, in order to make the detection more accurate, we
introduce another criterion, called the "dirty ratio", to determine
the migration convergence. The "dirty ratio" is the ratio of
bytes_dirty_period to bytes_xfer_period. When the algorithm
repeatedly detects that the "dirty ratio" of the current sync is no
lower than that of the previous one, it determines that the migration
cannot converge. For the "dirty rate" and the "dirty ratio", if
either of the two criteria is met, the penalty percentage is
increased. This makes the CPU throttle respond faster, which shortens
the entire iteration and thus reduces the duration of VM performance
degradation.

In conclusion, this refinement significantly reduces the time
required for the throttle percentage to step to its maximum while
the VM is under heavy memory load.

Signed-off-by: Hyman Huang 
---
 migration/ram.c  | 55 ++--
 migration/trace-events   |  1 +
 tests/qtest/migration-test.c |  1 +
 3 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index d9d8ed0fda..5fba572f3e 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -420,6 +420,12 @@ struct RAMState {
 /* Periodic throttle information */
 bool throttle_running;
 QemuThread throttle_thread;
+
+/*
+ * Ratio of bytes_dirty_period and bytes_xfer_period in the previous
+ * sync.
+ */
+uint64_t dirty_ratio_pct;
 };
 typedef struct RAMState RAMState;
 
@@ -1044,6 +1050,43 @@ static void migration_dirty_limit_guest(void)
 trace_migration_dirty_limit_guest(quota_dirtyrate);
 }
 
+static bool migration_dirty_ratio_high(RAMState *rs)
+{
+static int dirty_ratio_high_cnt;
+uint64_t threshold = migrate_throttle_trigger_threshold();
+uint64_t bytes_xfer_period =
+migration_transferred_bytes() - rs->bytes_xfer_prev;
+uint64_t bytes_dirty_period = rs->num_dirty_pages_period * 
TARGET_PAGE_SIZE;
+bool dirty_ratio_high = false;
+uint64_t prev, curr;
+
+/* Calculate the dirty ratio percentage */
+curr = 100 * (bytes_dirty_period * 1.0 / bytes_xfer_period);
+
+prev = rs->dirty_ratio_pct;
+rs->dirty_ratio_pct = curr;
+
+if (prev == 0) {
+return false;
+}
+
+/*
+ * If the current dirty ratio is no lower than the previous one,
+ * determine that the migration does not converge.
+ */
+if (curr > threshold && curr >= prev) {
+trace_migration_dirty_ratio_high(curr, prev);
+dirty_ratio_high_cnt++;
+}
+
+if (dirty_ratio_high_cnt >= 2) {
+dirty_ratio_high = true;
+dirty_ratio_high_cnt = 0;
+}
+
+return dirty_ratio_high;
+}
+
 static void migration_trigger_throttle(RAMState *rs)
 {
 uint64_t threshold = migrate_throttle_trigger_threshold();
@@ -1051,6 +1094,11 @@ static void migration_trigger_throttle(RAMState *rs)
 migration_transferred_bytes() - rs->bytes_xfer_prev;
 uint64_t bytes_dirty_period = rs->num_dirty_pages_period * 
TARGET_PAGE_SIZE;
 uint64_t bytes_dirty_threshold = bytes_xfer_period * threshold / 100;
+bool dirty_ratio_high = false;
+
+if (migrate_responsive_throttle() && (bytes_xfer_period != 0)) {
+dirty_ratio_high = migration_dirty_ratio_high(rs);
+}
 
 /*
  * The following detection logic can be refined later. For now:
@@ -1060,8 +1108,11 @@ static void migration_trigger_throttle(RAMState *rs)
  * twice, start or increase throttling.
  */
 if ((bytes_dirty_period > bytes_dirty_threshold) &&
-(++rs->dirty_rate_high_cnt >= 2)) {
-rs->dirty_rate_high_cnt = 0;
+((++rs->dirty_rate_high_cnt >= 2) || dirty_ratio_high)) {
+
+rs->dirty_rate_high_cnt =
+rs->dirty_rate_high_cnt >= 2 ? 0 : rs->dirty_rate_high_cnt;
+
 if (migrate_auto_converge()) {
 trace_migration_throttle();
 mig_thr

[PATCH RFC 00/10] migration: auto-converge refinements for huge VM

2024-09-09 Thread Hyman Huang
Currently, a huge VM under heavy memory load may take a long time
to reach its maximum throttle percentage. The root cause is that
the current auto-converge throttle logic does not scale:
migration_trigger_throttle() is only called once per iteration, so
it won't be invoked for a long time if a single iteration takes a
long time.

This patchset provides two refinements aiming at the above case.

1: The periodic CPU throttle. As Peter points out, "throttle only
   for each sync, sync for each iteration" may have made sense in
   the old days, but perhaps not anymore. So we introduce a periodic
   CPU throttle implementation for migration, which is a trade-off
   between synchronization overhead and CPU throttle impact.

2: The responsive CPU throttle. We present a new criterion called
   the "dirty ratio" to help improve the detection accuracy and
   hence accelerate the throttle's invocation.

The RFC version of the refinement may be a rudimentary
implementation; I would appreciate hearing more feedback.

Yong, thanks.

Hyman Huang (10):
  migration: Introduce structs for periodic CPU throttle
  migration: Refine util functions to support periodic CPU throttle
  qapi/migration: Introduce periodic CPU throttling parameters
  qapi/migration: Introduce the iteration-count
  migration: Introduce util functions for periodic CPU throttle
  migration: Support periodic CPU throttle
  tests/migration-tests: Add test case for periodic throttle
  migration: Introduce cpu-responsive-throttle parameter
  migration: Support responsive CPU throttle
  tests/migration-tests: Add test case for responsive CPU throttle

 include/exec/ram_addr.h| 117 --
 include/exec/ramblock.h|  45 +++
 migration/migration-hmp-cmds.c |  25 
 migration/migration-stats.h|   4 +
 migration/migration.c  |  12 ++
 migration/options.c|  74 +++
 migration/options.h|   3 +
 migration/ram.c| 218 ++---
 migration/ram.h|   4 +
 migration/trace-events |   4 +
 qapi/migration.json|  45 ++-
 tests/qtest/migration-test.c   |  77 +++-
 12 files changed, 600 insertions(+), 28 deletions(-)

-- 
2.39.1




[PATCH RFC 01/10] migration: Introduce structs for periodic CPU throttle

2024-09-09 Thread Hyman Huang
shadow_bmap, iter_bmap, iter_dirty_pages and
periodic_sync_shown_up are introduced to satisfy the needs
of periodic CPU throttling.

Meanwhile, introduce an enumeration of dirty bitmap sync methods.

Signed-off-by: Hyman Huang 
---
 include/exec/ramblock.h | 45 +
 migration/ram.c |  6 ++
 2 files changed, 51 insertions(+)

diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 0babd105c0..619c52885a 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -24,6 +24,30 @@
 #include "qemu/rcu.h"
 #include "exec/ramlist.h"
 
+/* Possible bits for migration_bitmap_sync */
+
+/*
+ * The old-fashioned sync method, which is, in turn, used for CPU
+ * throttle and memory transfer.
+ */
+#define RAMBLOCK_SYN_LEGACY_ITER(1U << 0)
+
+/*
+ * The modern sync method, which is, in turn, used for CPU throttle
+ * and memory transfer.
+ */
+#define RAMBLOCK_SYN_MODERN_ITER(1U << 1)
+
+/* The modern sync method, which is used for CPU throttle only */
+#define RAMBLOCK_SYN_MODERN_PERIOD  (1U << 2)
+
+#define RAMBLOCK_SYN_MASK   (0x7)
+
+typedef enum RAMBlockSynMode {
+RAMBLOCK_SYN_LEGACY,/* Old-fashioned mode */
+RAMBLOCK_SYN_MODERN,
+} RAMBlockSynMode;
+
 struct RAMBlock {
 struct rcu_head rcu;
 struct MemoryRegion *mr;
@@ -89,6 +113,27 @@ struct RAMBlock {
  * could not have been valid on the source.
  */
 ram_addr_t postcopy_length;
+
+/*
+ * Used to backup the bmap during periodic sync to see whether any dirty
+ * pages were sent during that time.
+ */
+unsigned long *shadow_bmap;
+
+/*
+ * The bitmap "bmap", which was initially used for both sync and memory
+ * transfer, is split into two: the existing "bmap" and the newly added
+ * "iter_bmap". Memory transfer is conducted only with "bmap", while
+ * "iter_bmap" is utilized for sync.
+ */
+unsigned long *iter_bmap;
+
+/* Number of new dirty pages during iteration */
+uint64_t iter_dirty_pages;
+
+/* If periodic sync has shown up during iteration */
+bool periodic_sync_shown_up;
 };
 #endif
 #endif
diff --git a/migration/ram.c b/migration/ram.c
index 67ca3d5d51..f29faa82d6 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2362,6 +2362,10 @@ static void ram_bitmaps_destroy(void)
 block->bmap = NULL;
 g_free(block->file_bmap);
 block->file_bmap = NULL;
+g_free(block->shadow_bmap);
+block->shadow_bmap = NULL;
+g_free(block->iter_bmap);
+block->iter_bmap = NULL;
 }
 }
 
@@ -2753,6 +2757,8 @@ static void ram_list_init_bitmaps(void)
 }
 block->clear_bmap_shift = shift;
 block->clear_bmap = bitmap_new(clear_bmap_size(pages, shift));
+block->shadow_bmap = bitmap_new(pages);
+block->iter_bmap = bitmap_new(pages);
 }
 }
 }
-- 
2.39.1




[PATCH RFC 02/10] migration: Refine util functions to support periodic CPU throttle

2024-09-09 Thread Hyman Huang
Supply the migration_bitmap_sync function with a periodic
argument. Introduce the sync_mode global variable to track the
sync mode and support periodic throttling while keeping backward
compatibility.

Signed-off-by: Hyman Huang 
---
 include/exec/ram_addr.h | 117 
 migration/ram.c |  49 +
 2 files changed, 147 insertions(+), 19 deletions(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 891c44cf2d..43fa4d7b18 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -472,17 +472,68 @@ static inline void 
cpu_physical_memory_clear_dirty_range(ram_addr_t start,
 cpu_physical_memory_test_and_clear_dirty(start, length, DIRTY_MEMORY_CODE);
 }
 
+static void ramblock_clear_iter_bmap(RAMBlock *rb,
+ ram_addr_t start,
+ ram_addr_t length)
+{
+ram_addr_t addr;
+unsigned long *bmap = rb->bmap;
+unsigned long *shadow_bmap = rb->shadow_bmap;
+unsigned long *iter_bmap = rb->iter_bmap;
+
+for (addr = 0; addr < length; addr += TARGET_PAGE_SIZE) {
+long k = (start + addr) >> TARGET_PAGE_BITS;
+if (test_bit(k, shadow_bmap) && !test_bit(k, bmap)) {
+/* Page has been sent, clear the iter bmap */
+clear_bit(k, iter_bmap);
+}
+}
+}
+
+static void ramblock_update_iter_bmap(RAMBlock *rb,
+  ram_addr_t start,
+  ram_addr_t length)
+{
+ram_addr_t addr;
+unsigned long *bmap = rb->bmap;
+unsigned long *iter_bmap = rb->iter_bmap;
+
+for (addr = 0; addr < length; addr += TARGET_PAGE_SIZE) {
+long k = (start + addr) >> TARGET_PAGE_BITS;
+if (test_bit(k, iter_bmap)) {
+if (!test_bit(k, bmap)) {
+set_bit(k, bmap);
+rb->iter_dirty_pages++;
+}
+}
+}
+}
 
 /* Called with RCU critical section */
 static inline
 uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
ram_addr_t start,
-   ram_addr_t length)
+   ram_addr_t length,
+   unsigned int flag)
 {
 ram_addr_t addr;
 unsigned long word = BIT_WORD((start + rb->offset) >> TARGET_PAGE_BITS);
 uint64_t num_dirty = 0;
 unsigned long *dest = rb->bmap;
+unsigned long *shadow_bmap = rb->shadow_bmap;
+unsigned long *iter_bmap = rb->iter_bmap;
+
+assert(flag && !(flag & (~RAMBLOCK_SYN_MASK)));
+
+/*
+ * We must remove the sent dirty pages from the iter_bmap in order to
+ * minimize redundant page transfers if a periodic sync has occurred
+ * during this iteration.
+ */
+if (rb->periodic_sync_shown_up &&
+(flag & (RAMBLOCK_SYN_MODERN_ITER | RAMBLOCK_SYN_MODERN_PERIOD))) {
+ramblock_clear_iter_bmap(rb, start, length);
+}
 
 /* start address and length is aligned at the start of a word? */
 if (((word * BITS_PER_LONG) << TARGET_PAGE_BITS) ==
@@ -503,8 +554,20 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock 
*rb,
 if (src[idx][offset]) {
 unsigned long bits = qatomic_xchg(&src[idx][offset], 0);
 unsigned long new_dirty;
+if (flag & (RAMBLOCK_SYN_MODERN_ITER |
+RAMBLOCK_SYN_MODERN_PERIOD)) {
+/* Back-up bmap for the next iteration */
+iter_bmap[k] |= bits;
+if (flag == RAMBLOCK_SYN_MODERN_PERIOD) {
+/* Back up bmap to detect pages that have been sent */
+shadow_bmap[k] = dest[k];
+}
+}
 new_dirty = ~dest[k];
-dest[k] |= bits;
+if (flag == RAMBLOCK_SYN_LEGACY_ITER) {
+dest[k] |= bits;
+}
+
 new_dirty &= bits;
 num_dirty += ctpopl(new_dirty);
 }
@@ -534,18 +597,54 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock 
*rb,
 ram_addr_t offset = rb->offset;
 
 for (addr = 0; addr < length; addr += TARGET_PAGE_SIZE) {
-if (cpu_physical_memory_test_and_clear_dirty(
-start + addr + offset,
-TARGET_PAGE_SIZE,
-DIRTY_MEMORY_MIGRATION)) {
-long k = (start + addr) >> TARGET_PAGE_BITS;
-if (!test_and_set_bit(k, dest)) {
-num_dirty++;
+long k = (start + addr) >> TARGET_PAGE_BITS;
+if (flag == RAMBLOCK_SYN_MODERN_PERIOD) {
+if (test_bit(k, dest)) {
+/* Back up bmap to detect pages that have been sent */
+set_bit(k, shadow_bmap);
+}
+  

[PATCH RFC 07/10] tests/migration-tests: Add test case for periodic throttle

2024-09-09 Thread Hyman Huang
To make sure the periodic throttle feature doesn't regress any
existing features or functionality, enable it in the
auto-converge migration test.

Signed-off-by: Hyman Huang 
---
 tests/qtest/migration-test.c | 56 +++-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 2fb10658d4..61d7182f88 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -281,6 +281,11 @@ static uint64_t get_migration_pass(QTestState *who)
 return read_ram_property_int(who, "iteration-count");
 }
 
+static uint64_t get_migration_dirty_sync_count(QTestState *who)
+{
+return read_ram_property_int(who, "dirty-sync-count");
+}
+
 static void read_blocktime(QTestState *who)
 {
 QDict *rsp_return;
@@ -710,6 +715,11 @@ typedef struct {
 PostcopyRecoveryFailStage postcopy_recovery_fail_stage;
 } MigrateCommon;
 
+typedef struct {
+/* CPU throttle parameters */
+bool periodic;
+} AutoConvergeArgs;
+
 static int test_migrate_start(QTestState **from, QTestState **to,
   const char *uri, MigrateStart *args)
 {
@@ -2778,12 +2788,13 @@ static void test_validate_uri_channels_none_set(void)
  * To make things even worse, we need to run the initial stage at
  * 3MB/s so we enter autoconverge even when host is (over)loaded.
  */
-static void test_migrate_auto_converge(void)
+static void test_migrate_auto_converge_args(AutoConvergeArgs *input_args)
 {
 g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
 MigrateStart args = {};
 QTestState *from, *to;
 int64_t percentage;
+bool periodic = (input_args && input_args->periodic);
 
 /*
  * We want the test to be stable and as fast as possible.
@@ -2791,6 +2802,7 @@ static void test_migrate_auto_converge(void)
  * so we need to decrease a bandwidth.
  */
 const int64_t init_pct = 5, inc_pct = 25, max_pct = 95;
+const int64_t periodic_throttle_interval = 2;
 
 if (test_migrate_start(&from, &to, uri, &args)) {
 return;
@@ -2801,6 +2813,12 @@ static void test_migrate_auto_converge(void)
 migrate_set_parameter_int(from, "cpu-throttle-increment", inc_pct);
 migrate_set_parameter_int(from, "max-cpu-throttle", max_pct);
 
+if (periodic) {
+migrate_set_parameter_bool(from, "cpu-periodic-throttle", true);
+migrate_set_parameter_int(from, "cpu-periodic-throttle-interval",
+periodic_throttle_interval);
+}
+
 /*
  * Set the initial parameters so that the migration could not converge
  * without throttling.
@@ -2827,6 +2845,29 @@ static void test_migrate_auto_converge(void)
 } while (true);
 /* The first percentage of throttling should be at least init_pct */
 g_assert_cmpint(percentage, >=, init_pct);
+
+if (periodic) {
+/*
+ * Check if periodic sync takes effect; set the timeout to 20s
+ * (max_try_count * 1s). If no extra sync shows up, fail the test.
+ */
+uint64_t iteration_count, dirty_sync_count;
+bool extra_sync = false;
+int max_try_count = 20;
+
+/* Check if periodic sync takes effect */
+while (--max_try_count) {
+usleep(1000 * 1000);
+iteration_count = get_migration_pass(from);
+dirty_sync_count = get_migration_dirty_sync_count(from);
+if (dirty_sync_count > iteration_count) {
+extra_sync = true;
+break;
+}
+}
+g_assert(extra_sync);
+}
+
 /* Now, when we tested that throttling works, let it converge */
 migrate_ensure_converge(from);
 
@@ -2849,6 +2890,17 @@ static void test_migrate_auto_converge(void)
 test_migrate_end(from, to, true);
 }
 
+static void test_migrate_auto_converge(void)
+{
+test_migrate_auto_converge_args(NULL);
+}
+
+static void test_migrate_auto_converge_periodic_throttle(void)
+{
+AutoConvergeArgs args = {.periodic = true};
+test_migrate_auto_converge_args(&args);
+}
+
 static void *
 test_migrate_precopy_tcp_multifd_start_common(QTestState *from,
   QTestState *to,
@@ -3900,6 +3952,8 @@ int main(int argc, char **argv)
 if (g_test_slow()) {
 migration_test_add("/migration/auto_converge",
test_migrate_auto_converge);
+migration_test_add("/migration/auto_converge_periodic_throttle",
+   test_migrate_auto_converge_periodic_throttle);
 if (g_str_equal(arch, "x86_64") &&
 has_kvm && kvm_dirty_ring_supported()) {
 migration_test_add("/migration/dirty_limit",
-- 
2.39.1




[PATCH RFC 03/10] qapi/migration: Introduce periodic CPU throttling parameters

2024-09-09 Thread Hyman Huang
To activate the periodic CPU throttling feature, introduce
cpu-periodic-throttle.

To control the frequency of throttling, introduce
cpu-periodic-throttle-interval.

Signed-off-by: Hyman Huang 
---
 migration/migration-hmp-cmds.c | 17 +++
 migration/options.c| 54 ++
 migration/options.h|  2 ++
 qapi/migration.json| 25 +++-
 4 files changed, 97 insertions(+), 1 deletion(-)

diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 7d608d26e1..f7b8e06bb4 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -264,6 +264,15 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict 
*qdict)
 monitor_printf(mon, "%s: %s\n",
 MigrationParameter_str(MIGRATION_PARAMETER_CPU_THROTTLE_TAILSLOW),
 params->cpu_throttle_tailslow ? "on" : "off");
+assert(params->has_cpu_periodic_throttle);
+monitor_printf(mon, "%s: %s\n",
+MigrationParameter_str(MIGRATION_PARAMETER_CPU_PERIODIC_THROTTLE),
+params->cpu_periodic_throttle ? "on" : "off");
+assert(params->has_cpu_periodic_throttle_interval);
+monitor_printf(mon, "%s: %u\n",
+MigrationParameter_str(
+MIGRATION_PARAMETER_CPU_PERIODIC_THROTTLE_INTERVAL),
+params->cpu_periodic_throttle_interval);
 assert(params->has_max_cpu_throttle);
 monitor_printf(mon, "%s: %u\n",
 MigrationParameter_str(MIGRATION_PARAMETER_MAX_CPU_THROTTLE),
@@ -512,6 +521,14 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict 
*qdict)
 p->has_cpu_throttle_tailslow = true;
 visit_type_bool(v, param, &p->cpu_throttle_tailslow, &err);
 break;
+case MIGRATION_PARAMETER_CPU_PERIODIC_THROTTLE:
+p->has_cpu_periodic_throttle = true;
+visit_type_bool(v, param, &p->cpu_periodic_throttle, &err);
+break;
+case MIGRATION_PARAMETER_CPU_PERIODIC_THROTTLE_INTERVAL:
+p->has_cpu_periodic_throttle_interval = true;
+visit_type_uint8(v, param, &p->cpu_periodic_throttle_interval, &err);
+break;
 case MIGRATION_PARAMETER_MAX_CPU_THROTTLE:
 p->has_max_cpu_throttle = true;
 visit_type_uint8(v, param, &p->max_cpu_throttle, &err);
diff --git a/migration/options.c b/migration/options.c
index 645f55003d..2dbe275ba0 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -44,6 +44,7 @@
 #define DEFAULT_MIGRATE_THROTTLE_TRIGGER_THRESHOLD 50
 #define DEFAULT_MIGRATE_CPU_THROTTLE_INITIAL 20
 #define DEFAULT_MIGRATE_CPU_THROTTLE_INCREMENT 10
+#define DEFAULT_MIGRATE_CPU_PERIODIC_THROTTLE_INTERVAL 5
 #define DEFAULT_MIGRATE_MAX_CPU_THROTTLE 99
 
 /* Migration XBZRLE default cache size */
@@ -104,6 +105,11 @@ Property migration_properties[] = {
   DEFAULT_MIGRATE_CPU_THROTTLE_INCREMENT),
 DEFINE_PROP_BOOL("x-cpu-throttle-tailslow", MigrationState,
   parameters.cpu_throttle_tailslow, false),
+DEFINE_PROP_BOOL("x-cpu-periodic-throttle", MigrationState,
+  parameters.cpu_periodic_throttle, false),
+DEFINE_PROP_UINT8("x-cpu-periodic-throttle-interval", MigrationState,
+  parameters.cpu_periodic_throttle_interval,
+  DEFAULT_MIGRATE_CPU_PERIODIC_THROTTLE_INTERVAL),
 DEFINE_PROP_SIZE("x-max-bandwidth", MigrationState,
   parameters.max_bandwidth, MAX_THROTTLE),
 DEFINE_PROP_SIZE("avail-switchover-bandwidth", MigrationState,
@@ -695,6 +701,20 @@ uint8_t migrate_cpu_throttle_initial(void)
 return s->parameters.cpu_throttle_initial;
 }
 
+uint8_t migrate_periodic_throttle_interval(void)
+{
+MigrationState *s = migrate_get_current();
+
+return s->parameters.cpu_periodic_throttle_interval;
+}
+
+bool migrate_periodic_throttle(void)
+{
+MigrationState *s = migrate_get_current();
+
+return s->parameters.cpu_periodic_throttle;
+}
+
 bool migrate_cpu_throttle_tailslow(void)
 {
 MigrationState *s = migrate_get_current();
@@ -874,6 +894,11 @@ MigrationParameters *qmp_query_migrate_parameters(Error 
**errp)
 params->cpu_throttle_increment = s->parameters.cpu_throttle_increment;
 params->has_cpu_throttle_tailslow = true;
 params->cpu_throttle_tailslow = s->parameters.cpu_throttle_tailslow;
+params->has_cpu_periodic_throttle = true;
+params->cpu_periodic_throttle = s->parameters.cpu_periodic_throttle;
+params->has_cpu_periodic_throttle_interval = true;
+params->cpu_periodic_throttle_interval =
+s->parameters.cpu_periodic_throttle_interval;
 params->tls_creds = g_strdup(s->parameters.tls_creds);
 params->tls_hostname = g_strdup(s->parameters.tls_hostname);
 params->tls_authz = g_strdup(s->parameters.tls_authz ?
@@ -940,6 +965,8 @@ void migrate_params_init(MigrationParameters *params)
 params->has_cpu_throttle_initial = true;
 pa

[PATCH RFC 06/10] migration: Support periodic CPU throttle

2024-09-09 Thread Hyman Huang
When a VM is configured with huge memory, the current throttle logic
does not scale, because migration_trigger_throttle() is only called
once per iteration, so it won't be invoked for a long time if a
single iteration takes a long time.

The periodic sync and throttle aim to fix the above issue by
synchronizing the remote dirty bitmap and triggering the throttle
periodically. This is a trade-off between synchronization overhead
and CPU throttle impact.

Signed-off-by: Hyman Huang 
---
 migration/migration.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 055d527ff6..fefd93b683 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1420,6 +1420,9 @@ static void migrate_fd_cleanup(MigrationState *s)
 qemu_thread_join(&s->thread);
 s->migration_thread_running = false;
 }
+if (migrate_periodic_throttle()) {
+periodic_throttle_stop();
+}
 bql_lock();
 
 multifd_send_shutdown();
@@ -3263,6 +3266,9 @@ static MigIterateState 
migration_iteration_run(MigrationState *s)
 
 if ((!pending_size || pending_size < s->threshold_size) && can_switchover) 
{
 trace_migration_thread_low_pending(pending_size);
+if (migrate_periodic_throttle()) {
+periodic_throttle_stop();
+}
 migration_completion(s);
 return MIG_ITERATE_BREAK;
 }
@@ -3508,6 +3514,11 @@ static void *migration_thread(void *opaque)
 ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
 bql_unlock();
 
+if (migrate_periodic_throttle()) {
+periodic_throttle_setup(true);
+periodic_throttle_start();
+}
+
 qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
MIGRATION_STATUS_ACTIVE);
 
-- 
2.39.1




[PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle

2024-09-09 Thread Hyman Huang
Even when the responsive CPU throttle is enabled, the dirty sync
count may not always increase, because the throttle is an
optimization that might not take effect in every situation.

This test case just makes sure it doesn't interfere with any
existing functionality.

Signed-off-by: Hyman Huang 
---
 tests/qtest/migration-test.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 4626301435..cf0b1dcb50 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -718,6 +718,7 @@ typedef struct {
 typedef struct {
 /* CPU throttle parameters */
 bool periodic;
+bool responsive;
 } AutoConvergeArgs;
 
 static int test_migrate_start(QTestState **from, QTestState **to,
@@ -2795,6 +2796,7 @@ static void 
test_migrate_auto_converge_args(AutoConvergeArgs *input_args)
 QTestState *from, *to;
 int64_t percentage;
 bool periodic = (input_args && input_args->periodic);
+bool responsive = (input_args && input_args->responsive);
 
 /*
  * We want the test to be stable and as fast as possible.
@@ -2820,6 +2822,16 @@ static void 
test_migrate_auto_converge_args(AutoConvergeArgs *input_args)
 periodic_throttle_interval);
 }
 
+if (responsive) {
+/*
+ * The dirty-sync-count may not always increase while using responsive
+ * throttle because it is an optimization and may not take effect in
+ * every scenario. Just make sure this feature doesn't break any
+ * existing functionality by turning it on.
+ */
+migrate_set_parameter_bool(from, "cpu-responsive-throttle", true);
+}
+
 /*
  * Set the initial parameters so that the migration could not converge
  * without throttling.
@@ -2902,6 +2914,12 @@ static void 
test_migrate_auto_converge_periodic_throttle(void)
 test_migrate_auto_converge_args(&args);
 }
 
+static void test_migrate_auto_converge_responsive_throttle(void)
+{
+AutoConvergeArgs args = {.responsive = true};
+test_migrate_auto_converge_args(&args);
+}
+
 static void *
 test_migrate_precopy_tcp_multifd_start_common(QTestState *from,
   QTestState *to,
@@ -3955,6 +3973,8 @@ int main(int argc, char **argv)
test_migrate_auto_converge);
 migration_test_add("/migration/auto_converge_periodic_throttle",
test_migrate_auto_converge_periodic_throttle);
+migration_test_add("/migration/auto_converge_responsive_throttle",
+   test_migrate_auto_converge_responsive_throttle);
 if (g_str_equal(arch, "x86_64") &&
 has_kvm && kvm_dirty_ring_supported()) {
 migration_test_add("/migration/dirty_limit",
-- 
2.39.1




Re: [PATCH v2 0/7] hw/net/can/xlnx-versal-canfd: Miscellaneous fixes

2024-09-09 Thread Peter Maydell
On Tue, 27 Aug 2024 at 04:51, Doug Brown  wrote:
>
> This series fixes several problems I ran into while trying to simulate
> the AMD/Xilinx Versal CANFD controller in the xlnx-versal-virt machine
> using Xilinx's v6.6_LTS_2024.1 kernel. With all of these patches
> applied, everything works correctly alongside actual CAN devices.
>
> - IRQs were accidentally not being delivered due to having a level other
>   than 1. The IRQ count in /proc/interrupts in Linux was stuck at 0.
> - Incoming CAN FD frames were being treated as non-FD.
> - The CAN IDs were garbled in both RX and TX directions.
> - The ESI and BRS flags were not being handled.
> - The byte ordering was wrong in the data in both directions.
> - Incoming CAN FD frames with DLC = 1-7 weren't handled correctly.
> - The FIFO read_index and store_index wrapping logic was incorrect.
>
> I don't have any actual Versal hardware to compare behavior against, but
> with these changes, it plays nicely with SocketCAN on the host system.
>
> Changes in v2:
> - Added handling of ESI and BRS flags, ensured frame->flags is initialized
> - Switched to use common can_dlc2len() and can_len2dlc() functions
> - Added fix for FIFO wrapping problems I observed during stress testing

I've applied this series to target-arm.next; thanks!

-- PMM



Re: [PATCH v2 00/15] target/cris: Remove the deprecated CRIS target

2024-09-09 Thread Edgar E. Iglesias
On Mon, Sep 9, 2024 at 7:25 AM Philippe Mathieu-Daudé 
wrote:

> Hi Edgar,
>
> On 4/9/24 16:35, Philippe Mathieu-Daudé wrote:
> > Since v1:
> > - Split in smaller patches (pm215)
> >
> > The CRIS target is deprecated since v9.0 (commit
> > c7bbef40234 "docs: mark CRIS support as deprecated").
> >
> > Remove:
> > - Buildsys / CI infra
> > - User emulation
> > - System emulation (axis-dev88 machine and ETRAX devices)
> > - Tests
>
> You acked the deprecation commit (c7bbef4023).
> No objection for the removal? I'd rather have your
> explicit Acked-by before merging this.
>
>
Hi Phil,

Yes, sorry, I haven't had time to review each patch but:
Acked-by: Edgar E. Iglesias 

Cheers,
Edgar




> Thanks,
>
> Phil.
>
> > Philippe Mathieu-Daudé (15):
> >tests/tcg: Remove CRIS libc test files
> >tests/tcg: Remove CRIS bare test files
> >buildsys: Remove CRIS cross container
> >linux-user: Remove support for CRIS target
> >hw/cris: Remove the axis-dev88 machine
> >hw/cris: Remove image loader helper
> >hw/intc: Remove TYPE_ETRAX_FS_PIC device
> >hw/char: Remove TYPE_ETRAX_FS_SERIAL device
> >hw/net: Remove TYPE_ETRAX_FS_ETH device
> >hw/dma: Remove ETRAX_FS DMA device
> >hw/timer: Remove TYPE_ETRAX_FS_TIMER device
> >system: Remove support for CRIS target
> >target/cris: Remove the deprecated CRIS target
> >disas: Remove CRIS disassembler
> >seccomp: Remove check for CRIS host
>
>


Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle

2024-09-09 Thread Peter Maydell
On Mon, 9 Sept 2024 at 14:51, Hyman Huang  wrote:
>
> Despite the fact that the responsive CPU throttle is enabled,
> the dirty sync count may not always increase because this is
> an optimization that might not happen in any situation.
>
> This test case just making sure it doesn't interfere with any
> current functionality.
>
> Signed-off-by: Hyman Huang 

tests/qtest/migration-test already runs 75 different
subtests, takes up a massive chunk of our "make check"
time, and is very commonly a "times out" test on some
of our CI jobs. It runs on five different guest CPU
architectures, each one of which takes between 2 and
5 minutes to complete the full migration-test.

Do we really need to make it even bigger?

thanks
-- PMM



Re: [PATCH v2 00/15] target/cris: Remove the deprecated CRIS target

2024-09-09 Thread Philippe Mathieu-Daudé

On 9/9/24 15:59, Edgar E. Iglesias wrote:
On Mon, Sep 9, 2024 at 7:25 AM Philippe Mathieu-Daudé wrote:


Hi Edgar,

On 4/9/24 16:35, Philippe Mathieu-Daudé wrote:
 > Since v1:
 > - Split in smaller patches (pm215)
 >
 > The CRIS target is deprecated since v9.0 (commit
 > c7bbef40234 "docs: mark CRIS support as deprecated").
 >
 > Remove:
 > - Buildsys / CI infra
 > - User emulation
 > - System emulation (axis-dev88 machine and ETRAX devices)
 > - Tests

You acked the deprecation commit (c7bbef4023).
No objection for the removal? I'd rather have your
explicit Acked-by before merging this.


Hi Phil,

Yes, sorry, I haven't had time to review each patch but:
Acked-by: Edgar E. Iglesias


Thank you!



Cheers,
Edgar


Thanks,

Phil.

 > Philippe Mathieu-Daudé (15):
 >    tests/tcg: Remove CRIS libc test files
 >    tests/tcg: Remove CRIS bare test files
 >    buildsys: Remove CRIS cross container
 >    linux-user: Remove support for CRIS target
 >    hw/cris: Remove the axis-dev88 machine
 >    hw/cris: Remove image loader helper
 >    hw/intc: Remove TYPE_ETRAX_FS_PIC device
 >    hw/char: Remove TYPE_ETRAX_FS_SERIAL device
 >    hw/net: Remove TYPE_ETRAX_FS_ETH device
 >    hw/dma: Remove ETRAX_FS DMA device
 >    hw/timer: Remove TYPE_ETRAX_FS_TIMER device
 >    system: Remove support for CRIS target
 >    target/cris: Remove the deprecated CRIS target


Series queued except patch 14:


 >    disas: Remove CRIS disassembler
 >    seccomp: Remove check for CRIS host






[PULL 00/10] Crypto fixes patches

2024-09-09 Thread Daniel P . Berrangé
The following changes since commit f2aee60305a1e40374b2fc1093e4d04404e780ee:

  Merge tag 'pull-request-2024-09-08' of https://gitlab.com/huth/qemu into 
staging (2024-09-09 10:47:24 +0100)

are available in the Git repository at:

  https://gitlab.com/berrange/qemu tags/crypto-fixes-pull-request

for you to fetch changes up to 10a1d34fc0d4dfe0dd6f5ec73f62dc1afa04af6c:

  crypto: Introduce x509 utils (2024-09-09 15:13:38 +0100)


Various crypto fixes

 * Support sha384 with glib crypto backend
 * Improve error reporting for unsupported cipher modes
 * Avoid memory leak when bad cipher mode is given
 * Run pbkdf tests on macOS
 * Runtime check for pbkdf hash impls with gnutls & gcrypt
 * Avoid hangs counter pbkdf iterations on some Linux kernels
   by using a throwaway thread for benchmarking performance
 * Fix iotests expected output from gnutls errors



Daniel P. Berrangé (6):
  iotests: fix expected output from gnutls
  crypto: check gnutls & gcrypt support the requested pbkdf hash
  tests/unit: always build the pbkdf crypto unit test
  tests/unit: build pbkdf test on macOS
  crypto: avoid leak of ctx when bad cipher mode is given
  crypto: use consistent error reporting pattern for unsupported cipher
modes

Dorjoy Chowdhury (3):
  crypto: Define macros for hash algorithm digest lengths
  crypto: Support SHA384 hash when using glib
  crypto: Introduce x509 utils

Tiago Pasqualini (1):
  crypto: run qcrypto_pbkdf2_count_iters in a new thread

 crypto/cipher-nettle.c.inc | 25 ---
 crypto/hash-glib.c |  2 +-
 crypto/hash.c  | 14 +++
 crypto/meson.build |  4 ++
 crypto/pbkdf-gcrypt.c  |  2 +-
 crypto/pbkdf-gnutls.c  |  2 +-
 crypto/pbkdf.c | 53 
 crypto/x509-utils.c| 76 ++
 include/crypto/hash.h  |  8 
 include/crypto/x509-utils.h| 22 ++
 tests/qemu-iotests/233.out | 12 +++---
 tests/unit/meson.build |  4 +-
 tests/unit/test-crypto-pbkdf.c | 13 +++---
 13 files changed, 200 insertions(+), 37 deletions(-)
 create mode 100644 crypto/x509-utils.c
 create mode 100644 include/crypto/x509-utils.h

-- 
2.45.2




[PULL 09/10] crypto: Support SHA384 hash when using glib

2024-09-09 Thread Daniel P . Berrangé
From: Dorjoy Chowdhury 

QEMU requires a minimum glib version of 2.66.0 per the root meson.build
file, and per the glib documentation[1] G_CHECKSUM_SHA384 has been
available since 2.51.

[1] https://docs.gtk.org/glib/enum.ChecksumType.html

Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Dorjoy Chowdhury 
Signed-off-by: Daniel P. Berrangé 
---
 crypto/hash-glib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crypto/hash-glib.c b/crypto/hash-glib.c
index 82de9db705..18e64faa9c 100644
--- a/crypto/hash-glib.c
+++ b/crypto/hash-glib.c
@@ -29,7 +29,7 @@ static int qcrypto_hash_alg_map[QCRYPTO_HASH_ALG__MAX] = {
 [QCRYPTO_HASH_ALG_SHA1] = G_CHECKSUM_SHA1,
 [QCRYPTO_HASH_ALG_SHA224] = -1,
 [QCRYPTO_HASH_ALG_SHA256] = G_CHECKSUM_SHA256,
-[QCRYPTO_HASH_ALG_SHA384] = -1,
+[QCRYPTO_HASH_ALG_SHA384] = G_CHECKSUM_SHA384,
 [QCRYPTO_HASH_ALG_SHA512] = G_CHECKSUM_SHA512,
 [QCRYPTO_HASH_ALG_RIPEMD160] = -1,
 };
-- 
2.45.2




[PULL 05/10] tests/unit: build pbkdf test on macOS

2024-09-09 Thread Daniel P . Berrangé
Add CONFIG_DARWIN to the pbkdf test build condition, since commit
bf98afc75efedf1 gave us a way to measure CPU time on this platform.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Daniel P. Berrangé 
---
 tests/unit/test-crypto-pbkdf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/unit/test-crypto-pbkdf.c b/tests/unit/test-crypto-pbkdf.c
index 241e1c2cf0..39264cb662 100644
--- a/tests/unit/test-crypto-pbkdf.c
+++ b/tests/unit/test-crypto-pbkdf.c
@@ -25,7 +25,7 @@
 #include 
 #endif
 
-#if defined(_WIN32) || defined(RUSAGE_THREAD)
+#if defined(_WIN32) || defined(RUSAGE_THREAD) || defined(CONFIG_DARWIN)
 #include "crypto/pbkdf.h"
 
 typedef struct QCryptoPbkdfTestData QCryptoPbkdfTestData;
-- 
2.45.2




[PULL 08/10] crypto: Define macros for hash algorithm digest lengths

2024-09-09 Thread Daniel P . Berrangé
From: Dorjoy Chowdhury 

Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Dorjoy Chowdhury 
Signed-off-by: Daniel P. Berrangé 
---
 crypto/hash.c | 14 +++---
 include/crypto/hash.h |  8 
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/crypto/hash.c b/crypto/hash.c
index b0f8228bdc..8087f5dae6 100644
--- a/crypto/hash.c
+++ b/crypto/hash.c
@@ -23,13 +23,13 @@
 #include "hashpriv.h"
 
 static size_t qcrypto_hash_alg_size[QCRYPTO_HASH_ALG__MAX] = {
-[QCRYPTO_HASH_ALG_MD5] = 16,
-[QCRYPTO_HASH_ALG_SHA1] = 20,
-[QCRYPTO_HASH_ALG_SHA224] = 28,
-[QCRYPTO_HASH_ALG_SHA256] = 32,
-[QCRYPTO_HASH_ALG_SHA384] = 48,
-[QCRYPTO_HASH_ALG_SHA512] = 64,
-[QCRYPTO_HASH_ALG_RIPEMD160] = 20,
+[QCRYPTO_HASH_ALG_MD5]   = QCRYPTO_HASH_DIGEST_LEN_MD5,
+[QCRYPTO_HASH_ALG_SHA1]  = QCRYPTO_HASH_DIGEST_LEN_SHA1,
+[QCRYPTO_HASH_ALG_SHA224]= QCRYPTO_HASH_DIGEST_LEN_SHA224,
+[QCRYPTO_HASH_ALG_SHA256]= QCRYPTO_HASH_DIGEST_LEN_SHA256,
+[QCRYPTO_HASH_ALG_SHA384]= QCRYPTO_HASH_DIGEST_LEN_SHA384,
+[QCRYPTO_HASH_ALG_SHA512]= QCRYPTO_HASH_DIGEST_LEN_SHA512,
+[QCRYPTO_HASH_ALG_RIPEMD160] = QCRYPTO_HASH_DIGEST_LEN_RIPEMD160,
 };
 
 size_t qcrypto_hash_digest_len(QCryptoHashAlgorithm alg)
diff --git a/include/crypto/hash.h b/include/crypto/hash.h
index 54d87aa2a1..a113cc3b04 100644
--- a/include/crypto/hash.h
+++ b/include/crypto/hash.h
@@ -23,6 +23,14 @@
 
 #include "qapi/qapi-types-crypto.h"
 
+#define QCRYPTO_HASH_DIGEST_LEN_MD5   16
+#define QCRYPTO_HASH_DIGEST_LEN_SHA1  20
+#define QCRYPTO_HASH_DIGEST_LEN_SHA22428
+#define QCRYPTO_HASH_DIGEST_LEN_SHA25632
+#define QCRYPTO_HASH_DIGEST_LEN_SHA38448
+#define QCRYPTO_HASH_DIGEST_LEN_SHA51264
+#define QCRYPTO_HASH_DIGEST_LEN_RIPEMD160 20
+
 /* See also "QCryptoHashAlgorithm" defined in qapi/crypto.json */
 
 /**
-- 
2.45.2
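
As an aside (my own illustration, not part of this series): the digest
lengths behind these macros can be cross-checked against any independent
implementation, e.g. Python's hashlib:

```python
import hashlib

# Digest lengths matching the QCRYPTO_HASH_DIGEST_LEN_* macros above.
# ripemd160 is omitted because not every OpenSSL build ships it.
expected = {
    "md5": 16,
    "sha1": 20,
    "sha224": 28,
    "sha256": 32,
    "sha384": 48,
    "sha512": 64,
}
for name, length in expected.items():
    assert hashlib.new(name).digest_size == length, name
```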




[PULL 03/10] crypto: check gnutls & gcrypt support the requested pbkdf hash

2024-09-09 Thread Daniel P . Berrangé
Both gnutls and gcrypt can be configured to exclude support for certain
algorithms via a runtime check against system crypto policies. Thus it
is not sufficient to have a compile time test for hash support in their
pbkdf implementations.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Daniel P. Berrangé 
---
 crypto/pbkdf-gcrypt.c | 2 +-
 crypto/pbkdf-gnutls.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/crypto/pbkdf-gcrypt.c b/crypto/pbkdf-gcrypt.c
index a8d8e64f4d..bc0719c831 100644
--- a/crypto/pbkdf-gcrypt.c
+++ b/crypto/pbkdf-gcrypt.c
@@ -33,7 +33,7 @@ bool qcrypto_pbkdf2_supports(QCryptoHashAlgorithm hash)
 case QCRYPTO_HASH_ALG_SHA384:
 case QCRYPTO_HASH_ALG_SHA512:
 case QCRYPTO_HASH_ALG_RIPEMD160:
-return true;
+return qcrypto_hash_supports(hash);
 default:
 return false;
 }
diff --git a/crypto/pbkdf-gnutls.c b/crypto/pbkdf-gnutls.c
index 2dfbbd382c..911b565bea 100644
--- a/crypto/pbkdf-gnutls.c
+++ b/crypto/pbkdf-gnutls.c
@@ -33,7 +33,7 @@ bool qcrypto_pbkdf2_supports(QCryptoHashAlgorithm hash)
 case QCRYPTO_HASH_ALG_SHA384:
 case QCRYPTO_HASH_ALG_SHA512:
 case QCRYPTO_HASH_ALG_RIPEMD160:
-return true;
+return qcrypto_hash_supports(hash);
 default:
 return false;
 }
-- 
2.45.2




[PULL 06/10] crypto: avoid leak of ctx when bad cipher mode is given

2024-09-09 Thread Daniel P . Berrangé
Fixes: Coverity CID 1546884
Reviewed-by: Peter Maydell 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Daniel P. Berrangé 
---
 crypto/cipher-nettle.c.inc | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/crypto/cipher-nettle.c.inc b/crypto/cipher-nettle.c.inc
index 42b39e18a2..766de036ba 100644
--- a/crypto/cipher-nettle.c.inc
+++ b/crypto/cipher-nettle.c.inc
@@ -734,16 +734,19 @@ static QCryptoCipher 
*qcrypto_cipher_ctx_new(QCryptoCipherAlgorithm alg,
 #ifdef CONFIG_CRYPTO_SM4
 case QCRYPTO_CIPHER_ALG_SM4:
 {
-QCryptoNettleSm4 *ctx = g_new0(QCryptoNettleSm4, 1);
+QCryptoNettleSm4 *ctx;
+const QCryptoCipherDriver *drv;
 
 switch (mode) {
 case QCRYPTO_CIPHER_MODE_ECB:
-ctx->base.driver = &qcrypto_nettle_sm4_driver_ecb;
+drv = &qcrypto_nettle_sm4_driver_ecb;
 break;
 default:
 goto bad_cipher_mode;
 }
 
+ctx = g_new0(QCryptoNettleSm4, 1);
+ctx->base.driver = drv;
 sm4_set_encrypt_key(&ctx->key[0], key);
 sm4_set_decrypt_key(&ctx->key[1], key);
 
-- 
2.45.2




[PULL 01/10] iotests: fix expected output from gnutls

2024-09-09 Thread Daniel P . Berrangé
Error reporting from gnutls was improved by:

  commit 57941c9c86357a6a642f9ee3279d881df4043b6d
  Author: Daniel P. Berrangé 
  Date:   Fri Mar 15 14:07:58 2024 +

crypto: push error reporting into TLS session I/O APIs

This has the effect of changing the output from one of the NBD
tests.

Reported-by: Thomas Huth 
Signed-off-by: Daniel P. Berrangé 
---
 tests/qemu-iotests/233.out | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/tests/qemu-iotests/233.out b/tests/qemu-iotests/233.out
index 1910f7df20..d498d55e0e 100644
--- a/tests/qemu-iotests/233.out
+++ b/tests/qemu-iotests/233.out
@@ -69,8 +69,8 @@ read 1048576/1048576 bytes at offset 1048576
 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
 == check TLS with authorization ==
-qemu-img: Could not open 'driver=nbd,host=127.0.0.1,port=PORT,tls-creds=tls0': 
Failed to read option reply: Cannot read from TLS channel: Software caused 
connection abort
-qemu-img: Could not open 'driver=nbd,host=127.0.0.1,port=PORT,tls-creds=tls0': 
Failed to read option reply: Cannot read from TLS channel: Software caused 
connection abort
+qemu-img: Could not open 'driver=nbd,host=127.0.0.1,port=PORT,tls-creds=tls0': 
Failed to read option reply: Cannot read from TLS channel: The TLS connection 
was non-properly terminated.
+qemu-img: Could not open 'driver=nbd,host=127.0.0.1,port=PORT,tls-creds=tls0': 
Failed to read option reply: Cannot read from TLS channel: The TLS connection 
was non-properly terminated.
 
 == check TLS fail over UNIX with no hostname ==
 qemu-img: Could not open 
'driver=nbd,path=SOCK_DIR/qemu-nbd.sock,tls-creds=tls0': No hostname for 
certificate validation
@@ -103,14 +103,14 @@ qemu-img: Could not open 
'driver=nbd,path=SOCK_DIR/qemu-nbd.sock,tls-creds=tls0'
 qemu-nbd: TLS handshake failed: The TLS connection was non-properly terminated.
 
 == final server log ==
-qemu-nbd: option negotiation failed: Failed to read opts magic: Cannot read 
from TLS channel: Software caused connection abort
-qemu-nbd: option negotiation failed: Failed to read opts magic: Cannot read 
from TLS channel: Software caused connection abort
+qemu-nbd: option negotiation failed: Failed to read opts magic: Cannot read 
from TLS channel: The TLS connection was non-properly terminated.
+qemu-nbd: option negotiation failed: Failed to read opts magic: Cannot read 
from TLS channel: The TLS connection was non-properly terminated.
 qemu-nbd: option negotiation failed: Verify failed: No certificate was found.
 qemu-nbd: option negotiation failed: Verify failed: No certificate was found.
 qemu-nbd: option negotiation failed: TLS x509 authz check for 
DISTINGUISHED-NAME is denied
 qemu-nbd: option negotiation failed: TLS x509 authz check for 
DISTINGUISHED-NAME is denied
-qemu-nbd: option negotiation failed: Failed to read opts magic: Cannot read 
from TLS channel: Software caused connection abort
-qemu-nbd: option negotiation failed: Failed to read opts magic: Cannot read 
from TLS channel: Software caused connection abort
+qemu-nbd: option negotiation failed: Failed to read opts magic: Cannot read 
from TLS channel: The TLS connection was non-properly terminated.
+qemu-nbd: option negotiation failed: Failed to read opts magic: Cannot read 
from TLS channel: The TLS connection was non-properly terminated.
 qemu-nbd: option negotiation failed: TLS handshake failed: An illegal 
parameter has been received.
 qemu-nbd: option negotiation failed: TLS handshake failed: An illegal 
parameter has been received.
 *** done
-- 
2.45.2




[PULL 02/10] crypto: run qcrypto_pbkdf2_count_iters in a new thread

2024-09-09 Thread Daniel P . Berrangé
From: Tiago Pasqualini 

CPU time accounting in the kernel has been demonstrated to have a
sawtooth pattern[1][2]. This can make the getrusage system call less
accurate than we expect, which in turn can cause this calculation
to stall.

The kernel discussions show that this inaccuracy appears once the
accumulated CPU time gets big enough, so this patch changes
qcrypto_pbkdf2_count_iters to run in a fresh thread to avoid the
inaccuracy. It also adds a sanity check to fail the process if CPU time
is not accounted.

[1] 
https://lore.kernel.org/lkml/159231011694.16989.16351419333851309713.tip-bot2@tip-bot2/
[2] 
https://lore.kernel.org/lkml/20221226031010.4079885-1-maxing@bytedance.com/t/#m1c7f2fdc0ea742776a70fd1aa2a2e414c437f534
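
The throwaway-thread trick generalizes beyond QEMU. A minimal Python
sketch of the same idea — time.thread_time() standing in for
getrusage(RUSAGE_THREAD), all names my own invention:

```python
import threading
import time


def cpu_time_in_fresh_thread(fn):
    """Run fn on a brand-new thread and return that thread's CPU time.

    A fresh thread starts with zero accumulated CPU time, so the
    kernel's sawtooth accounting error has no chance to build up --
    the same reasoning as this patch.
    """
    result = {}

    def worker():
        start = time.thread_time()  # CPU time of this thread only
        fn()
        result["cpu_s"] = time.thread_time() - start

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return result["cpu_s"]


elapsed = cpu_time_in_fresh_thread(lambda: sum(i * i for i in range(200_000)))
assert elapsed >= 0.0
```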

Resolves: #2398
Signed-off-by: Tiago Pasqualini 
Signed-off-by: Daniel P. Berrangé 
---
 crypto/pbkdf.c | 53 +++---
 1 file changed, 46 insertions(+), 7 deletions(-)

diff --git a/crypto/pbkdf.c b/crypto/pbkdf.c
index 8d198c152c..d1c06ef3ed 100644
--- a/crypto/pbkdf.c
+++ b/crypto/pbkdf.c
@@ -19,6 +19,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/thread.h"
 #include "qapi/error.h"
 #include "crypto/pbkdf.h"
 #ifndef _WIN32
@@ -85,12 +86,28 @@ static int qcrypto_pbkdf2_get_thread_cpu(unsigned long long 
*val_ms,
 #endif
 }
 
-uint64_t qcrypto_pbkdf2_count_iters(QCryptoHashAlgorithm hash,
-const uint8_t *key, size_t nkey,
-const uint8_t *salt, size_t nsalt,
-size_t nout,
-Error **errp)
+typedef struct CountItersData {
+QCryptoHashAlgorithm hash;
+const uint8_t *key;
+size_t nkey;
+const uint8_t *salt;
+size_t nsalt;
+size_t nout;
+uint64_t iterations;
+Error **errp;
+} CountItersData;
+
+static void *threaded_qcrypto_pbkdf2_count_iters(void *data)
 {
+CountItersData *iters_data = (CountItersData *) data;
+QCryptoHashAlgorithm hash = iters_data->hash;
+const uint8_t *key = iters_data->key;
+size_t nkey = iters_data->nkey;
+const uint8_t *salt = iters_data->salt;
+size_t nsalt = iters_data->nsalt;
+size_t nout = iters_data->nout;
+Error **errp = iters_data->errp;
+
 uint64_t ret = -1;
 g_autofree uint8_t *out = g_new(uint8_t, nout);
 uint64_t iterations = (1 << 15);
@@ -114,7 +131,10 @@ uint64_t qcrypto_pbkdf2_count_iters(QCryptoHashAlgorithm 
hash,
 
 delta_ms = end_ms - start_ms;
 
-if (delta_ms > 500) {
+if (delta_ms == 0) { /* sanity check */
+error_setg(errp, "Unable to get accurate CPU usage");
+goto cleanup;
+} else if (delta_ms > 500) {
 break;
 } else if (delta_ms < 100) {
 iterations = iterations * 10;
@@ -129,5 +149,24 @@ uint64_t qcrypto_pbkdf2_count_iters(QCryptoHashAlgorithm 
hash,
 
  cleanup:
 memset(out, 0, nout);
-return ret;
+iters_data->iterations = ret;
+return NULL;
+}
+
+uint64_t qcrypto_pbkdf2_count_iters(QCryptoHashAlgorithm hash,
+const uint8_t *key, size_t nkey,
+const uint8_t *salt, size_t nsalt,
+size_t nout,
+Error **errp)
+{
+CountItersData data = {
+hash, key, nkey, salt, nsalt, nout, 0, errp
+};
+QemuThread thread;
+
+qemu_thread_create(&thread, "pbkdf2", threaded_qcrypto_pbkdf2_count_iters,
+   &data, QEMU_THREAD_JOINABLE);
+qemu_thread_join(&thread);
+
+return data.iterations;
 }
-- 
2.45.2




[PULL 07/10] crypto: use consistent error reporting pattern for unsupported cipher modes

2024-09-09 Thread Daniel P . Berrangé
Not all paths in qcrypto_cipher_ctx_new() were correctly distinguishing
between valid user input for cipher mode (which should report a user
facing error), vs program logic errors (which should assert).

Reported-by: Peter Maydell 
Signed-off-by: Daniel P. Berrangé 
---
 crypto/cipher-nettle.c.inc | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/crypto/cipher-nettle.c.inc b/crypto/cipher-nettle.c.inc
index 766de036ba..2654b439c1 100644
--- a/crypto/cipher-nettle.c.inc
+++ b/crypto/cipher-nettle.c.inc
@@ -525,8 +525,10 @@ static QCryptoCipher 
*qcrypto_cipher_ctx_new(QCryptoCipherAlgorithm alg,
 case QCRYPTO_CIPHER_MODE_CTR:
 drv = &qcrypto_nettle_des_driver_ctr;
 break;
-default:
+case QCRYPTO_CIPHER_MODE_XTS:
 goto bad_cipher_mode;
+default:
+g_assert_not_reached();
 }
 
 ctx = g_new0(QCryptoNettleDES, 1);
@@ -551,8 +553,10 @@ static QCryptoCipher 
*qcrypto_cipher_ctx_new(QCryptoCipherAlgorithm alg,
 case QCRYPTO_CIPHER_MODE_CTR:
 drv = &qcrypto_nettle_des3_driver_ctr;
 break;
-default:
+case QCRYPTO_CIPHER_MODE_XTS:
 goto bad_cipher_mode;
+default:
+g_assert_not_reached();
 }
 
 ctx = g_new0(QCryptoNettleDES3, 1);
@@ -663,8 +667,10 @@ static QCryptoCipher 
*qcrypto_cipher_ctx_new(QCryptoCipherAlgorithm alg,
 case QCRYPTO_CIPHER_MODE_CTR:
 drv = &qcrypto_nettle_cast128_driver_ctr;
 break;
-default:
+case QCRYPTO_CIPHER_MODE_XTS:
 goto bad_cipher_mode;
+default:
+g_assert_not_reached();
 }
 
 ctx = g_new0(QCryptoNettleCAST128, 1);
@@ -741,8 +747,12 @@ static QCryptoCipher 
*qcrypto_cipher_ctx_new(QCryptoCipherAlgorithm alg,
 case QCRYPTO_CIPHER_MODE_ECB:
 drv = &qcrypto_nettle_sm4_driver_ecb;
 break;
-default:
+case QCRYPTO_CIPHER_MODE_CBC:
+case QCRYPTO_CIPHER_MODE_CTR:
+case QCRYPTO_CIPHER_MODE_XTS:
 goto bad_cipher_mode;
+default:
+g_assert_not_reached();
 }
 
 ctx = g_new0(QCryptoNettleSm4, 1);
-- 
2.45.2




Re: [PATCH] block: support locking on change medium

2024-09-09 Thread Joelle van Dyne
On Mon, Sep 9, 2024 at 12:36 AM Akihiko Odaki  wrote:
>
> On 2024/09/09 10:58, Joelle van Dyne wrote:
> > New optional argument for 'blockdev-change-medium' QAPI command to allow
> > the caller to specify if they wish to enable file locking.
> >
> > Signed-off-by: Joelle van Dyne 
> > ---
> >   qapi/block.json| 23 ++-
> >   block/monitor/block-hmp-cmds.c |  2 +-
> >   block/qapi-sysemu.c| 22 ++
> >   ui/cocoa.m |  1 +
> >   4 files changed, 46 insertions(+), 2 deletions(-)
> >
> > diff --git a/qapi/block.json b/qapi/block.json
> > index e6f5c6..35e8e2e191 100644
> > --- a/qapi/block.json
> > +++ b/qapi/block.json
> > @@ -309,6 +309,23 @@
> >   { 'enum': 'BlockdevChangeReadOnlyMode',
> > 'data': ['retain', 'read-only', 'read-write'] }
> >
> > +##
> > +# @BlockdevChangeFileLockingMode:
> > +#
> > +# Specifies the new locking mode of a file image passed to the
> > +# @blockdev-change-medium command.
> > +#
> > +# @auto: Use locking if API is available
> > +#
> > +# @off: Disable file image locking
> > +#
> > +# @on: Enable file image locking
> > +#
> > +# Since: 9.2
> > +##
> > +{ 'enum': 'BlockdevChangeFileLockingMode',
> > +  'data': ['auto', 'off', 'on'] }
>
> You can use OnOffAuto type instead of defining your own.

This can be done. I had thought that defining a new type would make the
argument's meaning more explicit.
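
For reference, the reviewer's suggestion would look roughly like this in
the QAPI schema — a sketch only, assuming the OnOffAuto enum from
qapi/common.json, with members not shown in the patch elided:

```
# Sketch only (not from the patch): reuse OnOffAuto from
# qapi/common.json instead of defining BlockdevChangeFileLockingMode.
{ 'command': 'blockdev-change-medium',
  'data': { 'filename': 'str',
            '*format': 'str',
            '*force': 'bool',
            '*read-only-mode': 'BlockdevChangeReadOnlyMode',
            '*file-locking-mode': 'OnOffAuto' } }
```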

>
> > +
> >   ##
> >   # @blockdev-change-medium:
> >   #
> > @@ -330,6 +347,9 @@
> >   # @read-only-mode: change the read-only mode of the device; defaults
> >   # to 'retain'
> >   #
> > +# @file-locking-mode: change the locking mode of the file image; defaults
> > +# to 'auto' (since: 9.2)
> > +#
> >   # @force: if false (the default), an eject request through
> >   # blockdev-open-tray will be sent to the guest if it has locked
> >   # the tray (and the tray will not be opened immediately); if true,
> > @@ -378,7 +398,8 @@
> >   'filename': 'str',
> >   '*format': 'str',
> >   '*force': 'bool',
> > -'*read-only-mode': 'BlockdevChangeReadOnlyMode' } }
> > +'*read-only-mode': 'BlockdevChangeReadOnlyMode',
> > +'*file-locking-mode': 'BlockdevChangeFileLockingMode' } }
> >
> >   ##
> >   # @DEVICE_TRAY_MOVED:
> > diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
> > index bdf2eb50b6..ff64020a80 100644
> > --- a/block/monitor/block-hmp-cmds.c
> > +++ b/block/monitor/block-hmp-cmds.c
> > @@ -1007,5 +1007,5 @@ void hmp_change_medium(Monitor *mon, const char 
> > *device, const char *target,
> >   }
> >
> >   qmp_blockdev_change_medium(device, NULL, target, arg, true, force,
> > -   !!read_only, read_only_mode, errp);
> > +   !!read_only, read_only_mode, false, 0, 
> > errp);
> >   }
> > diff --git a/block/qapi-sysemu.c b/block/qapi-sysemu.c
> > index e4282631d2..8064bdfb3a 100644
> > --- a/block/qapi-sysemu.c
> > +++ b/block/qapi-sysemu.c
> > @@ -311,6 +311,8 @@ void qmp_blockdev_change_medium(const char *device,
> >   bool has_force, bool force,
> >   bool has_read_only,
> >   BlockdevChangeReadOnlyMode read_only,
> > +bool has_file_locking_mode,
> > +BlockdevChangeFileLockingMode 
> > file_locking_mode,
> >   Error **errp)
> >   {
> >   BlockBackend *blk;
> > @@ -362,6 +364,26 @@ void qmp_blockdev_change_medium(const char *device,
> >   qdict_put_str(options, "driver", format);
> >   }
> >
> > +if (!has_file_locking_mode) {
> > +file_locking_mode = BLOCKDEV_CHANGE_FILE_LOCKING_MODE_AUTO;
> > +}
> > +
> > +switch (file_locking_mode) {
> > +case BLOCKDEV_CHANGE_FILE_LOCKING_MODE_AUTO:
> > +break;
> > +
> > +case BLOCKDEV_CHANGE_FILE_LOCKING_MODE_OFF:
> > +qdict_put_str(options, "file.locking", "off");
> > +break;
> > +
> > +case BLOCKDEV_CHANGE_FILE_LOCKING_MODE_ON:
> > +qdict_put_str(options, "file.locking", "on");
> > +break;
> > +
> > +default:
> > +abort();
> > +}
> > +
> >   medium_bs = bdrv_open(filename, NULL, options, bdrv_flags, errp);
> >
> >   if (!medium_bs) {
> > diff --git a/ui/cocoa.m b/ui/cocoa.m
> > index 4c2dd33532..6e73c6e13e 100644
> > --- a/ui/cocoa.m
> > +++ b/ui/cocoa.m
> > @@ -1611,6 +1611,7 @@ - (void)changeDeviceMedia:(id)sender
> >  "raw",
> >  true, false,
> >  false, 0,
> > +   false, 0,
>
> This change is irrelevant.

This change is needed; otherwise QEMU will not compile.

>
> Regards,
> Akihiko Odaki



[PULL 04/10] tests/unit: always build the pbkdf crypto unit test

2024-09-09 Thread Daniel P . Berrangé
The meson rules were excluding the pbkdf crypto test when gnutls was the
crypto backend. It was then excluded again in #if statements in the test
file.

Rather than update these conditions, remove them all, and use the result
of the qcrypto_pbkdf_supports() function to determine whether to skip
test registration.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Daniel P. Berrangé 
---
 tests/unit/meson.build |  4 +---
 tests/unit/test-crypto-pbkdf.c | 13 -
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/tests/unit/meson.build b/tests/unit/meson.build
index 490ab8182d..972d792883 100644
--- a/tests/unit/meson.build
+++ b/tests/unit/meson.build
@@ -121,9 +121,7 @@ if have_block
   if config_host_data.get('CONFIG_REPLICATION')
 tests += {'test-replication': [testblock]}
   endif
-  if nettle.found() or gcrypt.found()
-tests += {'test-crypto-pbkdf': [io]}
-  endif
+  tests += {'test-crypto-pbkdf': [io]}
 endif
 
 if have_system
diff --git a/tests/unit/test-crypto-pbkdf.c b/tests/unit/test-crypto-pbkdf.c
index 43c417f6b4..241e1c2cf0 100644
--- a/tests/unit/test-crypto-pbkdf.c
+++ b/tests/unit/test-crypto-pbkdf.c
@@ -25,8 +25,7 @@
 #include 
 #endif
 
-#if ((defined(CONFIG_NETTLE) || defined(CONFIG_GCRYPT)) && \
- (defined(_WIN32) || defined(RUSAGE_THREAD)))
+#if defined(_WIN32) || defined(RUSAGE_THREAD)
 #include "crypto/pbkdf.h"
 
 typedef struct QCryptoPbkdfTestData QCryptoPbkdfTestData;
@@ -394,7 +393,7 @@ static void test_pbkdf(const void *opaque)
 }
 
 
-static void test_pbkdf_timing(void)
+static void test_pbkdf_timing_sha256(void)
 {
 uint8_t key[32];
 uint8_t salt[32];
@@ -422,14 +421,18 @@ int main(int argc, char **argv)
 g_assert(qcrypto_init(NULL) == 0);
 
 for (i = 0; i < G_N_ELEMENTS(test_data); i++) {
+if (!qcrypto_pbkdf2_supports(test_data[i].hash)) {
+continue;
+}
+
 if (!test_data[i].slow ||
 g_test_slow()) {
 g_test_add_data_func(test_data[i].path, &test_data[i], test_pbkdf);
 }
 }
 
-if (g_test_slow()) {
-g_test_add_func("/crypt0/pbkdf/timing", test_pbkdf_timing);
+if (g_test_slow() && qcrypto_pbkdf2_supports(QCRYPTO_HASH_ALG_SHA256)) {
+g_test_add_func("/crypt0/pbkdf/timing/sha256", 
test_pbkdf_timing_sha256);
 }
 
 return g_test_run();
-- 
2.45.2




[PULL 10/10] crypto: Introduce x509 utils

2024-09-09 Thread Daniel P . Berrangé
From: Dorjoy Chowdhury 

A utility function for getting the fingerprint from an X.509
certificate has been introduced. An implementation is provided only for
gnutls.

Signed-off-by: Dorjoy Chowdhury 
[DB: fixed missing gnutls_x509_crt_deinit in success path]
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Daniel P. Berrangé 
---
 crypto/meson.build  |  4 ++
 crypto/x509-utils.c | 76 +
 include/crypto/x509-utils.h | 22 +++
 3 files changed, 102 insertions(+)
 create mode 100644 crypto/x509-utils.c
 create mode 100644 include/crypto/x509-utils.h

diff --git a/crypto/meson.build b/crypto/meson.build
index c46f9c22a7..735635de1f 100644
--- a/crypto/meson.build
+++ b/crypto/meson.build
@@ -24,6 +24,10 @@ crypto_ss.add(files(
   'rsakey.c',
 ))
 
+if gnutls.found()
+  crypto_ss.add(files('x509-utils.c'))
+endif
+
 if nettle.found()
   crypto_ss.add(nettle, files('hash-nettle.c', 'hmac-nettle.c', 
'pbkdf-nettle.c'))
   if hogweed.found()
diff --git a/crypto/x509-utils.c b/crypto/x509-utils.c
new file mode 100644
index 00..6e157af76b
--- /dev/null
+++ b/crypto/x509-utils.c
@@ -0,0 +1,76 @@
+/*
+ * X.509 certificate related helpers
+ *
+ * Copyright (c) 2024 Dorjoy Chowdhury 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * (at your option) any later version.  See the COPYING file in the
+ * top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "crypto/x509-utils.h"
+#include 
+#include 
+#include 
+
+static const int qcrypto_to_gnutls_hash_alg_map[QCRYPTO_HASH_ALG__MAX] = {
+[QCRYPTO_HASH_ALG_MD5] = GNUTLS_DIG_MD5,
+[QCRYPTO_HASH_ALG_SHA1] = GNUTLS_DIG_SHA1,
+[QCRYPTO_HASH_ALG_SHA224] = GNUTLS_DIG_SHA224,
+[QCRYPTO_HASH_ALG_SHA256] = GNUTLS_DIG_SHA256,
+[QCRYPTO_HASH_ALG_SHA384] = GNUTLS_DIG_SHA384,
+[QCRYPTO_HASH_ALG_SHA512] = GNUTLS_DIG_SHA512,
+[QCRYPTO_HASH_ALG_RIPEMD160] = GNUTLS_DIG_RMD160,
+};
+
+int qcrypto_get_x509_cert_fingerprint(uint8_t *cert, size_t size,
+  QCryptoHashAlgorithm alg,
+  uint8_t *result,
+  size_t *resultlen,
+  Error **errp)
+{
+int ret = -1;
+int hlen;
+gnutls_x509_crt_t crt;
+gnutls_datum_t datum = {.data = cert, .size = size};
+
+if (alg >= G_N_ELEMENTS(qcrypto_to_gnutls_hash_alg_map)) {
+error_setg(errp, "Unknown hash algorithm");
+return -1;
+}
+
+if (result == NULL) {
+error_setg(errp, "No valid buffer given");
+return -1;
+}
+
+gnutls_x509_crt_init(&crt);
+
+if (gnutls_x509_crt_import(crt, &datum, GNUTLS_X509_FMT_PEM) != 0) {
+error_setg(errp, "Failed to import certificate");
+goto cleanup;
+}
+
+hlen = gnutls_hash_get_len(qcrypto_to_gnutls_hash_alg_map[alg]);
+if (*resultlen < hlen) {
+error_setg(errp,
+   "Result buffer size %zu is smaller than hash %d",
+   *resultlen, hlen);
+goto cleanup;
+}
+
+if (gnutls_x509_crt_get_fingerprint(crt,
+qcrypto_to_gnutls_hash_alg_map[alg],
+result, resultlen) != 0) {
+error_setg(errp, "Failed to get fingerprint from certificate");
+goto cleanup;
+}
+
+ret = 0;
+
+ cleanup:
+gnutls_x509_crt_deinit(crt);
+return ret;
+}
diff --git a/include/crypto/x509-utils.h b/include/crypto/x509-utils.h
new file mode 100644
index 00..4210dfbcfc
--- /dev/null
+++ b/include/crypto/x509-utils.h
@@ -0,0 +1,22 @@
+/*
+ * X.509 certificate related helpers
+ *
+ * Copyright (c) 2024 Dorjoy Chowdhury 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * (at your option) any later version.  See the COPYING file in the
+ * top-level directory.
+ */
+
+#ifndef QCRYPTO_X509_UTILS_H
+#define QCRYPTO_X509_UTILS_H
+
+#include "crypto/hash.h"
+
+int qcrypto_get_x509_cert_fingerprint(uint8_t *cert, size_t size,
+  QCryptoHashAlgorithm hash,
+  uint8_t *result,
+  size_t *resultlen,
+  Error **errp);
+
+#endif
-- 
2.45.2
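
For intuition (my own illustration, not code from this series): gnutls
computes an X.509 fingerprint as the chosen digest over the
certificate's DER encoding, so a toy equivalent is just a hash call.
The input below is a placeholder, not a real certificate:

```python
import hashlib


def x509_fingerprint(der_bytes: bytes, alg: str = "sha256") -> bytes:
    # An X.509 fingerprint is the chosen digest over the DER encoding
    # of the certificate -- the same thing gnutls returns from
    # gnutls_x509_crt_get_fingerprint().
    return hashlib.new(alg, der_bytes).digest()


# Placeholder input for illustration; a real caller passes DER bytes.
fp = x509_fingerprint(b"placeholder-not-a-real-certificate")
assert len(fp) == hashlib.sha256().digest_size  # 32 bytes for sha256
```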




[Bug Report] smmuv3 event 0x10 report when running virtio-blk-pci

2024-09-09 Thread Zhou Wang via
Hi All,

When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10
during kernel booting up.

qemu command which I use is as below:

qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \
-kernel Image -initrd minifs.cpio.gz \
-enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \
-append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x9000 maxcpus=3' \
-device 
pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2 
\
-device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \
-device 
virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \
-drive file=/home/boot.img,if=none,id=drive0,format=raw

smmuv3 event 0x10 log:
[...]
[1.962656] virtio-pci :02:00.0: Adding to iommu group 0
[1.963150] virtio-pci :02:00.0: enabling device ( -> 0002)
[1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues
[1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks (1.07 
GB/1.00 GiB)
[1.966934] arm-smmu-v3 905.smmuv3: event 0x10 received:
[1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0
[1.967478] arm-smmu-v3 905.smmuv3:  0x0210
[1.968381] clk: Disabling unused clocks
[1.968677] arm-smmu-v3 905.smmuv3:  0x0200
[1.968990] PM: genpd: Disabling unused power domains
[1.969424] arm-smmu-v3 905.smmuv3:  0x
[1.969814] ALSA device list:
[1.970240] arm-smmu-v3 905.smmuv3:  0x
[1.970471]   No soundcards found.
[1.970902] arm-smmu-v3 905.smmuv3: event 0x10 received:
[1.971600] arm-smmu-v3 905.smmuv3:  0x0210
[1.971601] arm-smmu-v3 905.smmuv3:  0x0200
[1.971601] arm-smmu-v3 905.smmuv3:  0x
[1.971602] arm-smmu-v3 905.smmuv3:  0x
[1.971606] arm-smmu-v3 905.smmuv3: event 0x10 received:
[1.971607] arm-smmu-v3 905.smmuv3:  0x0210
[1.974202] arm-smmu-v3 905.smmuv3:  0x0200
[1.974634] arm-smmu-v3 905.smmuv3:  0x
[1.975005] Freeing unused kernel memory: 10112K
[1.975062] arm-smmu-v3 905.smmuv3:  0x
[1.975442] Run init as init process

Another information is that if "maxcpus=3" is removed from the kernel command 
line,
it will be OK.

I am not sure if there is a bug in vsmmu. It would be much appreciated
if anyone knows about this issue or can take a look at it.

Thanks,
Zhou





Re: [PATCH] block: support locking on change medium

2024-09-09 Thread Joelle van Dyne
On Mon, Sep 9, 2024 at 2:56 AM Kevin Wolf  wrote:
>
> Am 09.09.2024 um 03:58 hat Joelle van Dyne geschrieben:
> > New optional argument for 'blockdev-change-medium' QAPI command to allow
> > the caller to specify if they wish to enable file locking.
> >
> > Signed-off-by: Joelle van Dyne 
>
> I feel once you need to control such details of the backend, you should
> really use a separate 'blockdev-add' command.
>
> If it feels a bit too cumbersome to send explicit commands to open the
> tray, remove the medium, insert the new medium referencing the node you
> added with 'blockdev-add' and then close the tray again, I can
> understand. Maybe what we should do is extend 'blockdev-change-medium'
> so that it doesn't only accept a filename to specify the new images, but
> alternatively also a node-name.
>
> > +switch (file_locking_mode) {
> > +case BLOCKDEV_CHANGE_FILE_LOCKING_MODE_AUTO:
> > +break;
> > +
> > +case BLOCKDEV_CHANGE_FILE_LOCKING_MODE_OFF:
> > +qdict_put_str(options, "file.locking", "off");
> > +break;
> > +
> > +case BLOCKDEV_CHANGE_FILE_LOCKING_MODE_ON:
> > +qdict_put_str(options, "file.locking", "on");
> > +break;
> > +
> > +default:
> > +abort();
> > +}
>
> Using "file.locking" makes assumptions about what the passed filename
> string would result in. There is nothing that guarantees that the block
> driver even has a "file" child, or that the "file" child is referring
> to a file-posix driver rather than using a different protocol or being a
> filter driver above yet another node. It also doesn't consider backing
> files and other non-primary children of the opened node.
>
> So this is not correct, and I don't think there is any realistic way of
> making it correct with this approach.

The existence of "filename" already bakes in the assumption that the
input is a file child. While I agree with you that there are better
ways to solve this problem, ultimately "blockdev-change-medium" will
have to be deprecated once this hypothetical "better" way of
referencing a node added with blockdev-add is introduced. Meanwhile,
this solves a very real problem on macOS: trying to change medium with
an ISO that the OS has already mounted will always fail, even when
"read-only-mode" is set.

>
> Kevin
>



[PATCH RESEND RFC 00/10] migration: auto-converge refinements for huge VM

2024-09-09 Thread Hyman Huang
Currently, a huge VM under heavy memory load may take a long time to
reach its maximum throttle percentage. The root cause is that the
current auto-converge throttle logic does not scale:
migration_trigger_throttle() is only called once per iteration, so it
won't be invoked for a long time if a single iteration takes a long
time.

This patchset provides two refinements aiming at the above case.

1: The periodic CPU throttle. As Peter points out, "throttle only
   for each sync, sync for each iteration" may have made sense in the
   old days, but perhaps not anymore. So we introduce a periodic
   CPU throttle implementation for migration, which is a trade-off
   between synchronization overhead and CPU throttle impact.

2: The responsive CPU throttle. We present a new criterion called
   "dirty ratio" to help improve the detection accuracy and hence
   accelerate the throttle's invocation.

The RFC version of the refinement may be a rudimentary implementation;
I would appreciate hearing more feedback.
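
As a toy model of the "dirty ratio" idea — my own illustration with
invented names and an invented threshold, not code from this series:

```python
def should_throttle(pages_dirtied, pages_sent, ratio_threshold=0.5):
    """Toy 'responsive throttle' criterion: trigger once the guest
    dirties memory at least ratio_threshold times as fast as the
    migration stream sends it out. Names/threshold are illustrative."""
    if pages_sent == 0:
        return False
    return pages_dirtied / pages_sent >= ratio_threshold


# A guest dirtying pages nearly as fast as they are sent gets throttled;
# a mostly idle guest does not.
assert should_throttle(80, 100)
assert not should_throttle(10, 100)
```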

Yong, thanks.

Hyman Huang (10):
  migration: Introduce structs for periodic CPU throttle
  migration: Refine util functions to support periodic CPU throttle
  qapi/migration: Introduce periodic CPU throttling parameters
  qapi/migration: Introduce the iteration-count
  migration: Introduce util functions for periodic CPU throttle
  migration: Support periodic CPU throttle
  tests/migration-tests: Add test case for periodic throttle
  migration: Introduce cpu-responsive-throttle parameter
  migration: Support responsive CPU throttle
  tests/migration-tests: Add test case for responsive CPU throttle

 include/exec/ram_addr.h| 107 +++-
 include/exec/ramblock.h|  45 +++
 migration/migration-hmp-cmds.c |  25 
 migration/migration-stats.h|   4 +
 migration/migration.c  |  12 ++
 migration/options.c|  74 +++
 migration/options.h|   3 +
 migration/ram.c| 218 ++---
 migration/ram.h|   4 +
 migration/trace-events |   4 +
 qapi/migration.json|  45 ++-
 tests/qtest/migration-test.c   |  77 +++-
 12 files changed, 593 insertions(+), 25 deletions(-)

-- 
2.39.1




[PATCH RESEND RFC 07/10] tests/migration-tests: Add test case for periodic throttle

2024-09-09 Thread Hyman Huang
To make sure the periodic throttle feature doesn't regress any
existing features or functionality, enable this feature in the
auto-converge migration test.

Signed-off-by: Hyman Huang 
---
 tests/qtest/migration-test.c | 56 +++-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 2fb10658d4..61d7182f88 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -281,6 +281,11 @@ static uint64_t get_migration_pass(QTestState *who)
 return read_ram_property_int(who, "iteration-count");
 }
 
+static uint64_t get_migration_dirty_sync_count(QTestState *who)
+{
+return read_ram_property_int(who, "dirty-sync-count");
+}
+
 static void read_blocktime(QTestState *who)
 {
 QDict *rsp_return;
@@ -710,6 +715,11 @@ typedef struct {
 PostcopyRecoveryFailStage postcopy_recovery_fail_stage;
 } MigrateCommon;
 
+typedef struct {
+/* CPU throttle parameters */
+bool periodic;
+} AutoConvergeArgs;
+
 static int test_migrate_start(QTestState **from, QTestState **to,
   const char *uri, MigrateStart *args)
 {
@@ -2778,12 +2788,13 @@ static void test_validate_uri_channels_none_set(void)
  * To make things even worse, we need to run the initial stage at
  * 3MB/s so we enter autoconverge even when host is (over)loaded.
  */
-static void test_migrate_auto_converge(void)
+static void test_migrate_auto_converge_args(AutoConvergeArgs *input_args)
 {
 g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
 MigrateStart args = {};
 QTestState *from, *to;
 int64_t percentage;
+bool periodic = (input_args && input_args->periodic);
 
 /*
  * We want the test to be stable and as fast as possible.
@@ -2791,6 +2802,7 @@ static void test_migrate_auto_converge(void)
  * so we need to decrease a bandwidth.
  */
 const int64_t init_pct = 5, inc_pct = 25, max_pct = 95;
+const int64_t periodic_throttle_interval = 2;
 
 if (test_migrate_start(&from, &to, uri, &args)) {
 return;
@@ -2801,6 +2813,12 @@ static void test_migrate_auto_converge(void)
 migrate_set_parameter_int(from, "cpu-throttle-increment", inc_pct);
 migrate_set_parameter_int(from, "max-cpu-throttle", max_pct);
 
+if (periodic) {
+migrate_set_parameter_bool(from, "cpu-periodic-throttle", true);
+migrate_set_parameter_int(from, "cpu-periodic-throttle-interval",
+periodic_throttle_interval);
+}
+
 /*
  * Set the initial parameters so that the migration could not converge
  * without throttling.
@@ -2827,6 +2845,29 @@ static void test_migrate_auto_converge(void)
 } while (true);
 /* The first percentage of throttling should be at least init_pct */
 g_assert_cmpint(percentage, >=, init_pct);
+
+if (periodic) {
+/*
+ * Check if the periodic sync takes effect; set the timeout to 20s
+ * (max_try_count * 1s). If no extra sync shows up, fail the test.
+ */
+uint64_t iteration_count, dirty_sync_count;
+bool extra_sync = false;
+int max_try_count = 20;
+
+/* Check if the periodic sync takes effect */
+while (--max_try_count) {
+usleep(1000 * 1000);
+iteration_count = get_migration_pass(from);
+dirty_sync_count = get_migration_dirty_sync_count(from);
+if (dirty_sync_count > iteration_count) {
+extra_sync = true;
+break;
+}
+}
+g_assert(extra_sync);
+}
+
 /* Now, when we tested that throttling works, let it converge */
 migrate_ensure_converge(from);
 
@@ -2849,6 +2890,17 @@ static void test_migrate_auto_converge(void)
 test_migrate_end(from, to, true);
 }
 
+static void test_migrate_auto_converge(void)
+{
+test_migrate_auto_converge_args(NULL);
+}
+
+static void test_migrate_auto_converge_periodic_throttle(void)
+{
+AutoConvergeArgs args = {.periodic = true};
+test_migrate_auto_converge_args(&args);
+}
+
 static void *
 test_migrate_precopy_tcp_multifd_start_common(QTestState *from,
   QTestState *to,
@@ -3900,6 +3952,8 @@ int main(int argc, char **argv)
 if (g_test_slow()) {
 migration_test_add("/migration/auto_converge",
test_migrate_auto_converge);
+migration_test_add("/migration/auto_converge_periodic_throttle",
+   test_migrate_auto_converge_periodic_throttle);
 if (g_str_equal(arch, "x86_64") &&
 has_kvm && kvm_dirty_ring_supported()) {
 migration_test_add("/migration/dirty_limit",
-- 
2.39.1




[PATCH RESEND RFC 06/10] migration: Support periodic CPU throttle

2024-09-09 Thread Hyman Huang
When a VM is configured with huge memory, the current throttle logic
doesn't scale, because migration_trigger_throttle() is only called
once per iteration, so it may not be invoked for a long time if a
single iteration takes a long time.

The periodic sync and throttle aim to fix the above issue by
synchronizing the remote dirty bitmap and triggering the throttle
periodically. This is a trade-off between synchronization overhead
and CPU throttle impact.
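
The idea can be sketched, under assumed names (this is not the QEMU
implementation), as a helper thread that forces a sync-and-throttle
step at a fixed interval instead of waiting for the iteration to end:

```python
# Minimal sketch of a periodic sync/throttle helper: wake up every
# `interval` seconds and run the sync-and-throttle step until stopped.
import threading


def run_periodic_throttle(sync_and_throttle, interval, stop_event):
    """Invoke sync_and_throttle() every `interval` seconds until stopped."""
    while not stop_event.is_set():
        # Wait on the event so a stop request interrupts the sleep.
        if stop_event.wait(timeout=interval):
            break
        sync_and_throttle()
```

A real implementation, as in this patch, must additionally rate-limit
syncs against the last bitmap sync time and take the proper locks
(BQL, RCU) around the sync.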

Signed-off-by: Hyman Huang 
---
 migration/migration.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 055d527ff6..fefd93b683 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1420,6 +1420,9 @@ static void migrate_fd_cleanup(MigrationState *s)
 qemu_thread_join(&s->thread);
 s->migration_thread_running = false;
 }
+if (migrate_periodic_throttle()) {
+periodic_throttle_stop();
+}
 bql_lock();
 
 multifd_send_shutdown();
@@ -3263,6 +3266,9 @@ static MigIterateState 
migration_iteration_run(MigrationState *s)
 
 if ((!pending_size || pending_size < s->threshold_size) && can_switchover) 
{
 trace_migration_thread_low_pending(pending_size);
+if (migrate_periodic_throttle()) {
+periodic_throttle_stop();
+}
 migration_completion(s);
 return MIG_ITERATE_BREAK;
 }
@@ -3508,6 +3514,11 @@ static void *migration_thread(void *opaque)
 ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
 bql_unlock();
 
+if (migrate_periodic_throttle()) {
+periodic_throttle_setup(true);
+periodic_throttle_start();
+}
+
 qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
MIGRATION_STATUS_ACTIVE);
 
-- 
2.39.1




[PATCH RESEND RFC 02/10] migration: Refine util functions to support periodic CPU throttle

2024-09-09 Thread Hyman Huang
Add a "periodic" argument to the migration_bitmap_sync function.
Introduce the sync_mode global variable to track the sync mode and
support periodic throttling while keeping backward compatibility.
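
A rough pure-Python model (hypothetical, simplified from the C in this
patch) of the bookkeeping below — dropping pages from the per-iteration
bitmap once they have been sent:

```python
# Model of the bitmap split used here: "bmap" drives the memory
# transfer, "shadow_bmap" snapshots it at a periodic sync, and
# "iter_bmap" accumulates dirty pages for the next iteration.
def clear_sent_pages(bmap, shadow_bmap, iter_bmap):
    """Clear iter_bmap bits for pages sent since the last periodic sync.

    A page was sent if it was pending at the periodic sync (shadow_bmap
    set) but is no longer pending in the transfer bitmap (bmap clear).
    """
    for k, (pending, was_pending) in enumerate(zip(bmap, shadow_bmap)):
        if was_pending and not pending:
            iter_bmap[k] = 0
    return iter_bmap

# Pages 0 and 2 were pending at sync time and have been sent since;
# page 1 is still pending, so it stays set in iter_bmap.
assert clear_sent_pages([0, 1, 0], [1, 1, 1], [1, 1, 1]) == [0, 1, 0]
```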

Signed-off-by: Hyman Huang 
---
 include/exec/ram_addr.h | 107 +---
 migration/ram.c |  49 ++
 2 files changed, 140 insertions(+), 16 deletions(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 891c44cf2d..7df926ed96 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -472,17 +472,68 @@ static inline void 
cpu_physical_memory_clear_dirty_range(ram_addr_t start,
 cpu_physical_memory_test_and_clear_dirty(start, length, DIRTY_MEMORY_CODE);
 }
 
+static void ramblock_clear_iter_bmap(RAMBlock *rb,
+ ram_addr_t start,
+ ram_addr_t length)
+{
+ram_addr_t addr;
+unsigned long *bmap = rb->bmap;
+unsigned long *shadow_bmap = rb->shadow_bmap;
+unsigned long *iter_bmap = rb->iter_bmap;
+
+for (addr = 0; addr < length; addr += TARGET_PAGE_SIZE) {
+long k = (start + addr) >> TARGET_PAGE_BITS;
+if (test_bit(k, shadow_bmap) && !test_bit(k, bmap)) {
+/* Page has been sent, clear the iter bmap */
+clear_bit(k, iter_bmap);
+}
+}
+}
+
+static void ramblock_update_iter_bmap(RAMBlock *rb,
+  ram_addr_t start,
+  ram_addr_t length)
+{
+ram_addr_t addr;
+unsigned long *bmap = rb->bmap;
+unsigned long *iter_bmap = rb->iter_bmap;
+
+for (addr = 0; addr < length; addr += TARGET_PAGE_SIZE) {
+long k = (start + addr) >> TARGET_PAGE_BITS;
+if (test_bit(k, iter_bmap)) {
+if (!test_bit(k, bmap)) {
+set_bit(k, bmap);
+rb->iter_dirty_pages++;
+}
+}
+}
+}
 
 /* Called with RCU critical section */
 static inline
 uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
ram_addr_t start,
-   ram_addr_t length)
+   ram_addr_t length,
+   unsigned int flag)
 {
 ram_addr_t addr;
 unsigned long word = BIT_WORD((start + rb->offset) >> TARGET_PAGE_BITS);
 uint64_t num_dirty = 0;
 unsigned long *dest = rb->bmap;
+unsigned long *shadow_bmap = rb->shadow_bmap;
+unsigned long *iter_bmap = rb->iter_bmap;
+
+assert(flag && !(flag & (~RAMBLOCK_SYN_MASK)));
+
+/*
+ * We must remove the sent dirty page from the iter_bmap in order to
+ * minimize redundant page transfers if periodic sync has appeared
+ * during this iteration.
+ */
+if (rb->periodic_sync_shown_up &&
+(flag & (RAMBLOCK_SYN_MODERN_ITER | RAMBLOCK_SYN_MODERN_PERIOD))) {
+ramblock_clear_iter_bmap(rb, start, length);
+}
 
 /* start address and length is aligned at the start of a word? */
 if (((word * BITS_PER_LONG) << TARGET_PAGE_BITS) ==
@@ -503,8 +554,20 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock 
*rb,
 if (src[idx][offset]) {
 unsigned long bits = qatomic_xchg(&src[idx][offset], 0);
 unsigned long new_dirty;
+if (flag & (RAMBLOCK_SYN_MODERN_ITER |
+RAMBLOCK_SYN_MODERN_PERIOD)) {
+/* Back-up bmap for the next iteration */
+iter_bmap[k] |= bits;
+if (flag == RAMBLOCK_SYN_MODERN_PERIOD) {
+/* Back-up bmap to detect pages that have been sent */
+shadow_bmap[k] = dest[k];
+}
+}
 new_dirty = ~dest[k];
-dest[k] |= bits;
+if (flag == RAMBLOCK_SYN_LEGACY_ITER) {
+dest[k] |= bits;
+}
+
 new_dirty &= bits;
 num_dirty += ctpopl(new_dirty);
 }
@@ -534,18 +597,50 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock 
*rb,
 ram_addr_t offset = rb->offset;
 
 for (addr = 0; addr < length; addr += TARGET_PAGE_SIZE) {
-if (cpu_physical_memory_test_and_clear_dirty(
+bool dirty = false;
+long k = (start + addr) >> TARGET_PAGE_BITS;
+if (flag == RAMBLOCK_SYN_MODERN_PERIOD) {
+if (test_bit(k, dest)) {
+/* Back-up bmap to detect pages that have been sent */
+set_bit(k, shadow_bmap);
+}
+}
+
+dirty = cpu_physical_memory_test_and_clear_dirty(
 start + addr + offset,
 TARGET_PAGE_SIZE,
-DIRTY_MEMORY_MIGRATION)) {
-long k = (start + addr)

[PATCH RESEND RFC 03/10] qapi/migration: Introduce periodic CPU throttling parameters

2024-09-09 Thread Hyman Huang
To activate the periodic CPU throttling feature, introduce the
cpu-periodic-throttle parameter.

To control the frequency of throttling, introduce the
cpu-periodic-throttle-interval parameter.

Signed-off-by: Hyman Huang 
---
 migration/migration-hmp-cmds.c | 17 +++
 migration/options.c| 54 ++
 migration/options.h|  2 ++
 qapi/migration.json| 25 +++-
 4 files changed, 97 insertions(+), 1 deletion(-)

diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 7d608d26e1..f7b8e06bb4 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -264,6 +264,15 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict 
*qdict)
 monitor_printf(mon, "%s: %s\n",
 MigrationParameter_str(MIGRATION_PARAMETER_CPU_THROTTLE_TAILSLOW),
 params->cpu_throttle_tailslow ? "on" : "off");
+assert(params->has_cpu_periodic_throttle);
+monitor_printf(mon, "%s: %s\n",
+MigrationParameter_str(MIGRATION_PARAMETER_CPU_PERIODIC_THROTTLE),
+params->cpu_periodic_throttle ? "on" : "off");
+assert(params->has_cpu_periodic_throttle_interval);
+monitor_printf(mon, "%s: %u\n",
+MigrationParameter_str(
+MIGRATION_PARAMETER_CPU_PERIODIC_THROTTLE_INTERVAL),
+params->cpu_periodic_throttle_interval);
 assert(params->has_max_cpu_throttle);
 monitor_printf(mon, "%s: %u\n",
 MigrationParameter_str(MIGRATION_PARAMETER_MAX_CPU_THROTTLE),
@@ -512,6 +521,14 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict 
*qdict)
 p->has_cpu_throttle_tailslow = true;
 visit_type_bool(v, param, &p->cpu_throttle_tailslow, &err);
 break;
+case MIGRATION_PARAMETER_CPU_PERIODIC_THROTTLE:
+p->has_cpu_periodic_throttle = true;
+visit_type_bool(v, param, &p->cpu_periodic_throttle, &err);
+break;
+case MIGRATION_PARAMETER_CPU_PERIODIC_THROTTLE_INTERVAL:
+p->has_cpu_periodic_throttle_interval = true;
+visit_type_uint8(v, param, &p->cpu_periodic_throttle_interval, &err);
+break;
 case MIGRATION_PARAMETER_MAX_CPU_THROTTLE:
 p->has_max_cpu_throttle = true;
 visit_type_uint8(v, param, &p->max_cpu_throttle, &err);
diff --git a/migration/options.c b/migration/options.c
index 645f55003d..2dbe275ba0 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -44,6 +44,7 @@
 #define DEFAULT_MIGRATE_THROTTLE_TRIGGER_THRESHOLD 50
 #define DEFAULT_MIGRATE_CPU_THROTTLE_INITIAL 20
 #define DEFAULT_MIGRATE_CPU_THROTTLE_INCREMENT 10
+#define DEFAULT_MIGRATE_CPU_PERIODIC_THROTTLE_INTERVAL 5
 #define DEFAULT_MIGRATE_MAX_CPU_THROTTLE 99
 
 /* Migration XBZRLE default cache size */
@@ -104,6 +105,11 @@ Property migration_properties[] = {
   DEFAULT_MIGRATE_CPU_THROTTLE_INCREMENT),
 DEFINE_PROP_BOOL("x-cpu-throttle-tailslow", MigrationState,
   parameters.cpu_throttle_tailslow, false),
+DEFINE_PROP_BOOL("x-cpu-periodic-throttle", MigrationState,
+  parameters.cpu_periodic_throttle, false),
+DEFINE_PROP_UINT8("x-cpu-periodic-throttle-interval", MigrationState,
+  parameters.cpu_periodic_throttle_interval,
+  DEFAULT_MIGRATE_CPU_PERIODIC_THROTTLE_INTERVAL),
 DEFINE_PROP_SIZE("x-max-bandwidth", MigrationState,
   parameters.max_bandwidth, MAX_THROTTLE),
 DEFINE_PROP_SIZE("avail-switchover-bandwidth", MigrationState,
@@ -695,6 +701,20 @@ uint8_t migrate_cpu_throttle_initial(void)
 return s->parameters.cpu_throttle_initial;
 }
 
+uint8_t migrate_periodic_throttle_interval(void)
+{
+MigrationState *s = migrate_get_current();
+
+return s->parameters.cpu_periodic_throttle_interval;
+}
+
+bool migrate_periodic_throttle(void)
+{
+MigrationState *s = migrate_get_current();
+
+return s->parameters.cpu_periodic_throttle;
+}
+
 bool migrate_cpu_throttle_tailslow(void)
 {
 MigrationState *s = migrate_get_current();
@@ -874,6 +894,11 @@ MigrationParameters *qmp_query_migrate_parameters(Error 
**errp)
 params->cpu_throttle_increment = s->parameters.cpu_throttle_increment;
 params->has_cpu_throttle_tailslow = true;
 params->cpu_throttle_tailslow = s->parameters.cpu_throttle_tailslow;
+params->has_cpu_periodic_throttle = true;
+params->cpu_periodic_throttle = s->parameters.cpu_periodic_throttle;
+params->has_cpu_periodic_throttle_interval = true;
+params->cpu_periodic_throttle_interval =
+s->parameters.cpu_periodic_throttle_interval;
 params->tls_creds = g_strdup(s->parameters.tls_creds);
 params->tls_hostname = g_strdup(s->parameters.tls_hostname);
 params->tls_authz = g_strdup(s->parameters.tls_authz ?
@@ -940,6 +965,8 @@ void migrate_params_init(MigrationParameters *params)
 params->has_cpu_throttle_initial = true;
 pa

[PATCH RESEND RFC 09/10] migration: Support responsive CPU throttle

2024-09-09 Thread Hyman Huang
Currently, the convergence algorithm determines that the migration
cannot converge according to the following principle:
the dirty pages generated in the current iteration exceed a specific
percentage (throttle-trigger-threshold, 50 by default) of the amount
of data transferred. Let's refer to this criterion as the "dirty
rate". If this criterion is met two or more times
(dirty_rate_high_cnt >= 2), the throttle percentage is increased.

In most cases, the above implementation is appropriate. However, for
a VM under heavy memory load, each iteration is time-consuming.
The VM's computing performance may be throttled at a high percentage
for a long time due to the repeated confirmation behavior, which may
be intolerable for some computationally sensitive software in the VM.

As the comment in the migration_trigger_throttle function mentions,
the original algorithm confirms the criterion repeatedly in order to
avoid erroneous detection. Put differently, the criterion does not
need to be validated again once the detection is more reliable.

In this refinement, in order to make the detection more accurate, we
introduce another criterion, called the "dirty ratio", to determine
migration convergence. The "dirty ratio" is the ratio of
bytes_dirty_period to bytes_xfer_period. When the algorithm
repeatedly detects that the "dirty ratio" of the current sync is no
lower than that of the previous one, it determines that the migration
cannot converge. For the "dirty rate" and the "dirty ratio", if
either criterion is met, the penalty percentage is increased. This
makes the CPU throttle respond faster, which shortens the entire
iteration and therefore reduces the duration of VM performance
degradation.

In conclusion, this refinement significantly reduces the time
required for the throttle percentage to step to its maximum while
the VM is under heavy memory load.
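
In spirit (names hypothetical, simplified from
migration_dirty_ratio_high below), the two-strike check works like
this:

```python
# Simplified model of the two-strike "dirty ratio" detection: flag
# non-convergence once the ratio exceeds the threshold and fails to
# drop below the previous sync's ratio on two occasions.
class DirtyRatioDetector:
    def __init__(self, threshold_pct=50):
        self.threshold_pct = threshold_pct
        self.prev_pct = 0
        self.high_cnt = 0

    def update(self, bytes_dirty, bytes_xfer):
        """Feed one sync period; return True when throttling is warranted."""
        curr = 100 * bytes_dirty // bytes_xfer
        prev, self.prev_pct = self.prev_pct, curr
        if prev == 0:
            return False            # no previous sync to compare against
        if curr > self.threshold_pct and curr >= prev:
            self.high_cnt += 1      # ratio is high and not improving
        if self.high_cnt >= 2:
            self.high_cnt = 0
            return True
        return False

d = DirtyRatioDetector()
assert d.update(60, 100) is False   # first sync: nothing to compare
assert d.update(70, 100) is False   # first strike
assert d.update(80, 100) is True    # second strike: throttle
```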

Signed-off-by: Hyman Huang 
---
 migration/ram.c  | 55 ++--
 migration/trace-events   |  1 +
 tests/qtest/migration-test.c |  1 +
 3 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index d9d8ed0fda..5fba572f3e 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -420,6 +420,12 @@ struct RAMState {
 /* Periodic throttle information */
 bool throttle_running;
 QemuThread throttle_thread;
+
+/*
+ * Ratio of bytes_dirty_period and bytes_xfer_period in the previous
+ * sync.
+ */
+uint64_t dirty_ratio_pct;
 };
 typedef struct RAMState RAMState;
 
@@ -1044,6 +1050,43 @@ static void migration_dirty_limit_guest(void)
 trace_migration_dirty_limit_guest(quota_dirtyrate);
 }
 
+static bool migration_dirty_ratio_high(RAMState *rs)
+{
+static int dirty_ratio_high_cnt;
+uint64_t threshold = migrate_throttle_trigger_threshold();
+uint64_t bytes_xfer_period =
+migration_transferred_bytes() - rs->bytes_xfer_prev;
+uint64_t bytes_dirty_period = rs->num_dirty_pages_period * 
TARGET_PAGE_SIZE;
+bool dirty_ratio_high = false;
+uint64_t prev, curr;
+
+/* Calculate the dirty ratio percentage */
+curr = 100 * (bytes_dirty_period * 1.0 / bytes_xfer_period);
+
+prev = rs->dirty_ratio_pct;
+rs->dirty_ratio_pct = curr;
+
+if (prev == 0) {
+return false;
+}
+
+/*
+ * If the current dirty ratio is greater than or equal to the
+ * previous one, determine that the migration does not converge.
+ */
+if (curr > threshold && curr >= prev) {
+trace_migration_dirty_ratio_high(curr, prev);
+dirty_ratio_high_cnt++;
+}
+
+if (dirty_ratio_high_cnt >= 2) {
+dirty_ratio_high = true;
+dirty_ratio_high_cnt = 0;
+}
+
+return dirty_ratio_high;
+}
+
 static void migration_trigger_throttle(RAMState *rs)
 {
 uint64_t threshold = migrate_throttle_trigger_threshold();
@@ -1051,6 +1094,11 @@ static void migration_trigger_throttle(RAMState *rs)
 migration_transferred_bytes() - rs->bytes_xfer_prev;
 uint64_t bytes_dirty_period = rs->num_dirty_pages_period * 
TARGET_PAGE_SIZE;
 uint64_t bytes_dirty_threshold = bytes_xfer_period * threshold / 100;
+bool dirty_ratio_high = false;
+
+if (migrate_responsive_throttle() && (bytes_xfer_period != 0)) {
+dirty_ratio_high = migration_dirty_ratio_high(rs);
+}
 
 /*
  * The following detection logic can be refined later. For now:
@@ -1060,8 +1108,11 @@ static void migration_trigger_throttle(RAMState *rs)
  * twice, start or increase throttling.
  */
 if ((bytes_dirty_period > bytes_dirty_threshold) &&
-(++rs->dirty_rate_high_cnt >= 2)) {
-rs->dirty_rate_high_cnt = 0;
+((++rs->dirty_rate_high_cnt >= 2) || dirty_ratio_high)) {
+
+rs->dirty_rate_high_cnt =
+rs->dirty_rate_high_cnt >= 2 ? 0 : rs->dirty_rate_high_cnt;
+
 if (migrate_auto_converge()) {
 trace_migration_throttle();
 mig_thr

[PATCH RESEND RFC 05/10] migration: Introduce util functions for periodic CPU throttle

2024-09-09 Thread Hyman Huang
Provide utility functions to manage the periodic_throttle_thread's
lifetime. Additionally, provide periodic_throttle_setup to select
the sync mode.

Signed-off-by: Hyman Huang 
---
 migration/ram.c| 98 +-
 migration/ram.h|  4 ++
 migration/trace-events |  3 ++
 3 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index 23471c9e5a..d9d8ed0fda 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -416,6 +416,10 @@ struct RAMState {
  * RAM migration.
  */
 unsigned int postcopy_bmap_sync_requested;
+
+/* Periodic throttle information */
+bool throttle_running;
+QemuThread throttle_thread;
 };
 typedef struct RAMState RAMState;
 
@@ -1075,7 +1079,13 @@ static void migration_bitmap_sync(RAMState *rs,
 RAMBlock *block;
 int64_t end_time;
 
-if (!periodic) {
+if (periodic) {
+/* Be careful that we don't synchronize too often */
+int64_t curr_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+if (curr_time < rs->time_last_bitmap_sync + 1000) {
+return;
+}
+} else {
 stat64_add(&mig_stats.iteration_count, 1);
 }
 
@@ -1121,6 +1131,92 @@ static void migration_bitmap_sync(RAMState *rs,
 }
 }
 
+static void *periodic_throttle_thread(void *opaque)
+{
+RAMState *rs = opaque;
+bool skip_sleep = false;
+int sleep_duration = migrate_periodic_throttle_interval();
+
+rcu_register_thread();
+
+while (qatomic_read(&rs->throttle_running)) {
+int64_t curr_time;
+/*
+ * The first iteration copies all memory anyhow and has no
+ * effect on guest performance, therefore omit it to avoid
+ * paying extra for the sync penalty.
+ */
+if (stat64_get(&mig_stats.iteration_count) <= 1) {
+continue;
+}
+
+if (!skip_sleep) {
+sleep(sleep_duration);
+}
+
+/* Be careful that we don't synchronize too often */
+curr_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+if (curr_time > rs->time_last_bitmap_sync + 1000) {
+bql_lock();
+trace_migration_periodic_throttle();
+WITH_RCU_READ_LOCK_GUARD() {
+migration_bitmap_sync(rs, false, true);
+}
+bql_unlock();
+skip_sleep = false;
+} else {
+skip_sleep = true;
+}
+}
+
+rcu_unregister_thread();
+
+return NULL;
+}
+
+void periodic_throttle_start(void)
+{
+RAMState *rs = ram_state;
+
+if (!rs) {
+return;
+}
+
+if (qatomic_read(&rs->throttle_running)) {
+return;
+}
+
+trace_migration_periodic_throttle_start();
+
+qatomic_set(&rs->throttle_running, 1);
+qemu_thread_create(&rs->throttle_thread,
+   NULL, periodic_throttle_thread,
+   rs, QEMU_THREAD_JOINABLE);
+}
+
+void periodic_throttle_stop(void)
+{
+RAMState *rs = ram_state;
+
+if (!rs) {
+return;
+}
+
+if (!qatomic_read(&rs->throttle_running)) {
+return;
+}
+
+trace_migration_periodic_throttle_stop();
+
+qatomic_set(&rs->throttle_running, 0);
+qemu_thread_join(&rs->throttle_thread);
+}
+
+void periodic_throttle_setup(bool enable)
+{
+sync_mode = enable ? RAMBLOCK_SYN_MODERN : RAMBLOCK_SYN_LEGACY;
+}
+
 static void migration_bitmap_sync_precopy(RAMState *rs, bool last_stage)
 {
 Error *local_err = NULL;
diff --git a/migration/ram.h b/migration/ram.h
index bc0318b834..f7c7b2e7ad 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -93,4 +93,8 @@ void ram_write_tracking_prepare(void);
 int ram_write_tracking_start(void);
 void ram_write_tracking_stop(void);
 
+/* Periodic throttle */
+void periodic_throttle_start(void);
+void periodic_throttle_stop(void);
+void periodic_throttle_setup(bool enable);
 #endif
diff --git a/migration/trace-events b/migration/trace-events
index c65902f042..5b9db57c8f 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -95,6 +95,9 @@ get_queued_page_not_dirty(const char *block_name, uint64_t 
tmp_offset, unsigned
 migration_bitmap_sync_start(void) ""
 migration_bitmap_sync_end(uint64_t dirty_pages) "dirty_pages %" PRIu64
 migration_bitmap_clear_dirty(char *str, uint64_t start, uint64_t size, 
unsigned long page) "rb %s start 0x%"PRIx64" size 0x%"PRIx64" page 0x%lx"
+migration_periodic_throttle(void) ""
+migration_periodic_throttle_start(void) ""
+migration_periodic_throttle_stop(void) ""
 migration_throttle(void) ""
 migration_dirty_limit_guest(int64_t dirtyrate) "guest dirty page rate limit %" 
PRIi64 " MB/s"
 ram_discard_range(const char *rbname, uint64_t start, size_t len) "%s: start: 
%" PRIx64 " %zx"
-- 
2.39.1




[PATCH RESEND RFC 01/10] migration: Introduce structs for periodic CPU throttle

2024-09-09 Thread Hyman Huang
shadow_bmap, iter_bmap, iter_dirty_pages and
periodic_sync_shown_up are introduced to satisfy the needs
of the periodic CPU throttle.

Meanwhile, introduce an enumeration of dirty bitmap sync methods.

Signed-off-by: Hyman Huang 
---
 include/exec/ramblock.h | 45 +
 migration/ram.c |  6 ++
 2 files changed, 51 insertions(+)

diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 0babd105c0..619c52885a 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -24,6 +24,30 @@
 #include "qemu/rcu.h"
 #include "exec/ramlist.h"
 
+/* Possible bits for migration_bitmap_sync */
+
+/*
+ * The old-fashioned sync method, which is, in turn, used for CPU
+ * throttle and memory transfer.
+ */
+#define RAMBLOCK_SYN_LEGACY_ITER(1U << 0)
+
+/*
+ * The modern sync method, which is, in turn, used for CPU throttle
+ * and memory transfer.
+ */
+#define RAMBLOCK_SYN_MODERN_ITER(1U << 1)
+
+/* The modern sync method, which is used for CPU throttle only */
+#define RAMBLOCK_SYN_MODERN_PERIOD  (1U << 2)
+
+#define RAMBLOCK_SYN_MASK   (0x7)
+
+typedef enum RAMBlockSynMode {
+RAMBLOCK_SYN_LEGACY,/* Old-fashioned mode */
+RAMBLOCK_SYN_MODERN,
+} RAMBlockSynMode;
+
 struct RAMBlock {
 struct rcu_head rcu;
 struct MemoryRegion *mr;
@@ -89,6 +113,27 @@ struct RAMBlock {
  * could not have been valid on the source.
  */
 ram_addr_t postcopy_length;
+
+/*
+ * Used to backup the bmap during periodic sync to see whether any dirty
+ * pages were sent during that time.
+ */
+unsigned long *shadow_bmap;
+
+/*
+ * The bitmap "bmap," which was initially used for both sync and memory
+ * transfer, will be replaced by two bitmaps: the previously used "bmap"
+ * and the recently added "iter_bmap." Only the memory transfer is
+ * conducted with the previously used "bmap"; the recently added
+ * "iter_bmap" is utilized for sync.
+ */
+unsigned long *iter_bmap;
+
+/* Number of new dirty pages during iteration */
+uint64_t iter_dirty_pages;
+
+/* If periodic sync has shown up during iteration */
+bool periodic_sync_shown_up;
 };
 #endif
 #endif
diff --git a/migration/ram.c b/migration/ram.c
index 67ca3d5d51..f29faa82d6 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2362,6 +2362,10 @@ static void ram_bitmaps_destroy(void)
 block->bmap = NULL;
 g_free(block->file_bmap);
 block->file_bmap = NULL;
+g_free(block->shadow_bmap);
+block->shadow_bmap = NULL;
+g_free(block->iter_bmap);
+block->iter_bmap = NULL;
 }
 }
 
@@ -2753,6 +2757,8 @@ static void ram_list_init_bitmaps(void)
 }
 block->clear_bmap_shift = shift;
 block->clear_bmap = bitmap_new(clear_bmap_size(pages, shift));
+block->shadow_bmap = bitmap_new(pages);
+block->iter_bmap = bitmap_new(pages);
 }
 }
 }
-- 
2.39.1




[PATCH RESEND RFC 08/10] migration: Introduce cpu-responsive-throttle parameter

2024-09-09 Thread Hyman Huang
To enable the responsive throttle that will be implemented
in the next commit, introduce the cpu-responsive-throttle
parameter.

Signed-off-by: Hyman Huang 
---
 migration/migration-hmp-cmds.c |  8 
 migration/options.c| 20 
 migration/options.h|  1 +
 qapi/migration.json| 16 +++-
 4 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index f7b8e06bb4..a3d4d3f62f 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -273,6 +273,10 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict 
*qdict)
 MigrationParameter_str(
 MIGRATION_PARAMETER_CPU_PERIODIC_THROTTLE_INTERVAL),
 params->cpu_periodic_throttle_interval);
+assert(params->has_cpu_responsive_throttle);
+monitor_printf(mon, "%s: %s\n",
+
MigrationParameter_str(MIGRATION_PARAMETER_CPU_RESPONSIVE_THROTTLE),
+params->cpu_responsive_throttle ? "on" : "off");
 assert(params->has_max_cpu_throttle);
 monitor_printf(mon, "%s: %u\n",
 MigrationParameter_str(MIGRATION_PARAMETER_MAX_CPU_THROTTLE),
@@ -529,6 +533,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict 
*qdict)
 p->has_cpu_periodic_throttle_interval = true;
 visit_type_uint8(v, param, &p->cpu_periodic_throttle_interval, &err);
 break;
+case MIGRATION_PARAMETER_CPU_RESPONSIVE_THROTTLE:
+p->has_cpu_responsive_throttle = true;
+visit_type_bool(v, param, &p->cpu_responsive_throttle, &err);
+break;
 case MIGRATION_PARAMETER_MAX_CPU_THROTTLE:
 p->has_max_cpu_throttle = true;
 visit_type_uint8(v, param, &p->max_cpu_throttle, &err);
diff --git a/migration/options.c b/migration/options.c
index 2dbe275ba0..aa233684ee 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -110,6 +110,8 @@ Property migration_properties[] = {
 DEFINE_PROP_UINT8("x-cpu-periodic-throttle-interval", MigrationState,
   parameters.cpu_periodic_throttle_interval,
   DEFAULT_MIGRATE_CPU_PERIODIC_THROTTLE_INTERVAL),
+DEFINE_PROP_BOOL("x-cpu-responsive-throttle", MigrationState,
+  parameters.cpu_responsive_throttle, false),
 DEFINE_PROP_SIZE("x-max-bandwidth", MigrationState,
   parameters.max_bandwidth, MAX_THROTTLE),
 DEFINE_PROP_SIZE("avail-switchover-bandwidth", MigrationState,
@@ -715,6 +717,13 @@ bool migrate_periodic_throttle(void)
 return s->parameters.cpu_periodic_throttle;
 }
 
+bool migrate_responsive_throttle(void)
+{
+MigrationState *s = migrate_get_current();
+
+return s->parameters.cpu_responsive_throttle;
+}
+
 bool migrate_cpu_throttle_tailslow(void)
 {
 MigrationState *s = migrate_get_current();
@@ -899,6 +908,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error 
**errp)
 params->has_cpu_periodic_throttle_interval = true;
 params->cpu_periodic_throttle_interval =
 s->parameters.cpu_periodic_throttle_interval;
+params->has_cpu_responsive_throttle = true;
+params->cpu_responsive_throttle = s->parameters.cpu_responsive_throttle;
 params->tls_creds = g_strdup(s->parameters.tls_creds);
 params->tls_hostname = g_strdup(s->parameters.tls_hostname);
 params->tls_authz = g_strdup(s->parameters.tls_authz ?
@@ -967,6 +978,7 @@ void migrate_params_init(MigrationParameters *params)
 params->has_cpu_throttle_tailslow = true;
 params->has_cpu_periodic_throttle = true;
 params->has_cpu_periodic_throttle_interval = true;
+params->has_cpu_responsive_throttle = true;
 params->has_max_bandwidth = true;
 params->has_downtime_limit = true;
 params->has_x_checkpoint_delay = true;
@@ -1208,6 +1220,10 @@ static void 
migrate_params_test_apply(MigrateSetParameters *params,
 params->cpu_periodic_throttle_interval;
 }
 
+if (params->has_cpu_responsive_throttle) {
+dest->cpu_responsive_throttle = params->cpu_responsive_throttle;
+}
+
 if (params->tls_creds) {
 assert(params->tls_creds->type == QTYPE_QSTRING);
 dest->tls_creds = params->tls_creds->u.s;
@@ -1325,6 +1341,10 @@ static void migrate_params_apply(MigrateSetParameters 
*params, Error **errp)
 params->cpu_periodic_throttle_interval;
 }
 
+if (params->has_cpu_responsive_throttle) {
+s->parameters.cpu_responsive_throttle = 
params->cpu_responsive_throttle;
+}
+
 if (params->tls_creds) {
 g_free(s->parameters.tls_creds);
 assert(params->tls_creds->type == QTYPE_QSTRING);
diff --git a/migration/options.h b/migration/options.h
index efeac01470..613d675003 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -70,6 +70,7 @@ uint8_t migrate_cpu_throttle_increment(void);
 uint8_t migrate_cpu_throttle_initial(void);
 uint8_t migrate_p

[PATCH RESEND RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle

2024-09-09 Thread Hyman Huang
Even though the responsive CPU throttle is enabled,
the dirty sync count may not always increase, because this is
an optimization that might not take effect in every scenario.

This test case just makes sure the feature doesn't interfere with
any existing functionality.

Signed-off-by: Hyman Huang 
---
 tests/qtest/migration-test.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 4626301435..cf0b1dcb50 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -718,6 +718,7 @@ typedef struct {
 typedef struct {
 /* CPU throttle parameters */
 bool periodic;
+bool responsive;
 } AutoConvergeArgs;
 
 static int test_migrate_start(QTestState **from, QTestState **to,
@@ -2795,6 +2796,7 @@ static void test_migrate_auto_converge_args(AutoConvergeArgs *input_args)
 QTestState *from, *to;
 int64_t percentage;
 bool periodic = (input_args && input_args->periodic);
+bool responsive = (input_args && input_args->responsive);
 
 /*
  * We want the test to be stable and as fast as possible.
@@ -2820,6 +2822,16 @@ static void test_migrate_auto_converge_args(AutoConvergeArgs *input_args)
 periodic_throttle_interval);
 }
 
+if (responsive) {
+/*
+ * The dirty-sync-count may not always go down while using responsive
+ * throttle, because it is an optimization and may not take effect in
+ * every scenario. This test just makes sure that turning the feature
+ * on doesn't break any existing functionality.
+ */
+migrate_set_parameter_bool(from, "cpu-responsive-throttle", true);
+}
+
 /*
  * Set the initial parameters so that the migration could not converge
  * without throttling.
@@ -2902,6 +2914,12 @@ static void test_migrate_auto_converge_periodic_throttle(void)
 test_migrate_auto_converge_args(&args);
 }
 
+static void test_migrate_auto_converge_responsive_throttle(void)
+{
+AutoConvergeArgs args = {.responsive = true};
+test_migrate_auto_converge_args(&args);
+}
+
 static void *
 test_migrate_precopy_tcp_multifd_start_common(QTestState *from,
   QTestState *to,
@@ -3955,6 +3973,8 @@ int main(int argc, char **argv)
test_migrate_auto_converge);
 migration_test_add("/migration/auto_converge_periodic_throttle",
test_migrate_auto_converge_periodic_throttle);
+migration_test_add("/migration/auto_converge_responsive_throttle",
+   test_migrate_auto_converge_responsive_throttle);
 if (g_str_equal(arch, "x86_64") &&
 has_kvm && kvm_dirty_ring_supported()) {
 migration_test_add("/migration/dirty_limit",
-- 
2.39.1




[PATCH RESEND RFC 04/10] qapi/migration: Introduce the iteration-count

2024-09-09 Thread Hyman Huang
The existing migration stat dirty-sync-count can no longer reflect
the iteration count once the next commit introduces periodic
synchronization; add a separate iteration count to compensate.

Signed-off-by: Hyman Huang 
---
 migration/migration-stats.h  |  4 
 migration/migration.c|  1 +
 migration/ram.c  | 12 
 qapi/migration.json  |  6 +-
 tests/qtest/migration-test.c |  2 +-
 5 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/migration/migration-stats.h b/migration/migration-stats.h
index 05290ade76..43ee0f4f05 100644
--- a/migration/migration-stats.h
+++ b/migration/migration-stats.h
@@ -50,6 +50,10 @@ typedef struct {
  * Number of times we have synchronized guest bitmaps.
  */
 Stat64 dirty_sync_count;
+/*
+ * Number of migration iterations processed.
+ */
+Stat64 iteration_count;
 /*
  * Number of times zero copy failed to send any page using zero
  * copy.
diff --git a/migration/migration.c b/migration/migration.c
index 3dea06d577..055d527ff6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1197,6 +1197,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
 info->ram->mbps = s->mbps;
 info->ram->dirty_sync_count =
 stat64_get(&mig_stats.dirty_sync_count);
+info->ram->iteration_count = stat64_get(&mig_stats.iteration_count);
 info->ram->dirty_sync_missed_zero_copy =
 stat64_get(&mig_stats.dirty_sync_missed_zero_copy);
 info->ram->postcopy_requests =
diff --git a/migration/ram.c b/migration/ram.c
index a56634eb46..23471c9e5a 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -594,7 +594,7 @@ static void xbzrle_cache_zero_page(ram_addr_t current_addr)
 /* We don't care if this fails to allocate a new cache page
  * as long as it updated an old one */
 cache_insert(XBZRLE.cache, current_addr, XBZRLE.zero_target_page,
- stat64_get(&mig_stats.dirty_sync_count));
+ stat64_get(&mig_stats.iteration_count));
 }
 
 #define ENCODING_FLAG_XBZRLE 0x1
@@ -620,7 +620,7 @@ static int save_xbzrle_page(RAMState *rs, PageSearchStatus *pss,
 int encoded_len = 0, bytes_xbzrle;
 uint8_t *prev_cached_page;
 QEMUFile *file = pss->pss_channel;
-uint64_t generation = stat64_get(&mig_stats.dirty_sync_count);
+uint64_t generation = stat64_get(&mig_stats.iteration_count);
 
 if (!cache_is_cached(XBZRLE.cache, current_addr, generation)) {
 xbzrle_counters.cache_miss++;
@@ -1075,6 +1075,10 @@ static void migration_bitmap_sync(RAMState *rs,
 RAMBlock *block;
 int64_t end_time;
 
+if (!periodic) {
+stat64_add(&mig_stats.iteration_count, 1);
+}
+
 stat64_add(&mig_stats.dirty_sync_count, 1);
 
 if (!rs->time_last_bitmap_sync) {
@@ -,8 +1115,8 @@ static void migration_bitmap_sync(RAMState *rs,
 rs->num_dirty_pages_period = 0;
 rs->bytes_xfer_prev = migration_transferred_bytes();
 }
-if (migrate_events()) {
-uint64_t generation = stat64_get(&mig_stats.dirty_sync_count);
+if (!periodic && migrate_events()) {
+uint64_t generation = stat64_get(&mig_stats.iteration_count);
 qapi_event_send_migration_pass(generation);
 }
 }
diff --git a/qapi/migration.json b/qapi/migration.json
index 8281d4a83b..6d8358c202 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -60,6 +60,9 @@
 # between 0 and @dirty-sync-count * @multifd-channels.  (since
 # 7.1)
 #
+# @iteration-count: The number of iterations since migration started.
+# (since 9.2)
+#
 # Since: 0.14
 ##
 { 'struct': 'MigrationStats',
@@ -72,7 +75,8 @@
'multifd-bytes': 'uint64', 'pages-per-second': 'uint64',
'precopy-bytes': 'uint64', 'downtime-bytes': 'uint64',
'postcopy-bytes': 'uint64',
-   'dirty-sync-missed-zero-copy': 'uint64' } }
+   'dirty-sync-missed-zero-copy': 'uint64',
+   'iteration-count' : 'int' } }
 
 ##
 # @XBZRLECacheStats:
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 9d08101643..2fb10658d4 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -278,7 +278,7 @@ static int64_t read_migrate_property_int(QTestState *who, const char *property)
 
 static uint64_t get_migration_pass(QTestState *who)
 {
-return read_ram_property_int(who, "dirty-sync-count");
+return read_ram_property_int(who, "iteration-count");
 }
 
 static void read_blocktime(QTestState *who)
-- 
2.39.1




Re: [Bug Report] smmuv3 event 0x10 report when running virtio-blk-pci

2024-09-09 Thread Peter Maydell
On Mon, 9 Sept 2024 at 15:22, Zhou Wang via  wrote:
>
> Hi All,
>
> When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10
> during kernel booting up.

Does it still do this if you either:
 (1) use the v9.1.0 release (commit fd1952d814da)
 (2) use "-machine virt-9.1" instead of "-machine virt"

?

My suspicion is that this will have started happening now that
we expose an SMMU with two-stage translation support to the guest
in the "virt" machine type (which we do not if you either
use virt-9.1 or in the v9.1.0 release).

I've cc'd Eric (smmuv3 maintainer) and Mostafa (author of
the two-stage support).

> qemu command which I use is as below:
>
> qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \
> -kernel Image -initrd minifs.cpio.gz \
> -enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \
> -append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x9000 maxcpus=3' \
> -device pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2 \
> -device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \
> -device virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \
> -drive file=/home/boot.img,if=none,id=drive0,format=raw
>
> smmuv3 event 0x10 log:
> [...]
> [1.962656] virtio-pci :02:00.0: Adding to iommu group 0
> [1.963150] virtio-pci :02:00.0: enabling device ( -> 0002)
> [1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues
> [1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks (1.07 GB/1.00 GiB)
> [1.966934] arm-smmu-v3 905.smmuv3: event 0x10 received:
> [1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0
> [1.967478] arm-smmu-v3 905.smmuv3:  0x0210
> [1.968381] clk: Disabling unused clocks
> [1.968677] arm-smmu-v3 905.smmuv3:  0x0200
> [1.968990] PM: genpd: Disabling unused power domains
> [1.969424] arm-smmu-v3 905.smmuv3:  0x
> [1.969814] ALSA device list:
> [1.970240] arm-smmu-v3 905.smmuv3:  0x
> [1.970471]   No soundcards found.
> [1.970902] arm-smmu-v3 905.smmuv3: event 0x10 received:
> [1.971600] arm-smmu-v3 905.smmuv3:  0x0210
> [1.971601] arm-smmu-v3 905.smmuv3:  0x0200
> [1.971601] arm-smmu-v3 905.smmuv3:  0x
> [1.971602] arm-smmu-v3 905.smmuv3:  0x
> [1.971606] arm-smmu-v3 905.smmuv3: event 0x10 received:
> [1.971607] arm-smmu-v3 905.smmuv3:  0x0210
> [1.974202] arm-smmu-v3 905.smmuv3:  0x0200
> [1.974634] arm-smmu-v3 905.smmuv3:  0x
> [1.975005] Freeing unused kernel memory: 10112K
> [1.975062] arm-smmu-v3 905.smmuv3:  0x
> [1.975442] Run init as init process
>
> Another information is that if "maxcpus=3" is removed from the kernel command line,
> it will be OK.
>
> I am not sure if there is a bug about vsmmu. It will be very appreciated if anyone
> know this issue or can take a look at it.

thanks
-- PMM



Re: [PULL 27/34] migration/multifd: Move nocomp code into multifd-nocomp.c

2024-09-09 Thread Peter Xu
On Mon, Sep 09, 2024 at 11:28:14AM +0100, Peter Maydell wrote:
> On Wed, 4 Sept 2024 at 13:48, Fabiano Rosas  wrote:
> >
> > In preparation for adding new payload types to multifd, move most of
> > the no-compression code into multifd-nocomp.c. Let's try to keep a
> > semblance of layering by not mixing general multifd control flow with
> > the details of transmitting pages of ram.
> >
> > There are still some pieces leftover, namely the p->normal, p->zero,
> > etc variables that we use for zero page tracking and the packet
> > allocation which is heavily dependent on the ram code.
> >
> > Reviewed-by: Peter Xu 
> > Signed-off-by: Fabiano Rosas 
> 
> I know Coverity has only flagged this up because the
> code has moved, but it seems like a good place to ask
> the question:
> 
> > +void multifd_ram_fill_packet(MultiFDSendParams *p)
> > +{
> > +MultiFDPacket_t *packet = p->packet;
> > +MultiFDPages_t *pages = &p->data->u.ram;
> > +uint32_t zero_num = pages->num - pages->normal_num;
> > +
> > +packet->pages_alloc = cpu_to_be32(multifd_ram_page_count());
> > +packet->normal_pages = cpu_to_be32(pages->normal_num);
> > +packet->zero_pages = cpu_to_be32(zero_num);
> > +
> > +if (pages->block) {
> > +strncpy(packet->ramblock, pages->block->idstr, 256);
> 
> Coverity points out that when we fill in the RAMBlock::idstr
> here, if packet->ramblock is not NUL terminated then we won't
> NUL-terminate idstr either (CID 1560071).
> 
> Is this really what is intended?
> 
> Perhaps
>  pstrncpy(packet->ramblock, sizeof(packet->ramblock),
>   pages->block->idstr);
> 
> would be better?
> 
> (pstrncpy will always NUL-terminate, and won't pointlessly
> zero-fill the space after the string in the destination.)

In reality only the "zero-fill" change would affect us, as ramblock->idstr
always has the same size and is always NUL-terminated, or we're in bigger
trouble. So I assume there's no security concern; still, it's indeed
nicer to use pstrcpy() to at least avoid the zero-fill.

Thanks,

-- 
Peter Xu




Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle

2024-09-09 Thread Peter Xu
On Mon, Sep 09, 2024 at 03:02:57PM +0100, Peter Maydell wrote:
> On Mon, 9 Sept 2024 at 14:51, Hyman Huang  wrote:
> >
> > Despite the fact that the responsive CPU throttle is enabled,
> > the dirty sync count may not always increase because this is
> > an optimization that might not happen in any situation.
> >
> > This test case just making sure it doesn't interfere with any
> > current functionality.
> >
> > Signed-off-by: Hyman Huang 
> 
> tests/qtest/migration-test already runs 75 different
> subtests, takes up a massive chunk of our "make check"
> time, and is very commonly a "times out" test on some
> of our CI jobs. It runs on five different guest CPU
> architectures, each one of which takes between 2 and
> 5 minutes to complete the full migration-test.
> 
> Do we really need to make it even bigger?

I'll try to find some time in the next few weeks looking into this to see
whether we can further shrink migration test times after previous attemps
from Dan.  At least a low hanging fruit is we should indeed put some more
tests into g_test_slow(), and this new test could also be a candidate (then
we can run "-m slow" for migration PRs only).

Thanks,

-- 
Peter Xu




Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle

2024-09-09 Thread Yong Huang
On Mon, Sep 9, 2024 at 10:03 PM Peter Maydell 
wrote:

> On Mon, 9 Sept 2024 at 14:51, Hyman Huang  wrote:
> >
> > Despite the fact that the responsive CPU throttle is enabled,
> > the dirty sync count may not always increase because this is
> > an optimization that might not happen in any situation.
> >
> > This test case just making sure it doesn't interfere with any
> > current functionality.
> >
> > Signed-off-by: Hyman Huang 
>
> tests/qtest/migration-test already runs 75 different
> subtests, takes up a massive chunk of our "make check"
> time, and is very commonly a "times out" test on some
> of our CI jobs. It runs on five different guest CPU
> architectures, each one of which takes between 2 and
> 5 minutes to complete the full migration-test.
>
> Do we really need to make it even bigger?
>

No, I don't insist on that; the cpu-responsive-throttle
parameter could instead be enabled by default in the existing
migrate_auto_converge test case.

Thanks for the comment.

Yong.


>
> thanks
> -- PMM
>


-- 
Best regards

