Re: [PATCH v3 54/57] tcg/ppc: Remove unused constraints A, B, C, D

2023-05-01 Thread Richard Henderson

On 4/29/23 13:29, Philippe Mathieu-Daudé wrote:

On 24/4/23 07:41, Richard Henderson wrote:

These constraints have not been used for quite some time.

Fixes: 77b73de67632 ("Use rem/div[u]_i32 drop div[u]2_i32")
Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Richard Henderson 
---
  tcg/ppc/tcg-target-con-str.h | 4 
  1 file changed, 4 deletions(-)

diff --git a/tcg/ppc/tcg-target-con-str.h b/tcg/ppc/tcg-target-con-str.h
index f3bf030bc3..9dcbc3df50 100644
--- a/tcg/ppc/tcg-target-con-str.h
+++ b/tcg/ppc/tcg-target-con-str.h
@@ -10,10 +10,6 @@
   */
  REGS('r', ALL_GENERAL_REGS)
  REGS('v', ALL_VECTOR_REGS)
-REGS('A', 1u << TCG_REG_R3)
-REGS('B', 1u << TCG_REG_R4)
-REGS('C', 1u << TCG_REG_R5)
-REGS('D', 1u << TCG_REG_R6)


Reviewed-by: Philippe Mathieu-Daudé 

Is the J constraint introduced in commit 3d582c6179
("tcg-ppc64: Rearrange integer constant constraints")
ever used?


Nope, not anymore.  Used to be for and/or/xor, now replaced by more general constraints. 
Will remove.


r~



[PATCH 0/5] eBPF RSS through QMP support.

2023-05-01 Thread Andrew Melnychenko
This series of patches provides the ability to retrieve the eBPF program
through QMP, so that a management application with the proper capabilities
may load the bpf blob.
Now, virtio-net devices can accept eBPF programs and maps through
properties, as external file descriptors. Access to the eBPF maps is
direct, through mmap(), so it should not require the additional
capabilities that bpf* calls need.
eBPF file descriptors can be passed to QEMU from the parent process or
over a unix socket with the sendfd() qmp command.

Possible solution for libvirt may look like this: 
https://github.com/daynix/libvirt/tree/RSS_eBPF (WIP)
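
For illustration, a possible management-application flow (hypothetical;
fd numbers and the exact fd-passing step are up to the application):

  1. Fetch the blob over QMP and load it with libbpf:

     -> { "execute": "request-ebpf", "arguments": { "id": "rss" } }
     <- { "return": { "object": "f0VMRgIBAQ..." } }

  2. Hand the resulting program and map fds to QEMU, e.g. as inherited
     file descriptors:

     -device virtio-net-pci,rss=on,ebpf_rss_fds=10:11:12:13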

Andrew Melnychenko (5):
  ebpf: Added eBPF initialization by fds and map update.
  virtio-net: Added property to load eBPF RSS with fds.
  ebpf: Added declaration/initialization routines.
  qmp: Added new command to retrieve eBPF blob.
  ebpf: Updated eBPF program and skeleton.

 ebpf/ebpf.c|   54 ++
 ebpf/ebpf.h|   31 +
 ebpf/ebpf_rss-stub.c   |6 +
 ebpf/ebpf_rss.c|  124 ++-
 ebpf/ebpf_rss.h|   10 +
 ebpf/meson.build   |1 +
 ebpf/rss.bpf.skeleton.h| 1469 
 hw/net/virtio-net.c|   96 ++-
 include/hw/virtio/virtio-net.h |1 +
 monitor/qmp-cmds.c |   16 +
 qapi/misc.json |   28 +
 tools/ebpf/rss.bpf.c   |2 +-
 12 files changed, 1079 insertions(+), 759 deletions(-)
 create mode 100644 ebpf/ebpf.c
 create mode 100644 ebpf/ebpf.h

-- 
2.39.1




[PATCH 2/5] virtio-net: Added property to load eBPF RSS with fds.

2023-05-01 Thread Andrew Melnychenko
The eBPF RSS program and maps may now be passed during initialization.
This was initially implemented so that libvirt can launch QEMU without
extra permissions and initialize the eBPF program through the helper.
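
For example (hypothetical fd numbers; the four colon-separated values are
resolved with monitor_fd_param() and passed to ebpf_rss_load_fds() in the
order program:config:toeplitz:indirection-table):

    -device virtio-net-pci,netdev=net0,rss=on,ebpf_rss_fds=10:11:12:13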

Signed-off-by: Andrew Melnychenko 
---
 hw/net/virtio-net.c| 96 +++---
 include/hw/virtio/virtio-net.h |  1 +
 2 files changed, 91 insertions(+), 6 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 53e1c326433..9b3a997d872 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -42,6 +42,7 @@
 #include "sysemu/sysemu.h"
 #include "trace.h"
 #include "monitor/qdev.h"
+#include "monitor/monitor.h"
 #include "hw/pci/pci_device.h"
 #include "net_rx_pkt.h"
 #include "hw/virtio/vhost.h"
@@ -1305,14 +1306,96 @@ static void virtio_net_detach_epbf_rss(VirtIONet *n)
 virtio_net_attach_ebpf_to_backend(n->nic, -1);
 }
 
-static bool virtio_net_load_ebpf(VirtIONet *n)
+static int virtio_net_get_ebpf_rss_fds(char *str, char *fds[], int nfds,
+   Error **errp)
 {
-if (!virtio_net_attach_ebpf_to_backend(n->nic, -1)) {
-/* backend does't support steering ebpf */
-return false;
+char *ptr = str;
+char *cur = NULL;
+size_t len = strlen(str);
+int i = 0;
+
+for (; i < nfds && ptr < str + len;) {
+cur = strchr(ptr, ':');
+
+if (cur == NULL) {
+fds[i] = g_strdup(ptr);
+} else {
+fds[i] = g_strndup(ptr, cur - ptr);
+}
+
+i++;
+if (cur == NULL) {
+break;
+} else {
+ptr = cur + 1;
+}
+}
+
+if (cur != NULL) {
+/* the string contains more arguments */
+error_setg(errp,
+   "Too many eBPF file descriptors for RSS provided.");
+} else if (i < nfds) {
+error_setg(errp,
+   "Not enough eBPF file descriptors for RSS were provided.");
+}
+
+return i;
+}
+
+static bool virtio_net_load_ebpf_fds(VirtIONet *n, Error **errp)
+{
+char *fds_strs[EBPF_RSS_MAX_FDS];
+int fds[EBPF_RSS_MAX_FDS] = { [0 ... EBPF_RSS_MAX_FDS - 1] = -1 };
+int nfds;
+bool ret = true;
+int i = 0;
+
+ERRP_GUARD();
+
+nfds = virtio_net_get_ebpf_rss_fds(n->ebpf_rss_fds,
+   fds_strs, EBPF_RSS_MAX_FDS, errp);
+if (*errp) {
+ret = false;
+goto exit;
 }
 
-return ebpf_rss_load(&n->ebpf_rss);
+for (i = 0; i < nfds; i++) {
+fds[i] = monitor_fd_param(monitor_cur(), fds_strs[i], errp);
+if (*errp) {
+ret = false;
+goto exit;
+}
+}
+
+ret = ebpf_rss_load_fds(&n->ebpf_rss, fds[0], fds[1], fds[2], fds[3]);
+
+exit:
+if (!ret || *errp) {
+for (i = 0; i < nfds; i++) {
+close(fds[i]);
+}
+}
+
+for (i = 0; i < nfds; i++) {
+g_free(fds_strs[i]);
+}
+
+return ret;
+}
+
+static bool virtio_net_load_ebpf(VirtIONet *n, Error **errp)
+{
+bool ret = false;
+
+if (virtio_net_attach_ebpf_to_backend(n->nic, -1)) {
+ret = n->ebpf_rss_fds
+&& virtio_net_load_ebpf_fds(n, errp);
+if (!ret) {
+/* fall back to the program embedded in QEMU */
+ret = ebpf_rss_load(&n->ebpf_rss);
+}
+}
+
+return ret;
 }
 
 static void virtio_net_unload_ebpf(VirtIONet *n)
@@ -3738,7 +3821,7 @@ static void virtio_net_device_realize(DeviceState *dev, 
Error **errp)
 net_rx_pkt_init(&n->rx_pkt);
 
 if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
-virtio_net_load_ebpf(n);
+virtio_net_load_ebpf(n, errp);
 }
 }
 
@@ -3900,6 +3983,7 @@ static Property virtio_net_properties[] = {
 VIRTIO_NET_F_RSS, false),
 DEFINE_PROP_BIT64("hash", VirtIONet, host_features,
 VIRTIO_NET_F_HASH_REPORT, false),
+DEFINE_PROP_STRING("ebpf_rss_fds", VirtIONet, ebpf_rss_fds),
 DEFINE_PROP_BIT64("guest_rsc_ext", VirtIONet, host_features,
 VIRTIO_NET_F_RSC_EXT, false),
 DEFINE_PROP_UINT32("rsc_interval", VirtIONet, rsc_timeout,
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index ef234ffe7ef..e10ce88f918 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -219,6 +219,7 @@ struct VirtIONet {
 VirtioNetRssData rss_data;
 struct NetRxPkt *rx_pkt;
 struct EBPFRSSContext ebpf_rss;
+char *ebpf_rss_fds;
 };
 
 size_t virtio_net_handle_ctrl_iov(VirtIODevice *vdev,
-- 
2.39.1




[PATCH 1/5] ebpf: Added eBPF initialization by fds and map update.

2023-05-01 Thread Andrew Melnychenko
Changed eBPF map updates to go through mmap()ed arrays.
Mmap()ed arrays provide direct access to map data.
This avoids the bpf_map_update_elem() call,
which may require capabilities that are not present.
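
A minimal sketch of the difference, assuming an array map created with
BPF_F_MMAPABLE (required for mmap()ing a BPF map):

    uint32_t key = 0;

    /* before: one syscall per update, may need extra capabilities */
    bpf_map_update_elem(map_fd, &key, &cfg, 0);

    /* after: one mmap() at setup time, then plain memory stores */
    void *m = mmap(NULL, qemu_real_host_page_size(),
                   PROT_READ | PROT_WRITE, MAP_SHARED, map_fd, 0);
    if (m != MAP_FAILED) {
        memcpy(m, &cfg, sizeof(cfg));
    }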

Signed-off-by: Andrew Melnychenko 
---
 ebpf/ebpf_rss-stub.c |   6 +++
 ebpf/ebpf_rss.c  | 120 ++-
 ebpf/ebpf_rss.h  |  10 
 3 files changed, 113 insertions(+), 23 deletions(-)

diff --git a/ebpf/ebpf_rss-stub.c b/ebpf/ebpf_rss-stub.c
index e71e229190d..8d7fae2ad92 100644
--- a/ebpf/ebpf_rss-stub.c
+++ b/ebpf/ebpf_rss-stub.c
@@ -28,6 +28,12 @@ bool ebpf_rss_load(struct EBPFRSSContext *ctx)
 return false;
 }
 
+bool ebpf_rss_load_fds(struct EBPFRSSContext *ctx, int program_fd,
+   int config_fd, int toeplitz_fd, int table_fd)
+{
+return false;
+}
+
 bool ebpf_rss_set_all(struct EBPFRSSContext *ctx, struct EBPFRSSConfig *config,
   uint16_t *indirections_table, uint8_t *toeplitz_key)
 {
diff --git a/ebpf/ebpf_rss.c b/ebpf/ebpf_rss.c
index cee658c158b..08015fecb18 100644
--- a/ebpf/ebpf_rss.c
+++ b/ebpf/ebpf_rss.c
@@ -27,19 +27,68 @@ void ebpf_rss_init(struct EBPFRSSContext *ctx)
 {
 if (ctx != NULL) {
 ctx->obj = NULL;
+ctx->program_fd = -1;
 }
 }
 
 bool ebpf_rss_is_loaded(struct EBPFRSSContext *ctx)
 {
-return ctx != NULL && ctx->obj != NULL;
+return ctx != NULL && (ctx->obj != NULL || ctx->program_fd != -1);
+}
+
+static bool ebpf_rss_mmap(struct EBPFRSSContext *ctx)
+{
+if (!ebpf_rss_is_loaded(ctx)) {
+return false;
+}
+
+ctx->mmap_configuration = mmap(NULL, qemu_real_host_page_size(),
+   PROT_READ | PROT_WRITE, MAP_SHARED,
+   ctx->map_configuration, 0);
+if (ctx->mmap_configuration == MAP_FAILED) {
+trace_ebpf_error("eBPF RSS", "can not mmap eBPF configuration array");
+return false;
+}
+ctx->mmap_toeplitz_key = mmap(NULL, qemu_real_host_page_size(),
+   PROT_READ | PROT_WRITE, MAP_SHARED,
+   ctx->map_toeplitz_key, 0);
+if (ctx->mmap_toeplitz_key == MAP_FAILED) {
+trace_ebpf_error("eBPF RSS", "can not mmap eBPF toeplitz key");
+goto toeplitz_fail;
+}
+ctx->mmap_indirections_table = mmap(NULL, qemu_real_host_page_size(),
+   PROT_READ | PROT_WRITE, MAP_SHARED,
+   ctx->map_indirections_table, 0);
+if (ctx->mmap_indirections_table == MAP_FAILED) {
+trace_ebpf_error("eBPF RSS", "can not mmap eBPF indirection table");
+goto indirection_fail;
+}
+
+return true;
+
+indirection_fail:
+munmap(ctx->mmap_toeplitz_key, qemu_real_host_page_size());
+toeplitz_fail:
+munmap(ctx->mmap_configuration, qemu_real_host_page_size());
+return false;
+}
+
+static void ebpf_rss_munmap(struct EBPFRSSContext *ctx)
+{
+if (!ebpf_rss_is_loaded(ctx)) {
+return;
+}
+
+munmap(ctx->mmap_indirections_table, qemu_real_host_page_size());
+munmap(ctx->mmap_toeplitz_key, qemu_real_host_page_size());
+munmap(ctx->mmap_configuration, qemu_real_host_page_size());
 }
 
 bool ebpf_rss_load(struct EBPFRSSContext *ctx)
 {
 struct rss_bpf *rss_bpf_ctx;
 
-if (ctx == NULL) {
+if (ctx == NULL || ebpf_rss_is_loaded(ctx)) {
 return false;
 }
 
@@ -66,26 +115,51 @@ bool ebpf_rss_load(struct EBPFRSSContext *ctx)
 ctx->map_toeplitz_key = bpf_map__fd(
 rss_bpf_ctx->maps.tap_rss_map_toeplitz_key);
 
+if (!ebpf_rss_mmap(ctx)) {
+goto error;
+}
+
 return true;
 error:
 rss_bpf__destroy(rss_bpf_ctx);
 ctx->obj = NULL;
+ctx->program_fd = -1;
 
 return false;
 }
 
-static bool ebpf_rss_set_config(struct EBPFRSSContext *ctx,
-struct EBPFRSSConfig *config)
+bool ebpf_rss_load_fds(struct EBPFRSSContext *ctx, int program_fd,
+   int config_fd, int toeplitz_fd, int table_fd)
 {
-uint32_t map_key = 0;
+if (ctx == NULL || ebpf_rss_is_loaded(ctx)) {
+return false;
+}
 
-if (!ebpf_rss_is_loaded(ctx)) {
+if (program_fd < 0 || config_fd < 0 || toeplitz_fd < 0 || table_fd < 0) {
 return false;
 }
-if (bpf_map_update_elem(ctx->map_configuration,
-&map_key, config, 0) < 0) {
+
+ctx->program_fd = program_fd;
+ctx->map_configuration = config_fd;
+ctx->map_toeplitz_key = toeplitz_fd;
+ctx->map_indirections_table = table_fd;
+
+if (!ebpf_rss_mmap(ctx)) {
+ctx->program_fd = -1;
 return false;
 }
+
+return true;
+}
+
+static bool ebpf_rss_set_config(struct EBPFRSSContext *ctx,
+struct EBPFRSSConfig *config)
+{
+if (!ebpf_rss_is_loaded(ctx)) {
+return false;
+}
+
+memcpy(ctx->mmap_configuration, config, sizeof(*config));
+return true;

[PATCH 5/5] ebpf: Updated eBPF program and skeleton.

2023-05-01 Thread Andrew Melnychenko
Updated the section name, so that libbpf can infer the proper
program type without explicit specification during open/load.
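
For illustration, libbpf derives the program type from the ELF section
name, so a declaration along these lines (sketch, not the exact diff):

    SEC("socket")
    int tun_rss_steering_prog(struct __sk_buff *skb)
    {
        ...
    }

lets bpf_object__open()/bpf_object__load() select
BPF_PROG_TYPE_SOCKET_FILTER without an explicit bpf_program__set_type()
call.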

Signed-off-by: Andrew Melnychenko 
---
 ebpf/rss.bpf.skeleton.h | 1469 ---
 tools/ebpf/rss.bpf.c|2 +-
 2 files changed, 741 insertions(+), 730 deletions(-)

diff --git a/ebpf/rss.bpf.skeleton.h b/ebpf/rss.bpf.skeleton.h
index 18eb2adb12c..41b84aea44c 100644
--- a/ebpf/rss.bpf.skeleton.h
+++ b/ebpf/rss.bpf.skeleton.h
@@ -176,162 +176,162 @@ err:
 
 static inline const void *rss_bpf__elf_bytes(size_t *sz)
 {
-   *sz = 20440;
+   *sz = 20720;
return (const void *)"\
 \x7f\x45\x4c\x46\x02\x01\x01\0\0\0\0\0\0\0\0\0\x01\0\xf7\0\x01\0\0\0\0\0\0\0\0\
-\0\0\0\0\0\0\0\0\0\0\0\x98\x4c\0\0\0\0\0\0\0\0\0\0\x40\0\0\0\0\0\x40\0\x0d\0\
-\x01\0\xbf\x19\0\0\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\x54\xff\0\0\0\0\xbf\xa7\
-\0\0\0\0\0\0\x07\x07\0\0\x54\xff\xff\xff\x18\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
+\0\0\0\0\0\0\0\0\0\0\0\xb0\x4d\0\0\0\0\0\0\0\0\0\0\x40\0\0\0\0\0\x40\0\x0d\0\
+\x01\0\xbf\x19\0\0\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\x4c\xff\0\0\0\0\xbf\xa7\
+\0\0\0\0\0\0\x07\x07\0\0\x4c\xff\xff\xff\x18\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
 \xbf\x72\0\0\0\0\0\0\x85\0\0\0\x01\0\0\0\xbf\x06\0\0\0\0\0\0\x18\x01\0\0\0\0\0\
 \0\0\0\0\0\0\0\0\0\xbf\x72\0\0\0\0\0\0\x85\0\0\0\x01\0\0\0\xbf\x08\0\0\0\0\0\0\
-\x18\0\0\0\xff\xff\xff\xff\0\0\0\0\0\0\0\0\x15\x06\x67\x02\0\0\0\0\xbf\x87\0\0\
-\0\0\0\0\x15\x07\x65\x02\0\0\0\0\x71\x61\0\0\0\0\0\0\x55\x01\x01\0\0\0\0\0\x05\
-\0\x5e\x02\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xc8\xff\0\0\0\0\x7b\x1a\xc0\xff\
-\0\0\0\0\x7b\x1a\xb8\xff\0\0\0\0\x7b\x1a\xb0\xff\0\0\0\0\x7b\x1a\xa8\xff\0\0\0\
-\0\x63\x1a\xa0\xff\0\0\0\0\x7b\x1a\x98\xff\0\0\0\0\x7b\x1a\x90\xff\0\0\0\0\x7b\
-\x1a\x88\xff\0\0\0\0\x7b\x1a\x80\xff\0\0\0\0\x7b\x1a\x78\xff\0\0\0\0\x7b\x1a\
-\x70\xff\0\0\0\0\x7b\x1a\x68\xff\0\0\0\0\x7b\x1a\x60\xff\0\0\0\0\x7b\x1a\x58\
-\xff\0\0\0\0\x15\x09\x4d\x02\0\0\0\0\x6b\x1a\xd0\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\
-\0\x07\x03\0\0\xd0\xff\xff\xff\xbf\x91\0\0\0\0\0\0\xb7\x02\0\0\x0c\0\0\0\xb7\
+\x18\0\0\0\xff\xff\xff\xff\0\0\0\0\0\0\0\0\x15\x06\x64\x02\0\0\0\0\xbf\x87\0\0\
+\0\0\0\0\x15\x07\x62\x02\0\0\0\0\x71\x61\0\0\0\0\0\0\x55\x01\x01\0\0\0\0\0\x05\
+\0\x5b\x02\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xc0\xff\0\0\0\0\x7b\x1a\xb8\xff\
+\0\0\0\0\x7b\x1a\xb0\xff\0\0\0\0\x7b\x1a\xa8\xff\0\0\0\0\x7b\x1a\xa0\xff\0\0\0\
+\0\x63\x1a\x98\xff\0\0\0\0\x7b\x1a\x90\xff\0\0\0\0\x7b\x1a\x88\xff\0\0\0\0\x7b\
+\x1a\x80\xff\0\0\0\0\x7b\x1a\x78\xff\0\0\0\0\x7b\x1a\x70\xff\0\0\0\0\x7b\x1a\
+\x68\xff\0\0\0\0\x7b\x1a\x60\xff\0\0\0\0\x7b\x1a\x58\xff\0\0\0\0\x7b\x1a\x50\
+\xff\0\0\0\0\x15\x09\x4a\x02\0\0\0\0\x6b\x1a\xc8\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\
+\0\x07\x03\0\0\xc8\xff\xff\xff\xbf\x91\0\0\0\0\0\0\xb7\x02\0\0\x0c\0\0\0\xb7\
 \x04\0\0\x02\0\0\0\xb7\x05\0\0\0\0\0\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\0\
-\x77\0\0\0\x20\0\0\0\x55\0\x42\x02\0\0\0\0\xb7\x02\0\0\x10\0\0\0\x69\xa1\xd0\
+\x77\0\0\0\x20\0\0\0\x55\0\x3f\x02\0\0\0\0\xb7\x02\0\0\x10\0\0\0\x69\xa1\xc8\
 \xff\0\0\0\0\xbf\x13\0\0\0\0\0\0\xdc\x03\0\0\x10\0\0\0\x15\x03\x02\0\0\x81\0\0\
 \x55\x03\x0b\0\xa8\x88\0\0\xb7\x02\0\0\x14\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\
-\0\xd0\xff\xff\xff\xbf\x91\0\0\0\0\0\0\xb7\x04\0\0\x02\0\0\0\xb7\x05\0\0\0\0\0\
-\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\0\x77\0\0\0\x20\0\0\0\x55\0\x32\x02\0\
-\0\0\0\x69\xa1\xd0\xff\0\0\0\0\x15\x01\x30\x02\0\0\0\0\x7b\x7a\x38\xff\0\0\0\0\
-\x7b\x9a\x40\xff\0\0\0\0\x15\x01\x55\0\x86\xdd\0\0\x55\x01\x39\0\x08\0\0\0\xb7\
-\x07\0\0\x01\0\0\0\x73\x7a\x58\xff\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xe0\xff\
-\0\0\0\0\x7b\x1a\xd8\xff\0\0\0\0\x7b\x1a\xd0\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\0\
-\x07\x03\0\0\xd0\xff\xff\xff\x79\xa1\x40\xff\0\0\0\0\xb7\x02\0\0\0\0\0\0\xb7\
+\0\xc8\xff\xff\xff\xbf\x91\0\0\0\0\0\0\xb7\x04\0\0\x02\0\0\0\xb7\x05\0\0\0\0\0\
+\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\0\x77\0\0\0\x20\0\0\0\x55\0\x2f\x02\0\
+\0\0\0\x69\xa1\xc8\xff\0\0\0\0\x15\x01\x2d\x02\0\0\0\0\x7b\x7a\x30\xff\0\0\0\0\
+\x7b\x9a\x38\xff\0\0\0\0\x15\x01\x55\0\x86\xdd\0\0\x55\x01\x39\0\x08\0\0\0\xb7\
+\x07\0\0\x01\0\0\0\x73\x7a\x50\xff\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xd8\xff\
+\0\0\0\0\x7b\x1a\xd0\xff\0\0\0\0\x7b\x1a\xc8\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\0\
+\x07\x03\0\0\xc8\xff\xff\xff\x79\xa1\x38\xff\0\0\0\0\xb7\x02\0\0\0\0\0\0\xb7\
 \x04\0\0\x14\0\0\0\xb7\x05\0\0\x01\0\0\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\
-\0\x77\0\0\0\x20\0\0\0\x55\0\x1c\x02\0\0\0\0\x69\xa1\xd6\xff\0\0\0\0\x55\x01\
-\x01\0\0\0\0\0\xb7\x07\0\0\0\0\0\0\x61\xa1\xdc\xff\0\0\0\0\x63\x1a\x64\xff\0\0\
-\0\0\x61\xa1\xe0\xff\0\0\0\0\x63\x1a\x68\xff\0\0\0\0\x71\xa9\xd9\xff\0\0\0\0\
-\x73\x7a\x5e\xff\0\0\0\0\x71\xa1\xd0\xff\0\0\0\0\x67\x01\0\0\x02\0\0\0\x57\x01\
-\0\0\x3c\0\0\0\x7b\x1a\x48\xff\0\0\0\0\xbf\x91\0\0\0\0\0\0\x57\x01\0\0\xff\0\0\
+\0\x77\0\0\0\x20\0\0\0\x55\0\x19\x02\0\0\0\0\x69\xa1\xce\xff\0\0\0\0\x55\x01\
+\x01\0\0\0\0\0\xb7\x07\0\0\0\0\0\0\x61\xa1\xd4\xff\0\0\0\0\x63\x1a\x5c\xff\0\0\
+\0\0\x61\xa1\xd8\xff\0\

[PATCH 4/5] qmp: Added new command to retrieve eBPF blob.

2023-05-01 Thread Andrew Melnychenko
Added command "request-ebpf". This command returns
eBPF program encoded base64. The program taken from the
skeleton and essentially is an ELF object that can be
loaded in the future with libbpf.
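
A hypothetical exchange (the returned base64 is abbreviated; "rss" is the
id registered for the RSS blob elsewhere in this series):

    -> { "execute": "request-ebpf", "arguments": { "id": "rss" } }
    <- { "return": { "object": "f0VMRgIBAQAAAA..." } }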

Signed-off-by: Andrew Melnychenko 
---
 monitor/qmp-cmds.c | 16 
 qapi/misc.json | 28 
 2 files changed, 44 insertions(+)

diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index b0f948d3376..f7641bb55b9 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -32,6 +32,7 @@
 #include "hw/mem/memory-device.h"
 #include "hw/intc/intc.h"
 #include "hw/rdma/rdma.h"
+#include "ebpf/ebpf.h"
 
 NameInfo *qmp_query_name(Error **errp)
 {
@@ -209,3 +210,18 @@ static void __attribute__((__constructor__)) 
monitor_init_qmp_commands(void)
  qmp_marshal_qmp_capabilities,
  QCO_ALLOW_PRECONFIG, 0);
 }
+
+EbpfObject *qmp_request_ebpf(const char *id, Error **errp)
+{
+EbpfObject *ret = NULL;
+size_t size = 0;
+const void *data = ebpf_find_binary_by_id(id, &size, errp);
+if (!data) {
+return NULL;
+}
+
+ret = g_new0(EbpfObject, 1);
+ret->object = g_base64_encode(data, size);
+
+return ret;
+}
diff --git a/qapi/misc.json b/qapi/misc.json
index 6ddd16ea283..81613fd1b13 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -618,3 +618,31 @@
 { 'event': 'VFU_CLIENT_HANGUP',
   'data': { 'vfu-id': 'str', 'vfu-qom-path': 'str',
 'dev-id': 'str', 'dev-qom-path': 'str' } }
+
+##
+# @EbpfObject:
+#
+# Structure that holds an eBPF ELF object encoded in base64.
+#
+# Since: 8.1
+#
+##
+{ 'struct': 'EbpfObject',
+  'data': {'object': 'str'} }
+
+##
+# @request-ebpf:
+#
+# Returns an eBPF object that can be loaded with libbpf.
+# Management applications (e.g. libvirt) may load it and pass file
+# descriptors to QEMU, which allows running QEMU without BPF capabilities.
+#
+# Returns: RSS eBPF object encoded in base64.
+#
+# Since: 8.1
+#
+##
+{ 'command': 'request-ebpf',
+  'data': { 'id': 'str' },
+  'returns': 'EbpfObject' }
+
-- 
2.39.1




[PATCH 3/5] ebpf: Added declaration/initialization routines.

2023-05-01 Thread Andrew Melnychenko
Now, the binary objects may be retrieved by id/name.
This will be required by future QMP commands that may need a
specific eBPF blob.
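
For example, a hypothetical second skeleton-generated blob could register
itself with a single line:

    ebpf_binary_init("flow-filter", flow_filter_bpf__elf_bytes)

and would then be retrievable with
ebpf_find_binary_by_id("flow-filter", &sz, errp).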

Signed-off-by: Andrew Melnychenko 
---
 ebpf/ebpf.c  | 54 
 ebpf/ebpf.h  | 31 +++
 ebpf/ebpf_rss.c  |  4 
 ebpf/meson.build |  1 +
 4 files changed, 90 insertions(+)
 create mode 100644 ebpf/ebpf.c
 create mode 100644 ebpf/ebpf.h

diff --git a/ebpf/ebpf.c b/ebpf/ebpf.c
new file mode 100644
index 000..fd96f2b42f9
--- /dev/null
+++ b/ebpf/ebpf.c
@@ -0,0 +1,54 @@
+/*
+ * QEMU eBPF binary declaration routine.
+ *
+ * Developed by Daynix Computing LTD (http://www.daynix.com)
+ *
+ * Authors:
+ *  Andrew Melnychenko 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/queue.h"
+#include "qapi/error.h"
+#include "ebpf/ebpf.h"
+
+struct ElfBinaryDataEntry {
+const char *id;
+const void *data;
+size_t datalen;
+
+QSLIST_ENTRY(ElfBinaryDataEntry) node;
+};
+
+static QSLIST_HEAD(, ElfBinaryDataEntry) ebpf_elf_obj_list =
+QSLIST_HEAD_INITIALIZER();
+
+void ebpf_register_binary_data(const char *id, const void *data, size_t 
datalen)
+{
+struct ElfBinaryDataEntry *dataentry = NULL;
+
+dataentry = g_new0(struct ElfBinaryDataEntry, 1);
+dataentry->data = data;
+dataentry->datalen = datalen;
+dataentry->id = id;
+
+QSLIST_INSERT_HEAD(&ebpf_elf_obj_list, dataentry, node);
+}
+
+const void *ebpf_find_binary_by_id(const char *id, size_t *sz, Error **errp)
+{
+struct ElfBinaryDataEntry *it = NULL;
+QSLIST_FOREACH(it, &ebpf_elf_obj_list, node) {
+if (strcmp(id, it->id) == 0) {
+*sz = it->datalen;
+return it->data;
+}
+}
+
+error_setg(errp, "can't find eBPF object with id: %s", id);
+
+return NULL;
+}
diff --git a/ebpf/ebpf.h b/ebpf/ebpf.h
new file mode 100644
index 000..36c5d455b4b
--- /dev/null
+++ b/ebpf/ebpf.h
@@ -0,0 +1,31 @@
+/*
+ * QEMU eBPF binary declaration routine.
+ *
+ * Developed by Daynix Computing LTD (http://www.daynix.com)
+ *
+ * Authors:
+ *  Andrew Melnychenko 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#ifndef EBPF_H
+#define EBPF_H
+
+struct Error;
+
+void ebpf_register_binary_data(const char *id, const void *data,
+   size_t datalen);
+const void *ebpf_find_binary_by_id(const char *id, size_t *sz,
+   struct Error **errp);
+
+#define ebpf_binary_init(id, fn)   \
+static void __attribute__((constructor)) ebpf_binary_init_ ## fn(void) \
+{  \
+size_t datalen = 0;\
+const void *data = fn(&datalen);   \
+ebpf_register_binary_data(id, data, datalen);  \
+}
+
+#endif /* EBPF_H */
diff --git a/ebpf/ebpf_rss.c b/ebpf/ebpf_rss.c
index 08015fecb18..b4038725f23 100644
--- a/ebpf/ebpf_rss.c
+++ b/ebpf/ebpf_rss.c
@@ -21,6 +21,8 @@
 
 #include "ebpf/ebpf_rss.h"
 #include "ebpf/rss.bpf.skeleton.h"
+#include "ebpf/ebpf.h"
+
 #include "trace.h"
 
 void ebpf_rss_init(struct EBPFRSSContext *ctx)
@@ -237,3 +239,5 @@ void ebpf_rss_unload(struct EBPFRSSContext *ctx)
 ctx->obj = NULL;
 ctx->program_fd = -1;
 }
+
+ebpf_binary_init("rss", rss_bpf__elf_bytes)
diff --git a/ebpf/meson.build b/ebpf/meson.build
index 2dd0fd89480..67c3f53aa9d 100644
--- a/ebpf/meson.build
+++ b/ebpf/meson.build
@@ -1 +1,2 @@
+softmmu_ss.add(files('ebpf.c'))
 softmmu_ss.add(when: libbpf, if_true: files('ebpf_rss.c'), if_false: 
files('ebpf_rss-stub.c'))
-- 
2.39.1




Re: [PATCH v3 06/57] tcg/i386: Generalize multi-part load overlap test

2023-05-01 Thread Richard Henderson

On 4/29/23 14:01, Philippe Mathieu-Daudé wrote:

On 24/4/23 07:40, Richard Henderson wrote:

Test for both base and index; use datahi as a temporary, overwritten
by the final load.  Always perform the loads in ascending order, so
that any (user-only) fault sees the correct address.

Signed-off-by: Richard Henderson 
---
  tcg/i386/tcg-target.c.inc | 31 +++
  1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index b986109d77..794d440a9e 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -2223,23 +2223,22 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, 
TCGReg datahi,

  if (TCG_TARGET_REG_BITS == 64) {
  tcg_out_modrm_sib_offset(s, movop + P_REXW + seg, datalo,
   base, index, 0, ofs);
+    break;
+    }
+    if (use_movbe) {
+    TCGReg t = datalo;
+    datalo = datahi;
+    datahi = t;
+    }
+    if (base == datalo || index == datalo) {
+    tcg_out_modrm_sib_offset(s, OPC_LEA, datahi, base, index, 0, ofs);
+    tcg_out_modrm_offset(s, movop + seg, datalo, datahi, 0);
+    tcg_out_modrm_offset(s, movop + seg, datahi, datahi, 4);


LGTM but I'd rather have someone fluent with x86 review this one...


The original address is (base + (index << 0) + ofs).

If datalo overlaps either base or index, then we can't use the same form of address for 
the second load for datahi.  So we "Load Effective Address" to perform the computation of 
the original address once, storing into datahi as temporary (we are guaranteed that datalo 
!= datahi because they're both outputs).  After that, the two addresses that we want are 
(datahi + 0) and (datahi + 4).
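
In pseudo-C, for a 64-bit load split into two 32-bit halves (illustration
only):

    uint32_t *p = (uint32_t *)(base + index + ofs); /* the LEA, into datahi */
    datalo = p[0];    /* lowest address first, so a fault reports it */
    datahi = p[1];    /* second load overwrites the temporary */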



r~



Re: [PATCH] tb-maint: do not use mb_read/mb_set

2023-05-01 Thread Richard Henderson

On 4/30/23 12:25, Paolo Bonzini wrote:

The load side can use a relaxed load, which will surely happen before
the work item is run by async_safe_run_on_cpu() or before double-checking
under mmap_lock.  The store side can use an atomic RMW operation.

Signed-off-by: Paolo Bonzini
---
  accel/tcg/tb-maint.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH] call_rcu: stop using mb_set/mb_read

2023-05-01 Thread Richard Henderson

On 4/30/23 12:25, Paolo Bonzini wrote:

Use a store-release when enqueuing a new call_rcu, and a load-acquire
when dequeuing; and read the tail after checking that node->next is
consistent, which is the standard message passing pattern and it is
clearer than mb_read/mb_set.
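
The message-passing pattern in a nutshell (a sketch with QEMU's atomics,
not the exact patch):

    /* writer */
    msg = payload;                          /* fill in the data ...   */
    qatomic_store_release(&ready, true);    /* ... then publish it    */

    /* reader */
    if (qatomic_load_acquire(&ready)) {     /* pairs with the release */
        use(msg);                           /* guaranteed to see payload */
    }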

Signed-off-by: Paolo Bonzini 
---
  util/rcu.c | 38 +++---
  1 file changed, 27 insertions(+), 11 deletions(-)

diff --git a/util/rcu.c b/util/rcu.c
index e5b6e52be6f8..867607cd5a1e 100644
--- a/util/rcu.c
+++ b/util/rcu.c
@@ -189,8 +189,22 @@ static void enqueue(struct rcu_head *node)
  struct rcu_head **old_tail;
  
  node->next = NULL;

+
+/*
+ * Make this node the tail of the list.  The node will be
+ * used by further enqueue operations, but it will not
+ * be dequeued yet...
+ */
  old_tail = qatomic_xchg(&tail, &node->next);
-qatomic_mb_set(old_tail, node);
+
+/*
+ * ... until it is pointed to from another item in the list.
+ * In the meanwhile, try_dequeue() will find a NULL next pointer


Either "In the meantime" or "Meanwhile" (noun vs adverb).
E.g. "Meanwhile, at Try Dequeue's volcano lair..."  :-)



+/* If the head node has NULL in its next pointer, the value is
+ * wrong and we need to wait until its enqueuer finishes the update.
+ */


/*
 *
 */

I know surrounding code is different, but slowly it will all be edited.

Reviewed-by: Richard Henderson 


r~



Re: [PATCH v7 1/1] arm/kvm: add support for MTE

2023-05-01 Thread Richard Henderson

On 4/28/23 18:50, Juan Quintela wrote:

Pardon my ignorance here, but to try to help with migration: how is
this MTE tag stored?
- 1 array of 8 bits per page of memory
- 1 array of 64 bits per page of memory
- whatever

Let's assume that it is 1 byte per page. For the explanation it doesn't
matter; all that matters is that it is an array of things, one for each
page.


Not that it matters, as you say, but for concreteness, 1 4-bit tag per 16 bytes, packed, 
so 128 bytes per 4k page.
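(For reference: 4096 / 16 = 256 granules, and 256 * 4 bits = 1024 bits =
128 bytes per page.)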



So my suggestion is just to send another array:

- 1 array of page addresses
- 1 array of page tags that correspond to the previous one
- 1 array of pages that correspond to the previous addresses

You put compatibility marks here and there checking that you are using
MTE (and the same version) on both sides and you call that a day.


Sounds reasonable.


Notice that this requires the series (still not upstream but already on
the list) that moves the zero page detection to the multifd thread,
because I am assuming that zero pages also have tags (yes, it was not a
very impressive guess).


Correct.  "Proper" zero detection would include checking the tags as well.
Zero tags are what you get from the kernel on a new allocation.


Now you need to tell me if I should do this for each page, or use some
kind of scatter-gather function that allows me to receive the mte tags
from an array of pages.


That is going to depend on whether KVM exposes an interface with which to bulk-set tags (STGM, 
"store tag multiple", is only available to kernel mode for some reason), a-la 
arch/arm64/mm/copypage.c (which copies the page data then the page tags separately).


For the moment, KVM believes that memcpy from userspace is sufficient, which means we'll 
want a custom memcpy using STGP to store 16 bytes along with its tag.
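
Roughly (hypothetical sketch; STGP takes the allocation tag from the
logical tag in bits [59:56] of the address register):

    /* store one 16-byte granule together with its allocation tag */
    static inline void stgp_granule(void *addr, uint64_t lo, uint64_t hi,
                                    unsigned tag)
    {
        uint64_t a = ((uint64_t)(uintptr_t)addr & ~(0xfull << 56))
                     | ((uint64_t)tag << 56);
        asm volatile("stgp %1, %2, [%0]"
                     : : "r"(a), "r"(lo), "r"(hi) : "memory");
    }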



You could pass this information when we are searching for dirty pages,
but it is going to be complicated doing that (basically we only pass the
dirty page id, nothing else).


A page can be dirtied by changing nothing but a tag.
So we cannot of course send tags "early", they must come with the data.
I'm not 100% sure I understood your question here.


Another question: if you are using MTE, all pages have MTE, right?
Or are there other exceptions?


No such systems are built yet, so we won't know what corner cases the host system will 
have to cope with, but I believe as written so far all pages must have tags when MTE is 
enabled by KVM.



r~



[PATCH v2 2/3] target/i386: Fix exception classes for SSE/AVX instructions.

2023-05-01 Thread Ricky Zhou
Fix the exception classes for some SSE/AVX instructions to match what is
documented in the Intel manual.

These changes are expected to have no functional effect on the behavior
that qemu implements (primarily >= 16-byte memory alignment checks). For
instance, since qemu does not implement the AC flag, there is no
difference in behavior between Exception Classes 4 and 5 for
instructions where the SSE version only takes <16 byte memory operands.
---
 target/i386/tcg/decode-new.c.inc | 50 
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index 1a579451d2..796ba7cf18 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -237,7 +237,7 @@ static void decode_group14(DisasContext *s, CPUX86State 
*env, X86OpEntry *entry,
 static void decode_0F6F(DisasContext *s, CPUX86State *env, X86OpEntry *entry, 
uint8_t *b)
 {
 static const X86OpEntry opcodes_0F6F[4] = {
-X86_OP_ENTRY3(MOVDQ,   P,q, None,None, Q,q, vex1 mmx),  /* movq */
+X86_OP_ENTRY3(MOVDQ,   P,q, None,None, Q,q, vex5 mmx),  /* movq */
 X86_OP_ENTRY3(MOVDQ,   V,x, None,None, W,x, vex1),  /* movdqa 
*/
 X86_OP_ENTRY3(MOVDQ,   V,x, None,None, W,x, vex4_unal), /* movdqu 
*/
 {},
@@ -306,7 +306,7 @@ static void decode_0F7E(DisasContext *s, CPUX86State *env, 
X86OpEntry *entry, ui
 static void decode_0F7F(DisasContext *s, CPUX86State *env, X86OpEntry *entry, 
uint8_t *b)
 {
 static const X86OpEntry opcodes_0F7F[4] = {
-X86_OP_ENTRY3(MOVDQ,   W,x, None,None, V,x, vex1 mmx), /* movq */
+X86_OP_ENTRY3(MOVDQ,   W,x, None,None, V,x, vex5 mmx), /* movq */
 X86_OP_ENTRY3(MOVDQ,   W,x, None,None, V,x, vex1), /* movdqa */
 X86_OP_ENTRY3(MOVDQ,   W,x, None,None, V,x, vex4_unal), /* movdqu 
*/
 {},
@@ -639,15 +639,15 @@ static void decode_0F10(DisasContext *s, CPUX86State 
*env, X86OpEntry *entry, ui
 static const X86OpEntry opcodes_0F10_reg[4] = {
 X86_OP_ENTRY3(MOVDQ,   V,x,  None,None, W,x, vex4_unal), /* MOVUPS */
 X86_OP_ENTRY3(MOVDQ,   V,x,  None,None, W,x, vex4_unal), /* MOVUPD */
-X86_OP_ENTRY3(VMOVSS,  V,x,  H,x,   W,x, vex4),
-X86_OP_ENTRY3(VMOVLPx, V,x,  H,x,   W,x, vex4), /* MOVSD */
+X86_OP_ENTRY3(VMOVSS,  V,x,  H,x,   W,x, vex5),
+X86_OP_ENTRY3(VMOVLPx, V,x,  H,x,   W,x, vex5), /* MOVSD */
 };
 
 static const X86OpEntry opcodes_0F10_mem[4] = {
 X86_OP_ENTRY3(MOVDQ,  V,x,  None,None, W,x,  vex4_unal), /* MOVUPS 
*/
 X86_OP_ENTRY3(MOVDQ,  V,x,  None,None, W,x,  vex4_unal), /* MOVUPD 
*/
-X86_OP_ENTRY3(VMOVSS_ld,  V,x,  H,x,   M,ss, vex4),
-X86_OP_ENTRY3(VMOVSD_ld,  V,x,  H,x,   M,sd, vex4),
+X86_OP_ENTRY3(VMOVSS_ld,  V,x,  H,x,   M,ss, vex5),
+X86_OP_ENTRY3(VMOVSD_ld,  V,x,  H,x,   M,sd, vex5),
 };
 
 if ((get_modrm(s, env) >> 6) == 3) {
@@ -662,15 +662,15 @@ static void decode_0F11(DisasContext *s, CPUX86State 
*env, X86OpEntry *entry, ui
 static const X86OpEntry opcodes_0F11_reg[4] = {
 X86_OP_ENTRY3(MOVDQ,   W,x,  None,None, V,x, vex4), /* MOVUPS */
 X86_OP_ENTRY3(MOVDQ,   W,x,  None,None, V,x, vex4), /* MOVUPD */
-X86_OP_ENTRY3(VMOVSS,  W,x,  H,x,   V,x, vex4),
-X86_OP_ENTRY3(VMOVLPx, W,x,  H,x,   V,q, vex4), /* MOVSD */
+X86_OP_ENTRY3(VMOVSS,  W,x,  H,x,   V,x, vex5),
+X86_OP_ENTRY3(VMOVLPx, W,x,  H,x,   V,q, vex5), /* MOVSD */
 };
 
 static const X86OpEntry opcodes_0F11_mem[4] = {
 X86_OP_ENTRY3(MOVDQ,  W,x,  None,None, V,x, vex4), /* MOVUPS */
 X86_OP_ENTRY3(MOVDQ,  W,x,  None,None, V,x, vex4), /* MOVUPD */
-X86_OP_ENTRY3(VMOVSS_st,  M,ss, None,None, V,x, vex4),
-X86_OP_ENTRY3(VMOVLPx_st, M,sd, None,None, V,x, vex4), /* MOVSD */
+X86_OP_ENTRY3(VMOVSS_st,  M,ss, None,None, V,x, vex5),
+X86_OP_ENTRY3(VMOVLPx_st, M,sd, None,None, V,x, vex5), /* MOVSD */
 };
 
 if ((get_modrm(s, env) >> 6) == 3) {
@@ -687,16 +687,16 @@ static void decode_0F12(DisasContext *s, CPUX86State 
*env, X86OpEntry *entry, ui
  * Use dq for operand for compatibility with gen_MOVSD and
  * to allow VEX128 only.
  */
-X86_OP_ENTRY3(VMOVLPx_ld, V,dq, H,dq,  M,q, vex4), /* MOVLPS */
-X86_OP_ENTRY3(VMOVLPx_ld, V,dq, H,dq,  M,q, vex4), /* MOVLPD */
+X86_OP_ENTRY3(VMOVLPx_ld, V,dq, H,dq,  M,q, vex5), /* MOVLPS */
+X86_OP_ENTRY3(VMOVLPx_ld, V,dq, H,dq,  M,q, vex5), /* MOVLPD */
 X86_OP_ENTRY3(VMOVSLDUP,  V,x,  None,None, W,x, vex4 cpuid(SSE3)),
-X86_OP_ENTRY3(VMOVDDUP,   V,x,  None,None, WM,q, vex4 cpuid(SSE3)), /* 
qq if VEX.256 */
+X86_OP_ENTRY3(VMOVDDUP,   V,x,  None,None, WM,q, vex5 cpuid(SSE3)), /* 
qq if VEX.256 */
 };
 stati

[PATCH v2 1/3] target/i386: Fix and add some comments next to SSE/AVX instructions.

2023-05-01 Thread Ricky Zhou
Adds some comments describing what instructions correspond to decoding
table entries and fixes some existing comments which named the wrong
instruction.
---
 target/i386/tcg/decode-new.c.inc | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index 4fdd87750b..1a579451d2 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -274,9 +274,9 @@ static void decode_0F78(DisasContext *s, CPUX86State *env, 
X86OpEntry *entry, ui
 {
 static const X86OpEntry opcodes_0F78[4] = {
 {},
-X86_OP_ENTRY3(EXTRQ_i,   V,x, None,None, I,w,  cpuid(SSE4A)),
+X86_OP_ENTRY3(EXTRQ_i,   V,x, None,None, I,w,  cpuid(SSE4A)), /* 
AMD extension */
 {},
-X86_OP_ENTRY3(INSERTQ_i, V,x, U,x, I,w,cpuid(SSE4A)),
+X86_OP_ENTRY3(INSERTQ_i, V,x, U,x, I,w,cpuid(SSE4A)), /* 
AMD extension */
 };
 *entry = *decode_by_prefix(s, opcodes_0F78);
 }
@@ -284,9 +284,9 @@ static void decode_0F78(DisasContext *s, CPUX86State *env, 
X86OpEntry *entry, ui
 static void decode_0F79(DisasContext *s, CPUX86State *env, X86OpEntry *entry, 
uint8_t *b)
 {
 if (s->prefix & PREFIX_REPNZ) {
-entry->gen = gen_INSERTQ_r;
+entry->gen = gen_INSERTQ_r; /* AMD extension */
 } else if (s->prefix & PREFIX_DATA) {
-entry->gen = gen_EXTRQ_r;
+entry->gen = gen_EXTRQ_r; /* AMD extension */
 } else {
 entry->gen = NULL;
 };
@@ -660,15 +660,15 @@ static void decode_0F10(DisasContext *s, CPUX86State 
*env, X86OpEntry *entry, ui
 static void decode_0F11(DisasContext *s, CPUX86State *env, X86OpEntry *entry, 
uint8_t *b)
 {
 static const X86OpEntry opcodes_0F11_reg[4] = {
-X86_OP_ENTRY3(MOVDQ,   W,x,  None,None, V,x, vex4), /* MOVPS */
-X86_OP_ENTRY3(MOVDQ,   W,x,  None,None, V,x, vex4), /* MOVPD */
+X86_OP_ENTRY3(MOVDQ,   W,x,  None,None, V,x, vex4), /* MOVUPS */
+X86_OP_ENTRY3(MOVDQ,   W,x,  None,None, V,x, vex4), /* MOVUPD */
 X86_OP_ENTRY3(VMOVSS,  W,x,  H,x,   V,x, vex4),
 X86_OP_ENTRY3(VMOVLPx, W,x,  H,x,   V,q, vex4), /* MOVSD */
 };
 
 static const X86OpEntry opcodes_0F11_mem[4] = {
-X86_OP_ENTRY3(MOVDQ,  W,x,  None,None, V,x, vex4), /* MOVPS */
-X86_OP_ENTRY3(MOVDQ,  W,x,  None,None, V,x, vex4), /* MOVPD */
+X86_OP_ENTRY3(MOVDQ,  W,x,  None,None, V,x, vex4), /* MOVUPS */
+X86_OP_ENTRY3(MOVDQ,  W,x,  None,None, V,x, vex4), /* MOVUPD */
 X86_OP_ENTRY3(VMOVSS_st,  M,ss, None,None, V,x, vex4),
 X86_OP_ENTRY3(VMOVLPx_st, M,sd, None,None, V,x, vex4), /* MOVSD */
 };
@@ -839,9 +839,9 @@ static const X86OpEntry opcodes_0F[256] = {
 [0x17] = X86_OP_ENTRY3(VMOVHPx_st,  M,q, None,None, V,dq, vex4 p_00_66),
 
 [0x50] = X86_OP_ENTRY3(MOVMSK, G,y, None,None, U,x, vex7 p_00_66),
-[0x51] = X86_OP_GROUP3(sse_unary,  V,x, H,x, W,x, vex2_rep3 p_00_66_f3_f2),
-[0x52] = X86_OP_GROUP3(sse_unary,  V,x, H,x, W,x, vex4_rep5 p_00_f3),
-[0x53] = X86_OP_GROUP3(sse_unary,  V,x, H,x, W,x, vex4_rep5 p_00_f3),
+[0x51] = X86_OP_GROUP3(sse_unary,  V,x, H,x, W,x, vex2_rep3 
p_00_66_f3_f2), /* sqrtps */
+[0x52] = X86_OP_GROUP3(sse_unary,  V,x, H,x, W,x, vex4_rep5 p_00_f3), /* 
rsqrtps */
+[0x53] = X86_OP_GROUP3(sse_unary,  V,x, H,x, W,x, vex4_rep5 p_00_f3), /* 
rcpps */
 [0x54] = X86_OP_ENTRY3(PAND,   V,x, H,x, W,x,  vex4 p_00_66), /* vand 
*/
 [0x55] = X86_OP_ENTRY3(PANDN,  V,x, H,x, W,x,  vex4 p_00_66), /* vandn 
*/
 [0x56] = X86_OP_ENTRY3(POR,V,x, H,x, W,x,  vex4 p_00_66), /* vor */
@@ -879,7 +879,7 @@ static const X86OpEntry opcodes_0F[256] = {
 
 [0x58] = X86_OP_ENTRY3(VADD,   V,x, H,x, W,x, vex2_rep3 p_00_66_f3_f2),
 [0x59] = X86_OP_ENTRY3(VMUL,   V,x, H,x, W,x, vex2_rep3 p_00_66_f3_f2),
-[0x5a] = X86_OP_GROUP3(sse_unary,  V,x, H,x, W,x, vex2_rep3 p_00_66_f3_f2),
+[0x5a] = X86_OP_GROUP3(sse_unary,  V,x, H,x, W,x, vex2_rep3 
p_00_66_f3_f2), /* CVTPS2PD */
 [0x5b] = X86_OP_GROUP0(0F5B),
 [0x5c] = X86_OP_ENTRY3(VSUB,   V,x, H,x, W,x, vex2_rep3 p_00_66_f3_f2),
 [0x5d] = X86_OP_ENTRY3(VMIN,   V,x, H,x, W,x, vex2_rep3 p_00_66_f3_f2),
-- 
2.39.2




[PATCH v2 3/3] target/i386: Fix exception classes for MOVNTPS/MOVNTPD.

2023-05-01 Thread Ricky Zhou
Before this change, MOVNTPS and MOVNTPD were labeled as Exception Class
4 (only requiring alignment for legacy SSE instructions). This changes
them to Exception Class 1 (always requiring memory alignment), as
documented in the Intel manual.
---
 target/i386/tcg/decode-new.c.inc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index 796ba7cf18..282721b54c 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -750,8 +750,9 @@ static void decode_0F2A(DisasContext *s, CPUX86State *env, 
X86OpEntry *entry, ui
 static void decode_0F2B(DisasContext *s, CPUX86State *env, X86OpEntry *entry, 
uint8_t *b)
 {
 static const X86OpEntry opcodes_0F2B[4] = {
-X86_OP_ENTRY3(MOVDQ,  M,x,  None,None, V,x, vex4), /* MOVNTPS */
-X86_OP_ENTRY3(MOVDQ,  M,x,  None,None, V,x, vex4), /* MOVNTPD */
+X86_OP_ENTRY3(MOVDQ,  M,x,  None,None, V,x, vex1), /* MOVNTPS */
+X86_OP_ENTRY3(MOVDQ,  M,x,  None,None, V,x, vex1), /* MOVNTPD */
+/* AMD extensions */
 X86_OP_ENTRY3(VMOVSS_st,  M,ss, None,None, V,x, vex4 cpuid(SSE4A)), /* 
MOVNTSS */
 X86_OP_ENTRY3(VMOVLPx_st, M,sd, None,None, V,x, vex4 cpuid(SSE4A)), /* 
MOVNTSD */
 };
-- 
2.39.2




Re: [PATCH] target/i386: Fix exception classes for SSE/AVX instructions.

2023-05-01 Thread Ricky Zhou
On Fri, Apr 14, 2023 at 8:19 AM Philippe Mathieu-Daudé 
wrote:

> Having this patch split in 2 (documentation first, logical change then)
> would ease code review.
>
> > There is one functional change:
> >
> > Before this change, MOVNTPS and MOVNTPD were labeled as Exception Class
> > 4 (only requiring alignment for legacy SSE instructions). This changes
> > them to Exception Class 1 (always requiring memory alignment), as
> > documented in the Intel manual.
>
> This could be a 3rd patch.
>
Apologies for the delayed response - I just noticed your reply today.

I've split this into three separate patches as suggested (
https://lore.kernel.org/qemu-devel/2023050428.95998-1-ri...@rzhou.org/T/),
thanks!

Ricky


Re: [PULL 00/17] Block patches

2023-05-01 Thread Stefan Hajnoczi
On Sat, Apr 29, 2023 at 11:05:06PM +0100, Richard Henderson wrote:
> On 4/28/23 13:39, Stefan Hajnoczi wrote:
> > The following changes since commit 05d50ba2d4668d43a835c5a502efdec9b92646e6:
> > 
> >Merge tag 'migration-20230427-pull-request' of 
> > https://gitlab.com/juan.quintela/qemu into staging (2023-04-28 08:35:06 
> > +0100)
> > 
> > are available in the Git repository at:
> > 
> >https://gitlab.com/stefanha/qemu.git tags/block-pull-request
> > 
> > for you to fetch changes up to d3c760be786571d83d5cea01953e543df4d76f51:
> > 
> >docs/zoned-storage:add zoned emulation use case (2023-04-28 08:34:07 
> > -0400)
> > 
> > 
> > Pull request
> > 
> > This pull request contains Sam Li's virtio-blk zoned storage work. These
> > patches were dropped from my previous block pull request due to CI failures.
> 
> 
> More CI build failures, e.g.

Hi Sam,
There are some more CI failures.

> 
> https://gitlab.com/qemu-project/qemu/-/jobs/4202086013#L1720

This Ubuntu 20.04 on s390x CI job failed because <linux/blkzoned.h> is
missing Linux commit e876df1fe0ad ("block: add zone open, close and
finish ioctl support"):

  ../block/file-posix.c: In function ‘raw_co_zone_mgmt’:
  ../block/file-posix.c:3472:14: error: ‘BLKOPENZONE’ undeclared (first use in 
this function)
   3472 | zo = BLKOPENZONE;
|  ^~~
  ../block/file-posix.c:3472:14: note: each undeclared identifier is reported 
only once for each function it appears in
  ../block/file-posix.c:3476:14: error: ‘BLKCLOSEZONE’ undeclared (first use in 
this function); did you mean ‘BLKRESETZONE’?
   3476 | zo = BLKCLOSEZONE;
|  ^~~~
|  BLKRESETZONE
  ../block/file-posix.c:3480:14: error: ‘BLKFINISHZONE’ undeclared (first use 
in this function)
   3480 | zo = BLKFINISHZONE;
|  ^

Older kernels didn't have these ioctls. I don't think it makes sense to
enable file-posix zoned functionality without these ioctls.

I suggest changing the CONFIG_BLKZONED check in meson.build from:

  config_host_data.set('CONFIG_BLKZONED', cc.has_header('linux/blkzoned.h'))

 to:

  config_host_data.set('CONFIG_BLKZONED', 
cc.has_header_symbol('linux/blkzoned.h', 'BLKOPENZONE'))

> https://gitlab.com/qemu-project/qemu/-/jobs/4202085995#L4088

The <asm/kvm.h> header file started using __DECLARE_FLEX_ARRAY()
and QEMU doesn't have that macro:

  linux-headers/asm/kvm.h:509:3: error: expected specifier-qualifier-list 
before '__DECLARE_FLEX_ARRAY'
509 |   __DECLARE_FLEX_ARRAY(struct kvm_vmx_nested_state_data, vmx);
|   ^~~~

You could update the sed command in scripts/update-linux-headers.sh to
convert __DECLARE_FLEX_ARRAY(type, field) into type field[] or import
the Linux macro definition of __DECLARE_FLEX_ARRAY().
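
For reference, the Linux definition (include/uapi/linux/stddef.h) is
small, in case importing it turns out to be easier:

    #define __DECLARE_FLEX_ARRAY(TYPE, NAME) \
            struct { \
                    struct { } __empty_ ## NAME; \
                    TYPE NAME[]; \
            }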

Another failure is
https://gitlab.com/qemu-project/qemu/-/jobs/4202085991 where
qemu-iotests is failing because the output has changed due to the
addition of zoned fields to block stats.

Another failure is
https://gitlab.com/qemu-project/qemu/-/jobs/4202086041 where
qemu-system-ppc seems to segfault.

You can find the CI results here: 
https://gitlab.com/qemu-project/qemu/-/pipelines/852908752

You can run the GitLab CI yourself like this:

  $ git push -o ci.variable=QEMU_CI=2 your_gitlab_repo HEAD

Stefan




Re: [PATCH] test-aio-multithread: do not use mb_read/mb_set for simple flags

2023-05-01 Thread Stefan Hajnoczi
On Fri, Apr 28, 2023 at 01:12:48PM +0200, Paolo Bonzini wrote:
> The remaining use of mb_read/mb_set is just to force a thread to exit
> eventually.  It does not order two memory accesses and therefore can be
> just read/set.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  tests/unit/test-aio-multithread.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)

Reviewed-by: Stefan Hajnoczi 




[PATCH v4 1/3] target/riscv: smstateen check for fcsr

2023-05-01 Thread Mayuresh Chitale
If smstateen is implemented and smstateen0.fcsr is clear and misa.F
is off, then the floating point operations must raise an illegal
instruction exception or a virtual instruction trap, as relevant.

Signed-off-by: Mayuresh Chitale 
Reviewed-by: Weiwei Li 
---
 target/riscv/csr.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 4451bd1263..3f6b824bd2 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -82,6 +82,10 @@ static RISCVException fs(CPURISCVState *env, int csrno)
 !riscv_cpu_cfg(env)->ext_zfinx) {
 return RISCV_EXCP_ILLEGAL_INST;
 }
+
+if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
+return smstateen_acc_ok(env, 0, SMSTATEEN0_FCSR);
+}
 #endif
 return RISCV_EXCP_NONE;
 }
@@ -2100,6 +2104,9 @@ static RISCVException write_mstateen0(CPURISCVState *env, 
int csrno,
   target_ulong new_val)
 {
 uint64_t wr_mask = SMSTATEEN_STATEEN | SMSTATEEN0_HSENVCFG;
+if (!riscv_has_ext(env, RVF)) {
+wr_mask |= SMSTATEEN0_FCSR;
+}
 
 return write_mstateen(env, csrno, wr_mask, new_val);
 }
@@ -2173,6 +2180,10 @@ static RISCVException write_hstateen0(CPURISCVState 
*env, int csrno,
 {
 uint64_t wr_mask = SMSTATEEN_STATEEN | SMSTATEEN0_HSENVCFG;
 
+if (!riscv_has_ext(env, RVF)) {
+wr_mask |= SMSTATEEN0_FCSR;
+}
+
 return write_hstateen(env, csrno, wr_mask, new_val);
 }
 
@@ -2259,6 +2270,10 @@ static RISCVException write_sstateen0(CPURISCVState 
*env, int csrno,
 {
 uint64_t wr_mask = SMSTATEEN_STATEEN | SMSTATEEN0_HSENVCFG;
 
+if (!riscv_has_ext(env, RVF)) {
+wr_mask |= SMSTATEEN0_FCSR;
+}
+
 return write_sstateen(env, csrno, wr_mask, new_val);
 }
 
-- 
2.34.1




[PATCH v4 2/3] target/riscv: Reuse tb->flags.FS

2023-05-01 Thread Mayuresh Chitale
When misa.F is 0, the tb->flags.FS field is unused and can be used to
save the current state of the smstateen0.FCSR check, which is needed by
the floating point translation routines.

Signed-off-by: Mayuresh Chitale 
Reviewed-by: Weiwei Li 
---
 target/riscv/cpu_helper.c   | 6 ++
 target/riscv/insn_trans/trans_rvf.c.inc | 7 ---
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index b68dcfe7b6..695c189f96 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -119,6 +119,12 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong 
*pc,
 vs = MIN(vs, get_field(env->mstatus_hs, MSTATUS_VS));
 }
 
+/* With Zfinx, floating point is enabled/disabled by Smstateen. */
+if (!riscv_has_ext(env, RVF)) {
+fs = (smstateen_acc_ok(env, 0, SMSTATEEN0_FCSR) == RISCV_EXCP_NONE)
+ ? EXT_STATUS_DIRTY : EXT_STATUS_DISABLED;
+}
+
 if (cpu->cfg.debug && !icount_enabled()) {
 flags = FIELD_DP32(flags, TB_FLAGS, ITRIGGER, env->itrigger_enabled);
 }
diff --git a/target/riscv/insn_trans/trans_rvf.c.inc 
b/target/riscv/insn_trans/trans_rvf.c.inc
index b2de4fcf3f..509a6acffe 100644
--- a/target/riscv/insn_trans/trans_rvf.c.inc
+++ b/target/riscv/insn_trans/trans_rvf.c.inc
@@ -19,9 +19,10 @@
  */
 
 #define REQUIRE_FPU do {\
-if (ctx->mstatus_fs == EXT_STATUS_DISABLED) \
-if (!ctx->cfg_ptr->ext_zfinx) \
-return false; \
+if (ctx->mstatus_fs == EXT_STATUS_DISABLED) {   \
+ctx->virt_inst_excp = ctx->virt_enabled && ctx->cfg_ptr->ext_zfinx; \
+return false;   \
+}   \
 } while (0)
 
 #define REQUIRE_ZFINX_OR_F(ctx) do {\
-- 
2.34.1




[PATCH v4 3/3] target/riscv: smstateen knobs

2023-05-01 Thread Mayuresh Chitale
Add knobs to allow users to enable smstateen and also export it via the
ISA extension string.

Signed-off-by: Mayuresh Chitale 
Reviewed-by: Weiwei Li
Reviewed-by: Alistair Francis 
---
 target/riscv/cpu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index befa64528f..9420cd670e 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -119,6 +119,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
 ISA_EXT_DATA_ENTRY(zhinx, PRIV_VERSION_1_12_0, ext_zhinx),
 ISA_EXT_DATA_ENTRY(zhinxmin, PRIV_VERSION_1_12_0, ext_zhinxmin),
 ISA_EXT_DATA_ENTRY(smaia, PRIV_VERSION_1_12_0, ext_smaia),
+ISA_EXT_DATA_ENTRY(smstateen, PRIV_VERSION_1_12_0, ext_smstateen),
 ISA_EXT_DATA_ENTRY(ssaia, PRIV_VERSION_1_12_0, ext_ssaia),
 ISA_EXT_DATA_ENTRY(sscofpmf, PRIV_VERSION_1_12_0, ext_sscofpmf),
 ISA_EXT_DATA_ENTRY(sstc, PRIV_VERSION_1_12_0, ext_sstc),
@@ -1498,8 +1499,8 @@ static Property riscv_cpu_extensions[] = {
 DEFINE_PROP_UINT16("vlen", RISCVCPU, cfg.vlen, 128),
 DEFINE_PROP_UINT16("elen", RISCVCPU, cfg.elen, 64),
 
+DEFINE_PROP_BOOL("smstateen", RISCVCPU, cfg.ext_smstateen, false),
 DEFINE_PROP_BOOL("svadu", RISCVCPU, cfg.ext_svadu, true),
-
 DEFINE_PROP_BOOL("svinval", RISCVCPU, cfg.ext_svinval, false),
 DEFINE_PROP_BOOL("svnapot", RISCVCPU, cfg.ext_svnapot, false),
 DEFINE_PROP_BOOL("svpbmt", RISCVCPU, cfg.ext_svpbmt, false),
-- 
2.34.1




[PATCH v4 0/3] Smstateen FCSR

2023-05-01 Thread Mayuresh Chitale
Patches 4 and 5 of the smstateen series need to be re-submitted with
the changes described in the email below.
https://lists.nongnu.org/archive/html/qemu-riscv/2022-11/msg00155.html
Hence splitting the patch 4 of the original series into three and
re-submitting along with the original patch 5.

Changes in v4:
- Drop patch 3 
- Add reviewed-by tag

Changes in v3:
- Reuse TB_FLAGS.FS (instead of TB_FLAGS.HS_FS) for smstateen as the HS_FS
  bits have been removed.
- Remove fcsr check for zfh and zfhmin

Changes in v2:
 - Improve patch 1 description
 - Reuse TB_FLAGS.HS_FS for smstateen
 - Convert smstateen_fcsr_check to function
 - Add fcsr check for zdinx

Mayuresh Chitale (3):
  target/riscv: smstateen check for fcsr
  target/riscv: Reuse tb->flags.FS
  target/riscv: smstateen knobs

 target/riscv/cpu.c  |  3 ++-
 target/riscv/cpu_helper.c   |  6 ++
 target/riscv/csr.c  | 15 +++
 target/riscv/insn_trans/trans_rvf.c.inc |  7 ---
 4 files changed, 27 insertions(+), 4 deletions(-)

-- 
2.34.1




[PATCH 2/8] migration: Add precopy initial data handshake

2023-05-01 Thread Avihai Horon
Add a precopy initial data handshake between source and destination upon
migration setup. The purpose of the handshake is to notify the
destination that precopy initial data is used and which migration users
(i.e., SaveStateEntry) are going to use it.

The handshake is done in two levels. First, a general enable command is
sent to notify the destination migration code that precopy initial data
is used.
Then, for each migration user in the source that supports precopy
initial data, an enable command is sent to its counterpart in the
destination:
If both support it, precopy initial data will be used for them.
If the source doesn't support it, precopy initial data will not be used
for them.
If the source supports it and the destination doesn't, the migration
will fail.

To implement it, a new migration command MIG_CMD_INITIAL_DATA_ENABLE is
added, as well as a new SaveVMHandlers handler initial_data_advise.
Calling the handler advises the migration user that precopy initial data
is used and its return value indicates whether precopy initial data is
supported by it.
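
A migration user opts in by implementing the new handler, e.g. for a
hypothetical device:

    static bool mydev_initial_data_advise(void *opaque)
    {
        /* we can send and load setup data during precopy */
        return true;
    }

    static const SaveVMHandlers mydev_handlers = {
        /* ... existing handlers ... */
        .initial_data_advise = mydev_initial_data_advise,
    };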

Signed-off-by: Avihai Horon 
---
 include/migration/register.h |   6 +++
 migration/migration.h|   3 ++
 migration/savevm.h   |   1 +
 migration/migration.c|   4 ++
 migration/savevm.c   | 102 +++
 migration/trace-events   |   2 +
 6 files changed, 118 insertions(+)

diff --git a/include/migration/register.h b/include/migration/register.h
index a8dfd8fefd..0a73f3883e 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -71,6 +71,12 @@ typedef struct SaveVMHandlers {
 int (*load_cleanup)(void *opaque);
 /* Called when postcopy migration wants to resume from failure */
 int (*resume_prepare)(MigrationState *s, void *opaque);
+
+/*
+ * Advises that precopy initial data was requested to be enabled. Returns
+ * true if it's supported or false otherwise. Called both in src and dest.
+ */
+bool (*initial_data_advise)(void *opaque);
 } SaveVMHandlers;
 
 int register_savevm_live(const char *idstr,
diff --git a/migration/migration.h b/migration/migration.h
index 3a918514e7..4f615e9dbc 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -204,6 +204,9 @@ struct MigrationIncomingState {
  * contains valid information.
  */
 QemuMutex page_request_mutex;
+
+/* Indicates whether precopy initial data was enabled and should be used */
+bool initial_data_enabled;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/migration/savevm.h b/migration/savevm.h
index fb636735f0..d47ab4ad18 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -58,6 +58,7 @@ void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const 
char *name,
uint64_t *start_list,
uint64_t *length_list);
 void qemu_savevm_send_colo_enable(QEMUFile *f);
+void qemu_savevm_send_initial_data_enable(MigrationState *ms, QEMUFile *f);
 void qemu_savevm_live_state(QEMUFile *f);
 int qemu_save_device_state(QEMUFile *f);
 
diff --git a/migration/migration.c b/migration/migration.c
index abcadbb619..68cdf5b184 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2964,6 +2964,10 @@ static void *migration_thread(void *opaque)
 qemu_savevm_send_colo_enable(s->to_dst_file);
 }
 
+if (migrate_precopy_initial_data()) {
+qemu_savevm_send_initial_data_enable(s, s->to_dst_file);
+}
+
 qemu_savevm_state_setup(s->to_dst_file);
 
 qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
diff --git a/migration/savevm.c b/migration/savevm.c
index a9181b444b..2740defdf0 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -71,6 +71,13 @@
 
 const unsigned int postcopy_ram_discard_version;
 
+typedef struct {
+uint8_t general_enable;
+uint8_t reserved[7];
+uint8_t idstr[256];
+uint32_t instance_id;
+} InitialDataInfo;
+
 /* Subcommands for QEMU_VM_COMMAND */
 enum qemu_vm_cmd {
 MIG_CMD_INVALID = 0,   /* Must be 0 */
@@ -90,6 +97,8 @@ enum qemu_vm_cmd {
 MIG_CMD_ENABLE_COLO,   /* Enable COLO */
 MIG_CMD_POSTCOPY_RESUME,   /* resume postcopy on dest */
 MIG_CMD_RECV_BITMAP,   /* Request for recved bitmap on dst */
+
+MIG_CMD_INITIAL_DATA_ENABLE, /* Enable precopy initial data in dest */
 MIG_CMD_MAX
 };
 
@@ -109,6 +118,8 @@ static struct mig_cmd_args {
 [MIG_CMD_POSTCOPY_RESUME]  = { .len =  0, .name = "POSTCOPY_RESUME" },
 [MIG_CMD_PACKAGED] = { .len =  4, .name = "PACKAGED" },
 [MIG_CMD_RECV_BITMAP]  = { .len = -1, .name = "RECV_BITMAP" },
+[MIG_CMD_INITIAL_DATA_ENABLE] = { .len = sizeof(InitialDataInfo),
+  .name = "INITIAL_DATA_ENABLE" },
 [MIG_CMD_MAX]  = { .len = -1, .name = "MAX" },
 };
 
@@ -1036,6 +1047,40 @@ static void qemu_savevm_command_send(QEMUFile *f,
 qemu_fflush(

[PATCH 3/8] migration: Add precopy initial data loaded ACK functionality

2023-05-01 Thread Avihai Horon
Add the core functionality of precopy initial data, which allows the
destination to ACK that initial data has been loaded and the source to
wait for this ACK before completing the migration.

A new return path command MIG_RP_MSG_INITIAL_DATA_LOADED_ACK is added.
It is sent by the destination after precopy initial data is loaded to
ACK to the source that precopy initial data has been loaded.

In addition, two new SaveVMHandlers handlers are added:
1. is_initial_data_active which indicates whether precopy initial data
   is used for this migration user (i.e., SaveStateEntry).
2. initial_data_loaded which indicates whether precopy initial data has
   been loaded by this migration user.
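
For a hypothetical device, the destination side might look like:

    static bool mydev_is_initial_data_active(void *opaque)
    {
        MyDev *d = opaque;
        return d->initial_data_advised;  /* advise handler returned true */
    }

    static bool mydev_initial_data_loaded(void *opaque)
    {
        MyDev *d = opaque;
        return d->setup_data_loaded;
    }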

Signed-off-by: Avihai Horon 
---
 include/migration/register.h |  7 ++
 migration/migration.h| 12 +++
 migration/migration.c| 41 ++--
 migration/savevm.c   | 39 ++
 migration/trace-events   |  2 ++
 5 files changed, 99 insertions(+), 2 deletions(-)

diff --git a/include/migration/register.h b/include/migration/register.h
index 0a73f3883e..297bbe9f73 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -77,6 +77,13 @@ typedef struct SaveVMHandlers {
  * true if it's supported or false otherwise. Called both in src and dest.
  */
 bool (*initial_data_advise)(void *opaque);
+/*
+ * Checks if precopy initial data is active. If it's inactive,
+ * initial_data_loaded check is skipped.
+ */
+bool (*is_initial_data_active)(void *opaque);
+/* Checks if precopy initial data has been loaded in dest */
+bool (*initial_data_loaded)(void *opaque);
 } SaveVMHandlers;
 
 int register_savevm_live(const char *idstr,
diff --git a/migration/migration.h b/migration/migration.h
index 4f615e9dbc..d865c23d87 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -207,6 +207,11 @@ struct MigrationIncomingState {
 
 /* Indicates whether precopy initial data was enabled and should be used */
 bool initial_data_enabled;
+/*
+ * Indicates whether an ACK that precopy initial data was loaded has been
+ * sent to source.
+ */
+bool initial_data_loaded_ack_sent;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
@@ -435,6 +440,12 @@ struct MigrationState {
 
 /* QEMU_VM_VMDESCRIPTION content filled for all non-iterable devices. */
 JSONWriter *vmdesc;
+
+/*
+ * Indicates whether an ACK that precopy initial data was loaded in
+ * destination has been received.
+ */
+bool initial_data_loaded_acked;
 };
 
 void migrate_set_state(int *state, int old_state, int new_state);
@@ -475,6 +486,7 @@ int 
migrate_send_rp_message_req_pages(MigrationIncomingState *mis,
 void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
  char *block_name);
 void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
+int migrate_send_rp_initial_data_loaded_ack(MigrationIncomingState *mis);
 
 void dirty_bitmap_mig_before_vm_start(void);
 void dirty_bitmap_mig_cancel_outgoing(void);
diff --git a/migration/migration.c b/migration/migration.c
index 68cdf5b184..304cab2fa1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -77,6 +77,11 @@ enum mig_rp_message_type {
 MIG_RP_MSG_RECV_BITMAP,  /* send recved_bitmap back to source */
 MIG_RP_MSG_RESUME_ACK,   /* tell source that we are ready to resume */
 
+MIG_RP_MSG_INITIAL_DATA_LOADED_ACK, /*
+ * Tell source precopy initial data is
+ * loaded.
+ */
+
 MIG_RP_MSG_MAX
 };
 
@@ -756,6 +761,12 @@ bool migration_has_all_channels(void)
 return true;
 }
 
+int migrate_send_rp_initial_data_loaded_ack(MigrationIncomingState *mis)
+{
+return migrate_send_rp_message(mis, MIG_RP_MSG_INITIAL_DATA_LOADED_ACK, 0,
+   NULL);
+}
+
 /*
  * Send a 'SHUT' message on the return channel with the given value
  * to indicate that we've finished with the RP.  Non-0 value indicates
@@ -1401,6 +1412,8 @@ void migrate_init(MigrationState *s)
 s->vm_was_running = false;
 s->iteration_initial_bytes = 0;
 s->threshold_size = 0;
+
+s->initial_data_loaded_acked = false;
 }
 
 int migrate_add_blocker_internal(Error *reason, Error **errp)
@@ -1717,6 +1730,9 @@ static struct rp_cmd_args {
 [MIG_RP_MSG_REQ_PAGES_ID]   = { .len = -1, .name = "REQ_PAGES_ID" },
 [MIG_RP_MSG_RECV_BITMAP]= { .len = -1, .name = "RECV_BITMAP" },
 [MIG_RP_MSG_RESUME_ACK] = { .len =  4, .name = "RESUME_ACK" },
+[MIG_RP_MSG_INITIAL_DATA_LOADED_ACK] = { .len = 0,
+ .name =
+ "INITIAL_DATA_LOADED_ACK" },
 [MIG_RP_MSG_MAX]= { .len = -1, .name = "MAX" },
 };
 
@@ -1955

[PATCH 1/8] migration: Add precopy initial data capability

2023-05-01 Thread Avihai Horon
Migration downtime estimation is calculated based on bandwidth and
remaining migration data. This assumes that loading of migration data in
the destination takes a negligible amount of time and that downtime
depends only on network speed.

While this may be true for RAM, it's not necessarily true for other
migration users. For example, loading the data of a VFIO device in the
destination might require the device to allocate resources, prepare
internal data structures and so on. These operations can take a
significant amount of time which can increase migration downtime.

This patch adds a new capability "precopy initial data" that allows the
source to send initial precopy data and the destination to ACK that this
data has been loaded. Migration will not attempt to stop the source VM
and complete the migration until this ACK is received.

This will allow migration users to send initial precopy data which can
be used to reduce downtime (e.g., by pre-allocating resources), while
making sure that the source will stop the VM and complete the migration
only after this initial precopy data is sent and loaded in the
destination so it will have full effect.

This new capability relies on the return path capability to communicate
from the destination back to the source.

The actual implementation of the capability will be added in the
following patches.
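
As a hedged illustration (send_initial_metadata() is a made-up user
hook; migrate_precopy_initial_data() is added by this patch), a
migration user could gate its early transfer on the capability:

    if (migrate_precopy_initial_data()) {
        /* Send device metadata early so the destination can
         * pre-allocate resources before the switchover. */
        send_initial_metadata();
    }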

Signed-off-by: Avihai Horon 
---
 qapi/migration.json |  9 -
 migration/options.h |  1 +
 migration/options.c | 20 
 3 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 82000adce4..d496148386 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -478,6 +478,13 @@
 #should not affect the correctness of postcopy migration.
 #(since 7.1)
 #
+# @precopy-initial-data: If enabled, migration will not attempt to stop source
+#VM and complete the migration until an ACK is received
+#from the destination that initial precopy data has
+#been loaded. This can improve downtime if there are
+#migration users that support precopy initial data.
+#(since 8.1)
+#
 # Features:
 # @unstable: Members @x-colo and @x-ignore-shared are experimental.
 #
@@ -492,7 +499,7 @@
'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
{ 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
'validate-uuid', 'background-snapshot',
-   'zero-copy-send', 'postcopy-preempt'] }
+   'zero-copy-send', 'postcopy-preempt', 'precopy-initial-data'] }
 
 ##
 # @MigrationCapabilityStatus:
diff --git a/migration/options.h b/migration/options.h
index 3c322867cd..d004b6321e 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -44,6 +44,7 @@ bool migrate_pause_before_switchover(void);
 bool migrate_postcopy_blocktime(void);
 bool migrate_postcopy_preempt(void);
 bool migrate_postcopy_ram(void);
+bool migrate_precopy_initial_data(void);
 bool migrate_rdma_pin_all(void);
 bool migrate_release_ram(void);
 bool migrate_return_path(void);
diff --git a/migration/options.c b/migration/options.c
index 53b7fc5d5d..c4ef0c60c7 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -184,6 +184,8 @@ Property migration_properties[] = {
 DEFINE_PROP_MIG_CAP("x-zero-copy-send",
 MIGRATION_CAPABILITY_ZERO_COPY_SEND),
 #endif
+DEFINE_PROP_MIG_CAP("x-precopy-initial-data",
+MIGRATION_CAPABILITY_PRECOPY_INITIAL_DATA),
 
 DEFINE_PROP_END_OF_LIST(),
 };
@@ -286,6 +288,13 @@ bool migrate_postcopy_ram(void)
 return s->capabilities[MIGRATION_CAPABILITY_POSTCOPY_RAM];
 }
 
+bool migrate_precopy_initial_data(void)
+{
+MigrationState *s = migrate_get_current();
+
+return s->capabilities[MIGRATION_CAPABILITY_PRECOPY_INITIAL_DATA];
+}
+
 bool migrate_rdma_pin_all(void)
 {
 MigrationState *s = migrate_get_current();
@@ -546,6 +555,17 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
 }
 }
 
+if (new_caps[MIGRATION_CAPABILITY_PRECOPY_INITIAL_DATA]) {
+if (!new_caps[MIGRATION_CAPABILITY_RETURN_PATH]) {
+error_setg(errp, "Precopy initial data requires return path");
+return false;
+}
+
+/* Disable this capability until it's implemented */
+error_setg(errp, "Precopy initial data is not implemented yet");
+return false;
+}
+
 return true;
 }
 
-- 
2.26.3




[PATCH 7/8] vfio/migration: Add VFIO migration pre-copy support

2023-05-01 Thread Avihai Horon
Pre-copy support allows the VFIO device data to be transferred while the
VM is running. This helps to accommodate VFIO devices that have a large
amount of data that needs to be transferred, and it can reduce migration
downtime.

Pre-copy support is optional in VFIO migration protocol v2.
Implement pre-copy of VFIO migration protocol v2 and use it for devices
that support it. Full description of it can be found here [1].

[1]
https://lore.kernel.org/kvm/20221206083438.37807-3-yish...@nvidia.com/
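
A rough sketch of the probing step described above (assuming the
linux/vfio.h uAPI definitions from [1]; QEMU's actual code differs in
details, and device_fd stands for an already-open VFIO device fd):

    #include <linux/vfio.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/ioctl.h>

    int vfio_supports_precopy(int device_fd)
    {
        uint64_t buf[(sizeof(struct vfio_device_feature) +
                      sizeof(struct vfio_device_feature_migration) + 7) / 8];
        struct vfio_device_feature *feature = (void *)buf;
        struct vfio_device_feature_migration *mig = (void *)feature->data;

        memset(buf, 0, sizeof(buf));
        feature->argsz = sizeof(buf);
        feature->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_MIGRATION;

        if (ioctl(device_fd, VFIO_DEVICE_FEATURE, feature)) {
            return 0; /* no migration support at all */
        }
        /* Does the device opt in to the iterative pre-copy phase? */
        return !!(mig->flags & VFIO_MIGRATION_PRE_COPY);
    }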

Signed-off-by: Avihai Horon 
---
 docs/devel/vfio-migration.rst |  35 +---
 include/hw/vfio/vfio-common.h |   3 +
 hw/vfio/common.c  |   6 +-
 hw/vfio/migration.c   | 155 --
 hw/vfio/trace-events  |   4 +-
 5 files changed, 181 insertions(+), 22 deletions(-)

diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
index 1b68ccf115..e896b2a673 100644
--- a/docs/devel/vfio-migration.rst
+++ b/docs/devel/vfio-migration.rst
@@ -7,12 +7,14 @@ the guest is running on source host and restoring this saved 
state on the
 destination host. This document details how saving and restoring of VFIO
 devices is done in QEMU.
 
-Migration of VFIO devices currently consists of a single stop-and-copy phase.
-During the stop-and-copy phase the guest is stopped and the entire VFIO device
-data is transferred to the destination.
-
-The pre-copy phase of migration is currently not supported for VFIO devices.
-Support for VFIO pre-copy will be added later on.
+Migration of VFIO devices consists of two phases: the optional pre-copy phase,
+and the stop-and-copy phase. The pre-copy phase is iterative and allows to
+accommodate VFIO devices that have a large amount of data that needs to be
+transferred. The iterative pre-copy phase of migration allows for the guest to
+continue whilst the VFIO device state is transferred to the destination, this
+helps to reduce the total downtime of the VM. VFIO devices opt-in to pre-copy
+support by reporting the VFIO_MIGRATION_PRE_COPY flag in the
+VFIO_DEVICE_FEATURE_MIGRATION ioctl.
 
 Note that currently VFIO migration is supported only for a single device. This
 is due to VFIO migration's lack of P2P support. However, P2P support is planned
@@ -29,10 +31,20 @@ VFIO implements the device hooks for the iterative approach as follows:
 * A ``load_setup`` function that sets the VFIO device on the destination in
   _RESUMING state.
 
+* A ``state_pending_estimate`` function that reports an estimate of the
+  remaining pre-copy data that the vendor driver has yet to save for the VFIO
+  device.
+
 * A ``state_pending_exact`` function that reads pending_bytes from the vendor
   driver, which indicates the amount of data that the vendor driver has yet to
   save for the VFIO device.
 
+* An ``is_active_iterate`` function that indicates ``save_live_iterate`` is
+  active only when the VFIO device is in pre-copy states.
+
+* A ``save_live_iterate`` function that reads the VFIO device's data from the
+  vendor driver during iterative pre-copy phase.
+
 * A ``save_state`` function to save the device config space if it is present.
 
 * A ``save_live_complete_precopy`` function that sets the VFIO device in
@@ -111,8 +123,10 @@ Flow of state changes during Live migration
 ===
 
 Below is the flow of state change during live migration.
-The values in the brackets represent the VM state, the migration state, and
+The values in the parentheses represent the VM state, the migration state, and
 the VFIO device state, respectively.
+The text in the square brackets represents the flow if the VFIO device supports
+pre-copy.
 
 Live migration save path
 
@@ -124,11 +138,12 @@ Live migration save path
   |
  migrate_init spawns migration_thread
 Migration thread then calls each device's .save_setup()
-   (RUNNING, _SETUP, _RUNNING)
+  (RUNNING, _SETUP, _RUNNING [_PRE_COPY])
   |
-  (RUNNING, _ACTIVE, _RUNNING)
- If device is active, get pending_bytes by .state_pending_exact()
+  (RUNNING, _ACTIVE, _RUNNING [_PRE_COPY])
+  If device is active, get pending_bytes by .state_pending_{estimate,exact}()
   If total pending_bytes >= threshold_size, call .save_live_iterate()
+  [Data of VFIO device for pre-copy phase is copied]
 Iterate till total pending bytes converge and are less than threshold
   |
   On migration completion, vCPU stops and calls .save_live_complete_precopy for
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index eed244f25f..fa42955d4c 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -66,6 +66,9 @@ typedef struct VFIOMigration {
 int data_fd;
 void *data

[PATCH 8/8] vfio/migration: Add support for precopy initial data capability

2023-05-01 Thread Avihai Horon
Loading of a VFIO device's data can take a substantial amount of time as
the device may need to allocate resources, prepare internal data
structures, etc. This can increase migration downtime, especially for
VFIO devices with a lot of resources.

To solve this, VFIO migration uAPI defines "initial bytes" as part of
its precopy data stream. Initial bytes can be used in various ways to
improve VFIO migration performance. For example, it can be used to
transfer device metadata to pre-allocate resources in the destination.
However, for this to work we need to make sure that all initial bytes
are sent and loaded in the destination before the source VM is stopped.

Use migration precopy initial data capability to make sure a VFIO
device's initial bytes are sent and loaded in the destination before the
source stops the VM and attempts to complete the migration.
This can significantly reduce migration downtime.

Signed-off-by: Avihai Horon 
---
 include/hw/vfio/vfio-common.h |  3 +++
 hw/vfio/migration.c   | 48 ++-
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index fa42955d4c..dd3b052682 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -69,6 +69,9 @@ typedef struct VFIOMigration {
 uint64_t precopy_init_size;
 uint64_t precopy_dirty_size;
 uint64_t mig_flags;
+bool initial_data_active;
+bool initial_data_sent;
+bool initial_data_loaded;
 } VFIOMigration;
 
 typedef struct VFIOAddressSpace {
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 980be1f614..23f4f1f8a5 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -45,6 +45,7 @@
 #define VFIO_MIG_FLAG_DEV_CONFIG_STATE  (0xffffffffef100002ULL)
 #define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xffffffffef100003ULL)
 #define VFIO_MIG_FLAG_DEV_DATA_STATE    (0xffffffffef100004ULL)
+#define VFIO_MIG_FLAG_DEV_INIT_DATA_SENT (0xffffffffef100005ULL)
 
 /*
  * This is an arbitrary size based on migration of mlx5 devices, where typically
@@ -372,6 +373,8 @@ static void vfio_save_cleanup(void *opaque)
 
 g_free(migration->data_buffer);
 migration->data_buffer = NULL;
+migration->initial_data_sent = false;
+migration->initial_data_active = false;
 vfio_migration_cleanup(vbasedev);
 trace_vfio_save_cleanup(vbasedev->name);
 }
@@ -447,10 +450,17 @@ static int vfio_save_iterate(QEMUFile *f, void *opaque)
 if (data_size < 0) {
 return data_size;
 }
-qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
 
 vfio_update_estimated_pending_data(migration, data_size);
 
+if (migration->initial_data_active && !migration->precopy_init_size &&
+!migration->initial_data_sent) {
+qemu_put_be64(f, VFIO_MIG_FLAG_DEV_INIT_DATA_SENT);
+migration->initial_data_sent = true;
+} else {
+qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
+}
+
 trace_vfio_save_iterate(vbasedev->name);
 
 /*
@@ -568,6 +578,12 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
 }
 break;
 }
+case VFIO_MIG_FLAG_DEV_INIT_DATA_SENT:
+{
+vbasedev->migration->initial_data_loaded = true;
+
+return 0;
+}
 default:
 error_report("%s: Unknown tag 0x%"PRIx64, vbasedev->name, data);
 return -EINVAL;
@@ -582,6 +598,33 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
 return ret;
 }
 
+static bool vfio_initial_data_advise(void *opaque)
+{
+VFIODevice *vbasedev = opaque;
+VFIOMigration *migration = vbasedev->migration;
+
+migration->initial_data_active =
+migration->mig_flags & VFIO_MIGRATION_PRE_COPY;
+
+return migration->initial_data_active;
+}
+
+static bool vfio_is_initial_data_active(void *opaque)
+{
+VFIODevice *vbasedev = opaque;
+VFIOMigration *migration = vbasedev->migration;
+
+return migration->initial_data_active;
+}
+
+static bool vfio_initial_data_loaded(void *opaque)
+{
+VFIODevice *vbasedev = opaque;
+VFIOMigration *migration = vbasedev->migration;
+
+return migration->initial_data_loaded;
+}
+
 static const SaveVMHandlers savevm_vfio_handlers = {
 .save_setup = vfio_save_setup,
 .save_cleanup = vfio_save_cleanup,
@@ -594,6 +637,9 @@ static const SaveVMHandlers savevm_vfio_handlers = {
 .load_setup = vfio_load_setup,
 .load_cleanup = vfio_load_cleanup,
 .load_state = vfio_load_state,
+.initial_data_advise = vfio_initial_data_advise,
+.is_initial_data_active = vfio_is_initial_data_active,
+.initial_data_loaded = vfio_initial_data_loaded,
 };
 
 /* -- */
-- 
2.26.3




[PATCH 4/8] migration: Enable precopy initial data capability

2023-05-01 Thread Avihai Horon
Now that precopy initial data logic has been implemented, enable the
capability.

Signed-off-by: Avihai Horon 
---
 migration/options.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/migration/options.c b/migration/options.c
index c4ef0c60c7..77a570f866 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -560,10 +560,6 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
 error_setg(errp, "Precopy initial data requires return path");
 return false;
 }
-
-/* Disable this capability until it's implemented */
-error_setg(errp, "Precopy initial data is not implemented yet");
-return false;
 }
 
 return true;
-- 
2.26.3




[PATCH 6/8] vfio/migration: Refactor vfio_save_block() to return saved data size

2023-05-01 Thread Avihai Horon
Refactor vfio_save_block() to return the size of saved data on success
and -errno on error.

This will be used in the next patch to implement VFIO migration pre-copy
support.
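
For reference, the GNU "a ?: b" extension used in the new return
statement evaluates to a when a is non-zero and to b otherwise; an
equivalent expansion would be:

    ret = qemu_file_get_error(f);
    if (ret == 0) {
        ret = data_size; /* no stream error: report the bytes saved */
    }
    return ret;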

Signed-off-by: Avihai Horon 
Reviewed-by: Cédric Le Goater 
---
 hw/vfio/migration.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 6b58dddb88..235978fd68 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -241,8 +241,8 @@ static int vfio_query_stop_copy_size(VFIODevice *vbasedev,
 return 0;
 }
 
-/* Returns 1 if end-of-stream is reached, 0 if more data and -errno if error */
-static int vfio_save_block(QEMUFile *f, VFIOMigration *migration)
+/* Returns the size of saved data on success and -errno on error */
+static ssize_t vfio_save_block(QEMUFile *f, VFIOMigration *migration)
 {
 ssize_t data_size;
 
@@ -252,7 +252,7 @@ static int vfio_save_block(QEMUFile *f, VFIOMigration *migration)
 return -errno;
 }
 if (data_size == 0) {
-return 1;
+return 0;
 }
 
 qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE);
@@ -262,7 +262,7 @@ static int vfio_save_block(QEMUFile *f, VFIOMigration *migration)
 
 trace_vfio_save_block(migration->vbasedev->name, data_size);
 
-return qemu_file_get_error(f);
+return qemu_file_get_error(f) ?: data_size;
 }
 
 /* -- */
@@ -335,6 +335,7 @@ static void vfio_state_pending_exact(void *opaque, uint64_t 
*must_precopy,
 static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
 {
 VFIODevice *vbasedev = opaque;
+ssize_t data_size;
 int ret;
 
 /* We reach here with device state STOP only */
@@ -345,11 +346,11 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
 }
 
 do {
-ret = vfio_save_block(f, vbasedev->migration);
-if (ret < 0) {
-return ret;
+data_size = vfio_save_block(f, vbasedev->migration);
+if (data_size < 0) {
+return data_size;
 }
-} while (!ret);
+} while (data_size);
 
 qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
 ret = qemu_file_get_error(f);
-- 
2.26.3




[PATCH 5/8] tests: Add migration precopy initial data capability test

2023-05-01 Thread Avihai Horon
Add migration precopy initial data capability test. The test runs
without migration users that support this capability, but is still
useful to make sure it didn't break anything.

Signed-off-by: Avihai Horon 
---
 tests/qtest/migration-test.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 60dd53d3ec..71d30bd330 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1533,6 +1533,25 @@ static void test_precopy_tcp_plain(void)
 test_precopy_common(&args);
 }
 
+static void *test_migrate_initial_data_start(QTestState *from, QTestState *to)
+{
+
+migrate_set_capability(from, "return-path", true);
+migrate_set_capability(from, "precopy-initial-data", true);
+
+return NULL;
+}
+
+static void test_precopy_tcp_initial_data(void)
+{
+MigrateCommon args = {
+.listen_uri = "tcp:127.0.0.1:0",
+.start_hook = test_migrate_initial_data_start,
+};
+
+test_precopy_common(&args);
+}
+
 #ifdef CONFIG_GNUTLS
 static void test_precopy_tcp_tls_psk_match(void)
 {
@@ -2557,6 +2576,10 @@ int main(int argc, char **argv)
 #endif /* CONFIG_GNUTLS */
 
 qtest_add_func("/migration/precopy/tcp/plain", test_precopy_tcp_plain);
+
+qtest_add_func("/migration/precopy/tcp/plain/precopy-initial-data",
+   test_precopy_tcp_initial_data);
+
 #ifdef CONFIG_GNUTLS
 qtest_add_func("/migration/precopy/tcp/tls/psk/match",
test_precopy_tcp_tls_psk_match);
-- 
2.26.3




[PATCH 0/8] migration: Add precopy initial data capability and VFIO precopy support

2023-05-01 Thread Avihai Horon
Hello everyone,

This series adds a new migration capability called "precopy initial
data". The purpose of this capability is to reduce migration downtime in
cases where loading of migration data in the destination can take a lot
of time, such as with VFIO migration data.

The series then moves on to add pre-copy support and precopy initial
data support for VFIO migration.

Precopy initial data is used by VFIO migration, but other migration
users can add support for it and use it as well.

=== Background ===

Migration downtime estimation is calculated based on bandwidth and
remaining migration data. This assumes that loading of migration data in
the destination takes a negligible amount of time and that downtime
depends only on network speed.

While this may be true for RAM, it's not necessarily true for other
migration users. For example, loading the data of a VFIO device in the
destination might require the device to allocate resources and
prepare internal data structures which can take a significant amount of
time to do.

This poses a problem, as the source may think that the remaining
migration data is small enough to meet the downtime limit, so it will
stop the VM and complete the migration, but in fact sending and loading
the data in the destination may take longer than the downtime limit.

To solve this, VFIO migration uAPI defines "initial bytes" as part of
its precopy stream [1]. Initial bytes can be used in various ways to
improve VFIO migration performance. For example, it can be used to
transfer device metadata to pre-allocate resources in the destination.
However, for this to work we need to make sure that all initial bytes
are sent and loaded in the destination before the source VM is stopped.

The new precopy initial data migration capability helps us achieve this.
It allows the source to send initial precopy data and the destination to
ACK that this data has been loaded. Migration will not attempt to stop
the source VM and complete the migration until this ACK is received.

Note that this relies on the return path capability to communicate from
the destination back to the source.

=== Flow of operation ===

To use precopy initial data, the capability must be enabled in the
source.
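
Concretely, borrowing the qtest helper used by the test added later in
this series (over QMP this corresponds to migrate-set-capabilities):

    migrate_set_capability(from, "return-path", true);
    migrate_set_capability(from, "precopy-initial-data", true);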

As this capability must be supported also in the destination, a
handshake is performed during migration setup. The purpose of the
handshake is to notify the destination that precopy initial data is used
and to check if it's supported.

The handshake is done in two levels. First, a general handshake is done
with the destination migration code to notify that precopy initial data
is used. Then, for each migration user in the source that supports
precopy initial data, a handshake is done with its counterpart in the
destination:
If both support it, precopy initial data will be used for them.
If the source doesn't support it, precopy initial data will not be used
for them.
If the source supports it and the destination doesn't, the migration
will fail.

Assuming the handshake succeeded, migration starts to send precopy data
and as part of it also the initial precopy data. Initial precopy data is
just like any other precopy data and as such, migration code is not
aware of it. Therefore, it's the responsibility of the migration users
(such as VFIO devices) to notify their counterparts in the destination
that their initial precopy data has been sent (for example, VFIO
migration does it when its initial bytes reach zero).

In the destination, migration code will query each migration user that
supports precopy initial data and check if its initial data has been
loaded. If initial data has been loaded by all of them, an ACK will be
sent to the source which will now be able to complete migration when
appropriate.

=== Test results ===

The below table shows the downtime of two identical migrations. In the
first migration precopy initial data is disabled and in the second it is
enabled. The migrated VM is assigned with a mlx5 VFIO device which has
300MB of device data to be migrated.

+----------------------+-----------------------+----------+
| Precopy initial data | VFIO device data size | Downtime |
+----------------------+-----------------------+----------+
|       Disabled       |         300MB         |  1900ms  |
|       Enabled        |         300MB         |  420ms   |
+----------------------+-----------------------+----------+

Precopy initial data gives a roughly 4.5 times improvement in downtime.
The 1480ms difference is time that is used for resource allocation for
the VFIO device in the destination. Without precopy initial data, this
time is spent when the source VM is stopped and thus the downtime is
much higher. With precopy initial data, the time is spent when the
source VM is still running.

=== Patch breakdown ===

- Patches 1-5 add the precopy initial data capability.
- Patches 6-7 add VFIO migration precopy support. Similar version of
  them was previously sent here [2].
- Patch 8 adds precopy initial data support for VFIO migration.

Re: [PULL 07/13] async: Add an optional reentrancy guard to the BH API

2023-05-01 Thread Alexander Bulekov


On 230428 1143, Thomas Huth wrote:
> From: Alexander Bulekov 
> 
> Devices can pass their MemoryReentrancyGuard (from their DeviceState),
> when creating new BHes. Then, the async API will toggle the guard
> before/after calling the BH call-back. This prevents bh->mmio reentrancy
> issues.
> 
> Signed-off-by: Alexander Bulekov 
> Reviewed-by: Darren Kenny 
> Message-Id: <20230427211013.2994127-3-alx...@bu.edu>
> [thuth: Fix "line over 90 characters" checkpatch.pl error]
> Signed-off-by: Thomas Huth 
> ---

 
>  void aio_bh_call(QEMUBH *bh)
>  {
> +bool last_engaged_in_io = false;
> +
> +if (bh->reentrancy_guard) {
> +last_engaged_in_io = bh->reentrancy_guard->engaged_in_io;
> +if (bh->reentrancy_guard->engaged_in_io) {
> +trace_reentrant_aio(bh->ctx, bh->name);
> +}
> +bh->reentrancy_guard->engaged_in_io = true;
> +}
> +
>  bh->cb(bh->opaque);
> +
> +if (bh->reentrancy_guard) {
> +bh->reentrancy_guard->engaged_in_io = last_engaged_in_io;
> +}

This causes a UAF if bh was freed in bh->cb(). 
OSS-Fuzz reported this as issue 58513.

==3433535==ERROR: AddressSanitizer: heap-use-after-free on address 0x606427d0 at pc 0x565542b09347 bp 0x7fff2a4cf590 sp 0x7fff2a4cf588
READ of size 8 at 0x606427d0 thread T0
#0 0x565542b09346 in aio_bh_call /../util/async.c:169:19
#1 0x565542b0a2cc in aio_bh_poll /../util/async.c:200:13
#2 0x565542a6a818 in aio_dispatch /../util/aio-posix.c:421:5
#3 0x565542b1156e in aio_ctx_dispatch /../util/async.c:342:5
#4 0x7fc66e3657a8 in g_main_context_dispatch (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x547a8) (BuildId: 77a560369e4633278bc6e75ab0587491e11d5aac)
#5 0x565542b153f9 in glib_pollfds_poll /../util/main-loop.c:290:9
#6 0x565542b13cb3 in os_host_main_loop_wait /../util/main-loop.c:313:5
#7 0x565542b1387c in main_loop_wait /../util/main-loop.c:592:11

0x606427d0 is located 48 bytes inside of 56-byte region [0x606427a0,0x606427d8)
freed by thread T0 here:
#0 0x56553eff2192 in __interceptor_free (Id: ba9d8c3e3344b6323a2db18d4ab0bb9948201520)
#1 0x565542b0a32f in aio_bh_poll /../util/async.c:203:13
#2 0x565542a6ed7c in aio_poll /../util/aio-posix.c:721:17
#3 0x565542380b4d in bdrv_aio_cancel /../block/io.c:2812:13
#4 0x56554231aeda in blk_aio_cancel /../block/block-backend.c:1702:5
#5 0x56553f8fc242 in ahci_reset_port /../hw/ide/ahci.c:678:13
#6 0x56553f91d073 in handle_reg_h2d_fis /../hw/ide/ahci.c:1218:17
#7 0x56553f91a6c5 in handle_cmd /../hw/ide/ahci.c:1323:13
#8 0x56553f90fb13 in check_cmd /../hw/ide/ahci.c:595:18
#9 0x56553f944b8d in ahci_check_cmd_bh /../hw/ide/ahci.c:609:5
#10 0x565542b0929c in aio_bh_call /../util/async.c:167:5
#11 0x565542b0a2cc in aio_bh_poll /../util/async.c:200:13
#12 0x565542a6a818 in aio_dispatch /../util/aio-posix.c:421:5
#13 0x565542b1156e in aio_ctx_dispatch /../util/async.c:342:5
#14 0x7fc66e3657a8 in g_main_context_dispatch (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x547a8)



[PATCH] async: avoid use-after-free on re-entrancy guard

2023-05-01 Thread Alexander Bulekov
A BH callback can free the BH, causing a use-after-free in aio_bh_call.
Fix that by keeping a local copy of the re-entrancy guard pointer.

Buglink: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=58513
Fixes: 9c86c97f12 ("async: Add an optional reentrancy guard to the BH API")
Signed-off-by: Alexander Bulekov 
---
 util/async.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/util/async.c b/util/async.c
index 9df7674b4e..055070ffbd 100644
--- a/util/async.c
+++ b/util/async.c
@@ -156,18 +156,20 @@ void aio_bh_call(QEMUBH *bh)
 {
 bool last_engaged_in_io = false;
 
-if (bh->reentrancy_guard) {
-last_engaged_in_io = bh->reentrancy_guard->engaged_in_io;
-if (bh->reentrancy_guard->engaged_in_io) {
+/* Make a copy of the guard-pointer as cb may free the bh */
+MemReentrancyGuard *reentrancy_guard = bh->reentrancy_guard;
+if (reentrancy_guard) {
+last_engaged_in_io = reentrancy_guard->engaged_in_io;
+if (reentrancy_guard->engaged_in_io) {
 trace_reentrant_aio(bh->ctx, bh->name);
 }
-bh->reentrancy_guard->engaged_in_io = true;
+reentrancy_guard->engaged_in_io = true;
 }
 
 bh->cb(bh->opaque);
 
-if (bh->reentrancy_guard) {
-bh->reentrancy_guard->engaged_in_io = last_engaged_in_io;
+if (reentrancy_guard) {
+reentrancy_guard->engaged_in_io = last_engaged_in_io;
 }
 }
 
-- 
2.39.0




Re: [PATCH] linux-user: report ENOTTY for unknown ioctls

2023-05-01 Thread Laurent Vivier

On 26/04/2023 at 09:06, Thomas Weißschuh wrote:

The correct error number for unknown ioctls is ENOTTY.

ENOSYS would mean that the ioctl() syscall itself is not implemented,
which is very improbable and unexpected for userspace.

ENOTTY means "Inappropriate ioctl for device". This is what the kernel
returns on unknown ioctls, what qemu is trying to express and what
userspace is prepared to handle.
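
A sketch of the userspace pattern this unbreaks (illustrative; TCGETS
stands in for any request the emulator doesn't implement):

    #include <errno.h>
    #include <sys/ioctl.h>
    #include <termios.h>

    void probe_tty(int fd)
    {
        struct termios tio;

        if (ioctl(fd, TCGETS, &tio) < 0) {
            if (errno == ENOTTY) {
                /* Not a terminal, or the emulator doesn't know the
                 * request - fall back gracefully. */
            }
            /* Other errno values indicate a real error. */
        }
    }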

Signed-off-by: Thomas Weißschuh 
---
  linux-user/syscall.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 69f740ff98c8..c5955313a063 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -5747,7 +5747,7 @@ static abi_long do_ioctl(int fd, int cmd, abi_long arg)
  if (ie->target_cmd == 0) {
  qemu_log_mask(
  LOG_UNIMP, "Unsupported ioctl: cmd=0x%04lx\n", (long)cmd);
-return -TARGET_ENOSYS;
+return -TARGET_ENOTTY;
  }
  if (ie->target_cmd == cmd)
  break;
@@ -5759,7 +5759,7 @@ static abi_long do_ioctl(int fd, int cmd, abi_long arg)
  } else if (!ie->host_cmd) {
  /* Some architectures define BSD ioctls in their headers
 that are not implemented in Linux.  */
-return -TARGET_ENOSYS;
+return -TARGET_ENOTTY;
  }
  
  switch(arg_type[0]) {

@@ -5817,7 +5817,7 @@ static abi_long do_ioctl(int fd, int cmd, abi_long arg)
  qemu_log_mask(LOG_UNIMP,
"Unsupported ioctl type: cmd=0x%04lx type=%d\n",
(long)cmd, arg_type[0]);
-ret = -TARGET_ENOSYS;
+ret = -TARGET_ENOTTY;
  break;
  }
  return ret;

base-commit: a14b8206c5edcbbad1c71256ea9b44c3b382a9f5


Applied to my linux-user-for-8.1 branch.

Thanks,
Laurent




Re: [PATCH v4 04/20] virtio-scsi: stop using aio_disable_external() during unplug

2023-05-01 Thread Stefan Hajnoczi
On Fri, Apr 28, 2023 at 04:22:55PM +0200, Kevin Wolf wrote:
> Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> > This patch is part of an effort to remove the aio_disable_external()
> > API because it does not fit in a multi-queue block layer world where
> > many AioContexts may be submitting requests to the same disk.
> > 
> > The SCSI emulation code is already in good shape to stop using
> > aio_disable_external(). It was only used by commit 9c5aad84da1c
> > ("virtio-scsi: fixed virtio_scsi_ctx_check failed when detaching scsi
> > disk") to ensure that virtio_scsi_hotunplug() works while the guest
> > driver is submitting I/O.
> > 
> > Ensure virtio_scsi_hotunplug() is safe as follows:
> > 
> > 1. qdev_simple_device_unplug_cb() -> qdev_unrealize() ->
> >device_set_realized() calls qatomic_set(&dev->realized, false) so
> >that future scsi_device_get() calls return NULL because they exclude
> >SCSIDevices with realized=false.
> > 
> >That means virtio-scsi will reject new I/O requests to this
> >SCSIDevice with VIRTIO_SCSI_S_BAD_TARGET even while
> >virtio_scsi_hotunplug() is still executing. We are protected against
> >new requests!
> > 
> > 2. Add a call to scsi_device_purge_requests() from scsi_unrealize() so
> >that in-flight requests are cancelled synchronously. This ensures
> >that no in-flight requests remain once qdev_simple_device_unplug_cb()
> >returns.
> > 
> > Thanks to these two conditions we don't need aio_disable_external()
> > anymore.
> > 
> > Cc: Zhengui Li 
> > Reviewed-by: Paolo Bonzini 
> > Reviewed-by: Daniil Tatianin 
> > Signed-off-by: Stefan Hajnoczi 
> 
> qemu-iotests 040 starts failing for me after this patch, with what looks
> like a use-after-free error of some kind.
> 
> (gdb) bt
> #0  0x55b6e3e1f31c in job_type (job=0xe3e3e3e3e3e3e3e3) at ../job.c:238
> #1  0x55b6e3e1cee5 in is_block_job (job=0xe3e3e3e3e3e3e3e3) at ../blockjob.c:41
> #2  0x55b6e3e1ce7d in block_job_next_locked (bjob=0x55b6e72b7570) at ../blockjob.c:54
> #3  0x55b6e3df6370 in blockdev_mark_auto_del (blk=0x55b6e74af0a0) at ../blockdev.c:157
> #4  0x55b6e393e23b in scsi_qdev_unrealize (qdev=0x55b6e7c04d40) at ../hw/scsi/scsi-bus.c:303
> #5  0x55b6e3db0d0e in device_set_realized (obj=0x55b6e7c04d40, value=false, errp=0x55b6e497c918 ) at ../hw/core/qdev.c:599
> #6  0x55b6e3dba36e in property_set_bool (obj=0x55b6e7c04d40, v=0x55b6e7d7f290, name=0x55b6e41bd6d8 "realized", opaque=0x55b6e7246d20, errp=0x55b6e497c918 ) at ../qom/object.c:2285
> #7  0x55b6e3db7e65 in object_property_set (obj=0x55b6e7c04d40, name=0x55b6e41bd6d8 "realized", v=0x55b6e7d7f290, errp=0x55b6e497c918 ) at ../qom/object.c:1420
> #8  0x55b6e3dbd84a in object_property_set_qobject (obj=0x55b6e7c04d40, name=0x55b6e41bd6d8 "realized", value=0x55b6e74c1890, errp=0x55b6e497c918 ) at ../qom/qom-qobject.c:28
> #9  0x55b6e3db8570 in object_property_set_bool (obj=0x55b6e7c04d40, name=0x55b6e41bd6d8 "realized", value=false, errp=0x55b6e497c918 ) at ../qom/object.c:1489
> #10 0x55b6e3daf2b5 in qdev_unrealize (dev=0x55b6e7c04d40) at ../hw/core/qdev.c:306
> #11 0x55b6e3db509d in qdev_simple_device_unplug_cb (hotplug_dev=0x55b6e81c3630, dev=0x55b6e7c04d40, errp=0x7ffec5519200) at ../hw/core/qdev-hotplug.c:72
> #12 0x55b6e3c520f9 in virtio_scsi_hotunplug (hotplug_dev=0x55b6e81c3630, dev=0x55b6e7c04d40, errp=0x7ffec5519200) at ../hw/scsi/virtio-scsi.c:1065
> #13 0x55b6e3db4dec in hotplug_handler_unplug (plug_handler=0x55b6e81c3630, plugged_dev=0x55b6e7c04d40, errp=0x7ffec5519200) at ../hw/core/hotplug.c:56
> #14 0x55b6e3a28f84 in qdev_unplug (dev=0x55b6e7c04d40, errp=0x7ffec55192e0) at ../softmmu/qdev-monitor.c:935
> #15 0x55b6e3a290fa in qmp_device_del (id=0x55b6e74c1760 "scsi0", errp=0x7ffec55192e0) at ../softmmu/qdev-monitor.c:955
> #16 0x55b6e3fb0a5f in qmp_marshal_device_del (args=0x7f61cc005eb0, ret=0x7f61d5a8ae38, errp=0x7f61d5a8ae40) at qapi/qapi-commands-qdev.c:114
> #17 0x55b6e3fd52e1 in do_qmp_dispatch_bh (opaque=0x7f61d5a8ae08) at ../qapi/qmp-dispatch.c:128
> #18 0x55b6e4007b9e in aio_bh_call (bh=0x55b6e7dea730) at ../util/async.c:155
> #19 0x55b6e4007d2e in aio_bh_poll (ctx=0x55b6e72447c0) at ../util/async.c:184
> #20 0x55b6e3fe3b45 in aio_dispatch (ctx=0x55b6e72447c0) at ../util/aio-posix.c:421
> #21 0x55b6e4009544 in aio_ctx_dispatch (source=0x55b6e72447c0, callback=0x0, user_data=0x0) at ../util/async.c:326
> #22 0x7f61ddc14c7f in g_main_dispatch (context=0x55b6e7244b20) at ../glib/gmain.c:3454
> #23 g_main_context_dispatch (context=0x55b6e7244b20) at ../glib/gmain.c:4172
> #24 0x55b6e400a7e8 in glib_pollfds_poll () at ../util/main-loop.c:290
> #25 0x55b6e400a0c2 in os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:313
> #26 0x55b6e4009fa2 in main_loop_wait (nonblocking=0) at ../util/main-

Re: [PATCH 1/2] linux-user: Add move_mount() syscall

2023-05-01 Thread Laurent Vivier

On 24/04/2023 at 17:34, Thomas Weißschuh wrote:

Signed-off-by: Thomas Weißschuh 
---
  linux-user/syscall.c | 27 +++
  1 file changed, 27 insertions(+)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 69f740ff98c8..95e370130cee 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -9139,6 +9139,33 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1,
  unlock_user(p, arg1, 0);
  return ret;
  #endif
+#ifdef TARGET_NR_move_mount
+case TARGET_NR_move_mount:
+{
+void *p2, *p4;
+
+if (!arg2 || !arg4) {
+return -TARGET_EFAULT;
+}
+
+p2 = lock_user_string(arg2);
+if (!p2) {
+return -TARGET_EFAULT;
+}
+
+p4 = lock_user_string(arg4);
+if (!p4) {
+unlock_user(p2, arg2, 0);
+return -TARGET_EFAULT;
+}
+ret = get_errno(move_mount(arg1, p2, arg3, p4, arg5));
+
+unlock_user(p2, arg2, 0);
+unlock_user(p4, arg4, 0);
+
+return ret;
+}
+#endif
  #ifdef TARGET_NR_stime /* not on alpha */
  case TARGET_NR_stime:
  {

base-commit: 81072abf1575b11226b3779af76dc71dfa85ee5d


Reviewed-by: Laurent Vivier 




Re: [PATCH] test-aio-multithread: simplify test_multi_co_schedule

2023-05-01 Thread Stefan Hajnoczi
On Fri, Apr 28, 2023 at 01:19:41PM +0200, Paolo Bonzini wrote:
> Instead of using qatomic_mb_{read,set} mindlessly, just use a per-coroutine
> flag that requires no synchronization.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  tests/unit/test-aio-multithread.c | 18 --
>  1 file changed, 12 insertions(+), 6 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH 2/2] linux-user: Add open_tree() syscall

2023-05-01 Thread Laurent Vivier

On 24/04/2023 at 17:34, Thomas Weißschuh wrote:

Signed-off-by: Thomas Weißschuh 
---
  linux-user/syscall.c | 26 ++
  1 file changed, 26 insertions(+)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 95e370130cee..140bd2c36e0f 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -9166,6 +9166,32 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1,
  return ret;
  }
  #endif
+#ifdef TARGET_NR_open_tree
+case TARGET_NR_open_tree:
+{
+void *p2;
+
+if (!arg2) {
+return -TARGET_EFAULT;
+}
+
+p2 = lock_user_string(arg2);
+if (!p2) {
+return -TARGET_EFAULT;
+}
+
+int host_flags = arg3 & ~TARGET_O_CLOEXEC;
+if (arg3 & TARGET_O_CLOEXEC) {
+host_flags |= O_CLOEXEC;
+}
+
+ret = get_errno(open_tree(arg1, p2, host_flags));
+
+unlock_user(p2, arg2, 0);
+
+return ret;
+}
+#endif
  #ifdef TARGET_NR_stime /* not on alpha */
  case TARGET_NR_stime:
  {


Reviewed-by: Laurent Vivier 



Re: [PATCH 01/20] qcow2: Don't call bdrv_getlength() in coroutine_fns

2023-05-01 Thread Stefan Hajnoczi
On Tue, Apr 25, 2023 at 07:31:39PM +0200, Kevin Wolf wrote:
> There is a bdrv_co_getlength() now, which should be used in coroutine
> context.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block/qcow2.h  |  4 +++-
>  block/qcow2-refcount.c |  2 +-
>  block/qcow2.c  | 19 +--
>  3 files changed, 13 insertions(+), 12 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH 02/20] block: Consistently call bdrv_activate() outside coroutine

2023-05-01 Thread Stefan Hajnoczi
On Tue, Apr 25, 2023 at 07:31:40PM +0200, Kevin Wolf wrote:
> Migration code can call bdrv_activate() in coroutine context, whereas
> other callers call it outside of coroutines. As it calls other code that
> is not supposed to run in coroutines, standardise on running outside of
> coroutines.
> 
> This adds a no_co_wrapper to switch to the main loop before calling
> bdrv_activate().
> 
> Signed-off-by: Kevin Wolf 
> ---
>  include/block/block-global-state.h |  6 +-
>  block/block-backend.c  | 10 +-
>  2 files changed, 14 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH 1/2] linux-user: Add move_mount() syscall

2023-05-01 Thread Laurent Vivier

On 24/04/2023 at 17:34, Thomas Weißschuh wrote:

Signed-off-by: Thomas Weißschuh 
---
  linux-user/syscall.c | 27 +++
  1 file changed, 27 insertions(+)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 69f740ff98c8..95e370130cee 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -9139,6 +9139,33 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1,
  unlock_user(p, arg1, 0);
  return ret;
  #endif
+#ifdef TARGET_NR_move_mount
+case TARGET_NR_move_mount:
+{
+void *p2, *p4;
+
+if (!arg2 || !arg4) {
+return -TARGET_EFAULT;
+}
+
+p2 = lock_user_string(arg2);
+if (!p2) {
+return -TARGET_EFAULT;
+}
+
+p4 = lock_user_string(arg4);
+if (!p4) {
+unlock_user(p2, arg2, 0);
+return -TARGET_EFAULT;
+}
+ret = get_errno(move_mount(arg1, p2, arg3, p4, arg5));
+
+unlock_user(p2, arg2, 0);
+unlock_user(p4, arg4, 0);
+
+return ret;
+}
+#endif
  #ifdef TARGET_NR_stime /* not on alpha */
  case TARGET_NR_stime:
  {

base-commit: 81072abf1575b11226b3779af76dc71dfa85ee5d


Applied to my linux-user-for-8.1 branch.

Thanks,
Laurent




Re: [PATCH 03/20] block: bdrv/blk_co_unref() for calls in coroutine context

2023-05-01 Thread Stefan Hajnoczi
On Tue, Apr 25, 2023 at 07:31:41PM +0200, Kevin Wolf wrote:
> These functions must not be called in coroutine context, because they
> need write access to the graph.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  include/block/block-global-state.h  |  3 ++-
>  include/sysemu/block-backend-global-state.h |  5 -
>  block.c |  2 +-
>  block/crypto.c  |  6 +++---
>  block/parallels.c   |  6 +++---
>  block/qcow.c|  6 +++---
>  block/qcow2.c   | 14 +++---
>  block/qed.c |  6 +++---
>  block/vdi.c |  6 +++---
>  block/vhdx.c|  6 +++---
>  block/vmdk.c| 18 +-
>  block/vpc.c |  6 +++---
>  12 files changed, 44 insertions(+), 40 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH 04/20] block: Don't call no_coroutine_fns in qmp_block_resize()

2023-05-01 Thread Stefan Hajnoczi
On Tue, Apr 25, 2023 at 07:31:42PM +0200, Kevin Wolf wrote:
> This QMP handler runs in a coroutine, so it must use the corresponding
> no_co_wrappers instead.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  blockdev.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH 05/20] test-bdrv-drain: Don't modify the graph in coroutines

2023-05-01 Thread Stefan Hajnoczi
On Tue, Apr 25, 2023 at 07:31:43PM +0200, Kevin Wolf wrote:
> test-bdrv-drain contains a few test cases that are run both in coroutine
> and non-coroutine context. Running the entire code including the setup
> and shutdown in coroutines is incorrect because graph modifications can
> generally not happen in coroutines.
> 
> Change the test so that creating and destroying the test nodes and
> BlockBackends always happens outside of coroutine context.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  tests/unit/test-bdrv-drain.c | 112 +++
>  1 file changed, 75 insertions(+), 37 deletions(-)
> 
> diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
> index d9d3807062..765ae382da 100644
> --- a/tests/unit/test-bdrv-drain.c
> +++ b/tests/unit/test-bdrv-drain.c
> @@ -188,6 +188,25 @@ static void do_drain_begin_unlocked(enum drain_type drain_type, BlockDriverState
>  }
>  }
>  
> +static BlockBackend * no_coroutine_fn test_setup(void)
> +{
> +BlockBackend *blk;
> +BlockDriverState *bs, *backing;
> +
> +blk = blk_new(qemu_get_aio_context(), BLK_PERM_ALL, BLK_PERM_ALL);
> +bs = bdrv_new_open_driver(&bdrv_test, "test-node", BDRV_O_RDWR,
> +  &error_abort);
> +blk_insert_bs(blk, bs, &error_abort);
> +
> +backing = bdrv_new_open_driver(&bdrv_test, "backing", 0, &error_abort);
> +bdrv_set_backing_hd(bs, backing, &error_abort);
> +
> +bdrv_unref(backing);
> +bdrv_unref(bs);
> +
> +return blk;
> +}
> +
>  static void do_drain_end_unlocked(enum drain_type drain_type, BlockDriverState *bs)
>  {
>  if (drain_type != BDRV_DRAIN_ALL) {
> @@ -199,25 +218,19 @@ static void do_drain_end_unlocked(enum drain_type drain_type, BlockDriverState *
>  }
>  }
>  
> -static void test_drv_cb_common(enum drain_type drain_type, bool recursive)
> +static void test_drv_cb_common(BlockBackend *blk, enum drain_type drain_type,
> +   bool recursive)
>  {
> -BlockBackend *blk;
> -BlockDriverState *bs, *backing;
> +BlockDriverState *bs = blk_bs(blk);
> +BlockDriverState *backing = bs->backing->bs;
>  BDRVTestState *s, *backing_s;
>  BlockAIOCB *acb;
>  int aio_ret;
>  
>  QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, NULL, 0);
>  
> -blk = blk_new(qemu_get_aio_context(), BLK_PERM_ALL, BLK_PERM_ALL);
> -bs = bdrv_new_open_driver(&bdrv_test, "test-node", BDRV_O_RDWR,
> -  &error_abort);
>  s = bs->opaque;
> -blk_insert_bs(blk, bs, &error_abort);
> -
> -backing = bdrv_new_open_driver(&bdrv_test, "backing", 0, &error_abort);
>  backing_s = backing->opaque;
> -bdrv_set_backing_hd(bs, backing, &error_abort);
>  
>  /* Simple bdrv_drain_all_begin/end pair, check that CBs are called */
>  g_assert_cmpint(s->drain_count, ==, 0);
> @@ -252,44 +265,53 @@ static void test_drv_cb_common(enum drain_type drain_type, bool recursive)
>  
>  g_assert_cmpint(s->drain_count, ==, 0);
>  g_assert_cmpint(backing_s->drain_count, ==, 0);
> -
> -bdrv_unref(backing);
> -bdrv_unref(bs);
> -blk_unref(blk);
>  }
>  
>  static void test_drv_cb_drain_all(void)
>  {
> -test_drv_cb_common(BDRV_DRAIN_ALL, true);
> +BlockBackend *blk = test_setup();
> +test_drv_cb_common(blk, BDRV_DRAIN_ALL, true);
> +blk_unref(blk);
>  }
>  
>  static void test_drv_cb_drain(void)
>  {
> -test_drv_cb_common(BDRV_DRAIN, false);
> +BlockBackend *blk = test_setup();
> +test_drv_cb_common(blk, BDRV_DRAIN, false);
> +blk_unref(blk);
> +}
> +
> +static void test_drv_cb_co_drain_all_entry(void)

Missing coroutine_fn.

> +{
> +BlockBackend *blk = blk_all_next(NULL);
> +test_drv_cb_common(blk, BDRV_DRAIN_ALL, true);
>  }
>  
>  static void test_drv_cb_co_drain_all(void)
>  {
> -call_in_coroutine(test_drv_cb_drain_all);
> +BlockBackend *blk = test_setup();
> +call_in_coroutine(test_drv_cb_co_drain_all_entry);
> +blk_unref(blk);
>  }
>  
> -static void test_drv_cb_co_drain(void)
> +static void test_drv_cb_co_drain_entry(void)

Missing coroutine_fn.

>  {
> -call_in_coroutine(test_drv_cb_drain);
> +BlockBackend *blk = blk_all_next(NULL);
> +test_drv_cb_common(blk, BDRV_DRAIN, false);
>  }
>  
> -static void test_quiesce_common(enum drain_type drain_type, bool recursive)
> +static void test_drv_cb_co_drain(void)
>  {
> -BlockBackend *blk;
> -BlockDriverState *bs, *backing;
> -
> -blk = blk_new(qemu_get_aio_context(), BLK_PERM_ALL, BLK_PERM_ALL);
> -bs = bdrv_new_open_driver(&bdrv_test, "test-node", BDRV_O_RDWR,
> -  &error_abort);
> -blk_insert_bs(blk, bs, &error_abort);
> +BlockBackend *blk = test_setup();
> +call_in_coroutine(test_drv_cb_co_drain_entry);
> +blk_unref(blk);
> +}
>  
> -backing = bdrv_new_open_driver(&bdrv_test, "backing", 0, &error_abort);
> -bdrv_set_backing_hd(bs, backing, &error_abort)

Re: [PATCH 06/20] graph-lock: Add GRAPH_UNLOCKED(_PTR)

2023-05-01 Thread Stefan Hajnoczi
On Tue, Apr 25, 2023 at 07:31:44PM +0200, Kevin Wolf wrote:
> For some functions, part of their interface is that they are called without
> holding the graph lock. Add a new macro to document this.
> 
> The macro expands to TSA_EXCLUDES(), which is a relatively weak check
> because it passes in cases where the compiler just doesn't know if the
> lock is held. Function pointers can't be checked at all. Therefore, its
> primary purpose is documentation.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  include/block/graph-lock.h | 2 ++
>  1 file changed, 2 insertions(+)

Modulo Eric's comment about the commit description:

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH 2/2] linux-user: Add open_tree() syscall

2023-05-01 Thread Laurent Vivier

On 24/04/2023 at 17:34, Thomas Weißschuh wrote:

Signed-off-by: Thomas Weißschuh 
---
  linux-user/syscall.c | 26 ++
  1 file changed, 26 insertions(+)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 95e370130cee..140bd2c36e0f 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -9166,6 +9166,32 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1,
  return ret;
  }
  #endif
+#ifdef TARGET_NR_open_tree
+case TARGET_NR_open_tree:
+{
+void *p2;
+
+if (!arg2) {
+return -TARGET_EFAULT;
+}
+
+p2 = lock_user_string(arg2);
+if (!p2) {
+return -TARGET_EFAULT;
+}
+
+int host_flags = arg3 & ~TARGET_O_CLOEXEC;
+if (arg3 & TARGET_O_CLOEXEC) {
+host_flags |= O_CLOEXEC;
+}
+
+ret = get_errno(open_tree(arg1, p2, host_flags));
+
+unlock_user(p2, arg2, 0);
+
+return ret;
+}
+#endif
  #ifdef TARGET_NR_stime /* not on alpha */
  case TARGET_NR_stime:
  {


Applied to my linux-user-for-8.1 branch.
(moved the variable declaration to the beginning of the block)

Thanks,
Laurent




Re: [PATCH 07/20] graph-lock: Fix GRAPH_RDLOCK_GUARD*() to be reader lock

2023-05-01 Thread Stefan Hajnoczi
On Tue, Apr 25, 2023 at 07:31:45PM +0200, Kevin Wolf wrote:
> GRAPH_RDLOCK_GUARD() and GRAPH_RDLOCK_GUARD_MAINLOOP() only take a
> reader lock for the graph, so the correct annotation for them to use is
> TSA_ASSERT_SHARED rather than TSA_ASSERT.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  include/block/graph-lock.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/block/graph-lock.h b/include/block/graph-lock.h
> index 7ef391fab3..adaa3ed089 100644
> --- a/include/block/graph-lock.h
> +++ b/include/block/graph-lock.h
> @@ -210,7 +210,7 @@ typedef struct GraphLockable { } GraphLockable;
>   * unlocked. TSA_ASSERT() makes sure that the following calls know that we

Does this comment need to be updated to TSA_ASSERT_SHARED()?

>   * hold the lock while unlocking is left unchecked.
>   */
> -static inline GraphLockable * TSA_ASSERT(graph_lock) TSA_NO_TSA
> +static inline GraphLockable * TSA_ASSERT_SHARED(graph_lock) TSA_NO_TSA
>  graph_lockable_auto_lock(GraphLockable *x)
>  {
>  bdrv_graph_co_rdlock();
> @@ -254,7 +254,7 @@ typedef struct GraphLockableMainloop { } 
> GraphLockableMainloop;
>   * unlocked. TSA_ASSERT() makes sure that the following calls know that we

Same.


signature.asc
Description: PGP signature


Re: [PATCH] linux-user/main: Use list_cpus() instead of cpu_list()

2023-05-01 Thread Laurent Vivier

On 24/04/2023 at 14:21, Thomas Huth wrote:

This way we can get rid of the if'deffery and the XXX comment
here (it's repeated in the list_cpus() function anyway).

Signed-off-by: Thomas Huth 
---
  linux-user/main.c | 5 +
  1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/linux-user/main.c b/linux-user/main.c
index fe03293516..aece4d9e91 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -359,10 +359,7 @@ static void handle_arg_cpu(const char *arg)
  {
  cpu_model = strdup(arg);
  if (cpu_model == NULL || is_help_option(cpu_model)) {
-/* XXX: implement xxx_cpu_list for targets that still miss it */
-#if defined(cpu_list)
-cpu_list();
-#endif
+list_cpus();
  exit(EXIT_FAILURE);
  }
  }


Applied to my linux-user-for-8.1 branch.

Thanks,
Laurent




Re: [PATCH v2 1/2] linux-user: Add new flag VERIFY_NONE

2023-05-01 Thread Laurent Vivier

On 22/04/2023 at 12:03, Thomas Weißschuh wrote:

This can be used to validate that an address range is mapped but without
being readable or writable.

It will be used by an updated implementation of mincore().

Signed-off-by: Thomas Weißschuh 
---
  linux-user/qemu.h | 1 +
  1 file changed, 1 insertion(+)

diff --git a/linux-user/qemu.h b/linux-user/qemu.h
index e2e93fbd1d5d..92f9f5af41c7 100644
--- a/linux-user/qemu.h
+++ b/linux-user/qemu.h
@@ -168,6 +168,7 @@ abi_long do_brk(abi_ulong new_brk);
  
  /* user access */
  
+#define VERIFY_NONE  0

  #define VERIFY_READ  PAGE_READ
  #define VERIFY_WRITE (PAGE_READ | PAGE_WRITE)
  


Reviewed-by: Laurent Vivier 




Re: [PATCH v2 2/2] linux-user: Don't require PROT_READ for mincore

2023-05-01 Thread Laurent Vivier

On 22/04/2023 at 12:03, Thomas Weißschuh wrote:

The kernel does not require PROT_READ for addresses passed to mincore.
For example the fincore(1) tool from util-linux uses PROT_NONE and
currently does not work under qemu-user.

Example (with fincore(1) from util-linux 2.38):

$ fincore /proc/self/exe
RES PAGES  SIZE FILE
24K 6 22.1K /proc/self/exe

$ qemu-x86_64 /usr/bin/fincore /proc/self/exe
fincore: failed to do mincore: /proc/self/exe: Cannot allocate memory

With this patch:

$ ./build/qemu-x86_64 /usr/bin/fincore /proc/self/exe
RES PAGES  SIZE FILE
24K 6 22.1K /proc/self/exe

Signed-off-by: Thomas Weißschuh 
---
  linux-user/syscall.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 69f740ff98c8..5ec848b459f7 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -11897,7 +11897,7 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1,
  #ifdef TARGET_NR_mincore
  case TARGET_NR_mincore:
  {
-void *a = lock_user(VERIFY_READ, arg1, arg2, 0);
+void *a = lock_user(VERIFY_NONE, arg1, arg2, 0);
  if (!a) {
  return -TARGET_ENOMEM;
  }


Reviewed-by: Laurent Vivier 




Re: [PATCH v2 1/2] linux-user: Add new flag VERIFY_NONE

2023-05-01 Thread Laurent Vivier

On 22/04/2023 at 12:03, Thomas Weißschuh wrote:

This can be used to validate that an address range is mapped but without
being readable or writable.

It will be used by an updated implementation of mincore().

Signed-off-by: Thomas Weißschuh 
---
  linux-user/qemu.h | 1 +
  1 file changed, 1 insertion(+)

diff --git a/linux-user/qemu.h b/linux-user/qemu.h
index e2e93fbd1d5d..92f9f5af41c7 100644
--- a/linux-user/qemu.h
+++ b/linux-user/qemu.h
@@ -168,6 +168,7 @@ abi_long do_brk(abi_ulong new_brk);
  
  /* user access */
  
+#define VERIFY_NONE  0

  #define VERIFY_READ  PAGE_READ
  #define VERIFY_WRITE (PAGE_READ | PAGE_WRITE)
  


Applied to my linux-user-for-8.1 branch.

Thanks,
Laurent




Re: [PATCH v2 2/2] linux-user: Don't require PROT_READ for mincore

2023-05-01 Thread Laurent Vivier

On 22/04/2023 at 12:03, Thomas Weißschuh wrote:

The kernel does not require PROT_READ for addresses passed to mincore.
For example the fincore(1) tool from util-linux uses PROT_NONE and
currently does not work under qemu-user.

Example (with fincore(1) from util-linux 2.38):

$ fincore /proc/self/exe
RES PAGES  SIZE FILE
24K 6 22.1K /proc/self/exe

$ qemu-x86_64 /usr/bin/fincore /proc/self/exe
fincore: failed to do mincore: /proc/self/exe: Cannot allocate memory

With this patch:

$ ./build/qemu-x86_64 /usr/bin/fincore /proc/self/exe
RES PAGES  SIZE FILE
24K 6 22.1K /proc/self/exe

Signed-off-by: Thomas Weißschuh 
---
  linux-user/syscall.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 69f740ff98c8..5ec848b459f7 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -11897,7 +11897,7 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1,
  #ifdef TARGET_NR_mincore
  case TARGET_NR_mincore:
  {
-void *a = lock_user(VERIFY_READ, arg1, arg2, 0);
+void *a = lock_user(VERIFY_NONE, arg1, arg2, 0);
  if (!a) {
  return -TARGET_ENOMEM;
  }


Applied to my linux-user-for-8.1 branch.

Thanks,
Laurent



Re: [PATCH 08/20] block: .bdrv_open is non-coroutine and unlocked

2023-05-01 Thread Stefan Hajnoczi
On Tue, Apr 25, 2023 at 07:31:46PM +0200, Kevin Wolf wrote:
> Drivers were a bit confused about whether .bdrv_open can run in a
> coroutine and whether or not it holds a graph lock.
> 
> It cannot keep a graph lock from the caller across the whole function
> because it both changes the graph (requires a writer lock) and does I/O
> (requires a reader lock). Therefore, it should take these locks
> internally as needed.
> 
> The functions used to be called in coroutine context during image
> creation. This was buggy for other reasons, and as of commit 32192301,
> all block drivers go through no_co_wrappers. So it is not called in
> coroutine context any more.
> 
> Fix qcow2 and qed to work with the correct assumptions: The graph lock
> needs to be taken internally instead of just assuming it's already
> there, and the coroutine path is dead code that can be removed.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  include/block/block_int-common.h |  8 
>  block.c  |  6 +++---
>  block/qcow2.c| 15 ++-
>  block/qed.c  | 18 --
>  4 files changed, 21 insertions(+), 26 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH 09/20] nbd: Remove nbd_co_flush() wrapper function

2023-05-01 Thread Stefan Hajnoczi
On Tue, Apr 25, 2023 at 07:31:47PM +0200, Kevin Wolf wrote:
> The only thing nbd_co_flush() does is calling nbd_client_co_flush().
> Just use that function directly in the BlockDriver definitions and
> remove the wrapper.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block/nbd.c | 11 +++
>  1 file changed, 3 insertions(+), 8 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


RE: [PATCH v2] Hexagon (target/hexagon) Additional instructions handled by idef-parser

2023-05-01 Thread Taylor Simpson


> -----Original Message-----
> From: Taylor Simpson
> Sent: Friday, April 28, 2023 3:53 PM
> To: a...@rev.ng; qemu-devel@nongnu.org
> Cc: richard.hender...@linaro.org; phi...@linaro.org; a...@rev.ng; Brian Cain
> ; Matheus Bernardino (QUIC)
> 
> Subject: RE: [PATCH v2] Hexagon (target/hexagon) Additional instructions
> handled by idef-parser
> 
> 
> 
> > -Original Message-
> > From: Anton Johansson 
> > Sent: Friday, April 28, 2023 11:25 AM
> > To: Taylor Simpson ; qemu-devel@nongnu.org
> > Cc: richard.hender...@linaro.org; phi...@linaro.org; a...@rev.ng; Brian
> Cain
> > ; Matheus Bernardino (QUIC)
> > 
> > Subject: Re: [PATCH v2] Hexagon (target/hexagon) Additional instructions
> > handled by idef-parser
> >
> > On 4/26/23 19:32, Taylor Simpson wrote:
> > >  Changes in v2 
> > > Fix bug in imm_print identified in clang build
> > >
> > > Currently, idef-parser skips all floating point instructions.
> > > However, there are some floating point instructions that can be handled.
> > >
> > > The following instructions are now parsed
> > >  F2_sfimm_p
> > >  F2_sfimm_n
> > >  F2_dfimm_p
> > >  F2_dfimm_n
> > >  F2_dfmpyll
> > >  F2_dfmpylh
> > >
> > > To make these instructions work, we fix some bugs in parser-helpers.c
> > >  gen_rvalue_extend
> > >  gen_cast_op
> > >  imm_print
> > >
> > > Test cases added to tests/tcg/hexagon/fpstuff.c
> > >
> > > Signed-off-by: Taylor Simpson 
> > > ---
> > >   target/hexagon/idef-parser/parser-helpers.h |  2 +-
> > >   target/hexagon/idef-parser/parser-helpers.c | 37 ++
> > >   tests/tcg/hexagon/fpstuff.c | 54 +
> > >   target/hexagon/gen_idef_parser_funcs.py | 10 +++-
> > >   4 files changed, 91 insertions(+), 12 deletions(-)
> >
> > I'm getting a harness failure on
> >
> >  v65_Q6_R_mpy_RR_rnd.c
> >
> > I'll take a deeper look at this next week.
> 
> I'm seeing that failure too.  Thanks for looking into it.

It's this instruction:
void emit_M2_dpmpyss_rnd_s0(DisasContext * ctx, Insn * insn, Packet * pkt,
TCGv_i32 RdV, TCGv_i32 RsV, TCGv_i32 RtV)
/* {RdV=(fMPY32SS(RsV,RtV)+fCONSTLL(0x80000000))>>32;} */ {
TCGv_i64 tmp_0 = tcg_temp_new_i64();
tcg_gen_ext_i32_i64(tmp_0, RsV);
TCGv_i64 tmp_1 = tcg_temp_new_i64();
tcg_gen_ext_i32_i64(tmp_1, RtV);
TCGv_i64 tmp_2 = tcg_temp_new_i64();
tcg_gen_mul_i64(tmp_2, tmp_0, tmp_1);
int64_t qemu_tmp_0 = (int64_t) ((int32_t) - 2147483648);
TCGv_i64 tmp_3 = tcg_temp_new_i64();
tcg_gen_addi_i64(tmp_3, tmp_2, qemu_tmp_0);
int64_t qemu_tmp_1 = (int64_t) ((int32_t) 32);
TCGv_i64 tmp_4 = tcg_temp_new_i64();
{
int64_t shift = qemu_tmp_1;
if (qemu_tmp_1 >= 64) {
shift = 64 - 1;
}
tcg_gen_sari_i64(tmp_4, tmp_3, shift);
}
TCGv_i32 tmp_5 = tcg_temp_new_i32();
tcg_gen_trunc_i64_tl(tmp_5, tmp_4);
tcg_gen_mov_i32(RdV, tmp_5);
}

The problem is how we handle fCONSTLL(0x80000000).  In macros.h, it's
#define fCONSTLL(A) A##LL

The parser is treating it as a cast to int64_t.  However,
 0x80000000LL != (int64_t) 0x80000000

I'll change fCONSTLL from a cast to simply changing the bit_width to 64 and 
signedness to SIGNED.
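
For illustration, a standalone C reduction of the mismatch (not part of
the patch; the two values differ because the cast chain sign-extends
while the LL suffix does not):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* what the ##LL paste in fCONSTLL produces: a positive value */
    int64_t suffixed = 0x80000000LL;                /* +2147483648 */
    /* what the generated cast chain produces: sign-extension */
    int64_t casted = (int64_t)(int32_t)0x80000000;  /* -2147483648 */

    printf("%lld vs %lld\n", (long long)suffixed, (long long)casted);
    return 0;
}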

Stay tuned for v3 of the patch.

Thanks,
Taylor



Re: [PATCH v4 2/3] target/riscv: Reuse tb->flags.FS

2023-05-01 Thread Richard Henderson

On 5/1/23 15:00, Mayuresh Chitale wrote:

When misa.F is 0, the tb->flags.FS field is unused and can be used to save
the current state of the smstateen0.FCSR check, which is needed by the
floating point translation routines.

Signed-off-by: Mayuresh Chitale
Reviewed-by: Weiwei Li
---
  target/riscv/cpu_helper.c   | 6 ++
  target/riscv/insn_trans/trans_rvf.c.inc | 7 ---
  2 files changed, 10 insertions(+), 3 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v2 3/5] hw/display/virtio-gpu-virgl: define callbacks in realize function

2023-05-01 Thread Gurchetan Singh
On Sun, Apr 30, 2023 at 2:48 PM Bernhard Beschow  wrote:

>
>
> Am 28. April 2023 16:48:21 UTC schrieb Gurchetan Singh <
> gurchetansi...@chromium.org>:
> >From: Gurchetan Singh 
> >
> >This reduces the amount of renderer-backend-specific code that needs
> >to be exposed to the GL device.  We only need one realize function
> >per renderer backend.
> >
> >Signed-off-by: Gurchetan Singh 
> >Reviewed-by: Philippe Mathieu-Daudé 
> >---
> >v1: - Remove NULL inits (Philippe)
> >- Use VIRTIO_GPU_BASE where possible (Philippe)
> >v2: - Fix unnecessary line break (Akihiko)
> >
> > hw/display/virtio-gpu-gl.c | 15 ++-
> > hw/display/virtio-gpu-virgl.c  | 35 --
> > include/hw/virtio/virtio-gpu.h |  7 ---
> > 3 files changed, 31 insertions(+), 26 deletions(-)
> >
> >diff --git a/hw/display/virtio-gpu-gl.c b/hw/display/virtio-gpu-gl.c
> >index 2d140e8792..cdc9483e4d 100644
> >--- a/hw/display/virtio-gpu-gl.c
> >+++ b/hw/display/virtio-gpu-gl.c
> >@@ -21,6 +21,11 @@
> > #include "hw/virtio/virtio-gpu-pixman.h"
> > #include "hw/qdev-properties.h"
> >
> >+static void virtio_gpu_gl_device_realize(DeviceState *qdev, Error **errp)
> >+{
> >+virtio_gpu_virgl_device_realize(qdev, errp);
> >+}
> >+
> > static Property virtio_gpu_gl_properties[] = {
> > DEFINE_PROP_BIT("stats", VirtIOGPU, parent_obj.conf.flags,
> > VIRTIO_GPU_FLAG_STATS_ENABLED, false),
> >@@ -31,16 +36,8 @@ static void virtio_gpu_gl_class_init(ObjectClass
> *klass, void *data)
> > {
> > DeviceClass *dc = DEVICE_CLASS(klass);
> > VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);
> >-VirtIOGPUBaseClass *vbc = VIRTIO_GPU_BASE_CLASS(klass);
> >-VirtIOGPUClass *vgc = VIRTIO_GPU_CLASS(klass);
> >-
> >-vbc->gl_flushed = virtio_gpu_virgl_flushed;
> >-vgc->handle_ctrl = virtio_gpu_virgl_handle_ctrl;
> >-vgc->process_cmd = virtio_gpu_virgl_process_cmd;
> >-vgc->update_cursor_data = virtio_gpu_virgl_update_cursor;
> >
> >-vdc->realize = virtio_gpu_virgl_device_realize;
> >-vdc->reset = virtio_gpu_virgl_reset;
> >+vdc->realize = virtio_gpu_gl_device_realize;
> > device_class_set_props(dc, virtio_gpu_gl_properties);
> > }
> >
> >diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
> >index 786351446c..d7e01f1c77 100644
> >--- a/hw/display/virtio-gpu-virgl.c
> >+++ b/hw/display/virtio-gpu-virgl.c
> >@@ -401,8 +401,9 @@ static void virgl_cmd_get_capset(VirtIOGPU *g,
> > g_free(resp);
> > }
> >
> >-void virtio_gpu_virgl_process_cmd(VirtIOGPU *g,
> >-  struct virtio_gpu_ctrl_command
> *cmd)
> >+static void
> >+virtio_gpu_virgl_process_cmd(VirtIOGPU *g,
> >+ struct virtio_gpu_ctrl_command *cmd)
> > {
> > VIRTIO_GPU_FILL_CMD(cmd->cmd_hdr);
> >
> >@@ -637,7 +638,7 @@ static int virtio_gpu_virgl_get_num_capsets(VirtIOGPU
> *g)
> > return capset2_max_ver ? 2 : 1;
> > }
> >
> >-void virtio_gpu_virgl_update_cursor(VirtIOGPU *g,
> >+static void virtio_gpu_virgl_update_cursor(VirtIOGPU *g,
> >struct virtio_gpu_scanout *s,
> >uint32_t resource_id)
> > {
> >@@ -660,14 +661,14 @@ void virtio_gpu_virgl_update_cursor(VirtIOGPU *g,
> > free(data);
> > }
> >
> >-void virtio_gpu_virgl_flushed(VirtIOGPUBase *b)
> >+static void virtio_gpu_virgl_flushed(VirtIOGPUBase *b)
> > {
> > VirtIOGPU *g = VIRTIO_GPU(b);
> >
> > virtio_gpu_process_cmdq(g);
> > }
> >
> >-void virtio_gpu_virgl_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
> >+static void virtio_gpu_virgl_handle_ctrl(VirtIODevice *vdev, VirtQueue
> *vq)
> > {
> > VirtIOGPU *g = VIRTIO_GPU(vdev);
> > VirtIOGPUGL *gl = VIRTIO_GPU_GL(vdev);
> >@@ -699,7 +700,7 @@ void virtio_gpu_virgl_handle_ctrl(VirtIODevice *vdev,
> VirtQueue *vq)
> > virtio_gpu_virgl_fence_poll(g);
> > }
> >
> >-void virtio_gpu_virgl_reset(VirtIODevice *vdev)
> >+static void virtio_gpu_virgl_reset(VirtIODevice *vdev)
> > {
> > VirtIOGPU *g = VIRTIO_GPU(vdev);
> > VirtIOGPUGL *gl = VIRTIO_GPU_GL(vdev);
> >@@ -718,7 +719,21 @@ void virtio_gpu_virgl_reset(VirtIODevice *vdev)
> >
> > void virtio_gpu_virgl_device_realize(DeviceState *qdev, Error **errp)
> > {
> >-VirtIOGPU *g = VIRTIO_GPU(qdev);
> >+VirtIODevice *vdev = VIRTIO_DEVICE(qdev);
> >+VirtioDeviceClass *vdc = VIRTIO_DEVICE_GET_CLASS(vdev);
> >+
> >+VirtIOGPUBase *bdev = VIRTIO_GPU_BASE(qdev);
> >+VirtIOGPUBaseClass *vbc = VIRTIO_GPU_BASE_GET_CLASS(bdev);
> >+
> >+VirtIOGPU *gpudev = VIRTIO_GPU(qdev);
> >+VirtIOGPUClass *vgc = VIRTIO_GPU_GET_CLASS(gpudev);
> >+
> >+vbc->gl_flushed = virtio_gpu_virgl_flushed;
> >+vgc->handle_ctrl = virtio_gpu_virgl_handle_ctrl;
> >+vgc->process_cmd = virtio_gpu_virgl_process_cmd;
> >+vgc->update_cursor_data = virtio_gpu_virgl_update_cursor;
> >+
> >+vdc->reset = virtio_gpu_virgl_reset;
>
> A realize method is supposed to modify a 

[PATCH] block: compile out assert_bdrv_graph_readable() by default

2023-05-01 Thread Stefan Hajnoczi
reader_count() is a performance bottleneck because the global
aio_context_list_lock mutex causes thread contention. Put this debugging
assertion behind a new ./configure --enable-debug-graph-lock option and
disable it by default.

The --enable-debug-graph-lock option is also enabled by the more general
--enable-debug option.

Signed-off-by: Stefan Hajnoczi 
---
 meson_options.txt | 2 ++
 configure | 1 +
 meson.build   | 2 ++
 block/graph-lock.c| 3 +++
 scripts/meson-buildoptions.sh | 4 
 5 files changed, 12 insertions(+)

diff --git a/meson_options.txt b/meson_options.txt
index 2471dd02da..0b2dd2d30d 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -311,6 +311,8 @@ option('rng_none', type: 'boolean', value: false,
description: 'dummy RNG, avoid using /dev/(u)random and getrandom()')
 option('coroutine_pool', type: 'boolean', value: true,
description: 'coroutine freelist (better performance)')
+option('debug_graph_lock', type: 'boolean', value: false,
+   description: 'graph lock debugging support')
 option('debug_mutex', type: 'boolean', value: false,
description: 'mutex debugging support')
 option('debug_stack_usage', type: 'boolean', value: false,
diff --git a/configure b/configure
index 77c03315f8..243e2e0a0d 100755
--- a/configure
+++ b/configure
@@ -816,6 +816,7 @@ for opt do
   --enable-debug)
   # Enable debugging options that aren't excessively noisy
   debug_tcg="yes"
+  meson_option_parse --enable-debug-graph-lock ""
   meson_option_parse --enable-debug-mutex ""
   meson_option_add -Doptimization=0
   fortify_source="no"
diff --git a/meson.build b/meson.build
index c44d05a13f..d964e741e7 100644
--- a/meson.build
+++ b/meson.build
@@ -1956,6 +1956,7 @@ if get_option('debug_stack_usage') and have_coroutine_pool
   have_coroutine_pool = false
 endif
 config_host_data.set10('CONFIG_COROUTINE_POOL', have_coroutine_pool)
+config_host_data.set('CONFIG_DEBUG_GRAPH_LOCK', get_option('debug_graph_lock'))
 config_host_data.set('CONFIG_DEBUG_MUTEX', get_option('debug_mutex'))
 config_host_data.set('CONFIG_DEBUG_STACK_USAGE', 
get_option('debug_stack_usage'))
 config_host_data.set('CONFIG_GPROF', get_option('gprof'))
@@ -3833,6 +3834,7 @@ summary_info += {'PIE':   get_option('b_pie')}
 summary_info += {'static build':  config_host.has_key('CONFIG_STATIC')}
 summary_info += {'malloc trim support': has_malloc_trim}
 summary_info += {'membarrier':have_membarrier}
+summary_info += {'debug graph lock':  get_option('debug_graph_lock')}
 summary_info += {'debug stack usage': get_option('debug_stack_usage')}
 summary_info += {'mutex debugging':   get_option('debug_mutex')}
 summary_info += {'memory allocator':  get_option('malloc')}
diff --git a/block/graph-lock.c b/block/graph-lock.c
index 639526608f..377884c3a9 100644
--- a/block/graph-lock.c
+++ b/block/graph-lock.c
@@ -265,7 +265,10 @@ void bdrv_graph_rdunlock_main_loop(void)
 
 void assert_bdrv_graph_readable(void)
 {
+/* reader_count() is slow due to aio_context_list_lock lock contention */
+#ifdef CONFIG_DEBUG_GRAPH_LOCK
 assert(qemu_in_main_thread() || reader_count());
+#endif
 }
 
 void assert_bdrv_graph_writable(void)
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index d4369a3ad8..d760ceb1ad 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -22,6 +22,8 @@ meson_options_help() {
   printf "%s\n" '   QEMU'
   printf "%s\n" '  --enable-cfi Control-Flow Integrity (CFI)'
   printf "%s\n" '  --enable-cfi-debug   Verbose errors in case of CFI 
violation'
+  printf "%s\n" '  --enable-debug-graph-lock'
+  printf "%s\n" '   graph lock debugging support'
   printf "%s\n" '  --enable-debug-mutex mutex debugging support'
   printf "%s\n" '  --enable-debug-stack-usage'
   printf "%s\n" '   measure coroutine stack usage'
@@ -249,6 +251,8 @@ _meson_option_parse() {
 --datadir=*) quote_sh "-Ddatadir=$2" ;;
 --enable-dbus-display) printf "%s" -Ddbus_display=enabled ;;
 --disable-dbus-display) printf "%s" -Ddbus_display=disabled ;;
+--enable-debug-graph-lock) printf "%s" -Ddebug_graph_lock=true ;;
+--disable-debug-graph-lock) printf "%s" -Ddebug_graph_lock=false ;;
 --enable-debug-mutex) printf "%s" -Ddebug_mutex=true ;;
 --disable-debug-mutex) printf "%s" -Ddebug_mutex=false ;;
 --enable-debug-stack-usage) printf "%s" -Ddebug_stack_usage=true ;;
-- 
2.40.1




Re: [PATCH v6 2/3] qga: Add `merged` variant to GuestExecCaptureOutputMode

2023-05-01 Thread Daniel Xu
Hi Konstantin,

On Mon, Apr 3, 2023, at 8:56 AM, Konstantin Kostiuk wrote:
> Hi Daniel,
>
> I will merge this series after the 8.0 release.
>
> Best Regards,
> Konstantin Kostiuk.
>

Sorry to bug again, but 8.0 is out now right? Does this need a rebase
or is it good to go?

Thanks,
Daniel

[...]



Re: [RFC PATCH v4 01/44] target/loongarch: Add LSX data type VReg

2023-05-01 Thread Richard Henderson

On 4/25/23 08:02, Song Gao wrote:

Signed-off-by: Song Gao
---
  linux-user/loongarch64/signal.c |  4 +-
  target/loongarch/cpu.c  |  2 +-
  target/loongarch/cpu.h  | 21 -
  target/loongarch/gdbstub.c  |  4 +-
  target/loongarch/internals.h| 22 +
  target/loongarch/machine.c  | 79 ++---
  6 files changed, 119 insertions(+), 13 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [RFC PATCH v4 14/44] target/loongarch: Implement vmul/vmuh/vmulw{ev/od}

2023-05-01 Thread Richard Henderson

On 4/25/23 08:02, Song Gao wrote:

This patch includes:
- VMUL.{B/H/W/D};
- VMUH.{B/H/W/D}[U];
- VMULW{EV/OD}.{H.B/W.H/D.W/Q.D}[U];
- VMULW{EV/OD}.{H.BU.B/W.HU.H/D.WU.W/Q.DU.D}.

Signed-off-by: Song Gao
---
  target/loongarch/disas.c|  38 ++
  target/loongarch/helper.h   |  30 ++
  target/loongarch/insn_trans/trans_lsx.c.inc | 550 
  target/loongarch/insns.decode   |  38 ++
  target/loongarch/lsx_helper.c   |  76 +++
  5 files changed, 732 insertions(+)


Reviewed-by: Richard Henderson 


r~



Re: [RFC PATCH v4 30/44] target/loongarch: Implement vpcnt

2023-05-01 Thread Richard Henderson

On 4/25/23 08:02, Song Gao wrote:

This patch includes:
- VPCNT.{B/H/W/D}.

Signed-off-by: Song Gao
---
  target/loongarch/disas.c|  5 +
  target/loongarch/helper.h   |  5 +
  target/loongarch/insn_trans/trans_lsx.c.inc |  5 +
  target/loongarch/insns.decode   |  5 +
  target/loongarch/lsx_helper.c   | 18 ++
  5 files changed, 38 insertions(+)


Reviewed-by: Richard Henderson 

r~



Re: [RFC PATCH v4 34/44] target/loongarch: Implement LSX fpu fcvt instructions

2023-05-01 Thread Richard Henderson

On 4/25/23 08:02, Song Gao wrote:

This patch includes:
- VFCVT{L/H}.{S.H/D.S};
- VFCVT.{H.S/S.D};
- VFRINT[{RNE/RZ/RP/RM}].{S/D};
- VFTINT[{RNE/RZ/RP/RM}].{W.S/L.D};
- VFTINT[RZ].{WU.S/LU.D};
- VFTINT[{RNE/RZ/RP/RM}].W.D;
- VFTINT[{RNE/RZ/RP/RM}]{L/H}.L.S;
- VFFINT.{S.W/D.L}[U];
- VFFINT.S.L, VFFINT{L/H}.D.W.

Signed-off-by: Song Gao
---
  target/loongarch/disas.c|  56 +++
  target/loongarch/helper.h   |  56 +++
  target/loongarch/insn_trans/trans_lsx.c.inc |  56 +++
  target/loongarch/insns.decode   |  56 +++
  target/loongarch/lsx_helper.c   | 376 
  5 files changed, 600 insertions(+)


Reviewed-by: Richard Henderson 


r~



Re: [RFC PATCH v4 37/44] target/loongarch: Implement vbitsel vset

2023-05-01 Thread Richard Henderson

On 4/25/23 08:02, Song Gao wrote:

This patch includes:
- VBITSEL.V;
- VBITSELI.B;
- VSET{EQZ/NEZ}.V;
- VSETANYEQZ.{B/H/W/D};
- VSETALLNEZ.{B/H/W/D}.

Signed-off-by: Song Gao
---
  target/loongarch/disas.c| 20 ++
  target/loongarch/helper.h   | 11 +++
  target/loongarch/insn_trans/trans_lsx.c.inc | 74 +
  target/loongarch/insns.decode   | 17 +
  target/loongarch/lsx_helper.c   | 52 +++
  5 files changed, 174 insertions(+)


Reviewed-by: Richard Henderson 

r~



Re: [RFC PATCH v4 41/44] target/loongarch: Implement vld vst

2023-05-01 Thread Richard Henderson

On 4/25/23 08:02, Song Gao wrote:

+tcg_gen_qemu_ld_i128(val, addr, ctx->mem_idx, MO_128);


You need MO_128 | MO_TE, here and elsewhere.
This will make things correct for big-endian hosts.
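
That is, a sketch of the requested change:

tcg_gen_qemu_ld_i128(val, addr, ctx->mem_idx, MO_128 | MO_TE);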

Otherwise,
Reviewed-by: Richard Henderson 


r~



Re: [RFC PATCH v4 00/44] Add LoongArch LSX instructions

2023-05-01 Thread Richard Henderson

On 4/25/23 08:02, Song Gao wrote:

Hi,

This series adds the LoongArch LSX instructions. Since the LoongArch
Vol2 manual is not yet public, we use the 'RFC' title.

I'm not sure when the manual will be open.
After these patches are reviewed, how about merging them?

About test:
In v2 we used RISU to test the LoongArch LSX instructions.

QEMU:
 https://github.com/loongson/qemu/tree/tcg-old-abi-support-lsx
RISU:
 https://github.com/loongson/risu/tree/loongarch-suport-lsx

Build test:
make docker-test-build@fedora-i386-cross

The following patches need to be reviewed:
   0001-target-loongarch-Add-LSX-data-type-VReg.patch
   0014-target-loongarch-Implement-vmul-vmuh-vmulw-ev-od.patch
   0030-target-loongarch-Implement-vpcnt.patch
   0034-target-loongarch-Implement-LSX-fpu-fcvt-instructions.patch
   0037-target-loongarch-Implement-vbitsel-vset.patch
   0041-target-loongarch-Implement-vld-vst.patch

V4:
   - R-b and rebase;
   - Migrate the upper half lsx regs;
   - Remove tcg_gen_mulus2_*;
   - Vsetallnez use !do_match2;
   - Use tcg_gen_concat_i64_i128/tcg_gen_extr_i128_i64 to replace
 TCGV128_LOW(val)/TCGV128_High(val);


One minor nit, everything reviewed!  Congratulations.


r~



Re: [PATCH 10/20] nbd: Mark nbd_co_do_establish_connection() and callers GRAPH_RDLOCK

2023-05-01 Thread Stefan Hajnoczi
On Tue, Apr 25, 2023 at 07:31:48PM +0200, Kevin Wolf wrote:
> From: Emanuele Giuseppe Esposito 
> 
> This adds GRAPH_RDLOCK annotations to declare that callers of
> nbd_co_do_establish_connection() need to hold a reader lock for the
> graph.
> 
> Signed-off-by: Emanuele Giuseppe Esposito 
> Signed-off-by: Kevin Wolf 
> ---
>  block/coroutines.h |  5 +++--
>  block/nbd.c| 39 +--
>  2 files changed, 24 insertions(+), 20 deletions(-)

Reviewed-by: Stefan Hajnoczi 




Re: [PATCH 11/20] vhdx: Take graph lock for accessing a node's parent list

2023-05-01 Thread Stefan Hajnoczi
On Tue, Apr 25, 2023 at 07:31:49PM +0200, Kevin Wolf wrote:
> This adds GRAPH_RDLOCK annotations to declare that functions accessing
> the parent list of a node need to hold a reader lock for the graph. As
> it happens, they already do.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block/vhdx.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)

The commit message is misleading. This commit does not take the graph
lock, it declares that the caller must already hold the graph lock.

Reviewed-by: Stefan Hajnoczi 




Re: [PATCH 12/20] mirror: Take graph lock for accessing a node's parent list

2023-05-01 Thread Stefan Hajnoczi
On Tue, Apr 25, 2023 at 07:31:50PM +0200, Kevin Wolf wrote:
> This adds GRAPH_RDLOCK annotations to declare that functions accessing
> the parent list of a node need to hold a reader lock for the graph. As
> it happens, they already do.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block/mirror.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

The commit message is misleading. This commit does not take the graph
lock, it declares that the caller must already hold the graph lock.

Reviewed-by: Stefan Hajnoczi 




Re: [PATCH 13/20] block: Mark bdrv_co_get_allocated_file_size() and callers GRAPH_RDLOCK

2023-05-01 Thread Stefan Hajnoczi
On Tue, Apr 25, 2023 at 07:31:51PM +0200, Kevin Wolf wrote:
> @@ -5778,6 +5779,7 @@ int64_t coroutine_fn 
> bdrv_co_get_allocated_file_size(BlockDriverState *bs)
>  {
>  BlockDriver *drv = bs->drv;
>  IO_CODE();
> +assert_bdrv_graph_readable();

Is there a need for runtime assertions in functions already checked by
TSA?

I guess not. Otherwise runtime assertions should have been added in many
of the other functions marked GRAPH_RDLOCK in this series.




Re: [PATCH 14/20] block: Mark bdrv_co_get_info() and callers GRAPH_RDLOCK

2023-05-01 Thread Stefan Hajnoczi
On Tue, Apr 25, 2023 at 07:31:52PM +0200, Kevin Wolf wrote:
> From: Emanuele Giuseppe Esposito 
> 
> This adds GRAPH_RDLOCK annotations to declare that callers of
> bdrv_co_get_info() need to hold a reader lock for the graph.
> 
> Signed-off-by: Emanuele Giuseppe Esposito 
> Signed-off-by: Kevin Wolf 
> ---
>  include/block/block-io.h |  7 +--
>  include/block/block_int-common.h |  4 ++--
>  block.c  |  2 ++
>  block/crypto.c   |  2 +-
>  block/io.c   | 11 +--
>  block/mirror.c   |  8 ++--
>  block/raw-format.c   |  2 +-
>  7 files changed, 22 insertions(+), 14 deletions(-)

Reviewed-by: Stefan Hajnoczi 




Re: [PATCH 15/20] block: Mark bdrv_co_debug_event() GRAPH_RDLOCK

2023-05-01 Thread Stefan Hajnoczi
On Tue, Apr 25, 2023 at 07:31:53PM +0200, Kevin Wolf wrote:
> From: Emanuele Giuseppe Esposito 
> 
> This adds GRAPH_RDLOCK annotations to declare that callers of
> bdrv_co_debug_event() need to hold a reader lock for the graph.
> 
> Unfortunately we cannot use a co_wrapper_bdrv_rdlock, because the
> function is called by mixed functions that run both in coroutine and
> non-coroutine context (for example blkdebug_open).
> 
> Signed-off-by: Emanuele Giuseppe Esposito 
> Signed-off-by: Kevin Wolf 
> ---
>  include/block/block-io.h | 9 +
>  include/block/block_int-common.h | 4 ++--
>  block.c  | 2 ++
>  3 files changed, 9 insertions(+), 6 deletions(-)

Reviewed-by: Stefan Hajnoczi 




Re: [PATCH 16/20] block: Mark BlockDriver callbacks for amend job GRAPH_RDLOCK

2023-05-01 Thread Stefan Hajnoczi
On Tue, Apr 25, 2023 at 07:31:54PM +0200, Kevin Wolf wrote:
> From: Emanuele Giuseppe Esposito 
> 
> This adds GRAPH_RDLOCK annotations to declare that callers of amend
> callbacks in BlockDriver need to hold a reader lock for the graph.
> 
> Signed-off-by: Emanuele Giuseppe Esposito 
> Signed-off-by: Kevin Wolf 
> ---
>  include/block/block_int-common.h | 12 ++--
>  block/amend.c|  8 +++-
>  2 files changed, 13 insertions(+), 7 deletions(-)

Reviewed-by: Stefan Hajnoczi 




Re: [PATCH 17/20] block: Mark bdrv_query_bds_stats() and callers GRAPH_RDLOCK

2023-05-01 Thread Stefan Hajnoczi
On Tue, Apr 25, 2023 at 07:31:55PM +0200, Kevin Wolf wrote:
> This adds GRAPH_RDLOCK annotations to declare that callers of
> bdrv_query_bds_stats() need to hold a reader lock for the graph because
> it accesses the children list of a node.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block/qapi.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi 




Re: [PATCH 18/20] block: Mark bdrv_query_block_graph_info() and callers GRAPH_RDLOCK

2023-05-01 Thread Stefan Hajnoczi
On Tue, Apr 25, 2023 at 07:31:56PM +0200, Kevin Wolf wrote:
> This adds GRAPH_RDLOCK annotations to declare that callers of
> bdrv_query_block_graph_info() need to hold a reader lock for the graph
> because it accesses the children list of a node.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  include/block/qapi.h | 7 ---
>  qemu-img.c   | 2 ++
>  2 files changed, 6 insertions(+), 3 deletions(-)

Reviewed-by: Stefan Hajnoczi 




Re: [PATCH 19/20] block: Mark bdrv_recurse_can_replace() and callers GRAPH_RDLOCK

2023-05-01 Thread Stefan Hajnoczi
On Tue, Apr 25, 2023 at 07:31:57PM +0200, Kevin Wolf wrote:
> This adds GRAPH_RDLOCK annotations to declare that callers of
> bdrv_recurse_can_replace() need to hold a reader lock for the graph
> because it accesses the children list of a node.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  include/block/block-global-state.h | 5 +++--
>  include/block/block_int-common.h   | 4 ++--
>  include/block/block_int-global-state.h | 4 ++--
>  block/blkverify.c  | 5 +++--
>  block/mirror.c | 4 
>  block/quorum.c | 4 ++--
>  blockdev.c | 3 +++
>  7 files changed, 19 insertions(+), 10 deletions(-)

Reviewed-by: Stefan Hajnoczi 




Re: [PATCH 20/20] block: Mark bdrv_refresh_limits() and callers GRAPH_RDLOCK

2023-05-01 Thread Stefan Hajnoczi
On Tue, Apr 25, 2023 at 07:31:58PM +0200, Kevin Wolf wrote:
> This adds GRAPH_RDLOCK annotations to declare that callers of
> bdrv_refresh_limits() need to hold a reader lock for the graph because
> it accesses the children list of a node.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  include/block/block-global-state.h | 5 -
>  include/block/block_int-common.h   | 3 ++-
>  block.c| 9 +
>  block/io.c | 1 -
>  4 files changed, 15 insertions(+), 3 deletions(-)

Reviewed-by: Stefan Hajnoczi 




Re: [RFC PATCH v3 00/20] configure: create a python venv and ensure meson, sphinx

2023-05-01 Thread John Snow
On Mon, Apr 24, 2023, 4:02 PM John Snow  wrote:

> GitLab CI: https://gitlab.com/jsnow/qemu/-/pipelines/846869409
>(All green, except Python self-tests, see below)
>
> This patch series creates a mandatory python virtual environment
> ("venv") during configure time and uses it to ensure the availability of
> meson and sphinx.
>
> See https://www.qemu.org/2023/03/24/python/ for details. The summary is
> that the goal of this series is to ensure that the `python` used to run
> meson is the same `python` used to run Sphinx, tests, and any build-time
> python scripting we have. As it stands, meson and sphinx (and their
> extensions) *may* run in a different python environment than the one
> configured and chosen by the user at configure/build time.
>
> The effective change of this series is that QEMU will now
> unconditionally create a venv at configure-time and will ensure that
> meson (and sphinx, if docs are enabled) are available through that venv.
>
> Some important points as a pre-emptive "FAQ":
>
> - This venv is unconditionally created and lives at {build_dir}/pyvenv.
>
> - The python interpreter used by this venv is always the one identified
>   by configure. (Which in turn is always the one specified by --python
>   or $PYTHON)
>
> - *almost* all python scripts in qemu.git executed as part of the build
>   system, meson, sphinx, avocado tests, vm tests or CI are always
>   executed within this venv.
>
>   (iotests are not yet integrated; I plan to tackle this separately as a
>   follow-up in order to have a more tightly focused scope on that
>   series.)
>
> - It remains possible to build and test fully offline.
>   (In most cases, you just need meson and sphinx from your distro's repo.)
>
> - Distribution packaged 'meson' and 'sphinx' are still utilized whenever
>   possible as the highest preference.
>
> - Vendored versions of e.g. 'meson' are always preferred to PyPI
>   versions for speed, repeatability and ensuring tarball builds work
>   as-is offline.
>
>   (Sphinx will not be vendored, just like it already isn't.)
>
> - Missing dependencies, when possible, are fetched and installed
>   on-demand automatically to make developer environments "just work".
>
> - Works for Python 3.7 and up, on Fedora, OpenSuSE, Red Hat, CentOS,
>   Alpine, Debian, Ubuntu, NetBSD, OpenBSD, and hopefully everywhere
>
> - No new dependencies (...for most platforms. Debian and NetBSD get an
>   asterisk.)
>
> - The meson git submodule is unused after this series and can be removed.
>
> For reviewers, here's how the series is broken up:
>
> Patch 1 is a testing pre-req. Note that even with this patch,
> 'check-python-minreqs' and 'check-python-tox' CI jobs will both still
> fail on origin/master because this series requires 3.7+, but
> origin/master is currently still 3.6+.
>
> - python: update pylint configuration
>
> Patches 2-8 add the mkvenv script. The first patch checks in the barest
> essentials, and each subsequent patch adds a workaround or feature one
> at a time.
>
> - python: add mkvenv.py
> - mkvenv: add console script entry point generation
> - mkvenv: Add better error message for missing pyexpat module
> - mkvenv: generate console entry shims from inside the venv
> - mkvenv: work around broken pip installations on Debian 10
> - mkvenv: add nested venv workaround
> - mkvenv: add ensure subcommand
>
> Patches 9-11 modify our testing configuration to add new dependencies as
> needed.
>
> - tests/docker: add python3-venv dependency
> - tests/vm: Configure netbsd to use Python 3.10
> - tests/vm: add py310-expat to NetBSD
>
> Patch 12 changes how we package release tarballs.
>
> - scripts/make-release: download meson==0.61.5 .whl
>
> Patches 13-16 wire mkvenv into configure and tests.
>
> - configure: create a python venv unconditionally
> - configure: use 'mkvenv ensure meson' to bootstrap meson
> - configure: add --enable-pypi and --disable-pypi
> - tests: Use configure-provided pyvenv for tests
>
> Patches 17-20 delegate Sphinx bootstrapping to mkvenv. Some of these
> changes could be folded earlier in the series (like the diagnose()
> patch), but I'm keeping it separate for review for now.
>
> - configure: move --enable-docs and --disable-docs back to configure
> - mkvenv: add diagnose() method for ensure() failures
> - configure: use --diagnose option with meson ensure
> - configure: bootstrap sphinx with mkvenv
>
> That's all for now, seeya!
> --js
>
> John Snow (20):
>   python: update pylint configuration
>   python: add mkvenv.py
>   mkvenv: add console script entry point generation
>   mkvenv: Add better error message for missing pyexpat module
>   mkvenv: generate console entry shims from inside the venv
>   mkvenv: work around broken pip installations on Debian 10
>   mkvenv: add nested venv workaround
>   mkvenv: add ensure subcommand
>   tests/docker: add python3-venv dependency
>   tests/vm: Configure netbsd to use Python 3.10
>   tests/vm: add py310-expat to NetBSD
>   scripts/make-release: download meson==0.61.5 .whl
>   configure: create a python venv unconditionally
>   configure: use 'mkvenv ensure meson' to bootstrap meson
>   configure: add --enable-pypi and --disable-pypi
>   tests: Use configure-provided pyvenv for tests
>   configure: move --enable-docs and --disable-docs back to configure
>   mkvenv: add diagnose() method for ensure() failures
>   configure: use --diagnose option with meson ensure
>   configure: bootstrap sphinx with mkvenv

Re: [PATCH v3 08/19] qemu/bitops.h: Limit rotate amounts

2023-05-01 Thread Richard Henderson

On 4/28/23 15:47, Lawrence Hunter wrote:

From: Dickon Hood

Rotates have been fixed up to only allow reasonable rotate amounts
(i.e., no rotates >7 on an 8-bit value, etc.).  This fixes a problem with
RISC-V vector rotate instructions.

Signed-off-by: Dickon Hood
Reviewed-by: Richard Henderson
---
  include/qemu/bitops.h | 24 
  1 file changed, 16 insertions(+), 8 deletions(-)


Queued to tcg-next.


r~



Re: [PATCH v3 10/19] qemu/host-utils.h: Add clz and ctz functions for lower-bit integers

2023-05-01 Thread Richard Henderson

On 4/28/23 15:47, Lawrence Hunter wrote:

From: Kiran Ostrolenk

This is for use in the RISC-V vclz and vctz instructions (implemented in
a subsequent commit).

Signed-off-by: Kiran Ostrolenk
Reviewed-by: Richard Henderson
---
  include/qemu/host-utils.h | 54 +++
  1 file changed, 54 insertions(+)


Queued to tcg-next.

r~
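
As an aside, a sketch of how such a narrow-width helper can be composed
from the existing 32-bit primitive (the definitions in the series itself
may differ):

static inline int clz8(uint8_t val)
{
    /* shift into the top byte so the leading zeros line up */
    return val ? clz32((uint32_t)val << 24) : 8;
}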



Re: [PATCH v3 09/19] tcg: Add andcs and rotrs tcg gvec ops

2023-05-01 Thread Richard Henderson

On 4/28/23 15:47, Lawrence Hunter wrote:

From: Nazar Kazakov 

This commit adds helper functions and tcg operation definitions for the andcs
and rotrs instructions.

Signed-off-by: Nazar Kazakov 
---
  accel/tcg/tcg-runtime-gvec.c | 11 +++
  accel/tcg/tcg-runtime.h  |  1 +
  include/tcg/tcg-op-gvec.h|  4 
  tcg/tcg-op-gvec.c| 23 +++
  4 files changed, 39 insertions(+)


Queued to tcg-next as two patches, and with alterations:


+void tcg_gen_gvec_andcs(unsigned vece, uint32_t dofs, uint32_t aofs,
+TCGv_i64 c, uint32_t oprsz, uint32_t maxsz)
+{
+static GVecGen2s g = {
+.fni8 = tcg_gen_andc_i64,
+.fniv = tcg_gen_andc_vec,
+.fno = gen_helper_gvec_andcs,
+.prefer_i64 = TCG_TARGET_REG_BITS == 64,
+.vece = MO_64
+};
+
+tcg_gen_dup_i64(vece, c, c);
+tcg_gen_gvec_2s(dofs, aofs, oprsz, maxsz, c, &g);
+}


This needed a temporary.
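
A minimal sketch of that fix, assuming the committed form keeps the same
structure as the quoted hunk:

void tcg_gen_gvec_andcs(unsigned vece, uint32_t dofs, uint32_t aofs,
                        TCGv_i64 c, uint32_t oprsz, uint32_t maxsz)
{
    static GVecGen2s g = {
        .fni8 = tcg_gen_andc_i64,
        .fniv = tcg_gen_andc_vec,
        .fno = gen_helper_gvec_andcs,
        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
        .vece = MO_64
    };

    /* dup into a temporary instead of clobbering the caller's value */
    TCGv_i64 tmp = tcg_temp_ebb_new_i64();
    tcg_gen_dup_i64(vece, tmp, c);
    tcg_gen_gvec_2s(dofs, aofs, oprsz, maxsz, tmp, &g);
}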


+void tcg_gen_gvec_rotrs(unsigned vece, uint32_t dofs, uint32_t aofs,
+TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz)
+{
+TCGv_i32 tmp = tcg_temp_new_i32();
+tcg_gen_sub_i32(tmp, tcg_constant_i32(1 << (vece + 3)), shift);
+tcg_gen_gvec_rotls(vece, dofs, aofs, tmp, oprsz, maxsz);
+}


This needed the rotation count to be masked (32 - 0 == 32 is illegal).
Simplified as (-shift & mask).
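
A sketch of that simplification (the mask is the element width in bits
minus one, so a zero shift stays zero instead of becoming an illegal
full-width rotate):

void tcg_gen_gvec_rotrs(unsigned vece, uint32_t dofs, uint32_t aofs,
                        TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz)
{
    TCGv_i32 tmp = tcg_temp_ebb_new_i32();

    /* rotr by S == rotl by (-S & (bits - 1)) */
    tcg_gen_neg_i32(tmp, shift);
    tcg_gen_andi_i32(tmp, tmp, (8 << vece) - 1);
    tcg_gen_gvec_rotls(vece, dofs, aofs, tmp, oprsz, maxsz);
}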


r~




Re: [PATCH 00/10] tracing: remove dynamic vcpu state

2023-05-01 Thread Stefan Hajnoczi
On Thu, Apr 20, 2023 at 03:59:59PM +0100, Alex Bennée wrote:
> The referenced dynamic vcpu tracing support was removed when the
> original TCG trace points were removed. However there was still a
> legacy of dynamic trace state to track this in cpu.h and extra hash
> variables to track TBs. While the removed vcpu tracepoints are not in
> generated code (or helpers) they still bring in a bunch of machinery
> to manage the state so I've pulled them out. We could just replace
> them with static trace points which dump vcpu->index as one of their
> arguments because they don't happen that often.
> 
> While most of the changes are excising bits of the tracing code I've
> also cleaned up the xxhash function use and simplified the core
> function to qemu_xxhash6.
> 
> Please review.
> 
> Alex Bennée (10):
>   *-user: remove the guest_user_syscall tracepoints
>   trace-events: remove the remaining vcpu trace events
>   trace: remove vcpu_id from the TraceEvent structure
>   scripts/qapi: document the tool that generated the file
>   qapi: make the vcpu parameters deprecated for 8.1
>   trace: remove code that depends on setting vcpu
>   trace: remove control-vcpu.h
>   tcg: remove the final vestiges of dstate
>   hw/9pfs: use qemu_xxhash4
>   xxhash: remove qemu_xxhash7
> 
>  qapi/trace.json   |  22 +++
>  accel/tcg/tb-hash.h   |   6 +-
>  include/exec/exec-all.h   |   3 -
>  include/hw/core/cpu.h |   5 --
>  include/qemu/xxhash.h |  17 ++
>  include/user/syscall-trace.h  |   4 --
>  trace/control-internal.h  |  10 ---
>  trace/control-vcpu.h  |  63 ---
>  trace/control.h   |  48 ---
>  trace/event-internal.h|   2 -
>  accel/tcg/cpu-exec.c  |   7 +--
>  accel/tcg/tb-maint.c  |   5 +-
>  accel/tcg/translate-all.c |   6 --
>  bsd-user/freebsd/os-syscall.c |   2 -
>  hw/9pfs/9p.c  |   4 +-
>  hw/core/cpu-common.c  |   4 --
>  stubs/trace-control.c |  13 
>  trace/control-target.c| 111 +++---
>  trace/control.c   |  28 -
>  trace/qmp.c   |  76 +++
>  trace/trace-hmp-cmds.c|  17 +-
>  scripts/qapi/gen.py   |   4 +-
>  scripts/tracetool/format/c.py |   6 --
>  scripts/tracetool/format/h.py |  16 +
>  trace-events  |  50 ---
>  25 files changed, 43 insertions(+), 486 deletions(-)
>  delete mode 100644 trace/control-vcpu.h

Nice job! I'm happy to merge it but will wait for discussion to finish.

Reviewed-by: Stefan Hajnoczi 




Re: [PATCH] vhost-user: send SET_STATUS 0 after GET_VRING_BASE

2023-05-01 Thread Stefan Hajnoczi
On Fri, Apr 21, 2023 at 01:30:48PM +0800, Yajun Wu wrote:
> 
> On 4/20/2023 9:07 PM, Stefan Hajnoczi wrote:
> > 
> > Setting the VIRTIO Device Status Field to 0 resets the device. The
> > device's state is lost, including the vring configuration.
> > 
> > vhost-user.c currently sends SET_STATUS 0 before GET_VRING_BASE. This
> > risks confusion about the lifetime of the vhost-user state (e.g. vring
> > last_avail_idx) across VIRTIO device reset.
> > 
> > Eugenio Pérez  adjusted the order for vhost-vdpa.c
> > in commit c3716f260bff ("vdpa: move vhost reset after get vring base")
> > and in that commit description suggested doing the same for vhost-user
> > in the future.
> > 
> > Go ahead and adjust vhost-user.c now. I ran various online code searches
> > to identify vhost-user backends implementing SET_STATUS. It seems only
> > DPDK implements SET_STATUS and Yajun Wu  has
> > confirmed that it is safe to make this change.
> > 
> > Cc: Michael S. Tsirkin 
> > Cc: Cindy Lu 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >   hw/virtio/vhost-user.c | 13 -
> >   1 file changed, 12 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index e5285df4ba..2d40b1b3e7 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -2677,10 +2677,20 @@ static int vhost_user_dev_start(struct vhost_dev 
> > *dev, bool started)
> > VIRTIO_CONFIG_S_DRIVER |
> > VIRTIO_CONFIG_S_DRIVER_OK);
> >   } else {
> > -return vhost_user_set_status(dev, 0);
> > +return 0;
> >   }
> >   }
> > 
> > +static void vhost_user_reset_status(struct vhost_dev *dev)
> > +{
> > +/* Set device status only for last queue pair */
> > +if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
> > +return;
> > +}
> > +
> > +vhost_user_set_status(dev, 0);
> > +}
> > +
> >   const VhostOps user_ops = {
> >   .backend_type = VHOST_BACKEND_TYPE_USER,
> >   .vhost_backend_init = vhost_user_backend_init,
> > @@ -2716,4 +2726,5 @@ const VhostOps user_ops = {
> >   .vhost_get_inflight_fd = vhost_user_get_inflight_fd,
> >   .vhost_set_inflight_fd = vhost_user_set_inflight_fd,
> >   .vhost_dev_start = vhost_user_dev_start,
> > +.vhost_reset_status = vhost_user_reset_status,
> >   };
> > --
> > 2.39.2
> > 
> Thank you for this fix.
> 
> Can you add protocol feature bit check, just like we do in
> vhost_user_dev_start?
> 
>     if (!virtio_has_feature(dev->protocol_features,
>     VHOST_USER_PROTOCOL_F_STATUS)) {
>     return 0;
>     }

Sure, will fix in v2.

Stefan
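
For reference, a minimal sketch of what the requested v2 shape could
look like, reusing the names from the patch above (the actual v2 may
differ):

static void vhost_user_reset_status(struct vhost_dev *dev)
{
    /* Set device status only for last queue pair */
    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
        return;
    }

    if (virtio_has_feature(dev->protocol_features,
                           VHOST_USER_PROTOCOL_F_STATUS)) {
        vhost_user_set_status(dev, 0);
    }
}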




[PATCH v3] Hexagon (target/hexagon) Additional instructions handled by idef-parser

2023-05-01 Thread Taylor Simpson
 Changes in v3 
Fix bugs exposed by dpmpyss_rnd_s0 instruction
Set correct size/signedness for constants
Test cases added to tests/tcg/hexagon/misc.c

 Changes in v2 
Fix bug in imm_print identified in clang build

Currently, idef-parser skips all floating point instructions.  However,
there are some floating point instructions that can be handled.

The following instructions are now parsed
F2_sfimm_p
F2_sfimm_n
F2_dfimm_p
F2_dfimm_n
F2_dfmpyll
F2_dfmpylh

To make these instructions work, we fix some bugs in parser-helpers.c
gen_rvalue_extend
gen_cast_op
imm_print
lexer properly sets size/signedness of constants

Test cases added to tests/tcg/hexagon/fpstuff.c

Signed-off-by: Taylor Simpson 
---
 target/hexagon/idef-parser/parser-helpers.h |  2 +-
 target/hexagon/idef-parser/parser-helpers.c | 41 +++-
 tests/tcg/hexagon/fpstuff.c | 54 +
 tests/tcg/hexagon/misc.c| 35 +
 target/hexagon/gen_idef_parser_funcs.py | 10 +++-
 target/hexagon/idef-parser/idef-parser.lex  | 38 +--
 target/hexagon/idef-parser/idef-parser.y|  2 -
 7 files changed, 162 insertions(+), 20 deletions(-)

diff --git a/target/hexagon/idef-parser/parser-helpers.h 
b/target/hexagon/idef-parser/parser-helpers.h
index 1239d23a6a..7c58087169 100644
--- a/target/hexagon/idef-parser/parser-helpers.h
+++ b/target/hexagon/idef-parser/parser-helpers.h
@@ -80,7 +80,7 @@ void reg_compose(Context *c, YYLTYPE *locp, HexReg *reg, char 
reg_id[5]);
 
 void reg_print(Context *c, YYLTYPE *locp, HexReg *reg);
 
-void imm_print(Context *c, YYLTYPE *locp, HexImm *imm);
+void imm_print(Context *c, YYLTYPE *locp, HexValue *rvalue);
 
 void var_print(Context *c, YYLTYPE *locp, HexVar *var);
 
diff --git a/target/hexagon/idef-parser/parser-helpers.c 
b/target/hexagon/idef-parser/parser-helpers.c
index 86511efb62..0ad917f591 100644
--- a/target/hexagon/idef-parser/parser-helpers.c
+++ b/target/hexagon/idef-parser/parser-helpers.c
@@ -167,8 +167,9 @@ void reg_print(Context *c, YYLTYPE *locp, HexReg *reg)
 EMIT(c, "hex_gpr[%u]", reg->id);
 }
 
-void imm_print(Context *c, YYLTYPE *locp, HexImm *imm)
+void imm_print(Context *c, YYLTYPE *locp, HexValue *rvalue)
 {
+HexImm *imm = &rvalue->imm;
 switch (imm->type) {
 case I:
 EMIT(c, "i");
@@ -177,7 +178,21 @@ void imm_print(Context *c, YYLTYPE *locp, HexImm *imm)
 EMIT(c, "%ciV", imm->id);
 break;
 case VALUE:
-EMIT(c, "((int64_t) %" PRIu64 "ULL)", (int64_t) imm->value);
+if (rvalue->bit_width == 32) {
+if (rvalue->signedness == UNSIGNED) {
+EMIT(c, "((uint32_t) 0x%" PRIx32 ")", (uint32_t) imm->value);
+}  else {
+EMIT(c, "((int32_t) 0x%" PRIx32 ")", (int32_t) imm->value);
+}
+} else if (rvalue->bit_width == 64) {
+if (rvalue->signedness == UNSIGNED) {
+EMIT(c, "((uint64_t) 0x%" PRIx64 "ULL)", (uint64_t) 
imm->value);
+} else {
+EMIT(c, "((int64_t) 0x%" PRIx64 "LL)", (int64_t) imm->value);
+}
+} else {
+g_assert_not_reached();
+}
 break;
 case QEMU_TMP:
 EMIT(c, "qemu_tmp_%" PRIu64, imm->index);
@@ -213,7 +228,7 @@ void rvalue_print(Context *c, YYLTYPE *locp, void *pointer)
   tmp_print(c, locp, &rvalue->tmp);
   break;
   case IMMEDIATE:
-  imm_print(c, locp, &rvalue->imm);
+  imm_print(c, locp, rvalue);
   break;
   case VARID:
   var_print(c, locp, &rvalue->var);
@@ -386,13 +401,10 @@ HexValue gen_rvalue_extend(Context *c, YYLTYPE *locp, 
HexValue *rvalue)
 
 if (rvalue->type == IMMEDIATE) {
 HexValue res = gen_imm_qemu_tmp(c, locp, 64, rvalue->signedness);
-bool is_unsigned = (rvalue->signedness == UNSIGNED);
-const char *sign_suffix = is_unsigned ? "u" : "";
 gen_c_int_type(c, locp, 64, rvalue->signedness);
-OUT(c, locp, " ", &res, " = ");
-OUT(c, locp, "(", sign_suffix, "int64_t) ");
-OUT(c, locp, "(", sign_suffix, "int32_t) ");
-OUT(c, locp, rvalue, ";\n");
+OUT(c, locp, " ", &res, " = (");
+gen_c_int_type(c, locp, 64, rvalue->signedness);
+OUT(c, locp, ")", rvalue, ";\n");
 return res;
 } else {
 HexValue res = gen_tmp(c, locp, 64, rvalue->signedness);
@@ -961,9 +973,16 @@ HexValue gen_cast_op(Context *c,
 {
 assert_signedness(c, locp, src->signedness);
 if (src->bit_width == target_width) {
-return *src;
-} else if (src->type == IMMEDIATE) {
 HexValue res = *src;
+res.signedness = signedness;
+return res;
+} else if (src->type == IMMEDIATE) {
+HexValue res;
+if (src->bit_width < target_width) {
+res = gen_rvalue_extend(c, locp, src);
+} else {
+res = *src;
+}
 re

[PATCH] qemu/int128: Re-shuffle Int128Alias members

2023-05-01 Thread Richard Henderson
Clang 14, with --enable-tcg-interpreter, errors with

include/qemu/int128.h:487:16: error: alignment of field 'i' (128 bits)
  does not match the alignment of the first field in transparent union;
  transparent_union attribute ignored [-Werror,-Wignored-attributes]
__int128_t i;
   ^
include/qemu/int128.h:486:12: note: alignment of first field is 64 bits
Int128 s;
   ^
1 error generated.

By placing the __uint128_t member first, this is avoided.
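
For illustration, a toy reduction of the diagnostic, assuming Int128 is
the two-uint64_t struct fallback used by the TCI build (8-byte aligned,
while __int128_t is 16-byte aligned); clang ignores transparent_union
when a later member is more strictly aligned than the first:

#include <stdint.h>

typedef struct { uint64_t lo; int64_t hi; } Int128Toy;  /* align 8 */

typedef union {
    Int128Toy s;   /* first member: 64-bit alignment */
    __int128_t i;  /* 128-bit alignment -> attribute ignored by clang 14 */
} BadAlias __attribute__((transparent_union));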

Signed-off-by: Richard Henderson 
---
 include/qemu/int128.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index f62a46b48c..9e46cfaefc 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -483,9 +483,9 @@ static inline void bswap128s(Int128 *s)
  */
 #ifdef CONFIG_INT128
 typedef union {
-Int128 s;
-__int128_t i;
 __uint128_t u;
+__int128_t i;
+Int128 s;
 } Int128Alias __attribute__((transparent_union));
 #else
 typedef Int128 Int128Alias;
-- 
2.34.1




[PATCH] migration/xbzrle: Use __attribute__((target)) for avx512

2023-05-01 Thread Richard Henderson
Use the attribute, which is supported by clang, instead of
the #pragma, which is not supported and, for some reason,
also not detected by the meson probe, so we fail with -Werror.

Signed-off-by: Richard Henderson 
---
 migration/xbzrle.c | 9 -
 meson.build| 5 +
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/migration/xbzrle.c b/migration/xbzrle.c
index c6f8b20917..258e4959c9 100644
--- a/migration/xbzrle.c
+++ b/migration/xbzrle.c
@@ -177,11 +177,11 @@ int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t 
*dst, int dlen)
 }
 
 #if defined(CONFIG_AVX512BW_OPT)
-#pragma GCC push_options
-#pragma GCC target("avx512bw")
 #include 
-int xbzrle_encode_buffer_avx512(uint8_t *old_buf, uint8_t *new_buf, int slen,
- uint8_t *dst, int dlen)
+
+int __attribute__((target("avx512bw")))
+xbzrle_encode_buffer_avx512(uint8_t *old_buf, uint8_t *new_buf, int slen,
+uint8_t *dst, int dlen)
 {
 uint32_t zrun_len = 0, nzrun_len = 0;
 int d = 0, i = 0, num = 0;
@@ -296,5 +296,4 @@ int xbzrle_encode_buffer_avx512(uint8_t *old_buf, uint8_t 
*new_buf, int slen,
 }
 return d;
 }
-#pragma GCC pop_options
 #endif
diff --git a/meson.build b/meson.build
index f71653d0c8..4bbdbcef37 100644
--- a/meson.build
+++ b/meson.build
@@ -2386,12 +2386,9 @@ config_host_data.set('CONFIG_AVX512F_OPT', 
get_option('avx512f') \
 config_host_data.set('CONFIG_AVX512BW_OPT', get_option('avx512bw') \
   .require(have_cpuid_h, error_message: 'cpuid.h not available, cannot enable 
AVX512BW') \
   .require(cc.links('''
-#pragma GCC push_options
-#pragma GCC target("avx512bw")
 #include 
 #include 
-static int bar(void *a) {
-
+static int __attribute__((target("avx512bw"))) bar(void *a) {
   __m512i *x = a;
   __m512i res= _mm512_abs_epi8(*x);
   return res[1];
-- 
2.34.1




Re: [PATCH v2 1/4] ppc: spapr: cleanup cr get/store in [h_enter|spapr_exit]_nested with helpers.

2023-05-01 Thread Nicholas Piggin
On Tue Apr 25, 2023 at 12:47 AM AEST, Harsh Prateek Bora wrote:
> The bits in cr reg are grouped into eight 4-bit fields represented
> by env->crf[8] and the related calculations should be abstracted to
> keep the calling routines simpler to read. This is a step towards
> cleaning up the [h_enter|spapr_exit]_nested calls for better readability.
>
> Signed-off-by: Harsh Prateek Bora 
> Reviewed-by: Fabiano Rosas 
> ---
>  hw/ppc/spapr_hcall.c | 18 ++

Could you either convert all callers, or do implementation and
conversion as separate patches. Preference for former if you can
be bothered.

save_user_regs(), restore_user_regs(), gdb read/write register * 2,
kvm_arch_get/put_registers, monitor_get_ccr, at a quick glance.
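
For example, the monitor accessor could collapse to something like this
(a sketch, assuming the usual MonitorDef getter signature in
target/ppc/monitor.c):

static target_long monitor_get_ccr(Monitor *mon, const struct MonitorDef *md,
                                    int val)
{
    CPUArchState *env = mon_get_cpu_env(mon);
    return ppc_get_cr(env);
}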

>  target/ppc/cpu.c | 17 +
>  target/ppc/cpu.h |  2 ++
>  3 files changed, 21 insertions(+), 16 deletions(-)
>
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index ec4def62f8..124cee5e53 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c

[snip]

> diff --git a/target/ppc/cpu.c b/target/ppc/cpu.c
> index 1a97b41c6b..3b444e58b5 100644
> --- a/target/ppc/cpu.c
> +++ b/target/ppc/cpu.c
> @@ -67,6 +67,23 @@ uint32_t ppc_get_vscr(CPUPPCState *env)
>  return env->vscr | (sat << VSCR_SAT);
>  }
>  
> +void ppc_store_cr(CPUPPCState *env, uint64_t cr)

Set is the normal counterpart to get. Or load and store, but
I think set and get is probably better.

Good refactoring though, it shouldn't be open-coded everywhere.

Thanks,
Nick

> +{
> +for (int i = 7; i >= 0; i--) {
> +env->crf[i] = cr & 15;
> +cr >>= 4;
> +}
> +}
> +
> +uint64_t ppc_get_cr(CPUPPCState *env)
> +{
> +uint64_t cr = 0;
> +for (int i = 0; i < 8; i++) {
> +cr |= (env->crf[i] & 15) << (4 * (7 - i));
> +}
> +return cr;
> +}
> +
>  /* GDBstub can read and write MSR... */
>  void ppc_store_msr(CPUPPCState *env, target_ulong value)
>  {
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index 557d736dab..b4c21459f1 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -2773,6 +2773,8 @@ void dump_mmu(CPUPPCState *env);
>  void ppc_maybe_bswap_register(CPUPPCState *env, uint8_t *mem_buf, int len);
>  void ppc_store_vscr(CPUPPCState *env, uint32_t vscr);
>  uint32_t ppc_get_vscr(CPUPPCState *env);
> +void ppc_store_cr(CPUPPCState *env, uint64_t cr);
> +uint64_t ppc_get_cr(CPUPPCState *env);
>  
>  
> /*/
>  /* Power management enable checks
> */
> -- 
> 2.31.1




Re: [PATCH v2 2/4] ppc: spapr: cleanup h_enter_nested() with helper routines.

2023-05-01 Thread Nicholas Piggin
On Tue Apr 25, 2023 at 12:47 AM AEST, Harsh Prateek Bora wrote:
> h_enter_nested() currently does a lot of register-specific operations
> which should be abstracted logically to simplify the code for better
> readability. This patch breaks down relevant blocks into respective
> helper routines to make use of them for better readability/maintenance.
>
> Signed-off-by: Harsh Prateek Bora 
> ---
>  hw/ppc/spapr_hcall.c | 117 ---
>  1 file changed, 78 insertions(+), 39 deletions(-)
>
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 124cee5e53..f24d4b368e 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1544,6 +1544,81 @@ static target_ulong h_copy_tofrom_guest(PowerPCCPU 
> *cpu,
>  return H_FUNCTION;
>  }
>  
> +static void restore_hdec_from_hvstate(CPUPPCState *dst,
> +  struct kvmppc_hv_guest_state *hv_state,
> +  target_ulong now)
> +{
> +target_ulong hdec;
> +
> +assert(hv_state);
> +hdec = hv_state->hdec_expiry - now;
> +cpu_ppc_hdecr_init(dst);
> +cpu_ppc_store_hdecr(dst, hdec);
> +}
> +
> +static void restore_lpcr_from_hvstate(PowerPCCPU *cpu,
> +  struct kvmppc_hv_guest_state *hv_state)
> +{
> +PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
> +CPUPPCState *dst = &cpu->env;
> +target_ulong lpcr, lpcr_mask;
> +
> +assert(hv_state);
> +lpcr_mask = LPCR_DPFD | LPCR_ILE | LPCR_AIL | LPCR_LD | LPCR_MER;
> +lpcr = (dst->spr[SPR_LPCR] & ~lpcr_mask) | (hv_state->lpcr & lpcr_mask);
> +lpcr |= LPCR_HR | LPCR_UPRT | LPCR_GTSE | LPCR_HVICE | LPCR_HDICE;
> +lpcr &= ~LPCR_LPES0;
> +dst->spr[SPR_LPCR] = lpcr & pcc->lpcr_mask;
> +}
> +
> +static void restore_env_from_ptregs(CPUPPCState *env,
> +struct kvmppc_pt_regs *regs)
> +{
> +assert(env);
> +assert(regs);
> +assert(sizeof(env->gpr) == sizeof(regs->gpr));
> +memcpy(env->gpr, regs->gpr, sizeof(env->gpr));
> +env->nip = regs->nip;
> +env->msr = regs->msr;
> +env->lr = regs->link;
> +env->ctr = regs->ctr;
> +cpu_write_xer(env, regs->xer);
> +ppc_store_cr(env, regs->ccr);
> +}
> +
> +static void restore_env_from_hvstate(CPUPPCState *env,
> + struct kvmppc_hv_guest_state *hv_state)
> +{
> +assert(env);
> +assert(hv_state);
> +env->spr[SPR_HFSCR] = hv_state->hfscr;
> +/* TCG does not implement DAWR*, CIABR, PURR, SPURR, IC, VTB, HEIR SPRs*/
> +env->cfar = hv_state->cfar;
> +env->spr[SPR_PCR] = hv_state->pcr;
> +env->spr[SPR_DPDES] = hv_state->dpdes;
> +env->spr[SPR_SRR0] = hv_state->srr0;
> +env->spr[SPR_SRR1] = hv_state->srr1;
> +env->spr[SPR_SPRG0] = hv_state->sprg[0];
> +env->spr[SPR_SPRG1] = hv_state->sprg[1];
> +env->spr[SPR_SPRG2] = hv_state->sprg[2];
> +env->spr[SPR_SPRG3] = hv_state->sprg[3];
> +env->spr[SPR_BOOKS_PID] = hv_state->pidr;
> +env->spr[SPR_PPR] = hv_state->ppr;
> +}
> +
> +static inline void restore_l2_env(PowerPCCPU *cpu,
> +   struct kvmppc_hv_guest_state *hv_state,
> +   struct kvmppc_pt_regs *regs,
> +   target_ulong now)
> +{
> +CPUPPCState *env = &cpu->env;
> +
> +restore_env_from_ptregs(env, regs);
> +restore_env_from_hvstate(env, hv_state);
> +restore_lpcr_from_hvstate(cpu, hv_state);
> +restore_hdec_from_hvstate(env, hv_state, now);
> +}
> +
>  /*
>   * When this handler returns, the environment is switched to the L2 guest
>   * and TCG begins running that. spapr_exit_nested() performs the switch from
> @@ -1554,14 +1629,12 @@ static target_ulong h_enter_nested(PowerPCCPU *cpu,
> target_ulong opcode,
> target_ulong *args)
>  {
> -PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
>  CPUState *cs = CPU(cpu);
>  CPUPPCState *env = &cpu->env;
>  SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
>  target_ulong hv_ptr = args[0];
>  target_ulong regs_ptr = args[1];
> -target_ulong hdec, now = cpu_ppc_load_tbl(env);
> -target_ulong lpcr, lpcr_mask;
> +target_ulong now = cpu_ppc_load_tbl(env);
>  struct kvmppc_hv_guest_state *hvstate;
>  struct kvmppc_hv_guest_state hv_state;
>  struct kvmppc_pt_regs *regs;
> @@ -1607,49 +1680,15 @@ static target_ulong h_enter_nested(PowerPCCPU *cpu,
>  return H_P2;
>  }
>  
> -len = sizeof(env->gpr);
> -assert(len == sizeof(regs->gpr));
> -memcpy(env->gpr, regs->gpr, len);
> -
> -env->lr = regs->link;
> -env->ctr = regs->ctr;
> -cpu_write_xer(env, regs->xer);
> -ppc_store_cr(env, regs->ccr);
> -
> -env->msr = regs->msr;
> -env->nip = regs->nip;
> +/* restore L2 env from hv_state and ptregs */
> +restore_l2_env(cpu, &hv_stat

Re: [PATCH v2 1/4] ppc: spapr: cleanup cr get/store in [h_enter|spapr_exit]_nested with helpers.

2023-05-01 Thread Harsh Prateek Bora




On 5/2/23 10:07, Nicholas Piggin wrote:

On Tue Apr 25, 2023 at 12:47 AM AEST, Harsh Prateek Bora wrote:

The bits in cr reg are grouped into eight 4-bit fields represented
by env->crf[8] and the related calculations should be abstracted to
keep the calling routines simpler to read. This is a step towards
cleaning up the [h_enter|spapr_exit]_nested calls for better readability.

Signed-off-by: Harsh Prateek Bora 
Reviewed-by: Fabiano Rosas 
---
  hw/ppc/spapr_hcall.c | 18 ++


Could you either convert all callers, or do implementation and
conversion as separate patches. Preference for former if you can
be bothered.

save_user_regs(), restore_user_regs(), gdb read/write register * 2,
kvm_arch_get/put_registers, monitor_get_ccr, at a quick glance.


Sure, I can include other consumers as well in the patches.
I usually prefer separate patches for implementation/conversion but 
since the implementation is a small change, I hope either approach is fine.





  target/ppc/cpu.c | 17 +
  target/ppc/cpu.h |  2 ++
  3 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index ec4def62f8..124cee5e53 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c


[snip]


diff --git a/target/ppc/cpu.c b/target/ppc/cpu.c
index 1a97b41c6b..3b444e58b5 100644
--- a/target/ppc/cpu.c
+++ b/target/ppc/cpu.c
@@ -67,6 +67,23 @@ uint32_t ppc_get_vscr(CPUPPCState *env)
  return env->vscr | (sat << VSCR_SAT);
  }
  
+void ppc_store_cr(CPUPPCState *env, uint64_t cr)


Set is the normal counterpart to get. Or load and store, but
I think set and get is probably better.


Sure, make sense.


Good refactoring though, it shouldn't be open-coded everywhere.


Thanks,
Harsh


Thanks,
Nick


+{
+for (int i = 7; i >= 0; i--) {
+env->crf[i] = cr & 15;
+cr >>= 4;
+}
+}
+
+uint64_t ppc_get_cr(CPUPPCState *env)
+{
+uint64_t cr = 0;
+for (int i = 0; i < 8; i++) {
+cr |= (env->crf[i] & 15) << (4 * (7 - i));
+}
+return cr;
+}
+
  /* GDBstub can read and write MSR... */
  void ppc_store_msr(CPUPPCState *env, target_ulong value)
  {
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 557d736dab..b4c21459f1 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -2773,6 +2773,8 @@ void dump_mmu(CPUPPCState *env);
  void ppc_maybe_bswap_register(CPUPPCState *env, uint8_t *mem_buf, int len);
  void ppc_store_vscr(CPUPPCState *env, uint32_t vscr);
  uint32_t ppc_get_vscr(CPUPPCState *env);
+void ppc_store_cr(CPUPPCState *env, uint64_t cr);
+uint64_t ppc_get_cr(CPUPPCState *env);
  
  /*/

  /* Power management enable checks
*/
--
2.31.1






Re: [PATCH v2 3/4] ppc: spapr: cleanup spapr_exit_nested() with helper routines.

2023-05-01 Thread Nicholas Piggin
On Tue Apr 25, 2023 at 12:47 AM AEST, Harsh Prateek Bora wrote:
> Currently, spapr_exit_nested() does a lot of register state
> restoring from ptregs/hvstate after mapping each of those, before
> restoring the L1 host state. This patch breaks down those set of ops
> to respective helper routines for better code readability/maintenance.
>
> Signed-off-by: Harsh Prateek Bora 
> ---
>  hw/ppc/spapr_hcall.c | 120 ++-
>  1 file changed, 72 insertions(+), 48 deletions(-)
>
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index f24d4b368e..e69634bc22 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1719,45 +1719,14 @@ static target_ulong h_enter_nested(PowerPCCPU *cpu,
>  return env->gpr[3];
>  }
>  
> -void spapr_exit_nested(PowerPCCPU *cpu, int excp)
> +static void restore_hvstate_from_env(struct kvmppc_hv_guest_state *hvstate,
> + CPUPPCState *env, int excp)
>  {
> -CPUState *cs = CPU(cpu);
> -CPUPPCState *env = &cpu->env;
> -SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
> -target_ulong r3_return = env->excp_vectors[excp]; /* hcall return value 
> */
> -target_ulong hv_ptr = spapr_cpu->nested_host_state->gpr[4];
> -target_ulong regs_ptr = spapr_cpu->nested_host_state->gpr[5];
> -struct kvmppc_hv_guest_state *hvstate;
> -struct kvmppc_pt_regs *regs;
> -hwaddr len;
> -
> -assert(spapr_cpu->in_nested);
> -
> -cpu_ppc_hdecr_exit(env);
> -
> -len = sizeof(*hvstate);
> -hvstate = address_space_map(CPU(cpu)->as, hv_ptr, &len, true,
> -MEMTXATTRS_UNSPECIFIED);
> -if (len != sizeof(*hvstate)) {
> -address_space_unmap(CPU(cpu)->as, hvstate, len, 0, true);
> -r3_return = H_PARAMETER;
> -goto out_restore_l1;
> -}
> -
>  hvstate->cfar = env->cfar;
>  hvstate->lpcr = env->spr[SPR_LPCR];
>  hvstate->pcr = env->spr[SPR_PCR];
>  hvstate->dpdes = env->spr[SPR_DPDES];
>  hvstate->hfscr = env->spr[SPR_HFSCR];
> -
> -if (excp == POWERPC_EXCP_HDSI) {
> -hvstate->hdar = env->spr[SPR_HDAR];
> -hvstate->hdsisr = env->spr[SPR_HDSISR];
> -hvstate->asdr = env->spr[SPR_ASDR];
> -} else if (excp == POWERPC_EXCP_HISI) {
> -hvstate->asdr = env->spr[SPR_ASDR];
> -}
> -
>  /* HEIR should be implemented for HV mode and saved here. */
>  hvstate->srr0 = env->spr[SPR_SRR0];
>  hvstate->srr1 = env->spr[SPR_SRR1];
> @@ -1768,27 +1737,43 @@ void spapr_exit_nested(PowerPCCPU *cpu, int excp)
>  hvstate->pidr = env->spr[SPR_BOOKS_PID];
>  hvstate->ppr = env->spr[SPR_PPR];
>  
> -/* Is it okay to specify write length larger than actual data written? */
> -address_space_unmap(CPU(cpu)->as, hvstate, len, len, true);
> +if (excp == POWERPC_EXCP_HDSI) {
> +hvstate->hdar = env->spr[SPR_HDAR];
> +hvstate->hdsisr = env->spr[SPR_HDSISR];
> +hvstate->asdr = env->spr[SPR_ASDR];
> +} else if (excp == POWERPC_EXCP_HISI) {
> +hvstate->asdr = env->spr[SPR_ASDR];
> +}
> +}
>  
> -len = sizeof(*regs);
> -regs = address_space_map(CPU(cpu)->as, regs_ptr, &len, true,
> +static int map_and_restore_l2_hvstate(PowerPCCPU *cpu, int excp, 
> target_ulong *r3)
> +{
> +CPUPPCState *env = &cpu->env;
> +SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
> +target_ulong hv_ptr = spapr_cpu->nested_host_state->gpr[4];
> +struct kvmppc_hv_guest_state *hvstate;
> +hwaddr len = sizeof(*hvstate);
> +
> +hvstate = address_space_map(CPU(cpu)->as, hv_ptr, &len, true,
>  MEMTXATTRS_UNSPECIFIED);
> -if (!regs || len != sizeof(*regs)) {
> -address_space_unmap(CPU(cpu)->as, regs, len, 0, true);
> -r3_return = H_P2;
> -goto out_restore_l1;
> +if (len != sizeof(*hvstate)) {
> +address_space_unmap(CPU(cpu)->as, hvstate, len, 0, true);
> +*r3 = H_PARAMETER;
> +return -1;
>  }
> +restore_hvstate_from_env(hvstate, env, excp);
> +/* Is it okay to specify write length larger than actual data written? */
> +address_space_unmap(CPU(cpu)->as, hvstate, len, len, true);
> +return 0;
> +}
>  
> +static void restore_ptregs_from_env(struct kvmppc_pt_regs *regs,
> +CPUPPCState *env, int excp)
> +{
> +hwaddr len;
>  len = sizeof(env->gpr);
>  assert(len == sizeof(regs->gpr));
>  memcpy(regs->gpr, env->gpr, len);
> -
> -regs->link = env->lr;
> -regs->ctr = env->ctr;
> -regs->xer = cpu_read_xer(env);
> -regs->ccr = ppc_get_cr(env);
> -
>  if (excp == POWERPC_EXCP_MCHECK ||
>  excp == POWERPC_EXCP_RESET ||
>  excp == POWERPC_EXCP_SYSCALL) {
> @@ -1798,11 +1783,50 @@ void spapr_exit_nested(PowerPCCPU *cpu, int excp)
>  regs->nip = env->spr[SPR_HSRR0];
>  regs->msr = env->spr[SPR_HSRR1] & env->msr_mask;
>  }
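
For reference, the map/validate/unmap idiom that these helpers follow can
be shown in isolation. This is a sketch only: guest_map_struct() is an
illustrative name, not an existing QEMU helper, while address_space_map()
and address_space_unmap() are used exactly as in the quoted patch.

/* Map a fixed-size guest structure, fail if the mapping came back
 * short, and report zero bytes accessed on the error path. */
static void *guest_map_struct(AddressSpace *as, hwaddr gpa, hwaddr size,
                              bool is_write)
{
    hwaddr len = size;
    void *p = address_space_map(as, gpa, &len, is_write,
                                MEMTXATTRS_UNSPECIFIED);

    if (!p || len != size) {
        /* Partial or failed mapping is useless here: access_len = 0. */
        address_space_unmap(as, p, len, 0, is_write);
        return NULL;
    }
    return p;
}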

Re: [PATCH] tests/9p: fix potential leak in v9fs_rreaddir()

2023-05-01 Thread Greg Kurz
On Sat, 29 Apr 2023 15:20:12 +0200
Christian Schoenebeck  wrote:

> On Saturday, April 29, 2023 2:04:30 PM CEST Greg Kurz wrote:
> > Hi Christian !
> 
> Hi there, it's been a while! :)
> 
> > On Sat, 29 Apr 2023 11:25:33 +0200
> > Christian Schoenebeck  wrote:
> > 
> > > Free allocated directory entries in v9fs_rreaddir() if argument
> > > `entries` was passed as NULL, to avoid a memory leak. It is
> > > explicitly allowed by design for `entries` to be NULL. [1]
> > > 
> > > [1] https://lore.kernel.org/all/1690923.g4PEXVpXuU@silver
> > > 
> > > Reported-by: Coverity (CID 1487558)
> > > Signed-off-by: Christian Schoenebeck 
> > > ---
> > 
> > Good catch Coverity ! :-)
> 
> Yeah, this Coverity report is actually from March and I ignored it so far,
> because the reported leak could never happen with current test code. But Paolo
> brought it up this week, so ...
> 
> > Reviewed-by: Greg Kurz 
> > 
> > I still have a suggestion. See below.
> > 
> > >  tests/qtest/libqos/virtio-9p-client.c | 5 +
> > >  1 file changed, 5 insertions(+)
> > > 
> > > diff --git a/tests/qtest/libqos/virtio-9p-client.c b/tests/qtest/libqos/virtio-9p-client.c
> > > index e4a368e036..b8adc8d4b9 100644
> > > --- a/tests/qtest/libqos/virtio-9p-client.c
> > > +++ b/tests/qtest/libqos/virtio-9p-client.c
> > > @@ -594,6 +594,8 @@ void v9fs_rreaddir(P9Req *req, uint32_t *count, uint32_t *nentries,
> > >  {
> > >  uint32_t local_count;
> > >  struct V9fsDirent *e = NULL;
> > > +/* only used to avoid a leak if entries was NULL */
> > > +struct V9fsDirent *unused_entries = NULL;
> > >  uint16_t slen;
> > >  uint32_t n = 0;
> > >  
> > > @@ -612,6 +614,8 @@ void v9fs_rreaddir(P9Req *req, uint32_t *count, uint32_t *nentries,
> > >  e = g_new(struct V9fsDirent, 1);
> > >  if (entries) {
> > >  *entries = e;
> > > +} else {
> > > +unused_entries = e;
> > >  }
> > >  } else {
> > >  e = e->next = g_new(struct V9fsDirent, 1);
> > 
> > This is always allocating and chaining a new entry even
> > though it isn't needed in the entries == NULL case.
> > 
> > > @@ -628,6 +632,7 @@ void v9fs_rreaddir(P9Req *req, uint32_t *count, uint32_t *nentries,
> > >  *nentries = n;
> > >  }
> > >  
> > > +v9fs_free_dirents(unused_entries);
> > 
> > This is going to loop again on all entries to free them.
> > 
> > >  v9fs_req_free(req);
> > >  }
> > >  
> > 
> > If this function is one day called with an enormous number of
> > entries and entries == NULL, this might not scale well.
> > 
> > What about only allocating a single entry in this case?
> > 
> > E.g.
> > 
> > @@ -593,7 +593,7 @@ void v9fs_rreaddir(P9Req *req, uint32_t *count, uint32_t *nentries,
> > struct V9fsDirent **entries)
> >  {
> >  uint32_t local_count;
> > -struct V9fsDirent *e = NULL;
> > +g_autofree struct V9fsDirent *e = NULL;
> >  uint16_t slen;
> >  uint32_t n = 0;
> >  
> > @@ -611,10 +611,12 @@ void v9fs_rreaddir(P9Req *req, uint32_t *count, uint32_t *nentries,
> >  if (!e) {
> >  e = g_new(struct V9fsDirent, 1);
> >  if (entries) {
> > -*entries = e;
> > +*entries = g_steal_pointer(&e);
> 
> g_steal_pointer(&e) just sets `e` to NULL and returns its old value, so ...
> 
> >  }
> >  } else {
> > -e = e->next = g_new(struct V9fsDirent, 1);
> > +if (entries) {
> > +e = e->next = g_new(struct V9fsDirent, 1);
> > +}
> 
> ... this `else` block would never be reached and no list assembled.
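
For reference, g_steal_pointer(&p) is roughly equivalent to the
open-coded helper below (a sketch of GLib's documented behaviour, not
code from this thread):

/* Rough open-coded equivalent of GLib's g_steal_pointer(&p). */
static inline void *steal_pointer(void **pp)
{
    void *p = *pp;

    *pp = NULL;
    return p;
}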
> 
> >  }
> >  e->next = NULL;
> >  /* qid[13] offset[8] type[1] name[s] */
> 
> And even if the above issue were fixed, it would cause a use-after-free
> on the last element in the list if entries != NULL and the caller tried
> to access that element afterwards. So you would still need a separate
> g_autofree pointer instead of tagging `e` directly, or something like
> this after the loop ends:
> 
>   if (entries)
>       g_steal_pointer(&e);
> 
> Which would somehow defeat the purpose of using g_autofree though.
> 
> I mean, yes this could be addressed, but is it worth it? I don't know. Even
> this reported leak is a purely theoretical one, but I understand if people
> want to silence a warning.
> 

Yeah you're right.

Cheers,

--
Greg

> Best regards,
> Christian Schoenebeck
> 
> 
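
As a footnote to the scaling point above, the pattern under discussion,
namely building the linked list only when the caller asks for it and
reusing one scratch element otherwise so nothing needs freeing
afterwards, can be shown as a self-contained toy. This is illustrative
code only, not the QEMU test client; struct Dirent and parse_entries()
are made-up names:

#include <stdint.h>
#include <glib.h>

struct Dirent {
    int id;
    struct Dirent *next;
};

/* Chain heap-allocated entries only when the caller wants the list
 * (the caller then frees the chain); otherwise reuse one stack slot. */
static void parse_entries(int n_avail, uint32_t *nentries,
                          struct Dirent **entries)
{
    struct Dirent scratch;
    struct Dirent *e = NULL;
    uint32_t n = 0;
    int i;

    for (i = 0; i < n_avail; i++) {
        if (!entries) {
            e = &scratch;               /* no allocation, nothing leaks */
        } else if (!e) {
            e = g_new(struct Dirent, 1);
            *entries = e;               /* head of the caller's list */
        } else {
            e = e->next = g_new(struct Dirent, 1);
        }
        e->next = NULL;
        e->id = i;                      /* stands in for the real decoding */
        n++;
    }
    if (nentries) {
        *nentries = n;
    }
}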

Re: [PATCH v2 2/4] ppc: spapr: cleanup h_enter_nested() with helper routines.

2023-05-01 Thread Harsh Prateek Bora

On 5/2/23 10:19, Nicholas Piggin wrote:

On Tue Apr 25, 2023 at 12:47 AM AEST, Harsh Prateek Bora wrote:

h_enter_nested() currently does a lot of register-specific operations
that should be logically abstracted to simplify the code. This patch
breaks the relevant blocks out into respective helper routines for
better readability/maintenance.

Signed-off-by: Harsh Prateek Bora 
---
  hw/ppc/spapr_hcall.c | 117 ---
  1 file changed, 78 insertions(+), 39 deletions(-)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 124cee5e53..f24d4b368e 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1544,6 +1544,81 @@ static target_ulong h_copy_tofrom_guest(PowerPCCPU *cpu,
  return H_FUNCTION;
  }
  
+static void restore_hdec_from_hvstate(CPUPPCState *dst,
+  struct kvmppc_hv_guest_state *hv_state,
+  target_ulong now)
+{
+target_ulong hdec;
+
+assert(hv_state);
+hdec = hv_state->hdec_expiry - now;
+cpu_ppc_hdecr_init(dst);
+cpu_ppc_store_hdecr(dst, hdec);
+}
+
+static void restore_lpcr_from_hvstate(PowerPCCPU *cpu,
+  struct kvmppc_hv_guest_state *hv_state)
+{
+PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
+CPUPPCState *dst = &cpu->env;
+target_ulong lpcr, lpcr_mask;
+
+assert(hv_state);
+lpcr_mask = LPCR_DPFD | LPCR_ILE | LPCR_AIL | LPCR_LD | LPCR_MER;
+lpcr = (dst->spr[SPR_LPCR] & ~lpcr_mask) | (hv_state->lpcr & lpcr_mask);
+lpcr |= LPCR_HR | LPCR_UPRT | LPCR_GTSE | LPCR_HVICE | LPCR_HDICE;
+lpcr &= ~LPCR_LPES0;
+dst->spr[SPR_LPCR] = lpcr & pcc->lpcr_mask;
+}
+
+static void restore_env_from_ptregs(CPUPPCState *env,
+struct kvmppc_pt_regs *regs)
+{
+assert(env);
+assert(regs);
+assert(sizeof(env->gpr) == sizeof(regs->gpr));
+memcpy(env->gpr, regs->gpr, sizeof(env->gpr));
+env->nip = regs->nip;
+env->msr = regs->msr;
+env->lr = regs->link;
+env->ctr = regs->ctr;
+cpu_write_xer(env, regs->xer);
+ppc_store_cr(env, regs->ccr);
+}
+
+static void restore_env_from_hvstate(CPUPPCState *env,
+ struct kvmppc_hv_guest_state *hv_state)
+{
+assert(env);
+assert(hv_state);
+env->spr[SPR_HFSCR] = hv_state->hfscr;
+/* TCG does not implement DAWR*, CIABR, PURR, SPURR, IC, VTB, HEIR SPRs*/
+env->cfar = hv_state->cfar;
+env->spr[SPR_PCR] = hv_state->pcr;
+env->spr[SPR_DPDES] = hv_state->dpdes;
+env->spr[SPR_SRR0] = hv_state->srr0;
+env->spr[SPR_SRR1] = hv_state->srr1;
+env->spr[SPR_SPRG0] = hv_state->sprg[0];
+env->spr[SPR_SPRG1] = hv_state->sprg[1];
+env->spr[SPR_SPRG2] = hv_state->sprg[2];
+env->spr[SPR_SPRG3] = hv_state->sprg[3];
+env->spr[SPR_BOOKS_PID] = hv_state->pidr;
+env->spr[SPR_PPR] = hv_state->ppr;
+}
+
+static inline void restore_l2_env(PowerPCCPU *cpu,
+ struct kvmppc_hv_guest_state *hv_state,
+ struct kvmppc_pt_regs *regs,
+ target_ulong now)
+{
+CPUPPCState *env = &cpu->env;
+
+restore_env_from_ptregs(env, regs);
+restore_env_from_hvstate(env, hv_state);
+restore_lpcr_from_hvstate(cpu, hv_state);
+restore_hdec_from_hvstate(env, hv_state, now);
+}
+
  /*
   * When this handler returns, the environment is switched to the L2 guest
   * and TCG begins running that. spapr_exit_nested() performs the switch from
@@ -1554,14 +1629,12 @@ static target_ulong h_enter_nested(PowerPCCPU *cpu,
 target_ulong opcode,
 target_ulong *args)
  {
-PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
  CPUState *cs = CPU(cpu);
  CPUPPCState *env = &cpu->env;
  SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
  target_ulong hv_ptr = args[0];
  target_ulong regs_ptr = args[1];
-target_ulong hdec, now = cpu_ppc_load_tbl(env);
-target_ulong lpcr, lpcr_mask;
+target_ulong now = cpu_ppc_load_tbl(env);
  struct kvmppc_hv_guest_state *hvstate;
  struct kvmppc_hv_guest_state hv_state;
  struct kvmppc_pt_regs *regs;
@@ -1607,49 +1680,15 @@ static target_ulong h_enter_nested(PowerPCCPU *cpu,
  return H_P2;
  }
  
-len = sizeof(env->gpr);
-assert(len == sizeof(regs->gpr));
-memcpy(env->gpr, regs->gpr, len);
-
-env->lr = regs->link;
-env->ctr = regs->ctr;
-cpu_write_xer(env, regs->xer);
-ppc_store_cr(env, regs->ccr);
-
-env->msr = regs->msr;
-env->nip = regs->nip;
+/* restore L2 env from hv_state and ptregs */
+restore_l2_env(cpu, &hv_state, regs, now);
  
  address_space_unmap(CPU(cpu)->as, regs, len, len, false);


I don't agree this improves readability. It also does more with the
guest address space mapped, which may not be a big deal but is strictly
not an improvement.

The comment needn't just repeat what the function says, and it does
not actually restore the l2 environment. It sets some registers to
L2 values, but it also leaves other state.

I would like to see this in a larger series if it's going somewhere,
but at the moment I'd rather leave it as is.
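
As an aside on the LPCR handling quoted earlier in this thread, the
composition done by restore_lpcr_from_hvstate() can be written
generically as below. This is a standalone sketch; compose_l2_lpcr() and
its parameter names are illustrative, with the masks from the patch
playing the roles noted in the comments:

#include <stdint.h>

/* Take only the delegated bits from the guest-supplied LPCR, force the
 * bits a nested guest must run with, clear the forbidden ones, and
 * clamp to what the CPU model implements. */
static uint64_t compose_l2_lpcr(uint64_t cur_lpcr, uint64_t guest_lpcr,
                                uint64_t delegated,  /* DPFD|ILE|AIL|LD|MER */
                                uint64_t forced_on,  /* HR|UPRT|GTSE|HVICE|HDICE */
                                uint64_t forced_off, /* LPES0 */
                                uint64_t impl_mask)  /* pcc->lpcr_mask */
{
    uint64_t lpcr = (cur_lpcr & ~delegated) | (guest_lpcr & delegated);

    lpcr |= forced_on;
    lpcr &= ~forced_off;
    return lpcr & impl_mask;
}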

Re: [PATCH v2 3/4] ppc: spapr: cleanup spapr_exit_nested() with helper routines.

2023-05-01 Thread Harsh Prateek Bora

Hi Nick,

On 5/2/23 10:36, Nicholas Piggin wrote:

On Tue Apr 25, 2023 at 12:47 AM AEST, Harsh Prateek Bora wrote:

Currently, spapr_exit_nested() does a lot of register state restoring
from ptregs/hvstate, mapping each of those before restoring the L1 host
state. This patch breaks those sets of ops out into respective helper
routines for better code readability/maintenance.

Signed-off-by: Harsh Prateek Bora 
---
  hw/ppc/spapr_hcall.c | 120 ++-
  1 file changed, 72 insertions(+), 48 deletions(-)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index f24d4b368e..e69634bc22 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1719,45 +1719,14 @@ static target_ulong h_enter_nested(PowerPCCPU *cpu,
  return env->gpr[3];
  }
  
-void spapr_exit_nested(PowerPCCPU *cpu, int excp)
+static void restore_hvstate_from_env(struct kvmppc_hv_guest_state *hvstate,
+ CPUPPCState *env, int excp)
  {
-CPUState *cs = CPU(cpu);
-CPUPPCState *env = &cpu->env;
-SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
-target_ulong r3_return = env->excp_vectors[excp]; /* hcall return value */
-target_ulong hv_ptr = spapr_cpu->nested_host_state->gpr[4];
-target_ulong regs_ptr = spapr_cpu->nested_host_state->gpr[5];
-struct kvmppc_hv_guest_state *hvstate;
-struct kvmppc_pt_regs *regs;
-hwaddr len;
-
-assert(spapr_cpu->in_nested);
-
-cpu_ppc_hdecr_exit(env);
-
-len = sizeof(*hvstate);
-hvstate = address_space_map(CPU(cpu)->as, hv_ptr, &len, true,
-MEMTXATTRS_UNSPECIFIED);
-if (len != sizeof(*hvstate)) {
-address_space_unmap(CPU(cpu)->as, hvstate, len, 0, true);
-r3_return = H_PARAMETER;
-goto out_restore_l1;
-}
-
  hvstate->cfar = env->cfar;
  hvstate->lpcr = env->spr[SPR_LPCR];
  hvstate->pcr = env->spr[SPR_PCR];
  hvstate->dpdes = env->spr[SPR_DPDES];
  hvstate->hfscr = env->spr[SPR_HFSCR];
-
-if (excp == POWERPC_EXCP_HDSI) {
-hvstate->hdar = env->spr[SPR_HDAR];
-hvstate->hdsisr = env->spr[SPR_HDSISR];
-hvstate->asdr = env->spr[SPR_ASDR];
-} else if (excp == POWERPC_EXCP_HISI) {
-hvstate->asdr = env->spr[SPR_ASDR];
-}
-
  /* HEIR should be implemented for HV mode and saved here. */
  hvstate->srr0 = env->spr[SPR_SRR0];
  hvstate->srr1 = env->spr[SPR_SRR1];
@@ -1768,27 +1737,43 @@ void spapr_exit_nested(PowerPCCPU *cpu, int excp)
  hvstate->pidr = env->spr[SPR_BOOKS_PID];
  hvstate->ppr = env->spr[SPR_PPR];
  
-/* Is it okay to specify write length larger than actual data written? */
-address_space_unmap(CPU(cpu)->as, hvstate, len, len, true);
+if (excp == POWERPC_EXCP_HDSI) {
+hvstate->hdar = env->spr[SPR_HDAR];
+hvstate->hdsisr = env->spr[SPR_HDSISR];
+hvstate->asdr = env->spr[SPR_ASDR];
+} else if (excp == POWERPC_EXCP_HISI) {
+hvstate->asdr = env->spr[SPR_ASDR];
+}
+}
  
-len = sizeof(*regs);
-regs = address_space_map(CPU(cpu)->as, regs_ptr, &len, true,
+static int map_and_restore_l2_hvstate(PowerPCCPU *cpu, int excp, target_ulong *r3)
+{
+CPUPPCState *env = &cpu->env;
+SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
+target_ulong hv_ptr = spapr_cpu->nested_host_state->gpr[4];
+struct kvmppc_hv_guest_state *hvstate;
+hwaddr len = sizeof(*hvstate);
+
+hvstate = address_space_map(CPU(cpu)->as, hv_ptr, &len, true,
  MEMTXATTRS_UNSPECIFIED);
-if (!regs || len != sizeof(*regs)) {
-address_space_unmap(CPU(cpu)->as, regs, len, 0, true);
-r3_return = H_P2;
-goto out_restore_l1;
+if (len != sizeof(*hvstate)) {
+address_space_unmap(CPU(cpu)->as, hvstate, len, 0, true);
+*r3 = H_PARAMETER;
+return -1;
  }
+restore_hvstate_from_env(hvstate, env, excp);
+/* Is it okay to specify write length larger than actual data written? */
+address_space_unmap(CPU(cpu)->as, hvstate, len, len, true);
+return 0;
+}
  
+static void restore_ptregs_from_env(struct kvmppc_pt_regs *regs,
+CPUPPCState *env, int excp)
+{
+hwaddr len;
  len = sizeof(env->gpr);
  assert(len == sizeof(regs->gpr));
  memcpy(regs->gpr, env->gpr, len);
-
-regs->link = env->lr;
-regs->ctr = env->ctr;
-regs->xer = cpu_read_xer(env);
-regs->ccr = ppc_get_cr(env);
-
  if (excp == POWERPC_EXCP_MCHECK ||
  excp == POWERPC_EXCP_RESET ||
  excp == POWERPC_EXCP_SYSCALL) {
@@ -1798,11 +1783,50 @@ void spapr_exit_nested(PowerPCCPU *cpu, int excp)
  regs->nip = env->spr[SPR_HSRR0];
  regs->msr = env->spr[SPR_HSRR1] & env->msr_mask;
  }
+regs->link = env->lr;
+regs->ctr = env->ctr;
+regs->xer = cpu_read_xer(env);
+regs->ccr = ppc_get_cr(env);
+}
  
+static int map_and_restore_l2_ptregs(PowerPCCPU *cpu, int excp, target_ulong *r3)

Re: [PATCH v3] meson: Pass -j option to sphinx

2023-05-01 Thread Thomas Huth

On 28/04/2023 19.45, Fabiano Rosas wrote:

Markus Armbruster  writes:


Fabiano Rosas  writes:


Save a bit of build time by passing the number of jobs option to
sphinx.

We cannot use the -j option from make because meson does not support
setting build time parameters for custom targets. Use nproc instead or
the equivalent sphinx option "-j auto", if that is available.

Also make sure our plugins support parallelism and report it properly
to sphinx. Particularly, implement the merge_domaindata method in
DBusDomain that is used to merge in data from other subprocesses.

...

diff --git a/docs/meson.build b/docs/meson.build
index f220800e3e..138ec6ce6f 100644
--- a/docs/meson.build
+++ b/docs/meson.build
@@ -10,6 +10,18 @@ if sphinx_build.found()
  SPHINX_ARGS += [ '-W', '-Dkerneldoc_werror=1' ]
endif
  
+  sphinx_version = run_command(SPHINX_ARGS + ['--version'],
+   check: true).stdout().split()[1]
+  if sphinx_version.version_compare('>=5.1.2')


Where do you get 5.1.2 from?  I have 5.0.2, and -j auto appears to work
fine.  The manual page says "Changed in version 1.7: Support auto
argument."



Ouch, I was looking at the readthedocs repository which has a similar
change.

So I think we could probably just hardcode the option. Most distros will
have a more recent sphinx version.
https://repology.org/project/python:sphinx/versions

Let me try to figure out what gitlab is using. I know it is less than 4
because our docs don't show some of the dbus parts.


That's the "pages" job in .gitlab-ci.d/buildtest.yml, i.e. the debian-amd64 
container, i.e. Debian 11.


If I get that right (https://packages.debian.org/source/sphinx), this means 
we're using Sphinx v3.4.3 here.


 Thomas

Re: [PATCH v2 2/4] ppc: spapr: cleanup h_enter_nested() with helper routines.

2023-05-01 Thread Nicholas Piggin
On Tue May 2, 2023 at 4:13 PM AEST, Harsh Prateek Bora wrote:
> On 5/2/23 10:19, Nicholas Piggin wrote:
> > On Tue Apr 25, 2023 at 12:47 AM AEST, Harsh Prateek Bora wrote:
> >> @@ -1607,49 +1680,15 @@ static target_ulong h_enter_nested(PowerPCCPU *cpu,
> >>   return H_P2;
> >>   }
> >>   
> >> -len = sizeof(env->gpr);
> >> -assert(len == sizeof(regs->gpr));
> >> -memcpy(env->gpr, regs->gpr, len);
> >> -
> >> -env->lr = regs->link;
> >> -env->ctr = regs->ctr;
> >> -cpu_write_xer(env, regs->xer);
> >> -ppc_store_cr(env, regs->ccr);
> >> -
> >> -env->msr = regs->msr;
> >> -env->nip = regs->nip;
> >> +/* restore L2 env from hv_state and ptregs */
> >> +restore_l2_env(cpu, &hv_state, regs, now);
> >>   
> >>   address_space_unmap(CPU(cpu)->as, regs, len, len, false);
> > 
> > I don't agree this improves readability. It also does more with the
> > guest address space mapped, which may not be a big deal but is strictly
> > not an improvement.
> > 
> > The comment needn't just repeat what the function says, and it does
> > not actually restore the l2 environment. It sets some registers to
> > L2 values, but it also leaves other state.
> > 
> > I would like to see this in a larger series if it's going somewhere,
> > but at the moment I'd rather leave it as is.
> > 
> While I agree the routine could more appropriately be named
> restore_l2_hvstate_ptregs(), I think it still makes sense to keep the
> bodies of the enter/exit routines as small as possible, with a minimal
> set of helper routines.

I don't think that's a good goal. The entirety of entering and exiting
from a nested guest is 279 lines including comments and no more than
one level of control flow. It's tricky code and has warts, but not
because of the number of lines.

> Giving semantics to the set of 
> operations related to ptregs/hvstate register load/store is the first 
> step towards it.

Those structures are entirely the domain of the hcall API though, so
if anything belongs in the handler functions it is the handling of
those IMO.

> As you have guessed, this is certainly a precursor to another API
> version that we have been working on (still a WIP), and it helps
> isolate the code flows for backward compatibility. Having such changes
> upstream early helps stabilise the parts that are not really an
> API/design change.

Right. Some more abstracting could certainly make sense here, I just
think at this point we need to see the bigger picture.

Thanks,
Nick