date:20210412

From: Klaus Jensen 

Remove the docs/specs/nvme.txt and replace it with proper documentation
in docs/system/nvme.rst.

Signed-off-by: Klaus Jensen 
Reviewed-by: Philippe Mathieu-Daudé 
---
 docs/specs/nvme.txt   |  23 -
 docs/system/index.rst |   1 +
 docs/system/nvme.rst  | 225 ++
 MAINTAINERS   |   2 +-
 4 files changed, 227 insertions(+), 24 deletions(-)
 delete mode 100644 docs/specs/nvme.txt
 create mode 100644 docs/system/nvme.rst

diff --git a/docs/specs/nvme.txt b/docs/specs/nvme.txt
deleted file mode 100644
index 56d393884e7a..
--- a/docs/specs/nvme.txt
+++ /dev/null
@@ -1,23 +0,0 @@
-NVM Express Controller
-==
-
-The nvme device (-device nvme) emulates an NVM Express Controller.
-
-
-Reference Specifications
-
-
-The device currently implements most mandatory features of NVMe v1.3d, see
-
-  https://nvmexpress.org/resources/specifications/
-
-for the specification.
-
-
-Known issues
-
-
-* The accounting numbers in the SMART/Health are reset across power cycles
-
-* Interrupt Coalescing is not supported and is disabled by default in volation
-  of the specification.
diff --git a/docs/system/index.rst b/docs/system/index.rst
index 02d07071810f..b05af716a973 100644
--- a/docs/system/index.rst
+++ b/docs/system/index.rst
@@ -23,6 +23,7 @@ Contents:
net
virtio-net-failover
usb
+   nvme
ivshmem
linuxboot
generic-loader
diff --git a/docs/system/nvme.rst b/docs/system/nvme.rst
new file mode 100644
index ..f7f63d6bf615
--- /dev/null
+++ b/docs/system/nvme.rst
@@ -0,0 +1,225 @@
+==
+NVMe Emulation
+==
+
+QEMU provides NVMe emulation through the ``nvme``, ``nvme-ns`` and
+``nvme-subsys`` devices.
+
+See the following sections for specific information on
+
+  * `Adding NVMe Devices`_, `additional namespaces`_ and `NVM subsystems`_.
+  * Configuration of `Optional Features`_ such as `Controller Memory Buffer`_,
+`Simple Copy`_, `Zoned Namespaces`_, `metadata`_ and `End-to-End Data
+Protection`_,
+
+Adding NVMe Devices
+===
+
+Controller Emulation
+
+
+The QEMU emulated NVMe controller implements version 1.4 of the NVM Express
+specification. All mandatory features are implement with a couple of exceptions
+and limitations:
+
+  * Accounting numbers in the SMART/Health log page are reset when the device
+is power cycled.
+  * Interrupt Coalescing is not supported and is disabled by default.
+
+The simplest way to attach an NVMe controller on the QEMU PCI bus is to add the
+following parameters:
+
+.. code-block:: console
+
+-drive file=nvm.img,if=none,id=nvm
+-device nvme,serial=deadbeef,drive=nvm
+
+There are a number of optional general parameters for the ``nvme`` device. Some
+are mentioned here, but see ``-device nvme,help`` to list all possible
+parameters.
+
+``max_ioqpairs=UINT32`` (default: ``64``)
+  Set the maximum number of allowed I/O queue pairs. This replaces the
+  deprecated ``num_queues`` parameter.
+
+``msix_qsize=UINT16`` (default: ``65``)
+  The number of MSI-X vectors that the device should support.
+
+``mdts=UINT8`` (default: ``7``)
+  Set the Maximum Data Transfer Size of the device.
+
+``use-intel-id`` (default: ``off``)
+  Since QEMU 5.2, the device uses a QEMU allocated "Red Hat" PCI Device and
+  Vendor ID. Set this to ``on`` to revert to the unallocated Intel ID
+  previously used.
+
+Additional Namespaces
+-
+
+In the simplest possible invocation sketched above, the device only support a
+single namespace with the namespace identifier ``1``. To support multiple
+namespaces and additional features, the ``nvme-ns`` device must be used.
+
+.. code-block:: console
+
+   -device nvme,id=nvme-ctrl-0,serial=deadbeef
+   -drive file=nvm-1.img,if=none,id=nvm-1
+   -device nvme-ns,drive=nvm-1
+   -drive file=nvm-2.img,if=none,id=nvm-2
+   -device nvme-ns,drive=nvm-2
+
+The namespaces defined by the ``nvme-ns`` device will attach to the most
+recently defined ``nvme-bus`` that is created by the ``nvme`` device. Namespace
+identifers are allocated automatically, starting from ``1``.
+
+There are a number of parameters available:
+
+``nsid`` (default: ``0``)
+  Explicitly set the namespace identifier.
+
+``uuid`` (default: *autogenerated*)
+  Set the UUID of the namespace. This will be reported as a "Namespace UUID"
+  descriptor in the Namespace Identification Descriptor List.
+
+``bus``
+  If there are more ``nvme`` devices defined, this parameter may be used to
+  attach the namespace to a specific ``nvme`` device (identified by an ``id``
+  parameter on the controller device).
+
+NVM Subsystems
+--
+
+Additional features becomes available if the controller device (``nvme``) is
+linked to an NVM Subsystem device (``nvme-subsys``).
+
+The NVM Subsystem emulation allows features such as shared namespaces and
+multipath I/O.
+
+..

[PATCH RFC v5 10/12] target/riscv: Add kvm_riscv_get/put_regs_timer

2021-04-12 Thread Yifei Jiang

Add kvm_riscv_get/put_regs_timer to synchronize virtual time context
from KVM.

To set register of RISCV_TIMER_REG(state) will occur a error from KVM
on kvm_timer_state == 0. It's better to adapt in KVM, but it doesn't matter
that adaping in QEMU.

Signed-off-by: Yifei Jiang 
Signed-off-by: Yipeng Yin 
---
 target/riscv/cpu.h |  6 
 target/riscv/kvm.c | 72 ++
 2 files changed, 78 insertions(+)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 3ca3dad341..b043881bb1 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -247,6 +247,12 @@ struct CPURISCVState {
 
 hwaddr kernel_addr;
 hwaddr fdt_addr;
+
+/* kvm timer */
+bool kvm_timer_dirty;
+uint64_t kvm_timer_time;
+uint64_t kvm_timer_compare;
+uint64_t kvm_timer_state;
 };
 
 OBJECT_DECLARE_TYPE(RISCVCPU, RISCVCPUClass,
diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
index f9707157e7..ec693795ce 100644
--- a/target/riscv/kvm.c
+++ b/target/riscv/kvm.c
@@ -59,6 +59,9 @@ static __u64 kvm_riscv_reg_id(CPURISCVState *env, __u64 type, 
__u64 idx)
 #define RISCV_CSR_REG(env, name)  kvm_riscv_reg_id(env, KVM_REG_RISCV_CSR, \
  KVM_REG_RISCV_CSR_REG(name))
 
+#define RISCV_TIMER_REG(env, name)  kvm_riscv_reg_id(env, KVM_REG_RISCV_TIMER, 
\
+ KVM_REG_RISCV_TIMER_REG(name))
+
 #define RISCV_FP_F_REG(env, idx)  kvm_riscv_reg_id(env, KVM_REG_RISCV_FP_F, 
idx)
 
 #define RISCV_FP_D_REG(env, idx)  kvm_riscv_reg_id(env, KVM_REG_RISCV_FP_D, 
idx)
@@ -306,6 +309,75 @@ static int kvm_riscv_put_regs_fp(CPUState *cs)
 return ret;
 }
 
+static void kvm_riscv_get_regs_timer(CPUState *cs)
+{
+int ret;
+uint64_t reg;
+CPURISCVState *env = &RISCV_CPU(cs)->env;
+
+if (env->kvm_timer_dirty) {
+return;
+}
+
+ret = kvm_get_one_reg(cs, RISCV_TIMER_REG(env, time), ®);
+if (ret) {
+abort();
+}
+env->kvm_timer_time = reg;
+
+ret = kvm_get_one_reg(cs, RISCV_TIMER_REG(env, compare), ®);
+if (ret) {
+abort();
+}
+env->kvm_timer_compare = reg;
+
+ret = kvm_get_one_reg(cs, RISCV_TIMER_REG(env, state), ®);
+if (ret) {
+abort();
+}
+env->kvm_timer_state = reg;
+
+env->kvm_timer_dirty = true;
+}
+
+static void kvm_riscv_put_regs_timer(CPUState *cs)
+{
+int ret;
+uint64_t reg;
+CPURISCVState *env = &RISCV_CPU(cs)->env;
+
+if (!env->kvm_timer_dirty) {
+return;
+}
+
+reg = env->kvm_timer_time;
+ret = kvm_set_one_reg(cs, RISCV_TIMER_REG(env, time), ®);
+if (ret) {
+abort();
+}
+
+reg = env->kvm_timer_compare;
+ret = kvm_set_one_reg(cs, RISCV_TIMER_REG(env, compare), ®);
+if (ret) {
+abort();
+}
+
+/*
+ * To set register of RISCV_TIMER_REG(state) will occur a error from KVM
+ * on env->kvm_timer_state == 0, It's better to adapt in KVM, but it
+ * doesn't matter that adaping in QEMU now.
+ * TODO If KVM changes, adapt here.
+ */
+if (env->kvm_timer_state) {
+reg = env->kvm_timer_state;
+ret = kvm_set_one_reg(cs, RISCV_TIMER_REG(env, state), ®);
+if (ret) {
+abort();
+}
+}
+
+env->kvm_timer_dirty = false;
+}
 
 const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 KVM_CAP_LAST_INFO
-- 
2.19.1

[PULL for-6.0 4/4] hw/block/nvme: drain namespaces on sq deletion

From: Klaus Jensen 

For most commands, when issuing an AIO, the BlockAIOCB is stored in the
NvmeRequest aiocb pointer when the AIO is issued. The main use of this
is cancelling AIOs when deleting submission queues (it is currently not
used for Abort).

However, some commands like Dataset Management Zone Management Send
(zone reset) may involve more than one AIO and here the AIOs are issued
without saving a reference to the BlockAIOCB. This is a problem since
nvme_del_sq() will attempt to cancel outstanding AIOs, potentially with
an invalid BlockAIOCB since the aiocb pointer is not NULL'ed when the
request structure is recycled.

Fix this by

  1. making sure the aiocb pointer is NULL'ed when requests are recycled
  2. only attempt to cancel the AIO if the aiocb is non-NULL
  3. if any AIOs could not be cancelled, drain all aio as a last resort.

Fixes: dc04d25e2f3f ("hw/block/nvme: add support for the format nvm command")
Fixes: c94973288cd9 ("hw/block/nvme: add broadcast nsid support flush command")
Fixes: e4e430b3d6ba ("hw/block/nvme: add simple copy command")
Fixes: 5f5dc4c6a942 ("hw/block/nvme: zero out zones on reset")
Fixes: 2605257a26b8 ("hw/block/nvme: add the dataset management command")
Cc: Gollu Appalanaidu 
Cc: Minwoo Im 
Signed-off-by: Klaus Jensen 
Reviewed-by: Minwoo Im 
---
 hw/block/nvme.c | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 65bb2cfc21d7..624a1431d072 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -470,6 +470,7 @@ static void nvme_req_clear(NvmeRequest *req)
 {
 req->ns = NULL;
 req->opaque = NULL;
+req->aiocb = NULL;
 memset(&req->cqe, 0x0, sizeof(req->cqe));
 req->status = NVME_SUCCESS;
 }
@@ -3687,6 +3688,7 @@ static uint16_t nvme_del_sq(NvmeCtrl *n, NvmeRequest *req)
 NvmeSQueue *sq;
 NvmeCQueue *cq;
 uint16_t qid = le16_to_cpu(c->qid);
+uint32_t nsid;
 
 if (unlikely(!qid || nvme_check_sqid(n, qid))) {
 trace_pci_nvme_err_invalid_del_sq(qid);
@@ -3698,9 +3700,26 @@ static uint16_t nvme_del_sq(NvmeCtrl *n, NvmeRequest 
*req)
 sq = n->sq[qid];
 while (!QTAILQ_EMPTY(&sq->out_req_list)) {
 r = QTAILQ_FIRST(&sq->out_req_list);
-assert(r->aiocb);
-blk_aio_cancel(r->aiocb);
+if (r->aiocb) {
+blk_aio_cancel(r->aiocb);
+}
 }
+
+/*
+ * Drain all namespaces if there are still outstanding requests that we
+ * could not cancel explicitly.
+ */
+if (!QTAILQ_EMPTY(&sq->out_req_list)) {
+for (nsid = 1; nsid <= NVME_MAX_NAMESPACES; nsid++) {
+NvmeNamespace *ns = nvme_ns(n, nsid);
+if (ns) {
+nvme_ns_drain(ns);
+}
+}
+}
+
+assert(QTAILQ_EMPTY(&sq->out_req_list));
+
 if (!nvme_check_cqid(n, sq->cqid)) {
 cq = n->cq[sq->cqid];
 QTAILQ_REMOVE(&cq->sq_list, sq, entry);
-- 
2.31.1

[PATCH RFC v5 12/12] target/riscv: Support virtual time context synchronization

2021-04-12 Thread Yifei Jiang

Add virtual time context description to vmstate_riscv_cpu. After cpu being
loaded, virtual time context is updated to KVM.

Signed-off-by: Yifei Jiang 
Signed-off-by: Yipeng Yin 
---
 target/riscv/machine.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/target/riscv/machine.c b/target/riscv/machine.c
index 44d4015bd6..ef2d5395a8 100644
--- a/target/riscv/machine.c
+++ b/target/riscv/machine.c
@@ -138,10 +138,20 @@ static const VMStateDescription vmstate_hyper = {
 }
 };
 
+static int cpu_post_load(void *opaque, int version_id)
+{
+RISCVCPU *cpu = opaque;
+CPURISCVState *env = &cpu->env;
+
+env->kvm_timer_dirty = true;
+return 0;
+}
+
 const VMStateDescription vmstate_riscv_cpu = {
 .name = "cpu",
 .version_id = 1,
 .minimum_version_id = 1,
+.post_load = cpu_post_load,
 .fields = (VMStateField[]) {
 VMSTATE_UINTTL_ARRAY(env.gpr, RISCVCPU, 32),
 VMSTATE_UINT64_ARRAY(env.fpr, RISCVCPU, 32),
@@ -185,6 +195,10 @@ const VMStateDescription vmstate_riscv_cpu = {
 VMSTATE_UINT64(env.mtohost, RISCVCPU),
 VMSTATE_UINT64(env.timecmp, RISCVCPU),
 
+VMSTATE_UINT64(env.kvm_timer_time, RISCVCPU),
+VMSTATE_UINT64(env.kvm_timer_compare, RISCVCPU),
+VMSTATE_UINT64(env.kvm_timer_state, RISCVCPU),
+
 VMSTATE_END_OF_LIST()
 },
 .subsections = (const VMStateDescription * []) {
-- 
2.19.1

[PULL for-6.0 0/4] emulated nvme docs and fixes for -rc3

From: Klaus Jensen 

Hi Peter,

The following changes since commit 555249a59e9cdd6b58da103aba5cf3a2d45c899f:

  Merge remote-tracking branch 'remotes/ehabkost-gl/tags/x86-next-pull-request' 
into staging (2021-04-10 16:58:56 +0100)

are available in the Git repository at:

  git://git.infradead.org/qemu-nvme.git tags/nvme-fixes-20210412-pull-request

for you to fetch changes up to 98f84f5a4eca5c03e32fff20f246d9b4b96d6422:

  hw/block/nvme: drain namespaces on sq deletion (2021-04-12 08:55:23 +0200)


emulated nvme docs and fixes for -rc3

- documentation
- fixes



Klaus Jensen (3):
  docs: add nvme emulation documentation
  hw/block/nvme: store aiocb in compare
  hw/block/nvme: drain namespaces on sq deletion

Padmakar Kalghatgi (1):
  hw/block/nvme: map prp fix if prp2 contains non-zero offset

 docs/specs/nvme.txt   |  23 -
 docs/system/index.rst |   1 +
 docs/system/nvme.rst  | 225 ++
 hw/block/nvme.c   |  38 +--
 MAINTAINERS   |   2 +-
 5 files changed, 259 insertions(+), 30 deletions(-)
 delete mode 100644 docs/specs/nvme.txt
 create mode 100644 docs/system/nvme.rst

-- 
2.31.1

[PULL for-6.0 3/4] hw/block/nvme: store aiocb in compare

From: Klaus Jensen 

nvme_compare() fails to store the aiocb from the blk_aio_preadv() call.
Fix this.

Fixes: 0a384f923f51 ("hw/block/nvme: add compare command")
Cc: Gollu Appalanaidu 
Signed-off-by: Klaus Jensen 
Reviewed-by: Gollu Appalanaidu 
Reviewed-by: Minwoo Im 
---
 hw/block/nvme.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 86336152a378..65bb2cfc21d7 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -2843,7 +2843,8 @@ static uint16_t nvme_compare(NvmeCtrl *n, NvmeRequest 
*req)
 
 block_acct_start(blk_get_stats(blk), &req->acct, data_len,
  BLOCK_ACCT_READ);
-blk_aio_preadv(blk, offset, &ctx->data.iov, 0, nvme_compare_data_cb, req);
+req->aiocb = blk_aio_preadv(blk, offset, &ctx->data.iov, 0,
+nvme_compare_data_cb, req);
 
 return NVME_NO_COMPLETE;
 }
-- 
2.31.1

[PULL for-6.0 2/4] hw/block/nvme: map prp fix if prp2 contains non-zero offset

From: Padmakar Kalghatgi 

nvme_map_prp needs to calculate the number of list entries based on the
offset value. For the subsequent PRP2 list, need to ensure the number of
entries is within the MAX number of PRP entries for a page.

Signed-off-by: Padmakar Kalghatgi 
Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 6b1f056a0ebc..86336152a378 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -655,7 +655,12 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, NvmeSg *sg, 
uint64_t prp1,
 uint32_t nents, prp_trans;
 int i = 0;
 
-nents = (len + n->page_size - 1) >> n->page_bits;
+/*
+ * The first PRP list entry, pointed to by PRP2 may contain offset.
+ * Hence, we need to calculate the number of entries in based on
+ * that offset.
+ */
+nents = (n->page_size - (prp2 & (n->page_size - 1))) >> 3;
 prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
 ret = nvme_addr_read(n, prp2, (void *)prp_list, prp_trans);
 if (ret) {
@@ -666,7 +671,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, NvmeSg *sg, 
uint64_t prp1,
 while (len != 0) {
 uint64_t prp_ent = le64_to_cpu(prp_list[i]);
 
-if (i == n->max_prp_ents - 1 && len > n->page_size) {
+if (i == nents - 1 && len > n->page_size) {
 if (unlikely(prp_ent & (n->page_size - 1))) {
 trace_pci_nvme_err_invalid_prplist_ent(prp_ent);
 status = NVME_INVALID_PRP_OFFSET | NVME_DNR;
@@ -675,7 +680,8 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, NvmeSg *sg, 
uint64_t prp1,
 
 i = 0;
 nents = (len + n->page_size - 1) >> n->page_bits;
-prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
+nents = MIN(nents, n->max_prp_ents);
+prp_trans = nents * sizeof(uint64_t);
 ret = nvme_addr_read(n, prp_ent, (void *)prp_list,
  prp_trans);
 if (ret) {
-- 
2.31.1

Re: [PATCH 3/3] vhost-vdpa-blk: Introduce vhost-vdpa-blk host device

2021-04-12 Thread Jason Wang




在 2021/4/9 下午4:17, Yongji Xie 写道:

On Fri, Apr 9, 2021 at 2:02 PM Jason Wang  wrote:


在 2021/4/8 下午6:12, Xie Yongji 写道:

This commit introduces a new vhost-vdpa block device, which
will set up a vDPA device specified by a "vdpa-dev" parameter,
something like:

qemu-system-x86_64 \
  -device vhost-vdpa-blk-pci,vdpa-dev=/dev/vhost-vdpa-0

Signed-off-by: Xie Yongji 
---
   hw/block/Kconfig   |   5 +
   hw/block/meson.build   |   1 +
   hw/block/vhost-vdpa-blk.c  | 227 +
   hw/virtio/meson.build  |   1 +
   hw/virtio/vhost-vdpa-blk-pci.c | 101 +
   include/hw/virtio/vhost-vdpa-blk.h |  30 
   6 files changed, 365 insertions(+)
   create mode 100644 hw/block/vhost-vdpa-blk.c
   create mode 100644 hw/virtio/vhost-vdpa-blk-pci.c
   create mode 100644 include/hw/virtio/vhost-vdpa-blk.h

diff --git a/hw/block/Kconfig b/hw/block/Kconfig
index 4fcd152166..4615a2c116 100644
--- a/hw/block/Kconfig
+++ b/hw/block/Kconfig
@@ -41,5 +41,10 @@ config VHOST_USER_BLK
   default y if VIRTIO_PCI
   depends on VIRTIO && VHOST_USER && LINUX

+config VHOST_VDPA_BLK
+bool
+default y if VIRTIO_PCI
+depends on VIRTIO && VHOST_VDPA && LINUX
+
   config SWIM
   bool
diff --git a/hw/block/meson.build b/hw/block/meson.build
index 5862bda4cb..98f1fc330a 100644
--- a/hw/block/meson.build
+++ b/hw/block/meson.build
@@ -17,5 +17,6 @@ softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: 
files('nvme.c', 'nvme-ns.c', 'n

   specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c'))
   specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: 
files('vhost-blk-common.c', 'vhost-user-blk.c'))
+specific_ss.add(when: 'CONFIG_VHOST_VDPA_BLK', if_true: 
files('vhost-blk-common.c', 'vhost-vdpa-blk.c'))

   subdir('dataplane')
diff --git a/hw/block/vhost-vdpa-blk.c b/hw/block/vhost-vdpa-blk.c
new file mode 100644
index 00..d5ca10
--- /dev/null
+++ b/hw/block/vhost-vdpa-blk.c
@@ -0,0 +1,227 @@
+/*
+ * vhost-vdpa-blk host device
+ *
+ * Copyright (C) 2021 Bytedance Inc. and/or its affiliates. All rights 
reserved.
+ *
+ * Author:
+ *   Xie Yongji 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "qemu/cutils.h"
+#include "hw/qdev-core.h"
+#include "hw/qdev-properties.h"
+#include "hw/qdev-properties-system.h"
+#include "hw/virtio/vhost.h"
+#include "hw/virtio/vhost-vdpa-blk.h"
+#include "hw/virtio/virtio.h"
+#include "hw/virtio/virtio-bus.h"
+#include "hw/virtio/virtio-access.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/runstate.h"
+
+static const int vdpa_feature_bits[] = {
+VIRTIO_BLK_F_SIZE_MAX,
+VIRTIO_BLK_F_SEG_MAX,
+VIRTIO_BLK_F_GEOMETRY,
+VIRTIO_BLK_F_BLK_SIZE,
+VIRTIO_BLK_F_TOPOLOGY,
+VIRTIO_BLK_F_MQ,
+VIRTIO_BLK_F_RO,
+VIRTIO_BLK_F_FLUSH,
+VIRTIO_BLK_F_CONFIG_WCE,
+VIRTIO_BLK_F_DISCARD,
+VIRTIO_BLK_F_WRITE_ZEROES,
+VIRTIO_F_VERSION_1,
+VIRTIO_RING_F_INDIRECT_DESC,
+VIRTIO_RING_F_EVENT_IDX,
+VIRTIO_F_NOTIFY_ON_EMPTY,
+VHOST_INVALID_FEATURE_BIT
+};
+
+static void vhost_vdpa_blk_set_status(VirtIODevice *vdev, uint8_t status)
+{
+VHostVdpaBlk *s = VHOST_VDPA_BLK(vdev);
+VHostBlkCommon *vbc = VHOST_BLK_COMMON(s);
+bool should_start = virtio_device_started(vdev, status);
+int ret;
+
+if (!vdev->vm_running) {
+should_start = false;
+}
+
+if (vbc->dev.started == should_start) {
+return;
+}
+
+if (should_start) {
+ret = vhost_blk_common_start(vbc);
+if (ret < 0) {
+error_report("vhost-vdpa-blk: vhost start failed: %s",
+ strerror(-ret));
+}
+} else {
+vhost_blk_common_stop(vbc);
+}
+
+}
+
+static void vhost_vdpa_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
+{
+VHostVdpaBlk *s = VHOST_VDPA_BLK(vdev);
+VHostBlkCommon *vbc = VHOST_BLK_COMMON(s);
+int i, ret;


I believe we should never reach here, the backend should poll the
notifier and trigger vq handler there after DRIVER_OK?


Some legacy virtio-blk driver (virtio 0.9) will do that. Kick before
set DRIVER_OK.



Ok, I see, but any reason:

1) we need start vhost-blk
2) the relay is not done per vq but per device?





+
+if (!vdev->start_on_kick) {
+return;
+}
+
+if (vbc->dev.started) {
+return;
+}
+
+ret = vhost_blk_common_start(vbc);
+if (ret < 0) {
+error_report("vhost-vdpa-blk: vhost start failed: %s",
+ strerror(-ret));
+return;
+}
+
+/* Kick right away to begin processing requests already in vring */
+for (i = 0; i < vbc->dev.nvqs; i++) {
+VirtQueue *kick_vq = virtio_get_queue(vdev, i);
+
+if (!virtio_queue_get_desc_addr(vdev, i)) {
+continue;
+

Re: [PATCH for-6.0 v2 6/8] hw/block/nvme: update dmsrl limit on namespace detachment


On Apr  9 19:39, Thomas Huth wrote:

On 06/04/2021 09.24, Klaus Jensen wrote:

On Apr  6 09:10, Philippe Mathieu-Daudé wrote:

On 4/5/21 7:54 PM, Klaus Jensen wrote:

From: Klaus Jensen 

The Non-MDTS DMSRL limit must be recomputed when namespaces are
detached.

Fixes: 645ce1a70cb6 ("hw/block/nvme: support namespace attachment command")
Signed-off-by: Klaus Jensen 
Reviewed-by: Gollu Appalanaidu 
---
 hw/block/nvme.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index de0e726dfdd8..3dc51f407671 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -4876,6 +4876,21 @@ static uint16_t nvme_aer(NvmeCtrl *n, NvmeRequest *req)
 return NVME_NO_COMPLETE;
 }
+static void __nvme_update_dmrsl(NvmeCtrl *n)
+{
+int nsid;
+
+for (nsid = 1; nsid <= NVME_MAX_NAMESPACES; nsid++) {
+NvmeNamespace *ns = nvme_ns(n, nsid);
+if (!ns) {
+continue;
+}
+
+n->dmrsl = MIN_NON_ZERO(n->dmrsl,
+BDRV_REQUEST_MAX_BYTES / nvme_l2b(ns, 1));
+}
+}
+
 static void __nvme_select_ns_iocs(NvmeCtrl *n, NvmeNamespace *ns);
 static uint16_t nvme_ns_attachment(NvmeCtrl *n, NvmeRequest *req)
 {
@@ -4925,6 +4940,8 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, 
NvmeRequest *req)
 }
 nvme_ns_detach(ctrl, ns);
+
+__nvme_update_dmrsl(ctrl);
 }


Why the '__' prefix? It doesn't seem clearer (I'm not sure there is
a convention, it makes me think of a internal macro expansion use
for preprocessor).

There are very few uses of this prefix:

hw/9pfs/cofs.c:21:static ssize_t __readlink(V9fsState *s, V9fsPath
*path, V9fsString *buf)
hw/block/nvme.c:1683:static uint16_t __nvme_zrm_open(NvmeNamespace *ns,
NvmeZone *zone,
hw/block/nvme.c:1742:static void __nvme_advance_zone_wp(NvmeNamespace
*ns, NvmeZone *zone,
hw/block/nvme.c:5213:static void __nvme_select_ns_iocs(NvmeCtrl *n,
NvmeNamespace *ns)
hw/i386/amd_iommu.c:1160:static int __amdvi_int_remap_msi(AMDVIState *iommu,
hw/intc/s390_flic_kvm.c:255:static int __get_all_irqs(KVMS390FLICState
*flic,
hw/net/rocker/rocker_desc.c:199:static bool
__desc_ring_post_desc(DescRing *ring, int err)
hw/net/sungem.c:766:static uint16_t __sungem_mii_read(SunGEMState *s,
uint8_t phy_addr,
hw/ppc/ppc.c:867:static void __cpu_ppc_store_decr(PowerPCCPU *cpu,
uint64_t *nextp,
hw/s390x/pv.c:25:static int __s390_pv_cmd(uint32_t cmd, const char
*cmdname, void *data)
pc-bios/s390-ccw/cio.c:315:static int __do_cio(SubChannelId schid,
uint32_t ccw_addr, int fmt, Irb *irb)
target/ppc/mmu-hash64.c:170:static void __helper_slbie(CPUPPCState *env,
target_ulong addr,

Thomas, Eric, is it worth cleaning these and updating the
'CODESTYLE.rst'?



Yeah ok, I think you are right that there is no clear convention on when
to use this or not. I typically just use it for functions that are
normally not supposed to be called directly.

But I don't even think its consistent in the nvme device. For my sake,
we can clean it up, I'll drop it in this case since there is no good
reason for it other than my own idea of "style".


IIRC all identifiers that start with two underscores are reserved by 
the C standard:


https://busybox.net/~landley/c99-draft.html#7.1.3

Thus you should not use two underscores at the beginning here at all.



I'll clean up the remaining double underscores in the next cycle!


signature.asc
Description: PGP signature

[PATCH] target/i386: Add CPU model versions supporting 'xsaves'

2021-04-12 Thread Vitaly Kuznetsov

Hyper-V 2016 refuses to boot on Skylake+ CPU models because they lack
'xsaves'/'vmx-xsaves' features and this diverges from real hardware. The
same issue emerges with AMD "EPYC" CPU model prior to version 3 which got
'xsaves' added. EPYC-Rome/EPYC-Milan CPU models have 'xsaves' enabled from
the very beginning so the comment blaming KVM to explain why other CPUs
lack 'xsaves' is likely outdated.

Signed-off-by: Vitaly Kuznetsov 
---
This patch is a succesor of "[PATCH RFC] target/i386: Add Intel CPU model
 versions supporting 'xsaves'".

Changes since RFC:
- Add 'xsaves' version of Hygon Dhyana [Wen Pu]
---
 target/i386/cpu.c | 150 +-
 1 file changed, 94 insertions(+), 56 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 6b3e9467f177..5753f839b990 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -2881,12 +2881,7 @@ static X86CPUDefinition builtin_x86_defs[] = {
 CPUID_7_0_EBX_BMI2 | CPUID_7_0_EBX_ERMS | CPUID_7_0_EBX_INVPCID |
 CPUID_7_0_EBX_RTM | CPUID_7_0_EBX_RDSEED | CPUID_7_0_EBX_ADX |
 CPUID_7_0_EBX_SMAP,
-/* Missing: XSAVES (not supported by some Linux versions,
- * including v4.1 to v4.12).
- * KVM doesn't yet expose any XSAVES state save component,
- * and the only one defined in Skylake (processor tracing)
- * probably will block migration anyway.
- */
+/* XSAVES is added in version 4 */
 .features[FEAT_XSAVE] =
 CPUID_XSAVE_XSAVEOPT | CPUID_XSAVE_XSAVEC |
 CPUID_XSAVE_XGETBV1,
@@ -2962,6 +2957,15 @@ static X86CPUDefinition builtin_x86_defs[] = {
 { /* end of list */ }
 }
 },
+{
+.version = 4,
+.note = "IBRS, XSAVES, no TSX",
+.props = (PropValue[]) {
+{ "xsaves", "on" },
+{ "vmx-xsaves", "on" },
+{ /* end of list */ }
+}
+},
 { /* end of list */ }
 }
 },
@@ -3001,12 +3005,7 @@ static X86CPUDefinition builtin_x86_defs[] = {
 CPUID_7_0_EBX_AVX512VL | CPUID_7_0_EBX_CLFLUSHOPT,
 .features[FEAT_7_0_ECX] =
 CPUID_7_0_ECX_PKU,
-/* Missing: XSAVES (not supported by some Linux versions,
- * including v4.1 to v4.12).
- * KVM doesn't yet expose any XSAVES state save component,
- * and the only one defined in Skylake (processor tracing)
- * probably will block migration anyway.
- */
+/* XSAVES is added in version 5 */
 .features[FEAT_XSAVE] =
 CPUID_XSAVE_XSAVEOPT | CPUID_XSAVE_XSAVEC |
 CPUID_XSAVE_XGETBV1,
@@ -3094,6 +3093,15 @@ static X86CPUDefinition builtin_x86_defs[] = {
 { /* end of list */ }
 }
 },
+{
+.version = 5,
+.note = "IBRS, XSAVES, EPT switching, no TSX",
+.props = (PropValue[]) {
+{ "xsaves", "on" },
+{ "vmx-xsaves", "on" },
+{ /* end of list */ }
+}
+},
 { /* end of list */ }
 }
 },
@@ -3136,12 +3144,7 @@ static X86CPUDefinition builtin_x86_defs[] = {
 CPUID_7_0_ECX_AVX512VNNI,
 .features[FEAT_7_0_EDX] =
 CPUID_7_0_EDX_SPEC_CTRL | CPUID_7_0_EDX_SPEC_CTRL_SSBD,
-/* Missing: XSAVES (not supported by some Linux versions,
-* including v4.1 to v4.12).
-* KVM doesn't yet expose any XSAVES state save component,
-* and the only one defined in Skylake (processor tracing)
-* probably will block migration anyway.
-*/
+/* XSAVES is added in version 5 */
 .features[FEAT_XSAVE] =
 CPUID_XSAVE_XSAVEOPT | CPUID_XSAVE_XSAVEC |
 CPUID_XSAVE_XGETBV1,
@@ -3225,6 +3228,14 @@ static X86CPUDefinition builtin_x86_defs[] = {
   { /* end of list */ }
   },
 },
+{ .version = 5,
+  .note = "ARCH_CAPABILITIES, EPT switching, XSAVES, no TSX",
+  .props = (PropValue[]) {
+  { "xsaves", "on" },
+  { "vmx-xsaves", "on" },
+  { /* end of list */ }
+  },
+},
 { /* end of list */ }
 }
 },
@@ -3274,13 +3285,7 @@ static X86CPUDefinition builtin_x86_defs[] = {
 MSR_ARCH_CAP_PSCHANGE_MC_NO | MSR_ARCH_CAP_TAA_NO,
 .features[FEAT_7_1_EAX] =
 CPUID_7_1_EAX_AVX512_BF16,
-/*
- * Missing: XSAVES (not supported by some Linux versions,
- * including v4.1 to v4.12).
- * KVM doesn't yet expose any XSAVES state save component,
- * and the only one defined in Skylake (processor tracing)
- * prob

Re: Re: [PATCH 3/3] vhost-vdpa-blk: Introduce vhost-vdpa-blk host device

2021-04-12 Thread Yongji Xie

On Mon, Apr 12, 2021 at 3:14 PM Jason Wang  wrote:
>
>
> 在 2021/4/9 下午4:17, Yongji Xie 写道:
> > On Fri, Apr 9, 2021 at 2:02 PM Jason Wang  wrote:
> >>
> >> 在 2021/4/8 下午6:12, Xie Yongji 写道:
> >>> This commit introduces a new vhost-vdpa block device, which
> >>> will set up a vDPA device specified by a "vdpa-dev" parameter,
> >>> something like:
> >>>
> >>> qemu-system-x86_64 \
> >>>   -device vhost-vdpa-blk-pci,vdpa-dev=/dev/vhost-vdpa-0
> >>>
> >>> Signed-off-by: Xie Yongji 
> >>> ---
> >>>hw/block/Kconfig   |   5 +
> >>>hw/block/meson.build   |   1 +
> >>>hw/block/vhost-vdpa-blk.c  | 227 +
> >>>hw/virtio/meson.build  |   1 +
> >>>hw/virtio/vhost-vdpa-blk-pci.c | 101 +
> >>>include/hw/virtio/vhost-vdpa-blk.h |  30 
> >>>6 files changed, 365 insertions(+)
> >>>create mode 100644 hw/block/vhost-vdpa-blk.c
> >>>create mode 100644 hw/virtio/vhost-vdpa-blk-pci.c
> >>>create mode 100644 include/hw/virtio/vhost-vdpa-blk.h
> >>>
> >>> diff --git a/hw/block/Kconfig b/hw/block/Kconfig
> >>> index 4fcd152166..4615a2c116 100644
> >>> --- a/hw/block/Kconfig
> >>> +++ b/hw/block/Kconfig
> >>> @@ -41,5 +41,10 @@ config VHOST_USER_BLK
> >>>default y if VIRTIO_PCI
> >>>depends on VIRTIO && VHOST_USER && LINUX
> >>>
> >>> +config VHOST_VDPA_BLK
> >>> +bool
> >>> +default y if VIRTIO_PCI
> >>> +depends on VIRTIO && VHOST_VDPA && LINUX
> >>> +
> >>>config SWIM
> >>>bool
> >>> diff --git a/hw/block/meson.build b/hw/block/meson.build
> >>> index 5862bda4cb..98f1fc330a 100644
> >>> --- a/hw/block/meson.build
> >>> +++ b/hw/block/meson.build
> >>> @@ -17,5 +17,6 @@ softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: 
> >>> files('nvme.c', 'nvme-ns.c', 'n
> >>>
> >>>specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: 
> >>> files('virtio-blk.c'))
> >>>specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: 
> >>> files('vhost-blk-common.c', 'vhost-user-blk.c'))
> >>> +specific_ss.add(when: 'CONFIG_VHOST_VDPA_BLK', if_true: 
> >>> files('vhost-blk-common.c', 'vhost-vdpa-blk.c'))
> >>>
> >>>subdir('dataplane')
> >>> diff --git a/hw/block/vhost-vdpa-blk.c b/hw/block/vhost-vdpa-blk.c
> >>> new file mode 100644
> >>> index 00..d5ca10
> >>> --- /dev/null
> >>> +++ b/hw/block/vhost-vdpa-blk.c
> >>> @@ -0,0 +1,227 @@
> >>> +/*
> >>> + * vhost-vdpa-blk host device
> >>> + *
> >>> + * Copyright (C) 2021 Bytedance Inc. and/or its affiliates. All rights 
> >>> reserved.
> >>> + *
> >>> + * Author:
> >>> + *   Xie Yongji 
> >>> + *
> >>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> >>> + * the COPYING file in the top-level directory.
> >>> + *
> >>> + */
> >>> +
> >>> +#include "qemu/osdep.h"
> >>> +#include "qapi/error.h"
> >>> +#include "qemu/error-report.h"
> >>> +#include "qemu/cutils.h"
> >>> +#include "hw/qdev-core.h"
> >>> +#include "hw/qdev-properties.h"
> >>> +#include "hw/qdev-properties-system.h"
> >>> +#include "hw/virtio/vhost.h"
> >>> +#include "hw/virtio/vhost-vdpa-blk.h"
> >>> +#include "hw/virtio/virtio.h"
> >>> +#include "hw/virtio/virtio-bus.h"
> >>> +#include "hw/virtio/virtio-access.h"
> >>> +#include "sysemu/sysemu.h"
> >>> +#include "sysemu/runstate.h"
> >>> +
> >>> +static const int vdpa_feature_bits[] = {
> >>> +VIRTIO_BLK_F_SIZE_MAX,
> >>> +VIRTIO_BLK_F_SEG_MAX,
> >>> +VIRTIO_BLK_F_GEOMETRY,
> >>> +VIRTIO_BLK_F_BLK_SIZE,
> >>> +VIRTIO_BLK_F_TOPOLOGY,
> >>> +VIRTIO_BLK_F_MQ,
> >>> +VIRTIO_BLK_F_RO,
> >>> +VIRTIO_BLK_F_FLUSH,
> >>> +VIRTIO_BLK_F_CONFIG_WCE,
> >>> +VIRTIO_BLK_F_DISCARD,
> >>> +VIRTIO_BLK_F_WRITE_ZEROES,
> >>> +VIRTIO_F_VERSION_1,
> >>> +VIRTIO_RING_F_INDIRECT_DESC,
> >>> +VIRTIO_RING_F_EVENT_IDX,
> >>> +VIRTIO_F_NOTIFY_ON_EMPTY,
> >>> +VHOST_INVALID_FEATURE_BIT
> >>> +};
> >>> +
> >>> +static void vhost_vdpa_blk_set_status(VirtIODevice *vdev, uint8_t status)
> >>> +{
> >>> +VHostVdpaBlk *s = VHOST_VDPA_BLK(vdev);
> >>> +VHostBlkCommon *vbc = VHOST_BLK_COMMON(s);
> >>> +bool should_start = virtio_device_started(vdev, status);
> >>> +int ret;
> >>> +
> >>> +if (!vdev->vm_running) {
> >>> +should_start = false;
> >>> +}
> >>> +
> >>> +if (vbc->dev.started == should_start) {
> >>> +return;
> >>> +}
> >>> +
> >>> +if (should_start) {
> >>> +ret = vhost_blk_common_start(vbc);
> >>> +if (ret < 0) {
> >>> +error_report("vhost-vdpa-blk: vhost start failed: %s",
> >>> + strerror(-ret));
> >>> +}
> >>> +} else {
> >>> +vhost_blk_common_stop(vbc);
> >>> +}
> >>> +
> >>> +}
> >>> +
> >>> +static void vhost_vdpa_blk_handle_output(VirtIODevice *vdev, VirtQueue 
> >>> *vq)
> >>> +{
> >>> +VHostVdpaBlk *s = VHOST_VDPA_BLK(vdev);
> >>> +VHostBlkCommon *vbc = VHOST_BLK_COMMON(s);
> >>> +int i, r

Re: [PATCH v5 3/6] virtio:add support in configure interrupt

On Fri, Apr 9, 2021 at 3:21 PM Jason Wang  wrote:
>
>
> 在 2021/4/8 下午5:38, Cindy Lu 写道:
> > Add configure notifier support in virtio and related driver
> > When peer is vhost vdpa, setup the configure interrupt function
> > vhost_net_start and release the resource when vhost_net_stop
>
>
> Actually, this is the vhost support for config interrupt.
>
>
sure will fix this
> >
> > Signed-off-by: Cindy Lu 
> > ---
> >   hw/net/vhost_net.c |  9 +
> >   hw/net/virtio-net.c|  6 ++
> >   hw/virtio/vhost.c  | 38 +-
> >   hw/virtio/virtio.c | 25 +
> >   include/hw/virtio/vhost.h  |  3 +++
> >   include/hw/virtio/virtio.h |  5 +
> >   include/net/vhost_net.h|  3 +++
> >   7 files changed, 88 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> > index 24d555e764..12e30dc25e 100644
> > --- a/hw/net/vhost_net.c
> > +++ b/hw/net/vhost_net.c
> > @@ -426,6 +426,15 @@ void vhost_net_virtqueue_mask(VHostNetState *net, 
> > VirtIODevice *dev,
> >   vhost_virtqueue_mask(&net->dev, dev, idx, mask);
> >   }
> >
> > +bool vhost_net_config_pending(VHostNetState *net, int idx)
> > +{
> > +return vhost_config_pending(&net->dev, idx);
> > +}
> > +void vhost_net_config_mask(VHostNetState *net, VirtIODevice *dev,
> > +  bool mask)
> > +{
> > +vhost_config_mask(&net->dev, dev,  mask);
> > +}
> >   VHostNetState *get_vhost_net(NetClientState *nc)
> >   {
> >   VHostNetState *vhost_net = 0;
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index 00d95e6615..e30a7d9835 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -3064,6 +3064,9 @@ static bool 
> > virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx,
> >   if (type == VIRTIO_VQ_VECTOR) {
> >   return vhost_net_virtqueue_pending(get_vhost_net(nc->peer), idx);
> >   }
> > +if (type == VIRTIO_CONFIG_VECTOR) {
> > +return vhost_net_config_pending(get_vhost_net(nc->peer), idx);
> > +}
> >   return false;
> >   }
> >
> > @@ -3076,6 +3079,9 @@ static void 
> > virtio_net_guest_notifier_mask(VirtIODevice *vdev, int idx,
> >   if (type == VIRTIO_VQ_VECTOR) {
> >   vhost_net_virtqueue_mask(get_vhost_net(nc->peer), vdev, idx, 
> > mask);
> >}
> > +if (type == VIRTIO_CONFIG_VECTOR) {
> > +vhost_net_config_mask(get_vhost_net(nc->peer), vdev, mask);
> > + }
> >   }
> >
> >   static void virtio_net_set_config_size(VirtIONet *n, uint64_t 
> > host_features)
> > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > index 614ccc2bcb..b5e915d5cf 100644
> > --- a/hw/virtio/vhost.c
> > +++ b/hw/virtio/vhost.c
> > @@ -1313,6 +1313,10 @@ int vhost_dev_init(struct vhost_dev *hdev, void 
> > *opaque,
> >   goto fail;
> >   }
> >   }
> > +r = event_notifier_init(&hdev->masked_config_notifier, 0);
> > +if (r < 0) {
> > +return r;
> > +}
> >
> >   if (busyloop_timeout) {
> >   for (i = 0; i < hdev->nvqs; ++i) {
> > @@ -1405,6 +1409,7 @@ void vhost_dev_cleanup(struct vhost_dev *hdev)
> >   for (i = 0; i < hdev->nvqs; ++i) {
> >   vhost_virtqueue_cleanup(hdev->vqs + i);
> >   }
> > +event_notifier_cleanup(&hdev->masked_config_notifier);
> >   if (hdev->mem) {
> >   /* those are only safe after successful init */
> >   memory_listener_unregister(&hdev->memory_listener);
> > @@ -1498,6 +1503,10 @@ bool vhost_virtqueue_pending(struct vhost_dev *hdev, 
> > int n)
> >   return event_notifier_test_and_clear(&vq->masked_notifier);
> >   }
> >
> > +bool vhost_config_pending(struct vhost_dev *hdev, int n)
> > +{
> > +return event_notifier_test_and_clear(&hdev->masked_config_notifier);
> > +}
> >   /* Mask/unmask events from this vq. */
> >   void vhost_virtqueue_mask(struct vhost_dev *hdev, VirtIODevice *vdev, int 
> > n,
> >bool mask)
> > @@ -1522,6 +1531,28 @@ void vhost_virtqueue_mask(struct vhost_dev *hdev, 
> > VirtIODevice *vdev, int n,
> >   VHOST_OPS_DEBUG("vhost_set_vring_call failed");
> >   }
> >   }
> > +void vhost_config_mask(struct vhost_dev *hdev, VirtIODevice *vdev,
> > + bool mask)
> > +{
> > +   int fd;
> > +   int r;
> > +   EventNotifier *masked_config_notifier = &hdev->masked_config_notifier;
> > +   EventNotifier *config_notifier = &vdev->config_notifier;
> > +   if (vdev->use_config_notifier != true) {
> > +return;
> > +}
> > +assert(hdev->vhost_ops);
> > +if (mask) {
> > +assert(vdev->use_guest_notifier_mask);
> > +fd = event_notifier_get_fd(masked_config_notifier);
> > +} else {
> > +fd = event_notifier_get_fd(config_notifier);
> > +}
> > +   r = hdev->vhost_ops->vhost_set_config_call(hdev, &fd);
> > +   if (r < 0) {
> > +error_report("vhost_set_config_call failed")

Re: [PATCH v5 2/6] vhost: add new call back function for config interrupt

On Fri, Apr 9, 2021 at 3:12 PM Jason Wang  wrote:
>
>
> 在 2021/4/8 下午5:38, Cindy Lu 写道:
> > to support configure interrupt, we need to
> > Add new call back function for config interrupt.
>
>
> There're brunch of capital issues.
>
>
> > now聽this call back function only used in vhost-vdpa driver
>
>
> And looks like anthoer traditional chinese character.
>
>
will fix this
> >
> > Signed-off-by: Cindy Lu 
> > ---
> >   hw/virtio/trace-events| 2 ++
> >   hw/virtio/vhost-vdpa.c| 7 +++
> >   include/hw/virtio/vhost-backend.h | 3 +++
> >   3 files changed, 12 insertions(+)
> >
> > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> > index 2060a144a2..6710835b46 100644
> > --- a/hw/virtio/trace-events
> > +++ b/hw/virtio/trace-events
> > @@ -52,6 +52,8 @@ vhost_vdpa_set_vring_call(void *dev, unsigned int index, 
> > int fd) "dev: %p index:
> >   vhost_vdpa_get_features(void *dev, uint64_t features) "dev: %p features: 
> > 0x%"PRIx64
> >   vhost_vdpa_set_owner(void *dev) "dev: %p"
> >   vhost_vdpa_vq_get_addr(void *dev, void *vq, uint64_t desc_user_addr, 
> > uint64_t avail_user_addr, uint64_t used_user_addr) "dev: %p vq: %p 
> > desc_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" used_user_addr: 
> > 0x%"PRIx64
> > +vhost_vdpa_set_config_call(void *dev, int *fd)"dev: %p fd: %p"
> > +
> >
> >   # virtio.c
> >   virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned 
> > out_num) "elem %p size %zd in_num %u out_num %u"
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 01d2101d09..9ba2a2bed4 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -545,6 +545,12 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev 
> > *dev,
> >   trace_vhost_vdpa_set_vring_call(dev, file->index, file->fd);
> >   return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
> >   }
> > +static int vhost_vdpa_set_config_call(struct vhost_dev *dev,
> > +   int *fd)
> > +{
> > +trace_vhost_vdpa_set_config_call(dev, fd);
> > +return vhost_vdpa_call(dev, VHOST_VDPA_SET_CONFIG_CALL, fd);
> > +}
> >
> >   static int vhost_vdpa_get_features(struct vhost_dev *dev,
> >uint64_t *features)
> > @@ -611,4 +617,5 @@ const VhostOps vdpa_ops = {
> >   .vhost_get_device_id = vhost_vdpa_get_device_id,
> >   .vhost_vq_get_addr = vhost_vdpa_vq_get_addr,
> >   .vhost_force_iommu = vhost_vdpa_force_iommu,
> > +.vhost_set_config_call = vhost_vdpa_set_config_call,
> >   };
> > diff --git a/include/hw/virtio/vhost-backend.h 
> > b/include/hw/virtio/vhost-backend.h
> > index 8a6f8e2a7a..adaf6982d2 100644
> > --- a/include/hw/virtio/vhost-backend.h
> > +++ b/include/hw/virtio/vhost-backend.h
> > @@ -125,6 +125,8 @@ typedef int (*vhost_get_device_id_op)(struct vhost_dev 
> > *dev, uint32_t *dev_id);
> >
> >   typedef bool (*vhost_force_iommu_op)(struct vhost_dev *dev);
> >
> > +typedef int (*vhost_set_config_call_op)(struct vhost_dev *dev,
> > +   int *fd);
> >   typedef struct VhostOps {
> >   VhostBackendType backend_type;
> >   vhost_backend_init vhost_backend_init;
> > @@ -170,6 +172,7 @@ typedef struct VhostOps {
> >   vhost_vq_get_addr_op  vhost_vq_get_addr;
> >   vhost_get_device_id_op vhost_get_device_id;
> >   vhost_force_iommu_op vhost_force_iommu;
> > +vhost_set_config_call_op vhost_set_config_call;
> >   } VhostOps;
>
>
> Let's split the vhost-vdpa implementation into another patch.
>
> Thanks
>
>
sure will fix this
> >
> >   extern const VhostOps user_ops;
>

Re: [PATCH v5 1/6] virtio: introduce new type in interrupt process

On Fri, Apr 9, 2021 at 2:57 PM Jason Wang  wrote:
>
>
> 在 2021/4/8 下午5:38, Cindy Lu 写道:
> > To support config interrupt we need to add
> > a new type of interrupt process. So we introduce
> > the vector type
> > enum virtio_vector_type {
> > 聽 聽 聽 聽 VIRTIO_VQ_VECTOR,
> > 聽 聽 聽 聽 VIRTIO_CONFIG_VECTOR,
> > 聽 聽 聽 聽 VIRTIO_VECTOR_UNKNOWN,
> > 聽 聽 };
> > 聽 聽
>
>
> I see some tranditiaonl chinese characters here.
>
> Plaese correctly congirue your locale.
>
sure will fix this
> So consider we use int for queue_index, can we simply use -1 as a hint
> for the config interrupt then there's no need for this extra enum?
>
that's a good idea, I will rewrite this part
>
> > Now the bus which support configure interrupt is
> > virtio-mmio and vritio-pci. For other drivers, the function will
> > only check if the interrupt type is the VIRTIO_CONFIG_VQ. If not
> > the function will fail.
> >
> > Signed-off-by: Cindy Lu 
> > ---
> >   hw/display/vhost-user-gpu.c| 14 +---
> >   hw/net/virtio-net.c| 15 +---
> >   hw/s390x/virtio-ccw.c  |  6 ++--
> >   hw/virtio/vhost-user-fs.c  | 12 ---
> >   hw/virtio/vhost-vsock-common.c | 12 ---
> >   hw/virtio/virtio-crypto.c  | 13 ---
> >   hw/virtio/virtio-mmio.c|  2 +-
> >   hw/virtio/virtio-pci.c | 66 --
> >   include/hw/virtio/virtio.h |  9 +++--
> >   9 files changed, 95 insertions(+), 54 deletions(-)
> >
> > diff --git a/hw/display/vhost-user-gpu.c b/hw/display/vhost-user-gpu.c
> > index 51f1747c4a..959ad115b6 100644
> > --- a/hw/display/vhost-user-gpu.c
> > +++ b/hw/display/vhost-user-gpu.c
> > @@ -487,18 +487,24 @@ vhost_user_gpu_set_status(VirtIODevice *vdev, uint8_t 
> > val)
> >   }
> >
> >   static bool
> > -vhost_user_gpu_guest_notifier_pending(VirtIODevice *vdev, int idx)
> > +vhost_user_gpu_guest_notifier_pending(VirtIODevice *vdev, int idx,
> > +int type)
> >   {
> >   VhostUserGPU *g = VHOST_USER_GPU(vdev);
> > -
> > +if (type != VIRTIO_VQ_VECTOR) {
> > +return false;
> > +}
> >   return vhost_virtqueue_pending(&g->vhost->dev, idx);
> >   }
> >
> >   static void
> > -vhost_user_gpu_guest_notifier_mask(VirtIODevice *vdev, int idx, bool mask)
> > +vhost_user_gpu_guest_notifier_mask(VirtIODevice *vdev, int idx, bool mask,
> > +int type)
> >   {
> >   VhostUserGPU *g = VHOST_USER_GPU(vdev);
> > -
> > +if (type != VIRTIO_VQ_VECTOR) {
> > +return;
> > +}
> >   vhost_virtqueue_mask(&g->vhost->dev, vdev, idx, mask);
> >   }
> >
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index 9179013ac4..00d95e6615 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -3055,22 +3055,27 @@ static NetClientInfo net_virtio_info = {
> >   .announce = virtio_net_announce,
> >   };
> >
> > -static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
> > +static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx,
> > +int type)
> >   {
> >   VirtIONet *n = VIRTIO_NET(vdev);
> >   NetClientState *nc = qemu_get_subqueue(n->nic, vq2q(idx));
> >   assert(n->vhost_started);
> > -return vhost_net_virtqueue_pending(get_vhost_net(nc->peer), idx);
> > +if (type == VIRTIO_VQ_VECTOR) {
> > +return vhost_net_virtqueue_pending(get_vhost_net(nc->peer), idx);
> > +}
> > +return false;
> >   }
> >
> >   static void virtio_net_guest_notifier_mask(VirtIODevice *vdev, int idx,
> > -   bool mask)
> > +   bool mask, int type)
> >   {
> >   VirtIONet *n = VIRTIO_NET(vdev);
> >   NetClientState *nc = qemu_get_subqueue(n->nic, vq2q(idx));
> >   assert(n->vhost_started);
> > -vhost_net_virtqueue_mask(get_vhost_net(nc->peer),
> > - vdev, idx, mask);
> > +if (type == VIRTIO_VQ_VECTOR) {
> > +vhost_net_virtqueue_mask(get_vhost_net(nc->peer), vdev, idx, mask);
> > + }
> >   }
> >
> >   static void virtio_net_set_config_size(VirtIONet *n, uint64_t 
> > host_features)
> > diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
> > index 4582e94ae7..234f749548 100644
> > --- a/hw/s390x/virtio-ccw.c
> > +++ b/hw/s390x/virtio-ccw.c
> > @@ -1003,16 +1003,16 @@ static int 
> > virtio_ccw_set_guest_notifier(VirtioCcwDevice *dev, int n,
> >* need to manually trigger any guest masking callbacks here.
> >*/
> >   if (k->guest_notifier_mask && vdev->use_guest_notifier_mask) {
> > -k->guest_notifier_mask(vdev, n, false);
> > +k->guest_notifier_mask(vdev, n, false, VIRTIO_VQ_VECTOR);
> >   }
> >   /* get lost events and re-inject */
> >   if (k->guest_notifier_pending &&
> > -k->guest_notifier_pending(vdev, n)) {
> > +k->guest_no

Re: [PATCH v5 4/6] vhost-vdpa: add support for configure interrupt

On Fri, Apr 9, 2021 at 3:24 PM Jason Wang  wrote:
>
>
> 在 2021/4/8 下午5:38, Cindy Lu 写道:
> > Add support for configure interrupt. Set the notifier's fd to
> > the kernel driver when vdpa start. also set -1 while vdpa stop.
> > then the kernel will release the related resource
> >
> > Signed-off-by: Cindy Lu 
> > ---
> >   hw/virtio/vhost-vdpa.c | 26 +-
> >   1 file changed, 25 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 9ba2a2bed4..7825366f64 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -467,11 +467,33 @@ static int vhost_vdpa_get_config(struct vhost_dev 
> > *dev, uint8_t *config,
> >   }
> >   return ret;
> >}
> > -
>
>
> Let's keep this blank line.
>
>
sure will fix this
> > +static void vhost_vdpa_config_notify_start(struct vhost_dev *dev,
> > +struct VirtIODevice *vdev, bool start)
> > +{
> > +int fd = 0;
> > +int r = 0;
> > +if (!(dev->features & (0x1ULL << VIRTIO_NET_F_STATUS))) {
> > +return;
> > +}
> > +if (start) {
> > +fd = event_notifier_get_fd(&vdev->config_notifier);
> > +r = dev->vhost_ops->vhost_set_config_call(dev, &fd);
>
>
> So you introduce a general vhost ops but it's call by devic specific
> code (vhost-vdpa).
>
> Any reason that we don't do that in vhost_dev_start/vhost_dev_stop?
>
> Thanks
>
sure, will move this part to dev_start/stop
>
> > +if (!r) {
> > +vdev->use_config_notifier = true;
> > +event_notifier_set(&vdev->config_notifier);
> > +}
> > +} else {
> > +fd = -1;
> > +vdev->use_config_notifier = false;
> > +r = dev->vhost_ops->vhost_set_config_call(dev, &fd);
> > +}
> > +return;
> > +}
> >   static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> >   {
> >   struct vhost_vdpa *v = dev->opaque;
> >   trace_vhost_vdpa_dev_start(dev, started);
> > +VirtIODevice *vdev = dev->vdev;
> >   if (started) {
> >   uint8_t status = 0;
> >   memory_listener_register(&v->listener, &address_space_memory);
> > @@ -479,8 +501,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, 
> > bool started)
> >   vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> >   vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
> >
> > +vhost_vdpa_config_notify_start(dev, vdev, true);
> >   return !(status & VIRTIO_CONFIG_S_DRIVER_OK);
> >   } else {
> > +vhost_vdpa_config_notify_start(dev, vdev, false);
> >   vhost_vdpa_reset_device(dev);
> >   vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> >  VIRTIO_CONFIG_S_DRIVER);
>

[PATCH v6 1/7] net/tap: Added TUNSETSTEERINGEBPF code.

From: Andrew 

Additional code that will be used for eBPF setting steering routine.

Signed-off-by: Andrew Melnychenko 
---
 net/tap-linux.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/tap-linux.h b/net/tap-linux.h
index 2f36d100fc..1d06fe0de6 100644
--- a/net/tap-linux.h
+++ b/net/tap-linux.h
@@ -31,6 +31,7 @@
 #define TUNSETQUEUE  _IOW('T', 217, int)
 #define TUNSETVNETLE _IOW('T', 220, int)
 #define TUNSETVNETBE _IOW('T', 222, int)
+#define TUNSETSTEERINGEBPF _IOR('T', 224, int)
 
 #endif
 
-- 
2.31.0

[PATCH v6 0/7] eBPF RSS support for virtio-net

This set of patches introduces the usage of eBPF for packet steering
and RSS hash calculation:
* RSS(Receive Side Scaling) is used to distribute network packets to
guest virtqueues by calculating packet hash
* Additionally adding support for the usage of RSS with vhost

The eBPF works on kernels 5.8+
On earlier kerneld it fails to load and the RSS feature is reported
only without vhost and implemented in 'in-qemu' software.

Implementation notes:
Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF program.
Added libbpf dependency and eBPF support.
The eBPF program is part of the qemu and presented as an array
of BPF ELF file data. The eBPF array file initially generated by bpftool.
The compilation of eBPF is not part of QEMU build and can be done
using provided Makefile.ebpf.
Added changes to virtio-net and vhost, primary eBPF RSS is used.
'in-qemu' RSS used in the case of hash population and as a fallback option.
For vhost, the hash population feature is not reported to the guest.

Please also see the documentation in PATCH 6/7.

Known issues:
* hash population not supported by eBPF RSS: 'in-qemu' RSS used
as a fallback, also, hash population feature is not reported to guests
with vhost.
* IPv6 extensions still in progress.

Changes since v1:
* using libbpf instead of direct 'bpf' system call.
* added libbpf dependency to the configure/meson scripts.
* changed python script for eBPF .h file generation.
* changed eBPF program - reading L3 proto from ethernet frame.
* added TUNSETSTEERINGEBPF define for TUN.
* changed the maintainer's info.
* added license headers.
* refactored code.

Changes since v2:
* using bpftool for eBPF skeleton generation.
* ebpf_rss is refactored to use skeleton generated by bpftool.
* added/adjasted license in comment sections and in eBPF file.
* rss.bpf.c and Makefile.ebpf moved to the tool/ebpf folder.
* virtio-net eBPF rss refactored. Now eBPF initialized during realize().

Changes since v3:
* rebased to last master.
* fixed issue with failed build without libbpf.
* fixed ebpf loading without rss option.
* refactored labels in ebpf_rss.c

Changes since v4:
* refactored configure/meson script.
* added checks for load_bytes in ebpf.
* documentation added to the index.
* refactored Makefile and rss.bpf.c.
* rebased to last master.

Changes since v5:
* fixed issue with dstopt parsing in the eBPF program.
* added fragment packet parsing to skip L4.

Andrew (7):
  net/tap: Added TUNSETSTEERINGEBPF code.
  net: Added SetSteeringEBPF method for NetClientState.
  ebpf: Added eBPF RSS program.
  ebpf: Added eBPF RSS loader.
  virtio-net: Added eBPF RSS to virtio-net.
  docs: Added eBPF documentation.
  MAINTAINERS: Added eBPF maintainers information.

 MAINTAINERS|   8 +
 configure  |   8 +-
 docs/devel/ebpf_rss.rst| 125 
 docs/devel/index.rst   |   1 +
 ebpf/ebpf_rss-stub.c   |  40 +++
 ebpf/ebpf_rss.c| 165 ++
 ebpf/ebpf_rss.h|  44 +++
 ebpf/meson.build   |   1 +
 ebpf/rss.bpf.skeleton.h| 431 +
 ebpf/trace-events  |   4 +
 ebpf/trace.h   |   2 +
 hw/net/vhost_net.c |   3 +
 hw/net/virtio-net.c| 115 ++-
 include/hw/virtio/virtio-net.h |   4 +
 include/net/net.h  |   2 +
 meson.build|   9 +
 meson_options.txt  |   2 +
 net/tap-bsd.c  |   5 +
 net/tap-linux.c|  13 +
 net/tap-linux.h|   1 +
 net/tap-solaris.c  |   5 +
 net/tap-stub.c |   5 +
 net/tap.c  |   9 +
 net/tap_int.h  |   1 +
 net/vhost-vdpa.c   |   2 +
 tools/ebpf/Makefile.ebpf   |  22 ++
 tools/ebpf/rss.bpf.c   | 569 +
 27 files changed, 1592 insertions(+), 4 deletions(-)
 create mode 100644 docs/devel/ebpf_rss.rst
 create mode 100644 ebpf/ebpf_rss-stub.c
 create mode 100644 ebpf/ebpf_rss.c
 create mode 100644 ebpf/ebpf_rss.h
 create mode 100644 ebpf/meson.build
 create mode 100644 ebpf/rss.bpf.skeleton.h
 create mode 100644 ebpf/trace-events
 create mode 100644 ebpf/trace.h
 create mode 100755 tools/ebpf/Makefile.ebpf
 create mode 100644 tools/ebpf/rss.bpf.c

-- 
2.31.0

[PATCH v6 2/7] net: Added SetSteeringEBPF method for NetClientState.

From: Andrew 

For now, that method supported only by Linux TAP.
Linux TAP uses TUNSETSTEERINGEBPF ioctl.

Signed-off-by: Andrew Melnychenko 
---
 include/net/net.h |  2 ++
 net/tap-bsd.c |  5 +
 net/tap-linux.c   | 13 +
 net/tap-solaris.c |  5 +
 net/tap-stub.c|  5 +
 net/tap.c |  9 +
 net/tap_int.h |  1 +
 7 files changed, 40 insertions(+)

diff --git a/include/net/net.h b/include/net/net.h
index eff24519d2..c55eb08437 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -61,6 +61,7 @@ typedef int (SetVnetBE)(NetClientState *, bool);
 typedef struct SocketReadState SocketReadState;
 typedef void (SocketReadStateFinalize)(SocketReadState *rs);
 typedef void (NetAnnounce)(NetClientState *);
+typedef bool (SetSteeringEBPF)(NetClientState *, int);
 
 typedef struct NetClientInfo {
 NetClientDriver type;
@@ -82,6 +83,7 @@ typedef struct NetClientInfo {
 SetVnetLE *set_vnet_le;
 SetVnetBE *set_vnet_be;
 NetAnnounce *announce;
+SetSteeringEBPF *set_steering_ebpf;
 } NetClientInfo;
 
 struct NetClientState {
diff --git a/net/tap-bsd.c b/net/tap-bsd.c
index 77aaf674b1..4f64f31e98 100644
--- a/net/tap-bsd.c
+++ b/net/tap-bsd.c
@@ -259,3 +259,8 @@ int tap_fd_get_ifname(int fd, char *ifname)
 {
 return -1;
 }
+
+int tap_fd_set_steering_ebpf(int fd, int prog_fd)
+{
+return -1;
+}
diff --git a/net/tap-linux.c b/net/tap-linux.c
index b0635e9e32..9584769740 100644
--- a/net/tap-linux.c
+++ b/net/tap-linux.c
@@ -316,3 +316,16 @@ int tap_fd_get_ifname(int fd, char *ifname)
 pstrcpy(ifname, sizeof(ifr.ifr_name), ifr.ifr_name);
 return 0;
 }
+
+int tap_fd_set_steering_ebpf(int fd, int prog_fd)
+{
+if (ioctl(fd, TUNSETSTEERINGEBPF, (void *) &prog_fd) != 0) {
+error_report("Issue while setting TUNSETSTEERINGEBPF:"
+" %s with fd: %d, prog_fd: %d",
+strerror(errno), fd, prog_fd);
+
+   return -1;
+}
+
+return 0;
+}
diff --git a/net/tap-solaris.c b/net/tap-solaris.c
index 0475a58207..d85224242b 100644
--- a/net/tap-solaris.c
+++ b/net/tap-solaris.c
@@ -255,3 +255,8 @@ int tap_fd_get_ifname(int fd, char *ifname)
 {
 return -1;
 }
+
+int tap_fd_set_steering_ebpf(int fd, int prog_fd)
+{
+return -1;
+}
diff --git a/net/tap-stub.c b/net/tap-stub.c
index de525a2e69..a0fa25804b 100644
--- a/net/tap-stub.c
+++ b/net/tap-stub.c
@@ -85,3 +85,8 @@ int tap_fd_get_ifname(int fd, char *ifname)
 {
 return -1;
 }
+
+int tap_fd_set_steering_ebpf(int fd, int prog_fd)
+{
+return -1;
+}
diff --git a/net/tap.c b/net/tap.c
index dd42ac6134..54c9960593 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -347,6 +347,14 @@ static void tap_poll(NetClientState *nc, bool enable)
 tap_write_poll(s, enable);
 }
 
+static bool tap_set_steering_ebpf(NetClientState *nc, int prog_fd)
+{
+TAPState *s = DO_UPCAST(TAPState, nc, nc);
+assert(nc->info->type == NET_CLIENT_DRIVER_TAP);
+
+return tap_fd_set_steering_ebpf(s->fd, prog_fd) == 0;
+}
+
 int tap_get_fd(NetClientState *nc)
 {
 TAPState *s = DO_UPCAST(TAPState, nc, nc);
@@ -372,6 +380,7 @@ static NetClientInfo net_tap_info = {
 .set_vnet_hdr_len = tap_set_vnet_hdr_len,
 .set_vnet_le = tap_set_vnet_le,
 .set_vnet_be = tap_set_vnet_be,
+.set_steering_ebpf = tap_set_steering_ebpf,
 };
 
 static TAPState *net_tap_fd_init(NetClientState *peer,
diff --git a/net/tap_int.h b/net/tap_int.h
index 225a49ea48..547f8a5a28 100644
--- a/net/tap_int.h
+++ b/net/tap_int.h
@@ -44,5 +44,6 @@ int tap_fd_set_vnet_be(int fd, int vnet_is_be);
 int tap_fd_enable(int fd);
 int tap_fd_disable(int fd);
 int tap_fd_get_ifname(int fd, char *ifname);
+int tap_fd_set_steering_ebpf(int fd, int prog_fd);
 
 #endif /* NET_TAP_INT_H */
-- 
2.31.0

[PATCH v6 5/7] virtio-net: Added eBPF RSS to virtio-net.

From: Andrew 

When RSS is enabled the device tries to load the eBPF program
to select RX virtqueue in the TUN. If eBPF can be loaded
the RSS will function also with vhost (works with kernel 5.8 and later).
Software RSS is used as a fallback with vhost=off when eBPF can't be loaded
or when hash population requested by the guest.

Signed-off-by: Yuri Benditovich 
Signed-off-by: Andrew Melnychenko 
---
 hw/net/vhost_net.c |   3 +
 hw/net/virtio-net.c| 115 -
 include/hw/virtio/virtio-net.h |   4 ++
 net/vhost-vdpa.c   |   2 +
 4 files changed, 121 insertions(+), 3 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 24d555e764..44c1ed92dc 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -45,6 +45,7 @@ static const int kernel_feature_bits[] = {
 VIRTIO_NET_F_MTU,
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_PACKED,
+VIRTIO_NET_F_HASH_REPORT,
 VHOST_INVALID_FEATURE_BIT
 };
 
@@ -71,6 +72,8 @@ static const int user_feature_bits[] = {
 VIRTIO_NET_F_MTU,
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_PACKED,
+VIRTIO_NET_F_RSS,
+VIRTIO_NET_F_HASH_REPORT,
 
 /* This bit implies RARP isn't sent by QEMU out of band */
 VIRTIO_NET_F_GUEST_ANNOUNCE,
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 66b9ff4511..7ed11a303b 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -737,8 +737,9 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, 
uint64_t features,
 return features;
 }
 
-virtio_clear_feature(&features, VIRTIO_NET_F_RSS);
-virtio_clear_feature(&features, VIRTIO_NET_F_HASH_REPORT);
+if (!ebpf_rss_is_loaded(&n->ebpf_rss)) {
+virtio_clear_feature(&features, VIRTIO_NET_F_RSS);
+}
 features = vhost_net_get_features(get_vhost_net(nc->peer), features);
 vdev->backend_features = features;
 
@@ -1163,12 +1164,79 @@ static int virtio_net_handle_announce(VirtIONet *n, 
uint8_t cmd,
 }
 }
 
+static void virtio_net_detach_epbf_rss(VirtIONet *n);
+
 static void virtio_net_disable_rss(VirtIONet *n)
 {
 if (n->rss_data.enabled) {
 trace_virtio_net_rss_disable();
 }
 n->rss_data.enabled = false;
+
+virtio_net_detach_epbf_rss(n);
+}
+
+static bool virtio_net_attach_ebpf_to_backend(NICState *nic, int prog_fd)
+{
+NetClientState *nc = qemu_get_peer(qemu_get_queue(nic), 0);
+if (nc == NULL || nc->info->set_steering_ebpf == NULL) {
+return false;
+}
+
+return nc->info->set_steering_ebpf(nc, prog_fd);
+}
+
+static void rss_data_to_rss_config(struct VirtioNetRssData *data,
+   struct EBPFRSSConfig *config)
+{
+config->redirect = data->redirect;
+config->populate_hash = data->populate_hash;
+config->hash_types = data->hash_types;
+config->indirections_len = data->indirections_len;
+config->default_queue = data->default_queue;
+}
+
+static bool virtio_net_attach_epbf_rss(VirtIONet *n)
+{
+struct EBPFRSSConfig config = {};
+
+if (!ebpf_rss_is_loaded(&n->ebpf_rss)) {
+return false;
+}
+
+rss_data_to_rss_config(&n->rss_data, &config);
+
+if (!ebpf_rss_set_all(&n->ebpf_rss, &config,
+  n->rss_data.indirections_table, n->rss_data.key)) {
+return false;
+}
+
+if (!virtio_net_attach_ebpf_to_backend(n->nic, n->ebpf_rss.program_fd)) {
+return false;
+}
+
+return true;
+}
+
+static void virtio_net_detach_epbf_rss(VirtIONet *n)
+{
+virtio_net_attach_ebpf_to_backend(n->nic, -1);
+}
+
+static bool virtio_net_load_ebpf(VirtIONet *n)
+{
+if (!virtio_net_attach_ebpf_to_backend(n->nic, -1)) {
+/* backend does't support steering ebpf */
+return false;
+}
+
+return ebpf_rss_load(&n->ebpf_rss);
+}
+
+static void virtio_net_unload_ebpf(VirtIONet *n)
+{
+virtio_net_attach_ebpf_to_backend(n->nic, -1);
+ebpf_rss_unload(&n->ebpf_rss);
 }
 
 static uint16_t virtio_net_handle_rss(VirtIONet *n,
@@ -1283,6 +1351,25 @@ static uint16_t virtio_net_handle_rss(VirtIONet *n,
 goto error;
 }
 n->rss_data.enabled = true;
+
+if (!n->rss_data.populate_hash) {
+if (!virtio_net_attach_epbf_rss(n)) {
+/* EBPF must be loaded for vhost */
+if (get_vhost_net(qemu_get_queue(n->nic)->peer)) {
+warn_report("Can't load eBPF RSS for vhost");
+goto error;
+}
+/* fallback to software RSS */
+warn_report("Can't load eBPF RSS - fallback to software RSS");
+n->rss_data.enabled_software_rss = true;
+}
+} else {
+/* use software RSS for hash populating */
+/* and detach eBPF if was loaded before */
+virtio_net_detach_epbf_rss(n);
+n->rss_data.enabled_software_rss = true;
+}
+
 trace_virtio_net_rss_enable(n->rss_data.hash_types,
 n->rss_data.indirect

[PATCH v6 6/7] docs: Added eBPF documentation.

From: Andrew 

Signed-off-by: Yuri Benditovich 
Signed-off-by: Andrew Melnychenko 
---
 docs/devel/ebpf_rss.rst | 125 
 docs/devel/index.rst|   1 +
 2 files changed, 126 insertions(+)
 create mode 100644 docs/devel/ebpf_rss.rst

diff --git a/docs/devel/ebpf_rss.rst b/docs/devel/ebpf_rss.rst
new file mode 100644
index 00..e00962577a
--- /dev/null
+++ b/docs/devel/ebpf_rss.rst
@@ -0,0 +1,125 @@
+===
+eBPF RSS virtio-net support
+===
+
+RSS(Receive Side Scaling) is used to distribute network packets to guest 
virtqueues
+by calculating packet hash. Usually every queue is processed then by a 
specific guest CPU core.
+
+For now there are 2 RSS implementations in qemu:
+- 'in-qemu' RSS (functions if qemu receives network packets, i.e. vhost=off)
+- eBPF RSS (can function with also with vhost=on)
+
+eBPF support (CONFIG_EBPF) is enabled by 'configure' script.
+To enable eBPF RSS support use './configure --enable-bpf'.
+
+If steering BPF is not set for kernel's TUN module, the TUN uses automatic 
selection
+of rx virtqueue based on lookup table built according to calculated symmetric 
hash
+of transmitted packets.
+If steering BPF is set for TUN the BPF code calculates the hash of packet 
header and
+returns the virtqueue number to place the packet to.
+
+Simplified decision formula:
+
+.. code:: C
+
+queue_index = indirection_table[hash()%]
+
+
+Not for all packets, the hash can/should be calculated.
+
+Note: currently, eBPF RSS does not support hash reporting.
+
+eBPF RSS turned on by different combinations of vhost-net, vitrio-net and tap 
configurations:
+
+- eBPF is used:
+
+tap,vhost=off & virtio-net-pci,rss=on,hash=off
+
+- eBPF is used:
+
+tap,vhost=on & virtio-net-pci,rss=on,hash=off
+
+- 'in-qemu' RSS is used:
+
+tap,vhost=off & virtio-net-pci,rss=on,hash=on
+
+- eBPF is used, hash population feature is not reported to the guest:
+
+tap,vhost=on & virtio-net-pci,rss=on,hash=on
+
+If CONFIG_EBPF is not set then only 'in-qemu' RSS is supported.
+Also 'in-qemu' RSS, as a fallback, is used if the eBPF program failed to load 
or set to TUN.
+
+RSS eBPF program
+
+
+RSS program located in ebpf/rss.bpf.skeleton.h generated by bpftool.
+So the program is part of the qemu binary.
+Initially, the eBPF program was compiled by clang and source code located at 
tools/ebpf/rss.bpf.c.
+Prerequisites to recompile the eBPF program (regenerate 
ebpf/rss.bpf.skeleton.h):
+
+llvm, clang, kernel source tree, bpftool
+Adjust Makefile.ebpf to reflect the location of the kernel source tree
+
+$ cd tools/ebpf
+$ make -f Makefile.ebpf
+
+Current eBPF RSS implementation uses 'bounded loops' with 'backward jump 
instructions' which present in the last kernels.
+Overall eBPF RSS works on kernels 5.8+.
+
+eBPF RSS implementation
+---
+
+eBPF RSS loading functionality located in ebpf/ebpf_rss.c and ebpf/ebpf_rss.h.
+
+The `struct EBPFRSSContext` structure that holds 4 file descriptors:
+
+- ctx - pointer of the libbpf context.
+- program_fd - file descriptor of the eBPF RSS program.
+- map_configuration - file descriptor of the 'configuration' map. This map 
contains one element of 'struct EBPFRSSConfig'. This configuration determines 
eBPF program behavior.
+- map_toeplitz_key - file descriptor of the 'Toeplitz key' map. One element of 
the 40byte key prepared for the hashing algorithm.
+- map_indirections_table - 128 elements of queue indexes.
+
+`struct EBPFRSSConfig` fields:
+
+- redirect - "boolean" value, should the hash be calculated, on false  - 
`default_queue` would be used as the final decision.
+- populate_hash - for now, not used. eBPF RSS doesn't support hash reporting.
+- hash_types - binary mask of different hash types. See 
`VIRTIO_NET_RSS_HASH_TYPE_*` defines. If for packet hash should not be 
calculated - `default_queue` would be used.
+- indirections_len - length of the indirections table, maximum 128.
+- default_queue - the queue index that used for packet that shouldn't be 
hashed. For some packets, the hash can't be calculated(g.e ARP).
+
+Functions:
+
+- `ebpf_rss_init()` - sets ctx to NULL, which indicates that EBPFRSSContext is 
not loaded.
+- `ebpf_rss_load()` - creates 3 maps and loads eBPF program from the 
rss.bpf.skeleton.h. Returns 'true' on success. After that, program_fd can be 
used to set steering for TAP.
+- `ebpf_rss_set_all()` - sets values for eBPF maps. `indirections_table` 
length is in EBPFRSSConfig. `toeplitz_key` is VIRTIO_NET_RSS_MAX_KEY_SIZE aka 
40 bytes array.
+- `ebpf_rss_unload()` - close all file descriptors and set ctx to NULL.
+
+Simplified eBPF RSS workflow:
+
+.. code:: C
+
+struct EBPFRSSConfig config;
+config.redirect = 1;
+config.hash_types = VIRTIO_NET_RSS_HASH_TYPE_UDPv4 | 
VIRTIO_NET_RSS_HASH_TYPE_TCPv4;
+config.indirections_len = VIRTIO_NET_RSS_MAX_TABLE_L

[PATCH v6 4/7] ebpf: Added eBPF RSS loader.

From: Andrew 

Added function that loads RSS eBPF program.
Added stub functions for RSS eBPF loader.
Added meson and configuration options.

By default, eBPF feature enabled if libbpf is present in the build system.
libbpf checked in configuration shell script and meson script.

Signed-off-by: Yuri Benditovich 
Signed-off-by: Andrew Melnychenko 
---
 configure   |   8 +-
 ebpf/ebpf_rss-stub.c|  40 
 ebpf/ebpf_rss.c | 165 +++
 ebpf/ebpf_rss.h |  44 
 ebpf/meson.build|   1 +
 ebpf/rss.bpf.skeleton.h | 431 
 ebpf/trace-events   |   4 +
 ebpf/trace.h|   2 +
 meson.build |   9 +
 meson_options.txt   |   2 +
 10 files changed, 705 insertions(+), 1 deletion(-)
 create mode 100644 ebpf/ebpf_rss-stub.c
 create mode 100644 ebpf/ebpf_rss.c
 create mode 100644 ebpf/ebpf_rss.h
 create mode 100644 ebpf/meson.build
 create mode 100644 ebpf/rss.bpf.skeleton.h
 create mode 100644 ebpf/trace-events
 create mode 100644 ebpf/trace.h

diff --git a/configure b/configure
index 4f374b4889..7d9e0391a0 100755
--- a/configure
+++ b/configure
@@ -348,6 +348,7 @@ vhost_vsock="$default_feature"
 vhost_user="no"
 vhost_user_blk_server="auto"
 vhost_user_fs="$default_feature"
+bpf="auto"
 kvm="auto"
 hax="auto"
 hvf="auto"
@@ -1228,6 +1229,10 @@ for opt do
   ;;
   --enable-membarrier) membarrier="yes"
   ;;
+  --disable-bpf) bpf="disabled"
+  ;;
+  --enable-bpf) bpf="enabled"
+  ;;
   --disable-blobs) blobs="false"
   ;;
   --with-pkgversion=*) pkgversion="$optarg"
@@ -1865,6 +1870,7 @@ disabled with --disable-FEATURE, default is enabled if 
available
   vhost-user  vhost-user backend support
   vhost-user-blk-servervhost-user-blk server support
   vhost-vdpa  vhost-vdpa kernel backend support
+  bpf BPF kernel support
   spice   spice
   rbd rados block device (rbd)
   libiscsiiscsi support
@@ -6423,7 +6429,7 @@ NINJA=$ninja $meson setup \
 -Dattr=$attr -Ddefault_devices=$default_devices \
 -Ddocs=$docs -Dsphinx_build=$sphinx_build -Dinstall_blobs=$blobs \
 -Dvhost_user_blk_server=$vhost_user_blk_server 
-Dmultiprocess=$multiprocess \
--Dfuse=$fuse -Dfuse_lseek=$fuse_lseek 
-Dguest_agent_msi=$guest_agent_msi \
+-Dfuse=$fuse -Dfuse_lseek=$fuse_lseek 
-Dguest_agent_msi=$guest_agent_msi -Dbpf=$bpf\
 $(if test "$default_features" = no; then echo 
"-Dauto_features=disabled"; fi) \
-Dtcg_interpreter=$tcg_interpreter \
 $cross_arg \
diff --git a/ebpf/ebpf_rss-stub.c b/ebpf/ebpf_rss-stub.c
new file mode 100644
index 00..e71e229190
--- /dev/null
+++ b/ebpf/ebpf_rss-stub.c
@@ -0,0 +1,40 @@
+/*
+ * eBPF RSS stub file
+ *
+ * Developed by Daynix Computing LTD (http://www.daynix.com)
+ *
+ * Authors:
+ *  Yuri Benditovich 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "ebpf/ebpf_rss.h"
+
+void ebpf_rss_init(struct EBPFRSSContext *ctx)
+{
+
+}
+
+bool ebpf_rss_is_loaded(struct EBPFRSSContext *ctx)
+{
+return false;
+}
+
+bool ebpf_rss_load(struct EBPFRSSContext *ctx)
+{
+return false;
+}
+
+bool ebpf_rss_set_all(struct EBPFRSSContext *ctx, struct EBPFRSSConfig *config,
+  uint16_t *indirections_table, uint8_t *toeplitz_key)
+{
+return false;
+}
+
+void ebpf_rss_unload(struct EBPFRSSContext *ctx)
+{
+
+}
diff --git a/ebpf/ebpf_rss.c b/ebpf/ebpf_rss.c
new file mode 100644
index 00..118c68da83
--- /dev/null
+++ b/ebpf/ebpf_rss.c
@@ -0,0 +1,165 @@
+/*
+ * eBPF RSS loader
+ *
+ * Developed by Daynix Computing LTD (http://www.daynix.com)
+ *
+ * Authors:
+ *  Andrew Melnychenko 
+ *  Yuri Benditovich 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+
+#include 
+#include 
+
+#include "hw/virtio/virtio-net.h" /* VIRTIO_NET_RSS_MAX_TABLE_LEN */
+
+#include "ebpf/ebpf_rss.h"
+#include "ebpf/rss.bpf.skeleton.h"
+#include "trace.h"
+
+void ebpf_rss_init(struct EBPFRSSContext *ctx)
+{
+if (ctx != NULL) {
+ctx->obj = NULL;
+}
+}
+
+bool ebpf_rss_is_loaded(struct EBPFRSSContext *ctx)
+{
+return ctx != NULL && ctx->obj != NULL;
+}
+
+bool ebpf_rss_load(struct EBPFRSSContext *ctx)
+{
+struct rss_bpf *rss_bpf_ctx;
+
+if (ctx == NULL) {
+return false;
+}
+
+rss_bpf_ctx = rss_bpf__open();
+if (rss_bpf_ctx == NULL) {
+trace_ebpf_error("eBPF RSS", "can not open eBPF RSS object");
+goto error;
+}
+
+bpf_program__set_socket_filter(rss_bpf_ctx->progs.tun_rss_steering_prog);
+
+if (rss_bpf__load(rss_bpf_ctx)) {
+trace_ebpf_error("eBPF RSS", "can not load RSS program");
+goto error;
+}
+
+ctx->obj = rss_bpf_ctx;
+ctx->pro

[PATCH v6 3/7] ebpf: Added eBPF RSS program.

From: Andrew 

RSS program and Makefile to build it.
The bpftool used to generate '.h' file.
The data in that file may be loaded by libbpf.
EBPF compilation is not required for building qemu.
You can use Makefile if you need to regenerate rss.bpf.skeleton.h.

Signed-off-by: Yuri Benditovich 
Signed-off-by: Andrew Melnychenko 
---
 tools/ebpf/Makefile.ebpf |  22 ++
 tools/ebpf/rss.bpf.c | 569 +++
 2 files changed, 591 insertions(+)
 create mode 100755 tools/ebpf/Makefile.ebpf
 create mode 100644 tools/ebpf/rss.bpf.c

diff --git a/tools/ebpf/Makefile.ebpf b/tools/ebpf/Makefile.ebpf
new file mode 100755
index 00..45b5551fc7
--- /dev/null
+++ b/tools/ebpf/Makefile.ebpf
@@ -0,0 +1,22 @@
+OBJS = rss.bpf.o
+
+LLC ?= llc
+CLANG ?= clang
+INC_FLAGS = `$(CLANG) -print-file-name=include`
+EXTRA_CFLAGS ?= -O2 -emit-llvm -fno-stack-protector
+
+all: $(OBJS)
+
+.PHONY: clean
+
+clean:
+   rm -f $(OBJS)
+
+$(OBJS):  %.o:%.c
+   $(CLANG) $(INC_FLAGS) \
+-D__KERNEL__ -D__ASM_SYSREG_H \
+-I../include $(LINUXINCLUDE) \
+$(EXTRA_CFLAGS) -c $< -o -| $(LLC) -march=bpf -filetype=obj -o 
$@
+   bpftool gen skeleton rss.bpf.o > rss.bpf.skeleton.h
+   cp rss.bpf.skeleton.h ../../ebpf/
+
diff --git a/tools/ebpf/rss.bpf.c b/tools/ebpf/rss.bpf.c
new file mode 100644
index 00..9d57ab4212
--- /dev/null
+++ b/tools/ebpf/rss.bpf.c
@@ -0,0 +1,569 @@
+/*
+ * eBPF RSS program
+ *
+ * Developed by Daynix Computing LTD (http://www.daynix.com)
+ *
+ * Authors:
+ *  Andrew Melnychenko 
+ *  Yuri Benditovich 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Prepare:
+ * Requires llvm, clang, bpftool, linux kernel tree
+ *
+ * Build rss.bpf.skeleton.h:
+ * make -f Makefile.ebpf clean all
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#define INDIRECTION_TABLE_SIZE 128
+#define HASH_CALCULATION_BUFFER_SIZE 36
+
+struct rss_config_t {
+__u8 redirect;
+__u8 populate_hash;
+__u32 hash_types;
+__u16 indirections_len;
+__u16 default_queue;
+} __attribute__((packed));
+
+struct toeplitz_key_data_t {
+__u32 leftmost_32_bits;
+__u8 next_byte[HASH_CALCULATION_BUFFER_SIZE];
+};
+
+struct packet_hash_info_t {
+__u8 is_ipv4;
+__u8 is_ipv6;
+__u8 is_udp;
+__u8 is_tcp;
+__u8 is_ipv6_ext_src;
+__u8 is_ipv6_ext_dst;
+__u8 is_fragmented;
+
+__u16 src_port;
+__u16 dst_port;
+
+union {
+struct {
+__be32 in_src;
+__be32 in_dst;
+};
+
+struct {
+struct in6_addr in6_src;
+struct in6_addr in6_dst;
+struct in6_addr in6_ext_src;
+struct in6_addr in6_ext_dst;
+};
+};
+};
+
+struct bpf_map_def SEC("maps")
+tap_rss_map_configurations = {
+.type= BPF_MAP_TYPE_ARRAY,
+.key_size= sizeof(__u32),
+.value_size  = sizeof(struct rss_config_t),
+.max_entries = 1,
+};
+
+struct bpf_map_def SEC("maps")
+tap_rss_map_toeplitz_key = {
+.type= BPF_MAP_TYPE_ARRAY,
+.key_size= sizeof(__u32),
+.value_size  = sizeof(struct toeplitz_key_data_t),
+.max_entries = 1,
+};
+
+struct bpf_map_def SEC("maps")
+tap_rss_map_indirection_table = {
+.type= BPF_MAP_TYPE_ARRAY,
+.key_size= sizeof(__u32),
+.value_size  = sizeof(__u16),
+.max_entries = INDIRECTION_TABLE_SIZE,
+};
+
+static inline void net_rx_rss_add_chunk(__u8 *rss_input, size_t *bytes_written,
+const void *ptr, size_t size) {
+__builtin_memcpy(&rss_input[*bytes_written], ptr, size);
+*bytes_written += size;
+}
+
+static inline
+void net_toeplitz_add(__u32 *result,
+  __u8 *input,
+  __u32 len
+, struct toeplitz_key_data_t *key) {
+
+__u32 accumulator = *result;
+__u32 leftmost_32_bits = key->leftmost_32_bits;
+__u32 byte;
+
+for (byte = 0; byte < HASH_CALCULATION_BUFFER_SIZE; byte++) {
+__u8 input_byte = input[byte];
+__u8 key_byte = key->next_byte[byte];
+__u8 bit;
+
+for (bit = 0; bit < 8; bit++) {
+if (input_byte & (1 << 7)) {
+accumulator ^= leftmost_32_bits;
+}
+
+leftmost_32_bits =
+(leftmost_32_bits << 1) | ((key_byte & (1 << 7)) >> 7);
+
+input_byte <<= 1;
+key_byte <<= 1;
+}
+}
+
+*result = accumulator;
+}
+
+
+static inline int ip6_extension_header_type(__u8 hdr_type)
+{
+switch (hdr_type) {
+case IPPROTO_HOPOPTS:
+case IPPROTO_ROUTING:
+case IPPROTO_FRAGMENT:
+case IPPROTO_ICMPV6:
+case IPPROTO_NONE:
+case IPPROTO_DSTOPTS:
+case IPPROTO_MH:
+

[PATCH v6 7/7] MAINTAINERS: Added eBPF maintainers information.

From: Andrew 

Signed-off-by: Yuri Benditovich 
Signed-off-by: Andrew Melnychenko 
---
 MAINTAINERS | 8 
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 58f342108e..e05a9fd9f6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3311,6 +3311,14 @@ F: include/hw/remote/proxy-memory-listener.h
 F: hw/remote/iohub.c
 F: include/hw/remote/iohub.h
 
+EBPF:
+M: Jason Wang 
+R: Andrew Melnychenko 
+R: Yuri Benditovich 
+S: Maintained
+F: ebpf/*
+F: tools/ebpf/*
+
 Build and test automation
 -
 Build and test automation, general continuous integration
-- 
2.31.0

Re: [PATCH v5 6/6] virtio-pci: add support for configure interrupt

On Fri, Apr 9, 2021 at 3:39 PM Jason Wang  wrote:
>
>
> 在 2021/4/8 下午5:38, Cindy Lu 写道:
> > Add support for configure interrupt, use kvm_irqfd_assign and set the
> > gsi to kernel. When the configure notifier was eventfd_signal by host
> > kernel, this will finally inject an msix interrupt to guest
> >
> > Signed-off-by: Cindy Lu 
> > ---
> >   hw/virtio/virtio-pci.c | 91 --
> >   1 file changed, 88 insertions(+), 3 deletions(-)
> >
> > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> > index 481f6e7505..7b02f42c06 100644
> > --- a/hw/virtio/virtio-pci.c
> > +++ b/hw/virtio/virtio-pci.c
> > @@ -664,7 +664,6 @@ static uint32_t virtio_read_config(PCIDevice *pci_dev,
> >   }
> >
> >   static int kvm_virtio_pci_vq_vector_use(VirtIOPCIProxy *proxy,
> > -unsigned int queue_no,
> >   unsigned int vector)
> >   {
> >   VirtIOIRQFD *irqfd = &proxy->vector_irqfd[vector];
> > @@ -726,7 +725,7 @@ static int kvm_virtio_pci_vector_use(VirtIOPCIProxy 
> > *proxy, int nvqs)
> >   if (vector >= msix_nr_vectors_allocated(dev)) {
> >   continue;
> >   }
> > -ret = kvm_virtio_pci_vq_vector_use(proxy, queue_no, vector);
> > +ret = kvm_virtio_pci_vq_vector_use(proxy,  vector);
> >   if (ret < 0) {
> >   goto undo;
> >   }
> > @@ -760,6 +759,56 @@ undo:
> >   }
> >   return ret;
> >   }
> > +static int kvm_virtio_pci_vector_config_use(VirtIOPCIProxy *proxy)
> > +{
> > +VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
> > +unsigned int vector;
> > +int ret;
> > +EventNotifier *n = virtio_get_config_notifier(vdev);
> > +vector = vdev->config_vector ;
> > +ret = kvm_virtio_pci_vq_vector_use(proxy, vector);
> > +if (ret < 0) {
> > +goto undo;
> > +}
> > +ret = kvm_virtio_pci_irqfd_use(proxy,  n, vector);
>
>
> So the kvm_virtio_pci_vector_use() has the following codes:
>
>  /* If guest supports masking, set up irqfd now.
>   * Otherwise, delay until unmasked in the frontend.
>   */
>  if (vdev->use_guest_notifier_mask && k->guest_notifier_mask) {
>  vq = virtio_get_queue(vdev, queue_no);
>  n = virtio_queue_get_guest_notifier(vq);
>  ret = kvm_virtio_pci_irqfd_use(proxy, n, vector);
>  if (ret < 0) {
>  kvm_virtio_pci_vq_vector_release(proxy, vector);
>  goto undo;
>  }
>  }
>
> Do we need to check the masking support here as well?
>
> Btw, I think we can factor out the core logic (decouple the queue_no) of
> kvm_virtio_pci_vector_user() and let it be reused by config interrupt.
>
sure , I will rewrite this part
>
> > +if (ret < 0) {
> > +goto undo;
> > +}
> > +return 0;
> > +undo:
> > +kvm_virtio_pci_irqfd_release(proxy, n, vector);
> > +return ret;
> > +}
> > +static void kvm_virtio_pci_vector_config_release(VirtIOPCIProxy *proxy)
> > +{
> > +PCIDevice *dev = &proxy->pci_dev;
> > +VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
> > +unsigned int vector;
> > +EventNotifier *n = virtio_get_config_notifier(vdev);
> > +vector = vdev->config_vector ;
> > +if (vector >= msix_nr_vectors_allocated(dev)) {
> > +return;
> > +}
> > +kvm_virtio_pci_irqfd_release(proxy, n, vector);
> > +kvm_virtio_pci_vq_vector_release(proxy, vector);
> > +}
> > +static int virtio_pci_set_config_notifier(DeviceState *d,  bool assign)
> > +{
> > +VirtIOPCIProxy *proxy = to_virtio_pci_proxy(d);
> > +VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
> > +EventNotifier *notifier = virtio_get_config_notifier(vdev);
> > +int r = 0;
> > +if (assign) {
> > +r = event_notifier_init(notifier, 0);
> > +virtio_set_config_notifier_fd_handler(vdev, true, true);
> > +kvm_virtio_pci_vector_config_use(proxy);
> > +} else {
> > +virtio_set_config_notifier_fd_handler(vdev, false, true);
>
>
> So let's try to unify the codes between this and
> virtio_queue_set_guest_notifier_fd_handler(). Bascailly it's just
> decouple virtqueue *.
>
sure will fix this
>
> > +kvm_virtio_pci_vector_config_release(proxy);
> > +event_notifier_cleanup(notifier);
> > +}
> > +return r;
> > +}
> >
> >   static void kvm_virtio_pci_vector_release(VirtIOPCIProxy *proxy, int nvqs)
> >   {
> > @@ -858,6 +907,14 @@ static int virtio_pci_vector_unmask(PCIDevice *dev, 
> > unsigned vector,
> >   EventNotifier *n;
> >   int ret, index, unmasked = 0;
> >
> > +   if (vdev->use_config_notifier == true) {
> > +n = virtio_get_config_notifier(vdev);
> > +ret = virtio_pci_vq_vector_unmask(proxy, 0, vector, msg,
> > +VIRTIO_CONFIG_VECTOR, n);
> > +if (ret < 0) {
> > +goto config_undo;
> > +   }

Re: [RFC PATCH 3/3] hw/arm/smmuv3: Post-load stage 1 configurations to the host

2021-04-12 Thread Auger Eric

Hi Kunkun,
On 2/19/21 10:42 AM, Kunkun Jiang wrote:
> In nested mode, we call the set_pasid_table() callback on each STE
> update to pass the guest stage 1 configuration to the host and
> apply it at physical level.
> 
> In the case of live migration, we need to manual call the
s/manual/manually
> set_pasid_table() to load the guest stage 1 configurations to the
> host. If this operation is fail, the migration is fail.
s/If this operation is fail, the migration is fail/If this operation
fails, the migration fails.
> 
> Signed-off-by: Kunkun Jiang 
> ---
>  hw/arm/smmuv3.c | 60 +
>  hw/arm/trace-events |  1 +
>  2 files changed, 61 insertions(+)
> 
> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> index 6c6ed84e78..94ca15375c 100644
> --- a/hw/arm/smmuv3.c
> +++ b/hw/arm/smmuv3.c
> @@ -1473,6 +1473,65 @@ static void smmu_realize(DeviceState *d, Error **errp)
>  smmu_init_irq(s, dev);
>  }
>  
> +static int smmuv3_manual_set_pci_device_pasid_table(SMMUDevice *sdev)
Can't you retrieve the associated sid and then call
smmuv3_notify_config_change()
> +{
> +#ifdef __linux__
> +IOMMUMemoryRegion *mr = &(sdev->iommu);
> +int sid = smmu_get_sid(sdev);
> +SMMUEventInfo event = {.type = SMMU_EVT_NONE, .sid = sid,
> +   .inval_ste_allowed = true};
> +IOMMUConfig iommu_config = {};
> +SMMUTransCfg *cfg;
> +int ret = -1;
> +
> +cfg = smmuv3_get_config(sdev, &event);
> +if (!cfg) {
> +return ret;
> +}
> +
> +iommu_config.pasid_cfg.argsz = sizeof(struct iommu_pasid_table_config);
> +iommu_config.pasid_cfg.version = PASID_TABLE_CFG_VERSION_1;
> +iommu_config.pasid_cfg.format = IOMMU_PASID_FORMAT_SMMUV3;
> +iommu_config.pasid_cfg.base_ptr = cfg->s1ctxptr;
> +iommu_config.pasid_cfg.pasid_bits = 0;
> +iommu_config.pasid_cfg.vendor_data.smmuv3.version = 
> PASID_TABLE_SMMUV3_CFG_VERSION_1;
> +
> +if (cfg->disabled || cfg->bypassed) {
> +iommu_config.pasid_cfg.config = IOMMU_PASID_CONFIG_BYPASS;
> +} else if (cfg->aborted) {
> +iommu_config.pasid_cfg.config = IOMMU_PASID_CONFIG_ABORT;
> +} else {
> +iommu_config.pasid_cfg.config = IOMMU_PASID_CONFIG_TRANSLATE;
> +}
> +
> +ret = pci_device_set_pasid_table(sdev->bus, sdev->devfn, &iommu_config);
> +if (ret) {
> +error_report("Failed to pass PASID table to host for iommu mr %s 
> (%m)",
> + mr->parent_obj.name);
> +}
> +
> +return ret;
> +#endif
> +}
> +
> +static int smmuv3_post_load(void *opaque, int version_id)
> +{
> +SMMUv3State *s3 = opaque;
> +SMMUState *s = &(s3->smmu_state);
> +SMMUDevice *sdev;
> +int ret = 0;
> +
> +QLIST_FOREACH(sdev, &s->devices_with_notifiers, next) {
> +trace_smmuv3_post_load_sdev(sdev->devfn, 
> sdev->iommu.parent_obj.name);
> +ret = smmuv3_manual_set_pci_device_pasid_table(sdev);
> +if (ret) {
> +break;
> +}
> +}
> +
> +return ret;
> +}
> +
>  static const VMStateDescription vmstate_smmuv3_queue = {
>  .name = "smmuv3_queue",
>  .version_id = 1,
> @@ -1491,6 +1550,7 @@ static const VMStateDescription vmstate_smmuv3 = {
>  .version_id = 1,
>  .minimum_version_id = 1,
>  .priority = MIG_PRI_IOMMU,
> +.post_load = smmuv3_post_load,
>  .fields = (VMStateField[]) {
>  VMSTATE_UINT32(features, SMMUv3State),
>  VMSTATE_UINT8(sid_size, SMMUv3State),
> diff --git a/hw/arm/trace-events b/hw/arm/trace-events
> index 35e562ab74..caa864dd72 100644
> --- a/hw/arm/trace-events
> +++ b/hw/arm/trace-events
> @@ -53,4 +53,5 @@ smmuv3_notify_flag_add(const char *iommu) "ADD SMMUNotifier 
> node for iommu mr=%s
>  smmuv3_notify_flag_del(const char *iommu) "DEL SMMUNotifier node for iommu 
> mr=%s"
>  smmuv3_inv_notifiers_iova(const char *name, uint16_t asid, uint64_t iova, 
> uint8_t tg, uint64_t num_pages) "iommu mr=%s asid=%d iova=0x%"PRIx64" tg=%d 
> num_pages=0x%"PRIx64
>  smmuv3_notify_config_change(const char *name, uint8_t config, uint64_t 
> s1ctxptr) "iommu mr=%s config=%d s1ctxptr=0x%"PRIx64
> +smmuv3_post_load_sdev(int devfn, const char *name) "sdev devfn=%d iommu 
> mr=%s"PRIx64
>  
> 
Thanks

Eric

Re: [PATCH v3] hw/block/nvme: add device self test command support

2021-04-12 Thread Gollu Appalanaidu


On Mon, Apr 12, 2021 at 01:57:49PM +0530, Gollu Appalanaidu wrote:

On Sat, Apr 10, 2021 at 12:35:20AM +0900, Keith Busch wrote:

On Wed, Mar 31, 2021 at 02:54:27PM +0530, Gollu Appalanaidu wrote:

This is to add support for Device Self Test Command (DST) and
DST Log Page. Refer NVM Express specification 1.4b section 5.8
("Device Self-test command")


Please don't write change logs that just say what you did. I can read
the code to see that. Explain why this is useful because this frankly
looks like another useless feature. We don't need to implement every
optional spec feature here. There should be a real value proposition.


Hi Keith,
It was useful to us to be able to test the feature against qemu - and
we wanted to contribute the code, but we understand that features should
be more "complete" for upstreaming.

New features for SPDK (and nvme-cli) are use-cases for optional features
like this, where one might not have physical device available and also users
who is going to develop their in house host test tool this would be useful,
since we are providing the functional behaviour as per the NVMe protocol.


Hi Keith,

It was useful to us to be able to test the feature against qemu - and
we wanted to contribute the code, but we understand that features should
be more "complete" for upstreaming.

New features for SPDK (and nvme-cli) are use-cases for optional features
like this, where one might not have physical device available and also users
who is going to develop their in house host test tool this would be useful,
since we are providing the functional behaviour as per the NVMe protocol.

Re: [PATCH v6 0/7] eBPF RSS support for virtio-net

2021-04-12 Thread no-reply

Patchew URL: https://patchew.org/QEMU/20210412082512.14998-1-and...@daynix.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20210412082512.14998-1-and...@daynix.com
Subject: [PATCH v6 0/7] eBPF RSS support for virtio-net

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] patchew/20210412082512.14998-1-and...@daynix.com -> 
patchew/20210412082512.14998-1-and...@daynix.com
Switched to a new branch 'test'
48992c9 MAINTAINERS: Added eBPF maintainers information.
56c94d0 docs: Added eBPF documentation.
ec7b685 virtio-net: Added eBPF RSS to virtio-net.
9731e1e ebpf: Added eBPF RSS loader.
9cf91d5 ebpf: Added eBPF RSS program.
1556709 net: Added SetSteeringEBPF method for NetClientState.
79b1d59 net/tap: Added TUNSETSTEERINGEBPF code.

=== OUTPUT BEGIN ===
1/7 Checking commit 79b1d59a6332 (net/tap: Added TUNSETSTEERINGEBPF code.)
2/7 Checking commit 15567095a567 (net: Added SetSteeringEBPF method for 
NetClientState.)
3/7 Checking commit 9cf91d50d877 (ebpf: Added eBPF RSS program.)
Use of uninitialized value $acpi_testexpected in string eq at 
./scripts/checkpatch.pl line 1529.
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#22: 
new file mode 100755

WARNING: line over 80 characters
#211: FILE: tools/ebpf/rss.bpf.c:157:
+ * According to 
https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml

WARNING: line over 80 characters
#214: FILE: tools/ebpf/rss.bpf.c:160:
+ * Need to choose reasonable amount of maximum extensions/options we may check 
to find

WARNING: line over 80 characters
#281: FILE: tools/ebpf/rss.bpf.c:227:
+*l4_offset + opt_offset + offsetof(struct 
ipv6_destopt_hao, addr),

total: 0 errors, 4 warnings, 591 lines checked

Patch 3/7 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
4/7 Checking commit 9731e1e3d2ce (ebpf: Added eBPF RSS loader.)
Use of uninitialized value $acpi_testexpected in string eq at 
./scripts/checkpatch.pl line 1529.
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#71: 
new file mode 100644

WARNING: architecture specific defines should be avoided
#353: FILE: ebpf/rss.bpf.skeleton.h:4:
+#ifndef __RSS_BPF_SKEL_H__

ERROR: code indent should never use tabs
#360: FILE: ebpf/rss.bpf.skeleton.h:11:
+^Istruct bpf_object_skeleton *skeleton;$

ERROR: code indent should never use tabs
#361: FILE: ebpf/rss.bpf.skeleton.h:12:
+^Istruct bpf_object *obj;$

ERROR: code indent should never use tabs
#362: FILE: ebpf/rss.bpf.skeleton.h:13:
+^Istruct {$

ERROR: code indent should never use tabs
#363: FILE: ebpf/rss.bpf.skeleton.h:14:
+^I^Istruct bpf_map *tap_rss_map_configurations;$

ERROR: code indent should never use tabs
#364: FILE: ebpf/rss.bpf.skeleton.h:15:
+^I^Istruct bpf_map *tap_rss_map_indirection_table;$

ERROR: code indent should never use tabs
#365: FILE: ebpf/rss.bpf.skeleton.h:16:
+^I^Istruct bpf_map *tap_rss_map_toeplitz_key;$

ERROR: code indent should never use tabs
#366: FILE: ebpf/rss.bpf.skeleton.h:17:
+^I} maps;$

ERROR: code indent should never use tabs
#367: FILE: ebpf/rss.bpf.skeleton.h:18:
+^Istruct {$

ERROR: code indent should never use tabs
#368: FILE: ebpf/rss.bpf.skeleton.h:19:
+^I^Istruct bpf_program *tun_rss_steering_prog;$

ERROR: code indent should never use tabs
#369: FILE: ebpf/rss.bpf.skeleton.h:20:
+^I} progs;$

ERROR: code indent should never use tabs
#370: FILE: ebpf/rss.bpf.skeleton.h:21:
+^Istruct {$

ERROR: code indent should never use tabs
#371: FILE: ebpf/rss.bpf.skeleton.h:22:
+^I^Istruct bpf_link *tun_rss_steering_prog;$

ERROR: code indent should never use tabs
#372: FILE: ebpf/rss.bpf.skeleton.h:23:
+^I} links;$

ERROR: code indent should never use tabs
#378: FILE: ebpf/rss.bpf.skeleton.h:29:
+^Iif (!obj)$

ERROR: braces {} are necessary for all arms of this statement
#378: FILE: ebpf/rss.bpf.skeleton.h:29:
+   if (!obj)
[...]

ERROR: code indent should never use tabs
#379: FILE: ebpf/rss.bpf.skeleton.h:30:
+^I^Ireturn;$

ERROR: code indent should never use tabs
#380: FILE: ebpf/rss.bpf.skeleton.h:31:
+^Iif (obj->skeleton)$

ERROR: braces {} are necessary for all arms of this statement
#380: FILE: ebpf/rss.bpf.skeleton.h:31:
+   if (obj->skeleton)
[...]

ERROR: code indent should never use tabs
#381: FILE: ebpf/rss.bpf.skeleton.h:32:
+^I^Ibpf_object__destroy_skeleton(obj->skeleton);$

ERROR: code indent should never use tabs
#382: FILE: ebpf/rss.bpf.skeleton.h:33:
+^Ifree(obj);$

ERROR: code indent should never use tabs
#391: FILE: ebpf/rss.bpf.skele

Re: [RFC PATCH 0/3] Add migration support for VFIO PCI devices in SMMUv3 nested stage mode

2021-04-12 Thread Auger Eric

Hi Kunkun,

On 2/19/21 10:42 AM, Kunkun Jiang wrote:
> Hi all,
> 
> Since the SMMUv3's nested translation stages[1] has been introduced by Eric, 
> we
> need to pay attention to the migration of VFIO PCI devices in SMMUv3 nested 
> stage
> mode. At present, it is not yet supported in QEMU. There are two problems in 
> the
> existing framework.
> 
> First, the current way to get dirty pages is not applicable to nested stage 
> mode.
> Because of the "Caching Mode", VTD can map the RAM through the host single 
> stage
> (giova->hpa). "vfio_listener_log_sync" gets dirty pages by transferring 
> "giova"
> to the kernel for the RAM block section of mapped MMIO region. In nested stage
> mode, we setup the stage 2 (gpa->hpa) and the stage 1(giova->gpa) separately. 
> So
> it is inapplicable to get dirty pages by the current way in nested stage mode.
> 
> Second, it also need to pass stage 1 configurations to the destination host 
> after
> the migration. In Eric's patch, it passes the stage 1 configuration to the 
> host on
> each STE update for the devices set the PASID PciOps. The configuration will 
> be
> applied at physical level. But the data of physical level will not be sent to 
> the
> destination host. So we have to pass stage 1 configurations to the destination
> host after the migration.
> 
> This Patch set includes patches as below:
> Patch 1-2:
> - Refactor the vfio_listener_log_sync and added a new function to get dirty 
> pages
> in nested stage mode.
> 
> Patch 3:
> - Added the post_load function to vmstate_smmuv3 for passing stage 1 
> configuration
> to the destination host after the migration.
> 
> @Eric, Could you please add this Patch set to your future version of
> "vSMMUv3/pSMMUv3 2 stage VFIO integration", if you think this Patch set makes 
> sense? :)
First of all, thank you for working on this. As you may have noticed I
sent a new RFC version yesterday (without including this). When time
allows, you may have a look at the comments I posted on your series. I
don't think I can test it at the moment so I may prefer to keep it
separate. Also be aware that the QEMU integration of nested has not
received much comments yet and is likely to evolve. The priority is to
get some R-b's on the kernel pieces, especially the SMMU part. With this
dependency resolved, things can't move forward I am afraid.

Thanks

Eric
> 
> Best Regards
> Kunkun Jiang
> 
> [1] [RFC,v7,00/26] vSMMUv3/pSMMUv3 2 stage VFIO integration
> http://patchwork.ozlabs.org/project/qemu-devel/cover/20201116181349.11908-1-eric.au...@redhat.com/
> 
> Kunkun Jiang (3):
>   vfio: Introduce helpers to mark dirty pages of a RAM section
>   vfio: Add vfio_prereg_listener_log_sync in nested stage
>   hw/arm/smmuv3: Post-load stage 1 configurations to the host
> 
>  hw/arm/smmuv3.c | 60 +
>  hw/arm/trace-events |  1 +
>  hw/vfio/common.c| 47 +--
>  3 files changed, 100 insertions(+), 8 deletions(-)
>

[Bug 1915063] Re: Windows 10 wil not install using qemu-system-x86_64

2021-04-12 Thread Andy Whitcroft

The commit in question is marked for stable:

  commit 841c2be09fe4f495fe5224952a419bd8c7e5b455
  Author: Maxim Levitsky 
  Date:   Wed Jul 8 14:57:31 2020 +0300

kvm: x86: replace kvm_spec_ctrl_test_value with runtime test on the host

To avoid complex and in some cases incorrect logic in
kvm_spec_ctrl_test_value, just try the guest's given value on the host
processor instead, and if it doesn't #GP, allow the guest to set it.

One such case is when host CPU supports STIBP mitigation
but doesn't support IBRS (as is the case with some Zen2 AMD cpus),
and in this case we were giving guest #GP when it tried to use STIBP

The reason why can can do the host test is that IA32_SPEC_CTRL msr is
passed to the guest, after the guest sets it to a non zero value
for the first time (due to performance reasons),
and as as result of this, it is pointless to emulate #GP condition on
this first access, in a different way than what the host CPU does.

This is based on a patch from Sean Christopherson, who suggested this idea.

Fixes: 6441fa6178f5 ("KVM: x86: avoid incorrect writes to host 
MSR_IA32_SPEC_CTRL")
Cc: sta...@vger.kernel.org
Suggested-by: Sean Christopherson 
Signed-off-by: Maxim Levitsky 
Message-Id: <20200708115731.180097-1-mlevi...@redhat.com>
Signed-off-by: Paolo Bonzini 

It appears to be in `v5.4.102` which is currently queued up for the
cycle following the one just starting.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1915063

Title:
  Windows 10 wil not install using qemu-system-x86_64

Status in QEMU:
  New
Status in linux package in Ubuntu:
  Confirmed
Status in linux-oem-5.10 package in Ubuntu:
  Fix Released
Status in linux-oem-5.6 package in Ubuntu:
  Confirmed
Status in qemu package in Ubuntu:
  Invalid

Bug description:
  Steps to reproduce
  install virt-manager and ovmf if nopt already there
  copy windows and virtio iso files to /var/lib/libvirt/images

  Use virt-manager from local machine to create your VMs with the disk, CPUs 
and memory required
  Select customize configuration then select OVMF(UEFI) instead of seabios
  set first CDROM to the windows installation iso (enable in boot options)
  add a second CDROM and load with the virtio iso
change spice display to VNC

Always get a security error from windows and it fails to launch the 
installer (works on RHEL and Fedora)
  I tried updating the qemu version from Focals 4.2 to Groovy 5.0 which was of 
no help
  --- 
  ProblemType: Bug
  ApportVersion: 2.20.11-0ubuntu27.14
  Architecture: amd64
  CasperMD5CheckResult: skip
  CurrentDesktop: ubuntu:GNOME
  DistributionChannelDescriptor:
   # This is the distribution channel descriptor for the OEM CDs
   # For more information see 
http://wiki.ubuntu.com/DistributionChannelDescriptor
   
canonical-oem-sutton-focal-amd64-20201030-422+pc-sutton-bachman-focal-amd64+X00
  DistroRelease: Ubuntu 20.04
  InstallationDate: Installed on 2021-01-20 (19 days ago)
  InstallationMedia: Ubuntu 20.04 "Focal" - Build amd64 LIVE Binary 
20201030-14:39
  MachineType: LENOVO 30E102Z
  NonfreeKernelModules: nvidia_modeset nvidia
  Package: linux (not installed)
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 EFI VGA
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.6.0-1042-oem 
root=UUID=389cd165-fc52-4814-b837-a1090b9c2387 ro locale=en_US quiet splash 
vt.handoff=7
  ProcVersionSignature: Ubuntu 5.6.0-1042.46-oem 5.6.19
  RelatedPackageVersions:
   linux-restricted-modules-5.6.0-1042-oem N/A
   linux-backports-modules-5.6.0-1042-oem  N/A
   linux-firmware  1.187.8
  RfKill:
   
  Tags:  focal
  Uname: Linux 5.6.0-1042-oem x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: adm cdrom dip docker kvm libvirt lpadmin plugdev sambashare sudo
  _MarkForUpload: True
  dmi.bios.date: 07/29/2020
  dmi.bios.vendor: LENOVO
  dmi.bios.version: S07KT08A
  dmi.board.name: 1046
  dmi.board.vendor: LENOVO
  dmi.board.version: Not Defined
  dmi.chassis.type: 3
  dmi.chassis.vendor: LENOVO
  dmi.chassis.version: None
  dmi.modalias: 
dmi:bvnLENOVO:bvrS07KT08A:bd07/29/2020:svnLENOVO:pn30E102Z:pvrThinkStationP620:rvnLENOVO:rn1046:rvrNotDefined:cvnLENOVO:ct3:cvrNone:
  dmi.product.family: INVALID
  dmi.product.name: 30E102Z
  dmi.product.sku: LENOVO_MT_30E1_BU_Think_FM_ThinkStation P620
  dmi.product.version: ThinkStation P620
  dmi.sys.vendor: LENOVO

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1915063/+subscriptions

Re: [PATCH for-6.0] block/nbd: fix possible use after free of s->connect_thread

2021-04-12 Thread Roman Kagan

On Tue, Apr 06, 2021 at 06:51:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> If on nbd_close() we detach the thread (in
> nbd_co_establish_connection_cancel() thr->state becomes
> CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use
> s->connect_thread (which is set to NULL), as running thread may free it
> at any time.
> 
> Still nbd_co_establish_connection() does exactly this: it saves
> s->connect_thread to local variable (just for better code style) and
> use it even after yield point, when thread may be already detached.
> 
> Fix that. Also check thr to be non-NULL on
> nbd_co_establish_connection() start for safety.
> 
> After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes
> impossible in the second switch in nbd_co_establish_connection().
> Still, don't add extra abort() just before the release. If it somehow
> possible to reach this "case:" it won't hurt. Anyway, good refactoring
> of all this reconnect mess will come soon.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
> 
> Hi all! I faced a crash, just running 277 iotest in a loop. I can't
> reproduce it on master, it reproduces only on my branch with nbd
> reconnect refactorings.
> 
> Still, it seems very possible that it may crash under some conditions.
> So I propose this patch for 6.0. It's written so that it's obvious that
> it will not hurt:
> 
>  pre-patch, on first hunk we'll just crash if thr is NULL,
>  on second hunk it's safe to return -1, and using thr when
>  s->connect_thread is already zeroed is obviously wrong.
> 
>  block/nbd.c | 11 +++
>  1 file changed, 11 insertions(+)

Can we please get it merged in 6.0 as it's a genuine crasher fix?

Reviewed-by: Roman Kagan

Re: [PATCH for-6.0] block/nbd: fix possible use after free of s->connect_thread

2021-04-12 Thread Vladimir Sementsov-Ogievskiy


12.04.2021 11:45, Roman Kagan wrote:

On Tue, Apr 06, 2021 at 06:51:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:

If on nbd_close() we detach the thread (in
nbd_co_establish_connection_cancel() thr->state becomes
CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use
s->connect_thread (which is set to NULL), as running thread may free it
at any time.

Still nbd_co_establish_connection() does exactly this: it saves
s->connect_thread to local variable (just for better code style) and
use it even after yield point, when thread may be already detached.

Fix that. Also check thr to be non-NULL on
nbd_co_establish_connection() start for safety.

After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes
impossible in the second switch in nbd_co_establish_connection().
Still, don't add extra abort() just before the release. If it somehow
possible to reach this "case:" it won't hurt. Anyway, good refactoring
of all this reconnect mess will come soon.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

Hi all! I faced a crash, just running 277 iotest in a loop. I can't
reproduce it on master, it reproduces only on my branch with nbd
reconnect refactorings.

Still, it seems very possible that it may crash under some conditions.
So I propose this patch for 6.0. It's written so that it's obvious that
it will not hurt:

  pre-patch, on first hunk we'll just crash if thr is NULL,
  on second hunk it's safe to return -1, and using thr when
  s->connect_thread is already zeroed is obviously wrong.

  block/nbd.c | 11 +++
  1 file changed, 11 insertions(+)


Can we please get it merged in 6.0 as it's a genuine crasher fix?

Reviewed-by: Roman Kagan 



Thanks! I'm applying it to my nbd branch and will send PULL request for rc3 
today

--
Best regards,
Vladimir

Re: gitlab-ci check-dco test

2021-04-12 Thread Daniel P . Berrangé

On Sat, Apr 10, 2021 at 11:03:19AM +0200, Philippe Mathieu-Daudé wrote:
> On 4/10/21 4:58 AM, Richard Henderson wrote:
> > On development branches, it's not uncommon to push
> > temporary --fixup patches, and normally one doesn't
> > sign those.  But then of course one get hate-mail
> > from the gitlab-ci job about the failing test.
> > 
> > Is there a way to make it fatal on staging, but
> > merely a warning on other branches (a-la checkpatch)?
> 
> To only run check-dco on branch /staging on any namespace:
> 
> -- >8 --
> diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
> index 3480d79db3a..f0d21da57f0 100644
> --- a/.gitlab-ci.yml
> +++ b/.gitlab-ci.yml
> @@ -781,9 +781,9 @@ check-dco:
>needs:
>  job: amd64-centos8-container
>script: .gitlab-ci.d/check-dco.py
> -  except:
> +  only:
>  variables:
> -  - $CI_PROJECT_NAMESPACE == 'qemu-project' && $CI_COMMIT_BRANCH ==
> 'master'
> +  - $CI_COMMIT_BRANCH == 'staging'
>variables:
>  GIT_DEPTH: 1000

Definitely don't want that - it skips the DCO check entirely on all
branches except 'staging'. We want contributors to see the missing
SoB on any of their branches *before* they send them to qemu-devel
at all.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [RFC v12 27/65] target/arm: split a15 cpu model and 32bit class functions to cpu32.c

2021-04-12 Thread Claudio Fontana

Hi Peter,

On 4/8/21 12:36 PM, Peter Maydell wrote:
> On Thu, 8 Apr 2021 at 11:23, Claudio Fontana  wrote:
>> Mainly for this code here a question from my side: is the current code 
>> actually already "wrong"?
>>
>> I mean, we unconditionally set the aarch64-capable cpu classes to all use 
>> aarch64_gdb_arch_name and gdbstub64,
>> but what about an aarch64-capable cpu running in 32bit mode?
> 
> This is somewhere between a bug and a missing feature. The 'bug' part is
> that for running a guest on AArch64 KVM with -cpu aarch64=off' (ie a
> 32-bit guest) we should be presenting an aarch32 gdb stub, and we don't.

Isn't this "easily" solvable? Probably I am missing something obvious..

I mean we could dispatch to the one or to the other according to ->is_aa64()?


> The 'missing feature' part is that in an ideal world we would support
> mixed aarch64-and-aarch32 guest debugging, and we don't. This needs
> support on the gdb side as well as on our side, AIUI.
> 
>> Why don't we have, like tentatively done here for arm_cpu_dump_state,
>>
>> an arm_gdb_arch_name and an arm_cpu_gdb_read_register that check is_a64() 
>> and call aaarch32_cpu_gdb_read_register or aarch64_cpu_gdb_read_register 
>> accordingly?
> 
> Because the gdb on the other end of the gdbstub does not expect the
> target to suddenly change in the middle of execution like that.


Ok, but given the restriction that we can only support one or the other the 
whole time,
wouldn't adding this check ensure that at least we would do the right thing 
with -cpu aarch64=off?

Or is this on the way out, support-wise?

Thanks,

Claudio

> gdb is really a userspace-process debugger at heart, and it only
> negotiates "what are all the register types, what is the debuggee
> architecture, etc" once when it connects. Once we've told gdb
> what the registers are we need to stick to them.
> 
> Properly supporting mixed-mode debugging would require support
> for telling gdb "I support both this set of registers for aarch64
> and this other set for aarch32" and notifying it about which mode
> we were in (and support in gdb for understanding this). I don't
> think anybody's ever seriously tried to work out a design for this.
> 
> thanks
> -- PMM
>

Re: gitlab-ci check-dco test

2021-04-12 Thread Daniel P . Berrangé

On Fri, Apr 09, 2021 at 07:58:48PM -0700, Richard Henderson wrote:
> On development branches, it's not uncommon to push
> temporary --fixup patches, and normally one doesn't
> sign those.  But then of course one get hate-mail
> from the gitlab-ci job about the failing test.
> 
> Is there a way to make it fatal on staging, but
> merely a warning on other branches (a-la checkpatch)?

Note,  checkpatch is *never* fatal on any branch because there are always
scenarios in which checkpatch gives false positives that we have to allow.

We can of course set 'allow_failure: true' on the DCO check, on non-staging
branches, but that loose some of its value. In general I think users should
see this as a mandatory check, because we don't want them ever sending
patches without this passing. Your scenario of sometimes needing to push
temporary fix patches is valid too of course.

So this is a no-win scenario and we have to decide what the least worst
option is. When I added this check I decided the least worst was to have
developers see failures when they have temp fixup patches, because I was
optimizing for the ensuring developers see & fix problems before they
submit to qemu-devel.

In libvirt we have the same check, but we moved it to a separate stage
in the pipeline "sanity_checks" instead of "build", so developers can
see at a glance in the UI that the build jobs all passed, and only the
sanity check(s) failed.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH 1/1] Set TARGET_PAGE_BITS to be 10 instead of 8 bits

On Sun, 11 Apr 2021 at 16:15, Richard Henderson
 wrote:
>
> On 4/10/21 10:24 AM, Michael Rolnik wrote:
> > Please review.
>
>
> The first 256b is i/o, the next 768b are ram.  But having changed the page
> size, it should mean that the first 1k are now treated as i/o.
>
> We do have a path by which instructions in i/o pages can be executed.  This
> happens on some ARM board setups during cold boot.  But we do not save those
> translations, so they run much much slower than it should.
>
> But perhaps in the case of AVR, "much much slower" really isn't visible?
>
> In general, I think changing the page size is wrong.  I also assume that
> migration is largely irrelevant to this target.

Migration is irrelevant, but every target benefits from snapshot
save-and-restore, and I think that uses the same codepaths ?

-- PMM

Re: [PATCH v1 1/8] target/riscv: Remove the hardcoded RVXLEN macro

On Sat, Apr 3, 2021 at 4:04 AM Alistair Francis
 wrote:
>
> Signed-off-by: Alistair Francis 
> ---
>  target/riscv/cpu.h | 6 --
>  target/riscv/cpu.c | 6 +-
>  2 files changed, 5 insertions(+), 7 deletions(-)
>

Reviewed-by: Bin Meng

Re: [PATCH v1 7/8] target/riscv: Remove an unused CASE_OP_32_64 macro

On Sat, Apr 3, 2021 at 4:04 AM Alistair Francis
 wrote:
>
> Signed-off-by: Alistair Francis 
> ---
>  target/riscv/translate.c | 6 --
>  1 file changed, 6 deletions(-)
>

Reviewed-by: Bin Meng

Re: [PATCH v1 2/8] target/riscv: Remove the hardcoded SSTATUS_SD macro

On Sat, Apr 3, 2021 at 4:04 AM Alistair Francis
 wrote:

Worth mentioning that this also fixed the issue of a writable SD bit

>
> Signed-off-by: Alistair Francis 
> ---
>  target/riscv/cpu_bits.h | 6 --
>  target/riscv/csr.c  | 9 -
>  2 files changed, 8 insertions(+), 7 deletions(-)
>

Reviewed-by: Bin Meng

Re: [PATCH v1 6/8] target/riscv: Remove the unused HSTATUS_WPRI macro

On Sat, Apr 3, 2021 at 4:05 AM Alistair Francis
 wrote:
>
> Signed-off-by: Alistair Francis 
> ---
>  target/riscv/cpu_bits.h | 6 --
>  1 file changed, 6 deletions(-)
>

Reviewed-by: Bin Meng

[PATCH v2] libqtest: refuse QTEST_QEMU_BINARY=qemu-kvm

Some downstreams rename the QEMU binary to "qemu-kvm". This breaks
qtest_get_arch(), which attempts to parse the target architecture from
the QTEST_QEMU_BINARY environment variable.

Print an error instead of returning the architecture "kvm". Things fail
in weird ways when the architecture string is bogus.

Arguably qtests should always be run in a build directory instead of
against an installed QEMU. In any case, printing a clear error when this
happens is helpful.

Reported-by: Qin Wang 
Cc: Emanuele Giuseppe Esposito 
Signed-off-by: Stefan Hajnoczi 
---
 tests/qtest/libqtest.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index 71e359efcd..7caf20f56b 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -910,6 +910,16 @@ const char *qtest_get_arch(void)
 abort();
 }
 
+if (!strstr(qemu, "-system-")) {
+fprintf(stderr, "QTEST_QEMU_BINARY must end with *-system- where 
"
+"'arch' is the target architecture (x86_64, aarch64, "
+"etc). If you are using qemu-kvm or another custom "
+"name, please create a symlink like ln -s "
+"path/to/qemu-kvm qemu-system-x86_64 and use that "
+"instead.\n");
+abort();
+}
+
 return end + 1;
 }
 
-- 
2.30.2

Re: [RFC v12 27/65] target/arm: split a15 cpu model and 32bit class functions to cpu32.c

On Mon, 12 Apr 2021 at 10:05, Claudio Fontana  wrote:
>
> Hi Peter,
>
> On 4/8/21 12:36 PM, Peter Maydell wrote:
> > On Thu, 8 Apr 2021 at 11:23, Claudio Fontana  wrote:
> >> Mainly for this code here a question from my side: is the current code 
> >> actually already "wrong"?
> >>
> >> I mean, we unconditionally set the aarch64-capable cpu classes to all use 
> >> aarch64_gdb_arch_name and gdbstub64,
> >> but what about an aarch64-capable cpu running in 32bit mode?
> >
> > This is somewhere between a bug and a missing feature. The 'bug' part is
> > that for running a guest on AArch64 KVM with -cpu aarch64=off' (ie a
> > 32-bit guest) we should be presenting an aarch32 gdb stub, and we don't.
>
> Isn't this "easily" solvable? Probably I am missing something obvious..

Fairly, yes. That's why I classify it as "bug". It's just nobody's
written the patch.

> I mean we could dispatch to the one or to the other according to ->is_aa64()?

No, because is_aa64() is the "at runtime, is the CPU currently 32-bit
or 64 bit?" check. For gdbstub functions, because gdb cannot handle
runtime switching, the check should be a static one of the "what kind
of CPU is this?" type. (This is effectively what we do today, except
that for the special case of "aarch64=off" we make the wrong choice.)

thanks
-- PMM

Re: [PATCH v2] libqtest: refuse QTEST_QEMU_BINARY=qemu-kvm


On 12/04/2021 11.18, Stefan Hajnoczi wrote:

Some downstreams rename the QEMU binary to "qemu-kvm". This breaks
qtest_get_arch(), which attempts to parse the target architecture from
the QTEST_QEMU_BINARY environment variable.

Print an error instead of returning the architecture "kvm". Things fail
in weird ways when the architecture string is bogus.

Arguably qtests should always be run in a build directory instead of
against an installed QEMU. In any case, printing a clear error when this
happens is helpful.

Reported-by: Qin Wang 
Cc: Emanuele Giuseppe Esposito 
Signed-off-by: Stefan Hajnoczi 
---
  tests/qtest/libqtest.c | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index 71e359efcd..7caf20f56b 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -910,6 +910,16 @@ const char *qtest_get_arch(void)
  abort();
  }
  
+if (!strstr(qemu, "-system-")) {

+fprintf(stderr, "QTEST_QEMU_BINARY must end with *-system- where 
"
+"'arch' is the target architecture (x86_64, aarch64, "
+"etc). If you are using qemu-kvm or another custom "
+"name, please create a symlink like ln -s "
+"path/to/qemu-kvm qemu-system-x86_64 and use that "
+"instead.\n");


The text is very long ... maybe add some \n to wrap it after 80 columns?
(also not sure whether we really need the second part about the symlink... 
but I also don't mind leaving it in)



+abort();


Since this can be triggered by the user, I'd rather use exit(1) instead, 
what do you think?


 Thomas



+}
+
  return end + 1;
  }

Re: [PATCH v2] libqtest: refuse QTEST_QEMU_BINARY=qemu-kvm

On Mon, 12 Apr 2021 at 10:35, Thomas Huth  wrote:
>
> On 12/04/2021 11.18, Stefan Hajnoczi wrote:
> > Some downstreams rename the QEMU binary to "qemu-kvm". This breaks
> > qtest_get_arch(), which attempts to parse the target architecture from
> > the QTEST_QEMU_BINARY environment variable.
> >
> > Print an error instead of returning the architecture "kvm". Things fail
> > in weird ways when the architecture string is bogus.
> >
> > Arguably qtests should always be run in a build directory instead of
> > against an installed QEMU. In any case, printing a clear error when this
> > happens is helpful.
> >
> > Reported-by: Qin Wang 
> > Cc: Emanuele Giuseppe Esposito 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >   tests/qtest/libqtest.c | 10 ++
> >   1 file changed, 10 insertions(+)
> >
> > diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
> > index 71e359efcd..7caf20f56b 100644
> > --- a/tests/qtest/libqtest.c
> > +++ b/tests/qtest/libqtest.c
> > @@ -910,6 +910,16 @@ const char *qtest_get_arch(void)
> >   abort();
> >   }
> >
> > +if (!strstr(qemu, "-system-")) {
> > +fprintf(stderr, "QTEST_QEMU_BINARY must end with *-system- 
> > where "
> > +"'arch' is the target architecture (x86_64, 
> > aarch64, "
> > +"etc). If you are using qemu-kvm or another custom 
> > "
> > +"name, please create a symlink like ln -s "
> > +"path/to/qemu-kvm qemu-system-x86_64 and use that "
> > +"instead.\n");
>
> The text is very long ... maybe add some \n to wrap it after 80 columns?
> (also not sure whether we really need the second part about the symlink...
> but I also don't mind leaving it in)

Yeah, anybody who runs into this is doing something weird and can
be assumed to be able to figure out how to do what they want with
a name of the right form, I think. You'll never see it if you're
just running 'make check'.

thanks
-- PMM

Re: [PATCH v4 1/1] docs/devel: Add VFIO device migration documentation

2021-04-12 Thread Cornelia Huck

On Fri, 9 Apr 2021 19:13:46 +0530
Tarun Gupta  wrote:

> Document interfaces used for VFIO device migration. Added flow of state 
> changes
> during live migration with VFIO device.
> 
> Co-developed-by: Kirti Wankhede 

This still needs Kirti's S-o-b as well (i.e. both tags need to be
paired.)

> Signed-off-by: Tarun Gupta 
> ---
> Tested by building docs with new vfio-migration.rst file
> 
> v4:
> - Added info about vfio_listener_log_global_[start|stop]
> - Added info about `save_state` callback.
> - Incorporated comments from v3.
> 
> v3:
> - Add introductory line about VM migration in general.
> - Remove occurcences of vfio_pin_pages() to describe pinning.
> - Incorporated comments from v2
> 
> v2:
> - Included the new vfio-migration.rst file in index.rst
> - Updated dirty page tracking section, also added details about
>   'pre-copy-dirty-page-tracking' opt-out option.
> - Incorporated comments around wording of doc.
> 

The change log and the other extra information also need to be followed
by another '---'.

>  MAINTAINERS   |   1 +
>  docs/devel/index.rst  |   1 +
>  docs/devel/vfio-migration.rst | 150 ++
>  3 files changed, 152 insertions(+)
>  create mode 100644 docs/devel/vfio-migration.rst

Aside from the meta issues above, the patch looks good to me.

Reviewed-by: Cornelia Huck

Re: [RFC PATCH-for-6.1 0/9] hw/clock: Strengthen machine (non-qdev) clock propagation

On Sat, 10 Apr 2021 at 16:15, Peter Maydell  wrote:
> I do wonder if the right fix to that would be to make TYPE_MACHINE
> be a subtype of TYPE_DEVICE, though -- machines not being subtypes
> of device has other annoying effects, like their not having reset
> methods or being able to register vmstate structs.

I wasn't quite right about this -- turns out that machines can
have a reset method, but it is a weird special case method
MachineClass::reset that isn't like device reset. (We use it
in a few machine types; if implemented, it is responsible for
calling qemu_devices_reset() to kick off device reset. Probably
MachineClass ought to implement TYPE_RESETTABLE_INTERFACE
instead.)

thanks
-- PMM

Re: [RFC PATCH-for-6.1 0/9] hw/clock: Strengthen machine (non-qdev) clock propagation

Hi Peter,

On 4/12/21 12:11 PM, Peter Maydell wrote:
> On Sat, 10 Apr 2021 at 16:15, Peter Maydell  wrote:
>> I do wonder if the right fix to that would be to make TYPE_MACHINE
>> be a subtype of TYPE_DEVICE, though -- machines not being subtypes
>> of device has other annoying effects, like their not having reset
>> methods or being able to register vmstate structs.
> 
> I wasn't quite right about this -- turns out that machines can
> have a reset method, but it is a weird special case method
> MachineClass::reset that isn't like device reset. (We use it
> in a few machine types; if implemented, it is responsible for
> calling qemu_devices_reset() to kick off device reset. Probably
> MachineClass ought to implement TYPE_RESETTABLE_INTERFACE
> instead.)

TIL MachineClass::reset().

- hw/hppa/machine.c
- hw/i386/pc.c

  Used to reset CPUs manually because CPUs aren't sysbus-reset.

- hw/i386/microvm.c

  Removing the microvm_fix_kernel_cmdline() call which is suspicious
  at reset (should be enough once in realize time), the is the same
  previous "Used to reset CPUs manually ..."

- hw/ppc/pnv.c

  "Reset BMC simulator" seems a legitimate case of machine reset.

  Next is 'fdt = pnv_dt_create()'. So guest has changed hardware state
  and we need to refresh the DT for the next guest boots. Could be.

- hw/ppc/spapr.c

  Similar previous "Used to reset CPUs manually ..."

  Similar previous "update DT after hardware physically changed".

Am I correct than half of these handlers code is to kludge the fact
that CPU aren't reset such device reset?

Regards,

Phil.

[PULL 1/5] hw/arm/virt-acpi-build: Fix GSIV values of the {GERR, Sync} interrupts

From: Zenghui Yu 

The GSIV values in SMMUv3 IORT node are not correct as they don't match
the SMMUIrq enumeration, which describes the IRQ<->PIN mapping used by
our emulated vSMMU.

Fixes: a703b4f6c1ee ("hw/arm/virt-acpi-build: Add smmuv3 node in IORT table")
Signed-off-by: Zenghui Yu 
Acked-by: Eric Auger 
Message-id: 20210402084731.93-1-yuzeng...@huawei.com
Signed-off-by: Peter Maydell 
---
 hw/arm/virt-acpi-build.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index f5a2b2d4cb5..60fe2e65a76 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -292,8 +292,8 @@ build_iort(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 smmu->flags = cpu_to_le32(ACPI_IORT_SMMU_V3_COHACC_OVERRIDE);
 smmu->event_gsiv = cpu_to_le32(irq);
 smmu->pri_gsiv = cpu_to_le32(irq + 1);
-smmu->gerr_gsiv = cpu_to_le32(irq + 2);
-smmu->sync_gsiv = cpu_to_le32(irq + 3);
+smmu->sync_gsiv = cpu_to_le32(irq + 2);
+smmu->gerr_gsiv = cpu_to_le32(irq + 3);
 
 /* Identity RID mapping covering the whole input RID range */
 idmap = &smmu->id_mapping_array[0];
-- 
2.20.1

[PULL 2/5] hw/arm/smmuv3: Emulate CFGI_STE_RANGE for an aligned range of StreamIDs

From: Zenghui Yu 

In emulation of the CFGI_STE_RANGE command, we now take StreamID as the
start of the invalidation range, regardless of whatever the Range is,
whilst the spec clearly states that

 - "Invalidation is performed for an *aligned* range of 2^(Range+1)
StreamIDs."

 - "The bottom Range+1 bits of the StreamID parameter are IGNORED,
aligning the range to its size."

Take CFGI_ALL (where Range == 31) as an example, if there are some random
bits in the StreamID field, we'll fail to perform the full invalidation but
get a strange range (e.g., SMMUSIDRange={.start=1, .end=0}) instead. Rework
the emulation a bit to get rid of the discrepancy with the spec.

Signed-off-by: Zenghui Yu 
Acked-by: Eric Auger 
Message-id: 20210402100449.528-1-yuzeng...@huawei.com
Signed-off-by: Peter Maydell 
---
 hw/arm/smmuv3.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 3b87324ce22..87056125357 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -980,16 +980,20 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
 }
 case SMMU_CMD_CFGI_STE_RANGE: /* same as SMMU_CMD_CFGI_ALL */
 {
-uint32_t start = CMD_SID(&cmd);
+uint32_t sid = CMD_SID(&cmd), mask;
 uint8_t range = CMD_STE_RANGE(&cmd);
-uint64_t end = start + (1ULL << (range + 1)) - 1;
-SMMUSIDRange sid_range = {start, end};
+SMMUSIDRange sid_range;
 
 if (CMD_SSEC(&cmd)) {
 cmd_error = SMMU_CERROR_ILL;
 break;
 }
-trace_smmuv3_cmdq_cfgi_ste_range(start, end);
+
+mask = (1ULL << (range + 1)) - 1;
+sid_range.start = sid & ~mask;
+sid_range.end = sid_range.start + mask;
+
+trace_smmuv3_cmdq_cfgi_ste_range(sid_range.start, sid_range.end);
 g_hash_table_foreach_remove(bs->configs, smmuv3_invalidate_ste,
 &sid_range);
 break;
-- 
2.20.1

[PULL 0/5] target-arm queue

Handful of arm fixes for the rc.

The following changes since commit 555249a59e9cdd6b58da103aba5cf3a2d45c899f:

  Merge remote-tracking branch 'remotes/ehabkost-gl/tags/x86-next-pull-request' 
into staging (2021-04-10 16:58:56 +0100)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git 
tags/pull-target-arm-20210412

for you to fetch changes up to 52c01ada86611136e3122dd139788dbcbc292d86:

  exec: Fix overlap of PAGE_ANON and PAGE_TARGET_1 (2021-04-12 11:06:24 +0100)


target-arm queue:
 * hw/arm/virt-acpi-build: Fix GSIV values of the {GERR, Sync} interrupts
 * hw/arm/smmuv3: Emulate CFGI_STE_RANGE for an aligned range of StreamIDs
 * accel/tcg: Preserve PAGE_ANON when changing page permissions
 * target/arm: Check PAGE_WRITE_ORG for MTE writeability
 * exec: Fix overlap of PAGE_ANON and PAGE_TARGET_1


Richard Henderson (3):
  accel/tcg: Preserve PAGE_ANON when changing page permissions
  target/arm: Check PAGE_WRITE_ORG for MTE writeability
  exec: Fix overlap of PAGE_ANON and PAGE_TARGET_1

Zenghui Yu (2):
  hw/arm/virt-acpi-build: Fix GSIV values of the {GERR, Sync} interrupts
  hw/arm/smmuv3: Emulate CFGI_STE_RANGE for an aligned range of StreamIDs

 include/exec/cpu-all.h|  4 ++--
 tests/tcg/aarch64/mte.h   |  3 ++-
 accel/tcg/translate-all.c |  9 ++--
 hw/arm/smmuv3.c   | 12 +++
 hw/arm/virt-acpi-build.c  |  4 ++--
 target/arm/mte_helper.c   |  2 +-
 tests/tcg/aarch64/mte-6.c | 43 +++
 tests/tcg/aarch64/Makefile.target |  2 +-
 8 files changed, 66 insertions(+), 13 deletions(-)
 create mode 100644 tests/tcg/aarch64/mte-6.c

[PULL 5/5] exec: Fix overlap of PAGE_ANON and PAGE_TARGET_1

From: Richard Henderson 

Unfortuately, the elements of PAGE_* were not in numerical
order and so PAGE_ANON was added to an "unused" bit.
As an arbitrary choice, move PAGE_TARGET_{1,2} together.

Cc: Laurent Vivier 
Fixes: 26bab757d41b ("linux-user: Introduce PAGE_ANON")
Buglink: https://bugs.launchpad.net/bugs/1922617
Signed-off-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Reviewed-by: Laurent Vivier 
Tested-by: Laurent Vivier 
Tested-by: Nathan Chancellor 
Signed-off-by: Peter Maydell 
---
 include/exec/cpu-all.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index d76b0b9e02d..32cfb634c6a 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -268,8 +268,8 @@ extern intptr_t qemu_host_page_mask;
 #define PAGE_RESERVED  0x0100
 #endif
 /* Target-specific bits that will be used via page_get_flags().  */
-#define PAGE_TARGET_1  0x0080
-#define PAGE_TARGET_2  0x0200
+#define PAGE_TARGET_1  0x0200
+#define PAGE_TARGET_2  0x0400
 
 #if defined(CONFIG_USER_ONLY)
 void page_dump(FILE *f);
-- 
2.20.1

[PULL 4/5] target/arm: Check PAGE_WRITE_ORG for MTE writeability

From: Richard Henderson 

We can remove PAGE_WRITE when (internally) marking a page
read-only because it contains translated code.

This can be triggered by tests/tcg/aarch64/bti-2, after
having serviced SIGILL trampolines on the stack.

Signed-off-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Signed-off-by: Peter Maydell 
---
 target/arm/mte_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/mte_helper.c b/target/arm/mte_helper.c
index 0bbb9ec3463..8be17e1b707 100644
--- a/target/arm/mte_helper.c
+++ b/target/arm/mte_helper.c
@@ -83,7 +83,7 @@ static uint8_t *allocation_tag_mem(CPUARMState *env, int 
ptr_mmu_idx,
 uint8_t *tags;
 uintptr_t index;
 
-if (!(flags & (ptr_access == MMU_DATA_STORE ? PAGE_WRITE : PAGE_READ))) {
+if (!(flags & (ptr_access == MMU_DATA_STORE ? PAGE_WRITE_ORG : 
PAGE_READ))) {
 /* SIGSEGV */
 arm_cpu_tlb_fill(env_cpu(env), ptr, ptr_size, ptr_access,
  ptr_mmu_idx, false, ra);
-- 
2.20.1

[PULL 3/5] accel/tcg: Preserve PAGE_ANON when changing page permissions

From: Richard Henderson 

Using mprotect() to change PROT_* does not change the MAP_ANON
previously set with mmap().  Our linux-user version of MTE only
works with MAP_ANON pages, so losing PAGE_ANON caused MTE to
stop working.

Reported-by: Stephen Long 
Signed-off-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Tested-by: Alex Bennée 
Signed-off-by: Peter Maydell 
---
 tests/tcg/aarch64/mte.h   |  3 ++-
 accel/tcg/translate-all.c |  9 +--
 tests/tcg/aarch64/mte-6.c | 43 +++
 tests/tcg/aarch64/Makefile.target |  2 +-
 4 files changed, 53 insertions(+), 4 deletions(-)
 create mode 100644 tests/tcg/aarch64/mte-6.c

diff --git a/tests/tcg/aarch64/mte.h b/tests/tcg/aarch64/mte.h
index 141cef522ce..0805676b116 100644
--- a/tests/tcg/aarch64/mte.h
+++ b/tests/tcg/aarch64/mte.h
@@ -48,7 +48,8 @@ static void enable_mte(int tcf)
 }
 }
 
-static void *alloc_mte_mem(size_t size)
+static void * alloc_mte_mem(size_t size) __attribute__((unused));
+static void * alloc_mte_mem(size_t size)
 {
 void *p = mmap(NULL, size, PROT_READ | PROT_WRITE | PROT_MTE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index f32df8b2404..ba6ab09790e 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -2714,6 +2714,8 @@ void page_set_flags(target_ulong start, target_ulong end, 
int flags)
a missing call to h2g_valid.  */
 assert(end - 1 <= GUEST_ADDR_MAX);
 assert(start < end);
+/* Only set PAGE_ANON with new mappings. */
+assert(!(flags & PAGE_ANON) || (flags & PAGE_RESET));
 assert_memory_lock();
 
 start = start & TARGET_PAGE_MASK;
@@ -2737,11 +2739,14 @@ void page_set_flags(target_ulong start, target_ulong 
end, int flags)
 p->first_tb) {
 tb_invalidate_phys_page(addr, 0);
 }
-if (reset_target_data && p->target_data) {
+if (reset_target_data) {
 g_free(p->target_data);
 p->target_data = NULL;
+p->flags = flags;
+} else {
+/* Using mprotect on a page does not change MAP_ANON. */
+p->flags = (p->flags & PAGE_ANON) | flags;
 }
-p->flags = flags;
 }
 }
 
diff --git a/tests/tcg/aarch64/mte-6.c b/tests/tcg/aarch64/mte-6.c
new file mode 100644
index 000..60d51d18be5
--- /dev/null
+++ b/tests/tcg/aarch64/mte-6.c
@@ -0,0 +1,43 @@
+#include "mte.h"
+
+void pass(int sig, siginfo_t *info, void *uc)
+{
+assert(info->si_code == SEGV_MTESERR);
+exit(0);
+}
+
+int main(void)
+{
+enable_mte(PR_MTE_TCF_SYNC);
+
+void *brk = sbrk(16);
+if (brk == (void *)-1) {
+perror("sbrk");
+return 2;
+}
+
+if (mprotect(brk, 16, PROT_READ | PROT_WRITE | PROT_MTE)) {
+perror("mprotect");
+return 2;
+}
+
+int *p1, *p2;
+long excl = 1;
+
+asm("irg %0,%1,%2" : "=r"(p1) : "r"(brk), "r"(excl));
+asm("gmi %0,%1,%0" : "+r"(excl) : "r"(p1));
+asm("irg %0,%1,%2" : "=r"(p2) : "r"(brk), "r"(excl));
+asm("stg %0,[%0]" : : "r"(p1));
+
+*p1 = 0;
+
+struct sigaction sa;
+memset(&sa, 0, sizeof(sa));
+sa.sa_sigaction = pass;
+sa.sa_flags = SA_SIGINFO;
+sigaction(SIGSEGV, &sa, NULL);
+
+*p2 = 0;
+
+abort();
+}
diff --git a/tests/tcg/aarch64/Makefile.target 
b/tests/tcg/aarch64/Makefile.target
index 56e48f4b34f..05b2622bfc9 100644
--- a/tests/tcg/aarch64/Makefile.target
+++ b/tests/tcg/aarch64/Makefile.target
@@ -37,7 +37,7 @@ AARCH64_TESTS += bti-2
 
 # MTE Tests
 ifneq ($(DOCKER_IMAGE)$(CROSS_CC_HAS_ARMV8_MTE),)
-AARCH64_TESTS += mte-1 mte-2 mte-3 mte-4
+AARCH64_TESTS += mte-1 mte-2 mte-3 mte-4 mte-6
 mte-%: CFLAGS += -march=armv8.5-a+memtag
 endif
 
-- 
2.20.1

Re: [PULL 0/5] target-arm queue

2021-04-12 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/20210412103152.28433-1-peter.mayd...@linaro.org/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20210412103152.28433-1-peter.mayd...@linaro.org
Subject: [PULL 0/5] target-arm queue

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] patchew/20210412103152.28433-1-peter.mayd...@linaro.org -> 
patchew/20210412103152.28433-1-peter.mayd...@linaro.org
Switched to a new branch 'test'
b54ded2 exec: Fix overlap of PAGE_ANON and PAGE_TARGET_1
5f7d9e7 target/arm: Check PAGE_WRITE_ORG for MTE writeability
a3fb10e accel/tcg: Preserve PAGE_ANON when changing page permissions
435ceeb hw/arm/smmuv3: Emulate CFGI_STE_RANGE for an aligned range of StreamIDs
21190f3 hw/arm/virt-acpi-build: Fix GSIV values of the {GERR, Sync} interrupts

=== OUTPUT BEGIN ===
1/5 Checking commit 21190f31d420 (hw/arm/virt-acpi-build: Fix GSIV values of 
the {GERR, Sync} interrupts)
2/5 Checking commit 435ceeb0c89a (hw/arm/smmuv3: Emulate CFGI_STE_RANGE for an 
aligned range of StreamIDs)
3/5 Checking commit a3fb10ec0d23 (accel/tcg: Preserve PAGE_ANON when changing 
page permissions)
Use of uninitialized value $acpi_testexpected in string eq at 
./scripts/checkpatch.pl line 1529.
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#69: 
new file mode 100644

ERROR: "foo * bar" should be "foo *bar"
#126: FILE: tests/tcg/aarch64/mte.h:51:
+static void * alloc_mte_mem(size_t size) __attribute__((unused));

ERROR: "foo * bar" should be "foo *bar"
#127: FILE: tests/tcg/aarch64/mte.h:52:
+static void * alloc_mte_mem(size_t size)

total: 2 errors, 1 warnings, 84 lines checked

Patch 3/5 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

4/5 Checking commit 5f7d9e72dc1b (target/arm: Check PAGE_WRITE_ORG for MTE 
writeability)
WARNING: line over 80 characters
#30: FILE: target/arm/mte_helper.c:86:
+if (!(flags & (ptr_access == MMU_DATA_STORE ? PAGE_WRITE_ORG : 
PAGE_READ))) {

total: 0 errors, 1 warnings, 8 lines checked

Patch 4/5 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
5/5 Checking commit b54ded2ab9fb (exec: Fix overlap of PAGE_ANON and 
PAGE_TARGET_1)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20210412103152.28433-1-peter.mayd...@linaro.org/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [RFC PATCH-for-6.1 0/9] hw/clock: Strengthen machine (non-qdev) clock propagation

On Mon, 12 Apr 2021 at 11:31, Philippe Mathieu-Daudé  wrote:
> TIL MachineClass::reset().
>
> - hw/hppa/machine.c
> - hw/i386/pc.c
>
>   Used to reset CPUs manually because CPUs aren't sysbus-reset.

pc_machine_reset() is not resetting the CPUs -- it is
re-resetting the APIC devices, which looks like it is a
workaround for a reset-ordering or other problem. I'm not
sure where the CPUs are being reset...

-- PMM

Re: [PATCH v4 for-6.0 05/12] esp: introduce esp_fifo_pop_buf() and use it instead of fifo8_pop_buf()

On 4/7/21 9:57 PM, Mark Cave-Ayland wrote:
> The const pointer returned by fifo8_pop_buf() lies directly within the array 
> used
> to model the FIFO. Building with address sanitizers enabled shows that if the
> caller expects a minimum number of bytes present then if the FIFO is nearly 
> full,
> the caller may unexpectedly access past the end of the array.
> 
> Introduce esp_fifo_pop_buf() which takes a destination buffer and performs a
> memcpy() in it to guarantee that the caller cannot overwrite the FIFO array 
> and
> update all callers to use it. Similarly add underflow protection similar to
> esp_fifo_push() and esp_fifo_pop() so that instead of triggering an assert()
> the operation becomes a no-op.
> 
> Buglink: https://bugs.launchpad.net/qemu/+bug/1909247
> Signed-off-by: Mark Cave-Ayland 
> Tested-by: Alexander Bulekov 
> ---
>  hw/scsi/esp.c | 40 
>  1 file changed, 28 insertions(+), 12 deletions(-)

Way cleaner/safer.

Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v4 for-6.0 05/12] esp: introduce esp_fifo_pop_buf() and use it instead of fifo8_pop_buf()

On Wed, 7 Apr 2021 at 21:02, Mark Cave-Ayland
 wrote:
>
> The const pointer returned by fifo8_pop_buf() lies directly within the array 
> used
> to model the FIFO. Building with address sanitizers enabled shows that if the
> caller expects a minimum number of bytes present then if the FIFO is nearly 
> full,
> the caller may unexpectedly access past the end of the array.
>
> Introduce esp_fifo_pop_buf() which takes a destination buffer and performs a
> memcpy() in it to guarantee that the caller cannot overwrite the FIFO array 
> and
> update all callers to use it. Similarly add underflow protection similar to
> esp_fifo_push() and esp_fifo_pop() so that instead of triggering an assert()
> the operation becomes a no-op.
>
> Buglink: https://bugs.launchpad.net/qemu/+bug/1909247
> Signed-off-by: Mark Cave-Ayland 
> Tested-by: Alexander Bulekov 
> ---
>  hw/scsi/esp.c | 40 
>  1 file changed, 28 insertions(+), 12 deletions(-)
>
> diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
> index ff8fa73de9..1aa2caf57d 100644
> --- a/hw/scsi/esp.c
> +++ b/hw/scsi/esp.c
> @@ -117,6 +117,23 @@ static uint8_t esp_fifo_pop(Fifo8 *fifo)
>  return fifo8_pop(fifo);
>  }
>
> +static uint32_t esp_fifo_pop_buf(Fifo8 *fifo, uint8_t *dest, int maxlen)
> +{
> +const uint8_t *buf;
> +uint32_t n;
> +
> +if (maxlen == 0) {
> +return 0;
> +}
> +
> +buf = fifo8_pop_buf(fifo, maxlen, &n);
> +if (dest) {
> +memcpy(dest, buf, n);
> +}
> +
> +return n;
> +}
> +
>  static uint32_t esp_get_tc(ESPState *s)
>  {
>  uint32_t dmalen;
> @@ -241,11 +258,11 @@ static uint32_t get_cmd(ESPState *s, uint32_t maxlen)
>  if (dmalen == 0) {
>  return 0;
>  }
> -memcpy(buf, fifo8_pop_buf(&s->fifo, dmalen, &n), dmalen);
> -if (dmalen >= 3) {
> +n = esp_fifo_pop_buf(&s->fifo, buf, dmalen);
> +if (n >= 3) {
>  buf[0] = buf[2] >> 5;
>  }
> -fifo8_push_all(&s->cmdfifo, buf, dmalen);
> +fifo8_push_all(&s->cmdfifo, buf, n);
>  }
>  trace_esp_get_cmd(dmalen, target);

This bit could be tidied up further -- it currently sets
   dmalen = MIN(fifo8_num_used(&s->fifo), maxlen);
in order not to pull more bytes out of the FIFO than it has in it;
but now we are making that check in esp_fifo_pop_buf() I think you
could just do
dmalen = esp_fifo_pop_buf(&s->fifo, buf, maxlen);
and drop the
dmalen = MIN(fifo8_num_used(&s->fifo), maxlen);
if (dmalen == 0) {
return 0;
}
part and the use of 'n' entirely.

But this code isn't wrong, so we can do that later to avoid having
to respin here.

Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [RFC PATCH-for-6.1 0/9] hw/clock: Strengthen machine (non-qdev) clock propagation

On 4/12/21 12:44 PM, Peter Maydell wrote:
> On Mon, 12 Apr 2021 at 11:31, Philippe Mathieu-Daudé  wrote:
>> TIL MachineClass::reset().
>>
>> - hw/hppa/machine.c
>> - hw/i386/pc.c
>>
>>   Used to reset CPUs manually because CPUs aren't sysbus-reset.
> 
> pc_machine_reset() is not resetting the CPUs -- it is
> re-resetting the APIC devices, which looks like it is a
> workaround for a reset-ordering or other problem. I'm not
> sure where the CPUs are being reset...

Ah, I got confused by the CPU_FOREACH(cs) macro.

Re: [PATCH-for-6.0] hw/mem/meson: Fix linking sparse-mem device with fuzzer

ping?

On 4/6/21 4:39 PM, Alexander Bulekov wrote:
> On 210406 1539, Philippe Mathieu-Daudé wrote:
>> sparse-mem.c is added to the 'mem_ss' source set, which itself
>> is conditionally added to softmmu_ss if CONFIG_MEM_DEVICE is
>> selected.
>> But if CONFIG_MEM_DEVICE isn't selected, we get a link failure
>> even if CONFIG_FUZZ is selected:
>>
>>   /usr/bin/ld: tests_qtest_fuzz_generic_fuzz.c.o: in function 
>> `generic_pre_fuzz':
>>   tests/qtest/fuzz/generic_fuzz.c:826: undefined reference to 
>> `sparse_mem_init'
>>   clang-10: error: linker command failed with exit code 1 (use -v to see 
>> invocation)
>>
>> Fix by adding sparse-mem.c directly to the softmmu_ss set.
>>
>> Fixes: 230376d285b ("memory: add a sparse memory device for fuzzing")
>> Signed-off-by: Philippe Mathieu-Daudé 
> 
> Oops..
> Reviewed-by: Alexander Bulekov 
> 
>> ---
>>  hw/mem/meson.build | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/mem/meson.build b/hw/mem/meson.build
>> index ef79e046787..3c8fdef9f9e 100644
>> --- a/hw/mem/meson.build
>> +++ b/hw/mem/meson.build
>> @@ -1,8 +1,9 @@
>>  mem_ss = ss.source_set()
>>  mem_ss.add(files('memory-device.c'))
>> -mem_ss.add(when: 'CONFIG_FUZZ', if_true: files('sparse-mem.c'))
>>  mem_ss.add(when: 'CONFIG_DIMM', if_true: files('pc-dimm.c'))
>>  mem_ss.add(when: 'CONFIG_NPCM7XX', if_true: files('npcm7xx_mc.c'))
>>  mem_ss.add(when: 'CONFIG_NVDIMM', if_true: files('nvdimm.c'))
>>  
>>  softmmu_ss.add_all(when: 'CONFIG_MEM_DEVICE', if_true: mem_ss)
>> +
>> +softmmu_ss.add(when: 'CONFIG_FUZZ', if_true: files('sparse-mem.c'))
>> -- 
>> 2.26.3
>>
>>
>

Re: [PULL for-6.0 0/4] emulated nvme docs and fixes for -rc3

On Mon, 12 Apr 2021 at 08:01, Klaus Jensen  wrote:
>
> From: Klaus Jensen 
>
> Hi Peter,
>
> The following changes since commit 555249a59e9cdd6b58da103aba5cf3a2d45c899f:
>
>   Merge remote-tracking branch 
> 'remotes/ehabkost-gl/tags/x86-next-pull-request' into staging (2021-04-10 
> 16:58:56 +0100)
>
> are available in the Git repository at:
>
>   git://git.infradead.org/qemu-nvme.git tags/nvme-fixes-20210412-pull-request
>
> for you to fetch changes up to 98f84f5a4eca5c03e32fff20f246d9b4b96d6422:
>
>   hw/block/nvme: drain namespaces on sq deletion (2021-04-12 08:55:23 +0200)
>
> 
> emulated nvme docs and fixes for -rc3
>
> - documentation
> - fixes
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/6.0
for any user-visible changes.

-- PMM

RE: [RFC PATCH 0/4] target/ppc: add disable-tcg option

2021-04-12 Thread Bruno Piazera Larsen

> You will need to fix these style errors.

Yeah, I was using the checkpatch on a manually generated diff, so I could 
commit everything with the correct style, but it didn't go over the new file, 
and I forgot to run it after all the commits -.-'

> For future reference, please CC me explicitly on things you'd like me
> to review.  I do scan the qemu-...@nongnu.org list, but it makes it
> easier for me to find (and less likely that I'll accidentally overlook it) if 
> I'm also CCed directly

Will do! It was just my first time with send-email, I guess I got a bit 
nervous


Bruno Piazera Larsen

Instituto de Pesquisas 
ELDORADO

Departamento Computação Embarcada

Analista de Software Trainee

Aviso Legal - Disclaimer

[PATCH v5 0/3] ppc: Enable 2nd DAWR support on Power10

This series enables 2nd DAWR support on p10 qemu guest. 2nd
DAWR is new watchpoint added in Power10 processor. Kernel/kvm
patches are already in[1]. Watchpoint on powerpc TCG guest is
not supported and thus 2nd DAWR is not enabled for TCG mode.

Patches apply fine on qemu/master branch (f2afdc2ad94b).

v4: 
https://lore.kernel.org/r/20210406053833.282907-1-ravi.bango...@linux.ibm.com
v3->v4:
  - Make error message more proper.

v3: https://lore.kernel.org/r/20210330095350.36309-1-ravi.bango...@linux.ibm.com
v3->v4:
  - spapr_dt_pa_features(): POWER10 processor is compatible with 3.0
(PCR_COMPAT_3_00). No need to ppc_check_compat(3_10) for now as
ppc_check_compati(3_00) will also be true. ppc_check_compat(3_10)
can be added while introducing pa_features_310 in future.
  - Use error_append_hint() for hints. Also add ERRP_GUARD().
  - Add kvmppc_set_cap_dawr1() stub function for CONFIG_KVM=n.

v2: 
https://lore.kernel.org/r/20210329041906.213991-1-ravi.bango...@linux.ibm.com
v2->v3:
  - Don't introduce pa_features_310[], instead, reuse pa_features_300[]
for 3.1 guests, as there is no difference between initial values of
them atm.
  - Call gen_spr_book3s_310_dbg() from init_proc_POWER10() instead of
init_proc_POWER8(). Also, Don't call gen_spr_book3s_207_dbg() from
gen_spr_book3s_310_dbg() as init_proc_POWER10() already calls it.

v1: 
https://lore.kernel.org/r/20200723104220.314671-1-ravi.bango...@linux.ibm.com
[Apologies for long gap]
v1->v2:
  - Introduce machine capability cap-dawr1 to enable/disable
the feature. By default, 2nd DAWR is OFF for guests even
when host kvm supports it. User has to manually enable it
with -machine cap-dawr1=on if he wishes to use it.
  - Split the header file changes into separate patch. (Sync
headers from v5.12-rc3)

[1] https://git.kernel.org/torvalds/c/bd1de1a0e6eff

Ravi Bangoria (3):
  Linux headers: update from 5.12-rc3
  ppc: Rename current DAWR macros and variables
  ppc: Enable 2nd DAWR support on p10

 hw/ppc/spapr.c|  7 +-
 hw/ppc/spapr_caps.c   | 32 +++
 include/hw/ppc/spapr.h|  8 +-
 include/standard-headers/drm/drm_fourcc.h | 23 -
 include/standard-headers/linux/input.h|  2 +-
 .../standard-headers/rdma/vmw_pvrdma-abi.h|  7 ++
 linux-headers/asm-generic/unistd.h|  4 +-
 linux-headers/asm-mips/unistd_n32.h   |  1 +
 linux-headers/asm-mips/unistd_n64.h   |  1 +
 linux-headers/asm-mips/unistd_o32.h   |  1 +
 linux-headers/asm-powerpc/kvm.h   |  2 +
 linux-headers/asm-powerpc/unistd_32.h |  1 +
 linux-headers/asm-powerpc/unistd_64.h |  1 +
 linux-headers/asm-s390/unistd_32.h|  1 +
 linux-headers/asm-s390/unistd_64.h|  1 +
 linux-headers/asm-x86/kvm.h   |  1 +
 linux-headers/asm-x86/unistd_32.h |  1 +
 linux-headers/asm-x86/unistd_64.h |  1 +
 linux-headers/asm-x86/unistd_x32.h|  1 +
 linux-headers/linux/kvm.h | 89 +++
 linux-headers/linux/vfio.h| 27 ++
 target/ppc/cpu.h  |  6 +-
 target/ppc/kvm.c  | 12 +++
 target/ppc/kvm_ppc.h  | 12 +++
 target/ppc/translate_init.c.inc   | 19 +++-
 25 files changed, 250 insertions(+), 11 deletions(-)

-- 
2.17.1

[PATCH v5 3/3] ppc: Enable 2nd DAWR support on p10

As per the PAPR, bit 0 of byte 64 in pa-features property indicates
availability of 2nd DAWR registers. i.e. If this bit is set, 2nd
DAWR is present, otherwise not. Use KVM_CAP_PPC_DAWR1 capability to
find whether kvm supports 2nd DAWR or not. If it's supported, allow
user to set the pa-feature bit in guest DT using cap-dawr1 machine
capability. Though, watchpoint on powerpc TCG guest is not supported
and thus 2nd DAWR is not enabled for TCG mode.

Signed-off-by: Ravi Bangoria 
Reviewed-by: Greg Kurz 
---
 hw/ppc/spapr.c  |  7 ++-
 hw/ppc/spapr_caps.c | 32 
 include/hw/ppc/spapr.h  |  6 +-
 target/ppc/cpu.h|  2 ++
 target/ppc/kvm.c| 12 
 target/ppc/kvm_ppc.h| 12 
 target/ppc/translate_init.c.inc | 15 +++
 7 files changed, 84 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 73a06df3b1..6317fad973 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -238,7 +238,7 @@ static void spapr_dt_pa_features(SpaprMachineState *spapr,
 0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 48 - 53 */
 /* 54: DecFP, 56: DecI, 58: SHA */
 0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 54 - 59 */
-/* 60: NM atomic, 62: RNG */
+/* 60: NM atomic, 62: RNG, 64: DAWR1 (ISA 3.1) */
 0x80, 0x00, 0x80, 0x00, 0x00, 0x00, /* 60 - 65 */
 };
 uint8_t *pa_features = NULL;
@@ -279,6 +279,9 @@ static void spapr_dt_pa_features(SpaprMachineState *spapr,
  * in pa-features. So hide it from them. */
 pa_features[40 + 2] &= ~0x80; /* Radix MMU */
 }
+if (spapr_get_cap(spapr, SPAPR_CAP_DAWR1)) {
+pa_features[66] |= 0x80;
+}
 
 _FDT((fdt_setprop(fdt, offset, "ibm,pa-features", pa_features, pa_size)));
 }
@@ -2003,6 +2006,7 @@ static const VMStateDescription vmstate_spapr = {
 &vmstate_spapr_cap_ccf_assist,
 &vmstate_spapr_cap_fwnmi,
 &vmstate_spapr_fwnmi,
+&vmstate_spapr_cap_dawr1,
 NULL
 }
 };
@@ -4542,6 +4546,7 @@ static void spapr_machine_class_init(ObjectClass *oc, 
void *data)
 smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
 smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_ON;
 smc->default_caps.caps[SPAPR_CAP_FWNMI] = SPAPR_CAP_ON;
+smc->default_caps.caps[SPAPR_CAP_DAWR1] = SPAPR_CAP_OFF;
 spapr_caps_add_properties(smc);
 smc->irq = &spapr_irq_dual;
 smc->dr_phb_enabled = true;
diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index 9ea7ddd1e9..98a6b15f29 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -523,6 +523,28 @@ static void cap_fwnmi_apply(SpaprMachineState *spapr, 
uint8_t val,
 }
 }
 
+static void cap_dawr1_apply(SpaprMachineState *spapr, uint8_t val,
+   Error **errp)
+{
+ERRP_GUARD();
+if (!val) {
+return; /* Disable by default */
+}
+
+if (tcg_enabled()) {
+error_setg(errp, "DAWR1 not supported in TCG.");
+error_append_hint(errp, "Try appending -machine cap-dawr1=off\n");
+} else if (kvm_enabled()) {
+if (!kvmppc_has_cap_dawr1()) {
+error_setg(errp, "DAWR1 not supported by KVM.");
+error_append_hint(errp, "Try appending -machine cap-dawr1=off\n");
+} else if (kvmppc_set_cap_dawr1(val) < 0) {
+error_setg(errp, "Error enabling cap-dawr1 with KVM.");
+error_append_hint(errp, "Try appending -machine cap-dawr1=off\n");
+}
+}
+}
+
 SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
 [SPAPR_CAP_HTM] = {
 .name = "htm",
@@ -631,6 +653,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
 .type = "bool",
 .apply = cap_fwnmi_apply,
 },
+[SPAPR_CAP_DAWR1] = {
+.name = "dawr1",
+.description = "Allow 2nd Data Address Watchpoint Register (DAWR1)",
+.index = SPAPR_CAP_DAWR1,
+.get = spapr_cap_get_bool,
+.set = spapr_cap_set_bool,
+.type = "bool",
+.apply = cap_dawr1_apply,
+},
 };
 
 static SpaprCapabilities default_caps_with_cpu(SpaprMachineState *spapr,
@@ -771,6 +802,7 @@ SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
 SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
 SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
 SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI);
+SPAPR_CAP_MIG_STATE(dawr1, SPAPR_CAP_DAWR1);
 
 void spapr_caps_init(SpaprMachineState *spapr)
 {
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 5f90bb26d5..51202b7c90 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -74,8 +74,10 @@ typedef enum {
 #define SPAPR_CAP_CCF_ASSIST0x09
 /* Implements PAPR FWNMI option */
 #define SPAPR_CAP_FWNMI 0x0A
+/* DAWR1 */
+#define SPAPR_CAP_DAWR1 0x0B
 /* Num Caps */
-#define SPAPR_CAP_NUM

[PATCH v5 2/3] ppc: Rename current DAWR macros and variables

Power10 is introducing second DAWR. Use real register names (with
suffix 0) from ISA for current macros and variables used by Qemu.

One exception to this is KVM_REG_PPC_DAWR[X]. This is from kernel
uapi header and thus not changed in kernel as well as Qemu.

Signed-off-by: Ravi Bangoria 
Reviewed-by: Greg Kurz 
Reviewed-by: David Gibson 
---
 include/hw/ppc/spapr.h  | 2 +-
 target/ppc/cpu.h| 4 ++--
 target/ppc/translate_init.c.inc | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index bf7cab7a2c..5f90bb26d5 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -363,7 +363,7 @@ struct SpaprMachineState {
 
 /* Values for 2nd argument to H_SET_MODE */
 #define H_SET_MODE_RESOURCE_SET_CIABR   1
-#define H_SET_MODE_RESOURCE_SET_DAWR2
+#define H_SET_MODE_RESOURCE_SET_DAWR0   2
 #define H_SET_MODE_RESOURCE_ADDR_TRANS_MODE 3
 #define H_SET_MODE_RESOURCE_LE  4
 
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index e73416da68..cd02d65303 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1459,10 +1459,10 @@ typedef PowerPCCPU ArchCPU;
 #define SPR_MPC_BAR   (0x09F)
 #define SPR_PSPB  (0x09F)
 #define SPR_DPDES (0x0B0)
-#define SPR_DAWR  (0x0B4)
+#define SPR_DAWR0 (0x0B4)
 #define SPR_RPR   (0x0BA)
 #define SPR_CIABR (0x0BB)
-#define SPR_DAWRX (0x0BC)
+#define SPR_DAWRX0(0x0BC)
 #define SPR_HFSCR (0x0BE)
 #define SPR_VRSAVE(0x100)
 #define SPR_USPRG0(0x100)
diff --git a/target/ppc/translate_init.c.inc b/target/ppc/translate_init.c.inc
index c03a7c4f52..879e6df217 100644
--- a/target/ppc/translate_init.c.inc
+++ b/target/ppc/translate_init.c.inc
@@ -7748,12 +7748,12 @@ static void gen_spr_book3s_dbg(CPUPPCState *env)
 
 static void gen_spr_book3s_207_dbg(CPUPPCState *env)
 {
-spr_register_kvm_hv(env, SPR_DAWR, "DAWR",
+spr_register_kvm_hv(env, SPR_DAWR0, "DAWR0",
 SPR_NOACCESS, SPR_NOACCESS,
 SPR_NOACCESS, SPR_NOACCESS,
 &spr_read_generic, &spr_write_generic,
 KVM_REG_PPC_DAWR, 0x);
-spr_register_kvm_hv(env, SPR_DAWRX, "DAWRX",
+spr_register_kvm_hv(env, SPR_DAWRX0, "DAWRX0",
 SPR_NOACCESS, SPR_NOACCESS,
 SPR_NOACCESS, SPR_NOACCESS,
 &spr_read_generic, &spr_write_generic,
-- 
2.17.1

[PATCH v5 1/3] Linux headers: update from 5.12-rc3

Update against Linux 5.12-rc3

Signed-off-by: Ravi Bangoria 
---
 include/standard-headers/drm/drm_fourcc.h | 23 -
 include/standard-headers/linux/input.h|  2 +-
 .../standard-headers/rdma/vmw_pvrdma-abi.h|  7 ++
 linux-headers/asm-generic/unistd.h|  4 +-
 linux-headers/asm-mips/unistd_n32.h   |  1 +
 linux-headers/asm-mips/unistd_n64.h   |  1 +
 linux-headers/asm-mips/unistd_o32.h   |  1 +
 linux-headers/asm-powerpc/kvm.h   |  2 +
 linux-headers/asm-powerpc/unistd_32.h |  1 +
 linux-headers/asm-powerpc/unistd_64.h |  1 +
 linux-headers/asm-s390/unistd_32.h|  1 +
 linux-headers/asm-s390/unistd_64.h|  1 +
 linux-headers/asm-x86/kvm.h   |  1 +
 linux-headers/asm-x86/unistd_32.h |  1 +
 linux-headers/asm-x86/unistd_64.h |  1 +
 linux-headers/asm-x86/unistd_x32.h|  1 +
 linux-headers/linux/kvm.h | 89 +++
 linux-headers/linux/vfio.h| 27 ++
 18 files changed, 161 insertions(+), 4 deletions(-)

diff --git a/include/standard-headers/drm/drm_fourcc.h 
b/include/standard-headers/drm/drm_fourcc.h
index c47e19810c..a61ae520c2 100644
--- a/include/standard-headers/drm/drm_fourcc.h
+++ b/include/standard-headers/drm/drm_fourcc.h
@@ -526,6 +526,25 @@ extern "C" {
  */
 #define I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS fourcc_mod_code(INTEL, 7)
 
+/*
+ * Intel Color Control Surface with Clear Color (CCS) for Gen-12 render
+ * compression.
+ *
+ * The main surface is Y-tiled and is at plane index 0 whereas CCS is linear
+ * and at index 1. The clear color is stored at index 2, and the pitch should
+ * be ignored. The clear color structure is 256 bits. The first 128 bits
+ * represents Raw Clear Color Red, Green, Blue and Alpha color each represented
+ * by 32 bits. The raw clear color is consumed by the 3d engine and generates
+ * the converted clear color of size 64 bits. The first 32 bits store the Lower
+ * Converted Clear Color value and the next 32 bits store the Higher Converted
+ * Clear Color value when applicable. The Converted Clear Color values are
+ * consumed by the DE. The last 64 bits are used to store Color Discard Enable
+ * and Depth Clear Value Valid which are ignored by the DE. A CCS cache line
+ * corresponds to an area of 4x1 tiles in the main surface. The main surface
+ * pitch is required to be a multiple of 4 tile widths.
+ */
+#define I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC fourcc_mod_code(INTEL, 8)
+
 /*
  * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
  *
@@ -1035,9 +1054,9 @@ drm_fourcc_canonicalize_nvidia_format_mod(uint64_t 
modifier)
  * Not all combinations are valid, and different SoCs may support different
  * combinations of layout and options.
  */
-#define __fourcc_mod_amlogic_layout_mask 0xf
+#define __fourcc_mod_amlogic_layout_mask 0xff
 #define __fourcc_mod_amlogic_options_shift 8
-#define __fourcc_mod_amlogic_options_mask 0xf
+#define __fourcc_mod_amlogic_options_mask 0xff
 
 #define DRM_FORMAT_MOD_AMLOGIC_FBC(__layout, __options) \
fourcc_mod_code(AMLOGIC, \
diff --git a/include/standard-headers/linux/input.h 
b/include/standard-headers/linux/input.h
index f89c986190..7822c24178 100644
--- a/include/standard-headers/linux/input.h
+++ b/include/standard-headers/linux/input.h
@@ -81,7 +81,7 @@ struct input_id {
  * in units per radian.
  * When INPUT_PROP_ACCELEROMETER is set the resolution changes.
  * The main axes (ABS_X, ABS_Y, ABS_Z) are then reported in
- * in units per g (units/g) and in units per degree per second
+ * units per g (units/g) and in units per degree per second
  * (units/deg/s) for rotational axes (ABS_RX, ABS_RY, ABS_RZ).
  */
 struct input_absinfo {
diff --git a/include/standard-headers/rdma/vmw_pvrdma-abi.h 
b/include/standard-headers/rdma/vmw_pvrdma-abi.h
index 0989426a3f..c30182a7ae 100644
--- a/include/standard-headers/rdma/vmw_pvrdma-abi.h
+++ b/include/standard-headers/rdma/vmw_pvrdma-abi.h
@@ -133,6 +133,13 @@ enum pvrdma_wc_flags {
PVRDMA_WC_FLAGS_MAX = PVRDMA_WC_WITH_NETWORK_HDR_TYPE,
 };
 
+enum pvrdma_network_type {
+   PVRDMA_NETWORK_IB,
+   PVRDMA_NETWORK_ROCE_V1 = PVRDMA_NETWORK_IB,
+   PVRDMA_NETWORK_IPV4,
+   PVRDMA_NETWORK_IPV6
+};
+
 struct pvrdma_alloc_ucontext_resp {
uint32_t qp_tab_size;
uint32_t reserved;
diff --git a/linux-headers/asm-generic/unistd.h 
b/linux-headers/asm-generic/unistd.h
index 7287529177..ce58cff99b 100644
--- a/linux-headers/asm-generic/unistd.h
+++ b/linux-headers/asm-generic/unistd.h
@@ -861,9 +861,11 @@ __SYSCALL(__NR_faccessat2, sys_faccessat2)
 __SYSCALL(__NR_process_madvise, sys_process_madvise)
 #define __NR_epoll_pwait2 441
 __SC_COMP(__NR_epoll_pwait2, sys_epoll_pwait2, compat_sys_epoll_pwait2)
+#define __NR_mount_setattr 442
+__SYSCALL(__NR_mount_setattr, sys_mount_setattr)
 
 #undef __NR_syscalls
-#define __NR_sy

Re: [RFC v12 55/65] target/arm: cpu-exceptions: new module

2021-04-12 Thread Claudio Fontana

On 3/28/21 8:40 PM, Richard Henderson wrote:
> On 3/26/21 1:36 PM, Claudio Fontana wrote:
>> extract the exception handling code from cpu-sysemu,
>> and split it into general arm code and an AArch64-specific part.
>>
>> Signed-off-by: Claudio Fontana
>> ---
>>   target/arm/cpu-exceptions-aa64.h |  32 +
>>   target/arm/cpu.h |   4 -
>>   target/arm/arch_dump.c   |   1 +
>>   target/arm/cpu-exceptions-aa64.c | 553 ++
>>   target/arm/cpu-exceptions.c  | 481 +++
>>   target/arm/cpu-sysemu.c  | 975 ---
>>   target/arm/cpu-user.c|   1 +
>>   target/arm/cpu64.c   |   1 +
>>   target/arm/kvm/kvm64.c   |   1 +
>>   target/arm/tcg/helper-a64.c  |   1 +
>>   target/arm/tcg/helper.c  |   1 +
>>   target/arm/meson.build   |   5 +
>>   12 files changed, 1077 insertions(+), 979 deletions(-)
>>   create mode 100644 target/arm/cpu-exceptions-aa64.h
>>   create mode 100644 target/arm/cpu-exceptions-aa64.c
>>   create mode 100644 target/arm/cpu-exceptions.c
> 
> This is the second move for all of this code.
> Not a fan of that.  Come up with a plan and move it once.

I tried to reduce the double move, I failed at my first attempt: I tried to 
move the exception code directly in place
but failed due to first the code needing to be extracted out from being 
tcg-only fairly early to enable the build,
but then the code in exceptions.c / exceptions-aa64.c needing the cpu-sve stuff 
and the aa64 split of other patches.


> 
> I think the sve_exception_el move is a mistake, since as you have pointed out 
> it is used far beyond exception handling.

I think that's right, also switch_mode and the _sync_ functions probably do not 
belong there.. I'll revisit this.

> 
> 
> r~
>
 
Thanks,

C

RE: [PATCH 1/4] target/ppc: Code motion required to build disabling tcg

2021-04-12 Thread Bruno Piazera Larsen

> A general advice for this whole series is: make sure you add in some
> words explaining why you decided to make a particular change. It will be
> much easier to review if we know what were the logical steps leading to
> the change.

Fair point, I should've thought about that.

> > This commit does the necessary code motion from translate_init.c.inc
>
> For instance, I don't immediately see why these changes are necessary. I
> see that translate_init.c.inc already has some `#ifdef CONFIG_TCG`, so
> why do we need to move a bunch of code into cpu.c instead of just adding
> more code under ifdef CONFIG_TCG? (I'm not saying it's wrong, just trying to
> understand the reasoning).

There are 3 main reasons for this decision. The first is kind of silly, but 
when I read translate.c my mind jumped to translating machine code to TCG, and 
the amount of TCGv variables at the start reinforced this notion.
The second was that both s390x and i386 removed it (translate.c) from 
compilation, so I had no good reason to doubt it.
The last (and arguably most important) is that translate.c is many thousands of 
lines long (translate_init.c.inc alone was almost 11k). The whole point of 
disabling TCG is to speed up compilation and reduce the final file size, so I 
think it makes sense to remove that big file.
And the final nail in the coffin is that at no point did it cross my mind to 
keep the init part of translation, but remove the rest

Also, I'm not a fan of big ifdefs, because it's kinda hard to follow them when 
viewing code in vim. Adding that to already having a cpu.c file, where it makes 
sense (to me, at least) to add all CPU related functions, just sounded like a 
good idea.

> Is translate_init.c.inc intended to be TCG only? But then I see you
> moved TCG-only functions out of it (ppc_fixup_cpu) and left not TCG-only
> functions (gen_spr_generic).

This is me misjudging what is TCG and what is not, mostly. I think that 
actually moving everything to cpu.c and adding ifdefs, or creating a 
cpu_tcg.c.inc or similar, would be the most sensible code motion, but every 
function I removed from translate gave me 3 new errors, at some point I felt 
like I should leave something behind otherwise we're compiling everything from 
TCG again, just in different files, so I tried to guess what was TCG and what 
was not (also, I really wanted the RFC out by the weekend, I _may_ have rushed 
a few choices).

> > This moves all functions that start with gdb_* into target/ppc/gdbstub.c
> > and creates a new function that calls those and is called by ppc_cpu_realize
>
> This looks like it makes sense regardless of disable-tcg, could we have
> it in a standalone patch?

Sure, I'll try and get it ready ASAP, and make sure I didn't miss one function 
before sending. Just a noob question... do I edit the patch manually to have it 
only contain the gdb code motion, or is there a way to move back to before I 
actually made the commit without needing to re-do the changes?

Thomas, I'm adding you to this discussion since it sort of answers your message 
on the other one, about the functions used even in a KVM-only build.

> IIRC you don't only have to exclude translate.c from the build, you also
> have to separate translate_init.c.inc from it, i.e. turn
> translate_init.c.inc into a proper .c file and get rid of the #include
> "translate_init.c.inc" statement in translate.c, since many functions in the
> translate_init.c.inc file are still needed for the KVM-only builds, too. So
> maybe that's a good place to start as a first mini series.

Now that I know that translate doesn't mean "translating to TCG", this idea 
makes more sense. My question is, is it a better option than the code motion 
I'm suggesting? From my quick check on the bits that I haven't touched it 
might, but at this point it's clear I'm very lost with what makes sense hahaha.


Bruno Piazera Larsen

Instituto de Pesquisas 
ELDORADO

Departamento Computação Embarcada

Analista de Software Trainee

Aviso Legal - Disclaimer

[PULL 0/1] NBD fix for 6.0-rc3

2021-04-12 Thread Vladimir Sementsov-Ogievskiy

The following changes since commit 555249a59e9cdd6b58da103aba5cf3a2d45c899f:

  Merge remote-tracking branch 'remotes/ehabkost-gl/tags/x86-next-pull-request' 
into staging (2021-04-10 16:58:56 +0100)

are available in the Git repository at:

  https://src.openvz.org/scm/~vsementsov/qemu.git tags/pull-nbd-2021-04-12

for you to fetch changes up to d3c278b689845558cd9811940436b28ae6afc5d7:

  block/nbd: fix possible use after free of s->connect_thread (2021-04-12 
11:56:03 +0300)


One fix of possible use-after-free in NBD block-driver for 6.0-rc3



Note: the tag is signed by a new key, as I've lost the old one. So,
now there are two keys with my name on http://keys.gnupg.net, the elder
is lost. I feel stupid about that :(. Anyway, both keys are not signed by
anybody except for myself. And this is my first pull-request to Qemu,
so, I think some kind of TOFU still applies.

Vladimir Sementsov-Ogievskiy (1):
  block/nbd: fix possible use after free of s->connect_thread

 block/nbd.c | 11 +++
 1 file changed, 11 insertions(+)

-- 
2.29.2

[PULL 1/1] block/nbd: fix possible use after free of s->connect_thread

2021-04-12 Thread Vladimir Sementsov-Ogievskiy

If on nbd_close() we detach the thread (in
nbd_co_establish_connection_cancel() thr->state becomes
CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use
s->connect_thread (which is set to NULL), as running thread may free it
at any time.

Still nbd_co_establish_connection() does exactly this: it saves
s->connect_thread to local variable (just for better code style) and
use it even after yield point, when thread may be already detached.

Fix that. Also check thr to be non-NULL on
nbd_co_establish_connection() start for safety.

After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes
impossible in the second switch in nbd_co_establish_connection().
Still, don't add extra abort() just before the release. If it somehow
possible to reach this "case:" it won't hurt. Anyway, good refactoring
of all this reconnect mess will come soon.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Roman Kagan 
---
 block/nbd.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index c26dc5a54f..1d4668d42d 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -443,6 +443,11 @@ nbd_co_establish_connection(BlockDriverState *bs, Error 
**errp)
 BDRVNBDState *s = bs->opaque;
 NBDConnectThread *thr = s->connect_thread;
 
+if (!thr) {
+/* detached */
+return -1;
+}
+
 qemu_mutex_lock(&thr->mutex);
 
 switch (thr->state) {
@@ -486,6 +491,12 @@ nbd_co_establish_connection(BlockDriverState *bs, Error 
**errp)
 s->wait_connect = true;
 qemu_coroutine_yield();
 
+if (!s->connect_thread) {
+/* detached */
+return -1;
+}
+assert(thr == s->connect_thread);
+
 qemu_mutex_lock(&thr->mutex);
 
 switch (thr->state) {
-- 
2.29.2

Re: [PATCH 2/2] Support monitor chardev hotswap with QMP

2021-04-12 Thread Li Zhang

Hi Markus,

Any suggestions on this patch?

Thanks
Li

On Fri, Mar 26, 2021 at 3:40 PM Markus Armbruster  wrote:

> Li Zhang  writes:
>
> > Hi,
> >
> > Any comments about this patch?
>
> I wanted to review this before my Easter break, but I'm out of time :(
>
> When I'm back (April 6), I'll check whether it still needs review, but I
> do hope somebody else can look at it sooner.
>
>

RE: Better alternative to strncpy in QEMU.

2021-04-12 Thread Bruno Piazera Larsen

Im not sure about what "better version" means, but my guess would be a faster 
or more reliable version. If that's the case:

> for (int i = 0; i < strlen(source); i++) {

Since you're going on ebyte at a time, there's no need to know how big the 
array is. As a stopping condition you could use source[i] != '\0', which is one 
less pass through the array.

One other optimization that could be done (but is a bigger headache to 
implement correctly) would be to cast the char* into uint64_t* (or uint32_t* 
for 32-bit systems) and copy more bytes at a time. The headache comes from 
finding a 0 in this longer variable, but you can probably use a similar 
strategy to freebsd's strlen 
(https://github.com/freebsd/freebsd-src/blob/main/lib/libc/string/strlen.c).

I'm not sure if it would be a real speedup in most cases, since glibc can use 
this strategy already), but at least we'd have consistent performance in case 
some system doesn't use it


Bruno Piazera Larsen

Instituto de Pesquisas 
ELDORADO

Departamento Computação Embarcada

Analista de Software Trainee

Aviso Legal - Disclaimer

Re: [PATCH v3 0/8] [RfC] fix tracing for modules

2021-04-12 Thread Gerd Hoffmann

On Fri, Apr 09, 2021 at 02:17:13PM +0100, Daniel P. Berrangé wrote:
> On Fri, Apr 09, 2021 at 03:12:45PM +0200, Gerd Hoffmann wrote:
> >   Hi,
> > 
> > > eg a trace point "dma_map_wait" gets mapped to probes in many
> > > .stp files, once per target, because we need to match based on
> > > the executable path:
> > > 
> > >   probe qemu.system.x86_64.dma_map_wait = 
> > > process("/usr/libexec/qemu-system-x86_64").mark("dma_map_wait")
> > 
> > So, that changes with modules, we need the module name now, i.e.
> > 
> > probe qemu.system.x86_64.qxl_soft_reset = \
> > 
> > process("/home/kraxel/qemu-install/lib/qemu/hw-display-qxl.so").mark("qxl_soft_reset")
> > 
> > We could repeat that in every qemu-system-$arch.stp file.
> 
> This would have the surprise the 'qemu.system.x86_64.qxl_soft_reset'
> probes will fire even for qemu-system-ppc64 / qemu-system-x etc
> because we've not restricted the scope as the original probe did.

Oh, right.

> If we can't fix that, then we must use the second option to avoid
> the surprise IMHO

Yep.  Got that working.  Only problem is qemu-trace-stap is broken now
and it seems there is no easy way out.  Right now qemu-trace-stap can
simply work with a constant prefix, with that change the prefix can be
either qemu.system.$arch or qemu.system.modules and I suspect there is
no way around listing tracepoints to figure the correct name ...

take care,
  Gerd

PS: https://git.kraxel.org/cgit/qemu/log/?h=sirius/trace-modules

Re: [PATCH v5 3/3] ppc: Enable 2nd DAWR support on p10

2021-04-12 Thread Cédric Le Goater

On 4/12/21 1:44 PM, Ravi Bangoria wrote:
> As per the PAPR, bit 0 of byte 64 in pa-features property indicates
> availability of 2nd DAWR registers. i.e. If this bit is set, 2nd
> DAWR is present, otherwise not. Use KVM_CAP_PPC_DAWR1 capability to
> find whether kvm supports 2nd DAWR or not. If it's supported, allow
> user to set the pa-feature bit in guest DT using cap-dawr1 machine
> capability. Though, watchpoint on powerpc TCG guest is not supported
> and thus 2nd DAWR is not enabled for TCG mode.
> 
> Signed-off-by: Ravi Bangoria 
> Reviewed-by: Greg Kurz 

Reviewed-by: Cédric Le Goater 

Thanks,

C.

> ---
>  hw/ppc/spapr.c  |  7 ++-
>  hw/ppc/spapr_caps.c | 32 
>  include/hw/ppc/spapr.h  |  6 +-
>  target/ppc/cpu.h|  2 ++
>  target/ppc/kvm.c| 12 
>  target/ppc/kvm_ppc.h| 12 
>  target/ppc/translate_init.c.inc | 15 +++
>  7 files changed, 84 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 73a06df3b1..6317fad973 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -238,7 +238,7 @@ static void spapr_dt_pa_features(SpaprMachineState *spapr,
>  0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 48 - 53 */
>  /* 54: DecFP, 56: DecI, 58: SHA */
>  0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 54 - 59 */
> -/* 60: NM atomic, 62: RNG */
> +/* 60: NM atomic, 62: RNG, 64: DAWR1 (ISA 3.1) */
>  0x80, 0x00, 0x80, 0x00, 0x00, 0x00, /* 60 - 65 */
>  };
>  uint8_t *pa_features = NULL;
> @@ -279,6 +279,9 @@ static void spapr_dt_pa_features(SpaprMachineState *spapr,
>   * in pa-features. So hide it from them. */
>  pa_features[40 + 2] &= ~0x80; /* Radix MMU */
>  }
> +if (spapr_get_cap(spapr, SPAPR_CAP_DAWR1)) {
> +pa_features[66] |= 0x80;
> +}
>  
>  _FDT((fdt_setprop(fdt, offset, "ibm,pa-features", pa_features, 
> pa_size)));
>  }
> @@ -2003,6 +2006,7 @@ static const VMStateDescription vmstate_spapr = {
>  &vmstate_spapr_cap_ccf_assist,
>  &vmstate_spapr_cap_fwnmi,
>  &vmstate_spapr_fwnmi,
> +&vmstate_spapr_cap_dawr1,
>  NULL
>  }
>  };
> @@ -4542,6 +4546,7 @@ static void spapr_machine_class_init(ObjectClass *oc, 
> void *data)
>  smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>  smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_ON;
>  smc->default_caps.caps[SPAPR_CAP_FWNMI] = SPAPR_CAP_ON;
> +smc->default_caps.caps[SPAPR_CAP_DAWR1] = SPAPR_CAP_OFF;
>  spapr_caps_add_properties(smc);
>  smc->irq = &spapr_irq_dual;
>  smc->dr_phb_enabled = true;
> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> index 9ea7ddd1e9..98a6b15f29 100644
> --- a/hw/ppc/spapr_caps.c
> +++ b/hw/ppc/spapr_caps.c
> @@ -523,6 +523,28 @@ static void cap_fwnmi_apply(SpaprMachineState *spapr, 
> uint8_t val,
>  }
>  }
>  
> +static void cap_dawr1_apply(SpaprMachineState *spapr, uint8_t val,
> +   Error **errp)
> +{
> +ERRP_GUARD();
> +if (!val) {
> +return; /* Disable by default */
> +}
> +
> +if (tcg_enabled()) {
> +error_setg(errp, "DAWR1 not supported in TCG.");
> +error_append_hint(errp, "Try appending -machine cap-dawr1=off\n");
> +} else if (kvm_enabled()) {
> +if (!kvmppc_has_cap_dawr1()) {
> +error_setg(errp, "DAWR1 not supported by KVM.");
> +error_append_hint(errp, "Try appending -machine 
> cap-dawr1=off\n");
> +} else if (kvmppc_set_cap_dawr1(val) < 0) {
> +error_setg(errp, "Error enabling cap-dawr1 with KVM.");
> +error_append_hint(errp, "Try appending -machine 
> cap-dawr1=off\n");
> +}
> +}
> +}
> +
>  SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>  [SPAPR_CAP_HTM] = {
>  .name = "htm",
> @@ -631,6 +653,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>  .type = "bool",
>  .apply = cap_fwnmi_apply,
>  },
> +[SPAPR_CAP_DAWR1] = {
> +.name = "dawr1",
> +.description = "Allow 2nd Data Address Watchpoint Register (DAWR1)",
> +.index = SPAPR_CAP_DAWR1,
> +.get = spapr_cap_get_bool,
> +.set = spapr_cap_set_bool,
> +.type = "bool",
> +.apply = cap_dawr1_apply,
> +},
>  };
>  
>  static SpaprCapabilities default_caps_with_cpu(SpaprMachineState *spapr,
> @@ -771,6 +802,7 @@ SPAPR_CAP_MIG_STATE(nested_kvm_hv, 
> SPAPR_CAP_NESTED_KVM_HV);
>  SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
>  SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
>  SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI);
> +SPAPR_CAP_MIG_STATE(dawr1, SPAPR_CAP_DAWR1);
>  
>  void spapr_caps_init(SpaprMachineState *spapr)
>  {
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 5f90bb26d5..51202

Re: [PATCH 2/2] Support monitor chardev hotswap with QMP

2021-04-12 Thread Markus Armbruster

Li Zhang  writes:

> Hi Markus,
>
> Any suggestions on this patch?

I understand PATCH 1/2 got superseded by Lukas's "[PATCH v8 0/4] yank:
Add chardev tests and fixes".  I trust Marc-André will take care of it
in due time.

Before I look at the actual patch: does this patch depend on Lukas's fix
or your "[PATCH 1/2] Fix the segment fault when calling
yank_register_instance"?

Re: Better alternative to strncpy in QEMU.

On Sun, 11 Apr 2021 at 14:52, Chetan  wrote:
> char *qemu_strncpy(char destination[], char source[], size_t destination_size)
> {
> /* Looping through the array and copying the characters from
>  * source to destination.
>  */
> for (int i = 0; i < strlen(source); i++) {
> destination[i] = source[i];
>
> /* Check if value of i is equal to the second last index
>  * of destination array and if condition is true, mark last
>  * index as NULL and break from the loop.
>  */
> if (i == (destination_size - 2)) {
> destination[destination_size - 1] = '\0';
> break;
> }
> }
> return destination;
> }

This implementation is "accidentally quadratic", because it
calls strlen(source) every time through the loop, and thus
copying an N byte string will read N*N bytes of memory. (The
compiler can't pull the "strlen(source)" call up out of the loop
because it can't guarantee that source and destination don't
overlap.)

I think this is a good illustration of why we probably don't want
to roll our own string operation functions if we can avoid it
(ie without having a clear view of why we are improving on either
what libc or glib offer us).

thanks
-- PMM

Re: [PATCH 2/2] Support monitor chardev hotswap with QMP

2021-04-12 Thread Li Zhang

Hi Markus,

[PATCH 1/2] Fix the segment fault when calling yank_register_instance is
reworked by Lukas.
And his patches have been merged to master branch of qemu.
[PATCH 2/2]  Support monitor chardev hotswap with QMP is to change monitor
backend with chardev-change. It is not implemented yet.

On Mon, Apr 12, 2021 at 3:10 PM Markus Armbruster  wrote:

> Li Zhang  writes:
>
> > Hi Markus,
> >
> > Any suggestions on this patch?
>
> I understand PATCH 1/2 got superseded by Lukas's "[PATCH v8 0/4] yank:
> Add chardev tests and fixes".  I trust Marc-André will take care of it
> in due time.
>
> Before I look at the actual patch: does this patch depend on Lukas's fix
> or your "[PATCH 1/2] Fix the segment fault when calling
> yank_register_instance"?
>
>

[PATCH 3/3] hw/arm/mps2-tz: Implement AN524 memory remapping via machine property

The AN524 FPGA image supports two memory maps, which differ
in where the QSPI and BRAM are. In the default map, the BRAM
is at 0x_, and the QSPI at 0x2800_. In the second
map, they are the other way around.

In hardware, the initial mapping can be selected by the user
by writing either "REMAP: BRAM" (the default) or "REMAP: QSPI"
in the board configuration file. The guest can also dynamically
change the mapping via the SCC CFG_REG0 register.

Implement this functionality for QEMU, using a machine property
"remap" with valid values "BRAM" and "QSPI" to allow the user to set
the initial mapping, in the same way they can on the FPGA, and
wiring up the bit from the SCC register to also switch the mapping.

Signed-off-by: Peter Maydell 
---
 docs/system/arm/mps2.rst |  10 
 hw/arm/mps2-tz.c | 106 ++-
 2 files changed, 115 insertions(+), 1 deletion(-)

diff --git a/docs/system/arm/mps2.rst b/docs/system/arm/mps2.rst
index f83b1517871..8a75beb3a08 100644
--- a/docs/system/arm/mps2.rst
+++ b/docs/system/arm/mps2.rst
@@ -45,3 +45,13 @@ Differences between QEMU and real hardware:
   flash, but only as simple ROM, so attempting to rewrite the flash
   from the guest will fail
 - QEMU does not model the USB controller in MPS3 boards
+
+Machine-specific options
+
+
+The following machine-specific options are supported:
+
+remap
+  Supported for ``mps3-an524`` only.
+  Set ``BRAM``/``QSPI`` to select the initial memory mapping. The
+  default is ``BRAM``.
diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index 25016e464d9..e9806779326 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -55,6 +55,7 @@
 #include "hw/boards.h"
 #include "exec/address-spaces.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/reset.h"
 #include "hw/misc/unimp.h"
 #include "hw/char/cmsdk-apb-uart.h"
 #include "hw/timer/cmsdk-apb-timer.h"
@@ -72,6 +73,7 @@
 #include "hw/core/split-irq.h"
 #include "hw/qdev-clock.h"
 #include "qom/object.h"
+#include "hw/irq.h"
 
 #define MPS2TZ_NUMIRQ_MAX 96
 #define MPS2TZ_RAM_MAX 5
@@ -153,6 +155,9 @@ struct MPS2TZMachineState {
 SplitIRQ cpu_irq_splitter[MPS2TZ_NUMIRQ_MAX];
 Clock *sysclk;
 Clock *s32kclk;
+
+int remap;
+qemu_irq remap_irq;
 };
 
 #define TYPE_MPS2TZ_MACHINE "mps2tz"
@@ -228,6 +233,10 @@ static const RAMInfo an505_raminfo[] = { {
 },
 };
 
+/*
+ * Note that the addresses and MPC numbering here should match up
+ * with those used in remap_memory(), which can swap the BRAM and QSPI.
+ */
 static const RAMInfo an524_raminfo[] = { {
 .name = "bram",
 .base = 0x,
@@ -457,6 +466,7 @@ static MemoryRegion *make_scc(MPS2TZMachineState *mms, void 
*opaque,
 
 object_initialize_child(OBJECT(mms), "scc", scc, TYPE_MPS2_SCC);
 sccdev = DEVICE(scc);
+qdev_prop_set_uint32(sccdev, "scc-cfg0", mms->remap);
 qdev_prop_set_uint32(sccdev, "scc-cfg4", 0x2);
 qdev_prop_set_uint32(sccdev, "scc-aid", 0x0028);
 qdev_prop_set_uint32(sccdev, "scc-id", mmc->scc_id);
@@ -573,6 +583,50 @@ static MemoryRegion *make_mpc(MPS2TZMachineState *mms, 
void *opaque,
 return sysbus_mmio_get_region(SYS_BUS_DEVICE(mpc), 0);
 }
 
+static hwaddr boot_mem_base(MPS2TZMachineState *mms)
+{
+/*
+ * Return the canonical address of the block which will be mapped
+ * at address 0x0 (i.e. where the vector table is).
+ * This is usually 0, but if the AN524 alternate memory map is
+ * enabled it will be the base address of the QSPI block.
+ */
+return mms->remap ? 0x2800 : 0;
+}
+
+static void remap_memory(MPS2TZMachineState *mms, int map)
+{
+/*
+ * Remap the memory for the AN524. 'map' is the value of
+ * SCC CFG_REG0 bit 0, i.e. 0 for the default map and 1
+ * for the "option 1" mapping where QSPI is at address 0.
+ *
+ * Effectively we need to swap around the "upstream" ends of
+ * MPC 0 and MPC 1.
+ */
+MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
+int i;
+
+if (mmc->fpga_type != FPGA_AN524) {
+return;
+}
+
+for (i = 0; i < 2; i++) {
+TZMPC *mpc = &mms->mpc[i];
+MemoryRegion *upstream = sysbus_mmio_get_region(SYS_BUS_DEVICE(mpc), 
1);
+hwaddr addr = (i ^ map) ? 0x2800 : 0;
+
+memory_region_set_address(upstream, addr);
+}
+}
+
+static void remap_irq_fn(void *opaque, int n, int level)
+{
+MPS2TZMachineState *mms = opaque;
+
+remap_memory(mms, level);
+}
+
 static MemoryRegion *make_dma(MPS2TZMachineState *mms, void *opaque,
   const char *name, hwaddr size,
   const int *irqs)
@@ -711,7 +765,7 @@ static uint32_t boot_ram_size(MPS2TZMachineState *mms)
 MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
 
 for (p = mmc->raminfo; p->name; p++) {
-if (p->base == 0) {
+if (p->base == boot_mem_base(mms)) {
 return p->size;
 }

[PATCH 2/3] hw/misc/mps2-scc: Support using CFG0 bit 0 for remapping

On some boards, SCC config register CFG0 bit 0 controls whether
parts of the board memory map are remapped. Support this with:
 * a device property scc-cfg0 so the board can specify the
   initial value of the CFG0 register
 * an outbound GPIO line which tracks bit 0 and which the board
   can wire up to provide the remapping

Signed-off-by: Peter Maydell 
---
 include/hw/misc/mps2-scc.h |  9 +
 hw/misc/mps2-scc.c | 13 ++---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/include/hw/misc/mps2-scc.h b/include/hw/misc/mps2-scc.h
index ea261ea30d6..3b2d13ac9c3 100644
--- a/include/hw/misc/mps2-scc.h
+++ b/include/hw/misc/mps2-scc.h
@@ -18,8 +18,14 @@
  *  + QOM property "scc-cfg4": value of the read-only CFG4 register
  *  + QOM property "scc-aid": value of the read-only SCC_AID register
  *  + QOM property "scc-id": value of the read-only SCC_ID register
+ *  + QOM property "scc-cfg0": reset value of the CFG0 register
  *  + QOM property array "oscclk": reset values of the OSCCLK registers
  *(which are accessed via the SYS_CFG channel provided by this device)
+ *  + named GPIO output "remap": this tracks the value of CFG0 register
+ *bit 0. Boards where this bit controls memory remapping should
+ *connect this GPIO line to a function performing that mapping.
+ *Boards where bit 0 has no special function should leave the GPIO
+ *output disconnected.
  */
 #ifndef MPS2_SCC_H
 #define MPS2_SCC_H
@@ -55,6 +61,9 @@ struct MPS2SCC {
 uint32_t num_oscclk;
 uint32_t *oscclk;
 uint32_t *oscclk_reset;
+uint32_t cfg0_reset;
+
+qemu_irq remap;
 };
 
 #endif
diff --git a/hw/misc/mps2-scc.c b/hw/misc/mps2-scc.c
index c56aca86ad5..b3b42a792cd 100644
--- a/hw/misc/mps2-scc.c
+++ b/hw/misc/mps2-scc.c
@@ -23,6 +23,7 @@
 #include "qemu/bitops.h"
 #include "trace.h"
 #include "hw/sysbus.h"
+#include "hw/irq.h"
 #include "migration/vmstate.h"
 #include "hw/registerfields.h"
 #include "hw/misc/mps2-scc.h"
@@ -186,10 +187,13 @@ static void mps2_scc_write(void *opaque, hwaddr offset, 
uint64_t value,
 switch (offset) {
 case A_CFG0:
 /*
- * TODO on some boards bit 0 controls RAM remapping;
- * on others bit 1 is CPU_WAIT.
+ * On some boards bit 0 controls board-specific remapping;
+ * we always reflect bit 0 in the 'remap' GPIO output line,
+ * and let the board wire it up or not as it chooses.
+ * TODO on some boards bit 1 is CPU_WAIT.
  */
 s->cfg0 = value;
+qemu_set_irq(s->remap, s->cfg0 & 1);
 break;
 case A_CFG1:
 s->cfg1 = value;
@@ -283,7 +287,7 @@ static void mps2_scc_reset(DeviceState *dev)
 int i;
 
 trace_mps2_scc_reset();
-s->cfg0 = 0;
+s->cfg0 = s->cfg0_reset;
 s->cfg1 = 0;
 s->cfg2 = 0;
 s->cfg5 = 0;
@@ -308,6 +312,7 @@ static void mps2_scc_init(Object *obj)
 
 memory_region_init_io(&s->iomem, obj, &mps2_scc_ops, s, "mps2-scc", 
0x1000);
 sysbus_init_mmio(sbd, &s->iomem);
+qdev_init_gpio_out_named(DEVICE(obj), &s->remap, "remap", 1);
 }
 
 static void mps2_scc_realize(DeviceState *dev, Error **errp)
@@ -353,6 +358,8 @@ static Property mps2_scc_properties[] = {
 DEFINE_PROP_UINT32("scc-cfg4", MPS2SCC, cfg4, 0),
 DEFINE_PROP_UINT32("scc-aid", MPS2SCC, aid, 0),
 DEFINE_PROP_UINT32("scc-id", MPS2SCC, id, 0),
+/* Reset value for CFG0 register */
+DEFINE_PROP_UINT32("scc-cfg0", MPS2SCC, cfg0_reset, 0),
 /*
  * These are the initial settings for the source clocks on the board.
  * In hardware they can be configured via a config file read by the
-- 
2.20.1

[PATCH 1/3] hw/misc/mps2-scc: Add "QEMU interface" comment

The MPS2 SCC device doesn't have any documentation of its properties;
add a "QEMU interface" format comment describing them.

Signed-off-by: Peter Maydell 
---
 include/hw/misc/mps2-scc.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/include/hw/misc/mps2-scc.h b/include/hw/misc/mps2-scc.h
index 49d070616aa..ea261ea30d6 100644
--- a/include/hw/misc/mps2-scc.h
+++ b/include/hw/misc/mps2-scc.h
@@ -9,6 +9,18 @@
  *  (at your option) any later version.
  */
 
+/*
+ * This is a model of the Serial Communication Controller (SCC)
+ * block found in most MPS FPGA images.
+ *
+ * QEMU interface:
+ *  + sysbus MMIO region 0: the register bank
+ *  + QOM property "scc-cfg4": value of the read-only CFG4 register
+ *  + QOM property "scc-aid": value of the read-only SCC_AID register
+ *  + QOM property "scc-id": value of the read-only SCC_ID register
+ *  + QOM property array "oscclk": reset values of the OSCCLK registers
+ *(which are accessed via the SYS_CFG channel provided by this device)
+ */
 #ifndef MPS2_SCC_H
 #define MPS2_SCC_H
 
-- 
2.20.1

[PATCH 0/3] mps3-an524: support memory remapping

The AN524 FPGA image supports two memory maps, which differ
in where the QSPI and BRAM are. In the default map, the BRAM
is at 0x_, and the QSPI at 0x2800_. In the second
map, they are the other way around.

In hardware, the initial mapping can be selected by the user
by writing either "REMAP: BRAM" (the default) or "REMAP: QSPI"
in the board configuration file. The guest can also dynamically
change the mapping via the SCC CFG_REG0 register.

This patchset adds support for the feature to QEMU's model;
the user-sets-the-initial-mapping part is a new machine property
which can be set with "-M remap=QSPI".

This is needed for some guest images -- for instance the
Arm TF-M binaries -- which assume they have the QSPI layout.

Based-on: 20210409150527.15053-1-peter.mayd...@linaro.org
("mps3-an524: Fix MPC setting for SRAM block")
though any conflict/dependency would be minor and purely textual.

thanks
-- PMM

Peter Maydell (3):
  hw/misc/mps2-scc: Add "QEMU interface" comment
  hw/misc/mps2-scc: Support using CFG0 bit 0 for remapping
  hw/arm/mps2-tz: Implement AN524 memory remapping via machine property

 docs/system/arm/mps2.rst   |  10 
 include/hw/misc/mps2-scc.h |  21 
 hw/arm/mps2-tz.c   | 106 -
 hw/misc/mps2-scc.c |  13 +++--
 4 files changed, 146 insertions(+), 4 deletions(-)

-- 
2.20.1

Re: [RFC PATCH 1/3] vfio: Introduce helpers to mark dirty pages of a RAM section


Hi Eric,

On 2021/4/8 21:46, Auger Eric wrote:

Hi Kunkun,

On 2/19/21 10:42 AM, Kunkun Jiang wrote:

Extract part of the code from vfio_sync_dirty_bitmap to form a
new helper, which allows to mark dirty pages of a RAM section.
This helper will be called for nested stage.

Signed-off-by: Kunkun Jiang 
---
  hw/vfio/common.c | 22 ++
  1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 9225f10722..7c50905856 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1203,6 +1203,19 @@ err_out:
  return ret;
  }
  
+static int vfio_dma_sync_ram_section_dirty_bitmap(VFIOContainer *container,

+  MemoryRegionSection *section)
+{
+ram_addr_t ram_addr;
+
+ram_addr = memory_region_get_ram_addr(section->mr) +
+   section->offset_within_region;
+
+return vfio_get_dirty_bitmap(container,
+   TARGET_PAGE_ALIGN(section->offset_within_address_space),
+   int128_get64(section->size), ram_addr);
+}
+
  typedef struct {
  IOMMUNotifier n;
  VFIOGuestIOMMU *giommu;
@@ -1244,8 +1257,6 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, 
IOMMUTLBEntry *iotlb)
  static int vfio_sync_dirty_bitmap(VFIOContainer *container,
MemoryRegionSection *section)
  {
-ram_addr_t ram_addr;
-
  if (memory_region_is_iommu(section->mr)) {
  VFIOGuestIOMMU *giommu;
  
@@ -1274,12 +1285,7 @@ static int vfio_sync_dirty_bitmap(VFIOContainer *container,

  return 0;
  }
  
-ram_addr = memory_region_get_ram_addr(section->mr) +

-   section->offset_within_region;
-
-return vfio_get_dirty_bitmap(container,
-   TARGET_PAGE_ALIGN(section->offset_within_address_space),

this is now REAL_HOST_PAGE_ALIGN

Thanks

Eric

Sorry for late replay.

Yes, it is REAL_HOST_PAGE_ALIGN now which modified by a patch I sent 
earlier.


I posted v2 a few days ago and I have modified TARGET_PAGE_ALIGN to
REAL_HOST_PAGE_ALIGN.[1]

[1] 
https://lore.kernel.org/qemu-devel/20210331101259.2153-2-jiangkun...@huawei.com/


Thanks,
Kunkun Jiang

-   int128_get64(section->size), ram_addr);
+return vfio_dma_sync_ram_section_dirty_bitmap(container, section);
  }
  
  static void vfio_listerner_log_sync(MemoryListener *listener,



.

Re: [RFC PATCH 0/3] Add migration support for VFIO PCI devices in SMMUv3 nested stage mode

Hi Eric,

On 2021/4/12 16:40, Auger Eric wrote:

Hi Kunkun,

On 2/19/21 10:42 AM, Kunkun Jiang wrote:

Hi all,

Since the SMMUv3's nested translation stages[1] has been introduced by Eric, we
need to pay attention to the migration of VFIO PCI devices in SMMUv3 nested
stage
mode. At present, it is not yet supported in QEMU. There are two problems in the
existing framework.

First, the current way to get dirty pages is not applicable to nested stage
mode.
Because of the "Caching Mode", VTD can map the RAM through the host single stage
(giova->hpa). "vfio_listener_log_sync" gets dirty pages by transferring "giova"
to the kernel for the RAM block section of mapped MMIO region. In nested stage
mode, we setup the stage 2 (gpa->hpa) and the stage 1(giova->gpa) separately. So
it is inapplicable to get dirty pages by the current way in nested stage mode.

Second, it also need to pass stage 1 configurations to the destination host
after
the migration. In Eric's patch, it passes the stage 1 configuration to the host
on
each STE update for the devices set the PASID PciOps. The configuration will be
applied at physical level. But the data of physical level will not be sent to
the
destination host. So we have to pass stage 1 configurations to the destination
host after the migration.

This Patch set includes patches as below:
Patch 1-2:
- Refactor the vfio_listener_log_sync and added a new function to get dirty
pages
in nested stage mode.

Patch 3:
- Added the post_load function to vmstate_smmuv3 for passing stage 1
configuration
to the destination host after the migration.

@Eric, Could you please add this Patch set to your future version of
"vSMMUv3/pSMMUv3 2 stage VFIO integration", if you think this Patch set makes
sense? :)

First of all, thank you for working on this. As you may have noticed I
sent a new RFC version yesterday (without including this). When time
allows, you may have a look at the comments I posted on your series. I
don't think I can test it at the moment so I may prefer to keep it
separate. Also be aware that the QEMU integration of nested has not
received much comments yet and is likely to evolve. The priority is to
get some R-b's on the kernel pieces, especially the SMMU part. With this
dependency resolved, things can't move forward I am afraid.

Thanks

Eric

Yes, I saw your latest version and comments. Thanks again for your review.

I will try my best to test your patches and come up with some useful ideas.
In the future, this series will be updated with your series of nested.
If conditions permit, I hope you can test it and give me some comments.

Best regards,
Kunkun Jiang

Best Regards
Kunkun Jiang

[1] [RFC,v7,00/26] vSMMUv3/pSMMUv3 2 stage VFIO integration
http://patchwork.ozlabs.org/project/qemu-devel/cover/20201116181349.11908-1-eric.au...@redhat.com/

Kunkun Jiang (3):
vfio: Introduce helpers to mark dirty pages of a RAM section
vfio: Add vfio_prereg_listener_log_sync in nested stage
hw/arm/smmuv3: Post-load stage 1 configurations to the host

hw/arm/smmuv3.c | 60 +
hw/arm/trace-events | 1 +
hw/vfio/common.c| 47 +--
3 files changed, 100 insertions(+), 8 deletions(-)

Re: [RFC PATCH 2/3] vfio: Add vfio_prereg_listener_log_sync in nested stage


Hi Eric,

On 2021/4/8 21:56, Auger Eric wrote:

Hi Kunkun,

On 2/19/21 10:42 AM, Kunkun Jiang wrote:

On Intel, the DMA mapped through the host single stage. Instead
we set up the stage 2 and stage 1 separately in nested mode as there
is no "Caching Mode".

You need to rewrite the above sentences, Missing ARM and also the 1st
sentences misses a verb.

Thanks for your review! I will fix it in the next version.

Legacy vfio_listener_log_sync cannot be used in nested stage as we
don't need to pay close attention to stage 1 mapping. This patch adds
vfio_prereg_listener_log_sync to mark dirty pages in nested mode.

Signed-off-by: Kunkun Jiang 
---
  hw/vfio/common.c | 25 +
  1 file changed, 25 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 7c50905856..af333e0dee 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1216,6 +1216,22 @@ static int 
vfio_dma_sync_ram_section_dirty_bitmap(VFIOContainer *container,
 int128_get64(section->size), ram_addr);
  }
  
+static void vfio_prereg_listener_log_sync(MemoryListener *listener,

+  MemoryRegionSection *section)
+{
+VFIOContainer *container =
+container_of(listener, VFIOContainer, prereg_listener);
+
+if (!memory_region_is_ram(section->mr) ||
+!container->dirty_pages_supported) {
+return;
+}
+
+if (vfio_devices_all_saving(container)) {

I fail to see where is this defined?

Keqian modified vfio_devices_all_saving to vfio_devices_all_dirty_tracking
in 758b96b61d5.

When I posted this series patches, it was vfio_devices_all_saving. In 
v2[1], I

have updated it based on the lasted qemu.

[1] 
https://lore.kernel.org/qemu-devel/20210331101259.2153-3-jiangkun...@huawei.com/


Thanks,
Kunkun Jiang

+vfio_dma_sync_ram_section_dirty_bitmap(container, section);
+}
+}
+
  typedef struct {
  IOMMUNotifier n;
  VFIOGuestIOMMU *giommu;
@@ -1260,6 +1276,14 @@ static int vfio_sync_dirty_bitmap(VFIOContainer 
*container,
  if (memory_region_is_iommu(section->mr)) {
  VFIOGuestIOMMU *giommu;
  
+/*

+ * In nested mode, stage 2 and stage 1 are set up separately. We
+ * only need to focus on stage 2 mapping when marking dirty pages.
+ */
+if (container->iommu_type == VFIO_TYPE1_NESTING_IOMMU) {
+return 0;
+}
+
  QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
  if (MEMORY_REGION(giommu->iommu) == section->mr &&
  giommu->n.start == section->offset_within_region) {
@@ -1312,6 +1336,7 @@ static const MemoryListener vfio_memory_listener = {
  static MemoryListener vfio_memory_prereg_listener = {
  .region_add = vfio_prereg_listener_region_add,
  .region_del = vfio_prereg_listener_region_del,
+.log_sync = vfio_prereg_listener_log_sync,
  };
  
  static void vfio_listener_release(VFIOContainer *container)



Thanks

Eric

.

Re: [RFC PATCH 3/3] hw/arm/smmuv3: Post-load stage 1 configurations to the host


Hi Eric,

On 2021/4/12 16:34, Auger Eric wrote:

Hi Kunkun,
On 2/19/21 10:42 AM, Kunkun Jiang wrote:

In nested mode, we call the set_pasid_table() callback on each STE
update to pass the guest stage 1 configuration to the host and
apply it at physical level.

In the case of live migration, we need to manual call the

s/manual/manually

You are right.

set_pasid_table() to load the guest stage 1 configurations to the
host. If this operation is fail, the migration is fail.

s/If this operation is fail, the migration is fail/If this operation
fails, the migration fails.

Thanks for your careful review.

Signed-off-by: Kunkun Jiang 
---
  hw/arm/smmuv3.c | 60 +
  hw/arm/trace-events |  1 +
  2 files changed, 61 insertions(+)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 6c6ed84e78..94ca15375c 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1473,6 +1473,65 @@ static void smmu_realize(DeviceState *d, Error **errp)
  smmu_init_irq(s, dev);
  }
  
+static int smmuv3_manual_set_pci_device_pasid_table(SMMUDevice *sdev)

Can't you retrieve the associated sid and then call
smmuv3_notify_config_change()

Agreed. It can reuse the code of smmuv3_notify_config_change().
I will modify it in the next version.😁

But I need a return value to judge whether it is successful, which
may need to modify the type of smmuv3_notify_config_change().

Thanks,
Kunkun Jiang

+{
+#ifdef __linux__
+IOMMUMemoryRegion *mr = &(sdev->iommu);
+int sid = smmu_get_sid(sdev);
+SMMUEventInfo event = {.type = SMMU_EVT_NONE, .sid = sid,
+   .inval_ste_allowed = true};
+IOMMUConfig iommu_config = {};
+SMMUTransCfg *cfg;
+int ret = -1;
+
+cfg = smmuv3_get_config(sdev, &event);
+if (!cfg) {
+return ret;
+}
+
+iommu_config.pasid_cfg.argsz = sizeof(struct iommu_pasid_table_config);
+iommu_config.pasid_cfg.version = PASID_TABLE_CFG_VERSION_1;
+iommu_config.pasid_cfg.format = IOMMU_PASID_FORMAT_SMMUV3;
+iommu_config.pasid_cfg.base_ptr = cfg->s1ctxptr;
+iommu_config.pasid_cfg.pasid_bits = 0;
+iommu_config.pasid_cfg.vendor_data.smmuv3.version = 
PASID_TABLE_SMMUV3_CFG_VERSION_1;
+
+if (cfg->disabled || cfg->bypassed) {
+iommu_config.pasid_cfg.config = IOMMU_PASID_CONFIG_BYPASS;
+} else if (cfg->aborted) {
+iommu_config.pasid_cfg.config = IOMMU_PASID_CONFIG_ABORT;
+} else {
+iommu_config.pasid_cfg.config = IOMMU_PASID_CONFIG_TRANSLATE;
+}
+
+ret = pci_device_set_pasid_table(sdev->bus, sdev->devfn, &iommu_config);
+if (ret) {
+error_report("Failed to pass PASID table to host for iommu mr %s (%m)",
+ mr->parent_obj.name);
+}
+
+return ret;
+#endif
+}
+
+static int smmuv3_post_load(void *opaque, int version_id)
+{
+SMMUv3State *s3 = opaque;
+SMMUState *s = &(s3->smmu_state);
+SMMUDevice *sdev;
+int ret = 0;
+
+QLIST_FOREACH(sdev, &s->devices_with_notifiers, next) {
+trace_smmuv3_post_load_sdev(sdev->devfn, sdev->iommu.parent_obj.name);
+ret = smmuv3_manual_set_pci_device_pasid_table(sdev);
+if (ret) {
+break;
+}
+}
+
+return ret;
+}
+
  static const VMStateDescription vmstate_smmuv3_queue = {
  .name = "smmuv3_queue",
  .version_id = 1,
@@ -1491,6 +1550,7 @@ static const VMStateDescription vmstate_smmuv3 = {
  .version_id = 1,
  .minimum_version_id = 1,
  .priority = MIG_PRI_IOMMU,
+.post_load = smmuv3_post_load,
  .fields = (VMStateField[]) {
  VMSTATE_UINT32(features, SMMUv3State),
  VMSTATE_UINT8(sid_size, SMMUv3State),
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index 35e562ab74..caa864dd72 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -53,4 +53,5 @@ smmuv3_notify_flag_add(const char *iommu) "ADD SMMUNotifier 
node for iommu mr=%s
  smmuv3_notify_flag_del(const char *iommu) "DEL SMMUNotifier node for iommu 
mr=%s"
  smmuv3_inv_notifiers_iova(const char *name, uint16_t asid, uint64_t iova, uint8_t tg, uint64_t 
num_pages) "iommu mr=%s asid=%d iova=0x%"PRIx64" tg=%d num_pages=0x%"PRIx64
  smmuv3_notify_config_change(const char *name, uint8_t config, uint64_t s1ctxptr) 
"iommu mr=%s config=%d s1ctxptr=0x%"PRIx64
+smmuv3_post_load_sdev(int devfn, const char *name) "sdev devfn=%d iommu 
mr=%s"PRIx64
  


Thanks

Eric

.

RE: [PATCH 1/4] target/ppc: Code motion required to build disabling tcg

2021-04-12 Thread Fabiano Rosas

Bruno Piazera Larsen  writes:

>> > This commit does the necessary code motion from translate_init.c.inc
>>
>> For instance, I don't immediately see why these changes are necessary. I
>> see that translate_init.c.inc already has some `#ifdef CONFIG_TCG`, so
>> why do we need to move a bunch of code into cpu.c instead of just adding
>> more code under ifdef CONFIG_TCG? (I'm not saying it's wrong, just trying to
>> understand the reasoning).
>
> There are 3 main reasons for this decision. The first is kind of
> silly, but when I read translate.c my mind jumped to translating
> machine code to TCG, and the amount of TCGv variables at the start
> reinforced this notion.  The second was that both s390x and i386
> removed it (translate.c) from compilation, so I had no good reason to
> doubt it.  The last (and arguably most important) is that translate.c
> is many thousands of lines long (translate_init.c.inc alone was almost
> 11k). The whole point of disabling TCG is to speed up compilation and
> reduce the final file size, so I think it makes sense to remove that
> big file.  And the final nail in the coffin is that at no point did it
> cross my mind to keep the init part of translation, but remove the
> rest

Ok, so whatever you decide to do, make sure you state: "this is where
TCG functions go", "this file is built on KVM|TCG only", and so on. And
it's ok in principle to do a partial move for the RFC, but please
mention that somewhere so we're all in the same page.

>
> Also, I'm not a fan of big ifdefs, because it's kinda hard to follow
> them when viewing code in vim. Adding that to already having a cpu.c
> file, where it makes sense (to me, at least) to add all CPU related
> functions, just sounded like a good idea.

My point about ifdefs is that it would be easier to understand if you
either:

a) wrapped code in ifdefs, got the RCF going and then later moved all
ifdef'ed code into a new tcg-only file;

or

b) define which is the tcg-only file right away and just move code in
there.

When you moved code into cpu.c *along with* the ifdefs, you kind of did
both at the same time, which got confusing; to me at least.

About moving CPU related functions into cpu.c, that's fine. But it is
not strictly related to disable-tcg, so maybe send that in a separate
patch that does it explicitly at the start of the series.

>
>> Is translate_init.c.inc intended to be TCG only? But then I see you
>> moved TCG-only functions out of it (ppc_fixup_cpu) and left not TCG-only
>> functions (gen_spr_generic).
>
> This is me misjudging what is TCG and what is not, mostly. I think
> that actually moving everything to cpu.c and adding ifdefs, or
> creating a cpu_tcg.c.inc or similar, would be the most sensible code
> motion, but every function I removed from translate gave me 3 new
> errors, at some point I felt like I should leave something behind
> otherwise we're compiling everything from TCG again, just in different
> files, so I tried to guess what was TCG and what was not (also, I
> really wanted the RFC out by the weekend, I _may_ have rushed a few
> choices).

Ok, so you left out some things on purpose to reduce complexity for the
first RFC. That's fine, we just need to state these kinds of thing more
clearly.

>
>> > This moves all functions that start with gdb_* into target/ppc/gdbstub.c
>> > and creates a new function that calls those and is called by 
>> > ppc_cpu_realize
>>
>> This looks like it makes sense regardless of disable-tcg, could we have
>> it in a standalone patch?
>
> Sure, I'll try and get it ready ASAP, and make sure I didn't miss one
> function before sending. Just a noob question... do I edit the patch
> manually to have it only contain the gdb code motion, or is there a
> way to move back to before I actually made the commit without needing
> to re-do the changes?

You could git rebase -i back into your first patch and then split it in
two (git reset HEAD~1 and stage + commit each one), one for moving CPU
code into cpu.c and other for moving GDB code into gdbstub.c.

Or just checkout a new branch, apply the patch on top of it and commit
just the GDB change.

Feel free to ping me on IRC if you need help.

Re: [RFC v12 12/65] target/arm: move physical address translation to cpu-mmu

2021-04-12 Thread Alex Bennée

Claudio Fontana  writes:

> get_phys_addr is needed for KVM too, and in turn it requires
> the aa64_va_parameter* family of functions.
>
> Create cpu-mmu and cpu-mmu-sysemu to store these and
> other mmu-related functions.
>
> Signed-off-by: Claudio Fontana 
> Reviewed-by: Richard Henderson 
> ---
>  target/arm/cpu-mmu.h   |  119 ++
>  target/arm/cpu.h   |3 -
>  target/arm/internals.h |   34 -
>  target/arm/cpu-mmu-sysemu.c| 2307 ++
>  target/arm/cpu-mmu.c   |  124 ++
>  target/arm/cpu.c   |1 +
>  target/arm/tcg/helper.c| 2442 +---

I realise I reviewed the previous patch but we moved everything
wholesale into target/arm/tcg only to move stuff right back out again. I
can see there maybe an argument for a "move and pare down" approach but
that should probably be expanded on in the comment for 2/65 (move
helpers to tcg).

Anyway:

Reviewed-by: Alex Bennée 

-- 
Alex Bennée

Re: [PATCH v2] libqtest: refuse QTEST_QEMU_BINARY=qemu-kvm

On Mon, Apr 12, 2021 at 11:35:40AM +0200, Thomas Huth wrote:
> On 12/04/2021 11.18, Stefan Hajnoczi wrote:
> > Some downstreams rename the QEMU binary to "qemu-kvm". This breaks
> > qtest_get_arch(), which attempts to parse the target architecture from
> > the QTEST_QEMU_BINARY environment variable.
> > 
> > Print an error instead of returning the architecture "kvm". Things fail
> > in weird ways when the architecture string is bogus.
> > 
> > Arguably qtests should always be run in a build directory instead of
> > against an installed QEMU. In any case, printing a clear error when this
> > happens is helpful.
> > 
> > Reported-by: Qin Wang 
> > Cc: Emanuele Giuseppe Esposito 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >   tests/qtest/libqtest.c | 10 ++
> >   1 file changed, 10 insertions(+)
> > 
> > diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
> > index 71e359efcd..7caf20f56b 100644
> > --- a/tests/qtest/libqtest.c
> > +++ b/tests/qtest/libqtest.c
> > @@ -910,6 +910,16 @@ const char *qtest_get_arch(void)
> >   abort();
> >   }
> > +if (!strstr(qemu, "-system-")) {
> > +fprintf(stderr, "QTEST_QEMU_BINARY must end with *-system- 
> > where "
> > +"'arch' is the target architecture (x86_64, 
> > aarch64, "
> > +"etc). If you are using qemu-kvm or another custom 
> > "
> > +"name, please create a symlink like ln -s "
> > +"path/to/qemu-kvm qemu-system-x86_64 and use that "
> > +"instead.\n");
> 
> The text is very long ... maybe add some \n to wrap it after 80 columns?
> (also not sure whether we really need the second part about the symlink...
> but I also don't mind leaving it in)
> 
> > +abort();
> 
> Since this can be triggered by the user, I'd rather use exit(1) instead,
> what do you think?

Sure, but in that case I guess the abort() call above also needs to be
changed? It is triggered when the QTEST_QEMU_BINARY path does not
contain a hyphen ('-') and it currently aborts.

Stefan


signature.asc
Description: PGP signature

Re: [PATCH v3 06/36] block: drop ctx argument from bdrv_root_attach_child

2021-04-12 Thread Alberto Garcia

On Wed 17 Mar 2021 03:34:59 PM CET, Vladimir Sementsov-Ogievskiy 
 wrote:
> Passing parent aio context is redundant, as child_class and parent
> opaque pointer are enough to retrieve it. Drop the argument and use new
> bdrv_child_get_parent_aio_context() interface.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy 

Reviewed-by: Alberto Garcia 

Berto

Re: * ARC port for review *

2021-04-12 Thread Cupertino Miranda

Hi Richard,

Thanks for your quick reply.

For the time, ARCv3 is an under development architecture and 
unfortunately, for the time being, there is no public documentation 
available.
Please notice that this should be all available once ARCv3 gets released.

QEMU ARCv3 port is really an early stage to prepare the ecosystem and we 
decided to include it in this patches to try to be as transparent as 
possible.

Regards,
Cupertino

On 4/7/21 12:47 AM, Richard Henderson wrote:
> On 4/5/21 7:31 AM, cupertinomira...@gmail.com wrote:
>> In order to simplify the review process, we have separated the patches
>> for ARCv3 from the previous emailed ARCv2 ones. Being the patches from
>> 1 to 16 for ARCv2 and 17 to 27 for ARCv3.
>
> How may one find the arcv3 documentation on the synopsis website?
>
> The closest I can find is
>
>   https://www.synopsys.com/designware-ip/processor-solutions.html
>
> which best I can figure leads to a paywall.
>
>
> r~

Re: [PATCH 07/27] arc: TCG instruction definitions

2021-04-12 Thread Cupertino Miranda

Hi Richard,

I totally understand your position with a new scripting language and the 
unclean code produced by the auto generated tools.
In order to ease out the review process, I propose to drop the idea of 
the generated code and cleanup by hand all of the semfunc.c functions.
What is you opinion about this?

Just to clarify my initial position:

I agree that output code from my generation tools are far from optimal 
and way too verbose.
First thing to improve would be to replace the temp_locals when possible.

 From my early experiments, I got the impression that TCG optimizer was 
not that bad and that hand optimized TCG would not be producing 
significantly better x86 code, except when using those temp_locals, 
obviously.
My personal inclination, and initial thought, was that more verbose code 
would be acceptable. Also my perception, since I did not had the 
opportunity to dig into the TCG optimizer, was that TCG optimizer before 
being able to generate host code would need to decompose any TCG 
constructs into simpler forms and only then construct host machine code, 
and for that reason having more compact TCG code would be more of a code 
size optimization rather then a real improvement in final execution result.

Answering also on the duplication from v2 and v3. I understand that 
duplication in general seems sloppy. However, please take into 
consideration that the semfunc.c, mapping and decoder code are generated 
or reused from binutils, did not seem to be so bad to keep them in the 
original form.

Regards,
Cupertino

On 4/8/21 1:20 AM, Richard Henderson wrote:
> On 4/5/21 7:31 AM, cupertinomira...@gmail.com wrote:
>> +/*
>> + * ADD
>> + *    Variables: @b, @c, @a
>> + *    Functions: getCCFlag, getFFlag, setZFlag, setNFlag, setCFlag, 
>> CarryADD,
>> + *   setVFlag, OverflowADD
>> + * --- code ---
>> + * {
>> + *   cc_flag = getCCFlag ();
>> + *   lb = @b;
>> + *   lc = @c;
>> + *   if((cc_flag == true))
>> + * {
>> + *   lb = @b;
>> + *   lc = @c;
>> + *   @a = (@b + @c);
>> + *   if((getFFlag () == true))
>> + * {
>> + *   setZFlag (@a);
>> + *   setNFlag (@a);
>> + *   setCFlag (CarryADD (@a, lb, lc));
>> + *   setVFlag (OverflowADD (@a, lb, lc));
>> + * };
>> + * };
>> + * }
>> + */
>> +
>> +int
>> +arc_gen_ADD(DisasCtxt *ctx, TCGv b, TCGv c, TCGv a)
>> +{
>> +    int ret = DISAS_NEXT;
>> +    TCGv temp_3 = tcg_temp_local_new();
>> +    TCGv cc_flag = tcg_temp_local_new();
>> +    TCGv lb = tcg_temp_local_new();
>> +    TCGv lc = tcg_temp_local_new();
>> +    TCGv temp_1 = tcg_temp_local_new();
>> +    TCGv temp_2 = tcg_temp_local_new();
>> +    TCGv temp_5 = tcg_temp_local_new();
>> +    TCGv temp_4 = tcg_temp_local_new();
>> +    TCGv temp_7 = tcg_temp_local_new();
>> +    TCGv temp_6 = tcg_temp_local_new();
>> +    getCCFlag(temp_3);
>> +    tcg_gen_mov_tl(cc_flag, temp_3);
>> +    tcg_gen_mov_tl(lb, b);
>> +    tcg_gen_mov_tl(lc, c);
>> +    TCGLabel *done_1 = gen_new_label();
>> +    tcg_gen_setcond_tl(TCG_COND_EQ, temp_1, cc_flag, arc_true);
>> +    tcg_gen_xori_tl(temp_2, temp_1, 1);
>> +    tcg_gen_andi_tl(temp_2, temp_2, 1);
>> +    tcg_gen_brcond_tl(TCG_COND_EQ, temp_2, arc_true, done_1);
>> +    tcg_gen_mov_tl(lb, b);
>> +    tcg_gen_mov_tl(lc, c);
>> +    tcg_gen_add_tl(a, b, c);
>> +    if ((getFFlag () == true)) {
>> +    setZFlag(a);
>> +    setNFlag(a);
>> +    CarryADD(temp_5, a, lb, lc);
>> +    tcg_gen_mov_tl(temp_4, temp_5);
>> +    setCFlag(temp_4);
>> +    OverflowADD(temp_7, a, lb, lc);
>> +    tcg_gen_mov_tl(temp_6, temp_7);
>> +    setVFlag(temp_6);
>> +    }
>> +    gen_set_label(done_1);
>> +    tcg_temp_free(temp_3);
>> +    tcg_temp_free(cc_flag);
>> +    tcg_temp_free(lb);
>> +    tcg_temp_free(lc);
>> +    tcg_temp_free(temp_1);
>> +    tcg_temp_free(temp_2);
>> +    tcg_temp_free(temp_5);
>> +    tcg_temp_free(temp_4);
>> +    tcg_temp_free(temp_7);
>> +    tcg_temp_free(temp_6);
>> +
>> +    return ret;
>> +}
>
> I must say I'm not really impressed by the results here.
>
> Your input is clearly intended to be fed to an optimizing compiler, 
> which TCG is not.
>
>
>> +/*
>> + * DIV
>> + *    Variables: @src2, @src1, @dest
>> + *    Functions: getCCFlag, divSigned, getFFlag, setZFlag, setNFlag, 
>> setVFlag
>> + * --- code ---
>> + * {
>> + *   cc_flag = getCCFlag ();
>> + *   if((cc_flag == true))
>> + * {
>> + *   if(((@src2 != 0) && ((@src1 != 2147483648) || (@src2 != 
>> 4294967295
>> + * {
>> + *   @dest = divSigned (@src1, @src2);
>> + *   if((getFFlag () == true))
>> + * {
>> + *   setZFlag (@dest);
>> + *   setNFlag (@dest);
>> + *   setVFlag (0);
>> + * };
>> + * }
>> + *   else
>> + * {
>> + * };
>> + * };
>> + * }
>> + */
>> +
>> +int
>> +arc_gen_DIV(DisasCtxt *ctx, TCGv src2, TCGv src1, TCGv dest)
>> +{
>

[PATCH v3] libqtest: refuse QTEST_QEMU_BINARY=qemu-kvm

Some downstreams rename the QEMU binary to "qemu-kvm". This breaks
qtest_get_arch(), which attempts to parse the target architecture from
the QTEST_QEMU_BINARY environment variable.

Print an error instead of returning the architecture "kvm". Things fail
in weird ways when the architecture string is bogus.

Arguably qtests should always be run in a build directory instead of
against an installed QEMU. In any case, printing a clear error when this
happens is helpful.

Since this is an error that is triggered by the user and not a test
failure, use exit(1) instead of abort(). Change the existing abort()
call in qtest_get_arch() to exit(1) too for the same reason and to be
consistent.

Reported-by: Qin Wang 
Cc: Emanuele Giuseppe Esposito 
Signed-off-by: Stefan Hajnoczi 
---
v3:
 * Add newline to wrap error message output at 80 columns [Thomas]
 * Drop information about working around this using a symlink [Thomas]
 * Use exit(1) instead of abort() [Thomas]

 tests/qtest/libqtest.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index 71e359efcd..825b13a44c 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -907,7 +907,14 @@ const char *qtest_get_arch(void)
 
 if (!end) {
 fprintf(stderr, "Can't determine architecture from binary name.\n");
-abort();
+exit(1);
+}
+
+if (!strstr(qemu, "-system-")) {
+fprintf(stderr, "QTEST_QEMU_BINARY must end with *-system- "
+"where 'arch' is the target\narchitecture (x86_64, aarch64, "
+"etc).\n");
+exit(1);
 }
 
 return end + 1;
-- 
2.30.2

[PATCH v2 0/2] docs/devel/qgraph: add troubleshooting information

v2:
 * Fix "will unavailable" typo [Thomas]

I recently needed to troubleshoot a case where qos-test terminated immediately
with no output. In other words, qos-test decided that no tests are runnable.

After lots of head scratching and some help from Emanuele it turned out that
the machine types weren't being detected as expected.

These patches add documentation about how to troubleshoot similar cases in the
future.

Stefan Hajnoczi (2):
  libqos/qgraph: fix "UNAVAILBLE" typo
  docs/devel/qgraph: add troubleshooting information

 docs/devel/qgraph.rst   | 58 +
 tests/qtest/libqos/qgraph.c |  2 +-
 2 files changed, 59 insertions(+), 1 deletion(-)

-- 
2.30.2

[PATCH v2 1/2] libqos/qgraph: fix "UNAVAILBLE" typo

Cc: Emanuele Giuseppe Esposito 
Cc: Paolo Bonzini 
Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Philippe Mathieu-Daudé 
---
 tests/qtest/libqos/qgraph.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/libqos/qgraph.c b/tests/qtest/libqos/qgraph.c
index b3b1a31f81..d1dc491930 100644
--- a/tests/qtest/libqos/qgraph.c
+++ b/tests/qtest/libqos/qgraph.c
@@ -844,7 +844,7 @@ void qos_dump_graph(void)
 }
 qos_printf_literal("type=%d cmd_line='%s' [%s]\n",
node->type, node->command_line,
-   node->available ? "available" : "UNAVAILBLE"
+   node->available ? "available" : "UNAVAILABLE"
 );
 }
 g_list_free(keys);
-- 
2.30.2

Re: [PATCH v2] libqtest: refuse QTEST_QEMU_BINARY=qemu-kvm


On 12/04/2021 16.21, Stefan Hajnoczi wrote:

On Mon, Apr 12, 2021 at 11:35:40AM +0200, Thomas Huth wrote:

On 12/04/2021 11.18, Stefan Hajnoczi wrote:

Some downstreams rename the QEMU binary to "qemu-kvm". This breaks
qtest_get_arch(), which attempts to parse the target architecture from
the QTEST_QEMU_BINARY environment variable.

Print an error instead of returning the architecture "kvm". Things fail
in weird ways when the architecture string is bogus.

Arguably qtests should always be run in a build directory instead of
against an installed QEMU. In any case, printing a clear error when this
happens is helpful.

Reported-by: Qin Wang 
Cc: Emanuele Giuseppe Esposito 
Signed-off-by: Stefan Hajnoczi 
---
   tests/qtest/libqtest.c | 10 ++
   1 file changed, 10 insertions(+)

diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index 71e359efcd..7caf20f56b 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -910,6 +910,16 @@ const char *qtest_get_arch(void)
   abort();
   }
+if (!strstr(qemu, "-system-")) {
+fprintf(stderr, "QTEST_QEMU_BINARY must end with *-system- where 
"
+"'arch' is the target architecture (x86_64, aarch64, "
+"etc). If you are using qemu-kvm or another custom "
+"name, please create a symlink like ln -s "
+"path/to/qemu-kvm qemu-system-x86_64 and use that "
+"instead.\n");


The text is very long ... maybe add some \n to wrap it after 80 columns?
(also not sure whether we really need the second part about the symlink...
but I also don't mind leaving it in)


+abort();


Since this can be triggered by the user, I'd rather use exit(1) instead,
what do you think?


Sure, but in that case I guess the abort() call above also needs to be
changed? It is triggered when the QTEST_QEMU_BINARY path does not
contain a hyphen ('-') and it currently aborts.


Drat, you're right, and it was even me who added that :-/ ... if you've got 
some spare minutes, could you send a patch for that, too, please? (Otherwise 
I'll do it later)


 Thomas

[PATCH v2 2/2] docs/devel/qgraph: add troubleshooting information

It can be tricky to troubleshoot qos-test when a test won't execute. Add
an explanation of how to trace qgraph node connectivity and find which
node has the problem.

Cc: Emanuele Giuseppe Esposito 
Cc: Paolo Bonzini 
Signed-off-by: Stefan Hajnoczi 
---
 docs/devel/qgraph.rst | 58 +++
 1 file changed, 58 insertions(+)

diff --git a/docs/devel/qgraph.rst b/docs/devel/qgraph.rst
index a9aff167ad..318534d4b0 100644
--- a/docs/devel/qgraph.rst
+++ b/docs/devel/qgraph.rst
@@ -92,6 +92,64 @@ The basic framework steps are the following:
 Depending on the QEMU binary used, only some drivers/machines will be
 available and only test that are reached by them will be executed.
 
+Troubleshooting unavailable tests
+^
+If there is no path from an available machine to a test then that test will be
+unavailable and won't execute. This can happen if a test or driver did not set
+up its qgraph node correctly. It can also happen if the necessary machine type
+or device is missing from the QEMU binary because it was compiled out or
+otherwise.
+
+It is possible to troubleshoot unavailable tests by running::
+
+  $ QTEST_QEMU_BINARY=build/qemu-system-x86_64 build/tests/qtest/qos-test 
--verbose
+  # ALL QGRAPH EDGES: {
+  #   src='virtio-net'
+  #  |-> dest='virtio-net-tests/vhost-user/multiqueue' type=2 
(node=0x559142109e30)
+  #  |-> dest='virtio-net-tests/vhost-user/migrate' type=2 
(node=0x559142109d00)
+  #   src='virtio-net-pci'
+  #  |-> dest='virtio-net' type=1 (node=0x55914210d740)
+  #   src='pci-bus'
+  #  |-> dest='virtio-net-pci' type=2 (node=0x55914210d880)
+  #   src='pci-bus-pc'
+  #  |-> dest='pci-bus' type=1 (node=0x559142103f40)
+  #   src='i440FX-pcihost'
+  #  |-> dest='pci-bus-pc' type=0 (node=0x55914210ac70)
+  #   src='x86_64/pc'
+  #  |-> dest='i440FX-pcihost' type=0 (node=0x5591421117f0)
+  #   src=''
+  #  |-> dest='x86_64/pc' type=0 (node=0x559142111600)
+  #  |-> dest='arm/raspi2' type=0 (node=0x559142110740)
+  ...
+  # }
+  # ALL QGRAPH NODES: {
+  #   name='virtio-net-tests/announce-self' type=3 cmd_line='(null)' 
[available]
+  #   name='arm/raspi2' type=0 cmd_line='-M raspi2 ' [UNAVAILABLE]
+  ...
+  # }
+
+The ``virtio-net-tests/announce-self`` test is listed as "available" in the
+"ALL QGRAPH NODES" output. This means the test will execute. We can follow the
+qgraph path in the "ALL QGRAPH EDGES" output as follows: '' -> 'x86_64/pc' ->
+'i440FX-pcihost' -> 'pci-bus-pc' -> 'pci-bus' -> 'virtio-net-pci' ->
+'virtio-net'. The root of the qgraph is '' and the depth first search begins
+there.
+
+The ``arm/raspi`` machine node is listed as "UNAVAILABLE". Although it is
+reachable from the root via '' -> 'arm/raspi2' the node is unavailable because
+the QEMU binary did not list it when queried by the framework. This is expected
+because we used the ``qemu-system-x86_64`` binary which does not support ARM
+machine types.
+
+If a test is unexpectedly listed as "UNAVAILABLE", first check that the "ALL
+QGRAPH EDGES" output reports edge connectivity from the root ('') to the test.
+If there is no connectivity then the qgraph nodes were not set up correctly and
+the driver or test code is incorrect. If there is connectivity, check the
+availability of each node in the path in the "ALL QGRAPH NODES" output. The
+first unavailable node in the path is the reason why the test is unavailable.
+Typically this is because the QEMU binary lacks support for the necessary
+machine type or device.
+
 Creating a new driver and its interface
 "
 
-- 
2.30.2

Re: [PATCH 0/3] mps3-an524: support memory remapping

Hi Peter,

On 4/12/21 3:43 PM, Peter Maydell wrote:
> The AN524 FPGA image supports two memory maps, which differ
> in where the QSPI and BRAM are. In the default map, the BRAM
> is at 0x_, and the QSPI at 0x2800_. In the second
> map, they are the other way around.
> 
> In hardware, the initial mapping can be selected by the user
> by writing either "REMAP: BRAM" (the default) or "REMAP: QSPI"
> in the board configuration file. The guest can also dynamically
> change the mapping via the SCC CFG_REG0 register.
> 
> This patchset adds support for the feature to QEMU's model;
> the user-sets-the-initial-mapping part is a new machine property
> which can be set with "-M remap=QSPI".
> 
> This is needed for some guest images -- for instance the
> Arm TF-M binaries -- which assume they have the QSPI layout.

I tend to see machine property set on the command line similar
to hardware wire jumpers, externally set (by an operator having
access to the hardware, not guest code).

Here the remap behavior you described is triggered by the guest.
Usually this is done by a bootloader code before running the
guest code.
Couldn't we have the same result using a booloader (like -bios
cmd line option) rather than modifying internal peripheral state?

I'm worried anyone wants to add its own machine property to simplify
device modelling, but maybe this is the correct way to go...

Regards,

Phil.

Re: [PATCH 2/2] docs/devel/qgraph: add troubleshooting information

On Sat, Apr 10, 2021 at 06:41:03AM +0200, Thomas Huth wrote:
> On 09/04/2021 21.01, Stefan Hajnoczi wrote:
> > It can be tricky to troubleshoot qos-test when a test won't execute. Add
> > an explanation of how to trace qgraph node connectivity and find which
> > node has the problem.
> > 
> > Cc: Emanuele Giuseppe Esposito 
> > Cc: Paolo Bonzini 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >   docs/devel/qgraph.rst | 58 +++
> >   1 file changed, 58 insertions(+)
> > 
> > diff --git a/docs/devel/qgraph.rst b/docs/devel/qgraph.rst
> > index a9aff167ad..4635efb2c2 100644
> > --- a/docs/devel/qgraph.rst
> > +++ b/docs/devel/qgraph.rst
> > @@ -92,6 +92,64 @@ The basic framework steps are the following:
> >   Depending on the QEMU binary used, only some drivers/machines will be
> >   available and only test that are reached by them will be executed.
> > +Troubleshooting unavailable tests
> > +^
> > +If there is no path from an available machine to a test then that test will
> > +unavailable and cannot execute. This can happen if a test or driver did 
> > not set
> 
> "will be unavailable" ? or "will be marked as unavailable" ?
> 
> Apart from that, patch looks fine to me.

Thanks, will fix.

Stefan


signature.asc
Description: PGP signature

Re: [PATCH v2] libqtest: refuse QTEST_QEMU_BINARY=qemu-kvm


On 12/04/2021 16.35, Thomas Huth wrote:

On 12/04/2021 16.21, Stefan Hajnoczi wrote:

On Mon, Apr 12, 2021 at 11:35:40AM +0200, Thomas Huth wrote:

On 12/04/2021 11.18, Stefan Hajnoczi wrote:

Some downstreams rename the QEMU binary to "qemu-kvm". This breaks
qtest_get_arch(), which attempts to parse the target architecture from
the QTEST_QEMU_BINARY environment variable.

Print an error instead of returning the architecture "kvm". Things fail
in weird ways when the architecture string is bogus.

Arguably qtests should always be run in a build directory instead of
against an installed QEMU. In any case, printing a clear error when this
happens is helpful.

Reported-by: Qin Wang 
Cc: Emanuele Giuseppe Esposito 
Signed-off-by: Stefan Hajnoczi 
---
   tests/qtest/libqtest.c | 10 ++
   1 file changed, 10 insertions(+)

diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index 71e359efcd..7caf20f56b 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -910,6 +910,16 @@ const char *qtest_get_arch(void)
   abort();
   }
+    if (!strstr(qemu, "-system-")) {
+    fprintf(stderr, "QTEST_QEMU_BINARY must end with 
*-system- where "
+    "'arch' is the target architecture (x86_64, 
aarch64, "
+    "etc). If you are using qemu-kvm or another 
custom "

+    "name, please create a symlink like ln -s "
+    "path/to/qemu-kvm qemu-system-x86_64 and use 
that "

+    "instead.\n");


The text is very long ... maybe add some \n to wrap it after 80 columns?
(also not sure whether we really need the second part about the symlink...
but I also don't mind leaving it in)


+    abort();


Since this can be triggered by the user, I'd rather use exit(1) instead,
what do you think?


Sure, but in that case I guess the abort() call above also needs to be
changed? It is triggered when the QTEST_QEMU_BINARY path does not
contain a hyphen ('-') and it currently aborts.


Drat, you're right, and it was even me who added that :-/ ... if you've got 
some spare minutes, could you send a patch for that, too, please? (Otherwise 
I'll do it later)


Never mind, I just saw that you've fixed it in v3 already :-)

 Thomas

Re: [PATCH v3] libqtest: refuse QTEST_QEMU_BINARY=qemu-kvm


On 12/04/2021 16.30, Stefan Hajnoczi wrote:

Some downstreams rename the QEMU binary to "qemu-kvm". This breaks
qtest_get_arch(), which attempts to parse the target architecture from
the QTEST_QEMU_BINARY environment variable.

Print an error instead of returning the architecture "kvm". Things fail
in weird ways when the architecture string is bogus.

Arguably qtests should always be run in a build directory instead of
against an installed QEMU. In any case, printing a clear error when this
happens is helpful.

Since this is an error that is triggered by the user and not a test
failure, use exit(1) instead of abort(). Change the existing abort()
call in qtest_get_arch() to exit(1) too for the same reason and to be
consistent.

Reported-by: Qin Wang 
Cc: Emanuele Giuseppe Esposito 
Signed-off-by: Stefan Hajnoczi 
---
v3:
  * Add newline to wrap error message output at 80 columns [Thomas]
  * Drop information about working around this using a symlink [Thomas]
  * Use exit(1) instead of abort() [Thomas]

  tests/qtest/libqtest.c | 9 -
  1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index 71e359efcd..825b13a44c 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -907,7 +907,14 @@ const char *qtest_get_arch(void)
  
  if (!end) {

  fprintf(stderr, "Can't determine architecture from binary name.\n");
-abort();
+exit(1);
+}
+
+if (!strstr(qemu, "-system-")) {
+fprintf(stderr, "QTEST_QEMU_BINARY must end with *-system- "
+"where 'arch' is the target\narchitecture (x86_64, aarch64, "
+"etc).\n");
+exit(1);
  }
  
  return end + 1;




Reviewed-by: Thomas Huth

Re: [RFC PATCH 0/5] mptcp support

2021-04-12 Thread Dr. David Alan Gilbert

* Daniel P. Berrangé (berra...@redhat.com) wrote:
> On Fri, Apr 09, 2021 at 10:34:30AM +0100, Daniel P. Berrangé wrote:
> > On Thu, Apr 08, 2021 at 08:11:54PM +0100, Dr. David Alan Gilbert (git) 
> > wrote:
> > > From: "Dr. David Alan Gilbert" 
> > >   I had a quick go at trying NBD as well, but I think it needs
> > > some work with the parsing of NBD addresses.
> > 
> > In theory this is applicable to anywhere that we use sockets.
> > Anywhere that is configured with the QAPI  SocketAddress /
> > SocketAddressLegacy type will get it for free AFAICT.
> 
> The caveat is any servers which share the problem of prematurely
> closing the listener socket that you fixed here for migration.

Right, this varies depending on the server semantics; migration is only
expecting a single connection so shut it immediately; nbd is already
wired to expect multiple connections.

Dave

> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [PATCH 0/3] mps3-an524: support memory remapping