Re: [PATCH] hw/nvme: Add iothread support

2022-08-26 Thread Klaus Jensen
On Jul 20 17:00, Jinhao Fan wrote:
> Add an option "iothread=x" to do emulation in a separate iothread.
> This improves performance because QEMU's main loop is responsible
> for a lot of other work, while the iothread is dedicated to NVMe
> emulation. Moreover, emulating in an iothread brings the potential of
> polling on SQ/CQ doorbells, which I will bring up in a following patch.
> 
> Iothread can be enabled by:
>  -object iothread,id=nvme0 \
>  -device nvme,iothread=nvme0 \
> 
> Performance comparisons (KIOPS):
> 
> QD        1   4  16  64
> QEMU     41 136 242 338
> iothread 53 155 245 309
> 
> Signed-off-by: Jinhao Fan 
> ---
>  hw/nvme/ctrl.c | 80 ++
>  hw/nvme/ns.c   | 19 +---
>  hw/nvme/nvme.h |  6 +++-
>  3 files changed, 95 insertions(+), 10 deletions(-)
> 

Jinhao,

Are you gonna respin this based on the irqfd patches? I suggest you just
add this work on top and post a series that is irqfd+iothread. Then, if
we find the irqfd ready for merge, we can pick that up for the next
release cycle early and continue on iothread work.


signature.asc
Description: PGP signature


Re: slirp: Can I get IPv6-only DHCP working?

2022-08-26 Thread Thomas Huth

On 26/08/2022 01.15, Peter Delevoryas wrote:

On Fri, Aug 26, 2022 at 12:56:10AM +0200, Samuel Thibault wrote:

Hello,

Peter Delevoryas, on Thu. 25 Aug 2022 at 15:38:53 -0700, wrote:

It seems like there's support for an IPv6 DNS proxy, and there's literally a
file called "dhcpv6.c" in slirp, but it has a comment saying it only supports
whatever is necessary for TFTP network boot, I guess.


For which DNS support is welcome :)


Maybe there's no support then?


It seems there is:

 if (ri.want_dns) {
 *resp++ = OPTION_DNS_SERVERS >> 8; /* option-code high byte */
 *resp++ = OPTION_DNS_SERVERS; /* option-code low byte */
 *resp++ = 0; /* option-len high byte */
 *resp++ = 16; /* option-len low byte */
 memcpy(resp, &slirp->vnameserver_addr6, 16);
 resp += 16;
 }


Well, that's great, but actually I just care about whether slirp supports DHCPv6
address requests. Sorry if I didn't explain that properly.

My goal is to run:

 qemu-system-arm -machine fby35-bmc -nographic -mtdblock image-bmc \
 -net nic,model=ftgmac100,netdev=nic \
 -netdev user,id=nic,hostfwd=::-:22

And then see that the BMC received an IPv6 address assignment.

But, slirp currently just supports IP address assignment through BOOTP?  I
didn't realize that until looking a little closer at the code.


No, slirp supports "IPv6 *stateless* address assignment" (if you haven't 
heard about that before, I suggest googling it). That means that IPv6 
addresses are not administered by a DHCP server, but that each client can 
build its own IPv6 address. It basically works like this (don't quote me on 
that, it's been a while since I worked with this stuff): Once the network 
interface of the client gets activated, the OS creates a link-local IPv6 
address that can only be used for communicating on the link (based on the 
MAC address). It then uses this address for sending a "router solicitation" 
message, and if there is a router on the link, it replies with a "router 
advertisement" that contains a routable prefix. The client can then take 
this prefix to form a unique IPv6 address and assign it to its interface. 
QEMU/slirp supports this way of address assignment, see ndp_send_na() in 
ip6_icmp.c. So if your BMC code supports stateless IPv6 address assignment 
(it certainly should), you should be fine already.


 HTH,
  Thomas




Re: [PATCH 15/20] disas/nanomips: Replace exception handling

2022-08-26 Thread Milica Lazarevic
Any comments on this patch?

From: Milica Lazarevic
Sent: Monday, August 15, 2022 9:31 AM
To: th...@redhat.com 
Cc: qemu-devel@nongnu.org ; cfont...@suse.de 
; berra...@redhat.com ; 
pbonz...@redhat.com ; vince.delvecc...@mediatek.com 
; richard.hender...@linaro.org 
; peter.mayd...@linaro.org 
; Djordje Todorovic ; 
mips3...@gmail.com ; Dragan Mladjenovic 
; Milica Lazarevic 
Subject: [PATCH 15/20] disas/nanomips: Replace exception handling

Since there's no support for exception handling in C, the try-catch
blocks have been deleted and the throw clauses replaced. When a runtime
error happens, we print out the error message and disassembly of the
current instruction is interrupted. This behavior is achieved by adding
sigsetjmp() to abandon further disassembling after the error message
prints, and by adding siglongjmp() calls to imitate throwing an
error. The goal was to keep the output the same as before.

Signed-off-by: Milica Lazarevic 
---
 disas/nanomips.cpp | 135 +++--
 1 file changed, 69 insertions(+), 66 deletions(-)

diff --git a/disas/nanomips.cpp b/disas/nanomips.cpp
index 769368a984..a8cd878809 100644
--- a/disas/nanomips.cpp
+++ b/disas/nanomips.cpp
@@ -31,7 +31,6 @@
 #include "disas/dis-asm.h"

 #include 
-#include 
 #include 
 #include 

@@ -90,6 +89,8 @@ struct Pool {

 static img_address   m_pc;
 static TABLE_ATTRIBUTE_TYPE   m_requested_instruction_categories;
+static struct disassemble_info *disassm_info;
+static jmp_buf j_buf;

 static const char *img_format(const char *format, ...)
 {
@@ -133,10 +134,13 @@ static uint64 renumber_registers(uint64 index, uint64 
*register_list,
 return register_list[index];
 }

-throw std::runtime_error(img_format(
-   "Invalid register mapping index %" PRIu64
-   ", size of list = %zu",
-   index, register_list_size));
+const char *err = img_format(
+  "Invalid register mapping index %" PRIu64
+  ", size of list = %zu",
+  index, register_list_size);
+(*disassm_info->fprintf_func)(disassm_info->stream, "%s", err);
+free((char *)err);
+siglongjmp(j_buf, 1);
 }


@@ -513,8 +517,11 @@ static const char *GPR(uint64 reg)
 return gpr_reg[reg];
 }

-throw std::runtime_error(img_format("Invalid GPR register index %" PRIu64,
- reg));
+const char *err = img_format("Invalid GPR register index %" PRIu64,
+ reg);
+(*disassm_info->fprintf_func)(disassm_info->stream, "%s", err);
+free((char *)err);
+siglongjmp(j_buf, 1);
 }


@@ -548,8 +555,11 @@ static const char *FPR(uint64 reg)
 return fpr_reg[reg];
 }

-throw std::runtime_error(img_format("Invalid FPR register index %" PRIu64,
- reg));
+const char *err = img_format("Invalid FPR register index %" PRIu64,
+ reg);
+(*disassm_info->fprintf_func)(disassm_info->stream, "%s", err);
+free((char *)err);
+siglongjmp(j_buf, 1);
 }


@@ -563,8 +573,11 @@ static const char *AC(uint64 reg)
 return ac_reg[reg];
 }

-throw std::runtime_error(img_format("Invalid AC register index %" PRIu64,
- reg));
+const char *err = img_format("Invalid AC register index %" PRIu64,
+ reg);
+(*disassm_info->fprintf_func)(disassm_info->stream, "%s", err);
+free((char *)err);
+siglongjmp(j_buf, 1);
 }


@@ -628,67 +641,50 @@ static int Disassemble(const uint16 *data, char *dis,
TABLE_ENTRY_TYPE & type, const Pool *table,
int table_size)
 {
-try
-{
-for (int i = 0; i < table_size; i++) {
-uint64 op_code = extract_op_code_value(data,
- table[i].instructions_size);
-if ((op_code & table[i].mask) == table[i].value) {
-/* possible match */
-conditional_function cond = table[i].condition;
-if ((cond == 0) || (cond)(op_code)) {
-try
-{
-if (table[i].type == pool) {
-return Disassemble(data, dis, type,
-   table[i].next_table,
-   table[i].next_table_size);
-} else if ((table[i].type == instruction) ||
-   (table[i].type == call_instruction) ||
-   (table[i].type == branch_instruction) ||
-   (table[i].type == return_instruction)) {
-if ((table[i].attributes != 0) &&
-(m_requested_instruction_categories &
-

Re: [PATCH] gitlab-ci: Only use one process in Windows jobs for compilation

2022-08-26 Thread Bin Meng
On Fri, Aug 26, 2022 at 3:33 AM Thomas Huth  wrote:
>
> The Windows jobs are currently aborting at weird places - and
> there's the suspicion that it's due to memory constraints in
> the Windows containers. Let's switch to single-threaded compilation
> to decrease the memory pressure, and to make the
> jobs more deterministic for further investigations.
>
> Signed-off-by: Thomas Huth 
> ---
>  .gitlab-ci.d/windows.yml | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>

Reviewed-by: Bin Meng 

Could it possibly be caused by the Windows version of meson? I once
saw the same build failure [1] on a Windows host with 64 GiB of memory,
so I think it is not due to memory constraints on the build host.

[1] https://gitlab.com/stsquad/qemu/-/jobs/2765579267



Re: [PATCH] hw/nvme: Add iothread support

2022-08-26 Thread Jinhao Fan
Sure. I’ve already reworked this iothread patch on top of the new irqfd 
patch. I think I can post a v2 patch today. Do you mean I should include 
irqfd v3 in the new iothread patch series?

Sent from my iPhone

> On Aug 26, 2022, at 15:12, Klaus Jensen  wrote:
> 
> On Jul 20 17:00, Jinhao Fan wrote:
>> Add an option "iothread=x" to do emulation in a separate iothread.
>> This improves performance because QEMU's main loop is responsible
>> for a lot of other work, while the iothread is dedicated to NVMe
>> emulation. Moreover, emulating in an iothread brings the potential of
>> polling on SQ/CQ doorbells, which I will bring up in a following patch.
>> 
>> Iothread can be enabled by:
>> -object iothread,id=nvme0 \
>> -device nvme,iothread=nvme0 \
>> 
>> Performance comparisons (KIOPS):
>> 
>> QD        1   4  16  64
>> QEMU     41 136 242 338
>> iothread 53 155 245 309
>> 
>> Signed-off-by: Jinhao Fan 
>> ---
>> hw/nvme/ctrl.c | 80 ++
>> hw/nvme/ns.c   | 19 +---
>> hw/nvme/nvme.h |  6 +++-
>> 3 files changed, 95 insertions(+), 10 deletions(-)
>> 
> 
> Jinhao,
> 
> Are you gonna respin this based on the irqfd patches? I suggest you just
> add this work on top and post a series that is irqfd+iothread. Then, if
> we find the irqfd ready for merge, we can pick that up for the next
> release cycle early and continue on iothread work.




Re: [PATCH] chardev: fix segfault in finalize

2022-08-26 Thread Marc-André Lureau
Hi


On Thu, Aug 25, 2022 at 9:02 PM Maksim Davydov 
wrote:

> If finalize of chardev-msmouse or chardev-wctablet is called immediately
> after init, it causes QEMU to crash with a segfault. This happens because
> QTAILQ_REMOVE in qemu_input_handler_unregister tries to dereference a
> NULL pointer.
> For instance, this error can be reproduced via the `qom-list-properties`
> command.
>
> Signed-off-by: Maksim Davydov 
>

Reviewed-by: Marc-André Lureau 


> ---
>  chardev/msmouse.c  | 4 +++-
>  chardev/wctablet.c | 4 +++-
>  2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/chardev/msmouse.c b/chardev/msmouse.c
> index eb9231dcdb..2cc1b16561 100644
> --- a/chardev/msmouse.c
> +++ b/chardev/msmouse.c
> @@ -146,7 +146,9 @@ static void char_msmouse_finalize(Object *obj)
>  {
>  MouseChardev *mouse = MOUSE_CHARDEV(obj);
>
> -qemu_input_handler_unregister(mouse->hs);
> +if (mouse->hs) {
> +qemu_input_handler_unregister(mouse->hs);
> +}
>  }
>
>  static QemuInputHandler msmouse_handler = {
> diff --git a/chardev/wctablet.c b/chardev/wctablet.c
> index e8b292c43c..43bdf6b608 100644
> --- a/chardev/wctablet.c
> +++ b/chardev/wctablet.c
> @@ -319,7 +319,9 @@ static void wctablet_chr_finalize(Object *obj)
>  {
>  TabletChardev *tablet = WCTABLET_CHARDEV(obj);
>
> -qemu_input_handler_unregister(tablet->hs);
> +if (tablet->hs) {
> +qemu_input_handler_unregister(tablet->hs);
> +}
>  }
>
>  static void wctablet_chr_open(Chardev *chr,
> --
> 2.25.1
>
>
>

-- 
Marc-André Lureau


Re: [PATCH 4/5] virtio-net: Update virtio-net curr_queue_pairs in vdpa backends

2022-08-26 Thread Eugenio Perez Martin
On Fri, Aug 26, 2022 at 6:29 AM Si-Wei Liu  wrote:
>
>
>
> On 8/24/2022 11:19 PM, Eugenio Perez Martin wrote:
> > On Thu, Aug 25, 2022 at 2:38 AM Si-Wei Liu  wrote:
> >>
> >>
> >> On 8/23/2022 9:27 PM, Jason Wang wrote:
> >>> 在 2022/8/20 01:13, Eugenio Pérez 写道:
>  It was returned as error before. Instead of it, simply update the
>  corresponding field so qemu can send it in the migration data.
> 
>  Signed-off-by: Eugenio Pérez 
>  ---
> >>>
> >>> Looks correct.
> >>>
> >>> Adding Si Wei for double check.
> >> Hmmm, I understand why this change is needed for live migration, but
> >> this would easily cause userspace to get out of sync with the kernel for
> >> other use cases, such as link down or userspace fallback due to vdpa
> >> ioctl errors. Yes, these are edge cases.
> > The link down case is not possible at this moment because that cvq
> > command does not call virtio_net_handle_ctrl_iov.
> Right. Though shadow cvq would need to rely on extra ASID support from
> kernel. For the case without shadow cvq we still need to look for an
> alternative mechanism.
>
> > A similar treatment
> > than mq would be needed when supported, and the call to
> > virtio_net_set_status will be avoided.
> So, maybe the seemingly "right" fix for the moment is to prohibit manual
> set_link at all (for vDPA only)?

We can apply a similar solution and just save the link status, without
stopping any vqp backend. The code can be more elegant than checking
if the backend is vhost-vdpa of course, but what is the problem with
doing it that way?

> In longer term we'd need to come up
> with appropriate support for applying mq config regardless of asid or
> shadow cvq support.
>

What do you mean by applying "mq config"? To the virtio-net device
model in qemu? Is there any use case to apply it to the model outside
of live migration?

On the other hand, the current approach is not using ASID at all, it
will be added on top. Do you mean that it is needed for data
passthrough & CVQ shadow, isn't it?

> >
> > I'll double check device initialization ioctl failure with
> > n->curr_queue_pairs > 1 in the destination, but I think we should be
> > safe.
> >
> >> Not completely against it, but I
> >> wonder if there's a way we can limit the change scope to live migration
> >> case only?
> >>
> > The reason to update the device model is to send the curr_queue_pairs
> > to the destination in a backend agnostic way. To send it otherwise
> > would limit the live migration possibilities, but sure we can explore
> > another way.
> A hacky workaround that came off the top of my head was to allow sending
> curr_queue_pairs for the !vm_running case for vdpa. It doesn't look it
> would affect other backend I think. But I agree with Jason, this doesn't
> look decent so I give up on this idea. Hence for this patch,
>

I still don't get the problem. Also, the guest would need to reset the
device anyway, so that information will be lost, won't it?

Thanks!

> Acked-by: Si-Wei Liu 
>
> >
> > Thanks!
> >
>




Re: [PATCH] hw/nvme: Add iothread support

2022-08-26 Thread Klaus Jensen
On Aug 26 16:15, Jinhao Fan wrote:
> Sure. I’ve already reworked this iothread patch on top of the new irqfd
> patch. I think I can post a v2 patch today. Do you mean I should include
> irqfd v3 in the new iothread patch series?
> 

Yes, please include irqfd-v3 in the series.




Re: [PATCH 0/7] configure: fix misc shellcheck warnings

2022-08-26 Thread Marc-André Lureau
Hi

On Thu, Aug 25, 2022 at 7:09 PM Peter Maydell 
wrote:

> Currently if you run shellcheck on our configure script it
> generates a ton of warnings. This patchset fixes some of the
> easier ones. I wasn't aiming for completeness or consistency;
> I just wanted to zap some of the ones where the fix is clear
> and didn't take long to write and is hopefully easy to review.
> We can always come back and take another swing at it later.
>
> thanks
> -- PMM
>
> Peter Maydell (7):
>   configure: Remove unused python_version variable
>   configure: Remove unused meson_args variable
>   configure: Add missing quoting for some easy cases
>   configure: Add './' on front of glob of */config-devices.mak.d
>   configure: Remove use of backtick `...` syntax
>   configure: Check mkdir result directly, not via $?
>   configure: Avoid use of 'local' as it is non-POSIX
>
>  configure | 82 ++-
>  1 file changed, 38 insertions(+), 44 deletions(-)
>
> --
> 2.25.1
>
>
>
Reviewed-by: Marc-André Lureau 


-- 
Marc-André Lureau


Re: [PATCH v2 1/1] virtio-gpu: CONTEXT_INIT feature

2022-08-26 Thread Marc-André Lureau
Hi

On Thu, Aug 25, 2022 at 8:10 PM Antonio Caggiano <
antonio.caggi...@collabora.com> wrote:

> Create virgl renderer context with flags using context_id when valid.
>
> v2:
> - The feature can be enabled via the context_init config option.
> - A warning message will be emitted and the feature will not be used
>   when linking with virglrenderer versions without context_init support.
>
> Signed-off-by: Antonio Caggiano 
> ---
>  hw/display/virtio-gpu-base.c   |  3 +++
>  hw/display/virtio-gpu-virgl.c  | 19 +--
>  hw/display/virtio-gpu.c|  2 ++
>  include/hw/virtio/virtio-gpu.h |  3 +++
>  meson.build| 18 ++
>  5 files changed, 43 insertions(+), 2 deletions(-)
>
> diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c
> index a29f191aa8..6c5f1f327f 100644
> --- a/hw/display/virtio-gpu-base.c
> +++ b/hw/display/virtio-gpu-base.c
> @@ -215,6 +215,9 @@ virtio_gpu_base_get_features(VirtIODevice *vdev,
> uint64_t features,
>  if (virtio_gpu_blob_enabled(g->conf)) {
>  features |= (1 << VIRTIO_GPU_F_RESOURCE_BLOB);
>  }
> +if (virtio_gpu_context_init_enabled(g->conf)) {
> +features |= (1 << VIRTIO_GPU_F_CONTEXT_INIT);
> +}
>
>  return features;
>  }
> diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
> index 73cb92c8d5..d70a0006b1 100644
> --- a/hw/display/virtio-gpu-virgl.c
> +++ b/hw/display/virtio-gpu-virgl.c
> @@ -97,8 +97,23 @@ static void virgl_cmd_context_create(VirtIOGPU *g,
>  trace_virtio_gpu_cmd_ctx_create(cc.hdr.ctx_id,
>  cc.debug_name);
>
> -virgl_renderer_context_create(cc.hdr.ctx_id, cc.nlen,
> -  cc.debug_name);
> +if (cc.context_init) {
> +#if VIRGL_RENDERER_HAS_CONTEXT_INIT
> +virgl_renderer_context_create_with_flags(cc.hdr.ctx_id,
> + cc.context_init,
> + cc.nlen,
> + cc.debug_name);
> +return;
> +#else
> +qemu_log_mask(LOG_UNIMP,
> +  "Virglrenderer %d.%d.%d does not support context-init\n",
> +  VIRGL_RENDERER_VERSION_MAJOR,
> +  VIRGL_RENDERER_VERSION_MINOR,
> +  VIRGL_RENDERER_VERSION_MICRO);
> +#endif
> +}
> +
> +virgl_renderer_context_create(cc.hdr.ctx_id, cc.nlen, cc.debug_name);
>  }
>
>  static void virgl_cmd_context_destroy(VirtIOGPU *g,
> diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
> index 20cc703dcc..fa667ec234 100644
> --- a/hw/display/virtio-gpu.c
> +++ b/hw/display/virtio-gpu.c
> @@ -1424,6 +1424,8 @@ static Property virtio_gpu_properties[] = {
>   256 * MiB),
>  DEFINE_PROP_BIT("blob", VirtIOGPU, parent_obj.conf.flags,
>  VIRTIO_GPU_FLAG_BLOB_ENABLED, false),
> +DEFINE_PROP_BIT("context_init", VirtIOGPU, parent_obj.conf.flags,
> +VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED, false),
>  DEFINE_PROP_END_OF_LIST(),
>  };
>
> diff --git a/include/hw/virtio/virtio-gpu.h
> b/include/hw/virtio/virtio-gpu.h
> index 2e28507efe..c6f5cfde47 100644
> --- a/include/hw/virtio/virtio-gpu.h
> +++ b/include/hw/virtio/virtio-gpu.h
> @@ -90,6 +90,7 @@ enum virtio_gpu_base_conf_flags {
>  VIRTIO_GPU_FLAG_EDID_ENABLED,
>  VIRTIO_GPU_FLAG_DMABUF_ENABLED,
>  VIRTIO_GPU_FLAG_BLOB_ENABLED,
> +VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED,
>  };
>
>  #define virtio_gpu_virgl_enabled(_cfg) \
> @@ -102,6 +103,8 @@ enum virtio_gpu_base_conf_flags {
>  (_cfg.flags & (1 << VIRTIO_GPU_FLAG_DMABUF_ENABLED))
>  #define virtio_gpu_blob_enabled(_cfg) \
>  (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED))
> +#define virtio_gpu_context_init_enabled(_cfg) \
> +(_cfg.flags & (1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED))
>
>  struct virtio_gpu_base_conf {
>  uint32_t max_outputs;
> diff --git a/meson.build b/meson.build
> index 20fddbd707..0d834ff027 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -718,6 +718,24 @@ if not get_option('virglrenderer').auto() or
> have_system or have_vhost_user_gpu
>   method: 'pkg-config',
>   required: get_option('virglrenderer'),
>   kwargs: static_kwargs)
> +
> +  if virgl.found()
> +virgl_compile_args = [
> +  '-DVIRGL_RENDERER_VERSION_MAJOR=' + virgl.version().split('.')[0],
> +  '-DVIRGL_RENDERER_VERSION_MINOR=' + virgl.version().split('.')[1],
> +  '-DVIRGL_RENDERER_VERSION_MICRO=' + virgl.version().split('.')[2],
> +]
>

It would be better to avoid using the VIRGL_ prefix, as it may conflict
with future updates.

You don't use those (yet). Perhaps the library header should define those
instead.


+if cc.has_function('virgl_renderer_context_create_with_flags',
> +   prefix: '#include ',
> +  

Re: [PATCH v1 20/40] i386/tdvf: Introduce function to parse TDVF metadata

2022-08-26 Thread Gerd Hoffmann
On Tue, Aug 02, 2022 at 03:47:30PM +0800, Xiaoyao Li wrote:
> From: Isaku Yamahata 
> 
> TDX VM needs to boot with its specialized firmware, Trusted Domain
> Virtual Firmware (TDVF). QEMU needs to parse TDVF and map it in TD
> guest memory prior to running the TDX VM.
> 
> The TDVF metadata in the TDVF image describes the structure of the
> firmware. QEMU refers to it to set up memory for TDVF. Introduce the
> function tdvf_parse_metadata() to parse the metadata from the TDVF
> image and store the info of each TDVF section.
> 
> The TDX metadata is located by a TDX metadata offset block, which is a
> GUID-ed structure. The data portion of the GUID structure contains
> only a 4-byte field that is the offset of the TDX metadata from the end
> of the firmware file.
> 
> Select X86_FW_OVMF when TDX is enabled to leverage existing functions
> to parse and search OVMF's GUID-ed structures.
> 
> Signed-off-by: Isaku Yamahata 
> Co-developed-by: Xiaoyao Li 
> Signed-off-by: Xiaoyao Li 

Acked-by: Gerd Hoffmann 




RE: [PATCH v5 1/2] Update AVX512 support for xbzrle_encode_buffer

2022-08-26 Thread Xu, Ling1
Hi, Juan,
  Thanks for your time and suggestions on this patch. We have revised our 
code according to your comments. We will submit patch v6 with these 
modifications.

Best Regards
Ling

-Original Message-
From: Juan Quintela  
Sent: Wednesday, August 24, 2022 4:42 PM
To: Xu, Ling1 
Cc: qemu-devel@nongnu.org; dgilb...@redhat.com; Zhao, Zhou 
; Jin, Jun I 
Subject: Re: [PATCH v5 1/2] Update AVX512 support for xbzrle_encode_buffer

ling xu  wrote:
> This commit updates the AVX512 support code for the xbzrle_encode_buffer
> function to accelerate XBZRLE encoding. We add a runtime check for
> AVX512 and add a benchmark for this feature. Compared with the C version
> of the xbzrle_encode_buffer function, the AVX512 version can achieve a
> 50%-70% performance improvement in benchmarking. In addition, if dirty
> data is randomly located in the 4K page, the AVX512 version can achieve
> almost a 140% performance gain.
>
> Signed-off-by: ling xu 
> Co-authored-by: Zhou Zhao 
> Co-authored-by: Jun Jin 
> ---
>  meson.build|  16 ++
>  meson_options.txt  |   2 +
>  migration/ram.c|  35 ++--
>  migration/xbzrle.c | 130 +
>  migration/xbzrle.h |   4 ++
>  5 files changed, 184 insertions(+), 3 deletions(-)
>
> diff --git a/meson.build b/meson.build index 30a380752c..c9d90a5bff 
> 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -2264,6 +2264,22 @@ config_host_data.set('CONFIG_AVX512F_OPT', 
> get_option('avx512f') \
>  int main(int argc, char *argv[]) { return bar(argv[0]); }
>'''), error_message: 'AVX512F not available').allowed())
>  
> +config_host_data.set('CONFIG_AVX512BW_OPT', get_option('avx512bw') \
> +  .require(have_cpuid_h, error_message: 'cpuid.h not available, 
> +cannot enable AVX512BW') \
> +  .require(cc.links('''
> +#pragma GCC push_options
> +#pragma GCC target("avx512bw")
> +#include 
> +#include 
> +static int bar(void *a) {


> +  __m512i x = *(__m512i *)a;
> +  __m512i res= _mm512_abs_epi8(x);

Cast is as ugly as hell, what about:

  __m512i *x = a;
  __m512i res = _mm512_abs_epi8(*x);

??

> +static void __attribute__((constructor)) init_cpu_flag(void) {
> +unsigned max = __get_cpuid_max(0, NULL);
> +int a, b, c, d;
> +if (max >= 1) {
> +__cpuid(1, a, b, c, d);
> + /* We must check that AVX is not just available, but usable.  */
> +if ((c & bit_OSXSAVE) && (c & bit_AVX) && max >= 7) {
> +int bv;
> +__asm("xgetbv" : "=a"(bv), "=d"(d) : "c"(0));
> +__cpuid_count(7, 0, a, b, c, d);
> +   /* 0xe6:
> +*  XCR0[7:5] = 111b (OPMASK state, upper 256-bit of ZMM0-ZMM15
> +*and ZMM16-ZMM31 state are enabled by OS)
> +*  XCR0[2:1] = 11b (XMM state and YMM state are enabled by OS)
> +*/
> +if ((bv & 0xe6) == 0xe6 && (b & bit_AVX512BW)) {
> +xbzrle_encode_buffer_func = xbzrle_encode_buffer_avx512;
> +}
> +}
> +}
> +return ;

This return line is not needed.

> +}
> +#endif
> +
>  XBZRLECacheStats xbzrle_counters;
>  
>  /* struct contains XBZRLE cache and a static page @@ -802,9 +831,9 @@ 
> static int save_xbzrle_page(RAMState *rs, uint8_t **current_data,
>  memcpy(XBZRLE.current_buf, *current_data, TARGET_PAGE_SIZE);
>  
>  /* XBZRLE encoding (if there is no overflow) */
> -encoded_len = xbzrle_encode_buffer(prev_cached_page, XBZRLE.current_buf,
> -   TARGET_PAGE_SIZE, XBZRLE.encoded_buf,
> -   TARGET_PAGE_SIZE);
> +encoded_len = xbzrle_encode_buffer_func(prev_cached_page, 
> XBZRLE.current_buf,
> +TARGET_PAGE_SIZE, 
> XBZRLE.encoded_buf,
> +TARGET_PAGE_SIZE);
>  
>  /*
>   * Update the cache contents, so that it corresponds to the data 
> diff --git a/migration/xbzrle.c b/migration/xbzrle.c index 
> 1ba482ded9..6da7f79625 100644
> --- a/migration/xbzrle.c
> +++ b/migration/xbzrle.c
> @@ -174,3 +174,133 @@ int xbzrle_decode_buffer(uint8_t *src, int slen, 
> uint8_t *dst, int dlen)
>  
>  return d;
>  }
> +
> +#if defined(CONFIG_AVX512BW_OPT)
> +#pragma GCC push_options
> +#pragma GCC target("avx512bw")
> +#include 
> +int xbzrle_encode_buffer_avx512(uint8_t *old_buf, uint8_t *new_buf, int slen,
> + uint8_t *dst, int dlen) {
> +uint32_t zrun_len = 0, nzrun_len = 0;
> +int d = 0, i = 0, num = 0;
> +uint8_t *nzrun_start = NULL;
> +/* add 1 to include residual part in main loop */
> +uint32_t count512s = (slen >> 6) + 1;
> +/* countResidual is tail of data, i.e., countResidual = slen % 64 */
> +uint32_t countResidual = slen & 0b11;
> +bool never_same = true;
> +uint64_t maskResidual = 1;
> +maskResidual <<= countResidual;
> +maskResidual 

Re: [PATCH v1 22/40] i386/tdx: Skip BIOS shadowing setup

2022-08-26 Thread Gerd Hoffmann
On Tue, Aug 02, 2022 at 03:47:32PM +0800, Xiaoyao Li wrote:
> TDX doesn't support mapping different GPAs to the same private memory.
> Thus, aliasing the top 128KB of the BIOS as isa-bios is not supported.
> 
> On the other hand, a TDX guest cannot go into real mode, so it can work
> fine without isa-bios.
> 
> Signed-off-by: Xiaoyao Li 

Acked-by: Gerd Hoffmann 




Re: [PATCH v1 26/40] headers: Add definitions from UEFI spec for volumes, resources, etc...

2022-08-26 Thread Gerd Hoffmann
On Tue, Aug 02, 2022 at 03:47:36PM +0800, Xiaoyao Li wrote:
> Add UEFI definitions for literals, enums, structs, GUIDs, etc... that
> will be used by TDX to build the UEFI Hand-Off Block (HOB) that is passed
> to the Trusted Domain Virtual Firmware (TDVF).
> 
> All values come from the UEFI specification and TDVF design guide. [1]
> 
> Note, EFI_RESOURCE_MEMORY_UNACCEPTED will be added in future UEFI spec.
> 
> [1] 
> https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-virtual-firmware-design-guide-rev-1.pdf
> 
> Signed-off-by: Xiaoyao Li 

Acked-by: Gerd Hoffmann 




Re: [PATCH v1 25/40] i386/tdx: Track RAM entries for TDX VM

2022-08-26 Thread Gerd Hoffmann
On Tue, Aug 02, 2022 at 03:47:35PM +0800, Xiaoyao Li wrote:
> The RAM of TDX VM can be classified into two types:
> 
>  - TDX_RAM_UNACCEPTED: default type of TDX memory, which needs to be
>accepted by TDX guest before it can be used and will be all-zeros
>after being accepted.
> 
>  - TDX_RAM_ADDED: the RAM that is ADD'ed to TD guest before running, and
>can be used directly. E.g., TD HOB and TEMP MEM that needed by TDVF.
> 
> Maintain TdxRamEntries[] which grabs the initial RAM info from e820 table
> and mark each RAM range as default type TDX_RAM_UNACCEPTED.
> 
> Then turn the ranges of the TD HOB and TEMP MEM to TDX_RAM_ADDED, since
> these ranges will be ADD'ed before the TD runs and don't need to be
> accepted at runtime.
> 
> The TdxRamEntries[] are later used to setup the memory TD resource HOB
> that passes memory info from QEMU to TDVF.
> 
> Signed-off-by: Xiaoyao Li 

Acked-by: Gerd Hoffmann 




Re: [PATCH 1/2] dump: simplify a bit kdump get_next_page()

2022-08-26 Thread David Hildenbrand
On 25.08.22 15:21, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> This should be functionally equivalent, but slightly easier to read,
> with simplified paths and checks at the end of the function.
> 
> The following patch is a major rewrite to get rid of the assert().
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  dump/dump.c | 30 --
>  1 file changed, 12 insertions(+), 18 deletions(-)
> 
> diff --git a/dump/dump.c b/dump/dump.c
> index 4d9658ffa2..18f06cffe2 100644
> --- a/dump/dump.c
> +++ b/dump/dump.c
> @@ -1107,37 +1107,31 @@ static bool get_next_page(GuestPhysBlock **blockptr, 
> uint64_t *pfnptr,
>  uint8_t *buf;
>  
>  /* block == NULL means the start of the iteration */
> -if (!block) {
> -block = QTAILQ_FIRST(&s->guest_phys_blocks.head);
> -*blockptr = block;
> -assert((block->target_start & ~target_page_mask) == 0);
> -assert((block->target_end & ~target_page_mask) == 0);
> -*pfnptr = dump_paddr_to_pfn(s, block->target_start);
> -if (bufptr) {
> -*bufptr = block->host_addr;
> -}
> -return true;


Instead of the "return true" we'll now take the "if ((addr >=
block->target_start) &&" path below, I guess, always ending up with
essentially "buf = buf;" because addr == block->target_start.

I guess that's fine.

> +if (block == NULL) {

What's wrong with keeping the "if (!block) {" ? :)

> +*blockptr = block = QTAILQ_FIRST(&s->guest_phys_blocks.head);

Another unnecessary change.

> +addr = block->target_start;
> +} else {
> +addr = dump_pfn_to_paddr(s, *pfnptr + 1);
>  }
> -
> -*pfnptr = *pfnptr + 1;
> -addr = dump_pfn_to_paddr(s, *pfnptr);
> +assert(block != NULL);
>  
>  if ((addr >= block->target_start) &&
>  (addr + s->dump_info.page_size <= block->target_end)) {
>  buf = block->host_addr + (addr - block->target_start);
>  } else {
>  /* the next page is in the next block */
> -block = QTAILQ_NEXT(block, next);
> -*blockptr = block;
> +*blockptr = block = QTAILQ_NEXT(block, next);

Another unnecessary change. (avoiding these really eases review, because
the focus is then completely on the actual code changes)

>  if (!block) {
>  return false;
>  }
> -assert((block->target_start & ~target_page_mask) == 0);
> -assert((block->target_end & ~target_page_mask) == 0);
> -*pfnptr = dump_paddr_to_pfn(s, block->target_start);
> +addr = block->target_start;
>  buf = block->host_addr;
>  }
>  
> +/* those checks are going away next */

This comment seems to imply a story documented in code. Rather just drop
it -- the patch description already points that out.

> +assert((block->target_start & ~target_page_mask) == 0);
> +assert((block->target_end & ~target_page_mask) == 0);
> +*pfnptr = dump_paddr_to_pfn(s, addr);
>  if (bufptr) {
>  *bufptr = buf;
>  }


Apart from the nits, LGTM.

-- 
Thanks,

David / dhildenb




[PATCH v1 1/2] hw/loongarch: Support memory hotplug

2022-08-26 Thread Xiaojuan Yang
Add hotplug/unplug interface for memory device.

Signed-off-by: Xiaojuan Yang 
---
 hw/loongarch/Kconfig  |   2 +
 hw/loongarch/acpi-build.c |  32 +---
 hw/loongarch/virt.c   | 105 +-
 3 files changed, 132 insertions(+), 7 deletions(-)

diff --git a/hw/loongarch/Kconfig b/hw/loongarch/Kconfig
index fef55c5638..17d15b6c90 100644
--- a/hw/loongarch/Kconfig
+++ b/hw/loongarch/Kconfig
@@ -4,6 +4,7 @@ config LOONGARCH_VIRT
 select PCI_EXPRESS_GENERIC_BRIDGE
 imply VIRTIO_VGA
 imply PCI_DEVICES
+imply NVDIMM
 select ISA_BUS
 select SERIAL
 select SERIAL_ISA
@@ -18,3 +19,4 @@ config LOONGARCH_VIRT
 select ACPI_PCI
 select ACPI_HW_REDUCED
 select FW_CFG_DMA
+select DIMM
diff --git a/hw/loongarch/acpi-build.c b/hw/loongarch/acpi-build.c
index 95e30975a8..92ee62c11a 100644
--- a/hw/loongarch/acpi-build.c
+++ b/hw/loongarch/acpi-build.c
@@ -186,6 +186,12 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
 build_srat_memory(table_data, VIRT_HIGHMEM_BASE, machine->ram_size - VIRT_LOWMEM_SIZE,
   0, MEM_AFFINITY_ENABLED);
 
+if (ms->device_memory) {
+build_srat_memory(table_data, ms->device_memory->base,
+  memory_region_size(&ms->device_memory->mr),
+  0, MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
+}
+
 acpi_table_end(linker, &table);
 }
 
@@ -335,6 +341,25 @@ static void build_uart_device_aml(Aml *table)
 aml_append(table, scope);
 }
 
+static void
+build_la_ged_aml(Aml *dsdt, MachineState *machine)
+{
+uint32_t event;
+LoongArchMachineState *lams = LOONGARCH_MACHINE(machine);
+
+build_ged_aml(dsdt, "\\_SB."GED_DEVICE,
+  HOTPLUG_HANDLER(lams->acpi_ged),
+  VIRT_SCI_IRQ, AML_SYSTEM_MEMORY,
+  VIRT_GED_EVT_ADDR);
+event = object_property_get_uint(OBJECT(lams->acpi_ged),
+ "ged-event", &error_abort);
+if (event & ACPI_GED_MEM_HOTPLUG_EVT) {
+build_memory_hotplug_aml(dsdt, machine->ram_slots, "\\_SB", NULL,
+ AML_SYSTEM_MEMORY,
+ VIRT_GED_MEM_ADDR);
+}
+}
+
 /* build DSDT */
 static void
 build_dsdt(GArray *table_data, BIOSLinker *linker, MachineState *machine)
@@ -364,12 +389,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, MachineState *machine)
 
 build_gpex_pci0_int(dsdt);
 build_uart_device_aml(dsdt);
-if (lams->acpi_ged) {
-build_ged_aml(dsdt, "\\_SB."GED_DEVICE,
-  HOTPLUG_HANDLER(lams->acpi_ged),
-  VIRT_SCI_IRQ, AML_SYSTEM_MEMORY,
-  VIRT_GED_EVT_ADDR);
-}
+build_la_ged_aml(dsdt, machine);
 
 scope = aml_scope("\\_SB.PCI0");
 /* Build PCI0._CRS */
diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index 1e1dc699ef..a81db29384 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -40,6 +40,7 @@
 #include "hw/core/sysbus-fdt.h"
 #include "hw/platform-bus.h"
 #include "hw/display/ramfb.h"
+#include "hw/mem/pc-dimm.h"
 
 static void create_fdt(LoongArchMachineState *lams)
 {
@@ -719,6 +720,35 @@ static void loongarch_init(MachineState *machine)
  machine->ram, offset, highram_size);
 memory_region_add_subregion(address_space_mem, 0x9000, &lams->highmem);
 memmap_add_entry(0x9000, highram_size, 1);
+
+/* initialize device memory address space */
+if (machine->ram_size < machine->maxram_size) {
+machine->device_memory = g_malloc0(sizeof(*machine->device_memory));
+ram_addr_t device_mem_size = machine->maxram_size - machine->ram_size;
+
+if (machine->ram_slots > ACPI_MAX_RAM_SLOTS) {
+error_report("unsupported amount of memory slots: %"PRIu64,
+ machine->ram_slots);
+exit(EXIT_FAILURE);
+}
+
+if (QEMU_ALIGN_UP(machine->maxram_size,
+  TARGET_PAGE_SIZE) != machine->maxram_size) {
+error_report("maximum memory size must be aligned to a multiple of "
+ "%d bytes", TARGET_PAGE_SIZE);
+exit(EXIT_FAILURE);
+}
+/* device memory base is the top of high memory address. */
+machine->device_memory->base = 0x9000 + highram_size;
+machine->device_memory->base =
+ROUND_UP(machine->device_memory->base, 1 * GiB);
+
+memory_region_init(&machine->device_memory->mr, OBJECT(lams),
+   "device-memory", device_mem_size);
+memory_region_add_subregion(address_space_mem, machine->device_memory->base,
+&machine->device_memory->mr);
+}
+
 /* Add isa io region */
 memory_region_init_alias(&lams->isa_io, NULL, "isa-io",
  get_system_io(), 0, VIRT_ISA_IO_SIZE);
@@ -8

[PATCH v1 2/2] hw/loongarch: Improve acpi dsdt table

2022-08-26 Thread Xiaojuan Yang
Clean up the previous PCI information in the ACPI DSDT table, and
use the common acpi_dsdt_add_gpex() function to build the GPEX and
PCI information.

Signed-off-by: Xiaojuan Yang 
---
 hw/loongarch/acpi-build.c   | 159 +---
 hw/loongarch/virt.c |   1 +
 include/hw/loongarch/virt.h |   1 +
 3 files changed, 21 insertions(+), 140 deletions(-)

diff --git a/hw/loongarch/acpi-build.c b/hw/loongarch/acpi-build.c
index 92ee62c11a..378a6d9d38 100644
--- a/hw/loongarch/acpi-build.c
+++ b/hw/loongarch/acpi-build.c
@@ -30,6 +30,7 @@
 #include "qom/qom-qobject.h"
 
 #include "hw/acpi/generic_event_device.h"
+#include "hw/pci-host/gpex.h"
 
 #define ACPI_BUILD_ALIGN_SIZE 0x1000
 #define ACPI_BUILD_TABLE_SIZE 0x2
@@ -206,108 +207,6 @@ struct AcpiBuildState {
 MemoryRegion *linker_mr;
 } AcpiBuildState;
 
-static void build_gpex_pci0_int(Aml *table)
-{
-Aml *sb_scope = aml_scope("_SB");
-Aml *pci0_scope = aml_scope("PCI0");
-Aml *prt_pkg = aml_varpackage(128);
-int slot, pin;
-
-for (slot = 0; slot < PCI_SLOT_MAX; slot++) {
-for (pin = 0; pin < PCI_NUM_PINS; pin++) {
-Aml *pkg = aml_package(4);
-aml_append(pkg, aml_int((slot << 16) | 0x));
-aml_append(pkg, aml_int(pin));
-aml_append(pkg, aml_int(0));
-aml_append(pkg, aml_int(80 + (slot + pin) % 4));
-aml_append(prt_pkg, pkg);
-}
-}
-aml_append(pci0_scope, aml_name_decl("_PRT", prt_pkg));
-aml_append(sb_scope, pci0_scope);
-aml_append(table, sb_scope);
-}
-
-static void build_dbg_aml(Aml *table)
-{
-Aml *field;
-Aml *method;
-Aml *while_ctx;
-Aml *scope = aml_scope("\\");
-Aml *buf = aml_local(0);
-Aml *len = aml_local(1);
-Aml *idx = aml_local(2);
-
-aml_append(scope,
-   aml_operation_region("DBG", AML_SYSTEM_IO, aml_int(0x0402), 0x01));
-field = aml_field("DBG", AML_BYTE_ACC, AML_NOLOCK, AML_PRESERVE);
-aml_append(field, aml_named_field("DBGB", 8));
-aml_append(scope, field);
-
-method = aml_method("DBUG", 1, AML_NOTSERIALIZED);
-
-aml_append(method, aml_to_hexstring(aml_arg(0), buf));
-aml_append(method, aml_to_buffer(buf, buf));
-aml_append(method, aml_subtract(aml_sizeof(buf), aml_int(1), len));
-aml_append(method, aml_store(aml_int(0), idx));
-
-while_ctx = aml_while(aml_lless(idx, len));
-aml_append(while_ctx,
-aml_store(aml_derefof(aml_index(buf, idx)), aml_name("DBGB")));
-aml_append(while_ctx, aml_increment(idx));
-aml_append(method, while_ctx);
-aml_append(method, aml_store(aml_int(0x0A), aml_name("DBGB")));
-aml_append(scope, method);
-aml_append(table, scope);
-}
-
-static Aml *build_osc_method(void)
-{
-Aml *if_ctx;
-Aml *if_ctx2;
-Aml *else_ctx;
-Aml *method;
-Aml *a_cwd1 = aml_name("CDW1");
-Aml *a_ctrl = aml_local(0);
-
-method = aml_method("_OSC", 4, AML_NOTSERIALIZED);
-aml_append(method, aml_create_dword_field(aml_arg(3), aml_int(0), "CDW1"));
-
-if_ctx = aml_if(aml_equal(
-aml_arg(0), aml_touuid("33DB4D5B-1FF7-401C-9657-7441C03DD766")));
-aml_append(if_ctx, aml_create_dword_field(aml_arg(3), aml_int(4), "CDW2"));
-aml_append(if_ctx, aml_create_dword_field(aml_arg(3), aml_int(8), "CDW3"));
-aml_append(if_ctx, aml_store(aml_name("CDW3"), a_ctrl));
-
-/*
- * Always allow native PME, AER (no dependencies)
- * Allow SHPC (PCI bridges can have SHPC controller)
- */
-aml_append(if_ctx, aml_and(a_ctrl, aml_int(0x1F), a_ctrl));
-
-if_ctx2 = aml_if(aml_lnot(aml_equal(aml_arg(1), aml_int(1))));
-/* Unknown revision */
-aml_append(if_ctx2, aml_or(a_cwd1, aml_int(0x08), a_cwd1));
-aml_append(if_ctx, if_ctx2);
-
-if_ctx2 = aml_if(aml_lnot(aml_equal(aml_name("CDW3"), a_ctrl)));
-/* Capabilities bits were masked */
-aml_append(if_ctx2, aml_or(a_cwd1, aml_int(0x10), a_cwd1));
-aml_append(if_ctx, if_ctx2);
-
-/* Update DWORD3 in the buffer */
-aml_append(if_ctx, aml_store(a_ctrl, aml_name("CDW3")));
-aml_append(method, if_ctx);
-
-else_ctx = aml_else();
-/* Unrecognized UUID */
-aml_append(else_ctx, aml_or(a_cwd1, aml_int(4), a_cwd1));
-aml_append(method, else_ctx);
-
-aml_append(method, aml_return(aml_arg(3)));
-return method;
-}
-
 static void build_uart_device_aml(Aml *table)
 {
 Aml *dev;
@@ -360,57 +259,37 @@ build_la_ged_aml(Aml *dsdt, MachineState *machine)
 }
 }
 
+static void build_pci_device_aml(Aml *scope, LoongArchMachineState *lams)
+{
+struct GPEXConfig cfg = {
+.mmio64.base = VIRT_PCI_MEM_BASE,
+.mmio64.size = VIRT_PCI_MEM_SIZE,
+.pio.base= VIRT_PCI_IO_BASE,
+.pio.size= VIRT_PCI_IO_SIZE,
+.ecam.base   = VIRT_PCI_CFG_BASE,
+.ecam.size   = VIRT_PCI_CFG_SIZE,
+.irq = PCH_PIC_IRQ_OFFSET + VIRT_DEVICE_IRQS,
+.bus = lams->pc

[PATCH v1 0/2] Add mem hotplug and improve acpi dsdt

2022-08-26 Thread Xiaojuan Yang
This series is based on the 'Add funtions for LoongArch virt machine patch' (11 Aug)
and the 'Fix acpi ged irq number in dsdt table patch' (19 Aug).

Changes for v1:
1. Support memory hotplug
2. Improve acpi dsdt table

Xiaojuan Yang (2):
  hw/loongarch: Support memory hotplug
  hw/loongarch: Improve acpi dsdt table

 hw/loongarch/Kconfig|   2 +
 hw/loongarch/acpi-build.c   | 191 +---
 hw/loongarch/virt.c | 106 +++-
 include/hw/loongarch/virt.h |   1 +
 4 files changed, 153 insertions(+), 147 deletions(-)

-- 
2.31.1




[PATCH v6 0/2] This patch updates AVX512 support for xbzrle

2022-08-26 Thread ling xu
This patch updates the AVX512 support code for the xbzrle_encode_buffer
function: we modified the algorithm and the runtime AVX512 check. Besides, we
provide a benchmark in xbzrle-bench.c for performance comparison.

Signed-off-by: ling xu 
Co-authored-by: Zhou Zhao 
Co-authored-by: Jun Jin 

ling xu (2):
  Update AVX512 support for xbzrle_encode_buffer
  Unit test code and benchmark code

 meson.build|  16 ++
 meson_options.txt  |   2 +
 migration/ram.c|  34 ++-
 migration/xbzrle.c | 124 ++
 migration/xbzrle.h |   4 +
 tests/bench/meson.build|   4 +
 tests/bench/xbzrle-bench.c | 465 +
 tests/unit/test-xbzrle.c   |  39 +++-
 8 files changed, 680 insertions(+), 8 deletions(-)
 create mode 100644 tests/bench/xbzrle-bench.c

-- 
2.25.1




[PATCH v6 1/2] Update AVX512 support for xbzrle_encode_buffer

2022-08-26 Thread ling xu
This commit updates the AVX512 support code for the xbzrle_encode_buffer
function to accelerate XBZRLE encoding. A runtime check for AVX512
support and a benchmark for this feature are added. Compared with the C
version of xbzrle_encode_buffer, the AVX512 version achieves a
50%-70% performance improvement in benchmarking. In addition, if the dirty
data is randomly located in a 4K page, the AVX512 version achieves an
almost 140% performance gain.

Signed-off-by: ling xu 
Co-authored-by: Zhou Zhao 
Co-authored-by: Jun Jin 
---
 meson.build|  16 ++
 meson_options.txt  |   2 +
 migration/ram.c|  34 +++--
 migration/xbzrle.c | 124 +
 migration/xbzrle.h |   4 ++
 5 files changed, 177 insertions(+), 3 deletions(-)

diff --git a/meson.build b/meson.build
index 20fddbd707..5d4b82d7f3 100644
--- a/meson.build
+++ b/meson.build
@@ -2264,6 +2264,22 @@ config_host_data.set('CONFIG_AVX512F_OPT', get_option('avx512f') \
 int main(int argc, char *argv[]) { return bar(argv[0]); }
   '''), error_message: 'AVX512F not available').allowed())
 
+config_host_data.set('CONFIG_AVX512BW_OPT', get_option('avx512bw') \
+  .require(have_cpuid_h, error_message: 'cpuid.h not available, cannot enable AVX512BW') \
+  .require(cc.links('''
+#pragma GCC push_options
+#pragma GCC target("avx512bw")
+#include <cpuid.h>
+#include <immintrin.h>
+static int bar(void *a) {
+
+  __m512i *x = a;
+  __m512i res= _mm512_abs_epi8(*x);
+  return res[1];
+}
+int main(int argc, char *argv[]) { return bar(argv[0]); }
+  '''), error_message: 'AVX512BW not available').allowed())
+
 have_pvrdma = get_option('pvrdma') \
  .require(rdma.found(), error_message: 'PVRDMA requires OpenFabrics libraries') \
   .require(cc.compiles(gnu_source_prefix + '''
diff --git a/meson_options.txt b/meson_options.txt
index e58e158396..07194bf680 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -104,6 +104,8 @@ option('avx2', type: 'feature', value: 'auto',
description: 'AVX2 optimizations')
 option('avx512f', type: 'feature', value: 'disabled',
description: 'AVX512F optimizations')
+option('avx512bw', type: 'feature', value: 'auto',
+   description: 'AVX512BW optimizations')
 option('keyring', type: 'feature', value: 'auto',
description: 'Linux keyring support')
 
diff --git a/migration/ram.c b/migration/ram.c
index dc1de9ddbc..ff4c15c9c3 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -83,6 +83,34 @@
 /* 0x80 is reserved in migration.h start with 0x100 next */
 #define RAM_SAVE_FLAG_COMPRESS_PAGE0x100
 
+int (*xbzrle_encode_buffer_func)(uint8_t *, uint8_t *, int,
+ uint8_t *, int) = xbzrle_encode_buffer;
+#if defined(CONFIG_AVX512BW_OPT)
+#include "qemu/cpuid.h"
+static void __attribute__((constructor)) init_cpu_flag(void)
+{
+unsigned max = __get_cpuid_max(0, NULL);
+int a, b, c, d;
+if (max >= 1) {
+__cpuid(1, a, b, c, d);
+ /* We must check that AVX is not just available, but usable.  */
+if ((c & bit_OSXSAVE) && (c & bit_AVX) && max >= 7) {
+int bv;
+__asm("xgetbv" : "=a"(bv), "=d"(d) : "c"(0));
+__cpuid_count(7, 0, a, b, c, d);
+   /* 0xe6:
+*  XCR0[7:5] = 111b (OPMASK state, upper 256-bit of ZMM0-ZMM15
+*and ZMM16-ZMM31 state are enabled by OS)
+*  XCR0[2:1] = 11b (XMM state and YMM state are enabled by OS)
+*/
+if ((bv & 0xe6) == 0xe6 && (b & bit_AVX512BW)) {
+xbzrle_encode_buffer_func = xbzrle_encode_buffer_avx512;
+}
+}
+}
+}
+#endif
+
 XBZRLECacheStats xbzrle_counters;
 
 /* struct contains XBZRLE cache and a static page
@@ -802,9 +830,9 @@ static int save_xbzrle_page(RAMState *rs, uint8_t **current_data,
 memcpy(XBZRLE.current_buf, *current_data, TARGET_PAGE_SIZE);
 
 /* XBZRLE encoding (if there is no overflow) */
-encoded_len = xbzrle_encode_buffer(prev_cached_page, XBZRLE.current_buf,
-   TARGET_PAGE_SIZE, XBZRLE.encoded_buf,
-   TARGET_PAGE_SIZE);
+encoded_len = xbzrle_encode_buffer_func(prev_cached_page, XBZRLE.current_buf,
+TARGET_PAGE_SIZE, XBZRLE.encoded_buf,
+TARGET_PAGE_SIZE);
 
 /*
  * Update the cache contents, so that it corresponds to the data
diff --git a/migration/xbzrle.c b/migration/xbzrle.c
index 1ba482ded9..05366e86c0 100644
--- a/migration/xbzrle.c
+++ b/migration/xbzrle.c
@@ -174,3 +174,127 @@ int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen)
 
 return d;
 }
+
+#if defined(CONFIG_AVX512BW_OPT)
+#pragma GCC push_options
+#pragma GCC target("avx512bw")
+#include <immintrin.h>
+int xbzrle_encode_buffer_avx512(uint8_t *old_buf, uint8_t *new_buf, int slen,
+ uint8_t *dst, int dlen)
+{
+ui

Re: [PATCH 1/2] dump: simplify a bit kdump get_next_page()

2022-08-26 Thread David Hildenbrand
On 26.08.22 11:56, Marc-André Lureau wrote:
> Hi
> 
> On Fri, Aug 26, 2022 at 1:45 PM David Hildenbrand  wrote:
>>
>> On 25.08.22 15:21, marcandre.lur...@redhat.com wrote:
>>> From: Marc-André Lureau 
>>>
>>> This should be functionally equivalent, but slightly easier to read,
>>> with simplified paths and checks at the end of the function.
>>>
>>> The following patch is a major rewrite to get rid of the assert().
>>>
>>> Signed-off-by: Marc-André Lureau 
>>> ---
>>>  dump/dump.c | 30 --
>>>  1 file changed, 12 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/dump/dump.c b/dump/dump.c
>>> index 4d9658ffa2..18f06cffe2 100644
>>> --- a/dump/dump.c
>>> +++ b/dump/dump.c
>>> @@ -1107,37 +1107,31 @@ static bool get_next_page(GuestPhysBlock 
>>> **blockptr, uint64_t *pfnptr,
>>>  uint8_t *buf;
>>>
>>>  /* block == NULL means the start of the iteration */
>>> -if (!block) {
>>> -block = QTAILQ_FIRST(&s->guest_phys_blocks.head);
>>> -*blockptr = block;
>>> -assert((block->target_start & ~target_page_mask) == 0);
>>> -assert((block->target_end & ~target_page_mask) == 0);
>>> -*pfnptr = dump_paddr_to_pfn(s, block->target_start);
>>> -if (bufptr) {
>>> -*bufptr = block->host_addr;
>>> -}
>>> -return true;
>>
>>
>> Instead of the "return true" we'll now do take the  "if ((addr >=
>> block->target_start) &&" path below I guess, always ending up with
>> essentially "buf = buf;" because addr == block->target_start.
>>
>> I guess that's fine.
>>
>>> +if (block == NULL) {
>>
>> What's wrong with keeping the "if (!block) {" ? :)
> 
> That's just to be consistent with the comment above.
> 
>>
>>> +*blockptr = block = QTAILQ_FIRST(&s->guest_phys_blocks.head);
>>
>> Another unnecessary change.
>>
>>> +addr = block->target_start;
>>> +} else {
>>> +addr = dump_pfn_to_paddr(s, *pfnptr + 1);
>>>  }
>>> -
>>> -*pfnptr = *pfnptr + 1;
>>> -addr = dump_pfn_to_paddr(s, *pfnptr);
>>> +assert(block != NULL);
>>>
>>>  if ((addr >= block->target_start) &&
>>>  (addr + s->dump_info.page_size <= block->target_end)) {
>>>  buf = block->host_addr + (addr - block->target_start);
>>>  } else {
>>>  /* the next page is in the next block */
>>> -block = QTAILQ_NEXT(block, next);
>>> -*blockptr = block;
>>> +*blockptr = block = QTAILQ_NEXT(block, next);
>>
>> Another unnecessary change. (avoiding these really eases review, because
>> the focus is then completely on the actual code changes)
>>
>>>  if (!block) {
>>>  return false;
>>>  }
>>> -assert((block->target_start & ~target_page_mask) == 0);
>>> -assert((block->target_end & ~target_page_mask) == 0);
>>> -*pfnptr = dump_paddr_to_pfn(s, block->target_start);
>>> +addr = block->target_start;
>>>  buf = block->host_addr;
>>>  }
>>>
>>> +/* those checks are going away next */
>>
>> This comment seems to imply a story documented in code. Rather just drop
>> it -- the patch description already points that out.
>>
>>> +assert((block->target_start & ~target_page_mask) == 0);
>>> +assert((block->target_end & ~target_page_mask) == 0);
>>> +*pfnptr = dump_paddr_to_pfn(s, addr);
>>>  if (bufptr) {
>>>  *bufptr = buf;
>>>  }
>>
>>
>> Apart from the nits, LGTM.
> 
> We could also drop this patch, it helped me to rewrite the function next 
> mostly.

I think it's fine. Small logical changes are easier to review -- at
least for me.

-- 
Thanks,

David / dhildenb




Re: [PATCH 2/2] dump: fix kdump to work over non-aligned blocks

2022-08-26 Thread David Hildenbrand
On 25.08.22 15:21, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> Rewrite get_next_page() to work over non-aligned blocks. When it
> encounters non aligned addresses, it will allocate a zero-page and try
> to fill it.

Could we simplify by using one global helper page (or caller provided
page) and avoiding the allocation/freeing?


-- 
Thanks,

David / dhildenb




Re: [PATCH 1/2] dump: simplify a bit kdump get_next_page()

2022-08-26 Thread Marc-André Lureau
Hi

On Fri, Aug 26, 2022 at 1:45 PM David Hildenbrand  wrote:
>
> On 25.08.22 15:21, marcandre.lur...@redhat.com wrote:
> > From: Marc-André Lureau 
> >
> > This should be functionally equivalent, but slightly easier to read,
> > with simplified paths and checks at the end of the function.
> >
> > The following patch is a major rewrite to get rid of the assert().
> >
> > Signed-off-by: Marc-André Lureau 
> > ---
> >  dump/dump.c | 30 --
> >  1 file changed, 12 insertions(+), 18 deletions(-)
> >
> > diff --git a/dump/dump.c b/dump/dump.c
> > index 4d9658ffa2..18f06cffe2 100644
> > --- a/dump/dump.c
> > +++ b/dump/dump.c
> > @@ -1107,37 +1107,31 @@ static bool get_next_page(GuestPhysBlock 
> > **blockptr, uint64_t *pfnptr,
> >  uint8_t *buf;
> >
> >  /* block == NULL means the start of the iteration */
> > -if (!block) {
> > -block = QTAILQ_FIRST(&s->guest_phys_blocks.head);
> > -*blockptr = block;
> > -assert((block->target_start & ~target_page_mask) == 0);
> > -assert((block->target_end & ~target_page_mask) == 0);
> > -*pfnptr = dump_paddr_to_pfn(s, block->target_start);
> > -if (bufptr) {
> > -*bufptr = block->host_addr;
> > -}
> > -return true;
>
>
> Instead of the "return true" we'll now do take the  "if ((addr >=
> block->target_start) &&" path below I guess, always ending up with
> essentially "buf = buf;" because addr == block->target_start.
>
> I guess that's fine.
>
> > +if (block == NULL) {
>
> What's wrong with keeping the "if (!block) {" ? :)

That's just to be consistent with the comment above.

>
> > +*blockptr = block = QTAILQ_FIRST(&s->guest_phys_blocks.head);
>
> Another unnecessary change.
>
> > +addr = block->target_start;
> > +} else {
> > +addr = dump_pfn_to_paddr(s, *pfnptr + 1);
> >  }
> > -
> > -*pfnptr = *pfnptr + 1;
> > -addr = dump_pfn_to_paddr(s, *pfnptr);
> > +assert(block != NULL);
> >
> >  if ((addr >= block->target_start) &&
> >  (addr + s->dump_info.page_size <= block->target_end)) {
> >  buf = block->host_addr + (addr - block->target_start);
> >  } else {
> >  /* the next page is in the next block */
> > -block = QTAILQ_NEXT(block, next);
> > -*blockptr = block;
> > +*blockptr = block = QTAILQ_NEXT(block, next);
>
> Another unnecessary change. (avoiding these really eases review, because
> the focus is then completely on the actual code changes)
>
> >  if (!block) {
> >  return false;
> >  }
> > -assert((block->target_start & ~target_page_mask) == 0);
> > -assert((block->target_end & ~target_page_mask) == 0);
> > -*pfnptr = dump_paddr_to_pfn(s, block->target_start);
> > +addr = block->target_start;
> >  buf = block->host_addr;
> >  }
> >
> > +/* those checks are going away next */
>
> This comment seems to imply a story documented in code. Rather just drop
> it -- the patch description already points that out.
>
> > +assert((block->target_start & ~target_page_mask) == 0);
> > +assert((block->target_end & ~target_page_mask) == 0);
> > +*pfnptr = dump_paddr_to_pfn(s, addr);
> >  if (bufptr) {
> >  *bufptr = buf;
> >  }
>
>
> Apart from the nits, LGTM.

We could also drop this patch, it helped me to rewrite the function next mostly.




[PATCH v6 2/2] Unit test code and benchmark code

2022-08-26 Thread ling xu
Unit test code is in test-xbzrle.c; benchmark code for performance
testing is in xbzrle-bench.c.

Signed-off-by: ling xu 
Co-authored-by: Zhou Zhao 
Co-authored-by: Jun Jin 
---
 tests/bench/meson.build|   4 +
 tests/bench/xbzrle-bench.c | 465 +
 tests/unit/test-xbzrle.c   |  39 +++-
 3 files changed, 503 insertions(+), 5 deletions(-)
 create mode 100644 tests/bench/xbzrle-bench.c

diff --git a/tests/bench/meson.build b/tests/bench/meson.build
index 279a8fcc33..daefead58d 100644
--- a/tests/bench/meson.build
+++ b/tests/bench/meson.build
@@ -3,6 +3,10 @@ qht_bench = executable('qht-bench',
sources: 'qht-bench.c',
dependencies: [qemuutil])
 
+xbzrle_bench = executable('xbzrle-bench',
+   sources: 'xbzrle-bench.c',
+   dependencies: [qemuutil,migration])
+
 executable('atomic_add-bench',
sources: files('atomic_add-bench.c'),
dependencies: [qemuutil],
diff --git a/tests/bench/xbzrle-bench.c b/tests/bench/xbzrle-bench.c
new file mode 100644
index 00..d71397e6f4
--- /dev/null
+++ b/tests/bench/xbzrle-bench.c
@@ -0,0 +1,465 @@
+/*
+ * Xor Based Zero Run Length Encoding unit tests.
+ *
+ * Copyright 2013 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *  Orit Wasserman  
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+#include "qemu/osdep.h"
+#include "qemu/cutils.h"
+#include "../migration/xbzrle.h"
+
+#define XBZRLE_PAGE_SIZE 4096
+
+#if defined(CONFIG_AVX512BW_OPT)
+static bool is_cpu_support_avx512bw;
+#include "qemu/cpuid.h"
+static void __attribute__((constructor)) init_cpu_flag(void)
+{
+unsigned max = __get_cpuid_max(0, NULL);
+int a, b, c, d;
+is_cpu_support_avx512bw = false;
+if (max >= 1) {
+__cpuid(1, a, b, c, d);
+ /* We must check that AVX is not just available, but usable.  */
+if ((c & bit_OSXSAVE) && (c & bit_AVX) && max >= 7) {
+int bv;
+__asm("xgetbv" : "=a"(bv), "=d"(d) : "c"(0));
+__cpuid_count(7, 0, a, b, c, d);
+   /* 0xe6:
+*  XCR0[7:5] = 111b (OPMASK state, upper 256-bit of ZMM0-ZMM15
+*and ZMM16-ZMM31 state are enabled by OS)
+*  XCR0[2:1] = 11b (XMM state and YMM state are enabled by OS)
+*/
+if ((bv & 0xe6) == 0xe6 && (b & bit_AVX512BW)) {
+is_cpu_support_avx512bw = true;
+}
+}
+}
+return ;
+}
+#endif
+
+struct ResTime {
+float t_raw;
+float t_512;
+};
+
+static void encode_decode_zero(struct ResTime *res)
+{
+uint8_t *buffer = g_malloc0(XBZRLE_PAGE_SIZE);
+uint8_t *compressed = g_malloc0(XBZRLE_PAGE_SIZE);
+uint8_t *buffer512 = g_malloc0(XBZRLE_PAGE_SIZE);
+uint8_t *compressed512 = g_malloc0(XBZRLE_PAGE_SIZE);
+int i = 0;
+int dlen = 0, dlen512 = 0;
+int diff_len = g_test_rand_int_range(0, XBZRLE_PAGE_SIZE - 1006);
+
+for (i = diff_len; i > 0; i--) {
+buffer[1000 + i] = i;
+buffer512[1000 + i] = i;
+}
+
+buffer[1000 + diff_len + 3] = 103;
+buffer[1000 + diff_len + 5] = 105;
+
+buffer512[1000 + diff_len + 3] = 103;
+buffer512[1000 + diff_len + 5] = 105;
+
+/* encode zero page */
+time_t t_start, t_end, t_start512, t_end512;
+t_start = clock();
+dlen = xbzrle_encode_buffer(buffer, buffer, XBZRLE_PAGE_SIZE, compressed,
+   XBZRLE_PAGE_SIZE);
+t_end = clock();
+float time_val = difftime(t_end, t_start);
+g_assert(dlen == 0);
+
+t_start512 = clock();
+dlen512 = xbzrle_encode_buffer_avx512(buffer512, buffer512, XBZRLE_PAGE_SIZE,
+   compressed512, XBZRLE_PAGE_SIZE);
+t_end512 = clock();
+float time_val512 = difftime(t_end512, t_start512);
+g_assert(dlen512 == 0);
+
+res->t_raw = time_val;
+res->t_512 = time_val512;
+
+g_free(buffer);
+g_free(compressed);
+g_free(buffer512);
+g_free(compressed512);
+
+}
+
+static void test_encode_decode_zero_avx512(void)
+{
+int i;
+float time_raw = 0.0, time_512 = 0.0;
+struct ResTime res;
+for (i = 0; i < 1; i++) {
+encode_decode_zero(&res);
+time_raw += res.t_raw;
+time_512 += res.t_512;
+}
+printf("Zero test:\n");
+printf("Raw xbzrle_encode time is %f ms\n", time_raw);
+printf("512 xbzrle_encode time is %f ms\n", time_512);
+}
+
+static void encode_decode_unchanged(struct ResTime *res)
+{
+uint8_t *compressed = g_malloc0(XBZRLE_PAGE_SIZE);
+uint8_t *test = g_malloc0(XBZRLE_PAGE_SIZE);
+uint8_t *compressed512 = g_malloc0(XBZRLE_PAGE_SIZE);
+uint8_t *test512 = g_malloc0(XBZRLE_PAGE_SIZE);
+int i = 0;
+int dlen = 0, dlen512 = 0;
+int diff_len = g_test_rand_int_range(0, XBZRLE_PAGE_SIZE - 1006);
+
+

Re: [PATCH 2/2] dump: fix kdump to work over non-aligned blocks

2022-08-26 Thread Marc-André Lureau
Hi


On Fri, Aug 26, 2022 at 2:01 PM David Hildenbrand  wrote:

> On 25.08.22 15:21, marcandre.lur...@redhat.com wrote:
> > From: Marc-André Lureau 
> >
> > Rewrite get_next_page() to work over non-aligned blocks. When it
> > encounters non aligned addresses, it will allocate a zero-page and try
> > to fill it.
>
> Could we simplify by using one global helper page (or caller provided
> page) and avoiding the allocation/freeing?
>
>
I don't think that makes a big difference, but certainly doable.


-- 
Marc-André Lureau


Re: [PATCH 09/51] fsdev/virtfs-proxy-helper: Use g_mkdir_with_parents()

2022-08-26 Thread Christian Schoenebeck
On Wednesday, 24 August 2022 11:39:47 CEST Bin Meng wrote:
> From: Bin Meng 
> 
> Use the same g_mkdir_with_parents() call to create a directory on
> all platforms.

The same would be g_mkdir(), not g_mkdir_with_parents(), so please use that 
instead.

> Signed-off-by: Bin Meng 
> ---
> 
>  fsdev/virtfs-proxy-helper.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fsdev/virtfs-proxy-helper.c b/fsdev/virtfs-proxy-helper.c
> index 2dde27922f..d0cf76d6d1 100644
> --- a/fsdev/virtfs-proxy-helper.c
> +++ b/fsdev/virtfs-proxy-helper.c
> @@ -639,7 +639,7 @@ static int do_create_others(int type, struct iovec
> *iovec) if (retval < 0) {
>  goto err_out;
>  }
> -retval = mkdir(path.data, mode);
> +retval = g_mkdir_with_parents(path.data, mode);
>  break;
>  case T_SYMLINK:
>  retval = proxy_unmarshal(iovec, offset, "ss", &oldpath, &path);






Re: [PATCH 2/2] dump: fix kdump to work over non-aligned blocks

2022-08-26 Thread David Hildenbrand
On 26.08.22 12:02, Marc-André Lureau wrote:
> Hi
> 
> 
> On Fri, Aug 26, 2022 at 2:01 PM David Hildenbrand wrote:
> 
> On 25.08.22 15:21, marcandre.lur...@redhat.com wrote:
> > From: Marc-André Lureau 
> >
> > Rewrite get_next_page() to work over non-aligned blocks. When it
> > encounters non aligned addresses, it will allocate a zero-page and try
> > to fill it.
> 
> Could we simplify by using one global helper page (or caller provided
> page) and avoiding the allocation/freeing?
> 
> 
> I don't think that makes a big difference, but certainly doable.

If we're using one central page, I guess we'd have to pass "flag_sync =
true" to write_cache() in case that page is used. Or we simply specify
on the single global page in there and force a sync. Changes would be
limited to get_next_page() and write_cache() then.


-- 
Thanks,

David / dhildenb




Re: [PATCH 2/2] dump: fix kdump to work over non-aligned blocks

2022-08-26 Thread David Hildenbrand
On 26.08.22 12:07, David Hildenbrand wrote:
> On 26.08.22 12:02, Marc-André Lureau wrote:
>> Hi
>>
>>
>> On Fri, Aug 26, 2022 at 2:01 PM David Hildenbrand wrote:
>>
>> On 25.08.22 15:21, marcandre.lur...@redhat.com wrote:
>> > From: Marc-André Lureau 
>> >
>> > Rewrite get_next_page() to work over non-aligned blocks. When it
>> > encounters non aligned addresses, it will allocate a zero-page and try
>> > to fill it.
>>
>> Could we simplify by using one global helper page (or caller provided
>> page) and avoiding the allocation/freeing?
>>
>>
>> I don't think that makes a big difference, but certainly doable.
> 
> If we're using one central page, I guess we'd have to pass "flag_sync =
> true" to write_cache() in case that page is used. Or we simply specify
> on the single global page in there and force a sync. Changes would be
> limited to get_next_page() and write_cache() then.

I might be wrong. I think we might not have to touch write_cache() at
all -- it will copy the data into the DataCache buffer.


-- 
Thanks,

David / dhildenb




[PATCH v3 1/1] virtio-gpu: CONTEXT_INIT feature

2022-08-26 Thread Antonio Caggiano
Create virgl renderer context with flags using context_id when valid.

v2:
- The feature can be enabled via the context_init config option.
- A warning message will be emitted and the feature will not be used
  when linking with virglrenderer versions without context_init support.

v3: Define HAVE_VIRGL_CONTEXT_INIT in config_host_data.

Signed-off-by: Antonio Caggiano 
---
 hw/display/virtio-gpu-base.c   |  3 +++
 hw/display/virtio-gpu-virgl.c  | 16 ++--
 hw/display/virtio-gpu.c|  2 ++
 include/hw/virtio/virtio-gpu.h |  3 +++
 meson.build|  5 +
 5 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c
index a29f191aa8..6c5f1f327f 100644
--- a/hw/display/virtio-gpu-base.c
+++ b/hw/display/virtio-gpu-base.c
@@ -215,6 +215,9 @@ virtio_gpu_base_get_features(VirtIODevice *vdev, uint64_t features,
 if (virtio_gpu_blob_enabled(g->conf)) {
 features |= (1 << VIRTIO_GPU_F_RESOURCE_BLOB);
 }
+if (virtio_gpu_context_init_enabled(g->conf)) {
+features |= (1 << VIRTIO_GPU_F_CONTEXT_INIT);
+}
 
 return features;
 }
diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 73cb92c8d5..274cbc44de 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -97,8 +97,20 @@ static void virgl_cmd_context_create(VirtIOGPU *g,
 trace_virtio_gpu_cmd_ctx_create(cc.hdr.ctx_id,
 cc.debug_name);
 
-virgl_renderer_context_create(cc.hdr.ctx_id, cc.nlen,
-  cc.debug_name);
+if (cc.context_init) {
+#ifdef HAVE_VIRGL_CONTEXT_INIT
+virgl_renderer_context_create_with_flags(cc.hdr.ctx_id,
+ cc.context_init,
+ cc.nlen,
+ cc.debug_name);
+return;
+#else
+qemu_log_mask(LOG_UNIMP,
+  "Linked virglrenderer does not support context-init\n");
+#endif
+}
+
+virgl_renderer_context_create(cc.hdr.ctx_id, cc.nlen, cc.debug_name);
 }
 
 static void virgl_cmd_context_destroy(VirtIOGPU *g,
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 20cc703dcc..fa667ec234 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1424,6 +1424,8 @@ static Property virtio_gpu_properties[] = {
  256 * MiB),
 DEFINE_PROP_BIT("blob", VirtIOGPU, parent_obj.conf.flags,
 VIRTIO_GPU_FLAG_BLOB_ENABLED, false),
+DEFINE_PROP_BIT("context_init", VirtIOGPU, parent_obj.conf.flags,
+VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED, false),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 2e28507efe..c6f5cfde47 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -90,6 +90,7 @@ enum virtio_gpu_base_conf_flags {
 VIRTIO_GPU_FLAG_EDID_ENABLED,
 VIRTIO_GPU_FLAG_DMABUF_ENABLED,
 VIRTIO_GPU_FLAG_BLOB_ENABLED,
+VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED,
 };
 
 #define virtio_gpu_virgl_enabled(_cfg) \
@@ -102,6 +103,8 @@ enum virtio_gpu_base_conf_flags {
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_DMABUF_ENABLED))
 #define virtio_gpu_blob_enabled(_cfg) \
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED))
+#define virtio_gpu_context_init_enabled(_cfg) \
+(_cfg.flags & (1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED))
 
 struct virtio_gpu_base_conf {
 uint32_t max_outputs;
diff --git a/meson.build b/meson.build
index 20fddbd707..e1071b3563 100644
--- a/meson.build
+++ b/meson.build
@@ -718,6 +718,11 @@ if not get_option('virglrenderer').auto() or have_system 
or have_vhost_user_gpu
  method: 'pkg-config',
  required: get_option('virglrenderer'),
  kwargs: static_kwargs)
+
+  config_host_data.set('HAVE_VIRGL_CONTEXT_INIT',
+   
cc.has_function('virgl_renderer_context_create_with_flags',
+   prefix: '#include ',
+   dependencies: virgl))
 endif
 curl = not_found
 if not get_option('curl').auto() or have_block
-- 
2.34.1




[PATCH v3 0/1] virtio-gpu: CONTEXT_INIT feature

2022-08-26 Thread Antonio Caggiano
This is a different attempt at upstreaming the work I have been doing to
enable support for the Venus Virtio-GPU Vulkan driver.

I believe the previous one [0] bundled a bit too much stuff in one place,
therefore with this I would like to try a more fine-grained approach.

I will start with the CONTEXT_INIT feature, as it was the first commit
of the aforementioned series, and the virtio-spec has recently been
updated in that regard [1]. Hopefully this also answers Gerd's
comment on the previous patch [2].

[0] https://www.mail-archive.com/qemu-devel@nongnu.org/msg826897.html
[1] 
https://github.com/oasis-tcs/virtio-spec/commit/aad2b6f3620ec0c9d16aaf046db8c282c24cce3e
[2] https://www.mail-archive.com/qemu-devel@nongnu.org/msg827304.html

Antonio Caggiano (1):
  virtio-gpu: CONTEXT_INIT feature

 hw/display/virtio-gpu-base.c   |  3 +++
 hw/display/virtio-gpu-virgl.c  | 16 ++--
 hw/display/virtio-gpu.c|  2 ++
 include/hw/virtio/virtio-gpu.h |  3 +++
 meson.build|  5 +
 5 files changed, 27 insertions(+), 2 deletions(-)

-- 
2.34.1




Re: [PATCH v3 1/1] virtio-gpu: CONTEXT_INIT feature

2022-08-26 Thread Marc-André Lureau
Hi

On Fri, Aug 26, 2022 at 2:12 PM Antonio Caggiano <
antonio.caggi...@collabora.com> wrote:

> Create virgl renderer context with flags using context_id when valid.
>
> v2:
> - The feature can be enabled via the context_init config option.
> - A warning message will be emitted and the feature will not be used
>   when linking with virglrenderer versions without context_init support.
>
> v3: Define HAVE_VIRGL_CONTEXT_INIT in config_host_data.
>
> Signed-off-by: Antonio Caggiano 
> ---
>  hw/display/virtio-gpu-base.c   |  3 +++
>  hw/display/virtio-gpu-virgl.c  | 16 ++--
>  hw/display/virtio-gpu.c|  2 ++
>  include/hw/virtio/virtio-gpu.h |  3 +++
>  meson.build|  5 +
>  5 files changed, 27 insertions(+), 2 deletions(-)
>
> diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c
> index a29f191aa8..6c5f1f327f 100644
> --- a/hw/display/virtio-gpu-base.c
> +++ b/hw/display/virtio-gpu-base.c
> @@ -215,6 +215,9 @@ virtio_gpu_base_get_features(VirtIODevice *vdev,
> uint64_t features,
>  if (virtio_gpu_blob_enabled(g->conf)) {
>  features |= (1 << VIRTIO_GPU_F_RESOURCE_BLOB);
>  }
> +if (virtio_gpu_context_init_enabled(g->conf)) {
> +features |= (1 << VIRTIO_GPU_F_CONTEXT_INIT);
> +}
>
>  return features;
>  }
> diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
> index 73cb92c8d5..274cbc44de 100644
> --- a/hw/display/virtio-gpu-virgl.c
> +++ b/hw/display/virtio-gpu-virgl.c
> @@ -97,8 +97,20 @@ static void virgl_cmd_context_create(VirtIOGPU *g,
>  trace_virtio_gpu_cmd_ctx_create(cc.hdr.ctx_id,
>  cc.debug_name);
>
> -virgl_renderer_context_create(cc.hdr.ctx_id, cc.nlen,
> -  cc.debug_name);
> +if (cc.context_init) {
> +#ifdef HAVE_VIRGL_CONTEXT_INIT
> +virgl_renderer_context_create_with_flags(cc.hdr.ctx_id,
> + cc.context_init,
> + cc.nlen,
> + cc.debug_name);
> +return;
> +#else
> +qemu_log_mask(LOG_UNIMP,
> +  "Linked virglrenderer does not support
> context-init\n");
>

What is the outcome in that case?


> +#endif
> +}
> +
> +virgl_renderer_context_create(cc.hdr.ctx_id, cc.nlen, cc.debug_name);
>  }
>
>  static void virgl_cmd_context_destroy(VirtIOGPU *g,
> diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
> index 20cc703dcc..fa667ec234 100644
> --- a/hw/display/virtio-gpu.c
> +++ b/hw/display/virtio-gpu.c
> @@ -1424,6 +1424,8 @@ static Property virtio_gpu_properties[] = {
>   256 * MiB),
>  DEFINE_PROP_BIT("blob", VirtIOGPU, parent_obj.conf.flags,
>  VIRTIO_GPU_FLAG_BLOB_ENABLED, false),
> +DEFINE_PROP_BIT("context_init", VirtIOGPU, parent_obj.conf.flags,
> +VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED, false),
>  DEFINE_PROP_END_OF_LIST(),
>  };
>
> diff --git a/include/hw/virtio/virtio-gpu.h
> b/include/hw/virtio/virtio-gpu.h
> index 2e28507efe..c6f5cfde47 100644
> --- a/include/hw/virtio/virtio-gpu.h
> +++ b/include/hw/virtio/virtio-gpu.h
> @@ -90,6 +90,7 @@ enum virtio_gpu_base_conf_flags {
>  VIRTIO_GPU_FLAG_EDID_ENABLED,
>  VIRTIO_GPU_FLAG_DMABUF_ENABLED,
>  VIRTIO_GPU_FLAG_BLOB_ENABLED,
> +VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED,
>  };
>
>  #define virtio_gpu_virgl_enabled(_cfg) \
> @@ -102,6 +103,8 @@ enum virtio_gpu_base_conf_flags {
>  (_cfg.flags & (1 << VIRTIO_GPU_FLAG_DMABUF_ENABLED))
>  #define virtio_gpu_blob_enabled(_cfg) \
>  (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED))
> +#define virtio_gpu_context_init_enabled(_cfg) \
> +(_cfg.flags & (1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED))
>
>  struct virtio_gpu_base_conf {
>  uint32_t max_outputs;
> diff --git a/meson.build b/meson.build
> index 20fddbd707..e1071b3563 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -718,6 +718,11 @@ if not get_option('virglrenderer').auto() or
> have_system or have_vhost_user_gpu
>   method: 'pkg-config',
>   required: get_option('virglrenderer'),
>   kwargs: static_kwargs)
> +
> +  config_host_data.set('HAVE_VIRGL_CONTEXT_INIT',
> +
>  cc.has_function('virgl_renderer_context_create_with_flags',
> +   prefix: '#include
> ',
> +   dependencies: virgl))
>  endif
>  curl = not_found
>  if not get_option('curl').auto() or have_block
> --
> 2.34.1
>
>
>
lgtm
Reviewed-by: Marc-André Lureau 

-- 
Marc-André Lureau


Re: [PATCH v7 1/2] target/s390x: support SHA-512 extensions

2022-08-26 Thread Thomas Huth



Finally, I also have some spare minutes to have a look at this ... 
First, thank you for your work here, it's very much appreciated! Some more 
comments inline below (mostly cosmetics, since I'm not very deep into this 
crypto stuff)...


On 09/08/2022 17.03, Jason A. Donenfeld wrote:

In order to fully support MSA_EXT_5, we have to support the SHA-512
special instructions. So implement those.

The implementation began as something TweetNacl-like, and then was
adjusted to be useful here. It's not very beautiful, but it is quite
short and compact, which is what we're going for.

>

Signed-off-by: Jason A. Donenfeld 
---
  target/s390x/gen-features.c  |   3 +
  target/s390x/tcg/crypto_helper.c | 157 +++
  2 files changed, 160 insertions(+)


If you've got some spare time, it would be great to have a test for the new 
functions in the tests/tcg/s390x/ folder, too (but otherwise we can still 
add them later).



diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index ad140184b9..85ab69d04e 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -749,6 +749,9 @@ static uint16_t qemu_V7_0[] = {
   */
  static uint16_t qemu_MAX[] = {
  S390_FEAT_VECTOR_ENH2,
+S390_FEAT_MSA_EXT_5,
+S390_FEAT_KIMD_SHA_512,
+S390_FEAT_KLMD_SHA_512,
  };


I think we likely have to fence the bits off for older machine type 
versions, like it has been done in commit 4f9b6c7ddb2 for example. However, 
the patch for the new 7.2 machine type is not merged yet (but I've queued it 
on https://gitlab.com/thuth/qemu/-/commits/s390x-next/ ), so you either have 
to pick that manually into your branch, or we fix it up later (which would 
be ok for me, too).



  /** END FEATURE DEFS **/
diff --git a/target/s390x/tcg/crypto_helper.c b/target/s390x/tcg/crypto_helper.c
index 138d9e7ad9..4d45de8faa 100644
--- a/target/s390x/tcg/crypto_helper.c
+++ b/target/s390x/tcg/crypto_helper.c
@@ -1,10 +1,12 @@
  /*
   *  s390x crypto helpers
   *
+ *  Copyright (C) 2022 Jason A. Donenfeld . All Rights 
Reserved.


Please drop the "All rights reserved" ... it does not have any legal meaning 
anymore, and also sounds weird in the Open Source context. See:


 https://en.wikipedia.org/wiki/All_rights_reserved#Obsolescence


   *  Copyright (c) 2017 Red Hat Inc
   *
   *  Authors:
   *   David Hildenbrand 
+ *   Jason A. Donenfeld 
   *
   * This work is licensed under the terms of the GNU GPL, version 2 or later.
   * See the COPYING file in the top-level directory.
@@ -18,6 +20,153 @@
  #include "exec/exec-all.h"
  #include "exec/cpu_ldst.h"
  
+static uint64_t R(uint64_t x, int c) { return (x >> c) | (x << (64 - c)); }

+static uint64_t Ch(uint64_t x, uint64_t y, uint64_t z) { return (x & y) ^ (~x 
& z); }
+static uint64_t Maj(uint64_t x, uint64_t y, uint64_t z) { return (x & y) ^ (x & z) 
^ (y & z); }
+static uint64_t Sigma0(uint64_t x) { return R(x, 28) ^ R(x, 34) ^ R(x, 39); }
+static uint64_t Sigma1(uint64_t x) { return R(x, 14) ^ R(x, 18) ^ R(x, 41); }
+static uint64_t sigma0(uint64_t x) { return R(x, 1) ^ R(x, 8) ^ (x >> 7); }
+static uint64_t sigma1(uint64_t x) { return R(x, 19) ^ R(x, 61) ^ (x >> 6); }
+
+static const uint64_t K[80] = {
+0x428a2f98d728ae22ULL, 0x7137449123ef65cdULL, 0xb5c0fbcfec4d3b2fULL,
+0xe9b5dba58189dbbcULL, 0x3956c25bf348b538ULL, 0x59f111f1b605d019ULL,
+0x923f82a4af194f9bULL, 0xab1c5ed5da6d8118ULL, 0xd807aa98a3030242ULL,
+0x12835b0145706fbeULL, 0x243185be4ee4b28cULL, 0x550c7dc3d5ffb4e2ULL,
+0x72be5d74f27b896fULL, 0x80deb1fe3b1696b1ULL, 0x9bdc06a725c71235ULL,
+0xc19bf174cf692694ULL, 0xe49b69c19ef14ad2ULL, 0xefbe4786384f25e3ULL,
+0x0fc19dc68b8cd5b5ULL, 0x240ca1cc77ac9c65ULL, 0x2de92c6f592b0275ULL,
+0x4a7484aa6ea6e483ULL, 0x5cb0a9dcbd41fbd4ULL, 0x76f988da831153b5ULL,
+0x983e5152ee66dfabULL, 0xa831c66d2db43210ULL, 0xb00327c898fb213fULL,
+0xbf597fc7beef0ee4ULL, 0xc6e00bf33da88fc2ULL, 0xd5a79147930aa725ULL,
+0x06ca6351e003826fULL, 0x142929670a0e6e70ULL, 0x27b70a8546d22ffcULL,
+0x2e1b21385c26c926ULL, 0x4d2c6dfc5ac42aedULL, 0x53380d139d95b3dfULL,
+0x650a73548baf63deULL, 0x766a0abb3c77b2a8ULL, 0x81c2c92e47edaee6ULL,
+0x92722c851482353bULL, 0xa2bfe8a14cf10364ULL, 0xa81a664bbc423001ULL,
+0xc24b8b70d0f89791ULL, 0xc76c51a30654be30ULL, 0xd192e819d6ef5218ULL,
+0xd69906245565a910ULL, 0xf40e35855771202aULL, 0x106aa07032bbd1b8ULL,
+0x19a4c116b8d2d0c8ULL, 0x1e376c085141ab53ULL, 0x2748774cdf8eeb99ULL,
+0x34b0bcb5e19b48a8ULL, 0x391c0cb3c5c95a63ULL, 0x4ed8aa4ae3418acbULL,
+0x5b9cca4f7763e373ULL, 0x682e6ff3d6b2b8a3ULL, 0x748f82ee5defb2fcULL,
+0x78a5636f43172f60ULL, 0x84c87814a1f0ab72ULL, 0x8cc702081a6439ecULL,
+0x90befffa23631e28ULL, 0xa4506cebde82bde9ULL, 0xbef9a3f7b2c67915ULL,
+0xc67178f2e372532bULL, 0xca273eceea26619cULL, 0xd186b8c721c0c207ULL,
+0xeada7dd6cde0eb1eULL, 0xf57d4f7fee6ed178ULL, 0x06f067aa72176fbaULL,
+0x0a637dc5a2c898a6ULL, 0x113f9804bef90daeULL,

Re: [PATCH v1 27/40] i386/tdx: Setup the TD HOB list

2022-08-26 Thread Gerd Hoffmann
On Tue, Aug 02, 2022 at 03:47:37PM +0800, Xiaoyao Li wrote:
> The TD HOB list is used to pass the information from VMM to TDVF. The TD
> HOB must include PHIT HOB and Resource Descriptor HOB. More details can
> be found in TDVF specification and PI specification.
> 
> Build the TD HOB in TDX's machine_init_done callback.
> 
> Co-developed-by: Isaku Yamahata 
> Signed-off-by: Isaku Yamahata 
> Co-developed-by: Sean Christopherson 
> Signed-off-by: Sean Christopherson 
> Signed-off-by: Xiaoyao Li 

Acked-by: Gerd Hoffmann 




Re: [PATCH 09/51] fsdev/virtfs-proxy-helper: Use g_mkdir_with_parents()

2022-08-26 Thread Bin Meng
On Fri, Aug 26, 2022 at 6:09 PM Christian Schoenebeck
 wrote:
>
> On Mittwoch, 24. August 2022 11:39:47 CEST Bin Meng wrote:
> > From: Bin Meng 
> >
> > Use the same g_mkdir_with_parents() call to create a directory on
> > all platforms.
>
> The same would be g_mkdir(), not g_mkdir_with_parents(), so please use that
> instead.
>

No, g_mkdir() is a deprecated API.

Search result (https://docs.gtk.org/glib/?q=mkdir) shows only
g_mkdir_with_parents().

Regards,
Bin



Re: [PATCH v1 36/40] i386/tdx: Don't synchronize guest tsc for TDs

2022-08-26 Thread Gerd Hoffmann
On Tue, Aug 02, 2022 at 03:47:46PM +0800, Xiaoyao Li wrote:
> From: Isaku Yamahata 
> 
> TSC of TDs is not accessible and KVM doesn't allow access of
> MSR_IA32_TSC for TDs. To avoid the assert() in kvm_get_tsc, make
> kvm_synchronize_all_tsc() noop for TDs,
> 
> Signed-off-by: Isaku Yamahata 
> Reviewed-by: Connor Kuehl 
> Signed-off-by: Xiaoyao Li 

Acked-by: Gerd Hoffmann 




Re: [PATCH v1 38/40] i386/tdx: Skip kvm_put_apicbase() for TDs

2022-08-26 Thread Gerd Hoffmann
On Tue, Aug 02, 2022 at 03:47:48PM +0800, Xiaoyao Li wrote:
> KVM doesn't allow writing to MSR_IA32_APICBASE for TDs.
> 
> Signed-off-by: Xiaoyao Li 

Acked-by: Gerd Hoffmann 




Re: [PATCH v1 34/40] hw/i386: add eoi_intercept_unsupported member to X86MachineState

2022-08-26 Thread Gerd Hoffmann
On Tue, Aug 02, 2022 at 03:47:44PM +0800, Xiaoyao Li wrote:
> Add a new bool member, eoi_intercept_unsupported, to X86MachineState
> with default value false. Set true for TDX VM.
> 
> Inability to intercept EOI makes it impossible to re-inject an emulated
> level-triggered interrupt while its level is still held active, which
> affects interrupt controller emulation.

Acked-by: Gerd Hoffmann 




Re: [PATCH v1 39/40] i386/tdx: Don't get/put guest state for TDX VMs

2022-08-26 Thread Gerd Hoffmann
On Tue, Aug 02, 2022 at 03:47:49PM +0800, Xiaoyao Li wrote:
> From: Sean Christopherson 
> 
> Don't get/put state of TDX VMs since accessing/mutating guest state of
> production TDs is not supported.
> 
> Note, it will be allowed for a debug TD. Corresponding support will be
> introduced when debug TD support is implemented in the future.
> 
> Signed-off-by: Sean Christopherson 
> Signed-off-by: Xiaoyao Li 

Acked-by: Gerd Hoffmann 




Re: [PATCH v1 35/40] hw/i386: add option to forcibly report edge trigger in acpi tables

2022-08-26 Thread Gerd Hoffmann
On Tue, Aug 02, 2022 at 03:47:45PM +0800, Xiaoyao Li wrote:
> From: Isaku Yamahata 
> 
> When level trigger isn't supported on x86 platform,
> forcibly report edge trigger in acpi tables.
> 
> Signed-off-by: Isaku Yamahata 
> Signed-off-by: Xiaoyao Li 

Acked-by: Gerd Hoffmann 




Re: [PATCH v1 40/40] docs: Add TDX documentation

2022-08-26 Thread Gerd Hoffmann
On Tue, Aug 02, 2022 at 03:47:50PM +0800, Xiaoyao Li wrote:
> Add docs/system/i386/tdx.rst for TDX support, and add tdx in
> confidential-guest-support.rst
> 
> Signed-off-by: Xiaoyao Li 

Acked-by: Gerd Hoffmann 




Re: [PATCH 34/51] tests/qtest: bios-tables-test: Adapt the case for win32

2022-08-26 Thread Bin Meng
On Wed, Aug 24, 2022 at 8:42 PM Ani Sinha  wrote:
>
>
>
> On Wed, 24 Aug 2022, Bin Meng wrote:
>
> > From: Bin Meng 
> >
> > Single quotes in the arguments (oem_id='CRASH ') are not removed in
> > the Windows environment before it is passed to the QEMU executable.
> > The space in the argument causes the "-acpitable" option parser to
> > think that all of its parameters are done, hence it complains:
> >
> >   '-acpitable' requires one of 'data' or 'file'
> >
> > Change to use double quotes which works fine on all platforms.
> >
> > Also /dev/null does not work on win32, and nul should be used.
> >
> > Signed-off-by: Bin Meng 
> > ---
> >
> >  tests/qtest/bios-tables-test.c | 12 +---
> >  1 file changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c
> > index 36783966b0..0148ce388c 100644
> > --- a/tests/qtest/bios-tables-test.c
> > +++ b/tests/qtest/bios-tables-test.c
> > @@ -1615,6 +1615,12 @@ static void test_acpi_virt_viot(void)
> >  free_test_data(&data);
> >  }
> >
> > +#ifndef _WIN32
> > +# define DEV_NULL "/dev/null"
> > +#else
> > +# define DEV_NULL "nul"
> > +#endif
> > +
> >  static void test_acpi_q35_slic(void)
> >  {
> >  test_data data = {
> > @@ -1622,9 +1628,9 @@ static void test_acpi_q35_slic(void)
> >  .variant = ".slic",
> >  };
> >
> > -test_acpi_one("-acpitable sig=SLIC,oem_id='CRASH ',oem_table_id='ME',"
> > -  "oem_rev=2210,asl_compiler_id='qemu',"
> > -  "asl_compiler_rev=,data=/dev/null",
> > +test_acpi_one("-acpitable sig=SLIC,oem_id=\"CRASH \",oem_table_id=ME,"
> > +  "oem_rev=2210,asl_compiler_id=qemu,"
>
> ME and qemu should be surrounded by quotes. They are string arguments.
> https://www.qemu.org/docs/master/interop/qemu-qmp-ref.html?highlight=oem_table_id
>

The doc also says sig is a string, but the original code does not
surround it by quotes.

The parameter is of string type by default, so there is no need to
surround it by quotes unless it contains some special characters.

Regards,
Bin



Re: [PATCH v3 1/1] virtio-gpu: CONTEXT_INIT feature

2022-08-26 Thread Antonio Caggiano

Hi Marc-André,

On 26/08/2022 12:16, Marc-André Lureau wrote:

Hi

On Fri, Aug 26, 2022 at 2:12 PM Antonio Caggiano 
mailto:antonio.caggi...@collabora.com>> 
wrote:


Create virgl renderer context with flags using context_id when valid.

v2:
- The feature can be enabled via the context_init config option.
- A warning message will be emitted and the feature will not be used
   when linking with virglrenderer versions without context_init
support.

v3: Define HAVE_VIRGL_CONTEXT_INIT in config_host_data.

Signed-off-by: Antonio Caggiano mailto:antonio.caggi...@collabora.com>>
---
  hw/display/virtio-gpu-base.c   |  3 +++
  hw/display/virtio-gpu-virgl.c  | 16 ++--
  hw/display/virtio-gpu.c        |  2 ++
  include/hw/virtio/virtio-gpu.h |  3 +++
  meson.build                    |  5 +
  5 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c
index a29f191aa8..6c5f1f327f 100644
--- a/hw/display/virtio-gpu-base.c
+++ b/hw/display/virtio-gpu-base.c
@@ -215,6 +215,9 @@ virtio_gpu_base_get_features(VirtIODevice *vdev,
uint64_t features,
      if (virtio_gpu_blob_enabled(g->conf)) {
          features |= (1 << VIRTIO_GPU_F_RESOURCE_BLOB);
      }
+    if (virtio_gpu_context_init_enabled(g->conf)) {
+        features |= (1 << VIRTIO_GPU_F_CONTEXT_INIT);
+    }

      return features;
  }
diff --git a/hw/display/virtio-gpu-virgl.c
b/hw/display/virtio-gpu-virgl.c
index 73cb92c8d5..274cbc44de 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -97,8 +97,20 @@ static void virgl_cmd_context_create(VirtIOGPU *g,
      trace_virtio_gpu_cmd_ctx_create(cc.hdr.ctx_id,
                                      cc.debug_name);

-    virgl_renderer_context_create(cc.hdr.ctx_id, cc.nlen,
-                                  cc.debug_name);
+    if (cc.context_init) {
+#ifdef HAVE_VIRGL_CONTEXT_INIT
+        virgl_renderer_context_create_with_flags(cc.hdr.ctx_id,
+                                                 cc.context_init,
+                                                 cc.nlen,
+                                                 cc.debug_name);
+        return;
+#else
+        qemu_log_mask(LOG_UNIMP,
+                      "Linked virglrenderer does not support
context-init\n");


What is the outcome in that case?


It's in the commit message: "A warning message will be emitted and the 
feature will not be used when linking with virglrenderer versions 
without context_init"




+#endif
+    }
+
+    virgl_renderer_context_create(cc.hdr.ctx_id, cc.nlen,
cc.debug_name);
  }

  static void virgl_cmd_context_destroy(VirtIOGPU *g,
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 20cc703dcc..fa667ec234 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1424,6 +1424,8 @@ static Property virtio_gpu_properties[] = {
                       256 * MiB),
      DEFINE_PROP_BIT("blob", VirtIOGPU, parent_obj.conf.flags,
                      VIRTIO_GPU_FLAG_BLOB_ENABLED, false),
+    DEFINE_PROP_BIT("context_init", VirtIOGPU, parent_obj.conf.flags,
+                    VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED, false),
      DEFINE_PROP_END_OF_LIST(),
  };

diff --git a/include/hw/virtio/virtio-gpu.h
b/include/hw/virtio/virtio-gpu.h
index 2e28507efe..c6f5cfde47 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -90,6 +90,7 @@ enum virtio_gpu_base_conf_flags {
      VIRTIO_GPU_FLAG_EDID_ENABLED,
      VIRTIO_GPU_FLAG_DMABUF_ENABLED,
      VIRTIO_GPU_FLAG_BLOB_ENABLED,
+    VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED,
  };

  #define virtio_gpu_virgl_enabled(_cfg) \
@@ -102,6 +103,8 @@ enum virtio_gpu_base_conf_flags {
      (_cfg.flags & (1 << VIRTIO_GPU_FLAG_DMABUF_ENABLED))
  #define virtio_gpu_blob_enabled(_cfg) \
      (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED))
+#define virtio_gpu_context_init_enabled(_cfg) \
+    (_cfg.flags & (1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED))

  struct virtio_gpu_base_conf {
      uint32_t max_outputs;
diff --git a/meson.build b/meson.build
index 20fddbd707..e1071b3563 100644
--- a/meson.build
+++ b/meson.build
@@ -718,6 +718,11 @@ if not get_option('virglrenderer').auto() or
have_system or have_vhost_user_gpu
                       method: 'pkg-config',
                       required: get_option('virglrenderer'),
                       kwargs: static_kwargs)
+
+  config_host_data.set('HAVE_VIRGL_CONTEXT_INIT',
+ 
  cc.has_function('virgl_renderer_context_create_with_flags',

+                               

[PATCH v4 1/1] virtio-gpu: CONTEXT_INIT feature

2022-08-26 Thread Antonio Caggiano
Create virgl renderer context with flags using context_id when valid.

v2:
- The feature can be enabled via the context_init config option.
- A warning message will be emitted and the feature will not be used
  when linking with virglrenderer versions without context_init support.

v3: Define HAVE_VIRGL_CONTEXT_INIT in config_host_data.

Signed-off-by: Antonio Caggiano 
Reviewed-by: Marc-André Lureau 
---
 hw/display/virtio-gpu-base.c   |  3 +++
 hw/display/virtio-gpu-virgl.c  | 16 ++--
 hw/display/virtio-gpu.c|  2 ++
 include/hw/virtio/virtio-gpu.h |  3 +++
 meson.build|  5 +
 5 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c
index a29f191aa8..6c5f1f327f 100644
--- a/hw/display/virtio-gpu-base.c
+++ b/hw/display/virtio-gpu-base.c
@@ -215,6 +215,9 @@ virtio_gpu_base_get_features(VirtIODevice *vdev, uint64_t 
features,
 if (virtio_gpu_blob_enabled(g->conf)) {
 features |= (1 << VIRTIO_GPU_F_RESOURCE_BLOB);
 }
+if (virtio_gpu_context_init_enabled(g->conf)) {
+features |= (1 << VIRTIO_GPU_F_CONTEXT_INIT);
+}
 
 return features;
 }
diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 73cb92c8d5..274cbc44de 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -97,8 +97,20 @@ static void virgl_cmd_context_create(VirtIOGPU *g,
 trace_virtio_gpu_cmd_ctx_create(cc.hdr.ctx_id,
 cc.debug_name);
 
-virgl_renderer_context_create(cc.hdr.ctx_id, cc.nlen,
-  cc.debug_name);
+if (cc.context_init) {
+#ifdef HAVE_VIRGL_CONTEXT_INIT
+virgl_renderer_context_create_with_flags(cc.hdr.ctx_id,
+ cc.context_init,
+ cc.nlen,
+ cc.debug_name);
+return;
+#else
+qemu_log_mask(LOG_UNIMP,
+  "Linked virglrenderer does not support context-init\n");
+#endif
+}
+
+virgl_renderer_context_create(cc.hdr.ctx_id, cc.nlen, cc.debug_name);
 }
 
 static void virgl_cmd_context_destroy(VirtIOGPU *g,
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 20cc703dcc..fa667ec234 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1424,6 +1424,8 @@ static Property virtio_gpu_properties[] = {
  256 * MiB),
 DEFINE_PROP_BIT("blob", VirtIOGPU, parent_obj.conf.flags,
 VIRTIO_GPU_FLAG_BLOB_ENABLED, false),
+DEFINE_PROP_BIT("context_init", VirtIOGPU, parent_obj.conf.flags,
+VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED, false),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 2e28507efe..c6f5cfde47 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -90,6 +90,7 @@ enum virtio_gpu_base_conf_flags {
 VIRTIO_GPU_FLAG_EDID_ENABLED,
 VIRTIO_GPU_FLAG_DMABUF_ENABLED,
 VIRTIO_GPU_FLAG_BLOB_ENABLED,
+VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED,
 };
 
 #define virtio_gpu_virgl_enabled(_cfg) \
@@ -102,6 +103,8 @@ enum virtio_gpu_base_conf_flags {
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_DMABUF_ENABLED))
 #define virtio_gpu_blob_enabled(_cfg) \
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED))
+#define virtio_gpu_context_init_enabled(_cfg) \
+(_cfg.flags & (1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED))
 
 struct virtio_gpu_base_conf {
 uint32_t max_outputs;
diff --git a/meson.build b/meson.build
index 20fddbd707..e1071b3563 100644
--- a/meson.build
+++ b/meson.build
@@ -718,6 +718,11 @@ if not get_option('virglrenderer').auto() or have_system 
or have_vhost_user_gpu
  method: 'pkg-config',
  required: get_option('virglrenderer'),
  kwargs: static_kwargs)
+
+  config_host_data.set('HAVE_VIRGL_CONTEXT_INIT',
+   
cc.has_function('virgl_renderer_context_create_with_flags',
+   prefix: '#include ',
+   dependencies: virgl))
 endif
 curl = not_found
 if not get_option('curl').auto() or have_block
-- 
2.34.1




[PATCH v4 0/1] virtio-gpu: CONTEXT_INIT feature

2022-08-26 Thread Antonio Caggiano
This is a different attempt at upstreaming the work I have been doing to
enable support for the Venus Virtio-GPU Vulkan driver.

I believe the previous one [0] bundled a bit too much stuff in one place,
therefore with this I would like to try a more fine-grained approach.

I will start with the CONTEXT_INIT feature, as it was the first commit
of the aforementioned series, and the virtio-spec has recently been
updated in that regard [1]. Hopefully this also answers Gerd's
comment on the previous patch [2].

[0] https://www.mail-archive.com/qemu-devel@nongnu.org/msg826897.html
[1] 
https://github.com/oasis-tcs/virtio-spec/commit/aad2b6f3620ec0c9d16aaf046db8c282c24cce3e
[2] https://www.mail-archive.com/qemu-devel@nongnu.org/msg827304.html

Antonio Caggiano (1):
  virtio-gpu: CONTEXT_INIT feature

 hw/display/virtio-gpu-base.c   |  3 +++
 hw/display/virtio-gpu-virgl.c  | 16 ++--
 hw/display/virtio-gpu.c|  2 ++
 include/hw/virtio/virtio-gpu.h |  3 +++
 meson.build|  5 +
 5 files changed, 27 insertions(+), 2 deletions(-)

-- 
2.34.1




Re: [PATCH 37/51] tests/qtest: migration-test: Disable IO redirection for win32

2022-08-26 Thread Bin Meng
On Thu, Aug 25, 2022 at 2:53 AM Dr. David Alan Gilbert
 wrote:
>
> * Bin Meng (bmeng...@gmail.com) wrote:
> > From: Bin Meng 
> >
> > On Windows the QEMU executable is created via CreateProcess() and IO
> > redirection does not work, so we need to set MigrateStart::hide_stderr
> > to false to disable adding IO redirection to the command line.
> >
> > Signed-off-by: Bin Meng 
>
> Isn't it easier just to change the one place that tests this?
>

Yeah, will do in v2.

Regards,
Bin



Re: [PATCH v3 1/1] virtio-gpu: CONTEXT_INIT feature

2022-08-26 Thread Marc-André Lureau
Hi

On Fri, Aug 26, 2022 at 2:45 PM Antonio Caggiano <
antonio.caggi...@collabora.com> wrote:

> Hi Marc-André,
>
> On 26/08/2022 12:16, Marc-André Lureau wrote:
> > Hi
> >
> > On Fri, Aug 26, 2022 at 2:12 PM Antonio Caggiano
> > mailto:antonio.caggi...@collabora.com>>
>
> > wrote:
> >
> > Create virgl renderer context with flags using context_id when valid.
> >
> > v2:
> > - The feature can be enabled via the context_init config option.
> > - A warning message will be emitted and the feature will not be used
> >when linking with virglrenderer versions without context_init
> > support.
> >
> > v3: Define HAVE_VIRGL_CONTEXT_INIT in config_host_data.
> >
> > Signed-off-by: Antonio Caggiano  > >
> > ---
> >   hw/display/virtio-gpu-base.c   |  3 +++
> >   hw/display/virtio-gpu-virgl.c  | 16 ++--
> >   hw/display/virtio-gpu.c|  2 ++
> >   include/hw/virtio/virtio-gpu.h |  3 +++
> >   meson.build|  5 +
> >   5 files changed, 27 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/display/virtio-gpu-base.c
> b/hw/display/virtio-gpu-base.c
> > index a29f191aa8..6c5f1f327f 100644
> > --- a/hw/display/virtio-gpu-base.c
> > +++ b/hw/display/virtio-gpu-base.c
> > @@ -215,6 +215,9 @@ virtio_gpu_base_get_features(VirtIODevice *vdev,
> > uint64_t features,
> >   if (virtio_gpu_blob_enabled(g->conf)) {
> >   features |= (1 << VIRTIO_GPU_F_RESOURCE_BLOB);
> >   }
> > +if (virtio_gpu_context_init_enabled(g->conf)) {
> > +features |= (1 << VIRTIO_GPU_F_CONTEXT_INIT);
> > +}
> >
> >   return features;
> >   }
> > diff --git a/hw/display/virtio-gpu-virgl.c
> > b/hw/display/virtio-gpu-virgl.c
> > index 73cb92c8d5..274cbc44de 100644
> > --- a/hw/display/virtio-gpu-virgl.c
> > +++ b/hw/display/virtio-gpu-virgl.c
> > @@ -97,8 +97,20 @@ static void virgl_cmd_context_create(VirtIOGPU *g,
> >   trace_virtio_gpu_cmd_ctx_create(cc.hdr.ctx_id,
> >   cc.debug_name);
> >
> > -virgl_renderer_context_create(cc.hdr.ctx_id, cc.nlen,
> > -  cc.debug_name);
> > +if (cc.context_init) {
> > +#ifdef HAVE_VIRGL_CONTEXT_INIT
> > +virgl_renderer_context_create_with_flags(cc.hdr.ctx_id,
> > + cc.context_init,
> > + cc.nlen,
> > + cc.debug_name);
> > +return;
> > +#else
> > +qemu_log_mask(LOG_UNIMP,
> > +  "Linked virglrenderer does not support
> > context-init\n");
> >
> >
> > What is the outcome in that case?
>
> It's in the commit message: "A warning message will be emitted and the
> feature will not be used when linking with virglrenderer versions
> without context_init"
>
>
Ah ok, I didn't expect this to be under the changelog. We generally don't
put it in the commit message, but rather after the three-dash line.

>
> > +#endif
> > +}
> > +
> > +virgl_renderer_context_create(cc.hdr.ctx_id, cc.nlen,
> > cc.debug_name);
> >   }
> >
> >   static void virgl_cmd_context_destroy(VirtIOGPU *g,
> > diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
> > index 20cc703dcc..fa667ec234 100644
> > --- a/hw/display/virtio-gpu.c
> > +++ b/hw/display/virtio-gpu.c
> > @@ -1424,6 +1424,8 @@ static Property virtio_gpu_properties[] = {
> >256 * MiB),
> >   DEFINE_PROP_BIT("blob", VirtIOGPU, parent_obj.conf.flags,
> >   VIRTIO_GPU_FLAG_BLOB_ENABLED, false),
> > +DEFINE_PROP_BIT("context_init", VirtIOGPU,
> parent_obj.conf.flags,
> > +VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED, false),
> >   DEFINE_PROP_END_OF_LIST(),
> >   };
> >
> > diff --git a/include/hw/virtio/virtio-gpu.h
> > b/include/hw/virtio/virtio-gpu.h
> > index 2e28507efe..c6f5cfde47 100644
> > --- a/include/hw/virtio/virtio-gpu.h
> > +++ b/include/hw/virtio/virtio-gpu.h
> > @@ -90,6 +90,7 @@ enum virtio_gpu_base_conf_flags {
> >   VIRTIO_GPU_FLAG_EDID_ENABLED,
> >   VIRTIO_GPU_FLAG_DMABUF_ENABLED,
> >   VIRTIO_GPU_FLAG_BLOB_ENABLED,
> > +VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED,
> >   };
> >
> >   #define virtio_gpu_virgl_enabled(_cfg) \
> > @@ -102,6 +103,8 @@ enum virtio_gpu_base_conf_flags {
> >   (_cfg.flags & (1 << VIRTIO_GPU_FLAG_DMABUF_ENABLED))
> >   #define virtio_gpu_blob_enabled(_cfg) \
> >   (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED))
> > +#define virtio_gpu_context_init_enabled(_cfg) \
> > +(_cfg.flags & (1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLE

[PATCH 0/8] Support VIRTIO_F_RING_RESET for vhost-user in virtio pci-modern

2022-08-26 Thread Kangjie Xu
This patch set is based on the patch set that supports VIRTIO_F_RING_RESET for 
vhost-kernel:

https://lore.kernel.org/qemu-devel/cover.1661414345.git.kangjie...@linux.alibaba.com/T/

The virtio queue reset function has already been defined in the virtio spec 1.2.
The relevant virtio spec information is here:

https://github.com/oasis-tcs/virtio-spec/issues/124
https://github.com/oasis-tcs/virtio-spec/issues/139

This patch set is to support this function for vhost-user in QEMU. It consists 
of several parts:
Patch 1: docs about vhost-user message VHOST_USER_RESET_VRING.
Patch 2: rename vhost_set_vring_enable to vhost_set_dev_enable.
Patches 3-4: support in vhost-user module.
Patches 5-6: support in vhost-net module.
Patch 7: support in virtio-net module.
Patch 8: add feature negotiation support.

The process of virtqueue reset can be concluded as:
1. The virtqueue is disabled when VIRTIO_PCI_COMMON_Q_RESET is written.
2. Then the virtqueue can be optionally restarted(re-enabled).

The detailed process is listed below:
1. VIRTIO_PCI_COMMON_Q_RESET is written [virtio-pci]
-> virtio_queue_reset() [virtio]
-> virtio_net_queue_reset() [virtio-net]
-> vhost_net_virtqueue_reset() [vhost-net]
-> vhost_user_reset_vring() [vhost-user]
-> send VHOST_USER_RESET_VRING to the device
-> vhost_virtqueue_unmap()
-> __virtio_queue_reset()
2. VIRTIO_PCI_COMMON_Q_ENABLE is written [virtio-pci]
-> virtio_queue_enable() [virtio]
-> virtio_net_queue_enable() [virtio-net]
-> vhost_net_virtqueue_restart() [vhost-net]
-> vhost_virtqueue_start()
-> vhost_user_set_vring_enable [vhost-user]
-> send VHOST_USER_SET_VRING_ENABLE to the device
-> set enabled, reset status of vq.

Test environment:
Qemu: QEMU emulator version 7.0.50
Guest: 5.19.0-rc3 (With vq reset support)
DPDK: 22.07-rc1 (With vq reset support)
Test Cmd: ethtool -g eth1; ethtool -G eth1 rx $1 tx $2; ethtool -g eth1;

The driver can resize the virtio queue, and then the virtio queue reset
function should be triggered.

The default is split mode; modify QEMU virtio-net to add the PACKED feature to
test packed mode.

Guest Kernel Patch:

https://lore.kernel.org/bpf/20220801063902.129329-1-xuanz...@linux.alibaba.com/

DPDK Patch:

https://github.com/middaywords/dpdk/compare/72206323a5dd3182b13f61b25a64abdddfee595c...eabadfac7953da66bc10ffb8284b490d09bb7ec7

changelog: (based on the series
https://lore.kernel.org/qemu-devel/cover.1658141552.git.kangjie...@linux.alibaba.com/T/#t)
1. rename vhost_set_vring_enable to vhost_set_dev_enable.
2. add vhost-user message VHOST_USER_RESET_VRING
3. remove restart/reset functions of virtqueue in vhost module.

Kangjie Xu (8):
  docs: vhost-user: add VHOST_USER_RESET_VRING message
  net: virtio: rename vhost_set_vring_enable to vhost_set_dev_enable
  vhost-user: add op to enable or disable a single vring
  vhost-user: introduce vhost_reset_vring() interface
  vhost-net: vhost-user: update vhost_net_virtqueue_reset()
  vhost-net: vhost-user: update vhost_net_virtqueue_restart()
  virtio-net: vhost-user: update queue_reset and queue_enable
  vhost: vhost-user: enable vq reset feature

 backends/cryptodev-vhost.c| 12 +++---
 docs/interop/vhost-user.rst   | 10 +
 hw/net/vhost_net-stub.c   |  2 +-
 hw/net/vhost_net.c| 34 +---
 hw/net/virtio-net.c   | 10 +++--
 hw/virtio/vhost-user.c| 68 +++
 include/hw/virtio/vhost-backend.h |  8 +++-
 include/net/vhost_net.h   |  2 +-
 8 files changed, 119 insertions(+), 27 deletions(-)

-- 
2.32.0




[PATCH 1/8] docs: vhost-user: add VHOST_USER_RESET_VRING message

2022-08-26 Thread Kangjie Xu
To support the reset operation for an individual virtqueue,
we introduce a new message, VHOST_USER_RESET_VRING. This
message is submitted by the front-end to reset an individual
virtqueue to its initial state in the back-end. A reply is
needed to ensure that the reset operation is complete.

Signed-off-by: Kangjie Xu 
Signed-off-by: Xuan Zhuo 
---
 docs/interop/vhost-user.rst | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index 3f18ab424e..ce7991b9d3 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -1422,6 +1422,16 @@ Front-end message types
   query the back-end for its device status as defined in the Virtio
   specification.
 
+``VHOST_USER_RESET_VRING``
+  :id: 41
+  :equivalent ioctl: N/A
+  :request payload: vring state description
+  :reply payload: ``u64``
+
+  When the ``VIRTIO_F_RING_RESET`` feature has been successfully
+  negotiated, this message is submitted by the front-end to reset an
+  individual virtqueue to its initial state in the back-end. It will ask
+  for a reply to ensure the virtqueue is successfully reset in the back-end.
 
 Back-end message types
 --
-- 
2.32.0




[PATCH 3/8] vhost-user: add op to enable or disable a single vring

2022-08-26 Thread Kangjie Xu
Currently there is only a vhost_set_dev_enable op in VhostOps. Thus,
we introduce the vhost_set_vring_enable interface to set the enable
status of a single vring.

Resetting a single vq will rely on this interface.

Signed-off-by: Kangjie Xu 
Signed-off-by: Xuan Zhuo 
---
 hw/virtio/vhost-user.c| 25 ++---
 include/hw/virtio/vhost-backend.h |  3 +++
 2 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 794519359b..3f140d5085 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1198,6 +1198,22 @@ static int vhost_user_set_vring_base(struct vhost_dev 
*dev,
 return vhost_set_vring(dev, VHOST_USER_SET_VRING_BASE, ring);
 }
 
+static int vhost_user_set_vring_enable(struct vhost_dev *dev,
+   int index,
+   int enable)
+{
+if (index < dev->vq_index || index >= dev->vq_index + dev->nvqs) {
+return -EINVAL;
+}
+
+struct vhost_vring_state state = {
+.index = index,
+.num   = enable,
+};
+
+return vhost_set_vring(dev, VHOST_USER_SET_VRING_ENABLE, &state);
+}
+
 static int vhost_user_set_dev_enable(struct vhost_dev *dev, int enable)
 {
 int i;
@@ -1207,13 +1223,7 @@ static int vhost_user_set_dev_enable(struct vhost_dev 
*dev, int enable)
 }
 
 for (i = 0; i < dev->nvqs; ++i) {
-int ret;
-struct vhost_vring_state state = {
-.index = dev->vq_index + i,
-.num   = enable,
-};
-
-ret = vhost_set_vring(dev, VHOST_USER_SET_VRING_ENABLE, &state);
+int ret = vhost_user_set_vring_enable(dev, dev->vq_index + i, enable);
 if (ret < 0) {
 /*
  * Restoring the previous state is likely infeasible, as well as
@@ -2627,6 +2637,7 @@ const VhostOps user_ops = {
 .vhost_set_owner = vhost_user_set_owner,
 .vhost_reset_device = vhost_user_reset_device,
 .vhost_get_vq_index = vhost_user_get_vq_index,
+.vhost_set_vring_enable = vhost_user_set_vring_enable,
 .vhost_set_dev_enable = vhost_user_set_dev_enable,
 .vhost_requires_shm_log = vhost_user_requires_shm_log,
 .vhost_migration_done = vhost_user_migration_done,
diff --git a/include/hw/virtio/vhost-backend.h 
b/include/hw/virtio/vhost-backend.h
index b49432045f..dad7191bac 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -81,6 +81,8 @@ typedef int (*vhost_set_backend_cap_op)(struct vhost_dev 
*dev);
 typedef int (*vhost_set_owner_op)(struct vhost_dev *dev);
 typedef int (*vhost_reset_device_op)(struct vhost_dev *dev);
 typedef int (*vhost_get_vq_index_op)(struct vhost_dev *dev, int idx);
+typedef int (*vhost_set_vring_enable_op)(struct vhost_dev *dev,
+ int index, int enable);
 typedef int (*vhost_set_dev_enable_op)(struct vhost_dev *dev,
int enable);
 typedef bool (*vhost_requires_shm_log_op)(struct vhost_dev *dev);
@@ -155,6 +157,7 @@ typedef struct VhostOps {
 vhost_set_owner_op vhost_set_owner;
 vhost_reset_device_op vhost_reset_device;
 vhost_get_vq_index_op vhost_get_vq_index;
+vhost_set_vring_enable_op vhost_set_vring_enable;
 vhost_set_dev_enable_op vhost_set_dev_enable;
 vhost_requires_shm_log_op vhost_requires_shm_log;
 vhost_migration_done_op vhost_migration_done;
-- 
2.32.0




[PATCH 8/8] vhost: vhost-user: enable vq reset feature

2022-08-26 Thread Kangjie Xu
Add virtqueue reset feature for vhost-user.

Signed-off-by: Kangjie Xu 
Signed-off-by: Xuan Zhuo 
---
 hw/net/vhost_net.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 13b9c11e68..745cb4375b 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -74,6 +74,7 @@ static const int user_feature_bits[] = {
 VIRTIO_NET_F_MTU,
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_PACKED,
+VIRTIO_F_RING_RESET,
 VIRTIO_NET_F_RSS,
 VIRTIO_NET_F_HASH_REPORT,
 
-- 
2.32.0




[PATCH 2/8] net: virtio: rename vhost_set_vring_enable to vhost_set_dev_enable

2022-08-26 Thread Kangjie Xu
Previously, vhost_set_vring_enable would enable/disable all vrings
in a device, which is ambiguous. So we rename it to
vhost_set_dev_enable.

Signed-off-by: Kangjie Xu 
Signed-off-by: Xuan Zhuo 
---
 backends/cryptodev-vhost.c| 12 ++--
 hw/net/vhost_net-stub.c   |  2 +-
 hw/net/vhost_net.c|  8 
 hw/net/virtio-net.c   |  4 ++--
 hw/virtio/vhost-user.c|  4 ++--
 include/hw/virtio/vhost-backend.h |  6 +++---
 include/net/vhost_net.h   |  2 +-
 7 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/backends/cryptodev-vhost.c b/backends/cryptodev-vhost.c
index bc13e466b4..b83e939760 100644
--- a/backends/cryptodev-vhost.c
+++ b/backends/cryptodev-vhost.c
@@ -147,9 +147,9 @@ cryptodev_vhost_set_vq_index(CryptoDevBackendVhost *crypto,
 }
 
 static int
-vhost_set_vring_enable(CryptoDevBackendClient *cc,
-CryptoDevBackend *b,
-uint16_t queue, int enable)
+vhost_set_dev_enable(CryptoDevBackendClient *cc,
+ CryptoDevBackend *b,
+ uint16_t queue, int enable)
 {
 CryptoDevBackendVhost *crypto =
cryptodev_get_vhost(cc, b, queue);
@@ -162,8 +162,8 @@ vhost_set_vring_enable(CryptoDevBackendClient *cc,
 }
 
 vhost_ops = crypto->dev.vhost_ops;
-if (vhost_ops->vhost_set_vring_enable) {
-return vhost_ops->vhost_set_vring_enable(&crypto->dev, enable);
+if (vhost_ops->vhost_set_dev_enable) {
+return vhost_ops->vhost_set_dev_enable(&crypto->dev, enable);
 }
 
 return 0;
@@ -219,7 +219,7 @@ int cryptodev_vhost_start(VirtIODevice *dev, int 
total_queues)
 
 if (cc->vring_enable) {
 /* restore vring enable state */
-r = vhost_set_vring_enable(cc, b, i, cc->vring_enable);
+r = vhost_set_dev_enable(cc, b, i, cc->vring_enable);
 
 if (r < 0) {
 goto err_start;
diff --git a/hw/net/vhost_net-stub.c b/hw/net/vhost_net-stub.c
index 89d71cfb8e..ac5f217dc1 100644
--- a/hw/net/vhost_net-stub.c
+++ b/hw/net/vhost_net-stub.c
@@ -92,7 +92,7 @@ VHostNetState *get_vhost_net(NetClientState *nc)
 return 0;
 }
 
-int vhost_set_vring_enable(NetClientState *nc, int enable)
+int vhost_set_dev_enable(NetClientState *nc, int enable)
 {
 return 0;
 }
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 74c5147d6e..c0c1456172 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -379,7 +379,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
 
 if (peer->vring_enable) {
 /* restore vring enable state */
-r = vhost_set_vring_enable(peer, peer->vring_enable);
+r = vhost_set_dev_enable(peer, peer->vring_enable);
 
 if (r < 0) {
 vhost_net_stop_one(get_vhost_net(peer), dev);
@@ -491,15 +491,15 @@ VHostNetState *get_vhost_net(NetClientState *nc)
 return vhost_net;
 }
 
-int vhost_set_vring_enable(NetClientState *nc, int enable)
+int vhost_set_dev_enable(NetClientState *nc, int enable)
 {
 VHostNetState *net = get_vhost_net(nc);
 const VhostOps *vhost_ops = net->dev.vhost_ops;
 
 nc->vring_enable = enable;
 
-if (vhost_ops && vhost_ops->vhost_set_vring_enable) {
-return vhost_ops->vhost_set_vring_enable(&net->dev, enable);
+if (vhost_ops && vhost_ops->vhost_set_dev_enable) {
+return vhost_ops->vhost_set_dev_enable(&net->dev, enable);
 }
 
 return 0;
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 7817206596..6ab796b399 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -696,7 +696,7 @@ static int peer_attach(VirtIONet *n, int index)
 }
 
 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_USER) {
-vhost_set_vring_enable(nc->peer, 1);
+vhost_set_dev_enable(nc->peer, 1);
 }
 
 if (nc->peer->info->type != NET_CLIENT_DRIVER_TAP) {
@@ -719,7 +719,7 @@ static int peer_detach(VirtIONet *n, int index)
 }
 
 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_USER) {
-vhost_set_vring_enable(nc->peer, 0);
+vhost_set_dev_enable(nc->peer, 0);
 }
 
 if (nc->peer->info->type !=  NET_CLIENT_DRIVER_TAP) {
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index bd24741be8..794519359b 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1198,7 +1198,7 @@ static int vhost_user_set_vring_base(struct vhost_dev 
*dev,
 return vhost_set_vring(dev, VHOST_USER_SET_VRING_BASE, ring);
 }
 
-static int vhost_user_set_vring_enable(struct vhost_dev *dev, int enable)
+static int vhost_user_set_dev_enable(struct vhost_dev *dev, int enable)
 {
 int i;
 
@@ -2627,7 +2627,7 @@ const VhostOps user_ops = {
 .vhost_set_owner = vhost_user_set_owner,
 .vhost_reset_device = vhost_user_reset_device,
 .vhost_get_vq_index = vhost_user_get_vq_index,
-.vhost_set_vring_enable = vhost_user_set_vring_enable,
+.vhost_set_dev_enable = vhost_user_set_dev_enable,

[PATCH 5/8] vhost-net: vhost-user: update vhost_net_virtqueue_reset()

2022-08-26 Thread Kangjie Xu
Update vhost_net_virtqueue_reset() for the vhost-user scenario.

In order to reuse some functions, we adjust the idx for the
vhost-user scenario, because vhost_get_vq_index behaves
differently for vhost-user.

Signed-off-by: Kangjie Xu 
Signed-off-by: Xuan Zhuo 
---
 hw/net/vhost_net.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index c0c1456172..8ad5743f7c 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -522,19 +522,28 @@ void vhost_net_virtqueue_reset(VirtIODevice *vdev, 
NetClientState *nc,
 VHostNetState *net = get_vhost_net(nc->peer);
 const VhostOps *vhost_ops = net->dev.vhost_ops;
 struct vhost_vring_file file = { .fd = -1 };
-int idx;
+struct vhost_vring_state state;
+int idx, r;
 
 /* should only be called after backend is connected */
 assert(vhost_ops);
 
 idx = vhost_ops->vhost_get_vq_index(&net->dev, vq_index);
+if (net->nc->info->type == NET_CLIENT_DRIVER_VHOST_USER) {
+idx -= net->dev.vq_index;
+}
 
 if (net->nc->info->type == NET_CLIENT_DRIVER_TAP) {
 file.index = idx;
-int r = vhost_net_set_backend(&net->dev, &file);
+r = vhost_net_set_backend(&net->dev, &file);
 assert(r >= 0);
 }
 
+if (vhost_ops->vhost_reset_vring) {
+state.index = net->dev.vq_index + idx;
+r = vhost_ops->vhost_reset_vring(&net->dev, &state);
+}
+
 vhost_virtqueue_unmap(&net->dev, vdev, net->dev.vqs + idx, idx);
 }
 
-- 
2.32.0




[PATCH 7/8] virtio-net: vhost-user: update queue_reset and queue_enable

2022-08-26 Thread Kangjie Xu
Update virtio_net_queue_reset() and virtio_net_queue_enable()
for the vhost-user scenario.

Signed-off-by: Kangjie Xu 
Signed-off-by: Xuan Zhuo 
---
 hw/net/virtio-net.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 6ab796b399..19a2132180 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -550,7 +550,8 @@ static void virtio_net_queue_reset(VirtIODevice *vdev, 
uint32_t queue_index)
 }
 
 if (get_vhost_net(nc->peer) &&
-nc->peer->info->type == NET_CLIENT_DRIVER_TAP) {
+(nc->peer->info->type == NET_CLIENT_DRIVER_TAP ||
+ nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_USER)) {
 vhost_net_virtqueue_reset(vdev, nc, queue_index);
 }
 
@@ -568,7 +569,8 @@ static void virtio_net_queue_enable(VirtIODevice *vdev, 
uint32_t queue_index)
 }
 
 if (get_vhost_net(nc->peer) &&
-nc->peer->info->type == NET_CLIENT_DRIVER_TAP) {
+(nc->peer->info->type == NET_CLIENT_DRIVER_TAP ||
+ nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_USER)) {
 r = vhost_net_virtqueue_restart(vdev, nc, queue_index);
 if (r < 0) {
 error_report("unable to restart vhost net virtqueue: %d, "
-- 
2.32.0




[PATCH 4/8] vhost-user: introduce vhost_reset_vring() interface

2022-08-26 Thread Kangjie Xu
Introduce the interface vhost_reset_vring(). The interface is a wrapper
to send a VHOST_USER_RESET_VRING message to the back-end. It will reset
an individual vring in the back-end. Meanwhile, it will wait for a reply
to ensure the reset has been completed.

Signed-off-by: Kangjie Xu 
Signed-off-by: Xuan Zhuo 
---
 hw/virtio/vhost-user.c| 41 +++
 include/hw/virtio/vhost-backend.h |  3 +++
 2 files changed, 44 insertions(+)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 3f140d5085..b49076fdc4 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -126,6 +126,7 @@ typedef enum VhostUserRequest {
 VHOST_USER_GET_MAX_MEM_SLOTS = 36,
 VHOST_USER_ADD_MEM_REG = 37,
 VHOST_USER_REM_MEM_REG = 38,
+VHOST_USER_RESET_VRING = 41,
 VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -1508,6 +1509,45 @@ static int vhost_user_get_max_memslots(struct vhost_dev 
*dev,
 return 0;
 }
 
+static int vhost_user_reset_vring(struct vhost_dev *dev,
+  struct vhost_vring_state *ring)
+{
+int ret;
+VhostUserMsg msg = {
+.hdr.request = VHOST_USER_RESET_VRING,
+.hdr.flags = VHOST_USER_VERSION,
+.payload.state = *ring,
+.hdr.size = sizeof(msg.payload.state),
+};
+
+if (!virtio_has_feature(dev->acked_features, VIRTIO_F_RING_RESET)) {
+return -ENOTSUP;
+}
+
+ret = vhost_user_write(dev, &msg, NULL, 0);
+if (ret < 0) {
+return ret;
+}
+
+ret = vhost_user_read(dev, &msg);
+if (ret < 0) {
+return ret;
+}
+
+if (msg.hdr.request != VHOST_USER_RESET_VRING) {
+error_report("Received unexpected msg type. Expected %d received %d",
+ VHOST_USER_RESET_VRING, msg.hdr.request);
+return -EPROTO;
+}
+
+if (msg.hdr.size != sizeof(msg.payload.state)) {
+error_report("Received bad msg size.");
+return -EPROTO;
+}
+
+return 0;
+}
+
 static int vhost_user_reset_device(struct vhost_dev *dev)
 {
 VhostUserMsg msg = {
@@ -2635,6 +2675,7 @@ const VhostOps user_ops = {
 .vhost_set_features = vhost_user_set_features,
 .vhost_get_features = vhost_user_get_features,
 .vhost_set_owner = vhost_user_set_owner,
+.vhost_reset_vring = vhost_user_reset_vring,
 .vhost_reset_device = vhost_user_reset_device,
 .vhost_get_vq_index = vhost_user_get_vq_index,
 .vhost_set_vring_enable = vhost_user_set_vring_enable,
diff --git a/include/hw/virtio/vhost-backend.h 
b/include/hw/virtio/vhost-backend.h
index dad7191bac..ec65a55a77 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -79,6 +79,8 @@ typedef int (*vhost_get_features_op)(struct vhost_dev *dev,
  uint64_t *features);
 typedef int (*vhost_set_backend_cap_op)(struct vhost_dev *dev);
 typedef int (*vhost_set_owner_op)(struct vhost_dev *dev);
+typedef int (*vhost_reset_vring_op)(struct vhost_dev *dev,
+struct vhost_vring_state *ring);
 typedef int (*vhost_reset_device_op)(struct vhost_dev *dev);
 typedef int (*vhost_get_vq_index_op)(struct vhost_dev *dev, int idx);
 typedef int (*vhost_set_vring_enable_op)(struct vhost_dev *dev,
@@ -156,6 +158,7 @@ typedef struct VhostOps {
 vhost_set_backend_cap_op vhost_set_backend_cap;
 vhost_set_owner_op vhost_set_owner;
 vhost_reset_device_op vhost_reset_device;
+vhost_reset_vring_op vhost_reset_vring;
 vhost_get_vq_index_op vhost_get_vq_index;
 vhost_set_vring_enable_op vhost_set_vring_enable;
 vhost_set_dev_enable_op vhost_set_dev_enable;
-- 
2.32.0




[PATCH 6/8] vhost-net: vhost-user: update vhost_net_virtqueue_restart()

2022-08-26 Thread Kangjie Xu
Update vhost_net_virtqueue_restart() for the vhost-user scenario.

In order to reuse some functions, we adjust the idx for the
vhost-user case, because vhost_get_vq_index behaves differently
for vhost-user.

Signed-off-by: Kangjie Xu 
Signed-off-by: Xuan Zhuo 
---
 hw/net/vhost_net.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 8ad5743f7c..13b9c11e68 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -563,6 +563,9 @@ int vhost_net_virtqueue_restart(VirtIODevice *vdev, 
NetClientState *nc,
 assert(vhost_ops);
 
 idx = vhost_ops->vhost_get_vq_index(&net->dev, vq_index);
+if (net->nc->info->type == NET_CLIENT_DRIVER_VHOST_USER) {
+idx -= net->dev.vq_index;
+}
 
 r = vhost_virtqueue_start(&net->dev,
   vdev,
@@ -572,6 +575,15 @@ int vhost_net_virtqueue_restart(VirtIODevice *vdev, 
NetClientState *nc,
 goto err_start;
 }
 
+if (vhost_ops->vhost_set_vring_enable) {
+r = vhost_ops->vhost_set_vring_enable(&net->dev,
+  net->dev.vq_index + idx,
+  1);
+if (r < 0) {
+goto err_start;
+}
+}
+
 if (net->nc->info->type == NET_CLIENT_DRIVER_TAP) {
 file.index = idx;
 file.fd = net->backend;
-- 
2.32.0




[PATCH] KVM: dirty ring: add missing memory barrier

2022-08-26 Thread Paolo Bonzini
The KVM_DIRTY_GFN_F_DIRTY flag ensures that the entry is valid.  If
the read of the fields are not ordered after the read of the flag,
QEMU might see stale values.

Cc: Peter Xu 
Cc: Gavin Shan 
Signed-off-by: Paolo Bonzini 
---
 accel/kvm/kvm-all.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 8d81ab74de..f49643cd24 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -719,7 +719,7 @@ static void kvm_dirty_ring_mark_page(KVMState *s, uint32_t 
as_id,
 
 static bool dirty_gfn_is_dirtied(struct kvm_dirty_gfn *gfn)
 {
-return gfn->flags == KVM_DIRTY_GFN_F_DIRTY;
+return qatomic_load_acquire(&gfn->flags) == KVM_DIRTY_GFN_F_DIRTY;
 }
 
 static void dirty_gfn_set_collected(struct kvm_dirty_gfn *gfn)
-- 
2.37.1




Re: [PATCH 09/51] fsdev/virtfs-proxy-helper: Use g_mkdir_with_parents()

2022-08-26 Thread Christian Schoenebeck
On Freitag, 26. August 2022 12:30:20 CEST Bin Meng wrote:
> On Fri, Aug 26, 2022 at 6:09 PM Christian Schoenebeck
> 
>  wrote:
> > On Mittwoch, 24. August 2022 11:39:47 CEST Bin Meng wrote:
> > > From: Bin Meng 
> > > 
> > > Use the same g_mkdir_with_parents() call to create a directory on
> > > all platforms.
> > 
> > The same would be g_mkdir(), not g_mkdir_with_parents(), so please use
> > that
> > instead.
> 
> No, g_mkdir() is a deprecated API.

Where did you get that from? AFAICS g_mkdir() does not seem to be deprecated:
https://gitlab.gnome.org/GNOME/glib/-/blob/main/glib/gstdio.c#L1201
https://gitlab.gnome.org/GNOME/glib/-/blob/main/glib/gstdio.h#L131

> Search result (https://docs.gtk.org/glib/?q=mkdir) shows only
> g_mkdir_with_parents().

Yeah, but that does not say that it was deprecated.

> Regards,
> Bin





[PATCH 0/3] iothread and irqfd support

2022-08-26 Thread Jinhao Fan
This patch series adds support for using a separate iothread for NVMe
IO emulation, which brings the potential of applying polling. The
first two patches implement support for irqfd, which solves thread
safety problems for interrupt emulation outside the main loop thread.

Jinhao Fan (3):
  hw/nvme: support irq(de)assertion with eventfd
  hw/nvme: use KVM irqfd when available
  hw/nvme: add iothread support

 hw/nvme/ctrl.c   | 335 +++
 hw/nvme/ns.c |  21 ++-
 hw/nvme/nvme.h   |  12 +-
 hw/nvme/trace-events |   3 +
 4 files changed, 342 insertions(+), 29 deletions(-)

-- 
2.25.1




Re: [PATCH 50/51] .gitlab-ci.d/windows.yml: Increase the timeout to the runner limit

2022-08-26 Thread Bin Meng
On Thu, Aug 25, 2022 at 4:18 PM Thomas Huth  wrote:
>
> On 24/08/2022 11.40, Bin Meng wrote:
> > From: Bin Meng 
> >
> > commit 9f8e6cad65a6 ("gitlab-ci: Speed up the msys2-64bit job by using 
> > --without-default-devices")
> > changed to compile QEMU with the --without-default-devices switch for
> > the msys2-64bit job, because the build could not complete within the
> > project timeout (1h), and also mentioned that a bigger timeout was
> > getting ignored on the shared Gitlab-CI Windows runners.
> >
> > However, as of today it seems the shared Gitlab-CI Windows runners do
> > honor the job timeout, and the runner has a timeout limit of 2h, so
> > let's increase the timeout to the runner limit and drop the configure
> > switch "--without-default-devices" to get larger build coverage.
> >
> > As a result of this, the check-qtest starts running on Windows in CI.
> >
> > Signed-off-by: Bin Meng 
> > ---
> >
> >   .gitlab-ci.d/windows.yml | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/.gitlab-ci.d/windows.yml b/.gitlab-ci.d/windows.yml
> > index c4bde758be..d4fd821b5f 100644
> > --- a/.gitlab-ci.d/windows.yml
> > +++ b/.gitlab-ci.d/windows.yml
> > @@ -10,7 +10,7 @@
> > - ${CI_PROJECT_DIR}/msys64/var/cache
> > needs: []
> > stage: build
> > -  timeout: 70m
> > +  timeout: 2h
>
> IMHO 2 hours are too long ... we're normally trying to limit the time of
> each job to 1h only and only extend it a little bit if we cannot really
> make it, but we should not double the amount of time here. The highest
> timeout that we currently have is 90 minutes ... would that still be OK for this

90 minutes is okay for "make -j2" on the CI machine, but if we disable
the parallel build I am afraid 90 minutes is not enough.

> job, too? If so, please use 90 minutes here. Otherwise, it might still be
> necessary to cut down this job here and there a little bit...
> (maybe the tests now also work a little bit faster now that the migration
> test has been speed up in 7.1-rc4 ?)

I believe the build takes more time than the testing. But definitely
the latest migration test speed up patch will help on windows too.

Regards,
Bin



Re: [PATCH 50/51] .gitlab-ci.d/windows.yml: Increase the timeout to the runner limit

2022-08-26 Thread Daniel P . Berrangé
On Thu, Aug 25, 2022 at 10:18:06AM +0200, Thomas Huth wrote:
> On 24/08/2022 11.40, Bin Meng wrote:
> > From: Bin Meng 
> > 
> > commit 9f8e6cad65a6 ("gitlab-ci: Speed up the msys2-64bit job by using 
> > --without-default-devices")
> > changed to compile QEMU with the --without-default-devices switch for
> > the msys2-64bit job, because the build could not complete within the
> > project timeout (1h), and also mentioned that a bigger timeout was
> > getting ignored on the shared Gitlab-CI Windows runners.
> > 
> > However, as of today it seems the shared Gitlab-CI Windows runners do
> > honor the job timeout, and the runner has a timeout limit of 2h, so
> > let's increase the timeout to the runner limit and drop the configure
> > switch "--without-default-devices" to get larger build coverage.
> > 
> > As a result of this, the check-qtest starts running on Windows in CI.
> > 
> > Signed-off-by: Bin Meng 
> > ---
> > 
> >   .gitlab-ci.d/windows.yml | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/.gitlab-ci.d/windows.yml b/.gitlab-ci.d/windows.yml
> > index c4bde758be..d4fd821b5f 100644
> > --- a/.gitlab-ci.d/windows.yml
> > +++ b/.gitlab-ci.d/windows.yml
> > @@ -10,7 +10,7 @@
> > - ${CI_PROJECT_DIR}/msys64/var/cache
> > needs: []
> > stage: build
> > -  timeout: 70m
> > +  timeout: 2h
> 
> IMHO 2 hours are too long ... we're normally trying to limit the time of
> each job to 1h only and only extend it a little bit if we cannot really
> make it, but we should not double the amount of time here. The highest
> timeout that we currently have is 90 minutes ... would that still be OK for this
> job, too? If so, please use 90 minutes here. Otherwise, it might still be
> necessary to cut down this job here and there a little bit...

Also note that 90 minutes is not considered the typical execution
time. For a 90 minute timeout, we should expect the job to run
much quicker than that under normal CI load. eg a 90 minute timeout
should imply a job typically runs in 60-70 minutes, leaving some slack.

IMHO if normal execution of a job takes >60 minutes, we need to
turn off features in CI to get it faster, or split it across
multiple jobs, not increase the timeout even more.


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




[PATCH] linux-user: use 'max' instead of 'qemu32' / 'qemu64' by default

2022-08-26 Thread Daniel P . Berrangé
The 'qemu64' CPU model implements the least featureful x86_64 CPU that's
possible. Historically this hasn't been an issue since it was rare for
OS distros to build with a higher mandatory CPU baseline.

With RHEL-9, however, the entire distro is built for the x86_64-v2 ABI
baseline:

  
https://developers.redhat.com/blog/2021/01/05/building-red-hat-enterprise-linux-9-for-the-x86-64-v2-microarchitecture-level

It is likely that other distros may take similar steps in the not too
distant future. For example, it has been suggested for Fedora on a
number of occasions.

This new baseline is not compatible with the qemu64 CPU model though.
While it is possible to pass a '-cpu xxx' flag to qemu-x86_64, the
usage of QEMU doesn't always allow for this. For example, the args
are typically controlled via binfmt rules that the user has no ability
to change. This impacts users who are trying to use podman on aarch64
platforms to run containers with x86_64 content. There's no arg to
podman that can be used to change the qemu-x86_64 args, and a non-root
user of podman can not change binfmt rules without elevating privileges:

  https://github.com/containers/podman/issues/15456#issuecomment-1228210973

Changing to the 'max' CPU model gives 'qemu-x86_64' maximum
compatibility with binaries it is likely to encounter in the wild,
and is not likely to have a significant downside for existing usage.

Most other architectures already use an 'any' CPU model, which is
often mapped to 'max' (or similar) already, rather than the oldest
possible CPU model.

For the sake of consistency the 'i386' architecture is also changed
from using 'qemu32' to 'max'.

Signed-off-by: Daniel P. Berrangé 
---
 linux-user/i386/target_elf.h   | 2 +-
 linux-user/x86_64/target_elf.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/linux-user/i386/target_elf.h b/linux-user/i386/target_elf.h
index 1c6142e7da..238a9aba73 100644
--- a/linux-user/i386/target_elf.h
+++ b/linux-user/i386/target_elf.h
@@ -9,6 +9,6 @@
 #define I386_TARGET_ELF_H
 static inline const char *cpu_get_model(uint32_t eflags)
 {
-return "qemu32";
+return "max";
 }
 #endif
diff --git a/linux-user/x86_64/target_elf.h b/linux-user/x86_64/target_elf.h
index 7b76a90de8..3f628f8d66 100644
--- a/linux-user/x86_64/target_elf.h
+++ b/linux-user/x86_64/target_elf.h
@@ -9,6 +9,6 @@
 #define X86_64_TARGET_ELF_H
 static inline const char *cpu_get_model(uint32_t eflags)
 {
-return "qemu64";
+return "max";
 }
 #endif
-- 
2.37.2




[PATCH] tests/x86: Add 'q35' machine type to hotplug tests

2022-08-26 Thread Michael Labiuk via
Add a PCI bridge setting to run hotplug tests on the q35 machine type.
Hotplug tests were bound to the 'pc' machine type by commit 7b172333f1b.

Signed-off-by: Michael Labiuk 
---
 tests/qtest/device-plug-test.c |  26 ++
 tests/qtest/drive_del-test.c   | 111 +
 tests/qtest/hd-geo-test.c  | 148 +
 tests/qtest/ivshmem-test.c |  30 +++
 4 files changed, 315 insertions(+)

diff --git a/tests/qtest/device-plug-test.c b/tests/qtest/device-plug-test.c
index 2e3137843e..2f07b37ba1 100644
--- a/tests/qtest/device-plug-test.c
+++ b/tests/qtest/device-plug-test.c
@@ -165,6 +165,26 @@ static void test_spapr_phb_unplug_request(void)
 qtest_quit(qtest);
 }
 
+static void test_q35_pci_unplug_request(void)
+{
+
+QTestState *qtest = qtest_initf("-machine q35 "
+"-device pcie-root-port,id=p1 "
+"-device pcie-pci-bridge,bus=p1,id=b1 "
+"-device virtio-mouse-pci,bus=b1,id=dev0");
+
+/*
+ * Request device removal. As the guest is not running, the request won't
+ * be processed. However during system reset, the removal will be
+ * handled, removing the device.
+ */
+device_del(qtest, "dev0");
+system_reset(qtest);
+wait_device_deleted_event(qtest, "dev0");
+
+qtest_quit(qtest);
+}
+
 int main(int argc, char **argv)
 {
 const char *arch = qtest_get_arch();
@@ -195,5 +215,11 @@ int main(int argc, char **argv)
test_spapr_phb_unplug_request);
 }
 
+if (!strcmp(arch, "x86_64")) {
+qtest_add_func("/device-plug/q35-pci-unplug-request",
+   test_q35_pci_unplug_request);
+
+}
+
 return g_test_run();
 }
diff --git a/tests/qtest/drive_del-test.c b/tests/qtest/drive_del-test.c
index 5e6d58b4dd..3a2ddecf22 100644
--- a/tests/qtest/drive_del-test.c
+++ b/tests/qtest/drive_del-test.c
@@ -258,6 +258,27 @@ static void test_cli_device_del(void)
 qtest_quit(qts);
 }
 
+static void test_cli_device_del_q35(void)
+{
+QTestState *qts;
+
+/*
+ * -drive/-device and device_del.  Start with a drive used by a
+ * device that unplugs after reset.
+ */
+qts = qtest_initf("-drive if=none,id=drive0,file=null-co://,"
+  "file.read-zeroes=on,format=raw "
+  "-machine q35 -device pcie-root-port,id=p1 "
+  "-device pcie-pci-bridge,bus=p1,id=b1 "
+  "-device virtio-blk-%s,drive=drive0,bus=b1,id=dev0",
+  qvirtio_get_dev_type());
+
+device_del(qts, true);
+g_assert(!has_drive(qts));
+
+qtest_quit(qts);
+}
+
 static void test_empty_device_del(void)
 {
 QTestState *qts;
@@ -294,6 +315,45 @@ static void test_device_add_and_del(void)
 qtest_quit(qts);
 }
 
+static void device_add_q35(QTestState *qts)
+{
+QDict *response;
+char driver[32];
+snprintf(driver, sizeof(driver), "virtio-blk-%s",
+ qvirtio_get_dev_type());
+
+response = qtest_qmp(qts, "{'execute': 'device_add',"
+  " 'arguments': {"
+  "   'driver': %s,"
+  "   'drive': 'drive0',"
+  "   'id': 'dev0',"
+  "   'bus': 'b1'"
+  "}}", driver);
+g_assert(response);
+g_assert(qdict_haskey(response, "return"));
+qobject_unref(response);
+}
+
+static void test_device_add_and_del_q35(void)
+{
+QTestState *qts;
+
+/*
+ * -drive/device_add and device_del.  Start with a drive used by a
+ * device that unplugs after reset.
+ */
+qts = qtest_initf("-machine q35 -device pcie-root-port,id=p1 "
+ "-device pcie-pci-bridge,bus=p1,id=b1 "
+ "-drive if=none,id=drive0,file=null-co://,"
+ "file.read-zeroes=on,format=raw");
+
+device_add_q35(qts);
+device_del(qts, true);
+g_assert(!has_drive(qts));
+
+qtest_quit(qts);
+}
+
 static void test_drive_add_device_add_and_del(void)
 {
 QTestState *qts;
@@ -318,6 +378,25 @@ static void test_drive_add_device_add_and_del(void)
 qtest_quit(qts);
 }
 
+static void test_drive_add_device_add_and_del_q35(void)
+{
+QTestState *qts;
+
+qts = qtest_init("-machine q35 -device pcie-root-port,id=p1 "
+ "-device pcie-pci-bridge,bus=p1,id=b1");
+
+/*
+ * drive_add/device_add and device_del.  The drive is used by a
+ * device that unplugs after reset.
+ */
+drive_add_with_media(qts);
+device_add_q35(qts);
+device_del(qts, true);
+g_assert(!has_drive(qts));
+
+qtest_quit(qts);
+}
+
 static void test_blockdev_add_device_add_and_del(void)
 {
 QTestState *qts;
@@ -342,8 +421,29 @@ static void test_blockdev_add_device_add_and_del(void)
 qtest_quit(qts);
 }
 
+static void test_blockdev_add_device_add_and_del_q3

[PATCH 1/3] hw/nvme: support irq(de)assertion with eventfd

2022-08-26 Thread Jinhao Fan
When the new option 'irq-eventfd' is turned on, the IO emulation code
signals an eventfd when it wants to (de)assert an irq. The main loop
eventfd handler does the actual irq (de)assertion.  This paves the way
for iothread support since QEMU's interrupt emulation is not thread
safe.

Asserting and deasserting irqs with an eventfd has performance
implications. For small queue depths it increases request latency, but
for large queue depths it effectively coalesces irqs.

Comparison (KIOPS):

QD            1   4  16  64
QEMU         38 123 210 329
irq-eventfd  32 106 240 364

Signed-off-by: Jinhao Fan 
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 120 ++---
 hw/nvme/nvme.h |   3 ++
 2 files changed, 106 insertions(+), 17 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 87aeba0564..51792f3955 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -526,34 +526,57 @@ static void nvme_irq_check(NvmeCtrl *n)
 }
 }
 
+static void nvme_irq_do_assert(NvmeCtrl *n, NvmeCQueue *cq)
+{
+if (msix_enabled(&(n->parent_obj))) {
+trace_pci_nvme_irq_msix(cq->vector);
+msix_notify(&(n->parent_obj), cq->vector);
+} else {
+trace_pci_nvme_irq_pin();
+assert(cq->vector < 32);
+n->irq_status |= 1 << cq->vector;
+nvme_irq_check(n);
+}
+}
+
 static void nvme_irq_assert(NvmeCtrl *n, NvmeCQueue *cq)
 {
 if (cq->irq_enabled) {
-if (msix_enabled(&(n->parent_obj))) {
-trace_pci_nvme_irq_msix(cq->vector);
-msix_notify(&(n->parent_obj), cq->vector);
+if (cq->assert_notifier.initialized) {
+event_notifier_set(&cq->assert_notifier);
 } else {
-trace_pci_nvme_irq_pin();
-assert(cq->vector < 32);
-n->irq_status |= 1 << cq->vector;
-nvme_irq_check(n);
+nvme_irq_do_assert(n, cq);
 }
 } else {
 trace_pci_nvme_irq_masked();
 }
 }
 
+static void nvme_irq_do_deassert(NvmeCtrl *n, NvmeCQueue *cq)
+{
+if (msix_enabled(&(n->parent_obj))) {
+return;
+} else {
+assert(cq->vector < 32);
+if (!n->cq_pending) {
+n->irq_status &= ~(1 << cq->vector);
+}
+nvme_irq_check(n);
+}
+}
+
 static void nvme_irq_deassert(NvmeCtrl *n, NvmeCQueue *cq)
 {
 if (cq->irq_enabled) {
-if (msix_enabled(&(n->parent_obj))) {
-return;
+if (cq->deassert_notifier.initialized) {
+/*
+ * The deassert notifier will only be initialized when MSI-X is NOT
+ * in use. Therefore no need to worry about extra eventfd syscall
+ * for pin-based interrupts.
+ */
+event_notifier_set(&cq->deassert_notifier);
 } else {
-assert(cq->vector < 32);
-if (!n->cq_pending) {
-n->irq_status &= ~(1 << cq->vector);
-}
-nvme_irq_check(n);
+nvme_irq_do_deassert(n, cq);
 }
 }
 }
@@ -1338,6 +1361,50 @@ static void nvme_update_cq_head(NvmeCQueue *cq)
 trace_pci_nvme_shadow_doorbell_cq(cq->cqid, cq->head);
 }
 
+static void nvme_assert_notifier_read(EventNotifier *e)
+{
+NvmeCQueue *cq = container_of(e, NvmeCQueue, assert_notifier);
+if (event_notifier_test_and_clear(e)) {
+nvme_irq_do_assert(cq->ctrl, cq);
+}
+}
+
+static void nvme_deassert_notifier_read(EventNotifier *e)
+{
+NvmeCQueue *cq = container_of(e, NvmeCQueue, deassert_notifier);
+if (event_notifier_test_and_clear(e)) {
+nvme_irq_do_deassert(cq->ctrl, cq);
+}
+}
+
+static void nvme_init_irq_notifier(NvmeCtrl *n, NvmeCQueue *cq)
+{
+int ret;
+
+ret = event_notifier_init(&cq->assert_notifier, 0);
+if (ret < 0) {
+return;
+}
+
+event_notifier_set_handler(&cq->assert_notifier,
+nvme_assert_notifier_read);
+
+if (!msix_enabled(&n->parent_obj)) {
+ret = event_notifier_init(&cq->deassert_notifier, 0);
+if (ret < 0) {
+event_notifier_set_handler(&cq->assert_notifier, NULL);
+event_notifier_cleanup(&cq->assert_notifier);
+
+return;
+}
+
+event_notifier_set_handler(&cq->deassert_notifier,
+   nvme_deassert_notifier_read);
+}
+
+return;
+}
+
 static void nvme_post_cqes(void *opaque)
 {
 NvmeCQueue *cq = opaque;
@@ -1377,8 +1444,10 @@ static void nvme_post_cqes(void *opaque)
 QTAILQ_INSERT_TAIL(&sq->req_list, req, entry);
 }
 if (cq->tail != cq->head) {
-if (cq->irq_enabled && !pending) {
-n->cq_pending++;
+if (cq->irq_enabled) {
+if (!pending) {
+n->cq_pending++;
+}
 }
 
 nvme_irq_assert(n, cq);
@@ -4705,6 +4774,14 @@ static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
 event_notifier_set_handler(&cq->notifier, NULL);

[PATCH 2/3] hw/nvme: use KVM irqfd when available

2022-08-26 Thread Jinhao Fan
Use KVM's irqfd to send interrupts when possible. This approach is
thread safe. Moreover, it does not have the inter-thread communication
overhead of plain event notifiers, since handler callbacks are invoked
in the same system call as the irqfd write.

Signed-off-by: Jinhao Fan 
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c   | 145 ++-
 hw/nvme/nvme.h   |   3 +
 hw/nvme/trace-events |   3 +
 3 files changed, 149 insertions(+), 2 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 51792f3955..396f3f0cdd 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -192,6 +192,7 @@
 #include "qapi/error.h"
 #include "qapi/visitor.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/kvm.h"
 #include "sysemu/block-backend.h"
 #include "sysemu/hostmem.h"
 #include "hw/pci/msix.h"
@@ -1377,8 +1378,115 @@ static void nvme_deassert_notifier_read(EventNotifier *e)
 }
 }
 
+static int nvme_kvm_vector_use(NvmeCtrl *n, NvmeCQueue *cq, uint32_t vector)
+{
+KVMRouteChange c = kvm_irqchip_begin_route_changes(kvm_state);
+int ret;
+
+ret = kvm_irqchip_add_msi_route(&c, vector, &n->parent_obj);
+if (ret < 0) {
+return ret;
+}
+
+kvm_irqchip_commit_route_changes(&c);
+
+cq->virq = ret;
+
+return 0;
+}
+
+static int nvme_kvm_vector_unmask(PCIDevice *pci_dev, unsigned vector,
+  MSIMessage msg)
+{
+NvmeCtrl *n = NVME(pci_dev);
+int ret;
+
+trace_pci_nvme_irq_unmask(vector, msg.address, msg.data);
+
+for (uint32_t i = 1; i <= n->params.max_ioqpairs; i++) {
+NvmeCQueue *cq = n->cq[i];
+
+if (!cq) {
+continue;
+}
+
+if (cq->vector == vector) {
+if (cq->msg.data != msg.data || cq->msg.address != msg.address) {
+ret = kvm_irqchip_update_msi_route(kvm_state, cq->virq, msg,
+   pci_dev);
+if (ret < 0) {
+return ret;
+}
+
+kvm_irqchip_commit_routes(kvm_state);
+
+cq->msg = msg;
+}
+
+ret = kvm_irqchip_add_irqfd_notifier_gsi(kvm_state,
+ &cq->assert_notifier,
+ NULL, cq->virq);
+if (ret < 0) {
+return ret;
+}
+}
+}
+
+return 0;
+}
+
+static void nvme_kvm_vector_mask(PCIDevice *pci_dev, unsigned vector)
+{
+NvmeCtrl *n = NVME(pci_dev);
+
+trace_pci_nvme_irq_mask(vector);
+
+for (uint32_t i = 1; i <= n->params.max_ioqpairs; i++) {
+NvmeCQueue *cq = n->cq[i];
+
+if (!cq) {
+continue;
+}
+
+if (cq->vector == vector) {
+kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state,
+  &cq->assert_notifier,
+  cq->virq);
+}
+}
+}
+
+static void nvme_kvm_vector_poll(PCIDevice *pci_dev, unsigned int vector_start,
+ unsigned int vector_end)
+{
+NvmeCtrl *n = NVME(pci_dev);
+
+trace_pci_nvme_irq_poll(vector_start, vector_end);
+
+for (uint32_t i = 1; i <= n->params.max_ioqpairs; i++) {
+NvmeCQueue *cq = n->cq[i];
+
+if (!cq) {
+continue;
+}
+
+if (!msix_is_masked(pci_dev, cq->vector)) {
+continue;
+}
+
+if (cq->vector >= vector_start && cq->vector <= vector_end) {
+if (event_notifier_test_and_clear(&cq->assert_notifier)) {
+msix_set_pending(pci_dev, i);
+}
+}
+}
+}
+
+
 static void nvme_init_irq_notifier(NvmeCtrl *n, NvmeCQueue *cq)
 {
+bool with_irqfd = msix_enabled(&n->parent_obj) &&
+  kvm_msi_via_irqfd_enabled();
 int ret;
 
 ret = event_notifier_init(&cq->assert_notifier, 0);
@@ -1386,12 +1494,27 @@ static void nvme_init_irq_notifier(NvmeCtrl *n, NvmeCQueue *cq)
 return;
 }
 
-event_notifier_set_handler(&cq->assert_notifier,
-nvme_assert_notifier_read);
+if (with_irqfd) {
+ret = nvme_kvm_vector_use(n, cq, cq->vector);
+if (ret < 0) {
+event_notifier_cleanup(&cq->assert_notifier);
+
+return;
+}
+} else {
+event_notifier_set_handler(&cq->assert_notifier,
+   nvme_assert_notifier_read);
+}
 
 if (!msix_enabled(&n->parent_obj)) {
 ret = event_notifier_init(&cq->deassert_notifier, 0);
 if (ret < 0) {
+if (with_irqfd) {
+kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state,
+  &cq->assert_notifier,
+  cq->virq);
+}
+
 event_notifier_set_handler(&cq->assert_notif

Re: [PATCH v7 2/2] target/s390x: support PRNO_TRNG instruction

2022-08-26 Thread Thomas Huth

On 09/08/2022 17.03, Jason A. Donenfeld wrote:

In order for guests running under TCG to initialize the kernel's
random number generator, we should support the PRNO_TRNG instruction,
backed in the usual way with the qemu_guest_getrandom helper. This is
confirmed working on Linux 5.19.

Cc: Thomas Huth 
Cc: David Hildenbrand 
Cc: Christian Borntraeger 
Cc: Richard Henderson 
Cc: Cornelia Huck 
Cc: Harald Freudenberger 
Cc: Holger Dengler 
Signed-off-by: Jason A. Donenfeld 
---
  target/s390x/gen-features.c  |  1 +
  target/s390x/tcg/crypto_helper.c | 30 ++
  2 files changed, 31 insertions(+)


Also here: If you've got some spare time, a test in tests/tcg/s390x/ would 
be very welcome!



diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index 85ab69d04e..423ae44315 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -752,6 +752,7 @@ static uint16_t qemu_MAX[] = {
  S390_FEAT_MSA_EXT_5,
  S390_FEAT_KIMD_SHA_512,
  S390_FEAT_KLMD_SHA_512,
+S390_FEAT_PRNO_TRNG,
  };


(this will need some fencing for old machine types, too, just like in patch 1/2)


  /** END FEATURE DEFS **/
diff --git a/target/s390x/tcg/crypto_helper.c b/target/s390x/tcg/crypto_helper.c
index 4d45de8faa..e155ae1f54 100644
--- a/target/s390x/tcg/crypto_helper.c
+++ b/target/s390x/tcg/crypto_helper.c
@@ -14,6 +14,7 @@
  
  #include "qemu/osdep.h"

  #include "qemu/main-loop.h"
+#include "qemu/guest-random.h"
  #include "s390x-internal.h"
  #include "tcg_s390x.h"
  #include "exec/helper-proto.h"
@@ -167,6 +168,31 @@ static int klmd_sha512(CPUS390XState *env, uintptr_t ra, uint64_t parameter_bloc
  return 0;
  }
  
+static void fill_buf_random(CPUS390XState *env, uintptr_t ra,
+uint64_t *buf_reg, uint64_t *len_reg)
+{
+uint8_t tmp[256];
+uint64_t len = *len_reg;
+int message_reg_len = 64;
+
+if (!(env->psw.mask & PSW_MASK_64)) {
+len = (uint32_t)len;
+message_reg_len = (env->psw.mask & PSW_MASK_32) ? 32 : 24;
+}
+
+while (len) {
+size_t block = MIN(len, sizeof(tmp));
+
+qemu_guest_getrandom_nofail(tmp, block);
+for (size_t i = 0; i < block; ++i) {
+cpu_stb_data_ra(env, wrap_address(env, *buf_reg), tmp[i], ra);
+*buf_reg = deposit64(*buf_reg, 0, message_reg_len, *buf_reg + 1);
+--*len_reg;


I know it's annoying, but technically, you must not touch the upper bits of 
the len_reg if running in 31- or 24-bit addressing mode. The Principles of 
Operations say:


"In either the 24- or 31-bit addressing mode, bits 32-63 of the odd-numbered 
register are decremented by the number

of bytes processed for the respective operand, and
bits 0-31 of the register remain unchanged."


+}
+len -= block;
+}
+}
+
  uint32_t HELPER(msa)(CPUS390XState *env, uint32_t r1, uint32_t r2, uint32_t r3,
   uint32_t type)
  {


Don't you also need to modify the "query" part to signal the availability of 
the function? Doesn't Linux in the guest check the availability first before 
using it?



@@ -209,6 +235,10 @@ uint32_t HELPER(msa)(CPUS390XState *env, uint32_t r1, uint32_t r2, uint32_t r3,
  return klmd_sha512(env, ra, env->regs[1], &env->regs[r2], &env->regs[r2 + 1]);
  }
  break;
+case 114: /* CPACF_PRNO_TRNG */
+fill_buf_random(env, ra, &env->regs[r1], &env->regs[r1 + 1]);
+fill_buf_random(env, ra, &env->regs[r2], &env->regs[r2 + 1]);
+break;
  default:
  /* we don't implement any other subfunction yet */
  g_assert_not_reached();


Maybe one more thing to check (according the "Special Conditions" section in 
the Principles of Operation):


"A specification exception is recognized and no other
action is taken if any of the following conditions exist:

...

2. The R1 or R2 fields designate an odd-numbered
register or general register 0. This exception is
recognized regardless of the function code.
"

 Thomas




Re: [PATCH v2 0/3] Fix hugepages with memfd on s390x and clean up related code

2022-08-26 Thread Thomas Huth

On 10/08/2022 14.57, Thomas Huth wrote:

The first patch fixes the problem that hugepages cannot be used via
the "memory-backend-memfd" object on s390x. The second and third patch
are some clean-ups that can be done after generalizing the code in the
first patch.

v2:
  - Use qemu_ram_pagesize(memdev->mr.ram_block) instead of adding
additional code for the memfd object
  - Added the two clean-up patches on top to simplify the code

Thomas Huth (3):
   backends/hostmem: Fix support of memory-backend-memfd in
 qemu_maxrampagesize()
   softmmu/physmem: Remove the ifdef __linux__  around the pagesize
 functions
   util/mmap-alloc: Remove qemu_mempath_getpagesize()

  include/qemu/mmap-alloc.h |  2 --
  backends/hostmem.c| 14 ++
  softmmu/physmem.c | 17 -
  util/mmap-alloc.c | 31 ---
  4 files changed, 2 insertions(+), 62 deletions(-)



Thanks to David and Claudio for the reviews! FWIW, I'll take this through my 
s390x-next branch since it fixes a s390x-related problem:


 https://gitlab.com/thuth/qemu/-/commits/s390x-next/

 Thomas




[PATCH 3/3] hw/nvme: add iothread support

2022-08-26 Thread Jinhao Fan
Add an option "iothread=x" to do emulation in a separate iothread.
This improves performance because QEMU's main loop is responsible
for a lot of other work, while the iothread is dedicated to NVMe emulation.
Moreover, emulating in an iothread brings the potential of polling on
SQ/CQ doorbells, which I will bring up in a follow-up patch.

Iothread can be enabled by:
 -object iothread,id=nvme0 \
 -device nvme,iothread=nvme0 \

Performance comparisons (KIOPS):

QD         1   4  16  64
QEMU      41 136 242 338
iothread  53 155 245 309

Signed-off-by: Jinhao Fan 
---
 hw/nvme/ctrl.c | 74 +-
 hw/nvme/ns.c   | 21 +++---
 hw/nvme/nvme.h |  6 +++-
 3 files changed, 89 insertions(+), 12 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 396f3f0cdd..24a367329d 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4458,7 +4458,13 @@ static int nvme_init_cq_ioeventfd(NvmeCQueue *cq)
 return ret;
 }
 
-event_notifier_set_handler(&cq->notifier, nvme_cq_notifier);
+if (cq->cqid) {
+aio_set_event_notifier(n->ctx, &cq->notifier, true, nvme_cq_notifier,
+   NULL, NULL);
+} else {
+event_notifier_set_handler(&cq->notifier, nvme_cq_notifier);
+}
+
 memory_region_add_eventfd(&n->iomem,
   0x1000 + offset, 4, false, 0, &cq->notifier);
 
@@ -4487,7 +4493,13 @@ static int nvme_init_sq_ioeventfd(NvmeSQueue *sq)
 return ret;
 }
 
-event_notifier_set_handler(&sq->notifier, nvme_sq_notifier);
+if (sq->sqid) {
+aio_set_event_notifier(n->ctx, &sq->notifier, true, nvme_sq_notifier,
+   NULL, NULL);
+} else {
+event_notifier_set_handler(&sq->notifier, nvme_sq_notifier);
+}
+
 memory_region_add_eventfd(&n->iomem,
   0x1000 + offset, 4, false, 0, &sq->notifier);
 
@@ -4503,7 +4515,12 @@ static void nvme_free_sq(NvmeSQueue *sq, NvmeCtrl *n)
 if (sq->ioeventfd_enabled) {
 memory_region_del_eventfd(&n->iomem,
   0x1000 + offset, 4, false, 0, &sq->notifier);
-event_notifier_set_handler(&sq->notifier, NULL);
+if (sq->sqid) {
+aio_set_event_notifier(n->ctx, &sq->notifier, true, NULL, NULL,
+   NULL);
+} else {
+event_notifier_set_handler(&sq->notifier, NULL);
+}
 event_notifier_cleanup(&sq->notifier);
 }
 g_free(sq->io_req);
@@ -4573,7 +4590,13 @@ static void nvme_init_sq(NvmeSQueue *sq, NvmeCtrl *n, uint64_t dma_addr,
 sq->io_req[i].sq = sq;
 QTAILQ_INSERT_TAIL(&(sq->req_list), &sq->io_req[i], entry);
 }
-sq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_process_sq, sq);
+
+if (sq->sqid) {
+sq->timer = aio_timer_new(n->ctx, QEMU_CLOCK_VIRTUAL, SCALE_NS,
+  nvme_process_sq, sq);
+} else {
+sq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_process_sq, sq);
+}
 
 if (n->dbbuf_enabled) {
 sq->db_addr = n->dbbuf_dbs + (sqid << 3);
@@ -4896,7 +4919,12 @@ static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
 if (cq->ioeventfd_enabled) {
 memory_region_del_eventfd(&n->iomem,
   0x1000 + offset, 4, false, 0, &cq->notifier);
-event_notifier_set_handler(&cq->notifier, NULL);
+if (cq->cqid) {
+aio_set_event_notifier(n->ctx, &cq->notifier, true, NULL, NULL,
+   NULL);
+} else {
+event_notifier_set_handler(&cq->notifier, NULL);
+}
 event_notifier_cleanup(&cq->notifier);
 }
 if (cq->assert_notifier.initialized) {
@@ -4979,7 +5007,13 @@ static void nvme_init_cq(NvmeCQueue *cq, NvmeCtrl *n, uint64_t dma_addr,
 }
 }
 n->cq[cqid] = cq;
-cq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_post_cqes, cq);
+
+if (cq->cqid) {
+cq->timer = aio_timer_new(n->ctx, QEMU_CLOCK_VIRTUAL, SCALE_NS,
+  nvme_post_cqes, cq);
+} else {
+cq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_post_cqes, cq);
+}
 
 /*
  * Only enable irq eventfd for IO queues since we always emulate admin
@@ -4988,6 +5022,13 @@ static void nvme_init_cq(NvmeCQueue *cq, NvmeCtrl *n, uint64_t dma_addr,
 if (cqid && n->params.irq_eventfd) {
 nvme_init_irq_notifier(n, cq);
 }
+
+if (cq->cqid) {
+cq->timer = aio_timer_new(n->ctx, QEMU_CLOCK_VIRTUAL, SCALE_NS,
+  nvme_post_cqes, cq);
+} else {
+cq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_post_cqes, cq);
+}
 }
 
 static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeRequest *req)
@@ -7759,6 +7800,14 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
 if (pci_is_vf(&n->parent_obj) && !sctrl->scs) {
 stl_le_p(&n->bar.csts, 

Re: [PATCH] fw_cfg: Don't set callback_opaque NULL in fw_cfg_modify_bytes_read()

2022-08-26 Thread Laszlo Ersek
On 08/25/22 18:18, Shameer Kolothum wrote:
> Hi
> 
> On arm/virt platform, Chen Xiang reported a Guest crash while
> attempting the below steps,
> 
> 1. Launch the Guest with nvdimm=on
> 2. Hot-add a NVDIMM dev
> 3. Reboot
> 4. Guest boots fine.
> 5. Reboot again.
> 6. Guest boot fails.
> 
> QEMU_EFI reports the below error:
> ProcessCmdAddPointer: invalid pointer value in "etc/acpi/tables"
> OnRootBridgesConnected: InstallAcpiTables: Protocol Error
> 
> Debugging shows that on first reboot(after hot-adding NVDIMM),
> Qemu updates the etc/table-loader len,
> 
> qemu_ram_resize()
>   fw_cfg_modify_file()
>      fw_cfg_modify_bytes_read()
> 
> And in fw_cfg_modify_bytes_read() we set the "callback_opaque" for
> the "key" entry to NULL. Because of this, on the second reboot,
> virt_acpi_build_update() is called with a NULL "build_state" and
> returns without updating the ACPI tables. This seems to be 
> upsetting the firmware.
> 
> To fix this, don't change the callback_opaque in fw_cfg_modify_bytes_read().
> 
> Reported-by: chenxiang 
> Signed-off-by: Shameer Kolothum 
> ---
> I am still not very convinced this is the root cause of the issue.
> Though it looks like setting callback_opaque to NULL while updating
> the file size is wrong, what puzzles me is that on the second reboot
> we don't have any ACPI table size changes and ideally firmware should
> see the updated tables from the first reboot itself.
> 
> Please take a look and let me know.
> 
> Thanks,
> Shameer
> 
> ---
>  hw/nvram/fw_cfg.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
> index d605f3f45a..dfe8404c01 100644
> --- a/hw/nvram/fw_cfg.c
> +++ b/hw/nvram/fw_cfg.c
> @@ -728,7 +728,6 @@ static void *fw_cfg_modify_bytes_read(FWCfgState *s, uint16_t key,
>  ptr = s->entries[arch][key].data;
>  s->entries[arch][key].data = data;
>  s->entries[arch][key].len = len;
> -s->entries[arch][key].callback_opaque = NULL;
>  s->entries[arch][key].allow_write = false;
>  
>  return ptr;
> 

I vaguely recall seeing the same issue report years ago (also in
relation to hot-adding NVDIMM). However, I have no capacity to
participate in the discussion. Making this remark just for clarity.

Laszlo




Re: [PATCH 07/18] i386: Rewrite simple integer vector helpers

2022-08-26 Thread Paolo Bonzini

On 8/26/22 02:01, Richard Henderson wrote:


-#if SHIFT == 0
-SSE_HELPER_W(helper_pmulhrw, FMULHRW)
-#endif
  SSE_HELPER_W(helper_pmulhuw, FMULHUW)
  SSE_HELPER_W(helper_pmulhw, FMULHW)
+#if SHIFT == 0
+void glue(helper_pmulhrw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
+{
+    d->W(0) = FMULHRW(d->W(0), s->W(0));
+    d->W(1) = FMULHRW(d->W(1), s->W(1));
+    d->W(2) = FMULHRW(d->W(2), s->W(2));
+    d->W(3) = FMULHRW(d->W(3), s->W(3));
+}
+#endif


Why?


Because this is actually a 3DNow! instruction, it doesn't get the
3-operand treatment later.  But I can defer the change to the next part
of the series.


Paolo



Re: [PATCH] linux-user: use 'max' instead of 'qemu32' / 'qemu64' by default

2022-08-26 Thread Richard W.M. Jones
On Fri, Aug 26, 2022 at 12:39:00PM +0100, Daniel P. Berrangé wrote:
> The 'qemu64' CPU model implements the least featureful x86_64 CPU that's
> possible. Historically this hasn't been an issue since it was rare for
> OS distros to build with a higher mandatory CPU baseline.
> 
> With RHEL-9, however, the entire distro is built for the x86_64-v2 ABI
> baseline:
> 
>   
> https://developers.redhat.com/blog/2021/01/05/building-red-hat-enterprise-linux-9-for-the-x86-64-v2-microarchitecture-level
> 
> It is likely that other distros may take similar steps in the not too
> distant future. For example, it has been suggested for Fedora on a
> number of occasions.
> 
> This new baseline is not compatible with the qemu64 CPU model though.
> While it is possible to pass a '-cpu xxx' flag to qemu-x86_64, the
> usage of QEMU doesn't always allow for this. For example, the args
> are typically controlled via binfmt rules that the user has no ability
> to change. This impacts users who are trying to use podman on aarch64
> platforms, to run containers with x86_64 content. There's no arg to
> podman that can be used to change the qemu-x86_64 args, and a non-root
> user of podman can not change binfmt rules without elevating privileges:
> 
>   https://github.com/containers/podman/issues/15456#issuecomment-1228210973
> 
> Changing to the 'max' CPU model gives 'qemu-x86_64' maximum
> compatibility with binaries it is likely to encounter in the wild,
> and not likely to have a significant downside for existing usage.
> 
> Most other architectures already use an 'any' CPU model, which is
> often mapped to 'max' (or similar) already, rather than the oldest
> possible CPU model.
> 
> For the sake of consistency the 'i386' architecture is also changed
> from using 'qemu32' to 'max'.
> 
> Signed-off-by: Daniel P. Berrangé 
> ---
>  linux-user/i386/target_elf.h   | 2 +-
>  linux-user/x86_64/target_elf.h | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/linux-user/i386/target_elf.h b/linux-user/i386/target_elf.h
> index 1c6142e7da..238a9aba73 100644
> --- a/linux-user/i386/target_elf.h
> +++ b/linux-user/i386/target_elf.h
> @@ -9,6 +9,6 @@
>  #define I386_TARGET_ELF_H
>  static inline const char *cpu_get_model(uint32_t eflags)
>  {
> -return "qemu32";
> +return "max";
>  }
>  #endif
> diff --git a/linux-user/x86_64/target_elf.h b/linux-user/x86_64/target_elf.h
> index 7b76a90de8..3f628f8d66 100644
> --- a/linux-user/x86_64/target_elf.h
> +++ b/linux-user/x86_64/target_elf.h
> @@ -9,6 +9,6 @@
>  #define X86_64_TARGET_ELF_H
>  static inline const char *cpu_get_model(uint32_t eflags)
>  {
> -return "qemu64";
> +return "max";
>  }
>  #endif

Can we be assured we won't ever hit this TCG bug that currently
affects -cpu max ?

https://gitlab.com/qemu-project/qemu/-/issues/1023

I'm going to guess we will be OK because qemu-user doesn't run a
kernel and therefore wouldn't normally touch %cr3.  Is there any other
situation?  (Of course it would be better all round if that glaring
bug could be fixed.)

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v
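
One practical note on the binfmt limitation discussed in the commit message: qemu-user also honours the QEMU_CPU environment variable, which can override the built-in default model per process even when the binfmt-supplied arguments cannot be changed. A hedged sketch (assumes qemu-x86_64 is installed and registered):

```shell
# qemu-user reads QEMU_CPU from the environment, so no -cpu flag and no
# binfmt rule change is needed to select a different model per process.
if command -v qemu-x86_64 >/dev/null 2>&1; then
    QEMU_CPU=max qemu-x86_64 /bin/true && echo "ran with cpu model max"
else
    echo "qemu-x86_64 not installed; invocation shown for illustration only"
fi
```

This does not help callers that cannot set environment variables either, but it covers the common podman-style case where only argv is fixed.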




RE: [PATCH] fw_cfg: Don't set callback_opaque NULL in fw_cfg_modify_bytes_read()

2022-08-26 Thread Shameerali Kolothum Thodi via


> -Original Message-
> From: Laszlo Ersek [mailto:ler...@redhat.com]
> Sent: 26 August 2022 13:07
> To: Shameerali Kolothum Thodi ;
> qemu-devel@nongnu.org; qemu-...@nongnu.org
> Cc: imamm...@redhat.com; peter.mayd...@linaro.org; Linuxarm
> ; chenxiang (M) ; Ard
> Biesheuvel (kernel.org address) ; Gerd Hoffmann
> 
> Subject: Re: [PATCH] fw_cfg: Don't set callback_opaque NULL in
> fw_cfg_modify_bytes_read()
> 
> +Ard +Gerd, one pointer at the bottom
> 
> On 08/26/22 13:59, Laszlo Ersek wrote:
> > On 08/25/22 18:18, Shameer Kolothum wrote:
> >> Hi
> >>
> >> On arm/virt platform, Chen Xiang reported a Guest crash while
> >> attempting the below steps,
> >>
> >> 1. Launch the Guest with nvdimm=on
> >> 2. Hot-add a NVDIMM dev
> >> 3. Reboot
> >> 4. Guest boots fine.
> >> 5. Reboot again.
> >> 6. Guest boot fails.
> >>
> >> QEMU_EFI reports the below error:
> >> ProcessCmdAddPointer: invalid pointer value in "etc/acpi/tables"
> >> OnRootBridgesConnected: InstallAcpiTables: Protocol Error
> >>
> >> Debugging shows that on first reboot(after hot-adding NVDIMM),
> >> Qemu updates the etc/table-loader len,
> >>
> >> qemu_ram_resize()
> >>   fw_cfg_modify_file()
> >>      fw_cfg_modify_bytes_read()
> >>
> >> And in fw_cfg_modify_bytes_read() we set the "callback_opaque" for
> >> the "key" entry to NULL. Because of this, on the second reboot,
> >> virt_acpi_build_update() is called with a NULL "build_state" and
> >> returns without updating the ACPI tables. This seems to be
> >> upsetting the firmware.
> >>
> >> To fix this, don't change the callback_opaque in
> fw_cfg_modify_bytes_read().
> >>
> >> Reported-by: chenxiang 
> >> Signed-off-by: Shameer Kolothum
> 
> >> ---
> >> I am still not very convinced this is the root cause of the issue.
> >> Though it looks like setting callback_opaque to NULL while updating
> >> the file size is wrong, what puzzles me is that on the second reboot
> >> we don't have any ACPI table size changes and ideally firmware should
> >> see the updated tables from the first reboot itself.
> >>
> >> Please take a look and let me know.
> >>
> >> Thanks,
> >> Shameer
> >>
> >> ---
> >>  hw/nvram/fw_cfg.c | 1 -
> >>  1 file changed, 1 deletion(-)
> >>
> >> diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
> >> index d605f3f45a..dfe8404c01 100644
> >> --- a/hw/nvram/fw_cfg.c
> >> +++ b/hw/nvram/fw_cfg.c
> >> @@ -728,7 +728,6 @@ static void *fw_cfg_modify_bytes_read(FWCfgState *s, uint16_t key,
> >>  ptr = s->entries[arch][key].data;
> >>  s->entries[arch][key].data = data;
> >>  s->entries[arch][key].len = len;
> >> -s->entries[arch][key].callback_opaque = NULL;
> >>  s->entries[arch][key].allow_write = false;
> >>
> >>  return ptr;
> >>
> >
> > I vaguely recall seeing the same issue report years ago (also in
> > relation to hot-adding NVDIMM). However, I have no capacity to
> > participate in the discussion. Making this remark just for clarity.
> 
> The earlier report I've had in mind was from Shameer as well:
> 
> http://mid.mail-archive.com/5FC3163CFD30C246ABAA99954A238FA83F3F
> b...@lhreml524-mbs.china.huawei.com

Right. That was a slightly different issue though. It was basically ACPI table 
size not
getting updated on the first reboot of Guest after we hot-add NVDIMM dev. The 
error
from firmware was different in that case,

ProcessCmdAddChecksum: invalid checksum range in "etc/acpi/tables"
OnRootBridgesConnected: InstallAcpiTables: Protocol Error

And it was fixed with this series here,
https://patchwork.kernel.org/project/qemu-devel/cover/20200403101827.30664-1-shameerali.kolothum.th...@huawei.com/

The current issue only happens on the second reboot of the Guest as described 
in 
the steps above.

Thanks,
Shameer



Re: [PATCH] linux-user: use 'max' instead of 'qemu32' / 'qemu64' by default

2022-08-26 Thread Claudio Fontana
On 8/26/22 13:39, Daniel P. Berrangé wrote:
> The 'qemu64' CPU model implements the least featureful x86_64 CPU that's
> possible. Historically this hasn't been an issue since it was rare for
> OS distros to build with a higher mandatory CPU baseline.
> 
> With RHEL-9, however, the entire distro is built for the x86_64-v2 ABI
> baseline:
> 
>   
> https://developers.redhat.com/blog/2021/01/05/building-red-hat-enterprise-linux-9-for-the-x86-64-v2-microarchitecture-level
> 
> It is likely that other distros may take similar steps in the not too
> distant future. For example, it has been suggested for Fedora on a
> number of occasions.
> 
> This new baseline is not compatible with the qemu64 CPU model though.
> While it is possible to pass a '-cpu xxx' flag to qemu-x86_64, the
> usage of QEMU doesn't always allow for this. For example, the args
> are typically controlled via binfmt rules that the user has no ability
> to change. This impacts users who are trying to use podman on aarch64
> platforms, to run containers with x86_64 content. There's no arg to
> podman that can be used to change the qemu-x86_64 args, and a non-root
> user of podman can not change binfmt rules without elevating privileges:
> 
>   https://github.com/containers/podman/issues/15456#issuecomment-1228210973
> 
> Changing to the 'max' CPU model gives 'qemu-x86_64' maximum
> compatibility with binaries it is likely to encounter in the wild,
> and not likely to have a significant downside for existing usage.

How do we know for sure? Do we have a base of binaries to test across qemu 
versions?

> 
> Most other architectures already use an 'any' CPU model, which is
> often mapped to 'max' (or similar) already, rather than the oldest
> possible CPU model.
> 
> For the sake of consistency the 'i386' architecture is also changed
> from using 'qemu32' to 'max'.
> 
> Signed-off-by: Daniel P. Berrangé 
> ---
>  linux-user/i386/target_elf.h   | 2 +-
>  linux-user/x86_64/target_elf.h | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/linux-user/i386/target_elf.h b/linux-user/i386/target_elf.h
> index 1c6142e7da..238a9aba73 100644
> --- a/linux-user/i386/target_elf.h
> +++ b/linux-user/i386/target_elf.h
> @@ -9,6 +9,6 @@
>  #define I386_TARGET_ELF_H
>  static inline const char *cpu_get_model(uint32_t eflags)
>  {
> -return "qemu32";
> +return "max";
>  }
>  #endif
> diff --git a/linux-user/x86_64/target_elf.h b/linux-user/x86_64/target_elf.h
> index 7b76a90de8..3f628f8d66 100644
> --- a/linux-user/x86_64/target_elf.h
> +++ b/linux-user/x86_64/target_elf.h
> @@ -9,6 +9,6 @@
>  #define X86_64_TARGET_ELF_H
>  static inline const char *cpu_get_model(uint32_t eflags)
>  {
> -return "qemu64";
> +return "max";
>  }
>  #endif

Just seems an abrupt change to me if we don't have a mechanism in place to 
ensure we don't break existing workloads.

C





Re: [PATCH v7 4/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2022-08-26 Thread Sam Li
Damien Le Moal  于2022年8月17日周三 01:50写道:
>
> On 2022/08/15 23:25, Sam Li wrote:
> > By adding zone management operations in BlockDriver, storage controller
> > emulation can use the new block layer APIs including Report Zone and
> > four zone management operations (open, close, finish, reset).
> >
> > Add zoned storage commands of the device: zone_report(zrp), zone_open(zo),
> > zone_close(zc), zone_reset(zrs), zone_finish(zf).
> >
> > For example, to test zone_report, use following command:
> > $ ./build/qemu-io --image-opts driver=zoned_host_device, 
> > filename=/dev/nullb0
> > -c "zrp offset nr_zones"
> >
> > Signed-off-by: Sam Li 
> > Reviewed-by: Hannes Reinecke 
> > ---
> >  block/block-backend.c |  50 +
> >  block/file-posix.c| 341 +-
> >  block/io.c|  41 
> >  include/block/block-common.h  |   1 -
> >  include/block/block-io.h  |  13 ++
> >  include/block/block_int-common.h  |  22 +-
> >  include/block/raw-aio.h   |   6 +-
> >  include/sysemu/block-backend-io.h |   6 +
> >  meson.build   |   1 +
> >  qapi/block-core.json  |   8 +-
> >  qemu-io-cmds.c| 143 +
> >  11 files changed, 625 insertions(+), 7 deletions(-)
> >
> > diff --git a/block/block-backend.c b/block/block-backend.c
> > index d4a5df2ac2..fc639b0cd7 100644
> > --- a/block/block-backend.c
> > +++ b/block/block-backend.c
> > @@ -1775,6 +1775,56 @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
> >  return ret;
> >  }
> >
> > +/*
> > + * Send a zone_report command.
> > + * offset is a byte offset from the start of the device. No alignment
> > + * required for offset.
> > + * nr_zones represents IN maximum and OUT actual.
> > + */
> > +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
> > +unsigned int *nr_zones,
> > +BlockZoneDescriptor *zones)
> > +{
> > +int ret;
> > +IO_CODE();
> > +
> > +blk_inc_in_flight(blk); /* increase before waiting */
> > +blk_wait_while_drained(blk);
> > +if (!blk_is_available(blk)) {
> > +blk_dec_in_flight(blk);
> > +return -ENOMEDIUM;
> > +}
> > +ret = bdrv_co_zone_report(blk_bs(blk), offset, nr_zones, zones);
> > +blk_dec_in_flight(blk);
> > +return ret;
> > +}
> > +
> > +/*
> > + * Send a zone_management command.
> > + * offset is the starting zone specified as a sector offset.
> > + * len is the maximum number of sectors the command should operate on.
>
> You should mention that len should be zone size aligned. Also, for 
> completness,
> add a short description of the op argument too ?
>
> > + */
> > +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
> > +int64_t offset, int64_t len)
> > +{
> > +int ret;
> > +IO_CODE();
> > +
> > +ret = blk_check_byte_request(blk, offset, len);
> > +if (ret < 0) {
> > +return ret;
> > +}
> > +blk_inc_in_flight(blk);
> > +blk_wait_while_drained(blk);
> > +if (!blk_is_available(blk)) {
> > +blk_dec_in_flight(blk);
> > +return -ENOMEDIUM;
> > +}
> > +ret = bdrv_co_zone_mgmt(blk_bs(blk), op, offset, len);
> > +blk_dec_in_flight(blk);
> > +return ret;
> > +}
> > +
> >  void blk_drain(BlockBackend *blk)
> >  {
> >  BlockDriverState *bs = blk_bs(blk);
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index 727389488c..29f67082d9 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -67,6 +67,9 @@
> >  #include 
> >  #include 
> >  #include 
> > +#if defined(CONFIG_BLKZONED)
> > +#include 
> > +#endif
> >  #include 
> >  #include 
> >  #include 
> > @@ -216,6 +219,13 @@ typedef struct RawPosixAIOData {
> >  PreallocMode prealloc;
> >  Error **errp;
> >  } truncate;
> > +struct {
> > +unsigned int *nr_zones;
> > +BlockZoneDescriptor *zones;
> > +} zone_report;
> > +struct {
> > +unsigned long ioctl_op;
>
> May be clarify this field usage by calling it zone_op ?
>
> > +} zone_mgmt;
> >  };
> >  } RawPosixAIOData;
> >
> > @@ -1328,7 +1338,7 @@ static void raw_refresh_limits(BlockDriverState *bs, 
> > Error **errp)
> >  #endif
> >
> >  if (bs->sg || S_ISBLK(st.st_mode)) {
> > -int ret = hdev_get_max_hw_transfer(s->fd, &st);
> > +ret = hdev_get_max_hw_transfer(s->fd, &st);
> >
> >  if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
> >  bs->bl.max_hw_transfer = ret;
> > @@ -1340,11 +1350,32 @@ static void raw_refresh_limits(BlockDriverState 
> > *bs, Error **errp)
> >  }
> >  }
> >
> > -ret = get_sysfs_zoned_model(s->fd, &st, &zoned);
> > +ret = get_sysfs_zoned_model(&st, &zoned);
> >  if (ret < 0) {
> >  zoned = BLK_Z_NONE;
> >  }
> >  bs->bl.zoned = zoned;
> > 

Re: [PATCH] fw_cfg: Don't set callback_opaque NULL in fw_cfg_modify_bytes_read()

2022-08-26 Thread Laszlo Ersek
+Ard +Gerd, one pointer at the bottom

On 08/26/22 13:59, Laszlo Ersek wrote:
> On 08/25/22 18:18, Shameer Kolothum wrote:
>> Hi
>>
>> On arm/virt platform, Chen Xiang reported a Guest crash while
>> attempting the below steps,
>>
>> 1. Launch the Guest with nvdimm=on
>> 2. Hot-add a NVDIMM dev
>> 3. Reboot
>> 4. Guest boots fine.
>> 5. Reboot again.
>> 6. Guest boot fails.
>>
>> QEMU_EFI reports the below error:
>> ProcessCmdAddPointer: invalid pointer value in "etc/acpi/tables"
>> OnRootBridgesConnected: InstallAcpiTables: Protocol Error
>>
>> Debugging shows that on first reboot(after hot-adding NVDIMM),
>> Qemu updates the etc/table-loader len,
>>
>> qemu_ram_resize()
>>   fw_cfg_modify_file()
>>      fw_cfg_modify_bytes_read()
>>
>> And in fw_cfg_modify_bytes_read() we set the "callback_opaque" for
>> the "key" entry to NULL. Because of this, on the second reboot,
>> virt_acpi_build_update() is called with a NULL "build_state" and
>> returns without updating the ACPI tables. This seems to be 
>> upsetting the firmware.
>>
>> To fix this, don't change the callback_opaque in fw_cfg_modify_bytes_read().
>>
>> Reported-by: chenxiang 
>> Signed-off-by: Shameer Kolothum 
>> ---
>> I am still not very convinced this is the root cause of the issue.
>> Though it looks like setting callback_opaque to NULL while updating
>> the file size is wrong, what puzzles me is that on the second reboot
>> we don't have any ACPI table size changes and ideally firmware should
>> see the updated tables from the first reboot itself.
>>
>> Please take a look and let me know.
>>
>> Thanks,
>> Shameer
>>
>> ---
>>  hw/nvram/fw_cfg.c | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
>> index d605f3f45a..dfe8404c01 100644
>> --- a/hw/nvram/fw_cfg.c
>> +++ b/hw/nvram/fw_cfg.c
>> @@ -728,7 +728,6 @@ static void *fw_cfg_modify_bytes_read(FWCfgState *s, 
>> uint16_t key,
>>  ptr = s->entries[arch][key].data;
>>  s->entries[arch][key].data = data;
>>  s->entries[arch][key].len = len;
>> -s->entries[arch][key].callback_opaque = NULL;
>>  s->entries[arch][key].allow_write = false;
>>  
>>  return ptr;
>>
> 
> I vaguely recall seeing the same issue report years ago (also in
> relation to hot-adding NVDIMM). However, I have no capacity to
> participate in the discussion. Making this remark just for clarity.

The earlier report I've had in mind was from Shameer as well:

http://mid.mail-archive.com/5FC3163CFD30C246ABAA99954A238FA83F3FB328@lhreml524-mbs.china.huawei.com




Re: [PATCH] linux-user: use 'max' instead of 'qemu32' / 'qemu64' by defualt

2022-08-26 Thread Daniel P . Berrangé
On Fri, Aug 26, 2022 at 01:05:13PM +0100, Richard W.M. Jones wrote:
> On Fri, Aug 26, 2022 at 12:39:00PM +0100, Daniel P. Berrangé wrote:
> > The 'qemu64' CPU model implements the least featureful x86_64 CPU that's
> > possible. Historically this hasn't been an issue since it was rare for
> > OS distros to build with a higher mandatory CPU baseline.
> > 
> > With RHEL-9, however, the entire distro is built for the x86_64-v2 ABI
> > baseline:
> > 
> >   
> > https://developers.redhat.com/blog/2021/01/05/building-red-hat-enterprise-linux-9-for-the-x86-64-v2-microarchitecture-level
> > 
> > It is likely that other distros may take similar steps in the not too
> > distant future. For example, it has been suggested for Fedora on a
> > number of occassions.
> > 
> > This new baseline is not compatible with the qemu64 CPU model though.
> > While it is possible to pass a '-cpu xxx' flag to qemu-x86_64, the
> > usage of QEMU doesn't always allow for this. For example, the args
> > are typically controlled via binfmt rules that the user has no ability
> > to change. This impacts users who are trying to use podman on aarch64
> > platforms, to run containers with x86_64 content. There's no arg to
> > podman that can be used to change the qemu-x86_64 args, and a non-root
> > user of podman can not change binfmt rules without elevating privileges:
> > 
> >   https://github.com/containers/podman/issues/15456#issuecomment-1228210973
> > 
> > Changing to the 'max' CPU model gives 'qemu-x86_64' maximum
> > compatibility with binaries it is likely to encounter in the wild,
> > and not likely to have a significant downside for existing usage.
> > 
> > Most other architectures already use an 'any' CPU model, which is
> > often mapped to 'max' (or similar) already, rather than the oldest
> > possible CPU model.
> > 
> > For the sake of consistency the 'i386' architecture is also changed
> > from using 'qemu32' to 'max'.
> > 
> > Signed-off-by: Daniel P. Berrangé 
> > ---
> >  linux-user/i386/target_elf.h   | 2 +-
> >  linux-user/x86_64/target_elf.h | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/linux-user/i386/target_elf.h b/linux-user/i386/target_elf.h
> > index 1c6142e7da..238a9aba73 100644
> > --- a/linux-user/i386/target_elf.h
> > +++ b/linux-user/i386/target_elf.h
> > @@ -9,6 +9,6 @@
> >  #define I386_TARGET_ELF_H
> >  static inline const char *cpu_get_model(uint32_t eflags)
> >  {
> > -return "qemu32";
> > +return "max";
> >  }
> >  #endif
> > diff --git a/linux-user/x86_64/target_elf.h b/linux-user/x86_64/target_elf.h
> > index 7b76a90de8..3f628f8d66 100644
> > --- a/linux-user/x86_64/target_elf.h
> > +++ b/linux-user/x86_64/target_elf.h
> > @@ -9,6 +9,6 @@
> >  #define X86_64_TARGET_ELF_H
> >  static inline const char *cpu_get_model(uint32_t eflags)
> >  {
> > -return "qemu64";
> > +return "max";
> >  }
> >  #endif
> 
> Can we be assured we won't ever hit this TCG bug that currently
> affects -cpu max ?
> 
> https://gitlab.com/qemu-project/qemu/-/issues/1023
> 
> I'm going to guess we will be OK because qemu-user doesn't run a
> kernel and therefore wouldn't normally touch %cr3.  Is there any other
> situation?  (Of course it would be better all round if that glaring
> bug could be fixed.)

Yeah, the bug appears to be an interaction with the VM configuring
page tables, and since qemu-user is not doing that my guess is it
won't affect this usage. If we did want to be totally safe, we could
add -la57, since that feature flag is useless for user emulation
anyway.


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH] linux-user: use 'max' instead of 'qemu32' / 'qemu64' by defualt

2022-08-26 Thread Daniel P . Berrangé
On Fri, Aug 26, 2022 at 01:50:40PM +0200, Claudio Fontana wrote:
> On 8/26/22 13:39, Daniel P. Berrangé wrote:
> > The 'qemu64' CPU model implements the least featureful x86_64 CPU that's
> > possible. Historically this hasn't been an issue since it was rare for
> > OS distros to build with a higher mandatory CPU baseline.
> > 
> > With RHEL-9, however, the entire distro is built for the x86_64-v2 ABI
> > baseline:
> > 
> >   
> > https://developers.redhat.com/blog/2021/01/05/building-red-hat-enterprise-linux-9-for-the-x86-64-v2-microarchitecture-level
> > 
> > It is likely that other distros may take similar steps in the not too
> > distant future. For example, it has been suggested for Fedora on a
> > number of occassions.
> > 
> > This new baseline is not compatible with the qemu64 CPU model though.
> > While it is possible to pass a '-cpu xxx' flag to qemu-x86_64, the
> > usage of QEMU doesn't always allow for this. For example, the args
> > are typically controlled via binfmt rules that the user has no ability
> > to change. This impacts users who are trying to use podman on aarch64
> > platforms, to run containers with x86_64 content. There's no arg to
> > podman that can be used to change the qemu-x86_64 args, and a non-root
> > user of podman can not change binfmt rules without elevating privileges:
> > 
> >   https://github.com/containers/podman/issues/15456#issuecomment-1228210973
> > 
> > Changing to the 'max' CPU model gives 'qemu-x86_64' maximum
> > compatibility with binaries it is likely to encounter in the wild,
> > and not likely to have a significant downside for existing usage.
> 
> How do we know for sure? Do we have a base of binaries to test across
> qemu versions?

There are never any perfect guarantees, but this assertion is based on
the view that the x86 instruction set changes are considered backwards
compatible. Existing applications from years (even decades) ago can
generally run on arbitrarily newer CPUs with orders of magnitude more
features, as apps have to intentionally opt-in to use of new CPU
instructions.

So the risk here would be an existing applications, which is able to
dynamically opt-in to optimized code paths if certain CPUID features
exist, and in turn tickles a bug in QEMU's implementation of said
feature that it would not previously hit. That's certainly possible,
but I don't think it would be common, as we would already have seen
that in system emulators.  The la57 feature issue Richard mentions
is one example, but that doesn't impact user emulators I believe.

Weigh that risk against the fact that we have users frequently
hitting problems with the existing qemu64  default because it is
too old. Users have already been making this change in the context
of Docker for this reason. eg

https://github.com/tonistiigi/binfmt/blob/master/patches/cpu-max/0001-default-to-cpu-max-on-x86-and-arm.patch

> 
> > 
> > Most other architectures already use an 'any' CPU model, which is
> > often mapped to 'max' (or similar) already, rather than the oldest
> > possible CPU model.
> > 
> > For the sake of consistency the 'i386' architecture is also changed
> > from using 'qemu32' to 'max'.
> > 
> > Signed-off-by: Daniel P. Berrangé 
> > ---
> >  linux-user/i386/target_elf.h   | 2 +-
> >  linux-user/x86_64/target_elf.h | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/linux-user/i386/target_elf.h b/linux-user/i386/target_elf.h
> > index 1c6142e7da..238a9aba73 100644
> > --- a/linux-user/i386/target_elf.h
> > +++ b/linux-user/i386/target_elf.h
> > @@ -9,6 +9,6 @@
> >  #define I386_TARGET_ELF_H
> >  static inline const char *cpu_get_model(uint32_t eflags)
> >  {
> > -return "qemu32";
> > +return "max";
> >  }
> >  #endif
> > diff --git a/linux-user/x86_64/target_elf.h b/linux-user/x86_64/target_elf.h
> > index 7b76a90de8..3f628f8d66 100644
> > --- a/linux-user/x86_64/target_elf.h
> > +++ b/linux-user/x86_64/target_elf.h
> > @@ -9,6 +9,6 @@
> >  #define X86_64_TARGET_ELF_H
> >  static inline const char *cpu_get_model(uint32_t eflags)
> >  {
> > -return "qemu64";
> > +return "max";
> >  }
> >  #endif
> 
> Just seems an abrupt change to me if we don't have a mechanism in
> place to ensure we don't break existing workloads.

There are no absolutes here. We have risk of unknown problem possibly
breaking some existing apps, vs a known problem currently breaking
users of CentOS 9 / RHEL 9, which podman and docker need to workaround.

The question is which benefits more users, and which is the better
long term option. I think using modern CPU is better long term, and
if we find bugs in QEMU's TCG impl we just need to fix them regardless.

If we find bugs in applications, however, then the apps need to fix
them.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www

Re: [PATCH 09/51] fsdev/virtfs-proxy-helper: Use g_mkdir_with_parents()

2022-08-26 Thread Bin Meng
On Fri, Aug 26, 2022 at 7:16 PM Christian Schoenebeck
 wrote:
>
> On Freitag, 26. August 2022 12:30:20 CEST Bin Meng wrote:
> > On Fri, Aug 26, 2022 at 6:09 PM Christian Schoenebeck
> >
> >  wrote:
> > > On Mittwoch, 24. August 2022 11:39:47 CEST Bin Meng wrote:
> > > > From: Bin Meng 
> > > >
> > > > Use the same g_mkdir_with_parents() call to create a directory on
> > > > all platforms.
> > >
> > > The same would be g_mkdir(), not g_mkdir_with_parents(), so please use
> > > that
> > > instead.
> >
> > No, g_mkdir() is a deprecated API.
>
> Where did you got that from? AFAICS g_mkdir() does not seem to be deprecated:
> https://gitlab.gnome.org/GNOME/glib/-/blob/main/glib/gstdio.c#L1201
> https://gitlab.gnome.org/GNOME/glib/-/blob/main/glib/gstdio.h#L131
>
> > Search result (https://docs.gtk.org/glib/?q=mkdir) shows only
> > g_mkdir_with_parents().
>
> Yeah, but that does not say that it was deprecated.
>

Umm, correct. However I don't remember where I got the statement that
it was deprecated.

Maybe I was misled by the doc. Do you know why the doc does not
document g_mkdir()?

Regards,
Bin



Re: [PATCH] tests/x86: Add 'q35' machine type to hotplug tests

2022-08-26 Thread Denis V. Lunev

On 26.08.2022 13:41, Michael Labiuk wrote:

Add pci bridge setting to run hotplug tests on q35 machine type.
Hotplug tests was bounded to 'pc' machine type by commit 7b172333f1b

Signed-off-by: Michael Labiuk 
---
  tests/qtest/device-plug-test.c |  26 ++
  tests/qtest/drive_del-test.c   | 111 +
  tests/qtest/hd-geo-test.c  | 148 +
  tests/qtest/ivshmem-test.c |  30 +++
  4 files changed, 315 insertions(+)

diff --git a/tests/qtest/device-plug-test.c b/tests/qtest/device-plug-test.c
index 2e3137843e..2f07b37ba1 100644
--- a/tests/qtest/device-plug-test.c
+++ b/tests/qtest/device-plug-test.c
@@ -165,6 +165,26 @@ static void test_spapr_phb_unplug_request(void)
  qtest_quit(qtest);
  }
  
+static void test_q35_pci_unplug_request(void)

that seems a little bit wrong. this is a PCIe test, so
the naming should reflect that.

What about test_pcie_unplug_request()?


+{
+
+QTestState *qtest = qtest_initf("-machine q35 "
+"-device pcie-root-port,id=p1 "
+"-device pcie-pci-bridge,bus=p1,id=b1 "
+"-device virtio-mouse-pci,bus=b1,id=dev0");
+
+/*
+ * Request device removal. As the guest is not running, the request won't
+ * be processed. However during system reset, the removal will be
+ * handled, removing the device.
+ */
+device_del(qtest, "dev0");
+system_reset(qtest);
+wait_device_deleted_event(qtest, "dev0");
+
+qtest_quit(qtest);
+}
+

I think this is better placed near the 'pci' test cases


  int main(int argc, char **argv)
  {
  const char *arch = qtest_get_arch();
@@ -195,5 +215,11 @@ int main(int argc, char **argv)
 test_spapr_phb_unplug_request);
  }
  
+if (!strcmp(arch, "x86_64")) {

+qtest_add_func("/device-plug/q35-pci-unplug-request",
+   test_q35_pci_unplug_request);

a JSON version of the test would also be good to add, in my view


+
+}
+
  return g_test_run();
  }
diff --git a/tests/qtest/drive_del-test.c b/tests/qtest/drive_del-test.c
index 5e6d58b4dd..3a2ddecf22 100644
--- a/tests/qtest/drive_del-test.c
+++ b/tests/qtest/drive_del-test.c
@@ -258,6 +258,27 @@ static void test_cli_device_del(void)
  qtest_quit(qts);
  }
  


this patch seems to trash the internal structure of the test.
originally it was unified for all archs through
  qvirtio_get_dev_type(void)
and this change makes the test non-uniform.
This should be rethought


+static void test_cli_device_del_q35(void)
+{
+QTestState *qts;
+
+/*
+ * -drive/-device and device_del.  Start with a drive used by a
+ * device that unplugs after reset.
+ */
+qts = qtest_initf("-drive if=none,id=drive0,file=null-co://,"
+  "file.read-zeroes=on,format=raw "
+  "-machine q35 -device pcie-root-port,id=p1 "
+  "-device pcie-pci-bridge,bus=p1,id=b1 "
+  "-device virtio-blk-%s,drive=drive0,bus=b1,id=dev0",
+  qvirtio_get_dev_type());
+
+device_del(qts, true);
+g_assert(!has_drive(qts));
+
+qtest_quit(qts);
+}
+
  static void test_empty_device_del(void)
  {
  QTestState *qts;
@@ -294,6 +315,45 @@ static void test_device_add_and_del(void)
  qtest_quit(qts);
  }
  
+static void device_add_q35(QTestState *qts)

+{
+QDict *response;
+char driver[32];
+snprintf(driver, sizeof(driver), "virtio-blk-%s",
+ qvirtio_get_dev_type());
+
+response = qtest_qmp(qts, "{'execute': 'device_add',"
+  " 'arguments': {"
+  "   'driver': %s,"
+  "   'drive': 'drive0',"
+  "   'id': 'dev0',"
+  "   'bus': 'b1'"
+  "}}", driver);
+g_assert(response);
+g_assert(qdict_haskey(response, "return"));
+qobject_unref(response);
+}
+
+static void test_device_add_and_del_q35(void)
+{
+QTestState *qts;
+
+/*
+ * -drive/device_add and device_del.  Start with a drive used by a
+ * device that unplugs after reset.
+ */
+qts = qtest_initf("-machine q35 -device pcie-root-port,id=p1 "
+ "-device pcie-pci-bridge,bus=p1,id=b1 "
+ "-drive if=none,id=drive0,file=null-co://,"
+ "file.read-zeroes=on,format=raw");
+
+device_add_q35(qts);
+device_del(qts, true);
+g_assert(!has_drive(qts));
+
+qtest_quit(qts);
+}
+
  static void test_drive_add_device_add_and_del(void)
  {
  QTestState *qts;
@@ -318,6 +378,25 @@ static void test_drive_add_device_add_and_del(void)
  qtest_quit(qts);
  }
  
+static void test_drive_add_device_add_and_del_q35(void)

+{
+QTestState *qts;
+
+qts = qtest_init("-machine q35 -device pcie-root-port,id=p1 "
+ "-

Re: [PATCH v6 01/10] parallels: Out of image offset in BAT leads to image inflation

2022-08-26 Thread Denis V. Lunev

On 25.08.2022 16:31, Alexander Ivanov wrote:

data_end field in BDRVParallelsState is set to the biggest offset present
in BAT. If this offset is outside of the image, any further write will create
the cluster at this offset and/or the image will be truncated to this
offset on close. This is definitely not correct.
Raise an error in parallels_open() if data_end points outside the image and
it is not a check (let the check to repaire the image).
Set data_end to the end of the cluster with the last correct offset.

Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 17 +
  1 file changed, 17 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index a229c06f25..93bc2750ef 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -732,6 +732,7 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
  BDRVParallelsState *s = bs->opaque;
  ParallelsHeader ph;
  int ret, size, i;
+int64_t file_size;
  QemuOpts *opts = NULL;
  Error *local_err = NULL;
  char *buf;
@@ -742,6 +743,12 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
  return -EINVAL;
  }
  
+file_size = bdrv_getlength(bs->file->bs);

+if (file_size < 0) {
+return -EINVAL;
+}
+file_size >>= BDRV_SECTOR_BITS;
+
  ret = bdrv_pread(bs->file, 0, sizeof(ph), &ph, 0);
  if (ret < 0) {
  goto fail;
@@ -806,6 +813,16 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
  
  for (i = 0; i < s->bat_size; i++) {

  int64_t off = bat2sect(s, i);
+if (off >= file_size) {
+if (flags & BDRV_O_CHECK) {
+continue;
+}
+error_setg(errp, "parallels: Offset %" PRIi64 " in BAT[%d] entry "
+   "is larger than file size (%" PRIi64 ")",
+   off, i, file_size);
+ret = -EINVAL;
+goto fail;
+}
  if (off >= s->data_end) {
  s->data_end = off + s->tracks;
  }

with line length fixes in the commit message (more than 74 chars)

Reviewed-by: Denis V. Lunev 



Re: [PATCH v6 02/10] parallels: Fix high_off calculation in parallels_co_check()

2022-08-26 Thread Denis V. Lunev

On 25.08.2022 16:31, Alexander Ivanov wrote:

Don't let high_off be more than the file size even if we don't fix the image.

Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 93bc2750ef..7e8cdbbc3a 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -460,12 +460,12 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
  fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i);
  res->corruptions++;
  if (fix & BDRV_FIX_ERRORS) {
-prev_off = 0;
  s->bat_bitmap[i] = 0;
  res->corruptions_fixed++;
  flush_bat = true;
-continue;
  }
+prev_off = 0;
+continue;
  }
  
  res->bfi.allocated_clusters++;

with line length fixes in the commit message (more than 74 chars)

Reviewed-by: Denis V. Lunev 



Re: [PATCH v6 03/10] parallels: Fix data_end after out-of-image check

2022-08-26 Thread Denis V. Lunev

On 25.08.2022 16:31, Alexander Ivanov wrote:

Set data_end to the end of the last cluster inside the image.
In such a way we can be shure that corrupted offsets in the BAT

s/shure/sure/

can't affect on the image size.

Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index 7e8cdbbc3a..c1ff8bb5f0 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -514,6 +514,8 @@ static int coroutine_fn parallels_co_check(BlockDriverState 
*bs,
  }
  }
  
+s->data_end = res->image_end_offset >> BDRV_SECTOR_BITS;

+
  out:
  qemu_co_mutex_unlock(&s->lock);
  return ret;

Reviewed-by: Denis V. Lunev 



Re: [PATCH] linux-user: use 'max' instead of 'qemu32' / 'qemu64' by defualt

2022-08-26 Thread Claudio Fontana
On 8/26/22 14:29, Daniel P. Berrangé wrote:
> On Fri, Aug 26, 2022 at 01:50:40PM +0200, Claudio Fontana wrote:
>> On 8/26/22 13:39, Daniel P. Berrangé wrote:
>>> The 'qemu64' CPU model implements the least featureful x86_64 CPU that's
>>> possible. Historically this hasn't been an issue since it was rare for
>>> OS distros to build with a higher mandatory CPU baseline.
>>>
>>> With RHEL-9, however, the entire distro is built for the x86_64-v2 ABI
>>> baseline:
>>>
>>>   
>>> https://developers.redhat.com/blog/2021/01/05/building-red-hat-enterprise-linux-9-for-the-x86-64-v2-microarchitecture-level
>>>
>>> It is likely that other distros may take similar steps in the not too
>>> distant future. For example, it has been suggested for Fedora on a
>>> number of occassions.
>>>
>>> This new baseline is not compatible with the qemu64 CPU model though.
>>> While it is possible to pass a '-cpu xxx' flag to qemu-x86_64, the
>>> usage of QEMU doesn't always allow for this. For example, the args
>>> are typically controlled via binfmt rules that the user has no ability
>>> to change. This impacts users who are trying to use podman on aarch64
>>> platforms, to run containers with x86_64 content. There's no arg to
>>> podman that can be used to change the qemu-x86_64 args, and a non-root
>>> user of podman can not change binfmt rules without elevating privileges:
>>>
>>>   https://github.com/containers/podman/issues/15456#issuecomment-1228210973
>>>
>>> Changing to the 'max' CPU model gives 'qemu-x86_64' maximum
>>> compatibility with binaries it is likely to encounter in the wild,
>>> and not likely to have a significant downside for existing usage.
>>
>> How do we know for sure? Do we have a base of binaries to test across
>> qemu versions?
> 
> There are never any perfect guarantees, but this assertion is based on
> the view that the x86 instruction set changes are considered backwards
> compatible. Existing applications from years (even decades) ago can
> generally run on arbitrarily newer CPUs with orders of magnitude more
> features, as apps have to intentionally opt-in to use of new CPU
> instructions.
> 
> So the risk here would be an existing applications, which is able to
> dynamically opt-in to optimized code paths if certain CPUID features
> exist, and in turn tickles a bug in QEMU's implementation of said
> feature that it would not previously hit. That's certainly possible,
> but I don't think it would be common, as we would already have seen
> that in system emulators.  The la57 feature issue Richard mentions
> is one example, but that doesn't impact user emulators I believe.
> 
> Weigh that risk against the fact that we have users frequently
> hitting problems with the existing qemu64  default because it is
> too old. Users have already been making this change in the context
> of Docker for this reason. eg
> 
> https://github.com/tonistiigi/binfmt/blob/master/patches/cpu-max/0001-default-to-cpu-max-on-x86-and-arm.patch
> 
>>
>>>
>>> Most other architectures already use an 'any' CPU model, which is
>>> often mapped to 'max' (or similar) already, rather than the oldest
>>> possible CPU model.
>>>
>>> For the sake of consistency the 'i386' architecture is also changed
>>> from using 'qemu32' to 'max'.
>>>
>>> Signed-off-by: Daniel P. Berrangé 
>>> ---
>>>  linux-user/i386/target_elf.h   | 2 +-
>>>  linux-user/x86_64/target_elf.h | 2 +-
>>>  2 files changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/linux-user/i386/target_elf.h b/linux-user/i386/target_elf.h
>>> index 1c6142e7da..238a9aba73 100644
>>> --- a/linux-user/i386/target_elf.h
>>> +++ b/linux-user/i386/target_elf.h
>>> @@ -9,6 +9,6 @@
>>>  #define I386_TARGET_ELF_H
>>>  static inline const char *cpu_get_model(uint32_t eflags)
>>>  {
>>> -return "qemu32";
>>> +return "max";
>>>  }
>>>  #endif
>>> diff --git a/linux-user/x86_64/target_elf.h b/linux-user/x86_64/target_elf.h
>>> index 7b76a90de8..3f628f8d66 100644
>>> --- a/linux-user/x86_64/target_elf.h
>>> +++ b/linux-user/x86_64/target_elf.h
>>> @@ -9,6 +9,6 @@
>>>  #define X86_64_TARGET_ELF_H
>>>  static inline const char *cpu_get_model(uint32_t eflags)
>>>  {
>>> -return "qemu64";
>>> +return "max";
>>>  }
>>>  #endif
>>
>> Just seems an abrupt change to me if we don't have a mechanism in
>> place to ensure we don't break existing workloads.
> 
> There are no absolutes here. We have risk of unknown problem possibly
> breaking some existing apps, vs a known problem currently breaking
> users of CentOS 9 / RHEL 9, which podman and docker need to workaround.

I wonder how bad the workarounds are, when they allow both old and new users to 
enjoy their running workloads.

> 
> The question is which benefits more users, and which is the better
> long term option. I think using modern CPU is better long term, and
> if we find bugs in QEMU's TCG impl we just need to fix them regardless.
> 
> If we find bugs in applications, however, then the apps need to fix
> them.

Hmm... I wo

Re: [PATCH v6 07/10] parallels: Move check of cluster outside image to a separate function

2022-08-26 Thread Denis V. Lunev

On 25.08.2022 16:31, Alexander Ivanov wrote:

We will add more and more checks so we need a better code structure
in parallels_co_check. Let each check performs in a separate loop
in a separate helper.

Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 59 ++-
  1 file changed, 43 insertions(+), 16 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index eea318f809..f50cd232aa 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -438,13 +438,50 @@ static void parallels_check_unclean(BlockDriverState *bs,
  }
  }
  
+static int parallels_check_outside_image(BlockDriverState *bs,

+ BdrvCheckResult *res,
+ BdrvCheckMode fix)
+{
+BDRVParallelsState *s = bs->opaque;
+uint32_t i;
+int64_t off, high_off, size;
+
+size = bdrv_getlength(bs->file->bs);
+if (size < 0) {
+res->check_errors++;
+return size;
+}
+
+high_off = 0;
+for (i = 0; i < s->bat_size; i++) {
+off = bat2sect(s, i) << BDRV_SECTOR_BITS;
+if (off > size) {


we need one more patch here. Correct check would be

if (off >= size) {
  bla-bla()
}


+fprintf(stderr, "%s cluster %u is outside image\n",
+fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i);
+res->corruptions++;
+if (fix & BDRV_FIX_ERRORS) {
+parallels_set_bat_entry(s, i, 0);
+res->corruptions_fixed++;
+}
+continue;
+}
+if (high_off < off) {
+high_off = off;
+}
+}
+
+s->data_end = (high_off + s->cluster_size) >> BDRV_SECTOR_BITS;
+
+return 0;
+}
+
  static int coroutine_fn parallels_co_check(BlockDriverState *bs,
 BdrvCheckResult *res,
 BdrvCheckMode fix)
  {
  BDRVParallelsState *s = bs->opaque;
  int64_t size, prev_off, high_off;
-int ret = 0;
+int ret;
  uint32_t i;
  
  size = bdrv_getlength(bs->file->bs);

@@ -457,6 +494,11 @@ static int coroutine_fn parallels_co_check(BlockDriverState *bs,
  
  parallels_check_unclean(bs, res, fix);
  
+ret = parallels_check_outside_image(bs, res, fix);

+if (ret < 0) {
+goto out;
+}
+
  res->bfi.total_clusters = s->bat_size;
  res->bfi.compressed_clusters = 0; /* compression is not supported */
  
@@ -469,19 +511,6 @@ static int coroutine_fn parallels_co_check(BlockDriverState *bs,

  continue;
  }
  
-/* cluster outside the image */

-if (off > size) {
-fprintf(stderr, "%s cluster %u is outside image\n",
-fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i);
-res->corruptions++;
-if (fix & BDRV_FIX_ERRORS) {
-parallels_set_bat_entry(s, i, 0);
-res->corruptions_fixed++;
-}
-prev_off = 0;
-continue;
-}
-
  res->bfi.allocated_clusters++;
  if (off > high_off) {
  high_off = off;
@@ -519,8 +548,6 @@ static int coroutine_fn parallels_co_check(BlockDriverState *bs,
  }
  }
  
-s->data_end = res->image_end_offset >> BDRV_SECTOR_BITS;

-
  out:
  qemu_co_mutex_unlock(&s->lock);
  

Reviewed-by: Denis V. Lunev 
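
The boundary case Denis points out can be sketched as a tiny predicate (the name is hypothetical, not the parallels driver's own): a cluster whose first byte sits exactly at end-of-file has none of its bytes inside the image, so it is just as "outside" as one starting past EOF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical illustration of the boundary case: a cluster starting
 * exactly at EOF (off == size) holds no data inside the image, so it
 * must be flagged as outside too -- 'off > size' misses this case. */
static bool cluster_outside_image(int64_t off, int64_t size)
{
    return off >= size;
}
```

With `>`, a BAT entry pointing exactly at EOF would be accepted even though none of its cluster lies inside the file.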



Re: [PATCH 40/51] chardev/char-file: Add FILE_SHARE_WRITE when openning the file for win32

2022-08-26 Thread Bin Meng
On Thu, Aug 25, 2022 at 3:59 PM Marc-André Lureau
 wrote:
>
> Hi
>
> On Wed, Aug 24, 2022 at 1:43 PM Bin Meng  wrote:
> >
> > From: Xuzhou Cheng 
> >
> > The combination of GENERIC_WRITE and FILE_SHARE_READ options does
> > not allow the same file to be opened again by CreateFile() from
> > another QEMU process with the same options when the previous QEMU
> > process still holds the file handle openned.
>
> opened
>
> >
> > As per [1] we should add FILE_SHARE_WRITE to the share mode to allow
> > such use case. This change makes the behavior be consisten with the
> > POSIX platforms.
> >
>
> consistent
>
> > [1] 
> > https://docs.microsoft.com/en-us/windows/win32/fileio/creating-and-opening-files
> >
> > Signed-off-by: Xuzhou Cheng 
> > Signed-off-by: Bin Meng 
> > ---
>
>
> What's the benefit to allow multiple processes write access to the
> same file? It seems it could easily lead to corruption or unexpected
> results.

This was triggered by running the test_multifd_tcp_cancel() case on
Windows, which cancels the migration and launches another QEMU
process to migrate with the same file opened for write. Chances are
that the previous QEMU process has not quit before the new QEMU
process runs, so the previous process still holds the file handle
without shared write permission, and the new QEMU process fails to
open the file.

> To me, it's the other way around, the POSIX implementation should
> learn to lock the file opened for write..
>

Regards,
Bin
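
Marc-André's POSIX-side suggestion can be sketched with advisory locking (an assumption for illustration only, plain `flock()`, not anything QEMU actually implements): the second writer fails up front, which is roughly the effect of omitting FILE_SHARE_WRITE on win32.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Minimal sketch (assumption: advisory flock(), not QEMU's real code)
 * of making a POSIX writer fail when another handle already holds the
 * file. Each open() creates its own open file description, so the
 * second LOCK_EX attempt fails while the first fd is still open. */
static int open_exclusive(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) {
        return -1;
    }
    if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
        close(fd);          /* another writer holds the lock */
        return -1;
    }
    return fd;
}
```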



[PATCH v11 04/21] aio-wait.h: introduce AIO_WAIT_WHILE_UNLOCKED

2022-08-26 Thread Emanuele Giuseppe Esposito
Same as the AIO_WAIT_WHILE macro, but if we are in the main loop
it does not release and then re-acquire ctx_'s AioContext.

Once all AioContext locks go away, this macro will replace
AIO_WAIT_WHILE.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/aio-wait.h | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
index 54840f8622..dd9a7f6461 100644
--- a/include/block/aio-wait.h
+++ b/include/block/aio-wait.h
@@ -59,10 +59,13 @@ typedef struct {
 extern AioWait global_aio_wait;
 
 /**
- * AIO_WAIT_WHILE:
+ * AIO_WAIT_WHILE_INTERNAL:
  * @ctx: the aio context, or NULL if multiple aio contexts (for which the
  *   caller does not hold a lock) are involved in the polling condition.
  * @cond: wait while this conditional expression is true
+ * @unlock: whether to unlock and then lock again @ctx. This applies
+ * only when waiting for another AioContext from the main loop.
+ * Otherwise it's ignored.
  *
  * Wait while a condition is true.  Use this to implement synchronous
  * operations that require event loop activity.
@@ -75,7 +78,7 @@ extern AioWait global_aio_wait;
  * wait on conditions between two IOThreads since that could lead to deadlock,
  * go via the main loop instead.
  */
-#define AIO_WAIT_WHILE(ctx, cond) ({   \
+#define AIO_WAIT_WHILE_INTERNAL(ctx, cond, unlock) ({  \
 bool waited_ = false;  \
 AioWait *wait_ = &global_aio_wait; \
 AioContext *ctx_ = (ctx);  \
@@ -92,11 +95,11 @@ extern AioWait global_aio_wait;
 assert(qemu_get_current_aio_context() ==   \
qemu_get_aio_context());\
 while ((cond)) {   \
-if (ctx_) {\
+if (unlock && ctx_) {  \
 aio_context_release(ctx_); \
 }  \
 aio_poll(qemu_get_aio_context(), true);\
-if (ctx_) {\
+if (unlock && ctx_) {  \
 aio_context_acquire(ctx_); \
 }  \
 waited_ = true;\
@@ -105,6 +108,12 @@ extern AioWait global_aio_wait;
 qatomic_dec(&wait_->num_waiters);  \
 waited_; })
 
+#define AIO_WAIT_WHILE(ctx, cond)  \
+AIO_WAIT_WHILE_INTERNAL(ctx, cond, true)
+
+#define AIO_WAIT_WHILE_UNLOCKED(ctx, cond) \
+AIO_WAIT_WHILE_INTERNAL(ctx, cond, false)
+
 /**
  * aio_wait_kick:
  * Wake up the main thread if it is waiting on AIO_WAIT_WHILE().  During
-- 
2.31.1
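
The difference between the two flavours boils down to whether the context lock is dropped around each poll iteration. A dependency-free toy model (the lock type and the elided poll call are stand-ins, not QEMU APIs):

```c
#include <assert.h>

/* Toy stand-in for the AioContext lock, so the model needs no threads. */
typedef int Lock;
static void lock_acquire(Lock *l) { assert(*l == 0); *l = 1; }
static void lock_release(Lock *l) { assert(*l == 1); *l = 0; }

/* Shape of AIO_WAIT_WHILE_INTERNAL: poll while @cond holds, optionally
 * releasing the lock around each iteration (the 'unlock' parameter). */
#define WAIT_WHILE_INTERNAL(l, cond, unlock) do {              \
        while (cond) {                                         \
            if (unlock) {                                      \
                lock_release(l);                               \
            }                                                  \
            /* aio_poll(qemu_get_aio_context(), true) here */  \
            if (unlock) {                                      \
                lock_acquire(l);                               \
            }                                                  \
        }                                                      \
    } while (0)

#define WAIT_WHILE(l, cond)          WAIT_WHILE_INTERNAL(l, cond, 1)
#define WAIT_WHILE_UNLOCKED(l, cond) WAIT_WHILE_INTERNAL(l, cond, 0)
```

WAIT_WHILE must enter with the lock held and leaves it held; WAIT_WHILE_UNLOCKED never touches it.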




[PATCH v11 08/21] jobs: add job lock in find_* functions

2022-08-26 Thread Emanuele Giuseppe Esposito
Both blockdev.c and job-qmp.c have TOC/TOU conditions, because
they first search for the job and then perform an action on it.
Therefore, we need to do the search + action under the same
job mutex critical section.

Note: at this stage, job_{lock/unlock} and job lock guard macros
are *nop*.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Kevin Wolf 
---
 blockdev.c | 67 +-
 job-qmp.c  | 57 --
 2 files changed, 86 insertions(+), 38 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 9230888e34..71f793c4ab 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3302,9 +3302,13 @@ out:
 aio_context_release(aio_context);
 }
 
-/* Get a block job using its ID and acquire its AioContext */
-static BlockJob *find_block_job(const char *id, AioContext **aio_context,
-Error **errp)
+/*
+ * Get a block job using its ID and acquire its AioContext.
+ * Called with job_mutex held.
+ */
+static BlockJob *find_block_job_locked(const char *id,
+   AioContext **aio_context,
+   Error **errp)
 {
 BlockJob *job;
 
@@ -3312,7 +3316,7 @@ static BlockJob *find_block_job(const char *id, AioContext **aio_context,
 
 *aio_context = NULL;
 
-job = block_job_get(id);
+job = block_job_get_locked(id);
 
 if (!job) {
 error_set(errp, ERROR_CLASS_DEVICE_NOT_ACTIVE,
@@ -3329,13 +,16 @@ static BlockJob *find_block_job(const char *id, AioContext **aio_context,
 void qmp_block_job_set_speed(const char *device, int64_t speed, Error **errp)
 {
 AioContext *aio_context;
-BlockJob *job = find_block_job(device, &aio_context, errp);
+BlockJob *job;
+
+JOB_LOCK_GUARD();
+job = find_block_job_locked(device, &aio_context, errp);
 
 if (!job) {
 return;
 }
 
-block_job_set_speed(job, speed, errp);
+block_job_set_speed_locked(job, speed, errp);
 aio_context_release(aio_context);
 }
 
@@ -3343,7 +3350,10 @@ void qmp_block_job_cancel(const char *device,
   bool has_force, bool force, Error **errp)
 {
 AioContext *aio_context;
-BlockJob *job = find_block_job(device, &aio_context, errp);
+BlockJob *job;
+
+JOB_LOCK_GUARD();
+job = find_block_job_locked(device, &aio_context, errp);
 
 if (!job) {
 return;
@@ -3353,14 +3363,14 @@ void qmp_block_job_cancel(const char *device,
 force = false;
 }
 
-if (job_user_paused(&job->job) && !force) {
+if (job_user_paused_locked(&job->job) && !force) {
 error_setg(errp, "The block job for device '%s' is currently paused",
device);
 goto out;
 }
 
 trace_qmp_block_job_cancel(job);
-job_user_cancel(&job->job, force, errp);
+job_user_cancel_locked(&job->job, force, errp);
 out:
 aio_context_release(aio_context);
 }
@@ -3368,57 +3378,69 @@ out:
 void qmp_block_job_pause(const char *device, Error **errp)
 {
 AioContext *aio_context;
-BlockJob *job = find_block_job(device, &aio_context, errp);
+BlockJob *job;
+
+JOB_LOCK_GUARD();
+job = find_block_job_locked(device, &aio_context, errp);
 
 if (!job) {
 return;
 }
 
 trace_qmp_block_job_pause(job);
-job_user_pause(&job->job, errp);
+job_user_pause_locked(&job->job, errp);
 aio_context_release(aio_context);
 }
 
 void qmp_block_job_resume(const char *device, Error **errp)
 {
 AioContext *aio_context;
-BlockJob *job = find_block_job(device, &aio_context, errp);
+BlockJob *job;
+
+JOB_LOCK_GUARD();
+job = find_block_job_locked(device, &aio_context, errp);
 
 if (!job) {
 return;
 }
 
 trace_qmp_block_job_resume(job);
-job_user_resume(&job->job, errp);
+job_user_resume_locked(&job->job, errp);
 aio_context_release(aio_context);
 }
 
 void qmp_block_job_complete(const char *device, Error **errp)
 {
 AioContext *aio_context;
-BlockJob *job = find_block_job(device, &aio_context, errp);
+BlockJob *job;
+
+JOB_LOCK_GUARD();
+job = find_block_job_locked(device, &aio_context, errp);
 
 if (!job) {
 return;
 }
 
 trace_qmp_block_job_complete(job);
-job_complete(&job->job, errp);
+job_complete_locked(&job->job, errp);
 aio_context_release(aio_context);
 }
 
 void qmp_block_job_finalize(const char *id, Error **errp)
 {
 AioContext *aio_context;
-BlockJob *job = find_block_job(id, &aio_context, errp);
+BlockJob *job;
+
+JOB_LOCK_GUARD();
+job = find_block_job_locked(id, &aio_context, errp);
 
 if (!job) {
 return;
 }
 
 trace_qmp_block_job_finalize(job);
-job_ref(&job->job);
-job_finalize(&job->job, errp);
+job_ref_locked(&job->job);
+job_finalize_locked(&job->job, errp);
 
 /*
  * Job's con
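
The find-then-act race described in the commit message can be sketched with a toy job table (all names illustrative, not the real QMP handlers): the lookup and the action must share one critical section, so the job cannot vanish or change state in between.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative model of the TOC/TOU fix: find_job_locked() may only
 * be called with the lock held, and the caller keeps holding it
 * across the action. */
typedef struct Job { const char *id; int paused; } Job;

static Job jobs[] = { { "backup0", 0 } };
static int job_mutex;                 /* 0 = unlocked, 1 = locked */

static void job_lock(void)   { assert(job_mutex == 0); job_mutex = 1; }
static void job_unlock(void) { assert(job_mutex == 1); job_mutex = 0; }

static Job *find_job_locked(const char *id)
{
    assert(job_mutex == 1);           /* caller must hold the lock */
    for (size_t i = 0; i < sizeof(jobs) / sizeof(jobs[0]); i++) {
        if (strcmp(jobs[i].id, id) == 0) {
            return &jobs[i];
        }
    }
    return NULL;
}

/* search + action under the same lock, as qmp_block_job_pause() does */
static int pause_job(const char *id)
{
    job_lock();
    Job *job = find_job_locked(id);
    if (!job) {
        job_unlock();
        return -1;
    }
    job->paused = 1;                  /* action before the lock drops */
    job_unlock();
    return 0;
}
```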

[PATCH v11 00/21] job: replace AioContext lock with job_mutex

2022-08-26 Thread Emanuele Giuseppe Esposito
In this series, we want to remove the AioContext lock and instead
use the already existent job_mutex to protect the job structures
and list. This is part of the work to get rid of AioContext lock
usage in favour of smaller granularity locks.

In order to simplify reviewer's job, job lock/unlock functions and
macros are added as empty prototypes (nop) in patch 1.
They are converted to use the actual job mutex only in the last
patch. In this way we can freely create locking sections
without worrying about deadlocks with the aiocontext lock.

Patch 2 defines what fields in the job structure need protection.
Patches 3-6 are in preparation to the job locks, moving functions
from global to static and introducing helpers.

Patch 7-9 introduce the (nop) job lock into the job API and
its users, and patches 10-13 categorize respectively locked and
unlocked functions in the job API.

Patches 14-17 take care of protecting job->aio_context, and
finally patch 18 makes the prototypes in patch 1 use the
job_mutex and removes all aiocontext lock at the same time.

Tested this series by running unit tests, qemu-iotests and qtests
(x86_64).

---
v11:
* Apply Kevin and Vladimir feedbacks
* job_set_aio_context: check coroutine is quiescent if job_is_completed
* Rephrased commit message in patch 13

v10:
* protect job->status in unit tests
* patch 11: change commit description and avoid using lock guard for a single
function call
* move patch 19 before patch 15

v9:
* merge patch 6 and 7 to 5.
* additional "taken with job lock/unlock" added and propagated in callers
* protect iostatus field of BlockJobs
* move all blockjob patches torward the end of the serie

v8:
* reorganize patch ordering according with Vladimir proposal
* minor nitpicks

v7:
* s/temporary/temporarly
* double identical locking comment to the same function
* patch 2: add "Protected by AioContext lock" to better categorize fields in
  job.h
* use same comment style in all function headers ("Just like {funct}, but
  called between job_lock and job_unlock")

v6:
* patch 4 and 6 squashed with patch 19 (enable job lock and
  reduce/remove AioContext lock)
* patch 19: job_unref_locked read the aiocontext inside the
  job lock.

v5:
* just restructured patches a little bit better, as there were
  functions used before they were defined.
* rebased on kwolf/block branch and API split serie

v4:
* move "protected by job_mutex" from patch 2 to 15, where the job_mutex is
  actually added.
* s/aio_co_enter/aio_co_schedule in job.c, and adjust tests accordingly.
* remove job_get_aio_context, add job_set_aio_context. Use "fake rwlock"
  to protect job->aiocontext.
* get rid of useless getters method, namely:
  job_get_status
  job_get_pause_count
  job_get_paused
  job_get_busy
  They are all used only by tests, and such getter is pretty useless.
  Replace with job_lock(); assert(); job_unlock();
* use job lock macros instead of job lock/unlock in unit tests.
* convert also blockjob functions to have _locked
* put the job_lock/unlock patches before the _locked ones
* replace aio_co_enter in job.c and detect change of context

v3:
* add "_locked" suffix to the functions called under job_mutex lock
* rename _job_lock in real_job_lock
* job_mutex is now public, and drivers like monitor use it directly
* introduce and protect job_get_aio_context
* remove mirror-specific APIs and just use WITH_JOB_GUARD
* more extensive use of WITH_JOB_GUARD and JOB_LOCK_GUARD

RFC v2:
* use JOB_LOCK_GUARD and WITH_JOB_LOCK_GUARD
* mu(u)ltiple typos in commit messages
* job API split patches are sent separately in another series
* use of empty job_{lock/unlock} and JOB_LOCK_GUARD/WITH_JOB_LOCK_GUARD
  to avoid deadlocks and simplify the reviewer job
* move patch 11 (block_job_query: remove atomic read) as last

Emanuele Giuseppe Esposito (20):
  job.c: make job_mutex and job_lock/unlock() public
  job.h: categorize fields in struct Job
  job.c: API functions not used outside should be static
  aio-wait.h: introduce AIO_WAIT_WHILE_UNLOCKED
  job.c: add job_lock/unlock while keeping job.h intact
  job: move and update comments from blockjob.c
  blockjob: introduce block_job _locked() APIs
  jobs: add job lock in find_* functions
  jobs: use job locks also in the unit tests
  block/mirror.c: use of job helpers in drivers
  jobs: group together API calls under the same job lock
  jobs: protect job.aio_context with BQL and job_mutex
  blockjob.h: categorize fields in struct BlockJob
  blockjob: rename notifier callbacks as _locked
  blockjob: protect iostatus field in BlockJob struct
  job.h: categorize JobDriver callbacks that need the AioContext lock
  job.c: enable job lock/unlock and remove Aiocontext locks
  block_job_query: remove atomic read
  blockjob: remove unused functions
  job: remove unused functions

Paolo Bonzini (1):
  job: detect change of aiocontext within job coroutine

 block.c  |  17 +-
 block/mirror.c   |  20 +-
 block/replication.c 

[PATCH v11 01/21] job.c: make job_mutex and job_lock/unlock() public

2022-08-26 Thread Emanuele Giuseppe Esposito
job mutex will be used to protect the job struct elements and list,
replacing AioContext locks.

Right now use a shared lock for all jobs, in order to keep things
simple. Once the AioContext lock is gone, we can introduce per-job
locks.

To simplify the switch from aiocontext to job lock, introduce
*nop* lock/unlock functions and macros.
We want to always call job_lock/unlock outside the AioContext locks,
and not vice-versa, otherwise we might get a deadlock. This is not
straightforward to do, and that's why we start with nop functions.
Once everything is protected by job_lock/unlock, we can change the nop into
an actual mutex and remove the aiocontext lock.

Since job_mutex is already being used, add static
real_job_{lock/unlock} for the existing usage.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 include/qemu/job.h | 24 
 job.c  | 35 +++
 2 files changed, 47 insertions(+), 12 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index c105b31076..d1192ffd61 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -303,6 +303,30 @@ typedef enum JobCreateFlags {
 JOB_MANUAL_DISMISS = 0x04,
 } JobCreateFlags;
 
+extern QemuMutex job_mutex;
+
+#define JOB_LOCK_GUARD() /* QEMU_LOCK_GUARD(&job_mutex) */
+
+#define WITH_JOB_LOCK_GUARD() /* WITH_QEMU_LOCK_GUARD(&job_mutex) */
+
+/**
+ * job_lock:
+ *
+ * Take the mutex protecting the list of jobs and their status.
+ * Most functions called by the monitor need to call job_lock
+ * and job_unlock manually.  On the other hand, functions called
+ * by the block jobs themselves and by the block layer will take the
+ * lock for you.
+ */
+void job_lock(void);
+
+/**
+ * job_unlock:
+ *
+ * Release the mutex protecting the list of jobs and their status.
+ */
+void job_unlock(void);
+
 /**
  * Allocate and return a new job transaction. Jobs can be added to the
  * transaction using job_txn_add_job().
diff --git a/job.c b/job.c
index 075c6f3a20..2b4ffca9d4 100644
--- a/job.c
+++ b/job.c
@@ -32,6 +32,12 @@
 #include "trace/trace-root.h"
 #include "qapi/qapi-events-job.h"
 
+/*
+ * job_mutex protects the jobs list, but also makes the
+ * struct job fields thread-safe.
+ */
+QemuMutex job_mutex;
+
 static QLIST_HEAD(, Job) jobs = QLIST_HEAD_INITIALIZER(jobs);
 
 /* Job State Transition Table */
@@ -74,17 +80,22 @@ struct JobTxn {
 int refcnt;
 };
 
-/* Right now, this mutex is only needed to synchronize accesses to job->busy
- * and job->sleep_timer, such as concurrent calls to job_do_yield and
- * job_enter. */
-static QemuMutex job_mutex;
+void job_lock(void)
+{
+/* nop */
+}
+
+void job_unlock(void)
+{
+/* nop */
+}
 
-static void job_lock(void)
+static void real_job_lock(void)
 {
 qemu_mutex_lock(&job_mutex);
 }
 
-static void job_unlock(void)
+static void real_job_unlock(void)
 {
 qemu_mutex_unlock(&job_mutex);
 }
@@ -450,21 +461,21 @@ void job_enter_cond(Job *job, bool(*fn)(Job *job))
 return;
 }
 
-job_lock();
+real_job_lock();
 if (job->busy) {
-job_unlock();
+real_job_unlock();
 return;
 }
 
 if (fn && !fn(job)) {
-job_unlock();
+real_job_unlock();
 return;
 }
 
 assert(!job->deferred_to_main_loop);
 timer_del(&job->sleep_timer);
 job->busy = true;
-job_unlock();
+real_job_unlock();
 aio_co_enter(job->aio_context, job->co);
 }
 
@@ -481,13 +492,13 @@ void job_enter(Job *job)
  * called explicitly. */
 static void coroutine_fn job_do_yield(Job *job, uint64_t ns)
 {
-job_lock();
+real_job_lock();
 if (ns != -1) {
 timer_mod(&job->sleep_timer, ns);
 }
 job->busy = false;
 job_event_idle(job);
-job_unlock();
+real_job_unlock();
 qemu_coroutine_yield();
 
 /* Set by job_enter_cond() before re-entering the coroutine.  */
-- 
2.31.1
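
The nop-first strategy can be modelled as follows (purely illustrative; `use_real_lock` stands for "the final patch of the series has landed"):

```c
#include <assert.h>

/* Illustrative model of the migration plan: job_lock()/job_unlock()
 * start out as nops, so call sites can be added everywhere without
 * risking lock-ordering deadlocks against the AioContext lock; only
 * the last patch turns them into real mutex operations. */
static int use_real_lock;   /* flipped once the whole tree is converted */
static int mutex_held;

static void job_lock(void)
{
    if (use_real_lock) {
        assert(!mutex_held);
        mutex_held = 1;
    }                       /* otherwise: nop */
}

static void job_unlock(void)
{
    if (use_real_lock) {
        assert(mutex_held);
        mutex_held = 0;
    }
}
```

While the lock is a nop, even nested or overlapping calls are harmless, which is what lets the locking sections be introduced incrementally.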




[PATCH v11 12/21] job: detect change of aiocontext within job coroutine

2022-08-26 Thread Emanuele Giuseppe Esposito
From: Paolo Bonzini 

We want to make sure access to job->aio_context is always done
under either the BQL or job_mutex. The problem is that using
aio_co_enter(job->aiocontext, job->co) in job_start and job_enter_cond
makes the coroutine immediately resume, so we can't hold the job lock.
And caching it is not safe either, as it might change.

job_start is under BQL, so it can freely read job->aiocontext, but
job_enter_cond is not.
We want to avoid reading job->aio_context in job_enter_cond, therefore:
1) use aio_co_wake(), since it doesn't take an AioContext argument
   but uses job->co->ctx
2) detect possible discrepancy between job->co->ctx and job->aio_context
   by checking right after the coroutine resumes back from yielding if
   job->aio_context has changed. If so, reschedule the coroutine to the
   new context.

Calling bdrv_try_set_aio_context() will issue the following calls
(simplified):
* in terms of  bdrv callbacks:
  .drained_begin -> .set_aio_context -> .drained_end
* in terms of child_job functions:
  child_job_drained_begin -> child_job_set_aio_context -> child_job_drained_end
* in terms of job functions:
  job_pause_locked -> job_set_aio_context -> job_resume_locked

We can see that after setting the new aio_context, job_resume_locked
calls again job_enter_cond, which then invokes aio_co_wake(). But
while job->aiocontext has been set in job_set_aio_context,
job->co->ctx has not changed, so the coroutine would be entering in
the wrong aiocontext.

Using aio_co_schedule in job_resume_locked() might seem as a valid
alternative, but the problem is that the bh resuming the coroutine
is not scheduled immediately, and if in the meanwhile another
bdrv_try_set_aio_context() is run (see test_propagate_mirror() in
test-block-iothread.c), we would have the first schedule in the
wrong aiocontext, and the second set of drains won't even manage
to schedule the coroutine, as job->busy would still be true from
the previous job_resume_locked().

The solution is to stick with aio_co_wake() and detect every time
the coroutine resumes back from yielding if job->aio_context
has changed. If so, we can reschedule it to the new context.

Check for the aiocontext change in job_do_yield_locked because:
1) aio_co_reschedule_self requires to be in the running coroutine
2) since child_job_set_aio_context allows changing the aiocontext only
   while the job is paused, this is the exact place where the coroutine
   resumes, before running JobDriver's code.

Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Paolo Bonzini 
---
 job.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/job.c b/job.c
index e336af0c1c..85ae843f03 100644
--- a/job.c
+++ b/job.c
@@ -588,7 +588,7 @@ void job_enter_cond_locked(Job *job, bool(*fn)(Job *job))
 job->busy = true;
 real_job_unlock();
 job_unlock();
-aio_co_enter(job->aio_context, job->co);
+aio_co_wake(job->co);
 job_lock();
 }
 
@@ -615,6 +615,8 @@ void job_enter(Job *job)
  */
 static void coroutine_fn job_do_yield_locked(Job *job, uint64_t ns)
 {
+AioContext *next_aio_context;
+
 real_job_lock();
 if (ns != -1) {
 timer_mod(&job->sleep_timer, ns);
@@ -626,7 +628,20 @@ static void coroutine_fn job_do_yield_locked(Job *job, uint64_t ns)
 qemu_coroutine_yield();
 job_lock();
 
-/* Set by job_enter_cond() before re-entering the coroutine.  */
+next_aio_context = job->aio_context;
+/*
+ * Coroutine has resumed, but in the meanwhile the job AioContext
+ * might have changed via bdrv_try_set_aio_context(), so we need to
+ * move the coroutine to the new aiocontext as well.
+ */
+while (qemu_get_current_aio_context() != next_aio_context) {
+job_unlock();
+aio_co_reschedule_self(next_aio_context);
+job_lock();
+next_aio_context = job->aio_context;
+}
+
+/* Set by job_enter_cond_locked() before re-entering the coroutine.  */
 assert(job->busy);
 }
 
-- 
2.31.1
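
The re-check loop in job_do_yield_locked() can be modelled on plain ints (a sketch, not QEMU's coroutine machinery): after resuming, keep rescheduling until the context we run in matches job->aio_context, which may have changed while we yielded.

```c
#include <assert.h>

static int current_ctx = 1;     /* context the "coroutine" runs in */
static int job_aio_context = 1; /* may change while we are yielded */

/* stand-in for aio_co_reschedule_self() */
static void reschedule_self(int ctx) { current_ctx = ctx; }

/* shape of the while loop added by this patch */
static void settle_context(void)
{
    int next = job_aio_context;
    while (current_ctx != next) {
        reschedule_self(next);
        next = job_aio_context;  /* re-read: it may have moved again */
    }
}
```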




[PATCH v11 07/21] blockjob: introduce block_job _locked() APIs

2022-08-26 Thread Emanuele Giuseppe Esposito
Just as done with job.h, create _locked() functions in blockjob.h

These functions will later be useful when the caller has already taken
the lock. All blockjob _locked functions call job _locked functions.

Note: at this stage, job_{lock/unlock} and job lock guard macros
are *nop*.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Kevin Wolf 
Reviewed-by: Stefan Hajnoczi 
---
 blockjob.c   | 52 
 include/block/blockjob.h | 18 ++
 2 files changed, 60 insertions(+), 10 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 7da59a1f1c..0d59aba439 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -44,21 +44,27 @@ static bool is_block_job(Job *job)
job_type(job) == JOB_TYPE_STREAM;
 }
 
-BlockJob *block_job_next(BlockJob *bjob)
+BlockJob *block_job_next_locked(BlockJob *bjob)
 {
 Job *job = bjob ? &bjob->job : NULL;
 GLOBAL_STATE_CODE();
 
 do {
-job = job_next(job);
+job = job_next_locked(job);
 } while (job && !is_block_job(job));
 
 return job ? container_of(job, BlockJob, job) : NULL;
 }
 
-BlockJob *block_job_get(const char *id)
+BlockJob *block_job_next(BlockJob *bjob)
 {
-Job *job = job_get(id);
+JOB_LOCK_GUARD();
+return block_job_next_locked(bjob);
+}
+
+BlockJob *block_job_get_locked(const char *id)
+{
+Job *job = job_get_locked(id);
 GLOBAL_STATE_CODE();
 
 if (job && is_block_job(job)) {
@@ -68,6 +74,12 @@ BlockJob *block_job_get(const char *id)
 }
 }
 
+BlockJob *block_job_get(const char *id)
+{
+JOB_LOCK_GUARD();
+return block_job_get_locked(id);
+}
+
 void block_job_free(Job *job)
 {
 BlockJob *bjob = container_of(job, BlockJob, job);
@@ -256,14 +268,14 @@ static bool job_timer_pending(Job *job)
 return timer_pending(&job->sleep_timer);
 }
 
-bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
+bool block_job_set_speed_locked(BlockJob *job, int64_t speed, Error **errp)
 {
 const BlockJobDriver *drv = block_job_driver(job);
 int64_t old_speed = job->speed;
 
 GLOBAL_STATE_CODE();
 
-if (job_apply_verb(&job->job, JOB_VERB_SET_SPEED, errp) < 0) {
+if (job_apply_verb_locked(&job->job, JOB_VERB_SET_SPEED, errp) < 0) {
 return false;
 }
 if (speed < 0) {
@@ -277,7 +289,9 @@ bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
 job->speed = speed;
 
 if (drv->set_speed) {
+job_unlock();
 drv->set_speed(job, speed);
+job_lock();
 }
 
 if (speed && speed <= old_speed) {
@@ -285,18 +299,24 @@ bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
 }
 
 /* kick only if a timer is pending */
-job_enter_cond(&job->job, job_timer_pending);
+job_enter_cond_locked(&job->job, job_timer_pending);
 
 return true;
 }
 
+bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
+{
+JOB_LOCK_GUARD();
+return block_job_set_speed_locked(job, speed, errp);
+}
+
 int64_t block_job_ratelimit_get_delay(BlockJob *job, uint64_t n)
 {
 IO_CODE();
 return ratelimit_calculate_delay(&job->limit, n);
 }
 
-BlockJobInfo *block_job_query(BlockJob *job, Error **errp)
+BlockJobInfo *block_job_query_locked(BlockJob *job, Error **errp)
 {
 BlockJobInfo *info;
 uint64_t progress_current, progress_total;
@@ -320,7 +340,7 @@ BlockJobInfo *block_job_query(BlockJob *job, Error **errp)
 info->len   = progress_total;
 info->speed = job->speed;
 info->io_status = job->iostatus;
-info->ready = job_is_ready(&job->job),
+info->ready = job_is_ready_locked(&job->job),
 info->status= job->job.status;
 info->auto_finalize = job->job.auto_finalize;
 info->auto_dismiss  = job->job.auto_dismiss;
@@ -333,6 +353,12 @@ BlockJobInfo *block_job_query(BlockJob *job, Error **errp)
 return info;
 }
 
+BlockJobInfo *block_job_query(BlockJob *job, Error **errp)
+{
+JOB_LOCK_GUARD();
+return block_job_query_locked(job, errp);
+}
+
 static void block_job_iostatus_set_err(BlockJob *job, int error)
 {
 if (job->iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
@@ -478,7 +504,7 @@ fail:
 return NULL;
 }
 
-void block_job_iostatus_reset(BlockJob *job)
+void block_job_iostatus_reset_locked(BlockJob *job)
 {
 GLOBAL_STATE_CODE();
 if (job->iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
@@ -488,6 +514,12 @@ void block_job_iostatus_reset(BlockJob *job)
 job->iostatus = BLOCK_DEVICE_IO_STATUS_OK;
 }
 
+void block_job_iostatus_reset(BlockJob *job)
+{
+JOB_LOCK_GUARD();
+block_job_iostatus_reset_locked(job);
+}
+
 void block_job_user_resume(Job *job)
 {
 BlockJob *bjob = container_of(job, BlockJob, job);
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 6525e16fd5..8b65d3949d 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -92,6 +92,9 @@ typedef struct BlockJob {
  */
 BlockJob *bl
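
The mechanical pattern this patch applies throughout blockjob.c reduces to the following sketch (toy lock and names, not QEMU's types): the public entry point becomes "take the lock, delegate to the _locked variant".

```c
#include <assert.h>

/* Toy version of the wrapper pattern: block_job_foo() == lock +
 * block_job_foo_locked() + unlock, so callers that already hold
 * job_mutex can use the _locked variant directly. */
static int job_mutex;       /* 0 = unlocked, 1 = locked */
static int pause_count;

static void job_lock(void)   { assert(job_mutex == 0); job_mutex = 1; }
static void job_unlock(void) { assert(job_mutex == 1); job_mutex = 0; }

static void block_job_pause_locked(void)
{
    assert(job_mutex == 1); /* caller must hold job_mutex */
    pause_count++;
}

static void block_job_pause(void)
{
    job_lock();
    block_job_pause_locked();
    job_unlock();
}
```

Callers holding the lock call the _locked variant; everyone else keeps the old name and gets the locking for free.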

[PATCH v11 03/21] job.c: API functions not used outside should be static

2022-08-26 Thread Emanuele Giuseppe Esposito
job_event_* functions can all be static, as they are not used
outside job.c.

Same applies for job_txn_add_job().

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Kevin Wolf 
---
 include/qemu/job.h | 18 --
 job.c  | 22 +++---
 2 files changed, 19 insertions(+), 21 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 876e13d549..4b64eb15f7 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -358,18 +358,6 @@ JobTxn *job_txn_new(void);
  */
 void job_txn_unref(JobTxn *txn);
 
-/**
- * @txn: The transaction (may be NULL)
- * @job: Job to add to the transaction
- *
- * Add @job to the transaction.  The @job must not already be in a transaction.
- * The caller must call either job_txn_unref() or job_completed() to release
- * the reference that is automatically grabbed here.
- *
- * If @txn is NULL, the function does nothing.
- */
-void job_txn_add_job(JobTxn *txn, Job *job);
-
 /**
  * Create a new long-running job and return it.
  *
@@ -431,12 +419,6 @@ void job_progress_set_remaining(Job *job, uint64_t remaining);
  */
 void job_progress_increase_remaining(Job *job, uint64_t delta);
 
-/** To be called when a cancelled job is finalised. */
-void job_event_cancelled(Job *job);
-
-/** To be called when a successfully completed job is finalised. */
-void job_event_completed(Job *job);
-
 /**
  * Conditionally enter the job coroutine if the job is ready to run, not
  * already busy and fn() returns true. fn() is called while under the job_lock
diff --git a/job.c b/job.c
index 2b4ffca9d4..cafd597ba4 100644
--- a/job.c
+++ b/job.c
@@ -125,7 +125,17 @@ void job_txn_unref(JobTxn *txn)
 }
 }
 
-void job_txn_add_job(JobTxn *txn, Job *job)
+/**
+ * @txn: The transaction (may be NULL)
+ * @job: Job to add to the transaction
+ *
+ * Add @job to the transaction.  The @job must not already be in a transaction.
+ * The caller must call either job_txn_unref() or job_completed() to release
+ * the reference that is automatically grabbed here.
+ *
+ * If @txn is NULL, the function does nothing.
+ */
+static void job_txn_add_job(JobTxn *txn, Job *job)
 {
 if (!txn) {
 return;
@@ -427,12 +437,18 @@ void job_progress_increase_remaining(Job *job, uint64_t delta)
 progress_increase_remaining(&job->progress, delta);
 }
 
-void job_event_cancelled(Job *job)
+/**
+ * To be called when a cancelled job is finalised.
+ */
+static void job_event_cancelled(Job *job)
 {
 notifier_list_notify(&job->on_finalize_cancelled, job);
 }
 
-void job_event_completed(Job *job)
+/**
+ * To be called when a successfully completed job is finalised.
+ */
+static void job_event_completed(Job *job)
 {
 notifier_list_notify(&job->on_finalize_completed, job);
 }
-- 
2.31.1




[PATCH v11 02/21] job.h: categorize fields in struct Job

2022-08-26 Thread Emanuele Giuseppe Esposito
Categorize the fields in struct Job to understand which ones
need to be protected by the job mutex and which don't.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Kevin Wolf 
Reviewed-by: Stefan Hajnoczi 
---
 include/qemu/job.h | 61 +++---
 1 file changed, 36 insertions(+), 25 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index d1192ffd61..876e13d549 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -40,27 +40,52 @@ typedef struct JobTxn JobTxn;
  * Long-running operation.
  */
 typedef struct Job {
+
+/* Fields set at initialization (job_create), and never modified */
+
 /** The ID of the job. May be NULL for internal jobs. */
 char *id;
 
-/** The type of this job. */
+/**
+ * The type of this job.
+ * All callbacks are called with job_mutex *not* held.
+ */
 const JobDriver *driver;
 
-/** Reference count of the block job */
-int refcnt;
-
-/** Current state; See @JobStatus for details. */
-JobStatus status;
-
-/** AioContext to run the job coroutine in */
-AioContext *aio_context;
-
 /**
  * The coroutine that executes the job.  If not NULL, it is reentered when
  * busy is false and the job is cancelled.
+ * Initialized in job_start()
  */
 Coroutine *co;
 
+/** True if this job should automatically finalize itself */
+bool auto_finalize;
+
+/** True if this job should automatically dismiss itself */
+bool auto_dismiss;
+
+/** The completion function that will be called when the job completes.  */
+BlockCompletionFunc *cb;
+
+/** The opaque value that is passed to the completion function.  */
+void *opaque;
+
+/* ProgressMeter API is thread-safe */
+ProgressMeter progress;
+
+
+/** Protected by AioContext lock */
+
+/** AioContext to run the job coroutine in */
+AioContext *aio_context;
+
+/** Reference count of the block job */
+int refcnt;
+
+/** Current state; See @JobStatus for details. */
+JobStatus status;
+
 /**
  * Timer that is used by @job_sleep_ns. Accessed under job_mutex (in
  * job.c).
@@ -112,14 +137,6 @@ typedef struct Job {
 /** Set to true when the job has deferred work to the main loop. */
 bool deferred_to_main_loop;
 
-/** True if this job should automatically finalize itself */
-bool auto_finalize;
-
-/** True if this job should automatically dismiss itself */
-bool auto_dismiss;
-
-ProgressMeter progress;
-
 /**
  * Return code from @run and/or @prepare callback(s).
  * Not final until the job has reached the CONCLUDED status.
@@ -134,12 +151,6 @@ typedef struct Job {
  */
 Error *err;
 
-/** The completion function that will be called when the job completes.  */
-BlockCompletionFunc *cb;
-
-/** The opaque value that is passed to the completion function.  */
-void *opaque;
-
 /** Notifiers called when a cancelled job is finalised */
 NotifierList on_finalize_cancelled;
 
@@ -167,6 +178,7 @@ typedef struct Job {
 
 /**
  * Callbacks and other information about a Job driver.
+ * All callbacks are invoked with job_mutex *not* held.
  */
 struct JobDriver {
 
@@ -472,7 +484,6 @@ void job_yield(Job *job);
  */
 void coroutine_fn job_sleep_ns(Job *job, int64_t ns);
 
-
 /** Returns the JobType of a given Job. */
 JobType job_type(const Job *job);
 
-- 
2.31.1




[PATCH v11 16/21] blockjob: protect iostatus field in BlockJob struct

2022-08-26 Thread Emanuele Giuseppe Esposito
iostatus is the only field (besides .job itself) that needs
to be protected by the job mutex.

It is set in the main loop (GLOBAL_STATE functions) but read
in I/O code (block_job_error_action).

In order to protect it, rename block_job_iostatus_set_err()
to block_job_iostatus_set_err_locked(), which is always called
under the job lock.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Kevin Wolf 
---
 block/mirror.c | 7 +--
 blockjob.c | 5 +++--
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index c6bf7f40ce..7e32ee1d31 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -893,7 +893,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
 MirrorBlockJob *s = container_of(job, MirrorBlockJob, common.job);
 BlockDriverState *bs = s->mirror_top_bs->backing->bs;
 BlockDriverState *target_bs = blk_bs(s->target);
-bool need_drain = true;
+bool need_drain = true, iostatus;
 int64_t length;
 int64_t target_length;
 BlockDriverInfo bdi;
@@ -1016,8 +1016,11 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
  * We do so every BLKOCK_JOB_SLICE_TIME nanoseconds, or when there is
  * an error, or when the source is clean, whichever comes first. */
 delta = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - s->last_pause_ns;
+WITH_JOB_LOCK_GUARD() {
+iostatus = s->common.iostatus;
+}
 if (delta < BLOCK_JOB_SLICE_TIME &&
-s->common.iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
+iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
 if (s->in_flight >= MAX_IN_FLIGHT || s->buf_free_count == 0 ||
 (cnt == 0 && s->in_flight > 0)) {
 trace_mirror_yield(s, cnt, s->buf_free_count, s->in_flight);
diff --git a/blockjob.c b/blockjob.c
index d8fb5311c7..d04f804001 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -363,7 +363,8 @@ BlockJobInfo *block_job_query(BlockJob *job, Error **errp)
 return block_job_query_locked(job, errp);
 }
 
-static void block_job_iostatus_set_err(BlockJob *job, int error)
+/* Called with job lock held */
+static void block_job_iostatus_set_err_locked(BlockJob *job, int error)
 {
 if (job->iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
 job->iostatus = error == ENOSPC ? BLOCK_DEVICE_IO_STATUS_NOSPACE :
@@ -577,8 +578,8 @@ BlockErrorAction block_job_error_action(BlockJob *job, BlockdevOnError on_err,
  */
 job->job.user_paused = true;
 }
+block_job_iostatus_set_err_locked(job, error);
 }
-block_job_iostatus_set_err(job, error);
 }
 return action;
 }
-- 
2.31.1




[PATCH v11 05/21] job.c: add job_lock/unlock while keeping job.h intact

2022-08-26 Thread Emanuele Giuseppe Esposito
By "intact" we mean that all job.h functions implicitly
take the lock, so API callers remain unmodified.

This means that:
- many static functions that are always called with the job lock held
  become _locked, and call _locked functions
- all public functions take the lock internally if needed, and call _locked
  functions
- all public functions called internally by other functions in job.c get a
  _locked counterpart (sometimes public), to avoid deadlocks (job lock
  already taken). These functions are not used for now.
- some public functions called only from external files (not job.c) do not
  get a _locked() counterpart and take the lock inside. Others won't need
  the lock at all because they use fields that are only set at
  initialization and never modified.

job_{lock/unlock} is independent of real_job_{lock/unlock}.

Note: at this stage, job_{lock/unlock} and job lock guard macros
are *nop*

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Kevin Wolf 
Reviewed-by: Stefan Hajnoczi 
---
 include/qemu/job.h | 138 ++-
 job.c  | 605 +++--
 2 files changed, 557 insertions(+), 186 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 4b64eb15f7..5709e8d4a8 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -358,8 +358,15 @@ JobTxn *job_txn_new(void);
  */
 void job_txn_unref(JobTxn *txn);
 
+/*
+ * Same as job_txn_unref(), but called with job lock held.
+ * Might release the lock temporarily.
+ */
+void job_txn_unref_locked(JobTxn *txn);
+
 /**
  * Create a new long-running job and return it.
+ * Called with job_mutex *not* held.
  *
  * @job_id: The id of the newly-created job, or %NULL for internal jobs
  * @driver: The class object for the newly-created job.
@@ -380,17 +387,25 @@ void *job_create(const char *job_id, const JobDriver *driver, JobTxn *txn,
  */
 void job_ref(Job *job);
 
+/* Same as job_ref(), but called with job lock held. */
+void job_ref_locked(Job *job);
+
 /**
  * Release a reference that was previously acquired with job_ref() or
  * job_create(). If it's the last reference to the object, it will be freed.
  */
 void job_unref(Job *job);
 
+/* Same as job_unref(), but called with job lock held. */
+void job_unref_locked(Job *job);
+
 /**
  * @job: The job that has made progress
  * @done: How much progress the job made since the last call
  *
  * Updates the progress counter of the job.
+ *
+ * May be called with mutex held or not held.
  */
 void job_progress_update(Job *job, uint64_t done);
 
@@ -401,6 +416,8 @@ void job_progress_update(Job *job, uint64_t done);
  *
  * Sets the expected end value of the progress counter of a job so that a
  * completion percentage can be calculated when the progress is updated.
+ *
+ * May be called with mutex held or not held.
  */
 void job_progress_set_remaining(Job *job, uint64_t remaining);
 
@@ -416,6 +433,8 @@ void job_progress_set_remaining(Job *job, uint64_t remaining);
  * length before, and job_progress_update() afterwards.
  * (So the operation acts as a parenthesis in regards to the main job
  * operation running in background.)
+ *
+ * May be called with mutex held or not held.
  */
 void job_progress_increase_remaining(Job *job, uint64_t delta);
 
@@ -426,11 +445,19 @@ void job_progress_increase_remaining(Job *job, uint64_t delta);
  */
 void job_enter_cond(Job *job, bool(*fn)(Job *job));
 
+/*
+ * Same as job_enter_cond(), but called with job lock held.
+ * Might release the lock temporarily.
+ */
+void job_enter_cond_locked(Job *job, bool(*fn)(Job *job));
+
 /**
  * @job: A job that has not yet been started.
  *
  * Begins execution of a job.
  * Takes ownership of one reference to the job object.
+ *
+ * Called with job_mutex *not* held.
  */
 void job_start(Job *job);
 
@@ -438,6 +465,7 @@ void job_start(Job *job);
  * @job: The job to enter.
  *
  * Continue the specified job by entering the coroutine.
+ * Called with job_mutex *not* held.
  */
 void job_enter(Job *job);
 
@@ -446,6 +474,8 @@ void job_enter(Job *job);
  *
  * Pause now if job_pause() has been called. Jobs that perform lots of I/O
  * must call this between requests so that the job can be paused.
+ *
+ * Called with job_mutex *not* held.
  */
 void coroutine_fn job_pause_point(Job *job);
 
@@ -453,6 +483,7 @@ void coroutine_fn job_pause_point(Job *job);
  * @job: The job that calls the function.
  *
  * Yield the job coroutine.
+ * Called with job_mutex *not* held.
  */
 void job_yield(Job *job);
 
@@ -463,6 +494,8 @@ void job_yield(Job *job);
  * Put the job to sleep (assuming that it wasn't canceled) for @ns
  * %QEMU_CLOCK_REALTIME nanoseconds.  Canceling the job will immediately
  * interrupt the wait.
+ *
+ * Called with job_mutex *not* held.
  */
 void coroutine_fn job_sleep_ns(Job *job, int64_t ns);
 
@@ -475,21 +508,40 @@ const char *job_type_str(const Job *job);
 /** Returns true if the job should not be visible to the management layer. */
 bool job_is_int

[PATCH v11 10/21] block/mirror.c: use of job helpers in drivers

2022-08-26 Thread Emanuele Giuseppe Esposito
Once the job lock is used and the AioContext lock is removed, mirror
has to perform job operations under the same critical section.

Note: at this stage, job_{lock/unlock} and job lock guard macros
are *nop*.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 block/mirror.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 3c4ab1159d..c6bf7f40ce 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1152,8 +1152,10 @@ static void mirror_complete(Job *job, Error **errp)
 s->should_complete = true;
 
 /* If the job is paused, it will be re-entered when it is resumed */
-if (!job->paused) {
-job_enter(job);
+WITH_JOB_LOCK_GUARD() {
+if (!job->paused) {
+job_enter_cond_locked(job, NULL);
+}
 }
 }
 
@@ -1173,8 +1175,11 @@ static bool mirror_drained_poll(BlockJob *job)
  * from one of our own drain sections, to avoid a deadlock waiting for
  * ourselves.
  */
-if (!s->common.job.paused && !job_is_cancelled(&job->job) && !s->in_drain) {
-return true;
+WITH_JOB_LOCK_GUARD() {
+if (!s->common.job.paused && !job_is_cancelled_locked(&job->job)
+&& !s->in_drain) {
+return true;
+}
 }
 
 return !!s->in_flight;
-- 
2.31.1




[PATCH v11 17/21] job.h: categorize JobDriver callbacks that need the AioContext lock

2022-08-26 Thread Emanuele Giuseppe Esposito
Some callback implementations use bdrv_* APIs that assume the
AioContext lock is held. Make sure this invariant is documented.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 include/qemu/job.h | 27 +--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index cede227e67..b24aa89737 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -65,7 +65,11 @@ typedef struct Job {
 /** True if this job should automatically dismiss itself */
 bool auto_dismiss;
 
-/** The completion function that will be called when the job completes.  */
+/**
+ * The completion function that will be called when the job completes.
+ * Called with AioContext lock held, since many callback implementations
+ * use bdrv_* functions that require to hold the lock.
+ */
 BlockCompletionFunc *cb;
 
 /** The opaque value that is passed to the completion function.  */
@@ -260,6 +264,9 @@ struct JobDriver {
  *
  * This callback will not be invoked if the job has already failed.
  * If it fails, abort and then clean will be called.
+ *
+ * Called with AioContext lock held, since many callback implementations
+ * use bdrv_* functions that require to hold the lock.
  */
 int (*prepare)(Job *job);
 
@@ -270,6 +277,9 @@ struct JobDriver {
  *
  * All jobs will complete with a call to either .commit() or .abort() but
  * never both.
+ *
+ * Called with AioContext lock held, since many callback implementations
+ * use bdrv_* functions that require to hold the lock.
  */
 void (*commit)(Job *job);
 
@@ -280,6 +290,9 @@ struct JobDriver {
  *
  * All jobs will complete with a call to either .commit() or .abort() but
  * never both.
+ *
+ * Called with AioContext lock held, since many callback implementations
+ * use bdrv_* functions that require to hold the lock.
  */
 void (*abort)(Job *job);
 
@@ -288,6 +301,9 @@ struct JobDriver {
  * .commit() or .abort(). Regardless of which callback is invoked after
  * completion, .clean() will always be called, even if the job does not
  * belong to a transaction group.
+ *
+ * Called with AioContext lock held, since many callback implementations
+ * use bdrv_* functions that require to hold the lock.
  */
 void (*clean)(Job *job);
 
@@ -302,11 +318,18 @@ struct JobDriver {
  * READY).
  * (If the callback is NULL, the job is assumed to terminate
  * without I/O.)
+ *
+ * Called with AioContext lock held, since many callback implementations
+ * use bdrv_* functions that require to hold the lock.
  */
 bool (*cancel)(Job *job, bool force);
 
 
-/** Called when the job is freed */
+/**
+ * Called when the job is freed.
+ * Called with AioContext lock held, since many callback implementations
+ * use bdrv_* functions that require to hold the lock.
+ */
 void (*free)(Job *job);
 };
 
-- 
2.31.1




[PATCH v11 15/21] blockjob: rename notifier callbacks as _locked

2022-08-26 Thread Emanuele Giuseppe Esposito
They are all called with job_lock held, in job_event_*_locked().

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Kevin Wolf 
---
 blockjob.c | 25 +++--
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index c8919cef9b..d8fb5311c7 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -250,7 +250,8 @@ int block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs,
 return 0;
 }
 
-static void block_job_on_idle(Notifier *n, void *opaque)
+/* Called with job_mutex lock held. */
+static void block_job_on_idle_locked(Notifier *n, void *opaque)
 {
 aio_wait_kick();
 }
@@ -370,7 +371,8 @@ static void block_job_iostatus_set_err(BlockJob *job, int error)
 }
 }
 
-static void block_job_event_cancelled(Notifier *n, void *opaque)
+/* Called with job_mutex lock held. */
+static void block_job_event_cancelled_locked(Notifier *n, void *opaque)
 {
 BlockJob *job = opaque;
 uint64_t progress_current, progress_total;
@@ -389,7 +391,8 @@ static void block_job_event_cancelled(Notifier *n, void *opaque)
 job->speed);
 }
 
-static void block_job_event_completed(Notifier *n, void *opaque)
+/* Called with job_mutex lock held. */
+static void block_job_event_completed_locked(Notifier *n, void *opaque)
 {
 BlockJob *job = opaque;
 const char *msg = NULL;
@@ -415,7 +418,8 @@ static void block_job_event_completed(Notifier *n, void *opaque)
 msg);
 }
 
-static void block_job_event_pending(Notifier *n, void *opaque)
+/* Called with job_mutex lock held. */
+static void block_job_event_pending_locked(Notifier *n, void *opaque)
 {
 BlockJob *job = opaque;
 
@@ -427,7 +431,8 @@ static void block_job_event_pending(Notifier *n, void *opaque)
   job->job.id);
 }
 
-static void block_job_event_ready(Notifier *n, void *opaque)
+/* Called with job_mutex lock held. */
+static void block_job_event_ready_locked(Notifier *n, void *opaque)
 {
 BlockJob *job = opaque;
 uint64_t progress_current, progress_total;
@@ -472,11 +477,11 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
 
 ratelimit_init(&job->limit);
 
-job->finalize_cancelled_notifier.notify = block_job_event_cancelled;
-job->finalize_completed_notifier.notify = block_job_event_completed;
-job->pending_notifier.notify = block_job_event_pending;
-job->ready_notifier.notify = block_job_event_ready;
-job->idle_notifier.notify = block_job_on_idle;
+job->finalize_cancelled_notifier.notify = block_job_event_cancelled_locked;
+job->finalize_completed_notifier.notify = block_job_event_completed_locked;
+job->pending_notifier.notify = block_job_event_pending_locked;
+job->ready_notifier.notify = block_job_event_ready_locked;
+job->idle_notifier.notify = block_job_on_idle_locked;
 
 WITH_JOB_LOCK_GUARD() {
 notifier_list_add(&job->job.on_finalize_cancelled,
-- 
2.31.1




[PATCH v11 19/21] block_job_query: remove atomic read

2022-08-26 Thread Emanuele Giuseppe Esposito
It is not clear what the atomic here was supposed to achieve, since
job.busy is protected by the job lock. The whole function is called
under job_mutex, so just remove the atomic.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Kevin Wolf 
---
 blockjob.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/blockjob.c b/blockjob.c
index d04f804001..120c1b7ead 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -338,7 +338,7 @@ BlockJobInfo *block_job_query_locked(BlockJob *job, Error **errp)
 info = g_new0(BlockJobInfo, 1);
 info->type  = g_strdup(job_type_str(&job->job));
 info->device= g_strdup(job->job.id);
-info->busy  = qatomic_read(&job->job.busy);
+info->busy  = job->job.busy;
 info->paused= job->job.pause_count > 0;
 info->offset= progress_current;
 info->len   = progress_total;
-- 
2.31.1




[PATCH v11 14/21] blockjob.h: categorize fields in struct BlockJob

2022-08-26 Thread Emanuele Giuseppe Esposito
The same job lock is also used to protect some of the BlockJob fields.
Categorize them just as was done in job.h.

Reviewed-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Emanuele Giuseppe Esposito 
---
 include/block/blockjob.h | 32 ++--
 1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 8b65d3949d..10c24e240a 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -40,21 +40,38 @@ typedef struct BlockJobDriver BlockJobDriver;
  * Long-running operation on a BlockDriverState.
  */
 typedef struct BlockJob {
-/** Data belonging to the generic Job infrastructure */
+/**
+ * Data belonging to the generic Job infrastructure.
+ * Protected by job mutex.
+ */
 Job job;
 
-/** Status that is published by the query-block-jobs QMP API */
+/**
+ * Status that is published by the query-block-jobs QMP API.
+ * Protected by job mutex.
+ */
 BlockDeviceIoStatus iostatus;
 
-/** Speed that was set with @block_job_set_speed.  */
+/**
+ * Speed that was set with @block_job_set_speed.
+ * Always modified and read under QEMU global mutex (GLOBAL_STATE_CODE).
+ */
 int64_t speed;
 
-/** Rate limiting data structure for implementing @speed. */
+/**
+ * Rate limiting data structure for implementing @speed.
+ * RateLimit API is thread-safe.
+ */
 RateLimit limit;
 
-/** Block other operations when block job is running */
+/**
+ * Block other operations when block job is running.
+ * Always modified and read under QEMU global mutex (GLOBAL_STATE_CODE).
+ */
 Error *blocker;
 
+/** All notifiers are set once in block_job_create() and never modified. */
+
 /** Called when a cancelled job is finalised. */
 Notifier finalize_cancelled_notifier;
 
@@ -70,7 +87,10 @@ typedef struct BlockJob {
 /** Called when the job coroutine yields or terminates */
 Notifier idle_notifier;
 
-/** BlockDriverStates that are involved in this block job */
+/**
+ * BlockDriverStates that are involved in this block job.
+ * Always modified and read under QEMU global mutex (GLOBAL_STATE_CODE).
+ */
 GSList *nodes;
 } BlockJob;
 
-- 
2.31.1



