Re: [PATCH v3 1/1] docs/devel: Add VFIO device migration documentation

2021-04-01 Thread Tarun Gupta (SW-GPU)




On 3/27/2021 11:34 AM, Shenming Lu wrote:

On 2021/3/26 21:18, Tarun Gupta wrote:

Document interfaces used for VFIO device migration. Added flow of state changes
during live migration with VFIO device. Tested by building docs with the new
vfio-migration.rst file.

v3:
- Add introductory line about VM migration in general.
- Remove occurcences of vfio_pin_pages() to describe pinning.
- Incorporated comments from v2

v2:
- Included the new vfio-migration.rst file in index.rst
- Updated dirty page tracking section, also added details about
   'pre-copy-dirty-page-tracking' opt-out option.
- Incorporated comments around wording of doc.

Signed-off-by: Tarun Gupta 
Signed-off-by: Kirti Wankhede 
---
  MAINTAINERS   |   1 +
  docs/devel/index.rst  |   1 +
  docs/devel/vfio-migration.rst | 143 ++
  3 files changed, 145 insertions(+)
  create mode 100644 docs/devel/vfio-migration.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index 738786146d..a2a80eee59 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1801,6 +1801,7 @@ M: Alex Williamson 
  S: Supported
  F: hw/vfio/*
  F: include/hw/vfio/
+F: docs/devel/vfio-migration.rst

  vfio-ccw
  M: Cornelia Huck 
diff --git a/docs/devel/index.rst b/docs/devel/index.rst
index ae664da00c..5330f1ca1d 100644
--- a/docs/devel/index.rst
+++ b/docs/devel/index.rst
@@ -39,3 +39,4 @@ Contents:
 qom
 block-coroutine-wrapper
 multi-process
+   vfio-migration
diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
new file mode 100644
index 00..24cb55991a
--- /dev/null
+++ b/docs/devel/vfio-migration.rst
@@ -0,0 +1,143 @@
+=
+VFIO device Migration
+=
+
+Migration of virtual machine involves saving the state for each device that
+the guest is running on source host and restoring this saved state on the
+destination host. This document details how saving and restoring of VFIO
+devices is done in QEMU.
+
+Migration of VFIO devices consists of two phases: the optional pre-copy phase,
+and the stop-and-copy phase. The pre-copy phase is iterative and allows to
+accommodate VFIO devices that have a large amount of data that needs to be
+transferred. The iterative pre-copy phase of migration allows for the guest to
+continue whilst the VFIO device state is transferred to the destination, this
+helps to reduce the total downtime of the VM. VFIO devices can choose to skip
+the pre-copy phase of migration by returning pending_bytes as zero during the
+pre-copy phase.
+
+A detailed description of the UAPI for VFIO device migration can be found in
+the comment for the ``vfio_device_migration_info`` structure in the header
+file linux-headers/linux/vfio.h.
+
+VFIO device hooks for iterative approach:
+
+* A ``save_setup`` function that sets up the migration region, sets _SAVING
+  flag in the VFIO device state and informs the VFIO IOMMU module to start
+  dirty page tracking.
+
+* A ``load_setup`` function that sets up the migration region on the
+  destination and sets _RESUMING flag in the VFIO device state.
+
+* A ``save_live_pending`` function that reads pending_bytes from the vendor
+  driver, which indicates the amount of data that the vendor driver has yet to
+  save for the VFIO device.
+
+* A ``save_live_iterate`` function that reads the VFIO device's data from the
+  vendor driver through the migration region during iterative phase.
+
+* A ``save_live_complete_precopy`` function that resets _RUNNING flag from the
+  VFIO device state, saves the device config space, if any, and iteratively
+  copies the remaining data for the VFIO device until the vendor driver
+  indicates that no data remains (pending bytes is zero).


Hi Tarun,

We have moved the saving of the config space to the ``save_state`` function
added in commit d329f5032e1, do we need to add this change here? :-)



Thanks Shenming, I'll update it accordingly in the next version.

Thanks,
Tarun


Thanks,
Shenming


+
+* A ``load_state`` function that loads the config section and the data
+  sections that are generated by the save functions above
+
+* ``cleanup`` functions for both save and load that perform any migration
+  related cleanup, including unmapping the migration region
+
+A VM state change handler is registered to change the VFIO device state when
+the VM state changes.
+
+Similarly, a migration state change notifier is registered to get a
+notification on migration state change. These states are translated to the
+corresponding VFIO device state and conveyed to the vendor driver.
+
+System memory dirty pages tracking
+--
+
+A ``log_sync`` memory listener callback marks those system memory pages
+as dirty which are used for DMA by the VFIO device. The dirty pages bitmap is
+queried per container. All pages pinned by the vendor driver through external
+APIs have to be marked as dirty during migration. When there are CPU writes,
+CPU dirty page tracking 

Re: [PATCH] tap-win32: correctly recycle buffers

2021-04-01 Thread Jason Wang



在 2021/3/29 上午11:01, Bin Meng 写道:

On Mon, Mar 29, 2021 at 10:20 AM Jason Wang  wrote:

Commit 969e50b61a28 ("net: Pad short frames to minimum size before
sending from SLiRP/TAP") tries to pad frames but try to recyle the
local array that is used for padding to tap thread. This patch fixes
this by recyling the original buffer.

Fixes: 969e50b61a28 ("net: Pad short frames to minimum size before sending from 
SLiRP/TAP")
Tested-by: Howard Spoelstra 
Signed-off-by: Jason Wang 
---
  net/tap-win32.c | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)


Reviewed-by: Bin Meng 

Tested booting U-Boot with QEMU ppce500 on Windows, without the patch,
QEMU crashes
Tested-by: Bin Meng 



Applied.

Thanks




Re: [PATCH v5 0/7] eBPF RSS support for virtio-net

2021-04-01 Thread Andrew Melnichenko
The skeleton is generated file. Style issues with rss.bpf.c would be fixed
in upcoming patches.

On Thu, Mar 25, 2021 at 5:58 PM  wrote:

> Patchew URL:
> https://patchew.org/QEMU/20210325153529.75831-1-and...@daynix.com/
>
>
>
> Hi,
>
> This series seems to have some coding style problems. See output below for
> more information:
>
> Type: series
> Message-id: 20210325153529.75831-1-and...@daynix.com
> Subject: [PATCH v5 0/7] eBPF RSS support for virtio-net
>
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
> git rev-parse base > /dev/null || exit 0
> git config --local diff.renamelimit 0
> git config --local diff.renames True
> git config --local diff.algorithm histogram
> ./scripts/checkpatch.pl --mailback base..
> === TEST SCRIPT END ===
>
> Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
> Switched to a new branch 'test'
> ad293ca MAINTAINERS: Added eBPF maintainers information.
> 57b0f9a docs: Added eBPF documentation.
> 043dbde virtio-net: Added eBPF RSS to virtio-net.
> aa652c0 ebpf: Added eBPF RSS loader.
> 9f24275 ebpf: Added eBPF RSS program.
> 6a33681 net: Added SetSteeringEBPF method for NetClientState.
> ad82041 net/tap: Added TUNSETSTEERINGEBPF code.
>
> === OUTPUT BEGIN ===
> 1/7 Checking commit ad820417b22d (net/tap: Added TUNSETSTEERINGEBPF code.)
> 2/7 Checking commit 6a33681ca4bf (net: Added SetSteeringEBPF method for
> NetClientState.)
> 3/7 Checking commit 9f24275a1eef (ebpf: Added eBPF RSS program.)
> Use of uninitialized value $acpi_testexpected in string eq at ./scripts/
> checkpatch.pl line 1529.
> WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
> #22:
> new file mode 100755
>
> WARNING: line over 80 characters
> #210: FILE: tools/ebpf/rss.bpf.c:156:
> + * According to
> https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml
>
> WARNING: line over 80 characters
> #213: FILE: tools/ebpf/rss.bpf.c:159:
> + * Need to choose reasonable amount of maximum extensions/options we may
> check to find
>
> ERROR: braces {} are necessary for all arms of this statement
> #235: FILE: tools/ebpf/rss.bpf.c:181:
> +if (err)
> [...]
>
> ERROR: braces {} are necessary for all arms of this statement
> #243: FILE: tools/ebpf/rss.bpf.c:189:
> +if (err)
> [...]
>
> ERROR: braces {} are necessary for all arms of this statement
> #254: FILE: tools/ebpf/rss.bpf.c:200:
> +if (err)
> [...]
>
> ERROR: braces {} are necessary for all arms of this statement
> #271: FILE: tools/ebpf/rss.bpf.c:217:
> +if (err)
> [...]
>
> WARNING: line over 80 characters
> #283: FILE: tools/ebpf/rss.bpf.c:229:
> +*l4_offset + opt_offset + offsetof(struct
> ipv6_destopt_hao, addr),
>
> ERROR: braces {} are necessary for all arms of this statement
> #286: FILE: tools/ebpf/rss.bpf.c:232:
> +if (err)
> [...]
>
> ERROR: braces {} are necessary for all arms of this statement
> #316: FILE: tools/ebpf/rss.bpf.c:262:
> +if (err)
> [...]
>
> ERROR: braces {} are necessary for all arms of this statement
> #330: FILE: tools/ebpf/rss.bpf.c:276:
> +if (err)
> [...]
>
> ERROR: braces {} are necessary for all arms of this statement
> #359: FILE: tools/ebpf/rss.bpf.c:305:
> +if (err)
> [...]
>
> ERROR: braces {} are necessary for all arms of this statement
> #373: FILE: tools/ebpf/rss.bpf.c:319:
> +if (err)
> [...]
>
> ERROR: braces {} are necessary for all arms of this statement
> #383: FILE: tools/ebpf/rss.bpf.c:329:
> +if (err)
> [...]
>
> ERROR: braces {} are necessary for all arms of this statement
> #394: FILE: tools/ebpf/rss.bpf.c:340:
> +if (err)
> [...]
>
> ERROR: braces {} are necessary for all arms of this statement
> #405: FILE: tools/ebpf/rss.bpf.c:351:
> +if (err)
> [...]
>
> ERROR: braces {} are necessary for all arms of this statement
> #429: FILE: tools/ebpf/rss.bpf.c:375:
> +if (err)
> [...]
>
> total: 13 errors, 4 warnings, 574 lines checked
>
> Patch 3/7 has style problems, please review.  If any of these errors
> are false positives report them to the maintainer, see
> CHECKPATCH in MAINTAINERS.
>
> 4/7 Checking commit aa652c04e4a0 (ebpf: Added eBPF RSS loader.)
> Use of uninitialized value $acpi_testexpected in string eq at ./scripts/
> checkpatch.pl line 1529.
> WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
> #71:
> new file mode 100644
>
> WARNING: architecture specific defines should be avoided
> #353: FILE: ebpf/rss.bpf.skeleton.h:4:
> +#ifndef __RSS_BPF_SKEL_H__
>
> ERROR: code indent should never use tabs
> #360: FILE: ebpf/rss.bpf.skeleton.h:11:
> +^Istruct bpf_object_skeleton *skeleton;$
>
> ERROR: code indent should never use tabs
> #361: FILE: ebpf/rss.bpf.skeleton.h:12:
> +^Istruct bpf_object *obj;$
>
> ERROR: code indent should never use tabs
> #362: FILE: ebpf/rss.bpf.skeleton.h:13:
> +^Istruct {$
>
> ERROR: code indent should never use tabs
> #363: FILE: ebpf/rss.bpf.skeleton.h:14:
> +^I^Istruct

Re: [PATCH] tap-bsd: Remove special casing for older OpenBSD releases

2021-04-01 Thread Jason Wang



在 2021/3/30 上午4:38, Brad Smith 写道:

On 3/28/2021 11:58 PM, Jason Wang wrote:



在 2021/3/29 上午11:03, Brad Smith 写道:

It very much is correct. We don't care about such releases anymore.



So is there a doc/wiki to say Qemu doesn't support those OpenBSD 
release?


The (OpenBSD itself and QEMU) project only makes a concerted effort to 
support
two previous releases. I can't remember where in the QEMU Wiki it is 
mentioned.


Just looking at the Meson requirement alone limits us to the previous 
two releases
never mind older. Even if that wasn't a consideration there would be 
issues with
a few other dependencies like Gtk before going back this far to 
support such old

OpenBSD releases.




Ok. So I've applied this patch.

Thanks




Re: [PATCH for-6.0] Revert "target/arm: Make number of counters in PMCR follow the CPU"

2021-04-01 Thread Zenghui Yu

This works for me.

Tested-by: Zenghui Yu 



[PATCH v3 00/11] esp: fix asserts/segfaults discovered by fuzzer

2021-04-01 Thread Mark Cave-Ayland
Recently there have been a number of issues raised on Launchpad as a result of
fuzzing the am53c974 (ESP) device. I spent some time over the past couple of
days checking to see if anything had improved since my last patchset: from
what I can tell the issues are still present, but the cmdfifo related failures
now assert rather than corrupting memory.

This patchset applied to master passes my local tests using the qtest fuzz test
cases added by Alexander for the following Launchpad bugs:

  https://bugs.launchpad.net/qemu/+bug/1919035
  https://bugs.launchpad.net/qemu/+bug/1919036
  https://bugs.launchpad.net/qemu/+bug/1910723
  https://bugs.launchpad.net/qemu/+bug/1909247
  
I'm posting this now just before soft freeze since I see that some of the issues
have recently been allocated CVEs and so it could be argued that even though
they have existed for some time, it is worth fixing them for 6.0.

Signed-off-by: Mark Cave-Ayland 

v3:
- Rebase onto master
- Rearrange patch ordering (move patch 5 to the front) to help reduce cross-talk
  between the regression tests
- Introduce patch 2 to remove unnecessary FIFO usage
- Introduce patches 3-4 to consolidate esp_fifo_pop()/esp_fifo_push() wrapper
  functions to avoid having to introduce 2 variants of esp_fifo_pop_buf()
- Introduce esp_fifo_pop_buf() in patch 5 to prevent callers from overflowing
  the array used to model the FIFO
- Introduce patch 10 to clarify cancellation logic should all occur in the 
.cancel
  SCSI callback rather than at the site of the caller
- Add extra qtests in patch 11 to cover addition test cases provided on LP

v2:
- Add Alexander's R-B tag for patch 2 and Phil's R-B for patch 3
- Add patch 4 for additional testcase provided in Alexander's patch 1 comment
- Move current_req NULL checks forward in DMA functions (fixes ASAN bug reported
  at https://bugs.launchpad.net/qemu/+bug/1909247/comments/6) in patch 3
- Add qtest for am53c974 containing a basic set of regression tests using the
  automatic test cases generated by the fuzzer as requested by Paolo


Mark Cave-Ayland (11):
  esp: always check current_req is not NULL before use in DMA callbacks
  esp: rework write_response() to avoid using the FIFO for DMA
transactions
  esp: consolidate esp_cmdfifo_push() into esp_fifo_push()
  esp: consolidate esp_cmdfifo_pop() into esp_fifo_pop()
  esp: introduce esp_fifo_pop_buf() and use it instead of
fifo8_pop_buf()
  esp: ensure cmdfifo is not empty and current_dev is non-NULL
  esp: don't underflow cmdfifo in do_cmd()
  esp: don't overflow cmdfifo in get_cmd()
  esp: don't overflow cmdfifo if TC is larger than the cmdfifo size
  esp: don't reset async_len directly in esp_select() if cancelling
request
  tests/qtest: add tests for am53c974 device

 MAINTAINERS |   1 +
 hw/scsi/esp.c   | 116 ++-
 tests/qtest/am53c974-test.c | 216 
 tests/qtest/meson.build |   1 +
 4 files changed, 282 insertions(+), 52 deletions(-)
 create mode 100644 tests/qtest/am53c974-test.c

-- 
2.20.1




[PATCH v3 01/11] esp: always check current_req is not NULL before use in DMA callbacks

2021-04-01 Thread Mark Cave-Ayland
After issuing a SCSI command the SCSI layer can call the SCSIBusInfo .cancel
callback which resets both current_req and current_dev to NULL. If any data
is left in the transfer buffer (async_len != 0) then the next TI (Transfer
Information) command will attempt to reference the NULL pointer causing a
segfault.

Buglink: https://bugs.launchpad.net/qemu/+bug/1910723
Buglink: https://bugs.launchpad.net/qemu/+bug/1909247
Signed-off-by: Mark Cave-Ayland 
---
 hw/scsi/esp.c | 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index 507ab363bc..bafea0d4e6 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -496,6 +496,10 @@ static void do_dma_pdma_cb(ESPState *s)
 return;
 }
 
+if (!s->current_req) {
+return;
+}
+
 if (to_device) {
 /* Copy FIFO data to device */
 len = MIN(s->async_len, ESP_FIFO_SZ);
@@ -527,11 +531,9 @@ static void do_dma_pdma_cb(ESPState *s)
 return;
 } else {
 if (s->async_len == 0) {
-if (s->current_req) {
-/* Defer until the scsi layer has completed */
-scsi_req_continue(s->current_req);
-s->data_in_ready = false;
-}
+/* Defer until the scsi layer has completed */
+scsi_req_continue(s->current_req);
+s->data_in_ready = false;
 return;
 }
 
@@ -604,6 +606,9 @@ static void esp_do_dma(ESPState *s)
 }
 return;
 }
+if (!s->current_req) {
+return;
+}
 if (s->async_len == 0) {
 /* Defer until data is available.  */
 return;
@@ -713,6 +718,10 @@ static void esp_do_nodma(ESPState *s)
 return;
 }
 
+if (!s->current_req) {
+return;
+}
+
 if (s->async_len == 0) {
 /* Defer until data is available.  */
 return;
-- 
2.20.1




[PATCH v3 03/11] esp: consolidate esp_cmdfifo_push() into esp_fifo_push()

2021-04-01 Thread Mark Cave-Ayland
Each FIFO currently has its own push functions with the only difference being
the capacity check. The original reason for this was that the fifo8
implementation doesn't have a formal API for retrieving the FIFO capacity,
however there are multiple examples within QEMU where the capacity field is
accessed directly.

Change esp_fifo_push() to access the FIFO capacity directly and then consolidate
esp_cmdfifo_push() into esp_fifo_push().

Signed-off-by: Mark Cave-Ayland 
---
 hw/scsi/esp.c | 27 ---
 1 file changed, 8 insertions(+), 19 deletions(-)

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index 26fe1dcb9d..16aaf8be93 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -98,16 +98,15 @@ void esp_request_cancelled(SCSIRequest *req)
 }
 }
 
-static void esp_fifo_push(ESPState *s, uint8_t val)
+static void esp_fifo_push(Fifo8 *fifo, uint8_t val)
 {
-if (fifo8_num_used(&s->fifo) == ESP_FIFO_SZ) {
+if (fifo8_num_used(fifo) == fifo->capacity) {
 trace_esp_error_fifo_overrun();
 return;
 }
 
-fifo8_push(&s->fifo, val);
+fifo8_push(fifo, val);
 }
-
 static uint8_t esp_fifo_pop(ESPState *s)
 {
 if (fifo8_is_empty(&s->fifo)) {
@@ -117,16 +116,6 @@ static uint8_t esp_fifo_pop(ESPState *s)
 return fifo8_pop(&s->fifo);
 }
 
-static void esp_cmdfifo_push(ESPState *s, uint8_t val)
-{
-if (fifo8_num_used(&s->cmdfifo) == ESP_CMDFIFO_SZ) {
-trace_esp_error_fifo_overrun();
-return;
-}
-
-fifo8_push(&s->cmdfifo, val);
-}
-
 static uint8_t esp_cmdfifo_pop(ESPState *s)
 {
 if (fifo8_is_empty(&s->cmdfifo)) {
@@ -187,9 +176,9 @@ static void esp_pdma_write(ESPState *s, uint8_t val)
 }
 
 if (s->do_cmd) {
-esp_cmdfifo_push(s, val);
+esp_fifo_push(&s->cmdfifo, val);
 } else {
-esp_fifo_push(s, val);
+esp_fifo_push(&s->fifo, val);
 }
 
 dmalen--;
@@ -645,7 +634,7 @@ static void esp_do_dma(ESPState *s)
  */
 if (len < esp_get_tc(s) && esp_get_tc(s) <= ESP_FIFO_SZ) {
 while (fifo8_num_used(&s->fifo) < ESP_FIFO_SZ) {
-esp_fifo_push(s, 0);
+esp_fifo_push(&s->fifo, 0);
 len++;
 }
 }
@@ -947,9 +936,9 @@ void esp_reg_write(ESPState *s, uint32_t saddr, uint64_t 
val)
 break;
 case ESP_FIFO:
 if (s->do_cmd) {
-esp_cmdfifo_push(s, val);
+esp_fifo_push(&s->cmdfifo, val);
 } else {
-esp_fifo_push(s, val);
+esp_fifo_push(&s->fifo, val);
 }
 
 /* Non-DMA transfers raise an interrupt after every byte */
-- 
2.20.1




[PATCH v3 09/11] esp: don't overflow cmdfifo if TC is larger than the cmdfifo size

2021-04-01 Thread Mark Cave-Ayland
If a guest transfers the message out/command phase data using DMA with a TC
that is larger than the cmdfifo size then the cmdfifo overflows triggering
an assert. Limit the size of the transfer to the free space available in
cmdfifo.

Buglink: https://bugs.launchpad.net/qemu/+bug/1919036
Signed-off-by: Mark Cave-Ayland 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Alexander Bulekov 
---
 hw/scsi/esp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index c547c60395..b7f2680617 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -577,6 +577,7 @@ static void esp_do_dma(ESPState *s)
 cmdlen = fifo8_num_used(&s->cmdfifo);
 trace_esp_do_dma(cmdlen, len);
 if (s->dma_memory_read) {
+len = MIN(len, fifo8_num_free(&s->cmdfifo));
 s->dma_memory_read(s->dma_opaque, buf, len);
 fifo8_push_all(&s->cmdfifo, buf, len);
 } else {
-- 
2.20.1




[PATCH v3 02/11] esp: rework write_response() to avoid using the FIFO for DMA transactions

2021-04-01 Thread Mark Cave-Ayland
The code for write_response() has always used the FIFO to store the data for
the status/message in phases, even for DMA transactions. Switch to using a
separate buffer that can be used directly for DMA transactions and restrict
the FIFO use to the non-DMA case.

Signed-off-by: Mark Cave-Ayland 
---
 hw/scsi/esp.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index bafea0d4e6..26fe1dcb9d 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -445,18 +445,16 @@ static void write_response_pdma_cb(ESPState *s)
 
 static void write_response(ESPState *s)
 {
-uint32_t n;
+uint8_t buf[2];
 
 trace_esp_write_response(s->status);
 
-fifo8_reset(&s->fifo);
-esp_fifo_push(s, s->status);
-esp_fifo_push(s, 0);
+buf[0] = s->status;
+buf[1] = 0;
 
 if (s->dma) {
 if (s->dma_memory_write) {
-s->dma_memory_write(s->dma_opaque,
-(uint8_t *)fifo8_pop_buf(&s->fifo, 2, &n), 2);
+s->dma_memory_write(s->dma_opaque, buf, 2);
 s->rregs[ESP_RSTAT] = STAT_TC | STAT_ST;
 s->rregs[ESP_RINTR] |= INTR_BS | INTR_FC;
 s->rregs[ESP_RSEQ] = SEQ_CD;
@@ -466,7 +464,8 @@ static void write_response(ESPState *s)
 return;
 }
 } else {
-s->ti_size = 2;
+fifo8_reset(&s->fifo);
+fifo8_push_all(&s->fifo, buf, 2);
 s->rregs[ESP_RFLAGS] = 2;
 }
 esp_raise_irq(s);
-- 
2.20.1




[PATCH v3 11/11] tests/qtest: add tests for am53c974 device

2021-04-01 Thread Mark Cave-Ayland
Use the autogenerated fuzzer test cases as the basis for a set of am53c974
regression tests.

Signed-off-by: Mark Cave-Ayland 
---
 MAINTAINERS |   1 +
 tests/qtest/am53c974-test.c | 216 
 tests/qtest/meson.build |   1 +
 3 files changed, 218 insertions(+)
 create mode 100644 tests/qtest/am53c974-test.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 554be84b32..675f35d3af 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1776,6 +1776,7 @@ F: include/hw/scsi/*
 F: hw/scsi/*
 F: tests/qtest/virtio-scsi-test.c
 F: tests/qtest/fuzz-virtio-scsi-test.c
+F: tests/qtest/am53c974-test.c
 T: git https://github.com/bonzini/qemu.git scsi-next
 
 SSI
diff --git a/tests/qtest/am53c974-test.c b/tests/qtest/am53c974-test.c
new file mode 100644
index 00..9c4285d0c0
--- /dev/null
+++ b/tests/qtest/am53c974-test.c
@@ -0,0 +1,216 @@
+/*
+ * QTest testcase for am53c974
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later. See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+
+#include "libqos/libqtest.h"
+
+
+static void test_cmdfifo_underflow_ok(void)
+{
+QTestState *s = qtest_init(
+"-device am53c974,id=scsi "
+"-device scsi-hd,drive=disk0 -drive "
+"id=disk0,if=none,file=null-co://,format=raw -nodefaults");
+qtest_outl(s, 0xcf8, 0x80001004);
+qtest_outw(s, 0xcfc, 0x01);
+qtest_outl(s, 0xcf8, 0x8000100e);
+qtest_outl(s, 0xcfc, 0x8a00);
+qtest_outl(s, 0x8a09, 0x4200);
+qtest_outl(s, 0x8a0d, 0x00);
+qtest_outl(s, 0x8a0b, 0x1000);
+qtest_quit(s);
+}
+
+/* Reported as crash_1548bd10e7 */
+static void test_cmdfifo_underflow2_ok(void)
+{
+QTestState *s = qtest_init(
+"-m 512M -device am53c974,id=scsi -device scsi-hd,drive=disk0 "
+"-drive id=disk0,if=none,file=null-co://,format=raw -nodefaults");
+qtest_outl(s, 0xcf8, 0x80001010);
+qtest_outl(s, 0xcfc, 0xc000);
+qtest_outl(s, 0xcf8, 0x80001004);
+qtest_outw(s, 0xcfc, 0x01);
+qtest_outw(s, 0xc00c, 0x41);
+qtest_outw(s, 0xc00a, 0x00);
+qtest_outl(s, 0xc00a, 0x00);
+qtest_outw(s, 0xc00c, 0x43);
+qtest_outw(s, 0xc00b, 0x00);
+qtest_outw(s, 0xc00b, 0x00);
+qtest_outw(s, 0xc00c, 0x00);
+qtest_outl(s, 0xc00a, 0x00);
+qtest_outw(s, 0xc00a, 0x00);
+qtest_outl(s, 0xc00a, 0x00);
+qtest_outw(s, 0xc00c, 0x00);
+qtest_outl(s, 0xc00a, 0x00);
+qtest_outw(s, 0xc00a, 0x00);
+qtest_outl(s, 0xc00a, 0x00);
+qtest_outw(s, 0xc00c, 0x00);
+qtest_outl(s, 0xc00a, 0x00);
+qtest_outw(s, 0xc00a, 0x00);
+qtest_outl(s, 0xc00a, 0x00);
+qtest_outw(s, 0xc00c, 0x00);
+qtest_outl(s, 0xc00a, 0x00);
+qtest_outl(s, 0xc006, 0x00);
+qtest_outl(s, 0xc00b, 0x00);
+qtest_outw(s, 0xc00b, 0x0800);
+qtest_outw(s, 0xc00b, 0x00);
+qtest_outw(s, 0xc00b, 0x00);
+qtest_outl(s, 0xc006, 0x00);
+qtest_outl(s, 0xc00b, 0x00);
+qtest_outw(s, 0xc00b, 0x0800);
+qtest_outw(s, 0xc00b, 0x00);
+qtest_outw(s, 0xc00b, 0x4100);
+qtest_outw(s, 0xc00a, 0x00);
+qtest_outl(s, 0xc00a, 0x10);
+qtest_outl(s, 0xc00a, 0x00);
+qtest_outw(s, 0xc00c, 0x43);
+qtest_outl(s, 0xc00a, 0x10);
+qtest_outl(s, 0xc00a, 0x10);
+qtest_quit(s);
+}
+
+static void test_cmdfifo_overflow_ok(void)
+{
+QTestState *s = qtest_init(
+"-device am53c974,id=scsi "
+"-device scsi-hd,drive=disk0 -drive "
+"id=disk0,if=none,file=null-co://,format=raw -nodefaults");
+qtest_outl(s, 0xcf8, 0x80001004);
+qtest_outw(s, 0xcfc, 0x01);
+qtest_outl(s, 0xcf8, 0x8000100e);
+qtest_outl(s, 0xcfc, 0x0e00);
+qtest_outl(s, 0xe40, 0x03);
+qtest_outl(s, 0xe0b, 0x4100);
+qtest_outl(s, 0xe0b, 0x9000);
+qtest_quit(s);
+}
+
+/* Reported as crash_530ff2e211 */
+static void test_cmdfifo_overflow2_ok(void)
+{
+QTestState *s = qtest_init(
+"-m 512M -device am53c974,id=scsi -device scsi-hd,drive=disk0 "
+"-drive id=disk0,if=none,file=null-co://,format=raw -nodefaults");
+qtest_outl(s, 0xcf8, 0x80001010);
+qtest_outl(s, 0xcfc, 0xc000);
+qtest_outl(s, 0xcf8, 0x80001004);
+qtest_outw(s, 0xcfc, 0x01);
+qtest_outl(s, 0xc00b, 0x4100);
+qtest_outw(s, 0xc00b, 0xc200);
+qtest_outl(s, 0xc03f, 0x0300);
+qtest_quit(s);
+}
+
+/* Reported as crash_0900379669 */
+static void test_fifo_pop_buf(void)
+{
+QTestState *s = qtest_init(
+"-m 512M -device am53c974,id=scsi -device scsi-hd,drive=disk0 "
+"-drive id=disk0,if=none,file=null-co://,format=raw -nodefaults");
+qtest_outl(s, 0xcf8, 0x80001010);
+qtest_outl(s, 0xcfc, 0xc000);
+qtest_outl(s, 0xcf8, 0x80001004);
+qtest_outw(s, 0xcfc, 0x01);
+qtest_outb(s, 0xc000, 0x4);
+qtest_outb(s, 0xc008, 0xa0);
+qtest_outl(s, 0xc03f, 0x0300);
+qtest_outl(s, 0xc00b, 0xc300);
+qtest_outw(s, 0xc00b, 0x9000);
+qtest_outl(s, 0xc00b, 0xc300);
+

[PATCH v3 04/11] esp: consolidate esp_cmdfifo_pop() into esp_fifo_pop()

2021-04-01 Thread Mark Cave-Ayland
Each FIFO currently has its own pop functions with the only difference being
the capacity check. The original reason for this was that the fifo8
implementation doesn't have a formal API for retrieving the FIFO capacity,
however there are multiple examples within QEMU where the capacity field is
accessed directly.

Change esp_fifo_pop() to access the FIFO capacity directly and then consolidate
esp_cmdfifo_pop() into esp_fifo_pop().

Signed-off-by: Mark Cave-Ayland 
---
 hw/scsi/esp.c | 21 ++---
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index 16aaf8be93..ce88866803 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -107,22 +107,13 @@ static void esp_fifo_push(Fifo8 *fifo, uint8_t val)
 
 fifo8_push(fifo, val);
 }
-static uint8_t esp_fifo_pop(ESPState *s)
+static uint8_t esp_fifo_pop(Fifo8 *fifo)
 {
-if (fifo8_is_empty(&s->fifo)) {
+if (fifo8_is_empty(fifo)) {
 return 0;
 }
 
-return fifo8_pop(&s->fifo);
-}
-
-static uint8_t esp_cmdfifo_pop(ESPState *s)
-{
-if (fifo8_is_empty(&s->cmdfifo)) {
-return 0;
-}
-
-return fifo8_pop(&s->cmdfifo);
+return fifo8_pop(fifo);
 }
 
 static uint32_t esp_get_tc(ESPState *s)
@@ -159,9 +150,9 @@ static uint8_t esp_pdma_read(ESPState *s)
 uint8_t val;
 
 if (s->do_cmd) {
-val = esp_cmdfifo_pop(s);
+val = esp_fifo_pop(&s->cmdfifo);
 } else {
-val = esp_fifo_pop(s);
+val = esp_fifo_pop(&s->fifo);
 }
 
 return val;
@@ -887,7 +878,7 @@ uint64_t esp_reg_read(ESPState *s, uint32_t saddr)
 qemu_log_mask(LOG_UNIMP, "esp: PIO data read not implemented\n");
 s->rregs[ESP_FIFO] = 0;
 } else {
-s->rregs[ESP_FIFO] = esp_fifo_pop(s);
+s->rregs[ESP_FIFO] = esp_fifo_pop(&s->fifo);
 }
 val = s->rregs[ESP_FIFO];
 break;
-- 
2.20.1




Re: Status update for maintainers file

2021-04-01 Thread Philippe Mathieu-Daudé
On 3/31/21 7:42 PM, Sarah Harris wrote:
> Hi all,
> 
> I was added as a reviewer (in MAINTAINERS) for the AVR target for the 
> duration of my research work using it.
> The funding for my project expires in the middle of April, so I will not be 
> able to provide time for reviewing patches from that point.

Thank you Sarah for your AVR reviews the last years, they
have been of great value for the project and community.

Best regards,

Phil.



[PATCH v3 05/11] esp: introduce esp_fifo_pop_buf() and use it instead of fifo8_pop_buf()

2021-04-01 Thread Mark Cave-Ayland
The const pointer returned by fifo8_pop_buf() lies directly within the array 
used
to model the FIFO. Building with address sanitisers enabled shows that if the
caller expects a minimum number of bytes present then if the FIFO is nearly 
full,
the caller may unexpectedly access past the end of the array.

Introduce esp_fifo_pop_buf() which takes a destination buffer and performs a
memcpy() in it to guarantee that the caller cannot overwrite the FIFO array and
update all callers to use it. Similarly add underflow protection similar to
esp_fifo_push() and esp_fifo_pop() so that instead of triggering an assert()
the operation becomes a no-op.

Buglink: https://bugs.launchpad.net/qemu/+bug/1909247
Signed-off-by: Mark Cave-Ayland 
---
 hw/scsi/esp.c | 41 +
 1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index ce88866803..1aa2caf57d 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -107,6 +107,7 @@ static void esp_fifo_push(Fifo8 *fifo, uint8_t val)
 
 fifo8_push(fifo, val);
 }
+
 static uint8_t esp_fifo_pop(Fifo8 *fifo)
 {
 if (fifo8_is_empty(fifo)) {
@@ -116,6 +117,23 @@ static uint8_t esp_fifo_pop(Fifo8 *fifo)
 return fifo8_pop(fifo);
 }
 
+static uint32_t esp_fifo_pop_buf(Fifo8 *fifo, uint8_t *dest, int maxlen)
+{
+const uint8_t *buf;
+uint32_t n;
+
+if (maxlen == 0) {
+return 0;
+}
+
+buf = fifo8_pop_buf(fifo, maxlen, &n);
+if (dest) {
+memcpy(dest, buf, n);
+}
+
+return n;
+}
+
 static uint32_t esp_get_tc(ESPState *s)
 {
 uint32_t dmalen;
@@ -240,11 +258,11 @@ static uint32_t get_cmd(ESPState *s, uint32_t maxlen)
 if (dmalen == 0) {
 return 0;
 }
-memcpy(buf, fifo8_pop_buf(&s->fifo, dmalen, &n), dmalen);
-if (dmalen >= 3) {
+n = esp_fifo_pop_buf(&s->fifo, buf, dmalen);
+if (n >= 3) {
 buf[0] = buf[2] >> 5;
 }
-fifo8_push_all(&s->cmdfifo, buf, dmalen);
+fifo8_push_all(&s->cmdfifo, buf, n);
 }
 trace_esp_get_cmd(dmalen, target);
 
@@ -257,16 +275,16 @@ static uint32_t get_cmd(ESPState *s, uint32_t maxlen)
 
 static void do_busid_cmd(ESPState *s, uint8_t busid)
 {
-uint32_t n, cmdlen;
+uint32_t cmdlen;
 int32_t datalen;
 int lun;
 SCSIDevice *current_lun;
-uint8_t *buf;
+uint8_t buf[ESP_CMDFIFO_SZ];
 
 trace_esp_do_busid_cmd(busid);
 lun = busid & 7;
 cmdlen = fifo8_num_used(&s->cmdfifo);
-buf = (uint8_t *)fifo8_pop_buf(&s->cmdfifo, cmdlen, &n);
+esp_fifo_pop_buf(&s->cmdfifo, buf, cmdlen);
 
 current_lun = scsi_device_find(&s->bus, 0, s->current_dev->id, lun);
 s->current_req = scsi_req_new(current_lun, 0, lun, buf, s);
@@ -299,13 +317,12 @@ static void do_busid_cmd(ESPState *s, uint8_t busid)
 static void do_cmd(ESPState *s)
 {
 uint8_t busid = fifo8_pop(&s->cmdfifo);
-uint32_t n;
 
 s->cmdfifo_cdb_offset--;
 
 /* Ignore extended messages for now */
 if (s->cmdfifo_cdb_offset) {
-fifo8_pop_buf(&s->cmdfifo, s->cmdfifo_cdb_offset, &n);
+esp_fifo_pop_buf(&s->cmdfifo, NULL, s->cmdfifo_cdb_offset);
 s->cmdfifo_cdb_offset = 0;
 }
 
@@ -483,7 +500,7 @@ static void do_dma_pdma_cb(ESPState *s)
 /* Copy FIFO data to device */
 len = MIN(s->async_len, ESP_FIFO_SZ);
 len = MIN(len, fifo8_num_used(&s->fifo));
-memcpy(s->async_buf, fifo8_pop_buf(&s->fifo, len, &n), len);
+n = esp_fifo_pop_buf(&s->fifo, s->async_buf, len);
 s->async_buf += n;
 s->async_len -= n;
 s->ti_size += n;
@@ -491,7 +508,7 @@ static void do_dma_pdma_cb(ESPState *s)
 if (n < len) {
 /* Unaligned accesses can cause FIFO wraparound */
 len = len - n;
-memcpy(s->async_buf, fifo8_pop_buf(&s->fifo, len, &n), len);
+n = esp_fifo_pop_buf(&s->fifo, s->async_buf, len);
 s->async_buf += n;
 s->async_len -= n;
 s->ti_size += n;
@@ -667,7 +684,7 @@ static void esp_do_dma(ESPState *s)
 static void esp_do_nodma(ESPState *s)
 {
 int to_device = ((s->rregs[ESP_RSTAT] & 7) == STAT_DO);
-uint32_t cmdlen, n;
+uint32_t cmdlen;
 int len;
 
 if (s->do_cmd) {
@@ -708,7 +725,7 @@ static void esp_do_nodma(ESPState *s)
 
 if (to_device) {
 len = MIN(fifo8_num_used(&s->fifo), ESP_FIFO_SZ);
-memcpy(s->async_buf, fifo8_pop_buf(&s->fifo, len, &n), len);
+esp_fifo_pop_buf(&s->fifo, s->async_buf, len);
 s->async_buf += len;
 s->async_len -= len;
 s->ti_size += len;
-- 
2.20.1




[PATCH v3 06/11] esp: ensure cmdfifo is not empty and current_dev is non-NULL

2021-04-01 Thread Mark Cave-Ayland
When about to execute a SCSI command, ensure that cmdfifo is not empty and
current_dev is non-NULL. This can happen if the guest tries to execute a TI
(Transfer Information) command without issuing one of the select commands
first.

Buglink: https://bugs.launchpad.net/qemu/+bug/1910723
Buglink: https://bugs.launchpad.net/qemu/+bug/1909247
Signed-off-by: Mark Cave-Ayland 
---
 hw/scsi/esp.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index 1aa2caf57d..4decbbfc29 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -284,6 +284,9 @@ static void do_busid_cmd(ESPState *s, uint8_t busid)
 trace_esp_do_busid_cmd(busid);
 lun = busid & 7;
 cmdlen = fifo8_num_used(&s->cmdfifo);
+if (!cmdlen || !s->current_dev) {
+return;
+}
 esp_fifo_pop_buf(&s->cmdfifo, buf, cmdlen);
 
 current_lun = scsi_device_find(&s->bus, 0, s->current_dev->id, lun);
-- 
2.20.1




[PATCH v3 07/11] esp: don't underflow cmdfifo in do_cmd()

2021-04-01 Thread Mark Cave-Ayland
If the guest tries to execute a CDB when cmdfifo is not empty before the start
of the message out phase then clearing the message out phase data will cause
cmdfifo to underflow due to cmdfifo_cdb_offset being larger than the amount of
data within.

Since this can only occur by issuing deliberately incorrect instruction
sequences, ensure that the maximum length of esp_fifo_pop_buf() is limited to
the size of the data within cmdfifo.

Buglink: https://bugs.launchpad.net/qemu/+bug/1909247
Signed-off-by: Mark Cave-Ayland 
---
 hw/scsi/esp.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index 4decbbfc29..7f49522e1d 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -319,13 +319,15 @@ static void do_busid_cmd(ESPState *s, uint8_t busid)
 
 static void do_cmd(ESPState *s)
 {
-uint8_t busid = fifo8_pop(&s->cmdfifo);
+uint8_t busid = esp_fifo_pop(&s->cmdfifo);
+int len;
 
 s->cmdfifo_cdb_offset--;
 
 /* Ignore extended messages for now */
 if (s->cmdfifo_cdb_offset) {
-esp_fifo_pop_buf(&s->cmdfifo, NULL, s->cmdfifo_cdb_offset);
+len = MIN(s->cmdfifo_cdb_offset, fifo8_num_used(&s->cmdfifo));
+esp_fifo_pop_buf(&s->cmdfifo, NULL, len);
 s->cmdfifo_cdb_offset = 0;
 }
 
-- 
2.20.1




Re: [PATCH] MAINTAINERS: replace Huawei's email to personal one

2021-04-01 Thread Philippe Mathieu-Daudé
On 4/1/21 8:34 AM, Thomas Huth wrote:
> On 23/03/2021 05.04, Dongjiu Geng wrote:
>> ping...
>>
>> sorry for the noise.
>> On 3/11/2021 19:29,Dongjiu Geng
>>  wrote:
>>
>>     In order to conveniently receive email, replace the Huawei
>>     email address with my personal one.
>>
>>     Signed-off-by: Dongjiu Geng 
>>     ---
>>     MAINTAINERS | 2 +-
>>     1 file changed, 1 insertion(+), 1 deletion(-)
>>
>>     diff --git a/MAINTAINERS b/MAINTAINERS
>>     index e04ae21..823b98b 100644
>>     --- a/MAINTAINERS
>>     +++ b/MAINTAINERS
>>     @@ -1711,7 +1711,7 @@ F: tests/qtest/acpi-utils.[hc]
>>     F: tests/data/acpi/
>>
>>     ACPI/HEST/GHES
>>     -R: Dongjiu Geng 
>>     +R: Dongjiu Geng 

We prefer the mail to be sent/signed by the listed address,
and acked by the updated address. But we should have tell
you that earlier, when you still had access to your mail.

>>     R: Xiang Zheng 
>>     L: qemu-...@nongnu.org
>>     S: Maintained
> 
> I'm currently assembling a pull-request that contains some other updates
> to MAINTAINERS, too. I'll add your patch there.

A pair of candidates:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg790977.html



[PATCH v3 08/11] esp: don't overflow cmdfifo in get_cmd()

2021-04-01 Thread Mark Cave-Ayland
If the guest tries to read a CDB using DMA and cmdfifo is not empty then it is
possible to overflow cmdfifo.

Since this can only occur by issuing deliberately incorrect instruction
sequences, ensure that the maximum length of the CDB transferred to cmdfifo is
limited to the available free space within cmdfifo.

Buglink: https://bugs.launchpad.net/qemu/+bug/1909247
Signed-off-by: Mark Cave-Ayland 
---
 hw/scsi/esp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index 7f49522e1d..c547c60395 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -243,6 +243,7 @@ static uint32_t get_cmd(ESPState *s, uint32_t maxlen)
 }
 if (s->dma_memory_read) {
 s->dma_memory_read(s->dma_opaque, buf, dmalen);
+dmalen = MIN(fifo8_num_free(&s->fifo), dmalen);
 fifo8_push_all(&s->cmdfifo, buf, dmalen);
 } else {
 if (esp_select(s) < 0) {
-- 
2.20.1




Re: [PATCH] docs: Fix typo in the default name of the qemu-system-x86_64 binary

2021-04-01 Thread Philippe Mathieu-Daudé
On 4/1/21 7:18 AM, Thomas Huth wrote:
> It's a '-' between 'qemu' and 'system', not a '_'.
> 
> Fixes: 324b2298fe ("docs/system: convert Texinfo documentation to rST")
> Signed-off-by: Thomas Huth 
> ---
>  docs/defs.rst.inc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Philippe Mathieu-Daudé 





[PATCH v3 10/11] esp: don't reset async_len directly in esp_select() if cancelling request

2021-04-01 Thread Mark Cave-Ayland
Instead let the SCSI layer invoke the .cancel callback itself to cancel and
reset the request state.

Signed-off-by: Mark Cave-Ayland 
---
 hw/scsi/esp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index b7f2680617..ca062a0400 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -95,6 +95,7 @@ void esp_request_cancelled(SCSIRequest *req)
 scsi_req_unref(s->current_req);
 s->current_req = NULL;
 s->current_dev = NULL;
+s->async_len = 0;
 }
 }
 
@@ -206,7 +207,6 @@ static int esp_select(ESPState *s)
 if (s->current_req) {
 /* Started a new command before the old one finished.  Cancel it.  */
 scsi_req_cancel(s->current_req);
-s->async_len = 0;
 }
 
 s->current_dev = scsi_device_find(&s->bus, 0, target, 0);
-- 
2.20.1




Re: [PATCH v2 0/6] esp: fix asserts/segfaults discovered by fuzzer

2021-04-01 Thread Mark Cave-Ayland

On 30/03/2021 10:59, Paolo Bonzini wrote:


Hi,

I also had some failures of the tests on CI, which is why I hadn't incorporated these 
changes yet.  Thanks for the advance warning, I'll wait for your v3.


Paolo


Hi Paolo,

I've just posted the latest v3 which passes all my local boot tests and the extra 
test cases posted to LP. There is one failure on Gitlab CI but that is for the 
clang-user build for tcg-tests-s390x-linux-user which is an existing issue.


Apologies it took a bit longer than expected: my laptop isn't the fastest in the 
world and booting everything will full debug and ASAN across several targets is 
tremendously slow :/


Also it seems there is something wrong with the qtest dependencies: for my current 
build of ~2900 files, the final commit to add the qtest for am53c974 which adds the 
test to test/qtest/meson.build causes ~2100 of those files to be rebuilt :(



ATB,

Mark.



Re: [PATCH v3 2/4] block: check for sys/disk.h

2021-04-01 Thread Philippe Mathieu-Daudé
On 4/1/21 7:08 AM, Joelle van Dyne wrote:
> On Mon, Mar 15, 2021 at 11:03 AM Joelle van Dyne  wrote:
>>
>> Some BSD platforms do not have this header.
>>
>> Reviewed-by: Peter Maydell 
>> Reviewed-by: Philippe Mathieu-Daudé 
>> Signed-off-by: Joelle van Dyne 
> 
> Please bear with me as I am still new to this, but what happens to the
> three patches that are reviewed if the last patch does not get
> reviewed? Do the reviewed patches still get to make it into 6.0? I am
> willing to drop the unreviewed patch if there are issues. Thanks.

I guess this is bad timing, as this time in year various maintainers
are on vacations. Cc'ing Paolo as he sometimes take generic/block
patches.

Regards,

Phil.



[PATCH v5 00/10] Fixed some bugs and optimized some codes for COLO

2021-04-01 Thread leirao
From: Rao,Lei 

Changes since v4:
--Replaced qemu_mutex_lock calls with QEMU_LOCK_GUARD in 
colo_bitmap_clear_dirty.
--Modify some minor issues about variable definition.
--Add some performance test data in the commit message.

Changes since v3:
--Remove cpu_throttle_stop from mig_throttle_counter_reset.

Changes since v2:
--Add a function named packet_new_nocopy.
--Continue to optimize the function of colo_flush_ram_cache.

Changes since v1:
--Reset the state of the auto-converge counters at every checkpoint 
instead of directly disabling.
--Treat the filter_send function returning zero as a normal case.

The series of patches include:
Fixed some bugs of qemu crash.
Optimized some code to reduce the time of checkpoint.
Remove some unnecessary code to improve COLO.

Rao, Lei (10):
  Remove some duplicate trace code.
  Fix the qemu crash when guest shutdown during checkpoint
  Optimize the function of filter_send
  Remove migrate_set_block_enabled in checkpoint
  Add a function named packet_new_nocopy for COLO.
  Add the function of colo_compare_cleanup
  Reset the auto-converge counter at every checkpoint.
  Reduce the PVM stop time during Checkpoint
  Add the function of colo_bitmap_clear_dirty
  Fixed calculation error of pkt->header_size in fill_pkt_tcp_info()

 migration/colo.c  | 10 +++
 migration/migration.c |  4 +++
 migration/ram.c   | 83 +--
 migration/ram.h   |  1 +
 net/colo-compare.c| 25 +++-
 net/colo-compare.h|  1 +
 net/colo.c| 23 ++
 net/colo.h|  1 +
 net/filter-mirror.c   |  8 ++---
 net/filter-rewriter.c |  3 +-
 net/net.c |  4 +++
 softmmu/runstate.c|  1 +
 12 files changed, 135 insertions(+), 29 deletions(-)

-- 
1.8.3.1




[PATCH v5 02/10] Fix the qemu crash when guest shutdown during checkpoint

2021-04-01 Thread leirao
From: "Rao, Lei" 

This patch fixes the following:
qemu-system-x86_64: invalid runstate transition: 'colo' ->'shutdown'
Aborted (core dumped)

Signed-off-by: Lei Rao 
Reviewed-by: Li Zhijian 
---
 softmmu/runstate.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index ce8977c..1564057 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -126,6 +126,7 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 { RUN_STATE_RESTORE_VM, RUN_STATE_PRELAUNCH },
 
 { RUN_STATE_COLO, RUN_STATE_RUNNING },
+{ RUN_STATE_COLO, RUN_STATE_SHUTDOWN},
 
 { RUN_STATE_RUNNING, RUN_STATE_DEBUG },
 { RUN_STATE_RUNNING, RUN_STATE_INTERNAL_ERROR },
-- 
1.8.3.1




[PATCH v5 01/10] Remove some duplicate trace code.

2021-04-01 Thread leirao
From: "Rao, Lei" 

There is the same trace code in the colo_compare_packet_payload.

Signed-off-by: Lei Rao 
Reviewed-by: Li Zhijian 
---
 net/colo-compare.c | 13 -
 1 file changed, 13 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 9d1ad99..c142c08 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -590,19 +590,6 @@ static int colo_packet_compare_other(Packet *spkt, Packet 
*ppkt)
 uint16_t offset = ppkt->vnet_hdr_len;
 
 trace_colo_compare_main("compare other");
-if (trace_event_get_state_backends(TRACE_COLO_COMPARE_IP_INFO)) {
-char pri_ip_src[20], pri_ip_dst[20], sec_ip_src[20], sec_ip_dst[20];
-
-strcpy(pri_ip_src, inet_ntoa(ppkt->ip->ip_src));
-strcpy(pri_ip_dst, inet_ntoa(ppkt->ip->ip_dst));
-strcpy(sec_ip_src, inet_ntoa(spkt->ip->ip_src));
-strcpy(sec_ip_dst, inet_ntoa(spkt->ip->ip_dst));
-
-trace_colo_compare_ip_info(ppkt->size, pri_ip_src,
-   pri_ip_dst, spkt->size,
-   sec_ip_src, sec_ip_dst);
-}
-
 if (ppkt->size != spkt->size) {
 trace_colo_compare_main("Other: payload size of packets are 
different");
 return -1;
-- 
1.8.3.1




[PATCH v5 07/10] Reset the auto-converge counter at every checkpoint.

2021-04-01 Thread leirao
From: "Rao, Lei" 

if we don't reset the auto-converge counter,
it will continue to run with COLO running,
and eventually the system will hang due to the
CPU throttle reaching DEFAULT_MIGRATE_MAX_CPU_THROTTLE.

Signed-off-by: Lei Rao 
Reviewed-by: Dr. David Alan Gilbert 
---
 migration/colo.c | 4 
 migration/ram.c  | 9 +
 migration/ram.h  | 1 +
 3 files changed, 14 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 1aaf316..723ffb8 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -459,6 +459,10 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s,
 if (ret < 0) {
 goto out;
 }
+
+if (migrate_auto_converge()) {
+mig_throttle_counter_reset();
+}
 /*
  * Only save VM's live state, which not including device state.
  * TODO: We may need a timeout mechanism to prevent COLO process
diff --git a/migration/ram.c b/migration/ram.c
index 40e7895..c69a8e0 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -652,6 +652,15 @@ static void mig_throttle_guest_down(uint64_t 
bytes_dirty_period,
 }
 }
 
+void mig_throttle_counter_reset(void)
+{
+RAMState *rs = ram_state;
+
+rs->time_last_bitmap_sync = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+rs->num_dirty_pages_period = 0;
+rs->bytes_xfer_prev = ram_counters.transferred;
+}
+
 /**
  * xbzrle_cache_zero_page: insert a zero page in the XBZRLE cache
  *
diff --git a/migration/ram.h b/migration/ram.h
index 6378bb3..3f78175 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -50,6 +50,7 @@ bool ramblock_is_ignored(RAMBlock *block);
 int xbzrle_cache_resize(uint64_t new_size, Error **errp);
 uint64_t ram_bytes_remaining(void);
 uint64_t ram_bytes_total(void);
+void mig_throttle_counter_reset(void);
 
 uint64_t ram_pagesize_summary(void);
 int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len);
-- 
1.8.3.1




[PATCH v5 04/10] Remove migrate_set_block_enabled in checkpoint

2021-04-01 Thread leirao
From: "Rao, Lei" 

We can detect disk migration in migrate_prepare, if disk migration
is enabled in COLO mode, we can directly report an error.and there
is no need to disable block migration at every checkpoint.

Signed-off-by: Lei Rao 
Signed-off-by: Zhang Chen 
Reviewed-by: Li Zhijian 
---
 migration/colo.c  | 6 --
 migration/migration.c | 4 
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index de27662..1aaf316 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -435,12 +435,6 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s,
 if (failover_get_state() != FAILOVER_STATUS_NONE) {
 goto out;
 }
-
-/* Disable block migration */
-migrate_set_block_enabled(false, &local_err);
-if (local_err) {
-goto out;
-}
 qemu_mutex_lock_iothread();
 
 #ifdef CONFIG_REPLICATION
diff --git a/migration/migration.c b/migration/migration.c
index ca8b97b..4578f22 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2219,6 +2219,10 @@ static bool migrate_prepare(MigrationState *s, bool blk, 
bool blk_inc,
 }
 
 if (blk || blk_inc) {
+if (migrate_colo_enabled()) {
+error_setg(errp, "No disk migration is required in COLO mode");
+return false;
+}
 if (migrate_use_block() || migrate_use_block_incremental()) {
 error_setg(errp, "Command options are incompatible with "
"current migration capabilities");
-- 
1.8.3.1




[PATCH v5 03/10] Optimize the function of filter_send

2021-04-01 Thread leirao
From: "Rao, Lei" 

The iov_size has been calculated in filter_send(). we can directly
return the size.In this way, this is no need to repeat calculations
in filter_redirector_receive_iov();

Signed-off-by: Lei Rao 
Reviewed-by: Li Zhijian 
---
 net/filter-mirror.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/filter-mirror.c b/net/filter-mirror.c
index f8e6500..f20240c 100644
--- a/net/filter-mirror.c
+++ b/net/filter-mirror.c
@@ -88,7 +88,7 @@ static int filter_send(MirrorState *s,
 goto err;
 }
 
-return 0;
+return size;
 
 err:
 return ret < 0 ? ret : -EIO;
@@ -159,7 +159,7 @@ static ssize_t filter_mirror_receive_iov(NetFilterState *nf,
 int ret;
 
 ret = filter_send(s, iov, iovcnt);
-if (ret) {
+if (ret < 0) {
 error_report("filter mirror send failed(%s)", strerror(-ret));
 }
 
@@ -182,10 +182,10 @@ static ssize_t 
filter_redirector_receive_iov(NetFilterState *nf,
 
 if (qemu_chr_fe_backend_connected(&s->chr_out)) {
 ret = filter_send(s, iov, iovcnt);
-if (ret) {
+if (ret < 0) {
 error_report("filter redirector send failed(%s)", strerror(-ret));
 }
-return iov_size(iov, iovcnt);
+return ret;
 } else {
 return 0;
 }
-- 
1.8.3.1




答复: [PATCH 1/1] Remove flatview_simplify()

2021-04-01 Thread FelixCui-oc
>That said, perhaps it's better to keep the simplification within a
>page-sized range, to avoid introducing subpages unnecessarily.


hi  paolo,

   The sizes of all flatranges merged by flatview_simplify() are 
page aligned.

Flatview_simplify() seems to have the opportunity to do some 
merging actions only when

starting the virtual machine.

We can temporarily remove flatview_simplify().


Thanks

Felixcui


发件人: Paolo Bonzini 
发送时间: 2021年3月31日 0:35:49
收件人: Richard Henderson; FelixCui-oc; Richard Henderson; Eduardo Habkost; Alex 
Williamson
抄送: RaymondPang-oc; qemu-devel@nongnu.org; cobechen...@zhaoxin.com
主题: Re: [PATCH 1/1] Remove flatview_simplify()

On 30/03/21 18:33, Richard Henderson wrote:
>
>> Flatview_simplify() can merge many small memory ranges
>> into a large one and contains EHCI dma buffers.
>> For example,the merged range maybe0xc-0xbfff.
>> When seabios write PAM register to change the properties
>> of part of the merged range from RW to readonly,
>> this action cause the merged IOVA mapping will be
>> unmapped.But EHCI device still send DMA cycles
>> and then IOMMU blocks the DMA cycles of EHCI device.
>
> You've described the problem, and it is quite obviously *not* in memory.c.

Well, sort of.

The problem is that neither VFIO nor KVM support atomically switching
the memory map.  For KVM that would be possible, for VFIO based on past
discussion it would be much harder.  Removing flatview_simplify() seems
to be the easiest way to bypass the issue.

That said, perhaps it's better to keep the simplification within a
page-sized range, to avoid introducing subpages unnecessarily.

Paolo



[PATCH v5 09/10] Add the function of colo_bitmap_clear_dirty

2021-04-01 Thread leirao
From: "Rao, Lei" 

When we use continuous dirty memory copy for flushing ram cache on
secondary VM, we can also clean up the bitmap of contiguous dirty
page memory. This also can reduce the VM stop time during checkpoint.

The performance test for COLO as follow:

Server configuraton:
CPU :Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
MEM :251G(type:DDR4 Speed:2666 MT/s)
SSD :Intel 730 and DC S35x0/3610/3700 Series SSDs

dirty pages:3189376  migration_bitmap_clear_dirty time consuming(ns):105194000
dirty pages:3189784  migration_bitmap_clear_dirty time consuming(ns):105297000
dirty pages:3190501  migration_bitmap_clear_dirty time consuming(ns):10541
dirty pages:3188734  migration_bitmap_clear_dirty time consuming(ns):105138000
dirty pages:3189464  migration_bitmap_clear_dirty time consuming(ns):111736000
dirty pages:3188558  migration_bitmap_clear_dirty time consuming(ns):105079000
dirty pages:3239489  migration_bitmap_clear_dirty time consuming(ns):106761000

dirty pages:3190240  colo_bitmap_clear_dirty time consuming(ns):8369000
dirty pages:3189293  colo_bitmap_clear_dirty time consuming(ns):8388000
dirty pages:3189171  colo_bitmap_clear_dirty time consuming(ns):8641000
dirty pages:3189099  colo_bitmap_clear_dirty time consuming(ns):828
dirty pages:3189974  colo_bitmap_clear_dirty time consuming(ns):8352000
dirty pages:3189471  colo_bitmap_clear_dirty time consuming(ns):8348000
dirty pages:3189681  colo_bitmap_clear_dirty time consuming(ns):8426000

it can be seen from the data that colo_bitmap_clear_dirty is more
efficient.

Signed-off-by: Lei Rao 
---
 migration/ram.c | 36 +++-
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 317fa4e..570ffa4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -857,6 +857,36 @@ unsigned long colo_bitmap_find_dirty(RAMState *rs, 
RAMBlock *rb,
 return first;
 }
 
+/**
+ * colo_bitmap_clear_dirty:when we flush ram cache to ram, we will use
+ * continuous memory copy, so we can also clean up the bitmap of contiguous
+ * dirty memory.
+ */
+static inline bool colo_bitmap_clear_dirty(RAMState *rs,
+   RAMBlock *rb,
+   unsigned long start,
+   unsigned long num)
+{
+bool ret;
+unsigned long i = 0;
+
+/*
+ * Since flush ram cache to ram can only happen on Secondary VM.
+ * and the clear bitmap always is NULL on destination side.
+ * Therefore, there is unnecessary to judge whether the
+ * clear_bitmap needs clear.
+ */
+QEMU_LOCK_GUARD(&rs->bitmap_mutex);
+for (i = 0; i < num; i++) {
+ret = test_and_clear_bit(start + i, rb->bmap);
+if (ret) {
+rs->migration_dirty_pages--;
+}
+}
+
+return ret;
+}
+
 static inline bool migration_bitmap_clear_dirty(RAMState *rs,
 RAMBlock *rb,
 unsigned long page)
@@ -3723,11 +3753,7 @@ void colo_flush_ram_cache(void)
 num = 0;
 block = QLIST_NEXT_RCU(block, next);
 } else {
-unsigned long i = 0;
-
-for (i = 0; i < num; i++) {
-migration_bitmap_clear_dirty(ram_state, block, offset + i);
-}
+colo_bitmap_clear_dirty(ram_state, block, offset, num);
 dst_host = block->host
  + (((ram_addr_t)offset) << TARGET_PAGE_BITS);
 src_host = block->colo_cache
-- 
1.8.3.1




[PATCH v5 06/10] Add the function of colo_compare_cleanup

2021-04-01 Thread leirao
From: "Rao, Lei" 

This patch fixes the following:
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x7f6ae4559859 in __GI_abort () at abort.c:79
#2  0x559aaa386720 in error_exit (err=16, msg=0x559aaa5973d0 
<__func__.16227> "qemu_mutex_destroy") at util/qemu-thread-posix.c:36
#3  0x559aaa3868c5 in qemu_mutex_destroy (mutex=0x559aabffe828) at 
util/qemu-thread-posix.c:69
#4  0x559aaa2f93a8 in char_finalize (obj=0x559aabffe800) at 
chardev/char.c:285
#5  0x559aaa23318a in object_deinit (obj=0x559aabffe800, 
type=0x559aabfd7d20) at qom/object.c:606
#6  0x559aaa2331b8 in object_deinit (obj=0x559aabffe800, 
type=0x559aabfd9060) at qom/object.c:610
#7  0x559aaa233200 in object_finalize (data=0x559aabffe800) at 
qom/object.c:620
#8  0x559aaa234202 in object_unref (obj=0x559aabffe800) at 
qom/object.c:1074
#9  0x559aaa2356b6 in object_finalize_child_property 
(obj=0x559aac0dac10, name=0x559aac778760 "compare0-0", opaque=0x559aabffe800) 
at qom/object.c:1584
#10 0x559aaa232f70 in object_property_del_all (obj=0x559aac0dac10) at 
qom/object.c:557
#11 0x559aaa2331ed in object_finalize (data=0x559aac0dac10) at 
qom/object.c:619
#12 0x559aaa234202 in object_unref (obj=0x559aac0dac10) at 
qom/object.c:1074
#13 0x559aaa2356b6 in object_finalize_child_property 
(obj=0x559aac0c75c0, name=0x559aac0dadc0 "chardevs", opaque=0x559aac0dac10) at 
qom/object.c:1584
#14 0x559aaa233071 in object_property_del_child (obj=0x559aac0c75c0, 
child=0x559aac0dac10, errp=0x0) at qom/object.c:580
#15 0x559aaa233155 in object_unparent (obj=0x559aac0dac10) at 
qom/object.c:599
#16 0x559aaa2fb721 in qemu_chr_cleanup () at chardev/char.c:1159
#17 0x559aa9f9b110 in main (argc=54, argv=0x7ffeb62fa998, 
envp=0x7ffeb62fab50) at vl.c:4539

When chardev is cleaned up, chr_write_lock needs to be destroyed. But
the colo-compare module is not cleaned up normally before it when the
guest poweroff. It is holding chr_write_lock at this time. This will
cause qemu crash.So we add the function of colo_compare_cleanup() before
qemu_chr_cleanup() to fix the bug.

Signed-off-by: Lei Rao 
---
 net/colo-compare.c | 10 ++
 net/colo-compare.h |  1 +
 net/net.c  |  4 
 3 files changed, 15 insertions(+)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index c142c08..5b538f4 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -1402,6 +1402,16 @@ static void colo_compare_init(Object *obj)
  compare_set_vnet_hdr);
 }
 
+void colo_compare_cleanup(void)
+{
+CompareState *tmp = NULL;
+CompareState *n = NULL;
+
+QTAILQ_FOREACH_SAFE(tmp, &net_compares, next, n) {
+object_unparent(OBJECT(tmp));
+}
+}
+
 static void colo_compare_finalize(Object *obj)
 {
 CompareState *s = COLO_COMPARE(obj);
diff --git a/net/colo-compare.h b/net/colo-compare.h
index 22ddd51..b055270 100644
--- a/net/colo-compare.h
+++ b/net/colo-compare.h
@@ -20,5 +20,6 @@
 void colo_notify_compares_event(void *opaque, int event, Error **errp);
 void colo_compare_register_notifier(Notifier *notify);
 void colo_compare_unregister_notifier(Notifier *notify);
+void colo_compare_cleanup(void);
 
 #endif /* QEMU_COLO_COMPARE_H */
diff --git a/net/net.c b/net/net.c
index 725a4e1..8fcb2e7 100644
--- a/net/net.c
+++ b/net/net.c
@@ -53,6 +53,7 @@
 #include "sysemu/sysemu.h"
 #include "sysemu/runstate.h"
 #include "sysemu/sysemu.h"
+#include "net/colo-compare.h"
 #include "net/filter.h"
 #include "qapi/string-output-visitor.h"
 #include "qapi/hmp-output-visitor.h"
@@ -1463,6 +1464,9 @@ void net_cleanup(void)
 {
 NetClientState *nc;
 
+/*cleanup colo compare module for COLO*/
+colo_compare_cleanup();
+
 /* We may del multiple entries during qemu_del_net_client(),
  * so QTAILQ_FOREACH_SAFE() is also not safe here.
  */
-- 
1.8.3.1




[PATCH v5 05/10] Add a function named packet_new_nocopy for COLO.

2021-04-01 Thread leirao
From: "Rao, Lei" 

Use the packet_new_nocopy instead of packet_new in the
filter-rewriter module. There will be one less memory
copy in the processing of each network packet.

Signed-off-by: Lei Rao 
---
 net/colo.c| 23 +++
 net/colo.h|  1 +
 net/filter-rewriter.c |  3 +--
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/net/colo.c b/net/colo.c
index ef00609..58106a8 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -174,6 +174,29 @@ Packet *packet_new(const void *data, int size, int 
vnet_hdr_len)
 return pkt;
 }
 
+/*
+ * packet_new_nocopy will not copy data, so the caller can't release
+ * the data. And it will be released in packet_destroy.
+ */
+Packet *packet_new_nocopy(void *data, int size, int vnet_hdr_len)
+{
+Packet *pkt = g_slice_new(Packet);
+
+pkt->data = data;
+pkt->size = size;
+pkt->creation_ms = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+pkt->vnet_hdr_len = vnet_hdr_len;
+pkt->tcp_seq = 0;
+pkt->tcp_ack = 0;
+pkt->seq_end = 0;
+pkt->header_size = 0;
+pkt->payload_size = 0;
+pkt->offset = 0;
+pkt->flags = 0;
+
+return pkt;
+}
+
 void packet_destroy(void *opaque, void *user_data)
 {
 Packet *pkt = opaque;
diff --git a/net/colo.h b/net/colo.h
index 573ab91..d91cd24 100644
--- a/net/colo.h
+++ b/net/colo.h
@@ -101,6 +101,7 @@ bool connection_has_tracked(GHashTable 
*connection_track_table,
 ConnectionKey *key);
 void connection_hashtable_reset(GHashTable *connection_track_table);
 Packet *packet_new(const void *data, int size, int vnet_hdr_len);
+Packet *packet_new_nocopy(void *data, int size, int vnet_hdr_len);
 void packet_destroy(void *opaque, void *user_data);
 void packet_destroy_partial(void *opaque, void *user_data);
 
diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index 10fe393..cb3a96c 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -270,8 +270,7 @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
 vnet_hdr_len = nf->netdev->vnet_hdr_len;
 }
 
-pkt = packet_new(buf, size, vnet_hdr_len);
-g_free(buf);
+pkt = packet_new_nocopy(buf, size, vnet_hdr_len);
 
 /*
  * if we get tcp packet
-- 
1.8.3.1




[PATCH v5 08/10] Reduce the PVM stop time during Checkpoint

2021-04-01 Thread leirao
From: "Rao, Lei" 

When flushing memory from ram cache to ram during every checkpoint
on secondary VM, we can copy continuous chunks of memory instead of
4096 bytes per time to reduce the time of VM stop during checkpoint.

Signed-off-by: Lei Rao 
Reviewed-by: Dr. David Alan Gilbert 
---
 migration/ram.c | 48 +---
 1 file changed, 45 insertions(+), 3 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index c69a8e0..317fa4e 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -822,6 +822,41 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, 
RAMBlock *rb,
 return next;
 }
 
+/*
+ * colo_bitmap_find_diry:find contiguous dirty pages from start
+ *
+ * Returns the page offset within memory region of the start of the contiguout
+ * dirty page
+ *
+ * @rs: current RAM state
+ * @rb: RAMBlock where to search for dirty pages
+ * @start: page where we start the search
+ * @num: the number of contiguous dirty pages
+ */
+static inline
+unsigned long colo_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
+ unsigned long start, unsigned long *num)
+{
+unsigned long size = rb->used_length >> TARGET_PAGE_BITS;
+unsigned long *bitmap = rb->bmap;
+unsigned long first, next;
+
+*num = 0;
+
+if (ramblock_is_ignored(rb)) {
+return size;
+}
+
+first = find_next_bit(bitmap, size, start);
+if (first >= size) {
+return first;
+}
+next = find_next_zero_bit(bitmap, size, first + 1);
+assert(next >= first);
+*num = next - first;
+return first;
+}
+
 static inline bool migration_bitmap_clear_dirty(RAMState *rs,
 RAMBlock *rb,
 unsigned long page)
@@ -3679,19 +3714,26 @@ void colo_flush_ram_cache(void)
 block = QLIST_FIRST_RCU(&ram_list.blocks);
 
 while (block) {
-offset = migration_bitmap_find_dirty(ram_state, block, offset);
+unsigned long num = 0;
 
+offset = colo_bitmap_find_dirty(ram_state, block, offset, &num);
 if (((ram_addr_t)offset) << TARGET_PAGE_BITS
 >= block->used_length) {
 offset = 0;
+num = 0;
 block = QLIST_NEXT_RCU(block, next);
 } else {
-migration_bitmap_clear_dirty(ram_state, block, offset);
+unsigned long i = 0;
+
+for (i = 0; i < num; i++) {
+migration_bitmap_clear_dirty(ram_state, block, offset + i);
+}
 dst_host = block->host
  + (((ram_addr_t)offset) << TARGET_PAGE_BITS);
 src_host = block->colo_cache
  + (((ram_addr_t)offset) << TARGET_PAGE_BITS);
-memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
+memcpy(dst_host, src_host, TARGET_PAGE_SIZE * num);
+offset += num;
 }
 }
 }
-- 
1.8.3.1




[PATCH v5 10/10] Fixed calculation error of pkt->header_size in fill_pkt_tcp_info()

2021-04-01 Thread leirao
From: "Rao, Lei" 

The data pointer has skipped vnet_hdr_len in the function of
parse_packet_early().So, we can not subtract vnet_hdr_len again
when calculating pkt->header_size in fill_pkt_tcp_info(). Otherwise,
it will cause network packet comparsion errors and greatly increase
the frequency of checkpoints.

Signed-off-by: Lei Rao 
Signed-off-by: Zhang Chen 
Reviewed-by: Li Zhijian 
---
 net/colo-compare.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 5b538f4..b100e7b 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -211,7 +211,7 @@ static void fill_pkt_tcp_info(void *data, uint32_t *max_ack)
 pkt->tcp_ack = ntohl(tcphd->th_ack);
 *max_ack = *max_ack > pkt->tcp_ack ? *max_ack : pkt->tcp_ack;
 pkt->header_size = pkt->transport_header - (uint8_t *)pkt->data
-   + (tcphd->th_off << 2) - pkt->vnet_hdr_len;
+   + (tcphd->th_off << 2);
 pkt->payload_size = pkt->size - pkt->header_size;
 pkt->seq_end = pkt->tcp_seq + pkt->payload_size;
 pkt->flags = tcphd->th_flags;
-- 
1.8.3.1




Re: [PATCH v3 06/11] esp: ensure cmdfifo is not empty and current_dev is non-NULL

2021-04-01 Thread Philippe Mathieu-Daudé
On 4/1/21 9:49 AM, Mark Cave-Ayland wrote:
> When about to execute a SCSI command, ensure that cmdfifo is not empty and
> current_dev is non-NULL. This can happen if the guest tries to execute a TI
> (Transfer Information) command without issuing one of the select commands
> first.
> 
> Buglink: https://bugs.launchpad.net/qemu/+bug/1910723
> Buglink: https://bugs.launchpad.net/qemu/+bug/1909247
> Signed-off-by: Mark Cave-Ayland 
> ---
>  hw/scsi/esp.c | 3 +++
>  1 file changed, 3 insertions(+)

Reviewed-by: Philippe Mathieu-Daudé 



Re: [PATCH] migration: Remove time_t cast for OpenBSD

2021-04-01 Thread Daniel P . Berrangé
On Wed, Mar 31, 2021 at 03:26:16PM -0400, Brad Smith wrote:
> On 3/13/2021 6:33 PM, Brad Smith wrote:
> > On 3/11/2021 1:39 PM, Daniel P. Berrangé wrote:
> > > On Thu, Mar 11, 2021 at 06:28:57PM +, Dr. David Alan Gilbert wrote:
> > > > * Laurent Vivier (laur...@vivier.eu) wrote:
> > > > > Le 08/03/2021 à 12:46, Thomas Huth a écrit :
> > > > > > On 22/02/2021 08.28, Brad Smith wrote:
> > > > > > > OpenBSD has supported 64-bit time_t across all archs
> > > > > > > since 5.5 released in 2014.
> > > > > > > 
> > > > > > > Remove a time_t cast that is no longer necessary.
> > > > > > > 
> > > > > > > 
> > > > > > > Signed-off-by: Brad Smith 
> > > > > > > 
> > > > > > > diff --git a/migration/savevm.c b/migration/savevm.c
> > > > > > > index 52e2d72e4b..9557f85ba9 100644
> > > > > > > --- a/migration/savevm.c
> > > > > > > +++ b/migration/savevm.c
> > > > > > > @@ -2849,8 +2849,7 @@ bool save_snapshot(const char
> > > > > > > *name, bool overwrite, const char *vmstate,
> > > > > > >   if (name) {
> > > > > > >   pstrcpy(sn->name, sizeof(sn->name), name);
> > > > > > >   } else {
> > > > > > > -    /* cast below needed for OpenBSD where
> > > > > > > tv_sec is still 'long' */
> > > > > > > -    localtime_r((const time_t *)&tv.tv_sec, &tm);
> > > > > > > +    localtime_r(&tv.tv_sec, &tm);
> > > > > > >   strftime(sn->name, sizeof(sn->name),
> > > > > > > "vm-%Y%m%d%H%M%S", &tm);
> > > > > > >   }
> > > > > but the qemu_timeval from "include/sysemu/os-win32.h" still
> > > > > uses a long: is this file compiled for
> > > > > win32?
> > > > Yep this fails for me when built with x86_64-w64-mingw32- (it's fine
> > > > with i686-w64-mingw32- )
> > > We could just switch the code to use GDateTime from GLib and thus
> > > avoid portability issues. I think this should be equivalent:
> > > 
> > >   g_autoptr(GDateTime) now = g_date_time_new_now_local();
> > >   g_autofree char *nowstr = g_date_time_format(now,
> > > "vm-%Y%m%d%H%M%s");
> > >   strncpy(sn->name, sizeof(sn->name), nowstr);
> > 
> > Which way do you guys want to go? Something like above, remove the
> > comment
> > or some variation on the comment but not mentioning OpenBSD since it is
> > no
> > longer relevant?
> 
> Anyone?

Personally I always favour using GLib APIs if there's an applicable one,
since it eliminates portability problems - or rather offloads them to
the GLib maintainers, who have already solved them generally.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v3 07/11] esp: don't underflow cmdfifo in do_cmd()

2021-04-01 Thread Philippe Mathieu-Daudé
On 4/1/21 9:49 AM, Mark Cave-Ayland wrote:
> If the guest tries to execute a CDB when cmdfifo is not empty before the start
> of the message out phase then clearing the message out phase data will cause
> cmdfifo to underflow due to cmdfifo_cdb_offset being larger than the amount of
> data within.
> 
> Since this can only occur by issuing deliberately incorrect instruction
> sequences, ensure that the maximum length of esp_fifo_pop_buf() is limited to
> the size of the data within cmdfifo.
> 
> Buglink: https://bugs.launchpad.net/qemu/+bug/1909247
> Signed-off-by: Mark Cave-Ayland 
> ---
>  hw/scsi/esp.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
> index 4decbbfc29..7f49522e1d 100644
> --- a/hw/scsi/esp.c
> +++ b/hw/scsi/esp.c
> @@ -319,13 +319,15 @@ static void do_busid_cmd(ESPState *s, uint8_t busid)
>  
>  static void do_cmd(ESPState *s)
>  {
> -uint8_t busid = fifo8_pop(&s->cmdfifo);
> +uint8_t busid = esp_fifo_pop(&s->cmdfifo);
> +int len;
>  
>  s->cmdfifo_cdb_offset--;
>  
>  /* Ignore extended messages for now */
>  if (s->cmdfifo_cdb_offset) {
> -esp_fifo_pop_buf(&s->cmdfifo, NULL, s->cmdfifo_cdb_offset);
> +len = MIN(s->cmdfifo_cdb_offset, fifo8_num_used(&s->cmdfifo));

Do we want to log(GUEST_ERRORS) this?

> +esp_fifo_pop_buf(&s->cmdfifo, NULL, len);
>  s->cmdfifo_cdb_offset = 0;
>  }
>  
> 

Reviewed-by: Philippe Mathieu-Daudé 



Re: [PATCH v3 08/11] esp: don't overflow cmdfifo in get_cmd()

2021-04-01 Thread Philippe Mathieu-Daudé
On 4/1/21 9:49 AM, Mark Cave-Ayland wrote:
> If the guest tries to read a CDB using DMA and cmdfifo is not empty then it is
> possible to overflow cmdfifo.
> 
> Since this can only occur by issuing deliberately incorrect instruction
> sequences, ensure that the maximum length of the CDB transferred to cmdfifo is
> limited to the available free space within cmdfifo.
> 
> Buglink: https://bugs.launchpad.net/qemu/+bug/1909247
> Signed-off-by: Mark Cave-Ayland 
> ---
>  hw/scsi/esp.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
> index 7f49522e1d..c547c60395 100644
> --- a/hw/scsi/esp.c
> +++ b/hw/scsi/esp.c
> @@ -243,6 +243,7 @@ static uint32_t get_cmd(ESPState *s, uint32_t maxlen)
>  }
>  if (s->dma_memory_read) {
>  s->dma_memory_read(s->dma_opaque, buf, dmalen);
> +dmalen = MIN(fifo8_num_free(&s->fifo), dmalen);

Ditto, GUEST_ERRORS?

Reviewed-by: Philippe Mathieu-Daudé 

>  fifo8_push_all(&s->cmdfifo, buf, dmalen);
>  } else {
>  if (esp_select(s) < 0) {
> 




Re: [PATCH v3 03/11] esp: consolidate esp_cmdfifo_push() into esp_fifo_push()

2021-04-01 Thread Philippe Mathieu-Daudé
On 4/1/21 9:49 AM, Mark Cave-Ayland wrote:
> Each FIFO currently has its own push functions with the only difference being
> the capacity check. The original reason for this was that the fifo8
> implementation doesn't have a formal API for retrieving the FIFO capacity,
> however there are multiple examples within QEMU where the capacity field is
> accessed directly.

So the Fifo8 API is not complete / practical.

> Change esp_fifo_push() to access the FIFO capacity directly and then 
> consolidate
> esp_cmdfifo_push() into esp_fifo_push().
> 
> Signed-off-by: Mark Cave-Ayland 
> ---
>  hw/scsi/esp.c | 27 ---
>  1 file changed, 8 insertions(+), 19 deletions(-)
> 
> diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
> index 26fe1dcb9d..16aaf8be93 100644
> --- a/hw/scsi/esp.c
> +++ b/hw/scsi/esp.c
> @@ -98,16 +98,15 @@ void esp_request_cancelled(SCSIRequest *req)
>  }
>  }
>  
> -static void esp_fifo_push(ESPState *s, uint8_t val)
> +static void esp_fifo_push(Fifo8 *fifo, uint8_t val)
>  {
> -if (fifo8_num_used(&s->fifo) == ESP_FIFO_SZ) {
> +if (fifo8_num_used(fifo) == fifo->capacity) {
>  trace_esp_error_fifo_overrun();
>  return;
>  }
>  
> -fifo8_push(&s->fifo, val);
> +fifo8_push(fifo, val);
>  }
> -

Spurious line removal?

>  static uint8_t esp_fifo_pop(ESPState *s)
>  {
>  if (fifo8_is_empty(&s->fifo)) {
> @@ -117,16 +116,6 @@ static uint8_t esp_fifo_pop(ESPState *s)
>  return fifo8_pop(&s->fifo);
>  }

Reviewed-by: Philippe Mathieu-Daudé 



[PATCH v3] replay: notify CPU on event

2021-04-01 Thread Pavel Dovgalyuk
This patch enables vCPU notification to wake it up
when new async event comes in replay mode.

The motivation of this patch is the following.
Consider recorded block async event. It is saved into the log
with one of the checkpoints. This checkpoint may be passed in
vCPU loop. In replay mode when this async event is read from
the log, and block thread task is not finished yet, vCPU thread
goes to sleep. That is why this patch adds waking up the vCPU
to process this finished event.

Signed-off-by: Pavel Dovgalyuk 
---
 replay/replay-events.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/replay/replay-events.c b/replay/replay-events.c
index a1c6bb934e..15983dd250 100644
--- a/replay/replay-events.c
+++ b/replay/replay-events.c
@@ -15,6 +15,7 @@
 #include "replay-internal.h"
 #include "block/aio.h"
 #include "ui/input.h"
+#include "hw/core/cpu.h"
 
 typedef struct Event {
 ReplayAsyncEventKind event_kind;
@@ -126,6 +127,7 @@ void replay_add_event(ReplayAsyncEventKind event_kind,
 
 g_assert(replay_mutex_locked());
 QTAILQ_INSERT_TAIL(&events_list, event, events);
+qemu_cpu_kick(first_cpu);
 }
 
 void replay_bh_schedule_event(QEMUBH *bh)




Re: [PATCH v3 04/11] esp: consolidate esp_cmdfifo_pop() into esp_fifo_pop()

2021-04-01 Thread Philippe Mathieu-Daudé
On 4/1/21 9:49 AM, Mark Cave-Ayland wrote:
> Each FIFO currently has its own pop functions with the only difference being
> the capacity check. The original reason for this was that the fifo8
> implementation doesn't have a formal API for retrieving the FIFO capacity,
> however there are multiple examples within QEMU where the capacity field is
> accessed directly.
> 
> Change esp_fifo_pop() to access the FIFO capacity directly and then 
> consolidate
> esp_cmdfifo_pop() into esp_fifo_pop().
> 
> Signed-off-by: Mark Cave-Ayland 
> ---
>  hw/scsi/esp.c | 21 ++---
>  1 file changed, 6 insertions(+), 15 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé 



[PATCH v2 1/2] hw/smbios: support for type 41 (onboard devices extended information)

2021-04-01 Thread Vincent Bernat
Type 41 defines the attributes of devices that are onboard. The
original intent was to imply the BIOS had some level of control over
the enablement of the associated devices.

If network devices are present in this table, by default, udev will
name the corresponding interfaces enoX, X being the instance number.
Without such information, udev will fallback to using the PCI ID and
this usually gives ens3 or ens4. This can be a bit annoying as the
name of the network card may depend on the order of options and may
change if a new PCI device is added earlier on the commande line.
Being able to provide SMBIOS type 41 entry ensure the name of the
interface won't change and helps the user guess the right name without
booting a first time.

This can be invoked with:

$QEMU -netdev user,id=internet
  -device virtio-net-pci,mac=50:54:00:00:00:42,netdev=internet \
  -smbios type=41,designation='Onboard 
LAN',instance=1,kind=ethernet,pci=:00:09.0

Which results in the guest seeing dmidecode data and the interface
exposed as "eno1":

$ dmidecode -t 41
# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 2.8 present.Handle 0x2900, DMI type 41, 11 bytes
Onboard Device
Reference Designation: Onboard LAN
Type: Ethernet
Status: Enabled
Type Instance: 1
Bus Address: :00:09.0
$ udevadm info -p /sys/class/net/eno1 | grep ONBOARD
E: ID_NET_NAME_ONBOARD=eno1
E: ID_NET_LABEL_ONBOARD=Onboard LAN

Signed-off-by: Vincent Bernat 
---
 hw/smbios/smbios.c   | 119 +++
 include/hw/firmware/smbios.h |  11 
 qemu-options.hx  |   7 ++-
 3 files changed, 136 insertions(+), 1 deletion(-)

diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
index f22c4f5b734e..46a08652dff4 100644
--- a/hw/smbios/smbios.c
+++ b/hw/smbios/smbios.c
@@ -118,6 +118,32 @@ static struct {
 uint16_t speed;
 } type17;
 
+static QEnumLookup type41_kind_lookup = {
+.array = (const char *const[]) {
+"other",
+"unknown",
+"video",
+"scsi",
+"ethernet",
+"tokenring",
+"sound",
+"pata",
+"sata",
+"sas",
+},
+.size = 10
+};
+struct type41_instance {
+const char *designation;
+uint8_t instance, kind;
+struct {
+uint16_t segment;
+uint8_t bus, device;
+} pci;
+QTAILQ_ENTRY(type41_instance) next;
+};
+static QTAILQ_HEAD(, type41_instance) type41 = QTAILQ_HEAD_INITIALIZER(type41);
+
 static QemuOptsList qemu_smbios_opts = {
 .name = "smbios",
 .head = QTAILQ_HEAD_INITIALIZER(qemu_smbios_opts.head),
@@ -358,6 +384,33 @@ static const QemuOptDesc qemu_smbios_type17_opts[] = {
 { /* end of list */ }
 };
 
+static const QemuOptDesc qemu_smbios_type41_opts[] = {
+{
+.name = "type",
+.type = QEMU_OPT_NUMBER,
+.help = "SMBIOS element type",
+},{
+.name = "designation",
+.type = QEMU_OPT_STRING,
+.help = "reference designation string",
+},{
+.name = "kind",
+.type = QEMU_OPT_STRING,
+.help = "device type",
+.def_value_str = "other",
+},{
+.name = "instance",
+.type = QEMU_OPT_NUMBER,
+.help = "device type instance",
+},{
+.name = "pci",
+.type = QEMU_OPT_STRING,
+.help = "PCI device",
+.def_value_str = "0:0.0",
+},
+{ /* end of list */ }
+};
+
 static void smbios_register_config(void)
 {
 qemu_add_opts(&qemu_smbios_opts);
@@ -773,6 +826,26 @@ static void smbios_build_type_32_table(void)
 SMBIOS_BUILD_TABLE_POST;
 }
 
+static void smbios_build_type_41_table(void)
+{
+unsigned instance = 0;
+struct type41_instance *t41;
+
+QTAILQ_FOREACH(t41, &type41, next) {
+SMBIOS_BUILD_TABLE_PRE(41, 0x2900 + instance, true);
+
+SMBIOS_TABLE_SET_STR(41, reference_designation_str, t41->designation);
+t->device_type = t41->kind;
+t->device_type_instance = t41->instance;
+t->segment_group_number = cpu_to_le16(t41->pci.segment);
+t->bus_number = t41->pci.bus;
+t->device_number = t41->pci.device;
+
+SMBIOS_BUILD_TABLE_POST;
+instance++;
+}
+}
+
 static void smbios_build_type_127_table(void)
 {
 SMBIOS_BUILD_TABLE_PRE(127, 0x7F00, true); /* required */
@@ -928,6 +1001,7 @@ void smbios_get_tables(MachineState *ms,
 
 smbios_build_type_32_table();
 smbios_build_type_38_table();
+smbios_build_type_41_table();
 smbios_build_type_127_table();
 
 smbios_validate_table(ms);
@@ -1224,6 +1298,51 @@ void smbios_entry_add(QemuOpts *opts, Error **errp)
 save_opt(&type17.part, opts, "part");
 type17.speed = qemu_opt_get_number(opts, "speed", 0);
 return;
+case 41: {
+struct type41_instance *t;
+Error *local_err = NULL;
+int pseg,

[Bug 1909247] Re: QEMU: use after free vulnerability in esp_do_dma() in hw/scsi/esp.c

2021-04-01 Thread Mark Cave-Ayland
Thanks again Alex. I've just posted a v3 to the list which fixes your
extra test cases, and also those contained within the uaf and hw-esp-oob
attachments:

https://lists.gnu.org/archive/html/qemu-devel/2021-04/msg00015.html

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1909247

Title:
  QEMU: use after free vulnerability in esp_do_dma() in hw/scsi/esp.c

Status in QEMU:
  New

Bug description:
  A use-after-free vulnerability was found in the am53c974 SCSI host bus
  adapter emulation of QEMU. It could occur in the esp_do_dma() function
  in hw/scsi/esp.c while handling the 'Information Transfer' command
  (CMD_TI). A privileged guest user may abuse this flaw to crash the
  QEMU process on the host, resulting in a denial of service or
  potential code execution with the privileges of the QEMU process.

  This issue was reported by Cheolwoo Myung (Seoul National University).

  Original report:
  Using hypervisor fuzzer, hyfuzz, I found a use-after-free issue in
  am53c974 emulator of QEMU enabled ASan.

  It occurs while transferring information, as it does not check the
  buffer to be transferred.

  A malicious guest user/process could use this flaw to crash the QEMU
  process resulting in DoS scenario.

  To reproduce this issue, please run the QEMU with the following command
  line.

  # To enable ASan option, please set configuration with the following
  $ ./configure --target-list=i386-softmmu --disable-werror --enable-sanitizers
  $ make

  # To reproduce this issue, please run the QEMU process with the following 
command line
  $ ./qemu-system-i386 -m 512 -drive 
file=./hyfuzz.img,index=0,media=disk,format=raw \
  -device am53c974,id=scsi -device scsi-hd,drive=SysDisk \
  -drive id=SysDisk,if=none,file=./disk.img

  Please find attached the disk images to reproduce this issue.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1909247/+subscriptions



Re: [PATCH v3 02/11] esp: rework write_response() to avoid using the FIFO for DMA transactions

2021-04-01 Thread Philippe Mathieu-Daudé
On 4/1/21 9:49 AM, Mark Cave-Ayland wrote:
> The code for write_response() has always used the FIFO to store the data for
> the status/message in phases, even for DMA transactions. Switch to using a
> separate buffer that can be used directly for DMA transactions and restrict
> the FIFO use to the non-DMA case.
> 
> Signed-off-by: Mark Cave-Ayland 
> ---
>  hw/scsi/esp.c | 13 ++---
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
> index bafea0d4e6..26fe1dcb9d 100644
> --- a/hw/scsi/esp.c
> +++ b/hw/scsi/esp.c
> @@ -445,18 +445,16 @@ static void write_response_pdma_cb(ESPState *s)
>  
>  static void write_response(ESPState *s)
>  {
> -uint32_t n;
> +uint8_t buf[2];
>  
>  trace_esp_write_response(s->status);
>  
> -fifo8_reset(&s->fifo);
> -esp_fifo_push(s, s->status);
> -esp_fifo_push(s, 0);
> +buf[0] = s->status;
> +buf[1] = 0;

Slightly simplified:

   const uint8_t buf[2] = { s->status, 0 };

>  if (s->dma) {

   uint32_t n;

>  if (s->dma_memory_write) {
> -s->dma_memory_write(s->dma_opaque,
> -(uint8_t *)fifo8_pop_buf(&s->fifo, 2, &n), 
> 2);
> +s->dma_memory_write(s->dma_opaque, buf, 2);
>  s->rregs[ESP_RSTAT] = STAT_TC | STAT_ST;
>  s->rregs[ESP_RINTR] |= INTR_BS | INTR_FC;
>  s->rregs[ESP_RSEQ] = SEQ_CD;
> @@ -466,7 +464,8 @@ static void write_response(ESPState *s)
>  return;
>  }
>  } else {
> -s->ti_size = 2;
> +fifo8_reset(&s->fifo);
> +fifo8_push_all(&s->fifo, buf, 2);
>  s->rregs[ESP_RFLAGS] = 2;
>  }
>  esp_raise_irq(s);
> 

Reviewed-by: Philippe Mathieu-Daudé 



[PATCH v2 2/2] hw/smbios: retrieve PCI address from specified device for Type 41

2021-04-01 Thread Vincent Bernat
Instead of specifying the PCI address manually, the device can be
specified by ID:

$QEMU -netdev user,id=internet
  -device 
virtio-net-pci,mac=50:54:00:00:00:42,netdev=internet,id=internet-dev \
  -smbios type=41,designation='Onboard 
LAN',instance=1,kind=ethernet,pcidev=internet-dev

The PCI segment is assumed to be 0. This should hold true for most
cases.

$ dmidecode -t 41
# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 2.8 present.

Handle 0x2900, DMI type 41, 11 bytes
Onboard Device
Reference Designation: Onboard LAN
Type: Ethernet
Status: Enabled
Type Instance: 1
Bus Address: :00:09.0

$ ip -brief a
lo   UNKNOWN127.0.0.1/8 ::1/128
eno1 UP 10.0.2.14/24 fec0::5254:ff:fe00:42/64 
fe80::5254:ff:fe00:42/64

Signed-off-by: Vincent Bernat 
---
 hw/smbios/smbios.c | 47 +-
 qemu-options.hx|  2 +-
 2 files changed, 22 insertions(+), 27 deletions(-)

diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
index 46a08652dff4..0f390e03453c 100644
--- a/hw/smbios/smbios.c
+++ b/hw/smbios/smbios.c
@@ -27,6 +27,7 @@
 #include "hw/firmware/smbios.h"
 #include "hw/loader.h"
 #include "hw/boards.h"
+#include "hw/pci/pci.h"
 #include "smbios_build.h"
 
 /* legacy structures and constants for <= 2.0 machines */
@@ -134,12 +135,8 @@ static QEnumLookup type41_kind_lookup = {
 .size = 10
 };
 struct type41_instance {
-const char *designation;
+const char *designation, *pcidev;
 uint8_t instance, kind;
-struct {
-uint16_t segment;
-uint8_t bus, device;
-} pci;
 QTAILQ_ENTRY(type41_instance) next;
 };
 static QTAILQ_HEAD(, type41_instance) type41 = QTAILQ_HEAD_INITIALIZER(type41);
@@ -403,10 +400,9 @@ static const QemuOptDesc qemu_smbios_type41_opts[] = {
 .type = QEMU_OPT_NUMBER,
 .help = "device type instance",
 },{
-.name = "pci",
+.name = "pcidev",
 .type = QEMU_OPT_STRING,
 .help = "PCI device",
-.def_value_str = "0:0.0",
 },
 { /* end of list */ }
 };
@@ -837,9 +833,23 @@ static void smbios_build_type_41_table(void)
 SMBIOS_TABLE_SET_STR(41, reference_designation_str, t41->designation);
 t->device_type = t41->kind;
 t->device_type_instance = t41->instance;
-t->segment_group_number = cpu_to_le16(t41->pci.segment);
-t->bus_number = t41->pci.bus;
-t->device_number = t41->pci.device;
+
+if (t41->pcidev) {
+PCIDevice *pdev = NULL;
+int rc = pci_qdev_find_device(t41->pcidev, &pdev);
+if (rc == 0) {
+/*
+ * TODO: Extract the appropriate value. Most of the
+ * time, this will be 0.
+ */
+t->segment_group_number = cpu_to_le16(0);
+t->bus_number = pci_dev_bus_num(pdev);
+t->device_number = pdev->devfn;
+} else {
+fprintf(stderr, "%s: cannot find PCI device %s\n",
+__func__, t41->pcidev);
+}
+}
 
 SMBIOS_BUILD_TABLE_POST;
 instance++;
@@ -1301,7 +1311,6 @@ void smbios_entry_add(QemuOpts *opts, Error **errp)
 case 41: {
 struct type41_instance *t;
 Error *local_err = NULL;
-int pseg, pbus, pdevice, pfunction;
 
 if (!qemu_opts_validate(opts, qemu_smbios_type41_opts, errp)) {
 return;
@@ -1324,21 +1333,7 @@ void smbios_entry_add(QemuOpts *opts, Error **errp)
 return;
 }
 t->instance = qemu_opt_get_number(opts, "instance", 1);
-if (sscanf(qemu_opt_get(opts, "pci"), "%x:%x:%x.%x",
-   &pseg,
-   &pbus,
-   &pdevice,
-   &pfunction) != 4) {
-error_setg(errp, "unable to parse %s: %s",
-   qemu_opt_get(opts, "pci"),
-   g_strerror(errno));
-free(t);
-return;
-}
-t->pci.segment = pseg;
-t->pci.bus = pbus;
-t->pci.device = ((uint8_t)pdevice << 3) +
-((uint8_t)pfunction & 0x7);
+save_opt(&t->pcidev, opts, "pcidev");
 
 QTAILQ_INSERT_TAIL(&type41, t, next);
 return;
diff --git a/qemu-options.hx b/qemu-options.hx
index eb2de7c372c7..e6e54f9bd1f3 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2371,7 +2371,7 @@ DEF("smbios", HAS_ARG, QEMU_OPTION_smbios,
 "-smbios 
type=17[,loc_pfx=str][,bank=str][,manufacturer=str][,serial=str]\n"
 "   [,asset=str][,part=str][,speed=%d]\n"
 "specify SMBIOS type 17 fields\n"
-"-smbios 
type=41[,designation=str][,kind=str][,instance=%d][,pci=%x:%x:%x.%x]\n"
+ 

Re: [PATCH v6 1/4] Add NVMM accelerator: configure and build logic

2021-04-01 Thread Paolo Bonzini

On 31/03/21 22:07, Reinoud Zandijk wrote:

Signed-off-by: Reinoud Zandijk 
Signed-off-by: Kamil Rytarowski 


Incorrect order for the S-o-b headers, you should be last.


---
@@ -886,7 +887,7 @@ for opt do
;;
--python=*) python="$optarg" ; explicit_python=yes
;;
-  --sphinx-build=*) sphinx_build="$optarg"
+  --sphinx-build-3.8-3.8=*) sphinx_build="$optarg"


Oops. :)  Another occurrence later.


diff --git a/meson.build b/meson.build
index c6f4b0cf5e..e33face775 100644
--- a/meson.build
+++ b/meson.build


The meson.build parts can be simplified by applying this patch:

diff --git a/meson.build b/meson.build
index e33face775..c4600a46a6 100644
--- a/meson.build
+++ b/meson.build
@@ -171,7 +171,7 @@ version_res = []
 coref = []
 iokit = []
 emulator_link_args = []
-nvmm = []
+nvmm = not_found
 hvf = not_found
 if targetos == 'windows'
   socket = cc.find_library('ws2_32')
@@ -197,12 +197,6 @@ elif targetos == 'openbsd'
 # Disable OpenBSD W^X if available
 emulator_link_args = 
cc.get_supported_link_arguments('-Wl,-z,wxneeded')

   endif
-elif targetos == 'netbsd'
-  if not get_option('nvmm').disabled()
-if cc.has_header('nvmm.h')
-  nvmm = cc.find_library('nvmm')
-endif
-  endif
 endif

 accelerators = []
@@ -235,8 +229,11 @@ if not get_option('hax').disabled()
 accelerators += 'CONFIG_HAX'
   endif
 endif
-if not get_option('nvmm').disabled()
+if targetos == 'netbsd'
   if cc.has_header('nvmm.h', required: get_option('nvmm'))
+nvmm = cc.find_library('nvmm', required: get_option('nvmm'))
+  endif
+  if nvmm.found()
 accelerators += 'CONFIG_NVMM'
   endif
 endif
@@ -2242,7 +2239,7 @@ foreach target : target_dirs
   'name': 'qemu-system-' + target_name,
   'gui': false,
   'sources': files('softmmu/main.c'),
-  'dependencies': [nvmm]
+  'dependencies': []
 }]
 if targetos == 'windows' and (sdl.found() or gtk.found())
   execs += [{



@@ -607,7 +623,7 @@ if have_system and not get_option('curses').disabled()
has_curses_h = cc.has_header('curses.h', args: curses_compile_args)
  endif
  if has_curses_h
-  curses_libname_list = (targetos == 'windows' ? ['pdcurses'] : 
['ncursesw', 'cursesw'])
+  curses_libname_list = (targetos == 'windows' ? ['pdcurses'] : 
['ncursesw', 'cursesw', 'curses'])
foreach curses_libname : curses_libname_list
  libcurses = cc.find_library(curses_libname,
  required: false,
@@ -625,7 +641,7 @@ if have_system and not get_option('curses').disabled()
  endif
endif
if not get_option('iconv').disabled()
-foreach link_args : [ ['-liconv'], [] ]
+foreach link_args : [ [], ['-liconv'] ]
# Programs will be linked with glib and this will bring in libiconv on 
FreeBSD.
# We need to use libiconv if available because mixing libiconv's 
headers with
# the system libc does not work.


Independent changes, should be posted separately.


@@ -2226,7 +2242,7 @@ foreach target : target_dirs
'name': 'qemu-system-' + target_name,
'gui': false,
'sources': files('softmmu/main.c'),
-  'dependencies': []
+  'dependencies': [nvmm]
  }]
  if targetos == 'windows' and (sdl.found() or gtk.found())
execs += [{


Not needed yet, should be added together with the nvmm sources to the 
sourceset.


Paolo




Re: [PATCH] iotests: Test mirror-top filter permissions

2021-04-01 Thread Vladimir Sementsov-Ogievskiy

31.03.2021 15:28, Max Reitz wrote:

Add a test accompanying commit 53431b9086b2832ca1aeff0c55e186e9ed79bd11
("block/mirror: Fix mirror_top's permissions").

Signed-off-by: Max Reitz 
---
  tests/qemu-iotests/tests/mirror-top-perms | 121 ++
  tests/qemu-iotests/tests/mirror-top-perms.out |   5 +
  2 files changed, 126 insertions(+)
  create mode 100755 tests/qemu-iotests/tests/mirror-top-perms
  create mode 100644 tests/qemu-iotests/tests/mirror-top-perms.out

diff --git a/tests/qemu-iotests/tests/mirror-top-perms 
b/tests/qemu-iotests/tests/mirror-top-perms
new file mode 100755
index 00..451a0666f8
--- /dev/null
+++ b/tests/qemu-iotests/tests/mirror-top-perms
@@ -0,0 +1,121 @@
+#!/usr/bin/env python3
+# group: rw
+#
+# Test permissions taken by the mirror-top filter
+#
+# Copyright (C) 2021 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+import os
+import iotests
+from iotests import qemu_img
+
+# Import qemu after iotests.py has amended sys.path
+# pylint: disable=wrong-import-order
+import qemu
+
+
+image_size = 1 * 1024 * 1024
+source = os.path.join(iotests.test_dir, 'source.img')
+
+
+class TestMirrorTopPerms(iotests.QMPTestCase):
+def setUp(self):
+assert qemu_img('create', '-f', iotests.imgfmt, source,
+str(image_size)) == 0
+self.vm = iotests.VM()
+self.vm.add_drive(source)
+self.vm.add_blockdev(f'null-co,node-name=null,size={image_size}')
+self.vm.launch()
+
+# Will be created by the test function itself
+self.vm_b = None
+
+def tearDown(self):
+try:
+self.vm.shutdown()
+except qemu.machine.AbnormalShutdown:
+pass
+
+if self.vm_b is not None:
+self.vm_b.shutdown()
+
+os.remove(source)
+
+def test_cancel(self):
+"""
+Before commit 53431b9086b28, mirror-top used to not take any
+permissions but WRITE and share all permissions.  Because it
+is inserted between the source's original parents and the
+source, there generally was no parent that would have taken or
+unshared any permissions on the source, which means that an
+external process could access the image unhindered by locks.
+(Unless there was a parent above the protocol node that would
+take its own locks, e.g. a format driver.)
+This is bad enough, but if the mirror job is then cancelled,
+the mirroring VM tries to take back the image, restores the
+original permissions taken and unshared, and assumes this must
+just work.  But it will not, and so the VM aborts.
+
+Commit 53431b9086b28 made mirror keep the original permissions
+and so no other process can "steal" the image.
+
+(Note that you cannot really do the same with the target image
+and then completing the job, because the mirror job always
+took/unshared the correct permissions on the target.  For
+example, it does not share READ_CONSISTENT, which makes it
+difficult to let some other qemu process open the image.)
+"""
+
+result = self.vm.qmp('blockdev-mirror',
+ job_id='mirror',
+ device='drive0',
+ target='null',
+ sync='full')
+self.assert_qmp(result, 'return', {})
+
+self.vm.event_wait('BLOCK_JOB_READY')
+
+# We want this to fail because the image cannot be locked.
+# If it does not fail, continue still and see what happens.


This comment is about vm_b.launch(), not about creating vm object. Probably 
better to move it down


+self.vm_b = iotests.VM(path_suffix='b')
+# Must use -blockdev -device so we can use share-rw.
+# (And we need share-rw=on because mirror-top was always
+# forced to take the WRITE permission so it can write to the
+# source image.)
+self.vm_b.add_blockdev(f'file,node-name=drive0,filename={source}')
+self.vm_b.add_device('virtio-blk,drive=drive0,share-rw=on')
+try:
+self.vm_b.launch()
+print('ERROR: VM B launched successfully, this should not have '
+  'happened')


probably iotests.log() is better here.


+except qemu.qmp.QMPConnectError:
+

Re: [PATCH v6 2/4] Add NVMM accelerator: x86 CPU support

2021-04-01 Thread Paolo Bonzini

On 31/03/21 22:07, Reinoud Zandijk wrote:

+void nvmm_vcpu_kick(CPUState *cpu);


Not defined anywhere.


+{
+#if NVMM_USER_VERSION == 1
+struct sigaction sigact;
+sigset_t set;
+
+/* Install the IPI handler. */
+memset(&sigact, 0, sizeof(sigact));
+sigact.sa_handler = nvmm_ipi_signal;
+sigaction(SIG_IPI, &sigact, NULL);
+
+/* Allow IPIs on the current thread. */
+sigprocmask(SIG_BLOCK, NULL, &set);
+sigdelset(&set, SIG_IPI);
+pthread_sigmask(SIG_SETMASK, &set, NULL);
+#else
+/*
+ * We use the nvmm_vcpu_stop() mechanism, and don't use signals.
+ * Nothing to do.
+ */
+#endif


Since nvmm_vcpu_stop is very similar to KVM's immediate_exit mechanism, 
I think you still need to have a dummy signal handler to kick the VM out 
of the run loop *if it is in the kernel*.  The signal handler however 
can just do nothing.


Also, can you just drop support for NVMM_USER_VERSION == 1?


diff --git a/target/i386/nvmm/meson.build b/target/i386/nvmm/meson.build
new file mode 100644
index 00..c154e78014
--- /dev/null
+++ b/target/i386/nvmm/meson.build
@@ -0,0 +1,4 @@
+i386_softmmu_ss.add(when: 'CONFIG_NVMM', if_true: files(
+  'nvmm-all.c',
+  'nvmm-accel-ops.c',
+))


The nvmm library should be added here.

Paolo




Re: [PATCH v6 0/4] Implements the NetBSD Virtual Machine Monitor accelerator

2021-04-01 Thread Paolo Bonzini

On 31/03/21 22:07, Reinoud Zandijk wrote:

The NetBSD team has been working hard on a new user-mode API for our
hypervisor that will be released as part of the upcoming NetBSD 9.0.

The NetBSD team has implemented its new hypervisor called NVMM. It has been
included since NetBSD 9.0 and has been in use now for quite some time. NVMM
adds user-mode capabilities to create and manage virtual machines, configure
memory mappings for guest machines, and create and control execution of
virtual processors.

With this new API we are now able to bring our hypervisor to the QEMU
community! The following patches implement the NetBSD Virtual Machine Monitor
accelerator (NVMM) for QEMU on NetBSD 9.0 and newer hosts.

When compiling QEMU for x86_64 it will autodetect nvmm and will compile the
accelerator for use if found. At runtime using the '-accel nvmm' should see a
significant performance improvement over emulation, much like when using 'hax'
on NetBSD.

The documentation for this new API is visible at https://man.netbsd.org under
the libnvmm(3) and nvmm(4) pages.

NVMM was designed and implemented by Maxime Villard 

Thank you for your feedback.


Very nice.  Just a couple remarks but nothing too serious.

Paolo




Re: [PATCH v2 1/2] hw/smbios: support for type 41 (onboard devices extended information)

2021-04-01 Thread Daniel P . Berrangé
On Thu, Apr 01, 2021 at 10:25:43AM +0200, Vincent Bernat wrote:
> Type 41 defines the attributes of devices that are onboard. The
> original intent was to imply the BIOS had some level of control over
> the enablement of the associated devices.
> 
> If network devices are present in this table, by default, udev will
> name the corresponding interfaces enoX, X being the instance number.
> Without such information, udev will fallback to using the PCI ID and
> this usually gives ens3 or ens4. This can be a bit annoying as the
> name of the network card may depend on the order of options and may
> change if a new PCI device is added earlier on the commande line.
> Being able to provide SMBIOS type 41 entry ensure the name of the
> interface won't change and helps the user guess the right name without
> booting a first time.
> 
> This can be invoked with:
> 
> $QEMU -netdev user,id=internet
>   -device virtio-net-pci,mac=50:54:00:00:00:42,netdev=internet \
>   -smbios type=41,designation='Onboard 
> LAN',instance=1,kind=ethernet,pci=:00:09.0
> 
> Which results in the guest seeing dmidecode data and the interface
> exposed as "eno1":
> 
> $ dmidecode -t 41
> # dmidecode 3.3
> Getting SMBIOS data from sysfs.
> SMBIOS 2.8 present.Handle 0x2900, DMI type 41, 11 bytes
> Onboard Device
> Reference Designation: Onboard LAN
> Type: Ethernet
> Status: Enabled
> Type Instance: 1
> Bus Address: :00:09.0
> $ udevadm info -p /sys/class/net/eno1 | grep ONBOARD
> E: ID_NET_NAME_ONBOARD=eno1
> E: ID_NET_LABEL_ONBOARD=Onboard LAN
> 
> Signed-off-by: Vincent Bernat 
> ---
>  hw/smbios/smbios.c   | 119 +++
>  include/hw/firmware/smbios.h |  11 
>  qemu-options.hx  |   7 ++-
>  3 files changed, 136 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
> index f22c4f5b734e..46a08652dff4 100644
> --- a/hw/smbios/smbios.c
> +++ b/hw/smbios/smbios.c


>  static void smbios_register_config(void)
>  {
>  qemu_add_opts(&qemu_smbios_opts);
> @@ -773,6 +826,26 @@ static void smbios_build_type_32_table(void)
>  SMBIOS_BUILD_TABLE_POST;
>  }
>  
> +static void smbios_build_type_41_table(void)
> +{
> +unsigned instance = 0;
> +struct type41_instance *t41;
> +
> +QTAILQ_FOREACH(t41, &type41, next) {
> +SMBIOS_BUILD_TABLE_PRE(41, 0x2900 + instance, true);
> +
> +SMBIOS_TABLE_SET_STR(41, reference_designation_str, 
> t41->designation);
> +t->device_type = t41->kind;
> +t->device_type_instance = t41->instance;
> +t->segment_group_number = cpu_to_le16(t41->pci.segment);
> +t->bus_number = t41->pci.bus;
> +t->device_number = t41->pci.device;
> +
> +SMBIOS_BUILD_TABLE_POST;
> +instance++;
> +}
> +}
> +
>  static void smbios_build_type_127_table(void)
>  {
>  SMBIOS_BUILD_TABLE_PRE(127, 0x7F00, true); /* required */
> @@ -928,6 +1001,7 @@ void smbios_get_tables(MachineState *ms,
>  
>  smbios_build_type_32_table();
>  smbios_build_type_38_table();
> +smbios_build_type_41_table();
>  smbios_build_type_127_table();
>  
>  smbios_validate_table(ms);
> @@ -1224,6 +1298,51 @@ void smbios_entry_add(QemuOpts *opts, Error **errp)
>  save_opt(&type17.part, opts, "part");
>  type17.speed = qemu_opt_get_number(opts, "speed", 0);
>  return;
> +case 41: {
> +struct type41_instance *t;
> +Error *local_err = NULL;
> +int pseg, pbus, pdevice, pfunction;
> +
> +if (!qemu_opts_validate(opts, qemu_smbios_type41_opts, errp)) {
> +return;
> +}
> +t = calloc(1, sizeof(struct type41_instance));
> +if (!t) {
> +error_setg(errp,
> +   "Unable to allocate memory for a new type 41 
> instance");
> +return;
> +}

QEMU uses GLib allocation functions throughout, which abort
on OOM. So replace this with g_new0.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [RFC PATCH 0/6] Introduce cluster cpu topology support

2021-04-01 Thread wangyanan (Y)

Hi Paolo,

On 2021/3/31 18:00, Paolo Bonzini wrote:

On 31/03/21 11:53, Yanan Wang wrote:

A cluster means a group of cores that share some resources (e.g. cache)
among them under the LLC. For example, ARM64 server chip Kunpeng 920 has
6 or 8 clusters in each NUMA, and each cluster has 4 cores. All clusters
share L3 cache data while cores within each cluster share the L2 cache.


Is this different from what we already have with "-smp dies"?
As far as I know, yes. I think they are two architecture concepts of 
different levels.
A cpu socket/package can consist of multiple dies, and each die can 
consist of
multiple clusters, which means dies are parent for clusters. And this 
kind of cpu

hierarchy structure is normal in ARM64 platform.

Still take above ARM64 server chip Kunpeng 920 as an example, there 
totally are
2 sockets, 2 dies in each socket, 6 or 8 clusters in each die, and 4 
cores in each

cluster. Since it also supports NUMA architecture, then each NUMA actually
represents one die.

Although there is already "-smp dies=*" command line parameter for PC 
Machines,
a cluster level can be added for x86 architecture if meaningful and 
there will be more
work to do in qemu to make use of it. And I am sure that ARM needs this 
level.


If there is something wrong above, please correct me, thanks!

Thanks,
Yanan


Paolo

.




Re: [PATCH v2 1/2] hw/smbios: support for type 41 (onboard devices extended information)

2021-04-01 Thread Vincent Bernat
 ❦  1 avril 2021 09:41 +01, Daniel P. Berrangé:

>> +t = calloc(1, sizeof(struct type41_instance));
>> +if (!t) {
>> +error_setg(errp,
>> +   "Unable to allocate memory for a new type 41 
>> instance");
>> +return;
>> +}
>
> QEMU uses GLib allocation functions throughout, which abort
> on OOM. So replace this with g_new0.

Ack. Will be fixed on the next version. Thanks!
-- 
Make input easy to proofread.
- The Elements of Programming Style (Kernighan & Plauger)



Re: [PATCH v3] replay: notify CPU on event

2021-04-01 Thread Paolo Bonzini

On 01/04/21 10:19, Pavel Dovgalyuk wrote:

This patch enables vCPU notification to wake it up
when new async event comes in replay mode.

The motivation of this patch is the following.
Consider recorded block async event. It is saved into the log
with one of the checkpoints. This checkpoint may be passed in
vCPU loop. In replay mode when this async event is read from
the log, and block thread task is not finished yet, vCPU thread
goes to sleep. That is why this patch adds waking up the vCPU
to process this finished event.

Signed-off-by: Pavel Dovgalyuk 
---
  replay/replay-events.c |2 ++
  1 file changed, 2 insertions(+)

diff --git a/replay/replay-events.c b/replay/replay-events.c
index a1c6bb934e..15983dd250 100644
--- a/replay/replay-events.c
+++ b/replay/replay-events.c
@@ -15,6 +15,7 @@
  #include "replay-internal.h"
  #include "block/aio.h"
  #include "ui/input.h"
+#include "hw/core/cpu.h"
  
  typedef struct Event {

  ReplayAsyncEventKind event_kind;
@@ -126,6 +127,7 @@ void replay_add_event(ReplayAsyncEventKind event_kind,
  
  g_assert(replay_mutex_locked());

  QTAILQ_INSERT_TAIL(&events_list, event, events);
+qemu_cpu_kick(first_cpu);
  }
  
  void replay_bh_schedule_event(QEMUBH *bh)




Queued, thanks.

Paolo




Re: [PATCH v3 03/11] esp: consolidate esp_cmdfifo_push() into esp_fifo_push()

2021-04-01 Thread Mark Cave-Ayland

On 01/04/2021 09:15, Philippe Mathieu-Daudé wrote:


On 4/1/21 9:49 AM, Mark Cave-Ayland wrote:

Each FIFO currently has its own push functions with the only difference being
the capacity check. The original reason for this was that the fifo8
implementation doesn't have a formal API for retrieving the FIFO capacity,
however there are multiple examples within QEMU where the capacity field is
accessed directly.


So the Fifo8 API is not complete / practical.


I guess it depends what you consider to be public and private to Fifo8: what is 
arguably missing is something like:


int fifo8_capacity(Fifo8 *fifo)
{
return fifo->capacity;
}

But given that most other users access fifo->capacity directly then I might as well 
do the same and save myself requiring 2 separate implementations of esp_fifo_pop_buf() :)



Change esp_fifo_push() to access the FIFO capacity directly and then consolidate
esp_cmdfifo_push() into esp_fifo_push().

Signed-off-by: Mark Cave-Ayland 
---
  hw/scsi/esp.c | 27 ---
  1 file changed, 8 insertions(+), 19 deletions(-)

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index 26fe1dcb9d..16aaf8be93 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -98,16 +98,15 @@ void esp_request_cancelled(SCSIRequest *req)
  }
  }
  
-static void esp_fifo_push(ESPState *s, uint8_t val)

+static void esp_fifo_push(Fifo8 *fifo, uint8_t val)
  {
-if (fifo8_num_used(&s->fifo) == ESP_FIFO_SZ) {
+if (fifo8_num_used(fifo) == fifo->capacity) {
  trace_esp_error_fifo_overrun();
  return;
  }
  
-fifo8_push(&s->fifo, val);

+fifo8_push(fifo, val);
  }
-


Spurious line removal?


Ooops yes. I'm not too worried about this, but if Paolo has any other changes then I 
can roll this into a v4.



  static uint8_t esp_fifo_pop(ESPState *s)
  {
  if (fifo8_is_empty(&s->fifo)) {
@@ -117,16 +116,6 @@ static uint8_t esp_fifo_pop(ESPState *s)
  return fifo8_pop(&s->fifo);
  }


Reviewed-by: Philippe Mathieu-Daudé 



ATB,

Mark.



Re: [PATCH v3 07/11] esp: don't underflow cmdfifo in do_cmd()

2021-04-01 Thread Mark Cave-Ayland

On 01/04/2021 09:19, Philippe Mathieu-Daudé wrote:


On 4/1/21 9:49 AM, Mark Cave-Ayland wrote:

If the guest tries to execute a CDB when cmdfifo is not empty before the start
of the message out phase then clearing the message out phase data will cause
cmdfifo to underflow due to cmdfifo_cdb_offset being larger than the amount of
data within.

Since this can only occur by issuing deliberately incorrect instruction
sequences, ensure that the maximum length of esp_fifo_pop_buf() is limited to
the size of the data within cmdfifo.

Buglink: https://bugs.launchpad.net/qemu/+bug/1909247
Signed-off-by: Mark Cave-Ayland 
---
  hw/scsi/esp.c | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index 4decbbfc29..7f49522e1d 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -319,13 +319,15 @@ static void do_busid_cmd(ESPState *s, uint8_t busid)
  
  static void do_cmd(ESPState *s)

  {
-uint8_t busid = fifo8_pop(&s->cmdfifo);
+uint8_t busid = esp_fifo_pop(&s->cmdfifo);
+int len;
  
  s->cmdfifo_cdb_offset--;
  
  /* Ignore extended messages for now */

  if (s->cmdfifo_cdb_offset) {
-esp_fifo_pop_buf(&s->cmdfifo, NULL, s->cmdfifo_cdb_offset);
+len = MIN(s->cmdfifo_cdb_offset, fifo8_num_used(&s->cmdfifo));


Do we want to log(GUEST_ERRORS) this?


Not for this case, since a guest OS may legitimately try a SDTR negotiation using 
extended messages which the ESP implementation currently ignores.



+esp_fifo_pop_buf(&s->cmdfifo, NULL, len);
  s->cmdfifo_cdb_offset = 0;
  }
  



Reviewed-by: Philippe Mathieu-Daudé 



ATB,

Mark.



Re: [PATCH v3 08/11] esp: don't overflow cmdfifo in get_cmd()

2021-04-01 Thread Mark Cave-Ayland

On 01/04/2021 09:19, Philippe Mathieu-Daudé wrote:


On 4/1/21 9:49 AM, Mark Cave-Ayland wrote:

If the guest tries to read a CDB using DMA and cmdfifo is not empty then it is
possible to overflow cmdfifo.

Since this can only occur by issuing deliberately incorrect instruction
sequences, ensure that the maximum length of the CDB transferred to cmdfifo is
limited to the available free space within cmdfifo.

Buglink: https://bugs.launchpad.net/qemu/+bug/1909247
Signed-off-by: Mark Cave-Ayland 
---
  hw/scsi/esp.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index 7f49522e1d..c547c60395 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -243,6 +243,7 @@ static uint32_t get_cmd(ESPState *s, uint32_t maxlen)
  }
  if (s->dma_memory_read) {
  s->dma_memory_read(s->dma_opaque, buf, dmalen);
+dmalen = MIN(fifo8_num_free(&s->fifo), dmalen);


Ditto, GUEST_ERRORS?


Possibly? But then there are several other places where this could also happen. The 
ESP uses the FIFO as a buffer for the SCSI bus in DMA mode, and so at the start of a 
DMA transfer the FIFO can contain leftover junk. This is why all the guest OSs I've 
seen send an explicit "Flush FIFO" command before each command to ensure that any 
junk is removed from the FIFO before sending the message out/CDB.



Reviewed-by: Philippe Mathieu-Daudé 


  fifo8_push_all(&s->cmdfifo, buf, dmalen);
  } else {
  if (esp_select(s) < 0) {



ATB,

Mark.



Re: [PATCH v3 2/4] block: check for sys/disk.h

2021-04-01 Thread Paolo Bonzini

On 01/04/21 10:03, Philippe Mathieu-Daudé wrote:

On 4/1/21 7:08 AM, Joelle van Dyne wrote:

On Mon, Mar 15, 2021 at 11:03 AM Joelle van Dyne  wrote:


Some BSD platforms do not have this header.

Reviewed-by: Peter Maydell 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Joelle van Dyne 


Please bear with me as I am still new to this, but what happens to the
three patches that are reviewed if the last patch does not get
reviewed? Do the reviewed patches still get to make it into 6.0? I am
willing to drop the unreviewed patch if there are issues. Thanks.


I guess this is bad timing, as this time in year various maintainers
are on vacations. Cc'ing Paolo as he sometimes take generic/block
patches.


I didn't notice this series before, and it was posted a bit late (one 
day before feature freeze).  I have queued it, so Joelle need not do 
anything else, but I won't include it in my pull request for 6.0.


Paolo




Re: [PATCH v3 03/11] esp: consolidate esp_cmdfifo_push() into esp_fifo_push()

2021-04-01 Thread Philippe Mathieu-Daudé
On 4/1/21 10:50 AM, Mark Cave-Ayland wrote:
> On 01/04/2021 09:15, Philippe Mathieu-Daudé wrote:
> 
>> On 4/1/21 9:49 AM, Mark Cave-Ayland wrote:
>>> Each FIFO currently has its own push functions with the only
>>> difference being
>>> the capacity check. The original reason for this was that the fifo8
>>> implementation doesn't have a formal API for retrieving the FIFO
>>> capacity,
>>> however there are multiple examples within QEMU where the capacity
>>> field is
>>> accessed directly.
>>
>> So the Fifo8 API is not complete / practical.
> 
> I guess it depends what you consider to be public and private to Fifo8:
> what is arguably missing is something like:
> 
> int fifo8_capacity(Fifo8 *fifo)
> {
>     return fifo->capacity;
> }
> 
> But given that most other users access fifo->capacity directly then I
> might as well do the same and save myself requiring 2 separate
> implementations of esp_fifo_pop_buf() :)

Sorry, I should have been more explicit by precising this was not
a comment directed to your patch, but I was thinking loudly about
the Fifo8 API; and as you mentioned the other models do that so your
kludge is acceptable.
>>> Change esp_fifo_push() to access the FIFO capacity directly and then
>>> consolidate
>>> esp_cmdfifo_push() into esp_fifo_push().
>>>
>>> Signed-off-by: Mark Cave-Ayland 
>>> ---
>>>   hw/scsi/esp.c | 27 ---
>>>   1 file changed, 8 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
>>> index 26fe1dcb9d..16aaf8be93 100644
>>> --- a/hw/scsi/esp.c
>>> +++ b/hw/scsi/esp.c
>>> @@ -98,16 +98,15 @@ void esp_request_cancelled(SCSIRequest *req)
>>>   }
>>>   }
>>>   -static void esp_fifo_push(ESPState *s, uint8_t val)
>>> +static void esp_fifo_push(Fifo8 *fifo, uint8_t val)
>>>   {
>>> -    if (fifo8_num_used(&s->fifo) == ESP_FIFO_SZ) {
>>> +    if (fifo8_num_used(fifo) == fifo->capacity) {
>>>   trace_esp_error_fifo_overrun();
>>>   return;
>>>   }
>>>   -    fifo8_push(&s->fifo, val);
>>> +    fifo8_push(fifo, val);
>>>   }
>>> -
>>
>> Spurious line removal?
> 
> Ooops yes. I'm not too worried about this, but if Paolo has any other
> changes then I can roll this into a v4.

Actually it reappears in patch 05/11 ;)



Re: [PATCH] MAINTAINERS: replace Huawei's email to personal one

2021-04-01 Thread Thomas Huth

On 01/04/2021 09.58, Philippe Mathieu-Daudé wrote:

On 4/1/21 8:34 AM, Thomas Huth wrote:

On 23/03/2021 05.04, Dongjiu Geng wrote:

ping...

sorry for the noise.
On 3/11/2021 19:29,Dongjiu Geng
 wrote:

     In order to conveniently receive email, replace the Huawei
     email address with my personal one.

     Signed-off-by: Dongjiu Geng 
     ---
     MAINTAINERS | 2 +-
     1 file changed, 1 insertion(+), 1 deletion(-)

     diff --git a/MAINTAINERS b/MAINTAINERS
     index e04ae21..823b98b 100644
     --- a/MAINTAINERS
     +++ b/MAINTAINERS
     @@ -1711,7 +1711,7 @@ F: tests/qtest/acpi-utils.[hc]
     F: tests/data/acpi/

     ACPI/HEST/GHES
     -R: Dongjiu Geng 
     +R: Dongjiu Geng 


We prefer the mail to be sent/signed by the listed address,
and acked by the updated address. But we should have tell
you that earlier, when you still had access to your mail.


I've put the gengdong...@huawei.com on CC:, just to make sure, but it was 
bouncing already.



     R: Xiang Zheng 
     L: qemu-...@nongnu.org
     S: Maintained


I'm currently assembling a pull-request that contains some other updates
to MAINTAINERS, too. I'll add your patch there.


A pair of candidates:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg790977.html


Yes, I've queued them.

 Thomas




Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes

2021-04-01 Thread Cédric Le Goater
On 4/1/21 4:59 AM, David Gibson wrote:
> On Wed, Mar 31, 2021 at 05:18:45PM +0200, Cédric Le Goater wrote:
>> On 3/31/21 2:57 AM, David Gibson wrote:
>>> On Mon, Mar 29, 2021 at 03:32:37PM -0300, Daniel Henrique Barboza wrote:


 On 3/29/21 12:32 PM, Cédric Le Goater wrote:
> On 3/29/21 6:20 AM, David Gibson wrote:
>> On Thu, Mar 25, 2021 at 09:56:04AM +0100, Cédric Le Goater wrote:
>>> On 3/25/21 3:10 AM, David Gibson wrote:
 On Tue, Mar 23, 2021 at 02:21:33PM -0300, Daniel Henrique Barboza 
 wrote:
>
>
> On 3/22/21 10:03 PM, David Gibson wrote:
>> On Fri, Mar 19, 2021 at 03:34:52PM -0300, Daniel Henrique Barboza 
>> wrote:
>>> Kernel commit 4bce545903fa ("powerpc/topology: Update
>>> topology_core_cpumask") cause a regression in the pseries machine 
>>> when
>>> defining certain SMP topologies [1]. The reasoning behind the 
>>> change is
>>> explained in kernel commit 4ca234a9cbd7 ("powerpc/smp: Stop updating
>>> cpu_core_mask"). In short, cpu_core_mask logic was causing troubles 
>>> with
>>> large VMs with lots of CPUs and was changed by cpu_cpu_mask 
>>> because, as
>>> far as the kernel understanding of SMP topologies goes, both masks 
>>> are
>>> equivalent.
>>>
>>> Further discussions in the kernel mailing list [2] shown that the
>>> powerpc kernel always considered that the number of sockets were 
>>> equal
>>> to the number of NUMA nodes. The claim is that it doesn't make 
>>> sense,
>>> for Power hardware at least, 2+ sockets being in the same NUMA 
>>> node. The
>>> immediate conclusion is that all SMP topologies the pseries machine 
>>> were
>>> supplying to the kernel, with more than one socket in the same NUMA 
>>> node
>>> as in [1], happened to be correctly represented in the kernel by
>>> accident during all these years.
>>>
>>> There's a case to be made for virtual topologies being detached from
>>> hardware constraints, allowing maximum flexibility to users. At the 
>>> same
>>> time, this freedom can't result in unrealistic hardware 
>>> representations
>>> being emulated. If the real hardware and the pseries kernel don't
>>> support multiple chips/sockets in the same NUMA node, neither 
>>> should we.
>>>
>>> Starting in 6.0.0, all sockets must match an unique NUMA node in the
>>> pseries machine. qtest changes were made to adapt to this new
>>> condition.
>>
>> Oof.  I really don't like this idea.  It means a bunch of fiddly work
>> for users to match these up, for no real gain.  I'm also concerned
>> that this will require follow on changes in libvirt to not make this 
>> a
>> really cryptic and irritating point of failure.
>
> Haven't though about required Libvirt changes, although I can say 
> that there
> will be some amount to be mande and it will probably annoy existing 
> users
> (everyone that has a multiple socket per NUMA node topology).
>
> There is not much we can do from the QEMU layer aside from what I've 
> proposed
> here. The other alternative is to keep interacting with the kernel 
> folks to
> see if there is a way to keep our use case untouched.

 Right.  Well.. not necessarily untouched, but I'm hoping for more
 replies from Cédric to my objections and mpe's.  Even with sockets
 being a kinda meaningless concept in PAPR, I don't think tying it to
 NUMA nodes makes sense.
>>>
>>> I did a couple of replies in different email threads but maybe not
>>> to all. I felt it was going nowhere :/ Couple of thoughts,
>>
>> I think I saw some of those, but maybe not all.
>>
>>> Shouldn't we get rid of the socket concept, die also, under pseries
>>> since they don't exist under PAPR ? We only have numa nodes, cores,
>>> threads AFAICT.
>>
>> Theoretically, yes.  I'm not sure it's really practical, though, since
>> AFAICT, both qemu and the kernel have the notion of sockets (though
>> not dies) built into generic code.
>
> Yes. But, AFAICT, these topology notions have not reached "arch/powerpc"
> and PPC Linux only has a NUMA node id, on pseries and powernv.
>
>> It does mean that one possible approach here - maybe the best one - is
>> to simply declare that sockets are meaningless under, so we simply
>> don't expect what the guest kernel reports to match what's given to
>> qemu.
>>
>> It'd be nice to avoid that if we can: in a sense it's just cosmetic,
>> but it is likely to surprise and confuse people.
>

[BUG FIX][PATCH v3 0/3] vhost-user-blk: fix bug on device disconnection during initialization

2021-04-01 Thread Denis Plotnikov

This is a series fixing a bug in host-user-blk.
Is there any chance for it to be considered for the next rc?

Thanks!

Denis

On 29.03.2021 16:44, Denis Plotnikov wrote:


ping!

On 25.03.2021 18:12, Denis Plotnikov wrote:

v3:
   * 0003: a new patch added fixing the problem on vm shutdown
 I stumbled on this bug after v2 sending.
   * 0001: gramma fixing (Raphael)
   * 0002: commit message fixing (Raphael)

v2:
   * split the initial patch into two (Raphael)
   * rename init to realized (Raphael)
   * remove unrelated comment (Raphael)

When the vhost-user-blk device lose the connection to the daemon during
the initialization phase it kills qemu because of the assert in the code.
The series fixes the bug.

0001 is preparation for the fix
0002 fixes the bug, patch description has the full motivation for the series
0003 (added in v3) fix bug on vm shutdown

Denis Plotnikov (3):
   vhost-user-blk: use different event handlers on initialization
   vhost-user-blk: perform immediate cleanup if disconnect on
 initialization
   vhost-user-blk: add immediate cleanup on shutdown

  hw/block/vhost-user-blk.c | 79 ---
  1 file changed, 48 insertions(+), 31 deletions(-)



[PATCH for-6.0 v1 2/4] migration: Inhibit virtio-balloon for the duration of background snapshot

2021-04-01 Thread Andrey Gruzdev
The same thing as for incoming postcopy - we cannot deal with concurrent
RAM discards when using background snapshot feature in outgoing migration.

Fixes: 8518278a6af589ccc401f06e35f171b1e6fae800 (migration: implementation
  of background snapshot thread)
Signed-off-by: Andrey Gruzdev 
Reported-by: David Hildenbrand 
Reviewed-by: David Hildenbrand 
---
 hw/virtio/virtio-balloon.c | 8 ++--
 include/migration/misc.h   | 2 ++
 migration/migration.c  | 8 
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index e770955176..d120bf8f43 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -66,8 +66,12 @@ static bool 
virtio_balloon_pbp_matches(PartiallyBalloonedPage *pbp,
 
 static bool virtio_balloon_inhibited(void)
 {
-/* Postcopy cannot deal with concurrent discards, so it's special. */
-return ram_block_discard_is_disabled() || migration_in_incoming_postcopy();
+/*
+ * Postcopy cannot deal with concurrent discards,
+ * so it's special, as well as background snapshots.
+ */
+return ram_block_discard_is_disabled() || migration_in_incoming_postcopy() 
||
+migration_in_bg_snapshot();
 }
 
 static void balloon_inflate_page(VirtIOBalloon *balloon,
diff --git a/include/migration/misc.h b/include/migration/misc.h
index bccc1b6b44..738675ef52 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -70,6 +70,8 @@ bool migration_in_postcopy_after_devices(MigrationState *);
 void migration_global_dump(Monitor *mon);
 /* True if incomming migration entered POSTCOPY_INCOMING_DISCARD */
 bool migration_in_incoming_postcopy(void);
+/* True if background snapshot is active */
+bool migration_in_bg_snapshot(void);
 
 /* migration/block-dirty-bitmap.c */
 void dirty_bitmap_mig_init(void);
diff --git a/migration/migration.c b/migration/migration.c
index 00e13f9d58..be4729e7c8 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1976,6 +1976,14 @@ bool migration_in_incoming_postcopy(void)
 return ps >= POSTCOPY_INCOMING_DISCARD && ps < POSTCOPY_INCOMING_END;
 }
 
+bool migration_in_bg_snapshot(void)
+{
+MigrationState *s = migrate_get_current();
+
+return migrate_background_snapshot() &&
+migration_is_setup_or_active(s->state);
+}
+
 bool migration_is_idle(void)
 {
 MigrationState *s = current_migration;
-- 
2.27.0




[PATCH for-6.0 v1 1/4] migration: Fix missing qemu_fflush() on buffer file in bg_migration_thread

2021-04-01 Thread Andrey Gruzdev
Added missing qemu_fflush() on buffer file holding precopy device state.
Increased initial QIOChannelBuffer allocation to 512KB to avoid reallocs.
Typical configurations often require >200KB for device state and VMDESC.

Fixes: 8518278a6af589ccc401f06e35f171b1e6fae800 (migration: implementation
  of background snapshot thread)
Signed-off-by: Andrey Gruzdev 
---
 migration/migration.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index ca8b97baa5..00e13f9d58 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3812,7 +3812,7 @@ static void *bg_migration_thread(void *opaque)
  * with vCPUs running and, finally, write stashed non-RAM part of
  * the vmstate from the buffer to the migration stream.
  */
-s->bioc = qio_channel_buffer_new(128 * 1024);
+s->bioc = qio_channel_buffer_new(512 * 1024);
 qio_channel_set_name(QIO_CHANNEL(s->bioc), "vmstate-buffer");
 fb = qemu_fopen_channel_output(QIO_CHANNEL(s->bioc));
 object_unref(OBJECT(s->bioc));
@@ -3866,6 +3866,12 @@ static void *bg_migration_thread(void *opaque)
 if (qemu_savevm_state_complete_precopy_non_iterable(fb, false, false)) {
 goto fail;
 }
+/*
+ * Since we are going to get non-iterable state data directly
+ * from s->bioc->data, explicit flush is needed here.
+ */
+qemu_fflush(fb);
+
 /* Now initialize UFFD context and start tracking RAM writes */
 if (ram_write_tracking_start()) {
 goto fail;
-- 
2.27.0




[PATCH for-6.0 v1 4/4] migration: Rename 'bs' to 'block' in background snapshot code

2021-04-01 Thread Andrey Gruzdev
Rename 'bs' to commonly used 'block' in migration/ram.c background
snapshot code.

Signed-off-by: Andrey Gruzdev 
Reported-by: David Hildenbrand 
---
 migration/ram.c | 86 +
 1 file changed, 44 insertions(+), 42 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 7e2bc0fdd3..4682f3625c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1455,7 +1455,7 @@ static RAMBlock *poll_fault_page(RAMState *rs, ram_addr_t 
*offset)
 {
 struct uffd_msg uffd_msg;
 void *page_address;
-RAMBlock *bs;
+RAMBlock *block;
 int res;
 
 if (!migrate_background_snapshot()) {
@@ -1468,9 +1468,9 @@ static RAMBlock *poll_fault_page(RAMState *rs, ram_addr_t 
*offset)
 }
 
 page_address = (void *)(uintptr_t) uffd_msg.arg.pagefault.address;
-bs = qemu_ram_block_from_host(page_address, false, offset);
-assert(bs && (bs->flags & RAM_UF_WRITEPROTECT) != 0);
-return bs;
+block = qemu_ram_block_from_host(page_address, false, offset);
+assert(block && (block->flags & RAM_UF_WRITEPROTECT) != 0);
+return block;
 }
 
 /**
@@ -1526,7 +1526,7 @@ bool ram_write_tracking_compatible(void)
 {
 const uint64_t uffd_ioctls_mask = BIT(_UFFDIO_WRITEPROTECT);
 int uffd_fd;
-RAMBlock *bs;
+RAMBlock *block;
 bool ret = false;
 
 /* Open UFFD file descriptor */
@@ -1537,15 +1537,15 @@ bool ram_write_tracking_compatible(void)
 
 RCU_READ_LOCK_GUARD();
 
-RAMBLOCK_FOREACH_NOT_IGNORED(bs) {
+RAMBLOCK_FOREACH_NOT_IGNORED(block) {
 uint64_t uffd_ioctls;
 
 /* Nothing to do with read-only and MMIO-writable regions */
-if (bs->mr->readonly || bs->mr->rom_device) {
+if (block->mr->readonly || block->mr->rom_device) {
 continue;
 }
 /* Try to register block memory via UFFD-IO to track writes */
-if (uffd_register_memory(uffd_fd, bs->host, bs->max_length,
+if (uffd_register_memory(uffd_fd, block->host, block->max_length,
 UFFDIO_REGISTER_MODE_WP, &uffd_ioctls)) {
 goto out;
 }
@@ -1567,13 +1567,13 @@ out:
  * Since it's solely used for userfault_fd WP feature, here we just
  *   hardcode page size to qemu_real_host_page_size.
  *
- * @bs: RAM block to populate
+ * @block: RAM block to populate
  */
-static void ram_block_populate_pages(RAMBlock *bs)
+static void ram_block_populate_pages(RAMBlock *block)
 {
-char *ptr = (char *) bs->host;
+char *ptr = (char *) block->host;
 
-for (ram_addr_t offset = 0; offset < bs->used_length;
+for (ram_addr_t offset = 0; offset < block->used_length;
 offset += qemu_real_host_page_size) {
 char tmp = *(ptr + offset);
 
@@ -1587,13 +1587,13 @@ static void ram_block_populate_pages(RAMBlock *bs)
  */
 void ram_write_tracking_prepare(void)
 {
-RAMBlock *bs;
+RAMBlock *block;
 
 RCU_READ_LOCK_GUARD();
 
-RAMBLOCK_FOREACH_NOT_IGNORED(bs) {
+RAMBLOCK_FOREACH_NOT_IGNORED(block) {
 /* Nothing to do with read-only and MMIO-writable regions */
-if (bs->mr->readonly || bs->mr->rom_device) {
+if (block->mr->readonly || block->mr->rom_device) {
 continue;
 }
 
@@ -1605,7 +1605,7 @@ void ram_write_tracking_prepare(void)
  * UFFDIO_WRITEPROTECT_MODE_WP mode setting would silently skip
  * pages with pte_none() entries in page table.
  */
-ram_block_populate_pages(bs);
+ram_block_populate_pages(block);
 }
 }
 
@@ -1618,7 +1618,7 @@ int ram_write_tracking_start(void)
 {
 int uffd_fd;
 RAMState *rs = ram_state;
-RAMBlock *bs;
+RAMBlock *block;
 
 /* Open UFFD file descriptor */
 uffd_fd = uffd_create_fd(UFFD_FEATURE_PAGEFAULT_FLAG_WP, true);
@@ -1629,27 +1629,27 @@ int ram_write_tracking_start(void)
 
 RCU_READ_LOCK_GUARD();
 
-RAMBLOCK_FOREACH_NOT_IGNORED(bs) {
+RAMBLOCK_FOREACH_NOT_IGNORED(block) {
 /* Nothing to do with read-only and MMIO-writable regions */
-if (bs->mr->readonly || bs->mr->rom_device) {
+if (block->mr->readonly || block->mr->rom_device) {
 continue;
 }
 
 /* Register block memory with UFFD to track writes */
-if (uffd_register_memory(rs->uffdio_fd, bs->host,
-bs->max_length, UFFDIO_REGISTER_MODE_WP, NULL)) {
+if (uffd_register_memory(rs->uffdio_fd, block->host,
+block->max_length, UFFDIO_REGISTER_MODE_WP, NULL)) {
 goto fail;
 }
 /* Apply UFFD write protection to the block memory range */
-if (uffd_change_protection(rs->uffdio_fd, bs->host,
-bs->max_length, true, false)) {
+if (uffd_change_protection(rs->uffdio_fd, block->host,
+block->max_length, true, false)) {
 goto fail;
 }
-bs->flags |= RAM_UF_WRITEPROTECT;
-memory_region_ref(bs->mr);
+block->flags |= RAM_UF_WRITEPROTE

[PATCH for-6.0 v1 3/4] migration: Pre-fault memory before starting background snasphot

2021-04-01 Thread Andrey Gruzdev
This commit solves the issue with userfault_fd WP feature that
background snapshot is based on. For any never poluated or discarded
memory page, the UFFDIO_WRITEPROTECT ioctl() would skip updating
PTE for that page, thereby loosing WP setting for it.

So we need to pre-fault pages for each RAM block to be protected
before making a userfault_fd wr-protect ioctl().

Fixes: 278e2f551a095b234de74dca9c214d5502a1f72c (migration: support
  UFFD write fault processing in ram_save_iterate())
Signed-off-by: Andrey Gruzdev 
Reported-by: David Hildenbrand 
Reviewed-by: David Hildenbrand 
---
 migration/migration.c |  6 ++
 migration/ram.c   | 49 +++
 migration/ram.h   |  1 +
 3 files changed, 56 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index be4729e7c8..71bce15a1b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3827,6 +3827,12 @@ static void *bg_migration_thread(void *opaque)
 
 update_iteration_initial_status(s);
 
+/*
+ * Prepare for tracking memory writes with UFFD-WP - populate
+ * RAM pages before protecting.
+ */
+ram_write_tracking_prepare();
+
 qemu_savevm_state_header(s->to_dst_file);
 qemu_savevm_state_setup(s->to_dst_file);
 
diff --git a/migration/ram.c b/migration/ram.c
index 40e78952ad..7e2bc0fdd3 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1560,6 +1560,55 @@ out:
 return ret;
 }
 
+/*
+ * ram_block_populate_pages: populate memory in the RAM block by reading
+ *   an integer from the beginning of each page.
+ *
+ * Since it's solely used for userfault_fd WP feature, here we just
+ *   hardcode page size to qemu_real_host_page_size.
+ *
+ * @bs: RAM block to populate
+ */
+static void ram_block_populate_pages(RAMBlock *bs)
+{
+char *ptr = (char *) bs->host;
+
+for (ram_addr_t offset = 0; offset < bs->used_length;
+offset += qemu_real_host_page_size) {
+char tmp = *(ptr + offset);
+
+/* Don't optimize the read out */
+asm volatile("" : "+r" (tmp));
+}
+}
+
+/*
+ * ram_write_tracking_prepare: prepare for UFFD-WP memory tracking
+ */
+void ram_write_tracking_prepare(void)
+{
+RAMBlock *bs;
+
+RCU_READ_LOCK_GUARD();
+
+RAMBLOCK_FOREACH_NOT_IGNORED(bs) {
+/* Nothing to do with read-only and MMIO-writable regions */
+if (bs->mr->readonly || bs->mr->rom_device) {
+continue;
+}
+
+/*
+ * Populate pages of the RAM block before enabling userfault_fd
+ * write protection.
+ *
+ * This stage is required since ioctl(UFFDIO_WRITEPROTECT) with
+ * UFFDIO_WRITEPROTECT_MODE_WP mode setting would silently skip
+ * pages with pte_none() entries in page table.
+ */
+ram_block_populate_pages(bs);
+}
+}
+
 /*
  * ram_write_tracking_start: start UFFD-WP memory tracking
  *
diff --git a/migration/ram.h b/migration/ram.h
index 6378bb3ebc..4833e9fd5b 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -82,6 +82,7 @@ void colo_incoming_start_dirty_log(void);
 /* Background snapshot */
 bool ram_write_tracking_available(void);
 bool ram_write_tracking_compatible(void);
+void ram_write_tracking_prepare(void);
 int ram_write_tracking_start(void);
 void ram_write_tracking_stop(void);
 
-- 
2.27.0




[PATCH for-6.0 v1 0/4] migration: Fixes to the 'background-snapshot' code

2021-04-01 Thread Andrey Gruzdev
Changes v0->v1:
 * Fixes to coding style and commit messages
 * Renamed 'bs' to 'block' in migration/ram.c background snapshot code

This patch series contains:
 * Fix to the issue with occasionally truncated non-iterable device state
 * Solution to compatibility issues with virtio-balloon device
 * Fix to the issue when discarded or never populated pages miss UFFD
   write protection and get into migration stream in dirty state
 * Renaming of 'bs' to commonly used 'block' in migration/ram.c background
   snapshot code

Andrey Gruzdev (4):
  migration: Fix missing qemu_fflush() on buffer file in
bg_migration_thread
  migration: Inhibit virtio-balloon for the duration of background
snapshot
  migration: Pre-fault memory before starting background snasphot
  migration: Rename 'bs' to 'block' in background snapshot code

 hw/virtio/virtio-balloon.c |   8 ++-
 include/migration/misc.h   |   2 +
 migration/migration.c  |  22 ++-
 migration/ram.c| 119 ++---
 migration/ram.h|   1 +
 5 files changed, 115 insertions(+), 37 deletions(-)

-- 
2.27.0




[RFC v2 1/4] target/riscv: add RNMI cpu feature

2021-04-01 Thread frank . chang
From: Frank Chang 

Signed-off-by: Frank Chang 
---
 hw/riscv/riscv_hart.c |  8 +++
 include/hw/riscv/riscv_hart.h |  2 ++
 target/riscv/cpu.c| 40 +++
 target/riscv/cpu.h| 12 ++-
 target/riscv/cpu_bits.h   |  6 ++
 5 files changed, 67 insertions(+), 1 deletion(-)

diff --git a/hw/riscv/riscv_hart.c b/hw/riscv/riscv_hart.c
index 613ea2aaa0b..b8cb5088638 100644
--- a/hw/riscv/riscv_hart.c
+++ b/hw/riscv/riscv_hart.c
@@ -33,6 +33,10 @@ static Property riscv_harts_props[] = {
 DEFINE_PROP_STRING("cpu-type", RISCVHartArrayState, cpu_type),
 DEFINE_PROP_UINT64("resetvec", RISCVHartArrayState, resetvec,
DEFAULT_RSTVEC),
+DEFINE_PROP_UINT64("rnmi_irqvec", RISCVHartArrayState, rnmi_irqvec,
+   DEFAULT_RNMI_IRQVEC),
+DEFINE_PROP_UINT64("rnmi_excpvec", RISCVHartArrayState, rnmi_excpvec,
+   DEFAULT_RNMI_EXCPVEC),
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -47,6 +51,10 @@ static bool riscv_hart_realize(RISCVHartArrayState *s, int 
idx,
 {
 object_initialize_child(OBJECT(s), "harts[*]", &s->harts[idx], cpu_type);
 qdev_prop_set_uint64(DEVICE(&s->harts[idx]), "resetvec", s->resetvec);
+qdev_prop_set_uint64(DEVICE(&s->harts[idx]), "rnmi_irqvec",
+s->rnmi_irqvec);
+qdev_prop_set_uint64(DEVICE(&s->harts[idx]), "rnmi_excpvec",
+s->rnmi_excpvec);
 s->harts[idx].env.mhartid = s->hartid_base + idx;
 qemu_register_reset(riscv_harts_cpu_reset, &s->harts[idx]);
 return qdev_realize(DEVICE(&s->harts[idx]), NULL, errp);
diff --git a/include/hw/riscv/riscv_hart.h b/include/hw/riscv/riscv_hart.h
index bbc21cdc9a6..48e6730d832 100644
--- a/include/hw/riscv/riscv_hart.h
+++ b/include/hw/riscv/riscv_hart.h
@@ -38,6 +38,8 @@ struct RISCVHartArrayState {
 uint32_t hartid_base;
 char *cpu_type;
 uint64_t resetvec;
+uint64_t rnmi_irqvec;
+uint64_t rnmi_excpvec;
 RISCVCPU *harts;
 };
 
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 7d6ed80f6b6..9fb6ceb0ad8 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -137,6 +137,14 @@ static void set_feature(CPURISCVState *env, int feature)
 env->features |= (1ULL << feature);
 }
 
+static void set_rnmi_vectors(CPURISCVState *env, int irqvec, int excpvec)
+{
+#ifndef CONFIG_USER_ONLY
+env->rnmi_irqvec = irqvec;
+env->rnmi_excpvec = excpvec;
+#endif
+}
+
 static void set_resetvec(CPURISCVState *env, int resetvec)
 {
 #ifndef CONFIG_USER_ONLY
@@ -373,6 +381,23 @@ static void riscv_cpu_disas_set_info(CPUState *s, 
disassemble_info *info)
 }
 }
 
+#ifndef CONFIG_USER_ONLY
+static void riscv_cpu_set_rnmi(void *opaque, int irq, int level)
+{
+RISCVCPU *cpu = opaque;
+CPURISCVState *env = &cpu->env;
+CPUState *cs = CPU(cpu);
+
+if (level) {
+env->nmip |= 1 << irq;
+cpu_interrupt(cs, CPU_INTERRUPT_RNMI);
+} else {
+env->nmip &= ~(1 << irq);
+cpu_reset_interrupt(cs, CPU_INTERRUPT_RNMI);
+}
+}
+#endif
+
 static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 {
 CPUState *cs = CPU(dev);
@@ -416,6 +441,16 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
 
 set_resetvec(env, cpu->cfg.resetvec);
 
+if (cpu->cfg.rnmi) {
+set_feature(env, RISCV_FEATURE_RNMI);
+set_rnmi_vectors(env, cpu->cfg.rnmi_irqvec, cpu->cfg.rnmi_excpvec);
+#ifndef CONFIG_USER_ONLY
+env->nmie = true;
+qdev_init_gpio_in_named(DEVICE(cpu), riscv_cpu_set_rnmi,
+"rnmi", TARGET_LONG_BITS);
+#endif
+}
+
 /* If only XLEN is set for misa, then set misa from properties */
 if (env->misa == RV32 || env->misa == RV64) {
 /* Do some ISA extension error checking */
@@ -555,6 +590,11 @@ static Property riscv_cpu_properties[] = {
 DEFINE_PROP_BOOL("mmu", RISCVCPU, cfg.mmu, true),
 DEFINE_PROP_BOOL("pmp", RISCVCPU, cfg.pmp, true),
 DEFINE_PROP_UINT64("resetvec", RISCVCPU, cfg.resetvec, DEFAULT_RSTVEC),
+DEFINE_PROP_BOOL("rnmi", RISCVCPU, cfg.rnmi, false),
+DEFINE_PROP_UINT64("rnmi_irqvec", RISCVCPU, cfg.rnmi_irqvec,
+   DEFAULT_RNMI_IRQVEC),
+DEFINE_PROP_UINT64("rnmi_excpvec", RISCVCPU, cfg.rnmi_excpvec,
+   DEFAULT_RNMI_EXCPVEC),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 0a33d387ba8..7d2bb7e7003 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -80,7 +80,8 @@
 enum {
 RISCV_FEATURE_MMU,
 RISCV_FEATURE_PMP,
-RISCV_FEATURE_MISA
+RISCV_FEATURE_MISA,
+RISCV_FEATURE_RNMI,
 };
 
 #define PRIV_VERSION_1_10_0 0x00011000
@@ -178,6 +179,12 @@ struct CPURISCVState {
 target_ulong mcause;
 target_ulong mtval;  /* since: priv-1.10.0 */
 
+/* NMI */
+bool nmie;
+target_ulong nmip;
+target_ulong rnmi_irqvec;
+target_ulon

[RFC v2 2/4] target/riscv: add RNMI CSRs

2021-04-01 Thread frank . chang
From: Frank Chang 

Signed-off-by: Frank Chang 
---
 target/riscv/cpu.h  |  4 +++
 target/riscv/cpu_bits.h |  9 +++
 target/riscv/csr.c  | 59 +
 3 files changed, 72 insertions(+)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 7d2bb7e7003..674ee4dc999 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -180,6 +180,10 @@ struct CPURISCVState {
 target_ulong mtval;  /* since: priv-1.10.0 */
 
 /* NMI */
+target_ulong mnscratch;
+target_ulong mnepc;
+target_ulong mncause; /* mncause without bit XLEN-1 set to 1 */
+target_ulong mnstatus;
 bool nmie;
 target_ulong nmip;
 target_ulong rnmi_irqvec;
diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index 8e5f0be599a..a376ede0cc5 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -166,6 +166,12 @@
 #define CSR_MTVAL   0x343
 #define CSR_MIP 0x344
 
+/* NMI */
+#define CSR_MNSCRATCH   0x350
+#define CSR_MNEPC   0x351
+#define CSR_MNCAUSE 0x352
+#define CSR_MNSTATUS0x353
+
 /* Legacy Machine Trap Handling (priv v1.9.1) */
 #define CSR_MBADADDR0x343
 
@@ -558,6 +564,9 @@
 #define RISCV_EXCP_INT_FLAG0x8000
 #define RISCV_EXCP_INT_MASK0x7fff
 
+/* RNMI mnstatus CSR mask */
+#define MNSTATUS_MPP   MSTATUS_MPP
+
 /* Interrupt causes */
 #define IRQ_U_SOFT 0
 #define IRQ_S_SOFT 1
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index d2585395bfb..489d6d90e68 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -188,6 +188,11 @@ static int pmp(CPURISCVState *env, int csrno)
 {
 return -!riscv_feature(env, RISCV_FEATURE_PMP);
 }
+
+static int nmi(CPURISCVState *env, int csrno)
+{
+return -!riscv_feature(env, RISCV_FEATURE_RNMI);
+}
 #endif
 
 /* User Floating-Point CSRs */
@@ -713,6 +718,54 @@ static int write_mbadaddr(CPURISCVState *env, int csrno, 
target_ulong val)
 return 0;
 }
 
+static int read_mnscratch(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->mnscratch;
+return 0;
+}
+
+static int write_mnscratch(CPURISCVState *env, int csrno, target_ulong val)
+{
+env->mnscratch = val;
+return 0;
+}
+
+static int read_mnepc(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->mnepc;
+return 0;
+}
+
+static int write_mnepc(CPURISCVState *env, int csrno, target_ulong val)
+{
+env->mnepc = val;
+return 0;
+}
+
+static int read_mncause(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->mncause;
+return 0;
+}
+
+static int write_mncause(CPURISCVState *env, int csrno, target_ulong val)
+{
+env->mncause = val;
+return 0;
+}
+
+static int read_mnstatus(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->mnstatus;
+return 0;
+}
+
+static int write_mnstatus(CPURISCVState *env, int csrno, target_ulong val)
+{
+env->mnstatus = val & MNSTATUS_MPP;
+return 0;
+}
+
 static int rmw_mip(CPURISCVState *env, int csrno, target_ulong *ret_value,
target_ulong new_value, target_ulong write_mask)
 {
@@ -1428,6 +1481,12 @@ riscv_csr_operations csr_ops[CSR_TABLE_SIZE] = {
 [CSR_MBADADDR] = { "mbadaddr", any,  read_mbadaddr, write_mbadaddr },
 [CSR_MIP]  = { "mip",  any,  NULL,NULL, rmw_mip},
 
+/* NMI */
+[CSR_MNSCRATCH] = { "mnscratch", nmi, read_mnscratch, write_mnscratch },
+[CSR_MNEPC] = { "mnepc", nmi, read_mnepc, write_mnepc },
+[CSR_MNCAUSE]   = { "mncause",   nmi, read_mncause,   write_mncause   },
+[CSR_MNSTATUS]  = { "mnstatus",  nmi, read_mnstatus,  write_mnstatus  },
+
 /* Supervisor Trap Setup */
 [CSR_SSTATUS]= { "sstatus",smode, read_sstatus,write_sstatus   
 },
 [CSR_SIE]= { "sie",smode, read_sie,write_sie   
 },
-- 
2.17.1




[RFC v2 3/4] target/riscv: handle RNMI interrupt and exception

2021-04-01 Thread frank . chang
From: Frank Chang 

Signed-off-by: Frank Chang 
---
 target/riscv/cpu_bits.h   |  4 
 target/riscv/cpu_helper.c | 49 +++
 2 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index a376ede0cc5..937b1f28455 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -607,4 +607,8 @@
 #define MIE_UTIE   (1 << IRQ_U_TIMER)
 #define MIE_SSIE   (1 << IRQ_S_SOFT)
 #define MIE_USIE   (1 << IRQ_U_SOFT)
+
+/* RISC-V-specific interrupt pending bits. */
+#define CPU_INTERRUPT_RNMI CPU_INTERRUPT_TGT_EXT_0
+
 #endif
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 21c54ef5613..67a633154a9 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -38,6 +38,19 @@ int riscv_cpu_mmu_index(CPURISCVState *env, bool ifetch)
 #ifndef CONFIG_USER_ONLY
 static int riscv_cpu_local_irq_pending(CPURISCVState *env)
 {
+if (riscv_feature(env, RISCV_FEATURE_RNMI)) {
+/* Priority: RNMI > Other interrupt. */
+if (env->nmip && env->nmie) {
+return ctz64(env->nmip); /* since non-zero */
+} else if (!env->nmie) {
+/*
+ * We are already in RNMI handler,
+ * other interrupts cannot preempt.
+ */
+return EXCP_NONE;
+}
+}
+
 target_ulong irqs;
 
 target_ulong mstatus_mie = get_field(env->mstatus, MSTATUS_MIE);
@@ -80,7 +93,9 @@ static int riscv_cpu_local_irq_pending(CPURISCVState *env)
 bool riscv_cpu_exec_interrupt(CPUState *cs, int interrupt_request)
 {
 #if !defined(CONFIG_USER_ONLY)
-if (interrupt_request & CPU_INTERRUPT_HARD) {
+uint32_t mask = CPU_INTERRUPT_HARD | CPU_INTERRUPT_RNMI;
+
+if (interrupt_request & mask) {
 RISCVCPU *cpu = RISCV_CPU(cs);
 CPURISCVState *env = &cpu->env;
 int interruptno = riscv_cpu_local_irq_pending(env);
@@ -909,6 +924,23 @@ void riscv_cpu_do_interrupt(CPUState *cs)
 target_ulong tval = 0;
 target_ulong htval = 0;
 target_ulong mtval2 = 0;
+target_ulong nextpc;
+bool nmi_execp = false;
+
+if (riscv_feature(env, RISCV_FEATURE_RNMI)) {
+nmi_execp = !env->nmie && !async;
+
+if (env->nmip && async) {
+env->nmie = false;
+env->mnstatus = set_field(env->mnstatus, MSTATUS_MPP,
+  env->priv);
+env->mncause = cause;
+env->mnepc = env->pc;
+env->pc = env->rnmi_irqvec;
+riscv_cpu_set_mode(env, PRV_M);
+goto handled;
+}
+}
 
 if  (cause == RISCV_EXCP_SEMIHOST) {
 if (env->priv >= PRV_S) {
@@ -967,7 +999,7 @@ void riscv_cpu_do_interrupt(CPUState *cs)
   __func__, env->mhartid, async, cause, env->pc, tval,
   riscv_cpu_get_trap_name(cause, async));
 
-if (env->priv <= PRV_S &&
+if (env->priv <= PRV_S && !nmi_execp &&
 cause < TARGET_LONG_BITS && ((deleg >> cause) & 1)) {
 /* handle the trap in S-mode */
 if (riscv_has_ext(env, RVH)) {
@@ -1056,8 +1088,15 @@ void riscv_cpu_do_interrupt(CPUState *cs)
 env->mepc = env->pc;
 env->mbadaddr = tval;
 env->mtval2 = mtval2;
-env->pc = (env->mtvec >> 2 << 2) +
-((async && (env->mtvec & 3) == 1) ? cause * 4 : 0);
+
+if (nmi_execp) {
+nextpc = env->rnmi_excpvec;
+} else {
+nextpc = (env->mtvec >> 2 << 2) +
+((async && (env->mtvec & 3) == 1) ? cause * 4 : 0);
+}
+env->pc = nextpc;
+
 riscv_cpu_set_mode(env, PRV_M);
 }
 
@@ -1068,6 +1107,8 @@ void riscv_cpu_do_interrupt(CPUState *cs)
  */
 
 env->two_stage_lookup = false;
+
+handled:
 #endif
 cs->exception_index = EXCP_NONE; /* mark handled to qemu */
 }
-- 
2.17.1




[RFC v2 0/4] target/riscv: add RNMI support

2021-04-01 Thread frank . chang
From: Frank Chang 

This patchset add suport of Resumable NMI (RNMI) in RISC-V.

There are four new CSRs and one new instruction added to allow NMI to be
resumable in RISC-V, which are:

=
  * mnscratch (0x350)
  * mnepc (0x351)
  * mncause   (0x352)
  * mnstatus  (0x353)
=
  * mnret: To return from RNMI interrupt/exception handler.
=

RNMI also has higher priority than any other interrupts or exceptions
and cannot be disable by software.

RNMI may be used to route to other devices such as Bus Error Unit or
Watchdog Timer in the future.

The interrupt/exception trap handler addresses of RNMI are
implementation defined.

The technical proposal of RNMI can be found:
https://lists.riscv.org/g/tech-privileged/message/421

The port is available here:
https://github.com/sifive/qemu/tree/nmi-upstream-v2

To test RNMI, we have created another QEMU branch to have
RNMI feature enabled and also both SiFive Bus Error Unit and
Error Device included on sifive_e machine.

Bus Error Unit (BEU) is routed to RNMI with mncause value set to 3.
When any reads or writes to Error Device, it will drive BEU RNMI.
A freedom-e-sdk RNMI example is also provided for testing.
(We will also upstream BEU and Error Device in the near future.)

Two -cpu options are added for RNMI and BEU:
  * rnmi=true to enable RNMI feature
  * beu=true to enable BEU feature

Download and build freedom-e-sdk rnmi example:

1. git clone g...@github.com:sifive/freedom-e-sdk.git
2. cd freedom-e-sdk
3. git checkout origin/dev/yihaoc/nmi -b nmi
4. git submodule init
5. git submodule update --recursive
6. Follow freedom-e-sdk guide to install freedom-e-sdk:
   https://sifive.github.io/freedom-e-sdk-docs/index.html
7. make PROGRAM=example-rnmi TARGET=qemu-sifive-e31 \
CONFIGURATION=release software

Download, build and run freedom-e-sdk RNMI example on QEMU:

1. git clone g...@github.com:sifive/qemu.git
2. cd qemu
3. git checkout origin/upstream-nmi-beu-error-device -b nmi-beu-error-device
4. git submodule init
5. git submodule update --recursive
6. ./configure --target-list=riscv32-softmmu
7. make -j
8.  -nographic -M sifive_e \
-cpu rv32,rnmi=true,beu=true \
--bios none -kernel 

Output example:

RNMI Driver Example.
Cleared accrued bus error
Enter RNMI interrupt ISR.
mnscratch: 0x
mnepc: 0x20401178
mncause: 0x0003
mnstatus: 0x1800
Try to trigger illegal instruction exception.
Enter RNMI exception ISR.
mscratch: 0x
mepc: 0x20401208
mcause: 0x0002
mstatus: 0x80007800
Return from RNMI exception ISR.
Handled TileLink bus error
Return from RNMI interrupt ISR.
Test passed!!

Changelog:

v2
  * split up the series into more commits for convenience of review.
  * add missing rnmi_irqvec and rnmi_excpvec properties to riscv_harts.

Frank Chang (4):
  target/riscv: add RNMI cpu feature
  target/riscv: add RNMI CSRs
  target/riscv: handle RNMI interrupt and exception
  target/riscv: add RNMI mnret instruction

 hw/riscv/riscv_hart.c |  8 +++
 include/hw/riscv/riscv_hart.h |  2 +
 target/riscv/cpu.c| 40 +
 target/riscv/cpu.h| 16 -
 target/riscv/cpu_bits.h   | 19 ++
 target/riscv/cpu_helper.c | 49 +--
 target/riscv/csr.c| 59 +++
 target/riscv/helper.h |  1 +
 target/riscv/insn32.decode|  3 +
 .../riscv/insn_trans/trans_privileged.c.inc   | 13 
 target/riscv/op_helper.c  | 31 ++
 11 files changed, 236 insertions(+), 5 deletions(-)

--
2.17.1




[RFC v2 4/4] target/riscv: add RNMI mnret instruction

2021-04-01 Thread frank . chang
From: Frank Chang 

Signed-off-by: Frank Chang 
---
 target/riscv/helper.h |  1 +
 target/riscv/insn32.decode|  3 ++
 .../riscv/insn_trans/trans_privileged.c.inc   | 13 
 target/riscv/op_helper.c  | 31 +++
 4 files changed, 48 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index e3f3f41e891..0914d777d6d 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -65,6 +65,7 @@ DEF_HELPER_4(csrrc, tl, env, tl, tl, tl)
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_2(sret, tl, env, tl)
 DEF_HELPER_2(mret, tl, env, tl)
+DEF_HELPER_2(mnret, tl, env, tl)
 DEF_HELPER_1(wfi, void, env)
 DEF_HELPER_1(tlb_flush, void, env)
 #endif
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 84080dd18ca..557f3394276 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -97,6 +97,9 @@ wfi 000100000101 0 000 0 1110011
 sfence_vma  0001001. . 000 0 1110011 @sfence_vma
 sfence_vm   000100000100 . 000 0 1110011 @sfence_vm
 
+# *** NMI ***
+mnret   011100000010 0 000 0 1110011
+
 # *** RV32I Base Instruction Set ***
 lui     . 0110111 @u
 auipc   . 0010111 @u
diff --git a/target/riscv/insn_trans/trans_privileged.c.inc 
b/target/riscv/insn_trans/trans_privileged.c.inc
index 32312be2024..63c49dfe6fb 100644
--- a/target/riscv/insn_trans/trans_privileged.c.inc
+++ b/target/riscv/insn_trans/trans_privileged.c.inc
@@ -106,6 +106,19 @@ static bool trans_mret(DisasContext *ctx, arg_mret *a)
 #endif
 }
 
+static bool trans_mnret(DisasContext *ctx, arg_mnret *a)
+{
+#ifndef CONFIG_USER_ONLY
+tcg_gen_movi_tl(cpu_pc, ctx->base.pc_next);
+gen_helper_mnret(cpu_pc, cpu_env, cpu_pc);
+exit_tb(ctx); /* no chaining */
+ctx->base.is_jmp = DISAS_NORETURN;
+return true;
+#else
+return false;
+#endif
+}
+
 static bool trans_wfi(DisasContext *ctx, arg_wfi *a)
 {
 #ifndef CONFIG_USER_ONLY
diff --git a/target/riscv/op_helper.c b/target/riscv/op_helper.c
index 1eddcb94de7..b9601776153 100644
--- a/target/riscv/op_helper.c
+++ b/target/riscv/op_helper.c
@@ -175,6 +175,37 @@ target_ulong helper_mret(CPURISCVState *env, target_ulong 
cpu_pc_deb)
 return retpc;
 }
 
+target_ulong helper_mnret(CPURISCVState *env, target_ulong cpu_pc_deb)
+{
+if (!riscv_feature(env, RISCV_FEATURE_RNMI)) {
+/* RNMI feature is not presented. */
+riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());
+}
+
+if (!(env->priv >= PRV_M)) {
+riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());
+}
+
+/* Get return PC from mnepc CSR. */
+target_ulong retpc = env->mnepc;
+if (!riscv_has_ext(env, RVC) && (retpc & 0x3)) {
+riscv_raise_exception(env, RISCV_EXCP_INST_ADDR_MIS, GETPC());
+}
+
+/* Get previous privilege level from mnstatus CSR. */
+target_ulong prev_priv = get_field(env->mnstatus, MNSTATUS_MPP);
+
+if (!pmp_get_num_rules(env) && (prev_priv != PRV_M)) {
+riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());
+}
+
+riscv_cpu_set_mode(env, prev_priv);
+
+env->nmie = true;
+
+return retpc;
+}
+
 void helper_wfi(CPURISCVState *env)
 {
 CPUState *cs = env_cpu(env);
-- 
2.17.1




Re: [PATCH v3 05/11] esp: introduce esp_fifo_pop_buf() and use it instead of fifo8_pop_buf()

2021-04-01 Thread Philippe Mathieu-Daudé
On 4/1/21 9:49 AM, Mark Cave-Ayland wrote:
> The const pointer returned by fifo8_pop_buf() lies directly within the array 
> used
> to model the FIFO. Building with address sanitisers enabled shows that if the

Typo "sanitizers"

> caller expects a minimum number of bytes present then if the FIFO is nearly 
> full,
> the caller may unexpectedly access past the end of the array.

Why isn't it a problem with the other models? Because the pointed
buffer is consumed directly?

> Introduce esp_fifo_pop_buf() which takes a destination buffer and performs a
> memcpy() in it to guarantee that the caller cannot overwrite the FIFO array 
> and
> update all callers to use it. Similarly add underflow protection similar to
> esp_fifo_push() and esp_fifo_pop() so that instead of triggering an assert()
> the operation becomes a no-op.

This is OK for your ESP model.

Now thinking loudly about the Fifo8 API, shouldn't this be part of it?

Something prototype like:

  /**
   * fifo8_pop_buf:
   * @do_copy: If %true, also copy data to @bufptr.
   */
  size_t fifo8_pop_buf(Fifo8 *fifo,
   void **bufptr,
   size_t buflen,
   bool do_copy);

> 
> Buglink: https://bugs.launchpad.net/qemu/+bug/1909247
> Signed-off-by: Mark Cave-Ayland 
> ---
>  hw/scsi/esp.c | 41 +
>  1 file changed, 29 insertions(+), 12 deletions(-)
> 
> diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
> index ce88866803..1aa2caf57d 100644
> --- a/hw/scsi/esp.c
> +++ b/hw/scsi/esp.c
> @@ -107,6 +107,7 @@ static void esp_fifo_push(Fifo8 *fifo, uint8_t val)
>  
>  fifo8_push(fifo, val);
>  }
> +
>  static uint8_t esp_fifo_pop(Fifo8 *fifo)
>  {
>  if (fifo8_is_empty(fifo)) {
> @@ -116,6 +117,23 @@ static uint8_t esp_fifo_pop(Fifo8 *fifo)
>  return fifo8_pop(fifo);
>  }
>  
> +static uint32_t esp_fifo_pop_buf(Fifo8 *fifo, uint8_t *dest, int maxlen)
> +{
> +const uint8_t *buf;
> +uint32_t n;
> +
> +if (maxlen == 0) {
> +return 0;
> +}
> +
> +buf = fifo8_pop_buf(fifo, maxlen, &n);
> +if (dest) {
> +memcpy(dest, buf, n);
> +}
> +
> +return n;
> +}

Reviewed-by: Philippe Mathieu-Daudé 



Re: [RFC v2 0/4] target/riscv: add RNMI support

2021-04-01 Thread Frank Chang
 於 2021年4月1日 週四 下午5:27寫道:

> From: Frank Chang 
>
> This patchset add suport of Resumable NMI (RNMI) in RISC-V.
>
> There are four new CSRs and one new instruction added to allow NMI to be
> resumable in RISC-V, which are:
>
> =
>   * mnscratch (0x350)
>   * mnepc (0x351)
>   * mncause   (0x352)
>   * mnstatus  (0x353)
> =
>   * mnret: To return from RNMI interrupt/exception handler.
> =
>
> RNMI also has higher priority than any other interrupts or exceptions
> and cannot be disable by software.
>
> RNMI may be used to route to other devices such as Bus Error Unit or
> Watchdog Timer in the future.
>
> The interrupt/exception trap handler addresses of RNMI are
> implementation defined.
>
> The technical proposal of RNMI can be found:
> https://lists.riscv.org/g/tech-privileged/message/421
>
> The port is available here:
> https://github.com/sifive/qemu/tree/nmi-upstream-v2
>
> To test RNMI, we have created another QEMU branch to have
> RNMI feature enabled and also both SiFive Bus Error Unit and
> Error Device included on sifive_e machine.
>
> Bus Error Unit (BEU) is routed to RNMI with mncause value set to 3.
> When any reads or writes to Error Device, it will drive BEU RNMI.
> A freedom-e-sdk RNMI example is also provided for testing.
> (We will also upstream BEU and Error Device in the near future.)
>
> Two -cpu options are added for RNMI and BEU:
>   * rnmi=true to enable RNMI feature
>   * beu=true to enable BEU feature
>
> Download and build freedom-e-sdk rnmi example:
>
> 1. git clone g...@github.com:sifive/freedom-e-sdk.git
> 2. cd freedom-e-sdk
> 3. git checkout origin/dev/yihaoc/nmi -b nmi
> 4. git submodule init
> 5. git submodule update --recursive
> 6. Follow freedom-e-sdk guide to install freedom-e-sdk:
>https://sifive.github.io/freedom-e-sdk-docs/index.html
> 7. make PROGRAM=example-rnmi TARGET=qemu-sifive-e31 \
> CONFIGURATION=release software
>

freedom-e-sdk RNMI example code is available at:
https://github.com/sifive/freedom-e-sdk/tree/dev/yihaoc/nmi/software/example-rnmi


>
> Download, build and run freedom-e-sdk RNMI example on QEMU:
>
> 1. git clone g...@github.com:sifive/qemu.git
> 2. cd qemu
> 3. git checkout origin/upstream-nmi-beu-error-device -b
> nmi-beu-error-device
> 4. git submodule init
> 5. git submodule update --recursive
> 6. ./configure --target-list=riscv32-softmmu
> 7. make -j
> 8.  -nographic -M sifive_e \
> -cpu rv32,rnmi=true,beu=true \
> --bios none -kernel 
>
> Output example:
>
> RNMI Driver Example.
> Cleared accrued bus error
> Enter RNMI interrupt ISR.
> mnscratch: 0x
> mnepc: 0x20401178
> mncause: 0x0003
> mnstatus: 0x1800
> Try to trigger illegal instruction exception.
> Enter RNMI exception ISR.
> mscratch: 0x
> mepc: 0x20401208
> mcause: 0x0002
> mstatus: 0x80007800
> Return from RNMI exception ISR.
> Handled TileLink bus error
> Return from RNMI interrupt ISR.
> Test passed!!
>
> Changelog:
>
> v2
>   * split up the series into more commits for convenience of review.
>   * add missing rnmi_irqvec and rnmi_excpvec properties to riscv_harts.
>
> Frank Chang (4):
>   target/riscv: add RNMI cpu feature
>   target/riscv: add RNMI CSRs
>   target/riscv: handle RNMI interrupt and exception
>   target/riscv: add RNMI mnret instruction
>
>  hw/riscv/riscv_hart.c |  8 +++
>  include/hw/riscv/riscv_hart.h |  2 +
>  target/riscv/cpu.c| 40 +
>  target/riscv/cpu.h| 16 -
>  target/riscv/cpu_bits.h   | 19 ++
>  target/riscv/cpu_helper.c | 49 +--
>  target/riscv/csr.c| 59 +++
>  target/riscv/helper.h |  1 +
>  target/riscv/insn32.decode|  3 +
>  .../riscv/insn_trans/trans_privileged.c.inc   | 13 
>  target/riscv/op_helper.c  | 31 ++
>  11 files changed, 236 insertions(+), 5 deletions(-)
>
> --
> 2.17.1
>
>
>


Re: [PATCH v2 2/2] hw/smbios: retrieve PCI address from specified device for Type 41

2021-04-01 Thread Daniel P . Berrangé
On Thu, Apr 01, 2021 at 10:25:44AM +0200, Vincent Bernat wrote:
> Instead of specifying the PCI address manually, the device can be
> specified by ID:
> 
> $QEMU -netdev user,id=internet
>   -device 
> virtio-net-pci,mac=50:54:00:00:00:42,netdev=internet,id=internet-dev \
>   -smbios type=41,designation='Onboard 
> LAN',instance=1,kind=ethernet,pcidev=internet-dev
> 
> The PCI segment is assumed to be 0. This should hold true for most
> cases.
> 
> $ dmidecode -t 41
> # dmidecode 3.3
> Getting SMBIOS data from sysfs.
> SMBIOS 2.8 present.
> 
> Handle 0x2900, DMI type 41, 11 bytes
> Onboard Device
> Reference Designation: Onboard LAN
> Type: Ethernet
> Status: Enabled
> Type Instance: 1
> Bus Address: :00:09.0
> 
> $ ip -brief a
> lo   UNKNOWN127.0.0.1/8 ::1/128
> eno1 UP 10.0.2.14/24 fec0::5254:ff:fe00:42/64 
> fe80::5254:ff:fe00:42/64
> 
> Signed-off-by: Vincent Bernat 
> ---
>  hw/smbios/smbios.c | 47 +-
>  qemu-options.hx|  2 +-
>  2 files changed, 22 insertions(+), 27 deletions(-)

It doesn't really make sense to have this as a separate patch
when it is deleting half the code you added in the previous
patch. Just merge them together as one.

> 
> diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
> index 46a08652dff4..0f390e03453c 100644
> --- a/hw/smbios/smbios.c
> +++ b/hw/smbios/smbios.c
> @@ -27,6 +27,7 @@
>  #include "hw/firmware/smbios.h"
>  #include "hw/loader.h"
>  #include "hw/boards.h"
> +#include "hw/pci/pci.h"
>  #include "smbios_build.h"
>  
>  /* legacy structures and constants for <= 2.0 machines */
> @@ -134,12 +135,8 @@ static QEnumLookup type41_kind_lookup = {
>  .size = 10
>  };
>  struct type41_instance {
> -const char *designation;
> +const char *designation, *pcidev;
>  uint8_t instance, kind;
> -struct {
> -uint16_t segment;
> -uint8_t bus, device;
> -} pci;
>  QTAILQ_ENTRY(type41_instance) next;
>  };
>  static QTAILQ_HEAD(, type41_instance) type41 = 
> QTAILQ_HEAD_INITIALIZER(type41);
> @@ -403,10 +400,9 @@ static const QemuOptDesc qemu_smbios_type41_opts[] = {
>  .type = QEMU_OPT_NUMBER,
>  .help = "device type instance",
>  },{
> -.name = "pci",
> +.name = "pcidev",
>  .type = QEMU_OPT_STRING,
>  .help = "PCI device",
> -.def_value_str = "0:0.0",
>  },
>  { /* end of list */ }
>  };
> @@ -837,9 +833,23 @@ static void smbios_build_type_41_table(void)
>  SMBIOS_TABLE_SET_STR(41, reference_designation_str, 
> t41->designation);
>  t->device_type = t41->kind;
>  t->device_type_instance = t41->instance;
> -t->segment_group_number = cpu_to_le16(t41->pci.segment);
> -t->bus_number = t41->pci.bus;
> -t->device_number = t41->pci.device;
> +
> +if (t41->pcidev) {
> +PCIDevice *pdev = NULL;
> +int rc = pci_qdev_find_device(t41->pcidev, &pdev);
> +if (rc == 0) {
> +/*
> + * TODO: Extract the appropriate value. Most of the
> + * time, this will be 0.
> + */
> +t->segment_group_number = cpu_to_le16(0);

Hmm, tricky, as it requires interpreting the PCI topology. Wonder if
there's any helper that can do the hard work for you

> +t->bus_number = pci_dev_bus_num(pdev);
> +t->device_number = pdev->devfn;
> +} else {
> +fprintf(stderr, "%s: cannot find PCI device %s\n",
> +__func__, t41->pcidev);

This isn't terminating execution which looks like a bug.

Modify this method to have an 'Error **errp' parameter and
use 'error_setg' to report it.  You'll need to modify the
smbios_get_tables caller to have an 'Error **errp' too.

For the callers of smbios_get_tables(), you can then just pass
in '&error_fatal', to make it print the error + exit.

> +}
> +}
>  
>  SMBIOS_BUILD_TABLE_POST;
>  instance++;
> @@ -1301,7 +1311,6 @@ void smbios_entry_add(QemuOpts *opts, Error **errp)
>  case 41: {
>  struct type41_instance *t;
>  Error *local_err = NULL;
> -int pseg, pbus, pdevice, pfunction;
>  
>  if (!qemu_opts_validate(opts, qemu_smbios_type41_opts, errp)) {
>  return;
> @@ -1324,21 +1333,7 @@ void smbios_entry_add(QemuOpts *opts, Error **errp)
>  return;
>  }
>  t->instance = qemu_opt_get_number(opts, "instance", 1);
> -if (sscanf(qemu_opt_get(opts, "pci"), "%x:%x:%x.%x",
> -   &pseg,
> -   &pbus,
> -   &pdevice,
> -   &pfunction) != 4) {
> -error_setg(errp, "unable to parse %s: %s

Re: [RFC PATCH] block: always update auto_backing_file to full path

2021-04-01 Thread Max Reitz

On 01.04.21 06:22, Joe Jin wrote:

Some time after created snapshot, auto_backing_file only has filename,
this confused overridden check, update it to full path if it is not.

Signed-off-by: Joe Jin 
---
  block.c | 13 +
  1 file changed, 13 insertions(+)


Do you have a test for this?

The thing is, I’m not sure about this solution, and I think having a 
test would help me understand better.
bs->auto_backing_file is meant to track what filename a BDS would have 
if we opened bs->backing_file.  To this end, we generally set it to 
whatever bs->backing_file is and then refresh it when we actually do 
open a BDS from it.


So it seems strange to blindly modify it somewhere that doesn’t have to 
do with any of these things.


Now, when opening a backing file from bs->backing_file, we first do make 
it an absolute filename via bdrv_get_full_backing_filename().  So it 
kind of seems prudent to replicate that elsewhere, but I’m not sure 
where exactly.  I would think the best place would be whenever 
auto_backing_file is set to be equal to backing_file (which is generally 
in the image format drivers, when they read the backing file string from 
the image metadata), but that might break the strcmp() in 
bdrv_open_backing_file()...


I don’t think bdrv_refresh_filename() is the right place, though, 
because I’m afraid that this might modify filenames we’ve already 
retrieved from the actual backing BDS, or something.


For example, with this patch applied, iotest 024 fails.


diff --git a/block.c b/block.c
index c5b887cec1..8f9a027dee 100644
--- a/block.c
+++ b/block.c
@@ -6969,6 +6969,19 @@ void bdrv_refresh_filename(BlockDriverState *bs)
  return;
  }
  
+    /* auto_backing_file only has filename, update it to the full path */

+    if (bs->backing && bs->auto_backing_file[0] != '/') {
+    char *backing_filename = NULL;
+    Error *local_err = NULL;
+
+    backing_filename = bdrv_make_absolute_filename(bs,
+ bs->auto_backing_file, &local_err);
+    if (!local_err && backing_filename) {
+    pstrcpy(bs->auto_backing_file, sizeof(bs->auto_backing_file),
+ backing_filename);
+    g_free(backing_filename);
+    }
+    }


All spaces here are 0xa0 (non-breaking space), which is kind of broken 
and makes it difficult to apply this patch.


Furthermore, if local_err != NULL, we’d need to free it.

Max


  backing_overridden = bdrv_backing_overridden(bs);
  
  if (bs->open_flags & BDRV_O_NO_IO) {







[1/1] tcg/mips: Fix SoftTLB comparison on mips backend

2021-04-01 Thread Kele Huang
The addrl used to compare with SoftTLB entry should be sign-extended
in common case, and it will cause constant failing in SoftTLB
comparisons for the addrl whose address is over 0x8000 on the
emulation of 32-bit guest on 64-bit host.

This is an important performance bug fix. Spec2000 gzip rate increase
from ~45 to ~140 on Loongson 3A4000 (MIPS compatible platform).

Signed-off-by: Kele Huang 
---
 tcg/mips/tcg-target.c.inc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index 8738a3a581..8b16726242 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -1201,13 +1201,13 @@ static void tcg_out_tlb_load(TCGContext *s, TCGReg 
base, TCGReg addrl,
load the tlb addend for the fast path.  */
 tcg_out_ld(s, TCG_TYPE_PTR, TCG_TMP2, TCG_TMP3, add_off);
 }
-tcg_out_opc_reg(s, OPC_AND, TCG_TMP1, TCG_TMP1, addrl);
 
 /* Zero extend a 32-bit guest address for a 64-bit host. */
 if (TCG_TARGET_REG_BITS > TARGET_LONG_BITS) {
 tcg_out_ext32u(s, base, addrl);
 addrl = base;
 }
+tcg_out_opc_reg(s, OPC_AND, TCG_TMP1, TCG_TMP1, addrl);
 
 label_ptr[0] = s->code_ptr;
 tcg_out_opc_br(s, OPC_BNE, TCG_TMP1, TCG_TMP0);
-- 
2.30.0




Re: [PATCH] iotests: Test mirror-top filter permissions

2021-04-01 Thread Max Reitz

On 01.04.21 10:32, Vladimir Sementsov-Ogievskiy wrote:

31.03.2021 15:28, Max Reitz wrote:

Add a test accompanying commit 53431b9086b2832ca1aeff0c55e186e9ed79bd11
("block/mirror: Fix mirror_top's permissions").

Signed-off-by: Max Reitz 
---
  tests/qemu-iotests/tests/mirror-top-perms | 121 ++
  tests/qemu-iotests/tests/mirror-top-perms.out |   5 +
  2 files changed, 126 insertions(+)
  create mode 100755 tests/qemu-iotests/tests/mirror-top-perms
  create mode 100644 tests/qemu-iotests/tests/mirror-top-perms.out

diff --git a/tests/qemu-iotests/tests/mirror-top-perms 
b/tests/qemu-iotests/tests/mirror-top-perms

new file mode 100755
index 00..451a0666f8
--- /dev/null
+++ b/tests/qemu-iotests/tests/mirror-top-perms
@@ -0,0 +1,121 @@
+#!/usr/bin/env python3
+# group: rw
+#
+# Test permissions taken by the mirror-top filter
+#
+# Copyright (C) 2021 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+import os
+import iotests
+from iotests import qemu_img
+
+# Import qemu after iotests.py has amended sys.path
+# pylint: disable=wrong-import-order
+import qemu
+
+
+image_size = 1 * 1024 * 1024
+source = os.path.join(iotests.test_dir, 'source.img')
+
+
+class TestMirrorTopPerms(iotests.QMPTestCase):
+    def setUp(self):
+    assert qemu_img('create', '-f', iotests.imgfmt, source,
+    str(image_size)) == 0
+    self.vm = iotests.VM()
+    self.vm.add_drive(source)
+
self.vm.add_blockdev(f'null-co,node-name=null,size={image_size}')

+    self.vm.launch()
+
+    # Will be created by the test function itself
+    self.vm_b = None
+
+    def tearDown(self):
+    try:
+    self.vm.shutdown()
+    except qemu.machine.AbnormalShutdown:
+    pass
+
+    if self.vm_b is not None:
+    self.vm_b.shutdown()
+
+    os.remove(source)
+
+    def test_cancel(self):
+    """
+    Before commit 53431b9086b28, mirror-top used to not take any
+    permissions but WRITE and share all permissions.  Because it
+    is inserted between the source's original parents and the
+    source, there generally was no parent that would have taken or
+    unshared any permissions on the source, which means that an
+    external process could access the image unhindered by locks.
+    (Unless there was a parent above the protocol node that would
+    take its own locks, e.g. a format driver.)
+    This is bad enough, but if the mirror job is then cancelled,
+    the mirroring VM tries to take back the image, restores the
+    original permissions taken and unshared, and assumes this must
+    just work.  But it will not, and so the VM aborts.
+
+    Commit 53431b9086b28 made mirror keep the original permissions
+    and so no other process can "steal" the image.
+
+    (Note that you cannot really do the same with the target image
+    and then completing the job, because the mirror job always
+    took/unshared the correct permissions on the target.  For
+    example, it does not share READ_CONSISTENT, which makes it
+    difficult to let some other qemu process open the image.)
+    """
+
+    result = self.vm.qmp('blockdev-mirror',
+ job_id='mirror',
+ device='drive0',
+ target='null',
+ sync='full')
+    self.assert_qmp(result, 'return', {})
+
+    self.vm.event_wait('BLOCK_JOB_READY')
+
+    # We want this to fail because the image cannot be locked.
+    # If it does not fail, continue still and see what happens.


This comment is about vm_b.launch(), not about creating vm object. 
Probably better to move it down


I kind of meant this as a general comment on what this VM is for.  I 
think I kind of like any comment here, and I don’t see a practical 
advantage in having this comment above the try-except, because the whole 
“launch VM, see it fail” block is kind of one monolithic concept.



+    self.vm_b = iotests.VM(path_suffix='b')
+    # Must use -blockdev -device so we can use share-rw.
+    # (And we need share-rw=on because mirror-top was always
+    # forced to take the WRITE permission so it can write to the
+    # source image.)
+
self.vm_b.add_blockdev(f'file,node-n

Re: [PATCH v2 2/2] hw/smbios: retrieve PCI address from specified device for Type 41

2021-04-01 Thread Vincent Bernat
 ❦  1 avril 2021 10:38 +01, Daniel P. Berrangé:

>>  hw/smbios/smbios.c | 47 +-
>>  qemu-options.hx|  2 +-
>>  2 files changed, 22 insertions(+), 27 deletions(-)
>
> It doesn't really make sense to have this as a separate patch
> when it is deleting half the code you added in the previous
> patch. Just merge them together as one.

I'll do that.

>> +/*
>> + * TODO: Extract the appropriate value. Most of the
>> + * time, this will be 0.
>> + */
>> +t->segment_group_number = cpu_to_le16(0);
>
> Hmm, tricky, as it requires interpreting the PCI topology. Wonder if
> there's any helper that can do the hard work for you

There is pci_root_bus_path(), but it returns a string which could just
contain a segment or several segments. It seems the SMBIOS standard
didn't account for complex topologies. I could parse the string. and
keep only the right-most segment.

>> +t->bus_number = pci_dev_bus_num(pdev);
>> +t->device_number = pdev->devfn;
>> +} else {
>> +fprintf(stderr, "%s: cannot find PCI device %s\n",
>> +__func__, t41->pcidev);
>
> This isn't terminating execution which looks like a bug.

It was my intention. The PCI address will then be 00:00:00.0. If you
think it's better to terminate, I can do what you suggest.
-- 
Replace repetitive expressions by calls to a common function.
- The Elements of Programming Style (Kernighan & Plauger)



Re: [PATCH v2 2/2] hw/smbios: retrieve PCI address from specified device for Type 41

2021-04-01 Thread Daniel P . Berrangé
On Thu, Apr 01, 2021 at 12:07:30PM +0200, Vincent Bernat wrote:
>  ❦  1 avril 2021 10:38 +01, Daniel P. Berrangé:
> 
> >>  hw/smbios/smbios.c | 47 +-
> >>  qemu-options.hx|  2 +-
> >>  2 files changed, 22 insertions(+), 27 deletions(-)
> >
> > It doesn't really make sense to have this as a separate patch
> > when it is deleting half the code you added in the previous
> > patch. Just merge them together as one.
> 
> I'll do that.
> 
> >> +/*
> >> + * TODO: Extract the appropriate value. Most of the
> >> + * time, this will be 0.
> >> + */
> >> +t->segment_group_number = cpu_to_le16(0);
> >
> > Hmm, tricky, as it requires interpreting the PCI topology. Wonder if
> > there's any helper that can do the hard work for you
> 
> There is pci_root_bus_path(), but it returns a string which could just
> contain a segment or several segments. It seems the SMBIOS standard
> didn't account for complex topologies. I could parse the string. and
> keep only the right-most segment.
> 
> >> +t->bus_number = pci_dev_bus_num(pdev);
> >> +t->device_number = pdev->devfn;
> >> +} else {
> >> +fprintf(stderr, "%s: cannot find PCI device %s\n",
> >> +__func__, t41->pcidev);
> >
> > This isn't terminating execution which looks like a bug.
> 
> It was my intention. The PCI address will then be 00:00:00.0. If you
> think it's better to terminate, I can do what you suggest.

If we can't find the PCI device, that's user configuration error, and
we prefer to report those & exit, rather than continuing with likely
bogus data.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




[PATCH v2 04/11] tests/tcg/configure.sh: make sure we pick up x86_64 cross compilers

2021-04-01 Thread Alex Bennée
While it's hard to find cross compilers packaged for arches other than
x86_64 the same cannot be said for the x86_64 compiler which is
available on Debians i386, arm64 and ppc64el release architectures.

Signed-off-by: Alex Bennée 
---
 tests/tcg/configure.sh | 5 +
 1 file changed, 5 insertions(+)

diff --git a/tests/tcg/configure.sh b/tests/tcg/configure.sh
index 87a9f24b20..90fd81f506 100755
--- a/tests/tcg/configure.sh
+++ b/tests/tcg/configure.sh
@@ -198,6 +198,11 @@ for target in $target_list; do
   container_image=debian-sparc64-cross
   container_cross_cc=sparc64-linux-gnu-gcc
   ;;
+x86_64-*)
+  container_hosts="aarch64 ppc64el x86_64"
+  container_image=debian-amd64-cross
+  container_cross_cc=x86_64-linux-gnu-gcc
+  ;;
 xtensa*-softmmu)
   container_hosts=x86_64
   container_image=debian-xtensa-cross
-- 
2.20.1




[PATCH v2 09/11] docs/system/gdb.rst: Document how to debug multicore machines

2021-04-01 Thread Alex Bennée
From: Peter Maydell 

Document how multicore machines appear to GDB when debugged
via the debug stub. This is particularly non-intuitive for
the "multiple heterogenous clusters" case, but unfortunately
as far as I know there is no way with the remote protocol
for the stub to tell gdb "I have 2 inferiors, please connect
to both", so the user must set it all up manually.

Signed-off-by: Peter Maydell 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20210325175023.13838-3-peter.mayd...@linaro.org>
Signed-off-by: Alex Bennée 
---
 docs/system/gdb.rst | 55 +
 1 file changed, 55 insertions(+)

diff --git a/docs/system/gdb.rst b/docs/system/gdb.rst
index 0bb1bedf1b..144d083df3 100644
--- a/docs/system/gdb.rst
+++ b/docs/system/gdb.rst
@@ -45,6 +45,61 @@ Here are some useful tips in order to use gdb on system code:
 3. Use ``set architecture i8086`` to dump 16 bit code. Then use
``x/10i $cs*16+$eip`` to dump the code at the PC position.
 
+Debugging multicore machines
+
+
+GDB's abstraction for debugging targets with multiple possible
+parallel flows of execution is a two layer one: it supports multiple
+"inferiors", each of which can have multiple "threads". When the QEMU
+machine has more than one CPU, QEMU exposes each CPU cluster as a
+separate "inferior", where each CPU within the cluster is a separate
+"thread". Most QEMU machine types have identical CPUs, so there is a
+single cluster which has all the CPUs in it.  A few machine types are
+heterogenous and have multiple clusters: for example the ``sifive_u``
+machine has a cluster with one E51 core and a second cluster with four
+U54 cores. Here the E51 is the only thread in the first inferior, and
+the U54 cores are all threads in the second inferior.
+
+When you connect gdb to the gdbstub, it will automatically
+connect to the first inferior; you can display the CPUs in this
+cluster using the gdb ``info thread`` command, and switch between
+them using gdb's usual thread-management commands.
+
+For multi-cluster machines, unfortunately gdb does not by default
+handle multiple inferiors, and so you have to explicitly connect
+to them. First, you must connect with the ``extended-remote``
+protocol, not ``remote``::
+
+(gdb) target extended-remote localhost:1234
+
+Once connected, gdb will have a single inferior, for the
+first cluster. You need to create inferiors for the other
+clusters and attach to them, like this::
+
+  (gdb) add-inferior
+  Added inferior 2
+  (gdb) inferior 2
+  [Switching to inferior 2 [] ()]
+  (gdb) attach 2
+  Attaching to process 2
+  warning: No executable has been specified and target does not support
+  determining executable automatically.  Try using the "file" command.
+  0x in ?? ()
+
+Once you've done this, ``info threads`` will show CPUs in
+all the clusters you have attached to::
+
+  (gdb) info threads
+Id   Target Id Frame
+1.1  Thread 1.1 (cortex-m33-arm-cpu cpu [running]) 0x in ?? ()
+  * 2.1  Thread 2.2 (cortex-m33-arm-cpu cpu [halted ]) 0x in ?? ()
+
+You probably also want to set gdb to ``schedule-multiple`` mode,
+so that when you tell gdb to ``continue`` it resumes all CPUs,
+not just those in the cluster you are currently working on::
+
+  (gdb) set schedule-multiple on
+
 Advanced debugging options
 ==
 
-- 
2.20.1




[PATCH v2 02/11] tests/docker: don't set DOCKER_REGISTRY on non-x86_64

2021-04-01 Thread Alex Bennée
Currently our gitlab registry is x86_64 only so attempting to pull an
image from it on something else will end in tears.

Reviewed-by: Willian Rampazzo 
Reviewed-by: Richard Henderson 
Reviewed-by: Thomas Huth 
Signed-off-by: Alex Bennée 
---
 tests/docker/Makefile.include | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include
index 7cab761bf5..9f464cb92c 100644
--- a/tests/docker/Makefile.include
+++ b/tests/docker/Makefile.include
@@ -16,7 +16,10 @@ DOCKER_IMAGES := $(sort $(notdir $(basename $(wildcard 
$(DOCKER_FILES_DIR)/*.doc
 DOCKER_TARGETS := $(patsubst %,docker-image-%,$(DOCKER_IMAGES))
 # Use a global constant ccache directory to speed up repetitive builds
 DOCKER_CCACHE_DIR := $$HOME/.cache/qemu-docker-ccache
-DOCKER_REGISTRY := $(if 
$(REGISTRY),$(REGISTRY),registry.gitlab.com/qemu-project/qemu)
+ifeq ($(HOST_ARCH),x86_64)
+DOCKER_DEFAULT_REGISTRY := registry.gitlab.com/qemu-project/qemu
+endif
+DOCKER_REGISTRY := $(if $(REGISTRY),$(REGISTRY),$(DOCKER_DEFAULT_REGISTRY))
 
 DOCKER_TESTS := $(notdir $(shell \
find $(SRC_PATH)/tests/docker/ -name 'test-*' -type f))
-- 
2.20.1




[PATCH for 6.0-rc2 v2 00/11] various fixes, pre-PR (check-tcg, gdbstub, gitlab)

2021-04-01 Thread Alex Bennée
Hi,

A few more patches have been added:

  - gdbstub documentation
  - tweak the gdbstub sha1 test
  - tweaks for gitlab

as well as fixing the i386-linux-user cross compile case (including
detecting the support for -no-pie for cross compilers). Other than
that it's just review tags. I plan to cut the PR from this post on the
morning of the 6th in time for rc2.

The following remain un-reviewed:

 - tests/tcg: relax the next step precision of the gdb sha1 test
 - tests/tcg/i386: force -fno-pie for test-i386
 - tests/tcg/configure.sh: make sure we pick up x86_64 cross compilers
 - tests/tcg: add concept of container_hosts

Alex Bennée (7):
  tests/tcg: update the defaults for x86 compilers
  tests/docker: don't set DOCKER_REGISTRY on non-x86_64
  tests/tcg: add concept of container_hosts
  tests/tcg/configure.sh: make sure we pick up x86_64 cross compilers
  tests/tcg/i386: expand .data sections for system tests
  tests/tcg/i386: force -fno-pie for test-i386
  tests/tcg: relax the next step precision of the gdb sha1 test

Peter Maydell (2):
  docs/system/gdb.rst: Add some more heading structure
  docs/system/gdb.rst: Document how to debug multicore machines

Thomas Huth (2):
  gitlab-ci.yml: Fix the filtering for the git submodules
  gitlab-ci.yml: Test the dtrace backend in one of the jobs

 docs/system/gdb.rst | 63 -
 configure   |  2 +-
 .gitlab-ci.yml  |  4 +-
 tests/docker/Makefile.include   |  5 +-
 tests/docker/dockerfiles/centos8.docker |  1 +
 tests/tcg/configure.sh  | 42 +++--
 tests/tcg/i386/Makefile.target  | 16 +--
 tests/tcg/i386/system/kernel.ld |  2 +-
 tests/tcg/multiarch/gdbstub/sha1.py |  5 +-
 9 files changed, 126 insertions(+), 14 deletions(-)

-- 
2.20.1




[PATCH v2 05/11] tests/tcg/i386: expand .data sections for system tests

2021-04-01 Thread Alex Bennée
Newer compilers might end up putting some data in .data.rel.local
which was getting skipped resulting in hilarious confusion on some
tests. Fix that.

Reviewed-by: Richard Henderson 
Reviewed-by: Thomas Huth 
Signed-off-by: Alex Bennée 
---
 tests/tcg/i386/system/kernel.ld | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/tcg/i386/system/kernel.ld b/tests/tcg/i386/system/kernel.ld
index 92de525e93..27ea5bbe04 100644
--- a/tests/tcg/i386/system/kernel.ld
+++ b/tests/tcg/i386/system/kernel.ld
@@ -12,7 +12,7 @@ SECTIONS {
}
 
.data : {
-   *(.data)
+   *(.data*)
__load_en = .;
}
 
-- 
2.20.1




[PATCH v2 11/11] gitlab-ci.yml: Test the dtrace backend in one of the jobs

2021-04-01 Thread Alex Bennée
From: Thomas Huth 

We are using the dtrace backend in downstream RHEL, so testing this
in the CentOS 8 task seems to be a good fit.

Signed-off-by: Thomas Huth 
Reviewed-by: Willian Rampazzo 
Message-Id: <20210331160351.3071279-1-th...@redhat.com>
Signed-off-by: Alex Bennée 
---
 .gitlab-ci.yml  | 2 +-
 tests/docker/dockerfiles/centos8.docker | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index 85b8e10b84..52d65d6c04 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -223,7 +223,7 @@ build-system-centos:
   variables:
 IMAGE: centos8
 CONFIGURE_ARGS: --disable-nettle --enable-gcrypt --enable-fdt=system
---enable-modules
+--enable-modules --enable-trace-backends=dtrace
 TARGETS: ppc64-softmmu or1k-softmmu s390x-softmmu
   x86_64-softmmu rx-softmmu sh4-softmmu nios2-softmmu
 MAKE_CHECK_ARGS: check-build
diff --git a/tests/docker/dockerfiles/centos8.docker 
b/tests/docker/dockerfiles/centos8.docker
index a763d55730..a8c6c528b0 100644
--- a/tests/docker/dockerfiles/centos8.docker
+++ b/tests/docker/dockerfiles/centos8.docker
@@ -29,6 +29,7 @@ ENV PACKAGES \
 rdma-core-devel \
 spice-glib-devel \
 spice-server \
+systemtap-sdt-devel \
 tar \
 zlib-devel
 
-- 
2.20.1




[PATCH v2 07/11] tests/tcg: relax the next step precision of the gdb sha1 test

2021-04-01 Thread Alex Bennée
Depending on the version of gdb we may not execute the first line of
SHA1Init when executing the first "next" command - instead just
stepping over the preamble. As we don't actually care about the
position of the PC after the steps and want to be sure the
context->state[] has been loaded before we inspect it do a double next
at the start.

Signed-off-by: Alex Bennée 
---
 tests/tcg/multiarch/gdbstub/sha1.py | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tests/tcg/multiarch/gdbstub/sha1.py 
b/tests/tcg/multiarch/gdbstub/sha1.py
index 2bfde49633..423b720e6d 100644
--- a/tests/tcg/multiarch/gdbstub/sha1.py
+++ b/tests/tcg/multiarch/gdbstub/sha1.py
@@ -40,7 +40,10 @@ def run_test():
 
 check_break("SHA1Init")
 
-# check step and inspect values
+# Check step and inspect values. We do a double next after the
+# breakpoint as depending on the version of gdb we may step the
+# preamble and not the first actual line of source.
+gdb.execute("next")
 gdb.execute("next")
 val_ctx = gdb.parse_and_eval("context->state[0]")
 exp_ctx = 0x67452301
-- 
2.20.1




[PATCH v2 01/11] tests/tcg: update the defaults for x86 compilers

2021-04-01 Thread Alex Bennée
You don't usually notice this is broken on developer system on x86 as
we use the normal host compiler. However on other systems the -pc was
extraneous. Also for 32 bit only i686 packages exist now so we should
use those when available.

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
 tests/tcg/configure.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/tcg/configure.sh b/tests/tcg/configure.sh
index ce304f4933..af4aecf14e 100755
--- a/tests/tcg/configure.sh
+++ b/tests/tcg/configure.sh
@@ -52,7 +52,7 @@ fi
 : ${cross_cc_hexagon="hexagon-unknown-linux-musl-clang"}
 : ${cross_cc_cflags_hexagon="-mv67 -O2 -static"}
 : ${cross_cc_hppa="hppa-linux-gnu-gcc"}
-: ${cross_cc_i386="i386-pc-linux-gnu-gcc"}
+: ${cross_cc_i386="i686-linux-gnu-gcc"}
 : ${cross_cc_cflags_i386="-m32"}
 : ${cross_cc_m68k="m68k-linux-gnu-gcc"}
 : $(cross_cc_mips64el="mips64el-linux-gnuabi64-gcc")
@@ -69,7 +69,7 @@ fi
 : ${cross_cc_cflags_sparc="-m32 -mv8plus -mcpu=ultrasparc"}
 : ${cross_cc_sparc64="sparc64-linux-gnu-gcc"}
 : ${cross_cc_cflags_sparc64="-m64 -mcpu=ultrasparc"}
-: ${cross_cc_x86_64="x86_64-pc-linux-gnu-gcc"}
+: ${cross_cc_x86_64="x86_64-linux-gnu-gcc"}
 : ${cross_cc_cflags_x86_64="-m64"}
 
 for target in $target_list; do
-- 
2.20.1




[PATCH v2 03/11] tests/tcg: add concept of container_hosts

2021-04-01 Thread Alex Bennée
While docker is nominally multarch these days it doesn't mean our
distros actually package all cross compilers for all architectures.
The upcoming Debian bullseye release will improve things further. At
least for now we can get things like the 32 bit ARM compiler on it's
64 bit cousin.

Signed-off-by: Alex Bennée 
---
 configure  |  2 +-
 tests/tcg/configure.sh | 27 +--
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/configure b/configure
index 535e6a9269..7df7750a14 100755
--- a/configure
+++ b/configure
@@ -6299,7 +6299,7 @@ done
 (for i in $cross_cc_vars; do
   export $i
 done
-export target_list source_path use_containers
+export target_list source_path use_containers ARCH
 $source_path/tests/tcg/configure.sh)
 
 # temporary config to build submodules
diff --git a/tests/tcg/configure.sh b/tests/tcg/configure.sh
index af4aecf14e..87a9f24b20 100755
--- a/tests/tcg/configure.sh
+++ b/tests/tcg/configure.sh
@@ -108,79 +108,98 @@ for target in $target_list; do
   case $target in
 aarch64-*)
   # We don't have any bigendian build tools so we only use this for AArch64
+  container_hosts="x86_64 aarch64"
   container_image=debian-arm64-test-cross
   container_cross_cc=aarch64-linux-gnu-gcc-10
   ;;
 alpha-*)
+  container_hosts=x86_64
   container_image=debian-alpha-cross
   container_cross_cc=alpha-linux-gnu-gcc
   ;;
 arm-*)
   # We don't have any bigendian build tools so we only use this for ARM
+  container_hosts="x86_64 aarch64"
   container_image=debian-armhf-cross
   container_cross_cc=arm-linux-gnueabihf-gcc
   ;;
 cris-*)
+  container_hosts=x86_64
   container_image=fedora-cris-cross
   container_cross_cc=cris-linux-gnu-gcc
   ;;
 hppa-*)
+  container_hosts=x86_64
   container_image=debian-hppa-cross
   container_cross_cc=hppa-linux-gnu-gcc
   ;;
 i386-*)
+  container_hosts=x86_64
   container_image=fedora-i386-cross
   container_cross_cc=gcc
   ;;
 m68k-*)
+  container_hosts=x86_64
   container_image=debian-m68k-cross
   container_cross_cc=m68k-linux-gnu-gcc
   ;;
 mips64el-*)
+  container_hosts=x86_64
   container_image=debian-mips64el-cross
   container_cross_cc=mips64el-linux-gnuabi64-gcc
   ;;
 mips64-*)
+  container_hosts=x86_64
   container_image=debian-mips64-cross
   container_cross_cc=mips64-linux-gnuabi64-gcc
   ;;
 mipsel-*)
+  container_hosts=x86_64
   container_image=debian-mipsel-cross
   container_cross_cc=mipsel-linux-gnu-gcc
   ;;
 mips-*)
+  container_hosts=x86_64
   container_image=debian-mips-cross
   container_cross_cc=mips-linux-gnu-gcc
   ;;
 ppc-*|ppc64abi32-*)
+  container_hosts=x86_64
   container_image=debian-powerpc-cross
   container_cross_cc=powerpc-linux-gnu-gcc
   ;;
 ppc64-*)
+  container_hosts=x86_64
   container_image=debian-ppc64-cross
   container_cross_cc=powerpc64-linux-gnu-gcc
   ;;
 ppc64le-*)
+  container_hosts=x86_64
   container_image=debian-ppc64el-cross
   container_cross_cc=powerpc64le-linux-gnu-gcc
   ;;
 riscv64-*)
+  container_hosts=x86_64
   container_image=debian-riscv64-cross
   container_cross_cc=riscv64-linux-gnu-gcc
   ;;
 s390x-*)
+  container_hosts=x86_64
   container_image=debian-s390x-cross
   container_cross_cc=s390x-linux-gnu-gcc
   ;;
 sh4-*)
+  container_hosts=x86_64
   container_image=debian-sh4-cross
   container_cross_cc=sh4-linux-gnu-gcc
   ;;
 sparc64-*)
+  container_hosts=x86_64
   container_image=debian-sparc64-cross
   container_cross_cc=sparc64-linux-gnu-gcc
   ;;
 xtensa*-softmmu)
+  container_hosts=x86_64
   container_image=debian-xtensa-cross
 
   # default to the dc232b cpu
@@ -265,7 +284,11 @@ for target in $target_list; do
   done
 
   if test $got_cross_cc = no && test "$container" != no && test -n 
"$container_image"; then
-echo "DOCKER_IMAGE=$container_image" >> $config_target_mak
-echo "DOCKER_CROSS_CC_GUEST=$container_cross_cc" >> $config_target_mak
+  for host in $container_hosts; do
+  if test "$host" = "$ARCH"; then
+  echo "DOCKER_IMAGE=$container_image" >> $config_target_mak
+  echo "DOCKER_CROSS_CC_GUEST=$container_cross_cc" >> 
$config_target_mak
+  fi
+  done
   fi
 done
-- 
2.20.1




[PATCH v2 08/11] docs/system/gdb.rst: Add some more heading structure

2021-04-01 Thread Alex Bennée
From: Peter Maydell 

We're about to add a new section to gdb.rst. In
preparation, add some more headings so it isn't just
one huge run-on section.

Signed-off-by: Peter Maydell 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20210325175023.13838-2-peter.mayd...@linaro.org>
Signed-off-by: Alex Bennée 
---
 docs/system/gdb.rst | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/docs/system/gdb.rst b/docs/system/gdb.rst
index 72b1e68f4e..0bb1bedf1b 100644
--- a/docs/system/gdb.rst
+++ b/docs/system/gdb.rst
@@ -45,7 +45,11 @@ Here are some useful tips in order to use gdb on system code:
 3. Use ``set architecture i8086`` to dump 16 bit code. Then use
``x/10i $cs*16+$eip`` to dump the code at the PC position.
 
-Advanced debugging options:
+Advanced debugging options
+==
+
+Changing single-stepping behaviour
+^^
 
 The default single stepping behavior is step with the IRQs and timer
 service routines off. It is set this way because when gdb executes a
@@ -88,6 +92,8 @@ three commands you can query and set the single step behavior:
   sending: "qemu.sstep=0x5"
   received: "OK"
 
+Examining physical memory
+^
 
 Another feature that QEMU gdbstub provides is to toggle the memory GDB
 works with, by default GDB will show the current process memory respecting
-- 
2.20.1




[PATCH v2 10/11] gitlab-ci.yml: Fix the filtering for the git submodules

2021-04-01 Thread Alex Bennée
From: Thomas Huth 

Commit 7d7dbf9dc15be6e introduced a new line starting with
"GIT_SUBMODULES_ACTION=" in the config-host.mak file. The grep that
tries to determine the submodules in the gitlab-ci.yml file matches
this new line, too, causing a warning message when updating the modules:

 warn: ignoring non-existent submodule GIT_SUBMODULES_ACTION=update

Fix it by matching the "GIT_SUBMODULES=..." line only.

Signed-off-by: Thomas Huth 
Signed-off-by: Alex Bennée 
Message-Id: <20210331073316.2965928-1-th...@redhat.com>
---
 .gitlab-ci.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index 3480d79db3..85b8e10b84 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -42,7 +42,7 @@ include:
   image: $CI_REGISTRY_IMAGE/qemu/$IMAGE:latest
   script:
 - scripts/git-submodule.sh update
-$(grep GIT_SUBMODULES build/config-host.mak | sed 
's/GIT_SUBMODULES=//')
+$(sed -n '/GIT_SUBMODULES=/ s/.*=// p' build/config-host.mak)
 - cd build
 - find . -type f -exec touch {} +
 # Avoid recompiling by hiding ninja with NINJA=":"
-- 
2.20.1




[PATCH v2 06/11] tests/tcg/i386: force -fno-pie for test-i386

2021-04-01 Thread Alex Bennée
The containerised compiler defaults to no-pie anyway but if we are
relying on the users installed cross compiler we need to check it
works for building 16 bit code first.

Signed-off-by: Alex Bennée 
---
 tests/tcg/configure.sh |  6 ++
 tests/tcg/i386/Makefile.target | 16 +---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/tests/tcg/configure.sh b/tests/tcg/configure.sh
index 90fd81f506..46bc8634bb 100755
--- a/tests/tcg/configure.sh
+++ b/tests/tcg/configure.sh
@@ -281,6 +281,12 @@ for target in $target_list; do
 echo "CROSS_CC_HAS_POWER8_VECTOR=y" >> $config_target_mak
 fi
 ;;
+i386-linux-user)
+if do_compiler "$target_compiler" $target_compiler_cflags \
+-Werror -fno-pie -no-pie -o $TMPE $TMPC; then
+echo "CROSS_CC_HAS_I386_NOPIE=y" >> $config_target_mak
+fi
+;;
 esac
 
 enabled_cross_compilers="$enabled_cross_compilers $target_compiler"
diff --git a/tests/tcg/i386/Makefile.target b/tests/tcg/i386/Makefile.target
index c4a6f91966..f7efaab918 100644
--- a/tests/tcg/i386/Makefile.target
+++ b/tests/tcg/i386/Makefile.target
@@ -27,13 +27,23 @@ run-plugin-test-i386-bmi2-%: QEMU_OPTS += -cpu max
 hello-i386: CFLAGS+=-ffreestanding
 hello-i386: LDFLAGS+=-nostdlib
 
-#
-# test-386 includes a couple of additional objects that need to be linked 
together
-#
+# test-386 includes a couple of additional objects that need to be
+# linked together, we also need a no-pie capable compiler due to the
+# non-pic calls into 16-bit mode
+ifneq ($(DOCKER_IMAGE)$(CROSS_CC_HAS_I386_NOPIE),)
+test-i386: CFLAGS += -fno-pie
 
 test-i386: test-i386.c test-i386-code16.S test-i386-vm86.S test-i386.h 
test-i386-shift.h test-i386-muldiv.h
$(CC) $(CFLAGS) $(LDFLAGS) $(EXTRA_CFLAGS) -o $@ \
   $(

[PATCH v2 1/6] Update linux header with new arm64 NV macro

2021-04-01 Thread Haibo Xu
Signed-off-by: Haibo Xu 
---
 linux-headers/asm-arm64/kvm.h | 2 ++
 linux-headers/linux/kvm.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
index b6a0eaa32a..77b995a26c 100644
--- a/linux-headers/asm-arm64/kvm.h
+++ b/linux-headers/asm-arm64/kvm.h
@@ -106,6 +106,7 @@ struct kvm_regs {
 #define KVM_ARM_VCPU_SVE   4 /* enable SVE for this CPU */
 #define KVM_ARM_VCPU_PTRAUTH_ADDRESS   5 /* VCPU uses address authentication */
 #define KVM_ARM_VCPU_PTRAUTH_GENERIC   6 /* VCPU uses generic authentication */
+#define KVM_ARM_VCPU_HAS_EL2   7 /* Support nested virtualization */
 
 struct kvm_vcpu_init {
__u32 target;
@@ -334,6 +335,7 @@ struct kvm_vcpu_events {
 #define KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS 6
 #define KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO  7
 #define KVM_DEV_ARM_VGIC_GRP_ITS_REGS 8
+#define KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ 9
 #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT 10
 #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_MASK \
(0x3fULL << KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT)
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 020b62a619..ce4630c4db 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1056,6 +1056,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_ENFORCE_PV_FEATURE_CPUID 190
 #define KVM_CAP_SYS_HYPERV_CPUID 191
 #define KVM_CAP_DIRTY_LOG_RING 192
+#define KVM_CAP_ARM_EL2 193
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.17.1




[PATCH v2 4/6] hw/intc/arm_gicv3: Enable support for setting vGIC maintenance IRQ

2021-04-01 Thread Haibo Xu
Using the new VGIC KVM device attribute to set the maintenance IRQ.
This is fixed to use IRQ 25(PPI 9), as a platform decision matching
the arm64 SBSA recommendation.

Signed-off-by: Haibo Xu 
---
 hw/arm/virt.c  |  5 +
 hw/intc/arm_gicv3_common.c |  1 +
 hw/intc/arm_gicv3_kvm.c| 16 
 include/hw/intc/arm_gicv3_common.h |  1 +
 4 files changed, 23 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index aa2bbd14e0..92d46ebcfe 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -663,6 +663,11 @@ static void create_gic(VirtMachineState *vms)
 qdev_prop_set_uint32(vms->gic, "redist-region-count[1]",
 MIN(smp_cpus - redist0_count, redist1_capacity));
 }
+
+if (kvm_irqchip_in_kernel()) {
+bool el2 = object_property_get_bool(OBJECT(first_cpu), "el2", 
NULL);
+qdev_prop_set_bit(vms->gic, "has-virtualization-extensions", el2);
+}
 } else {
 if (!kvm_irqchip_in_kernel()) {
 qdev_prop_set_bit(vms->gic, "has-virtualization-extensions",
diff --git a/hw/intc/arm_gicv3_common.c b/hw/intc/arm_gicv3_common.c
index 58ef65f589..3ac10c8e61 100644
--- a/hw/intc/arm_gicv3_common.c
+++ b/hw/intc/arm_gicv3_common.c
@@ -495,6 +495,7 @@ static Property arm_gicv3_common_properties[] = {
 DEFINE_PROP_UINT32("num-irq", GICv3State, num_irq, 32),
 DEFINE_PROP_UINT32("revision", GICv3State, revision, 3),
 DEFINE_PROP_BOOL("has-security-extensions", GICv3State, security_extn, 0),
+DEFINE_PROP_BOOL("has-virtualization-extensions", GICv3State, virt_extn, 
0),
 DEFINE_PROP_ARRAY("redist-region-count", GICv3State, nb_redist_regions,
   redist_region_count, qdev_prop_uint32, uint32_t),
 DEFINE_PROP_END_OF_LIST(),
diff --git a/hw/intc/arm_gicv3_kvm.c b/hw/intc/arm_gicv3_kvm.c
index 65a4c880a3..1e1ca66e2c 100644
--- a/hw/intc/arm_gicv3_kvm.c
+++ b/hw/intc/arm_gicv3_kvm.c
@@ -826,6 +826,22 @@ static void kvm_arm_gicv3_realize(DeviceState *dev, Error 
**errp)
 kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_CTRL,
   KVM_DEV_ARM_VGIC_CTRL_INIT, NULL, true, &error_abort);
 
+if (s->virt_extn) {
+bool maint_irq_allowed;
+uint32_t maint_irq = 25;
+
+maint_irq_allowed =
+kvm_device_check_attr(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ, 
0);
+if (!maint_irq_allowed) {
+error_setg(errp, "VGICv3 setting maintenance IRQ are not "
+ "supported by this host kernel");
+return;
+}
+
+kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ,
+  0, &maint_irq, true, &error_abort);
+}
+
 kvm_arm_register_device(&s->iomem_dist, -1, KVM_DEV_ARM_VGIC_GRP_ADDR,
 KVM_VGIC_V3_ADDR_TYPE_DIST, s->dev_fd, 0);
 
diff --git a/include/hw/intc/arm_gicv3_common.h 
b/include/hw/intc/arm_gicv3_common.h
index 91491a2f66..921ddc2c5f 100644
--- a/include/hw/intc/arm_gicv3_common.h
+++ b/include/hw/intc/arm_gicv3_common.h
@@ -220,6 +220,7 @@ struct GICv3State {
 uint32_t num_irq;
 uint32_t revision;
 bool security_extn;
+bool virt_extn;
 bool irq_reset_nonsecure;
 bool gicd_no_migration_shift_bug;
 
-- 
2.17.1




[PATCH v2 0/6] target/arm: Add nested virtualization support

2021-04-01 Thread Haibo Xu
v2:
  - Move the NV to a CPU feature flag(Andrea&Andrew)
  - Add CPU feature 'el2' test(Andrew)

Many thanks to Andrea and Andrew for their comments!

This series add support for ARMv8.3/8.4 nested virtualization support
in KVM mode. It's based on Marc Zyngier's kernel KVM patches[1], and 
has been tested on a FVP model to run a L2 guest with Qemu. Now the 
feature can be enabled by "-M virt,accel=kvm,virtualization=on" when
starting a VM. 

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-5.12-WIP

Haibo Xu (6):
  Update linux header with new arm64 NV macro
  target/arm/kvm: Add helper to detect el2 when using KVM
  target/arm/kvm: Add an option to turn on/off el2 support
  hw/intc/arm_gicv3: Enable support for setting vGIC maintenance IRQ
  target/arm/cpu: Enable 'el2' to work with host/max cpu
  target/arm: Add vCPU feature 'el2' test.

 hw/arm/virt.c  | 19 ---
 hw/intc/arm_gicv3_common.c |  1 +
 hw/intc/arm_gicv3_kvm.c| 16 +
 include/hw/intc/arm_gicv3_common.h |  1 +
 linux-headers/asm-arm64/kvm.h  |  2 ++
 linux-headers/linux/kvm.h  |  1 +
 target/arm/cpu.c   | 14 +++-
 target/arm/cpu.h   |  4 +++
 target/arm/cpu64.c | 53 ++
 target/arm/kvm64.c | 15 +
 target/arm/kvm_arm.h   | 13 
 target/arm/monitor.c   |  2 +-
 tests/qtest/arm-cpu-features.c |  9 +
 13 files changed, 144 insertions(+), 6 deletions(-)

-- 
2.17.1




[PATCH v2 5/6] target/arm/cpu: Enable 'el2' to work with host/max cpu

2021-04-01 Thread Haibo Xu
Turn off the 'el2' cpu property by default to keep in line with
that in TCG mode, i.e. we can now use '-cpu max|host,el2=on' to
enable the nested virtualization.

Signed-off-by: Haibo Xu 
---
 hw/arm/virt.c  | 14 ++
 target/arm/cpu.c   |  3 ++-
 target/arm/cpu64.c |  1 +
 target/arm/kvm64.c | 10 ++
 4 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 92d46ebcfe..74340e21bd 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -454,6 +454,7 @@ static void fdt_add_gic_node(VirtMachineState *vms)
 {
 MachineState *ms = MACHINE(vms);
 char *nodename;
+bool has_el2 = object_property_get_bool(OBJECT(first_cpu), "el2", NULL);
 
 vms->gic_phandle = qemu_fdt_alloc_phandle(ms->fdt);
 qemu_fdt_setprop_cell(ms->fdt, "/", "interrupt-parent", vms->gic_phandle);
@@ -491,7 +492,7 @@ static void fdt_add_gic_node(VirtMachineState *vms)
  2, vms->memmap[VIRT_HIGH_GIC_REDIST2].size);
 }
 
-if (vms->virt) {
+if (vms->virt || has_el2) {
 qemu_fdt_setprop_cells(ms->fdt, nodename, "interrupts",
GIC_FDT_IRQ_TYPE_PPI, ARCH_GIC_MAINT_IRQ,
GIC_FDT_IRQ_FLAGS_LEVEL_HI);
@@ -1911,8 +1912,8 @@ static void machvirt_init(MachineState *machine)
 }
 
 if (vms->virt && kvm_enabled()) {
-error_report("mach-virt: KVM does not support providing "
- "Virtualization extensions to the guest CPU");
+error_report("mach-virt: VM 'virtualization' feature is not supported "
+ "in KVM mode, please use CPU feature 'el2' instead");
 exit(1);
 }
 
@@ -1950,11 +1951,16 @@ static void machvirt_init(MachineState *machine)
 object_property_set_bool(cpuobj, "has_el3", false, NULL);
 }
 
-if (!vms->virt && object_property_find(cpuobj, "has_el2")) {
+if (!vms->virt && !kvm_enabled() &&
+object_property_find(cpuobj, "has_el2")) {
 object_property_set_bool(cpuobj, "has_el2", false, NULL);
 }
 
 if (vms->psci_conduit != QEMU_PSCI_CONDUIT_DISABLED) {
+if (kvm_enabled() && ARM_CPU(cpuobj)->has_el2) {
+vms->psci_conduit = QEMU_PSCI_CONDUIT_SMC;
+}
+
 object_property_set_int(cpuobj, "psci-conduit", vms->psci_conduit,
 NULL);
 
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 30cc330f50..9530a2c4bf 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1099,7 +1099,7 @@ static Property arm_cpu_rvbar_property =
 
 #ifndef CONFIG_USER_ONLY
 static Property arm_cpu_has_el2_property =
-DEFINE_PROP_BOOL("has_el2", ARMCPU, has_el2, true);
+DEFINE_PROP_BOOL("has_el2", ARMCPU, has_el2, false);
 
 static Property arm_cpu_has_el3_property =
 DEFINE_PROP_BOOL("has_el3", ARMCPU, has_el3, true);
@@ -2018,6 +2018,7 @@ static void arm_host_initfn(Object *obj)
 kvm_arm_set_cpu_features_from_host(cpu);
 if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
 aarch64_add_sve_properties(obj);
+aarch64_add_el2_properties(obj);
 }
 arm_cpu_post_init(obj);
 }
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 3f3f2c5495..ae8811d09e 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -666,6 +666,7 @@ static void aarch64_max_initfn(Object *obj)
 
 if (kvm_enabled()) {
 kvm_arm_set_cpu_features_from_host(cpu);
+aarch64_add_el2_properties(obj);
 } else {
 uint64_t t;
 uint32_t u;
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 9cacaf2eb8..7bf892404f 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -500,6 +500,7 @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
  */
 int fdarray[3];
 bool sve_supported;
+bool el2_supported;
 uint64_t features = 0;
 uint64_t t;
 int err;
@@ -646,6 +647,7 @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
 }
 
 sve_supported = ioctl(fdarray[0], KVM_CHECK_EXTENSION, KVM_CAP_ARM_SVE) > 
0;
+el2_supported = ioctl(fdarray[0], KVM_CHECK_EXTENSION, KVM_CAP_ARM_EL2) > 
0;
 
 kvm_arm_destroy_scratch_host_vcpu(fdarray);
 
@@ -660,6 +662,11 @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures 
*ahcf)
 ahcf->isar.id_aa64pfr0 = t;
 }
 
+/* Use the ARM_FEATURE_EL2 bit to keep inline with that in TCG mode. */
+if (el2_supported) {
+features |= 1ULL << ARM_FEATURE_EL2;
+}
+
 /*
  * We can assume any KVM supporting CPU is at least a v8
  * with VFPv4+Neon; this in turn implies most of the other
@@ -861,6 +868,9 @@ int kvm_arch_init_vcpu(CPUState *cs)
 assert(kvm_arm_sve_supported());
 cpu->kvm_init_features[0] |= 1 << KVM_ARM_VCPU_SVE;
 }
+if (arm_feature(&cpu->env, ARM_FEATURE_EL2)) {
+cpu->kvm_init_features[0] |= 1 << KVM_ARM_

[PATCH v2 2/6] target/arm/kvm: Add helper to detect el2 when using KVM

2021-04-01 Thread Haibo Xu
Signed-off-by: Haibo Xu 
---
 target/arm/kvm64.c   |  5 +
 target/arm/kvm_arm.h | 13 +
 2 files changed, 18 insertions(+)

diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index dff85f6db9..9cacaf2eb8 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -721,6 +721,11 @@ bool kvm_arm_steal_time_supported(void)
 return kvm_check_extension(kvm_state, KVM_CAP_STEAL_TIME);
 }
 
+bool kvm_arm_el2_supported(void)
+{
+return kvm_check_extension(kvm_state, KVM_CAP_ARM_EL2);
+}
+
 QEMU_BUILD_BUG_ON(KVM_ARM64_SVE_VQ_MIN != 1);
 
 void kvm_arm_sve_get_vls(CPUState *cs, unsigned long *map)
diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
index 34f8daa377..7d7fc7981b 100644
--- a/target/arm/kvm_arm.h
+++ b/target/arm/kvm_arm.h
@@ -285,6 +285,14 @@ void kvm_arm_steal_time_finalize(ARMCPU *cpu, Error 
**errp);
  */
 bool kvm_arm_steal_time_supported(void);
 
+/**
+ * kvm_arm_el2_supported:
+ *
+ * Returns: true if KVM can enable el2(nested virtualization)
+ * and false otherwise.
+ */
+bool kvm_arm_el2_supported(void);
+
 /**
  * kvm_arm_aarch32_supported:
  *
@@ -398,6 +406,11 @@ static inline bool kvm_arm_steal_time_supported(void)
 return false;
 }
 
+static inline bool kvm_arm_el2_supported(void)
+{
+return false;
+}
+
 /*
  * These functions should never actually be called without KVM support.
  */
-- 
2.17.1




[PATCH v2 6/6] target/arm: Add vCPU feature 'el2' test.

2021-04-01 Thread Haibo Xu
Signed-off-by: Haibo Xu 
---
 target/arm/monitor.c   | 2 +-
 tests/qtest/arm-cpu-features.c | 9 +
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/target/arm/monitor.c b/target/arm/monitor.c
index 80c64fa355..6c39238925 100644
--- a/target/arm/monitor.c
+++ b/target/arm/monitor.c
@@ -90,7 +90,7 @@ QEMU_BUILD_BUG_ON(ARM_MAX_VQ > 16);
  * then the order that considers those dependencies must be used.
  */
 static const char *cpu_model_advertised_features[] = {
-"aarch64", "pmu", "sve",
+"aarch64", "pmu", "sve", "el2",
 "sve128", "sve256", "sve384", "sve512",
 "sve640", "sve768", "sve896", "sve1024", "sve1152", "sve1280",
 "sve1408", "sve1536", "sve1664", "sve1792", "sve1920", "sve2048",
diff --git a/tests/qtest/arm-cpu-features.c b/tests/qtest/arm-cpu-features.c
index 8252b85bb8..be07bf0c76 100644
--- a/tests/qtest/arm-cpu-features.c
+++ b/tests/qtest/arm-cpu-features.c
@@ -509,6 +509,7 @@ static void test_query_cpu_model_expansion_kvm(const void 
*data)
 if (g_str_equal(qtest_get_arch(), "aarch64")) {
 bool kvm_supports_steal_time;
 bool kvm_supports_sve;
+bool kvm_supports_el2;
 char max_name[8], name[8];
 uint32_t max_vq, vq;
 uint64_t vls;
@@ -533,10 +534,12 @@ static void test_query_cpu_model_expansion_kvm(const void 
*data)
  */
 assert_has_feature(qts, "host", "kvm-steal-time");
 assert_has_feature(qts, "host", "sve");
+assert_has_feature(qts, "host", "el2");
 
 resp = do_query_no_props(qts, "host");
 kvm_supports_steal_time = resp_get_feature(resp, "kvm-steal-time");
 kvm_supports_sve = resp_get_feature(resp, "sve");
+kvm_supports_el2 = resp_get_feature(resp, "el2");
 vls = resp_get_sve_vls(resp);
 qobject_unref(resp);
 
@@ -602,11 +605,17 @@ static void test_query_cpu_model_expansion_kvm(const void 
*data)
 } else {
 g_assert(vls == 0);
 }
+
+if (kvm_supports_el2) {
+assert_set_feature(qts, "host", "el2", false);
+assert_set_feature(qts, "host", "el2", true);
+}
 } else {
 assert_has_not_feature(qts, "host", "aarch64");
 assert_has_not_feature(qts, "host", "pmu");
 assert_has_not_feature(qts, "host", "sve");
 assert_has_not_feature(qts, "host", "kvm-steal-time");
+assert_has_not_feature(qts, "host", "el2");
 }
 
 qtest_quit(qts);
-- 
2.17.1




[PATCH v2 3/6] target/arm/kvm: Add an option to turn on/off el2 support

2021-04-01 Thread Haibo Xu
Adds an el2=[on/off] option to enable/disable el2(nested virtualization)
support in KVM guest vCPU.

Signed-off-by: Haibo Xu 
---
 target/arm/cpu.c   | 11 ++
 target/arm/cpu.h   |  4 
 target/arm/cpu64.c | 52 ++
 3 files changed, 67 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index ae04884408..30cc330f50 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1349,6 +1349,17 @@ void arm_cpu_finalize_features(ARMCPU *cpu, Error **errp)
 return;
 }
 }
+
+/*
+ * Currently, vCPU feature 'el2' only supported in KVM mode.
+ */
+if (kvm_enabled()) {
+arm_cpu_el2_finalize(cpu, &local_err);
+if (local_err != NULL) {
+error_propagate(errp, local_err);
+return;
+}
+}
 }
 
 if (kvm_enabled()) {
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 193a49ec7f..19fa9cfbfd 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -203,10 +203,12 @@ typedef struct {
 # define ARM_MAX_VQ16
 void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp);
 void arm_cpu_pauth_finalize(ARMCPU *cpu, Error **errp);
+void arm_cpu_el2_finalize(ARMCPU *cpu, Error **errp);
 #else
 # define ARM_MAX_VQ1
 static inline void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp) { }
 static inline void arm_cpu_pauth_finalize(ARMCPU *cpu, Error **errp) { }
+static inline void arm_cpu_el2_finalize(ARMCPU *cpu, Error **errp) { }
 #endif
 
 typedef struct ARMVectorReg {
@@ -1058,6 +1060,7 @@ void aarch64_sve_narrow_vq(CPUARMState *env, unsigned vq);
 void aarch64_sve_change_el(CPUARMState *env, int old_el,
int new_el, bool el0_a64);
 void aarch64_add_sve_properties(Object *obj);
+void aarch64_add_el2_properties(Object *obj);
 
 /*
  * SVE registers are encoded in KVM's memory in an endianness-invariant format.
@@ -1089,6 +1092,7 @@ static inline void aarch64_sve_change_el(CPUARMState 
*env, int o,
  int n, bool a)
 { }
 static inline void aarch64_add_sve_properties(Object *obj) { }
+static inline void aarch64_add_el2_properties(Object *obj) { }
 #endif
 
 void aarch64_sync_32_to_64(CPUARMState *env);
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index f0a9e968c9..3f3f2c5495 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -603,6 +603,58 @@ static Property arm_cpu_pauth_property =
 static Property arm_cpu_pauth_impdef_property =
 DEFINE_PROP_BOOL("pauth-impdef", ARMCPU, prop_pauth_impdef, false);
 
+void arm_cpu_el2_finalize(ARMCPU *cpu, Error **errp)
+{
+if (cpu->has_el2) {
+if (!kvm_enabled() || !kvm_arm_el2_supported()) {
+error_setg(errp, "'el2' cannot be enabled on this host");
+return;
+}
+}
+
+if (cpu->has_el2) {
+set_feature(&cpu->env, ARM_FEATURE_EL2);
+} else {
+unset_feature(&cpu->env, ARM_FEATURE_EL2);
+}
+}
+
+static bool arm_get_el2(Object *obj, Error **errp)
+{
+ARMCPU *cpu = ARM_CPU(obj);
+
+return cpu->has_el2;
+}
+
+static void arm_set_el2(Object *obj, bool value, Error **errp)
+{
+ARMCPU *cpu = ARM_CPU(obj);
+
+if (value) {
+if (!kvm_enabled() || !kvm_arm_el2_supported()) {
+error_setg(errp, "'el2' cannot be enabled on this host");
+return;
+}
+set_feature(&cpu->env, ARM_FEATURE_EL2);
+} else {
+unset_feature(&cpu->env, ARM_FEATURE_EL2);
+}
+
+cpu->has_el2 = value;
+}
+
+void aarch64_add_el2_properties(Object *obj)
+{
+/*
+ * vCPU feature 'el2' is only available in KVM mode, and is
+ * disabled by default to keep in line with that in TCG mode.
+ */
+ARM_CPU(obj)->has_el2 = false;
+object_property_add_bool(obj, "el2", arm_get_el2, arm_set_el2);
+object_property_set_description(obj, "el2", "Set off to disable "
+"nested virtulization.");
+}
+
 /* -cpu max: if KVM is enabled, like -cpu host (best possible with this host);
  * otherwise, a CPU with as many features enabled as our emulation supports.
  * The version of '-cpu max' for qemu-system-arm is defined in cpu.c;
-- 
2.17.1




Re: [PATCH RESEND] docs: clarify absence of set_features in vhost-user

2021-04-01 Thread Alex Bennée


Alyssa Ross  writes:

> The previous wording was (at least to me) ambiguous about whether a
> backend should enable features immediately after they were set using
> VHOST_USER_SET_PROTOCOL_FEATURES, or wait for support for protocol
> features to be acknowledged if it hasn't been yet before enabling
> those features.
>
> This patch attempts to make it clearer that
> VHOST_USER_SET_PROTOCOL_FEATURES should immediately enable features,
> even if support for protocol features has not yet been acknowledged,
> while still also making clear that the frontend SHOULD acknowledge
> support for protocol features.
>
> Previous discussion begins here:
> 

I totally missed this when I posted a similar attempt at clarification:

  Subject: [PATCH v2] vhost-user.rst: add clarifying language about protocol 
negotiation
  Date: Wed,  3 Mar 2021 14:50:11 +
  Message-Id: <20210303145011.14547-1-alex.ben...@linaro.org>

>
> Cc: Michael S. Tsirkin 
> Signed-off-by: Alyssa Ross 
> ---
>  docs/interop/vhost-user.rst | 14 +-
>  1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> index d6085f7045..c42150331d 100644
> --- a/docs/interop/vhost-user.rst
> +++ b/docs/interop/vhost-user.rst
> @@ -871,9 +871,9 @@ Master message types
>``VHOST_USER_GET_FEATURES``.
>  
>  .. Note::
> -   Slave that reported ``VHOST_USER_F_PROTOCOL_FEATURES`` must
> -   support this message even before ``VHOST_USER_SET_FEATURES`` was
> -   called.
> +   While QEMU should acknowledge ``VHOST_USER_F_PROTOCOL_FEATURES``, a
> +   backend must allow ``VHOST_USER_GET_PROTOCOL_FEATURES`` even if
> +   ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been acknowledged yet.
>  
>  ``VHOST_USER_SET_PROTOCOL_FEATURES``
>:id: 16
> @@ -886,8 +886,12 @@ Master message types
>``VHOST_USER_GET_FEATURES``.
>  
>  .. Note::
> -   Slave that reported ``VHOST_USER_F_PROTOCOL_FEATURES`` must support
> -   this message even before ``VHOST_USER_SET_FEATURES`` was called.
> +   While QEMU should acknowledge ``VHOST_USER_F_PROTOCOL_FEATURES``, a
> +   backend must allow ``VHOST_USER_SET_PROTOCOL_FEATURES`` even if
> +   ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been acknowledged yet.
> +   The backend must not wait for ``VHOST_USER_SET_FEATURES`` before
> +   enabling protocol features requested with
> +   ``VHOST_USER_SET_PROTOCOL_FEATURES``.

I think this is perfectly fine clarification although I think there
might be a patch in flight which changes some of the master slave
terminology so with that resolved:

Reviewed-by: Alex Bennée 

However there is still the edge case of what happens after the vhost
pair have negotiated protocol features like REPLY_ACK should
VHOST_USER_F_PROTOCOL_FEATURES still be acknowledged by
VHOST_USER_SET_FEATURES.

Stefan's proposed wording which I incorporated in my patch made it clear
that VHOST_USER_F_PROTOCOL_FEATURES is never exposed to the guest by the
VMM due to it's UNUSED status. I would just like it explicit if it needs
to be preserved between the two sides of the VHOST_USER_*_FEATURES for
the negotiated protocol features to remain valid.

Perhaps you could incorporate some wording for that?

>  
>  ``VHOST_USER_SET_OWNER``
>:id: 3


-- 
Alex Bennée



Re: [PATCH v3 05/11] esp: introduce esp_fifo_pop_buf() and use it instead of fifo8_pop_buf()

2021-04-01 Thread Mark Cave-Ayland

On 01/04/2021 10:34, Philippe Mathieu-Daudé wrote:


On 4/1/21 9:49 AM, Mark Cave-Ayland wrote:

The const pointer returned by fifo8_pop_buf() lies directly within the array 
used
to model the FIFO. Building with address sanitisers enabled shows that if the


Typo "sanitizers"


Ha. It's definitely "sanitiser" here in the UK (UK English) as opposed to "sanitizer" 
(US English). I don't really mind either way, but I can fix this if it needs a v4 
following Paolo's comments.



caller expects a minimum number of bytes present then if the FIFO is nearly 
full,
the caller may unexpectedly access past the end of the array.


Why isn't it a problem with the other models? Because the pointed
buffer is consumed directly?


Yes that's correct, which is why Fifo8 currently doesn't support wraparound. I 
haven't analysed how other devices have used it but I would imagine there would be an 
ASan hit if it were being misused this way.



Introduce esp_fifo_pop_buf() which takes a destination buffer and performs a
memcpy() in it to guarantee that the caller cannot overwrite the FIFO array and
update all callers to use it. Similarly add underflow protection similar to
esp_fifo_push() and esp_fifo_pop() so that instead of triggering an assert()
the operation becomes a no-op.


This is OK for your ESP model.

Now thinking loudly about the Fifo8 API, shouldn't this be part of it?

Something prototype like:

   /**
* fifo8_pop_buf:
* @do_copy: If %true, also copy data to @bufptr.
*/
   size_t fifo8_pop_buf(Fifo8 *fifo,
void **bufptr,
size_t buflen,
bool do_copy);


That could work, and may even allow support for wraparound in future. I suspect 
things would become clearer after looking at the other Fifo8 users to see if this is 
worth an API change/alternative API.



ATB,

Mark.



  1   2   3   >