date:20201101

Re: Out-of-Process Device Emulation session at KVM Forum 2020

2020-11-01 Thread Paolo Bonzini

On Sat, 31 Oct 2020, 22:49 Michael S. Tsirkin wrote:

> > > I still don't get why it must be opaque.
> >
> > If the device state format needs to be in the VMM then each device
> > needs explicit enablement in each VMM (QEMU, cloud-hypervisor, etc).
>
> And QEMU cares why exactly?
>

QEMU cares for another reason. It is more code to review, and it's worth
spending the time reviewing it only if we can do a decent job of reviewing
it.

There are several cases in which drivers migrate non-architectural,
implementation-dependent state. There are examples in nested
virtualization (the deadline of the VMX preemption timer) and in device
emulation (the RTC has quite a few examples too, including of how that
state changed through the years). We probably don't have enough knowledge
of the drivers' innards to do a decent job of reviewing patches that affect
that state anyway.

> > Let's invert the question: why does the VMM need to understand the
> > device state of a _passthrough_ device?
>
> To support cross version migration and compatibility checks.
>

That doesn't have to be in the VMM. We should give guidance but that can be
in terms of documentation. Also, in QEMU we chose the path of dropping
sections on the source when migrating to older versions, but that can also
be considered a deficiency of vmstate---a self-synchronizing format
(Anthony many years ago wanted to use X509 as the migration format) would
be much better. And for some specific device types we could define standard
formats, just like PCI has standard classes.
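One way to read "self-synchronizing format" is a self-describing stream in which every field carries its own tag and length, so that a destination can skip fields it does not recognize instead of the source having to drop them before migrating to an older version (X.509 certificates are encoded in ASN.1 DER, which is such a tag-length-value encoding). The following is a minimal sketch under that assumption only; the one-byte tags, the field layout, `struct dev_state` and `load_state()` are hypothetical and are not QEMU's vmstate format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical self-describing device-state stream: each field is
 *   [tag: 1 byte][len: 1 byte][len bytes of payload]
 * Because every field carries its own length, a reader can step over
 * fields it does not know about.
 */
enum { TAG_COUNTER = 1, TAG_FLAGS = 2 };  /* the only tags this reader knows */

struct dev_state {
    uint32_t counter;
    uint8_t flags;
};

/*
 * Parse buf into *s. Returns the number of unknown fields that were
 * skipped, or -1 if the stream is truncated or a known field has an
 * unexpected length.
 */
static int load_state(const uint8_t *buf, size_t len, struct dev_state *s)
{
    size_t pos = 0;
    int skipped = 0;

    assert(buf != NULL || len == 0);
    while (pos < len) {
        if (len - pos < 2) {
            return -1;                      /* truncated field header */
        }
        uint8_t tag = buf[pos];
        uint8_t flen = buf[pos + 1];
        const uint8_t *payload = buf + pos + 2;
        if (len - pos - 2 < flen) {
            return -1;                      /* truncated payload */
        }

        if (tag == TAG_COUNTER) {
            if (flen != 4) {
                return -1;
            }
            /* little-endian u32 */
            s->counter = (uint32_t)payload[0] | (uint32_t)payload[1] << 8 |
                         (uint32_t)payload[2] << 16 | (uint32_t)payload[3] << 24;
        } else if (tag == TAG_FLAGS) {
            if (flen != 1) {
                return -1;
            }
            s->flags = payload[0];
        } else {
            skipped++;                      /* field from a newer source: skip it */
        }
        pos += 2 + (size_t)flen;
    }
    return skipped;
}
```

With such a format a newer source can append fields freely; an older destination simply counts and skips them, and only a truncated or malformed stream is treated as an error.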

Paolo

>
> This problem is harder than it appears; I don't think vendors
> will do a good job of it without any guidance and standards.
>
> --
> MST
>
>

Re: simple example of pci driver with dma

2020-11-01 Thread Yan Vugenfirer

Hi Shaked,

In the probe function, before you try to do any DMA operations with your
device, you should call pci_enable_device() and then pci_set_master(). You
might also need to map your device's resources.
Check PCI driver documentation: 
https://lxr.missinglinkelectronics.com/linux/Documentation/PCI/pci.rst#L199
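pci_set_master() ultimately sets the Bus Master bit (value 0x4) of the 16-bit Command register at offset 0x04 of the device's PCI configuration space; while that bit is clear the device may not initiate DMA, so the transfer never reaches RAM. The sketch below models that register with a plain byte array so it can run in user space. The helper names are hypothetical, and the real kernel functions do more than this (pci_enable_device(), for example, also enables I/O decoding where needed).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Register offset and bits as defined by the PCI spec (linux/pci_regs.h uses the same names). */
#define PCI_CFG_SIZE        256
#define PCI_COMMAND         0x04    /* 16-bit Command register */
#define PCI_COMMAND_MEMORY  0x2     /* respond to memory-space accesses */
#define PCI_COMMAND_MASTER  0x4     /* Bus Master enable */

/* Config space is little-endian; a byte array stands in for the device. */
static uint16_t cfg_read16(const uint8_t *cfg, unsigned off)
{
    assert(off + 1 < PCI_CFG_SIZE);
    return (uint16_t)(cfg[off] | cfg[off + 1] << 8);
}

static void cfg_write16(uint8_t *cfg, unsigned off, uint16_t val)
{
    assert(off + 1 < PCI_CFG_SIZE);
    cfg[off] = (uint8_t)(val & 0xff);
    cfg[off + 1] = (uint8_t)(val >> 8);
}

/* Part of what pci_enable_device() does for a device with a memory BAR. */
static void enable_memory_decode(uint8_t *cfg)
{
    cfg_write16(cfg, PCI_COMMAND, cfg_read16(cfg, PCI_COMMAND) | PCI_COMMAND_MEMORY);
}

/* The core of pci_set_master(): read-modify-write the Command register. */
static void set_bus_master(uint8_t *cfg)
{
    cfg_write16(cfg, PCI_COMMAND, cfg_read16(cfg, PCI_COMMAND) | PCI_COMMAND_MASTER);
}

/* A device, real or emulated, may only initiate DMA while this bit is set. */
static bool dma_allowed(const uint8_t *cfg)
{
    return (cfg_read16(cfg, PCI_COMMAND) & PCI_COMMAND_MASTER) != 0;
}
```

Enabling memory decoding alone is not enough for DMA; only after the bus-master step does `dma_allowed()` become true, which mirrors the order pci_enable_device(), then pci_set_master(), then programming the device's DMA registers.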

Best regards,
Yan.

> On 29 Oct 2020, at 10:32 PM, Shaked Matzner  wrote:
> 
> Hey Peter,
> Currently I have this test in the driver: it allocates a buffer and sets it
> to a default value, programs the DMA source as the DMA BASE address and the
> destination as the address I got from dma_alloc_coherent, sets the count,
> and issues the read command with the raise-interrupt flag. The code (a test
> run from the driver's probe function) is something like this:
>     vaddr_to = dma_alloc_coherent(&(dev->dev), 4, &dma_handle_to, GFP_ATOMIC | GFP_KERNEL);
>     *((volatile int *)vaddr_to) = 0xff;
>     test->vaddr_to = vaddr_to;
>     dev_info(&(dev->dev), "vaddr_to = %px\n", vaddr_to);
>     dev_info(&(dev->dev), "dma_handle_to = %llx\n", (unsigned long long)dma_handle_to);
>     iowrite32(DMA_BASE, mmio + IO_DMA_SRC);
>     iowrite32((u32)dma_handle_to, mmio + IO_DMA_DST);
>     iowrite32(SIZE, mmio + IO_DMA_CNT);
>     iowrite32(DMA_CMD | DMA_FROM_DEV | DMA_IRQ, mmio + IO_DMA_CMD);
> Where and when should pci_set_master() be called?
> Thanks,
>  Shaked Matzner
>  
> From: Peter Maydell  
> Sent: Thursday, October 29, 2020 5:46 PM
> To: Shaked Matzner 
> Cc: qemu-devel@nongnu.org
> Subject: Re: simple example of pci driver with dma
>  
> 
> 
> On Thu, 29 Oct 2020 at 14:59, Shaked Matzner wrote:
> > however the value I get is still 255 (0xff) and not 18 (0x12). I've probably
> > missed something: when the interrupt is raised, the transfer to the RAM
> > address should be complete, but it seems that the device's dma_write_buffer
> > function does not perform any transfer. What am I missing?
> 
> The usual mistake is forgetting in the guest code to program the
> PCI device to enable bus mastering by setting the Bus Master bit
> in the Command register in the PCI config space registers for
> the device. Unless you do that then all DMA attempts will fail
> (same as on real h/w). In the Linux kernel the function for this
> is pci_set_master(), I think.
> 
> thanks
> -- PMM 
> 



--
Daynix Computing LTD
Yan Vugenfirer, CEO
Email: y...@daynix.com
Phone (Israel): +972-54-4758084
Phone (USA): +1-7204776716
Phone (UK): +44-2070482938
Web: www.daynix.com

Re: [PULL 00/16] migration queue

2020-11-01 Thread Christian Schoenebeck

On Saturday, 31 October 2020 20:10:49 CET Christian Schoenebeck wrote:
> On Saturday, 31 October 2020 18:46:11 CET Peter Xu wrote:
> > On Sat, Oct 31, 2020 at 05:26:28PM +0000, Peter Maydell wrote:
> > > On Sat, 31 Oct 2020 at 16:12, Christian Schoenebeck wrote:
> > > > On Monday, 26 October 2020 17:19:36 CET Dr. David Alan Gilbert (git) wrote:
> > > > > 
> > > > > migration pull: 2020-10-26
> > > > > 
> > > > > Another go at Peter's postcopy fixes
> > > > > 
> > > > > Cleanups from Bihong Yu and Peter Maydell.
> > > > > 
> > > > > Signed-off-by: Dr. David Alan Gilbert 
> > > > 
> > > > Could this PR have introduced the qtest lockup that I am
> > > > encountering in this week's upstream revisions?
> > > 
> > > If you try the patches Peter Xu attached to this thread
> > > does the lockup go away ?
> > > 
> > > https://lore.kernel.org/qemu-devel/20201030135350.GA588069@xz-x1/
> > > 
> > > (I'm also seeing intermittent hangs, for some reason almost always
> > > on s390x host.)
> > 
> > It would be good to know exactly which test hung.  If it's migration-test
> > then it's very possible.
> 
> It's run-test-144 that does not return; according to Makefile.mtest that's
> migration-test, so chances are high that it's indeed introduced by this PR.
> 
> > The race that the above patch(es) tried to fix should logically be
> > reproducible on all archs, not only s390x.
> > 
> > Thanks,
> 
> Yes, it's i386 here that locks up.
> 
> I'm running the loop with your patches now, so far so good, let's see if
> it's still alive tomorrow.
> 
> Best regards,
> Christian Schoenebeck

Looks good! 16h later the loop is still running here; the patches also made
the lockup disappear on Travis-CI. So Peter Xu's two patches fix the lockup
problem for me.

Best regards,
Christian Schoenebeck

Re: [PULL 0/5] Modules 20201029 patches

2020-11-01 Thread Peter Maydell

On Thu, 29 Oct 2020 at 11:16, Gerd Hoffmann  wrote:
>
> The following changes since commit bbc48d2bcb9711614fbe751c2c5ae13e172fbca8:
>
>   Merge remote-tracking branch 'remotes/philmd-gitlab/tags/renesas-20201027' into staging (2020-10-28 16:25:31 +0000)
>
> are available in the Git repository at:
>
>   git://git.kraxel.org/qemu tags/modules-20201029-pull-request
>
> for you to fetch changes up to 546323bdac18984c771ebefae1046ee61742f9ca:
>
>   modules: turn off lazy binding (2020-10-29 06:37:24 +0100)
>
> 
> modules: build virtio-gpu-pci & virtio-vga modular.
> modules: various bugfixes, mostly for macos.
>

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/5.2
for any user-visible changes.

-- PMM

Re: [PULL v2 01/16] tests/9pfs: fix test dir for parallel tests

2020-11-01 Thread Christian Schoenebeck

On Saturday, 31 October 2020 14:20:27 CET Christian Schoenebeck wrote:
> On Friday, 30 October 2020 13:07:03 CET Christian Schoenebeck wrote:
> > Use mkdtemp() to generate a unique directory for the 9p 'local' tests.
> > 
> > This fixes occasional 9p test failures when running 'make check -jN' if
> > QEMU was compiled for multiple target architectures, because the
> > individual architectures' test suites would run in parallel and interfere
> > with each other's data, as the test directory was previously hard-coded
> > and hence the same directory was used by all of them simultaneously.
> > 
> > This also requires a change in how the test directory is created and deleted:
> > As the test path is now randomized and virtio_9p_register_nodes() is
> > called in a somewhat nondeterministic way, that's no longer an appropriate
> > place to create and remove the test directory. Use a constructor and
> > destructor function for creating and removing the test directory instead.
> > Unfortunately libqos currently does not support setup/teardown callbacks
> > to handle this more cleanly.
> 
> Peter, please ignore this PR. This patch needs rework:
> 
> ERROR:../tests/qtest/test-x86-cpuid-compat.c:208:test_plus_minus: stdout of
> child process (/x86/cpuid/parsing-plus-minus/subprocess [34856]) failed to
> match:
> 
> stdout was:
> 
> # mkdir('/home/travis/build/cschoenebeck/qemu/build/qtest-9p-local-PwY2nQ')
> failed: File exists
> 
> ERROR qtest-x86_64/test-x86-cpuid-compat - Bail out! ERROR:../tests/qtest/
> test-x86-cpuid-compat.c:208:test_plus_minus: stdout of child process (/x86/
> cpuid/parsing-plus-minus/subprocess [34856]) failed to match:
> 
> make: *** [Makefile.mtest:1793: run-test-222] Error 1
> 
> https://travis-ci.org/github/cschoenebeck/qemu/jobs/740199494

OK, I found a solution: move the constructor and destructor functions from
virtio-9p.c to virtio-9p-test.c:
https://github.com/cschoenebeck/qemu/commit/b4c72149f087d5a

The problem was that the constructor function was executed whenever libqos
was loaded, including by completely unrelated test suites that merely link
to libqos.
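A GCC/Clang `__attribute__((constructor))` function runs before `main()` in every program into which its object file is linked, and a destructor runs after `main()` returns or `exit()` is called. That is why the constructor's placement matters here. The user-space sketch below mirrors the v3 idea under simplifying assumptions: the directory lives under `/tmp` rather than the current directory, and `test_dir`, `create_test_dir()` and `remove_test_dir()` are hypothetical names. Linking this object into an unrelated program would create the directory there as well.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* mkdtemp() rewrites the trailing XXXXXX in place, so this must be writable. */
static char test_dir[] = "/tmp/qtest-9p-local-XXXXXX";
static int test_dir_created;

/* Runs before main() in every program this object file is linked into. */
static void __attribute__((constructor)) create_test_dir(void)
{
    if (mkdtemp(test_dir) == NULL) {
        fprintf(stderr, "mkdtemp() failed: %s\n", strerror(errno));
        return;
    }
    test_dir_created = 1;
}

/* Runs after main() returns or exit() is called. */
static void __attribute__((destructor)) remove_test_dir(void)
{
    if (test_dir_created) {
        rmdir(test_dir);  /* enough here: this sketch leaves the directory empty */
    }
}
```

Because mkdtemp() picks a fresh name on every run, several test binaries running in parallel each get their own directory, which is the property the hard-coded path lacked.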

In conjunction with Peter Xu's two migration patches (fixing occasional 
lockups of migration tests), the overall situation appears to be smooth now:
https://lore.kernel.org/qemu-devel/20201030135350.GA588069@xz-x1/

There is now only one test failure left concerning macOS Xcode builds, but 
that seems to be completely unrelated to our 9pfs patches:
https://github.com/cschoenebeck/qemu/runs/1338011297

missing object type 'vhost-user-gpu'
Broken pipe
../tests/qtest/libqtest.c:176: kill_qemu() detected QEMU death from signal 6 
(Abort trap: 6)
ERROR qtest-aarch64/device-introspect-test - too few tests run (expected 6, 
got 5)
gmake: *** [Makefile.mtest:905: run-test-111] Error 1

I'll prepare updated patches for review.

Best regards,
Christian Schoenebeck

[PATCH] hw/input/ps2.c: Remove remnants of printf debug

2020-11-01 Thread Peter Maydell

In commit 5edab03d4040 we added tracepoints to the ps2 keyboard
and mouse emulation. However we didn't remove all the debug-by-printf
support. In fact there is only one printf() remaining, and it is
redundant with the trace_ps2_write_mouse() event next to it.
Remove the printf() and the now-unused DEBUG* macros.

Signed-off-by: Peter Maydell 
---
 hw/input/ps2.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/hw/input/ps2.c b/hw/input/ps2.c
index f8746d2f52c..72cdb80ae1c 100644
--- a/hw/input/ps2.c
+++ b/hw/input/ps2.c
@@ -33,12 +33,6 @@
 
 #include "trace.h"
 
-/* debug PC keyboard */
-//#define DEBUG_KBD
-
-/* debug PC keyboard : only mouse */
-//#define DEBUG_MOUSE
-
 /* Keyboard Commands */
 #define KBD_CMD_SET_LEDS   0xED    /* Set keyboard leds */
 #define KBD_CMD_ECHO   0xEE
@@ -790,9 +784,6 @@ void ps2_write_mouse(void *opaque, int val)
     PS2MouseState *s = (PS2MouseState *)opaque;
 
     trace_ps2_write_mouse(opaque, val);
-#ifdef DEBUG_MOUSE
-    printf("kbd: write mouse 0x%02x\n", val);
-#endif
     switch(s->common.write_cmd) {
     default:
     case -1:
-- 
2.20.1

Re: [PULL 00/18] riscv-to-apply queue

2020-11-01 Thread Peter Maydell

On Thu, 29 Oct 2020 at 14:25, Alistair Francis  wrote:
>
> The following changes since commit c0444009147aa935d52d5acfc6b70094bb42b0dd:
>
>   Merge remote-tracking branch 'remotes/armbru/tags/pull-qmp-2020-10-27' into staging (2020-10-29 10:03:32 +0000)
>
> are available in the Git repository at:
>
>   g...@github.com:alistair23/qemu.git tags/pull-riscv-to-apply-20201029
>
> for you to fetch changes up to e041badcd4ac644a67f02f8765095a5ff7a24d47:
>
>   hw/riscv: microchip_pfsoc: Hook the I2C1 controller (2020-10-29 07:11:14 -0700)
>
> 
> This series adds support for migration to RISC-V QEMU and expands the
> Microchip PFSoC to allow unmodified HSS and Linux boots.
>
> 

Hi; this fails 'make check' on 32-bit hosts:

qemu-system-riscv64: at most 2047 MB RAM can be simulated
Broken pipe
../../tests/qtest/libqtest.c:167: kill_qemu() tried to terminate QEMU
process but encountered exit status 1 (expected 0)
ERROR qtest-riscv64/qom-test - too few tests run (expected 6, got 3)

and

qemu-system-riscv64: at most 2047 MB RAM can be simulated
Broken pipe
../../tests/qtest/libqtest.c:167: kill_qemu() tried to terminate QEMU
process but encountered exit status 1 (expected 0)
ERROR qtest-riscv64/test-hmp - too few tests run (expected 7, got 3)

thanks
-- PMM

Re: [PULL 00/15] pc,pci,vhost,virtio: misc fixes

2020-11-01 Thread Peter Maydell

On Fri, 30 Oct 2020 at 14:33,  wrote:
>
> Patchew URL: https://patchew.org/QEMU/20201030141136.1013521-1-...@redhat.com/

I'll apply this pullreq (unless it has other more serious
issues), but could you look at the coding style warnings in
a followup patch, please?


> 9/15 Checking commit 660b206b990b (pc: Implement -no-hpet as sugar for 
> -machine hpet=on)
> WARNING: Block comments use a leading /* on a separate line
> #53: FILE: hw/i386/pc.c:1152:
> +/* For pc-piix-*, hpet's intcap is always IRQ2. For pc-q35-1.7
>
> WARNING: Block comments should align the * on each line
> #54: FILE: hw/i386/pc.c:1153:
> +/* For pc-piix-*, hpet's intcap is always IRQ2. For pc-q35-1.7
> +* and earlier, use IRQ2 for compat. Otherwise, use IRQ16~23,
>

> 13/15 Checking commit e013e462e230 (vhost-blk: set features before setting 
> inflight feature)
> ERROR: trailing whitespace
> #45: FILE: hw/virtio/vhost.c:1651:
> + $
>
> ERROR: trailing whitespace
> #50: FILE: hw/virtio/vhost.c:1656:
> + $

These all look like nits that should be fixed.

thanks
-- PMM

Re: [PATCH v2 18/19] tcg/aarch64: Implement flush_idcache_range manually

2020-11-01 Thread Richard Henderson

On 10/31/20 6:25 PM, Joelle van Dyne wrote:
> Another thing, for x86 (and maybe other archs), the icache is cache
> coherent but does it apply if we are aliasing the memory address? I
> think in that case, it's like we're doing a DMA right and still need
> to do flushing+invalidating?

No, it is not like dma.  The x86 caches are physically tagged, so virtual
aliasing does not matter.
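The split mapping Joelle describes (one writable view where code is generated, a second view of the same pages where it is executed) can be sketched in user space by mapping one memfd twice. Per Richard's answer, on x86 a write through one view is visible through the other without any flush. `__builtin___clear_cache()` (a GCC/Clang builtin) has no effect on targets that do not need instruction-cache maintenance and performs it on targets that do. This is a minimal sketch assuming Linux with glibc 2.27 or later for `memfd_create()`; the second view is mapped `PROT_READ` rather than `PROT_READ | PROT_EXEC` so it also runs where executable mappings are restricted, and `struct dual_map` and its helpers are hypothetical names, not QEMU's implementation.

```c
#define _GNU_SOURCE             /* for memfd_create() */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Two virtual views of the same physical pages (hypothetical helper type). */
struct dual_map {
    unsigned char *rw;   /* where a JIT would write generated code */
    unsigned char *rx;   /* where it would execute it (PROT_READ here) */
    size_t size;
};

static int dual_map_create(struct dual_map *m, size_t size)
{
    int fd = memfd_create("dual-map-sketch", 0);
    if (fd < 0) {
        return -1;
    }
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return -1;
    }
    void *rw = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void *rx = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mappings keep the memory alive */
    if (rw == MAP_FAILED || rx == MAP_FAILED) {
        if (rw != MAP_FAILED) {
            munmap(rw, size);
        }
        if (rx != MAP_FAILED) {
            munmap(rx, size);
        }
        return -1;
    }
    m->rw = rw;
    m->rx = rx;
    m->size = size;
    return 0;
}

/* Make len bytes written through m->rw at offset off safe to fetch via m->rx. */
static void dual_map_publish(struct dual_map *m, size_t off, size_t len)
{
    assert(off <= m->size && len <= m->size - off);
    /*
     * No effect on x86. Covering both views is a conservative choice in
     * this sketch rather than a statement of what each architecture needs.
     */
    __builtin___clear_cache((char *)m->rw + off, (char *)m->rw + off + len);
    __builtin___clear_cache((char *)m->rx + off, (char *)m->rx + off + len);
}

static void dual_map_destroy(struct dual_map *m)
{
    munmap(m->rw, m->size);
    munmap(m->rx, m->size);
}
```

The two pointers differ, yet a byte stored through `rw` reads back through `rx`, because both virtual ranges are backed by the same physical pages.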


r~

[PATCH v3 0/2] 9pfs: test suite fixes

2020-11-01 Thread Christian Schoenebeck

Fixes test failures with the 9pfs 'local' tests as discussed in the latest
9P PR. See the discussion of v2 of that PR (Fri, Oct 30th) for details.

In conjunction with Peter Xu's two migration patches (fixing occasional
lockups of migration tests), the overall situation appears to be smooth now:
https://lore.kernel.org/qemu-devel/20201030135350.GA588069@xz-x1/

v2->v3:

  - Make the two functions for creating and removing the 9pfs test directory
    public [NEW patch 1].

  - Place the constructor and destructor functions in virtio-9p-test.c, not
    in virtio-9p.c, because the latter location would cause the constructor
    to be executed whenever libqos is loaded, which would break other,
    completely unrelated test suites that just link to libqos [patch 2].

  - Previous patch 2 (coverity fix) is already queued with no changes, hence
    omitted in this v3.

v1->v2:

  - Added Greg's tested-by tag [patch 1].

  - Log an info-level message if mkdir() failed [patch 2].

  - Update commit log message about coverity being the reporter and
    details of the coverity report [patch 2].

Christian Schoenebeck (2):
  tests/9pfs: make create/remove test dir public
  tests/9pfs: fix test dir for parallel tests

 tests/qtest/libqos/virtio-9p.c | 20 ++++++++++----------
 tests/qtest/libqos/virtio-9p.h | 10 ++++++++++
 tests/qtest/virtio-9p-test.c   | 12 ++++++++++++
 3 files changed, 32 insertions(+), 10 deletions(-)

-- 
2.20.1

[PATCH v3 1/2] tests/9pfs: make create/remove test dir public

2020-11-01 Thread Christian Schoenebeck

Make functions create_local_test_dir() and remove_local_test_dir()
public. They're going to be used in the next patch.

Signed-off-by: Christian Schoenebeck 
---
 tests/qtest/libqos/virtio-9p.c | 10 ++++------
 tests/qtest/libqos/virtio-9p.h | 10 ++++++++++
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/tests/qtest/libqos/virtio-9p.c b/tests/qtest/libqos/virtio-9p.c
index d43647b3b7..2736e9ae2a 100644
--- a/tests/qtest/libqos/virtio-9p.c
+++ b/tests/qtest/libqos/virtio-9p.c
@@ -39,8 +39,7 @@ static void init_local_test_path(void)
     g_free(pwd);
 }
 
-/* Creates the directory for the 9pfs 'local' filesystem driver to access. */
-static void create_local_test_dir(void)
+void virtio_9p_create_local_test_dir(void)
 {
     struct stat st;
 
@@ -53,8 +52,7 @@ static void create_local_test_dir(void)
     g_assert((st.st_mode & S_IFMT) == S_IFDIR);
 }
 
-/* Deletes directory previously created by create_local_test_dir(). */
-static void remove_local_test_dir(void)
+void virtio_9p_remove_local_test_dir(void)
 {
     g_assert(local_test_path != NULL);
     char *cmd = g_strdup_printf("rm -r '%s'\n", local_test_path);
@@ -248,8 +246,8 @@ static void virtio_9p_register_nodes(void)
 
     /* make sure test dir for the 'local' tests exists and is clean */
     init_local_test_path();
-    remove_local_test_dir();
-    create_local_test_dir();
+    virtio_9p_remove_local_test_dir();
+    virtio_9p_create_local_test_dir();
 
     QPCIAddress addr = {
         .devfn = QPCI_DEVFN(4, 0),
diff --git a/tests/qtest/libqos/virtio-9p.h b/tests/qtest/libqos/virtio-9p.h
index 19a4d97454..480727120e 100644
--- a/tests/qtest/libqos/virtio-9p.h
+++ b/tests/qtest/libqos/virtio-9p.h
@@ -44,6 +44,16 @@ struct QVirtio9PDevice {
     QVirtio9P v9p;
 };
 
+/**
+ * Creates the directory for the 9pfs 'local' filesystem driver to access.
+ */
+void virtio_9p_create_local_test_dir(void);
+
+/**
+ * Deletes directory previously created by virtio_9p_create_local_test_dir().
+ */
+void virtio_9p_remove_local_test_dir(void);
+
 /**
  * Prepares QEMU command line for 9pfs tests using the 'local' fs driver.
  */
-- 
2.20.1

[PATCH v3 2/2] tests/9pfs: fix test dir for parallel tests

2020-11-01 Thread Christian Schoenebeck

Use mkdtemp() to generate a unique directory for the 9p 'local' tests.

This fixes occasional 9p test failures when running 'make check -jN' if
QEMU was compiled for multiple target architectures, because the individual
architectures' test suites would run in parallel and interfere with each
other's data, as the test directory was previously hard-coded and hence the
same directory was used by all of them simultaneously.

This also requires a change in how the test directory is created and deleted:
As the test path is now randomized and virtio_9p_register_nodes() is
called in a somewhat nondeterministic way, that's no longer an appropriate
place to create and remove the test directory. Use a constructor and
destructor function for creating and removing the test directory instead.
Unfortunately libqos currently does not support setup/teardown callbacks
to handle this more cleanly.

The constructor function needs to be in virtio-9p-test.c, not in
virtio-9p.c, because in the latter location it would cause all apps that
link to libqos (i.e. entirely unrelated test suites) to create a 9pfs
test directory as well, which would even break other test suites.

Signed-off-by: Christian Schoenebeck 
---
 tests/qtest/libqos/virtio-9p.c | 14 ++++++++------
 tests/qtest/virtio-9p-test.c   | 12 ++++++++++++
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/tests/qtest/libqos/virtio-9p.c b/tests/qtest/libqos/virtio-9p.c
index 2736e9ae2a..586e700b24 100644
--- a/tests/qtest/libqos/virtio-9p.c
+++ b/tests/qtest/libqos/virtio-9p.c
@@ -35,7 +35,12 @@ static char *concat_path(const char* a, const char* b)
 static void init_local_test_path(void)
 {
     char *pwd = g_get_current_dir();
-    local_test_path = concat_path(pwd, "qtest-9p-local");
+    char *template = concat_path(pwd, "qtest-9p-local-XXXXXX");
+    local_test_path = mkdtemp(template);
+    if (!local_test_path) {
+        g_test_message("mkdtemp('%s') failed: %s", template, strerror(errno));
+    }
+    g_assert(local_test_path);
     g_free(pwd);
 }
 
@@ -43,6 +48,8 @@ void virtio_9p_create_local_test_dir(void)
 {
     struct stat st;
 
+    init_local_test_path();
+
     g_assert(local_test_path != NULL);
     mkdir(local_test_path, 0777);
 
@@ -244,11 +251,6 @@ static void virtio_9p_register_nodes(void)
     const char *str_simple = "fsdev=fsdev0,mount_tag=" MOUNT_TAG;
     const char *str_addr = "fsdev=fsdev0,addr=04.0,mount_tag=" MOUNT_TAG;
 
-    /* make sure test dir for the 'local' tests exists and is clean */
-    init_local_test_path();
-    virtio_9p_remove_local_test_dir();
-    virtio_9p_create_local_test_dir();
-
     QPCIAddress addr = {
         .devfn = QPCI_DEVFN(4, 0),
     };
diff --git a/tests/qtest/virtio-9p-test.c b/tests/qtest/virtio-9p-test.c
index c15908f27b..6401d4f564 100644
--- a/tests/qtest/virtio-9p-test.c
+++ b/tests/qtest/virtio-9p-test.c
@@ -1076,3 +1076,15 @@ static void register_virtio_9p_test(void)
 }
 
 libqos_init(register_virtio_9p_test);
+
+static void __attribute__((constructor)) construct_9p_test(void)
+{
+    /* make sure test dir for the 'local' tests exists */
+    virtio_9p_create_local_test_dir();
+}
+
+static void __attribute__((destructor)) destruct_9p_test(void)
+{
+    /* remove previously created test dir when test suite completed */
+    virtio_9p_remove_local_test_dir();
+}
-- 
2.20.1

[PATCH v2 0/2] Assorted fixes to tests that were broken by recent scsi changes

2020-11-01 Thread Maxim Levitsky

While most of the patches in V1 of this series are already merged upstream,
the patch that fixes iotest 240 was broken on s390 and was not accepted.

This is an updated version of that patch, based on Paolo's suggestion,
that hopefully makes this iotest work on both x86 and s390.

Best regards,
Maxim Levitsky

Maxim Levitsky (2):
  iotests: add filter_qmp_virtio_scsi function
  iotests: rewrite iotest 240 in python

 tests/qemu-iotests/240| 228 +++---
 tests/qemu-iotests/240.out|  76 +++-
 tests/qemu-iotests/iotests.py |  10 ++
 3 files changed, 153 insertions(+), 161 deletions(-)

-- 
2.26.2

[PATCH v2 1/2] iotests: add filter_qmp_virtio_scsi function

2020-11-01 Thread Maxim Levitsky

filter_qmp_virtio_scsi can be used to filter virtio-scsi-pci/ccw differences.
Note that this patch was only tested on x86.

Suggested-by: Paolo Bonzini 
Signed-off-by: Maxim Levitsky 
---
 tests/qemu-iotests/iotests.py | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 63d2ace93c..18b7437600 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -392,6 +392,16 @@ def filter_qmp_testfiles(qmsg):
         return value
     return filter_qmp(qmsg, _filter)
 
+def filter_virtio_scsi(output: str) -> str:
+    return re.sub(r'(virtio-scsi)-(ccw|pci)', r'\1', output)
+
+def filter_qmp_virtio_scsi(qmsg):
+    def _filter(_key, value):
+        if is_str(value):
+            return filter_virtio_scsi(value)
+        return value
+    return filter_qmp(qmsg, _filter)
+
 def filter_generated_node_ids(msg):
     return re.sub("#block[0-9]+", "NODE_NAME", msg)
 
-- 
2.26.2

[PATCH v2 2/2] iotests: rewrite iotest 240 in python

2020-11-01 Thread Maxim Levitsky

The recent changes that introduced RCU-delayed device deletion
broke a few tests, and this test breakage went unnoticed.

Fix this test by rewriting it in Python
(which allows waiting for DEVICE_DELETED events before continuing).

Signed-off-by: Maxim Levitsky 
---
 tests/qemu-iotests/240 | 228 -
 tests/qemu-iotests/240.out |  76 -
 2 files changed, 143 insertions(+), 161 deletions(-)

diff --git a/tests/qemu-iotests/240 b/tests/qemu-iotests/240
index 8b4337b58d..bfc9b72f36 100755
--- a/tests/qemu-iotests/240
+++ b/tests/qemu-iotests/240
@@ -1,5 +1,5 @@
-#!/usr/bin/env bash
-#
+#!/usr/bin/env python3
+
 # Test hot plugging and unplugging with iothreads
 #
 # Copyright (C) 2019 Igalia, S.L.
@@ -17,133 +17,99 @@
 #
 # You should have received a copy of the GNU General Public License
 # along with this program.  If not, see .
-#
 
-# creator
-owner=be...@igalia.com
-
-seq=`basename $0`
-echo "QA output created by $seq"
-
-status=1   # failure is the default!
-
-_cleanup()
-{
-    rm -f "$SOCK_DIR/nbd"
-}
-trap "_cleanup; exit \$status" 0 1 2 3 15
-
-# get standard environment, filters and checks
-. ./common.rc
-. ./common.filter
-
-_supported_fmt generic
-_supported_proto generic
-
-do_run_qemu()
-{
-    echo Testing: "$@"
-    $QEMU -nographic -qmp stdio -serial none "$@"
-    echo
-}
-
-# Remove QMP events from (pretty-printed) output. Doesn't handle
-# nested dicts correctly, but we don't get any of those in this test.
-_filter_qmp_events()
-{
-    tr '\n' '\t' | sed -e \
-       's/{\s*"timestamp":\s*{[^}]*},\s*"event":[^,}]*\(,\s*"data":\s*{[^}]*}\)\?\s*}\s*//g' \
-       | tr '\t' '\n'
-}
-
-run_qemu()
-{
-    do_run_qemu "$@" 2>&1 | _filter_qmp | _filter_qmp_events
-}
-
-case "$QEMU_DEFAULT_MACHINE" in
-  s390-ccw-virtio)
-      virtio_scsi=virtio-scsi-ccw
-      ;;
-  *)
-      virtio_scsi=virtio-scsi-pci
-      ;;
-esac
-
-echo
-echo === Unplug a SCSI disk and then plug it again ===
-echo
-
-run_qemu <

Re: [PULL 00/18] riscv-to-apply queue

2020-11-01 Thread Bin Meng

On Sun, Nov 1, 2020 at 10:02 PM Peter Maydell  wrote:
>
> On Thu, 29 Oct 2020 at 14:25, Alistair Francis  
> wrote:
> >
> > The following changes since commit c0444009147aa935d52d5acfc6b70094bb42b0dd:
> >
> >   Merge remote-tracking branch 'remotes/armbru/tags/pull-qmp-2020-10-27' into staging (2020-10-29 10:03:32 +0000)
> >
> > are available in the Git repository at:
> >
> >   g...@github.com:alistair23/qemu.git tags/pull-riscv-to-apply-20201029
> >
> > for you to fetch changes up to e041badcd4ac644a67f02f8765095a5ff7a24d47:
> >
> >   hw/riscv: microchip_pfsoc: Hook the I2C1 controller (2020-10-29 07:11:14 -0700)
> >
> > 
> > This series adds support for migration to RISC-V QEMU and expands the
> > Microchip PFSoC to allow unmodified HSS and Linux boots.
> >
> > 
>
> Hi; this fails 'make check' on 32-bit hosts:

Oops, I don't have 32-bit hosts to test :(

>
> qemu-system-riscv64: at most 2047 MB RAM can be simulated
> Broken pipe
> ../../tests/qtest/libqtest.c:167: kill_qemu() tried to terminate QEMU
> process but encountered exit status 1 (expected 0)
> ERROR qtest-riscv64/qom-test - too few tests run (expected 6, got 3)
>
> and
>
> qemu-system-riscv64: at most 2047 MB RAM can be simulated
> Broken pipe
> ../../tests/qtest/libqtest.c:167: kill_qemu() tried to terminate QEMU
> process but encountered exit status 1 (expected 0)
> ERROR qtest-riscv64/test-hmp - too few tests run (expected 7, got 3)
>

But I think this is caused by the following commit:
https://github.com/alistair23/qemu/commit/8c47c1e9df850a928b4b230240a950feabe6152f

I will send a new version of this patch soon.

Regards,
Bin

[PATCH v3] hw/riscv: microchip_pfsoc: Correct DDR memory map

2020-11-01 Thread Bin Meng

From: Bin Meng 

When system memory is larger than 1 GiB (high memory), PolarFire SoC
maps it at address 0x10_0000_0000. Address 0xC000_0000 and above is
aliased to the same 1 GiB low memory with different cache attributes.

At present QEMU maps the system memory contiguously from 0x8000_0000.
This corrects the wrong QEMU logic. Note that address 0x14_0000_0000 is
the alias to the high memory, and even when physical memory is only 1 GiB,
the HSS code still tries to probe the high memory alias address.
It seems there is no issue on the real hardware, so we will have to
take that into consideration in our emulation. Due to this, we
increase the default system memory size to 2047 MiB (the largest
RAM size allowed when running on a 32-bit host) so that the user is
notified with an error when less than 2047 MiB is specified.
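The resulting layout can be modelled in a few lines: the first 1 GiB of RAM appears at 0x8000_0000 and again at 0xC000_0000, and the remainder (ram_size - 1 GiB in the patch) appears at 0x10_0000_0000 and again at 0x14_0000_0000. This is a hypothetical model, not QEMU code: it treats guest RAM as low memory followed by high memory and maps a guest physical address to an offset in that sequence. It assumes only that ram_size must exceed 1 GiB so the high region is non-empty; the patch's actual sanity check is not shown in this excerpt.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (UINT64_C(1) << 20)
#define GiB (UINT64_C(1) << 30)

/* DRAM windows from the patch's microchip_pfsoc_memmap. */
#define DRAM_LO_BASE        UINT64_C(0x80000000)
#define DRAM_LO_ALIAS_BASE  UINT64_C(0xc0000000)
#define DRAM_LO_SIZE        (1 * GiB)
#define DRAM_HI_BASE        UINT64_C(0x1000000000)
#define DRAM_HI_ALIAS_BASE  UINT64_C(0x1400000000)

/* New default from the commit message: the most a 32-bit host can simulate. */
#define DEFAULT_RAM_SIZE    (2047 * MiB)

struct ram_layout {
    uint64_t lo_size;   /* visible at DRAM_LO_BASE and DRAM_LO_ALIAS_BASE */
    uint64_t hi_size;   /* visible at DRAM_HI_BASE and DRAM_HI_ALIAS_BASE */
};

/* Split ram_size like the patch: 1 GiB low, the rest high (assumed > 0). */
static int pfsoc_ram_layout(uint64_t ram_size, struct ram_layout *l)
{
    if (ram_size <= DRAM_LO_SIZE) {
        return -1;
    }
    l->lo_size = DRAM_LO_SIZE;
    l->hi_size = ram_size - DRAM_LO_SIZE;
    return 0;
}

/* Is addr inside [base, base + size)? Written to avoid overflow. */
static int in_window(uint64_t addr, uint64_t base, uint64_t size)
{
    return addr >= base && addr - base < size;
}

/* Offset of addr in "low RAM followed by high RAM", or -1 if unmapped. */
static int64_t pfsoc_dram_offset(const struct ram_layout *l, uint64_t addr)
{
    assert(l->lo_size == DRAM_LO_SIZE);
    if (in_window(addr, DRAM_LO_BASE, l->lo_size)) {
        return (int64_t)(addr - DRAM_LO_BASE);
    }
    if (in_window(addr, DRAM_LO_ALIAS_BASE, l->lo_size)) {
        return (int64_t)(addr - DRAM_LO_ALIAS_BASE);
    }
    if (in_window(addr, DRAM_HI_BASE, l->hi_size)) {
        return (int64_t)(l->lo_size + (addr - DRAM_HI_BASE));
    }
    if (in_window(addr, DRAM_HI_ALIAS_BASE, l->hi_size)) {
        return (int64_t)(l->lo_size + (addr - DRAM_HI_ALIAS_BASE));
    }
    return -1;
}
```

With the 2047 MiB default, the high region is 1023 MiB, so its last address is just below 0x10_0000_0000 + 1023 MiB; anything beyond that, or below 0x8000_0000, is unmapped in this model.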

Signed-off-by: Bin Meng 

---
This patch should replace the following commit in Alistair's
riscv-to-apply.next tree:
https://github.com/alistair23/qemu/commit/8c47c1e9df850a928b4b230240a950feabe6152f

Changes in v3:
- Change default ram size to 2047 MiB for 32-bit host

 hw/riscv/microchip_pfsoc.c | 48 ++
 include/hw/riscv/microchip_pfsoc.h |  5 +++-
 2 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/hw/riscv/microchip_pfsoc.c b/hw/riscv/microchip_pfsoc.c
index 44a84732ac..0bc17b3955 100644
--- a/hw/riscv/microchip_pfsoc.c
+++ b/hw/riscv/microchip_pfsoc.c
@@ -121,7 +121,10 @@ static const struct MemmapEntry {
     [MICROCHIP_PFSOC_ENVM_CFG] =        { 0x20200000,     0x1000 },
     [MICROCHIP_PFSOC_ENVM_DATA] =       { 0x20220000,    0x20000 },
     [MICROCHIP_PFSOC_IOSCB] =           { 0x30000000, 0x10000000 },
-    [MICROCHIP_PFSOC_DRAM] =            { 0x80000000,        0x0 },
+    [MICROCHIP_PFSOC_DRAM_LO] =         { 0x80000000, 0x40000000 },
+    [MICROCHIP_PFSOC_DRAM_LO_ALIAS] =   { 0xc0000000, 0x40000000 },
+    [MICROCHIP_PFSOC_DRAM_HI] =       { 0x1000000000,        0x0 },
+    [MICROCHIP_PFSOC_DRAM_HI_ALIAS] = { 0x1400000000,        0x0 },
 };
 
 static void microchip_pfsoc_soc_instance_init(Object *obj)
@@ -424,7 +427,11 @@ static void microchip_icicle_kit_machine_init(MachineState *machine)
     const struct MemmapEntry *memmap = microchip_pfsoc_memmap;
     MicrochipIcicleKitState *s = MICROCHIP_ICICLE_KIT_MACHINE(machine);
     MemoryRegion *system_memory = get_system_memory();
-    MemoryRegion *main_mem = g_new(MemoryRegion, 1);
+    MemoryRegion *mem_low = g_new(MemoryRegion, 1);
+    MemoryRegion *mem_low_alias = g_new(MemoryRegion, 1);
+    MemoryRegion *mem_high = g_new(MemoryRegion, 1);
+    MemoryRegion *mem_high_alias = g_new(MemoryRegion, 1);
+    uint64_t mem_high_size;
     DriveInfo *dinfo = drive_get_next(IF_SD);
 
     /* Sanity check on RAM size */
@@ -441,10 +448,39 @@ static void microchip_icicle_kit_machine_init(MachineState *machine)
     qdev_realize(DEVICE(&s->soc), NULL, &error_abort);
 
     /* Register RAM */
-    memory_region_init_ram(main_mem, NULL, "microchip.icicle.kit.ram",
-                           machine->ram_size, &error_fatal);
+    memory_region_init_ram(mem_low, NULL, "microchip.icicle.kit.ram_low",
+                           memmap[MICROCHIP_PFSOC_DRAM_LO].size,
+                           &error_fatal);
+    memory_region_init_alias(mem_low_alias, NULL,
+                             "microchip.icicle.kit.ram_low.alias",
+                             mem_low, 0,
+                             memmap[MICROCHIP_PFSOC_DRAM_LO_ALIAS].size);
+    memory_region_add_subregion(system_memory,
+                                memmap[MICROCHIP_PFSOC_DRAM_LO].base,
+                                mem_low);
+    memory_region_add_subregion(system_memory,
+                                memmap[MICROCHIP_PFSOC_DRAM_LO_ALIAS].base,
+                                mem_low_alias);
+
+    /*
+     * Map 1 GiB high memory because HSS will do memory test against the high
+     * memory address range regardless of physical memory installed.
+     *
+     * See memory_tests() in mss_ddr.c in the HSS source code.
+     */
+    mem_high_size = machine->ram_size - 1 * GiB;
+
+    memory_region_init_ram(mem_high, NULL, "microchip.icicle.kit.ram_high",
+                           mem_high_size, &error_fatal);
+    memory_region_init_alias(mem_high_alias, NULL,
+                             "microchip.icicle.kit.ram_high.alias",
+                             mem_high, 0, mem_high_size);
+    memory_region_add_subregion(system_memory,
+                                memmap[MICROCHIP_PFSOC_DRAM_HI].base,
+                                mem_high);
     memory_region_add_subregion(system_memory,
-                                memmap[MICROCHIP_PFSOC_DRAM].base, main_mem);
+                                memmap[MICROCHIP_PFSOC_DRAM_HI_ALIAS].base,
+                                mem_high_alias);
 
     /* Load the firmware */
     riscv_find_and_load_firmware(machine, BIOS_FILENAME, RESET_VECTOR, NULL);
@@ -470,7 +506

Re: [PATCH v3] hw/riscv: microchip_pfsoc: Correct DDR memory map

2020-11-01 Thread Alistair Francis

On Sun, Nov 1, 2020 at 8:42 AM Bin Meng  wrote:
>
> From: Bin Meng 
>
> When system memory is larger than 1 GiB (high memory), PolarFire SoC
> maps it at address 0x10__. Address 0xC000_ and above is
> aliased to the same 1 GiB low memory with different cache attributes.
>
> At present QEMU maps the system memory contiguously from 0x8000_.
> This corrects the wrong QEMU logic. Note that address 0x14__ is
> the alias of the high memory, and even if physical memory is only 1 GiB,
> the HSS code still tries to probe the high memory alias address.
> This appears to cause no issue on real hardware, so we have to
> take it into consideration in our emulation. Because of this, we
> increase the default system memory size to 2047 MiB (the largest
> RAM size allowed when running on a 32-bit host) so that the user
> is notified with an error when less than 2047 MiB is specified.

Is this better than just not supporting 32-bit hosts? Or could we make
this number even lower (as low as possible that still works with HSS)?

Alistair

>
> Signed-off-by: Bin Meng 
>
> ---
> This patch should replace the following commit in Alistair's
> riscv-to-apply.next tree:
> https://github.com/alistair23/qemu/commit/8c47c1e9df850a928b4b230240a950feabe6152f
>
> Changes in v3:
> - Change default ram size to 2047 MiB for 32-bit host
>
>  hw/riscv/microchip_pfsoc.c | 48 ++
>  include/hw/riscv/microchip_pfsoc.h |  5 +++-
>  2 files changed, 46 insertions(+), 7 deletions(-)
>
> diff --git a/hw/riscv/microchip_pfsoc.c b/hw/riscv/microchip_pfsoc.c
> index 44a84732ac..0bc17b3955 100644
> --- a/hw/riscv/microchip_pfsoc.c
> +++ b/hw/riscv/microchip_pfsoc.c
> @@ -121,7 +121,10 @@ static const struct MemmapEntry {
>  [MICROCHIP_PFSOC_ENVM_CFG] ={ 0x2020, 0x1000 },
>  [MICROCHIP_PFSOC_ENVM_DATA] =   { 0x2022,0x2 },
>  [MICROCHIP_PFSOC_IOSCB] =   { 0x3000, 0x1000 },
> -[MICROCHIP_PFSOC_DRAM] ={ 0x8000,0x0 },
> +[MICROCHIP_PFSOC_DRAM_LO] = { 0x8000, 0x4000 },
> +[MICROCHIP_PFSOC_DRAM_LO_ALIAS] =   { 0xc000, 0x4000 },
> +[MICROCHIP_PFSOC_DRAM_HI] =   { 0x10,0x0 },
> +[MICROCHIP_PFSOC_DRAM_HI_ALIAS] = { 0x14,0x0 },
>  };
>
>  static void microchip_pfsoc_soc_instance_init(Object *obj)
> @@ -424,7 +427,11 @@ static void microchip_icicle_kit_machine_init(MachineState *machine)
>  const struct MemmapEntry *memmap = microchip_pfsoc_memmap;
>  MicrochipIcicleKitState *s = MICROCHIP_ICICLE_KIT_MACHINE(machine);
>  MemoryRegion *system_memory = get_system_memory();
> -MemoryRegion *main_mem = g_new(MemoryRegion, 1);
> +MemoryRegion *mem_low = g_new(MemoryRegion, 1);
> +MemoryRegion *mem_low_alias = g_new(MemoryRegion, 1);
> +MemoryRegion *mem_high = g_new(MemoryRegion, 1);
> +MemoryRegion *mem_high_alias = g_new(MemoryRegion, 1);
> +uint64_t mem_high_size;
>  DriveInfo *dinfo = drive_get_next(IF_SD);
>
>  /* Sanity check on RAM size */
> @@ -441,10 +448,39 @@ static void microchip_icicle_kit_machine_init(MachineState *machine)
>  qdev_realize(DEVICE(&s->soc), NULL, &error_abort);
>
>  /* Register RAM */
> -memory_region_init_ram(main_mem, NULL, "microchip.icicle.kit.ram",
> -   machine->ram_size, &error_fatal);
> +memory_region_init_ram(mem_low, NULL, "microchip.icicle.kit.ram_low",
> +   memmap[MICROCHIP_PFSOC_DRAM_LO].size,
> +   &error_fatal);
> +memory_region_init_alias(mem_low_alias, NULL,
> + "microchip.icicle.kit.ram_low.alias",
> + mem_low, 0,
> + memmap[MICROCHIP_PFSOC_DRAM_LO_ALIAS].size);
> +memory_region_add_subregion(system_memory,
> +memmap[MICROCHIP_PFSOC_DRAM_LO].base,
> +mem_low);
> +memory_region_add_subregion(system_memory,
> +memmap[MICROCHIP_PFSOC_DRAM_LO_ALIAS].base,
> +mem_low_alias);
> +
> +/*
> + * Map 1 GiB high memory because HSS will do memory test against the high
> + * memory address range regardless of physical memory installed.
> + *
> + * See memory_tests() in mss_ddr.c in the HSS source code.
> + */
> +mem_high_size = machine->ram_size - 1 * GiB;
> +
> +memory_region_init_ram(mem_high, NULL, "microchip.icicle.kit.ram_high",
> +   mem_high_size, &error_fatal);
> +memory_region_init_alias(mem_high_alias, NULL,
> + "microchip.icicle.kit.ram_high.alias",
> + mem_high, 0, mem_high_size);
> +memory_region_add_subregion(system_memory,
> +memmap[MICROCHIP_PFSOC_DRAM_HI].base,
> +mem_

Re: [PATCH v3] hw/riscv: microchip_pfsoc: Correct DDR memory map

2020-11-01 Thread Bin Meng

Hi Alistair,

On Mon, Nov 2, 2020 at 12:46 AM Alistair Francis  wrote:
>
> On Sun, Nov 1, 2020 at 8:42 AM Bin Meng  wrote:
> >
> > From: Bin Meng 
> >
> > When system memory is larger than 1 GiB (high memory), PolarFire SoC
> > maps it at address 0x10__. Address 0xC000_ and above is
> > aliased to the same 1 GiB low memory with different cache attributes.
> >
> > At present QEMU maps the system memory contiguously from 0x8000_.
> > This corrects the wrong QEMU logic. Note that address 0x14__ is
> > the alias of the high memory, and even if physical memory is only 1 GiB,
> > the HSS code still tries to probe the high memory alias address.
> > This appears to cause no issue on real hardware, so we have to
> > take it into consideration in our emulation. Because of this, we
> > increase the default system memory size to 2047 MiB (the largest
> > RAM size allowed when running on a 32-bit host) so that the user
> > is notified with an error when less than 2047 MiB is specified.
>
> Is this better than just not supporting 32-bit hosts? Or could we make

I am not sure whether we have a general rule about discontinuing 32-bit
host support, i.e. deprecating 32-bit hosts at some point?

> this number even lower (as low as possible that still works with HSS)?
>

Sure, I will figure this out and set this number to meet the minimum
requirement of HSS.

Regards,
Bin

Re: [PATCH v3] hw/riscv: microchip_pfsoc: Correct DDR memory map

2020-11-01 Thread Alistair Francis

On Sun, Nov 1, 2020 at 8:51 AM Bin Meng  wrote:
>
> Hi Alistair,
>
> On Mon, Nov 2, 2020 at 12:46 AM Alistair Francis  wrote:
> >
> > On Sun, Nov 1, 2020 at 8:42 AM Bin Meng  wrote:
> > >
> > > From: Bin Meng 
> > >
> > > When system memory is larger than 1 GiB (high memory), PolarFire SoC
> > > maps it at address 0x10__. Address 0xC000_ and above is
> > > aliased to the same 1 GiB low memory with different cache attributes.
> > >
> > > At present QEMU maps the system memory contiguously from 0x8000_.
> > > This corrects the wrong QEMU logic. Note that address 0x14__ is
> > > the alias of the high memory, and even if physical memory is only 1 GiB,
> > > the HSS code still tries to probe the high memory alias address.
> > > This appears to cause no issue on real hardware, so we have to
> > > take it into consideration in our emulation. Because of this, we
> > > increase the default system memory size to 2047 MiB (the largest
> > > RAM size allowed when running on a 32-bit host) so that the user
> > > is notified with an error when less than 2047 MiB is specified.
> >
> > Is this better than just not supporting 32-bit hosts? Or could we make
>
> I am not sure whether we have a general rule about discontinuing 32-bit
> host support, i.e. deprecating 32-bit hosts at some point?
>
> > this number even lower (as low as possible that still works with HSS)?
> >
>
> Sure, I will figure this out and set this number to meet the minimum
> requirement of HSS.

Thanks, that's probably the best bet for both 32- and 64-bit hosts then.

Alistair

>
> Regards,
> Bin

Re: [PATCH v3 1/2] tests/9pfs: make create/remove test dir public

2020-11-01 Thread Greg Kurz

On Sun, 1 Nov 2020 15:25:14 +0100
Christian Schoenebeck  wrote:

> Make functions create_local_test_dir() and remove_local_test_dir()
> public. They're going to be used in the next patch.
> 
> Signed-off-by: Christian Schoenebeck 
> ---

Reviewed-by: Greg Kurz 

>  tests/qtest/libqos/virtio-9p.c | 10 --
>  tests/qtest/libqos/virtio-9p.h | 10 ++
>  2 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/tests/qtest/libqos/virtio-9p.c b/tests/qtest/libqos/virtio-9p.c
> index d43647b3b7..2736e9ae2a 100644
> --- a/tests/qtest/libqos/virtio-9p.c
> +++ b/tests/qtest/libqos/virtio-9p.c
> @@ -39,8 +39,7 @@ static void init_local_test_path(void)
>  g_free(pwd);
>  }
>  
> -/* Creates the directory for the 9pfs 'local' filesystem driver to access. */
> -static void create_local_test_dir(void)
> +void virtio_9p_create_local_test_dir(void)
>  {
>  struct stat st;
>  
> @@ -53,8 +52,7 @@ static void create_local_test_dir(void)
>  g_assert((st.st_mode & S_IFMT) == S_IFDIR);
>  }
>  
> -/* Deletes directory previously created by create_local_test_dir(). */
> -static void remove_local_test_dir(void)
> +void virtio_9p_remove_local_test_dir(void)
>  {
>  g_assert(local_test_path != NULL);
>  char *cmd = g_strdup_printf("rm -r '%s'\n", local_test_path);
> @@ -248,8 +246,8 @@ static void virtio_9p_register_nodes(void)
>  
>  /* make sure test dir for the 'local' tests exists and is clean */
>  init_local_test_path();
> -remove_local_test_dir();
> -create_local_test_dir();
> +virtio_9p_remove_local_test_dir();
> +virtio_9p_create_local_test_dir();
>  
>  QPCIAddress addr = {
>  .devfn = QPCI_DEVFN(4, 0),
> diff --git a/tests/qtest/libqos/virtio-9p.h b/tests/qtest/libqos/virtio-9p.h
> index 19a4d97454..480727120e 100644
> --- a/tests/qtest/libqos/virtio-9p.h
> +++ b/tests/qtest/libqos/virtio-9p.h
> @@ -44,6 +44,16 @@ struct QVirtio9PDevice {
>  QVirtio9P v9p;
>  };
>  
> +/**
> + * Creates the directory for the 9pfs 'local' filesystem driver to access.
> + */
> +void virtio_9p_create_local_test_dir(void);
> +
> +/**
> + * Deletes directory previously created by virtio_9p_create_local_test_dir().
> + */
> +void virtio_9p_remove_local_test_dir(void);
> +
>  /**
>   * Prepares QEMU command line for 9pfs tests using the 'local' fs driver.
>   */

[PATCH v4] hw/riscv: microchip_pfsoc: Correct DDR memory map

2020-11-01 Thread Bin Meng

From: Bin Meng 

When system memory is larger than 1 GiB (high memory), PolarFire SoC
maps it at address 0x10__. Address 0xC000_ and above is
aliased to the same 1 GiB low memory with different cache attributes.

At present QEMU maps the system memory contiguously from 0x8000_.
This corrects the wrong QEMU logic. Note that address 0x14__ is
the alias of the high memory, and even if physical memory is only 1 GiB,
the HSS code still tries to probe the high memory alias address.
This appears to cause no issue on real hardware, so we have to
take it into consideration in our emulation. Because of this, we
increase the default system memory size to 1537 MiB (the minimum
high memory size required by HSS) so that the user is notified with
an error when less than 1537 MiB is specified.
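To illustrate the aliasing described above outside of QEMU, here is a minimal standalone C sketch (not QEMU code and not the QEMU memory API): a single backing buffer is exposed through two guest-physical windows, roughly the way mem_low_alias points at mem_low at offset 0. The RamWindow type, the ram_window_translate() function and the toy addresses are hypothetical, and the sketch ignores the cache-attribute difference between the two windows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one guest-physical window onto host RAM. */
typedef struct {
    uint64_t base;    /* guest-physical start address of the window */
    uint64_t size;    /* window length in bytes */
    uint8_t *backing; /* host storage; an alias shares the same pointer */
} RamWindow;

/* Translate a guest-physical address to a host pointer, or NULL if the
 * address falls outside the window. */
static uint8_t *ram_window_translate(const RamWindow *w, uint64_t addr)
{
    assert(w != NULL && w->backing != NULL);
    if (addr < w->base || addr - w->base >= w->size) {
        return NULL;
    }
    return w->backing + (addr - w->base);
}
```

With low = { 0x100, 16, buf } and alias = { 0x200, 16, buf }, a byte written through low at 0x104 reads back through alias at 0x204, because both windows resolve to buf + 4.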

Signed-off-by: Bin Meng 

---
This patch should replace the following commit in Alistair's
riscv-to-apply.next tree:
https://github.com/alistair23/qemu/commit/8c47c1e9df850a928b4b230240a950feabe6152f

Changes in v4:
- Change default ram size to 1537 MiB which is the minimum required
  high memory size to satisfy HSS

Changes in v3:
- Change default ram size to 2047 MiB for 32-bit host

 hw/riscv/microchip_pfsoc.c | 50 ++
 include/hw/riscv/microchip_pfsoc.h |  5 ++-
 2 files changed, 48 insertions(+), 7 deletions(-)

diff --git a/hw/riscv/microchip_pfsoc.c b/hw/riscv/microchip_pfsoc.c
index 44a84732ac..96cb8b983a 100644
--- a/hw/riscv/microchip_pfsoc.c
+++ b/hw/riscv/microchip_pfsoc.c
@@ -121,7 +121,10 @@ static const struct MemmapEntry {
 [MICROCHIP_PFSOC_ENVM_CFG] ={ 0x2020, 0x1000 },
 [MICROCHIP_PFSOC_ENVM_DATA] =   { 0x2022,0x2 },
 [MICROCHIP_PFSOC_IOSCB] =   { 0x3000, 0x1000 },
-[MICROCHIP_PFSOC_DRAM] ={ 0x8000,0x0 },
+[MICROCHIP_PFSOC_DRAM_LO] = { 0x8000, 0x4000 },
+[MICROCHIP_PFSOC_DRAM_LO_ALIAS] =   { 0xc000, 0x4000 },
+[MICROCHIP_PFSOC_DRAM_HI] =   { 0x10,0x0 },
+[MICROCHIP_PFSOC_DRAM_HI_ALIAS] = { 0x14,0x0 },
 };
 
 static void microchip_pfsoc_soc_instance_init(Object *obj)
@@ -424,7 +427,11 @@ static void microchip_icicle_kit_machine_init(MachineState *machine)
 const struct MemmapEntry *memmap = microchip_pfsoc_memmap;
 MicrochipIcicleKitState *s = MICROCHIP_ICICLE_KIT_MACHINE(machine);
 MemoryRegion *system_memory = get_system_memory();
-MemoryRegion *main_mem = g_new(MemoryRegion, 1);
+MemoryRegion *mem_low = g_new(MemoryRegion, 1);
+MemoryRegion *mem_low_alias = g_new(MemoryRegion, 1);
+MemoryRegion *mem_high = g_new(MemoryRegion, 1);
+MemoryRegion *mem_high_alias = g_new(MemoryRegion, 1);
+uint64_t mem_high_size;
 DriveInfo *dinfo = drive_get_next(IF_SD);
 
 /* Sanity check on RAM size */
@@ -441,10 +448,33 @@ static void microchip_icicle_kit_machine_init(MachineState *machine)
 qdev_realize(DEVICE(&s->soc), NULL, &error_abort);
 
 /* Register RAM */
-memory_region_init_ram(main_mem, NULL, "microchip.icicle.kit.ram",
-   machine->ram_size, &error_fatal);
+memory_region_init_ram(mem_low, NULL, "microchip.icicle.kit.ram_low",
+   memmap[MICROCHIP_PFSOC_DRAM_LO].size,
+   &error_fatal);
+memory_region_init_alias(mem_low_alias, NULL,
+ "microchip.icicle.kit.ram_low.alias",
+ mem_low, 0,
+ memmap[MICROCHIP_PFSOC_DRAM_LO_ALIAS].size);
+memory_region_add_subregion(system_memory,
+memmap[MICROCHIP_PFSOC_DRAM_LO].base,
+mem_low);
 memory_region_add_subregion(system_memory,
-memmap[MICROCHIP_PFSOC_DRAM].base, main_mem);
+memmap[MICROCHIP_PFSOC_DRAM_LO_ALIAS].base,
+mem_low_alias);
+
+mem_high_size = machine->ram_size - 1 * GiB;
+
+memory_region_init_ram(mem_high, NULL, "microchip.icicle.kit.ram_high",
+   mem_high_size, &error_fatal);
+memory_region_init_alias(mem_high_alias, NULL,
+ "microchip.icicle.kit.ram_high.alias",
+ mem_high, 0, mem_high_size);
+memory_region_add_subregion(system_memory,
+memmap[MICROCHIP_PFSOC_DRAM_HI].base,
+mem_high);
+memory_region_add_subregion(system_memory,
+memmap[MICROCHIP_PFSOC_DRAM_HI_ALIAS].base,
+mem_high_alias);
 
 /* Load the firmware */
 riscv_find_and_load_firmware(machine, BIOS_FILENAME, RESET_VECTOR, NULL);
@@ -470,7 +500,15 @@ static void microchip_icicle_kit_machine_class_init(ObjectClass *oc, void *data)
MICROCHIP_PFSOC_COMPUTE_CP

Re: [PATCH v3 2/2] tests/9pfs: fix test dir for parallel tests

2020-11-01 Thread Greg Kurz

On Sun, 1 Nov 2020 15:37:12 +0100
Christian Schoenebeck  wrote:

> Use mkdtemp() to generate a unique directory for the 9p 'local' tests.
> 
> This fixes occasional 9p test failures when running 'make check -jN' if
> QEMU was compiled for multiple target architectures, because the individual
> architectures' test suites would run in parallel and interfere with each
> other's data: the test directory was previously hard-coded, so the same
> directory was used by all of them simultaneously.
> 
> This also requires a change in how the test directory is created and
> deleted: as the test path is now randomized and virtio_9p_register_nodes()
> is called in a somewhat nondeterministic way, that's no longer an appropriate
> place to create and remove the test directory. Use a constructor and
> destructor function for creating and removing the test directory instead.
> Unfortunately libqos currently does not support setup/teardown callbacks
> to handle this more cleanly.
> 
> The constructor functions need to be in virtio-9p-test.c, not in
> virtio-9p.c, because in the latter location they would cause all apps that
> link to libqos (i.e. entirely unrelated test suites) to create a 9pfs
> test directory as well, which would even break other test suites.
> 
> Signed-off-by: Christian Schoenebeck 
> ---

Reviewed-by: Greg Kurz 

I could run 'make check -j' with 4 archs (ppc64, x86_64, aarch64, s390x)
on a POWER9 system with 128 cpus, for ~1 hour without seeing any failure.

Tested-by: Greg Kurz 

>  tests/qtest/libqos/virtio-9p.c | 14 --
>  tests/qtest/virtio-9p-test.c   | 12 
>  2 files changed, 20 insertions(+), 6 deletions(-)
> 
> diff --git a/tests/qtest/libqos/virtio-9p.c b/tests/qtest/libqos/virtio-9p.c
> index 2736e9ae2a..586e700b24 100644
> --- a/tests/qtest/libqos/virtio-9p.c
> +++ b/tests/qtest/libqos/virtio-9p.c
> @@ -35,7 +35,12 @@ static char *concat_path(const char* a, const char* b)
>  static void init_local_test_path(void)
>  {
>  char *pwd = g_get_current_dir();
> -local_test_path = concat_path(pwd, "qtest-9p-local");
> +char *template = concat_path(pwd, "qtest-9p-local-XX");
> +local_test_path = mkdtemp(template);
> +if (!local_test_path) {
> +g_test_message("mkdtemp('%s') failed: %s", template, strerror(errno));
> +}
> +g_assert(local_test_path);
>  g_free(pwd);
>  }
>  
> @@ -43,6 +48,8 @@ void virtio_9p_create_local_test_dir(void)
>  {
>  struct stat st;
>  
> +init_local_test_path();
> +
>  g_assert(local_test_path != NULL);
>  mkdir(local_test_path, 0777);
>  
> @@ -244,11 +251,6 @@ static void virtio_9p_register_nodes(void)
>  const char *str_simple = "fsdev=fsdev0,mount_tag=" MOUNT_TAG;
>  const char *str_addr = "fsdev=fsdev0,addr=04.0,mount_tag=" MOUNT_TAG;
>  
> -/* make sure test dir for the 'local' tests exists and is clean */
> -init_local_test_path();
> -virtio_9p_remove_local_test_dir();
> -virtio_9p_create_local_test_dir();
> -
>  QPCIAddress addr = {
>  .devfn = QPCI_DEVFN(4, 0),
>  };
> diff --git a/tests/qtest/virtio-9p-test.c b/tests/qtest/virtio-9p-test.c
> index c15908f27b..6401d4f564 100644
> --- a/tests/qtest/virtio-9p-test.c
> +++ b/tests/qtest/virtio-9p-test.c
> @@ -1076,3 +1076,15 @@ static void register_virtio_9p_test(void)
>  }
>  
>  libqos_init(register_virtio_9p_test);
> +
> +static void __attribute__((constructor)) construct_9p_test(void)
> +{
> +/* make sure test dir for the 'local' tests exists */
> +virtio_9p_create_local_test_dir();
> +}
> +
> +static void __attribute__((destructor)) destruct_9p_test(void)
> +{
> +/* remove previously created test dir when test suite completed */
> +virtio_9p_remove_local_test_dir();
> +}
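For readers unfamiliar with the two mechanisms this patch combines, here is a minimal standalone sketch (plain POSIX plus the GCC/Clang constructor/destructor attributes, not libqos code). mkdtemp() replaces the trailing six 'X' characters of the template (POSIX requires exactly six) and atomically creates the directory with mode 0700, and functions marked __attribute__((constructor)) or __attribute__((destructor)) run before main() and after it returns. The names and the /tmp location are hypothetical; the patch itself builds the template under the current working directory and removes it with rm -r.

```c
#define _POSIX_C_SOURCE 200809L /* for mkdtemp() */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical template; the last six characters must be "XXXXXX". */
static char sketch_template[] = "/tmp/qtest-9p-sketch-XXXXXX";
static char *sketch_test_path;

/* Runs before main(): create a unique directory for this process. */
static void __attribute__((constructor)) sketch_create_test_dir(void)
{
    struct stat st;

    sketch_test_path = mkdtemp(sketch_template);
    if (!sketch_test_path) {
        fprintf(stderr, "mkdtemp('%s') failed: %s\n",
                sketch_template, strerror(errno));
    }
    assert(sketch_test_path != NULL);
    assert(stat(sketch_test_path, &st) == 0 && S_ISDIR(st.st_mode));
}

/* Runs after main() returns: remove the (empty) directory again. */
static void __attribute__((destructor)) sketch_remove_test_dir(void)
{
    if (sketch_test_path) {
        rmdir(sketch_test_path);
    }
}
```

Because each process gets its own randomized directory, parallel test binaries no longer share state. The destructor here only removes an empty directory; a suite that leaves files behind needs a recursive removal like the rm -r used in the patch.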

Re: [PATCH v3 0/2] 9pfs: test suite fixes

2020-11-01 Thread Mark Cave-Ayland


On 01/11/2020 15:12, Christian Schoenebeck wrote:


Fixes test failures with the 9pfs 'local' tests as discussed in the latest
9P PR. See the discussion of that PR v2 (Fri, Oct 30th) for details.

In conjunction with Peter Xu's two migration patches (fixing occasional
lockups of migration tests), the overall situation appears to be smooth now:
https://lore.kernel.org/qemu-devel/20201030135350.GA588069@xz-x1/

v2->v3:

   - Make the two functions for creating and removing the 9pfs test directory
 public [NEW patch 1].

   - Place the constructor and destructor functions in virtio-9p-test.c, not
 in virtio-9p.c, because the latter location would cause the constructor
 to be executed whenever libqos is loaded, which would break other,
 completely unrelated test suites that just link to libqos [patch 2].

   - Previous patch 2 (coverity fix) is already queued, no changes, hence
 omitted in this v3.

v1->v2:

   - Added Greg's tested-by tag [patch 1].

   - Log an info-level message if mkdir() failed [patch 2].

   - Update commit log message about coverity being the reporter and
 details of the coverity report [patch 2].

Christian Schoenebeck (2):
   tests/9pfs: make create/remove test dir public
   tests/9pfs: fix test dir for parallel tests

  tests/qtest/libqos/virtio-9p.c | 20 ++--
  tests/qtest/libqos/virtio-9p.h | 10 ++
  tests/qtest/virtio-9p-test.c   | 12 
  3 files changed, 32 insertions(+), 10 deletions(-)


FWIW one thing I've noticed recently is that my builds for qemu-system-sparc64 have 
started giving this warning about a missing "qtest-9p-local" directory during make check:


...
...
Running test QAPI schema regression tests
Running test qtest-sparc64/endianness-test
Running test qtest-sparc64/prom-env-test
Running test qtest-sparc64/boot-serial-test
Running test qtest-sparc64/cdrom-test
Running test qtest-sparc64/device-introspect-test
Running test qtest-sparc64/machine-none-test
Running test qtest-sparc64/qmp-test
Running test qtest-sparc64/qmp-cmd-test
Running test qtest-sparc64/qom-test
Running test qtest-sparc64/test-hmp
Running test qtest-sparc64/qos-test
rm: cannot remove '/home/build/src/qemu/git/qemu/build/qtest-9p-local': No such file or directory

  TESTiotest-qcow2: 001
  TESTiotest-qcow2: 002
  TESTiotest-qcow2: 003
  TESTiotest-qcow2: 004
  TESTiotest-qcow2: 005
...
...

Would this get resolved by the changes to the test directory in this patchset? The 
build is a simple configure run with "--target-list=sparc64-softmmu".



ATB,

Mark.

Re: [PULL 00/15] pc,pci,vhost,virtio: misc fixes

2020-11-01 Thread Peter Maydell

On Fri, 30 Oct 2020 at 14:11, Michael S. Tsirkin  wrote:
>
> The following changes since commit 802427bcdae1ad2eceea8a8877ecad835e3f8fde:
>
>   Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20201027-1' into staging (2020-10-29 11:40:04 +)
>
> are available in the Git repository at:
>
>   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream
>
> for you to fetch changes up to 73beb01ec54969f76ab32d1e0605a759b6c95ab0:
>
>   intel_iommu: Fix two misuse of "0x%u" prints (2020-10-30 06:48:53 -0400)
>
> 
> pc,pci,vhost,virtio: misc fixes
>
> Just a bunch of bugfixes all over the place.
>
> Signed-off-by: Michael S. Tsirkin 
>
> 


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/5.2
for any user-visible changes.

-- PMM

Re: [PATCH v3 2/2] tests/9pfs: fix test dir for parallel tests

2020-11-01 Thread Christian Schoenebeck

On Sonntag, 1. November 2020 18:44:44 CET Greg Kurz wrote:
> On Sun, 1 Nov 2020 15:37:12 +0100
> 
> Christian Schoenebeck  wrote:
> > Use mkdtemp() to generate a unique directory for the 9p 'local' tests.
> > 
> > This fixes occasional 9p test failures when running 'make check -jN' if
> > QEMU was compiled for multiple target architectures, because the
> > individual architectures' test suites would run in parallel and interfere
> > with each other's data: the test directory was previously hard-coded, so
> > the same directory was used by all of them simultaneously.
> > 
> > This also requires a change in how the test directory is created and
> > deleted: as the test path is now randomized and virtio_9p_register_nodes()
> > is called in a somewhat nondeterministic way, that's no longer an
> > appropriate place to create and remove the test directory. Use a
> > constructor and destructor function for creating and removing the test
> > directory instead. Unfortunately libqos currently does not support
> > setup/teardown callbacks to handle this more cleanly.
> > 
> > The constructor functions need to be in virtio-9p-test.c, not in
> > virtio-9p.c, because in the latter location they would cause all apps that
> > link to libqos (i.e. entirely unrelated test suites) to create a 9pfs
> > test directory as well, which would even break other test suites.
> > 
> > Signed-off-by: Christian Schoenebeck 
> > ---
> 
> Reviewed-by: Greg Kurz 

Thanks for the overtime, on a Sunday!

Queued on 9p.next:
https://github.com/cschoenebeck/qemu/commits/9p.next

And this one with Peter Xu's patches on top, just for testing:
https://github.com/cschoenebeck/qemu/commits/9p.experimental.2

> I could run 'make check -j' with 4 archs (ppc64, x86_64, aarch64, s390x)
> on a POWER9 system with 128 cpus, for ~1 hour without seeing any failure.
> 
> Tested-by: Greg Kurz 

Sounds like there are advantages to working for IBM. Respect. I'm starting
to get envious, as these beasts are heading towards PCIe 6 while we regular
x86 users would already be glad to have PCIe 4.

I'll give it some more hours of spinning this time, just to be sure, before
sending the PR tomorrow morning. But I think it's all right now.

Thanks!

Best regards,
Christian Schoenebeck

Re: [PATCH v3 0/2] 9pfs: test suite fixes

2020-11-01 Thread Christian Schoenebeck

On Sonntag, 1. November 2020 19:02:28 CET Mark Cave-Ayland wrote:
> On 01/11/2020 15:12, Christian Schoenebeck wrote:
> > Fixes test failures with the 9pfs 'local' tests as discussed in the latest
> > 9P PR. See the discussion of that PR v2 (Fri, Oct 30th) for details.
> > 
> > In conjunction with Peter Xu's two migration patches (fixing occasional
> > lockups of migration tests), the overall situation appears to be smooth now:
> > https://lore.kernel.org/qemu-devel/20201030135350.GA588069@xz-x1/
> > 
> > v2->v3:
> >    - Make the two functions for creating and removing the 9pfs test
> >      directory public [NEW patch 1].
> >
> >    - Place the constructor and destructor functions in virtio-9p-test.c,
> >      not in virtio-9p.c, because the latter location would cause the
> >      constructor to be executed whenever libqos is loaded, which would
> >      break other, completely unrelated test suites that just link to
> >      libqos [patch 2].
> >
> >    - Previous patch 2 (coverity fix) is already queued, no changes, hence
> >      omitted in this v3.
> > 
> > v1->v2:
> >    - Added Greg's tested-by tag [patch 1].
> >
> >    - Log an info-level message if mkdir() failed [patch 2].
> >
> >    - Update commit log message about coverity being the reporter and
> >      details of the coverity report [patch 2].
> > 
> > Christian Schoenebeck (2):
> >tests/9pfs: make create/remove test dir public
> >tests/9pfs: fix test dir for parallel tests
> >   
> >   tests/qtest/libqos/virtio-9p.c | 20 ++--
> >   tests/qtest/libqos/virtio-9p.h | 10 ++
> >   tests/qtest/virtio-9p-test.c   | 12 
> >   3 files changed, 32 insertions(+), 10 deletions(-)
> 
> FWIW one thing I've noticed recently is that my builds for
> qemu-system-sparc64 have started giving this warning about a missing
> "qtest-9p-local" directory during make check:
> 
> ...
> ...
> Running test QAPI schema regression tests
> Running test qtest-sparc64/endianness-test
> Running test qtest-sparc64/prom-env-test
> Running test qtest-sparc64/boot-serial-test
> Running test qtest-sparc64/cdrom-test
> Running test qtest-sparc64/device-introspect-test
> Running test qtest-sparc64/machine-none-test
> Running test qtest-sparc64/qmp-test
> Running test qtest-sparc64/qmp-cmd-test
> Running test qtest-sparc64/qom-test
> Running test qtest-sparc64/test-hmp
> Running test qtest-sparc64/qos-test
> rm: cannot remove '/home/build/src/qemu/git/qemu/build/qtest-9p-local': No such file or directory
>TESTiotest-qcow2: 001
>TESTiotest-qcow2: 002
>TESTiotest-qcow2: 003
>TESTiotest-qcow2: 004
>TESTiotest-qcow2: 005
> ...
> ...
> 
> Would this get resolved by the changes to the test directory in this
> patchset? The build is a simple configure run with
> "--target-list=sparc64-softmmu".
> 
> 
> ATB,
> 
> Mark.

Yes, that should be resolved by the next 9p PR as well, namely by this
additional patch:
https://github.com/cschoenebeck/qemu/commit/603cc76a6069

Thanks for the feedback!

Best regards,
Christian Schoenebeck

Re: [PATCH v3 2/2] tests/9pfs: fix test dir for parallel tests

2020-11-01 Thread Greg Kurz

On Sun, 01 Nov 2020 20:14:16 +0100
Christian Schoenebeck  wrote:

> On Sonntag, 1. November 2020 18:44:44 CET Greg Kurz wrote:
> > On Sun, 1 Nov 2020 15:37:12 +0100
> > 
> > Christian Schoenebeck  wrote:
> > > Use mkdtemp() to generate a unique directory for the 9p 'local' tests.
> > > 
> > > This fixes occasional 9p test failures when running 'make check -jN' if
> > > QEMU was compiled for multiple target architectures, because the
> > > individual architectures' test suites would run in parallel and interfere
> > > with each other's data: the test directory was previously hard-coded, so
> > > the same directory was used by all of them simultaneously.
> > > 
> > > This also requires a change in how the test directory is created and
> > > deleted: as the test path is now randomized and virtio_9p_register_nodes()
> > > is called in a somewhat nondeterministic way, that's no longer an
> > > appropriate place to create and remove the test directory. Use a
> > > constructor and destructor function for creating and removing the test
> > > directory instead. Unfortunately libqos currently does not support
> > > setup/teardown callbacks to handle this more cleanly.
> > > 
> > > The constructor functions need to be in virtio-9p-test.c, not in
> > > virtio-9p.c, because in the latter location they would cause all apps that
> > > link to libqos (i.e. entirely unrelated test suites) to create a 9pfs
> > > test directory as well, which would even break other test suites.
> > > 
> > > Signed-off-by: Christian Schoenebeck 
> > > ---
> > 
> > Reviewed-by: Greg Kurz 
> 
> Thanks for the overtime, on a Sunday!
> 
> Queued on 9p.next:
> https://github.com/cschoenebeck/qemu/commits/9p.next
> 
> And this one with Peter Xu's patches on top, just for testing:
> https://github.com/cschoenebeck/qemu/commits/9p.experimental.2
> 
> > I could run 'make check -j' with 4 archs (ppc64, x86_64, aarch64, s390x)
> > on a POWER9 system with 128 cpus, for ~1 hour without seeing any failure.
> > 
> > Tested-by: Greg Kurz 
> 
> Sounds like there are advantages to working for IBM. Respect. I'm starting
> to get envious, as these beasts are heading towards PCIe 6 while we regular
> x86 users would already be glad to have PCIe 4.
> 

I work for Red Hat now but yes, this allows easier access to bigger systems.

> I'll give it some more hours of spinning this time, just to be sure, before
> sending the PR tomorrow morning. But I think it's all right now.
> 

Cool ! :)

> Thanks!
> 
> Best regards,
> Christian Schoenebeck
> 
>

Re: [PATCH v3 0/2] 9pfs: test suite fixes

2020-11-01 Thread Mark Cave-Ayland


On 01/11/2020 19:17, Christian Schoenebeck wrote:


Yes, that should be resolved by the next 9p PR as well, namely by this
additional patch:
https://github.com/cschoenebeck/qemu/commit/603cc76a6069

Thanks for the feedback!


Fantastic - thanks a lot :)


ATB,

Mark.

Re: [PULL v2 00/32] VFIO updates 2020-10-28 (for QEMU 5.2 soft-freeze)

2020-11-01 Thread Alex Williamson

On Sat, 31 Oct 2020 14:54:54 +
Peter Maydell  wrote:

> On Wed, 28 Oct 2020 at 16:42, Alex Williamson
>  wrote:
> >
> > The following changes since commit 33dc9914eac581dea9bdea35dcda4d542531d66a:
> >
> >   Revert series: virtiofsd: Announce submounts to the guest (2020-10-28 13:17:32 +)
> >
> > are available in the Git repository at:
> >
> >   git://github.com/awilliam/qemu-vfio.git tags/vfio-update-20201028.0
> >
> > for you to fetch changes up to 83d64f2efe383f1f70e180cf1579d3bbe2fbcdf5:
> >
> >   vfio: fix incorrect print type (2020-10-28 10:30:37 -0600)
> >
> > 
> > VFIO update 2020-10-28
> >
> >  * Migration support (Kirti Wankhede)
> >  * s390 DMA limiting (Matthew Rosato)
> >  * zPCI hardware info (Matthew Rosato)
> >  * Lock guard (Amey Narkhede)
> >  * Print fixes (Zhengui li)
> >  * Warning/build fixes
> >  
> 
> Hi; this fails to build on OSX and the BSDs:
> 
> ../../hw/s390x/s390-pci-vfio.c:13:10: fatal error: 'linux/vfio.h' file not found
> #include 
>  ^~
> 
> fails differently but on the same file on windows builds:
> 
> ../../hw/s390x/s390-pci-vfio.c:12:23: fatal error: sys/ioctl.h: No such file or directory

I think this can be solved by compiling s390-pci-vfio.c only under
CONFIG_LINUX and stubbing its functions with static inlines in the
header.  That seems to resolve the Windows failure in a mingw build.
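A minimal sketch of that pattern, under stated assumptions: the header declares the real function only when CONFIG_LINUX is defined (in QEMU that symbol comes from the build configuration, and the definition would live in the Linux-only .c file), and otherwise provides a static inline stub, so non-Linux hosts never need linux/vfio.h or sys/ioctl.h. The function names and their semantics are hypothetical, not the actual s390-pci-vfio.h API.

```c
#include <stdbool.h>

/* --- hypothetical s390-pci-vfio.h-style header --- */
#ifdef CONFIG_LINUX
/* Defined in a .c file that the build system compiles only on Linux hosts. */
bool sketch_s390_pci_vfio_query(int fd, unsigned int *dma_avail);
#else
/* Non-Linux hosts: no VFIO, so report that nothing could be queried. */
static inline bool sketch_s390_pci_vfio_query(int fd, unsigned int *dma_avail)
{
    (void)fd;
    (void)dma_avail;
    return false;
}
#endif

/* Hypothetical caller: falls back to a default when the query fails. */
static unsigned int sketch_dma_limit(int fd, unsigned int fallback)
{
    unsigned int avail;

    return sketch_s390_pci_vfio_query(fd, &avail) ? avail : fallback;
}
```

Compiled standalone, CONFIG_LINUX is not defined, so the stub is used and callers transparently take the fallback path.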

> and has this error on 32-bit hosts:
> 
> ../../hw/vfio/common.c: In function 'vfio_dma_unmap_bitmap':
> ../../hw/vfio/common.c:414:48: error: passing argument 1 of 'cpu_physical_memory_set_dirty_lebitmap' from incompatible pointer type [-Werror=incompatible-pointer-types]
>  cpu_physical_memory_set_dirty_lebitmap((uint64_t *)bitmap->data,
> ^
> In file included from ../../hw/vfio/common.c:32:0:
> /home/peter.maydell/qemu/include/exec/ram_addr.h:337:20: note: expected 'long unsigned int *' but argument is of type 'uint64_t * {aka long long unsigned int *}'
>  static inline void cpu_physical_memory_set_dirty_lebitmap(unsigned long *bitmap,
> ^~
> ../../hw/vfio/common.c: In function 'vfio_get_dirty_bitmap':
> ../../hw/vfio/common.c:1008:44: error: passing argument 1 of 'cpu_physical_memory_set_dirty_lebitmap' from incompatible pointer type [-Werror=incompatible-pointer-types]
>  cpu_physical_memory_set_dirty_lebitmap((uint64_t *)range->bitmap.data,
> ^
> In file included from ../../hw/vfio/common.c:32:0:
> /home/peter.maydell/qemu/include/exec/ram_addr.h:337:20: note: expected 'long unsigned int *' but argument is of type 'uint64_t * {aka long long unsigned int *}'
>  static inline void cpu_physical_memory_set_dirty_lebitmap(unsigned long *bitmap,
> ^~

It seems that our bitmap is just being incorrectly cast to uint64_t *
rather than unsigned long *.
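To illustrate the 32-bit failure: VFIO hands back the dirty bitmap as 64-bit words, while cpu_physical_memory_set_dirty_lebitmap() takes `unsigned long *`, which is only 32 bits wide on 32-bit hosts, so a `(uint64_t *)` cast no longer matches the parameter type. The sketch below uses a hypothetical `example_count_dirty()` with the same parameter type and casts to that type instead. It assumes a little-endian host whenever unsigned long is 32 bits (on 64-bit Linux hosts the two types have the same width) and relies on -fno-strict-aliasing, which QEMU builds with:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical consumer with the same pointer type as
 * cpu_physical_memory_set_dirty_lebitmap(): counts dirty pages.
 */
static size_t example_count_dirty(const unsigned long *bitmap, size_t npages)
{
    size_t i, dirty = 0;

    for (i = 0; i < npages; i++) {
        unsigned long word = bitmap[i / EXAMPLE_BITS_PER_LONG];

        if ((word >> (i % EXAMPLE_BITS_PER_LONG)) & 1UL) {
            dirty++;
        }
    }
    return dirty;
}

/* The bitmap arrives from VFIO as an array of 64-bit words. */
static size_t example_sync_dirty(const uint64_t *data, size_t npages)
{
    /*
     * Cast to the callee's parameter type, not to uint64_t *: where
     * unsigned long is 32 bits the two pointer types are incompatible
     * and -Werror=incompatible-pointer-types rejects the call.
     */
    return example_count_dirty((const unsigned long *)data, npages);
}
```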

Both fixes are being rolled into the next pull request, which I've built
for 32- and 64-bit hosts and with mingw.  Thanks,

Alex

[Bug 1902451] [NEW] incorrect cpuid feature detection

2020-11-01 Thread Luis

Public bug reported:

Hello,

I am currently developing an x64 kernel and wanted to check via
CPUID whether some features are available in the guest. When I try to
enable CPU features like vmcb_clean or constant_tsc, QEMU says that my
host doesn't support the requested features. However, cat /proc/cpuinfo
tells a different story:

model name:  AMD Ryzen 5 3500U
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb 
rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf 
pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx 
f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 
3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext 
perfctr_llc mwaitx cpb hw_pstate sme pti ssbd sev ibpb vmmcall fsgsbase bmi1 
avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves 
clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean 
flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif 
overflow_recov succor smca

I also checked it myself by running cpuid and checking the bits against
the AMD manual. Everything checks out, but QEMU still fails.
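The manual check described above amounts to testing individual bits of EDX from CPUID leaf 8000_000Ah. This is a minimal, portable sketch that decodes a raw EDX value (bits 0-7 only, named after the AMD manual and the matching /proc/cpuinfo flags). On an x86 host the raw value could be obtained with GCC/Clang's `__get_cpuid(0x8000000A, &eax, &ebx, &ecx, &edx)` from `<cpuid.h>`. That call is left out so the example builds anywhere:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in CPUID Fn8000_000A EDX (SVM feature identification). */
enum svm_edx_bit {
    SVM_EDX_NP             = 0,  /* nested paging, "npt" */
    SVM_EDX_LBRV           = 1,  /* LBR virtualization, "lbrv" */
    SVM_EDX_SVML           = 2,  /* SVM lock, "svm_lock" */
    SVM_EDX_NRIPS          = 3,  /* next-RIP save, "nrip_save" */
    SVM_EDX_TSC_RATE_MSR   = 4,  /* "tsc_scale" */
    SVM_EDX_VMCB_CLEAN     = 5,  /* "vmcb_clean", the "[bit 5]" in the error */
    SVM_EDX_FLUSH_BY_ASID  = 6,  /* "flushbyasid" */
    SVM_EDX_DECODE_ASSISTS = 7,  /* "decodeassists" */
};

/* True if @bit is set in a raw EDX value returned by leaf 0x8000000A. */
static bool svm_edx_has(uint32_t edx, enum svm_edx_bit bit)
{
    return (edx >> bit) & 1u;
}
```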

QEMU version: QEMU emulator version 4.2.0

$ qemu-system-x86_64 -cpu host,+vmcb_clean,enforce -enable-kvm \
    -drive format=raw,file=target/x86_64-os/debug/bootimage-my_kernel.bin \
    -serial stdio -display none
qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.8000000AH:EDX.vmcb-clean [bit 5]
qemu-system-x86_64: Host doesn't support requested features

or

$ qemu-system-x86_64 -cpu host,+constant_tsc,enforce -enable-kvm \
    -drive format=raw,file=target/x86_64-os/debug/bootimage-my_kernel.bin \
    -serial stdio -display none
qemu-system-x86_64: Property '.constant_tsc' not found

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1902451

Title:
  incorrect cpuid feature detection

Status in QEMU:
  New


To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1902451/+subscriptions

[PULL v3 00/32] VFIO updates 2020-11-01 (for QEMU 5.2 soft-freeze)

2020-11-01 Thread Alex Williamson

Aggregated interdiff versus v1 pull request below.  Thanks,

Alex

The following changes since commit 700d20b49e303549b32d3a7a3efbfcee8c7a4f6c:

  Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging 
(2020-11-01 14:02:19 +)

are available in the Git repository at:

  git://github.com/awilliam/qemu-vfio.git tags/vfio-update-20201101.0

for you to fetch changes up to c624b6b312680b76d2a19a4c65cfdb234e875e1b:

  vfio: fix incorrect print type (2020-11-01 12:30:52 -0700)


VFIO update 2020-11-01

 * Migration support (Kirti Wankhede)
 * s390 DMA limiting (Matthew Rosato)
 * zPCI hardware info (Matthew Rosato)
 * Lock guard (Amey Narkhede)
 * Print fixes (Zhengui Li)
 * Warning/build fixes


Amey Narkhede (1):
  hw/vfio: Use lock guard macros

Kirti Wankhede (17):
  vfio: Add function to unmap VFIO region
  vfio: Add vfio_get_object callback to VFIODeviceOps
  vfio: Add save and load functions for VFIO PCI devices
  vfio: Add migration region initialization and finalize function
  vfio: Add VM state change handler to know state of VM
  vfio: Add migration state change notifier
  vfio: Register SaveVMHandlers for VFIO device
  vfio: Add save state functions to SaveVMHandlers
  vfio: Add load state functions to SaveVMHandlers
  memory: Set DIRTY_MEMORY_MIGRATION when IOMMU is enabled
  vfio: Get migration capability flags for container
  vfio: Add function to start and stop dirty pages tracking
  vfio: Add vfio_listener_log_sync to mark dirty pages
  vfio: Dirty page tracking when vIOMMU is enabled
  vfio: Add ioctl to get dirty pages bitmap during dma unmap
  vfio: Make vfio-pci device migration capable
  qapi: Add VFIO devices migration stats in Migration stats

Matthew Rosato (10):
  update-linux-headers: Add vfio_zdev.h
  linux-headers: update against 5.10-rc1
  s390x/pci: Move header files to include/hw/s390x
  vfio: Create shared routine for scanning info capabilities
  vfio: Find DMA available capability
  s390x/pci: Add routine to get the vfio dma available count
  s390x/pci: Honor DMA limits set by vfio
  s390x/pci: clean up s390 PCI groups
  vfio: Add routine for finding VFIO_DEVICE_GET_INFO capabilities
  s390x/pci: get zPCI function info from host

Pierre Morel (3):
  s390x/pci: create a header dedicated to PCI CLP
  s390x/pci: use a PCI Group structure
  s390x/pci: use a PCI Function structure

Zhengui Li (1):
  vfio: fix incorrect print type

 MAINTAINERS|   1 +
 hw/s390x/meson.build   |   1 +
 hw/s390x/s390-pci-bus.c|  91 +-
 hw/s390x/s390-pci-inst.c   |  78 +-
 hw/s390x/s390-pci-vfio.c   | 276 ++
 hw/s390x/s390-virtio-ccw.c |   2 +-
 hw/s390x/trace-events  |   5 +
 hw/vfio/common.c   | 508 ++-
 hw/vfio/meson.build|   1 +
 hw/vfio/migration.c| 933 +
 hw/vfio/pci.c  |  87 +-
 hw/vfio/pci.h  |   1 -
 hw/vfio/platform.c |   7 +-
 hw/vfio/trace-events   |  21 +
 {hw => include/hw}/s390x/s390-pci-bus.h|  22 +
 .../hw/s390x/s390-pci-clp.h| 123 +--
 include/hw/s390x/s390-pci-inst.h   | 119 +++
 include/hw/s390x/s390-pci-vfio.h   |  38 +
 include/hw/vfio/vfio-common.h  |  30 +
 .../infiniband/hw/vmw_pvrdma/pvrdma_verbs.h|   2 +-
 include/standard-headers/linux/ethtool.h   |   2 +
 include/standard-headers/linux/fuse.h  |  50 +-
 include/standard-headers/linux/input-event-codes.h |   4 +
 include/standard-headers/linux/pci_regs.h  |   6 +-
 include/standard-headers/linux/virtio_fs.h |   3 +
 include/standard-headers/linux/virtio_gpu.h|  19 +
 include/standard-headers/linux/virtio_mmio.h   |  11 +
 include/standard-headers/linux/virtio_pci.h|  11 +-
 linux-headers/asm-arm64/kvm.h  |  25 +
 linux-headers/asm-arm64/mman.h |   1 +
 linux-headers/asm-generic/hugetlb_encode.h |   1 +
 linux-headers/asm-generic/unistd.h |  18 +-
 linux-headers/asm-mips/unistd_n32.h|   1 +
 linux-headers/asm-mips/unistd_n64.h|   1 +
 linux-headers/asm-mips/unistd_o32.h|   1 +
 linux-headers/asm-powerpc/unistd_32.h  |   1 +
 linux-headers/asm-powerpc/unistd_64.h  |   1 +
 linux-headers/asm-s390/unistd_32.h |

[PULL v3 01/32] vfio: Add function to unmap VFIO region

2020-11-01 Thread Alex Williamson

From: Kirti Wankhede 

This function will be used for the migration region. The migration region
is mmapped when migration starts and unmapped when migration completes.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
Reviewed-by: Cornelia Huck 
Signed-off-by: Alex Williamson 
---
 hw/vfio/common.c  |   32 
 hw/vfio/trace-events  |1 +
 include/hw/vfio/vfio-common.h |1 +
 3 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 13471ae29436..c6e98b8d61be 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -924,6 +924,18 @@ int vfio_region_setup(Object *obj, VFIODevice *vbasedev, VFIORegion *region,
     return 0;
 }
 
+static void vfio_subregion_unmap(VFIORegion *region, int index)
+{
+    trace_vfio_region_unmap(memory_region_name(&region->mmaps[index].mem),
+                            region->mmaps[index].offset,
+                            region->mmaps[index].offset +
+                            region->mmaps[index].size - 1);
+    memory_region_del_subregion(region->mem, &region->mmaps[index].mem);
+    munmap(region->mmaps[index].mmap, region->mmaps[index].size);
+    object_unparent(OBJECT(&region->mmaps[index].mem));
+    region->mmaps[index].mmap = NULL;
+}
+
 int vfio_region_mmap(VFIORegion *region)
 {
     int i, prot = 0;
@@ -954,10 +966,7 @@ int vfio_region_mmap(VFIORegion *region)
             region->mmaps[i].mmap = NULL;
 
             for (i--; i >= 0; i--) {
-                memory_region_del_subregion(region->mem, &region->mmaps[i].mem);
-                munmap(region->mmaps[i].mmap, region->mmaps[i].size);
-                object_unparent(OBJECT(&region->mmaps[i].mem));
-                region->mmaps[i].mmap = NULL;
+                vfio_subregion_unmap(region, i);
             }
 
             return ret;
@@ -982,6 +991,21 @@ int vfio_region_mmap(VFIORegion *region)
     return 0;
 }
 
+void vfio_region_unmap(VFIORegion *region)
+{
+    int i;
+
+    if (!region->mem) {
+        return;
+    }
+
+    for (i = 0; i < region->nr_mmaps; i++) {
+        if (region->mmaps[i].mmap) {
+            vfio_subregion_unmap(region, i);
+        }
+    }
+}
+
 void vfio_region_exit(VFIORegion *region)
 {
     int i;
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 93a0bc2522f8..a0c7b49a2ebc 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -113,6 +113,7 @@ vfio_region_mmap(const char *name, unsigned long offset, unsigned long end) "Reg
 vfio_region_exit(const char *name, int index) "Device %s, region %d"
 vfio_region_finalize(const char *name, int index) "Device %s, region %d"
 vfio_region_mmaps_set_enabled(const char *name, bool enabled) "Region %s mmaps enabled: %d"
+vfio_region_unmap(const char *name, unsigned long offset, unsigned long end) "Region %s unmap [0x%lx - 0x%lx]"
 vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) "Device %s region %d: %d sparse mmap entries"
 vfio_region_sparse_mmap_entry(int i, unsigned long start, unsigned long end) "sparse entry %d [0x%lx - 0x%lx]"
 vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t subtype) "%s index %d, %08x/%0x8"
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index c78f3ff5593c..dc95f527b583 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -171,6 +171,7 @@ int vfio_region_setup(Object *obj, VFIODevice *vbasedev, VFIORegion *region,
                       int index, const char *name);
 int vfio_region_mmap(VFIORegion *region);
 void vfio_region_mmaps_set_enabled(VFIORegion *region, bool enabled);
+void vfio_region_unmap(VFIORegion *region);
 void vfio_region_exit(VFIORegion *region);
 void vfio_region_finalize(VFIORegion *region);
 void vfio_reset_handler(void *opaque);
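The point of vfio_region_unmap() is that it only tears down entries whose mmap pointer is non-NULL and resets each pointer after munmap(). That makes it safe to call after a partial mapping failure or more than once. Below is a simplified sketch of that bookkeeping. It uses hypothetical `example_*` types in place of VFIORegion and anonymous memory in place of the device fd; the memory-region calls are omitted:

```c
#define _DEFAULT_SOURCE /* MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical, simplified model of VFIORegion's mmaps[] array. */
struct example_mmap {
    void *mmap;   /* NULL while not mapped */
    size_t size;
};

struct example_region {
    struct example_mmap mmaps[3];
    int nr_mmaps;
};

/* Like vfio_subregion_unmap(): drop one mapping and forget the pointer. */
static void example_subregion_unmap(struct example_region *region, int index)
{
    munmap(region->mmaps[index].mmap, region->mmaps[index].size);
    region->mmaps[index].mmap = NULL;
}

/* Like vfio_region_unmap(): only entries that are currently mapped. */
static void example_region_unmap(struct example_region *region)
{
    int i;

    for (i = 0; i < region->nr_mmaps; i++) {
        if (region->mmaps[i].mmap) {
            example_subregion_unmap(region, i);
        }
    }
}

/* Map entry @index with anonymous memory; returns 0 on success. */
static int example_subregion_map(struct example_region *region, int index)
{
    void *p = mmap(NULL, region->mmaps[index].size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED) {
        return -1;
    }
    region->mmaps[index].mmap = p;
    return 0;
}
```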

[PULL v3 04/32] vfio: Add migration region initialization and finalize function

2020-11-01 Thread Alex Williamson

From: Kirti Wankhede 

Whether the VFIO device supports migration is decided based on the migration
region query. If the migration region query succeeds and the migration region
is initialized successfully, migration is supported; otherwise migration is
blocked.
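The decision described above reduces to a simple rule. If the region is found and initialized, the device is migratable; in any other case a blocker is registered. The region info buffer is released on every path. Here is a hypothetical sketch of that control flow. The `example_*` types and the zero-size check stand in for vfio_get_dev_region_info() and vfio_migration_init(). Note that the info is read before it is freed:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vfio_region_info. */
struct example_region_info {
    unsigned int index;
    unsigned long long size;
};

/* Hypothetical outcome of probing one device. */
struct example_probe_result {
    bool migratable;            /* region found and initialized */
    bool blocker_added;         /* otherwise a migration blocker is added */
    unsigned int region_index;
};

/*
 * Same shape as vfio_migration_probe(): a missing region (info == NULL)
 * or a failed init (modelled as a zero-sized region) both fall back to
 * adding a blocker. @info is owned here and freed on every path, but
 * only after its last use.
 */
static struct example_probe_result example_probe(struct example_region_info *info)
{
    struct example_probe_result res = { false, false, 0 };

    if (!info || info->size == 0) {
        res.blocker_added = true;
        free(info);                     /* free(NULL) is a no-op */
        return res;
    }

    res.migratable = true;
    res.region_index = info->index;     /* use info first ... */
    free(info);                         /* ... then release it */
    return res;
}
```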

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
Acked-by: Dr. David Alan Gilbert 
Reviewed-by: Cornelia Huck 
Signed-off-by: Alex Williamson 
---
 hw/vfio/meson.build   |1 
 hw/vfio/migration.c   |  122 +
 hw/vfio/trace-events  |3 +
 include/hw/vfio/vfio-common.h |9 +++
 4 files changed, 135 insertions(+)
 create mode 100644 hw/vfio/migration.c

diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
index 37efa74018bc..da9af297a0c5 100644
--- a/hw/vfio/meson.build
+++ b/hw/vfio/meson.build
@@ -2,6 +2,7 @@ vfio_ss = ss.source_set()
 vfio_ss.add(files(
   'common.c',
   'spapr.c',
+  'migration.c',
 ))
 vfio_ss.add(when: 'CONFIG_VFIO_PCI', if_true: files(
   'display.c',
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
new file mode 100644
index ..fd7faf423cdc
--- /dev/null
+++ b/hw/vfio/migration.c
@@ -0,0 +1,122 @@
+/*
+ * Migration support for VFIO devices
+ *
+ * Copyright NVIDIA, Inc. 2020
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include <linux/vfio.h>
+
+#include "hw/vfio/vfio-common.h"
+#include "cpu.h"
+#include "migration/migration.h"
+#include "migration/qemu-file.h"
+#include "migration/register.h"
+#include "migration/blocker.h"
+#include "migration/misc.h"
+#include "qapi/error.h"
+#include "exec/ramlist.h"
+#include "exec/ram_addr.h"
+#include "pci.h"
+#include "trace.h"
+
+static void vfio_migration_exit(VFIODevice *vbasedev)
+{
+    VFIOMigration *migration = vbasedev->migration;
+
+    vfio_region_exit(&migration->region);
+    vfio_region_finalize(&migration->region);
+    g_free(vbasedev->migration);
+    vbasedev->migration = NULL;
+}
+
+static int vfio_migration_init(VFIODevice *vbasedev,
+                               struct vfio_region_info *info)
+{
+    int ret;
+    Object *obj;
+
+    if (!vbasedev->ops->vfio_get_object) {
+        return -EINVAL;
+    }
+
+    obj = vbasedev->ops->vfio_get_object(vbasedev);
+    if (!obj) {
+        return -EINVAL;
+    }
+
+    vbasedev->migration = g_new0(VFIOMigration, 1);
+
+    ret = vfio_region_setup(obj, vbasedev, &vbasedev->migration->region,
+                            info->index, "migration");
+    if (ret) {
+        error_report("%s: Failed to setup VFIO migration region %d: %s",
+                     vbasedev->name, info->index, strerror(-ret));
+        goto err;
+    }
+
+    if (!vbasedev->migration->region.size) {
+        error_report("%s: Invalid zero-sized VFIO migration region %d",
+                     vbasedev->name, info->index);
+        ret = -EINVAL;
+        goto err;
+    }
+    return 0;
+
+err:
+    vfio_migration_exit(vbasedev);
+    return ret;
+}
+
+/* -- */
+
+int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
+{
+    struct vfio_region_info *info = NULL;
+    Error *local_err = NULL;
+    int ret;
+
+    ret = vfio_get_dev_region_info(vbasedev, VFIO_REGION_TYPE_MIGRATION,
+                                   VFIO_REGION_SUBTYPE_MIGRATION, &info);
+    if (ret) {
+        goto add_blocker;
+    }
+
+    ret = vfio_migration_init(vbasedev, info);
+    if (ret) {
+        goto add_blocker;
+    }
+
+    trace_vfio_migration_probe(vbasedev->name, info->index);
+    g_free(info);
+    return 0;
+
+add_blocker:
+    error_setg(&vbasedev->migration_blocker,
+               "VFIO device doesn't support migration");
+    g_free(info);
+
+    ret = migrate_add_blocker(vbasedev->migration_blocker, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        error_free(vbasedev->migration_blocker);
+        vbasedev->migration_blocker = NULL;
+    }
+    return ret;
+}
+
+void vfio_migration_finalize(VFIODevice *vbasedev)
+{
+    if (vbasedev->migration) {
+        vfio_migration_exit(vbasedev);
+    }
+
+    if (vbasedev->migration_blocker) {
+        migrate_del_blocker(vbasedev->migration_blocker);
+        error_free(vbasedev->migration_blocker);
+        vbasedev->migration_blocker = NULL;
+    }
+}
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index a0c7b49a2ebc..9ced5ec6277c 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -145,3 +145,6 @@ vfio_display_edid_link_up(void) ""
 vfio_display_edid_link_down(void) ""
 vfio_display_edid_update(uint32_t prefx, uint32_t prefy) "%ux%u"
 vfio_display_edid_write_error(void) ""
+
+# migration.c
+vfio_migration_probe(const char *name, uint32_t index) " (%s) Region %d"
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index ba6169cd926e..8275c4c68f45 100644
--- a/includ

[PULL v3 06/32] vfio: Add migration state change notifier

2020-11-01 Thread Alex Williamson

From: Kirti Wankhede 

Added a migration state change notifier to get notified of migration state
changes. These states are translated into VFIO device states and conveyed to
the vendor driver.
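The translation done by the notifier can be sketched as a pure function. This sketch assumes the v1 VFIO migration uAPI bit layout (_RUNNING = bit 0, _SAVING = bit 1, _RESUMING = bit 2) and a hypothetical subset of QEMU's MigrationStatus values. As in the patch, only the cancel and failure statuses change the device state:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Device state bits, assumed to follow the v1 VFIO migration uAPI. */
#define NTF_STATE_RUNNING  (1u << 0)
#define NTF_STATE_SAVING   (1u << 1)
#define NTF_STATE_RESUMING (1u << 2)

/* Hypothetical subset of QEMU's MigrationStatus values. */
enum ntf_mig_status {
    NTF_MIG_ACTIVE,
    NTF_MIG_COMPLETED,
    NTF_MIG_CANCELLING,
    NTF_MIG_CANCELLED,
    NTF_MIG_FAILED,
};

/*
 * Same shape as vfio_migration_state_notifier(): on cancel or failure,
 * clear _SAVING/_RESUMING and set _RUNNING, storing the result in *next.
 * Other statuses leave the device alone and return false.
 */
static bool ntf_next_state(enum ntf_mig_status status, uint32_t cur,
                           uint32_t *next)
{
    switch (status) {
    case NTF_MIG_CANCELLING:
    case NTF_MIG_CANCELLED:
    case NTF_MIG_FAILED:
        *next = (cur & ~(NTF_STATE_SAVING | NTF_STATE_RESUMING)) |
                NTF_STATE_RUNNING;
        return true;
    default:
        return false;
    }
}
```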

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Cornelia Huck 
Signed-off-by: Alex Williamson 
---
 hw/vfio/migration.c   |   28 
 hw/vfio/trace-events  |1 +
 include/hw/vfio/vfio-common.h |2 ++
 3 files changed, 31 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index e1ffae05e288..7ec85b6469c5 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -175,6 +175,30 @@ static void vfio_vmstate_change(void *opaque, int running, RunState state)
             (migration->device_state & mask) | value);
 }
 
+static void vfio_migration_state_notifier(Notifier *notifier, void *data)
+{
+    MigrationState *s = data;
+    VFIOMigration *migration = container_of(notifier, VFIOMigration,
+                                            migration_state);
+    VFIODevice *vbasedev = migration->vbasedev;
+    int ret;
+
+    trace_vfio_migration_state_notifier(vbasedev->name,
+                                        MigrationStatus_str(s->state));
+
+    switch (s->state) {
+    case MIGRATION_STATUS_CANCELLING:
+    case MIGRATION_STATUS_CANCELLED:
+    case MIGRATION_STATUS_FAILED:
+        ret = vfio_migration_set_state(vbasedev,
+                      ~(VFIO_DEVICE_STATE_SAVING | VFIO_DEVICE_STATE_RESUMING),
+                      VFIO_DEVICE_STATE_RUNNING);
+        if (ret) {
+            error_report("%s: Failed to set state RUNNING", vbasedev->name);
+        }
+    }
+}
+
 static void vfio_migration_exit(VFIODevice *vbasedev)
 {
     VFIOMigration *migration = vbasedev->migration;
@@ -219,8 +243,11 @@ static int vfio_migration_init(VFIODevice *vbasedev,
     }
 
     migration = vbasedev->migration;
+    migration->vbasedev = vbasedev;
     migration->vm_state = qemu_add_vm_change_state_handler(vfio_vmstate_change,
                                                            vbasedev);
+    migration->migration_state.notify = vfio_migration_state_notifier;
+    add_migration_state_change_notifier(&migration->migration_state);
     return 0;
 
 err:
@@ -270,6 +297,7 @@ void vfio_migration_finalize(VFIODevice *vbasedev)
     if (vbasedev->migration) {
         VFIOMigration *migration = vbasedev->migration;
 
+        remove_migration_state_change_notifier(&migration->migration_state);
         qemu_del_vm_change_state_handler(migration->vm_state);
         vfio_migration_exit(vbasedev);
     }
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 41de81f12f60..78d7d83b5ef8 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -150,3 +150,4 @@ vfio_display_edid_write_error(void) ""
 vfio_migration_probe(const char *name, uint32_t index) " (%s) Region %d"
 vfio_migration_set_state(const char *name, uint32_t state) " (%s) state %d"
 vfio_vmstate_change(const char *name, int running, const char *reason, uint32_t dev_state) " (%s) running %d reason %s device state %d"
+vfio_migration_state_notifier(const char *name, const char *state) " (%s) state %s"
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 9a571f1fb552..2bd593ba38bb 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -59,10 +59,12 @@ typedef struct VFIORegion {
 } VFIORegion;
 
 typedef struct VFIOMigration {
+    struct VFIODevice *vbasedev;
     VMChangeStateEntry *vm_state;
     VFIORegion region;
     uint32_t device_state;
     int vm_running;
+    Notifier migration_state;
 } VFIOMigration;
 
 typedef struct VFIOAddressSpace {

[PULL v3 02/32] vfio: Add vfio_get_object callback to VFIODeviceOps

2020-11-01 Thread Alex Williamson

From: Kirti Wankhede 

Hook vfio_get_object callback for PCI devices.
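vfio_pci_get_object() works because VFIODevice is embedded inside VFIOPCIDevice, so container_of() can recover the outer object from a pointer to the embedded member. Below is a self-contained sketch of that pattern. It uses hypothetical stand-in structs and a simplified container_of macro; QEMU's own version adds a type check:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of (no type check, unlike QEMU's version). */
#define example_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for VFIODevice / VFIOPCIDevice. */
struct ex_base_dev {
    const char *name;
};

struct ex_pci_dev {
    int pdev_id;                 /* plays the role of the embedded PCIDevice */
    struct ex_base_dev vbasedev; /* generic part handed to common code */
};

/* Bus-specific callback: recover the outer object from the embedded base. */
static struct ex_pci_dev *ex_pci_get_object(struct ex_base_dev *vbasedev)
{
    return example_container_of(vbasedev, struct ex_pci_dev, vbasedev);
}
```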

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
Suggested-by: Cornelia Huck 
Reviewed-by: Cornelia Huck 
Signed-off-by: Alex Williamson 
---
 hw/vfio/pci.c |8 
 include/hw/vfio/vfio-common.h |1 +
 2 files changed, 9 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 0d83eb0e47bb..bffd5bfe3b78 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2394,10 +2394,18 @@ static void vfio_pci_compute_needs_reset(VFIODevice *vbasedev)
     }
 }
 
+static Object *vfio_pci_get_object(VFIODevice *vbasedev)
+{
+    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+
+    return OBJECT(vdev);
+}
+
 static VFIODeviceOps vfio_pci_ops = {
     .vfio_compute_needs_reset = vfio_pci_compute_needs_reset,
     .vfio_hot_reset_multi = vfio_pci_hot_reset_multi,
     .vfio_eoi = vfio_intx_eoi,
+    .vfio_get_object = vfio_pci_get_object,
 };
 
 int vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index dc95f527b583..fe99c36a693a 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -119,6 +119,7 @@ struct VFIODeviceOps {
     void (*vfio_compute_needs_reset)(VFIODevice *vdev);
     int (*vfio_hot_reset_multi)(VFIODevice *vdev);
     void (*vfio_eoi)(VFIODevice *vdev);
+    Object *(*vfio_get_object)(VFIODevice *vdev);
 };
 
 typedef struct VFIOGroup {

[PULL v3 03/32] vfio: Add save and load functions for VFIO PCI devices

2020-11-01 Thread Alex Williamson

From: Kirti Wankhede 

Added functions to save and restore PCI device-specific data,
specifically the config space of the PCI device.
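Loading config space only restores QEMU's emulated copy. Side effects on the physical device have to be replayed, which is why vfio_pci_load_config() rewrites PCI_COMMAND and re-enables MSI or MSI-X. The sketch below covers just the interrupt-mode decision. It uses the enable bits from the PCI specification (MSI Message Control bit 0, MSI-X Message Control bit 15); the `ex_*` names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Enable bits in the MSI / MSI-X Message Control registers. */
#define EX_MSI_FLAGS_ENABLE   0x0001u
#define EX_MSIX_FLAGS_ENABLE  0x8000u

enum ex_irq_mode { EX_IRQ_NONE, EX_IRQ_MSI, EX_IRQ_MSIX };

/*
 * After the saved config space has been loaded, pick the interrupt mode
 * to re-enable on the host device, checking MSI before MSI-X as
 * vfio_pci_load_config() does.
 */
static enum ex_irq_mode ex_irq_mode_after_load(uint16_t msi_ctrl,
                                               uint16_t msix_ctrl)
{
    if (msi_ctrl & EX_MSI_FLAGS_ENABLE) {
        return EX_IRQ_MSI;
    } else if (msix_ctrl & EX_MSIX_FLAGS_ENABLE) {
        return EX_IRQ_MSIX;
    }
    return EX_IRQ_NONE;
}
```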

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
Signed-off-by: Alex Williamson 
---
 hw/vfio/pci.c |   51 +
 include/hw/vfio/vfio-common.h |2 ++
 2 files changed, 53 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index bffd5bfe3b78..e27c88be6d85 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -41,6 +41,7 @@
 #include "trace.h"
 #include "qapi/error.h"
 #include "migration/blocker.h"
+#include "migration/qemu-file.h"
 
 #define TYPE_VFIO_PCI_NOHOTPLUG "vfio-pci-nohotplug"
 
@@ -2401,11 +2402,61 @@ static Object *vfio_pci_get_object(VFIODevice *vbasedev)
     return OBJECT(vdev);
 }
 
+static bool vfio_msix_present(void *opaque, int version_id)
+{
+    PCIDevice *pdev = opaque;
+
+    return msix_present(pdev);
+}
+
+const VMStateDescription vmstate_vfio_pci_config = {
+    .name = "VFIOPCIDevice",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_PCI_DEVICE(pdev, VFIOPCIDevice),
+        VMSTATE_MSIX_TEST(pdev, VFIOPCIDevice, vfio_msix_present),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static void vfio_pci_save_config(VFIODevice *vbasedev, QEMUFile *f)
+{
+    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+
+    vmstate_save_state(f, &vmstate_vfio_pci_config, vdev, NULL);
+}
+
+static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
+{
+    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+    PCIDevice *pdev = &vdev->pdev;
+    int ret;
+
+    ret = vmstate_load_state(f, &vmstate_vfio_pci_config, vdev, 1);
+    if (ret) {
+        return ret;
+    }
+
+    vfio_pci_write_config(pdev, PCI_COMMAND,
+                          pci_get_word(pdev->config + PCI_COMMAND), 2);
+
+    if (msi_enabled(pdev)) {
+        vfio_msi_enable(vdev);
+    } else if (msix_enabled(pdev)) {
+        vfio_msix_enable(vdev);
+    }
+
+    return ret;
+}
+
 static VFIODeviceOps vfio_pci_ops = {
     .vfio_compute_needs_reset = vfio_pci_compute_needs_reset,
     .vfio_hot_reset_multi = vfio_pci_hot_reset_multi,
     .vfio_eoi = vfio_intx_eoi,
     .vfio_get_object = vfio_pci_get_object,
+    .vfio_save_config = vfio_pci_save_config,
+    .vfio_load_config = vfio_pci_load_config,
 };
 
 int vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index fe99c36a693a..ba6169cd926e 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -120,6 +120,8 @@ struct VFIODeviceOps {
     int (*vfio_hot_reset_multi)(VFIODevice *vdev);
     void (*vfio_eoi)(VFIODevice *vdev);
     Object *(*vfio_get_object)(VFIODevice *vdev);
+    void (*vfio_save_config)(VFIODevice *vdev, QEMUFile *f);
+    int (*vfio_load_config)(VFIODevice *vdev, QEMUFile *f);
 };
 
 typedef struct VFIOGroup {

[PULL v3 05/32] vfio: Add VM state change handler to know state of VM

2020-11-01 Thread Alex Williamson

From: Kirti Wankhede 

The VM state change handler is called whenever the VM's run state changes,
and the VFIO device state is changed based on the VM state.
Added read/write helper functions for the migration region.
Added a function to set device_state.
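The read/write helpers split every access into naturally aligned pread()/pwrite() calls of 8, 4, 2 or 1 bytes. Presumably this ensures fields of the migration region are never hit with a misaligned or oversized access. Here is the same splitting rule, extracted into a hypothetical pure function that records chunk sizes instead of doing I/O:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Split an access of @count bytes at @off into naturally aligned chunks
 * of 8, 4, 2 or 1 bytes, using the same rule as vfio_mig_rw(). Chunk
 * sizes go to @sizes (capacity @max). Returns the number of chunks, or
 * -1 if @sizes is too small.
 */
static int example_split_access(size_t count, off_t off, int *sizes, int max)
{
    int n = 0;

    while (count) {
        int bytes;

        if (count >= 8 && !(off % 8)) {
            bytes = 8;
        } else if (count >= 4 && !(off % 4)) {
            bytes = 4;
        } else if (count >= 2 && !(off % 2)) {
            bytes = 2;
        } else {
            bytes = 1;
        }
        if (n == max) {
            return -1;
        }
        sizes[n++] = bytes;
        count -= bytes;
        off += bytes;
    }
    return n;
}
```

For example, 13 bytes starting at offset 3 become a 1-byte, a 4-byte and an 8-byte access.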

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Cornelia Huck 
[aw: lx -> HWADDR_PRIx, remove redundant parens]
Signed-off-by: Alex Williamson 
---
 hw/vfio/migration.c   |  160 +
 hw/vfio/trace-events  |2 +
 include/hw/vfio/vfio-common.h |4 +
 3 files changed, 166 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index fd7faf423cdc..e1ffae05e288 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -10,6 +10,7 @@
 #include "qemu/osdep.h"
 #include <linux/vfio.h>
 
+#include "sysemu/runstate.h"
 #include "hw/vfio/vfio-common.h"
 #include "cpu.h"
 #include "migration/migration.h"
@@ -22,6 +23,157 @@
 #include "exec/ram_addr.h"
 #include "pci.h"
 #include "trace.h"
+#include "hw/hw.h"
+
+static inline int vfio_mig_access(VFIODevice *vbasedev, void *val, int count,
+                                  off_t off, bool iswrite)
+{
+    int ret;
+
+    ret = iswrite ? pwrite(vbasedev->fd, val, count, off) :
+                    pread(vbasedev->fd, val, count, off);
+    if (ret < count) {
+        error_report("vfio_mig_%s %d byte %s: failed at offset 0x%"
+                     HWADDR_PRIx", err: %s", iswrite ? "write" : "read", count,
+                     vbasedev->name, off, strerror(errno));
+        return (ret < 0) ? ret : -EINVAL;
+    }
+    return 0;
+}
+
+static int vfio_mig_rw(VFIODevice *vbasedev, __u8 *buf, size_t count,
+                       off_t off, bool iswrite)
+{
+    int ret, done = 0;
+    __u8 *tbuf = buf;
+
+    while (count) {
+        int bytes = 0;
+
+        if (count >= 8 && !(off % 8)) {
+            bytes = 8;
+        } else if (count >= 4 && !(off % 4)) {
+            bytes = 4;
+        } else if (count >= 2 && !(off % 2)) {
+            bytes = 2;
+        } else {
+            bytes = 1;
+        }
+
+        ret = vfio_mig_access(vbasedev, tbuf, bytes, off, iswrite);
+        if (ret) {
+            return ret;
+        }
+
+        count -= bytes;
+        done += bytes;
+        off += bytes;
+        tbuf += bytes;
+    }
+    return done;
+}
+
+#define vfio_mig_read(f, v, c, o)   vfio_mig_rw(f, (__u8 *)v, c, o, false)
+#define vfio_mig_write(f, v, c, o)  vfio_mig_rw(f, (__u8 *)v, c, o, true)
+
+#define VFIO_MIG_STRUCT_OFFSET(f)       \
+                                 offsetof(struct vfio_device_migration_info, f)
+/*
+ * Change the device_state register for device @vbasedev. Bits set in @mask
+ * are preserved, bits set in @value are set, and bits not set in either @mask
+ * or @value are cleared in device_state. If the register cannot be accessed,
+ * the resulting state would be invalid, or the device enters an error state,
+ * an error is returned.
+ */
+
+static int vfio_migration_set_state(VFIODevice *vbasedev, uint32_t mask,
+                                    uint32_t value)
+{
+    VFIOMigration *migration = vbasedev->migration;
+    VFIORegion *region = &migration->region;
+    off_t dev_state_off = region->fd_offset +
+                          VFIO_MIG_STRUCT_OFFSET(device_state);
+    uint32_t device_state;
+    int ret;
+
+    ret = vfio_mig_read(vbasedev, &device_state, sizeof(device_state),
+                        dev_state_off);
+    if (ret < 0) {
+        return ret;
+    }
+
+    device_state = (device_state & mask) | value;
+
+    if (!VFIO_DEVICE_STATE_VALID(device_state)) {
+        return -EINVAL;
+    }
+
+    ret = vfio_mig_write(vbasedev, &device_state, sizeof(device_state),
+                         dev_state_off);
+    if (ret < 0) {
+        int rret;
+
+        rret = vfio_mig_read(vbasedev, &device_state, sizeof(device_state),
+                             dev_state_off);
+
+        if ((rret < 0) || (VFIO_DEVICE_STATE_IS_ERROR(device_state))) {
+            hw_error("%s: Device in error state 0x%x", vbasedev->name,
+                     device_state);
+            return rret ? rret : -EIO;
+        }
+        return ret;
+    }
+
+    migration->device_state = device_state;
+    trace_vfio_migration_set_state(vbasedev->name, device_state);
+    return 0;
+}
+
+static void vfio_vmstate_change(void *opaque, int running, RunState state)
+{
+    VFIODevice *vbasedev = opaque;
+    VFIOMigration *migration = vbasedev->migration;
+    uint32_t value, mask;
+    int ret;
+
+    if (vbasedev->migration->vm_running == running) {
+        return;
+    }
+
+    if (running) {
+        /*
+         * Here device state can have one of _SAVING, _RESUMING or _STOP bit.
+         * Transition from _SAVING to _RUNNING can happen if there is migration
+         * failure, in that case clear _SAVING bit.
+         * Transition from _RESUMING to _RUNNING occurs during resuming
+         * p
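The mask/value rule documented above vfio_migration_set_state() can be checked in isolation. This sketch assumes the v1 VFIO migration uAPI state bits and a validity rule under which _RESUMING may not be combined with any other bit, as understood from VFIO_DEVICE_STATE_VALID(). The `EX_*`/`ex_*` names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* State bits, assumed to follow the v1 VFIO migration uAPI. */
#define EX_STATE_RUNNING  (1u << 0)
#define EX_STATE_SAVING   (1u << 1)
#define EX_STATE_RESUMING (1u << 2)
#define EX_STATE_MASK     (EX_STATE_RUNNING | EX_STATE_SAVING | EX_STATE_RESUMING)

/* _RESUMING must not be combined with _RUNNING or _SAVING. */
static bool ex_state_valid(uint32_t s)
{
    return (s & EX_STATE_RESUMING) ? (s & EX_STATE_MASK) == EX_STATE_RESUMING
                                   : true;
}

/*
 * Bits in @mask are preserved, bits in @value are set, all others are
 * cleared. Returns false, leaving *out untouched, if the result would be
 * an invalid state.
 */
static bool ex_apply_state(uint32_t cur, uint32_t mask, uint32_t value,
                           uint32_t *out)
{
    uint32_t next = (cur & mask) | value;

    if (!ex_state_valid(next)) {
        return false;
    }
    *out = next;
    return true;
}
```

With these definitions, the call made by .save_setup (mask = all state bits, value = _SAVING) turns a running device into _SAVING | _RUNNING. The cancel/failure path (mask = ~(_SAVING | _RESUMING), value = _RUNNING) brings it back to _RUNNING.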

[PULL v3 07/32] vfio: Register SaveVMHandlers for VFIO device

2020-11-01 Thread Alex Williamson

From: Kirti Wankhede 

Define flags to be used as delimiters in the migration stream for VFIO devices.
Added .save_setup and .save_cleanup functions, which map and unmap the
migration region at the source during the saving or pre-copy phase.

Set the VFIO device state depending on the VM's state. During live migration
the VM is running when .save_setup is called, so _SAVING | _RUNNING is set for
the VFIO device. During save-restore the VM is paused, so only _SAVING is set.
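The patch comment describes how each delimiter is built: all-ones upper 32 bits, the 0xef10 magic, and a 16-bit field. That gives 64-bit values such as 0xffffffffef100001, which qemu_put_be64() writes in big-endian order. Below is a hypothetical sketch that composes the values and recognizes the prefix. It also serializes the SETUP/END pair that vfio_save_setup() emits:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All-ones upper 32 bits, 0xef10 magic, then a 16-bit code. */
#define EX_VFIO_MIG_FLAG(code) \
    ((UINT64_C(0xffffffff) << 32) | (UINT64_C(0xef10) << 16) | (uint64_t)(code))

#define EX_FLAG_END_OF_STATE     EX_VFIO_MIG_FLAG(1)
#define EX_FLAG_DEV_SETUP_STATE  EX_VFIO_MIG_FLAG(3)

/* True if @v carries the VFIO delimiter prefix in its upper 48 bits. */
static int ex_is_vfio_flag(uint64_t v)
{
    return (v >> 16) == UINT64_C(0xffffffffef10);
}

/* Big-endian 64-bit store, the byte order used by qemu_put_be64(). */
static void ex_put_be64(unsigned char *buf, uint64_t v)
{
    int i;

    for (i = 0; i < 8; i++) {
        buf[i] = (unsigned char)(v >> (56 - 8 * i));
    }
}
```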

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
Reviewed-by: Cornelia Huck 
Reviewed-by: Yan Zhao 
Signed-off-by: Alex Williamson 
---
 hw/vfio/migration.c  |  102 ++
 hw/vfio/trace-events |2 +
 2 files changed, 104 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 7ec85b6469c5..ca6fd896655b 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -8,12 +8,15 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/main-loop.h"
+#include "qemu/cutils.h"
 #include <linux/vfio.h>
 
 #include "sysemu/runstate.h"
 #include "hw/vfio/vfio-common.h"
 #include "cpu.h"
 #include "migration/migration.h"
+#include "migration/vmstate.h"
 #include "migration/qemu-file.h"
 #include "migration/register.h"
 #include "migration/blocker.h"
@@ -25,6 +28,22 @@
 #include "trace.h"
 #include "hw/hw.h"
 
+/*
+ * Flags to be used as unique delimiters for VFIO devices in the migration
+ * stream. These flags are composed as:
+ * 0xffffffff => MSB 32-bit all 1s
+ * 0xef10     => Magic ID, represents emulated (virtual) function IO
+ * 0x0000     => 16-bits reserved for flags
+ *
+ * The beginning of state information is marked by _DEV_CONFIG_STATE,
+ * _DEV_SETUP_STATE, or _DEV_DATA_STATE, respectively. The end of a
+ * certain state information is marked by _END_OF_STATE.
+ */
+#define VFIO_MIG_FLAG_END_OF_STATE      (0xffffffffef100001ULL)
+#define VFIO_MIG_FLAG_DEV_CONFIG_STATE  (0xffffffffef100002ULL)
+#define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xffffffffef100003ULL)
+#define VFIO_MIG_FLAG_DEV_DATA_STATE    (0xffffffffef100004ULL)
+
 static inline int vfio_mig_access(VFIODevice *vbasedev, void *val, int count,
                                   off_t off, bool iswrite)
 {
@@ -129,6 +148,75 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, uint32_t mask,
     return 0;
 }
 
+static void vfio_migration_cleanup(VFIODevice *vbasedev)
+{
+    VFIOMigration *migration = vbasedev->migration;
+
+    if (migration->region.mmaps) {
+        vfio_region_unmap(&migration->region);
+    }
+}
+
+/* -- */
+
+static int vfio_save_setup(QEMUFile *f, void *opaque)
+{
+    VFIODevice *vbasedev = opaque;
+    VFIOMigration *migration = vbasedev->migration;
+    int ret;
+
+    trace_vfio_save_setup(vbasedev->name);
+
+    qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE);
+
+    if (migration->region.mmaps) {
+        /*
+         * Calling vfio_region_mmap() from migration thread. Memory API called
+         * from this function require locking the iothread when called from
+         * outside the main loop thread.
+         */
+        qemu_mutex_lock_iothread();
+        ret = vfio_region_mmap(&migration->region);
+        qemu_mutex_unlock_iothread();
+        if (ret) {
+            error_report("%s: Failed to mmap VFIO migration region: %s",
+                         vbasedev->name, strerror(-ret));
+            error_report("%s: Falling back to slow path", vbasedev->name);
+        }
+    }
+
+    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_MASK,
+                                   VFIO_DEVICE_STATE_SAVING);
+    if (ret) {
+        error_report("%s: Failed to set state SAVING", vbasedev->name);
+        return ret;
+    }
+
+    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
+
+    ret = qemu_file_get_error(f);
+    if (ret) {
+        return ret;
+    }
+
+    return 0;
+}
+
+static void vfio_save_cleanup(void *opaque)
+{
+    VFIODevice *vbasedev = opaque;
+
+    vfio_migration_cleanup(vbasedev);
+    trace_vfio_save_cleanup(vbasedev->name);
+}
+
+static SaveVMHandlers savevm_vfio_handlers = {
+    .save_setup = vfio_save_setup,
+    .save_cleanup = vfio_save_cleanup,
+};
+
+/* -- */
+
 static void vfio_vmstate_change(void *opaque, int running, RunState state)
 {
     VFIODevice *vbasedev = opaque;
@@ -215,6 +303,8 @@ static int vfio_migration_init(VFIODevice *vbasedev,
     int ret;
     Object *obj;
     VFIOMigration *migration;
+    char id[256] = "";
+    g_autofree char *path = NULL, *oid = NULL;
 
     if (!vbasedev->ops->vfio_get_object) {
         return -EINVAL;
@@ -244,6 +334,18 @@ static int vfio_migration_init(VFIODevice *vbasedev,
 
     migration = vbasedev->migration;
     migration->vbasedev = vbasedev;
+
+    oid = vmstate_if_get_id(VMSTATE_IF(DEVICE(obj)));
+    if (oid) {
+        path = g_strdup_printf("%s/vfio", oid);
+    } else

[PULL v3 09/32] vfio: Add load state functions to SaveVMHandlers

2020-11-01 Thread Alex Williamson

From: Kirti Wankhede 

Sequence during _RESUMING device state:
While data for this device is available, repeat the steps below:
a. Read data_offset, which tells the user application where to write data.
b. Write data_size bytes of data to the migration region at data_offset.
c. Write data_size, which indicates to the vendor driver that data has been
   written to the staging buffer.

Data is opaque to the user. The user should write data in the same order
as it was received.
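The resume loop above can be modeled outside QEMU. The sketch below is a hypothetical, simplified model (not QEMU code) of how vfio_load_buffer() in this patch splits one incoming data_size into several writes when the destination migration region is smaller than the source's. It assumes the vendor driver reports the same data_offset on every iteration, whereas the real code re-reads data_offset each time; split_resume_writes() is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the _RESUMING loop: split one incoming data_size
 * into writes that fit between data_offset and the end of the destination
 * region. Each chunks[i] is the data_size value reported back to the
 * vendor driver in step c. Returns the number of chunks.
 * Assumption: data_offset is the same on every iteration.
 */
static int split_resume_writes(uint64_t region_size, uint64_t data_offset,
                               uint64_t data_size,
                               uint64_t *chunks, int max_chunks)
{
    int n = 0;

    assert(data_offset < region_size);
    do {
        uint64_t size;

        if (data_offset + data_size > region_size) {
            /* Destination region is smaller: fill it up to its end. */
            size = region_size - data_offset;
        } else {
            size = data_size;
        }
        data_size -= size;

        assert(n < max_chunks);
        chunks[n++] = size;     /* steps a-c for this chunk */
    } while (data_size);

    return n;
}
```

With a 0x1000-byte region and data_offset 0x400, a 0x1000-byte payload needs two round trips (0xc00 bytes, then 0x400 bytes).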

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Yan Zhao 
Signed-off-by: Alex Williamson 
---
 hw/vfio/migration.c  |  195 ++
 hw/vfio/trace-events |4 +
 2 files changed, 199 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 5e0c9e8e61ec..1af0fce874d4 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -257,6 +257,77 @@ static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev, uint64_t *size)
     return ret;
 }
 
+static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
+                            uint64_t data_size)
+{
+    VFIORegion *region = &vbasedev->migration->region;
+    uint64_t data_offset = 0, size, report_size;
+    int ret;
+
+    do {
+        ret = vfio_mig_read(vbasedev, &data_offset, sizeof(data_offset),
+                      region->fd_offset + VFIO_MIG_STRUCT_OFFSET(data_offset));
+        if (ret < 0) {
+            return ret;
+        }
+
+        if (data_offset + data_size > region->size) {
+            /*
+             * If data_size is greater than the data section of migration region
+             * then iterate the write buffer operation. This case can occur if
+             * size of migration region at destination is smaller than size of
+             * migration region at source.
+             */
+            report_size = size = region->size - data_offset;
+            data_size -= size;
+        } else {
+            report_size = size = data_size;
+            data_size = 0;
+        }
+
+        trace_vfio_load_state_device_data(vbasedev->name, data_offset, size);
+
+        while (size) {
+            void *buf;
+            uint64_t sec_size;
+            bool buf_alloc = false;
+
+            buf = get_data_section_size(region, data_offset, size, &sec_size);
+
+            if (!buf) {
+                buf = g_try_malloc(sec_size);
+                if (!buf) {
+                    error_report("%s: Error allocating buffer ", __func__);
+                    return -ENOMEM;
+                }
+                buf_alloc = true;
+            }
+
+            qemu_get_buffer(f, buf, sec_size);
+
+            if (buf_alloc) {
+                ret = vfio_mig_write(vbasedev, buf, sec_size,
+                        region->fd_offset + data_offset);
+                g_free(buf);
+
+                if (ret < 0) {
+                    return ret;
+                }
+            }
+            size -= sec_size;
+            data_offset += sec_size;
+        }
+
+        ret = vfio_mig_write(vbasedev, &report_size, sizeof(report_size),
+                        region->fd_offset + VFIO_MIG_STRUCT_OFFSET(data_size));
+        if (ret < 0) {
+            return ret;
+        }
+    } while (data_size);
+
+    return 0;
+}
+
 static int vfio_update_pending(VFIODevice *vbasedev)
 {
     VFIOMigration *migration = vbasedev->migration;
@@ -293,6 +364,33 @@ static int vfio_save_device_config_state(QEMUFile *f, void *opaque)
     return qemu_file_get_error(f);
 }
 
+static int vfio_load_device_config_state(QEMUFile *f, void *opaque)
+{
+    VFIODevice *vbasedev = opaque;
+    uint64_t data;
+
+    if (vbasedev->ops && vbasedev->ops->vfio_load_config) {
+        int ret;
+
+        ret = vbasedev->ops->vfio_load_config(vbasedev, f);
+        if (ret) {
+            error_report("%s: Failed to load device config space",
+                         vbasedev->name);
+            return ret;
+        }
+    }
+
+    data = qemu_get_be64(f);
+    if (data != VFIO_MIG_FLAG_END_OF_STATE) {
+        error_report("%s: Failed loading device config space, "
+                     "end flag incorrect 0x%"PRIx64, vbasedev->name, data);
+        return -EINVAL;
+    }
+
+    trace_vfio_load_device_config_state(vbasedev->name);
+    return qemu_file_get_error(f);
+}
+
 static void vfio_migration_cleanup(VFIODevice *vbasedev)
 {
     VFIOMigration *migration = vbasedev->migration;
@@ -483,12 +581,109 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
     return ret;
 }
 
+static int vfio_load_setup(QEMUFile *f, void *opaque)
+{
+    VFIODevice *vbasedev = opaque;
+    VFIOMigration *migration = vbasedev->migration;
+    int ret = 0;
+
+    if (migration->region.mmaps) {
+        ret = vfio_region_mmap(&migration->region);
+        if (ret) {
+            error_report("%s: Failed to mmap VFIO migration region %d: %s",
+                         vbasedev->name, migration->region.nr,
+                         strerro

[PULL v3 08/32] vfio: Add save state functions to SaveVMHandlers

2020-11-01 Thread Alex Williamson

From: Kirti Wankhede 

Added .save_live_pending, .save_live_iterate and .save_live_complete_precopy
functions. These functions handle the pre-copy and stop-and-copy phases.

In _SAVING|_RUNNING device state or pre-copy phase:
- read pending_bytes. If pending_bytes > 0, go through the steps below.
- read data_offset, which tells the kernel driver to write data to the
  staging buffer.
- read data_size, the amount of data in bytes written by the vendor driver
  in the migration region.
- read data_size bytes of data from data_offset in the migration region.
- Write the data packet to the file stream as below:
  {VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data,
   VFIO_MIG_FLAG_END_OF_STATE}
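As a rough illustration of the pre-copy packet layout listed above, here is a standalone sketch that serializes one packet into a byte buffer. MIG_FLAG(), put_be64() and encode_precopy_packet() are hypothetical names. The delimiter values follow the 0xffffffff | 0xef10 | flag composition described in this series, and put_be64() only imitates the big-endian encoding that qemu_put_be64() produces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Delimiters: 0xffffffff (MSB 32 bits) | 0xef10 (magic) | 16-bit flag. */
#define MIG_FLAG(n) ((0xffffffffULL << 32) | (0xef10ULL << 16) | (uint64_t)(n))
#define MIG_FLAG_END_OF_STATE   MIG_FLAG(1)
#define MIG_FLAG_DEV_DATA_STATE MIG_FLAG(4)

/* Big-endian 64-bit store, in the spirit of qemu_put_be64(). */
static size_t put_be64(uint8_t *out, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(v >> (56 - 8 * i));
    }
    return 8;
}

/*
 * Hypothetical helper: serialize one pre-copy packet
 * {DEV_DATA_STATE, data_size, data..., END_OF_STATE} into out.
 * out must hold len + 24 bytes. Returns the number of bytes written.
 */
static size_t encode_precopy_packet(uint8_t *out, const uint8_t *data,
                                    uint64_t len)
{
    size_t pos = 0;

    pos += put_be64(out + pos, MIG_FLAG_DEV_DATA_STATE);
    pos += put_be64(out + pos, len);
    memcpy(out + pos, data, (size_t)len);
    pos += (size_t)len;
    pos += put_be64(out + pos, MIG_FLAG_END_OF_STATE);
    return pos;
}
```

A 3-byte payload produces 8 + 8 + 3 + 8 = 27 bytes. Because the upper 48 bits of every delimiter are fixed, a reader can tell a delimiter apart from a plausible data_size.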

In _SAVING device state or stop-and-copy phase:
a. read the device's config space and save it to the migration file stream.
   This doesn't need to come from the vendor driver. Any other special config
   state from the driver can be saved as data in a following iteration.
b. read pending_bytes. If pending_bytes > 0, go through the steps below.
c. read data_offset, which tells the kernel driver to write data to the
   staging buffer.
d. read data_size, the amount of data in bytes written by the vendor driver
   in the migration region.
e. read data_size bytes of data from data_offset in the migration region.
f. Write the data packet as below:
   {VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data}
g. iterate through steps b to f while (pending_bytes > 0)
h. Write {VFIO_MIG_FLAG_END_OF_STATE}
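Steps b through h can be sketched as a loop over a mock device. MockDevice, MockStream and save_stop_and_copy() are hypothetical stand-ins. Config space (step a) is skipped. In this model, pending_bytes is simply decremented, while the real code asks the vendor driver for an updated value. The sketch shows that a single END_OF_STATE closes the whole stop-and-copy phase, unlike the per-packet terminator in pre-copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MIG_FLAG(n) ((0xffffffffULL << 32) | (0xef10ULL << 16) | (uint64_t)(n))
#define MIG_FLAG_END_OF_STATE   MIG_FLAG(1)
#define MIG_FLAG_DEV_DATA_STATE MIG_FLAG(4)

/* Hypothetical device: bytes still pending, and how much the vendor
 * driver stages per iteration (the data_size it reports). */
typedef struct {
    uint64_t pending_bytes;
    uint64_t staging_size;
} MockDevice;

/* Hypothetical stream: records the 64-bit words written and counts
 * payload bytes instead of copying them. */
typedef struct {
    uint64_t words[64];
    size_t nwords;
    uint64_t payload_bytes;
} MockStream;

static void put_u64(MockStream *s, uint64_t v)
{
    assert(s->nwords < 64);
    s->words[s->nwords++] = v;
}

/* Steps b-h of stop-and-copy; step a (config space) is omitted. */
static void save_stop_and_copy(MockDevice *dev, MockStream *s)
{
    assert(dev->staging_size > 0);
    while (dev->pending_bytes > 0) {                /* b, g */
        uint64_t data_size = dev->pending_bytes < dev->staging_size
                             ? dev->pending_bytes
                             : dev->staging_size;   /* c, d */

        put_u64(s, MIG_FLAG_DEV_DATA_STATE);        /* f */
        put_u64(s, data_size);
        s->payload_bytes += data_size;              /* e: the data itself */
        dev->pending_bytes -= data_size;
    }
    put_u64(s, MIG_FLAG_END_OF_STATE);              /* h */
}
```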

When the data region is mapped, it is the user's responsibility to read
data_size bytes of data from data_offset before moving to the next steps.
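When the region is only partially mmap'able, each chunk is either accessed through a mapped area or copied through read()/write(). The sketch below restates the lookup performed by get_data_section_size() in this patch using hypothetical types (DemoArea, section_size()). Unlike the original, it does not handle the case where the region has no mmaps at all.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mmap'ed area covering [offset, offset + size) of the region. */
typedef struct {
    uint64_t offset;
    uint64_t size;
} DemoArea;

#define MIN_U64(a, b) ((a) < (b) ? (a) : (b))

/*
 * Returns 1 if data_offset lies inside a mapped area (access it directly),
 * 0 if it must go through read()/write(). *chunk receives how many bytes
 * can be handled before crossing an area boundary.
 */
static int section_size(const DemoArea *areas, int n, uint64_t data_offset,
                        uint64_t data_size, uint64_t *chunk)
{
    uint64_t limit = 0;

    for (int i = 0; i < n; i++) {
        const DemoArea *a = &areas[i];

        if (data_offset >= a->offset && data_offset < a->offset + a->size) {
            *chunk = MIN_U64(data_size, a->offset + a->size - data_offset);
            return 1;
        }
        /* The list is unsorted: remember the nearest area after data_offset. */
        if (data_offset < a->offset && (!limit || limit > a->offset)) {
            limit = a->offset;
        }
    }
    /* Unmapped gap: stop at the next mapped area, if any. */
    *chunk = limit ? MIN_U64(data_size, limit - data_offset) : data_size;
    return 0;
}
```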

Added fix suggested by Artem Polyakov to reset pending_bytes in
vfio_save_iterate().
Added fix suggested by Zhi Wang to add 0 as the data size in the migration
stream and to add the END_OF_STATE delimiter to indicate that the phase is
complete.

Suggested-by: Artem Polyakov 
Suggested-by: Zhi Wang 
Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
Reviewed-by: Yan Zhao 
Signed-off-by: Alex Williamson 
---
 hw/vfio/migration.c   |  276 +
 hw/vfio/trace-events  |6 +
 include/hw/vfio/vfio-common.h |1 
 3 files changed, 283 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index ca6fd896655b..5e0c9e8e61ec 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -148,6 +148,151 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, uint32_t mask,
     return 0;
 }
 
+static void *get_data_section_size(VFIORegion *region, uint64_t data_offset,
+                                   uint64_t data_size, uint64_t *size)
+{
+    void *ptr = NULL;
+    uint64_t limit = 0;
+    int i;
+
+    if (!region->mmaps) {
+        if (size) {
+            *size = MIN(data_size, region->size - data_offset);
+        }
+        return ptr;
+    }
+
+    for (i = 0; i < region->nr_mmaps; i++) {
+        VFIOMmap *map = region->mmaps + i;
+
+        if ((data_offset >= map->offset) &&
+            (data_offset < map->offset + map->size)) {
+
+            /* check if data_offset is within sparse mmap areas */
+            ptr = map->mmap + data_offset - map->offset;
+            if (size) {
+                *size = MIN(data_size, map->offset + map->size - data_offset);
+            }
+            break;
+        } else if ((data_offset < map->offset) &&
+                   (!limit || limit > map->offset)) {
+            /*
+             * data_offset is not within sparse mmap areas, find size of
+             * non-mapped area. Check through all list since region->mmaps list
+             * is not sorted.
+             */
+            limit = map->offset;
+        }
+    }
+
+    if (!ptr && size) {
+        *size = limit ? MIN(data_size, limit - data_offset) : data_size;
+    }
+    return ptr;
+}
+
+static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev, uint64_t *size)
+{
+    VFIOMigration *migration = vbasedev->migration;
+    VFIORegion *region = &migration->region;
+    uint64_t data_offset = 0, data_size = 0, sz;
+    int ret;
+
+    ret = vfio_mig_read(vbasedev, &data_offset, sizeof(data_offset),
+                      region->fd_offset + VFIO_MIG_STRUCT_OFFSET(data_offset));
+    if (ret < 0) {
+        return ret;
+    }
+
+    ret = vfio_mig_read(vbasedev, &data_size, sizeof(data_size),
+                        region->fd_offset + VFIO_MIG_STRUCT_OFFSET(data_size));
+    if (ret < 0) {
+        return ret;
+    }
+
+    trace_vfio_save_buffer(vbasedev->name, data_offset, data_size,
+                           migration->pending_bytes);
+
+    qemu_put_be64(f, data_size);
+    sz = data_size;
+
+    while (sz) {
+        void *buf;
+        uint64_t sec_size;
+        bool buf_allocated = false;
+
+        buf = get_data_section_size(region, data_offset, sz, &sec_size);
+
+        if (!buf) {
+            buf = g_try_malloc(sec_size);
+            if (!buf) {
+                error_report("%s: Error allocating buffer ", __func__);
+                return -ENOMEM;
+            }
+            buf_allocated = true;
+
+            ret

[PULL v3 12/32] vfio: Add function to start and stop dirty pages tracking

2020-11-01 Thread Alex Williamson

From: Kirti Wankhede 

Call VFIO_IOMMU_DIRTY_PAGES ioctl to start and stop dirty pages tracking
for VFIO devices.
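The start/stop rule that vfio_set_dirty_page_tracking() implements in this patch reduces to a small decision. In this sketch, the DEMO_* bit values are illustrative local stand-ins for the constants in <linux/vfio.h>, and dirty_tracking_flag() is a hypothetical helper that returns the flag the VFIO_IOMMU_DIRTY_PAGES ioctl would receive.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative local stand-ins; the real values come from <linux/vfio.h>. */
#define DEMO_STATE_RUNNING (1u << 0)
#define DEMO_STATE_SAVING  (1u << 1)
#define DEMO_DIRTY_START   (1u << 0)
#define DEMO_DIRTY_STOP    (1u << 1)

/*
 * Stopping is always allowed; starting requires the device to already be
 * in a _SAVING state. Returns the ioctl flag, or -EINVAL.
 */
static int dirty_tracking_flag(uint32_t device_state, bool start)
{
    if (!start) {
        return (int)DEMO_DIRTY_STOP;
    }
    if (device_state & DEMO_STATE_SAVING) {
        return (int)DEMO_DIRTY_START;
    }
    return -EINVAL;
}
```

In the diff below, tracking is started from vfio_save_setup() after the device enters _SAVING, and it is stopped from vfio_migration_cleanup().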

Signed-off-by: Kirti Wankhede 
Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Alex Williamson 
---
 hw/vfio/migration.c |   36 
 1 file changed, 36 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 39503b49e33d..a248effb3786 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -11,6 +11,7 @@
 #include "qemu/main-loop.h"
 #include "qemu/cutils.h"
 #include 
+#include 
 
 #include "sysemu/runstate.h"
 #include "hw/vfio/vfio-common.h"
@@ -391,10 +392,40 @@ static int vfio_load_device_config_state(QEMUFile *f, void *opaque)
     return qemu_file_get_error(f);
 }
 
+static int vfio_set_dirty_page_tracking(VFIODevice *vbasedev, bool start)
+{
+    int ret;
+    VFIOMigration *migration = vbasedev->migration;
+    VFIOContainer *container = vbasedev->group->container;
+    struct vfio_iommu_type1_dirty_bitmap dirty = {
+        .argsz = sizeof(dirty),
+    };
+
+    if (start) {
+        if (migration->device_state & VFIO_DEVICE_STATE_SAVING) {
+            dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START;
+        } else {
+            return -EINVAL;
+        }
+    } else {
+        dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP;
+    }
+
+    ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, &dirty);
+    if (ret) {
+        error_report("Failed to set dirty tracking flag 0x%x errno: %d",
+                     dirty.flags, errno);
+        return -errno;
+    }
+    return ret;
+}
+
 static void vfio_migration_cleanup(VFIODevice *vbasedev)
 {
     VFIOMigration *migration = vbasedev->migration;
 
+    vfio_set_dirty_page_tracking(vbasedev, false);
+
     if (migration->region.mmaps) {
         vfio_region_unmap(&migration->region);
     }
@@ -435,6 +466,11 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
         return ret;
     }
 
+    ret = vfio_set_dirty_page_tracking(vbasedev, true);
+    if (ret) {
+        return ret;
+    }
+
     qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
 
     ret = qemu_file_get_error(f);

[PULL v3 10/32] memory: Set DIRTY_MEMORY_MIGRATION when IOMMU is enabled

2020-11-01 Thread Alex Williamson

From: Kirti Wankhede 

mr->ram_block is NULL when mr->is_iommu is true, so fr.dirty_log_mask
wasn't set correctly and, as a result, the memory listener's log_sync
callback didn't get called.
This patch returns log_mask with DIRTY_MEMORY_MIGRATION set for IOMMU
memory regions when global dirty logging is enabled.
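Here is a minimal model of the one-line change. It uses a hypothetical DemoRegion in place of MemoryRegion and assumes that the migration dirty-memory client is bit 2 of the mask. Before the patch, an IOMMU region (which has no ram_block) never gets the migration bit. After the patch, it does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit index of the migration dirty-memory client. */
#define DEMO_DIRTY_MEMORY_MIGRATION 2

typedef struct {
    uint8_t dirty_log_mask;
    bool has_ram_block;   /* stands in for mr->ram_block != NULL */
    bool is_iommu;        /* stands in for memory_region_is_iommu(mr) */
} DemoRegion;

/* Before the patch: only RAM-backed regions get the migration bit. */
static uint8_t log_mask_before(const DemoRegion *mr, bool global_dirty_log)
{
    uint8_t mask = mr->dirty_log_mask;

    if (global_dirty_log && mr->has_ram_block) {
        mask |= (1 << DEMO_DIRTY_MEMORY_MIGRATION);
    }
    return mask;
}

/* After the patch: IOMMU regions get it too, so log_sync is reached. */
static uint8_t log_mask_after(const DemoRegion *mr, bool global_dirty_log)
{
    uint8_t mask = mr->dirty_log_mask;

    if (global_dirty_log && (mr->has_ram_block || mr->is_iommu)) {
        mask |= (1 << DEMO_DIRTY_MEMORY_MIGRATION);
    }
    return mask;
}
```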

Signed-off-by: Kirti Wankhede 
Reviewed-by: Yan Zhao 
Acked-by: Paolo Bonzini 
Signed-off-by: Alex Williamson 
---
 softmmu/memory.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/softmmu/memory.c b/softmmu/memory.c
index ee4a6bc16859..21d533d8ed84 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1806,7 +1806,7 @@ bool memory_region_is_ram_device(MemoryRegion *mr)
 uint8_t memory_region_get_dirty_log_mask(MemoryRegion *mr)
 {
     uint8_t mask = mr->dirty_log_mask;
-    if (global_dirty_log && mr->ram_block) {
+    if (global_dirty_log && (mr->ram_block || memory_region_is_iommu(mr))) {
         mask |= (1 << DIRTY_MEMORY_MIGRATION);
     }
     return mask;

[PULL v3 13/32] vfio: Add vfio_listener_log_sync to mark dirty pages

2020-11-01 Thread Alex Williamson

From: Kirti Wankhede 

vfio_listener_log_sync gets the list of dirty pages from the container using
the VFIO_IOMMU_DIRTY_PAGES ioctl with VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP
and marks those pages dirty when all devices are stopped and saving state.
Return early for the RAM block section of a mapped MMIO region.
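The bitmap sizing in vfio_get_dirty_bitmap() is plain arithmetic: one bit per target page, padded to whole 64-bit words. This sketch assumes 4 KiB target pages. The DEMO_* macros and the two helpers are hypothetical stand-ins for TARGET_PAGE_ALIGN(), TARGET_PAGE_BITS and ROUND_UP().

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB target pages (TARGET_PAGE_BITS == 12). */
#define DEMO_PAGE_BITS 12
#define DEMO_PAGE_SIZE (1ULL << DEMO_PAGE_BITS)
#define DEMO_PAGE_ALIGN(x) (((x) + DEMO_PAGE_SIZE - 1) & ~(DEMO_PAGE_SIZE - 1))

/* Number of target pages covered by an IOVA range of `size` bytes. */
static uint64_t dirty_bitmap_pages(uint64_t size)
{
    return DEMO_PAGE_ALIGN(size) >> DEMO_PAGE_BITS;
}

/*
 * Bitmap size in bytes: one bit per page, rounded up to whole 64-bit
 * words, i.e. ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) / BITS_PER_BYTE.
 */
static uint64_t dirty_bitmap_bytes(uint64_t size)
{
    uint64_t pages = dirty_bitmap_pages(size);

    return ((pages + 63) / 64) * 64 / 8;
}
```

For example, a 1 GiB section covers 262144 pages and needs a 32 KiB bitmap. Because cpu_physical_memory_set_dirty_lebitmap() interprets each bit as one target page, the patch sets the bitmap pgsize to TARGET_PAGE_SIZE.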

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
[aw: fix error_report types, fix cpu_physical_memory_set_dirty_lebitmap() cast]
Signed-off-by: Alex Williamson 
---
 hw/vfio/common.c |  116 ++
 hw/vfio/trace-events |1 
 2 files changed, 117 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index d4959c036dd1..0a97fbfefb89 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -29,6 +29,7 @@
 #include "hw/vfio/vfio.h"
 #include "exec/address-spaces.h"
 #include "exec/memory.h"
+#include "exec/ram_addr.h"
 #include "hw/hw.h"
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
@@ -37,6 +38,7 @@
 #include "sysemu/reset.h"
 #include "trace.h"
 #include "qapi/error.h"
+#include "migration/migration.h"
 
 VFIOGroupList vfio_group_list =
     QLIST_HEAD_INITIALIZER(vfio_group_list);
@@ -286,6 +288,39 @@ const MemoryRegionOps vfio_region_ops = {
     },
 };
 
+/*
+ * Device state interfaces
+ */
+
+static bool vfio_devices_all_stopped_and_saving(VFIOContainer *container)
+{
+    VFIOGroup *group;
+    VFIODevice *vbasedev;
+    MigrationState *ms = migrate_get_current();
+
+    if (!migration_is_setup_or_active(ms->state)) {
+        return false;
+    }
+
+    QLIST_FOREACH(group, &container->group_list, container_next) {
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            VFIOMigration *migration = vbasedev->migration;
+
+            if (!migration) {
+                return false;
+            }
+
+            if ((migration->device_state & VFIO_DEVICE_STATE_SAVING) &&
+                !(migration->device_state & VFIO_DEVICE_STATE_RUNNING)) {
+                continue;
+            } else {
+                return false;
+            }
+        }
+    }
+    return true;
+}
+
 /*
  * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
  */
@@ -812,9 +847,90 @@ static void vfio_listener_region_del(MemoryListener *listener,
     }
 }
 
+static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
+                                 uint64_t size, ram_addr_t ram_addr)
+{
+    struct vfio_iommu_type1_dirty_bitmap *dbitmap;
+    struct vfio_iommu_type1_dirty_bitmap_get *range;
+    uint64_t pages;
+    int ret;
+
+    dbitmap = g_malloc0(sizeof(*dbitmap) + sizeof(*range));
+
+    dbitmap->argsz = sizeof(*dbitmap) + sizeof(*range);
+    dbitmap->flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP;
+    range = (struct vfio_iommu_type1_dirty_bitmap_get *)&dbitmap->data;
+    range->iova = iova;
+    range->size = size;
+
+    /*
+     * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of
+     * TARGET_PAGE_SIZE to mark those dirty. Hence set bitmap's pgsize to
+     * TARGET_PAGE_SIZE.
+     */
+    range->bitmap.pgsize = TARGET_PAGE_SIZE;
+
+    pages = TARGET_PAGE_ALIGN(range->size) >> TARGET_PAGE_BITS;
+    range->bitmap.size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) /
+                                         BITS_PER_BYTE;
+    range->bitmap.data = g_try_malloc0(range->bitmap.size);
+    if (!range->bitmap.data) {
+        ret = -ENOMEM;
+        goto err_out;
+    }
+
+    ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, dbitmap);
+    if (ret) {
+        error_report("Failed to get dirty bitmap for iova: 0x%"PRIx64
+                " size: 0x%"PRIx64" err: %d", (uint64_t)range->iova,
+                (uint64_t)range->size, errno);
+        goto err_out;
+    }
+
+    cpu_physical_memory_set_dirty_lebitmap((unsigned long *)range->bitmap.data,
+                                            ram_addr, pages);
+
+    trace_vfio_get_dirty_bitmap(container->fd, range->iova, range->size,
+                                range->bitmap.size, ram_addr);
+err_out:
+    g_free(range->bitmap.data);
+    g_free(dbitmap);
+
+    return ret;
+}
+
+static int vfio_sync_dirty_bitmap(VFIOContainer *container,
+                                  MemoryRegionSection *section)
+{
+    ram_addr_t ram_addr;
+
+    ram_addr = memory_region_get_ram_addr(section->mr) +
+               section->offset_within_region;
+
+    return vfio_get_dirty_bitmap(container,
+                   TARGET_PAGE_ALIGN(section->offset_within_address_space),
+                   int128_get64(section->size), ram_addr);
+}
+
+static void vfio_listerner_log_sync(MemoryListener *listener,
+                                    MemoryRegionSection *section)
+{
+    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
+
+    if (vfio_listener_skipped_section(section) ||
+        !container->dirty_pages_supported) {
+        return;
+    }
+
+    if (vfio_devices_all_stopped_and_saving(container)) {
+        vfio_sync_dirty_bitmap(container, section);
+

[PULL v3 18/32] update-linux-headers: Add vfio_zdev.h

2020-11-01 Thread Alex Williamson

From: Matthew Rosato 

vfio_zdev.h is used by s390x zPCI support to pass device-specific
CLP information between host and userspace.

Signed-off-by: Matthew Rosato 
Acked-by: Cornelia Huck 
Signed-off-by: Alex Williamson 
---
 scripts/update-linux-headers.sh |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index 29c27f468177..9efbaf2f84b3 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -141,7 +141,7 @@ done
 
 rm -rf "$output/linux-headers/linux"
 mkdir -p "$output/linux-headers/linux"
-for header in kvm.h vfio.h vfio_ccw.h vhost.h \
+for header in kvm.h vfio.h vfio_ccw.h vfio_zdev.h vhost.h \
               psci.h psp-sev.h userfaultfd.h mman.h; do
     cp "$tmpdir/include/linux/$header" "$output/linux-headers/linux"
 done

[PULL v3 14/32] vfio: Dirty page tracking when vIOMMU is enabled

2020-11-01 Thread Alex Williamson

From: Kirti Wankhede 

When a vIOMMU is enabled, register a MAP notifier from log_sync when all
devices in the container are in the stop-and-copy phase of migration. Call
replay and get the dirty pages from the notifier callback.
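Conceptually, replaying the guest IOMMU turns each IOVA mapping into a ram_addr_t range and copies the dirty bits across. The sketch below is a hypothetical model of that translation. DemoMapping stands in for the IOMMUTLBEntry the notifier receives, and a single precomputed host bitmap replaces the per-mapping VFIO_IOMMU_DIRTY_PAGES query that vfio_get_dirty_bitmap() issues. It assumes 4 KiB pages.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_BITS 12   /* assumption: 4 KiB pages */

/* Hypothetical view of one mapping delivered to the MAP notifier on replay. */
typedef struct {
    uint64_t iova;       /* guest IOVA (iotlb->iova + iommu_offset) */
    uint64_t ram_addr;   /* translated address, as from vfio_get_xlat_addr() */
    uint64_t addr_mask;  /* mapping size - 1 */
} DemoMapping;

static int test_bit64(const uint64_t *bm, uint64_t n)
{
    return (int)((bm[n / 64] >> (n % 64)) & 1);
}

static void set_bit64(uint64_t *bm, uint64_t n)
{
    bm[n / 64] |= 1ULL << (n % 64);
}

/*
 * For every replayed mapping, take the dirty bits reported for its IOVA
 * range and set the corresponding pages in ram_addr space.
 */
static void replay_dirty(const DemoMapping *maps, int n,
                         const uint64_t *iova_dirty, uint64_t *ram_dirty)
{
    for (int i = 0; i < n; i++) {
        uint64_t pages = (maps[i].addr_mask + 1) >> DEMO_PAGE_BITS;
        uint64_t iova_pfn = maps[i].iova >> DEMO_PAGE_BITS;
        uint64_t ram_pfn = maps[i].ram_addr >> DEMO_PAGE_BITS;

        for (uint64_t p = 0; p < pages; p++) {
            if (test_bit64(iova_dirty, iova_pfn + p)) {
                set_bit64(ram_dirty, ram_pfn + p);
            }
        }
    }
}
```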

Suggested-by: Alex Williamson 
Signed-off-by: Kirti Wankhede 
Reviewed-by: Yan Zhao 
Signed-off-by: Alex Williamson 
---
 hw/vfio/common.c |   88 +++---
 hw/vfio/trace-events |1 +
 2 files changed, 83 insertions(+), 6 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 0a97fbfefb89..43e6e89090f2 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -442,8 +442,8 @@ static bool vfio_listener_skipped_section(MemoryRegionSection *section)
 }
 
 /* Called with rcu_read_lock held.  */
-static bool vfio_get_vaddr(IOMMUTLBEntry *iotlb, void **vaddr,
-                           bool *read_only)
+static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
+                               ram_addr_t *ram_addr, bool *read_only)
 {
     MemoryRegion *mr;
     hwaddr xlat;
@@ -474,8 +474,17 @@ static bool vfio_get_vaddr(IOMMUTLBEntry *iotlb, void **vaddr,
         return false;
     }
 
-    *vaddr = memory_region_get_ram_ptr(mr) + xlat;
-    *read_only = !writable || mr->readonly;
+    if (vaddr) {
+        *vaddr = memory_region_get_ram_ptr(mr) + xlat;
+    }
+
+    if (ram_addr) {
+        *ram_addr = memory_region_get_ram_addr(mr) + xlat;
+    }
+
+    if (read_only) {
+        *read_only = !writable || mr->readonly;
+    }
 
     return true;
 }
@@ -485,7 +494,6 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
     VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
     VFIOContainer *container = giommu->container;
     hwaddr iova = iotlb->iova + giommu->iommu_offset;
-    bool read_only;
     void *vaddr;
     int ret;
 
@@ -501,7 +509,9 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
     rcu_read_lock();
 
     if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
-        if (!vfio_get_vaddr(iotlb, &vaddr, &read_only)) {
+        bool read_only;
+
+        if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only)) {
             goto out;
         }
         /*
@@ -899,11 +909,77 @@ err_out:
     return ret;
 }
 
+typedef struct {
+    IOMMUNotifier n;
+    VFIOGuestIOMMU *giommu;
+} vfio_giommu_dirty_notifier;
+
+static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
+{
+    vfio_giommu_dirty_notifier *gdn = container_of(n,
+                                                vfio_giommu_dirty_notifier, n);
+    VFIOGuestIOMMU *giommu = gdn->giommu;
+    VFIOContainer *container = giommu->container;
+    hwaddr iova = iotlb->iova + giommu->iommu_offset;
+    ram_addr_t translated_addr;
+
+    trace_vfio_iommu_map_dirty_notify(iova, iova + iotlb->addr_mask);
+
+    if (iotlb->target_as != &address_space_memory) {
+        error_report("Wrong target AS \"%s\", only system memory is allowed",
+                     iotlb->target_as->name ? iotlb->target_as->name : "none");
+        return;
+    }
+
+    rcu_read_lock();
+    if (vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL)) {
+        int ret;
+
+        ret = vfio_get_dirty_bitmap(container, iova, iotlb->addr_mask + 1,
+                                    translated_addr);
+        if (ret) {
+            error_report("vfio_iommu_map_dirty_notify(%p, 0x%"HWADDR_PRIx", "
+                         "0x%"HWADDR_PRIx") = %d (%m)",
+                         container, iova,
+                         iotlb->addr_mask + 1, ret);
+        }
+    }
+    rcu_read_unlock();
+}
+
 static int vfio_sync_dirty_bitmap(VFIOContainer *container,
                                   MemoryRegionSection *section)
 {
     ram_addr_t ram_addr;
 
+    if (memory_region_is_iommu(section->mr)) {
+        VFIOGuestIOMMU *giommu;
+
+        QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
+            if (MEMORY_REGION(giommu->iommu) == section->mr &&
+                giommu->n.start == section->offset_within_region) {
+                Int128 llend;
+                vfio_giommu_dirty_notifier gdn = { .giommu = giommu };
+                int idx = memory_region_iommu_attrs_to_index(giommu->iommu,
+                                                       MEMTXATTRS_UNSPECIFIED);
+
+                llend = int128_add(int128_make64(section->offset_within_region),
+                                   section->size);
+                llend = int128_sub(llend, int128_one());
+
+                iommu_notifier_init(&gdn.n,
+                                    vfio_iommu_map_dirty_notify,
+                                    IOMMU_NOTIFIER_MAP,
+                                    section->offset_within_region,
+                                    int128_get64(llend),
+                                    idx);
+                memory_region_iommu_replay(giommu->iommu, &gdn.n);
+                break;
+            }

[PULL v3 17/32] qapi: Add VFIO devices migration stats in Migration stats

2020-11-01 Thread Alex Williamson

From: Kirti Wankhede 

Added the amount of data, in bytes, transferred to the VM at the destination
by all VFIO devices.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Alex Williamson 
---
 hw/vfio/common.c  |   19 +++
 hw/vfio/migration.c   |9 +
 include/hw/vfio/vfio-common.h |3 +++
 migration/migration.c |   17 +
 monitor/hmp-cmds.c|6 ++
 qapi/migration.json   |   17 +
 6 files changed, 71 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 620358a3d804..d41ba67ffbbb 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -292,6 +292,25 @@ const MemoryRegionOps vfio_region_ops = {
  * Device state interfaces
  */
 
+bool vfio_mig_active(void)
+{
+    VFIOGroup *group;
+    VFIODevice *vbasedev;
+
+    if (QLIST_EMPTY(&vfio_group_list)) {
+        return false;
+    }
+
+    QLIST_FOREACH(group, &vfio_group_list, next) {
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            if (vbasedev->migration_blocker) {
+                return false;
+            }
+        }
+    }
+    return true;
+}
+
 static bool vfio_devices_all_stopped_and_saving(VFIOContainer *container)
 {
     VFIOGroup *group;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index a248effb3786..3ce285ea395d 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -45,6 +45,8 @@
 #define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xffffffffef100003ULL)
 #define VFIO_MIG_FLAG_DEV_DATA_STATE    (0xffffffffef100004ULL)
 
+static int64_t bytes_transferred;
+
 static inline int vfio_mig_access(VFIODevice *vbasedev, void *val, int count,
                                   off_t off, bool iswrite)
 {
@@ -255,6 +257,7 @@ static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev, uint64_t *size)
         *size = data_size;
     }
 
+    bytes_transferred += data_size;
     return ret;
 }
 
@@ -785,6 +788,7 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data)
     case MIGRATION_STATUS_CANCELLING:
     case MIGRATION_STATUS_CANCELLED:
     case MIGRATION_STATUS_FAILED:
+        bytes_transferred = 0;
         ret = vfio_migration_set_state(vbasedev,
                       ~(VFIO_DEVICE_STATE_SAVING | VFIO_DEVICE_STATE_RESUMING),
                       VFIO_DEVICE_STATE_RUNNING);
@@ -866,6 +870,11 @@ err:
 
 /* -- */
 
+int64_t vfio_mig_bytes_transferred(void)
+{
+    return bytes_transferred;
+}
+
 int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
 {
     VFIOContainer *container = vbasedev->group->container;
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index b1c1b18fd228..24e299d97425 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -203,6 +203,9 @@ extern const MemoryRegionOps vfio_region_ops;
 typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
 extern VFIOGroupList vfio_group_list;
 
+bool vfio_mig_active(void);
+int64_t vfio_mig_bytes_transferred(void);
+
 #ifdef CONFIG_LINUX
 int vfio_get_region_info(VFIODevice *vbasedev, int index,
                          struct vfio_region_info **info);
diff --git a/migration/migration.c b/migration/migration.c
index 9bb4fee5acec..3263aa55a9da 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -57,6 +57,10 @@
 #include "qemu/queue.h"
 #include "multifd.h"
 
+#ifdef CONFIG_VFIO
+#include "hw/vfio/vfio-common.h"
+#endif
+
 #define MAX_THROTTLE  (128 << 20)  /* Migration transfer speed throttling */
 
 /* Amount of time to allocate to each "chunk" of bandwidth-throttled
@@ -1037,6 +1041,17 @@ static void populate_disk_info(MigrationInfo *info)
     }
 }
 
+static void populate_vfio_info(MigrationInfo *info)
+{
+#ifdef CONFIG_VFIO
+    if (vfio_mig_active()) {
+        info->has_vfio = true;
+        info->vfio = g_malloc0(sizeof(*info->vfio));
+        info->vfio->transferred = vfio_mig_bytes_transferred();
+    }
+#endif
+}
+
 static void fill_source_migration_info(MigrationInfo *info)
 {
     MigrationState *s = migrate_get_current();
@@ -1061,6 +1076,7 @@ static void fill_source_migration_info(MigrationInfo *info)
         populate_time_info(info, s);
         populate_ram_info(info, s);
         populate_disk_info(info);
+        populate_vfio_info(info);
         break;
     case MIGRATION_STATUS_COLO:
         info->has_status = true;
@@ -1069,6 +1085,7 @@ static void fill_source_migration_info(MigrationInfo *info)
     case MIGRATION_STATUS_COMPLETED:
         populate_time_info(info, s);
         populate_ram_info(info, s);
+        populate_vfio_info(info);
         break;
     case MIGRATION_STATUS_FAILED:
         info->has_status = true;
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 9789f4277f50..56e9bad33d94 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -357,6 +357,12 @@ void hmp_info_migrate(Monit

[PULL v3 20/32] s390x/pci: Move header files to include/hw/s390x

2020-11-01 Thread Alex Williamson

From: Matthew Rosato 

Seems a more appropriate location for them.

Signed-off-by: Matthew Rosato 
Reviewed-by: Cornelia Huck 
Signed-off-by: Alex Williamson 
---
 MAINTAINERS  |1 
 hw/s390x/s390-pci-bus.c  |4 
 hw/s390x/s390-pci-bus.h  |  372 --
 hw/s390x/s390-pci-inst.c |4 
 hw/s390x/s390-pci-inst.h |  312 
 hw/s390x/s390-virtio-ccw.c   |2 
 include/hw/s390x/s390-pci-bus.h  |  372 ++
 include/hw/s390x/s390-pci-inst.h |  312 
 8 files changed, 690 insertions(+), 689 deletions(-)
 delete mode 100644 hw/s390x/s390-pci-bus.h
 delete mode 100644 hw/s390x/s390-pci-inst.h
 create mode 100644 include/hw/s390x/s390-pci-bus.h
 create mode 100644 include/hw/s390x/s390-pci-inst.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 8c744a9bdf42..2c22bbca5ac3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1436,6 +1436,7 @@ S390 PCI
 M: Matthew Rosato 
 S: Supported
 F: hw/s390x/s390-pci*
+F: include/hw/s390x/s390-pci*
 L: qemu-s3...@nongnu.org
 
 UniCore32 Machines
diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index fb4cee87a494..a929340688cc 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -15,8 +15,8 @@
 #include "qapi/error.h"
 #include "qapi/visitor.h"
 #include "cpu.h"
-#include "s390-pci-bus.h"
-#include "s390-pci-inst.h"
+#include "hw/s390x/s390-pci-bus.h"
+#include "hw/s390x/s390-pci-inst.h"
 #include "hw/pci/pci_bus.h"
 #include "hw/qdev-properties.h"
 #include "hw/pci/pci_bridge.h"
diff --git a/hw/s390x/s390-pci-bus.h b/hw/s390x/s390-pci-bus.h
deleted file mode 100644
index 97464d0ad33e..
--- a/hw/s390x/s390-pci-bus.h
+++ /dev/null
@@ -1,372 +0,0 @@
-/*
- * s390 PCI BUS definitions
- *
- * Copyright 2014 IBM Corp.
- * Author(s): Frank Blaschka 
- *Hong Bo Li 
- *Yi Min Zhao 
- *
- * This work is licensed under the terms of the GNU GPL, version 2 or (at
- * your option) any later version. See the COPYING file in the top-level
- * directory.
- */
-
-#ifndef HW_S390_PCI_BUS_H
-#define HW_S390_PCI_BUS_H
-
-#include "hw/pci/pci.h"
-#include "hw/pci/pci_host.h"
-#include "hw/s390x/sclp.h"
-#include "hw/s390x/s390_flic.h"
-#include "hw/s390x/css.h"
-#include "qom/object.h"
-
-#define TYPE_S390_PCI_HOST_BRIDGE "s390-pcihost"
-#define TYPE_S390_PCI_BUS "s390-pcibus"
-#define TYPE_S390_PCI_DEVICE "zpci"
-#define TYPE_S390_PCI_IOMMU "s390-pci-iommu"
-#define TYPE_S390_IOMMU_MEMORY_REGION "s390-iommu-memory-region"
-#define FH_MASK_ENABLE   0x8000
-#define FH_MASK_INSTANCE 0x7f00
-#define FH_MASK_SHM  0x00ff
-#define FH_MASK_INDEX    0x
-#define FH_SHM_VFIO  0x0001
-#define FH_SHM_EMUL  0x0002
-#define ZPCI_MAX_FID 0x
-#define ZPCI_MAX_UID 0x
-#define UID_UNDEFINED 0
-#define UID_CHECKING_ENABLED 0x01
-
-OBJECT_DECLARE_SIMPLE_TYPE(S390pciState, S390_PCI_HOST_BRIDGE)
-OBJECT_DECLARE_SIMPLE_TYPE(S390PCIBus, S390_PCI_BUS)
-OBJECT_DECLARE_SIMPLE_TYPE(S390PCIBusDevice, S390_PCI_DEVICE)
-OBJECT_DECLARE_SIMPLE_TYPE(S390PCIIOMMU, S390_PCI_IOMMU)
-
-#define HP_EVENT_TO_CONFIGURED        0x0301
-#define HP_EVENT_RESERVED_TO_STANDBY  0x0302
-#define HP_EVENT_DECONFIGURE_REQUEST  0x0303
-#define HP_EVENT_CONFIGURED_TO_STBRES 0x0304
-#define HP_EVENT_STANDBY_TO_RESERVED  0x0308
-
-#define ERR_EVENT_INVALAS 0x1
-#define ERR_EVENT_OORANGE 0x2
-#define ERR_EVENT_INVALTF 0x3
-#define ERR_EVENT_TPROTE  0x4
-#define ERR_EVENT_APROTE  0x5
-#define ERR_EVENT_KEYE    0x6
-#define ERR_EVENT_INVALTE 0x7
-#define ERR_EVENT_INVALTL 0x8
-#define ERR_EVENT_TT      0x9
-#define ERR_EVENT_INVALMS 0xa
-#define ERR_EVENT_SERR    0xb
-#define ERR_EVENT_NOMSI   0x10
-#define ERR_EVENT_INVALBV 0x11
-#define ERR_EVENT_AIBV    0x12
-#define ERR_EVENT_AIRERR  0x13
-#define ERR_EVENT_FMBA    0x2a
-#define ERR_EVENT_FMBUP   0x2b
-#define ERR_EVENT_FMBPRO  0x2c
-#define ERR_EVENT_CCONF   0x30
-#define ERR_EVENT_SERVAC  0x3a
-#define ERR_EVENT_PERMERR 0x3b
-
-#define ERR_EVENT_Q_BIT 0x2
-#define ERR_EVENT_MVN_OFFSET 16
-
-#define ZPCI_MSI_VEC_BITS 11
-#define ZPCI_MSI_VEC_MASK 0x7ff
-
-#define ZPCI_MSI_ADDR  0xfe00ULL
-#define ZPCI_SDMA_ADDR 0x1ULL
-#define ZPCI_EDMA_ADDR 0x1ffULL
-
-#define PAGE_SHIFT  12
-#define PAGE_SIZE   (1 << PAGE_SHIFT)
-#define PAGE_MASK   (~(PAGE_SIZE-1))
-#define PAGE_DEFAULT_ACC    0
-#define PAGE_DEFAULT_KEY    (PAGE_DEFAULT_ACC << 4)
-
-/* I/O Translation Anchor (IOTA) */
-enum ZpciIoatDtype {
-    ZPCI_IOTA_STO = 0,
-    ZPCI_IOTA_RTTO = 1,
-    ZPCI_IOTA_RSTO = 2,
-    ZPCI_IOTA_RFTO = 3,
-    ZPCI_IOTA_PFAA = 4,
-    ZPCI_IOTA_IOPFAA = 5,
-    ZPCI_IOTA_IOPTO = 7
-};
-
-#define ZPCI_IOTA_IOT_ENABLED   0x800ULL
-#define ZPCI_IOTA_DT_ST (ZPCI_IOTA_STO  << 2)
-#define ZPCI_IOTA_DT_RT (ZPCI_IOTA_RTTO << 2)
-#define ZPCI

[PULL v3 11/32] vfio: Get migration capability flags for container

2020-11-01 Thread Alex Williamson

From: Kirti Wankhede 

Added helper functions to get the IOMMU info capability chain.
Added a function to get migration capability information from that
capability chain for the IOMMU container.

A similar change was proposed earlier:
https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg03759.html

Disable migration for devices if the IOMMU module doesn't support the
migration capability.
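The capability chain walked by vfio_get_iommu_info_cap() is a linked list of headers addressed by byte offsets from the start of the info buffer, with next == 0 ending the chain. This standalone sketch assumes a header layout of id, version and next, as in struct vfio_info_cap_header. DemoCapHeader and find_cap() are hypothetical names, and the sketch adds bounds and hop-count checks for safety.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed to mirror struct vfio_info_cap_header. */
typedef struct {
    uint16_t id;
    uint16_t version;
    uint32_t next;   /* offset of the next header from the start of info; 0 ends */
} DemoCapHeader;

/*
 * Walk the chain starting at cap_offset inside the info buffer and return
 * the offset of the first header with the given id, or 0 if none matches.
 * Headers are copied out with memcpy to avoid unaligned access.
 */
static uint32_t find_cap(const uint8_t *info, size_t info_len,
                         uint32_t cap_offset, uint16_t id)
{
    uint32_t off = cap_offset;
    int hops = 0;

    /* Stop at next == 0, on a header that would overrun the buffer,
     * or after too many hops (guards against a cyclic chain). */
    while (off != 0 && (size_t)off + sizeof(DemoCapHeader) <= info_len &&
           hops++ < 64) {
        DemoCapHeader hdr;

        memcpy(&hdr, info + off, sizeof(hdr));
        if (hdr.id == id) {
            return off;
        }
        off = hdr.next;
    }
    return 0;
}
```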

Signed-off-by: Kirti Wankhede 
Cc: Shameer Kolothum 
Cc: Eric Auger 
Signed-off-by: Alex Williamson 
---
 hw/vfio/common.c  |   90 +
 hw/vfio/migration.c   |7 +++
 include/hw/vfio/vfio-common.h |3 +
 3 files changed, 91 insertions(+), 9 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index c6e98b8d61be..d4959c036dd1 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1228,6 +1228,75 @@ static int vfio_init_container(VFIOContainer *container, int group_fd,
 return 0;
 }
 
+static int vfio_get_iommu_info(VFIOContainer *container,
+   struct vfio_iommu_type1_info **info)
+{
+
+size_t argsz = sizeof(struct vfio_iommu_type1_info);
+
+*info = g_new0(struct vfio_iommu_type1_info, 1);
+again:
+(*info)->argsz = argsz;
+
+if (ioctl(container->fd, VFIO_IOMMU_GET_INFO, *info)) {
+g_free(*info);
+*info = NULL;
+return -errno;
+}
+
+if (((*info)->argsz > argsz)) {
+argsz = (*info)->argsz;
+*info = g_realloc(*info, argsz);
+goto again;
+}
+
+return 0;
+}
+
+static struct vfio_info_cap_header *
+vfio_get_iommu_info_cap(struct vfio_iommu_type1_info *info, uint16_t id)
+{
+struct vfio_info_cap_header *hdr;
+void *ptr = info;
+
+if (!(info->flags & VFIO_IOMMU_INFO_CAPS)) {
+return NULL;
+}
+
+for (hdr = ptr + info->cap_offset; hdr != ptr; hdr = ptr + hdr->next) {
+if (hdr->id == id) {
+return hdr;
+}
+}
+
+return NULL;
+}
+
+static void vfio_get_iommu_info_migration(VFIOContainer *container,
+ struct vfio_iommu_type1_info *info)
+{
+struct vfio_info_cap_header *hdr;
+struct vfio_iommu_type1_info_cap_migration *cap_mig;
+
+hdr = vfio_get_iommu_info_cap(info, VFIO_IOMMU_TYPE1_INFO_CAP_MIGRATION);
+if (!hdr) {
+return;
+}
+
+cap_mig = container_of(hdr, struct vfio_iommu_type1_info_cap_migration,
+header);
+
+/*
+ * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of
+ * TARGET_PAGE_SIZE to mark those dirty.
+ */
+if (cap_mig->pgsize_bitmap & TARGET_PAGE_SIZE) {
+container->dirty_pages_supported = true;
+container->max_dirty_bitmap_size = cap_mig->max_dirty_bitmap_size;
+container->dirty_pgsizes = cap_mig->pgsize_bitmap;
+}
+}
+
 static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
   Error **errp)
 {
@@ -1297,6 +1366,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
 container->space = space;
 container->fd = fd;
 container->error = NULL;
+container->dirty_pages_supported = false;
 QLIST_INIT(&container->giommu_list);
 QLIST_INIT(&container->hostwin_list);
 
@@ -1309,7 +1379,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
 case VFIO_TYPE1v2_IOMMU:
 case VFIO_TYPE1_IOMMU:
 {
-struct vfio_iommu_type1_info info;
+struct vfio_iommu_type1_info *info;
 
 /*
  * FIXME: This assumes that a Type1 IOMMU can map any 64-bit
@@ -1318,15 +1388,19 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
  * existing Type1 IOMMUs generally support any IOVA we're
  * going to actually try in practice.
  */
-info.argsz = sizeof(info);
-ret = ioctl(fd, VFIO_IOMMU_GET_INFO, &info);
-/* Ignore errors */
-if (ret || !(info.flags & VFIO_IOMMU_INFO_PGSIZES)) {
+ret = vfio_get_iommu_info(container, &info);
+
+if (ret || !(info->flags & VFIO_IOMMU_INFO_PGSIZES)) {
 /* Assume 4k IOVA page size */
-info.iova_pgsizes = 4096;
+info->iova_pgsizes = 4096;
 }
-vfio_host_win_add(container, 0, (hwaddr)-1, info.iova_pgsizes);
-container->pgsizes = info.iova_pgsizes;
+vfio_host_win_add(container, 0, (hwaddr)-1, info->iova_pgsizes);
+container->pgsizes = info->iova_pgsizes;
+
+if (!ret) {
+vfio_get_iommu_info_migration(container, info);
+}
+g_free(info);
 break;
 }
 case VFIO_SPAPR_TCE_v2_IOMMU:
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 1af0fce874d4..39503b49e33d 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -832,9 +832,14 @@ err:
 
 int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
 {
+VFIOContainer *container = vbasedev->group->container;

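The `again:` loop in `vfio_get_iommu_info()` relies on the kernel writing the size it actually needs back into `argsz`, so the caller can grow the buffer until the whole capability chain fits. Below is a minimal, self-contained sketch of that grow-and-retry pattern. `struct fake_info` and `fake_get_info()` are hypothetical stand-ins for `struct vfio_iommu_type1_info` and the `VFIO_IOMMU_GET_INFO` ioctl, so the loop can be exercised without a VFIO container.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vfio_iommu_type1_info: only the
 * argsz and flags fields are modelled. */
struct fake_info {
    uint32_t argsz;
    uint32_t flags;
};

/* Hypothetical fake for the VFIO_IOMMU_GET_INFO ioctl. The "kernel" needs
 * full_size bytes: if the caller's buffer is smaller it only writes the
 * required size back into argsz, otherwise it fills in flags. */
static int fake_get_info(struct fake_info *info, uint32_t full_size)
{
    assert(info->argsz >= sizeof(*info));
    if (info->argsz < full_size) {
        info->argsz = full_size;
        return 0;
    }
    info->flags = 1;
    return 0;
}

/* Grow the buffer until the reported argsz fits, mirroring the "again:"
 * loop in vfio_get_iommu_info(). Returns NULL on failure; *calls counts
 * how many times the fake ioctl was issued. */
static struct fake_info *query_info(uint32_t full_size, int *calls)
{
    uint32_t argsz = sizeof(struct fake_info);
    struct fake_info *info = calloc(1, argsz);

    *calls = 0;
    while (info != NULL) {
        info->argsz = argsz;
        (*calls)++;
        if (fake_get_info(info, full_size) != 0) {
            break;
        }
        if (info->argsz <= argsz) {
            return info;                /* the buffer was big enough */
        }
        argsz = info->argsz;
        /* Unlike g_realloc(), which aborts on failure, realloc() can fail. */
        struct fake_info *bigger = realloc(info, argsz);
        if (bigger == NULL) {
            break;
        }
        info = bigger;
    }
    free(info);
    return NULL;
}
```

As in the patch, a failed query yields no buffer, so a caller has to check for that before reading any field. In the `vfio_connect_container()` hunk above, `vfio_get_iommu_info()` sets `*info` to NULL on failure, yet the `ret != 0` path still assigns `info->iova_pgsizes`.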
[PULL v3 15/32] vfio: Add ioctl to get dirty pages bitmap during dma unmap

2020-11-01 Thread Alex Williamson

From: Kirti Wankhede 

With a vIOMMU, an IO virtual address range can get unmapped during the
pre-copy phase of migration. In that case, the unmap ioctl should return
the pages pinned in that range, and QEMU should find their corresponding
guest physical addresses and report them as dirty.

Suggested-by: Alex Williamson 
Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
[aw: fix error_report types, fix cpu_physical_memory_set_dirty_lebitmap() cast]
Signed-off-by: Alex Williamson 
---
 hw/vfio/common.c |   97 --
 1 file changed, 93 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 43e6e89090f2..620358a3d804 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -321,11 +321,95 @@ static bool vfio_devices_all_stopped_and_saving(VFIOContainer *container)
 return true;
 }
 
+static bool vfio_devices_all_running_and_saving(VFIOContainer *container)
+{
+VFIOGroup *group;
+VFIODevice *vbasedev;
+MigrationState *ms = migrate_get_current();
+
+if (!migration_is_setup_or_active(ms->state)) {
+return false;
+}
+
+QLIST_FOREACH(group, &container->group_list, container_next) {
+QLIST_FOREACH(vbasedev, &group->device_list, next) {
+VFIOMigration *migration = vbasedev->migration;
+
+if (!migration) {
+return false;
+}
+
+if ((migration->device_state & VFIO_DEVICE_STATE_SAVING) &&
+(migration->device_state & VFIO_DEVICE_STATE_RUNNING)) {
+continue;
+} else {
+return false;
+}
+}
+}
+return true;
+}
+
+static int vfio_dma_unmap_bitmap(VFIOContainer *container,
+ hwaddr iova, ram_addr_t size,
+ IOMMUTLBEntry *iotlb)
+{
+struct vfio_iommu_type1_dma_unmap *unmap;
+struct vfio_bitmap *bitmap;
+uint64_t pages = TARGET_PAGE_ALIGN(size) >> TARGET_PAGE_BITS;
+int ret;
+
+unmap = g_malloc0(sizeof(*unmap) + sizeof(*bitmap));
+
+unmap->argsz = sizeof(*unmap) + sizeof(*bitmap);
+unmap->iova = iova;
+unmap->size = size;
+unmap->flags |= VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP;
+bitmap = (struct vfio_bitmap *)&unmap->data;
+
+/*
+ * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of
+ * TARGET_PAGE_SIZE to mark those dirty. Hence set bitmap_pgsize to
+ * TARGET_PAGE_SIZE.
+ */
+
+bitmap->pgsize = TARGET_PAGE_SIZE;
+bitmap->size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) /
+   BITS_PER_BYTE;
+
+if (bitmap->size > container->max_dirty_bitmap_size) {
+error_report("UNMAP: Size of bitmap too big 0x%"PRIx64,
+ (uint64_t)bitmap->size);
+ret = -E2BIG;
+goto unmap_exit;
+}
+
+bitmap->data = g_try_malloc0(bitmap->size);
+if (!bitmap->data) {
+ret = -ENOMEM;
+goto unmap_exit;
+}
+
+ret = ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, unmap);
+if (!ret) {
+cpu_physical_memory_set_dirty_lebitmap((unsigned long *)bitmap->data,
+iotlb->translated_addr, pages);
+} else {
+error_report("VFIO_UNMAP_DMA with DIRTY_BITMAP : %m");
+}
+
+g_free(bitmap->data);
+unmap_exit:
+g_free(unmap);
+return ret;
+}
+
 /*
  * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
  */
 static int vfio_dma_unmap(VFIOContainer *container,
-  hwaddr iova, ram_addr_t size)
+  hwaddr iova, ram_addr_t size,
+  IOMMUTLBEntry *iotlb)
 {
 struct vfio_iommu_type1_dma_unmap unmap = {
 .argsz = sizeof(unmap),
@@ -334,6 +418,11 @@ static int vfio_dma_unmap(VFIOContainer *container,
 .size = size,
 };
 
+if (iotlb && container->dirty_pages_supported &&
+vfio_devices_all_running_and_saving(container)) {
+return vfio_dma_unmap_bitmap(container, iova, size, iotlb);
+}
+
 while (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
 /*
  * The type1 backend has an off-by-one bug in the kernel (71a7d3d78e3c
@@ -381,7 +470,7 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
  * the VGA ROM space.
  */
 if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0 ||
-(errno == EBUSY && vfio_dma_unmap(container, iova, size) == 0 &&
+(errno == EBUSY && vfio_dma_unmap(container, iova, size, NULL) == 0 &&
  ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0)) {
 return 0;
 }
@@ -531,7 +620,7 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
  iotlb->addr_mask + 1, vaddr, ret);
 }
 } else {
-ret = vfio_dma_unmap(container, iova, iotlb->addr_mask + 1);
+ret = vfio_dma_unmap(container, iova, iotlb->addr_mask + 1, iotlb);
 if (ret) {

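The bitmap sizing in `vfio_dma_unmap_bitmap()` can be checked with a small worked example: one bit per target page, padded to whole 64-bit words. The sketch below assumes 4 KiB target pages (`TARGET_PAGE_BITS == 12`, as on x86 and s390x targets, but not on every target) and uses local helpers in place of QEMU's `TARGET_PAGE_ALIGN()` and `ROUND_UP()`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB target pages (TARGET_PAGE_BITS == 12). */
#define EX_PAGE_BITS     12
#define EX_PAGE_SIZE     (UINT64_C(1) << EX_PAGE_BITS)
#define EX_BITS_PER_BYTE 8

/* Round x up to the next multiple of m; for the power-of-two values used
 * here this matches QEMU's ROUND_UP(). */
static uint64_t round_up(uint64_t x, uint64_t m)
{
    assert(m > 0);
    return (x + m - 1) / m * m;
}

/* Dirty-bitmap bytes needed to cover 'size' bytes of IOVA: one bit per
 * page, padded to whole 64-bit words, as in vfio_dma_unmap_bitmap(). */
static uint64_t dirty_bitmap_bytes(uint64_t size)
{
    uint64_t pages = round_up(size, EX_PAGE_SIZE) >> EX_PAGE_BITS;

    return round_up(pages, sizeof(uint64_t) * EX_BITS_PER_BYTE) /
           EX_BITS_PER_BYTE;
}
```

For a 1 GiB unmap this gives 262144 pages and a 32 KiB bitmap; the patch rejects the request with `-E2BIG` when the computed size exceeds `container->max_dirty_bitmap_size`.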
[PULL v3 27/32] s390x/pci: clean up s390 PCI groups

2020-11-01 Thread Alex Williamson

From: Matthew Rosato 

Add a step to remove all stashed PCI groups to avoid stale data between
machine resets.

Signed-off-by: Matthew Rosato 
Reviewed-by: Cornelia Huck 
Signed-off-by: Alex Williamson 
---
 hw/s390x/s390-pci-bus.c |   12 
 1 file changed, 12 insertions(+)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 4c7f06d5cf95..036cf4635a7e 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -813,6 +813,17 @@ static void s390_pcihost_realize(DeviceState *dev, Error **errp)
  S390_ADAPTER_SUPPRESSIBLE, errp);
 }
 
+static void s390_pcihost_unrealize(DeviceState *dev)
+{
+S390PCIGroup *group;
+S390pciState *s = S390_PCI_HOST_BRIDGE(dev);
+
+while (!QTAILQ_EMPTY(&s->zpci_groups)) {
+group = QTAILQ_FIRST(&s->zpci_groups);
+QTAILQ_REMOVE(&s->zpci_groups, group, link);
+}
+}
+
 static int s390_pci_msix_init(S390PCIBusDevice *pbdev)
 {
 char *name;
@@ -1171,6 +1182,7 @@ static void s390_pcihost_class_init(ObjectClass *klass, void *data)
 
 dc->reset = s390_pcihost_reset;
 dc->realize = s390_pcihost_realize;
+dc->unrealize = s390_pcihost_unrealize;
 hc->pre_plug = s390_pcihost_pre_plug;
 hc->plug = s390_pcihost_plug;
 hc->unplug_request = s390_pcihost_unplug_request;

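`s390_pcihost_unrealize()` empties `zpci_groups` by repeatedly unlinking the head element until the list is empty. The same drain-the-head pattern is sketched below with the BSD `TAILQ` macros from `<sys/queue.h>` (available in glibc), from which QEMU's `QTAILQ` macros are derived; `struct group`, `group_create()` and `group_list_clear()` are hypothetical stand-ins for `S390PCIGroup` and the patch's helpers. Unlike the hunk above, the sketch also frees each element after unlinking it, on the assumption that nothing else still points at it.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-in for S390PCIGroup. */
struct group {
    int id;
    TAILQ_ENTRY(group) link;
};

TAILQ_HEAD(group_list, group);

/* Like s390_group_create(): allocate a group and append it to the list. */
static struct group *group_create(struct group_list *list, int id)
{
    struct group *g = calloc(1, sizeof(*g));

    assert(g != NULL);
    g->id = id;
    TAILQ_INSERT_TAIL(list, g, link);
    return g;
}

/* Drain the list from the head, as s390_pcihost_unrealize() does, but
 * also free each element (assumption: nothing else still references it).
 * Returns the number of elements removed. */
static int group_list_clear(struct group_list *list)
{
    int removed = 0;

    while (!TAILQ_EMPTY(list)) {
        struct group *g = TAILQ_FIRST(list);

        TAILQ_REMOVE(list, g, link);
        free(g);
        removed++;
    }
    return removed;
}
```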
[PULL v3 19/32] linux-headers: update against 5.10-rc1

2020-11-01 Thread Alex Williamson

From: Matthew Rosato 

commit 3650b228f83adda7e5ee532e2b90429c03f7b9ec

Signed-off-by: Matthew Rosato 
[aw: drop pvrdma_ring.h changes to avoid revert of d73415a31547 
("qemu/atomic.h: rename atomic_ to qatomic_")]
Signed-off-by: Alex Williamson 
---
 .../infiniband/hw/vmw_pvrdma/pvrdma_verbs.h|2 -
 include/standard-headers/linux/ethtool.h   |2 +
 include/standard-headers/linux/fuse.h  |   50 -
 include/standard-headers/linux/input-event-codes.h |4 +
 include/standard-headers/linux/pci_regs.h  |6 +-
 include/standard-headers/linux/virtio_fs.h |3 +
 include/standard-headers/linux/virtio_gpu.h|   19 +
 include/standard-headers/linux/virtio_mmio.h   |   11 +++
 include/standard-headers/linux/virtio_pci.h|   11 +++
 linux-headers/asm-arm64/kvm.h  |   25 ++
 linux-headers/asm-arm64/mman.h |1 
 linux-headers/asm-generic/hugetlb_encode.h |1 
 linux-headers/asm-generic/unistd.h |   18 ++---
 linux-headers/asm-mips/unistd_n32.h|1 
 linux-headers/asm-mips/unistd_n64.h|1 
 linux-headers/asm-mips/unistd_o32.h|1 
 linux-headers/asm-powerpc/unistd_32.h  |1 
 linux-headers/asm-powerpc/unistd_64.h  |1 
 linux-headers/asm-s390/unistd_32.h |1 
 linux-headers/asm-s390/unistd_64.h |1 
 linux-headers/asm-x86/kvm.h|   20 +
 linux-headers/asm-x86/unistd_32.h  |1 
 linux-headers/asm-x86/unistd_64.h  |1 
 linux-headers/asm-x86/unistd_x32.h |1 
 linux-headers/linux/kvm.h  |   19 +
 linux-headers/linux/mman.h |1 
 linux-headers/linux/vfio.h |   29 +++
 linux-headers/linux/vfio_zdev.h|   78 
 28 files changed, 294 insertions(+), 16 deletions(-)
 create mode 100644 linux-headers/linux/vfio_zdev.h

diff --git a/include/standard-headers/drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.h b/include/standard-headers/drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.h
index 1677208a411f..0a8c7c931199 100644
--- a/include/standard-headers/drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.h
+++ b/include/standard-headers/drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.h
@@ -176,7 +176,7 @@ struct pvrdma_port_attr {
uint8_t subnet_timeout;
uint8_t init_type_reply;
uint8_t active_width;
-   uint8_t active_speed;
+   uint16_t active_speed;
uint8_t phys_state;
uint8_t reserved[2];
 };
diff --git a/include/standard-headers/linux/ethtool.h b/include/standard-headers/linux/ethtool.h
index e13eff44882d..0df22f7538e3 100644
--- a/include/standard-headers/linux/ethtool.h
+++ b/include/standard-headers/linux/ethtool.h
@@ -1617,6 +1617,8 @@ enum ethtool_link_mode_bit_indices {
ETHTOOL_LINK_MODE_40baseLR4_ER4_FR4_Full_BIT = 87,
ETHTOOL_LINK_MODE_40baseDR4_Full_BIT = 88,
ETHTOOL_LINK_MODE_40baseCR4_Full_BIT = 89,
+   ETHTOOL_LINK_MODE_100baseFX_Half_BIT = 90,
+   ETHTOOL_LINK_MODE_100baseFX_Full_BIT = 91,
/* must be last entry */
__ETHTOOL_LINK_MODE_MASK_NBITS
 };
diff --git a/include/standard-headers/linux/fuse.h b/include/standard-headers/linux/fuse.h
index f4df0a40f604..82c0a38b591e 100644
--- a/include/standard-headers/linux/fuse.h
+++ b/include/standard-headers/linux/fuse.h
@@ -172,6 +172,9 @@
  *  - add FUSE_WRITE_KILL_PRIV flag
  *  - add FUSE_SETUPMAPPING and FUSE_REMOVEMAPPING
  *  - add map_alignment to fuse_init_out, add FUSE_MAP_ALIGNMENT flag
+ *
+ *  7.32
+ *  - add flags to fuse_attr, add FUSE_ATTR_SUBMOUNT, add FUSE_SUBMOUNTS
  */
 
 #ifndef _LINUX_FUSE_H
@@ -203,7 +206,7 @@
 #define FUSE_KERNEL_VERSION 7
 
 /** Minor version number of this interface */
-#define FUSE_KERNEL_MINOR_VERSION 31
+#define FUSE_KERNEL_MINOR_VERSION 32
 
 /** The node ID of the root inode */
 #define FUSE_ROOT_ID 1
@@ -227,7 +230,7 @@ struct fuse_attr {
   uint32_t gid;
   uint32_t rdev;
   uint32_t blksize;
-   uint32_t padding;
+   uint32_t flags;
 };
 
 struct fuse_kstatfs {
@@ -309,7 +312,10 @@ struct fuse_file_lock {
  * FUSE_CACHE_SYMLINKS: cache READLINK responses
  * FUSE_NO_OPENDIR_SUPPORT: kernel supports zero-message opendir
  * FUSE_EXPLICIT_INVAL_DATA: only invalidate cached pages on explicit request
- * FUSE_MAP_ALIGNMENT: map_alignment field is valid
+ * FUSE_MAP_ALIGNMENT: init_out.map_alignment contains log2(byte alignment) for
+ *foffset and moffset fields in struct
+ *fuse_setupmapping_o

[PULL v3 21/32] vfio: Create shared routine for scanning info capabilities

2020-11-01 Thread Alex Williamson

From: Matthew Rosato 

Rather than duplicating the same loop in multiple locations,
create a static function to do the work.

Signed-off-by: Matthew Rosato 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Cornelia Huck 
Signed-off-by: Alex Williamson 
---
 hw/vfio/common.c |   21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index d41ba67ffbbb..693d3a29439a 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1125,17 +1125,12 @@ static void vfio_listener_release(VFIOContainer *container)
 }
 }
 
-struct vfio_info_cap_header *
-vfio_get_region_info_cap(struct vfio_region_info *info, uint16_t id)
+static struct vfio_info_cap_header *
+vfio_get_cap(void *ptr, uint32_t cap_offset, uint16_t id)
 {
 struct vfio_info_cap_header *hdr;
-void *ptr = info;
 
-if (!(info->flags & VFIO_REGION_INFO_FLAG_CAPS)) {
-return NULL;
-}
-
-for (hdr = ptr + info->cap_offset; hdr != ptr; hdr = ptr + hdr->next) {
+for (hdr = ptr + cap_offset; hdr != ptr; hdr = ptr + hdr->next) {
 if (hdr->id == id) {
 return hdr;
 }
@@ -1144,6 +1139,16 @@ vfio_get_region_info_cap(struct vfio_region_info *info, uint16_t id)
 return NULL;
 }
 
+struct vfio_info_cap_header *
+vfio_get_region_info_cap(struct vfio_region_info *info, uint16_t id)
+{
+if (!(info->flags & VFIO_REGION_INFO_FLAG_CAPS)) {
+return NULL;
+}
+
+return vfio_get_cap((void *)info, info->cap_offset, id);
+}
+
 static int vfio_setup_region_sparse_mmaps(VFIORegion *region,
   struct vfio_region_info *info)
 {

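The loop factored into `vfio_get_cap()` walks a chain of `struct vfio_info_cap_header` entries whose `next` fields are byte offsets from the start of the info buffer, so an offset of 0 points back at the buffer itself and ends the walk. A self-contained sketch follows: `struct cap_header` is a local mirror of the kernel header layout (`id`, `version`, `next`), `put_cap()` is a hypothetical helper for building a test buffer, and `char *` arithmetic replaces the GNU `void *` arithmetic used in QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local mirror of struct vfio_info_cap_header from <linux/vfio.h>. */
struct cap_header {
    uint16_t id;      /* capability identifier */
    uint16_t version; /* version specific to that identifier */
    uint32_t next;    /* offset of next header from buffer start; 0 ends */
};

/* Same walk as vfio_get_cap(): each next field is an offset from ptr, so
 * next == 0 lands back on ptr and terminates the loop. */
static struct cap_header *get_cap(void *ptr, uint32_t cap_offset, uint16_t id)
{
    char *base = ptr;
    struct cap_header *hdr;

    assert(ptr != NULL);
    for (hdr = (struct cap_header *)(base + cap_offset);
         (void *)hdr != ptr;
         hdr = (struct cap_header *)(base + hdr->next)) {
        if (hdr->id == id) {
            return hdr;
        }
    }
    return NULL;
}

/* Hypothetical helper for building a test buffer: write one header. */
static void put_cap(void *buf, uint32_t off, uint16_t id, uint32_t next)
{
    struct cap_header h = { .id = id, .version = 1, .next = next };

    memcpy((char *)buf + off, &h, sizeof(h));
}
```

Callers such as `vfio_get_region_info_cap()` only enter the walk after checking the corresponding `*_CAPS` flag in `info->flags`.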
[PULL v3 16/32] vfio: Make vfio-pci device migration capable

2020-11-01 Thread Alex Williamson

From: Kirti Wankhede 

If the device is not a failover primary device, call
vfio_migration_probe() to enable migration support for devices that
support it, and vfio_migration_finalize() to tear it down again.
Remove the migration blocker from the VFIO PCI device-specific structure
and use the migration blocker from the generic VFIO device structure.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Cornelia Huck 
Signed-off-by: Alex Williamson 
---
 hw/vfio/pci.c |   28 
 hw/vfio/pci.h |1 -
 2 files changed, 8 insertions(+), 21 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index e27c88be6d85..58c0ce8971e3 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2791,17 +2791,6 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 return;
 }
 
-if (!pdev->failover_pair_id) {
-error_setg(&vdev->migration_blocker,
-"VFIO device doesn't support migration");
-ret = migrate_add_blocker(vdev->migration_blocker, errp);
-if (ret) {
-error_free(vdev->migration_blocker);
-vdev->migration_blocker = NULL;
-return;
-}
-}
-
 vdev->vbasedev.name = g_path_get_basename(vdev->vbasedev.sysfsdev);
 vdev->vbasedev.ops = &vfio_pci_ops;
 vdev->vbasedev.type = VFIO_DEVICE_TYPE_PCI;
@@ -3069,6 +3058,13 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 }
 }
 
+if (!pdev->failover_pair_id) {
+ret = vfio_migration_probe(&vdev->vbasedev, errp);
+if (ret) {
+error_report("%s: Migration disabled", vdev->vbasedev.name);
+}
+}
+
 vfio_register_err_notifier(vdev);
 vfio_register_req_notifier(vdev);
 vfio_setup_resetfn_quirk(vdev);
@@ -3083,11 +3079,6 @@ out_teardown:
 vfio_bars_exit(vdev);
 error:
 error_prepend(errp, VFIO_MSG_PREFIX, vdev->vbasedev.name);
-if (vdev->migration_blocker) {
-migrate_del_blocker(vdev->migration_blocker);
-error_free(vdev->migration_blocker);
-vdev->migration_blocker = NULL;
-}
 }
 
 static void vfio_instance_finalize(Object *obj)
@@ -3099,10 +3090,6 @@ static void vfio_instance_finalize(Object *obj)
 vfio_bars_finalize(vdev);
 g_free(vdev->emulated_config_bits);
 g_free(vdev->rom);
-if (vdev->migration_blocker) {
-migrate_del_blocker(vdev->migration_blocker);
-error_free(vdev->migration_blocker);
-}
 /*
  * XXX Leaking igd_opregion is not an oversight, we can't remove the
  * fw_cfg entry therefore leaking this allocation seems like the safest
@@ -3130,6 +3117,7 @@ static void vfio_exitfn(PCIDevice *pdev)
 }
 vfio_teardown_msi(vdev);
 vfio_bars_exit(vdev);
+vfio_migration_finalize(&vdev->vbasedev);
 }
 
 static void vfio_pci_reset(DeviceState *dev)
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index bce71a9ac93f..1574ef983f8f 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -172,7 +172,6 @@ struct VFIOPCIDevice {
 bool no_vfio_ioeventfd;
 bool enable_ramfb;
 VFIODisplay *dpy;
-Error *migration_blocker;
 Notifier irqchip_change_notifier;
 };

[PULL v3 28/32] s390x/pci: use a PCI Function structure

2020-11-01 Thread Alex Williamson

From: Pierre Morel 

We use a ClpRspQueryPci structure to hold the information related to a
zPCI Function.

This allows us to be ready to support different zPCI functions and to
retrieve the zPCI function information from the host.

Signed-off-by: Pierre Morel 
Signed-off-by: Matthew Rosato 
Reviewed-by: Cornelia Huck 
Signed-off-by: Alex Williamson 
---
 hw/s390x/s390-pci-bus.c |   12 
 hw/s390x/s390-pci-inst.c|8 ++--
 include/hw/s390x/s390-pci-bus.h |1 +
 3 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 036cf4635a7e..072b56e45ee5 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -778,6 +778,17 @@ static void s390_pci_init_default_group(void)
 resgrp->version = 0;
 }
 
+static void set_pbdev_info(S390PCIBusDevice *pbdev)
+{
+pbdev->zpci_fn.sdma = ZPCI_SDMA_ADDR;
+pbdev->zpci_fn.edma = ZPCI_EDMA_ADDR;
+pbdev->zpci_fn.pchid = 0;
+pbdev->zpci_fn.ug = ZPCI_DEFAULT_FN_GRP;
+pbdev->zpci_fn.fid = pbdev->fid;
+pbdev->zpci_fn.uid = pbdev->uid;
+pbdev->pci_group = s390_group_find(ZPCI_DEFAULT_FN_GRP);
+}
+
 static void s390_pcihost_realize(DeviceState *dev, Error **errp)
 {
 PCIBus *b;
@@ -1000,6 +1011,7 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
 pbdev->iommu = s390_pci_get_iommu(s, pci_get_bus(pdev), pdev->devfn);
 pbdev->iommu->pbdev = pbdev;
 pbdev->state = ZPCI_FS_DISABLED;
+set_pbdev_info(pbdev);
 
 if (object_dynamic_cast(OBJECT(dev), "vfio-pci")) {
 pbdev->fh |= FH_SHM_VFIO;
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index c25b2a67efe0..58cd041d17fb 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -281,6 +281,8 @@ int clp_service_call(S390CPU *cpu, uint8_t r2, uintptr_t ra)
 goto out;
 }
 
+memcpy(resquery, &pbdev->zpci_fn, sizeof(*resquery));
+
 for (i = 0; i < PCI_BAR_COUNT; i++) {
 uint32_t data = pci_get_long(pbdev->pdev->config +
 PCI_BASE_ADDRESS_0 + (i * 4));
@@ -294,12 +296,6 @@ int clp_service_call(S390CPU *cpu, uint8_t r2, uintptr_t ra)
 resquery->bar_size[i]);
 }
 
-stq_p(&resquery->sdma, ZPCI_SDMA_ADDR);
-stq_p(&resquery->edma, ZPCI_EDMA_ADDR);
-stl_p(&resquery->fid, pbdev->fid);
-stw_p(&resquery->pchid, 0);
-stw_p(&resquery->ug, ZPCI_DEFAULT_FN_GRP);
-stl_p(&resquery->uid, pbdev->uid);
 stw_p(&resquery->hdr.rsp, CLP_RC_OK);
 break;
 }
diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
index 869c0f254b7f..fe36f163abd4 100644
--- a/include/hw/s390x/s390-pci-bus.h
+++ b/include/hw/s390x/s390-pci-bus.h
@@ -342,6 +342,7 @@ struct S390PCIBusDevice {
 uint16_t maxstbl;
 uint8_t sum;
 S390PCIGroup *pci_group;
+ClpRspQueryPci zpci_fn;
 S390MsixInfo msix;
 AdapterRoutes routes;
 S390PCIIOMMU *iommu;

[PULL v3 22/32] vfio: Find DMA available capability

2020-11-01 Thread Alex Williamson

From: Matthew Rosato 

The underlying host may be limiting the number of outstanding DMA
requests for type 1 IOMMU.  Add helper functions to check for the
DMA available capability and retrieve the current number of DMA
mappings allowed.

Signed-off-by: Matthew Rosato 
Reviewed-by: Cornelia Huck 
[aw: vfio_get_info_dma_avail moved inside CONFIG_LINUX]
Signed-off-by: Alex Williamson 
---
 hw/vfio/common.c  |   31 +++
 include/hw/vfio/vfio-common.h |2 ++
 2 files changed, 33 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 693d3a29439a..920786a23e0b 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1149,6 +1149,37 @@ vfio_get_region_info_cap(struct vfio_region_info *info, uint16_t id)
 return vfio_get_cap((void *)info, info->cap_offset, id);
 }
 
+static struct vfio_info_cap_header *
+vfio_get_iommu_type1_info_cap(struct vfio_iommu_type1_info *info, uint16_t id)
+{
+if (!(info->flags & VFIO_IOMMU_INFO_CAPS)) {
+return NULL;
+}
+
+return vfio_get_cap((void *)info, info->cap_offset, id);
+}
+
+bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
+ unsigned int *avail)
+{
+struct vfio_info_cap_header *hdr;
+struct vfio_iommu_type1_info_dma_avail *cap;
+
+/* If the capability cannot be found, assume no DMA limiting */
+hdr = vfio_get_iommu_type1_info_cap(info,
+VFIO_IOMMU_TYPE1_INFO_DMA_AVAIL);
+if (hdr == NULL) {
+return false;
+}
+
+if (avail != NULL) {
+cap = (void *) hdr;
+*avail = cap->avail;
+}
+
+return true;
+}
+
 static int vfio_setup_region_sparse_mmaps(VFIORegion *region,
   struct vfio_region_info *info)
 {
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 24e299d97425..1d14946a9d66 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -214,6 +214,8 @@ int vfio_get_dev_region_info(VFIODevice *vbasedev, uint32_t type,
 bool vfio_has_region_cap(VFIODevice *vbasedev, int region, uint16_t cap_type);
 struct vfio_info_cap_header *
 vfio_get_region_info_cap(struct vfio_region_info *info, uint16_t id);
+bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
+ unsigned int *avail);
 #endif
 extern const MemoryListener vfio_prereg_listener;

[PULL v3 24/32] s390x/pci: Honor DMA limits set by vfio

2020-11-01 Thread Alex Williamson

From: Matthew Rosato 

When an s390 guest is using lazy unmapping, it can result in a very
large number of outstanding DMA requests, far beyond the default
limit configured for vfio.  Let's track DMA usage similar to vfio
in the host, and trigger the guest to flush its DMA mappings
before vfio runs out.

Signed-off-by: Matthew Rosato 
Reviewed-by: Cornelia Huck 
[aw: non-Linux build fixes]
Signed-off-by: Alex Williamson 
---
 hw/s390x/s390-pci-bus.c  |   16 +-
 hw/s390x/s390-pci-inst.c |   45 +-
 hw/s390x/s390-pci-vfio.c |   42 +++
 include/hw/s390x/s390-pci-bus.h  |9 
 include/hw/s390x/s390-pci-inst.h |3 +++
 include/hw/s390x/s390-pci-vfio.h |   12 ++
 6 files changed, 116 insertions(+), 11 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index a929340688cc..218717397ae1 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -17,6 +17,7 @@
 #include "cpu.h"
 #include "hw/s390x/s390-pci-bus.h"
 #include "hw/s390x/s390-pci-inst.h"
+#include "hw/s390x/s390-pci-vfio.h"
 #include "hw/pci/pci_bus.h"
 #include "hw/qdev-properties.h"
 #include "hw/pci/pci_bridge.h"
@@ -764,6 +765,7 @@ static void s390_pcihost_realize(DeviceState *dev, Error **errp)
 s->bus_no = 0;
 QTAILQ_INIT(&s->pending_sei);
 QTAILQ_INIT(&s->zpci_devs);
+QTAILQ_INIT(&s->zpci_dma_limit);
 
 css_register_io_adapters(CSS_IO_ADAPTER_PCI, true, false,
  S390_ADAPTER_SUPPRESSIBLE, errp);
@@ -941,17 +943,18 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
 }
 }
 
+pbdev->pdev = pdev;
+pbdev->iommu = s390_pci_get_iommu(s, pci_get_bus(pdev), pdev->devfn);
+pbdev->iommu->pbdev = pbdev;
+pbdev->state = ZPCI_FS_DISABLED;
+
 if (object_dynamic_cast(OBJECT(dev), "vfio-pci")) {
 pbdev->fh |= FH_SHM_VFIO;
+pbdev->iommu->dma_limit = s390_pci_start_dma_count(s, pbdev);
 } else {
 pbdev->fh |= FH_SHM_EMUL;
 }
 
-pbdev->pdev = pdev;
-pbdev->iommu = s390_pci_get_iommu(s, pci_get_bus(pdev), pdev->devfn);
-pbdev->iommu->pbdev = pbdev;
-pbdev->state = ZPCI_FS_DISABLED;
-
 if (s390_pci_msix_init(pbdev)) {
 error_setg(errp, "MSI-X support is mandatory "
"in the S390 architecture");
@@ -1004,6 +1007,9 @@ static void s390_pcihost_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
 pbdev->fid = 0;
 QTAILQ_REMOVE(&s->zpci_devs, pbdev, link);
 g_hash_table_remove(s->zpci_table, &pbdev->idx);
+if (pbdev->iommu->dma_limit) {
+s390_pci_end_dma_count(s, pbdev->iommu->dma_limit);
+}
 qdev_unrealize(dev);
 }
 }
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 639b13c8d626..4eadd9e79416 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -32,6 +32,20 @@
 }  \
 } while (0)
 
+static inline void inc_dma_avail(S390PCIIOMMU *iommu)
+{
+if (iommu->dma_limit) {
+iommu->dma_limit->avail++;
+}
+}
+
+static inline void dec_dma_avail(S390PCIIOMMU *iommu)
+{
+if (iommu->dma_limit) {
+iommu->dma_limit->avail--;
+}
+}
+
 static void s390_set_status_code(CPUS390XState *env,
  uint8_t r, uint64_t status_code)
 {
@@ -572,7 +586,8 @@ int pcistg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2, uintptr_t ra)
 return 0;
 }
 
-static void s390_pci_update_iotlb(S390PCIIOMMU *iommu, S390IOTLBEntry *entry)
+static uint32_t s390_pci_update_iotlb(S390PCIIOMMU *iommu,
+  S390IOTLBEntry *entry)
 {
 S390IOTLBEntry *cache = g_hash_table_lookup(iommu->iotlb, &entry->iova);
 IOMMUTLBEntry notify = {
@@ -585,14 +600,15 @@ static void s390_pci_update_iotlb(S390PCIIOMMU *iommu, S390IOTLBEntry *entry)
 
 if (entry->perm == IOMMU_NONE) {
 if (!cache) {
-return;
+goto out;
 }
 g_hash_table_remove(iommu->iotlb, &entry->iova);
+inc_dma_avail(iommu);
 } else {
 if (cache) {
 if (cache->perm == entry->perm &&
 cache->translated_addr == entry->translated_addr) {
-return;
+goto out;
 }
 
 notify.perm = IOMMU_NONE;
@@ -606,9 +622,13 @@ static void s390_pci_update_iotlb(S390PCIIOMMU *iommu, S390IOTLBEntry *entry)
 cache->len = PAGE_SIZE;
 cache->perm = entry->perm;
 g_hash_table_replace(iommu->iotlb, &cache->iova, cache);
+dec_dma_avail(iommu);
 }
 
 memory_region_notify_iommu(&iommu->iommu_mr, 0, notify);
+
+out:
+return iommu->dma_limit ? iommu->dma_limit->avail : 1;
 }
 
 int rpcit_service_call(S390CPU *c

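The accounting added here can be modelled in isolation: installing a mapping consumes one slot of the vfio DMA limit, removing one returns it, and `s390_pci_update_iotlb()` reports what is left, or 1 when no limit is tracked. The sketch below keeps only that arithmetic. `struct dma_limit`, `struct iommu` and `update_mapping()` are hypothetical simplifications of what `iommu->dma_limit` points at, `S390PCIIOMMU` and `s390_pci_update_iotlb()`; the sketch omits the IOTLB cache lookups that make repeated identical updates no-ops, and the `assert()` before decrementing is an addition not present in the patch. Following the commit message, it treats a return of 0 as "no slots left", the point by which the guest should have been made to flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the object iommu->dma_limit points at. */
struct dma_limit {
    uint32_t avail;              /* DMA mappings vfio will still accept */
};

/* Hypothetical stand-in for S390PCIIOMMU. */
struct iommu {
    struct dma_limit *dma_limit; /* NULL when vfio reports no limit */
};

static void inc_dma_avail(struct iommu *iommu)
{
    if (iommu->dma_limit) {
        iommu->dma_limit->avail++;
    }
}

static void dec_dma_avail(struct iommu *iommu)
{
    if (iommu->dma_limit) {
        /* Sketch-only guard: callers should stop mapping at 0. */
        assert(iommu->dma_limit->avail > 0);
        iommu->dma_limit->avail--;
    }
}

/* Simplified update: a new mapping consumes a slot and an unmap releases
 * one. Returns the remaining count, or 1 when no limit is tracked. */
static uint32_t update_mapping(struct iommu *iommu, bool map)
{
    if (map) {
        dec_dma_avail(iommu);
    } else {
        inc_dma_avail(iommu);
    }
    return iommu->dma_limit ? iommu->dma_limit->avail : 1;
}
```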
[PULL v3 29/32] vfio: Add routine for finding VFIO_DEVICE_GET_INFO capabilities

2020-11-01 Thread Alex Williamson

From: Matthew Rosato 

Now that VFIO_DEVICE_GET_INFO supports capability chains, add a helper
function to find specific capabilities in the chain.

Signed-off-by: Matthew Rosato 
Reviewed-by: Cornelia Huck 
Signed-off-by: Alex Williamson 
---
 hw/vfio/common.c  |   10 ++
 include/hw/vfio/vfio-common.h |2 ++
 2 files changed, 12 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 920786a23e0b..57f55f0447d6 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1159,6 +1159,16 @@ vfio_get_iommu_type1_info_cap(struct vfio_iommu_type1_info *info, uint16_t id)
 return vfio_get_cap((void *)info, info->cap_offset, id);
 }
 
+struct vfio_info_cap_header *
+vfio_get_device_info_cap(struct vfio_device_info *info, uint16_t id)
+{
+if (!(info->flags & VFIO_DEVICE_FLAGS_CAPS)) {
+return NULL;
+}
+
+return vfio_get_cap((void *)info, info->cap_offset, id);
+}
+
 bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
  unsigned int *avail)
 {
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 1d14946a9d66..baeb4dcff102 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -216,6 +216,8 @@ struct vfio_info_cap_header *
 vfio_get_region_info_cap(struct vfio_region_info *info, uint16_t id);
 bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
  unsigned int *avail);
+struct vfio_info_cap_header *
+vfio_get_device_info_cap(struct vfio_device_info *info, uint16_t id);
 #endif
 extern const MemoryListener vfio_prereg_listener;

[PULL v3 23/32] s390x/pci: Add routine to get the vfio dma available count

2020-11-01 Thread Alex Williamson

From: Matthew Rosato 

Create new files for separating out vfio-specific work for s390
pci. Add the first such routine, which issues the VFIO_IOMMU_GET_INFO
ioctl to collect the current DMA available count.

Signed-off-by: Matthew Rosato 
Reviewed-by: Cornelia Huck 
[aw: Fix non-Linux build with CONFIG_LINUX]
Signed-off-by: Alex Williamson 
---
 hw/s390x/meson.build |1 +
 hw/s390x/s390-pci-vfio.c |   54 ++
 include/hw/s390x/s390-pci-vfio.h |   24 +
 3 files changed, 79 insertions(+)
 create mode 100644 hw/s390x/s390-pci-vfio.c
 create mode 100644 include/hw/s390x/s390-pci-vfio.h

diff --git a/hw/s390x/meson.build b/hw/s390x/meson.build
index 948ceae7a7b3..f4663a835514 100644
--- a/hw/s390x/meson.build
+++ b/hw/s390x/meson.build
@@ -27,6 +27,7 @@ s390x_ss.add(when: 'CONFIG_KVM', if_true: files(
 ))
 s390x_ss.add(when: 'CONFIG_S390_CCW_VIRTIO', if_true: files('s390-virtio-ccw.c'))
 s390x_ss.add(when: 'CONFIG_TERMINAL3270', if_true: files('3270-ccw.c'))
+s390x_ss.add(when: 'CONFIG_LINUX', if_true: files('s390-pci-vfio.c'))
 
 virtio_ss = ss.source_set()
 virtio_ss.add(files('virtio-ccw.c'))
diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
new file mode 100644
index ..cb3f4d98adf8
--- /dev/null
+++ b/hw/s390x/s390-pci-vfio.c
@@ -0,0 +1,54 @@
+/*
+ * s390 vfio-pci interfaces
+ *
+ * Copyright 2020 IBM Corp.
+ * Author(s): Matthew Rosato 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#include 
+
+#include "qemu/osdep.h"
+#include "hw/s390x/s390-pci-vfio.h"
+#include "hw/vfio/vfio-common.h"
+
+/*
+ * Get the current DMA available count from vfio.  Returns true if vfio is
+ * limiting DMA requests, false otherwise.  The current available count read
+ * from vfio is returned in avail.
+ */
+bool s390_pci_update_dma_avail(int fd, unsigned int *avail)
+{
+g_autofree struct vfio_iommu_type1_info *info;
+uint32_t argsz;
+
+assert(avail);
+
+argsz = sizeof(struct vfio_iommu_type1_info);
+info = g_malloc0(argsz);
+
+/*
+ * If the specified argsz is not large enough to contain all capabilities
+ * it will be updated upon return from the ioctl.  Retry until we have
+ * a big enough buffer to hold the entire capability chain.
+ */
+retry:
+info->argsz = argsz;
+
+if (ioctl(fd, VFIO_IOMMU_GET_INFO, info)) {
+return false;
+}
+
+if (info->argsz > argsz) {
+argsz = info->argsz;
+info = g_realloc(info, argsz);
+goto retry;
+}
+
+/* If the capability exists, update with the current value */
+return vfio_get_info_dma_avail(info, avail);
+}
+
diff --git a/include/hw/s390x/s390-pci-vfio.h b/include/hw/s390x/s390-pci-vfio.h
new file mode 100644
index ..1727292e9b5d
--- /dev/null
+++ b/include/hw/s390x/s390-pci-vfio.h
@@ -0,0 +1,24 @@
+/*
+ * s390 vfio-pci interfaces
+ *
+ * Copyright 2020 IBM Corp.
+ * Author(s): Matthew Rosato 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#ifndef HW_S390_PCI_VFIO_H
+#define HW_S390_PCI_VFIO_H
+
+#ifdef CONFIG_LINUX
+bool s390_pci_update_dma_avail(int fd, unsigned int *avail);
+#else
+static inline bool s390_pci_update_dma_avail(int fd, unsigned int *avail)
+{
+    return false;
+}
+#endif
+
+#endif
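
The retry loop in s390_pci_update_dma_avail() relies on the VFIO argsz convention: the caller passes its buffer size in argsz, and if the capability chain does not fit, the ioctl writes back the size it needs. Below is a minimal, self-contained sketch of that grow-and-retry pattern. struct fake_info and fake_get_info() are hypothetical stand-ins for struct vfio_iommu_type1_info and the VFIO_IOMMU_GET_INFO ioctl, so the sketch runs without a VFIO device.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical info struct: argsz is in/out, as in the VFIO convention. */
struct fake_info {
    uint32_t argsz;     /* in: buffer size; out: size needed for all caps */
    uint32_t dma_avail; /* payload, filled only when the buffer is big enough */
};

/*
 * Hypothetical stand-in for ioctl(fd, VFIO_IOMMU_GET_INFO, info):
 * pretends the full capability chain needs `needed` bytes.
 */
static int fake_get_info(struct fake_info *info, uint32_t needed)
{
    if (info->argsz < sizeof(*info)) {
        return -1;                /* smaller than the base struct: error */
    }
    if (info->argsz < needed) {
        info->argsz = needed;     /* report the size actually required */
        return 0;
    }
    info->dma_avail = 42;         /* arbitrary demo value */
    return 0;
}

/* Grow the buffer until the whole chain fits; returns NULL on error. */
static struct fake_info *get_info_retry(uint32_t needed, int *attempts)
{
    uint32_t argsz = sizeof(struct fake_info);
    struct fake_info *info = calloc(1, argsz);

    *attempts = 0;
    while (info) {
        info->argsz = argsz;
        (*attempts)++;
        if (fake_get_info(info, needed)) {
            break;                /* query failed: free and return NULL */
        }
        if (info->argsz <= argsz) {
            return info;          /* everything fit */
        }
        argsz = info->argsz;      /* retry with the size reported back */
        struct fake_info *bigger = realloc(info, argsz);
        if (!bigger) {
            break;                /* old buffer is still valid; freed below */
        }
        info = bigger;
    }
    free(info);
    return NULL;
}
```

The sketch uses a loop instead of goto and frees the buffer on every error path explicitly; in the QEMU function, g_autofree takes care of freeing when the function returns.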

[PULL v3 26/32] s390x/pci: use a PCI Group structure

2020-11-01 Thread Alex Williamson

From: Pierre Morel 

We use an S390PCIGroup structure to hold the information related to a
zPCI Function group.

This prepares us to support multiple groups and to retrieve
the group information from the host.

Signed-off-by: Pierre Morel 
Signed-off-by: Matthew Rosato 
Reviewed-by: Cornelia Huck 
Signed-off-by: Alex Williamson 
---
 hw/s390x/s390-pci-bus.c |   42 +++
 hw/s390x/s390-pci-inst.c|   23 +
 include/hw/s390x/s390-pci-bus.h |   10 +
 3 files changed, 66 insertions(+), 9 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 218717397ae1..4c7f06d5cf95 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -738,6 +738,46 @@ static void s390_pci_iommu_free(S390pciState *s, PCIBus *bus, int32_t devfn)
     object_unref(OBJECT(iommu));
 }
 
+static S390PCIGroup *s390_group_create(int id)
+{
+    S390PCIGroup *group;
+    S390pciState *s = s390_get_phb();
+
+    group = g_new0(S390PCIGroup, 1);
+    group->id = id;
+    QTAILQ_INSERT_TAIL(&s->zpci_groups, group, link);
+    return group;
+}
+
+S390PCIGroup *s390_group_find(int id)
+{
+    S390PCIGroup *group;
+    S390pciState *s = s390_get_phb();
+
+    QTAILQ_FOREACH(group, &s->zpci_groups, link) {
+        if (group->id == id) {
+            return group;
+        }
+    }
+    return NULL;
+}
+
+static void s390_pci_init_default_group(void)
+{
+    S390PCIGroup *group;
+    ClpRspQueryPciGrp *resgrp;
+
+    group = s390_group_create(ZPCI_DEFAULT_FN_GRP);
+    resgrp = &group->zpci_group;
+    resgrp->fr = 1;
+    stq_p(&resgrp->dasm, 0);
+    stq_p(&resgrp->msia, ZPCI_MSI_ADDR);
+    stw_p(&resgrp->mui, DEFAULT_MUI);
+    stw_p(&resgrp->i, 128);
+    stw_p(&resgrp->maxstbl, 128);
+    resgrp->version = 0;
+}
+
 static void s390_pcihost_realize(DeviceState *dev, Error **errp)
 {
 PCIBus *b;
@@ -766,7 +806,9 @@ static void s390_pcihost_realize(DeviceState *dev, Error **errp)
     QTAILQ_INIT(&s->pending_sei);
     QTAILQ_INIT(&s->zpci_devs);
     QTAILQ_INIT(&s->zpci_dma_limit);
+    QTAILQ_INIT(&s->zpci_groups);
 
+    s390_pci_init_default_group();
     css_register_io_adapters(CSS_IO_ADAPTER_PCI, true, false,
                              S390_ADAPTER_SUPPRESSIBLE, errp);
 }
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 4eadd9e79416..c25b2a67efe0 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -298,21 +298,25 @@ int clp_service_call(S390CPU *cpu, uint8_t r2, uintptr_t ra)
         stq_p(&resquery->edma, ZPCI_EDMA_ADDR);
         stl_p(&resquery->fid, pbdev->fid);
         stw_p(&resquery->pchid, 0);
-        stw_p(&resquery->ug, 1);
+        stw_p(&resquery->ug, ZPCI_DEFAULT_FN_GRP);
         stl_p(&resquery->uid, pbdev->uid);
         stw_p(&resquery->hdr.rsp, CLP_RC_OK);
         break;
     }
     case CLP_QUERY_PCI_FNGRP: {
         ClpRspQueryPciGrp *resgrp = (ClpRspQueryPciGrp *)resh;
-        resgrp->fr = 1;
-        stq_p(&resgrp->dasm, 0);
-        stq_p(&resgrp->msia, ZPCI_MSI_ADDR);
-        stw_p(&resgrp->mui, DEFAULT_MUI);
-        stw_p(&resgrp->i, 128);
-        stw_p(&resgrp->maxstbl, 128);
-        resgrp->version = 0;
 
+        ClpReqQueryPciGrp *reqgrp = (ClpReqQueryPciGrp *)reqh;
+        S390PCIGroup *group;
+
+        group = s390_group_find(reqgrp->g);
+        if (!group) {
+            /* We do not allow access to unknown groups */
+            /* The group must have been obtained with a vfio device */
+            stw_p(&resgrp->hdr.rsp, CLP_RC_QUERYPCIFG_PFGID);
+            goto out;
+        }
+        memcpy(resgrp, &group->zpci_group, sizeof(ClpRspQueryPciGrp));
         stw_p(&resgrp->hdr.rsp, CLP_RC_OK);
         break;
     }
@@ -787,7 +791,8 @@ int pcistb_service_call(S390CPU *cpu, uint8_t r1, uint8_t r3, uint64_t gaddr,
     }
     /* Length must be greater than 8, a multiple of 8 */
     /* and not greater than maxstbl */
-    if ((len <= 8) || (len % 8) || (len > pbdev->maxstbl)) {
+    if ((len <= 8) || (len % 8) ||
+        (len > pbdev->pci_group->zpci_group.maxstbl)) {
         goto specification_error;
     }
     /* Do not cross a 4K-byte boundary */
diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
index 5f339e57fb68..869c0f254b7f 100644
--- a/include/hw/s390x/s390-pci-bus.h
+++ b/include/hw/s390x/s390-pci-bus.h
@@ -316,6 +316,14 @@ typedef struct ZpciFmb {
 } ZpciFmb;
 QEMU_BUILD_BUG_MSG(offsetof(ZpciFmb, fmt0) != 48, "padding in ZpciFmb");
 
+#define ZPCI_DEFAULT_FN_GRP 0x20
+typedef struct S390PCIGroup {
+    ClpRspQueryPciGrp zpci_group;
+    int id;
+    QTAILQ_ENTRY(S390PCIGroup) link;
+} S390PCIGroup;
+S390PCIGroup *s390_group_find(int id);
+
 struct S390PCIBusDevice {
     DeviceState qdev;
     PCIDevice *pdev;
@@ -333,6 +341,7 @@ struct S390PCIBusDevice {
     uint16_t noi;
     uint16_t maxstbl;
     uint8_t sum;
+    S390PCIGroup *pci_group;
     S390MsixInfo msix;
 Adapte
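
With this patch, CLP_QUERY_PCI_FNGRP answers from a per-group record instead of hard-coded values, unknown group ids are rejected with CLP_RC_QUERYPCIFG_PFGID, and PCISTB checks the store length against the group's maxstbl. The sketch below models those steps with a plain singly linked list. Group, group_create(), group_find(), query_fngrp() and pcistb_len_ok() are hypothetical simplifications, not QEMU functions: only maxstbl is kept from ClpRspQueryPciGrp, the registry is a global instead of the QTAILQ on the PHB state, and the 4K-boundary check in pcistb_service_call() is omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ZPCI_DEFAULT_FN_GRP     0x20
#define CLP_RC_OK               0x0010
#define CLP_RC_QUERYPCIFG_PFGID 0x010b

/* Simplified group record; the real one embeds a ClpRspQueryPciGrp. */
typedef struct Group {
    int id;
    uint16_t maxstbl;
    struct Group *next;
} Group;

static Group *groups; /* hypothetical global registry */

static Group *group_create(int id, uint16_t maxstbl)
{
    Group *g = calloc(1, sizeof(*g));
    Group **tail = &groups;

    if (!g) {
        return NULL;
    }
    g->id = id;
    g->maxstbl = maxstbl;
    while (*tail) {
        tail = &(*tail)->next;
    }
    *tail = g;                    /* append, like QTAILQ_INSERT_TAIL */
    return g;
}

static Group *group_find(int id)
{
    for (Group *g = groups; g; g = g->next) {
        if (g->id == id) {
            return g;
        }
    }
    return NULL;
}

/* CLP_QUERY_PCI_FNGRP: unknown ids get an error, not the default group. */
static uint16_t query_fngrp(int id, uint16_t *maxstbl)
{
    Group *g = group_find(id);

    if (!g) {
        return CLP_RC_QUERYPCIFG_PFGID;
    }
    *maxstbl = g->maxstbl;
    return CLP_RC_OK;
}

/* PCISTB length rule: greater than 8, a multiple of 8, at most maxstbl. */
static int pcistb_len_ok(uint64_t len, const Group *g)
{
    return len > 8 && len % 8 == 0 && len <= g->maxstbl;
}
```

With the default group created with maxstbl 128 (the value s390_pci_init_default_group() uses), lengths 16 and 128 pass, while 8, 12 and 136 are rejected.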

[PULL v3 32/32] vfio: fix incorrect print type

2020-11-01 Thread Alex Williamson

From: Zhengui li 

The input variable is of type unsigned int, while the format specifier
is %d (int). Fix the format specifier to %u.

Signed-off-by: Zhengui li 
Signed-off-by: Alex Williamson 
---
 hw/vfio/common.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 57f55f0447d6..e18ea2cf9124 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -205,7 +205,7 @@ void vfio_region_write(void *opaque, hwaddr addr,
         buf.qword = cpu_to_le64(data);
         break;
     default:
-        hw_error("vfio: unsupported write size, %d bytes", size);
+        hw_error("vfio: unsupported write size, %u bytes", size);
         break;
     }
 
@@ -262,7 +262,7 @@ uint64_t vfio_region_read(void *opaque,
         data = le64_to_cpu(buf.qword);
         break;
     default:
-        hw_error("vfio: unsupported read size, %d bytes", size);
+        hw_error("vfio: unsupported read size, %u bytes", size);
         break;
     }

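Passing an unsigned int to %d is a type mismatch; for values above INT_MAX the old format would typically print a negative number. A tiny sketch of the corrected formatting, using snprintf() in place of hw_error() so it can run anywhere (format_size() is a hypothetical helper):

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format an unsigned size with %u, as the fixed
 * hw_error() calls do. Returns the snprintf() result. */
static int format_size(char *buf, size_t len, unsigned int size)
{
    return snprintf(buf, len, "vfio: unsupported write size, %u bytes", size);
}
```

With %u, UINT_MAX is printed as 4294967295 (on platforms with a 32-bit unsigned int) rather than as a negative value.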
[PULL v3 30/32] s390x/pci: get zPCI function info from host

2020-11-01 Thread Alex Williamson

From: Matthew Rosato 

We use the capability chains of the VFIO_DEVICE_GET_INFO ioctl to retrieve
the CLP information that the kernel exports.

To remain compatible with older kernels, we fall back on the previously
predefined values (the same ones used for emulation) when the ioctl does
not support capability chains. If only some CLP capabilities are missing
from the chain, we fall back on default values for just those
capabilities.

This patch is based on work previously done by Pierre Morel.
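
A sketch of the fallback logic described above: walk a capability chain by offsets and use a default when the requested capability is missing. struct cap_header mirrors the field layout of Linux's struct vfio_info_cap_header (id, version, next), where next is the offset of the following capability from the start of the info buffer and 0 ends the chain. find_cap() and read_u16_cap_or_default() are hypothetical helpers for illustration, not QEMU's vfio_get_device_info_cap().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as Linux's struct vfio_info_cap_header. */
struct cap_header {
    uint16_t id;
    uint16_t version;
    uint32_t next;   /* offset of the next capability; 0 ends the chain */
};

/* Return the offset of capability `id` in buf[0..len), or 0 if absent. */
static uint32_t find_cap(const uint8_t *buf, size_t len, uint32_t first,
                         uint16_t id)
{
    uint32_t off = first;
    size_t hops = 0;

    while (off != 0 && hops++ < len) {          /* hop limit stops cycles */
        struct cap_header hdr;

        if (off > len || len - off < sizeof(hdr)) {
            return 0;                           /* offset outside buffer */
        }
        memcpy(&hdr, buf + off, sizeof(hdr));   /* avoid unaligned access */
        if (hdr.id == id) {
            return off;
        }
        off = hdr.next;
    }
    return 0;
}

/* Read a 16-bit payload that follows the header, or fall back to `def`. */
static uint16_t read_u16_cap_or_default(const uint8_t *buf, size_t len,
                                        uint32_t first, uint16_t id,
                                        uint16_t def)
{
    uint32_t off = find_cap(buf, len, first, id);
    uint16_t val;

    if (off == 0 || len - off < sizeof(struct cap_header) + sizeof(val)) {
        return def;                             /* missing: keep default */
    }
    memcpy(&val, buf + off + sizeof(struct cap_header), sizeof(val));
    return val;
}
```

A chain offset of 0 (no capabilities at all) and a missing capability id both yield the default, which corresponds to the two fallback cases in the commit message.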

Signed-off-by: Matthew Rosato 
Reviewed-by: Cornelia Huck 
[aw: non-Linux build fixes]
Signed-off-by: Alex Williamson 
---
 hw/s390x/s390-pci-bus.c  |9 +-
 hw/s390x/s390-pci-vfio.c |  180 ++
 hw/s390x/trace-events|5 +
 include/hw/s390x/s390-pci-bus.h  |1 
 include/hw/s390x/s390-pci-clp.h  |   12 ++-
 include/hw/s390x/s390-pci-vfio.h |2 
 6 files changed, 202 insertions(+), 7 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 072b56e45ee5..48a3be802f8e 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -738,7 +738,7 @@ static void s390_pci_iommu_free(S390pciState *s, PCIBus *bus, int32_t devfn)
     object_unref(OBJECT(iommu));
 }
 
-static S390PCIGroup *s390_group_create(int id)
+S390PCIGroup *s390_group_create(int id)
 {
     S390PCIGroup *group;
     S390pciState *s = s390_get_phb();
@@ -783,7 +783,7 @@ static void set_pbdev_info(S390PCIBusDevice *pbdev)
     pbdev->zpci_fn.sdma = ZPCI_SDMA_ADDR;
     pbdev->zpci_fn.edma = ZPCI_EDMA_ADDR;
     pbdev->zpci_fn.pchid = 0;
-    pbdev->zpci_fn.ug = ZPCI_DEFAULT_FN_GRP;
+    pbdev->zpci_fn.pfgid = ZPCI_DEFAULT_FN_GRP;
     pbdev->zpci_fn.fid = pbdev->fid;
     pbdev->zpci_fn.uid = pbdev->uid;
     pbdev->pci_group = s390_group_find(ZPCI_DEFAULT_FN_GRP);
@@ -863,7 +863,8 @@ static int s390_pci_msix_init(S390PCIBusDevice *pbdev)
     name = g_strdup_printf("msix-s390-%04x", pbdev->uid);
     memory_region_init_io(&pbdev->msix_notify_mr, OBJECT(pbdev),
                           &s390_msi_ctrl_ops, pbdev, name, PAGE_SIZE);
-    memory_region_add_subregion(&pbdev->iommu->mr, ZPCI_MSI_ADDR,
+    memory_region_add_subregion(&pbdev->iommu->mr,
+                                pbdev->pci_group->zpci_group.msia,
                                 &pbdev->msix_notify_mr);
     g_free(name);
 
@@ -1016,6 +1017,8 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
     if (object_dynamic_cast(OBJECT(dev), "vfio-pci")) {
         pbdev->fh |= FH_SHM_VFIO;
         pbdev->iommu->dma_limit = s390_pci_start_dma_count(s, pbdev);
+        /* Fill in CLP information passed via the vfio region */
+        s390_pci_get_clp_info(pbdev);
     } else {
         pbdev->fh |= FH_SHM_EMUL;
     }
diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
index 0621fa386ced..d5c78063b5bc 100644
--- a/hw/s390x/s390-pci-vfio.c
+++ b/hw/s390x/s390-pci-vfio.c
@@ -10,9 +10,13 @@
  */
 
 #include 
+#include 
+#include 
 
 #include "qemu/osdep.h"
+#include "trace.h"
 #include "hw/s390x/s390-pci-bus.h"
+#include "hw/s390x/s390-pci-clp.h"
 #include "hw/s390x/s390-pci-vfio.h"
 #include "hw/vfio/pci.h"
 #include "hw/vfio/vfio-common.h"
@@ -94,3 +98,179 @@ void s390_pci_end_dma_count(S390pciState *s, S390PCIDMACount *cnt)
         QTAILQ_REMOVE(&s->zpci_dma_limit, cnt, link);
     }
 }
+
+static void s390_pci_read_base(S390PCIBusDevice *pbdev,
+                               struct vfio_device_info *info)
+{
+    struct vfio_info_cap_header *hdr;
+    struct vfio_device_info_cap_zpci_base *cap;
+    VFIOPCIDevice *vpci =  container_of(pbdev->pdev, VFIOPCIDevice, pdev);
+
+    hdr = vfio_get_device_info_cap(info, VFIO_DEVICE_INFO_CAP_ZPCI_BASE);
+
+    /* If capability not provided, just leave the defaults in place */
+    if (hdr == NULL) {
+        trace_s390_pci_clp_cap(vpci->vbasedev.name,
+                               VFIO_DEVICE_INFO_CAP_ZPCI_BASE);
+        return;
+    }
+    cap = (void *) hdr;
+
+    pbdev->zpci_fn.sdma = cap->start_dma;
+    pbdev->zpci_fn.edma = cap->end_dma;
+    pbdev->zpci_fn.pchid = cap->pchid;
+    pbdev->zpci_fn.vfn = cap->vfn;
+    pbdev->zpci_fn.pfgid = cap->gid;
+    /* The following values remain 0 until we support other FMB formats */
+    pbdev->zpci_fn.fmbl = 0;
+    pbdev->zpci_fn.pft = 0;
+}
+
+static void s390_pci_read_group(S390PCIBusDevice *pbdev,
+                                struct vfio_device_info *info)
+{
+    struct vfio_info_cap_header *hdr;
+    struct vfio_device_info_cap_zpci_group *cap;
+    ClpRspQueryPciGrp *resgrp;
+    VFIOPCIDevice *vpci =  container_of(pbdev->pdev, VFIOPCIDevice, pdev);
+
+    hdr = vfio_get_device_info_cap(info, VFIO_DEVICE_INFO_CAP_ZPCI_GROUP);
+
+    /* If capability not provided, just use the default group */
+    if (hdr == NULL) {
+        trace_s390_pci_clp_cap(vpci->vbasedev.name,
+

[PULL v3 31/32] hw/vfio: Use lock guard macros

2020-11-01 Thread Alex Williamson

From: Amey Narkhede 

Use QEMU's LOCK_GUARD macros in hw/vfio.
This saves manual unlock calls.

Signed-off-by: Amey Narkhede 
Signed-off-by: Alex Williamson 
---
 hw/vfio/platform.c |7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index 869ed2c39dcd..cc3f66f7e44c 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -166,7 +166,7 @@ static void vfio_intp_mmap_enable(void *opaque)
     VFIOINTp *tmp;
     VFIOPlatformDevice *vdev = (VFIOPlatformDevice *)opaque;
 
-    qemu_mutex_lock(&vdev->intp_mutex);
+    QEMU_LOCK_GUARD(&vdev->intp_mutex);
     QLIST_FOREACH(tmp, &vdev->intp_list, next) {
         if (tmp->state == VFIO_IRQ_ACTIVE) {
             trace_vfio_platform_intp_mmap_enable(tmp->pin);
@@ -174,12 +174,10 @@ static void vfio_intp_mmap_enable(void *opaque)
             timer_mod(vdev->mmap_timer,
                       qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
                       vdev->mmap_timeout);
-            qemu_mutex_unlock(&vdev->intp_mutex);
             return;
         }
     }
     vfio_mmap_set_enabled(vdev, true);
-    qemu_mutex_unlock(&vdev->intp_mutex);
 }
 
 /**
@@ -289,7 +287,7 @@ static void vfio_platform_eoi(VFIODevice *vbasedev)
     VFIOPlatformDevice *vdev =
         container_of(vbasedev, VFIOPlatformDevice, vbasedev);
 
-    qemu_mutex_lock(&vdev->intp_mutex);
+    QEMU_LOCK_GUARD(&vdev->intp_mutex);
     QLIST_FOREACH(intp, &vdev->intp_list, next) {
         if (intp->state == VFIO_IRQ_ACTIVE) {
             trace_vfio_platform_eoi(intp->pin,
@@ -314,7 +312,6 @@ static void vfio_platform_eoi(VFIODevice *vbasedev)
         vfio_intp_inject_pending_lockheld(intp);
         QSIMPLEQ_REMOVE_HEAD(&vdev->pending_intp_queue, pqnext);
     }
-    qemu_mutex_unlock(&vdev->intp_mutex);
 }
 
 /**

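QEMU_LOCK_GUARD unlocks automatically when the enclosing scope ends, which is why the explicit qemu_mutex_unlock() before the early return and at the end of each function can be dropped. The sketch below illustrates the same scope-exit idea with plain pthreads and the GCC/Clang cleanup attribute. SCOPED_LOCK, unlock_guard(), intp_mutex and check_active() are hypothetical names; this is an illustration of the mechanism, not QEMU's implementation.

```c
#include <assert.h>
#include <pthread.h>

/* Cleanup handler: receives a pointer to the guard variable. */
static void unlock_guard(pthread_mutex_t **m)
{
    pthread_mutex_unlock(*m);
}

/* Lock now; the cleanup attribute unlocks when the scope is left.
 * Only one SCOPED_LOCK per scope, since the guard name is fixed. */
#define SCOPED_LOCK(mutex)                                               \
    pthread_mutex_t *scoped_lock_ __attribute__((cleanup(unlock_guard))) \
        = (pthread_mutex_lock(mutex), (mutex))

static pthread_mutex_t intp_mutex = PTHREAD_MUTEX_INITIALIZER;
static int active_irqs;

/* Both the early return and the normal return release the mutex. */
static int check_active(void)
{
    SCOPED_LOCK(&intp_mutex);
    if (active_irqs > 0) {
        return 1;          /* unlocked by unlock_guard() on the way out */
    }
    active_irqs++;         /* still under the lock here */
    return 0;
}
```

After either call path returns, pthread_mutex_trylock() on the mutex succeeds, which shows that no unlock was missed.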
[PULL v3 25/32] s390x/pci: create a header dedicated to PCI CLP

2020-11-01 Thread Alex Williamson

From: Pierre Morel 

To have a clean separation between the s390-pci-bus.h and s390-pci-inst.h
headers, we export the PCI CLP instructions in a dedicated header.

Signed-off-by: Pierre Morel 
Signed-off-by: Matthew Rosato 
Reviewed-by: Cornelia Huck 
Signed-off-by: Alex Williamson 
---
 include/hw/s390x/s390-pci-bus.h  |1 
 include/hw/s390x/s390-pci-clp.h  |  211 ++
 include/hw/s390x/s390-pci-inst.h |  196 ---
 3 files changed, 212 insertions(+), 196 deletions(-)
 create mode 100644 include/hw/s390x/s390-pci-clp.h

diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
index 6a35f1365bec..5f339e57fb68 100644
--- a/include/hw/s390x/s390-pci-bus.h
+++ b/include/hw/s390x/s390-pci-bus.h
@@ -19,6 +19,7 @@
 #include "hw/s390x/sclp.h"
 #include "hw/s390x/s390_flic.h"
 #include "hw/s390x/css.h"
+#include "hw/s390x/s390-pci-clp.h"
 #include "qom/object.h"
 
 #define TYPE_S390_PCI_HOST_BRIDGE "s390-pcihost"
diff --git a/include/hw/s390x/s390-pci-clp.h b/include/hw/s390x/s390-pci-clp.h
new file mode 100644
index ..3708acd173c6
--- /dev/null
+++ b/include/hw/s390x/s390-pci-clp.h
@@ -0,0 +1,211 @@
+/*
+ * s390 CLP instruction definitions
+ *
+ * Copyright 2019 IBM Corp.
+ * Author(s): Pierre Morel 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#ifndef HW_S390_PCI_CLP
+#define HW_S390_PCI_CLP
+
+/* CLP common request & response block size */
+#define CLP_BLK_SIZE 4096
+#define PCI_BAR_COUNT 6
+#define PCI_MAX_FUNCTIONS 4096
+
+typedef struct ClpReqHdr {
+    uint16_t len;
+    uint16_t cmd;
+} QEMU_PACKED ClpReqHdr;
+
+typedef struct ClpRspHdr {
+    uint16_t len;
+    uint16_t rsp;
+} QEMU_PACKED ClpRspHdr;
+
+/* CLP Response Codes */
+#define CLP_RC_OK         0x0010  /* Command request successfully */
+#define CLP_RC_CMD        0x0020  /* Command code not recognized */
+#define CLP_RC_PERM       0x0030  /* Command not authorized */
+#define CLP_RC_FMT        0x0040  /* Invalid command request format */
+#define CLP_RC_LEN        0x0050  /* Invalid command request length */
+#define CLP_RC_8K         0x0060  /* Command requires 8K LPCB */
+#define CLP_RC_RESNOT0    0x0070  /* Reserved field not zero */
+#define CLP_RC_NODATA     0x0080  /* No data available */
+#define CLP_RC_FC_UNKNOWN 0x0100  /* Function code not recognized */
+
+/*
+ * Call Logical Processor - Command Codes
+ */
+#define CLP_LIST_PCI        0x0002
+#define CLP_QUERY_PCI_FN    0x0003
+#define CLP_QUERY_PCI_FNGRP 0x0004
+#define CLP_SET_PCI_FN      0x0005
+
+/* PCI function handle list entry */
+typedef struct ClpFhListEntry {
+    uint16_t device_id;
+    uint16_t vendor_id;
+#define CLP_FHLIST_MASK_CONFIG 0x8000
+    uint32_t config;
+    uint32_t fid;
+    uint32_t fh;
+} QEMU_PACKED ClpFhListEntry;
+
+#define CLP_RC_SETPCIFN_FH      0x0101 /* Invalid PCI fn handle */
+#define CLP_RC_SETPCIFN_FHOP    0x0102 /* Fn handle not valid for op */
+#define CLP_RC_SETPCIFN_DMAAS   0x0103 /* Invalid DMA addr space */
+#define CLP_RC_SETPCIFN_RES     0x0104 /* Insufficient resources */
+#define CLP_RC_SETPCIFN_ALRDY   0x0105 /* Fn already in requested state */
+#define CLP_RC_SETPCIFN_ERR     0x0106 /* Fn in permanent error state */
+#define CLP_RC_SETPCIFN_RECPND  0x0107 /* Error recovery pending */
+#define CLP_RC_SETPCIFN_BUSY    0x0108 /* Fn busy */
+#define CLP_RC_LISTPCI_BADRT    0x010a /* Resume token not recognized */
+#define CLP_RC_QUERYPCIFG_PFGID 0x010b /* Unrecognized PFGID */
+
+/* request or response block header length */
+#define LIST_PCI_HDR_LEN 32
+
+/* Number of function handles fitting in response block */
+#define CLP_FH_LIST_NR_ENTRIES \
+    ((CLP_BLK_SIZE - 2 * LIST_PCI_HDR_LEN) \
+        / sizeof(ClpFhListEntry))
+
+#define CLP_SET_ENABLE_PCI_FN  0 /* Yes, 0 enables it */
+#define CLP_SET_DISABLE_PCI_FN 1 /* Yes, 1 disables it */
+
+#define CLP_UTIL_STR_LEN 64
+
+#define CLP_MASK_FMT 0xf000
+
+/* List PCI functions request */
+typedef struct ClpReqListPci {
+    ClpReqHdr hdr;
+    uint32_t fmt;
+    uint64_t reserved1;
+    uint64_t resume_token;
+    uint64_t reserved2;
+} QEMU_PACKED ClpReqListPci;
+
+/* List PCI functions response */
+typedef struct ClpRspListPci {
+    ClpRspHdr hdr;
+    uint32_t fmt;
+    uint64_t reserved1;
+    uint64_t resume_token;
+    uint32_t mdd;
+    uint16_t max_fn;
+    uint8_t flags;
+    uint8_t entry_size;
+    ClpFhListEntry fh_list[CLP_FH_LIST_NR_ENTRIES];
+} QEMU_PACKED ClpRspListPci;
+
+/* Query PCI function request */
+typedef struct ClpReqQueryPci {
+    ClpReqHdr hdr;
+    uint32_t fmt;
+    uint64_t reserved1;
+    uint32_t fh; /* function handle */
+    uint32_t reserved2;
+    uint64_t reserved3;
+} QEMU_PACKED ClpReqQueryPci;
+
+/* Query PCI function response */
+typedef struct ClpRspQueryPci {
+    ClpRspHdr hdr;
+    uint32_t fmt

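Worked arithmetic for CLP_FH_LIST_NR_ENTRIES: a packed ClpFhListEntry is 2 + 2 + 4 + 4 + 4 = 16 bytes, and the macro subtracts twice LIST_PCI_HDR_LEN from the 4096-byte block, so (4096 - 2 * 32) / 16 = 4032 / 16 = 252 handles fit in one response. The compile-time checks below restate this; they assume QEMU_PACKED behaves like GCC/Clang's __attribute__((packed)).

```c
#include <assert.h>
#include <stdint.h>

#define CLP_BLK_SIZE     4096
#define LIST_PCI_HDR_LEN 32

/* Local copy of the list entry; __attribute__((packed)) stands in for
 * QEMU_PACKED (assumption). */
typedef struct __attribute__((packed)) ClpFhListEntry {
    uint16_t device_id;
    uint16_t vendor_id;
    uint32_t config;
    uint32_t fid;
    uint32_t fh;
} ClpFhListEntry;

#define CLP_FH_LIST_NR_ENTRIES \
    ((CLP_BLK_SIZE - 2 * LIST_PCI_HDR_LEN) / sizeof(ClpFhListEntry))

/* 2 + 2 + 4 + 4 + 4 = 16 bytes per entry. */
_Static_assert(sizeof(ClpFhListEntry) == 16, "entry must be 16 bytes");
/* (4096 - 64) / 16 = 252 entries per response block. */
_Static_assert(CLP_FH_LIST_NR_ENTRIES == 252, "252 handles per block");
```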
[Bug 1902451] Re: incorrect cpuid feature detection

2020-11-01 Thread Luis

** Tags added: cpuid

https://bugs.launchpad.net/bugs/1902451

Title:
  incorrect cpuid feature detection

Status in QEMU:
  New

Bug description:
  Hello,

  I am currently developing an x64 kernel and I wanted to check through
  cpuid whether some features are available in the guest. When I try to
  enable CPU features like vmcb_clean or constant_tsc, QEMU says that my
  host doesn't support the requested features. However, cat
  /proc/cpuinfo tells a different story:

  model name:  AMD Ryzen 5 3500U
  flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
  pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb
  rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf
  pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx
  f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse
  3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext
  perfctr_llc mwaitx cpb hw_pstate sme pti ssbd sev ibpb vmmcall fsgsbase bmi1
  avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves
  clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean
  flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif
  overflow_recov succor smca

  I also checked it myself by running cpuid and checking the bits as
  described in the AMD manual. Everything checks out, but QEMU still fails.

  QEMU version: QEMU emulator version 4.2.0

  $ qemu-system-x86_64 -cpu host,+vmcb_clean,enforce -enable-kvm \
      -drive format=raw,file=target/x86_64-os/debug/bootimage-my_kernel.bin \
      -serial stdio -display none
  qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.800AH:EDX.vmcb-clean [bit 5]
  qemu-system-x86_64: Host doesn't support requested features

  or

  $ qemu-system-x86_64 -cpu host,+constant_tsc,enforce -enable-kvm \
      -drive format=raw,file=target/x86_64-os/debug/bootimage-my_kernel.bin \
      -serial stdio -display none
  qemu-system-x86_64: Property '.constant_tsc' not found
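
QEMU's warning identifies vmcb-clean as bit 5 of EDX in the leaf it prints as 800AH; AMD documents SVM feature bits in CPUID Fn8000_000A. Checking such a bit is a shift and a mask. The sketch takes the EDX value as a parameter instead of executing CPUID, so cpuid_bit_set() and SVM_EDX_VMCB_CLEAN_BIT are illustrative names only; it does not explain why QEMU reports the feature as unsupported.

```c
#include <assert.h>
#include <stdint.h>

/* Bit position taken from the QEMU warning ("EDX.vmcb-clean [bit 5]"). */
#define SVM_EDX_VMCB_CLEAN_BIT 5u

/* Return 1 if `bit` is set in a CPUID output register value, else 0. */
static int cpuid_bit_set(uint32_t reg, unsigned int bit)
{
    return (int)((reg >> bit) & 1u);
}
```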


Re: [PATCH v2 14/19] RFC: accel/tcg: Support split-rwx for darwin/iOS with vm_remap

2020-11-01 Thread Joelle van Dyne

Another change I made in alloc_code_gen_buffer_mirror_vmremap (in my
patch as well) is to remove VM_FLAGS_RANDOM_ADDR. This was causing a
rare out-of-memory error whenever the randomly chosen address was too
high.

-j

On Sat, Oct 31, 2020 at 6:42 PM Joelle van Dyne  wrote:
>
> There's a compiler warning:
>
> warning: incompatible pointer to integer conversion assigning to
> 'mach_vm_address_t' (aka 'unsigned long long') from 'void *'
> [-Wint-conversion]
> buf_rw = tcg_ctx->code_gen_buffer;
>
> I changed it to
> buf_rw = (mach_vm_address_t)tcg_ctx->code_gen_buffer;
>
> Also, MAP_JIT doesn't work with the split mapping (it needs the same
> entitlements that allow RWX mapping), so I made the following
> changes:
>
> @@ -1088,15 +1094,11 @@ static bool alloc_code_gen_buffer(size_t size, int mirror, Error **errp)
>      return true;
>  }
>  #else
> -static bool alloc_code_gen_buffer_anon(size_t size, int prot, Error **errp)
> +static bool alloc_code_gen_buffer_anon(size_t size, int prot, int flags, Error **errp)
>  {
> -    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
>      void *buf;
>
> -#ifdef CONFIG_DARWIN
> -    /* Applicable to both iOS and macOS (Apple Silicon). */
> -    flags |= MAP_JIT;
> -#endif
> +    flags |= MAP_PRIVATE | MAP_ANONYMOUS;
>
>      buf = mmap(NULL, size, prot, flags, -1, 0);
>      if (buf == MAP_FAILED) {
> @@ -1211,7 +1213,7 @@ static bool alloc_code_gen_buffer_mirror_vmremap(size_t size, Error **errp)
>      vm_prot_t cur_prot, max_prot;
>
>      /* Map the read-write portion via normal anon memory. */
> -    if (!alloc_code_gen_buffer_anon(size, PROT_READ | PROT_WRITE, errp)) {
> +    if (!alloc_code_gen_buffer_anon(size, PROT_READ | PROT_WRITE, 0, errp)) {
>          return false;
>      }
>
> @@ -1263,6 +1265,8 @@ static bool alloc_code_gen_buffer_mirror(size_t size, Error **errp)
>
>  static bool alloc_code_gen_buffer(size_t size, int mirror, Error **errp)
>  {
> +    int flags = 0;
> +
>      if (mirror) {
>          Error *local_err = NULL;
>          if (alloc_code_gen_buffer_mirror(size, &local_err)) {
> @@ -1283,8 +1287,11 @@ static bool alloc_code_gen_buffer(size_t size, int mirror, Error **errp)
>      /* The tcg interpreter does not need execute permission. */
>      prot = PROT_READ | PROT_WRITE;
>  #endif
> +#ifdef CONFIG_DARWIN
> +    flags |= MAP_JIT;
> +#endif
>
> -    return alloc_code_gen_buffer_anon(size, prot, errp);
> +    return alloc_code_gen_buffer_anon(size, prot, flags, errp);
>  }
>  #endif /* USE_STATIC_CODE_GEN_BUFFER, WIN32, POSIX */
>
> With this, in addition to the iOS host patches, I was able to run it on
> the iPad, but I am getting random crashes that I am still debugging.
>
> -j
>
> On Thu, Oct 29, 2020 at 5:49 PM Richard Henderson
>  wrote:
> >
> > Cribbed from code posted by Joelle van Dyne ,
> > and rearranged to a cleaner structure.  Completely untested.
> >
> > Signed-off-by: Richard Henderson 
> > ---
> >  accel/tcg/translate-all.c | 68 ++-
> >  1 file changed, 67 insertions(+), 1 deletion(-)
> >
> > diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> > index 3e69ebd1d3..bf8263fdb4 100644
> > --- a/accel/tcg/translate-all.c
> > +++ b/accel/tcg/translate-all.c
> > @@ -1093,6 +1093,11 @@ static bool alloc_code_gen_buffer_anon(size_t size, int prot, Error **errp)
> >      int flags = MAP_PRIVATE | MAP_ANONYMOUS;
> >      void *buf;
> >
> > +#ifdef CONFIG_DARWIN
> > +    /* Applicable to both iOS and macOS (Apple Silicon). */
> > +    flags |= MAP_JIT;
> > +#endif
> > +
> >      buf = mmap(NULL, size, prot, flags, -1, 0);
> >      if (buf == MAP_FAILED) {
> >          error_setg_errno(errp, errno,
> > @@ -1182,13 +1187,74 @@ static bool alloc_code_gen_buffer_mirror_memfd(size_t size, Error **errp)
> >      qemu_madvise(buf_rx, size, QEMU_MADV_HUGEPAGE);
> >      return true;
> >  }
> > -#endif
> > +#endif /* CONFIG_LINUX */
> > +
> > +#ifdef CONFIG_DARWIN
> > +#include 
> > +
> > +extern kern_return_t mach_vm_remap(vm_map_t target_task,
> > +                                   mach_vm_address_t *target_address,
> > +                                   mach_vm_size_t size,
> > +                                   mach_vm_offset_t mask,
> > +                                   int flags,
> > +                                   vm_map_t src_task,
> > +                                   mach_vm_address_t src_address,
> > +                                   boolean_t copy,
> > +                                   vm_prot_t *cur_protection,
> > +                                   vm_prot_t *max_protection,
> > +                                   vm_inherit_t inheritance);
> > +
> > +static bool alloc_code_gen_buffer_mirror_vmremap(size_t size, Error **errp)
> > +{
> > +    kern_return_t ret;
> > +    mach_vm_address_t buf_rw, buf_rx;
> > +    vm_prot_t cur_prot, max_prot;
> > +
> > +    /* Map the read-write portion via normal anon memory. */
> > +    if (!alloc_code_gen

[PATCH-for-5.2] scripts/oss-fuzz: rename bin/qemu-fuzz-i386

2020-11-01 Thread Alexander Bulekov

OSS-Fuzz changed the way it scans for fuzzers in $DEST_DIR. The new code
also scans subdirectories for fuzzers. This means that OSS-Fuzz now
treats bin/qemu-fuzz-i386 as an independent fuzzer (it is not: it
requires a --fuzz-target argument). This has led to coverage-build
failures and false crash reports. To work around this, we take advantage
of OSS-Fuzz's filename extension check: OSS-Fuzz will not run anything
that has an extension other than ".exe":
https://github.com/google/oss-fuzz/blob/master/infra/utils.py#L115

Reported-by: OSS-Fuzz (Issue 26725)
Reported-by: OSS-Fuzz (Issue 26679)
Signed-off-by: Alexander Bulekov 
---

Also, for context:
https://github.com/google/oss-fuzz/issues/4575

 scripts/oss-fuzz/build.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/oss-fuzz/build.sh b/scripts/oss-fuzz/build.sh
index fcae4a0c26..3b1c82b63d 100755
--- a/scripts/oss-fuzz/build.sh
+++ b/scripts/oss-fuzz/build.sh
@@ -91,7 +91,7 @@ make "-j$(nproc)" qemu-fuzz-i386 V=1
 # Copy over the datadir
 cp  -r ../pc-bios/ "$DEST_DIR/pc-bios"
 
-cp "./qemu-fuzz-i386" "$DEST_DIR/bin/"
+cp "./qemu-fuzz-i386" "$DEST_DIR/bin/qemu-fuzz-i386.base"
 
 # Run the fuzzer with no arguments, to print the help-string and get the list
 # of available fuzz-targets. Copy over the qemu-fuzz-i386, naming it according
@@ -104,7 +104,7 @@ do
 # that are thin wrappers around this target that set the required
 # environment variables according to predefined configs.
 if [ "$target" != "generic-fuzz" ]; then
-ln  "$DEST_DIR/bin/qemu-fuzz-i386" \
+ln  "$DEST_DIR/bin/qemu-fuzz-i386.base" \
 "$DEST_DIR/qemu-fuzz-i386-target-$target"
 fi
 done
-- 
2.28.0
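
The workaround relies only on the extension rule described in the commit message: a file with no extension is picked up as a fuzzer, while a name such as qemu-fuzz-i386.base is skipped. The sketch restates that rule in C for illustration. looks_like_fuzzer() is hypothetical and models only the extension part of OSS-Fuzz's (Python) check, not its whole discovery logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical restatement of the extension rule only: a file is a fuzzer
 * candidate if its basename has no '.' or ends in ".exe". */
static bool looks_like_fuzzer(const char *path)
{
    const char *base = strrchr(path, '/');
    const char *dot;

    base = base ? base + 1 : path;   /* strip directory components */
    dot = strrchr(base, '.');
    return dot == NULL || strcmp(dot, ".exe") == 0;
}
```

The per-target hard links keep extension-less names (qemu-fuzz-i386-target-<name>), so they are still discovered, while the renamed base binary is not.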

[PATCH-for-5.2 v2] util/cutils: Fix Coverity array overrun in freq_to_str()

2020-11-01 Thread Philippe Mathieu-Daudé

Rewrite the iteration to avoid an array overrun. This fixes
CID 1435957:  Memory - illegal accesses (OVERRUN):

>>> Overrunning array "suffixes" of 7 8-byte elements at element
index 7 (byte offset 63) using index "idx" (which evaluates to 7).

Note, the biggest input value freq_to_str() can accept is UINT64_MAX,
which is ~18.446 EHz, less than 1000 EHz.

Reported-by: Eduardo Habkost 
Signed-off-by: Philippe Mathieu-Daudé 
---
Supersedes: <20201029185506.1241912-1-f4...@amsat.org>
---
 util/cutils.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/util/cutils.c b/util/cutils.c
index c395974fab4..723051da6e8 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -889,11 +889,13 @@ char *freq_to_str(uint64_t freq_hz)
 {
     static const char *const suffixes[] = { "", "K", "M", "G", "T", "P", "E" };
     double freq = freq_hz;
-    size_t idx = 0;
+    size_t idx;
 
-    while (freq >= 1000.0 && idx < ARRAY_SIZE(suffixes)) {
+    for (idx = 0; idx < ARRAY_SIZE(suffixes) - 1; idx++) {
+        if (freq < 1000.0) {
+            break;
+        }
         freq /= 1000.0;
-        idx++;
     }
 
     return g_strdup_printf("%0.3g %sHz", freq, suffixes[idx]);
-- 
2.26.2

[PATCH-for-5.2 v3] util/cutils: Fix Coverity array overrun in freq_to_str()

2020-11-01 Thread Philippe Mathieu-Daudé

Fix Coverity CID 1435957:  Memory - illegal accesses (OVERRUN):

>>> Overrunning array "suffixes" of 7 8-byte elements at element
index 7 (byte offset 63) using index "idx" (which evaluates to 7).

Note, the biggest input value freq_to_str() can accept is UINT64_MAX,
which is ~18.446 EHz, less than 1000 EHz.

Reported-by: Eduardo Habkost 
Suggested-by: Peter Maydell 
Signed-off-by: Philippe Mathieu-Daudé 
---
v3: Follow Peter's suggestion
---
 util/cutils.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/util/cutils.c b/util/cutils.c
index c395974fab4..2f869a843a5 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -891,10 +891,11 @@ char *freq_to_str(uint64_t freq_hz)
     double freq = freq_hz;
     size_t idx = 0;
 
-    while (freq >= 1000.0 && idx < ARRAY_SIZE(suffixes)) {
+    while (freq >= 1000.0) {
         freq /= 1000.0;
         idx++;
     }
+    assert(idx < ARRAY_SIZE(suffixes));
 
     return g_strdup_printf("%0.3g %sHz", freq, suffixes[idx]);
 }
-- 
2.26.2
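
A GLib-free sketch of the v3 logic, writing into a caller-supplied buffer instead of calling g_strdup_printf(), and using sizeof division in place of QEMU's ARRAY_SIZE(); freq_to_str_buf() is a hypothetical name. Since UINT64_MAX is about 18.4e18 Hz, at most six divisions happen, so idx stops at 6 ("E") and the assert cannot fire.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Same loop and assert as v3, formatted into buf instead of a new string. */
static void freq_to_str_buf(uint64_t freq_hz, char *buf, size_t len)
{
    static const char *const suffixes[] = { "", "K", "M", "G", "T", "P", "E" };
    double freq = freq_hz;
    size_t idx = 0;

    while (freq >= 1000.0) {
        freq /= 1000.0;
        idx++;
    }
    /* UINT64_MAX is ~18.4e18 Hz, so idx is at most 6 here. */
    assert(idx < sizeof(suffixes) / sizeof(suffixes[0]));

    snprintf(buf, len, "%0.3g %sHz", freq, suffixes[idx]);
}
```

For example, 1500000 Hz is divided twice and formatted as "1.5 MHz", and UINT64_MAX comes out as "18.4 EHz".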

Re: [PATCH-for-5.2 v3] util/cutils: Fix Coverity array overrun in freq_to_str()

2020-11-01 Thread Peter Maydell

On Sun, 1 Nov 2020 at 21:57, Philippe Mathieu-Daudé  wrote:
>
> Fix Coverity CID 1435957:  Memory - illegal accesses (OVERRUN):
>
> >>> Overrunning array "suffixes" of 7 8-byte elements at element
> index 7 (byte offset 63) using index "idx" (which evaluates to 7).
>
> Note, the biggest input value freq_to_str() can accept is UINT64_MAX,
> which is ~18.446 EHz, less than 1000 EHz.
>
> Reported-by: Eduardo Habkost 
> Suggested-by: Peter Maydell 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> v3: Follow Peter's suggestion
> ---
>  util/cutils.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/util/cutils.c b/util/cutils.c
> index c395974fab4..2f869a843a5 100644
> --- a/util/cutils.c
> +++ b/util/cutils.c
> @@ -891,10 +891,11 @@ char *freq_to_str(uint64_t freq_hz)
>      double freq = freq_hz;
>      size_t idx = 0;
>
> -    while (freq >= 1000.0 && idx < ARRAY_SIZE(suffixes)) {
> +    while (freq >= 1000.0) {
>          freq /= 1000.0;
>          idx++;
>      }
> +    assert(idx < ARRAY_SIZE(suffixes));
>
>      return g_strdup_printf("%0.3g %sHz", freq, suffixes[idx]);
>  }
> --

Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [PATCH 4/5] spapr: Pass &error_abort when getting some PC DIMM properties

2020-11-01 Thread David Gibson

On Fri, Oct 30, 2020 at 02:25:42PM +0100, Greg Kurz wrote:
> On Wed, 28 Oct 2020 16:22:16 +0100
> Igor Mammedov  wrote:
> 
> > On Tue, 27 Oct 2020 16:18:58 +0100
> > Greg Kurz  wrote:
> > 
> [...]
> > > 
> > > It might require some more code refactoring because the way regular
> > > PC-DIMMs are broken down into a set of logical memory blocks (LMBs),
> > > each one having its own DRC but it's certainly doable. Probably for
> > > QEMU 6.0 though since we're entering soft freeze and David already
> > > fired a PR today.
> > 
> > as long as it's not forgotten, it can be done later.
> > 
> 
> David,
> 
> Can you create a ppc-for-6.0 branch?

Done.

> 
> Cheers,
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [PATCH 1/4] hw/pci-host/prep: Update coding style to make checkpatch.pl happy

2020-11-01 Thread David Gibson

On Mon, Oct 12, 2020 at 09:19:03AM +0200, Philippe Mathieu-Daudé wrote:
> To make the next commit easier to review, clean this code first.
> 
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: David Gibson 

> ---
>  hw/pci-host/prep.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/pci-host/prep.c b/hw/pci-host/prep.c
> index d0323fefb10..80dfb67da43 100644
> --- a/hw/pci-host/prep.c
> +++ b/hw/pci-host/prep.c
> @@ -234,8 +234,10 @@ static void raven_pcihost_realizefn(DeviceState *d, Error **errp)
>              sysbus_init_irq(dev, &s->pci_irqs[i]);
>          }
>      } else {
> -        /* According to PReP specification section 6.1.6 "System Interrupt
> -         * Assignments", all PCI interrupts are routed via IRQ 15 */
> +        /*
> +         * According to PReP specification section 6.1.6 "System Interrupt
> +         * Assignments", all PCI interrupts are routed via IRQ 15
> +         */
>          s->or_irq = OR_IRQ(object_new(TYPE_OR_IRQ));
>          object_property_set_int(OBJECT(s->or_irq), "num-lines", PCI_NUM_PINS,
>                                  &error_fatal);


Re: [PATCH 2/4] hw/pci-host/prep: Remove legacy PReP machine temporary workaround

2020-11-01 Thread David Gibson

On Mon, Oct 12, 2020 at 09:19:04AM +0200, Philippe Mathieu-Daudé wrote:
> The legacy PReP machine has been removed in commit b2ce76a0730
> ("hw/ppc/prep: Remove the deprecated "prep" machine and the
> OpenHackware BIOS"). This temporary workaround is no longer
> required, so remove it.
> 
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: David Gibson 

> ---
>  hw/pci-host/prep.c | 32 +++++++++++---------------------
>  1 file changed, 11 insertions(+), 21 deletions(-)
> 
> diff --git a/hw/pci-host/prep.c b/hw/pci-host/prep.c
> index 80dfb67da43..064593d1e52 100644
> --- a/hw/pci-host/prep.c
> +++ b/hw/pci-host/prep.c
> @@ -75,7 +75,6 @@ struct PRePPCIState {
>      RavenPCIState pci_dev;
>  
>      int contiguous_map;
> -    bool is_legacy_prep;
>  };
>  
>  #define BIOS_SIZE (1 * MiB)
> @@ -229,24 +228,18 @@ static void raven_pcihost_realizefn(DeviceState *d, Error **errp)
>      MemoryRegion *address_space_mem = get_system_memory();
>      int i;
>  
> -    if (s->is_legacy_prep) {
> -        for (i = 0; i < PCI_NUM_PINS; i++) {
> -            sysbus_init_irq(dev, &s->pci_irqs[i]);
> -        }
> -    } else {
> -        /*
> -         * According to PReP specification section 6.1.6 "System Interrupt
> -         * Assignments", all PCI interrupts are routed via IRQ 15
> -         */
> -        s->or_irq = OR_IRQ(object_new(TYPE_OR_IRQ));
> -        object_property_set_int(OBJECT(s->or_irq), "num-lines", PCI_NUM_PINS,
> -                                &error_fatal);
> -        qdev_realize(DEVICE(s->or_irq), NULL, &error_fatal);
> -        sysbus_init_irq(dev, &s->or_irq->out_irq);
> +    /*
> +     * According to PReP specification section 6.1.6 "System Interrupt
> +     * Assignments", all PCI interrupts are routed via IRQ 15.
> +     */
> +    s->or_irq = OR_IRQ(object_new(TYPE_OR_IRQ));
> +    object_property_set_int(OBJECT(s->or_irq), "num-lines", PCI_NUM_PINS,
> +                            &error_fatal);
> +    qdev_realize(DEVICE(s->or_irq), NULL, &error_fatal);
> +    sysbus_init_irq(dev, &s->or_irq->out_irq);
>  
> -        for (i = 0; i < PCI_NUM_PINS; i++) {
> -            s->pci_irqs[i] = qdev_get_gpio_in(DEVICE(s->or_irq), i);
> -        }
> +    for (i = 0; i < PCI_NUM_PINS; i++) {
> +        s->pci_irqs[i] = qdev_get_gpio_in(DEVICE(s->or_irq), i);
>      }
>  
>      qdev_init_gpio_in(d, raven_change_gpio, 1);
> @@ -403,9 +396,6 @@ static Property raven_pcihost_properties[] = {
>      DEFINE_PROP_UINT32("elf-machine", PREPPCIState, pci_dev.elf_machine,
>                         EM_NONE),
>      DEFINE_PROP_STRING("bios-name", PREPPCIState, pci_dev.bios_name),
> -    /* Temporary workaround until legacy prep machine is removed */
> -    DEFINE_PROP_BOOL("is-legacy-prep", PREPPCIState, is_legacy_prep,
> -                     false),
>      DEFINE_PROP_END_OF_LIST()
>  };
>  
>  


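In both patches the four PCI interrupt pins are funnelled through a TYPE_OR_IRQ device into the single upstream line (IRQ 15 on PReP). As a rough illustration only, the level-OR behaviour can be modelled in a few lines of C. The `or_gate` struct and `or_gate_set()` are hypothetical and are not QEMU's implementation; only the value PCI_NUM_PINS = 4 (INTA#..INTD#) matches QEMU.

```c
#include <assert.h>
#include <stdbool.h>

/* Four interrupt pins per PCI device (INTA#..INTD#), as in QEMU. */
#define PCI_NUM_PINS 4

/* Hypothetical model of an OR gate such as QEMU's TYPE_OR_IRQ device. */
struct or_gate {
    bool in[PCI_NUM_PINS]; /* current level of each input pin */
    bool out;              /* level driven onto the upstream line (IRQ 15) */
};

/* Update one input and recompute the output as the OR of all inputs. */
static void or_gate_set(struct or_gate *g, int pin, bool level)
{
    assert(pin >= 0 && pin < PCI_NUM_PINS);
    g->in[pin] = level;

    g->out = false;
    for (int i = 0; i < PCI_NUM_PINS; i++) {
        g->out = g->out || g->in[i];
    }
}
```

Because the output is recomputed from every input level, one pin deasserting does not drop IRQ 15 while another pin is still asserted.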
Re: Out-of-Process Device Emulation session at KVM Forum 2020

2020-11-01 Thread Jason Wang




On 2020/10/30 下午9:15, Stefan Hajnoczi wrote:

On Fri, Oct 30, 2020 at 12:08 PM Jason Wang  wrote:

On 2020/10/30 下午7:13, Stefan Hajnoczi wrote:

On Fri, Oct 30, 2020 at 9:46 AM Jason Wang  wrote:

On 2020/10/30 下午2:21, Stefan Hajnoczi wrote:

On Fri, Oct 30, 2020 at 3:04 AM Alex Williamson
 wrote:

It's great to revisit ideas, but proclaiming a uAPI is bad solely
because the data transfer is opaque, without defining why that's bad,
evaluating the feasibility and implementation of defining a well
specified data format rather than protocol, including cross-vendor
support, or proposing any sort of alternative is not so helpful imo.

The migration approaches in VFIO and vDPA/vhost were designed for
different requirements and I think this is why there are different
perspectives on this. Here is a comparison and how VFIO could be
extended in the future. I see 3 levels of device state compatibility:

1. The device cannot save/load state blobs, instead userspace fetches
and restores specific values of the device's runtime state (e.g. last
processed ring index). This is the vhost approach.

2. The device can save/load state in a standard format. This is
similar to #1 except that there is a single read/write blob interface
instead of fine-grained get_FOO()/set_FOO() interfaces. This approach
pushes the migration state parsing into the device so that userspace
doesn't need knowledge of every device type. With this approach it is
possible for a device from vendor A to migrate to a device from vendor
B, as long as they both implement the same standard migration format.
The limitation of this approach is that vendor-specific state cannot
be transferred.

3. The device can save/load opaque blobs. This is the initial VFIO
approach.

I still don't get why it must be opaque.

If the device state format needs to be in the VMM then each device
needs explicit enablement in each VMM (QEMU, cloud-hypervisor, etc).

Let's invert the question: why does the VMM need to understand the
device state of a _passthrough_ device?


For better manageability, compatibility and debuggability. If we depend
on an opaque structure, do we encourage each device to implement its own
migration protocol? That would be very challenging.

For VFIO in the kernel, I suspect that a uAPI which may result in opaque data
being read from or written to the guest violates the Linux uAPI principle. It
will be very hard, or even impossible, to maintain the uABI. It looks to me
like VFIO is the first subsystem that is trying to do this.

I think our concepts of uAPI are different. The uAPI of read(2) and
write(2) does not define the structure of the data buffers. VFIO
device regions are exactly the same, the structure of the data is not
defined by the kernel uAPI.



I think we're talking about different things. It's not about the data
structure, it's about whether the data read from the kernel can be
understood by userspace.





Maybe microcode and firmware loading is an example we agree on?



I think not. They are bytecodes that

1) have strict ABI definitions
2) are understood by userspace





A device from vendor A cannot migrate to a device from
vendor B because the format is incompatible. This approach works well
when devices have unique guest-visible hardware interfaces so the
guest wouldn't be able to handle migrating a device from vendor A to a
device from vendor B anyway.

For VFIO I guess cross-vendor live migration can't succeed unless we
cheat with the device/vendor IDs.

Yes. I haven't looked into the details of PCI (Sub-)Device/Vendor IDs
and how to best enable migration but I hope that can be solved. The
simplest approach is to override the IDs and make them part of the
guest configuration.


That would be very tricky (or require a whitelist). E.g. the opaque blob
of the src may match the opaque blob of the dst by chance.

Luckily identifying things based on magic constants has been solved
many times in the past.

A central identifier registry prevents all collisions but is a pain to
manage. Or use a 128-bit UUID and self-allocate the identifier with an
extremely low chance of collision:
https://en.wikipedia.org/wiki/Universally_unique_identifier#Collisions
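A minimal sketch of the self-allocated identifier idea, assuming a hypothetical header in front of each standard-format state blob; none of these names come from VFIO or vDPA. In this sketch, compatibility means a byte-identical format UUID plus a source revision the destination understands. It does not settle who publishes the UUID so that two vendors pick the same one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header in front of a standard-format device state blob. */
struct state_hdr {
    uint8_t  format_uuid[16]; /* self-allocated 128-bit format identifier */
    uint32_t version;         /* revision of that format */
    uint32_t len;             /* payload length in bytes */
};

/*
 * Accept the blob only if the destination implements the same format
 * (byte-identical UUID) at the source's revision or a newer one.
 */
static bool state_compatible(const struct state_hdr *src,
                             const uint8_t dst_uuid[16],
                             uint32_t dst_max_version)
{
    assert(src != NULL && dst_uuid != NULL);
    return memcmp(src->format_uuid, dst_uuid, sizeof(src->format_uuid)) == 0 &&
           src->version <= dst_max_version;
}
```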



I may be missing something. I think we're talking about cross-vendor live
migration.

Would you want the src and dest to have the same UUID or not?

If they have different UUIDs, how could we know we can live migrate
between them?

If they have the same UUID, what's the rule forcing the vendors
to choose the same UUID (a spec)?


Thanks





For at least virtio, they will still go with virtio/vDPA. The advantages
are:

1) virtio/vDPA can serve kernel subsystems, which VFIO can't; this is
very important for containers

I'm not sure I understand this. If the kernel wants to use the device
then it doesn't use VFIO, it runs the kernel driver instead.


The current spec is not suitable for all types of device. We've received a lot
of feedback that virtio(pci) might not work very well. Another point is
that there could be vendors that don't want to go wit

Re: Out-of-Process Device Emulation session at KVM Forum 2020

2020-11-01 Thread Jason Wang

On 2020/11/1 下午4:26, Paolo Bonzini wrote:

On Sat, 31 Oct 2020 at 22:49, Michael S. Tsirkin wrote:

> > I still don't get why it must be opaque.
>
> If the device state format needs to be in the VMM then each device
> needs explicit enablement in each VMM (QEMU, cloud-hypervisor, etc).

And QEMU cares why exactly?

QEMU cares for another reason. It is more code to review, and it's
worth spending the time reviewing it only if we can do a decent job
of it.

There are several cases in which drivers migrate non-architectural,
implementation-dependent state. There are some examples in nested
virtualization (the deadline of the VMX preemption timer) and in device
emulation (the RTC also has quite a few examples of how such state changed
through the years). We probably don't have the knowledge of the drivers'
innards to do a decent job at reviewing patches that affect those anyway.

> Let's invert the question: why does the VMM need to understand the
> device state of a _passthrough_ device?

To support cross version migration and compatibility checks.

That doesn't have to be in the VMM. We should give guidance but that 
can be in terms of documentation.

I doubt this can work well if we don't force it via ABI.

Thanks

Also, in QEMU we chose the path of dropping sections on the source 
when migrating to older versions, but that can also be considered a 
deficiency of vmstate---a self-synchronizing format (Anthony many 
years ago wanted to use X509 as the migration format) would be much 
better. And for some specific device types we could define standard 
formats, just like PCI has standard classes.

Paolo

This problem is harder than it appears; I don't think vendors
will do a good job of it without any guidance and standards.

-- 
MST
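One reading of Paolo's "self-synchronizing format" remark is a self-describing, length-prefixed stream in which an older destination can skip sections it does not recognize, instead of the source having to drop them. The sketch assumes a hypothetical layout (4-byte little-endian tag, 4-byte little-endian length, then the payload); it is not QEMU's vmstate wire format, and `load_sections()` is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SEC_HDR_LEN 8 /* hypothetical: 4-byte tag + 4-byte length */

/* Little-endian 32-bit load, independent of host byte order. */
static uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static int tag_known(uint32_t tag, const uint32_t *known, size_t nknown)
{
    for (size_t i = 0; i < nknown; i++) {
        if (known[i] == tag) {
            return 1;
        }
    }
    return 0;
}

/*
 * Walk the stream section by section. Known tags are counted (a real loader
 * would parse them); unknown tags are skipped using their length instead of
 * failing the load. Returns the number of known sections, or -1 if the
 * stream is truncated.
 */
static int load_sections(const uint8_t *buf, size_t len,
                         const uint32_t *known, size_t nknown)
{
    size_t off = 0;
    int n = 0;

    assert(buf != NULL || len == 0);
    while (off < len) {
        if (len - off < SEC_HDR_LEN) {
            return -1;
        }
        uint32_t tag = rd_le32(buf + off);
        uint32_t slen = rd_le32(buf + off + 4);
        off += SEC_HDR_LEN;
        if (slen > len - off) {
            return -1;
        }
        if (tag_known(tag, known, nknown)) {
            n++;
        }
        off += slen;
    }
    return n;
}
```

The price of this design is a per-section header and the need for every field inside a known section to stay parseable by older readers.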

Re: Out-of-Process Device Emulation session at KVM Forum 2020

2020-11-01 Thread Jason Wang




On 2020/10/30 下午7:13, Stefan Hajnoczi wrote:

I still don't get why it must be opaque.

If the device state format needs to be in the VMM then each device
needs explicit enablement in each VMM (QEMU, cloud-hypervisor, etc).

Let's invert the question: why does the VMM need to understand the
device state of a _passthrough_ device?



It's not a 100% passthrough device if you want to support live
migration. E.g. the device state save and restore is not under the
control of the drivers in the guest.


And if I understand correctly, it usually requires device emulation or
mediation, in either userspace or the kernel, to support e.g. dirty page
tracking and other things.
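To make "dirty page tracking" concrete: whoever mediates the device has to record which guest pages the device wrote, so those pages can be re-sent during live migration. A minimal bitmap sketch, assuming 4 KiB pages and a hypothetical 256-page guest; the names are illustrative and this is not the interface of any existing tracker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumed 4 KiB pages */
#define NR_PAGES   256 /* hypothetical 1 MiB guest */

static uint64_t dirty_bitmap[NR_PAGES / 64];

/* Record a device write (e.g. DMA) to guest physical address gpa. */
static void mark_dirty(uint64_t gpa)
{
    uint64_t pfn = gpa >> PAGE_SHIFT;

    assert(pfn < NR_PAGES);
    dirty_bitmap[pfn / 64] |= UINT64_C(1) << (pfn % 64);
}

/*
 * Fetch-and-clear: return up to max dirty page frame numbers in ascending
 * order and clear them, so the next migration pass only sees new writes.
 */
static size_t harvest_dirty(uint64_t *pfns, size_t max)
{
    size_t n = 0;

    for (uint64_t pfn = 0; pfn < NR_PAGES && n < max; pfn++) {
        uint64_t mask = UINT64_C(1) << (pfn % 64);

        if (dirty_bitmap[pfn / 64] & mask) {
            dirty_bitmap[pfn / 64] &= ~mask;
            pfns[n++] = pfn;
        }
    }
    return n;
}
```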


Thanks

[Bug 1902470] [NEW] migration with TLS-MultiFD is stuck when the dst-libvirtd service restarts

2020-11-01 Thread Yan Jin

Public bug reported:

hi,

I found that the multi-channel TLS handshake gets stuck when the dst-libvirtd
restarts: both the src and dst sockets are blocked in recvmsg. Meanwhile, the
live_migration thread is blocked in multifd_send_sync_main, so the migration
cannot be cancelled even though the src-libvirt has delivered the QMP command.

Is there any way to exit the migration when the multi-channel TLS handshake
is stuck? Would setting a TLS handshake timeout take effect?
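The report describes both ends blocked in recvmsg() with no deadline. As a generic POSIX illustration of what a handshake timeout has to achieve (not QEMU's QIOChannel code and not a proposed fix), a blocking read can be bounded by waiting in poll() first. `read_with_timeout()` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait at most timeout_ms for fd to become readable,
 * then read. Returns bytes read (0 on EOF), or -1 with errno set;
 * errno == ETIMEDOUT means no data arrived in time.
 */
static ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;

    assert(timeout_ms >= 0);
    do {
        /* Simplification: a signal restarts the full timeout. */
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0) {
        return -1;
    }
    if (rc == 0) {
        errno = ETIMEDOUT;
        return -1;
    }
    return read(fd, buf, len);
}
```

A timeout only helps if the caller then tears the channel down, so that anything waiting on it (here, multifd_send_sync_main) is woken with an error.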

The stack traces are as follows:

=src qemu-system-aar stack=:
#0  0x87d6f28c in recvmsg () from target:/usr/lib64/libpthread.so.0
#1  0xe3817424 in qio_channel_socket_readv (ioc=0xe9e30a30, 
iov=0xdb58e8a8, niov=1, fds=0x0, nfds=0x0, errp=0x0) at 
../io/channel-socket.c:502
#2  0xe380f468 in qio_channel_readv_full (ioc=0xe9e30a30, 
iov=0xdb58e8a8, niov=1, fds=0x0, nfds=0x0, errp=0x0) at ../io/channel.c:66
#3  0xe380f9e8 in qio_channel_read (ioc=0xe9e30a30, 
buf=0xea204e9b "\026\003\001\001L\001", buflen=5, errp=0x0) at 
../io/channel.c:217
#4  0xe380e7d4 in qio_channel_tls_read_handler (buf=0xea204e9b 
"\026\003\001\001L\001", len=5, opaque=0xfffd38001190) at ../io/channel-tls.c:53
#5  0xe3801114 in qcrypto_tls_session_pull (opaque=0xe99d5700, 
buf=0xea204e9b, len=5) at ../crypto/tlssession.c:89
#6  0x8822ed30 in _gnutls_stream_read (ms=0xdb58eaac, 
pull_func=0xfffd38001870, size=5, bufel=, 
session=0xe983cd60) at buffers.c:346
#7  _gnutls_read (ms=0xdb58eaac, pull_func=0xfffd38001870, size=5, 
bufel=, session=0xe983cd60) at buffers.c:426
#8  _gnutls_io_read_buffered (session=session@entry=0xe983cd60, total=5, 
recv_type=recv_type@entry=4294967295, ms=0xdb58eaac) at buffers.c:581
#9  0x88224954 in recv_headers (ms=, 
record=0x883cd000 , 
htype=65535, type=2284006288, record_params=0xe9e22a60, 
session=0xe983cd60) at record.c:1163
#10 _gnutls_recv_in_buffers (session=session@entry=0xe983cd60, 
type=2284006288, type@entry=GNUTLS_HANDSHAKE, htype=65535, 
htype@entry=GNUTLS_HANDSHAKE_HELLO_RETRY_REQUEST, ms=, 
ms@entry=0) at record.c:1302
#11 0x88230568 in _gnutls_handshake_io_recv_int 
(session=session@entry=0xe983cd60, 
htype=htype@entry=GNUTLS_HANDSHAKE_HELLO_RETRY_REQUEST, 
hsk=hsk@entry=0xdb58ec38, optional=optional@entry=1) at buffers.c:1445
#12 0x88232b90 in _gnutls_recv_handshake 
(session=session@entry=0xe983cd60, 
type=type@entry=GNUTLS_HANDSHAKE_HELLO_RETRY_REQUEST, 
optional=optional@entry=1, buf=buf@entry=0x0) at handshake.c:1534
#13 0x88235b40 in handshake_client 
(session=session@entry=0xe983cd60) at handshake.c:2925
#14 0x88237824 in gnutls_handshake (session=0xe983cd60) at 
handshake.c:2739
#15 0xe380213c in qcrypto_tls_session_handshake 
(session=0xe99d5700, errp=0xdb58ee58) at ../crypto/tlssession.c:493
#16 0xe380ea40 in qio_channel_tls_handshake_task (ioc=0xfffd38001190, 
task=0xea61d4e0, context=0x0) at ../io/channel-tls.c:161
#17 0xe380ec60 in qio_channel_tls_handshake (ioc=0xfffd38001190, 
func=0xe3394d20 , opaque=0xea189c30, 
destroy=0x0, context=0x0) at ../io/channel-tls.c:239
#18 0xe3394e78 in multifd_tls_channel_connect (p=0xea189c30, 
ioc=0xe9e30a30, errp=0xdb58ef28) at ../migration/multifd.c:782
#19 0xe3394f30 in multifd_channel_connect (p=0xea189c30, 
ioc=0xe9e30a30, error=0x0) at ../migration/multifd.c:804
#20 0xe33950b8 in multifd_new_send_channel_async (task=0xea6855a0, 
opaque=0xea189c30) at ../migration/multifd.c:858
#21 0xe3810cf8 in qio_task_complete (task=0xea6855a0) at 
../io/task.c:197
#22 0xe381096c in qio_task_thread_result (opaque=0xea6855a0) at 
../io/task.c:112
#23 0x88701df8 in ?? () from target:/usr/lib64/libglib-2.0.so.0
#24 0x88705a7c in g_main_context_dispatch () from 
target:/usr/lib64/libglib-2.0.so.0
#25 0xe3a5a29c in glib_pollfds_poll () at ../util/main-loop.c:221
#26 0xe3a5a324 in os_host_main_loop_wait (timeout=0) at 
../util/main-loop.c:244
#27 0xe3a5a444 in main_loop_wait (nonblocking=0) at 
../util/main-loop.c:520
#28 0xe3696b20 in qemu_main_loop () at ../softmmu/vl.c:1677
#29 0xe30949e4 in main (argc=81, argv=0xdb58f2c8, 
envp=0xdb58f558) at ../softmmu/main.c:50

=src live_migration stack=:
#0  0x87d6a5d8 in pthread_cond_wait () from 
target:/usr/lib64/libpthread.so.0
#1  0xe3a5f3ec in qemu_sem_wait (sem=0xea189d40) at 
../util/qemu-thread-posix.c:328
#2  0xe3394838 in multifd_send_sync_main (f=0xe983f0e0) at 
../migration/multifd.c:638
#3  0xe37de310 in ram_save_setup (f=0xe983f0e0, 
opaque=0xe4198708 ) at ../migration/ram.c:2588
#4  0xe31cf7ac in qemu_savevm_state_setup (f=0xe983f0e0) at 
../migration/savevm.c:1176
#5  0xe3248360 in

Re: [RFC PATCH v2 07/13] hw/arm/virt-acpi-build: distinguish possible and present cpus Message

2020-11-01 Thread Ying Fang





On 10/30/2020 1:20 AM, Andrew Jones wrote:


You need to remove 'Message' from the summary.

On Tue, Oct 20, 2020 at 09:14:34PM +0800, Ying Fang wrote:

When building ACPI tables regarding CPUs we should always build
them for the number of possible CPUs, not the number of present
CPUs. We then ensure only the present CPUs are enabled.

Signed-off-by: Andrew Jones 


I guess my s-o-b is here because this is a rework of

https://github.com/rhdrjones/qemu/commit/b18d7a889f424b8a8679c43d7f4804fdeeeaf3fd


The s-o-b is given since this one is based on your branch.



I think it changed enough you could just drop my authorship. A
based-on comment in the commit message would be more than enough.


Thanks. Will fix it. Hope it didn't confuse you.



Comment on the patch below.


Signed-off-by: Ying Fang 
---
  hw/arm/virt-acpi-build.c | 17 ++++++++++++-----
  1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index a222981737..fae5a26741 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -57,14 +57,18 @@
  
  #define ARM_SPI_BASE 32
  
-static void acpi_dsdt_add_cpus(Aml *scope, int cpus)
+static void acpi_dsdt_add_cpus(Aml *scope, VirtMachineState *vms)
  {
  uint16_t i;
+CPUArchIdList *possible_cpus = MACHINE(vms)->possible_cpus;
  
-for (i = 0; i < cpus; i++) {
+for (i = 0; i < possible_cpus->len; i++) {
  Aml *dev = aml_device("C%.03X", i);
  aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0007")));
  aml_append(dev, aml_name_decl("_UID", aml_int(i)));
+if (possible_cpus->cpus[i].cpu == NULL) {
+aml_append(dev, aml_name_decl("_STA", aml_int(0)));
+}
  aml_append(scope, dev);
  }
  }
@@ -470,6 +474,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
  const int *irqmap = vms->irqmap;
  AcpiMadtGenericDistributor *gicd;
  AcpiMadtGenericMsiFrame *gic_msi;
+int possible_cpus = MACHINE(vms)->possible_cpus->len;
  int i;
  
  acpi_data_push(table_data, sizeof(AcpiMultipleApicTable));
@@ -480,7 +485,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
  gicd->base_address = cpu_to_le64(memmap[VIRT_GIC_DIST].base);
  gicd->version = vms->gic_version;
  
-for (i = 0; i < MACHINE(vms)->smp.cpus; i++) {
+for (i = 0; i < possible_cpus; i++) {
  AcpiMadtGenericCpuInterface *gicc = acpi_data_push(table_data,
 sizeof(*gicc));
  ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(i));
@@ -495,7 +500,9 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
  gicc->cpu_interface_number = cpu_to_le32(i);
  gicc->arm_mpidr = cpu_to_le64(armcpu->mp_affinity);
  gicc->uid = cpu_to_le32(i);
-gicc->flags = cpu_to_le32(ACPI_MADT_GICC_ENABLED);
+if (i < MACHINE(vms)->smp.cpus) {


Shouldn't this be


Yes, stupid mistake. Maybe it was lost when I was doing the rebase.
Will fix that. Thanks for your patience in replying and reviewing.

Ying Fang.


 if (possible_cpus->cpus[i].cpu != NULL) {


+gicc->flags = cpu_to_le32(ACPI_MADT_GICC_ENABLED);
+}
  
  if (arm_feature(&armcpu->env, ARM_FEATURE_PMU)) {
  gicc->performance_interrupt = cpu_to_le32(PPI(VIRTUAL_PMU_IRQ));
@@ -599,7 +606,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
   * the RTC ACPI device at all when using UEFI.
   */
  scope = aml_scope("\\_SB");
-acpi_dsdt_add_cpus(scope, ms->smp.cpus);
+acpi_dsdt_add_cpus(scope, vms);
  acpi_dsdt_add_uart(scope, &memmap[VIRT_UART],
 (irqmap[VIRT_UART] + ARM_SPI_BASE));
  if (vmc->acpi_expose_flash) {
--
2.23.0




Thanks,
drew


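Andrew's suggested condition can be sketched in isolation: one GICC flags word is emitted per possible CPU, and the enabled bit is derived from whether that slot actually holds a CPU object. The structs below are simplified, hypothetical stand-ins for CPUArchIdList and the MADT entry; bit 0 of the MADT GICC flags is the ACPI "Enabled" bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GICC_ENABLED 0x1u /* bit 0 of the MADT GICC flags: "Enabled" */

/* Hypothetical, reduced stand-in for one CPUArchIdList entry. */
struct cpu_slot {
    void *cpu; /* NULL while this possible CPU is not present */
};

/* Emit one flags word per possible CPU; only present CPUs are enabled. */
static void fill_gicc_flags(const struct cpu_slot *slots, size_t n_possible,
                            uint32_t *flags)
{
    assert(slots != NULL && flags != NULL);
    for (size_t i = 0; i < n_possible; i++) {
        flags[i] = slots[i].cpu != NULL ? GICC_ENABLED : 0;
    }
}
```

Unlike `i < MACHINE(vms)->smp.cpus`, the pointer test stays correct even if the present CPUs are not the lowest-numbered ones, as with slots 0 and 2 in the usage below.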
[Bug 1902470] Re: migration with TLS-MultiFD is stuck when the dst-libvirtd service restarts

2020-11-01 Thread Yan Jin

** Description changed:

  hi,
  
- I found that the multi-channel TLS-handshake will be stuck when the 
dst-libvirtd restarts, both the src and dst sockets are blocked in recvmsg. In 
the meantime, live_migration thread is blocked in multifd_send_sync_main, so
- migration cannot be cancelled though src-libvirt has delivered the QMP 
command.
+ I found that the multi-channel TLS-handshake will be stuck when the dst-
+ libvirtd restarts, both the src and dst sockets are blocked in recvmsg.
+ In the meantime, live_migration thread is blocked in
+ multifd_send_sync_main, so migration cannot be cancelled though src-
+ libvirt has delivered the QMP command.
  
  Is there any way to exit migration when the multi-channel TLS-handshake
  is stuck? Does setting TLS handshake timeout function take effect?
  
  The stack trace are as follows:
  
  =src qemu-system-aar stack=:
  #0  0x87d6f28c in recvmsg () from target:/usr/lib64/libpthread.so.0
  #1  0xe3817424 in qio_channel_socket_readv (ioc=0xe9e30a30, 
iov=0xdb58e8a8, niov=1, fds=0x0, nfds=0x0, errp=0x0) at 
../io/channel-socket.c:502
  #2  0xe380f468 in qio_channel_readv_full (ioc=0xe9e30a30, 
iov=0xdb58e8a8, niov=1, fds=0x0, nfds=0x0, errp=0x0) at ../io/channel.c:66
  #3  0xe380f9e8 in qio_channel_read (ioc=0xe9e30a30, 
buf=0xea204e9b "\026\003\001\001L\001", buflen=5, errp=0x0) at 
../io/channel.c:217
  #4  0xe380e7d4 in qio_channel_tls_read_handler (buf=0xea204e9b 
"\026\003\001\001L\001", len=5, opaque=0xfffd38001190) at ../io/channel-tls.c:53
  #5  0xe3801114 in qcrypto_tls_session_pull (opaque=0xe99d5700, 
buf=0xea204e9b, len=5) at ../crypto/tlssession.c:89
  #6  0x8822ed30 in _gnutls_stream_read (ms=0xdb58eaac, 
pull_func=0xfffd38001870, size=5, bufel=, 
session=0xe983cd60) at buffers.c:346
  #7  _gnutls_read (ms=0xdb58eaac, pull_func=0xfffd38001870, size=5, 
bufel=, session=0xe983cd60) at buffers.c:426
  #8  _gnutls_io_read_buffered (session=session@entry=0xe983cd60, total=5, 
recv_type=recv_type@entry=4294967295, ms=0xdb58eaac) at buffers.c:581
- #9  0x88224954 in recv_headers (ms=, 
record=0x883cd000 , 
htype=65535, type=2284006288,
- record_params=0xe9e22a60, session=0xe983cd60) at record.c:1163
- #10 _gnutls_recv_in_buffers (session=session@entry=0xe983cd60, 
type=2284006288, type@entry=GNUTLS_HANDSHAKE, htype=65535, 
htype@entry=GNUTLS_HANDSHAKE_HELLO_RETRY_REQUEST,
- ms=, ms@entry=0) at record.c:1302
- #11 0x88230568 in _gnutls_handshake_io_recv_int 
(session=session@entry=0xe983cd60, 
htype=htype@entry=GNUTLS_HANDSHAKE_HELLO_RETRY_REQUEST, 
hsk=hsk@entry=0xdb58ec38,
- optional=optional@entry=1) at buffers.c:1445
- #12 0x88232b90 in _gnutls_recv_handshake 
(session=session@entry=0xe983cd60, 
type=type@entry=GNUTLS_HANDSHAKE_HELLO_RETRY_REQUEST, optional=optional@entry=1,
- buf=buf@entry=0x0) at handshake.c:1534
+ #9  0x88224954 in recv_headers (ms=, 
record=0x883cd000 , 
htype=65535, type=2284006288, record_params=0xe9e22a60, 
session=0xe983cd60) at record.c:1163
+ #10 _gnutls_recv_in_buffers (session=session@entry=0xe983cd60, 
type=2284006288, type@entry=GNUTLS_HANDSHAKE, htype=65535, 
htype@entry=GNUTLS_HANDSHAKE_HELLO_RETRY_REQUEST, ms=, 
ms@entry=0) at record.c:1302
+ #11 0x88230568 in _gnutls_handshake_io_recv_int 
(session=session@entry=0xe983cd60, 
htype=htype@entry=GNUTLS_HANDSHAKE_HELLO_RETRY_REQUEST, 
hsk=hsk@entry=0xdb58ec38, optional=optional@entry=1) at buffers.c:1445
+ #12 0x88232b90 in _gnutls_recv_handshake 
(session=session@entry=0xe983cd60, 
type=type@entry=GNUTLS_HANDSHAKE_HELLO_RETRY_REQUEST, 
optional=optional@entry=1, buf=buf@entry=0x0) at handshake.c:1534
  #13 0x88235b40 in handshake_client 
(session=session@entry=0xe983cd60) at handshake.c:2925
  #14 0x88237824 in gnutls_handshake (session=0xe983cd60) at 
handshake.c:2739
  #15 0xe380213c in qcrypto_tls_session_handshake 
(session=0xe99d5700, errp=0xdb58ee58) at ../crypto/tlssession.c:493
  #16 0xe380ea40 in qio_channel_tls_handshake_task (ioc=0xfffd38001190, 
task=0xea61d4e0, context=0x0) at ../io/channel-tls.c:161
- #17 0xe380ec60 in qio_channel_tls_handshake (ioc=0xfffd38001190, 
func=0xe3394d20 , opaque=0xea189c30, 
destroy=0x0, context=0x0)
- at ../io/channel-tls.c:239
+ #17 0xe380ec60 in qio_channel_tls_handshake (ioc=0xfffd38001190, 
func=0xe3394d20 , opaque=0xea189c30, 
destroy=0x0, context=0x0) at ../io/channel-tls.c:239
  #18 0xe3394e78 in multifd_tls_channel_connect (p=0xea189c30, 
ioc=0xe9e30a30, errp=0xdb58ef28) at ../migration/multifd.c:782
  #19 0xe3394f30 in multifd_channel_connect (p=0xea189c30, 
ioc=0xe9e30a30, error=0x0) at ../migration/multifd.c:804
  #20 0xe33950b8 in multifd_new_send

[Bug 1902470] Re: migration with TLS-MultiFD is stuck when the dst-libvirtd service restarts

2020-11-01 Thread Yan Jin

** Description changed:

  hi,
  
  I found that the multi-channel TLS-handshake will be stuck when the dst-
  libvirtd restarts, both the src and dst sockets are blocked in recvmsg.
  In the meantime, live_migration thread is blocked in
  multifd_send_sync_main, so migration cannot be cancelled though src-
  libvirt has delivered the QMP command.
  
  Is there any way to exit migration when the multi-channel TLS-handshake
- is stuck? Does setting TLS handshake timeout function take effect?
+ is stuck? Does setting TLS-handshake timeout function take effect?
  
  The stack trace are as follows:
  
  =src qemu-system-aar stack=:
  #0  0x87d6f28c in recvmsg () from target:/usr/lib64/libpthread.so.0
  #1  0xe3817424 in qio_channel_socket_readv (ioc=0xe9e30a30, 
iov=0xdb58e8a8, niov=1, fds=0x0, nfds=0x0, errp=0x0) at 
../io/channel-socket.c:502
  #2  0xe380f468 in qio_channel_readv_full (ioc=0xe9e30a30, 
iov=0xdb58e8a8, niov=1, fds=0x0, nfds=0x0, errp=0x0) at ../io/channel.c:66
  #3  0xe380f9e8 in qio_channel_read (ioc=0xe9e30a30, 
buf=0xea204e9b "\026\003\001\001L\001", buflen=5, errp=0x0) at 
../io/channel.c:217
  #4  0xe380e7d4 in qio_channel_tls_read_handler (buf=0xea204e9b 
"\026\003\001\001L\001", len=5, opaque=0xfffd38001190) at ../io/channel-tls.c:53
  #5  0xe3801114 in qcrypto_tls_session_pull (opaque=0xe99d5700, 
buf=0xea204e9b, len=5) at ../crypto/tlssession.c:89
  #6  0x8822ed30 in _gnutls_stream_read (ms=0xdb58eaac, 
pull_func=0xfffd38001870, size=5, bufel=, 
session=0xe983cd60) at buffers.c:346
  #7  _gnutls_read (ms=0xdb58eaac, pull_func=0xfffd38001870, size=5, 
bufel=, session=0xe983cd60) at buffers.c:426
  #8  _gnutls_io_read_buffered (session=session@entry=0xe983cd60, total=5, 
recv_type=recv_type@entry=4294967295, ms=0xdb58eaac) at buffers.c:581
  #9  0x88224954 in recv_headers (ms=, 
record=0x883cd000 , 
htype=65535, type=2284006288, record_params=0xe9e22a60, 
session=0xe983cd60) at record.c:1163
  #10 _gnutls_recv_in_buffers (session=session@entry=0xe983cd60, 
type=2284006288, type@entry=GNUTLS_HANDSHAKE, htype=65535, 
htype@entry=GNUTLS_HANDSHAKE_HELLO_RETRY_REQUEST, ms=, 
ms@entry=0) at record.c:1302
  #11 0x88230568 in _gnutls_handshake_io_recv_int 
(session=session@entry=0xe983cd60, 
htype=htype@entry=GNUTLS_HANDSHAKE_HELLO_RETRY_REQUEST, 
hsk=hsk@entry=0xdb58ec38, optional=optional@entry=1) at buffers.c:1445
  #12 0x88232b90 in _gnutls_recv_handshake 
(session=session@entry=0xe983cd60, 
type=type@entry=GNUTLS_HANDSHAKE_HELLO_RETRY_REQUEST, 
optional=optional@entry=1, buf=buf@entry=0x0) at handshake.c:1534
  #13 0x88235b40 in handshake_client 
(session=session@entry=0xe983cd60) at handshake.c:2925
  #14 0x88237824 in gnutls_handshake (session=0xe983cd60) at 
handshake.c:2739
  #15 0xe380213c in qcrypto_tls_session_handshake 
(session=0xe99d5700, errp=0xdb58ee58) at ../crypto/tlssession.c:493
  #16 0xe380ea40 in qio_channel_tls_handshake_task (ioc=0xfffd38001190, 
task=0xea61d4e0, context=0x0) at ../io/channel-tls.c:161
  #17 0xe380ec60 in qio_channel_tls_handshake (ioc=0xfffd38001190, 
func=0xe3394d20 , opaque=0xea189c30, 
destroy=0x0, context=0x0) at ../io/channel-tls.c:239
  #18 0xe3394e78 in multifd_tls_channel_connect (p=0xea189c30, 
ioc=0xe9e30a30, errp=0xdb58ef28) at ../migration/multifd.c:782
  #19 0xe3394f30 in multifd_channel_connect (p=0xea189c30, 
ioc=0xe9e30a30, error=0x0) at ../migration/multifd.c:804
  #20 0xe33950b8 in multifd_new_send_channel_async 
(task=0xea6855a0, opaque=0xea189c30) at ../migration/multifd.c:858
  #21 0xe3810cf8 in qio_task_complete (task=0xea6855a0) at 
../io/task.c:197
  #22 0xe381096c in qio_task_thread_result (opaque=0xea6855a0) at 
../io/task.c:112
  #23 0x88701df8 in ?? () from target:/usr/lib64/libglib-2.0.so.0
  #24 0x88705a7c in g_main_context_dispatch () from 
target:/usr/lib64/libglib-2.0.so.0
  #25 0xe3a5a29c in glib_pollfds_poll () at ../util/main-loop.c:221
  #26 0xe3a5a324 in os_host_main_loop_wait (timeout=0) at 
../util/main-loop.c:244
  #27 0xe3a5a444 in main_loop_wait (nonblocking=0) at 
../util/main-loop.c:520
  #28 0xe3696b20 in qemu_main_loop () at ../softmmu/vl.c:1677
  #29 0xe30949e4 in main (argc=81, argv=0xdb58f2c8, 
envp=0xdb58f558) at ../softmmu/main.c:50
  
  =src live_migration stack=:
  #0  0x87d6a5d8 in pthread_cond_wait () from 
target:/usr/lib64/libpthread.so.0
  #1  0xe3a5f3ec in qemu_sem_wait (sem=0xea189d40) at 
../util/qemu-thread-posix.c:328
  #2  0xe3394838 in multifd_send_sync_main (f=0xe983f0e0) at 
../migration/multifd.c:638
  #3  0xe37de310 in ram_save_setup (f=0xe983f0e0, 
opaqu

[PATCH V2] hw/display/exynos4210_fimd: Fix potential NULL pointer dereference

2020-11-01 Thread AlexChen

In exynos4210_fimd_update(), the pointer s is dereferenced before
being checked for validity, which may lead to a NULL pointer dereference.
So move the assignment to global_width after the check that s is valid.

Reported-by: Euler Robot 
Signed-off-by: Alex Chen 
---
 hw/display/exynos4210_fimd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/display/exynos4210_fimd.c b/hw/display/exynos4210_fimd.c
index 4c16e1f5a0..34a960a976 100644
--- a/hw/display/exynos4210_fimd.c
+++ b/hw/display/exynos4210_fimd.c
@@ -1275,12 +1275,14 @@ static void exynos4210_fimd_update(void *opaque)
     bool blend = false;
     uint8_t *host_fb_addr;
     bool is_dirty = false;
-    const int global_width = (s->vidtcon[2] & FIMD_VIDTCON2_SIZE_MASK) + 1;
+    int global_width;

     if (!s || !s->console || !s->enabled ||
         surface_bits_per_pixel(qemu_console_surface(s->console)) == 0) {
         return;
     }
+
+    global_width = (s->vidtcon[2] & FIMD_VIDTCON2_SIZE_MASK) + 1;
     exynos4210_update_resolution(s);
     surface = qemu_console_surface(s->console);

-- 
2.19.1
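The ordering matters beyond style: if `s` can be NULL, dereferencing it before the check is undefined behaviour, and an optimizing compiler may even treat the later `!s` test as dead. A reduced sketch of the corrected order, using a hypothetical `struct fimd` and mask value rather than the real Exynos4210fimdState and FIMD_VIDTCON2_SIZE_MASK:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIZE_MASK 0x7ffu /* hypothetical stand-in for FIMD_VIDTCON2_SIZE_MASK */

/* Hypothetical, reduced device state. */
struct fimd {
    uint32_t vidtcon2;
    int enabled;
};

/* Returns the derived global width, or -1 when there is nothing to update. */
static int fimd_global_width(const struct fimd *s)
{
    int global_width;

    if (!s || !s->enabled) {
        return -1; /* validate first... */
    }
    /* ...and only then dereference s */
    global_width = (int)(s->vidtcon2 & SIZE_MASK) + 1;
    assert(global_width > 0);
    return global_width;
}
```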

Re: [PATCH] util: Remove redundant checks in the openpty()

2020-11-01 Thread AlexChen

On 2020/10/31 23:21, Peter Maydell wrote:
> On Sat, 31 Oct 2020 at 11:04, AlexChen  wrote:
>>
>> As we can see from the following function call stack, amaster and aslave
>> cannot be NULL: char_pty_open() -> qemu_openpty_raw() -> openpty().
>> In addition, amaster and aslave have already been dereferenced at the
>> beginning of openpty(). So the checks on amaster and aslave in openpty()
>> are redundant.
>>
>> Reported-by: Euler Robot 
>> Signed-off-by: Alex Chen 
> 
> This function is trying to match the BSD/glibc openpty()
> function, so the thing to check here is not QEMU's specific
> current usage but the API specification for openpty():
> https://www.gnu.org/software/libc/manual/html_node/Pseudo_002dTerminal-Pairs.html
> https://www.freebsd.org/cgi/man.cgi?query=openpty
> 
> The spec says that name, termp and winp can all be
> NULL, but it doesn't say this for amaster and aslave,
> so indeed the change in this patch is the correct one.
> 
>> ---
>>  util/qemu-openpty.c | 7 +++----
>>  1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/util/qemu-openpty.c b/util/qemu-openpty.c
>> index eb17f5b0bc..427f43a769 100644
>> --- a/util/qemu-openpty.c
>> +++ b/util/qemu-openpty.c
>> @@ -80,10 +80,9 @@ static int openpty(int *amaster, int *aslave, char *name,
>>          (termp != NULL && tcgetattr(sfd, termp) < 0))
>>          goto err;
>>
>> -    if (amaster)
>> -        *amaster = mfd;
>> -    if (aslave)
>> -        *aslave = sfd;
>> +    *amaster = mfd;
>> +    *aslave = sfd;
>> +
>>      if (winp)
>>          ioctl(sfd, TIOCSWINSZ, winp);
> 
> Reviewed-by: Peter Maydell 
> 
> though you might like to mention in the commit message that
> the openpty() API doesn't allow NULL amaster or aslave
> arguments.
>

Thanks for your review, I will add this description to the commit message
in patch V2.
In addition, since amaster and aslave are not allowed to be NULL,
do we need to check whether amaster or aslave is NULL at the beginning
of openpty()? For example, with this modification:

diff --git a/util/qemu-openpty.c b/util/qemu-openpty.c
index eb17f5b0bc..1aadd39395 100644
--- a/util/qemu-openpty.c
+++ b/util/qemu-openpty.c
@@ -61,6 +61,9 @@ static int openpty(int *amaster, int *aslave, char *name,
     const char *slave;
     int mfd = -1, sfd = -1;

+    if (!amaster || !aslave)
+        goto err;
+
     *amaster = *aslave = -1;

     mfd = open("/dev/ptmx", O_RDWR | O_NOCTTY);
@@ -80,10 +83,9 @@ static int openpty(int *amaster, int *aslave, char *name,
         (termp != NULL && tcgetattr(sfd, termp) < 0))
         goto err;

-    if (amaster)
-        *amaster = mfd;
-    if (aslave)
-        *aslave = sfd;
+    *amaster = mfd;
+    *aslave = sfd;
+
     if (winp)
         ioctl(sfd, TIOCSWINSZ, winp);

@@ -92,7 +94,8 @@ static int openpty(int *amaster, int *aslave, char *name,
 err:
     if (sfd != -1)
         close(sfd);
-    close(mfd);
+    if (mfd != -1)
+        close(mfd);
     return -1;
 }
 #endif
-- 
2.19.1

Thanks,
Alex
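The shape of the proposed change (validate the out-parameters up front, keep both descriptors at -1 until they are opened, and close only what was opened on the error path) can be exercised without a pty. `open_pair()` below is a hypothetical helper that simply opens one path twice. Unlike the patch, it returns EINVAL directly for NULL arguments instead of jumping to the error label; whether such a check is wanted at all, given the openpty() contract Peter cites, is the open question.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

static int open_pair(const char *path, int *a, int *b)
{
    int fa = -1, fb = -1;
    int saved_errno;

    assert(path != NULL);
    if (!a || !b) {
        errno = EINVAL;
        return -1;
    }
    *a = *b = -1;

    fa = open(path, O_RDONLY);
    if (fa < 0) {
        goto err;
    }
    fb = open(path, O_RDONLY);
    if (fb < 0) {
        goto err;
    }

    *a = fa;
    *b = fb;
    return 0;

err:
    saved_errno = errno; /* close() must not clobber the open() error */
    if (fb != -1) {
        close(fb);
    }
    if (fa != -1) {
        close(fa);
    }
    errno = saved_errno;
    return -1;
}
```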

[PATCH] pci/shpc: don't push attention button when ejecting powered-off device

2020-11-01 Thread Roman Kagan

When the slot is in steady powered-off state and the device is being
removed, there's no need to press the attention button.  Nor is it
mandated by the Standard Hot-Plug Controller Specification, Rev. 1.0.

Moreover, it confuses the guest, Linux in particular, which assumes that
an attention button press in this state means that a device has been
inserted and needs to be powered on.  Therefore it transitions the slot
into the BLINKING_ON state for 5 seconds, and discovers at the end that
no device is actually inserted:

... unplug request
[12685.451329] shpchp 0000:01:00.0: Button pressed on Slot(2)
[12685.455478] shpchp 0000:01:00.0: PCI slot #2 - powering off due to button press
... in 5 seconds OS powers off the slot, QEMU ejects the device
[12690.632282] shpchp 0000:01:00.0: Latch open on Slot(2)
... excessive button press in steady powered-off state
[12690.634267] shpchp 0000:01:00.0: Button pressed on Slot(2)
[12690.636256] shpchp 0000:01:00.0: Card not present on Slot(2)
... the last button press spawns powering on the slot
[12690.638909] shpchp 0000:01:00.0: PCI slot #2 - powering on due to button press
... in 5 more seconds attempt to power on discovers empty slot
[12695.735986] shpchp 0000:01:00.0: No adapter on slot(2)

Worse, if the real device insertion happens within 5 seconds of the
apparent completion of the previous device removal (signaled via the
DEVICE_DELETED event), the new button press will be interpreted as the
cancellation of that misguided powering on:

[13448.965295] shpchp 0000:01:00.0: Button pressed on Slot(2)
[13448.969430] shpchp 0000:01:00.0: PCI slot #2 - powering off due to button press
[13454.025107] shpchp 0000:01:00.0: Latch open on Slot(2)
[13454.027101] shpchp 0000:01:00.0: Button pressed on Slot(2)
[13454.029165] shpchp 0000:01:00.0: Card not present on Slot(2)
... the excessive button press spawns powering on the slot
... device has already been ejected by QEMU
[13454.031949] shpchp 0000:01:00.0: PCI slot #2 - powering on due to button press
... new device is inserted in the slot
[13456.861545] shpchp 0000:01:00.0: Latch close on Slot(2)
... valid button press arrives before 5 s since the wrong one
[13456.864894] shpchp 0000:01:00.0: Button pressed on Slot(2)
[13456.869211] shpchp 0000:01:00.0: Card present on Slot(2)
... the valid button press is counted as cancellation of the wrong one
[13456.873173] shpchp 0000:01:00.0: Button cancel on Slot(2)
[13456.877101] shpchp 0000:01:00.0: PCI slot #2 - action canceled due to button press

As a result, the newly inserted device isn't brought up by the guest.

Avoid this situation by not pushing the attention button when the device
in the slot is in powered-off state and is being ejected.

FWIW, the PCIe implementation doesn't suffer from this problem.

Signed-off-by: Roman Kagan 
---
 hw/pci/shpc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/pci/shpc.c b/hw/pci/shpc.c
index b00dce629c..837159c5bd 100644
--- a/hw/pci/shpc.c
+++ b/hw/pci/shpc.c
@@ -300,7 +300,6 @@ static void shpc_slot_command(SHPCDevice *shpc, uint8_t target,
 shpc_set_status(shpc, slot, SHPC_SLOT_STATUS_PRSNT_EMPTY,
 SHPC_SLOT_STATUS_PRSNT_MASK);
 shpc->config[SHPC_SLOT_EVENT_LATCH(slot)] |=
-SHPC_SLOT_EVENT_BUTTON |
 SHPC_SLOT_EVENT_MRL |
 SHPC_SLOT_EVENT_PRESENCE;
 }
@@ -566,7 +565,6 @@ void shpc_device_unplug_request_cb(HotplugHandler *hotplug_dev,
 return;
 }
 
-shpc->config[SHPC_SLOT_EVENT_LATCH(slot)] |= SHPC_SLOT_EVENT_BUTTON;
 state = shpc_get_status(shpc, slot, SHPC_SLOT_STATE_MASK);
 led = shpc_get_status(shpc, slot, SHPC_SLOT_PWR_LED_MASK);
 if (state == SHPC_STATE_DISABLED && led == SHPC_LED_OFF) {
@@ -577,6 +575,8 @@ void shpc_device_unplug_request_cb(HotplugHandler *hotplug_dev,
 shpc->config[SHPC_SLOT_EVENT_LATCH(slot)] |=
 SHPC_SLOT_EVENT_MRL |
 SHPC_SLOT_EVENT_PRESENCE;
+} else {
+shpc->config[SHPC_SLOT_EVENT_LATCH(slot)] |= SHPC_SLOT_EVENT_BUTTON;
 }
 shpc_set_status(shpc, slot, 0, SHPC_SLOT_STATUS_66);
 shpc_interrupt_update(pci_hotplug_dev);
-- 
2.28.0
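The decision this patch makes in shpc_device_unplug_request_cb() can be summarized as a pure function of the slot state and power LED. The enum values and unplug_request_events() below are hypothetical stand-ins for illustration; they do not reproduce the real SHPC register encodings in hw/pci/shpc.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical event bits and slot states (not the real SHPC encodings). */
enum {
    EV_PRESENCE = 1u << 0,
    EV_MRL      = 1u << 1,
    EV_BUTTON   = 1u << 2,
};

typedef enum { SLOT_DISABLED, SLOT_ENABLED } SlotState;
typedef enum { LED_OFF, LED_ON, LED_BLINK } PowerLed;

/* Events to latch for an unplug request, following the logic of the patch. */
static uint8_t unplug_request_events(SlotState state, PowerLed pwr_led)
{
    if (state == SLOT_DISABLED && pwr_led == LED_OFF) {
        /*
         * Steady powered-off slot: the device is removed right away, so only
         * report latch-open and presence change.  Pressing the attention
         * button here would make a Linux guest start powering the slot on.
         */
        return EV_MRL | EV_PRESENCE;
    }
    /* Powered (or transitioning) slot: ask the guest to power it off. */
    return EV_BUTTON;
}
```

The first hunk applies the same rule to shpc_slot_command(): once the guest has powered the slot off and the device is gone, only the MRL and presence events are latched.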

[RFC v3 02/10] target/arm: Update ID fields

2020-11-01 Thread Peng Liang

Update definitions for ID fields, up to ARMv8.6.

Signed-off-by: zhanghailiang 
Signed-off-by: Peng Liang 
---
 target/arm/cpu.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index c18a91676656..4c76fff1985f 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1776,6 +1776,8 @@ FIELD(ID_ISAR6, DP, 4, 4)
 FIELD(ID_ISAR6, FHM, 8, 4)
 FIELD(ID_ISAR6, SB, 12, 4)
 FIELD(ID_ISAR6, SPECRES, 16, 4)
+FIELD(ID_ISAR6, BF16, 20, 4)
+FIELD(ID_ISAR6, I8MM, 24, 4)
 
 FIELD(ID_MMFR0, VMSA, 0, 4)
 FIELD(ID_MMFR0, PMSA, 4, 4)
@@ -1839,6 +1841,9 @@ FIELD(ID_AA64ISAR1, GPI, 28, 4)
 FIELD(ID_AA64ISAR1, FRINTTS, 32, 4)
 FIELD(ID_AA64ISAR1, SB, 36, 4)
 FIELD(ID_AA64ISAR1, SPECRES, 40, 4)
+FIELD(ID_AA64ISAR1, BF16, 44, 4)
+FIELD(ID_AA64ISAR1, DGH, 48, 4)
+FIELD(ID_AA64ISAR1, I8MM, 52, 4)
 
 FIELD(ID_AA64PFR0, EL0, 0, 4)
 FIELD(ID_AA64PFR0, EL1, 4, 4)
@@ -1849,11 +1854,18 @@ FIELD(ID_AA64PFR0, ADVSIMD, 20, 4)
 FIELD(ID_AA64PFR0, GIC, 24, 4)
 FIELD(ID_AA64PFR0, RAS, 28, 4)
 FIELD(ID_AA64PFR0, SVE, 32, 4)
+FIELD(ID_AA64PFR0, SEL2, 36, 4)
+FIELD(ID_AA64PFR0, MPAM, 40, 4)
+FIELD(ID_AA64PFR0, AMU, 44, 4)
+FIELD(ID_AA64PFR0, DIT, 48, 4)
+FIELD(ID_AA64PFR0, CSV2, 56, 4)
+FIELD(ID_AA64PFR0, CSV3, 60, 4)
 
 FIELD(ID_AA64PFR1, BT, 0, 4)
 FIELD(ID_AA64PFR1, SBSS, 4, 4)
 FIELD(ID_AA64PFR1, MTE, 8, 4)
 FIELD(ID_AA64PFR1, RAS_FRAC, 12, 4)
+FIELD(ID_AA64PFR1, MPAM_FRAC, 16, 4)
 
 FIELD(ID_AA64MMFR0, PARANGE, 0, 4)
 FIELD(ID_AA64MMFR0, ASIDBITS, 4, 4)
@@ -1867,6 +1879,8 @@ FIELD(ID_AA64MMFR0, TGRAN16_2, 32, 4)
 FIELD(ID_AA64MMFR0, TGRAN64_2, 36, 4)
 FIELD(ID_AA64MMFR0, TGRAN4_2, 40, 4)
 FIELD(ID_AA64MMFR0, EXS, 44, 4)
+FIELD(ID_AA64MMFR0, FGT, 56, 4)
+FIELD(ID_AA64MMFR0, ECV, 60, 4)
 
 FIELD(ID_AA64MMFR1, HAFDBS, 0, 4)
 FIELD(ID_AA64MMFR1, VMIDBITS, 4, 4)
@@ -1876,6 +1890,8 @@ FIELD(ID_AA64MMFR1, LO, 16, 4)
 FIELD(ID_AA64MMFR1, PAN, 20, 4)
 FIELD(ID_AA64MMFR1, SPECSEI, 24, 4)
 FIELD(ID_AA64MMFR1, XNX, 28, 4)
+FIELD(ID_AA64MMFR1, TWED, 32, 4)
+FIELD(ID_AA64MMFR1, ETS, 36, 4)
 
 FIELD(ID_AA64MMFR2, CNP, 0, 4)
 FIELD(ID_AA64MMFR2, UAO, 4, 4)
@@ -1902,6 +1918,7 @@ FIELD(ID_AA64DFR0, CTX_CMPS, 28, 4)
 FIELD(ID_AA64DFR0, PMSVER, 32, 4)
 FIELD(ID_AA64DFR0, DOUBLELOCK, 36, 4)
 FIELD(ID_AA64DFR0, TRACEFILT, 40, 4)
+FIELD(ID_AA64DFR0, MTPMU, 48, 4)
 
 FIELD(ID_DFR0, COPDBG, 0, 4)
 FIELD(ID_DFR0, COPSDBG, 4, 4)
-- 
2.26.2
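To sanity-check bit positions like these, a small table-driven check can flag two FIELD() entries that claim the same bits of one ID register. FieldDef and first_overlap() are hypothetical, not QEMU APIs. The shifts are taken from ID_AA64PFR0_EL1, where the Arm ARM places AMU at bits [47:44] and DIT at bits [51:48], so giving DIT the same shift as AMU would be caught.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of FIELD(reg, name, shift, length) entries for one register. */
typedef struct {
    const char *name;
    unsigned shift;
    unsigned length;
} FieldDef;

static uint64_t field_mask(const FieldDef *f)
{
    assert(f->length > 0 && f->length <= 64 && f->shift + f->length <= 64);
    return (~UINT64_C(0) >> (64 - f->length)) << f->shift;
}

/* Returns the index of the first field overlapping an earlier one, or -1. */
static int first_overlap(const FieldDef *defs, size_t n)
{
    uint64_t seen = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t m = field_mask(&defs[i]);
        if (seen & m) {
            return (int)i;
        }
        seen |= m;
    }
    return -1;
}
```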

[RFC v3 03/10] target/arm: only set ID_PFR1_EL1.GIC for AArch32 guest

2020-11-01 Thread Peng Liang

Some AArch64 CPUs don't support AArch32 mode, and on such CPUs the
AArch32 ID registers should be 0.  Hence, only set ID_PFR1_EL1.GIC when
the CPU is not an AArch64 CPU.

Signed-off-by: zhanghailiang 
Signed-off-by: Peng Liang 
---
 target/arm/helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 97bb6b8c01b4..ba6f30e02f5f 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -6660,7 +6660,7 @@ static uint64_t id_pfr1_read(CPUARMState *env, const ARMCPRegInfo *ri)
 ARMCPU *cpu = env_archcpu(env);
 uint64_t pfr1 = cpu->isar.id_pfr1;
 
-if (env->gicv3state) {
+if (!arm_feature(&cpu->env, ARM_FEATURE_AARCH64) && env->gicv3state) {
 pfr1 |= 1 << 28;
 }
 return pfr1;
-- 
2.26.2
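For illustration, the post-patch decision in id_pfr1_read() can be written as a pure function. id_pfr1_value() is a hypothetical name; bit 28 is the low bit of the ID_PFR1.GIC field (bits [31:28]), which is what `1 << 28` sets in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pure version of the patched id_pfr1_read() logic. */
static uint64_t id_pfr1_value(uint64_t base_pfr1, bool cpu_is_aarch64,
                              bool has_gicv3)
{
    uint64_t pfr1 = base_pfr1;

    /* Only advertise the GICv3 system register interface to AArch32 CPUs. */
    if (!cpu_is_aarch64 && has_gicv3) {
        pfr1 |= UINT64_C(1) << 28;   /* ID_PFR1.GIC = 1 */
    }
    return pfr1;
}
```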

[RFC v3 10/10] target/arm: Add CPU features to query-cpu-model-expansion

2020-11-01 Thread Peng Liang

Add CPU features to the result of query-cpu-model-expansion so that
other applications (such as libvirt) can know the supported CPU
features.

Signed-off-by: zhanghailiang 
Signed-off-by: Peng Liang 
---
 target/arm/monitor.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/target/arm/monitor.c b/target/arm/monitor.c
index 169d8a64b651..7950206352f1 100644
--- a/target/arm/monitor.c
+++ b/target/arm/monitor.c
@@ -104,6 +104,10 @@ static const char *cpu_model_advertised_features[] = {
 "sve640", "sve768", "sve896", "sve1024", "sve1152", "sve1280",
 "sve1408", "sve1536", "sve1664", "sve1792", "sve1920", "sve2048",
 "kvm-no-adjvtime", "kvm-steal-time",
+"aes", "sha1", "sha2", "crc32", "atomics", "asimdrdm", "sha3", "sm3", "sm4",
+"asimddp", "asimdfhm", "flagm", "rng", "dcpop", "jscvt", "fcma", "lrcpc",
+"frint", "sb", "i8mm", "bf16", "dgh", "fp", "asimd", "dit", "bt", "sbss",
+"uscat", "fphp", "asimdhp", "pmull", "sha512", "flagm2", "dcpodp", "ilrcpc",
 NULL
 };
 
-- 
2.26.2
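cpu_model_advertised_features is a NULL-terminated name array, so adding entries is enough for them to be considered. The sketch below shows one way such a list can be walked so that only names the CPU model actually provides are reported. name_in_list() and collect_advertised() are hypothetical helpers, and the `provided` list stands in for a real per-object property lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if name appears in the NULL-terminated list. */
static bool name_in_list(const char *const *list, const char *name)
{
    for (size_t i = 0; list[i] != NULL; i++) {
        if (strcmp(list[i], name) == 0) {
            return true;
        }
    }
    return false;
}

/*
 * Copy into out[] (up to max entries) every advertised name that the CPU
 * model provides, preserving the advertised order.  Returns the count.
 */
static size_t collect_advertised(const char *const *advertised,
                                 const char *const *provided,
                                 const char **out, size_t max)
{
    size_t n = 0;

    assert(out != NULL || max == 0);
    for (size_t i = 0; advertised[i] != NULL && n < max; i++) {
        if (name_in_list(provided, advertised[i])) {
            out[n++] = advertised[i];
        }
    }
    return n;
}
```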

[RFC v3 07/10] target/arm: Allow ID registers to synchronize to KVM

2020-11-01 Thread Peng Liang

There are 2 steps to synchronize the values of system registers from
CPU state to KVM:
1. write the values of system registers from the CPU state to the
   (index,value) list with write_cpustate_to_list();
2. write the values in the (index,value) list to KVM with
   write_list_to_kvmstate().

In step 1, the values of constant system registers are not written to
the (index,value) list.  However, a constant system register is constant
for the guest but not for QEMU: at startup, QEMU can set a constant
system register to a value that differs from the physical register.  But
if KVM is enabled, the guest cannot read the values QEMU set unless they
are written to the (index,value) list.  So why not try to write them to
KVM when kvm_sync is true?

By the time write_cpustate_to_list() is called, all ID registers are
constant, including ID_PFR1_EL1 and ID_AA64PFR0_EL1, because the GIC has
already been initialized.  Hence, give all ID registers a chance to be
written to KVM, and if the write succeeds, also write them to the
(index,value) list.
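The probe performed for each ID register (read the KVM value, write the new one, read it back, then restore the old value so the real update still happens in write_list_to_kvmstate()) can be modelled without KVM. FakeKvmReg and its accessors are hypothetical stand-ins for kvm_arm_get_one_reg()/kvm_arm_set_one_reg(): only bits in writable_mask accept writes, the way a host may refuse some ID field values. The sketch simplifies error handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical vCPU register: only writable_mask bits accept writes. */
typedef struct {
    uint64_t value;
    uint64_t writable_mask;
} FakeKvmReg;

static int fake_get_one_reg(const FakeKvmReg *r, uint64_t *out)
{
    *out = r->value;
    return 0;
}

static int fake_set_one_reg(FakeKvmReg *r, uint64_t v)
{
    r->value = (r->value & ~r->writable_mask) | (v & r->writable_mask);
    return 0;
}

/* Does the "KVM" side accept newval?  Leaves the register unchanged. */
static bool probe_id_reg(FakeKvmReg *r, uint64_t newval)
{
    uint64_t oldval, readback;
    bool ok;

    if (fake_get_one_reg(r, &oldval) != 0) {
        return false;
    }
    if (oldval == newval) {
        return true;
    }
    if (fake_set_one_reg(r, newval) != 0) {
        return false;
    }
    /* A write can "succeed" yet be ignored, so verify by reading back. */
    ok = fake_get_one_reg(r, &readback) == 0 && readback == newval;
    fake_set_one_reg(r, oldval);
    return ok;
}
```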

Signed-off-by: zhanghailiang 
Signed-off-by: Peng Liang 
---
 target/arm/helper.c  | 46 +---
 target/arm/kvm.c | 38 
 target/arm/kvm_arm.h |  3 +++
 3 files changed, 76 insertions(+), 11 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 7f7100783b3a..41d912c7b8ff 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -35,6 +35,7 @@
 #include "arm_ldst.h"
 #include "exec/cpu_ldst.h"
 #endif
+#include "kvm_arm.h"
 
 #define ARM_CPU_FREQ 1000000000 /* FIXME: 1 GHz, should be configurable */
 
@@ -355,6 +356,16 @@ static bool raw_accessors_invalid(const ARMCPRegInfo *ri)
 return true;
 }
 
+static inline bool is_id_register(const ARMCPRegInfo *ri)
+{
+/*
+ * (Op0, Op1, CRn, CRm, Op2) of ID registers is (3, 0, 0, crm, op2),
+ * where 1<=crm<8, 0<=op2<8.
+ */
+return ri->opc0 == 3 && ri->opc1 == 0 && ri->crn == 0 &&
+ri->crm > 0 && ri->crm < 8;
+}
+
 bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync)
 {
 /* Write the coprocessor state from cpu->env to the (index,value) list. */
@@ -371,30 +382,43 @@ bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync)
 ok = false;
 continue;
 }
-if (ri->type & ARM_CP_NO_RAW) {
+/* Let's give ID registers a chance to synchronize to kvm. */
+if ((ri->type & ARM_CP_NO_RAW) && !(kvm_sync && is_id_register(ri))) {
 continue;
 }
 
 newval = read_raw_cp_reg(&cpu->env, ri);
 if (kvm_sync) {
-/*
- * Only sync if the previous list->cpustate sync succeeded.
- * Rather than tracking the success/failure state for every
- * item in the list, we just recheck "does the raw write we must
- * have made in write_list_to_cpustate() read back OK" here.
- */
-uint64_t oldval = cpu->cpreg_values[i];
+/* Only sync if we can sync to KVM successfully. */
+uint64_t oldval;
+uint64_t kvmval;
 
+if (kvm_arm_get_one_reg(cpu, cpu->cpreg_indexes[i], &oldval)) {
+continue;
+}
 if (oldval == newval) {
 continue;
 }
 
-write_raw_cp_reg(&cpu->env, ri, oldval);
-if (read_raw_cp_reg(&cpu->env, ri) != oldval) {
+if (kvm_arm_set_one_reg(cpu, cpu->cpreg_indexes[i], &newval)) {
+if (is_id_register(ri)) {
+ok = false;
+error_report("Cannot set ID register %s: %s", ri->name,
+ strerror(errno));
+}
+continue;
+}
+if (kvm_arm_get_one_reg(cpu, cpu->cpreg_indexes[i], &kvmval) ||
+kvmval != newval) {
+if (is_id_register(ri)) {
+ok = false;
+error_report("Setting ID register %s doesn't take effect",
+ ri->name);
+}
 continue;
 }
 
-write_raw_cp_reg(&cpu->env, ri, newval);
+kvm_arm_set_one_reg(cpu, cpu->cpreg_indexes[i], &oldval);
 }
 cpu->cpreg_values[i] = newval;
 }
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index ffe186de8d19..40d01ed9e3a4 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -506,6 +506,44 @@ out:
 return ret;
 }
 
+int kvm_arm_get_one_reg(ARMCPU *cpu, uint64_t regidx, uint64_t *target)
+{
+uint32_t v32;
+int ret;
+
+switch (regidx & KVM_REG_SIZE_MASK) {
+case KVM_REG_SIZE_U32:
+ret = kvm_get_one_reg(CPU(cpu), regidx, &v32);
+if (ret == 0) {
+*target = v32;
+}
+return ret;
+case KVM_REG_SIZE_U64:
+return kvm_get_one_reg(CPU(cpu), regidx, target);
+default:
+return -1;
+}
+}
+

[RFC v3 05/10] target/arm: Introduce kvm_arm_cpu_feature_supported

2020-11-01 Thread Peng Liang

Introduce kvm_arm_cpu_feature_supported to check whether KVM supports
setting CPU features on Arm.

Signed-off-by: zhanghailiang 
Signed-off-by: Peng Liang 
---
 target/arm/kvm64.c   | 14 ++++++++++++++
 target/arm/kvm_arm.h |  7 +++++++
 2 files changed, 21 insertions(+)

diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 86a5bca5a4ec..5700c4084090 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -721,6 +721,20 @@ bool kvm_arm_steal_time_supported(void)
 return kvm_check_extension(kvm_state, KVM_CAP_STEAL_TIME);
 }
 
+bool kvm_arm_cpu_feature_supported(void)
+{
+static bool cpu_feature_initialized;
+static bool cpu_feature_supported;
+
+if (!cpu_feature_initialized) {
+cpu_feature_supported = kvm_check_extension(kvm_state,
+KVM_CAP_ARM_CPU_FEATURE);
+cpu_feature_initialized = true;
+}
+
+return cpu_feature_supported;
+}
+
 QEMU_BUILD_BUG_ON(KVM_ARM64_SVE_VQ_MIN != 1);
 
 void kvm_arm_sve_get_vls(CPUState *cs, unsigned long *map)
diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
index eb81b7059eb1..a6a1df775cd2 100644
--- a/target/arm/kvm_arm.h
+++ b/target/arm/kvm_arm.h
@@ -308,6 +308,13 @@ bool kvm_arm_pmu_supported(void);
  */
 bool kvm_arm_sve_supported(void);
 
+/**
+ * kvm_arm_cpu_feature_supported:
+ *
+ * Returns true if KVM can set CPU features and false otherwise.
+ */
+bool kvm_arm_cpu_feature_supported(void);
+
 /**
  * kvm_arm_get_max_vm_ipa_size:
  * @ms: Machine state handle
-- 
2.26.2
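kvm_arm_cpu_feature_supported() probes the capability once and answers from static variables afterwards. The same pattern in isolation, with fake_check_extension() as a hypothetical stand-in for kvm_check_extension() that counts its calls so the caching is observable. Like the patch, it is not thread-safe; that is presumably acceptable if it is only called during CPU setup.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for kvm_check_extension(); counts how often it runs. */
static int probe_calls;

static int fake_check_extension(void)
{
    probe_calls++;
    return 1; /* pretend the capability is present */
}

/* Probe once, then answer from the cached result (not thread-safe). */
static bool cpu_feature_supported_cached(void)
{
    static bool initialized;
    static bool supported;

    if (!initialized) {
        supported = fake_check_extension() > 0;
        initialized = true;
    }
    return supported;
}
```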

[RFC v3 01/10] linux-header: Introduce KVM_CAP_ARM_CPU_FEATURE

2020-11-01 Thread Peng Liang

Introduce KVM_CAP_ARM_CPU_FEATURE.

Signed-off-by: zhanghailiang 
Signed-off-by: Peng Liang 
---
 linux-headers/linux/kvm.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 43580c767c33..146eaec35d49 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1037,6 +1037,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_SMALLER_MAXPHYADDR 185
 #define KVM_CAP_S390_DIAG318 186
 #define KVM_CAP_STEAL_TIME 187
+#define KVM_CAP_ARM_CPU_FEATURE 191
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.26.2

[RFC v3 06/10] target/arm: register CPU features for property

2020-11-01 Thread Peng Liang

The Arm architecture specifies a number of ID registers that are
characterized as comprising a set of 4-bit ID fields. Each ID field
identifies the presence, and possibly the level of support for, a
particular feature in an implementation of the architecture. [1]

For most ID fields, there is a minimum presence value: a field value
equal to or higher than it means that the corresponding CPU feature is
implemented.  Hence, we can use the minimum presence value both to
determine whether a CPU feature is enabled and to enable a CPU feature.

To disable a CPU feature, setting the corresponding ID field to 0x0 (for
an unsigned field) or 0xf (for a signed field) seems like a good idea.
However, it may lead to problems.  For example, ID_AA64PFR0_EL1.FP is a
signed ID field: ID_AA64PFR0_EL1.FP == 0x0 indicates that FP
(floating-point) is implemented, and ID_AA64PFR0_EL1.FP == 0x1 indicates
that FPHP (half-precision floating-point) is implemented as well.  If
ID_AA64PFR0_EL1.FP is set to 0xf to disable FPHP, FP is disabled too,
and the guest kernel may get stuck.  Hence, we add an ni_value
(not-implemented value) so that a CPU feature can be disabled safely.

[1] D13.1.3 Principles of the ID scheme for fields in ID registers in
DDI.0487
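The min_value/ni_value scheme described above can be sketched in isolation. FeatureField and the feat_*() helpers are hypothetical simplifications of CPUFeatureInfo, not the code in this patch. The test uses the FP field of ID_AA64PFR0_EL1 (bits [19:16], signed) to show why disabling "fphp" must write ni_value 0 rather than 0xf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified counterpart of CPUFeatureInfo. */
typedef struct {
    unsigned shift;
    unsigned length;
    int min_value;
    int ni_value;
    bool sign;
} FeatureField;

static uint64_t feat_mask(const FeatureField *f)
{
    assert(f->length > 0 && f->length < 64);
    return ((UINT64_C(1) << f->length) - 1) << f->shift;
}

/* Read the field, sign-extending it when the ID field is signed. */
static int64_t feat_get(uint64_t reg, const FeatureField *f)
{
    uint64_t raw = (reg & feat_mask(f)) >> f->shift;

    if (f->sign && ((raw >> (f->length - 1)) & 1)) {
        return (int64_t)raw - ((int64_t)1 << f->length);
    }
    return (int64_t)raw;
}

static uint64_t feat_put(uint64_t reg, const FeatureField *f, int value)
{
    return (reg & ~feat_mask(f)) | (((uint64_t)value << f->shift) & feat_mask(f));
}

static bool feat_enabled(uint64_t reg, const FeatureField *f)
{
    return feat_get(reg, f) >= f->min_value;
}

/* Enabling writes min_value, disabling writes ni_value; no-op if already so. */
static uint64_t feat_set(uint64_t reg, const FeatureField *f, bool on)
{
    if (feat_enabled(reg, f) == on) {
        return reg;
    }
    return feat_put(reg, f, on ? f->min_value : f->ni_value);
}
```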

Signed-off-by: zhanghailiang 
Signed-off-by: Peng Liang 
---
 target/arm/cpu.c | 188 +++
 1 file changed, 188 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 13179e13e358..c5530550ece0 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1157,6 +1157,193 @@ unsigned int gt_cntfrq_period_ns(ARMCPU *cpu)
   NANOSECONDS_PER_SECOND / cpu->gt_cntfrq_hz : 1;
 }
 
+/**
+ * CPUFeatureInfo:
+ * @reg: The ID register that contains the ID field.
+ * @name: The name of the CPU feature.
+ * @length: The bit length of the ID field.
+ * @shift: The bit shift of the ID field in the ID register.
+ * @min_value: The minimum value at or above which the CPU feature is
+ *   considered implemented.
+ * @ni_value: Not-implemented value. It will be set to the ID field when
+ *   disabling the CPU feature.  Usually, it's min_value - 1.
+ * @sign: Whether the ID field is signed.
+ * @is_32bit: Whether the CPU feature is for 32-bit.
+ *
+ * In ARM, a CPU feature is described by an ID field, which is a 4-bit field in
+ * an ID register.
+ */
+typedef struct CPUFeatureInfo {
+CPUIDReg reg;
+const char *name;
+int length;
+int shift;
+int min_value;
+int ni_value;
+bool sign;
+bool is_32bit;
+} CPUFeatureInfo;
+
+#define FIELD_INFO(feat_name, id_reg, field, s, min_val, ni_val, is32bit) { \
+.reg = id_reg, \
+.length = R_ ## id_reg ## _ ## field ## _LENGTH, \
+.shift = R_ ## id_reg ## _ ## field ## _SHIFT, \
+.sign = s, \
+.min_value = min_val, \
+.ni_value = ni_val, \
+.name = feat_name, \
+.is_32bit = is32bit, \
+}
+
+static struct CPUFeatureInfo cpu_features[] = {
+FIELD_INFO("aes", ID_AA64ISAR0, AES, false, 1, 0, false),
+FIELD_INFO("sha1", ID_AA64ISAR0, SHA1, false, 1, 0, false),
+FIELD_INFO("sha2", ID_AA64ISAR0, SHA2, false, 1, 0, false),
+FIELD_INFO("crc32", ID_AA64ISAR0, CRC32, false, 1, 0, false),
+FIELD_INFO("atomics", ID_AA64ISAR0, ATOMIC, false, 1, 0, false),
+FIELD_INFO("asimdrdm", ID_AA64ISAR0, RDM, false, 1, 0, false),
+FIELD_INFO("sha3", ID_AA64ISAR0, SHA3, false, 1, 0, false),
+FIELD_INFO("sm3", ID_AA64ISAR0, SM3, false, 1, 0, false),
+FIELD_INFO("sm4", ID_AA64ISAR0, SM4, false, 1, 0, false),
+FIELD_INFO("asimddp", ID_AA64ISAR0, DP, false, 1, 0, false),
+FIELD_INFO("asimdfhm", ID_AA64ISAR0, FHM, false, 1, 0, false),
+FIELD_INFO("flagm", ID_AA64ISAR0, TS, false, 1, 0, false),
+FIELD_INFO("rng", ID_AA64ISAR0, RNDR, false, 1, 0, false),
+
+FIELD_INFO("dcpop", ID_AA64ISAR1, DPB, false, 1, 0, false),
+FIELD_INFO("jscvt", ID_AA64ISAR1, JSCVT, false, 1, 0, false),
+FIELD_INFO("fcma", ID_AA64ISAR1, FCMA, false, 1, 0, false),
+FIELD_INFO("lrcpc", ID_AA64ISAR1, LRCPC, false, 1, 0, false),
+FIELD_INFO("frint", ID_AA64ISAR1, FRINTTS, false, 1, 0, false),
+FIELD_INFO("sb", ID_AA64ISAR1, SB, false, 1, 0, false),
+FIELD_INFO("i8mm", ID_AA64ISAR1, I8MM, false, 1, 0, false),
+FIELD_INFO("bf16", ID_AA64ISAR1, BF16, false, 1, 0, false),
+FIELD_INFO("dgh", ID_AA64ISAR1, DGH, false, 1, 0, false),
+
+FIELD_INFO("fp", ID_AA64PFR0, FP, true, 0, 0xf, false),
+FIELD_INFO("asimd", ID_AA64PFR0, ADVSIMD, true, 0, 0xf, false),
+FIELD_INFO("dit", ID_AA64PFR0, DIT, false, 1, 0, false),
+
+FIELD_INFO("bt", ID_AA64PFR1, BT, false, 1, 0, false),
+FIELD_INFO("sbss", ID_AA64PFR1, SBSS, false, 1, 0, false),
+
+FIELD_INFO("uscat", ID_AA64MMFR2, AT, false, 1, 0, false),
+
+{
+.reg = ID_AA64PFR0, .length = R_ID_AA64PFR0_FP_LENGTH,
+.shift = R_ID_AA64PFR0_FP_SHIFT, .sign = true, .min_value = 1,
+.ni_value = 0, .name = "fphp", .is_32bit = false,
+},
+{
+.reg = ID_AA64PFR0, .len

[RFC v3 00/10] Support disable/enable CPU features for AArch64

2020-11-01 Thread Peng Liang

QEMU does not currently support disabling/enabling CPU features on AArch64.
This patch series adds that support.

First, we change the isar struct in ARMCPU to an array for convenience.
Second, we add support for configuring CPU features on AArch64 and make
sure that the ID registers can be synchronized to KVM, so that the guest
can read the values we configure.  Third, we add a mechanism to resolve
dependencies between some CPU features.  Last, we add
KVM_CAP_ARM_CPU_FEATURE to check whether KVM supports setting CPU
features on AArch64.

We also export CPU features in the result of the QMP command
query-cpu-model-expansion so that libvirt can discover the supported CPU
features.

The ID field definitions are updated to ARMv8.6, and some CPU features
are added based on the new ID fields.

With the related KVM patch set [1], CPU features can be disabled/enabled
on AArch64.

[1] 
https://patchwork.kernel.org/project/kvm/cover/20201102033422.657391-1-liangpen...@huawei.com/

v2 -> v3:
 - rebase to newest code

v1 -> v2:
 - adjust the order of patches
 - only expose AArch64 features which are exposed by the kernel via /proc/cpuinfo
 - add check for conflicting CPU features set by the user
 - split the change in linux-headers/linux/kvm.h

Peng Liang (10):
  linux-header: Introduce KVM_CAP_ARM_CPU_FEATURE
  target/arm: Update ID fields
  target/arm: only set ID_PFR1_EL1.GIC for AArch32 guest
  target/arm: convert isar regs to array
  target/arm: Introduce kvm_arm_cpu_feature_supported
  target/arm: register CPU features for property
  target/arm: Allow ID registers to synchronize to KVM
  target/arm: Introduce user_mask to indicate whether the feature is set
explicitly
  target/arm: introduce CPU feature dependency mechanism
  target/arm: Add CPU features to query-cpu-model-expansion

 hw/intc/armv7m_nvic.c |  32 +--
 linux-headers/linux/kvm.h |   1 +
 target/arm/cpu.c  | 575 +-
 target/arm/cpu.h  | 255 +
 target/arm/cpu64.c| 190 ++---
 target/arm/cpu_tcg.c  | 314 +++--
 target/arm/helper.c   | 106 ---
 target/arm/internals.h|  15 +-
 target/arm/kvm.c  |  38 +++
 target/arm/kvm64.c|  90 +++---
 target/arm/kvm_arm.h  |  10 +
 target/arm/monitor.c  |   4 +
 12 files changed, 1038 insertions(+), 592 deletions(-)

-- 
2.26.2

[RFC v3 04/10] target/arm: convert isar regs to array

2020-11-01 Thread Peng Liang

The isar in ARMCPU is a struct, each field of which represents an ID
register.  That layout is inconvenient for supporting CPU features on
AArch64, so change it to an array first and add an enum to index the
array.  Since the high 32 bits of ID registers are never accessed in
AArch32, it is harmless to widen the AArch32 ID registers to 64 bits.

Signed-off-by: zhanghailiang 
Signed-off-by: Peng Liang 
---
 hw/intc/armv7m_nvic.c  |  32 ++---
 target/arm/cpu.c   | 251 
 target/arm/cpu.h   | 237 ---
 target/arm/cpu64.c | 190 -
 target/arm/cpu_tcg.c   | 314 +
 target/arm/helper.c|  58 
 target/arm/internals.h |  15 +-
 target/arm/kvm64.c |  76 +-
 8 files changed, 593 insertions(+), 580 deletions(-)

diff --git a/hw/intc/armv7m_nvic.c b/hw/intc/armv7m_nvic.c
index 42b1ad59e65d..9fad5df74481 100644
--- a/hw/intc/armv7m_nvic.c
+++ b/hw/intc/armv7m_nvic.c
@@ -1241,17 +1241,17 @@ static uint32_t nvic_readl(NVICState *s, uint32_t offset, MemTxAttrs attrs)
 if (!arm_feature(&cpu->env, ARM_FEATURE_M_MAIN)) {
 goto bad_offset;
 }
-return cpu->isar.id_pfr0;
+return cpu->isar.regs[ID_PFR0];
 case 0xd44: /* PFR1.  */
 if (!arm_feature(&cpu->env, ARM_FEATURE_M_MAIN)) {
 goto bad_offset;
 }
-return cpu->isar.id_pfr1;
+return cpu->isar.regs[ID_PFR1];
 case 0xd48: /* DFR0.  */
 if (!arm_feature(&cpu->env, ARM_FEATURE_M_MAIN)) {
 goto bad_offset;
 }
-return cpu->isar.id_dfr0;
+return cpu->isar.regs[ID_DFR0];
 case 0xd4c: /* AFR0.  */
 if (!arm_feature(&cpu->env, ARM_FEATURE_M_MAIN)) {
 goto bad_offset;
@@ -1261,52 +1261,52 @@ static uint32_t nvic_readl(NVICState *s, uint32_t offset, MemTxAttrs attrs)
 if (!arm_feature(&cpu->env, ARM_FEATURE_M_MAIN)) {
 goto bad_offset;
 }
-return cpu->isar.id_mmfr0;
+return cpu->isar.regs[ID_MMFR0];
 case 0xd54: /* MMFR1.  */
 if (!arm_feature(&cpu->env, ARM_FEATURE_M_MAIN)) {
 goto bad_offset;
 }
-return cpu->isar.id_mmfr1;
+return cpu->isar.regs[ID_MMFR1];
 case 0xd58: /* MMFR2.  */
 if (!arm_feature(&cpu->env, ARM_FEATURE_M_MAIN)) {
 goto bad_offset;
 }
-return cpu->isar.id_mmfr2;
+return cpu->isar.regs[ID_MMFR2];
 case 0xd5c: /* MMFR3.  */
 if (!arm_feature(&cpu->env, ARM_FEATURE_M_MAIN)) {
 goto bad_offset;
 }
-return cpu->isar.id_mmfr3;
+return cpu->isar.regs[ID_MMFR3];
 case 0xd60: /* ISAR0.  */
 if (!arm_feature(&cpu->env, ARM_FEATURE_M_MAIN)) {
 goto bad_offset;
 }
-return cpu->isar.id_isar0;
+return cpu->isar.regs[ID_ISAR0];
 case 0xd64: /* ISAR1.  */
 if (!arm_feature(&cpu->env, ARM_FEATURE_M_MAIN)) {
 goto bad_offset;
 }
-return cpu->isar.id_isar1;
+return cpu->isar.regs[ID_ISAR1];
 case 0xd68: /* ISAR2.  */
 if (!arm_feature(&cpu->env, ARM_FEATURE_M_MAIN)) {
 goto bad_offset;
 }
-return cpu->isar.id_isar2;
+return cpu->isar.regs[ID_ISAR2];
 case 0xd6c: /* ISAR3.  */
 if (!arm_feature(&cpu->env, ARM_FEATURE_M_MAIN)) {
 goto bad_offset;
 }
-return cpu->isar.id_isar3;
+return cpu->isar.regs[ID_ISAR3];
 case 0xd70: /* ISAR4.  */
 if (!arm_feature(&cpu->env, ARM_FEATURE_M_MAIN)) {
 goto bad_offset;
 }
-return cpu->isar.id_isar4;
+return cpu->isar.regs[ID_ISAR4];
 case 0xd74: /* ISAR5.  */
 if (!arm_feature(&cpu->env, ARM_FEATURE_M_MAIN)) {
 goto bad_offset;
 }
-return cpu->isar.id_isar5;
+return cpu->isar.regs[ID_ISAR5];
 case 0xd78: /* CLIDR */
 return cpu->clidr;
 case 0xd7c: /* CTR */
@@ -1510,11 +1510,11 @@ static uint32_t nvic_readl(NVICState *s, uint32_t offset, MemTxAttrs attrs)
 }
 return cpu->env.v7m.fpdscr[attrs.secure];
 case 0xf40: /* MVFR0 */
-return cpu->isar.mvfr0;
+return cpu->isar.regs[MVFR0];
 case 0xf44: /* MVFR1 */
-return cpu->isar.mvfr1;
+return cpu->isar.regs[MVFR1];
 case 0xf48: /* MVFR2 */
-return cpu->isar.mvfr2;
+return cpu->isar.regs[MVFR2];
 default:
 bad_offset:
 qemu_log_mask(LOG_GUEST_ERROR, "NVIC: Bad read offset 0x%x\n", offset);
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 07492e9f9a44..13179e13e358 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -170,9 +170,9 @@ static void arm_cpu_reset(DeviceState *dev)
 g_hash_table_foreach(cpu->cp_regs, cp_reg_check_reset, cpu);
 
 env->vfp.xregs[ARM_VFP_FPSID] = cpu->r

[RFC v3 08/10] target/arm: Introduce user_mask to indicate whether the feature is set explicitly

2020-11-01 Thread Peng Liang

To add CPU feature dependencies, we need to know whether a CPU feature
was set explicitly by the user or automatically by the dependency
mechanism.  Introduce user_mask to track that.

Signed-off-by: zhanghailiang 
Signed-off-by: Peng Liang 
---
 target/arm/cpu.c | 2 ++
 target/arm/cpu.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index c5530550ece0..8c84a16d92a8 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1306,6 +1306,8 @@ static void arm_cpu_set_feature_prop(Object *obj, Visitor *v, const char *name,
 return;
 }
 
+isar->user_mask[feat->reg] |= MAKE_64BIT_MASK(feat->shift, feat->length);
+
 if (value) {
 if (object_property_get_bool(obj, feat->name, &error_abort)) {
 return;
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index c20f1ae20429..1ee653a712fd 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -932,6 +932,7 @@ struct ARMCPU {
  */
 struct ARMISARegisters {
 uint64_t regs[ID_MAX];
+uint64_t user_mask[ID_MAX];
 } isar;
 uint64_t midr;
 uint32_t revidr;
-- 
2.26.2
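A sketch of how a per-register user_mask can be recorded and consulted. SKETCH_MASK64 uses the same formula as QEMU's MAKE_64BIT_MASK(), redefined so the example is standalone; IsarSketch, the register indices and both helpers are hypothetical. The policy that dependency-driven changes leave user-set fields alone is an assumption for illustration; the real series may instead report a conflict.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same formula as QEMU's MAKE_64BIT_MASK(); redefined for a standalone sketch. */
#define SKETCH_MASK64(shift, length) (((~0ULL) >> (64 - (length))) << (shift))

/* Hypothetical register indices standing in for the CPUIDReg enum. */
enum { SK_REG_A, SK_REG_B, SK_REG_MAX };

typedef struct {
    uint64_t regs[SK_REG_MAX];
    uint64_t user_mask[SK_REG_MAX];
} IsarSketch;

static void put_field(IsarSketch *isar, int reg, unsigned shift,
                      unsigned length, uint64_t value)
{
    uint64_t mask = SKETCH_MASK64(shift, length);

    isar->regs[reg] = (isar->regs[reg] & ~mask) | ((value << shift) & mask);
}

/* An explicit user setting: apply it and remember which bits it covers. */
static void user_set_field(IsarSketch *isar, int reg, unsigned shift,
                           unsigned length, uint64_t value)
{
    assert(reg >= 0 && reg < SK_REG_MAX);
    isar->user_mask[reg] |= SKETCH_MASK64(shift, length);
    put_field(isar, reg, shift, length, value);
}

/* A dependency-driven change: assumed here to skip user-set fields. */
static bool auto_set_field(IsarSketch *isar, int reg, unsigned shift,
                           unsigned length, uint64_t value)
{
    assert(reg >= 0 && reg < SK_REG_MAX);
    if (isar->user_mask[reg] & SKETCH_MASK64(shift, length)) {
        return false;
    }
    put_field(isar, reg, shift, length, value);
    return true;
}
```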

[RFC v3 09/10] target/arm: introduce CPU feature dependency mechanism

2020-11-01 Thread Peng Liang

Some CPU features depend on other CPU features.  For example, the
ID_AA64PFR0_EL1.FP and ID_AA64PFR0_EL1.AdvSIMD fields must have the same
value, which means FP and AdvSIMD depend on each other, and so do FPHP
and AdvSIMDHP.

This commit introduces a CPU feature dependency mechanism for AArch64.
We build a directed graph from the CPU feature dependency relationships,
where each edge from->to means that the `to` CPU feature depends on the
`from` CPU feature.  CPU features are then enabled/disabled
automatically according to this directed graph.

For example, if CPU features a and b are in the relationship a->b, b
depends on a.  If the user enables b, a is enabled automatically; if the
user disables a, b is disabled automatically.
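The propagation rule above can be sketched as two recursive walks over the edge list: enabling a feature walks edges backwards to enable what it depends on, and disabling walks edges forwards to disable its dependents. The feature ids and edges are a hypothetical subset of feature_dependencies (fp<->asimd, sha1<->sha2, sha1->sha3). The early return on an unchanged state is what makes mutual dependencies such as fp<->asimd terminate. This sketch ignores user_mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature ids; an edge {from, to} means `to` depends on `from`. */
enum { FEAT_FP, FEAT_ASIMD, FEAT_SHA1, FEAT_SHA2, FEAT_SHA3, FEAT_MAX };

typedef struct {
    int from, to;
} Dep;

static const Dep deps[] = {
    { FEAT_FP, FEAT_ASIMD }, { FEAT_ASIMD, FEAT_FP },
    { FEAT_SHA1, FEAT_SHA2 }, { FEAT_SHA2, FEAT_SHA1 },
    { FEAT_SHA1, FEAT_SHA3 },
};

#define N_DEPS (sizeof(deps) / sizeof(deps[0]))

/* Enabling f also enables everything f depends on. */
static void enable_feature(bool *enabled, int f)
{
    assert(f >= 0 && f < FEAT_MAX);
    if (enabled[f]) {
        return;                 /* already on: stops cycles */
    }
    enabled[f] = true;
    for (size_t i = 0; i < N_DEPS; i++) {
        if (deps[i].to == f) {
            enable_feature(enabled, deps[i].from);
        }
    }
}

/* Disabling f also disables everything that depends on f. */
static void disable_feature(bool *enabled, int f)
{
    assert(f >= 0 && f < FEAT_MAX);
    if (!enabled[f]) {
        return;                 /* already off: stops cycles */
    }
    enabled[f] = false;
    for (size_t i = 0; i < N_DEPS; i++) {
        if (deps[i].from == f) {
            disable_feature(enabled, deps[i].to);
        }
    }
}
```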

Signed-off-by: zhanghailiang 
Signed-off-by: Peng Liang 
---
 target/arm/cpu.c | 134 +++
 1 file changed, 134 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 8c84a16d92a8..9d5916719a24 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1266,6 +1266,107 @@ static struct CPUFeatureInfo cpu_features[] = {
 },
 };
 
+typedef struct CPUFeatureDep {
+CPUFeatureInfo from, to;
+} CPUFeatureDep;
+
+static const CPUFeatureDep feature_dependencies[] = {
+{
+.from = FIELD_INFO("fp", ID_AA64PFR0, FP, true, 0, 0xf, false),
+.to = FIELD_INFO("asimd", ID_AA64PFR0, ADVSIMD, true, 0, 0xf, false),
+},
+{
+.from = FIELD_INFO("asimd", ID_AA64PFR0, ADVSIMD, true, 0, 0xf, false),
+.to = FIELD_INFO("fp", ID_AA64PFR0, FP, true, 0, 0xf, false),
+},
+{
+.from = {
+.reg = ID_AA64PFR0, .length = R_ID_AA64PFR0_FP_LENGTH,
+.shift = R_ID_AA64PFR0_FP_SHIFT, .sign = true, .min_value = 1,
+.ni_value = 0, .name = "fphp", .is_32bit = false,
+},
+.to = {
+.reg = ID_AA64PFR0, .length = R_ID_AA64PFR0_ADVSIMD_LENGTH,
+.shift = R_ID_AA64PFR0_ADVSIMD_SHIFT, .sign = true, .min_value = 1,
+.ni_value = 0, .name = "asimdhp", .is_32bit = false,
+},
+},
+{
+.from = {
+.reg = ID_AA64PFR0, .length = R_ID_AA64PFR0_ADVSIMD_LENGTH,
+.shift = R_ID_AA64PFR0_ADVSIMD_SHIFT, .sign = true, .min_value = 1,
+.ni_value = 0, .name = "asimdhp", .is_32bit = false,
+},
+.to = {
+.reg = ID_AA64PFR0, .length = R_ID_AA64PFR0_FP_LENGTH,
+.shift = R_ID_AA64PFR0_FP_SHIFT, .sign = true, .min_value = 1,
+.ni_value = 0, .name = "fphp", .is_32bit = false,
+},
+},
+{
+
+.from = FIELD_INFO("aes", ID_AA64ISAR0, AES, false, 1, 0, false),
+.to = {
+.reg = ID_AA64ISAR0, .length = R_ID_AA64ISAR0_AES_LENGTH,
+.shift = R_ID_AA64ISAR0_AES_SHIFT, .sign = false, .min_value = 2,
+.ni_value = 1, .name = "pmull", .is_32bit = false,
+},
+},
+{
+
+.from = FIELD_INFO("sha2", ID_AA64ISAR0, SHA2, false, 1, 0, false),
+.to = {
+.reg = ID_AA64ISAR0, .length = R_ID_AA64ISAR0_SHA2_LENGTH,
+.shift = R_ID_AA64ISAR0_SHA2_SHIFT, .sign = false, .min_value = 2,
+.ni_value = 1, .name = "sha512", .is_32bit = false,
+},
+},
+{
+.from = FIELD_INFO("lrcpc", ID_AA64ISAR1, LRCPC, false, 1, 0, false),
+.to = {
+.reg = ID_AA64ISAR1, .length = R_ID_AA64ISAR1_LRCPC_LENGTH,
+.shift = R_ID_AA64ISAR1_LRCPC_SHIFT, .sign = false, .min_value = 2,
+.ni_value = 1, .name = "ilrcpc", .is_32bit = false,
+},
+},
+{
+.from = FIELD_INFO("sm3", ID_AA64ISAR0, SM3, false, 1, 0, false),
+.to = FIELD_INFO("sm4", ID_AA64ISAR0, SM4, false, 1, 0, false),
+},
+{
+.from = FIELD_INFO("sm4", ID_AA64ISAR0, SM4, false, 1, 0, false),
+.to = FIELD_INFO("sm3", ID_AA64ISAR0, SM3, false, 1, 0, false),
+},
+{
+.from = FIELD_INFO("sha1", ID_AA64ISAR0, SHA1, false, 1, 0, false),
+.to = FIELD_INFO("sha2", ID_AA64ISAR0, SHA2, false, 1, 0, false),
+},
+{
+.from = FIELD_INFO("sha2", ID_AA64ISAR0, SHA2, false, 1, 0, false),
+.to = FIELD_INFO("sha1", ID_AA64ISAR0, SHA1, false, 1, 0, false),
+},
+{
+.from = FIELD_INFO("sha1", ID_AA64ISAR0, SHA1, false, 1, 0, false),
+.to = FIELD_INFO("sha3", ID_AA64ISAR0, SHA3, false, 1, 0, false),
+},
+{
+.from = FIELD_INFO("sha3", ID_AA64ISAR0, SHA3, false, 1, 0, false),
+.to = {
+.reg = ID_AA64ISAR0, .length = R_ID_AA64ISAR0_SHA2_LENGTH,
+.shift = R_ID_AA64ISAR0_SHA2_SHIFT, .sign = false, .min_value = 2,
+.ni_value = 1, .name = "sha512", .is_32bit = false,
+},
+},
+{
+.from = {
+.reg = ID_AA64ISAR0, .length = R_ID_AA64ISAR0_SHA2_LENGTH,
+.shift = R_ID_AA64ISAR0_SHA2_SHIFT, .sign = false

Re: [PATCH v10 1/8] Introduce yank feature

2020-11-01 Thread Markus Armbruster

Lukas Straub  writes:

> The yank feature allows to recover from hanging qemu by "yanking"
> at various parts. Other qemu systems can register themselves and
> multiple yank functions. Then all yank functions for selected
> instances can be called by the 'yank' out-of-band qmp command.
> Available instances can be queried by a 'query-yank' oob command.
>
> Signed-off-by: Lukas Straub 
> Acked-by: Stefan Hajnoczi 
[...]
>  qapi_storage_daemon_modules = [
> diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
> index 0b444b76d2..79c1705ed7 100644
> --- a/qapi/qapi-schema.json
> +++ b/qapi/qapi-schema.json
> @@ -91,3 +91,4 @@
>  { 'include': 'audio.json' }
>  { 'include': 'acpi.json' }
>  { 'include': 'pci.json' }
> +{ 'include': 'yank.json' }

This adds the documentation at the very end of the reference manual.  Is
this where you want it to go?  Check generated
docs/interop/qemu-qmp-ref.html.

> diff --git a/qapi/yank.json b/qapi/yank.json
> new file mode 100644
> index 00..1964a2202e
> --- /dev/null
> +++ b/qapi/yank.json
> @@ -0,0 +1,115 @@
> +# -*- Mode: Python -*-
> +# vim: filetype=python
> +#
> +

Please add a suitable heading here.  Headings look like this:

   ##
   # Text of heading goes here
   ##

Without it, the yank stuff gets squashed into the previous section
(happens to be PCI).

If you want to add an introduction or overview, it goes right below the
heading.  I'm not asking you to do that, I'm only telling you what's
possible.

[...]

Solid work, pleasant to review, thanks!

Reviewed-by: Markus Armbruster
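The commit message quoted above describes a two-level registry:

- Subsystems register named instances and attach yank functions to them.
- The out-of-band 'yank' command runs every function of the selected instances.
- 'query-yank' lists the instances.

The C sketch below models only that shape, under stated assumptions: a fixed-size, single-threaded table and string instance names. The `yank_demo_*` names and the `chardev:char0` instance name are hypothetical. They are not the API in util/yank.c or include/qemu/yank.h. A real implementation would also need locking, because out-of-band commands are not serialized with the main loop. It would also need unregistration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: a real one would unblock a hung fd or socket. */
typedef void YankDemoFn(void *opaque);

#define YANK_DEMO_MAX_INSTANCES 8
#define YANK_DEMO_MAX_FUNCS 8

typedef struct {
    YankDemoFn *func;
    void *opaque;
} YankDemoEntry;

typedef struct {
    const char *name;   /* instance name, e.g. a chardev or block node */
    YankDemoEntry funcs[YANK_DEMO_MAX_FUNCS];
    size_t nfuncs;
    bool used;
} YankDemoInstance;

static YankDemoInstance yank_demo_instances[YANK_DEMO_MAX_INSTANCES];

static YankDemoInstance *yank_demo_find(const char *name)
{
    for (size_t i = 0; i < YANK_DEMO_MAX_INSTANCES; i++) {
        YankDemoInstance *inst = &yank_demo_instances[i];
        if (inst->used && strcmp(inst->name, name) == 0) {
            return inst;
        }
    }
    return NULL;
}

/* A subsystem registers itself once; duplicate names are rejected. */
static bool yank_demo_register_instance(const char *name)
{
    if (yank_demo_find(name)) {
        return false;
    }
    for (size_t i = 0; i < YANK_DEMO_MAX_INSTANCES; i++) {
        if (!yank_demo_instances[i].used) {
            yank_demo_instances[i] = (YankDemoInstance){ .name = name, .used = true };
            return true;
        }
    }
    return false; /* registry full */
}

/* It may then attach several yank functions (up to the fixed cap). */
static bool yank_demo_register_function(const char *name, YankDemoFn *func,
                                        void *opaque)
{
    YankDemoInstance *inst = yank_demo_find(name);
    if (!inst || inst->nfuncs == YANK_DEMO_MAX_FUNCS) {
        return false;
    }
    inst->funcs[inst->nfuncs++] = (YankDemoEntry){ func, opaque };
    return true;
}

/* Models 'yank': validate all names first, then run every function of each. */
static bool yank_demo_yank(const char *const *names, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!yank_demo_find(names[i])) {
            return false; /* unknown instance: yank nothing */
        }
    }
    for (size_t i = 0; i < n; i++) {
        YankDemoInstance *inst = yank_demo_find(names[i]);
        assert(inst != NULL); /* validated by the loop above */
        for (size_t j = 0; j < inst->nfuncs; j++) {
            inst->funcs[j].func(inst->funcs[j].opaque);
        }
    }
    return true;
}

/* Models 'query-yank': list the names of all registered instances. */
static size_t yank_demo_query(const char **out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < YANK_DEMO_MAX_INSTANCES && n < max; i++) {
        if (yank_demo_instances[i].used) {
            out[n++] = yank_demo_instances[i].name;
        }
    }
    return n;
}

/* Example yank function: a real one might shutdown() a socket; this records the call. */
static void yank_demo_mark(void *opaque)
{
    *(bool *)opaque = true;
}
```

The sketch validates every requested name before calling anything, so a single unknown name yanks nothing. That is this sketch's choice; the quoted patch may handle partial matches differently.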

Re: [PATCH v10 7/8] MAINTAINERS: Add myself as maintainer for yank feature

2020-11-01 Thread Markus Armbruster

Lukas Straub  writes:

> I'll maintain this for now as the COLO use case is the first user
> of this functionality.
>
> Signed-off-by: Lukas Straub 
> Acked-by: Stefan Hajnoczi 
> ---
>  MAINTAINERS | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 8c744a9bdf..81288fd219 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2676,6 +2676,13 @@ F: util/uuid.c
>  F: include/qemu/uuid.h
>  F: tests/test-uuid.c
>
> +Yank feature
> +M: Lukas Straub 
> +S: Odd fixes
> +F: util/yank.c
> +F: include/qemu/yank.h
> +F: qapi/yank.json
> +
>  COLO Framework
>  M: zhanghailiang 
>  S: Maintained

I'd squash this into PATCH 1 to mollify checkpatch.pl.

Regardless,
Reviewed-by: Markus Armbruster
