date:20210426

Re: [PATCH] qapi: deprecate drive-backup

2021-04-26 Thread Peter Krempa

On Fri, Apr 23, 2021 at 15:59:00 +0300, Vladimir Sementsov-Ogievskiy wrote:
> Modern way is using blockdev-add + blockdev-backup, which provides a
> lot more control on how target is opened.
> 
> As example of drive-backup problems consider the following:
> 
> User of drive-backup expects that target will be opened in the same
> cache and aio mode as source. Corresponding logic is in
> drive_backup_prepare(), where we take bs->open_flags of source.
> 
> It works rather bad if source was added by blockdev-add. Assume source
> is qcow2 image. On blockdev-add we should specify aio and cache options
> for file child of qcow2 node. What happens next:
> 
> drive_backup_prepare() looks at bs->open_flags of qcow2 source node.
> But there no BDRV_O_NOCAHE neither BDRV_O_NATIVE_AIO: BDRV_O_NOCAHE is
> places in bs->file->bs->open_flags, and BDRV_O_NATIVE_AIO is nowhere,
> as file-posix parse options and simply set s->use_linux_aio.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
> 
> Hi all! I remember, I suggested to deprecate drive-backup some time ago,
> and nobody complain.. But that old patch was inside the series with
> other more questionable deprecations and it did not landed.
> 
> Let's finally deprecate what should be deprecated long ago.
> 
> We now faced a problem in our downstream, described in commit message.
> In downstream I've fixed it by simply enabling O_DIRECT and linux_aio
> unconditionally for drive_backup target. But actually this just shows
> that using drive-backup in blockdev era is a bad idea. So let's motivate
> everyone (including Virtuozzo of course) to move to new interfaces and
> avoid problems with all that outdated option inheritance.

libvirt never used 'drive-backup' thus

Reviewed-by: Peter Krempa

Re: [PULL 1/2] amd_iommu: Fix pte_override_page_mask()

2021-04-26 Thread Jean-Philippe Brucker

On Fri, Apr 23, 2021 at 05:11:33PM +0100, Peter Maydell wrote:
> > > Jean-Philippe, do you know if this is a regression since 5.2?
> >
> > I don't think so, I can reproduce it with v5.2.0.
> 
> OK, thanks; I think I favour not putting this into rc5, then.

No problem, please let me know if I should resend after the next release

Thanks,
Jean

[PATCH] hw/block/nvme: fix csi field for cns 0x00 and 0x11

2021-04-26 Thread Gollu Appalanaidu

As per the TP 4056d Namespace types CNS 0x00 and CNS 0x11
CSI field shouldn't use but it is being used for these two
Identify command CNS values, fix that.

Signed-off-by: Gollu Appalanaidu 
---
 hw/nvme/ctrl.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 2e7498a73e..1657b1d04a 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4244,11 +4244,16 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, 
NvmeRequest *req, bool active)
 }
 }
 
-if (c->csi == NVME_CSI_NVM && nvme_csi_has_nvm_support(ns)) {
-return nvme_c2h(n, (uint8_t *)&ns->id_ns, sizeof(NvmeIdNs), req);
+if (active && nvme_csi_has_nvm_support(ns)) {
+goto out;
+} else if (!active && ns->csi == NVME_CSI_NVM) {
+goto out;
+} else {
+return NVME_INVALID_CMD_SET | NVME_DNR;
 }
 
-return NVME_INVALID_CMD_SET | NVME_DNR;
+out:
+return nvme_c2h(n, (uint8_t *)&ns->id_ns, sizeof(NvmeIdNs), req);
 }
 
 static uint16_t nvme_identify_ns_attached_list(NvmeCtrl *n, NvmeRequest *req)
-- 
2.17.1

Re: [PATCH 00/14] hw(/block/)nvme: spring cleaning

2021-04-26 Thread Klaus Jensen


On Apr 19 21:27, Klaus Jensen wrote:

From: Klaus Jensen 

This series consists of various clean up patches.

The final patch moves nvme emulation from hw/block to hw/nvme.

Klaus Jensen (14):
 hw/block/nvme: rename __nvme_zrm_open
 hw/block/nvme: rename __nvme_advance_zone_wp
 hw/block/nvme: rename __nvme_select_ns_iocs
 hw/block/nvme: consolidate header files
 hw/block/nvme: cleanup includes
 hw/block/nvme: remove non-shared defines from header file
 hw/block/nvme: replace nvme_ns_status
 hw/block/nvme: cache lba and ms sizes
 hw/block/nvme: add metadata offset helper
 hw/block/nvme: streamline namespace array indexing
 hw/block/nvme: remove num_namespaces member
 hw/block/nvme: remove irrelevant zone resource checks
 hw/block/nvme: move zoned constraints checks
 hw/nvme: move nvme emulation out of hw/block

meson.build   |   1 +
hw/block/nvme-dif.h   |  63 ---
hw/block/nvme-ns.h| 229 -
hw/block/nvme-subsys.h|  59 ---
hw/block/nvme.h   | 266 ---
hw/nvme/nvme.h| 547 ++
hw/nvme/trace.h   |   1 +
hw/{block/nvme.c => nvme/ctrl.c}  | 204 
hw/{block/nvme-dif.c => nvme/dif.c}   |  57 +--
hw/{block/nvme-ns.c => nvme/ns.c} | 104 ++--
hw/{block/nvme-subsys.c => nvme/subsys.c} |  13 +-
MAINTAINERS   |   2 +-
hw/Kconfig|   1 +
hw/block/Kconfig  |   5 -
hw/block/meson.build  |   1 -
hw/block/trace-events | 206 
hw/meson.build|   1 +
hw/nvme/Kconfig   |   4 +
hw/nvme/meson.build   |   1 +
hw/nvme/trace-events  | 204 
20 files changed, 928 insertions(+), 1041 deletions(-)
delete mode 100644 hw/block/nvme-dif.h
delete mode 100644 hw/block/nvme-ns.h
delete mode 100644 hw/block/nvme-subsys.h
delete mode 100644 hw/block/nvme.h
create mode 100644 hw/nvme/nvme.h
create mode 100644 hw/nvme/trace.h
rename hw/{block/nvme.c => nvme/ctrl.c} (98%)
rename hw/{block/nvme-dif.c => nvme/dif.c} (90%)
rename hw/{block/nvme-ns.c => nvme/ns.c} (87%)
rename hw/{block/nvme-subsys.c => nvme/subsys.c} (85%)
create mode 100644 hw/nvme/Kconfig
create mode 100644 hw/nvme/meson.build
create mode 100644 hw/nvme/trace-events

--
2.31.1




Applied to nvme-next.


signature.asc
Description: PGP signature

[PATCH 1/1] amd_iommu: fix wrong MMIO operations

2021-04-26 Thread Roman Kapl

Address was swapped with value when writing MMIO registers, so the user
saw garbage in lot of cases. The interrupt status was not correctly set.

Signed-off-by: Roman Kapl 
---
 hw/i386/amd_iommu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 74a93a5d93..bb5ce8c04d 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -141,13 +141,13 @@ static bool amdvi_test_mask(AMDVIState *s, hwaddr addr, 
uint64_t val)
 /* OR a 64-bit register with a 64-bit value storing result in the register */
 static void amdvi_assign_orq(AMDVIState *s, hwaddr addr, uint64_t val)
 {
-amdvi_writeq_raw(s, addr, amdvi_readq(s, addr) | val);
+amdvi_writeq_raw(s, amdvi_readq(s, addr) | val, addr);
 }
 
 /* AND a 64-bit register with a 64-bit value storing result in the register */
 static void amdvi_assign_andq(AMDVIState *s, hwaddr addr, uint64_t val)
 {
-   amdvi_writeq_raw(s, addr, amdvi_readq(s, addr) & val);
+   amdvi_writeq_raw(s, amdvi_readq(s, addr) & val, addr);
 }
 
 static void amdvi_generate_msi_interrupt(AMDVIState *s)
@@ -382,7 +382,7 @@ static void amdvi_completion_wait(AMDVIState *s, uint64_t 
*cmd)
 }
 /* set completion interrupt */
 if (extract64(cmd[0], 1, 1)) {
-amdvi_test_mask(s, AMDVI_MMIO_STATUS, AMDVI_MMIO_STATUS_COMP_INT);
+amdvi_assign_orq(s, AMDVI_MMIO_STATUS, AMDVI_MMIO_STATUS_COMP_INT);
 /* generate interrupt */
 amdvi_generate_msi_interrupt(s);
 }
-- 
2.20.1

Re: [PATCH] hw/pci-host: Do not build gpex-acpi.c if GPEX is not selected

2021-04-26 Thread Gerd Hoffmann

On Sun, Apr 25, 2021 at 08:21:24PM +0200, Philippe Mathieu-Daudé wrote:
> Since its introduction in commit 5b85eabe68f ("acpi: add
> acpi_dsdt_add_gpex") we build gpex-acpi.c if ACPI is selected,
> even if the GPEX_HOST device isn't build. Add the missing
> Kconfig dependency.
> 
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Gerd Hoffmann

Re: [PATCH v3 06/33] util/async: aio_co_schedule(): support reschedule in same ctx

2021-04-26 Thread Vladimir Sementsov-Ogievskiy


23.04.2021 13:09, Roman Kagan wrote:

On Fri, Apr 16, 2021 at 11:08:44AM +0300, Vladimir Sementsov-Ogievskiy wrote:

With the following patch we want to call wake coroutine from thread.
And it doesn't work with aio_co_wake:
Assume we have no iothreads.
Assume we have a coroutine A, which waits in the yield point for
external aio_co_wake(), and no progress can be done until it happen.
Main thread is in blocking aio_poll() (for example, in blk_read()).

Now, in a separate thread we do aio_co_wake(). It calls  aio_co_enter(),
which goes through last "else" branch and do aio_context_acquire(ctx).

Now we have a deadlock, as aio_poll() will not release the context lock
until some progress is done, and progress can't be done until
aio_co_wake() wake the coroutine A. And it can't because it wait for
aio_context_acquire().

Still, aio_co_schedule() works well in parallel with blocking
aio_poll(). So we want use it. Let's add a possibility of rescheduling
coroutine in same ctx where it was yield'ed.

Fetch co->ctx in same way as it is done in aio_co_wake().

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/block/aio.h | 2 +-
  util/async.c| 8 
  2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index 5f342267d5..744b695525 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -643,7 +643,7 @@ static inline bool aio_node_check(AioContext *ctx, bool 
is_external)
  
  /**

   * aio_co_schedule:
- * @ctx: the aio context
+ * @ctx: the aio context, if NULL, the current ctx of @co will be used.
   * @co: the coroutine
   *
   * Start a coroutine on a remote AioContext.
diff --git a/util/async.c b/util/async.c
index 674dbefb7c..750be555c6 100644
--- a/util/async.c
+++ b/util/async.c
@@ -545,6 +545,14 @@ fail:
  
  void aio_co_schedule(AioContext *ctx, Coroutine *co)

  {
+if (!ctx) {
+/*
+ * Read coroutine before co->ctx.  Matches smp_wmb in
+ * qemu_coroutine_enter.
+ */
+smp_read_barrier_depends();
+ctx = qatomic_read(&co->ctx);
+}


I'd rather not extend this interface, but add a new one on top.  And
document how it's different from aio_co_wake().



Agree, that's better. Will do.



--
Best regards,
Vladimir

Re: [PULL 15/24] bsd-user: Fix commentary issues

2021-04-26 Thread Daniel P . Berrangé

On Fri, Apr 23, 2021 at 02:39:50PM -0600, i...@bsdimp.com wrote:
> From: Warner Losh 
> 
> Lines > 80 or 90 characters
> C++ comments
> BSD /*- block comment convention removed.
> 
> Signed-off-by: Warner Losh 
> ---
>  bsd-user/bsd-mman.h | 42 ++
>  1 file changed, 26 insertions(+), 16 deletions(-)
> 
> diff --git a/bsd-user/bsd-mman.h b/bsd-user/bsd-mman.h
> index 910e8c1921..5a64d0d425 100644
> --- a/bsd-user/bsd-mman.h
> +++ b/bsd-user/bsd-mman.h
> @@ -1,4 +1,4 @@
> -/*-
> +/*
>   * Copyright (c) 1982, 1986, 1993
>   *  The Regents of the University of California.  All rights reserved.
>   *
> @@ -30,16 +30,20 @@
>   * $FreeBSD: src/sys/sys/mman.h,v 1.42 2008/03/28 04:29:27 ps Exp $
>   */
>  
> -#define TARGET_FREEBSD_MAP_RESERVED0080 0x0080  /* previously misimplemented 
> MAP_INHERIT */
> -#define TARGET_FREEBSD_MAP_RESERVED0100 0x0100  /* previously unimplemented 
> MAP_NOEXTEND */
> -#define TARGET_FREEBSD_MAP_STACK0x0400  /* region grows down, like a 
> stack */
> -#define TARGET_FREEBSD_MAP_NOSYNC   0x0800  /* page to but do not sync 
> underlying file */
> +#define TARGET_FREEBSD_MAP_RESERVED0080 0x0080
> + /* previously misimplemented MAP_INHERIT */
> +#define TARGET_FREEBSD_MAP_RESERVED0100 0x0100
> + /* previously unimplemented MAP_NOEXTEND */
> +#define TARGET_FREEBSD_MAP_STACK0x0400
> + /* region grows down, like a stack */
> +#define TARGET_FREEBSD_MAP_NOSYNC   0x0800
> + /* page to but do not sync underlying file 
> */

I find this indented following comment style more ambiguous as to
what constant the comment applies to. IMHO would be clearer as

 /* previously misimplemented MAP_INHERIT */
 #define TARGET_FREEBSD_MAP_RESERVED0080 0x0080

 /* previously unimplemented MAP_NOEXTEND */
 #define TARGET_FREEBSD_MAP_RESERVED0100 0x0100

 /* region grows down, like a stack */
 #define TARGET_FREEBSD_MAP_STACK0x0400

 /* page to but do not sync underlying file */
 #define TARGET_FREEBSD_MAP_NOSYNC   0x0800

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

[Bug 1895305] Re: pthread_cancel fails with "RT33" with musl libc

2021-04-26 Thread Ariadne Conill

This was a downstream regression in Alpine caused by an attempt to make
older Go binaries work under emulation.  We have reverted the patch
there.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1895305

Title:
  pthread_cancel fails with "RT33" with musl libc

Status in QEMU:
  New

Bug description:
  From my testing it seems that QEMU built against musl libc crashes on
  pthread_cancel cancel calls - if the binary is also built with musl
  libc.

  Minimal sample:

  #include 
  #include 
  #include 
  void* threadfunc(void* ignored) {
while (1) {
pause();
}
return NULL;
  }
  int main() {
pthread_t thread;
pthread_create(&thread, NULL, &threadfunc, NULL);
sleep(1);
pthread_cancel(thread);
printf("OK, alive\n");
  }

  In an Alpine Linux aarch64 chroot (on an x86_64 host) the binary will
  just output RT33 and has exit code 161.

  Using qemu-aarch64 on an x86_64 host results in the output (fish shell)
fish: “qemu-aarch64-static ./musl-stat…” terminated by signal Unknown 
(Unknown)
  or (bash)
Real-time signal 2

  and exit code 164.

  It doesn't matter whether the binary is linked dynamically or static.
  You can see my test results in the following table:

  |  | QEMU glibc | QEMU musl |
  |--||---|
  | binary glibc dynamic | ✓  | ✓ |
  | binary glibc static  | ✓  | ✓ |
  | binary musl dynamic  | ✓  | ✗ |
  | binary musl static   | ✓  | ✗ |

  Both QEMU builds are v5.1.0 (glibc v2.32 / musl v1.2.1)

  I've uploaded all my compile and test commands (plus a script to
  conveniently run them all) to https://github.com/z3ntu/qemu-
  pthread_cancel . It also includes the built binaries if needed. The
  test script output can be found at https://github.com/z3ntu/qemu-
  pthread_cancel/blob/master/results.txt

  Further links:
  - https://gitlab.com/postmarketOS/pmaports/-/issues/190#note_141902075
  - https://gitlab.com/postmarketOS/pmbootstrap/-/issues/1970

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1895305/+subscriptions

Re: Resetting non-qdev children in a 3-phase reset device

2021-04-26 Thread Peter Maydell

On Mon, 26 Apr 2021 at 06:19, Markus Armbruster  wrote:
> device_legacy_reset() is deprecated:
>
> /**
>  * device_legacy_reset:
>  *
>  * Reset a single device (by calling the reset method).
>  * Note: This function is deprecated and will be removed when it becomes 
> unused.
>  * Please use device_cold_reset() now.
>  */
> void device_legacy_reset(DeviceState *dev);
>
> Good to know, but how do we get from here to there?  If we could simply
> replace one call by the other, surely we'd have done it already.

device_legacy_reset() semantics:
 * call the reset method on this device, and do nothing else
device_cold_reset() semantics:
 * call the reset method on this device
 * reset any qbuses this device owns (and so on recursively
   down the qbus tree)

So if the device has no qbuses they're equivalent and we can
just replace one call with the other. If the device does have
a qbus then it's more complicated (and I would start by asking
"do we really want to simulate power-cycling the device but *not*
power-cycling of its bus here?"...)

There are less than 20 callers of device_legacy_reset(), I guess
we should at least go through and identify "which of these are
definitely not resetting a device with qbuses" and update those.

thanks
-- PMM

Re: [PATCH 03/11] block/block-gen.h: bind monitor

2021-04-26 Thread Vladimir Sementsov-Ogievskiy


24.04.2021 08:23, Markus Armbruster wrote:

Vladimir Sementsov-Ogievskiy  writes:


If we have current monitor, let's bind it to wrapper coroutine too.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/block-gen.h | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/block/block-gen.h b/block/block-gen.h
index c1fd3f40de..61f055a8cc 100644
--- a/block/block-gen.h
+++ b/block/block-gen.h
@@ -27,6 +27,7 @@
  #define BLOCK_BLOCK_GEN_H
  
  #include "block/block_int.h"

+#include "monitor/monitor.h"
  
  /* Base structure for argument packing structures */

  typedef struct AioPollCo {
@@ -38,11 +39,20 @@ typedef struct AioPollCo {
  
  static inline int aio_poll_co(AioPollCo *s)

  {
+Monitor *mon = monitor_cur();


This gets the currently executing coroutine's monitor from the hash
table.


  assert(!qemu_in_coroutine());
  
+if (mon) {

+monitor_set_cur(s->co, mon);


This writes it back.  No-op, since the coroutine hasn't changed.  Why?


No. s->co != qemu_corotuine_current(), so it's not a write back, but creating a 
new entry in the hash map. s->co is a new coroutine which we are going to start.




+}
+
  aio_co_enter(s->ctx, s->co);
  AIO_WAIT_WHILE(s->ctx, s->in_progress);
  
+if (mon) {

+monitor_set_cur(s->co, NULL);


This removes s->co's monitor from the hash table.  Why?


+}
+
  return s->ret;
  }




If I comment the new code of this patch (keeping the whole series applied), 249 
fails, as error message goes simply to stderr, not to monitor:

249   fail   [11:56:54] [11:56:54]   0.3s   (last: 0.2s)  output mismatch 
(see 249.out.bad)
--- /work/src/qemu/up/hmp-qemu-io/tests/qemu-iotests/249.out
+++ 249.out.bad
@@ -9,7 +9,8 @@

 { 'execute': 'human-monitor-command',
'arguments': {'command-line': 'qemu-io none0 "aio_write 0 2k"'}}
-{"return": "Block node is read-onlyrn"}
+QEMU_PROG: Block node is read-only
+{"return": ""}

 === Run block-commit on base using an invalid filter node name

@@ -24,7 +25,8 @@

 { 'execute': 'human-monitor-command',
'arguments': {'command-line': 'qemu-io none0 "aio_write 0 2k"'}}
-{"return": "Block node is read-onlyrn"}
+QEMU_PROG: Block node is read-only
+{"return": ""}

 === Run block-commit on base using the default filter node name

@@ -43,5 +45,6 @@

 { 'execute': 'human-monitor-command',
'arguments': {'command-line': 'qemu-io none0 "aio_write 0 2k"'}}
-{"return": "Block node is read-onlyrn"}
+QEMU_PROG: Block node is read-only
+{"return": ""}
 *** done
Failures: 249
Failed 1 of 1 iotests



--
Best regards,
Vladimir

Re: [PULL 15/24] bsd-user: Fix commentary issues

2021-04-26 Thread Peter Maydell

On Mon, 26 Apr 2021 at 10:01, Daniel P. Berrangé  wrote:
>
> On Fri, Apr 23, 2021 at 02:39:50PM -0600, i...@bsdimp.com wrote:
> > -#define TARGET_FREEBSD_MAP_RESERVED0080 0x0080  /* previously 
> > misimplemented MAP_INHERIT */
> > -#define TARGET_FREEBSD_MAP_RESERVED0100 0x0100  /* previously 
> > unimplemented MAP_NOEXTEND */
> > -#define TARGET_FREEBSD_MAP_STACK0x0400  /* region grows down, like 
> > a stack */
> > -#define TARGET_FREEBSD_MAP_NOSYNC   0x0800  /* page to but do not sync 
> > underlying file */
> > +#define TARGET_FREEBSD_MAP_RESERVED0080 0x0080
> > + /* previously misimplemented MAP_INHERIT 
> > */
> > +#define TARGET_FREEBSD_MAP_RESERVED0100 0x0100
> > + /* previously unimplemented MAP_NOEXTEND 
> > */
> > +#define TARGET_FREEBSD_MAP_STACK0x0400
> > + /* region grows down, like a stack */
> > +#define TARGET_FREEBSD_MAP_NOSYNC   0x0800
> > + /* page to but do not sync underlying 
> > file */
>
> I find this indented following comment style more ambiguous as to
> what constant the comment applies to. IMHO would be clearer as
>
>  /* previously misimplemented MAP_INHERIT */
>  #define TARGET_FREEBSD_MAP_RESERVED0080 0x0080
>
>  /* previously unimplemented MAP_NOEXTEND */
>  #define TARGET_FREEBSD_MAP_RESERVED0100 0x0100
>
>  /* region grows down, like a stack */
>  #define TARGET_FREEBSD_MAP_STACK0x0400
>
>  /* page to but do not sync underlying file */
>  #define TARGET_FREEBSD_MAP_NOSYNC   0x0800

Or alternatively decide that this is one of those cases where the coding
style's "If wrapping the line at 80 columns is obviously less readable and
more awkward, prefer not to wrap it" advice applies. The lines as they stand
are only 95 characters or so long.

thanks
-- PMM

[Bug 1749393] Re: sbrk() not working under qemu-user with a PIE-compiled binary?

2021-04-26 Thread Launchpad Bug Tracker

** Merge proposal linked:
   
https://code.launchpad.net/~paelzer/ubuntu/+source/qemu/+git/qemu/+merge/401771

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1749393

Title:
  sbrk() not working under qemu-user with a PIE-compiled binary?

Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Fix Released
Status in qemu source package in Focal:
  Triaged

Bug description:
  [Impact]

   * The current space reserved can be too small and we can end up
 with no space at all for BRK. It can happen to any case, but is
 much more likely with the now common PIE binaries.

   * Backport the upstream fix which reserves a bit more space while loading
 and giving it back after interpreter and stack is loaded.

  [Test Plan]

   * On x86 run:
  sudo apt install -y qemu-user-static docker.io
  sudo docker run --rm arm64v8/debian:bullseye bash -c 'apt update && apt 
install -y wget'
  ...
  Running hooks in /etc/ca-certificates/update.d...
  done.
  Errors were encountered while processing:
   libc-bin
  E: Sub-process /usr/bin/dpkg returned an error code (1)

  
  [Where problems could occur]

   * Regressions would be around use-cases of linux-user that is
 emulation not of a system but of binaries.
 Commonly uses for cross-tests and cross-builds so that is the
 space to watch for regressions

  [Other Info]
   
   * n/a


  ---

  In Debian unstable, we recently switched bash to be a PIE-compiled
  binary (for hardening). Unfortunately this resulted in bash being
  broken when run under qemu-user (for all target architectures, host
  being amd64 for me).

  $ sudo chroot /srv/chroots/sid-i386/ qemu-i386-static /bin/bash
  bash: xmalloc: .././shell.c:1709: cannot allocate 10 bytes (0 bytes allocated)

  bash has its own malloc implementation based on sbrk():
  https://git.savannah.gnu.org/cgit/bash.git/tree/lib/malloc/malloc.c

  When we disable this internal implementation and rely on glibc's
  malloc, then everything is fine. But it might be that glibc has a
  fallback when sbrk() is not working properly and it might hide the
  underlying problem in qemu-user.

  This issue has also been reported to the bash upstream author and he 
suggested that the issue might be in qemu-user so I'm opening a ticket here. 
Here's the discussion with the bash upstream author:
  https://lists.gnu.org/archive/html/bug-bash/2018-02/threads.html#00080

  You can find the problematic bash binary in that .deb file:
  
http://snapshot.debian.org/archive/debian/20180206T154716Z/pool/main/b/bash/bash_4.4.18-1_i386.deb

  The version of qemu I have been using is 2.11 (Debian package qemu-
  user-static version 1:2.11+dfsg-1) but I have had reports that the
  problem is reproducible with older versions (back to 2.8 at least).

  Here are the related Debian bug reports:
  https://bugs.debian.org/889869
  https://bugs.debian.org/865599

  It's worth noting that bash used to have this problem (when compiled as a PIE 
binary) even when run directly but then something got fixed in the kernel and 
now the problem only appears when run under qemu-user:
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518483

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1749393/+subscriptions

Re: Resetting non-qdev children in a 3-phase reset device

2021-04-26 Thread Philippe Mathieu-Daudé

On 4/25/21 8:33 PM, Peter Maydell wrote:
> On Sat, 24 Apr 2021 at 14:04, Philippe Mathieu-Daudé  wrote:
>> I now understand better the diag288 case, but I still don't understand
>> the TYPE_APIC one. It has no DeviceClass::reset(), its abstract parent
>> TYPE_APIC_COMMON register apic_reset_common() but being TYPE_DEVICE it
>> is not on a qbus. It is somehow connected to the X86CPU object, but the
>> single call to apic_init_reset() is from do_cpu_init() - not a reset
>> method -.
> 
> pc_machine_reset() calls device_legacy_reset(cpu->apic_state)
> which is to say it invokes the DeviceState::reset method,
> which is either kvm_apic_reset or apic_reset_common.

Oh, thanks! I guess "convoluted" is the proper adjective to describe
this reset logic. I suppose APIC is a very old device, part of the
Frankenstein PC, so hard to rework (because we are scared of the
implications of changing old & heavily used devices).

Windows Cirrus Build broken

2021-04-26 Thread Alex Bennée

Hi Yonggang,

It looks like the Windows msys2 build has been broken a while on Cirrus:

Last working:

  https://cirrus-ci.com/task/6239849435889664

First broken:

  https://cirrus-ci.com/task/5344535250206720

History:

  https://cirrus-ci.com/github/qemu/qemu

While the green to red points the finger at Mark's qemu-sparc merge the
failure looks a lot more fundamental than that (can't find make).

Any chance you could have a look at what's going on?

-- 
Alex Bennée

Re: [PATCH for-6.0 v2 1/2] hw/block/nvme: fix invalid msix exclusive uninit

2021-04-26 Thread Philippe Mathieu-Daudé

On 4/26/21 6:40 AM, Klaus Jensen wrote:
> On Apr 23 07:21, Klaus Jensen wrote:
>> From: Klaus Jensen 
>>
>> Commit 1901b4967c3f changed the nvme device from using a bar exclusive
>> for MSI-x to sharing it on bar0.
>>
>> Unfortunately, the msix_uninit_exclusive_bar() call remains in
>> nvme_exit() which causes havoc when the device is removed with, say,
>> device_del. Fix this.
>>
>> Additionally, a subregion is added but it is not removed on exit which
>> causes a reference to linger and the drive to never be unlocked.
>>
>> Fixes: 1901b4967c3f ("hw/block/nvme: move msix table and pba to BAR 0")
>> Signed-off-by: Klaus Jensen 
>> ---
>> hw/block/nvme.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
>> index 624a1431d072..5fe082ec34c5 100644
>> --- a/hw/block/nvme.c
>> +++ b/hw/block/nvme.c
>> @@ -6235,7 +6235,8 @@ static void nvme_exit(PCIDevice *pci_dev)
>>     if (n->pmr.dev) {
>>     host_memory_backend_set_mapped(n->pmr.dev, false);
>>     }
>> -    msix_uninit_exclusive_bar(pci_dev);
>> +    msix_uninit(pci_dev, &n->bar0, &n->bar0);
>> +    memory_region_del_subregion(&n->bar0, &n->iomem);
>> }
>>
>> static Property nvme_props[] = {
>> -- 
>> 2.31.1
>>
> 
> Ping for a review on this please :)

You forgot to Cc the maintainers :/ (doing it now).

$ ./scripts/get_maintainer.pl -f include/hw/pci/msix.h
"Michael S. Tsirkin"  (supporter:PCI)
Marcel Apfelbaum  (supporter:PCI)

[Bug 1749393] Re: sbrk() not working under qemu-user with a PIE-compiled binary?

2021-04-26 Thread Christian Ehrhardt 

** Description changed:

+ [Impact]
+ 
+  * The current space reserved can be too small and we can end up
+with no space at all for BRK. It can happen to any case, but is
+much more likely with the now common PIE binaries.
+ 
+  * Backport the upstream fix which reserves a bit more space while loading
+and giving it back after interpreter and stack is loaded.
+ 
+ [Test Plan]
+ 
+  * On x86 run:
+ sudo apt install -y qemu-user-static docker.io
+ sudo docker run --rm arm64v8/debian:bullseye bash -c 'apt update && apt 
install -y wget'
+ ...
+ Running hooks in /etc/ca-certificates/update.d...
+ done.
+ Errors were encountered while processing:
+  libc-bin
+ E: Sub-process /usr/bin/dpkg returned an error code (1)
+ 
+ 
+ [Where problems could occur]
+ 
+  * Regressions would be around use-cases of linux-user that is
+emulation not of a system but of binaries.
+Commonly uses for cross-tests and cross-builds so that is the
+space to watch for regressions
+ 
+ [Other Info]
+  
+  * n/a
+ 
+ 
+ ---
+ 
  In Debian unstable, we recently switched bash to be a PIE-compiled
  binary (for hardening). Unfortunately this resulted in bash being broken
  when run under qemu-user (for all target architectures, host being amd64
  for me).
  
  $ sudo chroot /srv/chroots/sid-i386/ qemu-i386-static /bin/bash
  bash: xmalloc: .././shell.c:1709: cannot allocate 10 bytes (0 bytes allocated)
  
  bash has its own malloc implementation based on sbrk():
  https://git.savannah.gnu.org/cgit/bash.git/tree/lib/malloc/malloc.c
  
  When we disable this internal implementation and rely on glibc's malloc,
  then everything is fine. But it might be that glibc has a fallback when
  sbrk() is not working properly and it might hide the underlying problem
  in qemu-user.
  
  This issue has also been reported to the bash upstream author and he 
suggested that the issue might be in qemu-user so I'm opening a ticket here. 
Here's the discussion with the bash upstream author:
  https://lists.gnu.org/archive/html/bug-bash/2018-02/threads.html#00080
  
  You can find the problematic bash binary in that .deb file:
  
http://snapshot.debian.org/archive/debian/20180206T154716Z/pool/main/b/bash/bash_4.4.18-1_i386.deb
  
  The version of qemu I have been using is 2.11 (Debian package qemu-user-
  static version 1:2.11+dfsg-1) but I have had reports that the problem is
  reproducible with older versions (back to 2.8 at least).
  
  Here are the related Debian bug reports:
  https://bugs.debian.org/889869
  https://bugs.debian.org/865599
  
  It's worth noting that bash used to have this problem (when compiled as a PIE 
binary) even when run directly but then something got fixed in the kernel and 
now the problem only appears when run under qemu-user:
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518483

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1749393

Title:
  sbrk() not working under qemu-user with a PIE-compiled binary?

Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Fix Released
Status in qemu source package in Focal:
  Triaged

Bug description:
  [Impact]

   * The current space reserved can be too small and we can end up
 with no space at all for BRK. It can happen to any case, but is
 much more likely with the now common PIE binaries.

   * Backport the upstream fix which reserves a bit more space while loading
 and giving it back after interpreter and stack is loaded.

  [Test Plan]

   * On x86 run:
  sudo apt install -y qemu-user-static docker.io
  sudo docker run --rm arm64v8/debian:bullseye bash -c 'apt update && apt 
install -y wget'
  ...
  Running hooks in /etc/ca-certificates/update.d...
  done.
  Errors were encountered while processing:
   libc-bin
  E: Sub-process /usr/bin/dpkg returned an error code (1)

  
  [Where problems could occur]

   * Regressions would be around use-cases of linux-user that is
 emulation not of a system but of binaries.
 Commonly uses for cross-tests and cross-builds so that is the
 space to watch for regressions

  [Other Info]
   
   * n/a


  ---

  In Debian unstable, we recently switched bash to be a PIE-compiled
  binary (for hardening). Unfortunately this resulted in bash being
  broken when run under qemu-user (for all target architectures, host
  being amd64 for me).

  $ sudo chroot /srv/chroots/sid-i386/ qemu-i386-static /bin/bash
  bash: xmalloc: .././shell.c:1709: cannot allocate 10 bytes (0 bytes allocated)

  bash has its own malloc implementation based on sbrk():
  https://git.savannah.gnu.org/cgit/bash.git/tree/lib/malloc/malloc.c

  When we disable this internal implementation and rely on glibc's
  malloc, then everything is fine. But it might be that glibc has a
  fallback when sbrk() is not working properly and it might hide the
  underlying problem in qemu-user.

  This issue has also been reported to the bash u

Re: Resetting non-qdev children in a 3-phase reset device

2021-04-26 Thread Peter Maydell

On Mon, 26 Apr 2021 at 10:23, Philippe Mathieu-Daudé  wrote:
>
> On 4/25/21 8:33 PM, Peter Maydell wrote:
> > On Sat, 24 Apr 2021 at 14:04, Philippe Mathieu-Daudé  
> > wrote:
> >> I now understand better the diag288 case, but I still don't understand
> >> the TYPE_APIC one. It has no DeviceClass::reset(), its abstract parent
> >> TYPE_APIC_COMMON register apic_reset_common() but being TYPE_DEVICE it
> >> is not on a qbus. It is somehow connected to the X86CPU object, but the
> >> single call to apic_init_reset() is from do_cpu_init() - not a reset
> >> method -.
> >
> > pc_machine_reset() calls device_legacy_reset(cpu->apic_state)
> > which is to say it invokes the DeviceState::reset method,
> > which is either kvm_apic_reset or apic_reset_common.
>
> Oh, thanks! I guess "convoluted" is the proper adjective to describe
> this reset logic. I suppose APIC is a very old device, part of the
> Frankenstein PC, so hard to rework (because we are scared of the
> implications of changing old & heavily used devices).

This is mostly just another instance of "our reset logic doesn't
deal well with devices which aren't on buses". The APIC isn't on a bus,
so the machine that uses it has a local workaround to manually arrange
for it to reset, just as it does for the CPU.

thanks
-- PMM

First time patch not reflected in the mailing list

2021-04-26 Thread Edge NFV

Hi

I sent a patch around four hours back but I doesn't seem to be reflected on
to the mailing list. That patch was my first mail to this mail list. I
understand that this a moderated list and the first message needs to be
white listed by the moderator. Please let me know how long it will take to
white list my patch message.

E

Re: [PATCH v1 13/25] hw/tricore: Add testdevice for tests in tests/tcg/

2021-04-26 Thread Philippe Mathieu-Daudé

On 4/19/21 4:54 PM, Alex Bennée wrote:
> From: Bastian Koppelmann 
> 
> this device is used to verify the correctness of regression tests by
> allowing guests to write their exit status to this device. This is then
> used by qemu to exit using the written status.
> 
> Reviewed-by: Alex Bennée 
> Signed-off-by: Bastian Koppelmann 
> Signed-off-by: Alex Bennée 
> Message-Id: <20210305170045.869437-4-kbast...@mail.uni-paderborn.de>
> ---
>  include/hw/tricore/tricore_testdevice.h | 38 
>  hw/tricore/tricore_testboard.c  |  8 +++
>  hw/tricore/tricore_testdevice.c | 82 +
>  hw/tricore/meson.build  |  1 +
>  4 files changed, 129 insertions(+)
>  create mode 100644 include/hw/tricore/tricore_testdevice.h
>  create mode 100644 hw/tricore/tricore_testdevice.c

> +#include "hw/tricore/tricore_testdevice.h"
> +
> +static void tricore_testdevice_write(void *opaque, hwaddr offset,
> +  uint64_t value, unsigned size)
> +{
> +exit(value);

   ->  qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);

I'd rather use a 2 steps check of value such watchdog devices do
(to be sure the guest is still in control and isn't nut).


A general comments, all targets require a such test feature,
so we should have a generic user-creatable sysbus-testdev for that.

Regards,

Phil.

[Bug 1925496] Re: nvme disk cannot be hotplugged after removal

2021-04-26 Thread Oguz Bektas

BTW Re: Regression, I think it's not, because this didn't work a year
ago either, but I wasn't sure if it's a bug.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1925496

Title:
  nvme disk cannot be hotplugged after removal

Status in QEMU:
  Confirmed

Bug description:
  Hello,

  When I try to re-add an nvme disk shortly after removing it, I get an
  error about duplicate ID.

  See the following commands to reproduce. This happens consistently on
  all VMs that I tested:

  
  attach
  ==

  $VAR1 = {
'arguments' => {
 'command-line' => 'drive_add auto 
"file=/dev/zvol/rpool/data/vm-2-disk-1,if=none,id=drive-nvme1,format=raw,cache=none,aio=native,detect-zeroes=on"'
   },
'execute' => 'human-monitor-command'
  };
  $VAR1 = {
'execute' => 'device_add',
'arguments' => {
 'serial' => 'nvme1',
 'drive' => 'drive-nvme1',
 'driver' => 'nvme',
 'id' => 'nvme1'
   }
  };

  
  detach
  ===
  $VAR1 = {
'arguments' => {
 'id' => 'nvme1'
   },
'execute' => 'device_del'
  };
  $VAR1 = {
'execute' => 'human-monitor-command',
'arguments' => {
 'command-line' => 'drive_del drive-nvme1'
   }
  };

  reattach
  ===
  $VAR1 = {
'arguments' => {
 'command-line' => 'drive_add auto 
"file=/dev/zvol/rpool/data/vm-2-disk-1,if=none,id=drive-nvme1,format=raw,cache=none,aio=native,detect-zeroes=on"'
   },
'execute' => 'human-monitor-command'
  };

  
  and I get:
  "Duplicate ID 'drive-nvme1' for drive"

  although it does not show up in query-block or query-pci anymore after
  the first detach.

  
  Is this a bug or am I missing something? Please advise.

  Best regards,
  Oguz

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1925496/+subscriptions

Re: [PATCH for-6.0 v2 1/2] hw/block/nvme: fix invalid msix exclusive uninit

2021-04-26 Thread Klaus Jensen


On Apr 26 11:27, Philippe Mathieu-Daudé wrote:

On 4/26/21 6:40 AM, Klaus Jensen wrote:

On Apr 23 07:21, Klaus Jensen wrote:

From: Klaus Jensen 

Commit 1901b4967c3f changed the nvme device from using a bar exclusive
for MSI-x to sharing it on bar0.

Unfortunately, the msix_uninit_exclusive_bar() call remains in
nvme_exit() which causes havoc when the device is removed with, say,
device_del. Fix this.

Additionally, a subregion is added but it is not removed on exit which
causes a reference to linger and the drive to never be unlocked.

Fixes: 1901b4967c3f ("hw/block/nvme: move msix table and pba to BAR 0")
Signed-off-by: Klaus Jensen 
---
hw/block/nvme.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 624a1431d072..5fe082ec34c5 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -6235,7 +6235,8 @@ static void nvme_exit(PCIDevice *pci_dev)
    if (n->pmr.dev) {
    host_memory_backend_set_mapped(n->pmr.dev, false);
    }
-    msix_uninit_exclusive_bar(pci_dev);
+    msix_uninit(pci_dev, &n->bar0, &n->bar0);
+    memory_region_del_subregion(&n->bar0, &n->iomem);
}

static Property nvme_props[] = {
-- 
2.31.1



Ping for a review on this please :)


You forgot to Cc the maintainers :/ (doing it now).

$ ./scripts/get_maintainer.pl -f include/hw/pci/msix.h
"Michael S. Tsirkin"  (supporter:PCI)
Marcel Apfelbaum  (supporter:PCI)



I didnt consider CC'ing the PCI maintainers directly, but makes total 
sense here, thanks!


signature.asc
Description: PGP signature

[Bug 1749393] Re: sbrk() not working under qemu-user with a PIE-compiled binary?

2021-04-26 Thread Christian Ehrhardt 

For Focal:
- SRU Template added to the bug
- MP: 
https://code.launchpad.net/~paelzer/ubuntu/+source/qemu/+git/qemu/+merge/401771
- PPA: 
https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4535/+packages 
(still building)

I'd ask anyone affected by this on Focal to give it a try on the PPA and
let us know if this fix would work for you.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1749393

Title:
  sbrk() not working under qemu-user with a PIE-compiled binary?

Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Fix Released
Status in qemu source package in Focal:
  Triaged

Bug description:
  [Impact]

   * The current space reserved can be too small and we can end up
 with no space at all for BRK. It can happen to any case, but is
 much more likely with the now common PIE binaries.

   * Backport the upstream fix which reserves a bit more space while loading
 and giving it back after interpreter and stack is loaded.

  [Test Plan]

   * On x86 run:
  sudo apt install -y qemu-user-static docker.io
  sudo docker run --rm arm64v8/debian:bullseye bash -c 'apt update && apt 
install -y wget'
  ...
  Running hooks in /etc/ca-certificates/update.d...
  done.
  Errors were encountered while processing:
   libc-bin
  E: Sub-process /usr/bin/dpkg returned an error code (1)

  
  [Where problems could occur]

   * Regressions would be around use-cases of linux-user that is
 emulation not of a system but of binaries.
 Commonly uses for cross-tests and cross-builds so that is the
 space to watch for regressions

  [Other Info]
   
   * n/a


  ---

  In Debian unstable, we recently switched bash to be a PIE-compiled
  binary (for hardening). Unfortunately this resulted in bash being
  broken when run under qemu-user (for all target architectures, host
  being amd64 for me).

  $ sudo chroot /srv/chroots/sid-i386/ qemu-i386-static /bin/bash
  bash: xmalloc: .././shell.c:1709: cannot allocate 10 bytes (0 bytes allocated)

  bash has its own malloc implementation based on sbrk():
  https://git.savannah.gnu.org/cgit/bash.git/tree/lib/malloc/malloc.c

  When we disable this internal implementation and rely on glibc's
  malloc, then everything is fine. But it might be that glibc has a
  fallback when sbrk() is not working properly and it might hide the
  underlying problem in qemu-user.

  This issue has also been reported to the bash upstream author and he 
suggested that the issue might be in qemu-user so I'm opening a ticket here. 
Here's the discussion with the bash upstream author:
  https://lists.gnu.org/archive/html/bug-bash/2018-02/threads.html#00080

  You can find the problematic bash binary in that .deb file:
  
http://snapshot.debian.org/archive/debian/20180206T154716Z/pool/main/b/bash/bash_4.4.18-1_i386.deb

  The version of qemu I have been using is 2.11 (Debian package qemu-
  user-static version 1:2.11+dfsg-1) but I have had reports that the
  problem is reproducible with older versions (back to 2.8 at least).

  Here are the related Debian bug reports:
  https://bugs.debian.org/889869
  https://bugs.debian.org/865599

  It's worth noting that bash used to have this problem (when compiled as a PIE 
binary) even when run directly but then something got fixed in the kernel and 
now the problem only appears when run under qemu-user:
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518483

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1749393/+subscriptions

Re: [PATCH 1/2] block/export: Free ignored Error

2021-04-26 Thread Vladimir Sementsov-Ogievskiy


22.04.2021 17:53, Max Reitz wrote:

When invoking block-export-add with some iothread and
fixed-iothread=false, and changing the node's iothread fails, the error
is supposed to be ignored.

However, it is still stored in *errp, which is wrong.  If a second error
occurs, the "*errp must be NULL" assertion in error_setv() fails:

   qemu-system-x86_64: ../util/error.c:59: error_setv: Assertion
   `*errp == NULL' failed.

So the error from bdrv_try_set_aio_context() must be freed when it is
ignored.

Fixes: f51d23c80af73c95e0ce703ad06a300f1b3d63ef
("block/export: add iothread and fixed-iothread options")
Signed-off-by: Max Reitz 
---
  block/export/export.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/block/export/export.c b/block/export/export.c
index fec7d9f738..ce5dd3e59b 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -68,6 +68,7 @@ static const BlockExportDriver 
*blk_exp_find_driver(BlockExportType type)
  
  BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)

  {
+ERRP_GUARD();
  bool fixed_iothread = export->has_fixed_iothread && 
export->fixed_iothread;
  const BlockExportDriver *drv;
  BlockExport *exp = NULL;
@@ -127,6 +128,9 @@ BlockExport *blk_exp_add(BlockExportOptions *export, Error 
**errp)
  ctx = new_ctx;
  } else if (fixed_iothread) {
  goto fail;
+} else {
+error_free(*errp);
+*errp = NULL;
  }
  }
  



I don't think ERRP_GUARD is needed in this case: we don't need to handle errp 
somehow except for just free if it was set. So we can simply do:

} else if (errp) {
   error_free(*errp);
   *errp = NULL;
}

Let's only check that errp is really set on failure path of 
bdrv_try_set_aio_context():

bdrv_try_set_aio_context() fails iff bdrv_can_set_aio_context() fails, which in 
turn may fail iff bdrv_parent_can_set_aio_context() or 
bdrv_child_can_set_aio_context() fails.

bdrv_parent_can_set_aio_context() has two failure path, on first it set errp by 
hand, and on second it has assertion that errp is set.

bdrv_child_can_set_aio_context() may fail only if nested call to 
bdrv_can_set_aio_context() fails, so recursion is closed.


--
Best regards,
Vladimir

[PATCH] make vfio and DAX cache work together

2021-04-26 Thread Edge NFV

 Signed-off-by: Edge NFV 
---
 hw/vfio/common.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index ae5654fcdb..83e15bf7a3 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -668,6 +668,15 @@ static void vfio_listener_region_add(MemoryListener
*listener,
 int128_get64(int128_sub(section->size, int128_one(;
 return;
 }
+
+/* Do not add virtio fs cache section */
+if (!strcmp(memory_region_name(section->mr), "virtio-fs-cache")) {
+trace_vfio_listener_region_add_skip(
+section->offset_within_address_space,
+section->offset_within_address_space +
+int128_get64(int128_sub(section->size, int128_one(;
+return;
+}

 if (unlikely((section->offset_within_address_space &
   ~qemu_real_host_page_mask) !=
-- 
2.25.1

Re: [PATCH v5 0/4] accel/tcg: Make sure that tb->size != 0 after translation

2021-04-26 Thread Cornelia Huck

On Fri, 23 Apr 2021 10:50:59 -0700
Richard Henderson  wrote:

> On 4/23/21 3:31 AM, Cornelia Huck wrote:
> > So, what's the way forward here? I can pick this if I get an ack for
> > the arm patch. If someone else wants to take this, I'll just ack the
> > s390x patch.  
> 
> You've volunteered, so that means you get it, I think.  ;-)
> 
> 
> r~
> 

I guessed as much :) Thanks for your review!

Re: [PATCH v5 0/4] accel/tcg: Make sure that tb->size != 0 after translation

2021-04-26 Thread Cornelia Huck

On Fri, 16 Apr 2021 17:49:35 +0200
Ilya Leoshkevich  wrote:

> If arch-specific code generates a translation block of size 0,
> tb_gen_code() may generate a spurious exception.
> 
> Fix s390x (patch 1), ARM (patch 2) and xtensa (patch 3) and add an
> assertion in order to catch such situations earlier (patch 4).
> 
> v1: https://lists.nongnu.org/archive/html/qemu-devel/2021-04/msg02037.html
> v1 -> v2: Fix target/s390x instead of trying to tolerate tb->size == 0
>   in tb_gen_code().
> 
> v2: https://lists.nongnu.org/archive/html/qemu-devel/2021-04/msg02101.html
> v2 -> v3: Split the common code change into a separate patch, add the
>   ARM patch in order to fix
>   https://gitlab.com/cohuck/qemu/-/jobs/1178409450
> 
> v3: https://lists.nongnu.org/archive/html/qemu-devel/2021-04/msg02332.html
> v3 -> v4: Add the xtensa patch in order to fix
>   https://gitlab.com/cohuck/qemu/-/jobs/1178409540
> 
> v4: https://lists.nongnu.org/archive/html/qemu-devel/2021-04/msg02592.html
> v4 -> v5: Handle thumb: the following C code triggers the assertion:
>   typedef void (*funcptr)(void);
>   int main() { funcptr f = (funcptr)0x0001; f(); }
> 
> Ilya Leoshkevich (4):
>   target/s390x: Fix translation exception on illegal instruction
>   target/arm: Make sure that commpage's tb->size != 0
>   target/xtensa: Make sure that tb->size != 0
>   accel/tcg: Assert that tb->size != 0 after translation
> 
>  accel/tcg/translate-all.c |  1 +
>  target/arm/translate.c|  2 ++
>  target/s390x/translate.c  | 16 +++-
>  target/xtensa/translate.c |  3 +++
>  4 files changed, 17 insertions(+), 5 deletions(-)
> 

Thanks, queued to s390-next.

X on old (non-x86) Linux guests

2021-04-26 Thread Dr. David Alan Gilbert

Hi,
  Over the weekend I got a Red Hat 6.x (not RHEL!) for Alpha booting
under QEMU which was pretty neat.  But I failed to find a succesful
combination to get X working; has anyone any suggestions?

  That distro was from around 2000; the challenge is since we don't have
VESA on non-x86, we can't change mode that way, so generic XF86_SVGA
doesn't want to play with any of the devices.

  I also tried the ati device, but the accelerated mach64 driver
didn't recognise that ID.

  Has anyone found any combo that works?
I suspect using one of the existing devices, lying about PCI ID, and
then turning off all accelerations might have a chance but I've not got
that far.

[Alpha took a bit of a fight; none of the SCSI controllers were
happy, but the CMD646 worked well enough to install off a CD image
from a -kernel passed in ]


Dave
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [PATCH v3 0/2] tests/docker: tests/tcg/ppc64le: Newer toolchain to build tests for PowerISA v3.1 instructions

2021-04-26 Thread Alex Bennée



matheus.fe...@eldorado.org.br writes:

> From: Matheus Ferst 
>
> This series adds gcc-10 based images to enable the build of tests with Power10
> instructions. Then, to put it to good use, a tests for the byte-reverse
> instructions (implemented in 9d69cfa2faa7) is introduced.
>
> v3:
> - Fixed field 'needs' of powerpc-test-debian-cross-container in
>   .gitlab-ci.d/containers.yml
>
> v2:
> - Unused images removed from tests/docker/Makefile.include,
>   tests/docker/dockerfiles, and .gitlab-ci.d/containers.yml
> - Nested ppc64-* and ppc64le-* cases in tests/tcg/configure.sh
> - Fixed inline assembly usage and unused header removed from
>   tests/tcg/ppc64le/byte_reverse.c
>
> Matheus Ferst (2):
>   tests/docker: gcc-10 based images for ppc64{,le} tests
>   tests/tcg/ppc64le: tests for brh/brw/brd
>
>  .gitlab-ci.d/containers.yml   | 13 +++-
>  tests/docker/Makefile.include |  5 ++---
>  .../dockerfiles/debian-powerpc-cross.docker   | 12 ---
>  .../debian-powerpc-test-cross.docker  | 17 +++
>  .../dockerfiles/debian-ppc64-cross.docker | 11 --
>  tests/tcg/configure.sh| 20 +-
>  tests/tcg/ppc64/Makefile.target   |  7 +++
>  tests/tcg/ppc64le/Makefile.target |  7 +++
>  tests/tcg/ppc64le/byte_reverse.c  | 21 +++
>  9 files changed, 67 insertions(+), 46 deletions(-)
>  delete mode 100644 tests/docker/dockerfiles/debian-powerpc-cross.docker
>  create mode 100644 tests/docker/dockerfiles/debian-powerpc-test-cross.docker
>  delete mode 100644 tests/docker/dockerfiles/debian-ppc64-cross.docker
>  create mode 100644 tests/tcg/ppc64le/byte_reverse.c

Queued to testing/next, thanks.

It should be noted the GitLab CI run still skips these tests due to use
of debian-all-test-cross which is still debian10 based. As long as the
maintainer and interested developers still run tests locally this
shouldn't be a major hole in coverage.

Once bullseye goes gold we will be updating all our test images in due
course anyway.

-- 
Alex Bennée

Re: [PATCH 0/2] iotests/qsd-jobs: Use common.qemu for the QSD

2021-04-26 Thread Stefan Hajnoczi

On Thu, Apr 01, 2021 at 03:28:13PM +0200, Max Reitz wrote:
> (Alternative to: “iotests/qsd-jobs: Filter events in the first test”)
> 
> Hi,
> 
> The qsd-jobs test has kind of unreliable output, because sometimes the
> job is ready before ‘quit’, and sometimes it is not.  This series
> presents one approach to fix that, which is to extend common.qemu to
> allow running the storage daemon instead of qemu, and then to use that
> in qsd-jobs to wait for the BLOCK_JOB_READY event before issuing the
> ‘quit’ command.
> 
> I took patch 1 from my “qcow2: Improve refcount structure rebuilding”
> series.
> (https://lists.nongnu.org/archive/html/qemu-block/2021-03/msg00654.html)
> 
> As noted above, this series is an alternative to “iotests/qsd-jobs:
> Filter events in the first test”.  I like this series here better
> because I’d prefer it if tests that do QMP actually check the output so
> they control what’s really happening.
> On the other hand, this may be too complicated for 6.0, and we might
> want to fix qsd-jobs in 6.0.
> 
> 
> Max Reitz (2):
>   iotests/common.qemu: Allow using the QSD
>   iotests/qsd-jobs: Use common.qemu for the QSD
> 
>  tests/qemu-iotests/common.qemu| 53 +-
>  tests/qemu-iotests/tests/qsd-jobs | 55 ---
>  tests/qemu-iotests/tests/qsd-jobs.out | 10 -
>  3 files changed, 92 insertions(+), 26 deletions(-)
> 
> -- 
> 2.29.2
> 

Acked-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature

Re: [PATCH 2/2] iotests/307: Test iothread conflict for exports

2021-04-26 Thread Vladimir Sementsov-Ogievskiy


22.04.2021 17:53, Max Reitz wrote:

Passing fixed-iothread=true should make iothread conflicts fatal,
whereas fixed-iothread=false should not.

Combine the second case with an error condition that is checked after
the iothread is handled, to verify that qemu does not crash if there is
such an error after changing the iothread failed.

Signed-off-by: Max Reitz 


Reviewed-by: Vladimir Sementsov-Ogievskiy 
Tested-by: Vladimir Sementsov-Ogievskiy 


--
Best regards,
Vladimir

Re: [PATCH v1 13/25] hw/tricore: Add testdevice for tests in tests/tcg/

2021-04-26 Thread Alex Bennée

Philippe Mathieu-Daudé  writes:

> On 4/19/21 4:54 PM, Alex Bennée wrote:
>> From: Bastian Koppelmann 
>> 
>> this device is used to verify the correctness of regression tests by
>> allowing guests to write their exit status to this device. This is then
>> used by qemu to exit using the written status.
>> 
>> Reviewed-by: Alex Bennée 
>> Signed-off-by: Bastian Koppelmann 
>> Signed-off-by: Alex Bennée 
>> Message-Id: <20210305170045.869437-4-kbast...@mail.uni-paderborn.de>
>> ---
>>  include/hw/tricore/tricore_testdevice.h | 38 
>>  hw/tricore/tricore_testboard.c  |  8 +++
>>  hw/tricore/tricore_testdevice.c | 82 +
>>  hw/tricore/meson.build  |  1 +
>>  4 files changed, 129 insertions(+)
>>  create mode 100644 include/hw/tricore/tricore_testdevice.h
>>  create mode 100644 hw/tricore/tricore_testdevice.c
>
>> +#include "hw/tricore/tricore_testdevice.h"
>> +
>> +static void tricore_testdevice_write(void *opaque, hwaddr offset,
>> +  uint64_t value, unsigned size)
>> +{
>> +exit(value);
>
>->  qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
>
> I'd rather use a 2 steps check of value such watchdog devices do
> (to be sure the guest is still in control and isn't nut).

This isn't any different to what we do for TARGET_SYS_EXIT_EXTENDED or
the various other semihosting exits. Maybe we could do a better job of
flagging that these devices (or features) give the guest an avenue to
cause QEMU to shutdown but none of these are enabled by default.

>
> A general comments, all targets require a such test feature,
> so we should have a generic user-creatable sysbus-testdev for that.

We also have the isa-debug-exit device used by x86. I believe there is
also a PCI device (pci-testdev) used to submit error-exit results for
kvm-unit-tests.

I'm all for modelling a cleaner abstraction that could be used by all
these methods and avoiding multiple exit paths but I don't want to hold
up Bastian's tests to a higher standard without addressing the other
cases. In the meantime given it improves the testing situation for
Tricore I don't think it's a major issue.

>
> Regards,
>
> Phil.

-- 
Alex Bennée

Re: [PATCH 1/2] block/export: Free ignored Error

2021-04-26 Thread Max Reitz


On 26.04.21 11:44, Vladimir Sementsov-Ogievskiy wrote:

22.04.2021 17:53, Max Reitz wrote:

When invoking block-export-add with some iothread and
fixed-iothread=false, and changing the node's iothread fails, the error
is supposed to be ignored.

However, it is still stored in *errp, which is wrong.  If a second error
occurs, the "*errp must be NULL" assertion in error_setv() fails:

   qemu-system-x86_64: ../util/error.c:59: error_setv: Assertion
   `*errp == NULL' failed.

So the error from bdrv_try_set_aio_context() must be freed when it is
ignored.

Fixes: f51d23c80af73c95e0ce703ad06a300f1b3d63ef
    ("block/export: add iothread and fixed-iothread options")
Signed-off-by: Max Reitz 
---
  block/export/export.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/block/export/export.c b/block/export/export.c
index fec7d9f738..ce5dd3e59b 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -68,6 +68,7 @@ static const BlockExportDriver 
*blk_exp_find_driver(BlockExportType type)

  BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
  {
+    ERRP_GUARD();
  bool fixed_iothread = export->has_fixed_iothread && 
export->fixed_iothread;

  const BlockExportDriver *drv;
  BlockExport *exp = NULL;
@@ -127,6 +128,9 @@ BlockExport *blk_exp_add(BlockExportOptions 
*export, Error **errp)

  ctx = new_ctx;
  } else if (fixed_iothread) {
  goto fail;
+    } else {
+    error_free(*errp);
+    *errp = NULL;
  }
  }



I don't think ERRP_GUARD is needed in this case: we don't need to handle 
errp somehow except for just free if it was set.


Perhaps not, but style-wise, I prefer not special-casing the
errp == NULL case.

(It can be argued that ERRP_GUARD similarly special-cases it, but that’s 
hidden from my view.  Also, the errp == NULL case actually doesn’t even 
happen, so ERRP_GUARD is effectively a no-op and it won’t cost 
performance (not that it really matters).)


Of course we could also do this:

ret = bdrv_try_set_aio_context(bs, new_ctx, fixed_iothread ? errp : NULL);

Would be even shorter.


So we can simply do:

} else if (errp) {
    error_free(*errp);
    *errp = NULL;
}

Let's only check that errp is really set on failure path of 
bdrv_try_set_aio_context():


OK,  but out of interest, why?  error_free() doesn’t care.  I mean it 
might be a problem if blk_exp_add() returns an error without setting 
*errp, but that’d’ve been pre-existing.


Max

bdrv_try_set_aio_context() fails iff bdrv_can_set_aio_context() fails, 
which in turn may fail iff bdrv_parent_can_set_aio_context() or 
bdrv_child_can_set_aio_context() fails.


bdrv_parent_can_set_aio_context() has two failure path, on first it set 
errp by hand, and on second it has assertion that errp is set.


bdrv_child_can_set_aio_context() may fail only if nested call to 
bdrv_can_set_aio_context() fails, so recursion is closed.

Re: firmware selection for SEV-ES

2021-04-26 Thread Laszlo Ersek

On 04/23/21 19:36, Pavel Hrdina wrote:
> On Fri, Apr 23, 2021 at 03:06:49PM +0200, Laszlo Ersek wrote:
>> On 04/23/21 15:01, Pavel Hrdina wrote:
>>> On Fri, Apr 23, 2021 at 02:34:02PM +0200, Laszlo Ersek wrote:
 On 04/23/21 12:31, Pavel Hrdina wrote:
> On Fri, Apr 23, 2021 at 10:16:24AM +0200, Michal Privoznik wrote:
>> On 4/22/21 4:13 PM, Laszlo Ersek wrote:
>>> On 04/21/21 13:51, Pavel Hrdina wrote:

>>> Should we file a libvirtd Feature Request (where?) for recognizing the
>>> @amd-sev-es feature flag?
>>
>> Yes, we should. We can use RedHat bugzilla for that. Laszlo - do you 
>> want to
>> do it yourself or shall I help you with that?
>
> This BZ looks like it's already tracking support for amd-sev-es [1].
>
> Pavel.
>
> [1] 

 That's a private RHBZ that tracks SEV-ES for a product that is different
 from "libvirt upstream".
>>>
>>> I didn't notice that's a private RHBZ, thanks for pointing it out.
>>>
>>> For upstream libvirt we no longer use RHBZ to track RFEs/BZs, we use
>>> gitlab issues so if we want to track the work in upstream as well I can
>>> create a new issue.
>>
>> Heh, I suspected I was missing something there :) Yes, please, if you or
>> Michal could create a new issue in gitlab, that would be great.
> 
> I've created a new libvirt issue [1], hopefully the description makes
> sense. :)
> 
> Pavel
> 
> [1] 

Looks good to me, thank you!
Laszlo

Re: [PATCH 1/2] block/export: Free ignored Error

2021-04-26 Thread Vladimir Sementsov-Ogievskiy


26.04.2021 13:33, Max Reitz wrote:

On 26.04.21 11:44, Vladimir Sementsov-Ogievskiy wrote:

22.04.2021 17:53, Max Reitz wrote:

When invoking block-export-add with some iothread and
fixed-iothread=false, and changing the node's iothread fails, the error
is supposed to be ignored.

However, it is still stored in *errp, which is wrong.  If a second error
occurs, the "*errp must be NULL" assertion in error_setv() fails:

   qemu-system-x86_64: ../util/error.c:59: error_setv: Assertion
   `*errp == NULL' failed.

So the error from bdrv_try_set_aio_context() must be freed when it is
ignored.

Fixes: f51d23c80af73c95e0ce703ad06a300f1b3d63ef
    ("block/export: add iothread and fixed-iothread options")
Signed-off-by: Max Reitz 
---
  block/export/export.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/block/export/export.c b/block/export/export.c
index fec7d9f738..ce5dd3e59b 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -68,6 +68,7 @@ static const BlockExportDriver 
*blk_exp_find_driver(BlockExportType type)
  BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
  {
+    ERRP_GUARD();
  bool fixed_iothread = export->has_fixed_iothread && 
export->fixed_iothread;
  const BlockExportDriver *drv;
  BlockExport *exp = NULL;
@@ -127,6 +128,9 @@ BlockExport *blk_exp_add(BlockExportOptions *export, Error 
**errp)
  ctx = new_ctx;
  } else if (fixed_iothread) {
  goto fail;
+    } else {
+    error_free(*errp);
+    *errp = NULL;
  }
  }



I don't think ERRP_GUARD is needed in this case: we don't need to handle errp 
somehow except for just free if it was set.


Perhaps not, but style-wise, I prefer not special-casing the
errp == NULL case.

(It can be argued that ERRP_GUARD similarly special-cases it, but that’s hidden 
from my view.  Also, the errp == NULL case actually doesn’t even happen, so 
ERRP_GUARD is effectively a no-op and it won’t cost performance (not that it 
really matters).)


Hm. I don't know. May be you are right.. Actually, I don't care too much, so, 
patch is OK as is:

Reviewed-by: Vladimir Sementsov-Ogievskiy 



Of course we could also do this:

ret = bdrv_try_set_aio_context(bs, new_ctx, fixed_iothread ? errp : NULL);

Would be even shorter.


So we can simply do:

} else if (errp) {
    error_free(*errp);
    *errp = NULL;
}

Let's only check that errp is really set on failure path of 
bdrv_try_set_aio_context():


OK,  but out of interest, why?  error_free() doesn’t care.  I mean it might be 
a problem if blk_exp_add() returns an error without setting *errp, but 
that’d’ve been pre-existing.



I remember we still have some functions not setting errp on some error paths.. 
bdrv_open_driver() has work-around for such bad .*open handlers of some 
drivers... So I decided to look through.




bdrv_try_set_aio_context() fails iff bdrv_can_set_aio_context() fails, which in 
turn may fail iff bdrv_parent_can_set_aio_context() or 
bdrv_child_can_set_aio_context() fails.

bdrv_parent_can_set_aio_context() has two failure path, on first it set errp by 
hand, and on second it has assertion that errp is set.

bdrv_child_can_set_aio_context() may fail only if nested call to 
bdrv_can_set_aio_context() fails, so recursion is closed.







--
Best regards,
Vladimir

Re: [PATCH] hw/block/nvme: fix csi field for cns 0x00 and 0x11

2021-04-26 Thread Klaus Jensen


On Apr 26 13:16, Gollu Appalanaidu wrote:

As per the TP 4056d Namespace types CNS 0x00 and CNS 0x11
CSI field shouldn't use but it is being used for these two
Identify command CNS values, fix that.

Signed-off-by: Gollu Appalanaidu 
---
hw/nvme/ctrl.c | 11 ---
1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 2e7498a73e..1657b1d04a 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4244,11 +4244,16 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, 
NvmeRequest *req, bool active)
}
}

-if (c->csi == NVME_CSI_NVM && nvme_csi_has_nvm_support(ns)) {
-return nvme_c2h(n, (uint8_t *)&ns->id_ns, sizeof(NvmeIdNs), req);
+if (active && nvme_csi_has_nvm_support(ns)) {
+goto out;
+} else if (!active && ns->csi == NVME_CSI_NVM) {
+goto out;
+} else {
+return NVME_INVALID_CMD_SET | NVME_DNR;
}

-return NVME_INVALID_CMD_SET | NVME_DNR;
+out:
+return nvme_c2h(n, (uint8_t *)&ns->id_ns, sizeof(NvmeIdNs), req);
}

static uint16_t nvme_identify_ns_attached_list(NvmeCtrl *n, NvmeRequest *req)
--
2.17.1




Looking closer at this, since we only support the NVM and Zoned command 
sets, we can get rid of the `nvme_csi_has_nvm_support()` helper and just 
assume NVM command set support for all namespaces. The way different 
command sets are handled doesn't scale anyway, so we might as well 
simplify the logic a bit.


Something like this (compile-tested only) patch maybe?

diff --git i/hw/nvme/ctrl.c w/hw/nvme/ctrl.c
index 2e7498a73e70..7fcd6992358d 100644
--- i/hw/nvme/ctrl.c
+++ w/hw/nvme/ctrl.c
@@ -4178,16 +4178,6 @@ static uint16_t nvme_rpt_empty_id_struct(NvmeCtrl *n, 
NvmeRequest *req)
 return nvme_c2h(n, id, sizeof(id), req);
 }

-static inline bool nvme_csi_has_nvm_support(NvmeNamespace *ns)
-{
-switch (ns->csi) {
-case NVME_CSI_NVM:
-case NVME_CSI_ZONED:
-return true;
-}
-return false;
-}
-
 static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeRequest *req)
 {
 trace_pci_nvme_identify_ctrl();
@@ -4244,7 +4234,7 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest 
*req, bool active)
 }
 }

-if (c->csi == NVME_CSI_NVM && nvme_csi_has_nvm_support(ns)) {
+if (active || ns->csi == NVME_CSI_NVM) {
 return nvme_c2h(n, (uint8_t *)&ns->id_ns, sizeof(NvmeIdNs), req);
 }

@@ -4315,7 +4305,7 @@ static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, 
NvmeRequest *req,
 }
 }

-if (c->csi == NVME_CSI_NVM && nvme_csi_has_nvm_support(ns)) {
+if (c->csi == NVME_CSI_NVM) {
 return nvme_rpt_empty_id_struct(n, req);
 } else if (c->csi == NVME_CSI_ZONED && ns->csi == NVME_CSI_ZONED) {
 return nvme_c2h(n, (uint8_t *)ns->id_ns_zoned, sizeof(NvmeIdNsZoned),



signature.asc
Description: PGP signature

Re: Resetting non-qdev children in a 3-phase reset device

2021-04-26 Thread Philippe Mathieu-Daudé

On 4/26/21 11:33 AM, Peter Maydell wrote:
> On Mon, 26 Apr 2021 at 10:23, Philippe Mathieu-Daudé  wrote:
>> On 4/25/21 8:33 PM, Peter Maydell wrote:
>>> On Sat, 24 Apr 2021 at 14:04, Philippe Mathieu-Daudé  
>>> wrote:
 I now understand better the diag288 case, but I still don't understand
 the TYPE_APIC one. It has no DeviceClass::reset(), its abstract parent
 TYPE_APIC_COMMON register apic_reset_common() but being TYPE_DEVICE it
 is not on a qbus. It is somehow connected to the X86CPU object, but the
 single call to apic_init_reset() is from do_cpu_init() - not a reset
 method -.
>>>
>>> pc_machine_reset() calls device_legacy_reset(cpu->apic_state)
>>> which is to say it invokes the DeviceState::reset method,
>>> which is either kvm_apic_reset or apic_reset_common.
>>
>> Oh, thanks! I guess "convoluted" is the proper adjective to describe
>> this reset logic. I suppose APIC is a very old device, part of the
>> Frankenstein PC, so hard to rework (because we are scared of the
>> implications of changing old & heavily used devices).
> 
> This is mostly just another instance of "our reset logic doesn't
> deal well with devices which aren't on buses". The APIC isn't on a bus,
> so the machine that uses it has a local workaround to manually arrange
> for it to reset, just as it does for the CPU.

But the APIC is created within the CPU realize():

static void x86_cpu_apic_create(X86CPU *cpu, Error **errp)
{
APICCommonState *apic;
ObjectClass *apic_class = OBJECT_CLASS(apic_get_class());

cpu->apic_state = DEVICE(object_new_with_class(apic_class));
...

... so I'd expect the CPU (TYPE_X86_CPU) to reset the APIC in its
reset(), not the machine (regardless how the CPU itself is reset).

*But* there is an undocumented MachineClass::wakeup() handler which
complicates the picture:

static void pc_machine_reset(MachineState *machine)
{
CPUState *cs;
X86CPU *cpu;

qemu_devices_reset();

/* Reset APIC after devices have been reset to cancel
 * any changes that qemu_devices_reset() might have done.
 */
CPU_FOREACH(cs) {
cpu = X86_CPU(cs);

if (cpu->apic_state) {
device_legacy_reset(cpu->apic_state);
}
}
}

static void pc_machine_wakeup(MachineState *machine)
{
cpu_synchronize_all_states();
pc_machine_reset(machine);
cpu_synchronize_all_post_reset();
}

It is called once:

/*
 * Wake the VM after suspend.
 */
static void qemu_system_wakeup(void)
{
MachineClass *mc;

mc = current_machine ? MACHINE_GET_CLASS(current_machine) : NULL;

if (mc && mc->wakeup) {
mc->wakeup(current_machine);
}
}

and TYPE_PC_MACHINE is the single object implementing it.

This wakeup is handled in main_loop_should_exit() after the
qemu_reset_requested() check, and pc_machine_wakeup calls
cpu_synchronize_all_post_reset().

Could a 3-phase CPU reset simplify all this?

I agree it is 'mostly just another instance of "our reset logic
doesn't deal well with devices which aren't on buses"', but a
very convoluted one IMHO.

Regards,

Phil.

Re: socket.c added support for unix domain socket datagram transport

2021-04-26 Thread Ralph Schmieder




> On Apr 23, 2021, at 18:39, Stefano Brivio  wrote:
> 
> On Fri, 23 Apr 2021 17:48:08 +0200
> Ralph Schmieder  wrote:
> 
>> Hi, Stefano... Thanks for the detailed response... inline
>> Thanks,
>> -ralph
>> 
>> 
>>> On Apr 23, 2021, at 17:29, Stefano Brivio 
>>> wrote:
>>> 
>>> Hi Ralph,
>>> 
>>> On Fri, 23 Apr 2021 08:56:48 +0200
>>> Ralph Schmieder  wrote:
>>> 
 Hey...  new to this list.  I was looking for a way to use Unix
 domain sockets as a network transport between local VMs.
 
 I'm part of a team where we run dozens if not hundreds of VMs on a
 single compute instance which are highly interconnected.
 
 In the current implementation, I use UDP sockets (e.g. something
 like 
 
 -netdev
 id=bla,type=socket,udp=localhost:1234,localaddr=localhost:5678) 
 
 which works great.
 
 The downside of this approach is that I need to keep track of all
 the UDP ports in use and that there's a potential for clashes.
 Clearly, having Unix domain sockets where I could store the
 sockets in the VM's directory would be much easier to manage.
 
 However, even though there is some AF_UNIX support in net/socket.c,
 it's
 
 - not configurable
 - it doesn't work  
>>> 
>>> I hate to say this, but I've been working on something very similar
>>> just in the past days, with the notable difference that I'm using
>>> stream-oriented AF_UNIX sockets instead of datagram-oriented.
>>> 
>>> I have a similar use case, and after some experiments I picked a
>>> stream-oriented socket over datagram-oriented because:
>>> 
>>> - overhead appears to be the same
>>> 
>>> - message boundaries make little sense here -- you already have a
>>> 32-bit vnet header with the message size defining the message
>>> boundaries
>>> 
>>> - datagram-oriented AF_UNIX sockets are actually reliable and
>>> there's no packet reordering on Linux, but this is not "formally"
>>> guaranteed
>>> 
>>> - it's helpful for me to know when a qemu instance disconnects for
>>> any reason
>>> 
>> 
>> IMO, dgram is the right tool for this as it is symmetrical to using a
>> UDP transport... Since I need to pick up those packets from outside
>> of Qemu (inside of a custom networking fabric) I'd have to make
>> assumptions about the data and don't know the length of the packet in
>> advance.
> 
> Okay, so it doesn't seem to fit your case, but this specific point is
> where you actually have a small advantage using a stream-oriented
> socket. If you receive a packet and have a smaller receive buffer, you
> can read the length of the packet from the vnet header and then read
> the rest of the packet at a later time.
> 
> With a datagram-oriented socket, you would need to know the maximum
> packet size in advance, and use a receive buffer that's large enough to
> contain it, because if you don't, you'll discard data.

For me, the maximum packet size is a jumbo frame (e.g. 9x1024) anyway -- 
everything must fit into an atomic write of that size.

> 
> The same reasoning applies to a receive buffer that's larger than the
> maximum packet size you can get -- you can then read multiple packets at
> a time, filling your buffer, partially reading a packet at the end of
> it, and reading the rest later.
> 
> With a datagram-oriented socket you need to resort to recvmmsg() to
> receive multiple packets with one syscall (nothing against it, it's
> just slightly more tedious).
> 
>> Using the datagram approach fits nicely into this concept.
>> So, yes, in my instance the transport IS truly connectionless and VMs
>> just keep sending packets if the fabric isn't there or doesn't pick
>> up their packets.
> 
> I see, in that case I guess you really need a datagram-oriented
> socket... even though what happens with my patch (just like with the
> existing TCP support) is that your fabric would need to be there when
> qemu starts, but if it disappears later, qemu will simply close the
> socket. Indeed, it's not "hotplug", which is probably what you need.

That's the point.  This is peer-to-peer/point-to-point and not client/server.

> 
>> And maybe there's use for both, as there's currently already support
>> for connection oriented (TCP) and connectionless (UDP) inet
>> transports. 
> 
> Yes, I think so.
> 
 As a side note, I tried to pass in an already open FD, but that
 didn't work either.  
>>> 
>>> This actually worked for me as a quick work-around, either with:
>>> https://github.com/StevenVanAcker/udstools
>>> 
>>> or with a trivial C implementation of that, that does essentially:
>>> 
>>> fd = strtol(argv[1], NULL, 0);
>>> if (fd < 3 || fd > INT_MAX || errno)
>>> usage(argv[0]);
>>> 
>>> s = socket(AF_UNIX, SOCK_STREAM, 0);
>>> if (s < 0) {
>>> perror("socket");
>>> exit(EXIT_FAILURE);
>>> }
>>> 
>>> if (connect(s, (const struct sockaddr *)&addr, sizeof(addr)) < 0) {
>>> perror("connect");
>>>

Re: [PATCH v2 00/25] linux-user/sparc: Implement rt signals

2021-04-26 Thread Alex Bennée

Richard Henderson  writes:

> On 4/25/21 7:53 PM, Richard Henderson wrote:
>> Supercedes:20210425155749.896330-1-richard.hender...@linaro.org
>> ("linux-user/sparc64: Implement signals")
>> This time, in the lead-up, merge the sparc and sparc64 directories.
>> Implement rt signals for sparc32 as well, since there are only a few
>> differences between the two.
>
> I notice that we don't actually have any sparc32 testing in tree.
> I wonder if I can steal an old debian libc image to install along with
> our modern-ish sparc64 compiler to get this working...

It depends on the crt0 bits packaged with the sparc64 compiler. I went
through this with riscv64/32 and found the compiler as packaged by
debian couldn't build 32 bit because it wasn't configured to.

Having said that if we are not expecting things to change that much we
could just do it the same way as hexagon. The alternative is to convince
someone to start a estoteric binary distribution that caters to weird
and ancient build tools ;-)

>
> Also, I've just had a look through linux/arch/sparc/kernel/sparc32.c,
> and see that there's more work to be done for sparcv8plus.  In
> particular, need to save the high-half of the global and out
> registers.
>
>
> r~

-- 
Alex Bennée

RE: [RFC PATCH 0/4] target/ppc: code motion to compile translate_init

2021-04-26 Thread Bruno Piazera Larsen

> > The current patch series aims to isolate common code from 
> > translation-related
> > code. This isolation is required to disable TCG at build time, and the
> > final system still work.
> >
> > This patch series is still WIP, so comments are welcome
> >
> > Bruno Larsen (billionai) (4):
> >target/ppc: move opcode table logic to translate.c
> >target/ppc: isolated SPR read/write callbacks
> >target/ppc: Move SPR generation to separate file
> >target/ppc: isolated cpu init from translation logic
>
> This does not apply to master.  You should say what the patch requirements 
> are.

Oh, right, sorry! This patch series is based on my patch from April 14th, 
subject:
[PATCH v3] target/ppc: code motion from translate_init.c.inc to gdbstub.c


Bruno Piazera Larsen

Instituto de Pesquisas 
ELDORADO

Departamento Computação Embarcada

Analista de Software Trainee

Aviso Legal - Disclaimer

[PATCH v2 0/2] kvm: use KVM_{GET|SET}_SREGS2 when available

2021-04-26 Thread Maxim Levitsky

This implements support for using these new
IOCTLS to migrate PDPTRS.

Best regards,
Maxim Levitsky

Maxim Levitsky (2):
  kvm: update kernel headers for KVM_{GET|SET}_SREGS2
  KVM: use KVM_{GET|SET}_SREGS2 when supported.

 accel/kvm/kvm-all.c |   5 ++
 include/sysemu/kvm.h|   4 ++
 linux-headers/asm-x86/kvm.h |  13 +
 linux-headers/linux/kvm.h   |   5 ++
 target/i386/cpu.h   |   3 +
 target/i386/kvm/kvm.c   | 107 +++-
 target/i386/machine.c   |  30 ++
 7 files changed, 165 insertions(+), 2 deletions(-)

-- 
2.26.2

[PATCH v2 1/2] kvm: update kernel headers for KVM_{GET|SET}_SREGS2

2021-04-26 Thread Maxim Levitsky

Signed-off-by: Maxim Levitsky 
---
 linux-headers/asm-x86/kvm.h | 13 +
 linux-headers/linux/kvm.h   |  5 +
 2 files changed, 18 insertions(+)

diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index 8e76d3701d..d61dc76e24 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -158,6 +158,19 @@ struct kvm_sregs {
__u64 interrupt_bitmap[(KVM_NR_INTERRUPTS + 63) / 64];
 };
 
+struct kvm_sregs2 {
+   /* out (KVM_GET_SREGS2) / in (KVM_SET_SREGS2) */
+   struct kvm_segment cs, ds, es, fs, gs, ss;
+   struct kvm_segment tr, ldt;
+   struct kvm_dtable gdt, idt;
+   __u64 cr0, cr2, cr3, cr4, cr8;
+   __u64 efer;
+   __u64 apic_base;
+   __u64 flags;
+   __u64 pdptrs[4];
+};
+#define KVM_SREGS2_FLAGS_PDPTRS_VALID 1
+
 /* for KVM_GET_FPU and KVM_SET_FPU */
 struct kvm_fpu {
__u8  fpr[8][16];
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 020b62a619..22ea29a243 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1056,6 +1056,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_ENFORCE_PV_FEATURE_CPUID 190
 #define KVM_CAP_SYS_HYPERV_CPUID 191
 #define KVM_CAP_DIRTY_LOG_RING 192
+#define KVM_CAP_SREGS2 199
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1563,6 +1564,10 @@ struct kvm_pv_cmd {
 /* Available with KVM_CAP_DIRTY_LOG_RING */
 #define KVM_RESET_DIRTY_RINGS  _IO(KVMIO, 0xc7)
 
+
+#define KVM_GET_SREGS2 _IOR(KVMIO,  0xcc, struct kvm_sregs2)
+#define KVM_SET_SREGS2 _IOW(KVMIO,  0xcd, struct kvm_sregs2)
+
 /* Secure Encrypted Virtualization command */
 enum sev_cmd_id {
/* Guest initialization commands */
-- 
2.26.2

[PATCH v2 2/2] KVM: use KVM_{GET|SET}_SREGS2 when supported.

2021-04-26 Thread Maxim Levitsky

This allows to make PDPTRs part of the migration
stream and thus not reload them after migration which
is against X86 spec.

Signed-off-by: Maxim Levitsky 
---
 accel/kvm/kvm-all.c   |   5 ++
 include/sysemu/kvm.h  |   4 ++
 target/i386/cpu.h |   3 ++
 target/i386/kvm/kvm.c | 107 +-
 target/i386/machine.c |  30 
 5 files changed, 147 insertions(+), 2 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index b6d9f92f15..0397b3cb2b 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -142,6 +142,7 @@ bool kvm_msi_via_irqfd_allowed;
 bool kvm_gsi_routing_allowed;
 bool kvm_gsi_direct_mapping;
 bool kvm_allowed;
+bool kvm_sregs2;
 bool kvm_readonly_mem_allowed;
 bool kvm_vm_attributes_allowed;
 bool kvm_direct_msi_allowed;
@@ -2186,6 +2187,10 @@ static int kvm_init(MachineState *ms)
 kvm_ioeventfd_any_length_allowed =
 (kvm_check_extension(s, KVM_CAP_IOEVENTFD_ANY_LENGTH) > 0);
 
+
+kvm_sregs2 =
+(kvm_check_extension(s, KVM_CAP_SREGS2) > 0);
+
 kvm_state = s;
 
 ret = kvm_arch_init(ms, s);
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index a1ab1ee12d..b3d4538c55 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -32,6 +32,7 @@
 #ifdef CONFIG_KVM_IS_POSSIBLE
 
 extern bool kvm_allowed;
+extern bool kvm_sregs2;
 extern bool kvm_kernel_irqchip;
 extern bool kvm_split_irqchip;
 extern bool kvm_async_interrupts_allowed;
@@ -139,6 +140,9 @@ extern bool kvm_msi_use_devid;
  */
 #define kvm_gsi_direct_mapping() (kvm_gsi_direct_mapping)
 
+
+#define kvm_supports_sregs2() (kvm_sregs2)
+
 /**
  * kvm_readonly_mem_enabled:
  *
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 570f916878..ac877097d4 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1422,6 +1422,9 @@ typedef struct CPUX86State {
 SegmentCache idt; /* only base and limit are used */
 
 target_ulong cr[5]; /* NOTE: cr1 is unused */
+
+bool pdptrs_valid;
+uint64_t pdptrs[4];
 int32_t a20_mask;
 
 BNDReg bnd_regs[4];
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 7fe9f52710..93570706e1 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2514,6 +2514,61 @@ static int kvm_put_sregs(X86CPU *cpu)
 return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_SREGS, &sregs);
 }
 
+static int kvm_put_sregs2(X86CPU *cpu)
+{
+CPUX86State *env = &cpu->env;
+struct kvm_sregs2 sregs;
+int i;
+
+sregs.flags = 0;
+
+if ((env->eflags & VM_MASK)) {
+set_v8086_seg(&sregs.cs, &env->segs[R_CS]);
+set_v8086_seg(&sregs.ds, &env->segs[R_DS]);
+set_v8086_seg(&sregs.es, &env->segs[R_ES]);
+set_v8086_seg(&sregs.fs, &env->segs[R_FS]);
+set_v8086_seg(&sregs.gs, &env->segs[R_GS]);
+set_v8086_seg(&sregs.ss, &env->segs[R_SS]);
+} else {
+set_seg(&sregs.cs, &env->segs[R_CS]);
+set_seg(&sregs.ds, &env->segs[R_DS]);
+set_seg(&sregs.es, &env->segs[R_ES]);
+set_seg(&sregs.fs, &env->segs[R_FS]);
+set_seg(&sregs.gs, &env->segs[R_GS]);
+set_seg(&sregs.ss, &env->segs[R_SS]);
+}
+
+set_seg(&sregs.tr, &env->tr);
+set_seg(&sregs.ldt, &env->ldt);
+
+sregs.idt.limit = env->idt.limit;
+sregs.idt.base = env->idt.base;
+memset(sregs.idt.padding, 0, sizeof sregs.idt.padding);
+sregs.gdt.limit = env->gdt.limit;
+sregs.gdt.base = env->gdt.base;
+memset(sregs.gdt.padding, 0, sizeof sregs.gdt.padding);
+
+sregs.cr0 = env->cr[0];
+sregs.cr2 = env->cr[2];
+sregs.cr3 = env->cr[3];
+sregs.cr4 = env->cr[4];
+
+sregs.cr8 = cpu_get_apic_tpr(cpu->apic_state);
+sregs.apic_base = cpu_get_apic_base(cpu->apic_state);
+
+sregs.efer = env->efer;
+
+if (env->pdptrs_valid) {
+for (i = 0; i < 4; i++) {
+sregs.pdptrs[i] = env->pdptrs[i];
+}
+sregs.flags |= KVM_SREGS2_FLAGS_PDPTRS_VALID;
+}
+
+return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_SREGS2, &sregs);
+}
+
+
 static void kvm_msr_buf_reset(X86CPU *cpu)
 {
 memset(cpu->kvm_msr_buf, 0, MSR_BUF_SIZE);
@@ -3175,6 +3230,53 @@ static int kvm_get_sregs(X86CPU *cpu)
 return 0;
 }
 
+static int kvm_get_sregs2(X86CPU *cpu)
+{
+CPUX86State *env = &cpu->env;
+struct kvm_sregs2 sregs;
+int i, ret;
+
+ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_SREGS2, &sregs);
+if (ret < 0) {
+return ret;
+}
+
+get_seg(&env->segs[R_CS], &sregs.cs);
+get_seg(&env->segs[R_DS], &sregs.ds);
+get_seg(&env->segs[R_ES], &sregs.es);
+get_seg(&env->segs[R_FS], &sregs.fs);
+get_seg(&env->segs[R_GS], &sregs.gs);
+get_seg(&env->segs[R_SS], &sregs.ss);
+
+get_seg(&env->tr, &sregs.tr);
+get_seg(&env->ldt, &sregs.ldt);
+
+env->idt.limit = sregs.idt.limit;
+env->idt.base = sregs.idt.base;
+env->gdt.limit = sregs.gdt.limit;
+env->gdt.base = sregs.gdt.base;
+
+env->cr[0] = sregs.cr0;
+env->cr[2] = sregs.cr2;
+env->cr[3

Re: [PATCH v1 13/25] hw/tricore: Add testdevice for tests in tests/tcg/

2021-04-26 Thread Philippe Mathieu-Daudé

On 4/26/21 12:15 PM, Alex Bennée wrote:
> Philippe Mathieu-Daudé  writes:
>> On 4/19/21 4:54 PM, Alex Bennée wrote:
>>> From: Bastian Koppelmann 
>>>
>>> this device is used to verify the correctness of regression tests by
>>> allowing guests to write their exit status to this device. This is then
>>> used by qemu to exit using the written status.
>>>
>>> Reviewed-by: Alex Bennée 
>>> Signed-off-by: Bastian Koppelmann 
>>> Signed-off-by: Alex Bennée 
>>> Message-Id: <20210305170045.869437-4-kbast...@mail.uni-paderborn.de>
>>> ---
>>>  include/hw/tricore/tricore_testdevice.h | 38 
>>>  hw/tricore/tricore_testboard.c  |  8 +++
>>>  hw/tricore/tricore_testdevice.c | 82 +
>>>  hw/tricore/meson.build  |  1 +
>>>  4 files changed, 129 insertions(+)
>>>  create mode 100644 include/hw/tricore/tricore_testdevice.h
>>>  create mode 100644 hw/tricore/tricore_testdevice.c
>>
>>> +#include "hw/tricore/tricore_testdevice.h"
>>> +
>>> +static void tricore_testdevice_write(void *opaque, hwaddr offset,
>>> +  uint64_t value, unsigned size)
>>> +{
>>> +exit(value);
>>
>>->  qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
>>
>> I'd rather use a 2 steps check of value such watchdog devices do
>> (to be sure the guest is still in control and isn't nut).
> 
> This isn't any different to what we do for TARGET_SYS_EXIT_EXTENDED or
> the various other semihosting exits. Maybe we could do a better job of
> flagging that these devices (or features) give the guest an avenue to
> cause QEMU to shutdown but none of these are enabled by default.

My concern here is the console being modified and not being restored
correctly. Maybe not a problem for the current test, but could happens
later with more tests added, or the device re-used elsewhere.

This is a one-line change, which can be done later.

This concert also applies to the semihosting exit(). Can be done later too.

> 
>>
>> A general comments, all targets require a such test feature,
>> so we should have a generic user-creatable sysbus-testdev for that.
> 
> We also have the isa-debug-exit device used by x86. I believe there is
> also a PCI device (pci-testdev) used to submit error-exit results for
> kvm-unit-tests.
> 
> I'm all for modelling a cleaner abstraction that could be used by all
> these methods and avoiding multiple exit paths but I don't want to hold
> up Bastian's tests to a higher standard without addressing the other
> cases. In the meantime given it improves the testing situation for
> Tricore I don't think it's a major issue.

Agreed, not a major issue, my comment are not blocking this patch.

Thanks,

Phil.

Re: socket.c added support for unix domain socket datagram transport

2021-04-26 Thread Ralph Schmieder




> On Apr 23, 2021, at 18:54, Stefano Brivio  wrote:
> 
> On Fri, 23 Apr 2021 17:21:38 +0100
> Daniel P. Berrangé  wrote:
> 
>> On Fri, Apr 23, 2021 at 05:29:40PM +0200, Stefano Brivio wrote:
>>> Hi Ralph,
>>> 
>>> On Fri, 23 Apr 2021 08:56:48 +0200
>>> Ralph Schmieder  wrote:
>>> 
 Hey...  new to this list.  I was looking for a way to use Unix domain
 sockets as a network transport between local VMs.
 
 I'm part of a team where we run dozens if not hundreds of VMs on a
 single compute instance which are highly interconnected.
 
 In the current implementation, I use UDP sockets (e.g. something like 
 
 -netdev
 id=bla,type=socket,udp=localhost:1234,localaddr=localhost:5678) 
 
 which works great.
 
 The downside of this approach is that I need to keep track of all the
 UDP ports in use and that there's a potential for clashes.  Clearly,
 having Unix domain sockets where I could store the sockets in the
 VM's directory would be much easier to manage.
 
 However, even though there is some AF_UNIX support in net/socket.c,
 it's
 
 - not configurable
 - it doesn't work  
>>> 
>>> I hate to say this, but I've been working on something very similar
>>> just in the past days, with the notable difference that I'm using
>>> stream-oriented AF_UNIX sockets instead of datagram-oriented.
>>> 
>>> I have a similar use case, and after some experiments I picked a
>>> stream-oriented socket over datagram-oriented because:
>>> 
>>> - overhead appears to be the same
>>> 
>>> - message boundaries make little sense here -- you already have a
>>>  32-bit vnet header with the message size defining the message
>>>  boundaries
>>> 
>>> - datagram-oriented AF_UNIX sockets are actually reliable and there's
>>>  no packet reordering on Linux, but this is not "formally" guaranteed
>>> 
>>> - it's helpful for me to know when a qemu instance disconnects for any
>>>  reason  
>> 
>> The current IP socket impl for the net socket backend uses SOCK_DGRAM,
>> so from a consistency POV it feels sensible todo the same for UNIX
>> sockets too.
> 
> That's just for UDP though -- it also supports TCP with the "connect="
> parameter, and given that a stream-oriented AF_UNIX socket behaves very
> similarly, I recycled that parameter and just extended that bit of
> documentation.
> 
>> None the less, your last point in particular about wanting to know
>> about disconnects feels valid, and if its useful to you for UNIX
>> sockets, then it ought to be useful for IP sockets too.
>> 
>> IOW, I wonder if  we should use DGRAM for UNIX sockets too by default
>> to match current behaviour, but then also add a CLI option that allows
>> choice of DGRAM vs STREAM, and wire that up for IP & UNIX sockets.
> 
> The choice would only apply to AF_UNIX (that is, not to TCP and UDP).
> 
> The current situation isn't entirely consistent, because for TCP you
> have "connect=", for UDP it's "udp=" or "mcast=", and I'm extending the
> "connect=" case to support stream-oriented AF_UNIX, which I think is
> consistent.
> 
> However, to have it symmetric for the datagram-oriented case
> (UDP and AF_UNIX), ideally it should be changed to
> "dgram=host:port|path" -- which I guess we can't do.

That's what I thought, too. It would be canonical but would break backwards 
compatibility... hence impossible to change the existing parameter from udp to 
dgram.

However, an alternative would be to use the dgram approach for both UDP and UDS 
dgram and keep the UDP option for backwards compatibility.

The code would have to ensure that the dgram param type matches the localaddr 
param type and that localaddr has to be present if dgram is provided (as with 
UDP currently).


> 
> I see two alternatives:
> 
> 1.
>  - "connect=" (TCP only)
>  - "unix=path,type=dgram|stream"
>  - "udp=" (UDP only)
> 
> 2.
>  - "connect=" (TCP and AF_UNIX stream)
>  - "unix_dgram="
>  - "udp=" (UDP only)
> 

> The major thing I like of 2. is that we save some code and a further
> option, but other than that I don't have a strong preference.
> 
> -- 
> Stefano

Re: First time patch not reflected in the mailing list

2021-04-26 Thread Dr. David Alan Gilbert

* Edge NFV (edge...@gmail.com) wrote:
> Hi
> 
> I sent a patch around four hours back but I doesn't seem to be reflected on
> to the mailing list. That patch was my first mail to this mail list. I
> understand that this a moderated list and the first message needs to be
> white listed by the moderator. Please let me know how long it will take to
> white list my patch message.

This list isn't moderated normally so shouldn't be a problem.

Still the list seems to be running a bit slowly today, I see your patch
has made it through though.

Dave

> E
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [PATCH v2 1/3] doc: Fix some mistakes in the SEV documentation

2021-04-26 Thread Laszlo Ersek

On 04/23/21 22:08, Tom Lendacky wrote:
> From: Tom Lendacky 
> 
> Fix some spelling and grammar mistakes in the amd-memory-encryption.txt
> file. No new information added.
> 
> Signed-off-by: Tom Lendacky 
> ---
>  docs/amd-memory-encryption.txt | 59 +-
>  1 file changed, 29 insertions(+), 30 deletions(-)
> 
> diff --git a/docs/amd-memory-encryption.txt b/docs/amd-memory-encryption.txt
> index 145896aec7..ed85159ea7 100644
> --- a/docs/amd-memory-encryption.txt
> +++ b/docs/amd-memory-encryption.txt
> @@ -1,38 +1,38 @@
>  Secure Encrypted Virtualization (SEV) is a feature found on AMD processors.
>  
>  SEV is an extension to the AMD-V architecture which supports running 
> encrypted
> -virtual machine (VMs) under the control of KVM. Encrypted VMs have their 
> pages
> +virtual machines (VMs) under the control of KVM. Encrypted VMs have their 
> pages
>  (code and data) secured such that only the guest itself has access to the
>  unencrypted version. Each encrypted VM is associated with a unique encryption
> -key; if its data is accessed to a different entity using a different key the
> +key; if its data is accessed by a different entity using a different key the
>  encrypted guests data will be incorrectly decrypted, leading to 
> unintelligible
>  data.
>  
> -The key management of this feature is handled by separate processor known as
> -AMD secure processor (AMD-SP) which is present in AMD SOCs. Firmware running
> -inside the AMD-SP provide commands to support common VM lifecycle. This
> +Key management for this feature is handled by a separate processor known as 
> the
> +AMD secure processor (AMD-SP), which is present in AMD SOCs. Firmware running
> +inside the AMD-SP provides commands to support a common VM lifecycle. This
>  includes commands for launching, snapshotting, migrating and debugging the
> -encrypted guest. Those SEV command can be issued via KVM_MEMORY_ENCRYPT_OP
> +encrypted guest. These SEV commands can be issued via KVM_MEMORY_ENCRYPT_OP
>  ioctls.
>  
>  Launching
>  -
> -Boot images (such as bios) must be encrypted before guest can be booted.
> -MEMORY_ENCRYPT_OP ioctl provides commands to encrypt the images 
> :LAUNCH_START,
> +Boot images (such as bios) must be encrypted before a guest can be booted. 
> The
> +MEMORY_ENCRYPT_OP ioctl provides commands to encrypt the images: 
> LAUNCH_START,
>  LAUNCH_UPDATE_DATA, LAUNCH_MEASURE and LAUNCH_FINISH. These four commands
>  together generate a fresh memory encryption key for the VM, encrypt the boot
> -images and provide a measurement than can be used as an attestation of the
> +images and provide a measurement than can be used as an attestation of a
>  successful launch.
>  
>  LAUNCH_START is called first to create a cryptographic launch context within
> -the firmware. To create this context, guest owner must provides guest policy,
> +the firmware. To create this context, guest owner must provide a guest 
> policy,
>  its public Diffie-Hellman key (PDH) and session parameters. These inputs
> -should be treated as binary blob and must be passed as-is to the SEV 
> firmware.
> +should be treated as a binary blob and must be passed as-is to the SEV 
> firmware.
>  
> -The guest policy is passed as plaintext and hypervisor may able to read it
> +The guest policy is passed as plaintext. A hypervisor may choose to read it,
>  but should not modify it (any modification of the policy bits will result
>  in bad measurement). The guest policy is a 4-byte data structure containing
> -several flags that restricts what can be done on running SEV guest.
> +several flags that restricts what can be done on a running SEV guest.
>  See KM Spec section 3 and 6.2 for more details.
>  
>  The guest policy can be provided via the 'policy' property (see below)
> @@ -40,31 +40,30 @@ The guest policy can be provided via the 'policy' 
> property (see below)
>  # ${QEMU} \
> sev-guest,id=sev0,policy=0x1...\
>  
> -Guest owners provided DH certificate and session parameters will be used to
> +The guest owner provided DH certificate and session parameters will be used 
> to
>  establish a cryptographic session with the guest owner to negotiate keys used
>  for the attestation.
>  
> -The DH certificate and session blob can be provided via 'dh-cert-file' and
> -'session-file' property (see below
> +The DH certificate and session blob can be provided via the 'dh-cert-file' 
> and
> +'session-file' properties (see below)
>  
>  # ${QEMU} \
>   sev-guest,id=sev0,dh-cert-file=,session-file=
>  
>  LAUNCH_UPDATE_DATA encrypts the memory region using the cryptographic context
> -created via LAUNCH_START command. If required, this command can be called
> +created via the LAUNCH_START command. If required, this command can be called
>  multiple times to encrypt different memory regions. The command also 
> calculates
>  the measurement of the memory contents as it encrypts.
>  
> -LAUNCH_MEASURE command can be used to retr

Re: [PATCH v6 00/12] qcow2: fix parallel rewrite and discard (lockless)

2021-04-26 Thread Vladimir Sementsov-Ogievskiy


22.04.2021 19:30, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

It's an alternative lock-less solution to
   [PATCH v4 0/3] qcow2: fix parallel rewrite and discard (rw-lock)

In v6 a lot of things are rewritten.

What is changed:

1. rename the feature to host_range_refcnt, move it to separate file
2. better naming for everything (I hope)
3. cover reads, not only writes
4. do "ref" in qcow2_get_host_offset(), qcow2_alloc_host_offset(),
 qcow2_alloc_compressed_cluster_offset().
and callers do "unref" appropriately.




About performance. With these series we do extra allocations and hash-map 
operations.. Still testing by

./build/qemu-img bench -c 100 -s 4K --image-opts driver=null-co,size=5G

and

./build/qemu-img bench -c 100 -s 4K -w --image-opts driver=null-co,size=5G

I see difference less than 1%.


--
Best regards,
Vladimir

Re: [PATCH] make vfio and DAX cache work together

2021-04-26 Thread Dr. David Alan Gilbert

* Edge NFV (edge...@gmail.com) wrote:
>  Signed-off-by: Edge NFV 

Hi,
  I take it that 'Edge NFV' isn't your real name; apologies if it is.
It's unusual not to use a real name; I would be interested to know
why you feel uncomfortable not doing.

> ---
>  hw/vfio/common.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index ae5654fcdb..83e15bf7a3 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -668,6 +668,15 @@ static void vfio_listener_region_add(MemoryListener
> *listener,
>  int128_get64(int128_sub(section->size, int128_one(;
>  return;
>  }
> +
> +/* Do not add virtio fs cache section */
> +if (!strcmp(memory_region_name(section->mr), "virtio-fs-cache")) {

So first, this is a patch that fixes something that isn't yet in qemu;
the DAX mode of virtiofs.
Secondly, hard coding the name like this is probably the wrong thing to
do; we need a way for the cache to declare it wants to be omitted.
Thirdly, shouldn't this actually be a change to
vfio_listener_skip_section to add this test?

Dave

> +trace_vfio_listener_region_add_skip(
> +section->offset_within_address_space,
> +section->offset_within_address_space +
> +int128_get64(int128_sub(section->size, int128_one(;
> +return;
> +}
> 
>  if (unlikely((section->offset_within_address_space &
>~qemu_real_host_page_mask) !=
> -- 
> 2.25.1
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [PATCH 03/11] block/block-gen.h: bind monitor

2021-04-26 Thread Markus Armbruster

Vladimir Sementsov-Ogievskiy  writes:

> 24.04.2021 08:23, Markus Armbruster wrote:
>> Vladimir Sementsov-Ogievskiy  writes:
>> 
>>> If we have current monitor, let's bind it to wrapper coroutine too.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> ---
>>>   block/block-gen.h | 10 ++
>>>   1 file changed, 10 insertions(+)
>>>
>>> diff --git a/block/block-gen.h b/block/block-gen.h
>>> index c1fd3f40de..61f055a8cc 100644
>>> --- a/block/block-gen.h
>>> +++ b/block/block-gen.h
>>> @@ -27,6 +27,7 @@
>>>   #define BLOCK_BLOCK_GEN_H
>>>   
>>>   #include "block/block_int.h"
>>> +#include "monitor/monitor.h"
>>>   
>>>   /* Base structure for argument packing structures */
>>>   typedef struct AioPollCo {
>>> @@ -38,11 +39,20 @@ typedef struct AioPollCo {
>>>   
>>>   static inline int aio_poll_co(AioPollCo *s)
>>>   {
>>> +Monitor *mon = monitor_cur();
>> 
>> This gets the currently executing coroutine's monitor from the hash
>> table.
>> 
>>>   assert(!qemu_in_coroutine());
>>>   
>>> +if (mon) {
>>> +monitor_set_cur(s->co, mon);
>> 
>> This writes it back.  No-op, since the coroutine hasn't changed.  Why?
>
> No. s->co != qemu_corotuine_current(), so it's not a write back, but creating 
> a new entry in the hash map. s->co is a new coroutine which we are going to 
> start.

Ah, that's what I missed.  Thanks!

[...]

Re: [PATCH v2 5/5] sockets: Support multipath TCP

2021-04-26 Thread Dr. David Alan Gilbert

* Daniel P. Berrangé (berra...@redhat.com) wrote:
> On Wed, Apr 21, 2021 at 12:28:34PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" 
> > 
> > Multipath TCP allows combining multiple interfaces/routes into a single
> > socket, with very little work for the user/admin.
> > 
> > It's enabled by 'mptcp' on most socket addresses:
> > 
> >./qemu-system-x86_64 -nographic -incoming tcp:0:,mptcp
> > 
> > Signed-off-by: Dr. David Alan Gilbert 
> > ---
> >  io/dns-resolver.c   |  4 
> >  qapi/sockets.json   |  5 -
> >  util/qemu-sockets.c | 23 +++
> >  3 files changed, 31 insertions(+), 1 deletion(-)
> 
> Reviewed-by: Daniel P. Berrangé 

Thanks, given this is more socketary than migration code, do you want to
take this series via your tree?

Dave

> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [RFC v9 15/29] vfio: Set up nested stage mappings

2021-04-26 Thread Auger Eric

Hi Kunkun,

On 4/14/21 3:45 AM, Kunkun Jiang wrote:
> On 2021/4/13 20:57, Auger Eric wrote:
>> Hi Kunkun,
>>
>> On 4/13/21 2:10 PM, Kunkun Jiang wrote:
>>> Hi Eric,
>>>
>>> On 2021/4/11 20:08, Eric Auger wrote:
 In nested mode, legacy vfio_iommu_map_notify cannot be used as
 there is no "caching" mode and we do not trap on map.

 On Intel, vfio_iommu_map_notify was used to DMA map the RAM
 through the host single stage.

 With nested mode, we need to setup the stage 2 and the stage 1
 separately. This patch introduces a prereg_listener to setup
 the stage 2 mapping.

 The stage 1 mapping, owned by the guest, is passed to the host
 when the guest invalidates the stage 1 configuration, through
 a dedicated PCIPASIDOps callback. Guest IOTLB invalidations
 are cascaded downto the host through another IOMMU MR UNMAP
 notifier.

 Signed-off-by: Eric Auger 

 ---

 v7 -> v8:
 - properly handle new IOMMUTLBEntry fields and especially
     propagate DOMAIN and PASID based invalidations

 v6 -> v7:
 - remove PASID based invalidation

 v5 -> v6:
 - add error_report_err()
 - remove the abort in case of nested stage case

 v4 -> v5:
 - use VFIO_IOMMU_SET_PASID_TABLE
 - use PCIPASIDOps for config notification

 v3 -> v4:
 - use iommu_inv_pasid_info for ASID invalidation

 v2 -> v3:
 - use VFIO_IOMMU_ATTACH_PASID_TABLE
 - new user API
 - handle leaf

 v1 -> v2:
 - adapt to uapi changes
 - pass the asid
 - pass IOMMU_NOTIFIER_S1_CFG when initializing the config notifier
 ---
    hw/vfio/common.c | 139
 +--
    hw/vfio/pci.c    |  21 +++
    hw/vfio/trace-events |   2 +
    3 files changed, 157 insertions(+), 5 deletions(-)

 diff --git a/hw/vfio/common.c b/hw/vfio/common.c
 index 0cd7ef2139..e369d451e7 100644
 --- a/hw/vfio/common.c
 +++ b/hw/vfio/common.c
 @@ -595,6 +595,73 @@ static bool vfio_get_xlat_addr(IOMMUTLBEntry
 *iotlb, void **vaddr,
    return true;
    }
    +/* Propagate a guest IOTLB invalidation to the host (nested
 mode) */
 +static void vfio_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry
 *iotlb)
 +{
 +    VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
 +    struct vfio_iommu_type1_cache_invalidate ustruct = {};
 +    VFIOContainer *container = giommu->container;
 +    int ret;
 +
 +    assert(iotlb->perm == IOMMU_NONE);
 +
 +    ustruct.argsz = sizeof(ustruct);
 +    ustruct.flags = 0;
 +    ustruct.info.argsz = sizeof(struct iommu_cache_invalidate_info);
 +    ustruct.info.version = IOMMU_CACHE_INVALIDATE_INFO_VERSION_1;
 +    ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
 +
 +    switch (iotlb->granularity) {
 +    case IOMMU_INV_GRAN_DOMAIN:
 +    ustruct.info.granularity = IOMMU_INV_GRANU_DOMAIN;
 +    break;
 +    case IOMMU_INV_GRAN_PASID:
 +    {
 +    struct iommu_inv_pasid_info *pasid_info;
 +    int archid = -1;
 +
 +    pasid_info = &ustruct.info.granu.pasid_info;
 +    ustruct.info.granularity = IOMMU_INV_GRANU_PASID;
 +    if (iotlb->flags & IOMMU_INV_FLAGS_ARCHID) {
 +    pasid_info->flags |= IOMMU_INV_ADDR_FLAGS_ARCHID;
 +    archid = iotlb->arch_id;
 +    }
 +    pasid_info->archid = archid;
 +    trace_vfio_iommu_asid_inv_iotlb(archid);
 +    break;
 +    }
 +    case IOMMU_INV_GRAN_ADDR:
 +    {
 +    hwaddr start = iotlb->iova + giommu->iommu_offset;
 +    struct iommu_inv_addr_info *addr_info;
 +    size_t size = iotlb->addr_mask + 1;
 +    int archid = -1;
 +
 +    addr_info = &ustruct.info.granu.addr_info;
 +    ustruct.info.granularity = IOMMU_INV_GRANU_ADDR;
 +    if (iotlb->leaf) {
 +    addr_info->flags |= IOMMU_INV_ADDR_FLAGS_LEAF;
 +    }
 +    if (iotlb->flags & IOMMU_INV_FLAGS_ARCHID) {
 +    addr_info->flags |= IOMMU_INV_ADDR_FLAGS_ARCHID;
 +    archid = iotlb->arch_id;
 +    }
 +    addr_info->archid = archid;
 +    addr_info->addr = start;
 +    addr_info->granule_size = size;
 +    addr_info->nb_granules = 1;
 +    trace_vfio_iommu_addr_inv_iotlb(archid, start, size,
 +    1, iotlb->leaf);
 +    break;
 +    }
>>> Should we pass a size to  host kernel here, even if vSMMU doesn't
>>> support
>>> RIL or guest kernel doesn't use RIL?
>>>
>>> It will cause TLBI issue in  this scenario: Guest kernel issues a
>>> TLBI cmd
>>> without "range" (tg = 0) to invalidate a 2M huge page. Then qemu passed
>>> the iova and size (4K) to host kernel

Re: [PATCH v2 2/3] docs: Add SEV-ES documentation to amd-memory-encryption.txt

2021-04-26 Thread Laszlo Ersek

On 04/23/21 22:08, Tom Lendacky wrote:
> From: Tom Lendacky 
> 
> Update the amd-memory-encryption.txt file with information about SEV-ES,
> including how to launch an SEV-ES guest and some of the differences
> between SEV and SEV-ES guests in regards to launching and measuring the
> guest.
> 
> Signed-off-by: Tom Lendacky 
> ---
>  docs/amd-memory-encryption.txt | 54 +-
>  1 file changed, 47 insertions(+), 7 deletions(-)
> 
> diff --git a/docs/amd-memory-encryption.txt b/docs/amd-memory-encryption.txt
> index ed85159ea7..ffca382b5f 100644
> --- a/docs/amd-memory-encryption.txt
> +++ b/docs/amd-memory-encryption.txt
> @@ -15,6 +15,13 @@ includes commands for launching, snapshotting, migrating 
> and debugging the
>  encrypted guest. These SEV commands can be issued via KVM_MEMORY_ENCRYPT_OP
>  ioctls.
>  
> +Secure Encrypted Virtualization - Encrypted State (SEV-ES) builds on the SEV
> +support to additionally protect the guest register state. In order to allow a
> +hypervisor to perform functions on behalf of a guest, there is architectural
> +support for notifying a guest's operating system when certain types of 
> VMEXITs
> +are about to occur. This allows the guest to selectively share information 
> with
> +the hypervisor to satisfy the requested function.
> +
>  Launching
>  -
>  Boot images (such as bios) must be encrypted before a guest can be booted. 
> The
> @@ -24,6 +31,9 @@ together generate a fresh memory encryption key for the VM, 
> encrypt the boot
>  images and provide a measurement than can be used as an attestation of a
>  successful launch.
>  
> +For a SEV-ES guest, the LAUNCH_UPDATE_VMSA command is also used to encrypt 
> the
> +guest register state, or VM save area (VMSA), for all of the guest vCPUs.
> +
>  LAUNCH_START is called first to create a cryptographic launch context within
>  the firmware. To create this context, guest owner must provide a guest 
> policy,
>  its public Diffie-Hellman key (PDH) and session parameters. These inputs
> @@ -40,6 +50,12 @@ The guest policy can be provided via the 'policy' property 
> (see below)
>  # ${QEMU} \
> sev-guest,id=sev0,policy=0x1...\
>  
> +Setting the "SEV-ES required" policy bit (bit 2) will launch the guest as a
> +SEV-ES guest (see below)
> +
> +# ${QEMU} \
> +   sev-guest,id=sev0,policy=0x5...\
> +
>  The guest owner provided DH certificate and session parameters will be used 
> to
>  establish a cryptographic session with the guest owner to negotiate keys used
>  for the attestation.
> @@ -55,13 +71,19 @@ created via the LAUNCH_START command. If required, this 
> command can be called
>  multiple times to encrypt different memory regions. The command also 
> calculates
>  the measurement of the memory contents as it encrypts.
>  
> -LAUNCH_MEASURE can be used to retrieve the measurement of encrypted memory.
> -This measurement is a signature of the memory contents that can be sent to 
> the
> -guest owner as an attestation that the memory was encrypted correctly by the
> -firmware. The guest owner may wait to provide the guest confidential 
> information
> -until it can verify the attestation measurement. Since the guest owner knows 
> the
> -initial contents of the guest at boot, the attestation measurement can be
> -verified by comparing it to what the guest owner expects.
> +LAUNCH_UPDATE_VMSA encrypts all the vCPU VMSAs for a SEV-ES guest using the
> +cryptographic context created via the LAUNCH_START command. The command also
> +calculates the measurement of the VMSAs as it encrypts them.
> +
> +LAUNCH_MEASURE can be used to retrieve the measurement of encrypted memory 
> and,
> +for a SEV-ES guest, encrypted VMSAs. This measurement is a signature of the
> +memory contents and, for a SEV-ES guest, the VMSA contents, that can be sent
> +to the guest owner as an attestation that the memory and VMSAs were encrypted
> +correctly by the firmware. The guest owner may wait to provide the guest
> +confidential information until it can verify the attestation measurement.
> +Since the guest owner knows the initial contents of the guest at boot, the
> +attestation measurement can be verified by comparing it to what the guest 
> owner
> +expects.
>  
>  LAUNCH_FINISH finalizes the guest launch and destroys the cryptographic
>  context.
> @@ -75,6 +97,22 @@ To launch a SEV guest
>  -machine ...,confidential-guest-support=sev0 \
>  -object sev-guest,id=sev0,cbitpos=47,reduced-phys-bits=1
>  
> +To launch a SEV-ES guest
> +
> +# ${QEMU} \
> +-machine ...,confidential-guest-support=sev0 \
> +-object sev-guest,id=sev0,cbitpos=47,reduced-phys-bits=1,policy=0x5
> +
> +An SEV-ES guest has some restrictions as compared to a SEV guest. Because the
> +guest register state is encrypted and cannot be updated by the 
> VMM/hypervisor,
> +a SEV-ES guest:
> + - Does not support SMM - SMM support requires updating the guest register
> +   state.
> + - Does not support reboot - a system reset r

Re: [PATCH v2] pc-bios/s390-ccw: Use reset_psw pointer instead of hard-coded null pointer

2021-04-26 Thread Janosch Frank

On 4/23/21 4:24 PM, Thomas Huth wrote:
> When compiling the s390-ccw bios with clang, it emits a warning like this:
> 
>  pc-bios/s390-ccw/jump2ipl.c:86:9: warning: indirection of non-volatile null
>   pointer will be deleted, not trap [-Wnull-dereference]
>  if (*((uint64_t *)0) & RESET_PSW_MASK) {
>  ^~~~
>  pc-bios/s390-ccw/jump2ipl.c:86:9: note: consider using __builtin_trap() or
>   qualifying pointer with 'volatile'
> 
> We could add a "volatile" here to shut it up, but on the other hand,
> we also have a pointer variable called "reset_psw" in this file already
> that points to the PSW at address 0, so we can simply use that pointer
> variable instead.

LGTM, I'm just wondering why I didn't clean that up when I last changed
the file.

Reviewed-by: Janosch Frank 

> 
> Reviewed-by: Philippe Mathieu-Daudé 
> Signed-off-by: Thomas Huth 
> ---
>  v2: Extend comment as suggested by Cornelia
> 
>  pc-bios/s390-ccw/jump2ipl.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/pc-bios/s390-ccw/jump2ipl.c b/pc-bios/s390-ccw/jump2ipl.c
> index b9c70d64a5..73e4367e09 100644
> --- a/pc-bios/s390-ccw/jump2ipl.c
> +++ b/pc-bios/s390-ccw/jump2ipl.c
> @@ -82,8 +82,8 @@ void jump_to_low_kernel(void)
>  jump_to_IPL_code(KERN_IMAGE_START);
>  }
>  
> -/* Trying to get PSW at zero address */
> -if (*((uint64_t *)0) & RESET_PSW_MASK) {
> +/* Trying to get PSW at zero address (pointed to by reset_psw) */
> +if (*reset_psw & RESET_PSW_MASK) {
>  /*
>   * Surely nobody will try running directly from lowcore, so
>   * let's use 0 as an indication that we want to load the reset
>

[Bug 1323758] Re: Mouse stops working when connected usb-storage-device

2021-04-26 Thread Lucas Kanashiro

@Kendrick could you please tell us which version of Ubuntu and qemu you
are using?

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1323758

Title:
  Mouse stops working when connected usb-storage-device

Status in QEMU:
  Confirmed
Status in qemu package in Ubuntu:
  Confirmed

Bug description:
  I'm running a guest that has Windows 8 Pro (x64) installed. Every time
  I pass through a usb storage device from the host to the guest, the
  mouse stops working in the vnc client. When I remove the usb-device
  the mouse works again.

  The mouse only stops working when I pass through a usb storage device
  and then make the vlc viewer (client) inactive by clicking on another
  program on the local computer (where I'm running the vnc viewer
  (client)). As long as I keep the vnc viewer active, the mouse works
  without any problems. But as soon as I make the vnc viewer inactive
  and then active again, the mouse will no longer work. I have to reboot
  the guest or remove the usb storage device.

  I can't find any related problems on the internet, so it may be just
  me?

  I hope someone can help me with this.

  EDIT: I posted the extra/new information in comments. But as I know
  see it might be wrong and maybe I should've posted them in this bug
  description container (by editing)? Please tell me if I did it wrong
  and I will change it. Sorry for this misunderstanding.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1323758/+subscriptions

Re: [PATCH v2 3/3] docs/interop/firmware.json: Add SEV-ES support

2021-04-26 Thread Laszlo Ersek

On 04/23/21 22:08, Tom Lendacky wrote:
> From: Tom Lendacky 
> 
> Create an enum definition, '@amd-sev-es', for SEV-ES and add documention
> for the new enum. Add an example that shows some of the requirements for
> SEV-ES, including not having SMM support and the requirement for an
> X64-only build.
> 
> Signed-off-by: Tom Lendacky 
> ---
>  docs/interop/firmware.json | 47 +-
>  1 file changed, 46 insertions(+), 1 deletion(-)
> 
> diff --git a/docs/interop/firmware.json b/docs/interop/firmware.json
> index 9d94ccafa9..8d8b0be030 100644
> --- a/docs/interop/firmware.json
> +++ b/docs/interop/firmware.json
> @@ -115,6 +115,12 @@
>  #   this feature are documented in
>  #   "docs/amd-memory-encryption.txt".
>  #
> +# @amd-sev-es: The firmware supports running under AMD Secure Encrypted
> +#  Virtualization - Encrypted State, as specified in the AMD64
> +#  Architecture Programmer's Manual. QEMU command line options
> +#  related to this feature are documented in
> +#  "docs/amd-memory-encryption.txt".
> +#
>  # @enrolled-keys: The variable store (NVRAM) template associated with
>  # the firmware binary has the UEFI Secure Boot
>  # operational mode turned on, with certificates
> @@ -179,7 +185,7 @@
>  # Since: 3.0
>  ##
>  { 'enum' : 'FirmwareFeature',
> -  'data' : [ 'acpi-s3', 'acpi-s4', 'amd-sev', 'enrolled-keys',
> +  'data' : [ 'acpi-s3', 'acpi-s4', 'amd-sev', 'amd-sev-es', 'enrolled-keys',
>   'requires-smm', 'secure-boot', 'verbose-dynamic',
>   'verbose-static' ] }
>  
> @@ -504,6 +510,45 @@
>  # }
>  #
>  # {
> +# "description": "OVMF with SEV-ES support",
> +# "interface-types": [
> +# "uefi"
> +# ],
> +# "mapping": {
> +# "device": "flash",
> +# "executable": {
> +# "filename": "/usr/share/OVMF/OVMF_CODE.fd",
> +# "format": "raw"
> +# },
> +# "nvram-template": {
> +# "filename": "/usr/share/OVMF/OVMF_VARS.fd",
> +# "format": "raw"
> +# }
> +# },
> +# "targets": [
> +# {
> +# "architecture": "x86_64",
> +# "machines": [
> +# "pc-q35-*"
> +# ]
> +# }
> +# ],
> +# "features": [
> +# "acpi-s3",
> +# "amd-sev",
> +# "amd-sev-es",
> +# "verbose-dynamic"
> +# ],
> +# "tags": [
> +# "-a X64",
> +# "-p OvmfPkg/OvmfPkgX64.dsc",
> +# "-t GCC48",
> +# "-b DEBUG",
> +# "-D FD_SIZE_4MB"
> +# ]
> +# }
> +#
> +# {
>  # "description": "UEFI firmware for ARM64 virtual machines",
>  # "interface-types": [
>  # "uefi"
> 

Reviewed-by: Laszlo Ersek 

Thanks!
Laszlo

[Bug 1900122] Re: Unsupported ioctl: cmd=0xffffffff80685600 when accessing /dev/video* in aarch64 guest

2021-04-26 Thread Laurent Vivier

** Tags added: linux-user

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1900122

Title:
  Unsupported ioctl: cmd=0x80685600 when accessing /dev/video*
  in aarch64 guest

Status in QEMU:
  New

Bug description:
  **Description:**
  Any attempt to work with video in aarch64 architecture emulated on x86_64 
leads currently to the error "Function not implemented". For example:

  ```
  # v4l2-ctl -l --verbose
  Failed to open /dev/video0: Function not implemented

  root@12dd9b6fcfcb:/# ll /dev/video*
  crw-rw 1 root video 81, 0 Oct 16 09:23 /dev/video0
  crw-rw 1 root video 81, 1 Oct 16 09:23 /dev/video1

  ```

  **Steps to reproduce the issue:**

  I have a following setup:

  Host Hardware: x86_64 equipped with a webcam (tried different webcams)
  Host OS: Ubuntu 20.04.1

  Guest Architecture: aarch64
  Guest OS: Ubuntu 20.04 (also tried 16.x and 18.x)

  Emulation: quemu-user-static (also tried binfmt)

  Guest OS is running via Docker + QEMU

  ```
  ➜ cat /proc/sys/fs/binfmt_misc/qemu-aarch64
  enabled
  interpreter /usr/bin/qemu-aarch64-static
  flags: F
  offset 0
  magic 7f454c46020101000200b700
  mask ff00feff
  ```

  **Results received:**
  see desrciption.

  
  **Environment:**

  * QEMU version: (if you can know it):

  ipxe-qemu-256k-compat-efi-roms/focal,now 1.0.0+git-20150424.a25a16d-0ubuntu4 
all [installed,automatic]
  ipxe-qemu/focal-updates,now 1.0.0+git-20190109.133f4c4-0ubuntu3.2 all 
[installed,automatic]
  qemu-block-extra/focal-updates,now 1:4.2-3ubuntu6.7 amd64 
[installed,automatic]
  qemu-kvm/focal-updates,now 1:4.2-3ubuntu6.7 amd64 [installed]
  qemu-system-common/focal-updates,now 1:4.2-3ubuntu6.7 amd64 
[installed,automatic]
  qemu-system-data/focal-updates,now 1:4.2-3ubuntu6.7 all [installed,automatic]
  qemu-system-gui/focal-updates,now 1:4.2-3ubuntu6.7 amd64 [installed,automatic]
  qemu-system-x86/focal-updates,now 1:4.2-3ubuntu6.7 amd64 [installed,automatic]
  qemu-user-binfmt/focal-updates,now 1:4.2-3ubuntu6.7 amd64 
[installed,automatic]
  qemu-user/focal-updates,now 1:4.2-3ubuntu6.7 amd64 [installed]
  qemu-utils/focal-updates,now 1:4.2-3ubuntu6.7 amd64 [installed,automatic]
  qemu/focal-updates,now 1:4.2-3ubuntu6.7 amd64 [installed]

  * Container application: Docker

  **Output of `docker version`, `podman version` or `singularity
  version`**

  ```
  ➜ docker version
  Client: Docker Engine - Community
   Version:   20.10.0-beta1
   API version:   1.40
   Go version:go1.13.15
   Git commit:ac365d7
   Built: Tue Oct 13 18:15:22 2020
   OS/Arch:   linux/amd64
   Context:   default
   Experimental:  true

  Server: Docker Engine - Community
   Engine:
    Version:  19.03.13
    API version:  1.40 (minimum version 1.12)
    Go version:   go1.13.15
    Git commit:   4484c46d9d
    Built:Wed Sep 16 17:01:20 2020
    OS/Arch:  linux/amd64
    Experimental: false
   containerd:
    Version:  1.4.1
    GitCommit:c623d1b36f09f8ef6536a057bd658b3aa8632828
   runc:
    Version:  1.0.0-rc92
    GitCommit:ff819c7e9184c13b7c2607fe6c30ae19403a7aff
   docker-init:
    Version:  0.18.0
    GitCommit:fec3683

  ```

  Guest aarch64 runs in privileged mode:

  `docker run --privileged --device=/dev/video0:/dev/video0 --env
  DISPLAY=unix$DISPLAY -v $XAUTH:/root/.Xauthority  -v
  /tmp/.X11-unix:/tmp/.X11-unix -it --rm arm64v8/ubuntu:20.04 bash`

  **Additional information:**
  I tried also binfmt way to register emulators. The output of `v4l-ctl` was a 
little bit different:

  ```
  # v4l2-ctl -l
  Unsupported ioctl: cmd=0x80685600
  Failed to open /dev/video0: Function not implemented

  ```

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1900122/+subscriptions

Re: socket.c added support for unix domain socket datagram transport

2021-04-26 Thread Daniel P . Berrangé

On Fri, Apr 23, 2021 at 06:54:08PM +0200, Stefano Brivio wrote:
> On Fri, 23 Apr 2021 17:21:38 +0100
> Daniel P. Berrangé  wrote:
> > The current IP socket impl for the net socket backend uses SOCK_DGRAM,
> > so from a consistency POV it feels sensible todo the same for UNIX
> > sockets too.
> 
> That's just for UDP though -- it also supports TCP with the "connect="
> parameter, and given that a stream-oriented AF_UNIX socket behaves very
> similarly, I recycled that parameter and just extended that bit of
> documentation.
> 
> > None the less, your last point in particular about wanting to know
> > about disconnects feels valid, and if its useful to you for UNIX
> > sockets, then it ought to be useful for IP sockets too.
> > 
> > IOW, I wonder if  we should use DGRAM for UNIX sockets too by default
> > to match current behaviour, but then also add a CLI option that allows
> > choice of DGRAM vs STREAM, and wire that up for IP & UNIX sockets.
> 
> The choice would only apply to AF_UNIX (that is, not to TCP and UDP).
> 
> The current situation isn't entirely consistent, because for TCP you
> have "connect=", for UDP it's "udp=" or "mcast=", and I'm extending the
> "connect=" case to support stream-oriented AF_UNIX, which I think is
> consistent.
> 
> However, to have it symmetric for the datagram-oriented case
> (UDP and AF_UNIX), ideally it should be changed to
> "dgram=host:port|path" -- which I guess we can't do.
> 
> I see two alternatives:
> 
> 1.
>   - "connect=" (TCP only)
>   - "unix=path,type=dgram|stream"
>   - "udp=" (UDP only)

This doesn't work when you need the UNIX server to be a
listener socket, as we've no way to express that, without
adding yet another parameter.

> 2.
>   - "connect=" (TCP and AF_UNIX stream)
>   - "unix_dgram="
>   - "udp=" (UDP only)

Also needs

   "listen=" (TCP and AF_UNIX stream)

"udp" has a corresponding optional "localaddr" for the sending
address.

Also overloading "connect" means we have to parse the value
to guess whether its a UNIX path or a IP addr:port pair.

I doubt people will have UNIX paths called "127.0.0.1:"
but if we can avoid such ambiguity by design, it is better.

> The major thing I like of 2. is that we save some code and a further
> option, but other than that I don't have a strong preference.

The pain we're feeling is largely because the design of the net
option syntax is one of the oldest parts of QEMU and has only
been very partially improved upon. It is certainly not using
QAPI best practice, if we look at this:

  { 'struct': 'NetdevSocketOptions',
'data': {
  '*fd':'str',
  '*listen':'str',
  '*connect':   'str',
  '*mcast': 'str',
  '*localaddr': 'str',
  '*udp':   'str' } }

Then some things come to mind

 - We're not provinding a way to say what kind of "fd" is
   passed in - is it a UDP/TCP FD, is it a listener or
   client FD, is it unicast or multicast FD. Though QEMU
   can interogate the socket to discover this I guess.

 - All of the properties there except "fd" are encoding two values
   in a single property - address + port. This is an anti-pattern

 - No support for ipv4=on|off and ipv6=on|off flags to control
   dual-stack usage.

 - Redundancy of separate parameters for "mcast" and "udp" when
   it is distinguishable based on the address given AFAIR.

 - No support for UNIX sockets


The "right" way to fix most of this long term is a radical change
to introduce use of the SocketAddress struct.

I could envisage something like this

  { 'enum': 'NetdevSocketMode',
'data':  ['dgram', 'client', 'server'] }

  { 'struct': 'NetdevSocketOptions',
'data': {
  'addr':  'SocketAddress',
  '*localaddr': 'SocketAddress',
  '*mode':  'NetdevSocketMode' } }


 - A TCP client

  addr.type = inet
  addr.host = 192.168.1.1
  mode = client

 - A TCP server on all interfaces

  addr.type = inet
  addr.host = 
  mode = server

 - A TCP server on a specific interface

  addr.type = inet
  addr.host = 192.168.1.1
  mode = server

 - A TCP server on all interfaces, without IPv4

  addr.type = inet
  addr.host = 
  addr.has_ipv4 = true
  addr.ipv4 = false
  mode = server

 - A UDP unicast transport

  addr.type = inet
  addr.host = 192.168.1.1
  mode = dgram

 - A UDP unicast transport with local addr

  addr.type = inet
  addr.host = 192.168.1.1
  localaddr.type = inet
  localaddr.host = 192.168.1.2
  mode = dgram

 - A UDP multicast transport

 addr.type = inet
 addr.host = 224.0.23.166
 mode = dgram

 - A UNIX stream client

  addr.type = unix
  addr.path = /some/socket
  mode = client

 - A UNIX stream server

  addr.type = unix
  addr.path = /some/socket
  mode = server

 - A UNIX dgram transport

  addr.type = unix
  addr.path = /some/socket
  mode = dgram


Now, of course you're just interested in adding UNIX socket support.

T

Re: [PATCH v2] vfio-ccw: Permit missing IRQs

2021-04-26 Thread Cornelia Huck

On Fri, 23 Apr 2021 12:24:57 -0400
Eric Farman  wrote:

> On Fri, 2021-04-23 at 09:22 -0400, Matthew Rosato wrote:

> > So, this looks OK to me.  
> 
> +1 (Thanks for doing the research, Matt)

+1 to both the analysis and the thanks :)

> 
> > 
> >   
> > > handle any ioctl failure gracefully), but worth a second look. In
> > > fact,
> > > we already unregister the crw irq unconditionally, so we would
> > > likely
> > > already have seen any problems for that one, so I assume all is
> > > good.
> > >   
> > 
> > But +1 on driving the path and making sure it works anyway (do a 
> > double-unregister?)  
> 
> Yeah, works fine. Tried skipping the register of the CRW and double-
> unregistering the IO IRQ.
> 
> I also tried a combination where I unconditionally unregister the REQ
> IRQ, which obviously throws a message when it doesn't exist on the
> host.

Good, thanks for double-checking.

> 
> That might be nice to clean up so that adding new IRQs in the future is
> more intuitive; I'll add it to the list unless you want me to address
> it in a v2 of this. (Previously, the addition of the REQ IRQ needed to
> add the cleanup of the CRW IRQ. So the next IRQ would need to cleanup
> the REQ IRQ.)

Yeah, let's just do it on top.

Re: [PATCH v2] vfio-ccw: Permit missing IRQs

2021-04-26 Thread Cornelia Huck

On Wed, 21 Apr 2021 17:20:53 +0200
Eric Farman  wrote:

> Commit 690e29b91102 ("vfio-ccw: Refactor ccw irq handler") changed
> one of the checks for the IRQ notifier registration from saying
> "the host needs to recognize the only IRQ that exists" to saying
> "the host needs to recognize ANY IRQ that exists."
> 
> And this worked fine, because the subsequent change to support the
> CRW IRQ notifier doesn't get into this code when running on an older
> kernel, thanks to a guard by a capability region. The later addition
> of the REQ(uest) IRQ by commit b2f96f9e4f5f ("vfio-ccw: Connect the
> device request notifier") broke this assumption because there is no
> matching capability region. Thus, running new QEMU on an older
> kernel fails with:
> 
>   vfio: unexpected number of irqs 2
> 
> Let's adapt the message here so that there's a better clue of what
> IRQ is missing.
> 
> Furthermore, let's make the REQ(uest) IRQ not fail when attempting
> to register it, to permit running vfio-ccw on a newer QEMU with an
> older kernel.
> 
> Fixes: b2f96f9e4f5f ("vfio-ccw: Connect the device request notifier")
> Signed-off-by: Eric Farman 
> ---
> 
> Notes:
> v1->v2:
>  - Keep existing "invalid number of IRQs" message with adapted text [CH]
>  - Move the "is this an error" test to the registration of the IRQ in
>question, rather than making it allowable for any IRQ mismatch [CH]
>  - Drop Fixes tag for initial commit [EF]
> 
> v1: 
> https://lore.kernel.org/qemu-devel/20210419184906.2847283-1-far...@linux.ibm.com/
> 
>  hw/vfio/ccw.c | 12 +++-
>  1 file changed, 7 insertions(+), 5 deletions(-)

Thanks, applied.

[PATCH] cutils: Fix memleak in get_relocated_path()

2021-04-26 Thread Zhenzhong Duan

Valgrind complains definitely loss in get_relocated_path(), because
GString is leaked in get_relocated_path() when returning with gchar *.
Use g_string_free(, false) to free GString while preserving gchar *.

Signed-off-by: Zhenzhong Duan 
---
 util/cutils.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/util/cutils.c b/util/cutils.c
index ee908486da..f58c2157d2 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -1055,5 +1055,5 @@ char *get_relocated_path(const char *dir)
 assert(G_IS_DIR_SEPARATOR(dir[-1]));
 g_string_append(result, dir - 1);
 }
-return result->str;
+return g_string_free(result, FALSE);
 }
-- 
2.25.1

[PATCH v2] hw/i386: Expand the range of CPU topologies between smp and maxcpus

2021-04-26 Thread caodon...@kingsoft.com

Change the criteria for the initial CPU topology and maxcpus, user can
have more settings

Signed-off-by: Dongli Cao 
---
hw/i386/pc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 8a84b25..ef2e819 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -751,7 +751,7 @@ void pc_smp_parse(MachineState *ms, QemuOpts *opts)
 exit(1);
 }

-if (sockets * dies * cores * threads != ms->smp.max_cpus) {
+if (sockets * dies * cores * threads > ms->smp.max_cpus) {
 error_report("Invalid CPU topology deprecated: "
  "sockets (%u) * dies (%u) * cores (%u) * threads (%u) 
"
  "!= maxcpus (%u)",
--
1.8.3.1









caodon...@kingsoft.com

Re: [PATCH for-6.0 v2 1/2] hw/block/nvme: fix invalid msix exclusive uninit

2021-04-26 Thread Michael S. Tsirkin

On Mon, Apr 26, 2021 at 11:27:04AM +0200, Philippe Mathieu-Daudé wrote:
> On 4/26/21 6:40 AM, Klaus Jensen wrote:
> > On Apr 23 07:21, Klaus Jensen wrote:
> >> From: Klaus Jensen 
> >>
> >> Commit 1901b4967c3f changed the nvme device from using a bar exclusive
> >> for MSI-x to sharing it on bar0.
> >>
> >> Unfortunately, the msix_uninit_exclusive_bar() call remains in
> >> nvme_exit() which causes havoc when the device is removed with, say,
> >> device_del. Fix this.
> >>
> >> Additionally, a subregion is added but it is not removed on exit which
> >> causes a reference to linger and the drive to never be unlocked.
> >>
> >> Fixes: 1901b4967c3f ("hw/block/nvme: move msix table and pba to BAR 0")
> >> Signed-off-by: Klaus Jensen 

Reviewed-by: Michael S. Tsirkin 

> >> ---
> >> hw/block/nvme.c | 3 ++-
> >> 1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> >> index 624a1431d072..5fe082ec34c5 100644
> >> --- a/hw/block/nvme.c
> >> +++ b/hw/block/nvme.c
> >> @@ -6235,7 +6235,8 @@ static void nvme_exit(PCIDevice *pci_dev)
> >>     if (n->pmr.dev) {
> >>     host_memory_backend_set_mapped(n->pmr.dev, false);
> >>     }
> >> -    msix_uninit_exclusive_bar(pci_dev);
> >> +    msix_uninit(pci_dev, &n->bar0, &n->bar0);
> >> +    memory_region_del_subregion(&n->bar0, &n->iomem);
> >> }
> >>
> >> static Property nvme_props[] = {
> >> -- 
> >> 2.31.1
> >>
> > 
> > Ping for a review on this please :)
> 
> You forgot to Cc the maintainers :/ (doing it now).
> 
> $ ./scripts/get_maintainer.pl -f include/hw/pci/msix.h
> "Michael S. Tsirkin"  (supporter:PCI)
> Marcel Apfelbaum  (supporter:PCI)

[PATCH] make vfio and DAX cache work together

2021-04-26 Thread Edge NFV



I am using Kata containers and create containers inside the virtual machine. 
The hypervisor used is QEMU.
My workload is I/O and network intensive. So, I prefer SR-IOV VF (which 
provides better network isolation) and DAX caching to work together.
However, I was unable to create a QEMU based virual machine when the above 
mentioned featires are enabled together.

Currently DAX cache need to be set to 0, to boot the VM successfully with the 
SR-IOV VF to be attached.
Enabling both SR-IOV VF and DAX work together will potentially improve 
performance for workloads which are I/O and network intensive.
This patch enables SR-IOV VF and DAX to work together.

[PATCH] make vfio and DAX cache work together by skipping virtio fs cache section during vfio memory region add

2021-04-26 Thread Edge NFV

Signed-off-by: Edge NFV 
---
 hw/vfio/common.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index ae5654fcdb..83e15bf7a3 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -668,6 +668,15 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 int128_get64(int128_sub(section->size, int128_one(;
 return;
 }
+
+/* Do not add virtio fs cache section */  
+if (!strcmp(memory_region_name(section->mr), "virtio-fs-cache")) {
+trace_vfio_listener_region_add_skip(
+section->offset_within_address_space,
+section->offset_within_address_space +
+int128_get64(int128_sub(section->size, int128_one(;
+return;
+}  
 
 if (unlikely((section->offset_within_address_space &
   ~qemu_real_host_page_mask) !=
-- 
2.25.1

Re: [PATCH] make vfio and DAX cache work together by skipping virtio fs cache section during vfio memory region add

2021-04-26 Thread no-reply

Patchew URL: https://patchew.org/QEMU/20210426054513.132980-2-edge...@gmail.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20210426054513.132980-2-edge...@gmail.com
Subject: [PATCH] make vfio and DAX cache work together by skipping virtio fs 
cache section during vfio memory region add

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] patchew/20210426054513.132980-2-edge...@gmail.com -> 
patchew/20210426054513.132980-2-edge...@gmail.com
Switched to a new branch 'test'
32bddb3 make vfio and DAX cache work together by skipping virtio fs cache 
section during vfio memory region add

=== OUTPUT BEGIN ===
ERROR: trailing whitespace
#21: FILE: hw/vfio/common.c:671:
+$

ERROR: trailing whitespace
#22: FILE: hw/vfio/common.c:672:
+/* Do not add virtio fs cache section */  $

ERROR: trailing whitespace
#29: FILE: hw/vfio/common.c:679:
+}  $

total: 3 errors, 0 warnings, 15 lines checked

Commit 32bddb3c2253 (make vfio and DAX cache work together by skipping virtio 
fs cache section during vfio memory region add) has style problems, please 
review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20210426054513.132980-2-edge...@gmail.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH v2] hw/i386: Expand the range of CPU topologies between smp and maxcpus

2021-04-26 Thread Daniel P . Berrangé

On Mon, Apr 26, 2021 at 10:08:52AM +0800, caodon...@kingsoft.com wrote:
> Change the criteria for the initial CPU topology and maxcpus, user can
> have more settings

Can you provide a better explanation of why this is needed. What
valid usage scenario is blocked by the current check ?

AFAICT, it partially reverts an intentional change done in several
years ago in :


  commit bc1fb850a31468ac4976f3895f01a6d981e06d0a
  Author: Igor Mammedov 
  Date:   Thu Sep 13 13:06:01 2018 +0200

vl.c deprecate incorrect CPUs topology

-smp [cpus],sockets/cores/threads[,maxcpus] should describe topology
so that total number of logical CPUs [sockets * cores * threads]
would be equal to [maxcpus], however historically we didn't have
such check in QEMU and it is possible to start VM with an invalid
topology.
Deprecate invalid options combination so we can make sure that
the topology VM started with is always correct in the future.
Users with an invalid sockets/cores/threads/maxcpus values should
fix their CLI to make sure that
   [sockets * cores * threads] == [maxcpus]



> 
> Signed-off-by: Dongli Cao 
> ---
> hw/i386/pc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 8a84b25..ef2e819 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -751,7 +751,7 @@ void pc_smp_parse(MachineState *ms, QemuOpts *opts)
>  exit(1);
>  }
> 
> -if (sockets * dies * cores * threads != ms->smp.max_cpus) {
> +if (sockets * dies * cores * threads > ms->smp.max_cpus) {
>  error_report("Invalid CPU topology deprecated: "
>   "sockets (%u) * dies (%u) * cores (%u) * threads 
> (%u) "
>   "!= maxcpus (%u)",

This is 

> --
> 1.8.3.1
> 
> 
> 
> 
> 
> 
> 
> 
> 
> caodon...@kingsoft.com
> 
> 

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH v3] multi-process: Initialize variables declared with g_auto*

2021-04-26 Thread Stefan Hajnoczi

On Tue, Apr 06, 2021 at 10:00:03PM +0800, Zenghui Yu wrote:
> [+Stefan]
> 
> On 2021/3/12 19:21, Zenghui Yu wrote:
> > Quote docs/devel/style.rst (section "Automatic memory deallocation"):
> > 
> > * Variables declared with g_auto* MUST always be initialized,
> >otherwise the cleanup function will use uninitialized stack memory
> > 
> > Initialize @name properly to get rid of the compilation error (using
> > gcc-7.3.0 on CentOS):
> > 
> > ../hw/remote/proxy.c: In function 'pci_proxy_dev_realize':
> > /usr/include/glib-2.0/glib/glib-autocleanups.h:28:3: error: 'name' may be 
> > used uninitialized in this function [-Werror=maybe-uninitialized]
> > g_free (*pp);
> > ^~~~
> > ../hw/remote/proxy.c:350:30: note: 'name' was declared here
> >   g_autofree char *name;
> >^~~~
> > 
> > Signed-off-by: Zenghui Yu 
> > Reviewed-by: Jagannathan Raman 
> > Reviewed-by: Philippe Mathieu-Daudé 
> 
> Message-Id: <20210312112143.1369-1-yuzeng...@huawei.com>
> 
> Ping for 6.0, thanks.

I'm sorry I missed this email! QEMU 6.0.0-rc4 has already been tagged
and the final release is tomorrow. Only critical patches are applied at
this stage of the release process.

My understanding is that this patch silences a gcc 7.3.0 warning. The
warning is bogus since both code paths always initialize the variable.
You should still be able to compile successfully using ./configure
--disable-werror.

I guess this issue was hit on CentOS 7 since CentOS 8 ships a newer
version of gcc. Debian stable also ships a newer gcc. That probably
explains why this issue has not been encountered by others. I don't
think the patch is absolutely critical for QEMU 6.0 although I regret
not having merged it earlier in the release process. Sorry again.

I have queued this patch for QEMU 6.1 and CCed it for -stable for
inclusion in QEMU 6.0.1.

If you think this patch is critical for QEMU 6.0, please get in contact
with myself and Peter Maydel ("pm215"), preferrably on #qemu
irc.oftc.net IRC as soon as possible.

Thanks, applied to my block-next tree:
https://gitlab.com/stefanha/qemu/commits/block-next

Stefan

signature.asc
Description: PGP signature

Re: [PATCH] make vfio and DAX cache work together by skipping virtio fs cache section during vfio memory region add

2021-04-26 Thread Laurent Vivier

On 26/04/2021 07:45, Edge NFV wrote:
> Signed-off-by: Edge NFV 

You must use your real name for a patch submission:

https://wiki.qemu.org/Contribute/SubmitAPatch#Patch_emails_must_include_a_Signed-off-by:_line

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297

And CC: maintainer of the file:

$ ./scripts/get_maintainer.pl -f hw/vfio/common.c
Alex Williamson  (supporter:VFIO)
qemu-devel@nongnu.org (open list:All patches CC here)

If you want to submit it to qemu-trivial, don't send a separate mail, but cc: 
also
qemu-trivial.

Thanks,
Laurent

> ---
>  hw/vfio/common.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index ae5654fcdb..83e15bf7a3 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -668,6 +668,15 @@ static void vfio_listener_region_add(MemoryListener 
> *listener,
>  int128_get64(int128_sub(section->size, int128_one(;
>  return;
>  }
> +
> +/* Do not add virtio fs cache section */  
> +if (!strcmp(memory_region_name(section->mr), "virtio-fs-cache")) {
> +trace_vfio_listener_region_add_skip(
> +section->offset_within_address_space,
> +section->offset_within_address_space +
> +int128_get64(int128_sub(section->size, int128_one(;
> +return;
> +}  
>  
>  if (unlikely((section->offset_within_address_space &
>~qemu_real_host_page_mask) !=
>

[Bug 1437970] Re: qemu-system-x86_64 - two mouse pointers & fast scrolling problem

2021-04-26 Thread Thomas Huth

Hi! I'm currently looking through old bug reports. Big sorry, seems like this
completely fell through the cracks... sometimes developers are just too busy
with other stuff or nobody really has a clue how to tackle certain bug 
tickets...
but it would have been good to have at least some reply here - I know this is
quite frustrating for a bug reporter otherwise.

Anyway, the QEMU project is currently considering to move its bug tracking
to another system. For this we need to know which bugs are still valid
and which could be closed already. Thus we are setting older bugs to
"Incomplete" now.

If you want to get this bug report transferred to the new system, then please
switch the state back to "New" within the next 60 days, otherwise this report
will be marked as "Expired". Or please mark it as "Fix Released" if the
problem has been solved with a newer version of QEMU already.

Thank you and sorry for the inconvenience.

** Changed in: qemu
   Status: New => Incomplete

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1437970

Title:
  qemu-system-x86_64 - two mouse pointers & fast scrolling problem

Status in QEMU:
  Incomplete

Bug description:
  Hello,

  System Specs

  Host:
  
  Slackware 14.1 x86_64
  Openbox 3.5.2
  tint2 panel (svn version)
  Nvidia GTX660M
  nvidia-driver-346.35
  Screen: 17" @1920x1080@60Hz
  Logitech USB mouse

  Guest
  -
  Slackware 14.1 x86_64
  XFce 4.10
  Screen 17" @1920x1080
  xf86-video-vmware 13.0.1

  *** I am not using an xorg.conf file in the guest ***

  QEMU 2.2.1

  I start Slackware in QEMU using 'Zoom To Fit' when it first boots up
  and I log into X, at this point I notice the mouse shows with 2
  pointers and when I move the mouse around, shows as trails, but it's
  actually a second pointer that appears under the first.

  If I use 'Ctrl Alt F' and go into full screen mode the mouse gets
  corrected and only appears as one pointer and no pointer under the
  second one when moving around. So this mouse problem only appears the
  first time I log into X with 'Zoom To Fit'.

  Also if log in instead as 'Full Screen' I do not see the issue, as
  well as if I log in 'Full Screen' and change back to 'Zoom To Fit' it
  does not happen.

  I also noticed using '-usbdevice tablet', the mouse wheel scrolling in any 
application, the mouse ends up moving me instead to another virtual desktop, 
XFce by default uses 4. Sometimes it moves me to another desktop if I move the 
mouse wheel slowly, and sometimes I need to move
  it quickly, but in Firefox it doesn't take much movement of the mousewheel 
and I get moved to another desktop. Browsing with Firefox
  using '-usbdevice tablet' is not easy.

  Command line options:
  
  qemu-system-x86_64 -rtc base=localtime Slackware\ 14.1\ x64.img -m 4096 
--enable-kvm -smp 2 -vga vmware -usbdevice tablet

  I wanted to use -usbdevice tablet to have seamless mouse movement back
  and forth from Host to Guest without having to Grab...

  If I remove '-usbdevice tablet' and log into X the frst time as 'Zoom
  To Fit' I see two mouse pointers, but as soon as I click the desktop
  and the mouse is grabbed the second one goes away and when I move the
  mouse there is only one pointer.

  Also without the option '-usbdevice tablet' and I move the scroll
  wheel quickly the mouse stays focused on the application and it
  doesn't move the desktop.

  Please see the attached screen shots, qemu_1.jpg shows 2 mouse
  pointers when I first log into X as 'Zoom To Fit', and qemu_2.jpg
  shows when I'm staying in 'Zoom To Fit' mode and moving the mouse
  around, with a pointer under the pointer.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1437970/+subscriptions

Re: X on old (non-x86) Linux guests

2021-04-26 Thread Thomas Huth


On 26/04/2021 12.01, Dr. David Alan Gilbert wrote:

Hi,
   Over the weekend I got a Red Hat 6.x (not RHEL!) for Alpha booting
under QEMU which was pretty neat.  But I failed to find a succesful
combination to get X working; has anyone any suggestions?

   That distro was from around 2000; the challenge is since we don't have
VESA on non-x86, we can't change mode that way, so generic XF86_SVGA
doesn't want to play with any of the devices.

   I also tried the ati device, but the accelerated mach64 driver
didn't recognise that ID.

   Has anyone found any combo that works?


Not sure if it is of any help, but IIRC the advent calendar image #4 from 
the 2014 edition also uses an ancient Red Hat image with X, and it still 
seems to be working with recent versions of QEMU:


 https://www.qemu-advent-calendar.org/2014/#day-4

Maybe you could copy the config from that image?

 Thomas

Re: X on old (non-x86) Linux guests

2021-04-26 Thread Laurent Vivier

On 26/04/2021 12:01, Dr. David Alan Gilbert wrote:
> Hi,
>   Over the weekend I got a Red Hat 6.x (not RHEL!) for Alpha booting
> under QEMU which was pretty neat.  But I failed to find a succesful
> combination to get X working; has anyone any suggestions?
> 
>   That distro was from around 2000; the challenge is since we don't have
> VESA on non-x86, we can't change mode that way, so generic XF86_SVGA
> doesn't want to play with any of the devices.
> 
>   I also tried the ati device, but the accelerated mach64 driver
> didn't recognise that ID.
> 
>   Has anyone found any combo that works?
> I suspect using one of the existing devices, lying about PCI ID, and
> then turning off all accelerations might have a chance but I've not got
> that far.
> 
> [Alpha took a bit of a fight; none of the SCSI controllers were
> happy, but the CMD646 worked well enough to install off a CD image
> from a -kernel passed in ]
> 

Did you try to use kernel framebuffer with X fbdev driver?

Thanks,
Laurent

Re: [RFC PATCH] scripts/tracetool: don't barf validating TCG types

2021-04-26 Thread Stefan Hajnoczi

On Tue, Apr 06, 2021 at 05:53:07PM +0100, Alex Bennée wrote:
> TCG types will be transformed into the appropriate host types later on
> in the tool. Try and work around this by detecting those cases and
> pressing on.
> 
> [AJB: this seems a bit too hacky - but the problem is validate_type is
> buried a few layers deep. Maybe we should just drop TCGv from
> ALLOWED_TYPES and manually do if bit.startswith("TCGv_") in validate_type?]

Please include a line from a trace-events file that triggers the issue.
It's unclear to me what the problem is although I guess it's related to
TCGv_* types being rejected by validate_type. The
bit.startswith("TCGv_") change you mentioned sounds fine or a slightly
more general ALLOWED_TYPE_PREFIXES = ['TCGv_'].

> Fixes: 73ff061032 ("trace: only permit standard C types and fixed size 
> integer types")
> Signed-off-by: Alex Bennée 
> Cc: Matheus Ferst 
> ---
>  scripts/tracetool/__init__.py | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/scripts/tracetool/__init__.py b/scripts/tracetool/__init__.py
> index 5bc94d95cf..ea078e34b4 100644
> --- a/scripts/tracetool/__init__.py
> +++ b/scripts/tracetool/__init__.py
> @@ -87,10 +87,11 @@ def out(*lines, **kwargs):
>  "ssize_t",
>  "uintptr_t",
>  "ptrdiff_t",
> -# Magic substitution is done by tracetool
> +# Magic substitution is done by tracetool TCG_2_HOST

This makes it clearer that "TCG_2_HOST" is a code reference:

  # Magic substitution is done using tracetool.transform.TCG_2_HOST


signature.asc
Description: PGP signature

Re: [PATCH] cutils: Fix memleak in get_relocated_path()

2021-04-26 Thread Philippe Mathieu-Daudé

Hi,

On 4/27/21 12:30 AM, Zhenzhong Duan wrote:
> Valgrind complains definitely loss in get_relocated_path(), because
> GString is leaked in get_relocated_path() when returning with gchar *.
> Use g_string_free(, false) to free GString while preserving gchar *.
> 
> Signed-off-by: Zhenzhong Duan 
> ---
>  util/cutils.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/util/cutils.c b/util/cutils.c
> index ee908486da..f58c2157d2 100644
> --- a/util/cutils.c
> +++ b/util/cutils.c
> @@ -1055,5 +1055,5 @@ char *get_relocated_path(const char *dir)
>  assert(G_IS_DIR_SEPARATOR(dir[-1]));
>  g_string_append(result, dir - 1);
>  }
> -return result->str;
> +return g_string_free(result, FALSE);
>  }
> 

Thanks for your patch, but Stefano sent the same fix 2 weeks ago:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg798279.html

It should be merged once the development tree opens again (we are
now 'freezed' before the v6.0.0 release).

You might want to send your Review-by or Tested-by tag to Stefano's
patch.

Regards,

Phil.

Re: X on old (non-x86) Linux guests

2021-04-26 Thread Dr. David Alan Gilbert

* Thomas Huth (th.h...@posteo.de) wrote:
> On 26/04/2021 12.01, Dr. David Alan Gilbert wrote:
> > Hi,
> >Over the weekend I got a Red Hat 6.x (not RHEL!) for Alpha booting
> > under QEMU which was pretty neat.  But I failed to find a succesful
> > combination to get X working; has anyone any suggestions?
> > 
> >That distro was from around 2000; the challenge is since we don't have
> > VESA on non-x86, we can't change mode that way, so generic XF86_SVGA
> > doesn't want to play with any of the devices.
> > 
> >I also tried the ati device, but the accelerated mach64 driver
> > didn't recognise that ID.
> > 
> >Has anyone found any combo that works?
> 
> Not sure if it is of any help, but IIRC the advent calendar image #4 from
> the 2014 edition also uses an ancient Red Hat image with X, and it still
> seems to be working with recent versions of QEMU:
> 
>  https://www.qemu-advent-calendar.org/2014/#day-4
> 
> Maybe you could copy the config from that image?

That's using the VESA driver, which we generally don't have on non-x86.

Dave

>  Thomas
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: X on old (non-x86) Linux guests

2021-04-26 Thread Dr. David Alan Gilbert

* Laurent Vivier (lviv...@redhat.com) wrote:
> On 26/04/2021 12:01, Dr. David Alan Gilbert wrote:
> > Hi,
> >   Over the weekend I got a Red Hat 6.x (not RHEL!) for Alpha booting
> > under QEMU which was pretty neat.  But I failed to find a succesful
> > combination to get X working; has anyone any suggestions?
> > 
> >   That distro was from around 2000; the challenge is since we don't have
> > VESA on non-x86, we can't change mode that way, so generic XF86_SVGA
> > doesn't want to play with any of the devices.
> > 
> >   I also tried the ati device, but the accelerated mach64 driver
> > didn't recognise that ID.
> > 
> >   Has anyone found any combo that works?
> > I suspect using one of the existing devices, lying about PCI ID, and
> > then turning off all accelerations might have a chance but I've not got
> > that far.
> > 
> > [Alpha took a bit of a fight; none of the SCSI controllers were
> > happy, but the CMD646 worked well enough to install off a CD image
> > from a -kernel passed in ]
> > 
> 
> Did you try to use kernel framebuffer with X fbdev driver?

I hadn't, but how do I get it into fb mode - vga=ask doesn't work, so
again I don't think I can get into graphics.

Dave

> Thanks,
> Laurent
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [PATCH v4] net/macos: implement vmnet-based netdev

2021-04-26 Thread Alessio Dionisi

Hello,

I'm replying to this thread to let you know that I am available to help out on 
this patch as I have been working on the same feature for a few weeks.

AD

> On 3 Mar 2021, at 12:24, Phillip Tennen  > wrote:
> 
> Thanks very much for your help and feedback!
> 
> Apologies for my delay in following up. I'll submit a new version that 
> implements the feedback you've provided here, as well as the QAPI schema 
> changes @Markus Armbruster  (thanks to you as well 
> for your time and review!) pointed out.
> 
> Phillip
> 
> On Wed, Feb 24, 2021 at 12:25 AM Roman Bolshakov  > wrote:
> On Thu, Feb 18, 2021 at 02:49:47PM +0100, phillip.en...@gmail.com 
>  wrote:
> > From: Phillip Tennen mailto:phil...@axleos.com>>
> > 
> > This patch implements a new netdev device, reachable via -netdev
> > vmnet-macos, that’s backed by macOS’s vmnet framework.
> > 
> > The vmnet framework provides native bridging support, and its usage in
> > this patch is intended as a replacement for attempts to use a tap device
> > via the tuntaposx kernel extension. Notably, the tap/tuntaposx approach
> > never would have worked in the first place, as QEMU interacts with the
> > tap device via poll(), and macOS does not support polling device files.
> > 
> > vmnet requires either a special entitlement, granted via a provisioning
> > profile, or root access. Otherwise attempts to create the virtual
> > interface will fail with a “generic error” status code. QEMU may not
> > currently be signed with an entitlement granted in a provisioning
> > profile, as this would necessitate pre-signed binary build distribution,
> > rather than source-code distribution. As such, using this netdev
> > currently requires that qemu be run with root access. I’ve opened a
> > feedback report with Apple to allow the use of the relevant entitlement
> > with this use case:
> > https://openradar.appspot.com/radar?id=5007417364447232 
> > 
> > 
> > vmnet offers three operating modes, all of which are supported by this
> > patch via the “mode=host|shared|bridge” option:
> > 
> > * "Host" mode: Allows the vmnet interface to communicate with other
> > * vmnet
> > interfaces that are in host mode and also with the native host.
> > * "Shared" mode: Allows traffic originating from the vmnet interface to
> > reach the Internet through a NAT. The vmnet interface can also
> > communicate with the native host.
> > * "Bridged" mode: Bridges the vmnet interface with a physical network
> > interface.
> > 
> > Each of these modes also provide some extra configuration that’s
> > supported by this patch:
> > 
> > * "Bridged" mode: The user may specify the physical interface to bridge
> > with. Defaults to en0.
> > * "Host" mode / "Shared" mode: The user may specify the DHCP range and
> > subnet. Allocated by vmnet if not provided.
> > 
> > vmnet also offers some extra configuration options that are not
> > supported by this patch:
> > 
> > * Enable isolation from other VMs using vmnet
> > * Port forwarding rules
> > * Enabling TCP segmentation offload
> > * Only applicable in "shared" mode: specifying the NAT IPv6 prefix
> > * Only available in "host" mode: specifying the IP address for the VM
> > within an isolated network
> > 
> > Note that this patch requires macOS 10.15 as a minimum, as this is when
> > bridging support was implemented in vmnet.framework.
> > 
> > Signed-off-by: Phillip Tennen  > >
> > ---
> >  configure |   2 +-
> >  net/clients.h |   6 +
> >  net/meson.build   |   1 +
> >  net/net.c |   3 +
> >  net/vmnet-macos.c | 447 ++
> >  qapi/net.json | 120 -
> >  qemu-options.hx   |   9 +
> >  7 files changed, 585 insertions(+), 3 deletions(-)
> >  create mode 100644 net/vmnet-macos.c
> > 
> 
> Hi Phillip,
> 
> Thanks for working on this!
> 
> Note that the patch doesn't apply to current master and there's a lot of
> warnings wrt trailing whitespaces:
> 
> git am v4-net-macos-implement-vmnet-based-netdev.patch
> Applying: net/macos: implement vmnet-based netdev
> .git/rebase-apply/patch:462: trailing whitespace.
>  * If QEMU is started with -nographic, no Cocoa event loop will be
> .git/rebase-apply/patch:465: trailing whitespace.
> dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_HIGH,
> .git/rebase-apply/patch:466: trailing whitespace.
>  0),
> .git/rebase-apply/patch:532: trailing whitespace.
> # @host: the guest may communicate with the host
> .git/rebase-apply/patch:535: trailing whitespace.
> # @shared: the guest may reach the Internet through a NAT,
> error: patch failed: configure:778
> error: configure: patch does not apply
> Patch failed at 0001 net/macos: implement vmnet-based netdev
> hint: Use

Re: [PULL 46/46] accel: introduce AccelCPUClass extending CPUClass

2021-04-26 Thread Philippe Mathieu-Daudé

Hi Claudio,

+Eduardo/Paolo

On 2/5/21 11:56 PM, Richard Henderson wrote:
> From: Claudio Fontana 
> 
> add a new optional interface to CPUClass, which allows accelerators
> to extend the CPUClass with additional accelerator-specific
> initializations.
> 
> This will allow to separate the target cpu code that is specific
> to each accelerator, and register it automatically with object
> hierarchy lookup depending on accelerator code availability,
> as part of the accel_init_interfaces() initialization step.
> 
> Signed-off-by: Claudio Fontana 
> Message-Id: <20210204163931.7358-19-cfont...@suse.de>
> Signed-off-by: Richard Henderson 
> ---
>  include/hw/core/accel-cpu.h | 38 
>  include/hw/core/cpu.h   |  4 
>  accel/accel-common.c| 44 +
>  MAINTAINERS |  1 +
>  4 files changed, 87 insertions(+)
>  create mode 100644 include/hw/core/accel-cpu.h
> 
> diff --git a/include/hw/core/accel-cpu.h b/include/hw/core/accel-cpu.h
> new file mode 100644
> index 00..24a6697412
> --- /dev/null
> +++ b/include/hw/core/accel-cpu.h
> @@ -0,0 +1,38 @@
> +/*
> + * Accelerator interface, specializes CPUClass
> + * This header is used only by target-specific code.
> + *
> + * Copyright 2021 SUSE LLC
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef ACCEL_CPU_H
> +#define ACCEL_CPU_H
> +
> +/*
> + * This header is used to define new accelerator-specific target-specific
> + * accelerator cpu subclasses.
> + * It uses CPU_RESOLVING_TYPE, so this is clearly target-specific.
> + *
> + * Do not try to use for any other purpose than the implementation of new
> + * subclasses in target/, or the accel implementation itself in accel/
> + */
> +
> +#define TYPE_ACCEL_CPU "accel-" CPU_RESOLVING_TYPE
> +#define ACCEL_CPU_NAME(name) (name "-" TYPE_ACCEL_CPU)
> +typedef struct AccelCPUClass AccelCPUClass;
> +DECLARE_CLASS_CHECKERS(AccelCPUClass, ACCEL_CPU, TYPE_ACCEL_CPU)
> +
> +typedef struct AccelCPUClass {
> +/*< private >*/
> +ObjectClass parent_class;
> +/*< public >*/
> +
> +void (*cpu_class_init)(CPUClass *cc);
> +void (*cpu_instance_init)(CPUState *cpu);
> +void (*cpu_realizefn)(CPUState *cpu, Error **errp);

We could use a const pointer to const AccelCPUOps.

> +} AccelCPUClass;
> +
> +#endif /* ACCEL_CPU_H */
> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> index 4f6c6b18c9..38d813c389 100644
> --- a/include/hw/core/cpu.h
> +++ b/include/hw/core/cpu.h
> @@ -79,6 +79,9 @@ struct TranslationBlock;
>  /* see tcg-cpu-ops.h */
>  struct TCGCPUOps;
>  
> +/* see accel-cpu.h */
> +struct AccelCPUClass;
> +
>  /**
>   * CPUClass:
>   * @class_by_name: Callback to map -cpu command line model name to an
> @@ -187,6 +190,7 @@ struct CPUClass {
>  /* Keep non-pointer data at the end to minimize holes.  */
>  int gdb_num_core_regs;
>  bool gdb_stop_before_watchpoint;
> +struct AccelCPUClass *accel_cpu;

(so here AccelCPUOps)

>  
>  /* when TCG is not available, this pointer is NULL */
>  struct TCGCPUOps *tcg_ops;
> diff --git a/accel/accel-common.c b/accel/accel-common.c
> index 6b59873419..9901b0531c 100644
> --- a/accel/accel-common.c
> +++ b/accel/accel-common.c
> @@ -26,6 +26,9 @@
>  #include "qemu/osdep.h"
>  #include "qemu/accel.h"
>  
> +#include "cpu.h"
> +#include "hw/core/accel-cpu.h"
> +
>  #ifndef CONFIG_USER_ONLY
>  #include "accel-softmmu.h"
>  #endif /* !CONFIG_USER_ONLY */
> @@ -46,16 +49,57 @@ AccelClass *accel_find(const char *opt_name)
>  return ac;
>  }
>  
> +static void accel_init_cpu_int_aux(ObjectClass *klass, void *opaque)
> +{
> +CPUClass *cc = CPU_CLASS(klass);
> +AccelCPUClass *accel_cpu = opaque;
> +
> +cc->accel_cpu = accel_cpu;
> +if (accel_cpu->cpu_class_init) {
> +accel_cpu->cpu_class_init(cc);
> +}
> +}
> +
> +/* initialize the arch-specific accel CpuClass interfaces */
> +static void accel_init_cpu_interfaces(AccelClass *ac)
> +{
> +const char *ac_name; /* AccelClass name */
> +char *acc_name;  /* AccelCPUClass name */
> +ObjectClass *acc;/* AccelCPUClass */
> +
> +ac_name = object_class_get_name(OBJECT_CLASS(ac));
> +g_assert(ac_name != NULL);
> +
> +acc_name = g_strdup_printf("%s-%s", ac_name, CPU_RESOLVING_TYPE);
> +acc = object_class_by_name(acc_name);
> +g_free(acc_name);
> +
> +if (acc) {
> +object_class_foreach(accel_init_cpu_int_aux,
> + CPU_RESOLVING_TYPE, false, acc);
> +}
> +}
> +
>  void accel_init_interfaces(AccelClass *ac)
>  {
>  #ifndef CONFIG_USER_ONLY
>  accel_init_ops_interfaces(ac);
>  #endif /* !CONFIG_USER_ONLY */
> +
> +accel_init_cpu_interfaces(ac);
>  }
>  
> +static const TypeInfo accel_cpu_type = {
> +.name = TYPE_ACCEL_CPU,
> +.parent = TYPE_OBJECT,
> +.abstract = true,

Re: [PATCH 2/3] vhost-blk: Add vhost-blk-common abstraction

2021-04-26 Thread Stefan Hajnoczi

On Thu, Apr 08, 2021 at 06:12:51PM +0800, Xie Yongji wrote:
> diff --git a/hw/block/vhost-blk-common.c b/hw/block/vhost-blk-common.c
> new file mode 100644
> index 00..96500f6c89
> --- /dev/null
> +++ b/hw/block/vhost-blk-common.c
> @@ -0,0 +1,291 @@
> +/*
> + * Parent class for vhost based block devices
> + *
> + * Copyright (C) 2021 Bytedance Inc. and/or its affiliates. All rights 
> reserved.
> + *
> + * Author:
> + *   Xie Yongji 
> + *
> + * Heavily based on the vhost-user-blk.c by:
> + *   Changpeng Liu 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.

The hw/block/vhost-user-blk.c license is the GNU LGPL, version 2 or
later. Why did you change the license and did you get permission from
the copyright holders?


signature.asc
Description: PGP signature

Re: X on old (non-x86) Linux guests

2021-04-26 Thread Laurent Vivier

On 26/04/2021 16:18, Dr. David Alan Gilbert wrote:
> * Laurent Vivier (lviv...@redhat.com) wrote:
>> On 26/04/2021 12:01, Dr. David Alan Gilbert wrote:
>>> Hi,
>>>   Over the weekend I got a Red Hat 6.x (not RHEL!) for Alpha booting
>>> under QEMU which was pretty neat.  But I failed to find a succesful
>>> combination to get X working; has anyone any suggestions?
>>>
>>>   That distro was from around 2000; the challenge is since we don't have
>>> VESA on non-x86, we can't change mode that way, so generic XF86_SVGA
>>> doesn't want to play with any of the devices.
>>>
>>>   I also tried the ati device, but the accelerated mach64 driver
>>> didn't recognise that ID.
>>>
>>>   Has anyone found any combo that works?
>>> I suspect using one of the existing devices, lying about PCI ID, and
>>> then turning off all accelerations might have a chance but I've not got
>>> that far.
>>>
>>> [Alpha took a bit of a fight; none of the SCSI controllers were
>>> happy, but the CMD646 worked well enough to install off a CD image
>>> from a -kernel passed in ]
>>>
>>
>> Did you try to use kernel framebuffer with X fbdev driver?
> 
> I hadn't, but how do I get it into fb mode - vga=ask doesn't work, so
> again I don't think I can get into graphics.

In kernel 2.2, it seems to be "video=[FB]", perhaps video=atyfb ?

thanks,
Laurent

Re: [PATCH] make vfio and DAX cache work together

2021-04-26 Thread Alex Williamson

On Mon, 26 Apr 2021 13:19:05 +0100
"Dr. David Alan Gilbert"  wrote:

> * Edge NFV (edge...@gmail.com) wrote:
> >  Signed-off-by: Edge NFV   
> 
> Hi,
>   I take it that 'Edge NFV' isn't your real name; apologies if it is.
> It's unusual not to use a real name; I would be interested to know
> why you feel uncomfortable not doing.

The documentation noted by Laurent even goes so far as to request a
real name for the Sign-off.  Intentionally masking your identity will
only serve to raise suspicion and increase the chances that the patch
is ignored.  I think perhaps aside from one or two legacy contributors,
all QEMU contributions are signed-off by real names these days.

I also require a commit log describing the change.

> > ---
> >  hw/vfio/common.c | 9 +
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > index ae5654fcdb..83e15bf7a3 100644
> > --- a/hw/vfio/common.c
> > +++ b/hw/vfio/common.c
> > @@ -668,6 +668,15 @@ static void vfio_listener_region_add(MemoryListener
> > *listener,
> >  int128_get64(int128_sub(section->size, int128_one(;
> >  return;
> >  }
> > +
> > +/* Do not add virtio fs cache section */
> > +if (!strcmp(memory_region_name(section->mr), "virtio-fs-cache")) {  
> 
> So first, this is a patch that fixes something that isn't yet in qemu;
> the DAX mode of virtiofs.
> Secondly, hard coding the name like this is probably the wrong thing to
> do; we need a way for the cache to declare it wants to be omitted.
> Thirdly, shouldn't this actually be a change to
> vfio_listener_skip_section to add this test?

Agree on all points, there needs to be justification on why this region
cannot be a DMA target for the device, not simply wishing to skip it to
workaround a boot failure.  Thanks,

Alex

> > +trace_vfio_listener_region_add_skip(
> > +section->offset_within_address_space,
> > +section->offset_within_address_space +
> > +int128_get64(int128_sub(section->size, int128_one(;
> > +return;
> > +}
> > 
> >  if (unlikely((section->offset_within_address_space &
> >~qemu_real_host_page_mask) !=
> > -- 
> > 2.25.1

Re: [PATCH 3/3] vhost-vdpa-blk: Introduce vhost-vdpa-blk host device

2021-04-26 Thread Stefan Hajnoczi

On Thu, Apr 08, 2021 at 06:12:52PM +0800, Xie Yongji wrote:
> +static const int vdpa_feature_bits[] = {
> +VIRTIO_BLK_F_SIZE_MAX,
> +VIRTIO_BLK_F_SEG_MAX,
> +VIRTIO_BLK_F_GEOMETRY,
> +VIRTIO_BLK_F_BLK_SIZE,
> +VIRTIO_BLK_F_TOPOLOGY,
> +VIRTIO_BLK_F_MQ,
> +VIRTIO_BLK_F_RO,
> +VIRTIO_BLK_F_FLUSH,
> +VIRTIO_BLK_F_CONFIG_WCE,
> +VIRTIO_BLK_F_DISCARD,
> +VIRTIO_BLK_F_WRITE_ZEROES,
> +VIRTIO_F_VERSION_1,
> +VIRTIO_RING_F_INDIRECT_DESC,
> +VIRTIO_RING_F_EVENT_IDX,
> +VIRTIO_F_NOTIFY_ON_EMPTY,
> +VHOST_INVALID_FEATURE_BIT
> +};

Please add VIRTIO_F_RING_PACKED so it can be implemented in vDPA in the
future without changes to the QEMU vhost-vdpa-blk.c code.

> +static void vhost_vdpa_blk_device_realize(DeviceState *dev, Error **errp)
> +{
> +VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> +VHostVdpaBlk *s = VHOST_VDPA_BLK(vdev);
> +VHostBlkCommon *vbc = VHOST_BLK_COMMON(s);
> +Error *err = NULL;
> +int ret;
> +
> +s->vdpa.device_fd = qemu_open_old(s->vdpa_dev, O_RDWR);
> +if (s->vdpa.device_fd == -1) {
> +error_setg(errp, "vhost-vdpa-blk: open %s failed: %s",
> +   s->vdpa_dev, strerror(errno));
> +return;
> +}
> +
> +vhost_blk_common_realize(vbc, vhost_vdpa_blk_handle_output, &err);
> +if (err != NULL) {
> +error_propagate(errp, err);
> +goto blk_err;
> +}
> +
> +vbc->vhost_vqs = g_new0(struct vhost_virtqueue, vbc->num_queues);

This is already done by vhost_blk_common_realize(). The old pointer is
overwritten and leaked here.

> +static const VMStateDescription vmstate_vhost_vdpa_blk = {
> +.name = "vhost-vdpa-blk",
> +.minimum_version_id = 1,
> +.version_id = 1,
> +.fields = (VMStateField[]) {
> +VMSTATE_VIRTIO_DEVICE,
> +VMSTATE_END_OF_LIST()
> +},
> +};

vdpa-blk does not support live migration yet. Please remove this.

Does hw/virtio/vhost.c should automatically register a migration
blocker? If not, please register one.

> +#define TYPE_VHOST_VDPA_BLK "vhost-vdpa-blk"

At this stage vdpa-blk is still very new and in development. I suggest
naming it x-vhost-vdpa-blk so that incompatible changes can still be
made to the command-line/APIs. "x-" can be removed later when the
feature has matured.


signature.asc
Description: PGP signature

Re: [PULL 15/24] bsd-user: Fix commentary issues

2021-04-26 Thread Warner Losh

On Mon, Apr 26, 2021 at 3:13 AM Peter Maydell 
wrote:

> On Mon, 26 Apr 2021 at 10:01, Daniel P. Berrangé 
> wrote:
> >
> > On Fri, Apr 23, 2021 at 02:39:50PM -0600, i...@bsdimp.com wrote:
> > > -#define TARGET_FREEBSD_MAP_RESERVED0080 0x0080  /* previously
> misimplemented MAP_INHERIT */
> > > -#define TARGET_FREEBSD_MAP_RESERVED0100 0x0100  /* previously
> unimplemented MAP_NOEXTEND */
> > > -#define TARGET_FREEBSD_MAP_STACK0x0400  /* region grows down,
> like a stack */
> > > -#define TARGET_FREEBSD_MAP_NOSYNC   0x0800  /* page to but do not
> sync underlying file */
> > > +#define TARGET_FREEBSD_MAP_RESERVED0080 0x0080
> > > + /* previously misimplemented
> MAP_INHERIT */
> > > +#define TARGET_FREEBSD_MAP_RESERVED0100 0x0100
> > > + /* previously unimplemented
> MAP_NOEXTEND */
> > > +#define TARGET_FREEBSD_MAP_STACK0x0400
> > > + /* region grows down, like a stack */
> > > +#define TARGET_FREEBSD_MAP_NOSYNC   0x0800
> > > + /* page to but do not sync
> underlying file */
> >
> > I find this indented following comment style more ambiguous as to
> > what constant the comment applies to. IMHO would be clearer as
> >
> >  /* previously misimplemented MAP_INHERIT */
> >  #define TARGET_FREEBSD_MAP_RESERVED0080 0x0080
> >
> >  /* previously unimplemented MAP_NOEXTEND */
> >  #define TARGET_FREEBSD_MAP_RESERVED0100 0x0100
> >
> >  /* region grows down, like a stack */
> >  #define TARGET_FREEBSD_MAP_STACK0x0400
> >
> >  /* page to but do not sync underlying file */
> >  #define TARGET_FREEBSD_MAP_NOSYNC   0x0800
>
> Or alternatively decide that this is one of those cases where the coding
> style's "If wrapping the line at 80 columns is obviously less readable and
> more awkward, prefer not to wrap it" advice applies. The lines as they
> stand
> are only 95 characters or so long.
>

I'm cool either way. Projects differ about how rigid or flexible style can
be.
I noticed this when cleaning up another thing. If we can apply that advice,
I'd prefer that.

Maybe it's better to just drop this entirely. In the final state, this file
is
here, but none of these flags are actually used. Not sure why they aren't,
but there's commentary that's explicit about using the host constants
in many places. Rather than take the time to sort this all out right now
(it is a minor detail I think in the scheme of things compared to all the
other changes coming), I'd propose dropping this hunk entirely, and
revisiting when the merging is done...

Warner

Re: [PATCH 2/5] hw/pcmcia/microdrive: Register machine reset handler

2021-04-26 Thread Philippe Mathieu-Daudé

On 4/25/21 8:36 PM, Peter Maydell wrote:
> On Sat, 24 Apr 2021 at 17:22, Philippe Mathieu-Daudé  wrote:
>>
>> The abstract PCMCIA_CARD is a bus-less TYPE_DEVICE, so devices
>> implementing it are not reset automatically.
>> Register a reset handler so children get reset on machine reset.
>>
>> Note, the DSCM-1 device (TYPE_DSCM1) which inherits
>> TYPE_MICRODRIVE and PCMCIA_CARD reset itself when a disk is
>> attached or detached, but was not resetting itself on machine
>> reset.
>>
>> It doesn't seem to be an issue because it is that way since the
>> device QDev'ifycation 8 years ago, in commit d1f2c96a81a
>> ("pcmcia: QOM'ify PCMCIACardState and MicroDriveState").
>> Still, correct to have a proper API usage.
>>
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>>  hw/pcmcia/pcmcia.c | 25 +
>>  1 file changed, 25 insertions(+)
>>
>> diff --git a/hw/pcmcia/pcmcia.c b/hw/pcmcia/pcmcia.c
>> index 03d13e7d670..73656257227 100644
>> --- a/hw/pcmcia/pcmcia.c
>> +++ b/hw/pcmcia/pcmcia.c
>> @@ -6,14 +6,39 @@
>>
>>  #include "qemu/osdep.h"
>>  #include "qemu/module.h"
>> +#include "sysemu/reset.h"
>>  #include "hw/pcmcia.h"
>>
>> +static void pcmcia_card_reset_handler(void *dev)
>> +{
>> +device_legacy_reset(DEVICE(dev));
>> +}
>> +
>> +static void pcmcia_card_realize(DeviceState *dev, Error **errp)
>> +{
>> +qemu_register_reset(pcmcia_card_reset_handler, dev);
>> +}
>> +
>> +static void pcmcia_card_unrealize(DeviceState *dev)
>> +{
>> +qemu_unregister_reset(pcmcia_card_reset_handler, dev);
>> +}
> 
> Why isn't a pcmcia card something that plugs into a bus ?

No clue, looks like a very old device with unfinished qdev-ification?

See pxa2xx_pcmcia_attach():

/* Insert a new card into a slot */
int pxa2xx_pcmcia_attach(void *opaque, PCMCIACardState *card)
{
PXA2xxPCMCIAState *s = (PXA2xxPCMCIAState *) opaque;
PCMCIACardClass *pcc;

...
s->card = card;
pcc = PCMCIA_CARD_GET_CLASS(s->card);
...
s->card->slot = &s->slot;
pcc->attach(s->card);
...
}

[for-6.1 v2 2/2] virtiofsd: Add support for FUSE_SYNCFS request

2021-04-26 Thread Greg Kurz

Honor the expected behavior of syncfs() to synchronously flush all
data and metadata on linux systems.

Flushing is done with syncfs(). This is suboptimal as it will also
flush writes performed by any other process on the same file system,
and thus add an unbounded time penalty to syncfs(). This may be
optimized in the future, but enforce correctness first.

Signed-off-by: Greg Kurz 
---
 tools/virtiofsd/fuse_lowlevel.c   | 19 ++
 tools/virtiofsd/fuse_lowlevel.h   | 13 
 tools/virtiofsd/passthrough_ll.c  | 29 +++
 tools/virtiofsd/passthrough_seccomp.c |  1 +
 4 files changed, 62 insertions(+)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 58e32fc96369..918ab11f54c2 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -1870,6 +1870,24 @@ static void do_lseek(fuse_req_t req, fuse_ino_t nodeid,
 }
 }
 
+static void do_syncfs(fuse_req_t req, fuse_ino_t nodeid,
+  struct fuse_mbuf_iter *iter)
+{
+struct fuse_syncfs_in *arg;
+
+arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+if (!arg) {
+fuse_reply_err(req, EINVAL);
+return;
+}
+
+if (req->se->op.syncfs) {
+req->se->op.syncfs(req, arg->flags);
+} else {
+fuse_reply_err(req, ENOSYS);
+}
+}
+
 static void do_init(fuse_req_t req, fuse_ino_t nodeid,
 struct fuse_mbuf_iter *iter)
 {
@@ -2267,6 +2285,7 @@ static struct {
 [FUSE_RENAME2] = { do_rename2, "RENAME2" },
 [FUSE_COPY_FILE_RANGE] = { do_copy_file_range, "COPY_FILE_RANGE" },
 [FUSE_LSEEK] = { do_lseek, "LSEEK" },
+[FUSE_SYNCFS] = { do_syncfs, "SYNCFS" },
 };
 
 #define FUSE_MAXOP (sizeof(fuse_ll_ops) / sizeof(fuse_ll_ops[0]))
diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
index 3bf786b03485..220bb3db4898 100644
--- a/tools/virtiofsd/fuse_lowlevel.h
+++ b/tools/virtiofsd/fuse_lowlevel.h
@@ -1225,6 +1225,19 @@ struct fuse_lowlevel_ops {
  */
 void (*lseek)(fuse_req_t req, fuse_ino_t ino, off_t off, int whence,
   struct fuse_file_info *fi);
+
+/**
+ * Synchronize file system content
+ *
+ * If this request is answered with an error code of ENOSYS,
+ * this is treated as success and future calls to syncfs() will
+ * succeed automatically without being sent to the filesystem
+ * process.
+ *
+ * @param req request handle
+ * @param flags not used yet
+ */
+void (*syncfs)(fuse_req_t req, uint64_t flags);
 };
 
 /**
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 1553d2ef454f..6790a2f6fe10 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -3124,6 +3124,34 @@ static void lo_lseek(fuse_req_t req, fuse_ino_t ino, 
off_t off, int whence,
 }
 }
 
+static void lo_syncfs(fuse_req_t req, uint64_t flags)
+{
+struct lo_data *lo = lo_data(req);
+int fd, ret;
+
+/* No flags supported yet */
+if (flags) {
+fuse_reply_err(req, EINVAL);
+return;
+}
+
+fd = lo_inode_open(lo, &lo->root, O_RDONLY);
+if (fd < 0) {
+fuse_reply_err(req, errno);
+return;
+}
+
+/*
+ * FIXME: this is suboptimal because it will also flush unrelated
+ *writes not coming from the client. This can dramatically
+ *increase the time spent in syncfs() if some process is
+ *writing lots of data on the same filesystem as virtiofsd.
+ */
+ret = syncfs(fd);
+fuse_reply_err(req, ret < 0 ? errno : 0);
+close(fd);
+}
+
 static void lo_destroy(void *userdata)
 {
 struct lo_data *lo = (struct lo_data *)userdata;
@@ -3184,6 +3212,7 @@ static struct fuse_lowlevel_ops lo_oper = {
 .copy_file_range = lo_copy_file_range,
 #endif
 .lseek = lo_lseek,
+.syncfs = lo_syncfs,
 .destroy = lo_destroy,
 };
 
diff --git a/tools/virtiofsd/passthrough_seccomp.c 
b/tools/virtiofsd/passthrough_seccomp.c
index 62441cfcdb95..343188447901 100644
--- a/tools/virtiofsd/passthrough_seccomp.c
+++ b/tools/virtiofsd/passthrough_seccomp.c
@@ -107,6 +107,7 @@ static const int syscall_allowlist[] = {
 SCMP_SYS(set_robust_list),
 SCMP_SYS(setxattr),
 SCMP_SYS(symlinkat),
+SCMP_SYS(syncfs),
 SCMP_SYS(time), /* Rarely needed, except on static builds */
 SCMP_SYS(tgkill),
 SCMP_SYS(unlinkat),
-- 
2.26.3

[for-6.1 v2 0/2] virtiofsd: Add support for FUSE_SYNCFS request

2021-04-26 Thread Greg Kurz

FUSE_SYNCFS allows the client to flush the host page cache.
This isn't available in upstream linux yet, but the following
tree can be used to test:

https://gitlab.com/gkurz/linux/-/tree/virtio-fs-sync

v2: - based on new version of FUSE_SYNCFS
  https://listman.redhat.com/archives/virtio-fs/2021-April/msg00166.html
- propagate syncfs() errors to client (Vivek)

Greg Kurz (2):
  Update linux headers to 5.12-rc8 + FUSE_SYNCFS
  virtiofsd: Add support for FUSE_SYNCFS request

 include/standard-headers/drm/drm_fourcc.h | 23 -
 include/standard-headers/linux/ethtool.h  | 54 ++-
 include/standard-headers/linux/fuse.h | 13 ++-
 include/standard-headers/linux/input.h|  2 +-
 .../standard-headers/rdma/vmw_pvrdma-abi.h|  7 ++
 linux-headers/asm-generic/unistd.h|  4 +-
 linux-headers/asm-mips/unistd_n32.h   |  1 +
 linux-headers/asm-mips/unistd_n64.h   |  1 +
 linux-headers/asm-mips/unistd_o32.h   |  1 +
 linux-headers/asm-powerpc/kvm.h   |  2 +
 linux-headers/asm-powerpc/unistd_32.h |  1 +
 linux-headers/asm-powerpc/unistd_64.h |  1 +
 linux-headers/asm-s390/unistd_32.h|  1 +
 linux-headers/asm-s390/unistd_64.h|  1 +
 linux-headers/asm-x86/kvm.h   |  1 +
 linux-headers/asm-x86/unistd_32.h |  1 +
 linux-headers/asm-x86/unistd_64.h |  1 +
 linux-headers/asm-x86/unistd_x32.h|  1 +
 linux-headers/linux/kvm.h | 89 +++
 linux-headers/linux/vfio.h| 27 ++
 tools/virtiofsd/fuse_lowlevel.c   | 19 
 tools/virtiofsd/fuse_lowlevel.h   | 13 +++
 tools/virtiofsd/passthrough_ll.c  | 29 ++
 tools/virtiofsd/passthrough_seccomp.c |  1 +
 24 files changed, 267 insertions(+), 27 deletions(-)

-- 
2.26.3

[for-6.1 v2 1/2] Update linux headers to 5.12-rc8 + FUSE_SYNCFS

2021-04-26 Thread Greg Kurz

NOT YET IN UPSTREAM LINUX. DO NOT MERGE.

Signed-off-by: Greg Kurz 
---
 include/standard-headers/drm/drm_fourcc.h | 23 -
 include/standard-headers/linux/ethtool.h  | 54 ++-
 include/standard-headers/linux/fuse.h | 13 ++-
 include/standard-headers/linux/input.h|  2 +-
 .../standard-headers/rdma/vmw_pvrdma-abi.h|  7 ++
 linux-headers/asm-generic/unistd.h|  4 +-
 linux-headers/asm-mips/unistd_n32.h   |  1 +
 linux-headers/asm-mips/unistd_n64.h   |  1 +
 linux-headers/asm-mips/unistd_o32.h   |  1 +
 linux-headers/asm-powerpc/kvm.h   |  2 +
 linux-headers/asm-powerpc/unistd_32.h |  1 +
 linux-headers/asm-powerpc/unistd_64.h |  1 +
 linux-headers/asm-s390/unistd_32.h|  1 +
 linux-headers/asm-s390/unistd_64.h|  1 +
 linux-headers/asm-x86/kvm.h   |  1 +
 linux-headers/asm-x86/unistd_32.h |  1 +
 linux-headers/asm-x86/unistd_64.h |  1 +
 linux-headers/asm-x86/unistd_x32.h|  1 +
 linux-headers/linux/kvm.h | 89 +++
 linux-headers/linux/vfio.h| 27 ++
 20 files changed, 205 insertions(+), 27 deletions(-)

diff --git a/include/standard-headers/drm/drm_fourcc.h 
b/include/standard-headers/drm/drm_fourcc.h
index c47e19810c05..a61ae520c2db 100644
--- a/include/standard-headers/drm/drm_fourcc.h
+++ b/include/standard-headers/drm/drm_fourcc.h
@@ -526,6 +526,25 @@ extern "C" {
  */
 #define I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS fourcc_mod_code(INTEL, 7)
 
+/*
+ * Intel Color Control Surface with Clear Color (CCS) for Gen-12 render
+ * compression.
+ *
+ * The main surface is Y-tiled and is at plane index 0 whereas CCS is linear
+ * and at index 1. The clear color is stored at index 2, and the pitch should
+ * be ignored. The clear color structure is 256 bits. The first 128 bits
+ * represents Raw Clear Color Red, Green, Blue and Alpha color each represented
+ * by 32 bits. The raw clear color is consumed by the 3d engine and generates
+ * the converted clear color of size 64 bits. The first 32 bits store the Lower
+ * Converted Clear Color value and the next 32 bits store the Higher Converted
+ * Clear Color value when applicable. The Converted Clear Color values are
+ * consumed by the DE. The last 64 bits are used to store Color Discard Enable
+ * and Depth Clear Value Valid which are ignored by the DE. A CCS cache line
+ * corresponds to an area of 4x1 tiles in the main surface. The main surface
+ * pitch is required to be a multiple of 4 tile widths.
+ */
+#define I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC fourcc_mod_code(INTEL, 8)
+
 /*
  * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
  *
@@ -1035,9 +1054,9 @@ drm_fourcc_canonicalize_nvidia_format_mod(uint64_t 
modifier)
  * Not all combinations are valid, and different SoCs may support different
  * combinations of layout and options.
  */
-#define __fourcc_mod_amlogic_layout_mask 0xf
+#define __fourcc_mod_amlogic_layout_mask 0xff
 #define __fourcc_mod_amlogic_options_shift 8
-#define __fourcc_mod_amlogic_options_mask 0xf
+#define __fourcc_mod_amlogic_options_mask 0xff
 
 #define DRM_FORMAT_MOD_AMLOGIC_FBC(__layout, __options) \
fourcc_mod_code(AMLOGIC, \
diff --git a/include/standard-headers/linux/ethtool.h 
b/include/standard-headers/linux/ethtool.h
index 8bfd01d230da..8e166b3c49dd 100644
--- a/include/standard-headers/linux/ethtool.h
+++ b/include/standard-headers/linux/ethtool.h
@@ -26,6 +26,14 @@
  * have the same layout for 32-bit and 64-bit userland.
  */
 
+/* Note on reserved space.
+ * Reserved fields must not be accessed directly by user space because
+ * they may be replaced by a different field in the future. They must
+ * be initialized to zero before making the request, e.g. via memset
+ * of the entire structure or implicitly by not being set in a structure
+ * initializer.
+ */
+
 /**
  * struct ethtool_cmd - DEPRECATED, link control and status
  * This structure is DEPRECATED, please use struct ethtool_link_settings.
@@ -67,6 +75,7 @@
  * and other link features that the link partner advertised
  * through autonegotiation; 0 if unknown or not applicable.
  * Read-only.
+ * @reserved: Reserved for future use; see the note on reserved space.
  *
  * The link speed in Mbps is split between @speed and @speed_hi.  Use
  * the ethtool_cmd_speed() and ethtool_cmd_speed_set() functions to
@@ -155,6 +164,7 @@ static inline uint32_t ethtool_cmd_speed(const struct 
ethtool_cmd *ep)
  * @bus_info: Device bus address.  This should match the dev_name()
  * string for the underlying bus device, if there is one.  May be
  * an empty string.
+ * @reserved2: Reserved for future use; see the note on reserved space.
  * @n_priv_flags: Number of flags valid for %ETHTOOL_GPFLAGS and
  * %ETHTOOL_SPFLAGS commands; also the number of strings in the
  * %ETH_SS_PRIV_FLAGS set
@@ -356,

Re: [PATCH for-6.0 v2 1/2] hw/block/nvme: fix invalid msix exclusive uninit

2021-04-26 Thread Peter Maydell

On Fri, 23 Apr 2021 at 06:21, Klaus Jensen  wrote:
>
> From: Klaus Jensen 
>
> Commit 1901b4967c3f changed the nvme device from using a bar exclusive
> for MSI-x to sharing it on bar0.
>
> Unfortunately, the msix_uninit_exclusive_bar() call remains in
> nvme_exit() which causes havoc when the device is removed with, say,
> device_del. Fix this.
>
> Additionally, a subregion is added but it is not removed on exit which
> causes a reference to linger and the drive to never be unlocked.
>
> Fixes: 1901b4967c3f ("hw/block/nvme: move msix table and pba to BAR 0")
> Signed-off-by: Klaus Jensen 
> ---
>  hw/block/nvme.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 624a1431d072..5fe082ec34c5 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -6235,7 +6235,8 @@ static void nvme_exit(PCIDevice *pci_dev)
>  if (n->pmr.dev) {
>  host_memory_backend_set_mapped(n->pmr.dev, false);
>  }
> -msix_uninit_exclusive_bar(pci_dev);
> +msix_uninit(pci_dev, &n->bar0, &n->bar0);
> +memory_region_del_subregion(&n->bar0, &n->iomem);
>  }
>
>  static Property nvme_props[] = {
> --

Applied this patch (but not patch 2) to master for 6.0; thanks.

-- PMM

Re: X on old (non-x86) Linux guests

2021-04-26 Thread BALATON Zoltan


Hello,

On Mon, 26 Apr 2021, Dr. David Alan Gilbert wrote:

 Over the weekend I got a Red Hat 6.x (not RHEL!) for Alpha booting
under QEMU which was pretty neat.  But I failed to find a succesful
combination to get X working; has anyone any suggestions?


Adding Andrew who has experimented with old X framebuffer so he may 
remember something more but that was on x86.



 That distro was from around 2000; the challenge is since we don't have
VESA on non-x86, we can't change mode that way, so generic XF86_SVGA
doesn't want to play with any of the devices.

 I also tried the ati device, but the accelerated mach64 driver
didn't recognise that ID.


The ati-vga partially emulates an ATI Rage128 Pro so it won't work with 
mach64 driver that is older and while functionally similar has different 
registers. You probably need to load aty128fb and then set a mode with 
fbset then may need to edit X conf but I forgot which option was neded, 
something about UseFb or similar so it won't try to change mode itself but 
use the already set Linux FB because otherwise it did not detect the card 
properly but I don'r remember the details so may be wrong. Also some 2D 
accel is emulated so may work without disabling it but I think has some 
bugs so it may have glitches.



 Has anyone found any combo that works?
I suspect using one of the existing devices, lying about PCI ID, and
then turning off all accelerations might have a chance but I've not got
that far.


Changing the PCI ID may not help as these ATI chips have different 
registers so only compatible with the right drivers. I've tried to use 
current ati-vga with a Mac ROM that expects mach64 but it did not work.


It may help to add -trace enable="ati*" and maybe also enable some debug 
defines in ati_int.h to see if it accesses the card at all but with the 
right driver that works with Rage128Pro it should produce some picture at 
least in fb console and we could run X with it on x86 before.


More info on ati-vga is here: 
https://osdn.net/projects/qmiga/wiki/SubprojectAti


By the way, last time I've experimented with it I've found that mouse 
pointer getting out of sync and jumping around is probably a result of 
mouse acceleration on the host is not taken into account when tracking 
guest pointer position. Is that possible and is there a way to fix it?


Regards,
BALATON Zoltan

Re: [PATCH 0/3] Introduce vhost-vdpa block device

2021-04-26 Thread Stefan Hajnoczi

On Thu, Apr 08, 2021 at 06:12:49PM +0800, Xie Yongji wrote:
> Since we already have some ways to emulate vDPA block device
> in kernel[1] or userspace[2]. This series tries to introduce a
> new vhost-vdpa block device for that. To use it, we can add
> something like:
> 
> qemu-system-x86_64 \
> -device vhost-vdpa-blk-pci,vdpa-dev=/dev/vhost-vdpa-0

This device is similar to vhost-user-blk. QEMU does not see it as a
block device so storage migration, I/O throttling, image formats, etc
are not supported. Stefano Garzarella and I discussed how vdpa-blk
devices could integrate more deeply with QEMU's block layer. The QEMU
block layer could be enabled only when necessary and otherwise bypassed
for maximum performance.

This alternative approach is similar to how vhost-net is implemented in
QEMU. A BlockDriver would handle the vdpa-blk device and the regular
virtio-blk-pci device would still be present. The virtqueues could be
delegated to the vdpa-blk device in order to bypass the QEMU block
layer.

I wanted to mention this since it's likely that this kind of vdpa-blk
device implementation will be posted in the future and you might be
interested. It makes live migration with non-shared storage possible,
for example.

An issue with vhost-user-blk is that the ownership of qdev properties
and VIRTIO Configuration Space fields was not clearly defined. Some
things are handled by QEMU's vhost-user-blk code, some things are
handled by the vhost-user device backend, and some things are negotiated
between both entities. This patch series follows the existing
vhost-user-blk approach, which I think will show serious issues as the
device is more widely used and whenever virtio-blk or the implementation
is extended with new features. It is very hard to provide backwards
compatibility with the current approach where the ownership of qdev
properties and VIRTIO Configuration Space fields is ad-hoc and largely
undefined.

Since vDPA has VIRTIO Configuration Space APIs, I suggest that the
vhost-vDPA device controls the entire configuration space. QEMU should
simply forward accesses between the guest and vhost-vdpa.

Regarding qdev properties, it's a little trickier because QEMU needs to
do the emulated VIRTIO device setup (allocating virtqueues, setting
their sizes, etc). Can QEMU query this information from the vDPA device?
If not, which qdev properties are read-only and must match the
configuration of the vDPA device and which are read-write and can
control the vDPA device?

Stefan

signature.asc
Description: PGP signature

Re: [PATCH v8] introduce vfio-user protocol specification

2021-04-26 Thread Stefan Hajnoczi

On Wed, Apr 14, 2021 at 04:41:22AM -0700, Thanos Makatos wrote:
> This patch introduces the vfio-user protocol specification (formerly
> known as VFIO-over-socket), which is designed to allow devices to be
> emulated outside QEMU, in a separate process. vfio-user reuses the
> existing VFIO defines, structs and concepts.
> 
> It has been earlier discussed as an RFC in:
> "RFC: use VFIO over a UNIX domain socket to implement device offloading"
> 
> Signed-off-by: John G Johnson 
> Signed-off-by: Thanos Makatos 
> Signed-off-by: John Levon 

No review yet but I wanted to agree on the next steps once the spec has
been reviewed.

One or more of you would be added to ./MAINTAINERS and handle future
patch review and pull requests for the spec.

The spec will be unstable/experimental at least until QEMU vfio-user
implementation has landed. Otherwise it's hard to know whether the
protocol really works.

Does this sound good?

Stefan

signature.asc
Description: PGP signature

Re: [PATCH v3] target/ppc: code motion from translate_init.c.inc to gdbstub.c

2021-04-26 Thread Fabiano Rosas

David Gibson  writes:

> On Wed, Apr 14, 2021 at 01:09:19PM -0700, Richard Henderson wrote:
>> On 4/14/21 7:59 AM, Bruno Larsen (billionai) wrote:
>> > All the code related to gdb has been moved from translate_init.c.inc
>> > file to the gdbstub.c file, where it makes more sense.
>> > 
>> > This new version puts the prototypes in internal.h, to not expose
>> > them unnecessarily.
>> > 
>> > Signed-off-by: Bruno Larsen (billionai) 
>> > Suggested-by: Fabiano Rosas 
>> > ---
>> >   target/ppc/gdbstub.c| 258 
>> >   target/ppc/internal.h   |   5 +
>> >   target/ppc/translate_init.c.inc | 254 +--
>> >   3 files changed, 264 insertions(+), 253 deletions(-)
>> 
>> Reviewed-by: Richard Henderson 
>
> Applied to ppc-for-6.1, thanks.


The prototypes moved to internal.h in v3 so gdbstub.c needs to include
it now. The linux-user build is breaking with:

$ ../configure --target-list=ppc64le-linux-user
$ make -j$(nproc)
(...)
[316/959] Compiling C object 
libqemu-ppc64le-linux-user.fa.p/target_ppc_gdbstub.c.o   
FAILED: libqemu-ppc64le-linux-user.fa.p/target_ppc_gdbstub.c.o  
  
cc -Ilibqemu-ppc64le-linux-user.fa.p -I. -I.. -Itarget/ppc -I../target/ppc 
-I../linux-user/host/x86_64 -Ilinux-user -I../linux-user -Ilinux-user/ppc 
-I../linux-user/ppc -I../capstone/include/capstone -Itrace -Iqap
i -Iui/shader -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include 
-fdiagnostics-color=auto -pipe -Wall -Winvalid-pch -Werror -std=gnu99 -O2 -g 
-isystem /home/fabiano/kvm/qemu-patch-testing/linux-headers -isystem
 linux-headers -iquote . -iquote /home/fabiano/kvm/qemu-patch-testing -iquote 
/home/fabiano/kvm/qemu-patch-testing/include -iquote 
/home/fabiano/kvm/qemu-patch-testing/disas/libvixl -iquote 
/home/fabiano/kvm/qemu-
patch-testing/tcg/i386 -iquote /home/fabiano/kvm/qemu-patch-testing/accel/tcg 
-pthread -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -m64 -mcx16 -D_GNU_SOURCE 
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes
 -Wredundant-decls -Wundef -Wwrite-strings -Wmissing-prototypes 
-fno-strict-aliasing -fno-common -fwrapv -Wold-style-declaration 
-Wold-style-definition -Wtype-limits -Wformat-security -Wformat-y2k -Winit-self 
-Wig
nored-qualifiers -Wempty-body -Wnested-externs -Wendif-labels 
-Wexpansion-to-defined -Wimplicit-fallthrough=2 -Wno-missing-include-dirs 
-Wno-shift-negative-value -Wno-psabi -fstack-protector-strong -fPIC -isystem.
./linux-headers -isystemlinux-headers -DNEED_CPU_H 
'-DCONFIG_TARGET="ppc64le-linux-user-config-target.h"' 
'-DCONFIG_DEVICES="ppc64le-linux-user-config-devices.h"' -MD -MQ 
libqemu-ppc64le-linux-user.fa.p/target_ppc
_gdbstub.c.o -MF libqemu-ppc64le-linux-user.fa.p/target_ppc_gdbstub.c.o.d -o 
libqemu-ppc64le-linux-user.fa.p/target_ppc_gdbstub.c.o -c 
../target/ppc/gdbstub.c   
../target/ppc/gdbstub.c:615:8: error: no previous prototype for 
‘ppc_gdb_arch_name’ [-Werror=missing-prototypes]
 
  615 | gchar *ppc_gdb_arch_name(CPUState *cs)  
  
  |^
  
../target/ppc/gdbstub.c:624:6: error: no previous prototype for ‘ppc_gdb_init’ 
[-Werror=missing-prototypes]
  
  624 | void ppc_gdb_init(CPUState *cs, PowerPCCPUClass *pcc)
  |  ^~~~   
  
cc1: all warnings being treated as errors

>> > +void ppc_gdb_init(CPUState *cs, PowerPCCPUClass *pcc)
>> > +{
>> > +
>> > +if (pcc->insns_flags & PPC_FLOAT) {
>> 
>> Watch the extra blank lines.
>
> Fixed in my tree.

Re: X on old (non-x86) Linux guests

2021-04-26 Thread Gerd Hoffmann

On Mon, Apr 26, 2021 at 11:01:26AM +0100, Dr. David Alan Gilbert wrote:
> Hi,
>   Over the weekend I got a Red Hat 6.x (not RHEL!) for Alpha booting
> under QEMU which was pretty neat.  But I failed to find a succesful
> combination to get X working; has anyone any suggestions?
> 
>   That distro was from around 2000; the challenge is since we don't have
> VESA on non-x86, we can't change mode that way, so generic XF86_SVGA
> doesn't want to play with any of the devices.

Well, one trick is to run the vgabios in x86emu.  This is what the xorg
vesa driver still does today on non-x86 archs (which includes x86_64).

I suspect Red Has 6.x is to old for that one though.  When was it
released exactly?  I would guess late 90-ies ...

If the vesa path fails the best bet would probably be a driver for the
cirrus.  There is a cirrusfb driver in the kernel which could be
combined with the fbdev xserver, not sure when cirrusfb was merged
and whenever redhat enabled the driver though.

Also not sure whenever there is an working xfree86 cirrus driver.

> I also tried the ati device, but the accelerated mach64 driver
> didn't recognise that ID.

The virtual ati cards are probably too new.

HTH,
  Gerd

[Bug 1926174] [NEW] Laggy and/or displaced mouse input on CloudReady (Chrome OS) VM

2021-04-26 Thread Michael Monreal

Public bug reported:

This weekend I tried to get a CloudReady (Chrome OS) VM running on qemu
5.2. This seems to wok quite well, performance seems to be great in
fact. Only problem is mouse input.

Using SDL display, there is no visible mouse unless I set "show-
cursor=on". After that the mouse pointer flickers a bit and most of the
time is displaced so I need to press below a button in order to hit it.
After switching to fullscreen and back using ctrl-alt-f this effect
seems to be fixed for a while but the mouse pointer does not reach all
parts of the emulated screen anymore.

Using SPICE instead the mouse pointer is drawn, but it is *very* laggy.
In fact it is only drawn every few seconds so it is unusable but
placement seems to be correct. Text input is instant, so general
emulation speed is not an issue here.

To reproduce, download the free image from
https://www.neverware.com/freedownload#home-edition-install

Then run one of the following commands:

qemu-system-x86_64 -drive driver=raw,file=cloudready-
free-89.3.3-64bit.bin -machine pc,accel=kvm -m 2048 -device virtio-
vga,virgl=on -display sdl,gl=on,show-cursor=on -usb -device usb-mouse
-device intel-hda -device hda-duplex

qemu-system-x86_64 -drive driver=raw,file=cloudready-
free-89.3.3-64bit.bin -machine pc,accel=kvm -m 2048 -device virtio-
vga,virgl=on -display spice-app,gl=on -usb -device usb-mouse -device
intel-hda -device hda-duplex

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1926174

Title:
  Laggy and/or displaced mouse input on CloudReady (Chrome OS) VM

Status in QEMU:
  New

Bug description:
  This weekend I tried to get a CloudReady (Chrome OS) VM running on
  qemu 5.2. This seems to wok quite well, performance seems to be great
  in fact. Only problem is mouse input.

  Using SDL display, there is no visible mouse unless I set "show-
  cursor=on". After that the mouse pointer flickers a bit and most of
  the time is displaced so I need to press below a button in order to
  hit it. After switching to fullscreen and back using ctrl-alt-f this
  effect seems to be fixed for a while but the mouse pointer does not
  reach all parts of the emulated screen anymore.

  Using SPICE instead the mouse pointer is drawn, but it is *very*
  laggy. In fact it is only drawn every few seconds so it is unusable
  but placement seems to be correct. Text input is instant, so general
  emulation speed is not an issue here.

  To reproduce, download the free image from
  https://www.neverware.com/freedownload#home-edition-install

  Then run one of the following commands:

  qemu-system-x86_64 -drive driver=raw,file=cloudready-
  free-89.3.3-64bit.bin -machine pc,accel=kvm -m 2048 -device virtio-
  vga,virgl=on -display sdl,gl=on,show-cursor=on -usb -device usb-mouse
  -device intel-hda -device hda-duplex

  qemu-system-x86_64 -drive driver=raw,file=cloudready-
  free-89.3.3-64bit.bin -machine pc,accel=kvm -m 2048 -device virtio-
  vga,virgl=on -display spice-app,gl=on -usb -device usb-mouse -device
  intel-hda -device hda-duplex

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1926174/+subscriptions

Re: [PATCH v3 18/36] block: add bdrv_attach_child_common() transaction action

2021-04-26 Thread Kevin Wolf

Am 17.03.2021 um 15:35 hat Vladimir Sementsov-Ogievskiy geschrieben:
> Split out no-perm part of bdrv_root_attach_child() into separate
> transaction action. bdrv_root_attach_child() now moves to new
> permission update paradigm: first update graph relations then update
> permissions.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block.c | 189 
>  1 file changed, 135 insertions(+), 54 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 98ff44dbf7..b6bdc534d2 100644
> --- a/block.c
> +++ b/block.c
> @@ -2921,37 +2921,73 @@ static void bdrv_replace_child(BdrvChild *child, 
> BlockDriverState *new_bs)
>  }
>  }
>  
> -/*
> - * This function steals the reference to child_bs from the caller.
> - * That reference is later dropped by bdrv_root_unref_child().
> - *
> - * On failure NULL is returned, errp is set and the reference to
> - * child_bs is also dropped.
> - *
> - * The caller must hold the AioContext lock @child_bs, but not that of @ctx
> - * (unless @child_bs is already in @ctx).
> - */
> -BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
> -  const char *child_name,
> -  const BdrvChildClass *child_class,
> -  BdrvChildRole child_role,
> -  uint64_t perm, uint64_t shared_perm,
> -  void *opaque, Error **errp)
> +static void bdrv_remove_empty_child(BdrvChild *child)
>  {
> -BdrvChild *child;
> -Error *local_err = NULL;
> -int ret;
> -AioContext *ctx;
> +assert(!child->bs);
> +QLIST_SAFE_REMOVE(child, next);
> +g_free(child->name);
> +g_free(child);
> +}
>  
> -ret = bdrv_check_update_perm(child_bs, NULL, perm, shared_perm, NULL, 
> errp);
> -if (ret < 0) {
> -bdrv_abort_perm_update(child_bs);
> -bdrv_unref(child_bs);
> -return NULL;
> +typedef struct BdrvAttachChildCommonState {
> +BdrvChild **child;
> +AioContext *old_parent_ctx;
> +AioContext *old_child_ctx;
> +} BdrvAttachChildCommonState;
> +
> +static void bdrv_attach_child_common_abort(void *opaque)
> +{
> +BdrvAttachChildCommonState *s = opaque;
> +BdrvChild *child = *s->child;
> +BlockDriverState *bs = child->bs;
> +
> +bdrv_replace_child_noperm(child, NULL);
> +
> +if (bdrv_get_aio_context(bs) != s->old_child_ctx) {
> +bdrv_try_set_aio_context(bs, s->old_child_ctx, &error_abort);
>  }
>  
> -child = g_new(BdrvChild, 1);
> -*child = (BdrvChild) {
> +if (bdrv_child_get_parent_aio_context(child) != s->old_parent_ctx) {
> +GSList *ignore = g_slist_prepend(NULL, child);
> +
> +child->klass->can_set_aio_ctx(child, s->old_parent_ctx, &ignore,
> +  &error_abort);
> +g_slist_free(ignore);
> +ignore = g_slist_prepend(NULL, child);
> +child->klass->set_aio_ctx(child, s->old_parent_ctx, &ignore);
> +
> +g_slist_free(ignore);
> +}
> +
> +bdrv_unref(bs);
> +bdrv_remove_empty_child(child);
> +*s->child = NULL;
> +}
> +
> +static TransactionActionDrv bdrv_attach_child_common_drv = {
> +.abort = bdrv_attach_child_common_abort,
> +};
> +
> +/*
> + * Common part of attoching bdrv child to bs or to blk or to job
> + */
> +static int bdrv_attach_child_common(BlockDriverState *child_bs,
> +const char *child_name,
> +const BdrvChildClass *child_class,
> +BdrvChildRole child_role,
> +uint64_t perm, uint64_t shared_perm,
> +void *opaque, BdrvChild **child,
> +Transaction *tran, Error **errp)
> +{
> +BdrvChild *new_child;
> +AioContext *parent_ctx;
> +AioContext *child_ctx = bdrv_get_aio_context(child_bs);
> +
> +assert(child);
> +assert(*child == NULL);
> +
> +new_child = g_new(BdrvChild, 1);
> +*new_child = (BdrvChild) {
>  .bs = NULL,
>  .name   = g_strdup(child_name),
>  .klass  = child_class,
> @@ -2961,37 +2997,92 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState 
> *child_bs,
>  .opaque = opaque,
>  };
>  
> -ctx = bdrv_child_get_parent_aio_context(child);
> -
> -/* If the AioContexts don't match, first try to move the subtree of
> +/*
> + * If the AioContexts don't match, first try to move the subtree of
>   * child_bs into the AioContext of the new parent. If this doesn't work,
> - * try moving the parent into the AioContext of child_bs instead. */
> -if (bdrv_get_aio_context(child_bs) != ctx) {
> -ret = bdrv_try_set_aio_context(child_bs, ctx, &local_err);
> + * try moving the parent into the AioContext of child_bs instead.
> + */
> +p

Re: [PATCH v6 06/18] cpu: Assert DeviceClass::vmsd is NULL on user emulation

2021-04-26 Thread Dr. David Alan Gilbert

* Philippe Mathieu-Daudé (f4...@amsat.org) wrote:
> Migration is specific to system emulation.
> 
> Restrict current DeviceClass::vmsd to sysemu using #ifdef'ry,
> and assert in cpu_exec_realizefn() that dc->vmsd not set under
> user emulation.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  cpu.c  | 1 +
>  target/sh4/cpu.c   | 5 +++--
>  target/unicore32/cpu.c | 4 
>  target/xtensa/cpu.c| 4 +++-
>  4 files changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/cpu.c b/cpu.c
> index bfbe5a66f95..4fed04219df 100644
> --- a/cpu.c
> +++ b/cpu.c
> @@ -138,6 +138,7 @@ void cpu_exec_realizefn(CPUState *cpu, Error **errp)
>  #endif /* CONFIG_TCG */
>  
>  #ifdef CONFIG_USER_ONLY
> +assert(qdev_get_vmsd(DEVICE(cpu)) == NULL);

Why not make that:
   assert(qdev_get_vmsd(DEVICE(cpu)) == NULL ||
  qdev_get_vmsd(DEVICE(cpu)->unmigratable)

then you don't have to worry about the changes below.

Dave

>  assert(cc->vmsd == NULL);
>  #else
>  if (qdev_get_vmsd(DEVICE(cpu)) == NULL) {
> diff --git a/target/sh4/cpu.c b/target/sh4/cpu.c
> index ac65c88f1f8..35d4251aaf3 100644
> --- a/target/sh4/cpu.c
> +++ b/target/sh4/cpu.c
> @@ -218,10 +218,12 @@ static void superh_cpu_initfn(Object *obj)
>  env->movcal_backup_tail = &(env->movcal_backup);
>  }
>  
> +#ifndef CONFIG_USER_ONLY
>  static const VMStateDescription vmstate_sh_cpu = {
>  .name = "cpu",
>  .unmigratable = 1,
>  };
> +#endif
>  
>  #include "hw/core/tcg-cpu-ops.h"
>  
> @@ -257,12 +259,11 @@ static void superh_cpu_class_init(ObjectClass *oc, void 
> *data)
>  cc->gdb_write_register = superh_cpu_gdb_write_register;
>  #ifndef CONFIG_USER_ONLY
>  cc->get_phys_page_debug = superh_cpu_get_phys_page_debug;
> +dc->vmsd = &vmstate_sh_cpu;
>  #endif
>  cc->disas_set_info = superh_cpu_disas_set_info;
>  
>  cc->gdb_num_core_regs = 59;
> -
> -dc->vmsd = &vmstate_sh_cpu;
>  cc->tcg_ops = &superh_tcg_ops;
>  }
>  
> diff --git a/target/unicore32/cpu.c b/target/unicore32/cpu.c
> index 0258884f845..a74ac7b6140 100644
> --- a/target/unicore32/cpu.c
> +++ b/target/unicore32/cpu.c
> @@ -115,10 +115,12 @@ static void uc32_cpu_initfn(Object *obj)
>  #endif
>  }
>  
> +#ifndef CONFIG_USER_ONLY
>  static const VMStateDescription vmstate_uc32_cpu = {
>  .name = "cpu",
>  .unmigratable = 1,
>  };
> +#endif
>  
>  #include "hw/core/tcg-cpu-ops.h"
>  
> @@ -146,7 +148,9 @@ static void uc32_cpu_class_init(ObjectClass *oc, void 
> *data)
>  cc->dump_state = uc32_cpu_dump_state;
>  cc->set_pc = uc32_cpu_set_pc;
>  cc->get_phys_page_debug = uc32_cpu_get_phys_page_debug;
> +#ifndef CONFIG_USER_ONLY
>  dc->vmsd = &vmstate_uc32_cpu;
> +#endif
>  cc->tcg_ops = &uc32_tcg_ops;
>  }
>  
> diff --git a/target/xtensa/cpu.c b/target/xtensa/cpu.c
> index e2b2c7a71c1..a66527e2d45 100644
> --- a/target/xtensa/cpu.c
> +++ b/target/xtensa/cpu.c
> @@ -176,10 +176,12 @@ static void xtensa_cpu_initfn(Object *obj)
>  #endif
>  }
>  
> +#ifndef CONFIG_USER_ONLY
>  static const VMStateDescription vmstate_xtensa_cpu = {
>  .name = "cpu",
>  .unmigratable = 1,
>  };
> +#endif
>  
>  #include "hw/core/tcg-cpu-ops.h"
>  
> @@ -216,9 +218,9 @@ static void xtensa_cpu_class_init(ObjectClass *oc, void 
> *data)
>  cc->gdb_stop_before_watchpoint = true;
>  #ifndef CONFIG_USER_ONLY
>  cc->get_phys_page_debug = xtensa_cpu_get_phys_page_debug;
> +dc->vmsd = &vmstate_xtensa_cpu;
>  #endif
>  cc->disas_set_info = xtensa_cpu_disas_set_info;
> -dc->vmsd = &vmstate_xtensa_cpu;
>  cc->tcg_ops = &xtensa_tcg_ops;
>  }
>  
> -- 
> 2.26.3
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [PATCH v3 22/36] block: add bdrv_remove_filter_or_cow transaction action

2021-04-26 Thread Kevin Wolf

Am 17.03.2021 um 15:35 hat Vladimir Sementsov-Ogievskiy geschrieben:
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block.c | 78 +++--
>  1 file changed, 76 insertions(+), 2 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 11f7ad0818..2fca1f2ad5 100644
> --- a/block.c
> +++ b/block.c
> @@ -2929,12 +2929,19 @@ static void bdrv_replace_child(BdrvChild *child, 
> BlockDriverState *new_bs)
>  }
>  }
>  
> +static void bdrv_child_free(void *opaque)
> +{
> +BdrvChild *c = opaque;
> +
> +g_free(c->name);
> +g_free(c);
> +}
> +
>  static void bdrv_remove_empty_child(BdrvChild *child)
>  {
>  assert(!child->bs);
>  QLIST_SAFE_REMOVE(child, next);
> -g_free(child->name);
> -g_free(child);
> +bdrv_child_free(child);
>  }
>  
>  typedef struct BdrvAttachChildCommonState {
> @@ -4956,6 +4963,73 @@ static bool should_update_child(BdrvChild *c, 
> BlockDriverState *to)
>  return ret;
>  }
>  
> +typedef struct BdrvRemoveFilterOrCowChild {
> +BdrvChild *child;
> +bool is_backing;
> +} BdrvRemoveFilterOrCowChild;
> +
> +/* this doesn't restore original child bs, only the child itself */

Hm, this comment tells me that it's intentional, but why is it correct?

> +static void bdrv_remove_filter_or_cow_child_abort(void *opaque)
> +{
> +BdrvRemoveFilterOrCowChild *s = opaque;
> +BlockDriverState *parent_bs = s->child->opaque;
> +
> +QLIST_INSERT_HEAD(&parent_bs->children, s->child, next);
> +if (s->is_backing) {
> +parent_bs->backing = s->child;
> +} else {
> +parent_bs->file = s->child;
> +}
> +}

Kevin

Re: [RFC] tcg plugin: Additional plugin interface

2021-04-26 Thread Min-Yih Hsu

Hi Alex,

> On Apr 23, 2021, at 8:44 AM, Alex Bennée  wrote:
> 
> 
> Min-Yih Hsu  writes:
> 
>> Hi Alex and QEMU developers,
>> 
>> Recently I was working with the TCG plugin. I found that 
>> `qemu_plugin_cb_flags` seems to reserve the functionality to
>> read / write CPU register state, I'm wondering if you can share some
>> roadmap or thoughts on this feature?
> 
> I think reading the CPU register state is certainly on the roadmap,
> writing registers presents a more philosophical question of if it opens
> the way to people attempting a GPL bypass via plugins. However reading
> registers would certainly be a worthwhile addition to the API.

Interesting…I’ve never thought about this problem before.

> 
>> Personally I see reading the CPU register state as (kind of) low-hanging 
>> fruit. The most straightforward way to implement
>> it will be adding another function that can be called by insn_exec callbacks 
>> to read (v)CPU register values. What do you
>> think about this?
> 
> It depends on your definition of low hanging fruit ;-)
> 
> Yes the implementation would be a simple helper which could be called
> from a callback - I don't think we need to limit it to just insn_exec. I
> think the challenge is proving a non-ugly API that works cleanly across
> all the architectures. I'm not keen on exposing arbitrary gdb register
> IDs to the plugins.
> 
> There has been some discussion previously on the list which is probably
> worth reviewing:
> 
>  Date: Mon, 7 Dec 2020 16:03:24 -0500
>  From: Aaron Lindsay 
>  Subject: Plugin Register Accesses
>  Message-ID: 
> 
> But in short I think we need a new subsystem in QEMU where frontends can
> register registers (sic) and then provide a common API for various
> users. This common subsystem would then be the source of data for:
> 
>  - plugins
>  - gdbstub
>  - monitor (info registers)
>  - -d LOG_CPU logging
> 
> If you are interested in tackling such a project I'm certainly happy to
> provide pointers and review.

Thank you! Yeah I’m definitely going to scratch a prototype for this register 
reading plugin interface. I’ll take a look at related email discussions.

Best,
-Min

> 
>> 
>> Thank you
>> -Min
> 
> 
> -- 
> Alex Bennée

Re: [PATCH 2/2] plugins: Move all typedef and type declaration to the front of the qemu-plugin.h

2021-04-26 Thread Alex Bennée



Yonggang Luo  writes:

What's the rationale for moving everything around in the file. As it
currently stands the typedefs are create as we get to each new set of
helpers/plugin points. Doing it all up front seems like a lot of churn
for not particular reason.

> Signed-off-by: Yonggang Luo 
> ---
>  include/qemu/qemu-plugin.h | 187 ++---
>  1 file changed, 92 insertions(+), 95 deletions(-)
>
> diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
> index 97cdfd7761..2cb17f3051 100644
> --- a/include/qemu/qemu-plugin.h
> +++ b/include/qemu/qemu-plugin.h
> @@ -81,27 +81,6 @@ typedef struct qemu_info_t {
>  };
>  } qemu_info_t;
>  
> -/**
> - * qemu_plugin_install() - Install a plugin
> - * @id: this plugin's opaque ID
> - * @info: a block describing some details about the guest
> - * @argc: number of arguments
> - * @argv: array of arguments (@argc elements)
> - *
> - * All plugins must export this symbol which is called when the plugin
> - * is first loaded. Calling qemu_plugin_uninstall() from this function
> - * is a bug.
> - *
> - * Note: @info is only live during the call. Copy any information we
> - * want to keep. @argv remains valid throughout the lifetime of the
> - * loaded plugin.
> - *
> - * Return: 0 on successful loading, !0 for an error.
> - */
> -QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
> -   const qemu_info_t *info,
> -   int argc, char **argv);
> -
>  /**
>   * typedef qemu_plugin_simple_cb_t - simple callback
>   * @id: the unique qemu_plugin_id_t
> @@ -135,6 +114,98 @@ typedef void 
> (*qemu_plugin_vcpu_simple_cb_t)(qemu_plugin_id_t id,
>  typedef void (*qemu_plugin_vcpu_udata_cb_t)(unsigned int vcpu_index,
>  void *userdata);
>  
> +/** struct qemu_plugin_tb - Opaque handle for a translation block */
> +struct qemu_plugin_tb;
> +/** struct qemu_plugin_insn - Opaque handle for a translated instruction */
> +struct qemu_plugin_insn;
> +
> +/**
> + * enum qemu_plugin_cb_flags - type of callback
> + *
> + * @QEMU_PLUGIN_CB_NO_REGS: callback does not access the CPU's regs
> + * @QEMU_PLUGIN_CB_R_REGS: callback reads the CPU's regs
> + * @QEMU_PLUGIN_CB_RW_REGS: callback reads and writes the CPU's regs
> + *
> + * Note: currently unused, plugins cannot read or change system
> + * register state.
> + */
> +enum qemu_plugin_cb_flags {
> +QEMU_PLUGIN_CB_NO_REGS,
> +QEMU_PLUGIN_CB_R_REGS,
> +QEMU_PLUGIN_CB_RW_REGS,
> +};
> +
> +enum qemu_plugin_mem_rw {
> +QEMU_PLUGIN_MEM_R = 1,
> +QEMU_PLUGIN_MEM_W,
> +QEMU_PLUGIN_MEM_RW,
> +};
> +
> +/**
> + * typedef qemu_plugin_vcpu_tb_trans_cb_t - translation callback
> + * @id: unique plugin id
> + * @tb: opaque handle used for querying and instrumenting a block.
> + */
> +typedef void (*qemu_plugin_vcpu_tb_trans_cb_t)(qemu_plugin_id_t id,
> +   struct qemu_plugin_tb *tb);
> +
> +/**
> + * enum qemu_plugin_op - describes an inline op
> + *
> + * @QEMU_PLUGIN_INLINE_ADD_U64: add an immediate value uint64_t
> + *
> + * Note: currently only a single inline op is supported.
> + */
> +
> +enum qemu_plugin_op {
> +QEMU_PLUGIN_INLINE_ADD_U64,
> +};
> +
> +/**
> + * typedef qemu_plugin_meminfo_t - opaque memory transaction handle
> + *
> + * This can be further queried using the qemu_plugin_mem_* query
> + * functions.
> + */
> +typedef uint32_t qemu_plugin_meminfo_t;
> +/** struct qemu_plugin_hwaddr - opaque hw address handle */
> +struct qemu_plugin_hwaddr;
> +
> +typedef void
> +(*qemu_plugin_vcpu_mem_cb_t)(unsigned int vcpu_index,
> + qemu_plugin_meminfo_t info, uint64_t vaddr,
> + void *userdata);
> +
> +typedef void
> +(*qemu_plugin_vcpu_syscall_cb_t)(qemu_plugin_id_t id, unsigned int 
> vcpu_index,
> + int64_t num, uint64_t a1, uint64_t a2,
> + uint64_t a3, uint64_t a4, uint64_t a5,
> + uint64_t a6, uint64_t a7, uint64_t a8);
> +typedef void
> +(*qemu_plugin_vcpu_syscall_ret_cb_t)(qemu_plugin_id_t id, unsigned int 
> vcpu_idx,
> + int64_t num, int64_t ret);
> +
> +/**
> + * qemu_plugin_install() - Install a plugin
> + * @id: this plugin's opaque ID
> + * @info: a block describing some details about the guest
> + * @argc: number of arguments
> + * @argv: array of arguments (@argc elements)
> + *
> + * All plugins must export this symbol which is called when the plugin
> + * is first loaded. Calling qemu_plugin_uninstall() from this function
> + * is a bug.
> + *
> + * Note: @info is only live during the call. Copy any information we
> + * want to keep. @argv remains valid throughout the lifetime of the
> + * loaded plugin.
> + *
> + * Return: 0 on successful loading, !0 for an error.
> + */
> +QEMU_PLUGIN_EXP

[PATCH v5 cxl2.0-v3-doe 0/6] QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2.0

2021-04-26 Thread Chris Browy

This patch implements the PCIe Data Object Exchange (DOE) for PCIe 4.0/5.0
and later and CXL 2.0 "type-3" memory devices supporting the following 
protocols:
 1: PCIe DOE Discovery protocol
 2: CXL DOE Compliance Mode protocol
 3: CXL DOE CDAT protocol

Implementation is based on QEMU version which added CXL 2.0 "type-3" support
https://gitlab.com/bwidawsk/qemu/-/tree/cxl-2.0v4
6882c0453eea74d639ac75ec0f362d0cf9f1c744

PCIe Data Object Exchange (DOE) implementation for QEMU refers to
"Data Object Exchange ECN, March 12, 2020" [1]

The Data Object Exchange implementation of CXL Compliance Mode is
refers to "Compute Express Link (CXL) Specification, Rev. 2.0, Oct.
2020" [2]

The Data Object Exchange implementation of CXL Coherent Device Attribute
Table (CDAT). This implementation is referring to "Coherent Device
Attribute Table Specification, Rev. 1.02, Oct. 2020" [3] and "Compute
Express Link Specification, Rev. 2.0, Oct. 2020" [2]

The CDAT can be specified in two ways. One is to add ",cdat="
in "-device cxl-type3"'s command option. The file is required to provide
the whole CDAT table in binary mode. The other is to use the default
CDAT value created by build_cdat_table in hw/cxl/cxl-cdat.c.

Pre-built CDAT table for testing, contains one CDAT header and six
CDAT entries: DSMAS, DSLBIS, DSMSCIS, DSIS, DSEMTS, and SSLBIS
respectively.

Changes since PATCH v4:
1-3: PCIe DOE linux header and macros and PCIe Discovery protocol
4:   Clean up CXL compliance mode DOE protocol including default responses
5-6: Clean up CXL CDAT DOE protocol including tesing built-in and external CDAT 
tables

[1]: https://members.pcisig.com/wg/PCI-SIG/document/14143
[2]: https://www.computeexpresslink.org/
[3]: 
https://uefi.org/sites/default/files/resources/Coherent%20Device%20Attribute%20Table_1.02.pdf

hchkuo (6):
  standard-headers/linux/pci_regs: PCI header from Linux kernel
  include/hw/pci: headers for PCIe DOE
  hw/pci: PCIe Data Object Exchange implementation
  cxl/compliance: CXL Compliance Data Object Exchange implementation
  cxl/cdat: CXL CDAT Data Object Exchange implementation
  test/cdat: CXL CDAT test data

 MAINTAINERS   |   7 +
 hw/cxl/cxl-cdat.c | 228 +
 hw/cxl/meson.build|   1 +
 hw/mem/cxl_type3.c| 202 
 hw/pci/meson.build|   1 +
 hw/pci/pcie_doe.c | 374 ++
 include/hw/cxl/cxl_cdat.h | 149 +
 include/hw/cxl/cxl_compliance.h   | 293 +
 include/hw/cxl/cxl_component.h|   7 +
 include/hw/cxl/cxl_device.h   |   4 +
 include/hw/cxl/cxl_pci.h  |   2 +
 include/hw/pci/pci_ids.h  |   2 +
 include/hw/pci/pcie.h |   1 +
 include/hw/pci/pcie_doe.h | 123 +++
 include/hw/pci/pcie_regs.h|   3 +
 include/standard-headers/linux/pci_regs.h |   3 +-
 tests/data/cdat/cdat.dat  | Bin 0 -> 148 bytes
 17 files changed, 1399 insertions(+), 1 deletion(-)
 create mode 100644 hw/cxl/cxl-cdat.c
 create mode 100644 hw/pci/pcie_doe.c
 create mode 100644 include/hw/cxl/cxl_cdat.h
 create mode 100644 include/hw/cxl/cxl_compliance.h
 create mode 100644 include/hw/pci/pcie_doe.h
 create mode 100644 tests/data/cdat/cdat.dat

-- 
2.17.1

1 2 3 >

1 - 100 of 210 matches

Mail list logo