Re: [RFC PATCH 3/5] ebpf: Added declaration/initialization routines.

2023-03-31 Thread Jason Wang
On Thu, Mar 30, 2023 at 4:34 PM Daniel P. Berrangé  wrote:
>
> On Thu, Mar 30, 2023 at 02:54:32PM +0800, Jason Wang wrote:
> > On Thu, Mar 30, 2023 at 8:33 AM Andrew Melnychenko  
> > wrote:
> > >
> > > Now, the binary objects may be retrieved by id/name.
> > > This will be required for future QMP commands that may need a
> > > specific eBPF blob.
> > >
> > > Signed-off-by: Andrew Melnychenko 
> > > ---
> > >  ebpf/ebpf.c  | 48 
> > >  ebpf/ebpf.h  | 25 +
> > >  ebpf/ebpf_rss.c  |  4 
> > >  ebpf/meson.build |  1 +
> > >  4 files changed, 78 insertions(+)
> > >  create mode 100644 ebpf/ebpf.c
> > >  create mode 100644 ebpf/ebpf.h
> > >
> > > diff --git a/ebpf/ebpf.c b/ebpf/ebpf.c
> > > new file mode 100644
> > > index 00..86320d72f5
> > > --- /dev/null
> > > +++ b/ebpf/ebpf.c
> > > @@ -0,0 +1,48 @@
> > > +/*
> > > + * QEMU eBPF binary declaration routine.
> > > + *
> > > + * Developed by Daynix Computing LTD (http://www.daynix.com)
> > > + *
> > > + * Authors:
> > > + *  Andrew Melnychenko 
> > > + *
> > > + * This work is licensed under the terms of the GNU GPL, version 2 or
> > > + * later.  See the COPYING file in the top-level directory.
> > > + */
> > > +
> > > +#include "qemu/osdep.h"
> > > +#include "qemu/queue.h"
> > > +#include "ebpf/ebpf.h"
> > > +
> > > +struct ElfBinaryDataEntry {
> > > +const char *id;
> > > +const void * (*fn)(size_t *);
> > > +
> > > +QSLIST_ENTRY(ElfBinaryDataEntry) node;
> > > +};
> > > +
> > > +static QSLIST_HEAD(, ElfBinaryDataEntry) ebpf_elf_obj_list =
> > > +QSLIST_HEAD_INITIALIZER();
> > > +
> > > +void ebpf_register_binary_data(const char *id, const void * (*fn)(size_t *))
> > > +{
> > > +struct ElfBinaryDataEntry *data = NULL;
> > > +
> > > +data = g_malloc0(sizeof(*data));
> > > +data->fn = fn;
> > > +data->id = id;
> > > +
> > > +QSLIST_INSERT_HEAD(&ebpf_elf_obj_list, data, node);
> > > +}
> > > +
> > > +const void *ebpf_find_binary_by_id(const char *id, size_t *sz)
> > > +{
> > > +struct ElfBinaryDataEntry *it = NULL;
> > > +QSLIST_FOREACH(it, &ebpf_elf_obj_list, node) {
> > > +if (strcmp(id, it->id) == 0) {
> > > +return it->fn(sz);
> > > +}
> > > +}
> > > +
> > > +return NULL;
> > > +}
> > > diff --git a/ebpf/ebpf.h b/ebpf/ebpf.h
> > > new file mode 100644
> > > index 00..fd705cb73e
> > > --- /dev/null
> > > +++ b/ebpf/ebpf.h
> > > @@ -0,0 +1,25 @@
> > > +/*
> > > + * QEMU eBPF binary declaration routine.
> > > + *
> > > + * Developed by Daynix Computing LTD (http://www.daynix.com)
> > > + *
> > > + * Authors:
> > > + *  Andrew Melnychenko 
> > > + *
> > > + * This work is licensed under the terms of the GNU GPL, version 2 or
> > > + * later.  See the COPYING file in the top-level directory.
> > > + */
> > > +
> > > +#ifndef EBPF_H
> > > +#define EBPF_H
> > > +
> > > +void ebpf_register_binary_data(const char *id, const void * (*fn)(size_t *));
> > > +const void *ebpf_find_binary_by_id(const char *id, size_t *sz);
> > > +
> > > +#define ebpf_binary_init(id, fn)                                        \
> > > +static void __attribute__((constructor)) ebpf_binary_init_ ## fn(void)  \
> > > +{                                                                       \
> > > +    ebpf_register_binary_data(id, fn);                                  \
> > > +}
> > > +
> > > +#endif /* EBPF_H */
> > > diff --git a/ebpf/ebpf_rss.c b/ebpf/ebpf_rss.c
> > > index 08015fecb1..b4038725f2 100644
> > > --- a/ebpf/ebpf_rss.c
> > > +++ b/ebpf/ebpf_rss.c
> > > @@ -21,6 +21,8 @@
> > >
> > >  #include "ebpf/ebpf_rss.h"
> > >  #include "ebpf/rss.bpf.skeleton.h"
> > > +#include "ebpf/ebpf.h"
> > > +
> > >  #include "trace.h"
> > >
> > >  void ebpf_rss_init(struct EBPFRSSContext *ctx)
> > > @@ -237,3 +239,5 @@ void ebpf_rss_unload(struct EBPFRSSContext *ctx)
> > >  ctx->obj = NULL;
> > >  ctx->program_fd = -1;
> > >  }
> > > +
> > > +ebpf_binary_init("rss", rss_bpf__elf_bytes)
> >
> > Who or how the ABI compatibility is preserved between libvirt and Qemu?
>
> There's no real problem with binary compatibility to solve any more.
>
> When libvirt first launches a QEMU VM, it will fetch the eBPF programs
> it needs from that running QEMU using QMP. When it later needs to
> enable features that use eBPF, it already has the program data that
> matches the running QEMU.

Ok, then who will validate the eBPF program? I don't think libvirt can
trust what is received from Qemu otherwise arbitrary eBPF programs
could be executed by Qemu in this way. One example is that when guests
escape to Qemu it can modify the rss_bpf__elf_bytes. Though
BPF_PROG_TYPE_SOCKET_FILTER gives some of the restrictions, we still
need to evaluate side effects of this. Or we need to find other ways
like using the binary in libvirt or use rx filter events.
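For reference, the registration pattern from the patch at the top of this thread — constructor-time registration of named blob getters, looked up by id — can be sketched in Python (a simplification: the C version also returns the blob size through a pointer, and the names below are illustrative, not QEMU's):

```python
# Sketch of the ebpf.c registry: each binary registers a callable under
# an id at load time; lookup resolves the callable lazily.
_registry = {}

def register_binary_data(bid, fn):
    # mirrors ebpf_register_binary_data(): store the getter, not the data
    _registry[bid] = fn

def find_binary_by_id(bid):
    # mirrors ebpf_find_binary_by_id(): None when the id is unknown
    fn = _registry.get(bid)
    return fn() if fn else None

def binary_init(bid):
    # decorator standing in for the ebpf_binary_init() constructor macro
    def wrap(fn):
        register_binary_data(bid, fn)
        return fn
    return wrap

@binary_init("rss")
def rss_elf_bytes():
    return b"\x7fELF..."  # placeholder for the embedded skeleton bytes
```

The point of the constructor/decorator indirection is that each object file self-registers without a central list having to know every blob.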

Re: [PATCH 09/11] tests/requirements.txt: bump up avocado-framework version to 101.0

2023-03-31 Thread Thomas Huth

On 30/03/2023 14.21, Thomas Huth wrote:

On 30/03/2023 14.12, Alex Bennée wrote:


Thomas Huth  writes:


On 30/03/2023 12.11, Alex Bennée wrote:

From: Kautuk Consul 
Avocado version 101.0 has a fix to re-compute the checksum
of an asset file if the algorithm used in the *-CHECKSUM
file isn't the same as the one being passed to it by the
avocado user (i.e. the avocado_qemu python module).
Earlier avocado versions lacked this fix, so when the checksum did
not match the cached one (calculated with a different algorithm),
the avocado code would download a fresh image from the internet
URL, making the test-cases take longer to execute.
Bump up the avocado-framework version to 101.0.
Signed-off-by: Kautuk Consul 
Tested-by: Hariharan T S 
Message-Id: <20230327115030.3418323-2-kcon...@linux.vnet.ibm.com>
---
   tests/requirements.txt | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tests/requirements.txt b/tests/requirements.txt
index 0ba561b6bd..a6f73da681 100644
--- a/tests/requirements.txt
+++ b/tests/requirements.txt
@@ -2,5 +2,5 @@
   # in the tests/venv Python virtual environment. For more info,
   # refer to: https://pip.pypa.io/en/stable/user_guide/#id1
   # Note that qemu.git/python/ is always implicitly installed.
-avocado-framework==88.1
+avocado-framework==101.0
   pycdlib==1.11.0
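The fix described in the commit message — recomputing the cached checksum with the requested algorithm instead of refetching — can be sketched as follows (a simplification, not Avocado's actual code; all names are invented):

```python
import hashlib

def asset_is_valid(data, expected_digest, algo, cache):
    """cache maps asset -> (algo, digest); if the cached entry used a
    different algorithm, recompute with `algo` instead of refetching."""
    cached = cache.get("asset")
    if cached and cached[0] == algo:
        return cached[1] == expected_digest
    # pre-101.0 behaviour: a mismatched algorithm forced a fresh download;
    # the fix recomputes the digest with the requested algorithm instead
    digest = hashlib.new(algo, data).hexdigest()
    cache["asset"] = (algo, digest)
    return digest == expected_digest
```

With a sha1 entry cached and sha256 requested, the digest is re-derived from the local data and the cache updated, so no new download is needed.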


Did you check whether the same amount of avocado tests still works as
before? ... last time I tried to bump the version, a lot of things
were failing, and I think Cleber was recently working on fixing
things, but I haven't heard anything back from him yet saying it would
be OK to bump to a newer version now ...


I ran it on my default build and the only failure was:

  (008/222) 
tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg: 
INTERRUPTED: timeout (240.01 s)


which passed on a retry. But now I realise with failfast it skipped a bunch:


That one is also failing for me here when I apply the patch. Without the 
patch, the test is working fine. I think this needs more careful testing 
first - e.g. the tests are run in parallel now by default, which breaks a 
lot of our timeout settings.


FWIW, I think we likely want something like this added to this patch,
so we avoid to run those tests in parallel (unless requested with -jX):

diff a/tests/Makefile.include b/tests/Makefile.include
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -138,12 +138,15 @@ get-vm-image-fedora-31-%: check-venv
 # download all vm images, according to defined targets
 get-vm-images: check-venv $(patsubst %,get-vm-image-fedora-31-%, $(FEDORA_31_DOWNLOAD))
 
+JOBS_OPTION=$(lastword -j1 $(filter-out -j, $(filter -j%, $(MAKEFLAGS))))

+
 check-avocado: check-venv $(TESTS_RESULTS_DIR) get-vm-images
$(call quiet-command, \
 $(TESTS_PYTHON) -m avocado \
 --show=$(AVOCADO_SHOW) run --job-results-dir=$(TESTS_RESULTS_DIR) \
 $(if $(AVOCADO_TAGS),, --filter-by-tags-include-empty \
--filter-by-tags-include-empty-key) \
+--max-parallel-tasks $(JOBS_OPTION:-j%=%) \
 $(AVOCADO_CMDLINE_TAGS) \
 $(if $(GITLAB_CI),,--failfast) $(AVOCADO_TESTS), \
 "AVOCADO", "tests/avocado")

That way we can avoid the timeout problems until we find a
proper solution for those.
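As a side note on how that JOBS_OPTION extraction works: it keeps only the -jN words of MAKEFLAGS, drops a bare -j (unbounded parallelism), falls back to -j1, takes the last occurrence, and strips the -j prefix for --max-parallel-tasks. The same selection logic, sketched in Python purely for illustration (the Makefile does this with GNU make's filter/lastword functions, and assumes -jN appears as space-separated words in MAKEFLAGS):

```python
def max_parallel_tasks(makeflags):
    # $(filter -j%, $(MAKEFLAGS)): keep only the -j... words
    jflags = [w for w in makeflags.split() if w.startswith("-j")]
    # $(filter-out -j, ...): a bare -j (no job limit) is dropped
    jflags = [w for w in jflags if w != "-j"]
    # $(lastword -j1 ...): default to -j1, take the last occurrence
    last = (["-j1"] + jflags)[-1]
    # $(JOBS_OPTION:-j%=%): strip the -j prefix for --max-parallel-tasks
    return last[len("-j"):]
```

So `make check-avocado -j8` runs at most 8 parallel tests, while a plain `make check-avocado` stays serial.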

 Thomas




Re: [RFC PATCH v1 10/26] migration/ram: Introduce 'fixed-ram' migration stream capability

2023-03-31 Thread Daniel P . Berrangé
On Thu, Mar 30, 2023 at 06:01:51PM -0400, Peter Xu wrote:
> On Thu, Mar 30, 2023 at 03:03:20PM -0300, Fabiano Rosas wrote:
> > From: Nikolay Borisov 
> > 
> > Implement 'fixed-ram' feature. The core of the feature is to ensure that
> > each ram page of the migration stream has a specific offset in the
> > resulting migration stream. The reasons why we'd want such behavior are
> > twofold:
> > 
> >  - When doing a 'fixed-ram' migration the resulting file will have a
> >bounded size, since pages which are dirtied multiple times will
> >always go to a fixed location in the file, rather than constantly
> >being added to a sequential stream. This eliminates cases where a vm
> >with, say, 1G of ram can result in a migration file that's 10s of
> >GBs, provided that the workload constantly redirties memory.
> > 
> >  - It paves the way to implement DIO-enabled save/restore of the
> >migration stream as the pages are ensured to be written at aligned
> >offsets.
> > 
> > The feature requires changing the stream format. First, a bitmap is
> > introduced which tracks which pages have been written (i.e. are
> > dirtied) during migration, and subsequently it is written in the
> > resulting file, again at a fixed location for every ramblock. Zero
> > pages are ignored as they'd be zero in the destination migration as
> > well. With the changed format data would look like the following:
> > 
> > |name len|name|used_len|pc*|bitmap_size|pages_offset|bitmap|pages|
> 
> What happens with huge pages?  Would page size matter here?
> 
> I would assume it's fine if it uses a constant (small) page size, assuming
> that should match with the granule that qemu tracks dirty (which IIUC is
> the host page size not guest's).
> 
> But I didn't yet pay any further thoughts on that, maybe it would be
> worthwhile in all cases to record page sizes here to be explicit or the
> meaning of bitmap may not be clear (and then the bitmap_size will be a
> field just for sanity check too).

I think recording the page sizes is an anti-feature in this case.

The migration format / state needs to reflect the guest ABI, but
we need to be free to have different backend config behind that
either side of the save/restore.

IOW, if I start a QEMU with 2 GB of RAM, I should be free to use
small pages initially and after restore use 2 x 1 GB hugepages,
or vice versa.

The important thing with the pages that are saved into the file
is that they are a 1:1 mapping guest RAM regions to file offsets.
IOW, the 2 GB of guest RAM is always a contiguous 2 GB region
in the file.

If the src VM used 1 GB pages, we would be writing a full 2 GB
of data assuming both pages were dirty.

If the src VM used 4k pages, we would be writing some subset of
the 2 GB of data, and the rest would be unwritten.

Either way, when reading back the data we restore it into either
1 GB pages or 4k pages, because any places that were unwritten
originally will read back as zeros.
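The 1:1 mapping Daniel describes can be sketched as follows (invented helper names, 4k pages, and an in-memory sparse file standing in for the real stream format — each page lands at pages_offset + index * page_size, only dirty pages are written, and holes read back as zeros):

```python
import io

PAGE = 4096

def save_fixed_ram(f, pages_offset, dirty, ram):
    # write each dirty page at its fixed offset; clean pages are skipped
    for idx, is_dirty in enumerate(dirty):
        if is_dirty:
            f.seek(pages_offset + idx * PAGE)
            f.write(ram[idx * PAGE:(idx + 1) * PAGE])

def load_fixed_ram(f, pages_offset, nr_pages):
    # unwritten holes read back as zeros, matching never-dirtied guest RAM
    f.seek(pages_offset)
    data = f.read(nr_pages * PAGE)
    return data.ljust(nr_pages * PAGE, b"\0")

f = io.BytesIO()
ram = bytes([1]) * PAGE + bytes([0]) * PAGE + bytes([2]) * PAGE
save_fixed_ram(f, 0, [True, False, True], ram)
restored = load_fixed_ram(f, 0, 3)
```

Note how the middle (zero, never-dirtied) page is never written yet still restores correctly, which is what makes the file size bounded regardless of how often the workload redirties memory.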

> If postcopy might be an option, we'd want the page size to be the host page
> size because then looking up the bitmap will be straightforward, deciding
> whether we should copy over page (UFFDIO_COPY) or fill in with zeros
> (UFFDIO_ZEROPAGE).

This format is only intended for the case where we are migrating to
a random-access medium, aka a file, because the fixed RAM mappings
to disk mean that we need to seek back to the original location to
re-write pages that get dirtied. It isn't suitable for a live
migration stream, and thus postcopy is inherently out of scope.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [RFC PATCH 3/5] ebpf: Added declaration/initialization routines.

2023-03-31 Thread Daniel P . Berrangé
On Fri, Mar 31, 2023 at 03:48:18PM +0800, Jason Wang wrote:
> On Thu, Mar 30, 2023 at 4:34 PM Daniel P. Berrangé  
> wrote:
> >
> > On Thu, Mar 30, 2023 at 02:54:32PM +0800, Jason Wang wrote:
> > > On Thu, Mar 30, 2023 at 8:33 AM Andrew Melnychenko  
> > > wrote:
> > >
> > > Who or how the ABI compatibility is preserved between libvirt and Qemu?
> >
> > There's no real problem with binary compatibility to solve any more.
> >
> > When libvirt first launches a QEMU VM, it will fetch the eBPF programs
> > it needs from that running QEMU using QMP. When it later needs to
> > enable features that use eBPF, it already has the program data that
> > matches the running QEMU.
> 
> Ok, then who will validate the eBPF program? I don't think libvirt can
> trust what is received from Qemu otherwise arbitrary eBPF programs
> could be executed by Qemu in this way. One example is that when guests
> escape to Qemu it can modify the rss_bpf__elf_bytes. Though
> BPF_PROG_TYPE_SOCKET_FILTER gives some of the restrictions, we still
> need to evaluate side effects of this. Or we need to find other ways
> like using the binary in libvirt or use rx filter events.

As I mentioned, when libvirt first launches QEMU it will fetch the
eBPF programs and keep them for later use. At that point the guest
CPUs haven't started running, and so QEMU is still sufficiently
trustworthy.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v3 5/6] target/riscv: Enable PC-relative translation in system mode

2023-03-31 Thread LIU Zhiwei



On 2023/3/31 9:45, Weiwei Li wrote:

The existence of CF_PCREL can improve performance with the guest
kernel's address space randomization.  Each guest process maps
libc.so (et al) at a different virtual address, and this allows
those translations to be shared.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
  target/riscv/cpu.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 646fa31a59..3b562d5d9f 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1193,6 +1193,8 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
  
  
  #ifndef CONFIG_USER_ONLY

+cs->tcg_cflags |= CF_PCREL;
+


Reviewed-by: LIU Zhiwei 

Zhiwei


  if (cpu->cfg.ext_sstc) {
  riscv_timer_init(cpu);
  }




Re: [PATCH for 8.1 v2 5/6] vdpa: move CVQ isolation check to net_init_vhost_vdpa

2023-03-31 Thread Jason Wang



在 2023/3/30 18:42, Eugenio Perez Martin 写道:

On Thu, Mar 30, 2023 at 8:23 AM Jason Wang  wrote:

On Thu, Mar 30, 2023 at 2:20 PM Jason Wang  wrote:

On Fri, Mar 24, 2023 at 3:54 AM Eugenio Pérez  wrote:

Evaluating it at start time instead of initialization time may make the
guest capable of dynamically adding or removing migration blockers.

Also, moving to initialization reduces the number of ioctls in the
migration, reducing failure possibilities.

As a drawback we need to check for CVQ isolation twice: one time with no
MQ negotiated and another one acking it, as long as the device supports
it.  This is because Vring ASID / group management is based on vq
indexes, but we don't know the index of CVQ before negotiating MQ.

We need to fail if we see a device that can isolate cvq without MQ but
not with MQ.
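The vq-index dependency above can be made concrete: per the virtio-net spec, the control queue index is 2 when MQ is not negotiated and 2 * max_virtqueue_pairs when it is, which is why the isolation probe has to run twice. A sketch (the helper name is invented; the feature bit is VIRTIO_NET_F_MQ, bit 22):

```python
VIRTIO_NET_F_MQ = 1 << 22  # virtio-net multiqueue feature bit

def cvq_index(negotiated_features, max_virtqueue_pairs):
    # data queues occupy indexes [0, 2 * pairs); CVQ comes right after
    if negotiated_features & VIRTIO_NET_F_MQ:
        return 2 * max_virtqueue_pairs
    return 2  # only rx0/tx0 precede the control queue
```

So a device advertising 8 queue pairs has its CVQ at index 16 with MQ negotiated but at index 2 without, and the ASID/group queries must target the right one.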


Signed-off-by: Eugenio Pérez 
---
v2: Take out the reset of the device from vhost_vdpa_cvq_is_isolated
---
  net/vhost-vdpa.c | 194 ---
  1 file changed, 151 insertions(+), 43 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 4397c0d4b3..db2c9afcb3 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -43,6 +43,13 @@ typedef struct VhostVDPAState {

  /* The device always have SVQ enabled */
  bool always_svq;
+
+/* The device can isolate CVQ in its own ASID if MQ is negotiated */
+bool cvq_isolated_mq;
+
+/* The device can isolate CVQ in its own ASID if MQ is not negotiated */
+bool cvq_isolated;

As stated above, if we need a device that cvq_isolated_mq^cvq_isolated
== true, we need to fail. This may reduce the complexity of the code?

Thanks

Since we are the mediation layer, Qemu can always choose to negotiate
MQ regardless whether or not it is supported by the guest. In this
way, we can have a stable virtqueue index for cvq.


I think it is a great idea and it simplifies this patch somehow.
However, we need something like the queue mapping [1] to do so :).

To double confirm:
* If the device supports MQ, only probe MQ. If not, only probe !MQ.
* Only store cvq_isolated in VhostVDPAState.

Now, if the device does not negotiate MQ but the device supports MQ:



I'm not sure I understand here; if the device supports MQ it should accept
MQ, or we can fail the initialization here.




* All the requests to queue 3 must be redirected to the last queue in
the device. That includes set_vq_address, notifiers regions, etc.



This also means we will only mediate the case:

1) Qemu emulated virtio-net has 1 queue but the device supports multiple queues

but not

2) Qemu emulated virtio-net has M queues but the device supports N queues (N>M)




I'm totally ok to go this route but it's not immediate.



Yes but I mean, we can start from failing the device if 
cvq_isolated_mq^cvq_isolated == true (or I wonder if we can meet this 
condition for any existing parents).


Thanks




Thanks!

[1] https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg07157.html






Re: [RFC PATCH 3/5] ebpf: Added declaration/initialization routines.

2023-03-31 Thread Jason Wang
On Fri, Mar 31, 2023 at 3:59 PM Daniel P. Berrangé  wrote:
>
> On Fri, Mar 31, 2023 at 03:48:18PM +0800, Jason Wang wrote:
> > On Thu, Mar 30, 2023 at 4:34 PM Daniel P. Berrangé  
> > wrote:
> > >
> > > On Thu, Mar 30, 2023 at 02:54:32PM +0800, Jason Wang wrote:
> > > > On Thu, Mar 30, 2023 at 8:33 AM Andrew Melnychenko  
> > > > wrote:
> > > >
> > > > Who or how the ABI compatibility is preserved between libvirt and Qemu?
> > >
> > > There's no real problem with binary compatibility to solve any more.
> > >
> > > When libvirt first launches a QEMU VM, it will fetch the eBPF programs
> > > it needs from that running QEMU using QMP. When it later needs to
> > > enable features that use eBPF, it already has the program data that
> > > matches the running QEMU.
> >
> > Ok, then who will validate the eBPF program? I don't think libvirt can
> > trust what is received from Qemu otherwise arbitrary eBPF programs
> > could be executed by Qemu in this way. One example is that when guests
> > escape to Qemu it can modify the rss_bpf__elf_bytes. Though
> > BPF_PROG_TYPE_SOCKET_FILTER gives some of the restrictions, we still
> > need to evaluate side effects of this. Or we need to find other ways
> > like using the binary in libvirt or use rx filter events.
>
> As I mentioned, when libvirt first launches QEMU it will fetch the
> eBPF programs and keep them for later use. At that point the guest
> CPUs haven't started running, and so QEMU is still sufficiently
> trustworthy.

Well, this means the QMP command is safe only before Qemu starts to
run vCPUs. I'm not sure this is a good design. Or at least we need to
fail the QMP command once vCPUs start to run.

Thanks

>
> With regards,
> Daniel
> --
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
>




Re: [RFC PATCH 3/5] ebpf: Added declaration/initialization routines.

2023-03-31 Thread Daniel P . Berrangé
On Fri, Mar 31, 2023 at 04:03:39PM +0800, Jason Wang wrote:
> On Fri, Mar 31, 2023 at 3:59 PM Daniel P. Berrangé  
> wrote:
> >
> > On Fri, Mar 31, 2023 at 03:48:18PM +0800, Jason Wang wrote:
> > > On Thu, Mar 30, 2023 at 4:34 PM Daniel P. Berrangé  
> > > wrote:
> > > >
> > > > On Thu, Mar 30, 2023 at 02:54:32PM +0800, Jason Wang wrote:
> > > > > On Thu, Mar 30, 2023 at 8:33 AM Andrew Melnychenko 
> > > > >  wrote:
> > > > >
> > > > > Who or how the ABI compatibility is preserved between libvirt and 
> > > > > Qemu?
> > > >
> > > > There's no real problem with binary compatibility to solve any more.
> > > >
> > > > When libvirt first launches a QEMU VM, it will fetch the eBPF programs
> > > > it needs from that running QEMU using QMP. When it later needs to
> > > > enable features that use eBPF, it already has the program data that
> > > > matches the running QEMU.
> > >
> > > Ok, then who will validate the eBPF program? I don't think libvirt can
> > > trust what is received from Qemu otherwise arbitrary eBPF programs
> > > could be executed by Qemu in this way. One example is that when guests
> > > escape to Qemu it can modify the rss_bpf__elf_bytes. Though
> > > BPF_PROG_TYPE_SOCKET_FILTER gives some of the restrictions, we still
> > > need to evaluate side effects of this. Or we need to find other ways
> > > like using the binary in libvirt or use rx filter events.
> >
> > As I mentioned, when libvirt first launches QEMU it will fetch the
> > eBPF programs and keep them for later use. At that point the guest
> > CPUs haven't started running, and so QEMU is still sufficiently
> > trustworthy.
> 
> Well, this means the QMP command is safe only before Qemu starts to
> run vCPUs. I'm not sure this is a good design. Or at least we need to
> fail the QMP command once vCPUs start to run.

Currently QEMU has the ability to just create the eBPF programs itself
at will, when it is launched in a privileged scenario regardless of
guest CPU state. In terms of QMP, the reporting of QEMU PIDs for its
various vCPU and I/O threads is also not to be trusted after vCPUs start
if the guest workload is not trustworthy. I feel this is more of a docs
problem to explain the caveats that apps should be aware of.


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [RFC PATCH 3/5] ebpf: Added declaration/initialization routines.

2023-03-31 Thread Jason Wang
On Fri, Mar 31, 2023 at 4:13 PM Daniel P. Berrangé  wrote:
>
> On Fri, Mar 31, 2023 at 04:03:39PM +0800, Jason Wang wrote:
> > On Fri, Mar 31, 2023 at 3:59 PM Daniel P. Berrangé  
> > wrote:
> > >
> > > On Fri, Mar 31, 2023 at 03:48:18PM +0800, Jason Wang wrote:
> > > > On Thu, Mar 30, 2023 at 4:34 PM Daniel P. Berrangé 
> > > >  wrote:
> > > > >
> > > > > On Thu, Mar 30, 2023 at 02:54:32PM +0800, Jason Wang wrote:
> > > > > > On Thu, Mar 30, 2023 at 8:33 AM Andrew Melnychenko 
> > > > > >  wrote:
> > > > > >
> > > > > > Who or how the ABI compatibility is preserved between libvirt and 
> > > > > > Qemu?
> > > > >
> > > > > There's no real problem with binary compatibility to solve any more.
> > > > >
> > > > > When libvirt first launches a QEMU VM, it will fetch the eBPF programs
> > > > > it needs from that running QEMU using QMP. When it later needs to
> > > > > enable features that use eBPF, it already has the program data that
> > > > > matches the running QEMU.
> > > >
> > > > Ok, then who will validate the eBPF program? I don't think libvirt can
> > > > trust what is received from Qemu otherwise arbitrary eBPF programs
> > > > could be executed by Qemu in this way. One example is that when guests
> > > > escape to Qemu it can modify the rss_bpf__elf_bytes. Though
> > > > BPF_PROG_TYPE_SOCKET_FILTER gives some of the restrictions, we still
> > > > need to evaluate side effects of this. Or we need to find other ways
> > > > like using the binary in libvirt or use rx filter events.
> > >
> > > As I mentioned, when libvirt first launches QEMU it will fetch the
> > > eBPF programs and keep them for later use. At that point the guest
> > > CPUs haven't started running, and so QEMU is still sufficiently
> > > trustworthy.
> >
> > Well, this means the QMP command is safe only before Qemu starts to
> > run vCPUs. I'm not sure this is a good design. Or at least we need to
> > fail the QMP command once vCPUs start to run.
>
> Currently QEMU has the ability to just create the eBPF programs itself
> at will, when it is launched in a privileged scenario regardless of
> guest CPU state. In terms of QMP, the reporting of QEMU PIDs for its
> various vCPU and I/O threads is also not to be trusted after vCPUs start
> if the guest workload is not trustworthy.

Indeed.

> I feel this is more of a docs
> problem to explain the caveats that apps should be aware of.

Ok, we can probably document this and in the future we probably need
to address them.

Thanks

>
>
> With regards,
> Daniel
> --
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
>




Re: [RFC PATCH 0/3] configure: create a python venv and install meson

2023-03-31 Thread Paolo Bonzini

On 3/30/23 16:11, John Snow wrote:

* undo the meson parts from PATCH 3; make patch 3 create the venv +
subsume the MKVENV parts of the Makefiles + always set
explicit_python=yes (so that at this point the in-tree meson is always
used).

* add a patch that starts rejecting --meson=/path/to/meson and drops
explicit_python (instead using pyvenv/bin/meson to check whether a
system meson is usable)

* make Meson use a sphinx-build binary from the virtual environment
(i.e. pass -Dsphinx_build=$PWD/pyvenv/bin/sphinx-build)


Yep, let's talk about this part in particular.


Oh, wait, for this one I already have a patch from my experiment that
used importlib.metadata to look up the entry point dynamically[1] (and
that's where the shim idea developed from).  All I need to do is change
the path passed to find_program() and rewrite the commit message.

Paolo

[1] 
https://lore.kernel.org/qemu-devel/2c63f79d-b46d-841b-bed3-0dca33eab...@redhat.com/

--- 8< 
From: Paolo Bonzini 
Subject: [PATCH] meson: pick sphinx-build from virtual environment

configure is now creating a virtual environment and populating it
with shim binaries that always refer to the correct Python runtime.
docs/meson.build can rely on this, and stop using a sphinx_build
option that may or may not refer to the same version of Python that
is used for the rest of the build.

In the long term, it may actually make sense for Meson's Python
module to include the logic to build such shims, so that other
programs can do the same without needing a full-blown virtual
environment.  However, in the context of QEMU there is no need to
wait for that; QEMU's meson.build already relies on config-host.mak
and on the target list that configure prepares, i.e. it is not
standalone.

Signed-off-by: Paolo Bonzini 


diff --git a/docs/conf.py b/docs/conf.py
index 7e215aa9a5c6..c687ff266301 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -32,15 +32,6 @@
 from distutils.version import LooseVersion
 from sphinx.errors import ConfigError
 
-# Make Sphinx fail cleanly if using an old Python, rather than obscurely

-# failing because some code in one of our extensions doesn't work there.
-# In newer versions of Sphinx this will display nicely; in older versions
-# Sphinx will also produce a Python backtrace but at least the information
-# gets printed...
-if sys.version_info < (3,7):
-raise ConfigError(
-"QEMU requires a Sphinx that uses Python 3.7 or better\n")
-
 # The per-manual conf.py will set qemu_docdir for a single-manual build;
 # otherwise set it here if this is an entire-manual-set build.
 # This is always the absolute path of the docs/ directory in the source tree.
diff --git a/docs/meson.build b/docs/meson.build
index f220800e3e59..1c5fd66bfa7f 100644
--- a/docs/meson.build
+++ b/docs/meson.build
@@ -1,5 +1,6 @@
-sphinx_build = find_program(get_option('sphinx_build'),
# This assumes that Python is inside the venv that configure prepares
+sphinx_build = find_program(fs.parent(python.full_path()) / 'sphinx-build',
 required: get_option('docs'))
 
 # Check if tools are available to build documentation.

diff --git a/meson_options.txt b/meson_options.txt
index b541ab2851dd..8dedec0cf91a 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -12,8 +12,6 @@ option('pkgversion', type : 'string', value : '',
description: 'use specified string as sub-version of the package')
 option('smbd', type : 'string', value : '',
description: 'Path to smbd for slirp networking')
-option('sphinx_build', type : 'string', value : 'sphinx-build',
-   description: 'Use specified sphinx-build for building document')
 option('iasl', type : 'string', value : '',
description: 'Path to ACPI disassembler')
 option('tls_priority', type : 'string', value : 'NORMAL',
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index bf852f4b957e..6a71c3bad296 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -58,8 +58,6 @@ meson_options_help() {
   printf "%s\n" '  --localedir=VALUELocale data directory 
[share/locale]'
   printf "%s\n" '  --localstatedir=VALUELocalstate data directory 
[/var/local]'
   printf "%s\n" '  --mandir=VALUE   Manual page directory [share/man]'
-  printf "%s\n" '  --sphinx-build=VALUE Use specified sphinx-build for 
building document'
-  printf "%s\n" '   [sphinx-build]'
   printf "%s\n" '  --sysconfdir=VALUE   Sysconf data directory [etc]'
   printf "%s\n" '  --tls-priority=VALUE Default TLS protocol/cipher 
priority string'
   printf "%s\n" '   [NORMAL]'
@@ -429,7 +427,6 @@ _meson_option_parse() {
 --disable-sndio) printf "%s" -Dsndio=disabled ;;
 --enable-sparse) printf "%s" -Dsparse=enabled ;;
 --disable-sparse) printf "%s" -Dsparse=disabled ;;
---sphinx-build=*) quot

Re: [RFC PATCH 1/3] python: add mkvenv.py

2023-03-31 Thread Paolo Bonzini

On 3/30/23 16:00, John Snow wrote:

 > +                yield {
 > +                    'name': entry_point.name
,
 > +                    'module': module,
 > +                    'import_name': attr,
 > +                    'func': attr,

What about using a dataclass or namedtuple instead of a dictionary?


Sure. Once 3.8 is our minimum there's no point, though.


Well, that's why I also mentioned namedtuples.  But no big deal.
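The suggestion can be sketched with a dataclass, reusing the keys from the quoted hunk (EntryPointShim is an invented name, and the example values are purely illustrative):

```python
from dataclasses import dataclass

@dataclass
class EntryPointShim:
    # same information the dict literal carried, but typed and
    # attribute-accessed instead of key-accessed
    name: str
    module: str
    import_name: str
    func: str

ep = EntryPointShim(name="meson", module="mesonbuild.mesonmain",
                    import_name="main", func="main")
```

A namedtuple would give the same attribute access with less machinery; either way the shape of the record is checked in one place rather than by convention at every call site.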


BTW, another way to repair Debian 10's pip is to create a symbolic link
to sys.base_prefix + '/share/python-wheels' in sys.prefix +
'/share/python-wheels'.  Since this is much faster, perhaps it can be
done unconditionally and checkpip mode can go away together with
self._context?
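A sketch of the symlink repair described above (function names are invented, and the prefixes are passed explicitly instead of reading sys.prefix/sys.base_prefix so the path computation is testable; real code would also need the existence checks shown):

```python
import os

def wheel_link_paths(prefix, base_prefix):
    # in a venv, sys.prefix != sys.base_prefix; Debian 10's patched pip
    # looks for shared wheels under the venv prefix, so point it at the
    # system copy
    src = os.path.join(base_prefix, "share", "python-wheels")
    dst = os.path.join(prefix, "share", "python-wheels")
    return src, dst

def repair_debian10_pip(prefix, base_prefix):
    src, dst = wheel_link_paths(prefix, base_prefix)
    if os.path.isdir(src) and not os.path.exists(dst):
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        os.symlink(src, dst)
```

Since this only creates one symlink, it is indeed much faster than re-running pip's self-check, at the cost of encoding a distro-specific directory layout.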


I guess I like it less because it's way more Debian-specific at that 
point. I think I'd sooner say "Sorry, Debian 10 isn't supported!"


(Or encourage users to upgrade their pip/setuptools/ensurepip to 
something that doesn't trigger the bug.)


Or, IOW, I feel like it's normal to expect ensurepip to work, but messing
around with symlinks to special directories created by a distribution
just feels way more fiddly.


No doubt about that.  It's just the balance between simple fiddly code 
and more robust code that is also longer.


Anyhow, later on we will split mkvenv.py into multiple patches so it will
be easy to revert checkpip when the time comes.
3.7 is dropped for good rather than being just "untested but should 
work", this Debian 10 hack and the importlib_metadata/pkg_resources 
fallbacks go away at the same time.


Paolo




virtio-net-failover intermittent test hangs eating CPU on s390 host

2023-03-31 Thread Peter Maydell
Found a couple of virtio-net-failover test processes sat on the
s390 CI runner with the virtio-net-failover process eating CPU.
Backtrace (I captured from both, but the backtraces are the same
in both cases):


Process tree:
virtio-net-fail(3435488)---qemu-system-i38(3435776)
===
PROCESS: 3435488
gitlab-+ 3435488 3415953 24 Mar30 ?04:01:46
/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/tests/qtest/virtio-net-failover
--tap -k
[New LWP 3435489]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/s390x-linux-gnu/libthread_db.so.1".
__libc_send (fd=fd@entry=3, buf=buf@entry=0x2aa08e5f5c0,
len=len@entry=29, flags=flags@entry=0) at
../sysdeps/unix/sysv/linux/send.c:30
30  ../sysdeps/unix/sysv/linux/send.c: No such file or directory.

Thread 2 (Thread 0x3ffb25ff900 (LWP 3435489)):
#0  syscall () at ../sysdeps/unix/sysv/linux/s390/s390-64/syscall.S:37
#1  0x02aa086d9cf4 in qemu_futex_wait (val=<optimized out>,
f=<optimized out>) at
/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/include/qemu/futex.h:29
#2  qemu_event_wait (ev=ev@entry=0x2aa0874b890 )
at ../util/qemu-thread-posix.c:464
#3  0x02aa08705e82 in call_rcu_thread (opaque=opaque@entry=0x0) at
../util/rcu.c:261
#4  0x02aa086d8d5a in qemu_thread_start (args=<optimized out>) at
../util/qemu-thread-posix.c:541
#5  0x03ffb2887e66 in start_thread (arg=0x3ffb25ff900) at
pthread_create.c:477
#6  0x03ffb277cbe6 in thread_start () at
../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65

Thread 1 (Thread 0x3ffb2cf2770 (LWP 3435488)):
#0  __libc_send (fd=fd@entry=3, buf=buf@entry=0x2aa08e5f5c0,
len=len@entry=29, flags=flags@entry=0) at
../sysdeps/unix/sysv/linux/send.c:30
#1  0x02aa086d5878 in qemu_send_full (s=s@entry=3,
buf=0x2aa08e5f5c0, count=count@entry=29) at ../util/osdep.c:509
#2  0x02aa086aab8a in socket_send (size=<optimized out>,
buf=<optimized out>, fd=3) at ../tests/qtest/libqmp.c:172
#3  _qmp_fd_vsend_fds (fd=<optimized out>, fds=<optimized out>,
fds@entry=0x0, fds_num=fds_num@entry=0, fmt=<optimized out>,
ap=ap@entry=0x3ffd0679f00) at ../tests/qtest/libqmp.c:172
#4  0x02aa086aaf72 in qmp_fd_vsend (fd=<optimized out>,
fmt=<optimized out>, ap=ap@entry=0x3ffd0679f00) at
../tests/qtest/libqmp.c:190
#5  0x02aa086a886c in qtest_qmp_vsend (ap=0x3ffd0679f00,
fmt=<optimized out>, s=0x2aa08e63d70) at ../tests/qtest/libqtest.c:788
#6  qtest_vqmp (ap=0x3ffd0679f00, fmt=<optimized out>,
s=0x2aa08e63d70) at ../tests/qtest/libqtest.c:762
#7  qtest_qmp (s=0x2aa08e63d70, fmt=<optimized out>) at
../tests/qtest/libqtest.c:788
#8  0x02aa086911d0 in migrate_status (qts=<optimized out>) at
../tests/qtest/virtio-net-failover.c:596
#9  0x02aa0869cee0 in test_migrate_off_abort (opaque=<optimized out>) at ../tests/qtest/virtio-net-failover.c:1425
#10 0x03ffb2a7e608 in ?? () from /lib/s390x-linux-gnu/libglib-2.0.so.0
#11 0x03ffb2a7e392 in ?? () from /lib/s390x-linux-gnu/libglib-2.0.so.0
#12 0x03ffb2a7e392 in ?? () from /lib/s390x-linux-gnu/libglib-2.0.so.0
#13 0x03ffb2a7e392 in ?? () from /lib/s390x-linux-gnu/libglib-2.0.so.0
#14 0x03ffb2a7e392 in ?? () from /lib/s390x-linux-gnu/libglib-2.0.so.0
#15 0x03ffb2a7eada in g_test_run_suite () from
/lib/s390x-linux-gnu/libglib-2.0.so.0
#16 0x03ffb2a7eb10 in g_test_run () from
/lib/s390x-linux-gnu/libglib-2.0.so.0
#17 0x02aa086905e2 in main (argc=<optimized out>, argv=<optimized out>) at ../tests/qtest/virtio-net-failover.c:1897
[Inferior 1 (process 3435488) detached]

===
PROCESS: 3435776
gitlab-+ 3435776 3435488 18 Mar30 ?03:04:00 ./qemu-system-i386
-qtest unix:/tmp/qtest-3435488.sock -qtest-log /dev/null -chardev
socket,path=/tmp/qtest-3435488.qmp,id=char0 -mon
chardev=char0,mode=control -display none -M q35 -nodefaults -device
pcie-root-port,id=root0,addr=0x1,bus=pcie.0,chassis=1 -device
pcie-root-port,id=root1,addr=0x2,bus=pcie.0,chassis=2 -netdev
user,id=hs0 -netdev user,id=hs1 -accel qtest
[New LWP 3435778]
[New LWP 3435779]
[New LWP 3435780]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/s390x-linux-gnu/libthread_db.so.1".
0x03ff8e871c8c in __ppoll (fds=0x2aa37996d80, nfds=5,
timeout=<optimized out>, timeout@entry=0x3ffcf0fa428,
sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
44  ../sysdeps/unix/sysv/linux/ppoll.c: No such file or directory.

Thread 4 (Thread 0x3ff7e9a0900 (LWP 3435780)):
#0  futex_wait_cancelable (private=0, expected=0,
futex_word=0x2aa3789d928) at ../sysdeps/nptl/futex-internal.h:183
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0,
mutex=0x2aa36313260 , cond=0x2aa3789d900) at
pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=cond@entry=0x2aa3789d900,
mutex=mutex@entry=0x2aa36313260 ) at
pthread_cond_wait.c:647
#3  0x02aa35a3d4be in qemu_cond_wait_impl (cond=0x2aa3789d900,
mutex=0x2aa36313260 , file=0x2aa35b84c4c
"../softmmu/cpus.c", line=<optimized out>) at
../util/qemu-thread-posix.c:225
#4  0x02aa3566df2e in qemu_wait_io_event
(cpu=cpu@entry=0x2aa37897350) at ../softmmu/cpus.c:424
#5  0x02aa356df704 in dummy_cpu_thread_fn
(arg=arg@entry=0x2aa378

Re: [RFC PATCH 1/3] python: add mkvenv.py

2023-03-31 Thread Paolo Bonzini

On 3/31/23 10:44, Paolo Bonzini wrote:


    What about using a dataclass or namedtuple instead of a dictionary?


Sure. Once 3.8 is our minimum there's no point, though.


Well, that's why I also mentioned namedtuples.  But no big deal.


Sorry, I misunderstood this (I read "until 3.8 is our minimum" and 
interpreted that as "dataclasses are not in 3.6").


I agree, not much need to future-proof the <=3.7 parts of the code.

Paolo




Re: [PATCH] hw/ssi: Fix Linux driver init issue with xilinx_spi

2023-03-31 Thread Edgar E. Iglesias
On Thu, Mar 23, 2023 at 7:29 PM Chris Rauer  wrote:

> The problem is that the Linux driver expects the master transaction inhibit
> bit(R_SPICR_MTI) to be set during driver initialization so that it can
> detect the fifo size but QEMU defaults it to zero out of reset.  The
> datasheet indicates this bit is active on reset.
>
> See page 25, SPI Control Register section:
>
> https://www.xilinx.com/content/dam/xilinx/support/documents/ip_documentation/axi_quad_spi/v3_2/pg153-axi-quad-spi.pdf
>
>
Yes, MTI should be set when the device comes out of reset.

Reviewed-by: Edgar E. Iglesias 



> Signed-off-by: Chris Rauer 
> ---
>  hw/ssi/xilinx_spi.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/hw/ssi/xilinx_spi.c b/hw/ssi/xilinx_spi.c
> index 552927622f..d4de2e7aab 100644
> --- a/hw/ssi/xilinx_spi.c
> +++ b/hw/ssi/xilinx_spi.c
> @@ -156,6 +156,7 @@ static void xlx_spi_do_reset(XilinxSPI *s)
>  txfifo_reset(s);
>
>  s->regs[R_SPISSR] = ~0;
> +s->regs[R_SPICR] = R_SPICR_MTI;
>  xlx_spi_update_irq(s);
>  xlx_spi_update_cs(s);
>  }
> --
> 2.40.0.348.gf938b09366-goog
>
>
>


Re: [PATCH] hw/ssi: Fix Linux driver init issue with xilinx_spi

2023-03-31 Thread Peter Maydell
On Fri, 31 Mar 2023 at 11:09, Edgar E. Iglesias
 wrote:
>
>
> On Thu, Mar 23, 2023 at 7:29 PM Chris Rauer  wrote:
>>
>> The problem is that the Linux driver expects the master transaction inhibit
>> bit(R_SPICR_MTI) to be set during driver initialization so that it can
>> detect the fifo size but QEMU defaults it to zero out of reset.  The
>> datasheet indicates this bit is active on reset.
>>
>> See page 25, SPI Control Register section:
>> https://www.xilinx.com/content/dam/xilinx/support/documents/ip_documentation/axi_quad_spi/v3_2/pg153-axi-quad-spi.pdf
>>
>
> Yes, MTI should be set when the device comes out of reset.
>
> Reviewed-by: Edgar E. Iglesias 

Thanks; applied to target-arm.next for 8.0.

-- PMM



Re: [PATCH for 8.1 v2 5/6] vdpa: move CVQ isolation check to net_init_vhost_vdpa

2023-03-31 Thread Eugenio Perez Martin
On Fri, Mar 31, 2023 at 10:00 AM Jason Wang  wrote:
>
>
> 在 2023/3/30 18:42, Eugenio Perez Martin 写道:
> > On Thu, Mar 30, 2023 at 8:23 AM Jason Wang  wrote:
> >> On Thu, Mar 30, 2023 at 2:20 PM Jason Wang  wrote:
> >>> On Fri, Mar 24, 2023 at 3:54 AM Eugenio Pérez  wrote:
>  Evaluating it at start time instead of initialization time may make the
>  guest capable of dynamically adding or removing migration blockers.
> 
>  Also, moving to initialization reduces the number of ioctls in the
>  migration, reducing failure possibilities.
> 
>  As a drawback we need to check for CVQ isolation twice: one time with no
>  MQ negotiated and another one acking it, as long as the device supports
>  it.  This is because Vring ASID / group management is based on vq
>  indexes, but we don't know the index of CVQ before negotiating MQ.
> >>> We need to fail if we see a device that can isolate cvq without MQ but
> >>> not with MQ.
> >>>
>  Signed-off-by: Eugenio Pérez 
>  ---
>  v2: Take out the reset of the device from vhost_vdpa_cvq_is_isolated
>  ---
>    net/vhost-vdpa.c | 194 ---
>    1 file changed, 151 insertions(+), 43 deletions(-)
> 
>  diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
>  index 4397c0d4b3..db2c9afcb3 100644
>  --- a/net/vhost-vdpa.c
>  +++ b/net/vhost-vdpa.c
>  @@ -43,6 +43,13 @@ typedef struct VhostVDPAState {
> 
>    /* The device always have SVQ enabled */
>    bool always_svq;
>  +
>  +/* The device can isolate CVQ in its own ASID if MQ is negotiated */
>  +bool cvq_isolated_mq;
>  +
>  +/* The device can isolate CVQ in its own ASID if MQ is not 
>  negotiated */
>  +bool cvq_isolated;
> >>> As stated above, if we need a device that cvq_isolated_mq^cvq_isolated
> >>> == true, we need to fail. This may reduce the complexity of the code?
> >>>
> >>> Thanks
> >> Since we are the mediation layer, Qemu can always choose to negotiate
> >> MQ regardless whether or not it is supported by the guest. In this
> >> way, we can have a stable virtqueue index for cvq.
> >>
> > I think it is a great idea and it simplifies this patch somehow.
> > However, we need something like the queue mapping [1] to do so :).
> >
> > To double confirm:
> > * If the device supports MQ, only probe MQ. If not, only probe !MQ.
> > * Only store cvq_isolated in VhostVDPAState.
> >
> > Now, if the device does not negotiate MQ but the device supports MQ:
>
>
> I'm not sure I understand here, if the device supports MQ it should accept
> MQ or we can fail the initialization here.
>

My fault, I wanted to say "if the device offers MQ but the driver does
not ack it".

>
> > * All the requests to queue 3 must be redirected to the last queue in
> > the device. That includes set_vq_address, notifiers regions, etc.
>
>
> This also means we will only mediate the case:
>
> 1) Qemu emulated virtio-net has 1 queue but device support multiple queue
>
> but not
>
> 2) Qemu emulated virtio-net has M queue but device support N queue (N>M)
>

Right.

>
> >
> > I'm totally ok to go this route but it's not immediate.
>
>
> Yes but I mean, we can start from failing the device if
> cvq_isolated_mq^cvq_isolated == true
>

So probe the two cases but set VhostVDPAState->cvq_isolated =
cvq_isolated && cvq_mq_isolated then? No map involved that way, and
all parents should behave that way.

> (or I wonder if we can meet this condition for any existing parents).

I don't think so, but I think we need to probe the two anyway.
Otherwise we may change the dataplane asid too.

Thanks!
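(For illustration, the probe-both-then-AND logic discussed above reduces to something like the following standalone sketch; the helper name and shape are invented, not the actual net/vhost-vdpa.c code:)

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: given the two probe results (CVQ isolation
 * without MQ negotiated, and with MQ negotiated), fail with -1 when
 * the parent behaves inconsistently (isolated ^ isolated_mq), which
 * the discussion above suggests should abort device initialization.
 * Otherwise store the single resulting flag. */
static int resolve_cvq_isolation(bool isolated, bool isolated_mq,
                                 bool *out)
{
    if (isolated != isolated_mq) {
        return -1;              /* inconsistent parent: fail init */
    }
    *out = isolated && isolated_mq;
    return 0;
}
```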




Re: [PATCH 1/2] tests/requirements.txt: bump up avocado-framework version to 101.0

2023-03-31 Thread Alex Bennée


Kautuk Consul  writes:

> Hi,
> On 2023-03-27 07:50:29, Kautuk Consul wrote:
>> Avocado version 101.0 has a fix to re-compute the checksum
>> of an asset file if the algorithm used in the *-CHECKSUM
>> file isn't the same as the one being passed to it by the
>> avocado user (i.e. the avocado_qemu python module).
>> In the earlier avocado versions this fix wasn't there due
>> to which if the checksum wouldn't match the earlier
>> checksum (calculated by a different algorithm), the avocado
>> code would start downloading a fresh image from the internet
>> URL thus making the test-cases take longer to execute.
>> 
>> Bump up the avocado-framework version to 101.0.
> Any comments on this ? I have tested this patch and it seems to work
> fine with the avocado test-cases.

I'm dropping this from the for-8.0 series as it causes a bunch of
failures in tests. I'll keep it in testing/next for when the tree
re-opens.
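(For reference, the recomputation described in the commit message quoted above can be sketched standalone like this; the helper names are invented and this is not avocado's actual code:)

```python
import hashlib

# Guess the algorithm recorded in a *-CHECKSUM file from the hex
# digest length, so a stale digest can be recomputed with the right
# algorithm instead of re-downloading the asset.
DIGEST_LEN_TO_ALGO = {32: "md5", 40: "sha1", 64: "sha256", 128: "sha512"}

def stored_algo(checksum_hex):
    """Guess the hash algorithm from the stored digest length."""
    return DIGEST_LEN_TO_ALGO.get(len(checksum_hex))

def file_digest(path, algo):
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def asset_is_valid(path, expected_hex, wanted_algo="sha256"):
    # Verify with the algorithm the stored digest was made with,
    # falling back to the caller's preferred one.
    algo = stored_algo(expected_hex) or wanted_algo
    return file_digest(path, algo) == expected_hex
```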

>> 
>> Signed-off-by: Kautuk Consul 
>> Tested-by: Hariharan T S 
>> ---
>>  tests/requirements.txt | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/tests/requirements.txt b/tests/requirements.txt
>> index 0ba561b6bd..a6f73da681 100644
>> --- a/tests/requirements.txt
>> +++ b/tests/requirements.txt
>> @@ -2,5 +2,5 @@
>>  # in the tests/venv Python virtual environment. For more info,
>>  # refer to: https://pip.pypa.io/en/stable/user_guide/#id1
>>  # Note that qemu.git/python/ is always implicitly installed.
>> -avocado-framework==88.1
>> +avocado-framework==101.0
>>  pycdlib==1.11.0
>> -- 
>> 2.39.2
>> 
>> 


-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PATCH v4 1/3] qtest: Add functions for accessing devices on Aspeed I2C controller

2023-03-31 Thread Thomas Huth

On 28/03/2023 19.19, Stefan Berger wrote:

Add read and write functions for accessing registers of I2C devices
connected to the Aspeed I2C controller.

Signed-off-by: Stefan Berger 
Reviewed-by: Cédric Le Goater 
Reviewed-by: Ninad Palsule 
---
  include/hw/i2c/aspeed_i2c.h |   7 +++
  tests/qtest/qtest_aspeed.c  | 117 
  tests/qtest/qtest_aspeed.h  |  41 +
  3 files changed, 165 insertions(+)
  create mode 100644 tests/qtest/qtest_aspeed.c
  create mode 100644 tests/qtest/qtest_aspeed.h


Acked-by: Thomas Huth 




Re: [PATCH v4 2/3] qtest: Move tpm_util_tis_transmit() into tpm-tis-utils.c and rename it

2023-03-31 Thread Thomas Huth

On 28/03/2023 19.19, Stefan Berger wrote:

To be able to remove tpm_tis_base_addr from test cases that do not really
need it move the tpm_util_tis_transmit() function into tpm-tis-utils.c and
rename it to tpm_tis_transmit().

Fix a locality parameter in a test case on the way.

Signed-off-by: Stefan Berger 
Reviewed-by: Ninad Palsule 
---
  tests/qtest/tpm-crb-swtpm-test.c|  3 --
  tests/qtest/tpm-crb-test.c  |  3 --
  tests/qtest/tpm-tis-device-swtpm-test.c |  5 +--
  tests/qtest/tpm-tis-swtpm-test.c|  5 +--
  tests/qtest/tpm-tis-util.c  | 47 -
  tests/qtest/tpm-tis-util.h  |  4 +++
  tests/qtest/tpm-util.c  | 45 ---
  tests/qtest/tpm-util.h  |  3 --
  8 files changed, 56 insertions(+), 59 deletions(-)


Reviewed-by: Thomas Huth 




Re: [PATCH 0/5] Cleanup [h_enter|spapr_exit]_nested routines

2023-03-31 Thread Cédric Le Goater

On 3/31/23 08:53, Harsh Prateek Bora wrote:

This patchset introduces helper routines to enable (and does) cleaning
up of h_enter_nested() and spapr_exit_nested() routines in existing api
for nested virtualization on Power/SPAPR for better code readability /
maintenance. No functional changes intended with this patchset.


Adding Nick since he did most of this work.

C.




Harsh Prateek Bora (5):
   ppc: spapr: cleanup cr get/store with helper routines.
   ppc: spapr: cleanup h_enter_nested() with helper routines.
   ppc: spapr: assert early rather late in h_enter_nested()
   ppc: spapr: cleanup spapr_exit_nested() with helper routines.
   MAINTAINERS: Adding myself in the list for ppc/spapr

  MAINTAINERS  |   1 +
  hw/ppc/spapr_hcall.c | 251 ---
  target/ppc/cpu.c |  17 +++
  target/ppc/cpu.h |   2 +
  4 files changed, 161 insertions(+), 110 deletions(-)






Re: [PATCH 1/2] tests/requirements.txt: bump up avocado-framework version to 101.0

2023-03-31 Thread Kautuk Consul
On 2023-03-31 11:19:18, Alex Bennée wrote:
> 
> Kautuk Consul  writes:
> 
> > Hi,
> > On 2023-03-27 07:50:29, Kautuk Consul wrote:
> >> Avocado version 101.0 has a fix to re-compute the checksum
> >> of an asset file if the algorithm used in the *-CHECKSUM
> >> file isn't the same as the one being passed to it by the
> >> avocado user (i.e. the avocado_qemu python module).
> >> In the earlier avocado versions this fix wasn't there due
> >> to which if the checksum wouldn't match the earlier
> >> checksum (calculated by a different algorithm), the avocado
> >> code would start downloading a fresh image from the internet
> >> URL thus making the test-cases take longer to execute.
> >> 
> >> Bump up the avocado-framework version to 101.0.
> > Any comments on this ? I have tested this patch and it seems to work
> > fine with the avocado test-cases.
> 
> I'm dropping this from the for-8.0 series as it causes a bunch of
> failures in tests. I'll keep it in testing/next for when the tree
> re-opens.
Sure, sounds good. Thanks.
> 
> >> 
> >> Signed-off-by: Kautuk Consul 
> >> Tested-by: Hariharan T S 
> >> ---
> >>  tests/requirements.txt | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >> 
> >> diff --git a/tests/requirements.txt b/tests/requirements.txt
> >> index 0ba561b6bd..a6f73da681 100644
> >> --- a/tests/requirements.txt
> >> +++ b/tests/requirements.txt
> >> @@ -2,5 +2,5 @@
> >>  # in the tests/venv Python virtual environment. For more info,
> >>  # refer to: https://pip.pypa.io/en/stable/user_guide/#id1
> >>  # Note that qemu.git/python/ is always implicitly installed.
> >> -avocado-framework==88.1
> >> +avocado-framework==101.0
> >>  pycdlib==1.11.0
> >> -- 
> >> 2.39.2
> >> 
> >> 
> 
> 
> -- 
> Alex Bennée
> Virtualisation Tech Lead @ Linaro
> 



Re: [PATCH 0/5] Cleanup [h_enter|spapr_exit]_nested routines

2023-03-31 Thread Daniel Henrique Barboza




On 3/31/23 07:39, Cédric Le Goater wrote:

On 3/31/23 08:53, Harsh Prateek Bora wrote:

This patchset introduces helper routines to enable (and does) cleaning
up of h_enter_nested() and spapr_exit_nested() routines in existing api
for nested virtualization on Power/SPAPR for better code readability /
maintenance. No functional changes intended with this patchset.


Adding Nick since he did most of this work.



And also Fabiano.


Daniel



C.




Harsh Prateek Bora (5):
   ppc: spapr: cleanup cr get/store with helper routines.
   ppc: spapr: cleanup h_enter_nested() with helper routines.
   ppc: spapr: assert early rather late in h_enter_nested()
   ppc: spapr: cleanup spapr_exit_nested() with helper routines.
   MAINTAINERS: Adding myself in the list for ppc/spapr

  MAINTAINERS  |   1 +
  hw/ppc/spapr_hcall.c | 251 ---
  target/ppc/cpu.c |  17 +++
  target/ppc/cpu.h |   2 +
  4 files changed, 161 insertions(+), 110 deletions(-)







Re: [PATCH 2/2] tests/avocado/boot_linux.py: re-enable test-case for ppc64

2023-03-31 Thread Cédric Le Goater

Hello,

[ Copying qemu-ppc@ and Daniel ]

On 3/28/23 13:24, Kautuk Consul wrote:

On 2023-03-27 17:07:30, Alex Bennée wrote:


Kautuk Consul  writes:


Fixes c0c8687ef0("tests/avocado: disable BootLinuxPPC64 test in CI").

Commit c0c8687ef0fd990db8db1655a8a6c5a5e35dd4bb disabled the test-case
for PPC64. On investigation, this turns out to be an issue with the
time taken for downloading the Fedora 31 qcow2 image being included
within the test-case timeout.
Re-enable this test-case by setting the timeout to 360 seconds just
before launching the downloaded VM image.

Signed-off-by: Kautuk Consul 
Reported-by: Alex Bennée 
Tested-by: Hariharan T S hariharan...@linux.vnet.ibm.com


It doesn't really address the principle problem that the
boot_linux.py:BootLinuxPPC64.test_pseries_tcg is super heavyweight for
only 2% extra coverage of the executed lines.

By re-enabling this test-case we will ensure that the PPC64 part of QEMU
works okay in terms of basic Linux boot. Without this we will have
a regression, in the sense that there won't be any way to test
basic Linux boot for PPC64.


There are ways, and pseries is not the only PPC64 machine. There is more
to it. See:

  https://github.com/legoater/qemu-ppc-boot/tree/main/buildroot
  https://github.com/legoater/buildroot/tree/qemu-ppc/board/qemu

QEMU PPC maintainers have external tools for regressions which are
run regularly, at least before sending a PR for upstream.

Thanks,

C.



What we really need is a script so we can compare the output between the
two jsons:

   gcovr --json --exclude-unreachable-branches --print-summary -o coverage.json 
--root ../../ . *.p

because I suspect we could make up the missing few % by noodling the
baseline test a bit more.

Can you tell me how you check code coverage with and without this
test-case ? I am kind of new to qemu so it would be nice to know how you
do this. And I am trying to increase the code coverage by improving
the baseline test by including more devices in the qemu-system-ppc64
command line so I would appreciate any tips on how to do that also.



---
  tests/avocado/boot_linux.py | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tests/avocado/boot_linux.py b/tests/avocado/boot_linux.py
index be30dcbd58..c3869a987c 100644
--- a/tests/avocado/boot_linux.py
+++ b/tests/avocado/boot_linux.py
@@ -91,9 +91,9 @@ class BootLinuxPPC64(LinuxTest):
  :avocado: tags=arch:ppc64
  """
  
+# timeout for downloading new VM image.

  timeout = 360
  
-@skipIf(os.getenv('GITLAB_CI'), 'Running on GitLab')

  def test_pseries_tcg(self):
  """
  :avocado: tags=machine:pseries
@@ -101,6 +101,10 @@ def test_pseries_tcg(self):
  """
  self.require_accelerator("tcg")
  self.vm.add_args("-accel", "tcg")
+
+# timeout for actual Linux PPC boot test
+self.timeout = 360
+
  self.launch_and_wait(set_up_ssh_connection=False)



--
Alex Bennée
Virtualisation Tech Lead @ Linaro








Re: [PATCH v2 0/3] qapi: allow unions to contain further unions

2023-03-31 Thread Het Gala

Hi all,

On 17/03/23 9:25 pm, Markus Armbruster wrote:

Daniel P. Berrangé  writes:


Currently it is not possible for a union type to contain a
further union as one (or more) of its branches. This relaxes
that restriction and adds the calls needed to validate field
name uniqueness as unions are flattened.

I apologize for the long delay.  Sick child, sick me, much snot, little
sleep.

PATCH 1 is wrong, but I was able to figure out what's going on there,
and suggested a patch that hopefully works.

PATCH 2 is okay.  I suggested a few tweaks.  I'd put it first, but
that's up to you.

PATCH 3 looks good.

Looking forward to v3.


Thank you, Markus, for your suggestions, and I hope everyone is in good 
health now. This is just a friendly reminder asking whether Daniel is 
ready with v3 patches for the same :)


Regards,
Het Gala



Re: [PULL 0/6] Misc fixes for 2023-03-30

2023-03-31 Thread Peter Maydell
On Thu, 30 Mar 2023 at 14:19, Philippe Mathieu-Daudé  wrote:
>
> The following changes since commit f00506aeca2f6d92318967693f8da8c713c163f3:
>
>   Merge tag 'pull-tcg-20230328' of https://gitlab.com/rth7680/qemu into 
> staging (2023-03-29 11:19:19 +0100)
>
> are available in the Git repository at:
>
>   https://github.com/philmd/qemu.git tags/misc-fixes-20230330
>
> for you to fetch changes up to aad3eb1ffeb65205153fb31d81d4f268186cde7a:
>
>   block/dmg: Ignore C99 prototype declaration mismatch from  
> (2023-03-30 15:03:36 +0200)
>
> 
> - linux-user:
>   . Don't use 16-bit UIDs with SPARC V9
>   . Pick MIPS3 CPU by default to run NaN2008 ELF binaries
>
> - HW:
>   . Fix invalid GT64120 north bridge endianness register swap
>   . Prevent NULL pointer dereference by SMBus devices
>
> - Buildsys:
>   . Fix compiling with liblzfse on Darwin
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.0
for any user-visible changes.

-- PMM



Re: On integrating LoongArch EDK2 firmware into QEMU build process

2023-03-31 Thread Gerd Hoffmann
On Fri, Mar 31, 2023 at 08:54:16AM +0800, maobibo wrote:
> Xuerui,
> 
> Thanks for your mail, it is a good suggestion. Now we are planning to
> move the LoongArch UEFI BIOS from edk2-platform to the edk2 repo, so
> that a UEFI BIOS supporting LoongArch can be auto-compiled and uploaded
> to the qemu repo. That process is somewhat slow since we are short of
> hands, but we are working on it.

Good, so I think it makes sense for qemu to just wait for that to
happen.

Related question:  What are the requirements to build the firmware?
Fedora 38 ships cross compiler packages ...

  binutils-loongarch64-linux-gnu-2.39-3.fc38.x86_64
  gcc-loongarch64-linux-gnu-12.2.1-5.fc38.x86_64

... but when trying to use them to compile the loongarch firmware gcc
throws errors:

loongarch64-linux-gnu-gcc: error: unrecognized command-line option 
‘-mno-explicit-relocs’

I suspect gcc-12 is just too old?

take care,
  Gerd




[PATCH] tests: lcitool: Switch to OpenSUSE Leap 15.4

2023-03-31 Thread Peter Krempa
The 15.3 version is EOL now:

https://get.opensuse.org/leap/15.3

Switch the dockerfile to 15.4.

Signed-off-by: Peter Krempa 
---
 tests/docker/dockerfiles/opensuse-leap.docker | 25 +--
 tests/lcitool/refresh |  2 +-
 2 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/tests/docker/dockerfiles/opensuse-leap.docker 
b/tests/docker/dockerfiles/opensuse-leap.docker
index 8e9500e443..91a67bfd0d 100644
--- a/tests/docker/dockerfiles/opensuse-leap.docker
+++ b/tests/docker/dockerfiles/opensuse-leap.docker
@@ -1,10 +1,10 @@
 # THIS FILE WAS AUTO-GENERATED
 #
-#  $ lcitool dockerfile --layers all opensuse-leap-153 qemu
+#  $ lcitool dockerfile --layers all opensuse-leap-154 qemu
 #
 # https://gitlab.com/libvirt/libvirt-ci

-FROM registry.opensuse.org/opensuse/leap:15.3
+FROM registry.opensuse.org/opensuse/leap:15.4

 RUN zypper update -y && \
 zypper install -y \
@@ -81,6 +81,7 @@ RUN zypper update -y && \
lttng-ust-devel \
lzo-devel \
make \
+   meson \
mkisofs \
ncat \
ncurses-devel \
@@ -89,9 +90,14 @@ RUN zypper update -y && \
pam-devel \
pcre-devel-static \
pkgconfig \
-   python39-base \
-   python39-pip \
-   python39-setuptools \
+   python3-Pillow \
+   python3-PyYAML \
+   python3-Sphinx \
+   python3-base \
+   python3-numpy \
+   python3-opencv \
+   python3-pip \
+   python3-sphinx_rtd_theme \
rdma-core-devel \
rpm \
sed \
@@ -124,18 +130,11 @@ RUN zypper update -y && \
 ln -s /usr/bin/ccache /usr/libexec/ccache-wrappers/g++ && \
 ln -s /usr/bin/ccache /usr/libexec/ccache-wrappers/gcc

-RUN /usr/bin/pip3.9 install \
-PyYAML \
-meson==0.63.2 \
-pillow \
-sphinx \
-sphinx-rtd-theme
-
 ENV CCACHE_WRAPPERSDIR "/usr/libexec/ccache-wrappers"
 ENV LANG "en_US.UTF-8"
 ENV MAKE "/usr/bin/make"
 ENV NINJA "/usr/bin/ninja"
-ENV PYTHON "/usr/bin/python3.9"
+ENV PYTHON "/usr/bin/python3"
 # As a final step configure the user (if env is defined)
 ARG USER
 ARG UID
diff --git a/tests/lcitool/refresh b/tests/lcitool/refresh
index c0d7ad5516..b3acd9d6b0 100755
--- a/tests/lcitool/refresh
+++ b/tests/lcitool/refresh
@@ -120,7 +120,7 @@ try:
 generate_dockerfile("debian-amd64", "debian-11",
 trailer="".join(debian11_extras))
 generate_dockerfile("fedora", "fedora-37")
-generate_dockerfile("opensuse-leap", "opensuse-leap-153")
+generate_dockerfile("opensuse-leap", "opensuse-leap-154")
 generate_dockerfile("ubuntu2004", "ubuntu-2004")
 generate_dockerfile("ubuntu2204", "ubuntu-2204")

-- 
2.39.2




Re: [PATCH] tests: lcitool: Switch to OpenSUSE Leap 15.4

2023-03-31 Thread Daniel P . Berrangé
On Fri, Mar 31, 2023 at 03:11:41PM +0200, Peter Krempa wrote:
> The 15.3 version is EOL now:
> 
> https://get.opensuse.org/leap/15.3
> 
> Switch the dockerfile to 15.4.
> 
> Signed-off-by: Peter Krempa 
> ---
>  tests/docker/dockerfiles/opensuse-leap.docker | 25 +--
>  tests/lcitool/refresh |  2 +-
>  2 files changed, 13 insertions(+), 14 deletions(-)

Reviewed-by: Daniel P. Berrangé 


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 1/7] hw/i2c: pmbus add support for block receive

2023-03-31 Thread Corey Minyard
On Fri, Mar 31, 2023 at 12:07:50AM +, Titus Rwantare wrote:
> PMBus devices can send and receive variable length data using the
> block read and write format, with the first byte in the payload
> denoting the length.
> 
> This is mostly used for strings and on-device logs. Devices can
> respond to a block read with an empty string.
> 
> Reviewed-by: Hao Wu 
> Signed-off-by: Titus Rwantare 

Acked-by: Corey Minyard 

> ---
>  hw/i2c/pmbus_device.c | 30 +-
>  include/hw/i2c/pmbus_device.h |  7 +++
>  2 files changed, 36 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/i2c/pmbus_device.c b/hw/i2c/pmbus_device.c
> index c3d6046784..02647769cd 100644
> --- a/hw/i2c/pmbus_device.c
> +++ b/hw/i2c/pmbus_device.c
> @@ -95,7 +95,6 @@ void pmbus_send64(PMBusDevice *pmdev, uint64_t data)
>  void pmbus_send_string(PMBusDevice *pmdev, const char *data)
>  {
>  size_t len = strlen(data);
> -g_assert(len > 0);
>  g_assert(len + pmdev->out_buf_len < SMBUS_DATA_MAX_LEN);
>  pmdev->out_buf[len + pmdev->out_buf_len] = len;
>  
> @@ -105,6 +104,35 @@ void pmbus_send_string(PMBusDevice *pmdev, const char 
> *data)
>  pmdev->out_buf_len += len + 1;
>  }
>  
> +uint8_t pmbus_receive_block(PMBusDevice *pmdev, uint8_t *dest, size_t len)
> +{
> +/* dest may contain data from previous writes */
> +memset(dest, 0, len);
> +
> +/* Exclude command code from return value */
> +pmdev->in_buf++;
> +pmdev->in_buf_len--;
> +
> +/* The byte after the command code denotes the length */
> +uint8_t sent_len = pmdev->in_buf[0];
> +
> +if (sent_len != pmdev->in_buf_len - 1) {
> +qemu_log_mask(LOG_GUEST_ERROR,
> +  "%s: length mismatch. Expected %d bytes, got %d 
> bytes\n",
> +  __func__, sent_len, pmdev->in_buf_len - 1);
> +}
> +
> +/* exclude length byte */
> +pmdev->in_buf++;
> +pmdev->in_buf_len--;
> +
> +if (pmdev->in_buf_len < len) {
> +len = pmdev->in_buf_len;
> +}
> +memcpy(dest, pmdev->in_buf, len);
> +return len;
> +}
> +
>  
>  static uint64_t pmbus_receive_uint(PMBusDevice *pmdev)
>  {
> diff --git a/include/hw/i2c/pmbus_device.h b/include/hw/i2c/pmbus_device.h
> index 93f5d57c9d..7dc00cc4d9 100644
> --- a/include/hw/i2c/pmbus_device.h
> +++ b/include/hw/i2c/pmbus_device.h
> @@ -501,6 +501,13 @@ void pmbus_send64(PMBusDevice *state, uint64_t data);
>   */
>  void pmbus_send_string(PMBusDevice *state, const char *data);
>  
> +/**
> + * @brief Receive data sent with Block Write.
> + * @param dest - memory with enough capacity to receive the write
> + * @param len - the capacity of dest
> + */
> +uint8_t pmbus_receive_block(PMBusDevice *pmdev, uint8_t *dest, size_t len);
> +
>  /**
>   * @brief Receive data over PMBus
>   * These methods help track how much data is being received over PMBus
> -- 
> 2.40.0.423.gd6c402a77b-goog
> 
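(To make the wire format concrete, here is a standalone re-implementation of the parsing pmbus_receive_block() performs: skip the command byte, read the length byte, clamp to the destination capacity. It is detached from the PMBusDevice state and for illustration only:)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standalone sketch of the block-write parsing above:
 * in[0] is the command code, in[1] the payload length, then the
 * payload bytes.  Copies at most cap bytes into dest and returns
 * the number copied.  A real device would also warn, as the patch
 * does, when in[1] disagrees with the bytes actually received. */
static size_t parse_block_write(const uint8_t *in, size_t in_len,
                                uint8_t *dest, size_t cap)
{
    memset(dest, 0, cap);       /* dest may hold stale data */
    if (in_len < 2) {
        return 0;               /* no command code + length byte */
    }
    size_t payload = in_len - 2;
    if (payload > cap) {
        payload = cap;          /* clamp to destination capacity */
    }
    memcpy(dest, in + 2, payload);
    return payload;
}
```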



Re: [PATCH 2/7] hw/i2c: pmbus: add vout mode bitfields

2023-03-31 Thread Corey Minyard
On Fri, Mar 31, 2023 at 12:07:51AM +, Titus Rwantare wrote:
> The VOUT_MODE command is described in the PMBus Specification,
> Part II, Ver 1.3 Section 8.3
> 
> VOUT_MODE has a three-bit mode and a five-bit parameter; the three-bit
> mode determines whether voltages are formatted in linear (uint16),
> VID, or Direct mode. VID and Direct modes use the remaining 5 bits
> to scale the voltage readings.
> 
> Reviewed-by: Hao Wu 
> Signed-off-by: Titus Rwantare 

Ok, I see the new sensor later.

Acked-by: Corey Minyard 

> ---
>  include/hw/i2c/pmbus_device.h | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/include/hw/i2c/pmbus_device.h b/include/hw/i2c/pmbus_device.h
> index 7dc00cc4d9..2e95164aa1 100644
> --- a/include/hw/i2c/pmbus_device.h
> +++ b/include/hw/i2c/pmbus_device.h
> @@ -444,6 +444,14 @@ typedef struct PMBusCoefficients {
>  int32_t R; /* exponent */
>  } PMBusCoefficients;
>  
> +/**
> + * VOUT_Mode bit fields
> + */
> +typedef struct PMBusVoutMode {
> +uint8_t  mode:3;
> +int8_t   exp:5;
> +} PMBusVoutMode;
> +
>  /**
>   * Convert sensor values to direct mode format
>   *
> -- 
> 2.40.0.423.gd6c402a77b-goog
> 
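(Decoding a raw VOUT_MODE byte per the spec section cited in the commit message, with bits [7:5] as the mode and bits [4:0] as a 5-bit two's-complement exponent, could look like this invented sketch:)

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decode helpers for a raw VOUT_MODE byte. */
static uint8_t vout_mode_mode(uint8_t raw)
{
    return raw >> 5;            /* bits [7:5]: format mode */
}

static int8_t vout_mode_exp(uint8_t raw)
{
    int8_t e = raw & 0x1f;      /* bits [4:0]: raw 5-bit field */
    /* sign-extend the 5-bit two's-complement exponent */
    return (e & 0x10) ? (int8_t)(e - 0x20) : e;
}
```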



Re: [PATCH 3/7] hw/i2c: pmbus: add fan support

2023-03-31 Thread Corey Minyard
On Fri, Mar 31, 2023 at 12:07:52AM +, Titus Rwantare wrote:
> PMBus devices may integrate fans whose operation is configurable
> over PMBus. This commit allows the driver to read and write the
> fan control registers but does not model the operation of fans.
> 
> Reviewed-by: Stephen Longfield 
> Signed-off-by: Titus Rwantare 

Acked-by: Corey Minyard 

> ---
>  hw/i2c/pmbus_device.c | 176 ++
>  include/hw/i2c/pmbus_device.h |   1 +
>  2 files changed, 177 insertions(+)
> 
> diff --git a/hw/i2c/pmbus_device.c b/hw/i2c/pmbus_device.c
> index 02647769cd..bb42e410b4 100644
> --- a/hw/i2c/pmbus_device.c
> +++ b/hw/i2c/pmbus_device.c
> @@ -490,6 +490,54 @@ static uint8_t pmbus_receive_byte(SMBusDevice *smd)
>  }
>  break;
>  
> +case PMBUS_FAN_CONFIG_1_2:/* R/W byte */
> +if (pmdev->pages[index].page_flags & PB_HAS_FAN) {
> +pmbus_send8(pmdev, pmdev->pages[index].fan_config_1_2);
> +} else {
> +goto passthough;
> +}
> +break;
> +
> +case PMBUS_FAN_COMMAND_1: /* R/W word */
> +if (pmdev->pages[index].page_flags & PB_HAS_FAN) {
> +pmbus_send16(pmdev, pmdev->pages[index].fan_command_1);
> +} else {
> +goto passthough;
> +}
> +break;
> +
> +case PMBUS_FAN_COMMAND_2: /* R/W word */
> +if (pmdev->pages[index].page_flags & PB_HAS_FAN) {
> +pmbus_send16(pmdev, pmdev->pages[index].fan_command_2);
> +} else {
> +goto passthough;
> +}
> +break;
> +
> +case PMBUS_FAN_CONFIG_3_4:/* R/W byte */
> +if (pmdev->pages[index].page_flags & PB_HAS_FAN) {
> +pmbus_send8(pmdev, pmdev->pages[index].fan_config_3_4);
> +} else {
> +goto passthough;
> +}
> +break;
> +
> +case PMBUS_FAN_COMMAND_3: /* R/W word */
> +if (pmdev->pages[index].page_flags & PB_HAS_FAN) {
> +pmbus_send16(pmdev, pmdev->pages[index].fan_command_3);
> +} else {
> +goto passthough;
> +}
> +break;
> +
> +case PMBUS_FAN_COMMAND_4: /* R/W word */
> +if (pmdev->pages[index].page_flags & PB_HAS_FAN) {
> +pmbus_send16(pmdev, pmdev->pages[index].fan_command_4);
> +} else {
> +goto passthough;
> +}
> +break;
> +
>  case PMBUS_VOUT_OV_FAULT_LIMIT:   /* R/W word */
>  if (pmdev->pages[index].page_flags & PB_HAS_VOUT) {
>  pmbus_send16(pmdev, pmdev->pages[index].vout_ov_fault_limit);
> @@ -800,6 +848,22 @@ static uint8_t pmbus_receive_byte(SMBusDevice *smd)
>  pmbus_send8(pmdev, pmdev->pages[index].status_mfr_specific);
>  break;
>  
> +case PMBUS_STATUS_FANS_1_2:   /* R/W byte */
> +if (pmdev->pages[index].page_flags & PB_HAS_FAN) {
> +pmbus_send8(pmdev, pmdev->pages[index].status_fans_1_2);
> +} else {
> +goto passthough;
> +}
> +break;
> +
> +case PMBUS_STATUS_FANS_3_4:   /* R/W byte */
> +if (pmdev->pages[index].page_flags & PB_HAS_FAN) {
> +pmbus_send8(pmdev, pmdev->pages[index].status_fans_3_4);
> +} else {
> +goto passthough;
> +}
> +break;
> +
>  case PMBUS_READ_EIN:  /* Read-Only block 5 bytes */
>  if (pmdev->pages[index].page_flags & PB_HAS_EIN) {
>  pmbus_send(pmdev, pmdev->pages[index].read_ein, 5);
> @@ -872,6 +936,54 @@ static uint8_t pmbus_receive_byte(SMBusDevice *smd)
>  }
>  break;
>  
> +case PMBUS_READ_FAN_SPEED_1:  /* Read-Only word */
> +if (pmdev->pages[index].page_flags & PB_HAS_FAN) {
> +pmbus_send16(pmdev, pmdev->pages[index].read_fan_speed_1);
> +} else {
> +goto passthough;
> +}
> +break;
> +
> +case PMBUS_READ_FAN_SPEED_2:  /* Read-Only word */
> +if (pmdev->pages[index].page_flags & PB_HAS_FAN) {
> +pmbus_send16(pmdev, pmdev->pages[index].read_fan_speed_2);
> +} else {
> +goto passthough;
> +}
> +break;
> +
> +case PMBUS_READ_FAN_SPEED_3:  /* Read-Only word */
> +if (pmdev->pages[index].page_flags & PB_HAS_FAN) {
> +pmbus_send16(pmdev, pmdev->pages[index].read_fan_speed_3);
> +} else {
> +goto passthough;
> +}
> +break;
> +
> +case PMBUS_READ_FAN_SPEED_4:  /* Read-Only word */
> +if (pmdev->pages[index].page_flags & PB_HAS_FAN) {
> +pmbus_send16(pmdev, pmdev->pages[index].read_fan_speed_4);
> +} else {
> +goto passthough;
> +}
> +break;
> +
> +case PMBUS_READ_DUTY_CYCLE:   /* Read-Only word */
> +if (pmdev->pages[index].page_flags & PB_HAS_FAN) {

Re: [PATCH 4/7] hw/i2c: pmbus: block uninitialised string reads

2023-03-31 Thread Corey Minyard
On Fri, Mar 31, 2023 at 12:07:53AM +, Titus Rwantare wrote:
> Device models calling pmbus_send_string() can't be relied upon to
> pass a non-NULL pointer. Log an error instead of segfaulting.
> 
> Reviewed-by: Patrick Venture 
> Signed-off-by: Titus Rwantare 

Acked-by: Corey Minyard 

> ---
>  hw/i2c/pmbus_device.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/hw/i2c/pmbus_device.c b/hw/i2c/pmbus_device.c
> index bb42e410b4..18e629eaac 100644
> --- a/hw/i2c/pmbus_device.c
> +++ b/hw/i2c/pmbus_device.c
> @@ -94,6 +94,13 @@ void pmbus_send64(PMBusDevice *pmdev, uint64_t data)
>  
>  void pmbus_send_string(PMBusDevice *pmdev, const char *data)
>  {
> +if (!data) {
> +qemu_log_mask(LOG_GUEST_ERROR,
> +  "%s: %s: uninitialised read from 0x%02x\n",
> +  __func__, DEVICE(pmdev)->canonical_path, pmdev->code);
> +return;
> +}
> +
>  size_t len = strlen(data);
>  g_assert(len + pmdev->out_buf_len < SMBUS_DATA_MAX_LEN);
>  pmdev->out_buf[len + pmdev->out_buf_len] = len;
> -- 
> 2.40.0.423.gd6c402a77b-goog
> 



Re: [PATCH 5/7] hw/i2c: pmbus: add VCAP register

2023-03-31 Thread Corey Minyard
On Fri, Mar 31, 2023 at 12:07:54AM +, Titus Rwantare wrote:
> VCAP is a register for devices with energy storage capacitors.
> 
> Reviewed-by: Benjamin Streb 
> Signed-off-by: Titus Rwantare 

Acked-by: Corey Minyard 

> ---
>  hw/i2c/pmbus_device.c | 8 
>  include/hw/i2c/pmbus_device.h | 1 +
>  2 files changed, 9 insertions(+)
> 
> diff --git a/hw/i2c/pmbus_device.c b/hw/i2c/pmbus_device.c
> index 18e629eaac..ef0314a913 100644
> --- a/hw/i2c/pmbus_device.c
> +++ b/hw/i2c/pmbus_device.c
> @@ -903,6 +903,14 @@ static uint8_t pmbus_receive_byte(SMBusDevice *smd)
>  }
>  break;
>  
> +case PMBUS_READ_VCAP: /* Read-Only word */
> +if (pmdev->pages[index].page_flags & PB_HAS_VCAP) {
> +pmbus_send16(pmdev, pmdev->pages[index].read_vcap);
> +} else {
> +goto passthough;
> +}
> +break;
> +
>  case PMBUS_READ_VOUT: /* Read-Only word */
>  if (pmdev->pages[index].page_flags & PB_HAS_VOUT) {
>  pmbus_send16(pmdev, pmdev->pages[index].read_vout);
> diff --git a/include/hw/i2c/pmbus_device.h b/include/hw/i2c/pmbus_device.h
> index ad431bdc7c..f195c11384 100644
> --- a/include/hw/i2c/pmbus_device.h
> +++ b/include/hw/i2c/pmbus_device.h
> @@ -243,6 +243,7 @@ OBJECT_DECLARE_TYPE(PMBusDevice, PMBusDeviceClass,
>  #define PB_HAS_VIN_RATING  BIT_ULL(13)
>  #define PB_HAS_VOUT_RATING BIT_ULL(14)
>  #define PB_HAS_VOUT_MODE   BIT_ULL(15)
> +#define PB_HAS_VCAPBIT_ULL(16)
>  #define PB_HAS_IOUTBIT_ULL(21)
>  #define PB_HAS_IIN BIT_ULL(22)
>  #define PB_HAS_IOUT_RATING BIT_ULL(23)
> -- 
> 2.40.0.423.gd6c402a77b-goog
> 



Re: [PATCH 6/7] hw/sensor: add ADM1266 device model

2023-03-31 Thread Corey Minyard
On Fri, Mar 31, 2023 at 12:07:55AM +, Titus Rwantare wrote:
>   The ADM1266 is a cascadable super sequencer with margin control and
>   fault recording.

This sounds like serious marketing-speak :).  I looked up the chip and
yes, that's what they say about it.

>   This commit adds basic support for its PMBus commands and models
>   the identification registers that can be modified in a firmware
>   update.
> 
> Reviewed-by: Hao Wu 
> Signed-off-by: Titus Rwantare 

Looks good. 

Acked-by: Corey Minyard 

> ---
>  hw/arm/Kconfig|   1 +
>  hw/sensor/Kconfig |   5 +
>  hw/sensor/adm1266.c   | 255 ++
>  hw/sensor/meson.build |   1 +
>  4 files changed, 262 insertions(+)
>  create mode 100644 hw/sensor/adm1266.c
> 
> diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
> index b5aed4aff5..4e44a7451d 100644
> --- a/hw/arm/Kconfig
> +++ b/hw/arm/Kconfig
> @@ -407,6 +407,7 @@ config XLNX_VERSAL
>  config NPCM7XX
>  bool
>  select A9MPCORE
> +select ADM1266
>  select ADM1272
>  select ARM_GIC
>  select SMBUS
> diff --git a/hw/sensor/Kconfig b/hw/sensor/Kconfig
> index e03bd09b50..bc6331b4ab 100644
> --- a/hw/sensor/Kconfig
> +++ b/hw/sensor/Kconfig
> @@ -22,6 +22,11 @@ config ADM1272
>  bool
>  depends on I2C
>  
> +config ADM1266
> +bool
> +depends on PMBUS
> +default y if PMBUS
> +
>  config MAX34451
>  bool
>  depends on I2C
> diff --git a/hw/sensor/adm1266.c b/hw/sensor/adm1266.c
> new file mode 100644
> index 00..0745b12b1d
> --- /dev/null
> +++ b/hw/sensor/adm1266.c
> @@ -0,0 +1,255 @@
> +/*
> + * Analog Devices ADM1266 Cascadable Super Sequencer with Margin Control and
> + * Fault Recording with PMBus
> + *
> + * 
> https://www.analog.com/media/en/technical-documentation/data-sheets/adm1266.pdf
> + *
> + * Copyright 2023 Google LLC
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include "qemu/osdep.h"
> +#include 
> +#include "hw/i2c/pmbus_device.h"
> +#include "hw/irq.h"
> +#include "migration/vmstate.h"
> +#include "qapi/error.h"
> +#include "qapi/visitor.h"
> +#include "qemu/log.h"
> +#include "qemu/module.h"
> +
> +#define TYPE_ADM1266 "adm1266"
> +OBJECT_DECLARE_SIMPLE_TYPE(ADM1266State, ADM1266)
> +
> +#define ADM1266_BLACKBOX_CONFIG 0xD3
> +#define ADM1266_PDIO_CONFIG 0xD4
> +#define ADM1266_READ_STATE  0xD9
> +#define ADM1266_READ_BLACKBOX   0xDE
> +#define ADM1266_SET_RTC 0xDF
> +#define ADM1266_GPIO_SYNC_CONFIGURATION 0xE1
> +#define ADM1266_BLACKBOX_INFORMATION0xE6
> +#define ADM1266_PDIO_STATUS 0xE9
> +#define ADM1266_GPIO_STATUS 0xEA
> +
> +/* Defaults */
> +#define ADM1266_OPERATION_DEFAULT   0x80
> +#define ADM1266_CAPABILITY_DEFAULT  0xA0
> +#define ADM1266_CAPABILITY_NO_PEC   0x20
> +#define ADM1266_PMBUS_REVISION_DEFAULT  0x22
> +#define ADM1266_MFR_ID_DEFAULT  "ADI"
> +#define ADM1266_MFR_ID_DEFAULT_LEN  32
> +#define ADM1266_MFR_MODEL_DEFAULT   "ADM1266-A1"
> +#define ADM1266_MFR_MODEL_DEFAULT_LEN   32
> +#define ADM1266_MFR_REVISION_DEFAULT"25"
> +#define ADM1266_MFR_REVISION_DEFAULT_LEN8
> +
> +#define ADM1266_NUM_PAGES   17
> +/**
> + * PAGE Index
> + * Page 0 VH1.
> + * Page 1 VH2.
> + * Page 2 VH3.
> + * Page 3 VH4.
> + * Page 4 VP1.
> + * Page 5 VP2.
> + * Page 6 VP3.
> + * Page 7 VP4.
> + * Page 8 VP5.
> + * Page 9 VP6.
> + * Page 10 VP7.
> + * Page 11 VP8.
> + * Page 12 VP9.
> + * Page 13 VP10.
> + * Page 14 VP11.
> + * Page 15 VP12.
> + * Page 16 VP13.
> + */
> +typedef struct ADM1266State {
> +PMBusDevice parent;
> +
> +char mfr_id[32];
> +char mfr_model[32];
> +char mfr_rev[8];
> +} ADM1266State;
> +
> +static const uint8_t adm1266_ic_device_id[] = {0x03, 0x41, 0x12, 0x66};
> +static const uint8_t adm1266_ic_device_rev[] = {0x08, 0x01, 0x08, 0x07, 0x0,
> +0x0, 0x07, 0x41, 0x30};
> +
> +static void adm1266_exit_reset(Object *obj)
> +{
> +ADM1266State *s = ADM1266(obj);
> +PMBusDevice *pmdev = PMBUS_DEVICE(obj);
> +
> +pmdev->page = 0;
> +pmdev->capability = ADM1266_CAPABILITY_NO_PEC;
> +
> +for (int i = 0; i < ADM1266_NUM_PAGES; i++) {
> +pmdev->pages[i].operation = ADM1266_OPERATION_DEFAULT;
> +pmdev->pages[i].revision = ADM1266_PMBUS_REVISION_DEFAULT;
> +pmdev->pages[i].vout_mode = 0;
> +pmdev->pages[i].read_vout = pmbus_data2linear_mode(12, 0);
> +pmdev->pages[i].vout_margin_high = pmbus_data2linear_mode(15, 0);
> +pmdev->pages[i].vout_margin_low = pmbus_data2linear_mode(3, 0);
> +pmdev->pages[i].vout_ov_fault_limit = pmbus_data2linear_mode(16, 0);
> +pmdev->pages[i].revision = ADM1266_PMBUS_REVISION_DEFAULT;
> +}
> +
> +   

Re: [PATCH 7/7] tests/qtest: add tests for ADM1266

2023-03-31 Thread Corey Minyard
On Fri, Mar 31, 2023 at 12:07:56AM +, Titus Rwantare wrote:
>   The ADM1266 can have string fields written by the driver, so
>   it's worth specifically testing.
> 
> Reviewed-by: Hao Wu 
> Signed-off-by: Titus Rwantare 

Acked-by: Corey Minyard 

> ---
>  tests/qtest/adm1266-test.c | 123 +
>  tests/qtest/meson.build|   1 +
>  2 files changed, 124 insertions(+)
>  create mode 100644 tests/qtest/adm1266-test.c
> 
> diff --git a/tests/qtest/adm1266-test.c b/tests/qtest/adm1266-test.c
> new file mode 100644
> index 00..6431a21de6
> --- /dev/null
> +++ b/tests/qtest/adm1266-test.c
> @@ -0,0 +1,123 @@
> +/*
> + * Analog Devices ADM1266 Cascadable Super Sequencer with Margin Control and
> + * Fault Recording with PMBus
> + *
> + * Copyright 2022 Google LLC
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include "qemu/osdep.h"
> +#include 
> +#include "hw/i2c/pmbus_device.h"
> +#include "libqtest-single.h"
> +#include "libqos/qgraph.h"
> +#include "libqos/i2c.h"
> +#include "qapi/qmp/qdict.h"
> +#include "qapi/qmp/qnum.h"
> +#include "qemu/bitops.h"
> +
> +#define TEST_ID "adm1266-test"
> +#define TEST_ADDR (0x12)
> +
> +#define ADM1266_BLACKBOX_CONFIG 0xD3
> +#define ADM1266_PDIO_CONFIG 0xD4
> +#define ADM1266_READ_STATE  0xD9
> +#define ADM1266_READ_BLACKBOX   0xDE
> +#define ADM1266_SET_RTC 0xDF
> +#define ADM1266_GPIO_SYNC_CONFIGURATION 0xE1
> +#define ADM1266_BLACKBOX_INFORMATION0xE6
> +#define ADM1266_PDIO_STATUS 0xE9
> +#define ADM1266_GPIO_STATUS 0xEA
> +
> +/* Defaults */
> +#define ADM1266_OPERATION_DEFAULT   0x80
> +#define ADM1266_CAPABILITY_DEFAULT  0xA0
> +#define ADM1266_CAPABILITY_NO_PEC   0x20
> +#define ADM1266_PMBUS_REVISION_DEFAULT  0x22
> +#define ADM1266_MFR_ID_DEFAULT  "ADI"
> +#define ADM1266_MFR_ID_DEFAULT_LEN  32
> +#define ADM1266_MFR_MODEL_DEFAULT   "ADM1266-A1"
> +#define ADM1266_MFR_MODEL_DEFAULT_LEN   32
> +#define ADM1266_MFR_REVISION_DEFAULT"25"
> +#define ADM1266_MFR_REVISION_DEFAULT_LEN8
> +#define TEST_STRING_A   "a sample"
> +#define TEST_STRING_B   "b sample"
> +#define TEST_STRING_C   "rev c"
> +
> +static void compare_string(QI2CDevice *i2cdev, uint8_t reg,
> +   const char *test_str)
> +{
> +uint8_t len = i2c_get8(i2cdev, reg);
> +char i2c_str[SMBUS_DATA_MAX_LEN] = {0};
> +
> +i2c_read_block(i2cdev, reg, (uint8_t *)i2c_str, len);
> +g_assert_cmpstr(i2c_str, ==, test_str);
> +}
> +
> +static void write_and_compare_string(QI2CDevice *i2cdev, uint8_t reg,
> + const char *test_str, uint8_t len)
> +{
> +char buf[SMBUS_DATA_MAX_LEN] = {0};
> +buf[0] = len;
> +strncpy(buf + 1, test_str, len);
> +i2c_write_block(i2cdev, reg, (uint8_t *)buf, len + 1);
> +compare_string(i2cdev, reg, test_str);
> +}
> +
> +static void test_defaults(void *obj, void *data, QGuestAllocator *alloc)
> +{
> +uint16_t i2c_value;
> +QI2CDevice *i2cdev = (QI2CDevice *)obj;
> +
> +i2c_value = i2c_get8(i2cdev, PMBUS_OPERATION);
> +g_assert_cmphex(i2c_value, ==, ADM1266_OPERATION_DEFAULT);
> +
> +i2c_value = i2c_get8(i2cdev, PMBUS_REVISION);
> +g_assert_cmphex(i2c_value, ==, ADM1266_PMBUS_REVISION_DEFAULT);
> +
> +compare_string(i2cdev, PMBUS_MFR_ID, ADM1266_MFR_ID_DEFAULT);
> +compare_string(i2cdev, PMBUS_MFR_MODEL, ADM1266_MFR_MODEL_DEFAULT);
> +compare_string(i2cdev, PMBUS_MFR_REVISION, ADM1266_MFR_REVISION_DEFAULT);
> +}
> +
> +/* test r/w registers */
> +static void test_rw_regs(void *obj, void *data, QGuestAllocator *alloc)
> +{
> +QI2CDevice *i2cdev = (QI2CDevice *)obj;
> +
> +/* empty strings */
> +i2c_set8(i2cdev, PMBUS_MFR_ID, 0);
> +compare_string(i2cdev, PMBUS_MFR_ID, "");
> +
> +i2c_set8(i2cdev, PMBUS_MFR_MODEL, 0);
> +compare_string(i2cdev, PMBUS_MFR_MODEL, "");
> +
> +i2c_set8(i2cdev, PMBUS_MFR_REVISION, 0);
> +compare_string(i2cdev, PMBUS_MFR_REVISION, "");
> +
> +/* test strings */
> +write_and_compare_string(i2cdev, PMBUS_MFR_ID, TEST_STRING_A,
> + sizeof(TEST_STRING_A));
> +write_and_compare_string(i2cdev, PMBUS_MFR_ID, TEST_STRING_B,
> + sizeof(TEST_STRING_B));
> +write_and_compare_string(i2cdev, PMBUS_MFR_ID, TEST_STRING_C,
> + sizeof(TEST_STRING_C));
> +}
> +
> +static void adm1266_register_nodes(void)
> +{
> +QOSGraphEdgeOptions opts = {
> +.extra_device_opts = "id=" TEST_ID ",address=0x12"
> +};
> +add_qi2c_address(&opts, &(QI2CAddress) { TEST_ADDR });
> +
> +qos_node_create_driver("adm1266", i2c_devi

Re: [PATCH 0/7] bsd-user: remove bitrotted NetBSD and OpenBSD bsd-user support

2023-03-31 Thread Warner Losh
Please note: This did come from me, from a new machine that's slightly
misconfigured, so it didn't go through Google's email server and so you may
get a spoofing warning. I'll fix that in v2, if there is one, or in the
pull request if there's no changes.

Warner

On Fri, Mar 31, 2023 at 8:19 AM Warner Losh  wrote:

> The NetBSD and OpenBSD support in bsd-user hasn't built since before the
> meson
> conversion. It's also out of sync with many of the recent changes in the
> bsd-user fork and has just been removed there. Remove it from master for
> the
> same reasons: it generates a number of false positives with grep and has
> increasingly gotten in the way. The bsd-user fork code is much more
> advanced,
> and even it doesn't compile and is out of date. Remove this from both
> branches. If others wish to bring it up to speed, I'm happy to help them.
>
> Warner Losh (7):
>   bsd-user: Remove obsolete prototypes
>   bsd-user: Remove netbsd system call inclusion and defines
>   bsd-user: Remove netbsd system call tracing
>   bsd-user: Remove openbsd system call inclusion and defines
>   bsd-user: Remove openbsd system call tracing
>   bsd-user: Remove netbsd directory
>   bsd-user: Remove openbsd directory
>
>  bsd-user/netbsd/host-os.h|  25 --
>  bsd-user/netbsd/os-strace.h  |   1 -
>  bsd-user/netbsd/strace.list  | 145 ---
>  bsd-user/netbsd/syscall_nr.h | 373 ---
>  bsd-user/netbsd/target_os_elf.h  | 147 ---
>  bsd-user/netbsd/target_os_siginfo.h  |  82 --
>  bsd-user/netbsd/target_os_signal.h   |  69 -
>  bsd-user/netbsd/target_os_stack.h|  56 
>  bsd-user/netbsd/target_os_thread.h   |  25 --
>  bsd-user/openbsd/host-os.h   |  25 --
>  bsd-user/openbsd/os-strace.h |   1 -
>  bsd-user/openbsd/strace.list | 187 --
>  bsd-user/openbsd/syscall_nr.h| 225 
>  bsd-user/openbsd/target_os_elf.h | 147 ---
>  bsd-user/openbsd/target_os_siginfo.h |  82 --
>  bsd-user/openbsd/target_os_signal.h  |  69 -
>  bsd-user/openbsd/target_os_stack.h   |  56 
>  bsd-user/openbsd/target_os_thread.h  |  25 --
>  bsd-user/qemu.h  |  16 --
>  bsd-user/strace.c|  34 ---
>  bsd-user/syscall_defs.h  |  29 +--
>  21 files changed, 1 insertion(+), 1818 deletions(-)
>  delete mode 100644 bsd-user/netbsd/host-os.h
>  delete mode 100644 bsd-user/netbsd/os-strace.h
>  delete mode 100644 bsd-user/netbsd/strace.list
>  delete mode 100644 bsd-user/netbsd/syscall_nr.h
>  delete mode 100644 bsd-user/netbsd/target_os_elf.h
>  delete mode 100644 bsd-user/netbsd/target_os_siginfo.h
>  delete mode 100644 bsd-user/netbsd/target_os_signal.h
>  delete mode 100644 bsd-user/netbsd/target_os_stack.h
>  delete mode 100644 bsd-user/netbsd/target_os_thread.h
>  delete mode 100644 bsd-user/openbsd/host-os.h
>  delete mode 100644 bsd-user/openbsd/os-strace.h
>  delete mode 100644 bsd-user/openbsd/strace.list
>  delete mode 100644 bsd-user/openbsd/syscall_nr.h
>  delete mode 100644 bsd-user/openbsd/target_os_elf.h
>  delete mode 100644 bsd-user/openbsd/target_os_siginfo.h
>  delete mode 100644 bsd-user/openbsd/target_os_signal.h
>  delete mode 100644 bsd-user/openbsd/target_os_stack.h
>  delete mode 100644 bsd-user/openbsd/target_os_thread.h
>
> --
> 2.39.2
>
>


[PATCH 0/7] bsd-user: remove bitrotted NetBSD and OpenBSD bsd-user support

2023-03-31 Thread Warner Losh
The NetBSD and OpenBSD support in bsd-user hasn't built since before the meson
conversion. It's also out of sync with many of the recent changes in the
bsd-user fork and has just been removed there. Remove it from master for the
same reasons: it generates a number of false positives with grep and has
increasingly gotten in the way. The bsd-user fork code is much more advanced,
and even it doesn't compile and is out of date. Remove this from both
branches. If others wish to bring it up to speed, I'm happy to help them.

Warner Losh (7):
  bsd-user: Remove obsolete prototypes
  bsd-user: Remove netbsd system call inclusion and defines
  bsd-user: Remove netbsd system call tracing
  bsd-user: Remove openbsd system call inclusion and defines
  bsd-user: Remove openbsd system call tracing
  bsd-user: Remove netbsd directory
  bsd-user: Remove openbsd directory

 bsd-user/netbsd/host-os.h|  25 --
 bsd-user/netbsd/os-strace.h  |   1 -
 bsd-user/netbsd/strace.list  | 145 ---
 bsd-user/netbsd/syscall_nr.h | 373 ---
 bsd-user/netbsd/target_os_elf.h  | 147 ---
 bsd-user/netbsd/target_os_siginfo.h  |  82 --
 bsd-user/netbsd/target_os_signal.h   |  69 -
 bsd-user/netbsd/target_os_stack.h|  56 
 bsd-user/netbsd/target_os_thread.h   |  25 --
 bsd-user/openbsd/host-os.h   |  25 --
 bsd-user/openbsd/os-strace.h |   1 -
 bsd-user/openbsd/strace.list | 187 --
 bsd-user/openbsd/syscall_nr.h| 225 
 bsd-user/openbsd/target_os_elf.h | 147 ---
 bsd-user/openbsd/target_os_siginfo.h |  82 --
 bsd-user/openbsd/target_os_signal.h  |  69 -
 bsd-user/openbsd/target_os_stack.h   |  56 
 bsd-user/openbsd/target_os_thread.h  |  25 --
 bsd-user/qemu.h  |  16 --
 bsd-user/strace.c|  34 ---
 bsd-user/syscall_defs.h  |  29 +--
 21 files changed, 1 insertion(+), 1818 deletions(-)
 delete mode 100644 bsd-user/netbsd/host-os.h
 delete mode 100644 bsd-user/netbsd/os-strace.h
 delete mode 100644 bsd-user/netbsd/strace.list
 delete mode 100644 bsd-user/netbsd/syscall_nr.h
 delete mode 100644 bsd-user/netbsd/target_os_elf.h
 delete mode 100644 bsd-user/netbsd/target_os_siginfo.h
 delete mode 100644 bsd-user/netbsd/target_os_signal.h
 delete mode 100644 bsd-user/netbsd/target_os_stack.h
 delete mode 100644 bsd-user/netbsd/target_os_thread.h
 delete mode 100644 bsd-user/openbsd/host-os.h
 delete mode 100644 bsd-user/openbsd/os-strace.h
 delete mode 100644 bsd-user/openbsd/strace.list
 delete mode 100644 bsd-user/openbsd/syscall_nr.h
 delete mode 100644 bsd-user/openbsd/target_os_elf.h
 delete mode 100644 bsd-user/openbsd/target_os_siginfo.h
 delete mode 100644 bsd-user/openbsd/target_os_signal.h
 delete mode 100644 bsd-user/openbsd/target_os_stack.h
 delete mode 100644 bsd-user/openbsd/target_os_thread.h

-- 
2.39.2




[PATCH 3/7] bsd-user: Remove netbsd system call tracing

2023-03-31 Thread Warner Losh
Remove NetBSD system call tracing. We've not supported building all the
BSDs into one module for some time, and the NetBSD support hasn't even
built since the meson conversion.

Signed-off-by: Warner Losh 
---
 bsd-user/qemu.h   |  5 -
 bsd-user/strace.c | 17 -
 2 files changed, 22 deletions(-)

diff --git a/bsd-user/qemu.h b/bsd-user/qemu.h
index 4062ee720f..b82f7b6f00 100644
--- a/bsd-user/qemu.h
+++ b/bsd-user/qemu.h
@@ -196,11 +196,6 @@ print_freebsd_syscall(int num,
   abi_long arg4, abi_long arg5, abi_long arg6);
 void print_freebsd_syscall_ret(int num, abi_long ret);
 void
-print_netbsd_syscall(int num,
- abi_long arg1, abi_long arg2, abi_long arg3,
- abi_long arg4, abi_long arg5, abi_long arg6);
-void print_netbsd_syscall_ret(int num, abi_long ret);
-void
 print_openbsd_syscall(int num,
   abi_long arg1, abi_long arg2, abi_long arg3,
   abi_long arg4, abi_long arg5, abi_long arg6);
diff --git a/bsd-user/strace.c b/bsd-user/strace.c
index 96499751eb..bde906e9be 100644
--- a/bsd-user/strace.c
+++ b/bsd-user/strace.c
@@ -152,9 +152,6 @@ static void print_syscall_ret_addr(const struct syscallname *name, abi_long ret)
 static const struct syscallname freebsd_scnames[] = {
 #include "freebsd/strace.list"
 };
-static const struct syscallname netbsd_scnames[] = {
-#include "netbsd/strace.list"
-};
 static const struct syscallname openbsd_scnames[] = {
 #include "openbsd/strace.list"
 };
@@ -229,20 +226,6 @@ void print_freebsd_syscall_ret(int num, abi_long ret)
 print_syscall_ret(num, ret, freebsd_scnames, ARRAY_SIZE(freebsd_scnames));
 }
 
-void print_netbsd_syscall(int num, abi_long arg1, abi_long arg2, abi_long arg3,
-abi_long arg4, abi_long arg5, abi_long arg6)
-{
-
-print_syscall(num, netbsd_scnames, ARRAY_SIZE(netbsd_scnames),
-  arg1, arg2, arg3, arg4, arg5, arg6);
-}
-
-void print_netbsd_syscall_ret(int num, abi_long ret)
-{
-
-print_syscall_ret(num, ret, netbsd_scnames, ARRAY_SIZE(netbsd_scnames));
-}
-
void print_openbsd_syscall(int num, abi_long arg1, abi_long arg2, abi_long arg3,
 abi_long arg4, abi_long arg5, abi_long arg6)
 {
-- 
2.39.2




[PATCH 1/7] bsd-user: Remove obsolete prototypes

2023-03-31 Thread Warner Losh
These prototypes have been obsolete since 304f944e5104.

Signed-off-by: Warner Losh 
---
 bsd-user/qemu.h | 6 --
 1 file changed, 6 deletions(-)

diff --git a/bsd-user/qemu.h b/bsd-user/qemu.h
index 41d84e0b81..4062ee720f 100644
--- a/bsd-user/qemu.h
+++ b/bsd-user/qemu.h
@@ -169,12 +169,6 @@ abi_long do_freebsd_syscall(void *cpu_env, int num, 
abi_long arg1,
 abi_long arg2, abi_long arg3, abi_long arg4,
 abi_long arg5, abi_long arg6, abi_long arg7,
 abi_long arg8);
-abi_long do_netbsd_syscall(void *cpu_env, int num, abi_long arg1,
-   abi_long arg2, abi_long arg3, abi_long arg4,
-   abi_long arg5, abi_long arg6);
-abi_long do_openbsd_syscall(void *cpu_env, int num, abi_long arg1,
-abi_long arg2, abi_long arg3, abi_long arg4,
-abi_long arg5, abi_long arg6);
 void gemu_log(const char *fmt, ...) G_GNUC_PRINTF(1, 2);
 extern __thread CPUState *thread_cpu;
 void cpu_loop(CPUArchState *env);
-- 
2.39.2




[PATCH 2/7] bsd-user: Remove netbsd system call inclusion and defines

2023-03-31 Thread Warner Losh
Remove NetBSD system call inclusion and defines. We've not supported
building all the BSDs into one module for some time, and the NetBSD
support hasn't even built since the meson conversion.

Signed-off-by: Warner Losh 
---
 bsd-user/syscall_defs.h | 16 
 1 file changed, 16 deletions(-)

diff --git a/bsd-user/syscall_defs.h b/bsd-user/syscall_defs.h
index b6d113d24a..8352ab783c 100644
--- a/bsd-user/syscall_defs.h
+++ b/bsd-user/syscall_defs.h
@@ -26,7 +26,6 @@
 #include "errno_defs.h"
 
 #include "freebsd/syscall_nr.h"
-#include "netbsd/syscall_nr.h"
 #include "openbsd/syscall_nr.h"
 
 /*
@@ -40,9 +39,6 @@
  * FreeBSD uses a 64bits time_t except on i386
  * so we have to add a special case here.
  *
- * On NetBSD time_t is always defined as an int64_t.  On OpenBSD time_t
- * is always defined as an int.
- *
  */
 #if (!defined(TARGET_I386))
 typedef int64_t target_freebsd_time_t;
@@ -69,18 +65,6 @@ struct target_iovec {
 
 #define TARGET_FREEBSD_MAP_FLAGMASK 0x1ff7
 
-#define TARGET_NETBSD_MAP_INHERIT   0x0080  /* region is retained after */
-/* exec */
-#define TARGET_NETBSD_MAP_TRYFIXED  0x0400  /* attempt hint address, even */
-/* within break */
-#define TARGET_NETBSD_MAP_WIRED 0x0800  /* mlock() mapping when it is */
-/* established */
-
-#define TARGET_NETBSD_MAP_STACK 0x2000  /* allocated from memory, */
-/* swap space (stack) */
-
-#define TARGET_NETBSD_MAP_FLAGMASK  0x3ff7
-
 #define TARGET_OPENBSD_MAP_INHERIT  0x0080  /* region is retained after */
 /* exec */
#define TARGET_OPENBSD_MAP_NOEXTEND 0x0100  /* for MAP_FILE, don't change */
-- 
2.39.2




[PATCH 5/7] bsd-user: Remove openbsd system call tracing

2023-03-31 Thread Warner Losh
Remove OpenBSD system call tracing. We've not supported building all the
BSDs into one module for some time, and the OpenBSD support hasn't even
built since the meson conversion.

Signed-off-by: Warner Losh 
---
 bsd-user/qemu.h   |  5 -
 bsd-user/strace.c | 17 -
 2 files changed, 22 deletions(-)

diff --git a/bsd-user/qemu.h b/bsd-user/qemu.h
index b82f7b6f00..c921c3cb63 100644
--- a/bsd-user/qemu.h
+++ b/bsd-user/qemu.h
@@ -195,11 +195,6 @@ print_freebsd_syscall(int num,
   abi_long arg1, abi_long arg2, abi_long arg3,
   abi_long arg4, abi_long arg5, abi_long arg6);
 void print_freebsd_syscall_ret(int num, abi_long ret);
-void
-print_openbsd_syscall(int num,
-  abi_long arg1, abi_long arg2, abi_long arg3,
-  abi_long arg4, abi_long arg5, abi_long arg6);
-void print_openbsd_syscall_ret(int num, abi_long ret);
 /**
  * print_taken_signal:
  * @target_signum: target signal being taken
diff --git a/bsd-user/strace.c b/bsd-user/strace.c
index bde906e9be..26abb9f1db 100644
--- a/bsd-user/strace.c
+++ b/bsd-user/strace.c
@@ -152,9 +152,6 @@ static void print_syscall_ret_addr(const struct syscallname *name, abi_long ret)
 static const struct syscallname freebsd_scnames[] = {
 #include "freebsd/strace.list"
 };
-static const struct syscallname openbsd_scnames[] = {
-#include "openbsd/strace.list"
-};
 
 static void print_syscall(int num, const struct syscallname *scnames,
 unsigned int nscnames, abi_long arg1, abi_long arg2, abi_long arg3,
@@ -226,20 +223,6 @@ void print_freebsd_syscall_ret(int num, abi_long ret)
 print_syscall_ret(num, ret, freebsd_scnames, ARRAY_SIZE(freebsd_scnames));
 }
 
-void print_openbsd_syscall(int num, abi_long arg1, abi_long arg2, abi_long arg3,
-abi_long arg4, abi_long arg5, abi_long arg6)
-{
-
-print_syscall(num, openbsd_scnames, ARRAY_SIZE(openbsd_scnames), arg1, arg2,
-arg3, arg4, arg5, arg6);
-}
-
-void print_openbsd_syscall_ret(int num, abi_long ret)
-{
-
-print_syscall_ret(num, ret, openbsd_scnames, ARRAY_SIZE(openbsd_scnames));
-}
-
 static void
 print_signal(abi_ulong arg, int last)
 {
-- 
2.39.2




[PATCH 7/7] bsd-user: Remove openbsd directory

2023-03-31 Thread Warner Losh
The OpenBSD support in the bsd-user fork can't even compile. It is being
removed there. Remove it here as well. If someone wants to revive it,
then I'm happy to help them do so. This hasn't built since the
conversion to meson.

Signed-off-by: Warner Losh 
---
 bsd-user/openbsd/host-os.h   |  25 ---
 bsd-user/openbsd/os-strace.h |   1 -
 bsd-user/openbsd/strace.list | 187 --
 bsd-user/openbsd/syscall_nr.h| 225 ---
 bsd-user/openbsd/target_os_elf.h | 147 -
 bsd-user/openbsd/target_os_siginfo.h |  82 --
 bsd-user/openbsd/target_os_signal.h  |  69 
 bsd-user/openbsd/target_os_stack.h   |  56 ---
 bsd-user/openbsd/target_os_thread.h  |  25 ---
 9 files changed, 817 deletions(-)
 delete mode 100644 bsd-user/openbsd/host-os.h
 delete mode 100644 bsd-user/openbsd/os-strace.h
 delete mode 100644 bsd-user/openbsd/strace.list
 delete mode 100644 bsd-user/openbsd/syscall_nr.h
 delete mode 100644 bsd-user/openbsd/target_os_elf.h
 delete mode 100644 bsd-user/openbsd/target_os_siginfo.h
 delete mode 100644 bsd-user/openbsd/target_os_signal.h
 delete mode 100644 bsd-user/openbsd/target_os_stack.h
 delete mode 100644 bsd-user/openbsd/target_os_thread.h

diff --git a/bsd-user/openbsd/host-os.h b/bsd-user/openbsd/host-os.h
deleted file mode 100644
index b9222335d4..00
--- a/bsd-user/openbsd/host-os.h
+++ /dev/null
@@ -1,25 +0,0 @@
-/*
- *  OpenBSD host dependent code and definitions
- *
- *  Copyright (c) 2013 Stacey D. Son
- *
- *  This program is free software; you can redistribute it and/or modify
- *  it under the terms of the GNU General Public License as published by
- *  the Free Software Foundation; either version 2 of the License, or
- *  (at your option) any later version.
- *
- *  This program is distributed in the hope that it will be useful,
- *  but WITHOUT ANY WARRANTY; without even the implied warranty of
- *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- *  GNU General Public License for more details.
- *
- *  You should have received a copy of the GNU General Public License
- *  along with this program; if not, see .
- */
-
-#ifndef HOST_OS_H
-#define HOST_OS_H
-
-#define HOST_DEFAULT_BSD_TYPE target_openbsd
-
-#endif /* HOST_OS_H */
diff --git a/bsd-user/openbsd/os-strace.h b/bsd-user/openbsd/os-strace.h
deleted file mode 100644
index 9161390433..00
--- a/bsd-user/openbsd/os-strace.h
+++ /dev/null
@@ -1 +0,0 @@
-/* XXX OpenBSD dependent strace print functions */
diff --git a/bsd-user/openbsd/strace.list b/bsd-user/openbsd/strace.list
deleted file mode 100644
index 1f0a3316f3..00
--- a/bsd-user/openbsd/strace.list
+++ /dev/null
@@ -1,187 +0,0 @@
-{ TARGET_OPENBSD_NR___getcwd, "__getcwd", NULL, NULL, NULL },
-{ TARGET_OPENBSD_NR___semctl, "__semctl", NULL, NULL, NULL },
-{ TARGET_OPENBSD_NR___syscall, "__syscall", NULL, NULL, NULL },
-{ TARGET_OPENBSD_NR___sysctl, "__sysctl", NULL, NULL, NULL },
-{ TARGET_OPENBSD_NR_accept, "accept", "%s(%d,%#x,%#x)", NULL, NULL },
-{ TARGET_OPENBSD_NR_access, "access", "%s(\"%s\",%#o)", NULL, NULL },
-{ TARGET_OPENBSD_NR_acct, "acct", NULL, NULL, NULL },
-{ TARGET_OPENBSD_NR_adjfreq, "adjfreq", NULL, NULL, NULL },
-{ TARGET_OPENBSD_NR_adjtime, "adjtime", NULL, NULL, NULL },
-{ TARGET_OPENBSD_NR_bind, "bind", NULL, NULL, NULL },
-{ TARGET_OPENBSD_NR_break, "break", NULL, NULL, NULL },
-{ TARGET_OPENBSD_NR_chdir, "chdir", "%s(\"%s\")", NULL, NULL },
-{ TARGET_OPENBSD_NR_chflags, "chflags", NULL, NULL, NULL },
-{ TARGET_OPENBSD_NR_chmod, "chmod", "%s(\"%s\",%#o)", NULL, NULL },
-{ TARGET_OPENBSD_NR_chown, "chown", NULL, NULL, NULL },
-{ TARGET_OPENBSD_NR_chroot, "chroot", NULL, NULL, NULL },
-{ TARGET_OPENBSD_NR_clock_getres, "clock_getres", NULL, NULL, NULL },
-{ TARGET_OPENBSD_NR_clock_gettime, "clock_gettime", NULL, NULL, NULL },
-{ TARGET_OPENBSD_NR_clock_settime, "clock_settime", NULL, NULL, NULL },
-{ TARGET_OPENBSD_NR_close, "close", "%s(%d)", NULL, NULL },
-{ TARGET_OPENBSD_NR_closefrom, "closefrom", NULL, NULL, NULL },
-{ TARGET_OPENBSD_NR_connect, "connect", "%s(%d,%#x,%d)", NULL, NULL },
-{ TARGET_OPENBSD_NR_dup, "dup", NULL, NULL, NULL },
-{ TARGET_OPENBSD_NR_dup2, "dup2", NULL, NULL, NULL },
-{ TARGET_OPENBSD_NR_execve, "execve", NULL, print_execve, NULL },
-{ TARGET_OPENBSD_NR_exit, "exit", "%s(%d)\n", NULL, NULL },
-{ TARGET_OPENBSD_NR_fchdir, "fchdir", NULL, NULL, NULL },
-{ TARGET_OPENBSD_NR_fchflags, "fchflags", NULL, NULL, NULL },
-{ TARGET_OPENBSD_NR_fchmod, "fchmod", "%s(%d,%#o)", NULL, NULL },
-{ TARGET_OPENBSD_NR_fchown, "fchown", "%s(\"%s\",%d,%d)", NULL, NULL },
-{ TARGET_OPENBSD_NR_fcntl, "fcntl", NULL, NULL, NULL },
-{ TARGET_OPENBSD_NR_fhopen, "fhopen", NULL, NULL, NULL },
-{ TARGET_OPENBSD_NR_fhstat, "fhstat", NULL, NULL, NULL },
-{ TARGET_OPENBSD_NR_fhstatfs, "fhstatfs", NULL, NULL, NULL },
-{ TARGET_OPENBSD_NR_flock, "flock", NULL, NULL, NULL },
-{ TARGET_OPENBS

[PATCH 4/7] bsd-user: Remove openbsd system call inclusion and defines

2023-03-31 Thread Warner Losh
Remove OpenBSD system call inclusion and defines. We've not supported
building all the BSDs into one module for some time, and the OpenBSD
support hasn't even built since the meson conversion.
---
 bsd-user/syscall_defs.h | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/bsd-user/syscall_defs.h b/bsd-user/syscall_defs.h
index 8352ab783c..a3a0fdbb52 100644
--- a/bsd-user/syscall_defs.h
+++ b/bsd-user/syscall_defs.h
@@ -26,7 +26,6 @@
 #include "errno_defs.h"
 
 #include "freebsd/syscall_nr.h"
-#include "openbsd/syscall_nr.h"
 
 /*
  * machine/_types.h
@@ -65,17 +64,7 @@ struct target_iovec {
 
 #define TARGET_FREEBSD_MAP_FLAGMASK 0x1ff7
 
-#define TARGET_OPENBSD_MAP_INHERIT  0x0080  /* region is retained after */
-/* exec */
-#define TARGET_OPENBSD_MAP_NOEXTEND 0x0100  /* for MAP_FILE, don't change */
-/* file size */
-#define TARGET_OPENBSD_MAP_TRYFIXED 0x0400  /* attempt hint address, */
-/* even within heap */
-
-#define TARGET_OPENBSD_MAP_FLAGMASK 0x17f7
-
-/* XXX */
-#define TARGET_BSD_MAP_FLAGMASK 0x3ff7
+#define TARGET_BSD_MAP_FLAGMASK 0x1ff7
 
 /*
  * sys/time.h
-- 
2.39.2




[PATCH 6/7] bsd-user: Remove netbsd directory

2023-03-31 Thread Warner Losh
The NetBSD support in the bsd-user fork can't even compile. It is being
removed there. Remove it here as well. If someone wants to revive it,
then I'm happy to help them do so.  This hasn't built since the
conversion to meson.

Signed-off-by: Warner Losh 
---
 bsd-user/netbsd/host-os.h   |  25 --
 bsd-user/netbsd/os-strace.h |   1 -
 bsd-user/netbsd/strace.list | 145 ---
 bsd-user/netbsd/syscall_nr.h| 373 
 bsd-user/netbsd/target_os_elf.h | 147 ---
 bsd-user/netbsd/target_os_siginfo.h |  82 --
 bsd-user/netbsd/target_os_signal.h  |  69 -
 bsd-user/netbsd/target_os_stack.h   |  56 -
 bsd-user/netbsd/target_os_thread.h  |  25 --
 9 files changed, 923 deletions(-)
 delete mode 100644 bsd-user/netbsd/host-os.h
 delete mode 100644 bsd-user/netbsd/os-strace.h
 delete mode 100644 bsd-user/netbsd/strace.list
 delete mode 100644 bsd-user/netbsd/syscall_nr.h
 delete mode 100644 bsd-user/netbsd/target_os_elf.h
 delete mode 100644 bsd-user/netbsd/target_os_siginfo.h
 delete mode 100644 bsd-user/netbsd/target_os_signal.h
 delete mode 100644 bsd-user/netbsd/target_os_stack.h
 delete mode 100644 bsd-user/netbsd/target_os_thread.h

diff --git a/bsd-user/netbsd/host-os.h b/bsd-user/netbsd/host-os.h
deleted file mode 100644
index 7c14b1ea78..00
--- a/bsd-user/netbsd/host-os.h
+++ /dev/null
@@ -1,25 +0,0 @@
-/*
- *  NetBSD host dependent code and definitions
- *
- *  Copyright (c) 2013 Stacey D. Son
- *
- *  This program is free software; you can redistribute it and/or modify
- *  it under the terms of the GNU General Public License as published by
- *  the Free Software Foundation; either version 2 of the License, or
- *  (at your option) any later version.
- *
- *  This program is distributed in the hope that it will be useful,
- *  but WITHOUT ANY WARRANTY; without even the implied warranty of
- *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- *  GNU General Public License for more details.
- *
- *  You should have received a copy of the GNU General Public License
- *  along with this program; if not, see .
- */
-
-#ifndef HOST_OS_H
-#define HOST_OS_H
-
-#define HOST_DEFAULT_BSD_TYPE target_netbsd
-
-#endif /* HOST_OS_H */
diff --git a/bsd-user/netbsd/os-strace.h b/bsd-user/netbsd/os-strace.h
deleted file mode 100644
index 70cf51d63a..00
--- a/bsd-user/netbsd/os-strace.h
+++ /dev/null
@@ -1 +0,0 @@
-/* XXX NetBSD dependent strace print functions */
diff --git a/bsd-user/netbsd/strace.list b/bsd-user/netbsd/strace.list
deleted file mode 100644
index 5609d70d65..00
--- a/bsd-user/netbsd/strace.list
+++ /dev/null
@@ -1,145 +0,0 @@
-{ TARGET_NETBSD_NR___getcwd, "__getcwd", NULL, NULL, NULL },
-{ TARGET_NETBSD_NR___syscall, "__syscall", NULL, NULL, NULL },
-{ TARGET_NETBSD_NR___sysctl, "__sysctl", NULL, NULL, NULL },
-{ TARGET_NETBSD_NR_accept, "accept", "%s(%d,%#x,%#x)", NULL, NULL },
-{ TARGET_NETBSD_NR_access, "access", "%s(\"%s\",%#o)", NULL, NULL },
-{ TARGET_NETBSD_NR_acct, "acct", NULL, NULL, NULL },
-{ TARGET_NETBSD_NR_adjtime, "adjtime", NULL, NULL, NULL },
-{ TARGET_NETBSD_NR_bind, "bind", NULL, NULL, NULL },
-{ TARGET_NETBSD_NR_break, "break", NULL, NULL, NULL },
-{ TARGET_NETBSD_NR_chdir, "chdir", "%s(\"%s\")", NULL, NULL },
-{ TARGET_NETBSD_NR_chflags, "chflags", NULL, NULL, NULL },
-{ TARGET_NETBSD_NR_chmod, "chmod", "%s(\"%s\",%#o)", NULL, NULL },
-{ TARGET_NETBSD_NR_chown, "chown", NULL, NULL, NULL },
-{ TARGET_NETBSD_NR_chroot, "chroot", NULL, NULL, NULL },
-{ TARGET_NETBSD_NR_clock_getres, "clock_getres", NULL, NULL, NULL },
-{ TARGET_NETBSD_NR_clock_gettime, "clock_gettime", NULL, NULL, NULL },
-{ TARGET_NETBSD_NR_clock_settime, "clock_settime", NULL, NULL, NULL },
-{ TARGET_NETBSD_NR_close, "close", "%s(%d)", NULL, NULL },
-{ TARGET_NETBSD_NR_connect, "connect", "%s(%d,%#x,%d)", NULL, NULL },
-{ TARGET_NETBSD_NR_dup, "dup", NULL, NULL, NULL },
-{ TARGET_NETBSD_NR_dup2, "dup2", NULL, NULL, NULL },
-{ TARGET_NETBSD_NR_execve, "execve", NULL, print_execve, NULL },
-{ TARGET_NETBSD_NR_exit, "exit", "%s(%d)\n", NULL, NULL },
-{ TARGET_NETBSD_NR_fchdir, "fchdir", NULL, NULL, NULL },
-{ TARGET_NETBSD_NR_fchflags, "fchflags", NULL, NULL, NULL },
-{ TARGET_NETBSD_NR_fchmod, "fchmod", "%s(%d,%#o)", NULL, NULL },
-{ TARGET_NETBSD_NR_fchown, "fchown", "%s(\"%s\",%d,%d)", NULL, NULL },
-{ TARGET_NETBSD_NR_fcntl, "fcntl", NULL, NULL, NULL },
-{ TARGET_NETBSD_NR_flock, "flock", NULL, NULL, NULL },
-{ TARGET_NETBSD_NR_fork, "fork", "%s()", NULL, NULL },
-{ TARGET_NETBSD_NR_fpathconf, "fpathconf", NULL, NULL, NULL },
-{ TARGET_NETBSD_NR_fsync, "fsync", NULL, NULL, NULL },
-{ TARGET_NETBSD_NR_ftruncate, "ftruncate", NULL, NULL, NULL },
-{ TARGET_NETBSD_NR_futimes, "futimes", NULL, NULL, NULL },
-{ TARGET_NETBSD_NR_getegid, "getegid", "%s()", NULL, NULL },
-{ TARGET_NETBSD_NR_geteuid, "geteuid", "%s()", NULL, NULL },
-{ TARGET_NETBSD_NR_getgid, "getgid", "%s()", NU

Re: [RFC PATCH v1 00/26] migration: File based migration with multifd and fixed-ram

2023-03-31 Thread Fabiano Rosas
Peter Xu  writes:

> On Thu, Mar 30, 2023 at 03:03:10PM -0300, Fabiano Rosas wrote:
>> Hi folks,
>
> Hi,
>
>> 
>> I'm continuing the work done last year to add a new format of
>> migration stream that can be used to migrate large guests to a single
>> file in a performant way.
>> 
>> This is an early RFC with the previous code + my additions to support
>> multifd and direct IO. Let me know what you think!
>> 
>> Here are the reference links for previous discussions:
>> 
>> https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg01813.html
>> https://lists.gnu.org/archive/html/qemu-devel/2022-10/msg01338.html
>> https://lists.gnu.org/archive/html/qemu-devel/2022-10/msg05536.html
>> 
>> The series has 4 main parts:
>> 
>> 1) File migration: A new "file:" migration URI. So "file:mig" does the
>>same as "exec:cat > mig". Patches 1-4 implement this;
>> 
>> 2) Fixed-ram format: A new format for the migration stream. Puts guest
>>pages at their relative offsets in the migration file. This saves
>>space in the worst case of RAM utilization because every page has a
>>fixed offset in the migration file and (potentially) saves us time
>>because we could write pages independently in parallel. It also
>>gives alignment guarantees so we could use O_DIRECT. Patches 5-13
>>implement this;
>> 
>> With patches 1-13 these two^ can be used with:
>> 
>> (qemu) migrate_set_capability fixed-ram on
>> (qemu) migrate[_incoming] file:mig
>
> Have you considered enabling the new fixed-ram format with postcopy when
> loading?
>
> Due to the linear offseting of pages, I think it can achieve super fast vm
> loads due to O(1) lookup of pages and local page fault resolutions.
>

I don't think we have looked that much at the loading side yet. Good to
know that it has potential to be faster. I'll look into it. Thanks for
the suggestion.

>> 
>> --> new in this series:
>> 
>> 3) MultiFD support: This is about making use of the parallelism
>>allowed by the new format. We just need the threading and page
>>queuing infrastructure that is already in place for
>>multifd. Patches 14-24 implement this;
>> 
>> (qemu) migrate_set_capability fixed-ram on
>> (qemu) migrate_set_capability multifd on
>> (qemu) migrate_set_parameter multifd-channels 4
>> (qemu) migrate_set_parameter max-bandwidth 0
>> (qemu) migrate[_incoming] file:mig
>> 
>> 4) Add a new "direct_io" parameter and enable O_DIRECT for the
>>properly aligned segments of the migration (mostly ram). Patch 25.
>> 
>> (qemu) migrate_set_parameter direct-io on
>> 
>> Thanks! Some data below:
>> =
>> 
>> Outgoing migration to file. NVMe disk. XFS filesystem.
>> 
>> - Single migration runs of stopped 32G guest with ~90% RAM usage. Guest
>>   running `stress-ng --vm 4 --vm-bytes 90% --vm-method all --verify -t
>>   10m -v`:
>> 
>> migration type  | MB/s | pages/s |  ms
>> +--+-+--
>> savevm io_uring |  434 |  102294 | 71473
>
> So I assume this is the non-live migration scenario.  Could you explain
> what does io_uring mean here?
>

This table is all non-live migration. This particular line is a snapshot
(hmp_savevm->save_snapshot). I thought it could be relevant because it
is another way by which we write RAM into disk.

The io_uring is noise, I was initially under the impression that the
block device aio configuration affected this scenario.

>> file:   | 3017 |  855862 | 10301
>> fixed-ram   | 1982 |  330686 | 15637
>> +--+-+--
>> fixed-ram + multifd + O_DIRECT
>>  2 ch.  | 5565 | 1500882 |  5576
>>  4 ch.  | 5735 | 1991549 |  5412
>>  8 ch.  | 5650 | 1769650 |  5489
>> 16 ch.  | 6071 | 1832407 |  5114
>> 32 ch.  | 6147 | 1809588 |  5050
>> 64 ch.  | 6344 | 1841728 |  4895
>>128 ch.  | 6120 | 1915669 |  5085
>> +--+-+--
>
> Thanks,



Re: [RFC PATCH v1 10/26] migration/ram: Introduce 'fixed-ram' migration stream capability

2023-03-31 Thread Peter Xu
On Fri, Mar 31, 2023 at 08:56:01AM +0100, Daniel P. Berrangé wrote:
> On Thu, Mar 30, 2023 at 06:01:51PM -0400, Peter Xu wrote:
> > On Thu, Mar 30, 2023 at 03:03:20PM -0300, Fabiano Rosas wrote:
> > > From: Nikolay Borisov 
> > > 
> > > Implement 'fixed-ram' feature. The core of the feature is to ensure that
> > > each ram page of the migration stream has a specific offset in the
> > > resulting migration stream. The reasons why we'd want such behavior are
> > > twofold:
> > > 
> > >  - When doing a 'fixed-ram' migration the resulting file will have a
> > >bounded size, since pages which are dirtied multiple times will
> > >always go to a fixed location in the file, rather than constantly
> > >being added to a sequential stream. This eliminates cases where a vm
> > >with, say, 1G of ram can result in a migration file that's 10s of
> > >GBs, provided that the workload constantly redirties memory.
> > > 
> > >  - It paves the way to implement DIO-enabled save/restore of the
> > >migration stream as the pages are ensured to be written at aligned
> > >offsets.
> > > 
> > > The feature requires changing the stream format. First, a bitmap is
> > > introduced which tracks which pages have been written (i.e. are
> > > dirtied) during migration, and subsequently it is written into the
> > > resulting file, again at a fixed location for every ramblock. Zero
> > > pages are ignored as they'd be zero in the destination migration as
> > > well. With the changed format data would look like the following:
> > > 
> > > |name len|name|used_len|pc*|bitmap_size|pages_offset|bitmap|pages|
> > 
> > What happens with huge pages?  Would page size matter here?
> > 
> > I would assume it's fine if it uses a constant (small) page size, assuming
> > that should match with the granule that qemu tracks dirty (which IIUC is
> > the host page size not guest's).
> > 
> > But I haven't given this any further thought yet; maybe it would be
> > worthwhile in all cases to record page sizes here to be explicit or the
> > meaning of bitmap may not be clear (and then the bitmap_size will be a
> > field just for sanity check too).
> 
> I think recording the page sizes is an anti-feature in this case.
> 
> The migration format / state needs to reflect the guest ABI, but
> we need to be free to have different backend config behind that
> either side of the save/restore.
> 
> IOW, if I start a QEMU with 2 GB of RAM, I should be free to use
> small pages initially and after restore use 2 x 1 GB hugepages,
> or vice-versa.
> 
> The important thing with the pages that are saved into the file
> is that they are a 1:1 mapping of guest RAM regions to file offsets.
> IOW, the 2 GB of guest RAM is always a contiguous 2 GB region
> in the file.
> 
> If the src VM used 1 GB pages, we would be writing a full 2 GB
> of data assuming both pages were dirty.
> 
> If the src VM used 4k pages, we would be writing some subset of
> the 2 GB of data, and the rest would be unwritten.
> 
> Either way, when reading back the data we restore it into either
> 1 GB pages or 4k pages, because any places that were unwritten
> originally will read back as zeros.

I think there's already the page size information, because there's a bitmap
embedded in the format, at least in the current proposal, and the bitmap can
only be defined with a page size provided in some form.

Here I agree the backend can change before/after a migration (live or
not).  Though the question is whether page size matters in the snapshot
layout rather than what the loaded QEMU instance will use as backend.

> 
> > If postcopy might be an option, we'd want the page size to be the host page
> > size because then looking up the bitmap will be straightforward, deciding
> > whether we should copy over page (UFFDIO_COPY) or fill in with zeros
> > (UFFDIO_ZEROPAGE).
> 
> This format is only intended for the case where we are migrating to
> a random-access medium, aka a file, because the fixed RAM mappings
> to disk mean that we need to seek back to the original location to
> re-write pages that get dirtied. It isn't suitable for a live
> migration stream, and thus postcopy is inherently out of scope.

Yes, I've commented also in the cover letter, but I can expand a bit.

I mean support postcopy only when loading, but not when saving.

Saving to file definitely cannot work with postcopy because there's no dest
qemu running.

Loading from file, OTOH, can work together with postcopy.

Right now AFAICT current approach is precopy loading the whole guest image
with the supported snapshot format (if I can call it just a snapshot).

What I want to say is we can consider supporting postcopy on loading, in
that we start an "empty" QEMU dest node and, when any page fault is
triggered, resolve it using userfault by looking up the snapshot file
rather than sending a request back to the source.  I mentioned that
because there'll be
two major benefits which I mentioned in reply to the cover letter quickly,
but I

[PATCH 3/3] target/arm: Implement FEAT_PAN3

2023-03-31 Thread Peter Maydell
FEAT_PAN3 adds an EPAN bit to SCTLR_EL1 and SCTLR_EL2, which allows
the PAN bit to make memory non-privileged-read/write if it is
user-executable as well as if it is user-read/write.

Implement this feature and enable it in the AArch64 'max' CPU.

Signed-off-by: Peter Maydell 
---
 docs/system/arm/emulation.rst |  1 +
 target/arm/cpu.h  |  5 +
 target/arm/cpu64.c|  2 +-
 target/arm/ptw.c  | 14 +-
 4 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index 2062d712610..73389878755 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -56,6 +56,7 @@ the following architecture extensions:
 - FEAT_MTE3 (MTE Asymmetric Fault Handling)
 - FEAT_PAN (Privileged access never)
 - FEAT_PAN2 (AT S1E1R and AT S1E1W instruction variants affected by PSTATE.PAN)
+- FEAT_PAN3 (Support for SCTLR_ELx.EPAN)
 - FEAT_PAuth (Pointer authentication)
 - FEAT_PMULL (PMULL, PMULL2 instructions)
 - FEAT_PMUv3p1 (PMU Extensions v3.1)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index c097cae9882..d469a2637b3 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3823,6 +3823,11 @@ static inline bool isar_feature_aa64_ats1e1(const ARMISARegisters *id)
 return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, PAN) >= 2;
 }
 
+static inline bool isar_feature_aa64_pan3(const ARMISARegisters *id)
+{
+return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, PAN) >= 3;
+}
+
 static inline bool isar_feature_aa64_hcx(const ARMISARegisters *id)
 {
 return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, HCX) != 0;
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 0fb07cc7b6d..735ca541634 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -1302,7 +1302,7 @@ static void aarch64_max_initfn(Object *obj)
 t = FIELD_DP64(t, ID_AA64MMFR1, VH, 1);   /* FEAT_VHE */
 t = FIELD_DP64(t, ID_AA64MMFR1, HPDS, 1); /* FEAT_HPDS */
 t = FIELD_DP64(t, ID_AA64MMFR1, LO, 1);   /* FEAT_LOR */
-t = FIELD_DP64(t, ID_AA64MMFR1, PAN, 2);  /* FEAT_PAN2 */
+t = FIELD_DP64(t, ID_AA64MMFR1, PAN, 3);  /* FEAT_PAN3 */
 t = FIELD_DP64(t, ID_AA64MMFR1, XNX, 1);  /* FEAT_XNX */
 t = FIELD_DP64(t, ID_AA64MMFR1, ETS, 1);  /* FEAT_ETS */
 t = FIELD_DP64(t, ID_AA64MMFR1, HCX, 1);  /* FEAT_HCX */
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index ec3f51782aa..499308fcb07 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -947,6 +947,7 @@ static int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0)
 static int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
   int ap, int ns, int xn, int pxn)
 {
+ARMCPU *cpu = env_archcpu(env);
 bool is_user = regime_is_user(env, mmu_idx);
 int prot_rw, user_rw;
 bool have_wxn;
@@ -958,8 +959,19 @@ static int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
 if (is_user) {
 prot_rw = user_rw;
 } else {
+/*
+ * PAN controls can forbid data accesses but don't affect insn fetch.
+ * Plain PAN forbids data accesses if EL0 has data permissions;
+ * PAN3 forbids data accesses if EL0 has either data or exec perms.
+ * Note that for AArch64 the 'user can exec' case is exactly !xn.
+ * We make the IMPDEF choices that SCR_EL3.SIF and Realm EL2&0
+ * do not affect EPAN.
+ */
 if (user_rw && regime_is_pan(env, mmu_idx)) {
-/* PAN forbids data accesses but doesn't affect insn fetch */
+prot_rw = 0;
+} else if (cpu_isar_feature(aa64_pan3, cpu) && is_aa64 &&
+   regime_is_pan(env, mmu_idx) &&
+   (regime_sctlr(env, mmu_idx) & SCTLR_EPAN) && !xn) {
 prot_rw = 0;
 } else {
 prot_rw = simple_ap_to_rw_prot_is_user(ap, false);
-- 
2.34.1




[PATCH 1/3] target/arm: Pass ARMMMUFaultInfo to merge_syn_data_abort()

2023-03-31 Thread Peter Maydell
We already pass merge_syn_data_abort() two fields from the
ARMMMUFaultInfo struct, and we're about to want to use a third field.
Refactor to just pass a pointer to the fault info.

Signed-off-by: Peter Maydell 
---
 target/arm/tcg/tlb_helper.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/target/arm/tcg/tlb_helper.c b/target/arm/tcg/tlb_helper.c
index 31eb77f7df9..1a61adb8a68 100644
--- a/target/arm/tcg/tlb_helper.c
+++ b/target/arm/tcg/tlb_helper.c
@@ -24,9 +24,9 @@ bool arm_s1_regime_using_lpae_format(CPUARMState *env, ARMMMUIdx mmu_idx)
 }
 
 static inline uint32_t merge_syn_data_abort(uint32_t template_syn,
+ARMMMUFaultInfo *fi,
 unsigned int target_el,
-bool same_el, bool ea,
-bool s1ptw, bool is_write,
+bool same_el, bool is_write,
 int fsc)
 {
 uint32_t syn;
@@ -43,9 +43,9 @@ static inline uint32_t merge_syn_data_abort(uint32_t template_syn,
  * ISS encoding for an exception from a Data Abort, the
  * ISV field.
  */
-if (!(template_syn & ARM_EL_ISV) || target_el != 2 || s1ptw) {
+if (!(template_syn & ARM_EL_ISV) || target_el != 2 || fi->s1ptw) {
 syn = syn_data_abort_no_iss(same_el, 0,
-ea, 0, s1ptw, is_write, fsc);
+fi->ea, 0, fi->s1ptw, is_write, fsc);
 } else {
 /*
  * Fields: IL, ISV, SAS, SSE, SRT, SF and AR come from the template
@@ -54,7 +54,7 @@ static inline uint32_t merge_syn_data_abort(uint32_t template_syn,
  */
 syn = syn_data_abort_with_iss(same_el,
   0, 0, 0, 0, 0,
-  ea, 0, s1ptw, is_write, fsc,
+  fi->ea, 0, fi->s1ptw, is_write, fsc,
   true);
 /* Merge the runtime syndrome with the template syndrome.  */
 syn |= template_syn;
@@ -117,9 +117,8 @@ void arm_deliver_fault(ARMCPU *cpu, vaddr addr,
 syn = syn_insn_abort(same_el, fi->ea, fi->s1ptw, fsc);
 exc = EXCP_PREFETCH_ABORT;
 } else {
-syn = merge_syn_data_abort(env->exception.syndrome, target_el,
-   same_el, fi->ea, fi->s1ptw,
-   access_type == MMU_DATA_STORE,
+syn = merge_syn_data_abort(env->exception.syndrome, fi, target_el,
+   same_el, access_type == MMU_DATA_STORE,
fsc);
 if (access_type == MMU_DATA_STORE
 && arm_feature(env, ARM_FEATURE_V6)) {
-- 
2.34.1




[PATCH 0/3] target/arm: Fix ESR_EL2 buglet, implement FEAT_PAN3

2023-03-31 Thread Peter Maydell
The main purpose of this patchset is to implement FEAT_PAN3,
which allows the guest to force privileged code to not be able
to access memory that can be executed by user code. (This is
an extension of the existing FEAT_PAN which denies access
if user code could read or write the memory.) That is all
done in patch 3.

Patches 1 and 2 fix a buglet in our ESR_EL2 syndrome reporting
that I happened to notice while testing the FEAT_PAN3 code:
we were reporting the detailed instruction syndrome information
for all data aborts reported to EL2, whereas the architecture
requires this to happen only for stage-2 aborts, not stage-1
aborts.

This is all for-8.1 material -- the syndrome bug is minor
and has been around forever so isn't worth trying to fix
for 8.0 at this point in the release cycle.

thanks
-- PMM

Peter Maydell (3):
  target/arm: Pass ARMMMUFaultInfo to merge_syn_data_abort()
  target/arm: Don't set ISV when reporting stage 1 faults in ESR_EL2
  target/arm: Implement FEAT_PAN3

 docs/system/arm/emulation.rst |  1 +
 target/arm/cpu.h  |  5 +
 target/arm/cpu64.c|  2 +-
 target/arm/ptw.c  | 14 +-
 target/arm/tcg/tlb_helper.c   | 26 --
 5 files changed, 36 insertions(+), 12 deletions(-)

-- 
2.34.1

 



[PATCH 2/3] target/arm: Don't set ISV when reporting stage 1 faults in ESR_EL2

2023-03-31 Thread Peter Maydell
The syndrome value reported to ESR_EL2 should only contain the
detailed instruction syndrome information when the fault has been
caused by a stage 2 abort, not when the fault was a stage 1 abort
(i.e.  caused by execution at EL2).  We were getting this wrong and
reporting the detailed ISV information all the time.

Fix the bug by checking fi->stage2.  Add a TODO comment noting the
cases where we'll have to come back and revisit this when we
implement FEAT_LS64 and friends.

Signed-off-by: Peter Maydell 
---
 target/arm/tcg/tlb_helper.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/target/arm/tcg/tlb_helper.c b/target/arm/tcg/tlb_helper.c
index 1a61adb8a68..d5a89bc5141 100644
--- a/target/arm/tcg/tlb_helper.c
+++ b/target/arm/tcg/tlb_helper.c
@@ -32,8 +32,9 @@ static inline uint32_t merge_syn_data_abort(uint32_t template_syn,
 uint32_t syn;
 
 /*
- * ISV is only set for data aborts routed to EL2 and
- * never for stage-1 page table walks faulting on stage 2.
+ * ISV is only set for stage-2 data aborts routed to EL2 and
+ * never for stage-1 page table walks faulting on stage 2
+ * or for stage-1 faults.
  *
  * Furthermore, ISV is only set for certain kinds of load/stores.
  * If the template syndrome does not have ISV set, we should leave
@@ -42,8 +43,14 @@ static inline uint32_t merge_syn_data_abort(uint32_t template_syn,
  * See ARMv8 specs, D7-1974:
  * ISS encoding for an exception from a Data Abort, the
  * ISV field.
+ *
+ * TODO: FEAT_LS64/FEAT_LS64_V/FEAT_SL64_ACCDATA: Translation,
+ * Access Flag, and Permission faults caused by LD64B, ST64B,
+ * ST64BV, or ST64BV0 insns report syndrome info even for stage-1
+ * faults and regardless of the target EL.
  */
-if (!(template_syn & ARM_EL_ISV) || target_el != 2 || fi->s1ptw) {
+if (!(template_syn & ARM_EL_ISV) || target_el != 2
+|| fi->s1ptw || !fi->stage2) {
 syn = syn_data_abort_no_iss(same_el, 0,
 fi->ea, 0, fi->s1ptw, is_write, fsc);
 } else {
-- 
2.34.1




Re: [RFC PATCH v1 00/26] migration: File based migration with multifd and fixed-ram

2023-03-31 Thread Peter Xu
On Fri, Mar 31, 2023 at 11:37:50AM -0300, Fabiano Rosas wrote:
> >> Outgoing migration to file. NVMe disk. XFS filesystem.
> >> 
> >> - Single migration runs of stopped 32G guest with ~90% RAM usage. Guest
> >>   running `stress-ng --vm 4 --vm-bytes 90% --vm-method all --verify -t
> >>   10m -v`:
> >> 
> >> migration type  | MB/s | pages/s |  ms
> >> +--+-+--
> >> savevm io_uring |  434 |  102294 | 71473
> >
> > So I assume this is the non-live migration scenario.  Could you explain
> > what does io_uring mean here?
> >
> 
> This table is all non-live migration. This particular line is a snapshot
> (hmp_savevm->save_snapshot). I thought it could be relevant because it
> is another way by which we write RAM into disk.

I see, so if it's all non-live that explains it; I was curious what the
relationship is between this feature and the live snapshot that QEMU also
supports.

I also don't immediately see why savevm would be so much slower; do you have
an answer?  Maybe it's somewhere and I just overlooked it.

IIUC this is the "vm suspend" case, so there's the extra benefit of knowing
"we can stop the VM".  It smells slightly weird to build this on top of
"migrate" from that pov, rather than "savevm", though.  Any thoughts on
this aspect (on why not building this on top of "savevm")?

Thanks,

> 
> The io_uring is noise, I was initially under the impression that the
> block device aio configuration affected this scenario.
> 
> >> file:   | 3017 |  855862 | 10301
> >> fixed-ram   | 1982 |  330686 | 15637
> >> +--+-+--
> >> fixed-ram + multifd + O_DIRECT
> >>  2 ch.  | 5565 | 1500882 |  5576
> >>  4 ch.  | 5735 | 1991549 |  5412
> >>  8 ch.  | 5650 | 1769650 |  5489
> >> 16 ch.  | 6071 | 1832407 |  5114
> >> 32 ch.  | 6147 | 1809588 |  5050
> >> 64 ch.  | 6344 | 1841728 |  4895
> >>128 ch.  | 6120 | 1915669 |  5085
> >> +--+-+--
> >
> > Thanks,
> 

-- 
Peter Xu




Re: [PATCH] meson: add more version numbers to the summary

2023-03-31 Thread Richard Henderson

On 3/30/23 03:46, Paolo Bonzini wrote:

-cc.find_library('gpg-error', required: true)])
+cc.find_library('gpg-error', required: true)],
+version: gcrypt.version())


Indentation.

Reviewed-by: Richard Henderson 


r~



Re: [PATCH] meson: add more version numbers to the summary

2023-03-31 Thread Richard Henderson

On 3/31/23 07:54, Richard Henderson wrote:

On 3/30/23 03:46, Paolo Bonzini wrote:

-    cc.find_library('gpg-error', required: true)])
+    cc.find_library('gpg-error', required: true)],
+    version: gcrypt.version())


Indentation.


Bah, mis-read the patch.



Reviewed-by: Richard Henderson 


r~





[PATCH] MAINTAINERS: Add Eugenio Pérez as vhost-shadow-virtqueue reviewer

2023-03-31 Thread Eugenio Pérez
I'd like to be notified of SVQ patches and review them.

Signed-off-by: Eugenio Pérez 
---
 MAINTAINERS | 4 
 1 file changed, 4 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index ef45b5e71e..986119e8ab 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2061,6 +2061,10 @@ F: backends/vhost-user.c
 F: include/sysemu/vhost-user-backend.h
 F: subprojects/libvhost-user/
 
+vhost-shadow-virtqueue
+R: Eugenio Pérez 
+F: hw/virtio/vhost-shadow-virtqueue.*
+
 virtio
 M: Michael S. Tsirkin 
 S: Supported
-- 
2.31.1




Re: [RFC PATCH v1 10/26] migration/ram: Introduce 'fixed-ram' migration stream capability

2023-03-31 Thread Fabiano Rosas
Peter Xu  writes:

>> 
>> * pc - refers to the page_size/mr->addr members, so newly added members
>> begin from "bitmap_size".
>
> Could you elaborate more on what's the pc?
>
> I also didn't see this *pc in below migration.rst update.
>

Yeah, you need to be looking at the code to figure that one out. That
was intended to reference some postcopy data that is (already) inserted
into the stream. Literally this:

if (migrate_postcopy_ram() && block->page_size !=
  qemu_host_page_size) {
qemu_put_be64(f, block->page_size);
}
if (migrate_ignore_shared()) {
qemu_put_be64(f, block->mr->addr);
}

It has nothing to do with this patch. I need to rewrite that part of the
commit message a bit.

>> 
>> This layout is initialized during ram_save_setup so instead of having a
>> sequential stream of pages that follow the ramblock headers the dirty
>> pages for a ramblock follow its header. Since all pages have a fixed
>> location RAM_SAVE_FLAG_EOS is no longer generated on every migration
>> iteration but there is effectively a single RAM_SAVE_FLAG_EOS right at
>> the end.
>> 
>> Signed-off-by: Nikolay Borisov 
>> Signed-off-by: Fabiano Rosas 

...

>> @@ -4390,6 +4432,12 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
>>  }
>>  }
>>  
>> +if (migrate_check_fixed_ram(s, &local_err) < 0) {
>
> This check might be too late afaict, QMP cmd "migrate" could have already
> succeeded.
>
> Can we do an early check in / close to qmp_migrate()?  The idea is we fail
> at the QMP migrate command there.
>

Yes, some of it depends on the QEMUFile being known but I can at least
move part of the verification earlier.

>> +migrate_fd_cleanup(s);
>> +migrate_fd_error(s, local_err);
>> +return;
>> +}
>> +
>>  if (resume) {
>>  /* Wakeup the main migration thread to do the recovery */
>>  migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_PAUSED,
>> @@ -4519,6 +4567,7 @@ static Property migration_properties[] = {
>>  DEFINE_PROP_STRING("tls-authz", MigrationState, parameters.tls_authz),
>>  
>>  /* Migration capabilities */
>> +DEFINE_PROP_MIG_CAP("x-fixed-ram", MIGRATION_CAPABILITY_FIXED_RAM),
>>  DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
>>  DEFINE_PROP_MIG_CAP("x-rdma-pin-all", MIGRATION_CAPABILITY_RDMA_PIN_ALL),
>>  DEFINE_PROP_MIG_CAP("x-auto-converge", MIGRATION_CAPABILITY_AUTO_CONVERGE),
>> diff --git a/migration/migration.h b/migration/migration.h
>> index 2da2f8a164..8cf3caecfe 100644
>> --- a/migration/migration.h
>> +++ b/migration/migration.h
>> @@ -416,6 +416,7 @@ bool migrate_zero_blocks(void);
>>  bool migrate_dirty_bitmaps(void);
>>  bool migrate_ignore_shared(void);
>>  bool migrate_validate_uuid(void);
>> +int migrate_fixed_ram(void);
>>  
>>  bool migrate_auto_converge(void);
>>  bool migrate_use_multifd(void);
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 96e8a19a58..56f0f782c8 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -1310,9 +1310,14 @@ static int save_zero_page_to_file(PageSearchStatus *pss,
>>  int len = 0;
>>  
>>  if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
>> -len += save_page_header(pss, block, offset | RAM_SAVE_FLAG_ZERO);
>> -qemu_put_byte(file, 0);
>> -len += 1;
>> +if (migrate_fixed_ram()) {
>> +/* for zero pages we don't need to do anything */
>> +len = 1;
>
> I think you wanted to increase the "duplicated" counter, but this will also
> increase ram-transferred even though only 1 byte.
>

Ah, well spotted, that is indeed incorrect.

> Perhaps just pass a pointer to keep the bytes, and return true/false to
> increase the counter (to make everything accurate)?
>

Ok

>> +} else {
>> +len += save_page_header(pss, block, offset | 
>> RAM_SAVE_FLAG_ZERO);
>> +qemu_put_byte(file, 0);
>> +len += 1;
>> +}
>>  ram_release_page(block->idstr, offset);
>>  }
>>  return len;
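The counter fix suggested above (return whether the page was a zero page so the caller can bump the "duplicated" counter, and report stream bytes through an out-parameter) could look like this minimal sketch. All names here and the 9-byte header size are illustrative assumptions, not the actual QEMU code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE 4096

/* Hypothetical stand-in for QEMU's buffer_is_zero(). */
static bool buffer_is_zero_sketch(const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}

/*
 * Illustrative interface only: return true when the page is a zero page
 * (so the caller increments the "duplicated" counter), and report the
 * bytes actually written to the stream via *bytes_written.  With
 * fixed-ram nothing is written for zero pages, so ram-transferred
 * stays accurate instead of being bumped by a fake 1-byte length.
 */
static bool save_zero_page_sketch(const unsigned char *page, bool fixed_ram,
                                  size_t *bytes_written)
{
    *bytes_written = 0;
    if (!buffer_is_zero_sketch(page, PAGE_SIZE)) {
        return false;  /* not a zero page; caller saves it normally */
    }
    if (!fixed_ram) {
        /* traditional stream: page header (assumed 8 bytes) + one zero byte */
        *bytes_written = 8 + 1;
    }
    return true;
}
```

With this shape the caller updates the duplicated counter from the return value and the transferred-bytes counter from *bytes_written, keeping both accurate in the fixed-ram and stream cases.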



[PATCH v4 7/8] target/riscv: Enable PC-relative translation in system mode

2023-03-31 Thread Weiwei Li
The existence of CF_PCREL can improve performance with the guest
kernel's address space randomization.  Each guest process maps
libc.so (et al) at a different virtual address, and this allows
those translations to be shared.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
Reviewed-by: LIU Zhiwei 
---
 target/riscv/cpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 646fa31a59..3b562d5d9f 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1193,6 +1193,8 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
 
 
 #ifndef CONFIG_USER_ONLY
+cs->tcg_cflags |= CF_PCREL;
+
 if (cpu->cfg.ext_sstc) {
 riscv_timer_init(cpu);
 }
-- 
2.25.1




[PATCH v4 5/8] accel/tcg: Fix overwrite problems of tcg_cflags

2023-03-31 Thread Weiwei Li
CPUs may set CF_PCREL in tcg_cflags before qemu_init_vcpu(), in which
case tcg_cflags would be overwritten by tcg_cpu_init_cflags().

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 accel/tcg/tcg-accel-ops.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/accel/tcg/tcg-accel-ops.c b/accel/tcg/tcg-accel-ops.c
index af35e0d092..58c8e64096 100644
--- a/accel/tcg/tcg-accel-ops.c
+++ b/accel/tcg/tcg-accel-ops.c
@@ -59,7 +59,7 @@ void tcg_cpu_init_cflags(CPUState *cpu, bool parallel)
 
 cflags |= parallel ? CF_PARALLEL : 0;
 cflags |= icount_enabled() ? CF_USE_ICOUNT : 0;
-cpu->tcg_cflags = cflags;
+cpu->tcg_cflags |= cflags;
 }
 
 void tcg_cpus_destroy(CPUState *cpu)
-- 
2.25.1




[PATCH v4 2/8] target/riscv: Update cur_pmmask/base when xl changes

2023-03-31 Thread Weiwei Li
write_mstatus() can only change the current xl when in debug mode,
and we need to update cur_pmmask/base in that case.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
Reviewed-by: LIU Zhiwei 
---
 target/riscv/csr.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index d522efc0b6..43b9ad4500 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -1277,8 +1277,15 @@ static RISCVException write_mstatus(CPURISCVState *env, 
int csrno,
 mstatus = set_field(mstatus, MSTATUS64_SXL, xl);
 }
 env->mstatus = mstatus;
-env->xl = cpu_recompute_xl(env);
 
+/*
+ * Except in debug mode, UXL/SXL can only be modified by higher
+ * privilege mode. So xl will not be changed in normal mode.
+ */
+if (env->debugger) {
+env->xl = cpu_recompute_xl(env);
+riscv_cpu_update_mask(env);
+}
 return RISCV_EXCP_NONE;
 }
 
-- 
2.25.1




[PATCH v4 1/8] target/riscv: Fix pointer mask transformation for vector address

2023-03-31 Thread Weiwei Li
actual_address = (requested_address & ~mpmmask) | mpmbase.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
Reviewed-by: Daniel Henrique Barboza 
Reviewed-by: LIU Zhiwei 
---
 target/riscv/vector_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 2423affe37..a58d82af8c 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -172,7 +172,7 @@ static inline uint32_t vext_get_total_elems(CPURISCVState 
*env, uint32_t desc,
 
 static inline target_ulong adjust_addr(CPURISCVState *env, target_ulong addr)
 {
-return (addr & env->cur_pmmask) | env->cur_pmbase;
+return (addr & ~env->cur_pmmask) | env->cur_pmbase;
 }
 
 /*
-- 
2.25.1




[PATCH v4 6/8] accel/tcg: Fix tb mis-matched problem when CF_PCREL is enabled

2023-03-31 Thread Weiwei Li
A corner case is triggered when TBs with first_pc = 0x8008
and first_pc = 0x80200 have the same jump cache hash and share
the same tb entry, with identical tb information except for the PC.
The executed sequence is as follows:
tb(0x8008) -> tb(0x8008) -> tb(0x80200) -> tb(0x8008)

1. The first time the tb for 0x8008 is loaded, the tb in jmp_cache is
filled, but the pc is not updated.
2. The second time the tb for 0x8008 is looked up in tb_lookup(),
the pc in the jmp cache is set to 0x8008.
3. When the tb for 0x80200 is loaded, the tb in the jmp cache is updated
to this block, but the pc is not updated and remains 0x8008.
4. Finally, the last time the tb for 0x8008 is looked up, the tb for
0x80200 is mismatched.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 accel/tcg/cpu-exec.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index c815f2dbfd..faff413f42 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -983,6 +983,9 @@ cpu_exec_loop(CPUState *cpu, SyncClocks *sc)
 h = tb_jmp_cache_hash_func(pc);
 /* Use the pc value already stored in tb->pc. */
 qatomic_set(&cpu->tb_jmp_cache->array[h].tb, tb);
+if (cflags & CF_PCREL) {
+qatomic_set(&cpu->tb_jmp_cache->array[h].pc, pc);
+}
 }
 
 #ifndef CONFIG_USER_ONLY
-- 
2.25.1




[PATCH v4 8/8] target/riscv: Add pointer mask support for instruction fetch

2023-03-31 Thread Weiwei Li
Transform the fetch address in cpu_get_tb_cpu_state() when pointer
mask for instruction is enabled.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 target/riscv/cpu.h|  1 +
 target/riscv/cpu_helper.c | 20 +++-
 target/riscv/csr.c|  2 --
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 638e47c75a..57bd9c3279 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -368,6 +368,7 @@ struct CPUArchState {
 #endif
 target_ulong cur_pmmask;
 target_ulong cur_pmbase;
+bool cur_pminsn;
 
 /* Fields from here on are preserved across CPU reset. */
 QEMUTimer *stimer; /* Internal timer for S-mode interrupt */
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index f88c503cf4..b683a770fe 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -40,6 +40,19 @@ int riscv_cpu_mmu_index(CPURISCVState *env, bool ifetch)
 #endif
 }
 
+static target_ulong adjust_pc_address(CPURISCVState *env, target_ulong pc)
+{
+target_ulong adjust_pc = pc;
+
+if (env->cur_pminsn) {
+adjust_pc = (adjust_pc & ~env->cur_pmmask) | env->cur_pmbase;
+} else if (env->xl == MXL_RV32) {
+adjust_pc &= UINT32_MAX;
+}
+
+return adjust_pc;
+}
+
 void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong *pc,
   target_ulong *cs_base, uint32_t *pflags)
 {
@@ -48,7 +61,7 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong 
*pc,
 
 uint32_t flags = 0;
 
-*pc = env->xl == MXL_RV32 ? env->pc & UINT32_MAX : env->pc;
+*pc = adjust_pc_address(env, env->pc);
 *cs_base = 0;
 
 if (cpu->cfg.ext_zve32f) {
@@ -124,6 +137,7 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong 
*pc,
 void riscv_cpu_update_mask(CPURISCVState *env)
 {
 target_ulong mask = -1, base = 0;
+bool insn = false;
 /*
  * TODO: Current RVJ spec does not specify
  * how the extension interacts with XLEN.
@@ -135,18 +149,21 @@ void riscv_cpu_update_mask(CPURISCVState *env)
 if (env->mmte & M_PM_ENABLE) {
 mask = env->mpmmask;
 base = env->mpmbase;
+insn = env->mmte & MMTE_M_PM_INSN;
 }
 break;
 case PRV_S:
 if (env->mmte & S_PM_ENABLE) {
 mask = env->spmmask;
 base = env->spmbase;
+insn = env->mmte & MMTE_S_PM_INSN;
 }
 break;
 case PRV_U:
 if (env->mmte & U_PM_ENABLE) {
 mask = env->upmmask;
 base = env->upmbase;
+insn = env->mmte & MMTE_U_PM_INSN;
 }
 break;
 default:
@@ -161,6 +178,7 @@ void riscv_cpu_update_mask(CPURISCVState *env)
 env->cur_pmmask = mask;
 env->cur_pmbase = base;
 }
+env->cur_pminsn = insn;
 }
 
 #ifndef CONFIG_USER_ONLY
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 43b9ad4500..0902b64129 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -3518,8 +3518,6 @@ static RISCVException write_mmte(CPURISCVState *env, int 
csrno,
 /* for machine mode pm.current is hardwired to 1 */
 wpri_val |= MMTE_M_PM_CURRENT;
 
-/* hardwiring pm.instruction bit to 0, since it's not supported yet */
-wpri_val &= ~(MMTE_M_PM_INSN | MMTE_S_PM_INSN | MMTE_U_PM_INSN);
 env->mmte = wpri_val | PM_EXT_DIRTY;
 riscv_cpu_update_mask(env);
 
-- 
2.25.1




[PATCH v4 0/8] target/riscv: Fix pointer mask related support

2023-03-31 Thread Weiwei Li
This patchset tries to fix some problem in current implementation for pointer 
mask, and add support for pointer mask of instruction fetch.

The port is available here:
https://github.com/plctlab/plct-qemu/tree/plct-pm-fix-v4

v2:
* drop some erroneous patches
* Add patch 2 and 3 to fix the new problems
* Add patch 4 and 5 to use PC-relative translation for pointer mask for 
instruction fetch

v3:
* use target_pc temp instead of cpu_pc to store into badaddr in patch 3
* use dest_gpr instead of tcg_temp_new() for succ_pc in patch 4
* enable CF_PCREL for system mode in separate patch 5

v4:
* Fix wrong pc_save value for conditional jump in patch 4
* Fix tcg_cflags overwrite problem to make CF_PCREL really work in new patch 5
* Fix tb mis-matched problem in new patch 6

Weiwei Li (8):
  target/riscv: Fix pointer mask transformation for vector address
  target/riscv: Update cur_pmmask/base when xl changes
  target/riscv: Fix target address to update badaddr
  target/riscv: Add support for PC-relative translation
  accel/tcg: Fix overwrite problems of tcg_cflags
  accel/tcg: Fix tb mis-matched problem when CF_PCREL is enabled
  target/riscv: Enable PC-relative translation in system mode
  target/riscv: Add pointer mask support for instruction fetch

 accel/tcg/cpu-exec.c|  3 ++
 accel/tcg/tcg-accel-ops.c   |  2 +-
 target/riscv/cpu.c  | 31 +++
 target/riscv/cpu.h  |  1 +
 target/riscv/cpu_helper.c   | 20 ++-
 target/riscv/csr.c  | 11 ++--
 target/riscv/insn_trans/trans_rvi.c.inc | 47 
 target/riscv/translate.c| 72 ++---
 target/riscv/vector_helper.c|  2 +-
 9 files changed, 145 insertions(+), 44 deletions(-)

-- 
2.25.1




[PATCH v4 3/8] target/riscv: Fix target address to update badaddr

2023-03-31 Thread Weiwei Li
Compute the target address before storing it into badaddr
when a misaligned exception is triggered.
Use a target_pc temp to store the target address, which avoids
the confusing sequence of updating the target address into
cpu_pc before the misalignment check, then updating it into badaddr
and restoring cpu_pc to the current pc if the exception is triggered.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 target/riscv/insn_trans/trans_rvi.c.inc | 23 ---
 target/riscv/translate.c| 21 ++---
 2 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvi.c.inc 
b/target/riscv/insn_trans/trans_rvi.c.inc
index 4ad54e8a49..48c73cfcfe 100644
--- a/target/riscv/insn_trans/trans_rvi.c.inc
+++ b/target/riscv/insn_trans/trans_rvi.c.inc
@@ -51,25 +51,30 @@ static bool trans_jal(DisasContext *ctx, arg_jal *a)
 static bool trans_jalr(DisasContext *ctx, arg_jalr *a)
 {
 TCGLabel *misaligned = NULL;
+TCGv target_pc = tcg_temp_new();
 
-tcg_gen_addi_tl(cpu_pc, get_gpr(ctx, a->rs1, EXT_NONE), a->imm);
-tcg_gen_andi_tl(cpu_pc, cpu_pc, (target_ulong)-2);
+tcg_gen_addi_tl(target_pc, get_gpr(ctx, a->rs1, EXT_NONE), a->imm);
+tcg_gen_andi_tl(target_pc, target_pc, (target_ulong)-2);
+
+if (get_xl(ctx) == MXL_RV32) {
+tcg_gen_ext32s_tl(target_pc, target_pc);
+}
 
-gen_set_pc(ctx, cpu_pc);
 if (!has_ext(ctx, RVC)) {
 TCGv t0 = tcg_temp_new();
 
 misaligned = gen_new_label();
-tcg_gen_andi_tl(t0, cpu_pc, 0x2);
+tcg_gen_andi_tl(t0, target_pc, 0x2);
 tcg_gen_brcondi_tl(TCG_COND_NE, t0, 0x0, misaligned);
 }
 
 gen_set_gpri(ctx, a->rd, ctx->pc_succ_insn);
+tcg_gen_mov_tl(cpu_pc, target_pc);
 lookup_and_goto_ptr(ctx);
 
 if (misaligned) {
 gen_set_label(misaligned);
-gen_exception_inst_addr_mis(ctx);
+gen_exception_inst_addr_mis(ctx, target_pc);
 }
 ctx->base.is_jmp = DISAS_NORETURN;
 
@@ -153,6 +158,7 @@ static bool gen_branch(DisasContext *ctx, arg_b *a, TCGCond 
cond)
 TCGLabel *l = gen_new_label();
 TCGv src1 = get_gpr(ctx, a->rs1, EXT_SIGN);
 TCGv src2 = get_gpr(ctx, a->rs2, EXT_SIGN);
+target_ulong next_pc;
 
 if (get_xl(ctx) == MXL_RV128) {
 TCGv src1h = get_gprh(ctx, a->rs1);
@@ -169,9 +175,12 @@ static bool gen_branch(DisasContext *ctx, arg_b *a, 
TCGCond cond)
 
 gen_set_label(l); /* branch taken */
 
-if (!has_ext(ctx, RVC) && ((ctx->base.pc_next + a->imm) & 0x3)) {
+next_pc = ctx->base.pc_next + a->imm;
+if (!has_ext(ctx, RVC) && (next_pc & 0x3)) {
 /* misaligned */
-gen_exception_inst_addr_mis(ctx);
+TCGv target_pc = tcg_temp_new();
+gen_get_target_pc(target_pc, ctx, next_pc);
+gen_exception_inst_addr_mis(ctx, target_pc);
 } else {
 gen_goto_tb(ctx, 0, ctx->base.pc_next + a->imm);
 }
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 0ee8ee147d..7b5223efc2 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -222,21 +222,18 @@ static void decode_save_opc(DisasContext *ctx)
 ctx->insn_start = NULL;
 }
 
-static void gen_set_pc_imm(DisasContext *ctx, target_ulong dest)
+static void gen_get_target_pc(TCGv target, DisasContext *ctx,
+  target_ulong dest)
 {
 if (get_xl(ctx) == MXL_RV32) {
 dest = (int32_t)dest;
 }
-tcg_gen_movi_tl(cpu_pc, dest);
+tcg_gen_movi_tl(target, dest);
 }
 
-static void gen_set_pc(DisasContext *ctx, TCGv dest)
+static void gen_set_pc_imm(DisasContext *ctx, target_ulong dest)
 {
-if (get_xl(ctx) == MXL_RV32) {
-tcg_gen_ext32s_tl(cpu_pc, dest);
-} else {
-tcg_gen_mov_tl(cpu_pc, dest);
-}
+gen_get_target_pc(cpu_pc, ctx, dest);
 }
 
 static void generate_exception(DisasContext *ctx, int excp)
@@ -257,9 +254,9 @@ static void gen_exception_illegal(DisasContext *ctx)
 }
 }
 
-static void gen_exception_inst_addr_mis(DisasContext *ctx)
+static void gen_exception_inst_addr_mis(DisasContext *ctx, TCGv target)
 {
-tcg_gen_st_tl(cpu_pc, cpu_env, offsetof(CPURISCVState, badaddr));
+tcg_gen_st_tl(target, cpu_env, offsetof(CPURISCVState, badaddr));
 generate_exception(ctx, RISCV_EXCP_INST_ADDR_MIS);
 }
 
@@ -551,7 +548,9 @@ static void gen_jal(DisasContext *ctx, int rd, target_ulong 
imm)
 next_pc = ctx->base.pc_next + imm;
 if (!has_ext(ctx, RVC)) {
 if ((next_pc & 0x3) != 0) {
-gen_exception_inst_addr_mis(ctx);
+TCGv target_pc = tcg_temp_new();
+gen_get_target_pc(target_pc, ctx, next_pc);
+gen_exception_inst_addr_mis(ctx, target_pc);
 return;
 }
 }
-- 
2.25.1




[PATCH v4 4/8] target/riscv: Add support for PC-relative translation

2023-03-31 Thread Weiwei Li
Add a base pc_save for PC-relative translation (CF_PCREL).
Disable the direct sync of pc from the tb in
riscv_cpu_synchronize_from_tb.
Sync pc before it's used or update it from the tb-related pc:
   real_pc = (old)env->pc + target_pc(from tb) - ctx->pc_save

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 target/riscv/cpu.c  | 29 +-
 target/riscv/insn_trans/trans_rvi.c.inc | 24 +--
 target/riscv/translate.c| 53 +
 3 files changed, 85 insertions(+), 21 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 1e97473af2..646fa31a59 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -658,16 +658,18 @@ static vaddr riscv_cpu_get_pc(CPUState *cs)
 static void riscv_cpu_synchronize_from_tb(CPUState *cs,
   const TranslationBlock *tb)
 {
-RISCVCPU *cpu = RISCV_CPU(cs);
-CPURISCVState *env = &cpu->env;
-RISCVMXL xl = FIELD_EX32(tb->flags, TB_FLAGS, XL);
+if (!(tb_cflags(tb) & CF_PCREL)) {
+RISCVCPU *cpu = RISCV_CPU(cs);
+CPURISCVState *env = &cpu->env;
+RISCVMXL xl = FIELD_EX32(tb->flags, TB_FLAGS, XL);
 
-tcg_debug_assert(!(cs->tcg_cflags & CF_PCREL));
+tcg_debug_assert(!(cs->tcg_cflags & CF_PCREL));
 
-if (xl == MXL_RV32) {
-env->pc = (int32_t) tb->pc;
-} else {
-env->pc = tb->pc;
+if (xl == MXL_RV32) {
+env->pc = (int32_t) tb->pc;
+} else {
+env->pc = tb->pc;
+}
 }
 }
 
@@ -693,11 +695,18 @@ static void riscv_restore_state_to_opc(CPUState *cs,
 RISCVCPU *cpu = RISCV_CPU(cs);
 CPURISCVState *env = &cpu->env;
 RISCVMXL xl = FIELD_EX32(tb->flags, TB_FLAGS, XL);
+target_ulong pc;
+
+if (tb_cflags(tb) & CF_PCREL) {
+pc = (env->pc & TARGET_PAGE_MASK) | data[0];
+} else {
+pc = data[0];
+}
 
 if (xl == MXL_RV32) {
-env->pc = (int32_t)data[0];
+env->pc = (int32_t)pc;
 } else {
-env->pc = data[0];
+env->pc = pc;
 }
 env->bins = data[1];
 }
diff --git a/target/riscv/insn_trans/trans_rvi.c.inc 
b/target/riscv/insn_trans/trans_rvi.c.inc
index 48c73cfcfe..daa490e7aa 100644
--- a/target/riscv/insn_trans/trans_rvi.c.inc
+++ b/target/riscv/insn_trans/trans_rvi.c.inc
@@ -38,7 +38,15 @@ static bool trans_lui(DisasContext *ctx, arg_lui *a)
 
 static bool trans_auipc(DisasContext *ctx, arg_auipc *a)
 {
-gen_set_gpri(ctx, a->rd, a->imm + ctx->base.pc_next);
+assert(ctx->pc_save != -1);
+if (tb_cflags(ctx->base.tb) & CF_PCREL) {
+TCGv target_pc = dest_gpr(ctx, a->rd);
+tcg_gen_addi_tl(target_pc, cpu_pc, a->imm + ctx->base.pc_next -
+   ctx->pc_save);
+gen_set_gpr(ctx, a->rd, target_pc);
+} else {
+gen_set_gpri(ctx, a->rd, a->imm + ctx->base.pc_next);
+}
 return true;
 }
 
@@ -68,7 +76,14 @@ static bool trans_jalr(DisasContext *ctx, arg_jalr *a)
 tcg_gen_brcondi_tl(TCG_COND_NE, t0, 0x0, misaligned);
 }
 
-gen_set_gpri(ctx, a->rd, ctx->pc_succ_insn);
+if (tb_cflags(ctx->base.tb) & CF_PCREL) {
+TCGv succ_pc = dest_gpr(ctx, a->rd);
+tcg_gen_addi_tl(succ_pc, cpu_pc, ctx->pc_succ_insn - ctx->pc_save);
+gen_set_gpr(ctx, a->rd, succ_pc);
+} else {
+gen_set_gpri(ctx, a->rd, ctx->pc_succ_insn);
+}
+
 tcg_gen_mov_tl(cpu_pc, target_pc);
 lookup_and_goto_ptr(ctx);
 
@@ -159,6 +174,7 @@ static bool gen_branch(DisasContext *ctx, arg_b *a, TCGCond 
cond)
 TCGv src1 = get_gpr(ctx, a->rs1, EXT_SIGN);
 TCGv src2 = get_gpr(ctx, a->rs2, EXT_SIGN);
 target_ulong next_pc;
+target_ulong orig_pc_save = ctx->pc_save;
 
 if (get_xl(ctx) == MXL_RV128) {
 TCGv src1h = get_gprh(ctx, a->rs1);
@@ -175,6 +191,7 @@ static bool gen_branch(DisasContext *ctx, arg_b *a, TCGCond 
cond)
 
 gen_set_label(l); /* branch taken */
 
+ctx->pc_save = orig_pc_save;
 next_pc = ctx->base.pc_next + a->imm;
 if (!has_ext(ctx, RVC) && (next_pc & 0x3)) {
 /* misaligned */
@@ -182,8 +199,9 @@ static bool gen_branch(DisasContext *ctx, arg_b *a, TCGCond 
cond)
 gen_get_target_pc(target_pc, ctx, next_pc);
 gen_exception_inst_addr_mis(ctx, target_pc);
 } else {
-gen_goto_tb(ctx, 0, ctx->base.pc_next + a->imm);
+gen_goto_tb(ctx, 0, next_pc);
 }
+ctx->pc_save = -1;
 ctx->base.is_jmp = DISAS_NORETURN;
 
 return true;
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 7b5223efc2..2dd594ddae 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -59,6 +59,7 @@ typedef struct DisasContext {
 DisasContextBase base;
 /* pc_succ_insn points to the instruction following base.pc_next */
 target_ulong pc_succ_insn;
+target_ulong pc_save;
 target_ulong priv_ver;
 RISCVMXL misa_mxl_max;
 RISCVMXL xl;
@@ -225,15 +226,24

Re: [RFC PATCH v1 00/26] migration: File based migration with multifd and fixed-ram

2023-03-31 Thread Fabiano Rosas
Peter Xu  writes:

> On Fri, Mar 31, 2023 at 11:37:50AM -0300, Fabiano Rosas wrote:
>> >> Outgoing migration to file. NVMe disk. XFS filesystem.
>> >> 
>> >> - Single migration runs of stopped 32G guest with ~90% RAM usage. Guest
>> >>   running `stress-ng --vm 4 --vm-bytes 90% --vm-method all --verify -t
>> >>   10m -v`:
>> >> 
>> >> migration type  | MB/s | pages/s |  ms
>> >> +--+-+--
>> >> savevm io_uring |  434 |  102294 | 71473
>> >
>> > So I assume this is the non-live migration scenario.  Could you explain
>> > what does io_uring mean here?
>> >
>> 
>> This table is all non-live migration. This particular line is a snapshot
>> (hmp_savevm->save_snapshot). I thought it could be relevant because it
>> is another way by which we write RAM into disk.
>
> I see, so if all non-live that explains, because I was curious what's the
> relationship between this feature and the live snapshot that QEMU also
> supports.
>
> I also don't immediately see why savevm will be much slower, do you have an
> answer?  Maybe it's somewhere but I just overlooked..
>

I don't have a concrete answer. I could take a jab and maybe blame the
extra memcpy for the buffer in QEMUFile? Or perhaps an unintended effect
of bandwidth limits?

> IIUC this is "vm suspend" case, so there's an extra benefit knowledge of
> "we can stop the VM".  It smells slightly weird to build this on top of
> "migrate" from that pov, rather than "savevm", though.  Any thoughts on
> this aspect (on why not building this on top of "savevm")?
>

I share the same perception. I have done initial experiments with
savevm, but I decided to carry on the work that was already started by
others because my understanding of the problem was yet incomplete.

One point that has been raised is that the fixed-ram format alone does
not bring that many performance improvements. So we'll need
multi-threading and direct-io on top of it. Re-using multifd
infrastructure seems like it could be a good idea.

> Thanks,
>
>> 
>> The io_uring is noise, I was initially under the impression that the
>> block device aio configuration affected this scenario.
>> 
>> >> file:   | 3017 |  855862 | 10301
>> >> fixed-ram   | 1982 |  330686 | 15637
>> >> +--+-+--
>> >> fixed-ram + multifd + O_DIRECT
>> >>  2 ch.  | 5565 | 1500882 |  5576
>> >>  4 ch.  | 5735 | 1991549 |  5412
>> >>  8 ch.  | 5650 | 1769650 |  5489
>> >> 16 ch.  | 6071 | 1832407 |  5114
>> >> 32 ch.  | 6147 | 1809588 |  5050
>> >> 64 ch.  | 6344 | 1841728 |  4895
>> >>128 ch.  | 6120 | 1915669 |  5085
>> >> +--+-+--
>> >
>> > Thanks,
>> 



Re: [RFC PATCH v1 10/26] migration/ram: Introduce 'fixed-ram' migration stream capability

2023-03-31 Thread Daniel P . Berrangé
On Fri, Mar 31, 2023 at 10:39:23AM -0400, Peter Xu wrote:
> On Fri, Mar 31, 2023 at 08:56:01AM +0100, Daniel P. Berrangé wrote:
> > On Thu, Mar 30, 2023 at 06:01:51PM -0400, Peter Xu wrote:
> > > On Thu, Mar 30, 2023 at 03:03:20PM -0300, Fabiano Rosas wrote:
> > > > From: Nikolay Borisov 
> > > > 
> > > > Implement 'fixed-ram' feature. The core of the feature is to ensure that
> > > > each ram page of the migration stream has a specific offset in the
> > > > resulting migration stream. The reason why we'd want such behavior are
> > > > two fold:
> > > > 
> > > >  - When doing a 'fixed-ram' migration the resulting file will have a
> > > >bounded size, since pages which are dirtied multiple times will
> > > >always go to a fixed location in the file, rather than constantly
> > > >being added to a sequential stream. This eliminates cases where a vm
> > > >with, say, 1G of ram can result in a migration file that's 10s of
> > > >GBs, provided that the workload constantly redirties memory.
> > > > 
> > > >  - It paves the way to implement DIO-enabled save/restore of the
> > > >migration stream as the pages are ensured to be written at aligned
> > > >offsets.
> > > > 
> > > > The feature requires changing the stream format. First, a bitmap is
> > > > introduced which tracks which pages have been written (i.e are
> > > > dirtied) during migration and subsequently it's being written in the
> > > > resulting file, again at a fixed location for every ramblock. Zero
> > > > pages are ignored as they'd be zero in the destination migration as
> > > > well. With the changed format data would look like the following:
> > > > 
> > > > |name len|name|used_len|pc*|bitmap_size|pages_offset|bitmap|pages|
> > > 
> > > What happens with huge pages?  Would page size matter here?
> > > 
> > > I would assume it's fine it uses a constant (small) page size, assuming
> > > that should match with the granule that qemu tracks dirty (which IIUC is
> > > the host page size not guest's).
> > > 
> > > But I didn't yet pay any further thoughts on that, maybe it would be
> > > worthwhile in all cases to record page sizes here to be explicit or the
> > > meaning of bitmap may not be clear (and then the bitmap_size will be a
> > > field just for sanity check too).
> > 
> > I think recording the page sizes is an anti-feature in this case.
> > 
> > The migration format / state needs to reflect the guest ABI, but
> > we need to be free to have different backend config behind that
> > either side of the save/restore.
> > 
> > IOW, if I start a QEMU with 2 GB of RAM, I should be free to use
> > small pages initially and after restore use 2 x 1 GB hugepages,
> > or vica-verca.
> > 
> > The important thing with the pages that are saved into the file
> > is that they are a 1:1 mapping guest RAM regions to file offsets.
> > IOW, the 2 GB of guest RAM is always a contiguous 2 GB region
> > in the file.
> > 
> > If the src VM used 1 GB pages, we would be writing a full 2 GB
> > of data assuming both pages were dirty.
> > 
> > If the src VM used 4k pages, we would be writing some subset of
> > the 2 GB of data, and the rest would be unwritten.
> > 
> > Either way, when reading back the data we restore it into either
> > 1 GB pages of 4k pages, beause any places there were unwritten
> > orignally  will read back as zeros.
> 
> I think there's already the page size information, because there's a bitmap
> embeded in the format at least in the current proposal, and the bitmap can
> only be defined with a page size provided in some form.
> 
> Here I agree the backend can change before/after a migration (live or
> not).  Though the question is whether page size matters in the snapshot
> layout rather than what the loaded QEMU instance will use as backend.

IIUC, the page size information merely sets a constraint on the granularity
of unwritten (sparse) regions in the file. If we didn't want to express
page size directly in the file format we would need explicit start/end
offsets for each written block. This is less convenient than just having
a bitmap, so I think it's ok to use the page-size bitmap
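The 1:1 mapping and the sparse-read semantics described in this thread can be sketched as follows. This is illustrative only: the names, the fixed 4 KiB page size, and the layout are assumptions for the sketch, not the proposed on-disk format:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#define PG 4096  /* assumed page size; the real format carries its own */

/*
 * Illustrative only: in a fixed-ram layout, page `idx` of a ramblock
 * always lands at pages_offset + idx * PG in the file.  The dirty
 * bitmap's granularity is this page size; pages whose bit is clear are
 * simply never written, stay as file holes, and read back as zeros, so
 * no explicit start/end offsets per written run are needed.
 */
static off_t fixed_ram_offset(off_t pages_offset, uint64_t idx)
{
    return pages_offset + (off_t)(idx * PG);
}

/* Write one dirty page at its fixed offset; re-dirtying a page just
 * overwrites the same region, so the file size stays bounded. */
static int write_fixed_page(int fd, off_t pages_offset, uint64_t idx,
                            const unsigned char *page)
{
    return pwrite(fd, page, PG, fixed_ram_offset(pages_offset, idx)) == PG
               ? 0 : -1;
}
```

Restoring is then just reading each page region back: untouched (hole) regions come back as zeros, which matches the "zero pages are ignored" rule in the proposed format.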

> > > If postcopy might be an option, we'd want the page size to be the host 
> > > page
> > > size because then looking up the bitmap will be straightforward, deciding
> > > whether we should copy over page (UFFDIO_COPY) or fill in with zeros
> > > (UFFDIO_ZEROPAGE).
> > 
> > This format is only intended for the case where we are migrating to
> > a random-access medium, aka a file, because the fixed RAM mappings
> > to disk mean that we need to seek back to the original location to
> > re-write pages that get dirtied. It isn't suitable for a live
> > migration stream, and thus postcopy is inherantly out of scope.
> 
> Yes, I've commented also in the cover letter, but I can expand a bit.
> 
> I mean support postcopy only when loading, but not when saving.
> 
> Saving to file definitely cannot work with postcopy because there's no dest
> qem

Re: [PATCH 1/3] target/arm: Pass ARMMMUFaultInfo to merge_syn_data_abort()

2023-03-31 Thread Richard Henderson

On 3/31/23 07:50, Peter Maydell wrote:

We already pass merge_syn_data_abort() two fields from the
ARMMMUFaultInfo struct, and we're about to want to use a third field.
Refactor to just pass a pointer to the fault info.

Signed-off-by: Peter Maydell
---
  target/arm/tcg/tlb_helper.c | 15 +++
  1 file changed, 7 insertions(+), 8 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [RFC PATCH v1 00/26] migration: File based migration with multifd and fixed-ram

2023-03-31 Thread Daniel P . Berrangé
On Fri, Mar 31, 2023 at 10:52:09AM -0400, Peter Xu wrote:
> On Fri, Mar 31, 2023 at 11:37:50AM -0300, Fabiano Rosas wrote:
> > >> Outgoing migration to file. NVMe disk. XFS filesystem.
> > >> 
> > >> - Single migration runs of stopped 32G guest with ~90% RAM usage. Guest
> > >>   running `stress-ng --vm 4 --vm-bytes 90% --vm-method all --verify -t
> > >>   10m -v`:
> > >> 
> > >> migration type  | MB/s | pages/s |  ms
> > >> +--+-+--
> > >> savevm io_uring |  434 |  102294 | 71473
> > >
> > > So I assume this is the non-live migration scenario.  Could you explain
> > > what does io_uring mean here?
> > >
> > 
> > This table is all non-live migration. This particular line is a snapshot
> > (hmp_savevm->save_snapshot). I thought it could be relevant because it
> > is another way by which we write RAM into disk.
> 
> I see, so if all non-live that explains, because I was curious what's the
> relationship between this feature and the live snapshot that QEMU also
> supports.
> 
> I also don't immediately see why savevm will be much slower, do you have an
> answer?  Maybe it's somewhere but I just overlooked..
> 
> IIUC this is "vm suspend" case, so there's an extra benefit knowledge of
> "we can stop the VM".  It smells slightly weird to build this on top of
> "migrate" from that pov, rather than "savevm", though.  Any thoughts on
> this aspect (on why not building this on top of "savevm")?

Currently savevm covers memory, device state and disk snapshots
saving into the VM's disks, which basically means only works
with qcow2.

Libvirt's save logic only cares about saving memory and device
state, and supports saving guests regardless of what storage is
used, saving it externally from the disk.

This is only possible with 'migrate' today and so 'savevm' isn't
useful for this tasks from libvirt's POV.

In the past it has been suggested that actually 'savevm' command
as a concept is redundant, and that we could in fact layer it
on top of a combination of migration and block snapshot APIs.
eg if we had a 'blockdev:' migration protocol for saving the
vmstate.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 2/3] target/arm: Don't set ISV when reporting stage 1 faults in ESR_EL2

2023-03-31 Thread Richard Henderson

On 3/31/23 07:50, Peter Maydell wrote:

The syndrome value reported to ESR_EL2 should only contain the
detailed instruction syndrome information when the fault has been
caused by a stage 2 abort, not when the fault was a stage 1 abort
(i.e.  caused by execution at EL2).  We were getting this wrong and
reporting the detailed ISV information all the time.

Fix the bug by checking fi->stage2.  Add a TODO comment noting the
cases where we'll have to come back and revisit this when we
implement FEAT_LS64 and friends.

Signed-off-by: Peter Maydell
---
  target/arm/tcg/tlb_helper.c | 13 ++---
  1 file changed, 10 insertions(+), 3 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [RFC PATCH v1 00/26] migration: File based migration with multifd and fixed-ram

2023-03-31 Thread Peter Xu
On Fri, Mar 31, 2023 at 12:30:45PM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > On Fri, Mar 31, 2023 at 11:37:50AM -0300, Fabiano Rosas wrote:
> >> >> Outgoing migration to file. NVMe disk. XFS filesystem.
> >> >> 
> >> >> - Single migration runs of stopped 32G guest with ~90% RAM usage. Guest
> >> >>   running `stress-ng --vm 4 --vm-bytes 90% --vm-method all --verify -t
> >> >>   10m -v`:
> >> >> 
> >> >> migration type  | MB/s | pages/s |  ms
> >> >> +--+-+--
> >> >> savevm io_uring |  434 |  102294 | 71473
> >> >
> >> > So I assume this is the non-live migration scenario.  Could you explain
> >> > what does io_uring mean here?
> >> >
> >> 
> >> This table is all non-live migration. This particular line is a snapshot
> >> (hmp_savevm->save_snapshot). I thought it could be relevant because it
> >> is another way by which we write RAM into disk.
> >
> > I see, so if all non-live that explains, because I was curious what's the
> > relationship between this feature and the live snapshot that QEMU also
> > supports.
> >
> > I also don't immediately see why savevm will be much slower, do you have an
> > answer?  Maybe it's somewhere but I just overlooked..
> >
> 
> I don't have a concrete answer. I could take a jab and maybe blame the
> extra memcpy for the buffer in QEMUFile? Or perhaps an unintended effect
> of bandwidth limits?

IMHO it would be great if this can be investigated and reasons provided in
the next cover letter.

> 
> > IIUC this is "vm suspend" case, so there's an extra benefit knowledge of
> > "we can stop the VM".  It smells slightly weird to build this on top of
> > "migrate" from that pov, rather than "savevm", though.  Any thoughts on
> > this aspect (on why not building this on top of "savevm")?
> >
> 
> I share the same perception. I have done initial experiments with
> savevm, but I decided to carry on the work that was already started by
> others because my understanding of the problem was yet incomplete.
> 
> One point that has been raised is that the fixed-ram format alone does
> not bring that many performance improvements. So we'll need
> multi-threading and direct-io on top of it. Re-using multifd
> infrastructure seems like it could be a good idea.

The thing is IMHO concurrency is not as hard if the VM is stopped, and when
we're 100% sure locally on where the page will go.

IOW, I think multifd provides a lot of features that may not really be
useful for this effort, meanwhile using those features means already
paying for the overhead to support them.

For example, a major benefit of multifd is it allows pages sent out of
order, so it indexes the page as a header.  I didn't read the follow up
patches, but I assume that's not needed in this effort.

What I understand so far with fixed-ram is we dump the whole ramblock
memory into a chunk at an offset of a file.  Can concurrency of that be
achieved easily by creating a bunch of threads dumping altogether during
the savevm, with different offsets of guest ram & file passed over?

It's very possible that I overlooked a lot of things, but IMHO my point is
it'll always be great to have a small section discussing the pros and cons
in the cover letter on the decision of using "migrate" infra rather than
"savevm".  Because it's still against the intuition at least to some
reviewers (like me..).  What I worry is this can be implemented more
efficiently and with less LOCs into savevm (and perhaps also benefit normal
savevm too!  so there's chance current savevm users can already benefit
from this) but we didn't do so because the project simply started with
using QMP migrate.  Any investigation on figuring more of this out would be
greatly helpful.

Thanks,

-- 
Peter Xu




Re: [RFC PATCH v1 00/26] migration: File based migration with multifd and fixed-ram

2023-03-31 Thread Daniel P . Berrangé
On Fri, Mar 31, 2023 at 11:55:03AM -0400, Peter Xu wrote:
> On Fri, Mar 31, 2023 at 12:30:45PM -0300, Fabiano Rosas wrote:
> > Peter Xu  writes:
> > 
> > > On Fri, Mar 31, 2023 at 11:37:50AM -0300, Fabiano Rosas wrote:
> > >> >> Outgoing migration to file. NVMe disk. XFS filesystem.
> > >> >> 
> > >> >> - Single migration runs of stopped 32G guest with ~90% RAM usage. Guest
> > >> >>   running `stress-ng --vm 4 --vm-bytes 90% --vm-method all --verify -t
> > >> >>   10m -v`:
> > >> >> 
> > >> >> migration type  | MB/s | pages/s |  ms
> > >> >> +--+-+--
> > >> >> savevm io_uring |  434 |  102294 | 71473
> > >> >
> > >> > So I assume this is the non-live migration scenario.  Could you explain
> > >> > what does io_uring mean here?
> > >> >
> > >> 
> > >> This table is all non-live migration. This particular line is a snapshot
> > >> (hmp_savevm->save_snapshot). I thought it could be relevant because it
> > >> is another way by which we write RAM into disk.
> > >
> > > I see, so if all non-live that explains, because I was curious what's the
> > > relationship between this feature and the live snapshot that QEMU also
> > > supports.
> > >
> > > I also don't immediately see why savevm will be much slower, do you have an
> > > answer?  Maybe it's somewhere but I just overlooked..
> > >
> > 
> > I don't have a concrete answer. I could take a jab and maybe blame the
> > extra memcpy for the buffer in QEMUFile? Or perhaps an unintended effect
> > of bandwidth limits?
> 
> IMHO it would be great if this can be investigated and reasons provided in
> the next cover letter.
> 
> > 
> > > IIUC this is "vm suspend" case, so there's an extra benefit knowledge of
> > > "we can stop the VM".  It smells slightly weird to build this on top of
> > > "migrate" from that pov, rather than "savevm", though.  Any thoughts on
> > > this aspect (on why not building this on top of "savevm")?
> > >
> > 
> > I share the same perception. I have done initial experiments with
> > savevm, but I decided to carry on the work that was already started by
> > others because my understanding of the problem was yet incomplete.
> > 
> > One point that has been raised is that the fixed-ram format alone does
> > not bring that many performance improvements. So we'll need
> > multi-threading and direct-io on top of it. Re-using multifd
> > infrastructure seems like it could be a good idea.
> 
> The thing is IMHO concurrency is not as hard if the VM is stopped, and when
> we're 100% sure locally on where the page will go.

We shouldn't assume the VM is stopped though. When saving to the file
the VM may still be active. The fixed-ram format lets us re-write the
same memory location on disk multiple times in this case, thus avoiding
growth of the file size.

> IOW, I think multifd provides a lot of features that may not really be
> useful for this effort, meanwhile using those features means already
> paying for the overhead to support them.
> 
> For example, a major benefit of multifd is it allows pages sent out of
> order, so it indexes the page as a header.  I didn't read the follow up
> patches, but I assume that's not needed in this effort.
> 
> What I understand so far with fixed-ram is we dump the whole ramblock
> memory into a chunk at an offset of a file.  Can concurrency of that be
> achieved easily by creating a bunch of threads dumping altogether during
> the savevm, with different offsets of guest ram & file passed over?

I feel like the migration code is already insanely complicated and
the many threads involved have caused no end of subtle bugs. 

It was Juan I believe who expressed a desire to entirely remove
non-multifd code in the future, in order to reduce the maint burden.
IOW, ideally we would be pushing mgmt apps towards always using
multifd at all times, even if they only ask it to create 1 single
thread.

That would in turn suggest against creating new concurrency
mechanisms on top of non-multifd code, both to avoid adding yet
more complexity and also because it would make it harder to later
delete the non-multifd code.

On the libvirt side wrt fixed-ram, we could just use multifd
exclusively, as there should be no downside to it even for a
single FD.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




[PATCH v12 0/3] Add support for TPM devices over I2C bus

2023-03-31 Thread Ninad Palsule


Hello,
Incorporated review comments from Stefan. Please review.

This drop adds support for the TPM devices attached to the I2C bus. It
only supports the TPM2 protocol. You need to run it with an external
TPM emulator such as swtpm. I have tested it with swtpm.

I have referred to the work done by zhdan...@meta.com but at the core
level our implementation is different.
https://github.com/theopolis/qemu/commit/2e2e57cde9e419c36af8071bb85392ad1ed70966

Based-on: $MESSAGE_ID

Ninad Palsule (3):
  docs: Add support for TPM devices over I2C bus
  tpm: Extend common APIs to support TPM TIS I2C
  tpm: Add support for TPM device over I2C bus

 docs/specs/tpm.rst  |  21 ++
 hw/arm/Kconfig  |   1 +
 hw/tpm/Kconfig  |   7 +
 hw/tpm/meson.build  |   1 +
 hw/tpm/tpm_tis.h|   3 +
 hw/tpm/tpm_tis_common.c |  36 ++-
 hw/tpm/tpm_tis_i2c.c| 562 
 hw/tpm/trace-events |   6 +
 include/hw/acpi/tpm.h   |  41 +++
 include/sysemu/tpm.h|   3 +
 10 files changed, 673 insertions(+), 8 deletions(-)
 create mode 100644 hw/tpm/tpm_tis_i2c.c

-- 
2.37.2




Re: [RFC PATCH v1 10/26] migration/ram: Introduce 'fixed-ram' migration stream capability

2023-03-31 Thread Peter Xu
On Fri, Mar 31, 2023 at 04:34:57PM +0100, Daniel P. Berrangé wrote:
> On Fri, Mar 31, 2023 at 10:39:23AM -0400, Peter Xu wrote:
> > On Fri, Mar 31, 2023 at 08:56:01AM +0100, Daniel P. Berrangé wrote:
> > > On Thu, Mar 30, 2023 at 06:01:51PM -0400, Peter Xu wrote:
> > > > On Thu, Mar 30, 2023 at 03:03:20PM -0300, Fabiano Rosas wrote:
> > > > > From: Nikolay Borisov 
> > > > > 
> > > > > Implement 'fixed-ram' feature. The core of the feature is to ensure that
> > > > > each ram page of the migration stream has a specific offset in the
> > > > > resulting migration stream. The reasons we'd want such behavior are
> > > > > twofold:
> > > > > 
> > > > >  - When doing a 'fixed-ram' migration the resulting file will have a
> > > > >bounded size, since pages which are dirtied multiple times will
> > > > >always go to a fixed location in the file, rather than constantly
> > > > >being added to a sequential stream. This eliminates cases where a vm
> > > > >with, say, 1G of ram can result in a migration file that's 10s of
> > > > >GBs, provided that the workload constantly redirties memory.
> > > > > 
> > > > >  - It paves the way to implement DIO-enabled save/restore of the
> > > > >migration stream as the pages are ensured to be written at aligned
> > > > >offsets.
> > > > > 
> > > > > The feature requires changing the stream format. First, a bitmap is
> > > > > introduced which tracks which pages have been written (i.e are
> > > > > dirtied) during migration and subsequently it's being written in the
> > > > > resulting file, again at a fixed location for every ramblock. Zero
> > > > > pages are ignored as they'd be zero in the destination migration as
> > > > > well. With the changed format data would look like the following:
> > > > > 
> > > > > |name len|name|used_len|pc*|bitmap_size|pages_offset|bitmap|pages|
> > > > 
> > > > What happens with huge pages?  Would page size matter here?
> > > > 
> > > > I would assume it's fine it uses a constant (small) page size, assuming
> > > > that should match with the granule that qemu tracks dirty (which IIUC is
> > > > the host page size not guest's).
> > > > 
> > > > But I didn't yet pay any further thoughts on that, maybe it would be
> > > > worthwhile in all cases to record page sizes here to be explicit or the
> > > > meaning of bitmap may not be clear (and then the bitmap_size will be a
> > > > field just for sanity check too).
> > > 
> > > I think recording the page sizes is an anti-feature in this case.
> > > 
> > > The migration format / state needs to reflect the guest ABI, but
> > > we need to be free to have different backend config behind that
> > > either side of the save/restore.
> > > 
> > > IOW, if I start a QEMU with 2 GB of RAM, I should be free to use
> > > small pages initially and after restore use 2 x 1 GB hugepages,
> > > or vice-versa.
> > > 
> > > The important thing with the pages that are saved into the file
> > > is that they are a 1:1 mapping guest RAM regions to file offsets.
> > > IOW, the 2 GB of guest RAM is always a contiguous 2 GB region
> > > in the file.
> > > 
> > > If the src VM used 1 GB pages, we would be writing a full 2 GB
> > > of data assuming both pages were dirty.
> > > 
> > > If the src VM used 4k pages, we would be writing some subset of
> > > the 2 GB of data, and the rest would be unwritten.
> > > 
> > > Either way, when reading back the data we restore it into either
> > > 1 GB pages or 4k pages, because any places that were unwritten
> > > originally will read back as zeros.
> > 
> > I think there's already the page size information, because there's a bitmap
> > embedded in the format at least in the current proposal, and the bitmap can
> > only be defined with a page size provided in some form.
> > 
> > Here I agree the backend can change before/after a migration (live or
> > not).  Though the question is whether page size matters in the snapshot
> > layout rather than what the loaded QEMU instance will use as backend.
> 
> IIUC, the page size information merely sets a constraint on the granularity
> of unwritten (sparse) regions in the file. If we didn't want to express
> page size directly in the file format we would need explicit start/end
> offsets for each written block. This is less convenient than just having
> a bitmap, so I think it's ok to use the page-size bitmap

I'm perfectly fine with having the bitmap.  The original question was about
whether we should store page_size into the same header too along with the
bitmap.

Currently I think the page size can be implied by the system configuration
(e.g. arch, cpu setups) and also by the size of the bitmap.  So I'm
wondering whether it'll be cleaner to replace the bitmap size with page
size (hence one can calculate the bitmap size from the page size), or just
keep both of them for sanity.

Besides, since we seem to be defining a new header format to be stored on
disks, maybe it'll be worthwhile to leave some s

[PATCH v12 2/3] tpm: Extend common APIs to support TPM TIS I2C

2023-03-31 Thread Ninad Palsule
From: Ninad Palsule 

Qemu already supports devices attached to ISA and sysbus. This drop adds
support for the I2C bus attached TPM devices.

This commit includes changes for the common code.
- Added support for the new checksum registers which are required for
  the I2C support. The checksum calculation is handled in the qemu
  common code.
- Added wrapper function for read and write data so that I2C code can
  call it without MMIO interface.

The TPM TIS I2C spec describes in the table in section "Interface Locality
Usage per Register" that the TPM_INT_ENABLE and TPM_INT_STATUS registers
must be writable for any locality even if the locality is not the active
locality. Therefore, remove the checks whether the writing locality is the
active locality for these registers.

Signed-off-by: Ninad Palsule 
Signed-off-by: Stefan Berger 
Reviewed-by: Stefan Berger 
Tested-by: Stefan Berger 
Reviewed-by: Cédric Le Goater 
Reviewed-by: Joel Stanley 
Tested-by: Joel Stanley 

---
V2:

Incorporated Stephen's comments.

- Removed checksum enable and checksum get registers.
- Added checksum calculation function which can be called from
  i2c layer.

---
V3:
Incorporated review comments from Cedric and Stefan.

- Pass locality to the checksum calculation function and cleanup
- Moved I2C related definitions into acpi/tpm.h

---
V4:

Incorporated review comments by Stefan

- Remove the check for locality while calculating checksum
- Use bswap16 instead of cpu_to_be16.
- Rename TPM_I2C register by dropping _TIS_ from it.

---
V7:

Incorporated review comments from Stefan.

- Removed locality check from INT_ENABLE and INT_STATUS registers write
  path.
- Moved TPM_DATA_CSUM_ENABLED define in the tpm.h

---
V8:
Incorporated review comments from Stefan

- Moved the INT_ENABLE mask to tpm.h file.

---
V12:
Incorporated review comments from Stefan.
- Moved STS read/write mask to tpm.h
---
 hw/tpm/tpm_tis.h|  3 +++
 hw/tpm/tpm_tis_common.c | 36 
 include/hw/acpi/tpm.h   | 41 +
 3 files changed, 72 insertions(+), 8 deletions(-)

diff --git a/hw/tpm/tpm_tis.h b/hw/tpm/tpm_tis.h
index f6b5872ba6..6f29a508dd 100644
--- a/hw/tpm/tpm_tis.h
+++ b/hw/tpm/tpm_tis.h
@@ -86,5 +86,8 @@ int tpm_tis_pre_save(TPMState *s);
 void tpm_tis_reset(TPMState *s);
 enum TPMVersion tpm_tis_get_tpm_version(TPMState *s);
 void tpm_tis_request_completed(TPMState *s, int ret);
+uint32_t tpm_tis_read_data(TPMState *s, hwaddr addr, unsigned size);
+void tpm_tis_write_data(TPMState *s, hwaddr addr, uint64_t val, uint32_t size);
+uint16_t tpm_tis_get_checksum(TPMState *s);
 
 #endif /* TPM_TPM_TIS_H */
diff --git a/hw/tpm/tpm_tis_common.c b/hw/tpm/tpm_tis_common.c
index 503be2a541..c07c179dbc 100644
--- a/hw/tpm/tpm_tis_common.c
+++ b/hw/tpm/tpm_tis_common.c
@@ -26,6 +26,8 @@
 #include "hw/irq.h"
 #include "hw/isa/isa.h"
 #include "qapi/error.h"
+#include "qemu/bswap.h"
+#include "qemu/crc-ccitt.h"
 #include "qemu/module.h"
 
 #include "hw/acpi/tpm.h"
@@ -447,6 +449,23 @@ static uint64_t tpm_tis_mmio_read(void *opaque, hwaddr addr,
 return val;
 }
 
+/*
+ * A wrapper read function so that it can be directly called without
+ * mmio.
+ */
+uint32_t tpm_tis_read_data(TPMState *s, hwaddr addr, unsigned size)
+{
+return tpm_tis_mmio_read(s, addr, size);
+}
+
+/*
+ * Calculate current data buffer checksum
+ */
+uint16_t tpm_tis_get_checksum(TPMState *s)
+{
+return bswap16(crc_ccitt(0, s->buffer, s->rw_offset));
+}
+
 /*
  * Write a value to a register of the TIS interface
  * See specs pages 33-63 for description of the registers
@@ -588,10 +607,6 @@ static void tpm_tis_mmio_write(void *opaque, hwaddr addr,
 
 break;
 case TPM_TIS_REG_INT_ENABLE:
-if (s->active_locty != locty) {
-break;
-}
-
 s->loc[locty].inte &= mask;
 s->loc[locty].inte |= (val & (TPM_TIS_INT_ENABLED |
 TPM_TIS_INT_POLARITY_MASK |
@@ -601,10 +616,6 @@ static void tpm_tis_mmio_write(void *opaque, hwaddr addr,
 /* hard wired -- ignore */
 break;
 case TPM_TIS_REG_INT_STATUS:
-if (s->active_locty != locty) {
-break;
-}
-
 /* clearing of interrupt flags */
 if (((val & TPM_TIS_INTERRUPTS_SUPPORTED)) &&
 (s->loc[locty].ints & TPM_TIS_INTERRUPTS_SUPPORTED)) {
@@ -767,6 +778,15 @@ static void tpm_tis_mmio_write(void *opaque, hwaddr addr,
 }
 }
 
+/*
+ * A wrapper write function so that it can be directly called without
+ * mmio.
+ */
+void tpm_tis_write_data(TPMState *s, hwaddr addr, uint64_t val, uint32_t size)
+{
+tpm_tis_mmio_write(s, addr, val, size);
+}
+
 const MemoryRegionOps tpm_tis_memory_ops = {
 .read = tpm_tis_mmio_read,
 .write = tpm_tis_mmio_write,
diff --git a/include/hw/acpi/tpm.h b/include/hw/acpi/tpm.h
index 559ba6906c..579c45f5ba 100644
--- a/include/hw/acpi/tpm.h
+++ b/include/hw/acpi/tpm.h
@@ -93,6 +9

[PATCH v12 1/3] docs: Add support for TPM devices over I2C bus

2023-03-31 Thread Ninad Palsule
From: Ninad Palsule 

This is a documentation change for I2C TPM device support.

Qemu already supports devices attached to ISA and sysbus.
This drop adds support for the I2C bus attached TPM devices.

Signed-off-by: Ninad Palsule 
Reviewed-by: Stefan Berger 
Reviewed-by: Cédric Le Goater 
Reviewed-by: Joel Stanley 

---
V2:

Incorporated Stephen's review comments
- Added example in the document.

---
V4:
Incorporate Cedric & Stefan's comments

- Added example for ast2600-evb
- Corrected statement about arm virtual machine.

---
V6:
Incorporated review comments from Stefan.

---
V8:

Incorporate review comments from Joel and Stefan

- Removed the rainier example
- Added step required to configure on ast2600-evb
---
 docs/specs/tpm.rst | 21 +
 1 file changed, 21 insertions(+)

diff --git a/docs/specs/tpm.rst b/docs/specs/tpm.rst
index 535912a92b..efe124a148 100644
--- a/docs/specs/tpm.rst
+++ b/docs/specs/tpm.rst
@@ -21,12 +21,16 @@ QEMU files related to TPM TIS interface:
  - ``hw/tpm/tpm_tis_common.c``
  - ``hw/tpm/tpm_tis_isa.c``
  - ``hw/tpm/tpm_tis_sysbus.c``
+ - ``hw/tpm/tpm_tis_i2c.c``
  - ``hw/tpm/tpm_tis.h``
 
 Both an ISA device and a sysbus device are available. The former is
 used with pc/q35 machine while the latter can be instantiated in the
 Arm virt machine.
 
+An I2C device support is also provided which can be instantiated in the Arm
+based emulation machines. This device only supports the TPM 2 protocol.
+
 CRB interface
 -
 
@@ -348,6 +352,23 @@ In case an Arm virt machine is emulated, use the following command line:
 -drive if=pflash,format=raw,file=flash0.img,readonly=on \
 -drive if=pflash,format=raw,file=flash1.img
 
+In case a ast2600-evb bmc machine is emulated and you want to use a TPM device
+attached to I2C bus, use the following command line:
+
+.. code-block:: console
+
+  qemu-system-arm -M ast2600-evb -nographic \
+-kernel arch/arm/boot/zImage \
+-dtb arch/arm/boot/dts/aspeed-ast2600-evb.dtb \
+-initrd rootfs.cpio \
+-chardev socket,id=chrtpm,path=/tmp/mytpm1/swtpm-sock \
+-tpmdev emulator,id=tpm0,chardev=chrtpm \
+-device tpm-tis-i2c,tpmdev=tpm0,bus=aspeed.i2c.bus.12,address=0x2e
+
+  For testing, use this command to load the driver to the correct address
+
+  echo tpm_tis_i2c 0x2e > /sys/bus/i2c/devices/i2c-12/new_device
+
 In case SeaBIOS is used as firmware, it should show the TPM menu item
 after entering the menu with 'ESC'.
 
-- 
2.37.2




[PATCH v12 3/3] tpm: Add support for TPM device over I2C bus

2023-03-31 Thread Ninad Palsule
From: Ninad Palsule 

Qemu already supports devices attached to ISA and sysbus. This drop adds
support for the I2C bus attached TPM devices. I2C model only supports
TPM2 protocol.

This commit includes changes for the common code.
- Added I2C emulation model. Logic was added in the model to temporarily
  cache the data as I2C interface works per byte basis.
- New tpm type "tpm-tis-i2c" added for I2C support. The user has to
  provide this string on command line.

Testing:
  TPM I2C device module is tested using SWTPM (software based TPM
  package). Qemu uses the rainier machine and is connected to swtpm over
  the socket interface.

  The command to start swtpm is as follows:
  $ swtpm socket --tpmstate dir=/tmp/mytpm1\
 --ctrl type=unixio,path=/tmp/mytpm1/swtpm-sock  \
 --tpm2 --log level=100

  The command to start qemu is as follows:
  $ qemu-system-arm -M rainier-bmc -nographic \
-kernel ${IMAGEPATH}/fitImage-linux.bin \
-dtb ${IMAGEPATH}/aspeed-bmc-ibm-rainier.dtb \
-initrd ${IMAGEPATH}/obmc-phosphor-initramfs.rootfs.cpio.xz \
-drive file=${IMAGEPATH}/obmc-phosphor-image.rootfs.wic.qcow2,if=sd,index=2 \
-net nic -net user,hostfwd=:127.0.0.1:-:22,hostfwd=:127.0.0.1:2443-:443 \
-chardev socket,id=chrtpm,path=/tmp/mytpm1/swtpm-sock \
-tpmdev emulator,id=tpm0,chardev=chrtpm \
-device tpm-tis-i2c,tpmdev=tpm0,bus=aspeed.i2c.bus.12,address=0x2e

Signed-off-by: Ninad Palsule 
Reviewed-by: Stefan Berger 
Tested-by: Stefan Berger 
Reviewed-by: Cédric Le Goater 
Reviewed-by: Joel Stanley 
Tested-by: Joel Stanley 
---
V2:
Incorporated Stephen's review comments.
- Handled checksum related register in I2C layer
- Defined I2C interface capabilities and return those instead of
  capabilities from TPM TIS. Add required capabilities from TIS.
- Do not cache FIFO data in the I2C layer.
- Make sure that the device address change register is not passed to the I2C
  layer, as the capabilities indicate that it is not supported.
- Added boundary checks.
- Make sure that bits 26-31 are zeroed for the TPM_STS register on read
- Updated Kconfig files for new define.

---
V3:
- Moved processing of register TPM_I2C_LOC_SEL in the I2C. So I2C layer
  remembers the locality and pass it to TIS on each read/write.
- The write data is no more cached in the I2C layer so the buffer size
  is reduced to 16 bytes.
- Checksum registers are now managed by the I2C layer. Added new
  function in TIS layer to return the checksum and used that to process
  the request.
- Now 2-4 byte register value will be passed to TIS layer in a single
  write call instead of 1 byte at a time. Added functions to convert
  between little endian stream of bytes to single 32 bit unsigned
  integer. Similarly 32  bit integer to stream of bytes.
- Added restriction on device change register.
- Replace few if-else statement with switch statement for clarity.
- Log warning when unknown register is received.
- Moved all register definitions to acpi/tpm.h

---
V4:
Incorporated review comments from Cedric and Stefan.
- Reduced data[] size from 16 byte to 5 bytes.
- Added register name in the mapping table which can be used for
  tracing.
- Removed the endian conversion functions instead used simple logic
  provided by Stefan.
- Rename I2C registers to reduce the length.
- Added traces for send, recv and event functions. You can turn on trace
  on command line by using "-trace "tpm_tis_i2c*" option.

---
V5:
Fixed issues reported by Stefan's test.
- Added mask for the INT_ENABLE register.
- Use correct TIS register for reading interrupt capability.
- Cleanup how register is converted from I2C to TIS and also saved
  information like tis_addr and register name in the i2cst so that we
  can only convert it once on i2c_send.
- Trace register number for unknown registers.

---
V6:
Fixed review comments from Stefan.
- Fixed some variable size.
- Removed unused variables.
- Added vmstate back in to handle migration.
- Added post load phase to reload tis address and register name.

---
V7:
Incorporated review comments from Stefan.
- Added tpm_tis_i2c_initfn function
- Set the device category DEVICE_CATEGORY_MISC.
- Corrected default locality selection.
- Other cleanup. Include file cleanup.

---
V8:
Incorporated review comments from Stefan.
- Removed the irq initialization as linux doesn't support interrupts for
  TPM
- Handle INT_CAPABILITY register in I2C only and return 0 to indicate
  that it is not supported.

---
V9:
- Added copyright
- Added set data function and called it few places.
- Rename function tpm_i2c_interface_capability

---
V10:
- Fixed the copyright text.

---
V11:
- As per specs changed STS register to support read/write in the middle
- Fixed issue in the checksum register

---
V12:
- Added validation for the locality.
- Applied correct mask for STS read and write.
---
 hw/arm/Kconfig   |   1 +
 hw/tpm/Kconfig   |   7 +
 hw/tpm/m

[PATCH v2] block-backend: Add new bds_io_in_flight counter

2023-03-31 Thread Hanna Czenczek
IDE TRIM is a BB user that wants to elevate its BB's in-flight counter
for a "macro" operation that consists of several actual I/O operations.
Each of those operations is individually started and awaited.  It does
this so that blk_drain() will drain the whole TRIM, and not just a
single one of the many discard operations it may encompass.

When request queuing is enabled, this leads to a deadlock: The currently
ongoing discard is drained, and the next one is queued, waiting for the
drain to stop.  Meanwhile, TRIM still keeps the in-flight counter
elevated, waiting for all discards to stop -- which will never happen,
because with the in-flight counter elevated, the BB is never considered
drained, so the drained section does not begin and cannot end.

There are two separate cases to look at here, namely bdrv_drain*() and
blk_drain*().  As said above, we do want blk_drain*() to settle the
whole operation: The only way to do so is to disable request queuing,
then.  So, we do that: Have blk_drain() and blk_drain_all() temporarily
disable request queuing, which prevents the deadlock.

(The devil's in the details, though: blk_drain_all() runs
bdrv_drain_all_begin() first, so when we get to the individual BB, there
may already be queued requests.  Therefore, we have to not only disable
request queuing then, but resume all already-queued requests, too.)

For bdrv_drain*(), we want request queuing -- and macro requests such as
IDE's TRIM request do not matter.  bdrv_drain*() wants to keep I/O
requests from BDS nodes, and the TRIM does not issue such requests; it
instead does so through blk_*() functions, which themselves elevate the
BB's in-flight counter.  So the idea is to drain (and potentially queue)
those blk_*() requests, but completely ignore the TRIM.

We can do that by splitting a new counter off of the existing BB
counter: The new bds_io_in_flight counter counts all those blk_*()
requests that can issue I/O to a BDS (so must be drained by
bdrv_drain*()), but will never block waiting on another request on the
BB.

In blk_drain*(), we disable request queuing and settle all requests (the
full in_flight count).  In bdrv_drain*() (i.e. blk_root_drained_poll()),
we only settle bds_io_in_flight_count, ignoring all requests that will
not directly issue I/O requests to BDS nodes.

Reported-by: Fiona Ebner 
Fixes: 7e5cdb345f77d76cb4877fe6230c4e17a7d0d0ca
   ("ide: Increment BB in-flight counter for TRIM BH")
Signed-off-by: Hanna Czenczek 
---
 block/block-backend.c | 157 ++
 1 file changed, 130 insertions(+), 27 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 2ee39229e4..6b9cf1c8c4 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -91,8 +91,27 @@ struct BlockBackend {
  * in-flight requests but aio requests can exist even when blk->root is
  * NULL, so we cannot rely on its counter for that case.
  * Accessed with atomic ops.
+ *
+ * bds_io_in_flight is the subset of in-flight requests that may directly
+ * issue I/O to a BDS node.  Polling the BB's AioContext, these requests
+ * must always make progress, eventually leading to bds_io_in_flight being
+ * decremented again (either when the request is settled, or when it is
+ * queued because of request queuing).
+ * In contrast to these, there are more abstract requests, which will not
+ * themselves issue I/O to a BDS node, but instead, when necessary, create
+ * specific BDS I/O requests that do so on their behalf, and then they block
+ * waiting for those subordinate requests.
+ * While request queuing is enabled, we must not have drained_poll wait on
+ * such abstract requests, because if one of its subordinate requests is
+ * queued, it will block and cannot progress until the drained section ends,
+ * which leads to a deadlock.  Luckily, it is safe to ignore such requests
+ * when draining BDS nodes: After all, they themselves do not issue I/O to
+ * BDS nodes.
+ * Finally, when draining a BB (blk_drain(), blk_drain_all()), we simply
+ * disable request queuing and can thus safely await all in-flight requests.
  */
 unsigned int in_flight;
+unsigned int bds_io_in_flight;
 };
 
 typedef struct BlockBackendAIOCB {
static bool blk_root_change_aio_ctx(BdrvChild *child, AioContext *ctx,
 GHashTable *visited, Transaction *tran,
 Error **errp);
 
+static void blk_inc_bds_io_in_flight(BlockBackend *blk);
+static void blk_dec_bds_io_in_flight(BlockBackend *blk);
+
 static char *blk_root_get_parent_desc(BdrvChild *child)
 {
 BlockBackend *blk = child->opaque;
@@ -1266,15 +1288,15 @@ blk_check_byte_request(BlockBackend *blk, int64_t 
offset, int64_t bytes)
 return 0;
 }
 
-/* To be called between exactly one pair of blk_inc/dec_in_flight() */
+/* To be called between exactly one pair o

Re: [PATCH 0/2] hw/acpi: bump MADT to revision 5

2023-03-31 Thread Igor Mammedov
On Wed, 29 Mar 2023 12:47:05 -0400
"Michael S. Tsirkin"  wrote:

> On Wed, Mar 29, 2023 at 08:14:37AM -0500, Eric DeVolder wrote:
> > 
> > 
> > On 3/29/23 00:19, Michael S. Tsirkin wrote:  
> > > Hmm I don't think we can reasonably make such a change for 8.0.
> > > Seems too risky.
> > > Also, I feel we want to have an internal (with "x-" prefix") flag to
> > > revert to old behaviour, in case of breakage on some guests.  and maybe
> > > we want to keep old revision for old machine types.  
> > Ok, what option name, for keeping old behavior, would you like?  
> 
> Don't much care. x-madt-rev?

if it works fine (cold & hot-plug) with older linux/windows guests
I'd rather avoid adding a compat knob (we typically do that in ACPI tables
only when a change breaks something).

(as old guest I'd define WinXP sp3 (/me wonders if we still care about
dead EOLed OSes) perhaps WS2008 would be a better minimum target these days
and RHEL6 (or some older ACPI enabled kernel with hotplug support))

> 
> > > 
> > > 
> > > On Tue, Mar 28, 2023 at 11:59:24AM -0400, Eric DeVolder wrote:  
> > > > The following Linux kernel change broke CPU hotplug for MADT revision
> > > > less than 5.
> > > > 
> > > >   commit e2869bd7af60 ("x86/acpi/boot: Do not register processors that cannot be onlined for x2APIC")
> > > 
> > > Presumably it's being fixed? Link to discussion? Patch fixing that in
> > > Linux?  
> > 
> > https://lore.kernel.org/linux-acpi/20230327191026.3454-1-eric.devol...@oracle.com/T/#t
> >   
> 
> Great! Maybe stick a Link: tag in the commit log.

So it's a guest bug which is in the process of being fixed.
(i.e. QEMU is technically correct as long as MADT revision < 5)

In this case I'd not touch x86 MADT at all (It should be upto
downstream distros to fix guest kernel).

Probably the same applies to ARM variant
i.e. we should bump rev only when current one gets in the way
(aka we are pulling in new fields/definitions from new version)

   
> > > > As part of the investigation into resolving this breakage, I learned
> > > > that i386 QEMU reports revision 1, while technically it is at revision 
> > > > 3.
> > > > (Arm QEMU reports revision 4, and that is valid/correct.)
> > > > 
> > > > ACPI 6.3 bumps MADT revision to 5 as it introduces an Online Capable
> > > > flag that the above Linux patch utilizes to denote hot pluggable CPUs.
> > > > 
> > > > So in order to bump MADT to the current revision of 5, we need to
> > > > validate that all MADT table changes between 1 and 5 are present
> > > > in QEMU.
> > > > 
> > > > Below is a table summarizing the changes to the MADT. This information
> > > > was gleaned from the ACPI specs on uefi.org.
> > > > 
> > > > ACPIMADTWhat
> > > > Version Version
> > > > 1.0 MADT not present
> > > > 2.0 1   Section 5.2.10.4
> > > > 3.0 2   Section 5.2.11.4
> > > >   5.2.11.13 Local SAPIC Structure added two new fields:
> > > >ACPI Processor UID Value
> > > >ACPI Processor UID String
> > > >   5.2.10.14 Platform Interrupt Sources Structure:
> > > >Reserved changed to Platform Interrupt Sources Flags
> > > > 3.0b2   Section 5.2.11.4
> > > >   Added a section describing guidelines for the 
> > > > ordering of
> > > >   processors in the MADT to support proper boot 
> > > > processor
> > > >   and multi-threaded logical processor operation.
> > > > 4.0 3   Section 5.2.12
> > > >   Adds Processor Local x2APIC structure type 9
> > > >   Adds Local x2APIC NMI structure type 0xA
> > > > 5.0 3   Section 5.2.12
> > > > 6.0 3   Section 5.2.12
> > > > 6.0a4   Section 5.2.12
> > > >   Adds ARM GIC structure types 0xB-0xF
> > > > 6.2a45  Section 5.2.12   <--- yep it says version 45!
> > > > 6.2b5   Section 5.2.12
> > > >   GIC ITS last Reserved offset changed to 16 from 20 
> > > > (typo)
> > > > 6.3 5   Section 5.2.12
> > > >   Adds Local APIC Flags Online Capable!
> > > >   Adds GICC SPE Overflow Interrupt field
> > > > 6.4 5   Section 5.2.12
> > > >   Adds Multiprocessor Wakeup Structure type 0x10
> > > >   (change notes says structure previously misplaced?)
> > > > 6.5 5   Section 5.2.12
> > > > 
> > > > For the MADT revision change 1 -> 2, the spec has a change to the
> > > > SAPIC structure. In general, QEMU does not generate/support SAPIC.
> > > > So the QEMU i386 MADT revision can safely be moved to 2.
> > > > 
> > > > For the MADT revision change 2 -> 3, the spec adds Local x2APIC
> > > > structures. QEMU has long supported x2apic ACPI structures. A simple
> > > > search of x2apic within QEMU source and hw/i386/acpi-common.c
> > > > specifically reveals this.  
> > > 
> > > But not unconditionally.  
> > 
> > I don't think that

Re: [RFC PATCH v1 00/26] migration: File based migration with multifd and fixed-ram

2023-03-31 Thread Peter Xu
On Fri, Mar 31, 2023 at 05:10:16PM +0100, Daniel P. Berrangé wrote:
> On Fri, Mar 31, 2023 at 11:55:03AM -0400, Peter Xu wrote:
> > On Fri, Mar 31, 2023 at 12:30:45PM -0300, Fabiano Rosas wrote:
> > > Peter Xu  writes:
> > > 
> > > > On Fri, Mar 31, 2023 at 11:37:50AM -0300, Fabiano Rosas wrote:
> > > >> >> Outgoing migration to file. NVMe disk. XFS filesystem.
> > > >> >> 
> > > >> >> - Single migration runs of stopped 32G guest with ~90% RAM usage. 
> > > >> >> Guest
> > > >> >>   running `stress-ng --vm 4 --vm-bytes 90% --vm-method all --verify 
> > > >> >> -t
> > > >> >>   10m -v`:
> > > >> >> 
> > > >> >> migration type  | MB/s | pages/s |  ms
> > > >> >> +--+-+--
> > > >> >> savevm io_uring |  434 |  102294 | 71473
> > > >> >
> > > >> > So I assume this is the non-live migration scenario.  Could you 
> > > >> > explain
> > > >> > what does io_uring mean here?
> > > >> >
> > > >> 
> > > >> This table is all non-live migration. This particular line is a 
> > > >> snapshot
> > > >> (hmp_savevm->save_snapshot). I thought it could be relevant because it
> > > >> is another way by which we write RAM into disk.
> > > >
> > > > I see, so if all non-live that explains, because I was curious what's 
> > > > the
> > > > relationship between this feature and the live snapshot that QEMU also
> > > > supports.
> > > >
> > > > I also don't immediately see why savevm will be much slower, do you 
> > > > have an
> > > > answer?  Maybe it's somewhere but I just overlooked..
> > > >
> > > 
> > > I don't have a concrete answer. I could take a jab and maybe blame the
> > > extra memcpy for the buffer in QEMUFile? Or perhaps an unintended effect
> > > of bandwidth limits?
> > 
> > IMHO it would be great if this can be investigated and reasons provided in
> > the next cover letter.
> > 
> > > 
> > > > IIUC this is "vm suspend" case, so there's an extra benefit knowledge of
> > > > "we can stop the VM".  It smells slightly weird to build this on top of
> > > > "migrate" from that pov, rather than "savevm", though.  Any thoughts on
> > > > this aspect (on why not building this on top of "savevm")?
> > > >
> > > 
> > > I share the same perception. I have done initial experiments with
> > > savevm, but I decided to carry on the work that was already started by
> > > others because my understanding of the problem was still incomplete.
> > > 
> > > One point that has been raised is that the fixed-ram format alone does
> > > not bring that many performance improvements. So we'll need
> > > multi-threading and direct-io on top of it. Re-using multifd
> > > infrastructure seems like it could be a good idea.
> > 
> > The thing is, IMHO concurrency is not as hard if the VM is stopped, and
> > when we're 100% sure locally about where each page will go.
> 
> We shouldn't assume the VM is stopped though. When saving to the file
> the VM may still be active. The fixed-ram format lets us re-write the
> same memory location on disk multiple times in this case, thus avoiding
> growth of the file size.

Before discussing reusing multifd below, I now have a major confusion about
the use case of the feature...

The question is whether we would like to stop the VM after fixed-ram
migration completes.  I'm asking because:

  1. If it will stop, then it looks like a "VM suspend" to me. If so, could
 anyone help explain why we don't stop the VM first then migrate?
 Because it avoids copying single pages multiple times, no fiddling
 with dirty tracking at all - we just don't ever track anything.  In
 short, we'll stop the VM anyway, then why not stop it slightly
 earlier?

  2. If it will not stop, then it's "VM live snapshot" to me.  We have
 that, aren't we?  That's more efficient because it'll wr-protect all
 guest pages, any write triggers a CoW and we only copy the guest pages
 once and for all.

Either way we go, there's no need to copy any page more than once.  Did I
miss anything perhaps very important?

I would guess it's option (1) above, because it seems we don't snapshot the
disk alongside.  But I am really not sure now..

> 
> > IOW, I think multifd provides a lot of features that may not really be
> > useful for this effort, meanwhile using those features may need to already
> > pay for the overhead to support those features.
> > 
> > For example, a major benefit of multifd is it allows pages sent out of
> > order, so it indexes the page as a header.  I didn't read the follow up
> > patches, but I assume that's not needed in this effort.
> > 
> > What I understand so far with fixed-ram is that we dump the whole ramblock
> > memory into a chunk at an offset of a file.  Can concurrency of that be
> > achieved easily by creating a bunch of threads dumping altogether during
> > the savevm, with different offsets of guest ram & file passed over?
> 
> I feel like the migration code is already insanely complicated and
> the many threads involved have caused no end of subtle bugs. 
> 
> It was Juan I be

Re: [PATCH 2/2] hw/acpi: i386: bump MADT to revision 5

2023-03-31 Thread Igor Mammedov
On Wed, 29 Mar 2023 08:16:26 -0500
Eric DeVolder  wrote:

> On 3/29/23 00:03, Michael S. Tsirkin wrote:
> > On Tue, Mar 28, 2023 at 11:59:26AM -0400, Eric DeVolder wrote:  
> >> Currently i386 QEMU generates MADT revision 3, and reports
> >> MADT revision 1. ACPI 6.3 introduces MADT revision 5.
> >>
> >> For MADT revision 4, that introduces ARM GIC structures, which do
> >> not apply to i386.
> >>
> >> For MADT revision 5, the Local APIC flags introduces the Online
> >> Capable bitfield.
> >>
> >> Making MADT generate and report revision 5 will solve problems with
> >> CPU hotplug (the Online Capable flag indicates hotpluggable CPUs).
> >>
> >> Signed-off-by: Eric DeVolder   
> > 
> > I am looking for ways to reduce risk of breakage with this.
> > We don't currently have a reason to change it if cpu
> > hotplug is off, do we? Maybe make it conditional on that.  
> 
> By "cpu hotplug off", do you mean, for example, no maxcpus= option?
> In other words, how should I detect "cpu hotplug off"?
> eric
I'm not sure that it's possible to disable CPU hotplug at all.
Even if one doesn't have maxcpus on the CLI, present CPUs are described
as hotpluggable and can be unplugged and re-plugged later.




Re: [PATCH 3/3] target/arm: Implement FEAT_PAN3

2023-03-31 Thread Richard Henderson

On 3/31/23 07:50, Peter Maydell wrote:

FEAT_PAN3 adds an EPAN bit to SCTLR_EL1 and SCTLR_EL2, which allows
the PAN bit to make memory non-privileged-read/write if it is
user-executable as well as if it is user-read/write.

Implement this feature and enable it in the AArch64 'max' CPU.

Signed-off-by: Peter Maydell
---
  docs/system/arm/emulation.rst |  1 +
  target/arm/cpu.h  |  5 +
  target/arm/cpu64.c|  2 +-
  target/arm/ptw.c  | 14 +-
  4 files changed, 20 insertions(+), 2 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PULL 00/15] tcg patch queue

2023-03-31 Thread Richard Henderson

On 3/30/23 03:37, Joel Stanley wrote:

On Tue, 28 Mar 2023 at 22:59, Richard Henderson
 wrote:


The following changes since commit d37158bb2425e7ebffb167d611be01f1e9e6c86f:

   Update version for v8.0.0-rc2 release (2023-03-28 20:43:21 +0100)

are available in the Git repository at:

   https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230328

for you to fetch changes up to 87e303de70f93bf700f58412fb9b2c3ec918c4b5:

   softmmu: Restore use of CPU watchpoint for all accelerators (2023-03-28 
15:24:06 -0700)


Use a local version of GTree [#285]
Fix page_set_flags vs the last page of the address space [#1528]
Re-enable gdbstub breakpoints under KVM


Emilio Cota (2):
   util: import GTree as QTree
   tcg: use QTree instead of GTree

Philippe Mathieu-Daudé (3):
   softmmu: Restrict cpu_check_watchpoint / address_matches to TCG accel
   softmmu/watchpoint: Add missing 'qemu/error-report.h' include
   softmmu: Restore use of CPU watchpoint for all accelerators

Richard Henderson (10):
   linux-user: Diagnose misaligned -R size
   accel/tcg: Pass last not end to page_set_flags
   accel/tcg: Pass last not end to page_reset_target_data
   accel/tcg: Pass last not end to PAGE_FOR_EACH_TB
   accel/tcg: Pass last not end to page_collection_lock
   accel/tcg: Pass last not end to tb_invalidate_phys_page_range__locked
   accel/tcg: Pass last not end to tb_invalidate_phys_range
   linux-user: Pass last not end to probe_guest_base
   include/exec: Change reserved_va semantics to last byte
   linux-user/arm: Take more care allocating commpage


Thanks for getting these fixes merged.

This last one (4f5c67f8df7f26e559509c68c45e652709edd23f) causes a
regression for me. On ppc64le, qemu-arm now segfaults. If I revert
this one I can run executables without the assert.

The segfault looks like this:

#0  0x0001001e44fc in stl_he_p (v=5, ptr=0x240450ffc) at
/home/joel/qemu/include/qemu/bswap.h:260
#1  stl_le_p (v=5, ptr=0x240450ffc) at /home/joel/qemu/include/qemu/bswap.h:302
#2  init_guest_commpage () at ../linux-user/elfload.c:460
#3  probe_guest_base (image_name=image_name@entry=0x1003c72e0
 "/home/joel/hello",
 guest_loaddr=guest_loaddr@entry=65536,
guest_hiaddr=guest_hiaddr@entry=17411743) at
../linux-user/elfload.c:2818
#4  0x0001001e50d4 in load_elf_image (image_name=0x1003c72e0
 "/home/joel/hello",
 image_fd=, info=info@entry=0x7fffe7e8,
pinterp_name=pinterp_name@entry=0x7fffe558,
 bprm_buf=bprm_buf@entry=0x7fffe8d0 "\177ELF\001\001\001") at
../linux-user/elfload.c:3108
#5  0x0001001e5434 in load_elf_binary (bprm=0x7fffe8d0,
info=0x7fffe7e8) at ../linux-user/elfload.c:3548
#6  0x0001001e85bc in loader_exec (fdexec=,
filename=, argv=,
 envp=, regs=0x7fffe888, infop=0x7fffe7e8,
bprm=0x7fffe8d0) at ../linux-user/linuxload.c:155
#7  0x000100046c7c in main (argc=,
argv=0x71c8, envp=) at ../linux-user/main.c:892


Gah!  I've exposed the same sort of overflow conditions within target_mmap and friends.  I 
think the only short-term solution for 8.0 is to revert the last patch.



r~




[PATCH v5 3/3] qtest: Add a test case for TPM TIS I2C connected to Aspeed I2C controller

2023-03-31 Thread Stefan Berger
Add a test case for the TPM TIS I2C device exercising most of its
functionality, including localities.

Signed-off-by: Stefan Berger 
Tested-by: Cédric Le Goater 
---
 tests/qtest/meson.build|   3 +
 tests/qtest/tpm-tis-i2c-test.c | 663 +
 2 files changed, 666 insertions(+)
 create mode 100644 tests/qtest/tpm-tis-i2c-test.c

diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 85ea4e8d99..cfc66ade6f 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -200,6 +200,7 @@ qtests_arm = \
   (config_all_devices.has_key('CONFIG_ASPEED_SOC') ? qtests_aspeed : []) + \
   (config_all_devices.has_key('CONFIG_NPCM7XX') ? qtests_npcm7xx : []) + \
   (config_all_devices.has_key('CONFIG_GENERIC_LOADER') ? ['hexloader-test'] : 
[]) + \
+  (config_all_devices.has_key('CONFIG_TPM_TIS_I2C') ? ['tpm-tis-i2c-test'] : 
[]) + \
   ['arm-cpu-features',
'microbit-test',
'test-arm-mptimer',
@@ -212,6 +213,7 @@ qtests_aarch64 = \
 ['tpm-tis-device-test', 'tpm-tis-device-swtpm-test'] : []) +   
  \
   (config_all_devices.has_key('CONFIG_XLNX_ZYNQMP_ARM') ? ['xlnx-can-test', 
'fuzz-xlnx-dp-test'] : []) + \
   (config_all_devices.has_key('CONFIG_RASPI') ? ['bcm2835-dma-test'] : []) +  \
+  (config_all_devices.has_key('CONFIG_TPM_TIS_I2C') ? ['tpm-tis-i2c-test'] : 
[]) + \
   ['arm-cpu-features',
'numa-test',
'boot-serial-test',
@@ -304,6 +306,7 @@ qtests = {
   'tpm-crb-test': [io, tpmemu_files],
   'tpm-tis-swtpm-test': [io, tpmemu_files, 'tpm-tis-util.c'],
   'tpm-tis-test': [io, tpmemu_files, 'tpm-tis-util.c'],
+  'tpm-tis-i2c-test': [io, tpmemu_files, 'qtest_aspeed.c'],
   'tpm-tis-device-swtpm-test': [io, tpmemu_files, 'tpm-tis-util.c'],
   'tpm-tis-device-test': [io, tpmemu_files, 'tpm-tis-util.c'],
   'vmgenid-test': files('boot-sector.c', 'acpi-utils.c'),
diff --git a/tests/qtest/tpm-tis-i2c-test.c b/tests/qtest/tpm-tis-i2c-test.c
new file mode 100644
index 00..7a590ac551
--- /dev/null
+++ b/tests/qtest/tpm-tis-i2c-test.c
@@ -0,0 +1,663 @@
+/*
+ * QTest testcases for TPM TIS on I2C (derived from TPM TIS test)
+ *
+ * Copyright (c) 2023 IBM Corporation
+ * Copyright (c) 2023 Red Hat, Inc.
+ *
+ * Authors:
+ *   Stefan Berger 
+ *   Marc-André Lureau 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include 
+
+#include "libqtest-single.h"
+#include "hw/acpi/tpm.h"
+#include "hw/pci/pci_ids.h"
+#include "qtest_aspeed.h"
+#include "tpm-emu.h"
+
+#define DEBUG_TIS_TEST 0
+
+#define DPRINTF(fmt, ...) do { \
+if (DEBUG_TIS_TEST) { \
+printf(fmt, ## __VA_ARGS__); \
+} \
+} while (0)
+
+#define DPRINTF_ACCESS \
+DPRINTF("%s: %d: locty=%d l=%d access=0x%02x pending_request_flag=0x%x\n", 
\
+__func__, __LINE__, locty, l, access, pending_request_flag)
+
+#define DPRINTF_STS \
+DPRINTF("%s: %d: sts = 0x%08x\n", __func__, __LINE__, sts)
+
+#define I2C_SLAVE_ADDR   0x2e
+#define I2C_DEV_BUS_NUM  10
+
+static const uint8_t TPM_CMD[12] =
+"\x80\x01\x00\x00\x00\x0c\x00\x00\x01\x44\x00\x00";
+
+static uint32_t aspeed_bus_addr;
+
+static uint8_t cur_locty = 0xff;
+
+static void tpm_tis_i2c_set_locty(uint8_t locty)
+{
+if (cur_locty != locty) {
+cur_locty = locty;
+aspeed_i2c_writeb(global_qtest, aspeed_bus_addr, I2C_SLAVE_ADDR,
+  TPM_I2C_REG_LOC_SEL, locty);
+}
+}
+
+static uint8_t tpm_tis_i2c_readb(uint8_t locty, uint8_t reg)
+{
+tpm_tis_i2c_set_locty(locty);
+return aspeed_i2c_readb(global_qtest, aspeed_bus_addr, I2C_SLAVE_ADDR, 
reg);
+}
+
+static uint16_t tpm_tis_i2c_readw(uint8_t locty, uint8_t reg)
+{
+tpm_tis_i2c_set_locty(locty);
+return aspeed_i2c_readw(global_qtest, aspeed_bus_addr, I2C_SLAVE_ADDR, 
reg);
+}
+
+static uint32_t tpm_tis_i2c_readl(uint8_t locty, uint8_t reg)
+{
+tpm_tis_i2c_set_locty(locty);
+return aspeed_i2c_readl(global_qtest, aspeed_bus_addr, I2C_SLAVE_ADDR, 
reg);
+}
+
+static void tpm_tis_i2c_writeb(uint8_t locty, uint8_t reg, uint8_t v)
+{
+if (reg != TPM_I2C_REG_LOC_SEL) {
+tpm_tis_i2c_set_locty(locty);
+}
+aspeed_i2c_writeb(global_qtest, aspeed_bus_addr, I2C_SLAVE_ADDR, reg, v);
+}
+
+static void tpm_tis_i2c_writel(uint8_t locty, uint8_t reg, uint32_t v)
+{
+if (reg != TPM_I2C_REG_LOC_SEL) {
+tpm_tis_i2c_set_locty(locty);
+}
+aspeed_i2c_writel(global_qtest, aspeed_bus_addr, I2C_SLAVE_ADDR, reg, v);
+}
+
+static void tpm_tis_i2c_test_basic(const void *data)
+{
+uint8_t access;
+uint32_t v, v2;
+
+/*
+ * All register accesses below must work without locality 0 being the
+ * active locality. Therefore, ensure access is released.
+ */
+tpm_tis_i2c_writeb(0, TPM_I2C_REG_ACCESS,
+   TPM_TIS_ACCESS_ACTIVE_LOCALITY);
+access = tpm_tis_i2c_readb(0, TPM_I2C_REG_ACCESS);

[PATCH v5 1/3] qtest: Add functions for accessing devices on Aspeed I2C controller

2023-03-31 Thread Stefan Berger
Add read and write functions for accessing registers of I2C devices
connected to the Aspeed I2C controller.

Signed-off-by: Stefan Berger 
Reviewed-by: Cédric Le Goater 
Reviewed-by: Ninad Palsule 
Acked-by: Thomas Huth 
---
 include/hw/i2c/aspeed_i2c.h |   7 +++
 tests/qtest/qtest_aspeed.c  | 117 
 tests/qtest/qtest_aspeed.h  |  41 +
 3 files changed, 165 insertions(+)
 create mode 100644 tests/qtest/qtest_aspeed.c
 create mode 100644 tests/qtest/qtest_aspeed.h

diff --git a/include/hw/i2c/aspeed_i2c.h b/include/hw/i2c/aspeed_i2c.h
index adc904d6c1..51c944efea 100644
--- a/include/hw/i2c/aspeed_i2c.h
+++ b/include/hw/i2c/aspeed_i2c.h
@@ -38,6 +38,13 @@ OBJECT_DECLARE_TYPE(AspeedI2CState, AspeedI2CClass, 
ASPEED_I2C)
 #define ASPEED_I2C_OLD_NUM_REG 11
 #define ASPEED_I2C_NEW_NUM_REG 22
 
+#define A_I2CD_M_STOP_CMD   BIT(5)
+#define A_I2CD_M_RX_CMD BIT(3)
+#define A_I2CD_M_TX_CMD BIT(1)
+#define A_I2CD_M_START_CMD  BIT(0)
+
+#define A_I2CD_MASTER_ENBIT(0)
+
 /* Tx State Machine */
 #define   I2CD_TX_STATE_MASK  0xf
 #define I2CD_IDLE 0x0
diff --git a/tests/qtest/qtest_aspeed.c b/tests/qtest/qtest_aspeed.c
new file mode 100644
index 00..afee9a1864
--- /dev/null
+++ b/tests/qtest/qtest_aspeed.c
@@ -0,0 +1,117 @@
+/*
+ * Aspeed i2c bus interface for reading from and writing to i2c device 
registers
+ *
+ * Copyright (c) 2023 IBM Corporation
+ *
+ * Authors:
+ *   Stefan Berger 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+
+#include "qtest_aspeed.h"
+#include "hw/i2c/aspeed_i2c.h"
+
+static void aspeed_i2c_startup(QTestState *s, uint32_t baseaddr,
+   uint8_t slave_addr, uint8_t reg)
+{
+uint32_t v;
+static int once;
+
+if (!once) {
+/* one time: enable master */
+   qtest_writel(s, baseaddr + A_I2CC_FUN_CTRL, 0);
+   v = qtest_readl(s, baseaddr + A_I2CC_FUN_CTRL) | A_I2CD_MASTER_EN;
+   qtest_writel(s, baseaddr + A_I2CC_FUN_CTRL, v);
+   once = 1;
+}
+
+/* select device */
+qtest_writel(s, baseaddr + A_I2CD_BYTE_BUF, slave_addr << 1);
+qtest_writel(s, baseaddr + A_I2CD_CMD,
+ A_I2CD_M_START_CMD | A_I2CD_M_RX_CMD);
+
+/* select the register to write to */
+qtest_writel(s, baseaddr + A_I2CD_BYTE_BUF, reg);
+qtest_writel(s, baseaddr + A_I2CD_CMD, A_I2CD_M_TX_CMD);
+}
+
+static uint32_t aspeed_i2c_read_n(QTestState *s,
+  uint32_t baseaddr, uint8_t slave_addr,
+  uint8_t reg, size_t nbytes)
+{
+uint32_t res = 0;
+uint32_t v;
+size_t i;
+
+aspeed_i2c_startup(s, baseaddr, slave_addr, reg);
+
+for (i = 0; i < nbytes; i++) {
+qtest_writel(s, baseaddr + A_I2CD_CMD, A_I2CD_M_RX_CMD);
+v = qtest_readl(s, baseaddr + A_I2CD_BYTE_BUF) >> 8;
+res |= (v & 0xff) << (i * 8);
+}
+
+qtest_writel(s, baseaddr + A_I2CD_CMD, A_I2CD_M_STOP_CMD);
+
+return res;
+}
+
+uint32_t aspeed_i2c_readl(QTestState *s,
+  uint32_t baseaddr, uint8_t slave_addr, uint8_t reg)
+{
+return aspeed_i2c_read_n(s, baseaddr, slave_addr, reg, sizeof(uint32_t));
+}
+
+uint16_t aspeed_i2c_readw(QTestState *s,
+  uint32_t baseaddr, uint8_t slave_addr, uint8_t reg)
+{
+return aspeed_i2c_read_n(s, baseaddr, slave_addr, reg, sizeof(uint16_t));
+}
+
+uint8_t aspeed_i2c_readb(QTestState *s,
+ uint32_t baseaddr, uint8_t slave_addr, uint8_t reg)
+{
+return aspeed_i2c_read_n(s, baseaddr, slave_addr, reg, sizeof(uint8_t));
+}
+
+static void aspeed_i2c_write_n(QTestState *s, 
+   uint32_t baseaddr, uint8_t slave_addr,
+   uint8_t reg, uint32_t v, size_t nbytes)
+{
+size_t i;
+
+aspeed_i2c_startup(s, baseaddr, slave_addr, reg);
+
+for (i = 0; i < nbytes; i++) {
+qtest_writel(s, baseaddr + A_I2CD_BYTE_BUF, v & 0xff);
+v >>= 8;
+qtest_writel(s, baseaddr + A_I2CD_CMD, A_I2CD_M_TX_CMD);
+}
+
+qtest_writel(s, baseaddr + A_I2CD_CMD, A_I2CD_M_STOP_CMD);
+}
+
+void aspeed_i2c_writel(QTestState *s,
+   uint32_t baseaddr, uint8_t slave_addr,
+   uint8_t reg, uint32_t v)
+{
+aspeed_i2c_write_n(s, baseaddr, slave_addr, reg, v, sizeof(v));
+}
+
+void aspeed_i2c_writew(QTestState *s,
+   uint32_t baseaddr, uint8_t slave_addr,
+   uint8_t reg, uint16_t v)
+{
+aspeed_i2c_write_n(s, baseaddr, slave_addr, reg, v, sizeof(v));
+}
+
+void aspeed_i2c_writeb(QTestState *s,
+   uint32_t baseaddr, uint8_t slave_addr,
+   uint8_t reg, uint8_t v)
+{
+aspeed_i2c_write_n(s, baseaddr, slave_addr, reg, v, sizeof(v));
+

[PATCH v5 0/3] qtests: tpm: Add test cases for TPM TIS I2C device emulation

2023-03-31 Thread Stefan Berger
This series adds test cases exercising much of the TPM TIS I2C device model
assuming that the device is connected to the Aspeed I2C controller. Tests
are passing on little and big endian hosts.

This series of patches builds on the following series of patches
providing the TPM TIS I2C device emulation (v12):
https://lists.nongnu.org/archive/html/qemu-devel/2023-03/msg07258.html


Regards,
Stefan

v5:
  - 3/3: Added more test cases; read from REG_STS + 1 and + 3; try to
 select an invalid locality

v4:
  - 1/3: Use qtest_writel() and qtest_readl()

v3:
  - 1/3: Renaming of inline function and added comment
  - 3/3: Made variables static

v2:
  - Split off Aspeed I2C controller library functions
  - Tweaking on test cases


Stefan Berger (3):
  qtest: Add functions for accessing devices on Aspeed I2C controller
  qtest: Move tpm_util_tis_transmit() into tpm-tis-utils.c and rename it
  qtest: Add a test case for TPM TIS I2C connected to Aspeed I2C
controller

 include/hw/i2c/aspeed_i2c.h |   7 +
 tests/qtest/meson.build |   3 +
 tests/qtest/qtest_aspeed.c  | 117 +
 tests/qtest/qtest_aspeed.h  |  41 ++
 tests/qtest/tpm-crb-swtpm-test.c|   3 -
 tests/qtest/tpm-crb-test.c  |   3 -
 tests/qtest/tpm-tis-device-swtpm-test.c |   5 +-
 tests/qtest/tpm-tis-i2c-test.c  | 663 
 tests/qtest/tpm-tis-swtpm-test.c|   5 +-
 tests/qtest/tpm-tis-util.c  |  47 +-
 tests/qtest/tpm-tis-util.h  |   4 +
 tests/qtest/tpm-util.c  |  45 --
 tests/qtest/tpm-util.h  |   3 -
 13 files changed, 887 insertions(+), 59 deletions(-)
 create mode 100644 tests/qtest/qtest_aspeed.c
 create mode 100644 tests/qtest/qtest_aspeed.h
 create mode 100644 tests/qtest/tpm-tis-i2c-test.c

-- 
2.39.2




[PATCH v5 2/3] qtest: Move tpm_util_tis_transmit() into tpm-tis-utils.c and rename it

2023-03-31 Thread Stefan Berger
To be able to remove tpm_tis_base_addr from test cases that do not really
need it, move the tpm_util_tis_transfer() function into tpm-tis-util.c and
rename it to tpm_tis_transfer().

Fix a locality parameter in a test case on the way.

Signed-off-by: Stefan Berger 
Reviewed-by: Ninad Palsule 
Reviewed-by: Thomas Huth 
---
 tests/qtest/tpm-crb-swtpm-test.c|  3 --
 tests/qtest/tpm-crb-test.c  |  3 --
 tests/qtest/tpm-tis-device-swtpm-test.c |  5 +--
 tests/qtest/tpm-tis-swtpm-test.c|  5 +--
 tests/qtest/tpm-tis-util.c  | 47 -
 tests/qtest/tpm-tis-util.h  |  4 +++
 tests/qtest/tpm-util.c  | 45 ---
 tests/qtest/tpm-util.h  |  3 --
 8 files changed, 56 insertions(+), 59 deletions(-)

diff --git a/tests/qtest/tpm-crb-swtpm-test.c b/tests/qtest/tpm-crb-swtpm-test.c
index 40254f762f..ffeb1c396b 100644
--- a/tests/qtest/tpm-crb-swtpm-test.c
+++ b/tests/qtest/tpm-crb-swtpm-test.c
@@ -19,9 +19,6 @@
 #include "tpm-tests.h"
 #include "hw/acpi/tpm.h"
 
-/* Not used but needed for linking */
-uint64_t tpm_tis_base_addr = TPM_TIS_ADDR_BASE;
-
 typedef struct TestState {
 char *src_tpm_path;
 char *dst_tpm_path;
diff --git a/tests/qtest/tpm-crb-test.c b/tests/qtest/tpm-crb-test.c
index 7b94453390..396ae3f91c 100644
--- a/tests/qtest/tpm-crb-test.c
+++ b/tests/qtest/tpm-crb-test.c
@@ -19,9 +19,6 @@
 #include "qemu/module.h"
 #include "tpm-emu.h"
 
-/* Not used but needed for linking */
-uint64_t tpm_tis_base_addr = TPM_TIS_ADDR_BASE;
-
 #define TPM_CMD "\x80\x01\x00\x00\x00\x0c\x00\x00\x01\x44\x00\x00"
 
 static void tpm_crb_test(const void *data)
diff --git a/tests/qtest/tpm-tis-device-swtpm-test.c 
b/tests/qtest/tpm-tis-device-swtpm-test.c
index 8c067fddd4..517a077005 100644
--- a/tests/qtest/tpm-tis-device-swtpm-test.c
+++ b/tests/qtest/tpm-tis-device-swtpm-test.c
@@ -18,6 +18,7 @@
 #include "libqtest.h"
 #include "qemu/module.h"
 #include "tpm-tests.h"
+#include "tpm-tis-util.h"
 #include "hw/acpi/tpm.h"
 
 uint64_t tpm_tis_base_addr = 0xc00;
@@ -33,7 +34,7 @@ static void tpm_tis_swtpm_test(const void *data)
 {
 const TestState *ts = data;
 
-tpm_test_swtpm_test(ts->src_tpm_path, tpm_util_tis_transfer,
+tpm_test_swtpm_test(ts->src_tpm_path, tpm_tis_transfer,
 "tpm-tis-device", MACHINE_OPTIONS);
 }
 
@@ -42,7 +43,7 @@ static void tpm_tis_swtpm_migration_test(const void *data)
 const TestState *ts = data;
 
 tpm_test_swtpm_migration_test(ts->src_tpm_path, ts->dst_tpm_path, ts->uri,
-  tpm_util_tis_transfer, "tpm-tis-device",
+  tpm_tis_transfer, "tpm-tis-device",
   MACHINE_OPTIONS);
 }
 
diff --git a/tests/qtest/tpm-tis-swtpm-test.c b/tests/qtest/tpm-tis-swtpm-test.c
index 11539c0a52..105e42e21d 100644
--- a/tests/qtest/tpm-tis-swtpm-test.c
+++ b/tests/qtest/tpm-tis-swtpm-test.c
@@ -17,6 +17,7 @@
 #include "libqtest.h"
 #include "qemu/module.h"
 #include "tpm-tests.h"
+#include "tpm-tis-util.h"
 #include "hw/acpi/tpm.h"
 
 uint64_t tpm_tis_base_addr = TPM_TIS_ADDR_BASE;
@@ -31,7 +32,7 @@ static void tpm_tis_swtpm_test(const void *data)
 {
 const TestState *ts = data;
 
-tpm_test_swtpm_test(ts->src_tpm_path, tpm_util_tis_transfer,
+tpm_test_swtpm_test(ts->src_tpm_path, tpm_tis_transfer,
 "tpm-tis", NULL);
 }
 
@@ -40,7 +41,7 @@ static void tpm_tis_swtpm_migration_test(const void *data)
 const TestState *ts = data;
 
 tpm_test_swtpm_migration_test(ts->src_tpm_path, ts->dst_tpm_path, ts->uri,
-  tpm_util_tis_transfer, "tpm-tis", NULL);
+  tpm_tis_transfer, "tpm-tis", NULL);
 }
 
 int main(int argc, char **argv)
diff --git a/tests/qtest/tpm-tis-util.c b/tests/qtest/tpm-tis-util.c
index 939893bf01..728cd3e065 100644
--- a/tests/qtest/tpm-tis-util.c
+++ b/tests/qtest/tpm-tis-util.c
@@ -52,7 +52,7 @@ void tpm_tis_test_check_localities(const void *data)
 uint32_t rid;
 
 for (locty = 0; locty < TPM_TIS_NUM_LOCALITIES; locty++) {
-access = readb(TIS_REG(0, TPM_TIS_REG_ACCESS));
+access = readb(TIS_REG(locty, TPM_TIS_REG_ACCESS));
 g_assert_cmpint(access, ==, TPM_TIS_ACCESS_TPM_REG_VALID_STS |
 TPM_TIS_ACCESS_TPM_ESTABLISHMENT);
 
@@ -449,3 +449,48 @@ void tpm_tis_test_check_transmit(const void *data)
 writeb(TIS_REG(0, TPM_TIS_REG_ACCESS), TPM_TIS_ACCESS_ACTIVE_LOCALITY);
 access = readb(TIS_REG(0, TPM_TIS_REG_ACCESS));
 }
+
+void tpm_tis_transfer(QTestState *s,
+  const unsigned char *req, size_t req_size,
+  unsigned char *rsp, size_t rsp_size)
+{
+uint32_t sts;
+uint16_t bcount;
+size_t i;
+
+/* request use of locality 0 */
+qtest_writeb(s, TIS_REG(0, TPM_TIS_REG_ACCESS), 
TPM_TIS_ACCESS_REQUEST_USE);
+qtest_

[PATCH] coverity: unify Fedora dockerfiles

2023-03-31 Thread Paolo Bonzini
The Fedora CI and coverity runs are using a slightly different set of
packages.  Copy most of the content over from tests/docker while
keeping the commands at the end that unpack the tools.

Signed-off-by: Paolo Bonzini 
---
 scripts/coverity-scan/coverity-scan.docker | 250 -
 1 file changed, 145 insertions(+), 105 deletions(-)

diff --git a/scripts/coverity-scan/coverity-scan.docker 
b/scripts/coverity-scan/coverity-scan.docker
index 6f60a52d23..a349578526 100644
--- a/scripts/coverity-scan/coverity-scan.docker
+++ b/scripts/coverity-scan/coverity-scan.docker
@@ -15,112 +15,152 @@
 # The work of actually doing the build is handled by the
 # run-coverity-scan script.
 
-FROM fedora:30
-ENV PACKAGES \
-alsa-lib-devel \
-bc \
-brlapi-devel \
-bzip2 \
-bzip2-devel \
-ccache \
-clang \
-curl \
-cyrus-sasl-devel \
-dbus-daemon \
-device-mapper-multipath-devel \
-findutils \
-gcc \
-gcc-c++ \
-gettext \
-git \
-glib2-devel \
-glusterfs-api-devel \
-gnutls-devel \
-gtk3-devel \
-hostname \
-libaio-devel \
-libasan \
-libattr-devel \
-libblockdev-mpath-devel \
-libcap-devel \
-libcap-ng-devel \
-libcurl-devel \
-libepoxy-devel \
-libfdt-devel \
-libgbm-devel \
-libiscsi-devel \
-libjpeg-devel \
-libpmem-devel \
-libnfs-devel \
-libpng-devel \
-librbd-devel \
-libseccomp-devel \
-libssh-devel \
-libubsan \
-libudev-devel \
-libusbx-devel \
-libzstd-devel \
-llvm \
-lzo-devel \
-make \
-mingw32-bzip2 \
-mingw32-curl \
-mingw32-glib2 \
-mingw32-gmp \
-mingw32-gnutls \
-mingw32-gtk3 \
-mingw32-libjpeg-turbo \
-mingw32-libpng \
-mingw32-libtasn1 \
-mingw32-nettle \
-mingw32-nsis \
-mingw32-pixman \
-mingw32-pkg-config \
-mingw32-SDL2 \
-mingw64-bzip2 \
-mingw64-curl \
-mingw64-glib2 \
-mingw64-gmp \
-mingw64-gnutls \
-mingw64-gtk3 \
-mingw64-libjpeg-turbo \
-mingw64-libpng \
-mingw64-libtasn1 \
-mingw64-nettle \
-mingw64-pixman \
-mingw64-pkg-config \
-mingw64-SDL2 \
-ncurses-devel \
-nettle-devel \
-numactl-devel \
-perl \
-perl-Test-Harness \
-pixman-devel \
-pulseaudio-libs-devel \
-python3 \
-python3-sphinx \
-PyYAML \
-rdma-core-devel \
-SDL2-devel \
-snappy-devel \
-sparse \
-spice-server-devel \
-systemd-devel \
-systemtap-sdt-devel \
-tar \
-usbredir-devel \
-virglrenderer-devel \
-vte291-devel \
-wget \
-which \
-xen-devel \
-xfsprogs-devel \
-zlib-devel
-ENV QEMU_CONFIGURE_OPTS --python=/usr/bin/python3
+FROM registry.fedoraproject.org/fedora:37
 
-RUN dnf install -y $PACKAGES
-RUN rpm -q $PACKAGES | sort > /packages.txt
-ENV PATH $PATH:/usr/libexec/python3-sphinx/
+RUN dnf install -y nosync && \
+echo -e '#!/bin/sh\n\
+if test -d /usr/lib64\n\
+then\n\
+export LD_PRELOAD=/usr/lib64/nosync/nosync.so\n\
+else\n\
+export LD_PRELOAD=/usr/lib/nosync/nosync.so\n\
+fi\n\
+exec "$@"' > /usr/bin/nosync && \
+chmod +x /usr/bin/nosync && \
+nosync dnf update -y && \
+nosync dnf install -y \
+   SDL2-devel \
+   SDL2_image-devel \
+   alsa-lib-devel \
+   bash \
+   bc \
+   bison \
+   brlapi-devel \
+   bzip2 \
+   bzip2-devel \
+   ca-certificates \
+   capstone-devel \
+   ccache \
+   clang \
+   ctags \
+   cyrus-sasl-devel \
+   daxctl-devel \
+   dbus-daemon \
+   device-mapper-multipath-devel \
+   diffutils \
+   findutils \
+   flex \
+   fuse3-devel \
+   gcc \
+   gcc-c++ \
+   gcovr \
+   genisoimage \
+   gettext \
+   git \
+   glib2-devel \
+   glib2-static \
+   glibc-langpack-en \
+   glibc-static \
+   glusterfs-api-devel \
+   gnutls-devel \
+   gtk3-devel \
+   hostname \
+   jemalloc-devel \
+   json-c-devel \
+   libaio-devel \
+   libasan \
+   libattr-devel \
+   libbpf-devel \
+   libcacard-devel \
+   libcap-ng-devel \
+   libcmocka-devel \
+   libcurl-devel \
+   libdrm-devel \
+   libepoxy-devel \
+   libfdt-devel \
+   libffi-devel \
+   libgcrypt-devel \
+   libiscsi-devel \
+   libjpeg-devel \
+   libnfs-devel \
+   libpmem-devel \
+   libpng-devel \
+   librbd-devel \
+   

Re: [RFC PATCH v1 00/26] migration: File based migration with multifd and fixed-ram

2023-03-31 Thread Fabiano Rosas
Peter Xu  writes:

> On Fri, Mar 31, 2023 at 05:10:16PM +0100, Daniel P. Berrangé wrote:
>> On Fri, Mar 31, 2023 at 11:55:03AM -0400, Peter Xu wrote:
>> > On Fri, Mar 31, 2023 at 12:30:45PM -0300, Fabiano Rosas wrote:
>> > > Peter Xu  writes:
>> > > 
>> > > > On Fri, Mar 31, 2023 at 11:37:50AM -0300, Fabiano Rosas wrote:
>> > > >> >> Outgoing migration to file. NVMe disk. XFS filesystem.
>> > > >> >> 
>> > > >> >> - Single migration runs of stopped 32G guest with ~90% RAM usage. 
>> > > >> >> Guest
>> > > >> >>   running `stress-ng --vm 4 --vm-bytes 90% --vm-method all 
>> > > >> >> --verify -t
>> > > >> >>   10m -v`:
>> > > >> >> 
>> > > >> >> migration type  | MB/s | pages/s |  ms
>> > > >> >> +--+-+--
>> > > >> >> savevm io_uring |  434 |  102294 | 71473
>> > > >> >
>> > > >> > So I assume this is the non-live migration scenario.  Could you 
>> > > >> > explain
>> > > >> > what does io_uring mean here?
>> > > >> >
>> > > >> 
>> > > >> This table is all non-live migration. This particular line is a 
>> > > >> snapshot
>> > > >> (hmp_savevm->save_snapshot). I thought it could be relevant because it
>> > > >> is another way by which we write RAM into disk.
>> > > >
>> > > > I see, so if all non-live that explains, because I was curious what's 
>> > > > the
>> > > > relationship between this feature and the live snapshot that QEMU also
>> > > > supports.
>> > > >
>> > > > I also don't immediately see why savevm will be much slower, do you 
>> > > > have an
>> > > > answer?  Maybe it's somewhere but I just overlooked..
>> > > >
>> > > 
>> > > I don't have a concrete answer. I could take a stab and maybe blame the
>> > > extra memcpy for the buffer in QEMUFile? Or perhaps an unintended effect
>> > > of bandwidth limits?
>> > 
>> > IMHO it would be great if this can be investigated and reasons provided in
>> > the next cover letter.
>> > 
>> > > 
>> > > > IIUC this is "vm suspend" case, so there's an extra benefit knowledge 
>> > > > of
>> > > > "we can stop the VM".  It smells slightly weird to build this on top of
>> > > > "migrate" from that pov, rather than "savevm", though.  Any thoughts on
>> > > > this aspect (on why not building this on top of "savevm")?
>> > > >
>> > > 
>> > > I share the same perception. I have done initial experiments with
>> > > savevm, but I decided to carry on the work that was already started by
>> > > others because my understanding of the problem was yet incomplete.
>> > > 
>> > > One point that has been raised is that the fixed-ram format alone does
>> > > not bring that many performance improvements. So we'll need
>> > > multi-threading and direct-io on top of it. Re-using multifd
>> > > infrastructure seems like it could be a good idea.
>> > 
>> > The thing is IMHO concurrency is not as hard if VM stopped, and when we're
>> > 100% sure locally on where the page will go.
>> 
>> We shouldn't assume the VM is stopped though. When saving to the file
>> the VM may still be active. The fixed-ram format lets us re-write the
>> same memory location on disk multiple times in this case, thus avoiding
>> growth of the file size.
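(For illustration only — a minimal sketch of the fixed-offset idea, with hypothetical names, not the actual QEMU layout: each page of a RAM block maps to a fixed position in the migration file, so rewriting a re-dirtied page lands on top of the previous copy and the file never grows.)

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch, not QEMU code: a page's file offset depends only
 * on its block base and index, so repeated saves of the same page are
 * idempotent with respect to file size. */
static uint64_t fixed_ram_offset(uint64_t block_file_base,
                                 uint64_t page_index,
                                 uint64_t page_size)
{
    return block_file_base + page_index * page_size;
}
```

Writing with pwrite(2) at that offset then overwrites in place rather than appending.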
>
> Before discussing on reusing multifd below, now I have a major confusing on
> the use case of the feature..
>
> The question is whether we would like to stop the VM after fixed-ram
> migration completes.  I'm asking because:
>

We would.

>   1. If it will stop, then it looks like a "VM suspend" to me. If so, could
>  anyone help explain why we don't stop the VM first then migrate?
>  Because it avoids copying single pages multiple times, no fiddling
>  with dirty tracking at all - we just don't ever track anything.  In
>  short, we'll stop the VM anyway, then why not stop it slightly
>  earlier?
>

Looking at the previous discussions I don't see explicit mentions of a
requirement either way (stop before or stop after). I agree it makes
more sense to stop the guest first and then migrate without having to
deal with dirty pages.

I presume libvirt just migrates without altering the guest run state so
we implemented this to work in both scenarios. But even then, it seems
QEMU could store the current VM state, stop it, migrate and restore the
state on the destination.

I might be missing context here since I wasn't around when this work
started. Someone correct me if I'm wrong please.

>   2. If it will not stop, then it's "VM live snapshot" to me.  We have
>  that, aren't we?  That's more efficient because it'll wr-protect all
>  guest pages, any write triggers a CoW and we only copy the guest pages
>  once and for all.
>
> Either way to go, there's no need to copy any page more than once.  Did I
> miss anything perhaps very important?
>
> I would guess it's option (1) above, because it seems we don't snapshot the
> disk alongside.  But I am really not sure now..
>



Re: [PATCH] riscv: Add support for the Zfa extension

2023-03-31 Thread Christoph Müllner
On Mon, Mar 27, 2023 at 7:18 PM Richard Henderson
 wrote:
>
> On 3/27/23 01:00, Christoph Muellner wrote:
> > +uint64_t helper_fminm_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
> > +{
> > +float32 frs1 = check_nanbox_s(env, rs1);
> > +float32 frs2 = check_nanbox_s(env, rs2);
> > +
> > +if (float32_is_any_nan(frs1) || float32_is_any_nan(frs2)) {
> > +return float32_default_nan(&env->fp_status);
> > +}
> > +
> > +return nanbox_s(env, float32_minimum_number(frs1, frs2, 
> > &env->fp_status));
> > +}
>
> Better to set and clear fp_status->default_nan_mode around the operation.

I don't see how this can help:
* default_nan_mode defines if the default_nan is generated or if the
operand's NaN should be used
* RISC-V has default_nan_mode always set to true (operations should
return a canonical NaN and not propagate NaN values)
* That also does not help to eliminate the is_any_nan() tests, because
float*_minimum_number() and float*_minnum() return the non-NaN number
if (only) one operand is NaN

Am I missing something?


>
> > +uint64_t helper_fround_s(CPURISCVState *env, uint64_t frs1)
> > +{
> > +if (float32_is_zero(frs1) ||
> > +float32_is_infinity(frs1)) {
> > +return frs1;
> > +}
> > +
> > +if (float32_is_any_nan(frs1)) {
> > +riscv_cpu_set_fflags(env, FPEXC_NV);
> > +return frs1;
> > +}
> > +
> > +int32_t tmp = float32_to_int32(frs1, &env->fp_status);
> > +return nanbox_s(env, int32_to_float32(tmp, &env->fp_status));
> > +}
>
> Very much incorrect, since int32_t does not have the range for the 
> intermediate result.
> In any case, the function you want is float32_round_to_int, which eliminates 
> the
> zero/inf/nan special cases.  It will raise inexact, so perfect for froundnx, 
> but you'll
> need to save/restore float_flag_inexact around the function.

Understood the issue and changed to the proposed API.

>
> > +uint64_t helper_fli_s(CPURISCVState *env, uint32_t rs1)
> > +{
> > +const uint32_t fli_s_table[] = {
>
> static const.  You don't need to use float32_default_nan, use the correct 
> architected
> constant.  This entire operation should be done at translation time.

Done.

>
> > +target_ulong helper_fcvtmod_w_d(CPURISCVState *env, uint64_t frs1)
> > +{
> > +if (float64_is_any_nan(frs1) ||
> > +float64_is_infinity(frs1)) {
> > +return 0;
> > +}
> > +
> > +return float64_to_int32(frs1, &env->fp_status);
> > +}
>
> Incorrect, as float64_to_int32 will saturate the result, whereas you need the 
> modular result.
>
> There is code to do the conversion mod 2**64 in target/alpha/ (do_cvttq).  We 
> should move
> this to generic code if it is to be used by more than one target.

Understood the issue.
ARM has something similar in HELPER(fjcvtzs).

Given the different flag behaviour of the ARM and the Alpha
instructions, I created a RISC-V specific routine.
For RISC-V the flags have to be identical to fcvt.w.d with the same value.

>
> > +bool trans_fmvp_d_x(DisasContext *ctx, arg_fmvp_d_x *a)
> > +{
> > +REQUIRE_FPU;
> > +REQUIRE_ZFA(ctx);
> > +REQUIRE_EXT(ctx, RVD);
> > +REQUIRE_32BIT(ctx);
> > +
> > +TCGv src1 = get_gpr(ctx, a->rs1, EXT_ZERO);
> > +TCGv_i64 t1 = tcg_temp_new_i64();
> > +
> > +tcg_gen_extu_tl_i64(t1, src1);
> > +tcg_gen_deposit_i64(cpu_fpr[a->rd], cpu_fpr[a->rd], t1, 32, 32);
> > +mark_fs_dirty(ctx);
> > +return true;
> > +}
>
> This does not match the linked document, which says that this insn has two 
> inputs and sets
> the complete fpr.

Fixed.

Thanks!



Re: [PATCH] riscv: Add support for the Zfa extension

2023-03-31 Thread Christoph Müllner
On Mon, Mar 27, 2023 at 10:42 AM liweiwei  wrote:
>
>
> On 2023/3/27 16:00, Christoph Muellner wrote:
> > From: Christoph Müllner 
> >
> > This patch introduces the RISC-V Zfa extension, which introduces
> > additional floating-point extensions:
> > * fli (load-immediate) with pre-defined immediates
> > * fminm/fmaxm (like fmin/fmax but with different NaN behaviour)
> > * fround/froundmx (round to integer)
> > * fcvtmod.w.d (Modular Convert-to-Integer)
> > * fmv* to access high bits of float register bigger than XLEN
> > * Quiet comparison instructions (fleq/fltq)
> >
> > Zfa defines its instructions in combination with the following extensions:
> > * single-precision floating-point (F)
> > * double-precision floating-point (D)
> > * quad-precision floating-point (Q)
> > * half-precision floating-point (Zfh)
> >
> > Since QEMU does not support the RISC-V quad-precision floating-point
> > ISA extension (Q), this patch does not include the instructions that
> > depend on this extension. All other instructions are included in this
> > patch.
> >
> > The Zfa specification is not frozen at the moment (which is why this
> > patch is RFC) and can be found here:
> >https://github.com/riscv/riscv-isa-manual/blob/master/src/zfa.tex
> >
> > Signed-off-by: Christoph Müllner 
> > ---
> >   target/riscv/cpu.c|   8 +
> >   target/riscv/cpu.h|   1 +
> >   target/riscv/fpu_helper.c | 324 +
> >   target/riscv/helper.h |  22 ++
> >   target/riscv/insn32.decode|  67 
> >   target/riscv/insn_trans/trans_rvzfa.c.inc | 410 ++
> >   target/riscv/translate.c  |   1 +
> >   7 files changed, 833 insertions(+)
> >   create mode 100644 target/riscv/insn_trans/trans_rvzfa.c.inc
> >
> > diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> > index 1e97473af2..bac9ced4a2 100644
> > --- a/target/riscv/cpu.c
> > +++ b/target/riscv/cpu.c
> > @@ -83,6 +83,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
> >   ISA_EXT_DATA_ENTRY(zifencei, true, PRIV_VERSION_1_10_0, ext_ifencei),
> >   ISA_EXT_DATA_ENTRY(zihintpause, true, PRIV_VERSION_1_10_0, 
> > ext_zihintpause),
> >   ISA_EXT_DATA_ENTRY(zawrs, true, PRIV_VERSION_1_12_0, ext_zawrs),
> > +ISA_EXT_DATA_ENTRY(zfa, true, PRIV_VERSION_1_12_0, ext_zfa),
> >   ISA_EXT_DATA_ENTRY(zfh, true, PRIV_VERSION_1_11_0, ext_zfh),
> >   ISA_EXT_DATA_ENTRY(zfhmin, true, PRIV_VERSION_1_12_0, ext_zfhmin),
> >   ISA_EXT_DATA_ENTRY(zfinx, true, PRIV_VERSION_1_12_0, ext_zfinx),
> > @@ -404,6 +405,7 @@ static void rv64_thead_c906_cpu_init(Object *obj)
> >   cpu->cfg.ext_u = true;
> >   cpu->cfg.ext_s = true;
> >   cpu->cfg.ext_icsr = true;
> > +cpu->cfg.ext_zfa = true;
> >   cpu->cfg.ext_zfh = true;
> >   cpu->cfg.mmu = true;
> >   cpu->cfg.ext_xtheadba = true;
> > @@ -865,6 +867,11 @@ static void riscv_cpu_validate_set_extensions(RISCVCPU 
> > *cpu, Error **errp)
> >   return;
> >   }
> >
> > +if (cpu->cfg.ext_zfa && !cpu->cfg.ext_f) {
> > +error_setg(errp, "Zfa extension requires F extension");
> > +return;
> > +}
> > +
> >   if (cpu->cfg.ext_zfh) {
> >   cpu->cfg.ext_zfhmin = true;
> >   }
> > @@ -1381,6 +1388,7 @@ static Property riscv_cpu_extensions[] = {
> >   DEFINE_PROP_BOOL("Zicsr", RISCVCPU, cfg.ext_icsr, true),
> >   DEFINE_PROP_BOOL("Zihintpause", RISCVCPU, cfg.ext_zihintpause, true),
> >   DEFINE_PROP_BOOL("Zawrs", RISCVCPU, cfg.ext_zawrs, true),
> > +DEFINE_PROP_BOOL("Zfa", RISCVCPU, cfg.ext_zfa, false),
> >   DEFINE_PROP_BOOL("Zfh", RISCVCPU, cfg.ext_zfh, false),
> >   DEFINE_PROP_BOOL("Zfhmin", RISCVCPU, cfg.ext_zfhmin, false),
> >   DEFINE_PROP_BOOL("Zve32f", RISCVCPU, cfg.ext_zve32f, false),
> > diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> > index 638e47c75a..deae410fc2 100644
> > --- a/target/riscv/cpu.h
> > +++ b/target/riscv/cpu.h
> > @@ -462,6 +462,7 @@ struct RISCVCPUConfig {
> >   bool ext_svpbmt;
> >   bool ext_zdinx;
> >   bool ext_zawrs;
> > +bool ext_zfa;
> >   bool ext_zfh;
> >   bool ext_zfhmin;
> >   bool ext_zfinx;
> > diff --git a/target/riscv/fpu_helper.c b/target/riscv/fpu_helper.c
> > index 449d236df6..55c75bf063 100644
> > --- a/target/riscv/fpu_helper.c
> > +++ b/target/riscv/fpu_helper.c
> > @@ -252,6 +252,18 @@ uint64_t helper_fmin_s(CPURISCVState *env, uint64_t 
> > rs1, uint64_t rs2)
> >   float32_minimum_number(frs1, frs2, &env->fp_status));
> >   }
> >
> > +uint64_t helper_fminm_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
> > +{
> > +float32 frs1 = check_nanbox_s(env, rs1);
> > +float32 frs2 = check_nanbox_s(env, rs2);
> > +
> > +if (float32_is_any_nan(frs1) || float32_is_any_nan(frs2)) {
> > +return float32_default_nan(&env->fp_status);
> I think we should also add nanbox_s for it.

Done.

[RFC PATCH v2] riscv: Add support for the Zfa extension

2023-03-31 Thread Christoph Muellner
From: Christoph Müllner 

This patch introduces the RISC-V Zfa extension, which introduces
additional floating-point extensions:
* fli (load-immediate) with pre-defined immediates
* fminm/fmaxm (like fmin/fmax but with different NaN behaviour)
* fround/froundmx (round to integer)
* fcvtmod.w.d (Modular Convert-to-Integer)
* fmv* to access high bits of float register bigger than XLEN
* Quiet comparison instructions (fleq/fltq)

Zfa defines its instructions in combination with the following extensions:
* single-precision floating-point (F)
* double-precision floating-point (D)
* quad-precision floating-point (Q)
* half-precision floating-point (Zfh)

Since QEMU does not support the RISC-V quad-precision floating-point
ISA extension (Q), this patch does not include the instructions that
depend on this extension. All other instructions are included in this
patch.

The Zfa specification is not frozen at the moment (which is why this
patch is RFC) and can be found here:
  https://github.com/riscv/riscv-isa-manual/blob/master/src/zfa.tex

Signed-off-by: Christoph Müllner 
---
Changes in v2:
* Remove calls to mark_fs_dirty() in comparison trans functions
* Rewrite fround(nx) using float*_round_to_int()
* Move fli* to translation unit and fix NaN-boxing of NaN values
* Reimplement FCVTMOD.W.D
* Add use of second register in trans_fmvp_d_x()

 target/riscv/cpu.c|   8 +
 target/riscv/cpu.h|   1 +
 target/riscv/fpu_helper.c | 258 +++
 target/riscv/helper.h |  19 +
 target/riscv/insn32.decode|  67 +++
 target/riscv/insn_trans/trans_rvzfa.c.inc | 529 ++
 target/riscv/translate.c  |   1 +
 7 files changed, 883 insertions(+)
 create mode 100644 target/riscv/insn_trans/trans_rvzfa.c.inc

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 1e97473af2..bac9ced4a2 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -83,6 +83,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
 ISA_EXT_DATA_ENTRY(zifencei, true, PRIV_VERSION_1_10_0, ext_ifencei),
 ISA_EXT_DATA_ENTRY(zihintpause, true, PRIV_VERSION_1_10_0, 
ext_zihintpause),
 ISA_EXT_DATA_ENTRY(zawrs, true, PRIV_VERSION_1_12_0, ext_zawrs),
+ISA_EXT_DATA_ENTRY(zfa, true, PRIV_VERSION_1_12_0, ext_zfa),
 ISA_EXT_DATA_ENTRY(zfh, true, PRIV_VERSION_1_11_0, ext_zfh),
 ISA_EXT_DATA_ENTRY(zfhmin, true, PRIV_VERSION_1_12_0, ext_zfhmin),
 ISA_EXT_DATA_ENTRY(zfinx, true, PRIV_VERSION_1_12_0, ext_zfinx),
@@ -404,6 +405,7 @@ static void rv64_thead_c906_cpu_init(Object *obj)
 cpu->cfg.ext_u = true;
 cpu->cfg.ext_s = true;
 cpu->cfg.ext_icsr = true;
+cpu->cfg.ext_zfa = true;
 cpu->cfg.ext_zfh = true;
 cpu->cfg.mmu = true;
 cpu->cfg.ext_xtheadba = true;
@@ -865,6 +867,11 @@ static void riscv_cpu_validate_set_extensions(RISCVCPU 
*cpu, Error **errp)
 return;
 }
 
+if (cpu->cfg.ext_zfa && !cpu->cfg.ext_f) {
+error_setg(errp, "Zfa extension requires F extension");
+return;
+}
+
 if (cpu->cfg.ext_zfh) {
 cpu->cfg.ext_zfhmin = true;
 }
@@ -1381,6 +1388,7 @@ static Property riscv_cpu_extensions[] = {
 DEFINE_PROP_BOOL("Zicsr", RISCVCPU, cfg.ext_icsr, true),
 DEFINE_PROP_BOOL("Zihintpause", RISCVCPU, cfg.ext_zihintpause, true),
 DEFINE_PROP_BOOL("Zawrs", RISCVCPU, cfg.ext_zawrs, true),
+DEFINE_PROP_BOOL("Zfa", RISCVCPU, cfg.ext_zfa, false),
 DEFINE_PROP_BOOL("Zfh", RISCVCPU, cfg.ext_zfh, false),
 DEFINE_PROP_BOOL("Zfhmin", RISCVCPU, cfg.ext_zfhmin, false),
 DEFINE_PROP_BOOL("Zve32f", RISCVCPU, cfg.ext_zve32f, false),
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 638e47c75a..deae410fc2 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -462,6 +462,7 @@ struct RISCVCPUConfig {
 bool ext_svpbmt;
 bool ext_zdinx;
 bool ext_zawrs;
+bool ext_zfa;
 bool ext_zfh;
 bool ext_zfhmin;
 bool ext_zfinx;
diff --git a/target/riscv/fpu_helper.c b/target/riscv/fpu_helper.c
index 449d236df6..c0ebaa040f 100644
--- a/target/riscv/fpu_helper.c
+++ b/target/riscv/fpu_helper.c
@@ -252,6 +252,21 @@ uint64_t helper_fmin_s(CPURISCVState *env, uint64_t rs1, 
uint64_t rs2)
 float32_minimum_number(frs1, frs2, &env->fp_status));
 }
 
+uint64_t helper_fminm_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+float32 frs1 = check_nanbox_s(env, rs1);
+float32 frs2 = check_nanbox_s(env, rs2);
+float32 ret;
+
+if (float32_is_any_nan(frs1) || float32_is_any_nan(frs2)) {
+ret = float32_default_nan(&env->fp_status);
+} else {
+ret = float32_minimum_number(frs1, frs2, &env->fp_status);
+}
+
+return nanbox_s(env, ret);
+}
+
 uint64_t helper_fmax_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
 {
 float32 frs1 = check_nanbox_s(env, rs1);
@@ -261,6 +276,21 @@ uint64_t helper_fmax_s(CPURISCVState *env, uint64_t rs1, 
uint64_t rs

Re: [PATCH v5 3/3] qtest: Add a test case for TPM TIS I2C connected to Aspeed I2C controller

2023-03-31 Thread Ninad Palsule



On 3/31/23 12:30 PM, Stefan Berger wrote:

Add a test case for the TPM TIS I2C device exercising most of its
functionality, including localities.

Signed-off-by: Stefan Berger 
Tested-by: Cédric Le Goater 
---


Tested-by: Ninad Palsule





Re: Cxl devel!

2023-03-31 Thread Maverickk 78
Hi Jonathan,

Thanks for the response, and for the effort and time you spent listing the
TODOs in the CXL space.

I just started understanding CXL 2.0; I am part of a startup developing a
CXL 2.0 switch to build a composable architecture. It's been 6 weeks.

As part of it I have built QEMU and configured with CXL devices as
documented in
https://stevescargall.com/blog/2022/01/20/how-to-emulate-cxl-devices-using-kvm-and-qemu/

And use your PoC code to understand the FMAPI & MCTP message flow.

Going forward I will ramp up on the existing support in QEMU, especially
regarding the points you listed, and get used to the
development/debug/test workflow; I may need 2-3 weeks to process all the
information you provided.

Any cheat sheets from your side would help me catch up sooner.

Looking forward to working with you.

Regards
Raghu



On Tue, 28 Mar 2023 at 18:29, Jonathan Cameron
 wrote:
>
> On Fri, 24 Mar 2023 04:32:52 +0530
> Maverickk 78  wrote:
>
> > Hello Jonathan
> >
> > Raghu here, I'm going over your cxl patches for past few days, it's very
> > impressive.
> >
> > I want to get involved and contribute in your endeavor, may be bits &
> > pieces to start.
> >
> > If you're specific trivial task(cvl/pcie/fm) about cxl, please let me know.
> >
> > Regards
> > Raghu
> >
>
> Hi Raghu,
>
> Great that you are interested in getting involved.
>
> As to suggestions for what to do, it depends on what interests you.
> I'll list some broad categories and hopefully we can focus in on stuff.
>
> Following is brainstorming on the spot, so I've probably forgotten lots
> of things.   There is an out of date todo at:
> https://gitlab.com/jic23/qemu/-/wikis/TODO%20list
>
> Smallish tasks.
> 1) Increase fidelity of emulation.  In many places we take short cuts in
>the interests of supporting 'enough' to be able to test kernel code 
> against..
>A classic example of this is we don't perform any of the checks we should 
> be
>on HDM decoders.  Tightening those restrictions up would be great. 
> Typically that
>involves tweaking the kernel code to try and do 'wrong' things.
>There are some other examples of this on gitlab.com/jic23/qemu around 
> locking of
>registers. This is rarely as high priority as 'new features' but we will 
> want to
>tidy up all these loose corners eventually.
> 2) Missing features.  An example of this is the security related stuff that 
> went into
>the kernel recently.  Whilst that is fairly easy to check using the cxl 
> mocking
>driver in the kernel, I'd also like to see a QEMU implementation.
>Some of the big features don't interact as they should.  For instance we 
> don't report
>poison list overflow via the event log yet.  It would be great to get this 
> all working
>rather than requiring injection of poison and the event as currently 
> needed (not all
>upstream yet).
> 3) Cleanup some of the existing emulation that we haven't upstreamed yet.
>- CPMU. Main challenge with this is finding balance between insane 
> commandlines
>  and flexibility.  Right now the code on gitlab.com/jic23/qemu 
> (cxl-)
>  provides a fairly random set of counters that were handy for testing 
> corners
>  of the driver that's at v3 on the kernel mailing lists.
>- Review and testing of the stuff that is on my tree (all been on list I 
> think) but
>  not yet at the top. Fixing up problems with that in advance will save us 
> time
>  when proposing them for upstream.
>- SPDM / CMA.  Right now this relies on a connection to SPDM-emu.  I'd 
> like to explore
>  if we can use libspdm as a library instead.  Last time I checked this 
> looked non
>  trivial but the dmtf tools team are keen to help.
>
>
> Bigger stuff - note that people are already looking at some of these but they
> may be interested in some help.
> 1) An example type 2 device.  We'd probably have to invent something along the
>lines of a simple copy offload engine.  The intent being to prove out that
>the kernel code works.  Dan has some stuff on the git.kernel.org tree to 
> support
>type 2 device.
> 2) Tests.  So far we test the bios table generation and that we can start 
> qemu with
>different topologies. I'd love to see a test that actually brings up a 
> region and
>tests some reading and writing + ideally looks at result in memory devices 
> to check
>everything worked.
> 3) Dynamic Capacity Devices - some stuff on going related to this, but there 
> is a lot
>to do.  Main focus today is on MHDs.   Perhaps look at the very early code 
> posted
>for switch CCIs.  We have a lot of work to do in kernel for this stuff as 
> well.
> 4) MCTP CCI.  I posted a PoC for this a long time back.  It works but we'd 
> need to figure
>out how to wire it up sensibly.
>
> Jonathan
>



Re: [PATCH v6 2/3] qga: Add `merged` variant to GuestExecCaptureOutputMode

2023-03-31 Thread Daniel Xu
Hi Daniel,

On Thu, Mar 23, 2023, at 3:26 AM, Daniel P. Berrangé wrote:
> On Wed, Mar 22, 2023 at 06:19:27PM -0600, Daniel Xu wrote:
>> Currently, any captured output (via `capture-output`) is segregated into
>> separate GuestExecStatus fields (`out-data` and `err-data`). This means
>> that downstream consumers have no way to reassemble the captured data
>> back into the original stream.
>> 
>> This is relevant for chatty and semi-interactive (ie. read only) CLI
>> tools.  Such tools may deliberately interleave stdout and stderr for
>> visual effect. If segregated, the output becomes harder to visually
>> understand.
>> 
>> This commit adds a new enum variant to the GuestExecCaptureOutputMode
>> qapi to merge the output streams such that consumers can have a pristine
>> view of the original command output.
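(As an aside, a POSIX sketch — not the qga implementation — of why merging has to happen at the fd level: pointing both of the child's streams at one pipe lets the kernel preserve the original interleaving, which separate out-data/err-data buffers cannot reconstruct.)

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative only: run `cmd` under /bin/sh with stdout and stderr
 * duplicated onto the same pipe, capturing the merged output in order. */
static int run_merged(const char *cmd, char *buf, size_t len)
{
    int fds[2];
    if (pipe(fds) < 0) {
        return -1;
    }
    pid_t pid = fork();
    if (pid == 0) {
        dup2(fds[1], STDOUT_FILENO);   /* one sink for both streams */
        dup2(fds[1], STDERR_FILENO);
        close(fds[0]);
        close(fds[1]);
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);
    }
    close(fds[1]);
    ssize_t n;
    size_t total = 0;
    while ((n = read(fds[0], buf + total, len - 1 - total)) > 0) {
        total += (size_t)n;
    }
    buf[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return 0;
}
```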
>> 
>> Signed-off-by: Daniel Xu 
>> ---
>>  qga/commands.c   | 25 +++--
>>  qga/qapi-schema.json |  5 -
>>  2 files changed, 27 insertions(+), 3 deletions(-)
>
> Reviewed-by: Daniel P. Berrangé 

Is there anyone in particular I should CC to get this series merged?

Thanks,
Daniel



Re: [PATCH v2 4/6] target/ppc: Alignment faults do not set DSISR in ISA v3.0 onward

2023-03-31 Thread Fabiano Rosas
Nicholas Piggin  writes:

> This optional behavior was removed from the ISA in v3.0, see
> Summary of Changes preface:
>
>   Data Storage Interrupt Status Register for Alignment Interrupt:
>   Simplifies the Alignment interrupt by removing the Data Storage
>   Interrupt Status Register (DSISR) from the set of registers modified
>   by the Alignment interrupt.
>
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Fabiano Rosas 



Re: [PATCH v2 5/6] target/ppc: Add SRR1 prefix indication to interrupt handlers

2023-03-31 Thread Fabiano Rosas
Nicholas Piggin  writes:

> ISA v3.1 introduced prefix instructions. Among the changes, various
> synchronous interrupts report whether they were caused by a prefix
> instruction in (H)SRR1.
>
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Fabiano Rosas 



Re: [PATCH v2 3/6] target/ppc: Fix instruction loading endianness in alignment interrupt

2023-03-31 Thread Fabiano Rosas
Nicholas Piggin  writes:

> powerpc ifetch endianness depends on MSR[LE] so it has to byteswap
> after cpu_ldl_code(). This corrects DSISR bits in alignment
> interrupts when running in little endian mode.
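(A host-side illustration of the described fix — hypothetical helper, not the patch itself:)

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch: cpu_ldl_code() yields the word in the guest's
 * configured order, so with MSR[LE] set the instruction image must be
 * byte-swapped before its fields are decoded for DSISR. */
static uint32_t insn_from_ifetch(uint32_t raw, int msr_le)
{
    return msr_le ? __builtin_bswap32(raw) : raw;
}
```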
>
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Fabiano Rosas 



Re: [PATCH] riscv: Add support for the Zfa extension

2023-03-31 Thread Richard Henderson

On 3/31/23 11:22, Christoph Müllner wrote:

On Mon, Mar 27, 2023 at 7:18 PM Richard Henderson
 wrote:


On 3/27/23 01:00, Christoph Muellner wrote:

+uint64_t helper_fminm_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+float32 frs1 = check_nanbox_s(env, rs1);
+float32 frs2 = check_nanbox_s(env, rs2);
+
+if (float32_is_any_nan(frs1) || float32_is_any_nan(frs2)) {
+return float32_default_nan(&env->fp_status);
+}
+
+return nanbox_s(env, float32_minimum_number(frs1, frs2, &env->fp_status));
+}


Better to set and clear fp_status->default_nan_mode around the operation.


I don't see how this can help:
* default_nan_mode defines if the default_nan is generated or if the
operand's NaN should be used
* RISC-V has default_nan_mode always set to true (operations should
return a canonical NaN and not propagate NaN values)
* That also does not help to eliminate the is_any_nan() tests, because
float*_minimum_number() and float*_minnum() return the non-NaN number
if (only) one operand is NaN

Am I missing something?


Oh goodness, I did mis-read this.

But if you need a nan when an input is a nan, then float32_min instead of 
float32_minimum_number (which goes out of its way to select the non-nan result) is the 
correct function to use.



r~



Re: [RFC PATCH v1 00/26] migration: File based migration with multifd and fixed-ram

2023-03-31 Thread Peter Xu
On Fri, Mar 31, 2023 at 03:18:37PM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > On Fri, Mar 31, 2023 at 05:10:16PM +0100, Daniel P. Berrangé wrote:
> >> On Fri, Mar 31, 2023 at 11:55:03AM -0400, Peter Xu wrote:
> >> > On Fri, Mar 31, 2023 at 12:30:45PM -0300, Fabiano Rosas wrote:
> >> > > Peter Xu  writes:
> >> > > 
> >> > > > On Fri, Mar 31, 2023 at 11:37:50AM -0300, Fabiano Rosas wrote:
> >> > > >> >> Outgoing migration to file. NVMe disk. XFS filesystem.
> >> > > >> >> 
> >> > > >> >> - Single migration runs of stopped 32G guest with ~90% RAM 
> >> > > >> >> usage. Guest
> >> > > >> >>   running `stress-ng --vm 4 --vm-bytes 90% --vm-method all 
> >> > > >> >> --verify -t
> >> > > >> >>   10m -v`:
> >> > > >> >> 
> >> > > >> >> migration type  | MB/s | pages/s |  ms
> >> > > >> >> +--+-+--
> >> > > >> >> savevm io_uring |  434 |  102294 | 71473
> >> > > >> >
> >> > > >> > So I assume this is the non-live migration scenario.  Could you 
> >> > > >> > explain
> >> > > >> > what does io_uring mean here?
> >> > > >> >
> >> > > >> 
> >> > > >> This table is all non-live migration. This particular line is a 
> >> > > >> snapshot
> >> > > >> (hmp_savevm->save_snapshot). I thought it could be relevant because 
> >> > > >> it
> >> > > >> is another way by which we write RAM into disk.
> >> > > >
> >> > > > I see, so if all non-live that explains, because I was curious 
> >> > > > what's the
> >> > > > relationship between this feature and the live snapshot that QEMU 
> >> > > > also
> >> > > > supports.
> >> > > >
> >> > > > I also don't immediately see why savevm will be much slower, do you 
> >> > > > have an
> >> > > > answer?  Maybe it's somewhere but I just overlooked..
> >> > > >
> >> > > 
> >> > > I don't have a concrete answer. I could take a stab and maybe blame the
> >> > > extra memcpy for the buffer in QEMUFile? Or perhaps an unintended 
> >> > > effect
> >> > > of bandwidth limits?
> >> > 
> >> > IMHO it would be great if this can be investigated and reasons provided 
> >> > in
> >> > the next cover letter.
> >> > 
> >> > > 
> >> > > > IIUC this is "vm suspend" case, so there's an extra benefit 
> >> > > > knowledge of
> >> > > > "we can stop the VM".  It smells slightly weird to build this on top 
> >> > > > of
> >> > > > "migrate" from that pov, rather than "savevm", though.  Any thoughts 
> >> > > > on
> >> > > > this aspect (on why not building this on top of "savevm")?
> >> > > >
> >> > > 
> >> > > I share the same perception. I have done initial experiments with
> >> > > savevm, but I decided to carry on the work that was already started by
> >> > > others because my understanding of the problem was yet incomplete.
> >> > > 
> >> > > One point that has been raised is that the fixed-ram format alone does
> >> > > not bring that many performance improvements. So we'll need
> >> > > multi-threading and direct-io on top of it. Re-using multifd
> >> > > infrastructure seems like it could be a good idea.
> >> > 
> >> > The thing is IMHO concurrency is not as hard if VM stopped, and when 
> >> > we're
> >> > 100% sure locally on where the page will go.
> >> 
> >> We shouldn't assume the VM is stopped though. When saving to the file
> >> the VM may still be active. The fixed-ram format lets us re-write the
> >> same memory location on disk multiple times in this case, thus avoiding
> >> growth of the file size.
> >
> > Before discussing on reusing multifd below, now I have a major confusing on
> > the use case of the feature..
> >
> > The question is whether we would like to stop the VM after fixed-ram
> > migration completes.  I'm asking because:
> >
> 
> We would.
> 
> >   1. If it will stop, then it looks like a "VM suspend" to me. If so, could
> >  anyone help explain why we don't stop the VM first then migrate?
> >  Because it avoids copying single pages multiple times, no fiddling
> >  with dirty tracking at all - we just don't ever track anything.  In
> >  short, we'll stop the VM anyway, then why not stop it slightly
> >  earlier?
> >
> 
> Looking at the previous discussions I don't see explicit mentions of a
> requirement either way (stop before or stop after). I agree it makes
> more sense to stop the guest first and then migrate without having to
> deal with dirty pages.
> 
> I presume libvirt just migrates without altering the guest run state so
> we implemented this to work in both scenarios. But even then, it seems
> QEMU could store the current VM state, stop it, migrate and restore the
> state on the destination.

Yes, I can understand having a unified interface for libvirt would be great
in this case.  So I am personally not against reusing qmp command "migrate"
if that would help in any case from libvirt pov.

However this is an important question to be answered very sure before
building more things on top.  IOW, even if reusing QMP migrate, we could
consider a totally different impl (e.g. don't reuse migration thread model)

Re: [PATCH 0/7] bsd-user: remove bitrotted NetBSD and OpenBSD bsd-user support

2023-03-31 Thread Richard Henderson

On 3/31/23 07:18, Warner Losh wrote:

The NetBSD and OpenBSD support in bsd-user hasn't built since before the meson
conversion. It's also out of sync with many of the recent changes in the
bsd-user fork and has just been removed there. Remove it from master for the
same reasons: it generates a number of false positives with grep and has
increasingly gotten in the way. The bsd-user fork code is much more advanced,
and even there it doesn't compile and is out of date. Remove it from both
branches. If others wish to bring it up to speed, I'm happy to help them.

Warner Losh (7):
   bsd-user: Remove obsolete prototypes
   bsd-user: Remove netbsd system call inclusion and defines
   bsd-user: Remove netbsd system call tracing
   bsd-user: Remove openbsd system call inclusion and defines
   bsd-user: Remove openbsd system call tracing
   bsd-user: Remove netbsd directory
   bsd-user: Remove openbsd directory


Reviewed-by: Richard Henderson 

r~


