date:20211014

Re: [PATCH RFC 00/15] virtio-mem: Expose device memory via separate memslots

2021-10-14 Thread David Hildenbrand

On 13.10.21 21:03, Dr. David Alan Gilbert wrote:
> * David Hildenbrand (da...@redhat.com) wrote:
>> Based-on: 20211011175346.15499-1-da...@redhat.com
>>
>> A virtio-mem device is represented by a single large RAM memory region
>> backed by a single large mmap.
>>
>> Right now, we map that complete memory region into guest physical addres
>> space, resulting in a very large memory mapping, KVM memory slot, ...
>> although only a small amount of memory might actually be exposed to the VM.
>>
>> For example, when starting a VM with a 1 TiB virtio-mem device that only
>> exposes little device memory (e.g., 1 GiB) towards the VM initialliy,
>> in order to hotplug more memory later, we waste a lot of memory on metadata
>> for KVM memory slots (> 2 GiB!) and accompanied bitmaps. Although some
>> optimizations in KVM are being worked on to reduce this metadata overhead
>> on x86-64 in some cases, it remains a problem with nested VMs and there are
>> other reasons why we would want to reduce the total memory slot to a
>> reasonable minimum.
>>
>> We want to:
>> a) Reduce the metadata overhead, including bitmap sizes inside KVM but also
>>inside QEMU KVM code where possible.
>> b) Not always expose all device-memory to the VM, to reduce the attack
>>surface of malicious VMs without using userfaultfd.
>>
>> So instead, expose the RAM memory region not by a single large mapping
>> (consuming one memslot) but instead by multiple mappings, each consuming
>> one memslot. To do that, we divide the RAM memory region via aliases into
>> separate parts and only map the aliases into a device container we actually
>> need. We have to make sure that QEMU won't silently merge the memory
>> sections corresponding to the aliases (and thereby also memslots),
>> otherwise we lose atomic updates with KVM and vhost-user, which we deeply
>> care about when adding/removing memory. Further, to get memslot accounting
>> right, such merging is better avoided.
>>
>> Within the memslots, virtio-mem can (un)plug memory in smaller granularity
>> dynamically. So memslots are a pure optimization to tackle a) and b) above.
>>
>> Memslots are right now mapped once they fall into the usable device region
>> (which grows/shrinks on demand right now either when requesting to
>>  hotplug more memory or during/after reboots). In the future, with
>> VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE, we'll be able to (un)map aliases even
>> more dynamically when (un)plugging device blocks.
>>
>>
>> Adding a 500GiB virtio-mem device and not hotplugging any memory results in:
>> 00014000-01047fff (prio 0, i/o): device-memory
>>   00014000-007e3fff (prio 0, i/o): virtio-mem-memslots
>>
>> Requesting the VM to consume 2 GiB results in (note: the usable region size
>> is bigger than 2 GiB, so 3 * 1 GiB memslots are required):
>> 00014000-01047fff (prio 0, i/o): device-memory
>>   00014000-007e3fff (prio 0, i/o): virtio-mem-memslots
>> 00014000-00017fff (prio 0, ram): alias 
>> virtio-mem-memslot-0 @mem0 -3fff
>> 00018000-0001bfff (prio 0, ram): alias 
>> virtio-mem-memslot-1 @mem0 4000-7fff
>> 0001c000-0001 (prio 0, ram): alias 
>> virtio-mem-memslot-2 @mem0 8000-bfff
> 
> I've got a vague memory that there were some devices that didn't like
> doing split IO across a memory region (or something) - some virtio
> devices?  Do you know if that's still true and if that causes a problem?

Interesting point! I am not aware of any such issues, and I'd be
surprised if we'd still have such buggy devices, because the layout
virtio-mem now creates is just very similar to the layout we'll
automatically create with ordinary DIMMs.

If we hotplug DIMMs they will end up consecutive in guest physical
address space, however, having separate memory regions and requiring
separate memory slots. So, very similar to a virtio-mem device now.

Maybe the catch is that it's hard to cross memory regions that are e.g.,
>- 128 MiB aligned, because ordinary allocations (e.g., via the buddy in
Linux which supports <= 4 MiB pages) in won't cross these blocks.

-- 
Thanks,

David / dhildenb

Re: [PATCH v3 3/3] vdpa: Check for iova range at mappings changes

2021-10-14 Thread Jason Wang

On Thu, Oct 14, 2021 at 1:57 PM Eugenio Perez Martin
 wrote:
>
> On Thu, Oct 14, 2021 at 5:30 AM Jason Wang  wrote:
> >
> > On Tue, Oct 12, 2021 at 10:07 PM Eugenio Pérez  wrote:
> > >
> > > Check vdpa device range before updating memory regions so we don't add
> > > any outside of it, and report the invalid change if any.
> > >
> > > Signed-off-by: Eugenio Pérez 
> > > ---
> > >  include/hw/virtio/vhost-vdpa.h |  2 ++
> > >  hw/virtio/vhost-vdpa.c | 62 +-
> > >  hw/virtio/trace-events |  1 +
> > >  3 files changed, 49 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/include/hw/virtio/vhost-vdpa.h 
> > > b/include/hw/virtio/vhost-vdpa.h
> > > index a8963da2d9..c288cf7ecb 100644
> > > --- a/include/hw/virtio/vhost-vdpa.h
> > > +++ b/include/hw/virtio/vhost-vdpa.h
> > > @@ -13,6 +13,7 @@
> > >  #define HW_VIRTIO_VHOST_VDPA_H
> > >
> > >  #include "hw/virtio/virtio.h"
> > > +#include "standard-headers/linux/vhost_types.h"
> > >
> > >  typedef struct VhostVDPAHostNotifier {
> > >  MemoryRegion mr;
> > > @@ -24,6 +25,7 @@ typedef struct vhost_vdpa {
> > >  uint32_t msg_type;
> > >  bool iotlb_batch_begin_sent;
> > >  MemoryListener listener;
> > > +struct vhost_vdpa_iova_range iova_range;
> > >  struct vhost_dev *dev;
> > >  VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
> > >  } VhostVDPA;
> > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > index be7c63b4ba..dbf773d032 100644
> > > --- a/hw/virtio/vhost-vdpa.c
> > > +++ b/hw/virtio/vhost-vdpa.c
> > > @@ -37,20 +37,34 @@ static Int128 vhost_vdpa_section_end(const 
> > > MemoryRegionSection *section)
> > >  return llend;
> > >  }
> > >
> > > -static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection 
> > > *section)
> > > -{
> > > -return (!memory_region_is_ram(section->mr) &&
> > > -!memory_region_is_iommu(section->mr)) ||
> > > -memory_region_is_protected(section->mr) ||
> > > -   /* vhost-vDPA doesn't allow MMIO to be mapped  */
> > > -memory_region_is_ram_device(section->mr) ||
> > > -   /*
> > > -* Sizing an enabled 64-bit BAR can cause spurious mappings to
> > > -* addresses in the upper part of the 64-bit address space.  
> > > These
> > > -* are never accessed by the CPU and beyond the address width 
> > > of
> > > -* some IOMMU hardware.  TODO: VDPA should tell us the IOMMU 
> > > width.
> > > -*/
> > > -   section->offset_within_address_space & (1ULL << 63);
>
> [1]
>
> > > +static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection 
> > > *section,
> > > +uint64_t iova_min,
> > > +uint64_t iova_max)
> > > +{
> > > +Int128 llend;
> > > +
> > > +if ((!memory_region_is_ram(section->mr) &&
> > > + !memory_region_is_iommu(section->mr)) ||
> > > +memory_region_is_protected(section->mr) ||
> > > +/* vhost-vDPA doesn't allow MMIO to be mapped  */
> > > +memory_region_is_ram_device(section->mr)) {
> > > +return true;
> > > +}
> > > +
> > > +if (section->offset_within_address_space < iova_min) {
> > > +error_report("RAM section out of device range (min=%lu, 
> > > addr=%lu)",
> > > + iova_min, section->offset_within_address_space);
> > > +return true;
> > > +}
> > > +
> > > +llend = vhost_vdpa_section_end(section);
> > > +if (int128_gt(llend, int128_make64(iova_max))) {
> > > +error_report("RAM section out of device range (max=%lu, end 
> > > addr=%lu)",
> > > + iova_max, int128_get64(llend));
> > > +return true;
> > > +}
> > > +
> > > +return false;
> > >  }
> > >
> > >  static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr 
> > > size,
> > > @@ -162,7 +176,8 @@ static void 
> > > vhost_vdpa_listener_region_add(MemoryListener *listener,
> > >  void *vaddr;
> > >  int ret;
> > >
> > > -if (vhost_vdpa_listener_skipped_section(section)) {
> > > +if (vhost_vdpa_listener_skipped_section(section, v->iova_range.first,
> > > +v->iova_range.last)) {
> > >  return;
> > >  }
> > >
> > > @@ -220,7 +235,8 @@ static void 
> > > vhost_vdpa_listener_region_del(MemoryListener *listener,
> > >  Int128 llend, llsize;
> > >  int ret;
> > >
> > > -if (vhost_vdpa_listener_skipped_section(section)) {
> > > +if (vhost_vdpa_listener_skipped_section(section, v->iova_range.first,
> > > +v->iova_range.last)) {
> > >  return;
> > >  }
> > >
> > > @@ -288,6 +304,19 @@ static void vhost_vdpa_add_status(struct vhost_dev 
> > > *dev, uint8_t status)
> > >  vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &s);
> > >  }
> > >
> > > +static void vho

Re: [PATCH 00/11] 9p: Add support for Darwin

2021-10-14 Thread Greg Kurz

Hi Will,

It is strongly recommended that you Cc maintainers to increase the odds
they notice your patches in the flood of qemu-devel. FYI I only noticed
them because git-send-email Cc'd me thanks to the Reviewed-by: tags and
my address didn't change in the meantime. I'm thus Cc'ing Christian
who is the primary maintainer now (i.e. the person that can merge
your patches and send a PR for upstream inclusion).

FWIW git-publish [1] can Cc the relevant people for free.

[1] https://github.com/stefanha/git-publish

On Wed, 13 Oct 2021 19:03:54 -0400
Will Cohen  wrote:

> This is a continuation of a patch series adding 9p server support for Darwin,
> originally submitted by Keno Fischer in mid-2018
> (https://lists.nongnu.org/archive/html/qemu-devel/2018-06/msg04643.html). In
> some sense, this could be considered [PATCH v4] of that process, but I assume
> that the multi-year gap merits a fresh start..
> 

This makes sense. For consistency with that assumption, it would also
make sense to clear all preexisting Reviewed-by: tags.

> It has since been updated and rebased for NixOS by Michael Roitzsch
> (https://github.com/NixOS/nixpkgs/pull/122420) with a goal of resubmitting
> upstream. I am submitting his patch set as suggested, as developed by Michael,
> with his Signed-off-by headers included in full.
> 

QEMU cares about tracking of who did what and follows a policy inspired
from the linux kernel [2] [3].

Michael's Signed-off-by: should then appear on all patches, with a
mention of the extra changes that he made, e.g.

Signed-off-by: Keno Fischer 
[Michael Roitzsch: - rebased for NixOS
   - some other change]
Signed-off-by: Michael Roitzsch 

If no changes were made, you still need to add a Signed-off-by: tag.

[2] https://wiki.qemu.org/Contribute/SubmitAPatch
[3] 
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297

> Additionally, I have run the patches through checkpatch.pl and adjusted coding
> style accordingly (with the exception of ignoring warnings about avoid

Good ! If you have an account on gitlab, you can also push a branch there.
It will be submitted to gitlab CI and maybe give you the opportunity to
polish the patches some more before submission.

> architecture specific defines in hw/9pfs/9p-util-darwin.c, where they seem
> unavoidable), and have signed off on those modified commits.
> 

As explained above, your Signed-off-by: is also needed in all patches,
even if you didn't change them.

Cheers,

--
Greg

> 
>

Re: [PATCH v1] libvhost-user: fix VHOST_USER_REM_MEM_REG not closing the fd

2021-10-14 Thread David Hildenbrand

On 14.10.21 07:29, Raphael Norwitz wrote:
> On Wed, Oct 13, 2021 at 11:51:24AM +0200, David Hildenbrand wrote:
>> On 13.10.21 11:48, Stefan Hajnoczi wrote:
>>> On Tue, Oct 12, 2021 at 08:38:32PM +0200, David Hildenbrand wrote:
 We end up not closing the file descriptor, resulting in leaking one
 file descriptor for each VHOST_USER_REM_MEM_REG message.

 Fixes: 875b9fd97b34 ("Support individual region unmap in libvhost-user")
 Cc: Michael S. Tsirkin 
 Cc: Raphael Norwitz 
 Cc: "Marc-André Lureau" 
 Cc: Stefan Hajnoczi 
 Cc: Paolo Bonzini 
 Cc: Coiby Xu 
 Signed-off-by: David Hildenbrand 
 ---
   subprojects/libvhost-user/libvhost-user.c | 2 ++
   1 file changed, 2 insertions(+)

 diff --git a/subprojects/libvhost-user/libvhost-user.c 
 b/subprojects/libvhost-user/libvhost-user.c
 index bf09693255..bb5c3b3280 100644
 --- a/subprojects/libvhost-user/libvhost-user.c
 +++ b/subprojects/libvhost-user/libvhost-user.c
 @@ -839,6 +839,8 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
   vu_panic(dev, "Specified region not found\n");
   }
 +close(vmsg->fds[0]);
>>>
>>> Does anything check that exactly 1 fd was received? For example,
>>> vu_set_log_fd_exec() does:
>>>
>>>if (vmsg->fd_num != 1) {
>>>vu_panic(dev, "Invalid log_fd message");
>>>return false;
>>>}
>>>
>>> I think that's necessary both to make vhost-user master development
>>> easier and because fds[] is not initialized to -1.
> 
> Ack - will add that.
> 
>>
>> Similarly, vu_add_mem_reg() assumes exactly one was sent AFAIKS.
> 
> Ack
> 
>>
>> If we panic, do we still have to call vmsg_close_fds() ?
>>
> 
> I think so. What else will close the FDs?
> 
> AFAICT a vu_panic does not imply that the overall process has to die if that's
> what you mean. What if one process is exposing multiple devices and only one 
> of
> them panics?

So IIUC, you'll send some patches to tackle the fd checks?

While at it, we might want to simplify VHOST_USER_REM_MEM_REG. 
I have a patch there that needs tweaking to cover the point Stefan raised
regarding duplicate ranges. We might want to do the memmove within the loop
instead and drop the "break" to process all elements.


commit 34d71b6531c74a61442432b37e5829a76a7017c5
Author: David Hildenbrand 
Date:   Tue Oct 12 13:25:43 2021 +0200

libvhost-user: Simplify VHOST_USER_REM_MEM_REG

Let's avoid having to manually copy all elements. Copy only the ones
necessary to close the hole and perform the operation in-place without
a second array.

Signed-off-by: David Hildenbrand 

diff --git a/subprojects/libvhost-user/libvhost-user.c 
b/subprojects/libvhost-user/libvhost-user.c
index 7b0e40256e..499c31dc68 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -796,10 +796,8 @@ static inline bool reg_equal(VuDevRegion *vudev_reg,
 
 static bool
 vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
-int i, j;
-bool found = false;
-VuDevRegion shadow_regions[VHOST_USER_MAX_RAM_SLOTS] = {};
 VhostUserMemoryRegion m = vmsg->payload.memreg.region, *msg_region = &m;
+int i;
 
 DPRINT("Removing region:\n");
 DPRINT("guest_phys_addr: 0x%016"PRIx64"\n",
@@ -811,28 +809,27 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
 DPRINT("mmap_offset  0x%016"PRIx64"\n",
msg_region->mmap_offset);
 
-for (i = 0, j = 0; i < dev->nregions; i++) {
-if (!reg_equal(&dev->regions[i], msg_region)) {
-shadow_regions[j].gpa = dev->regions[i].gpa;
-shadow_regions[j].size = dev->regions[i].size;
-shadow_regions[j].qva = dev->regions[i].qva;
-shadow_regions[j].mmap_addr = dev->regions[i].mmap_addr;
-shadow_regions[j].mmap_offset = dev->regions[i].mmap_offset;
-j++;
-} else {
-found = true;
+for (i = 0; i < dev->nregions; i++) {
+if (reg_equal(&dev->regions[i], msg_region)) {
 VuDevRegion *r = &dev->regions[i];
 void *m = (void *) (uintptr_t) r->mmap_addr;
 
 if (m) {
 munmap(m, r->size + r->mmap_offset);
 }
+break;
 }
 }
 
-if (found) {
-memcpy(dev->regions, shadow_regions,
-   sizeof(VuDevRegion) * VHOST_USER_MAX_RAM_SLOTS);
+if (i < dev->nregions) {
+/*
+ * Shift all affected entries by 1 to close the hole at index i and
+ * zero out the last entry.
+ */
+memmove(dev->regions + i, dev->regions + i + 1,
+   sizeof(VuDevRegion) * (dev->nregions - i - 1));
+memset(dev->regions + dev->nregions - 1, 0,
+   sizeof(VuDevRegion));
 DPRINT("Successfully removed a region\n");
 dev->nregions--;
 vmsg_set_reply_u64(vmsg, 0);



On a related note, I proposed in a RFC series to increase t

Re: [PATCH v2 04/13] target/riscv: Replace riscv_cpu_is_32bit with riscv_cpu_mxl

2021-10-14 Thread LIU Zhiwei




On 2021/10/14 上午4:50, Richard Henderson wrote:

Shortly, the set of supported XL will not be just 32 and 64,
and representing that properly using the enumeration will be
imperative.

Two places, booting and gdb, intentionally use misa_mxl_max
to emphasize the use of the reset value of misa.mxl, and not
the current cpu state.

Signed-off-by: Richard Henderson 
---
  target/riscv/cpu.h|  9 -
  hw/riscv/boot.c   |  2 +-
  semihosting/arm-compat-semi.c |  2 +-
  target/riscv/cpu.c| 24 ++--
  target/riscv/cpu_helper.c | 12 ++--
  target/riscv/csr.c| 24 
  target/riscv/gdbstub.c|  2 +-
  target/riscv/monitor.c|  4 ++--
  8 files changed, 45 insertions(+), 34 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index e708fcc168..87248b562a 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -396,7 +396,14 @@ FIELD(TB_FLAGS, VILL, 8, 1)
  FIELD(TB_FLAGS, HLSX, 9, 1)
  FIELD(TB_FLAGS, MSTATUS_HS_FS, 10, 2)
  
-bool riscv_cpu_is_32bit(CPURISCVState *env);

+#ifdef CONFIG_RISCV32
+#define riscv_cpu_mxl(env)  MXL_RV32
+#else
+static inline RISCVMXL riscv_cpu_mxl(CPURISCVState *env)
+{
+return env->misa_mxl;
+}
+#endif
  


Hi Richard,

I don't know why we use CONFIG_RISCV32 here. I looked through the target 
source code. It doesn't use this macro before.


And why we need special process of CONFIG_RISCV32.


  /*
   * A simplification for VLMAX
diff --git a/hw/riscv/boot.c b/hw/riscv/boot.c
index 993bf89064..d1ffc7b56c 100644
--- a/hw/riscv/boot.c
+++ b/hw/riscv/boot.c
@@ -35,7 +35,7 @@
  
  bool riscv_is_32bit(RISCVHartArrayState *harts)

  {
-return riscv_cpu_is_32bit(&harts->harts[0].env);
+return harts->harts[0].env.misa_mxl_max == MXL_RV32;


Why not use  misa_mxl  here?  As this is just a replacement of 
riscv_cpu_is_32bit like many other places.


I think it should give more explicitly explanation when we use misa_mxl_max.

Thanks,
Zhiwei


  }
  
  target_ulong riscv_calc_kernel_start_addr(RISCVHartArrayState *harts,

diff --git a/semihosting/arm-compat-semi.c b/semihosting/arm-compat-semi.c
index 01badea99c..37963becae 100644
--- a/semihosting/arm-compat-semi.c
+++ b/semihosting/arm-compat-semi.c
@@ -775,7 +775,7 @@ static inline bool is_64bit_semihosting(CPUArchState *env)
  #if defined(TARGET_ARM)
  return is_a64(env);
  #elif defined(TARGET_RISCV)
-return !riscv_cpu_is_32bit(env);
+return riscv_cpu_mxl(env) != MXL_RV32;
  #else
  #error un-handled architecture
  #endif
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index fdf031a394..1857670a69 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -108,11 +108,6 @@ const char *riscv_cpu_get_trap_name(target_ulong cause, 
bool async)
  }
  }
  
-bool riscv_cpu_is_32bit(CPURISCVState *env)

-{
-return env->misa_mxl == MXL_RV32;
-}
-
  static void set_misa(CPURISCVState *env, RISCVMXL mxl, uint32_t ext)
  {
  env->misa_mxl_max = env->misa_mxl = mxl;
@@ -249,7 +244,7 @@ static void riscv_cpu_dump_state(CPUState *cs, FILE *f, int 
flags)
  #ifndef CONFIG_USER_ONLY
  qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "mhartid ", env->mhartid);
  qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "mstatus ", 
(target_ulong)env->mstatus);
-if (riscv_cpu_is_32bit(env)) {
+if (riscv_cpu_mxl(env) == MXL_RV32) {
  qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "mstatush ",
   (target_ulong)(env->mstatus >> 32));
  }
@@ -372,10 +367,16 @@ static void riscv_cpu_reset(DeviceState *dev)
  static void riscv_cpu_disas_set_info(CPUState *s, disassemble_info *info)
  {
  RISCVCPU *cpu = RISCV_CPU(s);
-if (riscv_cpu_is_32bit(&cpu->env)) {
+
+switch (riscv_cpu_mxl(&cpu->env)) {
+case MXL_RV32:
  info->print_insn = print_insn_riscv32;
-} else {
+break;
+case MXL_RV64:
  info->print_insn = print_insn_riscv64;
+break;
+default:
+g_assert_not_reached();
  }
  }
  
@@ -631,10 +632,13 @@ static gchar *riscv_gdb_arch_name(CPUState *cs)

  RISCVCPU *cpu = RISCV_CPU(cs);
  CPURISCVState *env = &cpu->env;
  
-if (riscv_cpu_is_32bit(env)) {

+switch (riscv_cpu_mxl(env)) {
+case MXL_RV32:
  return g_strdup("riscv:rv32");
-} else {
+case MXL_RV64:
  return g_strdup("riscv:rv64");
+default:
+g_assert_not_reached();
  }
  }
  
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c

index 14d1d3cb72..403f54171d 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -152,7 +152,7 @@ bool riscv_cpu_fp_enabled(CPURISCVState *env)
  
  void riscv_cpu_swap_hypervisor_regs(CPURISCVState *env)

  {
-uint64_t sd = riscv_cpu_is_32bit(env) ? MSTATUS32_SD : MSTATUS64_SD;
+uint64_t sd = riscv_cpu_mxl(env) == MXL_RV32 ? MSTATUS32_SD : MSTATUS64_SD;
  uint64_t mstatus_mask = MSTATUS_MXR | MSTATUS_SUM | MSTAT

Re: [PATCH v3 3/4] s390x: topology: CPU topology objects and structures

2021-10-14 Thread Thomas Huth


On 16/09/2021 15.50, Pierre Morel wrote:

We use new objects to have a dynamic administration of the CPU topology.
The highest level object in this implementation is the s390 book and
in this first implementation of CPU topology for S390 we have a single
book.
The book is built as a SYSBUS bridge during the CPU initialization.

Every object under this single book will be build dynamically
immediately after a CPU has be realized if it is needed.
The CPU will fill the sockets once after the other, according to the
number of core per socket defined during the smp parsing.

Each CPU inside a socket will be represented by a bit in a 64bit
unsigned long. Set on plug and clear on unplug of a CPU.

Signed-off-by: Pierre Morel 
---
  hw/s390x/cpu-topology.c | 353 
  hw/s390x/meson.build|   1 +
  hw/s390x/s390-virtio-ccw.c  |   4 +
  include/hw/s390x/cpu-topology.h |  67 ++
  target/s390x/cpu.h  |  47 +
  5 files changed, 472 insertions(+)
  create mode 100644 hw/s390x/cpu-topology.c
  create mode 100644 include/hw/s390x/cpu-topology.h

diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
new file mode 100644
index 00..f0ffd86a4f
--- /dev/null
+++ b/hw/s390x/cpu-topology.c
@@ -0,0 +1,353 @@
+/*
+ * CPU Topology
+ *
+ * Copyright 2021 IBM Corp.
+ * Author(s): Pierre Morel 
+
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "hw/sysbus.h"
+#include "hw/s390x/cpu-topology.h"
+#include "hw/qdev-properties.h"
+#include "hw/boards.h"
+#include "qemu/typedefs.h"
+#include "target/s390x/cpu.h"
+#include "hw/s390x/s390-virtio-ccw.h"
+
+static S390TopologyCores *s390_create_cores(S390TopologySocket *socket,
+int origin)
+{
+DeviceState *dev;
+S390TopologyCores *cores;
+const MachineState *ms = MACHINE(qdev_get_machine());
+
+if (socket->bus->num_children >= ms->smp.cores) {
+return NULL;
+}
+
+dev = qdev_new(TYPE_S390_TOPOLOGY_CORES);
+qdev_realize_and_unref(dev, socket->bus, &error_fatal);
+
+cores = S390_TOPOLOGY_CORES(dev);
+cores->origin = origin;
+socket->cnt += 1;
+
+return cores;
+}
+
+static S390TopologySocket *s390_create_socket(S390TopologyBook *book, int id)
+{
+DeviceState *dev;
+S390TopologySocket *socket;
+const MachineState *ms = MACHINE(qdev_get_machine());
+
+if (book->bus->num_children >= ms->smp.sockets) {
+return NULL;
+}
+
+dev = qdev_new(TYPE_S390_TOPOLOGY_SOCKET);
+qdev_realize_and_unref(dev, book->bus, &error_fatal);
+
+socket = S390_TOPOLOGY_SOCKET(dev);
+socket->socket_id = id;
+book->cnt++;
+
+return socket;
+}
+
+/*
+ * s390_get_cores:
+ * @socket: the socket to search into
+ * @origin: the origin specified for the S390TopologyCores
+ *
+ * returns a pointer to a S390TopologyCores structure within a socket having
+ * the specified origin.
+ * First search if the socket is already containing the S390TopologyCores
+ * structure and if not create one with this origin.
+ */
+static S390TopologyCores *s390_get_cores(S390TopologySocket *socket, int 
origin)
+{
+S390TopologyCores *cores;
+BusChild *kid;
+
+QTAILQ_FOREACH(kid, &socket->bus->children, sibling) {
+cores = S390_TOPOLOGY_CORES(kid->child);
+if (cores->origin == origin) {
+return cores;
+}
+}
+return s390_create_cores(socket, origin);
+}
+
+/*
+ * s390_get_socket:
+ * @book: The book to search into
+ * @socket_id: the identifier of the socket to search for
+ *
+ * returns a pointer to a S390TopologySocket structure within a book having
+ * the specified socket_id.
+ * First search if the book is already containing the S390TopologySocket
+ * structure and if not create one with this socket_id.
+ */
+static S390TopologySocket *s390_get_socket(S390TopologyBook *book,
+   int socket_id)
+{
+S390TopologySocket *socket;
+BusChild *kid;
+
+QTAILQ_FOREACH(kid, &book->bus->children, sibling) {
+socket = S390_TOPOLOGY_SOCKET(kid->child);
+if (socket->socket_id == socket_id) {
+return socket;
+}
+}
+return s390_create_socket(book, socket_id);
+}
+
+/*
+ * s390_topology_new_cpu:
+ * @core_id: the core ID is machine wide
+ *
+ * We have a single book returned by s390_get_topology(),
+ * then we build the hierarchy on demand.
+ * Note that we do not destroy the hierarchy on error creating
+ * an entry in the topology, we just keep it empty.
+ * We do not need to worry about not finding a topology level
+ * entry this would have been caught during smp parsing.
+ */
+void s390_topology_new_cpu(int core_id)
+{
+const MachineState *ms = MACHINE(qdev_get_machine(

Re: [PATCH v2] aspeed: Add support for the fp5280g2-bmc board

2021-10-14 Thread Cédric Le Goater


On 10/14/21 08:45, John Wang wrote:

The fp5280g2-bmc is supported by OpenBMC, It's
based on the following device tree

https://github.com/openbmc/linux/blob/dev-5.10/arch/arm/boot/dts/aspeed-bmc-inspur-fp5280g2.dts

Signed-off-by: John Wang 


Is a flash image available so that we can give this new machine a try ?

LGTM.

Reviewed-by: Cédric Le Goater 

Thanks,

C.


---
  hw/arm/aspeed.c | 74 +
  1 file changed, 74 insertions(+)

diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index 01c1747972..21b690334e 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -141,6 +141,21 @@ struct AspeedMachineState {
  SCU_AST2500_HW_STRAP_ACPI_ENABLE |  \
  SCU_HW_STRAP_SPI_MODE(SCU_HW_STRAP_SPI_MASTER))
  
+/* FP5280G2 hardware value: 0XF100D286 */

+#define FP5280G2_BMC_HW_STRAP1 (  \
+SCU_AST2500_HW_STRAP_SPI_AUTOFETCH_ENABLE | \
+SCU_AST2500_HW_STRAP_GPIO_STRAP_ENABLE |\
+SCU_AST2500_HW_STRAP_UART_DEBUG |   \
+SCU_AST2500_HW_STRAP_RESERVED28 |   \
+SCU_AST2500_HW_STRAP_DDR4_ENABLE |  \
+SCU_HW_STRAP_VGA_CLASS_CODE |   \
+SCU_HW_STRAP_LPC_RESET_PIN |\
+SCU_HW_STRAP_SPI_MODE(SCU_HW_STRAP_SPI_MASTER) |\
+SCU_AST2500_HW_STRAP_SET_AXI_AHB_RATIO(AXI_AHB_RATIO_2_1) | \
+SCU_HW_STRAP_MAC1_RGMII |   \
+SCU_HW_STRAP_VGA_SIZE_SET(VGA_16M_DRAM) |   \
+SCU_AST2500_HW_STRAP_RESERVED1)
+
  /* Witherspoon hardware value: 0xF10AD216 (but use romulus definition) */
  #define WITHERSPOON_BMC_HW_STRAP1 ROMULUS_BMC_HW_STRAP1
  
@@ -456,6 +471,15 @@ static void aspeed_machine_init(MachineState *machine)

  arm_load_kernel(ARM_CPU(first_cpu), machine, &aspeed_board_binfo);
  }
  
+static void at24c_eeprom_init(I2CBus *bus, uint8_t addr, uint32_t rsize)

+{
+I2CSlave *i2c_dev = i2c_slave_new("at24c-eeprom", addr);
+DeviceState *dev = DEVICE(i2c_dev);
+
+qdev_prop_set_uint32(dev, "rom-size", rsize);
+i2c_slave_realize_and_unref(i2c_dev, bus, &error_abort);
+}
+
  static void palmetto_bmc_i2c_init(AspeedMachineState *bmc)
  {
  AspeedSoCState *soc = &bmc->soc;
@@ -717,6 +741,34 @@ static void g220a_bmc_i2c_init(AspeedMachineState *bmc)
eeprom_buf);
  }
  
+static void fp5280g2_bmc_i2c_init(AspeedMachineState *bmc)

+{
+AspeedSoCState *soc = &bmc->soc;
+I2CSlave *i2c_mux;
+
+/* The at24c256 */
+at24c_eeprom_init(aspeed_i2c_get_bus(&soc->i2c, 1), 0x50, 32768);
+
+/* The fp5280g2 expects a TMP112 but a TMP105 is compatible */
+i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 2), TYPE_TMP105,
+ 0x48);
+i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 2), TYPE_TMP105,
+ 0x49);
+
+i2c_mux = i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 2),
+ "pca9546", 0x70);
+/* It expects a TMP112 but a TMP105 is compatible */
+i2c_slave_create_simple(pca954x_i2c_get_bus(i2c_mux, 0), TYPE_TMP105,
+ 0x4a);
+
+/* It expects a ds3232 but a ds1338 is good enough */
+i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 4), "ds1338", 0x68);
+
+/* It expects a pca9555 but a pca9552 is compatible */
+i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 8), TYPE_PCA9552,
+ 0x20);
+}
+
  static void rainier_bmc_i2c_init(AspeedMachineState *bmc)
  {
  AspeedSoCState *soc = &bmc->soc;
@@ -1082,6 +1134,24 @@ static void aspeed_machine_g220a_class_init(ObjectClass 
*oc, void *data)
  aspeed_soc_num_cpus(amc->soc_name);
  };
  
+static void aspeed_machine_fp5280g2_class_init(ObjectClass *oc, void *data)

+{
+MachineClass *mc = MACHINE_CLASS(oc);
+AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
+
+mc->desc   = "Inspur FP5280G2 BMC (ARM1176)";
+amc->soc_name  = "ast2500-a1";
+amc->hw_strap1 = FP5280G2_BMC_HW_STRAP1;
+amc->fmc_model = "n25q512a";
+amc->spi_model = "mx25l25635e";
+amc->num_cs= 2;
+amc->macs_mask  = ASPEED_MAC0_ON | ASPEED_MAC1_ON;
+amc->i2c_init  = fp5280g2_bmc_i2c_init;
+mc->default_ram_size = 512 * MiB;
+mc->default_cpus = mc->min_cpus = mc->max_cpus =
+aspeed_soc_num_cpus(amc->soc_name);
+};
+
  static void aspeed_machine_rainier_class_init(ObjectClass *oc, void *data)
  {
  MachineClass *mc = MACHINE_CLASS(oc);
@@ -1146,6 +1216,10 @@ static const TypeInfo aspeed_machine_types[] = {
  .name  = MACHINE_TYPE_NAME("g220a-bmc"),
  .parent= TYPE_ASPEED_MACHINE,
  .class_init= aspeed_machine_g220a_class_init,
+}, {
+

Re: [PATCH] monitor/qmp: fix race with clients disconnecting early

2021-10-14 Thread Markus Armbruster

Stefan Reiter  writes:

> On 10/12/21 11:27 AM, Markus Armbruster wrote:
>> Stefan, any thoughts on this?
>> 
>
> Sorry, I didn't get to work on implementing this. Idea 3 does seem very
> reasonable, though I suppose the question is what all should go into the
> per-session state, and also how it is passed down.

Let's start with the state you have shown to be problematic.  To decide
what else to move from monitor state to session state, we'll want to
review monitor state one by one.  Can be done in review of patches
creating the session state.

Most users need the current session state.  So have struct MonitorQMP
hold a pointer to it.

To execute an in-band command in the main thread, we need the command's
session state.  I'd try adding a pointer to QMPRequest.

Then use reference counting to keep the session alive until all its
commands are processed.

Makes sense?

> We did roll out my initial patch to our users in the meantime and got
> some positive feedback (that specific error disappeared), however another
> one (reported at the same time) still exists, so I was trying to diagnose
> - it might also turn out to be monitor related and resolved by the more
> thorough fix proposed here, but unsure.

That would be lovely.

Thanks!

Re: [PATCH v2] aspeed: Add support for the fp5280g2-bmc board

2021-10-14 Thread John Wang

Cédric Le Goater  于2021年10月14日周四 下午3:19写道：
>
> On 10/14/21 08:45, John Wang wrote:
> > The fp5280g2-bmc is supported by OpenBMC, It's
> > based on the following device tree
> >
> > https://github.com/openbmc/linux/blob/dev-5.10/arch/arm/boot/dts/aspeed-bmc-inspur-fp5280g2.dts
> >
> > Signed-off-by: John Wang 
>
> Is a flash image available so that we can give this new machine a try ?

I've tested. here is the image:
https://drive.google.com/file/d/1lAQ7nG2sq0FfAVydjSlF1zHmnKlgFCVq/view?usp=sharing

it can be built on openbmc/openbmc with
https://gerrit.openbmc-project.xyz/c/openbmc/openbmc/+/47824

>
> LGTM.
>
> Reviewed-by: Cédric Le Goater 
>
> Thanks,
>
> C.
>
> > ---
> >   hw/arm/aspeed.c | 74 +
> >   1 file changed, 74 insertions(+)
> >
> > diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
> > index 01c1747972..21b690334e 100644
> > --- a/hw/arm/aspeed.c
> > +++ b/hw/arm/aspeed.c
> > @@ -141,6 +141,21 @@ struct AspeedMachineState {
> >   SCU_AST2500_HW_STRAP_ACPI_ENABLE |  \
> >   SCU_HW_STRAP_SPI_MODE(SCU_HW_STRAP_SPI_MASTER))
> >
> > +/* FP5280G2 hardware value: 0XF100D286 */
> > +#define FP5280G2_BMC_HW_STRAP1 (  \
> > +SCU_AST2500_HW_STRAP_SPI_AUTOFETCH_ENABLE | \
> > +SCU_AST2500_HW_STRAP_GPIO_STRAP_ENABLE |\
> > +SCU_AST2500_HW_STRAP_UART_DEBUG |   \
> > +SCU_AST2500_HW_STRAP_RESERVED28 |   \
> > +SCU_AST2500_HW_STRAP_DDR4_ENABLE |  \
> > +SCU_HW_STRAP_VGA_CLASS_CODE |   \
> > +SCU_HW_STRAP_LPC_RESET_PIN |\
> > +SCU_HW_STRAP_SPI_MODE(SCU_HW_STRAP_SPI_MASTER) |\
> > +SCU_AST2500_HW_STRAP_SET_AXI_AHB_RATIO(AXI_AHB_RATIO_2_1) | \
> > +SCU_HW_STRAP_MAC1_RGMII |   \
> > +SCU_HW_STRAP_VGA_SIZE_SET(VGA_16M_DRAM) |   \
> > +SCU_AST2500_HW_STRAP_RESERVED1)
> > +
> >   /* Witherspoon hardware value: 0xF10AD216 (but use romulus definition) */
> >   #define WITHERSPOON_BMC_HW_STRAP1 ROMULUS_BMC_HW_STRAP1
> >
> > @@ -456,6 +471,15 @@ static void aspeed_machine_init(MachineState *machine)
> >   arm_load_kernel(ARM_CPU(first_cpu), machine, &aspeed_board_binfo);
> >   }
> >
> > +static void at24c_eeprom_init(I2CBus *bus, uint8_t addr, uint32_t rsize)
> > +{
> > +I2CSlave *i2c_dev = i2c_slave_new("at24c-eeprom", addr);
> > +DeviceState *dev = DEVICE(i2c_dev);
> > +
> > +qdev_prop_set_uint32(dev, "rom-size", rsize);
> > +i2c_slave_realize_and_unref(i2c_dev, bus, &error_abort);
> > +}
> > +
> >   static void palmetto_bmc_i2c_init(AspeedMachineState *bmc)
> >   {
> >   AspeedSoCState *soc = &bmc->soc;
> > @@ -717,6 +741,34 @@ static void g220a_bmc_i2c_init(AspeedMachineState *bmc)
> > eeprom_buf);
> >   }
> >
> > +static void fp5280g2_bmc_i2c_init(AspeedMachineState *bmc)
> > +{
> > +AspeedSoCState *soc = &bmc->soc;
> > +I2CSlave *i2c_mux;
> > +
> > +/* The at24c256 */
> > +at24c_eeprom_init(aspeed_i2c_get_bus(&soc->i2c, 1), 0x50, 32768);
> > +
> > +/* The fp5280g2 expects a TMP112 but a TMP105 is compatible */
> > +i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 2), TYPE_TMP105,
> > + 0x48);
> > +i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 2), TYPE_TMP105,
> > + 0x49);
> > +
> > +i2c_mux = i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 2),
> > + "pca9546", 0x70);
> > +/* It expects a TMP112 but a TMP105 is compatible */
> > +i2c_slave_create_simple(pca954x_i2c_get_bus(i2c_mux, 0), TYPE_TMP105,
> > + 0x4a);
> > +
> > +/* It expects a ds3232 but a ds1338 is good enough */
> > +i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 4), "ds1338", 
> > 0x68);
> > +
> > +/* It expects a pca9555 but a pca9552 is compatible */
> > +i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 8), TYPE_PCA9552,
> > + 0x20);
> > +}
> > +
> >   static void rainier_bmc_i2c_init(AspeedMachineState *bmc)
> >   {
> >   AspeedSoCState *soc = &bmc->soc;
> > @@ -1082,6 +1134,24 @@ static void 
> > aspeed_machine_g220a_class_init(ObjectClass *oc, void *data)
> >   aspeed_soc_num_cpus(amc->soc_name);
> >   };
> >
> > +static void aspeed_machine_fp5280g2_class_init(ObjectClass *oc, void *data)
> > +{
> > +MachineClass *mc = MACHINE_CLASS(oc);
> > +AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
> > +
> > +mc->desc   = "Inspur FP5280G2 BMC (ARM1176)";
> > +amc->soc_name  = "ast2500-a1";
> > +amc->hw_strap1 = FP5280G2_BMC_HW_STRAP1;
> > +amc->fmc_model = "n25q512a";
> > +amc->spi_model =

Re: [PATCH v4 2/2] monitor: refactor set/expire_password and allow VNC display id

2021-10-14 Thread Markus Armbruster

Stefan Reiter  writes:

> On 10/12/21 11:24 AM, Markus Armbruster wrote:
>> Stefan Reiter  writes:
>> 
>>> It is possible to specify more than one VNC server on the command line,
>>> either with an explicit ID or the auto-generated ones à "default",
>>> "vnc2", "vnc3", ...
>>>
>>> It is not possible to change the password on one of these extra VNC
>>> displays though. Fix this by adding a "display" parameter to the
>>> "set_password" and "expire_password" QMP and HMP commands.
>>>
>>> For HMP, the display is specified using the "-d" value flag.
>>>
>>> For QMP, the schema is updated to explicitly express the supported
>>> variants of the commands with protocol-discriminated unions.
>>>
>>> Suggested-by: Eric Blake 
>>> Suggested-by: Markus Armbruster 
>>> Signed-off-by: Stefan Reiter 
>> 
>> [...]
>> 
>>> diff --git a/qapi/ui.json b/qapi/ui.json
>>> index d7567ac866..4244c62c30 100644
>>> --- a/qapi/ui.json
>>> +++ b/qapi/ui.json
>>> @@ -9,22 +9,23 @@
>>>   { 'include': 'common.json' }
>>>   { 'include': 'sockets.json' }
>>>   
>>> +##
>>> +# @DisplayProtocol:
>>> +#
>>> +# Display protocols which support changing password options.
>>> +#
>>> +# Since: 6.2
>>> +#
>>> +##
>>> +{ 'enum': 'DisplayProtocol',
>>> +  'data': [ { 'name': 'vnc', 'if': 'CONFIG_VNC' },
>>> +{ 'name': 'spice', 'if': 'CONFIG_SPICE' } ] }
>>> +
>> 
>> 
>> 
>>>   ##
>>>   # @set_password:
>>>   #
>>>   # Sets the password of a remote display session.
>>>   #
>>> -# @protocol: - 'vnc' to modify the VNC server password
>>> -#- 'spice' to modify the Spice server password
>>> -#
>>> -# @password: the new password
>>> -#
>>> -# @connected: how to handle existing clients when changing the
>>> -# password.  If nothing is specified, defaults to 'keep'
>>> -# 'fail' to fail the command if clients are connected
>>> -# 'disconnect' to disconnect existing clients
>>> -# 'keep' to maintain existing clients
>>> -#
>>>   # Returns: - Nothing on success
>>>   #  - If Spice is not enabled, DeviceNotFound
>>>   #
>>> @@ -37,33 +38,106 @@
>>>   # <- { "return": {} }
>>>   #
>>>   ##
>>> -{ 'command': 'set_password',
>>> -  'data': {'protocol': 'str', 'password': 'str', '*connected': 'str'} }
>>> +{ 'command': 'set_password', 'boxed': true, 'data': 'SetPasswordOptions' }
>>> +
>>> +##
>>> +# @SetPasswordOptions:
>>> +#
>>> +# Data required to set a new password on a display server protocol.
>>> +#
>>> +# @protocol: - 'vnc' to modify the VNC server password
>>> +#- 'spice' to modify the Spice server password
>>> +#
>>> +# @password: the new password
>>> +#
>>> +# Since: 6.2
>>> +#
>>> +##
>>> +{ 'union': 'SetPasswordOptions',
>>> +  'base': { 'protocol': 'DisplayProtocol',
>>> +'password': 'str' },
>>> +  'discriminator': 'protocol',
>>> +  'data': { 'vnc': 'SetPasswordOptionsVnc',
>>> +'spice': 'SetPasswordOptionsSpice' } }
>>> +
>>> +##
>>> +# @SetPasswordAction:
>>> +#
>>> +# An action to take on changing a password on a connection with active 
>>> clients.
>>> +#
>>> +# @fail: fail the command if clients are connected
>>> +#
>>> +# @disconnect: disconnect existing clients
>>> +#
>>> +# @keep: maintain existing clients
>>> +#
>>> +# Since: 6.2
>>> +#
>>> +##
>>> +{ 'enum': 'SetPasswordAction',
>>> +  'data': [ 'fail', 'disconnect', 'keep' ] }
>>> +
>>> +##
>>> +# @SetPasswordActionVnc:
>>> +#
>>> +# See @SetPasswordAction. VNC only supports the keep action. 'connection'
>>> +# should just be omitted for VNC, this is kept for backwards compatibility.
>>> +#
>>> +# @keep: maintain existing clients
>>> +#
>>> +# Since: 6.2
>>> +#
>>> +##
>>> +{ 'enum': 'SetPasswordActionVnc',
>>> +  'data': [ 'keep' ] }
>>> +
>>> +##
>>> +# @SetPasswordOptionsSpice:
>>> +#
>>> +# Options for set_password specific to the VNC procotol.
>>> +#
>>> +# @connected: How to handle existing clients when changing the
>>> +# password. If nothing is specified, defaults to 'keep'.
>>> +#
>>> +# Since: 6.2
>>> +#
>>> +##
>>> +{ 'struct': 'SetPasswordOptionsSpice',
>>> +  'data': { '*connected': 'SetPasswordAction' } }
>>> +
>>> +##
>>> +# @SetPasswordOptionsVnc:
>>> +#
>>> +# Options for set_password specific to the VNC procotol.
>>> +#
>>> +# @display: The id of the display where the password should be changed.
>>> +#   Defaults to the first.
>>> +#
>>> +# @connected: How to handle existing clients when changing the
>>> +# password.
>>> +#
>>> +# Features:
>>> +# @deprecated: For VNC, @connected will always be 'keep', parameter should 
>>> be
>>> +#  omitted.
>>> +#
>>> +# Since: 6.2
>>> +#
>>> +##
>>> +{ 'struct': 'SetPasswordOptionsVnc',
>>> +  'data': { '*display': 'str',
>>> +'*connected': { 'type': 'SetPasswordActionVnc',
>>> +'features': ['deprecated'] } } }
>> 
>> Let me describe what you're doing in my own words, to make sure I got
>> both the what and the why:
>> 
>> 1. Change @pro

Re: [PATCH v2] aspeed: Add support for the fp5280g2-bmc board

2021-10-14 Thread Cédric Le Goater


On 10/14/21 09:36, John Wang wrote:

Cédric Le Goater  于2021年10月14日周四 下午3:19写道：


On 10/14/21 08:45, John Wang wrote:

The fp5280g2-bmc is supported by OpenBMC, It's
based on the following device tree

https://github.com/openbmc/linux/blob/dev-5.10/arch/arm/boot/dts/aspeed-bmc-inspur-fp5280g2.dts

Signed-off-by: John Wang 


Is a flash image available so that we can give this new machine a try ?


I've tested. here is the image:
https://drive.google.com/file/d/1lAQ7nG2sq0FfAVydjSlF1zHmnKlgFCVq/view?usp=sharing


I will add it to my collection of images for non regression.



it can be built on openbmc/openbmc with
https://gerrit.openbmc-project.xyz/c/openbmc/openbmc/+/47824


Thanks

C.

Re: [PATCH v2 03/13] target/riscv: Split misa.mxl and misa.ext

2021-10-14 Thread LIU Zhiwei




On 2021/10/14 上午4:50, Richard Henderson wrote:

The hw representation of misa.mxl is at the high bits of the
misa csr.  Representing this in the same way inside QEMU
results in overly complex code trying to check that field.

Signed-off-by: Richard Henderson 
---
v2: Reset misa.mxl.
---
  target/riscv/cpu.h  | 15 +++
  linux-user/elfload.c|  2 +-
  linux-user/riscv/cpu_loop.c |  2 +-
  target/riscv/cpu.c  | 78 +
  target/riscv/csr.c  | 44 ++---
  target/riscv/gdbstub.c  |  8 ++--
  target/riscv/machine.c  | 10 +++--
  target/riscv/translate.c| 10 +++--
  8 files changed, 100 insertions(+), 69 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 7084efc452..e708fcc168 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -25,6 +25,7 @@
  #include "exec/cpu-defs.h"
  #include "fpu/softfloat-types.h"
  #include "qom/object.h"
+#include "cpu_bits.h"
  
  #define TCG_GUEST_DEFAULT_MO 0
  
@@ -51,9 +52,6 @@

  # define TYPE_RISCV_CPU_BASETYPE_RISCV_CPU_BASE64
  #endif
  
-#define RV32 ((target_ulong)1 << (TARGET_LONG_BITS - 2))

-#define RV64 ((target_ulong)2 << (TARGET_LONG_BITS - 2))
-
  #define RV(x) ((target_ulong)1 << (x - 'A'))
  
  #define RVI RV('I')

@@ -133,8 +131,12 @@ struct CPURISCVState {
  target_ulong priv_ver;
  target_ulong bext_ver;
  target_ulong vext_ver;
-target_ulong misa;
-target_ulong misa_mask;
+
+/* RISCVMXL, but uint32_t for vmstate migration */
+uint32_t misa_mxl;  /* current mxl */
+uint32_t misa_mxl_max;  /* max mxl for this cpu */
+uint32_t misa_ext;  /* current extensions */
+uint32_t misa_ext_mask; /* max ext for this cpu */
  
  uint32_t features;
  
@@ -313,7 +315,7 @@ struct RISCVCPU {
  
  static inline int riscv_has_ext(CPURISCVState *env, target_ulong ext)

  {
-return (env->misa & ext) != 0;
+return (env->misa_ext & ext) != 0;
  }
  
  static inline bool riscv_feature(CPURISCVState *env, int feature)

@@ -322,7 +324,6 @@ static inline bool riscv_feature(CPURISCVState *env, int 
feature)
  }
  
  #include "cpu_user.h"

-#include "cpu_bits.h"
  
  extern const char * const riscv_int_regnames[];

  extern const char * const riscv_fpr_regnames[];
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 2404d482ba..214c1aa40d 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -1448,7 +1448,7 @@ static uint32_t get_elf_hwcap(void)
  uint32_t mask = MISA_BIT('I') | MISA_BIT('M') | MISA_BIT('A')
  | MISA_BIT('F') | MISA_BIT('D') | MISA_BIT('C');
  
-return cpu->env.misa & mask;

+return cpu->env.misa_ext & mask;
  #undef MISA_BIT
  }
  
diff --git a/linux-user/riscv/cpu_loop.c b/linux-user/riscv/cpu_loop.c

index 9859a366e4..e5bb6d908a 100644
--- a/linux-user/riscv/cpu_loop.c
+++ b/linux-user/riscv/cpu_loop.c
@@ -133,7 +133,7 @@ void target_cpu_copy_regs(CPUArchState *env, struct 
target_pt_regs *regs)
  env->gpr[xSP] = regs->sp;
  env->elf_flags = info->elf_flags;
  
-if ((env->misa & RVE) && !(env->elf_flags & EF_RISCV_RVE)) {

+if ((env->misa_ext & RVE) && !(env->elf_flags & EF_RISCV_RVE)) {
  error_report("Incompatible ELF: RVE cpu requires RVE ABI binary");
  exit(EXIT_FAILURE);
  }
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 1d69d1887e..fdf031a394 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -110,16 +110,13 @@ const char *riscv_cpu_get_trap_name(target_ulong cause, 
bool async)
  
  bool riscv_cpu_is_32bit(CPURISCVState *env)

  {
-if (env->misa & RV64) {
-return false;
-}
-
-return true;
+return env->misa_mxl == MXL_RV32;
  }
  
-static void set_misa(CPURISCVState *env, target_ulong misa)

+static void set_misa(CPURISCVState *env, RISCVMXL mxl, uint32_t ext)
  {
-env->misa_mask = env->misa = misa;
+env->misa_mxl_max = env->misa_mxl = mxl;
+env->misa_ext_mask = env->misa_ext = ext;
  }
  
  static void set_priv_version(CPURISCVState *env, int priv_ver)

@@ -148,9 +145,9 @@ static void riscv_any_cpu_init(Object *obj)
  {
  CPURISCVState *env = &RISCV_CPU(obj)->env;
  #if defined(TARGET_RISCV32)
-set_misa(env, RV32 | RVI | RVM | RVA | RVF | RVD | RVC | RVU);
+set_misa(env, MXL_RV32, RVI | RVM | RVA | RVF | RVD | RVC | RVU);
  #elif defined(TARGET_RISCV64)
-set_misa(env, RV64 | RVI | RVM | RVA | RVF | RVD | RVC | RVU);
+set_misa(env, MXL_RV64, RVI | RVM | RVA | RVF | RVD | RVC | RVU);
  #endif
  set_priv_version(env, PRIV_VERSION_1_11_0);
  }
@@ -160,20 +157,20 @@ static void rv64_base_cpu_init(Object *obj)
  {
  CPURISCVState *env = &RISCV_CPU(obj)->env;
  /* We set this in the realise function */
-set_misa(env, RV64);
+set_misa(env, MXL_RV64, 0);
  }
  
  static void rv64_sifive_u_cpu_init(Object *obj)

  {
  CPURISCVState *env = &RISCV_CPU(obj)->env;
-set_misa(env, RV64 | RVI | R

Re: [PATCH v3 3/3] vdpa: Check for iova range at mappings changes

2021-10-14 Thread Eugenio Perez Martin

On Thu, Oct 14, 2021 at 9:02 AM Jason Wang  wrote:
>
> On Thu, Oct 14, 2021 at 1:57 PM Eugenio Perez Martin
>  wrote:
> >
> > On Thu, Oct 14, 2021 at 5:30 AM Jason Wang  wrote:
> > >
> > > On Tue, Oct 12, 2021 at 10:07 PM Eugenio Pérez  
> > > wrote:
> > > >
> > > > Check vdpa device range before updating memory regions so we don't add
> > > > any outside of it, and report the invalid change if any.
> > > >
> > > > Signed-off-by: Eugenio Pérez 
> > > > ---
> > > >  include/hw/virtio/vhost-vdpa.h |  2 ++
> > > >  hw/virtio/vhost-vdpa.c | 62 +-
> > > >  hw/virtio/trace-events |  1 +
> > > >  3 files changed, 49 insertions(+), 16 deletions(-)
> > > >
> > > > diff --git a/include/hw/virtio/vhost-vdpa.h 
> > > > b/include/hw/virtio/vhost-vdpa.h
> > > > index a8963da2d9..c288cf7ecb 100644
> > > > --- a/include/hw/virtio/vhost-vdpa.h
> > > > +++ b/include/hw/virtio/vhost-vdpa.h
> > > > @@ -13,6 +13,7 @@
> > > >  #define HW_VIRTIO_VHOST_VDPA_H
> > > >
> > > >  #include "hw/virtio/virtio.h"
> > > > +#include "standard-headers/linux/vhost_types.h"
> > > >
> > > >  typedef struct VhostVDPAHostNotifier {
> > > >  MemoryRegion mr;
> > > > @@ -24,6 +25,7 @@ typedef struct vhost_vdpa {
> > > >  uint32_t msg_type;
> > > >  bool iotlb_batch_begin_sent;
> > > >  MemoryListener listener;
> > > > +struct vhost_vdpa_iova_range iova_range;
> > > >  struct vhost_dev *dev;
> > > >  VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
> > > >  } VhostVDPA;
> > > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > > index be7c63b4ba..dbf773d032 100644
> > > > --- a/hw/virtio/vhost-vdpa.c
> > > > +++ b/hw/virtio/vhost-vdpa.c
> > > > @@ -37,20 +37,34 @@ static Int128 vhost_vdpa_section_end(const 
> > > > MemoryRegionSection *section)
> > > >  return llend;
> > > >  }
> > > >
> > > > -static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection 
> > > > *section)
> > > > -{
> > > > -return (!memory_region_is_ram(section->mr) &&
> > > > -!memory_region_is_iommu(section->mr)) ||
> > > > -memory_region_is_protected(section->mr) ||
> > > > -   /* vhost-vDPA doesn't allow MMIO to be mapped  */
> > > > -memory_region_is_ram_device(section->mr) ||
> > > > -   /*
> > > > -* Sizing an enabled 64-bit BAR can cause spurious mappings 
> > > > to
> > > > -* addresses in the upper part of the 64-bit address space. 
> > > >  These
> > > > -* are never accessed by the CPU and beyond the address 
> > > > width of
> > > > -* some IOMMU hardware.  TODO: VDPA should tell us the 
> > > > IOMMU width.
> > > > -*/
> > > > -   section->offset_within_address_space & (1ULL << 63);
> >
> > [1]
> >
> > > > +static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection 
> > > > *section,
> > > > +uint64_t iova_min,
> > > > +uint64_t iova_max)
> > > > +{
> > > > +Int128 llend;
> > > > +
> > > > +if ((!memory_region_is_ram(section->mr) &&
> > > > + !memory_region_is_iommu(section->mr)) ||
> > > > +memory_region_is_protected(section->mr) ||
> > > > +/* vhost-vDPA doesn't allow MMIO to be mapped  */
> > > > +memory_region_is_ram_device(section->mr)) {
> > > > +return true;
> > > > +}
> > > > +
> > > > +if (section->offset_within_address_space < iova_min) {
> > > > +error_report("RAM section out of device range (min=%lu, 
> > > > addr=%lu)",
> > > > + iova_min, section->offset_within_address_space);
> > > > +return true;
> > > > +}
> > > > +
> > > > +llend = vhost_vdpa_section_end(section);
> > > > +if (int128_gt(llend, int128_make64(iova_max))) {
> > > > +error_report("RAM section out of device range (max=%lu, end 
> > > > addr=%lu)",
> > > > + iova_max, int128_get64(llend));
> > > > +return true;
> > > > +}
> > > > +
> > > > +return false;
> > > >  }
> > > >
> > > >  static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, 
> > > > hwaddr size,
> > > > @@ -162,7 +176,8 @@ static void 
> > > > vhost_vdpa_listener_region_add(MemoryListener *listener,
> > > >  void *vaddr;
> > > >  int ret;
> > > >
> > > > -if (vhost_vdpa_listener_skipped_section(section)) {
> > > > +if (vhost_vdpa_listener_skipped_section(section, 
> > > > v->iova_range.first,
> > > > +v->iova_range.last)) {
> > > >  return;
> > > >  }
> > > >
> > > > @@ -220,7 +235,8 @@ static void 
> > > > vhost_vdpa_listener_region_del(MemoryListener *listener,
> > > >  Int128 llend, llsize;
> > > >  int ret;
> > > >
> > > > -if (vhost_vdpa_listener_skipped_section(section)) {
> > > > +if (vhost_vdpa_listener_skipped_section(section, 
> > > > v->iova_rang

Re: [PATCH v3 2/4] s390x: kvm: topology: interception of PTF instruction

2021-10-14 Thread Pierre Morel





On 10/13/21 11:11, Thomas Huth wrote:

On 13/10/2021 09.55, Pierre Morel wrote:



On 10/13/21 09:25, Thomas Huth wrote:

On 16/09/2021 15.50, Pierre Morel wrote:

When the host supports the CPU topology facility, the PTF
instruction with function code 2 is interpreted by the SIE,
provided that the userland hypervizor activates the interpretation
by using the KVM_CAP_S390_CPU_TOPOLOGY KVM extension.

The PTF instructions with function code 0 and 1 are intercepted
and must be emulated by the userland hypervizor.

Signed-off-by: Pierre Morel 
---

...

diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index 5b1fdb55c4..dd036961fe 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -97,6 +97,7 @@
  #define PRIV_B9_EQBS    0x9c
  #define PRIV_B9_CLP 0xa0
+#define PRIV_B9_PTF 0xa2
  #define PRIV_B9_PCISTG  0xd0
  #define PRIV_B9_PCILG   0xd2
  #define PRIV_B9_RPCIT   0xd3
@@ -362,6 +363,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
  kvm_vm_enable_cap(s, KVM_CAP_S390_USER_SIGP, 0);
  kvm_vm_enable_cap(s, KVM_CAP_S390_VECTOR_REGISTERS, 0);
  kvm_vm_enable_cap(s, KVM_CAP_S390_USER_STSI, 0);
+    kvm_vm_enable_cap(s, KVM_CAP_S390_CPU_TOPOLOGY, 0);


Should this maybe rather be done in the last patch, to avoid a state 
where PTF is available, but STSI 15 is not implemented yet (when 
bisecting through these commits later)?


  Thomas



Yes you are right, thanks.


I'm also still a little bit surprised that there is really no migration 
code involved here yet. What if a guest gets started on a system with 
KVM_CAP_S390_CPU_TOPOLOGY support and later migrated to a system without 
KVM_CAP_S390_CPU_TOPOLOGY support? Is there already some magic in place 
that rejects such a migration? If not, the guest might first learn that 
it could use the PTF instruction, but suddenly it is then not available 
anymore? Does Linux cope right with PTF becoming unavailable during 
runtime? But even if it does, I think it's likely not in the sense of 
the architecture if certain instructions might disappear during runtime? 
Or do I miss something?


  Thomas




I check on this and take the consequences.

Pierre

--
Pierre Morel
IBM Lab Boeblingen

Re: [PATCH v2 05/13] target/riscv: Add MXL/SXL/UXL to TB_FLAGS

2021-10-14 Thread LIU Zhiwei




On 2021/10/14 上午4:50, Richard Henderson wrote:

Begin adding support for switching XLEN at runtime.  Extract the
effective XLEN from MISA and MSTATUS and store for use during translation.

Signed-off-by: Richard Henderson 
---
v2: Force SXL and UXL to valid values.
---
  target/riscv/cpu.h|  2 ++
  target/riscv/cpu.c|  8 
  target/riscv/cpu_helper.c | 33 +
  target/riscv/csr.c|  3 +++
  target/riscv/translate.c  |  2 +-
  5 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 87248b562a..445ba5b395 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -395,6 +395,8 @@ FIELD(TB_FLAGS, VILL, 8, 1)
  /* Is a Hypervisor instruction load/store allowed? */
  FIELD(TB_FLAGS, HLSX, 9, 1)
  FIELD(TB_FLAGS, MSTATUS_HS_FS, 10, 2)
+/* The combination of MXL/SXL/UXL that applies to the current cpu mode. */
+FIELD(TB_FLAGS, XL, 12, 2)
  
  #ifdef CONFIG_RISCV32

  #define riscv_cpu_mxl(env)  MXL_RV32
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 1857670a69..840edd66f8 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -355,6 +355,14 @@ static void riscv_cpu_reset(DeviceState *dev)
  env->misa_mxl = env->misa_mxl_max;
  env->priv = PRV_M;
  env->mstatus &= ~(MSTATUS_MIE | MSTATUS_MPRV);
+if (env->misa_mxl > MXL_RV32) {
+/*
+ * The reset status of SXL/UXL is officially undefined,
+ * but invalid settings would result in a tcg assert.
+ */
+env->mstatus = set_field(env->mstatus, MSTATUS64_SXL, env->misa_mxl);
+env->mstatus = set_field(env->mstatus, MSTATUS64_UXL, env->misa_mxl);
+}


Can you give more explanation about the assert? As the cpu will always 
reset to M mode, I think we can omit the the setting of UXL or SXL.


Thanks,
Zhiwei


  env->mcause = 0;
  env->pc = env->resetvec;
  env->two_stage_lookup = false;
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 403f54171d..429afd1f48 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -35,6 +35,37 @@ int riscv_cpu_mmu_index(CPURISCVState *env, bool ifetch)
  #endif
  }
  
+static RISCVMXL cpu_get_xl(CPURISCVState *env)

+{
+#if defined(TARGET_RISCV32)
+return MXL_RV32;
+#elif defined(CONFIG_USER_ONLY)
+return MXL_RV64;
+#else
+RISCVMXL xl = riscv_cpu_mxl(env);
+
+/*
+ * When emulating a 32-bit-only cpu, use RV32.
+ * When emulating a 64-bit cpu, and MXL has been reduced to RV32,
+ * MSTATUSH doesn't have UXL/SXL, therefore XLEN cannot be widened
+ * back to RV64 for lower privs.
+ */
+if (xl != MXL_RV32) {
+switch (env->priv) {
+case PRV_M:
+break;
+case PRV_U:
+xl = get_field(env->mstatus, MSTATUS64_UXL);
+break;
+default: /* PRV_S | PRV_H */
+xl = get_field(env->mstatus, MSTATUS64_SXL);
+break;
+}
+}
+return xl;
+#endif
+}
+
  void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong *pc,
target_ulong *cs_base, uint32_t *pflags)
  {
@@ -78,6 +109,8 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong 
*pc,
  }
  #endif
  
+flags = FIELD_DP32(flags, TB_FLAGS, XL, cpu_get_xl(env));

+
  *pflags = flags;
  }
  
diff --git a/target/riscv/csr.c b/target/riscv/csr.c

index 9c0753bc8b..c4a479ddd2 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -526,6 +526,9 @@ static RISCVException write_mstatus(CPURISCVState *env, int 
csrno,
  mstatus = set_field(mstatus, MSTATUS32_SD, dirty);
  } else {
  mstatus = set_field(mstatus, MSTATUS64_SD, dirty);
+/* SXL and UXL fields are for now read only */
+mstatus = set_field(mstatus, MSTATUS64_SXL, MXL_RV64);
+mstatus = set_field(mstatus, MSTATUS64_UXL, MXL_RV64);
  }
  env->mstatus = mstatus;
  
diff --git a/target/riscv/translate.c b/target/riscv/translate.c

index 422f8ab8d0..7e7bb67d15 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -539,7 +539,6 @@ static void riscv_tr_init_disas_context(DisasContextBase 
*dcbase, CPUState *cs)
  #else
  ctx->virt_enabled = false;
  #endif
-ctx->xl = env->misa_mxl;
  ctx->misa_ext = env->misa_ext;
  ctx->frm = -1;  /* unknown rounding mode */
  ctx->ext_ifencei = cpu->cfg.ext_ifencei;
@@ -551,6 +550,7 @@ static void riscv_tr_init_disas_context(DisasContextBase 
*dcbase, CPUState *cs)
  ctx->lmul = FIELD_EX32(tb_flags, TB_FLAGS, LMUL);
  ctx->mlen = 1 << (ctx->sew  + 3 - ctx->lmul);
  ctx->vl_eq_vlmax = FIELD_EX32(tb_flags, TB_FLAGS, VL_EQ_VLMAX);
+ctx->xl = FIELD_EX32(tb_flags, TB_FLAGS, XL);
  ctx->cs = cs;
  ctx->w = false;
  ctx->ntemp = 0;

Re: [PATCH v2] tests: qtest: Add virtio-iommu test

2021-10-14 Thread Eric Auger

Hi Thomas,

On 10/14/21 7:58 AM, Thomas Huth wrote:
> On 13/10/2021 21.50, Eric Auger wrote:
>> Add the framework to test the virtio-iommu-pci device
>> and tests exercising the attach/detach, map/unmap API.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
> ...
>> +/**
>> + * send_map - Send a map command to the device
>> + * @domain: domain the new binding is attached to
>> + * @virt_start: iova start
>> + * @virt_end: iova end
>> + * @phys_start: base physical address
>> + * @flags: mapping flags
>> + */
>> +static int send_map(QTestState *qts, QVirtioIOMMU *v_iommu,
>> +    uint32_t domain, uint64_t virt_start, uint64_t
>> virt_end,
>> +    uint64_t phys_start, uint32_t flags)
>> +{
>> +    QVirtioDevice *dev = v_iommu->vdev;
>> +    QVirtQueue *vq = v_iommu->vq;
>> +    uint64_t ro_addr, wr_addr;
>> +    uint32_t free_head;
>> +    struct virtio_iommu_req_map req;
>> +    size_t ro_size = sizeof(req) - sizeof(struct
>> virtio_iommu_req_tail);
>> +    size_t wr_size = sizeof(struct virtio_iommu_req_tail);
>> +    char buffer[64];
>> +    int ret;
>> +
>> +    req.head.type = VIRTIO_IOMMU_T_MAP;
>> +    req.domain = domain;
>> +    req.virt_start = virt_start;
>> +    req.virt_end = virt_end;
>> +    req.phys_start = phys_start;
>> +    req.flags = flags;
>> +
>> +    ro_addr = guest_alloc(alloc, ro_size);
>> +    wr_addr = guest_alloc(alloc, wr_size);
>> +
>> +    qtest_memwrite(qts, ro_addr, &req, ro_size);
>> +    free_head = qvirtqueue_add(qts, vq, ro_addr, ro_size, false, true);
>> +    qvirtqueue_add(qts, vq, wr_addr, wr_size, true, false);
>> +    qvirtqueue_kick(qts, dev, vq, free_head);
>> +    qvirtio_wait_used_elem(qts, dev, vq, free_head, NULL,
>> +   QVIRTIO_IOMMU_TIMEOUT_US);
>> +    memread(wr_addr, buffer, wr_size);
>
> qtest_memread() please.

My apologies, I forgot those qtest_memread() comment and the char
buffer[64] replacement too :-(
>
>> +    ret = ((struct virtio_iommu_req_tail *)buffer)->status;
>> +    guest_free(alloc, ro_addr);
>> +    guest_free(alloc, wr_addr);
>> +    return ret;
>> +}
>> +
>> +/**
>> + * send_unmap - Send an unmap command to the device
>> + * @domain: domain the new binding is attached to
>> + * @virt_start: iova start
>> + * @virt_end: iova end
>> + */
>> +static int send_unmap(QTestState *qts, QVirtioIOMMU *v_iommu,
>> +  uint32_t domain, uint64_t virt_start, uint64_t
>> virt_end)
>> +{
>> +    QVirtioDevice *dev = v_iommu->vdev;
>> +    QVirtQueue *vq = v_iommu->vq;
>> +    uint64_t ro_addr, wr_addr;
>> +    uint32_t free_head;
>> +    struct virtio_iommu_req_unmap req;
>> +    size_t ro_size = sizeof(req) - sizeof(struct
>> virtio_iommu_req_tail);
>> +    size_t wr_size = sizeof(struct virtio_iommu_req_tail);
>> +    char buffer[64];
>> +    int ret;
>> +
>> +    req.head.type = VIRTIO_IOMMU_T_UNMAP;
>> +    req.domain = domain;
>> +    req.virt_start = virt_start;
>> +    req.virt_end = virt_end;
>> +
>> +    ro_addr = guest_alloc(alloc, ro_size);
>> +    wr_addr = guest_alloc(alloc, wr_size);
>> +
>> +    qtest_memwrite(qts, ro_addr, &req, ro_size);
>> +    free_head = qvirtqueue_add(qts, vq, ro_addr, ro_size, false, true);
>> +    qvirtqueue_add(qts, vq, wr_addr, wr_size, true, false);
>> +    qvirtqueue_kick(qts, dev, vq, free_head);
>> +    qvirtio_wait_used_elem(qts, dev, vq, free_head, NULL,
>> +   QVIRTIO_IOMMU_TIMEOUT_US);
>> +    memread(wr_addr, buffer, wr_size);
>
> qtest_memread() please.
>
>> +    ret = ((struct virtio_iommu_req_tail *)buffer)->status;
>> +    guest_free(alloc, ro_addr);
>> +    guest_free(alloc, wr_addr);
>> +    return ret;
>> +}
>
> With the two memread()s changed into qtest_memread()s:
>
> Acked-by: Thomas Huth 
Thank you for your patience!

Eric

Re: [PATCH v3 1/2] virtio-iommu: Remove the non transitional name

2021-10-14 Thread Jean-Philippe Brucker

On Wed, Oct 13, 2021 at 03:17:54PM -0400, Eric Auger wrote:
> Remove the non transitional name for virtio iommu. Like other
> devices introduced after 1.0 spec, the virtio-iommu does
> not need it.
> 
> Signed-off-by: Eric Auger 
> Reported-by: Andrea Bolognani 
> Reviewed-by: Cornelia Huck 

Reviewed-by: Jean-Philippe Brucker 

> ---
>  hw/virtio/virtio-iommu-pci.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/hw/virtio/virtio-iommu-pci.c b/hw/virtio/virtio-iommu-pci.c
> index 770c286be7..86fa4e6c28 100644
> --- a/hw/virtio/virtio-iommu-pci.c
> +++ b/hw/virtio/virtio-iommu-pci.c
> @@ -100,7 +100,6 @@ static void virtio_iommu_pci_instance_init(Object *obj)
>  static const VirtioPCIDeviceTypeInfo virtio_iommu_pci_info = {
>  .base_name = TYPE_VIRTIO_IOMMU_PCI,
>  .generic_name  = "virtio-iommu-pci",
> -.non_transitional_name = "virtio-iommu-pci-non-transitional",
>  .instance_size = sizeof(VirtIOIOMMUPCI),
>  .instance_init = virtio_iommu_pci_instance_init,
>  .class_init= virtio_iommu_pci_class_init,
> -- 
> 2.27.0
>

Re: [PATCH v3 2/2] virtio-iommu: Drop base_name and change generic_name

2021-10-14 Thread Jean-Philippe Brucker

On Wed, Oct 13, 2021 at 03:17:55PM -0400, Eric Auger wrote:
> Drop base_name and turn generic_name into
> "virtio-iommu-pci". This is more in line with
> other modern-only devices.
> 
> Signed-off-by: Eric Auger 
> Suggested-by: Cornelia Huck 
> Reviewed-by: Cornelia Huck 

Reviewed-by: Jean-Philippe Brucker 

> ---
>  hw/virtio/virtio-iommu-pci.c | 3 +--
>  include/hw/virtio/virtio-iommu.h | 2 +-
>  2 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/virtio/virtio-iommu-pci.c b/hw/virtio/virtio-iommu-pci.c
> index 86fa4e6c28..a160ae6b41 100644
> --- a/hw/virtio/virtio-iommu-pci.c
> +++ b/hw/virtio/virtio-iommu-pci.c
> @@ -98,8 +98,7 @@ static void virtio_iommu_pci_instance_init(Object *obj)
>  }
>  
>  static const VirtioPCIDeviceTypeInfo virtio_iommu_pci_info = {
> -.base_name = TYPE_VIRTIO_IOMMU_PCI,
> -.generic_name  = "virtio-iommu-pci",
> +.generic_name  = TYPE_VIRTIO_IOMMU_PCI,
>  .instance_size = sizeof(VirtIOIOMMUPCI),
>  .instance_init = virtio_iommu_pci_instance_init,
>  .class_init= virtio_iommu_pci_class_init,
> diff --git a/include/hw/virtio/virtio-iommu.h 
> b/include/hw/virtio/virtio-iommu.h
> index 273e35c04b..e2339e5b72 100644
> --- a/include/hw/virtio/virtio-iommu.h
> +++ b/include/hw/virtio/virtio-iommu.h
> @@ -26,7 +26,7 @@
>  #include "qom/object.h"
>  
>  #define TYPE_VIRTIO_IOMMU "virtio-iommu-device"
> -#define TYPE_VIRTIO_IOMMU_PCI "virtio-iommu-device-base"
> +#define TYPE_VIRTIO_IOMMU_PCI "virtio-iommu-pci"
>  OBJECT_DECLARE_SIMPLE_TYPE(VirtIOIOMMU, VIRTIO_IOMMU)
>  
>  #define TYPE_VIRTIO_IOMMU_MEMORY_REGION "virtio-iommu-memory-region"
> -- 
> 2.27.0
>

Re: [PATCH v2 08/13] target/riscv: Replace is_32bit with get_xl/get_xlen

2021-10-14 Thread LIU Zhiwei




On 2021/10/14 上午4:50, Richard Henderson wrote:

In preparation for RV128, replace a simple predicate
with a more versatile test.

Signed-off-by: Richard Henderson 

Reviewed-by: LIU Zhiwei

Zhiwei

---
  target/riscv/translate.c | 32 +---
  1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 7e7bb67d15..5724a62bb0 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -91,16 +91,18 @@ static inline bool has_ext(DisasContext *ctx, uint32_t ext)
  }
  
  #ifdef TARGET_RISCV32

-# define is_32bit(ctx)  true
+#define get_xl(ctx)MXL_RV32
  #elif defined(CONFIG_USER_ONLY)
-# define is_32bit(ctx)  false
+#define get_xl(ctx)MXL_RV64
  #else
-static inline bool is_32bit(DisasContext *ctx)
-{
-return ctx->xl == MXL_RV32;
-}
+#define get_xl(ctx)((ctx)->xl)
  #endif
  
+static inline int get_xlen(DisasContext *ctx)

+{
+return 16 << get_xl(ctx);
+}
+
  /* The word size for this operation. */
  static inline int oper_len(DisasContext *ctx)
  {
@@ -282,7 +284,7 @@ static void gen_jal(DisasContext *ctx, int rd, target_ulong 
imm)
  static void mark_fs_dirty(DisasContext *ctx)
  {
  TCGv tmp;
-target_ulong sd = is_32bit(ctx) ? MSTATUS32_SD : MSTATUS64_SD;
+target_ulong sd = get_xl(ctx) == MXL_RV32 ? MSTATUS32_SD : MSTATUS64_SD;
  
  if (ctx->mstatus_fs != MSTATUS_FS) {

  /* Remember the state change for the rest of the TB. */
@@ -341,16 +343,16 @@ EX_SH(12)
  }  \
  } while (0)
  
-#define REQUIRE_32BIT(ctx) do { \

-if (!is_32bit(ctx)) {   \
-return false;   \
-}   \
+#define REQUIRE_32BIT(ctx) do {\
+if (get_xl(ctx) != MXL_RV32) { \
+return false;  \
+}  \
  } while (0)
  
-#define REQUIRE_64BIT(ctx) do { \

-if (is_32bit(ctx)) {\
-return false;   \
-}   \
+#define REQUIRE_64BIT(ctx) do {\
+if (get_xl(ctx) < MXL_RV64) {  \
+return false;  \
+}  \
  } while (0)
  
  static int ex_rvc_register(DisasContext *ctx, int reg)

[PATCH v3] tests: qtest: Add virtio-iommu test

2021-10-14 Thread Eric Auger

Add the framework to test the virtio-iommu-pci device
and tests exercising the attach/detach, map/unmap API.

Signed-off-by: Eric Auger 
Acked-by: Thomas Huth 

---

This applies on top of jean-Philippe's
[PATCH v4 00/11] virtio-iommu: Add ACPI support
branch can be found at:
https://github.com/eauger/qemu.git
branch qtest-virtio-iommu-v3

To run the tests:
make tests/qtest/qos-test
cd build
QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon 
QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64  tests/qtest/qos-test

v2 -> v3:
- s/memread/qtest_memread
- s/char buffer[64]/struct virtio_iommu_req_tail buffer
- added Thomas' A-b

v1 -> v2:
- fix the license info (Thomas)
- use UINT64_MAX (Philippe)
---
 tests/qtest/libqos/meson.build|   1 +
 tests/qtest/libqos/virtio-iommu.c | 172 +
 tests/qtest/libqos/virtio-iommu.h |  40 
 tests/qtest/meson.build   |   1 +
 tests/qtest/virtio-iommu-test.c   | 299 ++
 5 files changed, 513 insertions(+)
 create mode 100644 tests/qtest/libqos/virtio-iommu.c
 create mode 100644 tests/qtest/libqos/virtio-iommu.h
 create mode 100644 tests/qtest/virtio-iommu-test.c

diff --git a/tests/qtest/libqos/meson.build b/tests/qtest/libqos/meson.build
index 1f5c8f1053..ba90bbe2b8 100644
--- a/tests/qtest/libqos/meson.build
+++ b/tests/qtest/libqos/meson.build
@@ -40,6 +40,7 @@ libqos_srcs = files('../libqtest.c',
 'virtio-rng.c',
 'virtio-scsi.c',
 'virtio-serial.c',
+'virtio-iommu.c',
 
 # qgraph machines:
 'aarch64-xlnx-zcu102-machine.c',
diff --git a/tests/qtest/libqos/virtio-iommu.c 
b/tests/qtest/libqos/virtio-iommu.c
new file mode 100644
index 00..fd355ff0a4
--- /dev/null
+++ b/tests/qtest/libqos/virtio-iommu.c
@@ -0,0 +1,172 @@
+/*
+ * libqos driver virtio-iommu-pci framework
+ *
+ * Copyright (c) 2021 Red Hat, Inc.
+ *
+ * Authors:
+ *  Eric Auger 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at your
+ * option) any later version.  See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest.h"
+#include "qemu/module.h"
+#include "qgraph.h"
+#include "virtio-iommu.h"
+#include "hw/virtio/virtio-iommu.h"
+
+static QGuestAllocator *alloc;
+
+/* virtio-iommu-device */
+static void *qvirtio_iommu_get_driver(QVirtioIOMMU *v_iommu,
+  const char *interface)
+{
+if (!g_strcmp0(interface, "virtio-iommu")) {
+return v_iommu;
+}
+if (!g_strcmp0(interface, "virtio")) {
+return v_iommu->vdev;
+}
+
+fprintf(stderr, "%s not present in virtio-iommu-device\n", interface);
+g_assert_not_reached();
+}
+
+static void *qvirtio_iommu_device_get_driver(void *object,
+ const char *interface)
+{
+QVirtioIOMMUDevice *v_iommu = object;
+return qvirtio_iommu_get_driver(&v_iommu->iommu, interface);
+}
+
+static void virtio_iommu_cleanup(QVirtioIOMMU *interface)
+{
+qvirtqueue_cleanup(interface->vdev->bus, interface->vq, alloc);
+}
+
+static void virtio_iommu_setup(QVirtioIOMMU *interface)
+{
+QVirtioDevice *vdev = interface->vdev;
+uint64_t features;
+
+features = qvirtio_get_features(vdev);
+features &= ~(QVIRTIO_F_BAD_FEATURE |
+  (1ull << VIRTIO_RING_F_INDIRECT_DESC) |
+  (1ull << VIRTIO_RING_F_EVENT_IDX) |
+  (1ull << VIRTIO_IOMMU_F_BYPASS));
+qvirtio_set_features(vdev, features);
+interface->vq = qvirtqueue_setup(interface->vdev, alloc, 0);
+qvirtio_set_driver_ok(interface->vdev);
+}
+
+static void qvirtio_iommu_device_destructor(QOSGraphObject *obj)
+{
+QVirtioIOMMUDevice *v_iommu = (QVirtioIOMMUDevice *) obj;
+QVirtioIOMMU *iommu = &v_iommu->iommu;
+
+virtio_iommu_cleanup(iommu);
+}
+
+static void qvirtio_iommu_device_start_hw(QOSGraphObject *obj)
+{
+QVirtioIOMMUDevice *v_iommu = (QVirtioIOMMUDevice *) obj;
+QVirtioIOMMU *iommu = &v_iommu->iommu;
+
+virtio_iommu_setup(iommu);
+}
+
+static void *virtio_iommu_device_create(void *virtio_dev,
+QGuestAllocator *t_alloc,
+void *addr)
+{
+QVirtioIOMMUDevice *virtio_rdevice = g_new0(QVirtioIOMMUDevice, 1);
+QVirtioIOMMU *interface = &virtio_rdevice->iommu;
+
+interface->vdev = virtio_dev;
+alloc = t_alloc;
+
+virtio_rdevice->obj.get_driver = qvirtio_iommu_device_get_driver;
+virtio_rdevice->obj.destructor = qvirtio_iommu_device_destructor;
+virtio_rdevice->obj.start_hw = qvirtio_iommu_device_start_hw;
+
+return &virtio_rdevice->obj;
+}
+
+/* virtio-iommu-pci */
+static void *qvirtio_iommu_pci_get_driver(void *object, const char *interface)
+{
+QVirtioIOMMUPCI *v_iommu = object;
+if (!g_strcmp0(interface, "pci-device")) {
+return v_iommu->pci_vdev.pdev;
+}
+return qvirtio_iommu_get_driver(&v_io

Re: [PATCH] hw/elf_ops.h: switch to ssize_t for elf loader return type

2021-10-14 Thread Stefano Garzarella


On Wed, Oct 06, 2021 at 09:28:39PM +0200, Luc Michel wrote:

Until now, int was used as the return type for all the ELF
loader related functions. The returned value is the sum of all loaded
program headers "MemSize" fields.

Because of the overflow check in elf_ops.h, trying to load an ELF bigger
than INT_MAX will fail. Switch to ssize_t to remove this limitation.

Signed-off-by: Luc Michel 
---
include/hw/elf_ops.h | 25 +-
include/hw/loader.h  | 60 ++--
hw/core/loader.c | 60 +++-
3 files changed, 74 insertions(+), 71 deletions(-)

diff --git a/include/hw/elf_ops.h b/include/hw/elf_ops.h
index 1c37cec4ae..5c2ea0339e 100644
--- a/include/hw/elf_ops.h
+++ b/include/hw/elf_ops.h
@@ -310,24 +310,25 @@ static struct elf_note *glue(get_elf_note_type, 
SZ)(struct elf_note *nhdr,
}

return nhdr;
}

-static int glue(load_elf, SZ)(const char *name, int fd,
-  uint64_t (*elf_note_fn)(void *, void *, bool),
-  uint64_t (*translate_fn)(void *, uint64_t),
-  void *translate_opaque,
-  int must_swab, uint64_t *pentry,
-  uint64_t *lowaddr, uint64_t *highaddr,
-  uint32_t *pflags, int elf_machine,
-  int clear_lsb, int data_swab,
-  AddressSpace *as, bool load_rom,
-  symbol_fn_t sym_cb)
+static ssize_t glue(load_elf, SZ)(const char *name, int fd,
+  uint64_t (*elf_note_fn)(void *, void *, 
bool),
+  uint64_t (*translate_fn)(void *, uint64_t),
+  void *translate_opaque,
+  int must_swab, uint64_t *pentry,
+  uint64_t *lowaddr, uint64_t *highaddr,
+  uint32_t *pflags, int elf_machine,
+  int clear_lsb, int data_swab,
+  AddressSpace *as, bool load_rom,
+  symbol_fn_t sym_cb)
{
struct elfhdr ehdr;
struct elf_phdr *phdr = NULL, *ph;
-int size, i, total_size;
+int size, i;
+ssize_t total_size;
elf_word mem_size, file_size, data_offset;
uint64_t addr, low = (uint64_t)-1, high = 0;
GMappedFile *mapped_file = NULL;
uint8_t *data = NULL;
int ret = ELF_LOAD_FAILED;


Since we assign `total_size` to `ret` and we return `ret`, `ret` should 
be ssize_t too, right?


The rest LGTM.

Thanks,
Stefano

Re: [PATCH v2 09/13] target/riscv: Replace DisasContext.w with DisasContext.ol

2021-10-14 Thread LIU Zhiwei




On 2021/10/14 上午4:51, Richard Henderson wrote:

In preparation for RV128, consider more than just "w" for
operand size modification.  This will be used for the "d"
insns from RV128 as well.

Rename oper_len to get_olen to better match get_xlen.

Signed-off-by: Richard Henderson 
---
  target/riscv/translate.c| 71 -
  target/riscv/insn_trans/trans_rvb.c.inc |  8 +--
  target/riscv/insn_trans/trans_rvi.c.inc | 18 +++
  target/riscv/insn_trans/trans_rvm.c.inc | 10 ++--
  4 files changed, 63 insertions(+), 44 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 5724a62bb0..6ab5c6aa58 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -67,7 +67,7 @@ typedef struct DisasContext {
 to any system register, which includes CSR_FRM, so we do not have
 to reset this known value.  */
  int frm;
-bool w;
+RISCVMXL ol;


Why not directly use the xl?

Thanks,
Zhiwei


  bool virt_enabled;
  bool ext_ifencei;
  bool hlsx;
@@ -103,12 +103,17 @@ static inline int get_xlen(DisasContext *ctx)
  return 16 << get_xl(ctx);
  }
  
-/* The word size for this operation. */

-static inline int oper_len(DisasContext *ctx)
-{
-return ctx->w ? 32 : TARGET_LONG_BITS;
-}
+/* The operation length, as opposed to the xlen. */
+#ifdef TARGET_RISCV32
+#define get_ol(ctx)MXL_RV32
+#else
+#define get_ol(ctx)((ctx)->ol)
+#endif
  
+static inline int get_olen(DisasContext *ctx)

+{
+return 16 << get_ol(ctx);
+}
  
  /*

   * RISC-V requires NaN-boxing of narrower width floating point values.
@@ -221,24 +226,34 @@ static TCGv get_gpr(DisasContext *ctx, int reg_num, 
DisasExtend ext)
  return ctx->zero;
  }
  
-switch (ctx->w ? ext : EXT_NONE) {

-case EXT_NONE:
-return cpu_gpr[reg_num];
-case EXT_SIGN:
-t = temp_new(ctx);
-tcg_gen_ext32s_tl(t, cpu_gpr[reg_num]);
-return t;
-case EXT_ZERO:
-t = temp_new(ctx);
-tcg_gen_ext32u_tl(t, cpu_gpr[reg_num]);
-return t;
+switch (get_ol(ctx)) {
+case MXL_RV32:
+switch (ext) {
+case EXT_NONE:
+break;
+case EXT_SIGN:
+t = temp_new(ctx);
+tcg_gen_ext32s_tl(t, cpu_gpr[reg_num]);
+return t;
+case EXT_ZERO:
+t = temp_new(ctx);
+tcg_gen_ext32u_tl(t, cpu_gpr[reg_num]);
+return t;
+default:
+g_assert_not_reached();
+}
+break;
+case MXL_RV64:
+break;
+default:
+g_assert_not_reached();
  }
-g_assert_not_reached();
+return cpu_gpr[reg_num];
  }
  
  static TCGv dest_gpr(DisasContext *ctx, int reg_num)

  {
-if (reg_num == 0 || ctx->w) {
+if (reg_num == 0 || get_olen(ctx) < TARGET_LONG_BITS) {
  return temp_new(ctx);
  }
  return cpu_gpr[reg_num];
@@ -247,10 +262,15 @@ static TCGv dest_gpr(DisasContext *ctx, int reg_num)
  static void gen_set_gpr(DisasContext *ctx, int reg_num, TCGv t)
  {
  if (reg_num != 0) {
-if (ctx->w) {
+switch (get_ol(ctx)) {
+case MXL_RV32:
  tcg_gen_ext32s_tl(cpu_gpr[reg_num], t);
-} else {
+break;
+case MXL_RV64:
  tcg_gen_mov_tl(cpu_gpr[reg_num], t);
+break;
+default:
+g_assert_not_reached();
  }
  }
  }
@@ -411,7 +431,7 @@ static bool gen_shift_imm_fn(DisasContext *ctx, arg_shift 
*a, DisasExtend ext,
   void (*func)(TCGv, TCGv, target_long))
  {
  TCGv dest, src1;
-int max_len = oper_len(ctx);
+int max_len = get_olen(ctx);
  
  if (a->shamt >= max_len) {

  return false;
@@ -430,7 +450,7 @@ static bool gen_shift_imm_tl(DisasContext *ctx, arg_shift 
*a, DisasExtend ext,
   void (*func)(TCGv, TCGv, TCGv))
  {
  TCGv dest, src1, src2;
-int max_len = oper_len(ctx);
+int max_len = get_olen(ctx);
  
  if (a->shamt >= max_len) {

  return false;
@@ -454,7 +474,7 @@ static bool gen_shift(DisasContext *ctx, arg_r *a, 
DisasExtend ext,
  TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE);
  TCGv ext2 = tcg_temp_new();
  
-tcg_gen_andi_tl(ext2, src2, oper_len(ctx) - 1);

+tcg_gen_andi_tl(ext2, src2, get_olen(ctx) - 1);
  func(dest, src1, ext2);
  
  gen_set_gpr(ctx, a->rd, dest);

@@ -554,7 +574,6 @@ static void riscv_tr_init_disas_context(DisasContextBase 
*dcbase, CPUState *cs)
  ctx->vl_eq_vlmax = FIELD_EX32(tb_flags, TB_FLAGS, VL_EQ_VLMAX);
  ctx->xl = FIELD_EX32(tb_flags, TB_FLAGS, XL);
  ctx->cs = cs;
-ctx->w = false;
  ctx->ntemp = 0;
  memset(ctx->temp, 0, sizeof(ctx->temp));
  
@@ -578,9 +597,9 @@ static void riscv_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu)

  CPURISCVState *env = cpu->env_ptr;
  uint16_t opcode16 = translator_lduw(env, &ctx->base, ctx->

Re: [PATCH v1 0/9] migration/ram: Optimize for virtio-mem via RamDiscardManager

2021-10-14 Thread David Hildenbrand

On 13.10.21 19:35, Dr. David Alan Gilbert wrote:
> * David Hildenbrand (da...@redhat.com) wrote:
>> This series is fully reviewed by Peter and I hope we can get either more
>> review feedback or get it merged via the migration tree soonish. Thanks.
> 
> Yep, I think that's a full set now; we should take this via migration.

Thanks! (BTW, this is actually v6, I messed up :) )


-- 
Thanks,

David / dhildenb

Re: [PATCH v1 0/9] migration/ram: Optimize for virtio-mem via RamDiscardManager

2021-10-14 Thread David Hildenbrand

On 13.10.21 19:35, Dr. David Alan Gilbert wrote:
> * David Hildenbrand (da...@redhat.com) wrote:
>> This series is fully reviewed by Peter and I hope we can get either more
>> review feedback or get it merged via the migration tree soonish. Thanks.
> 
> Yep, I think that's a full set now; we should take this via migration.

Thanks! (BTW, this is actually v6, I messed up :) )


-- 
Thanks,

David / dhildenb

Re: [PATCH 1/2] qga-win: Detect OS based on Windows 10 by first build number

2021-10-14 Thread Marc-André Lureau

On Tue, Sep 14, 2021 at 4:18 PM Kostiantyn Kostiuk 
wrote:

> Windows Server 2016, 2019, 2022 are based on Windows 10 and
> have the same major and minor versions. So, the only way to
> detect the proper version is to use the build number.
>
> Before this commit, the guest agent use the last build number
> for each OS, but it causes problems when new OS releases.
> There are few preview versions before release, and we
> can't update this list.
>
> After this commit, the guest agent will use the first build
> number. For each new preview version or release version,
> Microsoft increases the build number, so we can add the number
> of the first preview build and this will work until the new
> OS release.
>
> Signed-off-by: Kostiantyn Kostiuk 
>

Reviewed-by: Marc-André Lureau 

---
>  qga/commands-win32.c | 18 +++---
>  1 file changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/qga/commands-win32.c b/qga/commands-win32.c
> index 4e84afd83b..a8e9d40b31 100644
> --- a/qga/commands-win32.c
> +++ b/qga/commands-win32.c
> @@ -2162,7 +2162,7 @@ static ga_matrix_lookup_t const
> WIN_VERSION_MATRIX[2][8] = {
>  };
>
>  typedef struct _ga_win_10_0_server_t {
> -int final_build;
> +int first_build;
>  char const *version;
>  char const *version_id;
>  } ga_win_10_0_server_t;
> @@ -2202,18 +2202,22 @@ static char *ga_get_win_name(OSVERSIONINFOEXW
> const *os_version, bool id)
>  int tbl_idx = (os_version->wProductType != VER_NT_WORKSTATION);
>  ga_matrix_lookup_t const *table = WIN_VERSION_MATRIX[tbl_idx];
>  ga_win_10_0_server_t const *win_10_0_table =
> WIN_10_0_SERVER_VERSION_MATRIX;
> +ga_win_10_0_server_t const *win_10_0_version = NULL;
>  while (table->version != NULL) {
>  if (major == 10 && minor == 0 && tbl_idx) {
>  while (win_10_0_table->version != NULL) {
> -if (build <= win_10_0_table->final_build) {
> -if (id) {
> -return g_strdup(win_10_0_table->version_id);
> -} else {
> -return g_strdup(win_10_0_table->version);
> -}
> +if (build >= win_10_0_table->first_build) {
> +win_10_0_version = win_10_0_table;
>  }
>  win_10_0_table++;
>  }
> +if (win_10_0_table) {
> +if (id) {
> +return g_strdup(win_10_0_version->version_id);
> +} else {
> +return g_strdup(win_10_0_version->version);
> +}
> +}
>  } else if (major == table->major && minor == table->minor) {
>  if (id) {
>  return g_strdup(table->version_id);
> --
> 2.33.0
>
>
>

-- 
Marc-André Lureau

Re: [PATCH] hw/elf_ops.h: switch to ssize_t for elf loader return type

2021-10-14 Thread Luc Michel

On 10:36 Thu 14 Oct , Stefano Garzarella wrote:
> On Wed, Oct 06, 2021 at 09:28:39PM +0200, Luc Michel wrote:
> >Until now, int was used as the return type for all the ELF
> >loader related functions. The returned value is the sum of all loaded
> >program headers "MemSize" fields.
> >
> >Because of the overflow check in elf_ops.h, trying to load an ELF bigger
> >than INT_MAX will fail. Switch to ssize_t to remove this limitation.
> >
> >Signed-off-by: Luc Michel 
> >---
> > include/hw/elf_ops.h | 25 +-
> > include/hw/loader.h  | 60 ++--
> > hw/core/loader.c | 60 +++-
> > 3 files changed, 74 insertions(+), 71 deletions(-)
> >
> >diff --git a/include/hw/elf_ops.h b/include/hw/elf_ops.h
> >index 1c37cec4ae..5c2ea0339e 100644
> >--- a/include/hw/elf_ops.h
> >+++ b/include/hw/elf_ops.h
> >@@ -310,24 +310,25 @@ static struct elf_note *glue(get_elf_note_type, 
> >SZ)(struct elf_note *nhdr,
> > }
> >
> > return nhdr;
> > }
> >
> >-static int glue(load_elf, SZ)(const char *name, int fd,
> >-  uint64_t (*elf_note_fn)(void *, void *, bool),
> >-  uint64_t (*translate_fn)(void *, uint64_t),
> >-  void *translate_opaque,
> >-  int must_swab, uint64_t *pentry,
> >-  uint64_t *lowaddr, uint64_t *highaddr,
> >-  uint32_t *pflags, int elf_machine,
> >-  int clear_lsb, int data_swab,
> >-  AddressSpace *as, bool load_rom,
> >-  symbol_fn_t sym_cb)
> >+static ssize_t glue(load_elf, SZ)(const char *name, int fd,
> >+  uint64_t (*elf_note_fn)(void *, void *, 
> >bool),
> >+  uint64_t (*translate_fn)(void *, 
> >uint64_t),
> >+  void *translate_opaque,
> >+  int must_swab, uint64_t *pentry,
> >+  uint64_t *lowaddr, uint64_t *highaddr,
> >+  uint32_t *pflags, int elf_machine,
> >+  int clear_lsb, int data_swab,
> >+  AddressSpace *as, bool load_rom,
> >+  symbol_fn_t sym_cb)
> > {
> > struct elfhdr ehdr;
> > struct elf_phdr *phdr = NULL, *ph;
> >-int size, i, total_size;
> >+int size, i;
> >+ssize_t total_size;
> > elf_word mem_size, file_size, data_offset;
> > uint64_t addr, low = (uint64_t)-1, high = 0;
> > GMappedFile *mapped_file = NULL;
> > uint8_t *data = NULL;
> > int ret = ELF_LOAD_FAILED;
> 
> Since we assign `total_size` to `ret` and we return `ret`, `ret` should 
> be ssize_t too, right?
Yes you are right I missed this one. I'll send a v2.
Thanks.
 
Luc

> 
> The rest LGTM.
> 
> Thanks,
> Stefano
> 
> 
> 
> To declare a filtering error, please use the following link : 
> https://www.security-mail.net/reporter.php?mid=7f69.6167ec18.9b19a.0&r=lmichel%40kalray.eu&s=sgarzare%40redhat.com&o=Re%3A+%5BPATCH%5D+hw%2Felf_ops.h%3A+switch+to+ssize_t+for+elf+loader+return+type&verdict=C&c=618071aa1c7ceb467a44ac8717a3e44186291f37
> 

--

Re: [PATCH v2 09/13] target/riscv: Replace DisasContext.w with DisasContext.ol

2021-10-14 Thread Frédéric Pétrot

Hi,
Le 14/10/2021 à 10:40, LIU Zhiwei a écrit :
> 
> On 2021/10/14 上午4:51, Richard Henderson wrote:
>> In preparation for RV128, consider more than just "w" for
>> operand size modification.  This will be used for the "d"
>> insns from RV128 as well.
>>
>> Rename oper_len to get_olen to better match get_xlen.
>>
>> Signed-off-by: Richard Henderson 
>> ---
>>   target/riscv/translate.c    | 71 -
>>   target/riscv/insn_trans/trans_rvb.c.inc |  8 +--
>>   target/riscv/insn_trans/trans_rvi.c.inc | 18 +++
>>   target/riscv/insn_trans/trans_rvm.c.inc | 10 ++--
>>   4 files changed, 63 insertions(+), 44 deletions(-)
>>
>> diff --git a/target/riscv/translate.c b/target/riscv/translate.c
>> index 5724a62bb0..6ab5c6aa58 100644
>> --- a/target/riscv/translate.c
>> +++ b/target/riscv/translate.c
>> @@ -67,7 +67,7 @@ typedef struct DisasContext {
>>  to any system register, which includes CSR_FRM, so we do not have
>>  to reset this known value.  */
>>   int frm;
>> -    bool w;
>> +    RISCVMXL ol;
> 
> Why not directly use the xl?

  Hi Zhiwei,

  I am not speaking for Richard, but my understanding is that 'ol' is linked to
  the instruction being translated, suffixed by 'w' in rv64 and 'w' and 'd' in
  rv128, while 'xl' is the value in mstatus (or misa) depending on the register
  size of the current execution (mxl, sxl, uxl).
  Note that the spec says that the insns results should be sign-extended to the
  maximum mxl size supported by the processor (held in misa_mxl_max, that is
  implicitely in tl right now, but that will need to be put into the disas
  context in 128-bit).
  To summarize, we have the maximum (processor instanciation time) register size
  in misa_mxl_max, the current register size in xl, and the current insn size in
  ol.

  Frédéric
> 
> Thanks,
> Zhiwei
> 
>>   bool virt_enabled;
>>   bool ext_ifencei;
>>   bool hlsx;
>> @@ -103,12 +103,17 @@ static inline int get_xlen(DisasContext *ctx)
>>   return 16 << get_xl(ctx);
>>   }
>>   -/* The word size for this operation. */
>> -static inline int oper_len(DisasContext *ctx)
>> -{
>> -    return ctx->w ? 32 : TARGET_LONG_BITS;
>> -}
>> +/* The operation length, as opposed to the xlen. */
>> +#ifdef TARGET_RISCV32
>> +#define get_ol(ctx)    MXL_RV32
>> +#else
>> +#define get_ol(ctx)    ((ctx)->ol)
>> +#endif
>>   +static inline int get_olen(DisasContext *ctx)
>> +{
>> +    return 16 << get_ol(ctx);
>> +}
>>     /*
>>    * RISC-V requires NaN-boxing of narrower width floating point values.
>> @@ -221,24 +226,34 @@ static TCGv get_gpr(DisasContext *ctx, int reg_num,
>> DisasExtend ext)
>>   return ctx->zero;
>>   }
>>   -    switch (ctx->w ? ext : EXT_NONE) {
>> -    case EXT_NONE:
>> -    return cpu_gpr[reg_num];
>> -    case EXT_SIGN:
>> -    t = temp_new(ctx);
>> -    tcg_gen_ext32s_tl(t, cpu_gpr[reg_num]);
>> -    return t;
>> -    case EXT_ZERO:
>> -    t = temp_new(ctx);
>> -    tcg_gen_ext32u_tl(t, cpu_gpr[reg_num]);
>> -    return t;
>> +    switch (get_ol(ctx)) {
>> +    case MXL_RV32:
>> +    switch (ext) {
>> +    case EXT_NONE:
>> +    break;
>> +    case EXT_SIGN:
>> +    t = temp_new(ctx);
>> +    tcg_gen_ext32s_tl(t, cpu_gpr[reg_num]);
>> +    return t;
>> +    case EXT_ZERO:
>> +    t = temp_new(ctx);
>> +    tcg_gen_ext32u_tl(t, cpu_gpr[reg_num]);
>> +    return t;
>> +    default:
>> +    g_assert_not_reached();
>> +    }
>> +    break;
>> +    case MXL_RV64:
>> +    break;
>> +    default:
>> +    g_assert_not_reached();
>>   }
>> -    g_assert_not_reached();
>> +    return cpu_gpr[reg_num];
>>   }
>>     static TCGv dest_gpr(DisasContext *ctx, int reg_num)
>>   {
>> -    if (reg_num == 0 || ctx->w) {
>> +    if (reg_num == 0 || get_olen(ctx) < TARGET_LONG_BITS) {
>>   return temp_new(ctx);
>>   }
>>   return cpu_gpr[reg_num];
>> @@ -247,10 +262,15 @@ static TCGv dest_gpr(DisasContext *ctx, int reg_num)
>>   static void gen_set_gpr(DisasContext *ctx, int reg_num, TCGv t)
>>   {
>>   if (reg_num != 0) {
>> -    if (ctx->w) {
>> +    switch (get_ol(ctx)) {
>> +    case MXL_RV32:
>>   tcg_gen_ext32s_tl(cpu_gpr[reg_num], t);
>> -    } else {
>> +    break;
>> +    case MXL_RV64:
>>   tcg_gen_mov_tl(cpu_gpr[reg_num], t);
>> +    break;
>> +    default:
>> +    g_assert_not_reached();
>>   }
>>   }
>>   }
>> @@ -411,7 +431,7 @@ static bool gen_shift_imm_fn(DisasContext *ctx, arg_shift
>> *a, DisasExtend ext,
>>    void (*func)(TCGv, TCGv, target_long))
>>   {
>>   TCGv dest, src1;
>> -    int max_len = oper_len(ctx);
>> +    int max_len = get_olen(ctx);
>>     if (a->shamt >= max_len) {
>>   return false;
>> @@ -430,7 +450,7 @@ static bool gen_shift_imm_tl(DisasContext *ctx, arg_

Re: [PATCH v3] hw/usb/vt82c686-uhci-pci: Use ISA instead of PCI interrupts

2021-10-14 Thread Gerd Hoffmann

On Wed, Oct 13, 2021 at 02:13:09PM +0200, BALATON Zoltan wrote:
> This device is part of a superio/ISA bridge chip and IRQs from it are
> routed to an ISA interrupt set by the Interrupt Line PCI config
> register. Change uhci_update_irq() to allow this and implement it in
> vt82c686-uhci-pci.

Looks good.  There are some unrelated changes in though (whitespace,
comments, ...), and the vt82c686-uhci-pci.c changes should be a
separate patch.

thanks,
  Gerd

Re: [PATCH v3] hw/usb/vt82c686-uhci-pci: Use ISA instead of PCI interrupts

2021-10-14 Thread Gerd Hoffmann

  Hi,

> >  if (s->masterbus) {
> >  USBPort *ports[NB_PORTS];
> 
> usb_uhci_common_realize() should be refactored making it PCI-agnostic.

Not sure this is needed here.  This seems to be more a platform-specific
oddity in IRQ routing than a non-pci UHCI device.

take care,
  Gerd

[PATCH 0/3] Postcopy migration: Add userfaultfd- user-mode-only capability

2021-10-14 Thread Lin Ma

Since kernel v5.11, Unprivileged user (without SYS_CAP_PTRACE capability)
must pass UFFD_USER_MODE_ONLY to userfaultd in case unprivileged_userfaultfd
sysctl knob is 0.
Please refer to https://lwn.net/Articles/819834/ and the kernel commits:
37cd0575 userfaultfd: add UFFD_USER_MODE_ONLY
d0d4730a userfaultfd: add user-mode only option to unprivileged_userfaultfd 
sysctl knob

This patch set adds a migration capability to pass UFFD_USER_MODE_ONLY
for postcopy migration.

Lin Ma (3):
  migration: introduce postcopy-uffd-usermode-only capability
  migration: postcopy-uffd-usermode-only documentation
  tests: add postcopy-uffd-usermode-only capability into migration-test

 docs/devel/migration.rst |  9 +
 migration/migration.c|  9 +
 migration/migration.h|  1 +
 migration/postcopy-ram.c | 22 +++---
 qapi/migration.json  |  8 +++-
 tests/qtest/migration-test.c | 11 +--
 6 files changed, 54 insertions(+), 6 deletions(-)

-- 
2.26.2

[PATCH 1/3] migration: introduce postcopy-uffd-usermode-only capability

2021-10-14 Thread Lin Ma

The default value of unprivileged_userfaultfd sysctl knob was changed to
0 since kernel v5.11 by commit d0d4730a:
userfaultfd: add user-mode only option to unprivileged_userfaultfd sysctl knob.

In this mode, An unprivileged user (without SYS_CAP_PTRACE capability) must
pass UFFD_USER_MODE_ONLY to userfaultd or the API will fail with EPERM.

So add a capability to pass UFFD_USER_MODE_ONLY to support it.

Signed-off-by: Lin Ma 
---
 migration/migration.c|  9 +
 migration/migration.h|  1 +
 migration/postcopy-ram.c | 22 +++---
 qapi/migration.json  |  8 +++-
 4 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 6ac807ef3d..86212dcb70 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2380,6 +2380,15 @@ bool migrate_postcopy_blocktime(void)
 return s->enabled_capabilities[MIGRATION_CAPABILITY_POSTCOPY_BLOCKTIME];
 }
 
+bool migrate_postcopy_uffd_usermode_only(void)
+{
+MigrationState *s;
+
+s = migrate_get_current();
+
+return 
s->enabled_capabilities[MIGRATION_CAPABILITY_POSTCOPY_UFFD_USERMODE_ONLY];
+}
+
 bool migrate_use_compression(void)
 {
 MigrationState *s;
diff --git a/migration/migration.h b/migration/migration.h
index 7a5aa8c2fd..a516d7f59f 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -358,6 +358,7 @@ int migrate_decompress_threads(void);
 bool migrate_use_events(void);
 bool migrate_postcopy_blocktime(void);
 bool migrate_background_snapshot(void);
+bool migrate_postcopy_uffd_usermode_only(void);
 
 /* Sending on the return path - generic and then for each message type */
 void migrate_send_rp_shut(MigrationIncomingState *mis,
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 2e9697bdd2..078c558626 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -206,9 +206,14 @@ static bool receive_ufd_features(uint64_t *features)
 struct uffdio_api api_struct = {0};
 int ufd;
 bool ret = true;
+int flags;
+
+flags = O_CLOEXEC;
+if (migrate_postcopy_uffd_usermode_only())
+flags |= UFFD_USER_MODE_ONLY;
 
 /* if we are here __NR_userfaultfd should exists */
-ufd = syscall(__NR_userfaultfd, O_CLOEXEC);
+ufd = syscall(__NR_userfaultfd, flags);
 if (ufd == -1) {
 error_report("%s: syscall __NR_userfaultfd failed: %s", __func__,
  strerror(errno));
@@ -352,13 +357,18 @@ bool 
postcopy_ram_supported_by_host(MigrationIncomingState *mis)
 struct uffdio_range range_struct;
 uint64_t feature_mask;
 Error *local_err = NULL;
+int flags;
 
 if (qemu_target_page_size() > pagesize) {
 error_report("Target page size bigger than host page size");
 goto out;
 }
 
-ufd = syscall(__NR_userfaultfd, O_CLOEXEC);
+flags = O_CLOEXEC;
+if (migrate_postcopy_uffd_usermode_only())
+flags |= UFFD_USER_MODE_ONLY;
+
+ufd = syscall(__NR_userfaultfd, flags);
 if (ufd == -1) {
 error_report("%s: userfaultfd not available: %s", __func__,
  strerror(errno));
@@ -1064,8 +1074,14 @@ retry:
 
 int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
 {
+int flags;
+
+flags = O_CLOEXEC | O_NONBLOCK;
+if (migrate_postcopy_uffd_usermode_only())
+flags |= UFFD_USER_MODE_ONLY;
+
 /* Open the fd for the kernel to give us userfaults */
-mis->userfault_fd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
+mis->userfault_fd = syscall(__NR_userfaultfd, flags);
 if (mis->userfault_fd == -1) {
 error_report("%s: Failed to open userfault fd: %s", __func__,
  strerror(errno));
diff --git a/qapi/migration.json b/qapi/migration.json
index 88f07baedd..3af1ec4cec 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -452,6 +452,11 @@
 #   procedure starts. The VM RAM is saved with running VM.
 #   (since 6.0)
 #
+# @postcopy-uffd-usermode-only: If enabled, It allows unprivileged users to use
+#   userfaultfd but with the restriction that page
+#   faults from only user mode can be handled.
+#   (since 6.2.0)
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
@@ -459,7 +464,8 @@
'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram',
'block', 'return-path', 'pause-before-switchover', 'multifd',
'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
-   'x-ignore-shared', 'validate-uuid', 'background-snapshot'] }
+   'x-ignore-shared', 'validate-uuid', 'background-snapshot',
+   'postcopy-uffd-usermode-only'] }
 
 ##
 # @MigrationCapabilityStatus:
-- 
2.26.2

[PATCH 2/3] migration: postcopy-uffd-usermode-only documentation

2021-10-14 Thread Lin Ma

Signed-off-by: Lin Ma 
---
 docs/devel/migration.rst | 9 +
 1 file changed, 9 insertions(+)

diff --git a/docs/devel/migration.rst b/docs/devel/migration.rst
index 2401253482..dfdd3f20b4 100644
--- a/docs/devel/migration.rst
+++ b/docs/devel/migration.rst
@@ -639,6 +639,15 @@ postcopy-blocktime value of qmp command will show 
overlapped blocking
 time for all vCPU, postcopy-vcpu-blocktime will show list of blocking
 time per vCPU.
 
+Since kernel v5.11, Unprivileged user (without SYS_CAP_PTRACE capability)
+must pass UFFD_USER_MODE_ONLY to userfaultd if the unprivileged_userfaultfd
+sysctl knob is 0.
+
+To allow unprivileged user postcopy, Issue this command on destination
+monitor prior to turning on postcopy-ram:
+
+``migrate_set_capability postcopy-uffd-usermode-only on``
+
 .. note::
   During the postcopy phase, the bandwidth limits set using
   ``migrate_set_parameter`` is ignored (to avoid delaying requested pages that
-- 
2.26.2

[PATCH 3/3] tests: add postcopy-uffd-usermode-only capability into migration-test

2021-10-14 Thread Lin Ma

Signed-off-by: Lin Ma 
---
 tests/qtest/migration-test.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index cc5e83d98a..0cd4f49bed 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -38,6 +38,7 @@
 unsigned start_address;
 unsigned end_address;
 static bool uffd_feature_thread_id;
+static bool uffd_usermode_only;
 
 /* A downtime where the test really should converge */
 #define CONVERGE_DOWNTIME 1000
@@ -60,8 +61,12 @@ static bool ufd_version_check(void)
 int ufd = syscall(__NR_userfaultfd, O_CLOEXEC);
 
 if (ufd == -1) {
-g_test_message("Skipping test: userfaultfd not available");
-return false;
+ufd = syscall(__NR_userfaultfd, O_CLOEXEC | UFFD_USER_MODE_ONLY);
+if (ufd == -1) {
+   g_test_message("Skipping test: userfaultfd not available");
+return false;
+   } else
+uffd_usermode_only = true;
 }
 
 api_struct.api = UFFD_API;
@@ -670,6 +675,8 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
 }
 
 migrate_set_capability(from, "postcopy-ram", true);
+if (uffd_usermode_only)
+migrate_set_capability(to, "postcopy-uffd-usermode-only", true);
 migrate_set_capability(to, "postcopy-ram", true);
 migrate_set_capability(to, "postcopy-blocktime", true);
 
-- 
2.26.2

Is the ppc440 "bamboo" board in QEMU still of any use?

2021-10-14 Thread Thomas Huth




 Hi,

I tried to build a current Linux kernel for the "bamboo" board and use it in 
QEMU, but QEMU then quickly aborts with:


 pci.c:262: pci_bus_change_irq_level: Assertion `irq_num >= 0' failed.

(or with a "DCR write error" if I try to use the cuImage instead).

I googled a little bit and found this discussion:

https://qemu-devel.nongnu.narkive.com/vYHona3u/emulating-powerpc-440ep-with-qemu-system-ppcemb#post2

Seems like this board was used for KVM on the PPC440 only, and has never 
been enabled with the TCG emulation?


Well, KVM support on the 440 has been removed years ago already:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b2677b8dd8de0dc1496ede4da09b9dfd59f15cea

So is this "bamboo" board dead code in QEMU now? Or does anybody still have 
a kernel binary which could be used for testing it? Note: This board does 
not support "-bios", so u-boot or other firmwares are certainly also not an 
option here...

Should we mark "bamboo" as deprecated nowadays?

 Thomas

Re: [PATCH 0/3] linux-aio: allow block devices to limit aio-max-batch

2021-10-14 Thread Stefano Garzarella

Kind ping :-)

Thanks,
Stefano

On Thu, Sep 23, 2021 at 4:31 PM Stefano Garzarella  wrote:
>
> Commit d7ddd0a161 ("linux-aio: limit the batch size using
> `aio-max-batch` parameter") added a way to limit the batch size
> of Linux AIO backend for the entire AIO context.
>
> The same AIO context can be shared by multiple devices, so
> latency-sensitive devices may want to limit the batch size even
> more to avoid increasing latency.
>
> This series add the `aio-max-batch` option to the file backend,
> and use it in laio_co_submit() and laio_io_unplug() to limit the
> Linux AIO batch size more than the limit set by the AIO context.
>
> Stefano Garzarella (3):
>   file-posix: add `aio-max-batch` option
>   linux-aio: add `dev_max_batch` parameter to laio_co_submit()
>   linux-aio: add `dev_max_batch` parameter to laio_io_unplug()
>
>  qapi/block-core.json|  5 +
>  include/block/raw-aio.h |  6 --
>  block/file-posix.c  | 14 --
>  block/linux-aio.c   | 38 +++---
>  4 files changed, 48 insertions(+), 15 deletions(-)
>
> --
> 2.31.1
>
>

Re: [PATCH v1 7/9] migration: Simplify alignment and alignment checks

2021-10-14 Thread Philippe Mathieu-Daudé

On 10/11/21 19:53, David Hildenbrand wrote:
> Let's use QEMU_ALIGN_DOWN() and friends to make the code a bit easier to
> read.
> 
> Reviewed-by: Peter Xu 
> Signed-off-by: David Hildenbrand 
> ---
>  migration/migration.c| 6 +++---
>  migration/postcopy-ram.c | 9 -
>  migration/ram.c  | 2 +-
>  3 files changed, 8 insertions(+), 9 deletions(-)

Nice :)

Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 1/9] tests/docker: Add debian-nios2-cross image

2021-10-14 Thread Alex Bennée



Richard Henderson  writes:

> On 10/12/21 10:03 AM, Alex Bennée wrote:
>> We need to split this like in hexagon and have a second stage which does
>> a:
>>COPY --from=0 /usr/local /usr/local
>> This will limit the size of the final image (and also avoid
>> duplicting
>> the UID in the hexagon build).
>
> Yeah, well, I had to take that out because it errors out.
> I have no idea what that does or means.

Having this worked for me:

modified   tests/docker/dockerfiles/debian-nios2-cross.docker
@@ -31,4 +31,14 @@ ADD build-toolchain.sh /root/build-toolchain.sh
 
 RUN cd /root && ./build-toolchain.sh
 
+FROM debian:buster-slim
+# Duplicate deb line as deb-src
+RUN cat /etc/apt/sources.list | sed "s/^deb\ /deb-src /" >> 
/etc/apt/sources.list
+# Install QEMU build deps for use in CI
+RUN apt update && \
+DEBIAN_FRONTEND=noninteractive apt install -yy eatmydata && \
+DEBIAN_FRONTEND=noninteractive eatmydata apt install -yy git ninja-build 
&& \
+DEBIAN_FRONTEND=noninteractive eatmydata \
+apt build-dep -yy --arch-only qemu
+COPY --from=0 /usr/local /usr/local
 ENV PATH $PATH:/usr/local/bin/

-- 
Alex Bennée

Re: [PATCH 1/2] tests/docker: Remove fedora-i386-cross from DOCKER_PARTIAL_IMAGES

2021-10-14 Thread Alex Bennée



Richard Henderson  writes:

> The image was upgraded to a full image in ee381b7fe146.
> This makes it possible to use docker-test@image syntax
> with this container.
>
> Cc: Thomas Huth 
> Cc: Alex Bennée 
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée

Re: [PATCH 2/9] linux-user/nios2: Properly emulate EXCP_TRAP

2021-10-14 Thread Alex Bennée



Richard Henderson  writes:

> The real kernel has to load the instruction and extract
> the imm5 field; for qemu, modify the translator to do this.
>
> The use of R_AT for this in cpu_loop was a bug.  Handle
> the other trap numbers as per the kernel's trap_table.
>
> Signed-off-by: Richard Henderson 

I'm taking the values of the registers on trust but the change of moving
the error_code seems OK to me:

Reviewed-by: Alex Bennée 

-- 
Alex Bennée

Re: [PATCH v3 6/6] tests/qapi-schema: Test cases for aliases

2021-10-14 Thread Kevin Wolf

Am 13.10.2021 um 13:10 hat Markus Armbruster geschrieben:
> Markus Armbruster  writes:
> 
> > Kevin Wolf  writes:
> >
> >> Am 11.10.2021 um 09:44 hat Markus Armbruster geschrieben:
> >>> Kevin Wolf  writes:
> >>> 
> >>> [...]
> >>> 
> >>> > What I had in mind was using the schema to generate the necessary code,
> >>> > using the generic keyval parser everywhere, and just providing a hook
> >>> > where the QDict could be modified after the keyval parser and before the
> >>> > visitor. Most command line options would not have any explicit code for
> >>> > parsing input, only the declarative schema and the final option handler
> >>> > getting a QAPI type (which could actually be the corresponding QMP
> >>> > command handler in the common case).
> >>> 
> >>> A walk to the bakery made me see a problem with transforming the qdict
> >>> between keyval parsing and the visitor: error reporting.  On closer
> >>> investigation, the problem exists even with just aliases.
> >>
> >> I already commented on part of this on IRC, but let me reply here as
> >> well.
> >>
> >> On general remark is that I would make the same distinction between
> >> aliases for compatibility and usability that I mentioned elsewhere in
> >> this thread.
> >>
> >> In the case of compatibility, it's already a concession that we still
> >> accept the option - suboptimal error messages for incorrect command
> >> lines aren't a major concern for me there. Users are supposed to move to
> >> the new syntax anyway.
> >
> > Well, these aren't "incorrect", merely deprecated.  Bad UX is still bad
> > there, but at least it'll "fix" itself when the deprecated part goes
> > away.

Most of the error messages aren't "incorrect" either, merely suboptimal
and guiding the user towards verbose non-deprecated alternatives.

> >>> We obediently do what the error message tells us to do:
> >>> 
> >>> $ qemu-system-x86_64 -chardev 
> >>> udp,id=chr0,localaddr=localhost,backend.data.remote.data.port=12345
> >>> qemu-system-x86_64: -chardev 
> >>> udp,id=chr0,localaddr=localhost,backend.data.remote.data.port=12345: 
> >>> Parameters 'backend.*' used inconsistently
> >>> 
> >>> Mission accomplished: confusion :)
> >>
> >> This one already fails before aliases do their work. The reason is that
> >> the default key for -chardev is 'backend', and QMP and CLI use the same
> >> name 'backend' for two completely different things.
> >
> > Right.  I was confused (and the mission was truly accomplished).
> >
> >> We could rename the default key into 'x-backend' and make it behave the
> >> same as 'backend', then the keyval parser would only fail when you
> >> explicitly wrote '-chardev backend=udp,...' and the problem is more
> >> obvious.
> >
> > Technically a compatibility break, but we can hope that the longhand
> > form we change isn't used.

No, it's not a compatibility break. Existing command lines can only have
'backend=...', but not 'backend.*=...', so there is no conflict and they
keep working.

> >> By the way, we have a very similar issue with '-drive file=...', if we
> >> ever want to parse that one with the keyval parser. It can be both a
> >> string for the filename or an object for the configuration of the 'file'
> >> child that many block drivers have.
> >
> > Should I be running for the hills?

If that means that I can then just commit whatever feels right to me? :-P

> >>> Example: manual transformation leads to confusion #2
> >>> 
> >>> Confusion is possible even without tricking the user into mixing
> >>> "standard" and "alternate" explicitly:
> >>> 
> >>> $ qemu-system-x86_64 -chardev udp,id=chr0,backend.data.remote.type=udp
> >>> qemu-system-x86_64: -chardev 
> >>> udp,id=chr0,backend.data.remote.type=udp: Parameters 'backend.*' used 
> >>> inconsistently
> >>> 
> >>> Here, the user tried to stick to "standard", but forgot to specify a
> >>> required member.  The transformation machinery then "helpfully"
> >>> transformed nothing into something, which made the visitor throw up.
> >>
> >> Not the visitor, but the keyval parser. Same problem as above.
> >>
> >>> Clear error reporting is a critical part of a human-friendly interface.
> >>> We have two separate problems with it:
> >>> 
> >>> 1. The visitor reports errors as if aliases didn't exist
> >>> 
> >>>Fixing this is "merely" a matter of tracing back alias applications.
> >>>More complexity...
> >>> 
> >>> 2. The visitor reports errors as if manual transformation didn't exist
> >>> 
> >>>Manual transformation can distort the users input beyond recognition.
> >>>The visitor reports errors for the transformed input.
> >>> 
> >>>To fix this one, we'd have to augment the parse tree so it points
> >>>back at the actual user input, and then make the manual
> >>>transformations preserve that.  Uff!
> >>
> >> Manual transformations are hard to write in a way that they give perfect
> >> results. As long as they are only used for legacy syntax and we expect
> >> users to

Re: [PATCH v1] libvhost-user: fix VHOST_USER_REM_MEM_REG skipping mmap_addr

2021-10-14 Thread Stefan Hajnoczi

On Thu, Oct 14, 2021 at 04:52:48AM +, Raphael Norwitz wrote:
> On Wed, Oct 13, 2021 at 10:40:46AM +0100, Stefan Hajnoczi wrote:
> > On Mon, Oct 11, 2021 at 10:10:47PM +0200, David Hildenbrand wrote:
> > > We end up not copying the mmap_addr of all existing regions, resulting
> > > in a SEGFAULT once we actually try to map/access anything within our
> > > memory regions.
> > > 
> > > Fixes: 875b9fd97b34 ("Support individual region unmap in libvhost-user")
> > > Cc: qemu-sta...@nongnu.org
> > > Cc: Michael S. Tsirkin 
> > > Cc: Raphael Norwitz 
> > > Cc: "Marc-André Lureau" 
> > > Cc: Stefan Hajnoczi 
> > > Cc: Paolo Bonzini 
> > > Cc: Coiby Xu 
> > > Signed-off-by: David Hildenbrand 
> > > ---
> > >  subprojects/libvhost-user/libvhost-user.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/subprojects/libvhost-user/libvhost-user.c 
> > > b/subprojects/libvhost-user/libvhost-user.c
> > > index bf09693255..787f4d2d4f 100644
> > > --- a/subprojects/libvhost-user/libvhost-user.c
> > > +++ b/subprojects/libvhost-user/libvhost-user.c
> > > @@ -816,6 +816,7 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
> > >  shadow_regions[j].gpa = dev->regions[i].gpa;
> > >  shadow_regions[j].size = dev->regions[i].size;
> > >  shadow_regions[j].qva = dev->regions[i].qva;
> > > +shadow_regions[j].mmap_addr = dev->regions[i].mmap_addr;
> > >  shadow_regions[j].mmap_offset = dev->regions[i].mmap_offset;
> > >  j++;
> > >  } else {
> > 
> > Raphael: Some questions about vu_rem_mem_reg():
> > 
> > - What ensures that shadow_regions[VHOST_USER_MAX_RAM_SLOTS] is large
> >   enough? The add_mem_reg/set_mem_table code doesn't seem to check
> >   whether there is enough space in dev->regions[] before adding regions.
> >
> 
> Correct - it does not check if there is enough space as is. I can add that.
> 
> > - What happens when the master populated dev->regions[] with multiple
> >   copies of the same region? dev->nregions is only decremented once and
> >   no longer accurately reflects the number of elements in
> >   dev->regions[].
> 
> That case is also not accounted for. I will add it.
> 
> > 
> > libvhost-user must not trust the vhost-user master since vhost-user
> > needs to provide process isolation. Please add input validation.
> > 
> 
> Got it - let me start working on a series.

Great, thank you!

Stefan


signature.asc
Description: PGP signature

Re: [PATCH 2/2] tests/docker: Fix fedora-i386-cross

2021-10-14 Thread Alex Bennée



Richard Henderson  writes:

> By using PKG_CONFIG_PATH instead of PKG_CONFIG_LIBDIR,
> we were still including the 64-bit packages.  Install
> pcre-devel.i686 to fill a missing glib2 dependency.
>
> By using --extra-cflags instead of --cpu, we incorrectly
> use the wrong probing during meson.
>
> Cc: Alex Bennée 
> Cc: Paolo Bonzini 
> Cc: Daniel P. Berrangé 
> Cc: Richard W.M. Jones 
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée

Re: [PATCH 3/9] linux-user/nios2: Fixes for signal frame setup

2021-10-14 Thread Alex Bennée



Richard Henderson  writes:

> Do not confuse host and guest addresses.  Lock and unlock
> the target_rt_sigframe structure in setup_rt_sigframe.
>
> Since rt_setup_ucontext always returns 0, drop the return
> value entirely.  This eliminates the only write to the err
> variable in setup_rt_sigframe.
>
> Always copy the siginfo structure.
>
> Reviewed-by: Peter Maydell 
> Signed-off-by: Richard Henderson 
> Message-Id: <20210924165926.752809-19-richard.hender...@linaro.org>

Reviewed-by: Alex Bennée 

-- 
Alex Bennée

Re: [PATCH v3 3/3] vdpa: Check for iova range at mappings changes

2021-10-14 Thread Jason Wang

On Thu, Oct 14, 2021 at 4:08 PM Eugenio Perez Martin
 wrote:
>
> On Thu, Oct 14, 2021 at 9:02 AM Jason Wang  wrote:
> >
> > On Thu, Oct 14, 2021 at 1:57 PM Eugenio Perez Martin
> >  wrote:
> > >
> > > On Thu, Oct 14, 2021 at 5:30 AM Jason Wang  wrote:
> > > >
> > > > On Tue, Oct 12, 2021 at 10:07 PM Eugenio Pérez  
> > > > wrote:
> > > > >
> > > > > Check vdpa device range before updating memory regions so we don't add
> > > > > any outside of it, and report the invalid change if any.
> > > > >
> > > > > Signed-off-by: Eugenio Pérez 
> > > > > ---
> > > > >  include/hw/virtio/vhost-vdpa.h |  2 ++
> > > > >  hw/virtio/vhost-vdpa.c | 62 
> > > > > +-
> > > > >  hw/virtio/trace-events |  1 +
> > > > >  3 files changed, 49 insertions(+), 16 deletions(-)
> > > > >
> > > > > diff --git a/include/hw/virtio/vhost-vdpa.h 
> > > > > b/include/hw/virtio/vhost-vdpa.h
> > > > > index a8963da2d9..c288cf7ecb 100644
> > > > > --- a/include/hw/virtio/vhost-vdpa.h
> > > > > +++ b/include/hw/virtio/vhost-vdpa.h
> > > > > @@ -13,6 +13,7 @@
> > > > >  #define HW_VIRTIO_VHOST_VDPA_H
> > > > >
> > > > >  #include "hw/virtio/virtio.h"
> > > > > +#include "standard-headers/linux/vhost_types.h"
> > > > >
> > > > >  typedef struct VhostVDPAHostNotifier {
> > > > >  MemoryRegion mr;
> > > > > @@ -24,6 +25,7 @@ typedef struct vhost_vdpa {
> > > > >  uint32_t msg_type;
> > > > >  bool iotlb_batch_begin_sent;
> > > > >  MemoryListener listener;
> > > > > +struct vhost_vdpa_iova_range iova_range;
> > > > >  struct vhost_dev *dev;
> > > > >  VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
> > > > >  } VhostVDPA;
> > > > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > > > index be7c63b4ba..dbf773d032 100644
> > > > > --- a/hw/virtio/vhost-vdpa.c
> > > > > +++ b/hw/virtio/vhost-vdpa.c
> > > > > @@ -37,20 +37,34 @@ static Int128 vhost_vdpa_section_end(const 
> > > > > MemoryRegionSection *section)
> > > > >  return llend;
> > > > >  }
> > > > >
> > > > > -static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection 
> > > > > *section)
> > > > > -{
> > > > > -return (!memory_region_is_ram(section->mr) &&
> > > > > -!memory_region_is_iommu(section->mr)) ||
> > > > > -memory_region_is_protected(section->mr) ||
> > > > > -   /* vhost-vDPA doesn't allow MMIO to be mapped  */
> > > > > -memory_region_is_ram_device(section->mr) ||
> > > > > -   /*
> > > > > -* Sizing an enabled 64-bit BAR can cause spurious 
> > > > > mappings to
> > > > > -* addresses in the upper part of the 64-bit address 
> > > > > space.  These
> > > > > -* are never accessed by the CPU and beyond the address 
> > > > > width of
> > > > > -* some IOMMU hardware.  TODO: VDPA should tell us the 
> > > > > IOMMU width.
> > > > > -*/
> > > > > -   section->offset_within_address_space & (1ULL << 63);
> > >
> > > [1]
> > >
> > > > > +static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection 
> > > > > *section,
> > > > > +uint64_t iova_min,
> > > > > +uint64_t iova_max)
> > > > > +{
> > > > > +Int128 llend;
> > > > > +
> > > > > +if ((!memory_region_is_ram(section->mr) &&
> > > > > + !memory_region_is_iommu(section->mr)) ||
> > > > > +memory_region_is_protected(section->mr) ||
> > > > > +/* vhost-vDPA doesn't allow MMIO to be mapped  */
> > > > > +memory_region_is_ram_device(section->mr)) {
> > > > > +return true;
> > > > > +}
> > > > > +
> > > > > +if (section->offset_within_address_space < iova_min) {
> > > > > +error_report("RAM section out of device range (min=%lu, 
> > > > > addr=%lu)",
> > > > > + iova_min, section->offset_within_address_space);
> > > > > +return true;
> > > > > +}
> > > > > +
> > > > > +llend = vhost_vdpa_section_end(section);
> > > > > +if (int128_gt(llend, int128_make64(iova_max))) {
> > > > > +error_report("RAM section out of device range (max=%lu, end 
> > > > > addr=%lu)",
> > > > > + iova_max, int128_get64(llend));
> > > > > +return true;
> > > > > +}
> > > > > +
> > > > > +return false;
> > > > >  }
> > > > >
> > > > >  static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, 
> > > > > hwaddr size,
> > > > > @@ -162,7 +176,8 @@ static void 
> > > > > vhost_vdpa_listener_region_add(MemoryListener *listener,
> > > > >  void *vaddr;
> > > > >  int ret;
> > > > >
> > > > > -if (vhost_vdpa_listener_skipped_section(section)) {
> > > > > +if (vhost_vdpa_listener_skipped_section(section, 
> > > > > v->iova_range.first,
> > > > > +v->iova_range.last)) {
> > > > >  return;
> > > > >  }
> > > > >
> > > >

Re: [PATCH v3] hw/usb/vt82c686-uhci-pci: Use ISA instead of PCI interrupts

2021-10-14 Thread BALATON Zoltan


On Thu, 14 Oct 2021, Gerd Hoffmann wrote:

On Wed, Oct 13, 2021 at 02:13:09PM +0200, BALATON Zoltan wrote:

This device is part of a superio/ISA bridge chip and IRQs from it are
routed to an ISA interrupt set by the Interrupt Line PCI config
register. Change uhci_update_irq() to allow this and implement it in
vt82c686-uhci-pci.


Looks good.  There are some unrelated changes in though (whitespace,
comments, ...), and the vt82c686-uhci-pci.c changes should be a
separate patch.


So you mean split it into a series of three small patches? Should I do a 
w4 with that?


Regards,
BALATON Zoltan

Re: Is the ppc440 "bamboo" board in QEMU still of any use?

2021-10-14 Thread Christophe Leroy





Le 14/10/2021 à 11:31, Thomas Huth a écrit :


  Hi,

I tried to build a current Linux kernel for the "bamboo" board and use 
it in QEMU, but QEMU then quickly aborts with:


  pci.c:262: pci_bus_change_irq_level: Assertion `irq_num >= 0' failed.

(or with a "DCR write error" if I try to use the cuImage instead).

I googled a little bit and found this discussion:

https://qemu-devel.nongnu.narkive.com/vYHona3u/emulating-powerpc-440ep-with-qemu-system-ppcemb#post2 



Seems like this board was used for KVM on the PPC440 only, and has never 
been enabled with the TCG emulation?


Well, KVM support on the 440 has been removed years ago already:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b2677b8dd8de0dc1496ede4da09b9dfd59f15cea 



So is this "bamboo" board dead code in QEMU now? Or does anybody still 
have a kernel binary which could be used for testing it? Note: This 
board does not support "-bios", so u-boot or other firmwares are 
certainly also not an option here...

Should we mark "bamboo" as deprecated nowadays?



I have the following change in QEMU to be able to run the bamboo, found 
it some time ago via google (can't remember where):


diff --git a/hw/ppc/ppc4xx_pci.c b/hw/ppc/ppc4xx_pci.c
index 8147ba6f94..600e89e791 100644
--- a/hw/ppc/ppc4xx_pci.c
+++ b/hw/ppc/ppc4xx_pci.c
@@ -246,7 +246,7 @@ static int ppc4xx_pci_map_irq(PCIDevice *pci_dev, 
int irq_num)


 trace_ppc4xx_pci_map_irq(pci_dev->devfn, irq_num, slot);

-return slot - 1;
+return slot ? slot - 1 : slot;
 }

 static void ppc4xx_pci_set_irq(void *opaque, int irq_num, int level)
---

It's probably no the final change, but at least it allows booting bamboo 
on qemu again.


Christophe

Re: Is the ppc440 "bamboo" board in QEMU still of any use?

2021-10-14 Thread Christophe Leroy





Le 14/10/2021 à 12:34, Christophe Leroy a écrit :



Le 14/10/2021 à 11:31, Thomas Huth a écrit :


  Hi,

I tried to build a current Linux kernel for the "bamboo" board and use 
it in QEMU, but QEMU then quickly aborts with:


  pci.c:262: pci_bus_change_irq_level: Assertion `irq_num >= 0' failed.

(or with a "DCR write error" if I try to use the cuImage instead).

I googled a little bit and found this discussion:

https://qemu-devel.nongnu.narkive.com/vYHona3u/emulating-powerpc-440ep-with-qemu-system-ppcemb#post2 



Seems like this board was used for KVM on the PPC440 only, and has 
never been enabled with the TCG emulation?


Well, KVM support on the 440 has been removed years ago already:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b2677b8dd8de0dc1496ede4da09b9dfd59f15cea 



So is this "bamboo" board dead code in QEMU now? Or does anybody still 
have a kernel binary which could be used for testing it? Note: This 
board does not support "-bios", so u-boot or other firmwares are 
certainly also not an option here...

Should we mark "bamboo" as deprecated nowadays?



I have the following change in QEMU to be able to run the bamboo, found 
it some time ago via google (can't remember where):


diff --git a/hw/ppc/ppc4xx_pci.c b/hw/ppc/ppc4xx_pci.c
index 8147ba6f94..600e89e791 100644
--- a/hw/ppc/ppc4xx_pci.c
+++ b/hw/ppc/ppc4xx_pci.c
@@ -246,7 +246,7 @@ static int ppc4xx_pci_map_irq(PCIDevice *pci_dev, 
int irq_num)


  trace_ppc4xx_pci_map_irq(pci_dev->devfn, irq_num, slot);

-    return slot - 1;
+    return slot ? slot - 1 : slot;
  }

  static void ppc4xx_pci_set_irq(void *opaque, int irq_num, int level)
---

It's probably no the final change, but at least it allows booting bamboo 
on qemu again.




Found the source : 
https://www.mail-archive.com/qemu-devel@nongnu.org/msg769121.html


Christophe

Re: [PATCH 00/11] 9p: Add support for Darwin

2021-10-14 Thread Will Cohen

Many thanks for all the clarifications — it’s my first time using
git-send-email and first time with mailing-list-based devel workflows. Will
adjust accordingly, work through gitlab, and eventually resend via
git-publish as v2.

On Thu, Oct 14, 2021 at 3:04 AM Greg Kurz  wrote:

> Hi Will,
>
> It is strongly recommended that you Cc maintainers to increase the odds
> they notice your patches in the flood of qemu-devel. FYI I only noticed
> them because git-send-email Cc'd me thanks to the Reviewed-by: tags and
> my address didn't change in the meantime. I'm thus Cc'ing Christian
> who is the primary maintainer now (i.e. the person that can merge
> your patches and send a PR for upstream inclusion).
>
> FWIW git-publish [1] can Cc the relevant people for free.
>
> [1] https://github.com/stefanha/git-publish
>
> On Wed, 13 Oct 2021 19:03:54 -0400
> Will Cohen  wrote:
>
> > This is a continuation of a patch series adding 9p server support for
> Darwin,
> > originally submitted by Keno Fischer in mid-2018
> > (https://lists.nongnu.org/archive/html/qemu-devel/2018-06/msg04643.html).
> In
> > some sense, this could be considered [PATCH v4] of that process, but I
> assume
> > that the multi-year gap merits a fresh start..
> >
>
> This makes sense. For consistency with that assumption, it would also
> make sense to clear all preexisting Reviewed-by: tags.
>
> > It has since been updated and rebased for NixOS by Michael Roitzsch
> > (https://github.com/NixOS/nixpkgs/pull/122420) with a goal of
> resubmitting
> > upstream. I am submitting his patch set as suggested, as developed by
> Michael,
> > with his Signed-off-by headers included in full.
> >
>
> QEMU cares about tracking of who did what and follows a policy inspired
> from the linux kernel [2] [3].
>
> Michael's Signed-off-by: should then appear on all patches, with a
> mention of the extra changes that he made, e.g.
>
> Signed-off-by: Keno Fischer 
> [Michael Roitzsch: - rebased for NixOS
>- some other change]
> Signed-off-by: Michael Roitzsch 
>
> If no changes were made, you still need to add a Signed-off-by: tag.
>
> [2] https://wiki.qemu.org/Contribute/SubmitAPatch
> [3]
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297
>
> > Additionally, I have run the patches through checkpatch.pl and adjusted
> coding
> > style accordingly (with the exception of ignoring warnings about avoid
>
> Good ! If you have an account on gitlab, you can also push a branch there.
> It will be submitted to gitlab CI and maybe give you the opportunity to
> polish the patches some more before submission.
>
> > architecture specific defines in hw/9pfs/9p-util-darwin.c, where they
> seem
> > unavoidable), and have signed off on those modified commits.
> >
>
> As explained above, your Signed-off-by: is also needed in all patches,
> even if you didn't change them.
>
> Cheers,
>
> --
> Greg
>
> >
> >
>
>

Re: [PATCH v1] vsock: don't send messages when vsock device is not started

2021-10-14 Thread Stefano Garzarella

Please CC maintainers (MST in this case)

On Fri, 2021-10-01 at 18:42 +, Jiang Wang wrote:
> Added a check in vhost_vsock_common_send_transport_reset,
> and only send messages on event queue when the vsock host
> device is started.
> 
> Suggested-by: Stefano Garzarella 
> Signed-off-by: Jiang Wang 
> ---
>  hw/virtio/vhost-vsock-common.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/virtio/vhost-vsock-common.c b/hw/virtio/vhost-vsock-
> common.c
> index 4ad6e234ad..64425719a2 100644
> --- a/hw/virtio/vhost-vsock-common.c
> +++ b/hw/virtio/vhost-vsock-common.c
> @@ -138,8 +138,10 @@ static void
> vhost_vsock_common_send_transport_reset(VHostVSockCommon *vvc)
>  goto out;
>  }
>  
> -    virtqueue_push(vq, elem, sizeof(event));
> -    virtio_notify(VIRTIO_DEVICE(vvc), vq);
> +    if (vvc->vhost_dev.started) {
> +    virtqueue_push(vq, elem, sizeof(event));
> +    virtio_notify(VIRTIO_DEVICE(vvc), vq);
> +    }
>  
>  out:
>  g_free(elem);

I agree that we should avoid to enqueue reset messages if the device is
not started, but this change is wrong, because we are still doing
`virtqueue_pop()`.

I think we should skip vhost_vsock_common_send_transport_reset()
entirely.

Thanks,
Stefano

Re: Is the ppc440 "bamboo" board in QEMU still of any use?

2021-10-14 Thread Cédric Le Goater


On 10/14/21 12:34, Christophe Leroy wrote:



Le 14/10/2021 à 11:31, Thomas Huth a écrit :


  Hi,

I tried to build a current Linux kernel for the "bamboo" board and use it in 
QEMU, but QEMU then quickly aborts with:

  pci.c:262: pci_bus_change_irq_level: Assertion `irq_num >= 0' failed.

(or with a "DCR write error" if I try to use the cuImage instead).

I googled a little bit and found this discussion:

https://qemu-devel.nongnu.narkive.com/vYHona3u/emulating-powerpc-440ep-with-qemu-system-ppcemb#post2

Seems like this board was used for KVM on the PPC440 only, and has never been 
enabled with the TCG emulation?

Well, KVM support on the 440 has been removed years ago already:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b2677b8dd8de0dc1496ede4da09b9dfd59f15cea

So is this "bamboo" board dead code in QEMU now? Or does anybody still have a kernel 
binary which could be used for testing it? Note: This board does not support "-bios", so 
u-boot or other firmwares are certainly also not an option here...
Should we mark "bamboo" as deprecated nowadays?



I have the following change in QEMU to be able to run the bamboo, found it some 
time ago via google (can't remember where):

diff --git a/hw/ppc/ppc4xx_pci.c b/hw/ppc/ppc4xx_pci.c
index 8147ba6f94..600e89e791 100644
--- a/hw/ppc/ppc4xx_pci.c
+++ b/hw/ppc/ppc4xx_pci.c
@@ -246,7 +246,7 @@ static int ppc4xx_pci_map_irq(PCIDevice *pci_dev, int 
irq_num)

  trace_ppc4xx_pci_map_irq(pci_dev->devfn, irq_num, slot);

-    return slot - 1;
+    return slot ? slot - 1 : slot;
  }

  static void ppc4xx_pci_set_irq(void *opaque, int irq_num, int level)


could you try to use :

static inline int ppce500_pci_map_irq_slot(int devno, int irq_num)
{
return (devno + irq_num) % 4;
}

Thanks,

C.

Re: Is the ppc440 "bamboo" board in QEMU still of any use?

2021-10-14 Thread Mark Cave-Ayland


On 14/10/2021 11:47, Christophe Leroy wrote:


Le 14/10/2021 à 12:34, Christophe Leroy a écrit :



Le 14/10/2021 à 11:31, Thomas Huth a écrit :


  Hi,

I tried to build a current Linux kernel for the "bamboo" board and use it in QEMU, 
but QEMU then quickly aborts with:


  pci.c:262: pci_bus_change_irq_level: Assertion `irq_num >= 0' failed.

(or with a "DCR write error" if I try to use the cuImage instead).

I googled a little bit and found this discussion:

https://qemu-devel.nongnu.narkive.com/vYHona3u/emulating-powerpc-440ep-with-qemu-system-ppcemb#post2 



Seems like this board was used for KVM on the PPC440 only, and has never been 
enabled with the TCG emulation?


Well, KVM support on the 440 has been removed years ago already:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b2677b8dd8de0dc1496ede4da09b9dfd59f15cea 



So is this "bamboo" board dead code in QEMU now? Or does anybody still have a 
kernel binary which could be used for testing it? Note: This board does not 
support "-bios", so u-boot or other firmwares are certainly also not an option 
here...

Should we mark "bamboo" as deprecated nowadays?



I have the following change in QEMU to be able to run the bamboo, found it some 
time ago via google (can't remember where):


diff --git a/hw/ppc/ppc4xx_pci.c b/hw/ppc/ppc4xx_pci.c
index 8147ba6f94..600e89e791 100644
--- a/hw/ppc/ppc4xx_pci.c
+++ b/hw/ppc/ppc4xx_pci.c
@@ -246,7 +246,7 @@ static int ppc4xx_pci_map_irq(PCIDevice *pci_dev, int 
irq_num)

  trace_ppc4xx_pci_map_irq(pci_dev->devfn, irq_num, slot);

-    return slot - 1;
+    return slot ? slot - 1 : slot;
  }

  static void ppc4xx_pci_set_irq(void *opaque, int irq_num, int level)
---

It's probably no the final change, but at least it allows booting bamboo on qemu 
again.




Found the source : 
https://www.mail-archive.com/qemu-devel@nongnu.org/msg769121.html


Ah yes, that thread rings a bell. I think the important part was in my initial reply 
at https://www.mail-archive.com/qemu-devel@nongnu.org/msg769115.html: in other words 
ppc4xx_pci_map_irq() function expects the IRQ number to range from 1 to 4.


When I looked at this the issue was caused by the guest writing to PCI configuration 
space to disable PCI interrupts: this ends up calling pci_update_irq_disabled() as below:


/* Called after interrupt disabled field update in config space,
 * assert/deassert interrupts if necessary.
 * Gets original interrupt disable bit value (before update). */
static void pci_update_irq_disabled(PCIDevice *d, int was_irq_disabled)
{
int i, disabled = pci_irq_disabled(d);
if (disabled == was_irq_disabled)
return;
for (i = 0; i < PCI_NUM_PINS; ++i) {
int state = pci_irq_state(d, i);
pci_change_irq_level(d, i, disabled ? -state : state);
}
}

Since the IRQ is disabled pci_change_irq_level() ends up being called with -1 which 
triggers the assert().


My feeling is that the existing assert() is correct, since from what I can see 
without it there would be an IRQ array underflow, however I wasn't sure whether 
passing a negative number to pci_change_irq_level() is supposed to be valid?



ATB,

Mark.

Re: [PATCH RFC 12/15] virtio-mem: Expose device memory via separate memslots

2021-10-14 Thread Dr. David Alan Gilbert

* David Hildenbrand (da...@redhat.com) wrote:
> KVM nowadays supports a lot of memslots. We want to exploit that in
> virtio-mem, exposing device memory via separate memslots to the guest
> on demand, essentially reducing the total size of KVM slots
> significantly (and thereby metadata in KVM and in QEMU for KVM memory
> slots) especially when exposing initially only a small amount of memory
> via a virtio-mem device to the guest, to hotplug more later. Further,
> not always exposing the full device memory region to the guest reduces
> the attack surface in many setups without requiring other mechanisms
> like uffd for protection of unplugged memory.
> 
> So split the original RAM region via memory region aliases into separate
> chunks (ending up as individual memslots), and dynamically map the
> required chunks (falling into the usable region) into the container.
> 
> For now, we always map the memslots covered by the usable region. In the
> future, with VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE, we'll be able to map
> memslots on actual demand and optimize further.
> 
> Users can specify via the "max-memslots" property how much memslots the
> virtio-mem device is allowed to use at max. "0" translates to "auto, no
> limit" and is determinded automatically using a heuristic. When a maximum
> (> 1) is specified, that auto-determined value is capped. The parameter
> doesn't have to be migrated and can differ between source and destination.
> The only reason the parameter exists is not make some corner case setups
> (multiple large virtio-mem devices assigned to a single virtual NUMA node
>  with only very limited available memslots, hotplug of vhost devices) work.
> The parameter will be set to be "0" as default soon, whereby it will remain
> to be "1" for compat machines.
> 
> The properties "memslots" and "used-memslots" are read-only.
> 
> Signed-off-by: David Hildenbrand 

I think you need to move this patch after the vhost-user patches so that
you don't break a bisect including vhost-user.

But I do worry about the effect on vhost-user:
  a) What about external programs like dpdk?
  b) I worry if you end up with a LOT of slots you end up with a lot of
mmap's and fd's in vhost-user; I'm not quite sure what all the effects
of that will be.

Dave

> ---
>  hw/virtio/virtio-mem-pci.c |  22 +
>  hw/virtio/virtio-mem.c | 173 -
>  include/hw/virtio/virtio-mem.h |  29 +-
>  3 files changed, 220 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/virtio/virtio-mem-pci.c b/hw/virtio/virtio-mem-pci.c
> index be2383b0c5..2c1be2afb7 100644
> --- a/hw/virtio/virtio-mem-pci.c
> +++ b/hw/virtio/virtio-mem-pci.c
> @@ -82,6 +82,20 @@ static uint64_t virtio_mem_pci_get_min_alignment(const 
> MemoryDeviceState *md)
>  &error_abort);
>  }
>  
> +static unsigned int virtio_mem_pci_get_used_memslots(const MemoryDeviceState 
> *md,
> + Error **errp)
> +{
> +return object_property_get_uint(OBJECT(md), 
> VIRTIO_MEM_USED_MEMSLOTS_PROP,
> +&error_abort);
> +}
> +
> +static unsigned int virtio_mem_pci_get_memslots(const MemoryDeviceState *md,
> +Error **errp)
> +{
> +return object_property_get_uint(OBJECT(md), VIRTIO_MEM_MEMSLOTS_PROP,
> +&error_abort);
> +}
> +
>  static void virtio_mem_pci_size_change_notify(Notifier *notifier, void *data)
>  {
>  VirtIOMEMPCI *pci_mem = container_of(notifier, VirtIOMEMPCI,
> @@ -115,6 +129,8 @@ static void virtio_mem_pci_class_init(ObjectClass *klass, 
> void *data)
>  mdc->get_memory_region = virtio_mem_pci_get_memory_region;
>  mdc->fill_device_info = virtio_mem_pci_fill_device_info;
>  mdc->get_min_alignment = virtio_mem_pci_get_min_alignment;
> +mdc->get_used_memslots = virtio_mem_pci_get_used_memslots;
> +mdc->get_memslots = virtio_mem_pci_get_memslots;
>  }
>  
>  static void virtio_mem_pci_instance_init(Object *obj)
> @@ -142,6 +158,12 @@ static void virtio_mem_pci_instance_init(Object *obj)
>  object_property_add_alias(obj, VIRTIO_MEM_REQUESTED_SIZE_PROP,
>OBJECT(&dev->vdev),
>VIRTIO_MEM_REQUESTED_SIZE_PROP);
> +object_property_add_alias(obj, VIRTIO_MEM_MEMSLOTS_PROP,
> +  OBJECT(&dev->vdev),
> +  VIRTIO_MEM_MEMSLOTS_PROP);
> +object_property_add_alias(obj, VIRTIO_MEM_USED_MEMSLOTS_PROP,
> +  OBJECT(&dev->vdev),
> +  VIRTIO_MEM_USED_MEMSLOTS_PROP);
>  }
>  
>  static const VirtioPCIDeviceTypeInfo virtio_mem_pci_info = {
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index 1e29706798..f7e8f1db83 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -23,6 +23,7 @@
>  #include "hw/virtio

Re: [PATCH v2] docs/specs/tpm: Clarify command line parameters for network migration

2021-10-14 Thread David Edmondson

On Wednesday, 2021-10-13 at 23:27:00 -04, Stefan Berger wrote:

> Clarify the command line parameters for swtpm migration across the network
> for the case of shared storage and non-shared storage.
>
> Signed-off-by: Stefan Berger 
> ---
>  docs/specs/tpm.rst | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/docs/specs/tpm.rst b/docs/specs/tpm.rst
> index 3be190343a..956f2c13dc 100644
> --- a/docs/specs/tpm.rst
> +++ b/docs/specs/tpm.rst
> @@ -482,7 +482,8 @@ VM migration across the network:
>   - QEMU command line parameters should be identical apart from the
> '-incoming' option on the destination side
>  
> - - swtpm command line parameters should be identical
> + - swtpm command line parameters can be identical if storage is not
> +   shared and should be different for shared storage

I would interpret this as relating to the storage of the VM (virtual
disk devices), but you are referring to the storage used by swtpm,
presumably.

How about:

- typically, swtpm command line parameters should be different if the
  vTPM state directory is on shared storage, and should be the same if
  the vTPM state directory is not shared.

>  
>  VM Snapshotting:
>   - QEMU command line parameters should be identical

[PATCH v3 1/3] tests/acpi: Get prepared for IORT E.b revision upgrade

2021-10-14 Thread Eric Auger

Ignore IORT till reference blob for E.b spec revision gets
added.

Signed-off-by: Eric Auger 
---
 tests/qtest/bios-tables-test-allowed-diff.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8bf..9a5a923d6b8 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,2 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/virt/IORT",
-- 
2.26.3

[PATCH v3 0/3] hw/arm/virt_acpi_build: Upgrate the IORT table up to revision E.b

2021-10-14 Thread Eric Auger

This series upgrades the ACPI IORT table up to the E.b
specification revision. One of the goal of this upgrade
is to allow the addition of RMR nodes along with the SMMUv3.

The latest IORT specification (ARM DEN 0049E.b) can be found at
IO Remapping Table - Platform Design Document
https://developer.arm.com/documentation/den0049/latest/

This series can be found at
https://github.com/eauger/qemu.git
branch: iort_Eb_v3

History:
v2 -> v3:
- comment on IORT node ID (Igor)
- single mapping disabled comment
- Added Jean and Igor's R-b
- added diff to patvch 3/3

v1 -> v2:
- fix Revision value in ITS and SMMUv3 nodes (Phil)
- Increment an identifier (Phil)

Eric Auger (3):
  tests/acpi: Get prepared for IORT E.b revision upgrade
  hw/arm/virt-acpi-build: IORT upgrade up to revision E.b
  tests/acpi: Generate reference blob for IORT rev E.b

 hw/arm/virt-acpi-build.c  |  48 ++
 tests/data/acpi/virt/IORT | Bin 124 -> 128 bytes
 tests/data/acpi/virt/IORT.memhp   | Bin 124 -> 128 bytes
 tests/data/acpi/virt/IORT.numamem | Bin 124 -> 128 bytes
 tests/data/acpi/virt/IORT.pxb | Bin 124 -> 128 bytes
 5 files changed, 29 insertions(+), 19 deletions(-)

-- 
2.26.3

[PATCH v3 2/3] hw/arm/virt-acpi-build: IORT upgrade up to revision E.b

2021-10-14 Thread Eric Auger

Upgrade the IORT table from B to E.b specification
revision (ARM DEN 0049E.b).

The SMMUv3 and root complex node have additional
fields. Also unique IORT node identifiers are
introduced: they are generated in sequential order.
They are not cross-referenced though.

Signed-off-by: Eric Auger 
Reviewed-by: Jean-Philippe Brucker 
Reviewed-by: Igor Mammedov 

---

v2 -> v3:
- added Jean and Igor's R-b
- added comment about node IDs in the commit msg (Igor)
---
 hw/arm/virt-acpi-build.c | 48 
 1 file changed, 29 insertions(+), 19 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index dd17a48c00e..d3bb4cba3ed 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -241,19 +241,20 @@ static void acpi_dsdt_add_tpm(Aml *scope, 
VirtMachineState *vms)
 #endif
 
 #define ID_MAPPING_ENTRY_SIZE 20
-#define SMMU_V3_ENTRY_SIZE 60
-#define ROOT_COMPLEX_ENTRY_SIZE 32
+#define SMMU_V3_ENTRY_SIZE 68
+#define ROOT_COMPLEX_ENTRY_SIZE 36
 #define IORT_NODE_OFFSET 48
 
 static void build_iort_id_mapping(GArray *table_data, uint32_t input_base,
   uint32_t id_count, uint32_t out_ref)
 {
-/* Identity RID mapping covering the whole input RID range */
+/* Table 4 ID mapping format */
 build_append_int_noprefix(table_data, input_base, 4); /* Input base */
 build_append_int_noprefix(table_data, id_count, 4); /* Number of IDs */
 build_append_int_noprefix(table_data, input_base, 4); /* Output base */
 build_append_int_noprefix(table_data, out_ref, 4); /* Output Reference */
-build_append_int_noprefix(table_data, 0, 4); /* Flags */
+/* Flags */
+build_append_int_noprefix(table_data, 0 /* Single mapping (disabled) */, 
4);
 }
 
 struct AcpiIortIdMapping {
@@ -298,7 +299,7 @@ static int iort_idmap_compare(gconstpointer a, 
gconstpointer b)
 /*
  * Input Output Remapping Table (IORT)
  * Conforms to "IO Remapping Table System Software on ARM Platforms",
- * Document number: ARM DEN 0049B, October 2015
+ * Document number: ARM DEN 0049E.b, Feb 2021
  */
 static void
 build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
@@ -307,10 +308,11 @@ build_iort(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 const uint32_t iort_node_offset = IORT_NODE_OFFSET;
 size_t node_size, smmu_offset = 0;
 AcpiIortIdMapping *idmap;
+uint32_t id = 0;
 GArray *smmu_idmaps = g_array_new(false, true, sizeof(AcpiIortIdMapping));
 GArray *its_idmaps = g_array_new(false, true, sizeof(AcpiIortIdMapping));
 
-AcpiTable table = { .sig = "IORT", .rev = 0, .oem_id = vms->oem_id,
+AcpiTable table = { .sig = "IORT", .rev = 3, .oem_id = vms->oem_id,
 .oem_table_id = vms->oem_table_id };
 /* Table 2 The IORT */
 acpi_table_begin(&table, table_data);
@@ -358,12 +360,12 @@ build_iort(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 build_append_int_noprefix(table_data, IORT_NODE_OFFSET, 4);
 build_append_int_noprefix(table_data, 0, 4); /* Reserved */
 
-/* 3.1.1.3 ITS group node */
+/* Table 12 ITS Group Format */
 build_append_int_noprefix(table_data, 0 /* ITS Group */, 1); /* Type */
 node_size =  20 /* fixed header size */ + 4 /* 1 GIC ITS Identifier */;
 build_append_int_noprefix(table_data, node_size, 2); /* Length */
-build_append_int_noprefix(table_data, 0, 1); /* Revision */
-build_append_int_noprefix(table_data, 0, 4); /* Reserved */
+build_append_int_noprefix(table_data, 1, 1); /* Revision */
+build_append_int_noprefix(table_data, id++, 4); /* Identifier */
 build_append_int_noprefix(table_data, 0, 4); /* Number of ID mappings */
 build_append_int_noprefix(table_data, 0, 4); /* Reference to ID Array */
 build_append_int_noprefix(table_data, 1, 4); /* Number of ITSs */
@@ -374,19 +376,19 @@ build_iort(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 int irq =  vms->irqmap[VIRT_SMMU] + ARM_SPI_BASE;
 
 smmu_offset = table_data->len - table.table_offset;
-/* 3.1.1.2 SMMUv3 */
+/* Table 9 SMMUv3 Format */
 build_append_int_noprefix(table_data, 4 /* SMMUv3 */, 1); /* Type */
 node_size =  SMMU_V3_ENTRY_SIZE + ID_MAPPING_ENTRY_SIZE;
 build_append_int_noprefix(table_data, node_size, 2); /* Length */
-build_append_int_noprefix(table_data, 0, 1); /* Revision */
-build_append_int_noprefix(table_data, 0, 4); /* Reserved */
+build_append_int_noprefix(table_data, 4, 1); /* Revision */
+build_append_int_noprefix(table_data, id++, 4); /* Identifier */
 build_append_int_noprefix(table_data, 1, 4); /* Number of ID mappings 
*/
 /* Reference to ID Array */
 build_append_int_noprefix(table_data, SMMU_V3_ENTRY_SIZE, 4);
 /* Base address */
 build_append_int_noprefix(table_data, vms->memmap[VIRT_SMMU].base, 8);
 /* Fla

[PATCH v3 3/3] tests/acpi: Generate reference blob for IORT rev E.b

2021-10-14 Thread Eric Auger

Re-generate reference blobs with rebuild-expected-aml.sh.

Differences reported by "make check V=1" are listed below
(IORT.numamem). Differences for other variants are similar.

 /*
  * Intel ACPI Component Architecture
  * AML/ASL+ Disassembler version 20180629 (64-bit version)
  * Copyright (c) 2000 - 2018 Intel Corporation
  *
- * Disassembly of tests/data/acpi/virt/IORT.numamem, Thu Oct 14 06:13:19 2021
+ * Disassembly of /tmp/aml-K8L9A1, Thu Oct 14 06:13:19 2021
  *
  * ACPI Data Table [IORT]
  *
  * Format: [HexOffset DecimalOffset ByteLength]  FieldName : FieldValue
  */

 [000h    4]Signature : "IORT"[IO Remapping Table]
-[004h 0004   4] Table Length : 007C
-[008h 0008   1] Revision : 00
-[009h 0009   1] Checksum : 07
+[004h 0004   4] Table Length : 0080
+[008h 0008   1] Revision : 03
+[009h 0009   1] Checksum : B3
 [00Ah 0010   6]   Oem ID : "BOCHS "
 [010h 0016   8] Oem Table ID : "BXPC"
 [018h 0024   4] Oem Revision : 0001
 [01Ch 0028   4]  Asl Compiler ID : "BXPC"
 [020h 0032   4]Asl Compiler Revision : 0001

 [024h 0036   4]   Node Count : 0002
 [028h 0040   4]  Node Offset : 0030
 [02Ch 0044   4] Reserved : 

 [030h 0048   1] Type : 00
 [031h 0049   2]   Length : 0018
-[033h 0051   1] Revision : 00
+[033h 0051   1] Revision : 01
 [034h 0052   4] Reserved : 
 [038h 0056   4]Mapping Count : 
 [03Ch 0060   4]   Mapping Offset : 

 [040h 0064   4] ItsCount : 0001
 [044h 0068   4]  Identifiers : 

 [048h 0072   1] Type : 02
-[049h 0073   2]   Length : 0034
-[04Bh 0075   1] Revision : 00
-[04Ch 0076   4] Reserved : 
+[049h 0073   2]   Length : 0038
+[04Bh 0075   1] Revision : 03
+[04Ch 0076   4] Reserved : 0001
 [050h 0080   4]Mapping Count : 0001
-[054h 0084   4]   Mapping Offset : 0020
+[054h 0084   4]   Mapping Offset : 0024

 [058h 0088   8]Memory Properties : [IORT Memory Access Properties]
 [058h 0088   4]  Cache Coherency : 0001
 [05Ch 0092   1]Hints (decoded below) : 00
Transient : 0
   Write Allocate : 0
Read Allocate : 0
 Override : 0
 [05Dh 0093   2] Reserved : 
 [05Fh 0095   1] Memory Flags (decoded below) : 03
Coherency : 1
 Device Attribute : 1
 [060h 0096   4]ATS Attribute : 
 [064h 0100   4]   PCI Segment Number : 
-[068h 0104   1]Memory Size Limit : 00
+[068h 0104   1]Memory Size Limit : 40
 [069h 0105   3] Reserved : 00

-[068h 0104   4]   Input base : 
-[06Ch 0108   4] ID Count : 
-[070h 0112   4]  Output Base : 
-[074h 0116   4] Output Reference : 0030
-[078h 0120   4]Flags (decoded below) : 
+[06Ch 0108   4]   Input base : 
+[070h 0112   4] ID Count : 
+[074h 0116   4]  Output Base : 
+[078h 0120   4] Output Reference : 0030
+[07Ch 0124   4]Flags (decoded below) : 
   Single Mapping : 0

-Raw Table Data: Length 124 (0x7C)
+Raw Table Data: Length 128 (0x80)

-: 49 4F 52 54 7C 00 00 00 00 07 42 4F 43 48 53 20  // IORT|.BOCHS
+: 49 4F 52 54 80 00 00 00 03 B3 42 4F 43 48 53 20  // IORT..BOCHS
 0010: 42 58 50 43 20 20 20 20 01 00 00 00 42 58 50 43  // BXPCBXPC
 0020: 01 00 00 00 02 00 00 00 30 00 00 00 00 00 00 00  // 0...
-0030: 00 18 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // 
-0040: 01 00 00 00 00 00 00 00 02 34 00 00 00 00 00 00  // .4..
-0050: 01 00 00 00 20 00 00 00 01 00 00 00 00 00 00 03  //  ...
-0060: 00 00 00 00 00 00 00 00 00 00 00 00 FF FF 00 00  // 
-0070: 00 00 00 00 30 00 00 00 00 00 00 00  // 0...
+0030: 00 18 00 01 00 00 00 00 00 00 00 00 00 00 00 00  // 
+0040: 01 00 00 00 00 00 00 00 02 38 00 03 01 00 00 00  // .8..
+0050: 01 00 00 00 24 00 00 00 01 00 00 00 00 00 00 03  // $...
+0060:

Re: [RFC PATCH v4 08/20] vhost: Route guest->host notification through shadow virtqueue

2021-10-14 Thread Eugenio Perez Martin

On Wed, Oct 13, 2021 at 5:27 AM Jason Wang  wrote:
>
>
> 在 2021/10/1 下午3:05, Eugenio Pérez 写道:
> > Shadow virtqueue notifications forwarding is disabled when vhost_dev
> > stops, so code flow follows usual cleanup.
> >
> > Also, host notifiers must be disabled at SVQ start,
>
>
> Any reason for this?
>

It will be addressed in a later series, sorry.

>
> > and they will not
> > start if SVQ has been enabled when device is stopped. This is trivial
> > to address, but it is left out for simplicity at this moment.
>
>
> It looks to me this patch also contains the following logics
>
> 1) codes to enable svq
>
> 2) codes to let svq to be enabled from QMP.
>
> I think they need to be split out,

I agree that we can split this more, with the code that belongs to SVQ
and the code that belongs to vhost-vdpa. it will be addressed in
future series.

> we may endup with the following
> series of patches
>

With "series of patches" do you mean to send every step in a separated
series? There are odds of having the need of modifying code already
sent & merged with later ones. If you confirm to me that it is fine, I
can do it that way for sure.

> 1) svq skeleton with enable/disable
> 2) route host notifier to svq
> 3) route guest notifier to svq
> 4) codes to enable svq
> 5) enable svq via QMP
>

I'm totally fine with that, but there is code that is never called if
the qmp command is not added. The compiler complains about static
functions that are not called, making impossible things like bisecting
through these commits, unless I use attribute((unused)) or similar. Or
have I missed something?

We could do that way with the code that belongs to SVQ though, since
all of it is declared in headers. But to delay the "enable svq via
qmp" to the last one makes debugging harder, as we cannot just enable
notifications forwarding with no buffers forwarding.

If I introduce a change in the notifications code, I can simply go to
these commits and enable SVQ for notifications. This way I can have an
idea of what part is failing. A similar logic can be applied to other
devices than vp_vdpa. We would lose it if we

>
> >
> > Signed-off-by: Eugenio Pérez 
> > ---
> >   qapi/net.json  |   2 +-
> >   hw/virtio/vhost-shadow-virtqueue.h |   8 ++
> >   include/hw/virtio/vhost-vdpa.h |   4 +
> >   hw/virtio/vhost-shadow-virtqueue.c | 138 -
> >   hw/virtio/vhost-vdpa.c | 116 +++-
> >   5 files changed, 264 insertions(+), 4 deletions(-)
> >
> > diff --git a/qapi/net.json b/qapi/net.json
> > index a2c30fd455..fe546b0e7c 100644
> > --- a/qapi/net.json
> > +++ b/qapi/net.json
> > @@ -88,7 +88,7 @@
> >   #
> >   # @enable: true to use the alternate shadow VQ notifications
> >   #
> > -# Returns: Always error, since SVQ is not implemented at the moment.
> > +# Returns: Error if failure, or 'no error' for success.
> >   #
> >   # Since: 6.2
> >   #
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.h 
> > b/hw/virtio/vhost-shadow-virtqueue.h
> > index 27ac6388fa..237cfceb9c 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > @@ -14,6 +14,14 @@
> >
> >   typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> >
> > +EventNotifier *vhost_svq_get_svq_call_notifier(VhostShadowVirtqueue *svq);
>
>
> Let's move this function to another patch since it's unrelated to the
> guest->host routing.
>

Right, I missed it while squashing commits and at later reviews.

>
> > +void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int 
> > call_fd);
> > +
> > +bool vhost_svq_start(struct vhost_dev *dev, unsigned idx,
> > + VhostShadowVirtqueue *svq);
> > +void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> > +VhostShadowVirtqueue *svq);
> > +
> >   VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx);
> >
> >   void vhost_svq_free(VhostShadowVirtqueue *vq);
> > diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> > index 0d565bb5bd..48aae59d8e 100644
> > --- a/include/hw/virtio/vhost-vdpa.h
> > +++ b/include/hw/virtio/vhost-vdpa.h
> > @@ -12,6 +12,8 @@
> >   #ifndef HW_VIRTIO_VHOST_VDPA_H
> >   #define HW_VIRTIO_VHOST_VDPA_H
> >
> > +#include 
> > +
> >   #include "qemu/queue.h"
> >   #include "hw/virtio/virtio.h"
> >
> > @@ -24,6 +26,8 @@ typedef struct vhost_vdpa {
> >   int device_fd;
> >   uint32_t msg_type;
> >   MemoryListener listener;
> > +bool shadow_vqs_enabled;
> > +GPtrArray *shadow_vqs;
> >   struct vhost_dev *dev;
> >   QLIST_ENTRY(vhost_vdpa) entry;
> >   VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
> > b/hw/virtio/vhost-shadow-virtqueue.c
> > index c4826a1b56..21dc99ab5d 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -9,9 +9,12 @@
> >
> >   #include "qemu/osdep.h"
> >   #incl

Re: [PATCH 00/11] 9p: Add support for Darwin

2021-10-14 Thread Christian Schoenebeck

On Donnerstag, 14. Oktober 2021 12:48:55 CEST Will Cohen wrote:
> Many thanks for all the clarifications — it’s my first time using
> git-send-email and first time with mailing-list-based devel workflows. Will
> adjust accordingly, work through gitlab, and eventually resend via
> git-publish as v2.

So the intended use case is macOS being host.

Has this been tested, and if yes, using which 9p client and which macOS 
version?

Best regards,
Christian Schoenebeck

Re: [RFC PATCH v4 05/20] vhost: Add x-vhost-enable-shadow-vq qmp

2021-10-14 Thread Eugenio Perez Martin

On Tue, Oct 12, 2021 at 3:46 PM Markus Armbruster  wrote:
>
> Eugenio Perez Martin  writes:
>
> > On Tue, Oct 12, 2021 at 7:18 AM Markus Armbruster  wrote:
> >>
> >> Eugenio Pérez  writes:
> >>
> >> > Command to enable shadow virtqueue.
> >> >
> >> > Signed-off-by: Eugenio Pérez 
> >> > ---
> >> >  qapi/net.json  | 23 +++
> >> >  hw/virtio/vhost-vdpa.c |  8 
> >> >  2 files changed, 31 insertions(+)
> >> >
> >> > diff --git a/qapi/net.json b/qapi/net.json
> >> > index 7fab2e7cd8..a2c30fd455 100644
> >> > --- a/qapi/net.json
> >> > +++ b/qapi/net.json
> >> > @@ -79,6 +79,29 @@
> >> >  { 'command': 'netdev_del', 'data': {'id': 'str'},
> >> >'allow-preconfig': true }
> >> >
> >> > +##
> >> > +# @x-vhost-enable-shadow-vq:
> >> > +#
> >> > +# Use vhost shadow virtqueue.
> >> > +#
> >> > +# @name: the device name of the VirtIO device
> >>
> >> Is this a qdev ID?  A network client name?
> >
> > At this moment is the virtio device name, the one specified at the
> > call of "virtio_init". But this should change, maybe the qdev id or
> > something that can be provided by the command line fits better here.
>
> To refer to a device backend, we commonly use a backend-specific ID.
> For network backends, that's NetClientState member name.
>

Ok so I will use the NetClientState member name, it fits way better
here than the virtio device name.

Thanks!

> To refer to a device frontend, we commonly use a qdev ID or a QOM path.
>
> [...]
>

Re: [PATCH v3 3/4] s390x: topology: CPU topology objects and structures

2021-10-14 Thread Pierre Morel





On 10/14/21 09:16, Thomas Huth wrote:

On 16/09/2021 15.50, Pierre Morel wrote:

We use new objects to have a dynamic administration of the CPU topology.
The highest level object in this implementation is the s390 book and
in this first implementation of CPU topology for S390 we have a single
book.
The book is built as a SYSBUS bridge during the CPU initialization.

Every object under this single book will be build dynamically
immediately after a CPU has be realized if it is needed.
The CPU will fill the sockets once after the other, according to the
number of core per socket defined during the smp parsing.

Each CPU inside a socket will be represented by a bit in a 64bit
unsigned long. Set on plug and clear on unplug of a CPU.

Signed-off-by: Pierre Morel 
---
  hw/s390x/cpu-topology.c | 353 
  hw/s390x/meson.build    |   1 +
  hw/s390x/s390-virtio-ccw.c  |   4 +
  include/hw/s390x/cpu-topology.h |  67 ++
  target/s390x/cpu.h  |  47 +
  5 files changed, 472 insertions(+)
  create mode 100644 hw/s390x/cpu-topology.c
  create mode 100644 include/hw/s390x/cpu-topology.h

diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
new file mode 100644
index 00..f0ffd86a4f
--- /dev/null
+++ b/hw/s390x/cpu-topology.c
@@ -0,0 +1,353 @@
+/*
+ * CPU Topology
+ *
+ * Copyright 2021 IBM Corp.
+ * Author(s): Pierre Morel 
+
+ * This work is licensed under the terms of the GNU GPL, version 2 or 
(at

+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "hw/sysbus.h"
+#include "hw/s390x/cpu-topology.h"
+#include "hw/qdev-properties.h"
+#include "hw/boards.h"
+#include "qemu/typedefs.h"
+#include "target/s390x/cpu.h"
+#include "hw/s390x/s390-virtio-ccw.h"
+
+static S390TopologyCores *s390_create_cores(S390TopologySocket *socket,
+    int origin)
+{
+    DeviceState *dev;
+    S390TopologyCores *cores;
+    const MachineState *ms = MACHINE(qdev_get_machine());
+
+    if (socket->bus->num_children >= ms->smp.cores) {
+    return NULL;
+    }
+
+    dev = qdev_new(TYPE_S390_TOPOLOGY_CORES);
+    qdev_realize_and_unref(dev, socket->bus, &error_fatal);
+
+    cores = S390_TOPOLOGY_CORES(dev);
+    cores->origin = origin;
+    socket->cnt += 1;
+
+    return cores;
+}
+
+static S390TopologySocket *s390_create_socket(S390TopologyBook *book, 
int id)

+{
+    DeviceState *dev;
+    S390TopologySocket *socket;
+    const MachineState *ms = MACHINE(qdev_get_machine());
+
+    if (book->bus->num_children >= ms->smp.sockets) {
+    return NULL;
+    }
+
+    dev = qdev_new(TYPE_S390_TOPOLOGY_SOCKET);
+    qdev_realize_and_unref(dev, book->bus, &error_fatal);
+
+    socket = S390_TOPOLOGY_SOCKET(dev);
+    socket->socket_id = id;
+    book->cnt++;
+
+    return socket;
+}
+
+/*
+ * s390_get_cores:
+ * @socket: the socket to search into
+ * @origin: the origin specified for the S390TopologyCores
+ *
+ * returns a pointer to a S390TopologyCores structure within a socket 
having

+ * the specified origin.
+ * First search if the socket is already containing the 
S390TopologyCores

+ * structure and if not create one with this origin.
+ */
+static S390TopologyCores *s390_get_cores(S390TopologySocket *socket, 
int origin)

+{
+    S390TopologyCores *cores;
+    BusChild *kid;
+
+    QTAILQ_FOREACH(kid, &socket->bus->children, sibling) {
+    cores = S390_TOPOLOGY_CORES(kid->child);
+    if (cores->origin == origin) {
+    return cores;
+    }
+    }
+    return s390_create_cores(socket, origin);
+}
+
+/*
+ * s390_get_socket:
+ * @book: The book to search into
+ * @socket_id: the identifier of the socket to search for
+ *
+ * returns a pointer to a S390TopologySocket structure within a book 
having

+ * the specified socket_id.
+ * First search if the book is already containing the S390TopologySocket
+ * structure and if not create one with this socket_id.
+ */
+static S390TopologySocket *s390_get_socket(S390TopologyBook *book,
+   int socket_id)
+{
+    S390TopologySocket *socket;
+    BusChild *kid;
+
+    QTAILQ_FOREACH(kid, &book->bus->children, sibling) {
+    socket = S390_TOPOLOGY_SOCKET(kid->child);
+    if (socket->socket_id == socket_id) {
+    return socket;
+    }
+    }
+    return s390_create_socket(book, socket_id);
+}
+
+/*
+ * s390_topology_new_cpu:
+ * @core_id: the core ID is machine wide
+ *
+ * We have a single book returned by s390_get_topology(),
+ * then we build the hierarchy on demand.
+ * Note that we do not destroy the hierarchy on error creating
+ * an entry in the topology, we just keep it empty.
+ * We do not need to worry about not finding a topology level
+ * entry this would have been caught during smp parsing.
+ */
+void s390_topology_new_cpu(int core_id)
+{
+

Re: [PATCH v3 0/3] hw/arm/virt_acpi_build: Upgrate the IORT table up to revision E.b

2021-10-14 Thread Eric Auger

Hi,

On 10/14/21 1:56 PM, Eric Auger wrote:
> This series upgrades the ACPI IORT table up to the E.b
> specification revision. One of the goal of this upgrade
> is to allow the addition of RMR nodes along with the SMMUv3.

while I have sent a separate RFC to expose RMR nodes along with SMMUv3 I
must aknowledge this will take time to be upstreamed (SMMUv3 nested
stage thingy), if ever, and this anyway requires additional time to mature:
- there is one problem with the IORT spec which mandates 1 RMR node per
BDF and also
- there is the issue of the boot preserve_config which was known to
introduce some regressions and which is also mandated by the IORT spec.

So I wanted to emphasize that this IORT upgrade does not come at the
moment with any new feature. My understanding is old guests should work
fine with this new revision but if you feel that's a bad thing, please
raise your voice.

Thanks

Eric
>
> The latest IORT specification (ARM DEN 0049E.b) can be found at
> IO Remapping Table - Platform Design Document
> https://developer.arm.com/documentation/den0049/latest/
>
> This series can be found at
> https://github.com/eauger/qemu.git
> branch: iort_Eb_v3
>
> History:
> v2 -> v3:
> - comment on IORT node ID (Igor)
> - single mapping disabled comment
> - Added Jean and Igor's R-b
> - added diff to patvch 3/3
>
> v1 -> v2:
> - fix Revision value in ITS and SMMUv3 nodes (Phil)
> - Increment an identifier (Phil)
>
> Eric Auger (3):
>   tests/acpi: Get prepared for IORT E.b revision upgrade
>   hw/arm/virt-acpi-build: IORT upgrade up to revision E.b
>   tests/acpi: Generate reference blob for IORT rev E.b
>
>  hw/arm/virt-acpi-build.c  |  48 ++
>  tests/data/acpi/virt/IORT | Bin 124 -> 128 bytes
>  tests/data/acpi/virt/IORT.memhp   | Bin 124 -> 128 bytes
>  tests/data/acpi/virt/IORT.numamem | Bin 124 -> 128 bytes
>  tests/data/acpi/virt/IORT.pxb | Bin 124 -> 128 bytes
>  5 files changed, 29 insertions(+), 19 deletions(-)
>

Re: [PATCH v3 0/3] hw/arm/virt_acpi_build: Upgrate the IORT table up to revision E.b

2021-10-14 Thread Michael S. Tsirkin

On Thu, Oct 14, 2021 at 01:56:40PM +0200, Eric Auger wrote:
> This series upgrades the ACPI IORT table up to the E.b
> specification revision. One of the goal of this upgrade
> is to allow the addition of RMR nodes along with the SMMUv3.
> 
> The latest IORT specification (ARM DEN 0049E.b) can be found at
> IO Remapping Table - Platform Design Document
> https://developer.arm.com/documentation/den0049/latest/
> 
> This series can be found at
> https://github.com/eauger/qemu.git
> branch: iort_Eb_v3

ACPI things look good

Reviewed-by: Michael S. Tsirkin 

ARM only so ARM tree?

> History:
> v2 -> v3:
> - comment on IORT node ID (Igor)
> - single mapping disabled comment
> - Added Jean and Igor's R-b
> - added diff to patvch 3/3
> 
> v1 -> v2:
> - fix Revision value in ITS and SMMUv3 nodes (Phil)
> - Increment an identifier (Phil)
> 
> Eric Auger (3):
>   tests/acpi: Get prepared for IORT E.b revision upgrade
>   hw/arm/virt-acpi-build: IORT upgrade up to revision E.b
>   tests/acpi: Generate reference blob for IORT rev E.b
> 
>  hw/arm/virt-acpi-build.c  |  48 ++
>  tests/data/acpi/virt/IORT | Bin 124 -> 128 bytes
>  tests/data/acpi/virt/IORT.memhp   | Bin 124 -> 128 bytes
>  tests/data/acpi/virt/IORT.numamem | Bin 124 -> 128 bytes
>  tests/data/acpi/virt/IORT.pxb | Bin 124 -> 128 bytes
>  5 files changed, 29 insertions(+), 19 deletions(-)
> 
> -- 
> 2.26.3

Re: [RFC PATCH v4 09/20] vdpa: Save call_fd in vhost-vdpa

2021-10-14 Thread Eugenio Perez Martin

On Wed, Oct 13, 2021 at 5:43 AM Jason Wang  wrote:
>
>
> 在 2021/10/1 下午3:05, Eugenio Pérez 写道:
> > We need to know it to switch to Shadow VirtQueue.
> >
> > Signed-off-by: Eugenio Pérez 
> > ---
> >   include/hw/virtio/vhost-vdpa.h | 2 ++
> >   hw/virtio/vhost-vdpa.c | 5 +
> >   2 files changed, 7 insertions(+)
> >
> > diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> > index 48aae59d8e..fddac248b3 100644
> > --- a/include/hw/virtio/vhost-vdpa.h
> > +++ b/include/hw/virtio/vhost-vdpa.h
> > @@ -30,6 +30,8 @@ typedef struct vhost_vdpa {
> >   GPtrArray *shadow_vqs;
> >   struct vhost_dev *dev;
> >   QLIST_ENTRY(vhost_vdpa) entry;
> > +/* File descriptor the device uses to call VM/SVQ */
> > +int call_fd[VIRTIO_QUEUE_MAX];
>
>
> Any reason we don't do this for kick_fd or why
> virtio_queue_get_guest_notifier() can't work here? Need a comment or
> commit log.
>
> I think we need to have a consistent way to handle both kick and call fd.
>
> Thanks
>

The reasons for it have been given in answers to patch 08/20, since
both have converged to it somehow. Please let me know if you think
otherwise and this needs to be continued here.

Thanks!

>
> >   VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
> >   } VhostVDPA;
> >
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 36c954a779..57a857444a 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -652,7 +652,12 @@ static int vhost_vdpa_set_vring_kick(struct vhost_dev 
> > *dev,
> >   static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
> >  struct vhost_vring_file *file)
> >   {
> > +struct vhost_vdpa *v = dev->opaque;
> > +int vdpa_idx = vhost_vdpa_get_vq_index(dev, file->index);
> > +
> >   trace_vhost_vdpa_set_vring_call(dev, file->index, file->fd);
> > +
> > +v->call_fd[vdpa_idx] = file->fd;
> >   return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
> >   }
> >
>

Re: [PATCH v3] tests: qtest: Add virtio-iommu test

2021-10-14 Thread Jean-Philippe Brucker

Hi Eric,

On Thu, Oct 14, 2021 at 04:34:05AM -0400, Eric Auger wrote:
> Add the framework to test the virtio-iommu-pci device
> and tests exercising the attach/detach, map/unmap API.
> 
> Signed-off-by: Eric Auger 
> Acked-by: Thomas Huth 
> 
> ---
> 
> This applies on top of jean-Philippe's
> [PATCH v4 00/11] virtio-iommu: Add ACPI support
> branch can be found at:
> https://github.com/eauger/qemu.git
> branch qtest-virtio-iommu-v3
> 
> To run the tests:
> make tests/qtest/qos-test
> cd build
> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon 
> QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64  tests/qtest/qos-test

Looks like some archs cannot run the test:

$ make check# built with all targets
qemu-system-arm: -device virtio-iommu-device: VIRTIO-IOMMU is not attached to 
any PCI bus!
Broken pipe
ERROR qtest-arm/qos-test - too few tests run (expected 80, got 75)

Also, should the test run on aarch64?


[...]
> diff --git a/tests/qtest/virtio-iommu-test.c b/tests/qtest/virtio-iommu-test.c
> new file mode 100644
> index 00..ac4d38c779
> --- /dev/null
> +++ b/tests/qtest/virtio-iommu-test.c
> @@ -0,0 +1,299 @@
> +/*
> + * QTest testcase for VirtIO IOMMU
> + *
> + * Copyright (c) 2021 Red Hat, Inc.
> + *
> + * Authors:
> + *  Eric Auger 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or (at 
> your
> + * option) any later version.  See the COPYING file in the top-level 
> directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "libqtest-single.h"
> +#include "qemu/module.h"
> +#include "libqos/qgraph.h"
> +#include "libqos/virtio-iommu.h"
> +#include "hw/virtio/virtio-iommu.h"
> +
> +#define PCI_SLOT_HP 0x06
> +#define QVIRTIO_IOMMU_TIMEOUT_US (30 * 1000 * 1000)
> +
> +static QGuestAllocator *alloc;
> +
> +static void pci_config(void *obj, void *data, QGuestAllocator *t_alloc)
> +{
> +QVirtioIOMMU *v_iommu = obj;
> +QVirtioDevice *dev = v_iommu->vdev;
> +uint64_t input_range_start = qvirtio_config_readq(dev, 8);
> +uint64_t input_range_end = qvirtio_config_readq(dev, 16);
> +uint32_t domain_range_start = qvirtio_config_readl(dev, 24);
> +uint32_t domain_range_end = qvirtio_config_readl(dev, 28);
> +
> +g_assert_cmpint(input_range_start, ==, 0);
> +g_assert_cmphex(input_range_end, ==, UINT64_MAX);
> +g_assert_cmpint(domain_range_start, ==, 0);
> +g_assert_cmpint(domain_range_end, ==, 32);

By the way, this value seems to be left from when the config declared a
number of domain bits. It's now a range so the value could be UINT32_MAX.
Right now the driver can't manage more than 32 endpoints at a time.
I have a patch changing that but planning to send later, it doesn't feel
urgent.

> +}
> +
> +/**
> + * send_attach_detach - Send an attach/detach command to the device
> + * @type: VIRTIO_IOMMU_T_ATTACH/VIRTIO_IOMMU_T_DETACH
> + * @domain: domain the end point is attached to
> + * @ep: end-point

("endpoint"?)

> + */
> +static int send_attach_detach(QTestState *qts, QVirtioIOMMU *v_iommu,
> +  uint8_t type, uint32_t domain, uint32_t ep)
> +{
> +QVirtioDevice *dev = v_iommu->vdev;
> +QVirtQueue *vq = v_iommu->vq;
> +uint64_t ro_addr, wr_addr;
> +uint32_t free_head;
> +struct virtio_iommu_req_attach req; /* same layout as detach */

Should the reserved fields be initialized to zero?

The test fails here with my recent bypass patch, which sanity-checks the
new flags field. But I could change the test in the bypass series, since
the test doesn't fail as is.

On a related note, the spec says that the device MUST reject the request
if field reserved is not zero. I can also send a patch for that.

> +size_t ro_size = sizeof(req) - sizeof(struct virtio_iommu_req_tail);
> +size_t wr_size = sizeof(struct virtio_iommu_req_tail);
> +struct virtio_iommu_req_tail buffer;
> +int ret;
> +
> +req.head.type = type;
> +req.domain = domain;
> +req.endpoint = ep;

A driver would explicitly write little-endian in there with cpu_to_le*().
I guess that also matters here if the test could run on a big-endian host?

> +
> +ro_addr = guest_alloc(alloc, ro_size);
> +wr_addr = guest_alloc(alloc, wr_size);
> +
> +qtest_memwrite(qts, ro_addr, &req, ro_size);
> +free_head = qvirtqueue_add(qts, vq, ro_addr, ro_size, false, true);
> +qvirtqueue_add(qts, vq, wr_addr, wr_size, true, false);
> +qvirtqueue_kick(qts, dev, vq, free_head);
> +qvirtio_wait_used_elem(qts, dev, vq, free_head, NULL,
> +   QVIRTIO_IOMMU_TIMEOUT_US);
> +qtest_memread(qts, wr_addr, &buffer, wr_size);
> +ret = buffer.status;

Could check that the rest of the buffer is still 0

> +guest_free(alloc, ro_addr);
> +guest_free(alloc, wr_addr);
> +return ret;
> +}
> +
> +/**
> + * send_map - Send a map command to the device
> + * @domain: domain the new binding is attached to

(what is the new binding?)

> + * @v

Re: [PATCH 00/11] 9p: Add support for Darwin

2021-10-14 Thread Will Cohen

Correct. It's been tested and functions when applied to QEMU master, with
host running macOS Big Sur 11.6 (personal machine) using client 9p2000.L
(taking a cue from the guest mounting instructions on
https://wiki.qemu.org/Documentation/9psetup).

On Thu, Oct 14, 2021 at 7:57 AM Christian Schoenebeck <
qemu_...@crudebyte.com> wrote:

> On Donnerstag, 14. Oktober 2021 12:48:55 CEST Will Cohen wrote:
> > Many thanks for all the clarifications — it’s my first time using
> > git-send-email and first time with mailing-list-based devel workflows.
> Will
> > adjust accordingly, work through gitlab, and eventually resend via
> > git-publish as v2.
>
> So the intended use case is macOS being host.
>
> Has this been tested, and if yes, using which 9p client and which macOS
> version?
>
> Best regards,
> Christian Schoenebeck
>
>
>

Re: [RFC PATCH v4 10/20] vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call

2021-10-14 Thread Eugenio Perez Martin

On Wed, Oct 13, 2021 at 5:43 AM Jason Wang  wrote:
>
>
> 在 2021/10/1 下午3:05, Eugenio Pérez 写道:
> > Signed-off-by: Eugenio Pérez 
> > ---
> >   hw/virtio/vhost-vdpa.c | 17 ++---
> >   1 file changed, 14 insertions(+), 3 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 57a857444a..bc34de2439 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -649,16 +649,27 @@ static int vhost_vdpa_set_vring_kick(struct vhost_dev 
> > *dev,
> >   return vhost_vdpa_call(dev, VHOST_SET_VRING_KICK, file);
> >   }
> >
> > +static int vhost_vdpa_set_vring_dev_call(struct vhost_dev *dev,
> > + struct vhost_vring_file *file)
> > +{
> > +trace_vhost_vdpa_set_vring_call(dev, file->index, file->fd);
> > +return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
> > +}
> > +
> >   static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
> >  struct vhost_vring_file *file)
> >   {
> >   struct vhost_vdpa *v = dev->opaque;
> >   int vdpa_idx = vhost_vdpa_get_vq_index(dev, file->index);
> >
> > -trace_vhost_vdpa_set_vring_call(dev, file->index, file->fd);
> > -
> >   v->call_fd[vdpa_idx] = file->fd;
> > -return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
> > +if (v->shadow_vqs_enabled) {
> > +VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, 
> > vdpa_idx);
> > +vhost_svq_set_guest_call_notifier(svq, file->fd);
> > +return 0;
> > +} else {
> > +return vhost_vdpa_set_vring_dev_call(dev, file);
> > +}
>
>
> I feel like we should do the same for kick fd.
>
> Thanks
>

I think this also has been answered on 08/20, but feel free to tell me
otherwise if I missed something.

Thanks!

>
> >   }
> >
> >   static int vhost_vdpa_get_features(struct vhost_dev *dev,
>

[PATCH] virtiofsd: Error on bad socket group name

2021-10-14 Thread Dr. David Alan Gilbert (git)

From: "Dr. David Alan Gilbert" 

Make the '--socket-group=' option fail if the group name is unknown:

./tools/virtiofsd/virtiofsd  --socket-group=zaphod
vhost socket: unable to find group 'zaphod'

Reported-by: Xiaoling Gao 
Signed-off-by: Dr. David Alan Gilbert 
---
 tools/virtiofsd/fuse_virtio.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index 8f4fd165b9..39eebffb62 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -999,6 +999,13 @@ static int fv_create_listen_socket(struct fuse_session *se)
  "vhost socket failed to set group to %s (%d): %m\n",
  se->vu_socket_group, g->gr_gid);
 }
+} else {
+fuse_log(FUSE_LOG_ERR,
+ "vhost socket: unable to find group '%s'\n",
+ se->vu_socket_group);
+close(listen_sock);
+umask(old_umask);
+return -1;
 }
 }
 umask(old_umask);
-- 
2.31.1

Re: [PATCH 00/11] 9p: Add support for Darwin

2021-10-14 Thread Christian Schoenebeck

On Donnerstag, 14. Oktober 2021 14:22:19 CEST Will Cohen wrote:
> Correct. It's been tested and functions when applied to QEMU master, with
> host running macOS Big Sur 11.6 (personal machine) using client 9p2000.L
> (taking a cue from the guest mounting instructions on
> https://wiki.qemu.org/Documentation/9psetup).

So it was the Linux kernel's 9p client on guest side with 9p 'local' fs driver 
and 9p transport driver was 'virtio-pci'.

I was just wondering if somebody already bothered for macOS being the guest, 
because that use case is a bit more challenging, especially with macOS 11 and 
higher. But I see that's nothing you were into.

> On Thu, Oct 14, 2021 at 7:57 AM Christian Schoenebeck <
> 
> qemu_...@crudebyte.com> wrote:
> > On Donnerstag, 14. Oktober 2021 12:48:55 CEST Will Cohen wrote:
> > > Many thanks for all the clarifications — it’s my first time using
> > > git-send-email and first time with mailing-list-based devel workflows.
> > 
> > Will
> > 
> > > adjust accordingly, work through gitlab, and eventually resend via
> > > git-publish as v2.
> > 
> > So the intended use case is macOS being host.
> > 
> > Has this been tested, and if yes, using which 9p client and which macOS
> > version?
> > 
> > Best regards,
> > Christian Schoenebeck

Re: Is the ppc440 "bamboo" board in QEMU still of any use?

2021-10-14 Thread BALATON Zoltan


On Thu, 14 Oct 2021, Mark Cave-Ayland wrote:

On 14/10/2021 11:47, Christophe Leroy wrote:

Le 14/10/2021 à 12:34, Christophe Leroy a écrit :

Le 14/10/2021 à 11:31, Thomas Huth a écrit :


  Hi,

I tried to build a current Linux kernel for the "bamboo" board and use it 
in QEMU, but QEMU then quickly aborts with:


  pci.c:262: pci_bus_change_irq_level: Assertion `irq_num >= 0' failed.

(or with a "DCR write error" if I try to use the cuImage instead).

I googled a little bit and found this discussion:

https://qemu-devel.nongnu.narkive.com/vYHona3u/emulating-powerpc-440ep-with-qemu-system-ppcemb#post2 

Seems like this board was used for KVM on the PPC440 only, and has never 
been enabled with the TCG emulation?


Well, KVM support on the 440 has been removed years ago already:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b2677b8dd8de0dc1496ede4da09b9dfd59f15cea 

So is this "bamboo" board dead code in QEMU now? Or does anybody still 
have a kernel binary which could be used for testing it? Note: This board 
does not support "-bios", so u-boot or other firmwares are certainly also 
not an option here...

Should we mark "bamboo" as deprecated nowadays?



I have the following change in QEMU to be able to run the bamboo, found it 
some time ago via google (can't remember where):


diff --git a/hw/ppc/ppc4xx_pci.c b/hw/ppc/ppc4xx_pci.c
index 8147ba6f94..600e89e791 100644
--- a/hw/ppc/ppc4xx_pci.c
+++ b/hw/ppc/ppc4xx_pci.c
@@ -246,7 +246,7 @@ static int ppc4xx_pci_map_irq(PCIDevice *pci_dev, int 
irq_num)


  trace_ppc4xx_pci_map_irq(pci_dev->devfn, irq_num, slot);

-    return slot - 1;
+    return slot ? slot - 1 : slot;
  }

  static void ppc4xx_pci_set_irq(void *opaque, int irq_num, int level)
---

It's probably no the final change, but at least it allows booting bamboo 
on qemu again.




Found the source : 
https://www.mail-archive.com/qemu-devel@nongnu.org/msg769121.html


Ah yes, that thread rings a bell. I think the important part was in my 
initial reply at 
https://www.mail-archive.com/qemu-devel@nongnu.org/msg769115.html: in other 
words ppc4xx_pci_map_irq() function expects the IRQ number to range from 1 to 
4.


When I looked at this the issue was caused by the guest writing to PCI 
configuration space to disable PCI interrupts: this ends up calling 
pci_update_irq_disabled() as below:


/* Called after interrupt disabled field update in config space,
* assert/deassert interrupts if necessary.
* Gets original interrupt disable bit value (before update). */
static void pci_update_irq_disabled(PCIDevice *d, int was_irq_disabled)
{
   int i, disabled = pci_irq_disabled(d);
   if (disabled == was_irq_disabled)
   return;
   for (i = 0; i < PCI_NUM_PINS; ++i) {
   int state = pci_irq_state(d, i);
   pci_change_irq_level(d, i, disabled ? -state : state);
   }
}

Since the IRQ is disabled pci_change_irq_level() ends up being called with -1 
which triggers the assert().


My feeling is that the existing assert() is correct, since from what I can 
see without it there would be an IRQ array underflow, however I wasn't sure 
whether passing a negative number to pci_change_irq_level() is supposed to be 
valid?


A comment from Peter Maydell in hw/ppc/ppc440_pcix.c which is similar code 
for sam460ex says:


/*
 * All four IRQ[ABCD] pins from all slots are tied to a single board
 * IRQ, so our mapping function here maps everything to IRQ 0.
 * The code in pci_change_irq_level() tracks the number of times
 * the mapped IRQ is asserted and deasserted, so if multiple devices
 * assert an IRQ at the same time the behaviour is correct.
 *
 * This may need further refactoring for boards that use multiple IRQ lines.
 */
static int ppc440_pcix_map_irq(PCIDevice *pci_dev, int irq_num)

So I guess negative values could be valid to mean decrease the number of 
times this was raised but I'm not sure. The sam460ex maps all PCI 
interrupts to a single IRQ so this function just returns 0 and let the 
hw/pci/pci.c deal with the rest. Something similar might work for 440ep 
but don't know what the real hardware does. I could not find any 
explanation or IRQ mapping in the data sheets I have on 440ep but this may 
be specific to the bamboo board and we don't have much info on that.


Regards,
BALATON Zoltan

Re: [PATCH v3] hw/usb/vt82c686-uhci-pci: Use ISA instead of PCI interrupts

2021-10-14 Thread Gerd Hoffmann

On Thu, Oct 14, 2021 at 12:22:58PM +0200, BALATON Zoltan wrote:
> On Thu, 14 Oct 2021, Gerd Hoffmann wrote:
> > On Wed, Oct 13, 2021 at 02:13:09PM +0200, BALATON Zoltan wrote:
> > > This device is part of a superio/ISA bridge chip and IRQs from it are
> > > routed to an ISA interrupt set by the Interrupt Line PCI config
> > > register. Change uhci_update_irq() to allow this and implement it in
> > > vt82c686-uhci-pci.
> > 
> > Looks good.  There are some unrelated changes in though (whitespace,
> > comments, ...), and the vt82c686-uhci-pci.c changes should be a
> > separate patch.
> 
> So you mean split it into a series of three small patches? Should I do a w4
> with that?

I was thinking about two patches: drop the unrelated stuff, one patch
for the irq signaling change, and one for the vt82c686 changes.

But of course you can add more patches for the other changes, i.e.
dropping the else branch for level = 0 and other small tweaks.

take care,
  Gerd

Re: [PATCH v3] hw/usb/vt82c686-uhci-pci: Use ISA instead of PCI interrupts

2021-10-14 Thread BALATON Zoltan


On Thu, 14 Oct 2021, Gerd Hoffmann wrote:

On Thu, Oct 14, 2021 at 12:22:58PM +0200, BALATON Zoltan wrote:

On Thu, 14 Oct 2021, Gerd Hoffmann wrote:

On Wed, Oct 13, 2021 at 02:13:09PM +0200, BALATON Zoltan wrote:

This device is part of a superio/ISA bridge chip and IRQs from it are
routed to an ISA interrupt set by the Interrupt Line PCI config
register. Change uhci_update_irq() to allow this and implement it in
vt82c686-uhci-pci.


Looks good.  There are some unrelated changes in though (whitespace,
comments, ...), and the vt82c686-uhci-pci.c changes should be a
separate patch.


So you mean split it into a series of three small patches? Should I do a w4
with that?


I was thinking about two patches: drop the unrelated stuff, one patch
for the irq signaling change, and one for the vt82c686 changes.

But of course you can add more patches for the other changes, i.e.
dropping the else branch for level = 0 and other small tweaks.


The tewak for the comment is needed for checkpach, dropping the else is 
just to make the function shorter and IMO more readable. I can do this in 
a previous patch so checkpatch won't complain on the pci irq change or 
need at least the comment change in that patch. I'm thinking about an 
alternative way for the vt82c686 part to avoid using isa_get_irq which has 
a comment saying it should be dropped but not sure about that yet, I'll 
try it and submit a v4 with at least breaking it up to smaller patches.


Thanks,
BALATON Zoltan

Re: [PATCH 2/3] gdbstub: implement NOIRQ support for single step on KVM

2021-10-14 Thread Alex Bennée



Maxim Levitsky  writes:

> On Wed, 2021-10-13 at 16:50 +0100, Alex Bennée wrote:
>> Maxim Levitsky  writes:
>> 
>> > Signed-off-by: Maxim Levitsky 
>> > ---
>> >  accel/kvm/kvm-all.c  | 25 ++
>> >  gdbstub.c| 60 
>> >  include/sysemu/kvm.h | 13 ++
>> >  3 files changed, 88 insertions(+), 10 deletions(-)
>> > 
>> > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>> > index 6b187e9c96..e141260796 100644
>> > --- a/accel/kvm/kvm-all.c
>> > +++ b/accel/kvm/kvm-all.c
>> > @@ -169,6 +169,8 @@ bool kvm_vm_attributes_allowed;
>> >  bool kvm_direct_msi_allowed;
>> >  bool kvm_ioeventfd_any_length_allowed;
>> >  bool kvm_msi_use_devid;
>> > +bool kvm_has_guest_debug;
>> > +int kvm_sstep_flags;
>> >  static bool kvm_immediate_exit;
>> >  static hwaddr kvm_max_slot_size = ~0;
>> >  
>> > @@ -2559,6 +2561,25 @@ static int kvm_init(MachineState *ms)
>> >  kvm_sregs2 =
>> >  (kvm_check_extension(s, KVM_CAP_SREGS2) > 0);
>> >  
>> > +kvm_has_guest_debug =
>> > +(kvm_check_extension(s, KVM_CAP_SET_GUEST_DEBUG) > 0);
>> > +
>> > +kvm_sstep_flags = 0;
>> > +
>> > +if (kvm_has_guest_debug) {
>> > +/* Assume that single stepping is supported */
>> > +kvm_sstep_flags = SSTEP_ENABLE;
>> > +
>> > +int guest_debug_flags =
>> > +kvm_check_extension(s, KVM_CAP_SET_GUEST_DEBUG2);
>> > +
>> > +if (guest_debug_flags > 0) {
>> > +if (guest_debug_flags & KVM_GUESTDBG_BLOCKIRQ) {
>> > +kvm_sstep_flags |= SSTEP_NOIRQ;
>> > +}
>> > +}
>> > +}
>> > +
>> >  kvm_state = s;
>> >  
>> >  ret = kvm_arch_init(ms, s);
>> > @@ -3188,6 +3209,10 @@ int kvm_update_guest_debug(CPUState *cpu, unsigned 
>> > long reinject_trap)
>> >  
>> >  if (cpu->singlestep_enabled) {
>> >  data.dbg.control |= KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP;
>> > +
>> > +if (cpu->singlestep_enabled & SSTEP_NOIRQ) {
>> > +data.dbg.control |= KVM_GUESTDBG_BLOCKIRQ;
>> > +}
>> >  }
>> >  kvm_arch_update_guest_debug(cpu, &data.dbg);
>> >  
>> > diff --git a/gdbstub.c b/gdbstub.c
>> > index 5d8e6ae3cd..48bb803bae 100644
>> > --- a/gdbstub.c
>> > +++ b/gdbstub.c
>> > @@ -368,12 +368,11 @@ typedef struct GDBState {
>> >  gdb_syscall_complete_cb current_syscall_cb;
>> >  GString *str_buf;
>> >  GByteArray *mem_buf;
>> > +int sstep_flags;
>> > +int supported_sstep_flags;
>> >  } GDBState;
>> >  
>> > -/* By default use no IRQs and no timers while single stepping so as to
>> > - * make single stepping like an ICE HW step.
>> > - */
>> > -static int sstep_flags = SSTEP_ENABLE|SSTEP_NOIRQ|SSTEP_NOTIMER;
>> > +static GDBState gdbserver_state;
>> >  
>> >  /* Retrieves flags for single step mode. */
>> >  static int get_sstep_flags(void)
>> > @@ -385,11 +384,10 @@ static int get_sstep_flags(void)
>> >  if (replay_mode != REPLAY_MODE_NONE) {
>> >  return SSTEP_ENABLE;
>> >  } else {
>> > -return sstep_flags;
>> > +return gdbserver_state.sstep_flags;
>> >  }
>> >  }
>> >  
>> > -static GDBState gdbserver_state;
>> >  
>> >  static void init_gdbserver_state(void)
>> >  {
>> > @@ -399,6 +397,23 @@ static void init_gdbserver_state(void)
>> >  gdbserver_state.str_buf = g_string_new(NULL);
>> >  gdbserver_state.mem_buf = g_byte_array_sized_new(MAX_PACKET_LENGTH);
>> >  gdbserver_state.last_packet = 
>> > g_byte_array_sized_new(MAX_PACKET_LENGTH + 4);
>> > +
>> > +
>> > +if (kvm_enabled()) {
>> > +gdbserver_state.supported_sstep_flags = 
>> > kvm_get_supported_sstep_flags();
>> > +} else {
>> > +gdbserver_state.supported_sstep_flags =
>> > +SSTEP_ENABLE | SSTEP_NOIRQ | SSTEP_NOTIMER;
>> > +}
>> 
>> This fails to build:
>> 
>> o -c ../../gdbstub.c
>> ../../gdbstub.c: In function ‘init_gdbserver_state’:
>> ../../gdbstub.c:403:49: error: implicit declaration of function 
>> ‘kvm_get_supported_sstep_flags’ [-Werror=implicit-function-declaration]
>>   403 | gdbserver_state.supported_sstep_flags = 
>> kvm_get_supported_sstep_flags();
>>   | 
>> ^
>> ../../gdbstub.c:403:49: error: nested extern declaration of 
>> ‘kvm_get_supported_sstep_flags’ [-Werror=nested-externs]
>> ../../gdbstub.c: In function ‘gdbserver_start’:
>> ../../gdbstub.c:3531:27: error: implicit declaration of function
>> ‘kvm_supports_guest_debug’; did you mean ‘kvm_update_guest_debug’?
>> [-Werror=implicit-function-declaration]
>>  3531 | if (kvm_enabled() && !kvm_supports_guest_debug()) {
>>   |   ^~~~
>>   |   kvm_update_guest_debug
>> ../../gdbstub.c:3531:27: error: nested extern declaration of 
>> ‘kvm_supports_guest_debug’ [-Werror=nested-externs]
>> cc1: all warnings being treated as er

Re: [PATCH RFC 12/15] virtio-mem: Expose device memory via separate memslots

2021-10-14 Thread David Hildenbrand

On 14.10.21 13:45, Dr. David Alan Gilbert wrote:
> * David Hildenbrand (da...@redhat.com) wrote:
>> KVM nowadays supports a lot of memslots. We want to exploit that in
>> virtio-mem, exposing device memory via separate memslots to the guest
>> on demand, essentially reducing the total size of KVM slots
>> significantly (and thereby metadata in KVM and in QEMU for KVM memory
>> slots) especially when exposing initially only a small amount of memory
>> via a virtio-mem device to the guest, to hotplug more later. Further,
>> not always exposing the full device memory region to the guest reduces
>> the attack surface in many setups without requiring other mechanisms
>> like uffd for protection of unplugged memory.
>>
>> So split the original RAM region via memory region aliases into separate
>> chunks (ending up as individual memslots), and dynamically map the
>> required chunks (falling into the usable region) into the container.
>>
>> For now, we always map the memslots covered by the usable region. In the
>> future, with VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE, we'll be able to map
>> memslots on actual demand and optimize further.
>>
>> Users can specify via the "max-memslots" property how much memslots the
>> virtio-mem device is allowed to use at max. "0" translates to "auto, no
>> limit" and is determinded automatically using a heuristic. When a maximum
>> (> 1) is specified, that auto-determined value is capped. The parameter
>> doesn't have to be migrated and can differ between source and destination.
>> The only reason the parameter exists is not make some corner case setups
>> (multiple large virtio-mem devices assigned to a single virtual NUMA node
>>  with only very limited available memslots, hotplug of vhost devices) work.
>> The parameter will be set to be "0" as default soon, whereby it will remain
>> to be "1" for compat machines.
>>
>> The properties "memslots" and "used-memslots" are read-only.
>>
>> Signed-off-by: David Hildenbrand 
> 
> I think you need to move this patch after the vhost-user patches so that
> you don't break a bisect including vhost-user.

As the default is set to 1 and is set to 0 ("auto") in the last patch in
this series, there should be (almost) no difference regarding vhost-user.

> 
> But I do worry about the effect on vhost-user:

The 4096 limit was certainly more "let's make it extreme so we raise
some eyebrows and we can talk about the implications". I'd be perfectly
happy with 256 or better 512. Anything that's bigger than 32 in case of
virtiofsd :)

>   a) What about external programs like dpdk?

At least initially virtio-mem won't apply to dpdk and similar workloads
(RT). For example, virtio-mem is incompatible with mlock. So I think the
most important use case to optimize for is virtio-mem+virtiofsd
(especially kata).

>   b) I worry if you end up with a LOT of slots you end up with a lot of
> mmap's and fd's in vhost-user; I'm not quite sure what all the effects
> of that will be.

At least for virtio-mem, there will be a small number of fd's, as many
memslots share the same fd, so with virtio-mem it's not an issue.

#VMAs is indeed worth discussing. Usually we can have up to 64k VMAs in
a process. The downside of having many is some reduce pagefault
performance. It really also depends on the target application. Maybe
there should be some libvhost-user toggle, where the application can opt
in to allow more?

-- 
Thanks,

David / dhildenb

Re: [PATCH] failover: allow to pause the VM during the migration

2021-10-14 Thread Dr. David Alan Gilbert

* Laurent Vivier (lviv...@redhat.com) wrote:
> If we want to save a snapshot of a VM to a file, we used to follow the
> following steps:
> 
> 1- stop the VM:
>(qemu) stop
> 
> 2- migrate the VM to a file:
>(qemu) migrate "exec:cat > snapshot"
> 
> 3- resume the VM:
>(qemu) cont
> 
> After that we can restore the snapshot with:
>   qemu-system-x86_64 ... -incoming "exec:cat snapshot"
>   (qemu) cont
> 
> But when failover is configured, it doesn't work anymore.
> 
> As the failover needs to ask the guest OS to unplug the card
> the machine cannot be paused.
> 
> This patch introduces a new migration parameter, "pause-vm", that
> asks the migration to pause the VM during the migration startup
> phase after the the card is unplugged.
> 
> Once the migration is done, we only need to resume the VM with
> "cont" and the card is plugged back:
> 
> 1- set the parameter:
>(qemu) migrate_set_parameter pause-vm on
> 
> 2- migrate the VM to a file:
>(qemu) migrate "exec:cat > snapshot"
> 
>The primary failover card (VFIO) is unplugged and the VM is paused.
> 
> 3- resume the VM:
>(qemu) cont
> 
>The VM restarts and the primary failover card is plugged back
> 
> The VM state sent in the migration stream is "paused", it means
> when the snapshot is loaded or if the stream is sent to a destination
> QEMU, the VM needs to be resumed manually.
> 
> Signed-off-by: Laurent Vivier 

A mix of comments:
  a) As a boolean, this should be a MigrationCapability rather than a
parameter
  b) We already have a pause-before-switchover capability for a pause
that happens later in the flow; so this would be something like
pause-after-unplug
  c) Is this really the right answer?  Could this be done a different
way by doing the unplugs using (a possibly new) qmp command - so
that you can explicitly trigger the unplug prior to the migration?

Dave

> ---
>  qapi/migration.json| 20 +++---
>  include/hw/virtio/virtio-net.h |  1 +
>  hw/net/virtio-net.c| 33 ++
>  migration/migration.c  | 37 +-
>  monitor/hmp-cmds.c |  8 
>  5 files changed, 95 insertions(+), 4 deletions(-)
> 
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 88f07baedd06..86284d96ad2a 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -743,6 +743,10 @@
>  #block device name if there is one, and to their 
> node name
>  #otherwise. (Since 5.2)
>  #
> +# @pause-vm:   Pause the virtual machine before doing the migration.
> +#  This allows QEMU to unplug a card before doing the
> +#  migration as it depends on the guest kernel. (since 
> 6.2)
> +#
>  # Since: 2.4
>  ##
>  { 'enum': 'MigrationParameter',
> @@ -758,7 +762,7 @@
> 'xbzrle-cache-size', 'max-postcopy-bandwidth',
> 'max-cpu-throttle', 'multifd-compression',
> 'multifd-zlib-level' ,'multifd-zstd-level',
> -   'block-bitmap-mapping' ] }
> +   'block-bitmap-mapping', 'pause-vm' ] }
>  
>  ##
>  # @MigrateSetParameters:
> @@ -903,6 +907,10 @@
>  #block device name if there is one, and to their 
> node name
>  #otherwise. (Since 5.2)
>  #
> +# @pause-vm:   Pause the virtual machine before doing the migration.
> +#  This allows QEMU to unplug a card before doing the
> +#  migration as it depends on the guest kernel. (since 
> 6.2)
> +#
>  # Since: 2.4
>  ##
>  # TODO either fuse back into MigrationParameters, or make
> @@ -934,7 +942,8 @@
>  '*multifd-compression': 'MultiFDCompression',
>  '*multifd-zlib-level': 'uint8',
>  '*multifd-zstd-level': 'uint8',
> -'*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
> +'*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ],
> +'*pause-vm': 'bool' } }
>  
>  ##
>  # @migrate-set-parameters:
> @@ -1099,6 +1108,10 @@
>  #block device name if there is one, and to their 
> node name
>  #otherwise. (Since 5.2)
>  #
> +# @pause-vm:   Pause the virtual machine before doing the migration.
> +#  This allows QEMU to unplug a card before doing the
> +#  migration as it depends on the guest kernel. (since 
> 6.2)
> +#
>  # Since: 2.4
>  ##
>  { 'struct': 'MigrationParameters',
> @@ -1128,7 +1141,8 @@
>  '*multifd-compression': 'MultiFDCompression',
>  '*multifd-zlib-level': 'uint8',
>  '*multifd-zstd-level': 'uint8',
> -'*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
> +'*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ],
> +'*pause-vm': 'bool' } }
>  
>  ##
>  # @query-migrate-parameters:
> diff --git a/include

[PATCH v8 7/8] hw/arm/virt-acpi-build: Generate PPTT table

2021-10-14 Thread Yanan Wang

Generate PPTT table for Arm virt machines.

Signed-off-by: Yanan Wang 
Reviewed-by: Andrew Jones 
---
 hw/arm/virt-acpi-build.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 6cec97352b..658a0a5d39 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -875,13 +875,19 @@ void virt_acpi_build(VirtMachineState *vms, 
AcpiBuildTables *tables)
 dsdt = tables_blob->len;
 build_dsdt(tables_blob, tables->linker, vms);
 
-/* FADT MADT GTDT MCFG SPCR pointed to by RSDT */
+/* FADT MADT PPTT GTDT MCFG SPCR pointed to by RSDT */
 acpi_add_table(table_offsets, tables_blob);
 build_fadt_rev5(tables_blob, tables->linker, vms, dsdt);
 
 acpi_add_table(table_offsets, tables_blob);
 build_madt(tables_blob, tables->linker, vms);
 
+if (!vmc->no_cpu_topology) {
+acpi_add_table(table_offsets, tables_blob);
+build_pptt(tables_blob, tables->linker, ms,
+   vms->oem_id, vms->oem_table_id);
+}
+
 acpi_add_table(table_offsets, tables_blob);
 build_gtdt(tables_blob, tables->linker, vms);
 
-- 
2.19.1

[PATCH v8 3/8] hw/arm/virt: Add cpu-map to device tree

2021-10-14 Thread Yanan Wang

From: Andrew Jones 

Support device tree CPU topology descriptions.

In accordance with the Devicetree Specification, the Linux Doc
"arm/cpus.yaml" requires that cpus and cpu nodes in the DT are
present. And we have already met the requirement by generating
/cpus/cpu@* nodes for members within ms->smp.cpus. Accordingly,
we should also create subnodes in cpu-map for the present cpus,
each of which relates to an unique cpu node.

The Linux Doc "cpu/cpu-topology.txt" states that the hierarchy
of CPUs in a SMP system is defined through four entities and
they are socket/cluster/core/thread. It is also required that
a socket node's child nodes must be one or more cluster nodes.
Given that currently we are only provided with information of
socket/core/thread, we assume there is one cluster child node
in each socket node when creating cpu-map.

Signed-off-by: Andrew Jones 
Co-developed-by: Yanan Wang 
Signed-off-by: Yanan Wang 
---
 hw/arm/virt.c | 70 +++
 1 file changed, 60 insertions(+), 10 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index d241516523..f80af19cd3 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -351,20 +351,21 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
 int cpu;
 int addr_cells = 1;
 const MachineState *ms = MACHINE(vms);
+const VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
 int smp_cpus = ms->smp.cpus;
 
 /*
- * From Documentation/devicetree/bindings/arm/cpus.txt
- *  On ARM v8 64-bit systems value should be set to 2,
- *  that corresponds to the MPIDR_EL1 register size.
- *  If MPIDR_EL1[63:32] value is equal to 0 on all CPUs
- *  in the system, #address-cells can be set to 1, since
- *  MPIDR_EL1[63:32] bits are not used for CPUs
- *  identification.
+ * See Linux Documentation/devicetree/bindings/arm/cpus.yaml
+ * On ARM v8 64-bit systems value should be set to 2,
+ * that corresponds to the MPIDR_EL1 register size.
+ * If MPIDR_EL1[63:32] value is equal to 0 on all CPUs
+ * in the system, #address-cells can be set to 1, since
+ * MPIDR_EL1[63:32] bits are not used for CPUs
+ * identification.
  *
- *  Here we actually don't know whether our system is 32- or 64-bit one.
- *  The simplest way to go is to examine affinity IDs of all our CPUs. If
- *  at least one of them has Aff3 populated, we set #address-cells to 2.
+ * Here we actually don't know whether our system is 32- or 64-bit one.
+ * The simplest way to go is to examine affinity IDs of all our CPUs. If
+ * at least one of them has Aff3 populated, we set #address-cells to 2.
  */
 for (cpu = 0; cpu < smp_cpus; cpu++) {
 ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu));
@@ -407,8 +408,57 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
 ms->possible_cpus->cpus[cs->cpu_index].props.node_id);
 }
 
+if (!vmc->no_cpu_topology) {
+qemu_fdt_setprop_cell(ms->fdt, nodename, "phandle",
+  qemu_fdt_alloc_phandle(ms->fdt));
+}
+
 g_free(nodename);
 }
+
+if (!vmc->no_cpu_topology) {
+/*
+ * Add vCPU topology description through fdt node cpu-map.
+ *
+ * See Linux Documentation/devicetree/bindings/cpu/cpu-topology.txt
+ * In a SMP system, the hierarchy of CPUs can be defined through
+ * four entities that are used to describe the layout of CPUs in
+ * the system: socket/cluster/core/thread.
+ *
+ * A socket node represents the boundary of system physical package
+ * and its child nodes must be one or more cluster nodes. A system
+ * can contain several layers of clustering within a single physical
+ * package and cluster nodes can be contained in parent cluster nodes.
+ *
+ * Given that cluster is not yet supported in the vCPU topology,
+ * we currently generate one cluster node within each socket node
+ * by default.
+ */
+qemu_fdt_add_subnode(ms->fdt, "/cpus/cpu-map");
+
+for (cpu = smp_cpus - 1; cpu >= 0; cpu--) {
+char *cpu_path = g_strdup_printf("/cpus/cpu@%d", cpu);
+char *map_path;
+
+if (ms->smp.threads > 1) {
+map_path = g_strdup_printf(
+"/cpus/cpu-map/socket%d/cluster0/core%d/thread%d",
+cpu / (ms->smp.cores * ms->smp.threads),
+(cpu / ms->smp.threads) % ms->smp.cores,
+cpu % ms->smp.threads);
+} else {
+map_path = g_strdup_printf(
+"/cpus/cpu-map/socket%d/cluster0/core%d",
+cpu / ms->smp.cores,
+cpu % ms->smp.cores);
+}
+qemu_fdt_add_path(ms->fdt, map_path);
+qemu_fdt_setprop_phandle(ms->fdt, map_path, "cpu", cpu_path);
+
+

[PATCH v8 5/8] hw/acpi/aml-build: Add PPTT table

2021-10-14 Thread Yanan Wang

From: Andrew Jones 

Add the Processor Properties Topology Table (PPTT) used to
describe CPU topology information to ACPI guests.

Note, a DT-boot Linux guest with a non-flat CPU topology will
see socket and core IDs being sequential integers starting
from zero, which is different from ACPI-boot Linux guest,
e.g. with -smp 4,sockets=2,cores=2,threads=1

a DT boot produces:

 cpu:  0 package_id:  0 core_id:  0
 cpu:  1 package_id:  0 core_id:  1
 cpu:  2 package_id:  1 core_id:  0
 cpu:  3 package_id:  1 core_id:  1

an ACPI boot produces:

 cpu:  0 package_id: 36 core_id:  0
 cpu:  1 package_id: 36 core_id:  1
 cpu:  2 package_id: 96 core_id:  2
 cpu:  3 package_id: 96 core_id:  3

This is due to several reasons:

 1) DT cpu nodes do not have an equivalent field to what the PPTT
ACPI Processor ID must be, i.e. something equal to the MADT CPU
UID or equal to the UID of an ACPI processor container. In both
ACPI cases those are platform dependant IDs assigned by the
vendor.

 2) While QEMU is the vendor for a guest, if the topology specifies
SMT (> 1 thread), then, with ACPI, it is impossible to assign a
core-id the same value as a package-id, thus it is not possible
to have package-id=0 and core-id=0. This is because package and
core containers must be in the same ACPI namespace and therefore
must have unique UIDs.

 3) ACPI processor containers are not mandatorily required for PPTT
tables to be used and, due to the limitations of which IDs are
selected described above in (2), they are not helpful for QEMU,
so we don't build them with this patch. In the absence of them,
Linux assigns its own unique IDs. The maintainers have chosen not
to use counters from zero, but rather ACPI table offsets, which
explains why the numbers are so much larger than with DT.

 4) When there is no SMT (threads=1) the core IDs for ACPI boot guests
match the logical CPU IDs, because these IDs must be equal to the
MADT CPU UID (as no processor containers are present), and QEMU
uses the logical CPU ID for these MADT IDs.

So in summary, with QEMU as the vendor for the guests, we simply
use sequential integers starting from zero for the non-leaf nodes
but with ID-valid flag unset, so that guest will ignore them and
use table offsets as unique container IDs. And we use logical CPU
IDs for the leaf nodes with the ID-valid flag set, which will be
consistent with MADT.

Signed-off-by: Andrew Jones 
Co-developed-by: Yanan Wang 
Signed-off-by: Yanan Wang 
Reviewed-by: Michael S. Tsirkin 
---
 hw/acpi/aml-build.c | 60 +
 include/hw/acpi/aml-build.h |  3 ++
 2 files changed, 63 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index b7b9db6888..0d50e88e9d 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1990,6 +1990,66 @@ void build_processor_hierarchy_node(GArray *tbl, 
uint32_t flags,
 }
 }
 
+/* ACPI 6.2: 5.2.29 Processor Properties Topology Table (PPTT) */
+void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms,
+const char *oem_id, const char *oem_table_id)
+{
+int pptt_start = table_data->len;
+int uid = 0;
+int socket;
+AcpiTable table = { .sig = "PPTT", .rev = 2,
+.oem_id = oem_id, .oem_table_id = oem_table_id };
+
+acpi_table_begin(&table, table_data);
+
+for (socket = 0; socket < ms->smp.sockets; socket++) {
+uint32_t socket_offset = table_data->len - pptt_start;
+int core;
+
+build_processor_hierarchy_node(
+table_data,
+/*
+ * ACPI 6.2 - Physical package
+ * represents the boundary of a physical package
+ */
+(1 << 0),
+0, socket, NULL, 0);
+
+for (core = 0; core < ms->smp.cores; core++) {
+uint32_t core_offset = table_data->len - pptt_start;
+int thread;
+
+if (ms->smp.threads > 1) {
+build_processor_hierarchy_node(
+table_data,
+/*
+ * ACPI 6.2 - Physical package
+ * doesn't represent the boundary of a physical package
+ */
+(0 << 0),
+socket_offset, core, NULL, 0);
+
+for (thread = 0; thread < ms->smp.threads; thread++) {
+build_processor_hierarchy_node(
+table_data,
+(1 << 1) | /* ACPI 6.2 - ACPI Processor ID valid */
+(1 << 2) | /* ACPI 6.3 - Processor is a Thread */
+(1 << 3),  /* ACPI 6.3 - Node is a Leaf */
+core_offset, uid++, NULL, 0);
+}
+} else {
+build_processor_hierarchy_node(
+table_data,
+(1 << 1) | /* ACPI 6.2 - ACPI Processor ID valid */
+

[PATCH v8 4/8] hw/acpi/aml-build: Add Processor hierarchy node structure

2021-10-14 Thread Yanan Wang

Add a generic API to build Processor hierarchy node structure (Type 0),
which is strictly consistent with descriptions in ACPI 6.2: 5.2.29.1.

This function will be used to build ACPI PPTT table for cpu topology.

Co-developed-by: Ying Fang 
Co-developed-by: Henglong Fan 
Co-developed-by: Yanan Wang 
Signed-off-by: Yanan Wang 
Reviewed-by: Andrew Jones 
Reviewed-by: Michael S. Tsirkin 
---
 hw/acpi/aml-build.c | 26 ++
 include/hw/acpi/aml-build.h |  4 
 2 files changed, 30 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 76af0ebaf9..b7b9db6888 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1964,6 +1964,32 @@ void build_slit(GArray *table_data, BIOSLinker *linker, 
MachineState *ms,
 acpi_table_end(linker, &table);
 }
 
+/* ACPI 6.2: 5.2.29.1 Processor hierarchy node structure (Type 0) */
+void build_processor_hierarchy_node(GArray *tbl, uint32_t flags,
+uint32_t parent, uint32_t id,
+uint32_t *priv_rsrc, uint32_t priv_num)
+{
+int i;
+
+build_append_byte(tbl, 0); /* Type 0 - processor */
+build_append_byte(tbl, 20 + priv_num * 4); /* Length */
+build_append_int_noprefix(tbl, 0, 2);  /* Reserved */
+build_append_int_noprefix(tbl, flags, 4);  /* Flags */
+build_append_int_noprefix(tbl, parent, 4); /* Parent */
+build_append_int_noprefix(tbl, id, 4); /* ACPI Processor ID */
+
+/* Number of private resources */
+build_append_int_noprefix(tbl, priv_num, 4);
+
+/* Private resources[N] */
+if (priv_num > 0) {
+assert(priv_rsrc);
+for (i = 0; i < priv_num; i++) {
+build_append_int_noprefix(tbl, priv_rsrc[i], 4);
+}
+}
+}
+
 /* build rev1/rev3/rev5.1 FADT */
 void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
 const char *oem_id, const char *oem_table_id)
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 3cf6f2c1b9..2c457c8f17 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -489,6 +489,10 @@ void build_srat_memory(GArray *table_data, uint64_t base,
 void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms,
 const char *oem_id, const char *oem_table_id);
 
+void build_processor_hierarchy_node(GArray *tbl, uint32_t flags,
+uint32_t parent, uint32_t id,
+uint32_t *priv_rsrc, uint32_t priv_num);
+
 void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
 const char *oem_id, const char *oem_table_id);
 
-- 
2.19.1

Re: [PATCH 00/11] 9p: Add support for Darwin

2021-10-14 Thread Will Cohen

Correct, sorry for the imprecise language. The use case being contemplated
is limited to Linux as the guest side, specifically for cross-platform
tools where the macOS implementation consists of integrating a Linux VM via
QEMU.

NixOS (updater of the original patch,
https://github.com/NixOS/nixpkgs/pull/122420) would be able to use this to
provide macOS support via a VM.

Lima and Podman as containerization alternatives to Docker would like to
performantly mount volumes between macOS users and their respective VMs.
Lima currently accomplishes this via sshfs, but would like to move to 9p
for stability/performance reasons (https://github.com/lima-vm/lima/issues/20).
Podman has yet to fully settle on an implementation at all due to similar
outstanding concerns, but the furthest along proposed implementation choice
has been 9pfs as well (https://github.com/containers/podman/pull/11454),
pending adoption of such functionality in QEMU upstream.

On Thu, Oct 14, 2021 at 8:55 AM Christian Schoenebeck <
qemu_...@crudebyte.com> wrote:

> On Donnerstag, 14. Oktober 2021 14:22:19 CEST Will Cohen wrote:
> > Correct. It's been tested and functions when applied to QEMU master, with
> > host running macOS Big Sur 11.6 (personal machine) using client 9p2000.L
> > (taking a cue from the guest mounting instructions on
> > https://wiki.qemu.org/Documentation/9psetup).
>
> So it was the Linux kernel's 9p client on guest side with 9p 'local' fs
> driver
> and 9p transport driver was 'virtio-pci'.
>
> I was just wondering if somebody already bothered for macOS being the
> guest,
> because that use case is a bit more challenging, especially with macOS 11
> and
> higher. But I see that's nothing you were into.
>
> > On Thu, Oct 14, 2021 at 7:57 AM Christian Schoenebeck <
> >
> > qemu_...@crudebyte.com> wrote:
> > > On Donnerstag, 14. Oktober 2021 12:48:55 CEST Will Cohen wrote:
> > > > Many thanks for all the clarifications — it’s my first time using
> > > > git-send-email and first time with mailing-list-based devel
> workflows.
> > >
> > > Will
> > >
> > > > adjust accordingly, work through gitlab, and eventually resend via
> > > > git-publish as v2.
> > >
> > > So the intended use case is macOS being host.
> > >
> > > Has this been tested, and if yes, using which 9p client and which macOS
> > > version?
> > >
> > > Best regards,
> > > Christian Schoenebeck
>
>
>

[PATCH v8 1/8] hw/arm/virt: Only describe cpu topology since virt-6.2

2021-10-14 Thread Yanan Wang

On existing older machine types, without cpu topology described
in ACPI or DT, the guest will populate one by default. With the
topology described, it will read the information and set up its
topology as instructed, but that may not be the same as what was
getting used by default. It's possible that an user application
has a dependency on the default topology and if the default one
gets changed it will probably behave differently.

Based on above consideration we'd better only describe topology
information to the guest on 6.2 and later machine types.

Signed-off-by: Yanan Wang 
Reviewed-by: Andrew Jones 
---
 hw/arm/virt.c | 1 +
 include/hw/arm/virt.h | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 7170aaacd5..d241516523 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2816,6 +2816,7 @@ static void virt_machine_6_1_options(MachineClass *mc)
 virt_machine_6_2_options(mc);
 compat_props_add(mc->compat_props, hw_compat_6_1, hw_compat_6_1_len);
 mc->smp_props.prefer_sockets = true;
+vmc->no_cpu_topology = true;
 
 /* qemu ITS was introduced with 6.2 */
 vmc->no_tcg_its = true;
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index b461b8d261..dc6b66ffc8 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -125,11 +125,13 @@ struct VirtMachineClass {
 bool claim_edge_triggered_timers;
 bool smbios_old_sys_ver;
 bool no_highmem_ecam;
-bool no_ged;   /* Machines < 4.2 has no support for ACPI GED device */
+bool no_ged;   /* Machines < 4.2 have no support for ACPI GED device */
 bool kvm_no_adjvtime;
 bool no_kvm_steal_time;
 bool acpi_expose_flash;
 bool no_secure_gpio;
+/* Machines < 6.2 have no support for describing cpu topology to guest */
+bool no_cpu_topology;
 };
 
 struct VirtMachineState {
-- 
2.19.1

[PATCH v8 6/8] tests/data/acpi/virt: Add an empty expected file for PPTT

2021-10-14 Thread Yanan Wang

Add a generic empty binary file for the new introduced PPTT table
under tests/data/acpi/virt, and list it as files to be changed in
tests/qtest/bios-tables-test-allowed-diff.h

Signed-off-by: Yanan Wang 
---
 tests/data/acpi/virt/PPTT   | 0
 tests/qtest/bios-tables-test-allowed-diff.h | 1 +
 2 files changed, 1 insertion(+)
 create mode 100644 tests/data/acpi/virt/PPTT

diff --git a/tests/data/acpi/virt/PPTT b/tests/data/acpi/virt/PPTT
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..cb143a55a6 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,2 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/virt/PPTT",
-- 
2.19.1

Re: [PATCH 0/5] 9pfs: introduce P9Array

2021-10-14 Thread Christian Schoenebeck

On Freitag, 1. Oktober 2021 16:25:22 CEST Christian Schoenebeck wrote:
> This is simply a refactored follow-up of the following previous series
> ("introduce QArray"):
> https://lists.gnu.org/archive/html/qemu-devel/2021-08/msg04530.html
> 
> There was no consensus about a project wide "QArray" shared utility code,
> nor does there seem to be a need for a project wide code, so let's just
> refactor QArray -> P9Array and make it a 9P local type for now:
> https://lists.gnu.org/archive/html/qemu-devel/2021-09/msg07743.html
> 
> Changes QArray v3 -> P9Array v1:
> 
>   * Moved header file, i.e. include/qemu/qarray.h -> fsdev/p9array.h
> 
>   * Refactored *QArray* -> *P9Array*
> 
>   * Refactored P9ARRAY_CREATE -> P9ARRAY_NEW and
> p9array_create_* -> p9array_new_* accordingly
> 
>   * Refactored DECLARE_P9ARRAY_TYPE -> P9ARRAY_DECLARE_TYPE
> 
>   * Refactored DEFINE_P9ARRAY_TYPE -> P9ARRAY_DEFINE_TYPE
> 
>   [No behaviour changes whatsoever]
> 
> Christian Schoenebeck (5):
>   9pfs: introduce P9Array
>   fsdev/p9array.h: check scalar type in P9ARRAY_NEW()
>   9pfs: make V9fsString usable via P9Array API
>   9pfs: make V9fsPath usable via P9Array API
>   9pfs: use P9Array in v9fs_walk()
> 
>  fsdev/9p-marshal.c |   2 +
>  fsdev/9p-marshal.h |   3 +
>  fsdev/file-op-9p.h |   2 +
>  fsdev/p9array.h| 160 +
>  hw/9pfs/9p.c   |  19 ++
>  5 files changed, 174 insertions(+), 12 deletions(-)
>  create mode 100644 fsdev/p9array.h

Queued on 9p.next:
https://github.com/cschoenebeck/qemu/commits/9p.next

Thanks!

Best regards,
Christian Schoenebeck

Best regards,
Christian Schoenebeck

[PATCH v8 2/8] device_tree: Add qemu_fdt_add_path

2021-10-14 Thread Yanan Wang

qemu_fdt_add_path() works like qemu_fdt_add_subnode(), except it
also adds all missing subnodes from the given path. We'll use it
in a coming patch where we will add cpu-map to the device tree.

And we also tweak an error message of qemu_fdt_add_subnode().

Cc: David Gibson 
Cc: Alistair Francis 
Co-developed-by: Andrew Jones 
Signed-off-by: Yanan Wang 
Reviewed-by: David Gibson 
Reviewed-by: Andrew Jones 
---
 include/sysemu/device_tree.h |  1 +
 softmmu/device_tree.c| 44 ++--
 2 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/include/sysemu/device_tree.h b/include/sysemu/device_tree.h
index 8a2fe55622..ef060a9759 100644
--- a/include/sysemu/device_tree.h
+++ b/include/sysemu/device_tree.h
@@ -121,6 +121,7 @@ uint32_t qemu_fdt_get_phandle(void *fdt, const char *path);
 uint32_t qemu_fdt_alloc_phandle(void *fdt);
 int qemu_fdt_nop_node(void *fdt, const char *node_path);
 int qemu_fdt_add_subnode(void *fdt, const char *name);
+int qemu_fdt_add_path(void *fdt, const char *path);
 
 #define qemu_fdt_setprop_cells(fdt, node_path, property, ...) \
 do {  \
diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c
index b621f63fba..3965c834ca 100644
--- a/softmmu/device_tree.c
+++ b/softmmu/device_tree.c
@@ -540,8 +540,8 @@ int qemu_fdt_add_subnode(void *fdt, const char *name)
 
 retval = fdt_add_subnode(fdt, parent, basename);
 if (retval < 0) {
-error_report("FDT: Failed to create subnode %s: %s", name,
- fdt_strerror(retval));
+error_report("%s: Failed to create subnode %s: %s",
+ __func__, name, fdt_strerror(retval));
 exit(1);
 }
 
@@ -549,6 +549,46 @@ int qemu_fdt_add_subnode(void *fdt, const char *name)
 return retval;
 }
 
+/*
+ * qemu_fdt_add_path: Like qemu_fdt_add_subnode(), but will add
+ * all missing subnodes from the given path.
+ */
+int qemu_fdt_add_path(void *fdt, const char *path)
+{
+const char *name;
+const char *p = path;
+int namelen, retval;
+int parent = 0;
+
+if (path[0] != '/') {
+return -1;
+}
+
+while (p) {
+name = p + 1;
+p = strchr(name, '/');
+namelen = p != NULL ? p - name : strlen(name);
+
+retval = fdt_subnode_offset_namelen(fdt, parent, name, namelen);
+if (retval < 0 && retval != -FDT_ERR_NOTFOUND) {
+error_report("%s: Unexpected error in finding subnode %.*s: %s",
+ __func__, namelen, name, fdt_strerror(retval));
+exit(1);
+} else if (retval == -FDT_ERR_NOTFOUND) {
+retval = fdt_add_subnode_namelen(fdt, parent, name, namelen);
+if (retval < 0) {
+error_report("%s: Failed to create subnode %.*s: %s",
+ __func__, namelen, name, fdt_strerror(retval));
+exit(1);
+}
+}
+
+parent = retval;
+}
+
+return retval;
+}
+
 void qemu_fdt_dumpdtb(void *fdt, int size)
 {
 const char *dumpdtb = current_machine->dumpdtb;
-- 
2.19.1

[PATCH v8 0/8] hw/arm/virt: Introduce cpu topology support

2021-10-14 Thread Yanan Wang

Hi,

This is the latest v8 with update in patch #6 and #8. Now only one
generic reference file for PPTT is added in tests/data/acpi/virt.
Machiel and Igor, please help to have a look, thanks!

Description of this series:
Once the view of an accurate virtual cpu topology is provided to guest,
with a well-designed vCPU pinning to the pCPU we may get a huge benefit,
e.g., the scheduling performance improvement. See Dario Faggioli's
research and the related performance tests in [1] for reference.

This patch series introduces cpu topology support for Arm platform.
Both cpu-map in DT and ACPI PPTT table are introduced to store the
topology information. And we only describe the topology information
to 6.2 and newer virt machines, considering compatibility.

[1] 
https://kvmforum2020.sched.com/event/eE1y/virtual-topology-for-virtual-machines
-friend-or-foe-dario-faggioli-suse

Series tested locally on Arm64 machines kunpeng920.
After booting a Linux guest with "-smp 
16,sockets=4,cores=4,threads=1,maxcpus=16",
through lscpu we will see the information about CPU topology like:
Architecture:aarch64
Byte Order:  Little Endian
CPU(s):  16
On-line CPU(s) list: 0-15
Thread(s) per core:  1
Core(s) per socket:  4
Socket(s):   4
NUMA node(s):1
Vendor ID:   0x48
Model:   0
Stepping:0x1
BogoMIPS:200.00
NUMA node0 CPU(s):   0-15

and with "-smp 16" we will see:
Architecture:aarch64
Byte Order:  Little Endian
CPU(s):  16
On-line CPU(s) list: 0-15
Thread(s) per core:  1
Core(s) per socket:  16
Socket(s):   1
NUMA node(s):1
Vendor ID:   0x48
Model:   0
Stepping:0x1
BogoMIPS:200.00
NUMA node0 CPU(s):   0-15

Changelog:
v7->v8:
- rebased on top of master (commit e5b2333f24)
- only add one generic expected file for PPTT instead of four, which works
  fine enough for now (patch #6 and #8 updated)
- v7: 
https://lore.kernel.org/qemu-devel/20211007030746.10420-1-wangyana...@huawei.com/

v6->v7:
- rebased on top of master (commit ca61fa4b80)
- use newly introduced acpi_table_begin/acpi_table_end APIs to build PPTT
  (patch #5 updated)
- add reference files for PPTT to fix broken bios-table-test for Aarch64
  virt machine (patch #6-#8 added)
- v6: 
https://lore.kernel.org/qemu-devel/20210824122016.144364-1-wangyana...@huawei.com/

Andrew Jones (2):
  hw/arm/virt: Add cpu-map to device tree
  hw/acpi/aml-build: Add PPTT table

Yanan Wang (6):
  hw/arm/virt: Only describe cpu topology since virt-6.2
  device_tree: Add qemu_fdt_add_path
  hw/acpi/aml-build: Add Processor hierarchy node structure
  tests/data/acpi/virt: Add an empty expected file for PPTT
  hw/arm/virt-acpi-build: Generate PPTT table
  tests/data/acpi/virt: Update the empty expected file for PPTT

 hw/acpi/aml-build.c  |  86 +++
 hw/arm/virt-acpi-build.c |   8 +++-
 hw/arm/virt.c|  71 +
 include/hw/acpi/aml-build.h  |   7 +++
 include/hw/arm/virt.h|   4 +-
 include/sysemu/device_tree.h |   1 +
 softmmu/device_tree.c|  44 +-
 tests/data/acpi/virt/PPTT| Bin 0 -> 76 bytes
 8 files changed, 207 insertions(+), 14 deletions(-)
 create mode 100644 tests/data/acpi/virt/PPTT

--
2.19.1

[PATCH v8 8/8] tests/data/acpi/virt: Update the empty expected file for PPTT

2021-10-14 Thread Yanan Wang

Run ./tests/data/acpi/rebuild-expected-aml.sh from build directory
to update PPTT binary. Also empty bios-tables-test-allowed-diff.h.

Disassembled output of the updated new file:

/*
 * Intel ACPI Component Architecture
 * AML/ASL+ Disassembler version 20180810 (64-bit version)
 * Copyright (c) 2000 - 2018 Intel Corporation
 *
 * Disassembly of tests/data/acpi/virt/PPTT, Fri Oct  8 10:12:32 2021
 *
 * ACPI Data Table [PPTT]
 *
 * Format: [HexOffset DecimalOffset ByteLength]  FieldName : FieldValue
 */

[000h    4]Signature : "PPTT"[Processor Properties 
Topology Table]
[004h 0004   4] Table Length : 004C
[008h 0008   1] Revision : 02
[009h 0009   1] Checksum : A8
[00Ah 0010   6]   Oem ID : "BOCHS "
[010h 0016   8] Oem Table ID : "BXPC"
[018h 0024   4] Oem Revision : 0001
[01Ch 0028   4]  Asl Compiler ID : "BXPC"
[020h 0032   4]Asl Compiler Revision : 0001

[024h 0036   1]Subtable Type : 00 [Processor Hierarchy Node]
[025h 0037   1]   Length : 14
[026h 0038   2] Reserved : 
[028h 0040   4]Flags (decoded below) : 0001
Physical package : 1
 ACPI Processor ID valid : 0
[02Ch 0044   4]   Parent : 
[030h 0048   4]ACPI Processor ID : 
[034h 0052   4]  Private Resource Number : 

[038h 0056   1]Subtable Type : 00 [Processor Hierarchy Node]
[039h 0057   1]   Length : 14
[03Ah 0058   2] Reserved : 
[03Ch 0060   4]Flags (decoded below) : 000A
Physical package : 0
 ACPI Processor ID valid : 1
[040h 0064   4]   Parent : 0024
[044h 0068   4]ACPI Processor ID : 
[048h 0072   4]  Private Resource Number : 

Raw Table Data: Length 76 (0x4C)

: 50 50 54 54 4C 00 00 00 02 A8 42 4F 43 48 53 20  // PPTTL.BOCHS
0010: 42 58 50 43 20 20 20 20 01 00 00 00 42 58 50 43  // BXPCBXPC
0020: 01 00 00 00 00 14 00 00 01 00 00 00 00 00 00 00  // 
0030: 00 00 00 00 00 00 00 00 00 14 00 00 0A 00 00 00  // 
0040: 24 00 00 00 00 00 00 00 00 00 00 00  // $...

Signed-off-by: Yanan Wang 
---
 tests/data/acpi/virt/PPTT   | Bin 0 -> 76 bytes
 tests/qtest/bios-tables-test-allowed-diff.h |   1 -
 2 files changed, 1 deletion(-)

diff --git a/tests/data/acpi/virt/PPTT b/tests/data/acpi/virt/PPTT
index 
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..7a1258ecf123555b24462c98ccbb76b4ac1d0c2b
 100644
GIT binary patch
literal 76
zcmV-S0JHy4P*hY*2s6tOeNK+s}SWrVCARr(C&SWrU&06aWAL0
i0006aWAUa01^0tW#A

literal 0
KcmV+b0RR631

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index cb143a55a6..dfb8523c8b 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1,2 +1 @@
 /* List of comma-separated changed AML files to ignore */
-"tests/data/acpi/virt/PPTT",
-- 
2.19.1

Re: [PATCH v3] hw/usb/vt82c686-uhci-pci: Use ISA instead of PCI interrupts

2021-10-14 Thread Philippe Mathieu-Daudé

On 10/14/21 11:12, Gerd Hoffmann wrote:
>   Hi,
> 
>>>  if (s->masterbus) {
>>>  USBPort *ports[NB_PORTS];
>>
>> usb_uhci_common_realize() should be refactored making it PCI-agnostic.
> 
> Not sure this is needed here.  This seems to be more a platform-specific
> oddity in IRQ routing than a non-pci UHCI device.

OK, fine then.

Re: Approaches for same-on-same linux-user execve?

2021-10-14 Thread Assad Hashmi

> ARM's armie takes a different approach with the trap and emulate of
> SIGILL instructions. This works well for the occasional "new"
> instruction but will be less efficient overall if your instruction
> stream is entirely novel.

To clarify: earlier versions of armie did use the SIGILL trap-and-emulate
method, which was limited. Recent versions, including the latest release are
based on the DynamoRIO platform which enables full emulation and
instrumentation (https://dynamorio.org). By default, DynamoRIO and by extension
armie, follow all child processes, see
https://dynamorio.org/page_deploy.html#op_children.

As new Arm architecture features are added to QEMU, e.g. SVE, SVE2, SME etc.
there is an expectation in the Arm community that QEMU can run large Arm
user-space applications on Arm hardware, making lack of same-on-same execve a
not insignificant blocker.

AIUI, given the open-source licensing of QEMU and DynamoRIO, there would be no
legal reason for QEMU not to borrow from DynamoRIO.


On Fri, 8 Oct 2021 at 11:49, Alex Bennée  wrote:
>
>
> Arnd Bergmann  writes:
>
> > On Thu, Oct 7, 2021 at 4:32 PM Alex Bennée  wrote:
> >>
> >> I came across a use-case this week for ARM although this may be also
> >> applicable to architectures where QEMU's emulation is ahead of the
> >> hardware currently widely available - for example if you want to
> >> exercise SVE code on AArch64. When the linux-user architecture is not
> >> the same as the host architecture then binfmt_misc works perfectly fine.
> >>
> >> However in the case you are running same-on-same you can't use
> >> binfmt_misc to redirect execution to using QEMU because any attempt to
> >> trap native binaries will cause your userspace to hang as binfmt_misc
> >> will be invoked to run the QEMU binary needed to run your application
> >> and a deadlock ensues.
> >
> > Can you clarify how the code would run in this case? Does qemu-user
> > still emulate every single instruction, both the compatible and the 
> > incompatible
> > ones, or is the idea here to run as much as possible natively and only
> > emulate the instructions that are not available natively, using either
> > SIGILL or searching through the object code for those instructions?
>
> qemu-user only every does a complete translation. The hope is of course
> our translator is "fairly efficient" so for example integer SVE
> operations should get unrolled into a series of AdvSIMD instructions on
> the backend.
>
> ARM's armie takes a different approach with the trap and emulate of
> SIGILL instructions. This works well for the occasional "new"
> instruction but will be less efficient overall if your instruction
> stream is entirely novel.
>
> >> Trap execve in QEMU linux-user
> >> --
> >>
> >> We could add a flag to QEMU so at the point of execve it manually
> >> invokes the new process with QEMU, passing on the flag to persist this
> >> behaviour.
> >
> > This sounds like the obvious approach if you already do a full
> > instruction emulation. You'd still need to run the parent process
> > by calling qemu-user manually, but I suppose you need to do
> > something like this in any case.
> >
> >> Add path mask to binfmt_misc
> >> 
> >>
> >> The other option would be to extend binfmt_misc to have a path mask so
> >> it only applies it's alternative execution scheme to binaries in a
> >> particular section of the file-system (or maybe some sort of pattern?).
> >
> > The main downside I see here is that it requires kernel modification, so
> > it would not work for old kernels.
> >
> >> Are there any other approaches you could take? Which do you think has
> >> the most merit?
> >
> > If we modify binfmt_misc in the kernel, it might be helpful to do it
> > by extending it with namespace support, so it could be constrained
> > to a single container without having to do the emulation outside.
> > Unfortunately that does not solve the problem of preventing the
> > qemu-user binary from triggering the binfmt_misc lookup.
>
> I wonder how that would interact with the persistent ("P") mode of
> binfmt_misc. The backend is identified at the start and gets re-used
> rather than looked up each time.
>
> >
> >Arnd
>
>
> --
> Alex Bennée

[PATCH v4 1/3] vdpa: Skip protected ram IOMMU mappings

2021-10-14 Thread Eugenio Pérez

Following the logic of commit 56918a126ae ("memory: Add RAM_PROTECTED
flag to skip IOMMU mappings") with VFIO, skip memory sections
inaccessible via normal mechanisms, including DMA.

Signed-off-by: Eugenio Pérez 
Acked-by: Jason Wang 
---
 hw/virtio/vhost-vdpa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 47d7a5a23d..ea1aa71ad8 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -28,6 +28,7 @@ static bool 
vhost_vdpa_listener_skipped_section(MemoryRegionSection *section)
 {
 return (!memory_region_is_ram(section->mr) &&
 !memory_region_is_iommu(section->mr)) ||
+memory_region_is_protected(section->mr) ||
/* vhost-vDPA doesn't allow MMIO to be mapped  */
 memory_region_is_ram_device(section->mr) ||
/*
-- 
2.27.0

[PATCH v4 2/3] vdpa: Add vhost_vdpa_section_end

2021-10-14 Thread Eugenio Pérez

Abstract this operation, that will be reused when validating the region
against the iova range that the device supports.

Signed-off-by: Eugenio Pérez 
Acked-by: Jason Wang 
---
 hw/virtio/vhost-vdpa.c | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index ea1aa71ad8..be7c63b4ba 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -24,6 +24,19 @@
 #include "trace.h"
 #include "qemu-common.h"
 
+/*
+ * Return one past the end of the end of section. Be careful with uint64_t
+ * conversions!
+ */
+static Int128 vhost_vdpa_section_end(const MemoryRegionSection *section)
+{
+Int128 llend = int128_make64(section->offset_within_address_space);
+llend = int128_add(llend, section->size);
+llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
+
+return llend;
+}
+
 static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section)
 {
 return (!memory_region_is_ram(section->mr) &&
@@ -160,10 +173,7 @@ static void vhost_vdpa_listener_region_add(MemoryListener 
*listener,
 }
 
 iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
-llend = int128_make64(section->offset_within_address_space);
-llend = int128_add(llend, section->size);
-llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
-
+llend = vhost_vdpa_section_end(section);
 if (int128_ge(int128_make64(iova), llend)) {
 return;
 }
@@ -221,9 +231,7 @@ static void vhost_vdpa_listener_region_del(MemoryListener 
*listener,
 }
 
 iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
-llend = int128_make64(section->offset_within_address_space);
-llend = int128_add(llend, section->size);
-llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
+llend = vhost_vdpa_section_end(section);
 
 trace_vhost_vdpa_listener_region_del(v, iova, int128_get64(llend));
 
-- 
2.27.0

[PATCH v4 0/3] vdpa: Check iova range on memory regions ops

2021-10-14 Thread Eugenio Pérez

At this moment vdpa will not send memory regions bigger than 1<<63.
However, actual iova range could be way more restrictive than that.

Since we can obtain the range through vdpa ioctl call, just save it
from the beginning of the operation and check against it.

Changes from v3:
* Change default value from UINT64_MAX >> 1 to UINT64_MAX, making it
  aligned with the kernel.

Changes from v2:
* Fallback to a default value in case kernel does not support
  VHOST_VDPA_GET_IOVA_RANGE syscall.

Changes from v1:
* Use of int128_gt instead of plain uint64_t < comparison on memory
  range end.
* Document vhost_vdpa_section_end's return value so it's clear that
  it returns "one past end".

Eugenio Pérez (3):
  vdpa: Skip protected ram IOMMU mappings
  vdpa: Add vhost_vdpa_section_end
  vdpa: Check for iova range at mappings changes

 include/hw/virtio/vhost-vdpa.h |  2 +
 hw/virtio/vhost-vdpa.c | 81 +-
 hw/virtio/trace-events |  1 +
 3 files changed, 63 insertions(+), 21 deletions(-)

-- 
2.27.0

[PATCH v4 3/3] vdpa: Check for iova range at mappings changes

2021-10-14 Thread Eugenio Pérez

Check vdpa device range before updating memory regions so we don't add
any outside of it, and report the invalid change if any.

Signed-off-by: Eugenio Pérez 
---
 include/hw/virtio/vhost-vdpa.h |  2 ++
 hw/virtio/vhost-vdpa.c | 62 +-
 hw/virtio/trace-events |  1 +
 3 files changed, 49 insertions(+), 16 deletions(-)

diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index a8963da2d9..c288cf7ecb 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -13,6 +13,7 @@
 #define HW_VIRTIO_VHOST_VDPA_H
 
 #include "hw/virtio/virtio.h"
+#include "standard-headers/linux/vhost_types.h"
 
 typedef struct VhostVDPAHostNotifier {
 MemoryRegion mr;
@@ -24,6 +25,7 @@ typedef struct vhost_vdpa {
 uint32_t msg_type;
 bool iotlb_batch_begin_sent;
 MemoryListener listener;
+struct vhost_vdpa_iova_range iova_range;
 struct vhost_dev *dev;
 VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
 } VhostVDPA;
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index be7c63b4ba..7691e904ac 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -37,20 +37,34 @@ static Int128 vhost_vdpa_section_end(const 
MemoryRegionSection *section)
 return llend;
 }
 
-static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section)
-{
-return (!memory_region_is_ram(section->mr) &&
-!memory_region_is_iommu(section->mr)) ||
-memory_region_is_protected(section->mr) ||
-   /* vhost-vDPA doesn't allow MMIO to be mapped  */
-memory_region_is_ram_device(section->mr) ||
-   /*
-* Sizing an enabled 64-bit BAR can cause spurious mappings to
-* addresses in the upper part of the 64-bit address space.  These
-* are never accessed by the CPU and beyond the address width of
-* some IOMMU hardware.  TODO: VDPA should tell us the IOMMU width.
-*/
-   section->offset_within_address_space & (1ULL << 63);
+static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
+uint64_t iova_min,
+uint64_t iova_max)
+{
+Int128 llend;
+
+if ((!memory_region_is_ram(section->mr) &&
+ !memory_region_is_iommu(section->mr)) ||
+memory_region_is_protected(section->mr) ||
+/* vhost-vDPA doesn't allow MMIO to be mapped  */
+memory_region_is_ram_device(section->mr)) {
+return true;
+}
+
+if (section->offset_within_address_space < iova_min) {
+error_report("RAM section out of device range (min=%lu, addr=%lu)",
+ iova_min, section->offset_within_address_space);
+return true;
+}
+
+llend = vhost_vdpa_section_end(section);
+if (int128_gt(llend, int128_make64(iova_max))) {
+error_report("RAM section out of device range (max=%lu, end addr=%lu)",
+ iova_max, int128_get64(llend));
+return true;
+}
+
+return false;
 }
 
 static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
@@ -162,7 +176,8 @@ static void vhost_vdpa_listener_region_add(MemoryListener 
*listener,
 void *vaddr;
 int ret;
 
-if (vhost_vdpa_listener_skipped_section(section)) {
+if (vhost_vdpa_listener_skipped_section(section, v->iova_range.first,
+v->iova_range.last)) {
 return;
 }
 
@@ -220,7 +235,8 @@ static void vhost_vdpa_listener_region_del(MemoryListener 
*listener,
 Int128 llend, llsize;
 int ret;
 
-if (vhost_vdpa_listener_skipped_section(section)) {
+if (vhost_vdpa_listener_skipped_section(section, v->iova_range.first,
+v->iova_range.last)) {
 return;
 }
 
@@ -288,6 +304,19 @@ static void vhost_vdpa_add_status(struct vhost_dev *dev, 
uint8_t status)
 vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &s);
 }
 
+static void vhost_vdpa_get_iova_range(struct vhost_vdpa *v)
+{
+int ret = vhost_vdpa_call(v->dev, VHOST_VDPA_GET_IOVA_RANGE,
+  &v->iova_range);
+if (ret != 0) {
+v->iova_range.first = 0;
+v->iova_range.last = UINT64_MAX;
+}
+
+trace_vhost_vdpa_get_iova_range(v->dev, v->iova_range.first,
+v->iova_range.last);
+}
+
 static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
 {
 struct vhost_vdpa *v;
@@ -300,6 +329,7 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void 
*opaque, Error **errp)
 v->listener = vhost_vdpa_memory_listener;
 v->msg_type = VHOST_IOTLB_MSG_V2;
 
+vhost_vdpa_get_iova_range(v);
 vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
VIRTIO_CONFIG_S_DRIVER);
 
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events

Re: [RFC PATCH v4 12/20] virtio: Add vhost_shadow_vq_get_vring_addr

2021-10-14 Thread Eugenio Perez Martin

On Wed, Oct 13, 2021 at 5:54 AM Jason Wang  wrote:
>
>
> 在 2021/10/1 下午3:05, Eugenio Pérez 写道:
> > It reports the shadow virtqueue address from qemu virtual address space
>
>
> I think both the title and commit log needs to more tweaks. Looking at
> the codes, what id does is actually introduce vring into svq.
>

Right, this commit evolved a little bit providing more functionality
and it is not reflected in the commit message. I will expand it.

>
> >
> > Signed-off-by: Eugenio Pérez 
> > ---
> >   hw/virtio/vhost-shadow-virtqueue.h |  4 +++
> >   hw/virtio/vhost-shadow-virtqueue.c | 50 ++
> >   2 files changed, 54 insertions(+)
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.h 
> > b/hw/virtio/vhost-shadow-virtqueue.h
> > index 237cfceb9c..2df3d117f5 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > @@ -16,6 +16,10 @@ typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> >
> >   EventNotifier *vhost_svq_get_svq_call_notifier(VhostShadowVirtqueue *svq);
> >   void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int 
> > call_fd);
> > +void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
> > +  struct vhost_vring_addr *addr);
> > +size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq);
> > +size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
> >
> >   bool vhost_svq_start(struct vhost_dev *dev, unsigned idx,
> >VhostShadowVirtqueue *svq);
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
> > b/hw/virtio/vhost-shadow-virtqueue.c
> > index 3fe129cf63..5c1899f6af 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -18,6 +18,9 @@
> >
> >   /* Shadow virtqueue to relay notifications */
> >   typedef struct VhostShadowVirtqueue {
> > +/* Shadow vring */
> > +struct vring vring;
> > +
> >   /* Shadow kick notifier, sent to vhost */
> >   EventNotifier kick_notifier;
> >   /* Shadow call notifier, sent to vhost */
> > @@ -38,6 +41,9 @@ typedef struct VhostShadowVirtqueue {
> >
> >   /* Virtio queue shadowing */
> >   VirtQueue *vq;
> > +
> > +/* Virtio device */
> > +VirtIODevice *vdev;
> >   } VhostShadowVirtqueue;
> >
> >   /* Forward guest notifications */
> > @@ -93,6 +99,35 @@ void 
> > vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd)
> >   event_notifier_init_fd(&svq->guest_call_notifier, call_fd);
> >   }
> >
> > +/*
> > + * Get the shadow vq vring address.
> > + * @svq Shadow virtqueue
> > + * @addr Destination to store address
> > + */
> > +void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
> > +  struct vhost_vring_addr *addr)
> > +{
> > +addr->desc_user_addr = (uint64_t)svq->vring.desc;
> > +addr->avail_user_addr = (uint64_t)svq->vring.avail;
> > +addr->used_user_addr = (uint64_t)svq->vring.used;
> > +}
> > +
> > +size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq)
> > +{
> > +uint16_t vq_idx = virtio_get_queue_index(svq->vq);
> > +size_t desc_size = virtio_queue_get_desc_size(svq->vdev, vq_idx);
> > +size_t avail_size = virtio_queue_get_avail_size(svq->vdev, vq_idx);
> > +
> > +return ROUND_UP(desc_size + avail_size, qemu_real_host_page_size);
>
>
> Is this round up required by the spec?
>

No, I was trying to avoid that more qemu data get exposed to the
device because of mapping at page granularity, in case data gets
allocated after some region. I will expand with a comment, but if
there are other ways to achieve or it is not needed please let me
know!

>
> > +}
> > +
> > +size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq)
> > +{
> > +uint16_t vq_idx = virtio_get_queue_index(svq->vq);
> > +size_t used_size = virtio_queue_get_used_size(svq->vdev, vq_idx);
> > +return ROUND_UP(used_size, qemu_real_host_page_size);
> > +}
> > +
> >   /*
> >* Restore the vhost guest to host notifier, i.e., disables svq effect.
> >*/
> > @@ -178,6 +213,10 @@ void vhost_svq_stop(struct vhost_dev *dev, unsigned 
> > idx,
> >   VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
> >   {
> >   int vq_idx = dev->vq_index + idx;
> > +unsigned num = virtio_queue_get_num(dev->vdev, vq_idx);
> > +size_t desc_size = virtio_queue_get_desc_size(dev->vdev, vq_idx);
> > +size_t driver_size;
> > +size_t device_size;
> >   g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 
> > 1);
> >   int r;
> >
> > @@ -196,6 +235,15 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev 
> > *dev, int idx)
> >   }
> >
> >   svq->vq = virtio_get_queue(dev->vdev, vq_idx);
> > +svq->vdev = dev->vdev;
> > +driver_size = vhost_svq_driver_area_size(svq);
> > +device_size = vhost_svq_device_area_size(svq);
> > +svq->vring.num = num;
> > +svq->vri

Re: [PATCH v4 2/2] monitor: refactor set/expire_password and allow VNC display id

2021-10-14 Thread Stefan Reiter


On 10/14/21 9:14 AM, Markus Armbruster wrote:

Stefan Reiter  writes:


On 10/12/21 11:24 AM, Markus Armbruster wrote:

Stefan Reiter  writes:


It is possible to specify more than one VNC server on the command line,
either with an explicit ID or the auto-generated ones à "default",
"vnc2", "vnc3", ...

It is not possible to change the password on one of these extra VNC
displays though. Fix this by adding a "display" parameter to the
"set_password" and "expire_password" QMP and HMP commands.

For HMP, the display is specified using the "-d" value flag.

For QMP, the schema is updated to explicitly express the supported
variants of the commands with protocol-discriminated unions.

Suggested-by: Eric Blake 
Suggested-by: Markus Armbruster 
Signed-off-by: Stefan Reiter 


[...]


[...]


Let me describe what you're doing in my own words, to make sure I got
both the what and the why:

1. Change @protocol and @connected from str to enum.

2. Turn the argument struct into a union tagged by @protocol, to enable
 3. and 4.

3. Add argument @display in branch 'vnc'.

4. Use a separate enum for @connected in each union branch, to enable 4.

5. Trim this enum in branch 'vnc' to the values that are actually
 supported.  Note that just one value remains, i.e. @connected is now
 an optional argument that can take only the default value.

6. Since such an argument is pointless, deprecate @connected in branch
 'vnc'.

Correct?


Yes.


Thanks!


Ignorant question: is it possible to have more than one 'spice' display?


Not in terms of passwords anyway. So only one SPICE password can be set at
any time, i.e. the 'display' parameter in this function does not apply.



Item 5. is not a compatibility break: before your patch,
qmp_set_password() rejects the values you drop.


Yes.



Item 5. adds another enum to the schema in return for more accurate
introspection and slightly simpler qmp_set_password().  I'm not sure
that's worthwhile.  I think we could use the same enum for both union
branches.


I tried to avoid mixing manual parsing and declarative QAPI stuff as much
as I could, so I moved as much as possible to QAPI. I personally like the
extra documentation of having the separate enum.


It's a valid choice.  I'm just pointing out another valid choice.  The
difference between them will go away when we remove deprecated
@connected.  You choose :)


I'd split this patch into three parts: item 1., 2.+3., 4.-6., because
each part can stand on its own.


The reason why I didn't do that initially is more to do with the C side.
I think splitting it up as you describe would make for some awkward diffs
on the qmp_set_password/hmp_set_password side.

I can try of course.


It's a suggestion, not a demand.  I'm a compulsory patch splitter.


I'll just have a go and see what falls out. I do agree that this patch is a
bit long on its own.




  Though I also want to have it said that I'm not a fan
of how the HMP functions have to expand so much to accommodate the QAPI
structure in general.


Care to elaborate?


Well, before this patch 'hmp_set_password' was for all intents and purposes a
single function call to 'qmp_set_password'. Now it has a bunch of parameter
parsing and even validation (e.g. enum parsing). That's why I asked in the
v3 patch (I think?) if there was a way to call the QAPI style parsing from
there, since the parameters are all named the same and in a qdict already..

Something like:

  void hmp_set_password(Monitor *mon, const QDict *qdict)
  {
ExpirePasswordOptions opts = qapi_magical_parse_fn(qdict);
qmp_set_password(&opts,·&err);
[error handling]
  }

That being said, I don't mind the current form enough to make this a bigger
discussion either, so if there isn't an easy way to do the above, let's just
leave it like it is.

Re: [PATCH 00/16] fdt: Make OF_BOARD a boolean option

2021-10-14 Thread Tom Rini

On Wed, Oct 13, 2021 at 12:06:02PM -0600, Simon Glass wrote:
> Hi François,
> 
> On Wed, 13 Oct 2021 at 11:35, François Ozog  wrote:
> >
> > Hi Simon
> >
> > Le mer. 13 oct. 2021 à 16:49, Simon Glass  a écrit :
> >>
> >> Hi Tom, Bin,François,
> >>
> >> On Tue, 12 Oct 2021 at 19:34, Tom Rini  wrote:
> >> >
> >> > On Wed, Oct 13, 2021 at 09:29:14AM +0800, Bin Meng wrote:
> >> > > Hi Simon,
> >> > >
> >> > > On Wed, Oct 13, 2021 at 9:01 AM Simon Glass  wrote:
> >> > > >
> >> > > > With Ilias' efforts we have dropped OF_PRIOR_STAGE and OF_HOSTFILE so
> >> > > > there are only three ways to obtain a devicetree:
> >> > > >
> >> > > >- OF_SEPARATE - the normal way, where the devicetree is built and
> >> > > >   appended to U-Boot
> >> > > >- OF_EMBED - for development purposes, the devicetree is embedded 
> >> > > > in
> >> > > >   the ELF file (also used for EFI)
> >> > > >- OF_BOARD - the board figures it out on its own
> >> > > >
> >> > > > The last one is currently set up so that no devicetree is needed at 
> >> > > > all
> >> > > > in the U-Boot tree. Most boards do provide one, but some don't. Some
> >> > > > don't even provide instructions on how to boot on the board.
> >> > > >
> >> > > > The problems with this approach are documented at [1].
> >> > > >
> >> > > > In practice, OF_BOARD is not really distinct from OF_SEPARATE. Any 
> >> > > > board
> >> > > > can obtain its devicetree at runtime, even it is has a devicetree 
> >> > > > built
> >> > > > in U-Boot. This is because U-Boot may be a second-stage bootloader 
> >> > > > and its
> >> > > > caller may have a better idea about the hardware available in the 
> >> > > > machine.
> >> > > > This is the case with a few QEMU boards, for example.
> >> > > >
> >> > > > So it makes no sense to have OF_BOARD as a 'choice'. It should be an
> >> > > > option, available with either OF_SEPARATE or OF_EMBED.
> >> > > >
> >> > > > This series makes this change, adding various missing devicetree 
> >> > > > files
> >> > > > (and placeholders) to make the build work.
> >> > >
> >> > > Adding device trees that are never used sounds like a hack to me.
> >> > >
> >> > > For QEMU, device tree is dynamically generated on the fly based on
> >> > > command line parameters, and the device tree you put in this series
> >> > > has various hardcoded  values which normally do not show up
> >> > > in hand-written dts files.
> >> > >
> >> > > I am not sure I understand the whole point of this.
> >> >
> >> > I am also confused and do not like the idea of adding device trees for
> >> > platforms that are capable of and can / do have a device tree to give us
> >> > at run time.
> >>
> >> (I'll just reply to this one email, since the same points applies to
> >> all replies I think)
> >>
> >> I have been thinking about this and discussing it with people for a
> >> few months now. I've been signalling a change like this for over a
> >> month now, on U-Boot contributor calls and in discussions with Linaro
> >> people. I sent a patch (below) to try to explain things. I hope it is
> >> not a surprise!
> >>
> >> The issue here is that we need a devicetree in-tree in U-Boot, to
> >> avoid the mess that has been created by OF_PRIOR_STAGE, OF_BOARD,
> >> BINMAN_STANDALONE_FDT and to a lesser extent, OF_HOSTFILE. Between
> >> Ilias' series and this one we can get ourselves on a stronger footing.
> >> There is just OF_SEPARATE, with OF_EMBED for debugging/ELF use.
> >> For more context:
> >>
> >> http://patchwork.ozlabs.org/project/uboot/patch/20210919215111.3830278-3-...@chromium.org/
> >>
> >> BTW I did suggest to QEMU ARM that they support a way of adding the
> >> u-boot.dtsi but there was not much interest there (in fact the
> >> maintainer would prefer there was no special support even for booting
> >> Linux directly!)
> >
> > i understand their point of view and agree with it.
> >>
> >> But in any case it doesn't really help U-Boot. I
> >> think the path forward might be to run QEMU twice, once to get its
> >> generated tree and once to give the 'merged' tree with the U-Boot
> >> properties in it, if people want to use U-Boot features.
> >>
> >> I do strongly believe that OF_BOARD must be a run-time option, not a
> >> build-time one. It creates all sorts of problems and obscurity which
> >> have taken months to unpick. See the above patch for the rationale.
> >>
> >> To add to that rationale, OF_BOARD needs to be an option available to
> >> any board. At some point in the future it may become a common way
> >> things are done, e.g. TF-A calling U-Boot and providing a devicetree
> >> to it. It doesn't make any sense to have people decide whether or not
> >> to set OF_BOARD at build time, thus affecting how the image is put
> >> together. We'll end up with different U-Boot build targets like
> >> capricorn, capricorn_of_board and the like. It should be obvious where
> >> that will lead. Instead, OF_BOARD needs to become a commonly used
> >> option, perhaps enabled by mos

Re: [RFC PATCH v4 13/20] vdpa: Save host and guest features

2021-10-14 Thread Eugenio Perez Martin

On Wed, Oct 13, 2021 at 5:57 AM Jason Wang  wrote:
>
>
> 在 2021/10/1 下午3:05, Eugenio Pérez 写道:
> > Those are needed for SVQ: Host ones are needed to check if SVQ knows
> > how to talk with the device and for feature negotiation, and guest ones
> > to know if SVQ can talk with it.
> >
> > Signed-off-by: Eugenio Pérez 
> > ---
> >   include/hw/virtio/vhost-vdpa.h |  2 ++
> >   hw/virtio/vhost-vdpa.c | 31 ---
> >   2 files changed, 30 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> > index fddac248b3..9044ae694b 100644
> > --- a/include/hw/virtio/vhost-vdpa.h
> > +++ b/include/hw/virtio/vhost-vdpa.h
> > @@ -26,6 +26,8 @@ typedef struct vhost_vdpa {
> >   int device_fd;
> >   uint32_t msg_type;
> >   MemoryListener listener;
> > +uint64_t host_features;
> > +uint64_t guest_features;
>
>
> Any reason that we can't use the features stored in VirtioDevice?
>
> Thanks
>

It was easier to handle the non standard _F_STOP feature flag in
vhost-vdpa but I think we can use VirtIODevice flags for the next
series.

Thanks!

>
> >   bool shadow_vqs_enabled;
> >   GPtrArray *shadow_vqs;
> >   struct vhost_dev *dev;
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 6c5f4c98b8..a057e8277d 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -439,10 +439,19 @@ static int vhost_vdpa_set_mem_table(struct vhost_dev 
> > *dev,
> >   return 0;
> >   }
> >
> > -static int vhost_vdpa_set_features(struct vhost_dev *dev,
> > -   uint64_t features)
> > +/**
> > + * Internal set_features() that follows vhost/VirtIO protocol for that
> > + */
> > +static int vhost_vdpa_backend_set_features(struct vhost_dev *dev,
> > +   uint64_t features)
> >   {
> > +struct vhost_vdpa *v = dev->opaque;
> > +
> >   int ret;
> > +if (v->host_features & BIT_ULL(VIRTIO_F_QUEUE_STATE)) {
> > +features |= BIT_ULL(VIRTIO_F_QUEUE_STATE);
> > +}
> > +
> >   trace_vhost_vdpa_set_features(dev, features);
> >   ret = vhost_vdpa_call(dev, VHOST_SET_FEATURES, &features);
> >   uint8_t status = 0;
> > @@ -455,6 +464,17 @@ static int vhost_vdpa_set_features(struct vhost_dev 
> > *dev,
> >   return !(status & VIRTIO_CONFIG_S_FEATURES_OK);
> >   }
> >
> > +/**
> > + * Exposed vhost set features
> > + */
> > +static int vhost_vdpa_set_features(struct vhost_dev *dev,
> > +   uint64_t features)
> > +{
> > +struct vhost_vdpa *v = dev->opaque;
> > +v->guest_features = features;
> > +return vhost_vdpa_backend_set_features(dev, features);
> > +}
> > +
> >   static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
> >   {
> >   uint64_t features;
> > @@ -673,12 +693,17 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev 
> > *dev,
> >   }
> >
> >   static int vhost_vdpa_get_features(struct vhost_dev *dev,
> > - uint64_t *features)
> > +   uint64_t *features)
> >   {
> >   int ret;
> >
> >   ret = vhost_vdpa_call(dev, VHOST_GET_FEATURES, features);
> >   trace_vhost_vdpa_get_features(dev, *features);
> > +
> > +if (ret == 0) {
> > +struct vhost_vdpa *v = dev->opaque;
> > +v->host_features = *features;
> > +}
> >   return ret;
> >   }
> >
>

Re: [RFC PATCH v4 15/20] vhost: Shadow virtqueue buffers forwarding

2021-10-14 Thread Eugenio Perez Martin

On Tue, Oct 12, 2021 at 3:48 PM Markus Armbruster  wrote:
>
> Eugenio Perez Martin  writes:
>
> > On Tue, Oct 12, 2021 at 7:21 AM Markus Armbruster  wrote:
> >>
> >> Eugenio Pérez  writes:
> >>
> >> > Initial version of shadow virtqueue that actually forward buffers. There
> >> > are no iommu support at the moment, and that will be addressed in future
> >> > patches of this series. Since all vhost-vdpa devices uses forced IOMMU,
> >> > this means that SVQ is not usable at this point of the series on any
> >> > device.
> >> >
> >> > For simplicity it only supports modern devices, that expects vring
> >> > in little endian, with split ring and no event idx or indirect
> >> > descriptors. Support for them will not be added in this series.
> >> >
> >> > It reuses the VirtQueue code for the device part. The driver part is
> >> > based on Linux's virtio_ring driver, but with stripped functionality
> >> > and optimizations so it's easier to review. Later commits add simpler
> >> > ones.
> >> >
> >> > SVQ uses VIRTIO_CONFIG_S_DEVICE_STOPPED to pause the device and
> >> > retrieve its status (next available idx the device was going to
> >> > consume) race-free. It can later reset the device to replace vring
> >> > addresses etc. When SVQ starts qemu can resume consuming the guest's
> >> > driver ring from that state, without notice from the latter.
> >> >
> >> > This status bit VIRTIO_CONFIG_S_DEVICE_STOPPED is currently discussed
> >> > in VirtIO, and is implemented in qemu VirtIO-net devices in previous
> >> > commits.
> >> >
> >> > Removal of _S_DEVICE_STOPPED bit (in other words, resuming the device)
> >> > can be done in the future if an use case arises. At this moment we can
> >> > just rely on reseting the full device.
> >> >
> >> > Signed-off-by: Eugenio Pérez 
> >> > ---
> >> >  qapi/net.json  |   2 +-
> >> >  hw/virtio/vhost-shadow-virtqueue.c | 237 -
> >> >  hw/virtio/vhost-vdpa.c | 109 -
> >> >  3 files changed, 337 insertions(+), 11 deletions(-)
> >> >
> >> > diff --git a/qapi/net.json b/qapi/net.json
> >> > index fe546b0e7c..1f4a55f2c5 100644
> >> > --- a/qapi/net.json
> >> > +++ b/qapi/net.json
> >> > @@ -86,7 +86,7 @@
> >> >  #
> >> >  # @name: the device name of the VirtIO device
> >> >  #
> >> > -# @enable: true to use the alternate shadow VQ notifications
> >> > +# @enable: true to use the alternate shadow VQ buffers fowarding path
> >>
> >> Uh, why does the flag change meaning half-way through this series?
> >>
> >
> > Before this patch, the SVQ mode just makes an extra hop for
> > notifications. Guest ones are now received by qemu via ioeventfd, and
> > qemu forwards them to the device using a different eventfd. The
> > reverse is also true: the device ones will be received by qemu by
> > device call fd, and then qemu will forward them to the guest using a
> > different irqfd.
> >
> > This intermediate step is not very useful by itself, but helps for
> > checking that that part of the communication works fine, with no need
> > for shadow virtqueue to understand vring format. Doing that way also
> > produces smaller patches.
> >
> > So it makes sense to me to tell what QMP command does exactly at every
> > point of the series. However I can directly document it as "use the
> > alternate shadow VQ buffers forwarding path" from the beginning.
> >
> > Does this make sense, or will it be better to write the final
> > intention of the command?
> >
> > Thanks!
>
> Working your explanation into commit messages and possibly comments
> should do.
>

Got it, I will include them in both. Thanks!

Re: [PATCH v3 1/9] bsd-user/mmap.c: Always zero MAP_ANONYMOUS memory in mmap_frag()

2021-10-14 Thread Kyle Evans

On Fri, Oct 8, 2021 at 4:24 PM Warner Losh  wrote:
>
> From: Mikaël Urankar 
>
> Similar to the equivalent linux-user commit e6deac9cf99
>
> When mapping MAP_ANONYMOUS memory fragments, still need notice about to
> set it zero, or it will cause issues.
>
> Signed-off-by: Mikaël Urankar 
> Signed-off-by: Warner Losh 
> Reviewed-by: Richard Henderson 
> ---
>  bsd-user/mmap.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/bsd-user/mmap.c b/bsd-user/mmap.c
> index b40ab9045f..fc3c1480f5 100644
> --- a/bsd-user/mmap.c
> +++ b/bsd-user/mmap.c
> @@ -180,10 +180,12 @@ static int mmap_frag(abi_ulong real_start,
>  if (prot_new != (prot1 | PROT_WRITE))
>  mprotect(host_start, qemu_host_page_size, prot_new);
>  } else {
> -/* just update the protection */
>  if (prot_new != prot1) {
>  mprotect(host_start, qemu_host_page_size, prot_new);
>  }
> +if (prot_new & PROT_WRITE) {
> +memset(g2h_untagged(start), 0, end - start);
> +}
>  }
>  return 0;
>  }
> --
> 2.32.0
>

Reviewed-by: Kyle Evans

Re: [PATCH v3 2/9] bsd-user/mmap.c: check pread's return value to fix warnings with _FORTIFY_SOURCE

2021-10-14 Thread Kyle Evans

On Fri, Oct 8, 2021 at 4:24 PM Warner Losh  wrote:
>
> From: Mikaël Urankar 
>
> Simmilar to the equivalent linux-user: commit fb7e378cf9c, which added
> checking to pread's return value. Update to current qemu standards with
> {} around the if statement.
>
> Signed-off-by: Mikaël Urankar 
> Signed-off-by: Warner Losh 
> Reviewed-by: Richard Henderson 
> Reviewed-by: Philippe Mathieu-Daudé 
> ---
>  bsd-user/mmap.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/bsd-user/mmap.c b/bsd-user/mmap.c
> index fc3c1480f5..4f4fa3ab46 100644
> --- a/bsd-user/mmap.c
> +++ b/bsd-user/mmap.c
> @@ -174,7 +174,9 @@ static int mmap_frag(abi_ulong real_start,
>  mprotect(host_start, qemu_host_page_size, prot1 | PROT_WRITE);
>
>  /* read the corresponding file data */
> -pread(fd, g2h_untagged(start), end - start, offset);
> +if (pread(fd, g2h_untagged(start), end - start, offset) == -1) {
> +return -1;
> +}
>
>  /* put final protection */
>  if (prot_new != (prot1 | PROT_WRITE))
> @@ -593,7 +595,9 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int 
> prot,
>-1, 0);
>  if (retaddr == -1)
>  goto fail;
> -pread(fd, g2h_untagged(start), len, offset);
> +if (pread(fd, g2h_untagged(start), len, offset) == -1) {
> +goto fail;
> +}
>  if (!(prot & PROT_WRITE)) {
>  ret = target_mprotect(start, len, prot);
>  if (ret != 0) {
> --
> 2.32.0
>

Reviewed-by: Kyle Evans

Re: [PATCH v3 5/9] bsd-user/mmap.c: mmap prefer MAP_ANON for BSD

2021-10-14 Thread Kyle Evans

On Fri, Oct 8, 2021 at 4:24 PM Warner Losh  wrote:
>
> MAP_ANON and MAP_ANONYMOUS are identical. Prefer MAP_ANON for BSD since
> the file is now a confusing mix of the two.
>
> Signed-off-by: Warner Losh 
> Reviewed-by: Philippe Mathieu-Daudé 
> Reviewed-by: Richard Henderson 
> ---
>  bsd-user/mmap.c | 11 +--
>  1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/bsd-user/mmap.c b/bsd-user/mmap.c
> index f0be3b12cf..301108ed25 100644
> --- a/bsd-user/mmap.c
> +++ b/bsd-user/mmap.c
> @@ -285,7 +285,7 @@ static abi_ulong mmap_find_vma_aligned(abi_ulong start, 
> abi_ulong size,
>  addr = start;
>  wrapped = repeat = 0;
>  prev = 0;
> -flags = MAP_ANONYMOUS | MAP_PRIVATE;
> +flags = MAP_ANON | MAP_PRIVATE;
>  if (alignment != 0) {
>  flags |= MAP_ALIGNED(alignment);
>  }
> @@ -409,7 +409,7 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int 
> prot,
>  if (flags & MAP_FIXED) {
>  printf("MAP_FIXED ");
>  }
> -if (flags & MAP_ANONYMOUS) {
> +if (flags & MAP_ANON) {
>  printf("MAP_ANON ");
>  }
>  if (flags & MAP_EXCL) {
> @@ -431,7 +431,7 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int 
> prot,
>  }
>  #endif
>
> -if ((flags & MAP_ANONYMOUS) && fd != -1) {
> +if ((flags & MAP_ANON) && fd != -1) {
>  errno = EINVAL;
>  goto fail;
>  }
> @@ -533,7 +533,7 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int 
> prot,
>   * qemu_real_host_page_size
>   */
>  p = mmap(g2h_untagged(start), host_len, prot,
> - flags | MAP_FIXED | ((fd != -1) ? MAP_ANONYMOUS : 0), -1, 
> 0);
> + flags | MAP_FIXED | ((fd != -1) ? MAP_ANON : 0), -1, 0);
>  if (p == MAP_FAILED)
>  goto fail;
>  /* update start so that it points to the file position at 'offset' */
> @@ -696,8 +696,7 @@ static void mmap_reserve(abi_ulong start, abi_ulong size)
>  }
>  if (real_start != real_end) {
>  mmap(g2h_untagged(real_start), real_end - real_start, PROT_NONE,
> - MAP_FIXED | MAP_ANONYMOUS | MAP_PRIVATE,
> - -1, 0);
> + MAP_FIXED | MAP_ANON | MAP_PRIVATE, -1, 0);
>  }
>  }
>
> --
> 2.32.0
>

Reviewed-by: Kyle Evans

1 2 3 4 >

1 - 100 of 328 matches

Mail list logo