RE: [PATCH RFC v1 3/3] vfio/pci: dynamic MSI-X allocation in interrupt restoring

2023-08-01 Thread Liu, Jing2
Hi Alex,

> On July 28, 2023 1:25 AM, Alex Williamson  wrote:
> 
> On Thu, 27 Jul 2023 03:24:10 -0400
> Jing Liu  wrote:
> 
> > During migration restoring, vfio_enable_vectors() is called to restore
> > enabling MSI-X interrupts for assigned devices. It sets the range from
> > 0 to nr_vectors to kernel to enable MSI-X and the vectors unmasked in
> > guest. During the MSI-X enabling, all the vectors within the range are
> > allocated according to the ioctl().
> >
> > When dynamic MSI-X allocation is supported, we only want the guest
> > unmasked vectors being allocated and enabled. Therefore, Qemu can
> > first set vector 0 to enable MSI-X and after that, all the vectors can
> > be allocated in need.
> >
> > Signed-off-by: Jing Liu 
> > ---
> >  hw/vfio/pci.c | 32 
> >  1 file changed, 32 insertions(+)
> >
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index
> > 8c485636445c..43ffacd5b36a 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -375,6 +375,38 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev,
> bool msix)
> >  int ret = 0, i, argsz;
> >  int32_t *fds;
> >
> > +/*
> > + * If dynamic MSI-X allocation is supported, the vectors to be allocated
> > + * and enabled can be scattered. Before kernel enabling MSI-X, setting
> > + * nr_vectors causes all these vectors being allocated on host.
> 
> s/being/to be/
Will change.

> 
> > + *
> > + * To keep allocation as needed, first setup vector 0 with an invalid
> > + * fd to make MSI-X enabled, then enable vectors by setting all so that
> > + * kernel allocates and enables interrupts only when enabled in guest.
> > + */
> > +if (msix && !(vdev->msix->irq_info_flags &
> > + VFIO_IRQ_INFO_NORESIZE)) {
> 
> !vdev->msix->noresize again seems cleaner.
Sure, will change.
> 
> > +argsz = sizeof(*irq_set) + sizeof(*fds);
> > +
> > +irq_set = g_malloc0(argsz);
> > +irq_set->argsz = argsz;
> > +irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
> > + VFIO_IRQ_SET_ACTION_TRIGGER;
> > +irq_set->index = msix ? VFIO_PCI_MSIX_IRQ_INDEX :
> > + VFIO_PCI_MSI_IRQ_INDEX;
> 
> Why are we testing msix again within a branch that requires msix?
Ah, yes. Will remove the test.

> 
> > +irq_set->start = 0;
> > +irq_set->count = 1;
> > +fds = (int32_t *)&irq_set->data;
> > +fds[0] = -1;
> > +
> > +ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS,
> > + irq_set);
> > +
> > +g_free(irq_set);
> > +
> > +if (ret) {
> > +return ret;
> > +}
> > +}
> 
> So your goal here is simply to get the kernel to call vfio_msi_enable() with
> nvec = 1 to get MSI-X enabled on the device, which then allows the kernel to
> use the dynamic expansion when we call SET_IRQS again with a potentially
> sparse set of eventfds to vector mappings.

Yes, that's the way I could think of to get MSI-X enabled first. The only
question is that, when getting the kernel to call vfio_msi_enable() with
nvec=1, the kernel will allocate one interrupt along with enabling MSI-X,
which cannot be avoided.

Therefore, if we set vector 0 for example, the irq for vector 0 will be
allocated in the kernel. Later, if vector 0 is unmasked in the guest, it is
enabled as normal; but if vector 0 stays masked in the guest, we leave an
allocated (though not enabled) irq there until MSI-X is disabled.
I'm not sure if this is okay, but I cannot think of a cleaner way.
And I also wonder whether this can happen at all, or whether vector 0 is
always enabled?


> This seems very similar to the nr_vectors == 0 branch of vfio_msix_enable()
> where it uses a do_use and release call to accomplish getting MSI-X enabled.

They are similar. Using a do_use to set up userspace triggering also makes
the kernel allocate one irq there. And my understanding is that the
following release function actually won't release it if it is a userspace
trigger.

static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
{
/*
 * There are still old guests that mask and unmask vectors on every
 * interrupt.  If we're using QEMU bypass with a KVM irqfd, leave all of
 * the KVM setup in place, simply switch VFIO to use the non-bypass
 * eventfd.  We'll then fire the interrupt through QEMU and the MSI-X
 * core will mask the interrupt and set pending bits, allowing it to
 * be re-asserted on unmask.  Nothing to do if already using QEMU mode.
 */
...
}

 
> We should consolidate, probably by pulling this out into a function since it
> seems cleaner to use the fd = -1 trick than to set up userspace triggering
> and immediately release.  Thanks,

Oh, yes, I agree that using the fd=-1 trick is cleaner and we don't need to
depend on the maskable bit in QEMU. Following your suggestion, I will create
a function, e.g. vfio_enable_msix_no_vec(vdev), which only sets vector 0 with
fd=-1 to the kernel, and
returns the r

Re: [PATCH 0/8] Adds CPU hot-plug support to Loongarch

2023-08-01 Thread lixianglai

Hi,  Salil

On 2023/7/27 pm 10:51, Salil Mehta wrote:

Hello,


From: lixianglai 
Sent: Thursday, July 27, 2023 3:14 AM
To: Gavin Shan ; qemu-devel@nongnu.org; Salil Mehta
; zhukeqian ; Bibo Mao

Subject: Re: [PATCH 0/8] Adds CPU hot-plug support to Loongarch

Hi Gavin and Salil,

On 7/27/23 8:57 AM, Gavin Shan wrote:

Hi Xianglai,

On 7/20/23 17:15, xianglai li wrote:

Hello everyone, we referred to the implementation of ARM CPU
Hot-Plug to add GED-based CPU Hot-Plug support to Loongarch.

The first 4 patches are changes to the QEMU common code,
including adding GED support for CPU Hot-Plug, updating
the ACPI table creation process, and adding
qdev_disconnect_gpio_out_named
and cpu_address_space_destroy interfaces to release resources
when a CPU is unplugged.

The last four patches are Loongarch architecture-related,
and the modifications include the definition of the hook
function related to the CPU Hot-(UN)Plug, the allocation
and release of CPU resources when CPU Hot-(UN)Plug,
the creation process of updating the ACPI table,
and finally the custom switch for the CPU Hot-Plug.


[...]

It seems the design for the ARM64 hotplug has been referred to heavily, but
the authors were missed on Cc if you were referring to the following
repository. There will be conflicts between those two patchsets as far as I
can see, and I'm not sure how they can be resolved. In theory, one patchset
needs to be rebased on the other, since they share a large amount of code.

   https://github.com/salil-mehta/qemu.git
   (branch: virt-cpuhp-armv8/rfc-v1-port11052023.dev-1)

I took a quick scan of this series. Loongarch doesn't have ARM64's
constraint due to vGICv3, where all vCPUs' file descriptors need to be in
place. It means the possible and not-yet-present vCPUs need to be visible
to KVM. Without this constraint, the implementation is simplified a lot.

We have great respect and gratitude for Salil and his team's work
and contributions to CPU HotPlug,


Many thanks! Really appreciate this.



which greatly reduced the difficulty of developing CPU HotPlug in
Loongarch.


We are glad that this work is useful to other companies as well; this
was one of our goals.



So, we plan to rebase the next version of the patch set on their code.


Great. Thanks for clarifying this as accountability is important
even though we are working in a collaborative environment.

As such, I am planning to send the RFC V2 in 2 weeks' time and will
make sure that the patches which are required by your patch-set are
formed in such a way that they can be independently accepted
w.r.t. the rest of the ARM64 architecture-specific patch-set.



Hi Salil,

I estimate that it will take quite some time for your patch series to be
merged. If allowed, can you merge your patch series' changes to the common
code section into the community first, so that our code can be rebased and
merged?


Sure, as clarified above, I will do something similar: create a patch-set
which will have patches that can be independently accepted/Ack'ed and
consumed even before the complete patch-set.

Although I think most patches have already been arranged in such a way even
in their current form, I will double-check before I float RFC V2.

I am sorry for the late reply.

Thanks very much, we look forward to joining the community with your
patch series.





Besides, we found the vCPU hotplug isn't working for TCG when Salil's
series was tested. I'm not sure if we have the same issue with this
series, or if TCG isn't a concern to us at all?

At present, QEMU only supports TCG under Loongarch, and our CPU hotplug
testing was also carried out under QEMU TCG; CPU hotplug works normally
in those tests.

Of course, we are also very happy to see you testing CPU hotplug under
Loongarch, which can find more problems and help us improve our patch.


We know that. There are some trivial fixes we already have; I will push
them as well for completeness. We were not sure if anybody needed them,
as usage of vCPU hotplug under 'accel=tcg' is highly unlikely except for
testing. Please let us know if you have any?

No, we don't have it yet.


Many thanks!
Salil.






Re: [RFC] migration/block-dirty-bitmap: make loading bitmap for device with iothread future-proof

2023-08-01 Thread Fiona Ebner
On 31.07.23 09:35, Juan Quintela wrote:
> Fiona Ebner  wrote:
>> The bdrv_create_dirty_bitmap() function (which is also called by
>> bdrv_dirty_bitmap_create_successor()) uses bdrv_getlength(bs). This is
>> a wrapper around a coroutine, and when not called in coroutine context
>> would use bdrv_poll_co(). Such a call would trigger an assert() if the
>> correct AioContext hasn't been acquired before, because polling would
>> try to release the AioContext.
> 
> The ingenuous in me thinks:
> 
> If the problem is that bdrv_poll_co() releases an AioContext that it
> hasn't acquired, perhaps we should fix bdrv_poll_co().

AFAIU, it's necessary to release the lock there, so somebody else can
make progress while we poll.

>> The issue would happen for snapshots,

Sorry, what I wrote is not true, because bdrv_co_load_vmstate() is a
coroutine, so I think it would be the same situation as for migration
and bdrv_co_getlength() would be called directly by the wrapper.

And the patch itself also has an issue. AFAICT, when
qcow2_co_load_vmstate() is called, we already have acquired the context
for the drive we load the snapshot from, and since
polling/AIO_WAIT_WHILE requires that the caller has acquired the context
exactly once, we'd need to distinguish if the dirty bitmap is for that
drive (can't acquire the context a second time) or a different drive
(need to acquire the context for the first time).

>> but won't in practice, because
>> saving a snapshot with a block dirty bitmap is currently not possible.
>> The reason is that dirty_bitmap_save_iterate() returns whether it has
>> completed the bulk phase, which only happens in postcopy, so
>> qemu_savevm_state_iterate() will always return 0, meaning the call
>> to iterate will be repeated over and over again without ever reaching
>> the completion phase.
>>
>> Still, this would make the code more robust for the future.
> 
> What I wonder is if we should annotate these calls somehow to say that
> they need to be called from a coroutine context, because I would have
> never found it.
> 
> And just thinking loud:
> 
> I still wonder if we should make incoming migration its own thread.
> Because we get more and more problems because it is a coroutine, which
> in the non-multifd case can consume a whole CPU alone, so it makes no
> sense to have a coroutine.
> 

That would then be a situation where the issue would pop up (assuming
the new thread doesn't also use a coroutine to load the state).

> On the other hand, with multifd, it almost doesn't use any resources, so 




Re: [PATCH 1/8] Update ACPI GED framework to support vcpu hot-(un)plug

2023-08-01 Thread lixianglai

Hi, Igor Mammedov :

On 7/28/23 7:45 PM, Igor Mammedov wrote:

On Thu, 20 Jul 2023 15:15:06 +0800
xianglai li  wrote:


ACPI GED shall be used to convey to the guest kernel about any cpu hot-(un)plug
events. Therefore, existing ACPI GED framework inside QEMU needs to be enhanced
to support CPU hot-(un)plug state and events.

skimmed through, it looks good to me

see nit below


Cc: Xiaojuan Yang 
Cc: Song Gao 
Cc: "Michael S. Tsirkin" 
Cc: Igor Mammedov 
Cc: Ani Sinha 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: "Philippe Mathieu-Daudé" 
Cc: Yanan Wang 
Cc: "Daniel P. Berrangé" 
Cc: Peter Xu 
Cc: David Hildenbrand 
Signed-off-by: xianglai li 
---
  hw/acpi/acpi-cpu-hotplug-stub.c|  6 +
  hw/acpi/cpu.c  |  7 --
  hw/acpi/generic_event_device.c | 33 ++
  include/hw/acpi/cpu_hotplug.h  | 10 
  include/hw/acpi/generic_event_device.h |  6 +
  5 files changed, 55 insertions(+), 7 deletions(-)

diff --git a/hw/acpi/acpi-cpu-hotplug-stub.c b/hw/acpi/acpi-cpu-hotplug-stub.c
index 3fc4b14c26..2aec90d968 100644
--- a/hw/acpi/acpi-cpu-hotplug-stub.c
+++ b/hw/acpi/acpi-cpu-hotplug-stub.c
@@ -24,6 +24,12 @@ void acpi_cpu_ospm_status(CPUHotplugState *cpu_st, 
ACPIOSTInfoList ***list)
  return;
  }
  
+void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,

+ CPUHotplugState *state, hwaddr base_addr)
+{
+return;
+}
+
  void acpi_cpu_plug_cb(HotplugHandler *hotplug_dev,
CPUHotplugState *cpu_st, DeviceState *dev, Error **errp)
  {
diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
index 19c154d78f..6897c8789a 100644
--- a/hw/acpi/cpu.c
+++ b/hw/acpi/cpu.c
@@ -6,13 +6,6 @@
  #include "trace.h"
  #include "sysemu/numa.h"
  
-#define ACPI_CPU_HOTPLUG_REG_LEN 12

-#define ACPI_CPU_SELECTOR_OFFSET_WR 0
-#define ACPI_CPU_FLAGS_OFFSET_RW 4
-#define ACPI_CPU_CMD_OFFSET_WR 5
-#define ACPI_CPU_CMD_DATA_OFFSET_RW 8
-#define ACPI_CPU_CMD_DATA2_OFFSET_R 0
-
  #define OVMF_CPUHP_SMI_CMD 4
  
  enum {

diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
index a3d31631fe..c5a70957b4 100644
--- a/hw/acpi/generic_event_device.c
+++ b/hw/acpi/generic_event_device.c
@@ -12,6 +12,7 @@
  #include "qemu/osdep.h"
  #include "qapi/error.h"
  #include "hw/acpi/acpi.h"
+#include "hw/acpi/cpu.h"
  #include "hw/acpi/generic_event_device.h"
  #include "hw/irq.h"
  #include "hw/mem/pc-dimm.h"
@@ -25,6 +26,7 @@ static const uint32_t ged_supported_events[] = {
  ACPI_GED_MEM_HOTPLUG_EVT,
  ACPI_GED_PWR_DOWN_EVT,
  ACPI_GED_NVDIMM_HOTPLUG_EVT,
+ACPI_GED_CPU_HOTPLUG_EVT,
  };
  
  /*

@@ -117,6 +119,10 @@ void build_ged_aml(Aml *table, const char *name, 
HotplugHandler *hotplug_dev,
 aml_notify(aml_name("\\_SB.NVDR"),
aml_int(0x80)));
  break;
+case ACPI_GED_CPU_HOTPLUG_EVT:
+aml_append(if_ctx, aml_call0(ACPI_CPU_CONTAINER "."
+ ACPI_CPU_SCAN_METHOD));
+break;
  default:
  /*
   * Please make sure all the events in ged_supported_events[]
@@ -234,6 +240,8 @@ static void acpi_ged_device_plug_cb(HotplugHandler 
*hotplug_dev,
  } else {
  acpi_memory_plug_cb(hotplug_dev, &s->memhp_state, dev, errp);
  }
+} else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
+acpi_cpu_plug_cb(hotplug_dev, &s->cpuhp_state, dev, errp);
  } else {
  error_setg(errp, "virt: device plug request for unsupported device"
 " type: %s", object_get_typename(OBJECT(dev)));
@@ -248,6 +256,8 @@ static void acpi_ged_unplug_request_cb(HotplugHandler 
*hotplug_dev,
  if ((object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) &&
 !(object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM {
  acpi_memory_unplug_request_cb(hotplug_dev, &s->memhp_state, dev, 
errp);
+} else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
+acpi_cpu_unplug_request_cb(hotplug_dev, &s->cpuhp_state, dev, errp);
  } else {
  error_setg(errp, "acpi: device unplug request for unsupported device"
 " type: %s", object_get_typename(OBJECT(dev)));
@@ -261,6 +271,8 @@ static void acpi_ged_unplug_cb(HotplugHandler *hotplug_dev,
  
  if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {

  acpi_memory_unplug_cb(&s->memhp_state, dev, errp);
+} else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
+acpi_cpu_unplug_cb(&s->cpuhp_state, dev, errp);
  } else {
  error_setg(errp, "acpi: device unplug for unsupported device"
 " type: %s", object_get_typename(OBJECT(dev)));
@@ -272,6 +284,7 @@ static void acpi_ged_ospm_status(AcpiDeviceIf *adev, 
ACPIOSTInfoList ***list)
  AcpiGedState *s = ACPI_GED(adev

Re: [PATCH 2/8] Update CPUs AML with cpu-(ctrl)dev change

2023-08-01 Thread lixianglai

Hi, Igor Mammedov :

On 7/28/23 7:55 PM, Igor Mammedov wrote:

On Thu, 20 Jul 2023 15:15:07 +0800
xianglai li  wrote:


The CPUs control device (\\_SB.PCI0) register interface for the x86 arch is
based on PCI and is IO-port based, and hence the existing cpus AML code
assumes _CRS objects would evaluate to a system resource which describes an
IO port address. But on the LOONGARCH arch, the CPUs control device
(\\_SB.PRES) register interface is memory-mapped, hence the _CRS object
should evaluate to a system resource which describes a memory-mapped base
address.

This cpus AML code change updates the existing inerface of the build cpus AML

  ^^^ typo

Oh, I'm sorry for the typo.

function to accept both IO/MEMORY type regions and update the _CRS object
correspondingly.

try to reformat the commit message to less than 80 characters per line

Ok! I'll modify it as suggested in the next version of patch.

Cc: Xiaojuan Yang 
Cc: Song Gao 
Cc: "Michael S. Tsirkin" 
Cc: Igor Mammedov 
Cc: Ani Sinha 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: "Philippe Mathieu-Daudé" 
Cc: Yanan Wang 
Cc: "Daniel P. Berrangé" 
Cc: Peter Xu 
Cc: David Hildenbrand 
Signed-off-by: xianglai li 
---
  hw/acpi/cpu.c | 30 +++---
  hw/i386/acpi-build.c  |  2 +-
  include/hw/acpi/cpu.h |  5 +++--
  3 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
index 6897c8789a..3b945a1a40 100644
--- a/hw/acpi/cpu.c
+++ b/hw/acpi/cpu.c
@@ -5,6 +5,7 @@
  #include "qapi/qapi-events-acpi.h"
  #include "trace.h"
  #include "sysemu/numa.h"
+#include "hw/acpi/cpu_hotplug.h"
  
  #define OVMF_CPUHP_SMI_CMD 4
  
@@ -331,9 +332,10 @@ const VMStateDescription vmstate_cpu_hotplug = {

  #define CPU_FW_EJECT_EVENT "CEJF"
  
  void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,

-hwaddr io_base,
+hwaddr mmap_io_base,
  const char *res_root,
-const char *event_handler_method)
+const char *event_handler_method,
+AmlRegionSpace rs)
  {
  Aml *ifctx;
  Aml *field;
@@ -360,14 +362,28 @@ void build_cpus_aml(Aml *table, MachineState *machine, 
CPUHotplugFeatures opts,
  aml_append(cpu_ctrl_dev, aml_mutex(CPU_LOCK, 0));
  
  crs = aml_resource_template();

-aml_append(crs, aml_io(AML_DECODE16, io_base, io_base, 1,
+if (rs == AML_SYSTEM_IO) {
+aml_append(crs, aml_io(AML_DECODE16, mmap_io_base, mmap_io_base, 1,
 ACPI_CPU_HOTPLUG_REG_LEN));
+} else {
+aml_append(crs, aml_memory32_fixed(mmap_io_base,
+   ACPI_CPU_HOTPLUG_REG_LEN, AML_READ_WRITE));
+}
  aml_append(cpu_ctrl_dev, aml_name_decl("_CRS", crs));
  
-/* declare CPU hotplug MMIO region with related access fields */

-aml_append(cpu_ctrl_dev,
-aml_operation_region("PRST", AML_SYSTEM_IO, aml_int(io_base),
- ACPI_CPU_HOTPLUG_REG_LEN));
+if (rs == AML_SYSTEM_IO) {
+/* declare CPU hotplug MMIO region with related access fields */
+aml_append(cpu_ctrl_dev,
+aml_operation_region("PRST", AML_SYSTEM_IO,
+ aml_int(mmap_io_base),
+ ACPI_CPU_HOTPLUG_REG_LEN));
+} else {
+aml_append(cpu_ctrl_dev,
+aml_operation_region("PRST", AML_SYSTEM_MEMORY,
+ aml_int(mmap_io_base),
+ ACPI_CPU_HOTPLUG_REG_LEN));
+}


to reduce duplication, following could be better way to spell it:
  g_assert(rs == foo1 || rs == foo2)
  ... aml_operation_region("PRST", rs, aml_int(io_base), ...



Well, this suggestion makes the code look good; I'll fix it in the next
version of the patch.



  
  field = aml_field("PRST", AML_BYTE_ACC, AML_NOLOCK,

AML_WRITE_AS_ZEROS);
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 9c74fa17ad..5d02690593 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1548,7 +1548,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
  .fw_unplugs_cpu = pm->smi_on_cpu_unplug,
  };
  build_cpus_aml(dsdt, machine, opts, pm->cpu_hp_io_base,
-   "\\_SB.PCI0", "\\_GPE._E02");
+   "\\_SB.PCI0", "\\_GPE._E02", AML_SYSTEM_IO);
  }
  
  if (pcms->memhp_io_base && nr_mem) {

diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
index 999caaf510..cddea78333 100644
--- a/include/hw/acpi/cpu.h
+++ b/include/hw/acpi/cpu.h
@@ -56,9 +56,10 @@ typedef struct CPUHotplugFeatures {
  } CPUHotplugFeatures;
  
  void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFe

Re: [PATCH 3/8] Introduced a new function to disconnect GPIO connections

2023-08-01 Thread lixianglai

Hi, Igor Mammedov:

On 7/28/23 7:59 PM, Igor Mammedov wrote:

On Thu, 20 Jul 2023 15:15:08 +0800
xianglai li  wrote:


It introduces a new function to unwire the
vcpu<->exioi interrupts for the vcpu hot-(un)plug cases.

it's not a new function.
You probably wanted to say:

subj: make foo() public

it will be reused .someplace. for ...

Ok, I'll fix it in the next version of patch.

Cc: Xiaojuan Yang 
Cc: Song Gao 
Cc: "Michael S. Tsirkin" 
Cc: Igor Mammedov 
Cc: Ani Sinha 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: "Philippe Mathieu-Daudé" 
Cc: Yanan Wang 
Cc: "Daniel P. Berrangé" 
Cc: Peter Xu 
Cc: David Hildenbrand 
Signed-off-by: xianglai li 
---
  hw/core/gpio.c | 4 ++--
  include/hw/qdev-core.h | 2 ++
  2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/core/gpio.c b/hw/core/gpio.c
index 80d07a6ec9..4fc6409545 100644
--- a/hw/core/gpio.c
+++ b/hw/core/gpio.c
@@ -143,8 +143,8 @@ qemu_irq qdev_get_gpio_out_connector(DeviceState *dev, 
const char *name, int n)
  
  /* disconnect a GPIO output, returning the disconnected input (if any) */
  
-static qemu_irq qdev_disconnect_gpio_out_named(DeviceState *dev,

-   const char *name, int n)
+qemu_irq qdev_disconnect_gpio_out_named(DeviceState *dev,
+const char *name, int n)
  {
  char *propname = g_strdup_printf("%s[%d]",
   name ? name : "unnamed-gpio-out", n);
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 884c726a87..992f5419fa 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -739,6 +739,8 @@ qemu_irq qdev_get_gpio_out_connector(DeviceState *dev, 
const char *name, int n);
   */
  qemu_irq qdev_intercept_gpio_out(DeviceState *dev, qemu_irq icpt,
   const char *name, int n);
+qemu_irq qdev_disconnect_gpio_out_named(DeviceState *dev,
+   const char *name, int n);

watch for proper alignment

have you tried to run ./scripts/checkpatch.pl on you series?
(it should catch such cases)


Ok, I'll fix it in the next version of patch.

I ran ./scripts/checkpatch.pl on my patch, but it didn't catch the
problem.


  
  BusState *qdev_get_child_bus(DeviceState *dev, const char *name);
  


Thanks,

xianglai.




RE: [RFC PATCH v4 00/24] vfio: Adopt iommufd

2023-08-01 Thread Duan, Zhenzhong
Ping, any comments or suggestions are appreciated.

Thanks
Zhenzhong

>-Original Message-
>From: Duan, Zhenzhong 
>Sent: Wednesday, July 12, 2023 3:25 PM
>To: qemu-devel@nongnu.org
>Cc: alex.william...@redhat.com; c...@redhat.com; j...@nvidia.com;
>nicol...@nvidia.com; eric.au...@redhat.com; pet...@redhat.com;
>jasonw...@redhat.com; Tian, Kevin ; Liu, Yi L
>; Sun, Yi Y ; Peng, Chao P
>; Duan, Zhenzhong 
>Subject: [RFC PATCH v4 00/24] vfio: Adopt iommufd
>
>With the introduction of iommufd, the Linux kernel provides a generic
>interface for userspace drivers to propagate their DMA mappings to kernel
>for assigned devices. This series does the porting of the VFIO devices
>onto the /dev/iommu uapi and let it coexist with the legacy implementation.
>
>This QEMU integration is the result of a collaborative work between
>Yi Liu, Yi Sun, Nicolin Chen and Eric Auger.
>
>At QEMU level, interactions with the /dev/iommu are abstracted by a new
>iommufd object (compiled in with the CONFIG_IOMMUFD option).
>
>Any QEMU device (e.g. vfio device) wishing to use /dev/iommu must be
>linked with an iommufd object. In this series, the vfio-pci device is
>granted with such capability (other VFIO devices are not yet ready):
>
>It gets a new optional parameter named iommufd which allows passing
>an iommufd object:
>
>-object iommufd,id=iommufd0
>-device vfio-pci,host=:02:00.0,iommufd=iommufd0
>
>Note the /dev/iommu and vfio cdev can be externally opened by a
>management layer. In such a case the fd is passed:
>
>-object iommufd,id=iommufd0,fd=22
>-device vfio-pci,iommufd=iommufd0,fd=23
>
>If the fd parameter is not passed, the fd is opened by QEMU.
>See https://www.mail-archive.com/qemu-devel@nongnu.org/msg937155.html
>for a detailed discussion of this requirement.
>
>If no iommufd option is passed to the vfio-pci device, iommufd is not
>used and the end-user gets the behavior based on the legacy vfio iommu
>interfaces:
>
>-device vfio-pci,host=:02:00.0
>
>While the legacy kernel interface is group-centric, the new iommufd
>interface is device-centric, relying on device fd and iommufd.
>
>To support both interfaces in the QEMU VFIO device we reworked the vfio
>container abstraction so that the generic VFIO code can use either
>backend.
>
>The VFIOContainer object becomes a base object derived into
>a) the legacy VFIO container and
>b) the new iommufd based container.
>
>The base object implements generic code such as code related to
>memory_listener and address space management whereas the derived
>objects implement callbacks specific to either BE, legacy and
>iommufd. Indeed each backend has its own way to setup secure context
>and dma management interface. The diagram below shows how it looks
>with both BEs.
>
>VFIO   AddressSpace/Memory
>+---+  +--+  +-+  +-+
>|  pci  |  | platform |  |  ap |  | ccw |
>+---+---+  ++-+  +--+--+  +--+--+ +--+
>|   |   |||   AddressSpace   |
>|   |   ||++-+
>+---V---V---VV+   /
>|   VFIOAddressSpace  | <+
>|  |  |  MemoryListener
>|  VFIOContainer list |
>+---+++
>||
>||
>+---V--++V--+
>|   iommufd||vfio legacy|
>|  container   || container |
>+---+--+++--+
>||
>| /dev/iommu | /dev/vfio/vfio
>| /dev/vfio/devices/vfioX| /dev/vfio/$group_id
>Userspace   ||
>++===
>
>Kernel  |  device fd |
>+---+| group/container fd
>| (BIND_IOMMUFD || (SET_CONTAINER/SET_IOMMU)
>|  ATTACH_IOAS) || device fd
>|   ||
>|   +---VV-+
>iommufd |   |vfio  |
>(map/unmap  |   +-++---+
>ioas_copy)  | || map/unmap
>| ||
> +--V--++-V--+  +--V+
> | iommfd core ||  device|  |  vfio iommu   |
> +-+++  +---+
>
>[Secure Context setup]
>- iommufd BE: uses device fd and iommufd to setup secure context
>  (bind_iommufd, attach_ioas)
>- vfio legacy BE: uses group fd and contai

Re: [RESEND PATCH] hw/i386/vmmouse:add relative packet flag for button status

2023-08-01 Thread Marc-André Lureau
On Tue, Aug 1, 2023 at 9:45 AM Zongmin Zhou  wrote:
>
> The buttons value uses macros instead of direct numbers.
>
> If relative mode is requested, this flag has to be added so that the
> guest vmmouse driver can judge that this is a relative packet.
> Otherwise, the vmmouse driver will not match the condition
> 'status & VMMOUSE_RELATIVE_PACKET', can't report events on the
> correct (relative) input device, and relative mode ends up unusable.
>
> Signed-off-by: Zongmin Zhou
> ---
>  hw/i386/vmmouse.c | 15 ---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/hw/i386/vmmouse.c b/hw/i386/vmmouse.c
> index a56c185f15..6cd624bd09 100644
> --- a/hw/i386/vmmouse.c
> +++ b/hw/i386/vmmouse.c
> @@ -44,6 +44,12 @@
>
>  #define VMMOUSE_VERSION0x3442554a
>
> +#define VMMOUSE_RELATIVE_PACKET0x0001
> +
> +#define VMMOUSE_LEFT_BUTTON0x20
> +#define VMMOUSE_RIGHT_BUTTON   0x10
> +#define VMMOUSE_MIDDLE_BUTTON  0x08
> +
>  #ifdef DEBUG_VMMOUSE
>  #define DPRINTF(fmt, ...) printf(fmt, ## __VA_ARGS__)
>  #else
> @@ -103,15 +109,18 @@ static void vmmouse_mouse_event(void *opaque, int x, 
> int y, int dz, int buttons_
>  x, y, dz, buttons_state);
>
>  if ((buttons_state & MOUSE_EVENT_LBUTTON))
> -buttons |= 0x20;
> +buttons |= VMMOUSE_LEFT_BUTTON;
>  if ((buttons_state & MOUSE_EVENT_RBUTTON))
> -buttons |= 0x10;
> +buttons |= VMMOUSE_RIGHT_BUTTON;
>  if ((buttons_state & MOUSE_EVENT_MBUTTON))
> -buttons |= 0x08;
> +buttons |= VMMOUSE_MIDDLE_BUTTON;
>
>  if (s->absolute) {
>  x <<= 1;
>  y <<= 1;
> +} else {
> +/* add for guest vmmouse driver to judge this is a relative packet. */
> +buttons |= VMMOUSE_RELATIVE_PACKET;
>  }

Reviewed-by: Marc-André Lureau 

>
>  s->queue[s->nb_queue++] = buttons;
> --
> 2.34.1
>
>


-- 
Marc-André Lureau



[PATCH 1/2] vmmouse: replace DPRINTF with tracing

2023-08-01 Thread marcandre . lureau
From: Marc-André Lureau 

Signed-off-by: Marc-André Lureau 
---
 hw/i386/vmmouse.c| 29 ++---
 hw/i386/trace-events | 10 ++
 2 files changed, 24 insertions(+), 15 deletions(-)

diff --git a/hw/i386/vmmouse.c b/hw/i386/vmmouse.c
index a56c185f15..fce13a5cde 100644
--- a/hw/i386/vmmouse.c
+++ b/hw/i386/vmmouse.c
@@ -32,6 +32,8 @@
 #include "cpu.h"
 #include "qom/object.h"
 
+#include "trace.h"
+
 /* debug only vmmouse */
 //#define DEBUG_VMMOUSE
 
@@ -44,12 +46,6 @@
 
 #define VMMOUSE_VERSION0x3442554a
 
-#ifdef DEBUG_VMMOUSE
-#define DPRINTF(fmt, ...) printf(fmt, ## __VA_ARGS__)
-#else
-#define DPRINTF(fmt, ...) do { } while (0)
-#endif
-
 #define TYPE_VMMOUSE "vmmouse"
 OBJECT_DECLARE_SIMPLE_TYPE(VMMouseState, VMMOUSE)
 
@@ -87,7 +83,8 @@ static void vmmouse_set_data(const uint32_t *data)
 
 static uint32_t vmmouse_get_status(VMMouseState *s)
 {
-DPRINTF("vmmouse_get_status()\n");
+trace_vmmouse_get_status();
+
 return (s->status << 16) | s->nb_queue;
 }
 
@@ -99,8 +96,7 @@ static void vmmouse_mouse_event(void *opaque, int x, int y, 
int dz, int buttons_
 if (s->nb_queue > (VMMOUSE_QUEUE_SIZE - 4))
 return;
 
-DPRINTF("vmmouse_mouse_event(%d, %d, %d, %d)\n",
-x, y, dz, buttons_state);
+trace_vmmouse_mouse_event(x, y, dz, buttons_state);
 
 if ((buttons_state & MOUSE_EVENT_LBUTTON))
 buttons |= 0x20;
@@ -151,7 +147,7 @@ static void vmmouse_update_handler(VMMouseState *s, int 
absolute)
 
 static void vmmouse_read_id(VMMouseState *s)
 {
-DPRINTF("vmmouse_read_id()\n");
+trace_vmmouse_read_id();
 
 if (s->nb_queue == VMMOUSE_QUEUE_SIZE)
 return;
@@ -163,19 +159,22 @@ static void vmmouse_read_id(VMMouseState *s)
 
 static void vmmouse_request_relative(VMMouseState *s)
 {
-DPRINTF("vmmouse_request_relative()\n");
+trace_vmmouse_request_relative();
+
 vmmouse_update_handler(s, 0);
 }
 
 static void vmmouse_request_absolute(VMMouseState *s)
 {
-DPRINTF("vmmouse_request_absolute()\n");
+trace_vmmouse_request_absolute();
+
 vmmouse_update_handler(s, 1);
 }
 
 static void vmmouse_disable(VMMouseState *s)
 {
-DPRINTF("vmmouse_disable()\n");
+trace_vmmouse_disable();
+
 s->status = 0x;
 vmmouse_remove_handler(s);
 }
@@ -184,7 +183,7 @@ static void vmmouse_data(VMMouseState *s, uint32_t *data, 
uint32_t size)
 {
 int i;
 
-DPRINTF("vmmouse_data(%d)\n", size);
+trace_vmmouse_data(size);
 
 if (size == 0 || size > 6 || size > s->nb_queue) {
 printf("vmmouse: driver requested too much data %d\n", size);
@@ -284,7 +283,7 @@ static void vmmouse_realizefn(DeviceState *dev, Error 
**errp)
 {
 VMMouseState *s = VMMOUSE(dev);
 
-DPRINTF("vmmouse_init\n");
+trace_vmmouse_init();
 
 if (!s->i8042) {
 error_setg(errp, "'i8042' link is not set");
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index 04fd71bfc4..53c02d7ac8 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -121,3 +121,13 @@ x86_pic_interrupt(int irqn, int level) "PIC interrupt #%d 
level:%d"
 # port92.c
 port92_read(uint8_t val) "port92: read 0x%02x"
 port92_write(uint8_t val) "port92: write 0x%02x"
+
+# vmmouse.c
+vmmouse_get_status(void) ""
vmmouse_mouse_event(int x, int y, int dz, int buttons_state) "event: x=%d y=%d dz=%d state=%d"
+vmmouse_init(void) ""
+vmmouse_read_id(void) ""
+vmmouse_request_relative(void) ""
+vmmouse_request_absolute(void) ""
+vmmouse_disable(void) ""
+vmmouse_data(uint32_t size) "data: size=%" PRIu32
-- 
2.41.0




[PATCH 2/2] vmmouse: use explicit code

2023-08-01 Thread marcandre . lureau
From: Marc-André Lureau 

It's odd to shift x and y without an obvious reason. Let's make this more
explicit and future-proof.
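For reference, the remap that qemu_input_scale_axis() performs is a plain linear rescale from the input range onto the device range; the sketch below is a standalone reimplementation for illustration, not QEMU's exact code:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map v from [min_in, max_in] onto [min_out, max_out], doing the
 * intermediate multiplication in 64 bits to avoid overflow.
 * Illustrative reimplementation of what qemu_input_scale_axis() does.
 */
static int scale_axis(int v, int min_in, int max_in,
                      int min_out, int max_out)
{
    int64_t range_in = (int64_t)max_in - min_in;
    int64_t range_out = (int64_t)max_out - min_out;

    if (range_in < 1) {
        /* Degenerate input range: return the midpoint of the output. */
        return (int)(min_out + range_out / 2);
    }
    return (int)(((int64_t)v - min_in) * range_out / range_in + min_out);
}
```

With INPUT_EVENT_ABS_MIN/MAX of 0/0x7fff and the VMMouse range of 0/0xffff, the endpoints map cleanly, which the old `x <<= 1` only approximated.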

Signed-off-by: Marc-André Lureau 
---
 hw/i386/vmmouse.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/hw/i386/vmmouse.c b/hw/i386/vmmouse.c
index fce13a5cde..cd9ac11afc 100644
--- a/hw/i386/vmmouse.c
+++ b/hw/i386/vmmouse.c
@@ -46,6 +46,11 @@
 
 #define VMMOUSE_VERSION0x3442554a
 
+#define VMMOUSE_MIN_X 0
+#define VMMOUSE_MIN_Y 0
+#define VMMOUSE_MAX_X 0x
+#define VMMOUSE_MAX_Y 0x
+
 #define TYPE_VMMOUSE "vmmouse"
 OBJECT_DECLARE_SIMPLE_TYPE(VMMouseState, VMMOUSE)
 
@@ -106,8 +111,12 @@ static void vmmouse_mouse_event(void *opaque, int x, int 
y, int dz, int buttons_
 buttons |= 0x08;
 
 if (s->absolute) {
-x <<= 1;
-y <<= 1;
+x = qemu_input_scale_axis(x,
+  INPUT_EVENT_ABS_MIN, INPUT_EVENT_ABS_MAX,
+  VMMOUSE_MIN_X, VMMOUSE_MAX_X);
+y = qemu_input_scale_axis(y,
+  INPUT_EVENT_ABS_MIN, INPUT_EVENT_ABS_MAX,
+  VMMOUSE_MIN_Y, VMMOUSE_MAX_Y);
 }
 
 s->queue[s->nb_queue++] = buttons;
-- 
2.41.0




[PULL 4/5] xen: Don't pass MemoryListener around by value

2023-08-01 Thread Anthony PERARD via
From: Peter Maydell 

Coverity points out (CID 1513106, 1513107) that MemoryListener is a
192 byte struct which we are passing around by value.  Switch to
passing a const pointer into xen_register_ioreq() and then to
xen_do_ioreq_register().  We can also make the file-scope
MemoryListener variables const, since nothing changes them.
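The shape of the change can be sketched with a hypothetical stand-in struct (the layout, names, and 192-byte size here are illustrative, not QEMU's actual MemoryListener):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a large callback table like MemoryListener:
 * expensive to pass by value on every call.
 */
typedef struct BigListener {
    void (*region_add)(void);
    void (*region_del)(void);
    char pad[192];                 /* make the struct expensive to copy */
} BigListener;

typedef struct State {
    BigListener listener;          /* the long-lived copy lives here */
} State;

/*
 * Pass by const pointer: no large copy at each call level, and the
 * callee cannot mutate the caller's (now const) listener. One explicit
 * copy is made exactly where the long-lived instance is needed.
 */
static void register_ioreq(State *s, const BigListener *listener)
{
    s->listener = *listener;
}
```

This is why the file-scope listeners can become `const`: nothing needs to write through the pointer.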

Signed-off-by: Peter Maydell 
Acked-by: Anthony PERARD 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20230718101057.1110979-1-peter.mayd...@linaro.org>
Signed-off-by: Anthony PERARD 
---
 hw/arm/xen_arm.c| 4 ++--
 hw/i386/xen/xen-hvm.c   | 4 ++--
 hw/xen/xen-hvm-common.c | 8 
 include/hw/xen/xen-hvm-common.h | 2 +-
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/hw/arm/xen_arm.c b/hw/arm/xen_arm.c
index 044093fec7..1d3e6d481a 100644
--- a/hw/arm/xen_arm.c
+++ b/hw/arm/xen_arm.c
@@ -37,7 +37,7 @@
 #define TYPE_XEN_ARM  MACHINE_TYPE_NAME("xenpvh")
 OBJECT_DECLARE_SIMPLE_TYPE(XenArmState, XEN_ARM)
 
-static MemoryListener xen_memory_listener = {
+static const MemoryListener xen_memory_listener = {
 .region_add = xen_region_add,
 .region_del = xen_region_del,
 .log_start = NULL,
@@ -108,7 +108,7 @@ static void xen_arm_init(MachineState *machine)
 
 xam->state =  g_new0(XenIOState, 1);
 
-xen_register_ioreq(xam->state, machine->smp.cpus, xen_memory_listener);
+xen_register_ioreq(xam->state, machine->smp.cpus, &xen_memory_listener);
 
 #ifdef CONFIG_TPM
 if (xam->cfg.tpm_base_addr) {
diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index 3da5a2b23f..f42621e674 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -458,7 +458,7 @@ static void xen_log_global_stop(MemoryListener *listener)
 xen_in_migration = false;
 }
 
-static MemoryListener xen_memory_listener = {
+static const MemoryListener xen_memory_listener = {
 .name = "xen-memory",
 .region_add = xen_region_add,
 .region_del = xen_region_del,
@@ -582,7 +582,7 @@ void xen_hvm_init_pc(PCMachineState *pcms, MemoryRegion 
**ram_memory)
 
 state = g_new0(XenIOState, 1);
 
-xen_register_ioreq(state, max_cpus, xen_memory_listener);
+xen_register_ioreq(state, max_cpus, &xen_memory_listener);
 
 QLIST_INIT(&xen_physmap);
 xen_read_physmap(state);
diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
index 886c3ee944..565dc39c8f 100644
--- a/hw/xen/xen-hvm-common.c
+++ b/hw/xen/xen-hvm-common.c
@@ -765,8 +765,8 @@ void xen_shutdown_fatal_error(const char *fmt, ...)
 }
 
 static void xen_do_ioreq_register(XenIOState *state,
-   unsigned int max_cpus,
-   MemoryListener xen_memory_listener)
+  unsigned int max_cpus,
+  const MemoryListener *xen_memory_listener)
 {
 int i, rc;
 
@@ -824,7 +824,7 @@ static void xen_do_ioreq_register(XenIOState *state,
 
 qemu_add_vm_change_state_handler(xen_hvm_change_state_handler, state);
 
-state->memory_listener = xen_memory_listener;
+state->memory_listener = *xen_memory_listener;
 memory_listener_register(&state->memory_listener, &address_space_memory);
 
 state->io_listener = xen_io_listener;
@@ -842,7 +842,7 @@ static void xen_do_ioreq_register(XenIOState *state,
 }
 
 void xen_register_ioreq(XenIOState *state, unsigned int max_cpus,
-MemoryListener xen_memory_listener)
+const MemoryListener *xen_memory_listener)
 {
 int rc;
 
diff --git a/include/hw/xen/xen-hvm-common.h b/include/hw/xen/xen-hvm-common.h
index f9559e2885..4e9904f1a6 100644
--- a/include/hw/xen/xen-hvm-common.h
+++ b/include/hw/xen/xen-hvm-common.h
@@ -93,7 +93,7 @@ void xen_device_unrealize(DeviceListener *listener, 
DeviceState *dev);
 
 void xen_hvm_change_state_handler(void *opaque, bool running, RunState rstate);
 void xen_register_ioreq(XenIOState *state, unsigned int max_cpus,
-MemoryListener xen_memory_listener);
+const MemoryListener *xen_memory_listener);
 
 void cpu_ioreq_pio(ioreq_t *req);
 #endif /* HW_XEN_HVM_COMMON_H */
-- 
Anthony PERARD




[PULL 3/5] thread-pool: signal "request_cond" while locked

2023-08-01 Thread Anthony PERARD via
From: Anthony PERARD 

thread_pool_free() might have been called on the `pool`, which would
be a reason for worker_thread() to quit. In this case,
`pool->request_cond` has been destroyed.

If worker_thread() didn't manage to signal `request_cond` before it
was destroyed by thread_pool_free(), we get:
util/qemu-thread-posix.c:198: qemu_cond_signal: Assertion 
`cond->initialized' failed.

One backtrace:
__GI___assert_fail (assertion=0x5614abcb "cond->initialized", 
file=0x5614ab88 "util/qemu-thread-posix.c", line=198,
function=0x5614ad80 <__PRETTY_FUNCTION__.17104> "qemu_cond_signal") 
at assert.c:101
qemu_cond_signal (cond=0x7fffb800db30) at util/qemu-thread-posix.c:198
worker_thread (opaque=0x7fffb800dab0) at util/thread-pool.c:129
qemu_thread_start (args=0x7fffb8000b20) at util/qemu-thread-posix.c:505
start_thread (arg=) at pthread_create.c:486

Reported here:
https://lore.kernel.org/all/ZJwoK50FcnTSfFZ8@MacBook-Air-de-Roger.local/T/#u

To avoid this issue, keep the lock held while signalling `request_cond`.

Fixes: 900fa208f506 ("thread-pool: replace semaphore with condition variable")
Signed-off-by: Anthony PERARD 
Reviewed-by: Stefan Hajnoczi 
Message-Id: <20230714152720.5077-1-anthony.per...@citrix.com>
Signed-off-by: Anthony PERARD 
---
 util/thread-pool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/util/thread-pool.c b/util/thread-pool.c
index 0d97888df0..e3d8292d14 100644
--- a/util/thread-pool.c
+++ b/util/thread-pool.c
@@ -120,13 +120,13 @@ static void *worker_thread(void *opaque)
 
 pool->cur_threads--;
 qemu_cond_signal(&pool->worker_stopped);
-qemu_mutex_unlock(&pool->lock);
 
 /*
  * Wake up another thread, in case we got a wakeup but decided
  * to exit due to pool->cur_threads > pool->max_threads.
  */
 qemu_cond_signal(&pool->request_cond);
+qemu_mutex_unlock(&pool->lock);
 return NULL;
 }
 
-- 
Anthony PERARD




[PULL 2/5] xen-block: Avoid leaks on new error path

2023-08-01 Thread Anthony PERARD via
From: Anthony PERARD 

Commit 189829399070 ("xen-block: Use specific blockdev driver")
introduced a new error path, without taking care of allocated
resources.

So only allocate the qdicts after the error check, and free both
`filename` and `driver` when we are about to return, thus taking
care of both the success and error paths.

Coverity only spotted the leak of qdicts (*_layer variables).

Reported-by: Peter Maydell 
Fixes: Coverity CID 1508722, 1398649
Fixes: 189829399070 ("xen-block: Use specific blockdev driver")
Signed-off-by: Anthony PERARD 
Reviewed-by: Paul Durrant 
Reviewed-by: Peter Maydell 
Message-Id: <20230704171819.42564-1-anthony.per...@citrix.com>
Signed-off-by: Anthony PERARD 
---
 hw/block/xen-block.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c
index f099914831..3906b9058b 100644
--- a/hw/block/xen-block.c
+++ b/hw/block/xen-block.c
@@ -781,14 +781,15 @@ static XenBlockDrive *xen_block_drive_create(const char 
*id,
 drive = g_new0(XenBlockDrive, 1);
 drive->id = g_strdup(id);
 
-file_layer = qdict_new();
-driver_layer = qdict_new();
-
 rc = stat(filename, &st);
 if (rc) {
 error_setg_errno(errp, errno, "Could not stat file '%s'", filename);
 goto done;
 }
+
+file_layer = qdict_new();
+driver_layer = qdict_new();
+
 if (S_ISBLK(st.st_mode)) {
 qdict_put_str(file_layer, "driver", "host_device");
 } else {
@@ -796,7 +797,6 @@ static XenBlockDrive *xen_block_drive_create(const char *id,
 }
 
 qdict_put_str(file_layer, "filename", filename);
-g_free(filename);
 
 if (mode && *mode != 'w') {
 qdict_put_bool(file_layer, "read-only", true);
@@ -831,7 +831,6 @@ static XenBlockDrive *xen_block_drive_create(const char *id,
 qdict_put_str(file_layer, "locking", "off");
 
 qdict_put_str(driver_layer, "driver", driver);
-g_free(driver);
 
 qdict_put(driver_layer, "file", file_layer);
 
@@ -842,6 +841,8 @@ static XenBlockDrive *xen_block_drive_create(const char *id,
 qobject_unref(driver_layer);
 
 done:
+g_free(filename);
+g_free(driver);
 if (*errp) {
 xen_block_drive_destroy(drive, NULL);
 return NULL;
-- 
Anthony PERARD




[PULL 1/5] hw/xen: Clarify (lack of) error handling in transaction_commit()

2023-08-01 Thread Anthony PERARD via
From: David Woodhouse 

Coverity was unhappy (CID 1508359) because we didn't check the return of
init_walk_op() in transaction_commit(), despite doing so at every other
call site.

Strictly speaking, this is a false positive since it can never fail. It
only fails for invalid user input (transaction ID or path), and both of
those are hard-coded to known sane values in this invocation.

But Coverity doesn't know that, and neither does the casual reader of the
code.

Returning an error here would be weird, since the transaction *is*
committed by this point; all the walk_op is doing is firing watches on
the newly-committed changed nodes. So make it a g_assert(!ret), since
it really should never happen.

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
Message-Id: <20076888f6bdf06a65aafc5cf954260965d45b97.ca...@infradead.org>
Signed-off-by: Anthony PERARD 
---
 hw/i386/kvm/xenstore_impl.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/hw/i386/kvm/xenstore_impl.c b/hw/i386/kvm/xenstore_impl.c
index 305fe75519..d9732b567e 100644
--- a/hw/i386/kvm/xenstore_impl.c
+++ b/hw/i386/kvm/xenstore_impl.c
@@ -1022,6 +1022,7 @@ static int transaction_commit(XenstoreImplState *s, 
XsTransaction *tx)
 {
 struct walk_op op;
 XsNode **n;
+int ret;
 
 if (s->root_tx != tx->base_tx) {
 return EAGAIN;
@@ -1032,7 +1033,16 @@ static int transaction_commit(XenstoreImplState *s, 
XsTransaction *tx)
 s->root_tx = tx->tx_id;
 s->nr_nodes = tx->nr_nodes;
 
-init_walk_op(s, &op, XBT_NULL, tx->dom_id, "/", &n);
+ret = init_walk_op(s, &op, XBT_NULL, tx->dom_id, "/", &n);
+/*
+ * There are two reasons why init_walk_op() may fail: an invalid tx_id,
+ * or an invalid path. We pass XBT_NULL and "/", and it cannot fail.
+ * If it does, the world is broken. And returning 'ret' would be weird
+ * because the transaction *was* committed, and all this tree walk is
+ * trying to do is fire the resulting watches on newly-committed nodes.
+ */
+g_assert(!ret);
+
 op.deleted_in_tx = false;
 op.mutating = true;
 
-- 
Anthony PERARD




[PULL 5/5] xen-platform: do full PCI reset during unplug of IDE devices

2023-08-01 Thread Anthony PERARD via
From: Olaf Hering 

The IDE unplug function needs to reset the entire PCI device, to make
sure all state is initialized to defaults. This is done by calling
pci_device_reset, which resets not only the chip specific registers, but
also all PCI state. This fixes "unplug" in a Xen HVM domU with the
modular legacy xenlinux PV drivers.

Commit ee358e919e38 ("hw/ide/piix: Convert reset handler to
DeviceReset") changed the way how the the disks are unplugged. Prior
this commit the PCI device remained unchanged. After this change,
piix_ide_reset is exercised after the "unplug" command, which was not
the case prior to that commit. This function resets the command register.
As a result the ata_piix driver inside the domU will see a disabled PCI
device. The generic PCI code will reenable the PCI device. On the qemu
side, this runs pci_default_write_config/pci_update_mappings. Here a
changed address is returned by pci_bar_address, this is the address
which was truncated in piix_ide_reset. In case of a Xen HVM domU, the
address changes from 0xc120 to 0xc100. This truncation was a bug in
piix_ide_reset, which was fixed in commit 230dfd9257 ("hw/ide/piix:
properly initialize the BMIBA register"). If pci_xen_ide_unplug had used
pci_device_reset, the PCI registers would have been properly reset, and
commit ee358e919e38 would have not introduced a regression for this
specific domU environment.

While the unplug is supposed to hide the IDE disks, the changed BMIBA
address broke the UHCI device. In case the domU has a USB tablet
configured, to receive absolute pointer coordinates for the GUI, it will
cause a hang during device discovery of the partly discovered USB hid
device. Reading the USBSTS word size register will fail. The access ends
up in the QEMU piix-bmdma device, instead of the expected uhci device.
Here a byte size request is expected, and a value of ~0 is returned. As
a result the UHCI driver sees an error state in the register, and turns
off the UHCI controller.

Signed-off-by: Olaf Hering 
Reviewed-by: Paul Durrant 
Message-Id: <20230720072950.20198-1-o...@aepfle.de>
Signed-off-by: Anthony PERARD 
---
 hw/i386/xen/xen_platform.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c
index 57f1d742c1..17457ff3de 100644
--- a/hw/i386/xen/xen_platform.c
+++ b/hw/i386/xen/xen_platform.c
@@ -164,8 +164,9 @@ static void pci_unplug_nics(PCIBus *bus)
  *
  * [1] 
https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/misc/hvm-emulated-unplug.pandoc
  */
-static void pci_xen_ide_unplug(DeviceState *dev, bool aux)
+static void pci_xen_ide_unplug(PCIDevice *d, bool aux)
 {
+DeviceState *dev = DEVICE(d);
 PCIIDEState *pci_ide;
 int i;
 IDEDevice *idedev;
@@ -195,7 +196,7 @@ static void pci_xen_ide_unplug(DeviceState *dev, bool aux)
 blk_unref(blk);
 }
 }
-device_cold_reset(dev);
+pci_device_reset(d);
 }
 
 static void unplug_disks(PCIBus *b, PCIDevice *d, void *opaque)
@@ -210,7 +211,7 @@ static void unplug_disks(PCIBus *b, PCIDevice *d, void 
*opaque)
 
 switch (pci_get_word(d->config + PCI_CLASS_DEVICE)) {
 case PCI_CLASS_STORAGE_IDE:
-pci_xen_ide_unplug(DEVICE(d), aux);
+pci_xen_ide_unplug(d, aux);
 break;
 
 case PCI_CLASS_STORAGE_SCSI:
-- 
Anthony PERARD




[PULL 0/5] Misc fixes, for thread-pool, xen, and xen-emulate

2023-08-01 Thread Anthony PERARD via
The following changes since commit 802341823f1720511dd5cf53ae40285f7978c61b:

  Merge tag 'pull-tcg-20230731' of https://gitlab.com/rth7680/qemu into staging 
(2023-07-31 14:02:51 -0700)

are available in the Git repository at:

  https://xenbits.xen.org/git-http/people/aperard/qemu-dm.git 
tags/pull-xen-20230801

for you to fetch changes up to 856ca10f9ce1fcffeab18546b36a64f79017c905:

  xen-platform: do full PCI reset during unplug of IDE devices (2023-08-01 
10:22:33 +0100)


Misc fixes, for thread-pool, xen, and xen-emulate

* fix an access to `request_cond` QemuCond in thread-pool
* fix issue with PCI devices when unplugging IDE devices in Xen guest
* several fixes for issues pointed out by Coverity


Anthony PERARD (2):
  xen-block: Avoid leaks on new error path
  thread-pool: signal "request_cond" while locked

David Woodhouse (1):
  hw/xen: Clarify (lack of) error handling in transaction_commit()

Olaf Hering (1):
  xen-platform: do full PCI reset during unplug of IDE devices

Peter Maydell (1):
  xen: Don't pass MemoryListener around by value

 hw/arm/xen_arm.c|  4 ++--
 hw/block/xen-block.c| 11 ++-
 hw/i386/kvm/xenstore_impl.c | 12 +++-
 hw/i386/xen/xen-hvm.c   |  4 ++--
 hw/i386/xen/xen_platform.c  |  7 ---
 hw/xen/xen-hvm-common.c |  8 
 include/hw/xen/xen-hvm-common.h |  2 +-
 util/thread-pool.c  |  2 +-
 8 files changed, 31 insertions(+), 19 deletions(-)



[PATCH v3 02/17] tests: Rename test-x86-cpuid.c to test-x86-topo.c

2023-08-01 Thread Zhao Liu
From: Zhao Liu 

In fact, this unit test covers APIC ID generation rather than CPUID.
Rename it to test-x86-topo.c to bring its name in line with its
actual content.

Signed-off-by: Zhao Liu 
Tested-by: Yongwei Ma 
Reviewed-by: Philippe Mathieu-Daudé 
---
Changes since v1:
 * Rename test-x86-apicid.c to test-x86-topo.c. (Yanan)
---
 MAINTAINERS  | 2 +-
 tests/unit/meson.build   | 4 ++--
 tests/unit/{test-x86-cpuid.c => test-x86-topo.c} | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)
 rename tests/unit/{test-x86-cpuid.c => test-x86-topo.c} (99%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 12e59b6b27de..51ba3d593e90 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1719,7 +1719,7 @@ F: include/hw/southbridge/ich9.h
 F: include/hw/southbridge/piix.h
 F: hw/isa/apm.c
 F: include/hw/isa/apm.h
-F: tests/unit/test-x86-cpuid.c
+F: tests/unit/test-x86-topo.c
 F: tests/qtest/test-x86-cpuid-compat.c
 
 PC Chipset
diff --git a/tests/unit/meson.build b/tests/unit/meson.build
index 93977cc32d2b..39b5d0007c69 100644
--- a/tests/unit/meson.build
+++ b/tests/unit/meson.build
@@ -21,8 +21,8 @@ tests = {
   'test-opts-visitor': [testqapi],
   'test-visitor-serialization': [testqapi],
   'test-bitmap': [],
-  # all code tested by test-x86-cpuid is inside topology.h
-  'test-x86-cpuid': [],
+  # all code tested by test-x86-topo is inside topology.h
+  'test-x86-topo': [],
   'test-cutils': [],
   'test-div128': [],
   'test-shift128': [],
diff --git a/tests/unit/test-x86-cpuid.c b/tests/unit/test-x86-topo.c
similarity index 99%
rename from tests/unit/test-x86-cpuid.c
rename to tests/unit/test-x86-topo.c
index bfabc0403a1a..2b104f86d7c2 100644
--- a/tests/unit/test-x86-cpuid.c
+++ b/tests/unit/test-x86-topo.c
@@ -1,5 +1,5 @@
 /*
- *  Test code for x86 CPUID and Topology functions
+ *  Test code for x86 APIC ID and Topology functions
  *
  *  Copyright (c) 2012 Red Hat Inc.
  *
-- 
2.34.1




[PATCH v3 03/17] softmmu: Fix CPUSTATE.nr_cores' calculation

2023-08-01 Thread Zhao Liu
From: Zhuocheng Ding 

From CPUState.nr_cores' comment, it represents the "number of cores within
this CPU package".

After 003f230e37d7 ("machine: Tweak the order of topology members in
struct CpuTopology"), the meaning of smp.cores changed to "the number of
cores in one die", but that commit missed updating CPUState.nr_cores'
calculation, so CPUState.nr_cores became wrong: it now fails to take
the numbers of clusters and dies into account.

At present, only i386 is using CPUState.nr_cores.

But as for i386, which supports die level, the uses of CPUState.nr_cores
are very confusing:

Early uses are based on the meaning of "cores per package" (before die
is introduced into i386), and later uses are based on "cores per die"
(after die's introduction).

This difference arose because commit a94e1428991f ("target/i386: Add
CPUID.1F generation support for multi-dies PCMachine") mistakenly
assumed that CPUState.nr_cores means "cores per die" when calculating
CPUID.1FH.01H:EBX. After that, the changes in i386 all followed this
wrong understanding.

Under the influence of 003f230e37d7 and a94e1428991f, for i386 the
result of CPUState.nr_cores is currently "cores per die", thus the
original uses of CPUState.nr_cores based on the meaning of "cores per
package" are wrong when multiple dies exist:
1. In cpu_x86_cpuid() of target/i386/cpu.c, CPUID.01H:EBX[bits 23:16] is
   incorrect because it expects "cpus per package" but now the
   result is "cpus per die".
2. In cpu_x86_cpuid() of target/i386/cpu.c, for all leaves of CPUID.04H:
   EAX[bits 31:26] is incorrect because they expect "cpus per package"
   but now the result is "cpus per die". The error not only impacts the
   EAX calculation in the cache_info_passthrough case, but also impacts other
   cases of setting cache topology for Intel CPU according to cpu
   topology (specifically, the incoming parameter "num_cores" expects
   "cores per package" in encode_cache_cpuid4()).
3. In cpu_x86_cpuid() of target/i386/cpu.c, CPUID.0BH.01H:EBX[bits
   15:00] is incorrect because the EBX of 0BH.01H (core level) expects
   "cpus per package", which may be different with 1FH.01H (The reason
   is 1FH can support more levels. For QEMU, 1FH also supports die,
   1FH.01H:EBX[bits 15:00] expects "cpus per die").
4. In cpu_x86_cpuid() of target/i386/cpu.c, when CPUID.8001H is
   calculated, here "cpus per package" is expected to be checked, but in
   fact, now it checks "cpus per die". Though "cpus per die" also works
   for this code logic, this isn't consistent with AMD's APM.
5. In cpu_x86_cpuid() of target/i386/cpu.c, CPUID.8008H:ECX expects
   "cpus per package" but it obtains "cpus per die".
6. In simulate_rdmsr() of target/i386/hvf/x86_emu.c, in
   kvm_rdmsr_core_thread_count() of target/i386/kvm/kvm.c, and in
   helper_rdmsr() of target/i386/tcg/sysemu/misc_helper.c,
   MSR_CORE_THREAD_COUNT expects "cpus per package" and "cores per
   package", but in these functions, it obtains "cpus per die" and
   "cores per die".

On the other hand, these uses are correct now (they are added in/after
a94e1428991f):
1. In cpu_x86_cpuid() of target/i386/cpu.c, topo_info.cores_per_die
   meets the actual meaning of CPUState.nr_cores ("cores per die").
2. In cpu_x86_cpuid() of target/i386/cpu.c, vcpus_per_socket (in CPUID.
   04H's caculation) considers number of dies, so it's correct.
3. In cpu_x86_cpuid() of target/i386/cpu.c, CPUID.1FH.01H:EBX[bits
   15:00] needs "cpus per die" and it gets the correct result, and
   CPUID.1FH.02H:EBX[bits 15:00] gets correct "cpus per package".

When CPUState.nr_cores is correctly changed to "cores per package"
again, the above errors will be fixed without extra work, but the "currently"
correct cases will go wrong and need special handling to pass correct
"cpus/cores per die" they want.

Thus in this patch, we fix CPUState.nr_cores' calculation to fit the
original meaning "cores per package", as well as changing the calculation of
topo_info.cores_per_die, vcpus_per_socket and CPUID.1FH.

In addition, in the nr_threads' comment, specify it represents the
number of threads in the "core" to avoid confusion, and also add comment
for nr_dies in CPUX86State.
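The difference between the two meanings can be sketched with plain arithmetic (illustrative helper names, not QEMU's code): with dies/clusters/cores taken from -smp, "cores per package" must account for both the die and cluster levels, while the buggy value only covered one die.

```c
#include <assert.h>

/* "Cores per die" spans the cluster level. */
static unsigned cores_per_die(unsigned clusters, unsigned cores)
{
    return clusters * cores;
}

/* "Cores per package" additionally spans the die level. */
static unsigned cores_per_package(unsigned dies, unsigned clusters,
                                  unsigned cores)
{
    return dies * cores_per_die(clusters, cores);
}
```

For example, with 2 dies of 4 cores each, the correct nr_cores is 8, not the per-die value of 4.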

Fixes: a94e1428991f ("target/i386: Add CPUID.1F generation support for 
multi-dies PCMachine")
Fixes: 003f230e37d7 ("machine: Tweak the order of topology members in struct 
CpuTopology")
Signed-off-by: Zhuocheng Ding 
Co-developed-by: Zhao Liu 
Signed-off-by: Zhao Liu 
---
Changes since v2:
 * Use wrapped helper to get cores per socket in qemu_init_vcpu().
Changes since v1:
 * Add comment for nr_dies in CPUX86State. (Yanan)
---
 include/hw/core/cpu.h | 2 +-
 softmmu/cpus.c| 2 +-
 target/i386/cpu.c | 9 -
 target/i386/cpu.h | 1 +
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index fdcbe8735258..57f4d50ace72 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -277,7 +277,7 @@ struct qemu_work_item;
  *   See TranslationBl

[PATCH v3 00/17] Support smp.clusters for x86

2023-08-01 Thread Zhao Liu
From: Zhao Liu 

Hi list,

This is our v3 patch series, rebased on the master branch at
commit 234320cd0573 ("Merge tag 'pull-target-arm-20230731' of https:
//git.linaro.org/people/pmaydell/qemu-arm into staging").

Compared with v2 [1], v3 mainly adds "Tested-by", "Reviewed-by" and
"Acked-by" (for PC related patches) tags and minor code changes (please
see the changelog).


# Introduction

This series adds cluster support for the x86 PC machine, which allows
x86 to use smp.clusters to configure the x86 module-level CPU topology.

And because of a compatibility issue (see section: ## Why not share L2
cache in cluster directly), this series also introduces a new command
to adjust the x86 L2 cache topology.

Welcome your comments!


# Background

The "clusters" parameter in "smp" is introduced by ARM [2], but x86
hasn't supported it.

At present, x86 defaults L2 cache is shared in one core, but this is
not enough. There're some platforms that multiple cores share the
same L2 cache, e.g., Alder Lake-P shares L2 cache for one module of
Atom cores [3], that is, every four Atom cores shares one L2 cache.
Therefore, we need the new CPU topology level (cluster/module).

Another reason is for hybrid architecture. cluster support not only
provides another level of topology definition in x86, but would aslo
provide required code change for future our hybrid topology support.


# Overview

## Introduction of module level for x86

"cluster" in smp is the CPU topology level which is between "core" and
die.

For x86, the "cluster" in smp is corresponding to the module level [4],
which is above the core level. So use the "module" other than "cluster"
in x86 code.

And please note that x86 already has a cpu topology level also named
"cluster" [4], this level is at the upper level of the package. Here,
the cluster in x86 cpu topology is completely different from the
"clusters" as the smp parameter. After the module level is introduced,
the cluster as the smp parameter will actually refer to the module level
of x86.


## Why not share L2 cache in cluster directly

Though "clusters" was introduced to help define L2 cache topology
[2], using cluster to define x86's L2 cache topology will cause the
compatibility problem:

Currently, x86 defaults to the L2 cache being shared in one core, which
actually implies a default setting "cores per L2 cache is 1" and
therefore implicitly defaults to having as many L2 caches as cores.

For example (i386 PC machine):
-smp 16,sockets=2,dies=2,cores=2,threads=2,maxcpus=16 (*)

Considering the topology of the L2 cache, this (*) implicitly means "1
core per L2 cache" and "2 L2 caches per die".

If we use cluster to configure L2 cache topology with the new default
setting "clusters per L2 cache is 1", the above semantics will change
to "2 cores per cluster" and "1 cluster per L2 cache", that is, "2
cores per L2 cache".

So the same command (*) will cause changes in the L2 cache topology,
further affecting the performance of the virtual machine.

Therefore, x86 should only treat cluster as a cpu topology level and
avoid using it to change L2 cache by default for compatibility.
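The compatibility concern above can be checked with trivial arithmetic (illustrative only): for the example -smp line, 2 cores per die with the default "1 core per L2 cache" gives 2 L2 caches per die, while a default of "1 cluster (here 2 cores) per L2 cache" would silently give only 1.

```c
#include <assert.h>

/*
 * How many L2 caches one die ends up with, given how many cores share
 * each L2 cache (illustrative helper, not QEMU code).
 */
static unsigned l2_caches_per_die(unsigned cores_per_die,
                                  unsigned cores_per_l2)
{
    return cores_per_die / cores_per_l2;
}
```

So the same -smp command would yield a different guest-visible cache layout, which is exactly the regression the series avoids.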


## module level in CPUID

Currently, we don't expose the module level in CPUID.1FH because Linux
(v6.2-rc6) doesn't support the module level. And exposing the module and
die levels at the same time in CPUID.1FH will cause Linux to calculate a
wrong die_id. The module level should not be exposed until real machines
have the module level in CPUID.1FH.

We can configure CPUID.04H.02H (L2 cache topology) with module level by
a new command:

"-cpu,x-l2-cache-topo=cluster"

More information about this command, please see the section: "## New
property: x-l2-cache-topo".


## New cache topology info in CPUCacheInfo

Currently, by default, the cache topology is encoded as:
1. i/d cache is shared in one core.
2. L2 cache is shared in one core.
3. L3 cache is shared in one die.

This default general setting has caused a misunderstanding, that is, the
cache topology is completely equated with a specific cpu topology, such
as the connection between L2 cache and core level, and the connection
between L3 cache and die level.

In fact, the settings of these topologies depend on the specific
platform and are not static. For example, on Alder Lake-P, every
four Atom cores share the same L2 cache [3].

Thus, in this patch set, we explicitly define the corresponding cache
topology for different cpu models and this has two benefits:
1. Easy to extend to new CPU models in the future, which may have
   different cache topologies.
2. It can easily support a custom cache topology via a command (e.g.,
   x-l2-cache-topo).


## New property: x-l2-cache-topo

The property x-l2-cache-topo will be used to change the L2 cache topology
in CPUID.04H.

For now it allows the user to set whether the L2 cache is shared at the
core level or the cluster level.

If the user passes "-cpu x-l2-cache-topo=[core|cluster]" then the default
L2 cache topology will be overridden by the new topology setting.

Since CP

[PATCH v3 09/17] i386: Support module_id in X86CPUTopoIDs

2023-08-01 Thread Zhao Liu
From: Zhuocheng Ding 

Add module_id member in X86CPUTopoIDs.

module_id can be parsed from the APIC ID, so also update the APIC ID
parsing rule to support the module level. With this support, the
module-level conversions between X86CPUTopoIDs, X86CPUTopoInfo and the
APIC ID are complete.

module_id can also be generated from the cpu topology, and before i386
supports "clusters" in smp, the default "clusters per die" is only 1,
so the module_id generated in this way is 0 and will not conflict with
the module_id parsed from the APIC ID.
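The composition done by x86_apicid_from_topo_ids() can be sketched standalone; bit widths are passed in explicitly here for illustration, whereas QEMU derives them from X86CPUTopoInfo via the apicid_*_width()/apicid_*_offset() helpers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack topology IDs into an APIC ID, with the module field sitting
 * between core and die (illustrative sketch, not QEMU's exact code).
 */
static uint32_t apicid_from_topo(unsigned pkg_id, unsigned die_id,
                                 unsigned module_id, unsigned core_id,
                                 unsigned smt_id,
                                 unsigned die_w, unsigned module_w,
                                 unsigned core_w, unsigned smt_w)
{
    unsigned core_off = smt_w;
    unsigned module_off = core_off + core_w;
    unsigned die_off = module_off + module_w;
    unsigned pkg_off = die_off + die_w;

    return (pkg_id << pkg_off) | (die_id << die_off) |
           (module_id << module_off) | (core_id << core_off) | smt_id;
}
```

Parsing is the inverse: mask and shift each field back out at the same offsets, which is what the updated x86_topo_ids_from_apicid() does.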

Signed-off-by: Zhuocheng Ding 
Co-developed-by: Zhao Liu 
Signed-off-by: Zhao Liu 
Acked-by: Michael S. Tsirkin 
---
Changes since v1:
 * Merge the patch "i386: Update APIC ID parsing rule to support module
   level" into this one. (Yanan)
 * Move the apicid_module_width() and apicid_module_offset() support
   into the previous modules_per_die related patch. (Yanan)
---
 hw/i386/x86.c  | 28 +---
 include/hw/i386/topology.h | 17 +
 2 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index a552ae8bb4a8..0b460fd6074d 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -314,11 +314,11 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 
 /*
  * If APIC ID is not set,
- * set it based on socket/die/core/thread properties.
+ * set it based on socket/die/cluster/core/thread properties.
  */
 if (cpu->apic_id == UNASSIGNED_APIC_ID) {
-int max_socket = (ms->smp.max_cpus - 1) /
-smp_threads / smp_cores / ms->smp.dies;
+int max_socket = (ms->smp.max_cpus - 1) / smp_threads / smp_cores /
+ms->smp.clusters / ms->smp.dies;
 
 /*
  * die-id was optional in QEMU 4.0 and older, so keep it optional
@@ -365,6 +365,14 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 topo_ids.die_id = cpu->die_id;
 topo_ids.core_id = cpu->core_id;
 topo_ids.smt_id = cpu->thread_id;
+
+/*
+ * TODO: This is the temporary initialization for topo_ids.module_id to
+ * avoid "maybe-uninitialized" compilation errors. Will remove when
+ * X86CPU supports cluster_id.
+ */
+topo_ids.module_id = 0;
+
 cpu->apic_id = x86_apicid_from_topo_ids(&topo_info, &topo_ids);
 }
 
@@ -373,11 +381,13 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 MachineState *ms = MACHINE(x86ms);
 
 x86_topo_ids_from_apicid(cpu->apic_id, &topo_info, &topo_ids);
+
 error_setg(errp,
-"Invalid CPU [socket: %u, die: %u, core: %u, thread: %u] with"
-" APIC ID %" PRIu32 ", valid index range 0:%d",
-topo_ids.pkg_id, topo_ids.die_id, topo_ids.core_id, 
topo_ids.smt_id,
-cpu->apic_id, ms->possible_cpus->len - 1);
+"Invalid CPU [socket: %u, die: %u, module: %u, core: %u, thread: 
%u]"
+" with APIC ID %" PRIu32 ", valid index range 0:%d",
+topo_ids.pkg_id, topo_ids.die_id, topo_ids.module_id,
+topo_ids.core_id, topo_ids.smt_id, cpu->apic_id,
+ms->possible_cpus->len - 1);
 return;
 }
 
@@ -498,6 +508,10 @@ const CPUArchIdList 
*x86_possible_cpu_arch_ids(MachineState *ms)
 ms->possible_cpus->cpus[i].props.has_die_id = true;
 ms->possible_cpus->cpus[i].props.die_id = topo_ids.die_id;
 }
+if (ms->smp.clusters > 1) {
+ms->possible_cpus->cpus[i].props.has_cluster_id = true;
+ms->possible_cpus->cpus[i].props.cluster_id = topo_ids.module_id;
+}
 ms->possible_cpus->cpus[i].props.has_core_id = true;
 ms->possible_cpus->cpus[i].props.core_id = topo_ids.core_id;
 ms->possible_cpus->cpus[i].props.has_thread_id = true;
diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
index c807d3811dd3..3cec97b377f2 100644
--- a/include/hw/i386/topology.h
+++ b/include/hw/i386/topology.h
@@ -50,6 +50,7 @@ typedef uint32_t apic_id_t;
 typedef struct X86CPUTopoIDs {
 unsigned pkg_id;
 unsigned die_id;
+unsigned module_id;
 unsigned core_id;
 unsigned smt_id;
 } X86CPUTopoIDs;
@@ -127,6 +128,7 @@ static inline apic_id_t x86_apicid_from_topo_ids(X86CPUTopoInfo *topo_info,
 {
 return (topo_ids->pkg_id  << apicid_pkg_offset(topo_info)) |
(topo_ids->die_id  << apicid_die_offset(topo_info)) |
+   (topo_ids->module_id << apicid_module_offset(topo_info)) |
(topo_ids->core_id << apicid_core_offset(topo_info)) |
topo_ids->smt_id;
 }
@@ -140,12 +142,16 @@ static inline void x86_topo_ids_from_idx(X86CPUTopoInfo *topo_info,
  X86CPUTopoIDs *topo_ids)
 {
 unsigned nr_dies = topo_info->dies_per_pkg;
-unsigned nr_cores = topo_info->cores_per_module *
-topo_info->modules_per_die;
+unsigned nr_modules = topo_inf

[PATCH v3 01/17] i386: Fix comment style in topology.h

2023-08-01 Thread Zhao Liu
From: Zhao Liu 

For function comments in this file, keep the comment style consistent
with other places.

Signed-off-by: Zhao Liu 
Reviewed-by: Philippe Mathieu-Daudé 
Acked-by: Michael S. Tsirkin 
---
 include/hw/i386/topology.h | 33 +
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
index 81573f6cfde0..5a19679f618b 100644
--- a/include/hw/i386/topology.h
+++ b/include/hw/i386/topology.h
@@ -24,7 +24,8 @@
 #ifndef HW_I386_TOPOLOGY_H
 #define HW_I386_TOPOLOGY_H
 
-/* This file implements the APIC-ID-based CPU topology enumeration logic,
+/*
+ * This file implements the APIC-ID-based CPU topology enumeration logic,
  * documented at the following document:
  *   Intel® 64 Architecture Processor Topology Enumeration
 *   http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/
@@ -41,7 +42,8 @@
 
 #include "qemu/bitops.h"
 
-/* APIC IDs can be 32-bit, but beware: APIC IDs > 255 require x2APIC support
+/*
+ * APIC IDs can be 32-bit, but beware: APIC IDs > 255 require x2APIC support
  */
 typedef uint32_t apic_id_t;
 
@@ -58,8 +60,7 @@ typedef struct X86CPUTopoInfo {
 unsigned threads_per_core;
 } X86CPUTopoInfo;
 
-/* Return the bit width needed for 'count' IDs
- */
+/* Return the bit width needed for 'count' IDs */
 static unsigned apicid_bitwidth_for_count(unsigned count)
 {
 g_assert(count >= 1);
@@ -67,15 +68,13 @@ static unsigned apicid_bitwidth_for_count(unsigned count)
 return count ? 32 - clz32(count) : 0;
 }
 
-/* Bit width of the SMT_ID (thread ID) field on the APIC ID
- */
+/* Bit width of the SMT_ID (thread ID) field on the APIC ID */
 static inline unsigned apicid_smt_width(X86CPUTopoInfo *topo_info)
 {
 return apicid_bitwidth_for_count(topo_info->threads_per_core);
 }
 
-/* Bit width of the Core_ID field
- */
+/* Bit width of the Core_ID field */
 static inline unsigned apicid_core_width(X86CPUTopoInfo *topo_info)
 {
 return apicid_bitwidth_for_count(topo_info->cores_per_die);
@@ -87,8 +86,7 @@ static inline unsigned apicid_die_width(X86CPUTopoInfo *topo_info)
 return apicid_bitwidth_for_count(topo_info->dies_per_pkg);
 }
 
-/* Bit offset of the Core_ID field
- */
+/* Bit offset of the Core_ID field */
 static inline unsigned apicid_core_offset(X86CPUTopoInfo *topo_info)
 {
 return apicid_smt_width(topo_info);
@@ -100,14 +98,14 @@ static inline unsigned apicid_die_offset(X86CPUTopoInfo *topo_info)
 return apicid_core_offset(topo_info) + apicid_core_width(topo_info);
 }
 
-/* Bit offset of the Pkg_ID (socket ID) field
- */
+/* Bit offset of the Pkg_ID (socket ID) field */
 static inline unsigned apicid_pkg_offset(X86CPUTopoInfo *topo_info)
 {
 return apicid_die_offset(topo_info) + apicid_die_width(topo_info);
 }
 
-/* Make APIC ID for the CPU based on Pkg_ID, Core_ID, SMT_ID
+/*
+ * Make APIC ID for the CPU based on Pkg_ID, Core_ID, SMT_ID
  *
  * The caller must make sure core_id < nr_cores and smt_id < nr_threads.
  */
@@ -120,7 +118,8 @@ static inline apic_id_t x86_apicid_from_topo_ids(X86CPUTopoInfo *topo_info,
topo_ids->smt_id;
 }
 
-/* Calculate thread/core/package IDs for a specific topology,
+/*
+ * Calculate thread/core/package IDs for a specific topology,
  * based on (contiguous) CPU index
  */
 static inline void x86_topo_ids_from_idx(X86CPUTopoInfo *topo_info,
@@ -137,7 +136,8 @@ static inline void x86_topo_ids_from_idx(X86CPUTopoInfo *topo_info,
 topo_ids->smt_id = cpu_index % nr_threads;
 }
 
-/* Calculate thread/core/package IDs for a specific topology,
+/*
+ * Calculate thread/core/package IDs for a specific topology,
  * based on APIC ID
  */
 static inline void x86_topo_ids_from_apicid(apic_id_t apicid,
@@ -155,7 +155,8 @@ static inline void x86_topo_ids_from_apicid(apic_id_t apicid,
 topo_ids->pkg_id = apicid >> apicid_pkg_offset(topo_info);
 }
 
-/* Make APIC ID for the CPU 'cpu_index'
+/*
+ * Make APIC ID for the CPU 'cpu_index'
  *
  * 'cpu_index' is a sequential, contiguous ID for the CPU.
  */
-- 
2.34.1




[PATCH v3 07/17] i386: Introduce module-level cpu topology to CPUX86State

2023-08-01 Thread Zhao Liu
From: Zhuocheng Ding 

The smp command line has a "clusters" parameter, but x86 does not yet
support that level. "cluster" is a CPU topology level above cores, in
which the cores may share some resources (L2 cache or some others like
L3 cache tags, depending on the arch) [1][2]. For x86, the resource
shared by cores at the cluster level is mainly the L2 cache.

However, using cluster to define x86's L2 cache topology would cause a
compatibility problem:

Currently, x86 defaults to the L2 cache being shared within one core,
which implies a default setting of "1 core per L2 cache" and therefore
as many L2 caches as cores.

For example (i386 PC machine):
-smp 16,sockets=2,dies=2,cores=2,threads=2,maxcpus=16 (*)

Considering the topology of the L2 cache, this (*) implicitly means "1
core per L2 cache" and "2 L2 caches per die".

If we use cluster to configure L2 cache topology with the new default
setting "clusters per L2 cache is 1", the above semantics will change
to "2 cores per cluster" and "1 cluster per L2 cache", that is, "2
cores per L2 cache".

So the same command (*) will cause changes in the L2 cache topology,
further affecting the performance of the virtual machine.

Therefore, x86 should only treat cluster as a cpu topology level and
avoid using it to change L2 cache by default for compatibility.

"cluster" in smp is the CPU topology level between "core" and "die".

For x86, the "cluster" in smp corresponds to the module level [2],
which is above the core level. So use "module" rather than "cluster"
in the i386 code.

And please note that x86 already has a CPU topology level also named
"cluster" [3]; that level sits above the package. The cluster in the
x86 CPU topology is completely different from the "clusters" smp
parameter. After the module level is introduced, the cluster smp
parameter will actually refer to the module level of x86.

[1]: 864c3b5c32f0 ("hw/core/machine: Introduce CPU cluster topology support")
[2]: Yanan's comment about "cluster",
 https://lists.gnu.org/archive/html/qemu-devel/2023-02/msg04051.html
[3]: SDM, vol.3, ch.9, 9.9.1 Hierarchical Mapping of Shared Resources.

Signed-off-by: Zhuocheng Ding 
Co-developed-by: Zhao Liu 
Signed-off-by: Zhao Liu 
Acked-by: Michael S. Tsirkin 
---
Changes since v1:
 * The background of the introduction of the "cluster" parameter and its
   exact meaning were revised according to Yanan's explanation. (Yanan)
---
 hw/i386/x86.c | 1 +
 target/i386/cpu.c | 1 +
 target/i386/cpu.h | 5 +
 3 files changed, 7 insertions(+)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index a88a126123be..4efc390905ff 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -309,6 +309,7 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 init_topo_info(&topo_info, x86ms);
 
 env->nr_dies = ms->smp.dies;
+env->nr_modules = ms->smp.clusters;
 
 /*
  * If APIC ID is not set,
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index fc50bf98c60e..8a9fd5682efc 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -7606,6 +7606,7 @@ static void x86_cpu_initfn(Object *obj)
 CPUX86State *env = &cpu->env;
 
 env->nr_dies = 1;
+env->nr_modules = 1;
 cpu_set_cpustate_pointers(cpu);
 
 object_property_add(obj, "feature-words", "X86CPUFeatureWordInfo",
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 7638128d59cc..5e97d0b76574 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1880,6 +1880,11 @@ typedef struct CPUArchState {
 
 /* Number of dies within this CPU package. */
 unsigned nr_dies;
+/*
+ * Number of modules within this CPU package.
+ * Module level in x86 cpu topology is corresponding to smp.clusters.
+ */
+unsigned nr_modules;
 } CPUX86State;
 
 struct kvm_msrs;
-- 
2.34.1




[PATCH v3 10/17] i386/cpu: Introduce cluster-id to X86CPU

2023-08-01 Thread Zhao Liu
From: Zhuocheng Ding 

We introduce cluster-id rather than module-id to be consistent with
CpuInstanceProperties.cluster-id, and this avoids confusion over
parameter names when hotplugging.

Following the legacy smp check rules, also add the cluster_id validity
into x86_cpu_pre_plug().

Signed-off-by: Zhuocheng Ding 
Co-developed-by: Zhao Liu 
Signed-off-by: Zhao Liu 
Acked-by: Michael S. Tsirkin 
---
 hw/i386/x86.c | 33 +
 target/i386/cpu.c |  2 ++
 target/i386/cpu.h |  1 +
 3 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 0b460fd6074d..8154b86f95c7 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -328,6 +328,14 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 cpu->die_id = 0;
 }
 
+/*
+ * cluster-id was optional in QEMU 8.0 and older, so keep it optional
+ * if there's only one cluster per die.
+ */
+if (cpu->cluster_id < 0 && ms->smp.clusters == 1) {
+cpu->cluster_id = 0;
+}
+
 if (cpu->socket_id < 0) {
 error_setg(errp, "CPU socket-id is not set");
 return;
@@ -344,6 +352,14 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
cpu->die_id, ms->smp.dies - 1);
 return;
 }
+if (cpu->cluster_id < 0) {
+error_setg(errp, "CPU cluster-id is not set");
+return;
+} else if (cpu->cluster_id > ms->smp.clusters - 1) {
+error_setg(errp, "Invalid CPU cluster-id: %u must be in range 0:%u",
+   cpu->cluster_id, ms->smp.clusters - 1);
+return;
+}
 if (cpu->core_id < 0) {
 error_setg(errp, "CPU core-id is not set");
 return;
@@ -363,16 +379,9 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 
 topo_ids.pkg_id = cpu->socket_id;
 topo_ids.die_id = cpu->die_id;
+topo_ids.module_id = cpu->cluster_id;
 topo_ids.core_id = cpu->core_id;
 topo_ids.smt_id = cpu->thread_id;
-
-/*
- * TODO: This is the temporary initialization for topo_ids.module_id to
- * avoid "maybe-uninitialized" compilation errors. Will remove when
- * X86CPU supports cluster_id.
- */
-topo_ids.module_id = 0;
-
 cpu->apic_id = x86_apicid_from_topo_ids(&topo_info, &topo_ids);
 }
 
@@ -419,6 +428,14 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 }
 cpu->die_id = topo_ids.die_id;
 
+if (cpu->cluster_id != -1 && cpu->cluster_id != topo_ids.module_id) {
+error_setg(errp, "property cluster-id: %u doesn't match set apic-id:"
+" 0x%x (cluster-id: %u)", cpu->cluster_id, cpu->apic_id,
+topo_ids.module_id);
+return;
+}
+cpu->cluster_id = topo_ids.module_id;
+
 if (cpu->core_id != -1 && cpu->core_id != topo_ids.core_id) {
 error_setg(errp, "property core-id: %u doesn't match set apic-id:"
 " 0x%x (core-id: %u)", cpu->core_id, cpu->apic_id,
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index d6969813ee02..ffa282219078 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -7806,12 +7806,14 @@ static Property x86_cpu_properties[] = {
 DEFINE_PROP_UINT32("apic-id", X86CPU, apic_id, 0),
 DEFINE_PROP_INT32("thread-id", X86CPU, thread_id, 0),
 DEFINE_PROP_INT32("core-id", X86CPU, core_id, 0),
+DEFINE_PROP_INT32("cluster-id", X86CPU, cluster_id, 0),
 DEFINE_PROP_INT32("die-id", X86CPU, die_id, 0),
 DEFINE_PROP_INT32("socket-id", X86CPU, socket_id, 0),
 #else
 DEFINE_PROP_UINT32("apic-id", X86CPU, apic_id, UNASSIGNED_APIC_ID),
 DEFINE_PROP_INT32("thread-id", X86CPU, thread_id, -1),
 DEFINE_PROP_INT32("core-id", X86CPU, core_id, -1),
+DEFINE_PROP_INT32("cluster-id", X86CPU, cluster_id, -1),
 DEFINE_PROP_INT32("die-id", X86CPU, die_id, -1),
 DEFINE_PROP_INT32("socket-id", X86CPU, socket_id, -1),
 #endif
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 5e97d0b76574..d9577938ae04 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2034,6 +2034,7 @@ struct ArchCPU {
 int32_t node_id; /* NUMA node this CPU belongs to */
 int32_t socket_id;
 int32_t die_id;
+int32_t cluster_id;
 int32_t core_id;
 int32_t thread_id;
 
-- 
2.34.1




[PATCH v3 08/17] i386: Support modules_per_die in X86CPUTopoInfo

2023-08-01 Thread Zhao Liu
From: Zhuocheng Ding 

Support module level in i386 cpu topology structure "X86CPUTopoInfo".

Since x86 does not yet support the "clusters" parameter in "-smp",
X86CPUTopoInfo.modules_per_die is currently always 1. Therefore, the
module level width in APIC ID, which can be calculated by
"apicid_bitwidth_for_count(topo_info->modules_per_die)", is always 0
for now, so we can directly add APIC ID related helpers to support
module level parsing.

At present, we don't expose the module level in CPUID.1FH because
current Linux (v6.4-rc1) doesn't support the module level, and exposing
the module and die levels at the same time in CPUID.1FH would cause
Linux to calculate the wrong die_id. The module level should not be
exposed until real hardware reports the module level in CPUID.1FH.

In addition, update topology structure in test-x86-topo.c.

Signed-off-by: Zhuocheng Ding 
Co-developed-by: Zhao Liu 
Signed-off-by: Zhao Liu 
Acked-by: Michael S. Tsirkin 
---
Changes since v1:
 * Include module level related helpers (apicid_module_width() and
   apicid_module_offset()) in this patch. (Yanan)
---
 hw/i386/x86.c  |  3 ++-
 include/hw/i386/topology.h | 22 +++
 target/i386/cpu.c  | 12 ++
 tests/unit/test-x86-topo.c | 45 --
 4 files changed, 52 insertions(+), 30 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 4efc390905ff..a552ae8bb4a8 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -72,7 +72,8 @@ static void init_topo_info(X86CPUTopoInfo *topo_info,
 MachineState *ms = MACHINE(x86ms);
 
 topo_info->dies_per_pkg = ms->smp.dies;
-topo_info->cores_per_die = ms->smp.cores;
+topo_info->modules_per_die = ms->smp.clusters;
+topo_info->cores_per_module = ms->smp.cores;
 topo_info->threads_per_core = ms->smp.threads;
 }
 
diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
index 5a19679f618b..c807d3811dd3 100644
--- a/include/hw/i386/topology.h
+++ b/include/hw/i386/topology.h
@@ -56,7 +56,8 @@ typedef struct X86CPUTopoIDs {
 
 typedef struct X86CPUTopoInfo {
 unsigned dies_per_pkg;
-unsigned cores_per_die;
+unsigned modules_per_die;
+unsigned cores_per_module;
 unsigned threads_per_core;
 } X86CPUTopoInfo;
 
@@ -77,7 +78,13 @@ static inline unsigned apicid_smt_width(X86CPUTopoInfo *topo_info)
 /* Bit width of the Core_ID field */
 static inline unsigned apicid_core_width(X86CPUTopoInfo *topo_info)
 {
-return apicid_bitwidth_for_count(topo_info->cores_per_die);
+return apicid_bitwidth_for_count(topo_info->cores_per_module);
+}
+
+/* Bit width of the Module_ID (cluster ID) field */
+static inline unsigned apicid_module_width(X86CPUTopoInfo *topo_info)
+{
+return apicid_bitwidth_for_count(topo_info->modules_per_die);
 }
 
 /* Bit width of the Die_ID field */
@@ -92,10 +99,16 @@ static inline unsigned apicid_core_offset(X86CPUTopoInfo *topo_info)
 return apicid_smt_width(topo_info);
 }
 
+/* Bit offset of the Module_ID (cluster ID) field */
+static inline unsigned apicid_module_offset(X86CPUTopoInfo *topo_info)
+{
+return apicid_core_offset(topo_info) + apicid_core_width(topo_info);
+}
+
 /* Bit offset of the Die_ID field */
 static inline unsigned apicid_die_offset(X86CPUTopoInfo *topo_info)
 {
-return apicid_core_offset(topo_info) + apicid_core_width(topo_info);
+return apicid_module_offset(topo_info) + apicid_module_width(topo_info);
 }
 
 /* Bit offset of the Pkg_ID (socket ID) field */
@@ -127,7 +140,8 @@ static inline void x86_topo_ids_from_idx(X86CPUTopoInfo *topo_info,
  X86CPUTopoIDs *topo_ids)
 {
 unsigned nr_dies = topo_info->dies_per_pkg;
-unsigned nr_cores = topo_info->cores_per_die;
+unsigned nr_cores = topo_info->cores_per_module *
+topo_info->modules_per_die;
 unsigned nr_threads = topo_info->threads_per_core;
 
 topo_ids->pkg_id = cpu_index / (nr_dies * nr_cores * nr_threads);
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 8a9fd5682efc..d6969813ee02 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -339,7 +339,9 @@ static void encode_cache_cpuid8000001d(CPUCacheInfo *cache,
 
 /* L3 is shared among multiple cores */
 if (cache->level == 3) {
-l3_threads = topo_info->cores_per_die * topo_info->threads_per_core;
+l3_threads = topo_info->modules_per_die *
+ topo_info->cores_per_module *
+ topo_info->threads_per_core;
 *eax |= (l3_threads - 1) << 14;
 } else {
 *eax |= ((topo_info->threads_per_core - 1) << 14);
@@ -6012,10 +6014,12 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 uint32_t cpus_per_pkg;
 
 topo_info.dies_per_pkg = env->nr_dies;
-topo_info.cores_per_die = cs->nr_cores / env->nr_dies;
+topo_info.modules_per_die = env->nr_modules;
+topo_info.cores_per_module = cs->nr_cores / env->nr_dies / env->nr_modules;
 

[PATCH v3 15/17] i386: Fix NumSharingCache for CPUID[0x8000001D].EAX[bits 25:14]

2023-08-01 Thread Zhao Liu
From: Zhao Liu 

The commit 8f4202fb1080 ("i386: Populate AMD Processor Cache Information
for cpuid 0x8000001D") adds the cache topology for AMD CPUs by encoding
the number of sharing threads directly.

From AMD's APM, NumSharingCache (CPUID[0x8000001D].EAX[bits 25:14])
means [1]:

The number of logical processors sharing this cache is the value of
this field incremented by 1. To determine which logical processors are
sharing a cache, determine a Share Id for each processor as follows:

ShareId = LocalApicId >> log2(NumSharingCache+1)

Logical processors with the same ShareId then share a cache. If
NumSharingCache+1 is not a power of two, round it up to the next power
of two.

From the description above, the calculation of this field should be the
same as for CPUID[4].EAX[bits 25:14] on Intel CPUs. So also use the
APIC ID offsets to calculate this field.

Note: I don't have the AMD hardware available, hope folks can help me
to test this, thanks!

[1]: APM, vol.3, appendix.E.4.15 Function 8000_001Dh--Cache Topology
 Information

Cc: Babu Moger 
Signed-off-by: Zhao Liu 
---
Changes since v1:
 * Rename "l3_threads" to "num_apic_ids" in
   encode_cache_cpuid801d(). (Yanan)
 * Add the description of the original commit and add Cc.
---
 target/i386/cpu.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index c9897c0fe91a..f67b6be10b8d 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -361,7 +361,7 @@ static void encode_cache_cpuid8000001d(CPUCacheInfo *cache,
uint32_t *eax, uint32_t *ebx,
uint32_t *ecx, uint32_t *edx)
 {
-uint32_t l3_threads;
+uint32_t num_apic_ids;
 assert(cache->size == cache->line_size * cache->associativity *
   cache->partitions * cache->sets);
 
@@ -370,13 +370,11 @@ static void encode_cache_cpuid8000001d(CPUCacheInfo *cache,
 
 /* L3 is shared among multiple cores */
 if (cache->level == 3) {
-l3_threads = topo_info->modules_per_die *
- topo_info->cores_per_module *
- topo_info->threads_per_core;
-*eax |= (l3_threads - 1) << 14;
+num_apic_ids = 1 << apicid_die_offset(topo_info);
 } else {
-*eax |= ((topo_info->threads_per_core - 1) << 14);
+num_apic_ids = 1 << apicid_core_offset(topo_info);
 }
+*eax |= (num_apic_ids - 1) << 14;
 
 assert(cache->line_size > 0);
 assert(cache->partitions > 0);
-- 
2.34.1




[PATCH v3 12/17] hw/i386/pc: Support smp.clusters for x86 PC machine

2023-08-01 Thread Zhao Liu
From: Zhuocheng Ding 

Now that module-level topology support has been added to X86CPU, we can
enable support for the cluster parameter on PC machines. With this
support, we can define a 5-level x86 CPU topology with "-smp":

-smp cpus=*,maxcpus=*,sockets=*,dies=*,clusters=*,cores=*,threads=*.

Additionally, add a 5-level topology example to the description of "-smp".

Signed-off-by: Zhuocheng Ding 
Signed-off-by: Zhao Liu 
Reviewed-by: Yanan Wang 
Acked-by: Michael S. Tsirkin 
---
 hw/i386/pc.c|  1 +
 qemu-options.hx | 10 +-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 3109d5e0e035..f2ec5720d233 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1885,6 +1885,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
 mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE;
 mc->nvdimm_supported = true;
 mc->smp_props.dies_supported = true;
+mc->smp_props.clusters_supported = true;
 mc->default_ram_id = "pc.ram";
 pcmc->default_smbios_ep_type = SMBIOS_ENTRY_POINT_TYPE_64;
 
diff --git a/qemu-options.hx b/qemu-options.hx
index 29b98c3d4c55..5fb73d996151 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -319,14 +319,14 @@ SRST
 -smp 8,sockets=2,cores=2,threads=2,maxcpus=8
 
 The following sub-option defines a CPU topology hierarchy (2 sockets
-totally on the machine, 2 dies per socket, 2 cores per die, 2 threads
-per core) for PC machines which support sockets/dies/cores/threads.
-Some members of the option can be omitted but their values will be
-automatically computed:
+totally on the machine, 2 dies per socket, 2 clusters per die, 2 cores per
+cluster, 2 threads per core) for PC machines which support sockets/dies
+/clusters/cores/threads. Some members of the option can be omitted but
+their values will be automatically computed:
 
 ::
 
--smp 16,sockets=2,dies=2,cores=2,threads=2,maxcpus=16
+-smp 32,sockets=2,dies=2,clusters=2,cores=2,threads=2,maxcpus=32
 
 The following sub-option defines a CPU topology hierarchy (2 sockets
 totally on the machine, 2 clusters per socket, 2 cores per cluster,
-- 
2.34.1




[PATCH v3 16/17] i386: Use CPUCacheInfo.share_level to encode CPUID[0x8000001D].EAX[bits 25:14]

2023-08-01 Thread Zhao Liu
From: Zhao Liu 

CPUID[0x8000001D].EAX[bits 25:14] is used to represent the cache
topology for AMD CPUs.

After cache models have topology information, we can use
CPUCacheInfo.share_level to decide which topology level is encoded
into CPUID[0x8000001D].EAX[bits 25:14].

Signed-off-by: Zhao Liu 
---
Changes since v1:
 * Use cache->share_level as the parameter in
   max_processor_ids_for_cache().
---
 target/i386/cpu.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index f67b6be10b8d..6eee0274ade4 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -361,20 +361,12 @@ static void encode_cache_cpuid8000001d(CPUCacheInfo *cache,
uint32_t *eax, uint32_t *ebx,
uint32_t *ecx, uint32_t *edx)
 {
-uint32_t num_apic_ids;
 assert(cache->size == cache->line_size * cache->associativity *
   cache->partitions * cache->sets);
 
 *eax = CACHE_TYPE(cache->type) | CACHE_LEVEL(cache->level) |
(cache->self_init ? CACHE_SELF_INIT_LEVEL : 0);
-
-/* L3 is shared among multiple cores */
-if (cache->level == 3) {
-num_apic_ids = 1 << apicid_die_offset(topo_info);
-} else {
-num_apic_ids = 1 << apicid_core_offset(topo_info);
-}
-*eax |= (num_apic_ids - 1) << 14;
+*eax |= max_processor_ids_for_cache(topo_info, cache->share_level) << 14;
 
 assert(cache->line_size > 0);
 assert(cache->partitions > 0);
-- 
2.34.1




[PATCH v3 14/17] i386: Use CPUCacheInfo.share_level to encode CPUID[4]

2023-08-01 Thread Zhao Liu
From: Zhao Liu 

CPUID[4].EAX[bits 25:14] is used to represent the cache topology for
Intel CPUs.

After cache models have topology information, we can use
CPUCacheInfo.share_level to decide which topology level is encoded
into CPUID[4].EAX[bits 25:14].

And since maximum_processor_id (the original "num_apic_ids") is derived
from the CPU topology levels, which are verified when parsing -smp,
there is no need to check this value with "assert(num_apic_ids > 0)"
again, so remove this assert.

Additionally, wrap the encoding of CPUID[4].EAX[bits 31:26] into a
helper to make the code cleaner.

Signed-off-by: Zhao Liu 
---
Changes since v1:
 * Use "enum CPUTopoLevel share_level" as the parameter in
   max_processor_ids_for_cache().
 * Make cache_into_passthrough case also use
   max_processor_ids_for_cache() and max_core_ids_in_package() to
   encode CPUID[4]. (Yanan)
 * Rename the title of this patch (the original is "i386: Use
   CPUCacheInfo.share_level to encode CPUID[4].EAX[bits 25:14]").
---
 target/i386/cpu.c | 70 +--
 1 file changed, 43 insertions(+), 27 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 55aba4889628..c9897c0fe91a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -234,22 +234,53 @@ static uint8_t cpuid2_cache_descriptor(CPUCacheInfo *cache)
((t) == UNIFIED_CACHE) ? CACHE_TYPE_UNIFIED : \
0 /* Invalid value */)
 
+static uint32_t max_processor_ids_for_cache(X86CPUTopoInfo *topo_info,
+enum CPUTopoLevel share_level)
+{
+uint32_t num_ids = 0;
+
+switch (share_level) {
+case CPU_TOPO_LEVEL_CORE:
+num_ids = 1 << apicid_core_offset(topo_info);
+break;
+case CPU_TOPO_LEVEL_DIE:
+num_ids = 1 << apicid_die_offset(topo_info);
+break;
+case CPU_TOPO_LEVEL_PACKAGE:
+num_ids = 1 << apicid_pkg_offset(topo_info);
+break;
+default:
+/*
+ * Currently there is no use case for SMT and MODULE, so use
+ * assert directly to facilitate debugging.
+ */
+g_assert_not_reached();
+}
+
+return num_ids - 1;
+}
+
+static uint32_t max_core_ids_in_package(X86CPUTopoInfo *topo_info)
+{
+uint32_t num_cores = 1 << (apicid_pkg_offset(topo_info) -
+   apicid_core_offset(topo_info));
+return num_cores - 1;
+}
 
 /* Encode cache info for CPUID[4] */
 static void encode_cache_cpuid4(CPUCacheInfo *cache,
-int num_apic_ids, int num_cores,
+X86CPUTopoInfo *topo_info,
 uint32_t *eax, uint32_t *ebx,
 uint32_t *ecx, uint32_t *edx)
 {
 assert(cache->size == cache->line_size * cache->associativity *
   cache->partitions * cache->sets);
 
-assert(num_apic_ids > 0);
 *eax = CACHE_TYPE(cache->type) |
CACHE_LEVEL(cache->level) |
(cache->self_init ? CACHE_SELF_INIT_LEVEL : 0) |
-   ((num_cores - 1) << 26) |
-   ((num_apic_ids - 1) << 14);
+   (max_core_ids_in_package(topo_info) << 26) |
+   (max_processor_ids_for_cache(topo_info, cache->share_level) << 14);
 
 assert(cache->line_size > 0);
 assert(cache->partitions > 0);
@@ -6116,56 +6147,41 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 int host_vcpus_per_cache = 1 + ((*eax & 0x3FFC000) >> 14);
 
 if (cores_per_pkg > 1) {
-int addressable_cores_offset =
-apicid_pkg_offset(&topo_info) -
-apicid_core_offset(&topo_info);
-
 *eax &= ~0xFC00;
-*eax |= (1 << addressable_cores_offset - 1) << 26;
+*eax |= max_core_ids_in_package(&topo_info) << 26;
 }
 if (host_vcpus_per_cache > cpus_per_pkg) {
-int pkg_offset = apicid_pkg_offset(&topo_info);
-
 *eax &= ~0x3FFC000;
-*eax |= (1 << pkg_offset - 1) << 14;
+*eax |=
+max_processor_ids_for_cache(&topo_info,
+CPU_TOPO_LEVEL_PACKAGE) << 14;
 }
 }
 } else if (cpu->vendor_cpuid_only && IS_AMD_CPU(env)) {
 *eax = *ebx = *ecx = *edx = 0;
 } else {
 *eax = 0;
-int addressable_cores_offset = apicid_pkg_offset(&topo_info) -
-   apicid_core_offset(&topo_info);
-int core_offset, die_offset;
 
 switch (count) {
 case 0: /* L1 dcache info */
-core_offset = apicid_core_offset(&topo_info);
 encode_cache_cpuid4(env->cache_info_cpuid4.l1d_c

[PATCH v3 04/17] i386/cpu: Fix i/d-cache topology to core level for Intel CPU

2023-08-01 Thread Zhao Liu
From: Zhao Liu 

For the i-cache and d-cache, the maximum IDs for CPUs sharing the cache
(CPUID.04H.00H:EAX[bits 25:14] and CPUID.04H.01H:EAX[bits 25:14]) are
both 0, which means the i-cache and d-cache are shared at the SMT
level. This is correct if there is a single thread per core, but is
wrong for the hyper-threading case (one core contains multiple
threads), since the i-cache and d-cache are shared at the core level
rather than the SMT level.

For AMD CPUs, commit 8f4202fb1080 ("i386: Populate AMD Processor Cache
Information for cpuid 0x8000001D") has already introduced the i/d cache
topology as core level by default.

Therefore, in order to be compatible with both multi-threaded and
single-threaded situations, we should set the i-cache and d-cache to be
shared at the core level by default.

This fix changes the default i/d cache topology from per-thread to
per-core. Potentially, this change in L1 cache topology may affect the
performance of the VM if the user does not specifically specify the
topology or bind the vCPU. However, the way to achieve optimal
performance should be to create a reasonable topology and set the
appropriate vCPU affinity without relying on QEMU's default topology
structure.

Fixes: 7e3482f82480 ("i386: Helpers to encode cache information consistently")
Suggested-by: Robert Hoo 
Signed-off-by: Zhao Liu 
---
Changes since v1:
 * Split this fix from the patch named "i386/cpu: Fix number of
   addressable IDs in CPUID.04H".
 * Add the explanation of the impact on performance. (Xiaoyao)
---
 target/i386/cpu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 50613cd04612..b439a05244ee 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6104,12 +6104,12 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 switch (count) {
 case 0: /* L1 dcache info */
 encode_cache_cpuid4(env->cache_info_cpuid4.l1d_cache,
-1, cs->nr_cores,
+cs->nr_threads, cs->nr_cores,
 eax, ebx, ecx, edx);
 break;
 case 1: /* L1 icache info */
 encode_cache_cpuid4(env->cache_info_cpuid4.l1i_cache,
-1, cs->nr_cores,
+cs->nr_threads, cs->nr_cores,
 eax, ebx, ecx, edx);
 break;
 case 2: /* L2 cache info */
-- 
2.34.1




[PATCH v3 13/17] i386: Add cache topology info in CPUCacheInfo

2023-08-01 Thread Zhao Liu
From: Zhao Liu 

Currently, by default, the cache topology is encoded as:
1. i/d cache is shared in one core.
2. L2 cache is shared in one core.
3. L3 cache is shared in one die.

This default general setting has caused the misunderstanding that the
cache topology is completely equated with a specific CPU topology, such
as tying the L2 cache to the core level and the L3 cache to the die
level.

In fact, the settings of these topologies depend on the specific
platform and are not static. For example, on Alder Lake-P, every
four Atom cores share the same L2 cache.

Thus, we should explicitly define the corresponding cache topology for
different cache models to increase scalability.

Except for legacy_l2_cache_cpuid2 (whose default topo level is
CPU_TOPO_LEVEL_UNKNOW), explicitly set the corresponding topology level
for all other cache models. In order to be compatible with the existing
cache topology, set the CPU_TOPO_LEVEL_CORE level for the i/d cache and
the L2 cache, and the CPU_TOPO_LEVEL_DIE level for the L3 cache.

The field for CPUID[4].EAX[bits 25:14] or CPUID[0x801D].EAX[bits
25:14] will be set based on CPUCacheInfo.share_level.

Signed-off-by: Zhao Liu 
---
Changes since v1:
 * Add the prefix "CPU_TOPO_LEVEL_*" for CPU topology level names.
   (Yanan)
 * Rename the "INVALID" level to "CPU_TOPO_LEVEL_UNKNOW". (Yanan)
---
 target/i386/cpu.c | 19 +++
 target/i386/cpu.h | 16 
 2 files changed, 35 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index ffa282219078..55aba4889628 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -436,6 +436,7 @@ static CPUCacheInfo legacy_l1d_cache = {
 .sets = 64,
 .partitions = 1,
 .no_invd_sharing = true,
+.share_level = CPU_TOPO_LEVEL_CORE,
 };
 
 /*FIXME: CPUID leaf 0x8005 is inconsistent with leaves 2 & 4 */
@@ -450,6 +451,7 @@ static CPUCacheInfo legacy_l1d_cache_amd = {
 .partitions = 1,
 .lines_per_tag = 1,
 .no_invd_sharing = true,
+.share_level = CPU_TOPO_LEVEL_CORE,
 };
 
 /* L1 instruction cache: */
@@ -463,6 +465,7 @@ static CPUCacheInfo legacy_l1i_cache = {
 .sets = 64,
 .partitions = 1,
 .no_invd_sharing = true,
+.share_level = CPU_TOPO_LEVEL_CORE,
 };
 
 /*FIXME: CPUID leaf 0x80000005 is inconsistent with leaves 2 & 4 */
@@ -477,6 +480,7 @@ static CPUCacheInfo legacy_l1i_cache_amd = {
 .partitions = 1,
 .lines_per_tag = 1,
 .no_invd_sharing = true,
+.share_level = CPU_TOPO_LEVEL_CORE,
 };
 
 /* Level 2 unified cache: */
@@ -490,6 +494,7 @@ static CPUCacheInfo legacy_l2_cache = {
 .sets = 4096,
 .partitions = 1,
 .no_invd_sharing = true,
+.share_level = CPU_TOPO_LEVEL_CORE,
 };
 
 /*FIXME: CPUID leaf 2 descriptor is inconsistent with CPUID leaf 4 */
@@ -512,6 +517,7 @@ static CPUCacheInfo legacy_l2_cache_amd = {
 .associativity = 16,
 .sets = 512,
 .partitions = 1,
+.share_level = CPU_TOPO_LEVEL_CORE,
 };
 
 /* Level 3 unified cache: */
@@ -527,6 +533,7 @@ static CPUCacheInfo legacy_l3_cache = {
 .self_init = true,
 .inclusive = true,
 .complex_indexing = true,
+.share_level = CPU_TOPO_LEVEL_DIE,
 };
 
 /* TLB definitions: */
@@ -1819,6 +1826,7 @@ static const CPUCaches epyc_cache_info = {
 .lines_per_tag = 1,
 .self_init = 1,
 .no_invd_sharing = true,
+.share_level = CPU_TOPO_LEVEL_CORE,
 },
 .l1i_cache = &(CPUCacheInfo) {
 .type = INSTRUCTION_CACHE,
@@ -1831,6 +1839,7 @@ static const CPUCaches epyc_cache_info = {
 .lines_per_tag = 1,
 .self_init = 1,
 .no_invd_sharing = true,
+.share_level = CPU_TOPO_LEVEL_CORE,
 },
 .l2_cache = &(CPUCacheInfo) {
 .type = UNIFIED_CACHE,
@@ -1841,6 +1850,7 @@ static const CPUCaches epyc_cache_info = {
 .partitions = 1,
 .sets = 1024,
 .lines_per_tag = 1,
+.share_level = CPU_TOPO_LEVEL_CORE,
 },
 .l3_cache = &(CPUCacheInfo) {
 .type = UNIFIED_CACHE,
@@ -1854,6 +1864,7 @@ static const CPUCaches epyc_cache_info = {
 .self_init = true,
 .inclusive = true,
 .complex_indexing = true,
+.share_level = CPU_TOPO_LEVEL_DIE,
 },
 };
 
@@ -1919,6 +1930,7 @@ static const CPUCaches epyc_rome_cache_info = {
 .lines_per_tag = 1,
 .self_init = 1,
 .no_invd_sharing = true,
+.share_level = CPU_TOPO_LEVEL_CORE,
 },
 .l1i_cache = &(CPUCacheInfo) {
 .type = INSTRUCTION_CACHE,
@@ -1931,6 +1943,7 @@ static const CPUCaches epyc_rome_cache_info = {
 .lines_per_tag = 1,
 .self_init = 1,
 .no_invd_sharing = true,
+.share_level = CPU_TOPO_LEVEL_CORE,
 },
 .l2_cache = &(CPUCacheInfo) {
 .type = UNIFIED_CACHE,
@@ -1941,6 +1954,7 @@ static const CPUCaches epyc_rome_cache_info = {
 .partitions = 1,
 .sets = 1024,
 .lines_per_tag = 1,

[PATCH v3 17/17] i386: Add new property to control L2 cache topo in CPUID.04H

2023-08-01 Thread Zhao Liu
From: Zhao Liu 

The property x-l2-cache-topo will be used to change the L2 cache
topology in CPUID.04H.

It allows the user to choose whether the L2 cache is shared at the core
level or at the cluster level.

If the user passes "-cpu x-l2-cache-topo=[core|cluster]", the default L2
cache topology is overridden by the new setting.

Here we expose "cluster" to the user instead of "module", to be
consistent with the "cluster-id" naming.

Since CPUID.04H is used by Intel CPUs, this property is currently
available only on Intel CPUs.

When necessary, it can be extended to CPUID.8000001DH for AMD CPUs.
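
A hypothetical command line using the property might look as follows
(the machine, accelerator, CPU model, and -smp values are illustrative,
not taken from the patch):

```shell
# Share the guest L2 cache at cluster (module) level; requires an
# Intel CPU model per the restriction above.
qemu-system-x86_64 \
    -machine q35 -accel kvm \
    -smp 16,sockets=1,dies=1,clusters=2,cores=4,threads=2 \
    -cpu host,x-l2-cache-topo=cluster
```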

Signed-off-by: Zhao Liu 
Tested-by: Yongwei Ma 
---
Changes since v1:
 * Rename MODULE branch to CPU_TOPO_LEVEL_MODULE to match the previous
   renaming changes.
---
 target/i386/cpu.c | 34 +-
 target/i386/cpu.h |  2 ++
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 6eee0274ade4..f4c48e19fa4e 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -243,6 +243,9 @@ static uint32_t max_processor_ids_for_cache(X86CPUTopoInfo 
*topo_info,
 case CPU_TOPO_LEVEL_CORE:
 num_ids = 1 << apicid_core_offset(topo_info);
 break;
+case CPU_TOPO_LEVEL_MODULE:
+num_ids = 1 << apicid_module_offset(topo_info);
+break;
 case CPU_TOPO_LEVEL_DIE:
 num_ids = 1 << apicid_die_offset(topo_info);
 break;
@@ -251,7 +254,7 @@ static uint32_t max_processor_ids_for_cache(X86CPUTopoInfo 
*topo_info,
 break;
 default:
 /*
- * Currently there is no use case for SMT and MODULE, so use
+ * Currently there is no use case for SMT, so use
  * assert directly to facilitate debugging.
  */
 g_assert_not_reached();
@@ -7458,6 +7461,34 @@ static void x86_cpu_realizefn(DeviceState *dev, Error 
**errp)
 env->cache_info_amd.l3_cache = &legacy_l3_cache;
 }
 
+if (cpu->l2_cache_topo_level) {
+/*
+ * FIXME: Currently only supports changing CPUID[4] (for intel), and
+ * will support changing CPUID[0x8000001D] when necessary.
+ */
+if (!IS_INTEL_CPU(env)) {
+error_setg(errp, "only intel cpus supports x-l2-cache-topo");
+return;
+}
+
+if (!strcmp(cpu->l2_cache_topo_level, "core")) {
+env->cache_info_cpuid4.l2_cache->share_level = CPU_TOPO_LEVEL_CORE;
+} else if (!strcmp(cpu->l2_cache_topo_level, "cluster")) {
+/*
+ * We expose to users "cluster" instead of "module", to be
+ * consistent with "cluster-id" naming.
+ */
+env->cache_info_cpuid4.l2_cache->share_level =
+CPU_TOPO_LEVEL_MODULE;
+} else {
+error_setg(errp,
+   "x-l2-cache-topo doesn't support '%s', "
+   "and it only supports 'core' or 'cluster'",
+   cpu->l2_cache_topo_level);
+return;
+}
+}
+
 #ifndef CONFIG_USER_ONLY
 MachineState *ms = MACHINE(qdev_get_machine());
 qemu_register_reset(x86_cpu_machine_reset_cb, cpu);
@@ -7961,6 +7992,7 @@ static Property x86_cpu_properties[] = {
  false),
 DEFINE_PROP_BOOL("x-intel-pt-auto-level", X86CPU, intel_pt_auto_level,
  true),
+DEFINE_PROP_STRING("x-l2-cache-topo", X86CPU, l2_cache_topo_level),
 DEFINE_PROP_END_OF_LIST()
 };
 
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 3f0cdc45607a..24db2a0d9588 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2057,6 +2057,8 @@ struct ArchCPU {
 int32_t hv_max_vps;
 
 bool xen_vapic;
+
+char *l2_cache_topo_level;
 };
 
 
-- 
2.34.1




[PATCH v3 06/17] i386/cpu: Consolidate the use of topo_info in cpu_x86_cpuid()

2023-08-01 Thread Zhao Liu
From: Zhao Liu 

In cpu_x86_cpuid(), there are many variables representing the CPU
topology, e.g., topo_info and cs->nr_cores/cs->nr_threads.

Since the names cs->nr_cores/cs->nr_threads do not accurately represent
their meaning, using them is prone to confusion and mistakes.

The structure X86CPUTopoInfo, on the other hand, names its members
clearly, so the variable "topo_info" should be preferred.

In addition, in cpu_x86_cpuid(), to uniformly use the topology variable,
replace env->dies with topo_info.dies_per_pkg as well.
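
The two new per-package locals can be pictured with a minimal sketch
(the struct and helper names are hypothetical stand-ins for
X86CPUTopoInfo and the patch's local variables):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for X86CPUTopoInfo. */
struct topo_info {
    uint32_t dies_per_pkg;
    uint32_t cores_per_die;
    uint32_t threads_per_core;
};

/* Per-package counts, replacing the cs->nr_cores / cs->nr_threads
 * products scattered through cpu_x86_cpuid(). */
static uint32_t cores_per_pkg(const struct topo_info *t)
{
    return t->cores_per_die * t->dies_per_pkg;
}

static uint32_t cpus_per_pkg(const struct topo_info *t)
{
    return cores_per_pkg(t) * t->threads_per_core;
}
```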

Suggested-by: Robert Hoo 
Signed-off-by: Zhao Liu 
---
Changes since v1:
 * Extract cores_per_socket from the code block and use it as a local
   variable for cpu_x86_cpuid(). (Yanan)
 * Remove vcpus_per_socket variable and use cpus_per_pkg directly.
   (Yanan)
 * Replace env->dies with topo_info.dies_per_pkg in cpu_x86_cpuid().
---
 target/i386/cpu.c | 31 ++-
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index c80613bfcded..fc50bf98c60e 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6008,11 +6008,16 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 uint32_t limit;
 uint32_t signature[3];
 X86CPUTopoInfo topo_info;
+uint32_t cores_per_pkg;
+uint32_t cpus_per_pkg;
 
 topo_info.dies_per_pkg = env->nr_dies;
 topo_info.cores_per_die = cs->nr_cores / env->nr_dies;
 topo_info.threads_per_core = cs->nr_threads;
 
+cores_per_pkg = topo_info.cores_per_die * topo_info.dies_per_pkg;
+cpus_per_pkg = cores_per_pkg * topo_info.threads_per_core;
+
 /* Calculate & apply limits for different index ranges */
 if (index >= 0xC000) {
 limit = env->cpuid_xlevel2;
@@ -6048,8 +6053,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 *ecx |= CPUID_EXT_OSXSAVE;
 }
 *edx = env->features[FEAT_1_EDX];
-if (cs->nr_cores * cs->nr_threads > 1) {
-*ebx |= (cs->nr_cores * cs->nr_threads) << 16;
+if (cpus_per_pkg > 1) {
+*ebx |= cpus_per_pkg << 16;
 *edx |= CPUID_HT;
 }
 if (!cpu->enable_pmu) {
@@ -6086,8 +6091,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
  */
 if (*eax & 31) {
 int host_vcpus_per_cache = 1 + ((*eax & 0x3FFC000) >> 14);
-int vcpus_per_socket = cs->nr_cores * cs->nr_threads;
-if (cs->nr_cores > 1) {
+
+if (cores_per_pkg > 1) {
 int addressable_cores_offset =
 apicid_pkg_offset(&topo_info) -
 apicid_core_offset(&topo_info);
@@ -6095,7 +6100,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 *eax &= ~0xFC000000;
 *eax |= ((1 << addressable_cores_offset) - 1) << 26;
 }
-if (host_vcpus_per_cache > vcpus_per_socket) {
+if (host_vcpus_per_cache > cpus_per_pkg) {
 int pkg_offset = apicid_pkg_offset(&topo_info);
 
 *eax &= ~0x3FFC000;
@@ -6240,12 +6245,12 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 switch (count) {
 case 0:
 *eax = apicid_core_offset(&topo_info);
-*ebx = cs->nr_threads;
+*ebx = topo_info.threads_per_core;
 *ecx |= CPUID_TOPOLOGY_LEVEL_SMT;
 break;
 case 1:
 *eax = apicid_pkg_offset(&topo_info);
-*ebx = cs->nr_cores * cs->nr_threads;
+*ebx = cpus_per_pkg;
 *ecx |= CPUID_TOPOLOGY_LEVEL_CORE;
 break;
 default:
@@ -6266,7 +6271,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 break;
 case 0x1F:
 /* V2 Extended Topology Enumeration Leaf */
-if (env->nr_dies < 2) {
+if (topo_info.dies_per_pkg < 2) {
 *eax = *ebx = *ecx = *edx = 0;
 break;
 }
@@ -6276,7 +6281,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 switch (count) {
 case 0:
 *eax = apicid_core_offset(&topo_info);
-*ebx = cs->nr_threads;
+*ebx = topo_info.threads_per_core;
 *ecx |= CPUID_TOPOLOGY_LEVEL_SMT;
 break;
 case 1:
@@ -6286,7 +6291,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 break;
 case 2:
 *eax = apicid_pkg_offset(&topo_info);
-*ebx = cs->nr_cores * cs->nr_threads;
+*ebx = cpus_per_pkg;
 *ecx |= CPUID_TOPOLOGY_LEVEL_DIE;
 break;
 default:
@@ -6511,7 +6516,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
  * d

[PATCH v3 05/17] i386/cpu: Use APIC ID offset to encode cache topo in CPUID[4]

2023-08-01 Thread Zhao Liu
From: Zhao Liu 

Refer to the fixes of cache_info_passthrough ([1], [2]) and SDM, the
CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26] should use the
nearest power-of-2 integer.

The nearest power-of-2 integer can be calculated by pow2ceil() or by
using the APIC ID offset (like the L3 topology using 1 << die_offset [3]).

But in fact, CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26]
are associated with the APIC ID. For example, in the Linux kernel, the
field "num_threads_sharing" (bits 25 - 14) is parsed with the APIC ID.
As another example, on Alder Lake-P, CPUID.04H:EAX[bits 31:26] does not
match the actual core count and is calculated by:
"(1 << (pkg_offset - core_offset)) - 1".

Therefore, the APIC ID offset should be preferred when calculating the
nearest power-of-2 integer for CPUID.04H:EAX[bits 25:14] and
CPUID.04H:EAX[bits 31:26]:
1. The d/i cache is shared in a core, so 1 << core_offset should be used
   instead of "cs->nr_threads" in encode_cache_cpuid4() for
   CPUID.04H.00H:EAX[bits 25:14] and CPUID.04H.01H:EAX[bits 25:14].
2. The L2 cache is supposed to be shared in a core as for now, so
   1 << core_offset should also be used instead of "cs->nr_threads" in
   encode_cache_cpuid4() for CPUID.04H.02H:EAX[bits 25:14].
3. Similarly, the value for CPUID.04H:EAX[bits 31:26] should also be
   replaced by the offsets above the SMT level in the APIC ID.

In addition, use APIC ID offset to replace "pow2ceil()" for
cache_info_passthrough case.
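
The rounding behavior can be sketched as follows (helper names are
hypothetical; in QEMU the width would come from the apicid_*_offset()
helpers, which also covers APIC-ID layouts with unused ID space):

```c
#include <assert.h>
#include <stdint.h>

/* Bits needed to number `count` entities in the APIC ID; the next
 * topology level starts at this bit offset. */
static unsigned apicid_width(uint32_t count)
{
    unsigned w = 0;
    while ((1u << w) < count) {
        w++;
    }
    return w;
}

/* Value for CPUID.04H:EAX[bits 25:14] (count of addressable IDs minus
 * one), derived from the APIC-ID width instead of pow2ceil(). */
static uint32_t addressable_ids_minus_one(uint32_t count)
{
    return (1u << apicid_width(count)) - 1;
}
```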

[1]: efb3934adf9e ("x86: cpu: make sure number of addressable IDs for processor 
cores meets the spec")
[2]: d7caf13b5fcf ("x86: cpu: fixup number of addressable IDs for logical 
processors sharing cache")
[3]: d65af288a84d ("i386: Update new x86_apicid parsing rules with die_offset 
support")

Fixes: 7e3482f82480 ("i386: Helpers to encode cache information consistently")
Suggested-by: Robert Hoo 
Signed-off-by: Zhao Liu 
---
Changes since v1:
 * Use APIC ID offset to replace "pow2ceil()" for cache_info_passthrough
   case. (Yanan)
 * Split the L1 cache fix into a separate patch.
 * Rename the title of this patch (the original is "i386/cpu: Fix number
   of addressable IDs in CPUID.04H").
---
 target/i386/cpu.c | 30 +++---
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index b439a05244ee..c80613bfcded 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6005,7 +6005,6 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 {
 X86CPU *cpu = env_archcpu(env);
 CPUState *cs = env_cpu(env);
-uint32_t die_offset;
 uint32_t limit;
 uint32_t signature[3];
 X86CPUTopoInfo topo_info;
@@ -6089,39 +6088,56 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 int host_vcpus_per_cache = 1 + ((*eax & 0x3FFC000) >> 14);
 int vcpus_per_socket = cs->nr_cores * cs->nr_threads;
 if (cs->nr_cores > 1) {
+int addressable_cores_offset =
+apicid_pkg_offset(&topo_info) -
+apicid_core_offset(&topo_info);
+
 *eax &= ~0xFC000000;
-*eax |= (pow2ceil(cs->nr_cores) - 1) << 26;
+*eax |= ((1 << addressable_cores_offset) - 1) << 26;
 }
 if (host_vcpus_per_cache > vcpus_per_socket) {
+int pkg_offset = apicid_pkg_offset(&topo_info);
+
 *eax &= ~0x3FFC000;
-*eax |= (pow2ceil(vcpus_per_socket) - 1) << 14;
+*eax |= ((1 << pkg_offset) - 1) << 14;
 }
 }
 } else if (cpu->vendor_cpuid_only && IS_AMD_CPU(env)) {
 *eax = *ebx = *ecx = *edx = 0;
 } else {
 *eax = 0;
+int addressable_cores_offset = apicid_pkg_offset(&topo_info) -
+   apicid_core_offset(&topo_info);
+int core_offset, die_offset;
+
 switch (count) {
 case 0: /* L1 dcache info */
+core_offset = apicid_core_offset(&topo_info);
 encode_cache_cpuid4(env->cache_info_cpuid4.l1d_cache,
-cs->nr_threads, cs->nr_cores,
+(1 << core_offset),
+(1 << addressable_cores_offset),
 eax, ebx, ecx, edx);
 break;
 case 1: /* L1 icache info */
+core_offset = apicid_core_offset(&topo_info);
 encode_cache_cpuid4(env->cache_info_cpuid4.l1i_cache,
-cs->nr_threads, cs->nr_cores,
+(1 << core_offset),
+(1 << addressable_cores_offset),
 eax, ebx, ecx, edx);
 break;
  

[PATCH v3 11/17] tests: Add test case of APIC ID for module level parsing

2023-08-01 Thread Zhao Liu
From: Zhuocheng Ding 

Now that i386 supports the module level, add a test for module-level
APIC ID parsing.
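
The widths the new assertions check follow the usual "bits needed to
number N modules" rule; a hypothetical sketch of that rule (not the
actual apicid_module_width() implementation):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a level's field width: the smallest w such
 * that 2^w >= modules_per_die (0 when only one module exists). */
static unsigned module_width(uint32_t modules_per_die)
{
    unsigned w = 0;
    while ((1u << w) < modules_per_die) {
        w++;
    }
    return w;
}
```

This matches the values asserted in the test diff below: 6, 7, and 8
modules need 3 bits, while 9 modules need 4.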

Signed-off-by: Zhuocheng Ding 
Co-developed-by: Zhao Liu 
Signed-off-by: Zhao Liu 
Tested-by: Yongwei Ma 
Reviewed-by: Yanan Wang 
Acked-by: Michael S. Tsirkin 
---
 tests/unit/test-x86-topo.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/tests/unit/test-x86-topo.c b/tests/unit/test-x86-topo.c
index f21b8a5d95c2..55b731ccae55 100644
--- a/tests/unit/test-x86-topo.c
+++ b/tests/unit/test-x86-topo.c
@@ -37,6 +37,7 @@ static void test_topo_bits(void)
 topo_info = (X86CPUTopoInfo) {1, 1, 1, 1};
 g_assert_cmpuint(apicid_smt_width(&topo_info), ==, 0);
 g_assert_cmpuint(apicid_core_width(&topo_info), ==, 0);
+g_assert_cmpuint(apicid_module_width(&topo_info), ==, 0);
 g_assert_cmpuint(apicid_die_width(&topo_info), ==, 0);
 
 topo_info = (X86CPUTopoInfo) {1, 1, 1, 1};
@@ -74,13 +75,22 @@ static void test_topo_bits(void)
 topo_info = (X86CPUTopoInfo) {1, 1, 33, 2};
 g_assert_cmpuint(apicid_core_width(&topo_info), ==, 6);
 
-topo_info = (X86CPUTopoInfo) {1, 1, 30, 2};
+topo_info = (X86CPUTopoInfo) {1, 6, 30, 2};
+g_assert_cmpuint(apicid_module_width(&topo_info), ==, 3);
+topo_info = (X86CPUTopoInfo) {1, 7, 30, 2};
+g_assert_cmpuint(apicid_module_width(&topo_info), ==, 3);
+topo_info = (X86CPUTopoInfo) {1, 8, 30, 2};
+g_assert_cmpuint(apicid_module_width(&topo_info), ==, 3);
+topo_info = (X86CPUTopoInfo) {1, 9, 30, 2};
+g_assert_cmpuint(apicid_module_width(&topo_info), ==, 4);
+
+topo_info = (X86CPUTopoInfo) {1, 6, 30, 2};
 g_assert_cmpuint(apicid_die_width(&topo_info), ==, 0);
-topo_info = (X86CPUTopoInfo) {2, 1, 30, 2};
+topo_info = (X86CPUTopoInfo) {2, 6, 30, 2};
 g_assert_cmpuint(apicid_die_width(&topo_info), ==, 1);
-topo_info = (X86CPUTopoInfo) {3, 1, 30, 2};
+topo_info = (X86CPUTopoInfo) {3, 6, 30, 2};
 g_assert_cmpuint(apicid_die_width(&topo_info), ==, 2);
-topo_info = (X86CPUTopoInfo) {4, 1, 30, 2};
+topo_info = (X86CPUTopoInfo) {4, 6, 30, 2};
 g_assert_cmpuint(apicid_die_width(&topo_info), ==, 2);
 
 /* build a weird topology and see if IDs are calculated correctly
@@ -91,6 +101,7 @@ static void test_topo_bits(void)
 topo_info = (X86CPUTopoInfo) {1, 1, 6, 3};
 g_assert_cmpuint(apicid_smt_width(&topo_info), ==, 2);
 g_assert_cmpuint(apicid_core_offset(&topo_info), ==, 2);
+g_assert_cmpuint(apicid_module_offset(&topo_info), ==, 5);
 g_assert_cmpuint(apicid_die_offset(&topo_info), ==, 5);
 g_assert_cmpuint(apicid_pkg_offset(&topo_info), ==, 5);
 
-- 
2.34.1




Re: [PATCH 0/5] linux-user: brk/mmap fixes

2023-08-01 Thread Helge Deller

On 8/1/23 06:49, Joel Stanley wrote:

On Mon, 31 Jul 2023 at 18:24, Helge Deller  wrote:

As suggested, I've based my patches on top of yours and the tree can be
pulled from:
git pull https://github.com/hdeller/qemu-hppa/   brk-fixes-akihiko-2

My patches are necessary to fix an arm-static testcase:
 /usr/bin/qemu-arm-static ./fstype

Let's try this patch series...


The armhf static binary works with expected output.


Good!


The arm static binary causes qemu to segfault:


I can't reproduce here.
I tried it in an arm64 chroot, which provides the cross-compiler, and it
worked for me:

(arm64-chroot)root@p100:/# uname -a
Linux p100 6.4.6-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Jul 24 20:51:12 UTC 
2023 aarch64 GNU/Linux
(arm64-chroot)root@p100:/# arm-linux-gnueabi-gcc-13 -o hello hello.c -static
(arm64-chroot)root@p100:/# file hello
hello: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically 
linked, BuildID[sha1]=fa0f7cd6e1779fa8cd76c6e5d3123900ceefa952, for GNU/Linux 
3.2.0, not stripped
(arm64-chroot)root@p100:/# ./hello
Hello, World!

Maybe you can send me your binary (and the needed klibc*so)?
Btw, I tested a whole bunch of platforms too, see below...

Helge


$ gdb -quiet --args ./build/qemu-arm -d guest_errors,page,strace ~/hello
Reading symbols from ./build/qemu-arm...
(gdb) r
Starting program: build/qemu-arm -d guest_errors,page,strace
/home/joel/hello
Using host libthread_db library "/lib/powerpc64le-linux-gnu/libthread_db.so.1".
[New Thread 0x7762ece0 (LWP 118359)]
host mmap_min_addr=0x1
pgb_find_hole: base @ 14042 for 4294967296 bytes
pgb_static: base @ 14042 for 4294967295 bytes
pgb_reserved_va: base @ 0x14042 for 4294967296 bytes
Locating guest address space @ 0x14042
page layout changed following mmap
startend  size prot
0001-0009 0008 ---
0009-0009b000 b000 ---
- 0001 r-x
page layout changed following mmap
startend  size prot
0001-0009 0008 r-x
0009-0009b000 b000 ---
- 0001 r-x
page layout changed following mmap
startend  size prot
0001-0009 0008 r-x
0009-000a 0001 rw-
- 0001 r-x
page layout changed following mmap
startend  size prot
0001-0009 0008 r-x
0009-000a 0001 rw-
e000-e081 0081 rw-
- 0001 r-x
page layout changed following mmap
startend  size prot
0001-0009 0008 r-x
0009-000a 0001 rw-
e000-e001 0001 ---
e001-e0811000 00801000 rw-
- 0001 r-x
guest_base  0x14042
page layout changed following binary load
startend  size prot
0001-0009 0008 r-x
0009-000a 0001 rw-
e000-e001 0001 ---
e001-e081 0080 rw-
e081-e0811000 1000 r-x
- 0001 r-x
start_brk   0x
end_code0x00084f7c
start_code  0x0001
start_data  0x00095098
end_data0x00098394
start_stack 0xe080f410
brk 0x0009b000
entry   0x00010418
argv_start  0xe080f414
env_start   0xe080f41c
auxv_start  0xe080f4a0
118357 brk(NULL) = 0x0009b000
118357 brk(0x0009b8fc) = 0x0009b000

Thread 1 "qemu-arm" received signal SIGSEGV, Segmentation fault.
0x7fffeed9bb74 in code_gen_buffer ()
(gdb)
(gdb) bt
#0  0x7fffeed9bb74 in code_gen_buffer ()
#1  0x000100169e3c in cpu_tb_exec (cpu=cpu@entry=0x1003d4aa0,
 itb=itb@entry=0x7fffeed9ba60 ,
tb_exit=tb_exit@entry=0x7fffe50c)
 at ../accel/tcg/cpu-exec.c:457
#2  0x00010016a564 in cpu_loop_exec_tb (tb_exit=0x7fffe50c,
last_tb=,
 pc=, tb=0x7fffeed9ba60 ,
cpu=)
 at ../accel/tcg/cpu-exec.c:919
#3  cpu_exec_loop (cpu=cpu@entry=0x1003d4aa0, sc=) at
../accel/tcg/cpu-exec.c:1040
#4  0x00010016aa0c in cpu_exec_setjmp (cpu=cpu@entry=0x1003d4aa0,
sc=)
 at ../accel/tcg/cpu-exec.c:1057
#5  0x00010016b0d0 in cpu_exec (cpu=0x1003d4aa0) at
../accel/tcg/cpu-exec.c:1083
#6  0x00010004d780 in cpu_loop (env=0x1003d4fb0) at
../linux-user/arm/cpu_loop.c:323
#7  0x000100047534 in main (argc=,
argv=0x7178, envp=)
 at ../linux-user/main.c:975

I tested 74a22a175c4340a01f6f860f72307093e3307681.


Those I tested successfully (static binary):

alpha-chroot:
Linux p100 6.4.6-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Jul 24 20:51:12 UTC 
2023 alpha GNU/Linux
/hello: ELF 64-bit LSB executable, Alpha (unofficial), version 1 (SYSV), 
statically linked, BuildID[sha1]=5bf21139aa3937121e8843b062619de8e53d035a, for 
GNU/Linux 3.2.0, not stripped
Hello, World!

arm64-chroot:
Linux p100 6.4.6-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Jul 24 20:51:12 UTC 
2023 aarch64 GNU/Linux
/hello: ELF 64-bit LSB executable, ARM aarch64, version 1 (GNU/Linux), 
statically linked, BuildID[sha1]=201827af1ffdef4fc2afa404047c6d1a41e4825e, for 
GNU/Linux 3.7.0, not stripped
Hello, World!

armel-chroot:
Linux p100 6.4.6-200.fc38.x86_64 #1

Re: [PATCH for-8.2 v2 5/6] vfio/migration: Add P2P support for VFIO migration

2023-08-01 Thread Cédric Le Goater

On 7/31/23 12:25, Avihai Horon wrote:

VFIO migration uAPI defines an optional intermediate P2P quiescent
state. While in the P2P quiescent state, P2P DMA transactions cannot be
initiated by the device, but the device can respond to incoming ones.
Additionally, all outstanding P2P transactions are guaranteed to have
been completed by the time the device enters this state.

The purpose of this state is to support migration of multiple devices
that might do P2P transactions between themselves.

Add support for P2P migration by transitioning all the devices to the
P2P quiescent state before stopping or starting the devices. Use the new
VMChangeStateHandler prepare_cb to achieve that behavior.

This will allow migration of multiple VFIO devices if all of them
support P2P migration.

Signed-off-by: Avihai Horon 
---
  docs/devel/vfio-migration.rst | 93 +--
  hw/vfio/common.c  |  6 ++-
  hw/vfio/migration.c   | 47 --
  hw/vfio/trace-events  |  1 +
  4 files changed, 106 insertions(+), 41 deletions(-)

diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
index b433cb5bb2..605fe60e96 100644
--- a/docs/devel/vfio-migration.rst
+++ b/docs/devel/vfio-migration.rst
@@ -23,9 +23,21 @@ and recommends that the initial bytes are sent and loaded in 
the destination
  before stopping the source VM. Enabling this migration capability will
  guarantee that and thus, can potentially reduce downtime even further.
  
-Note that currently VFIO migration is supported only for a single device. This

-is due to VFIO migration's lack of P2P support. However, P2P support is planned
-to be added later on.
+To support migration of multiple devices that might do P2P transactions between
+themselves, VFIO migration uAPI defines an intermediate P2P quiescent state.
+While in the P2P quiescent state, P2P DMA transactions cannot be initiated by
+the device, but the device can respond to incoming ones. Additionally, all
+outstanding P2P transactions are guaranteed to have been completed by the time
+the device enters this state.
+
+All the devices that support P2P migration are first transitioned to the P2P
+quiescent state and only then are they stopped or started. This makes migration
+safe P2P-wise, since starting and stopping the devices is not done atomically
+for all the devices together.
+
+Thus, multiple VFIO devices migration is allowed only if all the devices
+support P2P migration. Single VFIO device migration is allowed regardless of
+P2P migration support.
  
  A detailed description of the UAPI for VFIO device migration can be found in

  the comment for the ``vfio_device_mig_state`` structure in the header file
@@ -132,54 +144,63 @@ will be blocked.
  Flow of state changes during Live migration
  ===
  
-Below is the flow of state change during live migration.

+Below is the state change flow during live migration for a VFIO device that
+supports both precopy and P2P migration. The flow for devices that don't
+support it is similar, except that the relevant states for precopy and P2P are
+skipped.
  The values in the parentheses represent the VM state, the migration state, and
  the VFIO device state, respectively.
-The text in the square brackets represents the flow if the VFIO device supports
-pre-copy.
  
  Live migration save path

  
  
  ::
  
-QEMU normal running state

-(RUNNING, _NONE, _RUNNING)
-  |
+   QEMU normal running state
+   (RUNNING, _NONE, _RUNNING)
+  |
   migrate_init spawns migration_thread
-Migration thread then calls each device's .save_setup()
-  (RUNNING, _SETUP, _RUNNING [_PRE_COPY])
-  |
-  (RUNNING, _ACTIVE, _RUNNING [_PRE_COPY])
-  If device is active, get pending_bytes by 
.state_pending_{estimate,exact}()
-  If total pending_bytes >= threshold_size, call .save_live_iterate()
-  [Data of VFIO device for pre-copy phase is copied]
-Iterate till total pending bytes converge and are less than threshold
-  |
-  On migration completion, vCPU stops and calls .save_live_complete_precopy for
-  each active device. The VFIO device is then transitioned into _STOP_COPY 
state
-  (FINISH_MIGRATE, _DEVICE, _STOP_COPY)
-  |
- For the VFIO device, iterate in .save_live_complete_precopy until
- pending data is 0
-   (FINISH_MIGRATE, _DEVICE, _STOP)
-  |
- (FINISH_MIGRATE, _COMPLETED, _STOP)
- Migraton thread schedules cleanup bottom half and exits
+Migration thread t

Re: [PATCH for-8.2 v2 2/6] sysemu: Add prepare callback to struct VMChangeStateEntry

2023-08-01 Thread Cédric Le Goater

On 7/31/23 12:25, Avihai Horon wrote:

Add prepare callback to struct VMChangeStateEntry.

The prepare callback is optional and can be set by the new function
qemu_add_vm_change_state_handler_prio_full() that allows setting this
callback in addition to the main callback.

The prepare callbacks and main callbacks are called in two separate
phases: First all prepare callbacks are called and only then all main
callbacks are called.

The purpose of the new prepare callback is to allow all devices to run a
preliminary task before calling the devices' main callbacks.

This will facilitate adding P2P support for VFIO migration where all
VFIO devices need to be put in an intermediate P2P quiescent state
before being stopped or started by the main callback.
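
The two-phase ordering described above can be modeled with a minimal
sketch (all names are hypothetical; QEMU's real entries live on a
priority-ordered tail queue, iterated in reverse when the VM stops):

```c
#include <assert.h>
#include <stdbool.h>

typedef void (*handler_fn)(void *opaque, bool running);

struct entry {
    handler_fn cb;         /* main callback */
    handler_fn prepare_cb; /* optional prepare callback */
    void *opaque;
};

/* Record call order: 1 for prepare, 2 for main. */
static int order[8];
static int pos;

static void prep(void *opaque, bool running)
{
    (void)opaque; (void)running;
    order[pos++] = 1;
}

static void main_cb(void *opaque, bool running)
{
    (void)opaque; (void)running;
    order[pos++] = 2;
}

/* All prepare callbacks run before any main callback. */
static void notify(struct entry *entries, int n, bool running)
{
    for (int i = 0; i < n; i++) {
        if (entries[i].prepare_cb) {
            entries[i].prepare_cb(entries[i].opaque, running);
        }
    }
    for (int i = 0; i < n; i++) {
        entries[i].cb(entries[i].opaque, running);
    }
}
```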

Signed-off-by: Avihai Horon 


Reviewed-by: Cédric Le Goater 

Thanks,

C.



---
  include/sysemu/runstate.h |  4 
  softmmu/runstate.c| 40 +++
  2 files changed, 44 insertions(+)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 7beb29c2e2..764a0fc6a4 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -16,6 +16,10 @@ VMChangeStateEntry 
*qemu_add_vm_change_state_handler(VMChangeStateHandler *cb,
   void *opaque);
  VMChangeStateEntry *qemu_add_vm_change_state_handler_prio(
  VMChangeStateHandler *cb, void *opaque, int priority);
+VMChangeStateEntry *
+qemu_add_vm_change_state_handler_prio_full(VMChangeStateHandler *cb,
+   VMChangeStateHandler *prepare_cb,
+   void *opaque, int priority);
  VMChangeStateEntry *qdev_add_vm_change_state_handler(DeviceState *dev,
   VMChangeStateHandler *cb,
   void *opaque);
diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index f3bd862818..1652ed0439 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -271,6 +271,7 @@ void qemu_system_vmstop_request(RunState state)
  }
  struct VMChangeStateEntry {
  VMChangeStateHandler *cb;
+VMChangeStateHandler *prepare_cb;
  void *opaque;
  QTAILQ_ENTRY(VMChangeStateEntry) entries;
  int priority;
@@ -293,12 +294,39 @@ static QTAILQ_HEAD(, VMChangeStateEntry) 
vm_change_state_head =
   */
  VMChangeStateEntry *qemu_add_vm_change_state_handler_prio(
  VMChangeStateHandler *cb, void *opaque, int priority)
+{
+return qemu_add_vm_change_state_handler_prio_full(cb, NULL, opaque,
+  priority);
+}
+
+/**
+ * qemu_add_vm_change_state_handler_prio_full:
+ * @cb: the main callback to invoke
+ * @prepare_cb: a callback to invoke before the main callback
+ * @opaque: user data passed to the callbacks
+ * @priority: low priorities execute first when the vm runs and the reverse is
+ *true when the vm stops
+ *
+ * Register a main callback function and an optional prepare callback function
+ * that are invoked when the vm starts or stops running. The main callback and
+ * the prepare callback are called in two separate phases: First all prepare
+ * callbacks are called and only then all main callbacks are called. As its
+ * name suggests, the prepare callback can be used to do some preparatory work
+ * before invoking the main callback.
+ *
+ * Returns: an entry to be freed using qemu_del_vm_change_state_handler()
+ */
+VMChangeStateEntry *
+qemu_add_vm_change_state_handler_prio_full(VMChangeStateHandler *cb,
+   VMChangeStateHandler *prepare_cb,
+   void *opaque, int priority)
  {
  VMChangeStateEntry *e;
  VMChangeStateEntry *other;
  
  e = g_malloc0(sizeof(*e));

  e->cb = cb;
+e->prepare_cb = prepare_cb;
  e->opaque = opaque;
  e->priority = priority;
  
@@ -333,10 +361,22 @@ void vm_state_notify(bool running, RunState state)

  trace_vm_state_notify(running, state, RunState_str(state));
  
  if (running) {

+QTAILQ_FOREACH_SAFE(e, &vm_change_state_head, entries, next) {
+if (e->prepare_cb) {
+e->prepare_cb(e->opaque, running, state);
+}
+}
+
  QTAILQ_FOREACH_SAFE(e, &vm_change_state_head, entries, next) {
  e->cb(e->opaque, running, state);
  }
  } else {
+QTAILQ_FOREACH_REVERSE_SAFE(e, &vm_change_state_head, entries, next) {
+if (e->prepare_cb) {
+e->prepare_cb(e->opaque, running, state);
+}
+}
+
  QTAILQ_FOREACH_REVERSE_SAFE(e, &vm_change_state_head, entries, next) {
  e->cb(e->opaque, running, state);
  }





Re: [PATCH for-8.2 v2 6/6] vfio/migration: Allow migration of multiple P2P supporting devices

2023-08-01 Thread Cédric Le Goater

On 7/31/23 12:25, Avihai Horon wrote:

Now that P2P support has been added to VFIO migration, allow migration
of multiple devices if all of them support P2P migration.

Single device migration is allowed regardless of P2P migration support.

Signed-off-by: Avihai Horon 
Signed-off-by: Joao Martins 



Reviewed-by: Cédric Le Goater 

Thanks,

C.



---
  hw/vfio/common.c | 26 ++
  1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 7c3d636025..8a8d074e18 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -363,21 +363,31 @@ bool vfio_mig_active(void)
  
  static Error *multiple_devices_migration_blocker;
  
-static unsigned int vfio_migratable_device_num(void)

+/*
+ * Multiple devices migration is allowed only if all devices support P2P
+ * migration. Single device migration is allowed regardless of P2P migration
+ * support.
+ */
+static bool vfio_multiple_devices_migration_is_supported(void)
  {
  VFIOGroup *group;
  VFIODevice *vbasedev;
  unsigned int device_num = 0;
+bool all_support_p2p = true;
  
  QLIST_FOREACH(group, &vfio_group_list, next) {

  QLIST_FOREACH(vbasedev, &group->device_list, next) {
  if (vbasedev->migration) {
  device_num++;
+
+if (!(vbasedev->migration->mig_flags & VFIO_MIGRATION_P2P)) {
+all_support_p2p = false;
+}
  }
  }
  }
  
-return device_num;

+return all_support_p2p || device_num <= 1;
  }
  
  int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)

@@ -385,19 +395,19 @@ int vfio_block_multiple_devices_migration(VFIODevice 
*vbasedev, Error **errp)
  int ret;
  
  if (multiple_devices_migration_blocker ||

-vfio_migratable_device_num() <= 1) {
+vfio_multiple_devices_migration_is_supported()) {
  return 0;
  }
  
  if (vbasedev->enable_migration == ON_OFF_AUTO_ON) {

-error_setg(errp, "Migration is currently not supported with multiple "
- "VFIO devices");
+error_setg(errp, "Multiple VFIO devices migration is supported only if "
+ "all of them support P2P migration");
  return -EINVAL;
  }
  
  error_setg(&multiple_devices_migration_blocker,

-   "Migration is currently not supported with multiple "
-   "VFIO devices");
+   "Multiple VFIO devices migration is supported only if all of "
+   "them support P2P migration");
  ret = migrate_add_blocker(multiple_devices_migration_blocker, errp);
  if (ret < 0) {
  error_free(multiple_devices_migration_blocker);
@@ -410,7 +420,7 @@ int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
  void vfio_unblock_multiple_devices_migration(void)
  {
  if (!multiple_devices_migration_blocker ||
-vfio_migratable_device_num() > 1) {
+!vfio_multiple_devices_migration_is_supported()) {
  return;
  }
  





Re: [PATCH for-8.2 v2 3/6] qdev: Add qdev_add_vm_change_state_handler_full()

2023-08-01 Thread Cédric Le Goater

On 7/31/23 12:25, Avihai Horon wrote:

Add qdev_add_vm_change_state_handler_full() variant that allows setting
a prepare callback in addition to the main callback.

This will facilitate adding P2P support for VFIO migration in the
following patches.

Signed-off-by: Avihai Horon 
Signed-off-by: Joao Martins 



Reviewed-by: Cédric Le Goater 

Thanks,

C.



---
  include/sysemu/runstate.h |  3 +++
  hw/core/vm-change-state-handler.c | 14 +-
  2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 764a0fc6a4..08afb97695 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -23,6 +23,9 @@ qemu_add_vm_change_state_handler_prio_full(VMChangeStateHandler *cb,
  VMChangeStateEntry *qdev_add_vm_change_state_handler(DeviceState *dev,
   VMChangeStateHandler *cb,
   void *opaque);
+VMChangeStateEntry *qdev_add_vm_change_state_handler_full(
+DeviceState *dev, VMChangeStateHandler *cb,
+VMChangeStateHandler *prepare_cb, void *opaque);
  void qemu_del_vm_change_state_handler(VMChangeStateEntry *e);
  /**
   * vm_state_notify: Notify the state of the VM
diff --git a/hw/core/vm-change-state-handler.c b/hw/core/vm-change-state-handler.c
index 1f3630986d..8e2639224e 100644
--- a/hw/core/vm-change-state-handler.c
+++ b/hw/core/vm-change-state-handler.c
@@ -55,8 +55,20 @@ static int qdev_get_dev_tree_depth(DeviceState *dev)
  VMChangeStateEntry *qdev_add_vm_change_state_handler(DeviceState *dev,
   VMChangeStateHandler *cb,
   void *opaque)
+{
+return qdev_add_vm_change_state_handler_full(dev, cb, NULL, opaque);
+}
+
+/*
+ * Exactly like qdev_add_vm_change_state_handler() but passes a prepare_cb
+ * argument too.
+ */
+VMChangeStateEntry *qdev_add_vm_change_state_handler_full(
+DeviceState *dev, VMChangeStateHandler *cb,
+VMChangeStateHandler *prepare_cb, void *opaque)
  {
  int depth = qdev_get_dev_tree_depth(dev);
  
-return qemu_add_vm_change_state_handler_prio(cb, opaque, depth);

+return qemu_add_vm_change_state_handler_prio_full(cb, prepare_cb, opaque,
+  depth);
  }





Re: [PATCH] target/nios2: Pass semihosting arg to exit

2023-08-01 Thread Peter Maydell
On Mon, 31 Jul 2023 at 23:42, Keith Packard via  wrote:
>
> Instead of using the function number (which is always zero), fetch the
> application-provided exit code argument and pass that to the two exit
> functions.
>
> Signed-off-by: Keith Packard 
> ---
>  target/nios2/nios2-semi.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/target/nios2/nios2-semi.c b/target/nios2/nios2-semi.c
> index 3738774976..ffd1f095f6 100644
> --- a/target/nios2/nios2-semi.c
> +++ b/target/nios2/nios2-semi.c
> @@ -133,8 +133,9 @@ void do_nios2_semihosting(CPUNios2State *env)
>  args = env->regs[R_ARG1];
>  switch (nr) {
>  case HOSTED_EXIT:
> -gdb_exit(env->regs[R_ARG0]);
> -exit(env->regs[R_ARG0]);
> +GET_ARG(0);
> +gdb_exit(arg0);
> +exit(arg0);

The spec
https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;a=blob;f=libgloss/nios2/nios2-semi.txt;hb=HEAD

says that for HOSTED_EXIT the exit code is in r5,
not in a parameter block pointed to by r5. That
would imply that the correct change is to use
R_ARG1 rather than R_ARG0 here.

thanks
-- PMM



Re: [PATCH] target/nios2: Fix semihost lseek offset computation

2023-08-01 Thread Peter Maydell
On Tue, 1 Aug 2023 at 00:53, Keith Packard via  wrote:
>
> The arguments for deposit64 are (value, start, length, fieldval); this
> appears to have thought they were (value, fieldval, start,
> length). Reorder the parameters to match the actual function.
>
> Signed-off-by: Keith Packard 
> ---
>  target/nios2/nios2-semi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/nios2/nios2-semi.c b/target/nios2/nios2-semi.c
> index ffd1f095f6..fc1df1dfeb 100644
> --- a/target/nios2/nios2-semi.c
> +++ b/target/nios2/nios2-semi.c
> @@ -170,7 +170,7 @@ void do_nios2_semihosting(CPUNios2State *env)
>  GET_ARG64(2);
>  GET_ARG64(3);
>  semihost_sys_lseek(cs, nios2_semi_u64_cb, arg0,
> -   deposit64(arg2, arg1, 32, 32), arg3);
> +   deposit64(arg2, 32, 32, arg1), arg3);
>  break;
>
>  case HOSTED_RENAME:

Fixes: d1e23cbaa403b2d ("target/nios2: Use semihosting/syscalls.h")
Cc: qemu-sta...@nongnu.org
Reviewed-by: Peter Maydell 

thanks
-- PMM



Re: [PATCH QEMU 1/2] qapi: Reformat and craft the migration doc comments

2023-08-01 Thread Markus Armbruster
Yong Huang  writes:

> On Fri, Jul 28, 2023 at 3:49 PM Markus Armbruster  wrote:
>
>> ~hyman  writes:
>>
>> > From: Hyman Huang(黄勇) 
>> >
>> > Reformat migration doc comments to conform to current conventions
>> > as commit a937b6aa739 (qapi: Reformat doc comments to conform to
>> > current conventions).
>> >
>> > Also, craft the dirty-limit capability comment.
>>
>> Split into two patches?
>>
> Ok.
>
>>
>> > Signed-off-by: Hyman Huang(黄勇) 
>> > ---
>> >  qapi/migration.json | 66 +
>> >  1 file changed, 31 insertions(+), 35 deletions(-)
>> >
>> > diff --git a/qapi/migration.json b/qapi/migration.json
>> > index 6b49593d2f..5d5649c885 100644
>> > --- a/qapi/migration.json
>> > +++ b/qapi/migration.json
>> > @@ -258,17 +258,17 @@
>> >  # blocked.  Present and non-empty when migration is blocked.
>> >  # (since 6.0)
>> >  #
>> > -# @dirty-limit-throttle-time-per-round: Maximum throttle time (in
>> > -#   microseconds) of virtual CPUs each dirty ring full round,
>> > -#   which shows how MigrationCapability dirty-limit affects
>> > -#   the guest during live migration. (since 8.1)
>> > -#
>> > -# @dirty-limit-ring-full-time: Estimated average dirty ring full time
>> > -#  (in microseconds) each dirty ring full round, note that the
>> > -#  value equals dirty ring memory size divided by average dirty
>> > -#  page rate of virtual CPU, which can be used to observe the
>> > -#  average memory load of virtual CPU indirectly. Note that zero
>> > -#  means guest doesn't dirty memory (since 8.1)
>> > +# @dirty-limit-throttle-time-per-round: Maximum throttle time
>> > +# (in microseconds) of virtual CPUs each dirty ring full round,
>> > +# which shows how MigrationCapability dirty-limit affects the
>>
>> Perhaps "for each ... round"?
>>
>> Remind me, what's a "dirty ring full round"?
>>
> Every time the x86 PML buffer is filled with GPAs, the hardware raises
> an exception and the guest exits to KVM, then to QEMU. QEMU handles the
> exception by reaping the dirty ring to get the dirty page info, then
> re-enters KVM, empties the PML buffer and enters the guest again. I
> call this a "dirty ring full round", but it seems not straightforward;
> please help me describe it, thanks.

"x86 PML" is page modification logging, right?

>> > +# guest during live migration.  (Since 8.1)
>> > +#
>> > +# @dirty-limit-ring-full-time: Estimated average dirty ring full
>> > +# time (in microseconds) each dirty ring full round. The value
>>
>> Likewise.
>>
>> > +# equals dirty ring memory size divided by average dirty page
>>
>> "the dirty ring memory size divided by the average ..."
>>
>> > +# rate of the virtual CPU, which can be used to observe the
>> > +# average memory load of the virtual CPU indirectly. Note that
>> > +# zero means guest doesn't dirty memory.  (Since 8.1)
>>
>> Two spaces between sentences for consistency.
>>
>> >  #
>> >  # Since: 0.14
>> >  ##
>> > @@ -519,15 +519,11 @@
>> >  # are present.  'return-path' capability must be enabled to use
>> >  # it.  (since 8.1)
>> >  #
>> > -# @dirty-limit: If enabled, migration will use the dirty-limit algo to
>> > -#   throttle down guest instead of auto-converge algo.
>> > -#   Throttle algo only works when vCPU's dirtyrate greater
>> > -#   than 'vcpu-dirty-limit', read processes in guest os
>> > -#   aren't penalized any more, so this algo can improve
>> > -#   performance of vCPU during live migration. This is an
>> > -#   optional performance feature and should not affect the
>> > -#   correctness of the existing auto-converge algo.
>> > -#   (since 8.1)
>> > +# @dirty-limit: If enabled, migration will throttle vCPUs as needed to
>> > +# keep their dirty page rate within @vcpu-dirty-limit.  This can
>> > +# improve responsiveness of large guests during live migration,
>> > +# and can result in more stable read performance.  Requires KVM
>> > +# with accelerator property "dirty-ring-size" set.  (Since 8.1)
>> >  #
>> >  # Features:
>> >  #
>> > @@ -822,17 +818,17 @@
>> >  # Nodes are mapped to their block device name if there is one, and
>> >  # to their node name otherwise.  (Since 5.2)
>> >  #
>> > -# @x-vcpu-dirty-limit-period: Periodic time (in milliseconds) of
>> > -# dirty limit during live migration. Should be in the range
>> > -# 1 to 1000ms, defaults to 1000ms. (Since 8.1)
>> > +# @x-vcpu-dirty-limit-period: Periodic time (in milliseconds) of dirty
>> > +# limit during  live migration. Should be in the ra

Re: [PATCH QEMU v3 1/3] qapi: Reformat the dirty-limit migration doc comments

2023-08-01 Thread Markus Armbruster
~hyman  writes:

> From: Hyman Huang(黄勇) 
>
> Reformat the dirty-limit migration doc comments to conform
> to current conventions as commit a937b6aa739 (qapi: Reformat
> doc comments to conform to current conventions).
>
> Signed-off-by: Markus Armbruster 

Unexpected S-o-b.  Accident?

> Signed-off-by: Hyman Huang(黄勇) 
> ---
>  qapi/migration.json | 69 ++---
>  1 file changed, 34 insertions(+), 35 deletions(-)
>
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 6b49593d2f..a74ade4d72 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -258,17 +258,17 @@
>  # blocked.  Present and non-empty when migration is blocked.
>  # (since 6.0)
>  #
> -# @dirty-limit-throttle-time-per-round: Maximum throttle time (in 
> microseconds) of virtual
> -#   CPUs each dirty ring full round, 
> which shows how
> -#   MigrationCapability dirty-limit 
> affects the guest
> -#   during live migration. (since 8.1)
> -#
> -# @dirty-limit-ring-full-time: Estimated average dirty ring full time (in 
> microseconds)
> -#  each dirty ring full round, note that the 
> value equals
> -#  dirty ring memory size divided by average 
> dirty page rate
> -#  of virtual CPU, which can be used to observe 
> the average
> -#  memory load of virtual CPU indirectly. Note 
> that zero
> -#  means guest doesn't dirty memory (since 8.1)
> +# @dirty-limit-throttle-time-per-round: Maximum throttle time
> +# (in microseconds) of virtual CPUs each dirty ring full round,
> +# which shows how MigrationCapability dirty-limit affects the
> +# guest during live migration.  (Since 8.1)
> +#
> +# @dirty-limit-ring-full-time: Estimated average dirty ring full
> +# time (in microseconds) for each dirty ring full round. The

Two spaces between sentences for consistency.

> +# value equals the dirty ring memory size divided by the average
> +# dirty page rate of the virtual CPU, which can be used to
> +# observe the average memory load of the virtual CPU indirectly.
> +# Note that zero means guest doesn't dirty memory.  (Since 8.1)
>  #
>  # Since: 0.14
>  ##
> @@ -519,15 +519,14 @@
>  # are present.  'return-path' capability must be enabled to use
>  # it.  (since 8.1)
>  #
> -# @dirty-limit: If enabled, migration will use the dirty-limit algo to
> -#   throttle down guest instead of auto-converge algo.
> -#   Throttle algo only works when vCPU's dirtyrate greater
> -#   than 'vcpu-dirty-limit', read processes in guest os
> -#   aren't penalized any more, so this algo can improve
> -#   performance of vCPU during live migration. This is an
> -#   optional performance feature and should not affect the
> -#   correctness of the existing auto-converge algo.
> -#   (since 8.1)
> +# @dirty-limit: If enabled, migration will use the dirty-limit
> +# algorithm to throttle down guest instead of auto-converge
> +# algorithm. Throttle algorithm only works when vCPU's dirtyrate
> +# greater than 'vcpu-dirty-limit', read processes in guest os
> +# aren't penalized any more, so this algorithm can improve
> +# performance of vCPU during live migration. This is an optional
> +# performance feature and should not affect the correctness of the
> +# existing auto-converge algorithm.  (Since 8.1)
>  #
>  # Features:
>  #
> @@ -822,17 +821,17 @@
>  # Nodes are mapped to their block device name if there is one, and
>  # to their node name otherwise.  (Since 5.2)
>  #
> -# @x-vcpu-dirty-limit-period: Periodic time (in milliseconds) of dirty limit 
> during
> -# live migration. Should be in the range 1 to 
> 1000ms,
> -# defaults to 1000ms. (Since 8.1)
> +# @x-vcpu-dirty-limit-period: Periodic time (in milliseconds) of dirty
> +# limit during live migration. Should be in the range 1 to 1000ms.
> +# Defaults to 1000ms.  (Since 8.1)

Likewise.

>  #
>  # @vcpu-dirty-limit: Dirtyrate limit (MB/s) during live migration.
> -#Defaults to 1. (Since 8.1)
> +# Defaults to 1.  (Since 8.1)
>  #
>  # Features:
>  #
>  # @unstable: Members @x-checkpoint-delay and @x-vcpu-dirty-limit-period
> -#are experimental.
> +# are experimental.
>  #
>  # Since: 2.4
>  ##
> @@ -988,17 +987,17 @@
>  # Nodes are mapped to their block device name if there is one, and
>  # to their node name otherwise.  (Since 5.2)
>  #
> -# @x-vcpu-dirty-limit-period: Periodic time (in milliseconds) of dirty limit 
> during
> -# live migration. Should be in the range 1 to 
> 1000ms,
> -#

Re: [PATCH v3 3/3] cpu, softmmu/vl.c: Change parsing of -cpu argument to allow -cpu cpu, help to print options for the CPU type similar to how the '-device' option works.

2023-08-01 Thread Dinah B
Just realized that the commit message on this one got a little mangled. I'm
happy to revise it but I'd prefer to get the code reviewed first before
doing a purely commit message change.

-Dinah

On Sun, Jul 30, 2023 at 2:41 AM Dinah Baum  wrote:

> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1480
> Signed-off-by: Dinah Baum 
>
> Signed-off-by: Dinah Baum 
> ---
>  cpu.c| 41 
>  include/qapi/qmp/qdict.h |  1 +
>  qemu-options.hx  |  7 ---
>  qobject/qdict.c  |  5 +
>  softmmu/vl.c | 35 +-
>  5 files changed, 85 insertions(+), 4 deletions(-)
>
> diff --git a/cpu.c b/cpu.c
> index a99d09cd47..9971ffeeba 100644
> --- a/cpu.c
> +++ b/cpu.c
> @@ -43,6 +43,10 @@
>  #include "trace/trace-root.h"
>  #include "qemu/accel.h"
>  #include "qemu/plugin.h"
> +#include "qemu/cutils.h"
> +#include "qemu/qemu-print.h"
> +#include "qapi/qmp/qdict.h"
> +#include "qapi/qmp/qobject.h"
>
>  uintptr_t qemu_host_page_size;
>  intptr_t qemu_host_page_mask;
> @@ -312,6 +316,43 @@ CpuModelExpansionInfo
> *qmp_query_cpu_model_expansion(CpuModelExpansionType type,
>  return get_cpu_model_expansion_info(type, model, errp);
>  }
>
> +void list_cpu_model_expansion(CpuModelExpansionType type,
> +  CpuModelInfo *model,
> +  Error **errp)
> +{
> +CpuModelExpansionInfo *expansion_info;
> +QDict *qdict;
> +QDictEntry *qdict_entry;
> +const char *key;
> +QObject *obj;
> +QType q_type;
> +GPtrArray *array;
> +int i;
> +const char *type_name;
> +
> +expansion_info = get_cpu_model_expansion_info(type, model, errp);
> +if (expansion_info) {
> +qdict = qobject_to(QDict, expansion_info->model->props);
> +if (qdict) {
> +qemu_printf("%s features:\n", model->name);
> +array = g_ptr_array_new();
> +for (qdict_entry = (QDictEntry *)qdict_first(qdict);
> qdict_entry;
> + qdict_entry = (QDictEntry *)qdict_next(qdict,
> qdict_entry)) {
> +g_ptr_array_add(array, qdict_entry);
> +}
> +g_ptr_array_sort(array, (GCompareFunc)dict_key_compare);
> +for (i = 0; i < array->len; i++) {
> +qdict_entry = array->pdata[i];
> +key = qdict_entry_key(qdict_entry);
> +obj = qdict_get(qdict, key);
> +q_type = qobject_type(obj);
> +type_name = QType_str(q_type);
> +qemu_printf("  %s=<%s>\n", key, type_name);
> +}
> +}
> +}
> +}
> +
>  #if defined(CONFIG_USER_ONLY)
>  void tb_invalidate_phys_addr(hwaddr addr)
>  {
> diff --git a/include/qapi/qmp/qdict.h b/include/qapi/qmp/qdict.h
> index 82e90fc072..d0b6c3d358 100644
> --- a/include/qapi/qmp/qdict.h
> +++ b/include/qapi/qmp/qdict.h
> @@ -67,5 +67,6 @@ bool qdict_get_try_bool(const QDict *qdict, const char
> *key, bool def_value);
>  const char *qdict_get_try_str(const QDict *qdict, const char *key);
>
>  QDict *qdict_clone_shallow(const QDict *src);
> +int dict_key_compare(QDictEntry **entry1, QDictEntry **entry2);
>
>  #endif /* QDICT_H */
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 29b98c3d4c..e0f0284927 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -169,11 +169,12 @@ SRST
>  ERST
>
>  DEF("cpu", HAS_ARG, QEMU_OPTION_cpu,
> -"-cpu cpuselect CPU ('-cpu help' for list)\n", QEMU_ARCH_ALL)
> +"-cpu cpuselect CPU ('-cpu help' for list)\n"
> +"use '-cpu cpu,help' to print possible properties\n",
> QEMU_ARCH_ALL)
>  SRST
>  ``-cpu model``
> -Select CPU model (``-cpu help`` for list and additional feature
> -selection)
> +Select CPU model (``-cpu help`` and ``-cpu cpu,help``) for list and
> additional feature
> +selection
>  ERST
>
>  DEF("accel", HAS_ARG, QEMU_OPTION_accel,
> diff --git a/qobject/qdict.c b/qobject/qdict.c
> index 8faff230d3..31407e62f6 100644
> --- a/qobject/qdict.c
> +++ b/qobject/qdict.c
> @@ -447,3 +447,8 @@ void qdict_unref(QDict *q)
>  {
>  qobject_unref(q);
>  }
> +
> +int dict_key_compare(QDictEntry **entry1, QDictEntry **entry2)
> +{
> +return g_strcmp0(qdict_entry_key(*entry1), qdict_entry_key(*entry2));
> +}
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index b0b96f67fa..1fd87f2c06 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -501,6 +501,15 @@ static QemuOptsList qemu_action_opts = {
>  },
>  };
>
> +static QemuOptsList qemu_cpu_opts = {
> +.name = "cpu",
> +.implied_opt_name = "cpu",
> +.head = QTAILQ_HEAD_INITIALIZER(qemu_cpu_opts.head),
> +.desc = {
> +{ /* end of list */ }
> +},
> +};
> +
>  const char *qemu_get_vm_name(void)
>  {
>  return qemu_name;
> @@ -1159,6 +1168,26 @@ static int device_init_func(void *opaque, QemuOpts
> *opts, Error **errp)
>  return 0;
>  }
>
> +static int cpu_h

[PATCH 3/8] gitlab: always use updated msys installer

2023-08-01 Thread Daniel P . Berrangé
We currently reference an msys installer binary from mid-2022, which means
after installation it immediately has to re-download a bunch of newer
content. This wastes precious CI time.

The msys project publishes an installer binary with a fixed URL that
always references the latest content. We cache the downloads in gitlab,
though, so once downloaded we would never re-fetch the installer,
leading back to the same problem.

To deal with this we also fetch the pgp signature for the installer
on every run, and compare that to the previously cached signature. If
the signature changes, we re-download the full installer.

This ensures we always have the latest installer for msys, while also
maximising use of the gitlab cache.
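The invalidation scheme can be sketched in POSIX shell (illustration only -- the actual CI code below is PowerShell, and the file names here are made up):

```shell
#!/bin/sh
# Sketch of the caching scheme described above: the signature file is
# fetched on every run, and the cached installer is discarded only
# when the cached signature no longer matches the fresh one.
fetch_needed() {
    new_sig=$1
    cached_sig=$2
    # No cached signature yet -> must download the installer.
    [ -f "$cached_sig" ] || { echo yes; return; }
    # Signature changed upstream -> cached installer is stale.
    if cmp -s "$new_sig" "$cached_sig"; then echo no; else echo yes; fi
}

tmp=$(mktemp -d)
printf 'sig-v1' > "$tmp/new.sig"
fetch_needed "$tmp/new.sig" "$tmp/cached.sig"   # prints "yes": nothing cached
cp "$tmp/new.sig" "$tmp/cached.sig"
fetch_needed "$tmp/new.sig" "$tmp/cached.sig"   # prints "no": still valid
printf 'sig-v2' > "$tmp/new.sig"
fetch_needed "$tmp/new.sig" "$tmp/cached.sig"   # prints "yes": stale cache
rm -r "$tmp"
```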

Signed-off-by: Daniel P. Berrangé 
---
 .gitlab-ci.d/windows.yml | 30 +++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/.gitlab-ci.d/windows.yml b/.gitlab-ci.d/windows.yml
index 831b080d12..0bc04ad068 100644
--- a/.gitlab-ci.d/windows.yml
+++ b/.gitlab-ci.d/windows.yml
@@ -23,10 +23,34 @@
   - If ( !(Test-Path -Path msys64\var\cache ) ) {
   mkdir msys64\var\cache
 }
-  - If ( !(Test-Path -Path msys64\var\cache\msys2.exe ) ) {
+  - Invoke-WebRequest
+"https://repo.msys2.org/distrib/msys2-x86_64-latest.sfx.exe.sig";
+-outfile "msys2.exe.sig"
+  - if ( Test-Path -Path msys64\var\cache\msys2.exe.sig ) {
+  Write-Output "Cached installer sig" ;
+  if ( ((Get-FileHash msys2.exe.sig).Hash -ne (Get-FileHash 
msys64\var\cache\msys2.exe.sig).Hash) ) {
+Write-Output "Mis-matched installer sig, new installer download 
required" ;
+Remove-Item -Path msys64\var\cache\msys2.exe.sig ;
+if ( Test-Path -Path msys64\var\cache\msys2.exe ) {
+  Remove-Item -Path msys64\var\cache\msys2.exe
+}
+  } else {
+Write-Output "Matched installer sig, cached installer still valid"
+  }
+} else {
+  Write-Output "No cached installer sig, new installer download required" ;
+  if ( Test-Path -Path msys64\var\cache\msys2.exe ) {
+Remove-Item -Path msys64\var\cache\msys2.exe
+  }
+}
+  - if ( !(Test-Path -Path msys64\var\cache\msys2.exe ) ) {
+  Write-Output "Fetching latest installer" ;
   Invoke-WebRequest
-  
"https://github.com/msys2/msys2-installer/releases/download/2022-06-03/msys2-base-x86_64-20220603.sfx.exe";
-  -outfile "msys64\var\cache\msys2.exe"
+  "https://repo.msys2.org/distrib/msys2-x86_64-latest.sfx.exe";
+  -outfile "msys64\var\cache\msys2.exe" ;
+  Copy-Item -Path msys2.exe.sig -Destination msys64\var\cache\msys2.exe.sig
+} else {
+  Write-Output "Using cached installer"
 }
   - Write-Output "Invoking msys2.exe installer at $(Get-Date -Format u)"
   - msys64\var\cache\msys2.exe -y
-- 
2.41.0




[PATCH 5/8] gitlab: always populate cache for windows msys jobs

2023-08-01 Thread Daniel P . Berrangé
The cache is used to hold the msys installer. Even if the build phase
fails, we should still populate the cache as the installer will be
valid for next time.

Signed-off-by: Daniel P. Berrangé 
---
 .gitlab-ci.d/windows.yml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.gitlab-ci.d/windows.yml b/.gitlab-ci.d/windows.yml
index 6454880cb7..34109a80f2 100644
--- a/.gitlab-ci.d/windows.yml
+++ b/.gitlab-ci.d/windows.yml
@@ -8,6 +8,7 @@
 key: "${CI_JOB_NAME}-cache"
 paths:
   - msys64/var/cache
+when: always
   needs: []
   stage: build
   timeout: 80m
-- 
2.41.0




[PATCH 6/8] configure: support passthrough of -Dxxx args to meson

2023-08-01 Thread Daniel P . Berrangé
This can be useful for setting some global meson options, such as the
optimization level or debug state.

Signed-off-by: Daniel P. Berrangé 
---
 configure | 5 +
 1 file changed, 5 insertions(+)

diff --git a/configure b/configure
index 26ec5e4f54..9fe3718b77 100755
--- a/configure
+++ b/configure
@@ -757,6 +757,9 @@ for opt do
   # everything else has the same name in configure and meson
   --*) meson_option_parse "$opt" "$optarg"
   ;;
+  # Pass through -D options to meson
+  -D*) meson_options="$meson_options $opt"
+  ;;
   esac
 done
 
@@ -887,6 +890,8 @@ cat << EOF
   pie Position Independent Executables
   debug-tcg   TCG debugging (default is disabled)
 
+  -Dmesonoptname=val  passthrough option to meson unmodified
+
 NOTE: The object files are built at the place where configure is launched
 EOF
 exit 0
-- 
2.41.0




[PATCH 8/8] gitlab: disable FF_SCRIPT_SECTIONS on msys jobs

2023-08-01 Thread Daniel P . Berrangé
The FF_SCRIPT_SECTIONS=1 variable should ordinarily cause output from
each line of the job script to be presented in a collapsible section
with execution time listed.

While it works on Linux shared runners, when used with Windows runners
with PowerShell, this option does not create any sections, and actually
causes echo'ing of commands to be disabled, making it even worse to
debug the jobs.

Signed-off-by: Daniel P. Berrangé 
---
 .gitlab-ci.d/windows.yml | 4 
 1 file changed, 4 insertions(+)

diff --git a/.gitlab-ci.d/windows.yml b/.gitlab-ci.d/windows.yml
index 552e3b751d..cd7622a761 100644
--- a/.gitlab-ci.d/windows.yml
+++ b/.gitlab-ci.d/windows.yml
@@ -12,6 +12,10 @@
   needs: []
   stage: build
   timeout: 80m
+  variables:
+# This feature doesn't (currently) work with PowerShell, it stops
+# the echo'ing of commands being run and doesn't show any timing
+FF_SCRIPT_SECTIONS: 0
   artifacts:
 name: "$CI_JOB_NAME-$CI_COMMIT_REF_SLUG"
 expire_in: 7 days
-- 
2.41.0




[PATCH 2/8] gitlab: print timestamps during windows msys jobs

2023-08-01 Thread Daniel P . Berrangé
It is hard to get visibility into where time is consumed in our Windows
msys jobs. Adding a few console log messages with timestamps will aid
in our debugging.

Signed-off-by: Daniel P. Berrangé 
---
 .gitlab-ci.d/windows.yml | 5 +
 1 file changed, 5 insertions(+)

diff --git a/.gitlab-ci.d/windows.yml b/.gitlab-ci.d/windows.yml
index f086540e40..831b080d12 100644
--- a/.gitlab-ci.d/windows.yml
+++ b/.gitlab-ci.d/windows.yml
@@ -19,6 +19,7 @@
 reports:
   junit: "build/meson-logs/testlog.junit.xml"
   before_script:
+  - Write-Output "Acquiring msys2.exe installer at $(Get-Date -Format u)"
   - If ( !(Test-Path -Path msys64\var\cache ) ) {
   mkdir msys64\var\cache
 }
@@ -27,6 +28,7 @@
   
"https://github.com/msys2/msys2-installer/releases/download/2022-06-03/msys2-base-x86_64-20220603.sfx.exe";
   -outfile "msys64\var\cache\msys2.exe"
 }
+  - Write-Output "Invoking msys2.exe installer at $(Get-Date -Format u)"
   - msys64\var\cache\msys2.exe -y
   - ((Get-Content -path .\msys64\etc\\post-install\\07-pacman-key.post -Raw)
   -replace '--refresh-keys', '--version') |
@@ -36,6 +38,7 @@
   - .\msys64\usr\bin\bash -lc 'pacman --noconfirm -Syuu'  # Normal update
   - taskkill /F /FI "MODULES eq msys-2.0.dll"
   script:
+  - Write-Output "Installing mingw packages at $(Get-Date -Format u)"
   - .\msys64\usr\bin\bash -lc "pacman -Sy --noconfirm --needed
   bison diffutils flex
   git grep make sed
@@ -66,6 +69,7 @@
   $MINGW_TARGET-spice
   $MINGW_TARGET-usbredir
   $MINGW_TARGET-zstd "
+  - Write-Output "Running build at $(Get-Date -Format u)"
   - $env:CHERE_INVOKING = 'yes'  # Preserve the current working directory
   - $env:MSYS = 'winsymlinks:native' # Enable native Windows symlink
   - mkdir build
@@ -73,6 +77,7 @@
   - ..\msys64\usr\bin\bash -lc "../configure --enable-fdt=system 
$CONFIGURE_ARGS"
   - ..\msys64\usr\bin\bash -lc "make"
   - ..\msys64\usr\bin\bash -lc "make check MTESTARGS='$TEST_ARGS' || { cat 
meson-logs/testlog.txt; exit 1; } ;"
+  - Write-Output "Finished build at $(Get-Date -Format u)"
 
 msys2-64bit:
   extends: .shared_msys2_builder
-- 
2.41.0




[PATCH 7/8] gitlab: disable optimization and debug symbols in msys build

2023-08-01 Thread Daniel P . Berrangé
Building at -O2 adds 33% to the build time over -O0. IOW, a build that
takes 45 minutes at -O0 takes 60 minutes at -O2. Turning off debug
symbols drops it further, down to 38 minutes.

IOW, a "-O2 -g" build is 58% slower than a "-O0" build on msys in the
gitlab CI windows shared runners.

Signed-off-by: Daniel P. Berrangé 
---
 .gitlab-ci.d/windows.yml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/.gitlab-ci.d/windows.yml b/.gitlab-ci.d/windows.yml
index 34109a80f2..552e3b751d 100644
--- a/.gitlab-ci.d/windows.yml
+++ b/.gitlab-ci.d/windows.yml
@@ -113,7 +113,7 @@ msys2-64bit:
 # commit 9f8e6cad65a6 ("gitlab-ci: Speed up the msys2-64bit job by using 
--without-default-devices"
 # changed to compile QEMU with the --without-default-devices switch
 # for the msys2 64-bit job, due to the build could not complete within
-CONFIGURE_ARGS:  --target-list=x86_64-softmmu --without-default-devices
+CONFIGURE_ARGS:  --target-list=x86_64-softmmu --without-default-devices -Ddebug=false -Doptimization=0
 # qTests don't run successfully with "--without-default-devices",
 # so let's exclude the qtests from CI for now.
 TEST_ARGS: --no-suite qtest
@@ -123,5 +123,5 @@ msys2-32bit:
   variables:
 MINGW_TARGET: mingw-w64-i686
 MSYSTEM: MINGW32
-CONFIGURE_ARGS:  --target-list=ppc64-softmmu
+CONFIGURE_ARGS:  --target-list=ppc64-softmmu -Ddebug=false -Doptimization=0
 TEST_ARGS: --no-suite qtest
-- 
2.41.0




[PATCH 4/8] gitlab: drop $CI_PROJECT_DIR from cache path

2023-08-01 Thread Daniel P . Berrangé
The gitlab cache is limited to only handle content within the
$CI_PROJECT_DIR hierarchy, and as such relative paths are always
implicitly relative to $CI_PROJECT_DIR.

Signed-off-by: Daniel P. Berrangé 
---
 .gitlab-ci.d/windows.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.gitlab-ci.d/windows.yml b/.gitlab-ci.d/windows.yml
index 0bc04ad068..6454880cb7 100644
--- a/.gitlab-ci.d/windows.yml
+++ b/.gitlab-ci.d/windows.yml
@@ -7,7 +7,7 @@
   cache:
 key: "${CI_JOB_NAME}-cache"
 paths:
-  - ${CI_PROJECT_DIR}/msys64/var/cache
+  - msys64/var/cache
   needs: []
   stage: build
   timeout: 80m
-- 
2.41.0




[PATCH 1/8] gitlab: remove duplication between msys jobs

2023-08-01 Thread Daniel P . Berrangé
Although they share a common parent, the two msys jobs still have
massive duplication in their script definitions that can easily be
collapsed.

Signed-off-by: Daniel P. Berrangé 
---
 .gitlab-ci.d/windows.yml | 132 +++
 1 file changed, 49 insertions(+), 83 deletions(-)

diff --git a/.gitlab-ci.d/windows.yml b/.gitlab-ci.d/windows.yml
index f889a468b5..f086540e40 100644
--- a/.gitlab-ci.d/windows.yml
+++ b/.gitlab-ci.d/windows.yml
@@ -35,97 +35,63 @@
   - .\msys64\usr\bin\bash -lc 'pacman --noconfirm -Syuu'  # Core update
   - .\msys64\usr\bin\bash -lc 'pacman --noconfirm -Syuu'  # Normal update
   - taskkill /F /FI "MODULES eq msys-2.0.dll"
-
-msys2-64bit:
-  extends: .shared_msys2_builder
   script:
   - .\msys64\usr\bin\bash -lc "pacman -Sy --noconfirm --needed
   bison diffutils flex
   git grep make sed
-  mingw-w64-x86_64-capstone
-  mingw-w64-x86_64-curl
-  mingw-w64-x86_64-cyrus-sasl
-  mingw-w64-x86_64-dtc
-  mingw-w64-x86_64-gcc
-  mingw-w64-x86_64-glib2
-  mingw-w64-x86_64-gnutls
-  mingw-w64-x86_64-gtk3
-  mingw-w64-x86_64-libgcrypt
-  mingw-w64-x86_64-libjpeg-turbo
-  mingw-w64-x86_64-libnfs
-  mingw-w64-x86_64-libpng
-  mingw-w64-x86_64-libssh
-  mingw-w64-x86_64-libtasn1
-  mingw-w64-x86_64-libusb
-  mingw-w64-x86_64-lzo2
-  mingw-w64-x86_64-nettle
-  mingw-w64-x86_64-ninja
-  mingw-w64-x86_64-pixman
-  mingw-w64-x86_64-pkgconf
-  mingw-w64-x86_64-python
-  mingw-w64-x86_64-SDL2
-  mingw-w64-x86_64-SDL2_image
-  mingw-w64-x86_64-snappy
-  mingw-w64-x86_64-spice
-  mingw-w64-x86_64-usbredir
-  mingw-w64-x86_64-zstd "
+  $MINGW_TARGET-capstone
+  $MINGW_TARGET-curl
+  $MINGW_TARGET-cyrus-sasl
+  $MINGW_TARGET-dtc
+  $MINGW_TARGET-gcc
+  $MINGW_TARGET-glib2
+  $MINGW_TARGET-gnutls
+  $MINGW_TARGET-gtk3
+  $MINGW_TARGET-libgcrypt
+  $MINGW_TARGET-libjpeg-turbo
+  $MINGW_TARGET-libnfs
+  $MINGW_TARGET-libpng
+  $MINGW_TARGET-libssh
+  $MINGW_TARGET-libtasn1
+  $MINGW_TARGET-libusb
+  $MINGW_TARGET-lzo2
+  $MINGW_TARGET-nettle
+  $MINGW_TARGET-ninja
+  $MINGW_TARGET-pixman
+  $MINGW_TARGET-pkgconf
+  $MINGW_TARGET-python
+  $MINGW_TARGET-SDL2
+  $MINGW_TARGET-SDL2_image
+  $MINGW_TARGET-snappy
+  $MINGW_TARGET-spice
+  $MINGW_TARGET-usbredir
+  $MINGW_TARGET-zstd "
   - $env:CHERE_INVOKING = 'yes'  # Preserve the current working directory
-  - $env:MSYSTEM = 'MINGW64' # Start a 64-bit MinGW environment
   - $env:MSYS = 'winsymlinks:native' # Enable native Windows symlink
   - mkdir build
   - cd build
-  # Note: do not remove "--without-default-devices"!
-  # commit 9f8e6cad65a6 ("gitlab-ci: Speed up the msys2-64bit job by using 
--without-default-devices"
-  # changed to compile QEMU with the --without-default-devices switch
-  # for the msys2 64-bit job, due to the build could not complete within
-  # the project timeout.
-  - ..\msys64\usr\bin\bash -lc '../configure --target-list=x86_64-softmmu
-  --without-default-devices --enable-fdt=system'
-  - ..\msys64\usr\bin\bash -lc 'make'
-  # qTests don't run successfully with "--without-default-devices",
-  # so let's exclude the qtests from CI for now.
-  - ..\msys64\usr\bin\bash -lc 'make check MTESTARGS=\"--no-suite qtest\" || { cat meson-logs/testlog.txt; exit 1; } ;'
+  - ..\msys64\usr\bin\bash -lc "../configure --enable-fdt=system $CONFIGURE_ARGS"
+  - ..\msys64\usr\bin\bash -lc "make"
+  - ..\msys64\usr\bin\bash -lc "make check MTESTARGS='$TEST_ARGS' || { cat meson-logs/testlog.txt; exit 1; } ;"
+
+msys2-64bit:
+  extends: .shared_msys2_builder
+  variables:
+MINGW_TARGET: mingw-w64-x86_64
+MSYSTEM: MINGW64
+# do not remove "--without-default-devices"!
+# commit 9f8e6cad65a6 ("gitlab-ci: Speed up the msys2-64bit job by using --without-default-devices"
+# changed to compile QEMU with the --without-default-devices switch
+# for the msys2 64-bit job, due to the build could not complete within
+# the project timeout.
+CONFIGURE_ARGS:  --target-list=x86_64-softmmu --without-default-devices
+# qTests don't run successfully with "--without-default-devices",
+# so let's exclude the qtests from CI for now.
+TEST_ARGS: --no-suite qtest
 
 msys2-32bit:
   extends: .shared_msys2_builder
-  script:
-  - .\msys64\usr\bin\bash -lc "pacman -Sy --noconfirm --needed
-  bison diffutils flex
-  git grep make sed
-  mingw-w64-i686-capstone
-  mingw-w64-i686-curl
-  mingw-w64-i686-cyrus-sasl
-  mingw-w64-i686-dtc
-  mingw-w64-i686-gcc
-  mingw-w64-i686-glib2
-  mingw-w64-i686-gnutls
-  mingw-w64-i686-gtk3
-  mingw-w64-i686-libgcrypt
-  mingw-w64-i686-libjpeg-turbo
-  mingw-w64-i686-libnfs
-  mingw-w64-i686-libpng
-  mingw-w64-i686-libssh
-  mingw-w64-i686-libtasn1
-  mingw-w64-i686-libusb
-  mingw-w64-i686-lzo2

[PATCH 0/8] gitlab: speed up msys windows jobs with GCC

2023-08-01 Thread Daniel P . Berrangé
This is an alternative and/or complement to Thomas' proposal
to use Clang with msys:

  https://lists.gnu.org/archive/html/qemu-devel/2023-07/msg05402.html

First of all, the current msys installer we're using is over 12
months out of date. Thus after running the install, pacman then
replaces most of what we've just installed with new downloaded
content. Using the most up-to-date installer cuts 3+1/2 minutes off
the msys install time - 7 minutes becomes 3+1/2.

Secondly, QEMU defaults to compiling with -O2 and this is more
computationally expensive for GCC. Switching to -O0 drops the
build time from 60 minutes down to 45 minutes.

Thirdly, including debug symbols also has an overhead, and turning
that off reduces time still further down to 38 minutes.

IOW, between all three changes, we can cut approx 25-26 minutes
off the job execution time, bringing it nicely within the job
timeout.

The actual phase of installing the mingw deps still accounts
for about 10 minutes and has not been optimized.

Possibly the same trick of -O0 and skipping -g would also help
the clang alternative Thomas proposed. If so, that could be
enough to let us enable more features / targets during the
msys build.
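[Editor's note: for readers wanting to reproduce the timing locally, the kind of configure invocation this series aims at could look roughly like the sketch below. The exact flag spellings used by the patches may differ; treat this as an illustrative assumption, not the patch content.]

```shell
# Build QEMU without optimization (-O0) and without debug symbols (-g0)
# to approximate the CI speed-up discussed above (illustrative sketch;
# the series itself wires this into .gitlab-ci.d/windows.yml)
../configure --target-list=x86_64-softmmu \
    --without-default-devices \
    --extra-cflags='-O0 -g0'
make -j"$(nproc)"
```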

Daniel P. Berrangé (8):
  gitlab: remove duplication between msys jobs
  gitlab: print timestamps during windows msys jobs
  gitlab: always use updated msys installer
  gitlab: drop $CI_PROJECT_DIR from cache path
  gitlab: always populate cache for windows msys jobs
  configure: support passthrough of -Dxxx args to meson
  gitlab: disable optimization and debug symbols in msys build
  gitlab: disable FF_SCRIPT_SECTIONS on msys jobs

 .gitlab-ci.d/windows.yml | 174 +++
 configure|   5 ++
 2 files changed, 92 insertions(+), 87 deletions(-)

-- 
2.41.0




Re: [PATCH v3 1/3] qapi: Moved architecture agnostic data types to `machine`

2023-08-01 Thread Markus Armbruster
Dinah Baum  writes:

> Signed-off-by: Dinah Baum 
> ---
>  qapi/machine-target.json | 78 +---
>  qapi/machine.json| 77 +++
>  2 files changed, 78 insertions(+), 77 deletions(-)
>
> diff --git a/qapi/machine-target.json b/qapi/machine-target.json
> index f0a6b72414..3ee2f7ca6b 100644
> --- a/qapi/machine-target.json
> +++ b/qapi/machine-target.json
> @@ -4,83 +4,7 @@
>  # This work is licensed under the terms of the GNU GPL, version 2 or later.
>  # See the COPYING file in the top-level directory.
>  
> -##
> -# @CpuModelInfo:
> -#
> -# Virtual CPU model.
> -#
> -# A CPU model consists of the name of a CPU definition, to which delta
> -# changes are applied (e.g. features added/removed). Most magic values
> -# that an architecture might require should be hidden behind the name.
> -# However, if required, architectures can expose relevant properties.
> -#
> -# @name: the name of the CPU definition the model is based on
> -#
> -# @props: a dictionary of QOM properties to be applied
> -#
> -# Since: 2.8
> -##
> -{ 'struct': 'CpuModelInfo',
> -  'data': { 'name': 'str',
> -'*props': 'any' } }
> -
> -##
> -# @CpuModelExpansionType:
> -#
> -# An enumeration of CPU model expansion types.
> -#
> -# @static: Expand to a static CPU model, a combination of a static
> -# base model name and property delta changes.  As the static base
> -# model will never change, the expanded CPU model will be the
> -# same, independent of QEMU version, machine type, machine
> -# options, and accelerator options.  Therefore, the resulting
> -# model can be used by tooling without having to specify a
> -# compatibility machine - e.g. when displaying the "host" model.
> -# The @static CPU models are migration-safe.
> -#
> -# @full: Expand all properties.  The produced model is not guaranteed
> -# to be migration-safe, but allows tooling to get an insight and
> -# work with model details.
> -#
> -# Note: When a non-migration-safe CPU model is expanded in static
> -# mode, some features enabled by the CPU model may be omitted,
> -# because they can't be implemented by a static CPU model
> -# definition (e.g. cache info passthrough and PMU passthrough in
> -# x86). If you need an accurate representation of the features
> -# enabled by a non-migration-safe CPU model, use @full.  If you
> -# need a static representation that will keep ABI compatibility
> -# even when changing QEMU version or machine-type, use @static
> -# (but keep in mind that some features may be omitted).
> -#
> -# Since: 2.8
> -##
> -{ 'enum': 'CpuModelExpansionType',
> -  'data': [ 'static', 'full' ] }
> -
> -##
> -# @CpuModelCompareResult:
> -#
> -# An enumeration of CPU model comparison results.  The result is
> -# usually calculated using e.g. CPU features or CPU generations.
> -#
> -# @incompatible: If model A is incompatible to model B, model A is not
> -# guaranteed to run where model B runs and the other way around.
> -#
> -# @identical: If model A is identical to model B, model A is
> -# guaranteed to run where model B runs and the other way around.
> -#
> -# @superset: If model A is a superset of model B, model B is
> -# guaranteed to run where model A runs.  There are no guarantees
> -# about the other way.
> -#
> -# @subset: If model A is a subset of model B, model A is guaranteed to
> -# run where model B runs.  There are no guarantees about the other
> -# way.
> -#
> -# Since: 2.8
> -##
> -{ 'enum': 'CpuModelCompareResult',
> -  'data': [ 'incompatible', 'identical', 'superset', 'subset' ] }
> +{ 'include': 'machine.json' }
>  
>  ##
>  # @CpuModelBaselineInfo:
> diff --git a/qapi/machine.json b/qapi/machine.json
> index a08b6576ca..192c781310 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -1691,3 +1691,80 @@
>  { 'command': 'dumpdtb',
>'data': { 'filename': 'str' },
>'if': 'CONFIG_FDT' }
> +
> +##
> +# @CpuModelInfo:
> +#
> +# Virtual CPU model.
> +#
> +# A CPU model consists of the name of a CPU definition, to which delta
> +# changes are applied (e.g. features added/removed). Most magic values
> +# that an architecture might require should be hidden behind the name.
> +# However, if required, architectures can expose relevant properties.
> +#
> +# @name: the name of the CPU definition the model is based on
> +#
> +# @props: a dictionary of QOM properties to be applied
> +#
> +# Since: 2.8
> +##
> +{ 'struct': 'CpuModelInfo',
> +  'data': { 'name': 'str', '*props': 'any' } }
> +
> +##
> +# @CpuModelExpansionType:
> +#
> +# An enumeration of CPU model expansion types.
> +#
> +# @static: Expand to a static CPU model, a combination of a static
> +# base model name and property delta changes.  As the static base
> +# model will never change, the expanded CPU model will be the
> +# same, independent of QEMU version, machine type, machine
> +#  

Re: [PATCH] target/nios2: Fix semihost lseek offset computation

2023-08-01 Thread Philippe Mathieu-Daudé

On 1/8/23 14:06, Peter Maydell wrote:

On Tue, 1 Aug 2023 at 00:53, Keith Packard via  wrote:


The arguments for deposit64 are (value, start, length, fieldval); this
appears to have thought they were (value, fieldval, start,
length). Reorder the parameters to match the actual function.

Signed-off-by: Keith Packard 
---
  target/nios2/nios2-semi.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/nios2/nios2-semi.c b/target/nios2/nios2-semi.c
index ffd1f095f6..fc1df1dfeb 100644
--- a/target/nios2/nios2-semi.c
+++ b/target/nios2/nios2-semi.c
@@ -170,7 +170,7 @@ void do_nios2_semihosting(CPUNios2State *env)
  GET_ARG64(2);
  GET_ARG64(3);
  semihost_sys_lseek(cs, nios2_semi_u64_cb, arg0,
-   deposit64(arg2, arg1, 32, 32), arg3);
+   deposit64(arg2, 32, 32, arg1), arg3);
  break;

  case HOSTED_RENAME:


Fixes: d1e23cbaa403b2d ("target/nios2: Use semihosting/syscalls.h")
Cc: qemu-sta...@nongnu.org
Reviewed-by: Peter Maydell 


Commit 950272506d ("target/m68k: Use semihosting/syscalls.h") has the
same issue.

Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH for-8.2 v2 5/6] vfio/migration: Add P2P support for VFIO migration

2023-08-01 Thread Avihai Horon



On 01/08/2023 15:02, Cédric Le Goater wrote:

External email: Use caution opening links or attachments


On 7/31/23 12:25, Avihai Horon wrote:

VFIO migration uAPI defines an optional intermediate P2P quiescent
state. While in the P2P quiescent state, P2P DMA transactions cannot be
initiated by the device, but the device can respond to incoming ones.
Additionally, all outstanding P2P transactions are guaranteed to have
been completed by the time the device enters this state.

The purpose of this state is to support migration of multiple devices
that might do P2P transactions between themselves.

Add support for P2P migration by transitioning all the devices to the
P2P quiescent state before stopping or starting the devices. Use the new
VMChangeStateHandler prepare_cb to achieve that behavior.

This will allow migration of multiple VFIO devices if all of them
support P2P migration.

Signed-off-by: Avihai Horon 
---
  docs/devel/vfio-migration.rst | 93 +--
  hw/vfio/common.c  |  6 ++-
  hw/vfio/migration.c   | 47 --
  hw/vfio/trace-events  |  1 +
  4 files changed, 106 insertions(+), 41 deletions(-)

diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
index b433cb5bb2..605fe60e96 100644
--- a/docs/devel/vfio-migration.rst
+++ b/docs/devel/vfio-migration.rst
@@ -23,9 +23,21 @@ and recommends that the initial bytes are sent and loaded in the destination
  before stopping the source VM. Enabling this migration capability will
  guarantee that and thus, can potentially reduce downtime even further.

-Note that currently VFIO migration is supported only for a single device. This
-is due to VFIO migration's lack of P2P support. However, P2P support is planned
-to be added later on.
+To support migration of multiple devices that might do P2P transactions between
+themselves, VFIO migration uAPI defines an intermediate P2P quiescent state.
+While in the P2P quiescent state, P2P DMA transactions cannot be initiated by
+the device, but the device can respond to incoming ones. Additionally, all
+outstanding P2P transactions are guaranteed to have been completed by the time
+the device enters this state.
+
+All the devices that support P2P migration are first transitioned to the P2P
+quiescent state and only then are they stopped or started. This makes migration
+safe P2P-wise, since starting and stopping the devices is not done atomically
+for all the devices together.
+
+Thus, multiple VFIO devices migration is allowed only if all the devices
+support P2P migration. Single VFIO device migration is allowed regardless of
+P2P migration support.

  A detailed description of the UAPI for VFIO device migration can be found in
  the comment for the ``vfio_device_mig_state`` structure in the header file

@@ -132,54 +144,63 @@ will be blocked.
  Flow of state changes during Live migration
  ===

-Below is the flow of state change during live migration.
+Below is the state change flow during live migration for a VFIO 
device that
+supports both precopy and P2P migration. The flow for devices that 
don't
+support it is similar, except that the relevant states for precopy 
and P2P are

+skipped.
  The values in the parentheses represent the VM state, the migration state, and
  the VFIO device state, respectively.
-The text in the square brackets represents the flow if the VFIO device supports
-pre-copy.

  Live migration save path
  

  ::

-    QEMU normal running state
-    (RUNNING, _NONE, _RUNNING)
-  |
+   QEMU normal running state
+   (RUNNING, _NONE, _RUNNING)
+  |
   migrate_init spawns migration_thread
-    Migration thread then calls each device's .save_setup()
-  (RUNNING, _SETUP, _RUNNING [_PRE_COPY])
-  |
-  (RUNNING, _ACTIVE, _RUNNING [_PRE_COPY])
-  If device is active, get pending_bytes by .state_pending_{estimate,exact}()
-  If total pending_bytes >= threshold_size, call .save_live_iterate()

-  [Data of VFIO device for pre-copy phase is copied]
-    Iterate till total pending bytes converge and are less than threshold

-  |
-  On migration completion, vCPU stops and calls .save_live_complete_precopy for
-  each active device. The VFIO device is then transitioned into _STOP_COPY state

-  (FINISH_MIGRATE, _DEVICE, _STOP_COPY)
-  |
- For the VFIO device, iterate in .save_live_complete_precopy until
- pending data is 0
-   (FINISH_MIGRATE, _DEVICE, _STOP)
-  |
- (FINISH_MIG

Re: [RFC PATCH 5/6] include/qemu/compiler: Fix problem with gcc_struct and Clang

2023-08-01 Thread Philippe Mathieu-Daudé

On 31/7/23 19:25, Daniel P. Berrangé wrote:

On Mon, Jul 31, 2023 at 04:10:36PM +0200, Philippe Mathieu-Daudé wrote:

On 31/7/23 11:32, Daniel P. Berrangé wrote:


I was surprised to see that we're not using ccache in gitlab CI. It wouldn't
help the from-clean compile time, but thereafter it ought to help. I'm doing
some tests with that to see the impact.


I tried that a few years ago and it had a very negative impact on custom
runners (maybe I wasn't doing it correctly). Hopefully that has changed.


Our runner usage model has changed quite a lot since then. What was previously
mostly on shared runners is now on Azure private runners. I can imagine it
will vary tremendously depending on what you're using as a private runner.

In the specific case of the windows jobs though, we're using the shared
runners.

Either way, if our jobs are all wired up for ccache correctly, it is then
trivial to selectively turn off usage of ccache by just tweaking a few
env vars.


IIRC we weren't using runner tag to filter, so jobs expected to run on
shared runner were ending on private runner, and using env vars was not
working. But you are right, the comments I pointed are obsolete, I
clearly haven't followed all CI changes. Thanks for the tips,

Phil.
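[Editor's note: for reference, wiring ccache into a GitLab CI job, as discussed above, is typically just a couple of variables plus a cache entry. The following is a generic sketch, not QEMU's actual CI configuration.]

```yaml
# Generic ccache wiring for a GitLab CI job (illustrative sketch)
variables:
  CCACHE_BASEDIR: $CI_PROJECT_DIR
  CCACHE_DIR: $CI_PROJECT_DIR/ccache
  CCACHE_MAXSIZE: 500M
cache:
  key: $CI_JOB_NAME
  paths:
    - ccache/
before_script:
  # make the ccache compiler shims shadow the real compilers
  - export PATH="/usr/lib/ccache:$PATH"
```

Turning ccache off then amounts to tweaking `CCACHE_DISABLE` or dropping the `PATH` override, which matches the "just tweaking a few env vars" point above.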



Re: [RFC PATCH 6/6] gitlab-ci.d/windows: Use Clang for compiling in the 64-bit MSYS2 job

2023-08-01 Thread Philippe Mathieu-Daudé

On 31/7/23 16:43, Thomas Huth wrote:

On 31/07/2023 16.23, Philippe Mathieu-Daudé wrote:

On 28/7/23 16:27, Thomas Huth wrote:

We are struggeling with timeouts in the 64-bit MSYS2 job. Clang seems
to be a little bit faster, so let's use this compiler now instead.

There is a problem with compiling the spice headers with Clang, though,
so we can only test this in the 32-bit builds with GCC now. And we have
to disable dbus-display - otherwise the compilation aborts in the CI.

Signed-off-by: Thomas Huth 
---
  .gitlab-ci.d/windows.yml | 14 --
  1 file changed, 8 insertions(+), 6 deletions(-)




@@ -77,13 +76,15 @@
  msys2-64bit:
    extends: .shared_msys2_builder
    variables:
-    MINGW_TARGET: mingw-w64-x86_64
-    MSYSTEM: MINGW64
+    MINGW_TARGET: mingw-w64-clang-x86_64
+    MSYSTEM: CLANG64


OK to use Clang, but I'm tempted to keep the GCC job in manual mode...


Why? We still have the 32-bit job with GCC, and the MinGW cross-compiler 
job with GCC, so that's already quite a bit of coverage, isn't it?


OK we are good then :)

Thanks,

Phil.




Re: [PATCH 1/2] vmmouse: replace DPRINTF with tracing

2023-08-01 Thread Philippe Mathieu-Daudé

On 1/8/23 11:39, marcandre.lur...@redhat.com wrote:

From: Marc-André Lureau 

Signed-off-by: Marc-André Lureau 
---
  hw/i386/vmmouse.c| 29 ++---
  hw/i386/trace-events | 10 ++
  2 files changed, 24 insertions(+), 15 deletions(-)




  static uint32_t vmmouse_get_status(VMMouseState *s)
  {
-DPRINTF("vmmouse_get_status()\n");
+trace_vmmouse_get_status();


Isn't it useful to log the status here? Anyhow:

Reviewed-by: Philippe Mathieu-Daudé 


+
  return (s->status << 16) | s->nb_queue;
  }





Re: [RFC PATCH 0/6] Use Clang for compiling in the 64-bit MSYS2 job

2023-08-01 Thread Daniel P . Berrangé
On Fri, Jul 28, 2023 at 04:27:42PM +0200, Thomas Huth wrote:
> The 64-bit MSYS2 job often times out in our CI, though we already have
> limited it to a very minimum by using --without-default-devices etc.
> GCC is incredibly slow here. By using Clang instead of GCC, the job
> is ca. 15 minutes faster - that's enough buffer to avoid the timeouts
> here.

FYI, I have a complementary series here that achieves a 25 minute
speedup cummulative speedup with GCC - the ideas would apply to
clang too.

https://lists.gnu.org/archive/html/qemu-devel/2023-08/msg00055.html


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v3 2/3] qapi, target/: Enable 'query-cpu-model-expansion' on all architectures

2023-08-01 Thread Markus Armbruster
Dinah Baum  writes:

> Only architectures that implement the command will return
> results, others will return an error message as before.

A brief explanation of why this is useful would be nice.

>
> Signed-off-by: Dinah Baum 
> ---
>  cpu.c| 20 +++
>  include/exec/cpu-common.h|  7 
>  qapi/machine-target.json | 60 
>  qapi/machine.json| 53 
>  target/arm/arm-qmp-cmds.c|  7 ++--
>  target/arm/cpu.h |  7 
>  target/i386/cpu-sysemu.c |  7 ++--
>  target/i386/cpu.h|  6 
>  target/s390x/cpu.h   |  7 
>  target/s390x/cpu_models_sysemu.c |  6 ++--
>  10 files changed, 110 insertions(+), 70 deletions(-)
>
> diff --git a/cpu.c b/cpu.c
> index 1c948d1161..a99d09cd47 100644
> --- a/cpu.c
> +++ b/cpu.c
> @@ -292,6 +292,26 @@ void list_cpus(void)
>  #endif
>  }
>  
> +CpuModelExpansionInfo *get_cpu_model_expansion_info(CpuModelExpansionType type,
> +CpuModelInfo *model,
> +Error **errp)
> +{
> +/* XXX: implement cpu_model_expansion for targets that still miss it */
> +#if defined(cpu_model_expansion)
> +return cpu_model_expansion(type, model, errp);
> +#else
> +error_setg(errp, "Could not query cpu model information");
> +return NULL;
> +#endif
> +}

This is vague enough to leave the user wondering what could be done to
avoid this error and by whom.

Before the patch, it's clear enough: "The command
query-cpu-model-expansion has not been found".

You could go with something like "command not supported for this
target".

The error class changes from CommandNotFound to GenericError.  Please
verify libvirt is fine with that.

> +
> +CpuModelExpansionInfo *qmp_query_cpu_model_expansion(CpuModelExpansionType type,
> + CpuModelInfo *model,
> + Error **errp)
> +{
> +return get_cpu_model_expansion_info(type, model, errp);
> +}
> +

Why do you need qmp_query_cpu_model_expansion() to become a wrapper
around the real function?

>  #if defined(CONFIG_USER_ONLY)
>  void tb_invalidate_phys_addr(hwaddr addr)
>  {
> diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
> index 87dc9a752c..653f8a9d2b 100644
> --- a/include/exec/cpu-common.h
> +++ b/include/exec/cpu-common.h
> @@ -6,6 +6,7 @@
>  #ifndef CONFIG_USER_ONLY
>  #include "exec/hwaddr.h"
>  #endif
> +#include "qapi/qapi-commands-machine.h"
>  
>  /**
>   * vaddr:
> @@ -167,4 +168,10 @@ int cpu_memory_rw_debug(CPUState *cpu, vaddr addr,
>  /* vl.c */
>  void list_cpus(void);
>  
> +CpuModelExpansionInfo *get_cpu_model_expansion_info(CpuModelExpansionType type,
> +CpuModelInfo *model,
> +Error **errp);
> +void list_cpu_model_expansion(CpuModelExpansionType type,
> +  CpuModelInfo *model, Error **errp);
> +

The declaration of list_cpu_model_expansion() belongs to the next patch.

>  #endif /* CPU_COMMON_H */
> diff --git a/qapi/machine-target.json b/qapi/machine-target.json
> index 3ee2f7ca6b..a658b1754b 100644
> --- a/qapi/machine-target.json
> +++ b/qapi/machine-target.json
> @@ -139,66 +139,6 @@
>'returns': 'CpuModelBaselineInfo',
>'if': 'TARGET_S390X' }
>  
> -##
> -# @CpuModelExpansionInfo:
> -#
> -# The result of a cpu model expansion.
> -#
> -# @model: the expanded CpuModelInfo.
> -#
> -# Since: 2.8
> -##
> -{ 'struct': 'CpuModelExpansionInfo',
> -  'data': { 'model': 'CpuModelInfo' },
> -  'if': { 'any': [ 'TARGET_S390X',
> -   'TARGET_I386',
> -   'TARGET_ARM' ] } }
> -
> -##
> -# @query-cpu-model-expansion:
> -#
> -# Expands a given CPU model (or a combination of CPU model +
> -# additional options) to different granularities, allowing tooling to
> -# get an understanding what a specific CPU model looks like in QEMU
> -# under a certain configuration.
> -#
> -# This interface can be used to query the "host" CPU model.
> -#
> -# The data returned by this command may be affected by:
> -#
> -# * QEMU version: CPU models may look different depending on the QEMU
> -#   version.  (Except for CPU models reported as "static" in
> -#   query-cpu-definitions.)
> -# * machine-type: CPU model may look different depending on the
> -#   machine-type.  (Except for CPU models reported as "static" in
> -#   query-cpu-definitions.)
> -# * machine options (including accelerator): in some architectures,
> -#   CPU models may look different depending on machine and accelerator
> -#   options.  (Except for CPU models reported as "static" in
> -#   query-cpu-definitions.)
> -# * "-cpu" arguments and global properties: arguments to the -cpu
> -#   option and global properties ma

Re: [RFC PATCH 5/6] include/qemu/compiler: Fix problem with gcc_struct and Clang

2023-08-01 Thread Daniel P . Berrangé
On Mon, Jul 31, 2023 at 11:05:42AM +0100, Peter Maydell wrote:
> On Mon, 31 Jul 2023 at 10:11, Thomas Huth  wrote:
> > Or do you see another possibility how we could fix that timeout problem in
> > the 64-bit MSYS2 job? Still switching to clang, but compiling with
> > --extra-cflags="-Wno-unknown-attributes" maybe?
> 
> Is there any way we can separate out the "take 25 minutes to
> install a pile of packages" part from the "actually compile and
> test QEMU" part ?

I was thinking a little about what we actually aim to achieve with the
Windows MSys2 jobs.

We have long had the mingw cross compilation jobs for testing the Windows
platform compile phase. Please correct me if I'm wrong, but IIUC, the
msys2 jobs haven't identified any compilation problems that weren't already
found from the linux mingw cross compile jobs.

The definitely unique thing that the msys2 jobs *can* do though, is to
run the test suite. We can't run the test suite from the mingw cross jobs
unless we do something slightly crazy like adding Wine to the containers.
In libvirt we considered the latter until we realized that the smallest
possible wine install would still be enormous.

In the Linux world we have the jobs split into three stages

  * Create the container images  (eg amd64-fedora-container)
  * Do the compilation work  (eg build-system-fedora)
  * Run the test suite   (eg check-system-fedora/avocado-system-fedora)

If the value of the msys2 jobs is that they let us run the test suite,
can we limit the usage of msys2 to just that part, and chain it upto
the fedora mingw cross compile jobs ie.

  win64-fedora-cross-container
|
+-> cross-win64-system
  |
  +-> check-win64-msys

In the "cross-win64-system" job we would have to publish the entire QEMU
'build' directory as an artifact, to pass it over to the msys job.  If
we also published /usr/x86_64-w64-mingw32/ as an artifact, then we would
not need to install any mingw packages under msys. The basic msys installer
can be run (which takes a couple of minutes), and then we just dump
the Fedora artifacts of /usr/x86_64-w64-mingw32/ into the right place
and run the test suite.
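[Editor's note: the proposed chaining could be sketched in GitLab CI terms roughly as below. Job names follow the diagram above; the artifact handling is an assumption - in particular, GitLab only allows artifact paths inside the project directory, so the mingw sysroot would need to be staged there first.]

```yaml
# Illustrative sketch only, not a tested configuration
cross-win64-system:
  needs: ['win64-fedora-cross-container']
  script:
    - ./configure --cross-prefix=x86_64-w64-mingw32- && make
    # stage the cross sysroot inside the project dir so it can be an artifact
    - cp -r /usr/x86_64-w64-mingw32 mingw-sysroot
  artifacts:
    paths:
      - build/
      - mingw-sysroot/

check-win64-msys:
  needs:
    - job: cross-win64-system
      artifacts: true
  script:
    - .\msys64\usr\bin\bash -lc 'make -C build check'
```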

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 0/8] gitlab: speed up msys windows jobs with GCC

2023-08-01 Thread Markus Armbruster
Daniel P. Berrangé  writes:

> This is an alternative and/or complementary to Thomas' proposal
> to use CLang with msys:
>
>   https://lists.gnu.org/archive/html/qemu-devel/2023-07/msg05402.html
>
> First of all, the current msys installer we're using is over 12
> months out of date. Thus after running the install, pacman then
> replaces most of what we've just installed with new downloaded
> content. Using the most update installer cuts 3+1/2 minutes off
> the msys install time - 7 minutes becomes 3+1/2.
>
> Secondly, QEMU defaults to compiling with -O2 and this is more
> computationally expensive for GCC. Switching to -O0 drops the
> build time from 60 minutes down to 45 minutes.

From the fine manual[*]: "The effectiveness of some warnings depends on
optimizations also being enabled.  For example '-Wsuggest-final-types'
is more effective with link-time optimization and some instances of
other warnings may not be issued at all unless optimization is enabled.
While optimization in general improves the efficacy of control and data
flow sensitive warnings, in some cases it may also cause false
positives."  Do we care?

[...]


[*] https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/Warning-Options.html




[PATCH] linux-user: Emulate /proc/cpuinfo on aarch64 and arm

2023-08-01 Thread Helge Deller
Add emulation for /proc/cpuinfo for arm architecture.
The output below mimics output as seen on debian porterboxes.

aarch64 output example:

processor   : 0
BogoMIPS: 100.00
Features: fp asimd evtstrm cpuid
CPU implementer : 0x50
CPU architecture: 8
CPU variant : 0x0
CPU part: 0x0
CPU revision: 1

arm output example:

processor   : 0
model name  : ARMv7 Processor rev 2 (v7l)
BogoMIPS: 50.00
Features: half thumb fastmult vfp edsp thumbee vfpv3 tls idiva idivt vfpd32 lpae
CPU implementer : 0x56
CPU architecture: 7
CPU variant : 0x2
CPU part: 0x584
CPU revision: 2

Signed-off-by: Helge Deller 

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index dc8266c073..917c388073 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8287,7 +8287,8 @@ void target_exception_dump(CPUArchState *env, const char *fmt, int code)

 #if HOST_BIG_ENDIAN != TARGET_BIG_ENDIAN || \
 defined(TARGET_SPARC) || defined(TARGET_M68K) || defined(TARGET_HPPA) || \
-defined(TARGET_RISCV) || defined(TARGET_S390X)
+defined(TARGET_RISCV) || defined(TARGET_S390X) || defined(TARGET_ARM) || \
+defined(TARGET_AARCH64)
 static int is_proc(const char *filename, const char *entry)
 {
 return strcmp(filename, entry) == 0;
@@ -8503,6 +8504,33 @@ static int open_hardware(CPUArchState *cpu_env, int fd)
 }
 #endif

+#if defined(TARGET_AARCH64) || defined(TARGET_ARM)
+static int open_cpuinfo(CPUArchState *cpu_env, int fd)
+{
+int i, num_cpus;
+const int is64 = TARGET_ABI_BITS == 64;
+
+num_cpus = sysconf(_SC_NPROCESSORS_ONLN);
+for (i = 0; i < num_cpus; i++) {
+dprintf(fd, "processor\t: %d\n", i);
+if (!is64) {
+dprintf(fd, "model name\t: ARMv7 Processor rev 2 (v7l)\n");
+}
+dprintf(fd, "BogoMIPS\t: %d.00\n", is64 ? 100 : 50);
+dprintf(fd, "Features\t: %s\n",
+is64 ? "fp asimd evtstrm cpuid"
+ : "half thumb fastmult vfp edsp thumbee vfpv3 " \
+   "tls idiva idivt vfpd32 lpae");
+dprintf(fd, "CPU implementer\t: 0x%d\n", is64 ? 50 : 56);
+dprintf(fd, "CPU architecture: %d\n",is64 ? 8 : 7);
+dprintf(fd, "CPU variant\t: 0x%d\n", is64 ? 0 : 2);
+dprintf(fd, "CPU part\t: 0x%d\n",is64 ? 0 : 584);
+dprintf(fd, "CPU revision\t: %d\n\n",is64 ? 1 : 2);
+}
+return 0;
+}
+#endif
+
 int do_guest_openat(CPUArchState *cpu_env, int dirfd, const char *pathname,
 int flags, mode_t mode, bool safe)
 {
@@ -8522,7 +8550,8 @@ int do_guest_openat(CPUArchState *cpu_env, int dirfd, const char *pathname,
 { "/proc/net/route", open_net_route, is_proc },
 #endif
 #if defined(TARGET_SPARC) || defined(TARGET_HPPA) || \
-defined(TARGET_RISCV) || defined(TARGET_S390X)
+defined(TARGET_RISCV) || defined(TARGET_S390X) || \
+defined(TARGET_ARM)   || defined(TARGET_AARCH64)
 { "/proc/cpuinfo", open_cpuinfo, is_proc },
 #endif
 #if defined(TARGET_M68K)



Re: [PATCH v3 3/3] cpu, softmmu/vl.c: Change parsing of -cpu argument to allow -cpu cpu,help to print options for the CPU type similar to how the '-device' option works.

2023-08-01 Thread Markus Armbruster
Dinah Baum  writes:

> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1480
> Signed-off-by: Dinah Baum 
>
> Signed-off-by: Dinah Baum 

Looks basically the same as v2, which means my review still applies.

Message-ID: <878rdbfww1@pond.sub.org>
https://lists.nongnu.org/archive/html/qemu-devel/2023-05/msg06699.html

If you need further assistance, just ask.




Re: [PATCH] linux-user: Emulate /proc/cpuinfo on aarch64 and arm

2023-08-01 Thread Peter Maydell
On Tue, 1 Aug 2023 at 14:57, Helge Deller  wrote:
>
> Add emulation for /proc/cpuinfo for arm architecture.
> The output below mimics output as seen on debian porterboxes.


> +#if defined(TARGET_AARCH64) || defined(TARGET_ARM)
> +static int open_cpuinfo(CPUArchState *cpu_env, int fd)
> +{
> +int i, num_cpus;
> +const int is64 = TARGET_ABI_BITS == 64;
> +
> +num_cpus = sysconf(_SC_NPROCESSORS_ONLN);
> +for (i = 0; i < num_cpus; i++) {
> +dprintf(fd, "processor\t: %d\n", i);
> +if (!is64) {
> +dprintf(fd, "model name\t: ARMv7 Processor rev 2 (v7l)\n");
> +}
> +dprintf(fd, "BogoMIPS\t: %d.00\n", is64 ? 100 : 50);
> +dprintf(fd, "Features\t: %s\n",
> +is64 ? "fp asimd evtstrm cpuid"
> + : "half thumb fastmult vfp edsp thumbee vfpv3 " \
> +   "tls idiva idivt vfpd32 lpae");
> +dprintf(fd, "CPU implementer\t: 0x%d\n", is64 ? 50 : 56);
> +dprintf(fd, "CPU architecture: %d\n",is64 ? 8 : 7);
> +dprintf(fd, "CPU variant\t: 0x%d\n", is64 ? 0 : 2);
> +dprintf(fd, "CPU part\t: 0x%d\n",is64 ? 0 : 584);
> +dprintf(fd, "CPU revision\t: %d\n\n",is64 ? 1 : 2);

If you want to do this you should hook it up to what the
CPU being emulated actually is and what features it has.
(Compare how we set the hwcaps.)

> +}
> +return 0;
> +}

thanks
-- PMM



Re: [PATCH 0/8] gitlab: speed up msys windows jobs with GCC

2023-08-01 Thread Daniel P . Berrangé
On Tue, Aug 01, 2023 at 03:53:22PM +0200, Markus Armbruster wrote:
> Daniel P. Berrangé  writes:
> 
> > This is an alternative and/or complementary to Thomas' proposal
> > to use CLang with msys:
> >
> >   https://lists.gnu.org/archive/html/qemu-devel/2023-07/msg05402.html
> >
> > First of all, the current msys installer we're using is over 12
> > months out of date. Thus after running the install, pacman then
> > replaces most of what we've just installed with new downloaded
> > content. Using the most update installer cuts 3+1/2 minutes off
> > the msys install time - 7 minutes becomes 3+1/2.
> >
> > Secondly, QEMU defaults to compiling with -O2 and this is more
> > computationally expensive for GCC. Switching to -O0 drops the
> > build time from 60 minutes down to 45 minutes.
> 
> From the fine manual[*]: "The effectiveness of some warnings depends on
> optimizations also being enabled.  For example '-Wsuggest-final-types'
> is more effective with link-time optimization and some instances of
> other warnings may not be issued at all unless optimization is enabled.
> While optimization in general improves the efficacy of control and data
> flow sensitive warnings, in some cases it may also cause false
> positives."  Do we care?

In general, yes, we do care.

In this specific case though, we're battling to figure out the lesser
of multiple evils.

Right now we configure with:

  --target-list=x86_64-softmmu --without-default-devices 

and so with optimization enabled, we'll get good warning coverage of
a small amount of code, except we don't because people started
ignoring the msys jobs as they time out too frequently.

If we can use Thomas' clang switch or my -O0 patches, we can get
within the timeout, so people can trust the job once again. If we
can do both ideas and cut the time even more, then we can enable
more features (perhaps drop --without-default-devices).

So the warnings might not be quite as good, but we'll have the
warnings across a larger amount of code.

A lot of the warnings from the Linux/macOS builds will also apply
in the Windows builds. I think on balance I'd probably prefer us
to build a larger amount of code for Windows. This is in context
of free shared runners at least.

As a more drastic option, we might need to consider using the
Azure credits for Windows  runners too. If we could have bigger
VMs for Windows CI, we can build more and have better warnings
at the same time. 

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH] linux-user: Emulate /proc/cpuinfo on aarch64 and arm

2023-08-01 Thread Helge Deller

On 8/1/23 16:22, Peter Maydell wrote:
> On Tue, 1 Aug 2023 at 14:57, Helge Deller  wrote:
>> Add emulation for /proc/cpuinfo for arm architecture.
>> The output below mimics output as seen on debian porterboxes.
>>
>> +#if defined(TARGET_AARCH64) || defined(TARGET_ARM)
>> +static int open_cpuinfo(CPUArchState *cpu_env, int fd)
>> +{
>> +int i, num_cpus;
>> +const int is64 = TARGET_ABI_BITS == 64;
>> +
>> +num_cpus = sysconf(_SC_NPROCESSORS_ONLN);
>> +for (i = 0; i < num_cpus; i++) {
>> +dprintf(fd, "processor\t: %d\n", i);
>> +if (!is64) {
>> +dprintf(fd, "model name\t: ARMv7 Processor rev 2 (v7l)\n");
>> +}
>> +dprintf(fd, "BogoMIPS\t: %d.00\n", is64 ? 100 : 50);
>> +dprintf(fd, "Features\t: %s\n",
>> +is64 ? "fp asimd evtstrm cpuid"
>> + : "half thumb fastmult vfp edsp thumbee vfpv3 " \
>> +   "tls idiva idivt vfpd32 lpae");
>> +dprintf(fd, "CPU implementer\t: 0x%d\n", is64 ? 50 : 56);
>> +dprintf(fd, "CPU architecture: %d\n",is64 ? 8 : 7);
>> +dprintf(fd, "CPU variant\t: 0x%d\n", is64 ? 0 : 2);
>> +dprintf(fd, "CPU part\t: 0x%d\n",is64 ? 0 : 584);
>> +dprintf(fd, "CPU revision\t: %d\n\n",is64 ? 1 : 2);
>
> If you want to do this you should hook it up to what the
> CPU being emulated actually is and what features it has.
> (Compare how we set the hwcaps.)

Nice!
I didn't know about those elf hwcaps...
I'll respin.

Thanks,
Helge




Re: [PATCH] linux-user: Emulate /proc/cpuinfo on aarch64 and arm

2023-08-01 Thread Peter Maydell
On Tue, 1 Aug 2023 at 15:41, Helge Deller  wrote:
>
> On 8/1/23 16:22, Peter Maydell wrote:
> > On Tue, 1 Aug 2023 at 14:57, Helge Deller  wrote:
> >>
> >> Add emulation for /proc/cpuinfo for arm architecture.
> >> The output below mimics output as seen on debian porterboxes.
> >
> >
> >> +#if defined(TARGET_AARCH64) || defined(TARGET_ARM)
> >> +static int open_cpuinfo(CPUArchState *cpu_env, int fd)
> >> +{
> >> +int i, num_cpus;
> >> +const int is64 = TARGET_ABI_BITS == 64;
> >> +
> >> +num_cpus = sysconf(_SC_NPROCESSORS_ONLN);
> >> +for (i = 0; i < num_cpus; i++) {
> >> +dprintf(fd, "processor\t: %d\n", i);
> >> +if (!is64) {
> >> +dprintf(fd, "model name\t: ARMv7 Processor rev 2 (v7l)\n");
> >> +}
> >> +dprintf(fd, "BogoMIPS\t: %d.00\n", is64 ? 100 : 50);
> >> +dprintf(fd, "Features\t: %s\n",
> >> +is64 ? "fp asimd evtstrm cpuid"
> >> + : "half thumb fastmult vfp edsp thumbee vfpv3 " \
> >> +   "tls idiva idivt vfpd32 lpae");
> >> +dprintf(fd, "CPU implementer\t: 0x%d\n", is64 ? 50 : 56);
> >> +dprintf(fd, "CPU architecture: %d\n",is64 ? 8 : 7);
> >> +dprintf(fd, "CPU variant\t: 0x%d\n", is64 ? 0 : 2);
> >> +dprintf(fd, "CPU part\t: 0x%d\n",is64 ? 0 : 584);
> >> +dprintf(fd, "CPU revision\t: %d\n\n",is64 ? 1 : 2);
> >
> > If you want to do this you should hook it up to what the
> > CPU being emulated actually is and what features it has.
> > (Compare how we set the hwcaps.)
>
> Nice!
> I didn't know about those elf hwcaps...

In an ideal world guest code should only look at the
hwcaps (which are the ABI-stable interface the kernel
provides), not at what's in /proc/cpuinfo. But of
course if you're a shell script then cpuinfo is a lot
easier to deal with...

thanks
-- PMM
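
For reference, the hwcaps Peter mentions reach guest code through the ELF auxiliary vector rather than through /proc/cpuinfo. A minimal, hedged sketch of querying it on Linux (illustrative only, not code from the patch):

```c
#include <assert.h>
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP, AT_PAGESZ */
#include <unistd.h>     /* sysconf() */

/*
 * Return the hardware-capability bitmask that the kernel (or, under
 * qemu-user, the emulator) advertises in the auxiliary vector.  On
 * AArch64 the bits correspond to HWCAP_FP, HWCAP_ASIMD and friends,
 * so feature detection never needs to parse /proc/cpuinfo.
 */
static unsigned long query_hwcaps(void)
{
    return getauxval(AT_HWCAP);
}

/* AT_PAGESZ travels the same way and is easy to cross-check. */
static long auxv_page_size(void)
{
    return (long)getauxval(AT_PAGESZ);
}
```

Shell scripts that cannot call getauxval() are exactly the consumers that fall back to reading /proc/cpuinfo, which is why the emulated file still matters.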



Re: [RFC PATCH 08/19] HostMem: Add private property to indicate to use kvm gmem

2023-08-01 Thread Xiaoyao Li

On 8/1/2023 1:22 AM, Markus Armbruster wrote:
> Xiaoyao Li  writes:
>
>> From: Isaku Yamahata 
>>
>> Signed-off-by: Isaku Yamahata 
>> Signed-off-by: Xiaoyao Li 
>
> [...]
>
>> diff --git a/qapi/qom.json b/qapi/qom.json
>> index 7f92ea43e8e1..e0b2044e3d20 100644
>> --- a/qapi/qom.json
>> +++ b/qapi/qom.json
>> @@ -605,6 +605,9 @@
>>  # @reserve: if true, reserve swap space (or huge pages) if applicable
>>  # (default: true) (since 6.1)
>>  #
>> +# @private: if true, use KVM gmem private memory
>> +#   (default: false) (since 8.1)
>> +#
>
> Please format like
>
>    # @private: if true, use KVM gmem private memory (default: false)
>    #     (since 8.1)
>
> to blend in with recent commit a937b6aa739 (qapi: Reformat doc comments
> to conform to current conventions).

will do it in next version.

Thanks!
-Xiaoyao

>>  # @size: size of the memory region in bytes
>>  #
>>  # @x-use-canonical-path-for-ramblock-id: if true, the canonical path
>> @@ -631,6 +634,7 @@
>>  '*prealloc-context': 'str',
>>  '*share': 'bool',
>>  '*reserve': 'bool',
>> +'*private': 'bool',
>>  'size': 'size',
>>  '*x-use-canonical-path-for-ramblock-id': 'bool' } }







Re: [PATCH] migration/calc-dirty-rate: millisecond precision period

2023-08-01 Thread gudkov . andrei--- via
On Mon, Jul 31, 2023 at 04:06:24PM -0400, Peter Xu wrote:
> Hi, Andrei,
> 
> On Mon, Jul 31, 2023 at 05:51:49PM +0300, gudkov.and...@huawei.com wrote:
> > On Mon, Jul 17, 2023 at 03:08:37PM -0400, Peter Xu wrote:
> > > On Tue, Jul 11, 2023 at 03:38:18PM +0300, gudkov.and...@huawei.com wrote:
> > > > On Thu, Jul 06, 2023 at 03:23:43PM -0400, Peter Xu wrote:
> > > > > On Thu, Jun 29, 2023 at 11:59:03AM +0300, Andrei Gudkov wrote:
> > > > > > Introduces an alternative argument calc-time-ms, which is
> > > > > > the same as calc-time but accepts a millisecond value.
> > > > > > Millisecond precision makes it possible to predict whether
> > > > > > migration will succeed or not. To do this, calculate dirty
> > > > > > rate with calc-time-ms set to max allowed downtime, convert
> > > > > > measured rate into volume of dirtied memory, and divide by
> > > > > > network throughput. If the value is lower than max allowed
> > > > > > downtime, then migration will converge.
> > > > > > 
> > > > > > Measurement results for single thread randomly writing to
> > > > > > a 24GiB region:
> > > > > > +--------------+--------------------+
> > > > > > | calc-time-ms | dirty-rate (MiB/s) |
> > > > > > +--------------+--------------------+
> > > > > > |          100 |               1880 |
> > > > > > |          200 |               1340 |
> > > > > > |          300 |               1120 |
> > > > > > |          400 |               1030 |
> > > > > > |          500 |                868 |
> > > > > > |          750 |                720 |
> > > > > > |         1000 |                636 |
> > > > > > |         1500 |                498 |
> > > > > > |         2000 |                423 |
> > > > > > +--------------+--------------------+
> > > > > 
> > > > > Do you mean the dirty workload is constant?  Why it differs so much
> > > > > with different calc-time-ms?
> > > > 
> > > > Workload is as constant as it could be. But the naming is misleading.
> > > > What is named "dirty-rate" in fact is not "rate" at all.
> > > > calc-dirty-rate measures number of *uniquely* dirtied pages, i.e. each
> > > > page can contribute to the counter only once during measurement period.
> > > > That's why the values are decreasing. Consider also the ad infinitum
> > > > argument: since the VM has a fixed number of pages and each page can
> > > > be dirtied only once,
> > > > dirty-rate = number-of-dirtied-pages / calc-time -> 0 as calc-time -> inf.
> > > > It would make more sense to report number as "dirty-volume" --
> > > > without dividing it by calc-time.
> > > > 
> > > > Note that number of *uniquely* dirtied pages in given amount of time is
> > > > exactly what we need for doing migration-related predictions. There is
> > > > no error here.
> > > 
> > > Is calc-time-ms the duration of the measurement?
> > > 
> > > Taking the 1st line as example, 1880MB/s * 0.1s = 188MB.
> > > For the 2nd line, 1340MB/s * 0.2s = 268MB.
> > > Even for the longest duration of 2s, that's 846MB in total.
> > > 
> > > The range is 24GB.  In this case, most of the pages should only be written
> > > once even if random for all these test durations, right?
> > > 
> > 
> > Yes, I messed with load generator.
> > The effective memory region was much smaller than 24GiB.
> > I performed more testing (after fixing load generator),
> > now with different memory sizes and different modes.
> > 
> > +--------------+-----------------------------------------------+
> > | calc-time-ms |                dirty rate MiB/s               |
> > |              +----------------+---------------+--------------+
> > |              |  theoretical   | page-sampling | dirty-bitmap |
> > |              | (at 3M wr/sec) |               |              |
> > +--------------+----------------+---------------+--------------+
> > |                             1GiB                             |
> > +--------------+----------------+---------------+--------------+
> > |          100 |           6996 |          7100 |         3192 |
> > |          200 |           4606 |          4660 |         2655 |
> > |          300 |           3305 |          3280 |         2371 |
> > |          400 |           2534 |          2525 |         2154 |
> > |          500 |           2041 |          2044 |         1871 |
> > |          750 |           1365 |          1341 |         1358 |
> > |         1000 |           1024 |          1052 |         1025 |
> > |         1500 |            683 |           678 |          684 |
> > |         2000 |            512 |           507 |          513 |
> > +--------------+----------------+---------------+--------------+
> > |                             4GiB                             |
> > +--------------+----------------+---------------+--------------+
> > |          100 |          10232 |          8880 |         4070 |
> > |          200 |           8954 |          8049 |         3195 |
> > |          300 |           7889 |          7193 |         2881 |
> > |          400 |
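
The effect described above, where each page is counted only once so the reported "rate" falls as the measurement window grows, can be reproduced with a toy model (an editorial sketch with made-up generator parameters, not QEMU code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Simulate `writes_per_sec` random writes into `num_pages` pages for
 * `ms` milliseconds and return the "dirty rate" the way calc-dirty-rate
 * defines it: uniquely dirtied pages divided by the window length.
 * Because each page contributes at most once, longer windows report
 * lower rates even though the write workload is constant.
 */
static double toy_dirty_rate(unsigned num_pages, unsigned writes_per_sec,
                             unsigned ms, unsigned seed)
{
    bool *dirty = calloc(num_pages, sizeof(*dirty));
    unsigned long writes = (unsigned long)writes_per_sec * ms / 1000;
    unsigned long unique = 0;

    srand(seed);
    for (unsigned long i = 0; i < writes; i++) {
        unsigned page = (unsigned)(rand() % num_pages);
        if (!dirty[page]) {
            dirty[page] = true;
            unique++;
        }
    }
    free(dirty);
    return (double)unique / ((double)ms / 1000.0); /* pages per second */
}
```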

Re: [RFC PATCH 08/19] HostMem: Add private property to indicate to use kvm gmem

2023-08-01 Thread Daniel P . Berrangé
On Mon, Jul 31, 2023 at 07:22:05PM +0200, Markus Armbruster wrote:
> Xiaoyao Li  writes:
> 
> > From: Isaku Yamahata 
> >
> > Signed-off-by: Isaku Yamahata 
> > Signed-off-by: Xiaoyao Li 
> 
> [...]
> 
> > diff --git a/qapi/qom.json b/qapi/qom.json
> > index 7f92ea43e8e1..e0b2044e3d20 100644
> > --- a/qapi/qom.json
> > +++ b/qapi/qom.json
> > @@ -605,6 +605,9 @@
> >  # @reserve: if true, reserve swap space (or huge pages) if applicable
> >  # (default: true) (since 6.1)
> >  #
> > +# @private: if true, use KVM gmem private memory
> > +#   (default: false) (since 8.1)
> > +#
> 
> Please format like
> 
># @private: if true, use KVM gmem private memory (default: false)
># (since 8.1)

Also QEMU 8.1.0 is in freeze right now, so there's no chance
of these patches making 8.1.0. IOW, use "since 8.2" as the
next release you might achieve merge for.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




[PATCH] linux-user: Fix openat() emulation to correctly detect accesses to /proc

2023-08-01 Thread Helge Deller
In qemu we catch accesses to files like /proc/cpuinfo or /proc/net/route
and return to the guest the contents which would be visible on a real
system (instead of what the host would show).

This patch fixes a bug, where for example the accesses
cat /proccpuinfo
or
cd /proc && cat cpuinfo
will not be recognized by qemu and where qemu will wrongly show
the contents of the host's /proc/cpuinfo file.

Signed-off-by: Helge Deller 

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 917c388073..bb864c2bb3 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8531,9 +8531,11 @@ static int open_cpuinfo(CPUArchState *cpu_env, int fd)
 }
 #endif

-int do_guest_openat(CPUArchState *cpu_env, int dirfd, const char *pathname,
+int do_guest_openat(CPUArchState *cpu_env, int dirfd, const char *fname,
 int flags, mode_t mode, bool safe)
 {
+char proc_name[PATH_MAX];
+const char *pathname;
 struct fake_open {
 const char *filename;
 int (*fill)(CPUArchState *cpu_env, int fd);
@@ -8560,6 +8562,13 @@ int do_guest_openat(CPUArchState *cpu_env, int dirfd, const char *pathname,
 { NULL, NULL, NULL }
 };

+/* if this is a file from /proc/ filesystem, expand full name */
+if (realpath(fname, proc_name) && strncmp(proc_name, "/proc/", 6) == 0) {
+pathname = proc_name;
+} else {
+pathname = fname;
+}
+
 if (is_proc_myself(pathname, "exe")) {
 if (safe) {
 return safe_openat(dirfd, exec_path, flags, mode);



Re: [PATCH v2 0/9] gfxstream + rutabaga_gfx

2023-08-01 Thread Alyssa Ross
I was able to replicate my existing crosvm cross-domain testing setup
using QEMU, and all worked.  I didn't test any other capsets.

Tested-by: Alyssa Ross 


signature.asc
Description: PGP signature


Re: [PATCH v2 9/9] docs/system: add basic virtio-gpu documentation

2023-08-01 Thread Alyssa Ross
Gurchetan Singh  writes:

> +virtio-gpu rutabaga
> +-------------------
> +
> +virtio-gpu can also leverage `rutabaga_gfx`_ to provide `gfxstream`_ rendering
> +and `Wayland display passthrough`_.  With the gfxstream rendering mode, GLES
> +and Vulkan calls are forwarded directly to the host with minimal modification.
> +
> +The crosvm book provides directions on how to build a `gfxstream-enabled
> +rutabaga`_ and launch a `guest Wayland compositor`_.
> +
> +This device does require host blob support (``hostmem`` field below), but not
> +all capsets (``capset_names`` below) have to be enabled when starting the device.

A more thorough description of what hostmem does, and how to determine
what value it should have, would be very welcome.




Re: [PATCH] target/nios2: Pass semihosting arg to exit

2023-08-01 Thread Keith Packard via

> says that for HOSTED_EXIT the exit code is in r5,
> not in a parameter block pointed to by r5. That
> would imply that the correct change is to use
> R_ARG1 rather than R_ARG0 here.

Ah, thanks -- I hadn't managed to find the actual standard yet. I'll
resubmit with that fixed.

-- 
-keith




Re: [PATCH v2 6/9] gfxstream + rutabaga: add initial support for gfxstream

2023-08-01 Thread Alyssa Ross
Gurchetan Singh  writes:

> +static int virtio_gpu_rutabaga_init(VirtIOGPU *g, Error **errp)
> +{
> +int result;
> +uint64_t capset_mask;
> +struct rutabaga_channels channels = { 0 };
> +struct rutabaga_builder builder = { 0 };
> +
> +VirtIOGPURutabaga *vr = VIRTIO_GPU_RUTABAGA(g);
> +vr->rutabaga = NULL;
> +
> +if (!vr->capset_names) {
> +error_setg(errp, "a capset name from virtio-gpu spec");
> +return -EINVAL;
> +}
> +
> +builder.wsi = RUTABAGA_WSI_SURFACELESS;
> +/*
> + * Currently, if WSI is specified, the only valid strings are "surfaceless"
> + * or "headless".  Surfaceless doesn't create a native window surface, but
> + * does copy from the render target to the Pixman buffer if a virtio-gpu
> + * 2D hypercall is issued.  Surfaceless is the default.
> + *
> + * Headless is like surfaceless, but doesn't copy to the Pixman buffer. The
> + * use case is automated testing environments where there is no need to view
> + * results.
> + *
> + * In the future, more performant virtio-gpu 2D UI integration may be added.
> + */
> +if (vr->wsi) {
> +if (g_str_equal(vr->wsi, "surfaceless")) {
> +vr->headless = false;
> +} else if (g_str_equal(vr->wsi, "headless")) {
> +vr->headless = true;
> +} else {
> +error_setg(errp, "invalid wsi option selected");
> +return -EINVAL;
> +}
> +}
> +
> +result = rutabaga_calculate_capset_mask(vr->capset_names, &capset_mask);
> +if (result) {
> +error_setg(errp, "invalid capset names: %s", vr->capset_names);
> +return result;
> +}
> +
> +builder.fence_cb = virtio_gpu_rutabaga_fence_cb;
> +builder.debug_cb = virtio_gpu_rutabaga_debug_cb;
> +builder.capset_mask = capset_mask;
> +
> +/*
> + * Using GPOINTER_TO_UINT(g) below causes segfaults.
> + */
> +builder.user_data =  (uint64_t)(uintptr_t *)(void *)g;
> +
> +if (vr->wayland_socket_path) {
> +if ((builder.capset_mask & (1 << RUTABAGA_CAPSET_CROSS_DOMAIN)) == 0) {
> +error_setg(errp, "cross-domain required with wayland socket");
> +return -EINVAL;
> +}
> +
> +channels.channels = g_new0(struct rutabaga_channel, 1);
> +channels.num_channels = 1;
> +channels.channels[0].channel_name = vr->wayland_socket_path;
> +channels.channels[0].channel_type = RUTABAGA_CHANNEL_TYPE_WAYLAND;
> +builder.channels = &channels;
> +}
> +

Would it be feasible to identify whether Wayland should be used in some
other way, to avoid users having to manually specify the socket path and
allow the standard wl_display_connect() logic to be used?

> +result = rutabaga_init(&builder, &vr->rutabaga);
> +if (builder.capset_mask & (1 << RUTABAGA_CAPSET_CROSS_DOMAIN)) {
> +g_free(channels.channels);
> +}
> +
> +return result;
> +}
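
For context, the "standard wl_display_connect() logic" Alyssa refers to resolves the compositor socket from the environment; a hedged sketch of that convention ($WAYLAND_DISPLAY under $XDG_RUNTIME_DIR, defaulting to "wayland-0"):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build the socket path a Wayland client would connect to, following
 * the convention wl_display_connect(NULL) uses: $WAYLAND_DISPLAY (or
 * "wayland-0" if unset) relative to $XDG_RUNTIME_DIR.  Returns the
 * number of characters written, or -1 if XDG_RUNTIME_DIR is missing.
 */
static int wayland_socket_path(char *buf, size_t len)
{
    const char *runtime = getenv("XDG_RUNTIME_DIR");
    const char *display = getenv("WAYLAND_DISPLAY");

    if (!runtime) {
        return -1;  /* no runtime dir: a client cannot guess the path */
    }
    if (!display) {
        display = "wayland-0";
    }
    /* An absolute WAYLAND_DISPLAY is used as-is by libwayland. */
    if (display[0] == '/') {
        return snprintf(buf, len, "%s", display);
    }
    return snprintf(buf, len, "%s/%s", runtime, display);
}
```

With that convention, a frontend could in principle default the socket path automatically instead of requiring the user to spell it out.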




Re: [PATCH] target/nios2: Pass semihosting arg to exit

2023-08-01 Thread Peter Maydell
On Tue, 1 Aug 2023 at 16:10, Keith Packard  wrote:
>
>
> > says that for HOSTED_EXIT the exit code is in r5,
> > not in a parameter block pointed to by r5. That
> > would imply that the correct change is to use
> > R_ARG1 rather than R_ARG0 here.
>
> Ah, thanks -- I hadn't managed to find the actual standard yet.

Yeah, the closest to a "standard" we have for nios2 is that
I asked the Codesourcery folks to document it in the libgloss
sources and put the URL to it in a comment at the top of
nios2-semi.c, given that there's no official spec and the
original and main guest-side user is libgloss.
m68k is in a similar position only without the URL
in our source file :-)

-- PMM



Rutabaga backwards compatibility

2023-08-01 Thread Alyssa Ross
Gurchetan Singh  writes:

> On Mon, Jul 24, 2023 at 2:56 AM Alyssa Ross  wrote:
>>
>> Gurchetan Singh  writes:
>>
>> > In terms of API stability/versioning/packaging, once this series is
>> > reviewed, the plan is to cut a "gfxstream upstream release branch".  We
>> > will have the same API guarantees as any other QEMU project then, i.e no
>> > breaking API changes for 5 years.
>>
>> What about Rutabaga?
>
> Yes, rutabaga + gfxstream will both be versioned and maintain API
> backwards compatibility in line with QEMU guidelines.

In that case, I should draw your attention to
, which I've just realised while testing v2
of your series here breaks the build of the rutabaga ffi, and which will
require the addition of a "prot" field to struct rutabaga_handle (a
breaking change).  I'll push a new version of that CL to fix rutabaga
ffi in the next few days.

Since this is already coming up, before the release has even been made,
is it worth exploring how to limit the rutabaga API to avoid more
breaking changes after the release?  Could there be more use of opaque
structs, for example?

(CCing the crosvm list)




[PATCH] target/nios2: Pass semihosting arg to exit

2023-08-01 Thread Keith Packard via
Instead of using R_ARG0 (the semihost function number), use R_ARG1
(the provided exit status).

Signed-off-by: Keith Packard 
---
 target/nios2/nios2-semi.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/nios2/nios2-semi.c b/target/nios2/nios2-semi.c
index 3738774976..f3b7aee4f1 100644
--- a/target/nios2/nios2-semi.c
+++ b/target/nios2/nios2-semi.c
@@ -133,8 +133,8 @@ void do_nios2_semihosting(CPUNios2State *env)
 args = env->regs[R_ARG1];
 switch (nr) {
 case HOSTED_EXIT:
-gdb_exit(env->regs[R_ARG0]);
-exit(env->regs[R_ARG0]);
+gdb_exit(env->regs[R_ARG1]);
+exit(env->regs[R_ARG1]);
 
 case HOSTED_OPEN:
 GET_ARG(0);
-- 
2.40.1




Re: [PATCH] target/nios2: Pass semihosting arg to exit

2023-08-01 Thread Peter Maydell
On Tue, 1 Aug 2023 at 16:23, Keith Packard via  wrote:
>
> Instead of using R_ARG0 (the semihost function number), use R_ARG1
> (the provided exit status).
>
> Signed-off-by: Keith Packard 
> ---
>  target/nios2/nios2-semi.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/target/nios2/nios2-semi.c b/target/nios2/nios2-semi.c
> index 3738774976..f3b7aee4f1 100644
> --- a/target/nios2/nios2-semi.c
> +++ b/target/nios2/nios2-semi.c
> @@ -133,8 +133,8 @@ void do_nios2_semihosting(CPUNios2State *env)
>  args = env->regs[R_ARG1];
>  switch (nr) {
>  case HOSTED_EXIT:
> -gdb_exit(env->regs[R_ARG0]);
> -exit(env->regs[R_ARG0]);
> +gdb_exit(env->regs[R_ARG1]);
> +exit(env->regs[R_ARG1]);
>

Reviewed-by: Peter Maydell 

thanks
-- PMM



Re: [PATCH] target/nios2: Pass semihosting arg to exit

2023-08-01 Thread Keith Packard via

> Yeah, the closest to a "standard" we have for nios2 is that
> I asked the Codesourcery folks to document it in the libgloss
> sources and put the URL to it in a comment at the top of
> nios2-semi.c, given that there's no official spec and the
> original and main guest-side user is libgloss.
> m68k is in a similar position only without the URL
> in our source file :-)

Yeah, we had the same ask when getting risc-v semihosting merged. For
that, I actually pushed through an "official" risc-v standard. I kinda
wish I'd used the m68k model instead of the arm model as that provides
simple POSIX semantics...

https://github.com/riscv-software-src/riscv-semihosting/blob/main/riscv-semihosting-spec.adoc

-- 
-keith




Re: [PATCH] target/nios2: Pass semihosting arg to exit

2023-08-01 Thread Peter Maydell
On Tue, 1 Aug 2023 at 16:28, Keith Packard  wrote:
>
>
> > Yeah, the closest to a "standard" we have for nios2 is that
> > I asked the Codesourcery folks to document it in the libgloss
> > sources and put the URL to it in a comment at the top of
> > nios2-semi.c, given that there's no official spec and the
> > original and main guest-side user is libgloss.
> > m68k is in a similar position only without the URL
> > in our source file :-)
>
> Yeah, we had the same ask when getting risc-v semihosting merged. For
> that, I actually pushed through an "official" risc-v standard. I kinda
> wish I'd used the m68k model instead of the arm model as that provides
> simple POSIX semantics...

Yeah, there's a lot of stuff in the Arm semihosting API
that I wouldn't put in there for a from-new version
that didn't need to handle legacy guests. Notably the
"errno" concept is badly underspecified and/or broken.
On an architecture with a decent number of registers
it would probably also make sense to use them rather
than having every single call put its args in memory.

thanks
-- PMM



Re: [PATCH v3 00/17] Support smp.clusters for x86

2023-08-01 Thread Jonathan Cameron via
On Tue,  1 Aug 2023 18:35:10 +0800
Zhao Liu  wrote:

> From: Zhao Liu 
> 
> Hi list,
> 
> This is the our v3 patch series, rebased on the master branch at the
> commit 234320cd0573 ("Merge tag 'pull-target-arm-20230731' of https:
> //git.linaro.org/people/pmaydell/qemu-arm into staging").
> 
> Compared with v2 [1], v3 mainly adds "Tested-by", "Reviewed-by" and
> "Acked-by" (for PC related patches) tags and minor code changes (please
> see the changelog).
> 
> 
> # Introduction
> 
> This series adds cluster support for the x86 PC machine, which allows
> x86 to use smp.clusters to configure the x86 module level CPU topology.
> 
> And due to the compatibility issue (see section: ## Why not share L2
> cache in cluster directly), this series also introduces a new command
> to adjust the x86 L2 cache topology.
> 
> Welcome your comments!
> 
> 
> # Background
> 
> The "clusters" parameter in "smp" is introduced by ARM [2], but x86
> hasn't supported it.
> 
> At present, x86 defaults to the L2 cache being shared in one core, but
> this is not enough. There are platforms where multiple cores share the
> same L2 cache, e.g., Alder Lake-P shares an L2 cache per module of
> Atom cores [3], that is, every four Atom cores share one L2 cache.
> Therefore, we need the new CPU topology level (cluster/module).
> 
> Another reason is hybrid architecture. Cluster support not only
> provides another level of topology definition in x86, but also
> provides the code changes required for our future hybrid topology support.
> 
> 
> # Overview
> 
> ## Introduction of module level for x86
> 
> "cluster" in smp is the CPU topology level which is between "core" and
> die.
> 
> For x86, the "cluster" in smp corresponds to the module level [4],
> which is above the core level. So we use "module" rather than "cluster"
> in the x86 code.
> 
> And please note that x86 already has a cpu topology level also named
> "cluster" [4]; this level is above the package level. Here,
> the cluster in x86 cpu topology is completely different from the
> "clusters" as the smp parameter. After the module level is introduced,
> the cluster as the smp parameter will actually refer to the module level
> of x86.
> 
> 
> ## Why not share L2 cache in cluster directly
> 
> Though "clusters" was introduced to help define L2 cache topology
> [2], using cluster to define x86's L2 cache topology will cause the
> compatibility problem:
> 
> Currently, x86 defaults to the L2 cache being shared in one core, which
> actually implies a default setting "cores per L2 cache is 1" and
> therefore implicitly defaults to having as many L2 caches as cores.
> 
> For example (i386 PC machine):
> -smp 16,sockets=2,dies=2,cores=2,threads=2,maxcpus=16 (*)
> 
> Considering the topology of the L2 cache, this (*) implicitly means "1
> core per L2 cache" and "2 L2 caches per die".
> 
> If we use cluster to configure L2 cache topology with the new default
> setting "clusters per L2 cache is 1", the above semantics will change
> to "2 cores per cluster" and "1 cluster per L2 cache", that is, "2
> cores per L2 cache".
> 
> So the same command (*) will cause changes in the L2 cache topology,
> further affecting the performance of the virtual machine.
> 
> Therefore, x86 should only treat cluster as a cpu topology level and
> avoid using it to change L2 cache by default for compatibility.
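
The compatibility arithmetic above can be restated as a one-line helper (an editorial sketch that just encodes the cover letter's example, not QEMU code):

```c
#include <assert.h>

/*
 * L2 caches per die implied by an -smp configuration, given how many
 * cores share one L2 cache.  With the current x86 default of one core
 * per L2, the example
 *     -smp 16,sockets=2,dies=2,cores=2,threads=2
 * (2 cores per die) implies 2 L2 caches per die; if a 2-core cluster
 * shared an L2 instead, the same command line would imply only 1.
 */
static unsigned l2_caches_per_die(unsigned cores_per_die,
                                  unsigned cores_per_l2)
{
    return cores_per_die / cores_per_l2;
}
```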
> 
> 
> ## module level in CPUID
> 
> Currently, we don't expose module level in CPUID.1FH because currently
> linux (v6.2-rc6) doesn't support module level. And exposing module and
> die levels at the same time in CPUID.1FH will cause linux to calculate
> wrong die_id. The module level should not be exposed until a real machine
> has the module level in CPUID.1FH.
> 
> We can configure CPUID.04H.02H (L2 cache topology) with module level by
> a new command:
> 
> "-cpu,x-l2-cache-topo=cluster"
> 
> More information about this command, please see the section: "## New
> property: x-l2-cache-topo".
> 
> 
> ## New cache topology info in CPUCacheInfo
> 
> Currently, by default, the cache topology is encoded as:
> 1. i/d cache is shared in one core.
> 2. L2 cache is shared in one core.
> 3. L3 cache is shared in one die.
> 
> This default general setting has caused a misunderstanding, that is, the
> cache topology is completely equated with a specific cpu topology, such
> as the connection between L2 cache and core level, and the connection
> between L3 cache and die level.
> 
> In fact, the settings of these topologies depend on the specific
> platform and are not static. For example, on Alder Lake-P, every
> four Atom cores share the same L2 cache [2].
> 
> Thus, in this patch set, we explicitly define the corresponding cache
> topology for different cpu models and this has two benefits:
> 1. Easy to expand to new CPU models in the future, which may have
>    different cache topologies.
> 2. It can easily support custom cache topologies via a command (e.g.,
>    x-l2-cache-topo).
> 
> 
> ## New property: x-l2-cache-

[PATCH] gdbstub: use 0 ("any process") on packets with no PID

2023-08-01 Thread Matheus Tavares Bernardino
Previously, qemu-user would always report PID 1 to GDB. This was changed
at dc14a7a6e9 (gdbstub: Report the actual qemu-user pid, 2023-06-30),
but read_thread_id() still considers GDB packets with "no PID" as "PID
1", which is not the qemu-user PID. Fix that by parsing "no PID" as "0",
which the GDB Remote Protocol defines as "any process".

Note that this should have no effect for system emulation as, in this
case, gdb_create_default_process() will assign PID 1 for the first
process and that is what the gdbstub uses for GDB requests with no PID,
or PID 0.

This issue was found with hexagon-lldb, which sends a "Hq" packet with
only the thread-id, but no process-id, leading to the invalid usage of
"PID 1" by qemu-hexagon and a subsequent "E22" reply.

Signed-off-by: Matheus Tavares Bernardino 
---
 gdbstub/gdbstub.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index ce8b42eb15..e74ecc78cc 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -537,7 +537,7 @@ static GDBThreadIdKind read_thread_id(const char *buf, const char **end_buf,
 /* Skip '.' */
 buf++;
 } else {
-p = 1;
+p = 0;
 }
 
 ret = qemu_strtoul(buf, &buf, 16, &t);
-- 
2.37.2
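
The remote-protocol convention the patch leans on (pid/tid -1 = "all", 0 = "any", otherwise a specific id) can be sketched for the pid half of a thread-id packet (illustrative and simplified relative to gdbstub's read_thread_id()):

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Parse the process part of a GDB thread-id.  Accepted forms:
 *   "p<pid hex>.<tid hex>"  -> explicit pid (or "p-1..." for all)
 *   "<tid hex>"             -> no pid given; per the GDB remote
 *                              protocol this means pid 0, "any
 *                              process" (the value the stub used to
 *                              misreport as 1).
 * Returns the pid, with 0 meaning "any" and -1 meaning "all".
 */
static long parse_packet_pid(const char *buf)
{
    if (buf[0] == 'p') {
        if (buf[1] == '-' && buf[2] == '1') {
            return -1;                    /* "p-1.…": all processes */
        }
        return strtol(buf + 1, NULL, 16); /* explicit pid */
    }
    return 0;                             /* no pid: any process */
}
```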




Re: qemu-img cache modes with Linux cgroup v1

2023-08-01 Thread Stefan Hajnoczi
Hi Daniel,
I agree with your points.

Stefan




[PATCH for-8.1] target/m68k: Fix semihost lseek offset computation

2023-08-01 Thread Peter Maydell
The arguments for deposit64 are (value, start, length, fieldval); this
appears to have thought they were (value, fieldval, start,
length). Reorder the parameters to match the actual function.

Cc: qemu-sta...@nongnu.org
Fixes: 950272506d ("target/m68k: Use semihosting/syscalls.h")
Reported-by: Philippe Mathieu-Daudé 
Signed-off-by: Peter Maydell 
---
Same fix for m68k as Keith Packard just sent for nios2
---
 target/m68k/m68k-semi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/m68k/m68k-semi.c b/target/m68k/m68k-semi.c
index 88ad9ba8144..239f6e44e90 100644
--- a/target/m68k/m68k-semi.c
+++ b/target/m68k/m68k-semi.c
@@ -166,7 +166,7 @@ void do_m68k_semihosting(CPUM68KState *env, int nr)
 GET_ARG64(2);
 GET_ARG64(3);
 semihost_sys_lseek(cs, m68k_semi_u64_cb, arg0,
-   deposit64(arg2, arg1, 32, 32), arg3);
+   deposit64(arg2, 32, 32, arg1), arg3);
 break;
 
 case HOSTED_RENAME:
-- 
2.34.1




[PATCH] target/m68k: Add URL to semihosting spec

2023-08-01 Thread Peter Maydell
The spec for m68k semihosting is documented in the libgloss
sources. Add a comment with the URL for it, as we already
have for nios2 semihosting.

Signed-off-by: Peter Maydell 
---
 target/m68k/m68k-semi.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/target/m68k/m68k-semi.c b/target/m68k/m68k-semi.c
index 239f6e44e90..80cd8d70dbb 100644
--- a/target/m68k/m68k-semi.c
+++ b/target/m68k/m68k-semi.c
@@ -15,6 +15,10 @@
  *
  *  You should have received a copy of the GNU General Public License
  *  along with this program; if not, see .
+ *
+ *  The semihosting protocol implemented here is described in the
+ *  libgloss sources:
+ *  https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=libgloss/m68k/m68k-semi.txt;hb=HEAD
  */
 
 #include "qemu/osdep.h"
-- 
2.34.1




[PATCH 2/2] block/blkio: add more comments on the fd passing handling

2023-08-01 Thread Stefano Garzarella
As Hanna pointed out, it is not clear in the code why qemu_open()
can fail, and why blkio_set_int("fd") is not enough to discover
the `fd` property support.

Let's fix them by adding more details in the code comments.

Suggested-by: Hanna Czenczek 
Signed-off-by: Stefano Garzarella 
---
 block/blkio.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/block/blkio.c b/block/blkio.c
index 2d53a865e7..848b8189d0 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -713,6 +713,12 @@ static int blkio_virtio_blk_connect(BlockDriverState *bs, QDict *options,
  */
 fd = qemu_open(path, O_RDWR, NULL);
 if (fd < 0) {
+/*
+ * qemu_open() can fail if the user specifies a path that is not
+ * a file or device, for example in the case of Unix Domain Socket
+ * for the virtio-blk-vhost-user driver. In such cases let's have
+ * libblkio open the path directly.
+ */
 fd_supported = false;
 } else {
 ret = blkio_set_int(s->blkio, "fd", fd);
@@ -734,9 +740,12 @@ static int blkio_virtio_blk_connect(BlockDriverState *bs, QDict *options,
 
 ret = blkio_connect(s->blkio);
 /*
- * If the libblkio driver doesn't support the `fd` property, blkio_connect()
- * will fail with -EINVAL. So let's try calling blkio_connect() again by
- * directly setting `path`.
+ * Before https://gitlab.com/libblkio/libblkio/-/merge_requests/208
+ * (libblkio <= v1.3.0), setting the `fd` property is not enough to check
+ * whether the driver supports the `fd` property or not. In that case,
+ * blkio_connect() will fail with -EINVAL.
+ * So let's try calling blkio_connect() again by directly setting `path`
+ * to cover this scenario.
  */
 if (fd_supported && ret == -EINVAL) {
 fd_supported = false;
-- 
2.41.0




Re: [PATCH v5 01/11] hw: arm: Add bananapi M2-Ultra and allwinner-r40 support

2023-08-01 Thread Peter Maydell
On Sat, 24 Jun 2023 at 16:02, Guenter Roeck  wrote:
>
> On 6/24/23 07:23, Guenter Roeck wrote:
> > On 6/24/23 03:40, Peter Maydell wrote:
> >> On Fri, 23 Jun 2023 at 20:33, Guenter Roeck  wrote:
> >>>
> >>> On 6/23/23 10:44, Peter Maydell wrote:
>  On Sat, 17 Jun 2023 at 17:29, Guenter Roeck  wrote:
> > Main problem is that the SD card gets instantiated randomly to
> > mmc0, mmc1, or mmc2, making it all but impossible to specify a
> > root file system device. The non-instantiated cards are always
> > reported as non-removable, including mmc0. Example:
> >
> > mmc0: Failed to initialize a non-removable card
> 
>  Do you mean that QEMU randomly connects the SD card to
>  a different MMC controller each time, or that Linux is
>  randomly assigning mmc0 to a different MMC controller each
>  time ?
> 
> >>>
> >>> Good question. Given the workaround (fix ?) I suggested is
> >>> in the devicetree file, I would assume it is the latter. I suspect
> >>> that Linux assigns drive names based on hardware detection order,
> >>> and that this is not deterministic for some reason. It is odd
> >>> because I have never experienced that with any other emulation.
> >>
> >> Yeah, I don't really understand why it would be non-deterministic.
> >> But it does make it sound like the right thing is for the
> >> device tree file to explicitly say which MMC controller is
> >> which -- presumably you might get unlucky with the timing
> >> on real hardware too.
> >>
> >
> > Agreed, only someone with real hardware would have to confirm
> > that this is the case.
> >
>
> Actually, the reason is quite simple. In the Linux kernel:
>
> static struct platform_driver sunxi_mmc_driver = {
>  .driver = {
>  .name   = "sunxi-mmc",
>  .probe_type = PROBE_PREFER_ASYNCHRONOUS,
>^
>  .of_match_table = sunxi_mmc_of_match,
>  .pm = &sunxi_mmc_pm_ops,
>  },
>  .probe  = sunxi_mmc_probe,
>  .remove = sunxi_mmc_remove,
> };
>
> All mmc devices instantiate at the same time, thus the
> device name association is random. If I drop the probe_type
> assignment, it becomes deterministic.
>
> On top of that, Linux does know which drives are removable
> from the devicetree file. However, since probe order is
> random, the assignment of the one removable drive to device
> names is random. Sometimes mmc0 shows up as removable,
> sometimes it is mmc1 or mmc2.
>
> So my conclusion is that qemu isn't doing anything wrong,
> it is all happening in the Linux kernel.

Hi Guenter -- do you know if this "random MMC controller"
issue has been fixed in Linux ? If so, we might be able
to update our test case image to avoid the slightly ugly
"root=b300" workaround at some point.

thanks
-- PMM



[PATCH 1/2] block/blkio: close the fd when blkio_connect() fails

2023-08-01 Thread Stefano Garzarella
libblkio drivers take ownership of `fd` only after a successful
blkio_connect(), so if it fails, we are still the owners.

Fixes: cad2ccc395 ("block/blkio: use qemu_open() to support fd passing for virtio-blk")
Suggested-by: Hanna Czenczek 
Signed-off-by: Stefano Garzarella 
---
 block/blkio.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/block/blkio.c b/block/blkio.c
index 8e7ce42c79..2d53a865e7 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -739,6 +739,7 @@ static int blkio_virtio_blk_connect(BlockDriverState *bs, QDict *options,
  * directly setting `path`.
  */
 if (fd_supported && ret == -EINVAL) {
+fd_supported = false;
 qemu_close(fd);
 
 /*
@@ -763,6 +764,14 @@ static int blkio_virtio_blk_connect(BlockDriverState *bs, QDict *options,
 }
 
 if (ret < 0) {
+if (fd_supported) {
+/*
+ * libblkio drivers take ownership of `fd` only after a successful
+ * blkio_connect(), so if it fails, we are still the owners.
+ */
+qemu_close(fd);
+}
+
 error_setg_errno(errp, -ret, "blkio_connect failed: %s",
  blkio_get_error_msg());
 return ret;
-- 
2.41.0




[PATCH 0/2] block/blkio: fix fd leak and add more comments for the fd passing

2023-08-01 Thread Stefano Garzarella
Hanna discovered an fd leak in the error path, and a few comments to
improve in the code.

Stefano Garzarella (2):
  block/blkio: close the fd when blkio_connect() fails
  block/blkio: add more comments on the fd passing handling

 block/blkio.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

-- 
2.41.0




Re: [PATCH] linux-user: Fix openat() emulation to correctly detect accesses to /proc

2023-08-01 Thread Daniel P . Berrangé
On Tue, Aug 01, 2023 at 04:58:57PM +0200, Helge Deller wrote:
> In qemu we catch accesses to files like /proc/cpuinfo or /proc/net/route
> and return to the guest contents which would be visible on a real system
> (instead what the host would show).
> 
> This patch fixes a bug, where for example the accesses
> cat /proc//cpuinfo
> or
> cd /proc && cat cpuinfo
> will not be recognized by qemu and where qemu will wrongly show
> the contents of the host's /proc/cpuinfo file.
> 
> Signed-off-by: Helge Deller 
> 
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index 917c388073..bb864c2bb3 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -8531,9 +8531,11 @@ static int open_cpuinfo(CPUArchState *cpu_env, int fd)
>  }
>  #endif
> 
> -int do_guest_openat(CPUArchState *cpu_env, int dirfd, const char *pathname,
> +int do_guest_openat(CPUArchState *cpu_env, int dirfd, const char *fname,
>  int flags, mode_t mode, bool safe)
>  {
> +char proc_name[PATH_MAX];

No PATH_MAX buffers declared on the stack please.

> +const char *pathname;
>  struct fake_open {
>  const char *filename;
>  int (*fill)(CPUArchState *cpu_env, int fd);
> @@ -8560,6 +8562,13 @@ int do_guest_openat(CPUArchState *cpu_env, int dirfd, const char *pathname,
>  { NULL, NULL, NULL }
>  };
> 
> +/* if this is a file from /proc/ filesystem, expand full name */
> +if (realpath(fname, proc_name) && strncmp(proc_name, "/proc/", 6) == 0) {

QEMU relies on the extended semantics of realpath() where passing NULL
for 'proc_name' causes it to allocate a buffer of the correct size and
return the new buffer. Using that avoids PATH_MAX variables.

> +pathname = proc_name;
> +} else {
> +pathname = fname;
> +}

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [RFC PATCH 04/19] memory: Introduce memory_region_can_be_private()

2023-08-01 Thread Sean Christopherson
On Mon, Jul 31, 2023, Peter Xu wrote:
> On Mon, Jul 31, 2023 at 05:36:37PM -0400, Michael S. Tsirkin wrote:
> > On Mon, Jul 31, 2023 at 02:34:22PM -0700, Sean Christopherson wrote:
> > > On Mon, Jul 31, 2023, Peter Xu wrote:
> > > > On Mon, Jul 31, 2023 at 12:21:46PM -0400, Xiaoyao Li wrote:
> > > > > +bool memory_region_can_be_private(MemoryRegion *mr)
> > > > > +{
> > > > > +return mr->ram_block && mr->ram_block->gmem_fd >= 0;
> > > > > +}
> > > > 
> > > > This is not really MAP_PRIVATE, am I right?  If so, is there still 
> > > > chance
> > > > we rename it (it seems to be also in the kernel proposal all across..)?
> > > 
> > > Yes and yes.
> > > 
> > > > I worry it can be very confusing in the future against MAP_PRIVATE /
> > > > MAP_SHARED otherwise.
> > > 
> > > Heh, it's already quite confusing at times.  I'm definitely open to 
> > > naming that
> > > doesn't collide with MAP_{PRIVATE,SHARED}, especially if someone can come 
> > > with a
> > > naming scheme that includes a succinct way to describe memory that is 
> > > shared
> > > between two or more VMs, but is accessible to _only_ those VMs.
> > 
> > Standard solution is a technology specific prefix.
> > protect_shared, encrypt_shared etc.
> 
> Agreed, a prefix could definitely help (if nothing better comes at last..).
> If e.g. "encrypted" too long to be applied everywhere in var names and
> functions, maybe it can also be "enc_{private|shared}".

FWIW, I would stay away from "encrypted", there is no requirement that the 
memory
actually be encrypted.



Re: [RFC PATCH 03/19] RAMBlock: Support KVM gmemory

2023-08-01 Thread David Hildenbrand

On 31.07.23 18:21, Xiaoyao Li wrote:

From: Chao Peng 

Add KVM gmem support to RAMBlock so we can have both normal
hva based memory and gmem fd based memory in one RAMBlock.

The gmem part is represented by the gmem_fd.

Signed-off-by: Chao Peng 
Signed-off-by: Xiaoyao Li 


So, *someone* creates a RAMBlock in QEMU.

Who'll squeeze a gmem_fd in there? When? How?

Shouldn't we create the RAM memory region / RAMBlock already with the 
gmem_fd, and set that when initializing these things?


--
Cheers,

David / dhildenb




Re: [RFC PATCH 04/19] memory: Introduce memory_region_can_be_private()

2023-08-01 Thread Claudio Fontana
On 7/31/23 18:21, Xiaoyao Li wrote:
> Signed-off-by: Xiaoyao Li 
> ---
>  include/exec/memory.h | 9 +++++++++
>  softmmu/memory.c  | 5 +++++
>  2 files changed, 14 insertions(+)
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 61e31c7b9874..e119d3ce1a1d 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -1679,6 +1679,15 @@ static inline bool memory_region_is_romd(MemoryRegion *mr)
>   */
>  bool memory_region_is_protected(MemoryRegion *mr);
>  
> +/**
> + * memory_region_can_be_private: check whether a memory region can be private

The name of the function is not particularly informative,

> + *
> + * Returns %true if a memory region's ram_block has valid gmem fd assigned.

but in your comment you describe more accurately what it does, why not make it 
the function name?

bool memory_region_has_valid_gmem_fd()

> + *
> + * @mr: the memory region being queried
> + */
> +bool memory_region_can_be_private(MemoryRegion *mr);


bool memory_region_has_valid_gmem_fd()


Thanks,

C

> +
>  /**
>   * memory_region_get_iommu: check whether a memory region is an iommu
>   *
> diff --git a/softmmu/memory.c b/softmmu/memory.c
> index 4f8f8c0a02e6..336c76ede660 100644
> --- a/softmmu/memory.c
> +++ b/softmmu/memory.c
> @@ -1855,6 +1855,11 @@ bool memory_region_is_protected(MemoryRegion *mr)
>  return mr->ram && (mr->ram_block->flags & RAM_PROTECTED);
>  }
>  
> +bool memory_region_can_be_private(MemoryRegion *mr)
> +{
> +return mr->ram_block && mr->ram_block->gmem_fd >= 0;
> +}
> +
>  uint8_t memory_region_get_dirty_log_mask(MemoryRegion *mr)
>  {
>  uint8_t mask = mr->dirty_log_mask;




Re: [RFC PATCH 04/19] memory: Introduce memory_region_can_be_private()

2023-08-01 Thread Claudio Fontana
On 8/1/23 18:48, Claudio Fontana wrote:
> On 7/31/23 18:21, Xiaoyao Li wrote:
>> Signed-off-by: Xiaoyao Li 
>> ---
>>  include/exec/memory.h | 9 +++++++++
>>  softmmu/memory.c  | 5 +++++
>>  2 files changed, 14 insertions(+)
>>
>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>> index 61e31c7b9874..e119d3ce1a1d 100644
>> --- a/include/exec/memory.h
>> +++ b/include/exec/memory.h
>> @@ -1679,6 +1679,15 @@ static inline bool memory_region_is_romd(MemoryRegion *mr)
>>   */
>>  bool memory_region_is_protected(MemoryRegion *mr);
>>  
>> +/**
>> + * memory_region_can_be_private: check whether a memory region can be 
>> private
> 
> The name of the function is not particularly informative,
> 
>> + *
>> + * Returns %true if a memory region's ram_block has valid gmem fd assigned.
> 
> but in your comment you describe more accurately what it does, why not make 
> it the function name?
> 
> bool memory_region_has_valid_gmem_fd()


btw can a memory region have an invalid gmem_fd ?

If an invalid gmem_fd is just used to mark whether gmem_fd is present or not,

we could make it just:

bool memory_region_has_gmem_fd()


Thanks,

C

> 
>> + *
>> + * @mr: the memory region being queried
>> + */
>> +bool memory_region_can_be_private(MemoryRegion *mr);
> 
> 
> bool memory_region_has_valid_gmem_fd()
> 
> 
> Thanks,
> 
> C
> 
>> +
>>  /**
>>   * memory_region_get_iommu: check whether a memory region is an iommu
>>   *
>> diff --git a/softmmu/memory.c b/softmmu/memory.c
>> index 4f8f8c0a02e6..336c76ede660 100644
>> --- a/softmmu/memory.c
>> +++ b/softmmu/memory.c
>> @@ -1855,6 +1855,11 @@ bool memory_region_is_protected(MemoryRegion *mr)
>>  return mr->ram && (mr->ram_block->flags & RAM_PROTECTED);
>>  }
>>  
>> +bool memory_region_can_be_private(MemoryRegion *mr)
>> +{
>> +return mr->ram_block && mr->ram_block->gmem_fd >= 0;
>> +}
>> +
>>  uint8_t memory_region_get_dirty_log_mask(MemoryRegion *mr)
>>  {
>>  uint8_t mask = mr->dirty_log_mask;
> 




Re: [PATCH] migration/calc-dirty-rate: millisecond precision period

2023-08-01 Thread Peter Xu
On Tue, Aug 01, 2023 at 05:55:29PM +0300, gudkov.and...@huawei.com wrote:
> Hmmm, such underestimation looks strange to me. I am willing to test
> page-sampling and see whether its quality can be improved. Do you have
> any specific suggestions on the application to use as a workload?

I could have had a wrong impression here, sorry.

I played again with the page sampling approach, and that's actually pretty
decent..

I had that impression probably based on the fact that by default we chose 2
pages out of 1000-ish (consider workloads having 100-ish memory updates
where no sample page falls into it), and I do remember in some cases of my
test setups quite some time ago, it shows totally wrong numbers. But maybe
I had a wrong test, or wrong memory.

Now thinking about it, for random/seq on not so small memory ranges, that
seems to all work.  For very small ones spread over it goes to the random
case.

> 
> If it turns out that page-sampling is not an option, then performance
> impact of the dirty-bitmap must be improved somehow. Maybe it makes
> sense to split memory into 4GiB chunks and measure dirty page rate
> independently for each of the chunks (without enabling page
> protections for memory outside of the currently processed chunk).
> But the downsides are that 1) total measurement time will increase
> proportionally by number of chunks 2) dirty page rate will be
> overestimated.
> 
> But actually I am still hoping on page sampling. Since my goal is to
> roughly predict what can be migrated and what cannot be, I would prefer
> to keep predictor as lite as possible, even at the cost of
> (overestimation) error.

Yes I also hope that works as you said.

Thanks,

-- 
Peter Xu




Re: [Qemu-devel] [PATCH] acpi: Add emulated sleep button

2023-08-01 Thread Annie.li

Hi Igor,

On 7/14/2023 10:04 AM, Igor Mammedov wrote:

On Fri, 7 Jul 2023 13:43:36 -0400
"Annie.li"  wrote:


Hi Igor,

Revisiting this thread and have more questions, please clarify, thank you!

On 9/20/2021 3:53 AM, Igor Mammedov wrote:

On Fri, 6 Aug 2021 16:18:09 -0400
"Annie.li"  wrote:
  

Hello Igor,

This is an old patch, but it does what we need.
I am getting a little bit lost about not implementing fixed hardware
sleep button, can you please clarify? thank you!

On 7/20/2017 10:59 AM, Igor Mammedov wrote:

On Thu, 20 Jul 2017 11:31:26 +0200
Stefan Fritsch  wrote:
 

From: Stefan Fritsch 

Add an ACPI sleep button and QMP/HMP commands to trigger it.  A sleep
button is a so called "fixed hardware feature", which makes it more
suitable for putting the system to sleep than a laptop lid, for example.

The sleep button is disabled by default (Bit 5 in the FACP flags
register set and no button "device" present in SSDT/DSDT). Clearing said
bit enables it as a fixed feature device.

per spec sleep button is used for both putting system into
sleep and for waking it up.

Reusing system_wakeup 'button' to behave as per spec would
make this patch significantly smaller.

Current 'system_wakeup' sets the WAK_STS bit and PWRBTN_STS to wake up
the system, the system_wakeup 'button' is the power button. So(Correct me
if I am wrong) reusing the system_wakeup 'button' means reusing the power
button for sleep. See the following code of setting WAK_STS and PWRBTN_STS
for 'system_wakeup',
      case QEMU_WAKEUP_REASON_OTHER:
      /* ACPI_BITMASK_WAKE_STATUS should be set on resume.
     Pretend that resume was caused by power button */
      ar->pm1.evt.sts |=
      (ACPI_BITMASK_WAKE_STATUS | ACPI_BITMASK_POWER_BUTTON_STATUS);

(that's quite ancient code (0bacd1300d) and I couldn't find a reason why
power button was chosen.
  
https://urldefense.com/v3/__https://www.mail-archive.com/kvm@vger.kernel.org/msg05742.html__;!!ACWV5N9M2RV99hQ!PQp9_UyWYc4gTuTwNUiyPgE0Xwinlsi8F-J6zWOA8KRLUh0EIv68g01XQQrFzKipeZbe-vhHfpGZBb0$
that was tested with WinXP so would be wise to check if SLEEP button
works there. (though I'm not sure if we still care for XP being runnable on 
QEMU)
)


don't have WinXP at hand to check for now...
Microsoft has ended the WinXP support since Apr 8, 2014.
If someone is still running WinXP, not sure if they care about
the sleep button.


it doesn't have to be ACPI_BITMASK_POWER_BUTTON_STATUS,
you can convert it to sleep button by using ACPI_BITMASK_SLEEP_BUTTON_STATUS.
However you'll have to enable sleep button (which is disabled currently)
in FADT (ACPI_FADT_F_SLP_BUTTON), for guest to see the button.

Agree
Per ACPI spec, SLPBTN_STS -

"This optional bit is set when the sleep button is pressed. In the system
working state, while SLPBTN_EN and SLPBTN_STS are both set, an
interrupt event is raised. In the sleep or soft-off states a wake event is
generated when the sleeping button is pressed and the SLPBTN_EN bit
is set."

Using ACPI_BITMASK_SLEEP_BUTTON_STATUS means qemu ends up with
supporting the sleep button.  With this, implementing the fixed
hardware sleep button(what this patch does) is one option. The interesting
topic is whether it should be implemented as General Control sleep button
or fixed hardware button.




      break;

Per the ACPI spec, the power button can be used in single button model, i.e.
it can be used to either shut down the system or put the system into sleep.
However, this depends on the software policy and user's setting of the
system.
Sleep/shutdown can not be done through the power button in the same
scenario.
If the system has configured pressing the power button to put the system
into sleep,
system_sleep will put the system into sleep state through the power
button, as well
as system_powerdown. Pressing the power button will not shut down the
system.
In this case, system_powerdown has its own issue, but this is different
story and
let's just focus on things related to system_sleep here.

regardless of what button is used OSPM is free to enable or disable it
using FOO_EN bits.


Nod
The system_powerdown issue I mentioned above is -
If the guest is configured to go into sleep when the power button is
pressed, system_powerdown will trigger the guest to go into sleep state.
However, system_wakeup won't be able to wake up the guest in such case.
Looks the current qemu doesn't handle this scenario properly.

About reusing "system_wakeup", does it mean the following?

1. when guest is in sleep state, "system_wakeup" wakes up the guest
2. when guest is running, "system_wakeup" puts the guest into sleep

yes,  it could be something like this

  

"system_wakeup" sets WAK_STS and then system transitions to the
working state. Correspondingly, I suppose both SLPBTN_STS and
SLPBTN_EN need to be set for sleeping, and this is what fixed
hardware sleep button requires?

yep
 

I have combined the sleep and wakeup together, share the
code between. But Xe

Re: [PATCH v5 01/11] hw: arm: Add bananapi M2-Ultra and allwinner-r40 support

2023-08-01 Thread Guenter Roeck

On 8/1/23 09:01, Peter Maydell wrote:

On Sat, 24 Jun 2023 at 16:02, Guenter Roeck  wrote:


On 6/24/23 07:23, Guenter Roeck wrote:

On 6/24/23 03:40, Peter Maydell wrote:

On Fri, 23 Jun 2023 at 20:33, Guenter Roeck  wrote:


On 6/23/23 10:44, Peter Maydell wrote:

On Sat, 17 Jun 2023 at 17:29, Guenter Roeck  wrote:

Main problem is that the SD card gets instantiated randomly to
mmc0, mmc1, or mmc2, making it all but impossible to specify a
root file system device. The non-instantiated cards are always
reported as non-removable, including mmc0. Example:

mmc0: Failed to initialize a non-removable card


Do you mean that QEMU randomly connects the SD card to
a different MMC controller each time, or that Linux is
randomly assigning mmc0 to a different MMC controller each
time ?



Good question. Given the workaround (fix ?) I suggested is
in the devicetree file, I would assume it is the latter. I suspect
that Linux assigns drive names based on hardware detection order,
and that this is not deterministic for some reason. It is odd
because I have never experienced that with any other emulation.


Yeah, I don't really understand why it would be non-deterministic.
But it does make it sound like the right thing is for the
device tree file to explicitly say which MMC controller is
which -- presumably you might get unlucky with the timing
on real hardware too.



Agreed, only someone with real hardware would have to confirm
that this is the case.



Actually, the reason is quite simple. In the Linux kernel:

static struct platform_driver sunxi_mmc_driver = {
  .driver = {
  .name   = "sunxi-mmc",
  .probe_type = PROBE_PREFER_ASYNCHRONOUS,
^
  .of_match_table = sunxi_mmc_of_match,
  .pm = &sunxi_mmc_pm_ops,
  },
  .probe  = sunxi_mmc_probe,
  .remove = sunxi_mmc_remove,
};

All mmc devices instantiate at the same time, thus the
device name association is random. If I drop the probe_type
assignment, it becomes deterministic.

On top of that, Linux does know which drives are removable
from the devicetree file. However, since probe order is
random, the assignment of the one removable drive to device
names is random. Sometimes mmc0 shows up as removable,
sometimes it is mmc1 or mmc2.

So my conclusion is that qemu isn't doing anything wrong,
it is all happening in the Linux kernel.


Hi Guenter -- do you know if this "random MMC controller"
issue has been fixed in Linux ? If so, we might be able
to update our test case image to avoid the slightly ugly
"root=b300" workaround at some point.



No, it has not been fixed, or at least there is nothing in linux-next.
I don't have real hardware, so I am not in a position to submit, much
less test, a patch on it. Someone had mentioned that real hardware would
handle the problem in initramfs. That seems wrong to me, but it is what
it is.

I changed my own test to use the "root=b300" hack. That seems highly kludgy,
but at least it works.

Guenter



